-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add indicspecies tool #149
Open
PaulineSGN
wants to merge
12
commits into
galaxyecology:master
Choose a base branch
from
PaulineSGN:PaulineSGN-add-indicspecies-tool
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 5 commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
c3b85ce
Add indicspecies tool
PaulineSGN 9020ab4
correct help section
PaulineSGN 8b56e8f
Update indval.xml
PaulineSGN 82b3dbc
Update tools/Ecoregionalization_workflow/indval.xml
PaulineSGN 1aef77f
Update indval.xml
PaulineSGN 4d8522b
add assert_contents in test
PaulineSGN dec2f57
reduce file size
PaulineSGN ace65db
testing with profile="23.0"
PaulineSGN 5c64d51
test with profile=21.05
PaulineSGN 0fc91dd
test with the profile that works locally (16.04)
PaulineSGN 8b52cc8
test with no profile (not recommended)
PaulineSGN afbe07d
Update profile to its origin (works locally)
PaulineSGN File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
#Seguineau Pauline | ||
#02/12/2024 | ||
#Indicspecies tool | ||
|
||
library(dplyr) | ||
library(indicspecies) | ||
|
||
#load arguments | ||
args = commandArgs(trailingOnly=TRUE) | ||
if (length(args)==0) | ||
{ | ||
stop("This tool needs at least one argument") | ||
}else{ | ||
clus_pts <- args[1] | ||
occ <- args[2] | ||
spe_name <- args[3] | ||
sign <- args[4] | ||
} | ||
|
||
###load data | ||
clus <- read.table(clus_pts, dec=".", sep="\t", header=T,na.strings = "na") #cluster points file (ecoregionalization workflow) | ||
data.col <- read.table(occ, dec=".", sep="\t", header=T,na.strings = "na") #occurrence file (merged table from ecoregionalization workflow) | ||
spe_name = strsplit(spe_name, ",") | ||
spname=NULL | ||
|
||
for (n in spe_name) { | ||
spname = cbind(names(data.col[as.numeric(n)]))} | ||
|
||
#Rename decimalLatitude and decimalLongitude columns from occurrence file | ||
if ("decimalLatitude" %in% colnames(data.col)) { | ||
colnames(data.col)[which(colnames(data.col) == "decimalLatitude")] <- "lat" | ||
} | ||
if ("decimalLongitude" %in% colnames(data.col)) { | ||
colnames(data.col)[which(colnames(data.col) == "decimalLongitude")] <- "long" | ||
} | ||
|
||
#Round lat and long of the data.col file to be able to put clus cluster in it | ||
data.col$lat = round(data.col$lat,digits = 2) | ||
data.col$long = round(data.col$long,digits = 2) | ||
|
||
# Creates a new "station" column that associates a unique identifier to each latitude-longitude pair | ||
data.col <- data.col %>% mutate(station = as.factor(paste(lat, long, sep = "_"))) | ||
|
||
# convert "station" to a unique numeric identifier | ||
data.col <- data.col %>% mutate(station = as.numeric(factor(station))) | ||
|
||
#Adding clusters to file | ||
clusta <- merge(data.col,clus, by=c("lat","long"), all.x = TRUE) | ||
|
||
#This generated duplicates with different clusters | ||
clusta <- aggregate(clusta, by=list(clusta$station,clusta$cluster), FUN=mean, na.rm=TRUE) | ||
clusta <- na.exclude(clusta) | ||
|
||
# indval on all clusters | ||
indval = multipatt(clusta[,spname], clusta$cluster,duleg = TRUE, control = how(nperm=999)) | ||
if (sign=="true"){ | ||
capture.output(summary(indval,indvalcomp=TRUE), file = "indval.txt") | ||
}else{ | ||
capture.output(summary(indval,indvalcomp=TRUE, alpha=1), file = "indval.txt")} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
<tool id="indval" name="Indicspecies" version="0.1.0+galaxy0" profile="16.04"> | ||
<description>to identify indicator species within clusters/groups</description> | ||
<requirements> | ||
<requirement type="package" version="4.4.2">r-base</requirement> | ||
<requirement type="package" version="1.1.4">r-dplyr</requirement> | ||
<requirement type="package" version="1.7.15">r-indicspecies</requirement> | ||
<requirement type="package" version="0.9_4">r-permute</requirement> | ||
</requirements> | ||
<command detect_errors="exit_code"><![CDATA[ | ||
Rscript '$__tool_directory__/indval.R' '$clus' '$occ' '$name_species' '$sign' | ||
]]></command> | ||
<inputs> | ||
<param name="occ" type="data" format="tabular" label="Input your Occurences file (tabular format only)" help="See example below"/> | ||
<param name="clus" type="data" format="tabular" label="Input your cluster file (tabular format only)" help="See example below"/> | ||
<param name="name_species" type="data_column" label="Choose column(s) where your species are in your occurrence data file." data_ref="occ" multiple="true" use_header_names="true"/> | ||
<param name="sign" type="boolean" label="Display only significant results (p-value inferior to 0.05)" checked="true"/> | ||
</inputs> | ||
<outputs> | ||
<data name="output" from_work_dir="indval.txt" format="txt" label="Indicator values"/> | ||
</outputs> | ||
<tests> | ||
<test> | ||
<param name="occ" value="Merged_table.tabular"/> | ||
<param name="clus" value="Cluster_points.tabular"/> | ||
<param name="name_species" value="3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73"/> | ||
<param name="sign" value="true"/> | ||
<output name="output"> | ||
<assert_contents> | ||
<has_size value="4175" delta="50"/> | ||
yvanlebras marked this conversation as resolved.
Show resolved
Hide resolved
|
||
</assert_contents> | ||
</output> | ||
</test> | ||
</tests> | ||
<help><![CDATA[ | ||
**What it does?** | ||
----------------- | ||
|
||
This tool uses the multipatt function from indcispecies R package. The multipatt function is used in ecology to identify species indicators associated with groups of samples. It is an extension of IndVal (Indicator Value) by Dufrêne and Legendre (1997). It is commonly used to determine which species are characteristic of certain habitats or sample groups, which is useful in conservation biology, ecosystem management or biodiversity studies. This function creates combinations of the input clusters and compares each combination with the species in the input matrix. For each species it chooses the combination with the highest association value. Best matching patterns are tested for statistical significance of the associations. Indicator value indices will return the pattern that better matches the species observed pattern, whereas correlation indices will return the pattern that creates the highest inside/outside difference. Details are given in De Cáceres et al. (2010). | ||
|
||
**How to use it?** | ||
------------------ | ||
|
||
To use this tool, you will need two types of data file as this tool is conceive to be run with or after the Ecoregionalization workflow (https://ecology.usegalaxy.eu/training-material/topics/ecology/tutorials/Ecoregionalization_tutorial/tutorial.html). | ||
|
||
The first file is an occurrence file with at least three types of columns : latitude, longitude and occurrence (one column per taxon with their absence/presence data). | ||
|
||
**Example of occurence data input :** | ||
|
||
+----------+-----------+------------------------+-----------+-----+------+--------------+-----+ | ||
| lat | long |Acanthorhabdus_fragilis | Acarnidae | ... | Grav | Maxbearing | ... | | ||
+----------+-----------+------------------------+-----------+-----+------+--------------+-----+ | ||
|-65,999946|142,3360535| 0 | 1 | ... |28.59 | 3.67 | ... | | ||
+----------+-----------+------------------------+-----------+-----+------+--------------+-----+ | ||
|-66,335407| 141,3028 | 0 | 1 | ... |28.61 | 3.64 | ... | | ||
+----------+-----------+------------------------+-----------+-----+------+--------------+-----+ | ||
| ... | ... | ... | ... | ... | ... | ... | ... | | ||
+----------+-----------+------------------------+-----------+-----+------+--------------+-----+ | ||
|
||
The second file contains the cluster/sample group assigned to each latitude and longitude. | ||
|
||
**Example of cluster data input :** | ||
|
||
+----------+-----------+----------+ | ||
| lat | long | Cluster | | ||
+----------+-----------+----------+ | ||
|-65,999946|142,3360535| 1 | | ||
+----------+-----------+----------+ | ||
|-66,335407| 141,3028 | 2 | | ||
+----------+-----------+----------+ | ||
| ... | ... | ... | | ||
+----------+-----------+----------+ | ||
|
||
.. class:: infomark | ||
|
||
It is preferable to name your columns latitude and longitude in both file as in the examples. | ||
|
||
Then you need to indicate wich columns in your occurrence file contain species data. And finally, you have to choose if you want to display only significant results (p-value < 0.05) or if you want all results including no significants ones. | ||
|
||
**Interpretation of results** | ||
----------------------------- | ||
|
||
On the tool's output file, we can find a lot of information. Firstly, the function used, in this case IndVal.g, which indicates that the generalized indicator value index has been used. Next is the level of significance (alpha) according to the choice made earlier (p-value < 0.05 or all). This is followed by the number of species analyzed and the number of species identified as indicators. The following lines show how many species are specific to a single group, and whether one or more species are simultaneously associated with several groups. | ||
|
||
The following section lists the indicator species for each group, with associated statistics. | ||
|
||
Explanation of associated statics : | ||
|
||
- A (Specificity): This is the proportion of occurrences of the species in the group in relation to all the groups. A value of 1.0000 means that the species is exclusively present in this group. | ||
- B (Fidelity): The proportion of sites in the group where the species is present. A value of 0.5000 means that the species is present in 50% of the sites in the group. | ||
- stat : The IndVal statistical value (generalized indicator value index). A high value indicates a strong association between the species and the group. | ||
- p-value : The p-value of the association. A low p-value (< 0.05) means that the association is statistically significant. | ||
|
||
Why is a species not significant in a group? | ||
|
||
There are several possible reasons for the absence of significant species in certain groups: | ||
|
||
- Low specificity (A): Species are present in several groups, reducing their indicator value. | ||
- Low fidelity (B): Species are not frequent enough in the sites of the group. | ||
- Insufficient data : Lack of samples or too few occurrences to detect robust associations. | ||
- High variability between sites in the same group. | ||
|
||
]]></help> | ||
<citations> | ||
<citation type="doi">doi:10.2307/2963459</citation> | ||
<citation type="doi">doi:10.1111/j.1600-0706.2010.18334.x</citation> | ||
<citation type="doi">10.32614/CRAN.package.indicspecies</citation> | ||
</citations> | ||
</tool> |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you decrease it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the tool with planemo locally and with different “profile” and the tool only works with 16.04. I couldn't get it to work with 23.0 or 21.05 (automatically set by planemo tool_init) and I don't really understand why. It seems that Galaxy or the docker container can't find the required packages with these “profile”, but maybe I'm mistaken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mh, this is strange, can you reproduce this locally? If I find time I will check it out.