Example usage
If you have installed via
, first download the chromosome 19 FASTA and GFF3 sample files into a directory of your choice.
If you have cloned the repo, first change to the
subdirectory, which contains Ensembl annotations and sequence for chromosome 19 of the human genome. Replace intronIC
in the following examples.
To collect and classify all (non-redundant) annotated introns, do the following:
$ intronIC -g Homo_sapiens.Chr19.Ensembl_91.fa.gz -a Homo_sapiens.Chr19.Ensembl_91.gff3.gz -n homo_sapiens
Information about the run will be printed to the screen; the same information (plus some additional details) can be found in the log.iic file:
[#] Starting run on [homo_sapiens (HomSap)]
[#] Run command: [/home/glarue/Documents/Coding/Python/Research/intronIC/intronIC -g /home/glarue/Documents/Coding/Python/Research/intronIC/test_data/Homo_sapiens.Chr19.Ensembl_91.fa.gz -a /home/glarue/Documents/Coding/Python/Research/intronIC/test_data/Homo_sapiens.Chr19.Ensembl_91.gff3.gz -n homo_sapiens]
[#] Using [cds,exon] features to define introns
[#] [58933] introns found in [Homo_sapiens.Chr19.Ensembl_91.gff3.gz]
[#] [38681] introns with duplicate coordinates excluded
[#] [8178] introns omitted from scoring based on the following criteria:
[#] * short (<30 nt): 66
[#] * ambiguous nucleotides in scoring regions: 0
[#] * non-canonical boundaries: 0
[#] * overlapping coordinates: 0
[#] * not in longest isoform: 8112
[#] Most common non-canonical splice sites:
[#] * AT-AG (17/328, 5.18%)
[#] * GT-TG (12/328, 3.66%)
[#] * GG-AG (12/328, 3.66%)
[#] * GA-AG (11/328, 3.35%)
[#] * AG-AG (10/328, 3.05%)
[#] [12] ([3] unique, [9] redundant) putatively misannotated U12 introns corrected in [homo_sapiens.annotation.iic]
[#] [12074] introns included in scoring analysis
[#] [11272] introns used to build U2 branch point matrix (5'SS in bottom [95]th percentile)
[#] Scoring introns using the following regions: [five, bp]
[#] Raw scores calculated for [20689] U2 and [387] U12 reference introns
[#] Raw scores calculated for [12074] experimental introns
[#] Non-redundant training sets: [20556] U2, [387] U12
[#] Training SVM using reference data
Starting optimization round 1/5
Starting optimization round 2/5
Starting optimization round 3/5
Starting optimization round 4/5
Starting optimization round 5/5
[#] Range for 'C' after [5] rounds of optimization: [976.5411685881514]-[976.5419176464368]
[#] Set classifier value for 'C': [976.5415431172212]
[#] Training classifier with optimized hyperparameters
[#] Average classifier performance on training data:
F1 [1.0]
P-R AUC [1.0]
[#] Classifier performance details:
              precision    recall  f1-score   support
          U2       1.00      1.00      1.00      4112
         U12       1.00      1.00      1.00        77
    accuracy                           1.00      4189
   macro avg       1.00      1.00      1.00      4189
weighted avg       1.00      1.00      1.00      4189
[#] [1] putative U12 scores were not robust to boundary switching
[#] [10] putative AT-AC U12 introns found.
[#] [31] putative U12 introns found with scores > [90]%
[#] Adding scores to intron sequences file
[#] Generating figures
[#] Run finished in [7.161 minutes]
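The counts reported in the log are internally consistent, which makes for a quick sanity check on a run. The following is a minimal sketch, assuming log lines keep the `[#] [N] ...` shape shown above; the helper name `first_count` and the embedded excerpt are illustrative only and are not part of intronIC.

```python
import re

# Excerpt of the log output shown above, copied verbatim.
LOG_EXCERPT = """\
[#] [58933] introns found in [Homo_sapiens.Chr19.Ensembl_91.gff3.gz]
[#] [38681] introns with duplicate coordinates excluded
[#] [8178] introns omitted from scoring based on the following criteria:
[#] * short (<30 nt): 66
[#] * not in longest isoform: 8112
[#] [12074] introns included in scoring analysis
"""


def first_count(log_text, phrase):
    """Return the first bracketed integer on the line containing `phrase`.

    Hypothetical helper: assumes counts are written as [123] in the log.
    """
    for line in log_text.splitlines():
        if phrase in line:
            match = re.search(r"\[(\d+)\]", line)
            if match:
                return int(match.group(1))
    raise ValueError(f"no count found for {phrase!r}")


found = first_count(LOG_EXCERPT, "introns found in")
duplicates = first_count(LOG_EXCERPT, "duplicate coordinates")
omitted = first_count(LOG_EXCERPT, "omitted from scoring")
scored = first_count(LOG_EXCERPT, "included in scoring")

# Introns left after removing duplicate coordinates.
non_redundant = found - duplicates
print(non_redundant)                      # 20252
# Removing the omitted introns should leave exactly the scored set.
print(non_redundant - omitted == scored)  # True
```

In this run, 58933 - 38681 = 20252 non-redundant introns, and 20252 - 8178 = 12074 introns included in scoring. The two non-zero omission categories (66 short, 8112 not in the longest isoform) sum to the 8178 omitted.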
If only the intron sequences are desired, scoring can be bypassed using the -s flag, which significantly reduces processing time and produces only a subset of the output files:
$ intronIC -g /home/glarue/Documents/Coding/Python/Research/intronIC/test_data/Homo_sapiens.Chr19.Ensembl_91.fa.gz -a /home/glarue/Documents/Coding/Python/Research/intronIC/test_data/Homo_sapiens.Chr19.Ensembl_91.gff3.gz -n homo_sapiens -s
[#] Starting run on [homo_sapiens (HomSap)]
[#] Run command: [/home/glarue/Documents/Coding/Python/Research/intronIC/intronIC -g /home/glarue/Documents/Coding/Python/Research/intronIC/test_data/Homo_sapiens.Chr19.Ensembl_91.fa.gz -a /home/glarue/Documents/Coding/Python/Research/intronIC/test_data/Homo_sapiens.Chr19.Ensembl_91.gff3.gz -n homo_sapiens -s]
[#] Using [cds,exon] features to define introns
[#] [58933] introns found in [/home/glarue/Documents/Coding/Python/Research/intronIC/test_data/Homo_sapiens.Chr19.Ensembl_91.gff3.gz]
[#] [38681] introns with duplicate coordinates excluded
[#] [20252] intron sequences written to [homo_sapiens.introns.iic]
[#] Run finished in [27.41 seconds]
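The two invocations above differ only by the `-s` flag, so scripting both modes can share one command builder. This is a minimal sketch, assuming `intronIC` is available on `PATH` and the sample files are in the working directory; the function name `build_command` and its parameters are hypothetical, and only the `-g`, `-a`, `-n`, and `-s` flags shown above are used.

```python
import shutil
import subprocess


def build_command(genome, annotation, name, sequences_only=False):
    """Assemble an intronIC argument list from the flags shown above.

    Hypothetical wrapper, not part of intronIC itself.
    """
    cmd = ["intronIC", "-g", genome, "-a", annotation, "-n", name]
    if sequences_only:
        cmd.append("-s")  # bypass scoring; only intron sequences are produced
    return cmd


cmd = build_command(
    "Homo_sapiens.Chr19.Ensembl_91.fa.gz",
    "Homo_sapiens.Chr19.Ensembl_91.gff3.gz",
    "homo_sapiens",
    sequences_only=True,
)
print(" ".join(cmd))

# Only launch the run if the executable can actually be found.
if shutil.which("intronIC") is not None:
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.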
Many additional options exist for a variety of use cases. Run intronIC --help
for details, or see the Usage info page.