Pipelines are intended to be run inside the Ensembl production environment. Please make sure you have all the proper credentials, keys, etc. set up.
git clone git@github.com:Ensembl/ensembl-production-imported.git
Install the python part (of the pipelines):
pip install ./ensembl-production-imported
Add lib/perl to the PERL5LIB environment variable (use instead of modules):
export ENS_ROOT_DIR=$(pwd) # or any other directory into which the repo(s) were cloned
export PERL5LIB=${PERL5LIB}:${ENS_ROOT_DIR}/ensembl-production-imported/lib/perl
N.B. Please define the ENS_ROOT_DIR environment variable beforehand.
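Sourcing the setup more than once appends the same directory to PERL5LIB again. The following is a minimal sketch of an idempotent variant; the helper name `add_to_perl5lib` and the fallback to the current directory are assumptions, not part of the repository.

```shell
# Hypothetical helper (not part of the repository): append a directory to
# PERL5LIB unless it is already listed there.
add_to_perl5lib() {
  case ":${PERL5LIB:-}:" in
    *":$1:"*) ;;                                   # already present, nothing to do
    *) PERL5LIB="${PERL5LIB:+${PERL5LIB}:}$1" ;;    # append without a leading colon
  esac
  export PERL5LIB
}

# Assumption: fall back to the current directory if ENS_ROOT_DIR is not set.
ENS_ROOT_DIR="${ENS_ROOT_DIR:-$(pwd)}"
export ENS_ROOT_DIR

add_to_perl5lib "${ENS_ROOT_DIR}/ensembl-production-imported/lib/perl"
add_to_perl5lib "${ENS_ROOT_DIR}/ensembl-production-imported/lib/perl"   # no-op the second time
echo "PERL5LIB=${PERL5LIB}"
```

The `${PERL5LIB:+...}` expansion avoids a leading colon when PERL5LIB starts out empty.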
System-specific configuration options are handled by the Bio::EnsEMBL::EGPipeline::PrivateConfDetails module. The actual configuration is loaded from Bio::EnsEMBL::EGPipeline::PrivateConfDetails::Impl.
All the options used are listed in Impl.pm.example. Please define them before running the pipelines.
This can be done either by copying this file and editing it:
cp ensembl-production-imported/lib/perl/Bio/EnsEMBL/EGPipeline/PrivateConfDetails/Impl.pm{.example,}
# edit ensembl-production-imported/lib/perl/Bio/EnsEMBL/EGPipeline/PrivateConfDetails/Impl.pm
Or by creating a separate repo containing lib/perl/Bio/EnsEMBL/EGPipeline/PrivateConfDetails/Impl.pm and adding the corresponding lib/perl directory to your PERL5LIB environment variable.
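Whichever option you choose, it may help to confirm which Impl.pm would be picked up first. The sketch below scans the PERL5LIB entries in order; the helper name `find_impl_pm` and the temporary demo tree are assumptions made only for illustration.

```shell
# Path of the implementation module relative to a lib/perl directory.
IMPL_REL="Bio/EnsEMBL/EGPipeline/PrivateConfDetails/Impl.pm"

# Hypothetical helper: print the first Impl.pm found on PERL5LIB.
find_impl_pm() {
  old_ifs="$IFS"
  IFS=":"                               # split PERL5LIB on colons
  for dir in ${PERL5LIB:-}; do
    if [ -f "${dir}/${IMPL_REL}" ]; then
      IFS="$old_ifs"
      echo "${dir}/${IMPL_REL}"
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1
}

# Demo only: build a throwaway "separate repo" layout and put it first on PERL5LIB.
demo_root=$(mktemp -d)
mkdir -p "${demo_root}/lib/perl/Bio/EnsEMBL/EGPipeline/PrivateConfDetails"
touch "${demo_root}/lib/perl/${IMPL_REL}"
PERL5LIB="${demo_root}/lib/perl${PERL5LIB:+:${PERL5LIB}}"
export PERL5LIB

FOUND_IMPL=$(find_impl_pm) || echo "Impl.pm not found on PERL5LIB" >&2
echo "Using: ${FOUND_IMPL}"
rm -rf "${demo_root}"                   # clean up the demo tree
```

In real use, drop the demo section and call `find_impl_pm` after setting PERL5LIB as described above.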
You can override the default queue used to run a pipeline by adding the -queue_name option to the init_pipeline.pl command (see below).
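One way to make the override optional is to emit the option only when a queue name is given. This is a sketch under assumptions: the helper `queue_opts` and the queue name `demo_queue` are hypothetical, and the command is printed rather than executed.

```shell
# Hypothetical helper: print "-queue_name <name>" for a non-empty name,
# and nothing at all otherwise (so the default queue is kept).
queue_opts() {
  if [ -n "${1:-}" ]; then
    printf '%s %s' "-queue_name" "$1"
  fi
}

# Print the commands instead of running them.
echo "init_pipeline.pl Bio::EnsEMBL::EGPipeline::PipeConfig::RNAFeatures_conf $(queue_opts demo_queue)"
echo "init_pipeline.pl Bio::EnsEMBL::EGPipeline::PipeConfig::RNAFeatures_conf $(queue_opts "")"
```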
Every pipeline is derived from Bio::EnsEMBL::EGPipeline::PipeConfig::EGGeneric_conf (see the EGGeneric documentation for details).
The same Perl class prefix is used for every pipeline: Bio::EnsEMBL::EGPipeline::PipeConfig::
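Because the prefix is shared and the modules listed in the tables end in `_conf`, the full class name can be derived from the short pipeline name. A small sketch; the helper name `pipeline_module` is an assumption for illustration.

```shell
# Prefix shared by every pipeline configuration class.
PIPECONFIG_PREFIX="Bio::EnsEMBL::EGPipeline::PipeConfig::"

# Hypothetical helper: map a short pipeline name to its full Perl class name.
pipeline_module() {
  printf '%s%s_conf\n' "$PIPECONFIG_PREFIX" "$1"
}

pipeline_module RNAFeatures   # prints Bio::EnsEMBL::EGPipeline::PipeConfig::RNAFeatures_conf
pipeline_module DNAFeatures   # prints Bio::EnsEMBL::EGPipeline::PipeConfig::DNAFeatures_conf
```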
N.B. Don't forget to specify the -reg_file option for the beekeeper.pl -url $url -reg_file $REG_FILE -loop command.
init_pipeline.pl Bio::EnsEMBL::EGPipeline::PipeConfig::RNAFeatures_conf \
    $($CMD details script) \
    -hive_force_init 1 \
    -queue_name $SPECIFIC_QUEUE_NAME \
    -registry $REG_FILE \
    -production_db "$($PROD_SERVER details url)""$PROD_DBNAME" \
    -pipeline_tag "_${SPECIES_TAG}" \
    -pipeline_dir $OUT_DIR/rna_features \
    -species $SPECIES \
    -eg_pipelines_dir $ENS_DIR/ensembl-production-imported \
    ${OTHER_OPTIONS} \
    2> $OUT_DIR/init.stderr \
    1> $OUT_DIR/init.stdout
SYNC_CMD=$(cat $OUT_DIR/init.stdout | grep -- -sync'$' | perl -pe 's/^\s*//; s/"//g')
# should get something like
# beekeeper.pl -url $url -sync
LOOP_CMD=$(cat $OUT_DIR/init.stdout | grep -- -loop | perl -pe 's/^\s*//; s/\s*#.*$//; s/"//g')
# should get something like
# beekeeper.pl -url $url -reg_file $REG_FILE -loop
$SYNC_CMD 2> $OUT_DIR/sync.stderr 1> $OUT_DIR/sync.stdout
$LOOP_CMD 2> $OUT_DIR/loop.stderr 1> $OUT_DIR/loop.stdout
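The extraction of the sync and loop commands can be tried without a real pipeline database. The sketch below runs the same kind of filters on a mock init.stdout; the sample lines, the URL and the registry file name are invented for the demo, and `sed` is swapped in for the `perl -pe` one-liners.

```shell
# Demo only: a mock init.stdout with lines shaped like the expected beekeeper.pl hints.
demo_out=$(mktemp)
cat > "$demo_out" <<'EOF'
Some other initialisation output
    beekeeper.pl -url "mysql://user@host:3306/demo_pipeline" -sync
    beekeeper.pl -url "mysql://user@host:3306/demo_pipeline" -reg_file "reg.pm" -loop  # run all workers
EOF

# Line ending in -sync: strip leading whitespace and double quotes.
SYNC_CMD=$(grep -- '-sync$' "$demo_out" | sed -e 's/^[[:space:]]*//' -e 's/"//g')

# Line containing -loop: also drop the trailing "# ..." comment.
LOOP_CMD=$(grep -- '-loop' "$demo_out" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*#.*$//' -e 's/"//g')

# Refuse to continue when either command could not be extracted.
if [ -z "$SYNC_CMD" ] || [ -z "$LOOP_CMD" ]; then
  echo "could not extract beekeeper.pl commands; check init.stderr" >&2
fi

echo "$SYNC_CMD"
echo "$LOOP_CMD"
rm -f "$demo_out"
```

Checking for empty values before running `$SYNC_CMD` and `$LOOP_CMD` avoids silently executing nothing when initialisation failed.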
Pipeline name | Module | Description | Document | Comment |
---|---|---|---|---|
EGGeneric | Bio::EnsEMBL::EGPipeline::PipeConfig::EGGeneric_conf | generic pipeline configuration | EGGeneric | All other pipelines are derived from this one |
RepeatModeler | Bio::EnsEMBL::EGPipeline::PipeConfig::RepeatModeler_conf | Building de novo repeat libraries | RepeatModeler | |
DNAFeatures | Bio::EnsEMBL::EGPipeline::PipeConfig::DNAFeatures_conf | repeat masking | DNAFeatures | redat_repeatmasker_library should be explicitly specified |
RNAFeatures | Bio::EnsEMBL::EGPipeline::PipeConfig::RNAFeatures_conf | Non-coding RNA features (tRNA, miRNA, etc.) discovery | RNAFeatures | |
RNAGenes | Bio::EnsEMBL::EGPipeline::PipeConfig::RNAGenes_conf | Create non-coding genes based on RNA features | RNAGenes | Specify id_db_{host,port,user,dbname,...} options if run_context != "VB" |
SRAAlignment_BRC4 | Bio::EnsEMBL::EGPipeline::PipeConfig::SRAAlignment_BRC4_conf | Perform RNA(DNA) short read alignments | SRAAlignment_BRC4 | |
WGA2GenesDirect | Bio::EnsEMBL::EGPipeline::PipeConfig::WGA2GenesDirect_conf | Project transcripts and create genes based on Compara lastz mappings | WGA2GenesDirect | |
Xref_GPR | Bio::EnsEMBL::EGPipeline::PipeConfig::Xref_GPR_conf | Load Plant Reactome data | Xref_GPR | use -uppercase_gene_id 1 option to allow usage of uppercase gene stable IDs for mapping (i.e. for Oryza sativa (rice)) |
AlignmentXref | Bio::EnsEMBL::EGPipeline::PipeConfig::AlignmentXref_conf | Alignment-based xrefs | AlignmentXref | Used as a part of the AllXref pipeline |
Xref | Bio::EnsEMBL::EGPipeline::PipeConfig::Xref_conf | MD5-based UniParc/UniProt Xref pipeline | Xref | Used as a part of the AllXref pipeline |
AllXref | Bio::EnsEMBL::EGPipeline::PipeConfig::AllXref_conf | Combined Xref/AlignmentXref pipeline | AllXref | |
FindPHIBaseCandidates | Bio::EnsEMBL::EGPipeline::PipeConfig::FindPHIBaseCandidates_conf | Load Xrefs from PHIBase | FindPHIBaseCandidates | |
Map_interspecies_interactions | Bio::EnsEMBL::EGPipeline::PipeConfig::Map_interspecies_interactions_conf | Loads interactions to Ensembl InterspeciesinteractionsDB from different sources | Map_interspecies_interactions |
Pipeline name | Module | Description | Document | Comment | Alternative |
---|---|---|---|---|---|
AnalyzeTables | Bio::EnsEMBL::EGPipeline::PipeConfig::AnalyzeTables_conf | Runs SQL ANALYZE / OPTIMIZE on tables for DBs present in the registry | | | |
EC2Rhea | Bio::EnsEMBL::EGPipeline::PipeConfig::EC2Rhea_conf | Adding chemical and transport reactions (Rhea2RC) xrefs (used by 'microbes') | Specify ec2rhea_file as there's no default | | |
ExonerateAlignment | Bio::EnsEMBL::EGPipeline::PipeConfig::ExonerateAlignment_conf | Aligning Fasta files to a genome with Exonerate | Specify the -exonerate_2_4_dir option if using exonerate-server (-use_exonerate_server 1) | | |
ShortReadAlignment | Bio::EnsEMBL::EGPipeline::PipeConfig::ShortReadAlignment_conf | ||||
STARAlignment | Bio::EnsEMBL::EGPipeline::PipeConfig::STARAlignment_conf | ||||
BlastNucleotide | Bio::EnsEMBL::EGPipeline::PipeConfig::BlastNucleotide_conf | ||||
BlastProtein | Bio::EnsEMBL::EGPipeline::PipeConfig::BlastProtein_conf | EGPipeline::FileDump::GFF3Dumper could not be replaced with Production::Pipeline::GFF3::DumpFile as no join_align_feature param is provided | | | |
Bam2BigWig | Bio::EnsEMBL::EGPipeline::PipeConfig::Bam2BigWig_conf | ||||
ProjectGenes | Bio::EnsEMBL::EGPipeline::PipeConfig::ProjectGenes_conf | ||||
ProjectGeneDesc | Bio::EnsEMBL::EGPipeline::PipeConfig::ProjectGeneDesc_conf |
Old pipeline module | Alternative | Description | Document | Comment |
---|---|---|---|---|
CoreStatistics | Bio::EnsEMBL::Production::Pipeline::PipeConfig::CoreStatistics_conf | Core stats pipeline | use -skip_metadata_check 1 if core is not submitted (always for new species); set proper -pipeline_dir, -scratch_small_dir and -scratch_large_dir (see Bio::EnsEMBL::Production::Pipeline::PipeConfig::Base_conf) | |
FileDump | Bio::EnsEMBL::Production::Pipeline::PipeConfig::FileDump_conf | Serialize core | ||
FileDump{Compara,GFF} | same as above | |||
FileDumpVEP | Bio::EnsEMBL::Production::Pipeline::PipeConfig::FileDumpVEP_conf | Dump VEP data | ||
LoadGFF3 | Bio::EnsEMBL::Pipeline::PipeConfig::LoadGFF3_conf | Load gene models from GFF3 and accompanying files | See new_genome_loader for details | |
LoadGFF3Batch | Bio::EnsEMBL::Pipeline::PipeConfig::LoadGFF3Batch_conf | Batch load models from GFF3 files | See new_genome_loader for details | |
GeneTreeHighlighting | Bio::EnsEMBL::Production::Pipeline::PipeConfig::GeneTreeHighlighting | Populate compara table with GO and InterPro terms, to enable highlighting | ||
GetOrthologs | Bio::EnsEMBL::Production::Pipeline::PipeConfig::DumpOrtholog |
Runnable | Description | Document | Comment |
---|---|---|---|
Common::RunnableDB::CreateOFDatabase | |||
Analysis::Config::General |
Script | Description | Document | Comment |
---|---|---|---|
brc4/repeat_for_masker.pl | .... | ||
brc4/repeat_tab_to_list.pl | .... | ||
misc_scripts/get_trans.pl | get transcripts and translations | In pipelines use Bio::EnsEMBL::EGPipeline::Common::RunnableDB::DumpProteome and Bio::EnsEMBL::EGPipeline::Common::RunnableDB::DumpTranscriptome | |
misc_scripts/load_xref.pl | |||
misc_scripts/remove_entities.pl | |||
misc_scripts/gene_stable_id_mapping.pl | |||
misc_scripts/add_karyotype.pl | |||
misc_scripts/load_karyotype_from_gff.pl | |||
rna_features/add_rfam_desc.pl | prepare Rfam db for RNAFeatures | RNAFeatures | |
rna_features/taxonomic_levels.pl | prepare Rfam db for RNAFeatures | RNAFeatures | |
phi_ontology/phi-base_ontologies.pl | normalising phi-base data .csv based on ontologies in scripts/phi_ontology | FindPHIBaseCandidates |
Script | Substitution | Document | Comment |
---|---|---|---|
production_db/analysis_desc_from_prod.pl | Bio::EnsEMBL::Production::Pipeline::PipeConfig::ProductionDBSync_conf | ||
production_db/attrib_type_from_prod.pl | Bio::EnsEMBL::Production::Pipeline::PipeConfig::ProductionDBSync_conf | ||
production_db/external_db_from_prod.pl | Bio::EnsEMBL::Production::Pipeline::PipeConfig::ProductionDBSync_conf | ||
production_db/add_species_analysis.pl | Bio::EnsEMBL::Production::Pipeline::PipeConfig::ProductionDBSync_conf |
See docs
Tests, tests, tests...
For obvious reasons, the whole history of the source project had to go. Most of this code and documentation is inherited from the EnsemblGenomes project.
We appreciate the effort and time spent by developers of the EnsemblGenomes project.
Thank you!