
Commit

Update reference_genomes_custom_genomes.md
Completely redone with cross-links. Easier to give a user one FAQ for this specific problem with full context re usual followup.
jennaj authored Dec 12, 2023
1 parent 610e5d2 commit 80604d7
Showing 1 changed file with 23 additions and 7 deletions.
faqs/galaxy/reference_genomes_custom_genomes.md (30 changes: 23 additions, 7 deletions)
```diff
@@ -7,12 +7,28 @@ contributors: [jennaj, Nurzhamalyrys]
 ---
 
 
-A reference genome contains the nucleotide sequence of the chromosomes, scaffolds, transcripts, or contigs for single species. It is representative of a specific genome build or release. There are two options to use reference genomes in Galaxy: _native_ (provided by the server administrators and used by most of the tools) and _custom_ (uploaded by users in FASTA format).
+A **reference genome** contains the nucleotide sequence of the chromosomes, scaffolds, transcripts, or contigs for a single species. It is representative of a specific genome assembly build or release.
 
-There are five basic steps to use a Custom Reference Genome:
 
+There are two options for reference genomes in Galaxy.
+* **Native**
+    * Index provided by the server administrators.
+    * Found on tool forms in a drop-down menu.
+    * A database key is automatically assigned. See tip 1.
+    * The database is what links your data to a FASTA index. Example: used with BAM data.
+* **Custom**
+    * FASTA file uploaded by users.
+    * Selected as an input on tool forms, then indexed at runtime by the tool.
+    * An optional custom database key can be created and [assigned by the user]({% link faqs/galaxy/datasets_change_dbkey.md %}).
+
-1. Obtain a FASTA copy of the target genome.
-2. Use FTP to upload the genome to Galaxy and load into a history as a dataset.
-3. Clean up the format with the tool **NormalizeFasta** using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
-4. Make sure the chromosome identifiers are a match for other inputs.
-5. Set a tool form's options to use a custom reference genome from the history and select the loaded genome.
+There are five basic steps to use a **Custom Reference Genome**, plus one optional step.
+1. Obtain a FASTA copy of the target genome. See tip 2.
+2. Upload the genome to Galaxy and add it as a dataset in your history.
+3. [Clean up the format]({% link faqs/galaxy/datasets_working_with_fasta.md %}) with the tool **NormalizeFasta** using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
+4. Make sure the [chromosome identifiers]({% link faqs/galaxy/datasets_chromosome_identifiers.md %}) are a match for other inputs.
+5. Set a tool form's options to use a custom reference genome from the history and select the loaded genome FASTA.
+6. (Optional) Create a [custom genome build's database]({% link faqs/galaxy/analysis_add_custom_build.md %}) that you can [assign to datasets]({% link faqs/galaxy/datasets_change_dbkey.md %}).
+
+{% icon tip %} TIP 1: Avoid [assigning a native database]({% link faqs/galaxy/datasets_change_dbkey.md %}) to uploaded data unless you have confirmed the data are based on the [same exact genome assembly]({% link faqs/galaxy/datasets_chromosome_identifiers.md %}) or have [adjusted the data to be a match]({% link topics/introduction/tutorials/data-manipulation-olympics/tutorial.html %}) **first**!
+
+{% icon tip %} TIP 2: When choosing your reference genome, consider [choosing your reference annotation]({% link faqs/galaxy/analysis_differential_expression_help.md %}) at the same time. Standardize the format of both as a preparation step. Put the files in a dedicated "reference data" history for easy reuse.
```
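
Step 3 of the updated FAQ relies on the **NormalizeFasta** tool. To illustrate what its two options do, here is a minimal Python sketch (standard library only) that trims each title line at the first whitespace and re-wraps sequence lines to 80 bases. It is not the tool's actual implementation, and the function name `normalize_fasta` is hypothetical; in Galaxy, run the tool itself.

```python
from typing import Iterable, Iterator, List


def normalize_fasta(lines: Iterable[str], width: int = 80) -> Iterator[str]:
    """Hypothetical helper: yield FASTA lines with titles trimmed and sequences re-wrapped.

    Title lines keep only their first whitespace-delimited token, and each
    record's sequence is re-wrapped to `width` bases per line.
    """
    seq_parts: List[str] = []

    def flush() -> Iterator[str]:
        # Emit the buffered sequence of the current record, `width` bases per line.
        seq = "".join(seq_parts)
        seq_parts.clear()
        for start in range(0, len(seq), width):
            yield seq[start:start + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            yield from flush()
            # ">chr1 Homo sapiens chromosome 1" becomes ">chr1"
            title = line[1:].split(None, 1)
            yield ">" + (title[0] if title else "")
        else:
            # Remove any whitespace inside sequence lines before buffering.
            seq_parts.append("".join(line.split()))
    yield from flush()


if __name__ == "__main__":
    example = [">chr1 Homo sapiens chromosome 1", "ACGT" * 25, ">chr2", "GG"]
    for out_line in normalize_fasta(example):
        print(out_line)
```

For a real file, pass an open file handle (for example a hypothetical `open("genome.fa")`) as `lines` and write each yielded line followed by a newline. A generator keeps memory use to one record's sequence at a time.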

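Step 4 and TIP 1 both come down to checking that the identifiers in the reference FASTA match the chromosome names used by your other inputs. Below is a minimal sketch of that check, assuming the other input is tab-separated with the chromosome name in the first column (as in BED, GTF, or the body of a VCF); all helper names are hypothetical. Matching names are necessary but not sufficient: they do not prove that both files come from the same genome assembly.

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Hypothetical helper: sequence identifiers (title up to first whitespace)."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            title = line[1:].split(None, 1)
            if title:
                ids.add(title[0])
    return ids


def first_column_ids(lines: Iterable[str]) -> Set[str]:
    """Hypothetical helper: chromosome names from the first column of tabular data."""
    ids: Set[str] = set()
    for line in lines:
        # Skip blank lines, "#" comment/header lines, and BED track/browser lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def missing_from_reference(reference: Set[str], other: Set[str]) -> Set[str]:
    """Identifiers used by the other input that the reference FASTA does not contain."""
    return other - reference


if __name__ == "__main__":
    genome = [">chr1 assembled chromosome", "ACGT", ">chr2", "GGCC"]
    bed = ["track name=peaks", "chr1\t10\t20\tpeak1", "2\t5\t15\tpeak2"]
    print(sorted(missing_from_reference(fasta_ids(genome), first_column_ids(bed))))  # ['2']
```

Here the BED row on `2` does not match `chr2` in the FASTA, a typical naming mismatch (UCSC-style `chr2` versus Ensembl-style `2`). An empty result means every name used by the other input exists in the reference.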