Update reference_genomes_custom_genomes.md

Completely redone with cross-links. Easier to give a user one FAQ for this specific problem with full context re usual followup.
galaxyproject · Dec 12, 2023 · 80604d7
1 parent 610e5d2
commit 80604d7
Showing 1 changed file with 23 additions and 7 deletions.
diff --git a/faqs/galaxy/reference_genomes_custom_genomes.md b/faqs/galaxy/reference_genomes_custom_genomes.md
@@ -7,12 +7,28 @@ contributors: [jennaj, Nurzhamalyrys]
 ---
 
 
-A reference genome contains the nucleotide sequence of the chromosomes, scaffolds, transcripts, or contigs for single species. It is representative of a specific genome build or release. There are two options to use reference genomes in Galaxy: _native_ (provided by the server administrators and used by most of the tools) and _custom_ (uploaded by users in FASTA format). 
+A **reference genome** contains the nucleotide sequence of the chromosomes, scaffolds, transcripts, or contigs for a single species. It represents a specific genome assembly build or release.
 
-There are five basic steps to use a Custom Reference Genome:
+
+There are two options for reference genomes in Galaxy.
+* **Native**
+   * Index provided by the server administrators.
+   * Found on tool forms in a drop-down menu.
+   * A database key is automatically assigned. See TIP 1.
+   * The database key is what links your data to a FASTA index. Example: used with BAM data.
+* **Custom**
+   * FASTA file uploaded by users.
+   * Selected as an input on tool forms, then indexed at runtime by the tool.
+   * An optional custom database key can be created and [assigned by the user]({% link faqs/galaxy/datasets_change_dbkey.md %}).
 
-1. Obtain a FASTA copy of the target genome.
-2. Use FTP to upload the genome to Galaxy and load into a history as a dataset.
-3. Clean up the format with the tool **NormalizeFasta** using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
-4. Make sure the chromosome identifiers are a match for other inputs.
-5. Set a tool form's options to use a custom reference genome from the history and select the loaded genome.
+There are five basic steps to use a **Custom Reference Genome**, plus one optional step.
+1. Obtain a FASTA copy of the target genome. See TIP 2.
+2. Upload the genome to Galaxy and add it as a dataset in your history.
+3. [Clean up the format]({% link faqs/galaxy/datasets_working_with_fasta.md %}) with the tool **NormalizeFasta** using the options to wrap sequence lines at 80 bases and to trim the title line at the first whitespace.
+4. Make sure the [chromosome identifiers]({% link faqs/galaxy/datasets_chromosome_identifiers.md %}) are a match for other inputs.
+5. Set a tool form's options to use a custom reference genome from the history and select the loaded genome FASTA.
+6. (Optional) Create a [custom genome build's database]({% link faqs/galaxy/analysis_add_custom_build.md %}) that you can [assign to datasets]({% link faqs/galaxy/datasets_change_dbkey.md %}).
+
+{% icon tip %} TIP 1: Avoid [assigning a native database]({% link faqs/galaxy/datasets_change_dbkey.md %}) to uploaded data unless you have confirmed that the data are based on the [same exact genome assembly]({% link faqs/galaxy/datasets_chromosome_identifiers.md %}) or you have [adjusted the data to be a match]({% link topics/introduction/tutorials/data-manipulation-olympics/tutorial.html %}) **first**!
+
+{% icon tip %} TIP 2: When choosing your reference genome, consider [choosing your reference annotation]({% link faqs/galaxy/analysis_differential_expression_help %}) at the same time. Standardize the format of both as a preparation step. Put the files in a dedicated "reference data" history for easy reuse.
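
For readers who want to see what the clean-up and identifier-check steps in the new FAQ text amount to, here is a minimal Python sketch. It is not the **NormalizeFasta** implementation and is not part of this commit; the function names `normalize_fasta` and `sequence_ids` are hypothetical, chosen only for illustration. It assumes a plain-text FASTA small enough that one sequence at a time fits in memory.

```python
def normalize_fasta(lines, width=80):
    """Yield cleaned FASTA lines.

    Title lines are trimmed at the first whitespace and sequence lines are
    re-wrapped to `width` bases. Illustrative sketch only, not Galaxy's tool.
    """
    seq_parts = []  # pieces of the sequence currently being collected

    def flush():
        # Join the collected pieces and emit them in fixed-width chunks.
        seq = "".join(seq_parts)
        seq_parts.clear()
        for start in range(0, len(seq), width):
            yield seq[start:start + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            yield from flush()  # finish the previous record first
            # Keep only the text before the first whitespace, e.g. ">chr1".
            yield line.split(None, 1)[0]
        else:
            seq_parts.append(line)
    yield from flush()  # emit the final record


def sequence_ids(lines):
    """Return the identifiers (title lines without '>') in file order."""
    return [line[1:].strip() for line in lines if line.startswith(">")]
```

Running `normalize_fasta` over the lines of an uploaded file and then passing the result to `sequence_ids` gives a list of identifiers that can be compared by eye, or with a set difference, against the chromosome names used in other inputs such as a GTF or BED file.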