Guide to BLAST databases? #1130
-
I'm looking to BLAST a transcriptome (specifically cbai_transcriptomev2.0) against the whole NCBI database with a taxonomy filter to determine which sequences are Hematodinium and which are C. bairdi. In the mox folder /gscratch/srlab/blastdbs/, there are quite a few existing databases:
UniProtKB_20181008
UniProtKB_20190109
ncbi-nr-20190925
ncbi-nr-20200924
ncbi-nr-nt-20181114
ncbi-nr-nt-v5
ncbi-sp-v5
ncbi-sp-v5_20210224
uniprot_sprot_20200123
I assume I want one of the ncbi-nr or ncbi-sp databases, but I really have no clue which I should use! Is there some kind of guide defining each database?
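In the absence of a written guide, the file extensions inside each folder give a rough hint about what kind of database it holds. The sketch below is only a heuristic: the function name `db_kind` is made up for illustration, and it assumes the usual suffixes (`.dmnd` for DIAMOND databases, `.pin`/`.pal` for NCBI BLAST protein databases, `.nin`/`.nal` for NCBI BLAST nucleotide databases). It does not replace reading each folder's README.

```shell
# Guess the database type(s) in a folder from file extensions.
# Assumption: databases use the standard DIAMOND / NCBI BLAST+ suffixes.
# db_kind is a hypothetical helper name, not an existing tool.
db_kind() {
  local dir="$1" kinds="" f
  for f in "$dir"/*; do
    case "$f" in
      *.dmnd)       kinds="$kinds diamond" ;;
      *.pin|*.pal)  kinds="$kinds blast-protein" ;;
      *.nin|*.nal)  kinds="$kinds blast-nucleotide" ;;
    esac
  done
  if [ -z "$kinds" ]; then
    echo "unknown"
  else
    # One kind per line, de-duplicated, then joined with spaces.
    printf '%s\n' $kinds | sort -u | paste -sd' ' -
  fi
}

# Example (not run here):
#   for d in /gscratch/srlab/blastdbs/*/; do
#     printf '%s\t%s\n' "$d" "$(db_kind "$d")"
#   done
```

A folder reported as `diamond` can only be searched with DIAMOND, while `blast-protein` and `blast-nucleotide` folders are for the NCBI BLAST+ programs.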
Replies: 8 comments 3 replies
-
Do the README files in those folders offer any insight?
-
Which folder? You listed nine folders above.
-
Also, I'd probably lean towards the most recent:
-
Alright, after consulting the README files, these seem like the good candidates:

ncbi-nr-20190925:
Directory containing files for BLASTing against the NCBI nr database. nr FastA file was downloaded 20200924 by SJW from: ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
FILES
ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip
/gscratch/srlab/programs/diamond-2.0.4/diamond makedb --in nr.faa -d nr --taxonmap prot.accession2taxid.gz --taxonnodes nodes.dmp --taxonnames names.dmp
ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz

ncbi-nr-nt-v5:
NCBI non-redundant (nr) nucleotide (nt) database v5. Downloaded 20190105 from: https://ftp.ncbi.nlm.nih.gov/blast/db/v5/

ncbi-sp-v5_20210224:
/gscratch/srlab/blastdbs/ncbi-sp-v5_20210224
NCBI SwissProt V5 database. Specifically for use with NCBI BLAST 2.8.1+ (and higher). Downloaded 20210224 by SJW from: https://ftp.ncbi.nlm.nih.gov/blast/db/v5/swissprot_v5.tar.gz

End of choices

Alright, so of the three, I'm not quite sure which would be optimal. Would it be better to determine taxa by comparing the transcriptome to nucleotide sequences, or should I instead use BLASTx to compare to protein sequences? And if the latter, which database should be used?
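Since the nr README shows a DIAMOND database built with taxonomy files, a translated (blastx-style) search against it could report a taxon ID for each hit. The following is a rough sketch only, not something taken from the README: the function name, the query and output file names, the column list, the e-value cutoff, and the one-hit-per-query limit are all assumptions chosen for illustration.

```shell
# Sketch: DIAMOND blastx of a nucleotide transcriptome against a
# taxonomy-aware protein database (one built with --taxonmap etc.).
# Assumptions: file names, column list, and cutoffs are placeholders.

DIAMOND="${DIAMOND:-diamond}"   # path to the diamond binary

# Usage: diamond_blastx_tax <db_prefix> <query_fasta> <out_tsv>
diamond_blastx_tax() {
  local db="$1" query="$2" out="$3"
  # staxids adds the subject taxon ID(s) as the last output column;
  # it needs a database that was built with taxonomy mapping.
  "$DIAMOND" blastx \
    --db "$db" \
    --query "$query" \
    --out "$out" \
    --outfmt 6 qseqid sseqid pident evalue bitscore staxids \
    --evalue 1e-5 \
    --max-target-seqs 1
}

# Example (not run here; the FASTA file name is hypothetical):
#   diamond_blastx_tax nr cbai_transcriptomev2.0.fasta cbai_vs_nr.tsv
```

DIAMOND also has a `--taxonlist` option for restricting a search to given taxon IDs, which may be closer to a true taxonomy filter than sorting hits afterwards.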
-
Answered today during class - I'll be using ncbi-sp-v5_20210224!
-
For documentation purposes, note that NCBI has nothing to do with Swiss-Prot, so the labelling is confusing.
On Thu, Mar 4, 2021 at 1:09 PM afcoyle wrote:
Answered today during class - I'll be using ncbi-sp-v5_20210224!
-
Sort of. It's a SwissProt BLAST library supplied by NCBI.
-
Hmm, alright, gotcha. So my goal is to BLAST Transcriptome v2.0 against the entire NCBI database. In that case, should I BLAST against ncbi-nr-20190925 (most recent, though I'm unsure whether a standard BLAST is possible, given that the README indicates it's a DIAMOND BLAST database) or the NCBI non-redundant nucleotide database (downloaded 2019-01-05)? Or does the Swiss-Prot database still fit the purposes here?
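Whichever database ends up being used, the taxonomy step could be done after the search by sorting queries on the taxon ID column of tabular output. Here is a minimal sketch under several assumptions: the hits file is tab-separated, one column holds taxon IDs (separated by semicolons when a hit maps to more than one), and a plain-text file lists the taxon IDs of interest, one per line. The function name is hypothetical, and the real taxon IDs for Hematodinium and C. bairdi would need to be looked up in NCBI Taxonomy.

```shell
# Print the unique query IDs whose hits carry any taxon ID from a list.
# Assumptions: tab-separated hits, query ID in column 1, taxon IDs in
# the given column (semicolon-separated when there are several).
# Usage: queries_for_taxids <hits.tsv> <taxid_column> <taxid_list.txt>
queries_for_taxids() {
  local hits="$1" col="$2" taxids="$3"
  awk -F'\t' -v col="$col" '
    # First file: remember every wanted taxon ID.
    FILENAME == ARGV[1] { if ($1 != "") wanted[$1] = 1; next }
    # Second file: report each matching query once, in order of appearance.
    {
      n = split($col, ids, ";")
      for (i = 1; i <= n; i++) {
        if (ids[i] in wanted) {
          if (!seen[$1]++) print $1
          break
        }
      }
    }
  ' "$taxids" "$hits"
}

# Example (not run here; file names are hypothetical):
#   queries_for_taxids cbai_vs_nr.tsv 6 hematodinium_taxids.txt > hemat_ids.txt
```

Running it once per taxon list gives one set of query IDs per organism. Queries that appear in neither set had no hit to either group at the chosen cutoffs.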