-
One thing I would be interested in is a head-to-head comparison of BLAST and DIAMOND BLAST (same query/db/settings).
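A rough sketch of how the outputs of such a head-to-head run could be compared, once both tools have been run on the same query and database. It reports how many query IDs have a hit in both files versus in only one of them. It assumes tab-delimited (outfmt 6 style) output with the query ID in column 1; the function name `compare_hit_queries` is hypothetical.

```shell
# Compare which queries got at least one hit in two tabular result files.
# Assumption: both files are tab-delimited with the query ID in column 1.
compare_hit_queries() (
  # Use one fixed locale so sort and comm agree on ordering.
  export LC_ALL=C
  first="$1"
  second="$2"
  tmpdir=$(mktemp -d)

  # Unique query IDs that have at least one hit in each file.
  cut -f1 "$first"  | sort -u > "$tmpdir/first.ids"
  cut -f1 "$second" | sort -u > "$tmpdir/second.ids"

  # comm -12: IDs in both files; -23: only in first; -13: only in second.
  shared=$(comm -12 "$tmpdir/first.ids" "$tmpdir/second.ids" | awk 'END { print NR }')
  only_first=$(comm -23 "$tmpdir/first.ids" "$tmpdir/second.ids" | awk 'END { print NR }')
  only_second=$(comm -13 "$tmpdir/first.ids" "$tmpdir/second.ids" | awk 'END { print NR }')

  rm -rf "$tmpdir"
  printf 'shared=%s\tonly_first=%s\tonly_second=%s\n' "$shared" "$only_first" "$only_second"
)
```

Usage would look something like `compare_hit_queries blast.tab diamond.tab`, where both file names are placeholders for whatever the two runs produce.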
-
Personally, I'm happy to see similar results between the two when using BLASTx, but it doesn't clear up why there's such a large difference between the BLASTp and BLASTx results - which is what prompted this discussion. I still find that difference surprising.
-
I'll start out by qualifying this post as possibly being moot from the start, as the following is definitely not an "apples to apples" comparison. Okay, with that out of the way...
Recently, Steven and I took two different approaches to annotating (via BLASTing) the C. virginica genome, and we got very different numbers of BLAST matches.
@sr320's approach (Notebook)
grep -c "^>" GCF_002022765.2_C_virginica-3.0_translated_cds.faa
grep -o '\[gene=[^]]*\]' GCF_002022765.2_C_virginica-3.0_translated_cds.faa | grep -oP '(?<=\[gene=)[^]]*' | sort -u | wc -l
Results:
BLAST
wc -l Cvir_transcds-uniprot_blastp.tab
awk -F "|" '{print $3}' Cvir_transcds-uniprot_blastp.tab | sort -u | wc -l
Gene Ontology
Sam's approach
Extract gene sequences as FastA (Notebook).
wc -l C_virginica-3.0_Gnomon_genes.bed
Use reviewed UniProt proteins FastA as BLAST database (uniprot_sprot.fasta.gz).
Use DIAMOND to make DIAMOND BLAST database.
Use DIAMOND BLASTx to BLAST. (Notebook)
Results:
BLAST
wc -l GCF_002022765.2_C_virginica-3.0-genes.blastx.outfmt6
awk -F "|" '{print $2}' GCF_002022765.2_C_virginica-3.0-genes.blastx.outfmt6 | sort -u | wc -l
Sooo, what do people think about this? Anyone have any thoughts/opinions? The E-value cutoffs for the two approaches were 1e-20 and 1e-25, respectively.
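Since `wc -l` counts alignment rows, which only equals the number of queries with a hit when each query reports a single row, one way to put both result files on the same footing might be to count unique query IDs and unique subject accessions per file. A minimal sketch, assuming both files are tab-delimited (outfmt 6 style) with the query ID in column 1 and subject IDs of the form `sp|ACCESSION|ENTRY_NAME` in column 2; the function name `summarize_hits` is hypothetical.

```shell
# Summarize one tabular BLAST/DIAMOND result file.
# Assumptions: tab-delimited; column 1 = query ID;
# column 2 = subject ID such as sp|ACCESSION|ENTRY_NAME.
summarize_hits() (
  tab="$1"

  # Total alignment rows (the same number `wc -l` reports).
  rows=$(awk 'END { print NR }' "$tab")

  # Unique queries that have at least one hit.
  queries=$(cut -f1 "$tab" | sort -u | awk 'END { print NR }')

  # Unique subject accessions: take column 2 only, then the field
  # between the pipes (fall back to the whole ID if there are no pipes).
  accessions=$(cut -f2 "$tab" \
    | awk -F '|' '{ print (NF >= 2 ? $2 : $1) }' \
    | sort -u | awk 'END { print NR }')

  printf '%s\trows=%s\tqueries_with_hit=%s\tunique_accessions=%s\n' \
    "$tab" "$rows" "$queries" "$accessions"
)
```

Running `summarize_hits` on `Cvir_transcds-uniprot_blastp.tab` and on `GCF_002022765.2_C_virginica-3.0-genes.blastx.outfmt6` would give the same three counts for each approach, which may make the difference easier to interpret than raw line counts.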