table_annovar vs annotate_variation output #269

Open
sup3rgiu opened this issue Dec 17, 2024 · 3 comments
@sup3rgiu

Hi.

I'm using this sample file as input (let's call it to_annotate.avinput):

3	15295364	15295364	G	A

annotate_variation.pl

When I use annotate_variation:

perl $ANNOVAR_DIR/annotate_variation.pl \
		to_annotate.avinput $ANNOVAR_DIR/humandb/ \
		-build hg19 \
		-out output \
		--geneanno -dbtype refGene

the output is the following:

upstream;downstream	SH3BP5-AS1(dist=327);CAPN7,SH3BP5(dist=327)	3	15295364	15295364	G	A

table_annovar.pl

When I use table_annovar instead:

perl $ANNOVAR_DIR/table_annovar.pl \
		to_annotate.avinput $ANNOVAR_DIR/humandb/ \
		-buildver hg19 \
		-out output  \
		-remove \
		-protocol refGene \
		-operation g \
		-nastring .	

the output is the following:

Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	ExonicFunc.refGene	AAChange.refGene
3	15295364	15295364	G	A	upstream;downstream	SH3BP5-AS1;CAPN7;SH3BP5	dist=327;dist=327	.	.

Issue

The issue is that when

  • annotate_variation is used, genes are separated by both , and ; --> SH3BP5-AS1(dist=327);CAPN7,SH3BP5(dist=327)
  • table_annovar is used, genes are separated only by ; --> SH3BP5-AS1;CAPN7;SH3BP5

This prevents me from correctly assigning the genes to the functions when using table_annovar (which is the recommended command).
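To make the lost information concrete, here is a minimal Python sketch of how the two annotate_variation columns can be paired. It assumes that `;` separates one group per function and `,` separates genes within a group, which is what the example above suggests but is not confirmed by the ANNOVAR documentation. The helper names `split_top_level` and `genes_by_function` are hypothetical and not part of ANNOVAR.

```python
import re


def split_top_level(text, sep):
    """Split text on sep, ignoring separators inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def genes_by_function(func_field, gene_field):
    """Pair each function with its genes (assumed layout, see lead-in)."""
    functions = func_field.split(";")
    groups = split_top_level(gene_field, ";")
    if len(functions) != len(groups):
        raise ValueError("function and gene groups do not line up")
    result = []
    for function, group in zip(functions, groups):
        # Drop the trailing "(...)" detail to keep only the gene symbol.
        genes = [re.sub(r"\(.*\)$", "", g) for g in split_top_level(group, ",")]
        result.append((function, genes))
    return result


print(genes_by_function(
    "upstream;downstream",
    "SH3BP5-AS1(dist=327);CAPN7,SH3BP5(dist=327)",
))
```

The same pairing cannot be recovered from the table_annovar columns, because `SH3BP5-AS1;CAPN7;SH3BP5` no longer marks where one function's genes end and the next begin.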

@kaichop
Contributor

kaichop commented Dec 17, 2024 via email

@sup3rgiu
Author

@kaichop thanks a lot! Let me know if you need some help with testing 🙂

@sup3rgiu
Author

sup3rgiu commented Dec 18, 2024

I've looked at the code.

Inside the geneOperation() function of the table_annovar.pl file, we have (line 508 of revision c66762679205bdc00c64e465ac6ccd8f62132e2f, date 2020-06-08 00:46:07 -0400 (Mon, 8 Jun 2020)):

while (<FUNCTION>)
{
	s/[\r\n]+$//;
	m/^([^\t]+)\t([^\t]+)\t(\S+\s+\S+\s+\S+\s+\S+\s+\S+).*/ or die "Error: invalid record found in annovar outputfile: <$_>\n";		#example: splicing        KLK12(NM_019598:exon4:c.457+6T>C,NM_145894:exon4:c.457+6T>C,NM_001370125:exon4:c.457+6T>C)
	my ($function, $gene, $varstring) = ($1, $2, $3);
	my $spliceanno = '';
	$varstring =~ s/\s+/\t/g;
	while ($gene =~ m/\(([^)]+)\)/g) {
		$spliceanno .= "$1;";
	}
	chop $spliceanno if $gene =~ s/\(([^)]+)\)//g;
	$spliceanno =~ tr/,/;/;
	$gene =~ tr/,/;/;

The problem seems to be easily solved by commenting out the last line:

$gene =~ tr/,/;/;

This works for my purposes, but I don't know if it breaks anything else.
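For anyone who wants to check the effect without patching the Perl, the relevant lines can be mimicked in Python. This is only a sketch of the excerpt quoted above, not of the whole geneOperation() function. The function name `gene_operation_fields` and the `keep_commas` switch are hypothetical; `keep_commas=True` stands in for commenting out the last `tr` line.

```python
import re


def gene_operation_fields(gene, keep_commas=False):
    """Mimic the quoted excerpt: return (gene_field, detail_field)."""
    # Collect every parenthesised detail, joined with ';'
    # (the Perl appends "$1;" per match and then chops the final ';').
    details = re.findall(r"\(([^)]+)\)", gene)
    detail_field = ";".join(details).replace(",", ";")  # $spliceanno =~ tr/,/;/
    # Remove the parenthesised details from the gene string.
    gene_field = re.sub(r"\([^)]+\)", "", gene)
    if not keep_commas:
        gene_field = gene_field.replace(",", ";")       # $gene =~ tr/,/;/
    return gene_field, detail_field


raw = "SH3BP5-AS1(dist=327);CAPN7,SH3BP5(dist=327)"
print(gene_operation_fields(raw))                    # current behaviour
print(gene_operation_fields(raw, keep_commas=True))  # with the tr line disabled
```

With the default, the gene field matches the table_annovar output shown earlier (`SH3BP5-AS1;CAPN7;SH3BP5`). With `keep_commas=True` it stays `SH3BP5-AS1;CAPN7,SH3BP5`, so the grouping by function is preserved.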

However, there is a similar problem for the GeneDetail column, since:

  • annotate_variation: SH3BP5-AS1(dist=327);CAPN7,SH3BP5(dist=327)
  • table_annovar: dist=327;dist=327

So, with table_annovar I can't tell which gene each dist refers to.
This is due to the while loop:

while ($gene =~ m/\(([^)]+)\)/g) {
    $spliceanno .= "$1;";
}

but this time the fix needs some formatting logic.
A possible solution would be to emit an empty value for genes without details:
dist=327;;dist=327 (note the double ;; since there is no detail for the middle gene, CAPN7).
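The empty-slot idea can be sketched in Python as follows. This only illustrates the proposed format and is not how table_annovar behaves today. It assumes a detail belongs only to the gene it is directly attached to. The function name `aligned_gene_details` is hypothetical.

```python
import re

# A gene symbol, optionally followed by one parenthesised detail.
GENE_TOKEN = re.compile(r"([^;,()]+)(?:\(([^)]*)\))?")


def aligned_gene_details(gene):
    """Return (genes, details) with exactly one detail slot per gene."""
    genes, details = [], []
    for match in GENE_TOKEN.finditer(gene):
        genes.append(match.group(1))
        # Genes without "(...)" get an empty slot so positions still line up.
        details.append(match.group(2) or "")
    return ";".join(genes), ";".join(details)


print(aligned_gene_details("SH3BP5-AS1(dist=327);CAPN7,SH3BP5(dist=327)"))
```

For the example variant this yields `SH3BP5-AS1;CAPN7;SH3BP5` and `dist=327;;dist=327`, so the n-th detail always belongs to the n-th gene. Commas inside a detail (as in the splicing example in the Perl comment) are left untouched here. Converting them to `;` would break the one-slot-per-gene alignment.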
