
peak annotation towards intergenic and intron #20

Open
deep-buddingcoder opened this issue May 11, 2023 · 3 comments
Labels
question Further information is requested

Comments

@deep-buddingcoder

Hi,

This is not a technical issue but a conceptual question.

Until now, I have been using HOMER for ChIP-seq and ATAC-seq peak annotation. As I intend to perform TOBIAS DNA footprinting analysis, I have decided to generate the peak annotation file (required for TOBIAS BINDetect) using UROPA. For this, I plan to use the NCBI RefSeq GTF file (which HOMER also uses), available here:
http://hgdownload.soe.ucsc.edu/goldenPath/archive/hg38/ncbiRefSeq/000001405.40-RS_2023_03/

Column 3 of the GTF does not contain any promoter, intron, intergenic, or intergenic_CNS (conserved non-coding sequence) features.

Yet HOMER manages to produce detailed annotations for promoters, introns, intergenic regions, different types of RNA, etc.

Will UROPA also perform detailed annotation using NCBI RefSeq, or am I working with an incorrect GTF file?

Thanks in advance for your help.
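One quick way to see what a GTF actually provides is to tabulate the distinct feature types in column 3. The sketch below does this with standard coreutils; the helper name `gtf_feature_types` and the three toy records are hypothetical, standing in for the real RefSeq file linked above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each distinct feature type found in column 3
# of a GTF file, together with how often it occurs. Header lines starting
# with '#' are skipped.
gtf_feature_types() {
    grep -v '^#' "$1" | cut -f3 | sort | uniq -c
}

# Toy GTF with three hypothetical, tab-separated records.
toy_gtf=$(mktemp)
printf 'chr1\tncbiRefSeq\ttranscript\t1000\t5000\t.\t+\t.\tgene_id "GENE1";\n' >  "$toy_gtf"
printf 'chr1\tncbiRefSeq\texon\t1000\t1200\t.\t+\t.\tgene_id "GENE1";\n'       >> "$toy_gtf"
printf 'chr1\tncbiRefSeq\texon\t4800\t5000\t.\t+\t.\tgene_id "GENE1";\n'       >> "$toy_gtf"

gtf_feature_types "$toy_gtf"
rm -f "$toy_gtf"
```

On a file like this, only `exon` and `transcript` are listed: promoters, introns and intergenic regions are never explicit rows, which matches what the question observes for the RefSeq GTF.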

@msbentsen
Member

Hi @deep-buddingcoder,

The .gtf file you are using looks fine. UROPA does not automatically annotate promoters etc., but you can set up specific queries for them, as in the example config file here: https://github.com/loosolab/UROPA/blob/master/sample_config.json. The example shows promoters, forward exons, and levels, but you can add introns, intergenic regions, etc. as well, depending on the setup.

In other words, the promoter information is not given in the third column; you define it yourself in the UROPA run. Hope that makes sense.
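To make the point concrete, a "promoter" can simply be a query on gene starts. The sketch below writes such a config from the shell. The helper `write_promoter_config`, the output name `uropa_promoter.json` and `peaks.bed` are hypothetical placeholders; the key names are copied from the config posted later in this thread, so check them against the UROPA documentation for your installed version. It also assumes UROPA reads a two-value `distance` as [upstream, downstream] around the anchor.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a minimal UROPA config in which "promoter" is
# defined by the query itself, i.e. peaks near the start of a gene feature.
# $gtf and $bed are expanded into the JSON by the unquoted heredoc.
write_promoter_config() {
    local gtf=$1 bed=$2 out=$3
    cat > "$out" <<EOF
{
    "queries": [
        {"name": "promoter", "feature": "gene", "feature.anchor": "start",
         "distance": [1000, 100], "internals": "True"}
    ],
    "gtf": "$gtf",
    "bed": "$bed"
}
EOF
}

write_promoter_config hg38.000001405.40-RS_2023_03.ncbiRefSeq.gtf peaks.bed uropa_promoter.json
cat uropa_promoter.json
```

The same pattern extends to other categories: each extra query entry names a feature type and a distance window, and UROPA reports the query `name` for the peaks it matches.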

@msbentsen msbentsen added the question Further information is requested label May 16, 2023
@deep-buddingcoder
Author

Thanks for the suggestion. I definitely missed this bit of information about the config file structure. I will work on it and then update the status of this issue.

@samuelruizperez

samuelruizperez commented Oct 14, 2023

For introns and intergenic regions, you could also first add these features to the annotation with AGAT:

```shell
# Add intron features between the exons of each transcript
agat_sp_add_introns.pl \
    -f hg38.000001405.40-RS_2023_03.ncbiRefSeq.gtf \
    --out hg38.000001405.40-RS_2023_03.ncbiRefSeq.wIntrons.gff3

# Add intergenic_region features between genes
agat_sp_add_intergenic_regions.pl \
    -f hg38.000001405.40-RS_2023_03.ncbiRefSeq.wIntrons.gff3 \
    --out hg38.000001405.40-RS_2023_03.ncbiRefSeq.wIntrons.wIntergenic.gff3

# Merge main annotation with other features (promoter, enhancer, RNA annotations, etc.);
# hg38.enhancers.gtf and hg38.rnas.gff stand for whatever extra annotation files you have
agat_sp_merge_annotations.pl \
    -f hg38.000001405.40-RS_2023_03.ncbiRefSeq.wIntrons.wIntergenic.gff3 \
    -f hg38.enhancers.gtf \
    -f hg38.rnas.gff \
    --out hg38.merged.gff3

# Convert back to GTF, drop header lines, and sort by chromosome, then start
agat_convert_sp_gff2gtf.pl \
    --gff hg38.merged.gff3 \
    --gtf_version relax \
    --out hg38.merged.gtf
grep -v "^#" hg38.merged.gtf | sort -t $'\t' -k1,1 -k4,4n \
    > hg38.merged.sorted.gtf
```
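Before pointing UROPA at the merged file, it can be worth confirming that the feature types the queries depend on are actually present in column 3. The helper `require_features` below is a hypothetical sketch, and the single toy record stands in for the real AGAT output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sanity check: print every requested feature type that does
# not occur in column 3 of the GTF, and return 1 if any are missing.
require_features() {
    local gtf=$1; shift
    local missing=0 feature
    for feature in "$@"; do
        # awk stops at the first matching row; END exits 0 only if one was found.
        if ! awk -F'\t' -v f="$feature" '$3 == f { found = 1; exit } END { exit !found }' "$gtf"; then
            echo "missing feature type: $feature" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Toy file containing introns but no intergenic regions.
demo=$(mktemp)
printf 'chr1\tAGAT\tintron\t200\t299\t.\t+\t.\tgene_id "GENE1";\n' > "$demo"
if require_features "$demo" intron intergenic_region; then
    echo "all required feature types present"
else
    echo "fix the annotation before running UROPA"
fi
rm -f "$demo"
```

On the real file this would be called as `require_features hg38.merged.sorted.gtf intron intergenic_region exon CDS`, listing whichever feature types the config queries use.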

And then use intron and intergenic_region (or other merged features) as independent features in the uropa_config.json file:

```json
{
    "queries":[
        {"name": "inferred_TSS_promoter", "feature":"gene", "feature.anchor": "start", "distance":[1000,100], "internals":"True", "direction":"upstream"},
        {"name": "inferred_TTS", "feature":"gene", "feature.anchor": "end", "distance":[100,1000], "internals":"True", "direction":"downstream"},
        {"name": "cds", "feature":"CDS", "distance":[1,1], "internals":"True"},
        {"name": "five_prime_utr", "feature":"five_prime_UTR", "distance":[1,1], "internals":"True"},
        {"name": "three_prime_utr", "feature":"three_prime_UTR", "distance":[1,1], "internals":"True"},
        {"name": "exonic", "feature":"exon", "distance":[1,1], "internals":"True"},
        {"name": "intronic", "feature":"intron", "distance":[1,1], "internals":"True"},
        {"name": "intergenic", "feature":"intergenic_region", "distance":[1,1], "internals":"True"}
    ],
    "priority": "True",
    "gtf": "hg38.merged.sorted.gtf",
    "bed": "your.bed"
}
```
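The `"priority": "True"` setting is what makes the order of these queries matter. As I understand the UROPA documentation (verify for your version), with priority enabled a peak is annotated by the first query in the list that finds a hit, so a peak hitting both the promoter query and an intron is reported as `inferred_TSS_promoter`. The sketch below is a simplified model of that rule, not UROPA's code; the helper `pick_by_priority` and its peak/query input format are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simplified model of "priority": "True": read hypothetical
# "peak<TAB>query_name" hit pairs on stdin and keep, for each peak, the hit
# whose query appears earliest in the configured order.
pick_by_priority() {
    local order=$1    # comma-separated query names, highest priority first
    awk -F'\t' -v order="$order" '
        BEGIN { n = split(order, q, ","); for (i = 1; i <= n; i++) rank[q[i]] = i }
        ($2 in rank) && (!($1 in best) || rank[$2] < rank[best[$1]]) { best[$1] = $2 }
        END { for (p in best) print p "\t" best[p] }
    ' | sort
}

printf 'peak1\tintronic\npeak1\tinferred_TSS_promoter\npeak2\tintergenic\npeak3\texonic\npeak3\tintronic\n' |
    pick_by_priority "inferred_TSS_promoter,inferred_TTS,cds,five_prime_utr,three_prime_utr,exonic,intronic,intergenic"
```

With this ordering, peak1 resolves to the promoter query and peak3 to `exonic`. Placing `intergenic` last means it only labels peaks that no earlier query claimed.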
