Genes annotation ranges tampered after project creation #2252

Open
Baboon61 opened this issue Jan 8, 2025 · 3 comments
Labels
bug Something isn't working

Comments

Baboon61 commented Jan 8, 2025

The GRanges coordinates of a handful of genes are modified after project creation. Example: CBS (gene_id=875 in TxDb.Hsapiens.UCSC.hg38.knownGene); this gene does not fall into a blacklist region.

ArchR version : ArchR_1.0.2

How to reproduce :

library(ArchR)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
txdb_genes <- genes(txdb)
txdb_genes[txdb_genes$gene_id==875,] #chr21:43053191-43076943
test_genes <- createGeneAnnotation(TxDb = TxDb.Hsapiens.UCSC.hg38.knownGene, OrgDb = org.Hs.eg.db)
test_genes$genes[test_genes$genes$gene_id==875] #chr21:43053191-43076943
addArchRGenome("hg38")
test_arrow <- createArrowFiles(
  inputFiles = fragpath,
  sampleNames = "test",
  geneAnnotation = getGeneAnnotation(),
  genomeAnnotation = getGenomeAnnotation(),
  minTSS = 0,
  minFrags = 0,
  maxFrags = 1e+10,
  addTileMat = FALSE,
  addGeneScoreMat = FALSE,
  offsetPlus = 0,
  offsetMinus = 0,
  force = TRUE
)
test_proj <- ArchRProject(
  ArrowFiles = test_arrow,
  outputDirectory = "my_path")
getGeneAnnotation(test_proj)$genes[getGeneAnnotation(test_proj)$genes$gene_id==875,] #chr21:6444869-43076943
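
The single-gene check above can be extended to every gene. The following is a minimal sketch, not part of ArchR: `diff_gene_ranges` is a hypothetical helper, and it assumes both annotations have been converted to data frames with `gene_id`, `seqnames`, `start` and `end` columns (for example with `as.data.frame()` on a GRanges of genes). The toy data reuse the CBS coordinates reported above; the second gene is made up.

```r
# Hypothetical helper: list genes whose coordinates differ between two annotations.
# Both inputs are data frames with columns gene_id, seqnames, start, end.
diff_gene_ranges <- function(ref, other) {
  merged <- merge(ref, other, by = "gene_id", suffixes = c(".ref", ".other"))
  changed <- merged$start.ref != merged$start.other |
    merged$end.ref != merged$end.other |
    as.character(merged$seqnames.ref) != as.character(merged$seqnames.other)
  merged[changed, , drop = FALSE]
}

# Toy data: CBS coordinates from this issue plus one made-up gene.
ref <- data.frame(
  gene_id = c("875", "1"),
  seqnames = c("chr21", "chr19"),
  start = c(43053191, 100),
  end = c(43076943, 200)
)
other <- ref
other$start[other$gene_id == "875"] <- 6444869

mismatches <- diff_gene_ranges(ref, other)
print(mismatches[, c("gene_id", "start.ref", "start.other")])
```

With the real objects, the call would presumably look like `diff_gene_ranges(as.data.frame(txdb_genes), as.data.frame(getGeneAnnotation(test_proj)$genes))`, assuming both carry a `gene_id` column.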
@Baboon61 Baboon61 added the bug Something isn't working label Jan 8, 2025
Collaborator

rcorces commented Jan 8, 2025

Hi @Baboon61! Thanks for using ArchR! Lately, it has been very challenging for me to keep up with maintenance of this package and all of my other
responsibilities as a PI. I have not been responding to issue posts and I have not been pushing updates to the software. We are actively searching to hire
a computational biologist to continue to develop and maintain ArchR and related tools. If you know someone who might be a good fit, please let us know!
In the meantime, your issue will likely go without a reply. Most issues with ArchR right now relate to compatibility. Try reverting to R 4.1 and Bioconductor 3.15.
Newer versions of Seurat and Matrix also are causing issues. Sorry for not being able to provide active support for this package at this time.

@Baboon61
Author

I have tracked down the cause, in case it helps.

Note

TL;DR: ArchR loads a TxDb.Hsapiens.UCSC.hg38.knownGene annotation from about 6 years ago, stored in /data/ of the ArchR repository, which included some abnormal gene body lengths. These have been fixed in the latest releases of that TxDb package.

When using addArchRGenome("hg38"), the gene annotation is not extracted from the latest version of TxDb.Hsapiens.UCSC.hg38.knownGene, as I thought it would be. It is actually loaded from an annotation stored in /data/ of the ArchR GitHub repository (probably built with Bioconductor 3.9, from May 2019, with TxDb.Hsapiens.UCSC.hg38.knownGene==3.4.6).

This is true for all genomes supported by ArchR.

return(eval(parse(text=geneAnno)))
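
To illustrate why that line returns a frozen snapshot rather than the installed TxDb, here is a minimal base-R sketch of a name-based lookup. It is an assumption about the general pattern, not ArchR's actual code: `geneAnnoHg38` is a stand-in object with made-up contents, and `lookup_bundled_annotation` is a hypothetical function.

```r
# Stand-in for a data object shipped with a package (hypothetical contents).
geneAnnoHg38 <- list(source = "snapshot bundled with the package")

# Hypothetical lookup mimicking the eval(parse(text = ...)) pattern:
# the genome name is turned into an object name, and that object is returned as-is.
lookup_bundled_annotation <- function(genome) {
  genome_name <- paste0(toupper(substr(genome, 1, 1)), substr(genome, 2, nchar(genome)))
  gene_anno <- paste0("geneAnno", genome_name)
  eval(parse(text = gene_anno))
}

anno <- lookup_bundled_annotation("hg38")
print(anno$source)
```

Because the object is found by name, whatever was saved when the package data were built is what comes back, regardless of which TxDb version is currently installed.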

Tip

Load an updated version of gene annotation or your own annotation

First unload the genome from the global environment

options(ArchR.genome = NULL)

Load a recent version of TxDb.Hsapiens.UCSC.hg38.knownGene, org.Hs.eg.db and BSgenome.Hsapiens.UCSC.hg38, or use a GRanges built from a GTF file.

library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)
library(BSgenome.Hsapiens.UCSC.hg38)
geneAnnotation <- createGeneAnnotation(TxDb = TxDb.Hsapiens.UCSC.hg38.knownGene, OrgDb = org.Hs.eg.db)
genomeAnnotation <- createGenomeAnnotation(genome = BSgenome.Hsapiens.UCSC.hg38)

Create a new Arrow file from the geneAnnotation and genomeAnnotation.

ArrowFiles <- createArrowFiles(
  inputFiles = fragpath,
  sampleNames = "test",
  validBarcodes = cell_in_tissue,
  geneAnnotation = geneAnnotation,
  genomeAnnotation = genomeAnnotation,
  minTSS = 0,
  minFrags = 0,
  maxFrags = 1e+10,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE,
  offsetPlus = 0,
  offsetMinus = 0,
  TileMatParams = list(tileSize = 5000),
  force = TRUE
)

Create a new project.

proj_normalCreate <- ArchRProject(
  ArrowFiles = ArrowFiles,
  outputDirectory = "my_dir",
  geneAnnotation = geneAnnotation,
  genomeAnnotation = genomeAnnotation,
  copyArrows = TRUE
)

@immanuelazn
Collaborator

Thanks, I'll try to update this in the near future.
