SRAmetadata

This repository contains code, documentation and presentations on how to get metadata from SRA (Sequence Read Archive).

Getting and cleaning data

The SQLite database of SRA metadata was downloaded from http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz:

wget http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz
--2015-04-17 00:58:29--  http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz
Resolving gbnci.abcc.ncifcrf.gov... 129.43.40.100, 2607:f220:41d:4f4d::92
Connecting to gbnci.abcc.ncifcrf.gov|129.43.40.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 836509538 (798M) [application/x-gzip]
Saving to: “SRAmetadb.sqlite.gz”

100%[======================================================================================================>] 836.509.538 20,3M/s   in 35s     

2015-04-17 00:59:17 (22,8 MB/s) - “SRAmetadb.sqlite.gz” saved [836509538/836509538]

gunzip SRAmetadb.sqlite.gz
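
Once the file is unpacked, it can be useful to check which tables it contains before writing any queries. The following is a minimal sketch in Python using only the standard library; the helper name list_tables is hypothetical and not part of this repository, and it makes no assumption about the table names inside SRAmetadb.sqlite.

```python
import sqlite3


def list_tables(con):
    """Return the sorted names of all tables visible on an open SQLite connection."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

Assuming the unpacked database is in the current directory, `list_tables(sqlite3.connect("SRAmetadb.sqlite"))` would return its table names.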

Create table for metadata and manifest file

The script define_and_get_fields_SRA.R creates a table with relevant metadata from SRA and an associated manifest file. The two files can be linked row by row, because their rows are in the same order.

Output files:

all_illumina_sra_for_human.txt
manifest_file_illumina_sra_human
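
Because the two output files share row order, they can be linked by position rather than by a key. The sketch below illustrates that idea in Python. It is not the repository's own code: the helper names read_lines and pair_rows are hypothetical, and it assumes that both files have the same number of lines (for example, that neither has a header line, or that both do).

```python
def read_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def pair_rows(metadata_rows, manifest_rows):
    """Link metadata and manifest rows by position.

    Raises ValueError if the row counts differ, since positional
    linking is only valid when both files have the same length.
    """
    if len(metadata_rows) != len(manifest_rows):
        raise ValueError(
            f"row count mismatch: {len(metadata_rows)} metadata rows, "
            f"{len(manifest_rows)} manifest rows"
        )
    return list(zip(metadata_rows, manifest_rows))
```

A call such as `pair_rows(read_lines("all_illumina_sra_for_human.txt"), read_lines("manifest_file_illumina_sra_human"))` would then give one (metadata, manifest) pair per run, under the assumptions above.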

Generate random sample

A random sample of 3000 runs was then drawn without replacement using the script sample_manifest_file.R. The script writes two files: the sample itself, and a file that maps the column number from the manifest file to the metadata fields in "sample_size_3000.txt". The random seed used to draw the sample was 42.

Output files:

relationship_manifest_file-sample
sample_size_3000.txt
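
The sampling step can be illustrated with a short Python sketch. This is not a port of sample_manifest_file.R: the function name sample_row_indices is hypothetical, and because Python and R use different random number generators, seed 42 here will not reproduce the rows chosen by the R script. It only shows seeded sampling without replacement.

```python
import random


def sample_row_indices(n_rows, sample_size=3000, seed=42):
    """Draw sample_size distinct row indices from range(n_rows), reproducibly.

    random.sample draws without replacement and raises ValueError
    if sample_size is larger than n_rows.
    """
    rng = random.Random(seed)  # private generator, so global state is untouched
    return sorted(rng.sample(range(n_rows), sample_size))
```

Sorting the indices keeps the sampled rows in the same relative order as the manifest file, which preserves the positional link to the metadata table.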

Notes

The library_name field has information about biological and technical replicates, but extracting it may require manual curation.
