pubmed_download

This is a clone of the original repository from the ONTOX GitHub page, created for portfolio visibility. Find the original GitHub page here.


PubMed makes its entire database publicly available. The data can be downloaded over the FTP protocol as a set of files that is updated annually (baseline files) or daily (update files). This repository holds a script that downloads this data dump quickly by running several downloads in parallel.

Usage

The script pubmed_downloader_ftp.py implements a command-line interface (CLI). To see how to use it, run it with the --help flag:

python pubmed_downloader_ftp.py --help
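The script's exact --help text is not reproduced here. As a rough illustration only, a parser that accepts the five positional arguments used in run_ftp.sh could be built with argparse as below. The parameter name out_dir and its meaning (the local directory the files are saved to, 'tmp' in the example) are assumptions; host, host_wd, reg_ex and threads follow the argument names this README uses. The real script's parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the arguments described in this README."""
    parser = argparse.ArgumentParser(
        description="Download files from an FTP directory in parallel."
    )
    # Assumption: the first positional argument ('tmp' in run_ftp.sh) is the
    # local directory the downloaded files are written to.
    parser.add_argument("out_dir", help="local directory to save the files in")
    parser.add_argument("host", help="FTP host, e.g. ftp.ncbi.nlm.nih.gov")
    parser.add_argument("host_wd", help="directory on the host, e.g. /pubmed/baseline/")
    parser.add_argument("reg_ex", help="regular expression that selects file names")
    parser.add_argument("threads", type=int, help="number of parallel downloads")
    return parser


# Parse the arguments from the run_ftp.sh example instead of sys.argv.
args = build_parser().parse_args(
    ["tmp", "ftp.ncbi.nlm.nih.gov", "/pubmed/baseline/", r"pubmed23n\d{4}\.xml\.gz$", "4"]
)
print(args.host, args.threads)
```

Because threads is declared with type=int, argparse rejects non-numeric values before any download starts.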

This prints a usage message that lists the script's arguments.

In short, the script connects to the FTP host given in the host argument and changes into the directory given in the host_wd argument. It then filters the files found there with the regular expression given in the reg_ex argument and downloads the matching files in parallel. The threads argument sets how many downloads run in parallel.

WARNING: too many parallel downloads might get your IP address blocked by the National Center for Biotechnology Information (NCBI). Please check the NCBI terms and conditions.
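The steps above can be sketched with Python's standard library. This is not the code of pubmed_downloader_ftp.py: it is a minimal sketch assuming anonymous FTP access, ftplib for the transfers and a concurrent.futures.ThreadPoolExecutor whose max_workers plays the role of the threads argument. The function names and the out_dir parameter are hypothetical. Each download opens its own FTP connection, because a single ftplib.FTP control connection can only carry one transfer at a time.

```python
import ftplib
import os
import re
from concurrent.futures import ThreadPoolExecutor


def filter_names(names, reg_ex):
    """Keep only the file names matched by the regular expression (sorted)."""
    pattern = re.compile(reg_ex)
    return sorted(name for name in names if pattern.search(name))


def list_remote(host, host_wd):
    """Return the entry names in host_wd on an FTP host (anonymous login)."""
    with ftplib.FTP(host) as ftp:
        ftp.login()  # no arguments: log in as 'anonymous'
        ftp.cwd(host_wd)
        return ftp.nlst()


def download_one(host, host_wd, name, out_dir):
    """Download one file over its own FTP connection and return its local path."""
    target = os.path.join(out_dir, name)
    with ftplib.FTP(host) as ftp:
        ftp.login()
        ftp.cwd(host_wd)
        with open(target, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
    return target


def download_all(out_dir, host, host_wd, reg_ex, threads):
    """Filter the remote listing and download the matches with `threads` workers."""
    os.makedirs(out_dir, exist_ok=True)
    names = filter_names(list_remote(host, host_wd), reg_ex)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(download_one, host, host_wd, n, out_dir) for n in names]
        # result() re-raises any exception that occurred in a worker thread
        return [f.result() for f in futures]
```

With the values from run_ftp.sh, the call would be download_all("tmp", "ftp.ncbi.nlm.nih.gov", "/pubmed/baseline/", r"pubmed23n\d{4}\.xml\.gz$", 4). The sketch neither retries failed transfers nor removes partially written files, and keeping the worker count small respects the warning above.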

An example of the CLI being used can be found in run_ftp.sh.

python pubmed_downloader_ftp.py 'tmp' 'ftp.ncbi.nlm.nih.gov' '/pubmed/baseline/' 'pubmed23n\d{4}\.xml\.gz$' 4
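In this example, \d{4} matches the four-digit file number and the trailing $ rejects names that continue after .xml.gz, such as a companion .md5 checksum file. The single quotes stop the shell from touching the backslashes. The prefix pubmed23 ties the pattern to one baseline year, so a hypothetical helper like baseline_pattern below (not part of this repository) could build the pattern for another year:

```python
import re


def baseline_pattern(yy: int) -> str:
    """Hypothetical helper: regex for baseline archives named pubmed<yy>n<NNNN>.xml.gz."""
    # {yy:02d} is formatted; {{4}} becomes a literal {4} quantifier
    return rf"pubmed{yy:02d}n\d{{4}}\.xml\.gz$"


pattern = re.compile(baseline_pattern(23))
for name in ["pubmed23n0001.xml.gz", "pubmed23n0001.xml.gz.md5", "pubmed22n0001.xml.gz"]:
    # search() mirrors filtering a directory listing; $ rejects the .md5 name
    print(name, bool(pattern.search(name)))
```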

Problems

pubmed_downloader.py is an implementation of the PubMed downloader that does not use FTP. For reasons that are still unclear, this implementation skipped some files.
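One way to check whether a download run skipped files is to compare the filtered remote listing with the local directory. The function below is a hypothetical sketch (missing_files and its parameters are not from this repository) and assumes the remote names come from a directory listing, for example ftplib's FTP.nlst(). It does not explain why pubmed_downloader.py skipped files; it only reports which matching files are absent or empty locally.

```python
import os
import re


def missing_files(remote_names, local_dir, reg_ex):
    """Return matching remote names that are absent or zero bytes in local_dir."""
    pattern = re.compile(reg_ex)
    missing = []
    for name in sorted(remote_names):
        if not pattern.search(name):
            continue  # not selected by the filter, so not expected locally
        path = os.path.join(local_dir, name)
        # Treat zero-byte files as failed downloads as well
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

The returned names can be passed back to the downloader for a second attempt.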


Larsdegroot/pubmed_download_
