improve organisation of README
AlessioMilanese authored Sep 27, 2022
1 parent 2330748 commit 81bc12b
Showing 1 changed file with 28 additions and 23 deletions.
51 changes: 28 additions & 23 deletions README.md
@@ -1,42 +1,42 @@
Download genomes used for mOTUs 3
Download mOTUs 3 genomes
========

This tool downloads the 700,000 genomes used for the mOTUs 3 database.
This tool allows downloading all or any of the 700,000 genomes used for the mOTUs 3 database. The user can download a specific genome (type), all genomes associated with a specific mOTU, or the complete database.

## Installation
To run the script, clone this repository

First, clone this repository
```
git clone https://github.com/motu-tool/motus_v3_genomes
cd motus_v3_genomes
```

## Downloading genomes

The data will be downloaded into the same folder where the script is located, under a folder with the name `motus_v3_genomes`.

The structure is as follows:
- Within `motus_v3_genomes` there is a folder for each mOTU
- Within each mOTU folder are the genomes associated with this mOTU. In addition, there is a file `1.list_files.txt`. This file lists the paths to each of the downloaded genome files.
- If you run `motus_genomes_download -m all`, an additional file `1.list_all_files.txt` will be created within `motus_v3_genomes`.
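
The layout above can be consumed from a script. The following is a minimal sketch, assuming `1.list_files.txt` holds one path per line (the README does not spell out the exact file format); the function name `read_genome_paths` is hypothetical and not part of the tool.

```python
from pathlib import Path


def read_genome_paths(motu_dir):
    """Return the genome paths listed in a mOTU folder's 1.list_files.txt.

    Assumption: the list file contains one path per line.
    """
    list_file = Path(motu_dir) / "1.list_files.txt"
    with open(list_file) as handle:
        # Skip blank lines and strip trailing newlines.
        return [line.strip() for line in handle if line.strip()]
```

For example, `read_genome_paths("motus_v3_genomes/ref_mOTU_v3_00006")` would return the listed paths for that mOTU once it has been downloaded.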

Note that all files are first downloaded to `motus_v3_genomes/temp_dir` and moved to the final destination once the download and unzip are complete.
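
The stage-then-move pattern described here can be sketched as follows. This illustrates the general idea only and is not the script's actual code; the function name `move_when_complete` is hypothetical.

```python
import shutil
from pathlib import Path


def move_when_complete(temp_file, final_dir):
    """Move a finished file from the staging folder into its final folder."""
    final_dir = Path(final_dir)
    # Create the destination folder (e.g. the mOTU folder) if it is missing.
    final_dir.mkdir(parents=True, exist_ok=True)
    target = final_dir / Path(temp_file).name
    shutil.move(str(temp_file), str(target))
    return target
```

Staging files in a separate folder means the final destination should only ever hold files whose download and unzip finished.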

The script automatically checks the md5 sum of each of the downloaded files.
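
To re-verify a file by hand, an md5 check can be written with Python's standard `hashlib` module. This is a minimal sketch, not the script's own implementation; the helper names are hypothetical, and where the expected checksum comes from is not specified in this README.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare a file's MD5 with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```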


To download one genome type (for example the MAG `LIAN20-1_SAMN11649416_METAG_000035`):

```
python motus_genomes_download -m LIAN20-1_SAMN11649416_METAG_000035
```

To download all genomes from one motu (for example the ref mOTU `ref_mOTU_v3_00006`):
To download all genomes from one mOTU (for example the ref mOTU `ref_mOTU_v3_00006`):
```
python motus_genomes_download -m ref_mOTU_v3_00006
```

To download all genomes:
```
python motus_genomes_download -m all
```



The script automatically checks the md5 sum of the downloaded file.

The data will be downloaded in the same folder where the script is located, under a folder with the name `motus_v3_genomes`.
The structure is as follows:
- Within `motus_v3_genomes` there is a folder per mOTU
- Within each mOTU folder there is a file `1.list_files.txt` with the path to the files, and the files of the genomes are within the mOTU directory
- If you run `motus_genomes_download -m all` an additional file `1.list_all_files.txt` will be created within `motus_v3_genomes`.

Note that all files are first downloaded to `motus_v3_genomes/temp_dir` and moved to the final destination once the download and unzip is complete.

If you run the two commands listed above to download a genome and mOTU, you will end up with the following structure:
If you run the two commands above to download a genome and mOTU, you will end up with the following structure:
```
.
|-- README.md
@@ -54,7 +54,12 @@ If you run the two commands listed above to download a genome and mOTU, you will
`-- temp_dir
```

Finally, you can list all motus and the number of associated genomes (available for download) with:
To download all genomes:
```
python motus_genomes_download -m all
```

Finally, you can list all mOTUs and the number of associated genomes (available for download) with:
```
python motus_genomes_download -l
```