Skip to content

Commit

Permalink
Merge pull request #3 from jtamames/0.3-singleReads
Browse files Browse the repository at this point in the history
0.3 single reads
  • Loading branch information
fpusan authored Oct 17, 2018
2 parents 15758d3 + abbb920 commit 89f43b3
Show file tree
Hide file tree
Showing 56 changed files with 108 additions and 49 deletions.
5 changes: 1 addition & 4 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,7 @@ db/*
test/*
download/*
scripts/squeezeM_conf.pl
*pyc
lib/checkm/DATA_CONFIG
lib/checkm/*pyc
lib/checkm/*/*.pyc
bin/*.pyc
lib/spades/*/*.pyc
lib/classifier/*
ToDo
15 changes: 12 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,8 @@ The command for running squeezeM has the following syntax:
* -f|-seq: Fastq read files' directory (REQUIRED)
* -t: Number of threads (Default:12)
* -a: assembler [megahit,spades] (Default:megahit)
* -c|-contiglen: Minimum length of contigs (Default:1200)
* -c|-contiglen: Minimum length of contigs (Default:1200)
* -map: Read mapper [bowtie,bwa,minimap2-ont,minimap2-pb,minimap2-sr] (Default: bowtie)
* --nocog: Skip COG assignment (Default: no)
* --nokegg: Skip KEGG assignment (Default: no)
* --nopfam: Skip Pfam assignment (Default: no)
Expand Down Expand Up @@ -129,7 +130,13 @@ The make_databases.pl script also downloads two datasets for testing that the pr

Alternative `-m sequential`, `-m merged` can be used.

## 6. License and third-party software
## 6. Working with Oxford MinION and PacBio reads.
Since version 0.3.0, squeezeM is able to seamlessly work with single-end reads. In order to obtain better mappings of MinION and PacBio reads agains the assembly, we advise to use minimap2 for read counting, by including the -map *minimap2-ont* (MinION) or *-map minimap2-pb* (PacBio) flags when calling squeezeM.

## 7. Setting up the mySQL database.
SqueezeM includes a built in MySQL database that can be queried via a web-based interface, in order to facilitate the exploration of metagenomic results. Code and instruction installations can be found at https://github.com/jtamames/SqueezeMdb.

## 8. License and third-party software
SqueezeM is distributed with a GPL-3 license.
Additionally, squeezeM redistributes the following third-party software:
* [Megahit](https://github.com/voutcn/megahit)
Expand All @@ -141,6 +148,8 @@ Additionally, squeezeM redistributes the following third-party software:
* [hmmer](http://hmmer.org/)
* [diamond](https://github.com/bbuchfink/diamond)
* [bedtools](https://github.com/arq5x/bedtools2)
* [bwa](https://github.com/lh3/bwa)
* [minimus2](https://github.com/lh3/minimap2)
* [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
* [barrnap](https://github.com/tseemann/barrnap)
* [maxbin](https://downloads.jbei.org/data/microbial_communities/MaxBin/MaxBin.html)
Expand All @@ -149,7 +158,7 @@ Additionally, squeezeM redistributes the following third-party software:
* [MinPath](http://omics.informatics.indiana.edu/MinPath)
* [RDP classifier](https://github.com/rdpstaff/classifier)

## 7. About
## 9. About
SqueezeM is developed by Javier Tamames with collaboration from Fernando Puente-Sánchez. Feel free to contact us for support (jtamames@cnb.csic.es, fpuente@cnb.csic.es).


Binary file added bin/bwa
Binary file not shown.
Binary file added bin/minimap2
Binary file not shown.
Binary file removed bin/spades_init.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/__init__.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/disk.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/format_stack.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/func_inspect.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/functools.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/hashing.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/logger.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/memory.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/my_exceptions.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/numpy_pickle.pyc
Binary file not shown.
Binary file removed lib/spades/joblib2/parallel.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/__init__.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/composer.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/constructor.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/cyaml.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/dumper.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/emitter.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/error.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/events.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/loader.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/nodes.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/parser.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/reader.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/representer.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/resolver.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/scanner.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/serializer.pyc
Binary file not shown.
Binary file removed lib/spades/pyyaml2/tokens.pyc
Binary file not shown.
Binary file removed lib/spades/spades_pipeline/common/SeqIO.pyc
Binary file not shown.
Binary file removed lib/spades/spades_pipeline/common/__init__.pyc
Binary file not shown.
Binary file removed lib/spades/spades_pipeline/common/alignment.pyc
Binary file not shown.
Binary file removed lib/spades/spades_pipeline/common/sam_parser.pyc
Binary file not shown.
Binary file removed lib/spades/spades_pipeline/hammer_logic.pyc
Binary file not shown.
Binary file removed lib/spades/spades_pipeline/options_storage.pyc
Binary file not shown.
Binary file removed lib/spades/spades_pipeline/process_cfg.pyc
Binary file not shown.
Binary file removed lib/spades/spades_pipeline/spades_logic.pyc
Binary file not shown.
Binary file removed lib/spades/spades_pipeline/support.pyc
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
8 changes: 5 additions & 3 deletions scripts/01.run_assembly.pl
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#!/usr/bin/perl

#-- Part of squeezeM distribution. 01/05/2018 Original version, (c) Javier Tamames, CNB-CSIC
#-- Part of squeezeM distribution. 28/08/2018 for version 0.3.0, (c) Javier Tamames, CNB-CSIC
#-- Runs assembly programs (currently megahit or spades). Uses prinseq to filter out contigs by length (excluding small ones).

use strict;
Expand Down Expand Up @@ -29,11 +29,13 @@

if($assembler=~/megahit/i) {
$outassembly="$datapath/megahit/final.contigs.fa";
$command="$megahit_soft $assembler_options -1 $par1name -2 $par2name -t $numthreads -o $datapath/megahit";
if(-e $par2name) { $command="$megahit_soft $assembler_options -1 $par1name -2 $par2name -t $numthreads -o $datapath/megahit"; }
else { $command="$megahit_soft $assembler_options -r $par1name -t $numthreads -o $datapath/megahit"; } #-- Support for single reads
}
elsif($assembler=~/spades/i) {
$outassembly="$datapath/spades/contigs.fasta";
$command="$spades_soft $assembler_options --meta --pe1-1 $par1name --pe1-2 $par2name -m 400 -k 21,33,55,77,99,127 -t $numthreads -o $datapath/spades";
if(-e $par2name) { $command="$spades_soft $assembler_options --meta --pe1-1 $par1name --pe1-2 $par2name -m 400 -k 21,33,55,77,99,127 -t $numthreads -o $datapath/spades"; }
else { $command="$spades_soft $assembler_options --meta --s1 $par1name -m 400 -k 21,33,55,77,99,127 -t $numthreads -o $datapath/spades"; } #-- Support for single reads
}
else { die "Unrecognized assembler\n"; }

Expand Down
18 changes: 11 additions & 7 deletions scripts/01.run_assembly_merged.pl
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#!/usr/bin/perl

#-- Part of squeezeM distribution. 01/05/2018 Original version, (c) Javier Tamames, CNB-CSIC
#-- Part of squeezeM distribution. 28/08/2018 for version 0.3.0, (c) Javier Tamames, CNB-CSIC
#-- Runs assembly programs (currently megahit or spades) for several metagenomes that will be merged in the next step (merged mode).
#-- Uses prinseq to filter out contigs by length (excluding small ones).

Expand Down Expand Up @@ -38,7 +38,7 @@

#-- Prepare files for the assembly

my($par1name,$par2name);
my($par1name,$par2name,$command);
foreach my $thissample(sort keys %samplefiles) {
print "Working for sample $thissample\n";
my $cat1="cat ";
Expand All @@ -52,9 +52,11 @@
my $command="$cat1 > $par1name";
print "$command\n";
system($command);
$command="$cat2 > $par2name";
print "$command\n";
system($command);
if($cat2) { #-- Support for single reads
$command="$cat2 > $par2name";
print "$command\n";
system($command);
}
my $assemblyname;

#-- Run the assembly
Expand All @@ -63,7 +65,8 @@
if($assembler=~/megahit/i) {
system("rm -r $datapath/megahit");
$assemblyname="$datapath/megahit/$thissample.final.contigs.fa";
my $command="$megahit_soft -1 $par1name -2 $par2name --k-list 29,39,59,79,99,119,141 -t $numthreads -o $datapath/megahit";
if(-e $par2name) { $command="$megahit_soft $assembler_options -1 $par1name -2 $par2name --k-list 29,39,59,79,99,119,141 -t $numthreads -o $datapath/megahit"; }
else { $command="$megahit_soft $assembler_options -r $par1name --k-list 29,39,59,79,99,119,141 -t $numthreads -o $datapath/megahit"; } #-- Support for single reads
print "Running Megahit for $thissample: $command\n";
system $command;
system("mv $datapath/megahit/final.contigs.fa $assemblyname");
Expand All @@ -74,7 +77,8 @@
if($assembler=~/spades/i) {
system("rm -r $datapath/spades");
$assemblyname="$datapath/spades/$thissample.contigs.fasta";
my $command="$spades_soft --meta --pe1-1 $par1name --pe1-2 $par2name -m 400 -t $numthreads -o $datapath/spades";
if(-e $par2name) { $command="$spades_soft $assembler_options --meta --pe1-1 $par1name --pe1-2 $par2name -m 400 -t $numthreads -o $datapath/spades"; }
else { $command="$spades_soft $assembler_options --meta --s1 $par1name -m 400 -t $numthreads -o $datapath/spades"; } #-- Support for single reads
print "Running Spades for $thissample: $command\n";
system $command;
system("mv $datapath/spades/contigs.fasta $assemblyname");
Expand Down
77 changes: 60 additions & 17 deletions scripts/09.mapbamsamples.pl
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#!/usr/bin/perl

#-- Part of squeezeM distribution. 01/05/2018 Original version, (c) Javier Tamames, CNB-CSIC
#-- Part of squeezeM distribution. 28/08/2018 for version 0.3.0, (c) Javier Tamames, CNB-CSIC
#-- Calculates coverage/RPKM for genes/contigs by mapping back reads to the contigs and count how many fall in each gene/contig
#-- Uses bowtie2 for mapping, and bedtools for counting.
#-- WARNING! Bedtools version must be <0.24!
Expand All @@ -17,7 +17,8 @@

#-- Configuration variables from conf file

our($datapath,$bowtieref,$bowtie2_build_soft,$contigsfna,$mappingfile,$mode,$resultpath,$rpkmfile,$contigcov,$coveragefile,$bowtie2_x_soft,$gff_file,$tempdir,$numthreads,$scriptdir,$bedtools_soft);
our($datapath,$bowtieref,$bowtie2_build_soft,$contigsfna,$mappingfile,$mode,$resultpath,$rpkmfile,$contigcov,$coveragefile,$bowtie2_x_soft,
$mapper, $bwa_soft, $minimap2_soft, $gff_file,$tempdir,$numthreads,$scriptdir,$bedtools_soft);

my $keepsam=1; #-- Set to one, it keeps SAM files. Set to zero, it deletes them when no longer needed

Expand All @@ -37,7 +38,8 @@
next if !$_;
my @t=split(/\t/,$_);
next if(($mode eq "sequential") && ($t[0] ne $project));
if($t[2] eq "pair1") { $allsamples{$t[0]}{"$fastqdir/$t[1]"}=1; } else { $allsamples{$t[0]}{"$fastqdir/$t[1]"}=2; }
if($t[2] eq "pair1") { $allsamples{$t[0]}{"$fastqdir/$t[1]"}=1; }
elsif ($t[2] eq "pair2") { $allsamples{$t[0]}{"$fastqdir/$t[1]"}=2; }
}
close infile1;

Expand All @@ -46,13 +48,26 @@
my $nums;
print "Metagenomes found: $numsamples\n";

#-- Creates Bowtie2 reference for mapping (index the contigs)

if(-e "$bowtieref.1.bmy t2") {}
else {
my $bowtie_command="$bowtie2_build_soft --quiet $contigsfna $bowtieref";
system($bowtie_command);
}
#-- Creates Bowtie2 or BWA reference for mapping (index the contigs)

if($mapper eq "bowtie") {
if(-e "$bowtieref.1.bt2") {}
else {
print("Creating reference.\n");
my $bowtie_command="$bowtie2_build_soft --quiet $contigsfna $bowtieref";
system($bowtie_command);
}
}
elsif($mapper eq "bwa") {
if(-e "$bowtieref.bwt") {}
else {
print("Creating reference.\n");
my $bwa_command="$bwa_soft index -p $bowtieref $contigsfna";
system($bwa_command);
}
}


#-- Prepare output files

Expand Down Expand Up @@ -81,24 +96,52 @@
if($allsamples{$thissample}{$ifile}==1) { push(@pair1,$ifile); } else { push(@pair2,$ifile); }
}
my($par1name,$par2name);
if($pair1[0]=~/gz/) { $par1name="$project.$thissample.current_1.gz"; $par2name="$project.$thissample.current_2.gz"; }
else { $par1name="$project.$thissample.current_1"; $par2name="$project.$thissample.current_2";}
if($#pair1==0) { $command="ln -s $pair1[0] $tempdir/$par1name; ln -s $pair2[0] $tempdir/$par2name;"; }
else {
if($pair1[0]=~/gz/) { $par1name="$project.$thissample.current_1.gz"; }
else { $par1name="$project.$thissample.current_1"; }
if($pair2[0]=~/gz/) { $par2name="$project.$thissample.current_2.gz"; }
else { $par2name="$project.$thissample.current_2";}
if($#pair1==0) { $command="ln -s $pair1[0] $tempdir/$par1name;"; }
elsif($#pair1>0) {
my $a1=join(" ",@pair1);
$command="cat $a1 > $tempdir/$par1name; ";
}
if($#pair2==0) { $command.="ln -s $pair2[0] $tempdir/$par2name;"; }
elsif($#pair2>0) {
my $a2=join(" ",@pair2);
$command="cat $a1 > $tempdir/$par1name; cat $a2 > $tempdir/$par2name;";
$command.="cat $a2 > $tempdir/$par2name; cat $a2 > $tempdir/$par2name;";
}
print " Getting raw reads\n";
print "$command\n";
system $command;

#-- Now we start mapping reads against contigs

print " Aligning with Bowtie\n";
print " Aligning to reference...\n";
if($keepsam) { $outsam="$samdir/$project.$thissample.sam"; } else { $outsam="$samdir/$project.$thissample.current.sam"; }
if($formatseq eq "fasta") { $formatoption="-f"; }
$command="$bowtie2_x_soft -x $bowtieref $formatoption -1 $tempdir/$par1name -2 $tempdir/$par2name --quiet -p $numthreads -S $outsam";

#-- Support for single reads
if($mapper eq "bowtie") {
if($formatseq eq "fasta") { $formatoption="-f"; }
if(-e "$tempdir/$par2name") { $command="$bowtie2_x_soft -x $bowtieref $formatoption -1 $tempdir/$par1name -2 $tempdir/$par2name --quiet -p $numthreads -S $outsam"; }
else { $command="$bowtie2_x_soft -x $bowtieref $formatoption -U $tempdir/$par1name --quiet -p $numthreads -S $outsam"; } }
elsif($mapper eq "bwa") {
#Apparently bwa works seemlesly with fasta files as imput.
if(-e "$tempdir/$par2name") { $command="$bwa_soft mem $bowtieref $tempdir/$par1name $tempdir/$par2name -v 1 -t $numthreads > $outsam"; }
else { $command="$bwa_soft mem $bowtieref $tempdir/$par1name -v 1 -t $numthreads > $outsam"; } }
elsif($mapper eq "minimap2-ont") {
#Minimap2 does not require to create a reference beforehand, and work seamlessly with fasta as an input.
if(-e "$tempdir/$par2name") { $command="$minimap2_soft -ax map-ont $contigsfna $tempdir/$par1name $tempdir/$par2name -t $numthreads > $outsam"; }
else { $command="$minimap2_soft -ax map-ont $contigsfna $tempdir/$par1name -t $numthreads > $outsam"; } }
elsif($mapper eq "minimap2-pb") {
#Minimap2 does not require to create a reference beforehand, and work seamlessly with fasta as an input.
if(-e "$tempdir/$par2name") { $command="$minimap2_soft -ax map-pb $contigsfna $tempdir/$par1name $tempdir/$par2name -t $numthreads > $outsam"; }
else { $command="$minimap2_soft -ax map-pb $contigsfna $tempdir/$par1name -t $numthreads > $outsam"; } }
elsif($mapper eq "minimap2-sr") {
#Minimap2 does not require to create a reference beforehand, and work seamlessly with fasta as an input.
if(-e "$tempdir/$par2name") { $command="$minimap2_soft -ax sr $contigsfna $tempdir/$par1name $tempdir/$par2name -t $numthreads > $outsam"; }
else { $command="$minimap2_soft -ax sr $contigsfna $tempdir/$par1name -t $numthreads > $outsam"; } }


print "$command\n";
system $command;

Expand Down
4 changes: 2 additions & 2 deletions scripts/18.getcontigs.pl
Original file line number Diff line number Diff line change
Expand Up @@ -111,8 +111,8 @@

#-- Headers

print outfile1 "Created by $0, ",scalar localtime,"\n";
print outfile1 "#Contig ID\tTax\tChimerism\tGC perc\tLength\tNum genes\tBin ID";
print outfile1 "#Created by $0, ",scalar localtime,"\n";
print outfile1 "Contig ID\tTax\tChimerism\tGC perc\tLength\tNum genes\tBin ID";
foreach my $countfile(sort keys %allsamples) { print outfile1 "\tCoverage $countfile\tRPKM $countfile\tRaw $countfile"; }
print outfile1 "\n";

Expand Down
2 changes: 1 addition & 1 deletion scripts/preparing_databases/make_databases.pl
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@
###Update configuration files to reflect new db path.
print("\nUpdating configuration...\n");

#

my $checkm_manifest = "{\"dataRoot\": \"$databasedir\", \"remoteManifestURL\": \"https://data.ace.uq.edu.au/public/CheckM_databases/\", \"manifestType\": \"CheckM\", \"localManifestName\": \".dmanifest\", \"remoteManifestName\": \".dmanifest\"}\n";

open(outfile1, ">$installpath/lib/checkm/DATA_CONFIG") || die;
Expand Down
2 changes: 1 addition & 1 deletion scripts/preparing_databases/nrindex.pl
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
#-- Output file. MUST POINT TO DATABASE DIRECTORY

my $databasedir=$ARGV[0];
my $nrfasta="$databasedir/nr15122017.fasta"; #-- Previously downloaded
my $nrfasta="$databasedir/nr.faa"; #-- Previously downloaded

my $resdir="$databasedir/LCA_tax";
if(-d $resdir) {} else { system("mkdir $resdir"); }
Expand Down
2 changes: 1 addition & 1 deletion scripts/preparing_databases/parents.sql
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
create table parents(id TEXT PRIMARY KEY,fulltax TEXT);
create table parents(id TEXT PRIMARY KEY, fulltax TEXT, taxonomy_id INT);
.separator "\t"
--.import parents.txt parents
.exit
Expand Down
4 changes: 2 additions & 2 deletions scripts/preparing_databases/taxid_tree.pl
Original file line number Diff line number Diff line change
Expand Up @@ -37,8 +37,8 @@
chomp;
next if !$_;
%store=();
my ($id,$acc,$tax)=split(/\t/,$_);
print outfile1 "$id\t$acc";
my ($id,$tax)=split(/\t/,$_);
print outfile1 "$id";
# print "*$id\t*$acc\t*$tax\n";
my @k=split(/\;/,$tax);
foreach my $l(@k) {
Expand Down
Loading

0 comments on commit 89f43b3

Please sign in to comment.