Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEACR input and spike-in normalization #77

Open
jordan841220 opened this issue Sep 8, 2023 · 0 comments
Open

SEACR input and spike-in normalization #77

jordan841220 opened this issue Sep 8, 2023 · 0 comments

Comments

@jordan841220
Copy link

jordan841220 commented Sep 8, 2023

Thank you for your contribution in this amazing tool. My question might be naive but I couldn't find any explanation in the paper or github (or maybe I simply ignored it).

  1. From the peak calling section in the pipeline as described below, I found that the input bedgraph file of SEACR was inherently the intermediate ouput of macs2 callpeak. I was curious why this CUT&RUN pipeline was designed like this, instead of converting the original BAM/BED file to bedgraph file using bedtools genomecov, and use it as the input of SEACR (I have tried and the results were very different), which was wrote in the SEACR tutorial.
# macs2 narrow peak calling
>&2 echo "[info] macs2 narrow peak calling"
$macs2bin/macs2 callpeak -t $bam_file -g $macs2_genome -f BAMPE -n $base_file --outdir $outdir -q 0.01 -B --SPMR --keep-dup all 2> $logdir/"$base_file".macs2.narrow

# macs2 broad peak calling
>&2 echo "[info] macs2 broad peak calling"
$macs2bin/macs2 callpeak -t $bam_file -g $macs2_genome -f BAMPE -n $base_file --outdir $outdirbroad --broad --broad-cutoff 0.1 -B --SPMR --keep-dup all 2> $logdir/"$base_file".macs2.broad

# macs2 broad peak calling
>&2 echo "[info] Getting broad peak summits"
$pythonbin/python $extratoolsbin/get_summits_broadPeak.py $outdirbroad/"$base_file"_peaks.broadPeak | $bedopsbin/sort-bed - > $outdirbroad/"$base_file"_summits.bed

# SEACR peak calls
>&2 echo "[info] SEACR stringent peak calling"
$macs2bin/macs2 callpeak -t $bam_file -g $macs2_genome -f BAMPE -n $base_file --outdir $outdirseac -q 0.01 -B --SPMR --keep-dup all 2> $logdir/"$base_file".seacr
$pythonbin/python $extratoolsbin/change.bdg.py $outdirseac/"$base_file"_treat_pileup.bdg > $outdirseac/"$base_file"_treat_integer.bdg
chmod +x $extratoolsbin/SEACR_1.1.sh
$extratoolsbin/SEACR_1.1.sh $outdirseac/"$base_file"_treat_integer.bdg 0.01 non stringent $outdirseac/"$base_file"_treat $Rscriptbin
$bedopsbin/sort-bed $outdirseac/"$base_file"_treat.stringent.bed > $outdirseac/"$base_file"_treat.stringent.sort.bed
$pythonbin/python $extratoolsbin/get_summits_seacr.py $outdirseac/"$base_file"_treat.stringent.bed|$bedopsbin/sort-bed - > $outdirseac/"$base_file"_treat.stringent.sort.summits.bed
for i in _summits.bed _peaks.xls _peaks.narrowPeak _control_lambda.bdg _treat_pileup.bdg; do 
    rm -rf $outdirseac/"$base_file"$i
done
  1. The pipeline conducted the spike_in normalization using bamCoverage after peak calling. Isn't it make sense to use the spike_in-normalized bedgraph file to perform peak calling?

Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant