Preview mode
iPyrad includes a preview mode for subsampling your data and running steps quickly to test your configuration. Here are some notes on preview mode for each of the steps.
NB: Do NOT run preview mode on any assembly step that has already completed a full run to your satisfaction. If you preview a step that has already completed, the output files for that step will be overwritten by the preview output.
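One way to guard against this is to check a step's output directory before previewing. The following is a minimal sketch, not part of ipyrad: the function name `safe_to_preview` and the idea of passing in a per-step output directory are assumptions made for illustration, and you would need to supply the actual directory your assembly writes to.

```python
from pathlib import Path


def safe_to_preview(step_outdir):
    """Return True if `step_outdir` is missing or empty.

    Hypothetical helper (not an ipyrad function): a non-empty directory
    is treated as holding results from a completed run that a preview
    could overwrite.
    """
    outdir = Path(step_outdir)
    if not outdir.is_dir():
        # Missing path: nothing to overwrite. A plain file at this path
        # is treated as existing output, so it is reported as unsafe.
        return not outdir.exists()
    # any() stops at the first entry, so large directories are cheap to check.
    return not any(outdir.iterdir())
```

If the check returns False, copy or rename the directory before running the step in preview mode.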
Passing preview=True to the top-level run command sets preview mode for step 1 only. This is probably the most common use case: if you subsample the raw data initially, then steps 2-7 should all run quickly. In this mode, on a standard desktop with real data (1 plate / 95 samples / ~300 million reads total), approximate runtimes are:
- step 1:
  - 500K: < 2 minutes
  - 1M: ~5 minutes
- step 2:
  - 500K: < 2 minutes
  - 1M: ~6 minutes
  - 4M: ~15 minutes
- step 3:
  - 500K: ~90 minutes (~1 minute/sample)
- step 4: < 2 minutes
- step 5: ~60 minutes (< 1 minute/sample)
- step 6:
Running preview on step 1 looks at just one raw fastq file (or one pair in the case of paired-end data). Before any demultiplexing happens, step 1 takes a subset of the raw data; the default length of this subset is 2,000,000 lines. On real data this runs in approximately 4 minutes on an average desktop. If you want to subset more or less of the raw data for this step, be aware that subsets of < 1,000,000 lines will probably not sample enough sequences per individual, so later steps may not work well.
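To make the line counts concrete, here is a rough sketch of this kind of subsampling. It is not ipyrad's implementation. The helper names `subsample_fastq` and `mean_reads_per_sample` are hypothetical, and the code assumes standard 4-line FASTQ records, with reads spread evenly across samples and all retained after demultiplexing.

```python
import gzip
from itertools import islice

LINES_PER_READ = 4  # assumed layout: header, sequence, '+', qualities


def subsample_fastq(src, dst, nlines=2_000_000):
    """Copy the first `nlines` lines of `src` to `dst`; return reads written.

    Hypothetical helper, shown only to illustrate head-of-file subsampling.
    """
    if nlines % LINES_PER_READ:
        raise ValueError("nlines must be a multiple of 4 to keep reads whole")
    # Read gzipped input transparently; output is written uncompressed.
    opener = gzip.open if str(src).endswith(".gz") else open
    written = 0
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        for line in islice(fin, nlines):  # stops early if the file is shorter
            fout.write(line)
            written += 1
    return written // LINES_PER_READ


def mean_reads_per_sample(nlines, nsamples):
    """Average reads per sample, assuming an even split across samples."""
    return nlines / LINES_PER_READ / nsamples
```

Under these assumptions, the default 2,000,000 lines is 500,000 reads, so `mean_reads_per_sample(2_000_000, 95)` gives about 5,263 reads per sample on a 95-sample plate. At 1,000,000 lines that falls to about 2,632, which suggests why smaller subsets may sample each individual too thinly.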
Step 2 preview is useful if you have already run step 1 to completion and have fully demultiplexed all your raw data. In this case you can use step2(preview=True) to continue previewing downstream steps without having to rerun step 1 over and over.
Previewing step 3 is handy in a couple of cases: for example, if you want to try out different clustering parameters to get a feel for how well they work, or if you want to try aligning to different reference sequences without committing to a full run. On real data with preview subsampling from step 1
There is no preview mode for steps 4-6. The best strategy for previewing them is to run step 3 in preview mode and then run steps 4-6 normally. The step 3 preview effectively subsamples the data, so the remaining steps run relatively quickly.
Step 7 has no preview mode.