Redirect tmp directory #938

Open
carleton-envbiotech opened this issue Feb 19, 2025 · 10 comments

Comments

@carleton-envbiotech

carleton-envbiotech commented Feb 19, 2025

Hello,

I ran into an issue while running SqueezeMeta to test whether it could complete a large co-assembly of 10 samples from different habitats. I inadvertently filled the 'tmp' directory on my server's personal account, which has basically killed all the other jobs I had running.

Is there a way to change where SqueezeMeta writes its temporary files, so I can point it at a much larger drive on my server? The problem seems to be tied to the 01.merge.assemblies.pl script and its "Transforming to afg format" step.

Thanks in advance for any help you can provide.

@fpusan
Collaborator

fpusan commented Feb 19, 2025

What files were written to your tmp directory exactly?
In principle we write everything into the directory specified by the user, but some of the programs we call may behave differently...

@carleton-envbiotech
Author

carleton-envbiotech commented Feb 19, 2025

It looks like a file called tmp.204864.seq, which is ~39 GB (based on the ls -latr output for the tmp directory below).

[Image: ls -latr output for the tmp directory]

@fpusan
Collaborator

fpusan commented Feb 19, 2025

What exact command did you use to run SqueezeMeta?

@carleton-envbiotech
Author

carleton-envbiotech commented Feb 19, 2025

Here is the input command I used:

SqueezeMeta.pl \
  -m merged \
  -p eCycle_October_2023_MAGS_merged_SqueezeMeta \
  -s eCycle_October_2023_short_read_metagenomic_sequencing_analyses/eCycle_October_2023_short_read_metagenomic_sequencing_coassembly_sample_sheet.txt \
  -f /datastore/userdata/daniel/eCycle_short_read_metagenomic_sequencing_October_2023/ \
  --cleaning Yes \
  --a megahit \
  -assembly_options "--presets meta-large" \
  -c 2500 \
  -map bowtie \
  -binners maxbin,metabat2 \
  -t 40

I ran this from the directory one level above 'eCycle_October_2023_MAGS_merged_SqueezeMeta', on a drive that has ~30 TB of storage left. All of the other SqueezeMeta files were written to that project directory.

@fpusan
Collaborator

fpusan commented Feb 19, 2025

Ok, the culprit is AMOS, which is called when using the merged and seqmerge modes (coassembly and sequential would be fine).
Let me see if I can figure out exactly how it picks that directory...

@fpusan
Collaborator

fpusan commented Feb 19, 2025

Ok, pretty sure you can avoid this by setting the TMPDIR environment variable before running SqueezeMeta.
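
A minimal sketch of what that could look like in a bash-like shell. The `BIG_TMP` variable and the `sqm_tmp` directory name are hypothetical placeholders, not anything SqueezeMeta defines, and whether AMOS actually honors TMPDIR is the assumption stated above rather than something verified here.

```shell
# Hypothetical location on a large drive; set BIG_TMP to your own path beforehand.
# Falls back to a sqm_tmp directory under the current working directory.
BIG_TMP="${BIG_TMP:-$PWD/sqm_tmp}"

# Create the directory if needed and export it so child processes inherit it.
mkdir -p "$BIG_TMP"
export TMPDIR="$BIG_TMP"

# Sanity check: confirm a file can be created there, then clean it up.
probe="$(mktemp "$TMPDIR/probe.XXXXXX")"
rm -f "$probe"
echo "Temporary files will go to: $TMPDIR"

# Then launch SqueezeMeta from this same shell with the usual arguments, e.g.:
# SqueezeMeta.pl -m merged -p <project> -s <samples file> -f <raw reads dir> ...
```

The export only affects the current shell session and the processes started from it, so SqueezeMeta has to be launched from that same session (or the same job script).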

@carleton-envbiotech
Author

Okay, so it sounds like I should abort the current run, wipe the tmp directory causing the issue, then set the TMPDIR variable and re-run. Is there any way I could pick up the SqueezeMeta run using the -restart flag, since the initial assemblies took so long? Ideally I would pick up from the STEP1 --> merging assemblies step, where this hang-up started.

@fpusan
Collaborator

fpusan commented Feb 19, 2025

Not sure about this one, maybe @jtamames can help. But if the initial assemblies already took a long time and that temp file was already 39 GB, I suspect the process would stall even after fixing the tmpdir issue.
I would probably recommend sequential mode instead.

@carleton-envbiotech
Author

I ended up killing the run to bring my server back to life. My original intention was to test the limits of SqueezeMeta's performance, but more generally I don't have a biological reason to use coassembly in my experimental design at this time. My other aim was to have an output in SQMtools that includes all samples, so I could compare them in heatmaps and barplots. Would I still be able to do this with sequential assembly, or would it require combining multiple SQM objects into one master file? If the latter, that is no issue and I will proceed with sequential assembly mode.

@fpusan
Collaborator

fpusan commented Feb 21, 2025

You can run them sequentially and then analyze them together with SQMtools (with some limitations that are lifted in the dev version of SQMtools).
