This may be two separate issues, but I'm having jobs fail either with a Java Runtime Environment SIGBUS (0x7) error, or with FATAL: Couldn't determine user account information: user: unknown userid [some number]
In both cases, the process that fails and the file associated with the error changes with each rerun.
With subsequent reruns, the process and sample that failed previously will complete, but a new process and sample will error out. Eventually the pipeline can be completed with multiple reruns.
The pipeline runs fine with the test profile.
From what I can figure out, the Java Runtime Environment error has something to do with running out of memory in the Java VM, but I'm stumped on the userid error.
I have made a small modification to the modules.config file to add --no-model param to MACS. But I am not seeing any errors with MACS, and again, the test profile runs as expected.
It turns out this is almost certainly an issue on the HPC side.
Slurm (or our admins' configuration) was redirecting jobs submitted to one partition to a different partition.
By default, Nextflow polls job status only within the partition the job was submitted to, and if it can't find the job there, it loses track of it and things go wrong.
We have been able to fix this by adding the following to our config file:
executor {
    name = "slurm"
    queueGlobalStatus = true
}
This will tell nextflow to poll for job status globally and not just within the submitted partition.
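As a rough illustration of why the redirected jobs seemed to vanish, here is a toy model in Python. It is not Nextflow's actual implementation: the `JOBS` table, the job IDs, the partition names, and the `poll_status` helper are all made up for illustration. It only shows how a partition-scoped status query can miss a job that the scheduler has moved elsewhere.

```python
# Toy scheduler job table: job id -> (current partition, state).
# Job 101 was submitted to "short" but the scheduler redirected it to "long".
JOBS = {
    100: ("short", "RUNNING"),
    101: ("long", "RUNNING"),  # redirected after submission
}


def poll_status(job_id, submitted_partition, global_status=False):
    """Return the job state, or None if this poll cannot see the job."""
    for jid, (partition, state) in JOBS.items():
        if jid != job_id:
            continue
        # A partition-scoped poll only sees jobs still in that partition;
        # a global poll sees the job wherever it ended up.
        if global_status or partition == submitted_partition:
            return state
    return None


print(poll_status(101, "short"))                      # None
print(poll_status(101, "short", global_status=True))  # RUNNING
```

In the scoped case the poller sees nothing for job 101 even though it is running, which would be consistent with the seemingly random failures described above.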
It would be great if this were either the default behavior in Nextflow, or if there were a more informative error message in cases where a job has been redirected.
Command used and terminal output
Relevant files
bug.zip
System information
Nextflow: 23.04.2
Hardware: HPC
Executor: slurm
Container: Singularity
OS: RHEL8
nf-core/chipseq: 2.0.0