mztab exporter failing for big dataset PXD030304 #323

Closed
ypriverol opened this issue Nov 4, 2023 · 1 comment
Labels
bug Something isn't working

@ypriverol
Member

### Description of the bug

```
nf-core/quantms execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD030304.sdrf)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD030304.sdrf)` terminated with an error exit status (1)

Command executed:

  diann_convert.py convert \
      --folder ./ \
      --exp_design PXD030304.sdrf_openms_design.tsv \
      --diann_version ./version/versions.yml \
      --dia_params "40.0;ppm;40.0;ppm;Trypsin;Carbamidomethyl (C);" \
      --charge 4 \
      --missed_cleavages 1 \
      --qvalue_threshold 0.01 \
      2>&1 | tee convert_report.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
      pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:512: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'assay[6837],assay[6838],assay[6839],assay[6840],assay[6837]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:513: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:513: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'no description given' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
  2023-11-04 18:05:44,988 [mztab_PRH] - Constructing PRH sub-table...
  2023-11-04 18:05:44,988 [mztab_PRH] - Input report shape: (240052070, 23), input pg shape: (8008, 6867), input index_ref shape: (6862, 6), input fasta_df shape: (20686, 3)
  2023-11-04 18:05:47,789 [mztab_PRH] - Classifying results type ...
  2023-11-04 18:05:47,948 [mztab_PRH] - Extracting accession values (keeping first)...
  Warning: OPENMS_DATA_PATH environment variable not found and no share directory was installed. Some functionality might not work as expected.
  Traceback (most recent call last):
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1333, in <module>
      cli()
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 144, in convert
      diann_directory.convert_to_mztab(
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 310, in convert_to_mztab
      PRH = mztab_PRH(report, pg, index_ref, database, fasta_df)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 598, in mztab_PRH
      out_mztab_PRH = pd.concat([out_mztab_PRH, protein_details_df]).reset_index(drop=True)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 393, in concat
      return op.get_result()
             ^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 676, in get_result
      indexers[ax] = obj_labels.get_indexer(new_labels)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3874, in get_indexer
      raise InvalidIndexError(self._requires_unique_msg)
  pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Work dir:
  /hps/nobackup/juan/pride/reanalysis/absolute-expression/cell-lines/PXD030304/work/85/155baa81b4a6aa41867b31ddec1f9e

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
```
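
The run fails inside `mztab_PRH`, where `pd.concat([out_mztab_PRH, protein_details_df])` raises `InvalidIndexError: Reindexing only valid with uniquely valued Index objects`. When stacking rows, pandas raises this if the two frames have different columns and at least one of them repeats a column label, because a duplicated label cannot be aligned. The log does not show which label is repeated; the `assay_refs` warning above lists `assay[6837]` twice, which may hint (unconfirmed) that duplicated identifiers are reaching the PRH columns. The sketch below reproduces the error with toy frames and checks column uniqueness before concatenating; the frames and the helpers `find_duplicate_columns` and `drop_duplicate_columns` are hypothetical, not code from `diann_convert.py`.

```python
import pandas as pd


def find_duplicate_columns(frames: dict[str, pd.DataFrame]) -> dict[str, list[str]]:
    """Hypothetical helper: report repeated column labels per frame."""
    report = {}
    for name, frame in frames.items():
        if not frame.columns.is_unique:
            report[name] = frame.columns[frame.columns.duplicated()].tolist()
    return report


def drop_duplicate_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical guard: keep only the first column for each repeated label."""
    return frame.loc[:, ~frame.columns.duplicated()]


# Toy stand-ins for the two frames concatenated at diann_convert.py line 598.
# The left frame carries the label "accession" twice.
left = pd.DataFrame([["P1", "P1", 0.01]], columns=["accession", "accession", "q_value"])
right = pd.DataFrame([["P2", "protein two"]], columns=["accession", "description"])

error = None
try:
    pd.concat([left, right]).reset_index(drop=True)
except pd.errors.InvalidIndexError as err:
    # Same failure as in the log: the duplicated label cannot be aligned.
    error = err
    print("concat failed:", err)

duplicates = find_duplicate_columns({"left": left, "right": right})
print("duplicate columns:", duplicates)

# With unique labels on both sides, the row-wise concat succeeds.
combined = pd.concat(
    [drop_duplicate_columns(left), drop_duplicate_columns(right)]
).reset_index(drop=True)
print(combined)
```

Dropping repeated columns only hides the symptom, and it silently discards data if the duplicated columns differ; tracing where the repeated label is introduced is the safer fix.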

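The `FutureWarning` and `PerformanceWarning` lines earlier in the log do not stop the run, but they come from the pattern visible at `diann_convert.py:512-513`: `out_mztab_MTD.loc[1, ...]` is assigned one new column at a time inside a loop. Each assignment inserts a column, which fragments the frame, and, as the warning text says, the new column is `float64` when the string is written, hence the dtype warning. Below is a minimal sketch of the alternative the `PerformanceWarning` itself suggests, assuming the loop walks over study variables and their assay lists: gather the cells in a dictionary and join them with a single `pd.concat(axis=1)`. `study_variable_cells`, `existing_mtd`, and the toy inputs are hypothetical.

```python
import pandas as pd


def study_variable_cells(study_variables: dict[int, list[str]]) -> dict[str, str]:
    """Hypothetical helper: gather the MTD study_variable cells in a plain dict."""
    cells: dict[str, str] = {}
    for i, assays in study_variables.items():
        cells[f"study_variable[{i}]-assay_refs"] = ",".join(assays)
        cells[f"study_variable[{i}]-description"] = "no description given"
    return cells


# Toy input: study variable number -> assay references.
study_variables = {
    1: ["assay[1]", "assay[2]"],
    2: ["assay[3]", "assay[4]"],
}

# Toy stand-in for metadata already held in out_mztab_MTD (row label 1).
existing_mtd = pd.DataFrame({"mzTab-version": ["1.0.0"]}, index=[1])

# Build every new column at once, then join them in one concat(axis=1)
# instead of assigning one .loc cell per new column.
new_columns = pd.DataFrame([study_variable_cells(study_variables)], index=[1])
out_mztab_MTD = pd.concat([existing_mtd, new_columns], axis=1)

print(out_mztab_MTD.T)
```

Because each new column is created directly from string values, no all-NaN `float64` column is created first.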
### Command used and terminal output

_No response_

### Relevant files

_No response_

### System information

_No response_
@ypriverol added the bug (Something isn't working) label on Nov 4, 2023
@ypriverol
Member Author

I will close this issue in favor of bigbio/quantms.io#31.
