Additional columns/examples questions #35

Open
3 tasks
timosachsenberg opened this issue Jan 9, 2024 · 2 comments
Labels: documentation, enhancement, question

Comments


timosachsenberg commented Jan 9, 2024

This refers to the protein tables (ae, de); the examples may be outdated.

  • How should protein groups be handled? (Currently, each accession is always a single protein.)
  • Consider adding gene columns (my bet is that external users will request this a lot).
  • The columns in de and ae differ a lot (e.g., ae has more columns for condition, sample, etc.). Is this intentional?

timosachsenberg commented Jan 10, 2024

In https://github.com/bigbio/quantms.io/blob/dev/docs/feature.rst

  • Especially for TMT data, the representation seems very redundant. For example, it looks like TMT10plex would yield 10 features, each carrying all the metadata or even PSM information / spectrum intensities.
  • mz_array and intensity_array could be renamed to mzs and intensities (or even fragment_mzs and fragment_intensities).

@ypriverol added the documentation, enhancement, and question labels on Jan 10, 2024

jpfeuffer commented Jan 12, 2024

Regarding duplication: I raised the same concern, but the consensus was that Parquet will store only unique strings, or at least compress them anyway.
I think this should be benchmarked for the most common use cases.
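
As a rough starting point for such a benchmark, here is a minimal sketch of the effect being discussed. It uses only the Python standard library and is a proxy, not a Parquet measurement: the channel labels, the column names (`sequence`, `accession`, `channel`, `intensity`), and the synthetic data are all assumptions made for illustration, and JSON plus zlib stands in for whatever encoding and compression Parquet would actually apply. A real benchmark would write actual quantms.io feature files and compare on-disk sizes.

```python
import json
import random
import zlib

# Hypothetical channel labels for a 10-plex experiment (illustration only).
CHANNELS = [f"TMT{i}" for i in range(1, 11)]


def make_long_rows(n_features, seed=0):
    """Build synthetic long-format rows: one row per feature and channel,
    with the feature-level metadata repeated on every row."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_features):
        meta = {
            "sequence": "".join(rng.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(12)),
            "accession": f"P{rng.randrange(100000):05d}",
        }
        for channel in CHANNELS:
            intensity = round(rng.uniform(1e3, 1e6), 2)
            rows.append({**meta, "channel": channel, "intensity": intensity})
    return rows


def dictionary_encode(values):
    """Replace repeated values with integer indices into a list of uniques."""
    codes = {}
    indices = [codes.setdefault(v, len(codes)) for v in values]
    return list(codes), indices


rows = make_long_rows(200)
sequences = [row["sequence"] for row in rows]
uniques, indices = dictionary_encode(sequences)

# Serialize the column both ways and compare sizes (JSON is only a stand-in).
raw = json.dumps(sequences).encode()
encoded = json.dumps({"dictionary": uniques, "indices": indices}).encode()

print("rows:", len(rows), "unique sequences:", len(uniques))
print("plain bytes:", len(raw), "compressed:", len(zlib.compress(raw)))
print("dictionary-encoded bytes:", len(encoded), "compressed:", len(zlib.compress(encoded)))
```

The numbers this prints only show the direction of the effect for one repeated string column. They say nothing about the numeric columns or the spectrum arrays, which is where a benchmark on real files would matter most.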
