should we store stats as YAML (or json) #129

sergpolly · 2020-01-31T21:16:26Z

sergpolly
Jan 31, 2020
Maintainer

that's how we store stats now:

total_mapped    2189618376
total_nodups    1753432070
cis     1533122797
...
pair_types/WW   88884
pair_types/MU   404456330
...
cis_1kb+        998076606
cis_2kb+        836035718
...
chrom_freq/chr1/chr1    137125332
chrom_freq/chr1/chr10   1791283
...

It is hard to parse that, and YAML would serve us just fine, I believe. Should we switch?
would be useful for #78

Phlya · 2020-01-31T21:23:36Z

Phlya
Jan 31, 2020
Maintainer

It's quite easy to parse, I think... Just read as a table with pandas?



sergpolly · 2020-01-31T21:28:31Z

sergpolly
Jan 31, 2020
Maintainer Author

Things like pair_types:

...
pair_types/WW   88884
pair_types/MU   404456330
...

imply nested structure - i.e. I would want to parse it as

stats = {..., "pair_types": {"WW": 88884, "MU": 404456330}, ...}

I'm not sure pandas would help with that
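
A minimal sketch of the nested parse described above, using only the standard library. It assumes each line is a key and an integer value separated by whitespace, and that `/` in a key marks a nesting level; `parse_flat_stats` is a hypothetical helper for illustration, not the function pairtools uses.

```python
def parse_flat_stats(lines):
    """Turn 'a/b/c<whitespace>value' lines into a nested dict (hypothetical helper)."""
    stats = {}
    for line in lines:
        line = line.strip()
        if not line or line == "...":
            continue  # skip blank lines and elision markers
        key, value = line.split(None, 1)
        *parents, leaf = key.split("/")
        node = stats
        for part in parents:
            # walk down, creating intermediate dicts as needed
            node = node.setdefault(part, {})
        node[leaf] = int(value)  # assumption: every value is an integer
    return stats


example = [
    "total_mapped\t2189618376",
    "pair_types/WW\t88884",
    "pair_types/MU\t404456330",
    "chrom_freq/chr1/chr1\t137125332",
    "chrom_freq/chr1/chr10\t1791283",
]
stats = parse_flat_stats(example)
print(stats["pair_types"])
```

This needs no pandas, which matters if a downstream consumer wants to avoid that dependency.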

Also, MultiQC doesn't want to rely on pandas for whatever reason; pandas isn't the smallest dependency, I guess.


sergpolly · 2020-01-31T21:35:32Z

sergpolly
Jan 31, 2020
Maintainer Author

That's how we parse a typical stats file in pairtools now: https://github.com/mirnylab/pairtools/blob/d1ddf9c39a336662f7fc725fa5a70ec68df9ba95/pairtools/pairtools_stats.py#L263

With standard YAML, which is great for storing nested dicts and various small lists, it would simply look like:

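import yaml

# yaml.load on a bare string would parse the file name itself, so open the file first
with open("sample.nodups.stats.yml") as f:
    stats_dict = yaml.safe_load(f)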

and here is the ultimate goal:
https://multiqc.info/
https://multiqc.info/examples/hi-c/multiqc_report.html
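
Since the title also mentions JSON, here is a rough sketch of the same idea with the standard-library `json` module in place of YAML. The `stats` dict below is illustrative, built from the numbers quoted earlier in the thread, and this is not how pairtools writes its output.

```python
import json

# Illustrative nested stats, using values quoted earlier in the thread
stats = {
    "total_mapped": 2189618376,
    "pair_types": {"WW": 88884, "MU": 404456330},
}

text = json.dumps(stats, indent=2)  # serialize to a human-readable string
restored = json.loads(text)         # parse it back into nested dicts
print(restored["pair_types"]["MU"])
```

The trade-off is mostly readability: JSON needs no extra dependency, while YAML is easier to read and edit by hand.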


agalitsyna · 2022-04-14T09:54:22Z

agalitsyna
Apr 14, 2022
Maintainer

I switched the pairtools stats output to YAML in version 1.0.0: https://github.com/open2c/pairtools/pull/117/files#diff-e4b8770efd538564222d48d69b00ed2c5012a76b35c926f1aba227fe45db2309

I guessed at the best way to convert some fields, e.g. reporting chromosome pairs as slash-separated keys instead of a separate dict for each chromosome:

chrom_freq:
  chr1/chr1: 3
  chr1/chr2: 1
  chr2/chr3: 1

But this is minor and you may change it in the future.
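
If a consumer prefers one dict per chromosome, the slash-separated keys can be regrouped after loading. A small sketch, assuming chromosome names never contain `/`; `split_chrom_pairs` is a hypothetical helper, not part of pairtools.

```python
def split_chrom_pairs(chrom_freq):
    """Regroup {'chrA/chrB': count} into {'chrA': {'chrB': count}} (hypothetical helper)."""
    nested = {}
    for pair, count in chrom_freq.items():
        chrom1, chrom2 = pair.split("/")  # assumption: exactly one slash per key
        nested.setdefault(chrom1, {})[chrom2] = count
    return nested


chrom_freq = {"chr1/chr1": 3, "chr1/chr2": 1, "chr2/chr3": 1}
nested = split_chrom_pairs(chrom_freq)
print(nested)
```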
