Replies: 4 comments
-
It's quite easy to parse, I think... Just read as a table with pandas?
On Fri, Jan 31, 2020, 21:16 Sergey Venev <***@***.***> wrote:
That's how we store stats now:
total_mapped 2189618376
total_nodups 1753432070
cis 1533122797
...
pair_types/WW 88884
pair_types/MU 404456330
...
cis_1kb+ 998076606
cis_2kb+ 836035718
...
chrom_freq/chr1/chr1 137125332
chrom_freq/chr1/chr10 1791283
...
It is hard to parse that, and YAML would serve us just fine, I believe. Should we switch?
It would be useful for #78.
(GitHub thread #79)
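Following the pandas suggestion, here is a minimal sketch of reading the current format as a two-column table. It assumes each line is a key, some whitespace, then an integer count (the real file is probably tab-separated), and it only uses the excerpt quoted above; `read_stats_table` is a hypothetical helper name.

```python
import io

import pandas as pd

# Excerpt of the current stats format from this thread ("..." lines omitted).
# Assumption: "<key><whitespace><count>" per line; the real file may use tabs.
STATS_TEXT = """\
total_mapped\t2189618376
total_nodups\t1753432070
cis\t1533122797
pair_types/WW\t88884
pair_types/MU\t404456330
cis_1kb+\t998076606
cis_2kb+\t836035718
chrom_freq/chr1/chr1\t137125332
chrom_freq/chr1/chr10\t1791283
"""


def read_stats_table(text: str) -> pd.Series:
    """Hypothetical helper: read flat key/count lines into a Series indexed by key."""
    table = pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",  # any run of whitespace separates key and count
        header=None,
        names=["key", "value"],
        index_col="key",
    )
    return table["value"]


stats = read_stats_table(STATS_TEXT)
print(stats["pair_types/WW"])  # 88884
# Keys stay flat, so grouping e.g. all pair types needs string filtering:
print(stats[stats.index.str.startswith("pair_types/")])
```

This works, but every key stays flat (`pair_types/WW`), so any grouping has to be done by filtering on the key strings.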
-
Things like pair_types/... imply a nested structure, i.e. I would want to parse it as stats = {..., "pair_types": {"WW": 8884, "MU": 40404000}, ...}. I'm not sure, though.
Also, for MultiQC: they don't want to rely on pandas, for whatever reason. pandas isn't the smallest dependency, I guess.
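To illustrate that nested reading without pandas, a minimal stdlib-only sketch that splits every key on `/` and builds nested dicts. `parse_nested_stats` is a hypothetical name, and it assumes no key is both a value and a prefix of another key (e.g. `cis` and `cis/...`), which holds for the excerpt in this thread.

```python
# Hypothetical helper (not part of pairtools): parse "key<whitespace>count" lines,
# splitting each key on "/" into nested dicts.
# Assumes no key is both a leaf and a prefix of another key (e.g. "cis" and "cis/...").


def parse_nested_stats(text: str) -> dict:
    stats: dict = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(maxsplit=1)
        *parents, leaf = key.split("/")
        node = stats
        for part in parents:
            # Create intermediate dicts on first use, e.g. stats["pair_types"] = {}.
            node = node.setdefault(part, {})
        node[leaf] = int(value)
    return stats


STATS_TEXT = """\
total_mapped\t2189618376
cis\t1533122797
pair_types/WW\t88884
pair_types/MU\t404456330
chrom_freq/chr1/chr1\t137125332
chrom_freq/chr1/chr10\t1791283
"""

stats = parse_nested_stats(STATS_TEXT)
print(stats["pair_types"])  # {'WW': 88884, 'MU': 404456330}
```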
-
That's how we parse a typical stats file in pairtools now: https://github.com/mirnylab/pairtools/blob/d1ddf9c39a336662f7fc725fa5a70ec68df9ba95/pairtools/pairtools_stats.py#L263. With standard YAML, which is great for storing nested dicts and various small lists, it would simply look like:
And here is the ultimate goal:
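A minimal sketch of the YAML round trip being proposed, assuming PyYAML is an acceptable dependency (`yaml.safe_dump` and `yaml.safe_load` are its standard calls). The dict reuses numbers from the stats excerpt in this thread and leaves out `chrom_freq`.

```python
import yaml  # PyYAML, assumed to be an acceptable dependency

# Nested stats as a plain dict; numbers taken from the excerpt quoted in this thread.
stats = {
    "total_mapped": 2189618376,
    "total_nodups": 1753432070,
    "cis": 1533122797,
    "pair_types": {"WW": 88884, "MU": 404456330},
    "cis_1kb+": 998076606,
    "cis_2kb+": 836035718,
}

# Block style writes one "key: value" per line, with nested sections indented.
text = yaml.safe_dump(stats, sort_keys=False, default_flow_style=False)
print(text)

# Reading it back needs no custom parser and restores the nesting.
print(yaml.safe_load(text) == stats)  # True
```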
-
I updated it, guessing the best way to convert some fields, e.g. reporting chromosome pairs joined by a slash instead of a separate dict for each chromosome:
But this is minor, and you may change it in the future.
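One possible way to get that slash-joined layout when reading the old flat format: nest only on the first `/`, so `chrom_freq/chr1/chr10` becomes `stats["chrom_freq"]["chr1/chr10"]`. This is a hypothetical sketch (`parse_stats_one_level` is not from pairtools), not the code from the update.

```python
# Hypothetical helper (not the code from the update): split each key only on the
# first "/", so chromosome pairs stay joined, e.g. "chr1/chr10".


def parse_stats_one_level(text: str) -> dict:
    stats: dict = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(maxsplit=1)
        section, sep, subkey = key.partition("/")
        if sep:
            # "chrom_freq/chr1/chr10" -> stats["chrom_freq"]["chr1/chr10"]
            stats.setdefault(section, {})[subkey] = int(value)
        else:
            stats[key] = int(value)  # top-level counter such as "cis"
    return stats


STATS_TEXT = """\
cis\t1533122797
pair_types/WW\t88884
chrom_freq/chr1/chr1\t137125332
chrom_freq/chr1/chr10\t1791283
"""

stats = parse_stats_one_level(STATS_TEXT)
print(stats["chrom_freq"])  # {'chr1/chr1': 137125332, 'chr1/chr10': 1791283}
```

Keeping the pair as a single key avoids one dict per chromosome, at the cost of having to split on `/` again if per-chromosome grouping is needed later.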