examples to run? #6

jianshu93 · 2023-12-02T02:03:48Z

Hello Team,

I am trying to run the Perl script on many sequences. Is there an example input for it? (I have already successfully compiled everything needed.) I want to compare the OMH-estimated Jaccard index with the exact Jaccard index for 1000 sequences, all versus all.

Thanks,

Jianshu

danfdeblasio · 2023-12-04T16:39:39Z

Hi Jianshu,

Happy to hear you're looking to use OMH, thanks for pointing out this issue. I think it stems mainly from some missing documentation.

I made the random sequences README a little more verbose, and I hope it helps. At the last minute we added the k-mer size to the input of the Perl script, and this was not documented. The script expects its input file to be the captured standard output of the generator Python script, but as long as it's a tab-delimited list of pairs of sequences, it should read it fine and output a list of OMH values (in the 4th column of another tab-delimited file).

As an example, the output from generate_random_pairs.py would be something like (here this is k=5, n=10 with --trim on):

AACAACACCC AACACAAACC 4 4 6 0.09090909090909091 0.09090909090909091 3
ACACCAACAC ACCCCAAACC 4 4 5 0.0 0.0 3
CAACAAACCC CACAACACAA 6 1 6 0.1 0.1 4
AAACCACA00 AACCCAACAC 6 1 5 0.0 0.0 4
AACAACAAAA AACCCCAAAC 6 1 4 0.0 0.0 2
ACCAAAACCC ACCACCAACC 4 4 5 0.09090909090909091 0.09090909090909091 3
CACACACACA CCCCCAACAC 6 1 5 0.0 0.0 5
ACACCCACAA ACCCACCAC0 6 1 6 0.2 0.2 5
ACACAACAA0 ACACCAACAC 4 4 7 0.09090909090909091 0.09090909090909091 3
AAACCCCCAA AACCCCCA00 4 4 8 0.5 0.5 3
...

Then the output of the Perl script is something like (again with k=5 specified):

4 0.09090909090909091 0.09090909090909091 0 3
4 0.0 0.0 0 3
6 0.1 0.1 0 4
6 0.0 0.0 0 4
6 0.0 0.0 0 2
4 0.09090909090909091 0.09090909090909091 0 3
6 0.0 0.0 0 5
6 0.2 0.2 0.0222222 5
4 0.09090909090909091 0.09090909090909091 0 3
4 0.5 0.5 0.214286 3
...

The columns other than the 4th are copied from the input (those values are computed in the Python script).
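
To pull the OMH values out of that output for plotting or comparison, a minimal Python sketch could look like the following. It assumes the file is tab-delimited with the OMH value in the 4th column, as described above. The function name `read_omh_column` is made up for illustration and is not part of the repository, and the sample rows are the last two rows shown above with tabs assumed between fields.

```python
def read_omh_column(lines, column=3):
    """Collect the values found in one tab-delimited column (0-based index).

    Assumption: the OMH estimate sits in the 4th column (index 3).
    Rows that are too short or hold a non-numeric value there are skipped.
    """
    values = []
    for line in lines:
        # Strip only the line ending so empty leading/trailing fields survive.
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= column:
            continue
        try:
            values.append(float(fields[column]))
        except ValueError:
            continue
    return values


if __name__ == "__main__":
    sample = [
        "6\t0.2\t0.2\t0.0222222\t5\n",
        "4\t0.5\t0.5\t0.214286\t3\n",
    ]
    print(read_omh_column(sample))
```

Because the column is selected by position, rows with empty fields around it (for example when only sequence pairs were given as input) should still be read, as long as the OMH value stays in the 4th column.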

So if you were to input just sequence pairs, you would have extraneous tabs in the output, but it should work.
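
For the all-versus-all comparison asked about at the top of the thread, a rough Python sketch for building such a pairs file, together with an exact Jaccard value to compare against, might look like this. It assumes plain set Jaccard over k-mers, which may not match the definition used by the generator script in every case, and the helper names (`kmer_set`, `exact_jaccard`, `all_pairs`) are hypothetical rather than anything from the repository.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exact_jaccard(a, b, k):
    """Plain set Jaccard of the k-mer sets of a and b (an assumed definition)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def all_pairs(seqs, k):
    """Yield (seq_a, seq_b, exact_jaccard) for every unordered pair."""
    for a, b in combinations(seqs, 2):
        yield a, b, exact_jaccard(a, b, k)


if __name__ == "__main__":
    seqs = ["AACAACACCC", "AACACAAACC", "ACACCAACAC"]
    for a, b, j in all_pairs(seqs, k=5):
        # Two tab-delimited sequence columns go to the pairs file;
        # the exact Jaccard is kept separately for the later comparison.
        print(f"{a}\t{b}")
```

Redirecting the printed lines to a file gives a tab-delimited list of sequence pairs. For 1000 sequences this produces 499,500 pairs.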

Let me know if you have additional questions.

Dan

danfdeblasio self-assigned this Dec 4, 2023
