Documentation

Command line

This documentation provides details on the command-line arguments and options available in RNA3DB.

$ python -m rna3db [--cpu <cpus>] <command> [<args>]

    --cpu <cpus>: Number of CPUs to use when able (optional)
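
The general invocation form above can also be assembled from Python, for example when scripting several runs. The following is a minimal sketch, not part of RNA3DB: the helper name `rna3db_argv` and the file names are hypothetical, and it only builds the argument list without running anything.

```python
import sys


def rna3db_argv(command, *args, cpus=None):
    """Build the argument list for `python -m rna3db` (hypothetical helper)."""
    argv = [sys.executable, "-m", "rna3db"]
    if cpus is not None:
        # --cpu is a global option, so it goes before the subcommand.
        argv += ["--cpu", str(cpus)]
    argv.append(command)
    argv += [str(a) for a in args]
    return argv


# Example: parse a directory of mmCIF files using 4 CPUs.
# The paths "mmcifs/" and "parsed.json" are placeholders.
parse_cmd = rna3db_argv("parse", "mmcifs/", "parsed.json", cpus=4)
```

The resulting list can be handed to `subprocess.run(parse_cmd, check=True)` to execute the command.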

Parse mmCIF files and extract RNAs.

$ python -m rna3db parse <input> <output> [options]

    <input>: Directory containing mmCIF files to parse.
    <output>: Output JSON file.

--nmr_resolution <float>
- Resolution value to assign to NMR structures, which do not report a resolution of their own.
- Default: float('inf')

Filter a JSON file based on various criteria.

$ python -m rna3db filter <input> <output> [options]
    <input>: Input JSON file.
    <output>: Output JSON file.

--single_ratio_cutoff <float>
- Filter chains where a single nucleotide makes up more than this fraction of residues.
- Default: 0.8
--max_unknown_ratio <float>
- Filter chains with more than this fraction of unknown nucleotides.
- Default: 0.3
--max_resolution <float>
- Filter chains whose resolution value is greater (that is, worse) than this cutoff.
- Resolution is given in ångströms (Å).
- Default: 9.0
--min_length <int>
- Filter chains shorter than this length.
- Default: 32
--filter_log_path <path>
- Path to the filter log. The filter log shows which filters hit each sequence.
- Optional
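
To make the four filter criteria concrete, here is a minimal sketch of how they could be evaluated for one chain. It is an illustration under stated assumptions, not RNA3DB's actual implementation: the function name `failed_filters` is hypothetical, unknown nucleotides are assumed to be written as `N`, and all comparisons are assumed to be strict.

```python
from collections import Counter


def failed_filters(sequence, resolution, *, single_ratio_cutoff=0.8,
                   max_unknown_ratio=0.3, max_resolution=9.0,
                   min_length=32, unknown="N"):
    """Return the names of the filters a chain would fail (hypothetical sketch)."""
    failed = []
    n = len(sequence)
    if n < min_length:
        failed.append("min_length")
    if resolution > max_resolution:
        failed.append("max_resolution")
    if n > 0:
        counts = Counter(sequence)
        # Fraction of residues that are unknown (assumed to be "N").
        if counts[unknown] / n > max_unknown_ratio:
            failed.append("max_unknown_ratio")
        # Fraction of residues taken up by the single most common symbol.
        if max(counts.values()) / n > single_ratio_cutoff:
            failed.append("single_ratio_cutoff")
    return failed


ok_chain = failed_filters("ACGU" * 10, resolution=3.0)
poly_a = failed_filters("A" * 40, resolution=2.0)
short_and_coarse = failed_filters("ACGU" * 4, resolution=12.0)
```

A chain passes only when the returned list is empty. The list of failed filter names is the kind of per-sequence information the filter log is described as recording.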

Cluster RNAs by sequence and structure similarity.

$ python -m rna3db cluster <input> <output> [options]
    <input>: Input JSON file.
    <output>: Output JSON file.

--tbl_dir <path>
- Directory containing Infernal .tbl files.
- Not required when using --only_sequence
--min_seq_id <float>
- Minimum sequence identity.
- See: MMseqs2: clustering criteria
- Default: 0.99
--min_seq_coverage <float>
- Minimum sequence coverage.
- See: MMseqs2: clustering criteria
- Default: 0.99
--mmseqs_binary_path <path>
- Path to MMseqs2 binary.
- Can usually be inferred via $ which mmseqs, but may need to be provided if RNA3DB is unable to find a suitable path.
- Optional
--mmseqs_coverage_mode <int>
- MMseqs2 coverage mode.
- See: MMseqs2: How to set the right alignment coverage to cluster
- Default: 1
--mmseqs_sensitivity <float>
- MMseqs2 sensitivity.
- See: MMseqs2: Optimizing sensitivity and consumption of resources
- Default: 7.5
--mmseqs_alignment_mode <int>
- MMseqs2 alignment mode.
- See: MMseqs2: Optimizing sensitivity and consumption of resources
- Default: 3
--structural_e_value_cutoff <float>
- Structural E-value cutoff used when building graph edges.
- Default: 1.0
--only_sequence
- Use only sequence information.
- Mutually exclusive with --only_structure
--only_structure
- Use only structure information.
- Mutually exclusive with --only_sequence
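
The role of --structural_e_value_cutoff can be pictured with a small graph example. The sketch below is an assumption about the general technique (thresholded edges followed by connected components), not RNA3DB's code: the function name, the hit tuples, and the strict `<` comparison are all hypothetical.

```python
from collections import defaultdict


def cluster_components(nodes, hits, e_value_cutoff=1.0):
    """Group nodes into connected components (hypothetical sketch).

    `hits` is an iterable of (node_a, node_b, e_value) tuples. An edge is
    added only when the E-value is below the cutoff (assumed strict).
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, e_value in hits:
        if e_value < e_value_cutoff:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Largest component first.
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)


components = cluster_components(
    ["a", "b", "c", "d"],
    [("a", "b", 1e-5), ("b", "c", 0.5), ("c", "d", 3.0)],
)
```

With the default cutoff of 1.0, the hit between `c` and `d` (E-value 3.0) creates no edge, so `d` ends up alone. Lowering the cutoff removes more edges and tends to produce more, smaller components.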

Split RNA data into training and test sets.

$ python -m rna3db split <input> <output> [options]
    <input>: Input JSON file.
    <output>: Output JSON file.

--train_percentage <float>
- Fraction of the data to place in the train set, given as a value between 0 and 1 (for example, 0.3 means 30%).
- Default: 0.3
--force_zero_test
- Force component zero into the test set.
- Optional
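
How whole components might be divided between the two sets can be sketched as follows. This is only an illustration: the assignment procedure RNA3DB actually uses is not described here, so the greedy rule, the function name `split_components`, and the `component_0` naming are all assumptions made for the example.

```python
def split_components(component_sizes, train_fraction=0.3, force_zero_test=False):
    """Assign whole components to train or test (hypothetical greedy sketch).

    `component_sizes` maps a component name to its number of sequences.
    Components are added to the train set, in order, while the train set
    stays within `train_fraction` of the total; the rest go to test.
    """
    total = sum(component_sizes.values())
    target = train_fraction * total
    train, test = [], []
    train_size = 0
    for name, size in component_sizes.items():
        if force_zero_test and name == "component_0":
            # Assumed meaning of --force_zero_test: component zero is
            # always placed in the test set.
            test.append(name)
        elif train_size + size <= target:
            train.append(name)
            train_size += size
        else:
            test.append(name)
    return train, test


sizes = {"component_0": 10, "component_1": 15, "component_2": 75}
default_split = split_components(sizes)
forced_split = split_components(sizes, force_zero_test=True)
```

In this toy input, forcing component zero into the test set moves it out of the train set. Because components are never divided, the achieved train fraction can fall short of the requested one.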