Skip to content
Marcell Szikszai edited this page Mar 6, 2024 · 2 revisions

Welcome to the RNA3DB Wiki

Project Highlights

  • While there is a recent surge in applying deep learning to RNA structure prediction, domain experts have raised concerns about generalization and current trends in benchmarking.

  • Many of the concerns primarily relate to how novel RNA families--i.e. families unseen in the training set--are benchmarked, and whether the models are effective at handling such cases. Performance on benchmarks reflective of real-world applications, such as CASP15, is poor for RNA deep learning models.

  • We present a dataset--RNA3DB--that is designed for training and benchmarking deep learning models for RNA structure prediction. RNA3DB provides coverage of all RNA chains found in the Protein Data Bank (PDB).

  • RNA3DB is clustered into groups that are both sequentially and structurally non-redundant, providing a robust way of creating training, validation, and testing sets for deep learning models. Along with the dataset, we also provide a transparent methodology as well as the source-code, making our tool both reproducible and customizable.

Wiki Contents

  • Building RNA3DB from Scratch: A comprehensive guide on how to build the RNA3DB dataset from the ground up.

  • Documentation: Detailed documentation of the command-line tools offered by RNA3DB for building custom datasets.

  • Understanding the JSON Files: Documentation of the JSON file format used in RNA3DB, guiding you through the information contained within these files and how to effectively utilize them in your projects.

Clone this wiki locally