-
Notifications
You must be signed in to change notification settings - Fork 7
Home
-
While there is a recent surge in applying deep learning to RNA structure prediction, domain experts have raised concerns about generalization and current trends in benchmarking.
-
Many of the concerns primarily relate to how novel RNA families--i.e. families unseen in the training set--are benchmarked, and whether the models are effective at handling such cases. Performance on benchmarks reflective of real-world applications, such as CASP15, is poor for RNA deep learning models.
-
We present a dataset--RNA3DB--that is designed for training and benchmarking deep learning models for RNA structure prediction. RNA3DB provides coverage of all RNA chains found in the Protein Data Bank (PDB).
-
RNA3DB is clustered into groups that are both sequentially and structurally non-redundant, providing a robust way of creating training, validation, and testing sets for deep learning models. Along with the dataset, we also provide a transparent methodology as well as the source-code, making our tool both reproducible and customizable.
-
Building RNA3DB from Scratch: A comprehensive guide on how to build the RNA3DB dataset from the ground up.
-
Documentation: Detailed documentation of the command-line tools offered by RNA3DB for building custom datasets.
-
Understanding the JSON Files: Documentation of the JSON file format used in RNA3DB, guiding you through the information contained within these files and how to effectively utilize them in your projects.