Code for the paper "High-Throughput DNA melt measurements enable improved models of DNA folding thermodynamics" https://doi.org/10.1101/2024.01.08.574731. "NNN" stands for "Not-Nearest-Neighbor"
Jupyter notebooks 01.1 to 01.5 correspond to the 5 main figures.
Notebook 01.0_DataPrep.ipynb
performs data cleaning and train-val-test split from the output of the preprocessing pipeline.
Functions used for generation of figures are defined in nnn/
.
Three major conda environments were used:
- `nnn.yml` most analysis in the repository
- `torch.yml` for training and running graph neural networks
- `nn_train.yml` for fitting and running linear regression models.
Also available as a singularity container as defined in `nn_train.def`.
To install, make sure conda
is already installed, then run conda create -f {path/to/yml/file/name}
. For example, conda create -f envs/nnn.yml
.
The local conda environment nnn
is also directly exported to envs/nnn_environment.yml
. It is provide for record keeping, and envs/nnn.yml
is still recommended for installing from scratch.
Package RiboGraphViz
needs to be installed from cloned github repository, as directed at https://github.com/DasLab/RiboGraphViz.
NUPACK4 (v4.0.0.27) was manually installed from file as it requires a free liscence for download (https://docs.nupack.org/). As of Oct. 2024, NUPACK is temporarily free for academic personal users but may need paid subscription in the future.
- Register a new account to get an academic liscence. Verify your email.
- After logging in and acknowledging the liscence, you will find download links at https://www.nupack.org/download/software. Click
NUPACK 4.0.0.28
to download the zip file. - Unzip the zip file. Go to the directory
nupack-4.0.0.28/package
. - Choose one of the four
cp38
wheel files depending on your operating system. For example,nupack-4.0.0.28-cp38-cp38-macosx_10_13_x86_64.whl
on macos. - Run
conda activate nnn
to activate thennn
environment. - Run
pip install {path/to/your/nupack/whl/file}
to install the NUPACK python module.
Python scripts in scripts/
generates the sequences in the variant library and are helpful to understand library design logics.
Run gnn_run.py
in torch
envoronment, pointing to the path of the saved model state dict file.
For any questions, contact Yuxi Ke (kyx@stanford.edu).
Jan. 2024, updated Oct. 2024