Manuscript themes #1

agitter · 2024-08-30T14:05:39Z

This issue is for discussing themes of the perspective before we begin outlining sections. The goal is not to systematically review machine learning methods, since there are already recent high-quality reviews we can refer to. Instead, we want to provide targeted discussion of best practices based on our observations from the literature and our own experiences. If there are gaps in the literature, we also want to identify them to help shape future research.

A few initial ideas:

What does it mean to be in the "low N" regime? When does one have enough sequence-function data that special modeling techniques are no longer needed or have diminishing returns?
How should one choose the initial sequences (training points) to test? What are people actually doing versus what options are available? Does this vary based on the protein function or engineering goals?
What are the best ways to evaluate success in the low N setting? What is actually being done in the literature (metrics, training set sizes, datasets, etc.), and do we agree that it gives an effective assessment of what will work in practice? If not, what would be better evaluation practices? Does the community need a low N FLIP?
What are the broad computational strategies that are applicable when there is initially little to no labeled sequence-function data? Active learning, reinforcement learning, Bayesian optimization, single-round training/prediction, etc. have all been applied. We may not give equal attention to all of these, but when would an experimentalist prefer one class of approach over the others?
