This issue is for discussing themes of the perspective before we begin outlining sections. The goal is not to systematically review machine learning methods; there are already recent high-quality reviews we can refer to. Instead, we want to provide targeted discussion of best practices based on our observations from the literature and our own experiences. If there are gaps in the literature, we also want to identify them to shape future research.
A few initial ideas:
What does it mean to be in the "low N" regime? When does one have enough sequence-function data that special modeling techniques are no longer needed or have diminishing returns?
How should we choose the initial sequences (training points) to test? What are people actually doing versus what options are available? Does this vary based on the protein function or engineering goals?
What are the best ways to evaluate success in the low N setting? What is actually being done in the literature (metrics, training set sizes, datasets, etc.), and do we agree that it gives an effective assessment of what will work in practice? If not, what would be better evaluation practices? Does the community need a low N FLIP?
What are the broad computational strategies that are applicable when there is little to no labeled sequence-function data initially? Active learning, reinforcement learning, Bayesian optimization, single round training/prediction, etc. have all been applied. We may not give equal attention to all of these, but when would an experimentalist prefer one class of approach over others?