Manuscript themes #1

Open
agitter opened this issue Aug 30, 2024 · 0 comments

agitter commented Aug 30, 2024

This issue is for discussing themes of the perspective before we begin outlining sections. The goal is not to systematically review machine learning methods; there are already recent high-quality reviews we can refer to. We instead want to provide targeted discussion of best practices based on our observations from the literature and our own experiences. If there are gaps in the literature, we also want to identify them to shape future research.

A few initial ideas:

  • What does it mean to be in the "low N" regime? When does one have enough sequence-function data that special modeling techniques are no longer needed or have diminishing returns?
  • How should one choose the initial sequences (training points) to test? What are people actually doing versus what options are available? Does this vary based on the protein function or engineering goals?
  • What are the best ways to evaluate success in the low N setting? What is actually being done in the literature (metrics, training set sizes, datasets, etc.), and do we agree that it gives an effective assessment of what will work in practice? If not, what would be better evaluation practices? Does the community need a low N FLIP?
  • What are the broad computational strategies that are applicable when there is initially little to no labeled sequence-function data? Active learning, reinforcement learning, Bayesian optimization, single-round training/prediction, etc. have all been applied. We may not give equal attention to all of these, but when would an experimentalist prefer one class of approach over the others?