Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Molecule investigation #59

Open
wants to merge 3 commits into
base: update
Choose a base branch
from

Conversation

ujamshed
Copy link
Collaborator

Currently there are 313 molecules that lack PCA or UMAP data. This causes issues on the front end because the assumption was that all molecules should have that data. Additionally, of the 313 molecules that lack PCA or UMAP data, 78 of them technically do have that data but as duplicates. This PR contains an investigation into this issue.

The front end issues have been resolved but we still need to tackle the duplicate issue in the DB.

The jupyter notebook shows all the information and the .pkl file contains the information about the duplicates.

IN PROGRESS

@ujamshed ujamshed requested a review from janash October 14, 2023 20:51
@janash
Copy link
Member

janash commented Nov 6, 2023

Thanks for looking into this! I am going to look over the duplicates that you've identified. We should think of a way to identify them, then also make sure our molecule table index is based on canonical smiles or something else that we will not duplicate.

I'm not sure about implementing the fix on the frontend/backend that is in this PR. The fix is based on flawed data, so I think it would be better to address the underlying data issue than to put a fix on the frontend/backend. What are your thoughts?

@ujamshed
Copy link
Collaborator Author

Yes, the fix for the front-end in this PR was just a hotfix because some of the searches would break the front end (on the neighbors page) or show incorrect data on the molecule page (in the neighbors graph). Depending on when we can fix the data dupes it might make sense to incorporate it and then remove it after once the data is correct. Alternatively, it might be nice to still have it as a sanity check so that even if data is missing in the backend or is incorrect, the front end will be able to handle it (for other data libraries).

@janash
Copy link
Member

janash commented Nov 13, 2023

Sounds good! Would you mind removing the exploratory notebooks/pkl and maybe put it on another repository for me to pull? I don't want to commit it to the main repo. I'll approve and merge after that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants