Commit

update with final-ish notebooks for workshop
janash committed Jul 10, 2024
1 parent 4ad6305 commit e76f322
Showing 21 changed files with 2,533 additions and 698 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -160,3 +160,8 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

ligands/
ligands_to_dock/
protein_structures/
pdbqt/
75 changes: 24 additions & 51 deletions EC_class_ligands_search.ipynb
@@ -7,25 +7,20 @@
"source": [
"# Enzyme Commission Class with Ligands\n",
"\n",
"\n",
"`````{admonition} Overview\n",
":class: overview\n",
"\n",
"Questions\n",
"<div class=\"alert alert-block alert-info\"> \n",
"<h2>Overview</h2>\n",
" \n",
"<strong>Questions</strong>\n",
"\n",
"* How are enzymes classified?\n",
"\n",
"* How can I search the PDB for ligands that bind to a specific enzyme class?\n",
"\n",
"\n",
"Objectives\n",
"<strong>Learning Objectives</strong>\n",
"\n",
"* Understand the hierarchical classification of enzymes using the Enzyme Commission (EC) system.\n",
"\n",
"* Use the RCSB Search API to find ligands that bind to a specific enzyme class.\n",
"\n",
"`````\n",
"\n",
"</div>\n",
"Enzymes ar\n",
"Enzymes are biological catalysts and most enzymes are proteins (at least that's our current thinking). \n",
"To systematize the study of enzymes, IUPAC (the International Union of Pure and Applied Chemistry) has organized enzymes in a hierarchical class structure, with 7 top level classes and a total of 4 levels in the hierarchy. \n",
"\n",
@@ -111,7 +106,7 @@
"- EC Class. I will focus on the EC class for trypsin, 3.4.21.4, but any class should work.\n",
"- Ligands. I am looking for ligands that are larger than a single atom (e.g., potassium ion) or a buffer molecule (phosphate), but of a size that consists of 10-30 heavy atoms, so I will aim for a molecular weight between 300 and 800.\n",
"\n",
"Please note that you can use this interface to search for dozens of attributes associated with a PDB entry. The attribute that we will use to look for proteins that have the EC# = 3.4.21.4 is `rcsb_polymer_entity.rcsb_ec_lineage.id`. Other searchable attributes include the abbreviated journal title for the primary citation, `rcsb_primary_citation.rcsb_journal_abbrev`, the method used to determine the structure `exptl.method`, or specific molecules that are part of PDB entries `pdbx_reference_molecule.class'. "
"Please note that you can use this interface to search for dozens of attributes associated with a PDB entry. The attribute that we will use to look for proteins that have the EC# = 3.4.21.4 is `rcsb_polymer_entity.rcsb_ec_lineage.id`. Other searchable attributes include the abbreviated journal title for the primary citation, `rcsb_primary_citation.rcsb_journal_abbrev`, the method used to determine the structure `exptl.method`, or specific molecules that are part of PDB entries `pdbx_reference_molecule.class`. "
]
},
{
@@ -125,17 +120,17 @@
"\n",
"ECnumber = \"3.4.21.4\" # We will use this variable again later\n",
"\n",
"q1 = attrs.rcsb_polymer_entity.rcsb_ec_lineage.id == ECnumber # looking for trypsins\n",
"q2 = attrs.chem_comp.formula_weight >= 300 # setting the lower limit for molecular weight\n",
"q3 = attrs.chem_comp.formula_weight <= 800 # setting the upper limit for molecular weight\n",
"q1 = attrs.rcsb_polymer_entity.rcsb_ec_lineage.id == ECnumber # looking for trypsin structures with EC = 3.4.21.4\n",
"q2 = attrs.chem_comp.formula_weight >= # setting the lower limit for molecular weight\n",
"q3 = attrs.chem_comp.formula_weight <= # setting the upper limit for molecular weight\n",
"\n",
"query = q1 & q2 & q3 # combining the three queries into one\n",
"\n",
"resultL = list(query()) # assign the results of the query to a list variable\n",
"\n",
"print(resultL[0:10]) # list the first 10 results\n",
"\n",
"len(resultL)"
"print(\"There are\", len(resultL), \"trypsin structures that contain ligands in the RCSB PDB.\")"
]
},
{
@@ -156,7 +151,7 @@
"\n",
"The last statement in the previous cell\n",
"\n",
"`len(resultL)`\n",
"`print(\"There are\", len(resultL), \"trypsin structures that contain ligands in the RCSB PDB.\")`\n",
"\n",
"tells us how many PDB entries have ligands of that size. The default return item for the query is `structure`, which provides the four character alphanumeric entry for the full structure in the PDB. We want to identify and download the ligands that are bound to these PDB structures, so we need to switch return types. \n",
"\n",
@@ -170,7 +165,7 @@
"metadata": {},
"outputs": [],
"source": [
"molResultL = list(query(\"mol_definition\"))\n",
"molResultL = list(query(\"\"))\n",
"print(\"There are\",len(molResultL), \"ligands for EC Number\", ECnumber, \"in this list. Here is a list of the first 10 ligands.\")\n",
"molResultL[0:10]"
]
@@ -188,7 +183,7 @@
"\n",
"![Small molecule file formats that can be downloaded from the RCSB PDB](images/SmallMoleculeFilesTable.png \"a title\")\n",
"\n",
"From this table, we want the ligand files in mol2 format, which we will later convert to another format called `pdbqt` for docking."
"From this table, we want the ideal coordinate ligand files in mol2 format, which we will later convert to another format called `pdbqt` for docking."
]
},
{
@@ -208,8 +203,8 @@
"metadata": {},
"outputs": [],
"source": [
"import requests # to enable us to pull files from the PDB\n",
"import os # to enable us to create a directory to store the files"
"import # to enable us to pull files from the PDB\n",
"import # to enable us to create a directory to store the files"
]
},
{
@@ -233,7 +228,7 @@
"source": [
"# check to see that the file downloaded properly. A status code of 200 means everything is okay.\n",
"\n",
"res11U_mol2.status_code"
"res11U_mol2"
]
},
{
@@ -246,7 +241,7 @@
"# To really be sure, let's look at the file one line at a time. First we write the downloaded content to a file.\n",
"\n",
"# make a ligands folder for our results\n",
"os.makedirs(\"ligands\", exist_ok=True)\n",
"os.makedirs(\"\", exist_ok=True)\n",
"\n",
"with open(\"ligands/res11U.mol2\", \"w+\") as file:\n",
" file.write(res11U_mol2.text)"
@@ -265,7 +260,7 @@
"file1 = open('ligands/res11U.mol2', 'r')\n",
"file_text = file1.read() # This reads in the file as a string.\n",
"\n",
"print(file_text)"
"print()"
]
},
{
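The cells in this part of the diff write the downloaded ligand text to `ligands/res11U.mol2` and then read it back. A minimal sketch of that round trip follows. The helper names `save_ligand_text` and `read_ligand_text` are hypothetical and do not appear in the notebook, and the sample text is a placeholder rather than real mol2 content. The sketch uses `with` blocks so each file handle is closed automatically.

```python
import os
import tempfile


def save_ligand_text(ligand_id, text, folder="ligands"):
    """Write ligand text to <folder>/res<ligand_id>.mol2 and return the path.

    Hypothetical helper; the file name pattern mirrors ligands/res11U.mol2.
    """
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    path = os.path.join(folder, f"res{ligand_id}.mol2")
    with open(path, "w") as handle:
        handle.write(text)
    return path


def read_ligand_text(path):
    """Read a ligand file back as a single string."""
    with open(path, "r") as handle:
        return handle.read()


if __name__ == "__main__":
    # Use a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_ligand_text("11U", "placeholder text\n", folder=os.path.join(tmp, "ligands"))
        print(read_ligand_text(saved))
```

In the notebook itself, the text passed in would be `res11U_mol2.text` from the download cell.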
@@ -321,35 +316,13 @@
"id": "26026341",
"metadata": {},
"source": [
"### Exercise\n",
"<div class=\"alert alert-block alert-warning\"> \n",
"<h3>Exercise</h3>\n",
"\n",
"To go a bit deeper with these tools, use the [BRENDA Enzyme Database](https://www.brenda-enzymes.org/) to find the EC# for alcohol dehydrogenase (or look for an enzyme that interests you). How many structures have ligands with molecular weights between 400 and 700? How many unique ligands are bound to these structures? \n",
"\n",
"Note: You can enter only the upper levels of an EC Class to identify more ligands. This exercise can be repeated with any EC#. If you have time, try a broader search where you use only 2 or 3 levels, e.g., 3.4 or 3.4.21, and see what you find."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b9106d8",
"metadata": {},
"outputs": [],
"source": [
"### Solution\n",
"\n",
"ECnumber = \"1.1.1.1\" # We will use this variable again later\n",
"\n",
"q1 = attrs.rcsb_polymer_entity.rcsb_ec_lineage.id == ECnumber # looking for trypsins\n",
"q2 = attrs.chem_comp.formula_weight >= 400 # setting the lower limit for molecular weight\n",
"q3 = attrs.chem_comp.formula_weight <= 700 # setting the upper limit for molecular weight\n",
"\n",
"query = q1 & q2 & q3 # combining the three queries into one\n",
"\n",
"ResultL = list(query(\"entry\"))\n",
"molResultL = list(query(\"mol_definition\"))\n",
"print(\"There are\",len(ResultL), \"structures from EC Number\", ECnumber, \"that have bound ligands with molecular weights between 400 and 700).\")\n",
"print(\"There are\",len(molResultL), \"unique ligands for structures with EC Number\", ECnumber, \"in this list. Here is a list of the\", len(molResultL), \"ligands.\")\n",
"molResultL"
"Note: You can enter only the upper levels of an EC Class to identify more ligands. This exercise can be repeated with any EC#. If you have time, try a broader search where you use only 2 or 3 levels, e.g., 3.4 or 3.4.21, and see what you find.\n",
"</div>"
]
},
{
Expand Down
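The exercise note in the notebook suggests broadening a search by keeping only the upper levels of an EC number, e.g., 3.4 or 3.4.21. A minimal sketch of how one full EC number expands into those broader classes follows. `ec_lineage` is a hypothetical helper written for illustration. It is not part of the notebook or of the RCSB search package.

```python
def ec_lineage(ec_number):
    """Return the EC classes from the top level down to the given number.

    Hypothetical helper: accepts 1 to 4 dot-separated levels, matching the
    four-level hierarchy described in the notebook.
    """
    levels = ec_number.split(".")
    if not 1 <= len(levels) <= 4 or not all(levels):
        raise ValueError(f"expected 1 to 4 non-empty levels, got {ec_number!r}")
    # Join the first 1, 2, ... levels to build each broader class in turn.
    return [".".join(levels[: i + 1]) for i in range(len(levels))]


print(ec_lineage("3.4.21.4"))  # ['3', '3.4', '3.4.21', '3.4.21.4']
```

Per the exercise note, any of the shorter strings could be assigned to `ECnumber` in the query cell to widen the search.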
4 changes: 2 additions & 2 deletions README.md
@@ -9,6 +9,6 @@ Four notebooks and an environment have been set up for the IQB 2024 workshop, [P
## Notebooks

1. Enzyme Commission Class with Ligands (EC_class_ligands_search.ipynb)
2. Visualizing the Binding Site of a Protein-Ligand Complex (binding_site_investigation.ipynb)
3. Modifying Ligands with RDKit (molecule_manipulation.ipynb)
2. Modifying Ligands with RDKit (molecule_manipulation.ipynb)
3. Docking Preparation (docking_preparation.ipynb)
4. Docking with AutoDock Vina (docking_single_ligand.ipynb)