diff --git a/.github/workflows/python-app.yml b/.github/workflows/python-app.yml index e2f4687..f5724b5 100644 --- a/.github/workflows/python-app.yml +++ b/.github/workflows/python-app.yml @@ -13,7 +13,7 @@ jobs: build: runs-on: ubuntu-latest - + strategy: matrix: py_version: ["3.8", "3.9", "3.10", "3.11", "3.12"] @@ -30,7 +30,7 @@ jobs: pip install --upgrade . pip install git+https://github.com/pckroon/pysmiles.git pip install -r requirements-tests.txt - + - name: Run pytest with codecoverage run: pytest --cov cgsmiles --cov-report=xml # - name: Upload coverage codecov @@ -55,6 +55,31 @@ jobs: pip install --upgrade setuptools pip pip install --upgrade . pip install -r requirements-tests.txt - - name: Run pylint + - name: Run pylint run: | pylint --disable=fixme --fail-under=8.0 cgsmiles + + docs: + runs-on: ubuntu-latest + + steps: + - uses: actions/checkout@v4 + - name: Set up Python 3 + uses: actions/setup-python@v5 + with: + python-version: '3.x' + cache: pip + cache-dependency-path: | + **/setup.cfg + **/requirements-*.txt + **/pyproject.toml + - name: Install dependencies + run: | + pip install --upgrade setuptools pip + pip install --upgrade . 
+ pip install -r requirements-docs.txt + + - name: Run docs + run: | + mkdir -p docs/source/_static + sphinx-build -E -b html docs/source/ docs/build/html diff --git a/.readthedocs.yml b/.readthedocs.yml new file mode 100644 index 0000000..c9774e4 --- /dev/null +++ b/.readthedocs.yml @@ -0,0 +1,29 @@ +# .readthedocs.yaml +# Read the Docs configuration file +# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details + +# Required +version: 2 + +# Set the version of Python and other tools you might need +build: + os: ubuntu-22.04 + tools: + python: "3" + +formats: + - pdf + +# Build documentation in the docs/ directory with Sphinx +sphinx: + builder: html + fail_on_warning: true + configuration: docs/source/conf.py + +# We recommend specifying your dependencies to enable reproducible builds: +# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html +python: + install: + - method: pip + path: . + - requirements: requirements-docs.txt diff --git a/README.md b/README.md deleted file mode 100644 index ab85a1b..0000000 --- a/README.md +++ /dev/null @@ -1,2 +0,0 @@ -# CGsmiles -Coarse-Grained SMILES (CGsmiles) for representing abitrarily complex molecules using line notation diff --git a/README.rst b/README.rst new file mode 100644 index 0000000..e1b36e0 --- /dev/null +++ b/README.rst @@ -0,0 +1,92 @@ +================================ +Coarse-Grained SMILES (CGsmiles) +================================ + +Overview +======== + +The CGSmiles line notation encodes arbitrary resolutions of molecules and +defines the conversion between these resolutions unambiguously. For example, +in coarse-grained (CG) simulations, multiple atoms are represented as one larger +pseudo-atom, often called a bead. The conversion from the atomic resolution to +the CG resolution can be described using the CGSmiles notation. In the +`Martini 3 force field `__, benzene is represented +as three particles. The CGSmiles string would be: + +.. 
code:: + + "{[#TC5]1[#TC5][#TC5]1}.{#TC5=[$]cc[$]}" + +Additionally, multiple resolutions may be layered together so that a hierarchical +description between one or more CG resolutions becomes possible. In particular, +expressing large polymeric molecules becomes simpler when using multiple +resolutions. For instance, consider the copolymer +`Styrene-Maleic Acid `__. +It is an almost perfectly alternating polymer of maleic anhydride and styrene. +In CGSmiles, we can thus write 100 repeat units of this polymer by using three +resolutions, each contained in curly braces: + +.. code:: + + "{[#SMA]|100}.{#SMA=[#PS][#MAH]}.{#PS=[>]CC[<]c1ccccc1,#MAH=[<]C1C(=O)OC(=O)C1[>]}" + +The CGSmiles Python package is built around this notation to read, write, and +further process the resulting graphs. Reading and resolving provides all the +molecule information in the form of `NetworkX graphs `__, +providing an easy way to interface with other Python libraries. + +There are a number of other packages and libraries that use CGSmiles. They are +mostly used for coarse-grained modelling with the Martini force field or for atomic +resolution molecular dynamics simulations. More information about the syntax and +the different use cases can be found in this documentation. If you are coming from +one of the packages using CGSmiles, check out the Getting Started section to learn +the syntax. + +Examples +======== + +The CGSmiles Python package is designed to read and resolve these strings +into NetworkX graphs that can be used for further tasks, for example drawing +the relation between two resolutions (i.e. the mapping). + +Martini 3 Benzene + +.. 
code:: python + + import cgsmiles + from cgsmiles.drawing import draw_molecule + + # Martini 3 Benzene + cgsmiles_str = "{[#TC5]1[#TC5][#TC5]1}.{#TC5=[$]cc[$]}" + + # Resolve molecule into networkx graphs + res_graph, mol_graph = cgsmiles.MoleculeResolver.from_string(cgsmiles_str).resolve() + + # Draw molecule at different resolutions + ax, pos = draw_molecule(mol_graph) + +Resources +========= + +- here go some resources + +Related Tools +============= + +- `pysmiles `__: + Lightweight Python library for reading and writing SMILES. CGSmiles runs + pysmiles in the background for interpreting atomic resolution fragments. + +- `polyply `__: + Generate topology files and coordinates for molecular dynamics (MD) + from CGSmiles notation. It takes CGSmiles as input to generate all-atom or + coarse-grained topologies and input parameters. + +- `fast_forward `__: + Forward map molecular dynamics trajectories from a higher to a lower resolution using + CGSmiles. + +Citation +======== + +When using **cgsmiles** for your publication, please: diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 0000000..fc49f84 --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +SPHINXPROJ = CGSmiles +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 
+%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/source/.DS_Store b/docs/source/.DS_Store new file mode 100644 index 0000000..8e03952 Binary files /dev/null and b/docs/source/.DS_Store differ diff --git a/docs/source/api/cgsmiles.cgsmiles_utils.rst b/docs/source/api/cgsmiles.cgsmiles_utils.rst new file mode 100644 index 0000000..d8318b7 --- /dev/null +++ b/docs/source/api/cgsmiles.cgsmiles_utils.rst @@ -0,0 +1,5 @@ +cgsmiles.cgsmiles\_utils module +=============================== + +.. automodule:: cgsmiles.cgsmiles_utils + :members: diff --git a/docs/source/api/cgsmiles.dialects.rst b/docs/source/api/cgsmiles.dialects.rst new file mode 100644 index 0000000..ff31b2a --- /dev/null +++ b/docs/source/api/cgsmiles.dialects.rst @@ -0,0 +1,5 @@ +cgsmiles.dialects module +======================== + +.. automodule:: cgsmiles.dialects + :members: diff --git a/docs/source/api/cgsmiles.drawing.rst b/docs/source/api/cgsmiles.drawing.rst new file mode 100644 index 0000000..7434966 --- /dev/null +++ b/docs/source/api/cgsmiles.drawing.rst @@ -0,0 +1,5 @@ +cgsmiles.drawing module +======================= + +.. automodule:: cgsmiles.drawing + :members: diff --git a/docs/source/api/cgsmiles.drawing_utils.rst b/docs/source/api/cgsmiles.drawing_utils.rst new file mode 100644 index 0000000..28aca43 --- /dev/null +++ b/docs/source/api/cgsmiles.drawing_utils.rst @@ -0,0 +1,5 @@ +cgsmiles.drawing\_utils module +============================== + +.. automodule:: cgsmiles.drawing_utils + :members: diff --git a/docs/source/api/cgsmiles.graph_layout.rst b/docs/source/api/cgsmiles.graph_layout.rst new file mode 100644 index 0000000..dfa6475 --- /dev/null +++ b/docs/source/api/cgsmiles.graph_layout.rst @@ -0,0 +1,5 @@ +cgsmiles.graph\_layout module +============================= + +.. 
automodule:: cgsmiles.graph_layout + :members: diff --git a/docs/source/api/cgsmiles.graph_layout_utils.rst b/docs/source/api/cgsmiles.graph_layout_utils.rst new file mode 100644 index 0000000..9911e38 --- /dev/null +++ b/docs/source/api/cgsmiles.graph_layout_utils.rst @@ -0,0 +1,5 @@ +cgsmiles.graph\_layout\_utils module +==================================== + +.. automodule:: cgsmiles.graph_layout_utils + :members: diff --git a/docs/source/api/cgsmiles.graph_utils.rst b/docs/source/api/cgsmiles.graph_utils.rst new file mode 100644 index 0000000..aa19a75 --- /dev/null +++ b/docs/source/api/cgsmiles.graph_utils.rst @@ -0,0 +1,5 @@ +cgsmiles.graph\_utils module +============================ + +.. automodule:: cgsmiles.graph_utils + :members: diff --git a/docs/source/api/cgsmiles.linalg_functions.rst b/docs/source/api/cgsmiles.linalg_functions.rst new file mode 100644 index 0000000..f54abe3 --- /dev/null +++ b/docs/source/api/cgsmiles.linalg_functions.rst @@ -0,0 +1,5 @@ +cgsmiles.linalg\_functions module +================================= + +.. automodule:: cgsmiles.linalg_functions + :members: diff --git a/docs/source/api/cgsmiles.pysmiles_utils.rst b/docs/source/api/cgsmiles.pysmiles_utils.rst new file mode 100644 index 0000000..c81bd11 --- /dev/null +++ b/docs/source/api/cgsmiles.pysmiles_utils.rst @@ -0,0 +1,5 @@ +cgsmiles.pysmiles\_utils module +=============================== + +.. automodule:: cgsmiles.pysmiles_utils + :members: diff --git a/docs/source/api/cgsmiles.read_cgsmiles.rst b/docs/source/api/cgsmiles.read_cgsmiles.rst new file mode 100644 index 0000000..c35fd15 --- /dev/null +++ b/docs/source/api/cgsmiles.read_cgsmiles.rst @@ -0,0 +1,5 @@ +cgsmiles.read\_cgsmiles module +============================== + +.. 
automodule:: cgsmiles.read_cgsmiles + :members: diff --git a/docs/source/api/cgsmiles.read_fragments.rst b/docs/source/api/cgsmiles.read_fragments.rst new file mode 100644 index 0000000..460734b --- /dev/null +++ b/docs/source/api/cgsmiles.read_fragments.rst @@ -0,0 +1,5 @@ +cgsmiles.read\_fragments module +=============================== + +.. automodule:: cgsmiles.read_fragments + :members: diff --git a/docs/source/api/cgsmiles.resolve.rst b/docs/source/api/cgsmiles.resolve.rst new file mode 100644 index 0000000..f849828 --- /dev/null +++ b/docs/source/api/cgsmiles.resolve.rst @@ -0,0 +1,5 @@ +cgsmiles.resolve module +======================= + +.. automodule:: cgsmiles.resolve + :members: diff --git a/docs/source/api/cgsmiles.rst b/docs/source/api/cgsmiles.rst new file mode 100644 index 0000000..20daf42 --- /dev/null +++ b/docs/source/api/cgsmiles.rst @@ -0,0 +1,29 @@ +cgsmiles package +================ + +Submodules +---------- + +.. toctree:: + :maxdepth: 4 + + cgsmiles.cgsmiles_utils + cgsmiles.dialects + cgsmiles.drawing + cgsmiles.drawing_utils + cgsmiles.graph_layout + cgsmiles.graph_layout_utils + cgsmiles.graph_utils + cgsmiles.linalg_functions + cgsmiles.pysmiles_utils + cgsmiles.read_cgsmiles + cgsmiles.read_fragments + cgsmiles.resolve + cgsmiles.sample + cgsmiles.write_cgsmiles + +Module contents +--------------- + +.. automodule:: cgsmiles + :members: diff --git a/docs/source/api/cgsmiles.sample.rst b/docs/source/api/cgsmiles.sample.rst new file mode 100644 index 0000000..2030ce7 --- /dev/null +++ b/docs/source/api/cgsmiles.sample.rst @@ -0,0 +1,5 @@ +cgsmiles.sample module +====================== + +.. automodule:: cgsmiles.sample + :members: diff --git a/docs/source/api/cgsmiles.write_cgsmiles.rst b/docs/source/api/cgsmiles.write_cgsmiles.rst new file mode 100644 index 0000000..5e7d965 --- /dev/null +++ b/docs/source/api/cgsmiles.write_cgsmiles.rst @@ -0,0 +1,5 @@ +cgsmiles.write\_cgsmiles module +=============================== + +.. 
automodule:: cgsmiles.write_cgsmiles + :members: diff --git a/docs/source/api/modules.rst b/docs/source/api/modules.rst new file mode 100644 index 0000000..49f10d2 --- /dev/null +++ b/docs/source/api/modules.rst @@ -0,0 +1,7 @@ +cgsmiles +======== + +.. toctree:: + :maxdepth: 4 + + cgsmiles diff --git a/docs/source/api/overview.rst b/docs/source/api/overview.rst new file mode 100644 index 0000000..5744b21 --- /dev/null +++ b/docs/source/api/overview.rst @@ -0,0 +1,177 @@ +Overview +======== +The API is designed to read, write, and interpret CGSmiles strings. +Detailed information can be found in the module documentation. +This overview page provides some quick tutorial-style explanations +of the main functionalities. + +Reading CGSmiles +---------------- +A CGSmiles string can contain a base-graph (see Syntax Rules) and +multiple enumerations of fragment graphs, each corresponding to a +different resolution. The base graph can be read using the +``read_cgsmiles`` function, while the fragments can be read using +the ``read_fragments`` function. However, most users will find it +convenient to directly read the entire string and resolve the +different resolutions. This is done using the ``MoleculeResolver`` +class. + +First we need to import the ``MoleculeResolver`` and instantiate it +using ``from_string`` or one of the other constructor methods. +Note that we can specify that the last resolution is at the atomic +level by providing the ``last_all_atom=True`` argument. + +.. code-block:: python + + from cgsmiles import MoleculeResolver + cgsmiles_string = '{[#TC5]1[#TC5][#TC5]1}.{#TC5=[$]cc[$]}' + resolver = MoleculeResolver.from_string(cgsmiles_string, + last_all_atom=True) + +Next we can resolve the atomic resolution from the CG graph by +calling the ``.resolve`` method once. + +.. code-block:: python + + cg_graph, aa_graph = resolver.resolve() + +For multiple resolutions we can call the ``resolve`` method +multiple times. 
Each call returns a pair of graphs: one at the coarser level +and one at the next finer level. Alternatively, the +``resolve_iter`` method can be used to loop over all resolutions. Let's +take the molecule in Figure 3 of the main paper: + +.. code-block:: python + + from cgsmiles import MoleculeResolver + # CGSmiles string with 3 resolutions + cgsmiles_str = ("{[#hphilic][#hdphob]|3[#hphilic]}." + "{#hphilic=[<][#PEO][>]|3,#hdphob=[<][#PMA][>]([#BUT])}." + "{#PEO=[<][#SN3r][>],#PMA=[<][#TC3][>][#SN4a][$],#BUT=[$][#SC3][$]}." + "{#SN3r=[<]COC[>],#TC3=[<]CC[>][$1],#SN4a=[$1]C(=O)OC[$2],#SC3=[$2]CCC}") + # Generate the MoleculeResolver + resolver = MoleculeResolver.from_string(cgsmiles_str, last_all_atom=True) + + # Now we can loop over all resolutions + for coarse_graph, finer_graph in resolver.resolve_iter(): + print(coarse_graph.nodes(data='fragname')) + print(finer_graph.nodes(data='atomname')) + +Alternatively, we could have obtained just the final pair of graphs by calling +``.resolve_all()``. + +Drawing CGSmiles +---------------- +An easy way to check the correctness of a CGSmiles string is to +draw the molecule and its mapping to the coarser level. +This can be done using the drawing module. + +The drawing function takes any NetworkX graph, assumes it is a +molecule, and makes a 2D drawing using a `vespr` layout. The +following example demonstrates how to draw the mapping for +Martini 3 benzene. + +.. code:: python + + import cgsmiles + from cgsmiles.drawing import draw_molecule + + # Martini 3 Benzene + cgsmiles_str = "{[#TC5]1[#TC5][#TC5]1}.{#TC5=[$]cc[$]}" + + # Resolve molecule into networkx graphs + res_graph, mol_graph = cgsmiles.MoleculeResolver.from_string(cgsmiles_str).resolve() + + # Draw molecule at different resolutions + ax, pos = draw_molecule(mol_graph) + +By setting ``cg_mapping=False``, only the atomic resolution molecule is drawn. +Sometimes it is handy not to draw all the hydrogen atoms of a molecule.
To do so +one can use pysmiles to remove hydrogen atoms and then simply provide node +labels that show the element plus its hydrogen count. The example below +illustrates this for poly(ethylene glycol) (PEO). + +.. code:: python + + import pysmiles + import networkx as nx + import cgsmiles + from cgsmiles.drawing import draw_molecule + + # PEO Polymer + cgsmiles_str = "{[#PEO]|10}.{#PEO=[$]COC[$]}" + + # Resolve molecule into networkx graphs + res_graph, mol_graph = cgsmiles.MoleculeResolver.from_string(cgsmiles_str).resolve() + + # Remove hydrogen atoms + # Each heavy atom gets a node attribute `hcount` + pysmiles.remove_explicit_hydrogens(mol_graph) + + # Now we generate the node labels + labels = {} + for node in mol_graph.nodes: + hcount = mol_graph.nodes[node].get('hcount', 0) + label = mol_graph.nodes[node].get('element', '*') + if hcount > 0: + label = label + f"H{hcount}" + labels[node] = label + + # Draw molecule at different resolutions + ax, pos = draw_molecule(mol_graph, labels=labels, scale=1) + ax.set_frame_on(True) + +You will likely see that the molecule does not entirely fit in the bounding box +indicated by the frame. The reason is that the drawing function does not +automatically scale the image. You now have two options: you can use the scale +keyword to shrink the molecule image until it fits (e.g. ``scale=0.5``), or you +can provide a larger canvas. + +.. code:: python + + import matplotlib.pyplot as plt + fig, ax = plt.subplots(1, 1, figsize=(20, 6)) + + ... + + ax, pos = draw_molecule(mol_graph, labels=labels, scale=1, ax=ax) + +The advantage of not automatically fitting the drawing into the bounding box is +that if you draw multiple molecules, they will all have exactly the same size in +terms of labels, bonds, and atoms. Thus, you only have to find a visually pleasing +canvas size once and can then draw a large collection of molecules.
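The label-building loop in the PEO example above is repeated verbatim in the cis/trans example, so it may be worth wrapping in a small helper. The sketch below is hypothetical (`hydrogen_count_labels` is not part of the cgsmiles API). It only assumes the node attributes `element` and `hcount` that the loop already reads, and it takes plain `(node, attributes)` pairs so that it needs no imports. One deliberate difference from the loop: a single hydrogen is written as `CH` rather than `CH1`.

```python
def hydrogen_count_labels(nodes_with_data):
    """Build drawing labels such as 'CH2' or 'O' from node attributes.

    ``nodes_with_data`` is any iterable of ``(node, attrs)`` pairs, for
    example ``mol_graph.nodes(data=True)`` after calling
    ``pysmiles.remove_explicit_hydrogens(mol_graph)``.
    """
    labels = {}
    for node, attrs in nodes_with_data:
        # Fall back to '*' when no element is stored, as the loop above does
        label = attrs.get('element', '*')
        hcount = attrs.get('hcount', 0)
        if hcount == 1:
            # Conventional notation omits the count for a single hydrogen
            label += "H"
        elif hcount > 1:
            label += f"H{hcount}"
        labels[node] = label
    return labels


# Stand-alone usage with hand-written attribute dictionaries
example_nodes = [
    (0, {'element': 'C', 'hcount': 3}),
    (1, {'element': 'O', 'hcount': 0}),
    (2, {'element': 'C', 'hcount': 1}),
    (3, {}),
]
print(hydrogen_count_labels(example_nodes))
# {0: 'CH3', 1: 'O', 2: 'CH', 3: '*'}
```

With a resolved molecule, the loop in either example would then reduce to `labels = hydrogen_count_labels(mol_graph.nodes(data=True))`.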
+ +An added bonus of the drawing utility is that it draws cis/trans +isomers correctly according to the CGSmiles string the user has provided. You +can see a simple example below. + +.. code:: python + + import matplotlib.pyplot as plt + import pysmiles + import cgsmiles + from cgsmiles.drawing import draw_molecule + + # one panel per molecule + fig, axes = plt.subplots(1, 2, figsize=(6, 6)) + + # trans butene (raw strings keep the backslash bond markers intact) + cgsmiles_str_trans = r"{[#A][#B]}.{#A=c\c[$],#B=[$]c\c}" + + # cis butene + cgsmiles_str_cis = r"{[#A][#B]}.{#A=c\c[$],#B=[$]c/c}" + + # Resolve molecule into networkx graphs + for ax, cgstr in zip(axes, [cgsmiles_str_trans, cgsmiles_str_cis]): + res_graph, mol_graph = cgsmiles.MoleculeResolver.from_string(cgstr).resolve() + pysmiles.remove_explicit_hydrogens(mol_graph) + + # Now we generate the node labels + labels = {} + for node in mol_graph.nodes: + hcount = mol_graph.nodes[node].get('hcount', 0) + label = mol_graph.nodes[node].get('element', '*') + if hcount > 0: + label = label + f"H{hcount}" + labels[node] = label + + # Draw each isomer in its own panel + ax, pos = draw_molecule(mol_graph, ax=ax, labels=labels) diff --git a/docs/source/conf.py b/docs/source/conf.py new file mode 100644 index 0000000..fd1203f --- /dev/null +++ b/docs/source/conf.py @@ -0,0 +1,250 @@ +# -*- coding: utf-8 -*- +# +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. For a +# full list see the documentation: +# http://www.sphinx-doc.org/en/stable/config + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here.
+# +from importlib.metadata import version as get_version +import os +# import sys +# sys.path.insert(0, os.path.abspath('.')) +# Do not generate APIdocs for members missing docstrings (undoc-members) +os.environ['APIDOC_OPTIONS'] = 'members,show-inheritance,inherited-members' + +# Set APIDOC options +#os.environ['SPHINX_APIDOC_OPTIONS'] = 'members,undoc-members,show-inheritance,special-members' +os.environ['SPHINX_APIDOC_OPTIONS'] = 'members' + +# -- Project information ----------------------------------------------------- + +project = 'CGSmiles' +copyright = '2024, Dr. F Gruenewald' +author = 'F. Gruenewald and P. C. Kroon' + +# The full version, including alpha/beta/rc tags +release = get_version('cgsmiles') +# The short X.Y version +# version = '.'.join(release.split('.')[:2]) +version = release + +# -- General configuration --------------------------------------------------- + +# If your documentation needs a minimal Sphinx version, state it here. +# +# needs_sphinx = '1.0' + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.intersphinx', + 'sphinx.ext.mathjax', + 'sphinx.ext.viewcode', + 'sphinx.ext.napoleon', + 'sphinx.ext.autosectionlabel', + 'sphinxcontrib.apidoc', +] + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# The suffix(es) of source filenames. +# You can specify multiple suffix as a list of string: +# +# source_suffix = ['.rst', '.md'] +source_suffix = '.rst' + +# The master toctree document. +master_doc = 'index' + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +# +# This is also used if you do content translation via gettext catalogs. +# Usually you set "language" from the command line for these cases.
+language = 'en' + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path . +exclude_patterns = [] + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = 'sphinx' + + +nitpick_ignore = [ + ('py:class', 'networkx.algorithms.isomorphism.isomorphvf2.GraphMatcher'), + ('py:class', 'optional'), + ] + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = "furo" +html_theme_options = { + "sidebar_hide_name": True, +} + +# The name of an image file (relative to this directory) to place at the top +# of the sidebar. +html_logo = "images/CGsmiles_logo_large.png" + +# The name of an image file (within the static path) to use as favicon of the +# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 +# pixels large. +#html_favicon = "images/cg_smiles.png" + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +# +# html_theme_options = {} + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +#html_static_path = ['_static'] + +# Custom sidebar templates, must be a dictionary that maps document names +# to template names. +# +# The default sidebars (for documents that don't match any pattern) are +# defined by theme itself. Builtin themes are using these templates by +# default: ``['localtoc.html', 'relations.html', 'sourcelink.html', +# 'searchbox.html']``. 
+# +# html_sidebars = {} + +html_show_sourcelink = True + +# -- Options for HTMLHelp output --------------------------------------------- + +# Output file base name for HTML help builder. +htmlhelp_basename = 'CGSmilesdoc' + + +# -- Options for LaTeX output ------------------------------------------------ + +latex_elements = { + # The paper size ('letterpaper' or 'a4paper'). + # + # 'papersize': 'letterpaper', + + # The font size ('10pt', '11pt' or '12pt'). + # + # 'pointsize': '10pt', + + # Additional stuff for the LaTeX preamble. + # + # 'preamble': '', + + # Latex figure (float) alignment + # + # 'figure_align': 'htbp', +} + +# Grouping the document tree into LaTeX files. List of tuples +# (source start file, target name, title, +# author, documentclass [howto, manual, or own class]). +latex_documents = [ + (master_doc, 'CGSmiles.tex', 'CGSmiles Documentation', + author, 'manual'), +] + + +# -- Options for manual page output ------------------------------------------ + +# One entry per manual page. List of tuples +# (source start file, name, description, authors, manual section). +man_pages = [ + (master_doc, 'cgsmiles', 'CGSmiles Documentation', + [author], 1) +] + + +# -- Options for Texinfo output ---------------------------------------------- + +# Grouping the document tree into Texinfo files. 
List of tuples +# (source start file, target name, title, author, +# dir menu entry, description, category) +texinfo_documents = [ + (master_doc, 'CGSmiles', 'CGSmiles Documentation', + author, 'CGSmiles', 'Coarse-Grained SMILES (CGSmiles) line notation.', + 'Miscellaneous'), +] + + +# -- Extension configuration ------------------------------------------------- +apidoc_module_dir = '../../cgsmiles' +apidoc_output_dir = 'api' +apidoc_separate_modules = True +apidoc_excluded_paths = ['tests', 'redistributed'] + +autodoc_inherit_docstrings = False +autoclass_content = 'both' +autodoc_default_options = {'members': None, + 'undoc-members': None, + 'show-inheritance': None} + +napoleon_google_docstring = False +napoleon_numpy_docstring = True +napoleon_preprocess_types = False +napoleon_type_aliases = { + 'Molecule': 'vermouth.molecule.Molecule', + } + +# -- Options for intersphinx extension --------------------------------------- +# Example configuration for intersphinx: refer to the Python standard library. +intersphinx_mapping = { + 'python': ('https://docs.python.org/3', None), + 'networkx': ('https://networkx.org/documentation/stable/', None), + 'numpy': ('https://numpy.org/doc/stable/', None), + 'scipy': ('https://docs.scipy.org/doc/scipy/', None), +} + + +# Borrowed from https://github.com/sphinx-doc/sphinx/issues/5603 +# On top of that, networkx.isomorphism.GraphMatcher is not documented, so link +# to the VF2 isomorphism module instead.
+# See https://github.com/networkx/networkx/issues/3239 +intersphinx_aliases = { + #('py:class', 'networkx.classes.graph.Graph'): ('py:class', 'networkx.Graph'), + #('py:class', 'networkx.algorithms.isomorphism.vf2userfunc.GraphMatcher'): ('py:class', 'networkx.isomorphism.GraphMatcher'), + #('py:class', 'networkx.algorithms.isomorphism.vf2userfunc.GraphMatcher'): ('py:module','networkx.algorithms.isomorphism.isomorphvf2'), + ('py:class', 'networkx.isomorphism.GraphMatcher'): ('py:module', 'networkx.algorithms.isomorphism.isomorphvf2') +} + +autosectionlabel_prefix_document = True + +def add_intersphinx_aliases_to_inv(app): + from sphinx.ext.intersphinx import InventoryAdapter + inventories = InventoryAdapter(app.builder.env) + + for alias, target in app.config.intersphinx_aliases.items(): + alias_domain, alias_name = alias + target_domain, target_name = target + try: + found = inventories.main_inventory[target_domain][target_name] + try: + inventories.main_inventory[alias_domain][alias_name] = found + except KeyError: + continue + except KeyError: + continue + + +def setup(app): + app.add_config_value('intersphinx_aliases', {}, 'env') + app.connect('builder-inited', add_intersphinx_aliases_to_inv) diff --git a/docs/source/gettingstarted/api_examples.rst b/docs/source/gettingstarted/api_examples.rst new file mode 100644 index 0000000..f3320c0 --- /dev/null +++ b/docs/source/gettingstarted/api_examples.rst @@ -0,0 +1,76 @@ +API Examples +============ + +The following tutorials illustrate how to read, +draw, and manipulate CGSmiles using the package API. +For more detailed information on the syntax, please +consult the examples and Syntax documentation. + +Read and draw CGSmiles of Polystyrene +------------------------------------- + +If one simply seeks to describe a graph at an arbitrary level of +complexity, CGSmiles notation can be used. + +.. 
code:: python + + import matplotlib.pyplot as plt + import networkx as nx + import cgsmiles + + # Express 5 units of Polystyrene in CGSmiles + cgsmiles_str = "{[#PS]|5}.{#PS=[$]CC[$](c1ccccc1)}" + + # Resolve molecule into networkx graphs + res_graph, mol_graph = cgsmiles.MoleculeResolver.from_string(cgsmiles_str).resolve() + + # Draw molecule at different resolutions + for g in [res_graph, mol_graph]: + nx.draw_networkx(g) + plt.show() + + # Get fragment corresponding to first residue + fragment_1 = res_graph.nodes[0]['graph'] + +Map all-atom structure to CG resolution +--------------------------------------- + +Here we use Vermouth to read an all-atom structure of benzene and map +it to the coarse-grained Martini 3 resolution. Note that this example requires +BENZ.pdb from this repository. + +.. code:: python + + import numpy as np + import networkx as nx + import vermouth + import cgsmiles + + # Read the PDB file of benzene + mol = vermouth.pdb.read_pdb("BENZ.pdb") + + # Express the mapping as CGSmiles string + cgsmiles_str = "{[#R]1[#R][#R]1}.{#R=[$]cc[$]}" + + # Resolve molecule into networkx graphs + res_graph, mol_graph = cgsmiles.MoleculeResolver.from_string(cgsmiles_str).resolve() + + # Match atoms by element; the mapping goes from mol_graph nodes to PDB nodes + matcher = nx.isomorphism.GraphMatcher(mol_graph, mol, node_match=nx.isomorphism.categorical_node_match('element', None)) + mapping = next(matcher.isomorphisms_iter()) + + # Compute the mapped positions + for node in res_graph.nodes: + pos = np.zeros(3) + # each bead in the CG molecule has a 'graph' attribute holding + # the atoms that make up this bead, + # so we don't have to do any expensive lookups + fragment = res_graph.nodes[node]['graph'] + for all_atom_node in fragment: + pos += mol.nodes[mapping[all_atom_node]]['position'] + final_pos = pos / len(fragment) + res_graph.nodes[node]['position'] = final_pos + +Searching the Martini database of small molecules +------------------------------------------------- + +Here goes an example of how to look up molecules in the Martini +Database using CGSmiles diff --git a/docs/source/gettingstarted/installation.rst
b/docs/source/gettingstarted/installation.rst new file mode 100644 index 0000000..6807c17 --- /dev/null +++ b/docs/source/gettingstarted/installation.rst @@ -0,0 +1,14 @@ +Installation +============ + +The easiest way to install **cgsmiles** is using pip: + +.. code:: bash + + pip install cgsmiles + +Alternatively, install the master branch directly from GitHub: + +.. code:: bash + + pip install git+https://github.com/gruenewald-lab/CGsmiles.git diff --git a/docs/source/gettingstarted/syntax_examples.rst b/docs/source/gettingstarted/syntax_examples.rst new file mode 100644 index 0000000..f07052c --- /dev/null +++ b/docs/source/gettingstarted/syntax_examples.rst @@ -0,0 +1,91 @@ +Syntax Examples +=============== + +This page collects examples of CGSmiles strings of increasing +complexity. They are separated into the following categories: + +- CGSmiles without fragments +- CGSmiles with all-atom fragments +- CGSmiles with coarse-grained fragments + +CGSmiles without fragments +-------------------------- + +If one simply seeks to describe a graph at an arbitrary level of +complexity, CGSmiles notation can be used. Each of the strings +listed below can be read and converted using the ``read_cgsmiles`` +function of the package: + +- simple linear graph with three nodes + + .. code:: + + "{[#nodeA][#nodeB][#nodeC]}" + +- linear graph of 10 B nodes, flanked by one A node and + one C node + + .. code:: + + "{[#nodeA][#nodeB]|10[#nodeC]}" + +- simple ring of six nodes + + .. code:: + + "{[#nodeA]1[#nodeB][#nodeC][#nodeD][#nodeE][#nodeF]1}" + +- rhombic graph with four nodes + + .. code:: + + "{[#nodeA]1[#nodeB]2[#nodeC]1[#nodeD]2}" + +- linear sequence with a branch + + .. code:: + + "{[#nodeA]([#nodeAB][#nodeAB])[#nodeC][#nodeD]}" + +- linear sequence with a regular branching pattern; this is + equivalent to a graft polymer. Note that this results + in 5 A nodes connected to each other, each with an AB + branch of two units. + + .. 
code:: + + "{[#nodeA]([#nodeAB][#nodeAB])|5}" + + +CGSmiles with all-atom fragments +-------------------------------- + +- simple linear graph describing PEO with two OH end-groups + + .. code:: + + "{[#OH][#PEO][#OH]}.{#OH=[$]O,#PEO=[$]COC[$]}" + +- same as above but now with 10 PEO residues. + + .. code:: + + "{[#OH][#PEO]|10[#OH]}.{#OH=[$]O,#PEO=[$]COC[$]}" + +- simple ring describing a crown ether + + .. code:: + + "{[#PEO]1[#PEO]|4[#PEO]1}.{#PEO=[$]COC[$]}" + +- Martini 3 p-cresol with all-atom fragments + + .. code:: + + "{[#A]1[#B]2[#C]1[#D]2}.{#A=cc...}" + +- mPEG acrylate with 5 residues + + .. code:: + + "{[#PMA]([#PEO]|3)|5}.{#PMA=[>]CC[<](C(=O)OC[$]),#PEO=[$]COC[$]}" diff --git a/docs/source/images/CGsmiles_logo.png b/docs/source/images/CGsmiles_logo.png new file mode 100644 index 0000000..872eaff Binary files /dev/null and b/docs/source/images/CGsmiles_logo.png differ diff --git a/docs/source/images/CGsmiles_logo_large.png b/docs/source/images/CGsmiles_logo_large.png new file mode 100644 index 0000000..98812c6 Binary files /dev/null and b/docs/source/images/CGsmiles_logo_large.png differ diff --git a/docs/source/index.rst b/docs/source/index.rst new file mode 100644 index 0000000..9ab6a1a --- /dev/null +++ b/docs/source/index.rst @@ -0,0 +1,43 @@ +.. include:: ../../README.rst + +Table of Contents +================= + +.. toctree:: + :maxdepth: 2 + :caption: Getting Started + + gettingstarted/installation + gettingstarted/syntax_examples + gettingstarted/api_examples + +.. toctree:: + :maxdepth: 2 + :caption: Syntax Rules + + syntax/introduction + syntax/basic_graph_description + syntax/fragments + syntax/multiple_resolutions + syntax/chirality + +.. 
toctree:: + :maxdepth: 2 + :caption: API + + api/overview.rst + api/cgsmiles.resolve + api/cgsmiles.drawing + api/cgsmiles.sample + api/cgsmiles.write_cgsmiles + api/cgsmiles.read_cgsmiles + api/cgsmiles.read_fragments + api/cgsmiles.graph_utils + api/cgsmiles.pysmiles_utils + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` diff --git a/docs/source/md/martini.rst b/docs/source/md/martini.rst new file mode 100644 index 0000000..615c1a4 --- /dev/null +++ b/docs/source/md/martini.rst @@ -0,0 +1,2 @@ +Martini +======= diff --git a/docs/source/md/polyply.rst b/docs/source/md/polyply.rst new file mode 100644 index 0000000..e89e896 --- /dev/null +++ b/docs/source/md/polyply.rst @@ -0,0 +1,2 @@ +polyply +======= diff --git a/docs/source/ml/fingerprints.rst b/docs/source/ml/fingerprints.rst new file mode 100644 index 0000000..9633bdf --- /dev/null +++ b/docs/source/ml/fingerprints.rst @@ -0,0 +1,2 @@ +Molecular Fingerprints +====================== diff --git a/docs/source/ml/gnn.rst b/docs/source/ml/gnn.rst new file mode 100644 index 0000000..ca8fb8f --- /dev/null +++ b/docs/source/ml/gnn.rst @@ -0,0 +1,2 @@ +Graph Neural Networks +===================== diff --git a/docs/source/syntax/basic_graph_description.rst b/docs/source/syntax/basic_graph_description.rst new file mode 100644 index 0000000..09bce26 --- /dev/null +++ b/docs/source/syntax/basic_graph_description.rst @@ -0,0 +1,225 @@ +General Graph Syntax +==================== + +Overview +-------- +The first resolution of the CGSmiles notation captures the coarsest representation +of a molecule. The syntax is adapted from the SMILES notation and can be used to +represent arbitrary graphs. These graphs do not need to be molecules, but the +syntax is geared towards them. The basic syntax features are sufficient to +write a CGSmiles string for any (connected) graph.
The advanced syntax features +reduce verbosity through a multiplication operator, allow the annotation of +bond orders, which are important at the atomic resolution and when resolving +multiple resolutions, and provide a general annotation syntax that permits +writing node labels. + +Basic Syntax Features +----------------------- +The basic structure of CGSmiles involves describing each node within a graph +using a specific notation that identifies connections and relationships between +nodes. Here’s how the nodes and their connections are represented: + +Nodes +^^^^^ +Each node is described as ``#`` followed by an alphanumeric +identifier, enclosed in square brackets. For example, a node named A +is represented as ``[#A]``. + +Edges +^^^^^ +Edges are connections between nodes. At the atomic resolution they are covalent +or coordination bonds. At any other resolution they simply describe the +connectivity between nodes. + +Nodes that follow each other in the string are assumed to be connected by an +edge. For example, to denote that nodes A and B are connected, you would +write ``[#A][#B]``. + +.. code-block:: none + + Example: [#A][#B] denotes nodes A and B connected directly. + +Branches +^^^^^^^^ +Branches allow the representation of complex branching structures within +molecules. Branches are indicated by enclosing them in parentheses. For +instance, to connect node D to node B in a sequence from A to C, the notation +would be ``[#A][#B]([#D])[#C]``. + +.. code-block:: none + + Example: [#A][#B]([#D])[#C] shows a branch with D connected to B. + +Rings and Non-linear Edges +^^^^^^^^^^^^^^^^^^^^^^^^^^ +This feature allows the description of rings and other complex topologies. Rings +are indicated by integers following the node identifiers. +An edge will connect nodes marked with the same integer. For example, a triangle +connecting nodes A, B, and C would be written as ``[#A]1[#B][#C]1``. + +.. 
code-block:: none + + Example: [#A]1[#B][#C]1 forms a triangular ring structure. + +String Encapsulation +^^^^^^^^^^^^^^^^^^^^ +For clarity and to define boundaries, CGSmiles strings are enclosed in curly braces. + +.. code-block:: none + + Example: {[#A][#B]([#D])[#C]} + +Advanced Syntax Features +------------------------ + +Bond orders +^^^^^^^^^^^ +One can specify a bond order for edges between nodes. At the atomic resolution these +bond orders describe the order of covalent bonds as in SMILES. There are five bond +order symbols that specify the bond orders 0-4 ('.', '-', '=', '#', '$'). For an edge +between two consecutive nodes, the bond order symbol is placed between the two nodes: + +.. code-block:: none + + Zero bond order + Example: {[#A].[#B]} + +If it refers to a ring bond, it must be placed between the node and the ring +marker: + +.. code-block:: none + + Zero bond order but only between A and C + Example: {[#A].1[#B][#C]1} + +For branches, the bond order symbol is placed between the node and the opening +parenthesis if it refers to the bond to the first node in the branch. Otherwise, it is +placed after the closing parenthesis and refers to the bond between the anchoring node +and the node following the branch. + +.. code-block:: none + + Zero bond order between A and B + Example: {[#A].([#B][#C])[#D]} + +.. code-block:: none + + Zero bond order between A and D + Example: {[#A]([#B][#C]).[#D]} + +The meaning of bond orders at the atomic resolution is well defined. At coarse +resolutions, bond orders may be used to describe virtual edges (i.e. bond order 0). +Virtual edges have no corresponding connectivity of the nodes at the atomic +resolution. Additionally, bond orders of 1-4 are used to indicate that rings at +the finer resolution are mapped to linear graphs at the coarse level. See the section +*Linearizing Rings* in *Layering of Resolutions*. + +Annotations +^^^^^^^^^^^ +Some important information, such as charges or chirality, is not encoded by +the graph representation of a molecule.
+CGSmiles supports a general annotation syntax, which allows users to store +this kind of information in the form of ``symbol=value`` pairs. Any node +name may be followed by one or more of these ``symbol=value`` pairs separated by +semicolons. For example, to specify that node A has a charge of 1 but node +B does not, one can write: + +.. code-block:: none + + Example: {[#A;q=1][#B;q=0]} + +We could also specify the mass in addition to the charge. + +.. code-block:: none + + Example: {[#A;q=1;mass=72][#B;q=0;mass=36]} + +The ``symbol`` is a string of arbitrary length, though one-letter strings are most +convenient for brevity's sake. + +Some symbols are predefined and work like positional arguments to a +Python function: they have a default value, and the symbol keyword may +be omitted if all preceding positions are filled. For example, charge ``q`` and +weight ``w`` are predefined symbols at any coarse resolution. One +can define a weight either by providing the keyword, as in ``[#A;w=0.5]``, or by +omitting the keyword and also giving the (default) charge value, as in +``[#A;0;0.5]``. Because the charge is the first keyword, the strings +``[#A;0]`` and ``[#A;q=0]`` are identical. + +Additionally, these symbols are converted to longer keywords upon reading. For +example, the symbol ``q`` is assigned the keyword ``charge``. A set of such +symbols is called a *dialect* and can be specified using the functionality in +the dialect module. Note that dialects currently cannot easily be +modified. + +CGSmiles comes with two predefined dialects. One is used for coarse +resolution fragments / graphs and the other for those at atomic +resolution. The table below lists the specifications of those keywords. Note that +it is always permissible to use the keyword explicitly.
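The positional rules described above can be illustrated with a small, self-contained parser. This is a minimal sketch and not the cgsmiles implementation: the function ``parse_node_annotations`` and the ``COARSE_DIALECT`` table are hypothetical, only the coarse-resolution symbols ``q`` (charge, default 0.0) and ``w`` (weight, default 1.0) from the table below are modelled, and symbols outside the dialect are assumed to be kept as plain strings.

```python
# Hypothetical sketch of the coarse-resolution dialect described above;
# it is not the cgsmiles implementation.
COARSE_DIALECT = {
    # symbol: (long keyword, converter, default); dict order = positional order
    "q": ("charge", float, 0.0),
    "w": ("weight", float, 1.0),
}


def parse_node_annotations(node_body, dialect=COARSE_DIALECT):
    """Split e.g. '#A;q=1;mass=72' into the node name and its annotations."""
    name, *fields = node_body.split(";")
    # every predefined symbol starts out with its default value
    annotations = {keyword: default for keyword, _, default in dialect.values()}
    positions = list(dialect)  # ['q', 'w']
    keyword_seen = False
    for index, field in enumerate(fields):
        if "=" in field:
            symbol, value = field.split("=", 1)
            keyword_seen = True
        elif keyword_seen or index >= len(positions):
            # like Python, no positional value after a keyword or past the end
            raise ValueError(f"unexpected positional value {field!r}")
        else:
            symbol, value = positions[index], field
        if symbol in dialect:
            keyword, convert, _ = dialect[symbol]
            annotations[keyword] = convert(value)
        else:
            # symbols outside the dialect are kept verbatim as strings
            annotations[symbol] = value
    return name.lstrip("#"), annotations
```

Calling ``parse_node_annotations("#A;0;0.5")`` returns ``('A', {'charge': 0.0, 'weight': 0.5})``, the same result as ``#A;w=0.5``, which mirrors the equivalence stated above.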
+ +Reserved Annotation Symbols + ++----------+------------+-----------+--------+-----------------------+---------+ +| Symbol | Resolution | Keyword | Type | Example | Default | ++==========+============+===========+========+=======================+=========+ +| q | coarse | charge | float | {[#A;q=1]} or | 0.0 | +| | | | | {[#A;1]} | | ++----------+------------+-----------+--------+-----------------------+---------+ +| w | coarse | weight | float | {[#A;w=0.5]} or | 1.0 | +| | | | | {[#A;0;0.5]} | | ++----------+------------+-----------+--------+-----------------------+---------+ +| w | atomic | weight | float | same as above | 1.0 | ++----------+------------+-----------+--------+-----------------------+---------+ +| x | atomic | chirality | S or R | {#frag=Br[C;x=S]ClI} | None | ++----------+------------+-----------+--------+-----------------------+---------+ + +Multiplication Operator +^^^^^^^^^^^^^^^^^^^^^^^ +To efficiently represent repeated units in large molecules, such as polymers, +CGSmiles syntax includes a multiplication operator ``|``. This operator can be +applied after a node or a branch to repeat it a specified number of times. + +- **Node Multiplication:** The multiplication operator is placed after a node + and followed by an integer indicating the number of repetitions. For example, + ``[#A]|5`` represents five consecutive nodes of type A, which is equivalent to + writing ``[#A][#A][#A][#A][#A]``. + +.. code-block:: none + + Example: [#A]|5 simplifies the representation of five A nodes. + +- **Branch Multiplication:** When the multiplication operator is placed after a + branch, the entire branch including the anchoring node is repeated as specified. + This feature is particularly useful for describing structures like graft + polymers. For instance, ``[#A]([#B][#B])|5`` represents a chain of five units + where each unit is a backbone node A carrying a branch of two B nodes. + +.. 
code-block:: none + + Example: [#A]([#B][#B])|5 describes repeated branches of [#B][#B] anchored to [#A]. + +Syntax Features Lookup Table +---------------------------- +Below is a quick reference table for the essential features of +CGSmiles syntax: + ++----------------+----------------------------------------------+------------------------------------------------+ +| Feature | Description | Example | ++================+==============================================+================================================+ +| Nodes | Represent nodes in the graph. | [#A] | ++----------------+----------------------------------------------+------------------------------------------------+ +| Edges | Direct connections between nodes. | [#A][#B] | ++----------------+----------------------------------------------+------------------------------------------------+ +| Branches | Indicate branching off the main chain. | [#A][#B]([#D])[#C] | ++----------------+----------------------------------------------+------------------------------------------------+ +| Rings | Describe rings and non-linear connections. | [#A]1[#B][#C]1 | ++----------------+----------------------------------------------+------------------------------------------------+ +| Encapsulation | Enclose CGSmiles strings for clarity. | {[#a][#b]([#d])[#c]} | ++----------------+----------------------------------------------+------------------------------------------------+ +| Bond Orders | Specify the bond order (0-4) of an edge. | {[#a]=[#b]} ; double bond | +| | 0 = `.`, 1 = `-`, 2 = `=`, 3 = `#`, 4 = `$` | {[#a].[#b]} ; zero order bond | ++----------------+----------------------------------------------+------------------------------------------------+ +| Annotations | Store node labels as key-value pairs.
| {[#A;q=1][#B;q=0]} ; charges q | ++----------------+----------------------------------------------+------------------------------------------------+ +| Multiplication | Repeat a node or branch a specified number | [#A]|5, [#A]([#B][#B])|5 | +| | of times. | | ++----------------+----------------------------------------------+------------------------------------------------+ diff --git a/docs/source/syntax/chirality.rst b/docs/source/syntax/chirality.rst new file mode 100644 index 0000000..80f4c80 --- /dev/null +++ b/docs/source/syntax/chirality.rst @@ -0,0 +1,58 @@ +Chirality, Isomerism & Aromaticity +================================== +When transitioning between CG and atomistic representations, certain atomistic +features have no direct counterparts in CG models and require special treatment. + +Implicit Hydrogen +^^^^^^^^^^^^^^^^^ +The simplest case is the treatment of implicit hydrogen atoms. SMILES allows a +shorthand notation in which hydrogen atoms are omitted, and CGSmiles adopts this +approach. Hydrogen atoms are automatically assigned once the full atomistic +molecule is resolved. This procedure ensures proper handling of any unconsumed +bonding operators, which are interpreted as additional hydrogen atoms where +applicable. However, hydrogen atoms requiring specific annotations, such as a +weight (e.g. ``[H;w=0.5]``), must be included explicitly. + +cis/trans Isomerism +^^^^^^^^^^^^^^^^^^^ +cis and trans isomers are distinguished using ``/`` or ``\`` between atoms to indicate +their relative orientation around a double bond, following the OpenSMILES +definition. A pair of these symbols defines the isomerism around the double bond as +outlined in Table S2 of the main paper describing the syntax. We note that this +notation is permutation invariant, i.e. when double bond substituents are split +across fragments, the relative position needs to be assigned only once as if +constructing the complete SMILES string.
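To make the pairing rule concrete, the sketch below classifies only the simplest case: a linear pattern such as ``F/C=C/F``, where, following the OpenSMILES convention, the same symbol on both sides of the double bond denotes *trans* and different symbols denote *cis*. The helper ``double_bond_config`` is hypothetical and not part of cgsmiles; branches, ring closures, bracket atoms and fragment splitting are deliberately rejected rather than interpreted.

```python
import re

# Hypothetical helper, not part of cgsmiles. It only accepts the linear
# pattern <atoms>{/ or \}X=Y{/ or \}<atoms>, where X and Y are single
# unbracketed atoms; branches, ring closures and bracket atoms are rejected.
_LINEAR_DOUBLE_BOND = re.compile(r"^\w*([/\\])[A-Za-z]=[A-Za-z]([/\\])\w*$")


def double_bond_config(smiles):
    """Return 'trans' or 'cis' for a linear SMILES such as 'F/C=C/F'."""
    match = _LINEAR_DOUBLE_BOND.match(smiles)
    if match is None:
        raise ValueError(f"unsupported pattern: {smiles!r}")
    before, after = match.groups()
    # the same symbol on both sides puts the substituents on opposite sides
    return "trans" if before == after else "cis"


print(double_bond_config("F/C=C/F"))   # trans
print(double_bond_config("F/C=C\\F"))  # cis
```

Handling branches such as ``C(/F)=C/F`` would require tracking whether each symbol is written before or after its atom, which is why the sketch refuses them instead of guessing.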
+ +Chirality +^^^^^^^^^ +CGSmiles adopts an explicit method of chirality assignment using annotations. A +chiral atom can be annotated using the ``x`` keyword as shorthand for chirality. +For example, S-alanine is represented as ``C[C;x=S](N)C(=O)O``, while R-alanine is +written as ``C[C;x=R](N)C(=O)O``. The ``x`` may be omitted if a weight is defined +beforehand, such as in ``C[C;1;S](N)C(=O)O``, which is also valid. Consult the +general annotation syntax for more information. + +Aromaticity +^^^^^^^^^^^ +In SMILES, aromaticity is encoded using lowercase letters as a shorthand for +aromatic atoms or a colon as a marker for aromatic bonds. CGSmiles uses the +same convention. In addition, aromatic systems may also be split across multiple +fragments by simply keeping the shorthand. For example, Martini benzene is +represented as: + +.. code-block:: none + + {[#TC5]1[#TC5][#TC5]1}.{#TC5=[$]cc[$]} + +Although the shorthand for aromaticity is well-defined, its interpretation in +SMILES remains somewhat ambiguous. To ensure unambiguous valence assignment, +necessary for tasks like adding hydrogen atoms, CGSmiles employs the following +definition: only atoms capable of participating in delocalization-induced +molecular equivalence (i.e., systems where multiple resonance structures can be +drawn without introducing charges) are considered aromatic. By this definition, +benzene is aromatic but thiophene is not. CGSmiles uses the same definition as the +pysmiles package, which provides a more detailed discussion of this topic. To +enhance user-friendliness, the CGSmiles API automatically corrects strings with +incorrectly assigned aromaticity at the time of reading. If corrections cannot +be made unambiguously, an error is raised, ensuring robust and accurate handling +of aromaticity.
diff --git a/docs/source/syntax/fragments.rst b/docs/source/syntax/fragments.rst new file mode 100644 index 0000000..48d7692 --- /dev/null +++ b/docs/source/syntax/fragments.rst @@ -0,0 +1,95 @@ +General Fragment Syntax +======================= + +Overview +-------- +CGSmiles supports the representation of molecular structures at different +resolutions through a fragment replacement syntax. This allows users to specify +more detailed molecular structures connected to a coarse graph representation. + +Fragment Graphs +--------------- +The notation for a fragment graph starts with a ‘#’ followed by the label of +the coarser-resolution node and an ‘=’ sign. Each fragment name must be unique +to ensure unambiguous identification. Fragment graphs use the same general +graph syntax as outlined before; however, OpenSMILES syntax may also be +used to define an atomic resolution fragment. + +.. code-block:: none + + Example: Benzene as a single fragment + "{[#BENZ]}.{#BENZ=c1ccccc1}" + +Bond Operators +^^^^^^^^^^^^^^ +To define how two consecutive fragments at a finer resolution are connected, +CGSmiles builds upon the bonding connector syntax established in BigSMILES to +avoid ambiguity. Any node or atom that connects to a neighboring fragment is +followed by one of four bonding operators (‘$’, ‘>’, ‘<’, ‘!’) enclosed in +square brackets. In addition, any operator may be combined with an alphanumeric +label to distinguish non-equivalent operators of the same type. + +- **Undirected Bonding Operator $** + The undirected bonding operator ‘$’ connects to any other ‘$’ operator in + connected fragments, as specified in the coarser resolution graph. An + undirected bonding operator may be followed by an alphanumeric label, + ensuring that only operators with matching labels are connected. + + .. 
code-block:: none + + Example: PEO can connect to any other PEO at either of its carbon atoms + {#PEO=[$]COC[$]} + +- **Directed Bonding Operators > and <** + A directed bonding operator can only pair with its complementary counterpart to + ensure the correct connectivity in asymmetric fragments. These bonding operators + can also be annotated with an alphanumeric label for further specificity. + + .. code-block:: none + + Example: Polystyrene, where the CH2 group always connects to the CH group + {#PS=[>]CC[<]c1ccccc1} + +- **Shared Bonding Operator !** + To address a common scenario in CG force fields where an atom is distributed + between two finer resolution nodes, CGSmiles introduces the shared bonding + operator ‘!’. In the case of toluene represented at the Martini 3 level, some + of the ring atoms are shared between the two CG beads. When two fragments are + connected using the shared bonding operator, the atoms at the connection point + are merged into a single atom, retaining the bonds from both fragments. + + .. code-block:: none + + Example: Martini 3 toluene, where some of the carbon atoms are + shared between beads + {[#SC4]1[#TC5][#TC5]1}.{#SC4=Cc(c[!])c[!],#TC5=[!]ccc[!]} + +Valency +------- +CGSmiles does not enforce valency rules for atoms or nodes, allowing any atom +to be followed by multiple bonding operators. In the case of all-atom fragments, +the hydrogen count is determined only after the molecule's full connectivity is +established. In cases where a bond of higher order needs to be represented, a +bond order symbol should be placed between the atom and the bonding operator in +both fragments. For example, splitting 2-pentene into two fragments results in +``{[#A][#B]}.{#A=CC=[$],#B=[$]=CCC}``, where the bond order symbol ‘=’ indicates a +double bond between the ethyl (A) and propyl (B) fragments.
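The pairing rules of the bonding operators described above can be summarized in a few lines of Python. The sketch is illustrative only: the helpers ``split_operator`` and ``operators_can_bond`` are hypothetical, they assume each operator is given without its square brackets as a symbol optionally followed by an alphanumeric label (for example ``$``, ``$a`` or ``>1``), they assume an unlabeled operator only pairs with another unlabeled one, and they ignore bond orders.

```python
# Hypothetical illustration of the pairing rules; not the cgsmiles API.
# Each operator symbol mapped to the symbol it may pair with.
PARTNER = {"$": "$", ">": "<", "<": ">", "!": "!"}


def split_operator(operator):
    """Split an operator such as '$a' or '>1' into (symbol, label)."""
    symbol, label = operator[:1], operator[1:]
    if symbol not in PARTNER or not (label == "" or label.isalnum()):
        raise ValueError(f"not a bonding operator: {operator!r}")
    return symbol, label


def operators_can_bond(op_a, op_b):
    """Return True if two operators in neighboring fragments may bond."""
    symbol_a, label_a = split_operator(op_a)
    symbol_b, label_b = split_operator(op_b)
    # '$' pairs with '$', '>' with '<' and '!' (shared atom) with '!';
    # labels must match exactly, and an empty label only matches an empty one
    return PARTNER[symbol_a] == symbol_b and label_a == label_b


print(operators_can_bond("$", "$"))    # True
print(operators_can_bond(">", ">"))    # False
print(operators_can_bond("<1", ">1"))  # True
print(operators_can_bond("$a", "$b"))  # False
```

Encoding the complement as a lookup table keeps the directed and undirected cases in one rule: ``$`` and ``!`` are simply their own partners.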
+ + +Bonding Descriptors Lookup Table +-------------------------------- +The table below summarizes all bonding descriptors used in CGSmiles: + ++----------------+---------------------------+--------------------------------------------------------------------+ +| Descriptor | Symbol | Description | ++================+===========================+====================================================================+ +| Indiscriminate | `[$]` | Connects to any matching `$` descriptor. | ++----------------+---------------------------+--------------------------------------------------------------------+ +| Forward bond | `[>]` | Must connect with a bonding descriptor of type `<`. | ++----------------+---------------------------+--------------------------------------------------------------------+ +| Backward bond | `[<]` | Must connect with a bonding descriptor of type `>`. | ++----------------+---------------------------+--------------------------------------------------------------------+ +| Alphanumeric | `[descriptor]alphanumeric`| Adds specificity to descriptors, requiring exact matches. | ++----------------+---------------------------+--------------------------------------------------------------------+ +| Squash | `[!]` | Indicates overlapping mappings; connected atoms are identical. | ++----------------+---------------------------+--------------------------------------------------------------------+ diff --git a/docs/source/syntax/introduction.rst b/docs/source/syntax/introduction.rst new file mode 100644 index 0000000..e17881c --- /dev/null +++ b/docs/source/syntax/introduction.rst @@ -0,0 +1,30 @@ +Introduction +============ + +The CGSmiles line notation encodes arbitrary resolutions of molecules and +defines the conversion between these resolutions unambiguously. Each +resolution is explicitly defined, and multiple resolutions may be layered +together using this notation.
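To illustrate how resolutions are layered within a single string, the sketch below splits a CGSmiles string into its resolutions and turns a fragment resolution into a dictionary of named fragments. It is a hypothetical helper, not the parser used by cgsmiles, and it assumes that consecutive resolutions are joined by ``}.{`` (a bare period can also denote a zero-order bond inside a graph) and that fragment definitions of the form ``#NAME=...`` are separated by commas outside square brackets.

```python
# Hypothetical sketch, not the parser used by cgsmiles.


def split_resolutions(cgsmiles_str):
    """Split '{base}.{fragments-1}.{fragments-2}' into resolution strings."""
    if not (cgsmiles_str.startswith("{") and cgsmiles_str.endswith("}")):
        raise ValueError("every resolution must be enclosed in curly braces")
    # a bare '.' may also be a zero-order bond, so only '}.{' separates levels
    return cgsmiles_str[1:-1].split("}.{")


def split_fragments(resolution):
    """Map fragment names to graph strings, e.g. {'OH': '[$]O', ...}."""
    fragments, current, depth = {}, "", 0
    for char in resolution + ",":  # the trailing comma flushes the last entry
        if char == "," and depth == 0:
            # only the first '=' separates the name; bond orders may follow
            name, _, graph = current.partition("=")
            fragments[name.lstrip("#")] = graph
            current = ""
            continue
        # commas inside square brackets (e.g. '[*;s=C,0]') are not separators
        depth += {"[": 1, "]": -1}.get(char, 0)
        current += char
    return fragments


base, *levels = split_resolutions("{[#PS]|5}.{#PS=[$]CC[$](c1ccccc1)}")
print(base)                        # [#PS]|5
print(split_fragments(levels[0]))  # {'PS': '[$]CC[$](c1ccccc1)'}
```

Splitting only at ``}.{`` keeps zero-order bonds such as those in the Martini glucose base graph intact.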
+ +At any resolution, a molecule can be expressed as a graph. In this graph, +the nodes correspond to (groups of) atoms, such as residues in a protein or +polymer, which represent a coarser resolution compared to the next (all-atom) +representation. Edges in the graph describe chemical connections between +these (groups of) atoms. + +With this premise, the first resolution of the CGSmiles notation describes +the molecule graph at the coarsest level. Subsequent resolutions define +fragments that specify how each node is represented at the next finer +resolution (e.g. residue to coarse-grained beads, or coarse-grained beads +to atoms). Each resolution is enclosed in curly braces ``{}``, and consecutive +resolutions are separated by a period, as shown below: + +.. code-block:: none + + {coarsest-graph}.{fragments-resolution-1}.{fragments-resolution-2} + +In the remainder of this section, we first explain the syntax to describe +a general graph, which can represent a molecule at any resolution in +CGSmiles. Subsequently, the description is extended to define fragments. +Finally, we show how to deal with special issues that can arise when +converting a coarse resolution graph to an atomic representation. diff --git a/docs/source/syntax/multiple_resolutions.rst b/docs/source/syntax/multiple_resolutions.rst new file mode 100644 index 0000000..c879c5e --- /dev/null +++ b/docs/source/syntax/multiple_resolutions.rst @@ -0,0 +1,91 @@ +Layering of Resolutions +======================= + +CGSmiles enables the representation of molecular graphs at arbitrary resolutions +and their connection to progressively finer resolutions, allowing for the +hierarchical layering of multiple levels of detail. + +Basic Syntax Features +--------------------- + +Base Graph & Resolutions +^^^^^^^^^^^^^^^^^^^^^^^^ +The notation starts with the coarsest representation of the system – the base +graph. This graph is enclosed in curly braces.
Each additional resolution is +represented as a comma-separated list of fragment graphs, also enclosed in curly braces and +separated from the preceding resolution graph by a period. If the final resolution +graph is at the atomic level, either CGSmiles or OpenSMILES syntax can be used +to describe the fragment graph. This dual approach allows seamless conversion to +atomistic resolution using established standards, while also supporting +intermediate coarse-grained representations. + +.. code-block:: none + + {coarsest-graph}.{fragments-resolution-1}.{fragments-resolution-2} + +Advanced Syntax Features +------------------------ + +Linearizing Rings +^^^^^^^^^^^^^^^^^ +Rings at the atomistic resolution can often be mapped into linear structures +at the CG level, a common practice in chemically specific force fields such +as Martini. In the CGSmiles notation, bond orders at the coarser resolution are +used to describe such a case. + +For example, cyclohexane is represented at the Martini 3 level with a bond +order of 2. This indicates that at the next finer resolution level, two bonds +must connect the atoms corresponding to the two CG nodes. + +.. code-block:: none + + Martini 3 cyclohexane + {[#SC3]=[#SC3]}.{#SC3=[$]CCC[$]} + +This approach also extends to more complex cases, such as fused rings that are +split so that three or more bonds connect the CG nodes. Each additional ring increases +the bond order by one. + +.. code-block:: none + + Naphthalene split into two particles + {[#A]#[#A]}.{#A=[$]CCC[$]CC[$]} + +The current CGSmiles syntax supports bond orders up to 4, which defines the +maximum number of ring connections that can be represented linearly. + +Virtual Edges +^^^^^^^^^^^^^ +In certain scenarios, a CG model might include interacting particles that do not +correspond to any finer-resolution nodes or atoms. For example, at the Martini 3 +resolution, glucose is represented by three CG particles splitting the sugar ring +and one additional virtual particle.
The TC4 bead captures the hydrophobic +interactions at the ring center but lacks any corresponding fragments at finer +resolution. To accommodate such particles, the CGSmiles notation employs zero +bond order edges, referred to as virtual edges. + +.. code-block:: none + + Martini 3 Glucose + {[#SP4r]1.2[#SP4r].3[#SP1r]1.[#TC4]23}.{#SP4r=OC[$]C[$]O,#SP1r=[$]OC[$]CO} + +Virtual edges are ignored when establishing connections, and any particle with only +virtual edges is excluded entirely when transitioning to finer resolutions. We +note that these virtual edges and virtual particles are not to be confused with +GROMACS virtual sites. A virtual site in GROMACS describes how a particle's +coordinates are constructed. If a GROMACS virtual site represents real atoms or CG +particles, it is treated as a regular node rather than a virtual one. + +Overloading Wildcards +^^^^^^^^^^^^^^^^^^^^^ +In certain cases, a single CG graph might describe more than one molecule at +the fine-grained resolution because of the loss of resolution at the CG level. +Examples are Martini lipids such as POPC. POPC can describe lipids with a +tail length of 16 or 18 carbons and thus represents at least four molecules +when accounting for the position of the double bond. To capture this feature, +CGSmiles allows overloading the wildcard (``*``) syntax using annotations. In +OpenSMILES, a wildcard means that any atom can be placed at the wildcard position. +To restrict a wildcard to a selection of atoms, CGSmiles allows annotating it with the +select keyword, abbreviated as ``s``. Thus, a tail bead in POPC could be written as +``C1=CCCC[*;s=C,0][*;s=C,0]``. Note that the current molecule resolver cannot yet +handle wildcard overloading.
diff --git a/requirements-docs.txt b/requirements-docs.txt index 0db6a61..5e05624 100644 --- a/requirements-docs.txt +++ b/requirements-docs.txt @@ -2,6 +2,8 @@ sphinx >= 1.8.0 sphinxcontrib-apidoc pbr setuptools >= 30.3.0 - +furo numpy networkx ~= 2.0 +scipy +matplotlib diff --git a/requirements-tests.txt b/requirements-tests.txt index 97524e2..96df1e1 100644 --- a/requirements-tests.txt +++ b/requirements-tests.txt @@ -3,3 +3,5 @@ coverage pytest-cov pylint codecov +scipy +matplotlib diff --git a/setup.cfg b/setup.cfg index 6a725c0..06a5aad 100644 --- a/setup.cfg +++ b/setup.cfg @@ -5,7 +5,7 @@ universal = 1 name = cgsmiles author = Fabian Grunewald author_email = fgrunewald.science@gmail.com -description_file = README.md -description-content-type = text/markdown; charset=UTF-8 +description_file = README.rst +description-content-type = text/x-rst; charset=UTF-8 url = https://github.com/gruenewald-lab/CGsmiles license = undefined