Updatedocs #45

Merged
merged 7 commits into from
Jan 9, 2025
31 changes: 19 additions & 12 deletions README.rst
@@ -42,6 +42,25 @@ the different use cases can be found in this documentation. If you are here from
one of the packages using CGSmiles check out the GettingStarted section to learn
the syntax.

Installation
============

The easiest way to install **cgsmiles** is using pip:

.. code:: bash

    pip install git+https://github.com/gruenewald-lab/CGsmiles.git

In the future we will also distribute it through the PyPI
package index, but that is currently not supported. Note that the drawing module
depends on the `scipy <https://scipy.org>`__ and `matplotlib <https://matplotlib.org>`__
packages. These need to be installed before the module can be used.

.. code:: bash

    pip install scipy
    pip install matplotlib
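
Reviewer note: one way to confirm that the two optional packages are importable before using the drawing module is sketched below. The helper name `missing_drawing_dependencies` is hypothetical and not part of cgsmiles; it relies only on the standard library's `importlib.util.find_spec`.

```python
from importlib.util import find_spec


def missing_drawing_dependencies(required=("scipy", "matplotlib")):
    """Return the names in `required` that cannot be imported."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in required if find_spec(name) is None]


missing = missing_drawing_dependencies()
if missing:
    print("Install before using the drawing module: pip install " + " ".join(missing))
```

The check does not import the packages themselves, so it stays cheap even when both are present.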

Examples
========

@@ -65,18 +84,6 @@ Martini 3 Benzene
# Draw molecule at different resolutions
ax, pos = draw_molecule(mol_graph)

Installation
============

The easiest way to install **cgsmiles** is using pip:

.. code:: bash

    pip install git+https://github.com/gruenewald-lab/CGsmiles.git

In the future we will also distribute it through the PyPI
package index, but that is currently not supported.

Related Tools
=============

2 changes: 1 addition & 1 deletion cgsmiles/dialects.py
@@ -139,7 +139,7 @@ def create_dialect(default_attributes,
# KNOWN DIALECTS #
##########################################################
# this one is for global use
# it is the base CGSmiles dialect
# it is the base CGsmiles dialect
CGSMILES_DEFAULT_DIALECT = create_dialect({"fragname": (None, str),
"q": (0.0, float),
"w": (1.0, float)})
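
Reviewer note: the mapping passed to `create_dialect` appears to pair each attribute name with a `(default, type)` tuple. The sketch below shows one plausible way such a table could be used to fill in defaults and coerce values. It is an assumption for illustration only, not the implementation of `create_dialect`; the names `DEFAULT_ATTRIBUTES` and `apply_defaults` are hypothetical.

```python
# Same shape as the mapping in the diff: attribute name -> (default, type)
DEFAULT_ATTRIBUTES = {"fragname": (None, str),
                      "q": (0.0, float),
                      "w": (1.0, float)}


def apply_defaults(raw_attributes, defaults=DEFAULT_ATTRIBUTES):
    """Fill in missing attributes and coerce given ones to the declared type."""
    unknown = set(raw_attributes) - set(defaults)
    if unknown:
        raise KeyError(f"unknown attributes: {sorted(unknown)}")
    resolved = {}
    for name, (default, attr_type) in defaults.items():
        if name in raw_attributes:
            # coerce, e.g. the string "1" becomes the float 1.0
            resolved[name] = attr_type(raw_attributes[name])
        else:
            resolved[name] = default
    return resolved


print(apply_defaults({"fragname": "PEO", "q": "1"}))
# prints {'fragname': 'PEO', 'q': 1.0, 'w': 1.0}
```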
4 changes: 2 additions & 2 deletions cgsmiles/pysmiles_utils.py
@@ -77,7 +77,7 @@ def rebuild_h_atoms(mol_graph,
"show delocalization-induced molecular equivalency and thus "
"is not considered aromatic. For example, 4-methyl imidazole "
"is often written as [nH]1cc(nc1)C, but should be written as "
"[NH]1C=C(N=C1)C. A corresponding CGSmiles string would be "
"[NH]1C=C(N=C1)C. A corresponding CGsmiles string would be "
"{[#A]1[#B][#C]1}.{#A=[>][<]N,#B=[$]N=C[>],#C=[$]C(C)=C[<]}")
raise SyntaxError(msg)
nx.set_node_attributes(mol_graph, 0, 'hcount')
@@ -126,7 +126,7 @@ def read_fragment_smiles(smiles_str,
ez_isomers={},
attributes={}):
"""
Read a smiles_str corresponding to a CGSmiles fragment and
Read a smiles_str corresponding to a CGsmiles fragment and
annotate bonding descriptors, isomers, as well as any other
attributes.

14 changes: 7 additions & 7 deletions cgsmiles/read_fragments.py
@@ -104,21 +104,21 @@ def collect_ring_number(smile_iter, token, node_count, rings):

def strip_bonding_descriptors(fragment_string):
"""
Processes a CGSmiles fragment string by
Processes a CGsmiles fragment string by
stripping the bonding descriptors and storing
them in a dict with reference to the atom they
refer to. Furthermore, a cleaned SMILES or CGSmiles
refer to. Furthermore, a cleaned SMILES or CGsmiles
string is returned.

Parameters
----------
fragment_string: str
a CGSmiles fragment string
a CGsmiles fragment string

Returns
-------
str:
a canonical SMILES or CGSmiles string
a canonical SMILES or CGsmiles string
dict:
a dict mapping bonding descriptors
to the nodes within the string
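
Reviewer note: the behaviour described in this docstring can be illustrated with a much simplified sketch. It assumes single-letter atoms, no bracket atoms such as `[NH]`, and that a descriptor belongs to the atom written directly before it (or to the first atom when it leads the string). The function name `strip_descriptors_sketch` is hypothetical, and this is not the code of `strip_bonding_descriptors`.

```python
import re
from collections import defaultdict

# a bracket whose content starts with one of the descriptor symbols $, < or >
DESCRIPTOR = re.compile(r"\[([$<>][^\]]*)\]")


def strip_descriptors_sketch(fragment_string):
    """Return the cleaned string and a dict atom index -> list of descriptors."""
    cleaned = []
    descriptors = defaultdict(list)
    atom_count = 0
    pos = 0
    while pos < len(fragment_string):
        match = DESCRIPTOR.match(fragment_string, pos)
        if match:
            # attach to the preceding atom, or to atom 0 if none was read yet
            anchor = max(atom_count - 1, 0)
            descriptors[anchor].append(match.group(1))
            pos = match.end()
            continue
        char = fragment_string[pos]
        if char.isalpha():
            # simplification: every letter counts as one atom
            atom_count += 1
        cleaned.append(char)
        pos += 1
    return "".join(cleaned), dict(descriptors)


print(strip_descriptors_sketch("[>]COC[<]"))
# prints ('COC', {0: ['>'], 2: ['<']})
```

Branches, ring closures and two-letter elements would need proper tokenising, which this sketch deliberately leaves out.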
@@ -255,19 +255,19 @@ def fragment_iter(fragment_str, all_atom=True):

def read_fragments(fragment_str, all_atom=True, fragment_dict=None):
"""
Collects the fragments defined in a CGSmiles fragment string
Collects the fragments defined in a CGsmiles fragment string
as networkx.Graph and returns a dict of them. Bonding descriptors
are annotated as node attributes.

Parameters
----------
fragment_str: str
string using CGSmiles fragment syntax
string using CGsmiles fragment syntax

all_atom: bool
If the fragment strings are all-atom following
the OpenSmiles syntax. Default is True but if
set to False fragments follow the CGSmiles
set to False fragments follow the CGsmiles
syntax.

fragment_dict: dict
34 changes: 18 additions & 16 deletions cgsmiles/resolve.py
@@ -14,7 +14,7 @@
def compatible(left, right, legacy=False):
"""
Check bonding descriptor compatibility according
to the CGSmiles syntax conventions. With legacy
to the CGsmiles syntax conventions. With legacy
the BigSmiles convention can be used.

Parameters
@@ -87,27 +87,27 @@ def match_bonding_descriptors(source, target, bond_attribute="bonding", legacy=F

class MoleculeResolver:
"""
Resolve the molecule(s) described by a CGSmiles string and return a networkx.Graph
Resolve the molecule(s) described by a CGsmiles string and return a networkx.Graph
of the molecule.

First, this class has to be initiated using one of three class construction
methods. When trying to read a CGSmiles string always use the first method.
methods. When trying to read a CGsmiles string always use the first method.
The other constructors can be used in case fragments or the lowest
resolution molecule are defined by graphs that come from elsewhere.

`self.from_string`: use when fragments and lowest resolution are
described in one CGSmiles string.
`self.from_graph`: use when fragments are described by CGSmiles
described in one CGsmiles string.
`self.from_graph`: use when fragments are described by CGsmiles
strings but the lowest resolution is given
as nx.Graph
`self.from_fragment_dicts`: use when fragments are given as nx.Graphs
and the lowest resolution is provided as
CGSmiles string
CGsmiles string

Once the `MoleculeResolver` is initiated you can call the `resolve_iter` to
loop over the different levels of resolution. The resolve iter will always
return the previous lower resolution graph as well as the current higher
resolution graph. For example, if the CGSmiles string describes a monomer
resolution graph. For example, if the CGsmiles string describes a monomer
sequence of a regular polymer, the lower resolution graph will be the graph
of this monomer sequence and the higher resolution graph the full molecule.

@@ -129,6 +129,7 @@ class MoleculeResolver:
---------------------
Alternatively, one could have gotten the block level graph from somewhere
else defined as `nx.Graph` in that case:

>>> # the string only defines the fragments
>>> cgsmiles_str = "{#B1=[#PEO]|4,#B2=[#PE]|2}.{#PEO=[>]COC[<],#PE=[>]CC[<]}"
>>> block_graph = nx.Graph()
@@ -137,10 +138,11 @@ class MoleculeResolver:
>>> resolver = MoleculeResolver.from_graph(cgsmiles_str, block_graph)

Finally, there is the option of having the fragments from elsewhere for
example a library. Then only the graph defined as CGSmiles string. In this
example a library. Then only the graph defined as CGsmiles string. In this
case the `from_fragment_dicts` method can be used. Please note that the
fragment graphs need to have the following attributes as a graph returned
by the `cgsmiles.read_fragments` function.

>>> fragment_dicts = []
>>> for frag_string in ["{#B1=[#PEO]|4,#B2=[#PE]|2}", "{#PEO=[>]COC[<],#PE=[>]CC[<]}"]:
>>> frag_dict = read_fragments(frag_string)
@@ -178,11 +180,11 @@ def __init__(self,
legacy: bool
which syntax convention to use for matching the bonding descriptors.
Legacy syntax adheres to the BigSmiles convention. Default syntax
adheres to CGSmiles convention where bonding descriptors '$' match
adheres to CGsmiles convention where bonding descriptors '$' match
with every '$' and every '<' matches every '>'. With the BigSmiles
convention an alphanumeric string may be provided that distinguishes
these connectors. For example, '$A' would not match '$B'. However,
such use cases should be rare and the CGSmiles convention facilitates
such use cases should be rare and the CGsmiles convention facilitates
usage of bonding descriptors in the Sampler where the labels are used
to assign different probabilities.
"""
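
Reviewer note: the two matching conventions described in this docstring can be sketched as follows. This follows the wording of the docstring only and is not the code of `compatible`. The function name `descriptors_match` is hypothetical, and descriptors are assumed to be plain strings made of a symbol (`$`, `<` or `>`) followed by an optional label, with no bond-order annotation.

```python
def descriptors_match(left, right, legacy=False):
    """Toy check whether two bonding descriptors may form a bond."""
    kind_left, label_left = left[0], left[1:]
    kind_right, label_right = right[0], right[1:]
    # '$' pairs with '$'; '<' pairs with '>' in either order
    if {kind_left, kind_right} not in ({"$"}, {"<", ">"}):
        return False
    if legacy:
        # BigSmiles-style matching also requires identical labels
        return label_left == label_right
    # default convention: labels do not affect matching
    return True


print(descriptors_match("$A", "$B"))               # prints True
print(descriptors_match("$A", "$B", legacy=True))  # prints False
print(descriptors_match("<", "<"))                 # prints False
```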
@@ -199,15 +201,15 @@ def __init__(self,
@staticmethod
def read_fragment_strings(fragment_strings, last_all_atom=True):
"""
Read a list of CGSmiles fragment_strings and return a list
Read a list of CGsmiles fragment_strings and return a list
of dicts with the fragment graphs. If `last_all_atom` is
True then pysmiles is used to read the last fragment string
provided in the list.

Parameters
----------
fragment_strings: list[str]
list of CGSmiles fragment strings
list of CGsmiles fragment strings
last_all_atom: bool
if the last string in the list is an all atom string
and should be read using pysmiles.
@@ -348,7 +350,7 @@ def squash_atoms(self):

def resolve(self):
"""
Resolve a CGSmiles string once and return the next resolution.
Resolve a CGsmiles string once and return the next resolution.
"""
# check if this is an all-atom level resolution
all_atom = (self.resolution_counter == self.resolutions - 1 and self.last_all_atom)
@@ -429,7 +431,7 @@ def from_string(cls, cgsmiles_str, last_all_atom=True, legacy=False):
legacy: bool
which syntax convention to use for matching the bonding descriptors.
Legacy syntax adheres to the BigSmiles convention. Default syntax
adheres to CGSmiles convention. A more detailed explanation can be
adheres to CGsmiles convention. A more detailed explanation can be
found in the MoleculeResolver.__init__ method.

Returns
@@ -466,7 +468,7 @@ def from_graph(cls, cgsmiles_str, meta_graph, last_all_atom=True, legacy=False):
legacy: bool
which syntax convention to use for matching the bonding descriptors.
Legacy syntax adheres to the BigSmiles convention. Default syntax
adheres to CGSmiles convention. A more detailed explanation can be
adheres to CGsmiles convention. A more detailed explanation can be
found in the MoleculeResolver.__init__ method.

Returns
@@ -507,7 +509,7 @@ def from_fragment_dicts(cls, cgsmiles_str, fragment_dicts, last_all_atom=True, l
legacy: bool
which syntax convention to use for matching the bonding descriptors.
Legacy syntax adheres to the BigSmiles convention. Default syntax
adheres to CGSmiles convention. A more detailed explanation can be
adheres to CGsmiles convention. A more detailed explanation can be
found in the MoleculeResolver.__init__ method.

Returns
6 changes: 3 additions & 3 deletions cgsmiles/sample.py
@@ -42,12 +42,12 @@ def _set_bond_order_defaults(bonding):

class MoleculeSampler:
"""
Given a fragment string in CGSmiles format and probabilities for residues
Given a fragment string in CGsmiles format and probabilities for residues
to occur, return a random molecule with target molecular weight.

First, this class has to be initiated using the class construction
method `from_string`, which makes sure to read and resolve the fragment
graphs provided in the CGSmiles string.
graphs provided in the CGsmiles string.

Once the `MoleculeSampler` is initiated you can call the `sampler` method
in order to generate a new random polymer molecule from the fragment string
@@ -124,7 +124,7 @@ class MoleculeSampler:
can be provided.

For example, To generate a bottle brush polymer that has PMA in the backbone
and PEG as side-chain terminated with an OH group the following CGSmiles string
and PEG as side-chain terminated with an OH group the following CGsmiles string
in combination with the above mentioned probabilities can be provided.

Note that in this case we declare '$A' and '$B' to be terminal bonding
2 changes: 1 addition & 1 deletion cgsmiles/tests/test_utils.py
@@ -6,7 +6,7 @@
"show delocalization-induced molecular equivalency and thus "
"is not considered aromatic. For example, 4-methyl imidazole "
"is often written as [nH]1cc(nc1)C, but should be written as "
"[NH]1C=C(N=C1)C. A corresponding CGSmiles string would be "
"[NH]1C=C(N=C1)C. A corresponding CGsmiles string would be "
"{[#A]1[#B][#C]1}.{#A=[>][<]N,#B=[$]N=C[>],#C=[$]C(C)=C[<]}")

@pytest.mark.parametrize('frag_str, hatoms_ref, error_type, err_msg', (
16 changes: 8 additions & 8 deletions cgsmiles/write_cgsmiles.py
@@ -11,7 +11,7 @@
def format_node(molecule, current):
"""
Format a node from a `molecule` graph according to
the CGSmiles syntax. The attribute fragname has to
the CGsmiles syntax. The attribute fragname has to
be set for the `current` node.

Parameters
@@ -68,7 +68,7 @@ def write_graph(molecule, smiles_format=False, default_element='*'):
Returns
-------
str
The CGSmiles string describing `molecule`.
The CGsmiles string describing `molecule`.
"""
start = min(molecule)
dfs_successors = nx.dfs_successors(molecule, source=start)
@@ -161,19 +161,19 @@ def write_graph(molecule, smiles_format=False, default_element='*'):

def write_cgsmiles_graph(molecule):
"""
Write a CGSmiles graph sans fragments at
Write a CGsmiles graph sans fragments at
different resolution.

Parameters
----------
molecule: networkx.Graph
a molecule where each node has a fragname attribute
that is used as name in the CGSmiles string.
that is used as name in the CGsmiles string.

Returns
-------
str
the CGSmiles string
the CGsmiles string
"""

cgsmiles_str = write_graph(molecule)
@@ -192,7 +192,7 @@ def write_cgsmiles_fragments(fragment_dict, smiles_format=True):
a dict of fragment graphs
smiles_format: bool
write all atom SMILES if True (default) otherwise
write CGSmiles
write CGsmiles

Returns
-------
@@ -208,7 +208,7 @@ def write_cgsmiles_fragments(fragment_dict, smiles_format=True):

def write_cgsmiles(molecule_graph, fragments, last_all_atom=True):
"""
Write a CGSmiles string given a low resolution molecule graph
Write a CGsmiles string given a low resolution molecule graph
and any number of higher resolutions provided as fragment dicts.

Parameters
@@ -222,7 +222,7 @@ def write_cgsmiles(molecule_graph, fragments, last_all_atom=True):
Returns
-------
str
CGSmiles string
CGsmiles string
"""
final_str = write_cgsmiles_graph(molecule_graph)
for layer, fragment in enumerate(fragments):