Skip to content

Commit

Permalink
update ssi tutorial
Browse files Browse the repository at this point in the history
  • Loading branch information
martinvoegele committed Dec 2, 2023
1 parent cc41926 commit c5b7d5a
Show file tree
Hide file tree
Showing 2 changed files with 164 additions and 71 deletions.
233 changes: 163 additions & 70 deletions docs/tut-6-ssi.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,113 +13,206 @@ the function ``ssi_feature_analysis()``.
The Co-SSI statistic, which tells how much the communication between two features
depends on the ensemble condition, can be calculated using ``cossi_featens_analysis()``.

Feature - Feature Communication
-------------------------------
Features and States
-------------------

.. code:: python
import os
import numpy as np
from pensa.features import \
read_structure_features, \
read_water_features, \
get_multivar_res, \
sort_features
from pensa.statesinfo import \
get_discrete_states
from pensa.comparison import \
ssi_ensemble_analysis, \
ssi_feature_analysis, \
cossi_featens_analysis
First, load the structural features as described in the previous tutorial:

.. code:: python
sim_a_rec = get_structure_features("traj/condition-a_receptor.gro",
"traj/condition-a_receptor.xtc")
sim_b_rec = get_structure_features("traj/condition-b_receptor.gro",
"traj/condition-b_receptor.xtc")
sim_a_rec = read_structure_features(
"traj/condition-a_receptor.gro",
"traj/condition-a_receptor.xtc"
)
sim_b_rec = read_structure_features(
"traj/condition-b_receptor.gro",
"traj/condition-b_receptor.xtc"
)
sim_a_rec_feat, sim_a_rec_data = sim_a_rec
sim_b_rec_feat, sim_b_rec_data = sim_b_rec
Then run the SSI feature-feature analysis in the same manner the other statistics.
Then combine all backbone torsions from one residue to one multivariate feature.
You can also analyze sidechain torsions, just replace ``bb`` by ``sc``. Analysis of
sidechain torsions takes much more time to run though so for this tutorial, we look
only at backbone torsions.

.. code:: python
bbtors_res_feat_a, bbtors_res_data_a = get_multivar_res(
sim_a_rec_feat['bb-torsions'], sim_a_rec_data['bb-torsions']
)
bbtors_res_feat_b, bbtors_res_data_b = get_multivar_res(
sim_b_rec_feat['bb-torsions'], sim_b_rec_data['bb-torsions']
)
From those residue-based multivariate features, we now determine the state boundaries.

.. code:: python
names_bbtors, ssi_featfeat_bbtors = ssi_feature_analysis(sim_a_rec_feat, sim_b_rec_feat,
sim_a_rec_data, sim_b_rec_data,
torsions='bb', verbose=True)
The output of ``ssi_feature_analysis()`` produces an array for the SSI between
all features, with the names_bbtors referring to both feature names. This result
can be visualized in a two-dimensional representation similar to the distances
using ``distances_visualization()``
bbtors_states = get_discrete_states(
bbtors_res_data_a, bbtors_res_data_b
)
Let's also featurize the water cavities in the system, using the grid we generated in
the preprocessing tutorial.

.. code:: python
distances_visualization(names_bbtors, ssi_featfeat_bbtors,
"plots/receptor_ssi-bbdist.pdf",
vmin = 0.0, vmax = max(ssi_featfeat_bbtors),
cbar_label='SSI')
grid = "traj/water_grid_ab_OH2_density.dx"
water_feat_a, water_data_a = read_water_features(
"traj/condition-a_water.gro", "traj/condition-a_water_aligned.xtc",
top_waters = 5, atomgroup = "OH2", grid_input = grid
)
water_feat_b, water_data_b = read_water_features(
"traj/condition-b_water.gro", "traj/condition-b_water.xtc",
top_waters = 5, atomgroup = "OH2", grid_input = grid
)
The Co-SSI feature-feature-ensemble analysis is done in the same manner.
Just like we did for torsions, we can now determine the discrete states for the orientation
of the water molecules in the pockets.

.. code:: python
names_bbtors, ssi_featfeat_bbtors, cossi_bbtors = cossi_featens_analysis(sim_a_rec_feat, sim_b_rec_feat,
sim_a_rec_data, sim_b_rec_data,
torsions='bb', verbose=True)
The output of ``cossi_featens_analysis()`` produces an array for the SSI and the
Co-SSI between all features, with the names_bbtors producing the same output as
the feature-feature SSI. These results can again be visualized in a 2D representation.
It is worth noting, as ``cossi_featens_analysis()`` also computes SSI, you need not
run ``ssi_feature_analysis()`` in addition.
water_states = get_discrete_states(
water_data_a['WaterPocket_Distr'],
water_data_b['WaterPocket_Distr'],
discretize='gaussian', pbc=True
)
Water occupancy (is water in the pocket or not?) is described as a binary feature with the
values 0 or 1. We can thus define the state boundaries manually.

.. code:: python
water_occup_states = [[[-0.1, 0.5, 1.1]]] * len(water_states)
Water pocket information transfer
---------------------------------
Internal water pockets and ion/atom pockets can also receive information about
the ensemble condition. The State Specific Information (SSI) analysis can be
applied to investigate these distributions in the same manner. First we need
to featurize the water pockets using the combined ensemble water density explained
in the featurization tutorial.
Water Pockets
-------------

We start by comparing the occupancy between the two conditions, similar to what we did in the
comparison tutorial:

.. code:: python
struc = "traj/cond-a_water.gro"
xtc = "dens/cond-a_wateraligned.xtc"
grid = "dens/ab_grid_OH2_density.xtc"
water_feat_a, water_data_a = get_water_features(structure_input = struc,
xtc_input = xtc,
top_waters = 5,
atomgroup = "OH2",
grid_input = grid)
water_names, water_ssi = ssi_ensemble_analysis(
water_feat_a['WaterPocket_OccupDistr'], water_feat_b['WaterPocket_OccupDistr'],
water_data_a['WaterPocket_OccupDistr'], water_data_b['WaterPocket_OccupDistr'],
water_occup_states, verbose=True, h2o=False, pbc=False
)
struc = "traj/cond-b_water.gro"
xtc = "traj/cond-b_water.xtc"
grid = "dens/ab_grid_OH2_density.xtc"
water_feat_b, water_data_b = get_water_features(structure_input = struc,
xtc_input = xtc,
top_waters = 5,
atomgroup = "OH2",
grid_input = grid)
Information shared between water pockets and the ensemble condition is then
quantified using ``ssi_ensemble_analysis()``. We set ``torsions=None`` for
waters.
Except for water site O2, there are not many differences in occupancy, so let's have a closer
look and compare the orientation of the water molecules between the two conditions:

.. code:: python
data_names, data_ssi = ssi_ensemble_analysis(water_feat_a['WaterPocket_Distr'],water_feat_b['WaterPocket_Distr'],
water_data_a['WaterPocket_Distr'],water_data_b['WaterPocket_Distr'],
torsions = None,
verbose=True)
water_names, water_ssi = ssi_ensemble_analysis(
water_feat_a['WaterPocket_Distr'], water_feat_b['WaterPocket_Distr'],
water_data_a['WaterPocket_Distr'], water_data_b['WaterPocket_Distr'],
water_states, verbose=True, h2o=True
)
Additionally, we can see if the pocket occupancy (i.e. the presence/absence of
water at the site) shares SSI - ``pocket_occupancy=True``. Currently this is only enabled with
``ssi_ensemble_analysis``. We need to turn off the periodic boundary conditions
as the distributions are no longer periodic - ``pbc=False``.
Beyond comparing distributions, we can also quantify the amount of feature-feature communication
between the water sites. The corresponding function looks similar to the ensemble comarison above,
however it calculates the amount of shared information between each pair of residues.

.. code:: python
data_names, data_ssi = ssi_ensemble_analysis(water_feat_a['WaterPocket_OccupDistr'],water_feat_b['WaterPocket_OccupDistr'],
water_data_a['WaterPocket_OccupDistr'],water_data_b['WaterPocket_OccupDistr'],
pocket_occupancy=True, pbc=False, verbose=True)
water_pairs_names, water_pairs_ssi = ssi_feature_analysis(
water_feat_a['WaterPocket_Distr'], water_feat_b['WaterPocket_Distr'],
water_data_a['WaterPocket_Distr'], water_data_b['WaterPocket_Distr'],
water_states, verbose=True, h2o=True
)
This function produces an array for the SSI between all features. This array is two-dimensional,
so we visualize its values using a heat map:

The same protocol applies for atom/ion pockets.
.. code:: python
from pensa.comparison import resnum_heatmap, pair_features_heatmap
pair_features_heatmap(
water_pairs_names, water_pairs_ssi,
"plots/water-pairs_ssi.pdf",
vmin = 0.0, vmax = 1.,
cbar_label='SSI',
separator=' & '
)
Water and Torsions Combined
---------------------------

Now we want to investigate information flow between more than one type of features.
To do so, we combine the water sites and the torsions:

.. code:: python
all_feat_a = water_feat_a['WaterPocket_Distr'] + bbtors_res_feat_a
all_feat_b = water_feat_b['WaterPocket_Distr'] + bbtors_res_feat_b
all_data_a = np.array(list(water_data_a['WaterPocket_Distr']) + list(bbtors_res_data_a), dtype=object)
all_data_b = np.array(list(water_data_b['WaterPocket_Distr']) + list(bbtors_res_data_b), dtype=object)
all_states = water_states + bbtors_states
Note that we only use backbone torsions here to minimize computational effort.
Analysis of sidechain torsions (or of both combined) can often deliver more scientific insigts.

As we did for the water sites above, we now calculate the SSI for all combined feature-feature pairs.

.. code:: python
all_pairs_names, all_pairs_ssi = ssi_feature_analysis(
all_feat_a, all_feat_b,
all_data_a, all_data_b,
all_states, verbose=True
)
The number of pairs for an entire protein is enormous so we determine those with
the highest SSI. To alleviate the computational effort for the sort function, we first
filter the pairs by a threshold SSI of 0.5:

.. code:: python
relevant = np.abs(all_pairs_ssi) > 0.5
not_self = np.array([name.split(' & ')[0] != name.split(' & ')[1] for name in all_pairs_names])
relevant *= not_self
argrelev = np.argwhere(relevant).flatten()
all_relevant_pairs_names = [all_pairs_names[i] for i in argrelev]
all_relevant_pairs_ssi = all_pairs_ssi[relevant]
Then we run the actual sorting by SSI.

.. code:: python
sort_features(all_relevant_pairs_names, all_relevant_pairs_ssi)
The Co-SSI feature-feature-ensemble analysis is done in the same manner.

.. code:: python
all_new_pairs_names, all_new_pairs_ssi, all_new_pairs_cossi = cossi_featens_analysis(
all_feat_a, all_feat_b, all_feat_a, all_feat_b,
all_data_a, all_data_b, all_data_a, all_data_b,
all_states, all_states, verbose=True
)
The output of ``cossi_featens_analysis()`` produces an array for the SSI and the Co-SSI
between all features. These results can again be visualized in a 2D representation.
2 changes: 1 addition & 1 deletion tutorial/PENSA_Tutorial_SSI.py
Original file line number Diff line number Diff line change
Expand Up @@ -142,7 +142,7 @@
extract_coordinates(ref_file_b, pdb_file_b, trj_file_b, out_name_b, sel_base_b)

# Extract the coordinates of the ensemble a aligned to ensemble b
extract_aligned_coords(
extract_aligned_coordinates(
out_name_a + ".gro", out_name_a + ".xtc",
out_name_b + ".gro", out_name_b + ".xtc"
)
Expand Down

0 comments on commit c5b7d5a

Please sign in to comment.