diff --git a/docs/tut-6-ssi.rst b/docs/tut-6-ssi.rst index 5458cbf..de0c732 100644 --- a/docs/tut-6-ssi.rst +++ b/docs/tut-6-ssi.rst @@ -13,113 +13,206 @@ the function ``ssi_feature_analysis()``. The Co-SSI statistic, which tells how much the communication between two features depends on the ensemble condition, can be calculated using ``cossi_featens_analysis()``. -Feature - Feature Communication -------------------------------- +Features and States +------------------- + +.. code:: python + + import os + import numpy as np + from pensa.features import \ + read_structure_features, \ + read_water_features, \ + get_multivar_res, \ + sort_features + from pensa.statesinfo import \ + get_discrete_states + from pensa.comparison import \ + ssi_ensemble_analysis, \ + ssi_feature_analysis, \ + cossi_featens_analysis + First, load the structural features as described in the previous tutorial: .. code:: python - sim_a_rec = get_structure_features("traj/condition-a_receptor.gro", - "traj/condition-a_receptor.xtc") - sim_b_rec = get_structure_features("traj/condition-b_receptor.gro", - "traj/condition-b_receptor.xtc") + sim_a_rec = read_structure_features( + "traj/condition-a_receptor.gro", + "traj/condition-a_receptor.xtc" + ) + sim_b_rec = read_structure_features( + "traj/condition-b_receptor.gro", + "traj/condition-b_receptor.xtc" + ) sim_a_rec_feat, sim_a_rec_data = sim_a_rec sim_b_rec_feat, sim_b_rec_data = sim_b_rec -Then run the SSI feature-feature analysis in the same manner the other statistics. +Then combine all backbone torsions from one residue to one multivariate feature. +You can also analyze sidechain torsions, just replace ``bb`` by ``sc``. Analysis of +sidechain torsions takes much more time to run though so for this tutorial, we look +only at backbone torsions. + +.. code:: python + + bbtors_res_feat_a, bbtors_res_data_a = get_multivar_res( + sim_a_rec_feat['bb-torsions'], sim_a_rec_data['bb-torsions'] + ) + bbtors_res_feat_b, bbtors_res_data_b = get_multivar_res( + sim_b_rec_feat['bb-torsions'], sim_b_rec_data['bb-torsions'] + ) +From those residue-based multivariate features, we now determine the state boundaries. .. code:: python - names_bbtors, ssi_featfeat_bbtors = ssi_feature_analysis(sim_a_rec_feat, sim_b_rec_feat, - sim_a_rec_data, sim_b_rec_data, - torsions='bb', verbose=True) - -The output of ``ssi_feature_analysis()`` produces an array for the SSI between -all features, with the names_bbtors referring to both feature names. This result -can be visualized in a two-dimensional representation similar to the distances -using ``distances_visualization()`` + bbtors_states = get_discrete_states( + bbtors_res_data_a, bbtors_res_data_b + ) + +Let's also featurize the water cavities in the system, using the grid we generated in +the preprocessing tutorial. .. code:: python - distances_visualization(names_bbtors, ssi_featfeat_bbtors, - "plots/receptor_ssi-bbdist.pdf", - vmin = 0.0, vmax = max(ssi_featfeat_bbtors), - cbar_label='SSI') + grid = "traj/water_grid_ab_OH2_density.dx" + water_feat_a, water_data_a = read_water_features( + "traj/condition-a_water.gro", "traj/condition-a_water_aligned.xtc", + top_waters = 5, atomgroup = "OH2", grid_input = grid + ) + water_feat_b, water_data_b = read_water_features( + "traj/condition-b_water.gro", "traj/condition-b_water.xtc", + top_waters = 5, atomgroup = "OH2", grid_input = grid + ) -The Co-SSI feature-feature-ensemble analysis is done in the same manner. +Just like we did for torsions, we can now determine the discrete states for the orientation +of the water molecules in the pockets. .. code:: python - names_bbtors, ssi_featfeat_bbtors, cossi_bbtors = cossi_featens_analysis(sim_a_rec_feat, sim_b_rec_feat, - sim_a_rec_data, sim_b_rec_data, - torsions='bb', verbose=True) - -The output of ``cossi_featens_analysis()`` produces an array for the SSI and the -Co-SSI between all features, with the names_bbtors producing the same output as -the feature-feature SSI. These results can again be visualized in a 2D representation. -It is worth noting, as ``cossi_featens_analysis()`` also computes SSI, you need not -run ``ssi_feature_analysis()`` in addition. + water_states = get_discrete_states( + water_data_a['WaterPocket_Distr'], + water_data_b['WaterPocket_Distr'], + discretize='gaussian', pbc=True + ) + +Water occupancy (is water in the pocket or not?) is described as a binary feature with the +values 0 or 1. We can thus define the state boundaries manually. + +.. code:: python + + water_occup_states = [[[-0.1, 0.5, 1.1]]] * len(water_states) -Water pocket information transfer ---------------------------------- -Internal water pockets and ion/atom pockets can also receive information about -the ensemble condition. The State Specific Information (SSI) analysis can be -applied to investigate these distributions in the same manner. First we need -to featurize the water pockets using the combined ensemble water density explained -in the featurization tutorial. +Water Pockets +------------- + +We start by comparing the occupancy between the two conditions, similar to what we did in the +comparison tutorial: .. code:: python - struc = "traj/cond-a_water.gro" - xtc = "dens/cond-a_wateraligned.xtc" - grid = "dens/ab_grid_OH2_density.xtc" - water_feat_a, water_data_a = get_water_features(structure_input = struc, - xtc_input = xtc, - top_waters = 5, - atomgroup = "OH2", - grid_input = grid) + water_names, water_ssi = ssi_ensemble_analysis( + water_feat_a['WaterPocket_OccupDistr'], water_feat_b['WaterPocket_OccupDistr'], + water_data_a['WaterPocket_OccupDistr'], water_data_b['WaterPocket_OccupDistr'], + water_occup_states, verbose=True, h2o=False, pbc=False + ) - struc = "traj/cond-b_water.gro" - xtc = "traj/cond-b_water.xtc" - grid = "dens/ab_grid_OH2_density.xtc" - water_feat_b, water_data_b = get_water_features(structure_input = struc, - xtc_input = xtc, - top_waters = 5, - atomgroup = "OH2", - grid_input = grid) - - -Information shared between water pockets and the ensemble condition is then -quantified using ``ssi_ensemble_analysis()``. We set ``torsions=None`` for -waters. +Except for water site O2, there are not many differences in occupancy, so let's have a closer +look and compare the orientation of the water molecules between the two conditions: .. code:: python - data_names, data_ssi = ssi_ensemble_analysis(water_feat_a['WaterPocket_Distr'],water_feat_b['WaterPocket_Distr'], - water_data_a['WaterPocket_Distr'],water_data_b['WaterPocket_Distr'], - torsions = None, - verbose=True) - + water_names, water_ssi = ssi_ensemble_analysis( + water_feat_a['WaterPocket_Distr'], water_feat_b['WaterPocket_Distr'], + water_data_a['WaterPocket_Distr'], water_data_b['WaterPocket_Distr'], + water_states, verbose=True, h2o=True + ) -Additionally, we can see if the pocket occupancy (i.e. the presence/absence of -water at the site) shares SSI - ``pocket_occupancy=True``. Currently this is only enabled with -``ssi_ensemble_analysis``. We need to turn off the periodic boundary conditions -as the distributions are no longer periodic - ``pbc=False``. +Beyond comparing distributions, we can also quantify the amount of feature-feature communication +between the water sites. The corresponding function looks similar to the ensemble comarison above, +however it calculates the amount of shared information between each pair of residues. .. code:: python - data_names, data_ssi = ssi_ensemble_analysis(water_feat_a['WaterPocket_OccupDistr'],water_feat_b['WaterPocket_OccupDistr'], - water_data_a['WaterPocket_OccupDistr'],water_data_b['WaterPocket_OccupDistr'], - pocket_occupancy=True, pbc=False, verbose=True) + water_pairs_names, water_pairs_ssi = ssi_feature_analysis( + water_feat_a['WaterPocket_Distr'], water_feat_b['WaterPocket_Distr'], + water_data_a['WaterPocket_Distr'], water_data_b['WaterPocket_Distr'], + water_states, verbose=True, h2o=True + ) + +This function produces an array for the SSI between all features. This array is two-dimensional, +so we visualize its values using a heat map: -The same protocol applies for atom/ion pockets. +.. code:: python + from pensa.comparison import resnum_heatmap, pair_features_heatmap + pair_features_heatmap( + water_pairs_names, water_pairs_ssi, + "plots/water-pairs_ssi.pdf", + vmin = 0.0, vmax = 1., + cbar_label='SSI', + separator=' & ' + ) +Water and Torsions Combined +--------------------------- +Now we want to investigate information flow between more than one type of features. +To do so, we combine the water sites and the torsions: + +.. code:: python + all_feat_a = water_feat_a['WaterPocket_Distr'] + bbtors_res_feat_a + all_feat_b = water_feat_b['WaterPocket_Distr'] + bbtors_res_feat_b + all_data_a = np.array(list(water_data_a['WaterPocket_Distr']) + list(bbtors_res_data_a), dtype=object) + all_data_b = np.array(list(water_data_b['WaterPocket_Distr']) + list(bbtors_res_data_b), dtype=object) + all_states = water_states + bbtors_states +Note that we only use backbone torsions here to minimize computational effort. +Analysis of sidechain torsions (or of both combined) can often deliver more scientific insigts. + +As we did for the water sites above, we now calculate the SSI for all combined feature-feature pairs. + +.. code:: python + + all_pairs_names, all_pairs_ssi = ssi_feature_analysis( + all_feat_a, all_feat_b, + all_data_a, all_data_b, + all_states, verbose=True + ) + +The number of pairs for an entire protein is enormous so we determine those with +the highest SSI. To alleviate the computational effort for the sort function, we first +filter the pairs by a threshold SSI of 0.5: + +.. code:: python + + relevant = np.abs(all_pairs_ssi) > 0.5 + not_self = np.array([name.split(' & ')[0] != name.split(' & ')[1] for name in all_pairs_names]) + relevant *= not_self + argrelev = np.argwhere(relevant).flatten() + all_relevant_pairs_names = [all_pairs_names[i] for i in argrelev] + all_relevant_pairs_ssi = all_pairs_ssi[relevant] + +Then we run the actual sorting by SSI. + +.. code:: python + + sort_features(all_relevant_pairs_names, all_relevant_pairs_ssi) + + +The Co-SSI feature-feature-ensemble analysis is done in the same manner. + +.. code:: python + + all_new_pairs_names, all_new_pairs_ssi, all_new_pairs_cossi = cossi_featens_analysis( + all_feat_a, all_feat_b, all_feat_a, all_feat_b, + all_data_a, all_data_b, all_data_a, all_data_b, + all_states, all_states, verbose=True + ) + +The output of ``cossi_featens_analysis()`` produces an array for the SSI and the Co-SSI +between all features. These results can again be visualized in a 2D representation. diff --git a/tutorial/PENSA_Tutorial_SSI.py b/tutorial/PENSA_Tutorial_SSI.py index 281f41d..f608c43 100755 --- a/tutorial/PENSA_Tutorial_SSI.py +++ b/tutorial/PENSA_Tutorial_SSI.py @@ -142,7 +142,7 @@ extract_coordinates(ref_file_b, pdb_file_b, trj_file_b, out_name_b, sel_base_b) # Extract the coordinates of the ensemble a aligned to ensemble b -extract_aligned_coords( +extract_aligned_coordinates( out_name_a + ".gro", out_name_a + ".xtc", out_name_b + ".gro", out_name_b + ".xtc" )