MDAnalysis Hydrogenbond calculation taking too much time #4898

M15160058 · 2025-01-23T14:28:00Z

M15160058
Jan 23, 2025

Hello everyone,
I am trying to calculate hydrogen bonds using MDAnalysis but it is taking too much time (more than a day).Even for frame 1, it is taking 5or six hours.
Yes the xtc file is large like # Atoms 770667 and 300ns long. But I want to calculate the hbonds for only four chains(3600 atoms only).
`import pandas as pd
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds import HydrogenBondAnalysis
import matplotlib.pyplot as plt

Load the Universe

u = mda.Universe("clpb_tetramer_traj1_npt_50.tpr", "concate_cdab.xtc")

Select specific parts (e.g., two chains)

selection = u.select_atoms(
"segid seg_6_Protein_chain_G or segid seg_7_Protein_chain_H or segid seg_8_Protein_chain_I or segid seg_9_Protein_chain_J"
)

Use the selection for analysis

print("Number of atoms in selection:", len(selection))

Define chain selections using segid

chains = {
"A": u.select_atoms("segid seg_6_Protein_chain_G"),
"B": u.select_atoms("segid seg_7_Protein_chain_H"),
"C": u.select_atoms("segid seg_8_Protein_chain_I"),
"D": u.select_atoms("segid seg_9_Protein_chain_J"),
}

Define all chain pair combinations (for inter-chain only)

chain_pairs = [
("A", "B"), ("A", "C"), ("A", "D"),
("B", "C"), ("B", "D"), ("C", "D")
]

Pre-calculate donor and acceptor selections

donor_selectors = {}
acceptor_selectors = {}
for chain_id in chains:
donor_selectors[chain_id] = f"segid {chains[chain_id].segids[0]} and (name N or name O)" # Example: Select N and O atoms as donors
acceptor_selectors[chain_id] = f"segid {chains[chain_id].segids[0]} and name O" # Example: Select O atoms as acceptors

File 1: H-Bond details for the first frame

first_frame_data = []

Process the first frame only

u.trajectory[0]
print("Processing the first frame:")

for i, (chain1, chain2) in enumerate(chain_pairs, start=1):
print(f" Chain pair {i}/{len(chain_pairs)}: {chain1}-{chain2}")
hbonds = HydrogenBondAnalysis(
universe=u,
donors_sel=donor_selectors[chain1],
acceptors_sel=acceptor_selectors[chain2],
d_a_cutoff=3.5,
d_h_a_angle_cutoff=120,
)
hbonds.run()

# Process the results for this pair
for hbond in hbonds.results.hbonds:
    frame, donor, hydrogen, acceptor, distance, angle = hbond
    donor_atom = u.atoms[int(donor)]
    acceptor_atom = u.atoms[int(acceptor)]

    # Extract chain, residue, and atom details
    donor_details = f"{chain1}-{donor_atom.resname}-{donor_atom.resid}-{donor_atom.index+1}-{donor_atom.name}"
    acceptor_details = f"{chain2}-{acceptor_atom.resname}-{acceptor_atom.resid}-{acceptor_atom.index+1}-{acceptor_atom.name}"

    # Append the data for the current hydrogen bond
    first_frame_data.append({
        "Chain Pair": f"{chain1}-{chain2}",
        "Donor (Chain-Residue-Index-Atom)": donor_details,
        "Acceptor (Chain-Residue-Index-Atom)": acceptor_details,
        "Distance (Å)": f"{distance:.2f}",
        "Angle (°)": f"{angle:.2f}"
    })

Save the first frame data to a CSV file

first_frame_df = pd.DataFrame(first_frame_data)
first_frame_df.to_csv("first_frame_hbonds.csv", index=False)
print("First frame H-Bond details saved to 'first_frame_hbonds.csv'.")

File 2: Time vs Number of H-Bonds across trajectory

time_data = []

Iterate over all frames in the trajectory

print("\nProcessing all trajectory frames:")
for ts in u.trajectory:
current_time = ts.time
print(f" Frame {u.trajectory.frame}/{len(u.trajectory)} at time {current_time:.2f} ps")
total_hbonds = 0

for i, (chain1, chain2) in enumerate(chain_pairs, start=1):
    print(f"    Chain pair {i}/{len(chain_pairs)}: {chain1}-{chain2}")
    hbonds = HydrogenBondAnalysis(
        universe=u,
        donors_sel=donor_selectors[chain1], 
        acceptors_sel=acceptor_selectors[chain2], 
        d_a_cutoff=3.5, 
        d_h_a_angle_cutoff=120,
    )
    hbonds.run()

    # Add the number of H-bonds for this pair
    total_hbonds += len(hbonds.results.hbonds)

# Append the total H-bond count for this frame to the time data
time_data.append({"Time (ps)": current_time, "Number of H-bonds": total_hbonds})

Save the time vs H-Bond data to a CSV file

time_df = pd.DataFrame(time_data)
time_df.to_csv("time_vs_hbonds.csv", index=False)
print("Time vs Number of H-Bonds saved to 'time_vs_hbonds.csv'.")

Plot the results

plt.figure(figsize=(10, 6))
plt.plot(time_df["Time (ps)"], time_df["Number of H-bonds"], marker='o')
plt.xlabel("Time (ps)")
plt.ylabel("Number of Hydrogen Bonds")
plt.title("Time vs. Number of Hydrogen Bonds")
plt.grid()
plt.savefig("time_vs_hbonds_plot.png")
plt.show()`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MDAnalysis Hydrogenbond calculation taking too much time #4898

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 0 comments

Select a reply

MDAnalysis Hydrogenbond calculation taking too much time #4898

M15160058 Jan 23, 2025

Load the Universe

Select specific parts (e.g., two chains)

Use the selection for analysis

Define chain selections using segid

Define all chain pair combinations (for inter-chain only)

Pre-calculate donor and acceptor selections

File 1: H-Bond details for the first frame

Process the first frame only

Save the first frame data to a CSV file

File 2: Time vs Number of H-Bonds across trajectory

Iterate over all frames in the trajectory

Save the time vs H-Bond data to a CSV file

Plot the results

Replies: 0 comments

M15160058
Jan 23, 2025