These past two weeks I have focused on two different tasks: testing the trajectory streaming scripts with LAMMPS and NAMD, and integrating the scripts into WESTPA.

Testing with LAMMPS and NAMD

The previous post focused on using GROMACS to run the MD simulation. In the previous script, GROMACS was instructed to use port 0, which allowed the operating system to assign an available port. In order to use the same approach with LAMMPS and NAMD, we need to check that these MD engines support the same “port 0 trick”. When we instruct NAMD to use port 0, it throws an error: 'IMDport' was set to 0 but it should be positive. Clearly, we have to be more careful with NAMD and provide a specific port number. In the case of LAMMPS, it throws a similar error: ERROR: Illegal fix imd parameter: port < 1024. This means that for both LAMMPS and NAMD we will have to pass the port number explicitly. Since we are already starting the simulation from within a Python script, this can be performed quite easily by using a simple python function:

def get_available_port():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('', 0))  # Bind to an ephemeral port using the port 0 trick
    port = sock.getsockname()[1]
    sock.close()
    return port

This port number can be passed to the MD engine when the subprocess.Popen command is called. This solution works for both LAMMPS and NAMD. There is a slight issue here as the port number is not immediately bound to the simulation, leading to a potential point of failure if another process binds to the port before the simulation starts. This is most likely going to be a problem when many (thousands) of simulations are being started simultaneously by WESTPA. In my testing, I have not encountered this issue yet, but I have not tested with a very large number of simulations. In LAMMPS, there is quite a nice potential solution. LAMMPS allows the use of python code to define variables within the input script. Importantly, the code is only executed when the variable is used, which means that the simulation should bind the port immediately after being assigned it. This can be done by doing the following within the LAMMPS input script.

First the variable needs to be defined:

variable portfunc python get_port
python get_port input 0 return v_portfunc format i here """
def get_port():
    import socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('', 0))  # Bind to an ephemeral port
    port = sock.getsockname()[1]
    sock.close()
    return port
"""

Then the variable can be used in the fix imd command:

fix 2 all imd ${portfunc} version 3 unwrap off nowait off

The port number will be assigned when the fix imd command is executed, ensuring that the port is bound to the simulation. This is quite a nice solution, but requires building LAMMPS with the PYTHON package enabled.

Integrating scripts into WESTPA

After developing the trajectory streaming scripts, the next step was to integrate them further into WESTPA. The goal was to reduce the code the user needs to supply. These changes are being tracked in the following two pull requests: WESTPA pull request #501 and westpa-traj-streaming pull request #2.

Within WESTPA tools a new file was created traj_streaming.py which contains the TrajectoryStreaming class. This class is responsible for starting the MD simulation and the trajectory streaming client. The class is initialised as follows:

def __init__(self, md_engine: str, topology: str, simulation_string: str):
        """
        Initialize the trajectory streamer.

        Args:
            md_engine: Name of the molecular dynamics engine (e.g., 'gromacs')
            topology: Path to the topology file for the simulation. Any format supported by MDAnalysis.
            simulation_string: String representing the simulation command to be executed.
        """
        self.md_engine = md_engine.lower()
        if self.md_engine not in ACCEPTABLE_MD_ENGINES:
            raise ValueError(f"Unsupported MD engine: {self.md_engine}. " f"Supported engines: {', '.join(ACCEPTABLE_MD_ENGINES)}")
        self.topology = topology
        self.set_simulation_function(simulation_string)

The user needs to provide the name of the MD engine, the path to the topology file, and a string representing the simulation command. The set_simulation_function method is used to check the command to ensure that it is using the required imd flags.

def set_simulation_function(self, sim_func: str):
        """
        Set the simulation function.

        Args:
            sim_func: String representing the simulation function to be used.
        """
        self.simulation_function = sim_func.split()
        # Check and fix IMD flags
        md_engine_flags = IMD_FLAGS[self.md_engine]
        for flag in md_engine_flags:
            if md_engine_flags[flag] is None:
                if flag not in self.simulation_function:
                    self.simulation_function.append(flag)
                    print(f"Adding {flag} to simulation function")
            else:
                if flag not in self.simulation_function:
                    self.simulation_function.append(f"{flag}")
                    self.simulation_function.append(md_engine_flags[flag])
                    print(f"Adding {flag} with value {md_engine_flags[flag]} to simulation function")
                elif self.simulation_function[self.simulation_function.index(flag) + 1] != md_engine_flags[flag]:
                    self.simulation_function[self.simulation_function.index(flag) + 1] = md_engine_flags[flag]
                    print(f"Updating {flag} with value {md_engine_flags[flag]} in simulation function")

The IMD_FLAGS dictionary is defined at the top of the file and contains the required flags for each engine. I have currently only implemented it for GROMACS, but it can be easily extended to include LAMMPS and NAMD. The user then calls the function start_sim_and_get_universe to start the simulation and obtain the MDAnalysis universe object. This function operates the same as the previous script, but it is now encapsulated within the TrajectoryStreaming class. The user can then use the universe object to stream the trajectory data.

A typical usage of the TrajectoryStreaming class would look like this:

from westpa.tools.trajectory_streaming import TrajectoryStreamer
import numpy as np
from MDAnalysis.analysis.distances import self_distance_array


numthreads = 1

# Create a TrajectoryStreamer object
stream = TrajectoryStreamer(
    "gromacs",
    "bstate.gro",
    f"gmx_imd mdrun -s seg.tpr -o seg.trr -c seg.gro -e seg.edr -cpo seg.cpt -g seg.log -nt {numthreads} -imdwait -imdport 0"
)
# Start the simulation and get the MDAnalysis universe object
u = stream.start_sim_and_get_universe()

ag = u.select_atoms("not water")

dist = []
for ts in u.trajectory:
    dist.append(self_distance_array(ag)[0])
np.savetxt("dist.dat", np.array(dist))

Next steps

The next steps will be to rigorously test the integration with WESTPA and ensure that it speeds up the workflow. I will need to test with all three MD engines (GROMACS, LAMMPS and NAMD).