Week 4-5: Implementing Trajectory Streaming via a Python Script
What I have been working on in the last few days follows on from the discussion in my week 1-2 blog post. In that post, I discussed the issues with port assignment and the difficulties of trying to use WESTPA to assign ports to the segments. A proposed solution was to use a python script to start the simulation, with the simulation choosing a port and then communicating that port back to the python script. The python script could then start the MDAnalysis trajectory streaming client. This would ensure that the port is available and that both the simulation and the client are using the same port. In this blog post, I will discuss the implementation of this solution.
The code discussed in this post is being tracked in the following pull request.
Starting the simulation using a python script and assigning a port
The first job was to use the script to start simulations and find a way to communicate the port back to the python script. It had been suggested previously to use a wrapper like GromacsWrapper to start the simulation, but I decided instead to use the subprocess
module. This makes the script more adaptable to other MD engines. The below code shows how subprocess.Popen
is used to start the GROMACS simulation.
numthreads = 1
proc = subprocess.Popen(
[
"gmx_imd",
"mdrun",
"-deffnm",
"seg",
"-nt",
str(numthreads),
"-imdwait",
"-imdport",
"0",
],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
text=True,
bufsize=1,
)
An important question is how to tell the simulation to find an available port. For this we can use the “port 0 trick”. When a program attempts to bind to port 0, the operating system should automatically assign an available port. When this is used with GROMACS, the following output is produced:
IMD: Setting port for connection requests to 0.
IMD: Setting up incoming socket.
IMD: Listening for IMD connection on port 60159.
We can see that GROMACS is attempting to bind to port 0, and the operating system has assigned port 60159. By capturing the output of the simulation, we can extract the port number from this output. The following code snippet shows how to do this:
assigned_port = None
retcode = proc.poll()
if retcode is not None and retcode != 0:
raise RuntimeError(
f"Simulation returned with error code {retcode}. Check the simulation log for details."
)
start_time = time.time()
for line in proc.stdout:
print(line, end="")
m = re.search(r"IMD connection on port (\d+)", line)
if m:
assigned_port = int(m.group(1))
break
if time.time() - start_time > 60: # 2 minutes timeout
raise RuntimeError(
"IMD port assignment was not printed within 1 minute. Make sure an IMD simulation is being run."
)
else:
raise RuntimeError(
"GROMACS output did not contain expected 'IMD connection on port'. Check the simulation log for details."
)
print(f"Assigned IMD port: {assigned_port}")
Now that there is a port number that is unique to the segment and is consistent between the simulation and the analysis, the analysis can be started. This follows a similar pattern to the script shown in the week 1-2 blog post.
Testing using WESTPA
We can check whether this is working as expected by starting a weighted ensemble simulation and checking the output from each segment. No significant modifications to the standard wESTPA setup are needed, aside from modifying the runseg.sh
script to run the new python script instead of launching the GROMACS simulation directly. In the first iteration of the simulation, 5 segments are run. Snippets of the output from two of the segments are shown below:
Segment 0
:
IMD: Setting port for connection requests to 0.
IMD: Setting up incoming socket.
IMD: Listening for IMD connection on port 60159.
Assigned IMD port: 60159
Port 60159 on localhost is now open!
Segment 1
:
IMD: Setting port for connection requests to 0.
IMD: Setting up incoming socket.
IMD: Listening for IMD connection on port 39119.
Assigned IMD port: 39119
Port 39119 on localhost is now open!
We can see that the two segments have been assigned different ports.
Improvements
This is a good solution but there are some improvements that can be made. It was suggested by Jeremy Leung that a lot of this code could be embedded into WESTPA as a static class. This would reduce the amount of code that needs to be exposed to the user and would make it easier to use. All the user would need to do is define the command to run the simulation and the analysis functions to be run.
I need to test the script with the other molecular dynamics engines which can use IMDv3, NAMD and LAMMPS. What needs to be tested is whether the “port 0 trick” works with these engines, and whether the simulation engines output the assigned port.
As I discussed last week, I started trying to implement trajectory streaming as an additional executable in WESTPA. The script here is essentially an alternative to this, and avoids issues with port assignment. I do wonder whether the “port 0 trick” could be used within python to obtain a list of available ports, but this still has issues with multinode setups and the port being taken by another process before the simulation starts. The ephemeral-port-reserve package has a method to reserve a port, but this is not compatible (as far as I can tell) with GROMACS IMD as it does not use SO_REUSEADDR
. Nevertheless, I will continue to explore this option.