Week 8-9: Creating Test Systems for Benchmarking
As I mentioned at the end of the last post, now that the trajectory streaming setup is relatively complete, the next step is to create a set of test systems for benchmarking its performance. The goal is a set of systems that are representative of typical use cases but also vary in size and complexity, to provide a range of performance metrics.
Chignolin Folding using the AIMMD Progress Coordinate
The first system I used to test the trajectory streaming setup is chignolin folding with the AI for Molecular Mechanism Discovery (AIMMD) progress coordinate: https://arxiv.org/abs/2307.11240. This system was taken from the WESTPA test systems repository. It is a good test case for trajectory streaming because calculating the progress coordinate requires computing a large number of pairwise distances, which are then used as input to a neural network. That so many pairwise distances have to be calculated means there is potential for speedup from analysing the trajectory on-the-fly.
The analysis as done in the repository uses the mdtraj library rather than MDAnalysis. Therefore, the first task was to adapt the existing analysis code to use MDAnalysis. Once this was done, I then converted the analysis code to use the TrajectoryStreamer class that I developed in the previous weeks. The key code is as follows:
import numpy

# TrajectoryStreamer, atom_select and featurize come from the adapted analysis code

# Create TrajectoryStreamer object
stream = TrajectoryStreamer("gromacs", "template.gro", "gmx_imd mdrun -s seg.tpr -o seg.trr -c seg.gro -e seg.edr -cpo seg.cpt -g seg.log -nt 2 -imdwait -imdport 0")

# Create universe and select the paired heavy-atom groups
u = stream.start_sim_and_get_universe(stream_timeout=50)
ag1, ag2 = atom_select(u)

# Compute descriptors frame-by-frame as the trajectory is streamed
# (converted to float32 for compatibility with the model)
descriptors = []
for ts in u.trajectory:
    descriptors.append(numpy.float32(featurize(ag1, ag2, box=ts.dimensions, dmin_file='dmin.npy', dmax_file='dmax.npy')))
The featurize function calculates the distances between all pairs of heavy atoms that are at least 3 residues apart. The list of descriptors is appended to as each frame is streamed, and the descriptors are then fed into the neural network to calculate the progress coordinate.
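As a rough sketch (not the exact code), the featurize step can be written in MDAnalysis terms as below. This assumes atom_select has already built two paired atom groups, so that the i-th atoms of ag1 and ag2 form one valid heavy-atom pair (i.e. the 3-residue separation filter is applied at selection time), and that dmin.npy/dmax.npy hold precomputed bounds for min-max normalisation; both of those details are assumptions for illustration.

import numpy as np
from MDAnalysis.lib.distances import calc_bonds

def featurize(ag1, ag2, box=None, dmin_file='dmin.npy', dmax_file='dmax.npy'):
    # Element-wise distance between ag1[i] and ag2[i] for every pair,
    # respecting periodic boundary conditions via the box argument
    distances = calc_bonds(ag1.positions, ag2.positions, box=box)
    # Scale each distance to [0, 1] using precomputed bounds so the
    # descriptors match what the neural network was trained on
    dmin = np.load(dmin_file)
    dmax = np.load(dmax_file)
    return (distances - dmin) / (dmax - dmin)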
Stream Timeout Issues
The first attempt using this script resulted in the analysis ending before the simulation had completed. This meant that the progress coordinate sent to WESTPA had the wrong length and WESTPA would end the job. In addition, the simulations would be left running in the background, waiting for an IMD signal to continue. This is due to a timeout coded into the IMDClient, which stops the client if it does not receive a frame within a certain amount of time; the timeout exists so that the client is not left waiting indefinitely if the simulation does not correctly send the end signal. With this simulation, data was only being sent to the client every 6250 steps. Therefore, more than the 5 second default timeout would pass between frames and the client would end prematurely. In the IMDClient there is an (untested) option to modify the timeout behaviour by changing the timeout kwarg. To allow for longer times between frames, a stream_timeout kwarg was added to the TrajectoryStreamer class and is passed through to the client, as shown above.
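To make the plumbing concrete, a sketch of that forwarding is below. The exact constructor arguments and how TrajectoryStreamer wires this up internally are simplified assumptions here, not the actual implementation.

from imdclient.IMDClient import IMDClient

def connect_with_timeout(host, port, n_atoms, stream_timeout=5):
    # Forward the streamer's stream_timeout to the client's timeout kwarg,
    # so the client waits up to stream_timeout seconds for the next frame
    # before disconnecting
    return IMDClient(host, port, n_atoms, timeout=stream_timeout)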
Currently, the TrajectoryStreamer class gives no error message or other indication to the user that the client is timing out, and the simulations are left running in the background. I need to implement a proper protocol for when these errors arise. Further issues with the timeout kwarg are discussed in IMDClient issue 111.
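One possible shape for such a protocol is sketched below: surface the error to the user and clean up the background simulation. The stop_sim method is a hypothetical placeholder for whatever cleanup ends up being implemented.

descriptors = []
try:
    for ts in u.trajectory:
        descriptors.append(numpy.float32(featurize(ag1, ag2, box=ts.dimensions, dmin_file='dmin.npy', dmax_file='dmax.npy')))
except Exception as err:
    # Surface the timeout to the user instead of failing silently
    print(f"Stream ended before the simulation completed: {err}")
    # Hypothetical cleanup method: kill the simulation left waiting
    # in the background for an IMD signal
    stream.stop_sim()
    raise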
Benchmarking
Now that the system has been set up with trajectory streaming, we can benchmark whether analysing on-the-fly actually leads to a speed-up. To benchmark, I ran the system for 25 iterations with 16 segments per iteration and 2 cores per segment. The wall time for each iteration is measured and compared to the wall time for the same analysis done without trajectory streaming. Without trajectory streaming the mean wall clock time is 115 \(\pm\) 5 s, and with trajectory streaming it is 118 \(\pm\) 5 s. So, within error, the two options give essentially the same performance. This is disappointing but perhaps to be expected for this setup: since we are only analysing 5 frames per segment, the amount of analysis to be performed is relatively small, and there is therefore limited speed-up to be gained from accelerating it. We would likely see much better performance improvements with a more complex analysis that involves more data. In the next week I am going to explore this further.
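For reference, the per-iteration wall times can be read directly from WESTPA's west.h5 file. The sketch below assumes the standard summary table layout, where each completed iteration records a walltime field; treat it as one way to reproduce the numbers rather than the exact script I used.

import h5py
import numpy as np

# Read per-iteration wall-clock times from the WESTPA summary table
with h5py.File('west.h5', 'r') as f:
    walltimes = f['summary']['walltime'][:25]  # first 25 iterations

print(f"mean wall time: {np.mean(walltimes):.0f} +/- {np.std(walltimes):.0f} s")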