RCDC simulator release notes

This post describes how to install, use and modify the simulator for our ASPLOS paper on RCDC.  Source code is available here.

Library Dependencies

The simulator requires v1.40 of the Boost C++ libraries.

Installation

You’ll need an installation of Intel’s Pin binary instrumentation tool.  Extract the RCDCsim tarball into the PIN_INSTALL_DIR/source/tools/ directory, where PIN_INSTALL_DIR is the top-level directory where Pin is installed.

From the PIN_INSTALL_DIR/source/tools/rcdcsim/ directory, run make to build the simulator.  You can then run make run-unittests to run the included unit tests, and make test-time to run the ls program under the simulator as a simple integration test to check that everything works.  The Makefile also has a target for running the PARSEC benchmarks, though you will need to edit some paths before it will work.

High-level Simulator Structure

The simulator is structured into multiple processes that communicate through FIFOs.  The frontend (frontend.cpp, PinCallbacks.cpp) is a pintool that instruments every thread in the target program to generate events (e.g. memory accesses).  These events are funneled, via a lock-free queue, into an I/O thread that pushes these events into the frontend FIFO.  The use of a single queue for all program threads maintains causality order (e.g. between a lock release and acquire), which makes the backend simulator easier to write in some cases.

For comparisons across simulation configurations to be valid, every configuration must see the same event stream; for example, to compare the performance of RCDC with nondeterministic execution, both configurations must be fed identical events.  Events are therefore broadcast to multiple FIFOs by the pipefork program (pipefork.cpp), which simply copies events from the frontend FIFO to a number of simulator FIFOs.  Each simulation then runs in its own process, reading events from its own FIFO.

The important simulator files are:

  • SimulatorThread.cpp: contains the simulator’s main(); receives incoming events and sends them to the MultiCacheSimulator
  • MultiCacheSimulator.hpp: creates the cores and manages all system-wide state, like shared caches
  • SMPCache.hpp: manages per-core state, such as that core’s private cache hierarchy
  • HierarchicalCache.hpp: models a single cache; these can be chained together into arbitrarily deep cache hierarchies

Design Decisions

I decided to adopt a multi-process design in this pintool because I was tired of 1) tracking down concurrency bugs in my multi-threaded Pin instrumentation (the irony of which is not lost on me as a concurrency researcher), and 2) not being able to use gdb on my pintool (despite the Pin manual’s attestations that this is in fact possible).  The combination of these two is particularly unhelpful.  By contrast, keeping the simulator proper in its own single-threaded process means that the “untrusted” multi-threaded code is kept to a minimum, and that the simulator is very gdb-friendly.  The simulator can even be written in a completely different language: I mean, isn’t it about time we started writing architecture simulators in Haskell?

However, the multi-process design has a downside: there is little feedback from the simulator to the event generator.  There is back-pressure via the FIFOs, but RCDC’s execution model in particular requires finer-grained control.  To make forward progress, RCDC sometimes needs events for one specific simulated processor P (say, when all other processors are waiting for P to reach its quantum boundary).  In that situation the simulator has no way to tell the event generator “stall everyone other than P.”  Instead, the simulator must keep accepting events, buffering them locally until enough events for P arrive to let P finish its quantum.  At that point all processors can make progress again, and the buffered events can be processed.  This buffering is currently done with per-processor buffers, which has the side effect of destroying the nice causality ordering of incoming events, so causality has to be reconstructed in the simulator.

This was generally not a problem in the experiments we ran, but it did occasionally lead to out-of-memory crashes.  Execution models that do not pause processors would not, however, suffer from this issue.

License and Acknowledgments

All the simulator code is licensed under GPLv3 (except for some stuff I stole from SESC, which is GPLv2, and the lock-free queue, which is under the Boost Software License).  Many thanks to Tim Blechmann for the lock-free queue, and to the SESC developers for the code of an earlier version of this simulator.

Version History

version 0 (27 Jan 2011): initial release
version 1 (7 Nov 2011): added missing header file
