In my line of work, deadlock is always close at hand. When I run multithreaded programs through prototype compilers, runtime systems, or simulators (and sometimes all three!), there is always the chance that a bug will trigger a deadlock in some layer of the system. There’s nothing like setting off a bunch of experiments, going to sleep, and waking up the next morning to find that the first experiment (or, inevitably, the experiment that ran right after I stopped checking on things) deadlocked and prevented everything else from running. After being burned by this particular variant of Murphy’s Law countless times, I learned to always run benchmarks with a hard timeout. Knowing that my experiments won’t hang forever is honestly right up there with fancy foam mattresses in terms of helping me sleep at night.
I looked around and couldn’t find any standard Unix commands or shell trickery that did what I wanted, so I wrote a little Python script called timeout.py (available via GitHub) some years ago. Unbeknownst to me, in late 2008 GNU coreutils added a timeout program (documentation here) that is now available on many Unix systems. The basic idea behind timeout.py and GNU’s timeout is the same: run a command with a specified timeout, and 1) exit when the command exits normally or 2) send a signal to the command when the timeout expires.
timeout.py works by scheduling a SIGALRM signal to be delivered when the timeout period elapses, and then forking a new child process to run the specified command. If the alarm fires before the command exits, the SIGALRM handler sends a specified signal (SIGKILL by default) to the child process.
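The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual timeout.py source: the function name run_with_timeout and its parameters are hypothetical, and it assumes Python 3 on a Unix system with the function called from the main thread (Python only allows signal handlers to be installed there).

```python
import signal
import subprocess

def run_with_timeout(argv, seconds, sig=signal.SIGKILL):
    """Run argv; send `sig` to the child if it is still alive after `seconds`.

    Hypothetical helper for illustration only (Unix, main thread).
    """
    child = None

    def on_alarm(signum, frame):
        # The timeout expired: signal the child if it has not exited yet.
        if child is not None and child.poll() is None:
            child.send_signal(sig)

    # Arm the alarm first, then start the command, as described in the text.
    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)          # signal.alarm() takes whole seconds
    try:
        child = subprocess.Popen(argv)
        return child.wait()
    finally:
        signal.alarm(0)            # cancel the alarm if it is still pending
        signal.signal(signal.SIGALRM, previous)
```

A negative return value from this sketch means the child was terminated by that signal number, so -9 indicates that the SIGKILL was delivered.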
One useful feature that I added to timeout.py (that GNU timeout doesn’t have) is the ability to place the child process in its own process group; when the timeout expires, a signal is sent to the entire process group. This is helpful for cleaning up multiprocess programs. PARSEC’s script for launching benchmarks (parsecmgmt), in particular, forks a bunch of processes to run a benchmark, and killing the top-level parsecmgmt process will not kill the actual benchmark process, which lives a few layers deeper.
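To illustrate the process-group idea, here is a hedged sketch that builds on the same SIGALRM approach. It is not the timeout.py source: run_group_with_timeout is a hypothetical name, and using os.setpgrp as the preexec_fn is my assumption about one reasonable way to create the new group. Because the child becomes the leader of its own group, its pid doubles as the group id, and os.killpg() reaches every descendant that stayed in that group.

```python
import os
import signal
import subprocess

def run_group_with_timeout(argv, seconds, sig=signal.SIGKILL, **popen_kwargs):
    """Run argv in a new process group; signal the whole group on timeout.

    Hypothetical helper for illustration only (Unix, main thread).
    """
    child = None

    def on_alarm(signum, frame):
        if child is None:
            return
        try:
            # The child called setpgrp(), so its pid is also its group id.
            os.killpg(child.pid, sig)
        except ProcessLookupError:
            pass  # the group has already gone away

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)
    try:
        # preexec_fn runs in the child between fork() and exec().
        child = subprocess.Popen(argv, preexec_fn=os.setpgrp, **popen_kwargs)
        return child.wait()
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, previous)
```

A descendant that deliberately moves itself into a different process group or session would escape this signal, so the sketch only covers the common case where the launched processes leave their group alone.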
timeout.py is written in Python, so it’s easy to modify. Its functionality is also exported via a Python module called TimerTask that wraps Python’s standard subprocess.Popen() call. One caveat with using the TimerTask module is that, since it uses SIGALRM, it may interfere with other uses of SIGALRM in client code. In particular, you can’t have multiple simultaneous TimerTasks, each with its own timeout. TimerTask tries to be safe and raises an exception if there’s an existing SIGALRM handler, but a better approach would be to fork a new child process to run each TimerTask. Maybe I’ll add this someday.
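The safety check mentioned above might look something like the following sketch. This is an assumption about the general shape of such a check, not the TimerTask code itself: install_alarm_handler is a hypothetical name, and treating anything other than the default disposition as "an existing handler" is my own simplification.

```python
import signal

def install_alarm_handler(handler):
    """Install `handler` for SIGALRM, refusing to clobber an existing one.

    Hypothetical helper for illustration only (Unix, main thread).
    """
    current = signal.getsignal(signal.SIGALRM)
    if current != signal.SIG_DFL:
        # Someone else is already using SIGALRM; bail out rather than
        # silently replacing their handler.
        raise RuntimeError("SIGALRM already has a handler: %r" % (current,))
    signal.signal(signal.SIGALRM, handler)
```

The check only detects a handler that is installed at the moment it runs, so client code that installs its own SIGALRM handler afterwards would still conflict.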
timeout.py is Unix-only due to the use of, e.g., the SIGALRM signal and the preexec_fn argument to subprocess.Popen(). The source code is available under a GPLv3 license at https://gist.github.com/1526975.