Hack your Word documents with VBA

Tom Bergan, one of my grad school buddies, had the bright idea of writing a little lint script for our LaTeX papers. The script would use grep to find various inconsistencies in our papers, e.g., times when we said “non-determinism” instead of “nondeterminism”, didn’t follow “e.g.” with a comma, and other small, domain-specific semantic errors. It worked great and kept our writing more consistent than we ever could have done by hand.
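To make the idea concrete, here is a minimal sketch of that kind of lint pass. It is not Tom's script, which used grep; this is a Python stand-in, and the names `RULES` and `lint` are made up for illustration. The two rules are the ones mentioned above.

```python
import re

# Hypothetical rule table: (pattern, message). Both rules come from the
# examples in the post; the original script used grep, not Python.
RULES = [
    (re.compile(r"\bnon-determinis(m|tic)\b", re.IGNORECASE),
     'write "nondeterminism" without a hyphen'),
    # "e.g." must be followed immediately by a comma.
    (re.compile(r"\be\.g\.(?!,)"),
     '"e.g." should be followed by a comma'),
]


def lint(text):
    """Return a (line_number, message) pair for every rule violation."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, message in RULES:
            if pattern.search(line):
                problems.append((lineno, message))
    return problems
```

Calling `lint` on the text of a paper returns one entry per offending line, so adding a new house rule is just one more entry in the table.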

Since I had chosen to write my dissertation in Word, and such a long document would be ripe for inconsistencies, I thought I should have a lint script to help me. Of course, the Word file format is pretty opaque and far beyond what grep can process. Fortunately, thanks to Word’s extensive built-in scripting capabilities, I could encode my writing-checker as a Visual Basic macro instead. While Visual Basic is a pretty terrible programming language, it is nevertheless a programming language, and you can do quite a bit with it. Having API access to the Word document also provides a lot of nice structure, e.g., frames and fields are exposed as collections that are easy to iterate over.

I ended up writing a lint macro that would check several invariants: ensuring that Text Boxes didn’t appear anywhere (because Text Boxes aren’t the captioning mechanism you’re looking for), checking that frames were styled consistently, checking that I had used hyperlinks for all my cross-references, checking that text that ought to be a cross-reference (like “Figure 2”) actually was, checking that names of systems and benchmarks were styled appropriately, etc. The macro used the Selection.GoTo method to find and then select any offending text, making repairs simple.

Figuring out how to use the Word API is a bit of a challenge. Individual functions are well-documented, though it’s often hard to discover which of the many functions to use for a given purpose. There also isn’t much documentation on how Word 2011 for Mac differs from the Windows versions, and most of the code examples and documentation you’ll find are for Word on Windows. But the VBA API is quite similar across the two platforms – a real accomplishment by the Mac Office folks!
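The cross-reference check mentioned above starts with a pattern match for phrases like “Figure 2”. Here is a rough sketch of that first step in Python over plain text. It is not the macro itself: the label set (Figure, Table, Section) and the names `XREF` and `xref_candidates` are assumptions for illustration, and deciding whether a candidate is already a real cross-reference still needs the Word object model, which this does not attempt.

```python
import re

# Assumed label set; the post only gives "Figure 2" as an example.
XREF = re.compile(r"\b(Figure|Table|Section)\s+(\d+(?:\.\d+)*)")


def xref_candidates(text):
    """Return (label, number, offset) for each phrase that looks like a cross-reference."""
    return [(m.group(1), m.group(2), m.start()) for m in XREF.finditer(text)]
```

The offset plays the role that selecting the offending text plays in the macro: it tells you where to go to make the repair.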

I’ve posted the code of my lint script to a GitHub gist for easy browsing. It’s under a BSD license so feel free to do with it as you please!

