Converting LaTeX to Word (Pandoc)

I was preparing an article for a medical journal, and the only formats they accepted were Word (DOC) and RTF.

Oh, the horror of having to use Word to write a journal paper!


I was horrified, because I use LaTeX, where one can produce beautifully typeset documents containing millions of figures, tables, references, and subsections, all without having to number or cross-reference any of these items by hand. I once tried to write a paper with lots of references and sections in Word, and gave up after a week: every time I made a single change (reordering the references or adding a new section, for example), all my inline references got totally scrambled. So I was definitely not looking to repeat that experience.

So then what?

Once I decided I wasn’t going to touch Word with a 1000-foot pole, it was just a matter of finding some software that could convert from LaTeX or PDF to Word. Easier said than done. I found a few options, but eventually settled on Pandoc. It was smooth sailing from there. Well, not so much.

How to actually use Pandoc

Once I had everything installed, I figured converting from LaTeX to Word would be quite simple. In the end it wasn’t that difficult at all; it was mostly a matter of getting the syntax right. There were also some issues with references, but since I couldn’t see any error messages, it took some trial and error to figure things out. (See the bottom of this post for example code that actually works.)

Pandoc can convert to and from quite a number of formats, but this post focuses only on LaTeX to Word. See the Pandoc page to find out about the many other formats you can use!

Here are the ingredients you need:

  • Pandoc
  • BibTeX bibliography file (see below, no special characters!)
  • Journal citation style file (you can usually find what you need in the CSL or Zotero style repositories)
  • Optional: journal abbreviations file (this was a pain and a half to find, and unfortunately I didn’t note where I got it from, so you can find my version here)
  • Input file (latexfile.tex)
  • MiKTeX or whatever you normally use to compile your LaTeX file
  • Decide whether you want doc, docx, or rtf (I tried all three until I found the format I liked; there are subtle differences between them)
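Put together, the ingredients above map onto a single Pandoc command line. The Python sketch below only assembles that command; it is not the code from the zip file mentioned at the end of this post. The file names (latexfile.tex, refs.bib, journal.csl, and any abbreviations file) are placeholders, and `--citeproc` assumes Pandoc 2.11 or newer (older releases used `--filter pandoc-citeproc` instead). Pandoc's writers include docx and rtf but not the legacy binary .doc format, so the sketch accepts only those two.

```python
from pathlib import Path
from typing import List, Optional

# Pandoc writers that Word can open; there is no pandoc writer for legacy .doc
WORD_FORMATS = ("docx", "rtf")


def build_pandoc_command(
    tex_file: str,
    bib_file: str,
    csl_file: str,
    out_format: str = "docx",
    abbreviations_file: Optional[str] = None,
) -> List[str]:
    """Assemble a pandoc argument list for LaTeX -> docx/rtf with citations."""
    if out_format not in WORD_FORMATS:
        raise ValueError(f"unsupported output format: {out_format!r}")
    # Write the output next to the input, e.g. latexfile.tex -> latexfile.docx
    out_file = str(Path(tex_file).with_suffix("." + out_format))
    cmd = [
        "pandoc", tex_file,
        "--from=latex",
        f"--to={out_format}",
        "--standalone",               # rtf needs this to produce a complete file
        "--citeproc",                 # Pandoc >= 2.11; older: --filter pandoc-citeproc
        f"--bibliography={bib_file}",
        f"--csl={csl_file}",
    ]
    if abbreviations_file:
        # Optional journal-abbreviations file (JSON)
        cmd.append(f"--citation-abbreviations={abbreviations_file}")
    cmd += ["--output", out_file]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute your own
    print(" ".join(build_pandoc_command("latexfile.tex", "refs.bib", "journal.csl")))
```

Running it prints the assembled command, which you can paste into PowerShell, or you can hand the list straight to `subprocess.run(cmd, check=True)` once Pandoc is on your PATH.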

Here are some lessons I learned:

  • Not all LaTeX markup works, so it’s a bit of trial and error. I still can’t get the figure or table numbers to show up, which sort of defeats the purpose of not using Word, but at least the citations are OK. The figure and table references show up as their raw markup text, so search-and-replace may be a temporary workaround until I figure out how to get it to work.
  • Your BibTeX file cannot contain any non-ASCII (special) characters. I spent an hour or two going through my BibTeX file removing all sorts of them.
  • It seems obvious, but your LaTeX file must not cite any BibTeX keys that are missing from the bibliography file; otherwise Pandoc crashes without any warning.
  • In my setup, Pandoc had to be called from Windows PowerShell (not CMD), and it is driven entirely by command-line instructions.
  • I still haven’t been able to get LaTeX page-break commands to work, and it seems this has to do with Pandoc generating HTML-style documents.
  • Likewise, I will probably have to add page numbers to the Word document manually when sending it to my co-authors for review.
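The two bibliography lessons above (no non-ASCII characters in the .bib file, no citations of missing keys) are easy to check before calling Pandoc. Here is a rough pre-flight checker in Python. The script name `check_refs.py` is hypothetical, and the regular expressions are my own simplification: they only recognise `\cite`-family commands (`\cite`, `\citep`, `\citet*` and so on) and ordinary `@type{key,` entries, so treat the output as a hint rather than a guarantee.

```python
import re
import sys
from pathlib import Path
from typing import List, Set, Tuple

# "@article{smith2010," -> "smith2010" (skips @string{name = ...} entries)
BIB_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
# \cite{a,b}, \citep[p.~3]{c}, \citet*{d} -> "a,b", "c", "d"
CITE = re.compile(r"\\cite\w*\*?(?:\[[^\]]*\]){0,2}\{([^}]*)\}")


def find_non_ascii(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, character) for every non-ASCII character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits


def cited_keys(tex_text: str) -> Set[str]:
    """Collect every key inside \\cite-style commands (comma-separated lists)."""
    keys = set()
    for group in CITE.findall(tex_text):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def missing_keys(tex_text: str, bib_text: str) -> Set[str]:
    """Keys cited in the .tex file that have no entry in the .bib file."""
    return cited_keys(tex_text) - set(BIB_KEY.findall(bib_text))


def main(tex_file: str, bib_file: str) -> None:
    # errors="replace" turns undecodable bytes into U+FFFD, which is also flagged
    tex = Path(tex_file).read_text(encoding="utf-8", errors="replace")
    bib = Path(bib_file).read_text(encoding="utf-8", errors="replace")
    for lineno, col, ch in find_non_ascii(bib):
        print(f"{bib_file}:{lineno}:{col}: non-ASCII {ch!r} (U+{ord(ch):04X})")
    for key in sorted(missing_keys(tex, bib)):
        print(f"{tex_file}: cites missing key {key!r}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it as `python check_refs.py latexfile.tex refs.bib`: it lists each non-ASCII character in the bibliography with its line and column, then any cited keys that have no matching entry.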

Other tricks & tips:

  • I also installed Vim for Windows so I can write more productively (without having to grab the mouse every five seconds). Yeah, I know, I use Windows!
  • I then wrote a simple batch file so I can call the Pandoc script from within Vim. Weeee 🙂
  • Note that this is the workflow I created for myself; if you use other referencing software, some of this might be easier for you. Also, some people successfully use Word to write manuscripts. If you are one of those people, do let me know how you do it without pulling all your hair out (and without manually inserting citations and references)!
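I used a Windows batch file for the Vim hook, but the same idea works with a small Python wrapper, sketched below. The script name `tex2word.py` and the hard-coded `refs.bib` and `journal.csl` names are placeholders, and `--citeproc` again assumes Pandoc 2.11 or later. The wrapper runs Pandoc from the .tex file's own folder, so relative paths to figures and the bibliography resolve, and it writes the Word file next to the source.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Pandoc writers for Word-compatible output; legacy .doc is not one of them
SUPPORTED = ("docx", "rtf")


def output_path(tex_path: str, fmt: str = "docx") -> Path:
    """Return the output file next to the .tex source, with the new extension."""
    return Path(tex_path).with_suffix("." + fmt)


def main(argv) -> int:
    if len(argv) < 2 or (len(argv) > 2 and argv[2] not in SUPPORTED):
        print("usage: tex2word.py FILE.tex [docx|rtf]", file=sys.stderr)
        return 2
    tex = Path(argv[1]).resolve()
    fmt = argv[2] if len(argv) > 2 else "docx"
    if shutil.which("pandoc") is None:
        print("pandoc not found on PATH", file=sys.stderr)
        return 1
    out = output_path(str(tex), fmt)
    cmd = [
        "pandoc", tex.name,
        "--standalone",
        "--citeproc",                 # Pandoc >= 2.11
        "--bibliography=refs.bib",    # placeholder file name
        "--csl=journal.csl",          # placeholder file name
        "--output", out.name,
    ]
    # Run in the .tex file's folder so relative figure and .bib paths resolve
    result = subprocess.run(cmd, cwd=tex.parent)
    if result.returncode == 0:
        print(f"wrote {out}")
    else:
        print("pandoc failed", file=sys.stderr)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

From Vim, `:w` followed by `:!python tex2word.py %` converts the current buffer (`%` expands to the current file name), and a mapping such as `nnoremap <F5> :w<CR>:!python tex2word.py %<CR>` puts the whole thing on a single key.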

Example code for this workflow (a minimal working example, if you prefer) is available here as a zip file. Program files such as Pandoc are not included.

Enjoy, and let me know if these tips were helpful in your quest to write more journal papers!

…I can now finally continue with writing the paper. Unless I can find another source of productive procrastination, such as, say, starting a blog?…………

Update – see also: Converting LaTeX to Word – part 3 (Pandoc revisited) and Converting LaTeX to Word – part 2 (LaTeX2RTF)
