Wednesday, September 3, 2014

Writing a PhD dissertation in Latex

Having just finished writing my PhD dissertation using Latex, I'd like to share my experiences -- what was easy, what was hard, and some problems to watch out for.

Writing a PhD dissertation is a daunting task. Besides the actual content, you also have to worry about formatting, references, bibliographies, tables of contents, page numbering, figure layout, dedication pages, and more. A good document preparation program can help you with these issues, leaving you free to focus on the science. A bad program can suck up your time on annoying, trivial details.

If you've never heard of Latex before, the short version is that it's a document preparation system. Like Microsoft Word, it's used to create documents, but instead of formatting the page directly, you write plain-text source files that a program such as pdflatex turns into a PDF. I decided to use Latex for my PhD dissertation because I didn't want to waste time on side issues, and I thought Latex would handle them for me. (Also, I didn't know how to do those things in Word, which would otherwise have been the default choice.)

Why Latex?

In Latex, your document is plain text. This is an advantage because of the wealth of tools that work with plain text, such as git. I knew before starting that I wanted to manage my dissertation with git because of its project management features: tags to keep track of which versions I had shared with my committee, line-based diffs to compare revisions, and integration with hosting services such as Bitbucket and GitHub. Git is far more pleasant to use with text files than with binary ones, so Latex was a natural fit.

Latex distributions are free, and a large community uses them, contributes and tests code, and answers questions. There are also many features, both built-in and available through add-on packages, that help you build documents efficiently.

Latex's pleasant surprises

  • the pdflatex program produces beautiful PDFs from your plain text Latex source files
  • references are easy to manage and share. The Bibtex format is pretty standard and it's easy to get references in this format from most journals
  • citing references from within the text is easy; Latex makes sure the citations are consistently numbered, formatted, and named
  • Latex can automatically generate the bibliography
  • Latex can automatically generate a table of contents
  • Latex can automatically generate internal hyperlinks within a PDF
  • beautiful rendering of mathematical equations
  • you can refer to figures, tables, and other sections of text using labels and anchors
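To make the list above a little more concrete, here is a minimal sketch of a dissertation skeleton that exercises several of these features: a generated table of contents, labelled and cross-referenced sections, a citation, an automatically generated bibliography, and internal PDF hyperlinks. The file name `refs.bib`, the citation key `doe2014`, and the bibliography entry are hypothetical placeholders; the `filecontents*` environment only writes that placeholder file so the example compiles on its own.

```latex
% Minimal dissertation skeleton (a sketch; refs.bib and doe2014 are placeholders).
% filecontents* writes a tiny .bib file so the example is self-contained.
\begin{filecontents*}{refs.bib}
@article{doe2014,
  author  = {Doe, Jane},
  title   = {A Placeholder Article},
  journal = {Journal of Examples},
  year    = {2014}
}
\end{filecontents*}
\documentclass[12pt]{report}
% hyperref turns \ref, \cite, and table-of-contents entries into PDF links.
% It is conventionally loaded after most other packages.
\usepackage{hyperref}

\begin{document}
\tableofcontents  % filled in automatically from \chapter and \section

\chapter{Introduction}
\label{ch:intro}
Prior work~\cite{doe2014} motivates the methods in Section~\ref{sec:methods}.

\section{Methods}
\label{sec:methods}
As noted in Chapter~\ref{ch:intro}, \LaTeX{} keeps all numbering consistent.

\bibliographystyle{plain}  % numbered entries, sorted alphabetically
\bibliography{refs}        % built from refs.bib by bibtex
\end{document}
```

Assuming the file is saved as `thesis.tex` (a hypothetical name), compile it with `pdflatex thesis`, then `bibtex thesis`, then `pdflatex thesis` twice more. The first pass records labels and citations in auxiliary files, bibtex builds the bibliography from them, and the later passes resolve the numbers and links.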

There's a learning curve

If you're coming from Word, then learning to use Latex is going to take some time -- it does things differently, and you'll have to figure out a new approach to writing documents. I expected that I would spend a decent chunk of time learning how to use Latex, debugging and fixing my mistakes, and solving problems. This did indeed turn out to be the case.

You can also expect to face your fair share of standard problems (that you'll probably run into no matter what program you use). This includes finding a version of the program that runs on your platform, figuring out where to find add-ons and how to install them, and building a conceptual understanding of how the program works -- so that you understand why errors occur and how to fix them.

Problems and issues I encountered

More of my time than expected was spent managing Latex instead of working on the content. While there are undoubtedly solutions to each of these problems, they are tough for a beginner to find. Here are some of the problems that I encountered:

  • the error messages produced by pdflatex were pretty cryptic, which made problems difficult to understand and to search for
  • default settings and parameterizations are occasionally surprising, and often difficult to discover
  • the built-in "report" document class did not exactly meet my university's formatting requirements. That's fine, but it took a lot of work to figure out how to fix the formatting
  • there are multiple contexts (such as text mode and math mode), and some characters mean different things in different contexts
  • conflicts between packages. Some Latex packages don't play nice with each other. Sometimes this means that you can't use certain packages together, other times it means that you'll silently get weird results. I believe there are also cases where packages have to be imported in a certain order to get them to work correctly
  • entries in the bibliography had different capitalization in the output PDF than what I had put into the bibliography file
  • it's hard to see where things start and end. Some commands aren't delimited, but are implicitly ended by later commands. Others have effects within a scope, which again is implicitly defined
  • margins were routinely violated. I had assumed that the default behavior would be to respect margins, but this was not the case
  • special characters. If you're not familiar with them, you may accidentally write something totally different from what you meant, without realizing it. Syntax-highlighting text editors are a big help here. It also took effort to figure out how to write literal, non-special versions of these characters
  • I had to spend time manually checking the PDF to ensure that everything turned out correctly. Sometimes, there were surprising problems in the output that I wouldn't have found except by actually looking at the PDF (that is, there wasn't an error or warning generated by pdflatex)
  • I had a very hard time finding complete, correct answers to problems. Many answers did not attempt to solve the problem, but rather argued with the premise of the OP. Many worked sometimes, but not in all contexts. Many others had unstated caveats, which later blew up in my face
  • I was unable to find complete, precise documentation for packages, macros, document classes, and commands -- what they are intended to do, and how they are intended to be used. For instance, I needed to know what the "report" document class entailed and what its options meant, so that I could compare to my university's formatting requirements. I couldn't find this information anywhere
  • it was difficult to choose between multiple competing packages solving the same problem -- it was hard to find good comparisons which included caveats, pros and cons, etc.
  • I wasn't able to find resources to help me build a conceptual understanding of how Latex works. This meant that I wasn't able to understand why errors occurred
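A few of the problems above have well-known workarounds, sketched below under the assumption of the `plain` bibliography style. Latex reserves ten characters (`# $ % & _ { } ~ ^ \`), and each has a command that prints it literally. BibTeX styles such as `plain` lowercase article titles, and wrapping a word in an extra pair of braces protects its capitalization. The example also sets `\overfullrule`, which draws a black bar beside any line that sticks out into the margin, so such lines are easier to spot when checking the PDF by eye. The 1-inch margin and double spacing are hypothetical formatting rules standing in for a university's real requirements, and the bibliography entry and key are placeholders.

```latex
% Workarounds for a few of the problems above (a sketch; entry and key are placeholders).
% plain.bst lowercases article titles; the extra braces keep DNA and E. coli as written.
\begin{filecontents*}{refs.bib}
@article{doe2014,
  author  = {Doe, Jane},
  title   = {Measuring {DNA} Repair in {E. coli}},
  journal = {Journal of Examples},
  year    = {2014}
}
\end{filecontents*}
\documentclass{report}
% Hypothetical formatting rules: replace with your university's requirements.
\usepackage[margin=1in]{geometry}
\usepackage{setspace}
\doublespacing
% Draw a 5pt black bar next to any line that overflows into the margin
% (the standard classes' draft option sets the same value).
\setlength{\overfullrule}{5pt}

\begin{document}
\chapter{Special characters}
% Each reserved character needs a command to appear literally:
\# \quad \$ \quad \% \quad \& \quad \_ \quad \{ \quad \}
\quad \textasciitilde{} \quad \textasciicircum{} \quad \textbackslash{}

% An unescaped % would silently comment out the rest of its line.
About 50\% of the R\&D budget went to the file results\_final.csv.

Repair rates were measured as in~\cite{doe2014}.

\bibliographystyle{plain}
\bibliography{refs}
\end{document}
```

This compiles the same way as any document with a bibliography: pdflatex, then bibtex, then pdflatex twice. In the output, the title should appear as "Measuring DNA repair in E. coli", with the braced words left untouched.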

Conclusion: was using Latex the right decision for me?

Yes. I wanted to write my dissertation in plain text, manage it using git, and have my table of contents, references, and bibliography automatically generated. Latex had no trouble handling these. It was usually fun to use Latex, and the output from pdflatex was beautiful.

On the other hand, I spent much more time than expected troubleshooting, debugging, digging through old forums looking for answers, deciphering cryptic error messages, and wondering why things didn't work. Latex can be a very frustrating and complicated tool, and it's difficult to find help when you need it. I ran into numerous problems that I just couldn't solve and couldn't find solutions to using the internet. These left a bad taste in my mouth. I feel like a lot of Latex goes against standard principles of building robust software, such as encapsulation, abstraction, composition, and invariants.

Nevertheless, it was more than worthwhile to learn to use Latex for my dissertation. I think these issues are traps for beginners, but don't prevent advanced users from getting work done. I expect the downsides of using Latex shrink as one gains more experience using it.

Disclaimer: please keep in mind that I am only reporting my experience and my thoughts, and that it is certainly possible that my conclusions are flawed. I intended for this to be a fair portrayal of Latex.