Luggage, Baggage, and (Legal|Nic)ities

From Nigel:

For Colleen, my magnificent sister, whose humor, intelligence, and adventurous nature are exceeded only by her grounded humanity. You are so appreciated.


Thanks first and foremost to Megan Pearce, whose substance is of a quality and strength one rarely witnesses. Thank you very much for your understanding and patience and for the joy of your company.

This book would be a far lesser work were it not for the accuracy and insight of some very talented reviewers. Thanks very much to: Neil Rashbrook, Mozilla XPFE maven extraordinaire; Eve Maler, Sun Microsystems' XML Standards Architect and W3C member; and the incredibly busy yet helpful Dr. Alisdair Daws. Thanks also to Brendan Eich, Scott Collins, and Ben Goodger for support and specialist feedback.

From Ben:

Notes from ... well, I don't really know what to call myself in the context of the HTML version of this book. I guess the closest term would be porter, in the sense programmers tend to use the word: one who takes code written for one platform and makes it run on another platform. In this case, the "code" is the content of the book, and the original "platform" was the PDF and RTF formats in which the book was originally offered for download.

The broad process went something like this:

  1. Download RTF and PDF copies of the book.
  2. Open all the RTF pages in Microsoft Word and export as HTML.
  3. Strip the HTML from the .html files produced, leaving structured but un-marked-up content.
  4. Using jEdit on Windows, begin analyzing the structure of the text and creating regular expressions to bring out said structure. sed could have done most if not all of the actual replacing, but it still would have been time-consuming to determine the correct transformations.
  5. Transfer the RTF and HTML files to a laptop running Red Hat Linux, which was brought along on a week-long summer vacation to New England.
  6. Continue "programmatically" adding broad swathes of markup through regular expressions. By the end, there were probably fifty regexps responsible for the skeleton of the HTML.
  7. A fair bit of hand-tweaking of the RE-generated HTML and addition of context-sensitive HTML: marking code blocks with the appropriate language attributes, adding links, marking variables, and converting related-item chains to unordered bag lists. (Search the source for <ul class="bag"> for an example.)
  8. Using pdfimages.exe, extract the images from the PDF documents. Convert from PPM to PNG and, for select images, resize and/or convert to JPG. Optimize PNG files with pngout.exe. Use a screenshot tool to capture the vector diagrams, which are not stored as extractable images.
  9. Develop proper, logical, well-structured cascading stylesheets.
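
As a rough illustration of steps 4 through 6, here is a minimal Python sketch of how an ordered list of regular expressions can layer markup onto plain text. The patterns, the RULES list, and the add_markup name are all hypothetical; the fifty or so expressions actually used for the book are not reproduced here, and the heading formats are assumed purely for the example.

```python
import html
import re

# Hypothetical rules, applied in order. These are illustrative assumptions,
# not the expressions actually used to convert the book.
RULES = [
    # A line such as "Chapter 3. Title" becomes a top-level heading.
    (re.compile(r"^Chapter (\d+)\. (.+)$", re.MULTILINE),
     r"<h1>Chapter \1. \2</h1>"),
    # A line such as "3.2 Title" becomes a second-level heading.
    (re.compile(r"^(\d+\.\d+) (.+)$", re.MULTILINE),
     r"<h2>\1 \2</h2>"),
    # Any remaining non-empty line that does not already start with a tag
    # becomes a paragraph.
    (re.compile(r"^(?!<)(.+)$", re.MULTILINE),
     r"<p>\1</p>"),
]


def add_markup(text: str) -> str:
    """Escape the raw text, then apply each rule in turn."""
    # Escape &, <, and > first so literal characters are not mistaken for tags.
    text = html.escape(text, quote=False)
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    sample = "Chapter 1. Intro\n\n1.1 Scope\nUse a < b here.\n"
    print(add_markup(sample))
```

Rule order matters in a scheme like this: the specific heading rules run before the catch-all paragraph rule, which skips any line an earlier rule has already tagged. That ordering dependency is one reason early structural decisions are expensive to change later.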

All in all, the process took perhaps two hundred hours to complete. I actually ran through most of the steps once before I realized that a few of my early structural decisions were going to make my life difficult, which prompted me to start again from scratch. If I did it again, I could probably get it done in sixty, with a bit of luck and foresight.

Then again, the last few steps (tweaking the XHTML, CSS) aren't done yet and may not ever be "done".
