Legal Information
For compliance with the Open Content license that applies to the book:
Copyright (c) 2004 by Pearson Education, Inc. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).
Written by Nigel McFarlane.
The physical edition of the book may be purchased from Amazon.com. The book may also be downloaded in RTF or PDF format from InformIT or from the publisher, PH PTR.
The extra files for the NoteTaker example application may be downloaded from PH PTR.
I am Ben Karel.
I have made the following changes to the original document:
- Converted the book from RTF/PDF format to HTML on or about 13 June 2004.
- Updated the content of the book to reflect the Errata listings on 13 October 2004.
- Moved the Dedication from the front/Introduction section to this page; moved the Acknowledgements from the front/Introduction section to this page. Omitted the index, since it seemed irrelevant for a searchable online book. These changes were made on or about 09 October 2004.
From Ben:
Notes from ... well, I don't really know what to call myself in the context of the HTML version of this book. I guess the closest term would be porter, in the sense programmers tend to use the word: one who takes code written for one platform and makes it run on another platform. In this case, the "code" is the content of the book, and the original "platform" was the PDF and RTF formats in which the book was originally offered for download.
The broad process went something like this:
- Download RTF and PDF copies of the book.
- Open all the RTF pages in Microsoft Word and export as HTML.
- Strip the HTML from the .html files produced, leaving structured but un-marked-up content.
- Using jEdit on Windows, begin analyzing the structure of the text and creating regular expressions to bring out said structure. sed could have done most if not all of the actual replacing, but it still would have been time-consuming to determine the correct transformations.
- Continue "programmatically" adding broad swathes of markup through regular expressions. By the end, there were probably fifty regexps responsible for the skeleton of the HTML.
- A fair bit of hand-tweaking of the RE-generated HTML and addition of context-sensitive HTML: marking code blocks with the appropriate language attributes, adding links, marking variables, and converting related-item chains to unordered bag lists. (Search the source for <ul class="bag"> for an example.)
- Using pdfimages.exe, extract the images from the PDF documents. Convert from PPM to PNG and, for select images, resize and/or convert to JPG. Optimize PNG files with pngout.exe. Use a screenshot tool to capture the vector diagrams (not stored as extractable images).
- Develop proper, logical, well-structured cascading stylesheets.
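To give a flavor of the regular-expression step, here is a minimal sketch in Python. It is not the set of expressions actually used for this book (those were written in jEdit and are not reproduced here). The three rules and the text conventions they match (chapter lines, numbered section lines, everything else as a paragraph) are assumptions made up for illustration, as is the function name mark_up.

```python
import html
import re

# Hypothetical rules, applied in order. Each one turns a plain-text
# convention into markup; the patterns are illustrative assumptions.
RULES = [
    # "Chapter 3. Static Content" becomes a top-level heading.
    (re.compile(r"^(Chapter \d+\. .+)$", re.MULTILINE), r"<h1>\1</h1>"),
    # "3.2 XUL Tags" becomes a section heading.
    (re.compile(r"^(\d+\.\d+ .+)$", re.MULTILINE), r"<h2>\1</h2>"),
    # Any other non-empty line that is not already a heading
    # becomes a paragraph.
    (re.compile(r"^(?!<h\d>)(.+)$", re.MULTILINE), r"<p>\1</p>"),
]


def mark_up(text):
    """Escape the raw text, then apply each rule to the whole document."""
    # Escaping first means literal "<" in the content cannot be
    # mistaken for markup added by an earlier rule.
    result = html.escape(text, quote=False)
    for pattern, replacement in RULES:
        result = pattern.sub(replacement, result)
    return result


if __name__ == "__main__":
    sample = (
        "Chapter 3. Static Content\n"
        "3.2 XUL Tags\n"
        "A <box> tag lays out content.\n"
    )
    print(mark_up(sample))
```

The order of the rules matters: the catch-all paragraph rule has to run last, or it would wrap the heading lines too. That kind of ordering dependency is a large part of why working out the correct transformations was the time-consuming part.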
At this point, the RTF and HTML files were transferred to a laptop running Red Hat Linux, which was brought along on a week-long summer vacation to New England.
All in all, the process took perhaps two hundred hours to complete. I actually ran through most of the steps once before I realized that a few of my early structural decisions were going to make my life difficult, which prompted me to start again from scratch. If I did it again, I could probably get it done in sixty hours, with a bit of luck and foresight.
Then again, the last few steps (tweaking the XHTML, CSS) aren't done yet and may not ever be "done".