Introduction
Documentation. The word elicits a mix of fear and depression in even the most hardened programmer. For many it's a hard slog through endless boredom, whether it happens throughout the development process or at the end of it. Documentation is never the easiest task. Good documentation takes time, patience, lots of questions about the subject matter (no matter how familiar you think you are with it, you can be sure you have some misunderstandings), and a degree of skill in condensing knowledge into a form people can instantly connect with.
But even when you get it done, there's the question of how to distribute it! A popular choice is HTML: it's portable since everyone has a browser, and as web developers we're all familiar with the syntax. Another common choice is plain text, since "someone else" can always convert it to another format down the line. Some people even believe it's entertaining to rely solely on inline source code comments, leaving users to decipher their personal coding style, thought process, and intent.
This article series proposes using Docbook XML as the ultimate source format for all documentation. The difference between most formats and Docbook is that Docbook can be used to generate numerous final formats. That flexibility and the quality of its output go a long way towards explaining Docbook's popularity among documentation authors. If you doubt its capabilities, bear in mind that some publishers have adopted Docbook!
The series was written to introduce programmers to a PHP-oriented publishing process which uses Docbook XML as the basis for generating professional-looking HTML and PDF output. I say PHP-oriented because the Unix "toolchain" commonly associated with Docbook XML has been replaced almost entirely by PHP. This is useful because, with PHP's power at your disposal, writing filters to handle things like PHP source code highlighting is extremely simple.
Meet The Ingredients!
Docbook XML
The Docbook standard seems to have a reputation for being complex. This is a misconception with little foundation. The format is broad, in that there are hundreds of possible tags, but shallow, in that beyond the tag count everything else is very straightforward. Docbook is a simple XML format where a tiny subset of the standard syntax covers 99% of your requirements. It's as simple as plain old HTML, and there are several excellent editors, so you're not stuck editing XML by hand (which should be avoided since it's...painful). However, I do suggest setting up the shell book (see manual.xml further on) by hand and using an editor for individual chapters and appendices, since that makes life easier than putting up with one giant file!
The downside is that you have to understand at least the basic tags. You also need to be aware that, as with all XML, elements must nest properly, and every Docbook XML file must validate against the standard. XML editors rely on knowing the tags and their nesting rules. They save time because you are not writing the tags yourself or worrying about validation, appearance and the other pains of hand editing.
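Because so much hinges on correct nesting, it can help to check a fragment before a full run. The following is a minimal sketch using libxml's error collection. It only tests well-formedness (proper nesting), not validity against the Docbook DTD. `DOMDocument::validate()` covers validity, but it needs the DTD to be available. The function name `checkWellFormed()` is hypothetical.

```php
<?php
// Hypothetical helper: report nesting (well-formedness) errors in a Docbook
// fragment. This does NOT validate against the Docbook DTD.
function checkWellFormed($xml)
{
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    libxml_clear_errors();

    $dom = new DOMDocument();
    $ok = $dom->loadXML($xml);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // An empty array means the fragment parsed cleanly.
    return $ok ? array() : $messages;
}

// <emphasis> is closed after </para>, so this fragment is not well-formed.
$broken = '<para>Some <emphasis>stressed</para> text</emphasis>';
foreach (checkWellFormed($broken) as $message) {
    echo $message, PHP_EOL;
}
```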
Beyond that, it's a simple exercise in learning by doing, which I leave to the reader. You'll see some samples later on to give you a feel for the basic syntax. You can read a crash course at http://opensource.bureau-cornavin.com/crash-course/en/introduction.html, but the full reference manual is a great deal bigger and more comprehensive. There are also several useful examples in the PHP community, including the Zend Framework Reference Manual and PHPUnit's Pocket Guide, which you can check out from their respective version control repositories.
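To make the shell book idea concrete, here is a minimal sketch using PHP's DOM extension. The article doesn't say how manual.xml pulls in its chapters. This sketch assumes XInclude; entity references are the other common choice. The file names chapter1.xml and manual.xml and the temporary directory are illustrative only.

```php
<?php
// Minimal sketch: a hand-written "shell book" that pulls in a chapter kept
// in its own file. XInclude is an assumption; file names are illustrative.
$dir = sys_get_temp_dir() . '/docbook-demo-' . getmypid();
is_dir($dir) || mkdir($dir);

// A chapter edited separately, as an XML editor would save it.
file_put_contents($dir . '/chapter1.xml', <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<chapter id="getting-started">
  <title>Getting Started</title>
  <para>Install PHP 5, PEAR and Phing.</para>
</chapter>
XML
);

// The shell book: only this skeleton is written by hand.
file_put_contents($dir . '/manual.xml', <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Example Manual</title>
  <xi:include href="chapter1.xml"/>
</book>
XML
);

$book = new DOMDocument();
$book->load($dir . '/manual.xml');
$book->xinclude(); // href is resolved relative to manual.xml

$xpath = new DOMXPath($book);
foreach ($xpath->query('/book/chapter/title') as $title) {
    echo $title->textContent, PHP_EOL; // prints "Getting Started"
}
```

Each chapter stays small enough for an editor to handle comfortably, while the book itself remains one logical document for the XSLT stage.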
The reason Docbook has become so popular for technical manuals, reference books, shorter articles and even some magazines is that Docbook is agnostic to the final distributable format. From any Docbook source you can generate HTML, XHTML, RTF, CHM (Microsoft Help), PostScript, TeX and FO output, among others. FO is itself an intermediary XML syntax which is easily converted to PDF using an FO processor (like Apache FOP, which we'll meet later). With all these target formats available from a single source document, it's easy to see that Docbook affords you flexibility. Why write RTF or HTML when Docbook gives you these, and more, with minimal fuss?
Did I mention that the only required tools for Docbook processing all happen to be free and open source?
PHP 5
Transforming Docbook XML into other formats requires a toolchain. The standard setup is to install the Docbook DTDs, the Docbook XSL stylesheets (which drive the transformation of the source Docbook XML into the various formats), and a collection of command-line tools on Linux such as xsltproc. This is sometimes referred to as the "new way" since it's reasonably uncomplicated compared to what went before in the Linux world. "Reasonably" is in the eye of the beholder, however.
PHP 5 comes with the DOM and XSL extensions, which between them provide all the functions necessary to drive a completely PHP-based toolchain for Docbook. All that's needed is the toolchain itself, written in PHP so it can be reused. Luckily, Phing (a project build system based on Apache Ant) includes pre-written tasks and filters which serve this purpose admirably.
I should note these were contributed for the benefit of all PHP inclined Docbook users by Bill Karwin, the former Zend Framework maestro. Thank you Bill!
The other facet of a PHP toolchain is that it enables PHP programmers to write custom Phing tasks and filters which can assist in customising output. We'll see my PHP source highlighting examples later.
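To give a feel for what the XSL extension does underneath, here is a minimal sketch of a transformation driven from PHP. The inline stylesheet is a toy stand-in for the Docbook XSL stylesheets. A real run would import a stylesheet such as html/docbook.xsl from your Docbook XSL installation, and its location depends on your setup.

```php
<?php
// Minimal sketch: transform Docbook with PHP's XSL extension.
// The stylesheet below is a toy stand-in for the real Docbook XSL.
$docbook = new DOMDocument();
$docbook->loadXML('<article><title>Hello</title><para>First para.</para></article>');

$stylesheet = new DOMDocument();
$stylesheet->loadXML(<<<XSL
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/article">
    <html><body>
      <h1><xsl:value-of select="title"/></h1>
      <xsl:apply-templates select="para"/>
    </body></html>
  </xsl:template>
  <xsl:template match="para"><p><xsl:value-of select="."/></p></xsl:template>
</xsl:stylesheet>
XSL
);

$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);
$html = $processor->transformToXml($docbook);
echo $html;
```

With the real stylesheets, Docbook XSL parameters would be passed through `$processor->setParameter('', $name, $value)` before calling `transformToXml()`.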
Phing
Phing is one of those understated libraries eclipsed by the likes of Apache Ant or Ruby's Rake in their respective languages. Phing's raison d'etre is to allow PHP programmers to describe repetitive tasks in an XML syntax so they can be re-run automatically from the command line whenever you want. For example, whenever I want to generate documentation from Docbook, I simply issue the command "phing docs" in a console!
Phing is written in PHP 5, and that's the main reason I use it. You can create custom tasks and filters in PHP without fiddling with Java or embarking on an exploration of bash scripting (if you normally prefer makefiles). I use several custom tasks, which are basically PHP classes imported into Phing, to fine-tune the whole process. Phing takes the pain out of applying your PHP knowledge to generating and manipulating Docbook XML and any of its intermediary or final formats.
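A custom task really is just a PHP class. Below is a minimal sketch of the shape, following the conventions visible in the task listed later in this article. A task extends Task, receives each build file attribute through a matching setter (file="..." arrives via setFile()), and does its work in main(). So the snippet runs without Phing installed, it declares a tiny stand-in Task class only when the real one is absent. ParaCountTask is a hypothetical example, not one of my tasks.

```php
<?php
// Stand-in base class so the sketch runs outside Phing. Inside Phing the
// real one comes from require_once 'phing/Task.php'.
if (!class_exists('Task')) {
    abstract class Task
    {
        public function log($message)
        {
            echo $message, PHP_EOL;
        }

        abstract public function main();
    }
}

// Hypothetical task: counts <para> elements in a Docbook file.
class ParaCountTask extends Task
{
    private $_file = null;

    // Called for the file="..." attribute in the build file.
    public function setFile($file)
    {
        $this->_file = $file;
    }

    public function main()
    {
        $dom = new DOMDocument();
        $dom->load($this->_file);
        $count = $dom->getElementsByTagName('para')->length;
        $this->log(sprintf('%s contains %d para elements', $this->_file, $count));
        return $count;
    }
}

// Outside Phing, drive the task by hand the way Phing would.
$file = tempnam(sys_get_temp_dir(), 'dbk');
file_put_contents($file, '<chapter><para>One</para><para>Two</para></chapter>');
$task = new ParaCountTask();
$task->setFile($file);
$task->main();
```

Inside a build file, Phing's taskdef element is what registers such a class under a tag name.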
The other side of Phing is simple automation. Rather than bang on a console for two minutes, I can encode a Docbook run in Phing's XML syntax and let it automatically carry out all the tasks I defined, in the order I defined them. Two minutes over countless runs does add up and Phing removes that annoyance from my programming life. It excels at automating highly repetitive tasks.
Apache FOP
XSLT processing cannot, obviously, directly generate PDF. To get PDF documents it's necessary to have an intermediary format called XSL Formatting Objects (XSL-FO) created by an XSLT processor from the Docbook sources. This intermediary format can then itself be transformed in a second stage by a suitable XSL-FO processor into PDF, PostScript and a few other lesser used formats.
Apache FOP is chosen as the FO processor for a few reasons. It's easy to set up and use. You can easily configure it to embed custom (or base 14) fonts in PDF. It's written in Java, accessible from the command line, and runs on Windows XP/Vista with little effort. Oh, and it's free. Thou shalt not pay for an XSL-FO processor! No matter how easy the advertising promises to make it!
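The second stage is a plain command-line call, so it is easy to drive from PHP. This minimal sketch builds the FOP invocation and assumes FOP's `fop` launcher script is on your PATH. `-fo`, `-pdf` and `-c` (an extra configuration file, which is where font embedding is set up) are FOP command-line options. The function names and file names are hypothetical placeholders.

```php
<?php
// Minimal sketch: build (and optionally run) the FO -> PDF step.
// Assumes Apache FOP's "fop" launcher is on the PATH.
function buildFopCommand($foFile, $pdfFile, $configFile = null)
{
    $parts = array('fop');
    if ($configFile !== null) {
        // The configuration file is where custom font embedding lives.
        $parts[] = '-c ' . escapeshellarg($configFile);
    }
    $parts[] = '-fo ' . escapeshellarg($foFile);
    $parts[] = '-pdf ' . escapeshellarg($pdfFile);
    return implode(' ', $parts);
}

// Runs FOP and throws with its output if it exits with an error status.
function runFop($foFile, $pdfFile, $configFile = null)
{
    $output = array();
    exec(buildFopCommand($foFile, $pdfFile, $configFile) . ' 2>&1', $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("FOP failed:\n" . implode("\n", $output));
    }
}

echo buildFopCommand('build/manual.fo', 'build/manual.pdf', 'fop.xconf'), PHP_EOL;
```

Quoting each path with `escapeshellarg()` keeps file names containing spaces from breaking the command.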
Installing All This Crap Without Going Insane
If your head is spinning from the deluge of information, take a break with some menial installation tasks.
PHP 5
I'm going to assume you can safely install PHP 5 without me holding your hand. Just make sure you also include PEAR! If you need to install PEAR separately, consult the documentation at http://pear.php.net.
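Since the whole toolchain leans on PHP's DOM and XSL extensions, a quick check after installing can save confusion later. The following is a minimal sketch; `missingExtensions()` is a hypothetical helper. The XSL extension depends on libxslt and, depending on your platform, may need enabling in php.ini or installing separately, so it is the one most likely to be missing.

```php
<?php
// Hypothetical helper: list which of the required extensions are not loaded.
function missingExtensions(array $required)
{
    $missing = array();
    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

$missing = missingExtensions(array('dom', 'xsl'));
if ($missing) {
    echo 'Missing PHP extensions: ', implode(', ', $missing), PHP_EOL;
} else {
    echo 'DOM and XSL extensions are available.', PHP_EOL;
}
```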
Phing
Installing Phing is done from the command line using PEAR. If you intend to attempt a non-PEAR install, visit Phing's place on the web, where the User Manual lives. Here are the usual steps:
pear channel-discover pear.phing.info
pear install phing/phing
You can also install a Phing task I publish on my own PEAR channel to handle PHP source highlighting in HTML output:
pear channel-discover pear.phpspec.org
pear install phpspec/PhpDocbookHighlighterTask
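For a feel of what HTML highlighting involves, here is a minimal sketch. It is not the code of PhpDocbookHighlighterTask. It assumes program listings come out of the Docbook XSL HTML stylesheets as `<pre class="programlisting">`. It swaps each listing that starts with `<?php` for the output of PHP's built-in `highlight_string()`. The function name `highlightListings()` is hypothetical.

```php
<?php
// Minimal sketch, not the PhpDocbookHighlighterTask source: highlight PHP
// listings in Docbook-generated HTML, assuming <pre class="programlisting">.
function highlightListings($html)
{
    $previous = libxml_use_internal_errors(true); // tolerate HTML quirks
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $listings = array();
    foreach ($xpath->query('//pre[@class="programlisting"]') as $pre) {
        $listings[] = $pre; // collect first, then modify the tree
    }

    foreach ($listings as $pre) {
        $source = $pre->textContent;
        if (strpos(ltrim($source), '<?php') !== 0) {
            continue; // leave non-PHP listings untouched
        }
        // highlight_string() returns HTML (with &nbsp; entities), so parse it
        // with the HTML parser and import the resulting nodes.
        $highlighted = new DOMDocument();
        $highlighted->loadHTML('<html><body>' . highlight_string($source, true) . '</body></html>');
        $body = $highlighted->getElementsByTagName('body')->item(0);
        foreach ($body->childNodes as $child) {
            $pre->parentNode->insertBefore($dom->importNode($child, true), $pre);
        }
        $pre->parentNode->removeChild($pre);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom->saveHTML();
}

$page = '<html><body><pre class="programlisting">&lt;?php echo "hi"; ?&gt;</pre>'
      . '<pre class="programlisting">plain text</pre></body></html>';
echo highlightListings($page);
```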
I actually use two custom tasks. The first highlights PHP code in HTML/XHTML documentation generated from Docbook and is found on the PHPSpec PEAR channel. The second is in the ZFBlog Subversion repository at http://svn.astrumfutura.org/zfblog/branches/phing/PhpFoHighlighterTask.php and should be copied to the PEAR/phing/tasks/ext/ directory of your system. This task highlights PHP source code in XSL-FO output so that it is properly highlighted in the resulting PDF (for reference, this code, like the HTML version, is licensed under the New BSD License unless otherwise stated).
Here's a copy to examine. It uses a variation of the PHP highlighting script for HTML, rewritten to target XSL-FO using PHP's DOM extension:
<?php
require_once 'phing/Task.php';

/**
 * Highlights PHP source code inside XSL-FO generated from Docbook.
 * Only fo:block elements carrying phing="phpfohighlightertask" are touched.
 */
class PhpFoHighlighterTask extends Task
{
    private $_file = null;

    // Set from the file="..." attribute in the build file
    public function setFile($file)
    {
        $this->_file = $file;
    }

    public function init()
    {
    }

    public function main()
    {
        $this->_highlightFile($this->_file);
        $this->log('PHP in XSL-FO highlighted');
    }

    private function _highlightFile($file)
    {
        $dom = new DOMDocument();
        $dom->load($file);
        $xpath = new DOMXPath($dom);
        // Register the fo prefix explicitly so the query does not depend on
        // PHP picking up the document's own namespace declarations
        $xpath->registerNamespace('fo', 'http://www.w3.org/1999/XSL/Format');
        $elements = $xpath->query("//fo:block[@phing='phpfohighlightertask']");
        foreach ($elements as $block) {
            self::_highlightBlock($block, $dom);
            $block->removeAttribute('phing');
        }
        $dom->save($file);
    }

    private static function _highlightBlock($block, $fo)
    {
        $toHighlight = str_replace(
            array('&gt;', '&lt;', '&amp;', '&quot;'),
            array('>', '<', '&', '"'),
            $block->nodeValue
        );

        // This basically prevents highlighting of non HTML, XML and PHP
        // source code. Note: all PHP to be highlighted this way must have
        // <?php at the top
        if (substr($toHighlight, 0, 5) !== '<?php'
            && substr($toHighlight, 0, 9) !== '<!DOCTYPE'
            && !preg_match('/^<[^>]*>/', $toHighlight)) {
            return;
        }

        // Why manually highlight when it's built into PHP!
        // Edit php.ini or add config to change colours
        $code = highlight_string($toHighlight, true);
        // &nbsp; is not a predefined XML entity, so it must go before loadXML()
        $code = str_replace(
            array('<code>', '</code>', '&nbsp;', '<br />', "\r"),
            array('', '', ' ', "\n", "\n"),
            $code
        );
        $code = preg_replace("!\n\n\n+!", "\n\n", $code);
        $code = trim($code);

        $dom = new DOMDocument();
        $dom->loadXML($code);
        $xpath = new DOMXPath($dom);
        $parentSpan = $xpath->query('/span')->item(0);
        $style = $parentSpan->getAttributeNode('style')->value;
        $colour = substr($style, 7, 7); // "color: #RRGGBB" -> "#RRGGBB"

        $inlineParent = $fo->createElement('fo:inline');
        $inlineParent->setAttribute('color', $colour);

        $nodes = $xpath->query('/span/node()');
        foreach ($nodes as $node) {
            if ($node->nodeType == XML_ELEMENT_NODE) {
                self::_appendInlineChild($node, $inlineParent, $fo);
            } else {
                $inlineParent->appendChild($fo->importNode($node, true));
            }
        }

        // Side effect of XSL-FO complexity is the odd blank monospace box.
        // This strips them out; sort of a workaround. Means this code could
        // be improved a bit so stripping is not needed to start with!
        if ($inlineParent->firstChild !== null
            && preg_match('/^\s+$/', $inlineParent->firstChild->textContent)) {
            $inlineParent->removeChild($inlineParent->firstChild);
        }

        // Clear the original text. Removing nodes while iterating childNodes
        // skips every other node, so always remove the first child instead.
        while ($block->firstChild !== null) {
            $block->removeChild($block->firstChild);
        }
        $block->appendChild($inlineParent);
    }

    private static function _appendInlineChild($span, $inlineParent, $fo)
    {
        $style = $span->getAttributeNode('style')->value;
        $colour = substr($style, 7, 7);

        // Use a text node so characters like & are escaped correctly
        $inlineChild = $fo->createElement('fo:inline');
        $inlineChild->appendChild($fo->createTextNode($span->nodeValue));
        $inlineChild->setAttribute('color', $colour);
        $inlineParent->appendChild($inlineChild);
    }
}
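The task pulls each colour out of highlight_string()'s markup with `substr($style, 7, 7)`. That only works while the style attribute reads exactly "color: #RRGGBB". A slightly more defensive take on that one step uses a regular expression with a fallback colour. This is a minimal sketch; the helper `extractColour()` is hypothetical and not part of the task.

```php
<?php
// Hypothetical helper: pull the hex colour out of a style attribute such as
// "color: #0000BB" (as produced by highlight_string()), with a fallback.
function extractColour($style, $default = '#000000')
{
    if (preg_match('/color:\s*(#[0-9A-Fa-f]{3,6})/', $style, $match)) {
        return $match[1];
    }
    return $default;
}

echo extractColour('color: #0000BB'), PHP_EOL;    // prints #0000BB
echo extractColour('color:#FF8000'), PHP_EOL;     // prints #FF8000
echo extractColour('font-weight: bold'), PHP_EOL; // prints #000000
```

It would slot into `_highlightBlock()` and `_appendInlineChild()` in place of the two `substr()` calls.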
With Phing installed, our PHP environment is complete. Let's now grab the remaining elements.
Continue reading "Writing Professional Looking Documentation With Docbook, PHP, Phing and Apache FOP: Part 1: Getting Started"