17 Nov 1998
There are two parts to building an index automatically: creating the index terms and incorporating the generated index into your document.
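The generated index is constructed from the IndexTerms in your document. DocBook IndexTerms are not part of the running text; each one marks the point in the document that its index entry refers to. For example: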
<para>
This paragraph contains an interesting thing<indexterm id="thing">
<primary>thing</primary><secondary>interesting</secondary>
</indexterm> that will appear in the index.
</para>
It is not absolutely necessary to provide an ID for each index term, but the performance of the print backends may degrade significantly if you have a large number of index terms that do not have IDs.
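If you want to find the index terms that still lack IDs, a quick scan of the source is enough. The following Python sketch is not part of the DocBook tools, and the function name indexterms_without_id is hypothetical. It uses regular expressions rather than a real SGML parser, so unusual markup (an "id=" inside another attribute's value, or start tags that come from entities) can fool it.

```python
import re

# Matches an SGML <indexterm ...> start tag. SGML element and attribute
# names are case-insensitive by default, so IGNORECASE is used.
INDEXTERM_START = re.compile(r"<indexterm\b([^>]*)>", re.IGNORECASE)
# Matches an id attribute among the start tag's attributes.
ID_ATTR = re.compile(r"\bid\s*=", re.IGNORECASE)


def indexterms_without_id(text):
    """Return (line_number, start_tag) for each <indexterm> with no id."""
    missing = []
    for match in INDEXTERM_START.finditer(text):
        if not ID_ATTR.search(match.group(1)):
            # Line numbers are 1-based: count newlines before the tag.
            line = text.count("\n", 0, match.start()) + 1
            missing.append((line, match.group(0)))
    return missing


if __name__ == "__main__":
    sample = ('<para>A<indexterm id="a"><primary>a</primary></indexterm>\n'
              'B<indexterm><primary>b</primary></indexterm></para>')
    print(indexterms_without_id(sample))  # [(2, '<indexterm>')]
```

Running it over each source file before a large print run tells you which terms to give IDs to.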
The index will be generated as a separate file, and you must arrange to have this file incorporated into your document. The easiest way to do this is with a file entity reference. At the top of your document, add an internal subset that declares the index file entity:
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!ENTITY genindex.sgm SYSTEM "genindex.sgm">
]>
<book>
...
&genindex.sgm;
<!-- Put this after the end tag of the last chapter or appendix, or -->
<!-- wherever you want the index to appear. It must be a valid location -->
<!-- for an index. -->
</book>
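Before you can process this document, you must make sure that genindex.sgm exists. This is a chicken-and-egg problem: the index is generated from the document, but the document cannot be parsed until the file the entity points to exists. You can solve it by creating an empty index with collateindex.pl: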
perl collateindex.pl -N -o genindex.sgm
The -N option creates a new (empty) index; the -o option identifies the name of the output file. This name must be the same as the name you specified in the internal subset.
Creating an index is a multi-step, two-pass process:
In order to create an index, you must first generate the raw index data. This is done with the HTML Stylesheet (even if you want print output).
1. Process your document with jade using the HTML Stylesheet and the -V html-index option:

   jade -t sgml -d html/docbook.dsl -V html-index yourdocument.sgm

   This will produce a file called HTML.index that contains raw index data.

   If you're planning to generate your final document as a single HTML file using the nochunks option, make sure you generate the HTML.index file with that option as well:

   jade -t sgml -d html/docbook.dsl -V html-index -V nochunks yourdocument.sgm

2. Generate an index document with collateindex.pl:

   perl collateindex.pl -o genindex.sgm HTML.index

   collateindex.pl has many other options; see its reference page for more information.

3. Process your original document again, using whichever stylesheet is appropriate. The new document will contain the generated index.
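The whole sequence is easy to script. Here is a minimal Python sketch. It assumes jade and perl are on your PATH and that collateindex.pl and html/docbook.dsl are reachable at the relative paths used above. The function names are hypothetical. It creates the empty index first only if genindex.sgm does not exist yet, and it adds -V nochunks to both HTML passes when you ask for it. For print output you would pass a different backend and stylesheet for the final pass (for example -t rtf with the print stylesheet) and leave nochunks off.

```python
import os
import shlex
import subprocess


def index_build_steps(document, final_backend="sgml",
                      final_stylesheet="html/docbook.dsl",
                      index_file="genindex.sgm", nochunks=False,
                      bootstrap=None):
    """Return the command lines for the two-pass indexing process."""
    if bootstrap is None:
        # Create an empty index only if the entity's target is missing.
        bootstrap = not os.path.exists(index_file)
    chunk_opts = ["-V", "nochunks"] if nochunks else []
    steps = []
    if bootstrap:
        steps.append(["perl", "collateindex.pl", "-N", "-o", index_file])
    # Pass 1: the HTML stylesheet writes raw index data to HTML.index.
    steps.append(["jade", "-t", "sgml", "-d", "html/docbook.dsl",
                  "-V", "html-index", *chunk_opts, document])
    # Collate the raw data into the file the document's entity points to.
    steps.append(["perl", "collateindex.pl", "-o", index_file, "HTML.index"])
    # Pass 2: process again with the stylesheet you actually want.
    steps.append(["jade", "-t", final_backend, "-d", final_stylesheet,
                  *chunk_opts, document])
    return steps


def run_steps(steps):
    """Run each command in order; check=True stops at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them.
    for cmd in index_build_steps("yourdocument.sgm", bootstrap=True):
        print(shlex.join(cmd))
```

Calling run_steps(index_build_steps("yourdocument.sgm")) would execute the steps in order. Building the command lists separately from running them makes it easy to check what will happen before anything touches your files.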
A generated index is better than none, but there are still a few things this process cannot do:
Duplicate page numbers are not suppressed in the index. If the document contains three indexing hits on page 4, the generated index will contain “4, 4, 4”.
Ranges are not automatically constructed. If the document contains indexing hits on pages 4, 5, 6, and 7, the generated index will contain “4, 5, 6, 7” instead of “4–7”.
It is possible that the TeX backend could be made smart enough to do these things automatically. (Sebastian will probably kill me for suggesting that.) For the RTF backend, at least in MS Word, it's probably possible to write a WordBasic macro that would fix the index automatically. (If someone does, please pass it along.)
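Both fixes are simple once the page numbers are known. However, the page numbers only exist once the backend formats the document. The following Python sketch therefore illustrates the kind of logic a smarter backend or a post-processing macro would need; it is not something collateindex.pl does, and collapse_pages is a hypothetical name. The sketch drops duplicate pages and merges consecutive ones. It assumes plain arabic page numbers (roman-numbered front matter is not handled), and it renders a two-page run such as 9, 10 as "9–10", which some house styles would rather list separately.

```python
def collapse_pages(pages, dash="\u2013"):
    """Turn a list of page numbers into an index locator string.

    Duplicates are dropped and runs of consecutive pages are merged,
    so [4, 4, 4] becomes "4" and [4, 5, 6, 7] becomes "4\u20137".
    """
    unique = sorted(set(pages))  # suppress duplicate hits, keep page order
    parts = []
    i = 0
    while i < len(unique):
        start = end = unique[i]
        # Extend the run while the next page follows directly.
        while i + 1 < len(unique) and unique[i + 1] == end + 1:
            i += 1
            end = unique[i]
        parts.append(str(start) if start == end else f"{start}{dash}{end}")
        i += 1
    return ", ".join(parts)


if __name__ == "__main__":
    print(collapse_pages([4, 4, 4]))         # 4
    print(collapse_pages([4, 5, 6, 7]))      # 4–7
    print(collapse_pages([9, 2, 3, 5, 10]))  # 2–3, 5, 9–10
```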