Tuesday, January 05, 2010

FAQ: Media-independent XML

Q: I need media-independent XML output (DocBook) from my InDesign documents. However, the XML file exported by BatchXSLT for InDesign contains class attributes for paragraph and character styles.

A: The exported XML is always media-independent. A class attribute is nothing more than information. It says: 'this text part was originally formatted with this style'. It does NOT style the text in the XML. However, this style information is important: it can be used to pick specific text parts out of the document and build the corresponding elements in the target format.

Example for DocBook: assume a <citation> element must be created:
<citation>
    <authorname type="first">Andreas</authorname>
    <authorname type="last">Imhof</authorname>
    <year>2010</year>
    <title>Amazing XML</title>
    <publicationname>My Blog</publicationname>
</citation>

The only way to identify and extract names or dates from an InDesign document is through styles. Each text part that is to be mapped to a DocBook element must be marked with its own style.
The XML exported from text marked this way could look like this:
<span class="AuthorFirstName">Andreas</span> <span class="AuthorName">Imhof</span>....
<div class="publicationYear">2010</div>...
<div class="articleTitle">Amazing XML</div>...
<div class="publicationName">My Blog</div>...

With XSLT it is now easy to select the correct text parts through their class attributes and to create the <citation> element.
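
The selection step can be sketched in a few lines. The example below uses Python's standard library instead of XSLT, purely to show the idea of mapping class attributes to target elements; it is not how BatchXSLT itself performs the transform. The <root> wrapper around the sample, the CLASS_TO_ELEMENT table and the build_citation function are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample of the exported XML from above. The <root> wrapper is an assumption,
# added only so that the fragment parses as a single document.
EXPORTED = """<root>
<span class="AuthorFirstName">Andreas</span> <span class="AuthorName">Imhof</span>
<div class="publicationYear">2010</div>
<div class="articleTitle">Amazing XML</div>
<div class="publicationName">My Blog</div>
</root>"""

# Hypothetical mapping: style class -> (target element name, attributes).
CLASS_TO_ELEMENT = {
    "AuthorFirstName": ("authorname", {"type": "first"}),
    "AuthorName": ("authorname", {"type": "last"}),
    "publicationYear": ("year", {}),
    "articleTitle": ("title", {}),
    "publicationName": ("publicationname", {}),
}


def build_citation(exported_xml):
    """Build a <citation> element from text parts selected by class attribute."""
    source = ET.fromstring(exported_xml)
    citation = ET.Element("citation")
    for class_name, (tag, attrs) in CLASS_TO_ELEMENT.items():
        # Select the first element carrying this style class, whatever its tag.
        node = source.find(f".//*[@class='{class_name}']")
        if node is None:
            continue  # this style was not used in the document
        child = ET.SubElement(citation, tag, attrs)
        child.text = (node.text or "").strip()
    return citation


if __name__ == "__main__":
    print(ET.tostring(build_citation(EXPORTED), encoding="unicode"))
```

In an XSLT stylesheet the same lookup would typically be a match or select on a predicate such as *[@class='articleTitle']; the mapping from style name to target element is the only part that is specific to the document.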

In other words: style names are the only thing contained in InDesign documents that lets us determine the meaning (not the formatting) of text parts.
