Weave

Part of Literate Programming in XML

Norman Walsh
05 Oct 2001

Revision 0.1, 05 Oct 2001, ndw: Initial draft.

The weave.xsl stylesheet transforms an XWEB document into a documentation document. It does this by weaving the documentation from the XWEB file together with a pretty-printed version of the source code. The resulting document is ready to be processed by whatever downstream publishing tools are appropriate.

The Stylesheet

The stylesheet performs some initialization, begins processing at the root of the XWEB document, and processes fragments and elements. It also requires some recursive templates, which are stored at the end of the stylesheet.

Initialization

The stylesheet initializes the processor by loading its version information (stored in a separate file because it is shared by several stylesheets), telling the processor to preserve whitespace on all input elements, setting the output method, and initializing the excluded result prefixes. The stylesheet also constructs a key for the ID values used on fragments. Because XWEB documents do not have to be valid according to any particular DTD or Schema, the stylesheet cannot rely on having the IDs identified as type ID in the source document.
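The initialization steps described above might look roughly like the following. This is a minimal sketch, not the author's listing: the namespace URI bound to src, the key name fragment, and the name of the shared version file are assumptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:src="http://nwalsh.com/xmlns/litprog/fragment">

  <!-- Shared version information would be included here. The file name
       is hypothetical, so the include is left commented out:
       <xsl:include href="VERSION.xsl"/> -->

  <!-- Keep whitespace-only text nodes on every input element. -->
  <xsl:preserve-space elements="*"/>

  <xsl:output method="xml"/>

  <!-- Index fragments by their id attribute. key('fragment', 'foo')
       then finds a fragment even when no DTD declares id as type ID. -->
  <xsl:key name="fragment" match="src:fragment" use="@id"/>

</xsl:stylesheet>
```

Using xsl:key rather than the id() function is what removes the dependency on a DTD: id() only works for attributes that the parser knows to be of type ID.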

Default Exclude Result Prefixes

Generally, the namespace declarations for namespaces used by the source code portion of the XWEB file are not needed in the woven documentation. To reduce the size of the documentation file, and to reduce the clutter of unnecessary declarations, you can specify prefixes that should be excluded. This is done as a parameter so that it can be adjusted dynamically, though it rarely needs to be. The initial value comes from the mundane-result-prefixes attribute on the XWEB file's top src:fragment.
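A parameter of this kind could be declared as in the sketch below. It assumes that the "top" fragment is simply the first src:fragment in document order, and that the src prefix is bound to the namespace URI shown; neither detail is confirmed by the text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:src="http://nwalsh.com/xmlns/litprog/fragment">

  <!-- Assumption: the top fragment is the first src:fragment in the
       document. The default is the string value of its attribute, or
       the empty string if the attribute is absent. -->
  <xsl:param name="mundane-result-prefixes"
             select="string((//src:fragment)[1]/@mundane-result-prefixes)"/>

</xsl:stylesheet>
```

Because it is a top-level xsl:param, the value can be overridden when the processor is invoked, which is what makes the setting adjustable without editing the XWEB file.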

Named Templates

Correctly copying elements requires the ability to calculate the applicable namespaces and to output the appropriate namespace pseudo-attributes and attributes. These templates accomplish those tasks.

Root Template

The root template begins processing at the root of the XWEB document. It outputs a couple of informative comments and then processes the document. Source code fragments in the XWEB document are not required to be sequential; we assume that they appear in the order in which they should be documented.

The first comment reads "This file was generated by weave.xsl version", followed by the version number and ". Do not edit!". The second reads "See http://sourceforge.net/projects/docbook/".

Fragments

The goal when copying the source code fragments is to preserve the src:fragment elements in the documentation file (so that they can be formatted appropriately) but to escape all of the fragment content so that it appears simply as text in the documentation. For example, if the following fragment appears in the XWEB file:

    <src:fragment id="foo">
    <emphasis>some code</emphasis>
    </src:fragment>

the documentation must contain:

    <src:fragment id="foo">
    &lt;emphasis&gt;some code&lt;/emphasis&gt;
    </src:fragment>

The significance of this escaping is less obvious when the fragment contains non-XML code, but it is in fact still relevant. This task is accomplished by constructing a literal src:fragment element and then copying the content of the source document's src:fragment element in a mode that escapes all markup characters.

Copying Content

The copy-content template could be as simple as:

    <xsl:apply-templates mode="copy"/>

but we play one more trick for the convenience of XWEB authors. It's convenient for authors to use newlines at the beginning and end of each program fragment, producing fragments that look like the one shown above. The problem is that white space is significant inside fragments, so the resulting documentation will contain a listing like this:

    1 |
    2 | <emphasis>some code</emphasis>
    3 |

The leading and trailing blank lines in this listing are distracting and almost certainly insignificant. Authors can avoid this problem by removing the offending newlines:

    <src:fragment id="foo"><emphasis>some code</emphasis></src:fragment>

but this makes the source document more difficult to read and introduces tedious cut-and-paste problems. To avoid this problem, the copy-content template takes special pains to trim off one optional leading newline and one optional trailing newline. It does this by dealing with the first, last, and middle nodes of the src:fragment elements separately:

Convenience Variables

For convenience, we store subexpressions containing the first, last, and all the middle nodes in variables.

Handle First Node

Handling the leading newline is conceptually a simple matter of looking at the first character of the first node and skipping it if it is a newline. A slight complexity is introduced by the fact that if the fragment contains only a single text node, the first node is also the last node and we may have to trim off a trailing newline as well. We separate that out as a special case.

Handle a Fragment that Contains a Single Node

If the $first-node is a text node and the fragment contains only a single child, then it is also the last node. In order to deal with a single text node child, we must address four cases: the node has both leading and trailing newlines, the node has only leading newlines, only trailing newlines, or no newlines at all.

More Convenience Variables

For convenience, we calculate whether or not the node in question has leading and/or trailing newlines and store those results in variables.

Handle a Single Node with Leading and Trailing Newlines

If the node has both leading and trailing newlines, trim a character off each end.

Handle a Single Node with Only Leading Newlines

If the node has only leading newlines, trim off the first character.

Handle a Single Node with Only Trailing Newlines

If the node has only trailing newlines, trim off the last character.

Handle a Single Node with No Newlines

Otherwise, the node has no newlines and it is simply printed.

Handle a First Node with a Leading Newline

If the first node is a text node and begins with a newline, trim off the first character.

Handle a First Node without a Leading Newline

Otherwise, the first node is not a text node or does not begin with a newline, so use the copy mode to copy it to the result tree.

Handle Last Node

Handling the last node is roughly analogous to handling the first node, except that we know this code is only evaluated if the last node is not also the first node. If the last node is a text node and ends with a newline, strip it off. Otherwise, just copy the content of the last node using the copy mode.

Handle the Middle Nodes

The middle nodes are easy: just copy them using the copy mode.
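Taken together, the first, middle, and last node handling described above could be sketched as follows. The variable names $first-node, $middle-nodes, and $last-node, the helper variable names, the namespace URI, and the bare-bones copy mode at the end are assumptions made for illustration; the real copy mode also handles namespaces and attributes.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:src="http://nwalsh.com/xmlns/litprog/fragment">

  <xsl:preserve-space elements="*"/>

  <!-- Keep the src:fragment wrapper; copy its content as escaped text. -->
  <xsl:template match="src:fragment">
    <src:fragment id="{@id}">
      <xsl:call-template name="copy-content"/>
    </src:fragment>
  </xsl:template>

  <xsl:template name="copy-content">
    <xsl:variable name="first-node" select="node()[1]"/>
    <xsl:variable name="middle-nodes"
                  select="node()[position() &gt; 1 and position() &lt; last()]"/>
    <xsl:variable name="last-node"
                  select="node()[position() &gt; 1 and position() = last()]"/>

    <xsl:choose>
      <!-- A single text node is both first and last: four cases. -->
      <xsl:when test="$first-node/self::text() and count(node()) = 1">
        <xsl:variable name="len" select="string-length($first-node)"/>
        <xsl:variable name="leading-nl"
                      select="starts-with($first-node, '&#10;')"/>
        <xsl:variable name="trailing-nl"
                      select="substring($first-node, $len) = '&#10;'"/>
        <xsl:choose>
          <xsl:when test="$leading-nl and $trailing-nl">
            <xsl:value-of select="substring($first-node, 2, $len - 2)"/>
          </xsl:when>
          <xsl:when test="$leading-nl">
            <xsl:value-of select="substring($first-node, 2)"/>
          </xsl:when>
          <xsl:when test="$trailing-nl">
            <xsl:value-of select="substring($first-node, 1, $len - 1)"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="$first-node"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <!-- First of several nodes: drop one leading newline. -->
      <xsl:when test="$first-node/self::text()
                      and starts-with($first-node, '&#10;')">
        <xsl:value-of select="substring($first-node, 2)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="$first-node" mode="copy"/>
      </xsl:otherwise>
    </xsl:choose>

    <xsl:apply-templates select="$middle-nodes" mode="copy"/>

    <!-- $last-node is empty when the fragment has fewer than two children. -->
    <xsl:choose>
      <xsl:when test="$last-node/self::text()
                      and substring($last-node, string-length($last-node)) = '&#10;'">
        <xsl:value-of select="substring($last-node, 1,
                                        string-length($last-node) - 1)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="$last-node" mode="copy"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Stand-in for the real copy mode: element tags become text. -->
  <xsl:template match="*" mode="copy">
    <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates mode="copy"/>
    <xsl:text>&lt;/</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&gt;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the three-line foo fragment shown earlier, the first and last children are newline-only text nodes, so both are trimmed to nothing and the listing is left with only the escaped emphasis element.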

Fragment References

Fragment references, like fragments, are simply copied to the documentation file. The use of disable-output-escaping is unique to this template (it instructs the tangle stylesheet to make a literal copy of the src:fragref, rather than expanding it, as it usually would).

Copying Elements

Copying elements to the result tree can be divided into three cases: copying passthrough elements, copying fragment references, and copying everything else.

Copying src:passthrough

Passthrough elements contain text that is intended to appear literally in the result tree. We simply copy it through.

Copying src:fragref

Because tangle and weave are XSLT stylesheets that process XSLT stylesheets, processing src:fragref poses a unique challenge. In ordinary tangle processing, fragment references are expanded and replaced with the content of the fragment that they point to. But when weave.xweb is tangled, they must be copied through literally. The disable-output-escaping attribute provides the hook that allows this. When we're weaving, if the disable-output-escaping attribute is yes, the src:fragref is treated literally. When it isn't, the element is copied through literally.

Copying Everything Else

There are two kinds of everything else: elements and other nodes. This element template is quite complex, but its goal is simple: to translate bona fide elements in the source document into text in the result document. In other words, where the element <foo> occurs in the source document, the result document should contain &lt;foo&gt;.

Three things make this tricky:

- Elements in the source documents may have namespace nodes associated with them that are not explicitly declared on them. To the best of our ability, we must avoid copying these namespace nodes to the result tree.
- Attributes must be copied and formatted in some reasonable way in order to avoid excessively long lines in the documentation.
- Empty elements must be printed using the appropriate empty-element syntax. (It is simply impossible to determine what syntax was used in the source document; the best we can do is always use the empty-element syntax in the result, as it is likely to be more common in the source.)

The plan of attack is:

1. Calculate what namespaces should be excluded (by prefix).
2. Calculate the applicable namespaces.
3. Output the leading < and the element name.
4. Output the applicable namespaces.
5. Output the attributes.
6. If the element is not empty, finish the start tag, copy the element contents, and output an end tag.
7. If the element is empty, finish the start tag with the empty-element syntax.

The preceding template handles elements. Everything else is simply copied.
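A stripped-down version of the element template, covering only steps 3 and 5 to 7 of the plan, might look like the sketch below. Namespace handling and attribute pretty-printing are deliberately left out, and attribute values are written as-is, so this is an illustration of the escaping idea rather than the author's template.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="*" mode="copy">
    <!-- The leading < and the element name, written as text. -->
    <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()"/>

    <!-- Attributes, all on one line in this simplified version. -->
    <xsl:for-each select="@*">
      <xsl:text> </xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:text>="</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>"</xsl:text>
    </xsl:for-each>

    <xsl:choose>
      <!-- Not empty: finish the start tag, copy content, add end tag. -->
      <xsl:when test="node()">
        <xsl:text>&gt;</xsl:text>
        <xsl:apply-templates mode="copy"/>
        <xsl:text>&lt;/</xsl:text>
        <xsl:value-of select="name()"/>
        <xsl:text>&gt;</xsl:text>
      </xsl:when>
      <!-- Empty: always use the empty-element syntax. -->
      <xsl:otherwise>
        <xsl:text>/&gt;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Because the tags are emitted with xsl:text, the serializer escapes the markup characters on output, which is what turns each source element into visible text in the woven document.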

Calculate Excluded Prefixes

Calculating the excluded prefixes requires evaluating the following conditions:

- If the element we are copying is inside a src:fragment element that specifies a set of mundane-result-prefixes, use those prefixes.
- Otherwise, use the $mundane-result-prefixes we calculated earlier.

Note that in every case we exclude the namespace associated with xml.

Output Attributes

The mechanics of outputting the applicable attributes are described below. The only wrinkle here is that if we have already output xmlns declarations for namespaces, the first real attribute is not really the first thing that looks like an attribute in the result.

Count Applicable Namespaces

The applicable namespaces are determined by walking recursively over the list of namespace nodes associated with an element. For each namespace node: if it is already declared on some ancestor, it does not have to be output again and is not counted; if it has a prefix that is in the list of excluded prefixes, or if it is the Literate Programming namespace, it is not counted either (because it will not be output); otherwise, it is counted. The recursion bottoms out when the list of namespace nodes has been exhausted. The total number of counted namespaces is then returned.

Matching Namespaces

Returns 1 if the specified namespace occurs on the context element or some ancestor of the context element, up to but not including the src:fragment element. Testing for this condition when copying applicable namespaces avoids duplicating namespace declarations repeatedly in a given fragment. By not including the src:fragment element (or any of its ancestors) in the search, we can make sure that each fragment will have a complete set of declarations.

Output Applicable Attributes and Pseudo-Attributes

Outputting the attributes (or namespace pseudo-attributes) is straightforward; the only tricky part is pretty-printing the resulting document. Pretty-printing has three cases:

- Before outputting the very first attribute or pseudo-attribute, we want to output only a single space, to separate the result from the preceding element name.
- Before outputting any additional attribute or pseudo-attribute, we want to output a line-feed and then indent the result appropriately. This prevents the attributes and pseudo-attributes from appearing as one huge, long line in the result.
- If the element has no attributes or pseudo-attributes, we don't want to output anything; we want the closing tag delimiter to appear immediately after the element name.

Output Applicable Namespaces

The applicable namespaces are determined by walking recursively over the list of namespace nodes associated with an element. For each namespace node, if it has a prefix that is in the list of excluded prefixes or if it is the Literate Programming namespace, it is not output; otherwise, it is. The recursion bottoms out when the list of namespace nodes has been exhausted.
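The recursive walk over namespace nodes can be sketched as below. The template and parameter names are hypothetical, the excluded prefixes are assumed to be a space-delimited string such as ' xml xsl ', and the Literate Programming namespace URI is an assumption. The ancestor check and the pretty-printing described elsewhere are omitted.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="output-applicable-namespaces">
    <xsl:param name="namespaces" select="namespace::*"/>
    <!-- Assumption: prefixes separated and surrounded by spaces. -->
    <xsl:param name="excluded" select="' xml '"/>

    <!-- The recursion bottoms out when the list is exhausted. -->
    <xsl:if test="$namespaces">
      <xsl:variable name="ns" select="$namespaces[1]"/>

      <xsl:if test="not(contains($excluded, concat(' ', name($ns), ' ')))
                    and string($ns) != 'http://nwalsh.com/xmlns/litprog/fragment'">
        <xsl:text> xmlns</xsl:text>
        <!-- The default namespace has an empty name and no colon. -->
        <xsl:if test="name($ns) != ''">
          <xsl:text>:</xsl:text>
          <xsl:value-of select="name($ns)"/>
        </xsl:if>
        <xsl:text>="</xsl:text>
        <xsl:value-of select="$ns"/>
        <xsl:text>"</xsl:text>
      </xsl:if>

      <xsl:call-template name="output-applicable-namespaces">
        <xsl:with-param name="namespaces"
                        select="$namespaces[position() &gt; 1]"/>
        <xsl:with-param name="excluded" select="$excluded"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Recursion over a shrinking node-set is the usual XSLT 1.0 substitute for a loop that needs to carry state, and the same shape serves for counting the namespaces instead of printing them.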

Indent Attribute

If this is not the first attribute or pseudo-attribute, output a newline and then indent an appropriate amount. Otherwise, simply output a space.

Indenting is accomplished by outputting a series of spaces. The number of spaces is determined by the length of the name of the current element plus two (one for the leading < and one for the space that separates the name from the first attribute).

Spaces is a recursive template that outputs a specified number of spaces.

Given a string, this template walks it recursively counting and returning the number of trailing spaces.

Output Applicable Attributes

This template walks recursively over the attributes associated with a node and outputs each one of them in turn. (All attributes are applicable.)
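The indentation helpers and the attribute walk described above fit together roughly as in this sketch. Only the name spaces comes from the text; the other template and parameter names are hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Output $count spaces, one per level of recursion. -->
  <xsl:template name="spaces">
    <xsl:param name="count" select="0"/>
    <xsl:if test="$count &gt; 0">
      <xsl:text> </xsl:text>
      <xsl:call-template name="spaces">
        <xsl:with-param name="count" select="$count - 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Return the number of trailing spaces in $string. -->
  <xsl:template name="trailing-space-count">
    <xsl:param name="string" select="''"/>
    <xsl:param name="count" select="0"/>
    <xsl:choose>
      <xsl:when test="substring($string, string-length($string)) = ' '">
        <xsl:call-template name="trailing-space-count">
          <xsl:with-param name="string"
                          select="substring($string, 1, string-length($string) - 1)"/>
          <xsl:with-param name="count" select="$count + 1"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$count"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- A single space before the first item; otherwise a newline and
       an indent of (element name length + 2) spaces. -->
  <xsl:template name="indent-attribute">
    <xsl:param name="first" select="true()"/>
    <xsl:param name="element-name" select="''"/>
    <xsl:choose>
      <xsl:when test="$first">
        <xsl:text> </xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>&#10;</xsl:text>
        <xsl:call-template name="spaces">
          <xsl:with-param name="count"
                          select="string-length($element-name) + 2"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Walk the attribute list recursively, printing each one as text. -->
  <xsl:template name="output-applicable-attributes">
    <xsl:param name="attributes" select="@*"/>
    <xsl:param name="first" select="true()"/>
    <xsl:if test="$attributes">
      <xsl:variable name="attr" select="$attributes[1]"/>
      <xsl:call-template name="indent-attribute">
        <xsl:with-param name="first" select="$first"/>
        <xsl:with-param name="element-name" select="name($attr/..)"/>
      </xsl:call-template>
      <xsl:value-of select="name($attr)"/>
      <xsl:text>="</xsl:text>
      <xsl:value-of select="$attr"/>
      <xsl:text>"</xsl:text>
      <xsl:call-template name="output-applicable-attributes">
        <xsl:with-param name="attributes"
                        select="$attributes[position() &gt; 1]"/>
        <xsl:with-param name="first" select="false()"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Exposing first as a parameter lets a caller pass false() when xmlns pseudo-attributes have already been written, so that the first real attribute starts on a new, indented line.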

Other Content

The remaining elements, processing instructions, and comments are part of the documentation and must simply be copied to the result:

Elements

The default template handles copying elements. It is a five-step process:

1. Save a copy of the context node in $node so that we can refer to it later from inside an xsl:for-each.
2. Construct a new node in the result tree with the same qualified name and namespace as the context node.
3. Copy the namespace nodes on the context node to the new node in the result tree. We must do this manually because the XWEB file may have broken the content of this element into several separate fragments. Breaking things into separate fragments makes it impossible for the XSLT processor to always construct the right namespace nodes automatically.
4. Copy the attributes.
5. Copy the children.

Copy Namespaces

Copying the namespaces is a simple loop over the nodes on the namespace axis, with one wrinkle. It is an error to copy a namespace node onto an element if a namespace node is already present for that namespace. Because we run this loop in a context where we have constructed the result node explicitly in the correct namespace, attempting to copy that namespace node again will produce an error. We work around this problem by explicitly testing for that namespace and not copying it.
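The five steps, including the namespace test just described, could be written along these lines. This is a sketch under the assumption that the processor accepts xsl:copy on a namespace node, which the approach in the text relies on.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="*">
    <!-- Step 1: remember the element; "." changes inside xsl:for-each. -->
    <xsl:variable name="node" select="."/>

    <!-- Step 2: same qualified name and namespace as the source. -->
    <xsl:element name="{name(.)}" namespace="{namespace-uri(.)}">

      <!-- Step 3: copy namespace nodes, skipping the element's own
           namespace, which xsl:element has already established. -->
      <xsl:for-each select="namespace::*">
        <xsl:if test="string(.) != namespace-uri($node)">
          <xsl:copy/>
        </xsl:if>
      </xsl:for-each>

      <!-- Step 4: attributes. Step 5: children. -->
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

The namespace nodes are copied before the attributes and children because additions to an element's namespaces must precede any other content written to it.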

Processing Instructions

Processing instructions are simply copied through.

Comments

Comments are simply copied through. Note, however, that many processors do not preserve comments in the source document, so this template may never be matched.

Weaving DocBook

It's no secret (and probably no surprise) that I use DocBook for most of my document authoring. Web files are no exception, and I have a DocBook customization layer that validates woven XWEB documentation files. In order to validate my woven documentation, I need to make sure that the appropriate document type declaration is associated with the documents. This is a simple change to the xsl:output instruction. This stylesheet turns source fragments and fragment references into DocBook elements and removes namespace bindings. Note that xsl:element is used explicitly instead of xsl:copy so that namespace bindings aren't copied.
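The two changes mentioned here, a document type declaration on xsl:output and the use of xsl:element, might be sketched as follows. The identifiers are those of stock DocBook XML V4.1.2, standing in for the customization layer, whose identifiers the text does not give. The choice of programlisting as the target element and the src namespace URI are likewise assumptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:src="http://nwalsh.com/xmlns/litprog/fragment">

  <!-- Assumption: stock DocBook identifiers, not the author's own. -->
  <xsl:output method="xml"
              doctype-public="-//OASIS//DTD DocBook XML V4.1.2//EN"
              doctype-system="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>

  <!-- Assumption: programlisting is a hypothetical target element.
       xsl:element creates it without copying the stylesheet's
       namespace bindings, unlike a literal result element. -->
  <xsl:template match="src:fragment">
    <xsl:element name="programlisting">
      <xsl:if test="@id">
        <xsl:attribute name="id">
          <xsl:value-of select="@id"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates mode="copy"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

No template is defined for the copy mode here, so the built-in rules apply and only text content reaches the listing; a full stylesheet would import the copy-mode templates described earlier.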
