Tangle

Part of Literate Programming in XML

Norman Walsh, 05 Oct 2001

Revision 0.1, 05 Oct 2001 (ndw): Initial draft.

The tangle.xsl stylesheet transforms an &xweb; document into a source code document. This is a relatively straightforward process. Starting with the top fragment, the source fragments are stitched together by expanding fragment references, and any intervening documentation is discarded. The resulting tangled document is ready for use by the appropriate processor.

The Stylesheet

This &xweb; document contains the source for two stylesheets, tangle.xsl and xtangle.xsl. Both stylesheets produce tangled sources; the latter is a simple customization of the former for producing XML vocabularies. Each stylesheet performs some initialization, sets its output method, begins processing at the root template, and processes fragments by copying their content to the result tree.

The <filename>tangle.xsl</filename> Stylesheet

The tangle stylesheet produces text output.

The <filename>xtangle.xsl</filename> Stylesheet

The xtangle stylesheet produces XML output.

Initialization

The stylesheet initializes the processor by loading its version information and telling the processor to preserve whitespace on all input elements. The version information is stored in a separate file because several stylesheets share it. The stylesheet also constructs a key for the ID values used on fragments. Because &xweb; documents do not have to be valid according to any particular DTD or Schema, the stylesheet cannot rely on those IDs being declared as type ID in the source document. As a result, the XPath id() function cannot be used to find them, and a key is used instead.
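The stylesheet does this with xsl:key. For illustration only, here is a Python sketch of an ID index that does not depend on a DTD. It makes two assumptions: the src: prefix is bound to http://nwalsh.com/xmlns/litprog/fragment, and fragments carry their name in an id attribute. Adjust both to match your documents.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: the namespace URI bound to the src: prefix in xweb documents.
SRC_NS = "http://nwalsh.com/xmlns/litprog/fragment"
FRAGMENT = f"{{{SRC_NS}}}fragment"  # ElementTree's {uri}localname form


def index_fragments(root):
    """Map each id value to every src:fragment element that carries it.

    Like xsl:key, this only looks at attribute values, so it works even
    when no DTD or schema declares the attribute to be of type ID.
    Duplicates stay in the list so that callers can detect them.
    """
    index = defaultdict(list)
    for frag in root.iter(FRAGMENT):
        frag_id = frag.get("id")  # assumption: fragments are named by "id"
        if frag_id is not None:
            index[frag_id].append(frag)
    return dict(index)


doc = ET.fromstring(
    f'<article xmlns:src="{SRC_NS}">'
    '<src:fragment id="top">main</src:fragment>'
    "<para>Documentation is not indexed.</para>"
    '<src:fragment id="helper">helper</src:fragment>'
    "</article>"
)
fragments = index_fragments(doc)
print(sorted(fragments))
```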

The Root Template

The root template begins processing at the root of the &xweb; document. It outputs a couple of informative comments and then directs the processor to transform the src:fragment element whose ID is $top. Source code fragments in the &xweb; document do not have to appear in the order in which they are output, so one fragment must be designated as the starting point.
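The following Python sketch models the tangle walk. It is not the stylesheet's actual XSLT. The Ref class and the dictionary of fragments are hypothetical stand-ins for src:fragref elements and the fragment key. The sketch shows why a designated starting fragment matters: document order is irrelevant, and fragments never reached from the top are not emitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a src:fragref pointing at a fragment id."""
    linkend: str


Part = Union[str, Ref]


def tangle(fragments: Dict[str, List[Part]], top: str = "top") -> str:
    """Expand fragment references recursively, starting from `top`."""
    out: List[str] = []

    def expand(frag_id: str) -> None:
        for part in fragments[frag_id]:
            if isinstance(part, Ref):
                expand(part.linkend)  # replace the reference by its fragment
            else:
                out.append(part)      # literal source text

    expand(top)
    return "".join(out)


# "helper" appears before "top", as it might in an xweb document.
fragments = {
    "helper": ["def helper():\n    return 42\n"],
    "top": [Ref("helper"), "print(helper())\n"],
    "unused": ["never emitted\n"],
}
print(tangle(fragments))
```

The sketch does no cycle detection. A fragment that refers to itself, directly or indirectly, would recurse until Python's recursion limit is reached.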

Processing Fragments

In order to tangle an &xweb; document, we need only copy the contents of the fragments to the result tree. Processing src:fragment elements is conceptually easy: simply copy their children. However, if we simply used <xsl:apply-templates mode="copy"/>, we would also copy the newlines that the author might have added at the beginning and end of a fragment for editing convenience. In environments where whitespace is significant (e.g., Python), this would introduce errors. We must therefore avoid copying the leading newline of the first node and the trailing newline of the last node.
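To make the problem concrete, here is a small Python illustration that is not part of the stylesheet. It assumes a hypothetical fragment referenced in the middle of a line. Copying the fragment's children verbatim leaves the assignment broken, and dropping the outer newlines repairs it.

```python
# Hypothetical fragment: the author put its content on a line of its own.
value_fragment = "\n42\n"

# The referencing fragment uses it inline, as in:  x = <fragref/>
naive_src = "x = " + value_fragment + "\n"          # children copied as-is
desired_src = "x = " + value_fragment[1:-1] + "\n"  # outer newlines dropped


def compiles(src: str) -> bool:
    """Report whether `src` is syntactically valid Python."""
    try:
        compile(src, "<tangled>", "exec")
        return True
    except SyntaxError:
        return False


print(repr(naive_src), compiles(naive_src))
print(repr(desired_src), compiles(desired_src))
```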

Convenience Variables

For convenience, we store the first node, the last node, and all of the middle nodes of the fragment in variables.
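Here is a minimal Python sketch of the three convenience variables. It models a fragment's children as a list, which is a hypothetical model and not the stylesheet's node-sets. Raising an error for an empty fragment is a choice made for this sketch only.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_nodes(children: Sequence[T]) -> Tuple[T, List[T], T]:
    """Return (first, middle, last) for a non-empty list of children.

    For a single child, first and last are the same node and the middle
    is empty; for two children, the middle is also empty.
    """
    if not children:
        raise ValueError("this sketch expects at least one child")
    return children[0], list(children[1:-1]), children[-1]


# A tuple stands in for a non-text child such as a fragment reference.
first, middle, last = split_nodes(["\nimport os\n", ("fragref", "body"), "\n"])
```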

Handle First Node

Handling the leading newline is conceptually simple: look at the first character of the first node and skip it if it is a newline. There is one slight complication. If the fragment contains only a single text node, the first node is also the last node, so we may have to trim off a trailing newline as well. We handle that as a special case.

Handle a Fragment that Contains a Single Node

If $first-node is a text node and the fragment contains only a single child, then it is also the last node. To handle a single text node child, we must address four cases:

- the node has both a leading and a trailing newline;
- it has only a leading newline;
- it has only a trailing newline;
- it has neither.

More Convenience Variables

For convenience, we calculate whether the node begins with a newline and whether it ends with one, and store those results in variables.

Handle a Single Node with Leading and Trailing Newlines

If the node both begins and ends with a newline, trim one character off each end.

Handle a Single Node with Only Leading Newlines

If the node begins with a newline but does not end with one, trim off the first character.

Handle a Single Node with Only Trailing Newlines

If the node ends with a newline but does not begin with one, trim off the last character.

Handle a Single Node with No Newlines

Otherwise, the node neither begins nor ends with a newline, and it is output unchanged.
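The four cases translate directly into a small Python function. This sketch of the logic assumes the fragment's single child is available as a string. As in the prose, at most one newline is removed from each end.

```python
def trim_single(text: str) -> str:
    """Trim a fragment whose only child is the text node `text`."""
    starts_nl = text.startswith("\n")  # the convenience "variables"
    ends_nl = text.endswith("\n")

    if starts_nl and ends_nl:  # leading and trailing newline
        return text[1:-1]
    if starts_nl:              # only a leading newline
        return text[1:]
    if ends_nl:                # only a trailing newline
        return text[:-1]
    return text                # neither: output unchanged
```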

Handle a First Node with a Leading Newline

If the first node is a text node and begins with a newline, trim off the first character.

Handle a First Node without a Leading Newline

Otherwise, the first node either is not a text node or does not begin with a newline, so use the copy mode to copy it to the result tree.
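Here is the first-node rule in a hypothetical Python model. Text nodes are modeled as `str`, and any other value stands in for an element that the copy mode would copy.

```python
from typing import Any


def handle_first(node: Any) -> Any:
    """Process the first child of a fragment that has more than one child."""
    if isinstance(node, str) and node.startswith("\n"):
        return node[1:]  # drop only the leading newline
    return node          # not text, or no leading newline: copy unchanged
```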

Handle Last Node

Handling the last node is roughly analogous to handling the first node, except that we know this code is evaluated only when the last node is not also the first node. If the last node is a text node and ends with a newline, strip that newline off. Otherwise, just copy the last node using the copy mode.
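Here is the mirror-image Python sketch for the last node. It uses the same hypothetical model: text nodes as `str`, and any other value as an element.

```python
from typing import Any


def handle_last(node: Any) -> Any:
    """Process the last child of a fragment that has more than one child."""
    if isinstance(node, str) and node.endswith("\n"):
        return node[:-1]  # drop only the trailing newline
    return node           # not text, or no trailing newline: copy unchanged
```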

Handle the Middle Nodes

The middle nodes are easy: just copy them using the copy mode.
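The following self-contained Python sketch puts the pieces together to trim a whole fragment. It covers the single-node special case, the first and last nodes, and the untouched middle nodes. It uses a hypothetical model (text nodes as `str`, other values as elements) and illustrates the logic rather than translating the stylesheet.

```python
from typing import Any, List, Sequence


def trim_fragment(children: Sequence[Any]) -> List[Any]:
    """Return a fragment's children with the outer newlines removed."""
    if not children:
        return []
    first, middle, last = children[0], list(children[1:-1]), children[-1]

    if len(children) == 1:
        # Single node: it is both first and last, so trim both ends.
        if isinstance(first, str):
            if first.startswith("\n"):
                first = first[1:]
            if first.endswith("\n"):
                first = first[:-1]
        return [first]

    if isinstance(first, str) and first.startswith("\n"):
        first = first[1:]   # leading newline of the first node
    if isinstance(last, str) and last.endswith("\n"):
        last = last[:-1]    # trailing newline of the last node
    return [first, *middle, last]  # middle nodes copied unchanged


ref = ("fragref", "body")  # hypothetical stand-in for a src:fragref element
print(trim_fragment(["\ndef main():\n", ref, "\n    return 0\n"]))
```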

Copying Elements

Copying elements to the result tree can be divided into three cases: copying passthrough elements, copying fragment references, and copying everything else.

Copying <sgmltag>src:passthrough</sgmltag>

Passthrough elements contain text that is intended to appear literally in the result tree. We use XSLT's disable-output-escaping to copy it without interpretation.
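disable-output-escaping only makes a difference when the output method is XML, as in xtangle.xsl. The text output method used by tangle.xsl never escapes characters. The Python sketch below models that difference with hypothetical (kind, text) pairs and the standard `xml.sax.saxutils.escape` function.

```python
from xml.sax.saxutils import escape


def serialize(parts):
    """Serialize (kind, text) pairs the way an XML output method would.

    "text" parts are escaped as ordinary character data; "passthrough"
    parts are written literally, which is what disable-output-escaping does.
    """
    out = []
    for kind, text in parts:
        out.append(text if kind == "passthrough" else escape(text))
    return "".join(out)


result = serialize([
    ("text", "if a < b && c: "),
    ("passthrough", "<![CDATA[ raw ]]>"),
])
print(result)
```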

Copying <sgmltag>src:fragref</sgmltag>

With one exception, copying fragment references is straightforward: find the fragment identified by the cross-reference and process it. The exception arises only when processing src:fragref elements in the weave.xweb document, where a single template in the weave program needs to copy a literal src:fragref element to the result tree. That is the only time this branch is executed.

Copying Normal Fragment References

To copy a normal fragment reference, identify what the linkend attribute points to, make sure it is valid, and process it.
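The Python sketch below folds the two validity checks that follow into one resolution step. It makes three assumptions: the src: namespace URI shown, IDs stored in an id attribute, and a hypothetical `TangleError` exception. The messages follow the stylesheet's wording. ElementTree tags carry the namespace URI, so the type check also compares the namespace name, which is what the FIXME in this section asks for.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI bound to the src: prefix.
SRC_NS = "http://nwalsh.com/xmlns/litprog/fragment"


class TangleError(Exception):
    """Hypothetical error type for broken fragment references."""


def resolve(linkend, root):
    """Return the unique src:fragment whose id equals `linkend`."""
    matches = [el for el in root.iter() if el.get("id") == linkend]
    if len(matches) != 1:  # no match, or more than one
        raise TangleError(
            f'Link to fragment "{linkend}" does not uniquely identify '
            "a single fragment."
        )
    fragment = matches[0]
    if fragment.tag != f"{{{SRC_NS}}}fragment":  # name and namespace
        raise TangleError(f'Link "{linkend}" does not point to a src:fragment.')
    return fragment


doc = ET.fromstring(
    f'<article xmlns:src="{SRC_NS}">'
    '<src:fragment id="top">x</src:fragment>'
    '<para id="intro">text</para>'
    '<src:fragment id="dup"/><src:fragment id="dup"/>'
    "</article>"
)


def check(linkend):
    """Return None if `linkend` resolves cleanly, else the error message."""
    try:
        resolve(linkend, doc)
        return None
    except TangleError as err:
        return str(err)


for name in ("top", "dup", "missing", "intro"):
    print(name, check(name))
```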

Fragment is Unique

Make sure that the linkend attribute points to exactly one node in the source tree. It is an error if no element exists with that ID value or if more than one exists. In that case the stylesheet reports:

Link to fragment "ID" does not uniquely identify a single fragment.

where ID is the value of the linkend attribute.

Fragment is a <sgmltag>src:fragment</sgmltag>

Make sure that the linkend attribute points to a src:fragment element. FIXME: this code should test the namespace name of the $fragment. If the target is not a src:fragment, the stylesheet reports:

Link "ID" does not point to a src:fragment.

where ID is the value of the linkend attribute.

Copying Disable-Output-Escaping Fragment References

A src:fragref that specifies disable-output-escaping is treated essentially as if it were any other element. The only difference is that the disable-output-escaping attribute is not copied. Because tangle and weave are XSLT stylesheets that process XSLT stylesheets, processing src:fragref poses a unique challenge. In ordinary tangle processing, src:fragref elements are expanded and replaced with the content of the fragment they point to. But when weave.xweb is tangled, they must be copied through literally. The disable-output-escaping attribute provides the hook that allows this.
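Here is a minimal Python sketch of the literal copy, using ElementTree and an assumed src: namespace URI. The element is duplicated with its name, remaining attributes, and children, and only the disable-output-escaping attribute is dropped.

```python
import copy
import xml.etree.ElementTree as ET

SRC_NS = "http://nwalsh.com/xmlns/litprog/fragment"  # assumed URI


def copy_literal_fragref(fragref):
    """Copy a src:fragref element literally, minus disable-output-escaping."""
    literal = copy.deepcopy(fragref)  # name, attributes and children
    literal.attrib.pop("disable-output-escaping", None)  # drop only the hook
    return literal


ref = ET.fromstring(
    f'<src:fragref xmlns:src="{SRC_NS}" linkend="body" '
    'disable-output-escaping="yes"/>'
)
literal = copy_literal_fragref(ref)
print(literal.attrib)
```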

Copying Everything Else

Everything else is copied verbatim. This is a five-step process:

1. Save a copy of the context node in $node so that we can refer to it later from inside an xsl:for-each.
2. Construct a new node in the result tree with the same qualified name and namespace as the context node.
3. Copy the namespace nodes on the context node to the new node in the result tree. We must do this manually because the &xweb; file may have broken the content of this element into several separate fragments. That makes it impossible for the XSLT processor to always construct the right namespace nodes automatically.
4. Copy the attributes.
5. Copy the children.
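As a rough Python analogy, an ElementTree copy covers steps 2, 4, and 5: same name, copied attributes, and recursively copied children. Step 1 is unnecessary because the function keeps its own reference to the node. ElementTree stores names as {uri}localname and writes namespace declarations itself when serializing, so step 3 has no direct counterpart in this sketch.

```python
import xml.etree.ElementTree as ET


def copy_element(node):
    """Rebuild `node` in a fresh tree with the same name, attributes, children."""
    new = ET.Element(node.tag, dict(node.attrib))  # same name, copied attributes
    new.text = node.text
    for child in node:  # copy the children, recursively
        child_copy = copy_element(child)
        child_copy.tail = child.tail  # text that follows the child
        new.append(child_copy)
    return new


src = ET.fromstring('<rule id="r1"><pattern>a+</pattern> tail text</rule>')
dup = copy_element(src)
print(ET.tostring(dup, encoding="unicode"))
```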

For non-XML source documents, this template will never match because there will be no XML elements in the source fragments.

Copy Namespaces

Copying the namespaces is a simple loop over the nodes on the namespace axis, with one wrinkle. It is an error to copy a namespace node onto an element if a namespace node is already present for that namespace. The result element was constructed explicitly in the correct namespace, so it already carries that namespace node, and attempting to copy it again would produce an error. We work around this problem by explicitly testing for that namespace and not copying it.
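Here is a Python sketch of the filter. It models the namespace axis as a hypothetical dictionary from prefixes to URIs and assumes the test compares namespace URIs. The stylesheet's exact test is not shown here.

```python
def namespaces_to_copy(in_scope, element_ns):
    """Select the namespace nodes to copy onto the rebuilt element.

    `in_scope` maps prefixes to URIs; `element_ns` is the namespace URI
    the new element was created in. That namespace is already present on
    the new element, so it is skipped instead of being copied twice.
    """
    return {prefix: uri for prefix, uri in in_scope.items() if uri != element_ns}


XSL = "http://www.w3.org/1999/XSL/Transform"
SRC = "http://nwalsh.com/xmlns/litprog/fragment"  # assumed src: URI
in_scope = {"xsl": XSL, "src": SRC}
print(namespaces_to_copy(in_scope, XSL))
```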

Copy XML Constructs

In the xtangle.xsl stylesheet, we also want to preserve XML constructs (processing instructions and comments) that we encounter in the fragments. Note that many XSLT implementations do not make comments in the source document available, because they are discarded before the tree is built. In that case the comments cannot be preserved.
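Python's ElementTree (3.8 and later) offers an analogy to the behavior described above, though it says nothing about any particular XSLT processor. By default, its tree builder discards comments and processing instructions. They survive only if the builder is asked to keep them.

```python
import xml.etree.ElementTree as ET

SOURCE = "<fragment><!-- keep me --><?target data?>text</fragment>"

# Default parsing drops comments and processing instructions.
default_tree = ET.fromstring(SOURCE)

# Asking the tree builder to keep them makes them available for copying.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True, insert_pis=True))
parser.feed(SOURCE)
kept_tree = parser.close()

default_out = ET.tostring(default_tree, encoding="unicode")
kept_out = ET.tostring(kept_tree, encoding="unicode")
print(default_out)
print(kept_out)
```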