Functional XML

1 Acknowledgements

The work reported here was initiated by a discussion with Tim Berners-Lee, who also first used the phrase "functional XML" in my hearing. The basic direction was first suggested by Richard Tobin.

2 XML processing

XML processing is a heavily overloaded term, appealing as it does to a wide range of possible understandings of the 'meaning' of an XML document. At the base level, the XML Recommendation assigns a meaning to character streams associated with one of the XML family of media types in terms of a tree-structured document abstraction, whose detailed specification is given by the Infoset Recommendation. Applications of XML, i.e. particular XML vocabularies with an associated semantics, may in turn specify a further layer of meaning in terms of a mapping to/from some abstract data model. Examples of this include W3C XML Schema (schema components), SVG (graphical objects) and RDF (triples).

Existing XML pipeline languages include:

Sun's original W3C Pipeline Note
Markup Technology's MT Pipeline
Sean McGrath's XPipe
Norm Walsh's SXPipe
Orbeon's XPL
1060 Research's NetKernel

These existing pipeline languages have a common core, in which XML processing is defined by a pipeline, which is itself an XML document. A pipeline specifies a sequence of high-level operations, drawn from an inventory of operations such as inclusion, validation and transformation, to be chained together, one after another, each operating on the output of the one before. Some pipeline systems also provide operations at a lower level, allowing manipulation of parts of documents. Another common feature is provision for conditional processing. Here's an example of a simple pipeline specifying a sequence of inclusion, validation and styling:

<?xml version="1.0" encoding="utf-8"?>
<p0:pipeline xmlns:p0="http://www.w3.org/2002/02/xml-pipeline">
 <p0:processdef name="transform" definition="MT_XSLT_1.0"/>
 <p0:processdef name="include" definition="MT_XInclude"/>
 <p0:processdef name="validate" definition="MT_W3C_XML_Schema_1.0"/>
 <p0:process type="include">
  <p0:input label="$IN"/>
  <p0:output label="#i2.1"/>
 </p0:process>
 <p0:process type="validate">
  <p0:input label="#i2.1"/>
  <p0:input name="schema" label="po.xsd"/>
  <p0:output label="#i4.2"/>
 </p0:process>
 <p0:process type="transform">
  <p0:input label="#i4.2"/>
  <p0:input name="stylesheet" label="po.xsl"/>
  <p0:output label="$OUT"/>
 </p0:process>
</p0:pipeline>
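The chaining in the pipeline above is carried entirely by the labels: each process names its primary input and its output, and the order of execution follows from matching one process's output label to the next one's input label. The following is a minimal sketch of that reading, not the behaviour of any particular pipeline implementation. It assumes that the primary input of a process is the p0:input with no name attribute, and the function name execution_order is purely illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"p0": "http://www.w3.org/2002/02/xml-pipeline"}

# The process elements of the example pipeline (processdef elements omitted).
PIPELINE = """\
<p0:pipeline xmlns:p0="http://www.w3.org/2002/02/xml-pipeline">
 <p0:process type="include">
  <p0:input label="$IN"/>
  <p0:output label="#i2.1"/>
 </p0:process>
 <p0:process type="validate">
  <p0:input label="#i2.1"/>
  <p0:input name="schema" label="po.xsd"/>
  <p0:output label="#i4.2"/>
 </p0:process>
 <p0:process type="transform">
  <p0:input label="#i4.2"/>
  <p0:input name="stylesheet" label="po.xsl"/>
  <p0:output label="$OUT"/>
 </p0:process>
</p0:pipeline>
"""


def execution_order(pipeline_xml):
    """Follow labels from $IN to $OUT and return the process types in order."""
    root = ET.fromstring(pipeline_xml)
    by_input = {}
    for proc in root.findall("p0:process", NS):
        # Assumption: the primary input is the p0:input without a name attribute.
        primary = next(
            i for i in proc.findall("p0:input", NS) if i.get("name") is None
        )
        by_input[primary.get("label")] = proc
    order, label = [], "$IN"
    while label != "$OUT":
        proc = by_input[label]
        order.append(proc.get("type"))
        label = proc.find("p0:output", NS).get("label")
    return order


print(execution_order(PIPELINE))  # ['include', 'validate', 'transform']
```

Note that the order is recovered from the labels rather than from document order, which is what distinguishes this style from the nesting-based approach discussed below.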

3 An alternative, functional, perspective on XML processing

An alternative approach to XML processing is already in place in a somewhat fragmented and inconsistent way. Consider the following signals which may be present in an XML document:

the xsi:schemaLocation attribute on a document element
the xml-stylesheet processing instruction, or the xsl:version attribute on a (non-XSLT) document element
the http://www.w3.org/2001/04/xmlenc# namespace

Each of these has a W3C Recommendation-based processing semantics -- a document with one of these signals can be understood as saying, respectively:

Validate me.
Transform me.
Decrypt me.

More recently, GRDDL provides a way for a document to indicate, using a data-view:interpreter attribute, a transformation which will produce RDF statements. The presence of this attribute thus can be understood as saying "Understand me."

These signals are neither systematic nor universal. The goal of f(X) is to allow XML documents to indicate their own preferred processing in a systematic and fully general way.
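To make the unsystematic nature of these signals concrete, here is a rough sketch of what a processor has to do today to discover them: three unrelated checks, one per specification. The function name implicit_signals is hypothetical, and the decryption check is simplified to testing whether the document element itself is in the XML Encryption namespace, which is an assumption of this sketch rather than anything the Recommendations require.

```python
from xml.dom import minidom

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSL = "http://www.w3.org/1999/XSL/Transform"
XMLENC = "http://www.w3.org/2001/04/xmlenc#"


def implicit_signals(xml_text):
    """Return the processing requests a document implicitly makes."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    signals = []

    # "Validate me": xsi:schemaLocation on the document element.
    if root.hasAttributeNS(XSI, "schemaLocation"):
        signals.append("validate")

    # "Transform me": an xml-stylesheet PI in the prolog, or xsl:version
    # on a document element which is not itself an XSLT element.
    has_pi = any(
        node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
        for node in doc.childNodes
    )
    if has_pi or (root.namespaceURI != XSL and root.hasAttributeNS(XSL, "version")):
        signals.append("transform")

    # "Decrypt me": simplified here to the document element's namespace.
    if root.namespaceURI == XMLENC:
        signals.append("decrypt")

    return signals


print(implicit_signals('<EncryptedData xmlns="http://www.w3.org/2001/04/xmlenc#"/>'))
```

Each check looks in a different place (an attribute, a processing instruction, a namespace name), and nothing in the document says in what order the requests should be honoured when more than one is present.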

4 The f(X) approach

As noted above, the first-level semantics of an XML document is its own XML infoset. f(X) allows for the creation of XML documents which signal a second-level semantics for themselves in terms of one or more infoset-to-infoset mappings. It does this by specifying a compositional infoset-mapping interpretation for elements in the f(X) namespace, covering all the specifications mentioned above.

The names for the mappings covered by f(X) are chosen to describe the output of that mapping, since that is what such elements are understood to designate. In the simplest cases, their input is the infoset designated in turn by their single child element. Taking schema validation and decryption as our starting point, we get the following examples:

Example: Simple f(X) example: decryption

<?xml version='1.0'?>
<fx:decrypted xmlns:fx="http://www.w3.org/2005/05/fx">
 <EncryptedData xmlns='http://www.w3.org/2001/04/xmlenc#'
                MimeType='text/xml'>
  <CipherData>
   <CipherValue>A23B45C56 . . .</CipherValue>
  </CipherData>
 </EncryptedData>
</fx:decrypted>

Designates the infoset resulting from decrypting the ciphertext and parsing the resulting stream as XML.

Example: Simple f(X) example: validation

<?xml version='1.0'?>
<fx:PSVI xmlns:fx="http://www.w3.org/2005/05/fx">
 <purchaseOrder xmlns="http://www.example.com/PurchaseOrder" xmlns:ad="http://www.example.com/Address" orderDate="1999-10-20" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.com/PurchaseOrder po.xsd">
 <shipTo>
  <ad:name>Alice Smith</ad:name>
  <ad:street>123 Maple Street</ad:street>
  <ad:city>Mill Valley</ad:city>
  <ad:state>CA</ad:state>
  <ad:zip>90952</ad:zip>
 </shipTo>
 <billTo>
  <ad:name>Bill Gates</ad:name>
  <ad:street>123 Rich Guy Street</ad:street>
  <ad:city>Redmond</ad:city>
  <ad:state>WA</ad:state>
  <ad:zip>99999</ad:zip>
 </billTo>
 <comment>Hurry, my lawn is going wild!</comment>
 <items>
  <item partNum="872-AA">
   <productName>Lawnmower</productName>
   <quantity>1</quantity>
   <price>148.95</price>
   <comment>Confirm this is electric</comment>
  </item>
  <item partNum="926-AA">
   <productName>Baby Monitor</productName>
   <quantity>1</quantity>
   <price>39.98</price>
   <shipDate>1999-05-21</shipDate>
  </item>
 </items>
</purchaseOrder>
</fx:PSVI>

Designates the post-schema-validation infoset resulting from schema validity assessment of the basic infoset corresponding to the purchaseOrder element.

The simplicity and power of this approach, and the way in which it most clearly moves beyond the existing ad hoc signalling mechanisms mentioned above, become apparent once we actually compose multiple f(X) elements in a single document:

Example: Simple composition with f(X) 1

<?xml version='1.0'?>
<fx:PSVI xmlns:fx="http://www.w3.org/2005/05/fx">
 <fx:decrypted>
  <EncryptedData xmlns='http://www.w3.org/2001/04/xmlenc#'
                 MimeType='text/xml'>
   <CipherData>
    <CipherValue>A23B45C56 . . .</CipherValue>
   </CipherData>
  </EncryptedData>
 </fx:decrypted>
</fx:PSVI>

PSVI of decrypted document

But with respect to validation and decryption, the other order makes sense too:

Example: Simple composition with f(X) 2

<?xml version='1.0'?>
<fx:decrypted xmlns:fx="http://www.w3.org/2005/05/fx">
 <fx:PSVI>
  <EncryptedData xmlns='http://www.w3.org/2001/04/xmlenc#'
                 MimeType='text/xml'>
   <CipherData>
    <CipherValue>A23B45C56 . . .</CipherValue>
   </CipherData>
  </EncryptedData>
 </fx:PSVI>
</fx:decrypted>

Decryption of schema-validated document

Indeed, validation before and after decryption is probably often what is wanted. That is, first we check that the encrypted data is valid per the XML Encryption namespace schema, then we decrypt, then we validate the result to check that it's OK.

Example: Richer composition

<?xml version='1.0'?>
<fx:PSVI xmlns:fx="http://www.w3.org/2005/05/fx">
 <fx:decrypted>
  <fx:PSVI>
   <EncryptedData xmlns='http://www.w3.org/2001/04/xmlenc#'
                  MimeType='text/xml'>
    <CipherData>
     <CipherValue>A23B45C56 . . .</CipherValue>
    </CipherData>
   </EncryptedData>
  </fx:PSVI>
 </fx:decrypted>
</fx:PSVI>

PSVI of decryption of schema-validated document
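Because each f(X) element takes the designation of its child as input, nesting order determines processing order: the innermost mapping is applied first. The sketch below only computes that order for the richer composition above, without performing any validation or decryption. The function name mapping_order is illustrative and not part of f(X).

```python
import xml.etree.ElementTree as ET

FX = "{http://www.w3.org/2005/05/fx}"

RICHER = """\
<fx:PSVI xmlns:fx="http://www.w3.org/2005/05/fx">
 <fx:decrypted>
  <fx:PSVI>
   <EncryptedData xmlns="http://www.w3.org/2001/04/xmlenc#" MimeType="text/xml">
    <CipherData><CipherValue>A23B45C56 . . .</CipherValue></CipherData>
   </EncryptedData>
  </fx:PSVI>
 </fx:decrypted>
</fx:PSVI>
"""


def mapping_order(element):
    """Return f(X) mapping names in the order they apply (innermost first)."""
    steps = []
    for child in element:
        steps.extend(mapping_order(child))
    if element.tag.startswith(FX):
        # Strip the namespace part to leave the local name, e.g. "PSVI".
        steps.append(element.tag[len(FX):])
    return steps


print(mapping_order(ET.fromstring(RICHER)))  # ['PSVI', 'decrypted', 'PSVI']
```

Read left to right, the result is exactly the sequence described in the text: validate the encrypted data, decrypt it, then validate the result.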

The designation of the fx:included element is the result of doing XInclude processing on the designation of its child element. Since the simple pattern of an fx:included element wrapping a single inclusion is likely to be very common, it can be abbreviated by placing the XInclude attributes directly on fx:included, as follows:

Example: f(X) with XInclude, simplified

<?xml version='1.0'?>
<fx:decrypted xmlns:fx="http://www.w3.org/2005/05/fx">
 <fx:included href="encrypted.xml"/>
</fx:decrypted>

Simplified version of Example

The use of fx:included allows us to separate the statement of intended or desired designation from the core document, but does not require it.

6 Summary of f(X) so far

f(X) provides a means for specifying the desired designation of XML documents in a systematic and compositional way. It does so by specifying the designation of three basic classes of XML elements:

fx:included elements: Designate the result of first interpreting the href, xpointer and other XInclude attributes per the XInclude specification, then applying these f(X) rules to the resulting infoset;
other elements in the f(X) namespace: Designate the result of the mapping specified by their name applied to the designations of their children;
all other elements: Designate themselves, that is, their ordinary infosets, except insofar as they contain elements in the f(X) namespace, which are interpreted per the above two clauses.
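The three clauses above amount to a small recursive interpreter. What follows is a sketch of that interpreter under loudly stated assumptions, not a definitive implementation: infosets are approximated by ElementTree elements, the resources dictionary is a hypothetical stand-in for real XInclude dereferencing (only href is handled), and the marker helper merely wraps its inputs in a made-up element so that the applied mapping is visible, since real decryption or validation is out of scope here.

```python
import xml.etree.ElementTree as ET

FX = "{http://www.w3.org/2005/05/fx}"


def designate(element, mappings, resources):
    """Return the infoset (approximated as an Element) designated by element."""
    if element.tag == FX + "included":
        # Clause 1: resolve the href, then apply the f(X) rules to the target.
        # Assumption: resources maps URIs to XML text in place of real fetching.
        target = ET.fromstring(resources[element.get("href")])
        return designate(target, mappings, resources)
    if element.tag.startswith(FX):
        # Clause 2: apply the named mapping to the children's designations.
        inputs = [designate(child, mappings, resources) for child in element]
        return mappings[element.tag[len(FX):]](inputs)
    # Clause 3: an ordinary element designates itself, with any f(X)
    # descendants replaced by what they designate.
    result = ET.Element(element.tag, dict(element.attrib))
    result.text, result.tail = element.text, element.tail
    for child in element:
        designated = designate(child, mappings, resources)
        designated.tail = child.tail
        result.append(designated)
    return result


def marker(name):
    """Hypothetical stub mapping: wrap the inputs in a '<name>-of' element."""
    def apply(inputs):
        wrapper = ET.Element(name + "-of")
        wrapper.extend(inputs)
        return wrapper
    return apply


RESOURCES = {
    "encrypted.xml": '<EncryptedData xmlns="http://www.w3.org/2001/04/xmlenc#"/>'
}
DOC = ET.fromstring(
    '<fx:decrypted xmlns:fx="http://www.w3.org/2005/05/fx">'
    '<fx:included href="encrypted.xml"/>'
    '</fx:decrypted>'
)
result = designate(DOC, {"decrypted": marker("decrypted")}, RESOURCES)
print(result.tag, [child.tag for child in result])
```

A real processor would replace each stub in mappings with an implementation of the corresponding specification. The recursive structure is the point of the sketch, and it stays the same however many mappings are registered.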

7 Completing basic f(X)

A few things need to be added to cover the intended basic functionality.

It should be possible to prevent the special treatment f(X) specifies for the first two classes of elements above -- f(X) provides the fx:sic element for this purpose:

Example: 'Quoting' with fx:sic

<?xml version='1.0'?>
<fx:sic xmlns:fx="http://www.w3.org/2005/05/fx">
 <fx:included href="encrypted.xml"/>
</fx:sic>

Designates a single-element fx:included document.

Also, we provide a sic attribute on fx:included, which defaults to false, but which if true blocks recursive f(X) processing of the inclusion target.

We also need a way of specifying more than one input infoset and, for those specifications which require (or allow) it, parameters. f(X) allows for parameters via attributes on the relevant f(X) elements, and allows additional children where appropriate to designate additional input infosets. For example, for fx:result (XSLT) we allow a second child to directly provide the stylesheet:

Example: Primary and secondary input infosets

<?xml version='1.0'?>
<fx:result xmlns:fx="http://www.w3.org/2005/05/fx">
  <fx:included href="po.xml"/>
  <fx:included href="po.xsl"/>
</fx:result>

Since infosets such as stylesheets and schema documents are so often static, it also makes sense to allow them to be identified by URI in attributes on the relevant f(X) element:

Example: Static infosets as attributes

<fx:result xmlns:fx="http://www.w3.org/2005/05/fx"
           stylesheet="po.xsl">
 <fx:PSVI schemaDocuments="po.xsd address.xsd">
  <fx:included href="po.xml"/>
 </fx:PSVI>
</fx:result>

Style the result of validation, using static resources for schema documents and stylesheet

Finally we need to list at least a preliminary set of built-in f(X) designators for each public specification which can be understood as defining XML-to-XML functions:

fx:valid: Validated W3C XML
fx:PSVI: W3C XML Schema
fx:result: W3C XSLT (v.1 or v.2, depending on stylesheet)
fx:queryResult: W3C XML Query
fx:encrypted: W3C XML Encryption
fx:decrypted: W3C XML Encryption
fx:signed: W3C XML Signature
fx:verified: W3C XML Signature
fx:included: W3C XInclude
fx:gMeta: GRDDL

Editorial note: HST 2005-07-04
Obviously need to fill in detail in each case as to calling sequence, results, etc.

8 Beyond basic f(X): Choosing and binding

As mentioned above, some existing pipeline languages allow for conditional processing. If it is judged appropriate to include something like this in f(X), it can be done easily, following the model of XSLT's choose:

Example: Using fx:case for conditional processing

<fx:case xmlns:fx="http://www.w3.org/2005/05/fx">
 <fx:when test="/root/@version > 3">
  <fx:included href="doc.xml"/>
  <fx:PSVI schemaDocuments="current.xsd">
   <fx:included href="doc.xml"/>
  </fx:PSVI>
 </fx:when>
 <fx:otherwise>
  <fx:PSVI schemaDocuments="stale.xsd">
   <fx:included href="doc.xml"/>
  </fx:PSVI>
 </fx:otherwise>
</fx:case>

Choosing a schema document based on an XPath expression test

fx:when has a test attribute for an XPath expression and two infoset arguments. The first is the infoset to test with the XPath expression, the second the result if the test is satisfied.

Clearly, if interpreted literally, we have a lot of potential for wasted effort here with respect to the doc.xml resource. There are two possible ways f(X) could address this. It could do nothing beyond noting that implementors may detect and optimize such cases, or it could provide for explicit binding of infosets to variables, which can then be referenced by XPath expressions and an fx:infoset element:

Example: Binding infosets to variables

<fx:with xmlns:fx="http://www.w3.org/2005/05/fx">
 <fx:variable name="doc" href="doc.xml"/>
 <fx:case>
  <fx:when test="$doc/root/@version > 3">
   <fx:PSVI schemaDocuments="current.xsd">
    <fx:infoset expr="$doc"/>
   </fx:PSVI>
  </fx:when>
  <fx:otherwise>
   <fx:PSVI schemaDocuments="stale.xsd">
    <fx:infoset expr="$doc"/>
   </fx:PSVI>
  </fx:otherwise>
 </fx:case>
</fx:with>

Explicit binding to avoid extra work
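The saving that binding buys can be illustrated by counting how often doc.xml is loaded under each reading. This is a sketch only: the CountingLoader class and both functions are hypothetical, and the XPath test /root/@version > 3 is replaced by a direct attribute lookup on the document element, since full XPath evaluation is beside the point here.

```python
import xml.etree.ElementTree as ET


class CountingLoader:
    """Hypothetical resource loader which records every URI it is asked for."""

    def __init__(self, resources):
        self.resources = resources
        self.loads = []

    def load(self, href):
        self.loads.append(href)
        return ET.fromstring(self.resources[href])


def choose_unbound(loader):
    # Literal reading of fx:case: the infoset being tested and the infoset
    # being validated are two separate inclusions of doc.xml.
    tested = loader.load("doc.xml")
    schema = "current.xsd" if float(tested.get("version")) > 3 else "stale.xsd"
    return schema, loader.load("doc.xml")


def choose_bound(loader):
    # fx:with / fx:variable reading: bind doc.xml once, reference it twice.
    doc = loader.load("doc.xml")
    schema = "current.xsd" if float(doc.get("version")) > 3 else "stale.xsd"
    return schema, doc


RESOURCES = {"doc.xml": '<root version="4"/>'}

unbound = CountingLoader(RESOURCES)
bound = CountingLoader(RESOURCES)
print(choose_unbound(unbound)[0], len(unbound.loads))
print(choose_bound(bound)[0], len(bound.loads))
```

Both readings select the same schema document. The only difference is the number of times the resource is dereferenced, which is exactly what an implementation could also achieve by detecting the repeated inclusion and optimizing it away.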

The provision of an explicit binding mechanism would clearly be of use. Where testing needs to be done on the result of some more or less complex composition of f(X) elements, it would enable the concise specification of dependencies which would otherwise require egregious duplication of structure. However, there's a real question as to whether this opens up too many uncertainties. In particular, the introduction of variable binding into pure functional programming languages is known to have a significant impact on overall computational complexity. . .