Re-Interpreting the XML Pipeline Note:
Adding Streaming and On-Demand Invocation
Acknowledgements
- The work reported here involved the whole team at Markup: Mary Holstege,
John James, David Merwin, Alex Milowski, Barry Plotkin, Jo Rabin and Richard Tobin
Introduction and Motivation
- The three most important ideas in computer science
- Abstraction, abstraction and abstraction
- As someone mentioned earlier this week, XML is a great vehicle for
processing information as well as structuring it
- Our first response when confronting an XML processing task should
not be to start writing code
- We already have virtually everything we need to carry out a huge range
of XML processing tasks at the XML level directly
- XSLT
- XML Schema
- XInclude
- . . .
- All we need is a little help putting it all together
XML Pipelines
- The lack of a coherent XML processing model to support decomposition
of complex XML processing tasks represents a serious bottleneck
- for enterprise use of XML in general
- for Web Services in particular
- All that's needed is support for the basic tool in the
architect's armoury: Divide and Conquer
- In other words -- XML Pipelines
- Configurations of basic XML processing steps
- Some steps are relatively heavy
- XSLT-based transformation
- W3C XML Schema-based validation
- Others can be much simpler
- XPath-based extraction
- One-for-one renaming
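To make the "simpler steps" concrete, here is a minimal sketch of a rename step and an extraction step composed into one pipeline. It uses Python's standard ElementTree, whose path support is only a small XPath subset; the helper names (rename, extract, pipeline) and the sample document are illustrative assumptions, not part of any pipeline standard or of the tool shown in the talk.

```python
import xml.etree.ElementTree as ET

def rename(old, new):
    """Return a step that renames every <old> element to <new>."""
    def step(root):
        for el in list(root.iter(old)):
            el.tag = new
        return root
    return step

def extract(path, wrapper="result"):
    """Return a step that keeps only the elements matching an ElementTree path."""
    def step(root):
        out = ET.Element(wrapper)
        out.extend(root.findall(path))
        return out
    return step

def pipeline(*steps):
    """Compose steps left to right into one document-to-document function."""
    def run(root):
        for step in steps:
            root = step(root)
        return root
    return run

# Hypothetical input: a feed with two items and an unrelated note.
doc = ET.fromstring(
    "<feed><item><hd>A</hd></item><note/><item><hd>B</hd></item></feed>"
)
tidy = pipeline(rename("hd", "title"), extract("./item", wrapper="items"))
result = ET.tostring(tidy(doc), encoding="unicode")
```

Each step is a plain function from tree to tree, so a heavier step such as an XSLT transformation could be slotted in the same way.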
[Picture needed]
Is this just Web Services by another name?
- A lot of the preceding rhetoric sounds like the usual Web Services pep talk
- But the focus here is narrower
- We're looking at decomposing a single XML application
- In a single execution context
- Admittedly the boundary between local pipelines and distributed
choreography is not crisp. . .
A simple illustration
What do we need
- A standard for pipeline specification
- Interop matters here just like everywhere else
- High performance
- Pipelines need to be fast to be attractive
- The Sun XML Pipeline W3C Note is a good starting point
- Published by W3C in February of 2002
- Edited by Eve Maler and Norm Walsh
- Many co-submitters, including Markup Technology
The Sun Pipeline design
- An XML document type for describing pipelines
- A pipeline is a sequence of steps, with specified input(s), output(s)
and parameters
- The processing required to perform a step is named, not defined in detail
- Dependency-driven, in the mode of make and ant
- Here's a simple example
- All inputs at every step in a pipeline must have reliable timestamps
- so that their status as up-to-date or not can be determined
- So inputs and outputs must be actual files
- Precludes message-based
input and output
- Militates against non-local URIs
- Multi-step pipelines require at least serialisation and probably parsing between each
step
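The timestamp requirement is easiest to see in code. The following is a sketch of a make-style freshness check, not the Note's actual algorithm; the function name is_stale and the file names are assumptions made for illustration. It only works because every input and output is a file with a modification time, which is exactly what message-based input lacks.

```python
import os
import tempfile

def is_stale(output_path, input_paths):
    """make-style check: rebuild if the output is missing or older than any input."""
    if not os.path.exists(output_path):
        return True
    out_time = os.path.getmtime(output_path)
    return any(os.path.getmtime(p) > out_time for p in input_paths)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.xml")
    out = os.path.join(tmp, "out.xml")
    with open(src, "w") as f:
        f.write("<doc/>")
    missing_output_is_stale = is_stale(out, [src])

    with open(out, "w") as f:
        f.write("<doc/>")
    os.utime(src, (1000, 1000))  # input older than the output
    os.utime(out, (2000, 2000))
    fresh_is_stale = is_stale(out, [src])

    os.utime(src, (3000, 3000))  # input touched after the output was built
    touched_is_stale = is_stale(out, [src])
```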
Re-interpreting the Sun Pipeline Note
- We can re-interpret the proposed document type
- Removing some limitations
- Enabling more efficient implementation
- We take it as simply specifying a configuration of operations on
XML-encoded information
- Without the dependency-driven interpretation
- Allows intermediate results to be
passed between components without serialisation
- So we think of pipelines more like shell scripts
- Mapping externally specified inputs to outputs
- Here's a new example for comparison.
- Facilitates deploying pipelines
- For example in servers where they can then operate on
message-derived input to produce message-delivered output
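A rough sketch of the re-interpreted model: the pipeline is a function from an incoming message to an outgoing one, parsing once, handing the in-memory tree from step to step, and serialising once. The runner and the two sample steps (drop_drafts, count_items) are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

def run_pipeline(steps, message):
    """Parse once, pass the in-memory tree from step to step, serialise once."""
    tree = ET.fromstring(message)
    for step in steps:
        tree = step(tree)
    return ET.tostring(tree, encoding="unicode")

def drop_drafts(root):
    """Remove every element marked status="draft"."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("status") == "draft":
                parent.remove(child)
    return root

def count_items(root):
    """Record the number of remaining <item> children on the root element."""
    root.set("count", str(len(root.findall("item"))))
    return root

# The input arrives as a message string, not as a file with a timestamp.
reply = run_pipeline(
    [drop_drafts, count_items],
    '<feed><item status="draft"/><item/><item/></feed>',
)
```

No intermediate files exist at any point, so nothing needs a timestamp and nothing is re-parsed between steps.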
Good tools support good design:
Extreme decomposition
- Making decomposition easy changes how you work
- Successive approximation is an approach to transformation that
needs an efficient pipeline to be attractive
- The ability to trivially construct a pipeline of eight successive XSLT
steps allowed me to build a complex application in a day
- Building a single stylesheet to do the whole thing, although
possible, would have taken much more work, and produced a much less
robust outcome
- Intellectually simpler
- Having intermediate output during development greatly aids debugging
- No intermediate output in production speeds processing
- The task is conversion from PowerPoint via OpenOffice to XML and HTML
- [Demo the pipeline running]
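One way to get intermediate output in development but none in production is to wrap each step in a tracer only when asked. This is a sketch under assumed names (traced, build, and two toy steps standing in for the XSLT steps of the demo); it does not describe how the tool demonstrated here does it.

```python
import xml.etree.ElementTree as ET

def traced(name, step, snapshots):
    """Wrap a step so a serialised copy of its output is kept for inspection."""
    def wrapper(root):
        result = step(root)
        snapshots.append((name, ET.tostring(result, encoding="unicode")))
        return result
    return wrapper

def build(steps, snapshots=None):
    """Compose named steps; trace them only when a snapshot list is supplied."""
    if snapshots is not None:
        steps = [(n, traced(n, s, snapshots)) for n, s in steps]
    def run(root):
        for _, step in steps:
            root = step(root)
        return root
    return run

def pages_to_slides(root):
    for el in list(root.iter("page")):
        el.tag = "slide"
    return root

def number_slides(root):
    for n, el in enumerate(root.findall("slide"), start=1):
        el.set("n", str(n))
    return root

STEPS = [("rename", pages_to_slides), ("number", number_slides)]

snapshots = []
dev_out = build(STEPS, snapshots)(ET.fromstring("<deck><page/><page/></deck>"))
prod_out = build(STEPS)(ET.fromstring("<deck><page/><page/></deck>"))
```

The development run leaves one snapshot per step to inspect; the production run does the same work with no serialisation between steps.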
Good tools support good design:
Declarative forms encourage re-use
- Let's return to our first, XInclude example
- Suppose we want to expose our pipeline as a SOAP-mediated service
- Wrapping the existing pipeline in a few additional steps achieves this
very easily
- Now takes a SOAP message as input
- And produces a SOAP message as output [see XED window]
- [demo of the new pipe]
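The wrapping idea can be sketched as two extra steps placed around an unchanged pipeline. This assumes the SOAP 1.1 envelope namespace and a Body with exactly one payload element; existing_pipeline is a hypothetical stand-in for the XInclude pipeline, and none of the function names come from the demo.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 (assumed version)

def unwrap_soap(envelope):
    """First extra step: pull the single payload element out of the SOAP Body."""
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) != 1:
        raise ValueError("expected a SOAP Body with exactly one child element")
    return body[0]

def wrap_soap(payload):
    """Last extra step: put the pipeline's result inside a fresh SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return envelope

def existing_pipeline(root):
    """Stand-in for the unchanged pipeline (hypothetical: it only marks the document)."""
    root.set("processed", "yes")
    return root

def soap_service(message):
    """Run unwrap, the existing pipeline, then wrap, on an incoming message."""
    root = ET.fromstring(message)
    for step in (unwrap_soap, existing_pipeline, wrap_soap):
        root = step(root)
    return ET.tostring(root, encoding="unicode")

request = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
    "<soap:Body><doc/></soap:Body></soap:Envelope>"
)
response = soap_service(request)
```

The existing pipeline is reused untouched; only the first and last steps know anything about SOAP.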
Conclusions
- We've seen a document-oriented example
- And a web services example
- And I've got data-oriented examples I didn't show
- Take home message:
- Think about a two-level decomposition of XML-related tasks
- High-level decomposition into macro-applications using choreography
- Low-level decomposition into micro-applications using pipelines
- Dealing with XML as XML is the XML way
- Adding a new feed to a news service should take 6 hours of
pipeline building
- Not 6 weeks of Java programming
- Watch for the upcoming free release of MTPL, the tool I've shown you today
A word from our sponsors