Lxreplace, a program to make replacements and deletions in an XML document

Richard Tobin, 2005

Lxreplace is a program that allows nodes in an XML document (elements, attributes and text) to be replaced, deleted, or renamed. The nodes to be changed are specified by an XPath. Their replacements are specified by either an XPath expression or an XSLT template.

Synopsis

lxreplace [-xmlns[:prefix]=uri ...] -q query-xpath [ -r replace-xpath | -t template | -n rename-xpath | -d ] < input.xml > output.xml

Description

-xmlns:prefix=uri
-xmlns=uri

These flags allow namespace prefixes and the default namespace to be bound for use in the XPath and QName arguments described below. If your document does not use namespaces, you do not need them.
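
As an illustration, the sketch below binds a prefix for the XHTML namespace and uses it in the query. The function name strip_xhtml_meta and the file names in the usage comment are invented for this example, and it assumes lxreplace is installed and on the PATH.

```shell
# Hypothetical wrapper: delete every XHTML <meta> element read from stdin.
# The prefix h is arbitrary; only the namespace URI has to match the document.
strip_xhtml_meta() {
    lxreplace -xmlns:h=http://www.w3.org/1999/xhtml -q 'h:meta' -d
}

# Usage, assuming page.xhtml exists:
#   strip_xhtml_meta < page.xhtml > page-nometa.xhtml
```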

-q query-xpath

In all cases the nodes to be processed are specified by the -q flag. Nodes that match the query-xpath are processed; the others are left unchanged. How the nodes are processed depends on which of the -r, -t, -n, and -d flags is used.

-r replace-xpath

The -r flag specifies an XPath that is used to construct the replacement for matched nodes. The replace-xpath is evaluated relative to the matched node. If the result is an element then it is processed recursively before replacing the original matched node.

If the matched node is an attribute, then it is deleted if the replace-xpath returns an empty node set. Otherwise the value of the attribute is replaced by the string value of the result.

In this mode, lxreplace streams the input document, so that only the subtree rooted at the matched node and its ancestors are accessible when the replace-xpath is evaluated. If it is necessary to access other parts of the document to construct the replacement, use the -t form instead.

-t template

The -t flag specifies an XSLT template that is used to construct the replacement for matched nodes. This is done by constructing an XSLT stylesheet containing the template in a rule whose match attribute is the query-xpath.

In the stylesheet, the prefix xsl is bound to the XSL namespace. A low-priority template rule is included that provides an identity transform for nodes not matched by the template. For convenience, a number of entities are defined for use in the template:

&this;
Copies the current node and recursively processes its attributes and children.
Equivalent to <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>.
&attrs;
Processes the attributes of the current node.
Equivalent to <xsl:apply-templates select='@*'/>.
&children;
Processes the children of the current node.
Equivalent to <xsl:apply-templates select='node()'/>.
&text;
Copies the text of the current node.
Equivalent to <xsl:value-of select='.'/>.
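
To make the description above concrete, the following sketch prints a stylesheet of the kind that might be constructed for -q entity with the template '<entity text="{.}">&attrs;&children;</entity>'. It is a guess at the general shape, not the text lxreplace actually generates: the function name make_stylesheet, the priority value -10 and the layout are assumptions, and only the entity definitions and the use of the query as the match pattern come from the description.

```shell
# Hypothetical reconstruction of the stylesheet built for:
#   lxreplace -q entity -t '<entity text="{.}">&attrs;&children;</entity>'
# Everything except the entity definitions and the match pattern is assumed.
make_stylesheet() {
    cat <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY this "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>">
  <!ENTITY attrs "<xsl:apply-templates select='@*'/>">
  <!ENTITY children "<xsl:apply-templates select='node()'/>">
  <!ENTITY text "<xsl:value-of select='.'/>">
]>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Low-priority identity rule for nodes the query does not match -->
  <xsl:template match="@*|node()" priority="-10">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- The -q value becomes the match pattern, the -t value the body -->
  <xsl:template match="entity">
    <entity text="{.}">&attrs;&children;</entity>
  </xsl:template>
</xsl:stylesheet>
EOF
}

make_stylesheet
```

Seen this way, the entities are ordinary XML entities that expand to XSLT instructions inside the rule body, which is why they can be mixed freely with literal result elements in the template.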

-n rename-xpath

The -n flag specifies an XPath which is used to rename the matched nodes. The rename-xpath is evaluated relative to the matched node and the result is interpreted as a QName. A QName (qualified name) is either a plain name such as table or a prefixed name such as xhtml:table.

Only elements and attributes can be renamed.

-d

The -d flag causes the matched nodes to be deleted rather than replaced. It is equivalent to -r expr where expr is an XPath that selects an empty node set.

If none of -r, -t, -n, and -d is given the effect is equivalent to -r node(), which replaces matched nodes with their children. That is, it "unwraps" the children of the nodes.

Note about quoting

It is usually necessary to quote the XPath and template arguments because they contain characters that are significant to the shell, such as *. It is generally best to use single quotes for this, and to use double quotes where quotes are needed inside the value. In some cases it is quite difficult to find an appropriate quoting.

If you need an XPath whose value is a fixed string (as is commonly the case when renaming elements and attributes), you must quote it twice; for example '"foo"'. The outer quotes are consumed by the shell, so if you used 'foo' the XPath would evaluate to any <foo> children of the current node instead of to the string "foo".
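
The two levels of quoting can be checked without running lxreplace at all. The sketch below uses only the standard printf utility to print each argument exactly as a program would receive it; show_args is a name invented for this illustration.

```shell
# Print each argument exactly as the shell hands it to a program.
show_args() {
    printf '[%s]\n' "$@"
}

show_args 'foo'      # prints [foo]: the outer quotes are removed by the shell
show_args '"foo"'    # prints ["foo"]: the inner double quotes survive
```

In the first case the program sees the bare name foo, which an XPath evaluator reads as a child element test. In the second it sees "foo" with its quotes, which is an XPath string literal.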

Examples

This replaces all elements with their children, in effect deleting all element markup:

lxreplace -q '*'

This deletes all <meta> elements:

lxreplace -q 'meta' -d

This changes the name of all <entity> elements to ent (note the double quoting):

lxreplace -q entity -n '"ent"'

This changes the name of all <entity> elements to the value of their type attribute:

lxreplace -q entity -n @type

This adds an attribute text to all <entity> elements, whose value is the text content of the element:

lxreplace -q entity -t '<entity text="{.}">&attrs;&children;</entity>'

This only works if the element does not already have a text attribute; otherwise the old value is copied back by &attrs;. This can be avoided by using the more complicated template

lxreplace -q entity -t '<entity>&attrs;<xsl:attribute name="text">&text;</xsl:attribute>&children;</entity>'

which constructs the text attribute after copying the old ones.

This replaces all <entity> elements with the value of their text attribute:

lxreplace -q entity -r @text