Name

lxaddids — Add IDs to an XML document

Synopsis

lxaddids [ -xmlns[:prefix]=uri ...] -e element-query [ -i id-attr-name ] [ -p prefix-query ] [ -f format ] [ -c count-query ] [ input-file ]

Description

lxaddids adds ID attributes to an XML document. Each ID value is made up of a prefix and a count. By default the prefix is "A" and the count starts at one, incrementing for each ID generated; both can be changed with the options described below.
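As a rough illustration of the default scheme, the following Python sketch generates the sequence of values lxaddids assigns when no -p, -f or -c option is given. It is not the tool's own code, and the function name default_ids is hypothetical.

```python
from itertools import count, islice

def default_ids(prefix="A", start=1):
    """Yield prefix + count: "A1", "A2", ... (models the documented defaults)."""
    for n in count(start):
        yield f"{prefix}{n}"

print(list(islice(default_ids(), 3)))  # ['A1', 'A2', 'A3']
```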

The input-file argument may be a URI instead of a filename. If no input-file argument is given, standard input is used.

-xmlns[:prefix]=uri

binds a prefix (or the default namespace) to a URI for use in XPath queries.

-e element-query

an XPath query identifying the elements to which attributes are to be added. This query is streamed, and the -p and -c queries are evaluated relative to the element without reading its children.

-i id-attr-name

the name to be used for the ID attributes. Any existing attribute with the same name will be replaced. The default name is "id".

-p prefix-query

an XPath query used to construct the prefix part of the IDs. Note that because it is a query, a fixed string must be quoted twice, once for the shell and once as an XPath string literal, for example -p "'foo'".
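To see what the program actually receives, Python's standard shlex.split, which follows POSIX shell quoting rules, can split the command line the way a shell would. This assumes a POSIX-style shell such as sh or bash; other shells may quote differently.

```python
import shlex

# The shell strips the outer double quotes, so lxaddids receives 'foo'
# (single quotes included), which XPath reads as a string literal.
argv = shlex.split("""lxaddids -e w -p "'foo'" old.xml""")
print(argv)  # ['lxaddids', '-e', 'w', '-p', "'foo'", 'old.xml']

# With only one level of quoting the program receives bare foo, which XPath
# treats as a path selecting child elements named foo, not as a string.
print(shlex.split("lxaddids -e w -p 'foo' old.xml"))
# ['lxaddids', '-e', 'w', '-p', 'foo', 'old.xml']
```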

-f format

a C-style format string used to create the ID attribute value. This can be used to create IDs in other layouts, for example with a separator between the prefix and the count. The format is applied to two values, in this order: the prefix (a string) and the count (a number).
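Python's % operator uses the same conversion specifications as C's printf for simple cases like these, so it can preview what a format string produces. The sketch assumes the default behaves like "%s%d", which matches the documented "A1", "A2" output; the helper format_id is hypothetical, and whether lxaddids accepts a given specification, such as the zero padding on the last line, is not confirmed (see Bugs).

```python
def format_id(fmt, prefix, n):
    # Apply the format to (prefix, count), as -f does, using printf-style rules.
    return fmt % (prefix, n)

print(format_id("%s%d", "A", 1))     # A1   (assumed equivalent of the default)
print(format_id("%s-%d", "A", 2))    # A-2  (as in the -f example below)
print(format_id("%s%04d", "w", 34))  # w0034 (may not be supported by lxaddids)
```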

-c count-query

an XPath query used to construct the count part of the IDs. When this is used, the count starts at zero and is increased whenever a node matching the query is encountered. If the matching node is a text node, the count is increased by the length of the text, otherwise it is increased by one. This allows IDs to depend on the position of the element in the text of the document.
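The following Python sketch models these counting rules with xml.etree.ElementTree, visiting start tags and text in document order. It is a model under stated assumptions, not the tool's implementation: the names events and add_ids are hypothetical, element text and tails stand in for XPath text nodes, the -c query is passed as a Python predicate, and the counter is assumed to be updated for a matching start tag before that element receives its ID (so with -c w the first <w> gets 1).

```python
import xml.etree.ElementTree as ET

def events(elem, parents=()):
    """Yield ("start", element, ancestors) and ("text", string, ancestors)
    in document order; ancestors is a tuple of enclosing elements."""
    path = parents + (elem,)
    yield ("start", elem, parents)
    if elem.text:
        yield ("text", elem.text, path)
    for child in elem:
        yield from events(child, path)
        if child.tail:  # text following the child, still inside elem
            yield ("text", child.tail, path)

def add_ids(root, is_target, counts, prefix="A", attr="id"):
    """Model of -e/-c: the count starts at zero; a matching text node adds
    its length, any other matching node adds one."""
    n = 0
    for kind, node, ancestors in events(root):
        if counts(kind, node, ancestors):
            n += len(node) if kind == "text" else 1
        if kind == "start" and is_target(node):
            node.set(attr, f"{prefix}{n}")

# Model of: lxaddids -e w -c 's//text()'
doc = ET.fromstring("<s><w>The</w> <w>cat</w></s>")
in_s_text = lambda kind, node, anc: kind == "text" and any(a.tag == "s" for a in anc)
add_ids(doc, lambda e: e.tag == "w", in_s_text)
print([w.get("id") for w in doc.iter("w")])  # ['A0', 'A4']
```

Here the second <w> gets "A4" because "The" and the following space are four characters of text inside <s>.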

Examples

In these examples, we assume a file of sentence (<s>) elements containing word (<w>) and punctuation (<punct>) elements. The <w> elements have a p attribute giving their part of speech.

lxaddids -e w <old.xml >new.xml

Adds an ID attribute to each <w> element. The attributes will have the default name id and the values will be of the default form "A1", "A2", ...

lxaddids -e 's/*' -i ident <old.xml >new.xml

Adds an ID attribute to each child of each <s> element. The attributes will be named ident.

lxaddids -e 's/*' -p 'name()' <old.xml >new.xml

Adds an ID attribute to each child of each <s> element. The ID values will depend on the name of the element they are attached to; on <w> elements they will start with "w" ("w34" for example) and on <punct> elements they will start with "punct".
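A Python sketch of this example, again a model rather than the tool's code: the function name add_ids_by_name is hypothetical, and it assumes, as the Description implies, that one counter is shared by all elements, so the numbers continue across <w> and <punct>.

```python
import xml.etree.ElementTree as ET

def add_ids_by_name(root, parent_tag="s", attr="id", start=1):
    """Model of: lxaddids -e 's/*' -p 'name()'"""
    n = start
    for parent in root.iter(parent_tag):        # every <s>, in document order
        for child in parent:                    # each child element of that <s>
            child.set(attr, f"{child.tag}{n}")  # prefix = element name
            n += 1                              # single shared counter

doc = ET.fromstring("<s><w>The</w> <w>cat</w><punct>.</punct></s>")
add_ids_by_name(doc)
print([c.get("id") for c in doc])  # ['w1', 'w2', 'punct3']
```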

lxaddids -e w -f '%s-%d' <old.xml >new.xml

Adds an ID attribute to each <w> element. The ID values will be of the form "A-1", "A-2", ...

lxaddids -e 'w[@p="NN"]' -c w <old.xml >new.xml

Adds an ID attribute to each <w> element that has a p attribute whose value is "NN". The numbers in the ID values will count all <w> elements, not just those which have an ID assigned.

lxaddids -e 'w' -c 's//text()' <old.xml >new.xml

Adds an ID attribute to each <w> element. The number in each ID will be the count of text characters inside <s> elements that precede the <w>.

Bugs

The program does not check for duplicate IDs, which might arise if there are already IDs on some elements or if the -c option is used in such a way that it does not generate unique IDs.

The -f formats are not handled with full generality; some cases don't work.