lxtransduce

Name

lxtransduce, lxmmaplex — XML transducer

Synopsis

lxtransduce [ -q query ] [ -r ] [ -a rule ] [ -l [lexicon-name=]lexicon-file ...] [ input-file ]

lxmmaplex input-file.lex output-file.mmlex

Description

lxtransduce is an XML transducer intended for use in NLP (natural language processing) applications. XPath-based rules are matched against elements in the input document, and when a rule matches, the corresponding rewrite is performed. Plain text can also be processed, using regular expressions instead of XPaths; this is useful for tasks such as tokenisation.
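To make the idea of regular-expression-driven tokenisation concrete, here is a minimal sketch in Python. It is not lxtransduce grammar syntax and says nothing about how lxtransduce is implemented; the token pattern, the <w> element name and the tokenise function are all assumptions chosen for illustration.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical token pattern: a run of word characters, or one punctuation symbol.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenise(text):
    """Wrap every regex match in a <w> element; copy all other text unchanged."""
    out = []
    pos = 0
    for m in TOKEN_RE.finditer(text):
        out.append(escape(text[pos:m.start()]))          # text between matches
        out.append("<w>" + escape(m.group()) + "</w>")   # the rewritten match
        pos = m.end()
    out.append(escape(text[pos:]))                       # trailing text
    return "".join(out)

print(tokenise("Hello, world!"))
# prints: <w>Hello</w><w>,</w> <w>world</w><w>!</w>
```

The match-then-rewrite loop is the part that corresponds to the description above: a pattern matches a span of the input, and the span is replaced by marked-up output.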

The input-file argument may be a URI instead of a filename. If no input-file argument is given, standard input is used.

For details of how to write lxtransduce rule files (grammars), see the lxtransduce manual.

-q query: an XPath specifying the element to whose children the rules are applied. This query is streamed.
-r: apply rules recursively.
-a rule: the rule to be applied at the top level.
-l [lexicon-name=]lexicon-file: bind a lexicon name (default "lex") to a lexicon file.
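
The relationship between the -q query and the rules can be pictured with the following Python sketch. It only illustrates the notion of "select elements with a path expression, then apply a rule to each of their children"; it uses the limited XPath subset of Python's xml.etree.ElementTree, it is not streamed, and the apply_to_children and mark_numbers names, the sample document and the rule itself are hypothetical.

```python
import xml.etree.ElementTree as ET

def apply_to_children(root, query, rule):
    """Call rule(child) on each direct child of every element matching query."""
    count = 0
    for parent in root.iterfind(query):
        for child in list(parent):
            rule(child)
            count += 1
    return count

def mark_numbers(elem):
    # Hypothetical rule: label children whose text consists only of digits.
    if elem.text and elem.text.isdigit():
        elem.set("type", "num")

doc = ET.fromstring("<doc><p><w>room</w><w>42</w></p><note><w>7</w></note></doc>")
visited = apply_to_children(doc, ".//p", mark_numbers)
print(visited)  # prints: 2
print(ET.tostring(doc, encoding="unicode"))
```

Only the children of the p element are visited here; the w element inside note is left alone because its parent is not selected by the query.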

Description

lxmmaplex converts lxtransduce lexicons from human-readable XML form into an on-disk hash table. This removes the cost of reading in the lexicon when lxtransduce starts, at the expense of slower lookup (one or more disk accesses per lookup). It is appropriate for lexicons with hundreds of thousands of entries or more.
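
The trade-off described above (no load cost at start-up, but a disk access per lookup) can be sketched with Python's standard dbm.dumb module. This is only an analogy: it does not read or write the .mmlex format, and the function names, the sample entries and the file name are assumptions made for the example.

```python
import dbm.dumb
import os
import tempfile

def build_disk_lexicon(entries, path):
    """Write word -> category pairs to an on-disk key/value store."""
    with dbm.dumb.open(path, "n") as db:
        for word, category in entries.items():
            db[word.encode("utf-8")] = category.encode("utf-8")

def lookup(path, word):
    """Open the store and fetch one entry; nothing is parsed up front."""
    with dbm.dumb.open(path, "r") as db:
        value = db.get(word.encode("utf-8"))
        return value.decode("utf-8") if value is not None else None

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_lexicon")  # hypothetical file name
        build_disk_lexicon({"Edinburgh": "place", "Monday": "day"}, path)
        print(lookup(path, "Edinburgh"))  # prints: place
        print(lookup(path, "Glasgow"))    # prints: None
```

An in-memory dictionary would answer each lookup faster, but it would have to be filled from the XML lexicon every time the program starts, which is the cost the conversion is meant to avoid for very large lexicons.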

input-file.lex: The file name of the XML lexicon to be converted.
output-file.mmlex: The file name for the hash-table lexicon. The suffix .mmlex is used to distinguish hash-table lexicons from XML ones.