In this paper I suggest reporting requirements for various categories of XML parser. These requirements are expressed in terms of the XML Infoset.
I consider three common categories of XML parser: minimal, WDR, and validating.
In addition, a parser may support XML Namespaces and/or XML Base.
Properties marked (*) are not explicitly required by the XML 1.0 specification.
| Document Information Item | |
|---|---|
| Property | Notes |
| [children](*) | Only element and PI children are required |
| [notations] | Only notations used as attribute values or PI targets are required |
| [entities] | Only unparsed entities used as attribute values are required |
| Element Information Item | |
|---|---|
| Property | Notes |
| [local name](*) | May be combined with [prefix] |
| [prefix](*) | May be combined with [local name] |
| [children](*) | Only element, PI and unexpanded entity children are required |
| [attributes] | May be combined with [namespace attributes] |
| [namespace attributes] | May be combined with [attributes] |
| [base URI](*) | The URI of the containing entity |
| Attribute Information Item | |
|---|---|
| Property | Notes |
| [local name](*) | May be combined with [prefix] |
| [prefix](*) | May be combined with [local name] |
| [normalized value] | |
| [attribute type] | Only so that entity and notation values can be identified |
| Processing Instruction Information Item | |
|---|---|
| Property | Notes |
| [target] | |
| [content] | |
| [base URI](*) | The URI of the containing entity |
| Unexpanded Entity Information Item | |
|---|---|
| A WDR parser will only return these if an entity is undeclared or cannot be retrieved. Some WDR parsers reject the document in this case. | |
| Property | Notes |
| [name] | |
| [entity] | |
| Character Information Item | |
|---|---|
| Property | Notes |
| [character code] | |
| External Entity Information Item | |
|---|---|
| A WDR parser need only return these if it returns unexpanded entity items that refer to them. | |
| Property | Notes |
| [name] | |
| [system identifier] | |
| [public identifier] | |
| Unparsed Entity Information Item | |
|---|---|
| Property | Notes |
| [name] | |
| [system identifier] | |
| [public identifier] | |
| [notation] | |
| Notation Information Item | |
|---|---|
| Property | Notes |
| [name] | |
| [system identifier] | |
| [public identifier] | |
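As an illustration, the tables above can be written down as data and used to check what a parser reports. The following is only a sketch: the names `CORE_REQUIREMENTS` and `missing_properties` are hypothetical, and the check ignores the notes (for example, it does not know that [local name] and [prefix] may be combined, so a caller would have to list both when a combined value is returned).

```python
# Required properties per information item, transcribed from the tables above.
# The dictionary name and the item-kind keys are illustrative assumptions.
CORE_REQUIREMENTS = {
    "document": {"children", "notations", "entities"},
    "element": {"local name", "prefix", "children", "attributes",
                "namespace attributes", "base URI"},
    "attribute": {"local name", "prefix", "normalized value",
                  "attribute type"},
    "processing instruction": {"target", "content", "base URI"},
    "unexpanded entity": {"name", "entity"},
    "character": {"character code"},
    "external entity": {"name", "system identifier", "public identifier"},
    "unparsed entity": {"name", "system identifier", "public identifier",
                        "notation"},
    "notation": {"name", "system identifier", "public identifier"},
}


def missing_properties(item_kind, reported):
    """Return the required properties absent from `reported`, sorted."""
    return sorted(CORE_REQUIREMENTS[item_kind] - set(reported))


# A hypothetical parser that reports only four element properties.
print(missing_properties(
    "element", ["local name", "prefix", "children", "attributes"]))
```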
Minimal parsers return unexpanded entity items instead of the content of external entities.
Validating parsers must return the following additional property:
| Character Information Item | |
|---|---|
| Property | Notes |
| [element content whitespace] | |
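To make the property concrete, here is a rough sketch of how white space in element content might be identified. Python's standard library does not include a validating parser, so the set of element names declared with element-only content is supplied by hand; that set and the function name `element_content_whitespace` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def element_content_whitespace(root, element_only):
    """Return (parent tag, text) pairs for white space in element content.

    `element_only` is a set of element names assumed to be declared with
    element-only content; a real validating parser would take this from
    the DTD.
    """
    found = []
    for elem in root.iter():
        if elem.tag not in element_only:
            continue
        # Character data directly inside `elem`: its own text, plus the
        # text that follows each child element.
        runs = [elem.text] + [child.tail for child in elem]
        for run in runs:
            if run and run.strip(" \t\r\n") == "":
                found.append((elem.tag, run))
    return found


doc = ET.fromstring("<list>\n  <item>a b</item>\n</list>")
print(element_content_whitespace(doc, {"list"}))
```

The space inside `<item>` is not reported, because `item` is not in the element-only set.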
Parsers supporting XML Namespaces must return the following additional properties:
| Element Information Item | |
|---|---|
| Property | Notes |
| [namespace name](*) | |
| [in-scope namespaces](*) | This may be returned implicitly by means of the [namespace attributes] property |
| Attribute Information Item | |
|---|---|
| Property | Notes |
| [namespace name](*) | |
Furthermore, provided they return the [in-scope namespaces] property of elements, such parsers need not return the [namespace attributes] property.
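The relationship between the two properties can be sketched in code: the in-scope namespaces of an element are those of its parent, overridden by its own namespace declarations. The example below is a minimal sketch using the `start-ns` events of Python's `xml.etree.ElementTree.iterparse`; the function name `in_scope_namespaces` is an assumption, and the result is a plain dictionary rather than a set of namespace information items.

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def in_scope_namespaces(xml_text):
    """Return (tag, {prefix: uri}) for each element, in document order."""
    stack = [{"xml": XML_NS}]   # the xml prefix is always in scope
    pending = {}                # declarations seen before the next start tag
    results = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(
            source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = payload   # prefix is "" for the default namespace
            pending[prefix] = uri
        elif event == "start":
            scope = dict(stack[-1])
            scope.update(pending)
            # An empty URI undeclares the default namespace.
            scope = {p: u for p, u in scope.items() if u}
            pending = {}
            stack.append(scope)
            results.append((payload.tag, scope))
        else:  # "end": leave this element's scope
            stack.pop()
    return results


sample = '<a xmlns="urn:x" xmlns:p="urn:p"><p:b xmlns:q="urn:q"/><c/></a>'
for tag, scope in in_scope_namespaces(sample):
    print(tag, sorted(scope))
```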
Parsers supporting XML Base must take xml:base attributes into account when computing [base URI] properties.
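A minimal sketch of that computation follows, assuming the whole document comes from a single entity whose URI is known. Each element inherits its parent's base URI unless it carries an xml:base attribute, in which case the attribute value is resolved against the inherited base. The function name `base_uris` and the example URIs are hypothetical, and the sketch covers elements only, not processing instructions or external entities.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(root, document_uri):
    """Return (tag, base URI) pairs for every element, in document order."""
    result = []

    def visit(elem, inherited):
        own = elem.get(XML_BASE)
        # Resolve a relative xml:base against the inherited base URI.
        base = urljoin(inherited, own) if own is not None else inherited
        result.append((elem.tag, base))
        for child in elem:
            visit(child, base)

    visit(root, document_uri)
    return result


doc = ET.fromstring(
    '<doc xml:base="http://example.org/a/">'
    '<sec xml:base="b/"><p/></sec><q/></doc>'
)
for tag, uri in base_uris(doc, "http://example.org/index.xml"):
    print(tag, uri)
```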
Richard Tobin, February 2000