In this paper I suggest reporting requirements for various categories of XML parser. These requirements are expressed in terms of the XML Infoset.
I consider three common categories of XML parser: minimal parsers, WDR parsers and validating parsers.
In addition, a parser may support XML Namespaces and/or XML Base.
Properties marked (*) are not explicitly required by the XML 1.0 specification.
Document Information Item | |
---|---|
Property | Notes |
[children](*) | Only element and PI children are required |
[notations] | Only notations used as attribute values or PI targets are required |
[entities] | Only unparsed entities used as attribute values are required |
Element Information Item | |
---|---|
Property | Notes |
[local name](*) | May be combined with [prefix] |
[prefix](*) | May be combined with [local name] |
[children](*) | Only element, PI and unexpanded entity children are required |
[attributes] | May be combined with [namespace attributes] |
[namespace attributes] | May be combined with [attributes] |
[base URI](*) | The URI of the containing entity |
Attribute Information Item | |
---|---|
Property | Notes |
[local name](*) | May be combined with [prefix] |
[prefix](*) | May be combined with [local name] |
[normalized value] | |
[attribute type] | Only so that entity and notation values can be identified |
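Where a parser combines [prefix] and [local name] into a single qualified name, as the notes for elements and attributes allow, an application can recover the two properties itself. The following is a minimal sketch of that split, not part of any particular parser's API; the function name `split_qname` is hypothetical, and it assumes the name has already been checked to be a legal qualified name.

```python
def split_qname(qname):
    """Split a combined name into ([prefix], [local name]).

    Returns None for the prefix when the name has no colon,
    mirroring an Infoset property that has no value.
    Assumes qname is already a well-formed qualified name.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        # No colon: the whole string is the local name.
        return None, qname
    return prefix, local


if __name__ == "__main__":
    print(split_qname("xsl:template"))
    print(split_qname("para"))
```

Returning a pair keeps the two properties distinct even though the parser reported them as one string.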
Processing Instruction Information Item | |
---|---|
Property | Notes |
[target] | |
[content] | |
[base URI](*) | The URI of the containing entity |
Unexpanded Entity Information Item | |
---|---|
A WDR parser will only return these if an entity is undeclared or cannot be retrieved. Some WDR parsers reject the document in this case. | |
Property | Notes |
[name] | |
[entity] | |
Character Information Item | |
---|---|
Property | Notes |
[character code] | |
External Entity Information Item | |
---|---|
A WDR parser need only return these if it returns unexpanded entity items that refer to them. | |
Property | Notes |
[name] | |
[system identifier] | |
[public identifier] | |
Unparsed Entity Information Item | |
---|---|
Property | Notes |
[name] | |
[system identifier] | |
[public identifier] | |
[notation] | |
Notation Information Item | |
---|---|
Property | Notes |
[name] | |
[system identifier] | |
[public identifier] | |
Minimal parsers return unexpanded entity items instead of the content of external entities.
Validating parsers must return the following additional property:
Character Information Item | |
---|---|
Property | Notes |
[element content whitespace] | |
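To illustrate what the [element content whitespace] property distinguishes, here is a rough sketch using Python's `xml.dom.minidom`. It is not a validating parser: the set `element_only`, naming the elements declared with element content, is a hypothetical input that a real validating parser would derive from the DTD. The sketch reports one flag per run of character data rather than per character.

```python
import xml.dom.minidom


def flag_whitespace(xml_text, element_only):
    """Return (parent name, text, flag) for each run of character data.

    flag is True when the text is all white space and its parent is one
    of the elements in element_only (assumed to have element content).
    """
    doc = xml.dom.minidom.parseString(xml_text)
    flagged = []

    def walk(elem):
        for child in elem.childNodes:
            if child.nodeType == child.TEXT_NODE:
                all_space = child.data.strip(" \t\r\n") == ""
                flagged.append(
                    (elem.tagName, child.data, all_space and elem.tagName in element_only)
                )
            elif child.nodeType == child.ELEMENT_NODE:
                walk(child)

    walk(doc.documentElement)
    return flagged


if __name__ == "__main__":
    sample = "<list>\n  <item> </item>\n</list>"
    for entry in flag_whitespace(sample, {"list"}):
        print(entry)
```

In the sample, the white space directly inside `list` is flagged, while the space inside `item` is not, because `item` is not assumed to have element content.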
Parsers supporting XML Namespaces must return the following additional properties:
Element Information Item | |
---|---|
Property | Notes |
[namespace name](*) | |
[in-scope namespaces](*) | This may be returned implicitly by means of the [namespace attributes] property |
Attribute Information Item | |
---|---|
Property | Notes |
[namespace name](*) | |
Furthermore, provided that they return the [in-scope namespaces] property of elements, they need not return the [namespace attributes] property.
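The relationship between the two properties can be sketched as follows: given only the namespace declaration attributes on an element and its ancestors, the in-scope namespaces can be reconstructed. This is an illustrative sketch using Python's `xml.dom.minidom`, which keeps `xmlns` declarations as ordinary attributes; the function name `in_scope_namespaces` is hypothetical.

```python
import xml.dom.minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"


def in_scope_namespaces(elem):
    """Map each in-scope prefix ('' for the default namespace) to its name."""
    # Collect the element and its ancestor elements, outermost first.
    chain = []
    node = elem
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        chain.append(node)
        node = node.parentNode
    chain.reverse()

    scope = {"xml": XML_NS}  # the xml prefix is always implicitly bound
    for ancestor in chain:
        attrs = ancestor.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            if attr.name == "xmlns":
                prefix = ""
            elif attr.name.startswith("xmlns:"):
                prefix = attr.name[len("xmlns:"):]
            else:
                continue  # not a namespace declaration
            if attr.value:
                scope[prefix] = attr.value  # inner declarations override outer
            else:
                scope.pop(prefix, None)  # xmlns="" undeclares the default
    return scope


if __name__ == "__main__":
    doc = xml.dom.minidom.parseString(
        '<a xmlns="urn:x" xmlns:p="urn:p"><b xmlns:p="urn:q"><c/></b></a>'
    )
    print(in_scope_namespaces(doc.getElementsByTagName("c")[0]))
```

A parser that reports the resulting mapping directly spares the application this walk up the ancestor chain.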
Parsers supporting XML Base must take xml:base attributes into account when computing [base URI] properties.
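As a rough sketch of this computation, the following uses Python's `xml.etree.ElementTree` to resolve `xml:base` attributes against the inherited base URI. It assumes a document consisting of a single entity whose URI is passed in as `doc_uri`; the function name `base_uris` and the example URIs are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(doc_uri, xml_text):
    """Return (element name, base URI) pairs in document order.

    Assumes the whole document is one entity located at doc_uri.
    """
    root = ET.fromstring(xml_text)
    result = []

    def walk(elem, inherited):
        base = inherited
        declared = elem.get(XML_BASE)
        if declared is not None:
            # A relative xml:base is resolved against the inherited base URI.
            base = urljoin(inherited, declared)
        result.append((elem.tag, base))
        for child in elem:
            walk(child, base)

    walk(root, doc_uri)
    return result


if __name__ == "__main__":
    sample = '<doc xml:base="http://example.org/a/"><p xml:base="b/"><q/></p><r/></doc>'
    for name, base in base_uris("http://example.org/doc.xml", sample):
        print(name, base)
```

Without XML Base support, every element in this single-entity example would simply report `doc_uri`, the URI of the containing entity.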
Richard Tobin, February 2000