Reporting requirements for XML parsers

Introduction

In this paper I suggest reporting requirements for various categories of XML parser. These requirements are expressed in terms of the XML Infoset.

Parser categories

I consider three common categories of XML parser:

Minimal parser
Reads only the document entity. Does not process any parameter entities. Accepts any well-formed document.
Whole-document-reading (WDR) parser
Reads the whole document, including all parameter and general parsed entities. May reject documents that refer to undeclared entities even when this is technically only a validity error.
Validating parser
A WDR parser which checks validity constraints and satisfies certain additional reporting requirements. May reject invalid documents.

In addition, a parser may support XML Namespaces and/or XML Base.

Reporting requirements for WDR parsers

Properties marked (*) are not explicitly required by the XML 1.0 specification.

Document Information Item
Property          Notes
[children] (*)    Only element and PI children are required
[notations]       Only notations used as attribute values or PI targets are required
[entities]        Only unparsed entities used as attribute values are required

Element Information Item
Property                  Notes
[local name] (*)          May be combined with [prefix]
[prefix] (*)              May be combined with [local name]
[children] (*)            Only element, PI and unexpanded entity children are required
[attributes]              May be combined with [namespace attributes]
[namespace attributes]    May be combined with [attributes]
[base URI] (*)            The URI of the containing entity
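A parser without namespace support will typically report [prefix] and [local name] combined as one qualified name, and report xmlns attributes mixed in with ordinary ones. The following is a minimal sketch, assuming Python's pyexpat in non-namespace mode, of how a consumer could separate the combined values afterwards; the helpers split_qname, split_attributes and report_elements are hypothetical names, not part of any parser API.

```python
import xml.parsers.expat


def split_qname(qname):
    """Split a combined qualified name into ([prefix], [local name])."""
    prefix, sep, local = qname.partition(":")
    return (prefix, local) if sep else (None, qname)


def split_attributes(attrs):
    """Separate [namespace attributes] from ordinary [attributes]."""
    ns_attrs, ordinary = {}, {}
    for qname, value in attrs.items():
        if qname == "xmlns" or qname.startswith("xmlns:"):
            ns_attrs[qname] = value
        else:
            ordinary[qname] = value
    return ns_attrs, ordinary


def report_elements(data):
    """Parse without namespace processing and split each start tag."""
    reports = []
    # No namespace_separator: expat passes names and xmlns attributes through as-is.
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        ns_attrs, ordinary = split_attributes(attrs)
        reports.append((split_qname(name), ns_attrs, ordinary))

    parser.StartElementHandler = start
    parser.Parse(data, True)
    return reports


reports = report_elements(b'<x:para xmlns:x="urn:example" id="p1"/>')
# reports == [(("x", "para"), {"xmlns:x": "urn:example"}, {"id": "p1"})]
```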

Attribute Information Item
Property              Notes
[local name] (*)      May be combined with [prefix]
[prefix] (*)          May be combined with [local name]
[normalized value]
[attribute type]      Only so that entity and notation values can be identified
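Since [attribute type] is needed only to pick out values that name unparsed entities or notations, a parser can get by with a lookup from declared attribute types. A minimal sketch, assuming pyexpat, which reports each attribute definition through AttlistDeclHandler; the function name entity_valued_attributes and the example DTD are illustrative. Undeclared attributes are treated as CDATA, as XML 1.0 requires of non-validating processors.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ATTLIST doc pic ENTITY #IMPLIED title CDATA #IMPLIED>
]>
<doc pic="logo" title="Annual report"/>
"""


def entity_valued_attributes(data):
    """Return (element, attribute, raw value) for ENTITY, ENTITIES or NOTATION attributes."""
    types = {}  # (element name, attribute name) -> declared type string
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def attlist(elname, attname, attr_type, default, required):
        types[(elname, attname)] = attr_type

    def start(name, attrs):
        for attname, value in attrs.items():
            attr_type = types.get((name, attname), "CDATA")
            if attr_type in ("ENTITY", "ENTITIES") or attr_type.startswith("NOTATION"):
                # ENTITIES values are left unsplit (space-separated names).
                found.append((name, attname, value))

    parser.AttlistDeclHandler = attlist
    parser.StartElementHandler = start
    parser.Parse(data, True)
    return found
```

Only "pic" is reported here; "title" is plain CDATA and needs no further treatment.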

Processing Instruction Information Item
Property          Notes
[target]
[content]
[base URI] (*)    The URI of the containing entity
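Both element and PI items carry the URI of the entity that contains them, so a WDR parser needs to know which entity is open at each point. A minimal sketch, assuming pyexpat: the document entity's URI is set with SetBase, each external entity is read by a child parser from ExternalEntityParserCreate (which inherits the parent's handlers), and a stack records the entity currently being read. The in-memory ENTITIES table and the example.org URIs are hypothetical stand-ins for real retrieval.

```python
import xml.parsers.expat
from urllib.parse import urljoin

MAIN_URI = "http://example.org/book/main.xml"
MAIN = b"""<?xml version="1.0"?>
<!DOCTYPE book [<!ENTITY ch1 SYSTEM "ch/1.xml">]>
<book><?toc depth="2"?>&ch1;</book>
"""
# Hypothetical stand-in for fetching external entities.
ENTITIES = {"http://example.org/book/ch/1.xml": b"<chapter><?note draft?></chapter>"}


def items_with_base(data, uri):
    """Return (kind, name or target, [base URI]) for every element and PI."""
    items = []
    parsers = [xml.parsers.expat.ParserCreate()]  # active parsers, innermost last
    bases = [uri]                                 # open entity URIs, innermost last

    def start(name, attrs):
        items.append(("element", name, bases[-1]))

    def pi(target, content):
        items.append(("pi", target, bases[-1]))

    def external_ref(context, base, system_id, public_id):
        entity_uri = urljoin(base or "", system_id)
        child = parsers[-1].ExternalEntityParserCreate(context)
        child.SetBase(entity_uri)
        parsers.append(child)
        bases.append(entity_uri)
        try:
            child.Parse(ENTITIES[entity_uri], True)
        finally:
            parsers.pop()
            bases.pop()
        return 1  # non-zero: the reference was handled

    parser = parsers[0]
    parser.SetBase(uri)
    parser.StartElementHandler = start
    parser.ProcessingInstructionHandler = pi
    parser.ExternalEntityRefHandler = external_ref
    parser.Parse(data, True)
    return items
```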

Unexpanded Entity Information Item
A WDR parser will only return these if an entity is undeclared or cannot be retrieved. Some WDR parsers reject the document in this case.
Property    Notes
[name]
[entity]
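A WDR parser that chooses not to reject the document can fall back to an unexpanded entity item when an external entity cannot be retrieved. A minimal sketch, assuming pyexpat; the AVAILABLE table is a hypothetical stand-in for retrieval, and entity names are recovered from EntityDeclHandler because the reference callback does not document the name as a separate argument. Nested references inside a loaded entity are not handled, for brevity.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY intro SYSTEM "intro.xml">
  <!ENTITY lost PUBLIC "-//Example//ENTITIES Lost//EN" "lost.xml">
]>
<doc>&intro;&lost;</doc>
"""
# Hypothetical retrieval table: "lost.xml" is deliberately missing.
AVAILABLE = {"intro.xml": b"<intro/>"}


def wdr_parse(data):
    elements, unexpanded = [], []
    names = {}  # (system id, public id) -> entity name, from the declarations
    parser = xml.parsers.expat.ParserCreate()

    def entity_decl(name, is_pe, value, base, system_id, public_id, notation):
        if system_id is not None and not is_pe and notation is None:
            names[(system_id, public_id)] = name

    def start(name, attrs):
        elements.append(name)

    def external_ref(context, base, system_id, public_id):
        content = AVAILABLE.get(system_id)
        if content is None:
            # Cannot be retrieved: report an unexpanded entity item
            # ([name], [entity]) where [entity] is (name, system id, public id).
            name = names.get((system_id, public_id))
            unexpanded.append((name, (name, system_id, public_id)))
            return 1
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(content, True)
        return 1

    parser.EntityDeclHandler = entity_decl
    parser.StartElementHandler = start
    parser.ExternalEntityRefHandler = external_ref
    parser.Parse(data, True)
    return elements, unexpanded
```

The second element of each unexpanded item doubles as the External Entity Information Item that, as described below, the parser then also has to return.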

Character Information Item
Property            Notes
[character code]

External Entity Information Item
A WDR parser need only return these if it returns unexpanded entity items that refer to them.
Property               Notes
[name]
[system identifier]
[public identifier]

Unparsed Entity Information Item
Property               Notes
[name]
[system identifier]
[public identifier]
[notation]

Notation Information Item
Property               Notes
[name]
[system identifier]
[public identifier]
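The unparsed entity and notation items can be filled in directly from the DTD declarations. A minimal sketch, assuming pyexpat's NotationDeclHandler and UnparsedEntityDeclHandler; representing each item as a tuple keyed by name is an illustrative choice, not anything the Infoset prescribes.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION gif PUBLIC "-//Example//NOTATION GIF//EN" "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<doc/>
"""


def declared_items(data):
    notations = {}  # [name] -> ([system identifier], [public identifier])
    unparsed = {}   # [name] -> ([system identifier], [public identifier], [notation])
    parser = xml.parsers.expat.ParserCreate()

    def notation_decl(name, base, system_id, public_id):
        notations[name] = (system_id, public_id)

    def unparsed_entity_decl(name, base, system_id, public_id, notation):
        unparsed[name] = (system_id, public_id, notation)

    parser.NotationDeclHandler = notation_decl
    parser.UnparsedEntityDeclHandler = unparsed_entity_decl
    parser.Parse(data, True)
    return notations, unparsed
```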

Reporting requirements for minimal parsers

Minimal parsers return unexpanded entity items instead of the content of external entities.
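Internal entities are part of the document entity and are still expanded; only external entities are left unread. A minimal sketch, assuming pyexpat configured never to read parameter entities, with a reference callback that records the external entity rather than opening it; minimal_parse is a hypothetical name.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY co "ACME">
  <!ENTITY ch1 SYSTEM "ch1.xml">
]>
<doc>&co;&ch1;</doc>
"""


def minimal_parse(data):
    text, unexpanded = [], []
    parser = xml.parsers.expat.ParserCreate()
    # Read only the document entity: never fetch external parameter entities.
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_NEVER)

    def characters(chunk):
        text.append(chunk)  # includes the expanded text of internal entities

    def external_ref(context, base, system_id, public_id):
        # Report the reference instead of reading the external entity.
        unexpanded.append((system_id, public_id))
        return 1

    parser.CharacterDataHandler = characters
    parser.ExternalEntityRefHandler = external_ref
    parser.Parse(data, True)
    return "".join(text), unexpanded
```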

Reporting requirements for validating parsers

Validating parsers must return the following additional property:

Character Information Item
Property                        Notes
[element content whitespace]
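[element content whitespace] depends on the element declarations: whitespace is flagged when its parent was declared with element-only content. Expat does not validate, so the following is only a sketch, assuming the document is valid and using pyexpat's ElementDeclHandler content models; character_items is a hypothetical name. It also shows [character code] for each character.

```python
import xml.parsers.expat
from xml.parsers.expat import model

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE list [
  <!ELEMENT list (item*)>
  <!ELEMENT item (#PCDATA)>
]>
<list>
  <item> a </item>
</list>
"""


def character_items(data):
    """Return ([character code], [element content whitespace]) pairs."""
    element_content = set()  # elements declared with a sequence or choice model
    stack, items = [], []
    parser = xml.parsers.expat.ParserCreate()

    def element_decl(name, content_model):
        # content_model is (type, quantifier, name, children).
        if content_model[0] in (model.XML_CTYPE_CHOICE, model.XML_CTYPE_SEQ):
            element_content.add(name)

    def start(name, attrs):
        stack.append(name)

    def end(name):
        stack.pop()

    def characters(chunk):
        if not stack:
            return  # outside the document element: not content
        in_element_content = stack[-1] in element_content
        for ch in chunk:
            items.append((ord(ch), in_element_content and ch in " \t\r\n"))

    parser.ElementDeclHandler = element_decl
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = characters
    parser.Parse(data, True)
    return items
```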

Reporting requirements for parsers supporting XML Namespaces

Parsers supporting XML Namespaces must return the following additional properties:

Element Information Item
Property                     Notes
[namespace name] (*)
[in-scope namespaces] (*)    This may be returned implicitly by means of the [namespace attributes] property

Attribute Information Item
Property                Notes
[namespace name] (*)

Furthermore, provided that they return the [in-scope namespaces] property of elements, they need not return the [namespace attributes] property.
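[in-scope namespaces] can be maintained from namespace declaration events, which is what makes [namespace attributes] redundant. A minimal sketch, assuming pyexpat with a space as namespace_separator so that names arrive as "namespace-name local-name"; the scope stack and the namespace_report name are illustrative. The xml prefix is always in scope, and None stands for the default namespace.

```python
import xml.parsers.expat

DOC = b"""<doc xmlns="urn:example:doc" xmlns:x="urn:example:x">
  <x:para x:lang="en" note="plain"/>
</doc>"""

XML_NS = "http://www.w3.org/XML/1998/namespace"


def namespace_report(data):
    elements, attributes = [], []
    scopes = [{"xml": XML_NS}]  # in-scope namespaces, innermost last
    pending = {}                # declarations seen for the next start tag
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

    def split(name):
        uri, sep, local = name.rpartition(" ")
        return (uri, local) if sep else (None, name)

    def ns_start(prefix, uri):
        pending[prefix] = uri  # prefix is None for the default namespace

    def start(name, attrs):
        scope = dict(scopes[-1])
        scope.update(pending)
        pending.clear()
        scopes.append(scope)
        elements.append((split(name), scope))
        for attname in attrs:  # xmlns attributes are consumed, not reported
            attributes.append(split(attname))

    def end(name):
        scopes.pop()

    parser.StartNamespaceDeclHandler = ns_start
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(data, True)
    return elements, attributes
```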

Reporting requirements for parsers supporting XML Base

Parsers supporting XML Base must take xml:base attributes into account when computing [base URI] properties.
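Under XML Base, an element's [base URI] is its xml:base value resolved against the parent's base URI, starting from the URI of the containing entity. A minimal sketch, assuming pyexpat without namespace processing (so the attribute arrives as "xml:base") and a document held in a single entity; element_base_uris and the example.org URIs are hypothetical.

```python
import xml.parsers.expat
from urllib.parse import urljoin

DOC = b"""<doc xml:base="http://example.org/docs/">
  <section xml:base="guide/">
    <link/>
  </section>
  <section/>
</doc>"""


def element_base_uris(data, entity_uri):
    """Return (element name, [base URI]) honouring xml:base attributes."""
    bases = [entity_uri]  # starts from the URI of the containing entity
    result = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        base = bases[-1]
        if "xml:base" in attrs:
            base = urljoin(base, attrs["xml:base"])
        bases.append(base)
        result.append((name, base))

    def end(name):
        bases.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(data, True)
    return result
```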

Richard Tobin, February 2000