Clarification/Elaboration of XML Schema requirements

Henry S. Thompson

14 November 1998

1. A4: Define relationship of schemata to XML document instances

The group requested clarification.

2. Anew: Identifier renaming

SGML/(XML) architectural forms provide a number of facilities, including various forms of content model subsetting and the renaming of elements and attributes. All of these are really application-side requirements, and so are perhaps not quite the same as A12, which is in my view author-side.
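To make the renaming facility concrete, here is a minimal sketch in Python of an application-side view of a document in which declared element and attribute names are mapped onto the names the application expects. The mapping tables, the element names and the `renamed` function are all hypothetical, invented for this illustration; this is not how architectural forms actually declare renamings.

```python
import xml.etree.ElementTree as ET

# Hypothetical renaming declarations: document name -> application name.
ELEMENT_NAMES = {"para": "p", "titel": "title"}
ATTRIBUTE_NAMES = {"ident": "id"}

def renamed(elem):
    """Return a copy of elem using the application's names throughout.

    Names with no declared mapping pass through unchanged.
    """
    attrs = {ATTRIBUTE_NAMES.get(k, k): v for k, v in elem.attrib.items()}
    out = ET.Element(ELEMENT_NAMES.get(elem.tag, elem.tag), attrs)
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(renamed(child))
    return out

src = ET.fromstring('<doc><titel>Hi</titel><para ident="p1">text</para></doc>')
view = renamed(src)
```

The source document is left untouched; the application works on `view`, in which `para` appears as `p` and `ident` as `id`.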

3. B5: [Enable applications to use schemata to filter input documents]

Goal: Application designers should be able to use schemata to define what they require of documents. Documents which provide more than that should still be processable.
Decomposition: Application designers need to be able to specify how documents can 'go beyond' their requirements (cf. D3: Open Content Models, non-required attributes). Application users need to be able to identify what parts of their (richer) schemata satisfy the application schema requirements (cf. Identifier renaming, 'kind of' assertion).
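As a rough illustration of this goal, the following Python sketch treats an application's requirements as a minimal set of attributes and child elements per element type, and accepts any document which provides at least that much. The `REQUIRED` table, the element names and the `satisfies` function are assumptions made up for the example; no schema proposal defines them.

```python
import xml.etree.ElementTree as ET

# Hypothetical application requirements: for each element type, the
# attributes and child element types the application cannot do without.
REQUIRED = {
    "order": {"attrs": {"id"}, "children": {"item"}},
    "item": {"attrs": {"sku"}, "children": set()},
}

def satisfies(elem):
    """True if elem and all its descendants meet REQUIRED.

    Anything not mentioned in REQUIRED is treated as open content
    and is simply ignored.
    """
    spec = REQUIRED.get(elem.tag)
    if spec is not None:
        if not spec["attrs"] <= set(elem.attrib):
            return False
        if not spec["children"] <= {child.tag for child in elem}:
            return False
    return all(satisfies(child) for child in elem)

rich = ET.fromstring(
    '<order id="42" priority="high">'
    '<note>rush</note><item sku="A1" colour="red"/></order>'
)
poor = ET.fromstring('<order id="43"><note>no items</note></order>')
```

Here `satisfies(rich)` holds even though the document carries an extra attribute and an undeclared `note` element, whereas `satisfies(poor)` fails because the required `item` is missing.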

4. B6: [Switch between expression as attribute and expression as sub-element]

In writing several DTDs for schema languages, I have observed that the ElementType element type and the AttributeType element type are very similar. Perhaps schemata would be easier to write and maintain if this similarity were exploited. There are three ways one could imagine doing this:

1. Point this out, and exploit it by defining parameter entities in the Schema DTD for the common parts;
2. Actually abandon the two different element types in favour of, say, ComponentType, and in the content model make clear which sub-components are to be expressed in instances as attributes and which as sub-elements;
3. As (2), but actually change the data model so that, instead of Element and Attribute nodes, what a schema processor presents to applications is Component nodes, with sub-components which are only incidentally differentiated between attribute-expressed and sub-element-expressed. This has at least three sub-cases:
   1. You still have to make explicit in a schema for each sub-component whether it is to be expressed as attribute or sub-element;
   2. You can leave this specification out if the datatype precludes attribute expression;
   3. Unless the datatype precludes this, you can specify that instances can choose on a case-by-case basis which expression to use (RDF allows this).
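The last sub-case, where each instance chooses its own expression, might look like this from the application's side. This is only a sketch in Python: the `component` accessor and the `book`/`title` names are hypothetical, and treating a doubly-expressed sub-component as an error is my assumption rather than anything the requirement settles.

```python
import xml.etree.ElementTree as ET

def component(elem, name):
    """Return the sub-component `name` of elem, however it was expressed.

    Looks for an attribute and for a child element of the same name.
    Raises ValueError if the instance supplies both; returns None if
    it supplies neither.
    """
    as_attr = elem.get(name)
    as_child = elem.find(name)
    if as_attr is not None and as_child is not None:
        raise ValueError(f"{name!r} given as both attribute and sub-element")
    if as_attr is not None:
        return as_attr
    if as_child is not None:
        return as_child.text or ""
    return None

a = ET.fromstring('<book title="Emma"/>')
b = ET.fromstring("<book><title>Emma</title></book>")
```

Both `component(a, "title")` and `component(b, "title")` yield the same string, so the application never sees which expression the author chose.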

5. B7: Validation of documents across links

Stipulate that XML Link provides a way of expressing one or more varieties of transclusion (find my content over there; replace me with what's over there; . . .).

Goal: Allow such links to be transparent to schema-validation, i.e. schema-validity should be assessed on the basis of the result of transclusion as well as on its invocation.
Decomposition: For 'find my content over there', cf. Dnew, CONREF. For 'replace me with what's over there', something new is needed.
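For the 'replace me with what's over there' case, one way to picture the requirement is to expand the links first and then assess validity on both the original and the expanded tree. The Python sketch below does only the expansion. The `include` element, its `ref` attribute and the in-memory `RESOURCES` table are hypothetical stand-ins for whatever XML Link eventually provides, and there is no protection against circular references.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for link resolution: reference -> parsed content.
RESOURCES = {
    "addr.xml": ET.fromstring("<address><city>Edinburgh</city></address>"),
}

def expand(elem):
    """Return a copy of elem in which every <include ref="..."/> child
    is replaced by the (recursively expanded) element it points at."""
    result = ET.Element(elem.tag, dict(elem.attrib))
    result.text, result.tail = elem.text, elem.tail
    for child in elem:
        if child.tag == "include":
            target = expand(RESOURCES[child.get("ref")])
            target.tail = child.tail
            result.append(target)
        else:
            result.append(expand(child))
    return result

doc = ET.fromstring(
    '<person><name>HST</name><include ref="addr.xml"/></person>'
)
expanded = expand(doc)
```

A validator could then be run on `doc` (the invocation) and on `expanded` (the result), which is the two-sided assessment the goal asks for.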

6. C2 (and D2?): Element Subclassing and Inheritance

The group requested subdivision, against a background of the observation that the balance between requirements language and implementation language was skewed too far towards the implementation style.

Accordingly, a number of new candidate requirements follow.

6.1. Goal C2a:

Goal: Provide explicit support for 'kind-of' relations between element types.
Reason: Support good software engineering in Schema design by allowing declaration reuse within and across schemata. Replace common use of parameter entities with a principled mechanism.
Features: If sub is a kind of super, then
- sub is valid in instances wherever super is valid;
- attributes which are valid on instances of super are valid on instances of sub.
Issues: Shadowing vs. intersection vs. union with respect to associated attribute declarations, content models and content datatype. Can an element type be declared as a kind of more than one other type? Does this mechanism preclude the necessity for separate declaration of attribute sets (cf. SOX)?
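The two features above can be sketched in Python as follows. The `TYPES` table and the element type names are hypothetical, and the sketch quietly takes a position on two of the open issues: it assumes a single supertype per element type, and it takes the union of attribute declarations. Other answers are equally possible.

```python
# Hypothetical declarations: each element type names at most one
# supertype and the attributes it declares itself.
TYPES = {
    "block": {"kind_of": None, "attrs": {"id"}},
    "paragraph": {"kind_of": "block", "attrs": {"align"}},
    "quote": {"kind_of": "block", "attrs": {"source"}},
}

def is_kind_of(sub, sup):
    """True if sub is sup, or reaches sup by following kind_of links."""
    while sub is not None:
        if sub == sup:
            return True
        sub = TYPES[sub]["kind_of"]
    return False

def valid_attrs(name):
    """Attributes valid on name: its own plus those of every supertype."""
    attrs = set()
    while name is not None:
        attrs |= TYPES[name]["attrs"]
        name = TYPES[name]["kind_of"]
    return attrs

def allowed_here(tag, content_model):
    """True if tag may appear where content_model lists permitted types."""
    return any(is_kind_of(tag, permitted) for permitted in content_model)
```

With these declarations a `quote` is accepted wherever a content model asks for a `block`, and carries `id` as well as `source`, but a plain `block` is not accepted where a `quote` is required.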

6.2. Goal C2b:

Goal: Provide explicit support for 'kind-of' relations between attribute types.
Reason: As C2a.
Features: If sub is a kind of super, then sub is valid in instances wherever super is valid.

6.3. Goal C2c:

I thought there was something else here, but I can't reconstruct it. Arguably something which attempts to reconstruct what SOX is doing with parameterised declarations belongs here.

7. D3: Support incomplete constraints on element content models

Goal: Allow instances to be valid despite including more than what is declared explicitly.
Reason: See B5. Also necessary to support content model subsumption (see C2a).
Issues: Specify for content model and attributes separately or together? Allow additional material anywhere, at end, at specified loci? Allow for level elision (i.e. if parent requires a child daughter, but makes no mention of any wrapper daughter, should there be a way of saying that child within a wrapper daughter is good enough)?
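The level elision question can be made concrete with a small Python sketch. The `has_child` function and its `max_elided` parameter are hypothetical; they show one possible reading, in which the schema author states how many undeclared wrapper levels may intervene.

```python
import xml.etree.ElementTree as ET

def has_child(parent, tag, max_elided=0):
    """True if `tag` occurs under parent with at most max_elided
    undeclared wrapper levels in between (0 means direct child only)."""
    level = list(parent)
    for _ in range(max_elided + 1):
        if any(e.tag == tag for e in level):
            return True
        # Descend one level: gather the children of every element here.
        level = [grandchild for e in level for grandchild in e]
    return False

direct = ET.fromstring("<parent><child/></parent>")
wrapped = ET.fromstring("<parent><wrapper><child/></wrapper></parent>")
```

Under a strict reading, `has_child(wrapped, "child")` fails; allowing one elided level with `max_elided=1` makes the wrapped form good enough.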

8. D6: Support for alternate encodings of numeric values

Goal: Provide specialised support for constraining the radix in which numeric datatypes are expressed.
Reason: If we don't support lexical constraints in general, this is a case which some applications may well require.
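A radix constraint of this kind amounts to a small lexical check in front of ordinary numeric conversion, as in the Python sketch below. The `LEXICAL` table, the choice of radices and the `parse_integer` function are assumptions for illustration only.

```python
import re

# Hypothetical lexical forms, one per permitted radix.
LEXICAL = {
    2: re.compile(r"[01]+"),
    8: re.compile(r"[0-7]+"),
    10: re.compile(r"[0-9]+"),
    16: re.compile(r"[0-9A-Fa-f]+"),
}

def parse_integer(text, radix):
    """Return the value of text read in the given radix.

    Raises ValueError if the radix is not supported or if text is not
    a bare numeral in that radix.
    """
    if radix not in LEXICAL:
        raise ValueError(f"unsupported radix {radix}")
    if not LEXICAL[radix].fullmatch(text):
        raise ValueError(f"{text!r} is not a base-{radix} numeral")
    return int(text, radix)
```

The explicit pattern is there because Python's `int(text, radix)` is more lenient than a schema would probably want: on its own it accepts surrounding whitespace, a sign, and prefixes such as `0x`.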