A Proposal for Namespaces in XML
Henry S. Thompson
Language Technology Group
HCRC
2 Buccleuch Place
Edinburgh EH8 9LW
Scotland
1. Introduction

I want to be able to include material from a document whose DTD I don't control in a document whose DTD I do control, and to point to identifiers in the included material. I agree with Tim's principle of element-centring, and I think Martin's proposal is basically on the right lines, but it is a bit too inflexible in focussing on external entities with particular public identifiers.

The proposal which follows has two components: an explicit, prolix syntax based on marked sections, and an element-centred short form.

2. New Syntax Part I: Namespace Sections

I propose to add a single new production:

namespaceSection ::= '<!NS[' %Name '[' (%markupdecl*|%content*) ']]>'

Existing productions would need to be changed to allow namespaceSection in expansions of markupdecl and content.
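To make the shape of the production concrete, here is a minimal Python sketch of a scanner that splits text into plain runs and (possibly nested) namespace sections. It is illustrative only: the class and function names are hypothetical, whitespace around the Name is tolerated because the examples use it (the production as written shows none), and it assumes no other `]]>`-terminated construct such as CDATA or an INCLUDE section occurs inside a namespace section.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union

# '<!NS[' Name '[' ... ']]>'. Whitespace around the Name is tolerated because the
# examples use it; that tolerance is an assumption, not part of the production.
OPEN = re.compile(r"<!NS\[\s*([A-Za-z_][\w.-]*)\s*\[")
CLOSE = "]]>"


@dataclass
class NamespaceSection:
    name: str
    children: List[Union[str, "NamespaceSection"]] = field(default_factory=list)


def parse(text: str) -> List[Union[str, NamespaceSection]]:
    """Split text into plain runs and (possibly nested) namespace sections.

    Assumes no other ']]>'-terminated construct (CDATA, INCLUDE) occurs
    inside a namespace section.
    """
    root: List[Union[str, NamespaceSection]] = []
    stack = [root]  # stack[-1] is the child list of the innermost open section
    pos = 0
    while pos < len(text):
        m = OPEN.search(text, pos)
        # Only look for a close delimiter while some section is open.
        close = text.find(CLOSE, pos) if len(stack) > 1 else -1
        if m and (close == -1 or m.start() < close):
            if m.start() > pos:
                stack[-1].append(text[pos:m.start()])
            section = NamespaceSection(m.group(1))
            stack[-1].append(section)
            stack.append(section.children)
            pos = m.end()
        elif close != -1:
            if close > pos:
                stack[-1].append(text[pos:close])
            stack.pop()
            pos = close + len(CLOSE)
        else:
            stack[-1].append(text[pos:])
            pos = len(text)
    if len(stack) != 1:
        raise ValueError("unterminated namespace section")
    return root
```

For instance, `parse("<p><!NS[ P1 [<w id=w1>Now</w>]]></p>")` yields the run `<p>`, a section named `P1` containing `<w id=w1>Now</w>`, and the run `</p>`. A real parser would of course fold this into the markupdecl and content productions rather than pre-scanning text.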

Consider a namespace section such as

<!NS[ some-identifier [

. . . arbitrary XML . . .

]]>

Syntactically this is meant to be very similar to, e.g., an INCLUDE marked section, but it has the additional effect that every Name which appears inside it (i.e. element GIs, attribute names, identifiers, enumerated types) is particular to the namespace identified by 'some-identifier'. Inside the namespace section this has no consequences at all. Outside it there are two:

1. Apparently identical names are not actually identical. For example, in

<book id=b1>Troilus and Cressida</book>
<!NS[ excelbooks [
<book id=b1><sheet>....</book>
]]>

not only are the two IDs not in conflict, but the two books are different element types as well.

2. From outside a namespace section, you can refer to Names inside it only via a fully qualified name, e.g.

excelbooks:b1

Note that, following the recent consensus on the list, I've used colon (':') as the name-qualification character.
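The two consequences can be illustrated with a small Python sketch over a hypothetical flattened view of the book example. It assumes that qualification simply prefixes the section identifier and a colon to every Name inside the section; the helper names are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def qualify(name: str, ns: Optional[str]) -> str:
    """A Name inside namespace section `ns` becomes 'ns:name'; outside, it is unchanged."""
    return name if ns is None else f"{ns}:{name}"


# Hypothetical flattened view of the book example: (enclosing section, GI, ID value).
BOOKS: List[Tuple[Optional[str], str, str]] = [
    (None, "book", "b1"),
    ("excelbooks", "book", "b1"),
]


def build_id_table(elements: List[Tuple[Optional[str], str, str]]) -> Dict[str, str]:
    """Map each qualified ID to the qualified GI of the element carrying it."""
    table: Dict[str, str] = {}
    for ns, gi, id_value in elements:
        key = qualify(id_value, ns)
        if key in table:
            raise ValueError(f"duplicate ID: {key}")
        table[key] = qualify(gi, ns)
    return table


ids = build_id_table(BOOKS)
# From outside, the bare 'b1' can only reach the outer book;
# the inner one is reachable only as 'excelbooks:b1'.
print(ids["b1"], ids["excelbooks:b1"])  # book excelbooks:book
```

No duplicate-ID error is raised, and the two books end up with distinct element types, which is exactly the first consequence; the second follows because the only key for the inner book is the qualified one.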

Here's an extended example, which uses two namespace sections, one to declare element types etc. and one to use them:

Full example:

Target document:

<!doctype target SYSTEM "[sysid1]" [
<!entity body SYSTEM "[sysid2]">
]>
&body;

Matrix document:

<!doctype matrix ... [
<!NS[ target [
<!entity % targdtd SYSTEM "[sysid1]">
%targdtd;
]]>
<!element embed - - (target:body)>
]>
<matrix>
. . .
<embed>
<!NS[ target [
&body;
]]>
</embed>
. . .
<xref refid=target:id7>
. . .
</matrix>

I think I agree with Andrew Layman that, from outside, you should be able to refer to a Name inside a namespace section only with a qualified Name, even when the Name is not defined in the referring context. Allowing an unqualified reference to fall back to a namespace section would be non-monotonic in an unreasonable way: the referent of an IDREF might change simply because you add some new text to your document.

On the other hand, it might be worth allowing references out of a namespace section to the unmarked enclosing document with a null prefix, e.g. :higherId, although it's not clear that this would be very useful.
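The non-monotonicity argument against unqualified fallback references can be made concrete with a short Python sketch. Both resolution rules below are hypothetical models written for illustration: the first lets a bare reference fall back to any namespace section that defines it, the second (the rule argued for above) lets bare references see only the outer document.

```python
from typing import Dict, Optional, Set

Sections = Dict[str, Set[str]]  # section name -> IDs defined inside it


def _qualified(ref: str, sections: Sections) -> Optional[str]:
    ns, name = ref.split(":", 1)
    return ref if name in sections.get(ns, set()) else None


def resolve_with_fallback(ref: str, outer: Set[str], sections: Sections) -> Optional[str]:
    """The rejected rule: a bare ref not defined outside falls back to a section."""
    if ":" in ref:
        return _qualified(ref, sections)
    if ref in outer:
        return ref
    for ns, names in sections.items():
        if ref in names:
            return f"{ns}:{ref}"
    return None


def resolve_qualified_only(ref: str, outer: Set[str], sections: Sections) -> Optional[str]:
    """The proposed rule: bare refs see only the outer document."""
    if ":" in ref:
        return _qualified(ref, sections)
    return ref if ref in outer else None


sections: Sections = {"target": {"id7"}}
before: Set[str] = set()   # the matrix document defines no id7 ...
after: Set[str] = {"id7"}  # ... until someone adds an element with id=id7

print(resolve_with_fallback("id7", before, sections))          # target:id7
print(resolve_with_fallback("id7", after, sections))           # id7 (silently retargeted)
print(resolve_qualified_only("target:id7", before, sections))  # target:id7
print(resolve_qualified_only("target:id7", after, sections))   # target:id7
```

Under the qualified-only rule, adding text can at most make a dangling bare reference resolve; it can never redirect a reference that already resolved, which is the monotonicity property at stake.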

3. Element-based namespace scoping

The overhead, both conceptual and literal, of using namespace sections within a document instance is acceptable for importing a single large sub-document, as in the example above. It becomes less acceptable for DTD fragment use, as illustrated by a number of recent examples from Andrew Layman, Martin Bryan and others.

If we assume that the documents we have in mind to construct using DTD fragments are to be valid (or at least validatable), then I claim no additional syntax is required: all that is necessary is to define instance validity in terms of automatic namespace scoping within explicitly qualified element GIs and attribute names. That is, formally

What this means in practice is that fully qualified names pass their namespaces to their descendants down the grove.
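A minimal Python sketch of this scoping rule, covering element GIs only (attribute names are not modelled), might look as follows. The `Node` type, the function name and the CALS-flavoured example document are all hypothetical; the point is just that a node's namespace is found by looking at its nearest qualified ancestor.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    gi: str  # GI as written, e.g. "cals:table" or just "row"
    children: List["Node"] = field(default_factory=list)


def scoped_names(node: Node, inherited: Optional[str] = None) -> List[Tuple[str, Optional[str]]]:
    """Return (local name, namespace) for every node, in document order.

    A qualified GI sets the namespace for itself and all its descendants;
    an unqualified GI takes the namespace of its nearest qualified ancestor
    (None if there is none, i.e. the matrix document's own namespace).
    """
    if ":" in node.gi:
        ns, local = node.gi.split(":", 1)
    else:
        ns, local = inherited, node.gi
    result = [(local, ns)]
    for child in node.children:
        result.extend(scoped_names(child, ns))
    return result


# Hypothetical document: a matrix <report> importing a CALS table fragment.
doc = Node("report", [
    Node("para"),
    Node("cals:table", [Node("row", [Node("entry")])]),
])
for local, ns in scoped_names(doc):
    print(local, ns)
```

Only `table` is written with a prefix, yet `row` and `entry` are validated against the cals declarations, while `para` stays in the matrix namespace.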

4. Differences between this Proposal and CONCUR-based Proposals

As I see it, the crucial difference is that neither authors nor parsers need to worry about marking namespaces on every element and attribute, and that document modifications are monotonic: you can't break existing documents simply by adding things to their DTD. In my view, losing local transparency is an acceptable price to pay for this. That is, in isolation you can't tell what the namespace of a node is; you need to be able to check its ancestors.

Note further that this approach shares a nice property with Martin's: it allows multiple fragments (e.g. CALS fragments) to be imported into the same namespace.

5. One Thing That's Still Missing

If I just want namespaces to allow me to reuse simple IDs, I have to go to a lot of trouble. This is not a big deal, just a matter of elegance. Suppose I want to tokenise all the paragraphs in a document, using the same IDs repeatedly in each. Namespaces nearly, but not quite, do what I want:

<p>
<!NS[ P1 [
<w id=w1>Now</w>
<w id=w2>is</w>
. . .
<w id=w16>party</w>
]]>
</p>
<p>
<!NS[ P2 [
<w id=w1>We</w>
<w id=w2>have</w>
<w id=w3>nothing</w>
. . .
<w id=w8>itself</w>
]]>
</p>

This isn't actually valid, alas, if all I have in the DTD is

<!element p (w*)>

because P1:W and P2:W, which are the GIs which really occur in the instance, are not allowed in P. Either I have to include a disjunction over all the qualified forms of W I intend to use, or I have to use :W everywhere instead of W.

I think this need is marginal enough that the second workaround is acceptable, which is to say I haven't thought of a hack to address it that I think I can sell :-)
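The validity problem and the `:W` workaround can be sketched in Python. This is a toy model under stated assumptions: names are kept lowercase (an SGML parser would fold them to `W`), the content-model check is hard-coded to `(w*)` rather than being a general validator, and reading a null prefix as "the unmarked enclosing document" borrows the :higherId suggestion from section 2.

```python
from typing import List, Optional


def qualify(written: str, section: Optional[str]) -> str:
    """Effective form of a Name written inside namespace section `section`."""
    if written.startswith(":"):
        return written[1:]      # null prefix: the unmarked enclosing document
    if ":" in written or section is None:
        return written          # already qualified, or not inside any section
    return f"{section}:{written}"


def valid_p(child_gis: List[str]) -> bool:
    """Hard-coded check for <!element p (w*)>: zero or more unqualified w."""
    return all(gi == "w" for gi in child_gis)


# Writing plain <w> inside <!NS[ P1 [ ... ]]> yields P1:w, which (w*) rejects.
print(valid_p([qualify("w", "P1"), qualify("w", "P1")]))  # False
# The ':w' workaround escapes to the outer w, so both paragraphs validate ...
print(valid_p([qualify(":w", "P1")]), valid_p([qualify(":w", "P2")]))  # True True
# ... while the IDs are still qualified, so w1 can be reused per paragraph.
print(qualify("w1", "P1"), qualify("w1", "P2"))  # P1:w1 P2:w1
```

The sketch shows why the workaround is tolerable: only the GIs need the escape, while the IDs, which are what the namespaces were wanted for, keep their per-paragraph qualification.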

6. Conclusion

I recognise that this really breaks SGML. If there's any chance that WG8 will grandfather this, I think it's worth it. To me, going the CONCUR route just for compatibility is not worth it; I'd rather have no official namespace mechanism at all than that one.