Thoughts on Representations, Models, Notations
Henry S. Thompson
HCRC Language Technology Group
University of Edinburgh

6 October 1998

1. Descriptions of the world

We seem to be happy with some sort of limited second-order logic for representing what we know about the world. Individual facts tend to be expressed with N-ary predicates over constants and sets of constants, with the constants modelling individuals in the world, and the predicates modelling properties and relations. We often need second-order predicates between relations, to model e.g. temporal contingency and sequencing.
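As a rough illustration of this kind of representation, here is a minimal Python sketch. The teach predicate and the constants t1, c2, c3 are borrowed from the serialisation examples later in this note; the Fact type, the before predicate and the is_second_order helper are assumptions made purely for the sketch.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    """One predication: a predicate name applied to a tuple of arguments.

    Arguments may be constants (strings), sets of constants (frozensets),
    or other Facts; the last case gives a second-order predication.
    """
    predicate: str
    args: tuple

# First-order facts over constants.
teach_c2 = Fact("teach", ("t1", "c2"))
teach_c3 = Fact("teach", ("t1", "c3"))

# A second-order sequencing predicate holding between two predications
# ("before" is a hypothetical name, not taken from the text).
ordering = Fact("before", (teach_c2, teach_c3))

knowledge_base = {teach_c2, teach_c3, ordering}

def is_second_order(fact):
    """True if any argument of the fact is itself a predication."""
    return any(isinstance(arg, Fact) for arg in fact.args)
```

Because facts are hashable values, the whole description of the world is just a set of them, with no argument privileged over any other.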

1.1. An aside about trees

I'm inclined to include trees as such in my logic, but this is problematic, so we'll go with a second-order sequencing predicate which holds between daughter predications.

I think that's closer to the mark than a single daughter predication between a constant and a sequence.

Quantification over predicates emerges in capturing intentional predicates.

2. Directionality and locality

The evolution of KR in the 1970s focussed on these questions: inference over the kind of logic outlined above was judged both too inefficient and too far from human reasoning, and various forms of KR emerged which localised their predications, i.e. 'attached' a predication to one of its arguments, e.g. semantic nets and description languages.

The crucial point here is that these systems are no different from some second-order logic in terms of their models, but their inference regimes are quite different, and are purported to be much easier to automate.

The price which is paid for this is either redundancy or asymmetry in the cost of question answering (cf. the students/classes/teachers example).

The navigation of data graphs for programmers is just a simple version of the inference problem.
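One reading of the redundancy-or-asymmetry trade-off can be sketched in Python as follows. The teach facts are the ones used in the serialisation examples later in this note; attaching them to the teacher argument, and the function names, are assumptions made for illustration.

```python
# Localised store: each teach predication is attached to its teacher argument.
classes_by_teacher = {
    "t1": {"c2", "c3"},
    "t2": {"c3"},
}

def classes_of(teacher):
    """Cheap direction: a single lookup on the argument that owns the facts."""
    return classes_by_teacher.get(teacher, set())

def teachers_of(cls):
    """Expensive direction: every teacher has to be scanned."""
    return {t for t, classes in classes_by_teacher.items() if cls in classes}

def invert(index):
    """Pay with redundancy instead: build and store the inverse index too."""
    inverse = {}
    for key, values in index.items():
        for value in values:
            inverse.setdefault(value, set()).add(key)
    return inverse

teachers_by_class = invert(classes_by_teacher)
```

Both directions return the same answers; what differs is whether the cost is paid at question time (the scan) or in storage and upkeep (the second index).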

3. Typed feature logics

Over the last ten years or so (computational) linguists have been heavily exploiting and developing an alternative graph model, called typed feature logic, which clearly separates edge labels from node types. It has a number of different surface forms, but models are usually understood to be labelled directed graphs, with all nodes associated with a type in a partially-ordered (lattice-structured) universe of types. Notions of subsumption, unification and path traversal are standardly defined.
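To make subsumption, unification and path traversal concrete, here is a deliberately simplified Python sketch. It assumes a tree-shaped type hierarchy rather than a full lattice, and feature structures that are trees rather than general graphs (no shared substructure, no appropriateness conditions). The type names top and person and all the function names are invented for the sketch; the feature labels takes and supervisedBy come from the dataset later in this note.

```python
# Hypothetical type hierarchy (child -> parent); "top" is the most general type.
PARENT = {
    "person": "top",
    "teacher": "person",
    "student": "person",
    "class": "top",
}

def ancestors(t):
    """Return t followed by all of its supertypes, most specific first."""
    chain = [t]
    while t in PARENT:
        t = PARENT[t]
        chain.append(t)
    return chain

def unify_types(a, b):
    """Most general common subtype of a and b, or None if they clash.

    In a tree-shaped hierarchy this exists only when one type is an
    ancestor of the other.
    """
    if a in ancestors(b):
        return b
    if b in ancestors(a):
        return a
    return None

# A feature structure is a pair: (node type, {edge label: feature structure}).

def unify(fs1, fs2):
    """Combine the information in two feature structures, or return None."""
    node_type = unify_types(fs1[0], fs2[0])
    if node_type is None:
        return None
    features = dict(fs1[1])
    for label, value in fs2[1].items():
        if label in features:
            merged = unify(features[label], value)
            if merged is None:
                return None
            features[label] = merged
        else:
            features[label] = value
    return (node_type, features)

def subsumes(general, specific):
    """True if everything general says also holds of specific."""
    if general[0] not in ancestors(specific[0]):
        return False
    return all(
        label in specific[1] and subsumes(value, specific[1][label])
        for label, value in general[1].items()
    )

def follow(fs, path):
    """Path traversal: follow a sequence of edge labels from the root node."""
    for label in path:
        fs = fs[1][label]
    return fs

a = ("student", {"takes": ("class", {})})
b = ("person", {"supervisedBy": ("teacher", {})})
both = unify(a, b)
```

Here both carries the more specific node type (student) and the edges of both inputs, so each input subsumes it; unifying a teacher with a class fails because neither type is below the other.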

4. Serialisation

Latterly, with XML, we've been confronting the serialisation issue: how do we send a fragment of description of the world down a wire? Can we use XML conveniently to do so? Do we want to serialise a localised, directional view, in which case the serialisation casts the asymmetry in stone, or the second-order view, in which case either we organise the serialisation by predicate:


<teach teacher='t1' class='c2'/>
<teach teacher='t1' class='c3'/>
<teach teacher='t2' class='c3'/>
. . .
<take student='s12' class='c1'/>
. . .

or we allow alternative serialisations which focus on (hoist?) one or another of the class/student/teacher types, using links to sort things out:


<teacher id='t1'>
 <class id='c3'>
  <name>Western Civ</name>
  <prereqs . . ./>
  <roll>
   <student id='s1' courses='c3 c5'>
    <name>Rafael Sabatini</name>
    . . .
   </student>
   . . .
  </roll>
 </class>
 <class id='c2'>
 . . .
 </class>
</teacher>
<teacher id='t3'>
 <class ref='c3'/>
 . . .
</teacher>

The crucial point here is that we've lost the ability to derive the underlying teach and take predications from this serialisation without some help: we don't know what's what as between entities and relations.

5. Trees and links

As far as I can see (XML-QL to the contrary notwithstanding) we can only manage this problem by providing a meta-vocabulary for specifying what aspects of the serialisation are what. This would sit naturally in the document schema.
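A minimal Python sketch of what such a meta-vocabulary might have to say, applied to a trimmed, well-formed copy of the teacher-centred serialisation above. Everything here is an assumption made for illustration: the school root element, the ENTITIES, NESTING and LINK_ATTRIBUTES tables, and the choice to express them in code rather than in a schema.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the teacher-centred serialisation; the <school> root is
# hypothetical, added only to make the fragment well-formed.
DOC = """
<school>
 <teacher id='t1'>
  <class id='c3'>
   <name>Western Civ</name>
   <roll>
    <student id='s1' courses='c3 c5'>
     <name>Rafael Sabatini</name>
    </student>
   </roll>
  </class>
  <class id='c2'/>
 </teacher>
 <teacher id='t3'>
  <class ref='c3'/>
 </teacher>
</school>
"""

# Hypothetical meta-vocabulary: which elements are entities, and which
# nestings or attributes stand for which predication.
ENTITIES = {"teacher", "class", "student"}
NESTING = {
    ("teacher", "class"): lambda teacher, cls: ("teach", teacher, cls),
    ("class", "student"): lambda cls, student: ("take", student, cls),
}
LINK_ATTRIBUTES = {
    ("student", "courses"): lambda student, cls: ("take", student, cls),
}

def identity(elem):
    """An entity is named by its own id, or by a ref to an id elsewhere."""
    return elem.get("id") or elem.get("ref")

def extract(elem, enclosing=None, facts=None):
    """Walk the tree, treating non-entity elements (roll, name) as transparent."""
    if facts is None:
        facts = set()
    if elem.tag in ENTITIES:
        if enclosing is not None:
            build = NESTING.get((enclosing.tag, elem.tag))
            if build is not None:
                facts.add(build(identity(enclosing), identity(elem)))
        for (tag, attr), build in LINK_ATTRIBUTES.items():
            if tag == elem.tag and elem.get(attr):
                for target in elem.get(attr).split():
                    facts.add(build(identity(elem), target))
        enclosing = elem
    for child in elem:
        extract(child, enclosing, facts)
    return facts

facts = extract(ET.fromstring(DOC))
```

Without the three tables the walker has no way to tell that roll is mere grouping, that class inside teacher means teach, or which way round the arguments of take go.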

There's also a requirement here for handling links (XLink and ID/IDREF) in a consistent way.

6. A uniform and complete alternative

An alternative possibility which

  1. is somewhat more user-friendly than the relation dump approach;
  2. but does not require decoration to reconstruct the data model

is to use attributes for all predications (= edges). We make this work by a uniform use of links.

The somewhat counterintuitive consequence of this is that we don't have sub-elements: all elements other than the root element are empty. So for example the same class/student/teacher dataset as above would look like this:


<class id='c3' name='Western Civ'/>
...
<teacher id='t1' name='Thorstein Veblen' teaches='c1 c4'/>
...
<student id='s1' name='Rafael Sabatini' supervisedBy='t3'
         takes='c3 c5'/>

There are still any number of ways of serialising the relations, depending on where you put the IDREFS, but they are all equivalent wrt reconstructing the underlying model if it's considered as undirected. If it's directed, then the serialisation can reflect the model directly. Note that on this account (smile, Istvan) element types are always node types, and attributes are always edge labels.
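A minimal Python sketch of the reconstruction, run over a trimmed copy of the flat dataset above. The data root element and the LINK_ATTRS set are assumptions: the set stands in for whatever declares those attributes as IDREF or IDREFS (e.g. a DTD), which the standard-library parser used here does not read.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the flat serialisation; the <data> root is hypothetical.
DOC = """
<data>
 <class id='c3' name='Western Civ'/>
 <teacher id='t1' teaches='c1 c4'/>
 <student id='s1' name='Rafael Sabatini' supervisedBy='t3'
          takes='c3 c5'/>
</data>
"""

# Assumption: these attributes are the ones declared as IDREF/IDREFS.
LINK_ATTRS = {"teaches", "takes", "supervisedBy"}

def load(xml_text):
    """Rebuild the graph: element types are node types, attributes are edges."""
    nodes = {}     # id -> (node type, {literal attribute: value})
    edges = set()  # (source id, edge label, target id)
    for elem in ET.fromstring(xml_text):
        node_id = elem.get("id")
        literals = {}
        for label, value in elem.attrib.items():
            if label == "id":
                continue
            if label in LINK_ATTRS:
                # An IDREFS value is a whitespace-separated list of ids.
                for target in value.split():
                    edges.add((node_id, label, target))
            else:
                literals[label] = value
        nodes[node_id] = (elem.tag, literals)
    return nodes, edges

nodes, edges = load(DOC)
```

The loader needs no table of which nestings mean what: every non-id attribute is an edge leaving the node, and only its target (another node or a literal value) varies.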

7. Links and RDF

I claim that, especially once we've done all this, RDF reduces to an application profile of XLink and/or a schema with a particular de-serialisation strategy. I'll write this up in a separate note.