(Computational) Linguistics and the Web: Hot research questions

I've spent the last ten years trying to feed technologies and insights from Linguistics and Computational Linguistics into the infrastructure of the Web. In this talk I'll give brief but intense introductions to four research areas from (C)L and related disciplines that have the potential to make a real impact on how the Web works:

A novel declarative approach to fixing up broken XML/(X)HTML: 'HTML in the wild' isn't grammatical, and the majority opinion is that only code and/or English prose can be used to standardise the fixup process. There is precedent for error-correcting parsers, and I'll describe a variant that might work for HTML.
Counter-augmented Finite-State Automata for parsing XML: Parsing XML content models that allow numeric occurrence ranges (e.g. between 2 and 10 occurrences of an <x> followed by an optional <y>) has historically required worst-case exponential space. A new formalism, FSAs with counters, improves the situation considerably.
Functional XML -- Self-describing documents meet the lambda calculus: XML is increasingly valuable as a vehicle for information that gets manipulated, aggregated, transformed, etc., as a major part of its utility. Traditional approaches to specifying such processing have been external, i.e. scripting languages for XML data. The alternative presented here is XML documents that define their own processing.
Identity, URIs and the (Semantic) Web: Why should we suppose that the Semantic Web will succeed where thirty years of AI-based work on Knowledge Representation has, well, failed? The only real difference is the use of URIs for naming properties, classes and individuals. The current dominant ideology with respect to how names work in ordinary language offers some insight into this question.
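To make the declarative-fixup item a little more concrete, here is a minimal Python sketch. It is not the error-correcting variant the talk describes. It assumes a pre-tokenised stream of ("start" | "end", name) pairs and a hypothetical rule table, CLOSED_BY, loosely modelled on HTML's implied end tags, that says which open elements a new start tag implicitly closes. The repair policy lives in the table; the loop only applies it, inserting missing end tags and dropping unmatched ones.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # ("start" or "end", element name)

# Hypothetical declarative rules: an open element (key) is implicitly
# closed when a start tag named in its set arrives. A simplified subset
# of HTML's real behaviour, for illustration only.
CLOSED_BY: Dict[str, Set[str]] = {
    "p": {"p", "div", "ul", "ol", "li"},
    "li": {"li"},
}


def repair(tokens: List[Token]) -> List[Token]:
    """Return a properly nested token stream derived from `tokens`."""
    out: List[Token] = []
    stack: List[str] = []
    for kind, name in tokens:
        if kind == "start":
            # Apply the rule table: close open elements this start tag ends.
            while stack and name in CLOSED_BY.get(stack[-1], set()):
                out.append(("end", stack.pop()))
            out.append(("start", name))
            stack.append(name)
        elif name in stack:
            # Insert any end tags missing between the top and the match.
            while stack[-1] != name:
                out.append(("end", stack.pop()))
            out.append(("end", stack.pop()))
        # else: an end tag with no matching open element is dropped
    # Close whatever is still open at end of input.
    while stack:
        out.append(("end", stack.pop()))
    return out
```

Changing the fixup behaviour then means editing CLOSED_BY rather than the parsing loop, which is the sense in which this toy is "declarative".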
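The counter idea from the second item can be sketched for the example content model ((x, y?){2,10}). The state names (q0, qx, qy), the TRANSITIONS table and the accepts function are illustrative assumptions, not the formalism from the talk. Unrolling the occurrence range into an ordinary FSA needs roughly one copy of the (x, y?) sub-automaton per permitted repetition. Here a single integer counter replaces those copies: transitions that begin a new repetition increment it (failing past the upper bound), and the lower bound is checked on acceptance.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical content model ((x, y?){lo,hi}): between lo and hi
# repetitions of an <x> optionally followed by a <y>.
# Each entry: (state, element name) -> (next state, counter action).
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("q0", "x"): ("qx", "incr"),  # first <x> begins repetition 1
    ("qx", "y"): ("qy", "keep"),  # optional <y> stays in the same repetition
    ("qx", "x"): ("qx", "incr"),  # <x> without <y>: next repetition
    ("qy", "x"): ("qx", "incr"),  # <x> after <y>: next repetition
}


def accepts(children: Sequence[str], lo: int = 2, hi: int = 10) -> bool:
    """Check a sequence of child element names against the content model."""
    state, count = "q0", 0
    for name in children:
        step = TRANSITIONS.get((state, name))
        if step is None:
            return False  # element not allowed at this point
        state, action = step
        if action == "incr":
            if count >= hi:
                return False  # would exceed the upper bound
            count += 1
    if state == "q0":
        return lo == 0  # empty content is valid only if zero repetitions are allowed
    return count >= lo  # lower bound is checked on acceptance
```

For instance, `accepts(["x", "y", "x"])` holds, while eleven `<x>`s in a row are rejected as soon as the eleventh arrives.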
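For the Functional XML item, a toy Python sketch of a document that carries its own processing may help. The vocabulary (`<apply>`, `<lambda param="...">`, `<var name="..."/>`) is hypothetical, invented here for illustration, and is not the notation the talk presents. Evaluation is plain beta-reduction: the argument element replaces each matching `<var>`. It assumes each `<apply>` holds exactly a `<lambda>` with one body element followed by one argument element, and does no capture-avoidance or error handling.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <apply> contains a <lambda param="..."> (with a
# single body element) followed by one argument element; <var name="..."/>
# marks where the argument is substituted.
DOC = ('<doc><apply><lambda param="who">'
       '<greeting>Hello, <var name="who"/>!</greeting>'
       '</lambda><name>world</name></apply></doc>')


def shallow_copy(node, children):
    """New element with node's tag, attributes, text and tail plus `children`."""
    new = ET.Element(node.tag, dict(node.attrib))
    new.text, new.tail = node.text, node.tail
    new.extend(children)
    return new


def substitute(node, param, value):
    """Copy of `node` with every <var name=param/> replaced by a copy of `value`."""
    if node.tag == "var" and node.get("name") == param:
        new = copy.deepcopy(value)
        new.tail = node.tail  # keep text that followed the variable
        return new
    return shallow_copy(node, [substitute(c, param, value) for c in node])


def evaluate(node):
    """Beta-reduce every <apply> in the tree (no capture-avoidance)."""
    children = [evaluate(c) for c in node]
    if node.tag == "apply":
        lam, arg = children
        result = substitute(lam[0], lam.get("param"), arg)
        result.tail = node.tail
        return evaluate(result)  # the body may expose further <apply>s
    return shallow_copy(node, children)


print(ET.tostring(evaluate(ET.fromstring(DOC)), encoding="unicode"))
# <doc><greeting>Hello, <name>world</name>!</greeting></doc>
```

The point of the sketch is only that the processing instructions travel inside the document itself, rather than living in an external script.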

Henry S. Thompson