Winter Project #2.5: XML Schemas for XPages

Thu Jan 02 10:59:58 EST 2020

Tags: xml xpages

After I did my initial port of my XSP completion assistant to LSP4XML, I got to thinking about improvements I could make to it. One of these was the notion of creating XML Schemas for a project's XPages, similar to how the DXL contributor just passes along the schema files that ship with Notes and Domino.

Doing the same with XSP isn't nearly so easy, and comes with a good number of gotchas that make it not only impossible to fully validate an XPage with schemas, but also difficult-to-impractical to generate such schemas on the fly outside of Designer. But first, two asides!

Aside #1: Mechanisms of Content Assistance

At its core, doing content assistance in a text editor is essentially a matter of the editor saying to the assistance plugin "I have a file here, with this content, at this location, and the user's cursor is in this spot. What should I suggest as the next part to type? And, once they've typed it, can you tell me if it is valid?".

In the simplest case, this could be something like a dictionary of words when writing prose. A content assistance plugin for English-the-language could merely consist of a list of known words, and it would take a prompt from the editor of "ca" and suggest "car", "cat", and so forth. There's an upper limit to what that sort of thing can and should do, and just doing that would probably suffice.

Programming languages are more complicated, and often the best route for autocomplete is to basically also be a whole compiler with full knowledge of the language and the structure of not only the current file, but also of other files in the project and all the dependencies. That way, an IDE can suggest types, method names, parameters, and all sorts of complicated external notions.

XML/HTML content assistance is often somewhere in between there. In both the case of my original XSP implementation and the LSP4XML port, the route I took was to let the in-between layer handle knowledge of raw XML mechanics like tags and attributes, but then basically provide a dictionary of known words when asked. So, when the editor said "the user typed xp:v", my code searched through all of its known components and responded with "maybe xp:view or xp:viewPanel".

I decided over the last couple days to take another route, based on the fact that XML is designed to be a toolkit for making fully-described and -validated markup.

Aside #2: XML Validation

XML itself is something of a meta-language: it has its rules, but it's intended to be a format used to describe other, more-specific grammars. On its own, it has the notion of whether or not a document is well-formed: this means specifically that it follows all the syntax rules of XML, like the proper use of brackets, attribute quotes, element hierarchy, and all that. Well-formedness is comparatively easy to enforce, and basically any XML editor does, but it doesn't say anything about whether a given XML document is a valid example of its kind.

That job is left up to a secondary definition, usually done via either a Document Type Definition file or an XML Schema, though there are more ways than that. These are the things that say, for example, that in XHTML the root element must be <html>, and that element contains zero or one <head> element and one <body> element, and so forth.

With the aid of a document schema, an XML processor (such as a structured text editor) can verify first that a document is well-formed XML and second that all of the elements that it can match up with the schema are valid. Moreover, it can itself maintain the list of potential elements, attributes, and values, and display them in a clean and fast way without the content-assistance plugin having to worry about parsing and substring matching.

That "...elements that it can match up..." caveat comes in to play because XML allows mixing grammars in a single file and the processor may or may not require that all of those grammars be strictly defined. What identifies these grammars in a file is the use of an XML "namespace", which is a URI that may or may not actually go anywhere. For example, the MathML namespace is "http://www.w3.org/1998/Math/MathML" and the SVG one is "http://www.w3.org/2000/svg". Both of those are URLs that resolve to pages, but that's just because the W3C is being nice; the only requirement is that they are unique URIs. In an individual document, these namespaces may be matched up to prefixes, and one can be defined as the base namespace for elements that don't have prefixes.

XSP as an XML Grammar

Which brings us to XPages. XSP-the-markup-language is XML-based and so every XSP document must be well-formed. Designer won't even give you the time of day if, say, you leave off the closing </xp:view> tag at the end of a file. Additionally, XSP has many of the trappings of a fully-validated XML language. It uses namespaces as its prime identifier to identify tags and you can't just write any old tag in one of those namespaces or give an existing tag some random new attribute. Moreover, many attributes and elements have special rules about their content: you can't put text inside <xp:this.data/>, and <xp:text disableOutputTag="foo"/> is illegal. These are all specialities of XML Schema.

XSP does not, though, have any schema. Most of the validation happens at build time (including of XML well-formedness, apparently), and some is even delayed until runtime (like that invalid boolean above). There are some good reasons for this. First and foremost is the fact that XSP really describes Java objects that are themselves contributed programmatically, by way of XSP library classes. Once your path to validity runs through arbitrary Java code, it means that you don't have the option to statically compare the file to a schema - you have to run that code inside your environment. Additionally, the XPages namespace ("http://www.ibm.com/xsp/core") isn't even defined in a single place, and has controls declared across several plugins. Same goes for the ExtLib's "http://www.ibm.com/xsp/coreex" namespace, and then there's the Custom Control namespace, "http://www.ibm.com/xsp/custom", which can't even be determined at a global level and has to be synthesized repeatedly on a project-by-project basis. And then, beyond all of that, XSP has rules that just can't be expressed in XML Schema at all, so Designer would have to have a secondary validator anyway, dampening the benefits of codifying a schema.

But Could It Work, Though?

Still, I figured that, if I could craft a schema that's good enough for basic use, I could get some extra completion and suggestion assistance while hopefully marking everything as vague enough to not run into trouble with the flexible nature of XSP.

My biggest ally here is that, while the core and ExtLib namespaces are technically open at any point, it would be such bad form for, for example, a company-specific library to declare its components as part of it that I can ignore that possibility entirely. In effect, for a given Domino release, those namespaces are sealed and fully describable ahead of time.

So I set out to see if I could make XML schemas to contribute to LSP4XML to give it some more knowledge of what it's working with. Like when I generated the JSON used by the original content assistance, the tack I took was to write a servlet that runs in an XPages context and emits the files I want based on a stock runtime. My initial version of this was too clever: XML Schema allows for subclassing, inheritance, and references, and I originally set out to have the schemas match the structure of the underlying component trees. I got pretty close on this, but ended up spending all my time fighting namespace collisions, and in particular the really-subtle one where the roots of the tree are actually defined in the "http://www.ibm.com/xsp/jsf/core" namespace, which is an IBM fork of JSF's "http://java.sun.com/jsf/core". As a side note, there are no concrete component definitions there, so you can't get any use out of it in an XSP file; it's just an interesting implementation detail.

The Current Results

When I took a step back and gave it another whack, I ended up coming up with something that pretty much works. I'm still not sure if it'll actually be the way I keep going, though. For one, there are currently a bevy of open bugs, some of which may end up being showstoppers. Beyond that, though, since XSP still requires a secondary processor to check validity, it may end up being the most reasonable route to just go back to offering completion ideas and maybe some post-processing to check.

Still, this is one of those types of projects that's worth it just as a learning experience. I hadn't even really looked at XML Schemas since around when they came out, and this sure was a good way to get a crash course. And, in the mean time, I think that the generated schema files are pretty-interesting artifacts.

New Comment