The Modularization of RuleML

David Hirtle, Tshering Dema, Harold Boley

2006-09-01 - Version 0.91


The official model of the RuleML family of sublanguages, including its modularization history, is presented and explained.

Motivation

RuleML is a family of sublanguages whose root gives access to the language as a whole and whose members identify customized subsets of the language. RuleML's specification therefore employs modular XML Schemas, as pioneered by XHTML. Following the general software engineering principle of modularity increases maintainability (by taking advantage of inheritance) and interoperability: rule subcommunities can specify whichever sublanguage in the family (each corresponding to an expressive class, e.g. Datalog and Hornlog) best suits their needs.

Official Model

The top-level branches of the RuleML family of sublanguages are shown below (with further additions planned). For more information on a particular sublanguage family, click the corresponding rectangle.

Official model of the top-level of RuleML

Derivation RuleML

The official model of the Derivation RuleML family of sublanguages is shown below (where the blue-outlined rectangle is the entry point), with more explanation further down. Use the following links to switch between four levels of refinement (the default view is minimal refinement).

Sublanguages & modules (minimized) ||| Sublanguages only (minimized)

Official model of Derivation RuleML

Consistent with object-oriented modelling conventions, the most expressive "class" (i.e. sublanguage) is shown at the top and generality decreases in top-down order. As in the Unified Modeling Language (UML), a diamond-headed arrow indicates an aggregation association (e.g. "datalog is part of hornlog" and "cterm is part of hornlog"), while a regular-headed arrow indicates generalization as used for inheritance (e.g. "bindatalog is a datalog"). Note that certain aggregation associations, such as hornlog to nafhornlog and hornlogeq, branch and have multiple (here, two) targets. This new notation logically places all target nodes on the same (horizontal) level.

The ovals in this model represent elementary modules which act as "private" constituents of the actual sublanguages (which are represented as rectangles). This composition may happen directly, as is most obvious for datalog, or indirectly through subsequent associations. For example, the model shows that bindatalog is not directly associated with any modules, but it inherits them (with some modification) when it derives from datalog. According to these conventions, ovals cannot be associated with one another because they are dependent on rectangles. A dashed line indicates this dependency, as distinct from the standard aggregation relationship shown with a solid line.

The model conveys meaning on the "implementation" (i.e. XML Schema specification) level as well. In XSD, ovals become non-standalone modules containing element and/or attribute definitions, and are not intended to be used directly for validation. They may, however, be used to create new document types by users wishing to "borrow" certain elements of RuleML much like in XHTML. Rectangles, on the other hand, are schema drivers composed in whole or in part of these modules or derived entirely from other schema drivers.

The association lines in this model also reveal schema dependencies within the "implementation". In XML Schema, connected rectangles are joined using <redefine>, whereas ovals are connected to rectangles using <include>. Elementary modules are generally included "as is", but sublanguages connected with <redefine> either extend or restrict one another. In XML Schema, extension via <redefine> is distinguished from restriction by whether or not the redefined component refers to itself (see this explanation). In other words, if there is no self-reference, the containing schema restricts, i.e. derives from, a parent; if there is a self-reference, the schema extends, i.e. generalizes from, a child.

For example, consider the following section of hornlog.xsd:

<xs:redefine schemaLocation="datalog.xsd">
  <xs:group name="arg.content">
    <xs:choice>
      <xs:group ref="arg.content"/>
      <xs:element ref="Expr"/>
      <xs:element name="Plex" type="Plex.type"/>
    </xs:choice>
  </xs:group>
</xs:redefine>  

The group self-reference in this XML Schema excerpt identifies hornlog as being an extension of datalog. On the other hand, the absence of a self-reference, as in the section of bindatalog.xsd below, indicates a restriction:

<xs:redefine schemaLocation="datalog.xsd">
  <xs:group name="Atom.extend">
    <xs:sequence>
      <xs:choice minOccurs="2" maxOccurs="2">
        <xs:element ref="arg"/>					
        <xs:group ref="arg.content"/>
      </xs:choice>
    </xs:sequence>		
  </xs:group>
</xs:redefine>
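
The self-reference test described above can be checked mechanically. The following Python sketch is only an illustration, not part of the RuleML tooling: the function name classify_redefine is hypothetical, the xmlns:xs declaration has been added to each excerpt so that it parses on its own, and group references are compared as plain strings (prefixed QNames are not resolved).

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

def classify_redefine(xsd_fragment: str) -> dict:
    """Map each redefined group name to 'extension' or 'restriction'."""
    root = ET.fromstring(xsd_fragment)
    result = {}
    # Only the groups that are direct children of <xs:redefine> are redefinitions.
    for group in root.findall(f"{{{XS}}}group"):
        name = group.get("name")
        # A self-reference is a nested <xs:group ref="..."/> naming the same group.
        self_ref = any(
            inner.get("ref") == name
            for inner in group.iter(f"{{{XS}}}group")
            if inner is not group
        )
        result[name] = "extension" if self_ref else "restriction"
    return result

# Excerpt of hornlog.xsd from above (namespace declaration added).
HORNLOG = """<xs:redefine xmlns:xs="http://www.w3.org/2001/XMLSchema"
             schemaLocation="datalog.xsd">
  <xs:group name="arg.content">
    <xs:choice>
      <xs:group ref="arg.content"/>
      <xs:element ref="Expr"/>
      <xs:element name="Plex" type="Plex.type"/>
    </xs:choice>
  </xs:group>
</xs:redefine>"""

# Excerpt of bindatalog.xsd from above (namespace declaration added).
BINDATALOG = """<xs:redefine xmlns:xs="http://www.w3.org/2001/XMLSchema"
                schemaLocation="datalog.xsd">
  <xs:group name="Atom.extend">
    <xs:sequence>
      <xs:choice minOccurs="2" maxOccurs="2">
        <xs:element ref="arg"/>
        <xs:group ref="arg.content"/>
      </xs:choice>
    </xs:sequence>
  </xs:group>
</xs:redefine>"""

print(classify_redefine(HORNLOG))     # arg.content refers to itself
print(classify_redefine(BINDATALOG))  # Atom.extend does not refer to itself
```

Note that bindatalog's reference to arg.content does not count as a self-reference, because the group being redefined is Atom.extend.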

Most sublanguages contain mixtures of positional and slotted terms: a design decision was made to allow optional slots in all existing sublanguages instead of duplicating all sublanguages with "near copies" differing only by this one feature. With the frame module since 0.89, purely slotted Atoms (called frames) are introduced. Likewise, a purely positional sublanguage (cf. logic) is also possible.

PR RuleML

Production (PR) RuleML is currently under development.

Approach

The modularization of RuleML uses a content-model-based approach. The approach is demonstrated graphically (with animation) for the 0.85 DTDs on slide 25 of the presentation Object-Oriented RuleML: Re-Modularized and XML Schematized via Content Models, where rectangles represent element declarations and circles represent their content models. Below, the approach is explained first in terms of DTDs and then for XML Schema.

DTDs have limited support for modularity, but it can be achieved in a roundabout way using macro-like parameter entities. In particular, the contents of an external file can be included using an externally-linked parameter entity. For example, the following includes the contents of datalog.dtd:

<!ENTITY % datalog_include SYSTEM "datalog.dtd">
%datalog_include;

Simple inclusion is not enough, though: overriding is also necessary. Previously, this was managed using INCLUDE/IGNORE sections: the section that declared the element which had to be changed was simply IGNOREd, then the element was re-declared.

In version 0.85, this clumsy method of overriding was replaced with a much more elegant solution in which every element's content model is explicitly defined by a parameter entity. The old rulebase label <_rbaselab>, for example, was declared as follows:

<!ENTITY % _rbaselab.content "(ind)"> 
<!ELEMENT _rbaselab %_rbaselab.content;>

Since parameter entities can overwrite one another (even across files), this content model could be easily replaced with another specified in a different DTD altogether, much like re-assigning a global variable in traditional programming languages. For example, the content model of the rulebase label <_rbaselab> is just (ind) in urcbindatagroundfact.dtd (as above), but was extended to permit a complex term (thus, becoming (ind | cterm)) in hornlog.dtd:

<!ENTITY % _rbaselab.content "(ind | cterm)">

(Note that this overriding entity must be declared before the other files are included, because the first declaration of a parameter entity is the one that takes effect.)
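
The ordering requirement can be illustrated with a small simulation. The Python sketch below is not an XML parser: bind_parameter_entities is a hypothetical helper that recognizes only simple quoted parameter-entity declarations and assumes that the first declaration of a name is the one that takes effect, as in XML.

```python
import re

# Matches declarations of the form <!ENTITY % name "value"> only.
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+(\S+)\s+"([^"]*)"\s*>')

def bind_parameter_entities(dtd_text: str) -> dict:
    """Return name -> value, keeping only the first declaration of each name."""
    bindings = {}
    for name, value in ENTITY_DECL.findall(dtd_text):
        bindings.setdefault(name, value)  # later declarations are ignored
    return bindings

# The declaration in hornlog.dtd and the one it is meant to override.
overriding = '<!ENTITY % _rbaselab.content "(ind | cterm)">'
included = '<!ENTITY % _rbaselab.content "(ind)">'

# Override declared before the inclusion: the extended content model is kept.
print(bind_parameter_entities(overriding + "\n" + included))

# Override declared after the inclusion: it has no effect.
print(bind_parameter_entities(included + "\n" + overriding))
```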

The content-model-based approach to modularization also works for XML Schema, using groups (and attributeGroups) instead of parameter entities. For example (using 0.88+ syntax, where <_rbaselab> has become <oid>),

<!ENTITY % oid.content "(ind)">
<!ELEMENT oid %oid.content;>

becomes

<xs:attributeGroup name="oid.attlist"/>
<xs:group name="oid.content">
  <xs:choice>
    <xs:element name="Ind" type="Ind-oid.type"/>
  </xs:choice>
</xs:group>
<xs:complexType name="oid.type">
  <xs:group ref="oid.content"/>
  <xs:attributeGroup ref="oid.attlist"/>
</xs:complexType>
<xs:element name="oid" type="oid.type"/>

There is no need for workarounds in XSD: <redefine> makes the specified changes and includes everything else. For example,

<!ENTITY % oid.content "(ind | cterm)">

<!ENTITY % include SYSTEM "datalog.dtd">
%include;

becomes

<xs:redefine schemaLocation="datalog.xsd">
  <xs:group name="oid.content">
    <xs:choice>
      <xs:group ref="oid.content"/>
      <xs:element ref="Cterm"/>
    </xs:choice>
  </xs:group>
</xs:redefine>
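
To make the effect of this <redefine> concrete, the following Python sketch models named groups as flat lists of members and splices the original content in wherever the redefinition refers to itself. It is a simplification under stated assumptions: apply_group_redefine is a hypothetical helper, compositors such as <xs:choice> are flattened away, and the group "unrelated.content" with its element "Placeholder" is invented purely to show that everything not redefined is carried over unchanged.

```python
def apply_group_redefine(base_groups: dict, redefined: dict) -> dict:
    """Return the effective groups after redefining some of the base groups."""
    effective = dict(base_groups)  # everything not redefined is included as is
    for name, members in redefined.items():
        expanded = []
        for member in members:
            if member == ("group", name):
                # Self-reference: stands for the group's original content.
                expanded.extend(base_groups[name])
            else:
                expanded.append(member)
        effective[name] = expanded
    return effective

# Simplified stand-in for datalog.xsd ("unrelated.content" is hypothetical).
datalog = {
    "oid.content": [("element", "Ind")],
    "unrelated.content": [("element", "Placeholder")],
}

# The redefinition shown above: the original content plus Cterm.
hornlog_redefine = {
    "oid.content": [("group", "oid.content"), ("element", "Cterm")],
}

effective = apply_group_redefine(datalog, hornlog_redefine)
print(effective["oid.content"])        # Ind and Cterm are both permitted
print(effective["unrelated.content"])  # untouched by the redefinition
```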

History

Specifying RuleML with XML Schema (XSD) has allowed higher precision than DTDs, although it has proven to be non-trivial since the first attempt in 0.8. After some issues were resolved, the transition from DTD to XSD was finally made in 0.85. The 0.85 release also inverted the modularization to make it more intuitive than the one used for 0.7 and 0.8. Other motivating factors behind this switch were simplicity (a single root with two distinct branches), consistency (inheritance in a single direction, for obvious super/subclass relationships) and efficiency (non-redundant implementation).

This attempt to re-modularize the sublanguage hierarchy revealed an inconsistency in XML Schema with respect to using <redefine>: it is straightforward to extend a particle's occurrence range by increasing the upper bound (i.e. the value of the maxOccurs attribute), but not by decreasing the lower bound (i.e. the value of the minOccurs attribute). For RuleML in particular, this "expressiveness gap" in XML Schema does not permit extending a binary atom (e.g. in bindatalog) to have an unbounded number of terms (e.g. in datalog). After some discussion on the W3C XML Schema developers list (xmlschema-dev@w3.org), it was decided that the modularization introduced in RuleML 0.85 could not (properly) be implemented using XML Schema.

At this point the modularization was re-analyzed and various alternatives (discussed in detail later) were evaluated. This led to a whole new model of modularization, which stayed basically the same for 0.86, 0.87 and 0.88, reflecting both the XML Schema implementation and the expressiveness layering of RuleML. The model was significantly updated for RuleML 0.89: new sublanguages were added while others were made unnecessary, related sublanguages were grouped together, and multiple levels of refinement for viewing the model were introduced.

Alternatives

As discussed in the History section, when technical problems with an earlier modularization of RuleML were discovered, three alternative versions were considered and implemented. These alternatives and their evaluation are documented here for future reference.

Version 1: "Bus"

This first of the three alternative versions of RuleML modularization involves a novel approach, but its implementation still involves a workaround which the W3C XML Schema Validator (XSV) disagrees with in some cases.

The basis of this approach is the separation of the actual schema driver from an "auxiliary" container module for each sublanguage. The auxiliary modules inherit from one another while the drivers are not directly related, except through the auxiliary modules which they include. For example, the datalog sublanguage is specified in the driver datalog.xsd, which includes auxiliary module aux_datalog.xsd. Because of the XML Schema limitation, the atom module must be handled as a special case and be independent of the auxiliary modules, instead being included or redefined by each driver as needed. In this way, the atom module does not have to begin as being binary, so no problems arise as a result of trying to decrease its lower bound. The following diagram may clarify:

     ...                          atom
      |                             |
aux_bindatalog -- [bindatalog] -----|
      |                             |
 aux_datalog ----- [datalog] -------|
      |                             |
aux_urdatalog --- [urdatalog] ------|
      |                             |
     ...                           ...

The similarity between this representation and a bus architecture diagram should be quite clear, hence the mnemonic of "bus" for this version.

When attempting to validate hornlog.xsd, urhornlog.xsd, equalog.xsd or urequalog.xsd with XSV, the following message appears in addition to the usual results of successful validation:

Schema validator crashed
The maintainers of XSV will be notified, you don't need to send mail about this unless you have extra information to provide. If there are Schema errors reported below, try correcting them and re-running the validation.

Other validators such as XMLSpy and Saxon respond much better, but this error nonetheless casts doubt on the validity of these schemas. Because there are better alternatives to this approach, however, we will not pursue the matter further.

Version 2: "Star"

This second approach is largely monolithic in the sense that the driver for each sublanguage is entirely independent of other sublanguage drivers (except for urdatalog, urhornlog, urequalog and negation sublanguages). Thus each driver includes (and/or redefines, as the case may be) from various modules, e.g. bindatalog.xsd:

  ...
	
  <xs:include schemaLocation="modules/core_module.xsd"/>

  <xs:include schemaLocation="modules/desc_module.xsd"/>
	
  <xs:include schemaLocation="modules/clause_module.xsd"/>
	
  <xs:include schemaLocation="modules/boole_module.xsd"/>
	
  <xs:redefine schemaLocation="modules/atom_module.xsd">
    <!-- restrict atoms to binary -->
    <xs:group name="atom.extend">
      <xs:sequence>
        <xs:choice minOccurs="2" maxOccurs="2">
          <xs:element ref="ind"/>
          <xs:element ref="var"/>
        </xs:choice>
      </xs:sequence>	
    </xs:group>
  </xs:redefine>
	
  <xs:include schemaLocation="modules/role_module.xsd"/>
	
  <xs:include schemaLocation="modules/term_module.xsd"/>	
  
  ...

This might be visualized as follows, revealing the reason behind this approach being labeled "star":

          atom    boole    term
             \      |       /
              \     |      /
       core -- [bindatalog] 
              /     |      \    
             /      |       \    
           role   clause   desc  

The implementation of this version of modularization is quite straightforward and validates fine in XSV; the monolithic quality of the schemas avoids the aforementioned issue altogether. However, it deviates from previous versions of RuleML by involving very little inheritance, making sublanguage relationships (and expressivity) unclear. Another downside is that there is a lot of redundancy among the sublanguage drivers with modules being included separately in each.

This approach is basically the same as that used for the modularization of XHTML, but it seems much better suited to scenarios where only a few sublanguages are involved. It is far from ideal for RuleML.

Version 3: "Tree"

The final approach considered here is similar to a previous modularization though oriented slightly differently and now involving elementary modules. It might be visualized as follows, a tree centered around the datalog sublanguage:

                    core
                      |
                atom  | desc
                   |  |  | 
             role  |  |  |   term
                 \ |  |  | / 
         boole -- [datalog] -- clause
                 *    *    *
              *       *       *    
           *          *          *          
     [urdatalog] [bindatalog] [hornlog]
          *     \              *     *
          *      \           *         *          
    [urcdatalog]  \     [urhornlog] [equalog]
          *        \    /               *    
          *         \  /                *              
   [urcbindatalog]   ur ---------- [urequalog]
          *
          *          
 [urcbindatagroundlog]
          *
          *         
[urcbindatagroundfact]

Clearly this version of modularization involves much more direct inheritance (indicated by "*") than the other versions so far considered. Its implementation avoids the XSD-technical limitation and is consistent with previous RuleML specifications wherein sublanguage relationships are explicitly indicated by taking advantage of inheritance. For example, bindatalog is fundamentally the same as datalog except with binary atoms, so bindatalog.xsd need only slightly redefine datalog.xsd:

  ...

  <!-- bindatalog redefines datalog so that atoms are binary -->
  <xs:redefine schemaLocation="datalog.xsd">
    <xs:group name="atom.extend">
      <xs:sequence>
        <xs:choice minOccurs="2" maxOccurs="2">
          <xs:element ref="ind"/>
          <xs:element ref="var"/>
        </xs:choice>
      </xs:sequence>
    </xs:group>		
  </xs:redefine>
  
  ...

Thus, the XSDs implementing this approach not only capture sublanguage expressiveness, but manage to do it in a compact and efficient way.
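
Why this direction of derivation avoids the limitation can be seen from the occurrence ranges alone. The Python sketch below is a simplified paraphrase of the occurrence-range condition that XML Schema applies to restrictions (the derived range must lie within the base range); it does not model <redefine> itself. The function name is hypothetical, and the range (0, unbounded) for datalog atoms is an assumption made for illustration, since the actual bounds are not quoted here.

```python
from typing import Optional, Tuple

# (minOccurs, maxOccurs); None stands for maxOccurs="unbounded".
Range = Tuple[int, Optional[int]]

def is_valid_range_restriction(derived: Range, base: Range) -> bool:
    """True if every count allowed by `derived` is also allowed by `base`."""
    d_min, d_max = derived
    b_min, b_max = base
    if d_min < b_min:
        return False  # lowering the lower bound is never a restriction
    if b_max is None:
        return True   # any upper bound fits under "unbounded"
    return d_max is not None and d_max <= b_max

BINARY = (2, 2)   # minOccurs="2" maxOccurs="2", as in bindatalog.xsd above
NARY = (0, None)  # assumed range for datalog atoms (illustrative only)

# Tree direction: bindatalog narrows datalog, which is accepted.
print(is_valid_range_restriction(BINARY, NARY))

# Opposite direction: widening a binary atom is not a restriction.
print(is_valid_range_restriction(NARY, BINARY))
```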

Evaluation

The following table roughly summarizes the relative advantages and disadvantages of each version of modularization (where 1 is best, 3 is worst):

Criterion        Version 1: "Bus"  Version 2: "Star"  Version 3: "Tree"
Compactness      3                 2                  1
Conciseness      2                 3                  1
Extensibility    3                 1                  2
Inheritance      2                 3                  1
Maintainability  3                 1                  2
Readability      3                 2                  1
Stability        3                 1                  1

compactness - the number of files and associated storage space required to implement the modularization

conciseness - a measure of the lack of redundancy within the modularization

extensibility - how easily the modularization (and its implementation) will be able to accommodate predicted extensions to RuleML (e.g. transformation and reaction rules)

inheritance - the level of inheritance involved in the modularization

maintainability - how easily the implementation can be re-used and modified (as necessary) for future versions of RuleML (related to readability and non-proliferation of files)

readability - how easily the modularization and its implementation can be read and understood (related to consistency and simplicity)

stability - how well popular validators (e.g. XSV, XMLSpy and Saxon) react to the implementation

As indicated by the table, the "tree" version of modularization is the most favourable approach, ranked at least as well as the other versions on every criterion except extensibility and maintainability (on stability it ties with "star"). It has therefore been further developed and now forms the basis of the official model for the modularization of RuleML.
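
As a rough cross-check of this reading of the table, the ranks can be tallied in a few lines of Python. The unweighted sum used here is an assumption made only for illustration; the evaluation above does not say that the criteria were weighted equally or summed at all.

```python
VERSIONS = ("Bus", "Star", "Tree")

# Ranks copied from the table above (1 is best, 3 is worst).
RANKS = {
    "Compactness":     (3, 2, 1),
    "Conciseness":     (2, 3, 1),
    "Extensibility":   (3, 1, 2),
    "Inheritance":     (2, 3, 1),
    "Maintainability": (3, 1, 2),
    "Readability":     (3, 2, 1),
    "Stability":       (3, 1, 1),
}

# Unweighted rank sum per version (lower is better).
totals = {
    version: sum(ranks[i] for ranks in RANKS.values())
    for i, version in enumerate(VERSIONS)
}

# Criteria on which "Tree" does not hold the best rank.
tree = VERSIONS.index("Tree")
tree_not_best = [c for c, ranks in RANKS.items() if ranks[tree] != min(ranks)]

print(totals)         # {'Bus': 19, 'Star': 13, 'Tree': 9}
print(tree_not_best)  # ['Extensibility', 'Maintainability']
```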


Site Contact: Harold Boley. Page Version: 2005-12-15


"Practice what you preach": XML source of this homepage at index.xml (index.xml.txt);
transformed to HTML via the adaptation of Michael Sintek's SliML XSLT stylesheet at homepage.xsl (View | Page Source)