Implementing RuleML Using Schemas, Translators, and Bidirectional Interpreters

W3C Workshop on Rule Languages for Interoperability Position Paper: 27-28 April 2005

This version:
http://www.ruleml.org/w3c-ws-rules/implementing-ruleml-w3c-ws.html
Authors:
Marcel Ball1, Harold Boley2, David Hirtle1, Jing Mei3, Bruce Spencer2
1UNB, 2NRC, 3PKU

Abstract

This paper presents the open source implementation of RuleML using XML Schema definitions, translators, and bidirectional interpreters written in Java.

Status of this Document

This document is a position paper as required for participation to the W3C Workshop on Rule Languages for Interoperability.

It represents the opinion of its authors.

Table of Contents

1 Introduction
2 Modular Schemas for a Family of RuleML Sublanguages
    2.1 Schema Modularization
    2.2 Benchmark for XML Schema Tools
    2.3 RDF Rules as Anchored, Slotted Datalog with Blank Nodes
3 Translators
    3.1 Normalizer
    3.2 POSL Parser and Generator
    3.3 OWL and SWRL Translator
4 Bidirectional Interpreters in Java
    4.1 jDREW Principles
    4.2 OO jDREW Slots
    4.3 OO jDREW Types
    4.4 OO jDREW Extensions
5 Use Cases
6 Conclusions
7 References


1 Introduction

The syntax of RuleML derivation rules has been defined by XML Schema definitions. The model-theoretic semantics of several RuleML sublanguages (including Datalog, Hornlog, and Folog) is defined in the classical way; for sublanguages with negation-as-failure, stable models have been proposed. The operational semantics of all RuleML sublanguages need ultimately be defined by a combination of translators and an interpreter or compiler. We have implemented the operational semantics of Derivation RuleML using XSLT translators and a bidirectional interpreter (the OO jDREW rule engine) written in Java. This reference implementation is available open source via the RuleML and jDREW websites.

2 Modular Schemas for a Family of RuleML Sublanguages

The top-level of the current family of RuleML sublanguages shows the major distinction between Derivation Rules, including Hornlog above Datalog, and Action Rules, including Production Rules. We focus here on various expressive classes of Derivation Rules and their XML Schema Definitions (XSDs). The most recent (public) schema specification of RuleML is always available at http://www.ruleml.org/spec.

2.1 Schema Modularization

Following a general software engineering principle for maintenance, we employ modular XSDs, as pioneered by XHTML, using a content model-based approach to take advantage of inheritance between schemas. Each expressive class syntactically distinguishable via an XSD (such as Datalog vs. Hornlog) can thus be addressed by the URI of its XSD. This permits receivers of a rulebase to validate if it conforms to the specified expressive class, before applying any class-specific tools (such as a Datalog vs. Hornlog interpreter). Moreover, a syntactic class is associated with a semantic class (such as Datalog vs. Hornlog with a function-free vs. function-containing Herbrand model).

A UML diagram of RuleML's modular Derivation Rule Schemas shows the most-expressive class (i.e. RuleML sublanguage) at the top, with generality decreasing in top-down order as consistent with object-oriented modelling conventions. Elementary modules, represented by ovals, act as "behind the scenes" constituents of the actual sublanguages, represented as rectangles. The relationships between these elements of the model are either aggregation, e.g. "Datalog is part of Hornlog" and "cterm is part of Hornlog", as indicated by UML's diamond-headed arrows, or generalization, e.g. "Bindatalog is a Datalog", as indicated by regular-headed arrows.

From an implementation perspective, the ovals are non-standalone modules containing only element and/or attribute definitions and are not intended to be used directly for validation. They may, however, be used to create new document types by others wishing to "borrow" certain elements of RuleML much like XHTML. The rectangles, on the other hand, are schema drivers composed in whole or in part of these modules or derived entirely from other schema drivers. The association lines in this diagram reveal all schema dependencies. In XML Schema, for instance, connected rectangles are joined using <redefine>, whereas ovals are (mostly) connected to rectangles using <include>. Elementary modules are normally included "as is", but sublanguages connected with <redefine> either extend or restrict one another.

2.2 Benchmark for XML Schema Tools

Specifying RuleML with XML Schema has proven to be problematic since the first attempt. After some issues were resolved, the transition from DTD to XSD was finally made but an attempt to re-modularize the sublanguage hierarchy revealed an expressiveness gap in XML Schema with respect to using <redefine>. Since then, there have been a number of RuleML-related exchanges on the W3C XML Schema developers list (xmlschema-dev@w3.org), especially regarding existing tools and their support for modular XSDs.

The complexity of the XML Schema specification has made full implementation difficult for developers. In fact, Michael Kay has stated on the xmlschema-dev list that he suspects no existing tool perfectly adheres to the specification. Testing and development of the latest versions of the RuleML schema specification have demonstrated that it serves as a benchmark for existing tools' support for modular XSDs. For example, the RuleML schemas have helped identify issues with the W3C's XML Schema Validator (XSV) (e.g. redefinition), Altova's XML Spy (e.g. self-references), and others. A validation summary of each version of the RuleML schema specification is maintained to ensure validity of the XSDs.

2.3 RDF Rules as Anchored, Slotted Datalog with Blank Nodes

As an important sublanguage example, the definition of RDF Rules can be introduced in the following steps:

Illustrating an RDF-like Business Rule 1:

<Implies> 
  <body> 
    <And> 
      <Atom> 
        <oid><Var>x</Var></oid>
        <Rel>product</Rel>
        <slot><Ind wref=":price"/><Var>y</Var></slot> 
        <slot><Ind wref=":weight"/><Var>z</Var></slot> 
      </Atom> 
      <Atom> 
        <Rel wref="swrlb:greaterThan"/> 
        <Var>y</Var> 
        <Data>200</Data> 
      </Atom>
      <Atom> 
        <Rel wref="swrlb:lessThan"/> 
        <Var>z</Var> 
        <Data>50</Data> 
      </Atom> 
    </And> 
  </body>
  <head> 
    <Atom> 
      <oid><Var>x</Var></oid>
      <Rel>product</Rel>
      <slot><Ind wref=":shipping"/><Data>0</Data></slot> 
    </Atom>
  </head> 
</Implies>

3 Translators

RuleML-to-RuleML translators are used for preprocessing rulebases such as via normalizers as shown in subsection 3.1. Since RuleML is enabling rule-system interoperation, (XSLT, ...) translators between rulebases in RuleML and other languages are key as exemplified in subsections 3.2 and 3.3.

3.1 Normalizer

RuleML uses an alternating type/role "striped syntax" similar to RDF in conjunction with "reconstructable role tags" in the spirit of Sandro Hawke's StripeSkipping. This allows both a user-friendly, type-only compact form as well as a fully-roled expanded form, or any combination thereof. The latter expanded form is considered the "normal form" because of its compatibility with RDF.

An XSLT stylesheet is available for normalizing the syntax used in a given RuleML document, reconstructing all skipped role tags to be in the fully-expanded, normal form. For example, the compact version of a fact asserting that John sells XMLBible to Mary,

<Atom>
  <Rel>sell</Rel>
  <Ind>John</Ind>
  <Ind>Mary</Ind>
  <Ind>XMLBible</Ind>
</Atom>

is expanded to the normalized form:

  <Atom>
    <opr>
      <Rel>sell</Rel>
    </opr>
    <arg index="1">
      <Ind>John</Ind>
    </arg>
    <arg index="2">
      <Ind>Mary</Ind>
    </arg>
    <arg index="3">
      <Ind>XMLBible</Ind>
    </arg>
  </Atom>

Additional information and examples are available from the Normalizer section of the RuleML schema specification.

3.2 POSL Parser and Generator

As the RuleML format can be verbose and not always convenient for end users to manually edit, a set of converters between the POSL syntax and RuleML were created. These converter servlets use the parsing and output routines from the OO jDREW library to parse a knowledge base in one format (either RuleML or POSL) and generate output containing an equivalent knowledge base in the other format, normalizing the order of slots as needed.

3.3 OWL and SWRL Translator

The ORDBase architecture contains OWLTrans to express the OWL semantics via entailment rules, either targeting a concrete system such as Jess (OWL2Jess) or a general language such as FOL RuleML (OWL2RuleML). With the OWL2RuleML stylesheet, an OWL-Full ontology can be transformed into a collection of RuleML ground facts. On top of this, an FOL RuleML rulebase was produced as the target language for a transformational implementation of the OWL semantics, which can be easily extended by user-defined rules. Alternatively, SWRL rules may be edited with Protégé's SWRL Editor, and transformed to RuleML rules using the SWRL2RuleML stylesheet.

4 Bidirectional Interpreters in Java

As part of the implementation of RuleML, a set of bidirectional interpreters, written in Java to allow portability between platforms, were created. The OO jDREW reasoning engine contains two modes: a Bottom-Up (forward chaining of rules) version, and a goal driven top-down (backward chaining of rules) version which works in a fashion similar to most Prolog systems. Demo applications (using Java Web Start) are available at http://www.jdrew.org/oojdrew/demo, and the source will be made available under an open source license in the near future. Some details about the features available in OO jDREW are detailed below.

4.1 jDREW Principles

The task of developing reference implementations for RuleML, which is an evolving standard, is made easier by using a tool-box approach, which is one of the design issues for jDREW: the flexibility to quickly cope with changes to the syntax and required operations to implement the various semantics. There are utilities in the jDREW toolbox for various tasks: reading files of RuleML statements into the internal clause data structure, storing and manipulating clauses, unification of clauses according to the positions of the selected literals, a basic resolution engine, clause to clause subsumption and clause to clause list subsumption, choice point managers, priority queues for various reasoning tasks, and readable top-level procedures.

Much of the flow of control is oriented around iterators, objects that maintain the state of a partially completed computation. Thus you pay as you go when you want the engine to perform the next step. The advantages of this architecture are its consistency and efficiency. There is a common interface for many different reasoning tasks, and there are few additional data structures introduced for storing intermediate results, other than those required by the abstract reasoning procedure. For instance, asking for the next solution in a top-down machine will cause the choicepoint iterator to initiate a request for the most recent choice to be undone, which will produce requests to other iterators. Likewise, in the bottom-up system, solutions are generated one-at-a-time, so asking for the next solution may cause the following steps: An iterator will be asked to select the next clause in the so-called new results list that matches eligibility requirements (like not being already subsumed). That selected clause will be resolved against the so-called given list, a list of other clauses. The given list is also generated by an iterator, so the new clause is resolved against some member generated by the iterator. The resolution result, if it meets eligibility requirements, will be returned. A subsequent call for a new result will cause the next item to be generated from the given list and then to be resolved against the same selected clause. Note that we did not need to resolve the selected clause against everything in the given list and store all these results, some of which may not have been needed. It is cheaper to save the state of partial computations than to compute and then store these unnecessary results.

4.2 OO jDREW Slots

During the creation of the internal structures, the OO jDREW terms representing atoms and complex terms are normalized, producing the following order for the parameters: oid (object identifier), positional parameters (in their original order), the positional rest term, slotted parameters (in the encounter order), and finally the slotted rest term. Since the ordering of slots within RuleML atoms and complex terms does not carry information, any order can be imposed. In OO jDREW, the slots are ordered based upon the sequence in which they are initially encountered.

By using such a normalized form we are able to implement an efficient unification algorithm that is O(m + n) (where m and n are the numbers of parameters), instead of O(m * n). In our algorithm we scan the two lists of parameters -- matching up roles (and positions in the case of positional parameters) -- and unify those parameters. If a role is present in one term but not in the other then the unmatched role is added to a list of rest terms in case the other has an appropriate rest term (otherwise unification fails). Such a collection of rest terms is used to dynamically generate a Plex (RuleML's closest equivalent to a list) to be assigned to the corresponding rest parameter.

4.3 OO jDREW Types

OO jDREW includes an order-sorted type system as a core component. This type system allows the user to specify a partial order for a set of classes in RDFS via their (multiple) superclasses, allowing for the reuse of lightweight taxonomies of the Semantic Web. Currently, the system only models the taxonomic relationships between the classes, and cannot model properties with their domain and range restrictions. For example, the current system can model that 'Car' is a 'Motor Vehicle', but cannot model that a car must have a make, model, year, etc.

By building an order-sorted type system into OO jDREW we are able to restrict the search space to only those clauses that have the appropriate types specified for their parameters, leading to a faster and more robust system than one where types are reduced to unary predicate calls in the body.

Extensions to the type system are being considered that would expand its modeling ability. In particular, the user could define a signature using RDFS properties to specify that certain slots must be present for a typing to be valid. We would then be able to prescribe that 'Car' has slots for at least make, model, year, which is not possible in the current system.

4.4 OO jDREW Extensions

Negation-as-failure (Naf) has first been implemented in OO jDREW TD, and recently introduced into OO jDREW BU for stratified programs. In bottom-up mode, Naf attempts to look up its argument atom via a unifying fact (when no other rule is applicable). If this look-up succeeds, hence the Naf fails, then this rule will be deleted from the given list, else the rule will be partially evaluated into one without Naf.

Equivalence classes have been implemented in OO jDREW BU for the sublanguage datalogeq (Datalog with Equality). For equality clauses, a newly-built data structure called EqualTable is used to map all individuals to one equivalence class. For each equivalence class, we append a fresh symbol to the original OO jDREW SymbolTable, and all equivalent individuals are redirected to this new symbol. That is, the process of unification and resolution will deal with this new symbol, representing the equivalence class as a whole. As a result, the axioms of equality are satisfied.

Illustrating Naf and Equal with a Datalog-like Business Rule 2:

 
<Implies>
  <head>
    <Atom>
      <Rel>discount</Rel>
      <Var>customer</Var>
      <Var>product</Var>
      <Ind>5.0 percent</Ind>
    </Atom>
  </head>
  <body>
    <And>
      <Atom>
        <Rel>premium</Rel>
	<Var>customer</Var>
      </Atom>
      <Atom>
	<Rel>onsale</Rel>
	<Var>product</Var>
      </Atom> 
      <Naf>
	<Atom>
	  <Rel>special</Rel>
	  <Var>product</Var>
	</Atom>
      </Naf>
    </And>
  </body>
</Implies>
 
 
<Equal>
  <Ind>fatherOFtom</Ind>
  <Ind>bob</Ind>
</Equal> 
<Equal>
  <Ind>fatherOFtom</Ind>
  <Ind>uncleOFmary</Ind>
</Equal> 
<Atom>
  <Rel>premium</Rel>
  <Ind>bob</Ind>
</Atom>
<Atom>
  <Rel>onsale</Rel>
  <Ind>clothes</Ind>
</Atom>
<Atom>
  <Rel>special</Rel>
  <Ind>clothes</Ind>
</Atom>
 
1: premium("uncleOFmary").
1: premium("fatherOFtom").
1: premium("bob").
2: equal("uncleOFmary",uncleOFmary).
2: equal("fatherOFtom",uncleOFmary).
2: equal("bob",uncleOFmary).
2: equal(fatherOFtom,bob).
3: onsale(clothes).
4: special(clothes).
 
 
<Equal>
  <Ind>fatherOFtom</Ind>
  <Ind>bob</Ind>
</Equal> 
<Equal>
  <Ind>fatherOFtom</Ind>
  <Ind>uncleOFmary</Ind>
</Equal> 
<Atom>
  <Rel>premium</Rel>
  <Ind>bob</Ind>
</Atom>
<Atom>
  <Rel>onsale</Rel>
  <Ind>clothes</Ind>
</Atom>
 
1: discount("uncleOFmary", clothes, "5.0 percent").
1: discount("fatherOFtom", clothes, "5.0 percent").
1: discount("bob", clothes, "5.0 percent").
2: onsale(clothes).
3: premium("uncleOFmary").
3: premium("fatherOFtom").
3: premium("bob").
4: equal("uncleOFmary",uncleOFmary).
4: equal("fatherOFtom",uncleOFmary).
4: equal("bob",uncleOFmary).
4: equal(fatherOFtom,bob).
 

5 Use Cases

The primary use case of RuleML and OO jDREW has been the New Brunswick Business Knowledge Base (NBBizKB). The goal of the NBBizKB project was to extract business information from various sources, including the Yahoo! business and New Brunswick Biznet databases, integrate the sources and validate the results using a set of declarative rules.

Once the knowledge was extracted from our sources and converted to a RuleML knowledge base, a set of rules to integrate and validate the knowledge were written. These handcrafted rules fall into three categories: the first set is used to map the NAICS sector codes used in the Biznet database to the Yahoo categories; once these sector-category mappings were created, a second set of rules were used to map the Biznet facts and Yahoo facts into one set of facts with a common format; finally, a set of rules were used to check for possible inconsistencies in the data.

A second use case of RuleML and OO jDREW is the RACOFI Music project. This project used OO jDREW and RuleML as part of a recommendation system for music albums, in conjunction with collaborative filtering. On the RACOFI Music site, users have the ability fill in a search form, and based upon their search parameters and the results from the collaborative filtering algorithm, a knowledge base is generated. This generated knowledge base is then processed by the OO jDREW reasoning engine and used to create a list of recommendations.

6 Conclusions

RuleML has an open source implementation which is maintained as the standard evolves. The syntax of the family of sublanguages is defined by modular XML Schema definitions which serve as a benchmark for existing tools' support for modularity. For normalization and interoperability with other standards, translators have also been realized, primarily via XSLT. Finally, the operational semantics of RuleML are implemented by a set of bidirectional interpreters (OO jDREW) written in Java for cross-platform compatibility.

7 References

Anon Res
Reasoning about Anonymous Resources and Meta Statements on the Semantic Web, G. Yang and M. Kifer, In Journal on Data Semantics, Volume 1, Pages 69-97, 2003.
jDREW
The Design of j-DREW: A Deductive Reasoning Engine for the Web, B. Spencer, In Proceedings of the First CologNET Workshop on Component-Based Software Development and Implementation Technology for Computational Logic Systems. CBD ITCLS 2002, Madrid, Spain. September 20, 2002. pp. 155-166.
OO RuleML
Object-Oriented RuleML: User-Level Roles, URI Grounded Clauses, and Order-Sorted Terms, H. Boley, In Proc. Rules and Rule Markup Languages for the Semantic Web (RuleML-2003). Sanibel Island, Florida, LNCS 2876, Springer-Verlag, October 2003.