DISCLAIMER

The information presented in this document is preliminary.

RuleML Tutorial

Draft, 13 May 2005

This version:
http://www.ruleml.org/papers/tutorial-ruleml-20050513.html
Latest version:
http://www.ruleml.org/papers/tutorial-ruleml.html
Authors:
Harold Boley (http://www.cs.unb.ca/~boley)
Benjamin Grosof (http://ebusiness.mit.edu/bgrosof)
Said Tabet (http://home.comcast.net/~stabet)


Abstract

This document describes RuleML, the Rule Markup Language. RuleML is a markup language for publishing and sharing rule bases on the World Wide Web. RuleML builds a hierarchy of rule sublanguages upon XML, RDF, XSLT, and OWL.

The Kernel Datalog Sublanguage

The Datalog (constructor-function-free) sublanguage of Horn logic is the foundation for the kernel of RuleML. Datalog is the language in the intersection of SQL and Prolog. It can thus be considered the subset of logic programming needed for representing the information of null-value-free relational databases, including (recursive) views. That is, in Datalog we can define facts corresponding to explicit rows of relational tables, and rules corresponding to tables defined implicitly by views.
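To make the correspondence between facts and rows, and between rules and views, concrete, here is a minimal sketch using Python's built-in sqlite3 module and the spending example developed below. The table, column, and view names are illustrative assumptions, not part of RuleML.

```python
import sqlite3

# In-memory database; table, column, and view names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (customer TEXT, amount TEXT, period TEXT)")

# A Datalog fact corresponds to an explicit row of a table.
conn.execute(
    "INSERT INTO spending VALUES (?, ?, ?)",
    ("Peter Miller", "min 5000 euro", "previous year"),
)

# A Datalog rule corresponds to a view: its rows are implied, not stored.
conn.execute(
    """CREATE VIEW premium AS
       SELECT customer FROM spending
       WHERE amount = 'min 5000 euro' AND period = 'previous year'"""
)

premium = [row[0] for row in conn.execute("SELECT customer FROM premium")]
print(premium)  # ['Peter Miller']
```

The premium table never receives an INSERT; its single row is computed from the stored spending row, just as a rule derives new atoms from facts.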

RuleML Datalog, being a markup language, can conveniently represent relational information in which all columns hold natural-language phrases. To explain the Datalog features, we will develop a small example that formalizes natural-language business rules in RuleML. This example corresponds to the 'Eligibility' category of Terry Moriarty's Business Rule Classification.

Consider the English sentence

"Peter Miller's spending has been min 5000 euro in the previous year."

It can be marked up as the following RuleML Datalog fact:

<Atom>
  <Rel>spending</Rel>
  <Ind>Peter Miller</Ind>
  <Ind>min 5000 euro</Ind>
  <Ind>previous year</Ind>
</Atom>

This markup can be regarded as a kind of parse tree, in which tags correspond to non-terminals labeling both inner nodes (oval, RDF-like anonymous resources), e.g. 'Atom', and leaf nodes (rectangular, RDF-like literals containing PCDATA that correspond to terminals), e.g. 'Rel'.
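The parse-tree reading can be sketched with Python's standard xml.etree.ElementTree module: an element that has child elements is printed as an inner node, and an element holding only text is printed as a leaf node with its PCDATA. The helper name tree_lines and the bracket notation are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

FACT = """
<Atom>
  <Rel>spending</Rel>
  <Ind>Peter Miller</Ind>
  <Ind>min 5000 euro</Ind>
  <Ind>previous year</Ind>
</Atom>
"""

def tree_lines(elem, depth=0):
    """Render an element as an indented tree: inner nodes show only their
    tag in parentheses, leaf nodes show their tag and text in brackets."""
    indent = "  " * depth
    if len(elem):  # has child elements: an inner (non-terminal) node
        lines = [f"{indent}({elem.tag})"]
        for child in elem:
            lines.extend(tree_lines(child, depth + 1))
        return lines
    # no child elements: a leaf node holding a literal (terminal)
    return [f"{indent}[{elem.tag}: {elem.text}]"]

lines = tree_lines(ET.fromstring(FACT.strip()))
print("\n".join(lines))
# (Atom)
#   [Rel: spending]
#   [Ind: Peter Miller]
#   [Ind: min 5000 euro]
#   [Ind: previous year]
```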

Going through the tags from inside out, we find that "spending" is marked up as the relation name (table name) for the fact: <Rel>spending</Rel>. On the same level, the three phrases "Peter Miller", "min 5000 euro", and "previous year" are marked up as individual constants that are the three arguments (table columns) of the relation, in the given sequence: <Ind>Peter Miller</Ind>, <Ind>min 5000 euro</Ind>, and <Ind>previous year</Ind>. The entire relation application constitutes an atomic formula, marked up by <Atom> ... </Atom>.
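Read this way, the fact is one row of the table named by <Rel>, with the <Ind> children as its columns in order. A minimal sketch, assuming a Python (relation, arguments) tuple as a hypothetical in-memory representation:

```python
import xml.etree.ElementTree as ET

FACT = """
<Atom>
  <Rel>spending</Rel>
  <Ind>Peter Miller</Ind>
  <Ind>min 5000 euro</Ind>
  <Ind>previous year</Ind>
</Atom>
"""

def atom_to_row(xml_text):
    """Read a ground RuleML Datalog <Atom> into (table name, column values).

    The relation name comes from <Rel>; the arguments are the <Ind>
    children in document order. An empty <Ind/> yields None."""
    atom = ET.fromstring(xml_text.strip())
    if atom.tag != "Atom":
        raise ValueError(f"expected <Atom>, got <{atom.tag}>")
    rel = atom.find("Rel").text
    args = tuple(child.text for child in atom if child.tag == "Ind")
    return rel, args

rel, args = atom_to_row(FACT)
print(rel, args)  # spending ('Peter Miller', 'min 5000 euro', 'previous year')
```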

In the above Datalog markup, the three "spending" arguments are not further analyzed; they are treated simply as names of individual constants. To identify the fact, a query that is to retrieve it would need to repeat them verbatim (as illustrated by the query in the body of the rule below). A stepwise refinement could expose the internal structure and natural-language meaning of the three basic arguments, either by using further, auxiliary arguments or by going beyond Datalog, with every step marked up in RuleML.
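The verbatim requirement can be illustrated with a minimal sketch that stores ground facts as Python tuples (a hypothetical representation) and answers ground queries by exact lookup. A query whose second argument is reworded fails, even though a human reader would take it to mean the same thing:

```python
# Ground facts as (relation, arguments) tuples; hypothetical representation.
facts = {("spending", ("Peter Miller", "min 5000 euro", "previous year"))}

def holds(query):
    """A ground Datalog query succeeds only if an identical fact is stored:
    individual constants are compared as whole names, with no analysis
    of their internal structure."""
    return query in facts

exact = holds(("spending", ("Peter Miller", "min 5000 euro", "previous year")))
reworded = holds(("spending", ("Peter Miller", "at least 5000 euro", "previous year")))
print(exact, reworded)  # True False
```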

Notice that in the RuleML Datalog kernel a relation can be n-ary, i.e. have any fixed number n = 0, 1, 2, 3, ... of arguments. The restriction of Datalog to unary and binary relations is an important special case, e.g. for combining RuleML with OWL, as in SWRL. While the above markup uses one 3-ary "spending" relation, it could also be reduced to three binary relations branching off from an individual constant that stands for the ternary relationship, as shown in a W3C Working Draft. Moreover, RuleML Datalog provides for the markup of null values via empty individuals, <Ind/>.
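The reduction of the ternary fact to binary relations can be sketched as follows, together with the inverse step that recovers the original arguments. The identifier spending1 and the role names spender, amount, and period are hypothetical; the source does not prescribe them.

```python
def to_binary(args, roles, rel_id):
    """Reduce one n-ary fact to n binary facts: a new individual constant
    rel_id stands for the n-ary relationship, and the i-th binary relation
    (named roles[i]) links it to the i-th original argument."""
    if len(args) != len(roles):
        raise ValueError("need exactly one role name per argument")
    return [(role, (rel_id, arg)) for role, arg in zip(roles, args)]

def from_binary(binary_facts, roles, rel_id):
    """Recover the n-ary argument tuple by following each role from rel_id."""
    lookup = {(rel, subj): obj for rel, (subj, obj) in binary_facts}
    return tuple(lookup[(role, rel_id)] for role in roles)

# Hypothetical role and identifier names, for illustration only.
ROLES = ("spender", "amount", "period")
args = ("Peter Miller", "min 5000 euro", "previous year")

binary = to_binary(args, ROLES, "spending1")
for fact in binary:
    print(fact)
# ('spender', ('spending1', 'Peter Miller'))
# ('amount', ('spending1', 'min 5000 euro'))
# ('period', ('spending1', 'previous year'))
```

No information is lost in the reduction: from_binary(binary, ROLES, "spending1") returns the original three arguments.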

Now consider the English sentence

"A customer is premium if their spending has been min 5000 euro in the previous year."

This can be marked up as the following RuleML Datalog rule (an implication):

<Implies>
  <head>
    <Atom>
      <Rel>premium</Rel>
      <Var>customer</Var>
    </Atom>
  </head>
  <body>
    <Atom>
      <Rel>spending</Rel>
      <Var>customer</Var>
      <Ind>min 5000 euro</Ind>
      <Ind>previous year</Ind>
    </Atom>
  </body>
</Implies>

As with the fact, this rule markup can be viewed as a parse tree.

Looking at these tags, notice that RuleML follows the Java class-vs.-method naming convention by distinguishing upper-case type tags from lower-case role tags. The atomic formula within the <body> role of the <Implies> is like the <Atom> constituting the above fact, except that <Ind>Peter Miller</Ind> is replaced by <Var>customer</Var>, the markup of a variable named "customer". This variable also occurs in an <Atom> within the <head> role of the <Implies>, which applies the unary relation <Rel>premium</Rel> to <Var>customer</Var>.
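The split between role tags and type tags can be exploited when reading the rule: the lower-case roles say where an atom sits, the upper-case types say what it is. A minimal sketch, assuming Python's xml.etree.ElementTree and a hypothetical (relation, terms) representation in which each term keeps its Var or Ind tag:

```python
import xml.etree.ElementTree as ET

RULE = """
<Implies>
  <head>
    <Atom><Rel>premium</Rel><Var>customer</Var></Atom>
  </head>
  <body>
    <Atom>
      <Rel>spending</Rel>
      <Var>customer</Var>
      <Ind>min 5000 euro</Ind>
      <Ind>previous year</Ind>
    </Atom>
  </body>
</Implies>
"""

def parse_atom(atom):
    """Return (relation, terms); each term is a (tag, text) pair."""
    rel = atom.find("Rel").text
    terms = tuple((child.tag, child.text) for child in atom if child.tag != "Rel")
    return rel, terms

def parse_implies(xml_text):
    """Split an <Implies> into (head atom, body atom) via its role tags."""
    implies = ET.fromstring(xml_text.strip())
    return parse_atom(implies.find("head/Atom")), parse_atom(implies.find("body/Atom"))

head, body = parse_implies(RULE)
print(head)  # ('premium', (('Var', 'customer'),))

# Role tags start with a lower-case letter, type tags with an upper-case one.
tags = {elem.tag for elem in ET.fromstring(RULE.strip()).iter()}
roles = sorted(t for t in tags if t[0].islower())
types = sorted(t for t in tags if t[0].isupper())
print(roles, types)  # ['body', 'head'] ['Atom', 'Implies', 'Ind', 'Rel', 'Var']
```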

The rule and the fact can be used together for a first derivation example: The rule's body matches the fact, binding <Var>customer</Var> to <Ind>Peter Miller</Ind>; using this binding to instantiate the same variable in the rule's head, a new <Atom> is derived expressing that <Ind>Peter Miller</Ind> is a <Rel>premium</Rel> customer.
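This derivation step can be sketched as matching followed by instantiation. It is a minimal illustration under the assumption that atoms are Python tuples whose terms are ('Var', name) or ('Ind', value); the function names match and instantiate are hypothetical.

```python
# Terms mirror the <Var>/<Ind> tags: ('Var', name) or ('Ind', value).
FACT = ("spending", (("Ind", "Peter Miller"), ("Ind", "min 5000 euro"), ("Ind", "previous year")))
HEAD = ("premium", (("Var", "customer"),))
BODY = ("spending", (("Var", "customer"), ("Ind", "min 5000 euro"), ("Ind", "previous year")))

def match(pattern, fact, env):
    """Match a rule-body atom against a ground fact.

    Returns env extended with new variable bindings, or None if the
    relation, the arity, or some argument does not agree."""
    p_rel, p_terms = pattern
    f_rel, f_terms = fact
    if p_rel != f_rel or len(p_terms) != len(f_terms):
        return None
    env = dict(env)  # copy, so a failed match leaves the caller's env intact
    for (kind, name), (_, value) in zip(p_terms, f_terms):
        if kind == "Var":
            if env.setdefault(name, value) != value:
                return None  # variable already bound to a different individual
        elif name != value:
            return None  # individual constants must be identical
    return env

def instantiate(atom, env):
    """Replace each variable in the atom by the individual it is bound to."""
    rel, terms = atom
    return rel, tuple(("Ind", env[name]) if kind == "Var" else (kind, name)
                      for kind, name in terms)

env = match(BODY, FACT, {})
print(env)  # {'customer': 'Peter Miller'}
derived = instantiate(HEAD, env)
print(derived)  # ('premium', (('Ind', 'Peter Miller'),))
```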

Besides a single atomic formula, the body of a RuleML Datalog rule can also be an entire conjunction of atoms. This allows complex conditions via 'and-ed' atoms, which can involve various variables.

As an example with a body 'and-ing' two atoms, consider the English sentence

"The discount for a customer buying a product is 7.5 percent if the customer is premium and the product is luxury."

It can be marked up as the following RuleML Datalog (implication) rule:

<Implies>
  <head>
    <Atom>
      <Rel>discount</Rel>
      <Var>customer</Var>
      <Var>product</Var>
      <Ind>7.5 percent</Ind>
    </Atom>
  </head>
  <body>
    <And>
      <Atom>
        <Rel>premium</Rel>
        <Var>customer</Var>
      </Atom>
      <Atom>
        <Rel>luxury</Rel>
        <Var>product</Var>
      </Atom>
    </And>
  </body>
</Implies>

Viewed as a parse tree, this markup contains an 'and' branch with atomic subtrees.

The main <Implies> tag here has a body whose <And> conjoins the <Rel>premium</Rel> atom of the earlier rule and a similar <Rel>luxury</Rel> atom; they are used for tests over two different variables. The rule's head is an atom applying a "discount" relation to these two variables and to an individual constant that marks up "7.5 percent".

While <Rel>premium</Rel> was defined by our first rule, <Rel>luxury</Rel> could be (partially) defined as

"A Porsche is luxury."

by another RuleML Datalog fact:

<Atom>
  <Rel>luxury</Rel>
  <Ind>Porsche</Ind>
</Atom>

Again, this fact markup can be viewed as a parse tree.

The new rule and fact can augment our earlier rule and fact for a chaining derivation example as follows: the first conjunct of the <Rel>discount</Rel> rule's body chains to the <Rel>premium</Rel> rule, which succeeds as shown earlier, binding <Var>customer</Var> to <Ind>Peter Miller</Ind>. The second conjunct simply matches the <Rel>luxury</Rel> fact, binding <Var>product</Var> to <Ind>Porsche</Ind>. So the <Rel>discount</Rel> rule succeeds with these bindings, proving the following atom, here translated into English:

"The discount for Peter Miller buying a Porsche is 7.5 percent."

This derived atom markup may be stored for further processing:

<Atom>
  <Rel>discount</Rel>
  <Ind>Peter Miller</Ind>
  <Ind>Porsche</Ind>
  <Ind>7.5 percent</Ind>
</Atom>

As always, such markup can be viewed as a parse tree.
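The chaining derivation of the discount atom can be sketched as a small top-down (backward-chaining) prover in Python that finally writes the derived atom back as RuleML markup. It is a minimal illustration, not jDREW or any other RuleML engine: terms are ('Var', name) or ('Ind', value) tuples, rules are hypothetical (head, body-list) pairs, variables are renamed apart on each rule use, and goals are proved depth-first, left to right.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Knowledge base from the tutorial; terms are ('Var', name) or ('Ind', value).
FACTS = [
    ("spending", (("Ind", "Peter Miller"), ("Ind", "min 5000 euro"), ("Ind", "previous year"))),
    ("luxury", (("Ind", "Porsche"),)),
]
RULES = [  # (head, body conjunction)
    (("premium", (("Var", "customer"),)),
     [("spending", (("Var", "customer"), ("Ind", "min 5000 euro"), ("Ind", "previous year")))]),
    (("discount", (("Var", "customer"), ("Var", "product"), ("Ind", "7.5 percent"))),
     [("premium", (("Var", "customer"),)), ("luxury", (("Var", "product"),))]),
]

def walk(term, env):
    """Follow variable bindings until an individual or an unbound variable."""
    while term[0] == "Var" and term in env:
        term = env[term]
    return term

def unify(a, b, env):
    """Unify two atoms under env; return the extended env, or None."""
    if a[0] != b[0] or len(a[1]) != len(b[1]):
        return None
    env = dict(env)
    for s, t in zip(a[1], b[1]):
        s, t = walk(s, env), walk(t, env)
        if s == t:
            continue
        if s[0] == "Var":
            env[s] = t
        elif t[0] == "Var":
            env[t] = s
        else:
            return None  # two different individual constants
    return env

fresh = count()

def rename(atom, n):
    """Give the variables of one rule use fresh names, e.g. customer_3."""
    rel, terms = atom
    return rel, tuple(("Var", f"{name}_{n}") if kind == "Var" else (kind, name)
                      for kind, name in terms)

def solve(goals, env):
    """Prove a conjunction of goals top-down; yield each successful env."""
    if not goals:
        yield env
        return
    first, rest = goals[0], goals[1:]
    for fact in FACTS:  # a goal may match a stored fact
        new = unify(first, fact, env)
        if new is not None:
            yield from solve(rest, new)
    for head, body in RULES:  # or chain to a rule whose head unifies with it
        n = next(fresh)
        new = unify(first, rename(head, n), env)
        if new is not None:
            yield from solve([rename(b, n) for b in body] + rest, new)

def to_ruleml(rel, values):
    """Write a derived ground atom back as RuleML <Atom> markup."""
    atom = ET.Element("Atom")
    ET.SubElement(atom, "Rel").text = rel
    for value in values:
        ET.SubElement(atom, "Ind").text = value
    ET.indent(atom)  # two-space indentation; needs Python 3.9+
    return ET.tostring(atom, encoding="unicode")

# Query: discount(?c, ?p, ?d) with all three arguments open.
query = ("discount", (("Var", "c"), ("Var", "p"), ("Var", "d")))
answers = [tuple(walk(t, env)[1] for t in query[1]) for env in solve([query], {})]
print(answers)  # [('Peter Miller', 'Porsche', '7.5 percent')]
xml_text = to_ruleml("discount", answers[0])
print(xml_text)
```

A depth-first strategy like this one can loop on left-recursive rules; practical Datalog engines add tabling or evaluate such rules bottom-up.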

Notice that we explained the first rule in a bottom-up manner and the second rule in a top-down manner. Actually, each rule can be used in both ways; the RuleML kernel is neutral with respect to any use direction.
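The bottom-up direction can be sketched as naive forward chaining: apply all rules to the known facts until nothing new is derived. This is a minimal illustration under a hypothetical tuple representation (facts are ground, so their arguments are plain strings; rule terms are ('Var', name) or ('Ind', value)). For positive Datalog rules like these, the fixpoint it reaches is the least Herbrand model of the rules and facts, and it contains the same discount atom as the top-down reading.

```python
FACTS = {
    ("spending", ("Peter Miller", "min 5000 euro", "previous year")),
    ("luxury", ("Porsche",)),
}
RULES = [  # (head, body conjunction)
    (("premium", (("Var", "customer"),)),
     [("spending", (("Var", "customer"), ("Ind", "min 5000 euro"), ("Ind", "previous year")))]),
    (("discount", (("Var", "customer"), ("Var", "product"), ("Ind", "7.5 percent"))),
     [("premium", (("Var", "customer"),)), ("luxury", (("Var", "product"),))]),
]

def match(atom, fact, env):
    """Match a rule atom against a ground fact; return extended env or None."""
    rel, terms = atom
    f_rel, values = fact
    if rel != f_rel or len(terms) != len(values):
        return None
    env = dict(env)
    for (kind, name), value in zip(terms, values):
        if kind == "Var":
            if env.setdefault(name, value) != value:
                return None  # conflicting binding
        elif name != value:
            return None  # individual constants differ
    return env

def body_solutions(body, known, env):
    """Yield every env that makes all body atoms known facts."""
    if not body:
        yield env
        return
    for fact in known:
        new = match(body[0], fact, env)
        if new is not None:
            yield from body_solutions(body[1:], known, new)

def instantiate(atom, env):
    """Ground a head atom with the bindings found for its body."""
    rel, terms = atom
    return rel, tuple(env[name] if kind == "Var" else name for kind, name in terms)

def forward_chain(facts, rules):
    """Apply every rule to the current facts until a fixpoint is reached."""
    known = set(facts)
    while True:
        new = {instantiate(head, env)
               for head, body in rules
               for env in body_solutions(body, known, {})} - known
        if not new:
            return known
        known |= new

model = forward_chain(FACTS, RULES)
print(sorted(model))
```

The first round derives the premium atom, the second uses it to derive the discount atom, and the third adds nothing, so the loop stops.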

RuleML uses Datalog as the kernel of its family of sublanguages. Its syntax is defined by an XML Schema. Its semantics is defined via Herbrand models. Various RuleML Datalog implementations exist, including one as part of jDREW.