RuleML DTDs

Harold Boley, Benjamin Grosof, Said Tabet, Gerd Wagner

Version History, 2001-01-25: Version 0.7

Version History, 2001-07-11: Version 0.8

Latest version: www.ruleml.org/spec

Ongoing Work: XML Schema Version 0.8

All DTDs: DTD Directory

Some Examples: Examples Directory




This is a revised DTD draft for RuleML. Each DTD in the evolving hierarchy corresponds to a specific RuleML sublanguage. The DTDs use a modularization approach similar to the one in XHTML in order to offer appropriate flexibility and accommodate different implementations and approaches. We will write a technical report on this system of RuleML DTDs (see also KR Principles and DTD Modularization).

Changes

The current Version 0.8 differs from the earlier Version 0.7 principally by shifting the approach from a positional representation of rules towards an object-centered one. This was motivated by RuleML's semantic neutrality w.r.t. backward and forward reasoning as well as by a comparison of the mostly positional XML data model with the mostly object-centered RDF data model: In Version 0.7, a rule ('if') element's children were positionalized, XML-style, in the fixed order of a conclusion followed by premises; in Version 0.8, a rule ('imp') element's children are predicate-object ('role'-'type') pairs, RDF-style, whose positions are immaterial. Thus, while Version 0.8 and future versions remain based on XML, we make an object-centered usage of XML via novel 'role' tag names complementing the normal 'type' tag names. In order not to change too many things at once, we have not yet moved from DTDs to XML Schemas. However, we already see some places where the added expressive power of XML Schemas could help Version 0.8, in particular, XML Schema datatypes. The XML Schema expressiveness should also improve the other important change in RuleML, namely the replacement of Version 0.7 ur elements by corresponding Version 0.8 attributes. This permits not only each ind but also each rel (as required for RDF triples) etc. to be regarded as a URI 'object'. Important additions in RuleML 0.8 are the n-tuple (tup) and role-list (roli) datatypes, which can be employed, respectively, in a positional and non-positional manner, to replace all n-ary operations by unary ones.

The DTD files etc. of the earlier Version 0.7 will be kept "as is"; actually, RuleML 0.7 can be regarded as a language in its own right, which may serve future RuleML versions as a "purely positional" reference language for feature comparisons etc. Since to our knowledge not many rulebases have yet been written in Version 0.7 (with the notable exception of GEDCOM), we currently have no full-blown XSLT translator for automatically upgrading them to Version 0.8. However, there is a GEDCOM-oriented upgrading translator (cf. the GEDCOM entry of the RuleML Rulebase Library), which should be easy to adapt to other RuleML 0.7-to-0.8 upgrades. Version 0.8 was partially inspired by a presentation of RuleML 0.7 to an RDF audience. Hence, a RuleML 0.7-to-RuleML 0.8 translator could probably be constructed taking our experimental RuleML 0.7-to-RDF translator as a starting point. Since RuleML 0.7 is a positional system with similarities to RFML, a 0.7-to-0.8 translator could also take advantage of the translator from RFML to RuleML 0.8.

Overview

The upper layer of the RuleML hierarchy of rules is discussed in our main page's design section. In that terminology, the system of RuleML DTDs presented here only covers derivation rules, not reaction rules (special tags for facts have now been introduced).

This is because we think it is important to start with a subset of simple rules, test and refine our principal strategy using these, and then work 'up' to the more general categories of rules in the hierarchy. For this we choose Datalog, a language corresponding to relational databases (ground facts without complex domains or 'constructors') augmented by views (possibly recursive rules), and work a few steps upwards to further declarative rules as allowed in (equational) Horn logic. We also introduce a URL/URI language corresponding to simple objects. The 'UR'-Datalog join of both of these classes then permits inferences over RDF-like 'resources' and can be re-specialized to RDF triples: hierarchy slide.

Regarding the concrete markup syntax, we have been experimenting with several DTDs prior to the current, still preliminary, version. The rationale for our current tags is as follows. Rather than leaving conjunction implicit, an explicit tag pair <and> ... </and> with a sequence of N conjuncts is used (this would preferably be a set of conjuncts), preparing the unavoidable explicit markup of other boolean connectives (mainly <or> ... </or>) and their nesting. As a result of previous discussions, RuleML now uses an XML-RDF-unified data model with "Order-Labeled (OrdLab) Trees" as its notational base; cf. A Web Data Model Unifying XML and RDF. In particular, we conventionally mark up RDF-like 'predicates' or N3-like 'verbs', here called 'roles', by "_"-prefixed tags in XML (if all class-like 'type' tags started with an upper-case letter, then 'role' tags could also be distinguished, Java-like, by having them start with a lower-case letter, as in The FRODO rdf2java Tool; alternatively, different namespaces for RuleML types and roles could be used); dually to regarding "_" as a prefix to be reserved as the first character of role tag names, it can also be viewed as an extension of the opening angle brackets of role start tags, "<_", and end tags, "</_". Using an atom (for a single premise) or an and (for a conjunction of premises) in the role of the body and an atomic conclusion in the role of the head, rules aggregate two commutative roles; in particular, our Horn-like implication rules equivalently become <imp> <_body> <and> prem1 ... premN </and> </_body> <_head> conc </_head> </imp> or become <imp> <_head> conc </_head> <_body> <and> prem1 ... premN </and> </_body> </imp> (thus unifying KIF's "implication" and "reverse implication" syntaxes).
The main advantage of roles is that of feature-term or object-centered modeling: If some extra information is to be added to an element such as a priority factor to the imp element, then it is easy to attach, RDF-like, a new _priority role with a float-type value; on the other hand the insertion, XML-like, of the float-type value directly into the child sequence would (be harder to read and) cause all subsequent children to assume a new position in the element (a problem for processing via XSLT etc.). The (head and body) roles of the two subelements (children) of the XML element <imp> head body </imp> or <imp> body head </imp> enable commutativity at the cost of introducing an extra level of markup (but see below). Future equivalence rules <equiv> lequiv requiv </equiv>, with interchangeable (lequiv and requiv) subelements, could use (implicit) _1 and _2 roles also used in all other RuleML connectives such as a binary and. The extra level of markup introduced by roles is most valuable, hence only used in RuleML 0.8, when there are meaningful role names such as <_head> and <_body> (as contrasted to 'structural' role names such as _1 and _2): If meaningful role names are visualized as arc labels in a tree representation like RuleML 0.8's T3 in section Context, this tree does not entail extra depth compared to RuleML 0.7's T1, only meaningful extra names for certain arcs. In general, the type-role alternation of 0.8 markups, similar to RDF, is nicely visualized via a node-label alternation in trees. Moreover, two different roles can be used to uniquely access (e.g., via XSLT) the same type in a position-independent manner such as when the <_head> and <_body> of a (single-premise) rule are both the <atom> type. Even when a type is later changed (say, from <atom> to <and>), the role (say, <_body>) can stay the same (e.g., for uniform XSLT access). Finally, a graphical RuleML editor could directly work on T3-like (OrdLab) trees. 
The backward variant <imp> head body </imp> of most examples below makes them better comparable to the standard notation of Horn rules. The forward variant <imp> body head </imp>, better comparable to production rules, will be exemplified in section Abstraction, T3[9of16]. Future directed equations could be easily added via a 'foot' role for an equation's right-hand side (the defined function's returned value): <direq> head foot body </direq> or <direq> head body foot </direq>. In the new data model an element can have "mixed content" in the new sense of having both 'role' and 'type' children (see the atom examples in T3/X3 below whose content consists of one _opr-role child before (or after) _1, _2, ... var-type children): while the 'type' children form an ordered sequence as in XML (without need for RDF's Sequence container), the 'role' children are commutative as in RDF (treating an ordered sequence as a unit, as if it were reified into a Sequence container under an _args role as in T6/X6). The "_"-integers _1, _2, ... can be viewed as 'system-generated' roles, which are always useful when one runs out of (meaningful) 'user-defined' roles: As with 'rest' variables in Lisp, Prolog, etc., there is at most one _1, _2, ... sequence per element for capturing otherwise unnamed, normally adjacent, children (normally, no other, role-named child should intervene in this sequence, so it is improbable that we will introduce _op-role infix variants in the future). In RuleML 0.8 the (implicit) roles _1, _2, ... are used generically for the arguments of relations, functions, and constructors; they could also have been named _arg1, _arg2, ..., but we wanted to be consistent with RDF's rdf:_1, rdf:_2, ... container-element predicates and with numeric indexes into 1-dimensional arrays. Similarly, the roles _opr, _opf, _opc are substitutes (required by DTDs) for a generic role _op for the operator of relations, functions, and constructors.
Role names need only discriminate between the children of an element, and the current set was partially chosen for mnemonic reasons. While the _op roles could be avoided by regarding an operator, Lisp-like, as part of its own argument sequence (as if it were role _0 or _arg0), we did introduce them in RuleML 0.8 for the following reasons: An _op role for the operator complements one (implicit) _args role for a sequence of the arguments (explicit in T6/X6 of section Context) or a sequence of (implicit) _arg1, _arg2, ... or _1, _2, ... roles for the arguments (explicit in T4/X4 and T5/X5 of section Context). Using n-tuples, _op complements one (implicit) _1 role for a tup of the arguments (cf. T7/X7 of section Context). There may be further sibling roles to discriminate _op from on the level of atom (and nano and cterm) types such as a _qual(ification) role or a _comment role. The _op-role notation allows both prefix and postfix variants, as well as possible future infix variants (especially for binary relations, functions, and constructors), where the _op role would intervene in the sequence of consecutive _1, _2, ... children (which would hinder our current sequence-as-Seq view). While in the current first-order RuleML sublanguages the role _opr always has a child of type rel (similarly for _opf and _opc), so that there is some duplicate role-type markup, in future (syntactically) higher-order RuleML sublanguages the role _opr may also have children of type var (for relation-valued variables) or type nano (for relation-valued function calls) and the roles _1, _2, ... may also have children of type rel (for relations as arguments), so that no role-type redundancy remains in the markup. Full role markup can also give us uniformity with a future feature-term sublanguage within RuleML (prepared by the current role-list datatype in http://www.ruleml.org/0.8/dtd/ruleml-hornlog.dtd). This should be closely coordinated with the use of F-Logic in Triple.
Finally, this brings RuleML closer to RDF, DAML+OIL, and other languages for the Semantic Web. Similarly, we keep the _head role inside the fact type, which originates from, and permits access uniform with, its rule (or clause) ancestor imp, since there soon should be other named roles in facts, e.g. _priority, so _head will not stay 'lonely' for long.
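The position independence that roles provide can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: the helper names (head_relation, body_relations, first_child_tag) are ours, the markup follows the imp/_head/_body/_opr pattern described above, and the _priority element with its 0.5 value is hypothetical, since the text only says that such a role could carry a float-type value.

```python
import xml.etree.ElementTree as ET

HEAD = "<_head><atom><_opr><rel>own</rel></_opr><var>person</var><var>object</var></atom></_head>"
BODY = (
    "<_body><and>"
    "<atom><_opr><rel>buy</rel></_opr><var>person</var><var>merchant</var><var>object</var></atom>"
    "<atom><_opr><rel>keep</rel></_opr><var>person</var><var>object</var></atom>"
    "</and></_body>"
)
# Hypothetical extra role: the concrete markup of a priority value is an assumption.
PRIORITY = "<_priority>0.5</_priority>"

BACKWARD = f"<imp>{HEAD}{BODY}</imp>"                  # head first, Horn-rule style
FORWARD = f"<imp>{BODY}{HEAD}</imp>"                   # body first, production-rule style
WITH_PRIORITY = f"<imp>{PRIORITY}{HEAD}{BODY}</imp>"   # extra role inserted in front


def head_relation(imp_xml):
    """Find the relation name of the conclusion through the _head role."""
    imp = ET.fromstring(imp_xml)
    return imp.find("_head/atom/_opr/rel").text


def body_relations(imp_xml):
    """Collect the relation names of all premises through the _body role."""
    imp = ET.fromstring(imp_xml)
    return [rel.text for rel in imp.find("_body").iter("rel")]


def first_child_tag(imp_xml):
    """Positional access for contrast: whatever happens to be the first child."""
    return ET.fromstring(imp_xml)[0].tag


for variant in (BACKWARD, FORWARD, WITH_PRIORITY):
    print(head_relation(variant), body_relations(variant), first_child_tag(variant))
```

Role-based access returns the same head and body for all three variants, while the purely positional lookup sees a different first child each time.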

Context

Here we exemplify RuleML 0.8 in the context of six versions of rule representations as trees and their corresponding XML markups, from totally ordered (most concise) to totally labeled, with Seq containers (most verbose). The first version corresponds to RuleML 0.7 with if replaced by imp. The last version is the one most related to RDF. The T3/X3 version corresponds to RuleML 0.8, which employs RDF-like role labels exactly where they prevent order overspecification (which arbitrarily puts a non-positional type into the child order) and uses the natural XML child order instead of RDF's Seq containers.

We will use the following sample rule:

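X1:
<imp>
  <atom><rel>own</rel><var>person</var><var>object</var></atom>
  <and>
    <atom><rel>buy</rel><var>person</var><var>merchant</var><var>object</var></atom>
    <atom><rel>keep</rel><var>person</var><var>object</var></atom>
  </and>
</imp>

X2:
<imp>
  <_head>
    <atom><rel>own</rel><var>person</var><var>object</var></atom>
  </_head>
  <_body>
    <and>
      <atom><rel>buy</rel><var>person</var><var>merchant</var><var>object</var></atom>
      <atom><rel>keep</rel><var>person</var><var>object</var></atom>
    </and>
  </_body>
</imp>

X3:
<imp>
  <_head>
    <atom><_opr><rel>own</rel></_opr><var>person</var><var>object</var></atom>
  </_head>
  <_body>
    <and>
      <atom><_opr><rel>buy</rel></_opr><var>person</var><var>merchant</var><var>object</var></atom>
      <atom><_opr><rel>keep</rel></_opr><var>person</var><var>object</var></atom>
    </and>
  </_body>
</imp>

X4:
<imp>
  <_head>
    <atom><_opr><rel>own</rel></_opr><_1><var>person</var></_1><_2><var>object</var></_2></atom>
  </_head>
  <_body>
    <and>
      <atom><_opr><rel>buy</rel></_opr><_1><var>person</var></_1><_2><var>merchant</var></_2><_3><var>object</var></_3></atom>
      <atom><_opr><rel>keep</rel></_opr><_1><var>person</var></_1><_2><var>object</var></_2></atom>
    </and>
  </_body>
</imp>

X5:
<imp>
  <_head>
    <atom><_opr><rel>own</rel></_opr><_1><var>person</var></_1><_2><var>object</var></_2></atom>
  </_head>
  <_body>
    <and>
      <_1>
        <atom><_opr><rel>buy</rel></_opr><_1><var>person</var></_1><_2><var>merchant</var></_2><_3><var>object</var></_3></atom>
      </_1>
      <_2>
        <atom><_opr><rel>keep</rel></_opr><_1><var>person</var></_1><_2><var>object</var></_2></atom>
      </_2>
    </and>
  </_body>
</imp>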

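X6:
<imp>
  <_head>
    <atom>
      <_opr><rel>own</rel></_opr>
      <_args><Seq><_1><var>person</var></_1><_2><var>object</var></_2></Seq></_args>
    </atom>
  </_head>
  <_body>
    <and>
      <_args>
        <Seq>
          <_1>
            <atom>
              <_opr><rel>buy</rel></_opr>
              <_args><Seq><_1><var>person</var></_1><_2><var>merchant</var></_2><_3><var>object</var></_3></Seq></_args>
            </atom>
          </_1>
          <_2>
            <atom>
              <_opr><rel>keep</rel></_opr>
              <_args><Seq><_1><var>person</var></_1><_2><var>object</var></_2></Seq></_args>
            </atom>
          </_2>
        </Seq>
      </_args>
    </and>
  </_body>
</imp>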

A comparison of these six tree/markup versions shows the following: Version T1/X1 overspecifies the positions of rule heads and tails and of atom oprs. Version T2/X2 still overspecifies the positions of atom oprs. Version T3/X3 has no overspecification or redundancy. Version T4/X4 uses redundant labels on atom arguments. Version T5/X5 uses additional redundant labels on and arguments. Version T6/X6 Seq-reifies the redundantly labeled arguments. For RuleML 0.8 we thus chose Version T3/X3.
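The step from the implicit argument roles of Version T3/X3 to the explicit labels of Version T4/X4 is mechanical, which is one way to see why those labels are redundant. The following sketch uses Python's standard xml.etree.ElementTree; the function name label_arguments is ours, and the rule "a child is a role if its tag starts with an underscore" is taken from the "_"-prefix convention described in the Overview.

```python
import copy
import xml.etree.ElementTree as ET

X3_ATOM = "<atom><_opr><rel>buy</rel></_opr><var>person</var><var>merchant</var><var>object</var></atom>"
X4_ATOM = (
    "<atom><_opr><rel>buy</rel></_opr>"
    "<_1><var>person</var></_1><_2><var>merchant</var></_2><_3><var>object</var></_3></atom>"
)


def label_arguments(atom):
    """Return a copy of the element whose positional (type) children are
    wrapped in explicit _1, _2, ... roles; role children are copied as-is."""
    labeled = ET.Element(atom.tag, atom.attrib)
    position = 0
    for child in atom:
        if child.tag.startswith("_"):
            labeled.append(copy.deepcopy(child))      # already a role, e.g. _opr
        else:
            position += 1
            role = ET.SubElement(labeled, f"_{position}")
            role.append(copy.deepcopy(child))         # type child gets its index as role
    return labeled


converted = label_arguments(ET.fromstring(X3_ATOM))
print(ET.tostring(converted, encoding="unicode"))
```

Because the labels can be regenerated from child order alone, they add no information to the T3/X3 form.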

However, RuleML 0.8 also provides an n-tuple datatype in its http://www.ruleml.org/0.8/dtd/ruleml-hornlog.dtd. This tup type can be employed to reduce all atoms to binary "_opr"-"_1" element pairs, where the original n arguments after opr become the n elements of a single tup argument under an (implicit) _1 role. Similarly for nanos and cterms. Such a 'tupping' of arguments may be used to confine positional types to n-tuples and other built-ins (e.g., 'and'). The Version T7/X7 below exemplifies.

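X7:
<imp>
  <_head>
    <atom><_opr><rel>own</rel></_opr><tup><var>person</var><var>object</var></tup></atom>
  </_head>
  <_body>
    <and>
      <atom><_opr><rel>buy</rel></_opr><tup><var>person</var><var>merchant</var><var>object</var></tup></atom>
      <atom><_opr><rel>keep</rel></_opr><tup><var>person</var><var>object</var></tup></atom>
    </and>
  </_body>
</imp>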
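The 'tupping' of arguments can be sketched as a small transformation in Python's standard xml.etree.ElementTree. The function name tup_arguments is ours, and the sketch assumes that the positional children of an atom are exactly those whose tags do not start with an underscore; it is not taken from any RuleML tool.

```python
import copy
import xml.etree.ElementTree as ET

X3_ATOM = "<atom><_opr><rel>buy</rel></_opr><var>person</var><var>merchant</var><var>object</var></atom>"
X7_ATOM = (
    "<atom><_opr><rel>buy</rel></_opr>"
    "<tup><var>person</var><var>merchant</var><var>object</var></tup></atom>"
)


def tup_arguments(atom):
    """Return a copy of the atom in which all positional (type) children are
    collected, in their original order, into a single tup child."""
    tupped = ET.Element(atom.tag, atom.attrib)
    tup = ET.Element("tup")
    for child in atom:
        if child.tag.startswith("_"):
            tupped.append(copy.deepcopy(child))   # roles such as _opr stay on the atom
        else:
            tup.append(copy.deepcopy(child))      # arguments move into the tuple
    tupped.append(tup)                            # the one remaining positional child
    return tupped


print(ET.tostring(tup_arguments(ET.fromstring(X3_ATOM)), encoding="unicode"))
```

After the transformation every atom has the same shape, one _opr role plus one tup, regardless of how many arguments the relation takes.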

Abstraction

For the above T3/X3 version constituting RuleML 0.8 we now exemplify the abstraction achieved by regarding all commutative possibilities as equivalent. In this 'abstract syntax' commutative tree variants will be graph-theoretically equivalent and the corresponding commutative markup variants will be algebraically equivalent. More precisely, for trees the branching order of the (explicitly labeled) "*" arcs is immaterial and for markups the following equation holds: <element>. . .<_role1>...</_role1>. . .<_role2>...</_role2>. . .</element> = <element>. . .<_role2>...</_role2>. . .<_role1>...</_role1>. . .</element>. Note that such an abstraction is also implicit in RDF graphs and serializations, since both the triples within RDF models and the pairs within rdf:Description can be permuted without information loss. The following examples of the sixteen equivalent 'Commutations' of T3/X3 illustrate, starting with our original T3/X3 version.

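X3[1of16]:
<imp>
  <_head>
    <atom><_opr><rel>own</rel></_opr><var>person</var><var>object</var></atom>
  </_head>
  <_body>
    <and>
      <atom><_opr><rel>buy</rel></_opr><var>person</var><var>merchant</var><var>object</var></atom>
      <atom><_opr><rel>keep</rel></_opr><var>person</var><var>object</var></atom>
    </and>
  </_body>
</imp>

X3[2of16]:
<imp>
  <_head>
    <atom><var>person</var><var>object</var><_opr><rel>own</rel></_opr></atom>
  </_head>
  <_body>
    <and>
      <atom><_opr><rel>buy</rel></_opr><var>person</var><var>merchant</var><var>object</var></atom>
      <atom><_opr><rel>keep</rel></_opr><var>person</var><var>object</var></atom>
    </and>
  </_body>
</imp>

. . .

X3[9of16]:
<imp>
  <_body>
    <and>
      <atom><_opr><rel>buy</rel></_opr><var>person</var><var>merchant</var><var>object</var></atom>
      <atom><_opr><rel>keep</rel></_opr><var>person</var><var>object</var></atom>
    </and>
  </_body>
  <_head>
    <atom><_opr><rel>own</rel></_opr><var>person</var><var>object</var></atom>
  </_head>
</imp>

. . .

X3[16of16]:
<imp>
  <_body>
    <and>
      <atom><var>person</var><var>merchant</var><var>object</var><_opr><rel>buy</rel></_opr></atom>
      <atom><var>person</var><var>object</var><_opr><rel>keep</rel></_opr></atom>
    </and>
  </_body>
  <_head>
    <atom><var>person</var><var>object</var><_opr><rel>own</rel></_opr></atom>
  </_head>
</imp>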
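The equivalence of the sixteen commutations can be checked mechanically by mapping each markup variant to a canonical form in which role children are sorted by name while type children keep their order. The sketch below uses only Python's standard library; the function names canonical, atom and commutation are ours, and it considers only the prefix and postfix placements of _opr that appear in the examples above.

```python
import itertools
import xml.etree.ElementTree as ET


def canonical(element):
    """Nested tuple in which role children ("_"-prefixed tags) are sorted by
    tag name and type children keep their document order."""
    roles = sorted(
        (canonical(c) for c in element if c.tag.startswith("_")),
        key=lambda form: form[0],
    )
    types = tuple(canonical(c) for c in element if not c.tag.startswith("_"))
    return (element.tag, (element.text or "").strip(), tuple(roles), types)


def atom(rel, args, opr_first):
    """Markup for one atom, with its _opr role before or after the arguments."""
    opr = f"<_opr><rel>{rel}</rel></_opr>"
    variables = "".join(f"<var>{a}</var>" for a in args)
    return "<atom>" + (opr + variables if opr_first else variables + opr) + "</atom>"


def commutation(head_first, own_first, buy_first, keep_first):
    """One of the 2 * 2 * 2 * 2 = 16 markup variants of the sample rule."""
    head = "<_head>" + atom("own", ["person", "object"], own_first) + "</_head>"
    body = (
        "<_body><and>"
        + atom("buy", ["person", "merchant", "object"], buy_first)
        + atom("keep", ["person", "object"], keep_first)
        + "</and></_body>"
    )
    return "<imp>" + (head + body if head_first else body + head) + "</imp>"


variants = [commutation(*flags) for flags in itertools.product([True, False], repeat=4)]
forms = {canonical(ET.fromstring(v)) for v in variants}
print(len(set(variants)), "distinct markups,", len(forms), "canonical form")

# Swapping type children is NOT a commutation: argument order stays significant.
swapped = ET.fromstring(atom("own", ["object", "person"], True))
original = ET.fromstring(atom("own", ["person", "object"], True))
print(canonical(swapped) == canonical(original))
```

All sixteen strings differ textually but collapse to a single canonical form, whereas exchanging the two var children of an atom yields a different form.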

Explanations

Appended below is a preliminary DTD, designated version 0.8, for a Datalog subset of RuleML (Appendix 1). Also appended below is a simple example rulebase that conforms to that DTD, and instructions for how to validate the example against the DTD.

There is now also a family of DTDs, specified in a modular fashion (using parameter ENTITY declarations) and also designated version 0.8, at this URL: http://www.ruleml.org/0.8/dtd . Note that this family of DTDs is, overall, more raw/immature than just the Datalog member of that family. Note also that the Datalog DTD on the website is a bit more complex than, and a proper superset of, the one appended below. The one below is called "monolith" because it has stripped out the ENTITY interface declarations that are in the website (non-"monolith") version.

To see the DTDs on the website: After you click on the *.dtd files, you may have to select View | Page Source. For this reason, we provide additional *.dtd.txt links. Downloading should work anyway.

You can try things out "monolithically", as explained in Appendix 3, using the own.ruleml example of Appendix 2 (the Warnings here concern only stylistic matters).

You may also use the non-"monolith" modules to study XML's "INCLUDE"/"IGNORE" overriding method for DTDs that are read in via "ENTITY % ... SYSTEM *.dtd" declarations. But you can get the gist of the definitions also when treating most of these house-keeping directives as no-ops.

After some discussions, we found a set of tag names that sound reasonable to us. Feedback is very welcome.

Facts now use an explicit, abbreviating "fact" tag. Similarly, abbreviating tags will probably be needed for reaction rules and integrity constraints.

User comments on all levels are currently taken care of by XML; look at the sample datalog document own.ruleml.

More sample files -- each referring to the most specific DTD still validating them -- can be found at: http://www.ruleml.org/0.8/exa . See the instructions above (about View | Page Source, etc.) for viewing the content etc.

Issues

Should the 'UR' attribute in inds etc. be renamed from (XHTML-like) 'href' to (our favorite) 'uref', 'ur', 'resource', or something else?

More issues are being collected by Said Tabet.

Appendix 1: DTD for a Datalog subset of RuleML
Appendix 2: Example RuleML document: a rulebase own.ruleml
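<rulebase>

  <imp>
    <_head>
      <atom><_opr><rel>own</rel></_opr><var>person</var><var>object</var></atom>
    </_head>
    <_body>
      <and>
        <atom><_opr><rel>buy</rel></_opr><var>person</var><var>merchant</var><var>object</var></atom>
        <atom><_opr><rel>keep</rel></_opr><var>person</var><var>object</var></atom>
      </and>
    </_body>
  </imp>

  <imp>
    <_head>
      <atom><_opr><rel>buy</rel></_opr><var>person</var><var>merchant</var><var>object</var></atom>
    </_head>
    <_body>
      <atom><_opr><rel>sell</rel></_opr><var>merchant</var><var>person</var><var>object</var></atom>
    </_body>
  </imp>

  <fact>
    <_head>
      <atom><_opr><rel>sell</rel></_opr><ind>John</ind><ind>Mary</ind><ind>XMLBible</ind></atom>
    </_head>
  </fact>

  <fact>
    <_head>
      <atom><_opr><rel>keep</rel></_opr><ind>Mary</ind><ind>XMLBible</ind></atom>
    </_head>
  </fact>

</rulebase>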
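To make the intended reading of this rulebase concrete, the following sketch evaluates it bottom-up as plain Datalog, using tuples instead of XML. It is a naive forward-chaining loop written for this example only; the names RULES, FACTS, VARIABLES, match, solve and forward_chain are ours, and treating person, merchant and object as the only variables is an assumption read off the rulebase above.

```python
# Each atom is a tuple: (relation, argument, ...). Variables are listed explicitly.
VARIABLES = {"person", "merchant", "object"}

RULES = [
    # own(person, object) if buy(person, merchant, object) and keep(person, object)
    (("own", "person", "object"),
     [("buy", "person", "merchant", "object"), ("keep", "person", "object")]),
    # buy(person, merchant, object) if sell(merchant, person, object)
    (("buy", "person", "merchant", "object"),
     [("sell", "merchant", "person", "object")]),
]

FACTS = {("sell", "John", "Mary", "XMLBible"), ("keep", "Mary", "XMLBible")}


def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None on a clash."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    extended = dict(bindings)
    for term, value in zip(pattern[1:], fact[1:]):
        if term in VARIABLES:
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solve(premises, facts, bindings):
    """Yield every variable binding that satisfies all premises."""
    if not premises:
        yield bindings
        return
    for fact in facts:
        extended = match(premises[0], fact, bindings)
        if extended is not None:
            yield from solve(premises[1:], facts, extended)


def forward_chain(rules, facts):
    """Apply the rules repeatedly until no new ground atoms are derived."""
    derived = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for bindings in solve(body, derived, {}):
                atom = (head[0],) + tuple(bindings.get(t, t) for t in head[1:])
                if atom not in derived:
                    new.add(atom)
        if not new:
            return derived
        derived |= new


for atom in sorted(forward_chain(RULES, FACTS)):
    print(atom)
```

Under this reading, the sell fact yields buy(Mary, John, XMLBible) through the second rule, and together with the keep fact the first rule then yields own(Mary, XMLBible).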
Appendix 3: Instructions/Trace on Validating the example against the DTD
Go to http://www.stg.brown.edu/service/xmlvalid/
Paste in at URI: http://www.ruleml.org/0.8/exa/own.ruleml
Hit the 'Validate' button
You should get:

Validation Results for http://www.ruleml.org/0.8/exa/own.ruleml
Warnings:
line 39, http://www.ruleml.org/0.8/exa/own.ruleml: warning (901): deprecated sequence within comment ending at: --
Document validates OK.

Site Contact: Harold Boley. Page Version: 2002-03-07


"Practice what you preach": XML source of this homepage at index.xml (index.xml.txt);
transformed to HTML via the adaptation of Michael Sintek's SliML XSLT stylesheet at homepage.xsl (View | Page Source)