Reasoners and rule engines: Jena inference support

This section of the documentation describes the current support for inference available within Jena. It includes an outline of the general inference API, together with details of the specific rule engines and configurations for RDFS and OWL inference supplied with Jena.

Not all of the fine details of the API are covered here: refer to the Jena Javadoc to get the full details of the capabilities of the API.

Note that this is a preliminary version of this document. Some errors or inconsistencies are possible, and feedback to the mailing lists is welcomed.

Overview of inference support

The Jena inference subsystem is designed to allow a range of inference engines or reasoners to be plugged into Jena. Such engines are used to derive additional RDF assertions which are entailed from some base RDF together with any optional ontology information and the axioms and rules associated with the reasoner. The primary use of this mechanism is to support the use of languages such as RDFS and OWL which allow additional facts to be inferred from instance data and class descriptions. However, the machinery is designed to be quite general and, in particular, it includes a generic rule engine that can be used for many RDF processing or transformation tasks.

We will try to use the term inference to refer to the abstract process of deriving additional information and the term reasoner to refer to a specific code object that performs this task. Such usage is arbitrary and if we slip into using equivalent terms like reasoning and inference engine, please forgive us.

The overall structure of the inference machinery is illustrated below.

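Applications normally access the inference machinery by using the ModelFactory to associate a data set with some reasoner to create a new Model. Queries to the created model return not only the statements that were present in the original data but also additional statements that can be derived from the data using the rules or other inference mechanisms implemented by the reasoner.

The inference machinery is implemented at the level of the Graph SPI, so any of the different Model interfaces can be constructed around an inference graph. In particular, the Ontology API provides convenient ways to link appropriate reasoners into the OntModels that it constructs. As part of the general RDF API there is also an InfModel, an extension of the normal Model interface that provides additional control over, and access to, an underlying inference graph.

The reasoner API supports the notion of specializing a reasoner by binding it to a set of schema or ontology data using the bindSchema call. The specialized reasoner can then be attached to different sets of instance data using bind calls. Where the same schema information is used multiple times with different sets of instance data, this technique allows some inferences to be reused across the different uses of the schema. In RDF there is no strong separation between schema data and instance data, so any data, including class or property information, can be included in either the bind or bindSchema calls; the names are suggestive rather than restrictive.

To keep the design open ended Jena also includes a ReasonerRegistry. This is a static class through which the set of reasoners currently available can be examined. The ReasonerRegistry also provides convenient access to prebuilt instances of the main supplied reasoners.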

Available reasoners

Included in the Jena distribution are a number of predefined reasoners:

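Transitive reasoner: provides support for storing and traversing class and property lattices. This implements just the transitive and reflexive properties of rdfs:subPropertyOf and rdfs:subClassOf.
RDFS rule reasoner: implements a configurable subset of the RDFS entailments.
OWL, OWL Mini and OWL Micro reasoners: a set of useful but incomplete implementations of the OWL/Lite subset of the OWL/Full language.
Generic rule reasoner: a rule-based reasoner that supports user-defined rules.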

The Inference API

Generic reasoner API

Finding a reasoner

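For each type of reasoner there is a factory class (conforming to the ReasonerFactory interface), an instance of which can be used to create instances of the associated Reasoner. The factory instances can be located by going directly to a known factory class and using the static theInstance() method, or by retrieval from the global ReasonerRegistry, which stores factory instances indexed by the URI assigned to the reasoner.

In addition, there are convenience methods on the ReasonerRegistry for locating a prebuilt instance of each of the main reasoners: getTransitiveReasoner, getRDFSReasoner, getRDFSSimpleReasoner, getOWLReasoner, getOWLMiniReasoner and getOWLMicroReasoner.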

Note that the factory objects for constructing reasoners are just there to simplify the design and extension of the registry service. Once you have a reasoner instance, the same instance can be reused multiple times by binding it to different datasets, without risk of interference. There is no need to create a new reasoner instance each time.

If working with the Ontology API it is not always necessary to explicitly locate a reasoner. The prebuilt instances of OntModelSpec provide easy access to the appropriate reasoners to use for different Ontology configurations.

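Similarly, if all you want is a plain RDF Model with RDFS inference included, the convenience method ModelFactory.createRDFSModel can be used.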

Configuring a reasoner

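The behaviour of many of the reasoners can be configured. To allow arbitrary configuration information to be passed to reasoners, the configuration details are encoded in RDF. The ReasonerFactory.create method can be passed a Jena Resource object, and the properties of that object are used to configure the created reasoner.

To simplify the code required for simple cases there is also a direct Java method for setting a single configuration parameter, Reasoner.setParameter. The parameter being set is identified by the corresponding configuration property.

For the built-in reasoners the available configuration parameters are described below and are predefined in the ReasonerVocabulary class.

The parameter value can normally be a String or a structured value. For example, to set a boolean value one can use the strings "true" or "false", use a Boolean object in Java, or use an instance of xsd:boolean in RDF.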

Applying a reasoner to data

Once you have an instance of a reasoner it can then be attached to a set of RDF data to create an inference model. This can either be done by putting all the RDF data into one Model or by separating into two components - schema and instance data. For some external reasoners a hard separation may be required. For all of the built in reasoners the separation is arbitrary. The prime value of this separation is to allow some deductions from one set of data (typically some schema definitions) to be efficiently applied to several subsidiary sets of data (typically sets of instance data).

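To specialize the reasoner in this way, by partially applying it to a set of schema data, use the Reasoner.bindSchema method, which returns a new, specialized reasoner.

To bind the reasoner to the final data set and create an inference model, see the ModelFactory methods, particularly ModelFactory.createInfModel.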

Accessing inferences

Finally, having created an inference model, any API operations which access RDF statements will be able to access additional statements which are entailed from the bound data by means of the reasoner. Depending on the reasoner these additional virtual statements may all be precomputed the first time the model is touched, may be dynamically recomputed each time or may be computed on-demand but cached.

Reasoner description

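The reasoners can be described using RDF metadata, which can be searched to locate reasoners with appropriate properties. The calls Reasoner.getCapabilities and Reasoner.supportsProperty are used to access this descriptive metadata.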


Some small examples

These initial examples are not designed to illustrate the power of the reasoners but to illustrate the code required to set one up.

Let us first create a Jena model containing the statements that some property "p" is a subproperty of another property "q" and that we have a resource "a" with value "foo" for "p". This could be done by writing an RDF/XML or N3 file and reading that in but we have chosen to use the RDF API:

String NS = "urn:x-hp-jena:eg/";

// Build a trivial example data set
Model rdfsExample = ModelFactory.createDefaultModel();
Property p = rdfsExample.createProperty(NS, "p");
Property q = rdfsExample.createProperty(NS, "q");
rdfsExample.add(p, RDFS.subPropertyOf, q);
rdfsExample.createResource(NS+"a").addProperty(p, "foo");

Now we can create an inference model which performs RDFS inference over this data by using:

InfModel inf = ModelFactory.createRDFSModel(rdfsExample);  // [1]

We can then check that the resulting model shows that "a" also has property "q" with value "foo" by virtue of the subPropertyOf entailment:

Resource a = inf.getResource(NS+"a");
System.out.println("Statement: " + a.getProperty(q));

Which prints the output:

    Statement: [urn:x-hp-jena:eg/a, urn:x-hp-jena:eg/q, Literal<foo>]

Alternatively we could have created an empty inference model and then added in the statements directly to that model.

If we wanted to use a different reasoner which is not available through a convenience method, or wanted to configure one, we would change line [1]. For example, to create the same setup manually we could replace [1] by:

Reasoner reasoner = ReasonerRegistry.getRDFSReasoner();
InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample);

or even more manually by

Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(null);
InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample);

The purpose of creating a new reasoner instance, as in this last variant, is to enable configuration parameters to be set. For example, if we were to call listStatements on the inf model we would see that it also "includes" all the RDFS axioms, of which there are quite a lot. It is sometimes useful to suppress these and only see the "interesting" entailments. This can be done by setting the processing level parameter: create a description of a new reasoner configuration and pass it to the factory method:

Resource config = ModelFactory.createDefaultModel()
                              .createResource()
                              .addProperty(ReasonerVocabulary.PROPsetRDFSLevel, "simple");
Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config);
InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample);

This is a rather long winded way of setting a single parameter, though it can be useful in the cases where you want to store this sort of configuration information in a separate (RDF) configuration file. For hardwired cases the following alternative is often simpler:

Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(null);
reasoner.setParameter(ReasonerVocabulary.PROPsetRDFSLevel,
                      ReasonerVocabulary.RDFS_SIMPLE);
InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample);

Finally, suppose you have a more complex set of schema information, defined in a Model called schema, and you want to apply this schema to several sets of instance data without redoing too many of the same intermediate deductions. This can be done by using the SPI level methods:

Reasoner boundReasoner = reasoner.bindSchema(schema);
InfModel inf = ModelFactory.createInfModel(boundReasoner, data);

This creates a new reasoner, independent from the original, which contains the schema data. Any queries to an InfModel created using the boundReasoner will see the schema statements, the data statements and any statements entailed from the combination of the two. Any updates to the InfModel will be reflected in updates to the underlying data model - the schema model will not be affected.


Operations on inference models

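For many applications one simply creates a model incorporating some inference step, using the ModelFactory methods, and then works within the standard Jena Model API to access the entailed statements. However, sometimes it is necessary to gain more control over the processing or to access additional reasoner features that are not available as virtual triples.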

Validation

The most common reasoner operation which can't be exposed through additional triples in the inference model is validation. The ontology languages used with the semantic web typically allow constraints to be expressed, and the validation interface is used to detect when such constraints are violated by some data set.

A simple but typical example is that of datatype ranges in RDFS. RDFS allows us to specify the range of a property as lying within the value space of some datatype. If an RDF statement asserts an object value for that property which lies outside the given value space there is an inconsistency.
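To make the idea of a value space concrete, here is a minimal standalone sketch of the kind of test involved. It does not use Jena and is not how the RDFS reasoner is implemented; the class IntegerRangeCheck and the method inIntegerValueSpace are hypothetical names chosen for illustration. It treats a decimal lexical form as compatible with an xsd:integer range only when its value is a whole number, so 25.5 is rejected while 25 and 25.0 are accepted.

```java
import java.math.BigDecimal;

class IntegerRangeCheck {
    /**
     * Returns true if the decimal lexical form denotes a whole number,
     * i.e. a value that also lies in the value space of xsd:integer.
     * Hypothetical helper for illustration; not part of the Jena API.
     */
    static boolean inIntegerValueSpace(String lexical) {
        try {
            BigDecimal value = new BigDecimal(lexical.trim());
            // The value is integral when no fractional digits remain
            // once trailing zeros have been stripped.
            return value.signum() == 0 || value.stripTrailingZeros().scale() <= 0;
        } catch (NumberFormatException e) {
            return false; // not a number at all
        }
    }

    public static void main(String[] args) {
        for (String lexical : new String[] {"25.5", "25", "25.0", "abc"}) {
            System.out.println(lexical + " -> " + inIntegerValueSpace(lexical));
        }
    }
}
```

Note that BigDecimal accepts some forms, such as exponent notation, that are not legal xsd:decimal lexical forms, so a real check would need a stricter parser.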

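To test a data set for inconsistencies using a reasoner we use the InfModel.validate() interface. This performs a global check across the schema and instance data looking for inconsistencies. The result is a ValidityReport object which comprises a simple pass/fail flag (ValidityReport.isValid()) together with a list of specific reports (instances of the ValidityReport.Report interface) which detail any detected inconsistencies.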

For example, to check a data set and list any problems one could do something like:

Model data = RDFDataMgr.loadModel(fname);
InfModel infmodel = ModelFactory.createRDFSModel(data);
ValidityReport validity = infmodel.validate();
if (validity.isValid()) {
    System.out.println("OK");
} else {
    System.out.println("Conflicts");
    for (Iterator<ValidityReport.Report> i = validity.getReports(); i.hasNext(); ) {
        System.out.println(" - " + i.next());
    }
}

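The file testing/reasoners/rdfs/dttest2.nt declares a property bar with range xsd:integer and attaches a bar value to some resource with the value "25.5"^^xsd:decimal. Running the sample code above on this file produces: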

Conflicts
- Error (dtRange): Property http://www.hpl.hp.com/semweb/2003/eg#bar has a typed range Datatype[http://www.w3.org/2001/XMLSchema#integer -> class java.math.BigInteger]that is not compatible with 25.5:http://www.w3.org/2001/XMLSchema#decimal

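By contrast, the file testing/reasoners/rdfs/dttest3.nt uses the value "25"^^xsd:decimal instead, which is a valid integer and so passes.

Note that the individual validation records can include warnings as well as errors. A warning does not affect the overall isValid() status but may indicate an issue the application should be aware of.

A particular case of this arises with OWL. A class expression that cannot have any instances is not in itself a logical contradiction, so isValid() returns false only if there is a logical inconsistency, and such classes are reported as warnings rather than errors. To make this case easier to test for there is an additional method Report.isClean(), which returns true if the ontology is both valid (logically consistent) and generated no warnings.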

Extended list statements

The default API supports accessing all entailed information at the level of individual triples. This is surprisingly flexible but there are queries which cannot be easily supported this way. The first such case is when the query needs to make reference to an expression which is not already present in the data. For example, in description logic systems it is often possible to ask if there are any instances of some class expression, whereas using the triple-based approach we can only ask if there are any instances of some class already defined (though it could be defined by a bNode rather than be explicitly named).

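To overcome this limitation the InfModel API supports a notion of "posit", that is, a set of assertions which can be used to temporarily declare new information such as the definition of some class expression. These temporary assertions can then be referenced by the other arguments to the listStatements command. With the current reasoners this is an expensive operation: inference has to start again from scratch with the additional posits added. It is therefore worth considering preloading your data with any expressions you might need to query over.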

Direct and indirect relationships

The second type of operation that is not obviously convenient at the triple level involves distinguishing between direct and indirect relationships. If a relation is transitive, for example rdfs:subClassOf, then we can define the notion of the minimal or direct form of the relationship from which all other values of the relation can be derived by transitive closure.
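The relationship between the full (indirect) relation and its direct form can be sketched in plain Java. This is an illustrative sketch only, not the algorithm used by the Jena transitive reasoner: the class DirectRelations and its methods are hypothetical names, relations are held as a simple map from a node to its asserted parents, and the relation is assumed to be acyclic (no equivalent classes).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class DirectRelations {
    /** Transitive closure: every node reachable from each start node. */
    static Map<String, Set<String>> closure(Map<String, Set<String>> edges) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String start : edges.keySet()) {
            Set<String> seen = new TreeSet<>();
            Deque<String> todo = new ArrayDeque<>(edges.get(start));
            while (!todo.isEmpty()) {
                String node = todo.pop();
                if (seen.add(node)) {
                    todo.addAll(edges.getOrDefault(node, Collections.emptySet()));
                }
            }
            result.put(start, seen);
        }
        return result;
    }

    /** Direct form: drop every link that can be reached through an intermediate node. */
    static Map<String, Set<String>> direct(Map<String, Set<String>> edges) {
        Map<String, Set<String>> all = closure(edges);
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : all.entrySet()) {
            Set<String> keep = new TreeSet<>(e.getValue());
            for (String mid : e.getValue()) {
                keep.removeAll(all.getOrDefault(mid, Collections.emptySet()));
            }
            result.put(e.getKey(), keep);
        }
        return result;
    }

    public static void main(String[] args) {
        // eg:A is asserted to be a subclass of both eg:B and eg:C; eg:B of eg:C.
        Map<String, Set<String>> asserted = new TreeMap<>();
        asserted.put("eg:A", new TreeSet<>(Arrays.asList("eg:B", "eg:C")));
        asserted.put("eg:B", new TreeSet<>(Collections.singletonList("eg:C")));
        System.out.println("closure: " + closure(asserted));
        System.out.println("direct:  " + direct(asserted));
    }
}
```

In this example the link from eg:A to eg:C is dropped from the direct form because it can be recovered by closing over the two remaining links.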

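Normally, when an inference model is queried for a transitive relation the results show the full transitive closure, that is, all the direct and indirect relations. In some cases, such as when building a hierarchical UI widget to represent the graph, it is more convenient to see only the direct relations. In such cases we use a special property which encodes the direct or minimal version of the base relation. For the built-in reasoners this facility is currently supported for rdfs:subClassOf and rdfs:subPropertyOf, and the direct versions of these are defined in ReasonerVocabulary.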

Typically the easiest way to work with such indirect and direct relations is to use the Ontology API which hides the grubby details of these property aliases.

Derivations

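It is sometimes useful to be able to trace where an inferred statement was generated from. This is achieved using the InfModel.getDerivation(Statement) method, which returns an iterator over a set of Derivation objects through which a brief description of the source of the derivation can be obtained. Typically it is enough to just print them, which can be done using the Derivation.printTrace method. In general there can be several ways of deducing the same statement.

The general form of the Derivation objects is quite abstract, but in the case of the rule-based reasoners they have a more detailed internal structure that can be accessed; refer to the Jena Javadoc for details.

Derivation information is rather expensive to compute and store. For this reason it is not recorded by default, and InfModel.setDerivationLogging(true) must be used to enable derivations to be recorded. This should be called before any queries are made to the inference model.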

As an illustration suppose that we have a raw data model which asserts three triples:

eg:A eg:p eg:B .
eg:B eg:p eg:C .
eg:C eg:p eg:D .

and suppose that we have a trivial rule set which computes the transitive closure over relation eg:p

String rules = "[rule1: (?a eg:p ?b) (?b eg:p ?c) -> (?a eg:p ?c)]";
Reasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
reasoner.setDerivationLogging(true);
InfModel inf = ModelFactory.createInfModel(reasoner, rawData);

Then we can query whether eg:A is related through eg:p to eg:D and list the derivation route using the following code fragment:

PrintWriter out = new PrintWriter(System.out);
for (StmtIterator i = inf.listStatements(A, p, D); i.hasNext(); ) {
    Statement s = i.nextStatement();
    System.out.println("Statement is " + s);
    for (Iterator<Derivation> id = inf.getDerivation(s); id.hasNext(); ) {
        Derivation deriv = id.next();
        deriv.printTrace(out, true);
    }
}
out.flush();

Which generates the output:

Statement is [urn:x-hp:eg/A, urn:x-hp:eg/p, urn:x-hp:eg/D]
    Rule rule1 concluded (eg:A eg:p eg:D) <-
        Fact (eg:A eg:p eg:B)
    Rule rule1 concluded (eg:B eg:p eg:D) <-
        Fact (eg:B eg:p eg:C)
        Fact (eg:C eg:p eg:D)
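The shape of this trace can be reproduced with a small standalone sketch, which may help in reading derivation output. This is not the Jena rule engine: the class TransitiveDerivations and its methods are hypothetical, facts are modelled as [subject, object] pairs of the single relation eg:p, and only the first derivation found for each conclusion is recorded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TransitiveDerivations {
    /** Applies the transitivity rule to a fixpoint, recording the premises of each new fact. */
    static Map<List<String>, List<List<String>>> derive(Set<List<String>> base) {
        Set<List<String>> known = new LinkedHashSet<>(base);
        Map<List<String>, List<List<String>>> why = new LinkedHashMap<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (List<String> x : new ArrayList<>(known)) {
                for (List<String> y : new ArrayList<>(known)) {
                    if (x.get(1).equals(y.get(0))) {        // (?a p ?b) (?b p ?c)
                        List<String> c = Arrays.asList(x.get(0), y.get(1));
                        if (known.add(c)) {                 // -> (?a p ?c), if new
                            why.put(c, Arrays.asList(x, y));
                            changed = true;
                        }
                    }
                }
            }
        }
        return why;
    }

    /** Prints a fact, followed recursively by the premises it was derived from. */
    static void printTrace(List<String> fact, Map<List<String>, List<List<String>>> why, String indent) {
        List<List<String>> premises = why.get(fact);
        if (premises == null) {
            System.out.println(indent + "Fact " + fact);
        } else {
            System.out.println(indent + "Rule rule1 concluded " + fact + " <-");
            for (List<String> premise : premises) {
                printTrace(premise, why, indent + "    ");
            }
        }
    }

    public static void main(String[] args) {
        Set<List<String>> base = new LinkedHashSet<>();
        base.add(Arrays.asList("eg:A", "eg:B"));
        base.add(Arrays.asList("eg:B", "eg:C"));
        base.add(Arrays.asList("eg:C", "eg:D"));
        printTrace(Arrays.asList("eg:A", "eg:D"), derive(base), "");
    }
}
```

Recording the premises costs one extra map entry per derived fact, which gives a feel for why derivation logging is switched off by default.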

Accessing raw data and deductions

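From an InfModel it is easy to retrieve the original, unchanged data over which the model has been computed using the getRawModel() call. This returns a model equivalent to the one used in the initial bind call.

Some reasoners, notably the forward chaining rule engine, store the deduced statements in a concrete form, and this set of deductions can be obtained separately by using the getDeductionsModel() call.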

Processing control

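Having bound a Model into an InfModel by using a Reasoner, its content can still be changed by the normal add and remove calls to the InfModel. Any such change will usually cause all current deductions to be discarded, and inference will start again from scratch at the next query. Some reasoners can work incrementally.

In the non-incremental case the processing is not started until a query is made, so a sequence of adds and removes can be undertaken without redundant work being performed at each change. In some applications it can be convenient to trigger the initial processing ahead of time to reduce the latency of the first query. This can be achieved using the InfModel.prepare() call. Otherwise this call is not necessary: any query automatically triggers an internal prepare phase if one is required.

There are times when the data in a model bound into an InfModel is changed "behind the scenes" instead of through calls to the InfModel. If this occurs the results of future queries to the InfModel are unpredictable. To overcome this and force the InfModel to reconsult the raw data use the InfModel.rebind() call.

Finally, some reasoners can store both intermediate and final query results between calls. This can substantially reduce the cost of working with the inference services, at the expense of memory usage. It is possible to force an InfModel to discard all such cached state by using the InfModel.reset() call.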

Tracing

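When developing new reasoner configurations, especially new rule sets for the rule engines, it is sometimes useful to be able to trace the operations of the associated inference engine. Often, though, this generates too much information to be of use, and selective use of the print builtin can be more effective.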

Tracing is not supported by a convenience API call but, for those reasoners that support it, it can be enabled using:

reasoner.setParameter(ReasonerVocabulary.PROPtraceOn, Boolean.TRUE);

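Dynamic tracing control is sometimes possible on the InfModel itself by retrieving its underlying InfGraph and calling setTraceOn(). If you need to make use of this, see the full Javadoc for the relevant InfGraph implementation.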


The RDFS reasoner

RDFS reasoner - intro and coverage

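Jena includes an RDFS reasoner (RDFSRuleReasoner) which supports almost all of the RDFS entailments described by the RDF Core working group. The only omissions are deliberate and are described below.

This reasoner is accessed using ModelFactory.createRDFSModel or manually via ReasonerRegistry.getRDFSReasoner().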

During the preview phases of Jena, experimental RDFS reasoners were released. Some of these are still included in the code base for now, but applications should not rely on their stability or continued existence.

When configured in full mode (see below for configuration information) the RDFS reasoner implements all RDFS entailments except for the bNode closure rules. These closure rules imply, for example, that for all triples of the form:

eg:a eg:p nnn^^datatype .

we should introduce the corresponding blank nodes:

eg:a eg:p _:anon1 .
_:anon1 rdf:type datatype .

Whilst such rules are both correct and necessary to reduce RDF datatype entailment down to simple entailment, they are not useful in implementation terms. In Jena simple entailment can be implemented by translating a graph containing bNodes to an equivalent query containing variables in place of the bNodes. Such a query can directly match the literal node, and the RDF API can be used to extract the datatype of the literal. The value to applications of directly seeing the additional bNode triples, even in virtual triple form, is negligible and so this has been deliberately omitted from the reasoner.


RDFS configuration

The RDFSRuleReasoner can be configured to work at three different compliance levels:

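Full: implements all of the RDFS axioms and closure rules, with the exception of the bNode entailments discussed above. This is an expensive mode because all statements in the data graph need to be checked for possible use of container membership properties. It also generates type assertions for all resources and properties mentioned in the data.
Default: omits the expensive checks for container membership properties and the "everything is a resource" and "everything used as a property is one" rules. This mode does include all the axiomatic rules. Thus, for example, even querying an "empty" RDFS InfModel will return triples such as [rdf:type rdfs:range rdfs:Class].
Simple: implements just the transitive closure of subPropertyOf and subClassOf relations, the domain and range entailments and the implications of subPropertyOf and subClassOf. It omits all of the axioms. This is probably the most useful mode but is not the default because it is a less complete implementation of the standard.

The level can be set using the setParameter call, e.g.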

reasoner.setParameter(ReasonerVocabulary.PROPsetRDFSLevel,
                      ReasonerVocabulary.RDFS_SIMPLE);

or by constructing an RDF configuration description and passing that to the RDFSRuleReasonerFactory e.g.

Resource config = ModelFactory.createDefaultModel()
                  .createResource()
                  .addProperty(ReasonerVocabulary.PROPsetRDFSLevel, "simple");
Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config);

Summary of parameters

Parameter	Values	Description
PROPsetRDFSLevel	"full", "default", "simple"	Sets the RDFS processing level as described above.
PROPenableCMPScan	Boolean	If true forces a preprocessing pass which finds all usages of rdf:_n properties and declares them as ContainerMembershipProperties. This is implied by setting the level parameter to "full" and is not normally used directly.
PROPtraceOn	Boolean	If true switches on exhaustive tracing of rule executions at the INFO level.
PROPderivationLogging	Boolean	If true causes derivation routes to be recorded internally so that future getDerivation calls can return useful information.


RDFS Example

As a complete worked example let us create a simple RDFS schema, some instance data and use an instance of the RDFS reasoner to query the two.

We shall use a trivial schema:

  <rdf:Description rdf:about="eg:mum">
    <rdfs:subPropertyOf rdf:resource="eg:parent"/>
  </rdf:Description>
 
  <rdf:Description rdf:about="eg:parent">
    <rdfs:range  rdf:resource="eg:Person"/>
    <rdfs:domain rdf:resource="eg:Person"/>
  </rdf:Description>

  <rdf:Description rdf:about="eg:age">
    <rdfs:range rdf:resource="xsd:integer" />
  </rdf:Description>

This defines a property parent from Person to Person, a sub-property mum of parent and an integer-valued property age.

We shall also use the even simpler instance file:

  <Teenager rdf:about="eg:colin">
      <mum rdf:resource="eg:rosy" />
      <age>13</age>
  </Teenager>

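This defines a Teenager called colin who has a mum rosy and an age of 13.

The following code fragment reads files containing these definitions, creates an inference model and queries it for information on the rdf:type of colin and the rdf:type of Person: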

Model schema = RDFDataMgr.loadModel("file:data/rdfsDemoSchema.rdf");
Model data = RDFDataMgr.loadModel("file:data/rdfsDemoData.rdf");
InfModel infmodel = ModelFactory.createRDFSModel(schema, data);

Resource colin = infmodel.getResource("urn:x-hp:eg/colin");
System.out.println("colin has types:");
printStatements(infmodel, colin, RDF.type, null);

Resource Person = infmodel.getResource("urn:x-hp:eg/Person");
System.out.println("\nPerson has types:");
printStatements(infmodel, Person, RDF.type, null);

This produces the output:

colin has types:
 - (eg:colin rdf:type eg:Teenager)
 - (eg:colin rdf:type rdfs:Resource)
 - (eg:colin rdf:type eg:Person)

Person has types:
 - (eg:Person rdf:type rdfs:Class)
 - (eg:Person rdf:type rdfs:Resource)

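This says that colin is a Teenager (by direct definition), a Person (because he has a mum, which means he has a parent, and the domain of parent is Person) and an rdfs:Resource. It also says that Person is an rdfs:Class, even though that was not explicitly stated in the schema, because it is used as the object of range and domain statements.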
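The eg:Person type reported for eg:colin is not asserted anywhere in the instance file. The following standalone sketch shows one way such a conclusion can be reached by applying the subPropertyOf, domain and range entailments to a fixpoint. It is an illustration under simplifying assumptions, not the RDFSRuleReasoner rule set: the class RdfsEntailmentSketch and its methods are hypothetical, triples are plain lists of three strings, and the literal-valued age statement is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RdfsEntailmentSketch {
    /** Builds a triple as a three-element list: subject, predicate, object. */
    static List<String> t(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    /** Adds subPropertyOf, domain and range consequences until nothing new appears. */
    static Set<List<String>> entail(Collection<List<String>> asserted) {
        Set<List<String>> known = new LinkedHashSet<>(asserted);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (List<String> schema : new ArrayList<>(known)) {
                for (List<String> fact : new ArrayList<>(known)) {
                    // The schema triple must describe the predicate used by the fact.
                    if (!schema.get(0).equals(fact.get(1))) continue;
                    List<String> inferred = null;
                    switch (schema.get(1)) {
                        case "rdfs:subPropertyOf":
                            inferred = t(fact.get(0), schema.get(2), fact.get(2));
                            break;
                        case "rdfs:domain":
                            inferred = t(fact.get(0), "rdf:type", schema.get(2));
                            break;
                        case "rdfs:range":
                            inferred = t(fact.get(2), "rdf:type", schema.get(2));
                            break;
                        default:
                            break;
                    }
                    if (inferred != null && known.add(inferred)) changed = true;
                }
            }
        }
        return known;
    }

    public static void main(String[] args) {
        Set<List<String>> result = entail(Arrays.asList(
            t("eg:mum", "rdfs:subPropertyOf", "eg:parent"),
            t("eg:parent", "rdfs:domain", "eg:Person"),
            t("eg:parent", "rdfs:range", "eg:Person"),
            t("eg:colin", "rdf:type", "eg:Teenager"),
            t("eg:colin", "eg:mum", "eg:rosy")));
        for (List<String> triple : result) System.out.println(triple);
    }
}
```

The sketch first derives that eg:colin has eg:rosy as an eg:parent, and from that statement the domain and range declarations type both resources as eg:Person.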

If we add the additional code:

ValidityReport validity = infmodel.validate();
if (validity.isValid()) {
    System.out.println("\nOK");
} else {
    System.out.println("\nConflicts");
    for (Iterator<ValidityReport.Report> i = validity.getReports(); i.hasNext(); ) {
        ValidityReport.Report report = i.next();
        System.out.println(" - " + report);
    }
}

Then we get the additional output:

Conflicts
 - Error (dtRange): Property urn:x-hp:eg/age has a typed range
Datatype[http://www.w3.org/2001/XMLSchema#integer -> class java.math.BigInteger]
that is not compatible with 13

because the age was given using an RDF plain literal, whereas the schema requires it to be a datatyped literal which is compatible with xsd:integer.


RDFS implementation and performance notes

The RDFSRuleReasoner is a hybrid implementation. The subproperty and subclass lattices are eagerly computed and stored in a compact in-memory form using the TransitiveReasoner (see below). The identification of which container membership properties (properties like rdf:_1) are present is implemented using a preprocessing hook. The rest of the RDFS operations are implemented by explicit rule sets executed by the general hybrid rule reasoner. The three different processing levels correspond to different rule sets. These rule sets are located by looking for files "`etc/*.rules`" on the classpath and so could, in principle, be overridden by applications wishing to modify the rules.

Performance for in-memory queries appears to be good. Using a synthetic dataset we obtain the following times to determine the extension of a class from a class hierarchy:

Set	#concepts	total instances	#instances of concept	JenaRDFS	XSB*
1	155	1550	310	0.07	0.16
2	780	7800	1560	0.25	0.47
3	3905	39050	7810	1.16	2.11

The times are in seconds, normalized to a 1.1GHz Pentium processor. The XSB* figures are taken from a pre-published paper and may not be directly comparable (for example they do not include any rule compilation time) - they are just offered to illustrate that the RDFSRuleReasoner has broadly similar scaling and performance to other rule-based implementations.

The Jena RDFS implementation has not been tested and evaluated over database models. The Jena architecture makes it easy to construct such models but in the absence of caching we would expect the performance to be poor. Future work on adapting the rule engines to exploit the capabilities of the more sophisticated database backends will be considered.


The OWL reasoner

The second major set of reasoners supplied with Jena is a rule-based implementation of the OWL/lite subset of OWL/full.

The current release includes a default OWL reasoner and two smaller, faster configurations. Each of the configurations is intended to be a sound implementation of a subset of OWL/full semantics but none of them is complete (in the technical sense). For complete OWL DL reasoning use an external DL reasoner such as Pellet, Racer or FaCT. Performance (especially memory use) of the fuller reasoner configuration still leaves something to be desired and will be the subject of future work, time permitting.

See also subsection 5 for notes on more specific limitations of the current implementation.

OWL coverage

The Jena OWL reasoners could be described as instance-based reasoners. That is, they work by using rules to propagate the if- and only-if- implications of the OWL constructs on instance data. Reasoning about classes is done indirectly - for each declared class a prototypical instance is created and elaborated. If the prototype for a class A can be deduced as being a member of class B then we conclude that A is a subClassOf B. This approach is in contrast to more sophisticated Description Logic reasoners which work with class expressions and can be less efficient when handling instance data but more efficient with complex class expressions and able to provide complete reasoning.

We thus anticipate that the OWL rule reasoner will be most suited to applications involving primarily instance reasoning with relatively simple, regular ontologies and least suited to applications involving large rich ontologies. A better characterisation of the tradeoffs involved would be useful and will be sought.

We intend that the OWL reasoners should be smooth extensions of the RDFS reasoner described above. That is all RDFS entailments found by the RDFS reasoner will also be found by the OWL reasoners and scaling on RDFS schemas should be similar (though there are some costs, see later). The instance-based implementation technique is in keeping with this "RDFS plus a bit" approach.

Another reason for choosing this inference approach is that it makes it possible to experiment with support for different constructs, including constructs that go beyond OWL, by modification of the rule set. In particular, some applications of interest to ourselves involve ontology transformation which very often implies the need to support property composition. This is something straightforward to express in rule-based form and harder to express in standard Description Logics.

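The table below summarises the OWL constructs supported by each of the three reasoner configurations: the default (full) OWL rule reasoner, obtained via ReasonerRegistry.getOWLReasoner(), OWL Mini and OWL Micro. An entry of "all" means the construct is supported by all three configurations; a parenthesised entry marks a configuration with reduced support, as explained in the notes.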

Constructs	Supported by	Notes
rdfs:subClassOf, rdfs:subPropertyOf, rdf:type	all	Normal RDFS semantics supported including meta use (e.g. taking the subPropertyOf subClassOf).
rdfs:domain, rdfs:range	all	Stronger if-and-only-if semantics supported
owl:intersectionOf	all
owl:unionOf	all	Partial support. If C=unionOf(A,B) then will infer that A,B are subclasses of C, and thus that instances of A or B are instances of C. Does not handle the reverse (that an instance of C must be either an instance of A or an instance of B).
owl:equivalentClass	all
owl:disjointWith	full, mini
owl:sameAs, owl:differentFrom, owl:distinctMembers	full, mini	owl:distinctMembers is currently translated into a quadratic set of owl:differentFrom assertions.
owl:Thing	all
owl:equivalentProperty, owl:inverseOf	all
owl:FunctionalProperty, owl:InverseFunctionalProperty	all
owl:SymmetricProperty, owl:TransitiveProperty	all
owl:someValuesFrom	full, (mini)	Full supports both directions (existence of a value implies membership of someValuesFrom restriction, membership of someValuesFrom implies the existence of a bNode representing the value). Mini omits the latter "bNode introduction" which avoids some infinite closures.
owl:allValuesFrom	full, mini	Partial support, forward direction only (member of a allValuesFrom(p, C) implies that all p values are of type C). Does handle cases where the reverse direction is trivially true (e.g. by virtue of a global rdfs:range axiom).
owl:minCardinality, owl:maxCardinality, owl:cardinality	full, (mini)	Restricted to cardinalities of 0 or 1, though higher cardinalities are partially supported in validation for the case of literal-valued properties. Mini omits the bNodes introduction in the minCardinality(1) case, see someValuesFrom above.
owl:hasValue	all

The critical constructs which go beyond OWL/lite and are not supported in the Jena OWL reasoner are complementOf and oneOf. As noted above the support for unionOf is partial (due to limitations of the rule based approach) but is useful for traversing class hierarchies.

Even within these constructs rule based implementations are limited in the extent to which they can handle equality reasoning - propositions provable by reasoning over concrete and introduced instances are covered but reasoning by cases is not supported.

Nevertheless, the full reasoner passes the normative OWL working group positive and negative entailment tests for the supported constructs, though some tests need modification for the comprehension axioms (see below).

The OWL rule set does include incomplete support for validation of datasets using the above constructs. Specifically, it tests for:

Illegal existence of a property restricted by a maxCardinality(0) restriction.
Two individuals both sameAs and differentFrom each other.
Two classes declared as disjoint but where one subsumes the other (currently reported as a violation concerning the class prototypes, error message to be improved).
Range or allValuesFrom violations for DatatypeProperties.
Too many literal-values for a DatatypeProperty restricted by a maxCardinality(N) restriction.


OWL Configuration

This reasoner is accessed using ModelFactory.createOntologyModel with the prebuilt OntModelSpec OWL_MEM_RULE_INF, or manually via ReasonerRegistry.getOWLReasoner().

There are no OWL-specific configuration parameters though the reasoner supports the standard control parameters:

Parameter	Values	Description
PROPtraceOn	Boolean	If true switches on exhaustive tracing of rule executions at the INFO level.
PROPderivationLogging	Boolean	If true causes derivation routes to be recorded internally so that future getDerivation calls can return useful information.

As we gain experience with the ways in which OWL is used and the capabilities of the rule-based approach we imagine useful subsets of functionality emerging - like that supported by the RDFS reasoner in the form of the level settings.


OWL Example

As an example of using the OWL inference support, consider the sample schema and data file in the data directory - owlDemoSchema.rdf and owlDemoData.rdf.

The schema file shows a simple, artificial ontology concerning computers which defines a GamingComputer as a Computer which includes at least one bundle of type GameBundle and a component with the value gamingGraphics.

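The data file describes several hypothetical computer configurations, including two different descriptions of the configurations "whiteBoxZX" and "bigName42".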

We can create an instance of the OWL reasoner, specialized to the demo schema and then apply that to the demo data to obtain an inference model, as follows:

Model schema = RDFDataMgr.loadModel("file:data/owlDemoSchema.rdf");
Model data = RDFDataMgr.loadModel("file:data/owlDemoData.rdf");
Reasoner reasoner = ReasonerRegistry.getOWLReasoner();
reasoner = reasoner.bindSchema(schema);
InfModel infmodel = ModelFactory.createInfModel(reasoner, data);

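A typical example operation on such a model would be to find out all we know about a specific instance, for example the nForce mother board. This can be done using: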

Resource nForce = infmodel.getResource("urn:x-hp:eg/nForce");
System.out.println("nForce *:");
printStatements(infmodel, nForce, null, null);

where printStatements is defined by:

public void printStatements(Model m, Resource s, Property p, Resource o) {
    for (StmtIterator i = m.listStatements(s, p, o); i.hasNext(); ) {
        Statement stmt = i.nextStatement();
        System.out.println(" - " + PrintUtil.print(stmt));
    }
}

This produces the output:

nForce *:
 - (eg:nForce rdf:type owl:Thing)
 - (eg:nForce owl:sameAs eg:unknownMB)
 - (eg:nForce owl:sameAs eg:nForce)
 - (eg:nForce rdf:type eg:MotherBoard)
 - (eg:nForce rdf:type rdfs:Resource)
 - (eg:nForce rdf:type a3b24:f7822755ad:-7ffd)
 - (eg:nForce eg:hasGraphics eg:gamingGraphics)
 - (eg:nForce eg:hasComponent eg:gamingGraphics)

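This includes inferences based on class inheritance (being an eg:MotherBoard implies it is an owl:Thing and an rdfs:Resource), property inheritance (eg:hasComponent eg:gamingGraphics derives from hasGraphics being a subproperty of hasComponent) and cardinality reasoning (it is the sameAs eg:unknownMB because computers are defined to have only one motherboard and the two different descriptions of whiteBoxZX use these two different terms for the mother board). The anonymous rdf:type statement references the "hasValue(eg:hasComponent, eg:gamingGraphics)" restriction mentioned in the definition of GamingComputer.

A second typical operation is instance recognition, that is, testing whether an individual is an instance of a class expression. In this case whiteBoxZX is identifiable as a GamingComputer because it is a Computer, is explicitly declared as having an appropriate bundle, and can be inferred to have a gamingGraphics component from the combination of the nForce inferences already seen and the transitivity of hasComponent. We can test this using the following code fragment: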

Resource gamingComputer = infmodel.getResource("urn:x-hp:eg/GamingComputer");
Resource whiteBox = infmodel.getResource("urn:x-hp:eg/whiteBoxZX");
if (infmodel.contains(whiteBox, RDF.type, gamingComputer)) {
    System.out.println("White box recognized as gaming computer");
} else {
    System.out.println("Failed to recognize white box correctly");
}

Which generates the output:

  White box recognized as gaming computer

Finally, we can check for inconsistencies within the data by using the validation interface:

ValidityReport validity = infmodel.validate();
if (validity.isValid()) {
    System.out.println("OK");
} else {
    System.out.println("Conflicts");
    for (Iterator<ValidityReport.Report> i = validity.getReports(); i.hasNext(); ) {
        ValidityReport.Report report = i.next();
        System.out.println(" - " + report);
    }
}

Which generates the output:

Conflicts
 - Error (conflict): Two individuals both same and different, may be due to disjoint classes or functional properties
Culprit = eg:nForce2
Implicated node: eg:bigNameSpecialMB

… + 3 other similar reports

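This is due to the two records for the bigName42 configuration referencing two motherboards which are explicitly defined to be different resources and thus violate the FunctionalProperty nature of hasMotherBoard.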


OWL notes and limitations

Comprehension axioms

A critical implication of our variant of the instance-based approach is that the reasoner does not directly answer queries relating to dynamically introduced class expressions.

For example, given a model containing the RDF assertions corresponding to the two OWL axioms:

class A = intersectionOf (minCardinality(P, 1), maxCardinality(P,1))
class B = cardinality(P,1)

Then the reasoner can demonstrate that classes A and B are equivalent, in particular that any instance of A is an instance of B and vice versa. However, given a model just containing the first set of assertions you cannot directly query the inference model for the individual triples that make up cardinality(P,1). If the relevant class expressions are not already present in your model then you need to use the list-with-posits mechanism described above, though be warned that such posits start inference afresh each time and can be expensive.

Actually, it would be possible to introduce comprehension axioms for simple cases like this example. We have, so far, chosen not to do so. First, since the OWL/full closure is generally infinite, some limitation on comprehension inferences seems to be useful. Secondly, the typical queries that Jena applications expect to be able to issue would suddenly jump in size and cost - causing a support nightmare. For example, queries such as (a, rdf:type, *) would become near-unusable.

Approximately 10 of the OWL working group tests for the supported OWL subset currently rely on such comprehension inferences. The shipping version of the Jena rule reasoner passes these tests only after they have been rewritten to avoid the comprehension requirements.

Prototypes

As noted above, the current OWL rule set introduces prototypical instances for each defined class. These prototypical instances used to be visible to queries. From release 2.1 they are used internally but should no longer be visible.

Direct/indirect

We noted above that the Jena reasoners support a separation of direct and indirect relations for transitive properties such as subClassOf. The current implementation of the full and mini OWL reasoner fails to do this and the direct forms of the queries will fail. The OWL Micro reasoner, which is but a small extension of RDFS, does support the direct queries.

This does not affect querying through the Ontology API, which works around this limitation. It only affects direct RDF accesses to the inference model.

Performance

The OWL reasoners use the rule engines for all inference. The full and mini configurations omit some of the performance tricks employed by the RDFS reasoner (notably the use of the custom transitive reasoner), making those OWL reasoner configurations slower than the RDFS reasoner on pure RDFS data (typically a slowdown of around 3-4x). The OWL Micro reasoner is intended to be as close as possible to RDFS performance while also supporting the core OWL constructs as described earlier.

Once the OWL constructs are used, substantial reasoning can be required. The most expensive aspect of the supported constructs is the equality reasoning implied by use of cardinality restrictions and FunctionalProperties. The current rule set implements equality reasoning by identifying all sameAs deductions during the initial forward "prepare" phase. This may require the entire instance dataset to be touched several times searching for occurrences of FunctionalProperties.

Beyond this the rules implementing the OWL constructs can interact in complex ways leading to serious performance overheads for complex ontologies. Characterising the sorts of ontologies and inference problems that are well tackled by this sort of implementation and those best handled by plugging a Description Logic engine, or a saturation theorem prover, into Jena is a topic for future work.

One random hint: explicitly importing the owl.owl definitions causes much duplication of rule use and a substantial slow down - the OWL axioms that the reasoner can handle are already built in and don't need to be redeclared.

Incompleteness

The rule based approach cannot offer a complete solution for OWL/Lite, let alone the OWL/Full fragment corresponding to the OWL/Lite constructs. In addition the current implementation is still under development and may well have omissions and oversights. We intend that the reasoner should be sound (all inferred triples should be valid) but not complete.

The transitive reasoner

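The transitive reasoner implements the transitive closure of the rdfs:subPropertyOf and rdfs:subClassOf relations.

The GenericRuleReasoner (see below) can optionally use an instance of the transitive reasoner to handle these two properties.

The transitive reasoner has no configuration options.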

The general purpose rule engine

Overview of the rule engine(s)

Jena includes a general purpose rule-based reasoner which is used to implement both the RDFS and OWL reasoners but is also available for general use. This reasoner supports rule-based inference over RDF graphs and provides forward chaining, backward chaining and a hybrid execution model. To be more exact, there are two internal rule engines, one forward chaining RETE engine and one tabled datalog engine. They can be run separately, or the forward engine can be used to prime the backward engine, which in turn will be used to answer queries.

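The various engine configurations are all accessible through a single parameterized reasoner, GenericRuleReasoner. At a minimum a GenericRuleReasoner requires a rule set to define its behaviour.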

The rule reasoner can also be extended by registering new procedural primitives. The current release includes a starting set of primitives which are sufficient for the RDFS and OWL implementations but is easily extensible.

Rule syntax and structure

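A rule for the rule-based reasoner is defined by a Java Rule object, made up of body terms (premises) and head terms (conclusions), with an optional name. Each term or ClauseEntry is either a triple pattern, an extended triple pattern or a call to a builtin primitive.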

For convenience a rather simple parser is included with Rule which allows rules to be specified in reasonably compact form in text source files. However, it would be perfectly possible to define alternative parsers which handle rules encoded using, say, XML or RDF and generate Rule objects as output. It would also be possible to build a real parser for the current text file syntax which offered better error recovery and diagnostics.

An informal description of the simplified text rule syntax is:

Rule      :=   bare-rule .
          or   [ bare-rule ]
          or   [ ruleName : bare-rule ]

bare-rule :=   term, ... term -> hterm, ... hterm    // forward rule
          or   bhterm <- term, ... term               // backward rule

hterm     :=   term
          or   [ bare-rule ]

term      :=   (node, node, node)           // triple pattern
          or   (node, node, functor)        // extended triple pattern
          or   builtin(node, ... node)      // invoke procedural primitive

bhterm    :=   (node, node, node)           // triple pattern

functor   :=   functorName(node, ... node)  // structured literal

node      :=   uri-ref                      // e.g. http://foo.com/eg
          or   prefix:localname             // e.g. rdf:type
          or   <uri-ref>                    // e.g. <myscheme:myuri>
          or   ?varname                     // variable
          or   'a literal'                  // a plain string literal
          or   'lex'^^typeURI               // a typed literal, xsd:* type names supported
          or   number                       // e.g. 42 or 25.5

The "," separators are optional.

The difference between the forward and backward rule syntax is only relevant for the hybrid execution strategy, see below.

The functor in an extended triple pattern is used to create and access structured literal values. The functorName can be any simple identifier and is not related to the execution of builtin procedural primitives, it is just a datastructure. It is useful when a single semantic structure is defined across multiple triples and allows a rule to collect those triples together in one place.

To keep rules readable, qname syntax is supported for URI refs. The set of known prefixes is those registered with the PrintUtil object. This initially knows about rdf, rdfs, owl, xsd and a test namespace eg, but more mappings can be registered in Java code. In addition it is possible to define additional prefix mappings in the rule file, see below.

Here are some example rules which illustrate most of these constructs:

[allID: (?C rdf:type owl:Restriction), (?C owl:onProperty ?P),
        (?C owl:allValuesFrom ?D) -> (?C owl:equivalentClass all(?P, ?D)) ]

[all2: (?C rdfs:subClassOf all(?P, ?D)) -> print('Rule for ', ?C)
       [all1b: (?Y rdf:type ?D) <- (?X ?P ?Y), (?X rdf:type ?C) ] ]

[max1: (?A rdf:type max(?P, 1)), (?A ?P ?B), (?A ?P ?C)
       -> (?B owl:sameAs ?C) ]

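Rule allID illustrates the use of a functor to collect the components of an OWL restriction into a single data structure, which can then fire further rules. Rule all2 illustrates a forward rule that creates a new backward rule and also calls the print procedural primitive. Rule max1 illustrates the use of numeric literals.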

Rule files may be loaded and parsed using:

// From a URL:
List<Rule> rules = Rule.rulesFromURL("file:myfile.rules");

// Or from a reader:
BufferedReader br = /* open reader */ ;
List<Rule> rules = Rule.parseRules( Rule.rulesParserFromReader(br) );

// Or from a string holding the rules in line:
String ruleSrc = /* list of rules in line */ ;
List<Rule> rules = Rule.parseRules( ruleSrc );

In the first two cases (reading from a URL or a BufferedReader) the rule file is preprocessed by a simple processor which strips comments and supports some additional macro commands:

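# ...                     A comment line.
// ...                    A comment line.
@prefix pre: <uri-ref>.   Defines a prefix pre which can be used in the rules.
@include <uri-ref>.       Includes the rules defined in the given file. The keywords RDFS, OWL, OWLMicro and OWLMini can be used in place of a URL to include the corresponding predefined rule sets.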

So an example complete rule file which includes the RDFS rules and defines a single extra rule is:

# Example rule file
@prefix pre: <http://jena.hpl.hp.com/prefix#>.
@include <RDFS>.

[rule1: (?f pre:father ?a) (?u pre:brother ?f) -> (?u pre:uncle ?a)]

Forward chaining engine

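If the rule reasoner is configured to run in forward mode then only the forward chaining engine is used. The first time the inference Model is queried (or when an explicit prepare() call is made) all of the relevant data in the model is submitted to the rule engine. Rules that fire and create additional triples do so in an internal deductions graph, and those triples can in turn trigger further rules, until no more rules can fire.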

Once the preparation phase is complete the inference graph will act as if it were the union of all the statements in the original model together with all the statements in the internal deductions graph generated by the rule firings. All queries will see all of these statements and will be of similar speed to normal model accesses. It is possible to separately access the original raw data and the set of deduced statements if required, see above.

If the inference model is changed by adding or removing statements through the normal API then this will trigger further rule firings. The forward rules work incrementally and only the consequences of the added or removed triples will be explored. The default rule engine is based on the standard RETE algorithm (C.L Forgy, RETE: A fast algorithm for the many pattern/many object pattern match problem, Artificial Intelligence 1982) which is optimized for such incremental changes.

When run in forward mode all rules are treated as forward even if they were written in backward ("<-") syntax. This allows the same rule set to be used in different modes to explore the performance tradeoffs.

There is no guarantee of the order in which matching rules will fire or the order in which body terms will be tested; however, once a rule fires its head terms will be executed in left-to-right sequence.

In forward mode, head terms which assert backward rules (such as all1b above) are ignored.

There are in fact two forward engines included within the Jena code base. An earlier non-RETE implementation is retained for now because it can be more efficient in some circumstances; it has identical external semantics. This alternative engine is likely to be eliminated in a future release once more tuning has been done to the default RETE engine.

Backward chaining engine

If the rule reasoner is run in backward chaining mode it uses a logic programming (LP) engine with a similar execution strategy to Prolog engines. When the inference Model is queried then the query is translated into a goal and the engine attempts to satisfy that goal by matching to any stored triples and by goal resolution against the backward chaining rules.

Except as noted below, rules will be executed in top-to-bottom, left-to-right order with backtracking, as in SLD resolution. In fact, the rule language is essentially datalog rather than full Prolog: whilst the functor syntax within rules does allow some creation of nested data structures, they are flat (not recursive) and so can be regarded as syntactic sugar for datalog.

As a datalog language the rule syntax is a little surprising because it restricts all properties to be binary (as in RDF) and allows variables in any position including the property position. In effect, rules of the form:

(s, p, o), (s1, p1, o1) ... <- (sb1, pb1, ob1), ....

can be thought of as being translated to datalog rules of the form:

triple(s, p, o)    :- triple(sb1, pb1, ob1), ...
triple(s1, p1, o1) :- triple(sb1, pb1, ob1), ...
...

where "triple/3" is a hidden implicit predicate. Internally, this transformation is not actually used, instead the rules are implemented directly.

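In addition, all the data in the raw model supplied to the engine is treated as if it were a set of triple(s,p,o) facts which are prepended to the front of the rule set.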

Because the order of triples in a Model is not defined, this is one violation of strict top-to-bottom execution. Essentially all ground facts are consulted before all rule clauses, but the ordering of ground facts is arbitrary.

Tabling

The LP engine supports tabling. When a goal is tabled then all previously computed matches to that goal are recorded (memoized) and used when satisfying future similar goals. When such a tabled goal is called and all known answers have been consumed then the goal will suspend until some other execution branch has generated new results and then be resumed. This allows one to successfully run recursive rules such as transitive closure which would be infinite loops in normal SLD prolog. This execution strategy, SLG, is essentially the same as that used in the well known XSB system.

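Which goals are tabled is controlled by directives in the rule set. The directive tableAll() causes all goals to be tabled, and table(P) causes goals for the property P to be tabled. Note that if any property is tabled then goals such as (A, ?P, ?X) will also be tabled, because the property variable might match one of the tabled properties.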

Thus the rule set:

-> table(rdfs:subClassOf).
[r1: (?A rdfs:subClassOf ?C) <- (?A rdfs:subClassOf ?B) (?B rdfs:subClassOf ?C)]

will successfully compute the transitive closure of the subClassOf relation. Any query of the form (*, rdfs:subClassOf, *) will be satisfied by a mixture of ground facts and resolution of rule r1. Without the first line this rule would be an infinite loop.
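
To make the effect of tabling concrete, here is a minimal plain-Java sketch. It is not Jena's LP engine: the class TabledClosure and its methods are hypothetical names invented for this illustration, and the suspend/resume scheduling of a real SLG engine is replaced by a simple loop that repeats until no table gains a new answer. What it does share with the description above is the key idea that answers for each goal are recorded once and re-used, so rule r1 terminates even on cyclic data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy illustration of answer tabling; hypothetical class, not Jena's LP engine. */
class TabledClosure {
    // Ground facts: direct (sub rdfs:subClassOf sup) links, keyed by sub.
    private final Map<String, Set<String>> facts = new LinkedHashMap<>();
    // The table: answers found so far for each goal (a rdfs:subClassOf ?X), keyed by a.
    private final Map<String, Set<String>> table = new LinkedHashMap<>();

    void addFact(String sub, String sup) {
        facts.computeIfAbsent(sub, k -> new LinkedHashSet<>()).add(sup);
        table.clear(); // updates discard the tabled results
    }

    /** Registers a goal, seeding its answers from the ground facts; true if the goal is new. */
    private boolean tableGoal(String a) {
        if (table.containsKey(a)) {
            return false;
        }
        table.put(a, new LinkedHashSet<>(facts.getOrDefault(a, Collections.<String>emptySet())));
        return true;
    }

    /** Answers the goal (a rdfs:subClassOf ?X) by applying rule r1 over the tabled goals. */
    Set<String> superClasses(String a) {
        tableGoal(a);
        boolean changed = true;
        while (changed) { // stands in for suspending and resuming goals
            changed = false;
            for (String goal : new ArrayList<>(table.keySet())) {
                for (String b : new ArrayList<>(table.get(goal))) {
                    changed |= tableGoal(b); // subgoal (b rdfs:subClassOf ?C)
                    for (String c : new ArrayList<>(table.get(b))) {
                        changed |= table.get(goal).add(c); // r1: goal rdfs:subClassOf c
                    }
                }
            }
        }
        return Collections.unmodifiableSet(table.get(a));
    }

    public static void main(String[] args) {
        TabledClosure engine = new TabledClosure();
        engine.addFact("eg:A", "eg:B");
        engine.addFact("eg:B", "eg:C");
        engine.addFact("eg:C", "eg:A"); // cyclic data is handled too
        System.out.println(engine.superClasses("eg:A"));
    }
}
```

Because every answer set only grows and the vocabulary is finite, the loop is guaranteed to stop; an untabled depth-first evaluation of the same left-recursive rule would not.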

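The tabled results of each query are kept and re-used by later queries. They are discarded when the underlying data is updated, and can also be discarded explicitly by calling reset() on the inference Model.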

Note that backward rules can only have one consequent, so rules that might be run in either backward or forward mode should be limited to a single consequent each.

Hybrid rule engine

The rule reasoner has the option of employing both of the individual rule engines in conjunction. When run in this hybrid mode the data flows look something like this:

The forward engine runs, as described above, and maintains a set of inferred statements in the deductions store. Any forward rules which assert new backward rules will instantiate those rules according to the forward variable bindings and pass the instantiated rules on to the backward engine.

Queries are answered by using the backward chaining LP engine, employing the merge of the supplied and generated rules applied to the merge of the raw and deduced data.

This split allows the ruleset developer to achieve greater performance by only including backward rules which are relevant to the dataset at hand. In particular, we can use the forward rules to compile a set of backward rules from the ontology information in the dataset. As a simple example consider trying to implement the RDFS subPropertyOf entailments using a rule engine. A simple approach would involve rules like:

 (?a ?q ?b) <- (?p rdfs:subPropertyOf ?q), (?a ?p ?b) .

Such a rule would work but every goal would match the head of this rule and so every query would invoke a dynamic test for whether there was a subProperty of the property being queried for. Instead the hybrid rule:

(?p rdfs:subPropertyOf ?q), notEqual(?p,?q) -> [ (?a ?q ?b) <- (?a ?p ?b) ] .

would precompile all the declared subPropertyOf relationships into simple chain rules which would only fire if the query goal references a property which actually has a sub property. If there are no subPropertyOf relationships then there will be no overhead at query time for such a rule.
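
The precompilation idea can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not Jena code: HybridSketch, add and query are hypothetical names, triples are plain strings, and a visited set stands in for the loop protection that tabling would provide. The "forward" step runs as data is added and records one chain rule per declared subPropertyOf link; the "backward" step consults those chain rules only for properties that actually have one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy illustration of the hybrid idea; hypothetical class, not Jena's implementation. */
class HybridSketch {
    private final List<String[]> data = new ArrayList<>();
    // Compiled chain rules keyed by head property q: each entry p stands for
    // the backward rule [ (?a q ?b) <- (?a p ?b) ].
    private final Map<String, Set<String>> chainRules = new LinkedHashMap<>();

    /** Adds a triple; the "forward" step compiles subPropertyOf declarations into chain rules. */
    void add(String s, String p, String o) {
        data.add(new String[] {s, p, o});
        if (p.equals("rdfs:subPropertyOf") && !s.equals(o)) { // notEqual(?p, ?q)
            chainRules.computeIfAbsent(o, k -> new LinkedHashSet<>()).add(s);
        }
    }

    /** Answers the goal (?a property ?b) as a set of "subject property object" strings. */
    Set<String> query(String property) {
        Set<String> answers = new LinkedHashSet<>();
        collect(property, property, new HashSet<String>(), answers);
        return answers;
    }

    private void collect(String goal, String current, Set<String> visited, Set<String> answers) {
        if (!visited.add(current)) {
            return; // guard against subPropertyOf cycles
        }
        for (String[] t : data) {
            if (t[1].equals(current)) {
                answers.add(t[0] + " " + goal + " " + t[2]);
            }
        }
        // Only properties that actually have a compiled chain rule incur extra work here.
        for (String sub : chainRules.getOrDefault(current, Collections.<String>emptySet())) {
            collect(goal, sub, visited, answers);
        }
    }

    public static void main(String[] args) {
        HybridSketch model = new HybridSketch();
        model.add("eg:a", "eg:p", "eg:b");
        model.add("eg:p", "rdfs:subPropertyOf", "eg:q");
        System.out.println(model.query("eg:q")); // prints [eg:a eg:q eg:b]
    }
}
```

A query for a property with no declared sub-properties costs one failed map lookup beyond the scan of the ground facts, which is the point of compiling the rules in advance.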

Note that there are no loops in the above data flows. The backward rules are not employed when searching for matches to forward rule terms. This two-phase execution is simple to understand and keeps the semantics of the rule engines straightforward. However, it does mean that care needs to be taken when formulating rules. If in the above example there were ways that the subPropertyOf relation could be derived from some other relations then that derivation would have to be accessible to the forward rules for the above to be complete.

Updates to an inference Model working in hybrid mode will discard all the tabled LP results, as they do in the pure backward case. However, the forward rules still work incrementally, including incrementally asserting or removing backward rules in response to the data changes.

GenericRuleReasoner configuration

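As with the other reasoners, a set of parameters controls the behaviour of the GenericRuleReasoner. These parameters can be set using the Reasoner.setParameter call or passed to the reasoner factory in an RDF Model.

The primary parameter required to instantiate a useful GenericRuleReasoner is a rule set, which can be passed into the constructor, for example: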

String ruleSrc = "[rule1: (?a eg:p ?b) (?b eg:p ?c) -> (?a eg:p ?c)]";
List<Rule> rules = Rule.parseRules(ruleSrc);
...
Reasoner reasoner = new GenericRuleReasoner(rules);

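A short cut, useful when the rules are defined in a local text file using the syntax described earlier, is the ruleSet parameter, which gives the name of a rule file to be loaded from the classpath or the current directory.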

Summary of parameters

Parameter	Values	Description
PROPruleMode	"forward", "forwardRETE", "backward", "hybrid"	Sets the rule direction mode as discussed above. Default is "hybrid".
PROPruleSet	filename-string	The name of a rule text file which can be found on the classpath or from the current directory.
PROPenableTGCCaching	Boolean	If true, causes an instance of the TransitiveReasoner to be inserted in the forward dataflow to cache the transitive closure of the subProperty and subClass lattices.
PROPenableFunctorFiltering	Boolean	If set to true, this causes the structured literals (functors) generated by rules to be filtered out of any final queries. This allows them to be used for storing intermediate results hidden from the view of the InfModel's clients.
PROPenableOWLTranslation	Boolean	If set to true this causes a procedural preprocessing step to be inserted in the dataflow which supports the OWL reasoner (it translates intersectionOf clauses into groups of backward rules in a way that is clumsy to express in pure rule form).
PROPtraceOn	Boolean	If true, switches on exhaustive tracing of rule executions at the INFO level.
PROPderivationLogging	Boolean	If true, causes derivation routes to be recorded internally so that future getDerivation calls can return useful information.

Builtin primitives

The procedural primitives which can be called by the rules are each implemented by a Java object stored in a registry. Additional primitives can be created and registered - see below for more details.

Each primitive can optionally be used in either the rule body, the rule head or both. If used in the rule body then as well as binding variables (and any procedural side-effects like printing) the primitive can act as a test - if it returns false the rule will not match. Primitives used in the rule head are only used for their side effects.

The set of builtin primitives available at the time of writing is:

regexp('foo bar', '(.*) (.*)', ?m1, ?m2)

Example

As a simple illustration suppose we wish to create a simple ontology language in which we can declare one property as being the concatenation of two others and to build a rule reasoner to implement this.

As a simple design we define two properties eg:concatFirst, eg:concatSecond which declare the first and second properties in a concatenation. Thus the triples:

eg:r eg:concatFirst  eg:p .
eg:r eg:concatSecond eg:q .

mean that the property r = p o q.

Suppose we have a Jena Model rawData which contains the above assertions together with the additional facts:

eg:A eg:p eg:B .
eg:B eg:q eg:C .

Then we want to be able to conclude that A is related to C through the composite relation r. The following code fragment constructs and runs a rule reasoner instance to implement this:

String rules =
    "[r1: (?c eg:concatFirst ?p), (?c eg:concatSecond ?q) -> " +
    "     [r1b: (?x ?c ?y) <- (?x ?p ?z) (?z ?q ?y)] ]";
Reasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
InfModel inf = ModelFactory.createInfModel(reasoner, rawData);
// A is the resource eg:A from the facts above
Resource A = rawData.getResource("urn:x-hp:eg/A");
System.out.println("A * * =>");
StmtIterator list = inf.listStatements(A, null, (RDFNode) null);
while (list.hasNext()) {
    System.out.println(" - " + list.next());
}

When run on a rawData model containing the above four triples this generates the (correct) output:

A * * =>
 - [urn:x-hp:eg/A, urn:x-hp:eg/p, urn:x-hp:eg/B]
 - [urn:x-hp:eg/A, urn:x-hp:eg/r, urn:x-hp:eg/C]
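
The conclusion drawn by the generated rule r1b is an ordinary relational composition, which can be checked by hand without Jena. The sketch below is only that check: ConcatSketch and compose are hypothetical names, and triples are represented as three-element string lists purely for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java illustration of r = p o q; hypothetical helper, no Jena involved. */
class ConcatSketch {
    /** Returns (x r y) for every pair (x p z), (z q y) in data; triples are [s, p, o] lists. */
    static Set<List<String>> compose(Set<List<String>> data, String p, String q, String r) {
        Set<List<String>> derived = new LinkedHashSet<>();
        for (List<String> first : data) {
            if (!first.get(1).equals(p)) {
                continue;
            }
            for (List<String> second : data) {
                // Join on the shared middle node ?z.
                if (second.get(1).equals(q) && second.get(0).equals(first.get(2))) {
                    derived.add(Arrays.asList(first.get(0), r, second.get(2)));
                }
            }
        }
        return derived;
    }

    public static void main(String[] args) {
        Set<List<String>> data = new LinkedHashSet<>(Arrays.asList(
            Arrays.asList("eg:A", "eg:p", "eg:B"),
            Arrays.asList("eg:B", "eg:q", "eg:C")));
        System.out.println(compose(data, "eg:p", "eg:q", "eg:r")); // prints [[eg:A, eg:r, eg:C]]
    }
}
```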

Example 2

As a second example, we'll look at ways to define a property as being both symmetric and transitive. Of course, this can be done directly in OWL but there are times when one might wish to do this outside of the full OWL rule set and, in any case, it makes for a compact illustration.

This time we'll put the rules in a separate file to simplify editing them and we'll use the machinery for configuring a reasoner using an RDF specification. The code then looks something like this:

// Register a namespace for use in the demo
String demoURI = "http://jena.hpl.hp.com/demo#";
PrintUtil.registerPrefix("demo", demoURI);
// Create an (RDF) specification of a hybrid reasoner which
// loads its rules from an external file.
Model m = ModelFactory.createDefaultModel();
Resource configuration = m.createResource();
configuration.addProperty(ReasonerVocabulary.PROPruleMode, "hybrid");
configuration.addProperty(ReasonerVocabulary.PROPruleSet, "data/demo.rules");
// Create an instance of such a reasoner
Reasoner reasoner = GenericRuleReasonerFactory.theInstance().create(configuration);
// Load test data
Model data = RDFDataMgr.loadModel("file:data/demoData.rdf");
InfModel infmodel = ModelFactory.createInfModel(reasoner, data);
// Query for all things related to "a" by "p"
Property p = data.getProperty(demoURI, "p");
Resource a = data.getResource(demoURI + "a");
StmtIterator i = infmodel.listStatements(a, p, (RDFNode) null);
while (i.hasNext()) {
    System.out.println(" - " + PrintUtil.print(i.nextStatement()));
}

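Here we assume that the file data/demo.rules defines the property demo:p as being both symmetric and transitive using pure forward rules: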

[transitiveRule: (?A demo:p ?B), (?B demo:p ?C) -> (?A demo:p ?C) ]
[symmetricRule: (?Y demo:p ?X) -> (?X demo:p ?Y) ]

Running this on data/demoData.rdf gives the correct output:

- (demo:a demo:p demo:c)
- (demo:a demo:p demo:a)
- (demo:a demo:p demo:d)
- (demo:a demo:p demo:b)
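
The appearance of (demo:a demo:p demo:a) in this output follows from combining the two rules: symmetry gives the reverse of each link, and transitivity then chains a link with its own reverse. The plain-Java sketch below runs the same two rules to a fixpoint. It makes assumptions that should be read as such: the contents of data/demoData.rdf are not shown here, so the three asserted links in the example are hypothetical, and SymTransSketch is an invented name rather than part of Jena.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Naive forward chaining for a symmetric, transitive property; illustration only. */
class SymTransSketch {
    /** Applies both rules to (subject, object) pairs until nothing new is derived. */
    static Set<List<String>> close(Set<List<String>> pairs) {
        Set<List<String>> result = new LinkedHashSet<>(pairs);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<List<String>> snapshot = new ArrayList<>(result);
            for (List<String> ab : snapshot) {
                // symmetricRule: (?Y p ?X) -> (?X p ?Y)
                changed |= result.add(Arrays.asList(ab.get(1), ab.get(0)));
                for (List<String> bc : snapshot) {
                    // transitiveRule: (?A p ?B), (?B p ?C) -> (?A p ?C)
                    if (ab.get(1).equals(bc.get(0))) {
                        changed |= result.add(Arrays.asList(ab.get(0), bc.get(1)));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical asserted links; the real demoData.rdf may differ.
        Set<List<String>> asserted = new LinkedHashSet<>(Arrays.asList(
            Arrays.asList("demo:a", "demo:b"),
            Arrays.asList("demo:b", "demo:c"),
            Arrays.asList("demo:c", "demo:d")));
        for (List<String> pair : close(asserted)) {
            if (pair.get(0).equals("demo:a")) {
                System.out.println(" - (demo:a demo:p " + pair.get(1) + ")");
            }
        }
    }
}
```

With four connected resources the closure relates every resource to every other and to itself, which is why a query on demo:a returns four statements.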

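However, the rules in data/demo.rules are specific to demo:p. It would be better to define a new class of property, demo:TransProp, to indicate symmetric-transitive properties, and to write generic rules that apply to every property declared to be an instance of that class: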

[transitiveRule: (?P rdf:type demo:TransProp)(?A ?P ?B), (?B ?P ?C)
                     -> (?A ?P ?C) ]
[symmetricRule: (?P rdf:type demo:TransProp)(?Y ?P ?X)
                     -> (?X ?P ?Y) ]

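Alternatively, as in the subPropertyOf example earlier, the hybrid engine can be used to generate backward rules for each property declared to be a demo:TransProp, so that rules are only instantiated for the demo:TransProp properties actually present: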

-> tableAll().
[rule1: (?P rdf:type demo:TransProp) ->
[ (?X ?P ?Y) <- (?Y ?P ?X) ]
[ (?A ?P ?C) <- (?A ?P ?B), (?B ?P ?C) ]
]

Combining RDFS/OWL with custom rules

Sometimes one wishes to write generic inference rules but combine them with some RDFS or OWL inference. With the current Jena architecture limited forms of this are possible, but you need to be aware of the limitations.

There are two ways of achieving this sort of configuration within Jena (not counting using an external engine that already supports such a combination).

Firstly, it is possible to cascade reasoners, i.e. to construct one InfModel using another InfModel as the base data. The strength of this approach is that the two inference processes are separate and so can be of different sorts. For example one could create a GenericRuleReasoner whose base model is an external OWL reasoner. The chief weakness of the approach is that it is "layered" - the outer InfModel can see the results of the inner InfModel but not vice versa. For some applications that layering is fine and it is clear which way the inference should be layered, for some it is not. A second possible weakness is performance. A query to an InfModel is generally expensive and involves lots of queries to the data. The outer InfModel in our layered case will typically issue a lot of queries to the inner model, each of which may trigger more inference. If the inner model caches all of its inferences (e.g. a forward rule engine) then there may not be very much redundancy there but if not then performance can suffer dramatically.

Secondly, one can create a single GenericRuleReasoner whose rules combine rules for RDFS or OWL and custom rules. At first glance this looks like it gets round the layering limitation. However, the default Jena RDFS and OWL rulesets use the hybrid rule engine. The hybrid engine is itself layered: forward rules do not see the results of any backward rules. Thus layering is still present, though you have finer-grained control: all the inferences you want the RDFS/OWL rules to see should be forward, and all the inferences which need all of the results of the RDFS/OWL rules should be backward. Note that the RDFS and OWL rulesets assume certain settings for the GenericRuleReasoner, so a typical configuration is:

Model data = RDFDataMgr.loadModel("file:data.n3");
List<Rule> rules = Rule.rulesFromURL("myrules.rules");

GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
reasoner.setOWLTranslation(true);               // not needed in RDFS case
reasoner.setTransitiveClosureCaching(true);

InfModel inf = ModelFactory.createInfModel(reasoner, data);

Here the myrules.rules file uses @include to include one of the RDFS or OWL rule sets.

One useful variant on this option, at least in simple cases, is to manually include a pure (non-hybrid) ruleset for the RDFS/OWL fragment you want so that there is no layering problem. [The reason the default rulesets use the hybrid mode is a performance tradeoff - trying to balance the better performance of forward reasoning with the cost of computing all possible answers when an application might only want a few.]

A simple example of this is that the interesting bits of RDFS can be captured by enabling TransitiveClosureCaching and including just the four core rules:

[rdfs2:  (?x ?p ?y), (?p rdfs:domain ?c) -> (?x rdf:type ?c)]
[rdfs3:  (?x ?p ?y), (?p rdfs:range ?c) -> (?y rdf:type ?c)]
[rdfs6:  (?a ?p ?b), (?p rdfs:subPropertyOf ?q) -> (?a ?q ?b)]
[rdfs9:  (?x rdfs:subClassOf ?y), (?a rdf:type ?x) -> (?a rdf:type ?y)]
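
As a worked illustration of what these four rules derive, the following plain-Java sketch applies them repeatedly to a small set of string triples until nothing new appears. It is not how Jena evaluates them (there is no RETE network and no transitive closure cache here), and the class name CoreRdfsSketch and the eg: data in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Naive fixpoint over rdfs2, rdfs3, rdfs6 and rdfs9; illustration only, not Jena code. */
class CoreRdfsSketch {
    static List<String> t(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    static Set<List<String>> closure(Collection<List<String>> input) {
        Set<List<String>> g = new LinkedHashSet<>(input);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<List<String>> snapshot = new ArrayList<>(g);
            for (List<String> a : snapshot) {        // a plays the instance triple
                for (List<String> b : snapshot) {    // b plays the schema triple
                    String schema = b.get(1);
                    boolean samePredicate = a.get(1).equals(b.get(0));
                    if (schema.equals("rdfs:domain") && samePredicate) {          // rdfs2
                        changed |= g.add(t(a.get(0), "rdf:type", b.get(2)));
                    }
                    if (schema.equals("rdfs:range") && samePredicate) {           // rdfs3
                        changed |= g.add(t(a.get(2), "rdf:type", b.get(2)));
                    }
                    if (schema.equals("rdfs:subPropertyOf") && samePredicate) {   // rdfs6
                        changed |= g.add(t(a.get(0), b.get(2), a.get(2)));
                    }
                    if (schema.equals("rdfs:subClassOf") && a.get(1).equals("rdf:type")
                            && a.get(2).equals(b.get(0))) {                       // rdfs9
                        changed |= g.add(t(a.get(0), "rdf:type", b.get(2)));
                    }
                }
            }
        }
        return g;
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(
            t("eg:fido", "eg:hasOwner", "eg:bob"),
            t("eg:hasOwner", "rdfs:domain", "eg:Pet"),
            t("eg:hasOwner", "rdfs:range", "eg:Person"),
            t("eg:hasOwner", "rdfs:subPropertyOf", "eg:relatedTo"),
            t("eg:Pet", "rdfs:subClassOf", "eg:Animal"));
        for (List<String> triple : closure(data)) {
            System.out.println(triple);
        }
    }
}
```

Note that (eg:fido rdf:type eg:Animal) is only reachable on a second pass, after rdfs2 has produced (eg:fido rdf:type eg:Pet); this is the kind of cascade that the forward engine performs incrementally.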

Notes

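One final aspect of the general rule engine to mention is validation rules. Reasoners can implement a validate call which reports errors and warnings about inconsistencies in a dataset; the rule reasoners can do so by means of rules.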

Validation rules take the general form:

(?v rb:validation on()) ...  ->
    [ (?X rb:violation error('summary', 'description', args)) <- ... ] .

The validation calls can be "switched on" by inserting an additional triple into the graph of the form:

_:anon rb:validation on() .

This makes it possible to build rules, such as the template above, which are ignored unless validation has been switched on - thus avoiding potential overhead in normal operation. This is optional and the "validation on()" guard can be omitted.

Then the validate call queries the inference graph for all triples of the form:

?x rb:violation f(summary, description, args) .

The functor name in the object of each rb:violation triple should be either error or warning.

Future extensions will improve the formatting capabilities and flexibility of this mechanism.

Extensions

There are several places at which the rule system can be extended by application code.

Rule syntax

First, as mentioned earlier, the rule engines themselves only see rules in terms of the Rule Java object. Thus applications are free to define an alternative rule syntax so long as it can be compiled into Rule objects.

Builtins

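Second, the set of procedural builtins can be extended. A builtin should implement the Builtin interface, defining a name (getName), the number of arguments expected (getArgLength) and one or both of bodyCall and headAction. The bodyCall method is used when the builtin is invoked in the body of a rule and should return true or false according to whether the test passes.

Once the builtin has been defined, an instance of it needs to be registered with the BuiltinRegistry for it to be seen by the rule parser and interpreters.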

The easiest way to experiment with this is to look at the examples in the builtins directory.

Preprocessing hooks

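The rule reasoner can optionally run a sequence of procedural preprocessing hooks over the data at the time the inference graph is prepared. See GenericRuleReasoner.addPreprocessingHook and the RulePreprocessHook class for more details.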

Extending the inference support

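Apart from the extension points in the rule reasoner discussed above, the intention is that it should be possible to plug external inference engines into Jena. The core InfGraph and Reasoner interfaces are kept simple and generic to make this possible, and the ReasonerRegistry provides a means of mapping from reasoner ids (URIs) to reasoner instances at run time.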

In a future Jena release we plan to provide at least one adapter to an example, freely available, reasoner to both validate the machinery and to provide an example of how this extension can be done.

Futures

Contributions for the following areas would be very welcome:

Develop a custom equality reasoner which can handle the "owl:sameAs" and related processing more efficiently than the plain rules engine.
Tune the RETE engine to perform better with highly non-ground patterns.
Tune the LP engine to further reduce memory usage (in particular explore subsumption tabling rather than the current variant tabling).
Investigate routes to better integrating the rule reasoner with underlying database engines. This is a rather larger and longer term task than the others above and is the least likely to happen in the near future.