RDFLib

A Universal RDF Store Interface

This document attempts to summarize some fundamental components of an RDF store. The motivation is to outline a standard set of interfaces for providing the neccessary support needed in order to persist an RDF Graph in a way that is universal and not tied to any specific implementation. For the most part, the core RDF model is adhered to as well as terminology that is consistent with the RDF Model specifications. However, this suggested interface also extends an RDF store with additional requirements neccessary to facilitate the aspects of Notation 3 that go beyond the RDF model to provide a framework for First Order Predicate Logic processing and persistence.

Terminology

Context: A named, unordered set of statements. Also could be called a sub-graph. The named graphs literature and ontology are relevant to this concept. A context could be thought of as only the relationship between an RDF triple and a sub-graph (this is how the term context is used in the Notation 3 Design Issues page) in which it is found or the sub-graph itself.

It's worth noting that the concept of logically grouping triples within an addressable 'set' or 'subgraph' is just barely beyond the scope of the RDF model. The RDF model defines a graph as an arbitrary collection of triples and the semantics of these triples, but doesn't give guidance on how to consistently address such arbitrary collections. Though a collection of triples can be thought of as a resource itself, the association between a triple and the collection it is a part of is not covered. Public RDF is an example of an attempt to formally model this relationship - and includes one other unrelated extension: Articulated Text

Conjunctive Graph: This refers to the 'top-level' Graph. It is the aggregation of all the contexts within it and is also the appropriate, absolute boundary for closed world assumptions / models. This distinction is the low-hanging fruit of RDF along the path to the semantic web and most of it's value is in (corporate/enterprise) real-world problems:

There are at least two situations where the closed world assumption is used. The first is where it is assumed that a knowledge base contains all relevant facts. This is common in corporate databases. That is, the information it contains is assumed to be complete
From a store perspective, closed world assumptions also provide the benefit of better query response times due to the explicit closed world boundaries. Closed world boundaries can be made transparent by federated queries that assume each ConjunctiveGraph is a section of a larger, unbounded universe. So a closed world assumption does not preclude you from an open world assumption.

For the sake of persistence, Conjunctive Graphs must be distinguished by identifiers (that may not neccessarily be RDF identifiers or may be an RDF identifier normalized - SHA1/MD5 perhaps - for database naming purposes ) which could be referenced to indicate conjunctive queries (queries made across the entire conjunctive graph) or appear as nodes in asserted statements. In this latter case, such statements could be interpreted as being made about the entire 'known' universe. For example:

<urn:uuid:conjunctive-graph-foo> rdf:type :ConjunctiveGraph
<urn:uuid:conjunctive-graph-foo> rdf:type log:Truth
<urn:uuid:conjunctive-graph-foo> :persistedBy :MySQL

Quoted Statement: A statement that isn't asserted but is referred to in some manner. Most often, this happens when we want to make a statement about another statement (or set of statements) without neccessarily saying these quoted statements (are true). For example:

Chimezie said "higher-order statements are complicated"

Which can be written as (in N3):

:chimezie :said {:higherOrderStatements rdf:type :complicated}

Formula: A context whose statements are quoted or hypothetical.

Context quoting can be thought of as very similar to reification. The main difference is that quoted statements are not asserted or considered as statements of truth about the universe and can be referenced as a group: a hypothetical RDF Graph

Universal Quantifiers / Variables. (relevant references):

Terms Terms are the kinds of objects that can appear in a quoted/asserted triple. This includes those that are core to RDF:

  • Blank Nodes
  • URI References
  • Literals (which consist of a literal value,datatype and language tag)
Those that extend the RDF model into N3:
  • Formulae
  • Universal Quantifications (Variables)
And those that are primarily for matching against 'Nodes' in the underlying Graph:
  • REGEX Expressions
  • Date Ranges
  • Numerical Ranges

Nodes Nodes are a subset of the Terms that the underlying store actually persists. The set of such Terms depends on whether or not the store is formula-aware. Stores that aren't formula-aware would only persist those terms core to the RDF Model, and those that are formula-aware would be able to persist the N3 extensions as well. However, utility terms that only serve the purpose for matching nodes by term-patterns probably will only be terms and not nodes.

The set of nodes  of an RDF graph is the set of subjects and objects of triples in the graph.

Context-aware: An RDF store capable of storing statements within contexts is considered context-aware. Essentially, such a store is able to partition the RDF model it represents into individual, named, and addressable sub-graphs.

Formula-aware: An RDF store capable of distinguishing between statements that are asserted and statements that are quoted is considered formula-aware.

Such a store is responsible for maintaining this seperation and ensuring that queries against the entire model (the aggregation of all the contexts - specified by not limiting a 'query' to a specifically name context) do not include quoted statements. Also, it is responsible for distinguishing universal quantifiers (variables).

These 2 additional concepts (formulae and variables) must be thought of as core extensions and distinguishable from the other terms of a triple (for the sake of the persistence rountrip - at the very least). It's worth noting that the 'scope' of universal quantifiers (variables) and existential quantifiers (BNodes) is the formula (or context - to be specific) in which their statements reside. Beyond this, a Formula-aware store behaves the same as a Context-aware store.

Conjunctive Query: Any query that doesn't limit the store to search within a named context only. Such a query expects a context-aware store to search the entire asserted universe (the conjunctive graph). A formula-aware store is expected not to include quoted statements when matching such a query.

N3 Round Trip: This refers to the requirements on a formula-aware RDF store's persistence mechanism necessary for it to be properly populated by a N3 parser and rendered as syntax by a N3 serializer.

Transactional Store: An RDF store capable of providing transactional integrity to the RDF operations performed on it.

Interpreting Syntax

The following Notation 3 document:

{?x a :N3Programmer} => {?x :has [a :Migrane]}

Could cause the following statements to be asserted in the store:

_:a log:implies _:b

This statement would be asserted in the partition associated with quoted statements (in a formula named _:a)

?x rdf:type :N3Programmer

Finally, these statements would be asserted in the same partition (in a formula named _:b)

?x :has _:c

_:c rdf:type :Migrane

Formulae and Variables as Terms

Formulae and variables are distinguishable from URI references, Literals, and BNodes by the following syntax:
{ .. } - Formula
  ?x   - Variable
They must also be distinguishable in persistance to ensure they can be round tripped. Other issues regarding the persistence of N3 terms.

Database Management

An RDF store should provide standard interfaces for the management of database connections. Such interfaces are standard to most database management systems (Oracle, MySQL, Berkeley DB, Postgres, etc..)

The following methods are defined to provide this capability:
  • def open(self, configuration, create=True)

    Opens the store specified by the configuration string. If create is True a store will be created if it does not already exist. If create is False and a store does not already exist an exception is raised. An exception is also raised if a store exists, but there is insufficient permissions to open the store.

  • def close(self, commit_pending_transaction=False)

    This closes the database connection. The commit_pending_transaction parameter specifies whether to commit all pending transactions before closing (if the store is transactional).

  • def destroy(self, configuration)
  • This destroys the instance of the store identified by the configuration string.

The configuration string is understood by the store implementation and represents all the neccessary parameters needed to locate an individual instance of a store. This could be similar to an ODBC string, or in fact be an ODBC string if the connection protocol to the underlying database is ODBC. The open function needs to fail intelligently in order to clearly express that a store (identified by the given configuration string) already exists or that there is no store (at the location specified by the configuration string) depending on the value of create.

Triple Interfaces

An RDF store could provide a standard set of interfaces for the manipulation, management, and/or retrieval of it's contained triples (asserted or quoted):

  • def add(self, (subject, predicate, object), context=None, quoted=False)

    Adds the given statement to a specific context or to the model. The quoted argument is interpreted by formula-aware stores to indicate this statement is quoted/hypothetical. It should be an error to not specify a context and have the quoted argument be True. It should also be an error for the quoted argument to be True when the store is not formula-aware.

  • def remove(self, (subject, predicate, object), context)
  • def triples(self, (subject, predicate, object), context=None)

    Returns an iterator over all the triples (within the conjunctive graph or just the given context) matching the given pattern. The pattern is specified by providing explicit statement terms (which are used to match against nodes in the underlying store), or None - which indicates a wildcard. NOTE: This interface is expected to return an interator of tuples of length 3, corresponding to the 3 terms of matching statements, which can be either of : URIRef, Blank Node, Literal, Formula, Variable, or (perhaps) a Context

    This function can be thought of as the primary mechanism for producing triples with nodes that match the correesponding terms and term pattern provided.

    A conjunctive query can be indicated by either providing a value of NULL/None/Empty string value for context or the identifier associated with the Conjunctive Graph.

  • def __len__(self, context=None)

    Number of statements in the store. This should only account for non-quoted (asserted) statements if the context is not specified, otherwise it should return the number of statements in the formula or context given.

Formula / Context Interfaces

These interfaces work on contexts and formulae (for stores that are formula-aware) interchangeably.

  • def contexts(self, triple=None)

    Generator over all contexts in the graph. If triple is specified, a generator over all contexts the triple is in.

  • def remove_context(self, identifier)

Interface Test Cases

Basic Interface Test Case

(parsing,triple patterns,triple pattern removes,size,contextual removes):

Source Graph:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://test/> .
{:a :b :c;a :foo} => {:a :d :c}.
_:foo a rdfs:Class.
:a :d :c.

Tests

implies = URIRef("http://www.w3.org/2000/10/swap/log#implies")
a = URIRef('http://test/a')
b = URIRef('http://test/b')  
c = URIRef('http://test/c')
d = URIRef('http://test/d')
for s,p,o in g.triples((None,implies,None)):
  formulaA = s
  formulaB = o
#contexts test
assert len(list(g.contexts()))==3
#contexts (with triple) test
assert len(list(g.contexts((a,d,c))))==2
#triples test cases
assert type(list(g.triples((None,RDF.type,RDFS.Class)))[0][0]) == BNode
assert len(list(g.triples((None,implies,None))))==1
assert len(list(g.triples((None,RDF.type,None))))==3
assert len(list(g.triples((None,RDF.type,None),formulaA)))==1
assert len(list(g.triples((None,None,None),formulaA)))==2        
assert len(list(g.triples((None,None,None),formulaB)))==1
assert len(list(g.triples((None,None,None))))==5
assert len(list(g.triples((None,URIRef('http://test/d'),None),formulaB)))==1
assert len(list(g.triples((None,URIRef('http://test/d'),None))))==1
#Remove test cases
g.remove((None,implies,None))
assert len(list(g.triples((None,implies,None))))==0
assert len(list(g.triples((None,None,None),formulaA)))==2
assert len(list(g.triples((None,None,None),formulaB)))==1
g.remove((None,b,None),formulaA)
assert len(list(g.triples((None,None,None),formulaA)))==1
g.remove((None,RDF.type,None),formulaA)
assert len(list(g.triples((None,None,None),formulaA)))==0
g.remove((None,RDF.type,RDFS.Class))
#remove_context tests
formulaBContext=Context(g,formulaB)
g.remove_context(formulaB)
assert len(list(g.triples((None,RDF.type,None))))==2
assert len(g)==3
assert len(formulaBContext)==0
g.remove((None,None,None))
assert len(g)==0

Formula and Variables Term Test Case(s)

Source Graph:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://test/> .
{?x a rdfs:Class} => {?x a :Klass}.

Tests

implies = URIRef("http://www.w3.org/2000/10/swap/log#implies")
klass = URIRef('http://test/Klass')
for s,p,o in g.triples((None,implies,None)):
  formulaA = s
  formulaB = o
assert type(formulaA) == Formula
assert type(formulaB) == Formula
for s,p,o in g.triples((None,RDF.type,RDFS.Class)),formulaA):
  assert type(s) == Variable
for s,p,o in g.triples((None,RDF.type,klass)),formulaB):
  assert type(s) == Variable

Transactional Test Case(s)

Additional Terms to Model

These are a list of additional kinds of RDF terms (all of which are special Literals)

  • RegExLiteral - a REGEX string which can be used in any term slot in order to match by applying the Regular Expression to statements in the underlying graph.
  • Date (could provide some utility functions for date manipulation / serialization, etc..)
  • DateRange

Namespace Management Interfaces

The following namespace management interfaces (defined in Graph) could be implemented in the RDF store:

  • def bind(self, prefix, namespace)
  • def prefix(self, namespace)
  • def namespace(self, prefix)
  • def namespaces(self)

Open issues

  • Does the Store interface need to have an identifier property or can we keep that at the Graph level?

    The Store implementation needs a mechanism to distinguish between triples (quoted or asserted) in ConjunctiveGraphs (which are mutually exclusive universes in systems that make closed world ssumptions - and queried seperately). This is the seperation the store identifier provides. This is different from the name of a context within a ConjunctiveGraph (or the default context of a conjunctive graph). I tried to diagram the logical seperation of ConjunctiveGraphs,SubGraphs, and QuotedGraphs in this diagram

    An identifier of None can be used to indicate the store (aka all contexts) in methods like triples, __len__, etc. This works as long as we're only dealing with one Conjunctive Graph at a time -- which may not always be the case.

  • Is there any value in persisting terms that lie outside N3 (RegExLiteral,Date,etc..)?

    Potentially, not sure yet.

  • Should a conjunctive query always return quads instead of triples? Since knowing the context from which a triple was matched is an essential aspect of query construction / optimization? Or if having the triples function yield/produce different length tuples is problematic could an additional - and slightly redundant - interface be introduced?:

    def quads((subject,predicate,obj))

    Stores that weren't context-aware could simply return None as the 4th item in the produced/yielded tuples or simply not support this interface

Comments regarding A Universal RDF Store Interface

Login to submit a comment.