RDFLib

http://rdflib.net/4RDF / rdflib Merge

Licensing

The current rdflib license (or a similar BSD derivative) can be used to cover the effort, as long as appropriate credit is given to those involved.

Sean B. Palmer has released his Notation 3 parser under the W3C license in addition to the GPL license and is willing to release it under other licenses if the current two don't suffice for compatibility reasons.

Zope Interfaces

Zope Interfaces can be investigated at a later point. For now, 'vanilla' Python will be used to implement the effort.

Transactional Backends

Transaction methods should be defined, but backends should not be required to provide transactional capabilities. At the very minimum, backends that do not support transactions should be marked as such, in the same way that backends that aren't context aware are identified by their context_aware attribute.
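
One way to picture this is a capability flag that sits next to context_aware. The following is only a minimal sketch under stated assumptions, not rdflib's store API: the class names, the transaction_aware attribute, and the commit/rollback method names are all chosen here for illustration.

```python
class Store:
    """Hypothetical backend base class (all names are illustrative assumptions)."""

    context_aware = False      # the attribute mentioned in the text
    transaction_aware = False  # assumed analogous marker for transaction support

    def __init__(self):
        self.triples = set()

    def add(self, triple):
        self.triples.add(triple)

    def commit(self):
        # Defined on every backend, but a no-op where transactions are unsupported.
        pass

    def rollback(self):
        # Likewise a no-op by default: nothing is undone.
        pass


class TransactionalMemoryStore(Store):
    """Toy backend that buffers additions until commit is called."""

    transaction_aware = True

    def __init__(self):
        super().__init__()
        self.pending = set()

    def add(self, triple):
        self.pending.add(triple)

    def commit(self):
        self.triples |= self.pending
        self.pending.clear()

    def rollback(self):
        self.pending.clear()
```

Calling code could then check store.transaction_aware before relying on rollback, much as it would check context_aware before passing a context.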

Optimized Graph Interfaces

4Suite RDF defines a set of shortcuts for common triple matching patterns that the underlying driver can implement directly for speed. Most (if not all) of these map directly to rdflib.Graph interfaces:

  • subjectsFromPredsAndObj => rdflib.Graph.subjects
  • subjectsFromPredAndObjs => rdflib.Graph.subjects
  • objectsFromSubsAndPred => set(rdflib.Graph.objects)
  • objectsFromSubsAndPredNonDistinct => rdflib.Graph.objects
  • objectsFromSubAndPreds => set(rdflib.Graph.objects)
  • isResource => redundant (rdflib's statement part APIs are explicit: Literal, URIRef, and BNode)
  • resources => rdflib.Graph.triples
  • size => rdflib.Graph.__len__

See the SQL optimizations for an example of an implementation of these optimized interfaces.

The backend should be able to override these functions with its own implementation-specific optimization.
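
As a rough sketch of what such an override might look like, the example below gives one shortcut a generic, loop-based default and lets a backend replace it with an index lookup. BaseGraph, IndexedGraph, and the snake_case method names are hypothetical stand-ins written for this example; they are not rdflib or 4Suite classes.

```python
from collections import defaultdict


class BaseGraph:
    """Hypothetical graph with a generic triple scan (not rdflib's Graph)."""

    def __init__(self, triples=()):
        self._triples = []
        for triple in triples:
            self.add(triple)

    def add(self, triple):
        self._triples.append(triple)

    def subjects(self, predicate, obj):
        # Generic matching: scan every stored triple.
        for s, p, o in self._triples:
            if p == predicate and o == obj:
                yield s

    def subjects_from_preds_and_obj(self, predicates, obj):
        # Default shortcut: one generic call per predicate.
        result = []
        for predicate in predicates:
            result.extend(self.subjects(predicate, obj))
        return result


class IndexedGraph(BaseGraph):
    """Backend that overrides the shortcut with its own optimization."""

    def __init__(self, triples=()):
        # The index must exist before the base class starts adding triples.
        self._by_pred_obj = defaultdict(list)
        super().__init__(triples)

    def add(self, triple):
        super().add(triple)
        s, p, o = triple
        self._by_pred_obj[(p, o)].append(s)

    def subjects_from_preds_and_obj(self, predicates, obj):
        # Index lookup instead of a full scan per predicate.
        result = []
        for predicate in predicates:
            result.extend(self._by_pred_obj.get((predicate, obj), []))
        return result
```

Both classes return the same answers; only the cost differs, which is the property a backend-specific override would need to preserve.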

There is also the possibility that the recently added triple pattern resolver (which is rather fast) can be used to abstract these backend-level optimizations. In addition, this library could be used more generally to abstract RDF query languages from the implementation. Some investigation could be done to determine whether such optimization (at the triple matching level) is more effective than backend-specific optimization.

Direction of Code Migration

It was agreed that it makes most sense to port existing 4Suite RDF functionality to rdflib and update existing dependencies on 4RDF to use rdflib.Graph instead. This includes:

  • Notation 3 parsing
  • Versa implementation
  • DBM, MySQL, Postgres driver implementations
  • Test suites?

Code Repository

Which revision control repository would house this collaborative effort? Currently, 4Suite development is done in CVS and rdflib development in svn. Since the migration of functionality is in the direction of rdflib, the suggestion was to use the existing rdflib svn repository.

We can manage the svn accounts the same way we manage the accounts for rdflib.net (or any Redfoot-run site): you assert a SHA hexdigest of a password for yourself. I have a URIRef (http://eikeon.com#) that identifies me, a foaf:Person, which I use to assert:

http://eikeon.com# http://redfoot.net/2005/session#hexdigest dc724af18fbdd4e59189f5fe768a5f8311527050

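You can compute the hexdigest with the following Python (hashlib replaces the older sha module, which is not available in Python 3):

import hashlib
hashlib.sha1(password.encode("utf-8")).hexdigest()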

This will allow us to distribute and automate the management of passwords. To log in to the site, you use the URIRef that identifies you as your username and the password that corresponds to your hexdigest. If svn does not allow a username to be a URIRef, we can use one's foaf:nick as the svn username.
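
The login check described above can be sketched as follows. This is an illustration only, not the code running on rdflib.net: the function names are hypothetical, the password is assumed to be UTF-8 encoded text, and hashlib is used for SHA-1.

```python
import hashlib
import hmac


def sha1_hexdigest(password):
    """Return the SHA-1 hex digest of a password (UTF-8 encoding assumed)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def check_login(password, asserted_hexdigest):
    """Compare a submitted password with the hexdigest asserted for a user."""
    computed = sha1_hexdigest(password)
    # compare_digest avoids revealing how many leading characters matched.
    return hmac.compare_digest(computed, asserted_hexdigest.lower())
```

A site would look up the hexdigest asserted for the given URIRef and pass it as asserted_hexdigest; the lookup itself is outside this sketch.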

Since I've not automated the bit that grabs one's password from their FOAF and puts it into the svn-auth file, here's what needs to be done to convert the hexdigest into the base64 format that the svn-auth file is happy with:

import base64
import hashlib

# password is assumed to be a str holding the user's password
hexdigest = hashlib.sha1(password.encode("utf-8")).hexdigest()
digest = base64.b16decode(hexdigest.upper())  # raw 20-byte SHA-1 digest
b64 = base64.b64encode(digest).decode("ascii")
print("Hex digest:", hexdigest)
print("Digest:", digest)
print("base64:", b64)

Versa Datatypes

Versa's datatypes (listed below) are the result types of a Versa query, most of which correspond to triple terms in the underlying graph. All are accounted for except for Lists, Booleans, Numbers, and Sets.

  • Versa Resources => rdflib.URIRefs
  • Versa Strings => rdflib.Literals
  • Versa Booleans and Numbers are unaccounted for but are really a special kind of rdflib.Literal. If or when booleans and numbers are promoted to first-class triple terms (subclasses of Literal, perhaps), they would map directly to Versa booleans and numbers. Otherwise they could remain Python booleans and numbers.
  • Versa Lists and Sets should not map to RDF collections, as they are constructs specific to the query language and should be as efficient to manipulate and iterate over as possible. They should probably remain Python lists and sets.
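
The mapping above could be sketched roughly as follows. To keep the example self-contained, URIRef and Literal here are small stand-in classes rather than the rdflib terms, and the choice of XSD datatypes and the conversion of every number to a Python float are simplifying assumptions, not something the notes specify.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


class URIRef(str):
    """Stand-in for rdflib's URIRef term (an assumption for this sketch)."""


class Literal(str):
    """Stand-in for rdflib's Literal term, with an optional datatype URI."""

    def __new__(cls, value, datatype=None):
        obj = super().__new__(cls, value)
        obj.datatype = datatype
        return obj


NUMERIC_TYPES = (XSD + "integer", XSD + "decimal", XSD + "double")


def to_versa_value(term):
    """Map a triple term to the value a Versa query might return."""
    if isinstance(term, URIRef):
        return term                          # Versa Resource
    if isinstance(term, Literal):
        if term.datatype == XSD + "boolean":
            return str(term) in ("true", "1")  # Versa Boolean as Python bool
        if term.datatype in NUMERIC_TYPES:
            return float(term)               # Versa Number as Python float
        return str(term)                     # Versa String
    raise TypeError("unsupported term: %r" % (term,))


def to_versa_list(terms):
    """Versa Lists stay plain Python lists rather than RDF collections."""
    return [to_versa_value(term) for term in terms]
```

Keeping lists and sets as native Python containers means query operators can iterate over them without touching the graph at all.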

My earlier attempt to bind Versa datatypes to Python objects and rdflib terms is worth noting.

Comments regarding http://rdflib.net/4RDF / rdflib Merge

by Chimezie Ogbuji on Wednesday 28 September, 2005:

The optimized interfaces (which 4RDF's Versa implementation relies on heavily for speed) may not map efficiently to their rdflib.Graph counterparts. In every case, one of their parameters is a list rather than a single item, so the corresponding rdflib interface would have to be called N times, where N is the length of the list. For example:
subjectsFromPredsAndObj([pred1,pred2,pred3],object)
would result in
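def subjectsFromPredsAndObj(graph, predicates, obj):
    # graph is assumed to be an rdflib.Graph instance
    subjects = []
    for pred in predicates:
        # Graph.subjects yields its matches, so extend rather than append
        subjects.extend(graph.subjects(pred, obj))
    return subjects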
Once again, if the triple pattern resolver were used, it would (should) be able to optimize the following triple patterns (and solve for ?subj):
?subj pred1 object
?subj pred2 object
?subj pred3 object
This would avoid relying on an explicit, and perhaps unoptimized, dispatch directly to an interface.
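
To illustrate the idea of resolving the three patterns together, here is a minimal sketch that solves ?subj for several (predicate, object) patterns in a single pass over the triples. It does not describe how the actual triple pattern resolver works; solve_subjects and its arguments are hypothetical names, and the triples are assumed to be plain (subject, predicate, object) tuples.

```python
def solve_subjects(triples, patterns):
    """Solve ?subj for several (predicate, object) patterns in one pass.

    Returns a dict mapping each pattern to the set of matching subjects.
    """
    wanted = {pattern: set() for pattern in patterns}
    for s, p, o in triples:
        matches = wanted.get((p, o))
        if matches is not None:
            matches.add(s)
    return wanted


triples = [
    ("a", "pred1", "object"),
    ("b", "pred2", "object"),
    ("a", "pred2", "object"),
    ("c", "pred3", "other"),
]
patterns = [("pred1", "object"), ("pred2", "object"), ("pred3", "object")]
solutions = solve_subjects(triples, patterns)

# Subjects matching any of the patterns, as the list-based shortcut would return.
any_match = set().union(*solutions.values())
# Subjects matching all of the patterns, if they are read as a conjunction.
all_match = set.intersection(*solutions.values())
```

The data is scanned once regardless of how many predicates are in the list, instead of once per predicate as in the loop above.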
