I’ve been searching for interesting ways to manipulate RDF graphs in Python, in order to build an application that handles Linked Data resources in an OO way, i.e. through Python classes rather than tables/sets/lists of triples. The data will be persisted in graphs in a triple store, accessed through a SPARQL endpoint.
In this post, I’ll illustrate how I managed to tie together RDFLib’s SPARQLStore plugin and RDFAlchemy to reach a rather nice-looking result.
RDFLib provides tools to manipulate graphs, but most of the examples I found didn’t load Graph instances from SPARQL, and generally processed SPARQLWrapper results (tables) by hand. Here comes the first tool to the rescue: SPARQLStore, which lets you query a remote SPARQL endpoint dynamically while navigating an RDFLib graph. I couldn’t find much documentation about it, but there are some hints in slide 51 of Tatiana Al-Chueyr’s excellent presentation Linking the world with Python and Semantics.
Now that I know how to manipulate RDFLib Graphs queried from a remote SPARQL endpoint with the usual methods, I’m still not satisfied, because I want OO stuff, not lists of (predicate, object) tuples.
Here comes the second tool, RDFAlchemy, which lets you create “descriptor” classes mapping RDF resources to Python classes. Here again, there isn’t much documentation available, but I found Ted Lawless’s excellent tutorial Reading and writing RDF for VIVO with RDFAlchemy.
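To give an idea of what the descriptor approach buys you, here is a toy reimplementation of the pattern in plain Python. This is not RDFAlchemy’s actual API (its real descriptors are called things like `rdfSingle`); all the names below are mine:

```python
class rdf_single:
    """Toy descriptor in the spirit of RDFAlchemy's rdfSingle:
    looks up one value for a given predicate in the instance's triples."""
    def __init__(self, predicate):
        self.predicate = predicate

    def __get__(self, instance, owner):
        if instance is None:
            return self
        for s, p, o in instance.triples:
            if s == instance.uri and p == self.predicate:
                return o
        return None

class Resource:
    """Wraps a resource URI plus a set of (s, p, o) tuples."""
    def __init__(self, uri, triples):
        self.uri = uri
        self.triples = triples

class Film(Resource):
    # Map RDF predicates to plain Python attributes.
    label = rdf_single("rdfs:label")
    director = rdf_single("dbo:director")

triples = {
    ("dbr:Amelie", "rdfs:label", "Amélie"),
    ("dbr:Amelie", "dbo:director", "dbr:Jean-Pierre_Jeunet"),
}
film = Film("dbr:Amelie", triples)
print(film.label)     # → Amélie
print(film.director)  # → dbr:Jean-Pierre_Jeunet
```

Instead of filtering tuples by hand, you write `film.director` and the descriptor does the lookup, which is exactly the OO feel I was after.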
Now we can put the two pieces together, modulo a few precautions: the SPARQLStore plugin of RDFLib generates a SPARQL query every time the graph is navigated, whereas RDFAlchemy expects the graph to be in memory (at least, as far as I understand). So we’ll have to manually pre-load the contents of the graphs that we need for all the attributes of the descriptor classes.
I’ve written some example code that illustrates this by querying French films in Wikipedia (through DBpedia). Here’s a copy of the gist. Be warned: it queries DBpedia a lot, so keep an eye on bandwidth and memory if you change the parameters.
It isn’t perfect, and I still need to investigate the benefits and limitations of the approach. One clear limitation is the number of SPARQL queries hitting the endpoint, compared to what smarter pre-loading could achieve.
Also, on a side note, during testing I spotted a few issues with SPARQLStore, which seems to be lagging behind, probably because not many people use it. The RDFAlchemy project doesn’t seem to be in great health either: it looks mostly unmaintained from what I can see (note that I linked to the GitHub clone/fork that seemed most recently maintained, as the original author’s repository appears dead). Nevertheless, the code works with a more recent RDFLib, so that’s not so bad.
Stay tuned for more adventures in Linked Data land in Python.