I’ve been experimenting with Linked Open Data about FLOSS projects harvested from different sources of DOAP or ADMS.SW descriptions. I’ve tried to match the upstream projects of Debian packages with upstream projects hosted at Apache, GNOME, or Alioth.debian.org, or catalogued on PyPI.
I’m matching them on identical values of the Homepage field (comparing the Homepage Control field set by Debian packagers with the doap:homepage meta-data in the RDF documents harvested from the upstream project catalogues).
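The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the code I actually ran: the function name `match_by_homepage`, the `(name, homepage)` tuple layout and the example.org data are all assumptions made up for the example.

```python
from collections import defaultdict


def match_by_homepage(debian_packages, upstream_projects):
    """Pair Debian packages with upstream projects that share a homepage.

    Both arguments are iterables of (name, homepage) tuples (hypothetical
    layout; real data would come from the harvested RDF documents).
    """
    # Index upstream projects by their doap:homepage value.
    by_homepage = defaultdict(list)
    for name, homepage in upstream_projects:
        by_homepage[homepage].append(name)

    # Look up each Debian Homepage field in that index (strict equality).
    matches = []
    for package, homepage in debian_packages:
        for project in by_homepage.get(homepage, []):
            matches.append((package, project, homepage))
    return matches


# Made-up sample data, for illustration only.
debian = [
    ("python-foo", "http://foo.example.org/"),
    ("bar", "http://bar.example.org/"),
]
upstream = [
    ("Foo", "http://foo.example.org/"),
    ("Baz", "http://baz.example.org/"),
]
print(match_by_homepage(debian, upstream))
```

Because the comparison is strict string equality, two homepages that differ only by a trailing slash will not match.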
Here are the initial results of my little experiment: the number of matched projects, and how similar the project names are:
||Total matching projects
||Exact same project name
||Same project name (case-independent)
||0 (0 %)
||0 (0 %)
||13 (81 %)
||13 (81 %)
||217 (49 %)
||273 (62 %)
||0 (0 %)
||7 (33 %)
||293 (58 %)
The data set contains tens of thousands of projects, probably with many duplicates, but of all these, only 507 share a homepage.
As you can see, in some cases, the Debian source package names match the upstream project name (sometimes with lower/upper case variants), but in general, the project names aren’t identical, so it is interesting to try and match them by homepage.
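The two name comparisons reported in the table (exact, and case-independent) could be expressed like this. Again a sketch only: `name_similarity` and its three labels are hypothetical names chosen for the example.

```python
def name_similarity(package_name, project_name):
    """Classify how a Debian source package name relates to a project name."""
    if package_name == project_name:
        return "exact"
    # casefold() is a more aggressive lower() intended for caseless matching.
    if package_name.casefold() == project_name.casefold():
        return "case-independent"
    return "different"


# Made-up names, for illustration only.
for pair in [("foo", "foo"), ("foo", "Foo"), ("python-foo", "Foo")]:
    print(pair, name_similarity(*pair))
```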
For the curious, the Apache, GNOME and PyPI project catalogues have been providing RDF meta-data for quite some time. More recently, we introduced ADMS.SW meta-data for Debian source packages, and even more recently for the Alioth projects (through the ADMS.SW exporter plugin for FusionForge).
There is still room for improvement, for instance normalizing homepage URLs, which tend to vary (trailing slashes, or different HTTP/HTTPS schemes).
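A possible normalization, covering only the two variations mentioned (trailing slashes and the HTTP/HTTPS scheme), might look like the sketch below. The function name `normalize_homepage`, the choice of folding everything to `http`, the lowercasing of the host part and the dropping of fragments are my own assumptions, not something already implemented.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_homepage(url):
    """Return a canonical form of a homepage URL, for comparison only."""
    parts = urlsplit(url.strip())
    # Treat http and https as equivalent by folding both to "http".
    scheme = "http" if parts.scheme in ("http", "https") else parts.scheme
    # Host names are case-insensitive (assumption: no case-sensitive userinfo).
    netloc = parts.netloc.lower()
    # Drop trailing slashes, so "/project/" and "/project" compare equal.
    path = parts.path.rstrip("/")
    # Keep the query string, drop the fragment.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


print(normalize_homepage("https://Example.org/project/"))
print(normalize_homepage("http://example.org/project"))
```

The normalized value is only meant as a comparison key; the original URL should be kept for display or dereferencing.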
Stay tuned for more details.
I’ll be presenting “Authoritative linked data descriptions of debian source packages using ADMS.SW” at OSS 2013.
Here’s the abstract :
The Debian Package Tracking System is a Web dashboard for Debian contributors and advanced users. This central tool publishes the status of subsequent releases of source packages in the Debian distribution.
It has been improved to generate RDF meta-data documenting the source packages, their releases and links to other packaging artifacts, using the ADMS.SW 1.0 model. This constitutes an authoritative source of machine-readable Debian “facts” and proposes a reference URI naming scheme for Linked Data resources about Debian packages.
This should enable the interlinking of these Debian package descriptions with other ADMS.SW or DOAP descriptions of FLOSS projects available on the Semantic Web also using Linked Data principles. This will be particularly interesting for traceability with upstream projects whose releases are packaged in Debian, derivative distributions reusing Debian source packages, or with other FLOSS distributions.
Update: If you are interested, a preprint is available here in HTML form. See also previous installments on ADMS.SW in this blog.
Update: The slides of the presentation I made at Isola are here.
As part of my efforts to push for more Linked Data to be published by FLOSS development tools, here’s another installment.
The ADMS.SW plugin for FusionForge that I’ve developed has seen its first public deployment on the ADULLACT forge, thanks to the funding of the ISA programme of the European Commission.
This means that the more than 500 projects hosted on the ADULLACT forge, mainly developed by public administrations, are now documented using the RDF Turtle dialect, as Linked Data.
A first use can be for Free and Open source portals which will be able to harvest them from the source.
See more details at First deployment of ADMS.SW plugin for FusionForge on Adullact.
Other forges are expected to follow, like CENATIC’s or Debian’s Alioth, all powered by FusionForge.
The plugin is not yet perfect, in particular with respect to performance, but that was kind of expected from a first prototype. Stay tuned for more news.
The Debian PTS now speaks the Turtle representation format for the export of RDF meta-data about Debian source packages.
Alongside the HTML pages for humans, and the RDF/XML that had already been added to it, this means that a new flavour of RDF is now available.
The Turtle format offers the benefits of machine-readable meta-data in a textual format that is also somewhat human-readable.
For instance, you may check the apache2 Turtle meta-data from the command-line with :
$ curl -L -s -H "Accept: text/turtle" http://packages.qa.debian.org/apache2
Here’s a link to a colorized HTML preview of http://packages.qa.debian.org/a/apache2.ttl.
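The curl call above can be reproduced from a script. Here is a minimal Python sketch using only the standard library; the helper names `turtle_request` and `fetch_turtle` and the timeout value are assumptions for the example, and the service has to be reachable for the fetch to work.

```python
from urllib.request import Request, urlopen

PTS_BASE = "http://packages.qa.debian.org/"


def turtle_request(package):
    """Build a request asking for the Turtle representation of a package."""
    return Request(PTS_BASE + package, headers={"Accept": "text/turtle"})


def fetch_turtle(package, timeout=10):
    """Fetch the Turtle document as text.

    urlopen follows HTTP redirects by default, like curl's -L option.
    """
    with urlopen(turtle_request(package), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (needs network access):
# print(fetch_turtle("apache2"))
```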
Under the hood, the XSLT stylesheets of the PTS have been reworked to produce the Turtle format by default, and later convert it to RDF/XML.
Every Debian source package then has a reference URI in the Linked Data world, in the form http://packages.qa.debian.org/PACKAGE_NAME, which redirects, through proper content negotiation (the HTTP Accept header), to the HTML, RDF/XML or Turtle documents. For apache2, these are respectively at http://packages.qa.debian.org/a/apache2.html, http://packages.qa.debian.org/a/apache2.rdf and http://packages.qa.debian.org/a/apache2.ttl.
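To make the URI scheme concrete, here is a small sketch that computes the document URL for a package and a media type. It mirrors the apache2 example above; the function names are hypothetical, the media types used as keys are the usual ones for these formats, and the handling of lib* packages (a four-letter prefix, as in the Debian archive pool) is an assumption not covered by the example in this post.

```python
PTS_BASE = "http://packages.qa.debian.org"

# Media type requested in the Accept header -> file extension served.
EXTENSIONS = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "text/turtle": "ttl",
}


def package_prefix(package):
    """Directory prefix for a package name.

    Assumption: lib* packages use their first four letters, as in the
    Debian archive pool; everything else uses the first letter.
    """
    if package.startswith("lib") and len(package) > 3:
        return package[:4]
    return package[:1]


def document_url(package, media_type="text/html"):
    """URL of the document the reference URI would redirect to."""
    extension = EXTENSIONS[media_type]
    return f"{PTS_BASE}/{package_prefix(package)}/{package}.{extension}"


print(document_url("apache2", "text/turtle"))
```

Clients should still dereference the reference URI and let the server redirect them, rather than building these URLs themselves.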
The meta-data uses the model of the ADMS.SW ontology (1.0), and the content has also been slightly updated to make it more conformant to the ADMS.SW specifications (checks done with the ADMS.SW validator).
Let’s hope this makes RDF more familiar to Debian folks, and allows more Linked Data interlinking with other resources about FLOSS packages.
I gave a presentation at the Paris MiniDebConf 2012 about the work I’ve done to bring more semantic meta-data to the Debian PTS (see previous posts).
Here are my slides :
Also available here as PDF.