The Debian Package Tracking System is a Web dashboard for Debian contributors and advanced users. This central tool publishes the status of subsequent releases of source packages in the Debian distribution.
It has been improved to generate RDF meta-data documenting the source packages, their releases and links to other packaging artifacts, using the ADMS.SW 1.0 model. This constitutes an authoritative source of machine-readable Debian "facts" and proposes a reference URI naming scheme for Linked Data resources about Debian packages.
This should enable the interlinking of these Debian package descriptions with other ADMS.SW or DOAP descriptions of FLOSS projects available on the Semantic Web also using Linked Data principles. This will be particularly interesting for traceability with upstream projects whose releases are packaged in Debian, derivative distributions reusing Debian source packages, or with other FLOSS distributions.
This is a revised version of a previous paper, which was accepted at the 8th International Workshop on Semantic Web Enabled Software Engineering (SWESE 2012) but which the authors were unable to present in person at the workshop.
Asset Description Metadata Schema for Software (ADMS.SW) is a novel ontology developed for describing software packages, releases and projects. It can be applied to describe packages in a Free, Libre and Open Source Software (FLOSS) distribution using Semantic Web techniques. We consider it a foundational component that will make it possible to conduct future Quality Assurance or other large-scale FLOSS studies across the Linked Open Data cloud.
FLOSS software ecosystems are composed of many different actors collaborating around single programs, from original upstream authors to downstream packagers in distributions like Debian. Descriptions of FLOSS development artifacts made with standardized and semantic formats like ADMS.SW can help trace some of the processes, which generally happen in various venues across the ecosystem.
The Need for Linked Data Descriptions of FLOSS
Constructing models of interactions happening along the FLOSS production lines can be interesting, both for researchers and practitioners. Research in empirical software engineering can for instance involve studies conducted by modeling properties and relations between FLOSS production artifacts and actors.
Semantic Web techniques bring key benefits in terms of semantic interoperability: using a natively extensible W3C standard like RDF helps integrate potentially incoherent data, which suits large-scale problems well. The size of the communities and the diversity of actors and tools present in large FLOSS ecosystems make them good candidates for such an approach.
The Linked Data approach can be very convenient for interlinking resources that represent actors or artifacts belonging to different projects described with RDF. It allows researchers to integrate, in the same "triple store" database, descriptions of FLOSS artifacts or actors with variable structures, while still relying on common semantics and a harmonized URI nomenclature that reflects the origin of these resources.
For FLOSS developers as well, these Semantic Web techniques should offer interesting potential applications, in particular for creating new global services that need to interconnect different heterogeneous project tools. As an illustration, we can imagine a new "global bug tracking system" that aims at correlating similar bug reports filed in different Linux distributions. It could help offer better support responses by allowing navigation between reports that may previously have been related to each other. Such a system would need to interface with many different bug tracker APIs. Whereas standards like Open Services for Lifecycle Collaboration (OSLC) (which rely on extensible semantic formats based on RDF and REST APIs) can help solve some concrete interoperability issues, they only address part of the problem (and their deployment among FLOSS projects is not yet widespread). Indeed, even once semantically compatible data has been collected, it must be integrated in a coherent data store. Nomenclature, freshness and accuracy issues therefore still represent interesting challenges. Addressing them is a foundational requirement for the large-scale applications described above.
Authoritative Linked Data Descriptors Produced by FLOSS Projects
We postulate that there are higher chances that meta-data is more accurate and up-to-date when it is produced closest to the very heart of the FLOSS projects, than obtained after a series of collection and conversion activities conducted by third parties. Thus, with the Linked Data principles in mind, significant artifacts produced by FLOSS projects ought to be complemented with meta-data available at the very same Web domains, as a minimal set of authoritative RDF resources. URIs naming these resources can then be rooted at the project's domain name, and serve to identify its artifacts unambiguously.
As an illustration, Semantic Web resources describing projects from the Apache foundation would be downloaded from RDF documents available on http://projects.apache.org/ which would identify them for instance with URIs like <http://PROJNAME.apache.org/> (or a <http://PROJNAME.apache.org/doap#project>). Thus, in the description of the Debian packaging the Apache program, we could reference its upstream project (from Debian's point of view) as the RDF resource
Our initiative, coupled with other previous and current efforts, will hopefully help reach a state where almost every FLOSS project is able to publish, on its Web site or development forge, minimal but authoritative Linked Data descriptions of the project or its software artifacts, either as Description Of A Project (DOAP) or ADMS.SW.
Goal and Structure of this Paper
This paper will introduce a Linked Data interface based on ADMS.SW, which was deployed on the Debian Package Tracking System (PTS), that produces authoritative meta-data descriptions for the core artifacts produced by the Debian project: source packages.
Due to Debian's respected position in the FLOSS ecosystem, such a deployment already covers a great percentage of all FLOSS programs, and can thus be inspirational for many FLOSS projects.
In section 2, we introduce the ADMS.SW specification. Then a brief introduction to the structure of Debian source packages is provided in section 3. Section 4 documents the choices adopted for generating Linked Data representations of Debian source packages and related FLOSS artifacts in the Debian PTS. Section 5 presents a quick review of similar and complementary initiatives, while section 6 illustrates how a trivial project matching can be made with collected Linked Data descriptions.
The ADMS.SW Specification
The Asset Description Metadata Schema for Software (ADMS.SW) specification is described as:
[...] a metadata vocabulary to describe software making it possible to more easily explore, find, and link software on the Web.
It is an outcome of the ISA programme (Interoperability Solutions for European Public Administrations) of the European Commission, elaborated by a working group of experts in software catalogues and forges. Although it does not cover FLOSS exclusively, ADMS.SW has nevertheless been geared towards the meta-data of FLOSS projects hosted in public development forges, to facilitate their identification and reuse by Public Administrations.
ADMS.SW specifications are published with a complementary OWL ontology, referenced as http://purl.org/adms/sw/, to allow the publishing of such meta-data as RDF.
As illustrated in Figure 1, it provides three main entities : Software Project, Software Release, and Software Package to model meta-data about software programs, their versions, and the distribution archives of these.
But it also contains various elements related to Software Repositories descriptions in order to facilitate the maintenance of data managed by software catalogues (provenance, timestamping, etc.), based on the RADion common model of ADMS, which describes generic semantic assets.
ADMS.SW 1.0 reuses existing specifications and standards, such as Dublin Core, DOAP, SPDX, ISO 19770-2, ADMS, and the Sourceforge Trove software map. As DOAP is already widely used in practice, ADMS.SW reuses many of its properties. ADMS.SW is also interoperable with the SPDX specification, whose main purpose, to date, is the description of copyright and license conditions applying to particular software packages or source files.
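The relation between the three main entities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class local names (SoftwareProject, SoftwareRelease, SoftwarePackage) are inferred from the entity names above, and the linking predicates live in a hypothetical `http://example.org/vocab#` namespace rather than being actual ADMS.SW properties.

```python
from dataclasses import dataclass, field

ADMSSW = "http://purl.org/adms/sw/"  # ontology namespace given in the text
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/vocab#"     # hypothetical predicates, NOT part of ADMS.SW


@dataclass
class SoftwarePackage:
    uri: str  # one distribution archive of a release


@dataclass
class SoftwareRelease:
    uri: str
    version: str
    packages: list = field(default_factory=list)


@dataclass
class SoftwareProject:
    uri: str
    name: str
    releases: list = field(default_factory=list)


def to_triples(project):
    """Flatten a project into (subject, predicate, object) tuples.

    Literals and URIs are both plain strings here, which a real RDF
    library would distinguish.
    """
    triples = [
        (project.uri, RDF_TYPE, ADMSSW + "SoftwareProject"),
        (project.uri, EX + "name", project.name),
    ]
    for release in project.releases:
        triples.append((release.uri, RDF_TYPE, ADMSSW + "SoftwareRelease"))
        triples.append((release.uri, EX + "version", release.version))
        triples.append((project.uri, EX + "hasRelease", release.uri))
        for package in release.packages:
            triples.append((package.uri, RDF_TYPE, ADMSSW + "SoftwarePackage"))
            triples.append((release.uri, EX + "hasPackage", package.uri))
    return triples


# Hypothetical resources, for illustration only.
demo_project = SoftwareProject(
    uri="http://example.org/project/foo",
    name="foo",
    releases=[
        SoftwareRelease(
            uri="http://example.org/project/foo/1.0",
            version="1.0",
            packages=[SoftwarePackage(uri="http://example.org/project/foo/1.0/archive")],
        )
    ],
)
demo_triples = to_triples(demo_project)
```

Keeping the entities as plain data classes and emitting triples in a separate function mirrors the separation between a model and its RDF serialization; any RDF library could consume the resulting tuples.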
Debian Source Packages
The Debian project creates a Free Software distribution, which contains thousands of FLOSS binary packages ready to be installed on various computer architectures. Several versions of the Debian distribution are maintained in parallel, in three main suites: 'stable', 'testing' and 'unstable'.
Debian has been studied by many authors, as it represents a good proxy for the entire FLOSS ecosystem, due to the high number of packages it contains, and since its development and Quality Assurance (QA) infrastructure is generally open or easily accessible to researchers in empirical software engineering (see for instance  or ).
Structure of Debian Source Packages
Each binary package is actually built from a particular Debian source package. Source packages contain "Makefiles" for package generation, files containing different meta-data like versions or package dependency descriptions, and other scripts necessary for installation, configuration, upgrade or removal of the binary packages. In addition, it is quite common to include patches applying to the source code of the packaged program, to adjust it to Debian specificities or to include security fixes backported from later upstream
Each revision of a Debian source package is then generally composed of two file archives: one for the source code of the upstream version of the packaged program (ending in complemented by another one for these Debian specific files (ending in .debian.tar.gz). Only the latter Debian specific files archive, and associated meta-data descriptors, change between subsequent revisions of Debian source packages of the same version of an upstream
The Debian Package Tracking System
For every Debian source package, the Debian Package Tracking System (PTS) provides a Web dashboard (see a screenshot, taken from in Figure 2) which displays almost all there is to know about the status of that
However, its HTML pages are not directly exploitable by machines, should anyone need to interface the Debian QA system with other services. One such need seems quite obvious for derivative distributions constructed from Debian, like Ubuntu. The PTS therefore provides a custom SOAP interface, but the lack of a standard representation for the data retrieved from this API may require another ad-hoc converter to be added to every application wishing to interface with it.
As an alternative, we have started implementing a Linked Data interface for the Debian PTS, using the ADMS.SW ontology to represent Debian source package facts with the standard, and thus interoperable, RDF model.
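A Linked Data client typically asks for an RDF serialization through HTTP content negotiation. The following minimal sketch only prepares such a request, without sending it. The resource URI is hypothetical, and whether the PTS honours this exact `Accept` header is an assumption; `text/turtle` is the registered media type for Turtle.

```python
import urllib.request


def build_turtle_request(resource_uri):
    """Prepare (but do not send) a GET request asking for a Turtle representation."""
    return urllib.request.Request(resource_uri, headers={"Accept": "text/turtle"})


# Hypothetical resource URI, for illustration only.
request = build_turtle_request("http://example.org/pts/some-package")
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual retrieval. The benefit over an ad-hoc API is that the same client code works against any server publishing RDF this way.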
In this section, we present a few complementary initiatives which describe software packages with RDF vocabularies, using DOAP or ADMS.SW and which could be interesting for interoperability with the Debian PTS.
DOAP Published by FLOSS Directories
A number of projects maintain public DOAP descriptions of their programs, or other RDF descriptions of meta-data about the releases they produce. They may be interested in complementing descriptions with ADMS.SW, or could offer sources of descriptions that could be interlinked with the ones produced by the Debian PTS.
A quick survey conducted by the authors showed the following sources:
- Gnome project
- Apache project
- PyPI (Python Package Index) directory
- CPAN (Comprehensive Perl Archive Network) directory
A quick review of samples from these sources showed a lack of consensus on the use of certain meta-data, and that URIs adopted to reference the same projects or programs tend to vary, even for URLs (a great portion of these documents are manually crafted, or projects may have various pages that can be considered their homepage, in particular when the project is not hosted on its own top level domain).
Projects Hosted on FusionForge Forges
An ADMS.SW plugin for the FusionForge 5.2 software development forge has also been created by one of the authors, in order to allow the description of projects hosted on FusionForge based development forges. It may be complemented by another FusionForge plugin providing FOAF profiles  for project participants, which can enrich the Linked Data representations.
The plugin is still under active development and targeted at a post-5.2 release of FusionForge, so it will take some time until it is deployed on public forges hosting FLOSS projects.
Consuming ADMS.SW in the Joinup Portal
The Joinup portal of the ISA programme aims at integrating in a single portal FLOSS descriptions available from different Public Administration forges, by harvesting descriptions of projects directly in their development project spaces, as ADMS.SW descriptions (more details at https://joinup.ec.europa.eu/software/federated_forge).
Whereas the current version of Joinup doesn't rely on Semantic Web techniques for collecting project descriptions, it is expected to evolve towards consuming ADMS.SW in the future. FLOSS project descriptions would then complement other Semantic Assets (standards, documentation) catalogued and made available on the Joinup reference portal as semantic assets expressed with the ADMS vocabulary.
Interlinked Developer Profiles
Project descriptions aren't the only resources that can be interlinked across the FLOSS ecosystem. Iqbal and Hausenblas show how developer profiles can also be converted to RDF and interlinked to create a more comprehensive view of the developer communities around a project, for instance. This approach usually involves mining repositories or social sites through custom interfaces (via SOAP for instance), and later converting the results to RDF. But we believe there would be a great benefit in avoiding such potentially error-prone conversions if development platforms natively produced DOAP/ADMS.SW (or FOAF) descriptions "out of the box", as explained above.
As with every Linked (Open) Data initiative, the use of standard representations and their availability on the Semantic Web can lead to many different uses.
An obvious use of such ADMS.SW descriptions of Debian source packages is the matching of Debian packages with other packages or projects described in their respective project directories, allowing more interlinking of resources.
Matching Projects / Software Across Repositories
The doap:homepage of the "upstream" SoftwareProject resources generated by the Debian PTS can be an obvious matching key, provided that one has a database of upstream project descriptions (as DOAP). As an illustration, we demonstrate this by loading DOAP descriptions of projects of the Apache foundation together with a dump of the Debian source package descriptions in a single triple store (Virtuoso). The example SPARQL query in Listing 3 shows how to query for such matches between Debian and Apache.
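The homepage-based matching described above amounts to a join on the doap:homepage value. The following Python sketch illustrates that join over in-memory triples. It is not the authors' SPARQL query, and all resource URIs in the sample data are hypothetical.

```python
DOAP_HOMEPAGE = "http://usefulinc.com/ns/doap#homepage"


def match_by_homepage(debian_triples, upstream_triples):
    """Join two sets of (subject, predicate, object) triples on doap:homepage.

    Returns a list of (debian_resource, upstream_resource, homepage) tuples.
    """
    # Index upstream projects by their homepage URL.
    upstream_by_homepage = {}
    for subject, predicate, obj in upstream_triples:
        if predicate == DOAP_HOMEPAGE:
            upstream_by_homepage.setdefault(obj, []).append(subject)

    # Look up each Debian-side homepage in that index.
    matches = []
    for subject, predicate, obj in debian_triples:
        if predicate == DOAP_HOMEPAGE:
            for upstream in upstream_by_homepage.get(obj, []):
                matches.append((subject, upstream, obj))
    return matches


# Hypothetical sample data, for illustration only.
debian_side = [
    ("http://example.org/debian/foo#upstream", DOAP_HOMEPAGE, "http://foo.example.org/"),
    ("http://example.org/debian/bar#upstream", DOAP_HOMEPAGE, "http://bar.example.net/"),
]
upstream_side = [
    ("http://foo.example.org/doap#project", DOAP_HOMEPAGE, "http://foo.example.org/"),
]
found = match_by_homepage(debian_side, upstream_side)
```

Here only the first Debian-side resource is matched, because the join requires strictly identical homepage URLs.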
Such a query currently returns 62 matched source packages and Apache upstream projects (see an excerpt in Table 1, where URLs have been compacted for brevity).
But the reliability of this matching method isn't very good in practice. There may be many more Apache foundation projects packaged in Debian, but the maintainers may have forgotten to add a homepage link in the package descriptors. Or the URLs mentioned may not be matching, as project homepage naming conventions can vary (and evolve in time).
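One way to tolerate some of these URL variations is to normalize homepage URLs before comparing them. The sketch below is a hypothetical heuristic, not something the PTS or the authors implement: it ignores the scheme, a leading "www." and trailing slashes, which can also introduce false positives.

```python
from urllib.parse import urlsplit


def normalize_homepage(url):
    """Reduce a homepage URL to a comparison key (heuristic, lossy)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.org and example.org alike
    path = parts.path.rstrip("/")  # ignore trailing slashes
    return host + path
```

Two descriptions would then be considered candidates for a match when `normalize_homepage` returns the same key for both homepages. This does not help with missing homepage links or with projects that moved to a different domain.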
An alternative matching method could be based on project name literals, but that isn't always feasible either, due to homonymy for instance. One can refer to  for an analysis of this problem.
The distromatch project, started in 2011, intended to try and help solve these project/packages matching issues, although it is unfortunately not maintained anymore at the time of writing.
In any case, this first quick proof of concept allows us to plan further developments based on such meta-data, which will be tested on real life cases, for instance in constructing RDF harvesters and meta-data aggregators, and eventually merging with initiatives like distromatch.
Large Scale Perspective
The RDF-ization of the Debian PTS has just started. Next steps will include modelling of relations between source and binary packages. These will probably require extending ADMS.SW or integrating complementary ontologies.
When deployments of ADMS.SW have been made on software forges (like FusionForge servers), software catalogues (like Joinup) or in other FLOSS distributions, it will become one of the tools allowing automated traceability at large scale of FLOSS releases and associated artefacts, by interlinking their Linked Data resources.
The interlinking of security advisories, patches, or bug reports (for instance combined with the OSLC-CM standard) should then become easier and reduce the need for manual intervention, to the benefit of all actors along the FLOSS production chains.
We believe the current early result can be a driving force for more deployments around ADMS.SW as a standardization core. However, there seems to be some reluctance to adopt RDF in FLOSS projects, perhaps linked to an erroneous perception that RDF must be expressed as XML (which is certainly not the case, given representations of the RDF model like Turtle, which has been adopted as a default for the Debian PTS).
We foresee that only once novel inter-project "killer" applications making use of such Linked Data have been developed will it be possible to convince FLOSS projects that adopting Linked Data description standards can really be to their own benefit.
It is likely that even when lots of Linked Data descriptions of FLOSS artifacts are made available by major FLOSS projects, achieving effective interoperability will require many implementation efforts, far beyond a single actor's reach. More standardisation will be needed, and services will have to be provided to establish trusted reference catalogues of Semantic project descriptors (in the direction set by Joinup or the distromatch project for instance). Such actors would provide FLOSS "semantic hubs", or project matching "brokers", which could maintain reference interlinking relations for the concurrent Semantic descriptions produced in the many venues of the FLOSS ecosystem.
We have presented a first significant deployment of an ADMS.SW 1.0 implementation, which illustrates the potential for interlinking large sets of FLOSS project descriptions on the Semantic Web. ADMS.SW allows us to describe relations between projects, programs and their releases so that such entities become part of the Linked Open Data "cloud".
In earlier work, we envisioned some novel uses of Linked Data representations of FLOSS development artefacts, both for software engineers and for researchers observing their efforts. But to achieve the full potential of that approach, the Linked Data representations must be semantically interoperable, authoritative and accurate, and must use standard naming schemes for the same resources. We have achieved a first concrete step in this direction, through the current results for the Debian PTS.
The way we did it for the Debian PTS can be inspirational for other FLOSS distributions, either independent or derived from Debian. By integrating such meta-data generation in the heart of the technical infrastructure of Debian, we hope to establish an authoritative reference for the identification of Debian source packages on the Semantic Web.
- Dave Beckett and Tim Berners-Lee. Turtle - terse RDF triple language, W3C team submission, 2008. http://www.w3.org/TeamSubmission/turtle/
- Olivier Berger. Linked data descriptions of debian source packages using ADMS.SW. In Elisa F. Kendall, Jeff Z Pan, Ljiljana Stojanovic, and Yuting Zhao, editors, SWESE 2012: 8th International Workshop on Semantic Web Enabled Software Engineering, pages 43-55, Nara, Japan, 2012. https://hal.archives-ouvertes.fr/hal-00820259/
- Olivier Berger, Sabri Labbene, Madhumita Dhar, and Christian Bac. Introducing OSLC, an open standard for interoperability of open source development tools. In ICSSEA, pages ISSN-0295-6322, Paris, France, 2011. https://hal.archives-ouvertes.fr/hal-00679487/
- Olivier Berger, Ion Valentin Vlasceanu, Christian Bac, Quang Vu Dang, and Stéphane Lauriere. Weaving a semantic web across OSS repositories: Unleashing a new potential for academia and practice. International Journal of Open Source Software and Processes (IJOSSP), 2(2):29-40, 2010. http://www-public.telecom-sudparis.eu/~berger_o/IJOSSP-2010-2/IJOSSP.html
- Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems (IJSWIS), 5(3):1-22, 3 2009. http://dblp.org/rec/journals/ijswis/BizerHB09
- E Gabriella Coleman. Coding Freedom: The Ethics and Aesthetics of Hacking. Princeton University Press, 2012. http://press.princeton.edu/titles/9883.html
- Edd Dumbill. Decentralizing software project registries with DOAP. In Intelligent Search on XML Data - XML, 2004.
- ISO/IEC 19770-2: Software identification tag standard http://www.iso.org/iso/catalogue_detail.htm?csnumber=53670
- Software Package Data eXchange specification, 2011.
- Asset Description Metadata Schema specification 1.00, 2012. https://joinup.ec.europa.eu/asset/adms/asset_release/adms
- Jesus M. Gonzalez-Barahona, Gregorio Robles, Martin Michlmayr, Juan José Amor, and Daniel M. German. Macro-level software evolution: a case study of a large software compilation. Empirical Software Engineering, 14(3):262-285, June 2009. http://dx.doi.org/10.1007/s10664-008-9100-x
- Mike Graves, Adam Constabaris, and Dan Brickley. FOAF: Connecting People on the Semantic Web. Cataloging & classification quarterly, 43(3):191-202, April 2007. http://www.tandfonline.com/doi/abs/10.1300/J104v43n03_10#
- James Howison. Cross-repository data linking with RDF and OWL: Towards common ontologies for representing FLOSS data. In WoPDaSD (Workshop on Public Data at International Conference on Open Source Software), 2008. http://floss.syr.edu/content/cross-repository-data-linking-rdf-and-owl-towards-common-ontologies-representing-floss-data
- Aftab Iqbal and Michael Hausenblas. Integrating developer-related information across open source repositories. In Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on, pages 69 -76, aug. 2012. http://dx.doi.org/10.1109/IRI.2012.6302993
- Ian Jackson, Christian Schwarz, et al. Debian policy manual. version 126.96.36.199, 2012-09-19. http://www.debian.org/doc/debian-policy/
- Ora Lassila, Ralph R. Swick, and World Wide Web Consortium. Resource description framework (RDF) model and syntax specification, 1998. W3C Recommendation. http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
- Martin Michlmayr. Managing debian. AUUGN, The journal of AUUG Inc., 25(3), 9 2004. www.ukuug.org/events/linux2003/papers/michlmayr.pdf
- Megan Squire. Integrating projects from multiple open source code forges. IJOSSP, 1(1):46-57, 2009. http://dx.doi.org/10.4018/jossp.2009010103
- Stuart L. Weibel, John A. Kunze, Carl Lagoze, and Misha Wolf. Dublin core metadata for resource discovery, 1998. RFC 2413. https://datatracker.ietf.org/doc/rfc2413/