SPARQL Tutorial

This tutorial introduces RDF and SPARQL from the ground up. All examples use the Nepomuk ontology. Although the tutorial aims to be generic, it does mention some things specific to Tracker; those are clearly spelled out.

RDF Triples

RDF data define a graph composed of vertices and edges. The graph is directed, because each edge points from one vertex to another, and labeled, because each edge has a name. The unit of data in RDF is a triple of the form:

subject  predicate  object

Or expressed visually:

[Figure: triple graph]

The subject and object are two vertices of the graph, and the predicate is the edge between them; together, all triples form the full graph. For example, the following triples:

<a> a nfo:FileDataObject .
<a> a nmm:MusicPiece .
<a> nie:title "Images" .
<a> nmm:musicAlbum <b> .
<a> nmm:albumArtist <c> .
<a> nmm:albumArtist <d> .
<a> nmm:performer <e> .

<b> a nmm:MusicAlbum .
<b> nie:title "Go Off!" .

<c> a nmm:Artist .
<c> nmm:artistName "Jason Becker" .

<d> a nmm:Artist .
<d> nmm:artistName "Marty Friedman" .

<e> a nmm:Artist .
<e> nmm:artistName "Cacophony" .

would produce the following graph:

[Figure: graph of the triples above]
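To make the vertex/edge reading concrete, here is a minimal Python sketch (a toy model, not how Tracker stores data) that holds the triples above as (subject, predicate, object) tuples and groups them into outgoing, labeled edges. The names TRIPLES and outgoing_edges are hypothetical, and the "a" keyword is written out as rdf:type, which it abbreviates.

```python
from collections import defaultdict

# The example data as (subject, predicate, object) tuples. IRIs keep the
# tutorial's spelling ("<a>", "nmm:Artist"); literals keep their quotes.
# The keyword "a" is shorthand for rdf:type, so it is spelled out here.
TRIPLES = {
    ("<a>", "rdf:type", "nfo:FileDataObject"),
    ("<a>", "rdf:type", "nmm:MusicPiece"),
    ("<a>", "nie:title", '"Images"'),
    ("<a>", "nmm:musicAlbum", "<b>"),
    ("<a>", "nmm:albumArtist", "<c>"),
    ("<a>", "nmm:albumArtist", "<d>"),
    ("<a>", "nmm:performer", "<e>"),
    ("<b>", "rdf:type", "nmm:MusicAlbum"),
    ("<b>", "nie:title", '"Go Off!"'),
    ("<c>", "rdf:type", "nmm:Artist"),
    ("<c>", "nmm:artistName", '"Jason Becker"'),
    ("<d>", "rdf:type", "nmm:Artist"),
    ("<d>", "nmm:artistName", '"Marty Friedman"'),
    ("<e>", "rdf:type", "nmm:Artist"),
    ("<e>", "nmm:artistName", '"Cacophony"'),
}


def outgoing_edges(triples):
    """Map each subject vertex to its outgoing edges.

    Every edge is a (label, target) pair: the predicate names the edge and
    the object is the vertex it points to, so the graph is directed and
    labeled.
    """
    graph = defaultdict(set)
    for subject, predicate, obj in triples:
        graph[subject].add((predicate, obj))
    return graph


graph = outgoing_edges(TRIPLES)
for predicate, obj in sorted(graph["<a>"]):
    print(f"<a> --{predicate}--> {obj}")
```

Note that vertex <a> has two edges labeled nmm:albumArtist, one to <c> and one to <d>: a predicate can occur any number of times for the same subject.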

The dot after each triple is not (just) there for legibility; it is part of the syntax. Written out in full, RDF triples are repetitive and cumbersome to write. Luckily, they can be shortened: a comma (,) separates multiple objects that share the same subject and predicate, and a semicolon (;) separates multiple predicate/object pairs that share the same subject. The previous RDF could be transformed into:

<a> a nfo:FileDataObject, nmm:MusicPiece .
<a> nie:title "Images" .
<a> nmm:musicAlbum <b> .
<a> nmm:albumArtist <c>, <d> .
<a> nmm:performer <e> .

<b> a nmm:MusicAlbum .
<b> nie:title "Go Off!" .

<c> a nmm:Artist .
<c> nmm:artistName "Jason Becker" .

<d> a nmm:Artist .
<d> nmm:artistName "Marty Friedman" .

<e> a nmm:Artist .
<e> nmm:artistName "Cacophony" .

And further into:

<a> a nfo:FileDataObject, nmm:MusicPiece ;
    nie:title "Images" ;
    nmm:musicAlbum <b> ;
    nmm:albumArtist <c>, <d> ;
    nmm:performer <e> .

<b> a nmm:MusicAlbum ;
    nie:title "Go Off!" .

<c> a nmm:Artist ;
    nmm:artistName "Jason Becker" .

<d> a nmm:Artist ;
    nmm:artistName "Marty Friedman" .

<e> a nmm:Artist ;
    nmm:artistName "Cacophony" .
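All three forms describe exactly the same set of triples; the separators only change the notation. A small Python sketch, assuming a hypothetical expand() helper that mimics the ; and , rules for a single subject, checks this for resource <a>:

```python
def expand(subject, predicate_objects):
    """Expand one abbreviated subject block into full triples.

    predicate_objects is a list of (predicate, [objects]) pairs: each pair
    corresponds to a ';'-separated entry, and each extra object in a list
    corresponds to a ','-separated object. (Hypothetical helper.)
    """
    return {(subject, p, o) for p, objects in predicate_objects for o in objects}


# <a> a nfo:FileDataObject, nmm:MusicPiece ; nie:title "Images" ; ...
abbreviated = expand("<a>", [
    ("rdf:type", ["nfo:FileDataObject", "nmm:MusicPiece"]),  # "a" is rdf:type
    ("nie:title", ['"Images"']),
    ("nmm:musicAlbum", ["<b>"]),
    ("nmm:albumArtist", ["<c>", "<d>"]),
    ("nmm:performer", ["<e>"]),
])

# The same data written as one full triple per line.
full = {
    ("<a>", "rdf:type", "nfo:FileDataObject"),
    ("<a>", "rdf:type", "nmm:MusicPiece"),
    ("<a>", "nie:title", '"Images"'),
    ("<a>", "nmm:musicAlbum", "<b>"),
    ("<a>", "nmm:albumArtist", "<c>"),
    ("<a>", "nmm:albumArtist", "<d>"),
    ("<a>", "nmm:performer", "<e>"),
}

print(abbreviated == full)  # True: same 7 triples, different notation
```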

SPARQL

SPARQL is a query language for RDF data. How does a query language for graphs work? Naturally, by providing a graph to be matched against the stored data; this graph is conveniently called the "graph pattern".

To start simple, the simplest query consists of a single triple with all three elements defined, e.g.:

ASK { <a> nie:title "Images" }

This returns true, as the triple does exist. ASK is the simplest form of graph testing: the result is a single boolean value (one row, one column) stating whether the provided graph exists in the store. It also works for more complex graphs, for example:

ASK { <a> nie:title "Images" ;
          nmm:albumArtist <c> ;
          nmm:musicAlbum <b> .
      <b> nie:title "Go Off!" .
      <c> nmm:artistName "Jason Becker" }
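Without variables, matching a graph pattern reduces to checking that every triple of the pattern is stored. The following toy Python model of ASK illustrates this with the two queries above; the ask() function and the STORE excerpt are hypothetical, and real stores evaluate queries very differently.

```python
# An excerpt of the tutorial data; rdf:type ("a") triples are omitted
# because the ASK queries below do not use them.
STORE = {
    ("<a>", "nie:title", '"Images"'),
    ("<a>", "nmm:musicAlbum", "<b>"),
    ("<a>", "nmm:albumArtist", "<c>"),
    ("<a>", "nmm:albumArtist", "<d>"),
    ("<a>", "nmm:performer", "<e>"),
    ("<b>", "nie:title", '"Go Off!"'),
    ("<c>", "nmm:artistName", '"Jason Becker"'),
    ("<d>", "nmm:artistName", '"Marty Friedman"'),
    ("<e>", "nmm:artistName", '"Cacophony"'),
}


def ask(store, pattern):
    """A pattern without variables matches only if every one of its
    triples is present in the store."""
    return all(triple in store for triple in pattern)


# ASK { <a> nie:title "Images" }
print(ask(STORE, [("<a>", "nie:title", '"Images"')]))  # True

# The larger ASK query, with ';' expanded into separate triples.
print(ask(STORE, [
    ("<a>", "nie:title", '"Images"'),
    ("<a>", "nmm:albumArtist", "<c>"),
    ("<a>", "nmm:musicAlbum", "<b>"),
    ("<b>", "nie:title", '"Go Off!"'),
    ("<c>", "nmm:artistName", '"Jason Becker"'),
]))  # True

# A single missing triple is enough to make the whole pattern fail.
print(ask(STORE, [("<a>", "nie:title", '"Go Off!"')]))  # False
```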

Of course, the real point of a query language is retrieving the stored data, which is what SELECT queries are for. Variables are denoted with a ? prefix and act as placeholders: any data matches them, and the matched value is then available under that variable name, both in the result set and elsewhere in the query. The following query is the opposite of the first ASK query:

SELECT * { ?subject ?predicate ?object }

What does this query do? It provides a triple made of three variables, which every triple in the database will match. The * is a shortcut for all variables used in the query, so the query could also be written as:

SELECT ?subject ?predicate ?object { ?subject ?predicate ?object }

However, querying for all known data is rarely useful, and the results quickly become unwieldy. Luckily, variables may be used in any position of the triple, while the remaining elements hold the fixed values (resources or literals) you want to match, e.g.:

# Give me the title of resource <a> (Result: "Images")
SELECT ?songName { <a> nie:title ?songName }
# How is this text related to <b>? (Result: nie:title)
SELECT ?predicate { <b> ?predicate "Go Off!" }
# What is the resource URI of this fine musician? (Result: <d>)
SELECT ?subject { ?subject nmm:artistName "Marty Friedman" }
# Give me all resources that are a music piece (Result: <a>)
SELECT ?song { ?song a nmm:MusicPiece }
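The following Python sketch models how a single triple pattern is matched: fixed terms must be equal, and each variable is bound to whatever value sits in that position of a stored triple. It is a toy illustration under stated assumptions (terms are plain strings, variables are the strings starting with ?, and is_var, match_triple and select are hypothetical names), not Tracker's query engine. It reproduces the four queries above.

```python
# The tutorial data; the keyword "a" is written out as rdf:type.
STORE = {
    ("<a>", "rdf:type", "nfo:FileDataObject"),
    ("<a>", "rdf:type", "nmm:MusicPiece"),
    ("<a>", "nie:title", '"Images"'),
    ("<a>", "nmm:musicAlbum", "<b>"),
    ("<a>", "nmm:albumArtist", "<c>"),
    ("<a>", "nmm:albumArtist", "<d>"),
    ("<a>", "nmm:performer", "<e>"),
    ("<b>", "rdf:type", "nmm:MusicAlbum"),
    ("<b>", "nie:title", '"Go Off!"'),
    ("<c>", "rdf:type", "nmm:Artist"),
    ("<c>", "nmm:artistName", '"Jason Becker"'),
    ("<d>", "rdf:type", "nmm:Artist"),
    ("<d>", "nmm:artistName", '"Marty Friedman"'),
    ("<e>", "rdf:type", "nmm:Artist"),
    ("<e>", "nmm:artistName", '"Cacophony"'),
}


def is_var(term):
    """In this model, variables are the terms that start with '?'."""
    return term.startswith("?")


def match_triple(pattern, triple, bindings):
    """Unify one triple pattern with one stored triple.

    Returns the bindings extended with the pattern's variables, or None if
    a fixed term differs or a variable is already bound to another value.
    """
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def select(store, pattern):
    """Return one set of bindings per stored triple matching the pattern."""
    matches = (match_triple(pattern, t, {}) for t in sorted(store))
    return [m for m in matches if m is not None]


print(select(STORE, ("<a>", "nie:title", "?songName")))  # ?songName = "Images"
print(select(STORE, ("<b>", "?predicate", '"Go Off!"')))  # ?predicate = nie:title
print(select(STORE, ("?subject", "nmm:artistName", '"Marty Friedman"')))  # <d>
print(select(STORE, ("?song", "rdf:type", "nmm:MusicPiece")))  # ?song = <a>
```

Each result is a dictionary holding every variable of the pattern, which corresponds to SELECT *. The pattern ("?s", "?p", "?o") matches all 15 stored triples, just like the SELECT * query earlier.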

Several variables can also be used in the same triple, for example:

# Give me all predicate/object pairs for resource <a>
SELECT ?pred ?obj { <a> ?pred ?obj }
# The Answer to the Ultimate Question of Life, the Universe, and Everything
SELECT ?subj ?pred { ?subj ?pred 42 }
# Give me all resources that have a title, and their title.
SELECT ?subj ?obj { ?subj nie:title ?obj }

And of course, the graph pattern can hold more complex triple definitions, which are matched as a whole against the stored data, for example:

# Give me all songs from this fine album
SELECT ?song { ?album nie:title "Go Off!" .
               ?song nmm:musicAlbum ?album }
# Give me all song resources, their title, and their album title
SELECT ?song ?songTitle ?albumTitle { ?song a nmm:MusicPiece ;
                                            nmm:musicAlbum ?album ;
                                            nie:title ?songTitle .
                                      ?album nie:title ?albumTitle }
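Matching a pattern as a whole can be pictured as matching its triples one after another, where each step must agree with the variable values chosen so far. The sketch below models this with a naive recursive search in Python; match_pattern and match_triple are hypothetical helpers, and real SPARQL engines use far more efficient strategies. It reproduces the two queries above.

```python
# The tutorial data; the keyword "a" is written out as rdf:type.
STORE = {
    ("<a>", "rdf:type", "nfo:FileDataObject"),
    ("<a>", "rdf:type", "nmm:MusicPiece"),
    ("<a>", "nie:title", '"Images"'),
    ("<a>", "nmm:musicAlbum", "<b>"),
    ("<a>", "nmm:albumArtist", "<c>"),
    ("<a>", "nmm:albumArtist", "<d>"),
    ("<a>", "nmm:performer", "<e>"),
    ("<b>", "rdf:type", "nmm:MusicAlbum"),
    ("<b>", "nie:title", '"Go Off!"'),
    ("<c>", "rdf:type", "nmm:Artist"),
    ("<c>", "nmm:artistName", '"Jason Becker"'),
    ("<d>", "rdf:type", "nmm:Artist"),
    ("<d>", "nmm:artistName", '"Marty Friedman"'),
    ("<e>", "rdf:type", "nmm:Artist"),
    ("<e>", "nmm:artistName", '"Cacophony"'),
}


def match_triple(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # a variable
            if result.setdefault(p_term, t_term) != t_term:
                return None  # already bound to a different value
        elif p_term != t_term:
            return None  # a fixed term does not match
    return result


def match_pattern(store, patterns, bindings=None):
    """Match a whole graph pattern (a list of triple patterns).

    Each triple pattern extends the bindings found for the previous ones,
    so a variable shared between triples (such as ?album) must take the
    same value everywhere. This is a naive nested-loop search.
    """
    if bindings is None:
        bindings = {}
    if not patterns:
        return [bindings]
    first, rest = patterns[0], patterns[1:]
    solutions = []
    for triple in sorted(store):
        extended = match_triple(first, triple, bindings)
        if extended is not None:
            solutions.extend(match_pattern(store, rest, extended))
    return solutions


# SELECT ?song { ?album nie:title "Go Off!" . ?song nmm:musicAlbum ?album }
album_songs = match_pattern(STORE, [
    ("?album", "nie:title", '"Go Off!"'),
    ("?song", "nmm:musicAlbum", "?album"),
])

# The second query, with ';' expanded into separate triple patterns.
song_titles = match_pattern(STORE, [
    ("?song", "rdf:type", "nmm:MusicPiece"),
    ("?song", "nmm:musicAlbum", "?album"),
    ("?song", "nie:title", "?songTitle"),
    ("?album", "nie:title", "?albumTitle"),
])

print(album_songs)
print(song_titles)
```

Because ?album must take the same value in both triples of the first query, the title "Go Off!" leads only to the song attached to <b>.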

Take a moment to think about the graph pattern expressed in the last query:

[Figure: graph pattern]

On one hand, this pattern consists of specified data (e.g. ?song must be an nmm:MusicPiece and must have an nmm:musicAlbum and a nie:title, and ?album must have a nie:title), all of which must apply for a match to happen.

On the other hand, the graph pattern contains a number of variables. Some are only used internally, as temporary variables of sorts (?album, which expresses the relation between ?song and its album title), while others are requested in the result set.
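In other words, the SELECT clause projects each solution of the graph pattern onto the requested variables. A minimal Python sketch of that last step, using a hypothetical project() helper and the single solution the previous pattern yields on the example data:

```python
def project(solutions, variables):
    """Keep only the requested variables of each solution (hypothetical)."""
    return [{var: solution[var] for var in variables} for solution in solutions]


# The one solution the last query's pattern finds in the example data,
# including the internal variable ?album.
solutions = [{
    "?song": "<a>",
    "?album": "<b>",
    "?songTitle": '"Images"',
    "?albumTitle": '"Go Off!"',
}]

# SELECT ?song ?songTitle ?albumTitle { ... }
rows = project(solutions, ["?song", "?songTitle", "?albumTitle"])
print(rows)  # ?album was needed for matching but is not part of the rows
```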

The results of the search are