SPARQL/Property paths

Property paths
Each statement in a triplestore is a triple whose predicate is a single property. In a SPARQL query, the predicate position of a triple pattern can also hold a property path.

Property paths are a shorthand to write down a path of properties between two items. The simplest path is just a single property, which forms an ordinary triple:
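As a sketch, an ordinary triple pattern with a single property might look like the following. It assumes the `wdt:` prefix predefined by WDQS and uses “instance of” (P31) purely as an example property; the variable names are placeholders.

```sparql
# A path of length one: this is just a plain triple pattern.
?item wdt:P31 ?class.
```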

You can add path elements with a forward slash.
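A possible illustration, again assuming the WDQS `wdt:` prefix and using “instance of” (P31) and “subclass of” (P279) as example properties with placeholder variable names:

```sparql
# Three path elements joined with "/":
# one "instance of" step, followed by two "subclass of" steps.
?item wdt:P31/wdt:P279/wdt:P279 ?class.
```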

This is equivalent to either of the following:
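For a path such as `?item wdt:P31/wdt:P279/wdt:P279 ?class`, the two longer spellings could be sketched as follows. The names `?temp1` and `?temp2` are hypothetical helper variables, and the `wdt:` prefix is assumed to be the one predefined by WDQS.

```sparql
# Variant 1: explicit intermediate variables for every step.
?item wdt:P31 ?temp1.
?temp1 wdt:P279 ?temp2.
?temp2 wdt:P279 ?class.

# Variant 2: anonymous intermediate nodes written with square brackets.
?item wdt:P31 [ wdt:P279 [ wdt:P279 ?class ] ].
```

Only one of the two variants is needed in a query; both are shown together here for comparison.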

Exercise: (re)write the “grandchildren of Bach” query to use this syntax.

An asterisk after a path element means “zero or more of this element”.

If there are no other elements in the path, `?a something* ?b` means that `?b` might also just be `?a` directly, with no path elements between them at all.

A plus is similar to an asterisk, but means “one or more of this element”. The following query finds all descendants of Bach:
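A minimal sketch of such a query, assuming it is run on WDQS (so the `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes are predefined), that Q1339 is Johann Sebastian Bach, and that P40 is the “child” property. The label service line is a WDQS-specific convenience and can be dropped.

```sparql
SELECT ?descendant ?descendantLabel
WHERE {
  # One or more "child" steps starting from Bach.
  wd:Q1339 wdt:P40+ ?descendant.
  # WDQS-specific: fills ?descendantLabel with a readable name.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```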

If we used an asterisk instead of a plus here, the query results would include Bach himself.

A question mark is similar to an asterisk or a plus, but means “zero or one of this element”.

You can separate path elements with a vertical bar instead of a forward slash; this means “either-or”: the path might use either of those properties. (But not both – an either-or path segment always matches a path of length one.)
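For instance, a pattern along these lines would match a parent of either kind, assuming P22 is “father” and P25 is “mother” and using placeholder variable names:

```sparql
# Exactly one step, which may be either "father" (P22) or "mother" (P25).
?person wdt:P22|wdt:P25 ?parent.
```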

You can also group path elements with parentheses, and freely combine all these syntax elements. This means that another way to find all descendants of Bach is:
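One way this could be written is sketched below, under the same assumptions as before: WDQS prefixes, Q1339 for Bach, P22 for “father” and P25 for “mother”.

```sparql
SELECT ?descendant ?descendantLabel
WHERE {
  # One or more steps, each of which is either "father" or "mother",
  # leading from the descendant up to Bach.
  ?descendant (wdt:P22|wdt:P25)+ wd:Q1339.
  # WDQS-specific: fills ?descendantLabel with a readable name.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```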

Instead of using the “child” property to go from Bach to his descendants, we use the “father” and “mother” properties to go from the descendants to Bach. The path might include two mothers and one father, or four fathers, or father-mother-mother-father, or any other combination. (Though, of course, Bach can’t be the mother of someone, so the last element will always be father.)

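Summary of the codes after a path element:

 * `*` : zero or more of this element
 * `+` : one or more of this element
 * `?` : zero or one of this element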

Inverse link
Instead of the normal triple order “subject, predicate, object”, it is also possible to write a triple as an inverse link, “object, predicate, subject”. This is done by adding a caret (`^`) in front of the predicate. For normal triples this is not very useful, but in property paths it avoids the need for dummy variables.

For example, this query finds the siblings of Johann Sebastian Bach by looking for people who have the same father.
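A sketch of a sibling query that uses an inverse link, assuming WDQS prefixes, Q1339 for Bach and P22 for “father”:

```sparql
SELECT ?sibling ?siblingLabel
WHERE {
  # Forward along "father" from Bach to his father,
  # then backwards along "father" to everyone who has that father.
  wd:Q1339 wdt:P22/^wdt:P22 ?sibling.
  # WDQS-specific: fills ?siblingLabel with a readable name.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

As written, the pattern also matches Bach himself, since he shares a father with himself; adding `FILTER(?sibling != wd:Q1339)` inside the `WHERE` block would exclude him.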

With a dummy variable, this can be written as:
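A possible version that keeps the inverse link but names the intermediate node, where `?father` is the dummy variable (Q1339 and P22 are assumed to be Bach and “father”):

```sparql
SELECT ?sibling
WHERE {
  # Step 1: Bach's father, bound to a dummy variable.
  wd:Q1339 wdt:P22 ?father.
  # Step 2: inverse link, read as "?sibling has father ?father".
  ?father ^wdt:P22 ?sibling.
}
```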

Or without inverse link:
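The same sibling lookup with only ordinary triples might be sketched like this, again assuming Q1339 for Bach and P22 for “father”, with `?father` as the dummy variable:

```sparql
SELECT ?sibling
WHERE {
  # Bach's father.
  wd:Q1339 wdt:P22 ?father.
  # Everyone whose father is that same person.
  ?sibling wdt:P22 ?father.
}
```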

Instances and classes
Most Wikidata properties are “has” relations: has child, has father, has occupation. But sometimes (in fact, frequently), you also need to talk about what something is, and there are two kinds of relations there:


 * Gone with the Wind is a film.
 * A film is a work of art.

Gone with the Wind is one particular film. It has a particular director (Victor Fleming), a specific duration (238 minutes), a list of cast members (Clark Gable, Vivien Leigh, …), and so on.

Film is a general concept. Films can have directors, durations, and cast members, but the concept “film” as such does not have any particular director, duration, or cast members. And although a film is a work of art, and a work of art usually has a creator, the concept of “film” itself does not have a creator – only particular instances of this concept do.

This difference is why there are two properties for “is” in Wikidata: instance of (P31) and subclass of (P279). Gone with the Wind is a particular instance of the class “film”; the class “film” is a subclass (more specific class; specialization) of the more general class “work of art”.

So what does this mean for us when we’re writing SPARQL queries? When we want to search for “all works of art”, it’s not enough to search for all items that are directly instances of “work of art”:
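Such a direct-instance query could look roughly like this, assuming WDQS prefixes and that Q838948 is the item for “work of art”:

```sparql
SELECT ?work ?workLabel
WHERE {
  # Only items that are *directly* an instance of "work of art".
  ?work wdt:P31 wd:Q838948.
  # WDQS-specific: fills ?workLabel with a readable name.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```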

As I’m writing this, that query only returns 2815 results – obviously, there are more works of art than that! The problem is that this misses items like Gone with the Wind, which is only an instance of “film”, not of “work of art”. “film” is a subclass of “work of art”, but we need to tell SPARQL to take that into account when searching.

One possible solution to this is the forward-slash path syntax we talked about: Gone with the Wind is an instance of some subclass of “work of art”. (As an exercise, try writing that query!) But that still has problems:


 * 1) We’re no longer including items that are directly instances of work of art.
 * 2) We’re still missing items that are instances of some subclass of some other subclass of “work of art” – for example, Snow White and the Seven Dwarfs is an animated film, which is a film, which is a work of art. In this case, we need to follow two “subclass of” statements – but it might also be three, four, five, any number really.

The solution: `?item wdt:P31/wdt:P279* ?class`. This means that there’s one “instance of” and then any number of “subclass of” statements between the item and the class.
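As a full query for works of art, this might be sketched as follows. It assumes WDQS prefixes and Q838948 for “work of art”; the limit of 1000 is an arbitrary value chosen for illustration.

```sparql
SELECT ?work ?workLabel
WHERE {
  # One "instance of" step, then zero or more "subclass of" steps.
  ?work wdt:P31/wdt:P279* wd:Q838948.
  # WDQS-specific: fills ?workLabel with a readable name.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
# Cap the result size so the browser can still display it.
LIMIT 1000
```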

I don’t recommend running that query for all works of art. WDQS can handle it (just barely), but your browser might crash when trying to display the results because there are so many of them. For that reason, a `LIMIT` clause is inserted.

Now you know how to search for all works of art, or all buildings, or all human settlements: the magic incantation `wdt:P31/wdt:P279*`, along with the appropriate class. This uses some more SPARQL features that I haven’t explained yet, but quite honestly, this is almost the only relevant use of those features, so you don’t need to understand how it works in order to use WDQS effectively.