Getty Vocabularies: LOD Sample Queries

 

Version:                   3.4

Last updated:           13 June 2017

HTML version:       http://vocab.getty.edu/doc/queries

Queries UI:              http://vocab.getty.edu/queries

Parent document:    http://vocab.getty.edu/doc

Author:                    Vladimir Alexiev

Table of Contents

1       Introduction

1.1         Sample Queries UI

1.2         Revisions

1.2.1          Version 3.0

1.2.2          Version 3.1

1.2.3          Version 3.2

1.2.4          Version 3.3

1.2.5          Version 3.4

2       Finding Subjects

2.1         Top-level Subjects

2.2         Descendants of a Given Parent

2.3         Subjects by Contributor Id

2.4         Subjects by Contributor Abbrev

2.5         Preferred Ancestors

2.6         Full Text Search Query

2.7         Stop-Word Removal

2.8         Case-insensitive Full Text Search Query

2.9         Exact-Match Full Text Search Query

2.10      Find Person Occupations by broaderExtended

2.11      Find Person Occupations by Double FTS

2.12      Find Quartz Timepieces by Double FTS

2.13      Find Subject by Exact English PrefLabel

2.14      Find Subject by Language-Independent PrefLabels

2.15      Combination Full-Text and Exact String Match

2.16      Find Subject by Any Label

2.17      Find Ordered Subjects

2.18      Find Ordered Collections

2.19      Get Subjects in Order

2.20      Find Contributors by Vocabulary

2.21      Find Sources by Vocabulary

3       Getting Information

3.1         All Data For Subject

3.2         All Data for Terms of Subject

3.3         Subject Preferred Label

3.4         Preferred and Vernacular Terms

3.5         Historic Information on Relations

3.6         Historic Information of Terms

3.7         Preferred Terms for Contributors

3.8         Preferred Terms for Sources

3.9         Concepts Related by Particular Associative Relation

3.10      Recently Created Subjects

3.11      Recently Modified Subjects

3.12      Recent Revision Actions

3.13      OpenRefine Reconciliation Service

3.14      Smart Resource Title

4       TGN-Specific Queries

4.1         Places by Type

4.2         Places, with English or GVP Label

4.3         Places by Direct and Hierarchical Type

4.4         Breakdown of Sovereign States by Type

4.5         Inhabited Places That Were Sovereign States

4.6         Places by Type and Parent Place

4.7         Places by Type, with placeTypePreferred

4.8         Places by Triple FTS

4.9         Places by FTS Parents

4.10      Capitals by Association

4.11      Members of the European Union

4.12      Members of the United Nations

4.13      Geo Chart with sgvizler

4.14      Column Chart with sgvizler

4.15      Countries and Capitals By Type and Containment

4.16      Places by Coordinate Bounding Box

4.17      Places Within Bounding Box

4.18      Places by Type Within Bounding Box

4.19      Places Outside Bounding Box (Overseas Possessions)

4.20      Places Nearby Each Other

5       ULAN-Specific Queries

5.1         Agents by Type

5.2         Associative Relations of Agent

5.3         Female Artists

5.4         Female Artists as a Hobby

5.5         Native American Painters

5.6         Names of Native American Painters

5.7         Architects Born in the 14th or 15th Century

5.8         Indian and Pakistani Architectural Groups

5.9         Non-Italians Who Worked in Italy

5.10      Artists Associated to a Given Patron or His Family

5.11      German, Dutch, Flemish printmakers, listed with their teachers

5.12      Artists Whose Identity May be Associated or Confused With Another

5.13      Ordered Hierarchy of Given Subject

5.14      Ancient Artists or Groups by Nationality

5.15      Art Repositories in the USA by State

5.16      Popes and Their Reigns

5.17      Pope Reign Durations

5.18      Life Events

5.19      Artists with Name, Bio, Nationality, Type

6       Language Queries

6.1         Scientific Names by Language

6.2         Scientific Names not in English and Latin

6.3         Find Terms by Language Tag

6.4         Languages and ISO Codes

6.5         Language URLs

6.6         Custom Language Tags

6.7         Count Terms by Language

7       Counting and Descriptive Info

7.1         Descriptive Info from VOID

7.2         Number of Entities from VOID

7.3         Number of Sources

7.4         Associative Relations Count

7.5         Number of AAT Revision Actions

7.6         TGN Top Place Types

7.7         ULAN Facet Counts

7.8         ULAN Agents by Type

7.9         ULAN Agents by Nationality

7.10      ULAN Events by Type

7.11      Breakdown of Historic Relations

7.12      Breakdown of Historic Terms

7.13      GraphDB SysInfo

8       Explore the Ontology

8.1         Ontology Classes and Properties

8.2         Ontology Values

 

1        Introduction

This is a companion document to the main GVP LOD documentation: http://vocab.getty.edu/doc. We provide 90 sample queries for various tasks with GVP LOD. If you write an interesting query, please contribute it!

This document is available at two URLs:

·        http://vocab.getty.edu/doc/queries: plain HTML version.

·        http://vocab.getty.edu/queries: a sample queries UI that is integrated with the SPARQL endpoint. It is available from the Queries link in the header bar.

1.1       Sample Queries UI

You can use the queries UI as follows:

[Figure: /doc/img/019-AAT-Forest1.png]

·        (4) Use the left frame to select a section of the document. The selected heading is highlighted

·        (5) The selected section appears in the bottom frame. Please read it carefully, since often it explains the purpose of the query, possible alternatives, or builds up the query through a series of elaborations. Often the best query is last in the section. When you hover the mouse over a query in the bottom frame, a "SPARQL" button appears. Click the button to copy the query to the SPARQL editor in the top frame

·        Edit the query if needed (e.g. to provide a different vocabulary concept or keyword)

·        (6) Adjust whether to return Implicit triples, and whether to expand results across owl:sameAs

·        Click Submit

After the semantic repository returns the query results, the Queries link shows the query that was executed in a tool-tip. Selecting it reloads the last edited query into the editor, and the last selected section in the left and bottom frames.

See Forest UI for more info.

1.2       Revisions

1.2.1        Version 3.0

25 Apr 2015

·        Created by splitting off from the main document. See that doc for previous revision notes

·        Added Case-insensitive Full Text Search Query

·        Added OpenRefine Reconciliation Service

·        Added 18 ULAN-Specific Queries and 6 Counting and Descriptive Info queries

·        Moved a number of queries to their own section Language Queries

·        Fixed query All Data For Subject to use iso:superOrdinate instead of iso:subordinateArray; and to include ULAN data

1.2.2        Version 3.1

5 June 2015

·        Removed interactivity (SPARQL button) from a number of partial query fragments, or sub-optimal queries that build up to an optimal query

·        Added query Smart Resource Title, which is used to display specific Resource Titles

·        Added query Exact-Match Full Text Search Query

1.2.3        Version 3.2

15 Dec 2015

·        Added more detailed instructions to OpenRefine Reconciliation Service

·        Added query Combination Full-Text and Exact String Match, which is particularly useful for reconciliation

·        Added query GraphDB SysInfo

·        Added AAT ID to ULAN Events by Type

·        Fixed some broken links

1.2.4        Version 3.3

20 May 2016

·        Added query Life Events

·        Optimized several queries

·        Added Stop-Word Removal as part of FTS preprocessing

1.2.5        Version 3.4

13 June 2017

·        Added Count Terms by Language

·        Added Artists with Name, Bio, Nationality, Type

·        Added missing single end-quote in OpenRefine Reconciliation Service

·        Fixed 18 broken links (out of 330)

·        Clarification in Find Sources by Vocabulary

·        Clarification about .json endpoint in Geo Chart with sgvizler

2        Finding Subjects

2.1       Top-level Subjects

The top-level Subjects of AAT are gvp:Facets, so the query is easy:

select * {?f a gvp:Facet; skos:inScheme aat: ; gvp:prefLabelGVP/xl:literalForm ?l}

The same holds for TGN (there are only two: World and Extraterrestrial Places):

select * {?f a gvp:Facet; skos:inScheme tgn: ; gvp:prefLabelGVP/xl:literalForm ?l}

The same holds for ULAN (there are 5 facets, see ULAN Hierarchy and Classes):

select * {?f a gvp:Facet; skos:inScheme ulan: ; gvp:prefLabelGVP/xl:literalForm ?l}

2.2       Descendants of a Given Parent

Let's look for AAT descendants of 300194567 "drinking vessels". This finds "rhyta" and other interesting records, including "Fichtelgebirgehumpen":

select * {?x gvp:broaderExtended aat:300194567; skos:inScheme aat: ; gvp:prefLabelGVP/xl:literalForm ?l}

2.3       Subjects by Contributor Id

You can easily find subjects contributed by a particular Contributor if you know its id. E.g. the Getty Conservation Institute (GCI) in AAT is aat_contrib:10000088. Let's find their contributions to aat:300033618 "paintings (visual works)":

select * {

  ?x a gvp:Subject; dct:contributor aat_contrib:10000088;

    gvp:broaderExtended aat:300033618;

    gvp:prefLabelGVP/xl:literalForm ?l}

Please note that the different vocabularies use different namespaces for sources and contributors. GCI has a different URL as a TGN contributor: tgn_contrib:10000088. See the next section for accessing contributions across vocabularies.

2.4       Subjects by Contributor Abbrev

If you know the abbreviation of a Contributor but not the id, you can still easily find its contributions. This works even across vocabularies, assuming that the same contributor abbreviation (foaf:nick) was used consistently in all vocabularies. E.g. for the Getty Conservation Institute (GCI):

select ?x ?l {

  ?x a gvp:Subject; dct:contributor [foaf:nick "GCI"];

  gvp:prefLabelGVP/xl:literalForm ?l}

We use a blank node since we don't need the URL of the contributor.

If you want to find only contributions to a particular vocabulary, filter by skos:inScheme:

select ?x ?l {

  ?x a gvp:Subject; dct:contributor [foaf:nick "GCI"];

  skos:inScheme aat: ; gvp:prefLabelGVP/xl:literalForm ?l}

Or here's how to find all contributions by J. Paul Getty Museum (JPGM) in ULAN:

select ?x ?l {

  ?x a gvp:Subject; dct:contributor [foaf:nick "JPGM"];

  skos:inScheme ulan: ; gvp:prefLabelGVP/xl:literalForm ?l}

2.5       Preferred Ancestors

Fetch all preferred ancestors of 300226882 "baking dishes" very efficiently (no traversal). We also fetch each parent: this can be used to reconstruct the hierarchy in memory.

select * {

  aat:300226882 gvp:broaderPreferredExtended ?parent.

  ?parent gvp:prefLabelGVP/xl:literalForm ?l.

  OPTIONAL {?parent gvp:broaderPreferred ?grandParent}}
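The in-memory reconstruction mentioned above could look like the following Python sketch. It assumes the query results have already been fetched and reduced to (parent, grandParent) pairs of URI strings, with None where the OPTIONAL clause did not bind, and that each ancestor has at most one preferred parent. The function name and the "ex:" identifiers are hypothetical placeholders, not real AAT IDs.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def preferred_chain(rows: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Order ancestors from the nearest parent up to the root."""
    # Map every ancestor to its own preferred parent (None for the root).
    up: Dict[str, Optional[str]] = {parent: grand for parent, grand in rows}
    # The nearest parent is the only ancestor that is nobody's grandParent.
    grandparents = {g for g in up.values() if g is not None}
    nearest = [p for p in up if p not in grandparents]
    if len(nearest) != 1:
        raise ValueError("expected exactly one nearest parent")
    chain: List[str] = []
    node: Optional[str] = nearest[0]
    # Walk upwards; the membership check guards against accidental cycles.
    while node is not None and node not in chain:
        chain.append(node)
        node = up.get(node)
    return chain


# Hypothetical rows in arbitrary order, as a SPARQL endpoint might return them.
rows = [
    ("ex:containers", "ex:furnishings"),
    ("ex:furnishings", None),
    ("ex:dishes", "ex:containers"),
]
print(preferred_chain(rows))  # ['ex:dishes', 'ex:containers', 'ex:furnishings']
```

The rows carry no explicit depth, so the sketch infers the order from the parent links alone.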

2.6       Full Text Search Query

This is the query used for the Full Text Search.

select ?Subject ?Term ?Parents ?Descr ?ScopeNote ?Type (coalesce(?Type1,?Type2) as ?ExtraType) {

  ?Subject luc:term "fishing* AND vessel*"; a ?typ.

  ?typ rdfs:subClassOf gvp:Subject; rdfs:label ?Type.

  filter (?typ != gvp:Subject)

  optional {?Subject gvp:placeTypePreferred [gvp:prefLabelGVP [xl:literalForm ?Type1]]}

  optional {?Subject gvp:agentTypePreferred [gvp:prefLabelGVP [xl:literalForm ?Type2]]}

  optional {?Subject gvp:prefLabelGVP [xl:literalForm ?Term]}

  optional {?Subject gvp:parentStringAbbrev ?Parents}

  optional {?Subject foaf:focus/gvp:biographyPreferred/schema:description ?Descr}

  optional {?Subject skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}}

·        If the user selected Brief, we use the predicate luc:term (it appears just before the Lucene query string, which is highlighted in red in the HTML version); for Full, we use the predicate luc:text

·        If the user selected only one of the vocabularies (e.g. AAT), we add a clause like

?Subject skos:inScheme aat:
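A client that assembles this query itself might apply the two variations as in the Python sketch below. Only the varying triple patterns are produced. The function name, its parameters and the mode values "brief" and "full" are hypothetical, and the Lucene query string is assumed to be already preprocessed.

```python
from typing import Optional

# Vocabulary prefixes used in this document.
VOCABULARIES = {"aat", "tgn", "ulan"}


def fts_patterns(lucene_query: str, mode: str = "brief",
                 vocabulary: Optional[str] = None) -> str:
    """Return the FTS triple patterns for the chosen search options."""
    # Refuse characters that would break out of the double-quoted literal.
    if '"' in lucene_query or "\\" in lucene_query:
        raise ValueError("quote or backslash in Lucene query")
    # Brief searches use luc:term, Full searches use luc:text.
    predicate = "luc:term" if mode == "brief" else "luc:text"
    lines = [f'?Subject {predicate} "{lucene_query}"; a ?typ.']
    if vocabulary is not None:
        if vocabulary not in VOCABULARIES:
            raise ValueError(f"unknown vocabulary: {vocabulary}")
        # Restrict the search to a single vocabulary.
        lines.append(f"?Subject skos:inScheme {vocabulary}: .")
    return "\n".join(lines)


print(fts_patterns("fishing* AND vessel*", mode="full", vocabulary="aat"))
```

Checking the vocabulary against a fixed set avoids pasting arbitrary user input into the query text.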

 

The following result columns are included:

·        Subject

·        GVP preferred Term

·        Abbreviated Parent string

Two field pairs that the FTS concatenates (if present):

·        Description (ULAN one-line biography) and ScopeNote

·        Subject Type (e.g. gvp:AdminPlaceConcept, gvp:GroupConcept) and ExtraType, being the preferred TGN place type (e.g. "inhabited places") or ULAN agent type (e.g. "museums (institutions)")

If you use this for an auto-completion application, be kind to our service and wait until the user has typed 3-4 chars and has waited a bit, before firing a query.
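One way to follow this advice is to gate every request on both input length and idle time. The Python sketch below shows the check on its own. The function name and the thresholds (3 characters, 0.3 seconds) are assumptions, not values prescribed by the service.

```python
def should_fire_query(text: str, seconds_since_last_keystroke: float,
                      min_chars: int = 3, min_idle: float = 0.3) -> bool:
    """Decide whether an auto-completion request should be sent now."""
    # Require enough typed characters, ignoring surrounding whitespace.
    long_enough = len(text.strip()) >= min_chars
    # Require a short pause so that we do not query on every keystroke.
    idle_enough = seconds_since_last_keystroke >= min_idle
    return long_enough and idle_enough


print(should_fire_query("rhy", 0.5))  # True
print(should_fire_query("rh", 0.5))   # False
```

In a real UI the idle time would come from a timer that is reset on every keystroke.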

2.7       Stop-Word Removal

For proper FTS results you should preprocess the Lucene query as highlighted above, i.e. the string entered by the user:

·        Replace punctuation with a space, keeping letters, digits and apostrophes. Use a regexp library that knows about the Unicode classes \P{L} and \P{Nd}, and be careful not to remove Greek letters, Chinese characters, or accented characters (e.g. Spanish, Dutch, Chinese transliterations).

·        Remove stop-words. Because Lucene doesn’t index stop words, you should also remove them from the query; otherwise queries like “academy of sciences” or “arts and crafts” won’t find anything. According to Stackoverflow, the default ENGLISH_STOP_WORDS_SET is:
"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"

·        Add a wild-card * after each word.

·        Add a conjunction (AND) between each pair of words.

For Chinese characters, use the Exact-Match Full Text Search Query pattern:

·        Our Lucene Analyzer doesn’t do any lexical processing on Chinese characters, so you should not do any of the above steps

·        Do not add wild-cards or AND

·        Enclose the whole query in double quotes. For example, to search for “rhyta” in Chinese:

luc:term ' "萊坦酒杯" '
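The preprocessing steps of this section could be implemented roughly as in the Python sketch below. It uses the standard unicodedata module instead of a regexp library with Unicode classes. The function names are hypothetical. Detecting Chinese text by the Unicode character name, normalizing to NFC, and keeping combining marks are assumptions added here so that decomposed accented letters stay intact.

```python
import unicodedata

# Default Lucene English stop words, as listed above.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def _is_han(ch: str) -> bool:
    # Assumption: identify Chinese characters by their Unicode name.
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


def _keep(ch: str) -> bool:
    cat = unicodedata.category(ch)
    # Letters (L*), decimal digits (Nd) and apostrophes are kept;
    # combining marks (M*) are kept too so accents are not split off.
    return cat.startswith("L") or cat.startswith("M") or cat == "Nd" or ch == "'"


def prepare_lucene_query(user_input: str) -> str:
    """Turn raw user input into a Lucene query string."""
    text = unicodedata.normalize("NFC", user_input).strip()
    if any(_is_han(ch) for ch in text):
        # Exact-match pattern: no wild-cards, no AND, whole query quoted.
        return '"' + text.replace('"', " ").strip() + '"'
    # Replace everything except kept characters with a space.
    cleaned = "".join(ch if _keep(ch) else " " for ch in text)
    words = [w for w in cleaned.split() if w.lower() not in ENGLISH_STOP_WORDS]
    # Wild-card after each word, AND between words.
    return " AND ".join(w + "*" for w in words)


print(prepare_lucene_query("academy of sciences"))  # academy* AND sciences*
print(prepare_lucene_query("Arts & Crafts"))        # Arts* AND Crafts*
```

The returned string still has to be placed inside a SPARQL string literal as the object of luc:term or luc:text.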

2.8       Case-insensitive Full Text Search Query

In auto-complete applications you'd often want to treat the user input in a case-insensitive way. The FTS index luc:term includes all terms lower-cased and stemmed, so it already takes care of filtering. But you'd also want to sort the results case-insensitively. Assuming you want to limit to AAT concepts only (not hierarchies or guide terms), here's an appropriate query (compared to the previous section, we omit Descr, Type and ExtraType):

select ?Subject ?Term ?Parents ?ScopeNote {

  ?Subject a skos:Concept; luc:term "gold"; skos:inScheme aat: ;

     gvp:prefLabelGVP [xl:literalForm ?Term].

  optional {?Subject gvp:parentStringAbbrev ?Parents}

  optional {?Subject skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}

} order by asc(lcase(str(?Term)))

We need to strip the language tag from ?Term by using the str() function, since order is undefined for literals with a language tag, e.g. the relative order is undefined for "a"@en-GB and "b"@en-GB (two literals with the same language tag).

Thanks to Athanasios Velios for providing the inspiration for this query.

2.9       Exact-Match Full Text Search Query

Sometimes you know the exact label for a concept and you want to find it in the fastest possible way. There are two ways to do it:

·        If you know the exact capitalization and language tag (and indeed whether a tag was used), use rdfs:label. In the query below we don't use a tag (most ULAN labels don't have a language tag). This finds one match, the famous painter:

select ?Subject ?Term ?Parents ?ScopeNote {

  ?Subject a skos:Concept; rdfs:label "Leonardo da Vinci";

     gvp:prefLabelGVP [xl:literalForm ?Term].

  optional {?Subject gvp:parentStringAbbrev ?Parents}

  optional {?Subject skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}}

·        If you don’t know the exact capitalization or language tag, use Lucene exact-match syntax, denoted by double quotes. Please note that in the query below, we use single quotes to delimit the SPARQL literal. This finds two matches, the painter and "Museo Nazionale della Scienza e della Tecnica Leonardo da Vinci" (it finds a substring in the latter label)

select ?Subject ?Term ?Parents ?ScopeNote {

  ?Subject a skos:Concept; luc:term ' "leonardo da vinci" ';

     gvp:prefLabelGVP [xl:literalForm ?Term].

  optional {?Subject gvp:parentStringAbbrev ?Parents}

  optional {?Subject skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}}

·        If the label includes a Lucene special word (e.g. "and"), omit it. E.g. see a query for "Arts and Crafts" (an arts movement) below. This also finds Byrdcliffe since it has an altLabel "Byrdcliffe Arts and Crafts Colony", and has the benefit that it will find "and" written as an ampersand, e.g. "Arts & Crafts"

select ?Subject ?Term ?Parents ?ScopeNote {

  ?Subject a skos:Concept; luc:term ' "arts crafts" ';

     gvp:prefLabelGVP [xl:literalForm ?Term].

  optional {?Subject gvp:parentStringAbbrev ?Parents}

  optional {?Subject skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}}

·        If the label you want to search for includes an apostrophe, you can either use triple quotes to delimit the string literal (shown below), or backslash escaping. E.g. this finds Hà So'n Bình, Tỉnh, a former Vietnamese administrative division:

select ?Subject ?Term ?Parents ?ScopeNote {

  ?Subject a skos:Concept; luc:term """ "Hà So'n Bình, Tỉnh" """;

     gvp:prefLabelGVP [xl:literalForm ?Term].

  optional {?Subject gvp:parentStringAbbrev ?Parents}

  optional {?Subject skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}}

This has been discussed at the former support site and at the Google forum.
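When the label comes from user input, the quoting can be done programmatically. The Python sketch below uses the backslash-escaping alternative mentioned above and produces a single-quoted SPARQL literal that wraps a Lucene phrase. The function name is hypothetical, and dropping double quotes and backslashes from the label is a simplifying assumption.

```python
def exact_match_literal(label: str) -> str:
    """Build a SPARQL literal holding a Lucene exact-match phrase."""
    # Collapse whitespace (including newlines) to single spaces.
    phrase = " ".join(label.split())
    # Drop characters that would end or escape the Lucene phrase.
    phrase = phrase.replace('"', " ").replace("\\", " ").strip()
    lucene = f' "{phrase}" '
    # Inside a single-quoted SPARQL literal only the apostrophe still
    # needs escaping, because backslashes were removed above.
    return "'" + lucene.replace("'", "\\'") + "'"


print(exact_match_literal("leonardo da vinci"))
print(exact_match_literal("Hà So'n Bình, Tỉnh"))
```

The result can be used as the object of luc:term, e.g. `?Subject luc:term ` followed by the returned literal.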

2.10   Find Person Occupations by broaderExtended

A user was searching for "president" occupations by luc:term (FTS) and further restrictions by

·        FILTER regex(?parentStringAbbrev, "Agents Facet")

·        FILTER langMatches(lang(?label), "en")

But there's a much faster way to restrict to a part of the hierarchy: gvp:broaderExtended. Let's explore the data:

·        Look at e.g. aat:300025470 "presidents" with inference "Explicit and Implicit"

·        Notice it has many gvp:broaderExtended:
aat:300024978, aat:300024979, aat:300024980, aat:300025426, aat:300025427, aat:300025432, aat:300264089

·        How to pick the best root? You don't want to explore all of these broaders one by one.

·        Click on the Hierarchy tab, which sends you to the Getty site. (The semantic site does not have a decent hierarchical display yet)

·        aat:300024979 "people (agents)" looks ok.

·        But aat:300024980 <Persons by Occupation> (a Guide Term) is exactly what we need!

·        Use gvp:prefLabelGVP instead of lang "en" because some concepts may NOT have an "en" prefLabelGVP

·        Don't forget a wildcard in luc:term, or you'll miss "first ladies" = "presidents' wives"

·        Consider NOT ordering by ?label, since luc:term natively returns them in relevance order:

·        By ?label:                            first ladies, presidents, vice-presidents

·        By Lucene relevance:        presidents, vice-presidents, first ladies

A query following all these guidelines is short, nice and efficient:

select * {

?c a gvp:Concept;

   gvp:broaderExtended aat:300024980 ; # <Persons by Occupation>

   gvp:prefLabelGVP/xl:literalForm ?label ;

   luc:term "president*"}

You may also try with "engraver*"

2.11   Find Person Occupations by Double FTS

For the query above we had to find out the ID of the guide term <Persons by Occupation> "by hand". But we can use FTS (e.g. "occupation*") to find that subject. If you use that FTS query alone, you'll find many "occupation*" (namely, 82), but if you also search by the desired target term, you find only the same 3 results as above:

select * {

  ?concept a gvp:Concept; gvp:broaderExtended ?broader.

  ?concept luc:term "president*".

  ?broader luc:term "occupation*".

  ?concept gvp:prefLabelGVP [xl:literalForm ?concept_label].

  ?broader gvp:prefLabelGVP [xl:literalForm ?broader_label]}

Note that we had to use blank nodes "[..]" instead of property paths "/" because of a SPARQL parser bug (SES-2024).

Searching for a target term, plus terms of some of its gvp:broaderExtended ancestors, is a powerful technique that we call "Double FTS"

2.12   Find Quartz Timepieces by Double FTS

For another example of "Double FTS", let's find Quartz Timepieces:

select * {

  ?concept a gvp:Concept; gvp:broaderExtended ?broader.

  ?concept luc:term "quartz*".

  ?broader luc:term "time*".

  ?concept gvp:prefLabelGVP [xl:literalForm ?concept_label].

  ?broader gvp:prefLabelGVP [xl:literalForm ?broader_label]}

There's only one matching ?concept (aat:300225923 "quartz clocks"), which however is returned twice because it has two matching ?broader ("timepieces" and <timepieces by form>). If you want it just once, try "select distinct ?concept ?concept_label".

2.13   Find Subject by Exact English PrefLabel

The FTS queries above are perfect for auto-completion services, i.e. for returning many potential matches when the user enters a keyword (perhaps partial). But if you are working on co-referencing (e.g. see OpenRefine Reconciliation Service), you may need something simpler: match by exact label:

select * {?subj gvp:prefLabelGVP/xl:literalForm "rhyta"@en}

Note that prefLabelGVP is usually in the plural.

For AAT, it's important to specify the language tag, otherwise the query will return nothing:

·        Most prefLabelGVP in AAT are in English (but not all, see next section)

·        For TGN, 80% of the terms don't have a language tag

2.14   Find Subject by Language-Independent PrefLabels

Most AAT prefLabelGVP are in English but not all. In AAT, 1.3k (6%) are in a different language:

select (count(*) as ?c) {

  ?subj skos:inScheme aat: ;

  gvp:prefLabelGVP/xl:literalForm ?label

  filter(lang(?label) != "en")}

Of these exceptions, almost all are in en-us "American English" (curiously, even the acronym "IEEE 802.11"@en-us is marked so). And there's one in Spanish: aat:300181273 "troffers"@es (a kind of recessed lamps).

In TGN, 80% of labels don't have a language.

Since you may not always know the language, you may prefer to search by string without providing the language tag:

select distinct ?subj {

  ?subj skos:prefLabel ?lab

   filter(str(?lab)="Sofia")}

·        We use str() to strip off the lang tag

·        We search for all prefLabels (not just prefLabelGVP)

·        We use the shortcut skos:prefLabel instead of having to go through xl:prefLabel/xl:literalForm

·        We have to add "distinct" because a subject may have the same string in several languages, e.g. "rhyta"@en and "rhyta"@el-Latn (the latter is Greek transliterated to Latin)

Important: This query is very inefficient, since it needs to check every record: there is no index for str(). Possibly the only worse variant is to use regex() matching, which is even more expensive.

2.15   Combination Full-Text and Exact String Match

·        The exact string comparison described in the previous section is too slow

·        Find Subject by Exact English PrefLabel is fast, but you must know the language tag. Even if you're searching for English only, you cannot be sure the tag is "en": there are some concepts (e.g. 300015045 watercolor or 300041349 drypoints) that have variants in "en-GB" and "en-US" but not "en".

·        Full Text Search Query often returns more results than you would like.

You can try Exact-Match Full Text Search Query. Another option is to combine FTS and string equality: first use a Lucene query to restrict to a small result set, then use string equality to improve precision.

The following query finds precisely the 3 AAT subjects having label "oil" (including "oiling" that has an altLabel "oil (process)"). Note that we use gvp:term and not xl:literalForm in the filter, in order to compare the pure term without qualifier. We also convert ?term to lowercase, and search for "en" terms only.

select ?x ?label {
  ?x luc:term "oil";
     gvp:prefLabelGVP/xl:literalForm ?label.
  filter exists {
    ?x (xl:prefLabel|xl:altLabel)/gvp:term ?term.
    filter (lcase(str(?term))="oil" && langMatches(lang(?term),"en"))}}

This query was made following a suggestion by Charles R Butcosk of Colby College. He uses an OpenRefine reconciliation service that takes human-readable labels for materials and techniques, breaks down the label to component parts (e.g., "Oil on canvas" to "oil" and "canvas"), and uses this query to return prefLabelGVP and subject ID for possible matches in AAT. See OpenRefine Reconciliation Service for a technical description how you can make such service.
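The label breakdown described above might be approximated as in the Python sketch below. This is not the actual Colby College service: the noise-word list, the function names and the one-word-per-term simplification are assumptions made for illustration. The query text is the one shown above, with the search term substituted.

```python
import re
from typing import List

# Hypothetical list of connecting words to drop from material labels.
NOISE_WORDS = {"on", "and", "with", "in", "of", "the", "a", "an"}

# The query from this section; doubled braces are literal braces.
QUERY_TEMPLATE = """select ?x ?label {{
  ?x luc:term "{term}";
     gvp:prefLabelGVP/xl:literalForm ?label.
  filter exists {{
    ?x (xl:prefLabel|xl:altLabel)/gvp:term ?term.
    filter (lcase(str(?term))="{term}" && langMatches(lang(?term),"en"))}}}}"""


def split_label(label: str) -> List[str]:
    """Break a label such as "Oil on canvas" into candidate terms."""
    # [^\W\d_]+ matches runs of Unicode letters only.
    words = re.findall(r"[^\W\d_]+", label.lower())
    return [w for w in words if w not in NOISE_WORDS]


def build_query(term: str) -> str:
    """Fill the query template for one single-word term."""
    # Accept letters only, so nothing can break out of the string literals.
    if not re.fullmatch(r"[^\W\d_]+", term):
        raise ValueError("term must consist of letters only")
    return QUERY_TEMPLATE.format(term=term.lower())


for term in split_label("Oil on canvas"):
    print(build_query(term))
```

Multi-word terms such as "oil paint" would need a different splitting strategy than this word-by-word sketch.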

2.16   Find Subject by Any Label

Since GVP prefers the Plural form as Descriptor (prefLabel), "rhyton" will find nothing.

However, AAT consistently also provides singular forms. To widen the search to cover all labels, we add altLabel using SPARQL 1.1 Property Path notation "|":

select distinct ?subj {?subj skos:prefLabel|skos:altLabel "rhyton"@en}

Or we can simply access rdfs:label, which is an inferred property that includes both prefLabel and altLabel:

select * {?subj rdfs:label "rhyton"@en}

2.17   Find Ordered Subjects

Find AAT subjects ("collections") whose children have "forced" (explicit) ordering.

·        All subjects have gvp:displayOrder, so we check for non-trivial one (>1)

·        We get the label using an anonymous node notation "[..]"

select ?coll ?label {

  ?coll gvp:prefLabelGVP [xl:literalForm ?label]; skos:inScheme aat: .

  filter(exists {[gvp:displayOrder ?order] gvp:broader ?coll

                 filter(?order>1)})} limit 100

2.18   Find Ordered Collections

In this section we do something similar to the previous one, but illustrate some different approaches:

·        We look for skos:OrderedCollection: this class is used for those subjects whose children have non-trivial "forced" ordering

·        We get the label using SPARQL Property Path syntax "/"

·        We use the shorter prefix "xl:" instead of "skosxl:": both of these prefixes are defined in the repository, and mean the same thing.

select * {?x a skos:OrderedCollection; gvp:prefLabelGVP/xl:literalForm ?label}

2.19   Get Subjects in Order

Some subjects e.g. 300020605 "Pompeian wall painting styles" have children laid out in a particular order:

select * {

  ?x gvp:broader aat:300020605.

  optional {?x gvp:displayOrder ?ord}.

  ?x gvp:prefLabelGVP [xl:literalForm ?l]

} order by ?ord

Because Ontotext GraphDB preserves the order in which resources are first inserted (see Sort Order item 2), subjects and terms are returned in the desired order, even if you don't use the custom field gvp:displayOrder in your query:

select * {

  ?x gvp:broader aat:300020605.

  ?x gvp:prefLabelGVP [xl:literalForm ?l]}

2.20   Find Contributors by Vocabulary

To get subjects that belong to a specific vocabulary, we can use skos:inScheme:

select * {?subject skos:inScheme aat:}

Sources and contributors are also split by vocabulary, but don't have a property to link them to that vocabulary. So we have to use Named Graphs:

select * {graph <http://vocab.getty.edu/dataset/aat>

  {?contrib a foaf:Agent; foaf:name ?name; foaf:nick ?abbrev}}

2.21   Find Sources by Vocabulary

Let's find all TGN sources. We again use Named Graphs:

select * {graph <http://vocab.getty.edu/dataset/tgn>

  {?source a bibo:Document; bibo:shortTitle ?name}}

You may wonder how come we didn't have to filter out Local Sources, which have explicit type bibo:DocumentPart and inferred type bibo:Document. See section Number of Sources: Ontotext GraphDB puts all inferred statements in the default (empty) graph, so the named graph above doesn't include the inferred type statements.

3        Getting Information

3.1       All Data For Subject

This complex query is used to get all info about a Subject and its owned sub-objects:

CONSTRUCT {

  ?s  ?p1 ?o1. # subject

  ?ac ?p2 ?o2. # change action

  ?t  ?p3 ?o3. # term/note

  ?ss ?p4 ?o4. # subject local source

  ?ts ?p6 ?o6. # term/note local source

  ?st ?p7 ?o7. # statement about relations/types

  ?ar ?p8 ?o8. # anonymous array of subject

  ?l1 ?p9 ?o9. # list element of subject

  ?l2 ?pA ?oA. # list element of anonymous array

  ?th ?pB ?oB. # thing: place, agent

  ?tx ?pC ?oC. # thing extension: geometry, biography, event

  ?sn ?pD ?oD. # statement about nationality

} WHERE {

  BIND (ulan:500115493 as ?s) # tgn:3000034, aat:300198841

  {?s ?p1 ?o1 FILTER(!isBlank(?o1) && ?p1 != gvp:narrowerExtended && ?p1 != skos:narrowerTransitive)}

  UNION {?s skos:changeNote ?ac. ?ac ?p2 ?o2}

  UNION {?s dct:source ?ss. ?ss a bibo:DocumentPart. ?ss ?p4 ?o4}

  UNION {?s skos:scopeNote|xl:prefLabel|xl:altLabel ?t.

     {?t ?p3 ?o3 FILTER(!isBlank(?o3))}

     UNION {?t dct:source ?ts. ?ts a bibo:DocumentPart. ?ts ?p6 ?o6}}

  UNION {?st rdf:subject ?s. ?st ?p7 ?o7}

  UNION {?s skos:member/^rdf:first ?l1. ?l1 ?p9 ?o9}

  UNION {?s ^iso:superOrdinate ?ar FILTER NOT EXISTS {?ar xl:prefLabel ?t1}.

     {?ar ?p8 ?o8 FILTER(!isBlank(?o8))}

     UNION {?ar skos:member/^rdf:first ?l2. ?l2 ?pA ?oA}}

  UNION {?s foaf:focus ?th.

     {?th ?pB ?oB}

     UNION {?th schema:geo|gvp:biography|bio:event ?tx. ?tx ?pC ?oC}

     UNION {?sn rdf:subject ?th. ?sn ?pD ?oD}}

}

The corresponding graph is (see Per-Entity Exports for details):

[Figure: /doc/img/028-construct-subject.png]

Blank nodes (owl:Restriction types) are possible for Subject, Term and Note (?o1 and ?o3), so we filter them out.

We also filter out the properties described in Reduced SKOS Inference, because they caused very large Per-Entity Exports for subjects high-up in the ULAN and TGN hierarchies. (This is not necessary after the reduction was implemented, but better belt and suspenders than neither).

Other interesting subjects to try:

·        aat:300198841 rhyta: has numerous terms

·        aat:300018398 Jin (Six Dynasties) period: concept with ordered children, involves anonymous iso:ThesaurusArray

·        tgn:7011179 Siena: has associative relations (allies) and historic info

·        tgn:7015574 Machupicchu: place of great historic significance

·        tgn:3000034 Great Lakes region: has altitude and bounding box

·        ulan:500115493 Albrecht Duerer: has numerous associative relations

·        ulan:500125282 National Gallery (US): incorporated org

·        ulan:500115987: J. Paul Getty Trust. Has child organizations (broaderPartitive) that are ordered

·        ulan:500048836 "Abraham": unknown artist, from Cornu inventory

·        ulan:500125282 Unknown Inca: should appear as gvp:UnknownPersonConcept (having only Nationality)

3.2       All Data for Terms of Subject

The following query returns all data about the Terms of a given subject. It can be used to make displays as on Getty's site.

select ?l ?lab ?lang ?pref ?historic ?display ?pos ?type ?kind ?flag ?start ?end ?comment {

  values ?s {tgn:7001393}

  values ?pred {xl:prefLabel xl:altLabel}

  ?s ?pred ?l.

  bind (if(exists{?s gvp:prefLabelGVP ?l},"pref GVP",if(?pred=xl:prefLabel,"pref","")) as ?pref)

  ?l xl:literalForm ?lab.

  optional {?l dct:language [gvp:prefLabelGVP [xl:literalForm ?lang]]}

  optional {?l gvp:displayOrder ?ord}

  optional {?l gvp:historicFlag [skos:prefLabel ?historic]}

  optional {?l gvp:termDisplay [skos:prefLabel ?display]}

  optional {?l gvp:termPOS [skos:prefLabel ?pos]}

  optional {?l gvp:termType [skos:prefLabel ?type]}

  optional {?l gvp:termKind [skos:prefLabel ?kind]}

  optional {?l gvp:termFlag [skos:prefLabel ?flag]}

  optional {?l gvp:estStart ?start}

  optional {?l gvp:estEnd ?end}

  optional {?l rdfs:comment ?comment}

} order by ?ord

Other interesting subjects to try:

·        aat:300000590 miniature golf courses: time-dependent names (e.g. Putt-Putt courses)

·        tgn:7001393 Athens: names in various languages. Time-dependent names, e.g. Roman and Ancient Greek

·        ulan:500060426 Hokusai, Katsushika: a lot of names that varied with time and artistic period

3.3       Subject Preferred Label

For each of the Find queries, you can get the preferred label in addition to the URI by adding this fragment:

 ?x gvp:prefLabelGVP [xl:literalForm ?label]

·        Here we reach to the xl:Label preferred by GVP (each Subject has exactly one). Since we don't care about the xl:Label node (only about the label stored there), we use a blank node in the query.

E.g. when we add this fragment to "Subjects by Contributor Abbrev", we get this query:

select * {

  ?x a gvp:Subject; dct:contributor ?contrib;

     gvp:prefLabelGVP [xl:literalForm ?label].

  ?contrib foaf:nick "GCI"}

3.4       Preferred and Vernacular Terms

Foreign places (e.g. in the Egyptian governorate tgn:1001225 Muḩāfaz̧at Dumyāţ) often have vernacular names that differ from the preferred name of the place. Let's find place concepts having alternate labels in the Vernacular, and list them together with the preferred label. (Please note that some preferred labels are also in the Vernacular.)

select ?x ?pref (group_concat(?vern; separator="; ") as ?vernacular) {

  ?x gvp:broaderPartitiveExtended tgn:1001225; gvp:prefLabelGVP/xl:literalForm ?pref;

     xl:altLabel [gvp:termFlag <http://vocab.getty.edu/term/flag/Vernacular>; xl:literalForm ?vern].

  filter exists {?x xl:altLabel [gvp:termFlag <http://vocab.getty.edu/term/flag/Vernacular>]}

} group by ?x ?pref

Here we use the following SPARQL features:

·        The group_concat() function to put all vernacular labels in the same result field. It requires a "group by" all other result variables

·        "filter exists" to check that the concept ?x has at least one vernacular altLabel

3.5       Historic Information on Relations

Here is an example query to fetch the associative relations of tgn:7011179 Siena, together with optional Historic Information.

SELECT ?concept1 ?rel ?concept2 ?start ?end ?comment ?hist {

  bind(tgn:7011179 as ?concept1)

  ?concept1 ?rel ?concept2.

  ?rel sesame:directSubPropertyOf skos:related.

OPTIONAL {

    ?statement rdf:subject ?concept1; rdf:predicate ?rel; rdf:object ?concept2.

    OPTIONAL {?statement gvp:estStart ?start}.

    OPTIONAL {?statement gvp:estEnd ?end}.

    OPTIONAL {?statement rdfs:comment ?comment}.

    OPTIONAL {?statement gvp:historicFlag ?hist}}}

We find Associative Relationships as sub-properties of skos:related. However, rdfs:subPropertyOf is reflexive, so it would also return skos:related itself. So we use the special property sesame:directSubPropertyOf to find proper sub-properties of skos:related. Another option using only the standard rdfs:subPropertyOf would be:

?rel rdfs:subPropertyOf skos:related filter (?rel != skos:related)

Historic info is available only on the explicitly stated relations, not on inferred relations (skos:related). Associative relations are always instantiated in both directions, so we can search only for outgoing relations of the subject (tgn:7011179). We could simplify the query by using it directly, instead of bind(tgn:7011179 as ?concept1).

The results are as follows (Siena was ally of Arezzo, Pisa and Pistoia):

| concept1 | rel | concept2 | start | end | comment | hist |
|---|---|---|---|---|---|---|
| tgn:7011179 | gvp:tgn3301_ally_of | tgn:7006072 | 1250 | 1400 | Ghibelline allies during the 13th and 14th centuries | <historic> |
| tgn:7011179 | gvp:tgn3301_ally_of | tgn:7005060 | 1250 | 1400 | Ghibelline allies during the 13th and 14th centuries | <historic> |
| tgn:7011179 | gvp:tgn3301_ally_of | tgn:7006082 | 1250 | 1400 | Ghibelline allies during the 13th and 14th centuries | <historic> |

3.6       Historic Information of Terms

Fetch all terms with some Historic Information, together with that information. Let’s look at AAT terms:

select * {

  ?c skos:inScheme aat:; xl:prefLabel|xl:altLabel ?term.

  ?term xl:literalForm ?literal.

  optional {?term gvp:historicFlag [skos:prefLabel ?historic]}

  optional {?term gvp:estStart ?start}

  optional {?term gvp:estEnd ?end}

  optional {?term rdfs:comment ?comment}

  filter (bound(?historic) || bound(?start) || bound(?end) || bound(?comment))}

limit 20

The filter picks only terms that have at least one of the Historic Information properties.

Because of the way the GVP database is structured, terms for the same concept that have the same xl:literalForm (differing only by language) are managed as one and carry the same historic info. You'll see such duplication e.g. for "Chachapoya"@nl, @en, @es. There are about 24.5k terms with historic info across AAT and TGN; eliminating duplicates brings that down to 4k.

We can eliminate the duplicates like this:

select * {

  ?c skos:inScheme aat:; xl:prefLabel|xl:altLabel ?term.

  ?term xl:literalForm ?literal.

  filter not exists {

    ?c xl:prefLabel|xl:altLabel ?term1.

    ?term1 xl:literalForm ?literal1

    filter (str(?literal1) = str(?literal) && str(?term1) < str(?term))}

  optional {?term gvp:historicFlag [skos:prefLabel ?historic]}

  optional {?term gvp:estStart ?start}

  optional {?term gvp:estEnd ?end}

  optional {?term rdfs:comment ?comment}

  filter (bound(?historic) || bound(?start) || bound(?end) || bound(?comment))}

limit 20

Some explanations about the last filter:

·        We look for another ?term1 of the same concept ?c, having the same ?literal1 as ?term

·        If ?term1 has a "smaller" URL string then "filter not exists" rejects the ?term

As a result, we pick only one of the terms having the same literal; the criterion "least URL" is arbitrary but fast.
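The "least URL" rule can also be pictured outside SPARQL. The following is a minimal Python sketch, assuming the rows were already fetched and turned into dicts; the keys c, term, literal and the sample values are hypothetical and hand-written for illustration, not real GVP data.

```python
def dedupe_terms(rows):
    """Keep, per (concept, literal string), only the row with the smallest term URL."""
    best = {}
    for row in rows:
        # The language tag is ignored, like str(?literal) in the query.
        key = (row["c"], row["literal"])
        if key not in best or row["term"] < best[key]["term"]:
            best[key] = row
    return sorted(best.values(), key=lambda r: r["term"])


# Hypothetical rows: three language variants of one literal, plus one other term.
rows = [
    {"c": "concept/1", "term": "term/12", "literal": "Chachapoya", "lang": "nl"},
    {"c": "concept/1", "term": "term/11", "literal": "Chachapoya", "lang": "en"},
    {"c": "concept/1", "term": "term/13", "literal": "Chachapoya", "lang": "es"},
    {"c": "concept/1", "term": "term/14", "literal": "Chachapoyas", "lang": "en"},
]

kept = dedupe_terms(rows)
print([r["term"] for r in kept])  # ['term/11', 'term/14']
```

Like the SPARQL filter, the sketch compares URLs as plain strings, so which duplicate survives is arbitrary but deterministic.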

3.7       Preferred Terms for Contributors

Let's find terms that are preferred for specific contributors. You can similarly use gvp:contributorNonPreferred and gvp:contributorAlternatePreferred.

select * {[xl:literalForm ?term] gvp:contributorPreferred [foaf:name ?c_name; foaf:nick ?c_abbrev]}

limit 100

Here we use blank nodes both before and after the property, to return only strings and disregard URIs:

·        The first blank node ignores the term's URL and returns only the term

·        The second blank node ignores the contributor's URL and returns only the contributor name & abbreviation.

This returns a mix of terms (labels) from all vocabularies. To filter by vocabulary, use something like this:

select * {

  [] skos:inScheme aat: ;

     xl:prefLabel|xl:altLabel [

       xl:literalForm ?term;

       gvp:contributorPreferred [foaf:name ?c_name; foaf:nick ?c_abbrev]]}

limit 100

We use the SPARQL 1.1 Property Path notation "|" and 3 blank nodes. If you are confused by these blank nodes, let's rewrite by using explicit variables, and a subset of them in the select:

select ?term ?c_name ?c_abbrev {

  ?concept skos:inScheme aat: ;

     xl:prefLabel|xl:altLabel ?t.

  ?t xl:literalForm ?term;

     gvp:contributorPreferred ?c.

  ?c foaf:name ?c_name;

     foaf:nick ?c_abbrev}

limit 100

This is more understandable, though less elegant.

3.8       Preferred Terms for Sources

Terms that are preferred for specific sources. We want only the labels, so we use blank nodes [..] to disregard the URIs.

You can similarly use gvp:sourceNonPreferred and gvp:sourceAlternatePreferred.

select * {[xl:literalForm ?term] gvp:sourcePreferred [bibo:shortTitle ?source]}

To filter by vocabulary, we can use this:

select ?term ?source {

  ?concept skos:inScheme aat: ;

    xl:prefLabel|xl:altLabel ?t.

  ?t xl:literalForm ?term;

    gvp:sourcePreferred ?s.

  ?s bibo:shortTitle ?source}

3.9       Concepts Related by Particular Associative Relation

While investigating the precise meaning of the Associative Relationship 2100 "distinguished from", we tried this query. It retrieves related concepts, their labels and scope notes (in English).

·        We use blank nodes "[..]" instead of property paths "/" because of a SPARQL parser bug (SES-2024)

·        We filter by the string representation of the two concept URIs because this associative relation is symmetric, and we don't want to get it both forward and inverse in the result set.

select * {

  ?c1 gvp:aat2100_distinguished_from ?c2. filter (str(?c1) < str(?c2))

  ?c1 gvp:prefLabelGVP [xl:literalForm ?l1];

    skos:scopeNote [rdf:value ?n1; dct:language gvp_lang:en].

  ?c2 gvp:prefLabelGVP [xl:literalForm ?l2];

    skos:scopeNote [rdf:value ?n2; dct:language gvp_lang:en]}

We don't need to filter by skos:inScheme because gvp:aat2100_distinguished_from can link only AAT concepts, nothing else.

3.10   Recently Created Subjects

Recently created subjects have exactly one dct:created date (there's no such guarantee for subjects created a long time ago). Let's find those created since 1 Jan 2015:

select * {

  ?x a gvp:Subject; gvp:prefLabelGVP/xl:literalForm ?lab.

  ?x dct:created ?cre filter (?cre >= "2015-01-01T00:00:00"^^xsd:dateTime)}

Since dct:created is expressed with to-the-second accuracy, we have to compare against the timestamp "T00:00:00" and provide the appropriate XSD type. This ensures the timestamps are comparable and lets the query use the GraphDB Literal Index, so it is fast.
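If the cut-off date comes from a program rather than being typed by hand, the typed literal can be generated. This is a small Python sketch under that assumption; the function name created_since_filter is made up for illustration and is not part of any GVP tooling.

```python
from datetime import date, datetime, time


def created_since_filter(day, var="?cre"):
    """Return a SPARQL filter comparing var against midnight of the given day."""
    # Midnight gives the "T00:00:00" part that the comparison needs.
    stamp = datetime.combine(day, time.min).isoformat()
    return f'filter ({var} >= "{stamp}"^^xsd:dateTime)'


print(created_since_filter(date(2015, 1, 1)))
# filter (?cre >= "2015-01-01T00:00:00"^^xsd:dateTime)
```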

3.11   Recently Modified Subjects

Recently modified subjects have dct:modified after their dct:created (there's no such guarantee for subjects created a long time ago). There's a dct:modified timestamp for every revision action (see Revision History Representation), and there are a lot of them (3.8M as of Mar 2016). So let's limit to AAT, and look for the last 100 revision actions. We use max() to select only the latest modification time for each subject.

select ?x ?lab (max(?mod) as ?mod1) {

  {select * {?x skos:inScheme aat:; dct:modified ?mod} order by desc(?mod) limit 100}

  ?x gvp:prefLabelGVP/xl:literalForm ?lab

} group by ?x ?lab
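To picture what max() with "group by" does here, the following Python sketch reduces a list of (subject, timestamp) pairs to the latest timestamp per subject. The pairs are hypothetical, hand-written values, and the sketch assumes all timestamps share one ISO format so that string comparison orders them correctly.

```python
def latest_per_subject(rows):
    """Reduce (subject, timestamp) pairs to the latest timestamp per subject."""
    latest = {}
    for subject, mod in rows:
        if subject not in latest or mod > latest[subject]:
            latest[subject] = mod
    return latest


# Hypothetical revision timestamps: subject A was modified twice, B once.
rows = [
    ("subject/A", "2016-03-01T10:00:00"),
    ("subject/B", "2016-03-02T09:30:00"),
    ("subject/A", "2016-03-05T16:45:00"),
]

print(latest_per_subject(rows))
# {'subject/A': '2016-03-05T16:45:00', 'subject/B': '2016-03-02T09:30:00'}
```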

If you need to find all subjects changed since a certain date, use the following query. (Note: you need to copy it manually to the edit box and put in a more recent date, since we don't want it to get progressively slower with time.)

select ?x ?lab (max(?mod) as ?mod1) {

  ?x skos:inScheme aat:; dct:modified ?mod

  filter (?mod >= "2016-03-01T00:00:00"^^xsd:dateTime)

  ?x gvp:prefLabelGVP/xl:literalForm ?lab

} group by ?x ?lab

3.12   Recent Revision Actions

Let's get Revision History for Subject info for recent revision actions. Revision actions are linked to the subject using skos:changeNote. Revisions have rdf:type of prov:Create, prov:Modify or prov:Publish, but this is subsumed by dc:type, which provides more detailed information. We limit to the last 100 actions in AAT:

select * {

  ?x skos:inScheme aat:; gvp:prefLabelGVP/xl:literalForm ?lab; skos:changeNote ?rev.

  ?rev dc:type ?type; prov:startedAtTime ?time

  optional {?rev dc:description ?descr}

} order by desc(?time) limit 100

If the result set is large, the ORDER BY will be slow, since the whole result set is sorted in memory.

3.13   OpenRefine Reconciliation Service

OpenRefine (formerly Google Refine) is a popular and powerful tool for working with messy tabular data: cleaning it; transforming it (including to LOD); extending it with web services; linking it to structured databases. It was originally used for populating Freebase, then open sourced by Google. DERI created some useful extensions: Reconcile & interlink, Export RDF. LODRefine is a repackaging of these extensions, adding reconciliation against DBpedia, Crowd-sourcing, and Statistics. It was popularized for use by GLAM professionals by Ruben Verborgh, Seth van Hooland and Max De Wilde through the sites http://openrefine.org/ and http://freeyourmetadata.org/.

How can GVP LOD be used as an OpenRefine reconciliation service? The DERI extension includes a "SPARQL full-text search-based Reconciliation" that unfortunately cannot be used, because there's no way to specify that the luc:term index should be used (see issue/33). Nevertheless, one can use the GVP SPARQL service by querying for a fixed label (similar to Find Subject by Exact English PrefLabel), getting JSON format and parsing the result. Inge van Stokkom of the Rijksmuseum worked out a detailed solution. We reproduce it here with a few changes. Assume you have NL labels and you want to look them up in AAT and fetch the AAT identifier and the EN prefLabelGVP:

·        Create a column by fetching a URL based on the column that contains the terms

'http://vocab.getty.edu/sparql.json?query=select+distinct*{?x+skos:inScheme+aat:;(xl:prefLabel|xl:altLabel)/gvp:term"' + escape(value, 'url') + '"@nl}'

·        Parse the JSON to obtain the URL:

value.parseJson().results.bindings[0].x.value

·        Parse the identifier out of the URL by adding a column based on this column:

value[27,37]

·        Use another query to fetch prefLabelGVP:

'http://vocab.getty.edu/sparql.json?query=select*where{?x+gvp:prefLabelGVP[xl:literalForm+?label];dc:identifier"' + escape(value, 'url') + '"}'

·        Parse the JSON to obtain the label:

value.parseJson().results.bindings[0].label.value

See Combination Full-Text and Exact String Match for another variant of a query that may work better for reconciliation.
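For readers who prefer a script to GREL expressions, the first three steps above can be sketched in Python. This is only an illustration: the function names are made up, the sample response is hand-written in the standard SPARQL JSON results layout (it is not a captured reply from the service), and the query text simply mirrors the first GREL expression.

```python
import json
from urllib.parse import quote

ENDPOINT = "http://vocab.getty.edu/sparql.json"


def lookup_url(label, lang="nl"):
    """Build the lookup URL for one label, mirroring the first GREL expression."""
    # Escape backslashes and double quotes so the label stays one SPARQL literal.
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    query = ('select distinct * {?x skos:inScheme aat:; '
             '(xl:prefLabel|xl:altLabel)/gvp:term "%s"@%s}' % (safe_label, lang))
    return ENDPOINT + "?query=" + quote(query, safe="")


def first_uri(response_text):
    """Return the ?x value of the first binding, or None when nothing matched."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return bindings[0]["x"]["value"] if bindings else None


def aat_id(uri):
    """Cut the identifier out of the URI, much like value[27,37] in GREL."""
    return uri[27:37]


# Hand-written sample response for illustration only.
sample_response = json.dumps({
    "head": {"vars": ["x"]},
    "results": {"bindings": [
        {"x": {"type": "uri", "value": "http://vocab.getty.edu/aat/300198841"}}
    ]},
})

print(aat_id(first_uri(sample_response)))  # 300198841
```

Unlike bindings[0] in the GREL version, first_uri returns None when the label has no match, so unmatched rows can be handled explicitly.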

3.14   Smart Resource Title

The most appropriate title for a resource depends on the type of resource, as shown in Resource Titles. The following query gets such titles.

By convention, a query parameter is indicated with "$" (here $x), as opposed to a free variable that's indicated with "?". Before using this query, you should bind the parameter $x.

Then the query tries a number of branches with UNION, and takes the first match (LIMIT 1). Each branch returns a maximum of 1 result (except the last one: rdfs:label could potentially return more). The ordering of branches is important: the first successful branch will be picked.

Another option would be to use OPTIONAL, a different ?label variable in each branch, and COALESCE, but that's less efficient.

select ?label {

        {$x a gvp:ObsoleteSubject; skos:prefLabel ?label}

  union {$x a gvp:Subject; xl:prefLabel [xl:literalForm ?label; dct:language gvp_lang:en]}

  union {$x a gvp:Subject; gvp:prefLabelGVP [xl:literalForm ?label]}

  union {$x a xl:Label; xl:literalForm ?label}

  union {$x a gvp:ScopeNote; rdf:value ?value.

         bind(if(strlen(?value)>50,concat(substr(?value,1,50),"..."),?value) as ?label)}

  union {$x a foaf:Agent; foaf:name ?label}

  union {$x a bibo:Document; bibo:shortTitle ?label}

  union {$x a skos:Concept; skos:prefLabel ?label}

  union {values ?type {schema:Place schema:Person schema:Organization}

         $x a ?type; ^foaf:focus/gvp:prefLabelGVP/xl:literalForm ?label}

  union {$x a schema:GeoCoordinates; schema:latitude ?lat; schema:longitude ?long.

         bind(concat(str(?lat),",",str(?long)) as ?label)}

  union {$x a gvp:Biography; schema:description ?label}

  union {$x a schema:Event; dct:type/gvp:prefLabelGVP/xl:literalForm ?label}

  union {$x rdfs:label ?label}

} limit 1
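The two ideas in this query, "first successful branch wins" and shortening a long scope note, can be sketched in Python as follows. The function names and the sample values are hypothetical; the sketch only illustrates the logic described above and does not talk to the endpoint.

```python
def shorten(value, limit=50):
    """Mirror of if(strlen(?value)>50, concat(substr(?value,1,50),"..."), ?value)."""
    return value[:limit] + "..." if len(value) > limit else value


def first_label(candidates):
    """Return the first non-empty label in branch order, or None if all are empty."""
    for label in candidates:
        if label:
            return label
    return None


# Hypothetical branch results: the first two branches found nothing.
print(first_label([None, "", "rhyta", "some other label"]))  # rhyta
print(shorten("x" * 60) == "x" * 50 + "...")                  # True
```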

4        TGN-Specific Queries

4.1       Places by Type

Remember that place types are AAT concepts. To find places by type, we could locate the needed AAT concept and use it. But it's easier and clearer to use the label of that concept. Remember that you have to specify the language. E.g. looking for "republics", we find 180:

select * {

  ?c gvp:prefLabelGVP [xl:literalForm ?lab];

     gvp:placeType [skos:prefLabel "republics"@en]}

Because AAT provides labels in plural (skos:prefLabel) and singular (skos:altLabel) and rdfs:label includes both, we can get away with being a little less precise and providing the type name in singular (same results):

select * {

  ?c gvp:prefLabelGVP [xl:literalForm ?lab];

     gvp:placeType [rdfs:label "republic"@en]}

4.2       Places, with English or GVP Label

For AAT we usually return the label preferred by GVP (gvp:prefLabelGVP/xl:literalForm, which is most often in English). But in TGN, prefLabelGVP is in the Vernacular language (e.g. Afghanistan has "Afghānestān"@prs-latn, that's Dari transliterated to Latin).

You may prefer the English label, but not all TGN places have an English label.

Thus we can use the coalesce() function to pick the English label if present, else the prefLabelGVP:

select ?c (coalesce(?labEn,?labGVP) as ?lab) {

  ?c gvp:placeType [rdfs:label "republics"@en]

  optional {?c xl:prefLabel [xl:literalForm ?labEn; dct:language gvp_lang:en]}

  optional {?c gvp:prefLabelGVP [xl:literalForm ?labGVP]}}

4.3       Places by Direct and Hierarchical Type

Let's try to find all countries. But what exactly is a country? AAT has a number of related concepts: dig a bit in the hierarchy of 300232420 "sovereign states". The scope notes can explain the difference:

·        300387506 "countries (sovereign states)": independent states, or regions once independent and still distinct in race, language, institutions, or historical memories. 
Example: "England, Scotland, and Northern Ireland are countries in the nation of the United Kingdom"

·        300128207 "nations": Sovereign states typically originating with a large group of people associated with a particular territory and possessing distinct ethnic, historical, or cultural characteristics

·        300232420 "sovereign states": Political units, such as nations, that exercise and are recognized internationally as possessing sovereignty.
Includes city-states like the Vatican and empires like the Roman Empire, the Russian Empire and the Kievan Rus

So "countries" is too narrow. Let's first search for placeTypePreferred "nations"; we get 195:

select * {

  ?c gvp:prefLabelGVP [xl:literalForm ?lab];

     gvp:placeTypePreferred [rdfs:label "nations"@en]}

There are some "nations" having that in placeTypeNonPreferred (e.g. Armenia, Czechoslovakia, Cyprus, Soviet Union), 16:

select * {

  ?c gvp:prefLabelGVP [xl:literalForm ?lab];

     gvp:placeTypeNonPreferred [rdfs:label "nations"@en]}

We get both by searching in placeType (a generalization of placeTypePreferred and placeTypeNonPreferred), 211:

select * {

  ?c gvp:prefLabelGVP [xl:literalForm ?lab];

     gvp:placeType [rdfs:label "nations"@en]}

We get more results if we search for either "direct type" or "hierarchical type" (using SPARQL 1.1 Property Paths), 255:

select distinct * {

  ?c gvp:prefLabelGVP [xl:literalForm ?lab];

     gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) [rdfs:label "nations"@en]}

This includes sub-types of "nations", such as republics, island nations, etc.

Did you notice the "select distinct"? Without it you'll get 413 results, which include 158 duplicates. The reason is that some (not all) TGN records have both "nations" and a sub-type thereof as direct placeTypes. For example:

·        tgn:1000046 Bolivia has direct type: "republics" and its super-type "nations"

·        tgn:7024573 Greenland has direct type "island nations" without its super-type "nations"
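The duplication can be reproduced with a toy model in Python. The two dictionaries below are hand-written from the two examples just given (they are not fetched from TGN or AAT), and the function simply counts one row per way a place reaches the target type.

```python
# Toy data based on the Bolivia and Greenland examples; illustration only.
place_types = {
    "Bolivia": ["republics", "nations"],
    "Greenland": ["island nations"],
}
broader_extended = {  # type -> all of its ancestor types
    "republics": ["nations"],
    "island nations": ["nations"],
    "nations": [],
}


def matches(target):
    """One row per way a place reaches target: directly or via an ancestor type."""
    rows = []
    for place, types in place_types.items():
        for t in types:
            if t == target:
                rows.append(place)  # direct type
            if target in broader_extended[t]:
                rows.append(place)  # hierarchical type
    return rows


rows = matches("nations")
print(rows)               # ['Bolivia', 'Bolivia', 'Greenland']
print(sorted(set(rows)))  # ['Bolivia', 'Greenland']
```

Bolivia appears twice because it matches both as a direct "nations" and through "republics"; removing duplicates, as "select distinct" does, leaves one row per place.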

You can find all TGN records with a sub-type of "nations" but without type "nations" like this, 44:

select * {

  ?c gvp:prefLabelGVP [xl:literalForm ?lab];

     gvp:placeType/gvp:broaderGenericExtended [rdfs:label "nations"@en].

  filter not exists {?c gvp:placeType [rdfs:label "nations"@en]}}

Finally, we get the most results if we search for the hierarchical type "sovereign states", which is a super-type of "nations"; this finds 320:

select distinct * {

  ?c gvp:prefLabelGVP [xl:literalForm ?lab];

     gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) [rdfs:label "sovereign states"@en]}

You can get 3 lessons from this section:

·        To find places by type, find the most general applicable type in the AAT hierarchy (this takes some browsing and data examination)

·        Use gvp:placeType, which includes both placeTypePreferred (usually Current) and placeTypeNonPreferred (usually Historic)

·        Search for either direct or hierarchical type: gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended)
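When the same pattern is needed for many types, the query text can be generated. This Python sketch assumes the type label contains no double quotes; the function name places_by_type_query is made up for illustration.

```python
TYPE_PATH = "gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended)"


def places_by_type_query(type_label, lang="en"):
    """Build a query for places having the type directly or through a super-type."""
    return (
        "select distinct * {\n"
        "  ?c gvp:prefLabelGVP [xl:literalForm ?lab];\n"
        f'     {TYPE_PATH} [rdfs:label "{type_label}"@{lang}]}}'
    )


print(places_by_type_query("sovereign states"))
```

The printed text matches the last query of this section, with "select distinct" included so that places reachable through several types are listed once.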

4.4       Breakdown of Sovereign States by Type

Given the series of increasing numbers in the previous section, let's see the breakdown of "sovereign states" by placeTypePreferred:

select ?type (count (distinct ?x) as ?c) {

  ?x gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) [rdfs:label "sovereign states"@en];

     gvp:placeTypePreferred [gvp:prefLabelGVP [xl:literalForm ?type]]

} group by ?type order by desc(?c)

4.5       Inhabited Places That Were Sovereign States

Something interesting in the previous table: there are 11 inhabited places (cities/villages) that used to be sovereign states. Let's find them, and also output the parentString:

select * {

  ?c gvp:prefLabelGVP [xl:literalForm ?lab]; gvp:parentString ?parents;

     gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) [rdfs:label "sovereign states"@en];

     gvp:placeTypePreferred [rdfs:label "inhabited places"@en]}

4.6       Places by Type and Parent Place

Often we need to search for places of a certain kind, within a certain locality (parent place).

·        For type, we use all lessons from Places by Direct and Hierarchical Type, thus the last query in that section.

·        For parent place, we use gvp:broaderPartitiveExtended, which finds all descendants of a given place. We also name the parent place by (any one of its) label(s). This works only if the place label is unique, which is true for the examples below.

Let's find archaeological sites in Egypt; there are 66:

select distinct * {

  ?place skos:inScheme tgn: ;

    gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) [rdfs:label "archaeological sites"@en];

    gvp:broaderPartitiveExtended [rdfs:label "Egypt"@en];

    gvp:prefLabelGVP [xl:literalForm ?name];

    gvp:parentString ?parents}

4.7       Places by Type, with placeTypePreferred

Let's find all "riverine bodies of water" in Germany (this includes rivers, streams, brooks, watercourses, waterfalls, etc). There are 7424, of which 301 are rivers. You may also want to print the placeTypePreferred, to distinguish between Aar (river) and Aar (stream), both in Hesse, Germany:

select distinct * {

  ?place skos:inScheme tgn: ;

    gvp:placeTypePreferred [gvp:prefLabelGVP [xl:literalForm ?type]];

    gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) [rdfs:label "riverine bodies of water"@en];

    gvp:broaderPartitiveExtended [rdfs:label "Germany"@en];

    gvp:prefLabelGVP [xl:literalForm ?name];

    gvp:parentString ?parents}

Notes:

·        If you forget the "distinct", you'll get 7673 results, of which 249 are duplicates. The likely reason is that these 249 places have two types, both sub-types of "riverine bodies of water" (Places by Direct and Hierarchical Type includes a similar analysis regarding "nations" and sub-types thereof).

·        This query is inspired by the sample queries at this XML-based service.

4.8       Places by Triple FTS

Place names offer great ambiguity, since when people migrate they often carry old place names to new places; and some places or features are named after others. Just try a couple of searches on the Getty TGN Site to be convinced:

·        London: finds 142 places, amongst them:

        ·        About 20 boroughs of Greater London, since their Official names are "London Borough of .."

        ·        About 20 places in the US (3 in Ohio alone!), Canada, Jamaica, etc.

        ·        Hydrographic features, such as 10 creeks, branches, ditches, outflow canals, the Bay of London, the London Basin

        ·        London Bridge: there are a railway station, a ridge, a stream and a creek named that way!

        ·        Green Park, since one of its labels has a qualifier "London" to disambiguate it from other such parks (how ironic this increases the ambiguity for the name "London"!)

        ·        London Road Station, located in Manchester

·        San Francisco: finds 683 places in the US, Mexico, Spain (including Tenerife and Majorca), Chile, even Antarctica

        ·        Even if you limit to the United States, there are 65 matches

        ·        Even if you use the nickname Frisco and limit to the United States, there are 61 matches

        ·        So it's quite hard to find the San Francisco, California

Therefore mechanisms to add specificity to TGN searches are very desirable. This section describes such a mechanism.

 

TGN places form a containment hierarchy (gvp:broaderPartitiveExtended) and have types with a useful hierarchy (gvp:placeType/gvp:broaderGenericExtended, see TGN Place Types). Extending the approach from Find Person Occupations by Double FTS, we get powerful ways to find places by using any combination of words from the labels of:

·        The place: we limit to skos:inScheme tgn:

·        Its types and super-types: we limit to skos:inScheme aat:

·        Its parent places: we limit to skos:inScheme tgn:

·        Unlike searching for AAT concepts, we don't use wildcards in luc:term, because there are many useful short words that increase specificity (e.g. "CA" is the USPS code for California and an ISO code for Canada).

For example, the following query is looking for places called "Athos" having type "religious" (center), located in "Greece".

select distinct * {

  ?p   skos:inScheme tgn:; luc:term "athos";     gvp:prefLabelGVP [xl:literalForm ?pLab].

  ?t1  skos:inScheme aat:; luc:term "religious"; gvp:prefLabelGVP [xl:literalForm ?t1Lab].

  ?pp1 skos:inScheme tgn:; luc:term "greece";    gvp:prefLabelGVP [xl:literalForm ?ppLab].

  ?p gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) ?t1.

  ?p gvp:broaderPartitiveExtended ?pp1.

}

This will find Mount Athos, which is a religious center, inhabited place and mountain in Greece.

Note: To try the searches in the following tables, you need to edit the query in an obvious way.

There are many other ways to find Mount Athos:

| p | t1 | t2 | pp | Result |
|---|---|---|---|---|
| Athos | religious | | Greece | Mount Athos |
| mount | religious | | | Mount Athos (Greece), Mount Aracadi (Greece), Mount Zion (Israel), Mount Grace Priory (England), Convent of Mount Sinai (Egypt) |
| | religious | mountain | | Mount Athos. Despite their names, all the others are not actually mountains |
| Athos | religious | | | Mount Athos. If you don't specify "religious" you'll get a bunch of administrative regions |
| | religious | | China | Religious centers in China (e.g. Lhasa in Tibet) |
| | Inca | | | Centers related to the Inca culture (e.g. Machupicchu): 9 |
| | under* | | | Underwater sites, undersea features; but also underground stations |

You need to be mindful of performance for those queries, especially if you don't specify a search keyword for ?p, because then Ontotext GraphDB needs to look for gvp:broader*Extended of all combinations of the other variables, and these can be a lot.

·        Limiting to skos:inScheme aat:, there are 50 ?t1 matching "religious" and 30 ?t2 matching "mountain". That's 1500 combinations, and looking for all their gvp:broader*Extended is already a lot of work.

·        If you don't limit ?t2 to skos:inScheme aat:, there are 27k TGN places matching "mountain". Together with ?t1, that makes 1.3M combinations, and looking for all their gvp:broader*Extended takes forever.

·        Furthermore, if you want to check for two types ?t1 and ?t2, you may have to use only the direct type and not the hierarchical type, since the hierarchical type slows down the query.

4.9       Places by FTS Parents

We modify the query to include two parent places in sequence.

| p | t1 | pp1 | pp2 | Result |
|---|---|---|---|---|
| Sofia | | Mexico | | Sofia, New Mexico, US |
| | ranch | Mexico | | Ranches in Mexico: 4762 |
| Sofia | | Bulgaria | | Sofia (city) and Sofiya-Grad (administrative division) |
| San Francisco | | CA | US | San Francisco, California. We use the USPS code of California |
| Frisco | | CA | US | San Francisco, California. We also use its nickname |

E.g. for the last row:

select * {

  ?p   skos:inScheme tgn:; luc:term "frisco"; gvp:prefLabelGVP [xl:literalForm ?pLab].

  ?pp1 skos:inScheme tgn:; luc:term "CA";     gvp:prefLabelGVP [xl:literalForm ?pp1Lab].

  ?pp2 skos:inScheme tgn:; luc:term "US";     gvp:prefLabelGVP [xl:literalForm ?pp2Lab].

  ?p   gvp:broaderPartitiveExtended ?pp1.

  ?pp1 gvp:broaderPartitiveExtended ?pp2}
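Since the rows of the table differ only in their keywords, the query text can be produced by a small helper. This Python sketch is an illustration under stated assumptions: the function name is made up, the keywords must not contain double quotes, and the patterns come out in a slightly different order than in the query above, which does not change their meaning.

```python
def fts_places_query(place, parents):
    """Build a query from a place keyword and a chain of parent keywords, nearest first."""
    lines = [
        "select * {",
        f'  ?p skos:inScheme tgn:; luc:term "{place}"; '
        "gvp:prefLabelGVP [xl:literalForm ?pLab].",
    ]
    prev = "?p"
    for i, word in enumerate(parents, start=1):
        var = f"?pp{i}"
        lines.append(f'  {var} skos:inScheme tgn:; luc:term "{word}"; '
                     f"gvp:prefLabelGVP [xl:literalForm {var}Lab].")
        # Each place must lie somewhere below the next keyword's place.
        lines.append(f"  {prev} gvp:broaderPartitiveExtended {var}.")
        prev = var
    lines.append("}")
    return "\n".join(lines)


print(fts_places_query("frisco", ["CA", "US"]))
```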

These searches are considerably more powerful than the Getty website search:

·        It searches for words that are parts of the corresponding label

·        It allows hierarchical type search (direct type or its super-types)

·        It allows conjunctive type or super-place search (e.g. "religious" and "mountain" together)

·        It allows any super-places, not just nations

4.10   Capitals by Association

Cities usually are not immediate children of the entity they are capital of, e.g.:

·        above Sofia (city) is Sofiya-grad (administrative region),

·        above Sacramento (city) is Sacramento (county).

TGN includes an associative relation that connects capitals to entities; this finds 352:

select * {

  ?cap gvp:tgn3201_capital_of ?ent.

  ?cap skos:inScheme tgn:; gvp:prefLabelGVP [xl:literalForm ?capLab].

  ?ent skos:inScheme tgn:; gvp:prefLabelGVP [xl:literalForm ?entLab]}

Note: currently there is a bit of hesitation about the true code of this relation, so if the above doesn't work, try with gvp:tgn3101_capital_of.

4.11   Members of the European Union

Let's find the member countries of the European Union by associative relation; this finds 25:

select * {

  ?c gvp:tgn3317_member_of [rdfs:label "European Union"@en];

     gvp:prefLabelGVP [xl:literalForm ?lab]}

4.12   Members of the United Nations

Let's find the member countries of the United Nations (190). But this time let's fetch:

·        The English labels, not the GVP-preferred labels (which are usually in the Vernacular). All the UN members have an English label in TGN.

·        Optionally the date of joining, gvp:estStart (turns out no nation has ever left the UN, so we don't need gvp:estEnd). We get that from an anonymous node (being rdf:Statement) having the same rdf:subject, rdf:predicate and rdf:object as the first line of the query.

select * {

  ?c gvp:tgn3317_member_of [rdfs:label "United Nations"@en];

     xl:prefLabel [xl:literalForm ?lab; dct:language gvp_lang:en].

  optional {

    [rdf:subject ?c;

     rdf:predicate gvp:tgn3317_member_of;

     rdf:object [rdfs:label "United Nations"@en];

     gvp:estStart ?start]}}

4.13   Geo Chart with sgvizler

Say we want to display the year each country joined the UN on a Geo Chart, with the color of each country representing the year. We need the ?country name and the ?year as an integer:

select ?country ?year {

   [rdf:subject [xl:prefLabel [xl:literalForm ?country; dct:language gvp_lang:en]];

    rdf:predicate gvp:tgn3317_member_of;

    rdf:object [rdfs:label 'United Nations'@en];

    gvp:estStart ?start].

   bind(xsd:integer(str(?start)) as ?year)}

We need to fiddle with datatypes a bit:

·        str() removes the datatype xsd:gYear

·        xsd:integer() converts to int, so the color scale can be set right.

We can then use the excellent sgvizler library that makes Google Charts out of SPARQL results. You can see the code and a live result at http://jsfiddle.net/valexiev/NULCH/. A static copy of the result:

/doc/img/029-year-joining-of-UN.png

A large part of the world joined the UN right after WW2. The former Soviet Union republics and new East European states joined around 1992. Who joined last? Timor-Leste and Switzerland in 2002 (those Swiss really value their independence!).

Hints:

·        Explore the available chart types and options at Google Charts Gallery

·        Specify http://vocab.getty.edu/sparql.json as the endpoint URL (not merely http://vocab.getty.edu/sparql), so the chart can get the data in machine-readable JSON form.

data-sgvizler-endpoint="http://vocab.getty.edu/sparql.json"

·        Note: former versions used http://vocab.getty.edu/sparql.rj as the JSON endpoint, but now this returns the SPARQL editing UI

·        Read the Google GeoChart documentation for the available formats and options.

·        Tweak data-sgvizler-chart-options following the examples: the separator is |, and you should not enclose string values in anything
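As a rough illustration of what a client receives from the .json endpoint, here is a minimal Python sketch that flattens a response in the standard SPARQL 1.1 Query Results JSON format into plain rows. The sample payload is hand-written and only mimics the shape of a real response, and the helper name rows_from_sparql_json is hypothetical. It is written in Python rather than SPARQL so it can be run without an endpoint.

```python
import json

# Hand-written sample in the SPARQL 1.1 Query Results JSON format.
# The values are illustrative, not fetched from vocab.getty.edu.
SAMPLE = """
{
  "head": {"vars": ["country", "year"]},
  "results": {"bindings": [
    {"country": {"type": "literal", "xml:lang": "en", "value": "Switzerland"},
     "year": {"type": "literal",
              "datatype": "http://www.w3.org/2001/XMLSchema#integer",
              "value": "2002"}}
  ]}
}
"""

def rows_from_sparql_json(text):
    """Return one dict per result row, mapping variable name to its value."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL) are absent from a binding.
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows

rows = rows_from_sparql_json(SAMPLE)
```

Note that every value arrives as a string, whatever its datatype, so a client that wants numbers still has to convert them.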

4.14   Column Chart with sgvizler

Let's explore the number of nations that joined the UN, grouping in buckets of 5 years:

select ?year (count(*) as ?countries) {

    [rdf:predicate gvp:tgn3317_member_of;

      rdf:object [rdfs:label 'United Nations'@en];

      gvp:estStart ?start].

    bind (xsd:integer(xsd:integer(str(?start))/5)*5 as ?year)      

} group by ?year order by ?year

Again we need to fiddle with datatypes a bit:

·        str() removes the datatype xsd:gYear

·        The inner xsd:integer() converts to int, so we can apply the division operation

·        The outer xsd:integer() rounds down to integer
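The three conversion steps above can be mimicked outside SPARQL. The following is a minimal Python sketch, assuming the gYear values have a plain lexical form such as "1945" with no timezone suffix; the function names are hypothetical.

```python
def gyear_to_int(lexical):
    """Mimic xsd:integer(str(?start)): read the gYear lexical form as an int."""
    return int(lexical)

def bucket_5_years(year):
    """Mimic xsd:integer(year / 5) * 5: divide, truncate, multiply back."""
    return int(year / 5) * 5

# Sample years chosen for illustration only.
buckets = [bucket_5_years(gyear_to_int(y)) for y in ["1945", "1947", "1955", "2002"]]
```

Each year maps to the first year of its 5-year bucket, which is what the group by ?year clause then counts.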

It takes more fiddling to get the Google ColumnChart options right (e.g. hAxis.format=####|legend.position=none). You can see the code and live result at http://jsfiddle.net/valexiev/TCr59/, and here is a static copy:

/doc/img/030-growth-of-UN.png

4.15   Countries and Capitals By Type and Containment

Rather than using the associative relation tgn3201_capital_of, let's try to find nations and capitals by type and containment. Here we use direct type for ?capital (since that type doesn't have any subtypes), but hierarchical type for ?nation.

select distinct * {

  ?capital gvp:broaderPartitiveExtended ?nation.

  ?capital gvp:placeType [rdfs:label "national capital"@en].

  ?nation gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) [rdfs:label "sovereign state"@en].

  ?capital gvp:prefLabelGVP [xl:literalForm ?capital_name].

  ?nation gvp:prefLabelGVP [xl:literalForm ?nation_name]}

4.16   Places by Coordinate Bounding Box

Find places whose coordinates fall in a specific rectangle (in this case, enclosing the continental territory of The Netherlands):

select ?place ?name ?lat ?long {

  ?place skos:inScheme tgn: ;

    foaf:focus [wgs:lat ?lat; wgs:long ?long];

    gvp:prefLabelGVP [xl:literalForm ?name].

  filter (50.787185 <= ?lat && ?lat <= 53.542265 && 3.389722 <= ?long && ?long <= 7.169019)}

Note that the comparison may be problematic for rectangles that cross the +/-180 degree meridian.
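To see why the meridian matters, here is a minimal Python sketch of a bounding-box test that also handles boxes crossing +/-180 degrees. It assumes the convention that a box whose west edge is numerically greater than its east edge wraps around the meridian; that convention and the function name in_bbox are assumptions of this sketch, not something defined by TGN.

```python
def in_bbox(lat, lon, south, west, north, east):
    """True if (lat, lon) lies in the box; west > east means the box wraps 180."""
    if not (south <= lat <= north):
        return False
    if west <= east:
        # Ordinary box: the same comparison as the SPARQL filter above.
        return west <= lon <= east
    # Box crosses the +/-180 meridian: accept either side of the seam.
    return lon >= west or lon <= east

# The Netherlands box from the query, tested with a point near Amsterdam.
amsterdam_inside = in_bbox(52.37, 4.89, 50.787185, 3.389722, 53.542265, 7.169019)
```

For an ordinary box the test reduces to the four comparisons in the filter; only the wrapping case needs the extra "or".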

As the TGN Overview shows, the coordinates are available in two places:

·        One hop away from the main node (foaf:focus) as wgs:lat, wgs:long

·        Two hops away from the main node (foaf:focus/schema:geo) as schema:latitude, schema:longitude.

We prefer the shorter variant.

Note: a few regions like the Great Lakes Region include a bounding box in foaf:focus/schema:geo/schema:box.

This returns 23084 places:

·        Not all of them are in The Netherlands: some are in neighboring countries.

·        On the other hand, this excludes overseas territories of The Netherlands

4.17   Places Within Bounding Box

Ontotext GraphDB includes some Geo-spatial Extensions that allow you to simplify the above query, make it more efficient, and avoid the confusion around the +/-180 degree meridian:

prefix ontogeo: <http://www.ontotext.com/owlim/geo#>

select * {

  ?place skos:inScheme tgn: ;

    foaf:focus [ontogeo:within(50.787185 3.389722 53.542265 7.169019)];

    gvp:prefLabelGVP [xl:literalForm ?name]}

We don't mention the coordinate fields, because ontogeo:within() looks in wgs:lat and wgs:long.

4.18   Places by Type Within Bounding Box

Let's specialize the previous query and look for castles around The Netherlands. We get 170:

prefix ontogeo: <http://www.ontotext.com/owlim/geo#>

select distinct * {

  ?place skos:inScheme tgn: ;

    gvp:placeType|(gvp:placeType/gvp:broaderGenericExtended) [rdfs:label "castles (fortifications)"@en];

    foaf:focus [ontogeo:within(50.787185 3.389722 53.542265 7.169019)];

    gvp:prefLabelGVP [xl:literalForm ?name];

    gvp:parentString ?parents}

4.19   Places Outside Bounding Box (Overseas Possessions)

Let's find places that are administratively part of The Netherlands, but are not within the bounding box of its main continental territory (i.e. Overseas Possessions):

select ?place ?name ?lat ?long {

  ?place skos:inScheme tgn: ;

    foaf:focus [wgs:lat ?lat; wgs:long ?long];

    gvp:prefLabelGVP [xl:literalForm ?name];

    gvp:broaderPartitiveExtended [rdfs:label "The Netherlands"@en]

  filter (!(50.787185 <= ?lat && ?lat <= 53.542265 && 3.389722 <= ?long && ?long <= 7.169019))}

This includes 564 places in Bonaire, Sint Eustatius and Saba (Lesser Antilles).

4.20   Places Nearby Each Other

Ontotext GraphDB includes some Geo-spatial Extensions that go beyond searching within a rectangle: you can search for places nearby other places (by radius), compute distance, and search within a polygon.

Let's look for castles near mountains (within 10 km). We look only for direct types gvp:placeType, because adding 2 hierarchical types slows down the query considerably but doesn't find more results.

The distance can be specified in "km" (default) or in "mi". We also return the distance (always in km), and sort by it.

TGN knows about 719 castles within 10km of a mountain (e.g. Castillo de Atalaya is exactly at Sierra de la Atalaya in Murcia, Spain). Get your hiking shoes ready!!

prefix ontogeo: <http://www.ontotext.com/owlim/geo#>

select * {

  ?castle skos:inScheme tgn:;

    gvp:placeType [rdfs:label "castles (fortifications)"@en];

    gvp:prefLabelGVP [xl:literalForm ?castle_name];

    foaf:focus [wgs:lat ?castle_lat; wgs:long ?castle_long];

    gvp:parentString ?parents.

  ?mountain skos:inScheme tgn:;

    gvp:placeType [rdfs:label "mountains"@en];

    gvp:prefLabelGVP [xl:literalForm ?mountain_name];

    foaf:focus [wgs:lat ?mountain_lat; wgs:long ?mountain_long;

                ontogeo:nearby(?castle_lat ?castle_long 10)].

  bind (ontogeo:distance(?castle_lat, ?castle_long, ?mountain_lat, ?mountain_long) as ?dist)

} order by asc(?dist)

Notes:

·        ontogeo:within and ontogeo:nearby are predicates, so they take their arguments without commas (this is in fact an rdf:List), and are used in a predicate position

·        ontogeo:distance() is a function, so it takes its arguments comma-separated, and is used as a function (in this case in a bind())

·        You can see more examples of Ontotext GraphDB-specific Geo queries at the link above

·        We first find all castles (total of 490) and then restrict mountains (total of 25282) by ontogeo:nearby(). This method is significantly faster than either of:

·        First find castles, then find mountains, then filter by ?dist<=10

·        First find mountains (greater cardinality), then restrict castles (smaller cardinality)
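For readers who want to check a distance by hand, here is a minimal Python sketch of a great-circle distance and a radius test. It uses the haversine formula with a mean Earth radius of 6371 km. Both are assumptions of this sketch: the text above does not say how GraphDB computes ontogeo:distance(), so results may differ slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby(lat1, lon1, lat2, lon2, radius_km=10):
    """True if the two points are within radius_km of each other."""
    return haversine_km(lat1, lon1, lat2, lon2) <= radius_km
```

Computing this for every castle/mountain pair is the slow "filter by ?dist<=10" approach mentioned above, which is why the query lets the index behind ontogeo:nearby() do the restriction instead.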

5        ULAN-Specific Queries

5.1       Agents by Type

Let's find all painters, using AAT concept aat:300025136 painters (artists). We could use the AAT code, but it's easier to use the label directly. Search by exact literal is very fast (no text comparison is performed), but you must specify the lang tag. We use broaderGenericExtended to also find sub-types of "painters". (See Places by Type and the following few sections for similar queries.)

select distinct * {

  ?c gvp:prefLabelGVP [xl:literalForm ?lab];

     gvp:agentType|(gvp:agentType/gvp:broaderGenericExtended)

       [rdfs:label "painters (artists)"@en]}

·        We use "distinct" to eliminate duplicates, since a person may have several agentType under "painters (artists)".

The Getty website displays "painter" for persons (e.g. Dürer) and "painters" for groups (e.g. Dürer Workshop). But these terms represent the same AAT concept, which allows you to search for persons and groups uniformly. If you want to limit to one of them, you have to restrict by type. E.g. for groups:

select distinct * {

  ?c a gvp:GroupConcept; gvp:prefLabelGVP [xl:literalForm ?lab];

     gvp:agentType|(gvp:agentType/gvp:broaderGenericExtended)

       [rdfs:label "painters (artists)"@en]}

5.2       Associative Relations of Agent

Let's get all associative relationships of ulan:500115493 Duerer, Albrecht, showing relationship type, associated subject, preferred name, preferred display biography, and display date (comment)

select * {

  ulan:500115493 ?rel ?x.

  ?rel sesame:directSubPropertyOf skos:related.

  ?x gvp:prefLabelGVP/xl:literalForm ?name.

  ?x foaf:focus/gvp:biographyPreferred/schema:description ?bio.

  optional {[

    rdf:subject ulan:500115493;

    rdf:predicate ?rel;

    rdf:object ?x;

    rdfs:comment ?comment]}}

We know that associative relationships are sub-properties of skos:related. However, rdfs:subPropertyOf is reflexive, so it would also return skos:related itself. So we use the special property sesame:directSubPropertyOf to find proper sub-properties. Another option using only the standard rdfs:subPropertyOf would be:

select * {

  ulan:500115493 ?rel ?x.

  ?rel rdfs:subPropertyOf skos:related filter (?rel != skos:related).

  ?x gvp:prefLabelGVP/xl:literalForm ?name.

  ?x foaf:focus/gvp:biographyPreferred/schema:description ?bio.

  optional {[

    rdf:subject ulan:500115493;

    rdf:predicate ?rel;

    rdf:object ?x;

    rdfs:comment ?comment]}}

We find the display date (comment) in an optional rdf:Statement with addressing components rdf:subject, rdf:predicate, rdf:object. Since we don't care about the identity of that statement, we use a blank node [..]

5.3       Female Artists

We want to find female artists, i.e. gender being aat:300189557 female and type being any descendant of Guide Term aat:300025101 <people in the visual arts>. We'll use the AAT subjects directly; we have used the corresponding labels in other queries. We want to return subject, preferred name, and display bio:

select * {

  filter exists {?x gvp:agentType/gvp:broaderGenericExtended aat:300025101}

  ?x gvp:prefLabelGVP/xl:literalForm ?name;    

     foaf:focus/gvp:biographyPreferred [

       schema:gender aat:300189557;

       schema:description ?bio]}

ULAN knows quite a lot of female artists: 21573.

·        We use "filter exists" for the test "artist" (aat:300025101) because a person may have several gvp:agentType that connect to that profession.

·        In contrast, the test "female" (aat:300189557) is expressed as a simple triple pattern, because a person has only one gender (at least in his/her preferred biography).

·        An agent type is always a Concept, never a Guide Term, and aat:300025101 is a Guide Term. So unlike Agents by Type, we've skipped the direct "gvp:agentType" branch and kept only the "gvp:agentType/gvp:broaderGenericExtended" path.

5.4       Female Artists as a Hobby

To make this more interesting, let's find females that have visual arts as a hobby or second occupation. We look for the desired occupation in gvp:agentTypeNonPreferred and exclude it from gvp:agentTypePreferred using "filter not exists":

select * {

  filter  exists {?x gvp:agentTypeNonPreferred/gvp:broaderGenericExtended aat:300025101}

  filter not exists {?x gvp:agentTypePreferred/gvp:broaderGenericExtended aat:300025101}

  ?x gvp:prefLabelGVP/xl:literalForm ?name;

     foaf:focus/gvp:biographyPreferred [

       schema:gender aat:300189557;

       schema:description ?bio]}

This finds 86, e.g. ulan:500293977 Victoria, Princess of Great Britain (English princess, photographer, 1868-1935).

This query is a little slower (12 sec) since it has to filter through a lot of data. I tried to speed it up using two approaches, but there was no noticeable improvement:

·        Moving the condition "female" (which is quite selective since ULAN just like VIAF or Wikidata has a "gender gap") to the top of the query

·        Connecting the path "?x foaf:focus/gvp:biographyPreferred ?b" to an intermediate variable ?b so it won't have to be navigated twice (once for the condition "female", then to fetch ?bio)

5.5       Native American Painters

Find painters (type aat:300025136 or any of its descendants) with nationality/culture Native American (aat:300017437 or any of its descendants).

select * {

  {select distinct ?x {

    ?x foaf:focus/(schema:nationality|(schema:nationality/gvp:broaderGenericExtended)) aat:300017437;

       gvp:agentType|(gvp:agentType/gvp:broaderGenericExtended) aat:300025136}}

  ?x gvp:prefLabelGVP/xl:literalForm ?name;

     foaf:focus/gvp:biographyPreferred/schema:description ?bio}

This finds 499.

·        Unlike the previous two queries, here both AAT subjects are Concepts, so they may be applied directly or as a descendant, and we need to use alternative "|" in both property paths.

·        If you find it hard to read these property paths, check out ULAN Overview and make sure you get your bearings in the graph.

·        If you don't use "distinct", it returns 1487 rows since some people have more than one agentType under "painter" and more than one nationality under "Native American". We prefer to put this "distinct" into a sub-query "{select …}"

·        If you try to use two filters (for agentType "painter" and for nationality "Native American"), you'll find the query to be too slow, because these filters don't have a selective base result-set to filter upon.

5.6       Names of Native American Painters

You may notice some interesting names in the previous query, so let's explore them by counting and sorting by popularity.

ULAN doesn't really have a concept of "family name" (indeed, this is very hard to define in a way not biased to a particular culture). But in gvp:prefLabelGVP, GVP consistently puts family names first, separated by a comma, so we use that with a replace() and regex. Single Native American names like "Kicking Bear" that don't include a comma come through unchanged:

select ?family (count(*) as ?c) {

  {select distinct ?x {

    ?x foaf:focus/(schema:nationality|(schema:nationality/gvp:broaderGenericExtended)) aat:300017437;

       gvp:agentType|(gvp:agentType/gvp:broaderGenericExtended) aat:300025136}}

  ?x gvp:prefLabelGVP/xl:literalForm ?name.

   bind (replace(str(?name),",.*","") as ?family)

} group by ?family order by desc(?c)

The most popular Native American Painter name is Martinez. But we also get some very interesting names like Bad Heart Bull, Standing Soldier, Kills Two, Little Chief, and Bear.
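The replace() step and the counting can be tried on a few strings without running the query. Here is a minimal Python sketch using the same regex; the sample labels are made up for illustration (only "Kicking Bear" comes from the text above), and family_name is a hypothetical helper.

```python
import re
from collections import Counter

def family_name(pref_label):
    """Drop everything from the first comma on, like replace(str(?name), ",.*", "")."""
    return re.sub(r",.*", "", pref_label)

# Illustrative labels in "Family, Given" form, plus one single name.
names = ["Martinez, A.", "Martinez, B.", "Kicking Bear"]

# Count by family name, most frequent first (the group by / order by desc step).
counts = Counter(family_name(n) for n in names).most_common()
```

A label without a comma has nothing for the regex to match, so it passes through unchanged.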

5.7       Architects Born in the 14th or 15th Century

Select all architects (type aat:300024987 or its descendants) with birth date between 1300 and 1499. We'll take a shortcut and search only in the preferred biography, since observation shows that if there is a birth date at all, it will be found there:

select * {

  ?x a gvp:PersonConcept;

     gvp:prefLabelGVP/xl:literalForm ?name;

     gvp:agentTypePreferred|(gvp:agentTypePreferred/gvp:broaderGenericExtended) aat:300024987;

     foaf:focus/gvp:biographyPreferred [

       schema:description ?bio;

       gvp:estStart ?birth]

     filter ("1300"^^xsd:gYear <= ?birth && ?birth <= "1499"^^xsd:gYear)}

We have to provide proper types xsd:gYear to the query literals in order for the comparisons to work.

5.8       Indian and Pakistani Architectural Groups

Find Indian and Pakistani groups that are associated with the creation of architecture and existed in 1947:

·        rdf:type gvp:GroupConcept (corporate bodies)

·        schema:nationality aat:300018863 Indian or aat:300266840 Pakistani or descendants thereof

·        gvp:agentType aat:300312082 architectural firm or aat:300024987 architects or descendants thereof

·        gvp:estStart <= 1947, and gvp:estEnd > 1947 or missing

select ?x ?name ?bio ?start ?end {

  ?x a gvp:GroupConcept.

  {select ?x {?x gvp:agentType|(gvp:agentType/gvp:broaderGenericExtended) ?TYP}

   values ?TYP {aat:300312082 aat:300024987}}.

  {select ?x {?x foaf:focus/schema:nationality ?NAT}

   values ?NAT {aat:300018863 aat:300266840}}.

  ?x gvp:prefLabelGVP/xl:literalForm ?name;

     foaf:focus/gvp:biographyPreferred ?biography.

  ?biography schema:description ?bio;

     gvp:estStart ?start.

  optional {?biography gvp:estEnd ?end}

  filter (?start <= "1947"^^xsd:gYear && (!bound(?end) || ?end > "1947"^^xsd:gYear))}

We use the following SPARQL features:

·        values ?VAR {list}: iterates ?VAR over list

·        sub-queries {select …}

·        optional{} and then testing with bound()
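The optional{} plus bound() test boils down to a small predicate. Here is a minimal Python sketch of the same logic, where None stands for an unbound ?end; the function name existed_in is hypothetical.

```python
from typing import Optional

def existed_in(year: int, start: int, end: Optional[int]) -> bool:
    """True if start <= year and the end is either missing or after year."""
    return start <= year and (end is None or end > year)
```

For example, existed_in(1947, 1900, None) holds for a group with no recorded end date, while a group that ended in 1947 itself is excluded, matching the strict "?end > 1947" in the filter.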

5.9       Non-Italians Who Worked in Italy

Find non-Italians who worked in Italy and lived during a given time range:

·        Having event that took place in tgn:1000080 Italy or any of its descendants

·        Birth date between 1250 and 1780

·        Just for variety, we look for artists as descendants of facets ulan:500000003 "Corporate bodies" or ulan:500000002 "Persons, Artists", rather than having type "artist" as we did in previous queries. In the previous query we used values{..}, but here we use filter(?facet in (..)).

·        Not having nationality aat:300111198 Italian or any of its descendants

select ?x ?name ?bio ?birth {

  {select distinct ?x

    {?x foaf:focus/bio:event/(schema:location|(schema:location/gvp:broaderExtended)) tgn:1000080-place}}

  ?x gvp:prefLabelGVP/xl:literalForm ?name;

     foaf:focus/gvp:biographyPreferred [

       schema:description ?bio;

       gvp:estStart ?birth].

  filter ("1250"^^xsd:gYear <= ?birth && ?birth <= "1780"^^xsd:gYear)

  filter exists {?x gvp:broaderExtended ?facet.

    filter (?facet in (ulan:500000003, ulan:500000002))}

  filter not exists {

    ?x foaf:focus/(schema:nationality|(schema:nationality/gvp:broaderExtended)) aat:300111198}}

With all these nested clauses, the query is quite expensive. We put the most selective clause first, as a sub-query. There are 378 results and they are accurate:

·        In many cases you can tell directly from the display biography that the result is relevant, e.g. "German sculptor, active 1393-1405 in Italy"

·        How about ulan:500046216 Antoine de Lonhy (Netherlandish illuminator, active ca. 1460-1462)? The short bio does not say he was active in Italy. But there are 3 events, and the last one says: "active: Avigliana (Torino province, Piedmont, Italy)"

The query works!

5.10   Artists Associated to a Given Patron or His Family

Find artists associated in any way to a given patron (ulan:500122256 Louis XIV, King of France) or any of his family (only direct relations are considered).

·        Agent type aat:300025101 <people in the visual arts> or descendant

·        Any relation skos:related to the root person ulan:500122256

·        Or to someone who has associative relationship with identifier "15xx" to the root

select ?x ?name ?bio {

  {select distinct ?x {

    {bind(ulan:500122256 as ?y)}

    union {?y ?rel ulan:500122256.

           ?rel rdfs:subPropertyOf skos:related; dc:identifier ?id

           filter (strstarts(?id,"15"))}

    ?x skos:related ?y;

       gvp:agentType|(gvp:agentType/gvp:broaderExtended) aat:300025101}}.

  ?x gvp:prefLabelGVP/xl:literalForm ?name;

     foaf:focus/gvp:biographyPreferred/schema:description ?bio}

Order of processing in the query:

·        We first fetch the family circle of the patron: that's 8 people ?y according to ULAN. We use a union that adds the root and these 8 people.

·        Then find distinct related ?x (the same artist may be related by many different relations or to many family members)

·        Finally fetch the data of ?x

You'll find the query to be slow if you try it in another order.

5.11   German, Dutch, Flemish printmakers, listed with their teachers

Find German, Dutch, Flemish printmakers, born 16-17th centuries, listed with their teachers

·        Nationality aat:300111192 German, aat:300020929 Netherlandish, aat:300111156 Belgian or a descendant thereof

·        agentType aat:300025164 "printmakers" or a descendant thereof. We must use agentType and not agentTypePreferred, since in old times, most often "printmaker" was a second profession of artists.

·        Relation: one of {gvp:ulan1102_student_of  gvp:ulan1105_apprentice_of gvp:ulan1108_influenced_by gvp:ulan1112_master_was}

·        Born between 1500 and 1699

·        Sort by teachers.

·        List preferred names, display biography, relationship type

This query is quite expensive (takes 20 sec to first page of results, and about a minute for all results), so please be patient.

select ?student ?s_name ?s_bio ?rel ?teacher ?t_name ?t_bio {

  {select * {

    values ?rel {gvp:ulan1102_student_of gvp:ulan1105_apprentice_of

                 gvp:ulan1108_influenced_by gvp:ulan1112_master_was}

    ?student ?rel ?teacher}}

  filter exists {values ?nationality {aat:300111192 aat:300020929 aat:300111156}

    ?student foaf:focus/(schema:nationality|(schema:nationality/gvp:broaderExtended)) ?nationality}

  filter exists {?student gvp:agentType|(gvp:agentType/gvp:broaderExtended) aat:300025164}

  filter exists {?student foaf:focus/gvp:biographyPreferred/gvp:estStart ?s_birth.

            filter("1500"^^xsd:gYear <= ?s_birth && ?s_birth <= "1699"^^xsd:gYear)}

  ?student gvp:prefLabelGVP [xl:literalForm ?s_name];

     foaf:focus [gvp:biographyPreferred [schema:description ?s_bio]].

  ?teacher gvp:prefLabelGVP [xl:literalForm ?t_name];

     foaf:focus [gvp:biographyPreferred [schema:description ?t_bio]].

} order by ?t_name

Please note that a few pairs might seem duplicates but are not, e.g.:

student | s_name | s_bio | rel | teacher | t_name | t_bio

ulan:500003802 | Heusch, Willem de | Dutch landscapist and etcher, 1625-1692 | gvp:ulan1102_student_of | ulan:500032845 | Both, Jan | Dutch painter, ca. 1618-1652

ulan:500003802 | Heusch, Willem de | Dutch landscapist and etcher, 1625-1692 | gvp:ulan1108_influenced_by | ulan:500032845 | Both, Jan | Dutch painter, ca. 1618-1652

Here Willem de Heusch was both a student of, and influenced by, Jan Both, so two rows are returned.

5.12   Artists Whose Identity May be Associated or Confused With Another

·        Persons with agent type aat:300025101 <people in the visual arts> or a descendant

·        Having relations (gvp:ulan1005_possibly_identified_with, gvp:ulan1006_formerly_identified_with, gvp:ulan1007_distinguished_from, gvp:ulan1008_meaning_-usage_overlaps_with) to another

select ?x ?x_name ?x_bio ?rel ?y ?y_name ?y_bio {

   {select ?rel

     {values ?rel {gvp:ulan1005_possibly_identified_with gvp:ulan1006_formerly_identified_with

                   gvp:ulan1007_distinguished_from gvp:ulan1008_meaning_-usage_overlaps_with}}}

   ?x ?rel ?y.

   filter exists {?x gvp:agentTypePreferred|(gvp:agentTypePreferred/gvp:broaderExtended) aat:300025101}

   ?x gvp:prefLabelGVP [xl:literalForm ?x_name];

         foaf:focus [gvp:biographyPreferred [schema:description ?x_bio]].

   ?y gvp:prefLabelGVP [xl:literalForm ?y_name];

         foaf:focus [gvp:biographyPreferred [schema:description ?y_bio]].

}

SPARQL features:

·        We wrap VALUES in a sub-query, else the overall query is very slow

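·        Because these are symmetric relations, a pair can come back twice (once in each direction). Adding the restriction "filter (str(?x) < str(?y))" returns each pair only once; the query above does not include it.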

5.13   Ordered Hierarchy of Given Subject

Let's get all descendants of ulan:500125789 National Museums in Berlin (very easy, see Descendants of a Given Parent). But we also want to display them in a properly ordered hierarchy. Now it gets trickier, since we need to determine a global order from gvp:displayOrder and the hierarchy structure. So the order of a child should use and extend the order of its parent, recursively.

SPARQL does not allow the definition of recursive queries (unlike SPIN). So we have to cheat: we assume a maximum number of descendant levels for the given root (in this case it's 3) and "unroll" the recursive chain that many times. Starting from the root, we write 3+1 layers of progressively more complex queries and UNION them.

select ?x ?o1 ?o2 ?o3 ?name ?bio {

  {bind(ulan:500125789 as ?x)} union

  {?x gvp:broaderPreferred ulan:500125789; gvp:displayOrder ?o1} union

  {?x gvp:broaderPreferred [gvp:broaderPreferred ulan:500125789; gvp:displayOrder ?o1]; gvp:displayOrder ?o2} union

  {?x gvp:broaderPreferred [gvp:broaderPreferred [gvp:broaderPreferred ulan:500125789; gvp:displayOrder ?o1]; gvp:displayOrder ?o2]; gvp:displayOrder ?o3}.

  ?x gvp:prefLabelGVP/xl:literalForm ?name;

     foaf:focus/gvp:biographyPreferred/schema:description ?bio

} order by ?o1 ?o2 ?o3

The longest chain looks like this:

/doc/img/032-ordered-hierarchy-query.png

The chains before it are shortened versions of it, removing layers from the left.

·        The global order is represented by the variables ?o1 ?o2 ?o3. Luckily, a null value (unbound) sorts before a number.

·        It's important to use gvp:broaderPreferred not merely gvp:broader: GVP vocabularies are poly-hierarchical, but here we want to depict the mono-hierarchy consisting of preferred parents.

E.g. ulan:500125801 Friedrichswerdersche Kirche collection has two parents:

·        National Museums in Berlin

·        National Gallery < National Museums in Berlin

A poly-hierarchy would display Friedrichswerdersche Kirche twice in the hierarchy (and if it had any children, these would be duplicated too)
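The ordering trick (sort by ?o1 ?o2 ?o3, with unbound values first) can be checked on a toy tree. Here is a minimal Python sketch in which None stands for an unbound variable; the rows and node names are made up for illustration and do not come from ULAN.

```python
def sort_key(row):
    """Order by (o1, o2, o3), with unbound (None) sorting before any number."""
    return tuple((0, 0) if v is None else (1, v)
                 for v in (row["o1"], row["o2"], row["o3"]))

# Illustrative rows, deliberately shuffled.
rows = [
    {"x": "grandchild", "o1": 1, "o2": 2, "o3": None},
    {"x": "child-2", "o1": 2, "o2": None, "o3": None},
    {"x": "root", "o1": None, "o2": None, "o3": None},
    {"x": "child-1", "o1": 1, "o2": None, "o3": None},
]

ordered = [r["x"] for r in sorted(rows, key=sort_key)]
```

Because a parent shares its child's leading order values and has the rest unbound, it always sorts immediately before its own descendants, which yields a depth-first listing of the hierarchy.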

5.14   Ancient Artists or Groups by Nationality

·        Find artists (type aat:300025101 <people in the visual arts> or a descendant) or groups (type aat:300157460 studios (organizations) or a descendant)

·        Who have start (birth or foundation) date <= -0001 (i.e. BC)

·        Sort by preferred nationality

select ?nationality ?x ?name ?bio {

  filter (?type in (aat:300025101, aat:300157460))

  ?x gvp:agentTypePreferred|(gvp:agentTypePreferred/gvp:broaderExtended) ?type.

  ?x foaf:focus [gvp:biographyPreferred [gvp:estStart ?start]].

  filter(?start <= "-0001"^^xsd:gYear).

  ?x foaf:focus [gvp:nationalityPreferred [gvp:prefLabelGVP [xl:literalForm ?nationality]]];

     gvp:prefLabelGVP [xl:literalForm ?name];

     foaf:focus [gvp:biographyPreferred [schema:description ?bio]]

} order by ?nationality

Note that you have to specify the appropriate format and data type for the Year literal, so the comparison can work. There are over 3k results, and the query takes a couple of minutes. This might return some modern artists, for which GVP has no info about life span. As explained in Estimated Dates, gvp:estStart and gvp:estEnd are set as outer bounds of the life dates. GVP prefers to set them too wide rather than too narrow, so there are some biographies with date range 0001…2090.

5.15   Art Repositories in the USA by State

Find all art repositories in the United States, sorted by state

·        Agent type in (300312243, 300312242, 300312241, 300312281, 300264595) and their descendants

·        Has event with type "location (activity or state)"

·        Whose location is a descendant of the USA (tgn:7012149 United States)

·        Display subject, preferred name, display bio, location, state

·        Sort by state, location

select ?x ?name ?bio ?loc_name ?st_name {

  filter (?type in (aat:300312243, aat:300312242, aat:300312241, aat:300312281, aat:300264595))

  ?x gvp:agentTypePreferred|(gvp:agentTypePreferred/gvp:broaderExtended) ?type.

  ?x gvp:prefLabelGVP/xl:literalForm ?name;

     foaf:focus [

       bio:event [dct:type [rdfs:label "location (activity or state)"@en]; schema:location ?place];

       gvp:biographyPreferred [schema:description ?bio]].

  ?location foaf:focus ?place.

  ?state gvp:broaderPartitive tgn:7012149;

    gvp:placeTypePreferred [rdfs:label "states (political divisions)"@en];

    gvp:prefLabelGVP [xl:literalForm ?st_name].

  {filter (?location=?state)} union {?location gvp:broaderPartitiveExtended ?state}.

  ?location gvp:prefLabelGVP [xl:literalForm ?loc_name]

} order by ?st_name ?loc_name

That's probably the most complex sample query. You need a good grasp of both the overall semantic representation and ULAN specifics to understand it. We use the following SPARQL features:

·        Property paths, including alternative "|" and chaining "/"

·        Blank nodes "[...]"

·        Access concepts ("location (activity or state)" and "states (political divisions)") by exact label. Don't forget to include the language tag!

·        UNION between a FILTER and a triple pattern (?location can either be the whole state, or some place under the state)

5.16   Popes and Their Reigns

select ?x ?name ?bio ?start ?end {

  ?x gvp:agentTypePreferred [rdfs:label "popes"@en];

     gvp:prefLabelGVP [xl:literalForm ?name];

     foaf:focus [

       bio:event [dct:type [rdfs:label "reign"@en]; gvp:estStart ?start; gvp:estEnd ?end];

       gvp:biographyPreferred [schema:description ?bio]]

} order by ?start ?end

·        We need to order first by ?start then by ?end because some popes ruled for less than a year, so the next pope has the same ?start

·        If you examine the sequence of reigns, you may find some gaps. ULAN may not have records for all popes, and there is one (ulan:500324155 Pius VI) for which the reign is not recorded.

5.17   Pope Reign Durations

Let's chart the durations of Popes' reigns.

select ?dur (count(*) as ?c) {

  ?x gvp:agentTypePreferred [rdfs:label "popes"@en];

     foaf:focus [bio:event [dct:type [rdfs:label "reign"@en]; gvp:estStart ?start; gvp:estEnd ?end]].

  bind(xsd:integer(str(?end))-xsd:integer(str(?start)) as ?dur)

} group by ?dur order by ?dur

There are 10 popes that reigned less than a year (how sad!), and another 10 that reigned only a year. (A further 10 popes reigned for 8 years.) This query produces the following chart using yasgui.org:

/doc/img/032-pope-reign-duration.png

It would be interesting to also chart duration vs age when the pope started his reign.

The average reign is 8.18 years:

select (avg(?dur) as ?avg) {

  ?x gvp:agentTypePreferred [rdfs:label "popes"@en];

     foaf:focus [bio:event [dct:type [rdfs:label "reign"@en]; gvp:estStart ?start; gvp:estEnd ?end]].

  bind(xsd:integer(str(?end))-xsd:integer(str(?start)) as ?dur)

}
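The duration arithmetic in these two queries is easy to reproduce on a handful of values. Here is a minimal Python sketch; the (start, end) pairs are invented for illustration and are not ULAN data, so the numbers it produces say nothing about real popes.

```python
from collections import Counter
from statistics import mean

# Invented (start, end) year strings, shaped like gYear lexical forms.
reigns = [("1000", "1000"), ("1000", "1010"), ("1010", "1018"),
          ("1018", "1019"), ("1019", "1027")]

# Same as xsd:integer(str(?end)) - xsd:integer(str(?start)).
durations = [int(end) - int(start) for start, end in reigns]

# group by ?dur order by ?dur, with count(*) per duration.
histogram = sorted(Counter(durations).items())

# avg(?dur)
average = mean(durations)
```

Since only whole years are subtracted, a reign that starts and ends in the same year counts as 0, which is how the "less than a year" bucket arises.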

5.18   Life Events

Let’s get all life events of Thomas Moran (American painter, 1837-1926):

select * {

  bind (ulan:500015257 as ?x).

  ?x foaf:focus [bio:event ?event].

  ?event dct:type [gvp:prefLabelGVP [xl:literalForm ?type]].

  bind(exists {?x foaf:focus [gvp:eventPreferred ?event]} as ?pref)

  optional {?event rdfs:comment ?comment}

  optional {?event gvp:displayOrder ?order}

  optional {?event gvp:estStart ?start}

  optional {?event gvp:estEnd ?end}

  optional {?event schema:location [gvp:prefLabelGVP [xl:literalForm ?place]]}

} order by ?order desc(?pref)

Please note that events having the same type (e.g. "active" also known as "floruit") often express different opinions about the event details (dates), not several actual instances. In such cases you should display only the pref (Preferred) event. But ULAN offers no information about which event types imply single-instance events.

5.19   Artists with Name, Bio, Nationality, Type

This is a basic query inspired by a question at the Google support forum. It uses a number of blank nodes […] to avoid unnecessary variables.

select ?x ?name ?bio ?nationality ?type {
  ?x gvp:broaderExtended ulan:500000002. # Persons, Artists
  optional {?x gvp:agentTypePreferred [gvp:prefLabelGVP [xl:literalForm ?type]]}
  optional {?x foaf:focus [gvp:nationalityPreferred [gvp:prefLabelGVP [xl:literalForm ?nationality]]]}
  optional {?x gvp:prefLabelGVP [xl:literalForm ?name]}
  optional {?x foaf:focus [gvp:biographyPreferred [schema:description ?bio]]}
}

6        Language Queries

6.1       Scientific Names by Language

"Scientific name" is the zoology/botany taxon name (genus, species, etc) and comes from Latin. However, all such terms are also emitted in English. The distribution of scientific names across languages can be obtained with this query:

select ?lang (count(*) as ?c) {

  ?t gvp:termKind <http://vocab.getty.edu/term/kind/ScientificOrTechnical>; dct:language ?l.

  ?l gvp:prefLabelGVP/xl:literalForm ?lang

} group by ?lang

There are a thousand in English and Latin, and a few odd names in Spanish, Italian, French.

6.2       Scientific Names not in English and Latin

Then I wondered which are the odd scientific names (neither English nor Latin). Find out with this query:

select ?c ?lab {

  ?c xl:prefLabel|xl:altLabel ?t.

  ?t gvp:termKind <http://vocab.getty.edu/term/kind/ScientificOrTechnical>;

     xl:literalForm ?lab.

  filter (lang(?lab) not in ("en", "la"))}

6.3       Find Terms by Language Tag

Say you want to find all Berber terms transliterated to Latin (no? I do this every day!):

select * {?subj xl:prefLabel|xl:altLabel [xl:literalForm ?term; dct:language gvp_lang:ber-Latn]}

SPARQL features:

·        Property path alternative "|"

·        Blank node since we don't care about the identity of the xl:Label but only about its literal form.

You could try this other variant, which is more comprehensive since it finds terms not only in Berber (e.g. "Zemmour"@ber-latn), but also in Berber Dialects (e.g. "Ait Youssi"@ber-Latn-x-dialect):

select * {

  ?subj xl:prefLabel|xl:altLabel [xl:literalForm ?term]

  filter(langMatches(lang(?term),"ber-Latn"))}

·        langMatches() does proper language comparison

·        You could try to use substr(lang()) or strstarts(lang()), but the comparison must be case-insensitive (see Language Tag Case).

Whichever function you use, this second variant is much slower, since it has to check the language tag of every label literal. Use the dct:language property of labels to your advantage!
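If you know the tags you are after, a possible middle ground is to list several language URLs with VALUES and keep matching on dct:language. This is only a sketch: gvp_lang:ber-Latn-x-dialect is assumed to exist, based on the tag in the example above and on the convention that language URLs use the tag as local name, and any further dialect tags would have to be listed by hand.

```sparql
PREFIX xl:       <http://www.w3.org/2008/05/skos-xl#>
PREFIX dct:      <http://purl.org/dc/terms/>
PREFIX gvp_lang: <http://vocab.getty.edu/language/>

# Sketch: match labels by language URL, for an explicit list of languages
select * {
  values ?lang {gvp_lang:ber-Latn gvp_lang:ber-Latn-x-dialect}
  ?subj xl:prefLabel|xl:altLabel [xl:literalForm ?term; dct:language ?lang]
}
```

Unlike langMatches(), this does not discover tags you did not list, so it trades completeness for speed.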

6.4       Languages and ISO Codes

We use the Guide Term 300389738 <languages and writing systems by specific example> and the two specific sources described at Language Tags and Sources to return all languages together with their ISO2 and ISO3 language codes (where assigned):

select ?lang ?name ?iso2 ?iso3 {

  ?lang gvp:broader aat:300389738; gvp:prefLabelGVP/xl:literalForm ?name.

  optional {?lang xl:altLabel [gvp:term ?iso2; dct:source aat_source:2000075479]}

  optional {?lang xl:altLabel [gvp:term ?iso3; dct:source aat_source:2000075493]}}

Some observations on the ISO codes:

·        We use gvp:term to fetch the pure term, since xl:literalForm may include a parenthesized qualifier (see Term). E.g.:

·        aat_term:1000577208-en "san (language)" code of Sanskrit vs aat_term:1000016577-en "San (Khoisan-speaking peoples styles)"

·        aat_term:1000574141-en "adz (Adzera)" language code vs aat_term:1000294838-en "adz (cutting tool)"

·        Only the "more important" languages have alpha-2 codes (e.g. Abkhaz has one, but Abaza doesn't)

·        Out of 1883 languages, 1330 (70%) have an alpha-3 code. The rest are more exotic languages, variants (e.g. transliterated; liturgical; medieval), or modifications (e.g. GVP uses Frisian. ISO has defined codes for its predecessor Old Frisian and its dialects West, Saterland and North Frisian, but not for Frisian itself).
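The alpha-3 coverage quoted above can be recomputed with a sketch like the following, which counts all languages under the Guide Term and those that have at least one label from the alpha-3 source. The variable ?hasIso3 is introduced for illustration, the prefix URLs are assumed to match the ones predefined at the endpoint, and the numbers returned may differ from the ones above as the data evolves.

```sparql
PREFIX gvp:        <http://vocab.getty.edu/ontology#>
PREFIX aat:        <http://vocab.getty.edu/aat/>
PREFIX aat_source: <http://vocab.getty.edu/aat/source/>
PREFIX xl:         <http://www.w3.org/2008/05/skos-xl#>
PREFIX dct:        <http://purl.org/dc/terms/>

# Sketch: total languages vs. languages that carry an alpha-3 code
select (count(*) as ?total) (sum(if(?hasIso3, 1, 0)) as ?withIso3) {
  ?lang gvp:broader aat:300389738.
  # true if the language has any alternate label from the alpha-3 source
  bind(exists {?lang xl:altLabel [dct:source aat_source:2000075493]} as ?hasIso3)
}
```

Using exists rather than optional keeps each language to one row even if it has several labels from that source.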

6.5       Language URLs

Find all "logical" language URLs (Language Dual URLs):

select * {?x owl:sameAs ?y FILTER(str(?x) > str(?y))} order by str(?x)

You must check "Expand results over equivalent URIs" in the SPARQL UI. Or use this direct query link.

6.6       Custom Language Tags

GVP has assigned some custom language tags, see GVP Language Tags. You can find them with this query:

select ?lang ?name {

  ?l gvp:broader aat:300389738; gvp:prefLabelGVP [xl:literalForm ?name].

  bind(strafter(str(?l),"http://vocab.getty.edu/language/") as ?lang)

  filter regex(?lang,"qqq|x-")

} order by ?lang

You must check "Expand results over equivalent URIs" in the SPARQL UI.

The query uses the fact that gvp_lang: URLs use the lang tag as local name, and that custom tags either start with "qqq" (e.g. qqq-ET for Ethiopian), or include "x-" (e.g. grc-Latn-x-liturgic for Liturgical Greek).
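The pattern "qqq|x-" matches those substrings anywhere in the tag. A slightly stricter sketch anchors "qqq" at the start of the tag and "x-" at the start of a subtag, and ignores case. This is a suggested tightening rather than something the data is known to require. The same "Expand results over equivalent URIs" option applies, and the prefix URLs are assumed to match the ones predefined at the endpoint.

```sparql
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX aat: <http://vocab.getty.edu/aat/>
PREFIX xl:  <http://www.w3.org/2008/05/skos-xl#>

# Sketch: same query with an anchored, case-insensitive pattern
select ?lang ?name {
  ?l gvp:broader aat:300389738; gvp:prefLabelGVP [xl:literalForm ?name].
  bind(strafter(str(?l),"http://vocab.getty.edu/language/") as ?lang)
  # "qqq" only at the start of the tag; "x-" only at the start of a subtag
  filter regex(?lang, "^qqq|(^|-)x-", "i")
} order by ?lang
```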

6.7       Count Terms by Language

Here is how to find the distribution of AAT terms by language.

select (count(*) as ?c) ?lang {

  [skos:inScheme aat:; xl:prefLabel|xl:altLabel [dct:language [gvp:prefLabelGVP/xl:literalForm ?lang]]]

} group by ?lang order by desc(?c)

This terse query can be understood better if we introduce some more variables, instead of using blank nodes.

select (count(*) as ?c) ?lang {

  ?concept skos:inScheme aat:; xl:prefLabel|xl:altLabel ?lab.

  ?lab dct:language ?lng.

  ?lng gvp:prefLabelGVP/xl:literalForm ?lang

} group by ?lang order by desc(?c)

As of Nov 2016, AAT has 109 languages. The ones with over 1k occurrences are en, zh, nl, es, de, fr, la, it:

c | lang
142266 | English (language)@en
60460 | Dutch (language)@en
54961 | Spanish (language)@en
26226 | Chinese (traditional) (language)@en
20460 | German (language)@en
15298 | Chinese (transliterated Hanyu Pinyin) (language)@en
15266 | Chinese (transliterated Wade-Giles) (language)@en
15218 | Chinese (transliterated Pinyin without tones) (language)@en
6500 | French (language)@en
5480 | American English (language)@en
3017 | British English (language)@en
1981 | Latin (language)@en
1750 | Italian (language)@en
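The total number of languages used in AAT can be obtained with a small variation of the same pattern, counting distinct language nodes instead of grouping by them. This is a sketch: the prefix URLs are assumed to match the ones predefined at the endpoint, and the result may differ from the figure quoted above as the data evolves or if equivalent language URLs are expanded.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xl:   <http://www.w3.org/2008/05/skos-xl#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX aat:  <http://vocab.getty.edu/aat/>

# Sketch: how many different languages occur on AAT labels
select (count(distinct ?lng) as ?languages) {
  ?concept skos:inScheme aat:; xl:prefLabel|xl:altLabel ?lab.
  ?lab dct:language ?lng
}
```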

7        Counting and Descriptive Info

Counts can give you a better feel of the available semantic information.

7.1       Descriptive Info from VOID

As an alternative to getting the VOID descriptive info as a Turtle file (see VOID Deployment), you can query it in the repository. The following query returns all descriptive triples (same as the Turtle file):

select * {graph <http://vocab.getty.edu/.well-known/void> {?s ?p ?o}}

7.2       Number of Entities from VOID

Get the class partitions (number of entities per class), ordering by decreasing count:

select ?dataset ?class ?count

{?dataset void:classPartition [void:class ?class; void:entities ?count]}

order by desc(?count)

This uses the pre-calculated counts in the VOID descriptive info, so it's very fast.

7.3       Number of Sources

Sources in GVP are represented using the class bibo:Document. We can get their number using a specialization of the class partition query from the previous section:

select * {?dataset void:classPartition [void:class bibo:Document; void:entities ?count]}

The result is:

dataset | count
http://vocab.getty.edu/dataset/aat | 42867
http://vocab.getty.edu/dataset/tgn | 3281
http://vocab.getty.edu/dataset/ulan | 61291
http://vocab.getty.edu/dataset | 419588

How come the total for AAT, TGN and ULAN does not equal the number for the aggregated dataset (comprising the 3 vocabularies)? The reason is this:

·        Local Sources are represented explicitly as bibo:DocumentPart, but are also inferred to have class bibo:Document because that's the domain of bibo:locator

·        Ontotext GraphDB places all inferred statements in the default graph, which corresponds to http://vocab.getty.edu/dataset

So the latter number includes an excess of 312k local sources, i.e. bibo:DocumentPart
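The size of that excess follows directly from the four counts in the result above. As a quick check, the arithmetic can even be done in SPARQL itself, since the query below uses only constants and touches no data:

```sparql
# Worked arithmetic on the counts shown above (no data is queried)
select ?sum ?excess {
  bind(42867 + 3281 + 61291 as ?sum)   # AAT + TGN + ULAN = 107439
  bind(419588 - ?sum as ?excess)       # aggregated dataset minus the sum = 312149
}
```

The difference of 312149 is the "312k" mentioned above.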

7.4       Associative Relations Count

Let's count AAT associative relations by type. All associative relations are sub-properties of skos:related, but rdfs:subPropertyOf is reflexive (i.e. skos:related is also considered a sub-property of itself). So we use a special property sesame:directSubPropertyOf to select only the proper sub-properties.

select ?p (count(*) as ?c) {

  ?x skos:inScheme aat:. ?y skos:inScheme aat:.

  ?x ?p ?y. ?p sesame:directSubPropertyOf skos:related

} group by ?p

If you dislike using the Sesame property, you can achieve the same (but a bit slower) with this query:

select ?p (count(*) as ?c) {

  ?x skos:inScheme aat:. ?y skos:inScheme aat:.

  ?x ?p ?y. ?p rdfs:subPropertyOf skos:related.

  filter (?p != skos:related)

} group by ?p

You may notice that most relations come in pairs of mutually inverse relations with consecutive numbers (e.g. gvp:aat2205_causes-is_required and gvp:aat2206_caused_by-requires), and the two members of a pair have exactly the same count. The relations that don't have a separate inverse are symmetric (i.e. self-inverse), e.g. gvp:aat2110_meaning-usage_overlaps_with.
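One way to check empirically which relations behave as self-inverse is to count, per relation, how many statements also hold in the opposite direction. This is a sketch that relies only on the data, not on any ontology declaration. The names ?mirrored and ?m are introduced for illustration, the prefix URLs are assumed to match the ones predefined at the endpoint, and the query is likely slower than the ones above.

```sparql
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX aat:    <http://vocab.getty.edu/aat/>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>

# Sketch: ?c = statements per relation, ?m = those that also hold reversed
select ?p (count(*) as ?c) (sum(if(?mirrored, 1, 0)) as ?m) {
  ?x skos:inScheme aat:. ?y skos:inScheme aat:.
  ?x ?p ?y. ?p sesame:directSubPropertyOf skos:related.
  bind(exists {?y ?p ?x} as ?mirrored)
} group by ?p
```

A relation for which ?m equals ?c holds in both directions for every pair of subjects, which is what you would expect of a self-inverse relation.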

7.5       Number of AAT Revision Actions

This counts Revision History actions (Create, Modify, Publish) by kind of ?action. We limit the query to aat: because the number of actions including tgn: or ulan: is too large, and you'd need to wait 5-10 minutes.

select ?action (count(*) as ?c) {

  ?x skos:inScheme aat:; skos:changeNote ?y. ?y dc:type ?action

} group by ?action

7.6       TGN Top Place Types

Let's find the top 100 place types (looking only at preferred types). There's exactly one preferred type per place, and at most one prefLabel per language for each type. Let's get the prefLabelGVP (usually in EN) and prefLabel in NL:

select ?type ?en ?nl ?c {

  {select ?type (count(*) as ?c) {

     ?s gvp:placeTypePreferred ?type .

  } group by ?type order by desc(?c) limit 100}

  ?type gvp:prefLabelGVP [xl:literalForm ?en].

  optional {?type xl:prefLabel [xl:literalForm ?nl; dct:language gvp_lang:nl]}}

The most popular place types in TGN are: inhabited places, creeks, streams, lakes.

·        We first get the COUNT LIMIT 100 in a sub-query

·        Then fetch the labels of the returned types in the main query

7.7       ULAN Facet Counts

Counting the number of agents per facet is easy:

select ?x ?name (count(*) as ?c) {

  ?x a gvp:Facet; skos:inScheme ulan:; gvp:prefLabelGVP/xl:literalForm ?name.

  ?x skos:member ?y

} group by ?x ?name

7.8       ULAN Agents by Type

Let's count ULAN agents by type. Types are represented by AAT concepts, form a BTG hierarchy, and apply to both persons and groups.

select ?typ ?type (count(*) as ?c) {

  ?x gvp:agentType ?typ.

  ?typ gvp:prefLabelGVP/xl:literalForm ?type

} group by ?typ ?type order by desc(?c)

(To sort alphabetically, use order by ?type instead.)

The top ULAN agent types are:

typ | type | c
aat:300025103 | artists (visual artists) | 124821
aat:300025136 | painters (artists) | 64867
aat:300024987 | architects | 50640
aat:300025181 | sculptors | 16232
aat:300312082 | architectural firms | 15384
aat:300025164 | printmakers | 12519
aat:300112172 | draftsmen (artists) | 9880
aat:300264595 | repositories (institutions) | 9593

An agent may have several agentTypes, so the above returns more than the total number of agents. If you want to count only by preferred type, use gvp:agentTypePreferred: there's only one per agent.
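A sketch of that preferred-type count, obtained by swapping the property in the query above (the prefix URLs are assumed to match the ones predefined at the endpoint):

```sparql
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX xl:  <http://www.w3.org/2008/05/skos-xl#>

# Sketch: count agents by preferred type only, so each agent is counted once
select ?typ ?type (count(*) as ?c) {
  ?x gvp:agentTypePreferred ?typ.
  ?typ gvp:prefLabelGVP/xl:literalForm ?type
} group by ?typ ?type order by desc(?c)
```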

7.9       ULAN Agents by Nationality

Let's count ULAN agents by nationality. Nationalities are represented by AAT concepts and apply to both persons and groups.

select ?nat ?nationality (count(*) as ?c) {

  ?x schema:nationality ?nat.

  ?nat gvp:prefLabelGVP/xl:literalForm ?nationality

} group by ?nat ?nationality order by desc(?c)

The top nationalities in ULAN are:

nat | nationality | c
aat:300107956 | American (North American) | 46461
aat:300111159 | British (modern) | 33889
aat:300111188 | French (culture or style) | 23286
aat:300111192 | German (culture or style) | 23036
aat:300111198 | Italian (culture or style) | 22678
aat:300111215 | Spanish (culture or style) | 10278
aat:300111175 | Dutch (culture or style) | 9294
aat:300111178 | English (culture or style) | 7618

Similar to the previous section, this may return more than the total number of agents, since an agent may have several nationalities. If you want to count only by preferred nationality, use gvp:nationalityPreferred instead of schema:nationality.

Please note that ULAN Nationalities include any significant social grouping or designation of the agent: not only nationality in the strict sense, but also culture, race, ethnicity, religion, even sexual orientation.
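For the count by preferred nationality, a sketch that only replaces schema:nationality with gvp:nationalityPreferred in the query above (the prefix URLs are assumed to match the ones predefined at the endpoint):

```sparql
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX xl:  <http://www.w3.org/2008/05/skos-xl#>

# Sketch: count agents by preferred nationality only
select ?nat ?nationality (count(*) as ?c) {
  ?x gvp:nationalityPreferred ?nat.
  ?nat gvp:prefLabelGVP/xl:literalForm ?nationality
} group by ?nat ?nationality order by desc(?c)
```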

7.10   ULAN Events by Type

Let's count ULAN life events by type:

select ?t ?type (count(*) as ?c) {

  ?x a bio:Event; dct:type ?t.

  ?t gvp:prefLabelGVP/xl:literalForm ?type

} group by ?t ?type order by desc(?c)

The top event types are as follows (remember that birth/death are not represented as events, but are reflected in Biographies):

t | type | c | Note
aat:300393177 | active (professional function) | 49430 | the state of being professionally active
aat:300393211 | location (activity or state) | 23378 | location (place of existence) of an agent
aat:300054766 | exhibitions (events) | 1863 | when works by the artist were exhibited
aat:300393206 | reigns | 476 | when a ruler reigned
aat:300054638 | documentation (activity) | 282 | when the artist or his/her works were documented
aat:300026842 | awards | 179 |
aat:300055407 | immigration | 140 |
aat:300393178 | flourished | 123 | time when the person has achieved full development or success. For Non-artists, this is preferred instead of "active"
aat:300054360 | education | 101 |

7.11   Breakdown of Historic Relations

Breakdown of historic relations per type and per vocabulary:

select ?voc ?pred ?hist (count(*) as ?c) {

  [] gvp:historicFlag ?hist; rdf:subject [skos:inScheme ?voc]; rdf:predicate ?pred

} group by ?voc ?pred ?hist

Historic information on relations is represented as an rdf:Statement. We don't care about the identity of the statement so we use a blank node [].

·        We fetch the vocabulary of the source of the relation using rdf:subject, another blank node […], and skos:inScheme

·        We fetch the relation using rdf:predicate

The relations include:

·        hierarchical (broader*)

·        types (TGN placeType* and ULAN agentType*)

·        associative

See Historic Information on Relations for getting individual info.

7.12   Breakdown of Historic Terms

Breakdown of historic terms per vocabulary:

select ?voc ?hist (count(*) as ?c) {

  ?x skos:inScheme ?voc; (xl:prefLabel|xl:altLabel) ?l. ?l gvp:historicFlag ?hist

} group by ?voc ?hist

See Historic Information of Terms for getting individual info.

7.13   GraphDB SysInfo

The following query returns useful info about the GraphDB instance, including number of triples (total and explicit), storage space (used and free), commits, repository signature, build number of the software.

DESCRIBE <http://www.ontotext.com/SYSINFO> FROM <http://www.ontotext.com/SYSINFO>

8        Explore the Ontology

For easiest understanding, see the GVP Ontology documentation, or the ontology reference (namespace document). But you can also explore it with queries.

8.1       Ontology Classes and Properties

This returns the classes and properties defined by the ontology, together with some details:

select ?x ?type (coalesce(?descr,?label) as ?description) ?domain ?range {

  ?x rdfs:isDefinedBy <http://vocab.getty.edu/ontology>; a ?type.

  optional {?x dct:description ?descr}

  optional {?x rdfs:label ?label}

  optional {?x rdfs:domain ?domain}

  optional {?x rdfs:range ?range}}

·        Uncheck "Include inferred" so you don't get duplicate rows due to super-types.

·        coalesce() returns the more detailed dct:description if available, else rdfs:label

8.2       Ontology Values

This returns all values (skos:Concepts, mostly Term Characteristics) in small schemes defined by the ontology:

select * {

  ?x skos:inScheme [rdfs:isDefinedBy <http://vocab.getty.edu/ontology>; rdfs:label ?scheme];

     skos:prefLabel ?value; skos:scopeNote ?note; skos:example ?example}

Because the scheme URL is always a prefix of the value URL, we print the scheme's label instead. Some example results (the prefix http://vocab.getty.edu/ is omitted):

x | scheme | value | note | example
term/flag/Vernacular | Term Flag | Vernacular | Term is in the "vernacular" language | "Firenze" is the vernacular in Italian (TGN)
term/kind/Abbreviation | Term Kind | Abbreviation | Term is an abbreviation, initialism, or acronym | DVD, CD-ROM (AAT)
term/kind/CommonTerm | Term Kind | Common term | Preferred common language term. Used for subjects that also include a Scientific term | domestic cat (AAT)
term/kind/ScientificOrTechnical | Term Kind | Scientific or Technical term | A Scientific term | "Felis domesticus" is the scientific term for "cats" (AAT)

You can click on the value URLs (first column) and then the Object tab to explore terms having that characteristic. For example, there are 628 AAT terms that have Term Flag "Vernacular":

[Screenshot: AAT terms having Term Flag "Vernacular" (/doc/img/032-terms-Vernacular.png)]