Query for best match to a string with SPARQL?

I have a list of movie titles and want to look them up in DBpedia for meta information such as "director". However, I have trouble identifying the correct movie with SPARQL, because the titles sometimes don't match exactly.
How can I get the best match for a movie title from DBpedia using SPARQL?
Some problematic examples:
My List: "Die Hard: with a Vengeance" vs. DBpedia: "Die Hard with a Vengeance"
My List: "Hachi" vs. DBpedia: "Hachi: A Dog's Tale"
My current approach is to query the DBpedia endpoint for all movies, filter by checking for the single tokens of the title (without punctuation), order by title and return the first result. E.g.:
SELECT ?resource ?title ?director WHERE {
?resource foaf:name ?title .
?resource rdf:type schema:Movie .
?resource dbo:director ?director .
FILTER (
contains(lcase(str(?title)), "die") &&
contains(lcase(str(?title)),"hard")
)
}
ORDER BY (?title)
LIMIT 1
This approach is very slow and also sometimes fails, e.g.:
SELECT ?resource ?title ?director WHERE {
?resource foaf:name ?title .
?resource rdf:type schema:Movie .
?resource dbo:director ?director .
FILTER (
contains(lcase(str(?title)), "hachi")
)
}
ORDER BY (?title)
LIMIT 10
where the correct result is in second place:
resource title director
http://dbpedia.org/resource/Chachi_420 "Chachi 420"@en http://dbpedia.org/resource/Kamal_Haasan
http://dbpedia.org/resource/Hachi:_A_Dog's_Tale "Hachi: A Dog's Tale"@en http://dbpedia.org/resource/Lasse_Hallström
http://dbpedia.org/resource/Hachiko_Monogatari "Hachikō Monogatari"@en http://dbpedia.org/resource/Seijirō_Kōyama
http://dbpedia.org/resource/Thachiledathu_Chundan "Thachiledathu Chundan"@en http://dbpedia.org/resource/Shajoon_Kariyal
Any ideas how to solve this problem? Or even better: How to query for best matches to a string with SPARQL in general?
Thanks!

I adapted the regex approach mentioned in the comments and came up with a solution that works pretty well, better than anything I could get with bif:contains:
SELECT ?resource ?title ?match
       (strlen(str(?title)) AS ?lenTitle)
       (strlen(str(?match)) AS ?lenMatch)
WHERE {
  ?resource foaf:name ?title .
  ?resource rdf:type schema:Movie .
  ?resource dbo:director ?director .
  # ?match collects the search tokens found in the title, in order; a longer ?match means more tokens matched.
  BIND( replace(LCASE(CONCAT('x', ?title)), "^x(die)*(?:.*?(hard))*(?:.*?(with))*.*$", "$1$2$3") AS ?match )
}
ORDER BY DESC(?lenMatch) ASC(?lenTitle)
LIMIT 5
It's not perfect, so I'm still open for suggestions.
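A possible refinement of the simpler contains() approach, as a rough sketch rather than a definitive solution: rank the candidates instead of ordering them alphabetically. The ranking scheme and the ?t and ?rank variables below are my own assumptions, not something DBpedia provides. An exact title match gets rank 0, a title containing the search string as a whole word gets rank 1, and any other substring match gets rank 2; shorter titles win within a rank. Of the four titles listed in the question, only "Hachi: A Dog's Tale" contains "hachi" as a whole word, so it would rank ahead of the other three.

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX dbo:    <http://dbpedia.org/ontology/>

SELECT ?resource ?title ?director ?rank
WHERE {
  ?resource rdf:type schema:Movie ;
            foaf:name ?title ;
            dbo:director ?director .
  # Lower-cased plain string version of the title (assumed helper variable).
  BIND (lcase(str(?title)) AS ?t)
  FILTER contains(?t, "hachi")
  # 0 = exact match, 1 = whole-word match, 2 = any other substring match.
  BIND (IF(?t = "hachi", 0,
        IF(regex(?t, "(^|[^a-z0-9])hachi([^a-z0-9]|$)"), 1, 2)) AS ?rank)
}
ORDER BY ?rank strlen(?t)
LIMIT 5
```

This still has to scan every movie title, so it does not address the speed problem; it only changes which candidate comes first.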

Related

How to escape brackets in SPARQL string?

I'm trying to make a sparql query to: http://sparql.lynx-project.eu/
The graph: http://sparql.lynx-project.eu/graph/eurovoc
It contains some entries with brackets in the prefLabel, e.g. "sanction (EU)".
I'm trying to retrieve an exact match for such entries with:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?c ?label
WHERE {
GRAPH <http://sparql.lynx-project.eu/graph/eurovoc> {
?c a skos:Concept .
?c ?p ?label.
FILTER regex(?label, "^sanction (EU)$", "i" )
FILTER (lang(?label) = "en")
FILTER (?p IN (skos:prefLabel, skos:altLabel ) )
}
}
It doesn't return anything. I also tried to escape the brackets with a backslash, but then the query breaks. Do you know how to escape brackets in a SPARQL string? Thanks in advance!
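A minimal sketch of one way this is usually handled, assuming the endpoint follows standard SPARQL 1.1 string syntax: in a regex, ( and ) are grouping characters, so they need a backslash to match literally, and inside a SPARQL string literal the backslash itself must be doubled. A single \( is not a valid string escape in SPARQL, which would explain why the query breaks.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?c ?label
WHERE {
  GRAPH <http://sparql.lynx-project.eu/graph/eurovoc> {
    ?c a skos:Concept ;
       ?p ?label .
    FILTER (?p IN (skos:prefLabel, skos:altLabel))
    FILTER (lang(?label) = "en")
    # "\\(" in the SPARQL string becomes the regex "\(", i.e. a literal bracket.
    FILTER regex(str(?label), "^sanction \\(EU\\)$", "i")
  }
}
```

Since this is an exact match anyway, comparing strings avoids regex escaping altogether: FILTER (lcase(str(?label)) = "sanction (eu)").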

SPARQL query not equivalent

How do I write a "not equal" query in Protégé?
I am trying to get spouses from my ontology, but I get duplicates where, for example, the mother is the spouse of the mother. I am trying to write a query that checks whether the two are the same and, if so, filters that result out.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <http://example.com/owl/families/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX aa: <http://example.com/owl/families#Maija>
SELECT distinct ?mother ?father
#CONSTRUCT {?mother :hasSpouse ?father .}
WHERE {
?mother :hasChild ?c .
?father :hasChild ?c .
?mother :notEqualto ?father .
}
Just after I posted this question I got the answer:
SELECT distinct ?mother ?father
WHERE {
?mother :hasChild ?c .
?father :hasChild ?c .
FILTER (?mother != ?father)
}
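Note that != still returns each couple twice, once as (A, B) and once as (B, A). If only one row per couple is wanted, a strict ordering on the IRIs is one option. This is only a sketch: the variable names ?a and ?b are mine, and it assumes the : prefix from the question.

```sparql
PREFIX : <http://example.com/owl/families/>

SELECT DISTINCT ?a ?b
WHERE {
  ?a :hasChild ?c .
  ?b :hasChild ?c .
  # A strict "<" removes self-pairs and keeps only one of (a, b) / (b, a).
  FILTER (str(?a) < str(?b))
}
```

For the commented-out CONSTRUCT, where both directions of :hasSpouse are probably wanted, the != version above is the better fit.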

SPARQL DBpedia - Retrieve category information in any language by using labels

I have a problem, which I will explain with the following example:
I want to retrieve all information on a category in any language. I have to identify the category by its label and a language tag (e.g. en), because those are the inputs to my program.
The query looks like this, but when I change the language I don't receive any information on the category. I know the problem lies in dcterms:subject, because ?category returns http://dbpedia.org/resource/Category:Countries_in_Europe (see first example below).
For example, to search for a category label in German you have to use http://de.dbpedia.org/resource/Kategorie:Staat_in_Europa (see second example below).
prefix dcterms: <http://purl.org/dc/terms/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?objectLabel WHERE {
?subject dcterms:subject ?category ; rdfs:label ?objectLabel .
?category rdfs:label "Countries in Europe"@en .
FILTER (LANG(?objectLabel)='en')
}
The equivalent query in a different language, which doesn't work:
prefix dcterms: <http://purl.org/dc/terms/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?objectLabel WHERE {
?subject dcterms:subject ?category ; rdfs:label ?objectLabel .
?category rdfs:label "Staat in Europa"@de .
FILTER (LANG(?objectLabel)='de')
}
Is there a similar or different way / method to solve the problem? Thanks in advance for any help.
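One way to take the label and the language as separate inputs, sketched under assumptions: the ?categoryName, ?lang and ?categoryLabel variables are hypothetical names of mine, and the query only finds something if it is sent to an endpoint that actually holds category labels in that language (for German, presumably the German DBpedia chapter rather than dbpedia.org, as the de.dbpedia.org URI in the question suggests).

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?objectLabel
WHERE {
  # Hypothetical program inputs: the category label and its language tag.
  VALUES (?categoryName ?lang) { ("Staat in Europa" "de") }
  # STRLANG builds the language-tagged literal "Staat in Europa"@de.
  BIND (STRLANG(?categoryName, ?lang) AS ?categoryLabel)
  ?category rdfs:label ?categoryLabel .
  ?subject dcterms:subject ?category ;
           rdfs:label ?objectLabel .
  FILTER (lang(?objectLabel) = ?lang)
}
```

With this shape only the VALUES row changes between languages; the rest of the query stays the same.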

How to obtain Bio2RDF resource using SPARQL query?

I am working with the Bio2RDF biological database and would like to get the resource URL for Novobiocin, which is http://bio2rdf.org/drugbank:DB01051, using a SPARQL query. Using yasgui.org, I selected the http://drugbank.bio2rdf.org/sparql service and executed the following query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?resource WHERE { ?resource dcterms:identifier "drugbank:DB01051"}
However, it did not return any result. Could you tell me why it does not work, please?

SPARQL how to deal with different cased queries?

I am still a bit new to SPARQL. I have set up a DBpedia endpoint for our company. I have no idea what the end user will be querying and, since DBpedia is case sensitive, I pass both title case & uppercase versions for subjects vs something like a person; e.g. "Computer_programming" vs "Alicia_Keys". Rather than pass in 2 separate queries, what is the most efficient way to achieve this? I've tried the IN operator (from this question) but I seem to be failing somewhere.
select ?label ?abstract where {
IN (<http://dbpedia.org/resource/alicia_keys>, <http://dbpedia.org/resource/Alicia_Keys>) rdfs:label ?label;
dbpedia-owl:abstract ?abstract.
}
LIMIT 1"""
since DBpedia is case sensitive I pass both title case & uppercase
versions for subjects vs something like a person; e.g.
"Computer_programming" vs "Alicia_Keys". Rather than pass in 2 separate
queries what is the most efficient way to achieve this?
URIs should be viewed as opaque. While DBpedia generally has some nice structure, so that you can get lucky by concatenating http://dbpedia.org/resource and some string with _ replacing spaces, that's really not a very robust way to do it. A better idea is to note that the string you're getting is probably the same as a label of some resource, modulo variations in case. Given that, the best idea would be to look for something with the same label, modulo case. E.g.,
select ?resource where {
values ?input { "AliCIA KeYS" }
?resource rdfs:label ?label .
filter ( ucase(str(?label)) = ucase(?input) )
}
That's actually going to be pretty slow, though, because you'll have to find every resource and do some string processing on its label. It's an OK approach in principle, though.
What can be done to make it better? Well, if you know what kind of thing you're looking for, that will help a lot. E.g., you could restrict the query to Persons:
select distinct ?resource where {
values ?input { "AliCIA KeYS" }
?resource rdf:type dbpedia-owl:Person ;
rdfs:label ?label .
filter ( ucase(str(?label)) = ucase(?input) )
}
That's an improvement, but it's still not all that fast. It still, at least conceptually, has to touch each Person and examine their name. Some SPARQL endpoints support text indexing, and that's probably what you need if you want to do this efficiently.
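As a rough illustration of the text-index route, assuming the endpoint runs Virtuoso (as the public DBpedia endpoint does): Virtuoso offers a non-standard bif:contains predicate backed by its free-text index. This is a vendor extension, not part of SPARQL 1.1, so other stores will reject it or offer their own equivalent. dbo: below is the same namespace as dbpedia-owl: above.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT DISTINCT ?resource ?label
WHERE {
  ?resource rdf:type dbo:Person ;
            rdfs:label ?label .
  # Virtuoso-only: use the free-text index to narrow down candidate labels.
  ?label bif:contains "'alicia' AND 'keys'" .
  # Then keep only labels equal to the input, ignoring case.
  FILTER (lcase(str(?label)) = "alicia keys")
}
```

The idea is that the index lookup cuts the candidates down to a handful before the string comparison runs, instead of comparing against every label.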
The best option, of course, would be to simply ask your users for a little bit more information, and to normalize the data in advance. If your user provides "AliCIA KEyS", then you can normalize it to "Alicia Keys"@en, and then do something like:
select distinct ?resource where {
values ?input { "Alicia Keys"@en }
?resource rdfs:label ?input .
}
