Get dbpedia link of entity using stanford NER - nlp

I am trying to find entities in text using Stanford NER. It is working fine so far. Now I want to find the DBpedia link for each entity.
I have seen that this is available in the Alchemy API.
Is it possible to find the DBpedia links of entities using Stanford NER?

Normally, all entities in DBpedia have an rdfs:label, which is a string assigned to the entity. Therefore, when you have a name extracted by your NER, you can use it as a filter. The following example returns the URIs of all entities that have the label Sulfuric acid:
select distinct *
where {
?URI rdfs:label ?name.
filter(str(?name)="Sulfuric acid")
}
However, labels are not always what you want; sometimes you need to look for the name assigned to your URI. For example, if you open the sulfuric acid page, you can see that it contains dbpprop:iupacname. In that case, you need to change the query to:
select distinct *
where {
?URI dbpprop:iupacname ?name.
filter(str(?name)="Sulfuric acid")
}
In this particular example the result sets are the same. But imagine you are tasked with finding London. Then you need to change your property to foaf:name, and when you run both of the following queries, the result sets are quite different.
select distinct *
where {
?URI rdfs:label ?name.
filter(str(?name)="London")
}
This query returns 8 results, while the following query returns 21 results.
select distinct *
where {
?URI foaf:name ?name.
filter(str(?name)="London")
}
So my point is that you need to decide if you want to use labels or names. And if you decide to use names, you need to find the appropriate property to write a SPARQL query. After that, you just need a method to access DBpedia with your query.
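As a rough illustration of the last step, here is a minimal Python sketch that builds the label or name query for any extracted entity and turns it into a GET URL. The endpoint http://dbpedia.org/sparql is the public DBpedia endpoint; the helper names, the explicit PREFIX lines, and the `format` request parameter are assumptions made for this sketch, so check them against the endpoint you actually use.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"

# Prefixes are declared explicitly so the query does not rely on endpoint defaults.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
)


def build_lookup_query(name, prop="rdfs:label"):
    """Return a SPARQL query that finds URIs whose `prop` value equals `name`."""
    # Escape backslashes and double quotes so the name stays a valid string literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        PREFIXES
        + "select distinct ?URI where {\n"
        + f"  ?URI {prop} ?name .\n"
        + f'  filter(str(?name) = "{escaped}")\n'
        + "}"
    )


def build_request_url(query):
    """Encode the query as a GET URL (the `format` parameter is an assumption)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


label_query = build_lookup_query("London")                   # match on rdfs:label
name_query = build_lookup_query("London", prop="foaf:name")  # match on foaf:name
url = build_request_url(name_query)
```

Switching between labels and names is then only a matter of changing the `prop` argument; the resulting URL can be fetched with any HTTP client.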

You can use Stanford NER to extract the entity names and DBpedia Spotlight to link to the DBpedia URIs.
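A minimal sketch of the Spotlight half of that pipeline follows. It assumes the public annotate endpoint at https://api.dbpedia-spotlight.org/en/annotate, its `text` and `confidence` parameters, and a JSON response whose `Resources` entries carry `@URI` and `@surfaceForm` keys; verify all of these against the Spotlight documentation before relying on them. The sample response is hand-written to show the assumed shape and is not real output.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed public endpoint; a self-hosted Spotlight instance would use a different URL.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_spotlight_request(text, confidence=0.5):
    """Prepare (but do not send) an annotate request that asks for JSON."""
    params = urlencode({"text": text, "confidence": confidence})
    return Request(SPOTLIGHT_URL + "?" + params, headers={"Accept": "application/json"})


def extract_links(response_json):
    """Map each annotated surface form to its DBpedia URI (assumed response shape)."""
    return {r["@surfaceForm"]: r["@URI"] for r in response_json.get("Resources", [])}


# Hand-written illustration of the assumed response shape, not real API output.
sample_response = {
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/London", "@surfaceForm": "London"},
    ]
}

request = build_spotlight_request("London is a city.")
links = extract_links(sample_response)
```

To run it for real, pass `request` to `urllib.request.urlopen` and feed the decoded JSON to `extract_links`; the entity names found by Stanford NER can then be looked up in the returned mapping.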

Related

Sparql query for searching specific label

I want a lookup service for Wikidata similar to what DBpedia Lookup (https://lookup.dbpedia.org/) is for DBpedia.
Because I didn't find one, I tried to create a query that searches the labels of Wikidata.
Here is my problem. I have this object: https://www.wikidata.org/wiki/Q67639471 (Standards of Conduct Committee - Fourth Assembly)
The following SPARQL query should find the object, shouldn't it?
select distinct ?o ?oLabel where {
?o rdfs:label ?oLabel.
filter(contains(?oLabel, "Standards of Conduct Committee - Fourth Assembly"@en)).
}
But the query times out every time. When I add the line ?o wdt:P31 wd:Q865588. (?o is an instance of committee), it finds the object.
Why doesn't the query find the existing object?
Does anybody know how to make or find such a lookup service?

finding organization and industry/sector from string in dbpedia

I am generating a short list of 10 to 20 strings that I want to look up on DBpedia to see whether they have an organization tag and, if so, return the industry/sector tag. I have been looking at the SPARQLWrapper queries on their website but am having trouble constructing one that returns the organization and sector/industry for my string. Is there a way to do this?
If I use the code below, I think I get a list of industry types rather than the industry of the company.
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
SELECT ?industry WHERE
{ <http://dbpedia.org/resource/IBM> a ?industry}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
Instead of looking at queries that are meant to help you understand the querying tool, you should start by looking at the data being queried. For instance, just click http://dbpedia.org/resource/IBM and look at the properties (the left-hand column) to see its rdf:type values (of which there are MANY)!
Note that IBM is not described as a ?industry. IBM is described as a <http://dbpedia.org/resource/Public_company> (among other things). On the other hand, IBM is also described as having three values for <http://dbpedia.org/ontology/industry>:
<http://dbpedia.org/resource/Cloud_computing>
<http://dbpedia.org/resource/Information_technology>
<http://dbpedia.org/resource/Cognitive_computing>
I don't know whether these are what you're actually looking for or not, but hopefully what I've done above will start you down the right path to whatever you do want to get out of DBpedia.
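If the <http://dbpedia.org/ontology/industry> values are indeed what you want, a query along these lines may work. This is a minimal sketch: the helper names are made up for illustration, and the sample result is hand-built in the standard SPARQL JSON results layout from the three values listed above, not fetched from the endpoint.

```python
INDUSTRY = "http://dbpedia.org/ontology/industry"


def build_industry_query(resource_uri):
    """Ask for the values of the industry property instead of the rdf:type values."""
    return f"SELECT ?industry WHERE {{ <{resource_uri}> <{INDUSTRY}> ?industry }}"


def industry_values(results):
    """Pull the ?industry URIs out of a SPARQL JSON results document."""
    return [b["industry"]["value"] for b in results["results"]["bindings"]]


# Hand-built sample in the SPARQL JSON results layout, using the values quoted above.
sample_results = {
    "head": {"vars": ["industry"]},
    "results": {
        "bindings": [
            {"industry": {"type": "uri", "value": "http://dbpedia.org/resource/Cloud_computing"}},
            {"industry": {"type": "uri", "value": "http://dbpedia.org/resource/Information_technology"}},
            {"industry": {"type": "uri", "value": "http://dbpedia.org/resource/Cognitive_computing"}},
        ]
    },
}

query = build_industry_query("http://dbpedia.org/resource/IBM")
industries = industry_values(sample_results)
```

With the SPARQLWrapper setup from the question, `sparql.setQuery(query)` followed by `sparql.query().convert()` should return a dictionary of this shape that can be passed to `industry_values`. A string with no industry values simply yields an empty list.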

CrossRef API query.affiliation for a multi word string

I'm learning to use the CrossRef REST API and want to search for all DOIs with an affiliation to a given university. How do I create a multi-word search query that uses AND, not AND/OR?
There are over 100 million DOIs on CrossRef, but only 8217730 have author affiliation metadata.
https://api.crossref.org/works?sample=10&filter=has-affiliation:true
Now, if I want to search for affiliations with "University of Southern Mississippi," I could use mississippi+university+southern, but this combines the three words with AND/OR.
I want only affiliations with all three words.
https://api.crossref.org/works?sample=10&query.affiliation=mississippi+university+southern
This returns all results with the word university or southern or mississippi.
CrossRef field query instructions are here
https://github.com/CrossRef/rest-api-doc#field-queries
and a github comment about the topic is here
https://github.com/CrossRef/rest-api-doc/commit/a4d047e0d1556e80aaab0f4b5aae420da2a99ea2 and here https://github.com/CrossRef/rest-api-doc/issues/170
The problem is the sample parameter in your URI. The sample parameter gives you random results:
"Being able to select random results is useful for both testing and sampling. You can use the sample parameter to retrieve random results."
(from the API description for sample)
So if you want all results for your query, use the URI without the sample part, like:
https://api.crossref.org/works?query.affiliation=mississippi+university+southern
The CrossRef API doesn't support Boolean operators. The query.affiliation parameter returns multiple results with scores. In the resulting JSON, the highest-scoring result has the key-value pair 'chosen: true'.
I believe that they have recently transitioned from Solr to Elasticsearch (https://www.crossref.org/blog/behind-the-scenes-improvements-to-the-rest-api/), which supports Boolean search (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html), but I'm not sure how to use it on CrossRef API endpoints, or whether you even can.
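Since the API itself does not offer a Boolean AND, one workaround is to filter the returned items on the client side. The sketch below assumes the usual works response layout, in which each item under message.items has an author list whose entries carry an affiliation list of objects with a name field. The function names and the sample data (including the DOIs) are hypothetical.

```python
def affiliation_has_all_words(item, words):
    """True if any single affiliation name of any author contains every word."""
    wanted = [w.lower() for w in words]
    for author in item.get("author", []):
        for affiliation in author.get("affiliation", []):
            name = affiliation.get("name", "").lower()
            # Substring check; "southern" would also match "southernmost".
            if all(w in name for w in wanted):
                return True
    return False


def filter_items(response, words):
    """Keep only the works whose affiliation metadata matches all words."""
    items = response.get("message", {}).get("items", [])
    return [item for item in items if affiliation_has_all_words(item, words)]


# Hypothetical sample in the assumed response layout, not real API output.
sample = {
    "message": {
        "items": [
            {"DOI": "10.0000/example-1",
             "author": [{"affiliation": [{"name": "University of Southern Mississippi"}]}]},
            {"DOI": "10.0000/example-2",
             "author": [{"affiliation": [{"name": "University of Southern California"}]}]},
            {"DOI": "10.0000/example-3", "author": [{"affiliation": []}]},
        ]
    }
}

matches = filter_items(sample, ["mississippi", "university", "southern"])
matching_dois = [item["DOI"] for item in matches]
```

This trades extra downloaded results for exact control over the match, so it is best combined with the query.affiliation request above to keep the candidate set small.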

Neo4j quick lookup of string within property of many nodes

I'm using Neo4j 2.2.2, and I'm currently using a regex search to find a string in the name property across 600k nodes.
Each node is structured with a minimum of the following two properties.
{
name: 'some string of text',
sid: 12345
}
I've created an index on name and another index on sid. Lookups on sid are very fast. Searches [using regex] are very slow. Currently I'm searching for a string with a * before and after.
What can be done with Neo4j to make searching for a string within a property very fast?
If doing something special within Neo4j is not ideal, I could theoretically stand up some supplementary algorithm/service separate from Neo4j that searches for a string value within the name property and then gives me the sid, which is then used to look up the node within Neo4j.
Help me do fast string search with neo4j, please. :)
You can use legacy fulltext indexing to speed up your search. This blog shows you how.
In general, regexes are very expensive. From my point of view, you should find another solution.
Could you please tell us more about your use case and why you want to use regex?
You already suggested one solution: store the sid and name in another format (or database) that performs better for this kind of search than Neo4j.
Alternatively, analyze the content of the name property and, based on that, represent the content as a graph.
e.g.
* Node for a count of letters in name property
* Node for starting letter
* Split name property to multiple properties
* etc...
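To make the "separate service" idea concrete, here is a minimal in-memory sketch of a trigram index that maps substrings of name to sid values. It is an illustration of one possible approach, not a Neo4j feature: the class and method names are made up, and a real deployment would need persistence and updates when nodes change.

```python
from collections import defaultdict


class TrigramIndex:
    """Substring search over names using an inverted index of 3-character grams."""

    def __init__(self):
        self._postings = defaultdict(set)  # trigram -> set of sids
        self._names = {}                   # sid -> lowercased name

    @staticmethod
    def _trigrams(text):
        return {text[i:i + 3] for i in range(len(text) - 2)}

    def add(self, sid, name):
        lowered = name.lower()
        self._names[sid] = lowered
        for gram in self._trigrams(lowered):
            self._postings[gram].add(sid)

    def search(self, needle):
        """Return the sorted sids whose name contains `needle` (case-insensitive)."""
        lowered = needle.lower()
        grams = self._trigrams(lowered)
        if grams:
            # Only names containing every trigram of the needle can match.
            candidates = set.intersection(*(self._postings.get(g, set()) for g in grams))
        else:
            # Needles shorter than three characters fall back to a full scan.
            candidates = set(self._names)
        # Verify candidates, since shared trigrams do not guarantee a contiguous match.
        return sorted(sid for sid in candidates if lowered in self._names[sid])


index = TrigramIndex()
index.add(12345, "some string of text")
index.add(2, "another name")
hits = index.search("string")
```

The returned sid values can then be used for the fast indexed lookup on sid in Neo4j, which keeps the expensive substring matching out of the graph database.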

Which DBpedia dataset dump contains dbpedia ontology labels?

My aim is to be able to perform the following kind of queries within Virtuoso, without being dependent on whether DBpedia's SPARQL endpoint is currently up or not:
SELECT ?label WHERE {
?prop rdfs:domain <http://dbpedia.org/ontology/SpaceMission> .
?prop rdfs:label ?label .
FILTER (lang(?label) = 'en')
}
The problem is that I cannot identify the correct dataset to download that would also include the labels of these properties. I currently have the mapping-based properties installed (http://downloads.dbpedia.org/3.9/en/mappingbased_properties_en.nt.bz2), and I do get the properties, but there are no property labels there.
Is there another installation I am missing or is the only way to query that data from DBpedia's live endpoint?

Resources