I have an entity extraction task that needs knowledge bases (KBs) like Wikidata, Freebase, or DBpedia. Given their huge size, it is hard to download them and extract entities locally. Is there a Python client that can make API calls to these KBs and return extractions, with unstructured text as input?
For DBpedia at least, you can use DBpedia Spotlight, with something like this:
import requests

spotlight_url = 'http://api.dbpedia-spotlight.org/en/annotate'
params = dict(text="Barack Obama was a president", confidence='0.2', support='10')
headers = {'Accept': 'application/json'}
# a plain GET is enough here; for bulk queries, wrap it in a retry-enabled requests.Session
resp = requests.get(url=spotlight_url, params=params, headers=headers)
results = resp.json()
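To pull the extracted entities out of the response, you can iterate over the annotations. A minimal sketch, assuming Spotlight's usual JSON layout with a Resources list of @URI/@surfaceForm entries:

# each annotation carries the DBpedia URI and the text span it was matched on
for resource in results.get('Resources', []):
    print(resource['@surfaceForm'], '->', resource['@URI'])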
If you were to do loads of queries, you'd have a local install of the knowledge base in a triplestore and a local install of Spotlight too.
I have been using ParaView to visualize and analyse VTU files. I find the gradient calculation filter quite useful. I would like to know if there is a Python API for ParaView that I can use to apply this filter.
I'm looking for something like this:
import paraview as pv
MyFile = "Myfile0001.vtu"
Divergence = pv.filters.GradientOfUnstructuredDataset(MyFile)
ParaView is fully scriptable in Python. Each part of this doc has a 'do it in python' version.
While dedicated API documentation does not always exist, you can use the Python Trace (in the Tools menu), which records actions from the GUI and saves them as a Python script.
EDIT
To get the data back as an array, some additional steps are needed because ParaView works in a client/server mode. You should Fetch the data; then you can manipulate the resulting vtkObject, extract the array, and convert it to numpy.
Something like:
from paraview.simple import *
from vtk.numpy_interface import dataset_adapter as dsa

# read the unstructured grid and apply the gradient filter
gridvtu = XMLUnstructuredGridReader(registrationName='grid', FileName=['grid.vtu'])
gradient = GradientOfUnstructuredDataSet(registrationName='Gradient', Input=gridvtu)
gradient.ComputeDivergence = 1  # also output a 'Divergence' array (property name assumed; check your ParaView version)

# bring the server-side result back to the client as a vtkObject
vtk_grid = servermanager.Fetch(gradient)
wrapped_grid = dsa.WrapDataObject(vtk_grid)
divergence_array = wrapped_grid.PointData["Divergence"]
Note that divergence_array is a numpy.ndarray
You can also write pure VTK code, as in this example on SO.
I recently learned that Companies House has an API that allows access to companies' filing history, and I want to get data from the API and load it into a pandas DataFrame.
I have set up an API account, but I am having difficulties with the Python wrapper companies-house 0.1.2 (https://pypi.org/project/companies-house/):
from companies_house.api import CompaniesHouseAPI
ch = CompaniesHouseAPI('my_api_key')
This works, but when I try to get the data with get_company or get_company_filing_history, I seem to pass incorrect parameters. I tried CompaniesHouseAPI.get_company('02627406') but get KeyError: 'company_number'. I am quite puzzled, as there is no example provided in the documentation. Please help me figure out what I should pass as a parameter (or parameters) in both functions.
# this call raises KeyError: 'company_number'
CompaniesHouseAPI.get_company('02627406')
I am not a Python expert but want to learn by doing interesting projects. Please help. If you know how to get the filing history from the Companies House API using another Python wrapper, that solution is also welcome.
I recently wrote a blog post describing how to make your own wrapper and then use that to create an application that loads the data into a pandas dataframe as you described. You can find it here.
By creating your own wrapper class, you avoid the limitations of whichever library you have chosen. You may also learn a lot about calling an API from python and working with the response.
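As an illustration of that idea, a bare-bones wrapper might look something like the sketch below; the class and method names are made up for illustration, and only the search endpoint used in the next example is assumed:

import requests

class CompaniesHouseClient:
    """Minimal hand-rolled wrapper: one session, basic auth with the API key as username."""

    BASE_URL = "https://api.companieshouse.gov.uk"

    def __init__(self, api_key):
        self.session = requests.Session()
        self.session.auth = (api_key, '')  # Companies House expects the key as the basic-auth username

    def search_companies(self, query):
        # GET /search/companies?q=<query>
        response = self.session.get(self.BASE_URL + "/search/companies", params={'q': query})
        response.raise_for_status()
        return response.json()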
If you would rather not define a class at all, here is an even simpler example that does not need any Companies House-specific library:
import requests
import json

url = "https://api.companieshouse.gov.uk/search/companies?q={}"
query = "tesco"
api_key = "vLmk-4YxYS-QH8nMi8767zJSlcPlo3MKn41-d"  # fake key - insert your key here

# Companies House uses HTTP basic auth with the API key as the username and an empty password
response = requests.get(url.format(query), auth=(api_key, ''))
json_search_result = response.text
search_result = json.JSONDecoder().decode(json_search_result)  # response.json() would work too

for company in search_result['items']:
    print(company['title'])
Running this should give you the top 20 matches for the keyword "tesco" from the Companies House search function. Check out the blog post to see how you could adapt this to perform any function from the API.
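To get closer to the original goal of a filing-history DataFrame, the same request pattern can be pointed at the filing-history endpoint. A minimal sketch, assuming the /company/{company_number}/filing-history endpoint and that its items list can be fed straight into pandas:

import requests
import pandas as pd

api_key = "YOUR_API_KEY"  # insert your own key
company_number = "02627406"
url = "https://api.companieshouse.gov.uk/company/{}/filing-history".format(company_number)

# same basic-auth pattern as above: API key as username, empty password
response = requests.get(url, auth=(api_key, ''))
filing_history = response.json()

# each item describes one filing; pandas flattens the list of dicts into one row per filing
df = pd.DataFrame(filing_history.get('items', []))
print(df.head())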
I want a script that collects random tweets from Chicago, without any keyword, that runs automatically every 30 minutes and collects tweets for 20 milliseconds (for example).
All of the available code examples I have found need keywords, and in most of them I can't define a geographic location.
Thanks for your help.
See these pages: An Introduction to Text Mining using Twitter Streaming API and Python, and also run a python script every hour.
This is very doable. With Twitter's REST API a keyword is required; however, Twitter also provides a Streaming API, which can filter tweets by either a keyword or a location. In your case, you would need to define the bounding box of Chicago in longitudes and latitudes and supply it to Twitter's statuses/filter endpoint, documented here: https://developer.twitter.com/en/docs/tweets/filter-realtime/api-reference/post-statuses-filter.html. This endpoint has a locations parameter that you would use. It returns tweets as they are posted, so no timer is required.
You can use tweepy for this (see the sketch after the TwitterAPI example below). Or, with TwitterAPI you would simply do something like this:
from TwitterAPI import TwitterAPI

api = TwitterAPI(CONSUMERKEY, CONSUMERSECRET, ACCESSTOKENKEY, ACCESSTOKENSECRET)

# bounding box for Chicago: west longitude, south latitude, east longitude, north latitude
r = api.request('statuses/filter', {'locations': '-87.9,41.6,-87.5,42.0'})
for item in r:
    print(item)
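With tweepy, a minimal sketch assuming the tweepy 3.x StreamListener interface (the 4.x API differs) and the same placeholder credentials as above:

import tweepy

class ChicagoListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

auth = tweepy.OAuthHandler(CONSUMERKEY, CONSUMERSECRET)
auth.set_access_token(ACCESSTOKENKEY, ACCESSTOKENSECRET)

stream = tweepy.Stream(auth=auth, listener=ChicagoListener())
# same Chicago bounding box; filter() blocks and delivers tweets as they arrive
stream.filter(locations=[-87.9, 41.6, -87.5, 42.0])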
I want to know where Quantopian gets its data from.
If I want to do an analysis on a stock market other than the NYSE, will I get the data? If not, can I manually upload the data so that I can run my algorithms on it?
1.) Quantopian gets its data from several places and provides most of it online, although some datasets are premium and require a subscription.
2.) Yes, you can get standard stock market data, but if you have something like a Bloomberg feed, another subscription, or something else you've built and want to pull it in, you can use Fetcher.
The basic code is:
fetch_csv(url, pre_func=None, post_func=None, date_column='date',
          date_format='%m/%d/%y', timezone='UTC', symbol=None, **kwargs)
Here is an example for something like Dropbox:
def initialize(context):
    # fetch data from a CSV file somewhere on the web.
    # Note that one of the columns must be named 'symbol' for
    # the data to be matched to the stock symbol
    fetch_csv('https://dl.dropboxusercontent.com/u/169032081/fetcher_sample_file.csv',
              date_column='Settlement Date',
              date_format='%m/%d/%y')
    context.stock = symbol('NFLX')

def handle_data(context, data):
    record(Short_Interest=data.current(context.stock, 'Days To Cover'))
You can get data for non-NYSE stocks as well, such as Nasdaq securities. Screens are also available by fundamentals (market, exchange, market cap). These screens can limit the stocks analyzed from the broad universe.
You can get stock data from Yahoo or other quant sites.
I'm trying to make a simple Python 3 program that reads weather information from an XML web source, converts it into a Python-readable object (maybe a dictionary), and processes it (for example, visualizing multiple observations in a graph).
The source of the data is the national weather service's (direct translation) XML file at the link provided in the code.
What's different from typical XML parsing questions on Stack Overflow is that there are repetitive tags without an in-tag identifier (the <station> tags in my example) and some with one (1st line, <observations timestamp="14568.....">). I would also like to try parsing it straight from the website, not from a local file. Of course, I could create a local temporary file too.
What I have so far is simply a loading script that gives me strings containing the XML for both the forecast and the latest weather observations:
from urllib.request import urlopen

# Read 4-day forecast
forecast = urlopen("http://www.ilmateenistus.ee/ilma_andmed/xml/forecast.php").read().decode("iso-8859-1")

# Get current weather
observ = urlopen("http://www.ilmateenistus.ee/ilma_andmed/xml/observations.php").read().decode("iso-8859-1")
In short, I'm looking for as universal a way as possible to parse XML into a Python-readable object (such as a dictionary/JSON or a list) while preserving all of the information in the XML file.
P.S. I would prefer a standard Python 3 module such as xml, but I didn't manage to understand it.
Try the xmltodict package for a simple conversion of an XML structure to a Python dict: https://github.com/martinblech/xmltodict
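A minimal sketch of how that could look with the observations feed from the question, assuming the document's root is <observations> (with a timestamp attribute) containing repeated <station> children as described above:

import xmltodict
from urllib.request import urlopen

observ = urlopen("http://www.ilmateenistus.ee/ilma_andmed/xml/observations.php").read().decode("iso-8859-1")

# xmltodict maps elements to dicts: repeated tags such as <station> become a list,
# and attributes are stored under '@'-prefixed keys
data = xmltodict.parse(observ)
print(data['observations']['@timestamp'])  # the in-tag attribute on the root element
for station in data['observations']['station']:
    print(station)                         # each station is itself a dict of its child elements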