Let “Weather” be one of these: Rainy, Sunny, Cloudy.
I can create an Alloy model that says: “weather” is a relation between City and one Weather.
sig Forecast {weather: City -> one Weather}
sig City, Weather {}
one sig Rainy, Sunny, Cloudy extends Weather {}
Here’s a sample instance:
Boston – Sunny
Seattle – Cloudy
Miami – Sunny
Given that model, I should be able to assert: Every City has Weather.
assert Every_city_has_weather {
  all forecast: Forecast | all city: City | one forecast.weather[city]
}
I can then ask the Alloy Analyzer to check the assert:
check Every_city_has_weather
The Analyzer returns the expected result: No counterexample found
Excellent.
Now I would like to assert that there may be a Weather for which no City has that weather. In the example above, no City has the value Rainy.
I am having difficulty expressing this. I tried this: There is some w: Weather such that there is no City when joining the weather relation with w. Here’s the Alloy assert:
assert A_weather_may_not_be_in_any_city {
  all forecast: Forecast | some w: Weather | no forecast.weather.w
}
Then I asked the Alloy Analyzer to check my assertion:
check A_weather_may_not_be_in_any_city
The Analyzer responded with a counterexample (it showed an instance where each Weather value is mapped to a City).
Apparently my logic is not right. Can you think of the right logic for expressing this?
If you want to see that some instance exists, you should use a run and not a check statement. An assert says something is true of every instance.
Given that you want to say "There is some w: Weather such that there is no City when joining the weather relation with w", I'd suggest expressing this very directly:
some w: Weather | no c: City | ...
I want to get the latitude and longitude of the companies listed in an already-cleaned dataframe, but the only information I have is the name of the company and the country (in this case just the UK).
DataFrame
After trying different things, I have got some of the lats and longs, but in most cases they are not located in the UK.
This is the code I tried:
base_url= "https://maps.googleapis.com/maps/api/geocode/json?"
AUTH_KEY = "AI**************QTk"
geolocator = GoogleV3(api_key = AUTH_KEY)
parameters = {"address": "Revolut, London",
"key": AUTH_KEY}
print(f"{base_url}{urllib.parse.urlencode(parameters)}")
r = requests.get(f"{base_url}{urllib.parse.urlencode(parameters)}")
data = json.loads(r.content)
data.get("results")[0].get("geometry").get("location") #That works for the first company
df["loc"] = df["Company name for communication"].apply(geolocator.geocode)
df["point"]= df["loc"].apply(lambda loc: tuple(loc.point) if loc else None)
df[['lat', 'lon', 'altitude']] = pd.DataFrame(df['point'].to_list(), index=df.index)
DataFrame with long and lat wrong
I would really appreciate any help. Let me know if my explanation is not clear and I will provide more details. Thank you!
If you are only trying to get Geocoding API results in the UK, then you would want to make use of component filtering.
The Geocoding API can return address results restricted to a specific area. You can specify the restriction using the components filter. For more information, see Component Filtering. Specifically, you would want to include the country.
Note that the value should be a country name or a two-letter ISO 3166-1 country code. The API follows the ISO standard for defining countries, and the filtering works best when using the corresponding ISO code of the country. For example, here is what a sample Geocoding web request with country component filtering for the UK looks like:
https://maps.googleapis.com/maps/api/geocode/json?address=high+st+hasting&components=country:gb&key=YOUR_API_KEY
This will only return results located in the UK, and will return zero results if none are available.
You may also want to take a look at region biasing.
Note that if you bias for the region, the API prefers results in that country but doesn't restrict them to it, and it will still return a result for an address outside the country. Unlike component filtering, region biasing takes a ccTLD (country code top-level domain) argument specifying the region bias. Most ccTLD codes are identical to ISO 3166-1 codes, with some notable exceptions. For example, the United Kingdom's ccTLD is "uk" (.co.uk) while its ISO 3166-1 code is "gb" (technically for the entity of "The United Kingdom of Great Britain and Northern Ireland").
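For illustration, here is a minimal sketch of a region-biased request that reuses the base_url, AUTH_KEY, requests and urllib setup from your question (the address is just your example; note that region takes the ccTLD):

# illustrative sketch: bias (not restrict) results towards the UK via the region parameter
parameters = {"address": "Revolut, London",
              "region": "uk",          # ccTLD "uk", not the ISO code "gb"
              "key": AUTH_KEY}
r = requests.get(f"{base_url}{urllib.parse.urlencode(parameters)}")
location = r.json()["results"][0]["geometry"]["location"]   # {'lat': ..., 'lng': ...}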
Please also take a look at the Geocoding API Best Practices
I got the results using component filtering with this code:
# Get the location of the first company
import pandas as pd
from geopy.geocoders import GoogleV3

base_url = "https://maps.googleapis.com/maps/api/geocode/json?"
AUTH_KEY = "AI********************Tk"
geolocator = GoogleV3(api_key=AUTH_KEY)
components = [('country', 'GB')]

def get_location(x):
    return geolocator.geocode(x, components=components)

df["loc"] = df["Company name for communication"].apply(get_location)
df["point"] = df["loc"].apply(lambda loc: tuple(loc.point) if loc else None)
df[['lat', 'lon', 'altitude']] = pd.DataFrame(df['point'].to_list(), index=df.index)
df
DataFrame with lat and lon
I am looking to do the opposite of what has been done here:
import re
text = '1234-5678-9101-1213 1415-1617-1819-hello'
re.sub(r"(\d{4}-){3}(?=\d{4})", "XXXX-XXXX-XXXX-", text)
output = 'XXXX-XXXX-XXXX-1213 1415-1617-1819-hello'
Partial replacement with re.sub()
My overall goal is to replace all XXXX within a text using a neural network. XXXX can represent names, places, numbers, dates, etc. that are in a .csv file.
The end result would look like:
XXXX went to XXXX XXXXXX
Sponge Bob went to Disney World.
In short, I am unmasking text by replacing the masks with values from a generated dataset using fuzzy matching.
You can do it using named-entity recognition (NER). It's fairly simple and there are off-the-shelf tools out there to do it, such as spaCy.
NER is an NLP task where a neural network (or other method) is trained to detect certain entities, such as names, places, dates and organizations.
Example:
Sponge Bob went to South beach, he paid a ticket of $200!
I know, Michael is a good person, he goes to McDonalds, but donates to charity at St. Louis street.
It returns the detected entities (names, places, organizations, money amounts, and so on). Just be aware that the detection is not 100% accurate!
Here is a little snippet for you to try out:
import spacy

phrases = ['Sponge Bob went to South beach, he paid a ticket of $200!',
           'I know, Michael is a good person, he goes to McDonalds, but donates to charity at St. Louis street.']

nlp = spacy.load('en_core_web_sm')  # download the small English model first: python -m spacy download en_core_web_sm

for phrase in phrases:
    doc = nlp(phrase)
    replaced = ""
    for token in doc:
        if token.ent_type_:          # the token is part of a named entity
            replaced += "XXXX "
        else:
            replaced += token.text + " "
    print(replaced)
Read more here: https://spacy.io/usage/linguistic-features#named-entities
You could, instead of replacing with XXXX, replace based on the entity type, like:
if token.ent_type_ == "PERSON":
    replaced += "<PERSON> "
Then:
import re, random
personames = ["Jack", "Mike", "Bob", "Dylan"]
phrase = re.sub("<PERSON>", lambda m: random.choice(personames), phrase)  # a fresh random name per match
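Putting the two ideas together, here is a minimal sketch (the placenames pool and the GPE/LOC labels are just illustrative assumptions; which labels actually appear depends on the model):

import random
import re

import spacy

personames = ["Jack", "Mike", "Bob", "Dylan"]
placenames = ["Orlando", "Paris", "Berlin"]        # hypothetical replacement pool

nlp = spacy.load('en_core_web_sm')
doc = nlp("Sponge Bob went to South beach, he paid a ticket of $200!")

# mask each detected entity with a placeholder carrying its type
masked = doc.text
for ent in reversed(doc.ents):                     # go right-to-left so character offsets stay valid
    masked = masked[:ent.start_char] + "<" + ent.label_ + ">" + masked[ent.end_char:]

# fill the placeholders back in from the replacement pools
masked = re.sub("<PERSON>", lambda m: random.choice(personames), masked)
masked = re.sub("<GPE>|<LOC>", lambda m: random.choice(placenames), masked)
print(masked)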
I am trying to read a wiki page, collect and enumerate all sentences.
# read the wiki page
import re
import wikipedia

eliz = wikipedia.page("Elizabeth II")
fullText2 = eliz.content

m = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)(\s|[A-Z].*)', fullText2)
docs = []
for i in m:
    print(i)
    docs.append(i)
But it doesn't seem to split the sentences properly; for example, I get this printed as a single item:
"Elizabeth received private tuition in constitutional history from
Henry Marten, Vice-Provost of Eton College, and learned French from a
succession of native-speaking governesses. A Girl Guides company, the
1st Buckingham Palace Company, was formed specifically so she could
socialise with girls her own age. Later, she was enrolled as a Sea
Ranger.In 1939, Elizabeth's parents toured Canada and the United
States. As in 1927, when her parents had toured Australia and New
Zealand, Elizabeth remained in Britain, since her father thought her
too young to undertake public tours. Elizabeth "looked tearful" as her
parents departed. They corresponded regularly, and she and her parents
made the first royal transatlantic telephone call on 18 May."
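For comparison, here is a minimal sketch using NLTK's punkt sentence tokenizer instead of a hand-written regex; note that no tokenizer is guaranteed to catch boundaries where the period has no following space, as in "Ranger.In 1939":

import nltk
import wikipedia
from nltk.tokenize import sent_tokenize

nltk.download('punkt')        # one-time download (newer NLTK versions may ask for 'punkt_tab' instead)

fullText2 = wikipedia.page("Elizabeth II").content
docs = sent_tokenize(fullText2)
for i, sentence in enumerate(docs):
    print(i, sentence)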
I am trying to build a topic hierarchy by following the two DBpedia properties mentioned below.
skos:broader property
dcterms:subject property
My intention is, given a term, to identify its topic. For example, given the term 'support vector machine', I want to identify topics for it such as classification algorithm, machine learning, etc.
However, I am a bit confused about how to build a topic hierarchy, as I am getting more than 5 URIs for the subject property and many URIs for the broader property. Is there a way to measure strength (or something similar), reduce the additional URIs that I get from DBpedia, and assign only the most probable URI?
It seems there are two questions there.
How to limit the number of DBpedia Spotlight results.
How to limit the number of subjects and categories for a particular result.
My current code is as follows.
from SPARQLWrapper import SPARQLWrapper, JSON
import requests
import urllib.parse
## initial consts
BASE_URL = 'http://api.dbpedia-spotlight.org/en/annotate?text={text}&confidence={confidence}&support={support}'
TEXT = 'First documented in the 13th century, Berlin was the capital of the Kingdom of Prussia (1701–1918), the German Empire (1871–1918), the Weimar Republic (1919–33) and the Third Reich (1933–45). Berlin in the 1920s was the third largest municipality in the world. After World War II, the city became divided into East Berlin -- the capital of East Germany -- and West Berlin, a West German exclave surrounded by the Berlin Wall from 1961–89. Following German reunification in 1990, the city regained its status as the capital of Germany, hosting 147 foreign embassies.'
CONFIDENCE = '0.5'
SUPPORT = '120'
REQUEST = BASE_URL.format(
    text=urllib.parse.quote_plus(TEXT),
    confidence=CONFIDENCE,
    support=SUPPORT
)
HEADERS = {'Accept': 'application/json'}
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
all_urls = []
r = requests.get(url=REQUEST, headers=HEADERS)
response = r.json()
resources = response['Resources']
for res in resources:
    all_urls.append(res['@URI'])   # the Spotlight JSON prefixes keys with '@'

for url in all_urls:
    sparql.setQuery("""
        SELECT * WHERE { <""" + url + """> skos:broader|dct:subject ?resource }
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    for result in results["results"]["bindings"]:
        print('resource ---- ', result['resource']['value'])
I am happy to provide more examples if needed.
It seems you are trying to retrieve Wikipedia categories relevant to a given paragraph.
Minor suggestions
First, I'd suggest you perform a single request, collecting the DBpedia Spotlight results into a VALUES block, for example in this way:
values = '(<{0}>)'.format('>) (<'.join(all_urls))
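For instance, with two hypothetical resource URIs this produces a string that can be dropped straight into a VALUES block:

all_urls = ['http://dbpedia.org/resource/Berlin', 'http://dbpedia.org/resource/Prussia']   # hypothetical
values = '(<{0}>)'.format('>) (<'.join(all_urls))
print(values)   # (<http://dbpedia.org/resource/Berlin>) (<http://dbpedia.org/resource/Prussia>)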
Second, if you are talking about a topic hierarchy, you should use SPARQL 1.1 property paths.
These two suggestions are slightly incompatible: Virtuoso is very inefficient when a query contains both multiple starting points (i.e. VALUES) and arbitrary-length paths (i.e. the * and + operators).
Below I'm using the dct:subject/skos:broader property path, i.e. retrieving the 'next-level' categories.
Approach 1
The first way is to order the resources by their general popularity, e.g. their PageRank:
values = '(<{0}>)'.format('>) (<'.join(all_urls))
sparql.setQuery(
"""PREFIX vrank:<http://purl.org/voc/vrank#>
SELECT DISTINCT ?resource ?rank
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
VALUES (?s) {""" + values +
""" }
?s dct:subject/skos:broader ?resource .
?resource vrank:hasRank/vrank:rankValue ?rank.
} ORDER BY DESC(?rank)
LIMIT 10
""")
Results are:
dbc:Member_states_of_the_United_Nations
dbc:Country_subdivisions_of_Europe
dbc:Republics
dbc:Demography
dbc:Population
dbc:Countries_in_Europe
dbc:Third-level_administrative_country_subdivisions
dbc:International_law
dbc:Former_countries_in_Europe
dbc:History_of_the_Soviet_Union_and_Soviet_Russia
Approach 2
The second way is to calculate category frequency within the given text:
values = '(<{0}>)'.format('>) (<'.join(all_urls))
sparql.setQuery(
"""SELECT ?resource count(?resource) AS ?count WHERE {
VALUES (?s) {""" + values +
""" }
?s dct:subject ?resource
} GROUP BY ?resource
# https://github.com/openlink/virtuoso-opensource/issues/254
HAVING (count(?resource) > 1)
ORDER BY DESC(count(?resource))
LIMIT 10
""")
Results are:
dbc:Wars_by_country
dbc:Wars_involving_the_states_and_peoples_of_Europe
dbc:Wars_involving_the_states_and_peoples_of_Asia
dbc:Wars_involving_the_states_and_peoples_of_North_America
dbc:20th_century_in_Germany
dbc:Modern_history_of_Germany
dbc:Wars_involving_the_Balkans
dbc:Decades_in_Germany
dbc:Modern_Europe
dbc:Wars_involving_the_states_and_peoples_of_South_America
With dct:subject instead of dct:subject/skos:broader, results are better:
dbc:Former_polities_of_the_Cold_War
dbc:Former_republics
dbc:States_and_territories_established_in_1949
dbc:20th_century_in_Germany_by_period
dbc:1930s_in_Germany
dbc:Modern_history_of_Germany
dbc:1990_disestablishments_in_West_Germany
dbc:1933_disestablishments_in_Germany
dbc:1949_establishments_in_West_Germany
dbc:1949_establishments_in_Germany
Conclusion
The results are not very good. I see two reasons: DBpedia categories are quite random, and the tools are quite primitive. Perhaps it is possible to achieve better results by combining approaches 1 and 2. In any case, experiments with a large corpus are needed.
I'm trying to take a sentence and extract the relationship between Person(PER) and Place(GPE).
Sentence: "John is from Ohio, Michael is from Florida and Rebecca is from Nashville which is in Tennessee."
For the final person, she has both a city and a state that could get extracted as her place. So far, I've tried using nltk to do this, but have only been able to extract her city and not her state.
What I've tried:
import re
from nltk import ne_chunk, pos_tag, word_tokenize
from nltk.sem.relextract import extract_rels, rtuple
sentence = "John is from Ohio, Michael is from Florida and Rebecca is from Nashville which is in Tennessee."
chunked = ne_chunk(pos_tag(word_tokenize(sentence)))
ISFROM = re.compile(r'.*\bfrom\b.*')
rels = extract_rels('PER', 'GPE', chunked, corpus = 'ace', pattern = ISFROM)
for rel in rels:
    print(rtuple(rel))
My output is:
[PER: 'John/NNP'] 'is/VBZ from/IN' [GPE: 'Ohio/NNP']
[PER: 'Michael/NNP'] 'is/VBZ from/IN' [GPE: 'Florida/NNP']
[PER: 'Rebecca/NNP'] 'is/VBZ from/IN' [GPE: 'Nashville/NNP']
The problem is Rebecca. How can I extract that both Nashville and Tennessee are part of her location? Or even just Tennessee alone?
It seems to me that you first have to extract the intra-location relationship (Nashville is in Tennessee), and then transitively assign all containing locations to Rebecca (if Rebecca is from Nashville and Nashville is in Tennessee, then Rebecca is from both Nashville and Tennessee).
That would be one more relationship type and some logic for the above inference (things get complicated pretty quickly, but that is hard to avoid).
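A minimal sketch of that inference step, assuming the PER-GPE pairs and the GPE-GPE containment pairs have already been extracted (the dictionaries below are illustrative, hand-filled from the example sentence):

# PER -> set of GPEs, e.g. collected from the extract_rels output above
person_from = {"John": {"Ohio"}, "Michael": {"Florida"}, "Rebecca": {"Nashville"}}
# GPE -> containing GPE, e.g. from an "X which is in Y" pattern
located_in = {"Nashville": "Tennessee"}

# transitively add every containing place to each person's set of places
for person, places in person_from.items():
    frontier = list(places)
    while frontier:
        place = frontier.pop()
        parent = located_in.get(place)
        if parent and parent not in places:
            places.add(parent)
            frontier.append(parent)

print(person_from)   # {'John': {'Ohio'}, 'Michael': {'Florida'}, 'Rebecca': {'Nashville', 'Tennessee'}}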