Why can I not extract data from this dictionary through a loop? - python-3.x

I'm extracting values from data, which is of type dictionary.
import urllib3
import json

http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/BigData/main/fr-esr-principaux-etablissements-enseignement-superieur.json'
f = http.request('GET', url)
data = json.loads(f.data.decode('utf-8'))
data[0]["geometry"]["coordinates"]

geo = []
n = len(data)
for i in range(n):
    geo.append(data[i]["geometry"]["coordinates"])
It returns an error:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-26-52e67ffdcaa6> in <module>
     12 n = len(data)
     13 for i in range(n):
---> 14     geo.append(data[i]["geometry"]["coordinates"])
KeyError: 'geometry'
This is weird because when I run only data[0]["geometry"]["coordinates"], it returns [7.000275, 43.58554] without error.
Could you please elaborate on this issue?

The error occurs because a few of the response dictionaries don't have a "geometry" key.
Check that the "geometry" key exists in each response dict before appending to the geo list.
Try the following code:
import urllib3
import json

http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/BigData/main/fr-esr-principaux-etablissements-enseignement-superieur.json'
f = http.request('GET', url)
data = json.loads(f.data.decode('utf-8'))

geo = []
n = len(data)
for i in range(n):
    if "geometry" in data[i]:
        geo.append(data[i]["geometry"]["coordinates"])
print(geo)

I believe the problem is that there are entries in your data which do not have a "geometry" key. As a preliminary matter, your data structure is not technically a dictionary; it is a list of dictionaries. You can tell by running the print(type(data)) and print(type(data[0])) commands.
I took your code but added the following lines:
dataStructure = data[0]
print(type(dataStructure))

geo = []
n = len(data)
for i in range(n):
    try:
        geo.append(data[i]["geometry"]["coordinates"])
    except KeyError:
        print(i)
If you run this, you will see that at index positions 64 and 130 there is no "geometry" key. You may want to explore those entries specifically and see whether they should be removed from your data or whether you just need to look under a different key for those lines.
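A compact alternative (a minimal sketch, assuming each "geometry" value is itself a dict with a "coordinates" key, as in the entries above) is to filter with a comprehension, or use dict.get so missing keys never raise:
# Keep only the entries that actually carry a geometry.
geo = [d["geometry"]["coordinates"] for d in data if "geometry" in d]

# Or keep one slot per entry, with None where the key is missing.
geo_or_none = [d.get("geometry", {}).get("coordinates") for d in data]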

Related

chunked list throws 'zip argument #2 must support iteration' when creating multiple dicts

I have an issue converting a chunked list into multiple dictionaries in order to send my requests in batches:
fd = open(filename, 'r')
sqlFile = fd.read()
fd.close()
commands = sqlFile.split(';')
for command in commands:
    try:
        c = conn.cursor()
        c.execute(command)
        # Create a list with the query results in batches of size 100
        for batch in grouper(c.fetchall(), 100):
            # This is where the error occurs:
            result = [dict(zip([key[0] for key in c.description], i)) for i in batch]
            # TODO: Send the json with 100 items to API
    except RuntimeError:
        print('Error.')
The issue is that it only iterates through the batches once and gives the following error. The query actually returns 167 rows, so the first request should send 100 items and a second iteration should send the remaining 67.
TypeError: zip argument #2 must support iteration
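A likely cause (an assumption here, since grouper isn't shown in the question): if grouper is the well-known itertools recipe, it pads the last group to full length with a fillvalue of None, so the second batch of 167 rows ends with 33 None entries and zip receives None as its second argument. A minimal sketch that reproduces the error:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # The classic itertools recipe: collect data into fixed-length chunks.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

keys = ['a', 'b']
rows = [(1, 2)] * 167  # 167 rows, as in the question
for batch in grouper(rows, 100):
    # The second batch ends with 33 None fillers;
    # dict(zip(keys, None)) raises: zip argument #2 must support iteration
    result = [dict(zip(keys, i)) for i in batch]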
I solved the issue by making a dictionary right away with c.rowfactory = makeDictFactory(c):
def makeDictFactory(cursor):
    columnNames = [d[0] for d in cursor.description]
    def createRow(*args):
        return dict(zip(columnNames, args))
    return createRow

def getAndConvertDataFromDatabase(filename):
    fd = open(filename, 'r')
    sqlFile = fd.read()
    fd.close()
    commands = sqlFile.split(';')
    for command in commands:
        try:
            c = conn.cursor()
            c.execute(command)
            c.rowfactory = makeDictFactory(c)
            data = c.fetchall()
            for batch in [data[x:x+100] for x in range(0, len(data), 100)]:
                # Note: returning here exits the function after the first batch.
                return postBody(json.dumps(batch, default=myconverter), dataList[filename])
        except RuntimeError:
            print('Error.')

OSMNX Looping with geocoder

What would be the best way to loop through a list of addresses using geocoder.geocode if some locations don't exist on the map? How can I skip them so that the loop continues past this exception?
Exception: Nominatim geocoder returned no results for query "Beli manastir planina,Bakar,Croatia"
Below is what I have tried:
L = [location1, location2, location3, ..., location n]
KO = []
for l in L:
    KO = list(ox.geocoder.geocode(l))
    if Exception:
        continue
    KO.append(KO)
Also I tried this:
try:
    KO = []
    for l in L:
        KO = list(ox.geocoder.geocode(l))
        KO.append(KO)
except Exception:
    pass
Any help is appreciated.
Nest your for loop and try/except differently. Here's a minimal reproducible example:
import osmnx as ox
ox.config(log_console=True, use_cache=True)

locations = ['Pazinska,Zagreb,Croatia',
             'Pulska, Pazin,Croatia',
             'Zadarska,Zagreb,Croatia',
             'Pazinska,Pula,Croatia']

coords = []
for location in locations:
    try:
        coords.append(ox.geocoder.geocode(location))
    except Exception as e:
        print(e)
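The key difference from both original attempts is that the try/except now sits inside the loop, so each failed geocode is caught individually and the loop simply moves on to the next address instead of aborting.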

Python: How to obtain desired list?

I'm trying to learn Spark so I'm totally new to it.
I have a file with thousands of lines where each one is structured like:
LFPG;EDDW;00;E170;370;LFPG;EDDW;189930555;150907;1826;!!!!;AFR1724;AFR;0;AFR1724-LFPG-EDDW-20150907180000;N;0;;;245382;;;150907;1800;0;;X;;;;;;;;;;370;;0;20150907175700;AA45458743;;;;;NEXE;NEXE;;;;;20150907180000;;;;245382;;;;;;;;;;;;;;;;;;;;;;;;;;;;AFR;;;;;;;;;;;0
The above line represents flight information for an airplane: it took off from LFPG (1st element) and landed at EDDW (2nd element); the rest of the information is not relevant for our purpose.
I'd like to print, or save to a file, the top ten busiest airports based on the total number of aircraft movements, that is, airplanes that took off from or landed at the airport.
So in a sense, the desired output would be:
AIRPORT_NAME #TOTAL_MOVEMENTS #TAKE-OFFs #LANDINGS
I have already implemented this program in Python and would like to transform it into the Map/Reduce paradigm using Spark.
# Libraries
import sys
from collections import Counter, OrderedDict, defaultdict
from itertools import chain

# START
# Defining default program argument
if len(sys.argv) == 1:
    fileName = "airports.exp2"
else:
    fileName = sys.argv[1]

takeOffAirport = []
landingAirport = []

# Reading file
try:
    with open(fileName) as file:
        for line in file:
            words = line.split(';')
            # Relevant data, item1 and item2 from each file line
            origin = words[0]
            destination = words[1]
            # Populating lists
            landingAirport.append(destination)
            takeOffAirport.append(origin)
except IOError:
    print("\n\033[0;31mIoError: could not open the file:\033[00m %s" % fileName)

airports_dict = defaultdict(list)
# Merge lists into a dictionary key:value
for key, value in chain(Counter(takeOffAirport).items(),
                        Counter(landingAirport).items()):
    # 'AIRPORT_NAME':[num_takeOffs, num_landings]
    airports_dict[key].append(value)

# Sum key values and add it as another value
for key, value in airports_dict.items():
    # 'AIRPORT_NAME':[num_totalMovements, [num_takeOffs, num_landings]]
    airports_dict[key] = [sum(value), value]

# Sort dictionary by the top 10 total movements
airports_dict = sorted(airports_dict.items(),
                       key=lambda kv: kv[1], reverse=True)[:10]
airports_dict = OrderedDict(airports_dict)

# Print results
print("\nAIRPORT" + "\t\t#TOTAL_MOVEMENTS" + "\t#TAKEOFFS" + "\t#LANDINGS")
for k in airports_dict:
    print(k, "\t\t", airports_dict[k][0],
          "\t\t\t", airports_dict[k][1][0],   # take-offs (the original printed [1][1] here, swapped vs the header)
          "\t\t", airports_dict[k][1][1])     # landings
A test file can be downloaded from: https://srv-file7.gofile.io/download/YCnWxr/traffic1day.exp2
So far I've been able to get the first and second elements from each line, but I don't know how to implement the filter or reduce steps to count how many times each airport appears in each list, and then merge both lists so that each airport name carries its total movements together with its separate take-off and landing counts.
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    conf = SparkConf().setAppName("airports").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    airports = sc.textFile("traffic1hour.exp2", minPartitions=4)
    airports = airports.map(lambda line: line.split('\n'))
    takeOff_airports = airports.map(lambda sub: (sub[0].split(';')[0]))
    landing_airports = airports.map(lambda sub: (sub[0].split(';')[1]))

    takeOff_airports.saveAsTextFile("takeOff_airports.txt")
    landing_airports.saveAsTextFile("landing_airport.txt")
Any hint or guide will be much appreciated.
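One way to finish this in the Map/Reduce style (a sketch under the question's assumptions about the file format, not a verified solution): emit one (airport, (take-offs, landings)) pair per movement, sum the pairs with reduceByKey, then take the top ten by total movements.
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    conf = SparkConf().setAppName("airports").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("traffic1hour.exp2", minPartitions=4)

    # One (airport, (take-offs, landings)) pair per movement:
    # the origin gets a take-off, the destination gets a landing.
    pairs = lines.flatMap(lambda line: [
        (line.split(';')[0], (1, 0)),
        (line.split(';')[1], (0, 1)),
    ])

    # Sum take-offs and landings per airport.
    totals = pairs.reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))

    # Top ten airports by total movements (take-offs + landings).
    top10 = totals.takeOrdered(10, key=lambda kv: -(kv[1][0] + kv[1][1]))

    print("AIRPORT\t#TOTAL_MOVEMENTS\t#TAKEOFFS\t#LANDINGS")
    for airport, (takeoffs, landings) in top10:
        print(airport, takeoffs + landings, takeoffs, landings, sep='\t')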

Object has no attribute error with Python 3

I have an error when trying to call the calculate_similarity2 function, which is in the DocSim.py file, from my notebook.
The error message is: 'DocSim' object has no attribute 'calculate_similarity2'
Here is the content of my docsim file:
import numpy as np

class DocSim(object):
    def __init__(self, w2v_model, stopwords=[]):
        self.w2v_model = w2v_model
        self.stopwords = stopwords

    def vectorize(self, doc):
        """Identify the vector values for each word in the given document"""
        doc = doc.lower()
        words = [w for w in doc.split(" ") if w not in self.stopwords]
        word_vecs = []
        for word in words:
            try:
                vec = self.w2v_model[word]
                word_vecs.append(vec)
            except KeyError:
                # Ignore, if the word doesn't exist in the vocabulary
                pass
        # Assuming that document vector is the mean of all the word vectors
        # PS: There are other & better ways to do it.
        vector = np.mean(word_vecs, axis=0)
        return vector

    def _cosine_sim(self, vecA, vecB):
        """Find the cosine similarity distance between two vectors."""
        csim = np.dot(vecA, vecB) / (np.linalg.norm(vecA) * np.linalg.norm(vecB))
        if np.isnan(np.sum(csim)):
            return 0
        return csim

    def calculate_similarity(self, source_doc, target_docs=[], threshold=0):
        """Calculates & returns similarity scores between given source document & all
        the target documents."""
        if isinstance(target_docs, str):
            target_docs = [target_docs]
        source_vec = self.vectorize(source_doc)
        results = []
        for doc in target_docs:
            target_vec = self.vectorize(doc)
            sim_score = self._cosine_sim(source_vec, target_vec)
            if sim_score > threshold:
                results.append({
                    'score': sim_score,
                    'sentence': doc
                })
        # Sort results by score in desc order
        results.sort(key=lambda k: k['score'], reverse=True)
        return results

    def calculate_similarity2(self, source_doc=[], target_docs=[], threshold=0):
        """Calculates & returns similarity scores between the given source documents & all
        the target documents."""
        if isinstance(source_doc, str):
            source_doc = [source_doc]  # fixed: the original assigned this to target_docs
        if isinstance(target_docs, str):
            target_docs = [target_docs]
        results = []
        for doc in source_doc:
            source_vec = self.vectorize(doc)
            for doc1 in target_docs:
                target_vec = self.vectorize(doc1)  # fixed: the original vectorized doc again
                sim_score = self._cosine_sim(source_vec, target_vec)
                if sim_score > threshold:
                    results.append({
                        'score': sim_score,
                        'source sentence': doc,
                        'target sentence': doc1
                    })
        # Sort results by score in desc order
        results.sort(key=lambda k: k['score'], reverse=True)
        return results
Here is the code where I try to call the function.
To create the DocSim object:
ds = DocSim(word2vec_model, stopwords=stopwords)
sim_scores = ds.calculate_similarity2(source_doc, target_docs)
The error message is:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-54-bb0bd1e0e0ad> in <module>()
----> 1 sim_scores = ds.calculate_similarity2(source_doc, target_docs)
AttributeError: 'DocSim' object has no attribute 'calculate_similarity2'
I don't understand how to resolve this problem. I can access every function except calculate_similarity2.
Can you help me please?
Thanks
You have defined the calculate_similarity2 function inside the __init__ scope. Try moving it out of there so it is defined at class level.
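In other words (a minimal sketch of the indentation problem, assuming the file on disk nests the method inside __init__):
class DocSim(object):
    def __init__(self, w2v_model, stopwords=[]):
        self.w2v_model = w2v_model
        self.stopwords = stopwords

        # Wrong: nested inside __init__, this is only a local function,
        # so ds.calculate_similarity2(...) raises AttributeError.
        def calculate_similarity2(source_doc=[], target_docs=[], threshold=0):
            ...

class DocSimFixed(object):
    def __init__(self, w2v_model, stopwords=[]):
        self.w2v_model = w2v_model
        self.stopwords = stopwords

    # Right: defined at class level, at the same indentation as __init__,
    # so it becomes a bound method on every instance.
    def calculate_similarity2(self, source_doc=[], target_docs=[], threshold=0):
        ...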

How to load from JSON?

I'm trying to get information from a web API with Python 3, but it gives me an error. This is my code:
import json, urllib.request, requests

def findLocation():
    """returns latlng location value in the form of a list."""
    send_url = 'http://freegeoip.net/json'
    r = requests.get(send_url)
    j = json.loads(r.text)
    lat = j['latitude']
    lon = j['longitude']
    return lat, lon

location = findLocation()
print(findLocation()[0])
print(findLocation()[1])

def readJsonUrl(url):
    """reads the Json returned by the google api and converts it into a format
    that can be used in python."""
    page = urllib.request.urlopen(url)
    data_bytes = page.read()
    data_str = data_bytes.decode("utf-8")
    page.close()
    return data_str

search = readJsonUrl("https://maps.googleapis.com/maps/api/place/textsearch/json?query=indian+restaurantsin+Coventry&location=52.4066,-1.5122&key=AIzaSyCI8n1sI4CDRnsYo3hB_oH1trfxbt2IEaw")
print(search['website'])
Error:
Traceback (most recent call last):
File "google api.py", line 28, in <module>
print(search['website'])
TypeError: string indices must be integers
Any help is appreciated.
The function you are using, readJsonUrl(), returns a string, not JSON. Therefore, when you try search['website'], it fails because the indices of a string can only be integers.
Try parsing the string value to a JSON object. To do this you can follow the accepted answer here: Convert string to JSON using Python
data_str is a string (not in dict format). You should convert data_str to a dict! Just add this line to your code: convert_to_dict = json.loads(data_str), then return convert_to_dict ... and done.
Try this:
import json, urllib.request, requests

def findLocation():
    send_url = 'http://freegeoip.net/json'
    r = requests.get(send_url)
    j = json.loads(r.text)
    lat = j['latitude']
    lon = j['longitude']
    return lat, lon

location = findLocation()
print(findLocation()[0])
print(findLocation()[1])

def readJsonUrl(url):
    page = urllib.request.urlopen(url)
    data_bytes = page.read()
    data_str = data_bytes.decode("utf-8")
    page.close()
    convert_to_dict = json.loads(data_str)  # new line
    return convert_to_dict  # updated

search = readJsonUrl("https://maps.googleapis.com/maps/api/place/textsearch/json?query=indian+restaurantsin+Coventry&location=52.4066,-1.5122&key=AIzaSyCI8n1sI4CDRnsYo3hB_oH1trfxbt2IEaw")
print(search['your_key'])  # now you can call your keys
The reason for the TypeError: string indices must be integers is that your readJsonUrl function returns a str object instead of a dict object. The json.loads function can convert the str object to a dict object.
You can try something like the following:
def readJsonUrl(url):
    with urllib.request.urlopen(url) as page:
        raw = page.read().decode("utf-8")
        json_data = json.loads(raw)
    return json_data

search = readJsonUrl("https://maps.googleapis.com/maps/api/place/textsearch/json?query=indian+restaurantsin+Coventry&location=52.4066,-1.5122&key=AIzaSyCI8n1sI4CDRnsYo3hB_oH1trfxbt2IEaw")
print(search['results'])
Hope it helps.
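One more simplification worth knowing (a side note, assuming the same URL as above; the helper name read_json_url is illustrative): since requests is already imported, its Response.json() method does the read, decode, and json.loads steps in one call.
import requests

def read_json_url(url):
    # Fetch a URL and parse the JSON response body into Python objects.
    r = requests.get(url)
    r.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return r.json()       # equivalent to json.loads(r.text)

search = read_json_url("https://maps.googleapis.com/maps/api/place/textsearch/json?query=indian+restaurantsin+Coventry&location=52.4066,-1.5122&key=AIzaSyCI8n1sI4CDRnsYo3hB_oH1trfxbt2IEaw")
print(search['results'])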
