How to load from Json?

How to load from Json? - python-3.x

I'am trying to get the information from the web api with Python 3 but it gives me an error. That's my code:
import json, urllib.request, requests
def findLocation():
"""returns latlng location value in the form of a list."""
send_url = 'http://freegeoip.net/json'
r = requests.get(send_url)
j = json.loads(r.text)
lat = j['latitude']
lon = j['longitude']
return lat, lon
location = findLocation()
print(findLocation()[0])
print(findLocation()[1])
def readJsonUrl(url):
"""reads the Json returned by the google api and converts it into a format
that can be used in python."""
page = urllib.request.urlopen(url)
data_bytes = page.read()
data_str = data_bytes.decode("utf-8")
page.close()
return data_str
search =readJsonUrl("https://maps.googleapis.com/maps/api/place/textsearch/json?query=indian+restaurantsin+Coventry&location=52.4066,-1.5122&key=AIzaSyCI8n1sI4CDRnsYo3hB_oH1trfxbt2IEaw")
print(search['website'])
Error:
Traceback (most recent call last):
File "google api.py", line 28, in <module>
print(search['website'])
TypeError: string indices must be integers
Any help is appreciated.

The function you are using readJsonUrl() returns a string not JSON. Therefore, when you try search['website'] it fails because the indices on a string can only be integers.
Try parsing the string value to a JSON object. To do this you can try the accepted answer here Convert string to JSON using Python

data_str is string ( not dict format ) You should convert data_str to dict! just add this line to your code : convert_to_dict = json.loads(data_str). then return convert_to_dict ... and done.
Try this :
import json, urllib.request, requests
def findLocation():
send_url = 'http://freegeoip.net/json'
r = requests.get(send_url)
j = json.loads(r.text)
lat = j['latitude']
lon = j['longitude']
return lat, lon
location = findLocation()
print(findLocation()[0])
print(findLocation()[1])
def readJsonUrl(url):
page = urllib.request.urlopen(url)
data_bytes = page.read()
data_str = data_bytes.decode("utf-8")
page.close()
convert_to_dict = json.loads(data_str) # new line
return convert_to_dict # updated
search = readJsonUrl("https://maps.googleapis.com/maps/api/place/textsearch/json?query=indian+restaurantsin+Coventry&location=52.4066,-1.5122&key=AIzaSyCI8n1sI4CDRnsYo3hB_oH1trfxbt2IEaw")
print(search['your_key']) # now you can call your keys

The reason for for the TypeError: string indices must be integers is because your readJsonUrl function is returning str object instead of dict object. Using the json.loads function can help you transfer the str object to dict object.
You can try something like the following:
def readJsonUrl(url):
with (urllib.request.urlopen(url)) as page:
raw = page.read().decode("utf-8")
json_data = json.loads(raw)
return json_data
search =readJsonUrl("https://maps.googleapis.com/maps/api/place/textsearch/json?query=indian+restaurantsin+Coventry&location=52.4066,-1.5122&key=AIzaSyCI8n1sI4CDRnsYo3hB_oH1trfxbt2IEaw")
print(search['results'])
Hope it helps.

Related

Why substring cannot be found in the target string?

To understand the values of each variable, I improved a script for replacement from Udacity class. I convert the codes in a function into regular codes. However, my codes do not work while the codes in the function do. I appreciate it if anyone can explain it. Please pay more attention to function "tokenize".
Below codes are from Udacity class (CopyRight belongs to Udacity).
# download necessary NLTK data
import nltk
nltk.download(['punkt', 'wordnet'])
# import statements
import re
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
url_regex = 'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
def load_data():
df = pd.read_csv('corporate_messaging.csv', encoding='latin-1')
df = df[(df["category:confidence"] == 1) & (df['category'] != 'Exclude')]
X = df.text.values
y = df.category.values
return X, y
def tokenize(text):
detected_urls = re.findall(url_regex, text) # here, "detected_urls" is a list for sure
for url in detected_urls:
text = text.replace(url, "urlplaceholder") # I do not understand why it can work while does not work in my code if I do not convert it to string
tokens = word_tokenize(text)
lemmatizer = WordNetLemmatizer()
clean_tokens = []
for tok in tokens:
clean_tok = lemmatizer.lemmatize(tok).lower().strip()
clean_tokens.append(clean_tok)
return clean_tokens
X, y = load_data()
for message in X[:5]:
tokens = tokenize(message)
print(message)
print(tokens, '\n')
Below is its output:
I want to understand the variables' values in function "tokenize()". Following are my codes.
X, y = load_data()
detected_urls = []
for message in X[:5]:
detected_url = re.findall(url_regex, message)
detected_urls.append(detected_url)
print("detected_urs: ",detected_urls) #output a list without problems
# replace each url in text string with placeholder
i = 0
for url in detected_urls:
text = X[i].strip()
i += 1
print("LN1.url= ",url,"\ttext= ",text,"\n type(text)=",type(text))
url = str(url).strip() #if I do not convert it to string, it is a list. It does not work in text.replace() below, but works in above function.
if url in text:
print("yes")
else:
print("no") #always show no
text = text.replace(url, "urlplaceholder")
print("\nLN2.url=",url,"\ttext= ",text,"\n type(text)=",type(text),"\n===============\n\n")
The output is shown below.
The outputs for "LN1" and "LN2" are same. The "if" condition always output "no". I do not understand why it happens.
Any further help and advice would be highly appreciated.

Why I can not extract data from this dictionary though a loop?

I'm extracting from data which is of type dictionary.
import urllib3
import json
http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/BigData/main/fr-esr-principaux-etablissements-enseignement-superieur.json'
f = http.request('GET', url)
data = json.loads(f.data.decode('utf-8'))
data[0]["geometry"]["coordinates"]
geo = []
n = len(data)
for i in range(n):
geo.append(data[i]["geometry"]["coordinates"])
It returns an error
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-26-52e67ffdcaa6> in <module>
12 n = len(data)
13 for i in range(n):
---> 14 geo.append(data[i]["geometry"]["coordinates"])
KeyError: 'geometry'
This is weird, because, when I only run data[0]["geometry"]["coordinates"], it returns [7.000275, 43.58554] without error.
Could you please elaborate on this issue?

Error is occuring because in few of the response dictionaries you don't habe "geometry" key.
Check before appending to geo list, that "geometry" key exists in response dict.
Try following code.
import urllib3
import json
http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/BigData/main/fr-esr-principaux-etablissements-enseignement-superieur.json'
f = http.request('GET', url)
data = json.loads(f.data.decode('utf-8'))
geo = []
n = len(data)
for i in range(n):
if "geometry" in data[i]:
geo.append(data[i]["geometry"]["coordinates"])
print(geo)

I believe the problem is that there are places in your data which do not have a "geography" key. As a preliminary matter, your data structure is not technically a dictionary. It is a 'list' of 'dictionaries'. You can tell that by using the print(type(data)) and print(type(data[0]) commands.
I took your code but added the following lines:
dataStructure = data[0]
print(type(dataStructure))
geo = []
n = len(data)
for i in range(321):
try:
geo.append(data[i]["geometry"]["coordinates"])
except:
print(i)
If you run this, you will see that at index positions 64 and 130, there is no geometry key. You may want to explore those entries specifically and see whether they should be removed from your data or whether you just need to alter the keyword to something else for those lines.

Read a CSV from Google Cloud Storage using Google Cloud Functions in Python script

I'm new in GCP and I'm trying to do a simple API with Cloud Functions. This API needs to read a CSV from Google Cloud Storage bucket and return a JSON. To do this, in my local I can run normally, open a file.
But in Cloud Functions, I received a blob from bucket, and don know how manipulate this, I'm receiving error
I try convert blob to Bytes and to string but i don't know exactly how do it
Code working in my local env:
data1 = '2019-08-20'
data1 = datetime.datetime.strptime(data1, '%Y-%m-%d')
data2 = '2019-11-21'
data2 = datetime.datetime.strptime(data2, '%Y-%m-%d')
with open("/home/thiago/mycsvexample.csv", "r") as fin:
#create a CSV dictionary reader object
print(type(fin))
csv_dreader = csv.DictReader(fin)
#iterate over all rows in CSV dict reader
for row in csv_dreader:
#check for invalid Date values
#convert date string to a date object
date = datetime.datetime.strptime(row['date'], '%Y-%m-%d')
#check if date falls within requested range
if date >= data1 and date <= data2:
total = total + float(row['total'])
print(total)
Code in Google Cloud Functions:
import csv, datetime
from google.cloud import storage
from io import BytesIO
def get_orders(request):
"""Responds to any HTTP request.
Args:
request (flask.Request): HTTP request object.
Returns:
The response text or any set of values that can be turned into a
Response object using
`make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
"""
request_json = request.get_json()
if request.args and 'token' in request.args:
if request.args['token'] == 'mytoken888888':
client = storage.Client()
bucket = client.get_bucket('mybucketgoogle.appspot.com')
blob = bucket.get_blob('mycsvfile.csv')
byte_stream = BytesIO()
blob.download_to_file(byte_stream)
byte_stream.seek(0)
file = byte_stream
#with open(BytesIO(blob), "r") as fin:
#create a CSV dictionary reader object
csv_dreader = csv.DictReader(file)
#iterate over all rows in CSV dict reader
for row in csv_dreader:
#check for invalid Date values
date = datetime.datetime.strptime(row['date'], '%Y-%m-%d')
#check if date falls within requested range
if date >= datetime.datetime.strptime(request.args['start_date']) and date <= datetime.datetime.strptime(request.args['end_date']):
total = total + float(row['total'])
dict = {'total_faturado' : total}
return dict
else:
return f'Passe parametros corretos'
else:
return f'Passe parametros corretos'
Error in Google Cloud Functions:
Traceback (most recent call last): File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 346, in run_http_function result = _function_handler.invoke_user_function(flask.request) File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 217, in invoke_user_function return call_user_function(request_or_event) File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 210, in call_user_function return self._user_function(request_or_event) File "/user_code/main.py", line 31, in get_orders_tramontina for row in csv_dreader: File "/opt/python3.7/lib/python3.7/csv.py", line 111, in __next__ self.fieldnames File "/opt/python3.7/lib/python3.7/csv.py", line 98, in fieldnames self._fieldnames = next(self.reader) _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
I try do some other things but no sucess...
Someone can help me with this blob, to convert this or manipulate with the right way?
Thank you all

This is the code that worked for me:
from google.cloud import storage
import csv
client = storage.Client()
bucket = client.get_bucket('source')
blob = bucket.blob('file')
dest_file = '/tmp/file.csv'
blob.download_to_filename(dest_file)
dict = {}
total = 0
with open(dest_file) as fh:
# assuming your csv is del by comma
rd = csv.DictReader(fh, delimiter=',')
for row in rd:
date = datetime.datetime.strptime(row['date'], '%Y-%m-%d')
#check if date falls within requested range
if date >= datetime.datetime.strptime(request.args['start_date']) and date <= datetime.datetime.strptime(request.args['end_date']):
total = total + float(row['total'])
dict['total_faturado'] = total

I'm able to do this too using a library gcsfs
https://gcsfs.readthedocs.io/en/latest/
def get_orders_tramontina(request):
"""Responds to any HTTP request.
Args:
request (flask.Request): HTTP request object.
Returns:
The response text or any set of values that can be turned into a
Response object using
`make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
"""
request_json = request.get_json()
if request.args and 'token' in request.args:
if request.args['token'] == 'mytoken':
fs = gcsfs.GCSFileSystem(project='myproject')
total = 0
with fs.open('mybucket.appspot.com/mycsv.csv', "r") as fin:
csv_dreader = csv.DictReader(fin)
#iterate over all rows in CSV dict reader
for row in csv_dreader:
#check for invalid Date values
date = datetime.datetime.strptime(row['date'], '%Y-%m-%d')
#check if date falls within requested range
if date >= datetime.datetime.strptime(request.args['start_date'], '%Y-%m-%d') and date <= datetime.datetime.strptime(request.args['end_date'], '%Y-%m-%d'):
total = total + float(row['total'])
dict = {'total_faturado' : total}
return json.dumps(dict)```

Try to download file as string, that way you can check for invalid data values, and eventually write that to a file.
change blob.download_to_file(byte_stream) to my_blob_str = blob.download_as_string()
I think your actual problem is byte_stream = BytesIO() since your output reads iterator should return strings, not bytes (did you open the file in text mode?)
It is expecting a string, but gets bytes. What is the purpose of byte_stream? If random, just remove it.

object has no attribute error with python3

I have a error when trying to call calculate_similarity2 function which in in DocSim.py file from my notebook.
The error message is : 'DocSim' object has no attribute 'calculate_similarity2'
Here the content of my docsim File :
import numpy as np
class DocSim(object):
def __init__(self, w2v_model , stopwords=[]):
self.w2v_model = w2v_model
self.stopwords = stopwords
def vectorize(self, doc):
"""Identify the vector values for each word in the given document"""
doc = doc.lower()
words = [w for w in doc.split(" ") if w not in self.stopwords]
word_vecs = []
for word in words:
try:
vec = self.w2v_model[word]
word_vecs.append(vec)
except KeyError:
# Ignore, if the word doesn't exist in the vocabulary
pass
# Assuming that document vector is the mean of all the word vectors
# PS: There are other & better ways to do it.
vector = np.mean(word_vecs, axis=0)
return vector
def _cosine_sim(self, vecA, vecB):
"""Find the cosine similarity distance between two vectors."""
csim = np.dot(vecA, vecB) / (np.linalg.norm(vecA) * np.linalg.norm(vecB))
if np.isnan(np.sum(csim)):
return 0
return csim
def calculate_similarity(self, source_doc, target_docs=[], threshold=0):
"""Calculates & returns similarity scores between given source document & all
the target documents."""
if isinstance(target_docs, str):
target_docs = [target_docs]
source_vec = self.vectorize(source_doc)
results = []
for doc in target_docs:
target_vec = self.vectorize(doc)
sim_score = self._cosine_sim(source_vec, target_vec)
if sim_score > threshold:
results.append({
'score' : sim_score,
'sentence' : doc
})
# Sort results by score in desc order
results.sort(key=lambda k : k['score'] , reverse=True)
return results
def calculate_similarity2(self, source_doc=[], target_docs=[], threshold=0):
"""Calculates & returns similarity scores between given source document & all the target documents."""
if isinstance(source_doc, str):
target_docs = [source_doc]
if isinstance(target_docs, str):
target_docs = [target_docs]
#source_vec = self.vectorize(source_doc)
results = []
for doc in source_doc:
source_vec = self.vectorize(doc)
for doc1 in target_docs:
target_vec = self.vectorize(doc)
sim_score = self._cosine_sim(source_vec, target_vec)
if sim_score > threshold:
results.append({
'score' : sim_score,
'source sentence' : doc,
'target sentence' : doc1
})
# Sort results by score in desc order
results.sort(key=lambda k : k['score'] , reverse=True)
return results
here in instruction code when i try to call the fucntion :
To create DocSim Object
ds = DocSim(word2vec_model,stopwords=stopwords)
sim_scores = ds.calculate_similarity2(source_doc, target_docs)
the error message is :
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-54-bb0bd1e0e0ad> in <module>()
----> 1 sim_scores = ds.calculate_similarity2(source_doc, target_docs)
AttributeError: 'DocSim' object has no attribute 'calculate_similarity2'
i don't undersantand how to resolve this problem.
I can access to all function except calculate_similarity2
Can you help me please?
thanks

You have defined the calculate_similarity2 function inside the __init__ scope. Try getting it out of there

KeyError: 'lattitude'

I'm currently trying to use an API to get the data in Buffalo and return it in from a JSON URL and them place it in the format of: longitude, Latitude, and Viodesc.
However, I believe I am reaching difficulties when iterating due to some values not having latitude and longitude thus giving me a KeyError of 'latitude'.
I'm not sure if this is the fault in my code as well as how to go about changing it
import json
from urllib import request
def get_ticket_data(string):
answer = []
urlData = string
webURL = request.urlopen(urlData)
data = webURL.read()
ans = json.loads(data.decode())
for x in ans:
arr = []
arr.append(x["lattitude"])
arr.append(x["longtitude"])
arr.append(x["viodesc"])
return answer.append(ans)

You can catch the Exception 'KeyError' which is raised when the particular key is not found. Handle the exception so that even if the key is missing you can move on to the next record without stopping the code.
Code Snippet:
import json
from urllib import request
def get_ticket_data(string):
answer = []
urlData = string
webURL = request.urlopen(urlData)
data = webURL.read()
ans = json.loads(data.decode())
for x in ans:
try:
arr = []
arr.append(x["lattit**strong text**ude"])
arr.append(x["longtitude"])
arr.append(x["viodesc"])
except KeyError:
continue
return answer.append(ans)
Hope it helps!

Another different attempt would be to check before appending:
import json
from urllib import request
def get_ticket_data(string):
answer = []
urlData = string
webURL = request.urlopen(urlData)
data = webURL.read()
ans = json.loads(data.decode())
for x in ans:
arr = []
arr.append(x["lattit**strong text**ude"]) if x["lattit**strong text**ude""] else pass
arr.append(x["longtitude"]) if x["longitude"] else pass
arr.append(x["viodesc"]) if x["viodesc"] else pass
return answer.append(ans)
Using the inline-if will let you append if the value exists other wise it would not append.
It will all depend on how you will treat the information latter. Another approach would be to fill it with "" in case there is no latitude. For this approach you could do:
import json
from urllib import request
def get_ticket_data(string):
answer = []
urlData = string
webURL = request.urlopen(urlData)
data = webURL.read()
ans = json.loads(data.decode())
for x in ans:
arr = []
arr.append(x["lattit**strong text**ude"]) if x["lattit**strong text**ude""] else ""
arr.append(x["longtitude"]) if x["longitude"] else ""
arr.append(x["viodesc"]) if x["viodesc"] else ""
return answer.append(ans)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to load from Json? - python-3.x

Related

Why substring cannot be found in the target string?

Why I can not extract data from this dictionary though a loop?

Read a CSV from Google Cloud Storage using Google Cloud Functions in Python script

object has no attribute error with python3

KeyError: 'lattitude'

Categories

Resources