Parsing nested objects Python 3.4 - python-3.x

I am trying to extract user reviews linked to a business from the google places API.
I am using the requests library :
import requests
r = requests.get('https://maps.googleapis.com/maps/api/place/details/json?placeid=ChIJlc_6_jM4DW0RQUUtaQj2_lk&key=AIzaSyBuS4meH_HW3FO1cpUaCm6jbqzRCWe7mjc')
json_data_dic = r.json()
print(json_data_dic)
So I get the json object converted to a python object for me to parse and extract my user ratings & reviews .
I get a "lump" of text back (see below). As someone new to python/coding, all I see is a tangle of nested dictionaries and lists. In situations where there is nesting- how do I refer to an items such as 'rating', or the 'text' of the review?
Any guidance would be appreciated.
{
"html_attributions": [],
"result": {
"reviews": [
{
"relative_time_description": "in the last week",
"profile_photo_url": "//lh6.googleusercontent.com/-06-3qCU8jEg/AAAAAAAAAAI/AAAAAAAAACc/a1z-ga9rOhs/photo.jpg",
"rating": 1,
"time": 1481050128,
"author_name": "Tass Wilson",
"text": "Worse company I've had the pleasure of dealing with. They don't follow through on what they say, their technical support team knows jack all. They are unreliable and lie. The only good part about their entire team is the sales team and thats to get you in the door, signed up and committed to a 12 month contract so they can then screw you over many times without taking you out to dinner first\n\nI would literally rather go back to using smoke signals and frikkin carrier pigeons then use Orcon again as an internet provider",
"aspects": [
{
"rating": 0,
"type": "overall"
}
],
"language": "en",
"author_url": "https://www.google.com/maps/contrib/116566965301711692941/reviews"
},
{
"relative_time_description": "3 weeks ago",
"rating": 1,
"time": 1479258972,
"author_name": "Anne-Marie Petersen",
"text": "I have experienced nothing but pain with them - from the start to (almost the end) in fact so I could skip the 5 days without internet I have had my other provider set up my fibre and just cut the loss of the extra month as the final bad money I will ever have to pay them. I called them to ask why I hadn't been offered an upgrade to fibre - I was told I wasn't eligible for fibre due to bad credit rating. I flipped out. Namely because I have a good credit score - it's something I check regularly. So I said well how do you know - they gave me the number of the company they use so I could call them. I hung up, called the number - it was the YELLOW PAGES number. I call back, I get given the same number (same person answered) I am seeing RED by this point, so I say just give me the name of the company. I find the number myself - I then call them only to be told they don't even work with Orcon. Then the guy offers to do a quick scan of the system to see if my name is in then. Doesn't even appear. Round and round the mulberry bush - I called another company and finally have had my fibre installed and everything ago. I still have no idea how to use the extra remote they've given me but the internet is fabulous. Oh - and I also got sick of every time something was wrong it was always MY fault even though I knew they would go offline and fix something. I used to WORK at a telco company let me tell you I get the system. Finally I have to send the modem back but I've already been advised to take it into their head office and take a photo of myself handing it in because they have on numerous occasions said to people that they've never received the modem even though they had.... why the hell are they still even a company????\n\nUPDATE>> got sent two cancellation notices - neither of which were for accounts that I had power over apparently. Have taken to twitter to have a public forum so they can't get me on not returning my modem.",
"aspects": [
{
"rating": 0,
"type": "overall"
}
],
"language": "en",
"author_url": "https://www.google.com/maps/contrib/113533966859873380749/reviews"
},
{
"relative_time_description": "4 months ago",
"profile_photo_url": "//lh4.googleusercontent.com/-lAEJbEHaIoE/AAAAAAAAAAI/AAAAAAAAEFo/IATRvjK2Oak/photo.jpg",
"rating": 1,
"time": 1469481312,
"author_name": "Keith Rankine",
"text": "Everything works well until you try to cancel your account. Do not be fooled into thinking you cannot give them notice to cancel within your contract term. They will try everything in their power to squeeze an extra month from you. \nI had a 12 month contract and informed them of my wish to cancel on the anniversary on that sign up date. All their emails were carefully worded to imply that this could not be done. If read carefully, and you argue, they will agree to it.",
"aspects": [
{
"rating": 0,
"type": "overall"
}
],
"language": "en",
"author_url": "https://www.google.com/maps/contrib/115729563512218272075/reviews"
},
{
"relative_time_description": "a month ago",
"rating": 1,
"time": 1476082876,
"author_name": "Wayne Furlong",
"text": "Completely useless. Dishonest, lazy and downright incompetent. Corporate bullies. I'm so much happier with Bigpipe.",
"aspects": [
{
"rating": 0,
"type": "overall"
}
],
"language": "en",
"author_url": "https://www.google.com/maps/contrib/113527219809275504416/reviews"
},
{
"relative_time_description": "3 months ago",
"rating": 1,
"time": 1471292986,
"author_name": "Shaun b",
"text": "Recently upgraded to \"unlimited\" Fibre with Orcon. Most mornings (5-9) we have a limited wired or wireless connection. Too often (as is the case this morning) we have no internet (so while at home we have to use phone data). This is on Wellington's cbd area. Their customer service is such that a reply could take upwards of 4 weeks. I intend to change provider.",
"aspects": [
{
"rating": 0,
"type": "overall"
}
],
"language": "en",
"author_url": "https://www.google.com/maps/contrib/101110905108291593535/reviews"
}
],
"utc_offset": 780,
"adr_address": "<span class=\"street-address\">1 The Strand</span>, <span class=\"extended-address\">Takapuna</span>, <span class=\"locality\">Auckland</span> <span class=\"postal-code\">0622</span>, <span class=\"country-name\">New Zealand</span>",
"photos": [
{
"html_attributions": [
"Orcon"
],
"height": 877,
"photo_reference": "CoQBdwAAAO0RRplNcUkeQUxtJLTNk3uAOTadHfKZQ8g2NMa6XLRmGX2oKdUHItfnKZP0CG2WwIj198PwzfDRJpZIw4M1wSENCEOD9mFjITSwWTMjHkw1PzHb9teT6vuuROxcCdH-fwCYp0tkeBc75R8RHb2drPbTk-NN_5q88jkJTfNwdZQDEhB-25Az9550mGd00B-zK-LRGhQpTusm33tZBFXA1952txiuAUsgQA",
"width": 878
}
],
"id": "a7c161a7081101d8897c2dd2fb41fa94b812b050",
"scope": "GOOGLE",
"vicinity": "1 The Strand, Takapuna, Auckland",
"international_phone_number": "+64 800 131 415",
"url": "https://maps.google.com/?cid=6484891029444838721",
"types": [
"point_of_interest",
"establishment"
],
"name": "Orcon",
"rating": 1.8,
"geometry": {
"viewport": {
"northeast": {
"lat": -36.7889585,
"lng": 174.77310355
},
"southwest": {
"lat": -36.7890697,
"lng": 174.77302975
}
},
"location": {
"lat": -36.78901820000001,
"lng": 174.7730694
}
},
"place_id": "ChIJlc_6_jM4DW0RQUUtaQj2_lk",
"formatted_address": "1 The Strand, Takapuna, Auckland 0622, New Zealand",
"reference": "CmRRAAAAD0loSIVYDAuRKLbv5Cp6ZM_jxKHbzJ7EOrDLakY1PAlmq5YDTJ82A4qzWje0ILFv3lsEdaUpCtkuVHuOxXW6so5yqxDSfkgEnXbzd84jtfItuxis7Izu-y87vwkD7JO4EhBZB6aIdHSchBT6_USM5B5VGhTTRgmnDKndDt6amWnPkXw-57-Pww",
"icon": "https://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png",
"website": "https://www.orcon.net.nz/",
"formatted_phone_number": "0800 131 415",
"address_components": [
{
"short_name": "1",
"types": [
"street_number"
],
"long_name": "1"
},
{
"short_name": "The Strand",
"types": [
"route"
],
"long_name": "The Strand"
},
{
"short_name": "Takapuna",
"types": [
"sublocality_level_1",
"sublocality",
"political"
],
"long_name": "Takapuna"
},
{
"short_name": "Auckland",
"types": [
"locality",
"political"
],
"long_name": "Auckland"
},
{
"short_name": "Auckland",
"types": [
"administrative_area_level_1",
"political"
],
"long_name": "Auckland"
},
{
"short_name": "NZ",
"types": [
"country",
"political"
],
"long_name": "New Zealand"
},
{
"short_name": "0622",
"types": [
"postal_code"
],
"long_name": "0622"
}
]
},
"status": "OK"
}

json_data_dic.get("result").get("reviews") or json_data_dic['result']['reviews'] gives you the list of reviews
json_data_dic.get("result").get("reviews")[0].get("text") returns the text of the first review
If you need to get each review:
for review in json_data_dic.get("result").get("reviews"):
print review.get("text")
In general, use .get(KEY) or [KEY] access a dictionary item by the key and use
[INDEX] access an item in a list by the index (starting from 0)

Related

Convert data from spreadsheets to nested json

I'm using mongodb in my project. And I'll import about 20,000 products to the database in the end. So, I tried to write a script to convert the data from the spreadsheet to json, and then upload them to the mongodb. But many fields were missing.
I'm trying to figure out how to layout the spreadsheet so it would contain nested data. But I didn't find any resources to do so but only this package:
https://www.npmjs.com/package/spread-sheet-to-nested-json
But it has one problem, it will always contain "title" and "children", not the actual name of the field.
This is my product json:
[
{
"sku": "ADX112",
"name": {
"en": "Multi-Mat Gallery Frames",
"ar": "لوحة بإطار"
},
"brand": "Dummy brand",
"description": {
"en": "Metal frame in a Black powder-coated finish. Tempered glass. 2 removable, acid-free paper mats included with each frame. Can be hung vertically and horizontally. D-rings included. 5x7 and 8x10 frames include easel backs. Sold individually. Made in China.",
"ar": "إطار اسود. صنع في الصين."
},
"tags": [
"art",
"frame",
"لوحة",
"إطار"
],
"colors": [
"#000000"
],
"dimensions": [
"5x7",
"8x10"
],
"units_in_stock": {
"5x7": 5,
"8x10": 7
},
"thumbnail": "https://via.placeholder.com/150",
"images": [
"https://via.placeholder.com/150",
"https://via.placeholder.com/150"
],
"unit_size": {
"en": [
"individual",
"set of 3"
],
"ar": [
"فردي",
"مجموعة من 3"
]
},
"unit_price": 2000,
"discount": 19,
"category_id": "631f3ca65b2310473b978ab5",
"subCategories_ids": [
"631f3ca65b2310473b978ab5",
"631f3ca65b2310473b978ab5"
],
"featured": false
}
]
How can I layout a spreadsheet so it would be a template for future imports?

Indexing e-mails in Azure Search

I'm trying to best index contents of e-mail messages, subjects and email addresses. E-mails can contain both text and HTML representation. They can be in any language so I can't use language specific analysers unfortunately.
As I am new to this I have many questions:
First I used Standard Lucene analyser but after some testing and
checking what each analyser does I switched to using "simple"
analyser. Standard one didn't allow me to search by domain in
user#domain.com (It sees user and domain.com as tokens). Is "simple" the best I can use in my case?
How can I handle HTML contents of e-mail? I thought this should be
possible to do it in Azure Search but right now I think I would need
to strip HTML tags myself.
My users aren't tech savvy and I assumed "simple" query type will be
enough for them. I expect them to type word or two and find messages
containing this word/containing words starting with this word. From my tests it looks I need to append * to their queries to get "starting with" to work?
It would help if you included an example of your data and how you index and query. What happened, and what did you expect?
The standard Lucene analyzer will work with your user#domain.com example. It is correct that it produces the tokens user and domain.com. But the same happens when you query, and you will get records with the tokens user and domain.com.
CREATE INDEX
"fields": [
{"name": "Id", "type": "Edm.String", "searchable": false, "filterable": true, "retrievable": true, "sortable": true, "facetable": false, "key": true, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "synonymMaps": [] },
{"name": "Email", "type": "Edm.String", "filterable": true, "sortable": true, "facetable": false, "searchable": true, "analyzer": "standard"}
]
UPLOAD
{
"value": [
{
"#search.action": "mergeOrUpload",
"Id": "1",
"Email": "user#domain.com"
},
{
"#search.action": "mergeOrUpload",
"Id": "2",
"Email": "some.user#some-domain.com"
},
{
"#search.action": "mergeOrUpload",
"Id": "3",
"Email": "another#another.com"
}
]
}
QUERY
Query, using full and all.
https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/{{INDEX_NAME}}/docs?search=user#domain.com&$count=true&$select=Id,Email&searchMode=all&queryType=full&api-version={{API-VERSION}}
Which produces results as expected (all records containing user and domain.com):
{
"#odata.context": "https://<your-search-env>.search.windows.net/indexes('dg-test-65392234')/$metadata#docs(*)",
"#odata.count": 2,
"value": [
{
"#search.score": 0.51623213,
"Id": "1",
"Email": "user#domain.com"
},
{
"#search.score": 0.25316024,
"Id": "2",
"Email": "some.user#some-domain.com"
}
]
}
If your expected result is to only get the record above where the email matches completely, you could instead use a phrase search. I.e. replace the search parameter above with search="user#domain.com" and you would get:
{
"#search.score": 0.51623213,
"Id": "1",
"Email": "user#domain.com"
}
Alternatively, you could use the keyword analyzer.
ANALYZE
You can compare the different analyzers directly via REST. Using the keyword analyzer on the Email property will produce a single token.
{
"text": "some-user#some-domain.com",
"analyzer": "keyword"
}
Results in the following tokens:
"tokens": [
{
"token": "some-user#some-domain.com",
"startOffset": 0,
"endOffset": 25,
"position": 0
}
]
Compared to the standard tokenizer, which does a decent job for most types of unstructured content.
{
"text": "some-user#some-domain.com",
"analyzer": "standard"
}
Which produces reasonable results for cases where the email address was part of some generic text.
"tokens": [
{
"token": "some",
"startOffset": 0,
"endOffset": 4,
"position": 0
},
{
"token": "user",
"startOffset": 5,
"endOffset": 9,
"position": 1
},
{
"token": "some",
"startOffset": 10,
"endOffset": 14,
"position": 2
},
{
"token": "domain.com",
"startOffset": 15,
"endOffset": 25,
"position": 3
}
]
SUMMARY
This is a long answer already, so I won't cover your other two questions in detail. I would suggest splitting them to separate questions so it can benefit others.
HTML content: You can use a built-in HTML analyzer that strips HTML tags. Or you can strip the HTML yourself using custom code. I typically use Beautiful Soup for cases like these or simple regular expressions for simpler cases.
Wildcard search: Usually, users don't expect automatic wildcards appended. The only application that does this is the Outlook client, which destroys precision. When I search for "Jan" (a common name), I annoyingly get all emails sent in January(!). And a search for Dan (again, a name), I also get all emails from Danmark (Denmark).
Everything in search is a trade-off between precision and recall. In your first example with the email address, your expectation was heavily geared toward precision. But, in your last wildcard question, you seem to prefer extreme recall with wildcards on everything. It all comes down to your expectations.

Why AdSense web interface and API are reporting different numbers?

I've managed to get API working but it is reporting numbers which are always a bit lower than the ones reported by AdSense web (or mobile app) interface.
This is the response from API explorer:
"kind": "adsense#report",
"totalMatchedRows": "1",
"headers": [
{
"name": "EARNINGS",
"type": "METRIC_CURRENCY",
"currency": "EUR"
},
{
"name": "CLICKS",
"type": "METRIC_TALLY"
}
],
"rows": [
[
"7.58",
"17"
]
],
"totals": [
"7.58",
"17"
],
"averages": [
"7.58",
"17"
],
"startDate": "2019-10-07",
"endDate": "2019-10-07"
}
For example, for two days ago (2019-10-07) web interface is showing 24 clicks with estimated earning of 7,65 EUR and when I call API for the same day it shows 17 clicks and earning of 7,58 EUR.
And for every other day numbers are lower than the ones on the web interface.
I found the answer to this. If you leave out "useTimezoneReporting" parameter then it will default to PST timezone which is far away from me. When you set this parameter to "true" then it generates numbers based on AdSense account timezone.
Now everything works fine.

How would I call the data in this object?

I am new to web development and am about half way through a full-stack web development course. How would I go about calling the value of the data stored with the source: "Rotten Tomatoes"?
I have tried Ratings[1].Value and it does not seem to work.
var movieObject = JSON.parse(body);
console.log('Rotten Tomatoes Rating: ', movieObject.Ratings[1].Value);
Body Content:
{
"Title": "Avatar",
"Year": "2009",
"Rated": "PG-13",
"Released": "18 Dec 2009",
"Runtime": "162 min",
"Genre": "Action, Adventure, Fantasy",
"Director": "James Cameron",
"Writer": "James Cameron",
"Actors": "Sam Worthington, Zoe Saldana, Sigourney Weaver, Stephen Lang",
"Plot": "A paraplegic marine dispatched to the moon Pandora on a unique mission becomes torn between following his orders and protecting the world he feels is his home.",
"Language": "English, Spanish",
"Country": "UK, USA",
"Awards": "Won 3 Oscars. Another 85 wins & 128 nominations.",
"Poster": "https://images-na.ssl-images-amazon.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw##._V1_SX300.jpg",
"Ratings": [
{
"Source": "Internet Movie Database",
"Value": "7.8/10"
},
{
"Source": "Rotten Tomatoes",
"Value": "83%"
},
{
"Source": "Metacritic",
"Value": "83/100"
}
],
"Metascore": "83",
"imdbRating": "7.8",
"imdbVotes": "967,488",
"imdbID": "tt0499549",
"Type": "movie",
"DVD": "22 Apr 2010",
"BoxOffice": "$749,700,000",
"Production": "20th Century Fox",
"Website": "http://www.avatarmovie.com/",
"Response": "True"
}
Your call is correct, so it must be something in the setup. Best guess is that you're calling JSON.parse() on an object that has already been converted to an object.

ElasticSearch: Suggestion Completion Multi Search

I am using the suggestion api within ES with completion. My implementation works (code below) but I would like to search for multiple words within a query. In the example below if I query search "word" it finds "wordpress" and outputs "Found". What I am am trying to accomplish is querying with something like "word blog magazine" which are all tags and have an output of "Found". Any help would be appreciated!
Mapping:
curl -XPUT "http://localhost:9200/test_index/" -d'
{
"mappings": {
"product": {
"properties": {
"description": {
"type": "string"
},
"tags": {
"type": "string"
},
"title": {
"type": "string"
},
"tag_suggest": {
"type": "completion",
"index_analyzer": "simple",
"search_analyzer": "simple",
"payloads": false
}
}
}
}
}'
Add document:
curl -XPUT "http://localhost:9200/test_index/product/1" -d'
{
"title": "Product1",
"description": "Product1 Description",
"tags": [
"blog",
"magazine",
"responsive",
"two columns",
"wordpress"
],
"tag_suggest": {
"input": [
"blog",
"magazine",
"responsive",
"two columns",
"wordpress"
],
"output": "Found"
}
}'
_suggest query:
curl -XPOST "http://localhost:9200/test_index/_suggest" -d'
{
"product_suggest":{
"text":"word",
"completion": {
"field" : "tag_suggest"
}
}
}'
The results are as we would expect:
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"product_suggest": [
{
"text": "word",
"offset": 0,
"length": 4,
"options": [
{
"text": "Found",
"score": 1
},
]
}
]
}
If you're willing to switch to using edge ngrams (or full ngrams if you need them), I think it will solve your problem.
I wrote up a pretty detailed explanation of how to do this in this blog post:
https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
But I'll give you a quick and dirty version here. The trick is to use ngrams together with the _all field and the match AND operator.
So with this mapping:
PUT /test_index
{
"settings": {
"analysis": {
"filter": {
"ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"ngram_filter"
]
}
}
}
},
"mappings": {
"doc": {
"_all": {
"type": "string",
"analyzer": "ngram_analyzer",
"search_analyzer": "standard"
},
"properties": {
"word": {
"type": "string",
"include_in_all": true
},
"definition": {
"type": "string",
"include_in_all": true
}
}
}
}
}
and some documents:
PUT /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"word":"democracy", "definition":"government by the people; a form of government in which the supreme power is vested in the people and exercised directly by them or by their elected agents under a free electoral system."}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"word":"republic", "definition":"a state in which the supreme power rests in the body of citizens entitled to vote and is exercised by representatives chosen directly or indirectly by them."}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"word":"oligarchy", "definition":"a form of government in which all power is vested in a few persons or in a dominant class or clique; government by the few."}
{"index":{"_index":"test_index","_type":"doc","_id":4}}
{"word":"plutocracy", "definition":"the rule or power of wealth or of the wealthy."}
{"index":{"_index":"test_index","_type":"doc","_id":5}}
{"word":"theocracy", "definition":"a form of government in which God or a deity is recognized as the supreme civil ruler, the God's or deity's laws being interpreted by the ecclesiastical authorities."}
{"index":{"_index":"test_index","_type":"doc","_id":6}}
{"word":"monarchy", "definition":"a state or nation in which the supreme power is actually or nominally lodged in a monarch."}
{"index":{"_index":"test_index","_type":"doc","_id":7}}
{"word":"capitalism", "definition":"an economic system in which investment in and ownership of the means of production, distribution, and exchange of wealth is made and maintained chiefly by private individuals or corporations, especially as contrasted to cooperatively or state-owned means of wealth."}
{"index":{"_index":"test_index","_type":"doc","_id":8}}
{"word":"socialism", "definition":"a theory or system of social organization that advocates the vesting of the ownership and control of the means of production and distribution, of capital, land, etc., in the community as a whole."}
{"index":{"_index":"test_index","_type":"doc","_id":9}}
{"word":"communism", "definition":"a theory or system of social organization based on the holding of all property in common, actual ownership being ascribed to the community as a whole or to the state."}
{"index":{"_index":"test_index","_type":"doc","_id":10}}
{"word":"feudalism", "definition":"the feudal system, or its principles and practices."}
{"index":{"_index":"test_index","_type":"doc","_id":11}}
{"word":"monopoly", "definition":"exclusive control of a commodity or service in a particular market, or a control that makes possible the manipulation of prices."}
{"index":{"_index":"test_index","_type":"doc","_id":12}}
{"word":"oligopoly", "definition":"the market condition that exists when there are few sellers, as a result of which they can greatly influence price and other market factors."}
I can apply partial matching across both fields (would work with as many fields as you want) like this:
POST /test_index/_search
{
"query": {
"match": {
"_all": {
"query": "theo go",
"operator": "and"
}
}
}
}
which in this case, returns:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.7601639,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "5",
"_score": 0.7601639,
"_source": {
"word": "theocracy",
"definition": "a form of government in which God or a deity is recognized as the supreme civil ruler, the God's or deity's laws being interpreted by the ecclesiastical authorities."
}
}
]
}
}
Here is the code I used here (there's more in the blog post):
http://sense.qbox.io/gist/e4093c25a8257499f54ced5a09f35b1eb48e4e3c
Hope that helps.

Resources