How do I access JSON elements with urllib and Python 3?

I'm trying to access an API via urllib in Python 3. The API is documented here:
https://api.zxinfo.dk/doc/#!/zxinfo/getGameById
I used a script I found here (Access nested JSON values using Python), which prints the JSON dump fine, but when I try to access individual values I get the error KeyError: 'controls'.
How do I access the individual elements? I have spent today looking at various other questions but I can't get to the bottom of it.
Thanks in advance.
import json
import urllib.request

data = urllib.request.urlopen("https://api.zxinfo.dk/api/zxinfo/games/0001551?mode=compact").read().decode('utf8')
output = json.loads(data)
print(json.dumps(output, indent=2))
for item in output['controls']:
    title = item['control']
    print(title)
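A minimal sketch of one way to debug this, assuming the KeyError means 'controls' is not a top-level key of the response. The '_source' wrapper below is an assumption (zxinfo appears to be Elasticsearch-backed), so print the top-level keys first to confirm the real structure:

import json
import urllib.request

with urllib.request.urlopen("https://api.zxinfo.dk/api/zxinfo/games/0001551?mode=compact") as resp:
    output = json.loads(resp.read().decode('utf8'))

print(list(output.keys()))               # inspect the top level before indexing
record = output.get('_source', output)   # hypothetical nesting; falls back to the root
for item in record.get('controls', []):
    print(item['control'])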

Related

How to make discord embed pages that can be browsed? (discord.py)

I am trying to make a command that adds entries to a database; the info is then extracted from the database and sent in an embed. I tried to make embed pages, so that if, for example, 10 entries are made in the database, a page 2 is automatically created that can be browsed using buttons.
I tried using a for loop like this:
pages = (entries // 10) + 1
for i in range(pages):
    embed = discord.Embed(title=f"Page {i+1}")
    db[f"pagesembed_{i+1}"] = embed
But I got a JSON decode error, so I decided to convert the embed value to a string like this:
embed = str(discord.Embed(title=f"Page {i+1}"))
Then when I try to load it into a message, I get Application Command raised an exception: AttributeError: 'str' object has no attribute 'to_dict'.
I don't know what to do. I have contemplated using SQLite for storing the page embeds; maybe that would work, but I want to ask if there is another way of doing it. I would really appreciate it if someone would help! Thank you.
You can dump the embed object with pickle and then load it from the database.
Save example:
import pickle
import discord

exampleEmbedToSave = discord.Embed(title="example", description="example")
db["embed"] = pickle.dumps(exampleEmbedToSave)  # pickle.dumps serializes the Embed object to a byte string
Load example:
import pickle
import discord

loadedEmbed = pickle.loads(db["embed"])  # pickle.loads rebuilds the Embed object from the stored bytes

# In a command:
await channel.send(embed=loadedEmbed)
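One caveat, as a hedged sketch: pickle.dumps returns bytes, so if the database only accepts JSON-serializable values (the JSON decode error above suggests it might), the pickled payload may need a base64 round-trip. Here db stands in for whatever key-value store is in use:

import base64
import pickle
import discord

embed = discord.Embed(title="example")
db["embed"] = base64.b64encode(pickle.dumps(embed)).decode("ascii")  # bytes -> storable text
loaded = pickle.loads(base64.b64decode(db["embed"]))                 # text -> bytes -> Embed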

Parsed HTML of a web page using Python is different from the actual page

I need to get and store the PM2.5 and PM10 values from the table at https://app.cpcbccr.com/AQI_India/. I use BeautifulSoup4 to scrape the web page, but the parsed HTML I get is different from the actual page: the rows of the table body are missing.
I wrote the required code to get the table rows and table data etc., but since my parsed HTML is missing the rows of the table body, it couldn't find them, so for now I only have this to inspect my parsed HTML:
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://app.cpcbccr.com/AQI_India/"
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
with open("Desktop/soup.html", "a") as dumpfile:
    dumpfile.write(str(soup))
How can I get all of the table? Thanks in advance.
Try the code below. The table on https://app.cpcbccr.com/AQI_India/ is filled in by JavaScript from an API call, so it never appears in the raw HTML that requests fetches; I have implemented the scraping script the API way instead. Using requests you can hit the API directly, and it sends back a result that you then parse as JSON.
import json
import requests
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

def scrap_air_quality_index():
    # Base64-encoded JSON selecting the station and timestamp to query
    payload = 'eyJzdGF0aW9uX2lkIjoic2l0ZV8zMDEiLCJkYXRlIjoiMjAyMC0wNy0yNFQ5OjAwOjAwWiJ9:'
    session = requests.Session()
    response = session.post('https://app.cpcbccr.com/aqi_dashboard/aqi_all_Parameters',
                            data=payload, verify=False)
    result = json.loads(response.text)
    extracted_metrics = result['metrics']
    print(extracted_metrics)

scrap_air_quality_index()
I checked the API calls in the browser's network tab, where I found the API URL https://app.cpcbccr.com/aqi_dashboard/aqi_all_Parameters that I'm using to get the data. It takes an additional mandatory parameter, the payload; without it you will not be able to get any data. You can extend the script to save the data (see the screenshots referenced below) to a .csv or Excel file.
[Screenshots: the API URL in the network tab, and the JSON metrics result.]
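A hedged sketch of how that payload could be rebuilt for other stations or dates: decoding the base64 string above yields {"station_id":"site_301","date":"2020-07-24T9:00:00Z"}, so the same structure can be re-encoded. The field names come from that decoded example, not from any official docs:

import base64
import json

def build_payload(station_id, date):
    # Compact separators match the un-spaced JSON inside the original string
    body = json.dumps({"station_id": station_id, "date": date}, separators=(",", ":"))
    return base64.b64encode(body.encode("utf-8")).decode("ascii")

payload = build_payload("site_301", "2020-07-24T9:00:00Z")  # e.g. the station/date used above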

How to find the exact class of a table on any website while doing web scraping?

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "https://knoema.com/atlas/topics/Tourism/Travel-and-Tourism-Total-Contribution-to-GDP/Contribution-of-travel-and-tourism-to-GDP"
page=requests.get(url)
soup=BeautifulSoup(page.content, 'html.parser')
elem=soup.find_all("table",class_="rank")
len(elem)
This gives me a length of 0. I actually want to import the table from the given website into a pandas DataFrame, but I am unable to find the exact class of the table at the given link. Can anyone use the above code and fix it?
The website loads its data from a bunch of JSON endpoints rather than embedding the table in the HTML, which is why find_all returns nothing. Just make a GET request to https://knoema.com/api/1.0/data/WTTC2019?time=2000,2005,2010,2011,2012,2013,2014,2015,2016,2017,2018&country=1001700,1001680,1001870,1001740,1001110,1001920,1001130,1001120,1001750,1001770,1001580,1001650,1001610,1001720,1001950,1001910,1001760,1001730,1001670,1001640,1001690,1001630,1001710,1001620,1001800,1001660,1001590,1001930,1001980,1001150,1001790,1001820,1001840,1001940,1001890,1001960,1001970,1001780,1001810,1001900,1001160,1001600,1001850,1001140,1001880,1001830,1000360,1001240,1001260,1001430,1000290,1001390,1001440,1000260,1001400,1000340,1000370,1000490,1000300,1001410,1001270,1001420,1001220,1000020,1000050,1000100,1000130,1000190,1000200,1000210,1000090,1000860,1000110,1000160,1000830,1000900,1000880,1000890,1000780,1000850,1000220,1000040,1000150,1000180,1000450,1000590,1000660,1000670,1000310,1000600,1000640,1000740,1000680,1000690,1000520,1000530,1000620,1000750,1000550,1000560,1000700,1000710,1000510,1000610,1000720,1000570,1000630,1000380,1000650,1000420,1000430,1000330,1000470,1000400,1000390,1000280,1000410,1000320,1000440,1000350,1000580,1000480,1000730,1001180,1001190,1001200,1001290,1001320,1001360,1001370,1001300,1001330,1001520,1001350,1001310,1001230,1001250,1001280,1001530,1001500,1001550,1001540,1001560,1001490,1001510,1001460,1001470,1001480,1000950,1000800,1000770,1000810,1000820,1000920,1000840,1000910,1000940,1000790,1000870,1000930,1000540,1000460,1000970,1001090,1001020,1001040,1001050,1000990,1000980,1001080,1001000,1001070,1001010,1001030,1001060&variable=1000100&measure=1000020&frequencies=A.
That website also has a Python package for its API that you can check out: https://pypi.org/project/knoema/
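A minimal sketch of pulling that endpoint into pandas, assuming the response is JSON that can be flattened. The 'data' key below is an assumption, so inspect the raw payload first to confirm the structure:

import requests
import pandas as pd

# The full knoema.com/api/1.0/data/WTTC2019 URL quoted above goes here
url = "https://knoema.com/api/1.0/data/WTTC2019?time=2000,2005,2010,2011,2012,2013,2014,2015,2016,2017,2018&country=...&variable=1000100&measure=1000020&frequencies=A"

resp = requests.get(url)
resp.raise_for_status()
payload = resp.json()
df = pd.json_normalize(payload.get("data", payload))  # 'data' key is a guess; adjust after inspecting payload
print(df.head())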

Why is the output from Google Video Intelligence not in JSON format?

I have been trying to use the Google Video Intelligence API from https://cloud.google.com/video-intelligence/docs/libraries and I tried the exact same code. The response output was supposed to be in JSON format; however, the output was a google.cloud.videointelligence_v1.types.AnnotateVideoResponse or something similar to that.
I have tried the code from many resources, most recently from https://cloud.google.com/video-intelligence/docs/libraries, but still no JSON output was given. This is what I got when I checked the type of the output:
type(result)
google.cloud.videointelligence_v1.types.AnnotateVideoResponse
So, how do I get a JSON response from this?
If you specify an outputUri, the results will be stored in your GCS bucket in JSON format: https://cloud.google.com/video-intelligence/docs/reference/rest/v1/videos/annotate
It seems like you aren't storing the result in GCS. Instead you are getting the result via the GetOperation call, which returns it in AnnotateVideoResponse format.
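A hedged sketch of that approach, following the keyword-argument call style used in the answer below (the bucket paths are hypothetical, and your client version may expect a request dict instead of keyword arguments):

from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
job = client.annotate_video(
    input_uri='gs://your-bucket/input.mp4',       # hypothetical path
    output_uri='gs://your-bucket/result.json',    # the JSON lands here when the job finishes
    features=['OBJECT_TRACKING'])
job.result()  # blocks until the annotation (and the JSON write) completes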
I have found a solution for this. What I had to do was import these:
from google.protobuf.json_format import MessageToJson
import json
and run:
job = client.annotate_video(
    input_uri='gs://xxxx.mp4',
    features=['OBJECT_TRACKING'])
result = job.result()
serialized = MessageToJson(result)
a = json.loads(serialized)
type(a)
What this does is turn the result into a dictionary: MessageToJson serializes the AnnotateVideoResponse to a JSON string, and json.loads parses it into a plain dict.
For more info, see this Google forums thread.
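One hedged caveat: newer releases of the client library wrap responses with proto-plus, and MessageToJson may then reject the response object directly. If that happens, these variants should work (the proto-plus wrapping is an assumption about your installed version):

from google.protobuf.json_format import MessageToJson
import json

serialized = MessageToJson(result._pb)     # unwrap to the underlying protobuf message
# or, using the proto-plus helper on the message class:
serialized = type(result).to_json(result)
a = json.loads(serialized)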

Python BeautifulSoup

I am using Python BeautifulSoup to extract some data from a famous song site.
Here is the snippet of code:
import requests
from bs4 import BeautifulSoup

url = 'https://gaana.com/playlist/gaana-dj-bollywood-top-50-1'
res = requests.get(url)
while res.status_code != 200:
    try:
        res = requests.get(url)
    except:
        pass
print(res)
soup = BeautifulSoup(res.text, 'lxml')
songs = soup.find_all('meta', {'property': 'music:song'})
print(songs[0])
Here is the sample output:
<Response [200]>
<meta content="https://gaana.com/song/o-saathi" property="music:song"/>
Now I want to extract the URL inside the content attribute as a string so that I can use that URL further in my program.
Someone please help me.
It's in the comments, but I just want to explain: BeautifulSoup returns most results as a list or other iterable object. You show that you understand this in your code by using songs[0], but in this case each element behaves like a dictionary.
As explained in this StackOverflow post, you need to query not only songs[0] but also the attribute key within it (the key and its value together are called a key-value pair, and keys are the chief way to get data out of a dictionary), as shown in the sketch below.
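A minimal sketch building on the code above: a BeautifulSoup Tag supports dictionary-style access to its attributes, so the URL in the content attribute can be read directly (the attribute name comes from the sample output above):

url = songs[0]['content']   # dictionary-style access to the Tag's content attribute
print(url)                  # https://gaana.com/song/o-saathi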
Last note: while I've been a big fan of BeautifulSoup4 for basic web scraping, you may also consider the lxml library. It's pretty well documented; to really take advantage of it you have to learn Python-flavoured XPaths, which are sort of like regex for XML/HTML, but for advanced scraping it's probably the best option short of Selenium, and it returns cleaner data than bs4.
Good luck!
