python regex usage: how to start with, least match, get content in middle [duplicate]


I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I get looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries, but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?

For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
    "name": "ns1:timeSeriesResponseType",
    "declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
    "scope": "javax.xml.bind.JAXBElement$GlobalScope",
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "queryURL": "http://waterservices.usgs.gov/nwis/iv/",
            "criteria": {
                "locationParam": "[ALL:103232434]",
                "variableParam": "[00060, 00065]"
            },
            "note": [
                {
                    "value": "[ALL:103232434]",
                    "title": "filter:sites"
                },
                {
                    "value": "[mode=LATEST, modifiedSince=null]",
                    "title": "filter:timeRange"
                },
                {
                    "value": "sdas01",
                    "title": "server"
                }
            ]
        }
    },
    "nil": false,
    "globalScope": true,
    "typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
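For the more general question of how to find the "path" in the first place, one option is to let Python list every path for you. This is a minimal sketch: iter_paths is a hypothetical helper name (not part of any library), and the my_json below is a trimmed copy of the data above. It walks nested dicts and lists and prints the chain of [] lookups that reaches each leaf value.

```python
def iter_paths(obj, path=()):
    """Yield (path, value) for every leaf in a structure of nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_paths(value, path + (key,))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from iter_paths(value, path + (index,))
    else:
        # Anything that is not a dict or a list is treated as a leaf value.
        yield path, obj


# Trimmed copy of the parsed response shown above.
my_json = {
    'name': 'ns1:timeSeriesResponseType',
    'value': {
        'queryInfo': {
            'creationTime': 1349724919000,
            'note': [{'value': 'sdas01', 'title': 'server'}],
        }
    },
    'nil': False,
}

for path, value in iter_paths(my_json):
    # Render the path as the chain of [] lookups you would write by hand.
    print(''.join(f'[{step!r}]' for step in path), '=', value)
```

One of the printed lines is ['value']['queryInfo']['creationTime'] = 1349724919000; putting my_json in front of that chain gives exactly the hard-coded access above.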

"I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way."
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in my_json:
    name = my_json['name']
else:
    name = None  # or handle the missing key some other way
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
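The same try/except idea can be wrapped in a small reusable function so each lookup is one line. This is a sketch: get_in is a hypothetical helper name, the my_json below is a trimmed copy of the data above, and the function also catches IndexError so that an out-of-range list index falls back to the default too.

```python
from functools import reduce
import operator


def get_in(data, path, default=None):
    """Follow a sequence of dict keys / list indexes; return default if any step fails."""
    try:
        # operator.getitem(obj, key) is obj[key]; reduce applies it one step at a time.
        return reduce(operator.getitem, path, data)
    except (KeyError, IndexError, TypeError):
        return default


# Trimmed copy of the parsed response shown above.
my_json = {
    'value': {'queryInfo': {'creationTime': 1349724919000,
                            'note': [{'title': 'filter:sites'}]}},
    'nil': False,
}

print(get_in(my_json, ['value', 'queryInfo', 'creationTime']))
print(get_in(my_json, ['value', 'missing', 'creationTime'], 'n/a'))
```

Because the whole chain sits inside one try block, a None or other non-container in the middle of the path is handled the same way as a missing key.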

Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json
# start with a Python dict
data = {"test1": "1", "test2": "2", "test3": "3"}
# json.dumps serializes the dict into a JSON-formatted string
json_str = json.dumps(data)
# json.loads parses the JSON string back into a Python dict
resp = json.loads(json_str)
# print the whole dict
print(resp)
# extract a single element from the dict
print(resp['test1'])

Try this.
Here, I fetch only the statecode of each entry in the 'statewise' array returned by the COVID API.
import requests
r = requests.get('https://api.covid19india.org/data.json')
r.raise_for_status()  # stop early on an HTTP error status
x = r.json()['statewise']
for i in x:
    print(i['statecode'])

Try this:
from functools import reduce
import re

def deep_get_imps(data, key: str):
    # Split a segment like "result[2]" into ["result", "2", ""].
    split_keys = re.split("[\\[\\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            return out_data
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data

def deep_get(dictionary, keys):
    # Apply deep_get_imps to each dot-separated segment in turn.
    return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
    "status": 200,
    "info": {
        "name": "Test",
        "date": "2021-06-12"
    },
    "result": [{
        "name": "test1",
        "value": 2.5
    }, {
        "name": "test2",
        "value": 1.9
    }, {
        "name": "test1",
        "value": 3.1
    }]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'

Related

Python lists and dictionaries indexing

The AWS API documentation tells me to call a function in the boto3 module like this:
response = client.put_metric_data(
    Namespace='string',
    MetricData=[
        {
            'MetricName': 'string',
            'Dimensions': [
                {
                    'Name': 'string',
                    'Value': 'string'
                },
            ],
            'Timestamp': datetime(2015, 1, 1),
            'Value': 123.0,
            'StatisticValues': {
                'SampleCount': 123.0,
                'Sum': 123.0,
                'Minimum': 123.0,
                'Maximum': 123.0
            },
            'Values': [
                123.0,
            ],
            'Counts': [
                123.0,
            ],
            'Unit': 'Seconds'|'Microseconds'|'Milliseconds'|'Bytes'|'Kilobytes'|'Megabytes'|'Gigabytes'|'Terabytes'|'Bits'|'Kilobits'|'Megabits'|'Gigabits'|'Terabits'|'Percent'|'Count'|'Bytes/Second'|'Kilobytes/Second'|'Megabytes/Second'|'Gigabytes/Second'|'Terabytes/Second'|'Bits/Second'|'Kilobits/Second'|'Megabits/Second'|'Gigabits/Second'|'Terabits/Second'|'Count/Second'|'None',
            'StorageResolution': 123
        },
    ]
)
So, I set a variable using the same format:
cw_metric = [
    {
        'MetricName': '',
        'Dimensions': [
            {
                'Name': 'Protocol',
                'Value': 'SSH'
            },
        ],
        'Timestamp': '',
        'Value': 0,
        'StatisticValues': {
            'SampleCount': 1,
            'Sum': 0,
            'Minimum': 0,
            'Maximum': 0
        }
    }
]
To my untrained eye, this looks simply like JSON, and I can use json.dumps(cw_metric) to get a JSON-formatted string that looks, well, exactly the same.
But, apparently, in Python, when I use square brackets I am creating a list and when I use curly braces I am creating a dict. So what did I create above? A list of dicts, or, in the case of Dimensions, a list of dicts containing a list of dicts? Can someone help me understand that?
And finally, now that I have created the cw_metric variable I want to update some of the values inside of it. I've tried several combinations. I want to do something like this:
cw_metric['StatisticValues']['SampleCount']=2
I am of course told that I can't use a str as an index on a list.
So, I try something like this:
cw_metric[4][0]=2
or
cw_metric[4]['SampleCount']=2
It all just ends up in errors.
I found that this works:
cw_metric[0]['StatisticValues']['SampleCount']=2
But, that just seems stupid. Is this the proper way to handle this?

cw_metric is a list of one dictionary. Thus, cw_metric[0] is that dictionary. cw_metric[0]['Dimensions'] is a list of one dictionary as well. cw_metric[0]['StatisticValues'] is just a dictionary. One of its elements is, for example, cw_metric[0]['StatisticValues']['SampleCount'] == 1.
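To make this concrete, here is a small sketch using the cw_metric value from the question (Dimensions and StatisticValues written compactly). It prints the type at each level, then binds a short name (stats, chosen here only for illustration) to the inner dict. Because both names refer to the same dict object, updating stats also updates cw_metric, so you only spell out the [0] path once.

```python
cw_metric = [
    {
        'MetricName': '',
        'Dimensions': [{'Name': 'Protocol', 'Value': 'SSH'}],
        'Timestamp': '',
        'Value': 0,
        'StatisticValues': {'SampleCount': 1, 'Sum': 0, 'Minimum': 0, 'Maximum': 0},
    }
]

# The outer [] makes a list; its only element (index 0) is a dict.
metric = cw_metric[0]
print(type(cw_metric).__name__, type(metric).__name__)  # list dict
print(type(metric['Dimensions']).__name__)              # list
print(type(metric['StatisticValues']).__name__)         # dict

# A second name for the inner dict avoids retyping the full path.
stats = metric['StatisticValues']
stats['SampleCount'] = 2  # mutates the dict that cw_metric[0] holds
print(cw_metric[0]['StatisticValues']['SampleCount'])   # 2
```

If cw_metric ever holds more than one metric, loop over it (for metric in cw_metric: ...) rather than hard-coding [0].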

Python: How to populate nested dict with attributes and values from parsed XML string?

I have a dict containing identifiers as keys, with an XML string as their respective values. I want to parse the attributes and values from the XML and automagically populate a dict with them, under their respective identifier keys.
import xml.etree.ElementTree as etree
employees = {
    'employee_0': '<Person><Attribute name="name"><Value>Bill Johnson</Value></Attribute><Attribute name="city"><Value>New York</Value></Attribute><Attribute name="email"><Value>bill.johnson#email.com</Value></Attribute></Person>',
    'employee_1': '<Person><Attribute name="name"><Value>Amanda Philips</Value></Attribute><Attribute name="city"><Value>Los Angeles</Value></Attribute><Attribute name="email"><Value>amanda.philips#email.com</Value></Attribute></Person>'
}
for identifier_key in employees:
    xml = etree.fromstring(employees[identifier_key])
    for key in xml:
        key_str = key.attrib["name"]
        for value in key:
            value_str = value.text
            employees[identifier_key][key_str] = value_str
I want the employees dict to result in this:
{
    "employee_0": {
        "name": "Bill Johnson",
        "city": "New York",
        "email": "bill.johnson#email.com"
    },
    "employee_1": {
        "name": "Amanda Philips",
        "city": "Los Angeles",
        "email": "amanda.philips#email.com"
    }
}
But in the code above, we get a TypeError: 'str' object does not support item assignment. My questions are:
Why do we get this error? It seems like this should be the proper way to populate the dict. If I instead use employees[identifier_key] = { key_str: value_str } it will overwrite the previous iteration. I have tried .update() too, without luck. How can this operation be accomplished?
How can the operation be accomplished in a nice and clean way, e.g. using dict comprehension? I'm having difficulty putting together the syntax for it.

Another method.
from simplified_scrapy import SimplifiedDoc
employees = {
    'employee_0': '<Person><Attribute name="name"><Value>Bill Johnson</Value></Attribute><Attribute name="city"><Value>New York</Value></Attribute><Attribute name="email"><Value>bill.johnson#email.com</Value></Attribute></Person>',
    'employee_1': '<Person><Attribute name="name"><Value>Amanda Philips</Value></Attribute><Attribute name="city"><Value>Los Angeles</Value></Attribute><Attribute name="email"><Value>amanda.philips#email.com</Value></Attribute></Person>'
}
for identifier_key in employees:
    dic = {}
    xml = SimplifiedDoc(employees[identifier_key])
    for attr in xml.Attributes:
        dic[attr['name']] = attr.text
    employees[identifier_key] = dic
print(employees)
Result:
{'employee_0': {'name': 'Bill Johnson', 'city': 'New York', 'email': 'bill.johnson#email.com'}, 'employee_1': {'name': 'Amanda Philips', 'city': 'Los Angeles', 'email': 'amanda.philips#email.com'}}
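As for why the original code fails: employees[identifier_key] is the XML string itself, and strings are immutable, so assigning an item into it raises the TypeError. A standard-library alternative is to build a fresh dict per employee with ElementTree and a dict comprehension, as the question asked. This is a sketch: parse_person is a hypothetical helper name, and it assumes every Attribute element has a name attribute and a Value child.

```python
import xml.etree.ElementTree as etree

employees = {
    'employee_0': '<Person><Attribute name="name"><Value>Bill Johnson</Value></Attribute><Attribute name="city"><Value>New York</Value></Attribute><Attribute name="email"><Value>bill.johnson#email.com</Value></Attribute></Person>',
    'employee_1': '<Person><Attribute name="name"><Value>Amanda Philips</Value></Attribute><Attribute name="city"><Value>Los Angeles</Value></Attribute><Attribute name="email"><Value>amanda.philips#email.com</Value></Attribute></Person>',
}


def parse_person(xml_string):
    """Map each <Attribute name="..."> to the text of its <Value> child."""
    root = etree.fromstring(xml_string)
    return {attr.get('name'): attr.findtext('Value') for attr in root.iter('Attribute')}


# Build new dicts instead of assigning into the original str values.
employees = {key: parse_person(xml_string) for key, xml_string in employees.items()}
print(employees)
```

The right-hand side is evaluated before employees is rebound, so reusing the name is safe here.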

How to insert another item programmatically into body?

I am trying to build a free/busy request body for the Google Calendar API with Python 3.8. However, when I try to insert new items into the request body programmatically, I get a bad request error.
This code is working:
SUBJECTA = '3131313636#resource.calendar.google.com'
SUBJECTB = '34343334#resource.calendar.google.com'
body = {
    "timeMin": now,
    "timeMax": nownext,
    "timeZone": 'America/New_York',
    "items": [{'id': SUBJECTA}, {"id": SUBJECTB}]
}
Good Body result:
{'timeMin': '2019-11-05T11:42:21.354803Z',
'timeMax': '2019-11-05T12:42:21.354823Z',
'timeZone': 'America/New_York',
'items': [{'id': '131313636#resource.calendar.google.com'},
{'id': '343334#resource.calendar.google.com'}]}
However, when I use this code:
items = "{'ID': '1313636#resource.calendar.google.com'},{'ID': '3383137#resource.calendar.google.com'},{'ID': '383733#resource.calendar.google.com'}"
body = {
    "timeMin": now,
    "timeMax": nownext,
    "timeZone": 'America/New_York',
    "items": items
}
The resulting body contains extra quotes at the start and end of the items value, and the request fails:
{'timeMin': '2019-11-05T12:04:41.189784Z',
'timeMax': '2019-11-05T13:04:41.189804Z',
'timeZone': 'America/New_York',
'items': ["{'ID': 13131313636#resource.calendar.google.com},{'ID':
53333383137#resource.calendar.google.com},{'ID':
831383733#resource.calendar.google.com},{'ID':
33339373237#resource.calendar.google.com},{'ID':
393935323035#resource.calendar.google.com}"]}
What is the proper way to handle it and send the item list in an accurate way?

In your situation, the value of items is the string "{'ID': '1313636#resource.calendar.google.com'},{'ID': '3383137#resource.calendar.google.com'},{'ID': '383733#resource.calendar.google.com'}".
You want to parse this string value into a Python object.
The result you expect is [{'ID': '1313636#resource.calendar.google.com'}, {'ID': '3383137#resource.calendar.google.com'}, {'ID': '383733#resource.calendar.google.com'}].
You are already able to use the Calendar API.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Sample script:
import json  # Added
items = "{'ID': '1313636#resource.calendar.google.com'},{'ID': '3383137#resource.calendar.google.com'},{'ID': '383733#resource.calendar.google.com'}"
items = json.loads(("[" + items + "]").replace("\'", "\""))  # Added
body = {
    "timeMin": now,
    "timeMax": nownext,
    "timeZone": 'America/New_York',
    "items": items
}
print(body)
Result:
If now and nownext are set to the strings "now" and "nownext", the body (pretty-printed here as JSON) is as follows.
{
    "timeMin": "now",
    "timeMax": "nownext",
    "timeZone": "America/New_York",
    "items": [
        {
            "ID": "1313636#resource.calendar.google.com"
        },
        {
            "ID": "3383137#resource.calendar.google.com"
        },
        {
            "ID": "383733#resource.calendar.google.com"
        }
    ]
}
Note:
If you can get the IDs as a list of strings in the first place, I recommend building items directly, for example:
ids = ['1313636#resource.calendar.google.com', '3383137#resource.calendar.google.com', '383733#resource.calendar.google.com']
items = [{'ID': cal_id} for cal_id in ids]
If I misunderstood your question and this was not the result you want, I apologize.
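Another option, sketched below, is ast.literal_eval from the standard library. It parses Python literal syntax (strings, lists, dicts, numbers) but not arbitrary expressions, so the single-quoted text can be read directly without the str.replace step above, which would break if a value ever contained an apostrophe. The sketch also renames the key to lowercase 'id', the key the question's working body uses; treat that renaming as an assumption to check against the Calendar API's freebusy documentation.

```python
import ast

items_text = ("{'ID': '1313636#resource.calendar.google.com'},"
              "{'ID': '3383137#resource.calendar.google.com'},"
              "{'ID': '383733#resource.calendar.google.com'}")

# Wrap the comma-separated dict literals in [] so they parse as one list.
parsed = ast.literal_eval("[" + items_text + "]")

# Assumption: normalize to the lowercase 'id' key used in the working body.
items = [{'id': entry['ID']} for entry in parsed]
print(items)
```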

How can I avoid a forest of apostrophes?

Using Python 3.7, I have this confusing-looking, nested dictionary:
dict = \
{
    'HBL_Posts':
        {'vNames':[ 'id_no', 'display_msg_no', 'thread', 'headline', 'category', 'author',
                    'auth_addr', 'author_pic_line', 'postbody',
                    'last_msg_no', 'mf_lnk', 'subject_header' ],
         'data_fname':'_Posts_plain.htm', 'tpl_fname':'_Posts_tpl.htm', 'addrs_fname':'_addrs.csv' },
    'MOTM':
        {'vNames':[ 'work_month', 'zoom', 'zoom_id', 'headline', 'description', 'subject_header' ],
         'data_fname':'_Posts_plain.htm', 'tpl_fname':'_Posts_tpl.htm', 'addrs_fname':'_addrs.csv'},
    'MOTM recording':
        {'vNames':[ 'topic', 'description', 'wDate', 'box', 'chat'],
         'data_fname':'_Recording_data.htm', 'tpl_fname':'_Recording_tpl.htm', 'addrs_fname':'_addrs.csv'},
    'Enticement':
        {'vNames':[ 'enticing_post', 'headline', 'hb_preface', 'postscript'],
         'data_fname':'_Entice_data.htm', 'tpl_fname':'_Entice_tpl.htm', 'addrs_fname':'_entice.csv'}
}
If I first set each variable to its own name, like HBL_Posts = 'HBL_Posts', I can substitute this much clearer and less typo-prone code:
dict = \
{
    HBL_Posts:
        {vNames:[ id_no, display_msg_no, thread, headline, category, author,
                  auth_addr, author_pic_line, postbody,
                  last_msg_no, mf_lnk, subject_header ],
         data_fname:_Posts_plain.htm, tpl_fname:_Posts_tpl.htm, addrs_fname:_addrs.csv },
    MOTM:
        {vNames:[ work_month, zoom, zoom_id, headline, description, subject_header ],
         data_fname:_Posts_plain.htm, tpl_fname:_Posts_tpl.htm, addrs_fname:_addrs.csv},
    MOTM recording:
        {vNames:[ topic, description, wDate, box, chat],
         data_fname:_Recording_data.htm, tpl_fname:_Recording_tpl.htm, addrs_fname:_addrs.csv},
    Enticement:
        {vNames:[ enticing_post, headline, hb_preface, postscript],
         data_fname:_Entice_data.htm, tpl_fname:_Entice_tpl.htm, addrs_fname:_entice.csv}
}
In fact, I accomplished this by doing all the required assignments one at a time, but that is about as complicated as the original dictionary setup with the apostrophes. What I'd like is a function that would enable me to do this neatly and economically.
def self_name(s):
    [?????]
Then I could have a list of all the variables, vars_lst, and loop through it, setting each to the literal version of itself:
for item in vars_lst:
    item = self_name(item)
To avoid having to use apostrophes in setting up vars_lst, I would accept doing:
HBL_Posts = vNames = id_no = . . . = ''
After many, many hours of struggle, I have been unable to supply the needed code for the self_name function. How can I do that, or how can I find another way of avoiding so many apostrophes?

Indent it like JSON:
{
    "HBL_Posts": {
        "vNames": [
            "id_no",
            "display_msg_no",
            "thread",
            "headline",
            "category",
            "author",
            "auth_addr",
            "author_pic_line",
            "postbody",
            "last_msg_no",
            "mf_lnk",
            "subject_header"
        ],
        "data_fname": "_Posts_plain.htm",
        "tpl_fname": "_Posts_tpl.htm",
        "addrs_fname": "_addrs.csv"
    },
    "MOTM": {
        "vNames": [
            "work_month",
            "zoom",
            "zoom_id",
            "headline",
            "description",
            "subject_header"
        ],
        "data_fname": "_Posts_plain.htm",
        "tpl_fname": "_Posts_tpl.htm",
        "addrs_fname": "_addrs.csv"
    },
    "MOTM recording": {
        "vNames": [
            "topic",
            "description",
            "wDate",
            "box",
            "chat"
        ],
        "data_fname": "_Recording_data.htm",
        "tpl_fname": "_Recording_tpl.htm",
        "addrs_fname": "_addrs.csv"
    },
    "Enticement": {
        "vNames": [
            "enticing_post",
            "headline",
            "hb_preface",
            "postscript"
        ],
        "data_fname": "_Entice_data.htm",
        "tpl_fname": "_Entice_tpl.htm",
        "addrs_fname": "_entice.csv"
    }
}
or even store that in a .json file and load it via:
import json
with open('my_file.json', 'r') as f:
    my_dict = json.load(f)
JSON is easy for most people to read, and the indentation makes the structure easy to see. Plus, it is easy to save to and load from a file, so you don't have to clutter your code.
FYI:
You can pretty print a dictionary using:
import json
my_dict = ...
print(json.dumps(my_dict, indent=4))
which is how I printed your dictionary.
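If you would rather keep the data in Python, one further way to cut down the quoting is sketched below, assuming the inner keys stay valid Python identifiers: dict() keyword arguments need no quotes around keys, and each vNames list can be written as one string and split on whitespace. The variable is named config here because naming it dict, as in the question, would shadow the built-in dict() that this approach relies on.

```python
config = {
    'HBL_Posts': dict(
        # One string split on whitespace instead of a dozen quoted names.
        vNames=('id_no display_msg_no thread headline category author '
                'auth_addr author_pic_line postbody last_msg_no mf_lnk subject_header').split(),
        data_fname='_Posts_plain.htm', tpl_fname='_Posts_tpl.htm', addrs_fname='_addrs.csv',
    ),
    'MOTM': dict(
        vNames='work_month zoom zoom_id headline description subject_header'.split(),
        data_fname='_Posts_plain.htm', tpl_fname='_Posts_tpl.htm', addrs_fname='_addrs.csv',
    ),
    # Keys containing spaces, such as 'MOTM recording', still need quotes.
    'MOTM recording': dict(
        vNames='topic description wDate box chat'.split(),
        data_fname='_Recording_data.htm', tpl_fname='_Recording_tpl.htm', addrs_fname='_addrs.csv',
    ),
    'Enticement': dict(
        vNames='enticing_post headline hb_preface postscript'.split(),
        data_fname='_Entice_data.htm', tpl_fname='_Entice_tpl.htm', addrs_fname='_entice.csv',
    ),
}
print(config['MOTM']['vNames'])
```

The file names still need quotes because they are values, not identifiers; this only removes the quoting around keys and list items.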

Need help to format a python dictionary string

I am unable to convert a file that I downloaded into a dictionary object so that I can access each element. I think the keys are missing their quotation marks, which prevents me from using json.loads() and similar functions. Could you please help me with a solution? I have given the downloaded content below; I need to format it.
{
    success: true,
    results: 2,
    rows: [{
        Symbol: "LITL",
        CompanyName: "LancoInfratechLimited",
        ISIN: "INE785C01048",
        Ind: "-",
        Purpose: "Results",
        BoardMeetingDate: "26-Sep-2017",
        DisplayDate: "19-Sep-2017",
        seqId: "102121067",
        Details: "toconsiderandapprovetheUn-AuditedFinancialResultsoftheCompanyonstandalonebasisfortheQuarterendedJune30,2017."
    }, {
        Symbol: "PETRONENGG",
        CompanyName: "PetronEngineeringConstructionLimited",
        ISIN: "INE742A01019",
        Ind: "-",
        Purpose: "Results",
        BoardMeetingDate: "28-Sep-2017",
        DisplayDate: "21-Sep-2017",
        seqId: "102128225",
        Details: "Toconsiderandapprove,interalia,theUnauditedFinancialResultsoftheCompanyforthequarterendedonJune30,2017."
    }]
}

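Here is one way to do it if you have the dict as a string. It is a little hacky: the regex quotes any word that is immediately followed by a colon, so it would also alter string values that contain text such as "time:". For the data shown here, it works.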
import json
import re

# Match a run of word characters immediately followed by a colon (an unquoted key).
regex = re.compile(r'(\w+(?=:))')

# I had the string in a file; load it however you already have it stored.
with open('test_string', 'r') as f:
    string = f.read()

# Wrap each unquoted key in double quotes so the text becomes valid JSON.
string = regex.sub(r'"\1"', string)
python_object = json.loads(string)
# Now you can access python_object just like any normal Python dict.
print(python_object["results"])
Here is the string after it has been put through the regex. It is now valid JSON, so json.loads() can read it:
{
    "success": true,
    "results": 2,
    "rows": [{
        "Symbol": "LITL",
        "CompanyName": "LancoInfratechLimited",
        "ISIN": "INE785C01048",
        "Ind": "-",
        "Purpose": "Results",
        "BoardMeetingDate": "26-Sep-2017",
        "DisplayDate": "19-Sep-2017",
        "seqId": "102121067",
        "Details": "toconsiderandapprovetheUn-AuditedFinancialResultsoftheCompanyonstandalonebasisfortheQuarterendedJune30,2017."
    }, {
        "Symbol": "PETRONENGG",
        "CompanyName": "PetronEngineeringConstructionLimited",
        "ISIN": "INE742A01019",
        "Ind": "-",
        "Purpose": "Results",
        "BoardMeetingDate": "28-Sep-2017",
        "DisplayDate": "21-Sep-2017",
        "seqId": "102128225",
        "Details": "Toconsiderandapprove,interalia,theUnauditedFinancialResultsoftheCompanyforthequarterendedonJune30,2017."
    }]
}
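A slightly stricter variant of the regex approach is sketched below as a hypothetical helper, loads_unquoted_keys. It only quotes a bare identifier that directly follows { or a comma (ignoring whitespace and newlines) and precedes a colon, so a value such as "meeting time: 10am" is left alone. It is still a heuristic rather than a real parser: a string value containing something like ",word:" would still be altered. The sample string is made up for illustration.

```python
import json
import re

# Object key position: "{" or "," then optional whitespace, an identifier, then ":".
KEY_PATTERN = re.compile(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)')


def loads_unquoted_keys(text):
    """Quote bare object keys, then parse the result as JSON (heuristic, not a full parser)."""
    return json.loads(KEY_PATTERN.sub(r'\1"\2"\3', text))


# Hypothetical sample in the same shape as the downloaded data.
sample = '{success: true, results: 1, rows: [{Symbol: "LITL", Details: "meeting time: 10am"}]}'
data = loads_unquoted_keys(sample)
print(data["rows"][0]["Details"])
```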
