Filtering a list of nested dictionaries - python-3.x

I am receiving the following response in a list of nested dictionaries format:
list_of_dicts = [
    {
        'id': '11593636317',
        'properties': {
            'created_date': '2021-09-28T16:16:31.635Z',
            'modified_date': '2021-09-28T16:16:31.635Z',
            'note': 'Test Note 123',
            'id': '11593636317'
        },
        'created_date': '2021-09-28T16:16:31.635Z',
        'updated_date': '2021-09-28T16:16:31.635Z',
        'archived': False
    },
    {
        'id': '11593636318',
        'properties': {
            'created_date': '2021-09-28T16:16:31.635Z',
            'modified_date': '2021-09-28T16:16:31.635Z',
            'note': 'Ticket Note',
            'id': '11593636318'
        },
        'created_date': '2021-09-28T16:16:31.635Z',
        'updated_date': '2021-09-28T16:16:31.635Z',
        'archived': False
    }
]
However, I don't need all of the records for a specific action. For that, I am trying to filter for all records whose note field starts with the word Ticket.
For that I tried:
filtered_notes = []
for note in list_of_dicts:
    if note['properties']['note'].startswith('Ticket'):
        filtered_notes.append(note['id'])
Unfortunately, I am running into the following error and I have no clue how to get around it:
AttributeError: 'NoneType' object has no attribute 'startswith'

You can do:
filtered_notes = []
for note in list_of_dicts:
    try:
        if note['properties']['note'].startswith('Ticket'):
            filtered_notes.append(note['id'])
    except (KeyError, AttributeError):
        pass
The try/except block will protect you in case some of the needed keys are missing or the note property value has an unexpected type (such as None, which is what produced the AttributeError above).
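If you prefer to avoid exception handling here, a roughly equivalent sketch uses dict.get() with a default, so a missing or None note simply fails the startswith check (this assumes the same list_of_dicts shape shown above):
filtered_notes = []
for note in list_of_dicts:
    # .get() avoids KeyError for missing keys, and "or ''" turns a None note into ''
    note_text = note.get('properties', {}).get('note') or ''
    if note_text.startswith('Ticket'):
        filtered_notes.append(note['id'])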

Python lists and dictionaries indexing

The AWS API documentation wants me to call a function in the boto3 module like this:
response = client.put_metric_data(
    Namespace='string',
    MetricData=[
        {
            'MetricName': 'string',
            'Dimensions': [
                {
                    'Name': 'string',
                    'Value': 'string'
                },
            ],
            'Timestamp': datetime(2015, 1, 1),
            'Value': 123.0,
            'StatisticValues': {
                'SampleCount': 123.0,
                'Sum': 123.0,
                'Minimum': 123.0,
                'Maximum': 123.0
            },
            'Values': [
                123.0,
            ],
            'Counts': [
                123.0,
            ],
            'Unit': 'Seconds'|'Microseconds'|'Milliseconds'|'Bytes'|'Kilobytes'|'Megabytes'|'Gigabytes'|'Terabytes'|'Bits'|'Kilobits'|'Megabits'|'Gigabits'|'Terabits'|'Percent'|'Count'|'Bytes/Second'|'Kilobytes/Second'|'Megabytes/Second'|'Gigabytes/Second'|'Terabytes/Second'|'Bits/Second'|'Kilobits/Second'|'Megabits/Second'|'Gigabits/Second'|'Terabits/Second'|'Count/Second'|'None',
            'StorageResolution': 123
        },
    ]
)
So, I set a variable using the same format:
cw_metric = [
    {
        'MetricName': '',
        'Dimensions': [
            {
                'Name': 'Protocol',
                'Value': 'SSH'
            },
        ],
        'Timestamp': '',
        'Value': 0,
        'StatisticValues': {
            'SampleCount': 1,
            'Sum': 0,
            'Minimum': 0,
            'Maximum': 0
        }
    }
]
To my untrained eye, this simply looks like JSON, and I can use json.dumps(cw_metric) to get a JSON-formatted string that looks, well, exactly the same.
But apparently, in Python, when I use square brackets I am creating a list and when I use curly braces I am creating a dict. So what did I create above? A list of dicts, or, in the case of Dimensions, a list of dicts that itself contains a list of dicts? Can someone help me understand that?
And finally, now that I have created the cw_metric variable I want to update some of the values inside of it. I've tried several combinations. I want to do something like this:
cw_metric['StatisticValues']['SampleCount']=2
I am of course told that I can't use a str as an index on a list.
So, I try something like this:
cw_metric[4][0]=2
or
cw_metric[4]['SampleCount']=2
It all just ends up in errors.
I found that this works:
cw_metric[0]['StatisticValues']['SampleCount']=2
But, that just seems stupid. Is this the proper way to handle this?
cw_metric is a list containing one dictionary, so cw_metric[0] is that dictionary. cw_metric[0]['Dimensions'] is likewise a list containing one dictionary. cw_metric[0]['StatisticValues'] is just a dictionary; one of its elements is, for example, cw_metric[0]['StatisticValues']['SampleCount'] == 1.
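If the leading [0] feels awkward, one option is to build the metric as a plain dict and only wrap it in a list at call time, since MetricData expects a list. A minimal sketch along those lines (the metric name, namespace, and the commented-out client call are placeholders, not part of the original question):
metric = {
    'MetricName': 'ssh-connections',   # placeholder name
    'Dimensions': [{'Name': 'Protocol', 'Value': 'SSH'}],
    'StatisticValues': {'SampleCount': 1, 'Sum': 0, 'Minimum': 0, 'Maximum': 0}
}

metric['StatisticValues']['SampleCount'] = 2   # no [0] needed now

# Wrap it in a list only when calling the API:
# client = boto3.client('cloudwatch')
# client.put_metric_data(Namespace='MyNamespace', MetricData=[metric])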

Restructure TSV to list of list of dicts

A simplified look at my data right after parsing:
[
    {'id': 'group1'},
    {'id': 'member1', 'parentId': 'group1', 'size': 51},
    {'id': 'member2', 'parentId': 'group1', 'size': 16},
    {'id': 'group2'},
    {'id': 'member1', 'parentId': 'group2', 'size': 21},
    ...
]
The desired output should be like this:
data = [
    [
        {'id': 'group1'},
        {'id': 'member1', 'parentId': 'group1', 'size': 51},
        {'id': 'member2', 'parentId': 'group1', 'size': 16}
    ],
    [
        {'id': 'group2'},
        {'id': 'member1', 'parentId': 'group2', 'size': 21}
    ]
]
The issue is that it's very challenging to iterate through this kind of data structure, because each inner list can contain a different number of objects: some might have 10, some might have 3, so it's unclear where each list should begin and end. The data also isn't uniform; note that some entries have only an 'id' and no 'parentId' or 'size'.
master_data = []
for i in range(len(tsv_data)):
    temp = {}
    for j in range(?????):
        ???
How can Python handle arranging vanilla .tsv data into a list of lists as seen above?
I thought one appropriate direction to take the code was to see if I could tally something simple before tackling the whole data set. So I attempted to compute a count of all occurrences of group1, based on this discussion:
group_counts = {}
for member in data:
    group = member.get('group1')
    try:
        group_counts[group] += 1
    except KeyError:
        group_counts[group] = 1
However, this returned:
'list' object has no attribute 'get'
Which leads me to believe that counting text occurrences may not be the solution after all.
You could first fetch all groups to create the new data structure, and afterwards add all the items to their groups:
data = [
    {
        'id': 'group1'
    }, {
        'id': 'member1',
        'parentId': 'group1',
        'size': 51
    }, {
        'id': 'member2',
        'parentId': 'group1',
        'size': 16
    }, {
        'id': 'group2'
    }, {
        'id': 'member1',
        'parentId': 'group2',
        'size': 21
    }, {
        'id': 'member3',
        'parentId': 'group1',
        'size': 16
    }
]

result = {}  # Use a dict for easier grouping.

# extract all groups
for dct in data:
    if 'group' in dct['id']:
        result[dct['id']] = [dct]

# extract all items and add them to their groups
for dct in data:
    if 'parentId' in dct:
        result[dct['parentId']].append(dct)

nestedListResult = [v for k, v in result.items()]
Out:
[
    [
        {
            'id': 'group1'
        }, {
            'id': 'member1',
            'parentId': 'group1',
            'size': 51
        }, {
            'id': 'member2',
            'parentId': 'group1',
            'size': 16
        }, {
            'id': 'member3',
            'parentId': 'group1',
            'size': 16
        }
    ],
    [
        {
            'id': 'group2'
        }, {
            'id': 'member1',
            'parentId': 'group2',
            'size': 21
        }
    ]
]
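A more compact variant of the same idea is a single pass with dict.setdefault, so the grouping dict is built on the fly (a sketch that, like the code above, assumes every parentId has a matching group entry somewhere in the data):
groups = {}
for dct in data:
    if 'parentId' in dct:
        # members get appended to their parent group's list
        groups.setdefault(dct['parentId'], []).append(dct)
    else:
        # the group entry itself always goes to the front of its list
        groups.setdefault(dct['id'], []).insert(0, dct)

nested_list_result = list(groups.values())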

Adding multiple filters in boto3

Hi, I have a requirement to fetch EC2 instance details with tags as follows:
prod = monitor
test = monitor
The objective is to list only instances with these tags. I was able to add one filter, but I am not sure how to use multiple filters in ec2.instances.filter(Filters=...):
from collections import defaultdict
import boto3

# Connect to EC2
ec2 = boto3.resource('ec2')

# Get information for all running instances
running_instances = ec2.instances.filter(Filters=[{
    'Name': 'instance-state-name',
    'Values': ['running'],
    'Name': 'tag:prod',
    'Values': ['monitor']}])

ec2info = defaultdict()
for instance in running_instances:
    for tag in instance.tags:
        if 'Name' in tag['Key']:
            name = tag['Value']
    # Add instance info to a dictionary
    ec2info[instance.id] = {
        'Name': name,
        'Type': instance.instance_type,
        'State': instance.state['Name'],
        'Private IP': instance.private_ip_address,
        'Public IP': instance.public_ip_address,
        'Launch Time': instance.launch_time
    }

attributes = ['Name', 'Type', 'State', 'Private IP', 'Public IP', 'Launch Time']
for instance_id, instance in ec2info.items():
    for key in attributes:
        print("{0}: {1}".format(key, instance[key]))
    print("------")
Your syntax does not quite seem correct. You should be supplying a list of dictionaries, and you can filter on multiple tags, too:
Filters=[
{'Name': 'instance-state-name', 'Values': ['running']},
{'Name': 'tag:prod', 'Values': ['monitor']},
{'Name': 'tag:test', 'Values': ['monitor']},
]
This should return instances with both of those tags.
If you want instances with either of the tags, then I don't think you can filter it in a single call. Instead, use ec2.instances.all(), then loop through the returned instances in Python code and apply your logic.
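A rough sketch of that OR-style check (it assumes the tags of interest are exactly prod=monitor and test=monitor, and that instance.tags can be None on untagged instances):
import boto3

ec2 = boto3.resource('ec2')
wanted = {('prod', 'monitor'), ('test', 'monitor')}

matching_ids = []
for instance in ec2.instances.all():
    tags = {t['Key']: t['Value'] for t in (instance.tags or [])}
    # keep the instance if at least one of the wanted key/value pairs is present
    if any(tags.get(key) == value for key, value in wanted):
        matching_ids.append(instance.id)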
Here is another example of combining conditions, this time with the Cost Explorer API. Note that several Dimensions conditions have to be wrapped in an "And" expression; repeating the "Dimensions" key inside one dict would silently keep only the last condition:
ce = boto3.client('ce')  # Cost Explorer client

response = ce.get_cost_and_usage(
    Granularity='MONTHLY',
    TimePeriod={
        'Start': start_date,
        'End': end_date
    },
    GroupBy=[
        {
            'Type': 'DIMENSION',
            'Key': 'SERVICE'
        },
    ],
    Filter={
        "And": [
            {"Dimensions": {"Key": "LINKED_ACCOUNT", "Values": [awslinkedaccount[0]]}},
            {"Dimensions": {"Key": "RECORD_TYPE", "Values": ["Usage"]}},
        ]
    },
    Metrics=[
        'BLENDED_COST',
    ],
)
print(response)

Python dictionary, get values by key name

I have a nested dictionary and I am trying to iterate over it to get the values by key.
I have a payload whose main node is route. Inside route there are many waypoints; I would like to iterate over all waypoints and set the value for each key name into a protobuf variable.
sample code below:
'payload': {
    'route': {
        'name': 'Argo',
        'navigation_type': 2,
        'backtracking': False,
        'continuous': False,
        'waypoints': {
            'id': 2,
            'coordinate': {
                'type': 0,
                'x': 51.435989,
                'y': 25.32838,
                'z': 0
            },
            'velocity': 0.55555582,
            'constrained': True,
            'action': {
                'type': 1,
                'duration': 0
            }
        }
        'waypoints': {
            'id': 2,
            'coordinate': {
                'type': 0,
                'x': 51.435989,
                'y': 25.32838,
                'z': 0
            },
            'velocity': 0.55555582,
            'constrained': True,
            'action': {
                'type': 1,
                'duration': 0
            }
        }
    },
    'waypoint_status_list': {
        'id': 1,
        'status': 'executing'
    },
    'autonomy_status': 3
},
# method to iterate over payload
def get_encoded_payload(self, payload):
    # 1 fill route proto from payload
    a = payload["route"]["name"]             # working fine
    b = payload["route"]["navigation_type"]  # working fine
    c = payload["route"]["backtracking"]     # working fine
    d = payload["route"]["continuous"]       # working fine

    self.logger.debug(type(payload["route"]["waypoints"]))  # type is dict

    # iterate over waypoints
    for waypoint in payload["route"]["waypoints"]:
        wp_id = waypoint["id"]  # Error: string indices must be integers
I would like to iterate over all waypoints and set the value of each key to a variable.
self.logger.debug(type(payload["route"]["waypoints"])) # type is dict
Iterating over a dict gives you its keys. Your later code seems to be expecting multiple waypoints as a list of dicts, which would work, but that's not what your structure actually contains.
Try print(waypoint) and see what you get.
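A minimal sketch of coping with both shapes (assuming the waypoint dicts look like the ones in the question; several waypoints would normally arrive as a list of dicts):
waypoints = payload["route"]["waypoints"]

# a single waypoint can arrive as one dict; normalize it to a list of dicts
if isinstance(waypoints, dict):
    waypoints = [waypoints]

for waypoint in waypoints:
    wp_id = waypoint["id"]
    x = waypoint["coordinate"]["x"]
    y = waypoint["coordinate"]["y"]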

python regex usage: how to start with , least match , get content in middle [duplicate]

I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I get looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?
For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
    "name": "ns1:timeSeriesResponseType",
    "declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
    "scope": "javax.xml.bind.JAXBElement$GlobalScope",
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "queryURL": "http://waterservices.usgs.gov/nwis/iv/",
            "criteria": {
                "locationParam": "[ALL:103232434]",
                "variableParam": "[00060, 00065]"
            },
            "note": [
                {
                    "value": "[ALL:103232434]",
                    "title": "filter:sites"
                },
                {
                    "value": "[mode=LATEST, modifiedSince=null]",
                    "title": "filter:timeRange"
                },
                {
                    "value": "sdas01",
                    "title": "server"
                }
            ]
        }
    },
    "nil": false,
    "globalScope": true,
    "typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way.
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in resp_dict:
    name = resp_dict['name']
else:
    pass
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json

# a simple dictionary
data = {"test1": "1", "test2": "2", "test3": "3"}

# serialize the dictionary into a JSON string
json_str = json.dumps(data)

# parse the JSON string back into a dictionary
resp = json.loads(json_str)

# print the whole parsed response
print(resp)

# extract a single element from the response
print(resp['test1'])
Try this.
Here, I fetch only statecode from the COVID API (a JSON array).
import requests

r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
    print(i['statecode'])
Try this:
from functools import reduce
import re

def deep_get_imps(data, key: str):
    split_keys = re.split("[\\[\\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            return out_data
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data

def deep_get(dictionary, keys):
    return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
"status": 200,
"info": {
"name": "Test",
"date": "2021-06-12"
},
"result": [{
"name": "test1",
"value": 2.5
}, {
"name": "test2",
"value": 1.9
},{
"name": "test1",
"value": 3.1
}]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'
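For reference, a path that does not exist simply yields None with this helper, so the caller can check for that and fall back to a default (these two lookups are assumed examples against the same res dict):
>>> deep_get(res, "info.missing") is None
True
>>> deep_get(res, "result[10]") is None
True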
