Restructure TSV to list of list of dicts - python-3.x

A simplified look at my data right at parse:
[
{'id':'group1'},
{'id':'member1', 'parentId':'group1', 'size':51},
{'id':'member2', 'parentId':'group1', 'size':16},
{'id':'group2'},
{'id':'member1', 'parentId':'group2', 'size':21},
...
]
The desired output should be like this:
data =
[
[
{'id':'group1'},
{'id':'member1', 'parentId':'group1', 'size':51},
{'id':'member2', 'parentId':'group1', 'size':16}
],
[
{'id':'group2'},
{'id':'member1', 'parentId':'group2', 'size':21},
]
]
The issue is that this data is hard to iterate over because each group has a different number of members (some might have 10, some might have 3), so it is unclear where each sub-list should begin and end. The records are also not uniform: group rows have only an 'id' entry and no 'parentId' or 'size' entries.
master_data = []
for i in range(len(tsv_data)):
    temp = {}
    for j in range(?????):
        ???
How can Python handle arranging vanilla .tsv data into a list of lists as seen above?
I thought a sensible first step was to tally something simple before tackling the whole data set, so I tried to count all occurrences of group1, based on this discussion:
group_counts = {}
for member in data:
    group = member.get('group1')
    try:
        group_counts[group] += 1
    except KeyError:
        group_counts[group] = 1
However, this returned:
'list' object has no attribute 'get'
Which leads me to believe that counting text occurrences may not be the solution after all.

You could first collect all the groups to build the new data structure, then add the items to their groups:
data = [
{
'id': 'group1'
}, {
'id': 'member1',
'parentId': 'group1',
'size': 51
}, {
'id': 'member2',
'parentId': 'group1',
'size': 16
}, {
'id': 'group2'
}, {
'id': 'member1',
'parentId': 'group2',
'size': 21
}, {
'id': 'member3',
'parentId': 'group1',
'size': 16
}
]
result = {}  # Use a dict for easier grouping.
# Extract all groups.
for dct in data:
    if 'group' in dct['id']:
        result[dct['id']] = [dct]
# Extract all items and add them to their groups.
for dct in data:
    if 'parentId' in dct:
        result[dct['parentId']].append(dct)
nestedListResult = list(result.values())
Out:
[
[
{
'id': 'group1'
}, {
'id': 'member1',
'parentId': 'group1',
'size': 51
}, {
'id': 'member2',
'parentId': 'group1',
'size': 16
}, {
'id': 'member3',
'parentId': 'group1',
'size': 16
}
], [{
'id': 'group2'
}, {
'id': 'member1',
'parentId': 'group2',
'size': 21
}]
]
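Starting from the raw TSV text, the same grouping can be sketched in a single pass. This is only a sketch under assumptions: the column names id, parentId and size are taken from the dicts in the question and may not match the real file, a row counts as a group row when its parentId cell is empty, and parse_tsv and group_rows are hypothetical helper names.

```python
import csv
import io


def parse_tsv(text):
    """Parse TSV text into a list of dicts, dropping empty cells."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        # Keep only the columns that actually have a value in this row.
        rows.append({key: value for key, value in row.items() if value})
    return rows


def group_rows(rows):
    """Return one list per group, with the group row first."""
    groups = {}  # dicts keep insertion order in Python 3.7+
    for row in rows:
        if "parentId" in row:
            # setdefault also covers a member listed before its group row.
            groups.setdefault(row["parentId"], []).append(row)
        else:
            groups.setdefault(row["id"], []).insert(0, row)
    return list(groups.values())


# Assumed file layout: a header line, then one tab-separated row per record.
sample_tsv = (
    "id\tparentId\tsize\n"
    "group1\t\t\n"
    "member1\tgroup1\t51\n"
    "member2\tgroup1\t16\n"
    "group2\t\t\n"
    "member1\tgroup2\t21\n"
)
data = group_rows(parse_tsv(sample_tsv))
```

Note that the csv module returns every cell as a string, so size comes back as '51' rather than 51 unless it is converted explicitly.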

Related

Python lists and dictionaries indexing

In the AWS API documentation it wants me to call a function in the boto3 module like this:
response = client.put_metric_data(
Namespace='string',
MetricData=[
{
'MetricName': 'string',
'Dimensions': [
{
'Name': 'string',
'Value': 'string'
},
],
'Timestamp': datetime(2015, 1, 1),
'Value': 123.0,
'StatisticValues': {
'SampleCount': 123.0,
'Sum': 123.0,
'Minimum': 123.0,
'Maximum': 123.0
},
'Values': [
123.0,
],
'Counts': [
123.0,
],
'Unit': 'Seconds'|'Microseconds'|'Milliseconds'|'Bytes'|'Kilobytes'|'Megabytes'|'Gigabytes'|'Terabytes'|'Bits'|'Kilobits'|'Megabits'|'Gigabits'|'Terabits'|'Percent'|'Count'|'Bytes/Second'|'Kilobytes/Second'|'Megabytes/Second'|'Gigabytes/Second'|'Terabytes/Second'|'Bits/Second'|'Kilobits/Second'|'Megabits/Second'|'Gigabits/Second'|'Terabits/Second'|'Count/Second'|'None',
'StorageResolution': 123
},
]
)
So, I set a variable using the same format:
cw_metric = [
{
'MetricName': '',
'Dimensions': [
{
'Name': 'Protocol',
'Value': 'SSH'
},
],
'Timestamp': '',
'Value': 0,
'StatisticValues': {
'SampleCount': 1,
'Sum': 0,
'Minimum': 0,
'Maximum': 0
}
}
]
To my untrained eye, this looks simply like json and I am able to use json.dumps(cw_metric) to get a JSON formatted string output that looks, well, exactly the same.
But, apparently, in Python, when I use brackets I am creating a list and when I use curly brackets I am creating a dict. So what did I create above? A list of dicts or in the case of Dimensions a list of dicts with a list of dicts? Can someone help me to understand that?
And finally, now that I have created the cw_metric variable I want to update some of the values inside of it. I've tried several combinations. I want to do something like this:
cw_metric['StatisticValues']['SampleCount']=2
I am of course told that I can't use a str as an index on a list.
So, I try something like this:
cw_metric[4][0]=2
or
cw_metric[4]['SampleCount']=2
It all just ends up in errors.
I found that this works:
cw_metric[0]['StatisticValues']['SampleCount']=2
But, that just seems stupid. Is this the proper way to handle this?
cw_metric is a list of one dictionary. Thus, cw_metric[0] is that dictionary. cw_metric[0]['Dimensions'] is a list of one dictionary as well. cw_metric[0]['StatisticValues'] is just a dictionary. One of its elements is, for example, cw_metric[0]['StatisticValues']['SampleCount'] == 1.
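To make the indexing concrete, here is a small sketch using a trimmed copy of the structure from the question (Timestamp is left out for brevity, and the "HTTPS" value is only an illustrative assumption). Square brackets create a list, so that level is indexed by position; curly braces create a dict, so that level is indexed by key.

```python
cw_metric = [
    {
        "MetricName": "",
        "Dimensions": [{"Name": "Protocol", "Value": "SSH"}],
        "Value": 0,
        "StatisticValues": {"SampleCount": 1, "Sum": 0, "Minimum": 0, "Maximum": 0},
    }
]

# Outer square brackets: a list, so the first step is a positional index.
metric = cw_metric[0]

# Curly braces: a dict, so the next steps are key lookups.
metric["StatisticValues"]["SampleCount"] = 2

# "Dimensions" is again a list of dicts: index first, then key.
metric["Dimensions"][0]["Value"] = "HTTPS"  # illustrative value only
```

Because metric refers to the same dict as cw_metric[0], both updates are visible through cw_metric. Keeping a reference to the inner dict avoids repeating cw_metric[0] on every line.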

How to sort descending order with an Object result in NodeJs

I am using the function below to count the duplicated values in an array, but I want the result sorted in descending order by value.
function countRequirementIds() {
  const counts = {};
  const sampleArray = RIDS;
  sampleArray.forEach(function(x) { counts[x] = (counts[x] || 0) + 1; });
  console.log(typeof counts); // object
  return counts;
}
Output:
{
"1": 4,
"2": 5,
"4": 1,
"13": 4
}
required output:
{
"2": 5,
"1": 4,
"13": 4,
"4": 1,
}
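A plain JavaScript object cannot hold this ordering; see: Does JavaScript guarantee object property order? In modern engines, integer-like keys such as "1" and "13" are always enumerated in ascending numeric order, no matter how they were inserted.
So sorting the object itself by value is impossible. However, if the order matters to you, I would suggest using an array of tuples: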
const arrayOfTuples = [
[ "1", 4],
[ "2", 5],
[ "4", 1],
[ "13", 4],
]
arrayOfTuples.sort((a,b) => b[1] - a[1]);
console.log(arrayOfTuples);
// => [ [ '2', 5 ], [ '1', 4 ], [ '13', 4 ], [ '4', 1 ] ]
This uses Array.prototype.sort with a compare function: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort Arrays of objects can be sorted the same way, by comparing the value of one of their properties.

Filtering a list of nested dictionary

I am receiving the following response in a list of nested dictionaries format:
list_of_dicts = [{
'id': '11593636317',
'properties': {
'created_date': '2021-09-28T16:16:31.635Z',
'modified_date': '2021-09-28T16:16:31.635Z',
'note': 'Test Note 123',
'id': '11593636317'},
'created_date': '2021-09-28T16:16:31.635Z',
'updated_date': '2021-09-28T16:16:31.635Z',
'archived': False
},
{
'id': '11593636318',
'properties': {
'created_date': '2021-09-28T16:16:31.635Z',
'modified_date': '2021-09-28T16:16:31.635Z',
'note': 'Ticket Note',
'id': '11593636318'},
'created_date': '2021-09-28T16:16:31.635Z',
'updated_date': '2021-09-28T16:16:31.635Z',
'archived': False
}
]
However, I don't need all of the records for a specific action. I am trying to keep only the records whose note field starts with the word Ticket.
For that I tried:
filtered_notes = []
for note in list_of_dicts:
    if note['properties']['note'].startswith('Ticket'):
        filtered_notes.append(note['id'])
Unfortunately, I am running into the following error and I have no clue how to get around it:
AttributeError: 'NoneType' object has no attribute 'startswith'
You can do:
filtered_notes = []
for note in list_of_dicts:
    try:
        if note['properties']['note'].startswith('Ticket'):
            filtered_notes.append(note['id'])
    except (KeyError, AttributeError):
        pass
The try/except block protects you when one of the needed keys is missing or when the note value has an unexpected type, such as the None that caused your AttributeError.
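An alternative is to check the value explicitly instead of catching exceptions. The sketch below assumes the same record layout as in the question; the function name ids_with_note_prefix and the last two sample records are hypothetical, added only to show the None and missing-key cases.

```python
def ids_with_note_prefix(records, prefix):
    """Return the ids of records whose properties['note'] starts with prefix."""
    matching_ids = []
    for record in records:
        # .get() returns None instead of raising when a key is missing;
        # `or {}` also covers 'properties' being present but None.
        note = (record.get("properties") or {}).get("note")
        if isinstance(note, str) and note.startswith(prefix):
            matching_ids.append(record["id"])
    return matching_ids


sample = [
    {"id": "11593636317", "properties": {"note": "Test Note 123"}},
    {"id": "11593636318", "properties": {"note": "Ticket Note"}},
    {"id": "11593636319", "properties": {"note": None}},  # hypothetical record
    {"id": "11593636320"},  # hypothetical record without 'properties'
]
filtered_notes = ids_with_note_prefix(sample, "Ticket")
```

The explicit isinstance check skips only the cases you anticipate, whereas a broad except clause can also hide unrelated bugs inside the loop body.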

group dictionaries and get count

I have a list of dictionaries like this:
import pandas as pd
list1 = [{'name': 'maik','is_payed': 1, 'brand': 'HP', 'count': 1, 'items': [{'device': 'mouse', 'count': 110}]},{'name': 'milanie','is_payed': 0, 'brand': 'dell', 'count':10, 'items': [{'device': 'bales', 'count': 200}]}]
list2 = [{'name': 'maik','is_payed': 0, 'brand': 'HP', 'count': 20, 'items': [{'device': 'mouse', 'count': 1}]},{'name': 'nikola','is_payed': 1, 'brand': 'toshiba', 'count':10, 'items': [{'device': 'hard', 'count': 20}]}]
my_list= list1 + list2
count = pd.DataFrame(my_list).groupby(['name', 'is_payed'])
final_list_ = []
for commande, group in count:
    print(commande)
    records = group.to_dict("records")
    final_list_.append({"name": commande[0],
                        "payed": commande[1],
                        "occurrence": len(group),
                        "items": pd.DataFrame(records).groupby('device').agg(
                            occurrence=('device', 'count')).reset_index().to_dict('records')})
I don't know how to get a result like the one below, where the 'payed' field has the form payed/total_commands.
For example, maik has two commands, one payed and the other not, so the final result should look like this:
{'name': 'maik','payed': 1/2, 'brand': 'HP', 'count': 21, 'items': [{'device': 'mouse', 'count': 111}]}
Since you just want to group by "name" and are only interested in the "is_payed" values, let's concentrate on that and ignore the other data.
So for our purposes, your starting data looks like:
my_list = [
{'name': 'maik', 'is_payed': 1},
{'name': 'milanie', 'is_payed': 0},
{'name': 'maik', 'is_payed': 0},
{'name': 'nikola', 'is_payed': 1}
]
Now let's take a first pass over this data and count the number of times we see each name and the number of times that name has its "is_payed" flag set:
results = {}
for item in my_list:
    key = item["name"]
    results.setdefault(key, {"is_payed": 0, "count": 0})
    results[key]["count"] += 1
    results[key]["is_payed"] += item["is_payed"]
At this point we have a dictionary that will look like:
{
'maik': {'is_payed': 1, 'count': 2},
'milanie': {'is_payed': 0, 'count': 1},
'nikola': {'is_payed': 1, 'count': 1}
}
Now we will take a pass over this dictionary and create our true final result:
results = [
{"name": key, "payed": f"{value['is_payed']}/{value['count']}"}
for key, value in results.items()
]
Giving us:
[
{'name': 'maik', 'payed': '1/2'},
{'name': 'milanie', 'payed': '0/1'},
{'name': 'nikola', 'payed': '1/1'}
]
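The same two-pass idea can be extended to the other fields in the desired output. The following is a sketch under assumptions rather than a definitive answer: it assumes each name always has the same brand, that 'count' values are summed per name, and that item counts are summed per device. The name summarize_commands is hypothetical.

```python
def summarize_commands(commands):
    """Merge commands per name: payed ratio, summed count, merged items."""
    summary = {}
    for command in commands:
        entry = summary.setdefault(command["name"], {
            "name": command["name"],
            "brand": command["brand"],  # assumption: one brand per name
            "payed_total": 0,
            "commands": 0,
            "count": 0,
            "items": {},  # device -> summed count
        })
        entry["payed_total"] += command["is_payed"]
        entry["commands"] += 1
        entry["count"] += command["count"]
        for item in command["items"]:
            device = item["device"]
            entry["items"][device] = entry["items"].get(device, 0) + item["count"]

    # Second pass: turn the running totals into the final record shape.
    return [
        {
            "name": entry["name"],
            "payed": f"{entry['payed_total']}/{entry['commands']}",
            "brand": entry["brand"],
            "count": entry["count"],
            "items": [{"device": d, "count": c} for d, c in entry["items"].items()],
        }
        for entry in summary.values()
    ]


list1 = [
    {"name": "maik", "is_payed": 1, "brand": "HP", "count": 1,
     "items": [{"device": "mouse", "count": 110}]},
    {"name": "milanie", "is_payed": 0, "brand": "dell", "count": 10,
     "items": [{"device": "bales", "count": 200}]},
]
list2 = [
    {"name": "maik", "is_payed": 0, "brand": "HP", "count": 20,
     "items": [{"device": "mouse", "count": 1}]},
    {"name": "nikola", "is_payed": 1, "brand": "toshiba", "count": 10,
     "items": [{"device": "hard", "count": 20}]},
]
final_list = summarize_commands(list1 + list2)
```

For maik this yields 'payed': '1/2', 'count': 21 and a single mouse item with count 111, matching the record the question asks for.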

Python dictionary, get values by key name

I have a nested dictionary and am trying to iterate over it and get the values by key.
The payload has route as its main node, and route contains many waypoints. I would like to iterate over all waypoints and copy each value, by key name, into a protobuf variable.
Sample code below:
'payload':
{
'route':
{
'name': 'Argo',
'navigation_type': 2,
'backtracking': False,
'continuous': False,
'waypoints':
{
'id': 2,
'coordinate':
{
'type': 0,
'x': 51.435989,
'y': 25.32838,
'z': 0
},
'velocity': 0.55555582,
'constrained': True,
'action':
{
'type': 1,
'duration': 0
}
}
'waypoints':
{
'id': 2,
'coordinate':
{
'type': 0,
'x': 51.435989,
'y': 25.32838,
'z': 0
},
'velocity': 0.55555582,
'constrained': True,
'action':
{
'type': 1,
'duration': 0
}
}
},
'waypoint_status_list':
{
'id': 1,
'status': 'executing'
},
'autonomy_status': 3
},
# method to iterate over payload
def get_encoded_payload(self, payload):
    # 1. fill route proto from payload
    a = payload["route"]["name"]  # working fine
    b = payload["route"]["navigation_type"]  # working fine
    c = payload["route"]["backtracking"]  # working fine
    d = payload["route"]["continuous"]  # working fine
    self.logger.debug(type(payload["route"]["waypoints"]))  # type is dict
    # iterate over waypoints
    for waypoint in payload["route"]["waypoints"]:
        wp_id = waypoint["id"]  # Error: string indices must be integers
I would like to iterate over all waypoints and assign the value of each key to a variable.
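Iterating over a dict gives you its keys, so each waypoint in your loop is a string such as 'id', and indexing a string with "id" raises the error you see. Your loop expects waypoints to be a list of dicts, which would work, but that's not what your structure actually contains: a dict cannot hold two 'waypoints' keys, so the second one simply replaces the first.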
Try print(waypoint) and see what you get.
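The behaviour can be reproduced with a trimmed payload. In this sketch, as_waypoint_list is a hypothetical helper that accepts either a single waypoint dict or a list of them; whether the real payload can be changed to send a list is an assumption about the producer of the data.

```python
payload = {
    "route": {
        "name": "Argo",
        "waypoints": {"id": 2, "velocity": 0.55555582, "constrained": True},
    }
}

# Iterating over a dict yields its keys, which are strings here.
keys_seen = [waypoint for waypoint in payload["route"]["waypoints"]]


def as_waypoint_list(waypoints):
    """Hypothetical helper: wrap a single waypoint dict in a list."""
    if isinstance(waypoints, dict):
        return [waypoints]
    return list(waypoints)


# With the waypoints normalized to a list, each item is a dict again.
waypoint_ids = [wp["id"] for wp in as_waypoint_list(payload["route"]["waypoints"])]
```

If several waypoints are needed, the payload itself has to carry them as a list of dicts under a single 'waypoints' key; the helper only papers over the single-dict case.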
