Export nested dictionary to csv - python-3.x

I need to export a nested dictionary to CSV. Here's what each entry looks like (each entry needs to become one line in the CSV later):
{'createdTime': '2017-10-30T12:33:02.000Z',
 'fields': {'Date': '2017-10-30T12:32:56.000Z',
            'field1': 'example#gmail.com',
            'field2': 1474538185964188,
            'field3': 6337,
            ....},
 'id': 'reca7LBr64XM1ClWy'}
I think I need to iterate through the dictionary and create a list of lists(?) to build the CSV from with the csv module:
['Date', 'field1', 'field2', 'field3', ...],
['2017-10-30T12:32:56.000Z', 'example#gmail.com', 1474538185964188, 6337 ...]
My problem is finding a smart way to iterate through the dict to get a list like this.

You can get the values in the following way:
def process_data():
    csv_data = [{'createdTime': '2017-10-30T12:33:02.000Z',
                 'fields': {'Date': '2017-10-30T12:32:56.000Z',
                            'field1': 'example#gmail.com',
                            'field2': 1474538185964188,
                            'field3': 6337},
                 'id': 'reca7LBr64XM1ClWy'},
                {'createdTime': '2017-10-30T12:33:02.000Z',
                 'fields': {'Date': '2017-10-30T12:32:56.000Z',
                            'field1': 'example#gmail.com',
                            'field2': 1474538185964188,
                            'field3': 6337},
                 'id': 'reca7LBr64XM1ClWy'}]
    headers = list(csv_data[0]['fields'].keys())
    body = []
    for row in csv_data:
        body_row = []
        for column_header in headers:
            body_row.append(row['fields'][column_header])
        body.append(body_row)
    return headers, body

headers, body = process_data()
# headers -- ['Date', 'field1', 'field2', 'field3']
# body    -- [['2017-10-30T12:32:56.000Z', 'example#gmail.com',
#              1474538185964188, 6337], ['2017-10-30T12:32:56.000Z',
#              'example#gmail.com', 1474538185964188, 6337]]
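To actually write the result out as a CSV file, a minimal sketch using the standard csv module (the filename export.csv is just an example):
import csv

headers, body = process_data()

# newline='' prevents blank lines between rows on Windows
with open('export.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)   # header row
    writer.writerows(body)     # one data row per dictionary entry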

Related

Unique nested dictionary from the 'for' loop Python3

I've got a host that executes commands via 'subprocess' and returns an output list of several parameters. The problem is that the output cannot be cleanly converted into a dictionary, whether via YAML or JSON. After the list is received, a regexp matches the valuable information and performs the grouping. I want to end up with a single dictionary where the repeating keys are put into nested dictionaries.
Here is the code and an example of the output list:
from re import compile, match

# Output can differ from request to request: the "keys" from
# list_of_values can duplicate or appear more than two times.
# The values mapped to the keys can differ too.
list_of_values = [
    "paramId: '11'", "valueId*: '11'",
    "elementId: '010_541'", 'mappingType: Both',
    "startRng: ''", "finishRng: ''",
    'DbType: sql', "activeSt: 'false'",
    'profile: TestPr1', "specificHost: ''",
    'hostGroup: tstGroup10', 'balance: all',
    "paramId: '194'", "valueId*: '194'",
    "elementId: '010_541'", 'mappingType: Both',
    "startRng: '1020304050'", "finishRng: '1020304050'",
    'DbType: sql', "activeSt: 'true'",
    'profile: TestPr1', "specificHost: ''",
    'hostGroup: tstGroup10', 'balance: all']

re_compile_valueId = compile(
    "valueId\*:\s.(?P<valueId>\d{1,5})"
    "|elementId:\s.(?P<elementId>\d{3}_\d{2,3})"
    "|startRng:\s.(?P<startRng>\d{1,10})"
    "|finishRng:\s.(?P<finishRng>\d{1,10})"
    "|DbType:\s(?P<DbType>nosql|sql)"
    "|activeSt:\s.(?P<activeSt>true|false)"
    "|profile:\s(?P<profile>[A-z0-9]+)"
    "|hostGroup:\s(?P<hostGroup>[A-z0-9]+)"
    "|balance:\s(?P<balance>none|all|priority group)"
)
iterator_loop = 0
uniq_dict = dict()
next_dict = dict()
for element in list_of_values:
    match_result = match(re_compile_valueId, element)
    if match_result:
        temp_dict = match_result.groupdict()
        for key, value in temp_dict.items():
            if value:
                if key == 'valueId':
                    uniq_dict['valueId' + str(iterator_loop)] = ''
                    iterator_loop += 1
                    next_dict.update({key: value})
                else:
                    next_dict.update({key: value})
                    uniq_dict['valueId' + str(iterator_loop - 1)] = next_dict
print(uniq_dict)
This code responds with:
{
'valueId0':
{
'valueId': '194',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'true',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '1020304050',
'finishRng': '1020304050'
},
'valueId1':
{
'valueId': '194',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'true',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '1020304050',
'finishRng': '1020304050'
}
}
What I was expecting is something like:
{
'valueId0':
{
'valueId': '11',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'false',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '',
'finishRng': ''
},
'valueId1':
{
'valueId': '194',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'true',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '1020304050',
'finishRng': '1020304050'
}
}
I've also got another piece of code below, which runs and fills in the values as expected. But its structure defeats the purpose of looping over everything, because each result key gets its own order number appended. The example is below; list_of_values and re_compile_valueId are the same as in the previous example.
iterator_loop = 1   # assumed starting value; it matches the numbering in the output below
uniq_dict = dict()
for element in list_of_values:
    match_result = match(re_compile_valueId, element)
    if match_result:
        temp_dict = match_result.groupdict()
        for key, value in temp_dict.items():
            if value:
                if key == 'balance':
                    key = key + str(iterator_loop)
                    uniq_dict.update({key: value})
                    iterator_loop += 1
                else:
                    key = key + str(iterator_loop)
                    uniq_dict.update({key: value})
print(uniq_dict)
The output will look like:
{
'valueId1': '11', 'elementId1': '010_541',
'DbType1': 'sql', 'activeSt1': 'false',
'profile1': 'TestPr1', 'hostGroup1': 'tstGroup10',
'balance1': 'all', 'valueId2': '194',
'elementId2': '010_541', 'startRng2': '1020304050',
'finishRng2': '1020304050', 'DbType2': 'sql',
'activeSt2': 'true', 'profile2': 'TestPr1',
'hostGroup2': 'tstGroup10', 'balance2': 'all'
}
Would appreciate any help! Thanks!
It turned out that some documentation reading needed to be done :D
The fix is to apply copy() to next_dict under the else statement. Thanks to this thread:
Why does updating one dictionary object affect other?
Many thanks to the answer's author #thefourtheye (https://stackoverflow.com/users/1903116/thefourtheye).
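The underlying issue is that assigning next_dict stores a reference, not a snapshot, so every 'valueIdN' slot ends up pointing at the same dict object. A tiny illustration of the difference:
d = {}
shared = {'a': 1}
d['first'] = shared          # stores a reference to shared
shared['a'] = 2              # mutating shared also changes d['first']
print(d['first'])            # {'a': 2}
d['second'] = shared.copy()  # stores an independent snapshot
shared['a'] = 3
print(d['second'])           # still {'a': 2}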
The final code:
for element in list_of_values:
    match_result = match(re_compile_valueId, element)
    if match_result:
        temp_dict = match_result.groupdict()
        for key, value in temp_dict.items():
            if value:
                if key == 'valueId':
                    uniq_dict['valueId' + str(iterator_loop)] = ''
                    iterator_loop += 1
                    next_dict.update({key: value})
                else:
                    next_dict.update({key: value})
                    uniq_dict['valueId' + str(iterator_loop - 1)] = next_dict.copy()
Thanks to everyone for their involvement.
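As a side note (not part of the original thread), here is a sketch of an alternative that avoids the shared-dictionary problem entirely by starting a fresh record whenever a new valueId appears; it reuses list_of_values and re_compile_valueId from above:
records = {}
current = None  # the record currently being filled
for element in list_of_values:
    match_result = match(re_compile_valueId, element)
    if not match_result:
        continue
    for key, value in match_result.groupdict().items():
        if not value:
            continue
        if key == 'valueId':
            # a new valueId starts a brand-new, independent dict
            current = {}
            records['valueId' + str(len(records))] = current
        if current is not None:
            current[key] = value
print(records)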

How to add 3 dataframes in a dataframe dict

I have 3 dataframes: train, validation, test.
I want to create a dictionary with these 3 dataframes to get the output below:
(features are the dataframe's column names)
How can I build this dictionary of dataframes?
DatasetDict({
train: Dataset({
features: ['ID', 'Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
num_rows: 6838
})
test: Dataset({
features: ['ID', 'Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
num_rows: 3259
})
validation: Dataset({
features: ['ID', 'Tweet', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust'],
num_rows: 886
})
})
I am trying this:
DatasetDict = {}
dataframes = [train, validation, test]
for grp in dataframes:
    DatasetDict[grp] = df
But it's not working
train.name = 'train'
test.name = 'test'
validation.name = 'validation'

datasetdict = {}
dataframes = [train, validation, test]
for df in dataframes:
    datasetdict[df.name] = {'features': df.columns.to_list(), 'num_rows': len(df)}
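If the goal is an actual Hugging Face DatasetDict (which is what the desired output looks like), a minimal sketch assuming the datasets library is installed and that train, validation and test are pandas DataFrames:
from datasets import Dataset, DatasetDict

dataset_dict = DatasetDict({
    'train': Dataset.from_pandas(train),
    'validation': Dataset.from_pandas(validation),
    'test': Dataset.from_pandas(test),
})
print(dataset_dict)  # prints the DatasetDict({...}) structure shown above
Note that from_pandas may carry over a non-default DataFrame index as an extra column, so call reset_index(drop=True) on the DataFrames first if that happens.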

Why does json.dumps add \n to the output, and how should I remove it when saving to a file?

Why does json.dumps add \n in the output, and how should I remove it while saving it in a file?
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import json
>>> data = {'people':[{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}
>>> json.dumps(data, indent=4)
'{\n "people": [\n {\n "website": "stackabuse.com",\n "from": "Nebraska",\n "name": "Scott"\n }\n ]\n}'
>>>
You are using pretty-printing; if you want to avoid newlines, do not use the indent flag.
import json
data = {'people': [{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}
print(json.dumps(data))
{"people": [{"name": "Scott", "website": "stackabuse.com", "from": "Nebraska"}]}
Your version just uses nice formatting:
import json
data = {'people': [{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}
print(json.dumps(data, indent=4))
{
    "people": [
        {
            "name": "Scott",
            "website": "stackabuse.com",
            "from": "Nebraska"
        }
    ]
}
In addition, newlines do not matter to JSON. The two examples below work the same:
import json

data = {'people': [{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}

with open('/tmp/file1', 'w') as f:
    json.dump(data, f, indent=4)
with open('/tmp/file2', 'w') as f:
    json.dump(data, f)

with open('/tmp/file1') as f:
    print(json.load(f))
with open('/tmp/file2') as f:
    print(json.load(f))
{'people': [{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}
{'people': [{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}
Because you ask it to, by providing indent. Just doing json.dumps(data) will not insert any newlines.
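As a further option, json.dump also accepts a separators argument; (',', ':') gives the most compact output, with no newlines and no spaces after commas and colons:
import json

data = {'people': [{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}
with open('/tmp/file3', 'w') as f:
    json.dump(data, f, separators=(',', ':'))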

Python nested json to csv

I can't convert this JSON to CSV. I have been trying different solutions posted here using pandas or other parsers, but none of them solved it.
This is a small extract of the big JSON:
{'data': {'items': [{'category': 'cat',
'coupon_code': 'cupon 1',
'coupon_name': '$829.99/€705.79 ',
'coupon_url': 'link3',
'end_time': '2017-12-31 00:00:00',
'language': 'sp',
'start_time': '2017-12-19 00:00:00'},
{'category': 'LED ',
'coupon_code': 'code',
'coupon_name': 'text',
'coupon_url': 'link',
'end_time': '2018-01-31 00:00:00',
'language': 'sp',
'start_time': '2017-10-07 00:00:00'}],
'total_pages': 1,
'total_results': 137},
'error_no': 0,
'msg': '',
'request': 'GET api/ #2017-12-26 04:50:02'}
I'd like to get an output like this with the columns:
category, coupon_code, coupon_name, coupon_url, end_time, language, start_time
I'm running python 3.6 with no restrictions.
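A minimal sketch that handles this structure: pull the records out of data['items'] and hand them to csv.DictWriter (the variable name payload and the filename coupons.csv are placeholders):
import csv

# payload is the parsed JSON shown above, e.g. payload = json.loads(raw_text)
items = payload['data']['items']

fieldnames = ['category', 'coupon_code', 'coupon_name', 'coupon_url',
              'end_time', 'language', 'start_time']

with open('coupons.csv', 'w', newline='', encoding='utf-8') as f:
    # extrasaction='ignore' skips any unexpected extra keys in an item
    writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction='ignore')
    writer.writeheader()
    for item in items:
        writer.writerow(item)
With pandas, pandas.DataFrame(payload['data']['items']).to_csv('coupons.csv', index=False) achieves the same result.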

Getting tags from python 3 dictionary object. AWS Boto3 Python 3

I have a dictionary object that is being returned to me from AWS. I need to pull the tag "based_on_ami" out of this dictionary. I have tried converting to a list, but I am new to programming and have not been able to figure out how to access Tags since they are a few levels down in the dictionary.
What is the best way for me to pull that tag out of the dictionary and put it into a variable I can use?
{
'Images':[
{
'Architecture':'x86_64',
'CreationDate':'2017-11-27T14:41:30.000Z',
'ImageId':'ami-8e73e0f4',
'ImageLocation':'23452345234545/java8server_ubuntu16-2b71edd1-f95e-4ee5-8fd6-d8a46975fdb5',
'ImageType':'machine',
'Public':False,
'OwnerId':'23452345234545',
'State':'available',
'BlockDeviceMappings':[
{
'DeviceName':'/dev/sda1',
'Ebs':{
'Encrypted':False,
'DeleteOnTermination':True,
'SnapshotId':'snap-0c10e8f5ced5b5240',
'VolumeSize':8,
'VolumeType':'gp2'
}
},
{
'DeviceName':'/dev/sdb',
'VirtualName':'ephemeral0'
},
{
'DeviceName':'/dev/sdc',
'VirtualName':'ephemeral1'
}
],
'EnaSupport':True,
'Hypervisor':'xen',
'Name':'java8server_ubuntu16-2b71edd1-f95e-4ee5-8fd6-d8a46975fdb5',
'RootDeviceName':'/dev/sda1',
'RootDeviceType':'ebs',
'SriovNetSupport':'simple',
'Tags':[
{
'Key':'service',
'Value':'baseami'
},
{
'Key':'cloudservice',
'Value':'ami'
},
{
'Key':'Name',
'Value':'java8server_ubuntu16-2b71edd1-f95e-4ee5-8fd6-d8a46975fdb5'
},
{
'Key':'os',
'Value':'ubuntu 16.04 lts'
},
{
'Key':'based_on_ami',
'Value':'ami-aa2ea8d0'
}
],
'VirtualizationType':'hvm'
}
],
'ResponseMetadata':{
'RequestId':'2c376c75-c31f-4aba-a058-173f3b125a00',
'HTTPStatusCode':200,
'HTTPHeaders':{
'content-type':'text/xml;charset=UTF-8',
'transfer-encoding':'chunked',
'vary':'Accept-Encoding',
'date':'Fri, 01 Dec 2017 18:17:53 GMT',
'server':'AmazonEC2'
},
'RetryAttempts':0
}
}
The best way to approach this type of problem is to find the value you're looking for, and then work outwards until you find a solution. You need to look at what is at each of those levels.
So, what are you looking for? You're looking for the Value for based_on_ami's Key. So your final step is going to be:
if obj['Key'] == 'based_on_ami':
    # do something with obj['Value'].
But how do you get there? Well, the object is inside of a list, so you'll need to iterate the list:
for tag in <some list>:
    if tag['Key'] == 'based_on_ami':
        # do something with tag['Value'].
What is that list? It's the list of tags:
for tag in image['Tags']:
    if tag['Key'] == 'based_on_ami':
        # do something with tag['Value'].
And where are those tags? In an image object that you find in a list:
for image in image_list:
    for tag in image['Tags']:
        if tag['Key'] == 'based_on_ami':
            # do something with tag['Value'].
The image list is the value found at the Images key in your initial dict.
image_list = my_data['Images']
for image in image_list:
    for tag in image['Tags']:
        if tag['Key'] == 'based_on_ami':
            # do something with tag['Value'].
And now you're collecting all of those values, so you'll need a list and you'll need to append to it:
result = []
image_list = my_data['Images']
for image in image_list:
    for tag in image['Tags']:
        if tag['Key'] == 'based_on_ami':
            result.append(tag['Value'])
So, I took your example above, and added another based_on_ami node with the value quack:
{'ResponseMetadata': {'RequestId': '2c376c75-c31f-4aba-a058-173f3b125a00', 'RetryAttempts': 0, 'HTTPHeaders': {'vary': 'Accept-Encoding', 'transfer-encoding': 'chunked', 'server': 'AmazonEC2', 'content-type': 'text/xml;charset=UTF-8', 'date': 'Fri, 01 Dec 2017 18:17:53 GMT'}, 'HTTPStatusCode': 200}, 'Images': [{'Public': False, 'CreationDate': '2017-11-27T14:41:30.000Z', 'BlockDeviceMappings': [{'Ebs': {'SnapshotId': 'snap-0c10e8f5ced5b5240', 'VolumeSize': 8, 'Encrypted': False, 'VolumeType': 'gp2', 'DeleteOnTermination': True}, 'DeviceName': '/dev/sda1'}, {'VirtualName': 'ephemeral0', 'DeviceName': '/dev/sdb'}, {'VirtualName': 'ephemeral1', 'DeviceName': '/dev/sdc'}], 'OwnerId': '23452345234545', 'ImageLocation': '23452345234545/java8server_ubuntu16-2b71edd1-f95e-4ee5-8fd6-d8a46975fdb5', 'RootDeviceName': '/dev/sda1', 'ImageType': 'machine', 'Hypervisor': 'xen', 'RootDeviceType': 'ebs', 'State': 'available', 'Architecture': 'x86_64', 'Name': 'java8server_ubuntu16-2b71edd1-f95e-4ee5-8fd6-d8a46975fdb5', 'Tags': [{'Value': 'baseami', 'Key': 'service'}, {'Value': 'ami', 'Key': 'cloudservice'}, {'Value': 'java8server_ubuntu16-2b71edd1-f95e-4ee5-8fd6-d8a46975fdb5', 'Key': 'Name'}, {'Value': 'ubuntu 16.04 lts', 'Key': 'os'}, {'Value': 'ami-aa2ea8d0', 'Key': 'based_on_ami'}], 'EnaSupport': True, 'SriovNetSupport': 'simple', 'ImageId': 'ami-8e73e0f4'}, {'Public': False, 'CreationDate': '2017-11-27T14:41:30.000Z', 'BlockDeviceMappings': [{'Ebs': {'SnapshotId': 'snap-0c10e8f5ced5b5240', 'VolumeSize': 8, 'Encrypted': False, 'VolumeType': 'gp2', 'DeleteOnTermination': True}, 'DeviceName': '/dev/sda1'}, {'VirtualName': 'ephemeral0', 'DeviceName': '/dev/sdb'}, {'VirtualName': 'ephemeral1', 'DeviceName': '/dev/sdc'}], 'VirtualizationType': 'hvm', 'OwnerId': '23452345234545', 'ImageLocation': '23452345234545/java8server_ubuntu16-2b71edd1-f95e-4ee5-8fd6-d8a46975fdb5', 'RootDeviceName': '/dev/sda1', 'ImageType': 'machine', 'Hypervisor': 'xen', 'RootDeviceType': 'ebs', 'State': 'available', 'Architecture': 'x86_64', 'Name': 'java8server_ubuntu16-2b71edd1-f95e-4ee5-8fd6-d8a46975fdb5', 'Tags': [{'Value': 'baseami', 'Key': 'service'}, {'Value': 'ami', 'Key': 'cloudservice'}, {'Value': 'java8server_ubuntu16-2b71edd1-f95e-4ee5-8fd6-d8a46975fdb5', 'Key': 'Name'}, {'Value': 'ubuntu 16.04 lts', 'Key': 'os'}, {'Value': 'quack', 'Key': 'based_on_ami'}], 'EnaSupport': True, 'SriovNetSupport': 'simple', 'ImageId': 'ami-8e73e0f4'}]}
My result:
['ami-aa2ea8d0', 'quack']
info = {...}

tags = []
for image in info['Images']:
    for tag in image['Tags']:
        if tag['Key'] == 'based_on_ami':
            tags.append(tag['Value'])
print(tags)
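As a side note (not from the original answer), once you are looking at a single image it can be convenient to turn the Tags list into a plain dict, so any tag can be looked up by key:
for image in info['Images']:
    # turn [{'Key': ..., 'Value': ...}, ...] into {key: value, ...}
    tag_map = {tag['Key']: tag['Value'] for tag in image.get('Tags', [])}
    print(tag_map.get('based_on_ami'))  # None if the tag is missing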
