I am trying to build a nested dictionary in Python. Below are my input, my expected output, and the code I have tried.
This is my input:
input = [['10', 'PS_S1U_X2_LP', 'permit', 'origin', 'igp', 'RM_S1U_X2_LP'],
['20', '', 'permit', '', '', 'RM_S1U_X2_LP'],
['10', 'MPLS-LOOPBACK', 'permit', '', '', 'MPLS-LOOPBACK-RLFA'],
]
And my desired output is:
output = {
    "route_policy_list": [
        {
            "policy_terms": [],
            "route_policy_statement": [
                {
                    "entry": "10",
                    "prefix_list": "PS_S1U_X2_LP",
                    "action_statements": [
                        {
                            "action_value": "igp",
                            "action": "permit",
                            "action_statement": "origin"
                        }
                    ]
                },
                {
                    "entry": "20",
                    "prefix_list": "",
                    "action_statements": [
                        {
                            "action_value": "",
                            "action": "permit",
                            "action_statement": ""
                        }
                    ]
                }
            ],
            "name": "RM_S1U_X2_LP"
        },
        {
            "policy_terms": [],
            "route_policy_statement": [
                {
                    "entry": "10",
                    "prefix_list": "MPLS-LOOPBACK",
                    "action_statements": [
                        {
                            "action_value": "",
                            "action": "permit",
                            "action_statement": ""
                        }
                    ]
                }
            ],
            "name": "MPLS-LOOPBACK-RLFA"
        }
    ]
}
And I have tried this code:
from collections import defaultdict
res1 = defaultdict(list)
for fsm1 in input:
    name1 = fsm1.pop()
    action = fsm1[2]
    action_statement = fsm1[3]
    action_value = fsm1[4]
    item1 = dict(zip(['entry', 'prefix_list'], fsm1))
    res1['action'] = action
    res1['action_statement'] = action_statement
    res1['action_value'] = action_value
    res1[name].append(item1)
print(res1)
Please help me get the desired output shown above; I am new to coding and struggling to write this.
Here is the final code. I used the setdefault method to group the data first, then a simple for loop to shape the data in the requested way.
# Input
input = [['10', 'PS_S1U_X2_LP', 'permit', 'origin', 'igp', 'RM_S1U_X2_LP'],
['20', '', 'permit', '', '', 'RM_S1U_X2_LP'],
['10', 'MPLS-LOOPBACK', 'permit', '', '', 'MPLS-LOOPBACK-RLFA'],
]
# Main code
d = {}
final = []
for i in input:
    d.setdefault(i[-1], []).append(i[:-1])
for i, v in d.items():
    a = {}
    a["policy_terms"] = []
    a["route_policy_statement"] = [{"entry": j[0], "prefix_list": j[1], "action_statements": [{"action_value": j[4], "action": j[2], "action_statement": j[3]}]} for j in v]
    a["name"] = i
    final.append(a)
final_dict = {"route_policy_list": final}
print(final_dict)
# Output
# {'route_policy_list': [{'policy_terms': [], 'route_policy_statement': [{'entry': '10', 'prefix_list': 'PS_S1U_X2_LP', 'action_statements': [{'action_value': 'igp', 'action': 'permit', 'action_statement': 'origin'}]}, {'entry': '20', 'prefix_list': '', 'action_statements': [{'action_value': '', 'action': 'permit', 'action_statement': ''}]}], 'name': 'RM_S1U_X2_LP'}, {'policy_terms': [], 'route_policy_statement': [{'entry': '10', 'prefix_list': 'MPLS-LOOPBACK', 'action_statements': [{'action_value': '', 'action': 'permit', 'action_statement': ''}]}], 'name': 'MPLS-LOOPBACK-RLFA'}]}
I hope this helps!
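For comparison, here is a minimal sketch of the same grouping done with collections.defaultdict instead of setdefault; it assumes the same input list and key names as above:
from collections import defaultdict

# Group the statement dicts by the policy name (last element of each sublist)
grouped = defaultdict(list)
for entry, prefix, action, stmt, value, name in input:
    grouped[name].append({
        "entry": entry,
        "prefix_list": prefix,
        "action_statements": [{"action_value": value, "action": action, "action_statement": stmt}],
    })

# Wrap each group in the requested outer structure
final_dict = {"route_policy_list": [
    {"policy_terms": [], "route_policy_statement": stmts, "name": name}
    for name, stmts in grouped.items()
]}
print(final_dict)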
It seems like every sublist in input consists of the same order of data, so I would create another list of indices such as
indices = ['entry', 'prefix_list', 'action', 'action_statement', 'action_value', 'name']
and then just hard-code the values, because it seems you want specific values in specific places.
dic_list = []
for lst in input:
    dic = {'policy_terms': [],
           'route_policy_statements': {
               indices[0]: lst[0],
               indices[1]: lst[1],
               'action_statements': {
                   indices[2]: lst[2],
                   indices[3]: lst[3],
                   indices[4]: lst[4]
               },
               indices[5]: lst[5]
           }
          }
    dic_list.append(dic)
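For the first sublist of the sample input, this would produce roughly:
print(dic_list[0])
# {'policy_terms': [], 'route_policy_statements': {'entry': '10', 'prefix_list': 'PS_S1U_X2_LP', 'action_statements': {'action': 'permit', 'action_statement': 'origin', 'action_value': 'igp'}, 'name': 'RM_S1U_X2_LP'}}
Note that this gives one dictionary per row; grouping the rows by name, as in the approach above, would still be needed to match the exact requested output.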
{
    "success": true,
    "time": 1660441201,
    "currency": "RUB",
    "items": {
        "186150629_143865972": {
            "price": "300.00",
            "buy_order": 220,
            "avg_price": "279.405789",
            "popularity_7d": "19",
            "market_hash_name": "CS:GO Case Key",
            "ru_name": "\u041a\u043b\u044e\u0447 \u043e\u0442 \u043a\u0435\u0439\u0441\u0430 CS:GO",
            "ru_rarity": "\u0431\u0430\u0437\u043e\u0432\u043e\u0433\u043e \u043a\u043b\u0430\u0441\u0441\u0430",
            "ru_quality": "",
            "text_color": "D2D2D2",
            "bg_color": ""
        },
        "186150630_143865972": {
            "price": "993.06",
            "buy_order": 200.02,
            "avg_price": "573.320000",
            "popularity_7d": "1",
            "market_hash_name": "eSports Key",
            "ru_name": "\u041a\u043b\u044e\u0447 eSports",
            "ru_rarity": "\u0431\u0430\u0437\u043e\u0432\u043e\u0433\u043e \u043a\u043b\u0430\u0441\u0441\u0430",
            "ru_quality": "",
            "text_color": "D2D2D2",
            "bg_color": ""
        }
    }
}
So that is my dictionary. For example, I want to get all the values from "price" and "market_hash_name".
How should I iterate to get them?
Saving the input dictionary you have defined into the variable input_dict, we can extract this information using simple list comprehensions, either as tuples or as dictionaries :)
A step-by-step extraction can be seen here:
>>> input_dict['items']
{'186150630_143865972': {'popularity_7d': '1', 'bg_color': '', 'text_color': 'D2D2D2', 'ru_name': '\\u041a\\u043b\\u044e\\u0447 eSports', 'avg_price': '573.320000', 'price': '993.06', 'market_hash_name': 'eSports Key', 'buy_order': 200.02, 'ru_rarity': '\\u0431\\u0430\\u0437\\u043e\\u0432\\u043e\\u0433\\u043e \\u043a\\u043b\\u0430\\u0441\\u0441\\u0430', 'ru_quality': ''}, '186150629_143865972': {'popularity_7d': '19', 'bg_color': '', 'text_color': 'D2D2D2', 'ru_name': '\\u041a\\u043b\\u044e\\u0447 \\u043e\\u0442 \\u043a\\u0435\\u0439\\u0441\\u0430 CS:GO', 'avg_price': '279.405789', 'price': '300.00', 'market_hash_name': 'CS:GO Case Key', 'buy_order': 220, 'ru_rarity': '\\u0431\\u0430\\u0437\\u043e\\u0432\\u043e\\u0433\\u043e \\u043a\\u043b\\u0430\\u0441\\u0441\\u0430', 'ru_quality': ''}}
>>> list(input_dict['items'].values())
[{'popularity_7d': '1', 'bg_color': '', 'text_color': 'D2D2D2', 'ru_name': '\\u041a\\u043b\\u044e\\u0447 eSports', 'avg_price': '573.320000', 'price': '993.06', 'market_hash_name': 'eSports Key', 'buy_order': 200.02, 'ru_rarity': '\\u0431\\u0430\\u0437\\u043e\\u0432\\u043e\\u0433\\u043e \\u043a\\u043b\\u0430\\u0441\\u0441\\u0430', 'ru_quality': ''}, {'popularity_7d': '19', 'bg_color': '', 'text_color': 'D2D2D2', 'ru_name': '\\u041a\\u043b\\u044e\\u0447 \\u043e\\u0442 \\u043a\\u0435\\u0439\\u0441\\u0430 CS:GO', 'avg_price': '279.405789', 'price': '300.00', 'market_hash_name': 'CS:GO Case Key', 'buy_order': 220, 'ru_rarity': '\\u0431\\u0430\\u0437\\u043e\\u0432\\u043e\\u0433\\u043e \\u043a\\u043b\\u0430\\u0441\\u0441\\u0430', 'ru_quality': ''}]
>>> [(i['price'], i['market_hash_name']) for i in input_dict['items'].values()]
[('993.06', 'eSports Key'), ('300.00', 'CS:GO Case Key')]
>>> [{'price': i['price'], 'market_hash_name': i['market_hash_name']} for i in input_dict['items'].values()]
[{'price': '993.06', 'market_hash_name': 'eSports Key'}, {'price': '300.00', 'market_hash_name': 'CS:GO Case Key'}]
As you can see, the final two extractions give a tuple and a dictionary result with the information you need.
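If you prefer an explicit loop over a comprehension, an equivalent sketch (assuming the same input_dict as above) would be:
results = []
for item in input_dict['items'].values():
    # Pick out only the two fields of interest from each listing
    results.append({
        'price': item['price'],
        'market_hash_name': item['market_hash_name'],
    })
print(results)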
Convert to a DataFrame, then just select those two columns.
import pandas as pd
import json
data = '''{
    "success": true,
    "time": 1660441201,
    "currency": "RUB",
    "items": {
        "186150629_143865972": {
            "price": "300.00",
            "buy_order": 220,
            "avg_price": "279.405789",
            "popularity_7d": "19",
            "market_hash_name": "CS:GO Case Key",
            "ru_name": "\u041a\u043b\u044e\u0447 \u043e\u0442 \u043a\u0435\u0439\u0441\u0430 CS:GO",
            "ru_rarity": "\u0431\u0430\u0437\u043e\u0432\u043e\u0433\u043e \u043a\u043b\u0430\u0441\u0441\u0430",
            "ru_quality": "",
            "text_color": "D2D2D2",
            "bg_color": ""
        },
        "186150630_143865972": {
            "price": "993.06",
            "buy_order": 200.02,
            "avg_price": "573.320000",
            "popularity_7d": "1",
            "market_hash_name": "eSports Key",
            "ru_name": "\u041a\u043b\u044e\u0447 eSports",
            "ru_rarity": "\u0431\u0430\u0437\u043e\u0432\u043e\u0433\u043e \u043a\u043b\u0430\u0441\u0441\u0430",
            "ru_quality": "",
            "text_color": "D2D2D2",
            "bg_color": ""
        }
    }
}'''
jsonData = json.loads(data)
df = pd.DataFrame(jsonData['items']).T
df = df[['price', 'market_hash_name']]
Output:
print(df)
price market_hash_name
186150629_143865972 300.00 CS:GO Case Key
186150630_143865972 993.06 eSports Key
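If you then want plain Python objects rather than a DataFrame, the usual pandas accessors work here too; for example, with the df built above:
prices = df['price'].tolist()            # ['300.00', '993.06']
names = df['market_hash_name'].tolist()  # ['CS:GO Case Key', 'eSports Key']
records = df.to_dict('records')          # [{'price': '300.00', 'market_hash_name': 'CS:GO Case Key'}, ...]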
I created a data pipeline with Apache Beam, but it cannot insert the data into BigQuery.
I use beam.ParDo to process the data and yield it row by row; the code is below.
import pandas as pd
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

project = 'project_name'
dataset = 'XXX'

class parser_data(beam.DoFn):
    def process(self, data):
        ZZ = [{"NN": d["NNN"], "descrip": d} for d in data["colZ"]]
        ret = pd.DataFrame(data['colD'])
        ret["colA"] = data["colA"]
        ret["colB"] = data["colB"]
        ret["colC"] = data["colC"]
        ret = pd.merge(ret, pd.DataFrame(ZZ), on=["NN"], how="left")
        ret = ret[["colA", "colB", "colC", "NN", "sample", "descrip"]]
        print(ret)
        ret_dict = ret.to_dict("records")
        print(ret_dict)
        for i in range(len(ret_dict)):
            yield ret_dict[i]
options = PipelineOptions(
    runner='DirectRunner',
    region='us-west1',
    project=project,
    job_name="test-tmp",
    streaming=False,
    setup_file='./setup.py',
    subnetwork="XXXXXXX",
    service_account_email="XXXXXX",
    temp_location='XXXXXX',
    staging_location="XXXXXX",
    use_public_ips=False
)
d = {
    'colA': '1',
    'colB': 'Strawberry',
    'colC': 2,
    'colD': [{"NN": "AA", "sample": 1}, {"NN": "AA", "sample": 2}, {"NN": "BB", "sample": 3}, {"NN": "CC", "sample": 4}, {"NN": "CC", "sample": 5}],
    'colZ': [{"NNN": "AA", "name": "123", "timeperiod": "152"}, {"NNN": "BB", "name": "1212513", "timeperiod": "1952"}, {"NNN": "CC", "name": "13", "timeperiod": "14152"}],
}
schema = {
    'fields': [
        {'name': 'colA', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'colB', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'colC', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'NN', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'sample', 'type': 'STRING', 'mode': 'REQUIRED'},
        {
            'name': 'descrip', 'type': 'RECORD', 'mode': 'NULLABLE',
            'fields': [
                {"name": "NNN", "type": "STRING", 'mode': 'NULLABLE'},
                {"name": "name", "type": "STRING", 'mode': 'NULLABLE'},
                {"name": "timeperiod", "type": "STRING", 'mode': 'NULLABLE'},
            ]
        },
    ]
}
with beam.Pipeline(options=options) as pipeline:
    data = (
        pipeline | 'get data' >> beam.Create([d])
    )
    ret_A = (
        data | "Process A data " >> beam.ParDo(parser_data())
             | "Insert data into BQ" >> beam.io.WriteToBigQuery(
                 f"{project}:{dataset}.TestJsonData",
                 schema=schema,
                 create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                 write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
             )
    )
The error is below
RuntimeError: BigQuery job beam_bq_job_LOAD_testtmp_LOAD_STEP_820_9672b886a985a9a36a9c3805cee3be5e_3f26019c07d746ef92c0893574156f5b failed. Error Result: <ErrorProto
location: 'gs://XXXXXXXX/dataflow_temp/bq_load/db11e8430c10470382be2565136d53fb/{project}.{dataset}.TestJsonData/39e0d645-8484-4033-a1c4-3e4a825d6fee'
message: 'Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.'
reason: 'invalid'> [while running '[25]: Insert data into BQ/BigQueryBatchFileLoads/WaitForDestinationLoadJobs']
Also, the print calls do show the data, so I think the problem is in the BigQuery schema, but I cannot find it.
Does anyone have any idea?
I have a dataframe and have to convert it into nested JSON.
countryname  name  text   score
UK           ABC   Hello  5
Right now, I have some code that generates JSON, grouping countryname and name.
However, I want to group first by countryname and then by name. Below is the code and output:
cols = test.columns.difference(['countryname','name'])
j = (test.groupby(['countryname','name'])[cols]
.apply(lambda x: x.to_dict('r'))
.reset_index(name='results')
.to_json(orient='records'))
test_json = json.dumps(json.loads(j), indent=4)
Output:
[
{
"countryname":"UK"
"name":"ABC"
"results":[
{
"text":"Hello"
"score":"5"
}
]
}
]
However, I am expecting an output like this:
[
{
"countryname":"UK"
{
"name":"ABC"
"results":[
{
"text":"Hello"
"score":"5"
}
]
}
}
]
Can anyone please help in fixing this?
This would be the valid JSON. Note the comma usage; commas are required, as you may check here.
[
    {
        "countryname": "UK",
        "name": "ABC",
        "results": [
            {
                "text": "Hello",
                "score": "5"
            }
        ]
    }
]
The other output you are trying to achieve is also not standard JSON:
[{
    "countryname": "UK",
    "you need a name in here": {
        "name": "ABC",
        "results": [{
            "text": "Hello",
            "score": "5"
        }]
    }
}]
I improved that so you can figure out what name to use.
For custom JSON output, you will need to use a custom function to reformat your object first.
l = df.to_dict('records')[0]  # to get the dict for the first row
print(l, type(l))  # {'countryname': 'UK', 'name': 'ABC', 'text': 'Hello', 'score': 5} <class 'dict'>
e = l['countryname']
print(e)  # UK
o = [{
    "countryname": l['countryname'],
    "you need a name in here": {
        "name": l['name'],
        "results": [{
            "text": l['text'],
            "score": l['score']
        }]
    }
}]
print(o)  # [{'countryname': 'UK', 'you need a name in here': {'name': 'ABC', 'results': [{'text': 'Hello', 'score': 5}]}}]
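If the DataFrame (called test in the question) has more than one row, a sketch of the same idea applied per group might look like the following; 'by_name' is just a placeholder key, since valid JSON needs some key at that level, as pointed out above:
nested = []
for country, country_grp in test.groupby('countryname'):
    nested.append({
        'countryname': country,
        'by_name': [
            # One entry per name within the country, with its rows as records
            {'name': name, 'results': sub[['text', 'score']].to_dict('records')}
            for name, sub in country_grp.groupby('name')
        ],
    })
print(nested)
json.dumps(nested, indent=4) can then pretty-print it, provided the score values are plain JSON-serializable types.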
I would like to tag the host that I am spinning up using the boto3 Python API.
response = client.allocate_hosts(
    AutoPlacement='on'|'off',
    AvailabilityZone='string',
    ClientToken='string',
    InstanceType='string',
    Quantity=123,
    TagSpecifications=[
        {
            'ResourceType': 'dedicated-host',
            'Tags': [
                {
                    'Key': 'string',
                    'Value': 'string'
                },
            ]
        },
    ])
Here is what I am doing. Availability Zone, Instance Type, and Quantity are parameterized, and I use a dictionary to pass the input data:
count = 10
input_dict = {}
input_dict['AvailabilityZone'] = 'us-east-1a'
input_dict['InstanceType'] = 'c5.large'
input_dict['Quantity'] = int(count)
instance = client.allocate_hosts(**input_dict,)
print(str(instance))
This code works for me, but I need to tag the resource too:
TagSpecifications=[
    {
        'ResourceType': 'customer-gateway'|'dedicated-host'|'dhcp-options'|'elastic-ip'|'fleet'|'fpga-image'|'image'|'instance'|'internet-gateway'|'launch-template'|'natgateway'|'network-acl'|'network-interface'|'reserved-instances'|'route-table'|'security-group'|'snapshot'|'spot-instances-request'|'subnet'|'transit-gateway'|'transit-gateway-attachment'|'transit-gateway-route-table'|'volume'|'vpc'|'vpc-peering-connection'|'vpn-connection'|'vpn-gateway',
        'Tags': [
            {
                'Key': 'string',
                'Value': 'string'
            },
        ]
    },
]
How can I add that to the dictionary? It seems like TagSpecifications has dictionaries nested inside dictionaries, and I am making syntax errors. I tried the code below without success.
input_dict['TagSpecifications'] = [{'ResourceType':'dedicated-host','Tags':[{'key':'Name','Value':'demo'},]},]
The easiest way is to simply pass values directly:
response = client.allocate_hosts(
    AvailabilityZone='us-east-1a',
    InstanceType='c5.large',
    Quantity=10,
    TagSpecifications=[
        {
            'ResourceType': 'dedicated-host',
            'Tags': [
                {
                    'Key': 'Name',
                    'Value': 'Demo'
                }
            ]
        }
    ])
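If you would rather keep building the call from input_dict as in the question, the same structure can be assigned to the 'TagSpecifications' key. Note that the tag key field is spelled 'Key' with a capital K; the attempted line in the question uses lowercase 'key', which boto3's parameter validation will typically reject. A sketch:
input_dict['TagSpecifications'] = [
    {
        'ResourceType': 'dedicated-host',
        'Tags': [
            {'Key': 'Name', 'Value': 'demo'},  # 'Key' and 'Value' must be capitalized
        ],
    },
]
instance = client.allocate_hosts(**input_dict)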
Does anyone have any sample Groovy code to convert a JSON document to a CSV file? I have tried searching on Google, but to no avail.
Example input (from comment):
[ company_id: '1',
  web_address: 'vodafone.com/',
  phone: '+44 11111',
  fax: '',
  email: '',
  addresses: [
    [ type: "office",
      street_address: "Vodafone House, The Connection",
      zip_code: "RG14 2FN",
      geo: [ lat: 51.4145, lng: 1.318385 ] ]
  ],
  number_of_employees: 91272,
  naics: [
    primary: [
      "517210": "Wireless Telecommunications Carriers (except Satellite)" ],
    secondary: [
      "517110": "Wired Telecommunications Carriers",
      "517919": "Internet Service Providers",
      "518210": "Web Hosting"
    ]
  ]
]
More info from an edit:
def export() {
    def exportCsv = [ [ id:'1', color:'red', planet:'mars', description:'Mars, the "red" planet'],
                      [ id:'2', color:'green', planet:'neptune', description:'Neptune, the "green" planet'],
                      [ id:'3', color:'blue', planet:'earth', description:'Earth, the "blue" planet'],
                    ]
    def out = new File('/home/mandeep/groovy/workspace/FirstGroovyProject/src/test.csv')
    exportCsv.each {
        def row = [it.id, it.color, it.planet, it.description]
        out.append row.join(',')
        out.append '\n'
    }
    return out
}
Ok, how's this:
import groovy.json.*

// Added extra fields and types for testing
def js = '''{"infile": [{"field1": 11,"field2": 12, "field3": 13},
                        {"field1": 21, "field4": "dave","field3": 23},
                        {"field1": 31,"field2": 32, "field3": 33}]}'''

def data = new JsonSlurper().parseText( js )
def columns = data.infile*.keySet().flatten().unique()

// Wrap strings in double quotes, and remove nulls
def encode = { e -> e == null ? '' : e instanceof String ? /"$e"/ : "$e" }

// Print all the column names
println columns.collect { c -> encode( c ) }.join( ',' )

// Then create all the rows
println data.infile.collect { row ->
    // A row at a time
    columns.collect { colName -> encode( row[ colName ] ) }.join( ',' )
}.join( '\n' )
That prints:
"field3","field2","field1","field4"
13,12,11,
23,,21,"dave"
33,32,31,
Which looks correct to me