I am trying to build a nested dictionary in Python. Below are my input, my expected output, and the code I have tried.
This is my input:
input = [['10', 'PS_S1U_X2_LP', 'permit', 'origin', 'igp', 'RM_S1U_X2_LP'],
['20', '', 'permit', '', '', 'RM_S1U_X2_LP'],
['10', 'MPLS-LOOPBACK', 'permit', '', '', 'MPLS-LOOPBACK-RLFA'],
]
And my desired output is:
output = {
    "route_policy_list": [
        {
            "policy_terms": [],
            "route_policy_statement": [
                {
                    "entry": "10",
                    "prefix_list": "PS_S1U_X2_LP",
                    "action_statements": [
                        {
                            "action_value": "igp",
                            "action": "permit",
                            "action_statement": "origin"
                        }
                    ]
                },
                {
                    "entry": "20",
                    "prefix_list": "",
                    "action_statements": [
                        {
                            "action_value": "",
                            "action": "permit",
                            "action_statement": ""
                        }
                    ]
                }
            ],
            "name": "RM_S1U_X2_LP"
        },
        {
            "policy_terms": [],
            "route_policy_statement": [
                {
                    "entry": "10",
                    "prefix_list": "MPLS-LOOPBACK",
                    "action_statements": [
                        {
                            "action_value": "",
                            "action": "permit",
                            "action_statement": ""
                        }
                    ]
                }
            ],
            "name": "MPLS-LOOPBACK-RLFA"
        }
    ]
}
And I have tried this code:
from collections import defaultdict
res1 = defaultdict(list)
for fsm1 in input:
    name1 = fsm1.pop()
    action = fsm1[2]
    action_statement = fsm1[3]
    action_value = fsm1[4]
    item1 = dict(zip(['entry', 'prefix_list'], fsm1))
    res1['action'] = action
    res1['action_statement'] = action_statement
    res1['action_value'] = action_value
    res1[name].append(item1)
print(res1)
Please help me get the desired output shown above; I am new to coding and struggling to write this.
Here is the final code. I used the setdefault method to group the data first, then a simple for loop to shape the data in the requested way.
# Input
input = [['10', 'PS_S1U_X2_LP', 'permit', 'origin', 'igp', 'RM_S1U_X2_LP'],
['20', '', 'permit', '', '', 'RM_S1U_X2_LP'],
['10', 'MPLS-LOOPBACK', 'permit', '', '', 'MPLS-LOOPBACK-RLFA'],
]
# Main code
d = {}
final = []
for i in input:
    d.setdefault(i[-1], []).append(i[:-1])
for i, v in d.items():
    a = {}
    a["policy_terms"] = []
    a["route_policy_statement"] = [{"entry": j[0], "prefix_list": j[1], "action_statements": [{"action_value": j[4], "action": j[2], "action_statement": j[3]}]} for j in v]
    a["name"] = i
    final.append(a)
final_dict = {"route_policy_list": final}
print(final_dict)
# Output
# {'route_policy_list': [{'policy_terms': [], 'route_policy_statement': [{'entry': '10', 'prefix_list': 'PS_S1U_X2_LP', 'action_statements': [{'action_value': 'igp', 'action': 'permit', 'action_statement': 'origin'}]}, {'entry': '20', 'prefix_list': '', 'action_statements': [{'action_value': '', 'action': 'permit', 'action_statement': ''}]}], 'name': 'RM_S1U_X2_LP'}, {'policy_terms': [], 'route_policy_statement': [{'entry': '10', 'prefix_list': 'MPLS-LOOPBACK', 'action_statements': [{'action_value': '', 'action': 'permit', 'action_statement': ''}]}], 'name': 'MPLS-LOOPBACK-RLFA'}]}
I hope this helps!
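For comparison, here is a minimal sketch of the same grouping done with collections.defaultdict instead of setdefault; it assumes the same input list and key names as above:
from collections import defaultdict

# Group the statement dicts by the policy name (last element of each sublist)
grouped = defaultdict(list)
for entry, prefix, action, stmt, value, name in input:
    grouped[name].append({
        "entry": entry,
        "prefix_list": prefix,
        "action_statements": [{"action_value": value, "action": action, "action_statement": stmt}],
    })

# Wrap each group in the requested outer structure
final_dict = {"route_policy_list": [
    {"policy_terms": [], "route_policy_statement": stmts, "name": name}
    for name, stmts in grouped.items()
]}
print(final_dict)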
It seems like every sublist in input consists of the same order of data, so I would create another list of indices such as
indices = ['entry', 'prefix_list', 'action', 'action_statement', 'action_value', 'name']
and then just hard-code the values, because it seems you want specific values in specific places.
dic_list = []
for lst in input:
    dic = {'policy_terms': [],
           'route_policy_statements': {
               indices[0]: lst[0],
               indices[1]: lst[1],
               'action_statements': {
                   indices[2]: lst[2],
                   indices[3]: lst[3],
                   indices[4]: lst[4]
               },
               indices[5]: lst[5]
           }
          }
    dic_list.append(dic)
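For the first sublist of the sample input, this would produce roughly:
print(dic_list[0])
# {'policy_terms': [], 'route_policy_statements': {'entry': '10', 'prefix_list': 'PS_S1U_X2_LP', 'action_statements': {'action': 'permit', 'action_statement': 'origin', 'action_value': 'igp'}, 'name': 'RM_S1U_X2_LP'}}
Note that this gives one dictionary per row; grouping the rows by name, as in the approach above, would still be needed to match the exact requested output.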
{
    "success": true,
    "time": 1660441201,
    "currency": "RUB",
    "items": {
        "186150629_143865972": {
            "price": "300.00",
            "buy_order": 220,
            "avg_price": "279.405789",
            "popularity_7d": "19",
            "market_hash_name": "CS:GO Case Key",
            "ru_name": "\u041a\u043b\u044e\u0447 \u043e\u0442 \u043a\u0435\u0439\u0441\u0430 CS:GO",
            "ru_rarity": "\u0431\u0430\u0437\u043e\u0432\u043e\u0433\u043e \u043a\u043b\u0430\u0441\u0441\u0430",
            "ru_quality": "",
            "text_color": "D2D2D2",
            "bg_color": ""
        },
        "186150630_143865972": {
            "price": "993.06",
            "buy_order": 200.02,
            "avg_price": "573.320000",
            "popularity_7d": "1",
            "market_hash_name": "eSports Key",
            "ru_name": "\u041a\u043b\u044e\u0447 eSports",
            "ru_rarity": "\u0431\u0430\u0437\u043e\u0432\u043e\u0433\u043e \u043a\u043b\u0430\u0441\u0441\u0430",
            "ru_quality": "",
            "text_color": "D2D2D2",
            "bg_color": ""
        }
    }
}
So that is my dictionary. For example, I want to get all the values from "price" and "market_hash_name".
How should I iterate to get them?
Saving the input dictionary you have defined into the variable input_dict, we can extract this information using simple list comprehensions, either as tuples or as dictionaries :)
A step-by-step extraction can be seen here:
>>> input_dict['items']
{'186150630_143865972': {'popularity_7d': '1', 'bg_color': '', 'text_color': 'D2D2D2', 'ru_name': '\\u041a\\u043b\\u044e\\u0447 eSports', 'avg_price': '573.320000', 'price': '993.06', 'market_hash_name': 'eSports Key', 'buy_order': 200.02, 'ru_rarity': '\\u0431\\u0430\\u0437\\u043e\\u0432\\u043e\\u0433\\u043e \\u043a\\u043b\\u0430\\u0441\\u0441\\u0430', 'ru_quality': ''}, '186150629_143865972': {'popularity_7d': '19', 'bg_color': '', 'text_color': 'D2D2D2', 'ru_name': '\\u041a\\u043b\\u044e\\u0447 \\u043e\\u0442 \\u043a\\u0435\\u0439\\u0441\\u0430 CS:GO', 'avg_price': '279.405789', 'price': '300.00', 'market_hash_name': 'CS:GO Case Key', 'buy_order': 220, 'ru_rarity': '\\u0431\\u0430\\u0437\\u043e\\u0432\\u043e\\u0433\\u043e \\u043a\\u043b\\u0430\\u0441\\u0441\\u0430', 'ru_quality': ''}}
>>> list(input_dict['items'].values())
[{'popularity_7d': '1', 'bg_color': '', 'text_color': 'D2D2D2', 'ru_name': '\\u041a\\u043b\\u044e\\u0447 eSports', 'avg_price': '573.320000', 'price': '993.06', 'market_hash_name': 'eSports Key', 'buy_order': 200.02, 'ru_rarity': '\\u0431\\u0430\\u0437\\u043e\\u0432\\u043e\\u0433\\u043e \\u043a\\u043b\\u0430\\u0441\\u0441\\u0430', 'ru_quality': ''}, {'popularity_7d': '19', 'bg_color': '', 'text_color': 'D2D2D2', 'ru_name': '\\u041a\\u043b\\u044e\\u0447 \\u043e\\u0442 \\u043a\\u0435\\u0439\\u0441\\u0430 CS:GO', 'avg_price': '279.405789', 'price': '300.00', 'market_hash_name': 'CS:GO Case Key', 'buy_order': 220, 'ru_rarity': '\\u0431\\u0430\\u0437\\u043e\\u0432\\u043e\\u0433\\u043e \\u043a\\u043b\\u0430\\u0441\\u0441\\u0430', 'ru_quality': ''}]
>>> [(i['price'], i['market_hash_name']) for i in input_dict['items'].values()]
[('993.06', 'eSports Key'), ('300.00', 'CS:GO Case Key')]
>>> [{'price': i['price'], 'market_hash_name': i['market_hash_name']} for i in input_dict['items'].values()]
[{'price': '993.06', 'market_hash_name': 'eSports Key'}, {'price': '300.00', 'market_hash_name': 'CS:GO Case Key'}]
As you can see, the final two extractions give a tuple and a dictionary result with the information you need.
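If you prefer an explicit loop over a comprehension, an equivalent sketch (assuming the same input_dict as above) would be:
results = []
for item in input_dict['items'].values():
    # Pick out only the two fields of interest from each listing
    results.append({
        'price': item['price'],
        'market_hash_name': item['market_hash_name'],
    })
print(results)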
Convert to a DataFrame, then just select those two columns.
import pandas as pd
import json
data = '''{
    "success": true,
    "time": 1660441201,
    "currency": "RUB",
    "items": {
        "186150629_143865972": {
            "price": "300.00",
            "buy_order": 220,
            "avg_price": "279.405789",
            "popularity_7d": "19",
            "market_hash_name": "CS:GO Case Key",
            "ru_name": "\u041a\u043b\u044e\u0447 \u043e\u0442 \u043a\u0435\u0439\u0441\u0430 CS:GO",
            "ru_rarity": "\u0431\u0430\u0437\u043e\u0432\u043e\u0433\u043e \u043a\u043b\u0430\u0441\u0441\u0430",
            "ru_quality": "",
            "text_color": "D2D2D2",
            "bg_color": ""
        },
        "186150630_143865972": {
            "price": "993.06",
            "buy_order": 200.02,
            "avg_price": "573.320000",
            "popularity_7d": "1",
            "market_hash_name": "eSports Key",
            "ru_name": "\u041a\u043b\u044e\u0447 eSports",
            "ru_rarity": "\u0431\u0430\u0437\u043e\u0432\u043e\u0433\u043e \u043a\u043b\u0430\u0441\u0441\u0430",
            "ru_quality": "",
            "text_color": "D2D2D2",
            "bg_color": ""
        }
    }
}'''
jsonData = json.loads(data)
df = pd.DataFrame(jsonData['items']).T
df = df[['price', 'market_hash_name']]
Output:
print(df)
price market_hash_name
186150629_143865972 300.00 CS:GO Case Key
186150630_143865972 993.06 eSports Key
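If you then want plain Python objects rather than a DataFrame, the usual pandas accessors work here too; for example, with the df built above:
prices = df['price'].tolist()            # ['300.00', '993.06']
names = df['market_hash_name'].tolist()  # ['CS:GO Case Key', 'eSports Key']
records = df.to_dict('records')          # [{'price': '300.00', 'market_hash_name': 'CS:GO Case Key'}, ...]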
I created a data pipeline with Apache Beam, but it cannot insert the data into BigQuery.
I use beam.ParDo to process the data and yield it row by row; the code is below.
import pandas as pd
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

project = 'project_name'
dataset = 'XXX'

class parser_data(beam.DoFn):
    def process(self, data):
        ZZ = [{"NN": d["NNN"], "descrip": d} for d in data["colZ"]]
        ret = pd.DataFrame(data['colD'])
        ret["colA"] = data["colA"]
        ret["colB"] = data["colB"]
        ret["colC"] = data["colC"]
        ret = pd.merge(ret, pd.DataFrame(ZZ), on=["NN"], how="left")
        ret = ret[["colA", "colB", "colC", "NN", "sample", "descrip"]]
        print(ret)
        ret_dict = ret.to_dict("records")
        print(ret_dict)
        for i in range(len(ret_dict)):
            yield ret_dict[i]
options = PipelineOptions(
    runner='DirectRunner',
    region='us-west1',
    project=project,
    job_name="test-tmp",
    streaming=False,
    setup_file='./setup.py',
    subnetwork="XXXXXXX",
    service_account_email="XXXXXX",
    temp_location='XXXXXX',
    staging_location="XXXXXX",
    use_public_ips=False
)
d = {
    'colA': '1',
    'colB': 'Strawberry',
    'colC': 2,
    'colD': [{"NN": "AA", "sample": 1}, {"NN": "AA", "sample": 2}, {"NN": "BB", "sample": 3}, {"NN": "CC", "sample": 4}, {"NN": "CC", "sample": 5}],
    'colZ': [{"NNN": "AA", "name": "123", "timeperiod": "152"}, {"NNN": "BB", "name": "1212513", "timeperiod": "1952"}, {"NNN": "CC", "name": "13", "timeperiod": "14152"}],
}
schema = {
    'fields': [
        {'name': 'colA', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'colB', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'colC', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'NN', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'sample', 'type': 'STRING', 'mode': 'REQUIRED'},
        {
            'name': 'descrip', 'type': 'RECORD', 'mode': 'NULLABLE',
            'fields': [
                {"name": "NNN", "type": "STRING", 'mode': 'NULLABLE'},
                {"name": "name", "type": "STRING", 'mode': 'NULLABLE'},
                {"name": "timeperiod", "type": "STRING", 'mode': 'NULLABLE'},
            ]
        },
    ]
}
with beam.Pipeline(options=options) as pipeline:
    data = (
        pipeline | 'get data' >> beam.Create([d])
    )
    ret_A = (
        data | "Process A data " >> beam.ParDo(parser_data())
             | "Insert data into BQ" >> beam.io.WriteToBigQuery(
                 f"{project}:{dataset}.TestJsonData",
                 schema=schema,
                 create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                 write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
             )
    )
The error is below
RuntimeError: BigQuery job beam_bq_job_LOAD_testtmp_LOAD_STEP_820_9672b886a985a9a36a9c3805cee3be5e_3f26019c07d746ef92c0893574156f5b failed. Error Result: <ErrorProto
location: 'gs://XXXXXXXX/dataflow_temp/bq_load/db11e8430c10470382be2565136d53fb/{project}.{dataset}.TestJsonData/39e0d645-8484-4033-a1c4-3e4a825d6fee'
message: 'Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.'
reason: 'invalid'> [while running '[25]: Insert data into BQ/BigQueryBatchFileLoads/WaitForDestinationLoadJobs']
Also, the print calls do show the data, so I think the problem is in the BigQuery schema, but I cannot find it.
Does anyone have any idea?
I have a dataframe and have to convert it into nested JSON.
countryname  name  text   score
UK           ABC   Hello  5
Right now, I have some code that generates JSON, grouping countryname and name.
However, I want to group first by countryname and then by name. Below is the code and output:
cols = test.columns.difference(['countryname','name'])
j = (test.groupby(['countryname','name'])[cols]
.apply(lambda x: x.to_dict('r'))
.reset_index(name='results')
.to_json(orient='records'))
test_json = json.dumps(json.loads(j), indent=4)
Output:
[
{
"countryname":"UK"
"name":"ABC"
"results":[
{
"text":"Hello"
"score":"5"
}
]
}
]
However, I am expecting an output like this:
[
{
"countryname":"UK"
{
"name":"ABC"
"results":[
{
"text":"Hello"
"score":"5"
}
]
}
}
]
Can anyone please help in fixing this?
This would be the valid JSON. Note the comma usage; commas are required, as you may check here.
[
    {
        "countryname": "UK",
        "name": "ABC",
        "results": [
            {
                "text": "Hello",
                "score": "5"
            }
        ]
    }
]
The other output you are trying to achieve is also not standard JSON:
[{
    "countryname": "UK",
    "you need a name in here": {
        "name": "ABC",
        "results": [{
            "text": "Hello",
            "score": "5"
        }]
    }
}]
I improved that so you can figure out what name to use.
For custom JSON output, you will need to use a custom function to reformat your object first.
l = df.to_dict('records')[0]  # to get the dict for the first row
print(l, type(l))  # {'countryname': 'UK', 'name': 'ABC', 'text': 'Hello', 'score': 5} <class 'dict'>
e = l['countryname']
print(e)  # UK
o = [{
    "countryname": l['countryname'],
    "you need a name in here": {
        "name": l['name'],
        "results": [{
            "text": l['text'],
            "score": l['score']
        }]
    }
}]
print(o)  # [{'countryname': 'UK', 'you need a name in here': {'name': 'ABC', 'results': [{'text': 'Hello', 'score': 5}]}}]
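If the DataFrame (called test in the question) has more than one row, a sketch of the same idea applied per group might look like the following; 'by_name' is just a placeholder key, since valid JSON needs some key at that level, as pointed out above:
nested = []
for country, country_grp in test.groupby('countryname'):
    nested.append({
        'countryname': country,
        'by_name': [
            # One entry per name within the country, with its rows as records
            {'name': name, 'results': sub[['text', 'score']].to_dict('records')}
            for name, sub in country_grp.groupby('name')
        ],
    })
print(nested)
json.dumps(nested, indent=4) can then pretty-print it, provided the score values are plain JSON-serializable types.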
I would like to tag the host that I am spinning up using the boto3 Python API.
response = client.allocate_hosts(
    AutoPlacement='on'|'off',
    AvailabilityZone='string',
    ClientToken='string',
    InstanceType='string',
    Quantity=123,
    TagSpecifications=[
        {
            'ResourceType': 'dedicated-host',
            'Tags': [
                {
                    'Key': 'string',
                    'Value': 'string'
                },
            ]
        },
    ])
Here is what I am doing. Availability Zone, Instance Type, and Quantity are parameterized, and I use a dictionary to pass the input data:
count = 10
input_dict = {}
input_dict['AvailabilityZone'] = 'us-east-1a'
input_dict['InstanceType'] = 'c5.large'
input_dict['Quantity'] = int(count)
instance = client.allocate_hosts(**input_dict,)
print(str(instance))
This code works for me, but I need to tag the resource too:
TagSpecifications=[
    {
        'ResourceType': 'customer-gateway'|'dedicated-host'|'dhcp-options'|'elastic-ip'|'fleet'|'fpga-image'|'image'|'instance'|'internet-gateway'|'launch-template'|'natgateway'|'network-acl'|'network-interface'|'reserved-instances'|'route-table'|'security-group'|'snapshot'|'spot-instances-request'|'subnet'|'transit-gateway'|'transit-gateway-attachment'|'transit-gateway-route-table'|'volume'|'vpc'|'vpc-peering-connection'|'vpn-connection'|'vpn-gateway',
        'Tags': [
            {
                'Key': 'string',
                'Value': 'string'
            },
        ]
    },
]
How can I add that to the dictionary? It seems like TagSpecifications has dictionaries nested inside dictionaries, and I am making syntax errors. I tried the code below without success.
input_dict['TagSpecifications'] = [{'ResourceType':'dedicated-host','Tags':[{'key':'Name','Value':'demo'},]},]
The easiest way is to simply pass values directly:
response = client.allocate_hosts(
    AvailabilityZone='us-east-1a',
    InstanceType='c5.large',
    Quantity=10,
    TagSpecifications=[
        {
            'ResourceType': 'dedicated-host',
            'Tags': [
                {
                    'Key': 'Name',
                    'Value': 'Demo'
                }
            ]
        }
    ])
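If you would rather keep building the call from input_dict as in the question, the same structure can be assigned to the 'TagSpecifications' key. Note that the tag key field is spelled 'Key' with a capital K; the attempted line in the question uses lowercase 'key', which boto3's parameter validation will typically reject. A sketch:
input_dict['TagSpecifications'] = [
    {
        'ResourceType': 'dedicated-host',
        'Tags': [
            {'Key': 'Name', 'Value': 'demo'},  # 'Key' and 'Value' must be capitalized
        ],
    },
]
instance = client.allocate_hosts(**input_dict)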
Does anyone have any sample Groovy code to convert a JSON document to a CSV file? I have tried searching on Google, but to no avail.
Example input (from comment):
[ company_id: '1',
  web_address: 'vodafone.com/',
  phone: '+44 11111',
  fax: '',
  email: '',
  addresses: [
    [ type: "office",
      street_address: "Vodafone House, The Connection",
      zip_code: "RG14 2FN",
      geo: [ lat: 51.4145, lng: 1.318385 ] ]
  ],
  number_of_employees: 91272,
  naics: [
    primary: [
      "517210": "Wireless Telecommunications Carriers (except Satellite)" ],
    secondary: [
      "517110": "Wired Telecommunications Carriers",
      "517919": "Internet Service Providers",
      "518210": "Web Hosting"
    ]
  ]
]
More info from an edit:
def export() {
    def exportCsv = [ [ id:'1', color:'red', planet:'mars', description:'Mars, the "red" planet'],
                      [ id:'2', color:'green', planet:'neptune', description:'Neptune, the "green" planet'],
                      [ id:'3', color:'blue', planet:'earth', description:'Earth, the "blue" planet'],
                    ]
    def out = new File('/home/mandeep/groovy/workspace/FirstGroovyProject/src/test.csv')
    exportCsv.each {
        def row = [it.id, it.color, it.planet, it.description]
        out.append row.join(',')
        out.append '\n'
    }
    return out
}
Ok, how's this:
import groovy.json.*

// Added extra fields and types for testing
def js = '''{"infile": [{"field1": 11,"field2": 12, "field3": 13},
                        {"field1": 21, "field4": "dave","field3": 23},
                        {"field1": 31,"field2": 32, "field3": 33}]}'''

def data = new JsonSlurper().parseText( js )
def columns = data.infile*.keySet().flatten().unique()

// Wrap strings in double quotes, and remove nulls
def encode = { e -> e == null ? '' : e instanceof String ? /"$e"/ : "$e" }

// Print all the column names
println columns.collect { c -> encode( c ) }.join( ',' )

// Then create all the rows
println data.infile.collect { row ->
    // A row at a time
    columns.collect { colName -> encode( row[ colName ] ) }.join( ',' )
}.join( '\n' )
That prints:
"field3","field2","field1","field4"
13,12,11,
23,,21,"dave"
33,32,31,
Which looks correct to me