Creating Nested JSON from Dataframe - python-3.x

I have a dataframe and have to convert it into nested JSON.
countryname name text score
UK ABC Hello 5
Right now, I have some code that generates JSON, grouping countryname and name.
However, I want to firstly group by countryname and then group by name. Below is the code and output:
cols = test.columns.difference(['countryname','name'])
j = (test.groupby(['countryname','name'])[cols]
.apply(lambda x: x.to_dict('r'))
.reset_index(name='results')
.to_json(orient='records'))
test_json = json.dumps(json.loads(j), indent=4)
Output:
[
{
"countryname":"UK"
"name":"ABC"
"results":[
{
"text":"Hello"
"score":"5"
}
]
}
]
However, I am expecting an output like this:
[
{
"countryname":"UK"
{
"name":"ABC"
"results":[
{
"text":"Hello"
"score":"5"
}
]
}
}
]
Can anyone please help in fixing this?

This would be the valid JSON. Note the comma , usage, is required as you may check here.
[
{
"countryname":"UK",
"name":"ABC",
"results":[
{
"text":"Hello",
"score":"5"
}
]
}
]
The other output you try to achieve is also not according to the standard:
[{
"countryname": "UK",
"you need a name in here": {
"name": "ABC",
"results": [{
"text": "Hello",
"score": "5"
}]
}
}]
I improved that so you can figure out what name to use.
For custom JSON output you will need to use custom function to reformat your object first.
l=df.to_dict('records')[0] #to get the list
print(l, type(l)) #{'countryname': 'UK', 'name': 'ABC', 'text': 'Hello', 'score': 5} <class 'dict'>
e = l['countryname']
print(e) # UK
o=[{
"countryname": l['countryname'],
"you need a name in here": {
"name": l['name'],
"results": [{
"text": l['text'],
"score": l['score']
}]
}
}]
print(o) #[{'countryname': 'UK', 'you need a name in here': {'name': 'ABC', 'results': [{'text': 'Hello', 'score': 5}]}}]

Related

Is this the best way to parse a Json output from Google Ads Stream

Is this the best way to parse a Json output from Google Ads Stream. I am parsing the json with pandas & it is taking too much time
record counts is around 700K
[{
"results": [
{
"customer": {
"resourceName": "customers/12345678900",
"id": "12345678900",
"descriptiveName": "ABC"
},
"campaign": {
"resourceName": "customers/12345678900/campaigns/12345",
"name": "Search_Google_Generic",
"id": "12345"
},
"adGroup": {
"resourceName": "customers/12345678900/adGroups/789789",
"id": "789789",
"name": "adgroup_details"
},
"metrics": {
"clicks": "500",
"conversions": 200,
"costMicros": "90000000",
"allConversionsValue": 5000.6936,
"impressions": "50000"
},
"segments": {
"device": "DESKTOP",
"date": "2022-10-28"
}
}
],
"fieldMask": "segments.date,customer.id,customer.descriptiveName,campaign.id,campaign.name,adGroup.id,adGroup.name,segments.device,metrics.costMicros,metrics.impressions,metrics.clicks,metrics.conversions,metrics.allConversionsValue",
"requestId": "fdhfgdhfgjf"
}
]
This is the sample json.I am saving the stream in json file and then reading using pandas and trying to dump in csv file
I want to convert it to CSV format, Like
with open('Adgroups.json', encoding='utf-8') as inputfile:
df = pd.read_json(inputfile)
df_new = pd.DataFrame(columns= ['Date', 'Account_ID', 'Account', 'Campaign_ID','Campaign',
'Ad_Group_ID', 'Ad_Group','Device',
'Cost', 'Impressions', 'Clicks', 'Conversions', 'Conv_Value'])
for i in range(len(df['results'])):
results = df['results'][i]
for result in results:
new_row = pd.Series({ 'Date': result['segments']['date'],
'Account_ID': result['customer']['id'],
'Account': result['customer']['descriptiveName'],
'Campaign_ID': result['campaign']['id'],
'Campaign': result['campaign']['name'],
'Ad_Group_ID': result['adGroup']['id'],
'Ad_Group': result['adGroup']['name'],
'Device': result['segments']['device'],
'Cost': result['metrics']['costMicros'],
'Impressions': result['metrics']['impressions'],
'Clicks': result['metrics']['clicks'],
'Conversions': result['metrics']['conversions'],
'Conv_Value': result['metrics']['allConversionsValue']
})
df_new = df_new.append(new_row, ignore_index = True)
df_new.to_csv('Adgroups.csv', encoding='utf-8', index=False)
Don't use df.append. It's very slow because it has to copy the dataframe over and over again. I think it's being deprecated for this reason.
You can build the rows using list comprehension before constructing the data frame:
import json
with open("Adgroups.json") as fp:
data = json.load(fp)
columns = [
"Date",
"Account_ID",
"Account",
"Campaign_ID",
"Campaign",
"Ad_Group_ID",
"Ad_Group",
"Device",
"Cost",
"Impressions",
"Clicks",
"Conversions",
"Conv_Value",
]
records = [
(
r["segments"]["date"],
r["customer"]["id"],
r["customer"]["descriptiveName"],
r["campaign"]["id"],
r["campaign"]["name"],
r["adGroup"]["id"],
r["adGroup"]["name"],
r["segments"]["device"],
r["metrics"]["costMicros"],
r["metrics"]["impressions"],
r["metrics"]["clicks"],
r["metrics"]["conversions"],
r["metrics"]["allConversionsValue"],
)
for d in data
for r in d["results"]
]
df = pd.DataFrame(records, columns=columns)

How to extract selected key and value from nested dictionary object in a list?

I have a list example_list contains two dict objects, it looks like this:
[
{
"Meta": {
"ID": "1234567",
"XXX": "XXX"
},
"bbb": {
"ccc": {
"ddd": {
"eee": {
"fff": {
"xxxxxx": "xxxxx"
},
"www": [
{
"categories": {
"ppp": [
{
"content": {
"name": "apple",
"price": "0.111"
},
"xxx: "xxx"
}
]
},
"date": "A2020-01-01"
}
]
}
}
}
}
},
{
"Meta": {
"ID": "78945612",
"XXX": "XXX"
},
"bbb": {
"ccc": {
"ddd": {
"eee": {
"fff": {
"xxxxxx": "xxxxx"
},
"www": [
{
"categories": {
"ppp": [
{
"content": {
"name": "banana",
"price": "12.599"
},
"xxx: "xxx"
}
]
},
"date": "A2020-01-01"
}
]
}
}
}
}
}
]
now I want to filter the items and only keep "ID": "xxx" and the correspoding value for "price": "0.111", expected result can be something similar to :
[{"ID": "1234567", "price": "0.111"}, {"ID": "78945612", "price": "12.599"}]
or something like {"1234567":"0.111", "78945612":"12.599" }
Here's what I've tried:
map_list=[]
map_dict={}
for item in example_list:
#get 'ID' for each item in 'meta'
map_dict['ID'] = item['meta']['ID']
# get 'price'
data_list = item['bbb']['ccc']['ddd']['www']
for data in data_list:
for dataitem in data['categories']['ppp']
map_dict['price'] = item["content"]["price"]
map_list.append(map_dict)
print(map_list)
The result for this doesn't look right, feels like the item isn't iterating properly, it gives me result:
[{"ID": "78945612", "price": "12.599"}, {"ID": "78945612", "price": "12.599"}]
It gave me duplicated result for the second ID but where is the first ID?
Can someone take a look for me please, thanks.
Update:
From some comments from another question, I understand the reason for the output keeps been overwritten is because the key name in the dict is always the same, but I'm not sure how to fix this because the key and value needs to be extracted from different level of for loops, any help would be appreciated, thanks.
as #Scott Hunter has mentioned, you need to create a new map_dict everytime you are trying to do this. Here is a quick fix to your solution (I am sadly not able to test it right now, but it seems right to me).
map_list=[]
for item in example_list:
# get 'price'
data_list = item['bbb']['ccc']['ddd']['www']
for data in data_list:
for dataitem in data['categories']['ppp']:
map_dict={}
map_dict['ID'] = item['meta']['ID']
map_dict['price'] = item["content"]["price"]
map_list.append(map_dict)
print(map_list)
But what are you doing here is that you are basically just "forcing" your way through ... I recommend you to take a break and check out somekind of tutorial, which will help you to understand how it really works in the back-end. This is how I would have written it:
list_dicts = []
for example in example_list:
for www in item['bbb']['ccc']['ddd']['www']:
for www_item in www:
list_dicts.append({
'ID': item['meta']['ID'],
'price': www_item["content"]["price"]
})
Good luck with this problem and hope it helps :)
You need to create a new dictionary for map_dict for each ID.

python dictionary how can create (structured) unique dictionary list if the key contains list of values of other keys

I have below unstructured dictionary list which contains values of other keys in a list .
I am not sure if the question i ask is strange. this is the actual dictionary payload that we receive from source which not aligned with respective entry
[
{
"dsply_nm": [
"test test",
"test test",
"",
""
],
"start_dt": [
"2021-04-21T00:01:00-04:00",
"2021-04-21T00:01:00-04:00",
"2021-04-21T00:01:00-04:00",
"2021-04-21T00:01:00-04:00"
],
"exp_dt": [
"2022-04-21T00:01:00-04:00",
"2022-04-21T00:01:00-04:00",
"2022-04-21T00:01:00-04:00",
"2022-04-21T00:01:00-04:00"
],
"hrs_pwr": [
"14",
"12",
"13",
"15"
],
"make_nm": "test",
"model_nm": "test",
"my_yr": "1980"
}
]
"the length of list cannot not be expected and it could be more than 4 sometimes or less in some keys"
#Expected:
i need to check if the above dictionary are in proper structure or not and based on that it should return the proper dictionary list associate with each item
for eg:
def get_dict_list(items):
if type(items == not structure)
result = get_associated_dict_items_mapped
return result
else:
return items
#Final result
expected_dict_list=
[{"dsply_nm":"test test","start_dt":"2021-04-21T00:01:00-04:00","exp_dt":"2022-04-21T00:01:00-04:00","hrs_pwr":"14"},
{"dsply_nm":"test test","start_dt":"2021-04-21T00:01:00-04:00","exp_dt":"2022-04-21T00:01:00-04:00","hrs_pwr":"12","make_nm": "test",model_nm": "test","my_yr": "1980"},
{"dsply_nm":"","start_dt":"2021-04-21T00:01:00-04:00","exp_dt":"2022-04-21T00:01:00-04:00","hrs_pwr":"13"},
{"dsply_nm":"","start_dt":"2021-04-21T00:01:00-04:00","exp_dt":"2022-04-21T00:01:00-04:00","hrs_pwr":"15"}
]
in above dictionary payload, below part is associated with the second dictionary items and have to map respectively
"make_nm": "test",
"model_nm": "test",
"my_yr": "1980"
}
Can anyone help on this?
Thanks
Since customer details is a list
dict(zip(customer_details[0], list(customer_details.values[0]())))
this yields:
{'insured_details': ['asset', 'asset', 'asset'],
'id': ['213', '214', '233'],
'dept': ['account', 'sales', 'market'],
'salary': ['12', '13', '14']}
​
I think a couple of list comprehensions will get you going. If you would like me to unwind them into more traditional for loops, just let me know.
import json
def get_dict_list(item):
first_value = list(item.values())[0]
if not isinstance(first_value, list):
return [item]
return [{key: item[key][i] for key in item.keys()} for i in range(len(first_value))]
cutomer_details = [
{
"insured_details": "asset",
"id": "xxx",
"dept": "account",
"salary": "12"
},
{
"insured_details": ["asset", "asset", "asset"],
"id":["213","214","233"],
"dept":["account","sales","market"],
"salary":["12","13","14"]
}
]
cutomer_details_cleaned = []
for detail in cutomer_details:
cutomer_details_cleaned.extend(get_dict_list(detail))
print(json.dumps(cutomer_details_cleaned, indent=4))
That should give you:
[
{
"insured_details": "asset",
"id": "xxx",
"dept": "account",
"salary": "12"
},
{
"insured_details": "asset",
"id": "213",
"dept": "account",
"salary": "12"
},
{
"insured_details": "asset",
"id": "214",
"dept": "sales",
"salary": "13"
},
{
"insured_details": "asset",
"id": "233",
"dept": "market",
"salary": "14"
}
]

nested dictionary in list to dataframe python

Have a json input from api:
{
"api_info": {
"status": "healthy"
},
"items": [
{
"timestamp": "time",
"stock_data": [
{
"ticker": "string",
"industry": "string",
"Description": "string"
}
]
"ISIN":xxx,
"update_datetime": "time"
}
]
}
have initially run
apiRawData = requests.get(url).json()['items']
then ran the json_normalize method:
apiExtractedData = pd.json_normalize(apiRawData,'stock_data',errors='ignore')
Here is the initial output where the stock_data is still contained within a list.
stock_data ISIN update_datetime
0 [{'description': 'zzz', 'industry': 'C', 'ticker... xxx time
stock_data
ISIN
update_datetime
0
[{'description': 'zzz', 'industry': 'C', 'ticker...]
123
time
What i would like to achieve is a dataframe showing the headers and the corresponding rows:
description
industry
ticker
ISIN
update_datetime
0
'zzz'
'C'
xxx
123
time
Do direct me if there is already an existing question answered :) cheers.
I think you can simply convert your existing data frame into your expected one by using below code:
apiExtractedData['description'] = apiExtractedData['stock_data'].apply(lambda x: x[0]['description'])
apiExtractedData['industry'] = apiExtractedData['stock_data'].apply(lambda x: x[0]['industry'])
apiExtractedData['ticker'] = apiExtractedData['stock_data'].apply(lambda x: x[0]['ticker'])
And then just delete your stock_data column:
apiExtractedData = apiExtractedData.drop(['stock_data'], axis = 1)

Try to perform a sort but the output does not appear to sort the output

I have a little issue with some sorting in my Groovy scrip and I am not sure why it is not working as expected.
Below is the JSON I am trying to sort:
{
"aaa": [
{
"aaa1": xxx,
"aaa2": "xxx",
"bbb": [ {
"bbb1": xx,
"bbb2": [
1,
2
],
"ccc": [],
"ccc1": xxx
}]
},
{
"aaa1": xxx,
"aaa2": "xxx",
"bbb": [ {
"bbb1": xx,
"bbb2": [
1,
2
],
"ccc": [],
"ccc1": xxx
}]
}
]
}
]
}
I am trying to sort this JSON by 'policyid' but it doesn't seem to sort it and I have no idea why it isn't as to me the code seems correct:
Below is what I want it to output:
[ {ccc=[1], bbb1=xxx, aaa1=[]},
{ccc=[1, 2], bbb2=xxx, aaa2=[]}]
There is a typo in your code:
log.info testsort.sort{a,b -> a.policyid <=> b.policyid}
You access policyid field while it should be policyId. Change mentioned line to:
log.info testsort.sort{a,b -> a.policyId <=> b.policyId}
and you will get the output:
[policyId:31, passengerSeqIds:[2], optionalPassengerSeqIds:[], cost:25]
[policyId:34, passengerSeqIds:[1], optionalPassengerSeqIds:[], cost:40]
[policyId:35, passengerSeqIds:[1, 2], optionalPassengerSeqIds:[], cost:72]

Resources