renaming data in a dataframe using config json - python-3.x

I need to clean up some junk values in my data, for example:
'a' needs to be 'A'
'b' needs to be 'B'
I want to create a config JSON file holding a dictionary whose keys and values look like this:
{"a": "A", "b": "B"}
I then want to load that JSON in another Python file, which reads data into a dataframe containing those junk values (the dictionary keys), and replace them with the correct ones (the dictionary values). Can anyone help?

So, given the following config.json file:
{
"junk1": "John",
"junk2": "Jack",
"junk3": "Tom",
"junk4": "Butch"
}
You could have the following python file in the same directory:
import pandas as pd
import json
with open("config.json", "r") as f:
    cfg = json.load(f)
df = pd.DataFrame(
    {
        "class": {
            0: "class1",
            1: "class2",
            2: "class3",
            3: "class4",
        },
        "firstname": {0: "junk1", 1: "junk2", 2: "junk3", 3: "junk4"},
    }
)
print(df)
# Outputs
    class firstname
0  class1     junk1
1  class2     junk2
2  class3     junk3
3  class4     junk4
And then do:
df["firstname"] = df["firstname"].replace(cfg)
print(df)
# Outputs
    class firstname
0  class1      John
1  class2      Jack
2  class3       Tom
3  class4     Butch
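If only some columns should be cleaned, the mapping can be scoped per column. The following is a minimal sketch under a few assumptions: the config file is written to a temporary directory only so the example runs anywhere, and the `nickname` column is hypothetical, added to show that it is left untouched.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical mapping of junk values to their corrected values.
mapping = {"junk1": "John", "junk2": "Jack"}

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a real config.json sitting next to the script.
    config_path = Path(tmp) / "config.json"
    config_path.write_text(json.dumps(mapping))
    with open(config_path, "r") as f:
        cfg = json.load(f)

df = pd.DataFrame(
    {
        "firstname": ["junk1", "junk2", "Tom"],
        "nickname": ["junk1", "x", "y"],  # hypothetical column that must not change
    }
)

# A nested dict {column: {old: new}} restricts the replacement to that column.
cleaned = df.replace({"firstname": cfg})
print(cleaned)
```

Values that do not appear in the mapping (here "Tom") are kept as they are, which is usually what you want for a clean-up step.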

Related

Dataframe Nested Dict inside List - retrieve 'id' value

['{"data":{"attributes":{"title":"Contract 1","AnnualValue":0},"id":1,"type":"contract"}}',
'{"data":{"attributes":{"title":"Contract 2","AnnualValue":0},"id":2,"type":"contract"}}',
'{"data":{"attributes":{"title":"Contract 3","AnnualValue":0},"id":3,"type":"contract"}}']
I have the above data in a data frame and need to pull out the 'id' value from each entry. I tried converting to JSON and similar, but I'm struggling to get the value. Can anyone point me in the right direction? Five hours of googling has just led me up the garden path!
Thanks
import json
json_list = [
'{"data":{"attributes":{"title":"Contract 1","AnnualValue":0},"id":1,"type":"contract"}}',
'{"data":{"attributes":{"title":"Contract 2","AnnualValue":0},"id":2,"type":"contract"}}',
'{"data":{"attributes":{"title":"Contract 3","AnnualValue":0},"id":3,"type":"contract"}}'
]
ids = [
    json.loads(json_body)["data"]["id"]
    for json_body in json_list
]
print(ids)
# Outputs
[1, 2, 3]
This is how you can display that data in a dataframe:
import pandas as pd
import json
data_list = ['{"data":{"attributes":{"title":"Contract 1","AnnualValue":0},"id":1,"type":"contract"}}',
'{"data":{"attributes":{"title":"Contract 2","AnnualValue":0},"id":2,"type":"contract"}}',
'{"data":{"attributes":{"title":"Contract 3","AnnualValue":0},"id":3,"type":"contract"}}']
new_data_list = []
for x in data_list:
    data = json.loads(x)['data']  # parse each string only once
    new_data_list.append((data['id'], data['type'], data['attributes']['title'], data['attributes']['AnnualValue']))
df = pd.DataFrame(new_data_list, columns = ['Id', 'Type', 'Title', 'Annual Value'])
print(df)
Which returns:
   Id      Type       Title  Annual Value
0   1  contract  Contract 1             0
1   2  contract  Contract 2             0
2   3  contract  Contract 3             0
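Another option, sketched here as an alternative rather than as part of the answers above, is to parse the strings once and let `pd.json_normalize` flatten the nested keys. This assumes pandas 1.0 or later, where `json_normalize` is available at the top level. Nested keys are joined with "." by default.

```python
import json

import pandas as pd

json_list = [
    '{"data":{"attributes":{"title":"Contract 1","AnnualValue":0},"id":1,"type":"contract"}}',
    '{"data":{"attributes":{"title":"Contract 2","AnnualValue":0},"id":2,"type":"contract"}}',
    '{"data":{"attributes":{"title":"Contract 3","AnnualValue":0},"id":3,"type":"contract"}}',
]

# Parse each JSON string into a dict, then flatten the nested structure.
records = [json.loads(s) for s in json_list]
flat = pd.json_normalize(records)

# Flattened column names look like "data.id" and "data.attributes.title".
ids = flat["data.id"].tolist()
print(ids)
print(flat[["data.id", "data.type", "data.attributes.title"]])
```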

filter dataframe columns as you iterate through rows and create dictionary

I have the following table of data in a spreadsheet:
Name Description Value
foo foobar 5
baz foobaz 4
bar foofoo 8
I'm reading the spreadsheet and passing the data as a dataframe.
I need to transform this table of data to json following a specific schema.
I have the following script:
for index, row in df.iterrows():
    if row['Description'] == 'foofoo':
        print(row.to_dict())
which returns:
{'Name': 'bar', 'Description': 'foofoo', 'Value': '8'}
I want to be able to filter out a specific column. For example, to return this:
{'Name': 'bar', 'Description': 'foofoo'}
I know that I can print only the columns I want with print(row['Name'], row['Description']); however, this returns only the values, and I also want the keys.
How can I do this?
I wrote this entire thing only to realize that @anky_91 had already suggested it. Oh well...
import pandas as pd
data = {
"name": ["foo", "abc", "baz", "bar"],
"description": ["foobar", "foofoo", "foobaz", "foofoo"],
"value": [5, 3, 4, 8],
}
df = pd.DataFrame(data=data)
print(df, end='\n\n')
rec_dicts = df.loc[df["description"] == "foofoo", ["name", "description"]].to_dict(
"records"
)
print(rec_dicts)
Output:
  name description  value
0  foo      foobar      5
1  abc      foofoo      3
2  baz      foobaz      4
3  bar      foofoo      8
[{'name': 'abc', 'description': 'foofoo'}, {'name': 'bar', 'description': 'foofoo'}]
After converting the row to a dictionary, you can delete the key you don't need:
d = row.to_dict()
del d['Value']
Now the dictionary d has only Name and Description.
You can try this:
import io
import pandas as pd
s="""Name,Description,Value
foo,foobar,5
baz,foobaz,4
bar,foofoo,8
"""
df = pd.read_csv(io.StringIO(s))
for index, row in df.iterrows():
    if row['Description'] == 'foofoo':
        print(row[['Name', 'Description']].to_dict())
Result:
{'Name': 'bar', 'Description': 'foofoo'}
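Since the end goal in the question is JSON, the filtered columns can also be serialized directly without a row loop. This is a small sketch that rebuilds the question's table inline; `to_json(orient="records")` gives one JSON object per matching row.

```python
import json

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["foo", "baz", "bar"],
        "Description": ["foobar", "foobaz", "foofoo"],
        "Value": [5, 4, 8],
    }
)

# Select the matching rows and only the columns that should appear in the JSON.
mask = df["Description"] == "foofoo"
json_text = df.loc[mask, ["Name", "Description"]].to_json(orient="records")
print(json_text)

# Round-trip through json.loads to get plain Python dicts back if needed.
records = json.loads(json_text)
```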

How to split a DataFrame that has a list of dictionaries into individual DataFrame columns?

I have a json file that I import as a dataframe. One of the columns contains a list of dictionaries. I need to split the dictionaries into individual columns for each row.
import json
import requests
import pandas as pd
f = requests.get(url)
data = json.loads(f.text)
docs = pd.json_normalize(data['documents'])
display(docs)
doc_num sentence categories
1 "I am a dog" [{"id" : "A"}, {"id" : "B"}, {"id" : "C"}]
2 "I am a cat" [{"id" : "C"}, {"id" : "D"}, {"id" : "E"}]
... ... ...
What I would like my DataFrame to look like is:
doc_num sentence cat_A cat_B cat_C cat_D ...
1 "I am a dog" 1 1 1 0
2 "I am a cat" 0 0 1 1
... ... ... ... ... ...
I would like my DataFrame to separate the list of dictionaries into individual columns where the column has a "1" for if it belongs in that category, and a "0" if it does not belong to that category.
This may help. I think the code is understandable but if you need help let me know. I tested the output.
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'categories':[[{'id':'A'},{'id':'B'},{'id':'C'}],[{'id':'B'},{'id':'D'}],[{'id':'C',}]]})
# Map each category id to a column index, in order of first appearance.
all_keys = {}
for cats in df['categories']:
    for d in cats:
        if d['id'] not in all_keys:
            all_keys[d['id']] = len(all_keys)
# One row per record, one column per category; 1 marks membership.
# dtype=int replaces np.int, which was removed in NumPy 1.24.
mat = np.zeros((df.shape[0], len(all_keys)), dtype=int)
print(mat.shape)
for i, cats in enumerate(df['categories']):
    for d in cats:
        mat[i, all_keys[d['id']]] = 1
new_df = pd.DataFrame(data=mat, columns=list(all_keys), index=df.index)
df = pd.concat([df, new_df], axis=1)
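A shorter route to the same 0/1 columns, offered as a sketch rather than as the answer's method, is `Series.str.get_dummies`. It assumes the category ids never contain the separator character "|". The frame below rebuilds the two example rows from the question.

```python
import pandas as pd

docs = pd.DataFrame(
    {
        "doc_num": [1, 2],
        "sentence": ["I am a dog", "I am a cat"],
        "categories": [
            [{"id": "A"}, {"id": "B"}, {"id": "C"}],
            [{"id": "C"}, {"id": "D"}, {"id": "E"}],
        ],
    }
)

# Join each row's ids into one string such as "A|B|C".
joined = docs["categories"].apply(lambda cats: "|".join(d["id"] for d in cats))

# str.get_dummies splits on the separator and builds one 0/1 column per id.
dummies = joined.str.get_dummies(sep="|").add_prefix("cat_")

# Replace the original list column with the indicator columns.
result = docs.drop(columns="categories").join(dummies)
print(result)
```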

Count occurrences of item in JSON element grouped by another element

I am trying to count the number of occurrences of an item (Activity) in a JSON file, grouped by another item (Source). Example JSON below.
{
"No": "9",
"Time": "08:12",
"Source": "location1",
"Dest": "location3",
"Activity": "fast"
}
My code so far counts the occurrences of each Activity:
from collections import Counter
import json
with open('dataset_3.json', 'r') as json_file:
    json_data = json.load(json_file) # loads json data
c = Counter(item['Activity'] for item in json_data)
print(c)
The code correctly counts and outputs below.
Counter({'fast': 8, 'medium': 1, 'slow': 1})
I would now like to count the occurrences of each activity again, but grouped by location, so the output should be something like:
location 1 Fast: 8, Medium: 1, Slow: 2
location 2 Fast: 6, Medium: 3, Slow: 4
I have tried the code below but the output is not correct (see below)
with open('dataset_3.json', 'r') as json_file:
    json_data = json.load(json_file) # loads json data
for item in json_data:
    if item['Source'] == 'location1':
        c = Counter(item['Activity'])
        print(c)
Output
Counter({'f': 3, 'a': 1, 's': 1, 't'})
Counter({'s': 1, 'l': 1, 'o': 1, 'w'})
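Counter(item['Activity']) is given a single string, so it counts the characters of that string, which is why you get letter counts. Instead, you can put an if inside the generator expression passed to Counter to add a condition to the for loop. I pasted your code with the fix below: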
from collections import Counter
import json
with open('dataset_3.json', 'r') as json_file:
    json_data = json.load(json_file) # loads json data
c = Counter(item['Activity'] for item in json_data if item['Source'] == 'location1')
print(c)
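The fix above counts activities for one location only. To get counts for every location in a single pass, one possible sketch keeps a separate Counter per Source using `defaultdict`. The records below are hypothetical stand-ins for the contents of dataset_3.json, using the same keys as the example JSON.

```python
from collections import Counter, defaultdict

# Hypothetical records with the same keys as the example JSON in the question.
json_data = [
    {"Source": "location1", "Activity": "fast"},
    {"Source": "location1", "Activity": "slow"},
    {"Source": "location1", "Activity": "fast"},
    {"Source": "location2", "Activity": "medium"},
    {"Source": "location2", "Activity": "fast"},
]

# One Counter per Source; a missing Source gets an empty Counter on first access.
counts = defaultdict(Counter)
for item in json_data:
    counts[item["Source"]][item["Activity"]] += 1

for source, counter in counts.items():
    print(source, dict(counter))
```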

Convert list of Pandas Dataframe JSON objects

I have a Dataframe with one column where each cell in the column is a JSON object.
players
0 {"name": "tony", "age": 57}
1 {"name": "peter", "age": 46}
I want to convert this to a data frame as:
name age
tony 57
peter 46
Any ideas how I do this?
Note: the original JSON object looks like this...
{
"players": [{
"age": 57,
"name":"tony"
},
{
"age": 46,
"name":"peter"
}]
}
Use the DataFrame constructor if the values are dicts (if they are strings, convert them first, as in the commented-out lines):
#print (type(df.loc[0, 'players']))
#<class 'str'>
#import ast
#df['players'] = df['players'].apply(ast.literal_eval)
print (type(df.loc[0, 'players']))
<class 'dict'>
df = pd.DataFrame(df['players'].values.tolist())
print (df)
age name
0 57 tony
1 46 peter
But it is better to use json_normalize on the original JSON object, as suggested by @jpp:
json = {
"players": [{
"age": 57,
"name":"tony"
},
{
"age": 46,
"name":"peter"
}]
}
df = pd.json_normalize(json, 'players')  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
print (df)
age name
0 57 tony
1 46 peter
This can do it:
df = df['players'].apply(pd.Series)
However, it's slow:
In [20]: timeit df.players.apply(pd.Series)
1000 loops, best of 3: 824 us per loop
@jezrael's suggestion is faster:
In [24]: timeit pd.DataFrame(df.players.values.tolist())
1000 loops, best of 3: 387 us per loop
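If the cells hold JSON text rather than dicts, `json.loads` is a closer fit than `ast.literal_eval`, because JSON literals such as true, false and null are not valid Python literals. A minimal sketch, assuming every cell is a well-formed JSON object string:

```python
import json

import pandas as pd

# Hypothetical frame where each cell is a JSON string, not a dict.
df = pd.DataFrame(
    {"players": ['{"name": "tony", "age": 57}', '{"name": "peter", "age": 46}']}
)

# Parse each cell into a dict, then build the frame from the list of dicts.
parsed = df["players"].apply(json.loads)
players = pd.DataFrame(parsed.tolist(), index=df.index)
print(players)
```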
