I have the following table of data in a spreadsheet:
Name  Description  Value
foo   foobar       5
baz   foobaz       4
bar   foofoo       8
I'm reading the spreadsheet and passing the data as a dataframe.
I need to transform this table of data to json following a specific schema.
I have the following script:
for index, row in df.iterrows():
    if row['Description'] == 'foofoo':
        print(row.to_dict())
which returns:
{'Name': 'bar', 'Description': 'foofoo', 'Value': '8'}
I want to be able to filter out a specific column. For example, to return this:
{'Name': 'bar', 'Description': 'foofoo'}
I know that I can print only the columns I want with print(row['Name'], row['Description']). However, this returns only the values, and I also want the keys.
How can I do this?
I wrote this entire thing only to realize that @anky_91 had already suggested it. Oh well...
import pandas as pd
data = {
    "name": ["foo", "abc", "baz", "bar"],
    "description": ["foobar", "foofoo", "foobaz", "foofoo"],
    "value": [5, 3, 4, 8],
}
df = pd.DataFrame(data=data)
print(df, end='\n\n')
rec_dicts = df.loc[df["description"] == "foofoo", ["name", "description"]].to_dict(
    "records"
)
print(rec_dicts)
Output:
  name description  value
0  foo      foobar      5
1  abc      foofoo      3
2  baz      foobaz      4
3  bar      foofoo      8
[{'name': 'abc', 'description': 'foofoo'}, {'name': 'bar', 'description': 'foofoo'}]
After converting the row to a dictionary, you can delete the key you don't need:
d = row.to_dict()
del d['Value']
Now the dictionary d has only Name and Description.
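A minimal sketch of that approach, assuming the same three-column frame as in the question (the names d and trimmed are illustrative). The key to delete is the string 'Value', and the row is converted with to_dict() first, so the DataFrame itself is left untouched:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["foo", "baz", "bar"],
    "Description": ["foobar", "foobaz", "foofoo"],
    "Value": [5, 4, 8],
})

trimmed = []
for index, row in df.iterrows():
    if row["Description"] == "foofoo":
        d = row.to_dict()   # plain dict built from the row
        del d["Value"]      # drop the unwanted key
        trimmed.append(d)

print(trimmed)
```

Using d.pop("Value", None) instead of del would avoid a KeyError if the column happens to be missing.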
You can try this:
import io
import pandas as pd
s="""Name,Description,Value
foo,foobar,5
baz,foobaz,4
bar,foofoo,8
"""
df = pd.read_csv(io.StringIO(s))
for index, row in df.iterrows():
    if row['Description'] == 'foofoo':
        print(row[['Name', 'Description']].to_dict())
Result:
{'Name': 'bar', 'Description': 'foofoo'}
Related
I have only been able to create a two column data frame from a defaultdict (termed output):
df_mydata = pd.DataFrame([(k, v) for k, v in output.items()],
                         columns=['id', 'value'])
What I would like to be able to do is using this basic format also initiate the dataframe with three columns: 'id', 'id2' and 'value'. I have a separate defined dict that contains the necessary look up info, called id_lookup.
So I tried:
df_mydata = pd.DataFrame([(k, id_lookup[k], v) for k, v in output.items()],
                         columns=['id', 'id2', 'value'])
I think I'm doing it right, but I get key errors. I will only know in hindsight whether id_lookup is exhaustive for all possible encounters. For my purposes, simply putting it all together and placing 'N/A' or something similar for those cases would be acceptable.
Would the above be appropriate for calculating a new column of data using a defaultdict and a simple lookup dict, and how might I make it robust to key errors?
Here is an example of how you could do this:
import pandas as pd
from collections import defaultdict
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'value': [10, 20, 30, 40]})
id_lookup = {1: 'A', 2: 'B', 3: 'C'}
new_column = defaultdict(str)
# Loop through the df and populate the defaultdict
for index, row in df.iterrows():
    try:
        new_column[index] = id_lookup[row['id']]
    except KeyError:
        new_column[index] = 'N/A'
# Convert the defaultdict to a Series and add it as a new column in the df
df['id2'] = pd.Series(new_column)
# Print the updated DataFrame
print(df)
which gives:
   id  value  id2
0   1     10    A
1   2     20    B
2   3     30    C
3   4     40  N/A
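A more compact sketch of the same idea, assuming output is a defaultdict keyed by id and id_lookup is a plain dict (the sample values below are made up for illustration). dict.get returns a fallback instead of raising KeyError, and Series.map followed by fillna does the same for a column that already exists:

```python
from collections import defaultdict

import pandas as pd

# Hypothetical sample data standing in for the real `output` and `id_lookup`.
output = defaultdict(int, {1: 10, 2: 20, 3: 30, 4: 40})
id_lookup = {1: "A", 2: "B", 3: "C"}

# dict.get supplies 'N/A' for ids that are missing from the lookup.
df_mydata = pd.DataFrame(
    [(k, id_lookup.get(k, "N/A"), v) for k, v in output.items()],
    columns=["id", "id2", "value"],
)

# Equivalent for an existing frame: map the ids, then fill the gaps.
df = pd.DataFrame({"id": [1, 2, 3, 4], "value": [10, 20, 30, 40]})
df["id2"] = df["id"].map(id_lookup).fillna("N/A")

print(df_mydata)
print(df)
```

Both variants avoid the explicit try/except and the row-by-row loop.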
I have a list of dictionaries like the one below. I want to group the dictionaries by grade and convert the list into a single dictionary whose keys are the grade values and whose values are lists of dictionaries.
Input:
[
{'name':'abc','mark':'99','grade':'A'},
{'name':'xyz','mark':'90','grade':'A'},
{'name':'123','mark':'70','grade':'C'},
]
I want my output like below:
{
A: [ {'name': 'abc','mark':'99'}, {'name': 'xyz','mark':'90'} ],
C: [ {'name': '123','mark':'70'} ]
}
I tried sorted and groupby, but I was not able to remove grade from the dictionaries.
Use a loop with dict.setdefault:
l = [{'name':'abc','mark':'99','grade':'A'},
     {'name':'xyz','mark':'90','grade':'A'},
     {'name':'123','mark':'70','grade':'C'},
    ]
out = {}
for d in l:
    # avoid mutating the original dictionaries
    d = d.copy()
    # get grade, try to get the key in "out"
    # if the key doesn't exist, initialize with an empty list
    out.setdefault(d.pop('grade'), []).append(d)
print(out)
Output:
{'A': [{'name': 'abc', 'mark': '99'},
{'name': 'xyz', 'mark': '90'}],
'C': [{'name': '123', 'mark': '70'}],
}
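Since the question mentions sorted and groupby, here is a sketch of that route as well. The inner dict comprehension drops the grade key from each record. It assumes every dictionary has a 'grade' key; the name by_grade is illustrative:

```python
from itertools import groupby
from operator import itemgetter

l = [
    {'name': 'abc', 'mark': '99', 'grade': 'A'},
    {'name': 'xyz', 'mark': '90', 'grade': 'A'},
    {'name': '123', 'mark': '70', 'grade': 'C'},
]

by_grade = itemgetter('grade')

# groupby only merges adjacent items, so sort by the same key first
out = {
    grade: [{k: v for k, v in d.items() if k != 'grade'} for d in group]
    for grade, group in groupby(sorted(l, key=by_grade), key=by_grade)
}

print(out)
```

The setdefault loop needs no sorting and makes a single pass, so it is usually the simpler choice; the groupby version is mainly useful when the data is already sorted.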
I have a pandas dataframe with several columns, one of which contains string values. I need to change these strings to an acceptable value based on the current value. The dataframe is relatively large (40,000 x 32).
I've written a small function that takes the string to be changed as a parameter and looks up what it should be changed to.
df = pd.DataFrame({
    'A': ['Script','Scrpt','MyScript','Sunday','Monday','qwerty'],
    'B': ['Song','Blues','Rock','Classic','Whatever','Something']})
def lut(txt):
    my_lut = {'Script' : ['Script','Scrpt','MyScript'],
              'Weekday' : ['Sunday','Monday','Tuesday']}
    for key, value in my_lut.items():
        if txt in value:
            return key
    return 'Unknown'
The desired output should be:
         A          B
0   Script       Song
1   Script      Blues
2   Script       Rock
3  Weekday    Classic
4  Weekday   Whatever
5  Unknown  Something
I can't figure out how to apply this to the dataframe.
I've struggled over this for some time now so any input will be appreciated
Check this out:
import pandas as pd
df = pd.DataFrame({
    'A': ['Script','Scrpt','MyScript','Sunday','sdfsd','qwerty'],
    'B': ['Song','Blues','Rock','Classic','Whatever','Something']})
dic = {'Weekday': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'], 'Script': ['Script','Scrpt','MyScript']}
# replace every known variant with its category name
for k, v in dic.items():
    for item in v:
        df.loc[df.A == item, 'A'] = k
# anything that is not a category name by now is unknown
df.loc[~df.A.isin(list(dic)), 'A'] = "Unknown"
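Output (from print(df)):
         A          B
0   Script       Song
1   Script      Blues
2   Script       Rock
3  Weekday    Classic
4  Unknown   Whatever
5  Unknown  Something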
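Another way to apply the lookup, sketched under the assumption that each string belongs to at most one category: invert the dictionary once so that every variant maps to its category, then use Series.map and fill the unmatched rows. The name inverse is illustrative, and the data is the one from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['Script', 'Scrpt', 'MyScript', 'Sunday', 'Monday', 'qwerty'],
    'B': ['Song', 'Blues', 'Rock', 'Classic', 'Whatever', 'Something']})

my_lut = {'Script': ['Script', 'Scrpt', 'MyScript'],
          'Weekday': ['Sunday', 'Monday', 'Tuesday']}

# Invert the table: each variant string points at its category
inverse = {item: key for key, items in my_lut.items() for item in items}

# Values missing from `inverse` become NaN, which fillna turns into 'Unknown'
df['A'] = df['A'].map(inverse).fillna('Unknown')

print(df)
```

This does one dictionary lookup per row instead of one boolean mask per variant, which should matter more as the lookup table grows.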
I'm not sure if the title of my question is the right description to the issue I'm facing.
I'm reading the following table of data from a spreadsheet and passing it as a dataframe:
Name  Description  Value
foo   foobar       5
baz   foobaz       4
bar   foofoo       8
I need to transform this table of data to json following a specific schema.
I'm trying to get the following output:
{'global': {'Name': 'bar', 'Description': 'foofoo', 'spec': {'Value': '8'}}}
So far I'm able to get the global and spec objects but I'm not sure how I should combine them to get the expected output above.
I wrote this:
for index, row in df.iterrows():
    if row['Description'] == 'foofoo':
        global = row.to_dict()
        spec = row.to_dict()
        del(global['Value'])
        del(spec['Name'])
        del(spec['Description'])
        print("global:", global)
        print("spec:", spec)
with the following output:
global: {'Name': 'bar', 'Description': 'foofoo'}
spec: {'Value': '8'}
How can I combine these two objects to get to the desired output?
This should give you that output (note that global is a reserved keyword in Python, so the dictionary is named global_ here):
global_['spec'] = spec
combined = {'global': global_}
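Put together as a runnable sketch, assuming the three-row table from the question with Value stored as strings. Because global is a reserved word in Python, the dictionary is named global_ here (an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["foo", "baz", "bar"],
    "Description": ["foobar", "foobaz", "foofoo"],
    "Value": ["5", "4", "8"],
})

for index, row in df.iterrows():
    if row["Description"] == "foofoo":
        # select the columns for each part instead of deleting keys
        global_ = row[["Name", "Description"]].to_dict()
        spec = row[["Value"]].to_dict()
        # nest the spec dictionary under the 'spec' key
        global_["spec"] = spec
        combined = {"global": global_}

print(combined)
```

If more than one row can match, appending each combined dictionary to a list inside the loop keeps all of them.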
Try this and see if it runs faster; the slowness might be due to iterrows. I suggest exporting the dataframe to a dictionary first and iterating over that instead.
  Name Description  Value
0  foo      foobar      5
1  baz      foobaz      4
2  bar      foofoo      8
# Export the dataframe to a dictionary, using the 'index' option
M = df.to_dict('index')
r = {}
q = []
# iterate through the dictionary items (key, value pairs)
for i, j in M.items():
    # assign the row dictionary to the key 'global'
    r['global'] = j
    # popitem() works similarly to pop on a list:
    # it takes out the last item
    # and removes it from the parent dictionary;
    # this nests the spec key inside the global key
    r['global']['spec'] = dict([j.popitem()])
    # this ensures the dictionaries already present are not overwritten;
    # you could use copy.copy or copy.deepcopy to ensure the same state
    q.append(dict(r))
{'global': {'Name': 'foo', 'Description': 'foobar', 'spec': {'Value': 5}}}
{'global': {'Name': 'baz', 'Description': 'foobaz', 'spec': {'Value': 4}}}
{'global': {'Name': 'bar', 'Description': 'foofoo', 'spec': {'Value': 8}}}
Reference: dict.popitem
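A variant of the same idea that does not rely on Value being the last column, sketched with to_dict('records') and explicit key lists. The names global_keys and spec_keys are assumptions introduced for this example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["foo", "baz", "bar"],
    "Description": ["foobar", "foobaz", "foofoo"],
    "Value": [5, 4, 8],
})

global_keys = ["Name", "Description"]  # keys kept at the top level
spec_keys = ["Value"]                  # keys nested under 'spec'

q = [
    {"global": {**{k: rec[k] for k in global_keys},
                "spec": {k: rec[k] for k in spec_keys}}}
    for rec in df.to_dict("records")
]

for item in q:
    print(item)
```

Each element of q is built from scratch, so no copying is needed to keep earlier entries from being overwritten.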
Name = [list(['Amy', 'A', 'Angu']),
        list(['Jon', 'Johnson']),
        list(['Bob', 'Barker'])]
Other = [list(['Amy', 'Any', 'Anguish']),
         list(['Jon', 'Jan']),
         list(['Baker', 'barker'])]
import pandas as pd
df = pd.DataFrame({'Other' : Other,
                   'ID': ['E123','E456','E789'],
                   'Other_ID': ['A123','A456','A789'],
                   'Name' : Name,
                   })
     ID            Name                Other Other_ID
0  E123  [Amy, A, Angu]  [Amy, Any, Anguish]     A123
1  E456  [Jon, Johnson]           [Jon, Jan]     A456
2  E789   [Bob, Barker]      [Baker, barker]     A789
I have the df shown above. I want to turn the columns ID, Name and Other into a dictionary with the key being ID. I tried this, following the question "python pandas dataframe columns convert to dict key and value":
todict = dict(zip(df.ID, df.Name))
Which is close to what I want
{'E123': ['Amy', 'A', 'Angu'],
'E456': ['Jon', 'Johnson'],
'E789': ['Bob', 'Barker']}
But I would like to get this output that includes values from Other column
{'E123': ['Amy', 'A', 'Angu','Amy', 'Any','Anguish'],
'E456': ['Jon', 'Johnson','Jon','Jan'],
'E789': ['Bob', 'Barker','Baker','barker']
}
And if I add the third column, Other, it gives me errors:
todict = dict(zip(df.ID, df.Name, df.Other))
How do I get the output I want?
Why not just combine the Name and Other columns before creating a dict from the Name column?
df['Name'] = df['Name'] + df['Other']
dict(zip(df.ID, df.Name))
Gives
{'E123': ['Amy', 'A', 'Angu', 'Amy', 'Any', 'Anguish'],
'E456': ['Jon', 'Johnson', 'Jon', 'Jan'],
'E789': ['Bob', 'Barker', 'Baker', 'barker']}
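If the Name column should stay as it is, the lists can be concatenated on the fly instead of being assigned back. A sketch using the data from the question; it also shows why the three-argument zip failed: dict() expects pairs, so the two lists have to be merged into one value first.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['E123', 'E456', 'E789'],
    'Name': [['Amy', 'A', 'Angu'], ['Jon', 'Johnson'], ['Bob', 'Barker']],
    'Other': [['Amy', 'Any', 'Anguish'], ['Jon', 'Jan'], ['Baker', 'barker']],
})

# zip yields (id, name_list, other_list) triples; join the two lists per row
todict = {i: name + other for i, name, other in zip(df['ID'], df['Name'], df['Other'])}

print(todict)
```

The original Name lists are not modified, because name + other builds a new list for each row.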