filter dataframe columns as you iterate through rows and create dictionary - python-3.x

I have the following table of data in a spreadsheet:
Name Description Value
foo foobar 5
baz foobaz 4
bar foofoo 8
I'm reading the spreadsheet and passing the data as a dataframe.
I need to transform this table of data to json following a specific schema.
I have the following script:
for index, row in df.iterrows():
if row['Description'] == 'foofoo':
print(row.to_dict())
which return:
{'Name': 'bar', 'Description': 'foofoo', 'Value': '8'}
I want to be able to filter out a specific column. For example, to return this:
{'Name': 'bar', 'Description': 'foofoo'}
I know that I can print only the columns I want with this print(row['Name'],row['Description']) however this is only returning me values when I also want to return the key.
How can I do this?

I wrote this entire thing only to realize that #anky_91 had already suggested it. Oh well...
import pandas as pd
data = {
"name": ["foo", "abc", "baz", "bar"],
"description": ["foobar", "foofoo", "foobaz", "foofoo"],
"value": [5, 3, 4, 8],
}
df = pd.DataFrame(data=data)
print(df, end='\n\n')
rec_dicts = df.loc[df["description"] == "foofoo", ["name", "description"]].to_dict(
"records"
)
print(rec_dicts)
Output:
name description value
0 foo foobar 5
1 abc foofoo 3
2 baz foobaz 4
3 bar foofoo 8
[{'name': 'abc', 'description': 'foofoo'}, {'name': 'bar', 'description': 'foofoo'}]

After converting to dictionary you can delete the key which you don't need with:
del(row[value])
Now the dictionary will have only name and description.

You can try this:
import io
import pandas as pd
s="""Name,Description,Value
foo,foobar,5
baz,foobaz,4
bar,foofoo,8
"""
df = pd.read_csv(io.StringIO(s))
for index, row in df.iterrows():
if row['Description'] == 'foofoo':
print(row[['Name', 'Description']].to_dict())
Result:
{'Name': 'bar', 'Description': 'foofoo'}

Related

column comprehension robust to missing values

I have only been able to create a two column data frame from a defaultdict (termed output):
df_mydata = pd.DataFrame([(k, v) for k, v in output.items()],
columns=['id', 'value'])
What I would like to be able to do is using this basic format also initiate the dataframe with three columns: 'id', 'id2' and 'value'. I have a separate defined dict that contains the necessary look up info, called id_lookup.
So I tried:
df_mydata = pd.DataFrame([(k, id_lookup[k], v) for k, v in output.items()],
columns=['id', 'id2','value'])
I think I'm doing it right, but I get key errors. I will only know if id_lookup is exhaustive for all possible encounters in hindsight. For my purposes, simply putting it all together and placing 'N/A` or something for those types of errors will be acceptable.
Would the above be appropriate for calculating a new column of data using a defaultdict and a simple lookup dict, and how might I make it robust to key errors?
Here is an example of how you could do this:
import pandas as pd
from collections import defaultdict
df = pd.DataFrame({'id': [1, 2, 3, 4],
'value': [10, 20, 30, 40]})
id_lookup = {1: 'A', 2: 'B', 3: 'C'}
new_column = defaultdict(str)
# Loop through the df and populate the defaultdict
for index, row in df.iterrows():
try:
new_column[index] = id_lookup[row['id']]
except KeyError:
new_column[index] = 'N/A'
# Convert the defaultdict to a Series and add it as a new column in the df
df['id2'] = pd.Series(new_column)
# Print the updated DataFrame
print(df)
which gives:
id value id2
0 1 10 A
1 2 20 B
2 3 30 C
3 4 40 N/A
​

How to remove common item from list of dictionaries after grouping

I have a list of dictionaries like below. I want to group the dictionaries based on grade, and convert the list of dictionaries to single dictionaries with key as grade value and value as list of dictionaries
Input:
[
{'name':'abc','mark':'99','grade':'A'},
{'name':'xyz','mark':'90','grade':'A'},
{'name':'123','mark':'70','grade':'C'},
]
I want my output like below:
{
A: [ {'name': 'abc','mark':'99'}, {'name': 'xyz','mark':'90'} ],
C: [ {'name': '123','mark':'70'} ]
}
I tried sorted and groupby; but not able to remove grade from dictionary.
Use a loop with dict.setdefault:
l = [{'name':'abc','mark':'99','grade':'A'},
{'name':'xyz','mark':'90','grade':'A'},
{'name':'123','mark':'70','grade':'C'},
]
out = {}
for d in l:
# avoid mutating the original dictionaries
d = d.copy()
# get grade, try to get the key in "out"
# if the key doesn't exist, initialize with an empty list
out.setdefault(d.pop('grade'), []).append(d)
print(out)
Output:
{'A': [{'name': 'abc', 'mark': '99'},
{'name': 'xyz', 'mark': '90'}],
'C': [{'name': '123', 'mark': '70'}],
}

Change a dataframe column value based on the current value?

I have a pandas dataframe with several columns and in one of them, there are string values. I need to change these strings to an acceptable value based on the current value. The dataframe is relatively large (40.000 x 32)
I've made a small function that takes the string to be changed as a parameter and then lookup what this should be changed to.
df = pd.DataFrame({
'A': ['Script','Scrpt','MyScript','Sunday','Monday','qwerty'],
'B': ['Song','Blues','Rock','Classic','Whatever','Something']})
def lut(txt):
my_lut = {'Script' : ['Script','Scrpt','MyScript'],
'Weekday' : ['Sunday','Monday','Tuesday']}
for key, value in my_lut.items():
if txt in value:
return(key)
break
return('Unknown')
The desired output should be:
A B
0 Script Song
1 Script Blues
2 Script Rock
3 Weekday Classic
4 Weekday Whatever
5 Unknown Something
I can't figure out how to apply this to the dataframe.
I've struggled over this for some time now so any input will be appreciated
Regards,
Check this out:
import pandas as pd
df = pd.DataFrame({
'A': ['Script','Scrpt','MyScript','Sunday','sdfsd','qwerty'],
'B': ['Song','Blues','Rock','Classic','Whatever','Something']})
dic = {'Weekday': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'], 'Script': ['Script','Scrpt','MyScript']}
for k, v in dic.items():
for item in v:
df.loc[df.A == item, 'A'] = k
df.loc[~df.A.isin(k for k, v in dic.items()), 'A'] = "Unknown"
Output:

Create one nested object with two objects from dictionary

I'm not sure if the title of my question is the right description to the issue I'm facing.
I'm reading the following table of data from a spreadsheet and passing it as a dataframe:
Name Description Value
foo foobar 5
baz foobaz 4
bar foofoo 8
I need to transform this table of data to json following a specific schema.
I'm trying to get the following output:
{'global': {'Name': 'bar', 'Description': 'foofoo', 'spec': {'Value': '8'}}
So far I'm able to get the global and spec objects but I'm not sure how I should combine them to get the expected output above.
I wrote this:
for index, row in df.iterrows():
if row['Description'] == 'foofoo':
global = row.to_dict()
spec = row.to_dict()
del(global['Value'])
del(spec['Name'])
del(spec['Description'])
print("global:", global)
print("spec:", spec)
with the following output:
global: {'Name': 'bar', 'Description': 'foofoo'}
spec: {'Value': '8'}
How can I combine these two objects to get to the desired output?
This should give you that output:
global['spec'] = spec
combined = {'global': global}
Try this and see if it works faster: slow speed might be due to iterrows. I suggest you move the iteration to the dictionary after exporting from the dataframe.
Name Description Value
0 foo foobar 5
1 baz foobaz 4
2 bar foofoo 8
#Export dataframe to dictionar, using the 'index' option
M = df.to_dict('index')
r = {}
q = []
#iterating through the dictionary items(key,value pair)
for i,j in M.items():
#assign value to key 'global'
r['global'] = j
#popitem() works similarly to pop in list,
#take out the last item
#and remove it from parent dictionary
#this nests the spec key, inside the global key
r['global']['spec'] = dict([j.popitem()])
#this ensures the dictionaries already present are not overriden
#you could use copy or deep.copy to ensure same state
q.append(dict(r))
{'global': {'Name': 'foo', 'Description': 'foobar', 'spec': {'Value': 5}}}
{'global': {'Name': 'baz', 'Description': 'foobaz', 'spec': {'Value': 4}}}
{'global': {'Name': 'bar', 'Description': 'foofoo', 'spec': {'Value': 8}}}
dict popitem

turn three columns into dictionary python

Name = [list(['Amy', 'A', 'Angu']),
list(['Jon', 'Johnson']),
list(['Bob', 'Barker'])]
Other = [list(['Amy', 'Any', 'Anguish']),
list(['Jon', 'Jan']),
list(['Baker', 'barker'])]
import pandas as pd
df = pd.DataFrame({'Other' : Other,
'ID': ['E123','E456','E789'],
'Other_ID': ['A123','A456','A789'],
'Name' : Name,
})
ID Name Other Other_ID
0 E123 [Amy, A, Angu] [Amy, Any, Anguish] A123
1 E456 [Jon, Johnson] [Jon, Jan] A456
2 E789 [Bob, Barker] [Baker, barker] A789
I have the df as seen above. I want to make columns ID, Name and Other into a dictionary with they key being ID. I tried this according to python pandas dataframe columns convert to dict key and value
todict = dict(zip(df.ID, df.Name))
Which is close to what I want
{'E123': ['Amy', 'A', 'Angu'],
'E456': ['Jon', 'Johnson'],
'E789': ['Bob', 'Barker']}
But I would like to get this output that includes values from Other column
{'E123': ['Amy', 'A', 'Angu','Amy', 'Any','Anguish'],
'E456': ['Jon', 'Johnson','Jon','Jan'],
'E789': ['Bob', 'Barker','Baker','barker']
}
And If I put the third column Other it gives me errors
todict = dict(zip(df.ID, df.Name, df.Other))
How do I get the output I want?
Why not just combine the Name and Other column before creating a dict of the Name column.
df['Name'] = df['Name'] + df['Other']
dict(zip(df.ID, df.Name))
Gives
{'E123': ['Amy', 'A', 'Angu', 'Amy', 'Any', 'Anguish'],
'E456': ['Jon', 'Johnson', 'Jon', 'Jan'],
'E789': ['Bob', 'Barker', 'Baker', 'barker']}

Resources