How can I update the rows of a DataFrame to turn them into dictionaries keyed by the column names? - python-3.x

I have a dataframe like this.
ID Name id2 name2
101 A 1 d_a
103 B 2 d_b
101 A 3 d_c
103 B 4 d_d
and I want the output df like this.
ID Name id2 name2
101 A [{'id2':1},{'id2':3}] [{'name2':'d_a'},{'name2':'d_c'}]
103 B [{'id2':2},{'id2':4}] [{'name2':'d_b'},{'name2':'d_d'}]

Use list comprehension with DataFrame.to_dict:
df1 = pd.DataFrame([[df[[x]].to_dict('records') for x in df]], columns=df.columns)
print (df1)
col1 \
0 [{'col1': 1}, {'col1': 2}, {'col1': 3}]
col2
0 [{'col2': 'def'}, {'col2': 'bb'}, {'col2': 'ra'}]
EDIT: Use GroupBy.agg with a lambda function:
cols = ['id2','name2']
df2 = df.groupby(['ID','Name'])[cols].agg(lambda x: x.to_frame().to_dict('records')).reset_index()
print (df2)
ID Name id2 name2
0 101 A [{'id2': 1}, {'id2': 3}] [{'name2': 'd_a'}, {'name2': 'd_c'}]
1 103 B [{'id2': 2}, {'id2': 4}] [{'name2': 'd_b'}, {'name2': 'd_d'}]
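As a self-contained sketch of the groupby approach above (the sample df is rebuilt from the question; `'records'` is the full spelling of the deprecated `'r'` orient):

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 103, 101, 103],
                   'Name': ['A', 'B', 'A', 'B'],
                   'id2': [1, 2, 3, 4],
                   'name2': ['d_a', 'd_b', 'd_c', 'd_d']})

# each group's column becomes a list of one-key dicts via to_dict('records')
df2 = (df.groupby(['ID', 'Name'])[['id2', 'name2']]
         .agg(lambda x: x.to_frame().to_dict('records'))
         .reset_index())
print(df2)
```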

Check if the values in the df match any of the values of the dict

df -
orderid  supplierid  name  other columns
101      1           x
102      1           z
103      2           x
dict -
{1: {'name': ['x', 'y']},
2: {'name': ['z']}}
My end goal is to check if the name given for a particular id matches any of the values in the dict or not and populate a new column "exist_or_not" with yes or no for the same.
Expected result -
orderid  supplierid  name  exist_or_not
101      1           x     yes
102      1           z     no
103      2           x     no
How can I solve this?
Create a DataFrame with a nested list comprehension first, then do a left join with DataFrame.merge using the indicator parameter, and finally create the yes/no values with numpy.where:
d = {1: {'name': ['x', 'y']}, 2: {'name': ['z']}}
df1 = pd.DataFrame([(k, x)
                    for k, v in d.items()
                    for k1, v1 in v.items()
                    for x in v1], columns=['supplierid','name'])
print (df1)
supplierid name
0 1 x
1 1 y
2 2 z
df = df.merge(df1, on=['supplierid','name'], how='left', indicator='exist_or_not')
df['exist_or_not'] = np.where(df['exist_or_not'].eq('both'), 'yes', 'no')
print (df)
orderid supplierid name exist_or_not
0 101 1 x yes
1 102 1 z no
2 103 2 x no
With a list comprehension:
df["exists_or_not"] = ["yes"
                       if the_name in d[sup_id]["name"]
                       else "no"
                       for the_name, sup_id in zip(df.name, df.supplierid)]
where d is your dictionary {1: {'name': ['x', 'y']}, 2: {'name': ['z']}}.
to get
orderid supplierid name exists_or_not
0 101 1 x yes
1 102 1 z no
2 103 2 x no
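If some supplierid could be missing from the dict, a dict.get variant of the comprehension avoids a KeyError. A short sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'orderid': [101, 102, 103],
                   'supplierid': [1, 1, 2],
                   'name': ['x', 'z', 'x']})
d = {1: {'name': ['x', 'y']}, 2: {'name': ['z']}}

# d.get(..., {}) falls back to an empty dict for unknown supplier ids
df['exist_or_not'] = ['yes' if n in d.get(s, {}).get('name', []) else 'no'
                      for s, n in zip(df['supplierid'], df['name'])]
print(df)
```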

pd dataframe from lists and dictionary using series

I have a few lists and a dictionary and would like to create a pd dataframe.
Could someone help me out? I seem to be missing something.
One simple example below:
dict = {"a": 1, "b": 3, "c": "text1"}
l1 = [1, 2, 3, 4]
l3 = ["x", "y"]
Using series I would do like this:
df = pd.DataFrame({'col1': pd.Series(l1), 'col2': pd.Series(l3)})
and would have the lists within the df as expected
For the dict I would do:
df = pd.DataFrame(list(dict.items()), columns=['col3', 'col4'])
And would expect this result:
col1  col2  col3  col4
1     x     a     1
2     y     b     3
3           c     text1
4
The problem is that this way the first df would be overwritten by the second call of pd.DataFrame.
How would I do this to have only one df with 4 columns?
I know one way would be to split the dict into 2 separate lists and just use Series over 4 lists, but I would think there is a better way: out of 2 lists and 1 dict as above, get one df with 4 columns directly.
Thanks for the help!
You can also use pd.concat to concatenate the two dataframes:
df1 = pd.DataFrame({'col1': pd.Series(l1), 'col2': pd.Series(l3)})
df2 = pd.DataFrame(list(dict.items()), columns=['col3', 'col4'])
df = pd.concat([df1, df2], axis=1)
Why not build each column separately via dict.keys() and dict.values() instead of using dict.items()?
df = pd.DataFrame({
    'col1': pd.Series(l1),
    'col2': pd.Series(l3),
    'col3': pd.Series(dict.keys()),
    'col4': pd.Series(dict.values())
})
print(df)
col1 col2 col3 col4
0 1 x a 1
1 2 y b 3
2 3 NaN c text1
3 4 NaN NaN NaN
Alternatively:
column_values = [l1, l3, dict.keys(), dict.values()]
data = {f"col{i}": pd.Series(values) for i, values in enumerate(column_values)}
df = pd.DataFrame(data)
print(df)
col0 col1 col2 col3
0 1 x a 1
1 2 y b 3
2 3 NaN c text1
3 4 NaN NaN NaN
You can unpack the zipped values of the list generated from d.items() and pass them to itertools.zip_longest, which fills in missing values to match the longest list:
from itertools import zip_longest

# dict is a python builtin, so d is used for the variable name
d = {"a": 1, "b": 3, "c": "text1"}
l1 = [1, 2, 3, 4]
l3 = ["x", "y"]
df = pd.DataFrame(zip_longest(l1, l3, *zip(*d.items()),
                              fillvalue=np.nan),
                  columns=['col1','col2','col3','col4'])
print (df)
col1 col2 col3 col4
0 1 x a 1
1 2 y b 3
2 3 NaN c text1
3 4 NaN NaN NaN

Duplicate rows in a dataframe according to a criterion from the table

I have a dataframe like this:
d = {'col1': ['a', 'b'], 'col2': [2, 4]}
df = pd.DataFrame(data=d)
df
>> col1 col2
0 a 2
1 b 4
and I want to duplicate the rows by col2 and get a table like this:
>> col1 col2
0 a 2
1 a 2
2 b 4
3 b 4
4 b 4
5 b 4
Thanks to everyone for the help!
Here's my solution using some numpy:
import numpy as np

numRows = np.sum(df.col2)
blankSpace = np.zeros(numRows).astype(str)
d2 = {'col1': blankSpace, 'col2': blankSpace}
df2 = pd.DataFrame(data=d2)
counter = 0
for i in range(df.shape[0]):
    letter = df.col1[i]
    numRowsForLetter = df.col2[i]
    for j in range(numRowsForLetter):
        df2.at[counter, 'col1'] = letter
        df2.at[counter, 'col2'] = numRowsForLetter
        counter += 1
df2 is your output dataframe!
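For reference, the same duplication can be done without an explicit loop using Index.repeat (a short sketch, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b'], 'col2': [2, 4]})

# repeat each row label col2 times, then rebuild a clean RangeIndex
df2 = df.loc[df.index.repeat(df['col2'])].reset_index(drop=True)
print(df2)
```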

Combine text in dataframe python

Suppose I have this DataFrame:
df = pd.DataFrame({'col1': ['AC1', 'AC2', 'AC3', 'AC4', 'AC5'],
'col2': ['A', 'B', 'B', 'A', 'C'],
'col3': ['ABC', 'DEF', 'FGH', 'IJK', 'LMN']})
I want to combine the text of 'col3' if values in 'col2' are duplicated. The result should be like this:
col1 col2 col3
0 AC1 A ABC, IJK
1 AC2 B DEF, FGH
2 AC3 B DEF, FGH
3 AC4 A ABC, IJK
4 AC5 C LMN
I started this exercise by finding the duplicated values in this dataframe:
col2 = df['col2']
df1 = df[col2.isin(col2[col2.duplicated()])]
Any suggestion what I should do next?
You can use
a = df.groupby('col2').apply(lambda group: ','.join(group['col3']))
df['col3'] = df['col2'].map(a)
Output
print(df)
col1 col2 col3
0 AC1 A ABC,IJK
1 AC2 B DEF,FGH
2 AC3 B DEF,FGH
3 AC4 A ABC,IJK
4 AC5 C LMN
You might want to leverage the groupby and apply functions in Pandas
df.groupby('col2').apply(lambda group: ','.join(group['col3']))
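A variant worth knowing: GroupBy.transform broadcasts the joined string back to every row in one step, so the extra map is not needed. A sketch on the question's data (', ' as the separator to match the expected output):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['AC1', 'AC2', 'AC3', 'AC4', 'AC5'],
                   'col2': ['A', 'B', 'B', 'A', 'C'],
                   'col3': ['ABC', 'DEF', 'FGH', 'IJK', 'LMN']})

# transform returns a result aligned to the original index
df['col3'] = df.groupby('col2')['col3'].transform(', '.join)
print(df)
```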

append one dataframe column value to another dataframe

I have two dataframes. df1 is an empty dataframe and df2 has some data, as shown. A few columns are common to both dfs. I want to append df2's column data into df1's columns. df3 is the expected result.
I have referred to Python + Pandas + dataframe : couldn't append one dataframe to another, but it is not working. It gives the following error:
ValueError: Plan shapes are not aligned
df1:
Empty DataFrame
Columns: [a, b, c, d, e]
Index: []
df2:
c e
0 11 55
1 22 66
df3 (expected output):
a b c d e
0 11 55
1 22 66
I tried with append but am not getting the desired result:
import pandas as pd
l1 = ['a', 'b', 'c', 'd', 'e']
l2 = []
df1 = pd.DataFrame(l2, columns=l1)
l3 = ['c', 'e']
l4 = [[11, 55],
      [22, 66]]
df2 = pd.DataFrame(l4, columns=l3)
print("concat", "\n", pd.concat([df1, df2]))  # column order stays in place
print("merge NaN", "\n", pd.merge(df2, df1, how='left', on=l3))  # column order is not preserved
#### Output ####
#concat
a b c d e
0 NaN NaN 11 NaN 55
1 NaN NaN 22 NaN 66
#merge
c e a b d
0 11 55 NaN NaN NaN
1 22 66 NaN NaN NaN
Append seems to work for me. Does this not do what you want?
df1 = pd.DataFrame(columns=['a', 'b', 'c'])
print("df1: ")
print(df1)
df2 = pd.DataFrame(columns=['a', 'c'], data=[[0, 1], [2, 3]])
print("df2:")
print(df2)
print("df1.append(df2):")
print(df1.append(df2, ignore_index=True, sort=False))
Output:
df1:
Empty DataFrame
Columns: [a, b, c]
Index: []
df2:
a c
0 0 1
1 2 3
df1.append(df2):
a b c
0 0 NaN 1
1 2 NaN 3
Have you tried pd.concat?
pd.concat([df1,df2])
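Note that DataFrame.append was removed in pandas 2.0, so on current pandas the concat route is the one that still works; because df1 is listed first, its column order is preserved. A sketch with the question's data:

```python
import pandas as pd

df1 = pd.DataFrame(columns=['a', 'b', 'c', 'd', 'e'])
df2 = pd.DataFrame([[11, 55], [22, 66]], columns=['c', 'e'])

# concat aligns on column names; df1 contributes the full column set
df3 = pd.concat([df1, df2], ignore_index=True)
print(df3)
```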