How to rename values in a column from list? - python-3.x

I have a df that looks like this:
col1 col2
value test1
value test2
value test3
value test4
value test5
I want to rename col1 from a list in a repeating fashion like so:
lst = ['new1','new2','new3','new4','new5']
col1 col2
new1 test1
new2 test2
new3 test3
new4 test4
new5 test5
I need the list to repeat for all the rows in col1.
I tried this:
df = df.set_index('col1')
df = df.rename(index={'value':['new1','new2','new3','new4','new5']})
but this passes the entire list into each row of col1 like so:
col1 col2
['new1','new2','new3','new4','new5'] test1
['new1','new2','new3','new4','new5'] test2
['new1','new2','new3','new4','new5'] test3
['new1','new2','new3','new4','new5'] test4
['new1','new2','new3','new4','new5'] test5

assign
This only works for the OP's example, where the length of lst equals the number of rows in df
df.assign(col1=lst)
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
More generically
This is more generic. If you aren't on Python 3.6+ and don't have f-strings, you can use str.format instead
df.assign(col1=[f'new{i+1}' for i in range(len(df))])
# df.assign(col1=[*map('new{}'.format, range(1, len(df) + 1))])
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
With itertools
If you just want to repeat the list you've got, I'd use itertools.islice and itertools.cycle
from itertools import cycle, islice
df.assign(col1=[*islice(cycle(lst), len(df))])
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5

One way with numpy.put (note that np.put repeats the values as needed when lst is shorter than the index range, and it writes into the column in place):
import numpy as np
lst = ['new1','new2']
np.put(df['col1'], np.arange(len(df)), lst)
df
Out[37]:
col1 col2
0 new1 test1
1 new2 test2
2 new1 test3
3 new2 test4
4 new1 test5
Another option
n=len(df)
df['col1']=(lst*((n//len(lst))+1))[:n]
df
Out[48]:
col1 col2
0 new1 test1
1 new2 test2
2 new1 test3
3 new2 test4
4 new1 test5
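The slicing trick above can be checked in plain Python without pandas: multiplying a list repeats it, and the [:n] slice trims the overshoot. The values here are just the ones from the answer:

```python
lst = ['new1', 'new2']
n = 5  # number of rows in the df

# repeat lst one more time than strictly needed, then trim to exactly n items
repeated = (lst * ((n // len(lst)) + 1))[:n]
print(repeated)  # ['new1', 'new2', 'new1', 'new2', 'new1']
```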

Related

Group by and drop duplicates in pandas dataframe

I have a pandas dataframe as below. I want to group by based on all the three columns and retain the group with the max of Col1.
import pandas as pd
df = pd.DataFrame({'col1':['A', 'A', 'A', 'A', 'B', 'B'], 'col2':['1', '1', '1', '1', '2', '3'], 'col3':['5', '5', '2', '2', '2', '3']})
df
col1 col2 col3
0 A 1 5
1 A 1 5
2 A 1 2
3 A 1 2
4 B 2 2
5 B 3 3
My expected output
col1 col2 col3
0 A 1 5
1 A 1 5
4 B 2 2
5 B 3 3
I tried the code below, but it returns the last row of each group; instead I want to sort by col3 and keep the group with the max col3
df.drop_duplicates(keep='last', subset=['col1','col2','col3'])
col1 col2 col3
1 A 1 5
3 A 1 2
4 B 2 2
5 B 3 3
For example: here I want to drop the first group because 2 < 5, keeping the group with col3 equal to 5
df.sort_values(by=['col1', 'col2', 'col3'], ascending=False)
a_group = df.groupby(['col1', 'col2', 'col3'])
for name, group in a_group:
group = group.reset_index(drop=True)
print(group)
col1 col2 col3
0 A 1 2
1 A 1 2
col1 col2 col3
0 A 1 5
1 A 1 5
col1 col2 col3
0 B 2 2
col1 col2 col3
0 B 3 3
You can't group on all columns, since the column whose max you want to retain has different values within a group. Instead, leave that column out of the grouping and use the others:
col_to_max = 'col3'
i = df.columns.difference([col_to_max])  # `df.columns ^ [col_to_max]` is deprecated in newer pandas
out = df[df[col_to_max] == df.groupby(list(i))[col_to_max].transform('max')]
print(out)
col1 col2 col3
0 A 1 5
1 A 1 5
4 B 2 2
5 B 3 3
So we can do
out = df[df.col3==df.groupby(['col1','col2'])['col3'].transform('max')]
col1 col2 col3
0 A 1 5
1 A 1 5
4 B 2 2
5 B 3 3
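A runnable version of the transform approach, with col3 converted to int so the max is numeric rather than lexicographic (an issue the string dtype in the question would otherwise hide):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'col2': ['1', '1', '1', '1', '2', '3'],
                   'col3': ['5', '5', '2', '2', '2', '3']})
df['col3'] = df['col3'].astype(int)

# transform('max') broadcasts each group's maximum back onto every row,
# so the comparison builds a boolean mask that keeps only the max rows
out = df[df['col3'] == df.groupby(['col1', 'col2'])['col3'].transform('max')]
print(out)
```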
I believe you can use groupby with nlargest(2). Also make sure that your 'col3' is numeric.
>>> df['col3'] = df['col3'].astype(int)
>>> df.groupby(['col1','col2'])['col3'].nlargest(2).reset_index().drop('level_2',axis=1)
col1 col2 col3
0 A 1 5
1 A 1 5
2 B 2 2
3 B 3 3
You can get the indices that don't hold the group's max col3 value, get the duplicated indices, and drop their intersection
ind = df.assign(max = df.groupby("col1")["col3"].transform("max")).query("max != col3").index
ind2 = df[df.duplicated(keep=False)].index
df.drop(set(ind).intersection(ind2))
col1 col2 col3
0 A 1 5
1 A 1 5
4 B 2 2
5 B 3 3

line feed inside row in column with pandas

Is there any way in pandas to separate data inside a row in a column? Each row holds multiple values. I mean, I group by col1 and the result is that I have a df like this:
col1 Col2
0 1 abc,def,ghi
1 2 xyz,asd
and desired output would be:
Col1 Col2
0 1 abc
def
ghi
1 2 xyz
asd
thanks
Use str.split and explode:
print (df.assign(Col2=df["Col2"].str.split(","))
.explode("Col2"))
col1 Col2
0 1 abc
0 1 def
0 1 ghi
1 2 xyz
1 2 asd
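A self-contained version of the split-and-explode answer (explode requires pandas 0.25+):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'Col2': ['abc,def,ghi', 'xyz,asd']})

# str.split turns each cell into a list; explode gives every list
# element its own row, repeating the original index and col1 value
out = df.assign(Col2=df['Col2'].str.split(',')).explode('Col2')
print(out)
```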

How to update NaN values in a dataframe with a dictionary

I'm trying to replace NaN values in one column when the adjacent column starts with a string.
starts_with = {'65S':'test1','95S':'test2','20S':'test3'}
Here is the DF
15 65String NaN
16 95String NaN
17 20String NaN
I would like it to look like
15 65String test1
16 95String test2
17 20String test3
Using findall to find the key from the dict, then just map back to the dict's value
df.c3=df.c2.str.findall('|'.join(starts_with.keys())).str[0].map(starts_with)
df
Out[110]:
c1 c2 c3
0 15 65String test1
1 16 95String test2
2 17 20String test3
Use Series.str.extract with ^ for match start of string with Series.map by dictionary:
starts_with ={'65S':'test1','95S':'test2','20S':"test3"}
pat = '|'.join(r"^{}".format(x) for x in starts_with)
df['C'] = df['B'].str.extract('(' + pat + ')', expand=False).map(starts_with)
print (df)
A B C
0 15 65String test1
1 16 95String test2
2 17 20String test3
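Put together as a runnable sketch (the column names A/B/C follow the answer's printed frame and are assumptions about the OP's actual columns):

```python
import pandas as pd

starts_with = {'65S': 'test1', '95S': 'test2', '20S': 'test3'}
df = pd.DataFrame({'A': [15, 16, 17],
                   'B': ['65String', '95String', '20String'],
                   'C': [float('nan')] * 3})

# anchor each key to the start of the string so it only matches prefixes
pat = '|'.join(r'^{}'.format(x) for x in starts_with)
df['C'] = df['B'].str.extract('(' + pat + ')', expand=False).map(starts_with)
print(df['C'].tolist())  # ['test1', 'test2', 'test3']
```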

How to flatten multiple dictionary objects inside a list in pandas column?

I have a df that with a column that looks like this:
col1
[{'value':'2019-02-02'},{'value':'test1'},{'value':'test2'},{'value':'test3'},{'value':'test4'},{'value':'test5'},{'value':'test6'}]
How do I flatten this column so it looks like this:
col1 col2
value 2019-02-02
value test1
value test2
value test3
value test4
value test5
value test6
I am trying this:
df['Col1'] = [x[0]['value'] for x in df['Col1']]
This only picks the first value from each nested list of dictionaries, making the df look like this:
col1
2019-02-02
2019-02-02
2019-02-02
2019-02-02
Convert with tolist, then concat a pd.DataFrame(sublist) for each sublist
pd.concat([pd.DataFrame(sublist) for sublist in df.col1.tolist()]).stack().reset_index(level=1)
Out[22]:
level_1 0
0 value 2019-02-02
1 value test1
2 value test2
3 value test3
4 value test4
5 value test5
6 value test6
0 value 2019-02-02
1 value test1
2 value test2
3 value test3
4 value test4
5 value test5
6 value test6
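As a self-contained example (two rows holding the same list, which is an assumption made here to reproduce the duplicated output above):

```python
import pandas as pd

row = [{'value': '2019-02-02'}, {'value': 'test1'}, {'value': 'test2'}]
df = pd.DataFrame({'col1': [row, row]})

# each sublist becomes its own small frame; concat stacks them, and
# stack()/reset_index pulls the dict key 'value' out as a column
out = (pd.concat([pd.DataFrame(sub) for sub in df['col1'].tolist()])
         .stack()
         .reset_index(level=1))
print(out)
```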

How to prevent df.pivot from inserting 'None' and blank rows during pivot?

I am wanting to pivot a df that looks like this:
columns values
col1 test1
col2 test2
col3 test3
col4 test4
col1 test5
col2 test6
col3 test7
col4 test8
I am trying this:
df['index'] = df.index
df = df.pivot(index='index', columns='columns', values='values')
which results in a df that looks like this (roughly):
col1 col2 col3 col4
None None test1 None
test5 None None None
How do I pivot the df to look like this?:
col1 col2 col3 col4
test1 test2 test3 test4
test5 test6 test7 test8
I am creating an artificial index column because I don't have another column to use as an index. I only have 2 columns in the dataframe.
Using cumcount, create a new key, then pivot
df.assign(key=df.groupby('columns').cumcount()).pivot(index='key', columns='columns', values='values')
Out[54]:
columns col1 col2 col3 col4
key
0 test1 test2 test3 test4
1 test5 test6 test7 test8
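A runnable version (pivot is called with keyword arguments, which newer pandas versions require):

```python
import pandas as pd

df = pd.DataFrame({'columns': ['col1', 'col2', 'col3', 'col4'] * 2,
                   'values': ['test1', 'test2', 'test3', 'test4',
                              'test5', 'test6', 'test7', 'test8']})

# cumcount numbers each repeat of a column name (0 for the first col1,
# 1 for the second, ...), giving pivot a unique row key per cell
out = (df.assign(key=df.groupby('columns').cumcount())
         .pivot(index='key', columns='columns', values='values'))
print(out)
```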