How can I replace column names in a pandas dataframe - python-3.x

I have an Excel file mapping old column names to new ones:
Old_name  new_name
xyz       abc
opq       klm
And my dataframe looks like this:
Id  timestamp   xyz  opq
1   04-10-2021  3    4
2   05-10-2021  4    9
As you can see, my columns still carry the old names. I would like to map them to the new names from my mapping file. How can I do that?

Try with rename, where col_names is the mapping DataFrame read from your file; setting Old_name as the index turns it into an old-to-new lookup:
df.rename(columns=col_names.set_index('Old_name')['new_name'], inplace=True)
# verify
print(df)
Output:
Id timestamp abc klm
0 1 04-10-2021 3 4
1 2 05-10-2021 4 9
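A self-contained sketch of the approach, with the mapping built inline as a stand-in for the DataFrame you would get from pd.read_excel:

```python
import pandas as pd

# Stand-in for the mapping read from the Excel file, e.g.
# col_names = pd.read_excel("mapping.xlsx")
col_names = pd.DataFrame({"Old_name": ["xyz", "opq"],
                          "new_name": ["abc", "klm"]})

df = pd.DataFrame({"Id": [1, 2],
                   "timestamp": ["04-10-2021", "05-10-2021"],
                   "xyz": [3, 4],
                   "opq": [4, 9]})

# A Series indexed by Old_name is dict-like, so rename accepts it as a
# mapper; columns not in the mapping (Id, timestamp) are left untouched.
mapping = col_names.set_index("Old_name")["new_name"]
df = df.rename(columns=mapping)
print(df.columns.tolist())  # ['Id', 'timestamp', 'abc', 'klm']
```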

Related

How to group rows in pandas with sum in the certain column

Given a DataFrame like this:
   A    B               C   D
0  ABC  unique_ident_1  10  ONE
1  KLM  unique_ident_2   2  TEN
2  KLM  unique_ident_2   7  TEN
3  XYZ  unique_ident_3   2  TWO
3  ABC  unique_ident_1   8  ONE
3  XYZ  unique_ident_3  -5  TWO
where column "B" contains a unique text identifier, columns "A" and "D" contain constant text values determined by that identifier, and column "C" holds a quantity. I want to group the rows by the identifier (column "B"), with the quantity column summed per identifier:
   A    B               C   D
0  ABC  unique_ident_1  18  ONE
1  KLM  unique_ident_2   9  TEN
2  XYZ  unique_ident_3  -3  TWO
How can I get this result with pandas?
Use named aggregation with a groupby:
df1 = df.groupby('B', as_index=False).agg(
    A=('A', 'first'),
    C=('C', 'sum'),
    D=('D', 'first')
)[df.columns]
A B C D
0 ABC unique_ident_1 18 ONE
1 KLM unique_ident_2 9 TEN
2 XYZ unique_ident_3 -3 TWO
You can also build an aggregation dictionary and then group, in case you have many columns:
agg_d = {col: 'sum' if col == 'C' else 'first' for col in df.columns}
out = df.groupby('B').agg(agg_d).reset_index(drop=True)
print(out)
A B C D
0 ABC unique_ident_1 18 ONE
1 KLM unique_ident_2 9 TEN
2 XYZ unique_ident_3 -3 TWO
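The named-aggregation answer above, as a runnable sketch with the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["ABC", "KLM", "KLM", "XYZ", "ABC", "XYZ"],
    "B": ["unique_ident_1", "unique_ident_2", "unique_ident_2",
          "unique_ident_3", "unique_ident_1", "unique_ident_3"],
    "C": [10, 2, 7, 2, 8, -5],
    "D": ["ONE", "TEN", "TEN", "TWO", "ONE", "TWO"],
})

# One named aggregation per output column: sum C, keep the first A and D
# seen in each group; the trailing [df.columns] restores column order.
df1 = df.groupby("B", as_index=False).agg(
    A=("A", "first"),
    C=("C", "sum"),
    D=("D", "first"),
)[df.columns]

print(df1)
```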

How do I compare a dataframe column with another dataframe and create a column

I have two dataframes, df1 and df2. Here is a small sample. My df1:
Days
4
6
9
1
4
My df2 is
Day1 Day2 Alphabets
2 5 abc
4 7 bcd
8 10 ghi
10 12 abc
I want to change my df1 so that it has a new column Alphabets taken from df2 whenever a value in df1's Days falls between Day1 and Day2. Something like:
if df1['Days'] in between df2['Day1'] and df2['Day2']:
    df1['Alphabets'] = df2['Alphabets']
Result is:
Days Alphabets
4 abc
6 bcd
9 ghi
etc.
I tried a for loop, but it takes a long time to run. Is there a more elegant way to do this?
Thanks in advance.
You can use NumPy broadcasting:
s1 = df2.Day1.values
s2 = df2.Day2.values
s = df1.Days.values[:, None]  # column vector, broadcasts against df2's rows
df1['V'] = ((s - s1 > 0) & (s - s2 < 0)).dot(df2.Alphabets)
df1
Out[277]:
Days V
0 4 abc
1 6 bcd
2 9 ghi
3 1
4 4 abc
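A self-contained version of the broadcasting trick with the sample data. Dotting the boolean match matrix with the string column concatenates the matched labels, so with at most one matching interval per row it yields that row's label, or an empty string when nothing matches:

```python
import pandas as pd

df1 = pd.DataFrame({"Days": [4, 6, 9, 1, 4]})
df2 = pd.DataFrame({"Day1": [2, 4, 8, 10],
                    "Day2": [5, 7, 10, 12],
                    "Alphabets": ["abc", "bcd", "ghi", "abc"]})

s1 = df2.Day1.values
s2 = df2.Day2.values
s = df1.Days.values[:, None]  # shape (5, 1), broadcasts against shape (4,)

# match[i, j] is True when Days[i] lies strictly between Day1[j] and Day2[j]
match = (s - s1 > 0) & (s - s2 < 0)

# bool * str in Python repeats the string 0 or 1 times, so the dot product
# sums to the matching label per row ('' where no interval matches).
df1["V"] = match.dot(df2.Alphabets)
print(df1)
```

Note that the strict inequalities exclude the interval endpoints; use >= / <= instead if the bounds should be inclusive.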

Pandas: Sort a dataframe based on multiple columns

I know this question has been asked several times, but none of the answers match my case.
I have a pandas dataframe with two columns, department and employee_count. I need to sort by the employee_count column in descending order, but if there is a tie between two employee_counts, the tied rows should be sorted alphabetically by department.
Department Employee_Count
0 abc 10
1 adc 10
2 bca 11
3 cde 9
4 xyz 15
required output:
Department Employee_Count
0 xyz 15
1 bca 11
2 abc 10
3 adc 10
4 cde 9
This is what I've tried.
df = df.sort_values(['Department','Employee_Count'],ascending=[True,False])
But this just sorts the departments alphabetically.
I've also tried to sort by Department first and then by Employee_Count. Like this:
df = df.sort_values(['Department'],ascending=[True])
df = df.sort_values(['Employee_Count'],ascending=[False])
This doesn't give me correct output either:
Department Employee_Count
4 xyz 15
2 bca 11
1 adc 10
0 abc 10
3 cde 9
It gives 'adc' first and then 'abc'.
Kindly help me.
You can swap the column names in the list, and likewise the values in the ascending parameter:
Explanation:
The order of the column names is the order of sorting: first sort descending by Employee_Count, and wherever Employee_Count has duplicates, sort only those tied rows ascending by Department.
df1 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, True])
print (df1)
Department Employee_Count
4 xyz 15
2 bca 11
0 abc 10 <-
1 adc 10 <-
3 cde 9
For comparison, if you use False for the second value, tied rows are sorted descending instead:
df2 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, False])
print (df2)
Department Employee_Count
4 xyz 15
2 bca 11
1 adc 10 <-
0 abc 10 <-
3 cde 9
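A minimal runnable version of the accepted ordering:

```python
import pandas as pd

df = pd.DataFrame({"Department": ["abc", "adc", "bca", "cde", "xyz"],
                   "Employee_Count": [10, 10, 11, 9, 15]})

# Primary key: Employee_Count descending; ties fall back to
# Department ascending (alphabetical).
df1 = df.sort_values(["Employee_Count", "Department"],
                     ascending=[False, True])
print(df1)
```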

New column with in a Pandas Dataframe with respect to duplicates in given column

Hi, I have a dataframe with a column "id" as below:
id
abc
def
ghi
abc
abc
xyz
def
I need a new column "id1" where the number 1 is appended to each id, and the number increments for every duplicate. The output should be like below:
id id1
abc abc1
def def1
ghi ghi1
abc abc2
abc abc3
xyz xyz1
def def2
Can anyone suggest me a solution for this?
Use groupby.cumcount to count occurrences of each id, add 1, and convert to string:
df['id1'] = df['id'] + df.groupby('id').cumcount().add(1).astype(str)
print (df)
id id1
0 abc abc1
1 def def1
2 ghi ghi1
3 abc abc2
4 abc abc3
5 xyz xyz1
6 def def2
Detail:
print (df.groupby('id').cumcount())
0 0
1 0
2 0
3 1
4 2
5 0
6 1
dtype: int64
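Putting the answer together as a runnable sketch:

```python
import pandas as pd

df = pd.DataFrame({"id": ["abc", "def", "ghi", "abc", "abc", "xyz", "def"]})

# cumcount numbers the occurrences of each id starting from 0, so add 1
# and convert to string before concatenating with the id itself.
df["id1"] = df["id"] + df.groupby("id").cumcount().add(1).astype(str)
print(df)
```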

dataframe transformation python

I am new to pandas. I have a dataframe, df, with 3 columns: (date), (name) and (count).
For each day, is there an easy way to create a new dataframe from the original that has one column per unique name from the original (name) column, holding that name's count value in the correct row?
date name count
0 2017-08-07 ABC 12
1 2017-08-08 ABC 5
2 2017-08-08 TTT 6
3 2017-08-09 TAC 5
4 2017-08-09 ABC 10
It should now be
date ABC TTT TAC
0 2017-08-07 12 0 0
1 2017-08-08 5 6 0
3 2017-08-09 10 0 5
# count as numbers (not strings), so the filled zeros stay numeric
df = pd.DataFrame({"date": ["2017-08-07", "2017-08-08", "2017-08-08", "2017-08-09", "2017-08-09"],
                   "name": ["ABC", "ABC", "TTT", "TAC", "ABC"],
                   "count": [12, 5, 6, 5, 10]})
df = df.pivot(index='date', columns='name', values='count').reset_index().fillna(0)
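One caveat: pivot raises a ValueError if any (date, name) pair appears more than once; pivot_table with an aggfunc is the more forgiving variant in that case. A runnable sketch of the answer, casting the filled values back to int so the output matches the expected table:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2017-08-07", "2017-08-08", "2017-08-08",
             "2017-08-09", "2017-08-09"],
    "name": ["ABC", "ABC", "TTT", "TAC", "ABC"],
    "count": [12, 5, 6, 5, 10],
})

# Each unique name becomes a column; missing (date, name) pairs become
# NaN, which fillna(0) replaces before the cast back to int.
out = (df.pivot(index="date", columns="name", values="count")
         .fillna(0)
         .astype(int)
         .reset_index())
out.columns.name = None  # drop the leftover 'name' axis label
print(out)
```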