Python: how to delete columns that contain a specific word or string such as 'No data' - python-3.x

   Rat Rat 1 Rat 2
0  jkl    No   jkl
1  ghj    No  jkhj
2  mmm    No   jkl
3  klj    No    oo
4  kkk    No   jkk
I want to remove the column that contains the string 'No', so the output should have only the columns Rat and Rat 2.
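One possible approach, sketched here under the assumption that the data is already in a pandas DataFrame named df (the sample frame below is rebuilt from the table in the question), is to build a boolean mask of the columns that contain the string and keep the rest. The names has_no, has_substring and result are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Rat": ["jkl", "ghj", "mmm", "klj", "kkk"],
    "Rat 1": ["No", "No", "No", "No", "No"],
    "Rat 2": ["jkl", "jkhj", "jkl", "oo", "jkk"],
})

# Exact match: True for every column where at least one cell equals 'No'.
has_no = df.eq("No").any()

# Substring match (e.g. for 'No data'): check each column's text for the string.
has_substring = df.apply(
    lambda col: col.astype(str).str.contains("No", regex=False).any()
)

# Keep only the columns that do not contain the string.
result = df.loc[:, ~has_no]
print(result)
```

Use the exact-match mask when the whole cell must equal the string, and the substring mask when the string may appear inside longer text.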

Related

How can I replace the column names in a pandas DataFrame

I have an Excel file like this:
Old_name  new_name
xyz       abc
opq       klm
And my DataFrame looks like this:
Id  timestamp   xyz  opq
1   04-10-2021  3    4
2   05-10-2021  4    9
As you can see, the column names are the old names, and I would like to map and replace them with the new names given in that file. How can I do that?
Try with rename:
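# col_names is assumed to be the DataFrame loaded from the mapping file (columns Old_name, new_name)
df.rename(columns=col_names.set_index('Old_name')['new_name'], inplace=True)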
# verify
print(df)
Output:
Id timestamp abc klm
0 1 04-10-2021 3 4
1 2 05-10-2021 4 9

Joining column of different rows in pandas

If I have a DataFrame and want to merge the ID column based on the Name column without deleting any rows, how would I do this?
Example:
Name  ID
John  ABC
John  XYZ
Lucy  MNO
I want to convert the above DataFrame into the one below:
Name  ID
John  ABC, XYZ
John  ABC, XYZ
Lucy  MNO
Use GroupBy.transform with join:
df['ID'] = df.groupby('Name')['ID'].transform(', '.join)
print(df)
Name ID
0 John ABC, XYZ
1 John ABC, XYZ
2 Lucy MNO
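To see why transform fits the "without deleting any row" requirement, the sketch below contrasts it with agg on the same sample data. The column name ID_joined and the variable collapsed are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "John", "Lucy"], "ID": ["ABC", "XYZ", "MNO"]})

# agg returns one joined string per group, so the result has one row per name.
collapsed = df.groupby("Name")["ID"].agg(", ".join)

# transform broadcasts the per-group result back onto every original row.
df["ID_joined"] = df.groupby("Name")["ID"].transform(", ".join)

print(collapsed)
print(df)
```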

How to group rows in pandas and sum a certain column

Given a DataFrame like this:
   A    B               C   D
0  ABC  unique_ident_1  10  ONE
1  KLM  unique_ident_2  2   TEN
2  KLM  unique_ident_2  7   TEN
3  XYZ  unique_ident_3  2   TWO
3  ABC  unique_ident_1  8   ONE
3  XYZ  unique_ident_3  -5  TWO
where column "B" contains a unique text identifier, columns "A" and "D" contain constant text determined by that identifier, and column "C" holds a quantity. I want to group the rows by the unique identifier (column "B") and sum the quantity column per identifier:
   A    B               C   D
0  ABC  unique_ident_1  18  ONE
1  KLM  unique_ident_2  9   TEN
2  XYZ  unique_ident_3  -3  TWO
How can I get this result with pandas?
Use named aggregation with groupby:
df1 = df.groupby('B', as_index=False).agg(
    A=('A', 'first'),
    C=('C', 'sum'),
    D=('D', 'first')
)[df.columns]  # restore the original column order
A B C D
0 ABC unique_ident_1 18 ONE
1 KLM unique_ident_2 9 TEN
2 XYZ unique_ident_3 -3 TWO
You can also build the aggregation dictionary programmatically and then group, in case you have many columns:
agg_d = {col: 'sum' if col == 'C' else 'first' for col in df.columns}
out = df.groupby('B').agg(agg_d).reset_index(drop=True)
print(out)
A B C D
0 ABC unique_ident_1 18 ONE
1 KLM unique_ident_2 9 TEN
2 XYZ unique_ident_3 -3 TWO
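The dictionary approach can be tried end to end with the sketch below, which rebuilds the sample frame from the question. As a small variation that is not in the original answer, it leaves the grouping column "B" out of the dictionary and uses as_index=False so that "B" comes back as a regular column.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "A": ["ABC", "KLM", "KLM", "XYZ", "ABC", "XYZ"],
    "B": ["unique_ident_1", "unique_ident_2", "unique_ident_2",
          "unique_ident_3", "unique_ident_1", "unique_ident_3"],
    "C": [10, 2, 7, 2, 8, -5],
    "D": ["ONE", "TEN", "TEN", "TWO", "ONE", "TWO"],
})

# Sum the quantity column, take the first value of every other non-key column.
agg_d = {col: ("sum" if col == "C" else "first") for col in df.columns if col != "B"}

# as_index=False keeps "B" as a column; indexing with df.columns restores the order.
out = df.groupby("B", as_index=False).agg(agg_d)[df.columns]
print(out)
```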

New column in a pandas DataFrame based on duplicates in a given column

Hi, I have a DataFrame with a column "id" like below:
id
abc
def
ghi
abc
abc
xyz
def
I need a new column "id1" that contains the id with a number appended, starting at 1 and incremented for every duplicate. The output should look like this:
id id1
abc abc1
def def1
ghi ghi1
abc abc2
abc abc3
xyz xyz1
def def2
Can anyone suggest a solution for this?
Use groupby.cumcount to count the occurrences of each id, add 1, and convert to strings:
df['id1'] = df['id'] + df.groupby('id').cumcount().add(1).astype(str)
print(df)
id id1
0 abc abc1
1 def def1
2 ghi ghi1
3 abc abc2
4 abc abc3
5 xyz xyz1
6 def def2
Detail:
print(df.groupby('id').cumcount())
0 0
1 0
2 0
3 1
4 2
5 0
6 1
dtype: int64

Arranging column 2 based on column 1 in Excel

I have two columns like this:
column 1 column 2
abc ddd def ghij
abc ghi jkl dddd
bbc mno qrst wxyz
As you can see, column 2 contains a few letters from column 1. I need to sort column 2 based on the occurrence of these letters, so my columns will look like this:
column 1 column 2
abc ddd def dddd
abc ghi jkl ghij
bbc mno qrst mnoq
How can I sort this?
With a couple of temporary helper columns, say:
in C1: =MID(A1,5,3)
in D1: =MATCH(LEFT(B1,3),C:C)
and both formulae copied down, you can sort B:D in ascending order by column D.
