How to split the names in a dataframe's columns (Python 3)

I have a dataframe:
import pandas as pd
d = {'A-101': ['1','2','4'], 'B-102': ['5','7','8'], 'A-102': ['1','2','9']}
df = pd.DataFrame(data=d)
which gives:
  A-101 B-102 A-102
0     1     5     1
1     2     7     2
2     4     8     9
How can I change the above into the following?
company number '0' '1' '2'
      A    101   1   2   4
      B    102   5   7   8
      A    102   1   2   9
Here I want to split each column name, such as A-101, into two columns (company and number) and transpose the original columns into rows, with the new columns named '0', '1', '2'.

Solution
output = pd.DataFrame(columns=['company', 'number', '0', '1', '2'])
output['company'] = [col.split('-')[0] for col in df.columns]
output['number'] = [col.split('-')[1] for col in df.columns]
output['0'] = df.iloc[0].values
output['1'] = df.iloc[1].values
output['2'] = df.iloc[2].values
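
The solution above hard-codes three rows. A variant that does not depend on the number of rows could transpose the frame and split the old column names in one step. This is only a sketch: the names out, parts and result are made up for illustration, and it assumes every column name contains exactly one '-' and that the column names are unique.

```python
import pandas as pd

d = {'A-101': ['1', '2', '4'], 'B-102': ['5', '7', '8'], 'A-102': ['1', '2', '9']}
df = pd.DataFrame(data=d)

# Transpose: the old column names become the index, the old row labels become columns.
out = df.T
out.columns = out.columns.astype(str)  # name the new columns '0', '1', '2'

# Split the old column names on the first '-' into two new columns.
parts = out.index.to_series().str.split('-', n=1, expand=True)
parts.columns = ['company', 'number']

# Put the pieces side by side (aligned on the old column names) and drop that index.
result = pd.concat([parts, out], axis=1).reset_index(drop=True)
print(result)
```

Because nothing refers to a fixed row position, the same code should work for a frame with any number of rows.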

Related

How to map/replace multiple values in a column for each row in pandas dataframe

I have this sample:
col1   result
1      A
1,2,3
2      B
2,3,4
3,4
4      D
1,3,4
3      C
Here's my map variable.
vals_to_replace = {'1':'A', '2':'B', '3':'C' , '4':'D'}
I map this to col1, but only some rows get a value in the result column. I'm not sure why only the single values got mapped.
Any ideas on how to solve it?
Thanks
Maybe this is what works for you:
import pandas as pd
df = pd.DataFrame({'col1': ['1', '1,2,3', '2', '2,3,4', '3, 4', '4', '1,3,4', '3']})
translation = {'1':'A', '2':'B', '3':'C' , '4':'D'}
df['result'] = df.col1.str.translate(str.maketrans(translation))
print(df)
Result:
col1 result
0 1 A
1 1,2,3 A,B,C
2 2 B
3 2,3,4 B,C,D
4 3, 4 C, D
5 4 D
6 1,3,4 A,C,D
7 3 C
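
Note that str.maketrans with a dict only accepts single-character keys, so the answer above works because every key here is one character long. If the keys could be longer (for example '10'), one possible alternative is to split each cell on commas and look each token up in the dict. This is a sketch, not the original answer's method; map_tokens is a hypothetical helper, and tokens missing from the dict are left unchanged by assumption.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['1', '1,2,3', '2', '2,3,4', '3, 4', '4', '1,3,4', '3']})
vals_to_replace = {'1': 'A', '2': 'B', '3': 'C', '4': 'D'}

def map_tokens(cell):
    """Replace each comma-separated token using vals_to_replace."""
    tokens = [t.strip() for t in cell.split(',')]
    # Tokens without a mapping are kept as they are (an assumption).
    return ','.join(vals_to_replace.get(t, t) for t in tokens)

df['result'] = df['col1'].apply(map_tokens)
print(df)
```

Unlike the translate version, this normalizes '3, 4' to 'C,D' because the tokens are stripped before the lookup.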

How to compare a string in one pandas column with the rest of the columns and, if the value is found in any column of the row, append a new column?

I want to compare the Category column with all the predicted_site columns and, if the value matches any of them, append a column named rank that holds 1 if the value is found and 0 otherwise.
Use DataFrame.filter to select the predicted columns, compare them with the category column using DataFrame.eq, convert the result to integers, rename the columns with DataFrame.add_prefix and finally add the new columns with DataFrame.join:
df = pd.DataFrame({
'category':list('abcabc'),
'B':[4,5,4,5,5,4],
'predicted1':list('adadbd'),
'predicted2':list('cbarac')
})
df1 = df.filter(like='predicted').eq(df['category'], axis=0).astype(int).add_prefix('new_')
df = df.join(df1)
print (df)
category B predicted1 predicted2 new_predicted1 new_predicted2
0 a 4 a c 1 0
1 b 5 d b 0 1
2 c 4 a a 0 0
3 a 5 d r 0 0
4 b 5 b a 1 0
5 c 4 d c 0 1
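The output above has one indicator column per predicted column. If a single rank column is wanted, as the question asks, the same comparison could be collapsed with any(axis=1). A minimal sketch, reusing the sample data from this answer (the name matches is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'category': list('abcabc'),
    'B': [4, 5, 4, 5, 5, 4],
    'predicted1': list('adadbd'),
    'predicted2': list('cbarac'),
})

# Boolean frame: True where a predicted column equals the row's category.
matches = df.filter(like='predicted').eq(df['category'], axis=0)

# rank is 1 if any predicted column matched in that row, else 0.
df['rank'] = matches.any(axis=1).astype(int)
print(df)
```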
This solution is much less elegant than the one proposed by @jezrael, but you can try it.
#sample dataframe
d = {'cat': ['comp-el', 'el', 'comp', 'comp-el', 'el', 'comp'], 'predicted1': ['com', 'al', 'p', 'col', 'el', 'comp'], 'predicted2': ['a', 'el', 'p', 'n', 's', 't']}
df = pd.DataFrame(data=d)
#iterating through rows
for i, row in df.iterrows():
    #assigning values
    cat = df.loc[i,'cat']
    predicted1 = df.loc[i,'predicted1']
    predicted2 = df.loc[i,'predicted2']
    #condition
    if (cat == predicted1 or cat == predicted2):
        df.loc[i,'rank'] = 1
    else:
        df.loc[i,'rank'] = 0
output:
cat predicted1 predicted2 rank
0 comp-el com a 0.0
1 el al el 1.0
2 comp p p 0.0
3 comp-el col n 0.0
4 el el s 1.0
5 comp comp t 1.0

Using Python3.x rename a column name of a dataframe with user provided new column name in console window

How can I rename a column of my DataFrame to a new name that the user provides in the console?
Say you have a dataframe like this
>>> df
A B C
0 1 9 1
1 4 7 2
2 3 7 6
3 6 1 6
4 8 1 9
And you want to rename the column B to that given by user:
>>> col_name = input("Enter Column Name: ")
Enter Column Name: test
>>> col_names = df.columns.tolist()
>>> col_names
['A', 'B', 'C']
>>> col_names[1] = col_name
>>> col_names
['A', 'test', 'C']
>>> df.columns = col_names
>>> df
A test C
0 1 9 1
1 4 7 2
2 3 7 6
3 6 1 6
4 8 1 9
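
A shorter route to the same result might be DataFrame.rename with a one-entry mapping, which avoids rebuilding the whole list of column names. In this sketch new_name is a fixed stand-in for the value that input() would return, so the snippet runs without a console.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 3, 6, 8],
                   'B': [9, 7, 7, 1, 1],
                   'C': [1, 2, 6, 6, 9]})

# Stand-in for: new_name = input("Enter Column Name: ")
new_name = 'test'

# Rename only column B; all other columns keep their names.
df = df.rename(columns={'B': new_name})
print(df)
```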

Pandas dataframe merge by function on column names

I have two dataframes.
df_A has columns A__a, B__b, C. (shape 5,3)
df_B has columns A_a, B_b, D. (shape 4,3)
How can I unify them (without having to iterate over all columns) to get one df with columns A, B (shape 9,2)? That is, A__a and A_a should be unified into the same column.
I need to use merge and apply the function lambda x: x.replace("_",""). Is it possible?
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,5,size=(5, 3)), columns=['A__a', 'B__b', 'C'])
df:
A__a B__b C
0 3 0 2
1 0 3 4
2 0 4 4
3 4 2 1
4 3 4 3
df2:
df2 = pd.DataFrame(np.random.randint(0,4,size=(4, 3)), columns=['A__a', 'B__b', 'D'])
A__a B__b D
0 3 2 0
1 3 1 1
2 0 2 0
3 3 2 0
df3 = pd.concat([df, df2], join='inner', ignore_index=True)
df_final = df3.rename(lambda x: str(x).split("__")[0],axis='columns')
df_final
df_final:
A B
0 3 0
1 0 3
2 0 4
3 4 2
4 3 4
5 3 2
6 3 1
7 0 2
8 3 2
A simple concatenation will do:
pd.concat([df_A, df_B], join='outer')[['A', 'B']].copy()
or
pd.concat([df_A, df_B], join='inner')
You have to merge the DataFrames using how='outer':
import pandas as pd
import numpy as np
df_A = pd.DataFrame(np.random.randint(10,size=(5,3)), columns=['A','B','C'])
df_B = pd.DataFrame(np.random.randint(10,size=(4,3)), columns=['A','B','D'])
print(df_A.shape,df_B.shape)
#(5, 3) (4, 3)
new_df = df_A.merge(df_B , how= 'outer', on = ['A','B'])[['A','B']]
print(new_df.shape)
#(9,2), provided no (A, B) pair occurs in both frames
If you can't change the column names in advance and you want to use lambda x: x.replace("_",""), this is a way (it uses rename, because rename_axis in current pandas sets the axis name and does not relabel the columns):
df = pd.concat([df1.rename(columns=lambda x: str(x).replace("_","")), df2.rename(columns=lambda x: str(x).replace("_",""))], join='inner', ignore_index=True)
Example:
d1 = {'A__a' : ('A', 'B', 'C', 'D', 'E') , 'B__b' : ('a', 'b', 'c', 'd', 'e') ,'C': (1,2,3,4,5)}
df1 = pd.DataFrame(d1)
A__a B__b C
0 A a 1
1 B b 2
2 C c 3
3 D d 4
4 E e 5
d2 = {'A_a' : ('B', 'C', 'D','G') , 'B_b' : ('l','m','n','o') ,'D': (6,7,8,9)}
df2=pd.DataFrame(d2)
A_a B_b D
0 B l 6
1 C m 7
2 D n 8
3 G o 9
Output:
Aa Bb
0 A a
1 B b
2 C c
3 D d
4 E e
5 B l
6 C m
7 D n
8 G o
Alternative with:
df = pd.concat([df1.rename(columns={'A__a':'A', 'B__b':'B'}), df2.rename(columns={'A_a':'A', 'B_b':'B'})], join='inner', ignore_index=True)
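
For reference, the pieces of this answer could be put together into one runnable snippet along these lines. It is a sketch using the example data above; strip_underscores, frames and unified are names invented here, and the shared columns come out as Aa and Bb because only the underscores are removed.

```python
import pandas as pd

df1 = pd.DataFrame({'A__a': list('ABCDE'), 'B__b': list('abcde'), 'C': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A_a': list('BCDG'), 'B_b': list('lmno'), 'D': [6, 7, 8, 9]})

def strip_underscores(name):
    """Normalize a column name so that A__a and A_a both become Aa."""
    return str(name).replace('_', '')

# Rename the columns of each frame with the same function, then stack the rows.
frames = [frame.rename(columns=strip_underscores) for frame in (df1, df2)]

# join='inner' keeps only the columns present in both frames (Aa and Bb).
unified = pd.concat(frames, join='inner', ignore_index=True)
print(unified.shape)
```

Using concat instead of merge stacks the rows without matching on values, which is what the requested (9,2) shape implies.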

How to sum columns in pandas and add the result into a new row?

In this code I want to sum each column and add it as a new row.
It does the sum but it does not show the new row.
df = pd.DataFrame(g, columns=('AWA', 'REM', 'S1', 'S2'))
df['xSujeto'] = df.sum(axis=1)
xEstado = df.sum(axis=0)
df.append(xEstado, ignore_index=True)
df
I think you can use loc:
df = pd.DataFrame({'AWA':[1,2,3],
'REM':[4,5,6],
'S1':[7,8,9],
'S2':[1,3,5]})
#add 1 to last index value
print (df.index[-1] + 1)
3
df.loc[df.index[-1] + 1] = df.sum()
print (df)
AWA REM S1 S2
0 1 4 7 1
1 2 5 8 3
2 3 6 9 5
3 6 15 24 9
Or use append, as suggested in a comment by Nickil Maveli:
xEstado = df.sum()
df = df.append(xEstado, ignore_index=True)
print (df)
AWA REM S1 S2
0 1 4 7 1
1 2 5 8 3
2 3 6 9 5
3 6 15 24 9
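DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append version above only runs on older releases. On newer versions the same idea could be written with pd.concat. A minimal sketch using the sample data from this answer (totals is a name chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'AWA': [1, 2, 3],
                   'REM': [4, 5, 6],
                   'S1': [7, 8, 9],
                   'S2': [1, 3, 5]})

# Column sums as a one-row DataFrame with the same column names.
totals = df.sum().to_frame().T

# Stack the totals row under the data; concat returns a new frame.
df = pd.concat([df, totals], ignore_index=True)
print(df)
```

As with append, the result has to be assigned back to df, which is the step missing from the code in the question.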
