Add columns - specify name - row value based on other - python-3.x

I have a DataFrame like this:
import pandas as pd
data = {'TYPE':['X', 'Y', 'Z'],'A': [11,12,13], 'B':[21,22,23], 'C':[31,32,34]}
df = pd.DataFrame(data)
TYPE A B C
0 X 11 21 31
1 Y 12 22 32
2 Z 13 23 34
I would like to get the following DataFrame:
TYPE A A_added B B_added C C_added
0 X 11 15 21 25 31 35
1 Y 12 18 22 28 32 38
2 Z 13 20 23 30 34 41
For each column after the TYPE column (here A, B, C):
add a new column named <column_name>_added
if TYPE = X add 4, if TYPE = Y add 6, if TYPE = Z add 7

The idea is to add values from a helper Series, created by Series.map with a dictionary, using DataFrame.add with axis=0; then add the new columns to the original with DataFrame.join, and finally change the order of the columns with DataFrame.reindex:
d = {'X':4,'Y':6, 'Z':7}
cols = df.columns[:1].tolist() + [i for x in df.columns[1:] for i in (x, x + '_added')]
df1 = df.iloc[:, 1:].add(df['TYPE'].map(d), axis=0).add_suffix('_added')
df2 = df.join(df1).reindex(cols, axis=1)
print (df2)
TYPE A A_added B B_added C C_added
0 X 11 15 21 25 31 35
1 Y 12 18 22 28 32 38
2 Z 13 20 23 30 34 41
EDIT: Values not matched by the dictionary become missing values (NaN), so adding Series.fillna(7) returns 7 for all other values:
d = {'X':4,'Y':6}
cols = df.columns[:1].tolist() + [i for x in df.columns[1:] for i in (x, x + '_added')]
df1 = df.iloc[:, 1:].add(df['TYPE'].map(d).fillna(7).astype(int), axis=0).add_suffix('_added')
df2 = df.join(df1).reindex(cols, axis=1)
print (df2)
TYPE A A_added B B_added C C_added
0 X 11 15 21 25 31 35
1 Y 12 18 22 28 32 38
2 Z 13 20 23 30 34 41
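As an alternative to join plus reindex, here is a minimal sketch that uses DataFrame.insert to put each *_added column directly after its source column, so no reordering step is needed. It assumes every TYPE value is a key of d (otherwise combine it with the fillna idea from the EDIT).

```python
import pandas as pd

data = {'TYPE': ['X', 'Y', 'Z'], 'A': [11, 12, 13], 'B': [21, 22, 23], 'C': [31, 32, 34]}
df = pd.DataFrame(data)

d = {'X': 4, 'Y': 6, 'Z': 7}
offset = df['TYPE'].map(d)  # per-row amount to add, aligned on the index

# Snapshot the column names first, because df.insert changes df.columns.
for col in df.columns[1:].tolist():
    # place <col>_added immediately to the right of <col>
    df.insert(df.columns.get_loc(col) + 1, col + '_added', df[col] + offset)

print(df)
```

Because each insert happens next to its source column, the final order already matches the requested layout.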

Related

Creating a list from series of pandas

I'm trying to create a list from 3 different Series, where each element has the form "({A} {B} {C})": A is an element of series 1, B is the element at the same position in series 2, and C is the element at the same position in series 3, starting from the first element. The result should be a list containing 600 elements.
List 1 List 2 List 3
u_p0 1 v_p0 2 w_p0 7
u_p1 21 v_p1 11 w_p1 45
u_p2 32 v_p2 25 w_p2 32
u_p3 45 v_p3 76 w_p3 49
... .... ....
u_p599 56 v_p599 78 w_p599 98
Now I want the output list as follows
(1 2 7)
(21 11 45)
(32 25 32)
(45 76 49)
.....
These are the 3 series I created from a dataframe
r1=turb_1.iloc[qw1] #List1
r2=turb_1.iloc[qw2] #List2
r3=turb_1.iloc[qw3] #List3
For the output, I think Python's str.format method will be useful, but I'm not quite sure how to proceed.
turb_3= ["({A} {B} {C})".format(A=i,B=j,C=k) for i in r1 for j in r2 for k in r3]
Any kind of help will be useful.
Use pandas.DataFrame.itertuples with str.format:
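import pandas as pd
# Sample data
df = pd.DataFrame({'col1': [1, 21, 32, 45], 'col2': [2, 11, 25, 76], 'col3': [7, 45, 32, 49]})
print(df)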
col1 col2 col3
0 1 2 7
1 21 11 45
2 32 25 32
3 45 76 49
fmt = "({} {} {})"
[fmt.format(*tup) for tup in df[["col1", "col2", "col3"]].itertuples(index=False, name=None)]
Output:
['(1 2 7)', '(21 11 45)', '(32 25 32)', '(45 76 49)']
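The attempt in the question uses three nested for clauses, which produce every combination of values (a Cartesian product) rather than one string per position. A minimal sketch that works directly on the three Series pairs them positionally with zip; the short r1, r2, r3 below are hypothetical stand-ins built from the first four rows of the question's data.

```python
import pandas as pd

# Hypothetical stand-ins for the question's r1, r2, r3 (first four values only).
r1 = pd.Series([1, 21, 32, 45], index=['u_p0', 'u_p1', 'u_p2', 'u_p3'])
r2 = pd.Series([2, 11, 25, 76], index=['v_p0', 'v_p1', 'v_p2', 'v_p3'])
r3 = pd.Series([7, 45, 32, 49], index=['w_p0', 'w_p1', 'w_p2', 'w_p3'])

# zip pairs the values by position (index labels are ignored), giving
# len(r1) strings instead of len(r1) * len(r2) * len(r3).
turb_3 = ["({} {} {})".format(a, b, c) for a, b, c in zip(r1, r2, r3)]
print(turb_3)
```

Since zip ignores the labels, this relies on the three Series already being in matching order.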

Replace values in Columns

I want to replace values in columns using a loop with if conditions:
If the value in column D is not the same as any of the values in columns A, B, C, then replace the first NaN in that row with the value of D; if there is no NaN in the row, create a new column E and put the value from column D in column E.
ID A B C D
0 22 32 NaN 22
1 25 13 NaN 15
2 27 NaN NaN 20
3 29 10 16 29
4 12 92 33 55
I want output to be:
ID A B C D E
0 22 32 NaN 22
1 25 13 15 15
2 27 20 NaN 20
3 29 10 16 29
4 12 92 33 55 55
My code example:
List = [[22, 32, None, 22],
        [25, 13, None, 15],
        [27, None, None, 20],
        [29, 10, 16, 29],
        [12, 92, 33, 55]]
for Row in List:
    Target_C = Row[3]
    if Row.count(Target_C) < 2:  # skip rows where D already appears in A, B or C
        None_Found = False  # flag to check later whether there was a None
        for enumerate_Column in enumerate(Row):  # (index, value) for each column
            if None in enumerate_Column:  # there is a None in the row
                Row[enumerate_Column[0]] = Target_C  # replace the None with the value of column D
                None_Found = True
            if None_Found:  # stop after the first replacement
                break
        if not None_Found:  # no None found: add a new column E
            Row.append(Target_C)
You can do it this way:
a = df.isnull()
# for each row that contains a NaN, the label of its first NaN column
b = a[a.any(axis=1)].idxmax(axis=1)
nanindex = b.index
# True where D differs from all of A, B and C
check = (df.A != df.D) & (df.B != df.D) & (df.C != df.D)
commonind = check[~check].index
# rows with a NaN where D is not already present: fill the first NaN
replace_ind_list = list(nanindex.difference(commonind))
# rows without a NaN where D is not already present: copy D into E
new_col_list = df.index.difference(list(set(commonind.tolist() + nanindex.tolist()))).tolist()
df['E'] = ''
for index, row in df.iterrows():
    for val in new_col_list:
        if index == val:
            df.at[index, 'E'] = df['D'][index]
    for val in replace_ind_list:
        if index == val:
            df.at[index, b[val]] = df['D'][index]
df
Output
ID A B C D E
0 0 22 32.0 NaN 22
1 1 25 13.0 15.0 15
2 2 27 20.0 NaN 20
3 3 29 10.0 16.0 29
4 4 12 92.0 33.0 55 55
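A more vectorized sketch of the same rules, assuming the sample data from the question with ID as an ordinary column: boolean masks select the rows, idxmax finds the first NaN column, and the only loop is over the three columns rather than over rows. E is filled with NaN instead of an empty string here, which is a design choice rather than part of the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question (ID kept as an ordinary column).
df = pd.DataFrame({'ID': range(5),
                   'A': [22, 25, 27, 29, 12],
                   'B': [32, 13, np.nan, 10, 92],
                   'C': [np.nan, np.nan, np.nan, 16, 33],
                   'D': [22, 15, 20, 29, 55]})

cols = ['A', 'B', 'C']
# True where D does not already appear in A, B or C (NaN never compares equal)
need = ~df[cols].eq(df['D'], axis=0).any(axis=1)
# True where the row has at least one NaN in A, B, C
has_nan = df[cols].isna().any(axis=1)

# For rows that need a replacement, the label of their first NaN column.
first_nan = df.loc[need & has_nan, cols].isna().idxmax(axis=1)
for col in cols:
    rows = first_nan[first_nan == col].index
    df.loc[rows, col] = df.loc[rows, 'D']

# Rows without any NaN: copy D into the new column E.
df['E'] = np.nan
no_room = need & ~has_nan
df.loc[no_room, 'E'] = df.loc[no_room, 'D']
print(df)
```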

Pandas - Fill N rows for a specific column with an integer value and increment the integer thereafter

I have a DataFrame to which I added a column, say col_1. I want to fill that column with integer values, starting at 1 in the first row and incrementing after every 4th row, so the resulting column should look like this:
col_1
1
1
1
1
2
2
2
2
The current approach I have is a very brute-force one:
for x in range(len(df)):
    if x <= 3:
        df.loc[x, 'col_1'] = 1
    if x > 3 and x <= 7:
        df.loc[x, 'col_1'] = 2
This might work for something small but when moving to something larger it will chew up a lot of time.
If there is a default RangeIndex, you can use integer division and add 1:
df['col_1'] = df.index // 4 + 1
Or, for a general solution, use a helper array with the length of the DataFrame:
df['col_1'] = np.arange(len(df)) // 4 + 1
To repeat a 1, 2 pattern, also apply modulo 2:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':range(20, 40)})
df['col_1'] = (np.arange(len(df)) // 4) % 2 + 1
print (df)
a col_1
0 20 1
1 21 1
2 22 1
3 23 1
4 24 2
5 25 2
6 26 2
7 27 2
8 28 1
9 29 1
10 30 1
11 31 1
12 32 2
13 33 2
14 34 2
15 35 2
16 36 1
17 37 1
18 38 1
19 39 1
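If the block size should be a parameter, the arange idea can be wrapped in a small helper. block_counter below is a hypothetical name, and the string index is only there to show that this version does not depend on a RangeIndex.

```python
import numpy as np
import pandas as pd

def block_counter(length, n):
    """Hypothetical helper: 1 repeated n times, then 2 repeated n times, and so on."""
    return np.arange(length) // n + 1

# Non-default index, so df.index // 4 would not work here.
df = pd.DataFrame({'a': range(10)}, index=list('abcdefghij'))
df['col_1'] = block_counter(len(df), 4)
print(df)
```

The last block is simply shorter when len(df) is not a multiple of n (here the final value 3 appears twice).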

Remove index from dataframe using Python

I am trying to create a Pandas Dataframe from a string using the following code -
import pandas as pd
input_string="""A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""
data = input_string
df = pd.DataFrame([x.split(';') for x in data.split('\n')])
print(df)
I am getting the following result -
0 1 2
0 A B C
1 0 34 88
2 2 45 200
3 3 47 65
4 4 32 140
5 None None
But I need something like the following -
A B C
0 34 88
2 45 200
3 47 65
4 32 140
I added "index = False" while creating the dataframe like -
df = pd.DataFrame([x.split(';') for x in data.split('\n')],index = False)
But, it gives me an error -
TypeError: Index(...) must be called with a collection of some kind, False was passed
How is this achievable?
Use read_csv with StringIO and the index_col parameter to set the first column as the index:
from io import StringIO
import pandas as pd
input_string="""A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""
df = pd.read_csv(StringIO(input_string), sep=';', index_col=0)
print (df)
B C
A
0 34 88
2 45 200
3 47 65
4 32 140
Your solution can be changed: split with the default separator (arbitrary whitespace), pass all lists except the first to DataFrame with the first list as the columns parameter and, if you need the first column as the index, add DataFrame.set_index:
L = [x.split(';') for x in input_string.split()]
df = pd.DataFrame(L[1:], columns=L[0]).set_index('A')
print (df)
B C
A
0 34 88
2 45 200
3 47 65
4 32 140
For a general solution, use the first value of the first list in set_index:
L = [x.split(';') for x in input_string.split()]
df = pd.DataFrame(L[1:], columns=L[0]).set_index(L[0][0])
EDIT:
You can set the columns' name to the A value instead of the index name:
df = df.rename_axis(df.index.name, axis=1).rename_axis(None)
print (df)
A B C
0 34 88
2 45 200
3 47 65
4 32 140
import pandas as pd
input_string="""A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""
data = input_string
df = pd.DataFrame([x.split(';') for x in data.split()])
df.columns = df.iloc[0]
df = df.iloc[1:].rename_axis(None, axis=1)
df.set_index('A',inplace = True)
df
output
B C
A
0 34 88
2 45 200
3 47 65
4 32 140
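The str.split answers above keep every value as a string (for example '34', not 34), unlike read_csv, which infers numeric types. A minimal sketch that converts the columns with pd.to_numeric before setting the index, assuming all values in the string are numeric:

```python
import pandas as pd

input_string = """A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""

rows = [line.split(';') for line in input_string.split()]
df = pd.DataFrame(rows[1:], columns=rows[0])
# str.split only produces strings, so convert every column explicitly
df = df.apply(pd.to_numeric).set_index('A')
print(df)
```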

how to add a new column in dataframe which divides multiple columns and finds the maximum value

This may be a really simple question, but I am new to Python 3. I have a DataFrame with multiple columns, and I would like to add a new column to it that does the following calculation:
New Column = Max((Column A/Column B), (Column C/Column D), (Column E/Column F))
I can take the max with the following code, but I want to know how to do the division along with it:
df['Max'] = df[['Column A','Column B','Column C', 'Column D', 'Column E', 'Column F']].max(axis=1)
Column A Column B Column C Column D Column E Column F Max
3600 36000 22 11 3200 3200 36000
2300 2300 13 26 1100 1200 2300
1300 13000 15 33 1000 1000 13000
Thanks
You can divide the df by itself by slicing the columns in steps of 2 (numerators and denominators) and then take the max:
In [105]:
df['Max'] = df.iloc[:, ::2].div(df.iloc[:, 1::2].values, axis=1).max(axis=1)
df
Out[105]:
Column A Column B Column C Column D Column E Column F Max
0 3600 36000 22 11 3200 3200 2.0
1 2300 2300 13 26 1100 1200 1.0
2 1300 13000 15 33 1000 1000 1.0
Here are the intermediate values:
In [108]:
df.iloc[:, :6:2].div(df.iloc[:, 1:6:2].values)  # first six columns only, so the Max column added above is excluded
Out[108]:
Column A Column C Column E
0 0.1 2.000000 1.000000
1 1.0 0.500000 0.916667
2 0.1 0.454545 1.000000
You can try something like the following:
df['Max'] = df.apply(lambda v: max(v['A'] / v['B'].astype(float), v['C'] / v['D'].astype(float), v['E'] / v['F'].astype(float)), axis=1)
Example
In [14]: df
Out[14]:
A B C D E F
0 1 11 1 11 12 98
1 2 22 2 22 67 1
2 3 33 3 33 23 4
3 4 44 4 44 11 10
In [15]: df['Max'] = df.apply(lambda v: max(v['A'] / v['B'].astype(float), v['C'] /
v['D'].astype(float), v['E'] / v['F'].astype(float)), axis=1)
In [16]: df
Out[16]:
A B C D E F Max
0 1 11 1 11 12 98 0.122449
1 2 22 2 22 67 1 67.000000
2 3 33 3 33 23 4 5.750000
3 4 44 4 44 11 10 1.100000
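If the column pairs are not adjacent, or the frame contains other columns, a sketch that names the pairs explicitly avoids relying on column positions. The pairs list below is an assumption read off the formula in the question, and the data is the question's sample table:

```python
import pandas as pd

df = pd.DataFrame({'Column A': [3600, 2300, 1300], 'Column B': [36000, 2300, 13000],
                   'Column C': [22, 13, 15], 'Column D': [11, 26, 33],
                   'Column E': [3200, 1100, 1000], 'Column F': [3200, 1200, 1000]})

# (numerator, denominator) pairs taken from the formula in the question
pairs = [('Column A', 'Column B'), ('Column C', 'Column D'), ('Column E', 'Column F')]

# One ratio Series per pair, placed side by side (dict keys become column names),
# then the row-wise maximum.
ratios = pd.concat({num: df[num] / df[den] for num, den in pairs}, axis=1)
df['Max'] = ratios.max(axis=1)
print(df)
```

Unlike apply with a lambda, this runs one vectorized division per pair instead of one Python call per row.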
