python3 modifying rows in a dataframe based on a condition


I have a dataframe like this:
A B C
1 4 x
2 8 y
3 7 z
4 12 y
5 10 b
I need to modify column B based on conditions like these:
if B <= 5 then B = 1
if B > 5 and B <= 10 then B = 2
if B > 10 and B < 15 then B = 3
so that my dataframe becomes:
A B C
1 1 x
2 2 y
3 2 z
4 3 y
5 2 b
I am okay with adding a new column first and then dropping column B. Could anyone help, please?

You can use the apply function to implement this:
def check(row):
    # Return the bucket number for this row's value of column B
    if row['B'] <= 5:
        return 1
    elif row['B'] <= 10:   # here 5 < B <= 10
        return 2
    elif row['B'] < 15:    # here 10 < B < 15
        return 3
    # B >= 15 matches no rule, so the function returns None
Passing axis=1 makes apply call the function once for each row, so each check sees that row's value of B:
df['B'] = df.apply(check, axis=1)
The resulting DataFrame looks like this:
A B C
1 1 x
2 2 y
3 2 z
4 3 y
5 2 b
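For larger frames, a vectorized alternative avoids calling a Python function once per row. The sketch below is not part of the original answer: it rebuilds the question's sample data and uses numpy.select, which takes a list of boolean conditions and a matching list of choices and picks, for each row, the choice of the first condition that is True. The default=0 for values that match no rule (B of 15 or more) is an arbitrary placeholder.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [4, 8, 7, 12, 10],
                   "C": ["x", "y", "z", "y", "b"]})

# One boolean mask per rule, in the same order as the question
conditions = [
    df["B"] <= 5,
    (df["B"] > 5) & (df["B"] <= 10),
    (df["B"] > 10) & (df["B"] < 15),
]
choices = [1, 2, 3]

# Each row gets the choice of its first True condition;
# rows matching none of them get `default` (0 is an arbitrary placeholder)
df["B"] = np.select(conditions, choices, default=0)
print(df)
```

This overwrites B in place, so there is no need to add a new column and drop the old one.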

Related

How to fill a matrix with equal sum of rows and columns?

I have an N x N matrix with integer elements.
There are two inputs: n and k.
The solution has to satisfy two conditions:
1- The sum of every row and every column of the matrix should be equal to k.
2- The difference between the maximum and minimum numbers in the matrix should be as small as possible.
I wrote some code in Python, but it doesn't work well:
n , k = map(int,input().split())
matrix = [[k//n]*n for i in range(n)]
def row_sum(matrix,row):
    return sum(matrix[row])
def col_sum(matrix,col):
    res = 0
    for i in matrix:
        res += i[col]
    return res
for i in range(n):
    for j in range(n):
        if (row_sum(matrix,i) != k) and (col_sum(matrix, j) != k):
            matrix[i][j] += 1
for i in matrix:
    print(*i)
For example, for a 5x5 matrix whose rows and columns should each sum to 6:
input : 5 6
output :
2 1 1 1 1
1 2 1 1 1
1 1 2 1 1
1 1 1 2 1
1 1 1 1 2
But it doesn't work for every input. Here the last row and the last column sum to 7 instead of 11:
input : 6 11
output:
2 2 2 2 2 1
2 2 2 2 2 1
2 2 2 2 2 1
2 2 2 2 2 1
2 2 2 2 2 1
1 1 1 1 1 2
I have spent a lot of time on this and can't solve it. Please help!
(This is not homework; it's a problem from an algorithm contest, and the contest is over.)

The solution is to work out the first row (using the code you already have), and then set each row to be the row above it rotated one position to the left.
For example, if the first row has the values
a b c d e
then rotating by one position on each row gives
a b c d e
b c d e a
c d e a b
d e a b c
e a b c d
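Since each value lands in each column exactly once, every column contains one copy of each value of the first row and so adds up to the same total. And since every row holds the same values, just moved around, all the rows add up to that total too. The first row itself is k // n in every cell plus 1 in the first k % n cells, so it sums to k and its values differ by at most 1, which keeps the difference between the maximum and minimum as small as possible.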
Code:
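n, k = map(int, input().split())
# Start with every cell at k // n
matrix = [[k // n] * n for i in range(n)]

def row_sum(matrix, row):
    return sum(matrix[row])

# Add 1 to the first cells of row 0 until the row sums to k
for j in range(n):
    if row_sum(matrix, 0) != k:
        matrix[0][j] += 1

# Each following row is the previous row rotated one position to the left
for i in range(1, n):
    for j in range(n):
        matrix[i][j] = matrix[i-1][(j+1) % n]

for i in matrix:
    print(*i)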
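To check the rotation argument without typing input, the same construction can be wrapped in a function and tested. This is a sketch: build_matrix and sums_ok are hypothetical helper names, and the first row is built with divmod instead of the incrementing loop, which produces the same row (k // n in every cell, plus 1 in the first k % n cells).

```python
def build_matrix(n, k):
    """Build an n x n matrix whose rows and columns all sum to k."""
    q, r = divmod(k, n)
    # First row: r cells of q + 1 followed by n - r cells of q, so it sums to k
    first = [q + 1] * r + [q] * (n - r)
    # Row i is the first row rotated left by i positions
    return [first[i:] + first[:i] for i in range(n)]


def sums_ok(matrix, k):
    """Return True if every row and every column of matrix sums to k."""
    rows_ok = all(sum(row) == k for row in matrix)
    cols_ok = all(sum(col) == k for col in zip(*matrix))
    return rows_ok and cols_ok


m = build_matrix(6, 11)
for row in m:
    print(*row)
print(sums_ok(m, 11))
```

With n = 6 and k = 11 this prints six rotations of 2 2 2 2 2 1, and the last line prints True.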

Do I use a loop, df.melt or df.explode to achieve a flattened dataframe?

Can anyone help with some code that will achieve the following transformation? I have tried variations of df.melt, df.explode, and a loop, but I only get errors. I think it might need nesting, but I don't have the experience to do that.
index A B C D
0 X d 4 2
1 Y b 5 2
where column D is the number of times each row (and its value in column C) should appear.
The desired output is:
index A B C
0 X d 4
1 X d 4
2 Y b 5
3 Y b 5

If you want to repeat rows, why not use index.repeat?
import pandas as pd

# Recreate the sample dataframe from the question
df = pd.DataFrame({"A": ["X", "Y"], "B": ["d", "b"], "C": [4, 5], "D": [2, 2]})
# Repeat each index label D times, select those rows, then drop D and renumber
df = df.reindex(df.index.repeat(df["D"])).drop(columns="D").reset_index(drop=True)
print(df)
Sample output:
A B C
0 X d 4
1 X d 4
2 Y b 5
3 Y b 5
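Since the question asks about loops, here is a plain-loop version for comparison. It is a sketch that rebuilds the question's sample data; on large frames it is usually slower than index.repeat because every output row is built in Python.

```python
import pandas as pd

df = pd.DataFrame({"A": ["X", "Y"], "B": ["d", "b"], "C": [4, 5], "D": [2, 2]})

# Loop version: copy each row's A, B, C values D times into a list of tuples
rows = []
for a, b, c, d in df[["A", "B", "C", "D"]].itertuples(index=False):
    rows.extend([(a, b, c)] * int(d))

flat = pd.DataFrame(rows, columns=["A", "B", "C"])
print(flat)
```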

pandas transform one row into multiple rows

I have a dataframe like this:
ID list
1 a, b, c
2 a, s
3 NA
5 f, j, l
I need to split each item in the list column (a string) into its own row, like this:
ID item
1 a
1 b
1 c
2 a
2 s
3 NA
5 f
5 j
5 l
Thanks.

Use str.split to separate the items, then explode:
print(df.assign(list=df["list"].str.split(", ")).explode("list"))
ID list
0 1 a
0 1 b
0 1 c
1 2 a
1 2 s
2 3 NaN
3 5 f
3 5 j
3 5 l
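A self-contained version of the explode answer, assuming the missing entry (shown as NA in the question) is a real NaN. It also names the new column item and renumbers the rows, to match the desired output; both of those steps are optional.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the missing entry is assumed to be NaN
df = pd.DataFrame({"ID": [1, 2, 3, 5],
                   "list": ["a, b, c", "a, s", np.nan, "f, j, l"]})

out = (df.assign(item=df["list"].str.split(", "))  # strings -> lists; NaN stays NaN
         .drop(columns="list")
         .explode("item")                           # one row per list element
         .reset_index(drop=True))                   # renumber rows 0..n-1
print(out)
```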

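A beginner's approach: another way of doing the same thing, using pd.DataFrame.stack:
import pandas as pd

# Split each string into a Python list (NaN becomes the string 'nan')
df['list'] = df['list'].map(lambda x: str(x).split(', '))
# One column per list position; shorter lists are padded with None
dfOut = pd.DataFrame(df['list'].values.tolist())
dfOut.index = df['ID']
# Move the columns into rows and drop the None padding
dfOut = dfOut.stack().dropna().reset_index()
# level_1 holds the original list position, which is no longer needed
del dfOut['level_1']
dfOut.rename(columns={0: 'list'}, inplace=True)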
Output:
ID list
0 1 a
1 1 b
2 1 c
3 2 a
4 2 s
5 3 nan
6 5 f
7 5 j
8 5 l

Column label of max in pandas

I am trying to extract the maximum value in each row, and the label of the column it comes from, from a pandas DataFrame. For example:
A B C D
index
x 0 1 2 3
y 3 2 1 0
I expect the following output:
A B C D Maxv Con
index
x 0 1 2 3 3 D
y 3 2 1 0 3 A
I tried the following:
df['Maxv'] = df.apply(max,axis=1)
df['Con'] = df.idxmax(axis='rows')
It filled the Maxv column correctly, but the Con column contains only NaN. What is the error here?
Thanks in advance.
AP

You need axis='columns' (or axis=1) in DataFrame.idxmax:
df['Con'] = df.idxmax(axis='columns')
print (df)
A B C D Maxv Con
index
x 0 1 2 3 3 D
y 3 2 1 0 3 A
Or:
df['Con'] = df.idxmax(axis=1)
print (df)
A B C D Maxv Con
index
x 0 1 2 3 3 D
y 3 2 1 0 3 A
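You get NaNs because axis='rows' returns a Series indexed by the column labels A to D, which does not align with the row index (x, y) when it is assigned as a new column: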
print (df.idxmax(axis='rows'))
A y
B y
C x
D x
dtype: object
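A self-contained sketch that rebuilds the example frame. It computes both new columns from the original value columns only: otherwise Maxv, which is added first, is included in the idxmax comparison, and the right label comes back only because idxmax returns the first occurrence of the maximum. It also uses DataFrame.max(axis=1) in place of apply(max, axis=1).

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 3], "B": [1, 2], "C": [2, 1], "D": [3, 0]},
                  index=pd.Index(["x", "y"], name="index"))

# Work on the original value columns only, so new columns are never compared
values = df[["A", "B", "C", "D"]]
df["Maxv"] = values.max(axis=1)      # row-wise maximum
df["Con"] = values.idxmax(axis=1)    # label of the column holding that maximum
print(df)
```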

Pandas Pivot Table Slice Off Level 0 of Index

Given the following data frame and pivot table:
import pandas as pd

df = pd.DataFrame({'A': ['a','a','a','a','a','b','b','b','b'],
                   'B': ['x','y','z','x','y','z','x','y','z'],
                   'C': ['a','b','a','b','a','b','a','b','a'],
                   'D': [7,5,3,4,1,6,5,3,1]})
table = pd.pivot_table(df, index=['A', 'B', 'C'], aggfunc='sum')
table
       D
A B C
a x a  7
    b  4
  y a  1
    b  5
  z a  3
b x a  5
  y b  3
  z a  1
    b  6
I want the pivot table exactly as it is, minus index level 0, like this:
     D
B C
x a  7
  b  4
y a  1
  b  5
z a  3
x a  5
y b  3
z a  1
  b  6
Thanks in advance!

You can selectively drop an index level using reset_index with the parameter drop=True:
table.reset_index('A', drop=True)
Output:
     D
B C
x a  7
  b  4
y a  1
  b  5
z a  3
x a  5
y b  3
z a  1
  b  6

You can use droplevel on the index:
table.index = table.index.droplevel(0)
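A self-contained sketch rebuilding the question's data. Instead of assigning to table.index, it uses DataFrame.droplevel (available in recent pandas versions), which returns a new frame with the given index level removed and leaves the original table untouched.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "a", "a", "a", "b", "b", "b", "b"],
                   "B": ["x", "y", "z", "x", "y", "z", "x", "y", "z"],
                   "C": ["a", "b", "a", "b", "a", "b", "a", "b", "a"],
                   "D": [7, 5, 3, 4, 1, 6, 5, 3, 1]})
table = pd.pivot_table(df, index=["A", "B", "C"], aggfunc="sum")

# Drop index level 0 ("A"); the remaining levels B and C keep their order
trimmed = table.droplevel(0)
print(trimmed)
```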
