Check each value in a DataFrame and replace it with a given value if it is below a threshold - python-3.x

cateory Percentage
AB 99
CD 65
EF 12
GH 25
IJ 90
KL 100
If CD's percentage is less than 70, change it to 71; otherwise keep the existing value.
If EF's percentage is less than 20, change it to 21; otherwise keep the existing value.
If GH's percentage is less than 30, change it to 45; otherwise keep the existing value.
For AB the existing value is fine.
Output:
cateory Percentage
AB 99
CD 71
EF 21
GH 45
IJ 90
KL 100

Create a list of tuples (category, threshold, new value). For each tuple, build a mask that matches the category and a Percentage below the threshold, then assign the new value (the last item of the tuple) to the matching rows:
L = [('CD', 70, 71), ('EF', 20, 21), ('GH', 30, 45)]
for cat, less, new in L:
    m = df['cateory'].eq(cat) & df['Percentage'].lt(less)
    df.loc[m, 'Percentage'] = new
print(df)
cateory Percentage
0 AB 99
1 CD 71
2 EF 21
3 GH 45
4 IJ 90
5 KL 100
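A loop-free variant of the same idea, as a minimal sketch. The dictionaries `thresholds` and `replacements` are hypothetical names introduced here; the sample frame is rebuilt from the question's data, keeping the original column spelling.

```python
import pandas as pd

# Sample frame from the question (column name keeps the original spelling).
df = pd.DataFrame({
    "cateory": ["AB", "CD", "EF", "GH", "IJ", "KL"],
    "Percentage": [99, 65, 12, 25, 90, 100],
})

# Hypothetical lookup tables: category -> threshold, category -> new value.
thresholds = {"CD": 70, "EF": 20, "GH": 30}
replacements = {"CD": 71, "EF": 21, "GH": 45}

# Map each row's category to its rule; categories without a rule become NaN.
limit = df["cateory"].map(thresholds)
new_value = df["cateory"].map(replacements)

# Comparisons against NaN are False, so rows without a rule are never selected.
below = df["Percentage"] < limit
df["Percentage"] = df["Percentage"].mask(below, new_value).astype(int)

print(df)
```

This may be convenient when there are many rules, since the rules live in plain dictionaries rather than in a loop.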

Related

Data frame transformation using transposing and flattening

I have a data frame that looks like:
tdelta A B label
1 11 21 Lab1
2 24 45 Lab2
3 44 65 Lab3
4 77 22 Lab4
5 12 64 Lab5
6 39 09 Lab6
7 85 11 Lab7
8 01 45 Lab8
And I need to transform this dataset into:
For selected window: 4
A1 A2 A3 A4 B1 B2 B3 B4 L1 label
11 24 44 77 21 45 65 22 Lab1 Lab4
12 39 85 01 64 09 11 45 Lab5 Lab8
So, based on the selected window 'w', I need to transpose each block of w rows, using the first label of the block as part of my X values and the last label of the block as my Y value. Here is what I have developed so far:
def data_process(data, window):
    n = len(data)
    A = pd.DataFrame(data['A'])
    B = pd.DataFrame(data['B'])
    lb = pd.DataFrame(data['label'])
    df_A = pd.concat([A.loc[i] for i in range(0, window)], axis=1).reset_index()
    df_B = pd.concat([B.loc[i] for i in range(0, window)], axis=1).reset_index()
    df_lb = pd.concat([lb.loc[0]], axis=1).reset_index()
    X = pd.concat([df_A, df_B, df_lb], axis=1)
    Y = pd.DataFrame(data['label']).shift(-window)
    return X, Y
I think this works for only the first 'window' rows. I need it to work for my entire dataframe.
This is essentially a pivot, with a lot of cleaning up after the pivot. For the pivot to work we need to use integer and modulus division so that we can group the rows into windows of length w and figure out which column they then belong to.
# Number of rows to group together
w = 4
df['col'] = np.arange(len(df))%w + 1
df['i'] = np.arange(len(df))//w
# Reshape and flatten the MultiIndex
df = (df.drop(columns='tdelta')
        .pivot(index='i', columns='col')
        .rename_axis(index=None))
df.columns = [f'{x}{y}' for x, y in df.columns]
# Define these columns and remove the intermediate label columns.
df['L1'] = df['label1']
df['label'] = df[f'label{w}']
df = df.drop(columns=[f'label{i}' for i in range(1, w+1)])
print(df)
A1 A2 A3 A4 B1 B2 B3 B4 L1 label
0 11 24 44 77 21 45 65 22 Lab1 Lab4
1 12 39 85 1 64 9 11 45 Lab5 Lab8
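An alternative sketch of the same transformation that uses a NumPy reshape instead of a pivot. The function name `window_reshape` is hypothetical, and dropping a trailing incomplete window is an assumption that the question does not state.

```python
import pandas as pd

def window_reshape(df, w):
    """Flatten every block of w rows into one row (hypothetical helper)."""
    n = len(df) // w * w  # assumption: ignore a trailing incomplete window
    out = {}
    for name in ("A", "B"):
        # Each row of `block` holds the w consecutive values of one window.
        block = df[name].to_numpy()[:n].reshape(-1, w)
        for k in range(w):
            out[f"{name}{k + 1}"] = block[:, k]
    labels = df["label"].to_numpy()[:n].reshape(-1, w)
    out["L1"] = labels[:, 0]      # first label of each window
    out["label"] = labels[:, -1]  # last label of each window
    return pd.DataFrame(out)

df = pd.DataFrame({
    "tdelta": range(1, 9),
    "A": [11, 24, 44, 77, 12, 39, 85, 1],
    "B": [21, 45, 65, 22, 64, 9, 11, 45],
    "label": [f"Lab{k}" for k in range(1, 9)],
})
out = window_reshape(df, 4)
print(out)
```

The reshape relies on the rows already being in window order, which is the same assumption the modulus and integer division make in the pivot version.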

Creating a list from series of pandas

I'm trying to create a list from 3 different Series, with elements of the form "({A} {B} {C})", where A is the first element of Series 1, B is the first element of Series 2, and C is the first element of Series 3, and so on for each position, so that the resulting list contains 600 elements.
List 1 List 2 List 3
u_p0 1 v_p0 2 w_p0 7
u_p1 21 v_p1 11 w_p1 45
u_p2 32 v_p2 25 w_p2 32
u_p3 45 v_p3 76 w_p3 49
... .... ....
u_p599 56 v_p599 78 w_p599 98
Now I want the output list as follows
(1 2 7)
(21 11 45)
(32 25 32)
(45 76 49)
.....
These are the 3 series I created from a dataframe
r1=turb_1.iloc[qw1] #List1
r2=turb_1.iloc[qw2] #List2
r3=turb_1.iloc[qw3] #List3
For the output, I think Python's string formatting will be useful, but I'm not quite sure how to proceed.
turb_3= ["({A} {B} {C})".format(A=i,B=j,C=k) for i in r1 for j in r2 for k in r3]
Any kind of help will be useful.
Use pandas.DataFrame.itertuples with str.format:
# Sample data
print(df)
col1 col2 col3
0 1 2 7
1 21 11 45
2 32 25 32
3 45 76 49
fmt = "({} {} {})"
[fmt.format(*tup) for tup in df[["col1", "col2", "col3"]].itertuples(index=False, name=None)]
Output:
['(1 2 7)', '(21 11 45)', '(32 25 32)', '(45 76 49)']
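If the three Series are kept separate rather than combined into one DataFrame, `zip` is another option. The sketch below uses small hypothetical stand-ins for `r1`, `r2` and `r3`. The nested `for i in r1 for j in r2 for k in r3` in the question builds every combination of the three Series, while `zip` pairs the elements position by position.

```python
import pandas as pd

# Hypothetical stand-ins for the three Series from the question.
r1 = pd.Series([1, 21, 32, 45], index=["u_p0", "u_p1", "u_p2", "u_p3"])
r2 = pd.Series([2, 11, 25, 76], index=["v_p0", "v_p1", "v_p2", "v_p3"])
r3 = pd.Series([7, 45, 32, 49], index=["w_p0", "w_p1", "w_p2", "w_p3"])

# zip walks the three Series in lockstep: one output string per position.
turb_3 = [f"({a} {b} {c})" for a, b, c in zip(r1, r2, r3)]
print(turb_3)
```

Iterating a Series yields its values, so the differing index labels do not matter here.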

Diagonals in North East Direction - Time Limit Exceeded in Python 3.8.3

The program must accept an integer matrix of size RxC as the input. The program must print the integers in the diagonals in the North-East direction of the matrix, each diagonal on a separate line, as the output.
Boundary:
2<=R,C<=100
Time Limit : 500ms
Example 1:
Input:
3 3
73 77 76
71 17 87
37 73 98
Output:
73
71 77
37 17 76
73 87
98
Example 2:
Input:
4 6
97 78 7 39 92 45
68 100 49 95 97 100
59 41 81 22 26 100
46 37 81 12 93 10
Output:
97
68 78
59 100 7
46 41 49 39
37 81 95 92
81 22 97 45
12 26 100
93 100
10
My Code:
row,col = map(int,input().split())
matrix = [list(map(int,input().split())) for i in range(row)]
# Redundancy of row and col
rep = []
for i in range(row):
    for j in range(col):
        b = []
        for k in range(i,row):
            if (j,k) not in rep:
                b.append(matrix[k][j])
                rep.append((j,k))
            j-=1
            if j<0:break
        if len(b):print(*(b[::-1]))
My code works, but when the matrix is of size (100, 100) it exceeds the given time limit. Is there a way to reduce the running time? Thanks in advance.
Note : No External Libraries should be used!
The trick here is to realize that each number appears in the output exactly once, so we only need to visit each value once.
We can also see that each matrix will result in row + col - 1 North-East diagonals, and that the element at position (i, j) belongs to diagonal i + j, which will help us.
# Original code
row,col = map(int,input().split())
# I won't turn them into ints, strings actually make it easier for my work
matrix = [input().split() for i in range(row)]
diagonals = [""] * (row + col - 1)
for i in range(row):
    for j in range(col):
        # determine which diagonal the number belongs to, and prepend it
        diagonals[i + j] = "%s %s" % (matrix[i][j], diagonals[i + j])
# print out diagonals one at a time; rstrip drops the trailing space left by the first prepend
for diagonal in diagonals:
    print(diagonal.rstrip())
I never got the chance to run it, but this should give the general idea!
(new to SO, plz be nice :D)
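A variant of the idea above, sketched with lists instead of strings so that it can be checked without reading from standard input. The function name `north_east_diagonals` is hypothetical; the sample matrix is Example 1 from the question.

```python
def north_east_diagonals(matrix):
    """Return the north-east diagonals, each ordered bottom-left to top-right."""
    rows, cols = len(matrix), len(matrix[0])
    diagonals = [[] for _ in range(rows + cols - 1)]
    for i in range(rows):
        for j in range(cols):
            # The element at (i, j) lies on diagonal number i + j.
            diagonals[i + j].append(matrix[i][j])
    # Rows were visited top to bottom, so reverse each diagonal.
    return [d[::-1] for d in diagonals]

example = [[73, 77, 76], [71, 17, 87], [37, 73, 98]]
for diagonal in north_east_diagonals(example):
    print(*diagonal)
```

Each element is visited once, so the work grows with R * C instead of with the repeated `(j,k) not in rep` list scans of the original code.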

A vectorized solution producing a new column in DataFrame that depends on conditions of existing columns and also the new column itself

My current dataframe data is as follows:
df=pd.DataFrame([[1.4,3.5,4.6],[2.8,5.4,6.4],[7.8,6.5,5.8]],columns=['t','i','m'])
t i m
0 14 35 46
1 28 54 64
2 28 34 64
3 78 65 58
My goal is to apply a vectorized operation to the df with the following conditions (pseudocode):
The new column answer starts with a value of 1.
for row in df.itertuples():
    if (m > i) & (answer in row-1 is an odd number):
        answer in row = answer in row-1 + m
    elif (m > i):
        answer in row = answer in row-1 - m
    else:
        answer in row = answer in row-1
The desired output is as follows:
t i m answer
0 14 35 46 1
1 28 54 59 60
2 78 12 58 2
3 78 91 48 2
Any elegant solution would be appreciated.
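For reference, the pseudocode can be written as a plain loop. This is a minimal sketch and not a vectorized solution: each value depends on the previous result, so the rows are processed in order. The function name `running_answer` is hypothetical, and the sketch assumes integer values and that the first row simply receives the starting value, as in the desired output table.

```python
def running_answer(i_values, m_values, start=1):
    """Apply the recurrence from the pseudocode row by row (hypothetical helper)."""
    answers = [start]  # assumption: the first row just takes the start value
    for i, m in zip(i_values[1:], m_values[1:]):
        prev = answers[-1]
        if m > i and prev % 2 == 1:
            answers.append(prev + m)   # previous answer is odd: add m
        elif m > i:
            answers.append(prev - m)   # previous answer is even: subtract m
        else:
            answers.append(prev)       # m <= i: carry the previous answer over
    return answers

# i and m columns taken from the desired-output table in the question.
result = running_answer([35, 54, 12, 91], [46, 59, 58, 48])
print(result)  # [1, 60, 2, 2]
```

The result could be assigned back with something like `df['answer'] = running_answer(df['i'].tolist(), df['m'].tolist())`.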

Pandas: how to test that top-n-dataframe really results from original dataframe

I have a DataFrame, foo:
A B C D E
0 50 46 18 65 55
1 48 56 98 71 96
2 99 48 36 79 70
3 15 24 25 67 34
4 77 67 98 22 78
and another Dataframe, bar, which contains the greatest 2 values of each row of foo. All other values have been replaced with zeros, to create sparsity:
A B C D E
0 0 0 0 65 55
1 0 0 98 0 96
2 99 0 0 79 0
3 0 0 0 67 34
4 0 0 98 0 78
How can I test that every row in bar really contains the desired values?
One more thing: The solution should work with large DataFrames, i.e. 20000 x 20000.
Obviously you can do that with looping and efficient sorting, but maybe a better way would be:
import numpy as np
n = foo.shape[0]
# Test 1:
# bar keeps the original value in exactly two positions per row (diff is zero there)
diff = foo - bar
test1 = ((diff == 0).sum(axis=1) == 2).sum() == n
# Test 2:
# bar has 3 zeros in each row
test2 = ((bar == 0).sum(axis=1) == 3).sum() == n
# Test 3:
# the 2 values that bar keeps are the row maxima
bar2 = bar.replace(0, np.nan)
# the max of the removed values is smaller than the min of the values kept in bar
row_ok = diff.max(axis=1) < bar2.min(axis=1)
test3 = row_ok.sum() == n
I think this covers all cases, but haven't tested it all...
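Another way to check this, as a sketch: rebuild the expected top-n frame from foo with NumPy and compare it to bar. The function name `is_top_n` is hypothetical, and the sketch assumes there are no ties at the cutoff value within a row (with ties, more than n values would be kept in the expected frame).

```python
import numpy as np
import pandas as pd

def is_top_n(foo, bar, n=2):
    """Check that bar equals foo with all but the n largest values per row zeroed."""
    f = foo.to_numpy()
    b = bar.to_numpy()
    # After partitioning, position -n of each row holds the n-th largest value.
    kth = np.partition(f, -n, axis=1)[:, -n][:, None]
    expected = np.where(f >= kth, f, 0)
    return bool((expected == b).all())

cols = list("ABCDE")
foo = pd.DataFrame([[50, 46, 18, 65, 55],
                    [48, 56, 98, 71, 96],
                    [99, 48, 36, 79, 70],
                    [15, 24, 25, 67, 34],
                    [77, 67, 98, 22, 78]], columns=cols)
bar = pd.DataFrame([[0, 0, 0, 65, 55],
                    [0, 0, 98, 0, 96],
                    [99, 0, 0, 79, 0],
                    [0, 0, 0, 67, 34],
                    [0, 0, 98, 0, 78]], columns=cols)
print(is_top_n(foo, bar))
```

`np.partition` avoids a full sort of each row, which may matter for frames as large as the one mentioned in the question.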
