Normalizing the data using python - python-3.x

I have the below data
df1
Hema shiva Ishan
0 22 30 33
1 34 32 21
2 20 12 14
3 26 14 18
4 12 28 17
5 30 11 22
6 18 15 18
7 19 18 19
8 22 20 32
I want to take the ratio of each column's first value with the rest of that column, e.g. the first column should be divided by 22, the second column by 30 and the third column by 33.
The answer is below.
Please help me if I am missing something.

Just divide the first row by the DF:
df.iloc[0] / df
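The question can be read in two directions, so here is a minimal sketch that computes both. It assumes the frame is rebuilt from the values shown in the question; the names `first_over_value` and `value_over_first` are only illustrative.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Hema":  [22, 34, 20, 26, 12, 30, 18, 19, 22],
    "shiva": [30, 32, 12, 14, 28, 11, 15, 18, 20],
    "Ishan": [33, 21, 14, 18, 17, 22, 18, 19, 32],
})

first_row = df.iloc[0]  # Series indexed by column name: 22, 30, 33

# First-row value divided by every value in the same column
# (this is what df.iloc[0] / df computes).
first_over_value = first_row / df

# Every value divided by the first-row value of its column
# (i.e. the first column divided by 22, the second by 30, the third by 33).
value_over_first = df / first_row

print(value_over_first.round(3))
```

In both cases pandas aligns the Series index with the DataFrame columns, so each column is paired with its own first value.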

Related

Sort rows by row value (top to bottom)

There is a lotto draw (5 numbers) on each row. I have a formula which calculates the most frequent numbers together with their number of draws. Is it possible, in the end result, to sort numbers that have the same number of draws by row value? That is, a number drawn in the top rows should rank higher than one drawn in the bottom rows, treating the row number as a value. How is that possible?
Formula used:
=LET(flatten, TEXTSPLIT(TEXTJOIN(";",,A1:F27),,";"), numUq, UNIQUE(flatten), matches, XMATCH(flatten,numUq),SORT(HSTACK(numUq, DROP(FREQUENCY(matches, UNIQUE(matches)),-1)),2,-1))
In the example screenshot, numbers 35 and 13 have equal draw counts, but 13 should come before 35.
Data:
 A  B  C  D  E  F
18 35 31 13 37 10
43 47 36 13  6 19
 6 12  6 35 14  1
43 24 45  7 21 16
37 39 44 24 12 40
39  8 34 28 49 46
27 44 15 46 45 12
22  0 10  5 28 28
 4  7 23  6 44 41
30 22 47 13 29 29
37  9 26 44 39 10
30 17 21 20 41 22
43 35  0 22 13  9
14 22 42 20 32 21
13 38 48  6 14  2
11 47 20 20 23  6
22 26  1 25 45 31
27 39  6 44  3 24
22 45 34 17  5 13
16 23 20  7 30 16
25 21  7 34  1 35
32 34  1  9 10 32
23 35 11  3  6 12
 5 30  4 20 33 15
26 10  8 28 16 11
21 14  3 38 10 42
16  3 26 48 30 28
Here it is on a subset of the data (the first three rows). I have added a third column holding the average row of each unique number, and sorted first on frequency and then on row average:
=LET(range,A1:F3,uniques,UNIQUE(TOCOL(range)),rows,SEQUENCE(ROWS(range)),
avrow,BYROW(uniques,LAMBDA(uniq,SUM((range=uniq)*rows/SUM(--(range=uniq))))),
freq,DROP(FREQUENCY(range,uniques),-1),
SORTBY(HSTACK(uniques,freq,avrow),freq,-1,avrow,1))
Can 6 really occur twice in the same draw? Maybe not, but it doesn't affect the answer.
EDIT
Here is a version based on your original formula:
=LET(range,A1:F27,
flatten, TEXTSPLIT(TEXTJOIN(";",,A1:F27),,";"),
numUq, UNIQUE(flatten),
rows,SEQUENCE(ROWS(range)),
matches, XMATCH(flatten,numUq),
avrow,BYROW(numUq,LAMBDA(numUq,SUM((range=--numUq)*rows/SUM(--(range=--numUq))))),
freq,DROP(FREQUENCY(matches, UNIQUE(matches)),-1),
SORTBY(HSTACK(numUq,freq,avrow),freq,-1,avrow,1))
The sorting is based on number of appearances and average row, but you could use other measures like row of first appearance if you wanted to.
Different approach:
=LET(data,A1:F27,
a,TOCOL(data),
b,MMULT(--(TRANSPOSE(a)=a),SEQUENCE(COUNTA(a),,1,0)),
c,TOCOL(IF(ISNUMBER(data),MAX(ROW(data)+1)-ROW(data)^99)),
d,MMULT(--(TRANSPOSE(a)=a),c),
s,SORTBY(HSTACK(a,b),b,-1,d,1),
UNIQUE(s))
a "flattens" the data using TOCOL.
b creates a "countif" of the drawn values in a using MMULT.
c returns the maximum row value of the data + 1 minus the row value of each value found ^99.
^99 because I want the number to be higher if a value is found in the first row only than if it is found in every row except the first.
d returns a "sumif" of the calculated row values of c against the values of a.
We then only need a and b for the list, combined using HSTACK, but they have to be sorted by the count b descending and by the sumif d ascending using SORTBY.
This will sort it as you illustrated it.
If it's a tie (36 and 19 in the data), the number that comes first in the row is shown first.

resampling a pandas dataframe and filling new rows with zero

I have a time series as a dataframe. The first column is the week number, the second are values for that week. The first week (22) and the last week (48), are the lower and upper bounds of the time series. Some weeks are missing, for example, there is no week 27 and 28. I would like to resample this series such that there are no missing weeks. Where a week was inserted, I would like the corresponding value to be zero. This is my data:
week value
0 22 1
1 23 2
2 24 2
3 25 3
4 26 2
5 29 3
6 30 3
7 31 3
8 32 7
9 33 4
10 34 5
11 35 4
12 36 2
13 37 3
14 38 10
15 39 5
16 40 7
17 41 10
18 42 11
19 43 15
20 44 9
21 45 13
22 46 5
23 47 6
24 48 2
I am wondering if this can be achieved in Pandas without creating a loop from scratch. I have looked into pd.resample, but can't achieve the results I am looking for.
I would set week as the index, then reindex with the fill_value option:
import numpy as np
start, end = df['week'].agg(['min','max'])
df.set_index('week').reindex(np.arange(start, end+1), fill_value=0).reset_index()
Output (head):
week value
0 22 1
1 23 2
2 24 2
3 25 3
4 26 2
5 27 0
6 28 0
7 29 3
8 30 3
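A self-contained sketch of the same reindex approach, wrapped in a small helper so it can be run as is. The helper name `fill_missing_weeks` and its parameters are hypothetical, and the sample only uses the first few weeks from the question.

```python
import numpy as np
import pandas as pd


def fill_missing_weeks(df, key="week", fill_value=0):
    """Hypothetical helper: add a row for every missing integer key
    between the smallest and largest key, filling other columns."""
    start, end = df[key].min(), df[key].max()
    full_range = np.arange(start, end + 1)  # end + 1 so the last week is included
    return (
        df.set_index(key)
        .reindex(full_range, fill_value=fill_value)
        .reset_index()
    )


# Weeks 27 and 28 are missing, as in the question.
sample = pd.DataFrame({
    "week":  [22, 23, 24, 25, 26, 29, 30],
    "value": [1, 2, 2, 3, 2, 3, 3],
})

filled = fill_missing_weeks(sample)
print(filled)
```

`reindex` is used rather than `resample` because the week column holds plain integers, not timestamps.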

How to append the multiple columns of a data-frame to a new empty data-frame

I have a dataset which contains multiple columns. I also have a list of columns:
columns_list = ['A1','A2','B1','B2']
df
A1 A2 B1 B2
0 1 11 21 31
1 2 12 22 32
2 3 13 23 33
3 4 14 24 34
Based on the columns list, how do I transform the DataFrame df into new_df, as below:
new_df
0 1
0 1 11
1 2 12
2 3 13
3 4 14
4 21 31
5 22 32
6 23 33
7 24 34
I tried to append them, but I'm getting an error. How do I create the new DataFrame? Thank you.
df1 = pd.DataFrame(df[columns_list[0:2]].to_numpy())
df2 = pd.DataFrame(df[columns_list[2:]].to_numpy())
new_df = pd.concat([df1, df2]).reset_index(drop=True)
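The two-slice answer can be generalized to any number of column groups. The sketch below assumes the groups are consecutive and all have the same width; `width` and `blocks` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "A1": [1, 2, 3, 4],
    "A2": [11, 12, 13, 14],
    "B1": [21, 22, 23, 24],
    "B2": [31, 32, 33, 34],
})
columns_list = ["A1", "A2", "B1", "B2"]

width = 2  # assumed number of columns per group

# Going through to_numpy() drops the original column labels, so every
# block gets the default labels 0..width-1 and the blocks stack cleanly
# instead of being aligned on their different names.
blocks = [
    pd.DataFrame(df[columns_list[i:i + width]].to_numpy())
    for i in range(0, len(columns_list), width)
]

new_df = pd.concat(blocks, ignore_index=True)
print(new_df)
```

Concatenating the slices directly, without `to_numpy()`, would keep the labels A1, A2, B1, B2 and produce four columns padded with NaN.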

How to divide 1 column into 5 segments with pandas and python?

I have a list of 1 column and 50 rows.
I want to divide it into 5 segments, with each segment becoming a column of a dataframe. I do not want the NaN values to appear (Figure 2). How can I solve that?
This is what I tried:
df = pd.DataFrame(result_list)
AWA=df[:10]
REM=df[10:20]
S1=df[20:30]
S2=df[30:40]
SWS=df[40:50]
result = pd.concat([AWA, REM, S1, S2, SWS], axis=1)
result
Figure2
You can use numpy's reshape function:
result_list = list(range(50))
pd.DataFrame(np.reshape(result_list, (10, 5), order='F'))
Out:
0 1 2 3 4
0 0 10 20 30 40
1 1 11 21 31 41
2 2 12 22 32 42
3 3 13 23 33 43
4 4 14 24 34 44
5 5 15 25 35 45
6 6 16 26 36 46
7 7 17 27 37 47
8 8 18 28 38 48
9 9 19 29 39 49
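The NaN values in the original attempt come from index alignment: each slice keeps its own row labels (0-9, 10-19, ...), so `pd.concat(..., axis=1)` has nothing to line up. A sketch showing both the reshape route with named columns and a concat route with the index reset, assuming the segment names from the question:

```python
import numpy as np
import pandas as pd

result_list = list(range(50))
names = ["AWA", "REM", "S1", "S2", "SWS"]  # segment names from the question

# Route 1: reshape column by column (order="F") and label the columns.
result = pd.DataFrame(
    np.reshape(result_list, (10, 5), order="F"),
    columns=names,
)

# Route 2: keep the slicing idea, but reset each slice's index to 0..9
# so that concat can align the rows.
df = pd.DataFrame(result_list)
parts = [df[i:i + 10].reset_index(drop=True) for i in range(0, 50, 10)]
alt = pd.concat(parts, axis=1)
alt.columns = names

print(result.head())
```

Both routes give a 10 x 5 frame with no missing values; the reshape route requires the list length to be an exact multiple of the segment length.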

pulling out the result from MATLAB to Excel row by row

I do calculations on 64 elements (for p=1:64 function end) and write the result values to an Excel file.
Is there any way to arrange the result values for each element row by row (the values of the first element should appear on the first row, the values of the second element should appear on the second row and so on)?
I used P=reshape(A,[],16), but MATLAB interleaves the values of the different elements.
For example,
If I set the loop for the calculation p=1:1 and use P=reshape(A,[],16) the result is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
If I set p=1:2 the result becomes:
for element 1: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
for element 2: 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
(the values of element 2 are: 17 18 19 20 21 22 23 24 25 ... 32)
The result for p=1:2 should be:
for element 1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
for element 2: 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
for element 3: 33 34 35 ,etc...
Try this:
P=reshape(A,16,[])'
Is this what you need?
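The reason the transpose works is that MATLAB's reshape fills column by column. As a rough illustration only, the same behaviour can be reproduced in NumPy with `order="F"`, which is assumed here to stand in for MATLAB's column-major fill; the variable names are illustrative.

```python
import numpy as np

# Results of two elements, 16 values each, stored one after another.
A = np.arange(1, 33)

# Analogue of reshape(A,[],16): 16 columns filled column by column,
# so the two elements end up interleaved across the rows.
mixed = A.reshape((-1, 16), order="F")

# Analogue of reshape(A,16,[])': fill 16 rows per column first,
# then transpose so each element occupies one row.
fixed = A.reshape((16, -1), order="F").T

print(mixed[0])  # odd numbers 1..31
print(fixed[1])  # 17..32
```

Filling a 16-row matrix first puts each element's 16 values into one column, and the transpose then turns those columns into rows.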
