How to divide 1 column into 5 segments with pandas and python? - python-3.x

I have a list of 1 column and 50 rows.
I want to divide it into 5 segments. And each segment has to become a column of a dataframe. I do not want the NAN to appear (figure2). How can I solve that?
Like this:
df = pd.DataFrame(result_list)
AWA=df[:10]
REM=df[10:20]
S1=df[20:30]
S2=df[30:40]
SWS=df[40:50]
result = pd.concat([AWA, REM, S1, S2, SWS], axis=1)
result
Figure2

You can use numpy's reshape function:
result_list = [i for i in range(50)]
pd.DataFrame(np.reshape(result_list, (10, 5), order='F'))
Out:
0 1 2 3 4
0 0 10 20 30 40
1 1 11 21 31 41
2 2 12 22 32 42
3 3 13 23 33 43
4 4 14 24 34 44
5 5 15 25 35 45
6 6 16 26 36 46
7 7 17 27 37 47
8 8 18 28 38 48
9 9 19 29 39 49

Related

Sort rows by row value (top to bottom)

There is lotto draw (5 numbers) on each row. I have formula which calculates the most frequient numbers with their number of draws. Is it possible in end result to sort same number of draws results by row value. This means that if number is drawn on top rows will have grater value than those on bottom rows. Considering number of row to be a value. How is that possible?
Formula used:
=LET(flatten, TEXTSPLIT(TEXTJOIN(";",,A1:F27),,";"), numUq, UNIQUE(flatten), matches, XMATCH(flatten,numUq),SORT(HSTACK(numUq, DROP(FREQUENCY(matches, UNIQUE(matches)),-1)),2,-1))
In the example screenshot number 35 and number 13 have equal draws count, but 13 should be before 35.
Data:
A
B
C
D
E
F
18
35
31
13
37
10
43
47
36
13
6
19
6
12
6
35
14
1
43
24
45
7
21
16
37
39
44
24
12
40
39
8
34
28
49
46
27
44
15
46
45
12
22
0
10
5
28
28
4
7
23
6
44
41
30
22
47
13
29
29
37
9
26
44
39
10
30
17
21
20
41
22
43
35
0
22
13
9
14
22
42
20
32
21
13
38
48
6
14
2
11
47
20
20
23
6
22
26
1
25
45
31
27
39
6
44
3
24
22
45
34
17
5
13
16
23
20
7
30
16
25
21
7
34
1
35
32
34
1
9
10
32
23
35
11
3
6
12
5
30
4
20
33
15
26
10
8
28
16
11
21
14
3
38
10
42
16
3
26
48
30
28
Link to file
Here it is on a bit of the data. Here I have added a third column based on the average row of each unique number and sorted first on frequency then on row average:
=LET(range,A1:F3,uniques,UNIQUE(TOCOL(range)),rows,SEQUENCE(ROWS(range)),
avrow,BYROW(uniques,LAMBDA(uniq,SUM((range=uniq)*rows/SUM(--(range=uniq))))),
freq,DROP(FREQUENCY(range,uniques),-1),
SORTBY(HSTACK(uniques,freq,avrow),freq,-1,avrow,1))
Can 6 really occur twice in the same draw? Maybe not, but it doesn't affect the answer.
EDIT
Here is a version based on your original formula:
=LET(range,A1:F27,
flatten, TEXTSPLIT(TEXTJOIN(";",,A1:F27),,";"),
numUq, UNIQUE(flatten),
rows,SEQUENCE(ROWS(range)),
matches, XMATCH(flatten,numUq),
avrow,BYROW(numUq,LAMBDA(numUq,SUM((range=--numUq)*rows/SUM(--(range=--numUq))))),
freq,DROP(FREQUENCY(matches, UNIQUE(matches)),-1),
SORTBY(HSTACK(numUq,freq,avrow),freq,-1,avrow,1))
Full Dataset
The sorting is based on number of appearances and average row, but you could use other measures like row of first appearance if you wanted to.
Different approach:
=LET(data,A1:F27,
a,TOCOL(data),
b,MMULT(--(TRANSPOSE(a)=a),SEQUENCE(COUNTA(a),,1,0)),
c,TOCOL(IF(ISNUMBER(data),MAX(ROW(data)+1)-ROW(data)^99)),
d,MMULT(--(TRANSPOSE(a)=a),c),
s,SORTBY(HSTACK(a,b),b,-1,d,1),
UNIQUE(s))
a "flattens" the data using TOCOL.
b creates a "countif" of the drawn values in a using MMULT.
c returns the maximum row value of the data + 1 minus the row value of each value found ^99.
^99 because I want the number to be higher if it would be found in the first row only versus if it was found in each row except the first.
d returns a "sumif" of the calculated row values of c against the values of a.
We than only need a and b for the list using HSTACK, but we need them sorted by the count b descending and sorted by the sumif d ascending using SORTBY.
This will sort it as you illustrated it.
If it's a tie (36 and 19 in the data) it will show the first in row first.

resampling a pandas dataframe and filling new rows with zero

I have a time series as a dataframe. The first column is the week number, the second are values for that week. The first week (22) and the last week (48), are the lower and upper bounds of the time series. Some weeks are missing, for example, there is no week 27 and 28. I would like to resample this series such that there are no missing weeks. Where a week was inserted, I would like the corresponding value to be zero. This is my data:
week value
0 22 1
1 23 2
2 24 2
3 25 3
4 26 2
5 29 3
6 30 3
7 31 3
8 32 7
9 33 4
10 34 5
11 35 4
12 36 2
13 37 3
14 38 10
15 39 5
16 40 7
17 41 10
18 42 11
19 43 15
20 44 9
21 45 13
22 46 5
23 47 6
24 48 2
I am wondering if this can be achieved in Pandas without creating a loop from scratch. I have looked into pd.resample, but can't achieve the results I am looking for.
I would set week as index, reindex with fill_value option:
start, end = df['week'].agg(['min','max'])
df.set_index('week').reindex(np.arange(start, end+1), fill_value=0).reset_index()
Output (head):
week value
0 22 1
1 23 2
2 24 2
3 25 3
4 26 2
5 27 0
6 28 0
7 29 3
8 30 3

I would like to find consecutive numbers in column A and column B in python (pandas)

I would like to find consecutive numbers in column A and Column B in python, Column A should be ascending but Column B is descending. I am attaching an example file.
Input file
nucleotide
Pos_A
Pos_B
Connection_Pos20_Pos102
20
102
Connection_Pos19_Pos102
19
102
Connection_Pos20_Pos101
20
101
Connection_Pos18_Pos102
18
102
Connection_Pos19_Pos101
19
101
Connection_Pos20_Pos100
20
100
Connection_Pos17_Pos102
17
102
Connection_Pos18_Pos101
18
101
Connection_Pos19_Pos100
19
100
Connection_Pos20_Pos99
20
99
Connection_Pos16_Pos102
16
102
Connection_Pos17_Pos101
17
101
Connection_Pos18_Pos100
18
100
Connection_Pos19_Pos99
19
99
Connection_Pos20_Pos98
20
98
Connection_Pos15_Pos102
15
102
Connection_Pos16_Pos101
16
101
Connection_Pos17_Pos100
17
100
Connection_Pos18_Pos99
18
99
Connection_Pos19_Pos98
19
98
Connection_Pos20_Pos97
20
97
Connection_Pos14_Pos102
14
102
Connection_Pos15_Pos101
15
101
Connection_Pos16_Pos100
16
100
Output:
nucleotide
Pos_A
Pos_B
Consecutive ID
Consecutive Number (Size)
Connection_Pos20_Pos102
20
102
101
1
Connection_Pos19_Pos102
19
102
100
2
Connection_Pos20_Pos101
20
101
100
2
Connection_Pos18_Pos102
18
102
99
3
Connection_Pos19_Pos101
19
101
99
3
Connection_Pos20_Pos100
20
100
99
3
Connection_Pos17_Pos102
17
102
98
4
Connection_Pos18_Pos101
18
101
98
4
Connection_Pos19_Pos100
19
100
98
4
Connection_Pos20_Pos99
20
99
98
4
Connection_Pos16_Pos102
16
102
97
5
Connection_Pos17_Pos101
17
101
97
5
Connection_Pos18_Pos100
18
100
97
5
Connection_Pos19_Pos99
19
99
97
5
Connection_Pos20_Pos98
20
98
97
5
Connection_Pos15_Pos102
15
102
96
6
Connection_Pos16_Pos101
16
101
96
6
Connection_Pos17_Pos100
17
100
96
6
Connection_Pos18_Pos99
18
99
96
6
Connection_Pos19_Pos98
19
98
96
6
Connection_Pos20_Pos97
20
97
96
6
Connection_Pos14_Pos102
14
102
95
7
Connection_Pos15_Pos101
15
101
95
7
Connection_Pos16_Pos100
16
100
95
7
Connection_Pos17_Pos99
17
99
95
7
Connection_Pos18_Pos98
18
98
95
7
Connection_Pos19_Pos97
19
97
95
7
Connection_Pos20_Pos96
20
96
95
7
For Consecutive ID, if Pos_B's shifted difference != 1, then we want to subtract 1, so we mark those indexes as -1 with mul(-1) and cumsum them:
df['ID'] = df.Pos_B.shift().sub(df.Pos_B).ne(1).mul(-1).cumsum() + df.Pos_B[0]
For Consecutive Number, if Pos_A's shifted difference != -1, then we want to add 1, so we mark those indexes as 1 and cumsum again:
df['Number'] = df.Pos_A.shift().sub(df.Pos_A).ne(-1).mul(1).cumsum()
Result:
nucleotide Pos_A Pos_B ID Number
0 Connection_Pos20_Pos102 20 102 101 1
1 Connection_Pos19_Pos102 19 102 100 2
2 Connection_Pos20_Pos101 20 101 100 2
3 Connection_Pos18_Pos102 18 102 99 3
4 Connection_Pos19_Pos101 19 101 99 3
5 Connection_Pos20_Pos100 20 100 99 3
6 Connection_Pos17_Pos102 17 102 98 4
7 Connection_Pos18_Pos101 18 101 98 4
8 Connection_Pos19_Pos100 19 100 98 4
9 Connection_Pos20_Pos99 20 99 98 4
10 Connection_Pos16_Pos102 16 102 97 5
11 Connection_Pos17_Pos101 17 101 97 5
12 Connection_Pos18_Pos100 18 100 97 5
13 Connection_Pos19_Pos99 19 99 97 5
14 Connection_Pos20_Pos98 20 98 97 5
15 Connection_Pos15_Pos102 15 102 96 6
16 Connection_Pos16_Pos101 16 101 96 6
17 Connection_Pos17_Pos100 17 100 96 6
18 Connection_Pos18_Pos99 18 99 96 6
19 Connection_Pos19_Pos98 19 98 96 6
20 Connection_Pos20_Pos97 20 97 96 6
21 Connection_Pos14_Pos102 14 102 95 7
22 Connection_Pos15_Pos101 15 101 95 7
23 Connection_Pos16_Pos100 16 100 95 7
Do it one by one then groupby with ngroup
s1 = df.Pos_A.diff().le(0).cumsum()
s2 = df.Pos_B.diff().ge(0).cumsum()
df['out'] = df.groupby([s1,s2]).ngroup()+1
Out[452]:
0 1
1 2
2 2
3 3
4 3
5 3
6 4
7 4
8 4
9 4
10 5
11 5
12 5
13 5
14 5
15 6
16 6
17 6
18 6
19 6
20 6
21 7
22 7
23 7
24 7
25 7
26 7
27 7
dtype: int64

printing a string like a matrix

Trying to let the user input a number, and print a table according to the square of its size. Here's an example.
Size--> 3
0 1 2
3 4 5
6 7 8
Size--> 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Size--> 6
0 1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35
Size--> 9
0 1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44
45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62
63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80
Here's is the code that i have tried.
length=int(input('Size--> '))
size=length*length
biglist=[]
for i in range(size):
biglist.append(i)
biglist = [str(i) for i in biglist]
for i in range(0, len(biglist), length):
print(' '.join(biglist[i: i+length]))
but instead here's what i got
Size--> 3
0 1 2
3 4 5
6 7 8
Size--> 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Size--> 6
0 1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35
As you can see the rows are not aligned properly like the example.
What's the simplest way of presenting it in a proper alignment? Thx :)
Using .format on string with right aligning.
And strlen is the number of characters required for each number.
length = int(input('Size--> '))
size = length*length
biglist = []
for i in range(size):
biglist.append(i)
biglist = [str(i) for i in biglist]
strlen = len(str(length**2-1))+1
for i in range(0, len(biglist), length):
# print(' '.join(biglist[i: i+length]))
for x in biglist[i: i+length]:
print(f"{x:>{strlen}}", end='')
print()

Pandas: list of lists to expanded rows

I have an extension to this question. I have lists of lists in my columns and I need to expand the rows one step further. If I just repeat the steps it splits my strings into letters. Could you suggest a smart way around? Thanks!
d1 = pd.DataFrame({'column1': [['ana','bob',[1,2,3]],['dona','elf',[4,5,6]],['gear','hope',[7,8,9]]],
'column2':[10,20,30],
'column3':[44,55,66]})
d2 = pd.DataFrame.from_records(d1.column1.tolist()).stack().reset_index(level=1, drop=True).rename('column1')
d1_d2 = d1.drop('column1', axis=1).join(d2).reset_index(drop=True)[['column1','column2', 'column3']]
d1_d2
It seems you need flatten nested lists:
from collections import Iterable
def flatten(coll):
for i in coll:
if isinstance(i, Iterable) and not isinstance(i, str):
for subc in flatten(i):
yield subc
else:
yield i
d1['column1'] = d1['column1'].apply(lambda x: list(flatten(x)))
print (d1)
column1 column2 column3
0 [ana, bob, 1, 2, 3] 10 44
1 [dona, elf, 4, 5, 6] 20 55
2 [gear, hope, 7, 8, 9] 30 66
And then use your solution:
d2 = (pd.DataFrame(d1.column1.tolist())
.stack()
.reset_index(level=1, drop=True)
.rename('column1'))
d1_d2 = (d1.drop('column1', axis=1)
.join(d2)
.reset_index(drop=True)[['column1','column2', 'column3']])
print (d1_d2)
column1 column2 column3
0 ana 10 44
1 bob 10 44
2 1 10 44
3 2 10 44
4 3 10 44
5 dona 20 55
6 elf 20 55
7 4 20 55
8 5 20 55
9 6 20 55
10 gear 30 66
11 hope 30 66
12 7 30 66
13 8 30 66
14 9 30 66
Assuming the expected result is same as jezrael.
pandas >= 0.25.0
d1 = d1.explode('column1').explode('column1').reset_index(drop=True)
d1:
column1 column2 column3
0 ana 10 44
1 bob 10 44
2 1 10 44
3 2 10 44
4 3 10 44
5 dona 20 55
6 elf 20 55
7 4 20 55
8 5 20 55
9 6 20 55
10 gear 30 66
11 hope 30 66
12 7 30 66
13 8 30 66
14 9 30 66

Resources