I would like to find consecutive numbers in column A and Column B in python, Column A should be ascending but Column B is descending. I am attaching an example file.
Input file
nucleotide
Pos_A
Pos_B
Connection_Pos20_Pos102
20
102
Connection_Pos19_Pos102
19
102
Connection_Pos20_Pos101
20
101
Connection_Pos18_Pos102
18
102
Connection_Pos19_Pos101
19
101
Connection_Pos20_Pos100
20
100
Connection_Pos17_Pos102
17
102
Connection_Pos18_Pos101
18
101
Connection_Pos19_Pos100
19
100
Connection_Pos20_Pos99
20
99
Connection_Pos16_Pos102
16
102
Connection_Pos17_Pos101
17
101
Connection_Pos18_Pos100
18
100
Connection_Pos19_Pos99
19
99
Connection_Pos20_Pos98
20
98
Connection_Pos15_Pos102
15
102
Connection_Pos16_Pos101
16
101
Connection_Pos17_Pos100
17
100
Connection_Pos18_Pos99
18
99
Connection_Pos19_Pos98
19
98
Connection_Pos20_Pos97
20
97
Connection_Pos14_Pos102
14
102
Connection_Pos15_Pos101
15
101
Connection_Pos16_Pos100
16
100
Output:
nucleotide
Pos_A
Pos_B
Consecutive ID
Consecutive Number (Size)
Connection_Pos20_Pos102
20
102
101
1
Connection_Pos19_Pos102
19
102
100
2
Connection_Pos20_Pos101
20
101
100
2
Connection_Pos18_Pos102
18
102
99
3
Connection_Pos19_Pos101
19
101
99
3
Connection_Pos20_Pos100
20
100
99
3
Connection_Pos17_Pos102
17
102
98
4
Connection_Pos18_Pos101
18
101
98
4
Connection_Pos19_Pos100
19
100
98
4
Connection_Pos20_Pos99
20
99
98
4
Connection_Pos16_Pos102
16
102
97
5
Connection_Pos17_Pos101
17
101
97
5
Connection_Pos18_Pos100
18
100
97
5
Connection_Pos19_Pos99
19
99
97
5
Connection_Pos20_Pos98
20
98
97
5
Connection_Pos15_Pos102
15
102
96
6
Connection_Pos16_Pos101
16
101
96
6
Connection_Pos17_Pos100
17
100
96
6
Connection_Pos18_Pos99
18
99
96
6
Connection_Pos19_Pos98
19
98
96
6
Connection_Pos20_Pos97
20
97
96
6
Connection_Pos14_Pos102
14
102
95
7
Connection_Pos15_Pos101
15
101
95
7
Connection_Pos16_Pos100
16
100
95
7
Connection_Pos17_Pos99
17
99
95
7
Connection_Pos18_Pos98
18
98
95
7
Connection_Pos19_Pos97
19
97
95
7
Connection_Pos20_Pos96
20
96
95
7
For Consecutive ID, if Pos_B's shifted difference != 1, then we want to subtract 1, so we mark those indexes as -1 with mul(-1) and cumsum them:
df['ID'] = df.Pos_B.shift().sub(df.Pos_B).ne(1).mul(-1).cumsum() + df.Pos_B[0]
For Consecutive Number, if Pos_A's shifted difference != -1, then we want to add 1, so we mark those indexes as 1 and cumsum again:
df['Number'] = df.Pos_A.shift().sub(df.Pos_A).ne(-1).mul(1).cumsum()
Result:
nucleotide Pos_A Pos_B ID Number
0 Connection_Pos20_Pos102 20 102 101 1
1 Connection_Pos19_Pos102 19 102 100 2
2 Connection_Pos20_Pos101 20 101 100 2
3 Connection_Pos18_Pos102 18 102 99 3
4 Connection_Pos19_Pos101 19 101 99 3
5 Connection_Pos20_Pos100 20 100 99 3
6 Connection_Pos17_Pos102 17 102 98 4
7 Connection_Pos18_Pos101 18 101 98 4
8 Connection_Pos19_Pos100 19 100 98 4
9 Connection_Pos20_Pos99 20 99 98 4
10 Connection_Pos16_Pos102 16 102 97 5
11 Connection_Pos17_Pos101 17 101 97 5
12 Connection_Pos18_Pos100 18 100 97 5
13 Connection_Pos19_Pos99 19 99 97 5
14 Connection_Pos20_Pos98 20 98 97 5
15 Connection_Pos15_Pos102 15 102 96 6
16 Connection_Pos16_Pos101 16 101 96 6
17 Connection_Pos17_Pos100 17 100 96 6
18 Connection_Pos18_Pos99 18 99 96 6
19 Connection_Pos19_Pos98 19 98 96 6
20 Connection_Pos20_Pos97 20 97 96 6
21 Connection_Pos14_Pos102 14 102 95 7
22 Connection_Pos15_Pos101 15 101 95 7
23 Connection_Pos16_Pos100 16 100 95 7
Do it one by one then groupby with ngroup
s1 = df.Pos_A.diff().le(0).cumsum()
s2 = df.Pos_B.diff().ge(0).cumsum()
df['out'] = df.groupby([s1,s2]).ngroup()+1
Out[452]:
0 1
1 2
2 2
3 3
4 3
5 3
6 4
7 4
8 4
9 4
10 5
11 5
12 5
13 5
14 5
15 6
16 6
17 6
18 6
19 6
20 6
21 7
22 7
23 7
24 7
25 7
26 7
27 7
dtype: int64
I have dataframe i want to move column name to left from specific column. original dataframe have many columns can not do this by rename columns
df=pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'H':[1,3,4,7,8,11,1,15,78,15,16,87],
'N':[1,3,4,98,8,11,1,15,20,15,16,87],
'p':[1,3,4,9,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'y':[0,0,0,0,1,1,1,0,0,0,0,0]})
print((df))
A H N p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here i want to remove label N first dataframe after removing label N
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Rrquired output:
A H P B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here last column can be ignore
Note: in original dataframe have many columns , can not rename columns , so need some auto method to shift column names lef
You can do
df.columns=sorted(df.columns.str.replace('N',''),key=lambda x : x=='')
df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Replace the columns with your own custom list.
>>> cols = list(df.columns)
>>> cols.remove('N')
>>> df.columns = cols + ['']
Output
>>> df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Trying to let the user input a number, and print a table according to the square of its size. Here's an example.
Size--> 3
0 1 2
3 4 5
6 7 8
Size--> 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Size--> 6
0 1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35
Size--> 9
0 1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44
45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62
63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80
Here's is the code that i have tried.
length=int(input('Size--> '))
size=length*length
biglist=[]
for i in range(size):
biglist.append(i)
biglist = [str(i) for i in biglist]
for i in range(0, len(biglist), length):
print(' '.join(biglist[i: i+length]))
but instead here's what i got
Size--> 3
0 1 2
3 4 5
6 7 8
Size--> 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Size--> 6
0 1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35
As you can see the rows are not aligned properly like the example.
What's the simplest way of presenting it in a proper alignment? Thx :)
Using .format on string with right aligning.
And strlen is the number of characters required for each number.
length = int(input('Size--> '))
size = length*length
biglist = []
for i in range(size):
biglist.append(i)
biglist = [str(i) for i in biglist]
strlen = len(str(length**2-1))+1
for i in range(0, len(biglist), length):
# print(' '.join(biglist[i: i+length]))
for x in biglist[i: i+length]:
print(f"{x:>{strlen}}", end='')
print()
I have a list of 1 column and 50 rows.
I want to divide it into 5 segments. And each segment has to become a column of a dataframe. I do not want the NAN to appear (figure2). How can I solve that?
Like this:
df = pd.DataFrame(result_list)
AWA=df[:10]
REM=df[10:20]
S1=df[20:30]
S2=df[30:40]
SWS=df[40:50]
result = pd.concat([AWA, REM, S1, S2, SWS], axis=1)
result
Figure2
You can use numpy's reshape function:
result_list = [i for i in range(50)]
pd.DataFrame(np.reshape(result_list, (10, 5), order='F'))
Out:
0 1 2 3 4
0 0 10 20 30 40
1 1 11 21 31 41
2 2 12 22 32 42
3 3 13 23 33 43
4 4 14 24 34 44
5 5 15 25 35 45
6 6 16 26 36 46
7 7 17 27 37 47
8 8 18 28 38 48
9 9 19 29 39 49
For the code:
dataset = pd.read_csv("/Users/Akshita/Desktop/EE660/donor_raw_data_medmean.csv", header=None, names=None)
# Separate data and label
X_label = dataset[1:19373][0]
X_data = dataset[1:19373]
print(X_data[X_label==1])
I get the output:(There are actually 4000~ samples with label=1)
0 1 2 3 4 5 6 7 8 9 ... 51 52 53 54 55 56 57 58 \
16386 1 17 60 0 1 0 0 0 0 1 ... 0 20 20 20 5 10 15 15
16396 1 137 60 0 1 0 0 0 0 1 ... 15 25 10 15 6 14 16 120
16399 1 89 54 0 1 0 0 0 0 1 ... 10 15 5 15 6 14 16 79
16402 1 89 75 0 1 0 0 0 0 1 ... 25 35 10 35 6 13 15 79
..
..
19356 1 101 80 1 0 0 1 0 0 2 ... 25 30 5 28 7 16 18 101
19363 1 65 70 1 0 0 1 0 0 1 ... 7 12 5 10 4 8 20 63
19372 1 29 70 0 0 0 1 0 0 2 ... 0 25 25 25 4 9 24 24
..
[859 rows x 61 columns]
and for
print(X_data[X_label==0])
I get the output:(There are about 15000~ samples with label=0)
0 1 2 3 4 5 6 7 8 9 ... 51 52 53 54 55 56 57 58 \
16384 0 17 74 0 1 0 0 0 0 1 ... 0 15 15 15 4 10 17 17
16385 0 17 60 0 1 0 0 0 0 2 ... 0 15 15 15 4 11 17 17
16387 0 29 67 0 1 0 0 0 0 1 ... 0 20 20 20 5 11 23 28
16388 0 53 60 0 1 0 0 0 0 1 ... 5 30 25 30 5 11 26 52
16389 0 65 49 0 1 0 0 0 0 1 ... 30 35 5 27 6 13 16 56
..
..
19369 0 137 77 1 0 1 0 0 0 1 ... 9 10 1 10 6 13 21 130
19370 0 29 60 1 0 0 1 0 0 1 ... 0 15 15 15 3 9 23 23
19371 0 129 78 1 0 0 1 0 0 2 ... 20 25 5 25 7 24 8 129
What can I be doing wrong?