Incorrect logical indexing? - python-3.x

For the code:
dataset = pd.read_csv("/Users/Akshita/Desktop/EE660/donor_raw_data_medmean.csv", header=None, names=None)
# Separate data and label
X_label = dataset[1:19373][0]
X_data = dataset[1:19373]
print(X_data[X_label==1])
I get the output:(There are actually 4000~ samples with label=1)
0 1 2 3 4 5 6 7 8 9 ... 51 52 53 54 55 56 57 58 \
16386 1 17 60 0 1 0 0 0 0 1 ... 0 20 20 20 5 10 15 15
16396 1 137 60 0 1 0 0 0 0 1 ... 15 25 10 15 6 14 16 120
16399 1 89 54 0 1 0 0 0 0 1 ... 10 15 5 15 6 14 16 79
16402 1 89 75 0 1 0 0 0 0 1 ... 25 35 10 35 6 13 15 79
..
..
19356 1 101 80 1 0 0 1 0 0 2 ... 25 30 5 28 7 16 18 101
19363 1 65 70 1 0 0 1 0 0 1 ... 7 12 5 10 4 8 20 63
19372 1 29 70 0 0 0 1 0 0 2 ... 0 25 25 25 4 9 24 24
..
[859 rows x 61 columns]
and for
print(X_data[X_label==0])
I get the output:(There are about 15000~ samples with label=0)
0 1 2 3 4 5 6 7 8 9 ... 51 52 53 54 55 56 57 58 \
16384 0 17 74 0 1 0 0 0 0 1 ... 0 15 15 15 4 10 17 17
16385 0 17 60 0 1 0 0 0 0 2 ... 0 15 15 15 4 11 17 17
16387 0 29 67 0 1 0 0 0 0 1 ... 0 20 20 20 5 11 23 28
16388 0 53 60 0 1 0 0 0 0 1 ... 5 30 25 30 5 11 26 52
16389 0 65 49 0 1 0 0 0 0 1 ... 30 35 5 27 6 13 16 56
..
..
19369 0 137 77 1 0 1 0 0 0 1 ... 9 10 1 10 6 13 21 130
19370 0 29 60 1 0 0 1 0 0 1 ... 0 15 15 15 3 9 23 23
19371 0 129 78 1 0 0 1 0 0 2 ... 20 25 5 25 7 24 8 129
What can I be doing wrong?

Related

How to return first item when the items in the pandas dataframe window are the same?

I am a python beginner.
I have the following pandas DataFrame, with only two columns; "Time" and "Input".
I want to loop over the "Input" column. Assuming we have a window size w= 3. (three consecutive values) such that for every selected window, we will check if all the items/elements within that window are 1's, then return the first item as 1 and change the remaining values to 0's.
index Time Input
0 11 0
1 22 0
2 33 0
3 44 1
4 55 1
5 66 1
6 77 0
7 88 0
8 99 0
9 1010 0
10 1111 1
11 1212 1
12 1313 1
13 1414 0
14 1515 0
My intended output is as follows
index Time Input What_I_got What_I_Want
0 11 0 0 0
1 22 0 0 0
2 33 0 0 0
3 44 1 1 1
4 55 1 1 0
5 66 1 1 0
6 77 1 1 1
7 88 1 0 0
8 99 1 0 0
9 1010 0 0 0
10 1111 1 1 1
11 1212 1 0 0
12 1313 1 0 0
13 1414 0 0 0
14 1515 0 0 0
What should I do to get the desired output? Am I missing something in my code?
import pandas as pd
import re
pd.Series(list(re.sub('111', '100', ''.join(df.Input.astype(str))))).astype(int)
Out[23]:
0 0
1 0
2 0
3 1
4 0
5 0
6 1
7 0
8 0
9 0
10 1
11 0
12 0
13 0
14 0
dtype: int32

how to shift column labels to left python

I have dataframe i want to move column name to left from specific column. original dataframe have many columns can not do this by rename columns
df=pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'H':[1,3,4,7,8,11,1,15,78,15,16,87],
'N':[1,3,4,98,8,11,1,15,20,15,16,87],
'p':[1,3,4,9,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'y':[0,0,0,0,1,1,1,0,0,0,0,0]})
print((df))
A H N p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here i want to remove label N first dataframe after removing label N
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Rrquired output:
A H P B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here last column can be ignore
Note: in original dataframe have many columns , can not rename columns , so need some auto method to shift column names lef
You can do
df.columns=sorted(df.columns.str.replace('N',''),key=lambda x : x=='')
df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Replace the columns with your own custom list.
>>> cols = list(df.columns)
>>> cols.remove('N')
>>> df.columns = cols + ['']
Output
>>> df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0

Obtaining UNIQUE VALUE occurrences count in a set of COLUMNS using AWK

IGNORING columns 1 & 2 (only the rest of the columns); I would like to obtain the occurrence COUNT of UNIQUE EVEN values (ignoring ODD ones) for the following set of data.
I have tried:
awk '{ a[$3, $4, $5, $6, $7]++ } END { for (b in a) { cnt+=1 } {print cnt}}' file
I obtain 76 but I don’t expect this value.
> 0 0
> 1 0 0
> 2 0 2
> 3 0 0 6
> 4 0 0 8
> 5 0 0 10
> 6 0 2 14
> 7 0 2 16
> 8 0 0 6 20
> 9 0 0 8 24
> 10 0 0 8 26
> 11 0 0 10 32
> 12 0 0 10 34
> 13 0 2 14 40
> 14 0 2 16 42
> 15 0 0 8 24 48
> 16 0 0 8 24 50
> 17 0 0 8 26 56
> 18 0 0 10 32 60
> 19 0 0 10 34 64
> 20 0 0 10 34 66
> 21 0 2 14 40 72
> 22 0 0 8 24 48 76
> 23 0 0 8 24 50 82
> 24 0 0 8 26 56 88
> 25 0 0 8 26 56 90
> 26 0 0 10 32 60 96
> 27 0 0 10 32 60 98
> 28 0 0 10 34 64 104
> 29 0 0 10 34 64 106
> 30 0 0 10 34 66 112
> 31 0 0 10 34 66 114
> 0 1
> 1 1 2 5
> 2 1 2
> 3 1 2 12 23 19
> 4 1 2 12 23
> 5 1 2 12
> 6 1 2 12 28
> 7 1 2 12 28 36
> 8 1 2 12 30 47 45
> 9 1 2 12 30 47
> 10 1 2 12 30
> 11 1 2 12 30 52
> 12 1 2 12 28 38
> 13 1 2 12 28 38 62
> 14 1 2 12 28 38 62 68
> 15 1 2 12 30 54 75
> 16 1 2 12 30 54
> 17 1 2 12 30 54 78
> 18 1 2 12 30 54 78 84
> 19 1 2 12 30 54 78 84 92
> 20 1 2 12 28 38 62 70
> 21 1 2 12 28 38 62 70 108
> 22 1 2 12 30 54 80
> 23 1 2 12 30 54 78 86
> 24 1 2 12 30 54 78 86 120
> 25 1 2 12 30 54 78 84 94
> 26 1 2 12 30 54 78 84 94 124
> 27 1 2 12 30 54 78 84 92 102
> 28 1 2 12 30 54 78 84 92 102 128
> 29 1 2 12 28 38 62 70 110
> 30 1 2 12 28 38 62 70 110 130
> 31 1 2 12 28 38 62 70 108 116
> 0 2
> 1 2 2 5
> 2 2 2
> 3 2 2 5 6
> 4 2 2 5 6 18
> 5 2 2 5 6 18 22
> 6 2 2 14
> 7 2 2 16
> 8 2 2 5 6 20
> 9 2 2 5 6 20 44
> 10 2 2 5 6 18 26
> 11 2 2 5 6 18 22 32
> 12 2 2 5 6 18 22 32 58
> 13 2 2 14 40
> 14 2 2 16 42
> 15 2 2 5 6 20 44 50 75
> 16 2 2 5 6 20 44 50
> 17 2 2 5 6 18 26 56
> 18 2 2 5 6 18 22 32 60
> 19 2 2 14 40 72 109 101
> 20 2 2 14 40 72 109
> 21 2 2 14 40 72
> 22 2 2 5 6 20 44 50 80
> 23 2 2 5 6 20 44 50 80 118
> 24 2 2 5 6 20 44 50 80 118 120
> 25 2 2 5 6 20 44 50 80 118 120 122
> 26 2 2 14 40 72 109 101 102 127
> 27 2 2 14 40 72 109 101 102
> 28 2 2 14 40 72 109 101 104
> 29 2 2 14 40 72 116 133 131
> 30 2 2 14 40 72 116 133
> 31 2 2 14 40 72 116
> 0 3
> 1 3 0
> 2 3 0 4
> 3 3 0 6
> 4 3 0 6 18
> 5 3 0 6 18 22
> 6 3 0 4 16 37
> 7 3 0 4 16
> 8 3 0 6 20
> 9 3 0 6 18 26 47
> 10 3 0 6 18 26
> 11 3 0 6 18 22 32
> 12 3 0 6 18 22 32 58
> 13 3 0 4 16 42 69
> 14 3 0 4 16 42
> 15 3 0 6 18 26 47 48
> 16 3 0 6 18 26 47 48 74
> 17 3 0 6 18 26 56
> 18 3 0 6 18 22 32 60
> 19 3 0 6 18 22 32 58 64
> 20 3 0 6 18 22 32 58 66
> 21 3 0 6 18 22 32 58 66 108
> 22 3 0 6 18 26 47 48 76
> 23 3 0 6 18 26 56 86
> 24 3 0 6 18 26 56 88
> 25 3 0 6 18 26 56 90
> 26 3 0 6 18 22 32 60 96
> 27 3 0 6 18 22 32 60 98
> 28 3 0 6 18 22 32 58 64 104
> 29 3 0 6 18 22 32 58 64 106
> 30 3 0 6 18 22 32 58 66 112
> 31 3 0 6 18 22 32 58 66 114
> 0 4
> 1 4 0
> 2 4 2
> 3 4 0 6
> 4 4 0 8
> 5 4 0 10
> 6 4 2 16 37
> 7 4 2 16
> 8 4 0 8 24 45
> 9 4 0 8 24
> 10 4 0 8 26
> 11 4 0 8 26 52
> 12 4 2 16 37 38
> 13 4 2 16 42 69
> 14 4 2 16 42
> 15 4 0 8 24 48
> 16 4 0 8 24 50
> 17 4 0 8 26 56
> 18 4 0 8 26 52 60
> 19 4 2 16 37 38 64
> 20 4 2 16 42 69 70
> 21 4 2 16 42 69 72
> 22 4 0 8 24 48 76
> 23 4 0 8 24 50 82
> 24 4 0 8 26 56 88
> 25 4 0 8 26 52 60 94
> 26 4 0 8 26 52 60 96
> 27 4 0 8 26 52 60 98
> 28 4 2 16 37 38 64 104
> 29 4 2 16 42 69 70 110
> 30 4 2 16 42 69 70 112
> 31 4 2 16 42 69 70 114
You can try this awk command to count unique values ignoring 1st and 2nd column:
awk '{$1=$2=""; !seen[$0]++} END{print length(seen)}' file
130
If you are counting uniques excluding 1st and 2nd columns and ignoring odd numbers then use:
awk '{for (i=3; i<=NF; i++) !($i%2) && !seen[$i]++} END{print length(seen)}' file
63

Sum the values from the days in a specific week in Excel

So I have some rows of data and some columns with dates.
As you can see on the image below.
I want the sum of the week for each row - but the tricky thing is that not every week is 5 days, so there might be weeks with 3 days. So somehow, I want to try to go for the weeknumber and then sum it.
Can anyone help with me a formular (or a VBA macro)?
I am completely lost after trying several approaches.
18-May-15 19-May-15 20-May-15 21-May-15 22-May-15 25-May-15 26-May-15 27-May-15 28-May-15 29-May-15 1-Jun-15 2-Jun-15 3-Jun-15 4-Jun-15 WEEK 1 TOTAL WEEK 2 TOTAL
33 15 10 19 18 8 10 15 10 29 16 24 8 26 74
18 11 8 17 0 6 16 9 16 16 36 9 6 4 55
0 0 1 0 0 1 0 0 1 0 0 3 3 2 8
30 7 4 8 8 11 10 3 0 11 3 4 5 6 18
0 0 0 11 0 0 0 1 0 7 8 1 1 2 12
1 1 4 0 5 1 6 2 1 4 2 4 5 4 15
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
52 27 22 36 23 15 32 26 27 49 54 37 19 34 144
30 50 25 21 34 12 33 32 26 43 54 43 18 32 147
0 0 1 0 3 0 0 0 0 0 0 0 0 0 0
29 5 3 4 4 1 1 2 4 4 3 4 2 3 12
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 4 1 10 9 0 0 0 0 0 1 1 2
1 2 0 0 0 0 0 1 3 0 0 0 2 2 4
15 29 5 17 16 4 18 20 12 28 25 22 4 23 74
11 15 11 3 15 7 11 9 5 12 18 10 5 7 40
1 0 2 1 1 0 0 1 8 1 4 3 2 0 9
3 6 7 0 2 1 4 2 1 2 7 8 7 2 24
21 21 21 21 21 22 22 22 22 22 23 23 23 23
Using SUMIF is one way. But you need to get your references straight in order to make it easy to enter.
Note in the diagram below, the formula:
=SUMIF(Weeknums,M$1,$B2:$K2)
where weeknums is the row of calculated Week Numbers.
Also note that the column headers showing the Week number to be summed could be made more explanatory with custom formatting:
I know you've already accepted an answer but just to show you:
If you transposed your data you would then be able to utilise the pivot tables
You could set up a calculated field to calculate exactly what you wanted (and depending on how you sorted/grouped the date you could sort this by weeks, months, quarters or even years
You would then get all of your final values displayed in an easy to read format grouped by whatever you want. In my opinion this is a lot more powerful solution for the long run.

Benchmarking CPU and File IO for an application running on Linux

I wrote two programs to run on Linux, each using a different algorithm, and I want to find a way (preferably using a benchmarking software) to compare the CPU usage and IO operations between these two programs.
Is there such a thing? and if yes, where can I find them. Thanks.
You can try hardinfo
Or there are like n different tools measuring system performance if measuring it while running your app solves your purpose
And you can also check this thread
You might try vmstat command:
vmstat 2 20 > vmstat.txt
20 samples of 2 seconds
bi = KB in, bo = KB out with wa = waiting for I/O
I/O can also increase cache demands
%CPU utilisation = us (user) = sy (system)
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 277504 17060 82732 0 0 91 87 1432 236 11 3 84 1
0 0 0 277372 17068 82732 0 0 0 24 1361 399 23 8 59 10
test start
0 1 0 275240 17068 82732 0 0 0 512 1342 305 24 4 69 4
2 1 0 275232 17068 82780 0 0 24 10752 4176 216 7 8 0 85
1 1 0 275240 17076 82732 0 0 12288 2590 5295 243 15 8 0 77
0 1 0 275240 17076 82748 0 0 8 11264 4329 214 6 12 0 82
0 1 0 275240 17076 82780 0 0 16 11264 4278 233 15 10 0 75
0 1 0 275240 17084 82780 0 0 19456 542 6563 255 10 7 0 83
0 1 0 275108 17084 82748 0 0 5128 3072 3501 265 16 37 0 47
3 1 0 275108 17084 82748 0 0 924 5120 8369 3845 12 33 0 55
0 1 0 275116 17092 82748 0 0 1576 85 11483 6645 5 50 0 45
1 1 0 275116 17092 82748 0 0 0 136 2304 689 3 9 0 88
2 1 0 275084 17100 82732 0 0 0 352 2374 800 14 26 0 61
0 0 0 275076 17100 82732 0 0 546 118 2408 1014 35 17 47 1
0 1 0 275076 17104 82732 0 0 0 62 1324 76 3 2 89 7
1 1 0 275076 17108 82732 0 0 0 452 1879 442 8 13 66 12
0 0 0 275116 17108 82732 0 0 800 352 2456 1195 19 17 56 8
0 1 0 275116 17112 82732 0 0 0 54 1325 76 4 1 88 8
test end
1 1 0 275116 17116 82732 0 0 0 510 1717 286 6 10 72 11
1 0 0 275076 17116 82732 0 0 1600 1152 3087 1344 23 29 41 7

Resources