Sumproduct multiples of the same lookup

I have 2 data sets. First, a master table that displays and sums all of the information from the reference tables. The master table looks like this.
BayNum NumCompleted
102
103
104
105
The reference table is a running timeline with indicator variables for whether or not something was completed at various time intervals.
BayNum 1030 1100 1130 1200 1230
102 1 0 1 0 0
102 0 0 1 0 1
102 1 0 0 1 0
102 0 0 0 0 1
103 0 1 1 1 0
103 1 0 0 0 1
103 1 0 1 1 1
104 1 0 0 0 1
104 0 0 1 0 1
104 1 0 0 1 0
104 1 0 0 0 1
104 1 0 0 0 1
105 1 0 1 0 0
105 0 1 1 1 0
105 0 0 0 0 1
I would like the NumCompleted column in the master table to sum all of the records that have the same bay number.
I think that there is some sort of sumproduct way to go about this but I don't understand arrays very well so I am having trouble visualizing how this works in my head.
I tried this formula
=SUMPRODUCT(INDEX(TPH!H2:NC166,MATCH('Post Observations'!$G$2,TPH!$F$2:$F$166,0)))
But this returns a reference error, I think because INDEX can only work through a column rather than a full array. Would I instead have to do something with INDEX/SMALL so that it runs through the full list? I've done something like that before, but I don't know if it would apply here.
Per the example above, I would expect my master table to look like this.
BayNum NumCompleted
102 7
103 9
104 10
105 6

You can use SUMPRODUCT to multiply each cell in the range by whether the "BayNum" matches (1 if it does, 0 if not), then sum all the results. Adjust the ranges so that they cover your own data:
=SUMPRODUCT(($B$2:$F$8)*($A$2:$A$8=$H2))
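
To help visualize what the array multiplication is doing, here is a minimal Python sketch of the same idea. The rows list copies the reference table from the question; the function name num_completed is only illustrative, and this is a model of the formula rather than anything Excel runs internally.

```python
# Reference table from the question: (BayNum, flags for 1030..1230).
rows = [
    (102, [1, 0, 1, 0, 0]), (102, [0, 0, 1, 0, 1]),
    (102, [1, 0, 0, 1, 0]), (102, [0, 0, 0, 0, 1]),
    (103, [0, 1, 1, 1, 0]), (103, [1, 0, 0, 0, 1]),
    (103, [1, 0, 1, 1, 1]),
    (104, [1, 0, 0, 0, 1]), (104, [0, 0, 1, 0, 1]),
    (104, [1, 0, 0, 1, 0]), (104, [1, 0, 0, 0, 1]),
    (104, [1, 0, 0, 0, 1]),
    (105, [1, 0, 1, 0, 0]), (105, [0, 1, 1, 1, 0]),
    (105, [0, 0, 0, 0, 1]),
]

def num_completed(rows, bay):
    """Multiply every cell by 1 if its row matches `bay`, else 0, then sum."""
    return sum(flag * (row_bay == bay) for row_bay, flags in rows for flag in flags)

totals = {bay: num_completed(rows, bay) for bay in (102, 103, 104, 105)}
print(totals)  # {102: 7, 103: 9, 104: 10, 105: 6}
```

The comparison row_bay == bay plays the role of ($A$2:$A$8=$H2): it is 1 for matching rows and 0 otherwise, so non-matching rows contribute nothing to the sum.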

Related

How to return first item when the items in the pandas dataframe window are the same?

I am a python beginner.
I have the following pandas DataFrame, with only two columns; "Time" and "Input".
I want to loop over the "Input" column with a window size w = 3 (three consecutive values). For every window, if all the elements in it are 1s, I want to keep the first element as 1 and change the remaining values to 0s.
index Time Input
0 11 0
1 22 0
2 33 0
3 44 1
4 55 1
5 66 1
6 77 0
7 88 0
8 99 0
9 1010 0
10 1111 1
11 1212 1
12 1313 1
13 1414 0
14 1515 0
My intended output is as follows
index Time Input What_I_got What_I_Want
0 11 0 0 0
1 22 0 0 0
2 33 0 0 0
3 44 1 1 1
4 55 1 1 0
5 66 1 1 0
6 77 1 1 1
7 88 1 0 0
8 99 1 0 0
9 1010 0 0 0
10 1111 1 1 1
11 1212 1 0 0
12 1313 1 0 0
13 1414 0 0 0
14 1515 0 0 0
What should I do to get the desired output? Am I missing something in my code?

import pandas as pd
import re
# Join the flags into one string, rewrite every '111' as '100', then split back into ints
pd.Series(list(re.sub('111', '100', ''.join(df.Input.astype(str))))).astype(int)
Out[23]:
0 0
1 0
2 0
3 1
4 0
5 0
6 1
7 0
8 0
9 0
10 1
11 0
12 0
13 0
14 0
dtype: int32
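
The regex trick above can be sketched on a plain list to show what happens for an arbitrary window size. This is an illustrative sketch only: the function name collapse_runs and the parameter w are assumptions, not part of the original answer, and it assumes the flags are only ever 0 or 1.

```python
import re

def collapse_runs(flags, w=3):
    """Replace each non-overlapping window of w ones with a 1 followed by zeros."""
    text = "".join(str(int(f)) for f in flags)
    # re.sub scans left to right and never reuses characters of an earlier match.
    collapsed = re.sub("1" * w, "1" + "0" * (w - 1), text)
    return [int(ch) for ch in collapsed]

# Input column as it appears in the intended-output table of the question.
flags = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
print(collapse_runs(flags))
```

Because matches do not overlap, a run of six 1s is treated as two separate windows, and leftover 1s at the end of a run that do not fill a whole window are left unchanged.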

how to find combinations present in different columns

I have a dataset with sample and shop names. I am trying to figure out a way to calculate the proportion of shops that sell a combination of samples. For example, samples 12, 13 and 22 are available in shops 2, 3 and 4. Likewise, samples 6, 7, 8, 9, 10, 16 and 17 are available in shop 1.
The dataset I have is very large, with 9000 columns and 26 rows; what I show here is just a small sample. What I want is a way to screen the table for all combinations of samples present in shops (count > 0) and print them out in a dictionary, for example sample12_sample13_sample22: [shop2, shop3, shop4], listing all the combinations that are available.
Sorry, I could not figure out how to do this, so I do not have any code right now.
What approach should I use here?
Any help is appreciated.
Thanks!
Name Shop1 Shop2 Shop3 Shop4
Sample1 0 0 0 0
Sample2 0 0 0 0
Sample3 0 0 0 0
Sample4 0 0 0 0
Sample5 0 0 0 0
Sample6 1 0 0 0
Sample7 4 0 0 0
Sample8 12 0 0 0
Sample9 1 0 0 0
Sample10 1 0 0 0
Sample11 0 0 0 0
Sample12 0 5 21 233
Sample13 0 8 36 397
Sample14 0 4 0 0
Sample15 0 0 0 0
Sample16 2 0 0 0
Sample17 17 0 0 0
Sample18 0 0 0 0
Sample19 0 0 0 0
Sample20 0 0 0 0
Sample21 0 0 0 0
Sample22 0 1 20 127

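We can melt the frame and then group by twice:
# long format: one row per (Name, shop) pair
s = df.melt('Name')
# keep only the pairs where the sample is actually sold
s = s[s.value!=0]
# per sample: the comma-joined shop list and the number of shops
s = s.groupby('Name')['variable'].agg([','.join,'count'])
# keep samples sold in more than one shop, then group them by shop list
out = s[s['count']>1].reset_index().groupby('join')['Name'].agg(','.join)
out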
Out[104]:
join
Shop2,Shop3,Shop4 Sample12,Sample13,Sample22
Name: Name, dtype: object
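
The same grouping can be sketched without pandas, which may make the logic easier to follow. This is a minimal sketch under assumptions: the table is held as a dict of dicts, the function name group_by_shops and the min_shops parameter are hypothetical, and only a few rows of the question's data are copied in.

```python
from collections import defaultdict

def group_by_shops(table, min_shops=1):
    """Map each distinct set of shops to the samples sold in exactly those shops."""
    groups = defaultdict(list)
    for name, counts in table.items():
        # Shops where this sample has a non-zero count, in column order.
        shops = tuple(shop for shop, n in counts.items() if n > 0)
        if len(shops) >= min_shops:
            groups[shops].append(name)
    return dict(groups)

table = {
    "Sample1":  {"Shop1": 0, "Shop2": 0, "Shop3": 0,  "Shop4": 0},
    "Sample6":  {"Shop1": 1, "Shop2": 0, "Shop3": 0,  "Shop4": 0},
    "Sample12": {"Shop1": 0, "Shop2": 5, "Shop3": 21, "Shop4": 233},
    "Sample13": {"Shop1": 0, "Shop2": 8, "Shop3": 36, "Shop4": 397},
    "Sample14": {"Shop1": 0, "Shop2": 4, "Shop3": 0,  "Shop4": 0},
    "Sample22": {"Shop1": 0, "Shop2": 1, "Shop3": 20, "Shop4": 127},
}
groups = group_by_shops(table)
print(groups)
```

With min_shops=2 the result matches the filter s['count']>1 in the pandas answer; with the default of 1, samples sold in a single shop (such as the Shop1 group mentioned in the question) are kept as well.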

How to interpret such value of the time column in /proc/self/mountstats - does it indicate a performance issue?

I have some bladefs volume and I just checked /proc/self/mountstats where I see statistics per operations:
...
opts: rw,vers=3,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.2.100,mountvers=3,mountport=903,mountproto=tcp,local_lock=all
age: 18129
caps: caps=0x3fc7,wtmult=512,dtsize=32768,bsize=0,namlen=255
sec: flavor=1,pseudoflavor=1
events: 18840 116049 23 5808 22138 21048 146984 13896 287 2181 0 7560 31380 0 9565 5106 0 6471 0 0 13896 0 0 0 0 0 0
bytes: 339548407 48622919 0 0 311167118 48622919 76846 13896
RPC iostats version: 1.0 p/v: 100003/3 (nfs)
xprt: tcp 875 1 7 0 0 85765 85764 1 206637 0 37 1776 35298
per-op statistics
NULL: 0 0 0 0 0 0 0 0
GETATTR: 18840 18840 0 2336164 2110080 92 8027 8817
SETATTR: 0 0 0 0 0 0 0 0
LOOKUP: 21391 21392 0 3877744 4562876 118 103403 105518
ACCESS: 20183 20188 0 2584304 2421960 72 10122 10850
READLINK: 0 0 0 0 0 0 0 0
READ: 3425 3425 0 465848 311606600 340 97323 97924
WRITE: 2422 2422 0 48975488 387520 763 200645 201522
CREATE: 2616 2616 0 447392 701088 21 870 1088
MKDIR: 858 858 0 188760 229944 8 573 705
SYMLINK: 0 0 0 0 0 0 0 0
MKNOD: 0 0 0 0 0 0 0 0
REMOVE: 47 47 0 6440 6768 0 8 76
RMDIR: 23 23 0 4876 3312 0 3 5
RENAME: 23 23 0 7176 5980 0 5 6
LINK: 0 0 0 0 0 0 0 0
READDIR: 160 160 0 23040 4987464 0 16139 16142
READDIRPLUS: 15703 15703 0 2324044 8493604 43 1041634 1041907
FSSTAT: 1 1 0 124 168 0 0 0
FSINFO: 2 2 0 248 328 0 0 0
PATHCONF: 1 1 0 124 140 0 0 0
COMMIT: 68 68 0 9248 10336 2 272 275...
I am interested in the READ operation statistics for my bladefs volume. As far as I know, the last column (97924) means:
execute: How long ops of this type take to execute (from
rpc_init_task to rpc_exit_task) (microsecond)
How should I interpret this? Is it the average time of each read operation, regardless of the block size? I strongly suspect that I have problems with NFS: am I right? A value of 0.1 sec looks bad to me, but I am not sure how exactly to interpret this time: is it an average, a sum, or something else?

After reading the kernel source: the statistics are printed by rpc_clnt_show_stats() in net/sunrpc/stats.c, and the 8th column of the per-op statistics seems to be printed by _print_rpc_iostats, which prints the om_execute member of struct rpc_iostats. (The newest kernel has 9 columns, with errors in the last column.)
That member appears to be modified only in rpc_count_iostats_metrics, with:
execute = ktime_sub(now, task->tk_start);
op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
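Assuming ktime_add does what it says, the value of om_execute only increases. So the 8th column of mountstats would be the sum of the time of all operations of this type, not an average. To get an average per operation, divide it by the operation count in the first column.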
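
Since the column is cumulative, a per-operation average can be derived by dividing by the op count. Below is a small sketch that does this for one per-op line. The function name per_op_averages and the labels queue, rtt and execute are my own names for the 6th, 7th and 8th numeric columns; the result is in whatever unit the kernel reports for those columns, which is worth checking against your kernel version.

```python
def per_op_averages(line):
    """Parse one per-op line of mountstats and return per-operation averages.

    Assumes the 8-column layout shown in the question, where the first
    number is the operation count and the last three are cumulative times.
    """
    name, _, rest = line.partition(":")
    fields = [int(x) for x in rest.split()]
    ops, queue, rtt, execute = fields[0], fields[5], fields[6], fields[7]
    if ops == 0:
        return name.strip(), None  # nothing to average
    return name.strip(), {
        "queue": queue / ops,
        "rtt": rtt / ops,
        "execute": execute / ops,
    }

name, avg = per_op_averages("READ: 3425 3425 0 465848 311606600 340 97323 97924")
print(name, avg)
```

For the READ line in the question this divides 97924 by 3425 operations, so the per-operation figure is far smaller than the raw column suggests.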

How to switch 1 (ON) flags occurring together in batch of size more than a specified threshold to 0 in pandas dataframe?

A flag column in a pandas dataframe is populated by 1 or 0.
The problem is to identify runs of consecutive 1s.
Let t be the threshold, in days.
There are two types of transformations required:
i) If there are more than t 1s together, turn the (t+1)th 1 onwards to 0
ii) If there are more than t 1s together, turn all the 1s to 0s
My approach is to create 2 columns called result1 and result2, and filter using these columns (the original post illustrated this with an image).
I have not been able to think of anything as such, so I am not posting any code.
A nudge or hint in the right direction would be appreciated.

Use:
import numpy as np
#mask of rows where Value is 0
m = df['Value'].eq(0)
#cumulative sum of the mask labels each run of 1s; keep only the 1 rows
g = m.cumsum()[~m]
#0 for zero rows, otherwise a running counter within each run of 1s
df['Result1'] = np.where(m, 0, df.groupby(g).cumcount().add(1))
#0 for zero rows, otherwise the length of the run (max counter per group)
df['Result2'] = np.where(m, 0, df.groupby(g)['Result1'].transform('max')).astype(int)
print (df)
Value Result1 Result2
0 1 1 1
1 0 0 0
2 0 0 0
3 1 1 2
4 1 2 2
5 0 0 0
6 1 1 4
7 1 2 4
8 1 3 4
9 1 4 4
10 0 0 0
11 0 0 0
12 1 1 1
13 0 0 0
14 1 1 1
15 0 0 0
16 0 0 0
17 1 1 6
18 1 2 6
19 1 3 6
20 1 4 6
21 1 5 6
22 1 6 6
23 0 0 0
24 1 1 1
25 0 0 0
26 0 0 0
27 1 1 1
28 0 0 0
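
The answer stops at the two helper columns. In pandas terms, transformation (i) would then keep a 1 only where Result1 <= t, and transformation (ii) only where Result2 <= t. The sketch below shows the same two rules on a plain list, assuming the flags are only 0 or 1; the function name apply_threshold is hypothetical and not from the original answer.

```python
from itertools import groupby

def apply_threshold(flags, t):
    """Return (capped, dropped) versions of a 0/1 flag list.

    capped:  in runs of more than t ones, the (t+1)th one onwards becomes 0.
    dropped: runs of more than t ones become all zeros.
    """
    capped, dropped = [], []
    for value, run in groupby(flags):
        n = len(list(run))  # length of this run of equal values
        if value == 1 and n > t:
            capped.extend([1] * t + [0] * (n - t))
            dropped.extend([0] * n)
        else:
            capped.extend([value] * n)
            dropped.extend([value] * n)
    return capped, dropped

capped, dropped = apply_threshold([1, 0, 1, 1, 1, 1, 0, 1, 1], t=2)
print(capped)   # [1, 0, 1, 1, 0, 0, 0, 1, 1]
print(dropped)  # [1, 0, 0, 0, 0, 0, 0, 1, 1]
```

Here the position of a 1 within its run corresponds to Result1 and the run length n corresponds to Result2.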

Transpose row four columns at a time (row to 4-column matrix)

I have data in the following layout (each 'digit' in its own column but all in Row 1 of Excel):
1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7
How might I easily convert this to:
1 2 3 4
5 6 7 8
9 1 2 3
4 5 6 7
I want every four pieces of data to be added, in separate columns, to new rows underneath one another.

If your data starts in A1, please try in A3, copied across to D and down to suit:
=OFFSET($A$1,,COLUMN()+4*(ROW()-2)-5)

This is already answered, but just for fun, here's a version that's non-volatile and doesn't need to be placed in a particular location:
=INDEX($A$1:$P$1,4*(ROW($A$1:$D$4)-1)+COLUMN($A$1:$D$4))
Replace the range $A$1:$P$1 above with whatever range you want to rearrange, and enter as an array formula. (Select the entire 4x4 range, type the formula, and press ctrl-shift-enter when you're finished rather than just enter.) Note that $A$1:$D$4 is a constant, used to obtain the sequential numbers.
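
The index arithmetic in the INDEX formula, 4*(row-1)+column with 1-based positions, can be sketched in Python to check that it picks the right element. The function name reshape_row is illustrative only; Python lists are 0-based, so the expression becomes width*r + c.

```python
def reshape_row(values, width=4):
    """Lay a flat row out as rows of `width` items, using the same
    position arithmetic as the INDEX formula (0-based here)."""
    rows = len(values) // width
    return [[values[width * r + c] for c in range(width)] for r in range(rows)]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7]
grid = reshape_row(data)
for row in grid:
    print(row)
```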
One more way - a shorter formula but requiring some setup:
Create a named range, let's call it Matrix1:
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
You can put the matrix on another, possibly hidden, worksheet, or you can just use an array constant:
={1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0;0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0;0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1}
And another, let's call it Matrix2:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
or:
={1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1}
Then the formula is just:
=MMULT(Matrix1*$A$2:$P$2,Matrix2)
Again, select the entire 4x4 range and enter as an array formula.
You could, of course, enter the array constants into the formula directly, but that gets unwieldy:
=MMULT({1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0;0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0;0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1}*$A$2:$P$2,{1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1})
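
To see why the two matrices do the job, here is a small pure-Python sketch of the MMULT approach. Matrix1 selects which block of four values belongs to each output row, and Matrix2 selects the position within a block for each output column. The helper matmul and the list comprehensions that build the matrices are my own illustration, not part of the answer.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7]

# Matrix1 (4x16): row i has ones over the i-th block of four columns.
matrix1 = [[1 if k // 4 == i else 0 for k in range(16)] for i in range(4)]
# Matrix2 (16x4): row k has a single one in column k mod 4.
matrix2 = [[1 if k % 4 == j else 0 for j in range(4)] for k in range(16)]

# Matrix1*$A$2:$P$2 multiplies every row of Matrix1 by the data row.
masked = [[m * d for m, d in zip(row, data)] for row in matrix1]
result = matmul(masked, matrix2)
for row in result:
    print(row)
```

Each output cell (i, j) ends up as a sum with exactly one non-zero term, the value at position 4*i + j of the data row.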

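I found this formula much easier to deal with. As written it reads from column A; for data laid out across row 1, as in the question, swap $A:$A for $1:$1 and enter the formula somewhere below row 1.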
=INDEX($A:$A,ROW(A1)*4-4+COLUMN(A1))
For details:
https://www.extendoffice.com/documents/excel/3360-excel-transpose-every-5-rows.html
