I have a dataset with sample and shop names, and I am trying to calculate the proportion of shops that sell a given combination of samples. For example, Sample12, Sample13 and Sample22 are available in Shop2, Shop3 and Shop4. Likewise, Sample6, Sample7, Sample8, Sample9, Sample10, Sample16 and Sample17 are available in Shop1.
The dataset I have is very large, with 9000 columns and 26 rows; what I show here is just a small subset. I want to screen the table for all combinations of samples present in the same shops (value > 0) and print them out as a dictionary, for example sample12_sample13_sample22: [shop2, shop3, shop4], listing every combination that is available.
Sorry, I could not figure out how to do this, so I do not have any code right now.
What approach should I use here?
Any help is appreciated.
Thanks!
Name Shop1 Shop2 Shop3 Shop4
Sample1 0 0 0 0
Sample2 0 0 0 0
Sample3 0 0 0 0
Sample4 0 0 0 0
Sample5 0 0 0 0
Sample6 1 0 0 0
Sample7 4 0 0 0
Sample8 12 0 0 0
Sample9 1 0 0 0
Sample10 1 0 0 0
Sample11 0 0 0 0
Sample12 0 5 21 233
Sample13 0 8 36 397
Sample14 0 4 0 0
Sample15 0 0 0 0
Sample16 2 0 0 0
Sample17 17 0 0 0
Sample18 0 0 0 0
Sample19 0 0 0 0
Sample20 0 0 0 0
Sample21 0 0 0 0
Sample22 0 1 20 127
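One option is to melt the frame and then group twice: first by sample, to collect the shops where it is sold, and then by that shop list, to collect the samples that share it.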
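# df holds the table above: a 'Name' column plus one column per shop
s = df.melt('Name')        # long format with columns Name, variable (shop), value
s = s[s.value != 0]        # keep only the shops that actually stock the sample
s = s.groupby('Name')['variable'].agg([','.join, 'count'])  # shop list and shop count per sample
out = s[s['count'] > 1].reset_index().groupby('join')['Name'].agg(','.join)  # samples sharing a shop list
out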
Out[104]:
join
Shop2,Shop3,Shop4 Sample12,Sample13,Sample22
Name: Name, dtype: object
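The result above is keyed by the shop list, while the question asks for a dictionary keyed by the sample combination. A minimal sketch of that reshaping follows. It is an alternative that iterates over rows rather than the melt/groupby approach, and the function name `samples_by_shop_set` and the `min_shops` parameter are assumptions made for illustration (`min_shops=2` mirrors the `count > 1` filter).

```python
import pandas as pd

def samples_by_shop_set(df, name_col="Name", min_shops=2):
    """Map 'SampleA_SampleB' -> [shops] for samples sold in exactly the same shops."""
    groups = {}
    for sample, row in df.set_index(name_col).iterrows():
        # Shops (column labels) where this sample has a positive count.
        shops = tuple(row[row > 0].index)
        if len(shops) >= min_shops:
            groups.setdefault(shops, []).append(sample)
    return {"_".join(samples): list(shops) for shops, samples in groups.items()}

# A few rows taken from the example table in the question.
df = pd.DataFrame({
    "Name": ["Sample6", "Sample12", "Sample13", "Sample14", "Sample22"],
    "Shop1": [1, 0, 0, 0, 0],
    "Shop2": [0, 5, 8, 4, 1],
    "Shop3": [0, 21, 36, 0, 20],
    "Shop4": [0, 233, 397, 0, 127],
})
result = samples_by_shop_set(df)
print(result)
```

Passing `min_shops=1` would also keep samples sold in a single shop, such as Sample6 in Shop1.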
I have a bladefs volume, and I just checked /proc/self/mountstats, where I see per-operation statistics:
...
opts: rw,vers=3,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.2.100,mountvers=3,mountport=903,mountproto=tcp,local_lock=all
age: 18129
caps: caps=0x3fc7,wtmult=512,dtsize=32768,bsize=0,namlen=255
sec: flavor=1,pseudoflavor=1
events: 18840 116049 23 5808 22138 21048 146984 13896 287 2181 0 7560 31380 0 9565 5106 0 6471 0 0 13896 0 0 0 0 0 0
bytes: 339548407 48622919 0 0 311167118 48622919 76846 13896
RPC iostats version: 1.0 p/v: 100003/3 (nfs)
xprt: tcp 875 1 7 0 0 85765 85764 1 206637 0 37 1776 35298
per-op statistics
NULL: 0 0 0 0 0 0 0 0
GETATTR: 18840 18840 0 2336164 2110080 92 8027 8817
SETATTR: 0 0 0 0 0 0 0 0
LOOKUP: 21391 21392 0 3877744 4562876 118 103403 105518
ACCESS: 20183 20188 0 2584304 2421960 72 10122 10850
READLINK: 0 0 0 0 0 0 0 0
READ: 3425 3425 0 465848 311606600 340 97323 97924
WRITE: 2422 2422 0 48975488 387520 763 200645 201522
CREATE: 2616 2616 0 447392 701088 21 870 1088
MKDIR: 858 858 0 188760 229944 8 573 705
SYMLINK: 0 0 0 0 0 0 0 0
MKNOD: 0 0 0 0 0 0 0 0
REMOVE: 47 47 0 6440 6768 0 8 76
RMDIR: 23 23 0 4876 3312 0 3 5
RENAME: 23 23 0 7176 5980 0 5 6
LINK: 0 0 0 0 0 0 0 0
READDIR: 160 160 0 23040 4987464 0 16139 16142
READDIRPLUS: 15703 15703 0 2324044 8493604 43 1041634 1041907
FSSTAT: 1 1 0 124 168 0 0 0
FSINFO: 2 2 0 248 328 0 0 0
PATHCONF: 1 1 0 124 140 0 0 0
COMMIT: 68 68 0 9248 10336 2 272 275...
I am interested in the READ operation statistics. As far as I know, the last column (97924) means:
execute: How long ops of this type take to execute (from
rpc_init_task to rpc_exit_task) (microsecond)
How should I interpret this? Is it the average time of each read operation, regardless of the block size? I strongly suspect that I have problems with NFS; am I right? A value of 0.1 sec looks bad to me, but I am not sure how exactly to interpret this time: is it an average, a sum, or something else?
After reading the kernel source: the statistics are printed from rpc_clnt_show_stats() in net/sunrpc/stats.c, and the 8th column of the per-op statistics seems to be printed by _print_rpc_iostats, which prints the om_execute member of struct rpc_iostats. (The newest kernels have 9 columns, with errors in the last column.)
That member appears to be referenced and actually changed only in rpc_count_iostats_metrics, with:
execute = ktime_sub(now, task->tk_start);
op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
Assuming ktime_add does what its name says, the value of om_execute only ever increases. So the 8th column of mountstats would be the cumulative sum of the execution time of all operations of this type, not an average.
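If the column is a running total, a per-operation average can be estimated by dividing it by the operation count in the first column. The sketch below does that for the READ line from the question. The helper name `mean_execute_time` is hypothetical, and the reading of the first field as the operation count is an assumption based on the usual layout of these lines; the result is in whatever time unit the kernel prints for that column.

```python
def mean_execute_time(line):
    """Return (op_name, cumulative execute time / op count) for one per-op line."""
    op, _, rest = line.partition(":")
    fields = [int(x) for x in rest.split()]
    ops = fields[0]            # assumed: number of operations of this type
    total_execute = fields[7]  # 8th column: cumulative om_execute
    mean = total_execute / ops if ops else 0.0
    return op.strip(), mean

op, mean = mean_execute_time("READ: 3425 3425 0 465848 311606600 340 97323 97924")
print(op, round(mean, 2))  # READ 28.59
```

Because the counters only grow, sampling the file twice and dividing the difference in the 8th column by the difference in the first gives the average over that interval rather than since mount time.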
I have data in the following layout (each 'digit' in its own column but all in Row 1 of Excel):
1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7
How might I easily convert this to:
1 2 3 4
5 6 7 8
9 1 2 3
4 5 6 7
I want every four pieces of data to be added, in separate columns, to new rows underneath one another.
If your data starts in A1, please try in A3, copied across to D and down to suit:
=OFFSET($A$1,,COLUMN()+4*(ROW()-2)-5)
This is already answered, but just for fun, here's a version that's non-volatile and doesn't need to be placed in a particular location:
=INDEX($A$1:$P$1,4*(ROW($A$1:$D$4)-1)+COLUMN($A$1:$D$4))
Replace the range $A$1:$P$1 above with whatever range you want to rearrange, and enter as an array formula. (Select the entire 4x4 range, type the formula, and press Ctrl+Shift+Enter when you're finished rather than just Enter.) Note that $A$1:$D$4 is a fixed reference used only to generate the row and column numbers 1 to 4; the contents of those cells do not matter.
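The index arithmetic in that formula, 4*(row-1)+column with 1-based row and column numbers, can be checked outside Excel. Here is a small Python sketch of the same mapping; the function name `reshape_row` and the `width` parameter are assumptions made for illustration.

```python
def reshape_row(values, width=4):
    """Rearrange a flat list into rows of `width`, using the 1-based
    position width*(r-1)+c, as in the INDEX formula."""
    rows = len(values) // width
    return [
        [values[width * (r - 1) + c - 1] for c in range(1, width + 1)]
        for r in range(1, rows + 1)
    ]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7]
grid = reshape_row(data)
for line in grid:
    print(*line)
```

For the example row this prints the four rows requested in the question.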
One more way - a shorter formula but requiring some setup:
Create a named range, let's call it Matrix1:
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
You can put the matrix on another, possibly hidden, worksheet, or you can just use an array constant:
={1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0;0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0;0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1}
And another, let's call it Matrix2:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
or:
={1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1}
Then the formula is just:
=MMULT(Matrix1*$A$1:$P$1,Matrix2)
Again, select the entire 4x4 range and enter as an array formula.
You could, of course, enter the array constants into the formula directly, but that gets unwieldy:
=MMULT({1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0;0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0;0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0;0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1}*$A$1:$P$1,{1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1;1,0,0,0;0,1,0,0;0,0,1,0;0,0,0,1})
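To see why the matrix product rearranges the data, here is a plain-Python sketch of the same computation. Matrix1 times the data row keeps one block of four values per output row, and multiplying by Matrix2 then drops each kept value into its column. The function name `mmult_reshape` is an assumption made for illustration.

```python
def mmult_reshape(data):
    """Reproduce MMULT(Matrix1 * data_row, Matrix2) for 16 values -> 4x4 grid."""
    n, w = 16, 4
    # Matrix1 (4x16): row i has ones in columns 4*i .. 4*i+3.
    matrix1 = [[1 if k // w == i else 0 for k in range(n)] for i in range(w)]
    # Matrix2 (16x4): row k has a single one in column k mod 4.
    matrix2 = [[1 if k % w == j else 0 for j in range(w)] for k in range(n)]
    # Element-wise product of each Matrix1 row with the data row.
    masked = [[matrix1[i][k] * data[k] for k in range(n)] for i in range(w)]
    # Ordinary matrix product (4x16) x (16x4).
    return [
        [sum(masked[i][k] * matrix2[k][j] for k in range(n)) for j in range(w)]
        for i in range(w)
    ]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7]
grid = mmult_reshape(data)
for line in grid:
    print(*line)
```

Each output cell (i, j) receives exactly one nonzero term, the value at 0-based position 4*i+j, which is also why this approach only works for numeric data.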
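I found this formula much easier to deal with. As written it reads the data from column A; for data laid out across row 1, as in the question, replace $A:$A with $1:$1.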
=INDEX($A:$A,ROW(A1)*4-4+COLUMN(A1))
For details:
https://www.extendoffice.com/documents/excel/3360-excel-transpose-every-5-rows.html