all subsets of lists in KDB+

I am new to KDB+. I have a table in the following format:
id date name order
34 2020.01.20 John 10
23 2020.01.20 John -20
21 2020.01.20 John 30
43 2020.01.20 John -400
44 2020.01.20 Dan -6483
22 2020.01.20 Dan 8796
The sample table can be created as follows:
t:([]id:34 23 21 43 44 22; date:6#2020.01.20; name:`John`John`John`John`Dan`Dan; order:10 -20 30 -400 -6483 8796);
I want every possible subset of orders for a given date and name, in the format below. In the output, the order column is the sum of the order value for id plus the order values of all the ids listed in the ids column.
id date name order ids
34 2020.01.20 John 10 0n
34 2020.01.20 John -10 23
34 2020.01.20 John 40 21
34 2020.01.20 John -390 43
34 2020.01.20 John 20 23, 21
34 2020.01.20 John -360 21, 43
34 2020.01.20 John -380 23, 21, 43
23 2020.01.20 John -20 0n
23 2020.01.20 John -10 34
23 2020.01.20 John 10 21
23 2020.01.20 John -420 43
23 2020.01.20 John 20 34, 21
23 2020.01.20 John -390 21, 43
23 2020.01.20 John -380 34, 21, 43
21 2020.01.20 John 30 0n
21 2020.01.20 John 40 34
21 2020.01.20 John 10 23
21 2020.01.20 John -370 43
21 2020.01.20 John 20 34, 23
21 2020.01.20 John -390 23, 43
21 2020.01.20 John -380 34, 23, 43
43 2020.01.20 John -400 0n
43 2020.01.20 John -390 34
43 2020.01.20 John -420 23
43 2020.01.20 John -370 21
43 2020.01.20 John -410 34, 23
43 2020.01.20 John -390 23, 21
43 2020.01.20 John -380 34, 23, 21
44 2020.01.20 Dan -6483 0n
44 2020.01.20 Dan 2313 22
22 2020.01.20 Dan 8796 0n
22 2020.01.20 Dan 2313 44

I am not sure this is the most efficient solution, but the snippet below does what you are looking for:
orderMap: (!) . t`id`order;
subsets: ungroup
  update ids: {x where each (count[x]-1){x cross 01b}/01b} each ids from
  select ids: id by name from t;
t: ej[`name;t;subsets];
t: delete from t where id in' ids;
t: update order: order + sum each orderMap#/:ids from t;
t
For consistency, the ids column is a list of integer lists, and the empty list `long$() is used instead of 0n.
In more detail:
orderMap: (!) . t`id`order gives the id-to-order mapping. I assume here that ids are unique.
subsets is a table of names and the id subsets assigned to each name. {(count[x]-1){x cross 01b}/01b} returns the "include" flags used to form the subsets, e.g. 0000b, 1000b, 0100b, .... This could be done more efficiently with the binary representation of integers.
ej[`name;t;subsets] joins the original table with the id subsets by name.
delete from t where id in' ids deletes the rows where id is itself included in the ids subset.
update order: order + sum each orderMap#/:ids from t adds the orders of the ids subset to the order of id, using orderMap.
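For readers who do not know q, the steps above can be sketched in Python. This is only an illustration of the same idea, not a translation of the answer's code: `rows`, `order_of` and `subset_sums` are hypothetical names, the date column is dropped because it is identical for every sample row, and grouping is by name only (as in the q snippet).

```python
from itertools import combinations

# Sample data from the question as (id, name, order); the date is the same everywhere.
rows = [(34, "John", 10), (23, "John", -20), (21, "John", 30),
        (43, "John", -400), (44, "Dan", -6483), (22, "Dan", 8796)]

# id -> order lookup, playing the role of orderMap (assumes ids are unique).
order_of = {i: o for i, _, o in rows}

def subset_sums(rows):
    """Yield (id, name, total, ids) for every subset of the other ids with the same name."""
    for i, name, order in rows:
        others = [j for j, n, _ in rows if n == name and j != i]
        for k in range(len(others) + 1):
            for ids in combinations(others, k):
                yield i, name, order + sum(order_of[j] for j in ids), ids

result = list(subset_sums(rows))
for row in result[:4]:
    print(row)
```

Each id with n-1 siblings produces 2^(n-1) rows, so the output grows exponentially with the group size; that is inherent to the problem, not to either implementation.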

This gets you part of the way, though this approach has an extra combination that you seem to exclude (you can exclude these afterwards if need be):
q)comb:{$[type b:(count[a:x except y]-1)(01b cross)/01b;(`long$();a);a where each b]};
q)update sum each order from ungroup ungroup select id,order:(order,/:'order i?comb[i]each i),ids:id i?comb[i]each i by date,name from t
date name id order ids
---------------------------------
2020.01.20 Dan 44 -6483 `long$()
2020.01.20 Dan 44 2313 ,22
2020.01.20 Dan 22 8796 `long$()
2020.01.20 Dan 22 2313 ,44
2020.01.20 John 34 10 `long$()
2020.01.20 John 34 -390 ,43
2020.01.20 John 34 40 ,21
2020.01.20 John 34 -360 21 43
2020.01.20 John 34 -10 ,23
2020.01.20 John 34 -410 23 43
2020.01.20 John 34 20 23 21
2020.01.20 John 34 -380 23 21 43
2020.01.20 John 23 -20 `long$()
2020.01.20 John 23 -420 ,43
...

Related

Horizontal SUMIFS with two vertical criteria

I am given the following sales table, which lists the sales each employee made. Instead of the employee's name it has their ID, and each ID may have more than one row.
To map the ID back to the name, I have a look up table with each employee's name and ID.
Sales Table:
Year ID North South West East
2020 A 58 30 74 72
2020 A 85 40 90 79
2020 B 9 82 20 5
2020 B 77 13 49 21
2020 C 85 55 37 11
2020 C 29 70 21 22
2021 A 61 37 21 42
2021 A 22 39 2 34
2021 B 62 55 9 72
2021 B 59 11 2 37
2021 C 41 22 64 47
2021 C 83 18 56 83
ID table:
ID Name
A Allison
B Brandon
C Chris
I am trying to sum up each employee's sales by a given year, and aggregate all their transactions by their name (rather than ID), so that my result looks like the following:
Result:
Report 2021
Allison 258
Brandon 307
Chris 414
I want the user to be able to select the year, and the report would automatically sum up each person's sales by the year and their name.
Any ideas on how I can accomplish this?

With FILTER:
=SUM(FILTER($C$2:$F$13,($B$2:$B$13=INDEX($I$2:$I$4,MATCH(N3,$J$2:$J$4,0)))*($A$2:$A$13=$N$2)))
With SUMPRODUCT:
=SUMPRODUCT($C$2:$F$13*($B$2:$B$13=INDEX($I$2:$I$4,MATCH(N3,$J$2:$J$4,0)))*($A$2:$A$13=$N$2))
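Both formulas appear to assume the sales table in A2:F13, the ID table with IDs in column I and names in column J, the selected year in N2 and the employee name in N3. The logic they implement (look up the ID for a name, then sum all four region columns for rows matching that ID and year) can be sketched in Python as follows; `sales`, `names` and `report` are hypothetical names used only for this illustration.

```python
# Sales rows from the question: (year, id, north, south, west, east).
sales = [
    (2020, "A", 58, 30, 74, 72), (2020, "A", 85, 40, 90, 79),
    (2020, "B", 9, 82, 20, 5),   (2020, "B", 77, 13, 49, 21),
    (2020, "C", 85, 55, 37, 11), (2020, "C", 29, 70, 21, 22),
    (2021, "A", 61, 37, 21, 42), (2021, "A", 22, 39, 2, 34),
    (2021, "B", 62, 55, 9, 72),  (2021, "B", 59, 11, 2, 37),
    (2021, "C", 41, 22, 64, 47), (2021, "C", 83, 18, 56, 83),
]
# ID table from the question.
names = {"A": "Allison", "B": "Brandon", "C": "Chris"}

def report(year):
    """Total sales across all regions per employee name for one year."""
    totals = {name: 0 for name in names.values()}
    for row_year, emp_id, *regions in sales:
        if row_year == year:
            totals[names[emp_id]] += sum(regions)
    return totals

print(report(2021))
```

For 2021 this gives the same totals as the Result table in the question.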

How to compare two dataframes based on certain column values and remove them in pandas

I have two data frames.
df1:
userID ID Sex Date Month Year Security
John 45 Male 31 03 1975 Low
Tom 22 Male 01 01 1990 High
Mary 33 Female 23 05 1990 Medium
Hary 56 Male 15 09 1970 High
df2:
userID ID Sex Date Month Year
Hari 45 Male 31 03 1975
Luka 22 Male 01 01 1990
Johan 33 Female 23 05 1990
Irfan 56 Male 29 09 1971
John 45 Male 31 03 1975
Tom 22 Male 01 01 1990
Mary 34 Female 34 05 1980
Hary 56 Male 15 09 1970
I want to compare df2 with df1 and keep only those rows of df2 whose values in the columns (userID, ID, Date, Month, Year) also appear in df1.
So my new df2 should look like this:
John 45 Male 31 03 1975
Tom 22 Male 01 01 1990
Hary 56 Male 15 09 1970
What is the best approach to do this in pandas?
Can someone help me with this?

A simple left merge followed by dropna does this (rows of df2 without a match in df1 get NaN in Security and are dropped):
df2.merge(df1, how='left').dropna().drop(columns='Security')
Out[318]:
userID ID Sex Date Month Year
4 John 45 Male 31 3 1975
5 Tom 22 Male 1 1 1990
7 Hary 56 Male 15 9 1970

Define the key columns which you want to merge on, and then perform an inner merge between df2 and only the key columns of df1. The default for merge is inner, so you don't need to specify it explicitly. Subsetting df1 to only these key columns ensures that you don't bring any of its columns over to df2 with the merge.
key_cols = ['userID', 'ID', 'Date', 'Month', 'Year']
df2.merge(df1.loc[:, df1.columns.isin(key_cols)])
Outputs:
userID ID Sex Date Month Year
0 John 45 Male 31 3 1975
1 Tom 22 Male 1 1 1990
2 Hary 56 Male 15 9 1970
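The key-column idea does not depend on pandas: it amounts to building the set of key tuples found in df1 and keeping the df2 rows whose key is in that set. A minimal sketch with plain Python data follows; `columns`, `df1_rows`, `df2_rows` and `key` are hypothetical names, and Date and Month are stored as integers here.

```python
columns = ["userID", "ID", "Sex", "Date", "Month", "Year"]
key_cols = ["userID", "ID", "Date", "Month", "Year"]

# Rows of df1 (Security column omitted, since it is not part of the key).
df1_rows = [
    ("John", 45, "Male", 31, 3, 1975), ("Tom", 22, "Male", 1, 1, 1990),
    ("Mary", 33, "Female", 23, 5, 1990), ("Hary", 56, "Male", 15, 9, 1970),
]
df2_rows = [
    ("Hari", 45, "Male", 31, 3, 1975), ("Luka", 22, "Male", 1, 1, 1990),
    ("Johan", 33, "Female", 23, 5, 1990), ("Irfan", 56, "Male", 29, 9, 1971),
    ("John", 45, "Male", 31, 3, 1975), ("Tom", 22, "Male", 1, 1, 1990),
    ("Mary", 34, "Female", 34, 5, 1980), ("Hary", 56, "Male", 15, 9, 1970),
]

key_idx = [columns.index(c) for c in key_cols]

def key(row):
    """Project a row onto the key columns."""
    return tuple(row[i] for i in key_idx)

wanted = {key(r) for r in df1_rows}               # keys present in df1
kept = [r for r in df2_rows if key(r) in wanted]  # df2 rows with a matching key

for r in kept:
    print(*r)
```

Unlike a merge, this filter never duplicates a df2 row when df1 contains the same key more than once.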

Excel: I want to SUM the top 2 values IF a cell in the row matches a string

I've been trying to figure out how to SUM the top 2 values of an array using SUMPRODUCT, but I also want to add a criterion so that values are only summed if the row matches a specific string. I thought I could combine SUMPRODUCT and SUMIF, but I have been unsuccessful.
Position Age ADP Trend Value
QB 23 241 84.2 21
QB 35 185 -37.5 142
QB 27 300 25 19
QB 26 300 25 19
QB 32 300 25 19
RB 22 98 -2.2 1051
RB 24 69 0.3 1929
RB 24 238 6 25
RB 26 300 25 19
RB 26 300 25 19
WR 22 300 25 19
WR 24 300 25 19
WR 26 232 -17 36
WR 25 300 25 19
WR 28 300 25 19
WR 23 9 -4.2 8591
WR 23 178 21.4 161
WR 23 38 8.5 4679
WR 26 222 102.8 53
WR 23 300 25 19
WR 26 300 25 19
TE 26 117 -18.7 617
TE 36 193 -30.3 119
TE 26 199 -22.5 105
TE 24 300 25 19
What I want is to SUM the top two values under the Value column IF the Position = QB.
How can I accomplish this?
Cheers!

Use this array formula:
=SUM(LARGE(IF(A2:A26="QB",E2:E26,""),1),LARGE(IF(A2:A26="QB",E2:E26,""),2))
Press CTRL+SHIFT+ENTER to evaluate the formula as it is an array formula.
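The formula filters the Value column down to the rows whose Position matches and then adds the two largest remaining values. The same logic, sketched in Python with only the Position and Value columns from the question (`players` and `top_two_sum` are hypothetical names):

```python
from heapq import nlargest

# (Position, Value) pairs from the question's table.
players = [
    ("QB", 21), ("QB", 142), ("QB", 19), ("QB", 19), ("QB", 19),
    ("RB", 1051), ("RB", 1929), ("RB", 25), ("RB", 19), ("RB", 19),
    ("WR", 19), ("WR", 19), ("WR", 36), ("WR", 19), ("WR", 19),
    ("WR", 8591), ("WR", 161), ("WR", 4679), ("WR", 53), ("WR", 19), ("WR", 19),
    ("TE", 617), ("TE", 119), ("TE", 105), ("TE", 19),
]

def top_two_sum(position):
    """Sum the two largest values among rows with the given position."""
    values = (value for pos, value in players if pos == position)
    return sum(nlargest(2, values))

print(top_two_sum("QB"))  # 142 + 21
```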

Excel Add date column with dates repeated 24 times [duplicate]

Here is a sample of my data
Hour Index Visits
0 67
1 22
2 111
3 22
4 0
5 0
6 22
7 44
8 0
9 89
10 22
11 111
12 44
13 89
14 44
15 111
16 177
17 89
18 44
19 44
20 89
21 22
22 89
23 44
24 133
25 44
26 22
27 22
28 44
29 22
30 44
31 44
32 22
What I want to do is add two columns. In one column there is the date starting at Jan 1, 2013 and repeats this date for 24 rows until it increments to the next day. Then I want another column that just displays the month of the previous column. Here is what it should look like
Hour Index Visits date month
0 67 1/1/2013 1
1 22 1/1/2013 1
2 111 1/1/2013 1
3 22 1/1/2013 1
4 0 1/1/2013 1
5 0 1/1/2013 1
6 22 1/1/2013 1
7 44 1/1/2013 1
8 0 1/1/2013 1
9 89 1/1/2013 1
10 22 1/1/2013 1
11 111 1/1/2013 1
12 44 1/1/2013 1
13 89 1/1/2013 1
14 44 1/1/2013 1
15 111 1/1/2013 1
16 177 1/1/2013 1
17 89 1/1/2013 1
18 44 1/1/2013 1
19 44 1/1/2013 1
20 89 1/1/2013 1
21 22 1/1/2013 1
22 89 1/1/2013 1
23 44 1/1/2013 1
24 133 2/1/2013 1
25 44 2/1/2013 1
26 22 2/1/2013 1
27 22 2/1/2013 1
28 44 2/1/2013 1
29 22 2/1/2013 1
30 44 2/1/2013 1
31 44 2/1/2013 1
32 22 2/1/2013 1

Suppose your Hour Index values start in A2. Then in the date column (column C) you can write:
=DATE(2013,1,1)+INT(A2/24)
and fill it down.
Next, in the month column (column D), write:
=MONTH(C2)
and fill it down.
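The arithmetic behind the two formulas is integer division of the hour index by 24 to get a day offset from the start date. A small Python sketch of the same calculation (`date_and_month` is a hypothetical helper name):

```python
from datetime import date, timedelta

def date_and_month(hour_index, start=date(2013, 1, 1)):
    """Return the calendar date and month number for a running hour index."""
    day = start + timedelta(days=hour_index // 24)  # same role as INT(A2/24)
    return day, day.month

for hour in (0, 23, 24, 32):
    print(hour, *date_and_month(hour))
```

Hours 0 to 23 map to 1 January 2013 and hour 24 rolls over to 2 January, matching the sample output in the question (which shows dates as day/month/year).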

Merging two files by a single column in unix

I would like to merge two files by one column in unix.
I have file_a:
subjectid name age
12 Jane 16
24 Kristen 90
15 Clarke 78
23 Joann 31
I have another file_b:
subjectid prob_disease
12 0.009
24 0.738
15 0.392
23 1.2E-5
I would like to merge files a and b by subjectid on the command line. Each file is about 2 million lines long; I tried this in R, but it froze because of the amount of data. Could someone please help me do this in Linux?
Desired output:
subjectid prob_disease name age
12 0.009 Jane 16
24 0.738 Kristen 90
15 0.392 Clarke 78
23 1.2E-5 Joann 31
Please help and thank you!

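Check out join(1). In your case you don't even need any flags, because both files list subjectid in the same order (join otherwise expects both inputs to be sorted on the join field):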
$ join file_b file_a
subjectid prob_disease name age
12 0.009 Jane 16
24 0.738 Kristen 90
15 0.392 Clarke 78
23 1.2E-5 Joann 31

You're looking for the join command:
$ cat test.1
12 Jane 16
24 Kristen 90
15 Clarke 78
23 Joann 31
$ cat test.2
12 0.009
24 0.738
15 0.392
23 1.2E-5
$ join -j1 -o 2.1,2.2,1.2,1.3 <(sort test.1) <(sort test.2)
12 0.009 Jane 16
15 0.392 Clarke 78
23 1.2E-5 Joann 31
24 0.738 Kristen 90
$
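If sorting both files is undesirable, the same merge can be done with a hash lookup, which needs no particular row order. The sketch below is an alternative to join, not what either answer uses: it holds one file in memory, assumes whitespace-separated fields with the key in the first column, and `merge_by_first_column` is a hypothetical helper. `io.StringIO` stands in for the real files so the example is self-contained.

```python
import io

def merge_by_first_column(file_a, file_b):
    """Yield rows of file_b extended with the matching row of file_a."""
    lookup = {}
    for line in file_a:                  # index file_a by its first field
        fields = line.split()
        if fields:
            lookup[fields[0]] = fields[1:]
    for line in file_b:                  # stream file_b and append the match
        fields = line.split()
        if fields and fields[0] in lookup:
            yield fields + lookup[fields[0]]

# In-memory stand-ins for file_a and file_b from the question.
file_a = io.StringIO("subjectid name age\n12 Jane 16\n24 Kristen 90\n15 Clarke 78\n23 Joann 31\n")
file_b = io.StringIO("subjectid prob_disease\n12 0.009\n24 0.738\n15 0.392\n23 1.2E-5\n")

merged = [" ".join(row) for row in merge_by_first_column(file_a, file_b)]
print("\n".join(merged))
```

With real files, pass open file objects instead of the `io.StringIO` stand-ins; the header lines merge like any other row because both start with subjectid.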
