Within Table Subquery of Identical Combinations

I would like to select groups that have exactly the same attributes from a table. For example, my table looks like this:
facs_run_id | fcj_id
1 | 17
1 | 4
1 | 12
2 | 17
2 | 4
2 | 12
3 | 17
3 | 12
3 | 10
In this table, each facs_run_id has a combination of fcj_id values; some are shared between facs_run_id numbers while others are not. For example, facs_run_id 1 and 2 above are identical, while 3 shares some fcj_id values but is not identical to 1 and 2. I would like to write a query to:
gather all fcj_id values from a particular facs_run_id
find all facs_run_id values that have the exact same fcj_id combination.
Here, I want to find all facs_run_id values whose fcj_id combination equals that of facs_run_id 1, so it should return 2 (or 1 and 2).
I can get those that are missing certain fcj_id and even find which fcj_id are missing with this:
SELECT fcj_id
FROM facs_panel
EXCEPT
SELECT fcj_id
FROM facs_panel
WHERE facs_run_id = 2;
or this:
SELECT row(fp.*, fcj.fcj_antigen, fcj.fcj_color)
FROM facs_panel fp
LEFT OUTER JOIN facs_conjugate_lookup fcj ON fcj.fcj_id = fp.fcj_id
WHERE fp.fcj_id in ( SELECT fp.fcj_id
FROM facs_panel fp
WHERE fp.facs_run_id = 1);
But I am not able to make a query that returns IDENTICAL facs_run_id. I suppose this could be considered a way of looking for aggregated duplicates, but I don't know how to do that. Any suggestions or pointers would be greatly appreciated (or a better way to create the table if this type of query will not work).

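It's pretty easy with a couple of CTEs. Note the ORDER BY inside array_agg: without it the element order is not guaranteed, so two runs with the same fcj_id values could produce arrays that do not compare equal.
with f1 as (
    select facs_run_id,
           array_agg(fcj_id order by fcj_id) as flist
    from facs_panel
    group by facs_run_id
),
f2 as (
    select flist, count(*)
    from f1
    group by flist
    having count(*) > 1
)
select f2.flist, f1.facs_run_id
from f2
join f1 on (f2.flist = f1.flist)
order by flist, facs_run_id;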
The data from the question, run through this query, produces:
flist | facs_run_id
-----------+-------------
{4,12,17} | 1
{4,12,17} | 2
(2 rows)
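For readers who want to check the grouping logic outside the database, here is a minimal Python sketch of the same idea: collect the fcj_id values per run, then group runs by their sorted combination. The names ROWS and identical_runs are made up for this illustration, and it assumes each (facs_run_id, fcj_id) pair appears at most once.

```python
from collections import defaultdict

# Rows copied from the question: (facs_run_id, fcj_id)
ROWS = [(1, 17), (1, 4), (1, 12),
        (2, 17), (2, 4), (2, 12),
        (3, 17), (3, 12), (3, 10)]

def identical_runs(rows):
    """Map each fcj_id combination shared by more than one run to its run ids."""
    # Step 1: gather the fcj_id values of every run (like the f1 CTE).
    per_run = defaultdict(set)
    for run_id, fcj_id in rows:
        per_run[run_id].add(fcj_id)
    # Step 2: group runs by their sorted combination (like the f2 CTE).
    by_combo = defaultdict(list)
    for run_id, fcj_ids in per_run.items():
        by_combo[tuple(sorted(fcj_ids))].append(run_id)
    # Step 3: keep only combinations that occur in more than one run.
    return {combo: sorted(runs) for combo, runs in by_combo.items() if len(runs) > 1}

print(identical_runs(ROWS))  # {(4, 12, 17): [1, 2]}
```

Sorting the values before using them as a key plays the same role as ordering the elements inside the aggregated array: it makes equal sets compare equal.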

Related

VLOOKUP with criterion of max

Let's say I have a Table1 as follows:
ID | Value
________________
1 | 0
2 | 0
1 | 1
3 | 1
1 | 0
2 | 0
1 | 0
2 | 0
3 | 0
4 | 1
1 | 0
5 | 0
and I have a second table that contains unique IDs from Table1.
In Table1 ID may repeat, but each ID can have at most one 1 in Value column, the rest is 0.
How can I write a VLOOKUP-like formula that will tell me whether a given ID has a 1 in any occurrence?
I would like to get something like
ID | Value
________________
1 | 1
2 | 0
3 | 1
4 | 1
5 | 0
With SQL I would write something like SELECT ID, max(Value) FROM Table1 GROUP BY ID, or even use sum instead of max.
Also worth mentioning: Table1 will be in a separate file from my output table, and Value will be just one of many columns, so I cannot use Pivot Tables.
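I think the solution is easier than you might think. For ID 1, for example (the last argument is the ID to look up, so replace it with a reference to the ID cell in your output table):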
=SUMIFS(B$2:B$13,A$2:A$13,1)
What are you doing? You are summing everything? I just want to know where the 1 is, no need to sum it?
Well: you seem to have two possible cases: either all 0's, or all 0's and just one 1. Whether you search for that 1 or take the sum, the result is the same :-)
Ok, that's a neat trick, but what if I decide there might be more than one 1?
Well: just translate any number larger than 1 to 1, which you can do with this formula (assuming the sum is in E2):
=IF(E2,1,0)
There are several ways to go about it, and I'm assuming that your values are more complicated than your example, so here is one way:
=MAX(IF(A$2:A$13=E3,B$2:B$13))
Where A2:A13 holds your IDs, B2:B13 holds the values, and E3 is the first ID in your reference table. This is an array formula and needs to be confirmed with CTRL+SHIFT+ENTER.
If it's as simple as 1 or 0, you should use the answer that @dominique gave.
Try the following formula:
=HSTACK(UNIQUE(A2:A13),MAXIFS(B2:B13,A2:A13,UNIQUE(A2:A13)))
This behaves like the SQL query, and it still works if an ID has more than one non-zero value.
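As a cross-check on what these formulas should return, here is a small Python sketch of the "max of Value per ID" logic from the SQL in the question. TABLE1 and max_per_id are names invented for this example.

```python
# Rows copied from Table1 in the question: (ID, Value)
TABLE1 = [(1, 0), (2, 0), (1, 1), (3, 1), (1, 0), (2, 0),
          (1, 0), (2, 0), (3, 0), (4, 1), (1, 0), (5, 0)]

def max_per_id(rows):
    """Return {ID: largest Value seen for that ID}, like GROUP BY ID with max(Value)."""
    result = {}
    for id_, value in rows:
        # Keep the larger of the stored value and the current one.
        result[id_] = max(value, result.get(id_, value))
    return result

print(max_per_id(TABLE1))  # {1: 1, 2: 0, 3: 1, 4: 1, 5: 0}
```

Using max rather than sum keeps the result at 1 even if an ID ends up with several 1's, which is the case the IF wrapper above is meant to handle.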

How to unnest multiple columns in presto, outputting into corresponding rows

I'm trying to unnest some code
I have a couple of columns that hold arrays, both using | as a delimiter
The data would be stored looking like this, with extra values to the side which show the current currency
I want to output it like this
I tried adding another UNNEST, like this
SELECT c.campaign, c.country, a.product_name, u.price--, u.price -- add price to this split. handy for QBR
FROM c, UNNEST(split(price, '|')) u(price), UNNEST(split(product_name, '|')) a(product_name)
group by 1, 2, 3, 4
but this duplicated several rows, so I suspect that unnesting the two columns separately doesn't quite work
Thanks
The issue with your query is that the clause FROM c, UNNEST(...), UNNEST(...) is effectively computing the cross join between each row of c and the rows produced by each of the derived tables resulting from the UNNEST calls.
You can solve it by unnesting all your arrays in a single call to UNNEST, thus producing a single derived table. When used in that manner, UNNEST produces a table with one column for each array and one row for each element position. If the arrays have different lengths, it produces as many rows as the longest array has elements and fills the columns of the shorter arrays with NULL.
To illustrate, for your case, this is what you want:
WITH data(a, b, c) AS (
VALUES
('a|b|c', '1|2|3', 'CAD'),
('d|e|f', '4|5|6', 'USD')
)
SELECT t.a, t.b, data.c
FROM data, UNNEST(split(a, '|'), split(b, '|')) t(a, b)
which produces:
a | b | c
---+---+-----
a | 1 | CAD
b | 2 | CAD
c | 3 | CAD
d | 4 | USD
e | 5 | USD
f | 6 | USD
(6 rows)
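The difference between the two forms can be sketched in Python, assuming it helps to think of two separate UNNEST calls as a cross product and a single multi-array UNNEST as a positional zip. The function names below are invented for this illustration; None stands in for NULL.

```python
from itertools import product, zip_longest

def unnest_separately(names, prices):
    # Two independent UNNESTs: every name is paired with every price (cross join).
    return list(product(names.split("|"), prices.split("|")))

def unnest_together(names, prices):
    # One UNNEST over both arrays: elements are paired by position,
    # with None filling in when one array is shorter than the other.
    return list(zip_longest(names.split("|"), prices.split("|")))

print(len(unnest_separately("a|b|c", "1|2|3")))  # 9
print(unnest_together("a|b|c", "1|2|3"))  # [('a', '1'), ('b', '2'), ('c', '3')]
```

The nine rows from the first function are the duplicates seen in the question; the second function yields one row per element position.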

How to sort a column based on exact matches with another column

I have an inventory table that looks like this (subset):
part number | price | quantity
10115 | 14.95 | 10
1050 | 5.95 | 12
1074 | 7.49 | 8
110-1353 | 13.99 | 22
and I also have another table in Sheet 2 that looks like this (subset):
part number | quantity
10023 | 1
110-1353 | 3
10115 | 2
20112 | 1
I basically want to subtract the quantities in the second table from the ones in the first table. What is the best way of doing this? I have looked into VLOOKUP and INDEX MATCH, but they are not quite right for this. Would this perhaps be better done in, say, an Access DB?
I have added two more columns next to the last column of Sheet 1. Let us assume that the second table is in the range A1:B5 of Sheet2.
Formulas:
Column D:
=IFNA(VLOOKUP(A2,Sheet2!$A$2:$B$5,2,FALSE),0)
Column E:
=C2-D2
If you wanted to tackle this using MS Access, the SQL code might look like this:
select
t1.[part number],
t1.price,
t1.quantity - nz(t2.quantity, 0) as qty
from
inventory t1 left join table2 t2 on t1.[part number] = t2.[part number]
Here, I assume that you have a table called inventory and a table called table2 (change these to suit your database).
A left join is used to ensure that all records from inventory are returned, regardless of whether a match is found in table2, and the Nz function is used to return 0 for records for which there is no part number match in table2.
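The left-join-with-default idea used by both answers can be sketched in Python as follows. INVENTORY, USED and remaining_stock are hypothetical names for this example. Part numbers are kept as strings because of values like 110-1353. The sketch adds up repeated part numbers in the second table, which is an assumption that goes beyond what VLOOKUP does (it returns only the first match).

```python
# Data copied from the question: (part number, price, quantity) and (part number, quantity)
INVENTORY = [("10115", 14.95, 10), ("1050", 5.95, 12),
             ("1074", 7.49, 8), ("110-1353", 13.99, 22)]
USED = [("10023", 1), ("110-1353", 3), ("10115", 2), ("20112", 1)]

def remaining_stock(inventory, used):
    """Subtract used quantities from inventory; parts with no match keep their quantity."""
    used_by_part = {}
    for part, qty in used:
        used_by_part[part] = used_by_part.get(part, 0) + qty
    # .get(part, 0) plays the role of IFNA(..., 0) in Excel or Nz(..., 0) in Access.
    return [(part, price, qty - used_by_part.get(part, 0))
            for part, price, qty in inventory]

for row in remaining_stock(INVENTORY, USED):
    print(row)
```

Parts that appear only in the second table (10023 and 20112 here) are ignored, just as they are by the left join.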

Excel Count Distinct where other columns match and sum of another column = 0

I need to answer the question of how many parts were successful by counting the distinct Part Labels where PartName matches, and the sum of LabelFailures = 0.
PartName | PartLabel | LabelFailure
---------+-----------+-------------
a | 1 | 1
a | 1 | 0
a | 2 | 0
a | 2 | 0
b | 1 | 0
Desired Results:
PartName | PartsLabelSucceeded
---------+--------------------
a | 1
b | 1
This question might be similar to these two, but I'm having a hard time holding the individual components in my head to apply the answers to this particular situation. I've been trying to use COUNTIFS, but haven't found a way to fit both criteria in correctly.
Excel Count Unique Values on Multiple Criteria
Excel Count Distinct Values with Multiple Criteria
Use a helper column.
In D2 put:
=(COUNTIFS($A$1:A2,A2,$B$1:B2,B2)=1)*(COUNTIFS(A:A,A2,B:B,B2,C:C,1)=0)
Then insert a pivot table with PartName in the Rows and the sum of the helper column in the Values.
Or list the PartName values manually and use SUMIFS:
=SUMIFS(D:D,A:A,G2)
This can also be done with a single formula, without the helper column, if one wants to list the part names:
=SUMPRODUCT((($A$2:$A$6=F2)*(COUNTIFS(A:A,$A$2:$A$6,B:B,$B$2:$B$6,C:C,1)=0)/(COUNTIFS(A:A,$A$2:$A$6,B:B,$B$2:$B$6)+($A$2:$A$6<>F2)+(COUNTIFS(A:A,$A$2:$A$6,B:B,$B$2:$B$6,C:C,1)>0))))
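To make the intended logic easier to follow than the nested COUNTIFS, here is a hedged Python sketch of the same two steps: total the failures per (PartName, PartLabel) pair, then count the pairs whose total is 0 for each PartName. ROWS and labels_succeeded are names made up for this illustration.

```python
from collections import defaultdict

# Rows copied from the question: (PartName, PartLabel, LabelFailure)
ROWS = [("a", 1, 1), ("a", 1, 0), ("a", 2, 0), ("a", 2, 0), ("b", 1, 0)]

def labels_succeeded(rows):
    """Count, per PartName, the distinct PartLabels whose LabelFailure values sum to 0."""
    # Step 1: sum the failures for each distinct (PartName, PartLabel) pair.
    failures = defaultdict(int)
    for name, label, failure in rows:
        failures[(name, label)] += failure
    # Step 2: count the pairs with no failures, grouped by PartName.
    counts = defaultdict(int)
    for (name, _label), total in failures.items():
        counts[name] += int(total == 0)
    return dict(counts)

print(labels_succeeded(ROWS))  # {'a': 1, 'b': 1}
```

A PartName whose labels all failed still appears in the result, with a count of 0.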

SUMIF of multiple columns with INDIRECT

I have an Excel sheet that is used as a database; let's call it MyDB in the following example. The first column, A, contains some strings.
A | B | C
-----------------|--------------|------------------------------------------
Turnover 2014 | 1 | 2
Something | 2 | 0
Something | |
Turnover 2014 | 3 | 1
Something | |
Something | 0 | 2
What I want to do is look for the string Turnover 2014 and sum all values in the matching rows from B:C (C is just an example; in my case the last column is variable and can be F or M).
What I have:
=SUMIF(INDIRECT("'MyDB'!A"&Helper!D2&":"&"A"&Helper!D8),"=Turnover 2014",INDIRECT("'MyDB'!$B"&Helper!D2&":"&"B"&Helper!D8))
Helper!D2 and Helper!D8 contain the variable row range, which is one of the reasons I have to use INDIRECT. For this example, let's assume D2 = 1 and D8 = 6 (the full table).
Simple version:
=SUMIF(INDIRECT("'MyDB'!A1:A6"),"=Turnover 2014",INDIRECT("'MyDB'!B1:B6"))
This sums all values in B where A = Turnover 2014, so no problem here. Now I will show you my attempts to do the same with multiple columns:
=SUMIF(INDIRECT("'MyDB'!A1:A6"),"=Turnover 2014",INDIRECT("'MyDB'!B1:C6"))
=SUMPRODUCT((INDIRECT("'MyDB'!A1:A6") = "Turnover 2014")*(INDIRECT("'MyDB'!B1:C6")))
Neither worked in my case (IMPORTANT: I'm not talking about the simplified version, I'm talking about the original version with all the variables).
In all cases I only get a sum of 4 where I need 7.
Check whether your column A contains Turnover 2014 without leading/trailing spaces.
And try:
=SUMPRODUCT(
(TRIM(INDIRECT("'MyDB'!A"&Helper!D2&":"&"A"&Helper!D8)) = "Turnover 2014")*
(INDIRECT("'MyDB'!B"&Helper!D2&":"&"C"&Helper!D8))
)
I also suggest taking a look at an alternative formula without INDIRECT, which is much better because it is not a volatile formula:
=SUMPRODUCT(
(TRIM(INDEX(MyDB!$A:$A,Helper!D2):INDEX(MyDB!$A:$A,Helper!D8))="Turnover 2014")*
(INDEX(MyDB!$B:$B,Helper!D2):INDEX(MyDB!$C:$C,Helper!D8))
)
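As a sanity check on the expected total, here is a small Python sketch of what the SUMPRODUCT computes: for every row in the chosen range whose trimmed key equals the search string, add up all of its value columns. MYDB and sum_matching_rows are hypothetical names, None stands in for a blank cell, and first_row / last_row mimic Helper!D2 and Helper!D8.

```python
# Data copied from the question: (column A, column B, column C); None = blank cell
MYDB = [
    ("Turnover 2014", 1, 2),
    ("Something", 2, 0),
    ("Something", None, None),
    ("Turnover 2014", 3, 1),
    ("Something", None, None),
    ("Something", 0, 2),
]

def sum_matching_rows(rows, label, first_row=1, last_row=None):
    """Sum every value column of the rows (1-based, inclusive) whose key matches label."""
    last_row = len(rows) if last_row is None else last_row
    total = 0
    for key, *values in rows[first_row - 1:last_row]:
        # strip() mirrors TRIM: stray leading/trailing spaces do not break the match.
        if key.strip() == label:
            total += sum(v for v in values if v is not None)
    return total

print(sum_matching_rows(MYDB, "Turnover 2014", 1, 6))  # 7
```

Because the row unpacking takes however many value columns there are, the same function covers the case where the last column is F or M instead of C.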
