BigQuery SQL - conditional string agg - string

I have the following table.
Columns:
colA - row number.
colB - integer, a random number that might be any value.
colC - id, a short string that might be null.
colA colB colC
1    2    'a'
2    4    'b'
3    6    NULL
4    1    'c'
Expected result:
colA colB colC colD
1    2    'a'  NULL
2    4    'b'  'a'
3    6    NULL 'a, b'
4    1    'c'  NULL
5    3    'd'  'b'
The process of generating the last column:
For each row ("examined row"), take its colB and compare it with every row above it, after ordering by colA ("tested rows").
If examined_row.colB >= tested_row.colB, take the tested row's colC.
After checking all the tested rows, apply string_agg to the collected colC values.
Does anyone have an idea how to do this in SQL without a massive inner join?
I thought about string_agg. However, for the first part (the condition), I have no idea how to compare the value from the examined row with the tested rows, since an analytic function can't be used inside string_agg().
As I see it, a subquery might not help either.
The only solution I came up with is an inner join, but it is not efficient.
A solution with array_agg would also be fine.
Thanks!

Below is for BigQuery Standard SQL
#standardSQL
select colA, colB, colC,
  (
    -- aggregate colC of the earlier rows whose colB does not exceed this row's colB
    select string_agg(colC, ', ' order by colA)
    from t.candidates
    where t.colB >= colB
  ) as colD
from (
  -- candidates: every row before the current one, in colA order
  select *, array_agg(struct(colA, colB, colC)) over win as candidates
  from `project.dataset.table`
  window win as (order by colA rows between unbounded preceding and 1 preceding)
) t
Applied to the four sample rows from your question, this returns NULL, 'a', 'a, b' and NULL for colD.
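To check the logic outside BigQuery, the same rule can be modelled in a few lines of Python. This is only a sketch of the behaviour described in the question (the `>=` comparison against earlier rows, with NULL ids skipped); the function name `conditional_string_agg` is made up for illustration and is not part of any library.

```python
# Sample rows from the question: (colA, colB, colC); None stands for NULL.
rows = [
    (1, 2, "a"),
    (2, 4, "b"),
    (3, 6, None),
    (4, 1, "c"),
]

def conditional_string_agg(rows, sep=", "):
    """For each row, join colC of earlier rows (by colA) whose colB <= this row's colB."""
    ordered = sorted(rows, key=lambda r: r[0])
    result = []
    for i, (col_a, col_b, col_c) in enumerate(ordered):
        # ordered[:i] plays the role of "unbounded preceding and 1 preceding".
        picked = [c for (_, b, c) in ordered[:i] if c is not None and b <= col_b]
        # Like STRING_AGG: NULL inputs are skipped, no inputs gives NULL.
        result.append((col_a, col_b, col_c, sep.join(picked) if picked else None))
    return result

output = conditional_string_agg(rows)
for row in output:
    print(row)
```

Each row scans all earlier rows, so this model is quadratic, just like the array built by the window in the query.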

Related

Keeping rows based on source rankings in excel?

I have a table that looks like this:
Source Rank  Value
1            A
2            A
3            A
2            B
3            B
1            C
2            C
3            C
I want to keep only the row with the best rank for each value, so the table will look like this:
Source Rank  Value
1            A
2            B
1            C
It's almost impossible to find a better argument for the invention of the "Subtotals" feature, as you can see:
Oh, in case you don't get the same results immediately: don't forget to click the 2 button in the left margin :-)
Create a Pivot Table from the source data, and set:
Row Labels: Value
Values: Source Rank - and use the pulldown to set to Min
Admittedly this will have the columns the other way round from what you show as ideal result.
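For readers who want to verify what the pivot table with Min produces, here is a small Python sketch of the same grouping. It assumes that "best rank" means the lowest Source Rank, as in the question's expected output; `best_rank_per_value` is a hypothetical helper name.

```python
# (source_rank, value) pairs from the question's table.
rows = [
    (1, "A"), (2, "A"), (3, "A"),
    (2, "B"), (3, "B"),
    (1, "C"), (2, "C"), (3, "C"),
]

def best_rank_per_value(rows):
    """Keep the minimum rank seen for each value (what Min in a pivot table does)."""
    best = {}
    for rank, value in rows:
        if value not in best or rank < best[value]:
            best[value] = rank
    # Dicts keep insertion order, so values appear in first-seen order.
    return [(rank, value) for value, rank in best.items()]

kept = best_rank_per_value(rows)
print(kept)
```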

Convert sql to excel formula

I have 2 tables in Excel.
Table 1
Item  Quantity_Required  Quantity_Remaining
A     5
B     10
C     3
Table 2
Source  Item  Quantity
1       A     2
2       A     1
1       B     5
The result should fill in the Quantity_Remaining column in Table 1:
Table 1
Item  Quantity_Required  Quantity_Remaining
A     5                  2
B     10                 5
C     3                  3
The logic in SQL code is as follows.
SELECT A.Item,
       A.Quantity_Required,
       -- items with no rows in Table2 keep their full required quantity
       A.Quantity_Required - COALESCE(B.Quantity, 0) AS Quantity_Remaining
FROM Table1 A
LEFT JOIN
     (SELECT Item,
             SUM(Quantity) AS Quantity
      FROM Table2
      GROUP BY Item) B
ON A.Item = B.Item
I need pointers on how to translate this to Excel.
Assuming Table 1 starts in row 2 (Item in column A, Quantity_Required in column B) and Table 2 has Item in B8:B10 and Quantity in C8:C10, you can use this formula:
=$B2-SUMPRODUCT(($A2=$B$8:$B$10)*($C$8:$C$10))
The SUMPRODUCT part finds the cells in B8:B10 that match A2, takes the corresponding values from column C and adds them up. When no cell matches, it returns 0, so the full required quantity remains.
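The arithmetic behind the formula can be checked with a short Python sketch. It uses the two tables from the question; `quantity_remaining` is an illustrative name, not an Excel or SQL function.

```python
# Table 1: item -> quantity required.
required = {"A": 5, "B": 10, "C": 3}
# Table 2: (source, item, quantity).
sources = [(1, "A", 2), (2, "A", 1), (1, "B", 5)]

def quantity_remaining(required, sources):
    """Subtract the per-item total of Table 2 from Table 1, like B2 - SUMPRODUCT(...)."""
    used = {}
    for _source, item, quantity in sources:
        used[item] = used.get(item, 0) + quantity
    # Items missing from Table 2 contribute 0, so nothing is subtracted.
    return {item: need - used.get(item, 0) for item, need in required.items()}

remaining = quantity_remaining(required, sources)
print(remaining)
```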

Calculate a value for 2 different set of IDs in excel

My Excel table has 5 columns: Id, ColA, ColB, Count and Test.
ID  A  B     Count  Test
2   a  low   5      -
2   b  high  6      -
2   c  low   7      -
2   d  high  8      -
2   e  low   9      -
1   a  low   1      =(1-5)
1   l  high  2      -
1   e  low   3      =(3-9)
I want to calculate the value of Test only for rows with Id = 1:
If the value of ColA for Id 1 = the value of ColA for Id 2 and
the value of ColB for Id 1 = the value of ColB for Id 2
then calculate the difference between the Count values
else
0
The Excel table is connected to a SQL query. Every time I refresh it, the table has a different number of rows.
I tried using VLOOKUP in the Test column where Id = 1 and specified the first 5 rows (only those with Id = 2) as the table array, but it doesn't work, because when I refresh the table a second time there are only 2 rows for Id = 2.
I want the Test column value to be calculated automatically each time the table is refreshed. Thanks!
Use COUNTIFS to check whether a matching Id = 2 row exists, and SUMIFS to return its value:
=IF(AND(A2=1,COUNTIFS(B:B,B2,C:C,C2,A:A,2)),D2-SUMIFS(D:D,B:B,B2,C:C,C2,A:A,2),0)

Rank, group and set a label in Spotfire table

I have two columns, [Clients] and [Sales]. I'd like to create a new column containing a score (from 1 to 5) for each client in terms of Sales.
I want to rank [Sales] and split the ranking uniformly into 5 groups, then set label 1 for the group with the highest [Sales], 2 for the second group, and so on.
Does someone have an idea of an expression to use?
You can use the Percentile function in conjunction with a case statement. I created calculated columns for the 20th, 40th, 60th, and 80th percentiles and then a case statement that ranks each row based on those values.
Percentile calculation:
Percentile([Sales],20)
Case statement:
case
when [Sales]<[20th Percentile] then 1
when ([Sales]>=[20th Percentile]) and ([Sales]<[40th Percentile]) then 2
when ([Sales]>=[40th Percentile]) and ([Sales]<[60th Percentile]) then 3
when ([Sales]>=[60th Percentile]) and ([Sales]<[80th Percentile]) then 4
when [Sales]>=[80th Percentile] then 5
else NULL
end
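The bucketing done by the case statement can be sketched in Python as follows. The cut points come from `statistics.quantiles` with the "inclusive" method, which is an assumption: Spotfire's Percentile may interpolate differently, so exact boundaries can differ. Note that, like the case statement above, this gives label 1 to the lowest sales; to give 1 to the highest group as the question asks, use `6 - label`.

```python
from statistics import quantiles

def score_sales(sales):
    """Label each value 1..5 by the 20th/40th/60th/80th percentile it falls under."""
    p20, p40, p60, p80 = quantiles(sales, n=5, method="inclusive")

    def label(x):
        if x < p20:
            return 1
        if x < p40:
            return 2
        if x < p60:
            return 3
        if x < p80:
            return 4
        return 5

    return [label(x) for x in sales]

# Made-up sales figures, for illustration only.
sample_sales = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
labels = score_sales(sample_sales)
reversed_labels = [6 - l for l in labels]  # 1 = highest sales group
print(labels)
print(reversed_labels)
```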

How to count every word occurrence in a string in an Oracle loop?

I've got a problem which seems simple at first, but really isn't. I'm storing words in a table in such a way that the pair of strings "A B C D E" and "D E F" becomes:
id value
-- -----
1 A
1 B
1 C
1 D
1 E
2 D
2 E
2 F
I pass to my Oracle procedure a string that looks like this: "A B C D G". Now I want to compute the percentage of similarity between each string in the database and the string passed as a parameter.
I presume that I have to use one of the split functions and an array, then check whether every word in the passed string occurs in the table, and then count per id. But there's a twist: I need a precise percentage value.
So, result from example above should look like this:
id percentage
-- ----------
1 80 -- 4 out of 5 letters exist in the query string (A B C D)
2 33 -- 1 out of 3 (D)
So, my questions are:
What is the most effective way to split the query string and then iterate over it (a table?)
How to store partial results and then count them?
How to compute the final percentage value?
Any help would be greatly appreciated.
The following query would give you what you want without the need to bother with procedures.
select id
, sum(case when value in ('A', 'B', 'C', 'D', 'G') then 1 else 0 end) / count(*)
from my_table
group by id
Alternatively if you have to pass the string "A B C D G" and get a result back you could do:
select id
, sum(case when instr('A B C D G', value) <> 0 then 1 else 0 end) / count(*)
from my_table
group by id
These queries involve a full table scan, or an index full scan if you use the index suggested below, so you might want to add the following where clause if you only want to find ids that have a percentage > 0.
select id
, sum(case when instr('A B C D G', value) <> 0 then 1 else 0 end) / count(*)
from my_table mt
where exists ( select 1
from my_table
where id = mt.id
and instr('A B C D G', value) <> 0 )
group by id
For all the queries your table should be indexed on my_table, id in that order.
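As a cross-check of the expected numbers, here is a minimal Python sketch of the same per-id ratio. It splits the query string into whole words rather than using a substring test, which is a deliberate difference from `instr`: with multi-character words, a substring match could count partial hits. The name `similarity` is hypothetical, and the result is multiplied by 100 to match the percentages in the question.

```python
# (id, value) rows as stored in the question's table.
table = [
    (1, "A"), (1, "B"), (1, "C"), (1, "D"), (1, "E"),
    (2, "D"), (2, "E"), (2, "F"),
]

def similarity(table, query):
    """Percentage of each id's words that also occur in the query string."""
    words = set(query.split())
    hits, totals = {}, {}
    for id_, value in table:
        totals[id_] = totals.get(id_, 0) + 1          # count(*)
        hits[id_] = hits.get(id_, 0) + (value in words)  # sum(case ... end)
    return {id_: 100 * hits[id_] / totals[id_] for id_ in totals}

percentages = similarity(table, "A B C D G")
print(percentages)
```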
Have you had a look at UTL_MATCH? It doesn't do exactly what you're trying to achieve, but you may find it useful if the definition of your percentage agreement isn't set in stone.

Resources