Unnest from Table in Snowflake - pivot

I have the following table:
PersonID CW_MilesRun PW_MilesRun CM_MilesRun PM_MilesRun
1 15 25 35 45
2 10 20 30 40
3 5 10 15 20
...
I need to split this table into a vertical table with an id for each field (i.e. CD_MilesRun = 1, CW_MilesRun = 2, etc.), so that my table looks similar to this:
PersonID TimeID Description C_MilesRun P_MilesRun
1 1 Week 15 25
1 2 Month 35 45
2 1 Week 10 20
2 2 Month 30 40
3 1 Week 5 10
3 2 Month 15 20
In Postgres, I would use something similar to:
SELECT
PersonID
, unnest(array[1,2]) AS TimeID
, unnest(array['Week','Month']) AS "Description"
, unnest(array["CW_MilesRun","CM_MilesRun"]) C_MilesRun
, unnest(array["PW_MilesRun","PM_MilesRun"]) P_MilesRun
FROM myTableHere
;
However, I cannot get a similar function to work in Snowflake. Any ideas?

You can use FLATTEN() with LATERAL to get the result you want, although the query is quite different.
with tbl as (
select $1 PersonID, $2 CW_MilesRun, $3 PW_MilesRun, $4 CM_MilesRun, $5 PM_MilesRun
from values (1, 15, 25, 35, 45), (2, 10, 20, 30, 40), (3, 5, 10, 15, 20)
)
select
PersonID,
t.value[0] TimeID,
t.value[1] Description,
iff(t.index=0,CW_MilesRun,CM_MilesRun) C_MilesRun,
iff(t.index=0,PW_MilesRun,PM_MilesRun) P_MilesRun
from tbl, lateral flatten(parse_json('[[1, "Week"],[2, "Month"]]')) t;
PERSONID TIMEID DESCRIPTION C_MILESRUN P_MILESRUN
1 1 "Week" 15 25
1 2 "Month" 35 45
2 1 "Week" 10 20
2 2 "Month" 30 40
3 1 "Week" 5 10
3 2 "Month" 15 20
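P.S. Use t.* to see what's available after flattening (perhaps that is obvious). Description prints with quotes because t.value[1] is a VARIANT; cast it with t.value[1]::string if you want plain text.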
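If the FLATTEN query is hard to follow, the same reshaping logic can be sketched outside the database. The Python below is only an illustration, not Snowflake code: `PERIODS` and `unpivot` are hypothetical names standing in for the flattened JSON array and the lateral join, and the rows are the sample data from the question.

```python
# Wide rows, one dict per person, using the sample data from the question.
rows = [
    {"PersonID": 1, "CW_MilesRun": 15, "PW_MilesRun": 25, "CM_MilesRun": 35, "PM_MilesRun": 45},
    {"PersonID": 2, "CW_MilesRun": 10, "PW_MilesRun": 20, "CM_MilesRun": 30, "PM_MilesRun": 40},
    {"PersonID": 3, "CW_MilesRun": 5, "PW_MilesRun": 10, "CM_MilesRun": 15, "PM_MilesRun": 20},
]

# Hypothetical lookup playing the role of the flattened array:
# (TimeID, Description, current-period column, prior-period column).
PERIODS = [
    (1, "Week", "CW_MilesRun", "PW_MilesRun"),
    (2, "Month", "CM_MilesRun", "PM_MilesRun"),
]


def unpivot(wide_rows):
    """Pair every wide row with every period and pick the matching columns."""
    return [
        (row["PersonID"], time_id, description, row[c_col], row[p_col])
        for row in wide_rows
        for time_id, description, c_col, p_col in PERIODS
    ]


long_rows = unpivot(rows)
for long_row in long_rows:
    print(long_row)
```

Each input row yields one output row per period, which is what the lateral join against the two-element array does in the SQL.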

You could alternatively use UNPIVOT and NATURAL JOIN.
The answer above is great. I just like thinking about alternative ways of doing things: you never know when one might suit your needs, and it exposes you to a couple of useful features.
with cte as (
select
1 PersonID,
15 CW_MilesRun,
25 PW_MilesRun,
35 CM_MilesRun,
45 PM_MilesRun
union
select
2 PersonID,
10 CW_MilesRun,
20 PW_MilesRun,
30 CM_MilesRun,
40 PM_MilesRun
union
select
3 PersonID,
5 CW_MilesRun,
10 PW_MilesRun,
15 CM_MilesRun,
20 PM_MilesRun
)
select * from
(select
PersonID,
CW_MilesRun weekly,
CM_MilesRun monthly
from
cte
) unpivot (C_MilesRun for description in (weekly, monthly))
natural join
(select * from
(select
PersonID,
PW_MilesRun weekly,
PM_MilesRun monthly
from
cte
) unpivot (P_MilesRun for description in (weekly, monthly))) f

Related

Calculate the number of occurrences of words in a column and find the second, third most common

I have a formula that finds the most frequently occurring text, and it works well.
=INDEX(Rng,MATCH(MAX(COUNTIF(Rng,Rng)),COUNTIF(Rng,Rng),0))
How can I tweak it to find the second and third most frequent?
2nd largest value (LARGE ranks numeric values, not how often they occur):
=LARGE(A2:A; 2)
3rd largest value:
=LARGE(A2:A; 3)
update 1:
use query:
=QUERY(A:A,
"select A,count(A) where A is not null group by A label count(A)''")
to get only the 2nd or 3rd most frequent, sort by count and use INDEX like:
=INDEX(QUERY(A:A,
"select A,count(A) where A is not null group by A order by count(A) desc label count(A)''"), 2)
update 2:
=INDEX(QUERY({'Data Entry Errors'!I:I},
"select Col1,count(Col1) where Col1 is not null group by Col1 order by count(Col1) desc limit 3 label count(Col1)''"),,1)
In Google Sheets, to get the number of occurrences of each word in the column A2:A, use this:
=query(A2:A, "select A, count(A) where A is not null group by A order by count(A) desc label count(A) '' ", 0)
To get just the second and third result and the number of their occurrences, use this:
=query(A2:A, "select A, count(A) where A is not null group by A order by count(A) desc limit 2 offset 1 label count(A) '' ", 0)
To get just the names that are the second and third by the number of their occurrences, use this:
=query( query(A2:A, "select A, count(A) where A is not null group by A order by count(A) desc limit 2 offset 1 label count(A) '' ", 0), "select Col1", 0 )
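The "limit 2 offset 1" idea (rank by count, skip the top row, keep the next two) can be checked outside the spreadsheet. This is a minimal Python sketch, not a Sheets feature: `ranked_counts` and the sample `words` list are made up for illustration.

```python
from collections import Counter


def ranked_counts(values):
    """Return (value, count) pairs, most frequent first, skipping blanks."""
    counts = Counter(v for v in values if v not in ("", None))
    return counts.most_common()


# Made-up sample column; blanks are ignored like "where A is not null".
words = ["b", "a", "c", "a", "b", "a", "d", "c", "c", "c", ""]

ranked = ranked_counts(words)
second_and_third = ranked[1:3]                           # limit 2 offset 1
second_and_third_names = [name for name, _ in second_and_third]

print(ranked)
print(second_and_third)
print(second_and_third_names)
```

When two values have the same count, which one counts as "second" depends on how ties are broken, both here and in QUERY, so add a secondary sort key if that matters.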
For Excel 365
Say we have data in column A from A2 through A66 like:
20
11
27
18
3
31
2
30
8
1
18
32
3
5
4
6
4
1
22
11
2
46
33
34
25
53
37
9
20
2
12
4
5
4
23
39
19
4
28
22
5
16
24
7
6
10
13
31
56
23
1
16
27
39
1
6
11
6
20
11
24
12
9
29
12
and we want a frequency table listing the most frequent value, the second most frequent value, the third, etc.
The simplest approach is to construct a Pivot Table, but if you need a formula approach, then in B2 enter:
=UNIQUE(A2:A66)
in C2 enter the following and fill it down to C35:
=COUNTIF(A$2:A$66,B2)
We now sort columns B:C by the counts in column C, highest first. In D2 enter:
=SORTBY(B2:C35,C2:C35,-1)

How to replace a column in dataframe for the result of a function

Currently I have a dataframe with a column named age, which holds each person's age in days. I would like to convert this value to years. How could I achieve that?
At the moment, if I run this command
df['age']
the result would be something like
0 18393
1 20228
2 18857
3 17623
4 17474
5 21914
6 22113
7 22584
8 17668
9 19834
10 22530
11 18815
12 14791
13 19809
I would like to replace the value in each row with the current value / 365 (which would convert days to years).
As suggested:
>>> df['age'] / 365
age
0 50.391781
1 55.419178
2 51.663014
3 48.282192
4 47.873973
Or, if you need whole years, use floor division:
>>> df['age'] // 365
age
0 50
1 55
2 51
3 48
4 47
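The expressions above only compute a new Series. To actually replace the column, as the question title asks, assign the result back. A minimal sketch using the first five sample values from the question; `days_to_years` and the `age_years` column are hypothetical names added for comparison.

```python
import pandas as pd

# First five ages (in days) from the question's sample output.
df = pd.DataFrame({"age": [18393, 20228, 18857, 17623, 17474]})


def days_to_years(days):
    """Hypothetical helper: whole years, ignoring leap days."""
    return days // 365


# Option 1: apply an arbitrary function row by row (kept in a new column here).
df["age_years"] = df["age"].apply(days_to_years)

# Option 2: vectorised arithmetic, overwriting the original column in place.
df["age"] = df["age"] // 365

print(df)
```

The vectorised form is usually preferable for simple arithmetic; `apply` is mainly useful when the conversion cannot be written as a column expression.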

LISTAGG Partition for Webi

I would like to do something similar to Oracle's LISTAGG in Webi. Below are my queries.
Query 1
Id M1 ; columns
1 10
2 20
3 30
4 40
5 50
Query 2
Id D1 ; column
1 A11
1 A12
1 A13
2 A21
2 A22
2 A23
2 A24
3 A31
Desired outcome after merging Query 1 and Query 2 by Id:
Id M1 New Column
1 10 A11;A12;A13
2 20 A21;A22;A23;A24
3 30 A31
4 40
5 50
I can get to the point below, and then use NoFilter to keep the values intact when a filter is applied. However, column F2 shows "#MULTIVALUE". I can get NoFilter to work with one query, but with two queries like this it doesn't work. Any suggestions for addressing the issue?
Id M1 F1 (Measure) F2
1 10 A11;A12;A13 =NoFilter([F1])
1 10 A11;A12;A13
1 10 A11;A12;A13
2 20 A21;A22;A23;A24
2 20 A21;A22;A23;A24
2 20 A21;A22;A23;A24
2 20 A21;A22;A23;A24
3 30 A31
4 40
5 50
I wonder if anyone could show me how to achieve this.
Many thanks for your help,
Andre

taking top 3 in a groupby, and lumping rest into 'other category'

I am currently doing a groupby in pandas like this:
df.groupby(['grade'])['students'].nunique()
and the result I get is this:
grade
grade 1 12
grade 2 8
grade 3 30
grade 4 2
grade 5 600
grade 6 90
Is there a way to get the output such that I see the groups of the top 3, and everything else is classified under other?
This is what I am looking for:
grade
grade 3 30
grade 5 600
grade 6 90
other (3 other grades) 22
I think you can add a helper column to the df and call it something like "Grouping".
Give the top 3 rows their original names, label the remaining rows "other", and then group by the "Grouping" column.
Can't do much without the actual input data, but if this is your starting dataframe (df) after your groupby -
grade unique
0 grade_1 12
1 grade_2 8
2 grade_3 30
3 grade_4 2
4 grade_5 600
5 grade_6 90
You can do a few more steps to get to your table -
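import pandas as pd
ddf = df.nlargest(3, 'unique')
other = pd.DataFrame([{'grade': 'Other', 'unique': df['unique'].sum() - ddf['unique'].sum()}])  # DataFrame.append was removed in pandas 2.0
ddf = pd.concat([ddf, other], ignore_index=True)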
grade unique
0 grade_5 600
1 grade_6 90
2 grade_3 30
3 Other 22
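Putting the steps together, here is a self-contained sketch that also builds the "other (3 other grades)" label the question asks for. The dataframe is re-created from the counts shown above, and the variable names `top`, `rest` and `result` are my own. It assumes that summing the per-grade counts is acceptable; if a student can appear in several grades, the sum is not the number of distinct students.

```python
import pandas as pd

# Counts per grade, as produced by the groupby in the question.
df = pd.DataFrame({
    "grade": ["grade_1", "grade_2", "grade_3", "grade_4", "grade_5", "grade_6"],
    "unique": [12, 8, 30, 2, 600, 90],
})

top = df.nlargest(3, "unique")   # the three largest groups
rest = df.drop(top.index)        # everything that did not make the top 3

# One summary row for the remainder, labelled with how many groups it covers.
other = pd.DataFrame({
    "grade": [f"other ({len(rest)} other grades)"],
    "unique": [rest["unique"].sum()],
})

result = pd.concat([top, other], ignore_index=True)
print(result)
```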

column values in a row

I have the following table:
id count hour age range
-------------------------------------
0 5 10 61 10-200
1 6 20 61 10-200
2 7 15 61 10-200
5 9 5 61 201-300
7 10 25 61 201-300
0 5 10 62 10-20
1 6 20 62 10-20
2 7 15 62 10-20
5 9 5 62 21-30
1 8 6 62 21-30
7 10 25 62 21-30
10 15 30 62 31-40
I need to select the distinct values of the range column.
I tried the following query:
Select distinct range as interval from table_name where age = 62;
Its result comes back as a column:
interval
----------
10-20
21-30
31-40
How can I get result as follows?
10-20, 21-30, 31-40
EDITED:
I am now trying following query:
select sys_connect_by_path(range,',') interval
from
(select distinct NVL(range,'0') range , ROW_NUMBER() OVER (ORDER BY RANGE) rn
from table_name where age = 62)
where connect_by_isleaf = 1 CONNECT BY rn = PRIOR rn+1 start with rn = 1;
Which is giving me output as:
Interval
----------------------------------------------------------------------------
, 10-20,10-20,10-20,21-30,21-30, 31-40
Please help me get my desired output.
If you are on 11.2 rather than just 11.1, you can use the LISTAGG aggregate function:
SELECT listagg( interval, ',' )
WITHIN GROUP( ORDER BY interval )
FROM (SELECT DISTINCT range AS interval
FROM table_name
WHERE age = 62)
If you are using an earlier version of Oracle, you could use one of the other Oracle string aggregation techniques on Tim Hall's page. Prior to 11.2, my personal preference would be to create a user-defined aggregate function so that you can then write:
SELECT string_agg( interval )
FROM (SELECT DISTINCT range AS interval
FROM table_name
WHERE age = 62)
If you don't want to create a function, however, you can use the ROW_NUMBER and SYS_CONNECT_BY_PATH approach, though that tends to be a bit harder to follow:
with x as (
SELECT DISTINCT range AS interval
FROM table_name
WHERE age = 62 )
select ltrim( max( sys_connect_by_path(interval, ','))
keep (dense_rank last order by curr),
',') range
from (select interval,
row_number() over (order by interval) as curr,
row_number() over (order by interval) -1 as prev
from x)
connect by prev = PRIOR curr
start with curr = 1
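All three variants do the same thing: take the distinct values first, then concatenate them in order. A small Python sketch of that logic, using the range values for age = 62 from the question, shows why skipping the distinct step repeats values as in the asker's attempt. `listagg_distinct` is a hypothetical helper name, not an Oracle function.

```python
def listagg_distinct(values, sep=","):
    """Concatenate the distinct values in sorted order, like LISTAGG over a DISTINCT subquery."""
    return sep.join(sorted(set(values)))


# The range column for the rows where age = 62, as listed in the question.
ranges_age_62 = ["10-20", "10-20", "10-20", "21-30", "21-30", "21-30", "31-40"]

with_distinct = listagg_distinct(ranges_age_62)
without_distinct = ",".join(sorted(ranges_age_62))  # duplicates survive without DISTINCT

print(with_distinct)
print(without_distinct)
```

The sort here is a plain string sort, which happens to give the right order for these values; ranges with different digit counts (for example "5-9" and "10-20") would need a numeric sort key.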
