I need to transpose a table in which column 1 is the name of an entity and columns 2 through 366 are dates in a year that hold a dollar amount. The table, the select statement and the output result are all given below.
Question - This syntax requires me to create a comma-separated list of columns - which are basically 365 dates - and use that list in the IN clause of the select statement.
Like this -
.....unpivot (cash for dates in ("01-01-2020" , "01-02-2020" , "01-03-2020"........."12-31-2020")) order by 2
Is there any better way of doing this, like with regular expressions? I don't want to type 365 dates in mm-dd-yyyy format and get carpal tunnel for my trouble.
Here is the table - First line is column header, second line is separator. 3rd, 4th and 5th lines are sample data.
Name     01-01-2020  01-02-2020  01-03-2020  ...  12-31-2020
-------------------------------------------------------------
Entity1       10.00       15.75       20.00  ...      100.00
Entity2       11.00       16.75       20.00  ...       10.00
Entity3      112.00      166.75       29.00  ...      108.00
I can transpose it using the select statement below
select * from Table1
unpivot (cash for dates in ("01-01-2020" , "01-02-2020" , "01-03-2020")) order by 2
to get an output like the one below -
Name      dates        cash
-------------------------------
Entity1   01-01-2020    10.00
Entity2   01-01-2020    11.00
Entity3   01-01-2020   112.00
... and so on
There is a simpler way to do this without UNPIVOT. Snowflake gives you a function to represent an entire row as an OBJECT -- a collection of key-value pairs. With that representation, you can FLATTEN each row's object and extract from each element both the column name (key = date) and the value inside (value = cash). Here is a query that will do it:
with obj as (
select OBJECT_CONSTRUCT(*) o from Table1
)
select o:NAME::varchar as name,
f.key::date as date,
f.value::float as cash
from obj,
lateral flatten (input => obj.o, mode => 'OBJECT') f
where f.key != 'NAME'
;
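One caveat worth checking: whether f.key::date parses keys like '01-01-2020' depends on your session's DATE_INPUT_FORMAT setting. If the cast fails, an explicit format string (assumed here from the column headers above) should work in its place:
to_date(f.key, 'MM-DD-YYYY') as date  -- parse the MM-DD-YYYY column names explicitly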
I am using the Coalesce function to return a value from my preferred ranking of columns, but I also want to include the name of the column that the value was derived from.
i.e.
Table:
Apples  Pears  Mangos
            4       5
SQL:
;with CTE as
(
    select Coalesce(Apples, Pears, Mangos) as QTY_Fruit
    from Table
)
select QTY_Fruit, <column name>  -- pseudocode: the column COALESCE picked
from CTE
Result:
QTY_Fruit  Col Name
        4  Pears
I am trying to avoid a case statement if possible because there are about 12 fields that I will need to use in my Coalesce. Would love an easy way to pull the column name based on the value in QTY_Fruit. I'm all ears if the answer lies outside the use of subqueries, but I figured this would be a start.
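One CASE-free pattern is to lay the columns out as prioritized rows with a lateral join and keep the first non-null one. This is only a sketch, assuming Postgres-style LATERAL with LIMIT (on SQL Server the same shape is CROSS APPLY ... TOP 1); the table name fruit is a stand-in for the question's table, and column names are taken from the example above:
select f.QTY_Fruit, f.Col_Name
from fruit t
cross join lateral (
    select v.qty as QTY_Fruit, v.col_name as Col_Name
    from (values (1, t.Apples, 'Apples'),
                 (2, t.Pears,  'Pears'),
                 (3, t.Mangos, 'Mangos')) as v(pref, qty, col_name)
    where v.qty is not null   -- skip the null columns
    order by v.pref           -- keep the COALESCE preference order
    limit 1                   -- first non-null wins, exactly like COALESCE
) f;
Extending this to the ~12 columns is just 12 rows in the VALUES list, with no CASE expression anywhere. (Use left join lateral ... on true instead if rows where all 12 columns are null must survive.)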
I have the two input tables below: table1 and table2.
Input: table1 (id and currency are of type string)
ID  Currency
----------------
1   USD,JPY,EUR
2   NOK,MXN
3   AUD
4   AUD,HKD
Input: table2 (exception_currency is of type string)
exception_currency
------------------
AUD
NOK
USD
HKD
The expected output is below. Exception is YES if any of the currency values in the Currency column from table1 is missing from exception_currency in table2.
For example, for id 1 the exception is YES because JPY and EUR are not available in table2.
ID  Currency     Exception
---------------------------
1   USD,JPY,EUR  YES
2   NOK,MXN      YES
3   AUD          NO
4   AUD,HKD      NO
I tried the code below but am not getting the expected results.
select
    id,
    currency,
    case when array_contains(split(t1.currency, ','), t2.exception_currency) then 'NO' else 'YES' end as exception
from table1 t1 left join table2 t2 on (t1.currency = t2.exception_currency);
First separate the comma-separated values into rows, then compare them with table2; joining on the whole currency string, as above, only matches rows that contain a single currency.
select t1.id, t1.currency,
       max(rs.excep) as exception
from table1 t1
join (select id,
             case when exception_currency is null then 'yes' else 'no' end as excep
      from (select id, sep_curr
            from table1 lateral view explode(split(currency, ',')) c as sep_curr) e
      left join table2 on exception_currency = sep_curr
     ) rs on rs.id = t1.id
group by t1.id, t1.currency
Here the lateral view generates one row per comma-separated value. Joining those rows against the exception table leaves exception_currency null for currencies with no match. Finally, max() shows 'yes' if at least one value is an exception (this works because 'yes' sorts after 'no'). For example, for id 1 the explode step yields the rows (1, USD), (1, JPY), (1, EUR); the left join leaves JPY and EUR unmatched, those rows become 'yes', and max() bubbles 'yes' up to the output.
I have the following table and I would like to get the number of nulls for each SEQ_ID
SEQ_ID  zScore for 7d  zScore for 14d  zScore for 21d  zScore for 28d  zScore for 35d
456     11.353         13.2922         9.0162          8.8533
789     8.5991         8.8244          5.7394
So for SEQ_ID 456 I would have 1 null
For SEQ_ID 789 I would have 2 nulls
Is there a way to do this in the Calculated column area in Spotfire, without writing complicated case statements with brute-force combinations of columns?
I guess you are looking for a Spotfire custom expression not involving R.
This would give you the number of columns that are not null; if you know the total number of columns, you can easily turn it into the number of null columns:
Len(RXReplace(Concatenate($map("[yourtable].$esc($csearch([yourtable],"*"))",",'-',")),'\\w+','Z','g')) -
Len(RXReplace(Concatenate($map("[yourtable].$esc($csearch([yourtable],"*"))",",'-',")),'\\w+','','g'))
[yourtable] would be the name of your data table. This acts on all columns.
I have been trying to model data in Cassandra and was trying to filter it based on date, as shown by the answer here on SO; the second answer there does not use ALLOW FILTERING.
This is my current schema,
CREATE TABLE banking.BankData (
    acctID TEXT,
    email TEXT,
    transactionDate DATE,
    transactionAmount DOUBLE,
    balance DOUBLE,
    currentTime TIMESTAMP,
    PRIMARY KEY ((acctID, transactionDate), currentTime)
) WITH CLUSTERING ORDER BY (currentTime DESC);
Now I have inserted data with
INSERT INTO banking.BankData (acctID, email, transactionDate, transactionAmount, balance, currentTime)
VALUES ('11', 'alpitanand20@gmail.com', '2013-04-03', 10010, 10010, toTimestamp(now()));
Now when I try to query, like
SELECT * FROM banking.BankData WHERE acctID = '11' AND transactionDate > '2012-04-03';
It tells me to use ALLOW FILTERING, although in the link mentioned above that was not needed.
The final requirement is to get data by year, month, week and so on; that's why I partitioned it by day. But the date range query is not working.
Any suggestion on remodeling, or am I doing something wrong?
Thanks
Cassandra supports only equality predicates on the partition key columns, so you can use only the = operation on them.
Range predicates (>, <, >=, <=) are supported only on clustering columns, and only on the last clustering column referenced in the condition.
For example, if you have the following primary key: (pk, c1, c2, c3), you can have a range predicate as follows:
where pk = xxxx and c1 > yyyy
where pk = xxxx and c1 = yyyy and c2 > zzzz
where pk = xxxx and c1 = yyyy and c2 = zzzz and c3 > wwww
but you can't have:
where pk = xxxx and c2 > zzzz
where pk = xxxx and c3 > zzzz
because you need to restrict the previous clustering columns before using a range operation.
If you want to perform a range query on this data, you need to declare the corresponding column as a clustering column, like this:
PRIMARY KEY(acctID, transactionDate, currentTime )
in this case you can perform your query. But because you have a time component, you can simply do:
PRIMARY KEY(acctID, currentTime )
and do the query like this:
SELECT * FROM banking.BankData WHERE acctID = '11'
AND currentTime > '2012-04-03T00:00:00Z';
But you need to take 2 things into consideration:
your primary key should be unique - maybe you'll need to add another clustering column, like a transaction ID (for example, of uuid type) - in this case, even if 2 transactions happen in the same millisecond, they won't overwrite each other;
if you have a lot of transactions per account, then you may need to add another column to the partition key, for example year, or year/month, so you don't get big partitions - both points are sketched below.
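Putting both points together, a remodeled table might look like the sketch below. The year bucket column and the txnID uuid column are assumptions for illustration, not part of the original schema:
CREATE TABLE banking.BankData (
    acctID TEXT,
    year INT,                    -- partition bucket, keeps partitions bounded
    currentTime TIMESTAMP,
    txnID UUID,                  -- keeps rows unique within the same millisecond
    email TEXT,
    transactionDate DATE,
    transactionAmount DOUBLE,
    balance DOUBLE,
    PRIMARY KEY ((acctID, year), currentTime, txnID)
) WITH CLUSTERING ORDER BY (currentTime DESC, txnID DESC);

-- Range queries then stay inside one partition, no ALLOW FILTERING needed:
SELECT * FROM banking.BankData
WHERE acctID = '11' AND year = 2013
  AND currentTime > '2013-04-03T00:00:00Z';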
P.S. In the linked answer, the use of a non-equality operation is possible because ts is a clustering column.
Table A: Product Attributes
This table contains two columns; the first one is a unique product ID represented by an integer, the second is a string containing a collection of attributes assigned to that product.
product tags
100 chocolate, sprinkles
101 chocolate, sprinkles
102 glazed
Table B: Customer Attributes
The second table contains two columns as well; the first one is a string that contains a customer name, the second is an integer that contains a product number. The product IDs from column two are the same as the product IDs from column one of Table A.
customer product
A 100
A 101
B 101
C 100
C 102
B 101
A 100
C 102
Generated Table
I want to create a table matching this format, where the contents of the cells represent the count of occurrences of product attribute by customer.
customer chocolate sprinkles glazed
A ? ? ?
B ? ? ?
C ? ? ?
I want counts in place of the question marks.
And I want to do this in Python.
One more question: If the two starting tables were in a relational database or Hadoop cluster and each had 100 million rows, how might my approach change?
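For the scale question, the usual change is to push the join and aggregation into the engine instead of pulling 100 million rows into Python. A sketch in SQL, assuming the tag strings have first been normalized into one (product, tag) row per attribute - the table and column names here are hypothetical:
select b.customer,
       sum(case when a.tag = 'chocolate' then 1 else 0 end) as chocolate,
       sum(case when a.tag = 'sprinkles' then 1 else 0 end) as sprinkles,
       sum(case when a.tag = 'glazed'    then 1 else 0 end) as glazed
from customer_products b
join product_tags a on a.product = b.product   -- one row per product/tag pair
group by b.customer;
The same conditional-aggregation pivot runs on Hive or Spark SQL as well, so the Hadoop case changes the engine but not the query shape.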