Need help with Snowflake pivot

I am trying to pivot some data in Snowflake, but in all honesty, I don't really understand it. The data looks like this:
Create table Company_rank (company_name varchar(100), Public_rank varchar(20), Peer_rank varchar(20), Online_rank varchar(20), Company_id integer);
Insert into Company_rank (company_name , Public_rank , Peer_rank , Online_rank , Company_id )
VALUES ('ABCCompany', '20','35', '15',1)
,('BCDCompany', '25','32', '20',2)
,('DEFCompany', '18','20', '25',3);
What I need to see is the rankings as rows for each company, which I can then use to join to another table. I need company_id to stay as a column, because I need it for the join, but I don't think that's possible? So, basically, I need the type of ranking and the company_name to be available for joining, as the other table has ranking_name and company_id.
Sorry if this seems jumbled; this is my first post.
I was thinking of this:
[Image: expected results]
But then I've lost the ability to get a key from another table, which needs both the rank_type, and the company_id (this can be derived from company_name or company_id)
So ideally the end result, I would have three tables, which would look like this (company_rank is my staged data, which I am trying to get into these):
[Image: final results]

I believe what you are looking for is just an UNPIVOT.
Create table Company_rank (company_name varchar(100), Public_rank varchar(20), Peer_rank varchar(20), Online_rank varchar(20), Company_id integer);
Insert into Company_rank (company_name , Public_rank , Peer_rank , Online_rank , Company_id )
VALUES ('ABCCompany', '20','35', '15',1)
,('BCDCompany', '25','32', '20',2)
,('DEFCompany', '18','20', '25',3);
-- unpivot the above table
select
company_id
, company_name
, rank_type
, rank_value
from company_rank
unpivot(
rank_value for rank_type in (
Public_rank
, Online_rank
, Peer_rank
)
);
Output:
COMPANY_ID | COMPANY_NAME | RANK_TYPE   | RANK_VALUE
-----------+--------------+-------------+-----------
1          | ABCCompany   | PUBLIC_RANK | 20
1          | ABCCompany   | ONLINE_RANK | 15
1          | ABCCompany   | PEER_RANK   | 35
2          | BCDCompany   | PUBLIC_RANK | 25
2          | BCDCompany   | ONLINE_RANK | 20
2          | BCDCompany   | PEER_RANK   | 32
3          | DEFCompany   | PUBLIC_RANK | 18
3          | DEFCompany   | ONLINE_RANK | 25
3          | DEFCompany   | PEER_RANK   | 20
Your example output image has each company as a column.
I would advise doing that transposition in a downstream reporting tool. I assume the set of companies is dynamic, so the number of columns would not be deterministic (among other issues). It would also be impossible to join on the company id, because each column would be a company.
This format will allow you to join the company, as well as filter on the company and type of rank.
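To make the reshaping concrete, here is a minimal Python sketch of what the unpivot does to the staged rows, and of how the long format can then be joined on (company_id, rank_type). The `rank_keys` lookup and its key values are made up for illustration; they stand in for the other table mentioned in the question.

```python
# Wide rows as staged in company_rank (values copied from the question).
company_rank = [
    {"company_id": 1, "company_name": "ABCCompany",
     "public_rank": "20", "peer_rank": "35", "online_rank": "15"},
    {"company_id": 2, "company_name": "BCDCompany",
     "public_rank": "25", "peer_rank": "32", "online_rank": "20"},
    {"company_id": 3, "company_name": "DEFCompany",
     "public_rank": "18", "peer_rank": "20", "online_rank": "25"},
]

# Hypothetical second table keyed by (company_id, rank_type); the ids are invented.
rank_keys = {(1, "public_rank"): 101, (2, "peer_rank"): 202}


def unpivot(rows, value_columns):
    """Turn each wide row into one long row per value column."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_rows.append({
                "company_id": row["company_id"],
                "company_name": row["company_name"],
                "rank_type": column,
                "rank_value": row[column],
            })
    return long_rows


long_rows = unpivot(company_rank, ["public_rank", "online_rank", "peer_rank"])

# Join on the composite key; rows without a match get rank_key = None.
joined = [
    dict(r, rank_key=rank_keys.get((r["company_id"], r["rank_type"])))
    for r in long_rows
]
```

Because company_id and rank_type are both ordinary columns in the long format, the composite key lookup works row by row, which is not possible when each company is its own column.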

Related

Update column value in Cassandra table if value exists

I have a Cassandra table as below
CREATE TABLE inventory(
prodid varchar,
loc varchar,
qty float,
PRIMARY KEY (prodid)
) ;
Requirement :
For the provided primary key, if no record exists in the table, we need to insert one, which is straightforward. But when a record already exists for the primary key, we need to update the qty column by adding the newly received value to the existing value in the table.
As I understand it, I need to query the table first for the provided primary key, get the value of the qty column, add the new value received in the request, and execute the update query as a lightweight transaction.
Ex: the table has qty 10 for prodid=1, and if I receive a new qty of 2 from the user (which is a delta), then I need to update qty to 12 for prodid=1.
Is that logic correct, or is there a better way to design the table or handle the use case? Will this approach introduce latency issues under load, since we need to run a select query first and, if data exists, update the column with the new value? Please help.
You can change the qty column to static. This way you do not have to update the table, only insert. In Cassandra, INSERT and UPDATE are both upserts, so an INSERT for an existing primary key overwrites the stored value. Note that static columns are only allowed in tables with at least one clustering column, so loc is made a clustering column here. Your table definition should be:
CREATE TABLE inventory(
prodid varchar,
loc varchar,
qty float static,
PRIMARY KEY (prodid, loc) ) ;
You can then use your business logic to calculate the new value of the qty column and write it with an INSERT statement, which in turn updates the same column.
The other way is to use a counter column:
CREATE TABLE inventory(
prodid varchar,
loc varchar,
qty counter,
PRIMARY KEY (prodid, loc ) ) ;
With this design you can just use an update query like the one below, specifying the full primary key (<location> is a placeholder):
update inventory set qty = qty + <calculated Quantity> where prodid = '1' and loc = '<location>';
Notice that, in the second table design, all other columns have to be part of the primary key. In your case, that is easy and convenient.
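For comparison, the read-then-conditional-update flow described in the question can be sketched in memory as below. This is not the Cassandra driver API: the `table` dict stands in for the inventory table, and the hypothetical helper `conditional_update` plays the role of a lightweight transaction such as `UPDATE ... IF qty = ?`, which only applies when the stored value is still the one that was read.

```python
# prodid -> qty; an in-memory stand-in for the inventory table.
table = {"1": 10.0}


def conditional_update(store, prodid, expected, new_value):
    """Write new_value only if the current value equals expected (compare-and-set)."""
    if store.get(prodid) != expected:
        return False  # someone else changed the row since it was read
    store[prodid] = new_value
    return True


def add_quantity(store, prodid, delta, max_attempts=5):
    """Insert delta for a new key, otherwise add it to the stored qty."""
    for _ in range(max_attempts):
        current = store.get(prodid)  # the read step
        new_value = delta if current is None else current + delta
        if conditional_update(store, prodid, current, new_value):
            return new_value
        # Condition failed: loop to re-read and try again.
    raise RuntimeError("too many concurrent updates")


add_quantity(table, "1", 2)  # existing row: 10 + 2
add_quantity(table, "2", 5)  # new row: inserted with the delta
```

The retry loop is the cost of this approach: every write is a read plus a conditional write, which is why the counter design avoids it.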

viewing as list in cassandra

Table
CREATE TABLE vehicle_details (
owner_name text,
vehicle list<text>,
price float,
vehicle_type text,
PRIMARY KEY(price , vehicle_type)
)
I have two issues over here
I am trying to view the list of the vehicle per user. If owner1 has 2 cars then it should show as owner_name1 vehicle1 & owner_name1 vehicle2. is it possible to do with a select query?
The output I am expecting
owner_name_1 | vehicle_1
owner_name_1 | vehicle_2
owner_name_2 | vehicle_1
owner_name_2 | vehicle_2
owner_name_2 | vehicle_3
I am trying to use owner_name in the primary key but whenever I use WHERE or DISTINCT or ORDER BY it does not work properly. I am going to query price, vehicle_type most of the time. but Owner_name would be unique hence I am trying to use it. I tried several combinations.
Below are three combinations I tried.
PRIMARY KEY(owner_name, price, vehicle_type) WITH CLUSTERING ORDER BY (price)
PRIMARY KEY((owner_name, price), vehicle_type)
PRIMARY KEY((owner_name, vehicle_type), price) WITH CLUSTERING ORDER BY (price)
Queries I am running
SELECT owner_name, price, vehicle_type from vehicle_details WHERE vehicle_type='SUV';
SELECT owner_name, price, vehicle_type from vehicle_details WHERE vehicle_type='SUV' ORDER BY price desc;
Since your table has:
PRIMARY KEY(price , vehicle_type)
you can only run queries with filters on the partition key (price) or the partition key + clustering column (price + vehicle_type):
SELECT ... FROM ... WHERE price = ?
SELECT ... FROM ... WHERE price = ? AND vehicle_type = ?
If you want to be able to query by owner name, you need to create a new table which is partitioned by owner_name. I also recommend not storing the vehicle in a collection:
CREATE TABLE vehicles_by_owner (
owner_name text,
vehicle text,
...
PRIMARY KEY (owner_name, vehicle)
)
By using vehicle as a clustering column, each owner will have rows of vehicles in the table. Cheers!
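If the list<text> collection is kept for now, the one-row-per-vehicle view from the question can also be produced in application code after reading the rows. A minimal sketch, with sample data taken from the expected output in the question; the `explode` helper is a made-up name, not a driver function:

```python
# Rows as they would come back with the vehicle list<text> column intact.
vehicle_details = [
    {"owner_name": "owner_name_1", "vehicle": ["vehicle_1", "vehicle_2"]},
    {"owner_name": "owner_name_2",
     "vehicle": ["vehicle_1", "vehicle_2", "vehicle_3"]},
]


def explode(rows):
    """Return one (owner_name, vehicle) pair per list element."""
    return [(row["owner_name"], v) for row in rows for v in row["vehicle"]]


for owner, vehicle in explode(vehicle_details):
    print(f"{owner} | {vehicle}")
```

The pairs it produces have the same shape as the rows of a table clustered by vehicle, so the same function could be used to backfill such a table.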

Cassandra where clause as a tuple

Table12
CustomerId CampaignID
1 1
1 2
2 3
1 3
4 2
4 4
5 5
val CustomerToCampaign = ((1,1),(1,2),(2,3),(1,3),(4,2),(4,4),(5,5))
Is it possible to write a query like
select CustomerId, CampaignID from Table12 where (CustomerId, CampaignID) in (CustomerToCampaign_1, CustomerToCampaign_2)
???
So the input is a tuple but the columns are not tuple but rather individual columns.
Sure, it's possible, but only on the clustering keys. That means I need to use something else as a partition key or "bucket." For this example, I'll assume that marketing campaigns are time sensitive and that we'll get a good distribution and ease of querying by using "month" as the bucket (partition).
CREATE TABLE stackoverflow.customertocampaign (
campaign_month int,
customer_id int,
campaign_id int,
customer_name text,
PRIMARY KEY (campaign_month, customer_id, campaign_id)
);
Now, I can INSERT the data described in your CustomerToCampaign variable. Then, this query works:
aploetz@cqlsh:stackoverflow> SELECT campaign_month, customer_id, campaign_id
FROM customertocampaign WHERE campaign_month=202004
AND (customer_id,campaign_id) = (1,2);
campaign_month | customer_id | campaign_id
----------------+-------------+-------------
202004 | 1 | 2
(1 rows)
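As a rough mental model of that query, here is a small Python sketch: rows live inside a partition (the month bucket), sorted by the clustering pair, and the tuple condition filters on that pair. The `partitions` dict and `select` function are illustrative stand-ins, and putting all the pairs in month 202004 is an assumption made for the example.

```python
# (customer_id, campaign_id) pairs from the question.
customer_to_campaign = [(1, 1), (1, 2), (2, 3), (1, 3), (4, 2), (4, 4), (5, 5)]

# partition key -> rows sorted by the clustering columns.
partitions = {202004: sorted(customer_to_campaign)}


def select(campaign_month, wanted_pairs):
    """Return rows of one partition whose clustering pair is in wanted_pairs."""
    wanted = set(wanted_pairs)
    return [
        (campaign_month, customer_id, campaign_id)
        for customer_id, campaign_id in partitions.get(campaign_month, [])
        if (customer_id, campaign_id) in wanted
    ]


single = select(202004, [(1, 2)])
several = select(202004, [(1, 2), (4, 4)])
```

Here `single` corresponds to the one-row result above, and `several` shows the same filter with more than one tuple.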

SELECT range from Map in Cassandra

I have a table in Cassandra and I wish to select the last 10 blobs from usermgmt.user_history.history
CREATE TABLE usermgmt.user_history (
id uuid,
history Map<timeuuid, blob>,
PRIMARY KEY(id)
);
I feel like this was easy with the five-year-old Cassandra design of ordered, typed column names. But now I can't find a way to select a range of the last 10 entries in recent Cassandra 3.0.
How about this:
CREATE TABLE usermgmt.user_history (
id uuid,
history_time timestamp,
history_blob blob,
PRIMARY KEY(id, history_time)
);
Then,
SELECT * FROM usermgmt.user_history WHERE id = your-uuid ORDER BY history_time DESC LIMIT 10;
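Since the question asks for the last 10 entries, the rows have to be ordered newest-first before the limit is applied. A small Python sketch of that ordering, using 25 made-up history rows for a single id (the timestamps and blob contents are invented):

```python
from datetime import datetime, timedelta

start = datetime(2020, 1, 1)

# (history_time, history_blob) rows for one id; the data is made up.
history = [
    (start + timedelta(minutes=i), f"blob-{i}".encode()) for i in range(25)
]


def last_n(rows, n=10):
    """Newest-first, then cut to n rows, like a descending order plus a limit."""
    return sorted(rows, key=lambda r: r[0], reverse=True)[:n]


latest = last_n(history)
```

With an ascending sort the same slice would return the 10 oldest entries instead, which is the opposite of what was asked.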

cql-import dynamic map Entries in cassandra

I have 2 mysql tables as given below
Table Employee:
id int,
name varchar
Table Emails
emp_id int,
email_add varchar
Table Emails & Employee are connected by employee.id = emails.emp_id
I have entries like:
mysql> select * from employee;
id name
1 a
2 b
3 c
mysql> select * from emails;
emp_id email_add
1 aa@gmail.com
1 aaa@gmail.com
1 aaaa@gmail.com
2 bb@gmail.com
2 bbb@gmail.com
3 cc@gmail.com
6 rows in set (0.02 sec)
Now I want to import the data into Cassandra in the 2 formats below
---format 1---
table in cassandra : emp_details:
id , name , email map<text,text>
i.e. data should be like
1 , a , {'email_1' : 'aa@gmail.com' , 'email_2' : 'aaa@gmail.com' , 'email_3' : 'aaaa@gmail.com'}
2 , b , {'email_1' : 'bb@gmail.com' , 'email_2' : 'bbb@gmail.com'}
3 , c , {'email_1' : 'cc@gmail.com'}
---- format 2 ----
I want to have dynamic columns like
id , name, email_1 , email_2 , email_3 .... email_n
Please help me for the same. My main concern is to import data from mysql into above 2 formats.
Edit: change list to map
Logically, you don't expect a user to have more than 1000 emails, so I would suggest using Map<text, text> or even List<text>. It's a good fit for CQL collections.
CREATE TABLE users (
id int,
name text,
emails map<text,text>,
PRIMARY KEY(id)
);
INSERT INTO users(id,name,emails)
VALUES(1, 'a', {'email_1': 'aa@gmail.com', 'email_2': 'aaa@gmail.com', 'email_3': 'aaaa@gmail.com'});
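For the import itself, the MySQL rows need to be grouped per employee and given generated map keys before they are written. A minimal sketch of that transformation in Python, using the sample rows from the question; how the rows are read from MySQL and written to Cassandra is left out, and `build_email_maps` is a made-up helper name.

```python
from collections import defaultdict

# Sample rows from the two MySQL tables in the question.
employees = [(1, "a"), (2, "b"), (3, "c")]
emails = [
    (1, "aa@gmail.com"), (1, "aaa@gmail.com"), (1, "aaaa@gmail.com"),
    (2, "bb@gmail.com"), (2, "bbb@gmail.com"),
    (3, "cc@gmail.com"),
]


def build_email_maps(email_rows):
    """Group addresses by emp_id under generated keys email_1, email_2, ..."""
    grouped = defaultdict(dict)
    for emp_id, address in email_rows:
        key = f"email_{len(grouped[emp_id]) + 1}"
        grouped[emp_id][key] = address
    return grouped


email_maps = build_email_maps(emails)

# One (id, name, emails) record per employee, ready to bind to an INSERT.
records = [
    (emp_id, name, email_maps.get(emp_id, {})) for emp_id, name in employees
]
```

Each record lines up with the (id, name, emails) columns of the users table, and an employee with no emails simply gets an empty map.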
