CQL: import dynamic map entries in Cassandra

I have two MySQL tables, as given below:
Table Employee:
id int,
name varchar
Table Emails:
emp_id int,
email_add varchar
The Employee and Emails tables are joined on employee.id = emails.emp_id.
I have entries like:
mysql> select * from employee;
id name
1 a
2 b
3 c
mysql> select * from emails;
emp_id email_add
1 aa@gmail.com
1 aaa@gmail.com
1 aaaa@gmail.com
2 bb@gmail.com
2 bbb@gmail.com
3 cc@gmail.com
6 rows in set (0.02 sec)
Now I want to import the data into Cassandra in the two formats below.
---format 1---
table in cassandra : emp_details:
id, name, email map<text,text>
i.e. the data should look like:
1, a, {'email_1': 'aa@gmail.com', 'email_2': 'aaa@gmail.com', 'email_3': 'aaaa@gmail.com'}
2, b, {'email_1': 'bb@gmail.com', 'email_2': 'bbb@gmail.com'}
3, c, {'email_1': 'cc@gmail.com'}
---- format 2 ----
I want dynamic columns like:
id, name, email_1, email_2, email_3 ... email_n
My main concern is importing the data from MySQL into the two formats above.

Edit: change list to map
Logically, you wouldn't expect a user to have more than 1,000 emails, so I would suggest using map<text, text> or even list<text>. It's a good fit for CQL collections.
CREATE TABLE users (
id int,
name text,
emails map<text,text>,
PRIMARY KEY(id)
);
INSERT INTO users(id, name, emails)
VALUES(1, 'a', {'email_1': 'aa@gmail.com', 'email_2': 'aaa@gmail.com', 'email_3': 'aaaa@gmail.com'});
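To generate those INSERTs from the MySQL rows, the grouping step can be sketched in plain Python (data taken from the question; the MySQL and Cassandra driver calls are omitted, so this shows only the transformation):

```python
from collections import defaultdict

def emails_to_map(email_rows):
    """Group (emp_id, email_add) rows into {emp_id: {'email_1': addr, ...}}."""
    grouped = defaultdict(dict)
    for emp_id, email_add in email_rows:
        # next key index = number of emails already seen for this emp_id + 1
        key = 'email_%d' % (len(grouped[emp_id]) + 1)
        grouped[emp_id][key] = email_add
    return dict(grouped)

rows = [(1, 'aa@gmail.com'), (1, 'aaa@gmail.com'), (1, 'aaaa@gmail.com'),
        (2, 'bb@gmail.com'), (2, 'bbb@gmail.com'), (3, 'cc@gmail.com')]
print(emails_to_map(rows)[1])
# {'email_1': 'aa@gmail.com', 'email_2': 'aaa@gmail.com', 'email_3': 'aaaa@gmail.com'}
```

Each resulting dict can then be bound to a prepared `INSERT INTO users(id, name, emails)` statement; with the DataStax Python driver a Python dict binds directly to a CQL map column.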

Related

Need help with Snowflake pivot

I am trying to pivot some data in Snowflake, but in all honesty, I don't really understand it. The data is like this:
Create table Company_rank (company_name varchar(100), Public_rank varchar(20), Peer_rank varchar(20), Online_rank varchar(20), Company_id integer);
Insert into Company_rank (company_name, Public_rank, Peer_rank, Online_rank, Company_id)
VALUES ('ABCCompany', '20', '35', '15', 1),
('BCDCompany', '25', '32', '20', 2),
('DEFCompany', '18', '20', '25', 3);
What I need to see is the rankings as rows, which I can use to join to another table for each company. I need the company_id to stay as a column, since I need it for joining, but I don't think that's possible? So, basically, I need the type of ranking and the company_name to be available for joining, as another table has ranking_name and company_id.
Sorry if this seems jumbled; this is my first post.
I was thinking of this:
(image: expected results)
But then I've lost the ability to get a key from another table, which needs both the rank_type, and the company_id (this can be derived from company_name or company_id)
So ideally the end result, I would have three tables, which would look like this (company_rank is my staged data, which I am trying to get into these):
(image: final results)
I believe what you are looking for is just an UNPIVOT.
Create table Company_rank (company_name varchar(100), Public_rank varchar(20), Peer_rank varchar(20), Online_rank varchar(20), Company_id integer);
Insert into Company_rank (company_name , Public_rank , Peer_rank , Online_rank , Company_id )
VALUES ('ABCCompany', '20','35', '15',1)
,('BCDCompany', '25','32', '20',2)
,('DEFCompany', '18','20', '25',3);
-- unpivot the above table
select
company_id
, company_name
, rank_type
, rank_value
from company_rank
unpivot(
rank_value for rank_type in (
Public_rank
, Online_rank
, Peer_rank
)
);
Output:
COMPANY_ID | COMPANY_NAME | RANK_TYPE   | RANK_VALUE
-----------+--------------+-------------+-----------
1          | ABCCompany   | PUBLIC_RANK | 20
1          | ABCCompany   | ONLINE_RANK | 15
1          | ABCCompany   | PEER_RANK   | 35
2          | BCDCompany   | PUBLIC_RANK | 25
2          | BCDCompany   | ONLINE_RANK | 20
2          | BCDCompany   | PEER_RANK   | 32
3          | DEFCompany   | PUBLIC_RANK | 18
3          | DEFCompany   | ONLINE_RANK | 25
3          | DEFCompany   | PEER_RANK   | 20
Your example output image has each company as a column.
I would advise achieving that transposition in a downstream reporting tool, since (I assume) the companies are dynamic, so the number of columns would not be deterministic (among other issues). Additionally, it would be impossible to join on the company id, as each column would be a company.
This format lets you join on the company, as well as filter on the company and the type of rank.
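For intuition, the same wide-to-long reshaping can be sketched in plain Python (column names as above; this only illustrates what UNPIVOT does, it is not a replacement for the SQL):

```python
def unpivot(rows, id_cols, value_cols):
    """Reshape wide dict-rows into long (ids..., COL_NAME, value) tuples."""
    out = []
    for row in rows:
        ids = tuple(row[c] for c in id_cols)
        for col in value_cols:
            # one output row per (input row, value column) pair
            out.append(ids + (col.upper(), row[col]))
    return out

wide = [{'company_id': 1, 'company_name': 'ABCCompany',
         'public_rank': '20', 'peer_rank': '35', 'online_rank': '15'}]
long_rows = unpivot(wide, ['company_id', 'company_name'],
                    ['public_rank', 'online_rank', 'peer_rank'])
print(long_rows[0])  # (1, 'ABCCompany', 'PUBLIC_RANK', '20')
```

Each wide row fans out into one long row per rank column, which is exactly the joinable shape the answer's query produces.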

Viewing as a list in Cassandra

Table
CREATE TABLE vehicle_details (
owner_name text,
vehicle list<text>,
price float,
vehicle_type text,
PRIMARY KEY(price , vehicle_type)
)
I have two issues over here
I am trying to view the list of vehicles per owner. If owner1 has two cars, then it should show as owner_name1 vehicle1 and owner_name1 vehicle2. Is this possible with a SELECT query?
The output I am expecting
owner_name_1 | vehicle_1
owner_name_1 | vehicle_2
owner_name_2 | vehicle_1
owner_name_2 | vehicle_2
owner_name_2 | vehicle_3
I am trying to use owner_name in the primary key, but whenever I use WHERE, DISTINCT, or ORDER BY it does not work properly. I will query by price and vehicle_type most of the time, but owner_name would be unique, hence I am trying to use it. I tried several combinations.
Below are three combinations I tried.
PRIMARY KEY(owner_name, price, vehicle_type) WITH CLUSTERING ORDER BY (price)
PRIMARY KEY((owner_name, price), vehicle_type)
PRIMARY KEY((owner_name, vehicle_type), price) WITH CLUSTERING ORDER BY (price)
Queries I am running
SELECT owner_name, price, vehicle_type FROM vehicle_details WHERE vehicle_type='SUV';
SELECT owner_name, price, vehicle_type FROM vehicle_details WHERE vehicle_type='SUV' ORDER BY price DESC;
Since your table has:
PRIMARY KEY(price , vehicle_type)
you can only run queries with filters on the partition key (price) or the partition key + clustering column (price + vehicle_type):
SELECT ... FROM ... WHERE price = ?
SELECT ... FROM ... WHERE price = ? AND vehicle_type = ?
If you want to be able to query by owner name, you need to create a new table which is partitioned by owner_name. I also recommend not storing the vehicle in a collection:
CREATE TABLE vehicles_by_owner (
owner_name text,
vehicle text,
...
PRIMARY KEY (owner_name, vehicle)
)
By using vehicle as a clustering column, each owner will have rows of vehicles in the table. Cheers!
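If data already exists in the old collection-based table, moving it into vehicles_by_owner is a simple flattening step; a minimal sketch in plain Python (names from the question, driver calls omitted):

```python
def flatten_vehicles(rows):
    """Expand (owner_name, [vehicle, ...]) into one (owner_name, vehicle) pair per vehicle."""
    return [(owner, v) for owner, vehicles in rows for v in vehicles]

old_rows = [('owner_name_1', ['vehicle_1', 'vehicle_2']),
            ('owner_name_2', ['vehicle_1', 'vehicle_2', 'vehicle_3'])]
for owner, vehicle in flatten_vehicles(old_rows):
    # each pair becomes one INSERT INTO vehicles_by_owner (owner_name, vehicle)
    print(owner, '|', vehicle)
```

The printed pairs match the row-per-vehicle output the question asks for.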

Cassandra create duplicate table with different primary key

I'm new to Apache Cassandra and have the following issue:
I have a table with PRIMARY KEY (userid, countrycode, carid). As described in many tutorials this table can be queried by using following filter criteria:
userid = x
userid = x and countrycode = y
userid = x and countrycode = y and carid = z
This is fine for most cases, but now I need to query the table by filtering only on
userid = x and carid = z
Here, the documentation says that the best solution is to create another table with a modified primary key, in this case PRIMARY KEY (userid, carid, countrycode).
The question here is how to copy the data from the "original" table to the new one with the different primary key, both on small tables and on huge tables.
And another important question concerning the duplication of a huge table: What about the storage needed to save both tables instead of only one?
You can use the COPY command to export from one table and import into the other.
Following your example, I created two tables, user_country and user_car, with the respective primary keys.
CREATE KEYSPACE user WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 2 } ;
CREATE TABLE user.user_country ( user_id text, country_code text, car_id text, PRIMARY KEY (user_id, country_code, car_id));
CREATE TABLE user.user_car ( user_id text, country_code text, car_id text, PRIMARY KEY (user_id, car_id, country_code));
Let's insert some dummy data into one table.
cqlsh> INSERT INTO user.user_country (user_id, country_code, car_id) VALUES ('1', 'IN', 'CAR1');
cqlsh> INSERT INTO user.user_country (user_id, country_code, car_id) VALUES ('2', 'IN', 'CAR2');
cqlsh> INSERT INTO user.user_country (user_id, country_code, car_id) VALUES ('3', 'IN', 'CAR3');
cqlsh> select * from user.user_country ;
user_id | country_code | car_id
---------+--------------+--------
3 | IN | CAR3
2 | IN | CAR2
1 | IN | CAR1
(3 rows)
Now we will export the data into a CSV. Observe the sequence of columns mentioned.
cqlsh> COPY user.user_country (user_id,car_id, country_code) TO 'export.csv';
Using 1 child processes
Starting copy of user.user_country with columns [user_id, car_id, country_code].
Processed: 3 rows; Rate: 4 rows/s; Avg. rate: 4 rows/s
3 rows exported to 1 files in 0.824 seconds.
export.csv can now be directly inserted into other table.
cqlsh> COPY user.user_car(user_id,car_id, country_code) FROM 'export.csv';
Using 1 child processes
Starting copy of user.user_car with columns [user_id, car_id, country_code].
Processed: 3 rows; Rate: 6 rows/s; Avg. rate: 8 rows/s
3 rows imported from 1 files in 0.359 seconds (0 skipped).
cqlsh>
cqlsh>
cqlsh> select * from user.user_car ;
user_id | car_id | country_code
---------+--------+--------------
3 | CAR3 | IN
2 | CAR2 | IN
1 | CAR1 | IN
(3 rows)
cqlsh>
About your other question - yes, the data will be duplicated, but that is how Cassandra is typically used: you denormalize and duplicate data so that each table serves a specific query.
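The reason the column list matters is that COPY matches CSV fields to columns purely by position (the exported file has no header row). A minimal sketch of that round trip with Python's csv module, using an in-memory buffer standing in for export.csv:

```python
import csv
import io

# rows as exported by: COPY user.user_country (user_id, car_id, country_code) TO 'export.csv'
exported = [('3', 'CAR3', 'IN'), ('2', 'CAR2', 'IN'), ('1', 'CAR1', 'IN')]

buf = io.StringIO()                      # stands in for export.csv on disk
csv.writer(buf).writerows(exported)
buf.seek(0)

# COPY ... FROM maps each CSV field to the given column list positionally
imported = [tuple(r) for r in csv.reader(buf)]
assert imported == exported              # no header: field order is everything
```

Exporting with the target table's column order (user_id, car_id, country_code) is what lets the same file be imported into user_car unchanged.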

Cassandra where clause as a tuple

Table12
CustomerId CampaignID
1 1
1 2
2 3
1 3
4 2
4 4
5 5
val CustomerToCampaign = ((1,1),(1,2),(2,3),(1,3),(4,2),(4,4),(5,5))
Is it possible to write a query like:
select CustomerId, CampaignID from Table12 where (CustomerId, CampaignID) in (CustomerToCampaign_1, CustomerToCampaign_2)
So the input is a tuple, but the columns are not a tuple but rather individual columns.
Sure, it's possible, but only on the clustering keys. That means you need to use something else as a partition key or "bucket." For this example, I'll assume that marketing campaigns are time-sensitive, and that we'll get good distribution and ease of querying by using "month" as the bucket (partition).
CREATE TABLE stackoverflow.customertocampaign (
campaign_month int,
customer_id int,
campaign_id int,
customer_name text,
PRIMARY KEY (campaign_month, customer_id, campaign_id)
);
Now, I can INSERT the data described in your CustomerToCampaign variable. Then, this query works:
aploetz@cqlsh:stackoverflow> SELECT campaign_month, customer_id, campaign_id
FROM customertocampaign WHERE campaign_month=202004
AND (customer_id,campaign_id) = (1,2);
campaign_month | customer_id | campaign_id
----------------+-------------+-------------
202004 | 1 | 2
(1 rows)
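The WHERE clause compares the clustering columns as an ordinary tuple, so the multi-tuple form the question asks about behaves like plain tuple membership; the equivalent check in Python (data from the question) is:

```python
customer_to_campaign = [(1, 1), (1, 2), (2, 3), (1, 3), (4, 2), (4, 4), (5, 5)]
wanted = {(1, 2), (4, 4)}   # the tuples you would list in the IN (...) clause

# CQL's (customer_id, campaign_id) IN (...) is plain tuple membership
matches = [t for t in customer_to_campaign if t in wanted]
print(matches)  # [(1, 2), (4, 4)]
```

In CQL the corresponding query would be `... WHERE campaign_month=202004 AND (customer_id, campaign_id) IN ((1, 2), (4, 4))`, again allowed only on clustering columns.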

Not getting exact output using user-defined data types in Cassandra

In Cassandra, I created a user-defined data type:
cqlsh:test> create type fullname ( firstname text, lastname text );
Then I created a table with that data type and inserted into it like this:
cqlsh:test> create table people ( id UUID primary key, names set < frozen <fullname>> );
cqlsh:test> insert into people (id, names) values (
... now(),
... {{firstname: 'Jim', lastname: 'Jones'}}
... );
When I query the table, I get output with some garbled values like this:
cqlsh:test> SELECT * from people ;
id | names
--------------------------------------+--------------------------------------------
3a59e2e0-14df-11e5-8999-abcdb7df22fc | {\x00\x00\x00\x03Jim\x00\x00\x00\x05Jones}
How can I get output like this?
select * from people;
id | names
--------------------------------------+-----------------------------------------
69ba9d60-a06b-11e4-9923-0fa29ba414fb | {{firstname: 'Jim', lastname: 'Jones'}}
