Cassandra - Nodejs - Issue while retrieving list type values - node.js

For example, below is the table structure:
CREATE TABLE table_name(
name text,
id text PRIMARY KEY,
details list<text>
)
Assume:
details[0] -> contact number
details[1] -> address
I want to write a query to extract the contact number from this table.

Actually, you should not store unrelated values together in a single list. The best and simplest way is to refactor your table to something like this:
CREATE TABLE table_name(
name text,
id text PRIMARY KEY,
contact text,
address text
)
Then you could do SELECT contact FROM table_name. If the same address can be reused by multiple entities, you may think about adding a separate addresses table and storing its key in this one; Cassandra has no foreign keys, so the application has to keep that relationship consistent.
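With the refactored table the contact number is an ordinary column, so it can be written and read directly. A minimal sketch (the id and values are made up):
INSERT INTO table_name (id, name, contact, address)
VALUES ('user-1', 'Alice', '+92-300-1234567', 'House No. 1, Example Street');
-- fetch only the contact number for one row
SELECT contact FROM table_name WHERE id = 'user-1';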

Related

How is denormalization handled in Cassandra

What is the best approach to update table with duplicate data?
I have a table:
CREATE TABLE users (
id text PRIMARY KEY,
email text,
description text,
salary int
)
I will delete, update, insert, etc. into this table. But I also have a requirement to be able to search by email and description. If I create a new table with new composite keys for email and description, then when I update my base table I do
insert into users (id, salary) values ('1', 500);
I do not have the required data to also update my secondary table, since all the client has is the id and the salary. How is the second table updated?
Other workarounds and their shortcomings:
I could have created a materialized view, but since the base table has only one primary key column, I can add only one more column to the view's primary key; my search requirement involves more than one column.
I could create secondary indexes on the columns that will be searched on, but the performance would be bad, because the columns I will be searching on (description, email, etc.) have high cardinality.
So, the "correct" way of doing this is to create 3 tables: salary_by_id, salary_by_email and salary_by_description.
CREATE TABLE salary_by_id (
id text PRIMARY KEY,
salary int
)
CREATE TABLE salary_by_email (
email text PRIMARY KEY,
salary int
)
CREATE TABLE salary_by_description (
description text,
id text,
salary int,
PRIMARY KEY (description, id)
)
The reason I added id to salary_by_description is that, from my own guessing, description won't be globally unique, so it has to have something else in its primary key.
Depending on the size of these tables, the last one might need something extra added to its partition key. And if needed you can add id, email and description to the other tables.
Now, when inserting or deleting values you need to do it in all 3 tables. If you use a driver that supports asynchronous calls, like the Java driver, this doesn't cost very much extra.
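One way to keep the three tables in sync is to group the writes in a logged batch, so they either all get applied or are all retried. A sketch with made-up values (note that multi-partition logged batches add some coordination overhead):
BEGIN BATCH
INSERT INTO salary_by_id (id, salary) VALUES ('1', 500);
INSERT INTO salary_by_email (email, salary) VALUES ('jane@example.com', 500);
INSERT INTO salary_by_description (description, id, salary) VALUES ('senior engineer', '1', 500);
APPLY BATCH;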

Cassandra how can I simulate a join statement

I am new to Cassandra and am coming from Postgres. I was wondering if there is a way to get data from 2 different tables or column families and then return the results. I have this query:
SELECT p.fullname, p.picture, s.post, s.id, s.comments, s.state, s.city
FROM profiles AS p INNER JOIN Chats AS s ON (p.id = s.profile_id)
WHERE s.latitudes >= 28 AND 29 >= s.latitudes
AND s.longitudes >= -21 AND -23 >= s.longitudes
The query involves 2 tables, Profiles and Chats, which share a common field: Chats.profile_id equals Profiles.id. It boils down to this, basically: return all rows where the chat's profile_id is equal to the profile's id. I would like to keep it that way, because then updating a profile is simple and only needs to touch 1 row per profile update, instead of de-normalizing everything and updating thousands of records. Any help or suggestions would be great.
You have to design tables in a way that you won't need joins. Best practice is for your table to match exactly the use case it serves.
Cassandra has a feature called static columns; a static column's value is shared by all rows of a partition, i.e. it is bound to the partition key part of the primary key. Thus, you can create a "joined" version of the table without duplicating the profile data:
CREATE TABLE t (
p_id uuid,
p_fullname text STATIC,
p_picture text STATIC,
s_id uuid,
s_post text,
s_comments text,
s_state text,
s_city text,
PRIMARY KEY (p_id, s_id)
);
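To illustrate how the static columns behave, here is a sketch with made-up uuids and values. The p_ columns are stored once per p_id partition and are returned with every chat row in that partition:
-- the profile (static) columns are written once for the whole partition
INSERT INTO t (p_id, p_fullname, p_picture)
VALUES (11111111-1111-1111-1111-111111111111, 'Jane Doe', 'jane.png');
-- each chat row carries only its own s_ columns
INSERT INTO t (p_id, s_id, s_post, s_city)
VALUES (11111111-1111-1111-1111-111111111111, 22222222-2222-2222-2222-222222222222, 'Hello', 'Lahore');
-- a single read returns profile and chat data together, with no join
SELECT p_fullname, p_picture, s_post, s_city FROM t WHERE p_id = 11111111-1111-1111-1111-111111111111;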

Cassandra table based query and primary key uniqueness

I have read here that for a table like:
CREATE TABLE user (
username text,
password text,
email text,
company text,
PRIMARY KEY (username)
);
We can create a table like:
CREATE TABLE user_by_company (
company text,
username text,
email text,
PRIMARY KEY (company)
);
in order to support querying by company. But what about primary key uniqueness for the second table?
Modify your table's PRIMARY KEY definition and add username as a clustering key:
CREATE TABLE user_by_company (
company text,
username text,
email text,
PRIMARY KEY (company,username)
);
That will enforce uniqueness, as well as return all usernames for a particular company. Additionally, your result set will be sorted in ascending order by username.
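For example, all users of one (hypothetical) company can be fetched with a single partition read:
SELECT username, email FROM user_by_company WHERE company = 'Acme';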
Data will be partitioned by the company name across nodes. What if there are a lot of users from one company and few from another? The data will be partitioned in an unbalanced way.
That's the balance that you have to figure out on your own. PRIMARY KEY definition in Cassandra is a give-and-take between data distribution and query flexibility. And unless the cardinality of company is very low (like single digits), you shouldn't have to worry about creating hot spots in your cluster.
Also, if one particular company gets too big, you can use a modeling technique known as "bucketing." If I were going to "bucket" your user_by_company table, I would first add a company_bucket column, and add it as an additional (composite) partition key:
CREATE TABLE user_by_company (
company text,
company_bucket text,
username text,
email text,
PRIMARY KEY ((company,company_bucket),username)
);
As for what to put into that bucket, it's up to you. Maybe that particular company has East and West locations, so something like this might work:
INSERT INTO user_by_company (company,company_bucket,username,email)
VALUES ('Acme','West','Jayne','jcobb@serenity.com');
The drawback here is that you would then have to provide company_bucket whenever querying that table. But it is a solution that could help you if a company gets too big.
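A query against the bucketed table would then look like this (a sketch; both partition key components are required):
SELECT username, email FROM user_by_company
WHERE company = 'Acme' AND company_bucket = 'West';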
I think there is a typo in the blog (the link you mentioned). You are right: with the table structure as written, user_by_company will have a uniqueness issue.
To support the typo theory:
In this case, creating a secondary index on the company field in the user table could be a solution because it has much lower cardinality than the user's email, but let's solve it with performance in mind.
These are the lines mentioned in the blog for querying users by company. Secondary indexes are always slower than the dedicated-table approach.
If you define company as the primary key, or as part of the primary key, there is no need to create a secondary index.

Text based selection of records using CQL

I have a text field like 'address' in my Cassandra table. I want to search records on the basis of some piece of text from the 'address' field, like a city or street name.
For example: I have an address like 'House No. 18, Shehzad Colony, M.D.A. Chowk Lahore'. Here I want to search for records that contain the string 'M.D.A. Chowk Lahore' in the address field.
How can I do this using the CQL shell? Can anyone guide me?
Thanks.
There really isn't a way to do this out-of-the-box. In Cassandra, you need to design your tables to fit your query patterns. So if searching for addresses by city (or whatever) is a pattern you need to support, then there are a couple of ways to do this.
You can create a new query table, and partition by city:
CREATE TABLE userAddressesByCity (
userID uuid,
firstName text,
lastName text,
street text,
city text,
province text,
postalCode text,
PRIMARY KEY (city,userID));
This table structure would support querying by city as a partition key, and it also has userID as a clustering key to ensure uniqueness.
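A lookup by city is then a single-partition query (the city value here is just an example):
SELECT userID, firstName, lastName, street FROM userAddressesByCity WHERE city = 'Lahore';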
If you're working with addresses, a useful technique is to create a User Defined Type (UDT). UDTs are useful if you want to store a user's address in a single column. But you would still want to create a table specifically designed to serve a query by whichever column you require.
Note: You could try one table and create a secondary index on one of the columns, but secondary indexes perform poorly at-scale, so I don't recommend that.

cqlsh: create super and sub columns?

In cqlsh I want to create 1 super column, address. Then below the address I want to create 2 sub columns, permanent and temporary address.
How can I do that using the CQL shell?
Super columns are obsolete. Try to make sure any documentation, books, or blogs you read are recent.
phact is right, you will want to distance yourself from anything that talks about super columns. The way to solve this with CQL (from within cqlsh) is to create address as a user-defined type:
CREATE TYPE address (
street text,
city text,
postal text,
country text
);
Then you could build a table to implement a MAP of the address type.
CREATE TABLE users (
login text PRIMARY KEY,
first_name text,
last_name text,
addresses map<text, frozen <address>>
);
To INSERT values from cqlsh, you could use something like this:
INSERT INTO users (login,first_name,last_name,addresses)
VALUES ('jones','Theora','Jones',{'work':{street:'101 Big Network Drive',city:'New York', postal:'10023',country:'USA'},
'home':{street:'821 Wembley St.',city:'London',postal:'W11 2BQ',country:'GBR'}});
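Individual entries in the map can also be added or replaced later with an UPDATE; a sketch with made-up values:
-- add one more address under a new map key
UPDATE users SET addresses['vacation'] =
{street: '12 Ocean Ave', city: 'Brighton', postal: 'BN1 1AA', country: 'GBR'}
WHERE login = 'jones';
-- reading the row returns the full map
SELECT addresses FROM users WHERE login = 'jones';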
