Revoke accidental update - Cassandra

Is there a way to revoke a column family update command? I tried to update a column but ended up running "update columnfamily dev;", and now I see only the IDs when I query. The data still seems to exist, though, judging by nodetool status. I tried to restore a snapshot, but even that did not help.

So if I understand you correctly, you've erased your column metadata and now get something like this:
cqlsh:test> select * from user;
uuid
--------------------------------------
fd24b190-072d-11e3-a1c4-97db6b0653ce
054a43d0-072e-11e3-a1c4-97db6b0653ce
0aa71920-072e-11e3-a1c4-97db6b0653ce
07fda400-072e-11e3-a1c4-97db6b0653ce
while you wanted something like this:
uuid | email | name
--------------------------------------+----------------------+-------
fd24b190-072d-11e3-a1c4-97db6b0653ce | user0@somedomain.com | User0
054a43d0-072e-11e3-a1c4-97db6b0653ce | user1@somedomain.com | User1
0aa71920-072e-11e3-a1c4-97db6b0653ce | user3@somedomain.com | User3
07fda400-072e-11e3-a1c4-97db6b0653ce | user2@somedomain.com | User2
You can get the data back by adding the information about the columns.
Given the original table was defined like this:
CREATE TABLE user (
    uuid timeuuid PRIMARY KEY,
    name varchar,
    email varchar
);
You can add missing column information using CQL:
cqlsh:test> ALTER TABLE user ADD email varchar;
cqlsh:test> ALTER TABLE user ADD name varchar;
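Assuming the underlying data files are intact (which your nodetool output suggests), the same query should then return the full rows again:
cqlsh:test> select * from user;
uuid | email | name
--------------------------------------+----------------------+-------
fd24b190-072d-11e3-a1c4-97db6b0653ce | user0@somedomain.com | User0
...
(4 rows)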

Related

Get value from specific map-key in Cassandra

For example, I have a map under the column 'users' in a table called 'table' with primary key 'Id'.
If the map looks like this, {'Phone': '1234567899', 'City': 'Dublin'}, I want to get the value of the key 'Phone' for a specific 'Id' in the Cassandra database.
Yes, that's possible to do with CQL when using a MAP collection.
To test this, I created a simple table using the specifications and data you mentioned above:
> CREATE TABLE stackoverflow.usermap (
id text PRIMARY KEY,
users map<text, text>);
> INSERT INTO usermap (id,users)
VALUES ('1a',{'Phone': '1234567899','City': 'Dublin'});
> SELECT * FROM usermap WHERE id='1a';
id | users
----+-------------------------------------------
1a | {'City': 'Dublin', 'Phone': '1234567899'}
(1 rows)
Then, I queried with the same WHERE clause, but altered my SELECT to pull back only the user's phone (note that selecting individual map elements like this requires Cassandra 4.0 or later):
> SELECT users['Phone'] FROM usermap WHERE id='1a';
users['Phone']
----------------
1234567899
(1 rows)
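As a side note, the same bracket syntax also works for writes, so a single map entry can be updated in place; for example (with a made-up new number):
> UPDATE usermap SET users['Phone'] = '0987654321' WHERE id='1a';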

How to copy data from one table to another table with a different schema? (Cassandra)

I have a table named table1 which includes the following:
+----------+-------+
|date |count |
+----------+-------+
|2022-01-07|2 |
|2022-01-06|0 |
|2022-01-05|1 |
+----------+-------+
Now I need to copy this table (table1) into a new table (table2) with a different schema. The new table should look like this:
+----+----------+-------+
|type|date |count |
+----+----------+-------+
|Typ1|2022-01-07|2 |
|Typ1|2022-01-06|0 |
|Typ1|2022-01-05|1 |
+----+----------+-------+
Now the problems are:
I cannot use the cqlsh COPY command, as the schemas of the two tables are different.
I cannot manually add the data to table2, because table1 has thousands of rows.
The schemas of the tables are:
Table1:
CREATE TABLE table1 (
    date date PRIMARY KEY,
    count bigint
);
Table2:
CREATE TABLE table2 (
    type text,
    date date,
    count bigint,
    PRIMARY KEY (type, date)
);
You want to populate the data of one table into another table. You can write a small utility to do this: it reads rows from the first table and writes them into the second, adding the constant type column. If you can use Spark, you can do it pretty fast.
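A minimal sketch of such a utility using the Python cassandra-driver (the contact point and keyspace name here are assumptions; error handling is omitted):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])          # assumed contact point
session = cluster.connect('my_keyspace')  # assumed keyspace name

# Prepare the INSERT for the target table once; 'Typ1' is the constant
# value for the new partition key column.
insert = session.prepare(
    "INSERT INTO table2 (type, date, count) VALUES (?, ?, ?)")

# Iterating the result set pages through table1 automatically.
for row in session.execute("SELECT date, count FROM table1"):
    session.execute(insert, ('Typ1', row.date, row.count))

cluster.shutdown()

The same loop maps directly onto a Spark job (read table1 with the spark-cassandra-connector, add a literal type column, write to table2), which parallelizes the copy across the cluster.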

cassandra data consistency issue

Hi, I'm new to Apache Cassandra and I found an article about the Basic Rules of Cassandra Data Modeling. In example 1, two tables are created:
CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    email text,
    age int
);
CREATE TABLE users_by_email (
    email text PRIMARY KEY,
    username text,
    age int
);
These tables contain the same data (username, email and age). What I don't understand is how to insert data into the two tables. I think I have to execute two separate inserts: one for the table users_by_username and one for the table users_by_email. But how do I maintain data consistency between the tables? For example, what if I insert data into the first table and forget to insert it into the second table... or the other way around?
It's your job as a developer to make sure that the data is in sync. That said, you can use things like materialized views to generate another "table" with a slightly different primary key (there are some rules about what can be changed). For your case, for example, you can have the following:
CREATE TABLE users_by_username (username text PRIMARY KEY,
    email text, age int);
CREATE MATERIALIZED VIEW users_by_email AS
    SELECT * FROM users_by_username
    WHERE email IS NOT NULL AND username IS NOT NULL
    PRIMARY KEY (email, username);
and if you insert data as
insert into users_by_username (username, email, age)
values ('test', 'test@domain.com', 30);
you can query the materialized view in addition to querying by username:
SELECT * from users_by_username where username = 'test';
username | age | email
----------+-----+-----------------
test | 30 | test@domain.com
SELECT * from users_by_email where email = 'test@domain.com';
email | username | age
-----------------+----------+-----
test@domain.com | test | 30
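If you maintain the two tables yourself instead of using a materialized view, the usual pattern is a logged batch, which guarantees that once any part of the batch succeeds, all of it eventually will; a sketch with the same example data:
BEGIN BATCH
    INSERT INTO users_by_username (username, email, age) VALUES ('test', 'test@domain.com', 30);
    INSERT INTO users_by_email (email, username, age) VALUES ('test@domain.com', 'test', 30);
APPLY BATCH;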

Cassandra - alternate way for clustering key with ORDER BY and UPDATE

My schema is:
CREATE TABLE friends (
    userId timeuuid,
    friendId timeuuid,
    status varchar,
    ts timeuuid,
    PRIMARY KEY (userId, friendId)
);
CREATE TABLE friends_by_status (
    userId timeuuid,
    friendId timeuuid,
    status varchar,
    ts timeuuid,
    PRIMARY KEY ((userId, status), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
Here, whenever a friend request is made, I'll insert a record into both tables.
When I want to check the one-to-one status between two users, I'll use this query:
SELECT status FROM friends WHERE userId=xxx AND friendId=xxx;
When I need to query all the records with pending status, I'll use:
SELECT * FROM friends_by_status WHERE userId=xxx AND status='pending';
But when there is a status change, I can update 'status' and 'ts' in the friends table, but not in the friends_by_status table, as both are part of the PRIMARY KEY.
You can see that even if I denormalise it, I definitely need to update 'status' and 'ts' in the friends_by_status table to maintain consistency.
The only way I can maintain consistency is to delete the record and insert it again.
But frequent deletes are also not recommended in the Cassandra model, as said in the Cassandra Spotify summit talk.
I find this the biggest limitation in Cassandra.
Is there any other way to sort out this issue?
Any solution is appreciated.
I don't know how soon you need to deploy this, but in Cassandra 3.0 you could handle this with a materialized view. Your friends table would be the base table, and friends_by_status would be a view of the base table. Cassandra would take care of updating the view when you changed the base table.
For example:
CREATE TABLE friends (
    userid int,
    friendid int,
    status varchar,
    ts timeuuid,
    PRIMARY KEY (userid, friendid)
);
CREATE MATERIALIZED VIEW friends_by_status AS
    SELECT userid FROM friends
    WHERE userid IS NOT NULL AND friendid IS NOT NULL AND status IS NOT NULL AND ts IS NOT NULL
    PRIMARY KEY ((userid, status), friendid);
INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 500, 'pending', now());
INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 501, 'accepted', now());
INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 502, 'pending', now());
SELECT * FROM friends;
userid | friendid | status | ts
--------+----------+----------+--------------------------------------
1 | 500 | pending | a02f7fe0-49f9-11e5-9e3c-ab179e6a6326
1 | 501 | accepted | a6c80980-49f9-11e5-9e3c-ab179e6a6326
1 | 502 | pending | add10830-49f9-11e5-9e3c-ab179e6a6326
So now in the view you can select rows by the status:
SELECT * FROM friends_by_status WHERE userid=1 AND status='pending';
userid | status | friendid
--------+---------+----------
1 | pending | 500
1 | pending | 502
(2 rows)
And then when you update the status in the base table, it automatically updates in the view:
UPDATE friends SET status='pending' WHERE userid=1 AND friendid=501;
SELECT * FROM friends_by_status WHERE userid=1 AND status='pending';
userid | status | friendid
--------+---------+----------
1 | pending | 500
1 | pending | 501
1 | pending | 502
(3 rows)
But note that in the view you couldn't have ts as part of the key, since you can only add one non-key field from the base table as part of the key in the view, which in your case would be adding 'status' to the key.
I think the first beta release for 3.0 is coming out tomorrow if you want to try this out.
Why do you need status to be in the primary key for your second table? If this was your schema:
CREATE TABLE friends_by_status (
    userId timeuuid,
    friendId timeuuid,
    status varchar,
    ts timeuuid,
    PRIMARY KEY ((userId), status, ts)
) WITH CLUSTERING ORDER BY (status ASC, ts DESC);
you can update the status as needed and still filter by it. You will be storing more data under one partition but it seems like you are storing one row for each friend a user has. This will be the same as in the first table, so I don't see partition size being a problem.

Is the primary key in Cassandra unique?

It could be kind of a lame question, but does the primary key in Cassandra have to be unique?
For example in the following table:
CREATE TABLE users (
    name text,
    surname text,
    age int,
    address text,
    PRIMARY KEY (name, surname)
);
So is it possible to have 2 people in my database with the same name and surname but different ages? That would mean the same primary key...
Yes the primary key has to be unique. Otherwise there would be no way to know which row to return when you query with a duplicate key.
In your case you can have 2 rows with the same name or with the same surname but not both.
By definition, the primary key has to be unique. But that doesn't mean you can't accomplish your goals. You just need to change your approach/terminology.
First of all, if you relax your goal of having the name+surname be a primary key, you can do the following:
CREATE TABLE users ( name text, surname text, age int, address text, PRIMARY KEY((name, surname),age) );
insert into users (name,surname,age,address) values ('name1','surname1',10,'address1');
insert into users (name,surname,age,address) values ('name1','surname1',30,'address2');
select * from users where name='name1' and surname='surname1';
name | surname | age | address
-------+----------+-----+----------
name1 | surname1 | 10 | address1
name1 | surname1 | 30 | address2
If, on the other hand, you wanted to ensure that the address is shared as well, then you probably just want to store a collection of ages in the user record. That could be achieved by:
CREATE TABLE users2 ( name text, surname text, age set<int>, address text, PRIMARY KEY(name, surname) );
insert into users2 (name,surname,age,address) values ('name1','surname1',{10,30},'address2');
select * from users2 where name='name1' and surname='surname1';
name | surname | address | age
-------+----------+----------+----------
name1 | surname1 | address2 | {10, 30}
So it comes back to what you actually need to accomplish. Hopefully the above examples give you some ideas.
The primary key is unique. With your data model, you can only have one age per (name, surname) combination.
Yes, as mentioned in the answers above, you can have a composite key with name, surname, and age to achieve your goal, but that still won't fully solve the problem. Instead, you can consider adding a new column, userID, and making that the primary key. Then, even if name, surname and age are duplicated, you don't have to revisit your data model.
CREATE TABLE users (
    userId int,
    name text,
    surname text,
    age int,
    address text,
    PRIMARY KEY (userId)
);
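With this schema, two people sharing a name and surname simply get different userId values; for example:
INSERT INTO users (userId, name, surname, age, address) VALUES (1, 'name1', 'surname1', 10, 'address1');
INSERT INTO users (userId, name, surname, age, address) VALUES (2, 'name1', 'surname1', 30, 'address2');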
I would state specifically that the partition key should be unique. I could not find this stated in one place, but it follows from these statements:
Cassandra needs all the partition key columns to be able to compute the hash that will allow it to locate the nodes containing the partition.
The partition key has a special use in Apache Cassandra beyond showing the uniqueness of the record in the database.
Please note that there will not be any error if you insert the same partition key again and again, as there is no constraint check.
Columns that you'll run equality searches on should be in the partition key.
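To illustrate the no-constraint-check point: re-inserting an existing partition key silently overwrites (upserts) the row instead of raising an error; for example, with the users table above:
INSERT INTO users (userId, name, surname, age, address) VALUES (1, 'name1', 'surname1', 10, 'address1');
-- Same partition key again: no error, the previous row is simply replaced.
INSERT INTO users (userId, name, surname, age, address) VALUES (1, 'name2', 'surname2', 99, 'address9');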
References
https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
how Cassandra chooses the coordinator node and the replication nodes?
Insert query replaces rows having same data field in Cassandra clustering column
