Cassandra WriteTimeout when "IS NOT NULL" condition of materialized view not met

I have this table and materialized view:
CREATE TABLE bubu (
a text,
b text,
c text,
d text,
PRIMARY KEY (a, b));
CREATE MATERIALIZED VIEW bubu_mv AS
SELECT *
FROM bubu
WHERE a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL
PRIMARY KEY (a, c, b);
When I do this insert and then this update, the insert succeeds but the update returns a write timeout:
insert into bubu (a,b,c,d) values ('1','2',null,'3');
update bubu set d = '5' where a = '1' and b = '2';
WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
This happens on Cassandra 3.7 and does not happen on Cassandra 3.2.1 (just two versions I tried).
Also, if the insert is like this, the timeout does not happen:
insert into bubu (a,b) values ('1','2');
If I increase write_request_timeout_in_ms in conf/cassandra.yaml to 20000, I get a timeout after ~10 seconds:
update bubu set d = '5' where a = '1' and b = '2';
OperationTimedOut: errors={}, last_host=127.0.0.1
Is this a bug in cassandra or am I doing something wrong?

Can you increase the timeout value in the config file conf/cassandra.yaml and check? Restart your server if required.
write_request_timeout_in_ms: 20000
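As the question itself observes, the timeout only occurs when the row was first written with an explicit null for c, the column the view filters on. If the data model allows it, a possible workaround (a sketch based on that observation, not a confirmed fix for the underlying behaviour) is to omit c from the insert instead of writing null, since an omitted column, unlike an explicit null, writes nothing for that column:
-- omit c rather than inserting an explicit null for it
insert into bubu (a,b,d) values ('1','2','3');
update bubu set d = '5' where a = '1' and b = '2';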

Related

Why does this combination of foreign + primary key produce a Postgres deadlock?

Start with these two tables and an initial record for c:
create table c
(
id serial primary key,
name varchar not null
);
create table e
(
id varchar not null,
c_id bigint references c (id) not null,
name varchar not null,
primary key (id, c_id)
);
insert into c (name) values ('deadlock test');
Thread 1:
begin;
select * from c where id = 1 for update;
insert into e (id, c_id, name) VALUES ('bar', 1, 'second') on conflict do nothing ;
commit;
Thread 2:
begin;
insert into e (id, c_id, name) VALUES ('bar', 1, 'first') on conflict do nothing ;
commit;
Execution order is:
Thread 1: begin
Thread 2: begin
Thread 1: lock c
Thread 2: insert e
Thread 1: insert e <-- deadlock
Why does this happen?
Adding a lock to c on Thread 2 of course avoids the deadlock, but it's not clear to me why. Also interesting is that if the row in e exists before Thread 1 or 2 runs, then no deadlock happens.
I suspect there are at least two things going on:
The primary key creates a unique constraint that causes some sort of locking on e that I don't understand, even with the ON CONFLICT DO NOTHING.
The foreign key on c_id results in some sort of trigger causing a lock on c when a new record is inserted (or when c_id is updated, I presume).
Thanks!
To maintain integrity, every insert on e will lock the referenced row in c with a KEY SHARE lock. This keeps any concurrent transaction from deleting the row in c or modifying the primary key.
Such a KEY SHARE lock conflicts with the FOR UPDATE lock that session 1 took explicitly (see the documentation), so the INSERT of session 2 blocks - but it has already inserted (and locked) an index tuple in the primary key index of e.
Now session 1 wants to insert a row with the same primary key that session 2 inserted, so it blocks on the lock just taken by session 2, and the deadlock is complete.
You probably wonder why ON CONFLICT DO NOTHING doesn't change the behavior. But PostgreSQL doesn't get there, because to know if there is a conflict, session 1 will have to wait until it knows if session 2 commits or rolls back. So the deadlock happens before we know if there will be a conflict or not.
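As the question already notes, taking the same lock on c in Thread 2 before its insert avoids the deadlock, because both transactions then queue on the parent row and proceed one after the other. A minimal sketch of that variant (same tables and data as above):
-- Thread 2, now locking the parent row before inserting into e
begin;
select * from c where id = 1 for update;  -- waits here if Thread 1 holds the lock
insert into e (id, c_id, name) values ('bar', 1, 'first') on conflict do nothing;
commit;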

How to reset record with counter column after delete it in Cassandra?

I created a table with a counter column using the com.datastax.driver.core package, with this function in a class:
public void doStartingJob(){
session.execute("CREATE KEYSPACE myks WITH replication "
+ "= {'class':'SimpleStrategy', 'replication_factor':1};");
session.execute("CREATE TABLE myks.clients_count(ip_address text PRIMARY KEY,"
+ "request_count counter);");
}
After this I deleted the table entry from cqlsh:
jay@jayesh-380:~$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh:myks> DELETE FROM clients_count WHERE ip_address='127.0.0.1';
Then, to insert a row with the same primary key, I used the following statement (via cqlsh):
UPDATE myks.clients_count SET request_count = 1 WHERE ip_address ='127.0.0.1';
This is not allowed:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot set the value of counter column request_count (counters can only be incremented/decremented, not set)"
But I want the record's counter column to be set to 1, with the same primary key (a functional requirement).
How can I do that?
The usage of counters is a bit strange, but you'll get used to it. The main thing, however, is that counters cannot be reused. Once you delete a counter value for a particular primary key, this counter is lost forever. This is by design and I think it is not going to change.
Back to your question, the first of your problems is the initial DELETE. Don't.
Second, if the counter value for a primary key doesn't exist, C* will treat it as zero by default. Following the documentation, to load data into the counter for the first time you have to issue:
UPDATE myks.clients_count SET request_count = request_count + 1 WHERE ip_address ='127.0.0.1';
And a SELECT will return the correct answer: 1
Again, beware of deletes! Don't! If you do, any subsequent query:
UPDATE myks.clients_count SET request_count = request_count + 1 WHERE ip_address ='127.0.0.1';
will NOT fail, but the counter will NOT be updated.
Another thing to note is that C* doesn't support atomic read-and-update (or update-and-read) on counter columns. That is, you cannot issue an update and, within the same query, get the new (or the old) value of the counter. You'll need to perform two distinct queries, one SELECT and one UPDATE, but in a multi-client environment the SELECT value you get may not reflect the counter value at the time of the UPDATE.
Your app will definitely fail if you underestimate this.
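A minimal sketch of that two-query pattern, using the keyspace and table from the question; note that another client may change the counter between the SELECT and the UPDATE:
-- read the current value first
SELECT request_count FROM myks.clients_count WHERE ip_address = '127.0.0.1';
-- then increment it in a separate statement
UPDATE myks.clients_count SET request_count = request_count + 1 WHERE ip_address = '127.0.0.1';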

How to do multiget in CQL3 for composite row key?

CF schema:
CREATE TABLE mytable (
upperId int,
lowerId int,
hour timestamp,
counter text,
succ int,
fail int,
PRIMARY KEY ((upperId, lowerId), hour, counter));
Each record is keyed by the composite id upperId:lowerId; how can I do a multiget with CQL3?
This is not valid:
select * from mytable where (upperid, lowerid) in ((10000, 1), (10000, 2), (20000, 1));
I can't do this either:
select * from mytable where (upperid = 10000 and lowerid in (1, 2)) or (upperid = 20000 and lowerid = 1);
I got the error: missing EOF at ')'.
Please point me to an effective way to do a multiget for a composite row key in CQL3.
Thanks,
William
CQL does not yet support a logical "or" in select statements.
Instead, in your application you could combine the result sets from the two queries:
select * from mytable where upperid = 10000 and lowerid in (1, 2);
select * from mytable where upperid = 20000 and lowerid = 1;
Reference:
SO question: Alternative for OR condition after where clause in select statement Cassandra
Latest CQL docs

Composite columns and "IN" relation in Cassandra

I have the following column family in Cassandra for storing time series data in a small number of very "wide" rows:
CREATE TABLE data_bucket (
day_of_year int,
minute_of_day int,
event_id int,
data ascii,
PRIMARY KEY (day_of_year, minute_of_day, event_id)
)
On the CQL shell, I am able to run a query such as this:
select * from data_bucket where day_of_year = 266 and minute_of_day = 244
and event_id in (4, 7, 11, 1990, 3433)
Essentially, I fix the value of the first component of the composite column name (minute_of_day) and want to select a non-contiguous set of columns based on the distinct values of the second component (event_id). Since the "IN" relation is interpreted as an equality relation, this works fine.
Now my question is, how would I accomplish the same type of composite column slicing programmatically and without CQL? So far I have tried the Python client pycassa and the Java client Astyanax, but without any success.
Any thoughts would be welcome.
EDIT:
I'm adding the describe output of the column family as seen through cassandra-cli. Since I am looking for a Thrift-based solution, maybe this will help.
ColumnFamily: data_bucket
Key Validation Class: org.apache.cassandra.db.marshal.Int32Type
Default column value validator: org.apache.cassandra.db.marshal.AsciiType
Cells sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.Int32Type)
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 0.1
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: default
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
There is no "IN"-type query in the Thrift API. You could perform a series of get queries for each composite column value (day_of_year, minute_of_day, event_id).
If your event_ids were sequential (and your question says they are not) you could perform a single get_slice query, passing in the range (e.g., day_of_year, minute_of_day, and a range of event_ids). You could grab bunches of them this way and filter the response programmatically yourself (e.g., grab all data on that date with event ids between 4 and 3433). More data transfer and more processing on the client side, so this is not a great option unless you really are looking for a range.
So, if you want to use "IN" with Cassandra you will need to switch to a CQL-based solution. If you are considering using CQL in Python, another option is cassandra-dbapi2. This worked for me:
import cql
# Replace settings as appropriate
host = 'localhost'
port = 9160
keyspace = 'keyspace_name'
# Connect
connection = cql.connect(host, port, keyspace, cql_version='3.0.1')
cursor = connection.cursor()
print "connected!"
# Execute CQL
cursor.execute("select * from data_bucket where day_of_year = 266 and minute_of_day = 244 and event_id in (4, 7, 11, 1990, 3433)")
for row in cursor:
    print str(row)  # Do something with your data
# Shut the connection
cursor.close()
connection.close()
(Tested with Cassandra 2.0.1.)

Inserting data in table with umlaut is not possible

I am using Cassandra 1.2.5 (cqlsh 3.0.2) and trying to insert data into a small test database with German characters, which is not possible. I get this message back from cqlsh: "Bad Request: Input length = 1"
Below is the setup of the keyspace, the table, and the insert.
CREATE KEYSPACE test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
use test;
CREATE TABLE testdata (
id varchar,
text varchar,
PRIMARY KEY (id));
This is working:
insert into testdata (id, text) values ('4711', 'test');
This is not allowed:
insert into testdata (id, text) values ('4711', 'töst`);
->Bad Request: Input length = 1
My locale is de_DE.UTF-8.
Does Cassandra 1.2.5 have a problem with umlauts?
I just did what you posted and it worked for me. The one thing that was different, however, is that instead of a single quote, you finished 'töst` with a backtick. That doesn't allow me to finish the statement in cqlsh. When I replace it with 'töst' it succeeds and I get:
cqlsh:test> select * from testdata;
id | text
------+------
4711 | töst
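For completeness, a sketch of the corrected statement, with the closing backtick replaced by a single quote:
insert into testdata (id, text) values ('4711', 'töst');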
