DSE Cassandra OperationTimedOutException - cassandra

I have a table with a very large amount of data in Cassandra. I am trying to read the data partition by partition using the DSE driver through a Hadoop MapReduce program. Some partitions may contain more than 100 million rows, and when I try to read those partitions I get an OperationTimedOutException.
Below is the stack trace:
com.datastax.driver.core.exceptions.OperationTimedOutException: [X.X.X.X/X.X.X.X:XXXX] Timed out waiting for server response
at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:35)
at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:17)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:28)
at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:304)
at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.isExhausted(ArrayBackedResultSet.java:260)
at com.datastax.driver.core.ArrayBackedResultSet$1.hasNext(ArrayBackedResultSet.java:134)
at com.datastax.driver.core.ArrayBackedResultSet.all(ArrayBackedResultSet.java:123)
I have tried the following (roughly as sketched below), and it didn't work for me:
setFetchSize() with a large number like 250000
setReadTimeoutMillis() set to 30 seconds
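For reference, this is roughly how I'm configuring the driver and the statement (the keyspace, table, and key value below are placeholders, not my real schema):

import java.util.List;
import com.datastax.driver.core.*;

// Sketch only; "my_ks"/"my_table" and the key value are placeholders.
Cluster cluster = Cluster.builder()
        .addContactPoint("X.X.X.X")
        .withSocketOptions(new SocketOptions().setReadTimeoutMillis(30_000)) // 30 seconds
        .build();
Session session = cluster.connect();

Statement stmt = new SimpleStatement(
        "SELECT * FROM my_ks.my_table WHERE partition_key = ?", 12345L);
stmt.setFetchSize(250_000);          // large fetch size
stmt.setReadTimeoutMillis(30_000);   // per-statement read timeout

ResultSet rs = session.execute(stmt);
List<Row> rows = rs.all();           // the timeout surfaces here, as in the stack trace above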
Any help is appreciated :)

It seems Cassandra is not able to fulfil the request within the specified time. You can increase the parameters below in the cassandra.yaml file; this deals with timeouts only to an extent, so don't set them much higher.
read_request_timeout_in_ms:
write_request_timeout_in_ms:
If that doesn't help, you should look in your Cassandra logs for other anomalies, such as tombstone warnings.

The Cassandra configuration file has parameters that limit the response time. If a response takes longer, you are bound to get "Timed out waiting for server response". These limits can be changed manually in the cassandra.yaml file.
Please change the parameters below as needed:
# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 5000
# The default timeout for other, miscellaneous operations
request_timeout_in_ms: 10000
If you don't have access to the Cassandra configuration file, use Cassandra's pagination feature to query large results, or handle paging in your code itself, as sketched below.
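A minimal paging sketch with the DataStax Java driver (the query, names, and key value are illustrative; session is assumed to be an existing connected Session). Iterating the ResultSet fetches one page at a time instead of materializing everything with all():

import com.datastax.driver.core.*;

// Paging sketch; "my_ks"/"my_table" and the key value are placeholders.
Statement stmt = new SimpleStatement(
        "SELECT * FROM my_ks.my_table WHERE partition_key = ?", 12345L);
stmt.setFetchSize(5000);                 // rows fetched per page/round trip

ResultSet rs = session.execute(stmt);
for (Row row : rs) {                     // further pages are fetched transparently while iterating
    // process(row);                     // replace with your own row handling
}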

Related

Is it possible to configure different write timeouts per Cassandra table?

I have a DataStax Cassandra cluster with 8 nodes. The keyspace used by the application contains about 400 tables. The parameter write_request_timeout_in_ms in cassandra.yaml is set to 2000 ms (the default).
The default value is high enough for most tables. However, for two tables I require a much higher write_request_timeout. I know that settings such as the bloom filter false-positive chance or the compaction strategy can be configured per table.
Is it possible to do the same for timeouts, and if so, how?
Regards
It isn't possible to configure different write timeouts because all writes are persisted to the same commitlog disk.
A coordinator will return a write timeout if not enough replicas (based on the write consistency level) acknowledged the write (to the commitlog disk) because the disk is busy.
Since there is only one commitlog disk on each node, it makes no sense to have different write timeouts. This in fact raises another question -- what problem are you trying to solve?
Increasing timeouts is almost never the right thing to do since all it does is hide the problem. You need to identify the root of the issue and fix it. Cheers!

Increase request timeout for CQL from NiFi

I am using the QueryCassandra processor in NiFi to fetch data from Cassandra, but my query is getting a timeout exception. I want to increase the request timeout when running the CQL query from the processor. Is there a way to do that, or will I have to write a custom processor?
Most probably you're getting an exception because you're querying on a non-partition key. In that case, the query is distributed to all nodes and has to go through all available data, which is very slow if you have a big data set.
In Cassandra, queries are fast only when you perform them on (at least) the partition key. If you need to search on a non-partition column, you need to re-model your tables to match your queries. I recommend taking the DS220 course on DataStax Academy for a better understanding of how Cassandra works.
As @Alex Ott said, it is not recommended to query on a non-partition key. If you still want to do so and increase the timeout for the query, just set the processor's Max Wait Time property to whatever timeout you want.
EDIT:
tl;dr: Apache's timeout wrapper doesn't really let you use the timeout option.
Now that you mentioned that this is a DataStax exception and not java.util.concurrent.TimeoutException, I can tell you that I've looked into the QueryCassandra processor's source code, and it seems Apache just wrapped the query call in a Future to achieve a timeout instead of using DataStax's built-in timeout option. This leaves the DataStax driver's own timeout at its default, unchangeable value. It should be reported to Apache as a bug.
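A rough illustration of the difference (this is not the actual QueryCassandra source, just a sketch assuming the DataStax Java 3.x driver): a Future-based wrapper only bounds how long the processor waits for the result, while the driver's own read timeout stays at its default unless it is set explicitly.

import java.util.concurrent.*;
import com.datastax.driver.core.*;

// Wrapper-style timeout: bounds how long *we* wait for the result,
// but the driver can still throw OperationTimedOutException first.
ResultSet queryWithWrapperTimeout(Session session, String cql, long maxWaitSec) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
        Future<ResultSet> future = executor.submit(() -> session.execute(cql));
        return future.get(maxWaitSec, TimeUnit.SECONDS);   // analogous to NiFi's Max Wait Time
    } finally {
        executor.shutdownNow();
    }
}

// A driver-level timeout would have to be set on the statement instead, e.g.:
// session.execute(new SimpleStatement(cql).setReadTimeoutMillis(120_000));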

Cassandra vs Cassandra+Ignite

(Single-node cluster) I've got a table with 2 columns, one of type 'text' and the other a 'blob'. I'm using DataStax's C++ driver to perform read/write requests against Cassandra.
The blob stores a C++ structure (size: 7 KB).
Since I was getting less than desirable throughput when using Cassandra alone, I tried adding Ignite on top of Cassandra, hoping for a significant performance improvement since the data would now be read from RAM instead of hard disks.
However, it turned out that after adding Ignite, the performance dropped even further (by roughly 50%!).
Read Throughput when using only Cassandra: 21000 rows/second.
Read Throughput with Cassandra + Ignite: 9000 rows/second.
Since I am storing a C++ structure in Cassandra's blob, the Ignite API uses serialization/deserialization while writing/reading the data. Is this the reason for the drop in performance (considering the size of the structure, i.e. 7 KB), or is this drop not expected at all and perhaps something is wrong in the configuration?
Cassandra: 3.11.2
RHEL: 6.5
The configurations for Ignite are the same as given here.
I got a significant improvement in Ignite+Cassandra throughput when I used serialization in raw mode. The throughput has now increased from 9000 rows/second to 23000 rows/second, but it's still not significantly better than Cassandra alone. I'm hopeful I'll find some more tweaks that will improve this further.
I've added some more details about the configurations and client code on GitHub.
It looks like you do one get per key in this benchmark for Ignite, and you didn't invoke loadCache beforehand. In that case, on each get, Ignite goes to Cassandra to fetch the value and only then stores it in the cache. So I'd recommend invoking loadCache before benchmarking, or at least testing gets on the same keys, to give Ignite a chance to keep the keys in the cache (see the sketch below). If you think you already have all the data in the caches, please also share the code where you write data to Ignite.
Also, you invoke "grid.GetCache" in each thread. It won't take much time, but you should definitely avoid such calls inside the benchmark, while you're already measuring time.
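A minimal sketch of what I mean, using Ignite's Java API (your client is C++, so this is only for illustration; the config path and cache name are placeholders): warm the cache once with loadCache and obtain the cache handle once, outside the measured, multi-threaded section.

import org.apache.ignite.*;

// Illustration only; "ignite-config.xml" and "blob_cache" are placeholders.
Ignite ignite = Ignition.start("ignite-config.xml");
IgniteCache<String, byte[]> cache = ignite.cache("blob_cache"); // get the handle once, not per thread
cache.loadCache(null);                                          // preload from Cassandra before benchmarking

byte[] value = cache.get("some_key");                           // subsequent gets are served from RAM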

RPC Timeout in Cassandra

I get the following error:
cqlsh:dev> SELECT DISTINCT id FROM raw_data;
Request did not complete within rpc_timeout.
This is a special query that I'll never make again, I don't care how long it takes, and I don't want to change my schema (since I'll never make the query again...).
How can I increase rpc_timeout for this one query?
I have tried adding LIMIT 9999 and ALLOW FILTERING, and it doesn't help. I expect less than 1000 rows in the result. The query works on another Cassandra cluster with half as much data.
Edit: as it turns out, this particular command succeeded after I ran nodetool compact, but I'm more interested in the general case of temporarily increasing rpc_timeout for one query.
Increase the read request timeout in the cassandra.yaml file under /cassandra/conf:
read_request_timeout_in_ms: 30000
Change this, restart the server, and execute your query; that may resolve your problem.

Cassandra is giving timeout exception in Pentaho Kettle

I am using the Cassandra NoSQL database for a transformation in Pentaho Data Integration.
When I check the connection manually it connects, but while executing the transformation it gives me a timeout exception.
I increased "request_timeout", but the problem is still there. As far as I can tell, these problems only appear as the data in the Cassandra database grows.
So is it a problem with the PDI tool, or with the Cassandra database itself?
And how can I resolve this problem?
In cassandra.yaml, you need to increase the parameter read_request_timeout_in_ms to a higher number such as 20000. The default is 5000, and for selects with a huge limit and 10 or 20 columns, you can expect a TimeoutException.
Increasing this value makes Cassandra wait longer for the query to complete.
I tried this in my database and it worked.
