Cassandra is giving a timeout exception in Pentaho Kettle

I am using the Cassandra NoSQL database for a transformation in Pentaho Data Integration.
When I check the connection manually it connects, but while executing the transformation it gives me a timeout exception.
I increased the "request_timeout", but the problem is still there. As far as I can tell, the problem only appears as the amount of data in the Cassandra database grows.
So is this a problem with the PDI tool, or with the Cassandra database itself?
And how can I resolve this problem?

In cassandra.yaml, you need to increase the parameter read_request_timeout_in_ms to a higher value such as 20000. The default is 10000, and for selects with a huge LIMIT and 10 or 20 columns you can expect a TimeoutException.
Increasing this value makes Cassandra wait longer for the query to complete.
I tried this on my database and it worked.
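For reference, the change in cassandra.yaml would look like the line below (20000 ms is only an example value; tune it to your own queries):

read_request_timeout_in_ms: 20000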

Related

Increase request timeout for CQL from NiFi

I am using the QueryCassandra processor in NiFi to fetch data from Cassandra, but my query is getting a TimeoutException. I want to increase the request timeout when running the CQL query from the processor. Is there a way to do that, or will I have to write a custom processor?
Most probably you're getting the exception because you're querying on a non-partition key column. In that case the query is distributed to all nodes and has to go through all of the available data, which is very slow if you have a big data set.
In Cassandra, queries are fast only when they are performed on (at least) the partition key. If you need to search on a non-partition column, you need to re-model your tables to match your queries. I recommend taking the DS220 course on DataStax Academy to better understand how Cassandra works.
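To illustrate what re-modelling a table to match the query means, here is a rough sketch using the DataStax Java driver 3.x (the contact point, keyspace, table, and column names are made up for the example): the column you filter on becomes the partition key of a dedicated query table, so the SELECT touches a single partition instead of scanning the whole cluster.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class QueryByPartitionKeySketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build(); // placeholder host
        Session session = cluster.connect();

        // Hypothetical "query table": device_id is the partition key because
        // it is the column we want to filter on.
        session.execute("CREATE TABLE IF NOT EXISTS my_ks.readings_by_device ("
                + " device_id int, reading_time timestamp, value double,"
                + " PRIMARY KEY ((device_id), reading_time))");

        // This SELECT is restricted to a single partition, so it stays fast
        // regardless of how large the table grows.
        SimpleStatement stmt = new SimpleStatement(
                "SELECT reading_time, value FROM my_ks.readings_by_device WHERE device_id = ?", 42);
        ResultSet rs = session.execute(stmt);
        for (Row row : rs) {
            System.out.println(row.getTimestamp("reading_time") + " -> " + row.getDouble("value"));
        }
        cluster.close();
    }
}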
As Alex Ott said, it is not recommended to query on a non-partition key. If you still want to do so and need to increase the timeout for the query, just set the Max Wait Time property to whatever timeout you want.
EDIT:
tl;dr: Apache's timeout wrapper doesn't really let you use the timeout option.
Now that you've mentioned that this is a DataStax exception and not a java.util.concurrent.TimeoutException, I can tell you that I've looked into the QueryCassandra processor's source code, and it seems that Apache simply wrapped the query call in a Future to achieve a timeout instead of using the DataStax driver's built-in timeout option. As a result you are stuck with the driver's default timeout, which cannot be changed from the processor. It should be reported to Apache as a bug.
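For comparison, this is roughly what using the driver's own timeout settings looks like with the DataStax Java driver 3.x (the contact point, keyspace, table, and timeout values below are placeholders); the per-statement override is the option the processor does not expose:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.SocketOptions;

public class DriverTimeoutSketch {
    public static void main(String[] args) {
        // Raise the driver-wide per-request read timeout.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")  // placeholder contact point
                .withSocketOptions(new SocketOptions().setReadTimeoutMillis(60000))
                .build();
        Session session = cluster.connect();

        // Or override the timeout for a single statement only.
        SimpleStatement stmt = new SimpleStatement("SELECT * FROM my_ks.my_table");  // placeholder query
        stmt.setReadTimeoutMillis(120000);

        ResultSet rs = session.execute(stmt);
        System.out.println(rs.one());

        cluster.close();
    }
}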

Cassandra unpredictable failure depending on WHERE clause

I am attempting to execute a SELECT statement against a large Cassandra table (10m rows) with various WHERE clauses. I am issuing these from the DataStax DevCenter application. The columns I am using in the WHERE clause have secondary indexes.
The WHERE clause looks like WHERE fileid = 18000, or alternatively WHERE fileid < 18000. In this example, the second WHERE clause results in the error Unable to execute CQL script on 'connection1': Cassandra failure during read query at consistency ONE (1 responses were required but only 0 replica responded, 1 failed)
I have no idea why it is failing in this unpredictable manner. Any ideas?
NOTE: I am aware that this is a terrible idea, and Cassandra is not meant to be used in this way. I am issuing these queries and timing them to prove to others how inefficient Cassandra is for our use case compared to other solutions.
Your query is probably failing because of a READ timeout (the timeout on waiting to read data). You could try updating cassandra.yaml with a larger read timeout, e.g. read_request_timeout_in_ms: 200000 (for 200 s), so that you get output rather than an error. However, if you're trying to prove the inefficiency of Cassandra for your use case, this error seems like a pretty good way to do it.

DSE Cassandra OperationTimedOutException

I have a table with a very large amount of data in Cassandra. I am trying to read the data partition by partition using the DSE driver from a Hadoop MapReduce program. Some partitions may contain more than 100 million rows, and when I try to read those partitions I get an OperationTimedOutException.
Below is the stack trace:
com.datastax.driver.core.exceptions.OperationTimedOutException: [X.X.X.X/X.X.X.X:XXXX] Timed out waiting for server response
at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:35)
at com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:17)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:28)
at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:304)
at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.isExhausted(ArrayBackedResultSet.java:260)
at com.datastax.driver.core.ArrayBackedResultSet$1.hasNext(ArrayBackedResultSet.java:134)
at com.datastax.driver.core.ArrayBackedResultSet.all(ArrayBackedResultSet.java:123)
I have tried the following and it didn't work for me:
setFetchSize() set to a large number such as 250000
setReadTimeoutMillis() set to 30 seconds
Any help is appreciated :)
It seems Cassandra is not able to fulfil the request within the specified time. You can increase the parameters below in the cassandra.yaml file to deal with the timeouts, but only to an extent; don't set them much higher than needed.
read_request_timeout_in_ms:
write_request_timeout_in_ms:
If that doesn't help, you should look into Cassandra's logs for other anomalies, such as tombstone warnings.
The Cassandra configuration file has parameters that limit the response time. If a response takes longer than that, you are bound to get Timed out waiting for server response. These limits can be configured manually in the cassandra.yaml file.
Please change the parameters below as needed:
# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 5000
# The default timeout for other, miscellaneous operations
request_timeout_in_ms: 10000
If you don't have access to the Cassandra configuration file, use Cassandra's pagination feature to query large result sets, or handle the pagination in your own code.
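If changing cassandra.yaml is not an option, here is a rough sketch of the driver-side pagination mentioned above, using the DataStax Java driver 3.x (the host, keyspace, table, and key values are placeholders): set a fetch size and iterate the ResultSet, so the driver pulls one page at a time instead of the whole partition in a single response.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class PagedReadSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();  // placeholder host
        Session session = cluster.connect();

        SimpleStatement stmt = new SimpleStatement(
                "SELECT * FROM my_ks.big_table WHERE partition_id = ?", 1);  // placeholder query
        stmt.setFetchSize(5000);           // rows fetched per page, not a limit on the result
        stmt.setReadTimeoutMillis(60000);  // per-page timeout for this statement only

        ResultSet rs = session.execute(stmt);
        long count = 0;
        for (Row row : rs) {  // iterating fetches the next page on demand
            count++;          // process each row here instead of materialising everything with all()
        }
        System.out.println("rows read: " + count);

        cluster.close();
    }
}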

Fetching more than 100000 rows in Cassandra

I am currently using Cassandra 1.6.6, but I am having a big problem. I am trying to fetch more than 100000 rows using the LIMIT clause, but I always get the error below and then the database just shuts down.
TSocket read 0 bytes.
Secondly, does anyone know how to update all the rows in a Cassandra database?
Thanks, awaiting your reply. I just can't find anything online; it's very distressing.
TSocket read 0 bytes means you lost the connection to Cassandra, possibly because a timeout kicked in to stop a malformed query that would otherwise cause system instability. I don't think you can run a single query that updates all rows, because you need to specify the unique key to update a row.

RPC Timeout in Cassandra

I get the following error:
cqlsh:dev> SELECT DISTINCT id FROM raw_data;
Request did not complete within rpc_timeout.
This is a special query that I'll never make again, I don't care how long it takes, and I don't want to change my schema (since I'll never make the query again...).
How can I increase rpc_timeout for this one query?
I have tried adding LIMIT 9999 and ALLOW FILTERING, and it doesn't help. I expect less than 1000 rows in the result. The query works on another Cassandra cluster with half as much data.
Edit: as it turns out, this particular command succeeded after I ran nodetool compact, but what I'm more interested in is the general case of temporarily increasing rpc_timeout for a single query.
Increase the read request timeout in the cassandra.yaml file under /cassandra/conf:
read_request_timeout_in_ms: 30000
Change this, restart the server, and execute your query again; that might resolve your problem.
