Fetching more than 100000 rows in Cassandra

I am currently using Cassandra 1.6.6, but I am having a big problem. I am trying to fetch more than 100000 rows using the LIMIT clause, but I always get the error below and then the database just shuts down.
TSocket read 0 bytes.
Secondly, does anyone know how to update all the rows in a Cassandra database?
Thanks, awaiting your reply. I just can't find anything online, and I'm quite distressed.

TSocket read 0 bytes means you lost the connection to Cassandra, possibly due to the timeout that kills an expensive or malformed query before it can destabilize the system. I don't think you can run a single query that updates all rows, because an update has to specify the primary key of the row it modifies.
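If you genuinely need to touch every row, the usual pattern is to page through the table from a client driver and issue one UPDATE per primary key. Below is a minimal sketch with the DataStax Python driver; the keyspace my_ks, table users, and columns user_id / status are hypothetical placeholders.

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])           # contact point; adjust for your cluster
session = cluster.connect('my_ks')         # hypothetical keyspace

# Page through the table instead of using one huge LIMIT; fetch_size
# controls how many rows come back per round trip, and the driver
# fetches the next page transparently as you iterate.
select = SimpleStatement('SELECT user_id FROM users', fetch_size=1000)
update = session.prepare('UPDATE users SET status = ? WHERE user_id = ?')

for row in session.execute(select):
    session.execute(update, ('active', row.user_id))

cluster.shutdown()

The same paging approach, a bounded fetch_size rather than a very large LIMIT, is also the usual way to read hundreds of thousands of rows without dropping the connection.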

Related

Cassandra - Truncate a table while inserts in progress

I want to understand how the truncate command works in Cassandra (version 3.9) so that I know what would happen in the following scenario:
I have about 100GB of data in a production table that needs to be truncated.
I want to truncate this table while a few hundred insert requests per second continue to hit it.
I am trying to understand, theoretically, how this would play out.
Would the truncate try to acquire some sort of lock on the table before it can proceed, possibly blocking the insert requests or timing out itself?
Or would the truncate be processed in sequence as the request came in, with the insert requests that follow creating new rows, so that I end up with a small number of rows remaining after the truncate?
I am just trying to reclaim space, so I am not particularly concerned if a small amount of data remains from the insert requests run after the truncate command.
I am just trying to understand whether you'd expect this to complete successfully or to fail / time out.
I will try to run a similar scenario on a smaller cluster, but I'm not sure if that will be a good substitute to understand the actual behavior. Any inputs will be helpful.
Truncate sends a message to all the nodes requesting that they delete all the SSTables for the table at the moment of execution; afterwards you will only have the upserts received after the truncate was issued.
The Datastax documentation states that this is done with JMX, but judging by the comments on this answer, it is actually done with CQL and the messaging service.
If you are trying to reclaim disk space, note that a snapshot will be created by the truncate if auto_snapshot is set to true (true is the default value), so you will need to remove the snapshot after the command has run. Also note that truncate requires all nodes to be up and healthy in order to complete.
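For the reclaim-disk-space case specifically, the sequence on a cluster with auto_snapshot enabled looks roughly like this (my_ks and my_table are hypothetical names):

cqlsh> TRUNCATE my_ks.my_table;

Then, on every node:

nodetool listsnapshots
nodetool clearsnapshot my_ks

clearsnapshot removes the snapshots for the keyspace, which is what actually frees the disk space; if you only want to drop the snapshot that the truncate created, pass its name from listsnapshots with -t instead.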
I tried this for myself. On a two-node Cassandra cluster I made inserts at about 160 requests per second in the background and ran a truncate on the same table, which had about 200,000 records.
The table got truncated and the inserts continued without an error.
The new rows inserted after the truncate showed up in the DB.

Cassandra unpredictable failure depending on WHERE clause

I am attempting to execute a SELECT statement against a large Cassandra table (10m rows) with various WHERE clauses. I am issuing these from the Datastax DevCenter application. The columns I am using in the where clause have secondary indexes.
The where clause looks like WHERE fileid = 18000 or alternatively WHERE fileid < 18000. In this example, the second where clause results in the error Unable to execute CQL script on 'connection1': Cassandra failure during read query at consistency ONE (1 responses were required but only 0 replica responded, 1 failed)
I have no idea why it is failing in this unpredictable manner. Any ideas?
NOTE: I am aware that this is a terrible idea, and Cassandra is not meant to be used in this way. I am issuing these queries and timing them to prove to others how inefficient Cassandra is for our use case compared to other solutions.
Your query is probably failing because of a read timeout (the timeout on waiting to read data). You could try updating cassandra.yaml with a larger read timeout, e.g. read_request_timeout_in_ms: 200000 (200s), so that you get output rather than an error. However, if you're trying to prove the inefficiency of Cassandra for your use case, this error seems like a pretty good way to do it.

Sails fails to return or error on large result set

I am bypassing the ORM and using the Model.query function to query and return a "large" result set from PostgreSQL. The query returns around 2 million rows. When running the query directly from postgres it returns in around 20s. The query fails silently when executed from sails. Is there a limit on the number of rows that can be returned?
Is there a limit on the number of rows that can be returned?
No, there is no limit.
The query fails silently when executed from sails
What does "fails silently" mean? How do you know that it's failed? It may still be processing; or the adapter might have a connection timeout that you're breaching.
Two million rows serialized out of the database, translated to JSON, and piped down to the client is very different from just running the SQL directly on the database. It could take 20x longer, depending on your system resources. I strongly recommend using Sails.js's paging features to pull the data out in chunks. Pulling two million rows in one operation from a web server doesn't make a lot of sense.

RPC Timeout in Cassandra

I get the following error:
cqlsh:dev> SELECT DISTINCT id FROM raw_data;
Request did not complete within rpc_timeout.
This is a special query that I'll never make again, I don't care how long it takes, and I don't want to change my schema (since I'll never make the query again...).
How can I increase rpc_timeout for this one query?
I have tried adding LIMIT 9999 and ALLOW FILTERING, and it doesn't help. I expect less than 1000 rows in the result. The query works on another Cassandra cluster with half as much data.
Edit: as it turns out, this particular command succeeded after I ran nodetool compact, but I'm more interested in the general case of temporarily increasing rpc_timeout for one query.
Increase the read request timeout in the cassandra.yaml file under /cassandra/conf:
read_request_timeout_in_ms: 30000
Change this, restart the server, and execute your query; that might resolve your problem.
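If you can run this one-off query through a driver instead of cqlsh, and your cluster speaks the native protocol (Cassandra 2.0+), you can at least give this single statement a generous client-side wait and page through it in small chunks so each round trip stays small; whether paging helps depends on your version, and the server-side timeouts in cassandra.yaml still apply to every page. A rough sketch with the DataStax Python driver, using the dev keyspace and raw_data table from the question:

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('dev')

# Small pages keep each server-side read short; timeout= raises the
# client-side wait (in seconds) for this request only, and does not
# change the server's rpc/read timeouts set in cassandra.yaml.
stmt = SimpleStatement('SELECT DISTINCT id FROM raw_data', fetch_size=500)
ids = {row.id for row in session.execute(stmt, timeout=120)}

print(len(ids))
cluster.shutdown()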

Cassandra is giving a timeout exception in Pentaho Kettle

I am using the Cassandra NoSQL database for a transformation in Pentaho Data Integration.
When I manually check the connection it connects, but while executing the transformation it gives me a timeout exception.
I increased the "request_timeout" but the problem is still there. As far as I can tell, the problem only shows up as the data in the Cassandra database grows.
So is it a problem with the PDI tool, or with the Cassandra database itself?
And how can I resolve this problem?
In cassandra.yaml, you need to increase the parameter read_request_timeout_in_ms to a higher number like 20000. The default number is 10000, and for selects with a huge limit and 10 or 20 columns, you may expect a TimeoutException.
Increasing this value will make Cassandra wait longer for queries to complete.
I tried this in my database and it worked.
