RPC timeout in cqlsh - Cassandra

I have 5 nodes in my ring with SimpleStrategy and replication_factor=3. I inserted 1M rows using the stress tool. When I try to read the row count in cqlsh using
SELECT count(*) FROM Keyspace1.Standard1 limit 1000000;
It fails with error:
Request did not complete within rpc_timeout.
It works for a limit of 100,000 but fails even for 500,000.
All my nodes are up. Do I need to increase the rpc_timeout?
Please help.

You get this error because the request is timing out on the server side. One should know that this is a very expensive operation in Cassandra as others have pointed out.
Still, if you really want to do this you should update your /etc/cassandra/cassandra.yaml file and change the range_request_timeout_in_ms parameter. This will be valid for all your range queries.
Example to set a 40 second timeout:
range_request_timeout_in_ms: 40000
You will probably have to adjust the timeout at the client side as well. When using cqlsh as the client, this is accomplished by creating/updating the cqlsh configuration file under ~/.cassandra/cqlshrc and adding the client_timeout parameter to the [connection] section.
Example to set a 40 second timeout:
[connection]
client_timeout=40
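If you run the same query from an application instead of cqlsh, the client-side timeout has to be raised there as well. A minimal sketch with the DataStax Python driver, assuming a local node (the keyspace and table names are placeholders):

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("keyspace1")

# Raise the per-request client timeout for this session to 40 seconds
# (the driver default is 10 seconds), matching the server-side setting.
session.default_timeout = 40

print(session.execute("SELECT count(*) FROM standard1").one())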

It takes a long time to read 1M rows, so that is probably why it is timing out. You shouldn't use count like this; it is very expensive since it has to read all the data. Use Cassandra counters if you need to count lots of items.
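To illustrate the counter approach, here is a minimal sketch with the DataStax Python driver; the counter table and its names are made up:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("keyspace1")

# A counter table with one row per counted table.
session.execute("""
    CREATE TABLE IF NOT EXISTS row_counts (
        table_name text PRIMARY KEY,
        row_count counter
    )
""")

# Increment the counter each time the application inserts a row.
session.execute(
    "UPDATE row_counts SET row_count = row_count + 1 WHERE table_name = 'standard1'"
)

# Reading the count is now a cheap single-partition lookup, not a full scan.
row = session.execute(
    "SELECT row_count FROM row_counts WHERE table_name = 'standard1'"
).one()
print(row.row_count)

Note that counters are not perfectly exact under failures and retries, so treat the value as a close estimate rather than an exact count.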
You should also check your Cassandra logs to confirm there aren't any other issues - sometimes exceptions in Cassandra lead to timeouts on the client.

If you can live with an approximate row count, take a look at this answer to Row count of a column family in Cassandra.

Related

Unable to delete large number of rows from Spanner

I have a 3-node Spanner instance and a single table that contains around 4 billion rows. The DDL looks like this:
CREATE TABLE predictions (
    name STRING(MAX),
    ...,
    model_version INT64,
) PRIMARY KEY (name, model_version)
I'd like to setup a job to periodically remove some old rows from this table using the Python Spanner client. The query I'd like to run is:
DELETE FROM predictions WHERE model_version <> ?
According to the docs, it sounds like I would need to execute this as a Partitioned DML statement. I am using the Python Spanner client as follows, but am experiencing timeouts (504 Deadline Exceeded errors) due to the large number of rows in my table.
# this always throws a "504 Deadline Exceeded" error
database.execute_partitioned_dml(
    "DELETE FROM predictions WHERE model_version <> @model_version",
    params={"model_version": 104},
    param_types={"model_version": Type(code=INT64)},
)
My first intuition was to see if there was some sort of timeout I could increase, but I don't see any timeout parameters in the source :/
I did notice there was a run_in_transaction method in the Spanner lib that contains a timeout parameter, so I decided to deviate from the partitioned DML approach to see if using this method worked. Here's what I ran:
def delete_old_rows(transaction, model_version):
    delete_dml = "DELETE FROM predictions WHERE model_version <> {}".format(model_version)
    dml_statements = [
        delete_dml,
    ]
    status, row_counts = transaction.batch_update(dml_statements)

database.run_in_transaction(delete_old_rows,
    model_version=104,
    timeout_secs=3600,
)
What's weird about this is that the timeout_secs parameter appears to be ignored: I still get a 504 Deadline Exceeded error within a minute or two of executing the above code, despite a timeout of one hour.
Anyways, I'm not too sure what to try next, or whether or not I'm missing something obvious that would allow me to run a delete query in a timely fashion on this huge Spanner table. The model_version column has pretty low cardinality (generally 2-3 unique model_version values in the entire table), so I'm not sure if that would factor into any recommendations. But if someone could offer some advice or suggestions, that would be awesome :) Thanks in advance
The reason that setting timeout_secs didn't help is that this argument is unfortunately not the timeout for the transaction. It is the retry timeout for the transaction: it sets the deadline after which the transaction will stop being retried.
We will update the docs for run_in_transaction to explain this better.
The root cause was that the total timeout for the streaming RPC calls was set too low in the client libraries: 120s for streaming APIs (e.g. ExecuteStreamingSQL, which is used by partitioned DML calls).
This has been fixed in the client library source code, changing the timeout to 60 minutes (which is the maximum), and it will be part of the next client library release.
As a workaround, in Java, you can configure the timeouts as part of the SpannerOptions when you connect to your database. (I do not know how to set custom timeouts in Python, sorry.)
final RetrySettings retrySettings =
    RetrySettings.newBuilder()
        .setInitialRpcTimeout(Duration.ofMinutes(60L))
        .setMaxRpcTimeout(Duration.ofMinutes(60L))
        .setMaxAttempts(1)
        .setTotalTimeout(Duration.ofMinutes(60L))
        .build();
SpannerOptions.Builder builder =
    SpannerOptions.newBuilder()
        .setProjectId("[PROJECT]");
builder
    .getSpannerStubSettingsBuilder()
    .applyToAllUnaryMethods(
        new ApiFunction<UnaryCallSettings.Builder<?, ?>, Void>() {
          @Override
          public Void apply(UnaryCallSettings.Builder<?, ?> input) {
            input.setRetrySettings(retrySettings);
            return null;
          }
        });
builder
    .getSpannerStubSettingsBuilder()
    .executeStreamingSqlSettings()
    .setRetrySettings(retrySettings);
builder
    .getSpannerStubSettingsBuilder()
    .streamingReadSettings()
    .setRetrySettings(retrySettings);
Spanner spanner = builder.build().getService();
The first suggestion is to try gcloud instead.
https://cloud.google.com/spanner/docs/modify-gcloud#modifying_data_using_dml
Another suggestion is to also pass a range of name, to limit the number of rows scanned. For example, you could add something like STARTS_WITH(name, 'a') to the WHERE clause so that each statement touches a smaller number of rows, but first you will need to know the domain of the name column values (see the sketch after the last suggestion below).
The last suggestion is to avoid using '<>' if possible, as it is generally pretty expensive to evaluate.
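To make the second suggestion concrete, here is a rough sketch with the Python Spanner client, assuming the values of name begin with lowercase ASCII letters (the instance and database IDs are placeholders):

import string

from google.cloud import spanner
from google.cloud.spanner_v1 import param_types

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

# Delete in smaller chunks, one name prefix at a time, so that each
# partitioned DML statement has to scan fewer rows.
for prefix in string.ascii_lowercase:
    deleted = database.execute_partitioned_dml(
        "DELETE FROM predictions "
        "WHERE model_version <> @version AND STARTS_WITH(name, @prefix)",
        params={"version": 104, "prefix": prefix},
        param_types={"version": param_types.INT64, "prefix": param_types.STRING},
    )
    print(prefix, deleted)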

How long will it take to delete a row?

I have a 2 node cassandra cluster with RF=2.
When a DELETE FROM x WHERE y CQL statement is issued, is it known how long it will take for all nodes to delete the row?
What I see in one of the integration tests:
A row is deleted, and the result of the deletion is tested with a select * from y where id = xxx statement. What I see is that sometimes the result is not empty as expected, and the deleted row is still found.
Is the correct approach to read with CL=2 to get the result I am expecting?
Make sure that the servers' clocks are in sync if you are using server-side timestamps.
Better to use client-side timestamps.
Is the correct approach to read with CL=2 to get the result I am expecting?
I assume you are using the default consistency level, ONE, for the delete, and since 1 + 2 > 2 (i.e. W + R > N) in your case, it is OK.
Local to the replica it will be sub-millisecond. The time is dominated by the app -> coordinator -> replica -> coordinator -> app network hops. Use QUORUM or LOCAL_QUORUM on both the write and the read for consistency across sequential requests like that.
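For illustration, a small sketch with the DataStax Python driver that issues both the delete and the verification read at QUORUM; the keyspace, table, and key are placeholders:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# With RF=2, QUORUM is 2 replicas, so W + R = 2 + 2 > N = 2 and the
# read is guaranteed to see the delete.
delete = SimpleStatement(
    "DELETE FROM y WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(delete, (123,))

read = SimpleStatement(
    "SELECT * FROM y WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
print(session.execute(read, (123,)).one())  # expected: None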

select count(*) runs into timeout issues in Cassandra

Maybe it is a stupid question, but I'm not able to determine the size of a table in Cassandra.
This is what I tried:
select count(*) from articles;
It works fine if the table is small but once it fills up, I always run into timeout issues:
cqlsh:
OperationTimedOut: errors={}, last_host=127.0.0.1
DBeaver:
Run 1: 225,000 (7477 ms)
Run 2: 233,637 (8265 ms)
Run 3: 216,595 (7269 ms)
I assume that it hits some timeout and just aborts. The actual number of entries in the table is probably much higher.
I'm testing against a local Cassandra instance which is completely idle. I would not mind if it has to do a full table scan and is unresponsive during that time.
Is there a way to reliably count the number of entries in a Cassandra table?
I'm using Cassandra 2.1.13.
Here is my current workaround:
COPY articles TO '/dev/null';
...
3568068 rows exported to 1 files in 2 minutes and 16.606 seconds.
Background: Cassandra supports exporting a table to a text file, for instance:
COPY articles TO '/tmp/data.csv';
Output: 3568068 rows exported to 1 files in 2 minutes and 25.559 seconds
That also matches the number of lines in the generated file:
$ wc -l /tmp/data.csv
3568068
As far as I can see, your problem is connected to the cqlsh timeout: OperationTimedOut: errors={}, last_host=127.0.0.1
You can simply increase it with these options:
--connect-timeout=CONNECT_TIMEOUT
Specify the connection timeout in seconds (default: 5
seconds).
--request-timeout=REQUEST_TIMEOUT
Specify the default request timeout in seconds
(default: 10 seconds).
Is there a way to reliably count the number of entries in a Cassandra table?
The plain answer is no. It is not a Cassandra limitation; counting unique items reliably is a hard challenge for distributed systems in general.
That's the challenge that approximation algorithms like HyperLogLog address.
One possible solution is to use a counter in Cassandra to count the number of distinct rows, but even counters can miscount in some corner cases, so you'll get a few percent error.
This is a good utility for counting rows that avoids the timeout issues that happen when running a large COUNT(*) in Cassandra:
https://github.com/brianmhess/cassandra-count
The reason is simple:
When you're using:
SELECT count(*) FROM articles;
it has the same effect on the database as:
SELECT * FROM articles;
You have to query over all your nodes. Cassandra simply runs into a timeout.
You can change the timeout, but it isn't a good solution. (For one time it's fine but don't use it in your regular queries.)
There's a better solution: make your client count your rows. You can create a small app that counts your rows as you insert them and stores the result using a counter column in a Cassandra table.
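If you only need an occasional count from a client, another option is to let the driver page through the table with a small fetch size rather than issuing one huge query. A sketch with the DataStax Python driver (the keyspace, table, and column are assumed):

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# Page through the table 1000 rows at a time; each page is a separate,
# small request, so no single request hits the server-side timeout.
statement = SimpleStatement("SELECT id FROM articles", fetch_size=1000)

count = 0
for _ in session.execute(statement):
    count += 1

print(count)

The scan as a whole still reads the entire table, so it is slow, but it avoids the single long-running request that count(*) needs.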
You can use COPY to avoid the Cassandra timeout that usually happens on count(*).
Use this bash one-liner:
cqlsh -e "copy keyspace.table_name (first_partition_key_name) to '/dev/null'" | sed -n 5p | sed 's/ .*//'

Cassandra COPY consistently fails

I was trying to import a CSV with about 20 million rows.
I did a pilot run with a few 100 rows worth of CSV just to check if the columns were in order and that there were no parsing errors. All went well.
Every time I tried importing the 20 million row CSV, it failed after varying amounts of time. On my local machine it failed after 90 minutes with the following error. On the server box it fails within 10 minutes:
Processed 4050000 rows; Write: 624.27 rows/ss
code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info=
{'received_responses': 0, 'required_responses': 1, 'write_type': 0, 'consistency': 1}
Aborting import at record #4050617. Previously-inserted values still present.
4050671 rows imported in 1 hour, 26 minutes, and 43.649 seconds.
Cassandra: Coordinator node timed out waiting for replica nodes' responses (it is a one-node cluster and the replication factor is 1, so why it is waiting for other nodes is another question)
Then, based on a recommendation in another thread, I changed the write timeout, though I was not convinced it was the root cause.
write_request_timeout_in_ms: 20000
(Also tried changing it to 300000)
But it still eventually fails.
So now, I have chopped the original CSV into many 500,000 line CSVs.
This has a better success rate (compared to 0!). But even this fails 2 of 5 times for various reasons.
Sometimes I get the following error:
Processed 460000 rows; Write: 6060.32 rows/ss
Connection heartbeat failure
Aborting import at record #443491. Previously inserted records are still present, and some records after that may be present as well.
Other times it just stops updating the progress on the console, and the only way out is to abort using Ctrl+C.
I've spent most of the day like this. My RDBMS is running happily with 5 billion rows. I wanted to test Cassandra with 10 times as much data but I'm having trouble even importing a million rows at a time.
One observation about how the COPY command proceeds: once the command is entered, it starts writing at a rate of about 10,000 rows per second. It sustains this speed until it has inserted about 80,000 rows. Then there is a pause of about 30 seconds, after which it consumes another 70,000 to 90,000 rows, pauses for another 30 seconds, and so on, until it either finishes all rows in the CSV, fails midway with an error, or simply hangs.
I need to get to the root of this. I really hope to find that I am doing something silly and it's not something I have to accept and work around.
I am using Cassandra 2.2.3
A lot of people have trouble with the COPY command; it seems to work for small datasets but starts to fail when you have a lot of data.
The documentation recommends using the SSTable loader if you have a few million rows to import. I used it at my company and ran into a lot of consistency problems.
I have tried everything, and for me the safest way to import a large amount of data into Cassandra is to write a little script that reads your CSV and then executes async queries. Python does this very well.
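A minimal sketch of that approach; the file, table, and columns are made up, and a real loader should also throttle how many requests are in flight at once:

import csv

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# A prepared statement is parsed once on the server and reused for every row.
insert = session.prepare(
    "INSERT INTO my_table (id, name, value) VALUES (?, ?, ?)"
)

futures = []
with open("data.csv", newline="") as f:
    for row in csv.reader(f):
        futures.append(session.execute_async(insert, (row[0], row[1], row[2])))

# Wait for the writes to complete and surface any errors.
for future in futures:
    future.result()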
Will is correct. COPY is meant for small data sets and usually struggles once you start hitting millions of rows. In addition to the SSTable loader, there's this utility: https://github.com/brianmhess/cassandra-loader which I find to have very good performance with some added convenience.

Cassandra throwing NoHostAvailableException after 5 minutes of high IOPS run

I'm using the DataStax Cassandra 2.1 driver and performing read/write operations at a rate of ~8000 IOPS. I've used pooling options to configure my session and am using separate sessions for reads and writes, each of which connects to a different node in the cluster as its contact point.
This works fine for, say, 5 minutes, but after that I get a lot of exceptions like:
Failed with: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.0.1.123:9042 (com.datastax.driver.core.TransportException: [/10.0.1.123:9042] Connection has been closed), /10.0.1.56:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))
Can anyone help me out here on what could be the problem?
The exception asks me to increase the number of connections per host, but how high a value can I set for this parameter?
Also, I'm not able to set CoreConnectionsPerHost beyond 2, as it throws an exception saying 2 is the max.
This is how I'm creating each read / write session.
PoolingOptions poolingOpts = new PoolingOptions();
poolingOpts.setCoreConnectionsPerHost(HostDistance.REMOTE, 2);
poolingOpts.setMaxConnectionsPerHost(HostDistance.REMOTE, 200);
poolingOpts.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 128);
poolingOpts.setMinSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 2);

cluster = Cluster
    .builder()
    .withPoolingOptions(poolingOpts)
    .addContactPoint(ip)
    .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
    .withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
    .build();
Session s = cluster.connect(keySpace);
Your problem might not actually be in your code or the way you are connecting. If the problem is happening after a few minutes, it could simply be that your cluster is becoming overloaded trying to process the ingestion of data and cannot keep up. The typical sign of this is when you start seeing JVM garbage collection "GC" messages in the Cassandra system.log file; too many small ones batched together, or large ones on their own, can mean that incoming clients are not being responded to, causing this kind of scenario. Verify that you do not have too many of these events showing up in your logs before you start to look at your code. Here's a good example of a large GC event:
INFO [ScheduledTasks:1] 2014-05-15 23:19:49,678 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 2896 ms for 2 collections, 310563800 used; max is 8375238656
When connecting to a cluster there are some recommendations, one of which is to have only one Cluster object per real cluster. As per the article I've linked below (apologies if you have already studied this):
Use one cluster instance per (physical) cluster (per application lifetime)
Use at most one session instance per keyspace, or use a single Session and explicitly specify the keyspace in your queries
If you execute a statement more than once, consider using a prepared statement
You can reduce the number of network roundtrips and also have atomic operations by using batches
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/fourSimpleRules.html
As you are doing a high number of reads, I'd definitely recommend using setFetchSize as well, if it's applicable to your code (a rough sketch of these rules follows the reference links below).
http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/cqlStatements.html
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/queryBuilderOverview.html
For reference, here are the connection options in case you find them useful:
http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/connectionsOptions_c.html
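For illustration, here is a compact sketch of those rules; it uses the Python driver for brevity (the Java driver equivalents are covered in the links above), and the addresses, keyspace, and table are made up:

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# One Cluster per physical cluster and one long-lived Session, reused
# for the lifetime of the application.
cluster = Cluster(["10.0.1.123", "10.0.1.56"])
session = cluster.connect("my_keyspace")

# Prepare statements that are executed repeatedly.
write = session.prepare("INSERT INTO my_table (id, value) VALUES (?, ?)")
session.execute(write, (1, "foo"))

# Cap how many rows a read pulls back per page instead of streaming
# everything in one response.
read = SimpleStatement("SELECT * FROM my_table", fetch_size=500)
for row in session.execute(read):
    print(row)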
Hope this helps.

Resources