Does Cassandra log query attempts that are part of a retry policy?

For example, would these attempts be recorded as part of a trace session in system_traces.sessions or system_traces.events?
Edit: The driver I'm using is called gocql

In the Java driver, there is a LoggingRetryPolicy which can act as a parent policy for another retry policy - it logs the decision to retry.
Looking at the query executor in the gocql driver, however, I cannot see any explicit logging around retries. The only retry mechanism that appears to log anything is the DowngradingConsistencyRetryPolicy, which logs the downgrade when debug logging is enabled.
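One workaround, in the spirit of the Java driver's LoggingRetryPolicy, is to wrap whatever retry policy you use in your own logging decorator, since gocql accepts any implementation of its RetryPolicy interface. A minimal sketch (the RetryableQuery interface is stubbed here with just the one method this example needs; the real definitions live in the gocql package):

```go
package main

import (
	"fmt"
	"log"
)

// Stub of gocql's RetryableQuery interface, reduced to the single method
// used below. The real interface is defined in the gocql package.
type RetryableQuery interface {
	Attempts() int
}

// RetryPolicy mirrors gocql's interface: Attempt reports whether the
// query should be retried.
type RetryPolicy interface {
	Attempt(q RetryableQuery) bool
}

// LoggingRetryPolicy wraps an inner policy and logs every decision,
// similar in spirit to the Java driver's LoggingRetryPolicy.
type LoggingRetryPolicy struct {
	Inner RetryPolicy
}

func (p LoggingRetryPolicy) Attempt(q RetryableQuery) bool {
	retry := p.Inner.Attempt(q)
	log.Printf("retry policy decision: attempts=%d retry=%v", q.Attempts(), retry)
	return retry
}

// SimpleRetryPolicy retries up to NumRetries times, mirroring the logic
// of gocql's SimpleRetryPolicy.
type SimpleRetryPolicy struct {
	NumRetries int
}

func (p SimpleRetryPolicy) Attempt(q RetryableQuery) bool {
	return q.Attempts() <= p.NumRetries
}

// fakeQuery stands in for a real gocql query in this illustration.
type fakeQuery struct{ attempts int }

func (f fakeQuery) Attempts() int { return f.attempts }

func main() {
	policy := LoggingRetryPolicy{Inner: SimpleRetryPolicy{NumRetries: 2}}
	fmt.Println(policy.Attempt(fakeQuery{attempts: 1})) // within budget
	fmt.Println(policy.Attempt(fakeQuery{attempts: 3})) // budget exhausted
}
```

With the real gocql types you would attach the wrapper via Query.RetryPolicy (or on the ClusterConfig), and every retry decision would then show up in your own logs rather than relying on the driver.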

Related

Warnings on startup with atomikos starter dependency

I have a Spring Boot application with PostgreSQL and RabbitMQ. I wanted to use a best-effort JTA transaction that spans both a Postgres and a RabbitMQ transaction.
I have added the spring-boot-starter-jta-atomikos dependency. When I start my application I receive this warning multiple times:
atomikos connection proxy for Pooled connection wrapping physical connection org.postgresql.jdbc.PgConnection#99c4993: WARNING: transaction manager not running?
Do I need any additional configuration?
I also get this warning at startup:
AtomikosDataSourceBean 'dataSource': poolSize equals default - this may cause performance problems!
I run with the following settings, but setMinPoolSize is never called:
spring.jta.atomikos.connectionfactory.max-pool-size: 10
spring.jta.atomikos.connectionfactory.min-pool-size: 5
The documentation at:
https://docs.spring.io/spring-boot/docs/current/reference/html/features.html#features.jta.atomikos
https://www.atomikos.com/Documentation/SpringBootIntegration
just says I can add the starter dependency. But it seems like Spring Boot doesn't properly auto-configure some things.
The spring.jta.atomikos.connectionfactory.* properties control a JMS ConnectionFactory.
You should use the spring.jta.atomikos.datasource.* properties to configure the JDBC DataSource.
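For example (property names follow Spring Boot's binding for AtomikosDataSourceBean; the pool sizes here are illustrative):

```properties
# JDBC DataSource pool (what the question actually wants to tune)
spring.jta.atomikos.datasource.min-pool-size=5
spring.jta.atomikos.datasource.max-pool-size=10

# JMS ConnectionFactory pool (what the original properties were configuring)
spring.jta.atomikos.connectionfactory.min-pool-size=5
spring.jta.atomikos.connectionfactory.max-pool-size=10
```

With the datasource properties set, setMinPoolSize is called on the AtomikosDataSourceBean and the poolSize warning should disappear.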

Cosmos DB: How to retry failures with TransactionalBatch

I have a few stored procedures in Cosmos DB that I'd like to convert to .NET transactions. Recently, I saw this post https://devblogs.microsoft.com/cosmosdb/introducing-transactionalbatch-in-the-net-sdk/ that goes over transaction support. I was also able to test it, and it seems to be working fine.
I know that .NET has added built-in retry logic into many of its supported packages. Does TransactionalBatch have any built-in retry policy? What is the recommended approach to retrying any failures? The post above checks IsSuccessStatusCode. Should we retry when the status indicates failure?
Does TransactionalBatch have any built-in retry policy?
For now, it does not have a built-in retry policy.
What is the recommended approach to retrying any failures?
TransactionalBatch describes a group of point operations that need to either succeed or fail together. If any operation fails, the entire transaction is rolled back.
Because the failure status codes will be 424 and 409 (not 429, which is throttling), RetryOptions.MaxRetryAttemptsOnThrottledRequests will not cover these failures.
So you could wrap the call in a simple retry loop, such as for (int i = 0; i < MaxRetries; i++) { ... }, to perform the retry logic.
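The retry-loop pattern looks like this (sketched here in Go with a stubbed operation for illustration; in the real .NET code the operation would be batch.ExecuteAsync(...) and the status check would be response.IsSuccessStatusCode):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const MaxRetries = 3

// executeWithRetry runs op up to MaxRetries times, backing off between
// attempts. op stands in for a TransactionalBatch execution that reports
// success via an HTTP-style status code.
func executeWithRetry(op func() (status int, err error)) (int, error) {
	var status int
	var err error
	for i := 0; i < MaxRetries; i++ {
		status, err = op()
		if err == nil && status >= 200 && status < 300 {
			return status, nil // the IsSuccessStatusCode case
		}
		if i < MaxRetries-1 {
			// Simple linear backoff before the next attempt.
			time.Sleep(time.Duration(i+1) * 100 * time.Millisecond)
		}
	}
	return status, errors.New("batch failed after retries")
}

func main() {
	calls := 0
	// Fake operation: fails twice with 424, then succeeds.
	status, err := executeWithRetry(func() (int, error) {
		calls++
		if calls < 3 {
			return 424, nil
		}
		return 200, nil
	})
	fmt.Println(status, err, calls) // 200 <nil> 3
}
```

Note that blindly retrying a 409 (conflict) only makes sense if your operations are written to be idempotent; otherwise inspect the per-operation results before retrying.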

How to control the number of Hadoop IPC retry attempts for a Spark job submission?

Suppose I attempt to submit a Spark (2.4.x) job to a Kerberized cluster, without having valid Kerberos credentials. In this case, the Spark launcher tries repeatedly to initiate a Hadoop IPC call, but fails:
20/01/22 15:49:32 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "node-1.cluster/172.18.0.2"; destination host is: "node-1.cluster":8032; , while invoking ApplicationClientProtocolPBClientImpl.getClusterMetrics over null after 1 failover attempts. Trying to failover after sleeping for 35160ms.
This will repeat a number of times (30, in my case), until eventually the launcher gives up and the job submission is considered failed.
Various other similar questions mention these properties (which are actually YARN properties but prefixed with spark. as per the standard mechanism to pass them with a Spark application).
spark.yarn.maxAppAttempts
spark.yarn.resourcemanager.am.max-attempts
However, neither of these properties affects the behavior I'm describing. How can I control the number of IPC retries in a Spark job submission?
After a good deal of debugging, I figured out the properties involved here.
yarn.client.failover-max-attempts (controls the max attempts)
Without specifying this, the number of attempts appears to come from the ratio of these two properties (numerator first, denominator second).
yarn.resourcemanager.connect.max-wait.ms
yarn.client.failover-sleep-base-ms
Of course, as with any YARN properties, these must be prefixed with spark.hadoop. in the context of a Spark job submission.
The relevant class (which resolves all these properties) is RMProxy, within the Hadoop YARN project (source here). All these, and related, properties are documented here.
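Putting it together, a submission that caps the IPC retries might look like this (the values 3 and 1000 are illustrative, not recommendations):

```
spark-submit \
  --conf spark.hadoop.yarn.client.failover-max-attempts=3 \
  --conf spark.hadoop.yarn.client.failover-sleep-base-ms=1000 \
  ...
```

With failover-max-attempts set explicitly, RMProxy uses it directly instead of deriving the attempt count from the connect.max-wait.ms / failover-sleep-base-ms ratio.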

DefaultRetryPolicy - write timeout

The documentation for DefaultRetryPolicy says that
This policy retries queries in only two cases:
On a read timeout, if enough replicas replied but data was not retrieved.
On a write timeout, if we timeout while writing the distributed log used by batch statements.
This retry policy is conservative in that it will never retry with a different consistency level than the one of the initial operation.
Does this mean that when I do a simple session.execute(BoundStatement) without any custom retry policy and get a write timeout, the default retry policy will kick in and the write will be retried? And what does the "distributed log used by batch statements" mean?
If you don't specify a retry policy, the driver will use DefaultRetryPolicy.
By default, the retry on write timeout applies only to logged batch operations (a logged batch enforces atomicity), and only when the timeout occurred while writing the batch log itself.
No retry will happen on a write timeout for a non-batch operation.
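The decision the Java driver's DefaultRetryPolicy makes on a write timeout can be paraphrased like this (sketched in Go for illustration; the WriteType names follow the Java driver, where BATCH_LOG means the timeout happened while writing the distributed batch log):

```go
package main

import "fmt"

// WriteType mirrors the Java driver's WriteType enum (subset).
type WriteType string

const (
	WriteTypeSimple   WriteType = "SIMPLE"    // plain, non-batch write
	WriteTypeBatch    WriteType = "BATCH"     // the batch mutations themselves
	WriteTypeBatchLog WriteType = "BATCH_LOG" // writing the batch log
)

// shouldRetryOnWriteTimeout paraphrases DefaultRetryPolicy.onWriteTimeout:
// retry at most once, and only if the timeout happened while writing the
// distributed batch log. The consistency level is never changed.
func shouldRetryOnWriteTimeout(writeType WriteType, retryCount int) bool {
	if retryCount != 0 {
		return false // never retry more than once
	}
	return writeType == WriteTypeBatchLog
}

func main() {
	fmt.Println(shouldRetryOnWriteTimeout(WriteTypeSimple, 0))   // plain write: no retry
	fmt.Println(shouldRetryOnWriteTimeout(WriteTypeBatchLog, 0)) // batch-log write: retry
	fmt.Println(shouldRetryOnWriteTimeout(WriteTypeBatchLog, 1)) // already retried: no retry
}
```

So for a simple session.execute(BoundStatement), a write timeout is rethrown to the application rather than retried.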

Request timed out is not logged on the server side in Cassandra

I have set the server-side timeout in Cassandra to 60 seconds and the client-side timeout in the cpp driver to 120 seconds.
I use a batch query with 18K operations. I get the "Request timed out" error in the cpp driver logs, but there is no TRACE output in the Cassandra server logs, despite enabling ALL logging in Cassandra's logback.xml.
So how can I confirm whether the timeout is thrown from the server side or the client side?
BATCH is not intended to work that way. It’s designed to apply 6 or 7 mutations to different tables atomically. You’re trying to use it like its RDBMS counterpart (Cassandra just doesn’t work that way). The BATCH timeout is designed to protect the node/cluster from crashing due to how expensive that query is for the coordinator.
In the system.log, you should see warnings/failures concerning the sheer size of your BATCH. If you’ve raised those thresholds and don’t see that, you should see a warning about a timeout threshold being exceeded (I think BATCH gets its own timeout in 3.0).
If all else fails, run your BATCH statement (part of it) in cqlsh with tracing on, and you’ll see precisely why this is a bad idea (server side).
Also, the default query timeouts are there to protect your cluster. You really shouldn’t need to alter those. You should change your query/model or approach before looking at adjusting the timeout.
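To see the server-side cost directly, you can trace a reduced version of the batch in cqlsh (keyspace, table, and values here are hypothetical placeholders):

```
cqlsh> TRACING ON;
cqlsh> BEGIN BATCH
   ...   INSERT INTO my_ks.my_table (id, val) VALUES (1, 'a');
   ...   INSERT INTO my_ks.my_table (id, val) VALUES (2, 'b');
   ... APPLY BATCH;
```

With tracing on, cqlsh prints the coordinator and replica events for the request, which shows where the time is actually being spent.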
