OverloadedException in single-node Cassandra

I read through this page: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesHintedHandoff.html and identified that OverloadedException in Cassandra is caused by the following:
"The coordinator tracks how many hints it is currently writing, and if the number increases too much, the coordinator refuses writes and throws the OverloadedException exception."
But I am running a single node and still get the OverloadedException frequently. What could be the reason for an OverloadedException on a single node with consistency level 1 and replication factor 1?
EDITED:
I checked the "Total Hints In Progress" metric over JMX, and then looked at the code:
private static void checkHintOverload(InetAddressAndPort destination)
{
    // avoid OOMing due to excess hints. we need to do this check even for "live" nodes, since we can
    // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
    // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
    // a small number of nodes causing problems, so we should avoid shutting down writes completely to
    // healthy nodes. Any node with no hintsInProgress is considered healthy.
    if (StorageMetrics.totalHintsInProgress.getCount() > maxHintsInProgress
        && (getHintsInProgressFor(destination).get() > 0 && shouldHint(destination)))
    {
        throw new OverloadedException("Too many in flight hints: " + StorageMetrics.totalHintsInProgress.getCount() +
                                      " destination: " + destination +
                                      " destination hints: " + getHintsInProgressFor(destination).get());
    }
}
1) Is this the only way to get the OverloadedException?
2) Why am I getting the OverloadedException on a single node?
3) When is this checkHintOverload method called on a single node?
NOTE:
1) My keyspace is configured with NetworkTopologyStrategy. Could that be a reason? CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter_1': '1'} AND durable_writes = true;
2) hinted_handoff is enabled in cassandra.yaml. I still wonder whether it can trigger hints on a single node, and if so, why.
3) The consistency level on this single node is QUORUM.
Could any of these three parameters be the reason for this?

Related

Cassandra-driver Client.batch() gives RangeError

This code
const cassandra = require('cassandra-driver');
const Long = require('cassandra-driver').types.Long;
const client = new cassandra.Client({
  contactPoints: ['localhost:9042'],
  localDataCenter: 'datacenter1',
  keyspace: 'ks'
});

let q = [];
const ins_q = 'INSERT INTO ks.table1 (id , num1, num2, txt, date) VALUES (?,33,44,\'tes2\',toTimeStamp(now()));';
for (let i = 50000000003n; i < 50000100003n; i++) {
  q.push({ query: ins_q, params: [Long.fromString(i.toString(), true)] });
}
client.batch(q, { prepare: true }).catch(err => {
  console.log('Failed %s', err);
});
is causing this error
Failed RangeError [ERR_OUT_OF_RANGE]: The value of "value" is out of range. It must be >= 0 and <= 65535. Received 100000
at new NodeError (f:\node\lib\internal\errors.js:371:5)
at checkInt (f:\node\lib\internal\buffer.js:72:11)
at writeU_Int16BE (f:\node\lib\internal\buffer.js:832:3)
at Buffer.writeUInt16BE (f:\node\lib\internal\buffer.js:840:10)
at FrameWriter.writeShort (f:\node\test\node_modules\cassandra-driver\lib\writers.js:47:9)
at BatchRequest.write (f:\node\test\node_modules\cassandra-driver\lib\requests.js:438:17)
Is this a bug? I tried execute() with one bigint the same way and there was no problem.
"cassandra-driver": "^4.6.3"
Failed RangeError [ERR_OUT_OF_RANGE]: The value of "value" is out of range. It must be >= 0 and <= 65535. Received 100000
Is this a bug?
No, this is Cassandra protecting the cluster from running a large batch and crashing one or more nodes.
While you do appear to be running this on your own machine, Cassandra is first and foremost a distributed system. So it has certain guardrails built in to prevent non-distributed things from causing problems. This is one of them.
What will happen here is that the driver looks at the id and figures out real fast that a single node isn't responsible for all of the different possible values of id. So it sends the batch of 100k statements to one node picked as the "coordinator." That coordinator "coordinates" retrieving each partition of data from all nodes in the cluster, and assembles the result set.
Or rather, it'll try to, but probably time-out before getting through even 1/5th of a batch this size. Remember, BATCH with Cassandra was built to really only run 5 or 6 write operations to keep 5 or 6 tables in-sync; not 100k write operations to the same table.
The way to approach this scenario is to execute each write operation individually. If you want to optimize the process, make each write operation asynchronous with a listenable future. Run only a certain number of async operations at a time, block on their completion, and then run the next set. Repeat this process until complete.
In short, there are many nuances about Cassandra that are different from a relational database. The use and implementation of BATCH writes being one of them.
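The question's code uses the Node.js driver, but as a rough sketch of that windowed approach, here is the same idea with the DataStax Java driver (class name and window size are arbitrary; the table, columns and id range follow the question's INSERT):
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class WindowedInserts {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            PreparedStatement ps = session.prepare(
                "INSERT INTO ks.table1 (id, num1, num2, txt, date) " +
                "VALUES (?, 33, 44, 'tes2', toTimestamp(now()))");

            final int windowSize = 256; // illustrative: writes in flight per window
            List<CompletableFuture<?>> window = new ArrayList<>(windowSize);

            for (long id = 50000000003L; id < 50000100003L; id++) {
                window.add(session.executeAsync(ps.bind(id)).toCompletableFuture());
                if (window.size() == windowSize) {
                    // Block until the current window completes before issuing more writes.
                    CompletableFuture.allOf(window.toArray(new CompletableFuture[0])).join();
                    window.clear();
                }
            }
            // Wait for the final, partial window.
            CompletableFuture.allOf(window.toArray(new CompletableFuture[0])).join();
        }
    }
}
The Node.js driver's execute() returns a promise, so the same "window of writes, await, next window" pattern applies there as well.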
Why does it cause a range error?
Because of this part in the error message:
It must be >= 0 and <= 65535
The Cassandra native protocol encodes the number of statements in a batch as an unsigned 16-bit value (hence the writeUInt16BE call in the stack trace), so the Node.js driver cannot send a batch exceeding 65535 statements. By the looks of it, it is being sent 100000 statements.

DataStax Cassandra driver seems to cache PreparedStatement

When my application has been running for a long time, everything works well. But when I change a column's type from int to text (by dropping and recreating the table), I catch an exception:
com.datastax.oss.driver.api.core.type.codec.CodecNotFoundException: Codec not found for requested operation: [INT <-> java.lang.String]
at com.datastax.oss.driver.internal.core.type.codec.registry.CachingCodecRegistry.createCodec(CachingCodecRegistry.java:609)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry$1.load(DefaultCodecRegistry.java:95)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry$1.load(DefaultCodecRegistry.java:92)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2276)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache.get(LocalCache.java:3951)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache.getOrLoad(LocalCache.java:3973)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4957)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4963)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry.getCachedCodec(DefaultCodecRegistry.java:117)
at com.datastax.oss.driver.internal.core.type.codec.registry.CachingCodecRegistry.codecFor(CachingCodecRegistry.java:215)
at com.datastax.oss.driver.api.core.data.SettableByIndex.set(SettableByIndex.java:132)
at com.datastax.oss.driver.api.core.data.SettableByIndex.setString(SettableByIndex.java:338)
This exception appears occasionally. I'm using a PreparedStatement to execute the query, and I think it is cached by the DataStax driver.
I'm using AWS Keyspaces (Cassandra version 3.11.2) and DataStax driver 4.6.
Here is my application.conf:
datastax-java-driver {
  basic.request {
    timeout = 5 seconds
    consistency = LOCAL_ONE
  }
  advanced.connection {
    max-requests-per-connection = 1024
    pool {
      local.size = 1
      remote.size = 1
    }
  }
  advanced.reconnect-on-init = true
  advanced.reconnection-policy {
    class = ExponentialReconnectionPolicy
    base-delay = 1 second
    max-delay = 60 seconds
  }
  advanced.retry-policy {
    class = DefaultRetryPolicy
  }
  advanced.protocol {
    version = V4
  }
  advanced.heartbeat {
    interval = 30 seconds
    timeout = 1 second
  }
  advanced.session-leak.threshold = 8
  advanced.metadata.token-map.enabled = false
}
Yes, Java driver 4.x caches prepared statements - that's a difference from driver 3.x. From the documentation:
the session has a built-in cache, it’s OK to prepare the same string twice.
...
Note that caching is based on: the query string exactly as you provided it: the driver does not perform any kind of trimming or sanitizing.
I'm not 100% sure about the source code, but the relevant cache entries may not be cleared on a table drop. I suggest opening a JIRA against the Java driver. That said, such type changes are often not really recommended anyway - it's better to introduce a new field with the new type, even when it's possible to re-create the table.
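As a minimal sketch of that caching behaviour (the keyspace, table and query string here are made up for illustration):
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

public class PreparedCacheSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            String query = "SELECT id, name FROM ks.users WHERE id = ?"; // hypothetical table

            // First call prepares the statement on the server and stores it in the session's cache.
            PreparedStatement first = session.prepare(query);

            // A second call with the exact same string is served from that cache; nothing is re-prepared.
            // If the table is later dropped and recreated with a different column type, the cached
            // statement (and the codecs bound to it) can become stale, which is consistent with the
            // occasional CodecNotFoundException described in the question.
            PreparedStatement second = session.prepare(query);
        }
    }
}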
That's correct. Prepared statements are cached -- it's the optimisation that makes prepared statements more efficient when they are reused, since they only need to be prepared once (the query doesn't need to be parsed again).
But I suspect the underlying issue in your case is that your queries involve SELECT *. The best-practice recommendation (regardless of the database you're using) is to explicitly enumerate the columns you are retrieving from the table.
In the prepared statement, each of the columns is bound to a data type. When you alter the schema by adding/dropping columns, the order of the columns (and their data types) no longer matches the data types of the result set, so you end up in situations where the driver gets an int when it's expecting a text, or vice-versa. Cheers!
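A short sketch of that recommendation (column and table names are hypothetical): prepare the query with an explicit column list rather than SELECT *, so the statement's result metadata doesn't silently shift when columns are added or dropped.
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;

public class ExplicitColumnsSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Avoid: session.prepare("SELECT * FROM ks.users WHERE id = ?")
            // Prefer an explicit column list:
            PreparedStatement ps = session.prepare(
                "SELECT id, name, created_at FROM ks.users WHERE id = ?");

            Row row = session.execute(ps.bind(42L)).one();
            if (row != null) {
                // Columns are read back by the names that were explicitly selected.
                System.out.println(row.getLong("id") + " " + row.getString("name"));
            }
        }
    }
}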

NoNodeAvailableException after some insert request to cassandra

I am trying to insert data into a local Cassandra cluster using async execution and version 4 of the driver (the same as my Cassandra instance).
I have instantiated the cql session in this way:
CqlSession cqlSession = CqlSession.builder()
    .addContactEndPoint(new DefaultEndPoint(
        InetSocketAddress.createUnresolved("localhost", 9042)))
    .build();
Then I create a statement in an async way:
return session.prepareAsync(
        "insert into table (p1,p2,p3, p4) values (?, ?,?, ?)")
    .thenComposeAsync(
        (ps) -> {
            CompletableFuture<AsyncResultSet>[] result = data.stream().map(
                (d) -> session.executeAsync(
                    ps.bind(d.p1, d.p2, d.p3, d.p4)
                ).toCompletableFuture()
            ).toArray(CompletableFuture[]::new);
            return CompletableFuture.allOf(result);
        }
    );
data is a dynamic list filled with user data.
When I execute the code I get the following exception:
Caused by: com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was available to execute the query
at com.datastax.oss.driver.api.core.AllNodesFailedException.fromErrors(AllNodesFailedException.java:53)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.sendRequest(CqlPrepareHandler.java:210)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.onThrottleReady(CqlPrepareHandler.java:167)
at com.datastax.oss.driver.internal.core.session.throttling.PassThroughRequestThrottler.register(PassThroughRequestThrottler.java:52)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.<init>(CqlPrepareHandler.java:153)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareAsyncProcessor.process(CqlPrepareAsyncProcessor.java:66)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareAsyncProcessor.process(CqlPrepareAsyncProcessor.java:33)
at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:210)
at com.datastax.oss.driver.api.core.cql.AsyncCqlSession.prepareAsync(AsyncCqlSession.java:90)
The node is active and some data is inserted before the exception is raised. I have also tried to set a data center name on the session builder, without any result.
Why is this exception raised if the node is up and running? I only have one local node; could that be the problem?
The biggest thing that I don't see is a way to limit the current number of active async threads.
Basically, if that (mapped) data stream gets hit hard enough, it'll create all of these new threads that it's awaiting. If the number of writes coming in from those threads creates enough back-pressure that the node can't keep up or catch up, the node will become overwhelmed and stop accepting requests.
Take a look at this post by Ryan Svihla of DataStax:
Cassandra: Batch Loading Without the Batch — The Nuanced Edition
Its code is from the 3.x version of the driver, but the concepts are the same. Basically, provide some way to throttle down the writes, or limit the number of "in-flight threads" running at any given time, and that should help greatly.
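For illustration only, here is a minimal sketch (not the linked article's code) of one way to cap the number of in-flight async writes with the Java driver, using a Semaphore; the permit count, keyspace and table name are placeholders, and the bind parameters follow the question.
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import java.util.List;
import java.util.concurrent.Semaphore;

public class ThrottledInserts {
    // Allow at most 128 writes in flight at any time (tune for your cluster).
    private static final Semaphore inFlight = new Semaphore(128);

    static void insertAll(CqlSession session, List<Data> data) throws InterruptedException {
        PreparedStatement ps = session.prepare(
            "INSERT INTO ks.my_table (p1, p2, p3, p4) VALUES (?, ?, ?, ?)");

        for (Data d : data) {
            inFlight.acquire(); // blocks once 128 writes are already in flight
            session.executeAsync(ps.bind(d.p1, d.p2, d.p3, d.p4))
                   .whenComplete((rs, error) -> {
                       inFlight.release(); // free a slot whether the write succeeded or failed
                       if (error != null) {
                           error.printStackTrace();
                       }
                   });
        }
        // Drain: wait until all outstanding writes have completed.
        inFlight.acquire(128);
        inFlight.release(128);
    }

    // Placeholder for the question's data type.
    static class Data {
        Object p1, p2, p3, p4;
    }
}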
Finally, I found a solution using BatchStatement and a little custom code to create a chunked list.
int chunks = 0;
if (data.size() % 100 == 0) {
    chunks = data.size() / 100;
} else {
    chunks = (data.size() / 100) + 1;
}
final int finalChunks = chunks;
return session.prepareAsync(
        "insert into table (p1,p2,p3, p4) values (?, ?,?, ?)")
    .thenComposeAsync(
        (ps) -> {
            AtomicInteger counter = new AtomicInteger();
            final List<CompletionStage<AsyncResultSet>> batchInsert = data.stream()
                .map(
                    (d) -> ps.bind(d.p1, d.p2, d.p3, d.p4)
                )
                // group the bound statements and wrap each group in a logged batch
                .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / finalChunks))
                .values().stream()
                .map(
                    boundedStatements -> BatchStatement.newInstance(BatchType.LOGGED,
                        boundedStatements.toArray(new BatchableStatement[0]))
                )
                .map(session::executeAsync)
                .collect(Collectors.toList());
            return CompletableFutures.allSuccessful(batchInsert);
        }
    );

Is it possible to create a mutable shared data structure without using accumulators in spark?

I am new to Spark and there are some things which are quite unclear to me. But basic knowledge dictates that only accumulators are mutable variables which can be updated across executors and whose value can be retrieved by the driver. For any other variables initialized in the code and updated on executors, the updated values are not relayed back to the driver, since executors run in separate JVMs.
I am working on part of a project which stores offsets from ZooKeeper in a data structure for future use. As the offsets are obtained on executors, it seemed almost impossible to have a shared data structure which would also update offsets per partition back to the driver. That was until I came across this code in https://spark.apache.org/docs/2.3.0/streaming-kafka-0-8-integration.html:
AtomicReference<OffsetRange[]> offsetRanges = new AtomicReference<>();

directKafkaStream.transformToPair(rdd -> {
    OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    offsetRanges.set(offsets);
    return rdd;
}).map(
    ...
).foreachRDD(rdd -> {
    for (OffsetRange o : offsetRanges.get()) {
        System.out.println(
            o.topic() + " " + o.partition() + " " + o.fromOffset() + " " + o.untilOffset()
        );
    }
    ...
});

System.out.println(Arrays.toString(offsetRanges.get()));
This contradicts the underlying theory, because when I access the value of AtomicReference<OffsetRange[]> offsetRanges in my driver I get the correct updated value (as updated in the transformToPair method in the executor code), even though it should return a null or empty response. Can someone please explain this behavior?
Is it possible to create a mutable shared data structure without using accumulators in spark?
No.
This contradicts the underlying theory as when I access the value of
It doesn't, because the value is not modified outside the driver. The closure passed to transformToPair is executed on the driver, not on the executors.
Therefore offsetRanges.set(offsets) is executed on the same JVM where the original offsetRanges value lives.

Hazelcast mapreduce executor overload

I'm setting up a new cluster and I'm getting an error from the Hazelcast mapreduce executor:
java.util.concurrent.RejectedExecutionException: Executor[mapreduce::hz::default] is overloaded
Using spring, I am configuring the jobtracker as follows:
<hz:jobtracker name="default" max-thread-size="8" queue-size="0"/>
Per the documentation, 0 is the default queue size, which is unbounded.
Thoughts? I am only sending about 100 jobs simultaneously.
The manual is wrong about that.
A value that is less than or equal to zero means that the queue size is twice the partition count:
int queueSize = jobTrackerConfig.getQueueSize();
if (queueSize <= 0) {
    queueSize = ps.getPartitionCount() * 2;
}
Code snippet on GitHub.
Use an integer that's big enough for your use case.
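For example, something along these lines (the 4096 here is just an illustrative value; size it for your workload):
<hz:jobtracker name="default" max-thread-size="8" queue-size="4096"/>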
