Cassandra-driver Client.batch() gives RangeError - node.js

This code
const cassandra = require('cassandra-driver');
const Long = require('cassandra-driver').types.Long;
const client = new cassandra.Client({
  contactPoints: ['localhost:9042'],
  localDataCenter: 'datacenter1',
  keyspace: 'ks'
});

let q = [];
const ins_q = 'INSERT INTO ks.table1 (id, num1, num2, txt, date) VALUES (?, 33, 44, \'tes2\', toTimeStamp(now()));';
for (let i = 50000000003n; i < 50000100003n; i++) {
  q.push({ query: ins_q, params: [Long.fromString(i.toString(), true)] });
}

client.batch(q, { prepare: true }).catch(err => {
  console.log('Failed %s', err);
});
is causing this error
Failed RangeError [ERR_OUT_OF_RANGE]: The value of "value" is out of range. It must be >= 0 and <= 65535. Received 100000
at new NodeError (f:\node\lib\internal\errors.js:371:5)
at checkInt (f:\node\lib\internal\buffer.js:72:11)
at writeU_Int16BE (f:\node\lib\internal\buffer.js:832:3)
at Buffer.writeUInt16BE (f:\node\lib\internal\buffer.js:840:10)
at FrameWriter.writeShort (f:\node\test\node_modules\cassandra-driver\lib\writers.js:47:9)
at BatchRequest.write (f:\node\test\node_modules\cassandra-driver\lib\requests.js:438:17)
Is this a bug? I tried execute() with one bigint the same way and there was no problem.
"cassandra-driver": "^4.6.3"

Failed RangeError [ERR_OUT_OF_RANGE]: The value of "value" is out of range. It must be >= 0 and <= 65535. Received 100000
Is this a bug?
No, this is Cassandra protecting the cluster from running a large batch and crashing one or more nodes.
While you do appear to be running this on your own machine, Cassandra is first and foremost a distributed system. So it has certain guardrails built in to prevent non-distributed things from causing problems. This is one of them.
What will happen here is that the driver looks at the id and quickly figures out that a single node isn't responsible for all of the different possible values of id. So it sends the batch of 100k statements to one node picked as the "coordinator." That coordinator then has to route each of those writes to the nodes responsible for each partition.
Or rather, it'll try to, but it will probably time out before getting through even 1/5th of a batch this size. Remember, BATCH in Cassandra was built to run only 5 or 6 write operations to keep 5 or 6 tables in sync; not 100k write operations to the same table.
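For contrast, here is a minimal sketch of what BATCH is intended for: keeping a handful of denormalized tables in sync for one logical write. The tables and values are hypothetical; only client comes from the question above.

const { types } = require('cassandra-driver');
const userId = types.Uuid.random(); // hypothetical example values
const email = 'user@example.com';

// A small logged batch applies both denormalized writes as a unit.
const queries = [
  { query: 'INSERT INTO ks.users_by_id (id, email) VALUES (?, ?)',
    params: [userId, email] },
  { query: 'INSERT INTO ks.users_by_email (email, id) VALUES (?, ?)',
    params: [email, userId] }
];
client.batch(queries, { prepare: true })
  .then(() => console.log('both views updated'));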
The way to approach this scenario is to execute each write operation individually. If you want to optimize the process, make each write asynchronous, run only a certain number of in-flight requests at a time, wait on their completion, and then run the next set. Repeat this process until complete.
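Here is a minimal sketch of that approach, reusing client, ins_q, and Long from the question; the window size of 512 concurrent requests is an arbitrary starting point to tune.

// Execute the writes individually, prepared, in bounded windows of
// concurrent requests instead of one gigantic batch.
const WINDOW = 512;

async function insertAll() {
  let pending = [];
  for (let i = 50000000003n; i < 50000100003n; i++) {
    pending.push(
      client.execute(ins_q, [Long.fromString(i.toString(), true)], { prepare: true })
    );
    if (pending.length >= WINDOW) {
      await Promise.all(pending); // block until this window completes
      pending = [];
    }
  }
  await Promise.all(pending); // drain the remainder
}

insertAll().catch(err => console.log('Failed %s', err));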
In short, there are many nuances about Cassandra that are different from a relational database. The use and implementation of BATCH writes being one of them.
Why does it cause a range error?
Because of this part in the error message:
It must be >= 0 and <= 65535
The CQL binary protocol encodes the number of statements in a batch as an unsigned 16-bit integer, which caps a single batch at 65535 statements. By the looks of it (the writeShort call in the stack trace), the driver is being handed 100000 statements and fails when it tries to serialize that count.
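If you did need to stay under that protocol cap while keeping batch() (not recommended for a load like this, per the answer above), splitting q into sub-batches is a small sketch:

// Split the statement array into sub-batches below the protocol's
// 65535-statement cap (much smaller sizes are kinder to the coordinator).
function chunk(arr, size) {
  const out = [];
  for (let i = 0; i < arr.length; i += size) {
    out.push(arr.slice(i, i + size));
  }
  return out;
}

async function batchInChunks() {
  for (const sub of chunk(q, 5000)) {
    // still one coordinator per batch, so this only avoids the RangeError;
    // it does not make a 100k-statement load a good fit for BATCH
    await client.batch(sub, { prepare: true });
  }
}

batchInChunks().catch(err => console.log('Failed %s', err));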

Related

DataStax Cassandra seems to cache PreparedStatements

When my application has been running for a long time, everything works fine. But when I change a column's type from int to text (dropping and recreating the table), I catch an exception:
com.datastax.oss.driver.api.core.type.codec.CodecNotFoundException: Codec not found for requested operation: [INT <-> java.lang.String]
at com.datastax.oss.driver.internal.core.type.codec.registry.CachingCodecRegistry.createCodec(CachingCodecRegistry.java:609)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry$1.load(DefaultCodecRegistry.java:95)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry$1.load(DefaultCodecRegistry.java:92)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2276)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache.get(LocalCache.java:3951)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache.getOrLoad(LocalCache.java:3973)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4957)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4963)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry.getCachedCodec(DefaultCodecRegistry.java:117)
at com.datastax.oss.driver.internal.core.type.codec.registry.CachingCodecRegistry.codecFor(CachingCodecRegistry.java:215)
at com.datastax.oss.driver.api.core.data.SettableByIndex.set(SettableByIndex.java:132)
at com.datastax.oss.driver.api.core.data.SettableByIndex.setString(SettableByIndex.java:338)
This exception appears occasionally. I'm using a PreparedStatement to execute the query; I think it is cached by the DataStax driver.
I'm using AWS Keyspaces (Cassandra version 3.11.2) and DataStax driver 4.6.
Here is my application.conf:
datastax-java-driver {
  basic.request {
    timeout = 5 seconds
    consistency = LOCAL_ONE
  }
  advanced.connection {
    max-requests-per-connection = 1024
    pool {
      local.size = 1
      remote.size = 1
    }
  }
  advanced.reconnect-on-init = true
  advanced.reconnection-policy {
    class = ExponentialReconnectionPolicy
    base-delay = 1 second
    max-delay = 60 seconds
  }
  advanced.retry-policy {
    class = DefaultRetryPolicy
  }
  advanced.protocol {
    version = V4
  }
  advanced.heartbeat {
    interval = 30 seconds
    timeout = 1 second
  }
  advanced.session-leak.threshold = 8
  advanced.metadata.token-map.enabled = false
}
Yes, Java driver 4.x caches prepared statements; that's a difference from driver 3.x. From the documentation:
the session has a built-in cache, it’s OK to prepare the same string twice.
...
Note that caching is based on: the query string exactly as you provided it: the driver does not perform any kind of trimming or sanitizing.
I'm not 100% sure about the source code, but the relevant entries in the cache may not be cleared on a table drop. I suggest opening a JIRA against the Java driver. That said, this kind of type change is often not recommended anyway: it's better to introduce a new field with the new type, even when re-creating the table is possible.
That's correct. Prepared statements are cached -- it's the optimisation that makes prepared statements more efficient if they are reused since they only need to be prepared once (the query doesn't need to get parsed again).
But I suspect that the underlying issue in your case is that your queries involve SELECT *. The best-practice recommendation (regardless of the database you're using) is to explicitly enumerate the columns you are retrieving from the table.
In the prepared statement, each of the columns is bound to a data type. When you alter the schema by adding/dropping columns, the order of the columns (and their data types) no longer matches the data types of the result set, so you end up in situations where the driver gets an int when it's expecting a text, or vice-versa. Cheers!
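To illustrate the practice (shown with the Node.js driver and the ks.table1 schema from the main question, since the same caching behavior and advice apply there; someId is a placeholder):

// Fragile: the prepared metadata silently depends on the table's
// current column order and types, so schema changes can surface as
// codec mismatches like the one above.
const fragile = 'SELECT * FROM ks.table1 WHERE id = ?';

// Robust: enumerating columns pins each position to a known column.
const robust = 'SELECT id, num1, num2, txt, date FROM ks.table1 WHERE id = ?';

client.execute(robust, [someId], { prepare: true })
  .then(rs => console.log(rs.rows));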

NoNodeAvailableException after some insert requests to Cassandra

I am trying to insert data into a local Cassandra cluster using async execution and version 4 of the driver (same as my Cassandra instance).
I have instantiated the cql session in this way:
CqlSession cqlSession = CqlSession.builder()
    .addContactEndPoint(new DefaultEndPoint(
        InetSocketAddress.createUnresolved("localhost", 9042)))
    .build();
Then I create a statement in an async way:
return session.prepareAsync(
        "insert into table (p1, p2, p3, p4) values (?, ?, ?, ?)")
    .thenComposeAsync(
        (ps) -> {
            CompletableFuture<AsyncResultSet>[] result = data.stream()
                .map((d) -> session.executeAsync(ps.bind(d.p1, d.p2, d.p3, d.p4))
                    .toCompletableFuture())
                .toArray(CompletableFuture[]::new);
            return CompletableFuture.allOf(result);
        }
    );
data is a dynamic list filled with user data.
When I execute the code I get the following exception:
Caused by: com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was available to execute the query
at com.datastax.oss.driver.api.core.AllNodesFailedException.fromErrors(AllNodesFailedException.java:53)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.sendRequest(CqlPrepareHandler.java:210)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.onThrottleReady(CqlPrepareHandler.java:167)
at com.datastax.oss.driver.internal.core.session.throttling.PassThroughRequestThrottler.register(PassThroughRequestThrottler.java:52)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.<init>(CqlPrepareHandler.java:153)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareAsyncProcessor.process(CqlPrepareAsyncProcessor.java:66)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareAsyncProcessor.process(CqlPrepareAsyncProcessor.java:33)
at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:210)
at com.datastax.oss.driver.api.core.cql.AsyncCqlSession.prepareAsync(AsyncCqlSession.java:90)
The node is active and some data is inserted before the exception is raised. I have also tried setting a datacenter name on the session builder, without any result.
Why is this exception raised if the node is up and running? I only have one local node; could that be a problem?
The biggest thing that I don't see is a way to limit the number of concurrent in-flight async requests.
Basically, if that (mapped) data stream gets hit hard enough, it will create and await all of those requests at once. If the writes coming in from them create more back-pressure than the node can keep up with or catch up to, the node will become overwhelmed and stop accepting requests.
Take a look at this post by Ryan Svihla of DataStax:
Cassandra: Batch Loading Without the Batch — The Nuanced Edition
Its code is from the 3.x version of the driver, but the concepts are the same. Basically, provide some way to throttle down the writes, or limit the number of in-flight requests running at any given time, and that should help greatly.
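Since the pattern is language-independent, here is a sketch of a fixed-size in-flight window using the Node.js driver from the first question on this page (names are illustrative; error handling is elided):

// Start a new write only when one of the outstanding `limit` writes
// settles, keeping a bounded window of in-flight requests.
async function writeWithLimit(client, query, rows, limit) {
  const inFlight = new Set();
  for (const row of rows) {
    const p = client.execute(query, row, { prepare: true })
      .finally(() => inFlight.delete(p));
    inFlight.add(p);
    if (inFlight.size >= limit) {
      await Promise.race(inFlight); // wait for one slot to free up
    }
  }
  await Promise.all(inFlight); // drain the tail
}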
Finally, I have found a solution using BatchStatement and a little custom code to create a chunked list.
int chunks = 0;
if (data.size() % 100 == 0) {
    chunks = data.size() / 100;
} else {
    chunks = (data.size() / 100) + 1;
}
final int finalChunks = chunks;
return session.prepareAsync(
        "insert into table (p1, p2, p3, p4) values (?, ?, ?, ?)")
    .thenComposeAsync(
        (ps) -> {
            AtomicInteger counter = new AtomicInteger();
            final List<CompletionStage<AsyncResultSet>> batchInsert = data.stream()
                .map((d) -> ps.bind(d.p1, d.p2, d.p3, d.p4))
                .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / finalChunks))
                .values().stream()
                .map(boundedStatements -> BatchStatement.newInstance(
                    BatchType.LOGGED,
                    boundedStatements.toArray(new BatchableStatement[0])))
                .map(session::executeAsync)
                .collect(Collectors.toList());
            return CompletableFutures.allSuccessful(batchInsert);
        }
    );

Read, update and save cached value atomically

I have multiple streams (N) which should update the same cache, so assume there are at least N threads. Each thread may process values with the same keys. The problem is that if I do an update as follows:
1. Read the old value from the cache (multiple threads get the same old value)
2. Merge the new value with the old value (each thread updates the old value)
3. Save the updated value back to the cache (only the last update is saved, the others are lost)
I can lose some updates if multiple threads simultaneously try to update the same record. At first glance, there is a solution that makes all updates atomic: for example, use the Increment mutation in HBase or add in Aerospike (currently, I'm considering these caches for my case). If the value consists only of numeric primitive types, then this is fine, because both cache implementations support atomic inc/dec:
1. Inc/dec each value (the cache resolves the sequencing of these ops by itself)
But what if the value consists not only of primitives? Then I have to read the value and update it in my code, and in this case I can still lose updates.
As I wrote, I'm currently considering HBase and Aerospike, but neither fully fits my case. In HBase, as far as I know, there is no way to lock a row from the client side (in versions > ~0.98), so I have to use the checkAndPut operation for each complex type. In Aerospike I can achieve something like a row-based lock using Lua UDFs, but I want to avoid them. Redis allows watching a record: if there was an update from another thread, the transaction fails, and I can catch this error and try again.
So, my question is how to achieve something like a row-based lock for such updates, and whether a row-based lock would be the correct way. Maybe there is another approach?
def main(args: Array[String]): Unit = {
  val sparkConf = new SparkConf().setMaster("local[2]").setAppName("sample")
  val sc = new SparkContext(sparkConf)
  val ssc = new StreamingContext(sc, Duration(500))
  val source = Source()
  val stream = source.stream(ssc)
  stream.foreachRDD(rdd => {
    if (!rdd.isEmpty()) {
      rdd.foreachPartition(partition => {
        if (partition.nonEmpty) {
          val cache = Cache()
          partition.foreach(entity => {
            // in this block, if 2 distributed workers (in the case of Apache Spark,
            // for example) process entities with the same keys, I can lose one of
            // the updates: worker1 and worker2 read the same value...
            val value = cache.get(entity.key)
            // ...both workers update this value, possibly producing different results...
            val updatedValue = ??? // some non-trivial update that depends on the entity
            // ...and, for example, worker1 puts its new value, then worker2 puts its
            // new value. Only worker2's update is visible; worker1's update is lost.
            cache.put(entity.key, updatedValue)
          })
        }
      })
    }
  })
  ssc.start()
  ssc.awaitTermination()
}
So, if I use Kafka as the source, I can work around this when messages are partitioned by key: I can rely on the fact that only one worker will process a particular key at any point in time. But how do I handle the same situation when messages are partitioned randomly (the key is inside the message body)?
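For reference, the optimistic alternative the question mentions (HBase's checkAndPut, Redis's WATCH) boils down to a read-merge-retry loop. A sketch in JavaScript, where cache.get and cache.compareAndPut are hypothetical stand-ins for the store's conditional-write primitive:

// Optimistic read-modify-write: retry until no concurrent writer
// slipped in between our read and our conditional put.
async function atomicUpdate(cache, key, merge) {
  for (;;) {
    const old = await cache.get(key);   // read the current value
    const updated = merge(old);         // non-trivial merge in app code
    // the conditional put succeeds only if the stored value is still `old`
    if (await cache.compareAndPut(key, old, updated)) {
      return updated;
    }
    // another thread won the race; re-read and try again
  }
}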

How do I limit write operations to 1k records/sec?

Currently, I am able to write to the database with a batch size of 500. But due to a memory-shortage error and delayed synchronization between the child aggregator and the leaf nodes of the database, I sometimes run into a Leaf Node Memory Error. The only solution is to limit my write operations to 1k records per second; then I can get rid of the error.
dataStream
  .map(line => readJsonFromString(line))
  .grouped(memsqlBatchSize)
  .foreach { recordSet =>
    val dbRecords = recordSet.map(m => (m, Events.transform(m)))
    dbRecords.map { record =>
      try {
        Events.setValues(eventInsert, record._2)
        eventInsert.addBatch
      } catch {
        case e: Exception =>
          logger.error(s"error adding batch: ${e.getMessage}")
          val error_event = Events.jm.writeValueAsString(mapAsJavaMap(record._1.asInstanceOf[Map[String, Object]]))
          logger.error(s"event: $error_event")
      }
    }
    // Bulk commit records
    try {
      eventInsert.executeBatch
    } catch {
      case e: java.sql.BatchUpdateException =>
        val updates = e.getUpdateCounts
        logger.error(s"failed commit: ${updates.toString}")
        updates.zipWithIndex.filter { case (v, i) => v == Statement.EXECUTE_FAILED }.foreach { case (v, i) =>
          val error = Events.jm.writeValueAsString(mapAsJavaMap(dbRecords(i)._1.asInstanceOf[Map[String, Object]]))
          logger.error(s"insert error: $error")
          logger.error(e.getMessage)
        }
    } finally {
      connection.commit
      eventInsert.clearBatch
      logger.debug(s"committed: ${dbRecords.length.toString}")
    }
  }
The reason for 1k records is that some of the data I am trying to write can contain tons of JSON records, and with a batch size of 500 that may result in 30k records per second. Is there any way I can make sure that only 1000 records per second are written to the database, irrespective of the number of incoming records?
I don't think Thread.sleep is a good way to handle this situation. Generally we don't recommend doing so in Scala, and we don't want to block the thread in any case.
One suggestion would be to use a streaming library such as Akka Streams or Monix Observable. There are pros and cons between those libraries that I don't want to spend too many paragraphs on, but they do support back-pressure to control the producing rate when the consumer is slower than the producer. For example, in your case the consumer is the database write, and the producer may be reading some JSON files and doing some aggregations.
The following code illustrates the idea; you will need to modify it to your needs:
val sourceJson = Source(dataStream.map(line => readJsonFromString(line)))
val sinkDB = Sink(Events.jm.writeValueAsString) // you will need to figure out how to generate the Sink
val flowThrottle = Flow[String]
  .throttle(1, 1.second, 1, ThrottleMode.shaping)

val runnable = sourceJson.via(flowThrottle).toMat(sinkDB)(Keep.right)
val result = runnable.run()
The code block is already called by a thread, and there are multiple threads running in parallel. I could use either Thread.sleep(1000) or delay(1.0) in this Scala code, but delay() uses a promise which might have to be called outside the function. Thread.sleep() looks like the best option, along with a batch size of 1000. After testing, I could benchmark 120,000 records/thread/sec without any problem.
According to the MemSQL architecture, all loads into MemSQL go into a rowstore in local memory first, and from there MemSQL merges them into the columnstore at the end leaves. That resulted in the leaf error every time I pushed more data, causing a bottleneck. Reducing the batch size and introducing a Thread.sleep() helped me write 120,000 records/sec. I performed testing with this benchmark.
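For what it's worth, that pacing is simple to sketch (JavaScript here, matching the other examples on this page; writeBatch is a hypothetical stand-in for the real JDBC batch commit):

// Pace writes: commit one bounded batch, then sleep for the interval,
// giving the leaves time to merge before the next batch arrives.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function writePaced(records, batchSize, intervalMs) {
  for (let i = 0; i < records.length; i += batchSize) {
    await writeBatch(records.slice(i, i + batchSize)); // hypothetical commit
    await sleep(intervalMs);
  }
}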

Spark RDD.isEmpty costs a lot of time

I built a Spark cluster:
Workers: 2
Cores: 12
Memory: 32.0 GB total, 20.0 GB used
Each worker gets 1 CPU, 6 cores and 10.0 GB of memory.
My program gets its data from a MongoDB cluster. Spark and the MongoDB cluster are on the same LAN (1000 Mbps).
MongoDB document format:
{name:string, value:double, time:ISODate}
There are about 13 million documents.
I want to get the average value for a specific name over a specific hour, which contains 60 documents.
Here is my key function:
/*
 * rdd = sc.newAPIHadoopRDD(configOriginal, classOf[com.mongodb.hadoop.MongoInputFormat], classOf[Object], classOf[BSONObject])
 * Apache-Spark-1.3.1 Scala doc: SparkContext.newAPIHadoopFile[K, V, F <: InputFormat[K, V]](path: String, fClass: Class[F], kClass: Class[K], vClass: Class[V], conf: Configuration = hadoopConfiguration): RDD[(K, V)]
 */
def findValueByNameAndRange(rdd: RDD[(Object, BSONObject)], name: String, time: Date): RDD[BasicBSONObject] = {
  val nameRdd = rdd.map(arg => arg._2).filter(_.get("name").equals(name))
  val timeRangeRdd1 = nameRdd.map(tuple => (tuple, tuple.get("time").asInstanceOf[Date]))
  val timeRangeRdd2 = timeRangeRdd1.map(tuple => (tuple._1, duringTime(tuple._2, time, getHourAgo(time, 1))))
  val timeRangeRdd3 = timeRangeRdd2.filter(_._2).map(_._1)
  val timeRangeRdd4 = timeRangeRdd3.map(x => (x.get("name").toString, x.get("value").toString.toDouble)).reduceByKey(_ + _)
  if (timeRangeRdd4.isEmpty()) {
    return basicBSONRDD(name, time)
  } else {
    return timeRangeRdd4.map(tuple => {
      val bson = new BasicBSONObject()
      bson.put("name", tuple._1)
      bson.put("value", tuple._2 / 60)
      bson.put("time", time)
      bson
    })
  }
}
Here is part of the job information.
My program runs very slowly. Is it because of isEmpty and reduceByKey? If yes, how can I improve it? If not, why?
======= update =======
timeRangeRdd3.map(x => (x.get("name").toString, x.get("value").toString.toDouble)).reduceByKey(_ + _)
is on line 34.
I know reduceByKey is a global operation and may cost a lot of time; however, what it costs is beyond my budget. How can I improve it, or is this a defect of Spark? With the same calculation and hardware, it takes just several seconds if I use multiple Java threads.
First, isEmpty is merely the point at which the RDD stage ends. The maps and filters do not create a need for a shuffle, and the method shown in the UI is always the method that triggers a stage change/shuffle; in this case, isEmpty. Why it's running slowly is not as easy to discern from this perspective, especially without seeing the composition of the originating RDD. I can tell you that isEmpty first checks the partition size and then does a take(1), verifying whether data was returned or not. So the odds are that there is a bottleneck in the network or something else blocking along the way. It could even be a GC issue. Click into isEmpty in the UI and see what more you can discern from there.
