accessing IMap from EntryProcessor - hazelcast

Can IMap or other Hazelcast distributed data structures like IAtomicLong be accessed from within the process() method of an EntryProcessor?
I'm getting the following exception:
java.util.concurrent.ExecutionException: java.lang.IllegalThreadStateException: Thread[hz.Alcatraz-ANP-Sys-HAZLE-2.actiance.local.partition-operation.thread-5,5,Alcatraz-ANP-Sys-HAZLE-2.actiance.local] cannot make remote call: com.hazelcast.concurrent.lock.operations.LockOperation#3229190f
at ~[na:1.7.0_51]
at java.util.concurrent.FutureTask.get( ~[na:1.7.0_51]
at com.hazelcast.executor.impl.DistributedExecutorService$ ~[hazelcast-3.3_actiance.jar:3.3]
at com.hazelcast.util.executor.CachedExecutorServiceDelegate$ [hazelcast-3.3_actiance.jar:3.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker( [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$ [na:1.7.0_51]
at [na:1.7.0_51]
at com.hazelcast.util.executor.HazelcastManagedThread.executeRun( [hazelcast-3.3_actiance.jar:3.3]
at [hazelcast-3.3_actiance.jar:3.3]
I'm using Hazelcast version 3.3

You can access other data structures, but you need to make sure they are in the same data partition as the entry currently being processed. This means you can use, for example, data affinity to pin all related data to the same partition.
Sharing an IAtomicLong between different partitions is not possible, though.
PS: You also shouldn't mutate data other than the entry currently being processed, since that might end up in a deadlock between different nodes.
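
To illustrate the data-affinity idea, here is a minimal Python model (not Hazelcast code) of how a shared partition key decides placement. It assumes two Hazelcast features: a map key can supply its own partition key (the PartitionAware interface), and a data structure name can carry one after an `@`. The partition count of 271 is Hazelcast's default; the hash function, `OrderKey`, and all names and values below are hypothetical stand-ins.

```python
import zlib
from dataclasses import dataclass

PARTITION_COUNT = 271  # Hazelcast's default; the real hash function differs


def partition_id(partition_key: str) -> int:
    """Stand-in for the cluster's key-to-partition mapping."""
    return zlib.crc32(partition_key.encode("utf-8")) % PARTITION_COUNT


@dataclass(frozen=True)
class OrderKey:
    """Models a map key that exposes its own partition key (cf. PartitionAware)."""
    order_id: str
    customer_id: str

    def partition_key(self) -> str:
        return self.customer_id


def structure_partition_key(name: str) -> str:
    """Models the name@partitionKey convention for structure names."""
    return name.split("@", 1)[1] if "@" in name else name


entry = OrderKey(order_id="order-1", customer_id="customer-42")
counter_name = "orderCounter@customer-42"

entry_partition = partition_id(entry.partition_key())
counter_partition = partition_id(structure_partition_key(counter_name))

# Both use the partition key "customer-42", so the model places the entry
# and the counter in the same partition.
print(entry_partition == counter_partition)
```

In this model, anything that does not share the entry's partition key may land in a different partition, which is the situation that triggers the "cannot make remote call" exception in the question.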


Recovering from "Offsets out of range with no configured reset policy for partitions"

I've got a Spark Structured Streaming application (Spark 2.4.5) that consumes from Kafka. The application was down for a while, and when I restarted it I got the error below.
I fully understand why I'm getting the error, and I'm OK with that, but I cannot seem to get around it. In the logs I see "Recovering from the earliest offset: 1234332978", but this does not seem to be happening. I've tried deleting the 'source' folder in my checkpoint location, which didn't help either.
My code uses a mapGroupsWithState function, so I have state data that I don't want to lose; as a result, deleting the entire checkpoint directory isn't my preferred approach.
I have set:
.option("failOnDataLoss", false)
.option("startingOffsets", "latest")
But it seems this only applies to new partitions.
Is there a way to tell Spark to just accept that there are missing offsets and continue? Or some approach to delete the offset data manually without impacting the application 'state'?
20/07/29 01:02:40 WARN InternalKafkaConsumer: Cannot fetch offset 1215191190 (GroupId: spark-kafka-source-f9562fca-ab0c-4f7a-93c3-20506cbcdeb7--1440771761-executor, TopicPartition: cmusstats-0).
Some data may have been lost because they are not available in Kafka any more; either the
data was aged out by Kafka or the topic may have been deleted before all the data in the
topic was processed. If you want your streaming query to fail on such cases, set the source
option "failOnDataLoss" to "true".
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {cmusstats-0=1215191190}
at org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(
at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(
at org.apache.spark.sql.kafka010.InternalKafkaConsumer.fetchData(KafkaDataConsumer.scala:470)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:251)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:234)
at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:209)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer.get(KafkaDataConsumer.scala:234)
at org.apache.spark.sql.kafka010.KafkaDataConsumer$class.get(KafkaDataConsumer.scala:64)
at org.apache.spark.sql.kafka010.KafkaDataConsumer$CachedKafkaDataConsumer.get(KafkaDataConsumer.scala:500)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:49)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
20/07/29 01:02:40 WARN InternalKafkaConsumer: Some data may be lost. Recovering from the earliest offset: 1234332978
20/07/29 01:02:40 WARN InternalKafkaConsumer:
The current available offset range is AvailableOffsetRange(1234332978,1328165875).
Offset 1215191190 is out of range, and records in [1215191190, 1215691190) will be
skipped (GroupId: spark-kafka-source-f9562fca-ab0c-4f7a-93c3-20506cbcdeb7--1440771761-executor, TopicPartition: cmusstats-0).
Some data may have been lost because they are not available in Kafka any more; either the
data was aged out by Kafka or the topic may have been deleted before all the data in the
topic was processed. If you want your streaming query to fail on such cases, set the source
option "failOnDataLoss" to "true".
It turns out that the Structured Streaming application recovered eventually. For a period of time, many 'Cannot fetch offset' errors were logged, but after that, processing continued from the earliest offset.
I cannot explain why I got so many of these errors before processing resumed, but it did continue in the end.
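
One possible explanation for the long run of warnings, offered as an assumption rather than a confirmed diagnosis: if each task or batch covers only a fixed window of offsets (the log shows one 500,000-offset window being skipped), the query has to step through every stale window before it reaches the earliest offset that still exists. A rough Python calculation using the numbers from the log:

```python
import math

# Values copied from the log output in the question.
stale_offset = 1215191190        # first offset Spark tried to fetch
skipped_until = 1215691190       # end of the skipped range in the warning
earliest_available = 1234332978  # start of AvailableOffsetRange

window = skipped_until - stale_offset        # offsets covered by one warning
missing = earliest_available - stale_offset  # offsets no longer in Kafka

# Assumption: every window has the same size as the one seen in the log.
windows_to_skip = math.ceil(missing / window)

print(f"window size: {window}")
print(f"missing offsets: {missing}")
print(f"windows before reaching available data: {windows_to_skip}")
```

Under that assumption, about 39 windows would each log a 'Cannot fetch offset' warning before processing reaches data that is still in Kafka, which would be consistent with the behavior described.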

Spark Structured Streaming Blue/Green Deployments

We'd like to be able to deploy our Spark jobs such that there isn't any downtime in processing data during deployments (currently there's about a 2-3 minute window). In my mind, the easiest way to do this is to simulate the "blue/green deployment" philosophy, which is to spin up the new version of the Spark job, let it warm up, then shut down the old job. However, with structured streaming & checkpointing, we cannot do this because the new Spark job sees that the latest checkpoint file already exists (from the old job). I've attached a sample error below. Does anyone have any thoughts on a potential workaround?
I thought about copying over the existing checkpoint directory to another checkpoint directory for the newly created job - while that should work as a workaround (some data might get reprocessed, but our DB should deduplicate), this seems super hacky and something I'd rather not pursue.
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: rename destination /user/checkpoint/job/offsets/3472939 already exists
at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.validateOverwrite(
at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(
at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(
at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename2(
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename2(
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
at org.apache.hadoop.ipc.RPC$
at org.apache.hadoop.ipc.Server$Handler$
at org.apache.hadoop.ipc.Server$Handler$
at Method)
at org.apache.hadoop.ipc.Server$
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
at java.lang.reflect.Constructor.newInstance(
at org.apache.hadoop.ipc.RemoteException.instantiateException(
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(
at org.apache.hadoop.hdfs.DFSClient.rename(
at org.apache.hadoop.fs.Hdfs.renameInternal(
at org.apache.hadoop.fs.AbstractFileSystem.rename(
at org.apache.hadoop.fs.FileContext.rename(
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$FileContextManager.rename(HDFSMetadataLog.scala:356)
... 20 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.FileAlreadyExistsException): rename destination /user/checkpoint/job/offsets/3472939 already exists
It is possible, but it will add some complexity to your application. Starting streams is generally fast, so it is fair to assume that the delay is caused by the initialization of static objects and dependencies. In that case you need only a SparkContext / SparkSession and no streaming dependencies, so the process can be described as:
Start new Spark application.
Initialize batch-oriented objects.
Pass message to the previous application to step down.
Wait for confirmation.
Start streams.
[Figure: high-level diagram of the happy path for the hand-off between the two applications.]
Since this is a very generic pattern, it can be implemented in different ways, depending on the language and infrastructure:
Lightweight messaging queue like ØMQ.
Passing messages through distributed file system.
Placing applications in an interactive context (Apache Toree, Apache Livy) and using external client for orchestration.
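
The hand-off steps above can be sketched with the "distributed file system" option. The Python example below is a minimal simulation, not Spark code: a temporary local directory stands in for the shared file system, a thread stands in for the old application, and the marker file names `step_down` and `stepped_down` are hypothetical.

```python
import tempfile
import threading
import time
from pathlib import Path

POLL_SECONDS = 0.05


def wait_for(path: Path, timeout: float = 5.0) -> bool:
    """Poll until `path` exists or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(POLL_SECONDS)
    return False


def old_application(handoff_dir: Path, events: list) -> None:
    """Keeps 'streaming' until asked to step down, then confirms."""
    events.append("old: streams running")
    if wait_for(handoff_dir / "step_down"):
        events.append("old: streams stopped")   # e.g. query.stop() in Spark
        (handoff_dir / "stepped_down").touch()  # confirmation marker


def new_application(handoff_dir: Path, events: list) -> None:
    """Warms up, asks the old application to stop, then starts streams."""
    events.append("new: batch-oriented objects initialized")
    (handoff_dir / "step_down").touch()         # ask the old app to stop
    if not wait_for(handoff_dir / "stepped_down"):
        raise TimeoutError("old application did not confirm step-down")
    events.append("new: streams started")       # checkpoint is now free


events = []
with tempfile.TemporaryDirectory() as tmp:
    handoff_dir = Path(tmp)
    old = threading.Thread(target=old_application, args=(handoff_dir, events))
    old.start()
    time.sleep(0.1)  # let the old application come up first (demo only)
    new_application(handoff_dir, events)
    old.join()

print("\n".join(events))
```

The key property is that the new application starts its streams only after the old one has confirmed that it stopped, so the two never write to the same checkpoint location at the same time.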

JDBC From Informix to Spark using Dataframes

I can connect to the Informix database using a simple JDBC connection program, but when I try to load the tables using Spark DataFrames I get an exception. Do we need a specific connector for an Informix-to-Spark connection?
Below is the stack trace of the exception:
java.sql.SQLException: System or internal error java.lang.NumberFormatException: For input string: "table_name"
at com.informix.util.IfxErrMsg.getSQLException(
at com.informix.jdbc.IfxChar.toLong(
at com.informix.jdbc.IfxResultSet.getLong(
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:411)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:472)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:241)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.executor.Executor$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
When Spark generates the database queries, it puts the column names in quotes. To accommodate this, in the JDBC connection URL you need to add
From the stack trace it seems that the connection to the Informix database is established.
The problem is probably in reading data from Informix. Spark calls getNext(), which calls getLong(), and getLong() receives 'table_name', which cannot be parsed as a number.
I don't know Spark. Maybe add some details (probably code) about how you use Spark.
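
The two answers fit together if the database reads the quoted column name as a string literal rather than as an identifier; that is an assumption here, not something confirmed in the thread. The Python sketch below uses SQLite purely as a stand-in (single quotes force the string-literal reading) to show how such a query returns the column name itself for every row, which then cannot be converted to a number. The table `t` and column `id` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(1,), (2,)])

# Intended query: the column name is read as an identifier.
as_identifier = conn.execute("SELECT id FROM t ORDER BY id").fetchall()

# Stand-in for a database that treats the quoted name as a string literal:
# every row comes back as the text 'id' instead of the column's value.
as_literal = conn.execute("SELECT 'id' FROM t").fetchall()

print(as_identifier)
print(as_literal)

try:
    int(as_literal[0][0])  # analogous to getLong() on the text "table_name"
except ValueError as exc:
    print(f"conversion failed: {exc}")

conn.close()
```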

Decommission cassandra node times out with "received only 0 responses"

When I try to decommission a node in my Cassandra cluster (which uses vnodes), the process starts: I see active streams flowing from the node being decommissioned to the other nodes in the cluster. After a short delay, however, nodetool decommission exits with the following error message.
I can run nodetool decommission repeatedly and it starts streaming data to other nodes each time, but so far it has always exited with the error below.
Why am I seeing this, and is there a way I can safely decommission this node?
Exception in thread "main" java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.db.HintedHandOffManager.getHintsSlice(
at org.apache.cassandra.db.HintedHandOffManager.listEndpointsPendingHints(
at org.apache.cassandra.service.StorageService.streamHints(
at org.apache.cassandra.service.StorageService.unbootstrap(
at org.apache.cassandra.service.StorageService.decommission(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(
at com.sun.jmx.mbeanserver.PerInterface.invoke(
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at sun.rmi.server.UnicastServerRef.dispatch(
at sun.rmi.transport.Transport$
at Method)
at sun.rmi.transport.Transport.serviceCall(
at sun.rmi.transport.tcp.TCPTransport.handleMessages(
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(
at sun.rmi.transport.tcp.TCPTransport$
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
at java.util.concurrent.ThreadPoolExecutor$
Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.ReadCallback.get(
at org.apache.cassandra.service.StorageProxy.getRangeSlice(
at org.apache.cassandra.db.HintedHandOffManager.getHintsSlice(
... 33 more
The hinted handoff manager is checking for hints to see if it needs to pass them off during the decommission so that the hints don't get lost. You most likely have a lot of hints, or a bunch of tombstones, or something else in the table that causes the query to time out. You aren't seeing any other exceptions in your logs before the timeout, are you? Raising the read timeout period on your nodes before you decommission them, or manually deleting the hints CF, should most likely get you past this. If you delete the hints, you would then want to run a full cluster repair when you are done with all of your decommissions, to propagate the data from any hints you deleted.
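
As a hedged illustration of the first suggestion: in Cassandra versions of that era, the read timeouts are set in `cassandra.yaml`. The stack trace goes through `StorageProxy.getRangeSlice`, so the range timeout is likely the relevant one. The values below are arbitrary examples, not recommendations, and the setting names should be checked against the `cassandra.yaml` shipped with your version.

```yaml
# cassandra.yaml (illustrative values only)
read_request_timeout_in_ms: 30000
range_request_timeout_in_ms: 30000
```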
The short answer is that the node I was trying to decommission was underpowered for the amount of data it held. As of this writing there seems to be a reasonable hard minimum of resources needed to handle nodes with arbitrary amounts of data, which seems to be somewhere in the neighborhood of what an AWS i2.2xlarge provides. In particular, the old m1 instances let you get into trouble by allowing you to store far more data on each node than the memory and compute resources available can support on it.

Cassandra 1.1 or 1.2 for production usage?

We are encountering random SSTable corruptions with 1.2.3/1.2.4 (DataStax Community Edition) on single-node development machines with a mixed read/write load, using a data model with rows that are wide in terms of the number of columns. Writes are more frequent than reads, though. The problem manifests with stack traces like:
ERROR [ReadStage:13899] 2013-04-24 07:09:00,770 (line 132) Exception in thread Thread[ReadStage:13899,5,main]
at org.apache.cassandra.service.StorageProxy$
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$ Source)
at Source)
Caused by:
at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(
... many more
Caused by:
at Source)
... many more
java.lang.RuntimeException: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
Unfortunately, we don't have a reproducible test case yet, because this happens randomly (e.g. after a few days) and not immediately.
I have also researched similar issues with 1.2 in this/other forum(s).
The question is: What is your experience with Cassandra 1.2 in production, or would you recommend 1.1, given that 1.2.4 is the most recent release in the 1.2 series to date?
While we encounter these issues in single-node development environments, things might get backed up when the whole system runs in a cluster served by several nodes, but in our opinion things should run on a single node without corruption as well.
Any hints are much appreciated. Thanks.
I have had better experience with cassandra-1.1 in production. The current version, 1.2.6, still does not pass our heavy pre-production testing.
