Time Based Eviction in Hazelcast - hazelcast

I am working on a requirement where i'd have N hazelcast instances running in a cluster and also have kafka consumers running on all of them.
Now the ask is, each message that comes in on kafka, should be added to the distributed map and the entry must be evicted every 20 seconds, which i did by using a combination of time to live and max-idle seconds parameters in the map config.
But what i really want is that when the entry is evicted, only one of the nodes should process it, right now, the entry eviction is being informed to all the nodes.
Let me know if any more information is needed.

You have to add a localEntryListener to your distributed map so that a member will only receive notifications for which it is an owner.
e.g.
if(map != null){
map.addLocalEntryListener(new EntryAddedListener<Long, Long>() {
#Override
public void entryAdded(EntryEvent<Long, Long> event) {
log.info("LOCAL ENTRY ADDED : {} at {}", event, System.currentTimeMillis());
}
});
The above example is for the EntryAddedListener, you can similarly implement a EntryEvictedListener as well.

Related

LMAX Distruptor Partition and join batch

So currently I have a Executor implementation with blocking queue and the implementation specific is like, I have list of items per request and I divide them into partitions each partition is then computed and finally they are joined to have the final list.
How do I go about implementing it in LMAX? I see that once I have partition and push them into RingBuffer, each partition is treated as separate item so I am custom joining them.
something like,
ConcurrentHashMap<Long, LongAdder> map = new ConcurrentHashMap<>();
#Override
public List<SomeTask> score(final List<SomeTask> tasks) {
long id = tasks.get(0).id;
map.put(id, new LongAdder());
for (SomeTask task : tasks) {
producer.onData(task);
}
while (map.get(id).intValue() != tasks.size()) ;
map.remove(id);
return tasks;
}
Is there a clean way to do it ? I looked at https://github.com/LMAX-Exchange/disruptor/tree/master/src/test/java/com/lmax/disruptor/example and KeyedBatching specifically but they seem to batch and execute on one thread.
Currently for me each partition takes up around 200ms and I wanted to parallel execute them.
Any help is greatly appreciated.
I think you should take a look at the worker-pool options and followed by a final event processor which re-combines the shards.

Is there a way to dynamically stop Spark Structured Streaming?

In my scenario I have several dataSet that comes every now and then that i need to ingest in our platform. The ingestion processes involves several transformation steps. One of them being Spark. In particular I use spark structured streaming so far. The infrastructure also involve kafka from which spark structured streaming reads data.
I wonder if there is a way to detect when there is nothing else to consume from a topic for a while to decide to stop the job. That is i want to run it for the time it takes to consume that specific dataset and then stop it. For specific reasons we decided not to use the batch version of spark.
Hence is there any timeout or something that can be used to detect that there is no more data coming it and that everything has be processed.
Thank you
Structured Streaming Monitoring Options
You can use query.lastProgress to get the timestamp and build logic around that. Don't forget to save your checkpoint to a durable, persistent, available store.
Putting together a couple pieces of advice:
As #Michael West pointed out, there are listeners to track progress
From what I gather, Structured Streaming doesn't yet support graceful shutdown
So one option is to periodically check for query activity, dynamically shutting down depending on a configurable state (when you determine no further progress can/should be made):
// where you configure your spark job...
spark.streams.addListener(shutdownListener(spark))
// your job code starts here by calling "start()" on the stream...
// periodically await termination, checking for your shutdown state
while(!spark.sparkContext.isStopped) {
if (shutdown) {
println(s"Shutting down since first batch has completed...")
spark.streams.active.foreach(_.stop())
spark.stop()
} else {
// wait 10 seconds before checking again if work is complete
spark.streams.awaitAnyTermination(10000)
}
}
Your listener can dynamically shutdown in a variety of ways. For instance, if you're only waiting on a single batch, then just shutdown after the first update:
var shutdown = false
def shutdownListener(spark: SparkSession) = new StreamingQueryListener() {
override def onQueryStarted(_: QueryStartedEvent): Unit = println("Query started: " + queryStarted.id)
override def onQueryTerminated(_: QueryTerminatedEvent): Unit = println("Query terminated! " + queryTerminated.id)
override def onQueryProgress(_: QueryProgressEvent): Unit = shutdown = true
}
Or, if you need to shutdown after more complicated state changes, you could parse the json body of the queryProgress.progress to determine whether or not to shutdown at a given onQueryUpdate event firing.
You can probably use this:-
def stopStreamQuery(query: StreamingQuery, awaitTerminationTimeMs: Long) {
while (query.isActive) {
try{
if(query.lastProgress.numInputRows < 10){
query.awaitTermination(1000)
}
}
catch
{
case e:NullPointerException => println("First Batch")
}
Thread.sleep(500)
}
}
You can make a numInputRows variable.

How to trigger the pre-load of Hazelcast NearCache?

I understand that the NearCache gets loaded only after first get operation is performed on that key on the IMap. But I am interested in knowing if there is any way to trigger the pre-load of the NearCache with all the entries from its cluster.
Use Case:
The key is a simple bean object and the value is a DAO object of type TIntHashMap containing lot of entries.
Size:
The size of value object ranges from 0.1MB to 24MB (and >90% of the entries have less than 5MB). The number of entries range from 150-250 in the IMap.
Benchmarks:
The first call to the get operation is taking 2-3 seconds and later calls are taking <10 ms.
Right now I have created below routine which reads the IMap and reads each entries to refresh the NearCache.
long startTime = System.currentTimeMillis();
IMap<Object, Object> map = client.getMap("utility-cache");
log.info("Connected to the Cache cluster. Starting the NearCache refresh.");
int i = 0;
for (Object key : map.keySet()) {
Object value = map.get(key);
if(log.isTraceEnabled()){
SizeOf sizeOfKey = new SizeOf(key);
SizeOf sizeOfValue = new SizeOf(value);
log.info(String.format("Size of %s Key(%s) Object = %s MB - Size of %s Value Object = %s MB", key.getClass().getSimpleName(), key.toString(),
sizeOfKey.sizeInMB(), value.getClass().getSimpleName(), sizeOfValue.sizeInMB()));
}
i++;
}
log.info("Refreshed NearCache with " + i + " Entries in " + (System.currentTimeMillis() - startTime) + " ms");
As you said, the Near Cache gets populated on get() calls on IMap or JCache data structures. At the moment there is no system to automatically preload any data.
For efficiency you can use getAll() which will get the data in batches. This should improve the performance of your own preloading functionality. You can vary your batch sizes until you find the optimum for your use case.
With Hazelcast 3.8 there will be a Near Cache preloader feature, which will store the keys in the Near Cache on disk. When the Hazelcast client is restarted the previous data set will be pre-fetched to re-populate the previous hot data set in the Near Cache as fast as possible (only the keys are stored, the data is fetched again from the cluster). So this won't help for the first deployment, but for all following restarts. Maybe this is already what you are looking for?
You can test the feature in the 3.8-EA or the recent 3.8-SNAPSHOT version. The documentation for the configuration can be found here: http://docs.hazelcast.org/docs/latest-dev/manual/html-single/index.html#configuring-near-cache
Please be aware that we changed the configuration parameter from file-name to filename between EA and the actual SNAPSHOT. I recommend the SNAPSHOT version, since we also made some other improvements in the preloader code.

Spring Kafka Listening to all topics and adjusting partition offsets

Based on the documentation at spring-kafka, I am using Annotation based #KafkaListener to configure my consumer.
What I see is that -
Unless I specify the offset to zero, up on start, Kafka consumer picks up the future messages and not the existing ones. (I understand this is an expected result because I am not specifying the offset to what I want)
I see an option in the documentation to specify a topic + partition combination and along with that an offset of zero, but if I do this - I have to explicitly specify which topic I want my consumer to listen to.
Using approach 2 above, this is how my consumer looks now -
#KafkaListener(id = "{group.id}",
topicPartitions = {
#TopicPartition(topic = "${kafka.topic.name}",
partitionOffsets = #PartitionOffset(partition = "0", initialOffset = "0"))
},
containerFactory = "kafkaListenerContainerFactory")
public void listen(#Payload String payload,
Acknowledgment ack) throws InterruptedException, IOException {
logger.debug("This is what we received in the Kafka Consumer = " + payload);
idService.process(payload);
ack.acknowledge();
}
While I understand that there is an option to specify the "topicPattern" wild card or a "topics" list as a part of the annotation configuration, I don't see a place where I can provide the offset value to start from zero for the topics / topic patterns listed. Is there a way to do a combination of both? Please advise.
When using topics and topicPatterns (rather than explicitly declaring the partitions), Kafka decides which consumer instance will get which partitions.
Kafka will allocate the partitions and the initial offset will be the last committed for that group id. You cannot currently change that offset but we are considering adding a seek function.
If you always want to start at the first available offset, use a unique group id (e.g. UUID.randomUUID().toString()) and set
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
Since Kafka will have no existing offset for that group id it will use that property to determine where to start.
You can also use MANUAL ack mode and never ack, which will effectively do the same thing.

Replaying an RDD in spark streaming to update an accumulator

I am actually running out of options.
In my spark streaming application. I want to keep a state on some keys. I am getting events from Kafka. Then I extract keys from the event, say userID. When there is no events coming from Kafka I want to keep updating a counter relative to each user ID each 3 seconds, since I configured the batchduration of my StreamingContext with 3 seconds.
Now the way I am doing it might be ugly, but at least it works: I have an accumulableCollection like this:
val userID = ssc.sparkContext.accumulableCollection(new mutable.HashMap[String,Long]())
Then I create a "fake" event and keep pushing it to my spark streaming context as the following:
val rddQueue = new mutable.SynchronizedQueue[RDD[String]]()
for ( i <- 1 to 100) {
rddQueue += ssc.sparkContext.makeRDD(Seq("FAKE_MESSAGE"))
Thread.sleep(3000)
}
val inputStream = ssc.queueStream(rddQueue)
inputStream.foreachRDD( UPDATE_MY_ACCUMULATOR )
This would let me access to my accumulatorCollection and update all the counters of all userIDs. Up to now everything works fine, however when I change my loop from:
for ( i <- 1 to 100) {} #This is for test
To:
while (true) {} #This is to let me access and update my accumulator through the whole application life cycle
Then when I run my ./spark-submit, my application gets stuck on this stage:
15/12/10 18:09:00 INFO BlockManagerMasterActor: Registering block manager slave1.cluster.example:38959 with 1060.3 MB RAM, BlockManagerId(1, slave1.cluster.example, 38959)
Any clue on how to resolve this ? Is there a pretty straightforward way that would allow me updating the values of my userIDs (rather than creating an unuseful RDD and pushing it periodically to the queuestream)?
The reason why the while (true) ... version does not work is that the control never returns to the main execution line and therefore nothing below that line gets executed. To solve that specific problem, we should execute the while loop in a separate thread. Future { while () ...} should probably work.
Also, the Thread.sleep(3000) when populating the QueueDStream in the example above is not needed. Spark Streaming will consume one message from the queue on each streaming interval.
A better way to trigger that inflow of 'tick' messages would be with the ConstantInputDStream that plays back the same RDD at each streaming interval, therefore removing the need to create the RDD inflow with the QueueDStream.
That said, it looks to me that the current approach seems fragile and would need revision.

Resources