Slow Hazelcast migration when using indexes

I'm running a microservice in an OpenShift environment using Hazelcast 4.1.1 and the 2.2.1 Kubernetes discovery plugin. I have configured Hazelcast in embedded mode and I'm running 4 instances of that service. When I scale the application down from 4 to 3 pods, the migration never finishes and my application keeps throwing WrongTargetException (after about one minute).
I analyzed the diagnostics file and I believe the error comes from the index calculation. If I disable all the indices on my maps, everything works like a charm. I think this might be related to https://github.com/hazelcast/hazelcast/issues/18079
It seems that deserialization of my objects is invoked for each index separately. Since we have configured a custom (de-)serializer which also applies LZ4 compression, the migration takes ages.
Can somebody confirm my assumptions? Or are there any other known issues with index calculation during migration?
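For reference, a minimal Java sketch of the kind of configuration under discussion: a map with one sorted index. The map and field names are placeholders, and CacheDeserializedValues.ALWAYS is only a hedged guess at a mitigation, not a confirmed fix: it keeps a deserialized copy next to the binary value, which may reduce repeated runs of the custom deserializer during index maintenance, at the cost of extra heap.

import com.hazelcast.config.CacheDeserializedValues;
import com.hazelcast.config.Config;
import com.hazelcast.config.IndexConfig;
import com.hazelcast.config.IndexType;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;

public class IndexedMapConfig {
    public static void main(String[] args) {
        Config config = new Config();
        // "orders" and "customerId" are placeholder names.
        MapConfig mapConfig = new MapConfig("orders")
                // Hedged mitigation: cache the deserialized value so index
                // maintenance does not re-run the (LZ4) deserializer per index.
                .setCacheDeserializedValues(CacheDeserializedValues.ALWAYS);
        mapConfig.addIndexConfig(new IndexConfig(IndexType.SORTED, "customerId"));
        config.addMapConfig(mapConfig);
        Hazelcast.newHazelcastInstance(config);
    }
}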

Related

Records not showing until Azure Databricks cluster restarted

We have been using Azure Databricks / Delta Lake for the last couple of months and recently have started to spot some strange behaviours with loaded records, in particular the latest records not being returned unless the cluster is restarted or a specific version number is specified.
For example (returns no records)
df_nw = spark.read.format('delta').load('/mnt/xxxx')
display(df_nw.filter("testcolumn = ???"))
But this does
%sql
SELECT * FROM delta.`/mnt/xxxx` VERSION AS OF 472 where testcolumn = ???
As mentioned above, this only seems to be affecting newly inserted records. Has anyone else come across this before?
Any help would be appreciated.
Thanks
Col
Check to see if you've set a staleness limit. If you have, this is expected; if not, please create a support ticket.
https://docs.databricks.com/delta/optimizations/file-mgmt.html#manage-data-recency
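For reference, a hedged sketch of checking that config key via the Spark Java API (on Databricks you would normally do this in a notebook; per the linked page, the default of 0 means queries always resolve the latest table version):

import org.apache.spark.sql.SparkSession;

public class CheckStaleness {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();
        // A duration value such as "1h" allows returning slightly stale
        // results instead of waiting for the latest table metadata.
        String limit = spark.conf().get("spark.databricks.delta.stalenessLimit", "0");
        System.out.println("spark.databricks.delta.stalenessLimit = " + limit);
    }
}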
Just in case anyone else is having a similar problem, I thought it would be worth sharing the solution I accidentally stumbled across.
Over the last week I was encountering issues with our Databricks cluster, whereby the Spark drivers kept crashing under resource-intensive workloads. After a lot of investigation, it turned out that our cluster was in Standard (Single User) mode, so I spun up a new High Concurrency cluster.
The issue was still occasionally appearing on the High Concurrency cluster, so I flipped the notebook back to the old cluster, which was still active, and the newly loaded data was there to be queried. This led me to believe that the Spark engine was not refreshing the underlying data set and was instead using a previously cached version, even though I hadn’t explicitly cached it.
By running %sql CLEAR CACHE the data appeared as expected.
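For completeness, the programmatic equivalent of that SQL statement (a sketch assuming an active SparkSession):

import org.apache.spark.sql.SparkSession;

public class ClearCacheExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();
        // Same effect as the SQL statement: CLEAR CACHE
        spark.catalog().clearCache();
    }
}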

Hazelcast’s HTTP-based health check behaviour

We have enabled Hazelcast’s HTTP-based health check, which provides basic information about the cluster and the member on the node where it is launched, e.g.
http://<member’s host IP>:5701/hazelcast/health
and we get output like the following:
Hazelcast::NodeState=ACTIVE
Hazelcast::ClusterState=ACTIVE
Hazelcast::ClusterSafe=TRUE
Hazelcast::MigrationQueueSize=0
Hazelcast::ClusterSize=5
Our cluster size is 5, but sometimes the monitor reports a size of 2, 3, or 4.
Can someone explain which parameters determine ClusterSize, i.e. how Hazelcast member failure detection works?
If the ClusterSize is not steady, it means nodes could be dropping from the cluster. This could be due to network issues or to nodes running out of resources.
Hazelcast failure detection, along with how to select and fine-tune the failure detectors, is explained in the Reference Manual.
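As a hedged illustration, these are the two properties most commonly tuned for the default (deadline-based) failure detector in embedded Java config; the values here are examples, not recommendations:

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;

public class FailureDetectionTuning {
    public static void main(String[] args) {
        Config config = new Config();
        // How often members send heartbeats to each other (default: 5).
        config.setProperty("hazelcast.heartbeat.interval.seconds", "5");
        // How long a member may stay silent before it is suspected and
        // removed from the cluster, shrinking ClusterSize (default: 60).
        config.setProperty("hazelcast.max.no.heartbeat.seconds", "120");
        Hazelcast.newHazelcastInstance(config);
    }
}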

Does "spring data cassandra" have client side loadbalancing?

I'm operating a project using spring-boot and spring-data-cassandra.
When I set the project up, I configured the Cassandra properties by IP and port
(as described in https://www.baeldung.com/spring-data-cassandra-tutorial).
With this setup, if I had 3 Cassandra nodes and 1 of them died, I would expect the project to fail to connect to Cassandra with a 33% probability.
But my project was fine even though 1 Cassandra node was dead (there were just a few errors at the moment the node went down).
Does spring-data-cassandra happen to have a feature like client-side load balancing?
If it does, where can I see that code?
I tried to find it but failed.
Please give me a little clue.
Spring Data Cassandra relies on the functionality of the DataStax Java driver, which is responsible for making everything work. This includes:
establishing the initial connection to the cluster. This is where the contact points play their role. After the driver connects to any of the contact points, it reads information about the whole cluster and establishes connections to all nodes (by default);
establishing the control connection that is used to receive notifications about changes in the cluster (nodes going up and down, schema changes, etc.). If a node goes down or comes back up, this information is used to update the list of active nodes;
providing load balancing of requests based on replication and node availability: if a node is down, it is excluded from the list of candidates, so queries are not sent to a node that is known to be down.
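A hedged connection sketch (DataStax Java driver 4.x, which recent Spring Data Cassandra versions use underneath; the addresses and datacenter name are placeholders):

import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;

public class DriverLoadBalancing {
    public static void main(String[] args) {
        // Even with a single contact point, the driver discovers the rest of
        // the cluster and load-balances requests across live nodes.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("10.0.0.1", 9042))
                .addContactPoint(new InetSocketAddress("10.0.0.2", 9042))
                .withLocalDatacenter("datacenter1") // required by the default policy
                .build()) {
            // The default load balancing policy routes each request to a live
            // node; a down node is excluded from the query plan.
            session.execute("SELECT release_version FROM system.local");
        }
    }
}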

Adding hazelcast-jet to an existing application

I have an existing application that uses Hazelcast for tracking cluster membership and for distributed task execution. I'm thinking that Jet could be useful for adding analytics on top of the existing application, and I'm trying to figure out how best to layer Jet on top of what we already have.
So my first question is: how should I run Jet on top of our existing Hazelcast configuration? Do I have to run Jet separately, or replace our existing Hazelcast configuration with Jet (since Jet does expose the HazelcastInstance)?
My second question: I see lots of examples using IMap and IList, but nothing that uses topics as a source (I also don't see this as an option in the Sources builder). My initial thought was to emit events (I/O perf data, HTTP request data) from our existing code to a topic, have Jet process that topic and generate analytics from the data, and then push the results to an IMap. Is this the wrong approach? Should I be using some other structure to push these events into Jet? I saw that I can build my own custom Source for this, but the fact that the library doesn't already provide one for this purpose made me suspect I was going down the wrong path.
You can upgrade your current Hazelcast IMDG cluster to a Jet cluster and run your legacy application alongside Jet jobs; this setup is simpler to deploy and operate. Starting a separate cluster for Jet is also perfectly fine; the advantage is isolation (cluster lifecycle, failures, etc.). Just be aware that you can't combine IMDG 3.x with Jet 4.x clusters.
Use an IMap with the event journal enabled to connect two jobs or to ingest data into the cluster; it's the simplest fault-tolerant option that works out of the box (see the sketch below). A Jet data source must be replayable: if the job fails, Jet goes back to the last state snapshot and rewinds the data source offset accordingly.
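A minimal sketch of the journal-based approach (Jet 4.x APIs; the map names "events" and "analytics" are placeholders, and the "events" map needs the event journal enabled in the cluster config):

import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.JournalInitialPosition;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class JournalPipeline {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        // Ingest entries from the "events" map's journal and write them to
        // the "analytics" map; real analytics stages would go in between.
        p.readFrom(Sources.<String, Long>mapJournal("events",
                        JournalInitialPosition.START_FROM_OLDEST))
         .withIngestionTimestamps()
         .writeTo(Sinks.map("analytics"));

        JetInstance jet = Jet.bootstrappedInstance();
        jet.newJob(p).join();
    }
}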
A topic can be used (via the Source Builder), but it won't be fault-tolerant (some messages might get lost). Jet achieves fault tolerance by snapshotting the job regularly; in case of failure, the latest snapshot is restored and the data following the snapshot is replayed. Unlike the journal, a topic consumer can't replay data from an offset.

Process of deploying changes to Spark Streaming in production

What is the process for deploying changes to a Spark Streaming application in production without any downtime?
If you are looking for upgrading application code, please refer to the Spark Streaming documentation:
Upgrading Application Code: If a running Spark Streaming application needs to be upgraded with new application code, then there are two possible mechanisms.
The upgraded Spark Streaming application is started and run in parallel to the existing application. Once the new one (receiving the same data as the old one) has been warmed up and is ready for prime time, the old one can be brought down. Note that this can be done for data sources that support sending the data to two destinations (i.e., the earlier and upgraded applications).
The existing application is shut down gracefully (see StreamingContext.stop(...) or JavaStreamingContext.stop(...) for graceful shutdown options), which ensures that data that has been received is completely processed before shutdown. Then the upgraded application can be started, which will start processing from the same point where the earlier application left off. Note that this can be done only with input sources that support source-side buffering (like Kafka and Flume), as data needs to be buffered while the previous application is down and the upgraded application is not yet up. Also, restarting from earlier checkpoint information of pre-upgrade code cannot be done: the checkpoint information essentially contains serialized Scala/Java/Python objects, and trying to deserialize objects with new, modified classes may lead to errors. In this case, either start the upgraded app with a different checkpoint directory, or delete the previous checkpoint directory.
https://spark.apache.org/docs/latest/streaming-programming-guide.html
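Not part of the quoted docs, just a hedged sketch of the second mechanism using the JavaStreamingContext.stop variant mentioned above; the app name, batch interval, and socket source are placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class GracefulShutdown {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("my-streaming-app");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
        // Placeholder stream so the context has at least one output operation.
        jssc.socketTextStream("localhost", 9999).print();
        jssc.start();
        // In production this would be triggered by a shutdown hook or an
        // external signal rather than called inline.
        // stop(stopSparkContext, stopGracefully): the second flag waits for
        // all received data to be processed before shutting down.
        jssc.stop(true, true);
    }
}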
