If Kubernetes auto-scales a Hazelcast IMDG cluster, how can I update the minimum cluster size to match the new quorum requirements?
E.g. I have a 3-instance cluster with the minimum cluster size set to 2. The cache is getting full, so Kubernetes decides to spin up two more instances. How can I update the minimum cluster size to 3 now? Or the other way round: Kubernetes now shuts down 2 instances, so how can I update the minimum cluster size back to 2?
I am using Hazelcast embedded with Spring Boot if that matters.
Currently you cannot change minimum-cluster-size dynamically in Hazelcast, so that is not possible out of the box.
You can, however, achieve a dynamically changing quorum size by implementing your own custom QuorumFunction and configuring your quorum to use it. For example:
import java.util.Collection;

import com.hazelcast.core.Member;
import com.hazelcast.quorum.QuorumFunction;

public class DynamicQuorumSize implements QuorumFunction {

    private volatile int minimumClusterSize;

    @Override
    public boolean apply(Collection<Member> members) {
        // don't look up minimumClusterSize here from external services, DBs
        // or other slow sources
        return members.size() >= minimumClusterSize;
    }

    // allow updating the minimum cluster size at runtime
    public void setQuorumSize(int minimumClusterSize) {
        this.minimumClusterSize = minimumClusterSize;
    }
}
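To wire the function in, programmatic configuration is the natural fit rather than XML, because XML config would instantiate the class by name and leave you no reference on which to call setQuorumSize later. Here is a configuration sketch assuming Hazelcast 3.x APIs; the quorum name "dynamicQuorum" and map name "myMap" are placeholders:

```java
// configuration sketch, assuming Hazelcast 3.x; names are illustrative
DynamicQuorumSize quorumFunction = new DynamicQuorumSize();
quorumFunction.setQuorumSize(2);

QuorumConfig quorumConfig = new QuorumConfig("dynamicQuorum", true);
quorumConfig.setQuorumFunctionImplementation(quorumFunction);

Config config = new Config();
config.addQuorumConfig(quorumConfig);
config.addMapConfig(new MapConfig("myMap").setQuorumName("dynamicQuorum"));

HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
// keep the quorumFunction reference around so a scaling hook can call
// quorumFunction.setQuorumSize(newMinimum) when Kubernetes scales the cluster
```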
I'm using DSE driver 3.6.8, Java 8, Spring 3.2.18.
I'm trying to set a different consistency level for each table.
The desired consistency levels are stored in a property file:
<entry key="consistency.level.strongWriteLevel">EACH_QUORUM</entry>
<entry key="consistency.level.strongReadLevel">LOCAL_QUORUM</entry>
<entry key="consistency.level.lightWriteLevel">TWO</entry>
<entry key="consistency.level.lightReadLevel">ONE</entry>
I tried this
@Component
@Table(name = "someName",
       readConsistency = "${consistency.level.strongReadLevel}",
       writeConsistency = "${consistency.level.strongWriteLevel}")
public class MMBaseLoginHistory {
but it didn't work.
I know I can set the CL on the mapper, which overrides the @Table CL, but I wanted to know at least whether it is possible.
I tried multiple variations of this code, with or without @Component, by adding a field
@Value("${consistency.level.strongReadLevel}")
private String strongReadLevel;
and then trying to refer to it:
@Component
@Table(name = "someName",
       readConsistency = strongReadLevel)
public class MMBaseLoginHistory {
None of the previous attempts worked.
EDIT:
I found this solution, but it doesn't satisfy me at all:
import static com.cardlinkin.mm.model.beans.MMBaseLoginHistory.writeConsistencyLevel;
import static com.cardlinkin.mm.model.beans.MMBaseLoginHistory.readConsistencyLevel;

@Component
@Table(name = "someName",
       writeConsistency = writeConsistencyLevel,
       readConsistency = readConsistencyLevel)
public class MMBaseLoginHistory {

    @Value("${consistency.level.strongWriteLevel}")
    public static final String writeConsistencyLevel = "";

    @Value("${consistency.level.strongReadLevel}")
    public static final String readConsistencyLevel = "";
This isn't a direct answer to your question, but I wanted to point out that our general recommendation is to use LOCAL_QUORUM for both reads and writes. There are very limited edge cases where other consistency levels such as ONE are an appropriate choice.
For example, EACH_QUORUM is very expensive and you need to be fully aware of the penalty you'll incur in situations where the Cassandra DCs are in different geographic locations. The price of requiring quorum acknowledgements from remote DCs can be very costly the higher the latency is across the network.
If the C* DCs are located in the same physical location, or if the distance/latency between the DCs is negligible, then the cost of EACH_QUORUM is low; but in that case you should expect the mutations to replicate successfully anyway, so there is no real benefit to using an expensive CL.
Similarly, a consistency of ONE is only recommended for use cases where consistency really doesn't matter. For example, a social feed: if a post doesn't appear on someone's timeline, it won't make a huge difference, since the user will see the post the next time they refresh their feed. Cheers!
I have a simple use case. I have a system where duplicate requests to a REST service (with dozens of instances) are not allowed. However, duplicates are also difficult to prevent because of a complicated datastore configuration and downstream services.
So the only way I can prevent duplicate "transactions" is to have some centralized place where I write a unique hash of the request data. Each REST endpoint first checks whether the hash of a new request already exists and proceeds only if it does not.
For purposes of this question assume that it's not possible to do this with database constraints.
One solution is to create a table in the database where I store my request hashes and always write to this table before proceeding with the request. However, I want something lighter than that.
Another solution is to use something like Redis and write my unique hashes to Redis before proceeding with the request. However, I don't want to spin up a Redis cluster and maintain it, etc.
I was thinking of embedding Hazelcast in each of my app instances and write my unique hashes there. In theory, all instances will see the hash in the memory grid and will be able to detect duplicate requests. This solves my problem of having a lighter solution than a database and the other requirement of not having to maintain a Redis cluster.
OK, now for my question, finally: is it a good idea to use Hazelcast for this use case?
Will Hazelcast be fast enough to detect duplicate requests that come in milliseconds or microseconds apart?
Say request 1 comes into instance 1 and request 2 comes into instance 2 microseconds apart. Instance 1 writes a hash of the request to Hazelcast, and instance 2 checks Hazelcast for the existence of that hash only milliseconds later. Will the hash have been detected? Is Hazelcast going to propagate the data across the cluster in time? Does it even need to do that?
Thanks in advance, all ideas are welcome.
Hazelcast is definitely a good choice for this kind of use case, especially if you just use a Map<String, Boolean> and test with Map::containsKey instead of retrieving the element and checking for null. You should also set a TTL when putting the element, so you won't run out of memory. However, same as with Redis, we recommend using Hazelcast as a standalone cluster for "bigger" datasets, as the lifecycle of cached elements normally interferes with the rest of the application and complicates GC optimization. Running Hazelcast embedded is a choice that should be made only after serious consideration and testing of your application at runtime.
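The claim-before-processing pattern from the answer can be sketched with plain JDK types. The ConcurrentHashMap here is only a stand-in for Hazelcast's IMap, whose putIfAbsent has an overload that also takes a TTL; the class and method names are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class DuplicateGuard {

    // stand-in for a Hazelcast IMap<String, Boolean>; with Hazelcast you would
    // call map.putIfAbsent(hash, Boolean.TRUE, ttl, TimeUnit.MINUTES) instead
    private final ConcurrentMap<String, Boolean> seenHashes = new ConcurrentHashMap<>();

    // atomically claims the request hash: returns true only for the first
    // caller; every later caller with the same hash sees a duplicate
    public boolean tryClaim(String requestHash) {
        return seenHashes.putIfAbsent(requestHash, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        DuplicateGuard guard = new DuplicateGuard();
        System.out.println(guard.tryClaim("abc123")); // true: first request wins
        System.out.println(guard.tryClaim("abc123")); // false: duplicate detected
    }
}
```

The important property is that the claim is a single atomic check-and-set rather than a separate contains-then-put; in Hazelcast the operation executes on the partition owner for that key, so two instances racing on the same hash cannot both win.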
Yes, you can use a Hazelcast distributed map to detect duplicate requests to a REST service, because whenever a put operation happens on a Hazelcast map, the data becomes available to all the other clustered instances.
From what I've read and seen in tests, it doesn't actually replicate everything. Hazelcast uses a data grid to distribute the primary data evenly across all the nodes, rather than having each node keep a full copy of everything and replicate to stay in sync. The great thing about this is that there is no data lag, which is inherent to any replication strategy.
There is a backup copy of each node's data stored on another node, which obviously does depend on replication, but the backup copy is only used when a node crashes.
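The ownership model described above can be illustrated with a toy sketch. This is not Hazelcast's actual hashing algorithm; the real thing hashes each key's serialized form into one of 271 partitions (by default) and assigns each partition to a single owning member:

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    // Hazelcast's default partition count; a toy illustration of the idea,
    // not Hazelcast's actual key-hashing or partition-assignment logic
    static final int PARTITION_COUNT = 271;

    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    static String ownerOf(Object key, List<String> members) {
        // each partition maps to exactly one member, so no member
        // holds all the data, yet every member can locate any key
        return members.get(partitionId(key) % members.size());
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("node-1", "node-2", "node-3");
        // every cluster member computes the same owner for a given key,
        // so a put and a later get for that key go to the same node
        System.out.println(ownerOf("request-hash-42", members));
    }
}
```

Because lookup is deterministic, a get never races a replication stream to the "wrong" node: it is routed to the partition owner that also received the put.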
See the code below, which creates two clustered Hazelcast instances and gets the distributed map. One Hazelcast instance puts data into the distributed IMap, and the other instance reads the data back from it.
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class TestHazelcastDataReplication {

    // create the 1st instance
    public static final HazelcastInstance instanceOne = Hazelcast
            .newHazelcastInstance(new Config("distributedFirstInstance"));

    // create the 2nd instance
    public static final HazelcastInstance instanceTwo = Hazelcast
            .newHazelcastInstance(new Config("distributedSecondInstance"));

    // insert into distributedMap using instance one
    static IMap<Long, Long> distributedInsertMap = instanceOne.getMap("distributedMap");

    // read from distributedMap using instance two
    static IMap<Long, Long> distributedGetMap = instanceTwo.getMap("distributedMap");

    public static void main(String[] args) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                for (long i = 0; i < 100000; i++) {
                    // insert data using the 1st instance
                    distributedInsertMap.put(i, System.currentTimeMillis());
                    // read the same key back using the 2nd instance
                    System.out.println(i + " : " + distributedGetMap.get(i));
                }
            }
        }).start();
    }
}
EDIT (question summary):
I want to expose an endpoint capable of returning portions of XML data selected by some query parameters.
I have a stateful service that keeps the XML data, converted to DTOs, in a reliable dictionary.
I use a single, named partition (I can't tell which partition holds the data from the query parameters passed, so I can't implement a smarter partitioning strategy).
I am using service remoting for communication between the stateless Web API service and the stateful one.
The XML data may reach 500 MB.
Everything is OK when the XML is only around 50 MB.
When the data gets larger, Service Fabric complains about MaxReplicationMessageSize,
and here is the summary of my questions from below: how can one store a large amount of data in a reliable dictionary?
TL;DR:
Apparently, I am missing something...
I want to parse huge XMLs and load them into a reliable dictionary for later queries over them.
I am using a single, named partition.
I have an XMLData stateful service that loads these XMLs into a reliable dictionary in its RunAsync method via this piece of code:
var myDictionary = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, List<HospitalData>>>("DATA");

using (var tx = this.StateManager.CreateTransaction())
{
    var result = await myDictionary.TryGetValueAsync(tx, "data");
    ServiceEventSource.Current.ServiceMessage(this, "data status: {0}",
        result.HasValue ? "loaded" : "not loaded yet, starts loading");

    if (!result.HasValue)
    {
        Stopwatch timer = new Stopwatch();
        timer.Start();

        var converter = new DataConverter(XmlFolder);
        List<HospitalData> data = converter.LoadData();
        await myDictionary.AddOrUpdateAsync(tx, "data", data, (key, value) => data);

        timer.Stop();
        ServiceEventSource.Current.ServiceMessage(this,
            string.Format("Loading of data finished in {0} ms",
                timer.ElapsedMilliseconds));
    }

    await tx.CommitAsync();
}
I have a stateless Web API service that communicates with the stateful one above via service remoting and queries the dictionary via this code:
ServiceUriBuilder builder = new ServiceUriBuilder(DataServiceName);
IDataService dataServiceClient = ServiceProxy.Create<IDataService>(builder.ToUri(),
    new Microsoft.ServiceFabric.Services.Client.ServicePartitionKey("My.single.named.partition"));

try
{
    var data = await dataServiceClient.QueryData(SomeQuery);
    return Ok(data);
}
catch (Exception ex)
{
    ServiceEventSource.Current.Message("Web Service: Exception: {0}", ex);
    throw;
}
It works really well when the XMLs do not exceed 50 MB.
After that I get errors like:
System.Fabric.FabricReplicationOperationTooLargeException: The replication operation is larger than the configured limit - MaxReplicationMessageSize ---> System.Runtime.InteropServices.COMException
Questions:
I am almost certain that it is about the partitioning strategy and that I need to use more partitions. But how do I reference a particular partition from within the RunAsync method of the stateful service? (The stateful service is invoked via RPC from the Web API, where I explicitly point out a partition, so there I could easily choose among partitions if I used the ranged partitioning strategy; but how do I do that during the initial loading of the data in RunAsync?)
Are these thoughts of mine correct: the code in a stateful service operates on a single partition, so loading a huge amount of data and partitioning that data should happen outside the stateful service (for example, in an Actor)? Then, after determining the partition key, I would just invoke the stateful service via RPC, pointing it to that particular partition.
Actually, is it a partitioning problem at all, and what (where, who) defines the size of a replication message? I.e., does the partitioning strategy influence the replication message sizes?
Would extracting the loading logic into a stateful Actor help in any way?
For any help on this - thanks a lot!
The issue is that you're trying to add a large amount of data into a single dictionary record. When Service Fabric tries to replicate that data to other replicas of the service, it encounters a quota of the replicator, MaxReplicationMessageSize, which indeed defaults to 50MB (documented here).
You can increase the quota by specifying a ReliableStateManagerConfiguration:
internal sealed class Stateful1 : StatefulService
{
    public Stateful1(StatefulServiceContext context)
        : base(context, new ReliableStateManager(context,
            new ReliableStateManagerConfiguration(new ReliableStateManagerReplicatorSettings
            {
                MaxReplicationMessageSize = 1024 * 1024 * 200
            })))
    { }
}
But I strongly suggest you change the way you store your data. The current method won't scale very well and isn't the way Reliable Collections were meant to be used.
Instead, you should store each HospitalData in a separate dictionary item. Then you can query the items in the dictionary (see this answer for details on how to use LINQ). You will not need to change the above quota.
PS: You don't necessarily have to use partitioning for 500 MB of data. But regarding your question: you could use partitions even if you can't derive the key from the query, simply by querying all partitions and then combining the data.
I am using Cassandra 2.1.12 to store event data in a column family. Below is the C# code for creating the client for .NET, which manages connections to Cassandra. The problem is that the rate of inserts/updates is very high, and I increment a column value in Cassandra on subsequent requests. Say that on the first request I write the value of the column as 1; on the next request I read the value of this column and update it to 2. But if the value is fetched from another node where it has not yet been set to 1, the value would again be stored as 1. To solve this problem I have also set the consistency level to QUORUM, but the problem persists. Can anyone tell me a possible solution?
private static ISession _singleton;

public static ISession GetSingleton()
{
    if (_singleton == null)
    {
        Cluster cluster = Cluster.Builder()
            .AddContactPoints(ConfigurationManager.AppSettings["cassandraCluster"].ToString().Split(','))
            .Build();
        ISession session = cluster.Connect(ConfigurationManager.AppSettings["cassandraKeySpace"].ToString());
        _singleton = session;
    }
    return _singleton;
}
No, it is not possible to achieve your goal in Cassandra this way. The reason is that every distributed application falls within the CAP theorem, and according to that, Cassandra gives up strong consistency in favor of availability and partition tolerance.
So in your scenario, you are trying to update the same partition key many times in a multi-threaded environment, so it is not guaranteed that every thread sees the latest data. If you retry after a small interval, you might see the latest data in all the threads.
If your requirement is to increment/decrement integers, then you can go with Cassandra counters. However, a Cassandra counter does not support retrieving the updated value within a single request: you can send one request to increment the counter and a separate request to get the updated value, but it is not possible to increment and read the incremented value in a single request.
If your requirement is only to increment a value (like counting the number of times a page is viewed), then Cassandra counters fit well: they will not miss any increments/decrements, and you will see the correct total in the end. Hope it helps.
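As a sketch of the counter approach the answer suggests (the table and column names are illustrative):

```sql
-- a counter table may contain only counter columns besides the primary key
CREATE TABLE page_views (
    page_name text PRIMARY KEY,
    views counter
);

-- increments are applied server-side, so concurrent writers never lose updates
UPDATE page_views SET views = views + 1 WHERE page_name = 'home';

-- reading the total is a separate request
SELECT views FROM page_views WHERE page_name = 'home';
```

This avoids the read-modify-write race entirely, because the client never reads the old value before incrementing.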
I have a three-node Cassandra cluster which currently serves 50 writes/sec. It is about to become 100 writes/sec, and the following are the details of my cluster:
Keyspace definition :
CREATE KEYSPACE keyspacename WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
Partitioner :
org.apache.cassandra.dht.RandomPartitioner
I have a client in C# (DataStax C# driver) and I am using the singleton design pattern, i.e. creating only one session object for the Cassandra cluster, which is used for both writing and reading data from the ring. The reason for doing it this way was that TCP connections to the ring were not getting closed. So far my ring is working fine and is able to sustain a load of 50 writes/sec; now this will increase to 100 writes/sec.
So my question is: will the same design pattern be able to handle this, given the configuration of my ring?
C# code :
public static ISession GetSingleton()
{
    if (_singleton == null)
    {
        Cluster cluster = Cluster.Builder()
            .AddContactPoints(ConfigurationManager.AppSettings["cassandraCluster"].ToString().Split(','))
            .Build();
        ISession session = cluster.Connect(ConfigurationManager.AppSettings["cassandraKeySpace"].ToString());
        _singleton = session;
    }
    return _singleton;
}
From the Cassandra side, 100 writes/sec is quite low. It would handle it easily.
From the client side, I see no problem with your design. In my opinion, it is a good idea to use the singleton pattern. But I cannot give you an exact answer, since I do not know:
What the size of your written data is.
How performant your network is.
Whether you use synchronous or asynchronous execution.
Generally, we can reasonably assume about 10 ms per write. With synchronous execution, you would be able to write about 100 times/sec, but you could not push that much further indefinitely, because the driver would not create more connections.
On the other hand, you can use the ExecuteAsync method to execute writes asynchronously. The C# Cassandra driver will manage the connection pool for you.
Another tip I can give you is to use PreparedStatements: the query is parsed by the cluster only once, and subsequent executions send only the bound values.