Setting consistency level from a property file - Cassandra

I'm using DSE driver 3.6.8, Java 8, Spring 3.2.18.
I'm trying to set a different consistency level for each table.
The desired consistency levels are stored in a property file:
<entry key="consistency.level.strongWriteLevel">EACH_QUORUM</entry>
<entry key="consistency.level.strongReadLevel">LOCAL_QUORUM</entry>
<entry key="consistency.level.lightWriteLevel">TWO</entry>
<entry key="consistency.level.lightReadLevel">ONE</entry>
I tried this:
@Component
@Table(name = "someName",
       readConsistency = "${consistency.level.strongReadLevel}",
       writeConsistency = "${consistency.level.strongWriteLevel}")
public class MMBaseLoginHistory {
but it didn't work.
I know I can set the CL on the mapper, which overrides the @Table CL, but I wanted to know if it is at least possible.
I tried multiple variations of this code, with or without @Component, by adding a field
@Value("${consistency.level.strongReadLevel}")
private String strongReadLevel;
and then trying to refer to it:
@Component
@Table(name = "someName",
       readConsistency = strongReadLevel)
public class MMBaseLoginHistory {
None of these worked.
EDIT:
I found this workaround, but it doesn't satisfy me at all:
import static com.cardlinkin.mm.model.beans.MMBaseLoginHistory.writeConsistencyLevel;
import static com.cardlinkin.mm.model.beans.MMBaseLoginHistory.readConsistencyLevel;

@Component
@Table(name = "someName",
       writeConsistency = writeConsistencyLevel,
       readConsistency = readConsistencyLevel)
public class MMBaseLoginHistory {

    @Value("${consistency.level.strongWriteLevel}")
    public static final String writeConsistencyLevel = "";

    @Value("${consistency.level.strongReadLevel}")
    public static final String readConsistencyLevel = "";

This isn't a direct answer to your question, but I wanted to point out that our general recommendation is to use LOCAL_QUORUM for both reads and writes. There are very limited edge cases where other consistency levels, such as ONE, are an appropriate choice.
For example, EACH_QUORUM is very expensive and you need to be fully aware of the penalty you'll incur when the Cassandra DCs are in different geographic locations: the price of requiring quorum acknowledgements from remote DCs grows with the latency across the network.
If the C* DCs are in the same physical location, or if the distance/latency between the DCs is negligible, then the cost of EACH_QUORUM is low, but you should also expect the mutations to replicate successfully anyway, so there's no real benefit in using an expensive CL.
Similarly, a consistency of ONE is only recommended for use cases where consistency really doesn't matter, for example a social feed where a post missing from someone's timeline makes little difference, since the user will see it the next time they refresh their feed. Cheers!
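That said, the reason your attempts fail is that the driver's @Table annotation is read by the object mapper, not by Spring, so ${...} placeholders are never resolved, and annotation values must be compile-time constants anyway. If you do want property-driven levels, a common workaround is to leave @Table alone and set default consistency options on the mapper at startup from the injected properties. A minimal sketch, assuming the driver 3.x object mapper and an existing Session bean (bean and class names here are illustrative, not from your project):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.mapping.Mapper;
import com.datastax.driver.mapping.MappingManager;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MapperConfig {

    // resolved from the same property file as in the question
    @Value("${consistency.level.strongReadLevel}")
    private String strongReadLevel;

    @Value("${consistency.level.strongWriteLevel}")
    private String strongWriteLevel;

    @Bean
    public Mapper<MMBaseLoginHistory> loginHistoryMapper(Session session) {
        Mapper<MMBaseLoginHistory> mapper =
                new MappingManager(session).mapper(MMBaseLoginHistory.class);
        // per-mapper defaults override whatever @Table declares
        mapper.setDefaultGetOptions(
                Mapper.Option.consistencyLevel(ConsistencyLevel.valueOf(strongReadLevel)));
        mapper.setDefaultSaveOptions(
                Mapper.Option.consistencyLevel(ConsistencyLevel.valueOf(strongWriteLevel)));
        return mapper;
    }
}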

Related

Spark Broadcast Variable with only one task using it

To my understanding, Spark works like this:
For standard variables, the Driver sends them together with the lambda (or better, closure) to the executors for each task using them.
For broadcast variables, the Driver sends them to the executors only once, the first time they are used.
Is there any advantage to using a broadcast variable instead of a standard variable when we know it will be used only once, so there would be only one transfer even in the case of a standard variable?
Example (Java):
public class SparkDriver {

    public static void main(String[] args) {
        String inputPath = args[0];
        String outputPath = args[1];

        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("J", "Java");
        dictionary.put("S", "Spark");

        SparkConf conf = new SparkConf()
                .setAppName("Try BV")
                .setMaster("local");

        try (JavaSparkContext context = new JavaSparkContext(conf)) {
            final Broadcast<Map<String, String>> dictionaryBroadcast = context.broadcast(dictionary);

            context.textFile(inputPath)
                   .map(line -> { // just one transformation using the BV
                       Map<String, String> d = dictionaryBroadcast.value();
                       String[] words = line.split(" ");
                       StringBuffer sb = new StringBuffer();
                       for (String w : words)
                           sb.append(d.get(w)).append(" ");
                       return sb.toString();
                   })
                   .saveAsTextFile(outputPath); // just one action!
        }
    }
}
There are several advantages to using broadcast variables, even if you use them only once:
You avoid several serialization problems. When you serialize an anonymous inner class that uses a field of the enclosing class, the enclosing class gets serialized with it. Spark and other frameworks have workarounds that partially mitigate this, but sometimes the ClosureCleaner doesn't do the trick. You can avoid NotSerializableExceptions with tricks such as copying a class member variable into a local variable before the closure (see the sketch below), or turning the anonymous inner class into a standalone class that receives only the required fields in its constructor.
If you use a broadcast variable you don't even have to think about that: only the wrapped value is serialized. I suggest reading the not-serializable-exception question and its first answer to understand the concept more deeply.
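A rough sketch of that local-copy trick (class and method names here are made up for illustration), assuming a plain JavaSparkContext:

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ClosureCopyExample {

    // non-serializable state can live here; the enclosing class is never shipped
    private final Map<String, String> dictionary = new HashMap<>();

    public JavaRDD<String> expand(JavaSparkContext context, String inputPath) {
        // copy the field into a local variable so the closure captures only the
        // map itself, not "this" (which would drag in the whole enclosing class)
        final Map<String, String> localDictionary = dictionary;
        return context.textFile(inputPath)
                      .map(line -> localDictionary.getOrDefault(line, line));
    }
}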
The serialization performance of the closure is, most of the time, worse than that of a specialized serialization method. As the official Spark documentation on data serialization says:
Kryo is significantly faster and more compact than Java serialization (often as much as 10x).
Searching through the Spark classes in the official repo, I saw that the closure is serialized via SparkEnv.get.closureSerializer. The only assignment of that variable, at line 306 of the SparkEnv class, uses the standard and inefficient JavaSerializer.
In that case, serializing a big object also costs you network bandwidth. This may also explain why the official docs recommend switching to broadcast variables for tasks larger than about 20 KiB.
There is only one copy per machine, so if several executors run on the same physical machine there is an advantage.
> Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks
The distribution algorithm is probably a lot more efficient. Since a broadcast variable is immutable, it is not hard to imagine distributing it with a peer-to-peer algorithm instead of a centralized one: once the driver has finished sending the broadcast variable to the first executor it sends it to the second, while in parallel the first executor forwards the data to a third, and so on (the same idea as BitTorrent-style distribution).
I haven't dug into Spark's implementation, but as the documentation of broadcast variables says:
Spark also attempts to distribute broadcast variables using efficient broadcast algorithms to reduce communication cost.
Surely a more efficient algorithm than the trivial centralized one can be designed by exploiting the immutability of the broadcast variable.
Long story short: using a closure and using a broadcast variable are not the same thing. If the object you are shipping is big, use a broadcast variable.
Please refer to this excellent article: https://www.mikulskibartosz.name/broadcast-variables-and-broadcast-joins-in-apache-spark/ I could rewrite it, but it serves the purpose well and answers your question.
In summary:
A broadcast variable is an Apache Spark feature that lets us send a read-only copy of a variable to every worker node in the Spark cluster.
The broadcast variable is useful only when we want to:
Reuse the same variable across multiple stages of the Spark job
Speed up joins via a small table that is broadcast to all worker nodes, not all Executors.
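As a sketch of that second point (paths and column names are made up), the broadcast() function hints Spark to replicate the small table to all the workers instead of shuffling both sides of the join:

import static org.apache.spark.sql.functions.broadcast;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BroadcastJoinExample {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Broadcast join sketch")
                .master("local")
                .getOrCreate();

        // hypothetical inputs: a large fact table and a small lookup table
        Dataset<Row> events = spark.read().parquet("/data/events");
        Dataset<Row> countries = spark.read().parquet("/data/countries");

        // broadcast() hints Spark to ship the small table to the workers
        // so the large table is never shuffled for the join
        Dataset<Row> joined = events.join(
                broadcast(countries),
                events.col("country_code").equalTo(countries.col("code")));

        joined.show();
        spark.stop();
    }
}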

Scaling and minimum quorum size

If Kubernetes auto-scales a Hazelcast IMDG cluster, how can I update the minimum-cluster-size to match the new quorum requirements?
E.g. I have a 3-instance cluster with the minimum cluster size set to 2. The cache is getting full, so Kubernetes decides to spin up another two instances. How can I update the minimum cluster size to 3 now? Or, the other way round, if Kubernetes now shuts down 2 instances, how can I set the minimum cluster size back to 2?
I am using Hazelcast embedded with Spring Boot if that matters.
Currently you cannot dynamically change minimum-cluster-size in Hazelcast, so it's not possible.
You can achieve a dynamically changing quorum size by implementing your own custom QuorumFunction and configuring your quorum to use that. For example:
public class DynamicQuorumSize implements QuorumFunction {

    private volatile int minimumClusterSize;

    @Override
    public boolean apply(Collection<Member> members) {
        // don't look up minimumClusterSize here from external services, DBs
        // or other slow sources
        return members.size() >= minimumClusterSize;
    }

    // allow updating the minimum cluster size
    public void setQuorumSize(int minimumClusterSize) {
        this.minimumClusterSize = minimumClusterSize;
    }
}
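To wire the function in, a sketch based on the Hazelcast 3.x quorum API (the quorum and map names below are arbitrary):

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.QuorumConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class QuorumWiring {

    public static void main(String[] args) {
        DynamicQuorumSize quorumFunction = new DynamicQuorumSize();
        quorumFunction.setQuorumSize(2); // initial value, updated later on scale events

        QuorumConfig quorumConfig = new QuorumConfig();
        quorumConfig.setName("dynamicQuorum");
        quorumConfig.setEnabled(true);
        quorumConfig.setQuorumFunctionImplementation(quorumFunction);

        MapConfig mapConfig = new MapConfig("protected-map");
        mapConfig.setQuorumName("dynamicQuorum");

        Config config = new Config();
        config.addQuorumConfig(quorumConfig);
        config.addMapConfig(mapConfig);

        HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
        // keep a reference to quorumFunction and call setQuorumSize(...) again
        // whenever Kubernetes scales the cluster up or down
    }
}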

Should I use Hazelcast to detect duplicate requests to a REST service

I have a simple use case: a system where duplicate requests to a REST service (with dozens of instances) are not allowed. However, they are also difficult to prevent because of a complicated datastore configuration and downstream services.
So the only way I can prevent duplicate "transactions" is to have some centralized place where I write a unique hash of a request data. Each REST endpoint first checks if the hash of a new request already exists and only proceeds if no such hash exists.
For purposes of this question assume that it's not possible to do this with database constraints.
One solution is to create a table in the database where I store my request hashes and always write to this table before proceeding with the request. However, I want something lighter than that.
Another solution is to use something like Redis and write my unique hashes to Redis before proceeding with the request. However, I don't want to spin up a Redis cluster and maintain it, etc.
I was thinking of embedding Hazelcast in each of my app instances and write my unique hashes there. In theory, all instances will see the hash in the memory grid and will be able to detect duplicate requests. This solves my problem of having a lighter solution than a database and the other requirement of not having to maintain a Redis cluster.
OK, now for my question, finally: is it a good idea to use Hazelcast for this use case?
Will Hazelcast be fast enough to detect duplicate requests that come in milliseconds or microseconds apart?
Say request 1 comes into instance 1 and request 2 comes into instance 2 microseconds apart. Instance 1 writes a hash of the request to Hazelcast, and instance 2 checks Hazelcast for that hash only milliseconds later. Will the hash be detected? Does Hazelcast propagate the data across the cluster in time? Does it even need to?
Thanks in advance, all ideas are welcome.
Hazelcast is definitely a good choice for this kind of use case, especially if you just use a Map<String, Boolean> and test with Map::containsKey instead of retrieving the element and checking for null. You should also set a TTL when putting the element, so you won't run out of memory. However, as with Redis, we recommend running Hazelcast as a standalone cluster for "bigger" datasets, since the lifecycle of cached elements normally interferes with the rest of the application and complicates GC optimization. Running Hazelcast embedded is a choice that should be made only after serious consideration and testing of your application at runtime.
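As a sketch of that idea (the map name and TTL are arbitrary), using putIfAbsent instead of a separate containsKey check makes check-and-claim a single atomic, cluster-wide operation, so two near-simultaneous requests cannot both pass:

import java.util.concurrent.TimeUnit;

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class DuplicateRequestGuard {

    private final IMap<String, Boolean> seenHashes;

    public DuplicateRequestGuard(HazelcastInstance hazelcast) {
        this.seenHashes = hazelcast.getMap("request-hashes");
    }

    // Returns true if this hash was not seen before and has now been claimed
    // by the current caller; false means the request is a duplicate.
    public boolean tryClaim(String requestHash) {
        // the entry expires after the TTL, so the map cannot grow without bound
        return seenHashes.putIfAbsent(requestHash, Boolean.TRUE, 10, TimeUnit.MINUTES) == null;
    }
}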
Yes, you can use a Hazelcast distributed map to detect duplicate requests to a REST service: whenever a put operation happens on the map, the data becomes available to all the other clustered instances.
From what I've read and seen in tests, it doesn't actually replicate. It uses a data grid to distribute the primary data evenly across all the nodes, rather than having each node keep a full copy of everything and replicate to sync the data. The great thing about this is that there is no data lag of the kind inherent to any replication strategy.
There is a backup copy of each node's data stored on another node, and that obviously depends on replication, but the backup copy is only used when a node crashes.
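For reference, the number of backup copies mentioned above is configurable per map; a small sketch with illustrative values:

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;

public class BackupCountExample {

    public static Config configure() {
        // each entry has one primary copy on its owning partition and, with
        // backup-count 1, one synchronous backup on another member
        MapConfig mapConfig = new MapConfig("request-hashes");
        mapConfig.setBackupCount(1);       // synchronous backups
        mapConfig.setAsyncBackupCount(0);  // no additional asynchronous backups

        Config config = new Config();
        config.addMapConfig(mapConfig);
        return config;
    }
}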
See the code below, which creates two clustered Hazelcast instances and gets the distributed map. One Hazelcast instance puts data into the distributed IMap and the other instance reads it back.
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class TestHazelcastDataReplication {

    // Create 1st instance
    public static final HazelcastInstance instanceOne = Hazelcast
            .newHazelcastInstance(new Config("distributedFirstInstance"));

    // Create 2nd instance
    public static final HazelcastInstance instanceTwo = Hazelcast
            .newHazelcastInstance(new Config("distributedSecondInstance"));

    // Insert into distributedMap using instance one
    static IMap<Long, Long> distributedInsertMap = instanceOne.getMap("distributedMap");

    // Read from distributedMap using instance two
    static IMap<Long, Long> distributedGetMap = instanceTwo.getMap("distributedMap");

    public static void main(String[] args) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                for (long i = 0; i < 100000; i++) {
                    // Inserting data into distributedMap using the 1st instance
                    distributedInsertMap.put(i, System.currentTimeMillis());
                    // Reading data from distributedMap using the 2nd instance
                    System.out.println(i + " : " + distributedGetMap.get(i));
                }
            }
        }).start();
    }
}

Service Fabric - (reaching MaxReplicationMessageSize) Huge amount of data in a reliable dictionary

EDIT - question summary:
I want to expose an endpoint that can return portions of XML data selected by some query parameters.
I have a stateful service that keeps the XML data, converted to DTOs, in a reliable dictionary.
I use a single, named partition (I just can't tell which partition holds the data from the query parameters passed, so I can't implement a smarter partitioning strategy).
I am using service remoting for communication between the stateless Web API service and the stateful one.
The XML data may reach 500 MB.
Everything is OK while the XML is only around 50 MB.
When the data gets larger, Service Fabric starts complaining about MaxReplicationMessageSize.
And the summary of my questions below: how can one store a large amount of data in a reliable dictionary?
TL;DR:
Apparently, I am missing something...
I want to parse huge XMLs and load them into a reliable dictionary for later queries over them.
I am using a single, named partition.
I have an XMLData stateful service that loads these XMLs into a reliable dictionary in its RunAsync method via this piece of code:
var myDictionary = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, List<HospitalData>>>("DATA");

using (var tx = this.StateManager.CreateTransaction())
{
    var result = await myDictionary.TryGetValueAsync(tx, "data");
    ServiceEventSource.Current.ServiceMessage(this, "data status: {0}",
        result.HasValue ? "loaded" : "not loaded yet, starts loading");

    if (!result.HasValue)
    {
        Stopwatch timer = new Stopwatch();
        timer.Start();

        var converter = new DataConverter(XmlFolder);
        List<HospitalData> data = converter.LoadData();
        await myDictionary.AddOrUpdateAsync(tx, "data", data, (key, value) => data);

        timer.Stop();
        ServiceEventSource.Current.ServiceMessage(this,
            string.Format("Loading of data finished in {0} ms",
                timer.ElapsedMilliseconds));
    }

    await tx.CommitAsync();
}
I have a stateless Web API service that communicates with the above stateful one via service remoting and queries the dictionary with this code:
ServiceUriBuilder builder = new ServiceUriBuilder(DataServiceName);

IDataService DataServiceClient = ServiceProxy.Create<IDataService>(builder.ToUri(),
    new Microsoft.ServiceFabric.Services.Client.ServicePartitionKey("My.single.named.partition"));

try
{
    var data = await DataServiceClient.QueryData(SomeQuery);
    return Ok(data);
}
catch (Exception ex)
{
    ServiceEventSource.Current.Message("Web Service: Exception: {0}", ex);
    throw;
}
It works really well when the XMLs do not exceed 50 MB.
After that I get errors like:
System.Fabric.FabricReplicationOperationTooLargeException: The replication operation is larger than the configured limit - MaxReplicationMessageSize ---> System.Runtime.InteropServices.COMException
Questions:
I am almost certain that this is about the partitioning strategy and that I need to use more partitions. But how do I reference a particular partition from within the RunAsync method of the stateful service? (The stateful service is invoked via RPC from the Web API, where I explicitly point to a partition, so there I could easily choose among partitions if I used the ranged partitioning strategy - but how do I do that during the initial loading of data in the RunAsync method?)
Are these thoughts of mine correct: the code in a stateful service operates on a single partition, so loading a huge amount of data and partitioning it should happen outside the stateful service (for example in an Actor); then, after determining the partition key, I just invoke the stateful service via RPC, pointing it at that particular partition?
Is it actually a partitioning problem at all, and what (where, who) defines the size of a replication message? I.e. does the partitioning strategy influence the replication message sizes?
Would extracting the loading logic into a stateful Actor help in any way?
For any help on this - thanks a lot!
The issue is that you're trying to add a large amount of data into a single dictionary record. When Service Fabric tries to replicate that data to other replicas of the service, it encounters a quota of the replicator, MaxReplicationMessageSize, which indeed defaults to 50MB (documented here).
You can increase the quota by specifying a ReliableStateManagerConfiguration:
internal sealed class Stateful1 : StatefulService
{
    public Stateful1(StatefulServiceContext context)
        : base(context, new ReliableStateManager(context,
            new ReliableStateManagerConfiguration(new ReliableStateManagerReplicatorSettings
            {
                MaxReplicationMessageSize = 1024 * 1024 * 200
            }))) { }
}
But I strongly suggest you change the way you store your data. The current method won't scale very well and isn't the way Reliable Collections were meant to be used.
Instead, you should store each HospitalData in a separate dictionary item. Then you can query the items in the dictionary (see this answer for details on how to use LINQ). You will not need to change the above quota.
PS - You don't necessarily have to use partitioning for 500MB of data. But regarding your question - you could use partitions even if you can't derive the key from the query, simply by querying all partitions and then combining the data.

Dealing with deadlocks in long-running Hibernate transactions

I have a Hibernate application that may produce concurrent inserts and updates (via Session.saveOrUpdate) to records with the same primary key, which is assigned. These transactions are somewhat long-running, perhaps 15 seconds on average (since data is collected from remote sources and persisted as it comes in). My DB isolation level is set to Read Committed, and I'm using MySQL and InnoDB.
The problem is that this scenario creates excessive lock waits that time out, either as a result of deadlocks or of the long transactions. This leads me to a few questions:
Does the database engine only release its locks when the transaction is committed?
If this is the case, should I seek to shorten my transactions?
If so, would it be good practice to use separate read and write transactions, where the write transaction is kept short and only takes place after all of my data is gathered (the bulk of my transaction length involves collecting remote data)?
Edit:
Here's a simple test that approximates what I believe is happening. Since I'm dealing with long-running transactions, the commit takes place long after the first flush, so just to illustrate my situation I left the commit out of the test:
@Entity
static class Person {

    @Id
    Long id = Long.valueOf(1);

    @Version
    private int version;
}

@Test
public void updateTest() {
    for (int i = 0; i < 5; i++) {
        new Thread() {
            public void run() {
                Session s = sf.openSession();
                Transaction t = s.beginTransaction();
                Person p = new Person();
                s.saveOrUpdate(p);
                s.flush(); // Waits...
            }
        }.run();
    }
}
And the queries this expectedly produces, waiting on the second insert:
select id, version from person where id=?
insert into person (version, id) values (?, ?)
select id, version from person where id=?
insert into person (version, id) values (?, ?)
That's correct: the database releases its locks only when the transaction is committed. Since you're using Hibernate, you can use optimistic locking, which does not lock the database for long periods of time. Essentially, Hibernate does what you suggest, separating the reading and writing portions into separate transactions. On write, it checks that the data in memory has not been changed concurrently in the database.
Hibernate Reference - Optimistic Transactions
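One way to keep the write side short (a sketch, not tied to your exact setup): gather all the remote data first, then wrap only the single saveOrUpdate in a transaction; the @Version field on Person lets Hibernate detect concurrent modifications at that point.

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class ShortWriteTransaction {

    private final SessionFactory sessionFactory;

    public ShortWriteTransaction(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void save(Person person) {
        // all slow remote collection happens before this method is called;
        // the transaction only spans the actual write, so row locks are held
        // for milliseconds instead of ~15 seconds
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.saveOrUpdate(person); // the @Version field catches concurrent edits
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}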
Optimistic locking:
Base assumption: update conflicts occur seldom.
Mechanic:
1. Read the dataset, including its version field.
2. Change the dataset.
3. Update the dataset:
3.1. Re-read the dataset by key and current version field value.
3.2. If you get it, nobody has changed the record: apply the next version field value and update the record (see the JDBC sketch after this list).
3.3. If you do not get it, the record has been changed; return an appropriate message to the caller and you are done.
Inserts are not affected: you either have a separate primary key anyway, or you accept multiple records with identical values.
Therefore the example given above is not a case for optimistic locking.
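For completeness, the version check in step 3 boils down to a conditional UPDATE, which is essentially what Hibernate issues for a versioned entity; a hand-rolled JDBC sketch of the same mechanic (the name column is made up for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ManualOptimisticUpdate {

    // Applies the "update only if the version still matches" mechanic described
    // above. Returns false when another writer got there first.
    public boolean updateName(Connection connection, long id, int expectedVersion, String newName)
            throws SQLException {
        String sql = "UPDATE person SET name = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, newName);
            ps.setLong(2, id);
            ps.setInt(3, expectedVersion);
            // 0 rows updated means the version moved on: report a conflict
            return ps.executeUpdate() == 1;
        }
    }
}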
