I'm in the process of evaluating GridGain and have read and re-read all the documentation I could find. While much of it is very thorough, you can tell that it's mostly written by the developers. It would be great if there were a reference book written from an outsider's perspective.
Anyway, I have five basic questions I'm hoping someone from GridGain can answer and clarify for me.
It's my understanding that GridCacheQueue (and the other Distributed Data Structures) are built on top of the GridCache implementation. Does that mean that each element of the GridCacheQueue is really just a GridCacheElement of the GridCache map, or is each GridCacheQueue a GridCacheElement, or do I have this totally wrong?
If I set a default TTL on the GridCache, will the elements of a GridCacheQueue expire in the TTL time, or does it only apply to GridCacheElements (which might be answered in #1 above)?
Is there a way to make a GridCacheQueue expire after some period of time without having to remove it manually?
If a cache is set up to be backed up onto other nodes and the cache is using off-heap memory and/or swap storage, is the off-heap memory and/or swap storage also replicated onto the backup nodes?
Is it possible to create a new cache dynamically, or can it only be created via configuration when the node is created?
Thanks for any insightful information!
-Colin
After experimenting with a GridCache and a GridCacheQueue, here's what I've learned about my 5 questions:
I don't know how the GridCacheQueue or its elements are attached to a GridCache, but I know that the elements of a GridCacheQueue DO NOT show up as GridCacheElements of the GridCache.
If you set a TTL on a GridCache and add a GridCacheQueue to it, once the elements of the GridCache begin expiring, the GridCacheQueue becomes unusable and will cause a GridRuntimeException to be thrown.
Yes, see #2 above. However, there doesn't seem to be a safe way to test if the queue is still in existence once the elements of the GridCache start to expire.
Still have no information about this yet. Would REALLY like some feedback on that.
That was a question I never should have asked. A GridCache can be created entirely in code and configured.
Let me first of all say that GridGain supports several queue configuration parameters:
Colocated vs. non-colocated. In colocated mode you can have many queues. Each queue will be assigned to some grid node and all the data in that queue will be cached on that grid node. This way, if you have many queues, each queue may be cached on a different node, but the queues themselves should be evenly distributed across all nodes. Non-colocated mode, on the other hand, is meant for larger queues, where the data for a single queue is partitioned across multiple nodes.
Capacity - this parameter defines the maximum queue capacity. When a queue reaches this capacity, it will automatically start evicting its oldest elements. (A sketch of creating a queue with both options is below.)
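To make that concrete, here is a rough sketch of creating a colocated, capacity-bounded queue. The method names follow the GridGain 6.x data structures API as I remember it (GridCacheDataStructures.queue(name, capacity, collocated, create)), so treat them as illustrative and double-check the Javadoc for your version; 'grid' is your started Grid instance and exception handling is omitted.

// Illustrative only - verify method names against your GridGain version.
GridCache<Object, Object> cache = grid.cache("myCache");
// Create (or get) a colocated queue capped at 1000 elements; once full,
// the oldest elements are evicted automatically.
GridCacheQueue<String> queue = cache.dataStructures()
    .queue("myQueue", 1000 /* capacity */, true /* colocated */, true /* create if missing */);
queue.put("first-message");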
Now, let me try to tackle some of these questions.
I believe each element of a GridCacheQueue is a separate element in the cache, but the implementation marks them as internal entries. That is why you don't see these elements when iterating over the cache.
TTL should not be used with elements in the queue (GridGain will be adding this feature soon). For now, you should limit the maximum size of the queue by specifying queue 'capacity' at creation time.
I don't believe so, but I think this feature is being added. For now, you can try using org.gridgain.grid.schedule.GridScheduler to schedule a job that will delete a queue later.
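As a rough illustration, something like the following could remove the queue after a fixed delay. The scheduleLocal() "{delay, repeatCount}" cron-prefix syntax and removeQueue() are taken from my reading of the GridGain docs, so treat the whole snippet as an assumption and verify it against your version.

// Sketch only: "{3600, 1}" is meant to mean "run once, roughly an hour from now".
grid.scheduler().scheduleLocal(new Runnable() {
    @Override public void run() {
        try {
            cache.dataStructures().removeQueue("myQueue"); // drop the queue when it is no longer needed
        }
        catch (GridException e) {
            // log and retry as appropriate
        }
    }
}, "{3600, 1} * * * * *");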
The answer is YES. Data in both off-heap and swap spaces is backed up and replicated the same way as the main on-heap cache data.
A cache should be created in configuration, either from code or XML. However, GridGain has a cool notion of GridCacheProjection, which allows you to create various sub-caches (cache views) on the same cache. For example, if you store Person and Organization classes in the same cache, then you can use a cache projection for type Person when working with the Person class, and a cache projection for type Organization when working with the Organization class.
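For example, a sketch of typed projections over a single cache (assuming the GridGain 6.x projection API, with Person and Organization being your own classes; exception handling omitted):

GridCacheProjection<String, Person> people = cache.projection(String.class, Person.class);
GridCacheProjection<String, Organization> orgs = cache.projection(String.class, Organization.class);
people.putx("p1", new Person("Colin"));        // only Person entries are visible through 'people'
orgs.putx("o1", new Organization("GridGain")); // only Organization entries through 'orgs'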
We are using the EventProcessorClient class to read messages from all partitions of an Event Hub.
To enhance the checkpoint writing, we hold a counter and perform the checkpointing after processing multiple messages in ProcessEventAsync instead of after each one.
When the service restarts or is redeployed, we noticed that we re-read previously processed messages because the checkpoint was not updated right before stopping.
This blog mentions the pattern of frequent checkpointing and suggests performing one last checkpoint inside the OnPartitionProcessingStopped when implementing EventProcessor<TPartition>.
However, when using EventProcessorClient, the implementation gets wrapped and only the PartitionClosingAsync event gets called, which does not have the required details to update the checkpoint.
Is there a way to update the checkpoint to the latest one when shutting down while using EventProcessorClient?
There are a couple of different ways to do this, with varying degrees of complexity. However, in most scenarios, it's unlikely that checkpointing on shutdown is going to have the effect that you would like it to. For the interested, context is way down below in "Load balancing details".
How-to: A simple approach
The most straightforward approach is to hold on to the ProcessEventArgs for the last event processed on each partition in a class-level member, then use it to emit one final checkpoint when the processor stops.
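The question is about the .NET EventProcessorClient, but the shape of the idea is the same in any language. Here is a rough sketch using the Java Event Hubs SDK's EventContext.updateCheckpoint(); the wiring around it is my own illustration, not the .NET API.

// Remember the last EventContext seen per partition, checkpoint them all on shutdown.
private final Map<String, EventContext> lastEventPerPartition = new ConcurrentHashMap<>();

void onEvent(EventContext context) {
    // ... process the event, checkpointing every N events as usual ...
    lastEventPerPartition.put(context.getPartitionContext().getPartitionId(), context);
}

void shutdown(EventProcessorClient processor) {
    // One final checkpoint per partition this node still believes it owns.
    lastEventPerPartition.values().forEach(EventContext::updateCheckpoint);
    processor.stop();
}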
With this, you're still very likely to run into the dueling owners scenario described in the details.
How-to: More complex, less memory, same challenges
If you extend the EventProcessorClient, you could hold onto just the last processed offset for each partition and use it to call the protected UpdateCheckpointAsync method on the processor. This saves you some space, as you're holding onto just a long rather than the state associated with the event args.
With this, you're still very likely to run into the dueling owners scenario described in the details.
How-to: A lot more complexity, more efficient, and without overlaps
For this one to make sense, you'd have to have a stable number of processors - no dynamic scaling.
Checkpointing takes the same approach as above: you'll extend the processor client and use the stored offset to call UpdateCheckpointAsync.
To avoid ownership changes and rewinds, you'll assign the processor a set of static partitions and bypass load balancing. The approach is described in the Static partition assignment sample, which can be applied when extending the EventProcessorClient.
With this, you do not have the potential for overlapping owners. Since the partitions are statically assigned, you know this node will be the only one to process them.
The trade-off here is that you lose load balancing and dynamic assignment. If the node crashes or loses network connectivity, nobody will be processing the set of partitions that it owns. This generally works well in host environments where there's an orchestrator monitoring nodes and ensuring they're healthy.
Load balancing details
When partitions change owners, it is not an ordered hand-off; processors do not coordinate to ensure that the old owner has stopped before the new owner begins reading. As a result, there is often a period of overlap where the old owner has a set of events in memory and won't be aware that a new owner has taken over until it tries to read the next batch.
During this time, both the old and new owners are processing events from the same partition. Both may be emitting checkpoints. If the old owner emits a checkpoint when the processing stops for that partition, there's a good chance it will rewind the checkpoint position to an earlier point than the current owner has written. That causes a bigger rewind if ownership changes again before the new owner emits a checkpoint.
We generally recommend that you expect to rewind by one checkpoint any time you're scaling the number of processors or deploying/rebooting nodes. This is something to take into account when devising your checkpoint strategy.
I am using a client-server topology for the Hazelcast cache. I have multiple maps which I load eagerly using MapLoaders. When there is a cache miss, the MapLoader's load(key) method is called. MapLoader.load(key) seems to be executed by the partition thread, which means that all other operations on the partition are blocked until loading is done. A very common use case for the MapLoader is to load data from a DB, which can take some time. So what is the best approach to take so that other operations on the partition are not blocked while the load is taking place? Is there any other way to load missing data at runtime? (Hazelcast version: 4.0.3)
There's a good answer to this question that gives a few options.
MapLoader.load(key) only loads a single entry, but if the remote source is really slow or there are lots of cache misses it's going to mount up.
Another alternative to #mike-yawn's answer would be to have a Runnable that fetches needed items from the database and writes them directly into the map (a sketch follows below). You can still have the MapLoader.load(key) as well, but the chances of a cache miss are reduced if your fetcher code is good at predicting which entries will be needed.
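A minimal sketch of that pre-fetcher, assuming a Hazelcast client and a hypothetical itemDao/Item type that can predict and load the entries likely to be needed soon:

HazelcastInstance client = HazelcastClient.newHazelcastClient();
IMap<Long, Item> items = client.getMap("items");
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

scheduler.scheduleAtFixedRate(() -> {
    // One DB round trip for everything we expect to be asked for soon (hypothetical DAO call).
    Map<Long, Item> predicted = itemDao.loadLikelyNeeded();
    // Pre-populating the map means fewer callers ever hit the slow MapLoader.load(key).
    items.putAll(predicted);
}, 0, 30, TimeUnit.SECONDS);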
If you don't cache 100% of the records, then a cache miss is inevitable. If loading is prohibitively slow, you could always return a value that carries some sort of flag marking it as a placeholder and launch a thread to do the actual load. Your code then has to deal with that placeholder and try again later, keeping in mind that when it does, the eventual result of the database query could be that no record was found.
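Here is a sketch of that placeholder idea. Item, Item.placeholder(), isPlaceholder() and ItemDao are hypothetical names; MapLoader and MapLoaderLifecycleSupport are Hazelcast's interfaces.

// Return a placeholder immediately; load the real value off the partition thread.
public class SlowMapLoader implements MapLoader<Long, Item>, MapLoaderLifecycleSupport {
    private final ItemDao itemDao = new ItemDao(); // hypothetical DAO
    private HazelcastInstance hz;
    private String mapName;
    private ExecutorService pool;

    @Override
    public void init(HazelcastInstance hz, Properties props, String mapName) {
        this.hz = hz;
        this.mapName = mapName;
        this.pool = Executors.newFixedThreadPool(4);
    }

    @Override
    public Item load(Long key) {
        pool.submit(() -> {
            Item real = itemDao.findById(key);      // the slow DB call, off the partition thread
            IMap<Long, Item> map = hz.getMap(mapName);
            if (real != null) {
                map.set(key, real);                  // replace the placeholder
            } else {
                map.delete(key);                     // no record found: drop the placeholder
            }
        });
        return Item.placeholder();                   // callers must check isPlaceholder() and retry
    }

    @Override public Map<Long, Item> loadAll(Collection<Long> keys) { return Collections.emptyMap(); }
    @Override public Iterable<Long> loadAllKeys() { return null; }
    @Override public void destroy() { pool.shutdown(); }
}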
I'm using embedded Hazelcast 4.0.1 in a Spring Boot project to manage the project's caches. I set up a Near Cache, and also the split-brain protection feature, which was called Quorum before 4.0.
However, I found a problem. For example, I have a cacheable operation on a service:
@Cacheable(value = "CacheSpaceName", key = "#id")
public String findById(String id) {
...
}
If the correct data has already been cached in the Near Cache, then even when split-brain protection is in effect, the service will still return that cached result instead of being rejected by split-brain protection.
How can I make Near Cache also be controlled by Split Brain Protection? I hope that when split brain occurs, small clusters cannot operate normally, and only large clusters can operate normally.
The following is the near cache configuration and split-brain protection configuration code in the project:
final NearCacheConfig nearCacheConfig = new NearCacheConfig()
        .setInMemoryFormat(InMemoryFormat.OBJECT)
        .setCacheLocalEntries(true)
        .setMaxIdleSeconds(xxx);
MapConfig allMapConfig = new MapConfig().setName("*").setNearCacheConfig(nearCacheConfig)
        .setBackupCount(0).setMaxIdleSeconds(xxx).setInMemoryFormat(InMemoryFormat.OBJECT)
        .setMergePolicyConfig(xxx);
final SplitBrainProtectionConfig splitBrainProtectionConfig = new SplitBrainProtectionConfig("name", true, 2);
splitBrainProtectionConfig.setProtectOn(SplitBrainProtectionOn.READ_WRITE);
allMapConfig.setSplitBrainProtectionName("name");
config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
config.addMapConfig(allMapConfig);
Near Cache is not covered by split-brain protection, as Near Cache is application-side caching and Hazelcast's built-in split-brain protection is meant to protect cluster members (servers).
Additionally, to your point:
I hope that when split brain occurs, small clusters cannot operate
normally, and only large clusters can operate normally
In split brain clusters, both sides regardless of their sizes, work fine. Cluster size becomes relevant when network partitioning is resolved and the two sides are ready to merge. Hazelcast deploys a background task that periodically searches for split clusters. When a split is detected, the side that will initiate the merge process is decided. This decision is based on the cluster size; the smaller cluster, by member count, merges into the bigger one. If they have an equal number of members, then a hashing algorithm determines the merging cluster. When deciding the merging side, both sides ensure that there’s no intersection in their member lists.
The problem I raised, that Near Cache is not controlled by split-brain protection, matters to me because I want the small cluster to be prohibited from providing any service when a split brain occurs, while the large cluster keeps operating normally. The reason for this requirement is that some of our business logic relies on Hazelcast's cache synchronization. We need the cache to be updated on demand at certain times to avoid using outdated data. If there is a split brain, the cache update cannot be performed across the complete cluster, so if the small cluster keeps serving requests at that point, it is likely to serve wrong results.
There is also a similar "minimum number of members" setting in Hazelcast's split-brain protection: when Hazelcast detects that the current number of cluster members is below this value, certain cache operations are prohibited. But because it only restricts cache operations, and I found that the Near Cache is not controlled by it, I had my question. I later realized that Hazelcast's split-brain protection may not meet my needs at all.
I have since found another way to achieve what I need: a filter that verifies the minimum number of members, so Hazelcast's split-brain protection is no longer needed for this purpose (it is still used for split-brain recovery, so the merge policy is still configured normally). A sketch of such a member-count check is below.
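For what it's worth, the member-count check can be as simple as a Spring servlet filter like this sketch; the class name, threshold, and error handling are illustrative assumptions, not the poster's actual code.

// Reject requests when the local member sees fewer than MIN_MEMBERS cluster members.
@Component
public class ClusterSizeFilter extends OncePerRequestFilter {
    private static final int MIN_MEMBERS = 2; // assumed threshold
    private final HazelcastInstance hazelcast;

    public ClusterSizeFilter(HazelcastInstance hazelcast) {
        this.hazelcast = hazelcast;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        if (hazelcast.getCluster().getMembers().size() < MIN_MEMBERS) {
            // This node is probably on the small side of a split brain: refuse to serve.
            response.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE, "cluster too small");
            return;
        }
        chain.doFilter(request, response);
    }
}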
Using Cassandra as Queue:
Is it really that bad?
Setup: 5-node cluster, all operations execute at QUORUM
Using DateTieredCompactionStrategy should significantly reduce the cost of tombstones and allow entire SSTables to be dropped at once.
We add all messages to the queue with the same TTL
We partition messages based on time (say 1 minute intervals), and keep track of the read-position.
Messages that are consumed will be explicitly deleted (only one thread extracts messages).
Some messages may be explicitly deleted prior to being read, i.e. we may have tombstones after the read position (the TTL initially used is an upper limit). gc_grace would probably be set to 0, as quorum reads will do a blocking read-repair (we can leave repair turned off, since messages reside in only one cluster/DC and all operations run at quorum).
Messages can be added/deleted only, no updates allowed.
In our use case, if a tombstone does not replicate it's not a big deal; it's OK for us to see the same message multiple times occasionally. (Also, we would likely not run repair on a regular basis, as all operations execute at quorum.) A sketch of this setup follows below.
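To make the design concrete, here is a rough sketch of the schema and the consume/delete path using the DataStax Java driver (3.x). The table name, column names, bucket scheme, and TTL value are illustrative assumptions, not the poster's actual model.

// Time-partitioned queue table with DTCS, gc_grace 0, and a TTL upper bound.
session.execute(
    "CREATE TABLE IF NOT EXISTS queue.messages (" +
    "  bucket timestamp," +                 // 1-minute time partition
    "  id timeuuid," +
    "  payload blob," +
    "  PRIMARY KEY (bucket, id)) " +
    "WITH compaction = {'class': 'DateTieredCompactionStrategy'} " +
    "AND gc_grace_seconds = 0 " +
    "AND default_time_to_live = 3600");     // assumed upper limit on message lifetime

// Single consumer thread: read the current bucket at QUORUM, then delete what was consumed.
Statement read = new SimpleStatement(
    "SELECT id, payload FROM queue.messages WHERE bucket = ?", currentBucket)
    .setConsistencyLevel(ConsistencyLevel.QUORUM);
for (Row row : session.execute(read)) {
    process(row.getBytes("payload"));        // process(...) is your consumer logic
    session.execute(new SimpleStatement(
        "DELETE FROM queue.messages WHERE bucket = ? AND id = ?", currentBucket, row.getUUID("id"))
        .setConsistencyLevel(ConsistencyLevel.QUORUM));
}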
Thoughts?
Generally, it is an anti-pattern; this article discusses the tombstone impact in detail: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
My opinion is: try to avoid it if possible, but if you really understand the performance impact and it is not an issue in your architecture, of course you could do it.
Another reason to avoid it if possible: the Cassandra data model is not designed for queues, so it will always look ugly, UGLY!
I strongly suggest considering Redis or RabbitMQ before making your final decision.
I have a Hazelcast (version 3.1.6) map with a map store configured this way:
<map-store enabled="true">
<class-name>com.mypackage.MapStore</class-name>
<write-delay-seconds>60</write-delay-seconds>
</map-store>
So I expected that an entry would be stored roughly once per 60 seconds (if it is being updated). But instead, if I update a map entry, say, 10 times in quick succession, MapStore.store() gets called ~10 times too (though only after 60 seconds). The strangest thing is that it sometimes gets called fewer than 10 times (but never just once, as I want it to). Is there any way to change this behavior? I have a very write-intensive storage, so these excessive store() calls create heavy load on it.
Currently that isn't possible. Some customers need to have each update written to storage for auditing purposes. But I think it certainly is an interesting feature to have as a sort of 'write cache' to prevent hitting storage too often.
Can you create a ticket for it? I'll add the appropriate tags.
https://github.com/hazelcast/hazelcast/issues
Once the 3.2 release is done, we are going to focus a lot less on features and shift our attention to scalability and performance. Your feature falls nicely into those categories.
Could we provide access to the map's dirty entries as part of the solution? That way there would be more freedom to write any kind of writer for the map.