Configuring Distributed Objects Dynamically - Hazelcast

I'm currently evaluating Hazelcast for our software, and I'd be glad if you could help me clarify the following.
I have one specific requirement: I want to be able to configure distributed objects (say maps, queues, etc.) dynamically. That is, I can't have all the configuration data at hand when I start the cluster. I want to be able to initialise (and dispose) services on-demand, and their configuration possibly to change in-between.
The version I'm evaluating is 3.6.2.
The documentation I have available (the Reference Manual, the Deployment Guide, and the "Mastering Hazelcast" e-book) is very skimpy on details w.r.t. this subject, and even partially self-contradictory.
So, to clarify an intended usage: I want to start the cluster; then, at some point, create, say, a distributed map structure, use it across the nodes; then dispose it and use a map with a different configuration (say, number of backups, eviction policy) for the same purposes.
The documentation mentions, and this is to be expected, that bad things will happen if nodes have different configurations for the same distributed object. That makes perfect sense and is fine; I can ensure that the configs will be consistent.
Looking at the code, what I intend seems possible: when a distributed object is requested and doesn't already have a proxy, the HazelcastInstance consults its Config to create one and stores it in its local list of proxies. When that object is destroyed, its proxy is removed from the list, so on the next invocation it is rebuilt from the Config. Furthermore, that Config is writable, so if it has been changed in between, the changes should be picked up.
So this would seem like it should work, but given how silent the documentation is on the matter, I'd like some confirmation.
Is there any reason why the above shouldn't work?
If it should work, is there any reason not to do the above? For instance, are there plans to change the code in future releases in a way that would prevent this from working?
If so, is there any alternative?

Changing the configuration of an already created distributed object on the fly is not possible with the current version, though there is a plan to add this feature in a future release. Once created, map configs stay at the node level, not at the cluster level.
As long as you create the distributed map fresh from the config, use it, and then destroy it, your approach should work without any issues.
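To illustrate, here is a minimal sketch of that create/use/destroy cycle against the 3.6 Java API (the map name and config values are illustrative, and the same config change must be applied on every node before the map is recreated):

    import com.hazelcast.config.Config;
    import com.hazelcast.config.MapConfig;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    public class DynamicMapDemo {
        public static void main(String[] args) {
            Config config = new Config();
            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

            // First incarnation of the map: 1 backup.
            config.addMapConfig(new MapConfig("orders").setBackupCount(1));
            IMap<String, String> orders = hz.getMap("orders"); // proxy built from current config
            orders.put("k1", "v1");

            // Dispose of the distributed object; its proxy is removed.
            orders.destroy();

            // Change the config in between (consistently on every node)...
            config.getMapConfig("orders").setBackupCount(2);

            // ...and the next getMap() builds a fresh proxy from the updated config.
            IMap<String, String> orders2 = hz.getMap("orders");
            orders2.put("k2", "v2");

            hz.shutdown();
        }
    }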

Related

Recommended ways to deal with database migrations while doing a swap using deployment slots

I am trying to understand the use of deployment slots for hosting my web app using the Azure app service.
I am particularly confused about the ideal way to deal with the database while the swap is performed.
While maintaining two database versions seems like a solution, it adds the complexity of maintaining data across multiple databases to make them consistent.
What are the recommended ways for dealing with database schema and migrations while using blue/green deployments and in particular deployment slots?
Ideally your staging and production slots would share the same database, so it would not be an issue.
But if you have more slots, then you should also use different databases and handle migrations during the release phase.
We've worked through various solutions to this problem for a few years. There's not a toolset that provides a magic bullet for all cases. There are a few solutions:
Smaller databases/trivial changes
If it is possible to execute a migration script on a database that will complete in a second or two, and you can have an easy fallback script, you can execute the script concurrently with the swap. This can also be automated. But it's a higher stress situation and not one I'd recommend. This can even be done with EF Migrations.
Carefully ensure database compatibility between versions
Since we're dealing with a few hundred GB of data that cannot go down, we've just made it a rule that the database has to work with both versions of our application. It's not as awful or impossible as it sounds. For example, net new tables and fields can oftentimes be added before you even perform the swap. We test rollback between versions as part of our QA. If some fields need to be dropped, we wait until after the new version has been deployed and burned in, then run another script to perform the drops after we're sure we won't need rollback. We'll create new stored procedures when one needs to be upgraded so that the new version has its own. Example: sp_foo and sp_foo2.
We've had a lot of success with this strategy.
Slots are a feature specifically for App Services, not for DBs. If you want to use a specific DB with a specific slot, you set up the slot as described here:
https://learn.microsoft.com/en-us/azure/app-service/deploy-staging-slots
When using slots, a swap also swaps the app configuration/settings. In App Settings you can have two DB connection strings, each marked as a slot setting so that it stays with its slot. This is shown in the example here as well: https://learn.microsoft.com/en-us/azure/app-service/deploy-staging-slots#swap-two-slots

In an Event-Driven Microservice, how do I update a private database with older data?

I'm working on a new project, and I am still learning about how to use Microservice/Domain Driven Design.
If the recommended architecture is to have a Database-Per-Service, and use Events to achieve eventual consistency, how does the service's database get initialized with all the data that it needs?
If the events indicating an update to the database occurred before the new service/db was ever designed, do I need to start with a copy of the previous database?
Or should I publish a 'New Service On The Block' event and allow all the other services to vomit everything back to me again? That could be a LOT of chattiness and cause performance issues.
how does the service's database get initialized with all the data that it needs?
It asks for it; which is to say, you design a protocol so that a service that is spinning up can get copies of all of the information it needs. That often includes tracking checkpoints, and queries that let you ask what has happened since a given checkpoint.
Think "pull", rather than "push".
Part of the point of "services" is designing the right data boundaries. Needing to copy a lot of data between services often indicates that the service boundaries should be reconsidered.
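As a sketch of that "pull" shape (the EventFeed contract and checkpoint handling below are hypothetical, not from any particular framework):

    import java.util.List;

    // Hypothetical contract exposed by an upstream service:
    // "give me everything that happened after this checkpoint", in bounded pages.
    interface EventFeed {
        List<Event> fetchSince(long checkpoint, int maxEvents);
    }

    record Event(long sequence, String payload) {}

    class CatchUpConsumer {
        private final EventFeed feed;
        private long checkpoint; // persist this locally so catch-up can resume after a crash

        CatchUpConsumer(EventFeed feed, long startingCheckpoint) {
            this.feed = feed;
            this.checkpoint = startingCheckpoint;
        }

        // Pull pages of history until the upstream has nothing newer.
        void catchUp() {
            List<Event> page;
            while (!(page = feed.fetchSince(checkpoint, 500)).isEmpty()) {
                for (Event e : page) {
                    apply(e);                  // project into this service's own database
                    checkpoint = e.sequence(); // advance (and ideally persist) the checkpoint
                }
            }
        }

        private void apply(Event e) { /* update the local store */ }
    }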
There is a streaming platform, Apache Kafka, that solves something similar.
With Kafka you publish events for other services to consume. What makes Kafka special is that events never get deleted (depending on configuration) and can be consumed again by new services spinning up. This can be used to populate the database initially, by resetting the consumer offset for a topic to 0 and re-reading the history of events.
There is also a feature called GlobalKTable, which is a table view of all events for a particular topic. The GlobalKTable holds the latest value for each key (like a primary key) and can be materialized as a state store (RocksDB under the hood), which makes it queryable. This state store initializes itself whenever the application starts up, so the application does not need a database of its own, because the state store is kept up to date automatically (consistency is still something to keep in mind). Only for more complex queries does the state store need to be accompanied by a database (with Kafka you would try to pre-compute the results of those queries and make them accessible in a dedicated state store of their own).
This would be a complex endeavor, but if it suits your needs it is a fun thing to do!
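For a flavor of the approach, here is a minimal Kafka Streams sketch (the topic, store, and key names are illustrative):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    import java.util.Properties;

    public class CustomerStoreApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "customer-store-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            StreamsBuilder builder = new StreamsBuilder();

            // Materialize the whole "customers" topic as a RocksDB-backed state store;
            // on startup the store repopulates itself from the topic's history.
            GlobalKTable<String, String> customers = builder.globalTable(
                    "customers",
                    Consumed.with(Serdes.String(), Serdes.String()),
                    Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("customers-store"));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();

            // Point lookups go to the local state store instead of a private database
            // (in real code, wait until the streams instance reaches RUNNING state first).
            ReadOnlyKeyValueStore<String, String> store = streams.store(
                    StoreQueryParameters.fromNameAndType(
                            "customers-store", QueryableStoreTypes.keyValueStore()));
            String customer = store.get("customer-42");
        }
    }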

What is the best way to resolve CouchDB document conflicts across 2 DB instances?

I have one application running on NodeJS and I am trying to make it a distributed app. All write requests go to the Node application, which writes to CouchDB A and, on success, writes to CouchDB B. We read data through an ELB (which reads from the two DBs). It's working fine.
But I recently ran into a problem: CouchDB B went down, and after it came back up there was a document _rev mismatch between the two instances.
What would be the best approach to resolve the above scenario without any down time?
If your CouchDB A & CouchDB B are in the same data centre, then @Flimzy's suggestion of using CouchDB 2.0 in a clustered deployment is a good one. You can have n CouchDB nodes configured in a cluster with a load balancer sitting above the cluster, delivering HTTP(S) traffic to any node that is "up".
If A & B are geographically separated, you can use CouchDB Replication to move data from A-->B and B-->A which would keep both instances perfectly in sync. A & B could each be clusters of 3 or more CouchDB 2.0 nodes, or single instances of CouchDB 1.7.
None of these solutions will "fix" the problem you are seeing when two copies of the database are modified in different ways at the same time. This "conflict" state is CouchDB's way of preventing data loss when two writes clash. Your app can resolve the conflict by picking a winning revision or writing a new one. It's not a fault condition, it's helping your application recover from a data loss during concurrent writes in a distributed system.
You can read more about document conflicts in this blog post series.
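As a concrete sketch of "picking a winning revision" over the plain HTTP API (the database name, document id, and revision ids below are illustrative):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ResolveConflict {
        static final HttpClient HTTP = HttpClient.newHttpClient();
        static final String DB = "http://localhost:5984/mydb";

        public static void main(String[] args) throws Exception {
            // Fetch the doc together with its conflicting revisions; the response
            // carries a "_conflicts" array, e.g.
            //   {"_id":"order-1","_rev":"3-abc",...,"_conflicts":["3-def"]}
            String doc = get(DB + "/order-1?conflicts=true");

            // Resolution = keep the revision you decide wins and delete the losers
            // (the losing rev here would be parsed out of "_conflicts"):
            delete(DB + "/order-1?rev=3-def");
        }

        static String get(String url) throws Exception {
            return HTTP.send(HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString()).body();
        }

        static void delete(String url) throws Exception {
            HTTP.send(HttpRequest.newBuilder(URI.create(url)).DELETE().build(),
                    HttpResponse.BodyHandlers.ofString());
        }
    }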
If both of your 1.6.x nodes are syncing databases using standard replication, turning off one node shouldn't be an issue. When the node comes back up it receives all updates without conflicts, because there was no way to create them while the node was down.
If you experience conflicts during normal operation, unfortunately there is no general way to resolve them automatically. However, in most cases you can find a strategy for marking affected doc subtrees in a way that allows you to determine which version is the most recent (or more important).
To detect docs that have conflicts you may use standard views: a doc passed to a view function has the _conflicts property if conflicting revisions exist. Using an appropriate view you can detect conflicts and merge docs. Regardless of how you detect conflicts, though, you need external code to resolve them.
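The view itself is the standard pattern from the CouchDB documentation; here is a sketch that installs it over HTTP (the database name is illustrative):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class InstallConflictsView {
        public static void main(String[] args) throws Exception {
            // Map function: emit every doc that currently has conflicting revisions.
            String designDoc = "{\"views\":{\"all\":{\"map\":"
                    + "\"function(doc){ if (doc._conflicts) { emit(doc._id, doc._conflicts); } }\"}}}";

            HttpClient http = HttpClient.newHttpClient();
            http.send(HttpRequest.newBuilder(
                            URI.create("http://localhost:5984/mydb/_design/conflicts"))
                            .header("Content-Type", "application/json")
                            .PUT(HttpRequest.BodyPublishers.ofString(designDoc))
                            .build(),
                    HttpResponse.BodyHandlers.ofString());

            // GET /mydb/_design/conflicts/_view/all now lists every conflicted doc id
            // with its array of conflicting revisions, ready for your merge code.
        }
    }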
If your conflicting data is numeric by nature, consider using CRDT structures and standard map/reduce to obtain the final value. If your data is text-like, you may also try CRDTs, but to obtain reasonable performance you would need reducers written in Erlang.
As for 2.x: I do not recommend using 2.x for your case (actually, for any real case except experiments). First, using 2.x will not remove conflicts, so it does not solve your problem. Also, taking into account that 2.x requires a lot of poorly documented manual operations across nodes and is unable to rebalance, you will get more pain than value.
BTW, any cluster solution makes very little sense for two nodes.
As for the above-mentioned CVE-2017-12635 and CouchDB 1.6.x: you can use this patch https://markmail.org/message/kunbxk7ppzoehih6 to cover the vulnerability.

Sharing a Hazelcast Cluster across multiple development teams

I've read in a few places that the MapStore and MapLoader must run within the Hazelcast node. I would like to find out whether there is any way to implement a MapStore/MapLoader separately from the Hazelcast node.
Background:
If I have a Hazelcast cluster for the team, and this cluster is to be used by different sub-teams, each providing different maps as data, and each sub-team should implement the MapStore/MapLoader for the maps they own, how can this be done? (Note that each sub-team has its own SVN repository.)
Thanks in advance~
MapLoader's load() operation is invoked, when a key is missing, only on the node that owns that key, so there is no way to push this processing elsewhere.
However, each map can have a different MapStore/MapLoader implementation, so having a different team provide each is certainly feasible.
Exactly how you achieve this comes down to your build and deploy practices. For example, each team's classes could be in a separate jar file on the classpath. Or, there could be a single jar file constructed containing the classes provided by each team. Many ways exist!
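For example, here is a sketch of a member bootstrap wiring one MapStore per team-owned map (the class and map names are hypothetical; each class just has to be on the member's classpath at startup, e.g. one jar per team):

    import com.hazelcast.config.Config;
    import com.hazelcast.config.MapStoreConfig;
    import com.hazelcast.core.Hazelcast;

    public class ClusterBootstrap {
        public static void main(String[] args) {
            Config config = new Config();

            // Team A's map, persisted by Team A's implementation.
            config.getMapConfig("teamA-map").setMapStoreConfig(
                    new MapStoreConfig().setEnabled(true)
                            .setClassName("com.example.teama.TeamAMapStore"));

            // Team B's map, persisted by Team B's implementation.
            config.getMapConfig("teamB-map").setMapStoreConfig(
                    new MapStoreConfig().setEnabled(true)
                            .setClassName("com.example.teamb.TeamBMapStore"));

            Hazelcast.newHazelcastInstance(config);
        }
    }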

Automatic migration between different Persistent backends

I wonder if there's a tool for automatic migration between different DBMSs using the Persistent package. In theory it should be relatively easy to do, so I thought a tool to do it would already exist.
One (maybe hacky) solution is to create a program that uses mkPersist twice in different modules, with the same definitions but different backend configurations, and then manually perform the copy operation.
There is, however, not a tool currently available to do this as far as I know.
