Is there any guidance on how to configure a grid with nodes that have different roles?
For example I have some nodes that can store data and others that should only have a near cache - effectively clients of the grid.
Is this done by specifying separate config files for each node type or by overriding settings on different nodes in code?
Also generally what are the resolution rules for deploying differing configurations for the same entity (e.g. a particular named cache), or is this the mechanism to solve the above?
Thanks, Joe
You can configure the same cache on different grid nodes with different distribution modes. I think the documentation on Cache Distribution Mode is what you are looking for.
I have an application running on Node.js and I am trying to make it a distributed app. All write requests go to the Node application, which writes to CouchDB A and, on success, writes to CouchDB B. We read data through an ELB (which reads from the two DBs). It's working fine.
But I ran into a problem recently: CouchDB B went down, and after it came back up there is now a document _rev mismatch between the two instances.
What would be the best approach to resolve the above scenario without any downtime?
If your CouchDB A & CouchDB B are in the same data centre, then @Flimzy's suggestion of using CouchDB 2.0 in a clustered deployment is a good one. You can have n CouchDB nodes configured in a cluster with a load balancer sitting above the cluster, delivering HTTP(S) traffic to any node that is "up".
If A & B are geographically separated, you can use CouchDB Replication to move data from A-->B and B-->A which would keep both instances perfectly in sync. A & B could each be clusters of 3 or more CouchDB 2.0 nodes, or single instances of CouchDB 1.7.
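As a rough sketch of the two-way replication described above, the snippet below builds the JSON bodies you could POST to each instance's `_replicate` endpoint (or store as documents in the `_replicator` database). The host names and database name are made up for illustration; only the `source`, `target` and `continuous` fields come from CouchDB's replication API.

```python
import json


def replication_body(source_url, target_url, continuous=True):
    """Build the JSON body for a CouchDB replication request."""
    return {"source": source_url, "target": target_url, "continuous": continuous}


# Hypothetical URLs; substitute your own instances and database name.
DB_A = "http://couch-a.example.com:5984/mydb"
DB_B = "http://couch-b.example.com:5984/mydb"

# One replication per direction, so changes flow A-->B and B-->A.
a_to_b = replication_body(DB_A, DB_B)
b_to_a = replication_body(DB_B, DB_A)

print(json.dumps(a_to_b, indent=2))
print(json.dumps(b_to_a, indent=2))
```

With this in place the Node application would only need to write to one instance and let replication carry the change to the other, rather than writing to A and then B itself.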
None of these solutions will "fix" the problem you are seeing when two copies of the database are modified in different ways at the same time. This "conflict" state is CouchDB's way of preventing data loss when two writes clash. Your app can resolve the conflict by picking a winning revision or writing a new one. It's not a fault condition; it's helping your application avoid losing data when concurrent writes happen in a distributed system.
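A minimal sketch of "picking a winning revision", assuming the application stamps every write with its own `updated_at` field (that field is hypothetical, CouchDB does not add it). The function takes the conflicting leaf revisions of one document and returns the winner plus a body suitable for `_bulk_docs` that deletes the losing revisions.

```python
def resolve_conflict(revisions):
    """Pick the revision with the newest `updated_at` and tombstone the rest.

    `revisions` is a list of full document bodies for one `_id`, one per
    conflicting leaf revision. `updated_at` is an application-maintained
    field (an assumption of this sketch), compared as an ISO 8601 string.
    """
    winner = max(revisions, key=lambda doc: doc["updated_at"])
    losers = [doc for doc in revisions if doc["_rev"] != winner["_rev"]]
    # Deleting each losing leaf leaves the winner as the only live revision.
    deletions = [
        {"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True} for doc in losers
    ]
    return winner, {"docs": deletions}


revs = [
    {"_id": "order-1", "_rev": "2-aaa", "status": "paid",
     "updated_at": "2018-01-02T10:00:00Z"},
    {"_id": "order-1", "_rev": "2-bbb", "status": "new",
     "updated_at": "2018-01-02T09:00:00Z"},
]
winner, bulk_payload = resolve_conflict(revs)
print(winner["_rev"], bulk_payload)
```

"Newest timestamp wins" is only one possible policy; merging fields from both revisions into a new revision is equally valid and depends on your data.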
You can read more about document conflicts in this blog post series.
If both of your 1.6.x nodes are syncing buckets using standard replication, turning off one node shouldn't be an issue. When the node comes back up it receives all updates without conflicts, because there was no way to create any while the node was down.
If you experience conflicts during normal operation, unfortunately there is no general way to resolve them automatically. However, in most cases you can find a strategy for marking the affected doc subtrees so that you can determine which version is the most recent (or the most important).
To detect docs that have conflicts you can use standard views: a doc received by a view function has the _conflicts property if conflicting revisions exist. Using an appropriate view you can detect conflicts and merge docs. In any case, regardless of how you detect conflicts, you need external code to resolve them.
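For illustration, here is one way such a view could be defined. The map function has to be JavaScript, so the Python below only assembles the design document as JSON; the design document name `conflicts` and the view name `all` are arbitrary choices for this sketch.

```python
import json

# The view's map function is JavaScript, stored as a string in the design doc.
# It emits one row per document that currently has conflicting revisions.
CONFLICTS_MAP = """
function (doc) {
  if (doc._conflicts) {
    emit(doc._id, doc._conflicts);
  }
}
"""


def conflicts_design_doc():
    """Return a design document exposing a view of conflicted docs."""
    return {
        "_id": "_design/conflicts",
        "views": {"all": {"map": CONFLICTS_MAP.strip()}},
    }


design = conflicts_design_doc()
print(json.dumps(design, indent=2))
```

After saving this document in the database, querying `_design/conflicts/_view/all` should return the ids of conflicted docs together with their losing revision ids, which the external resolver can then process.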
If your conflicting data is numeric by nature, consider using CRDT structures and standard map/reduce to obtain the final value. If your data is text-like you may also try to use CRDTs, but to get reasonable performance you need reducers written in Erlang.
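To make the CRDT idea concrete, here is a small sketch of a grow-only counter (G-counter) in plain Python. It is not a CouchDB reduce function, just an illustration of the merge rule: each node increments only its own slot, two states merge by taking the per-node maximum, and the final value is the sum of all slots. The node names are made up.

```python
def merge(a, b):
    """Merge two G-counter states by taking the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


def value(state):
    """The counter's value is the sum of every node's own count."""
    return sum(state.values())


# The same counter as seen in two conflicting revisions of a doc.
on_a = {"node-a": 3, "node-b": 1}
on_b = {"node-a": 2, "node-b": 4}

merged = merge(on_a, on_b)
print(merged, value(merged))
```

Because the merge is commutative and idempotent, it does not matter in which order (or how many times) the conflicting revisions are combined, which is what makes this kind of data safe to resolve automatically.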
As for 2.x: I do not recommend using 2.x for your case (actually, for any real case except experiments). First, using 2.x will not remove conflicts, so it does not solve your problem. Also, taking into account that 2.x requires a lot of poorly documented manual operations across nodes and is unable to rebalance, you will get more pain than value.
BTW, using any cluster solution makes very little sense for two nodes.
As for the CVE-2017-12635 vulnerability mentioned above and CouchDB 1.6.x: you can apply this patch https://markmail.org/message/kunbxk7ppzoehih6 to close it.
I have read in a few places that MapStore and MapLoader must run with the Hazelcast node. Is there any way to implement MapStore/MapLoader separately from the Hazelcast node?
Background:
If I have a Hazelcast cluster for the team, and this cluster is to be used by different sub-teams that each provide different maps as data, and each sub-team should implement the MapStore/MapLoader for the maps they own, how can this be done? (Note that each sub-team has its own SVN repository.)
Thanks in advance~
MapLoader's load() operation is only invoked on the node that would have the key when the key is missing, so there is no way to push this processing elsewhere.
However, each map can have a different MapStore/MapLoader implementation, so having a different team provide each is certainly feasible.
Exactly how you achieve this comes down to your build and deploy practices. For example, each team's classes could be in a separate jar file on the classpath. Or, there could be a single jar file constructed containing the classes provided by each team. Many ways exist!
I'm currently evaluating using Hazelcast for our software. Would be glad if you could help me elucidate the following.
I have one specific requirement: I want to be able to configure distributed objects (say maps, queues, etc.) dynamically. That is, I can't have all the configuration data at hand when I start the cluster. I want to be able to initialise (and dispose) services on-demand, and their configuration possibly to change in-between.
The version I'm evaluating is 3.6.2.
The documentation I have available (Reference Manual, Deployment Guide, as well as the "Mastering Hazelcast" e-book) is very skimpy on details w.r.t. this subject, and even partially contradictory.
So, to clarify an intended usage: I want to start the cluster; then, at some point, create, say, a distributed map structure, use it across the nodes; then dispose it and use a map with a different configuration (say, number of backups, eviction policy) for the same purposes.
The documentation mentions, and this is to be expected, that bad things will happen if nodes have different configurations for the same distributed object. That makes perfect sense and is fine; I can ensure that the configs will be consistent.
Looking at the code, it would seem to be possible to do what I intend: when creating a distributed object, if it doesn't already have a proxy, the HazelcastInstance will go look at its Config to create a new one and store it in its local list of proxies. When that object is destroyed, its proxy is removed from the list. On the next invocation, it would go reload from the Config. Furthermore, that config is writeable, so if it has been changed in-between, it should pick up those changes.
So this would seem like it should work, but given how silent the documentation is on the matter, I'd like some confirmation.
Is there any reason why the above shouldn't work?
If it should work, is there any reason not to do the above? For instance, are there plans to change the code in future releases in a way that would prevent this from working?
If so, is there any alternative?
Changing the configuration of an already created distributed object on the fly is not possible in the current version, though there is a plan to add this feature in a future release. Once a map is created, its config stays at the node level, not at the cluster level.
As long as you create the distributed map fresh from the config, use it and then destroy it, your approach should work without any issues.
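The create / destroy / re-create cycle discussed here can be pictured with a toy model. This is plain Python written only to illustrate the reasoning in the question; it is not Hazelcast code, and the class and method names are invented. The point is that settings are read when the proxy is created, so a config change only takes effect after the object has been destroyed and created again.

```python
class ToyInstance:
    """Toy stand-in for the lookup described in the question (not Hazelcast)."""

    def __init__(self):
        self.config = {}   # map name -> settings dict, writable at any time
        self.proxies = {}  # map name -> settings captured at creation

    def get_map(self, name):
        # A proxy is created from the config only if none exists yet.
        if name not in self.proxies:
            self.proxies[name] = dict(self.config.get(name, {}))
        return self.proxies[name]

    def destroy(self, name):
        # Dropping the proxy forces the next get_map() to re-read the config.
        self.proxies.pop(name, None)


node = ToyInstance()
node.config["orders"] = {"backup_count": 1}
first = node.get_map("orders")

node.config["orders"] = {"backup_count": 2}
still_old = node.get_map("orders")   # existing proxy keeps its old settings

node.destroy("orders")
recreated = node.get_map("orders")   # fresh proxy picks up the new settings
print(first, still_old, recreated)
```

In a real cluster every node would have to apply the same config change before the map is re-created, which is the consistency requirement the question already mentions.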
I am exploring the notion of using Hazelcast (or any other caching framework) to advertise services within a cluster. Ideally, when a cluster member departs, its services (or the objects advertising them) should be removed from the cache.
Is this at all possible?
It is possible for sure.
The question is which solution you prefer.
If the services can be stored in a map, you could create a map with a ttl of e.g. a few minutes and each member needs to refresh its service to prevent the services from expiring.
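The TTL-plus-refresh idea can be sketched independently of Hazelcast. The class below is a plain-Python illustration with invented names; in Hazelcast the expiry would come from the map's time-to-live setting rather than from code like this. The clock is injectable so the behaviour can be shown without waiting.

```python
import time


class TtlRegistry:
    """Service registry whose entries expire unless their owner refreshes them."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # service name -> (member id, expiry time)

    def refresh(self, service, member):
        """Advertise a service, or extend its lifetime by one TTL."""
        self._entries[service] = (member, self.clock() + self.ttl)

    def live_services(self):
        """Return only the services whose TTL has not yet run out."""
        now = self.clock()
        return {s: m for s, (m, expiry) in self._entries.items() if expiry > now}


now = [0.0]  # fake clock for the demonstration
registry = TtlRegistry(ttl_seconds=120, clock=lambda: now[0])
registry.refresh("billing", "member-1")
registry.refresh("search", "member-2")

now[0] = 100.0
registry.refresh("billing", "member-1")  # member-1 is alive and refreshes

now[0] = 150.0                            # member-2 never refreshed
print(registry.live_services())
```

The trade-off of this approach is latency: a departed member's services stay visible until the TTL runs out, so the TTL has to be balanced against how often members refresh.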
An alternative solution is to listen for member changes using a MembershipListener; once a member leaves, the services that belong to that member need to be removed from the map.
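A sketch of the listener-based clean-up, again in plain Python with invented names rather than the Hazelcast API. The `member_removed` method stands in for whatever your membership callback would call when it is told that a member has left.

```python
class ServiceDirectory:
    """Maps each advertised service to the cluster member that owns it."""

    def __init__(self):
        self.services = {}  # service name -> owning member id

    def advertise(self, service, member):
        self.services[service] = member

    def member_removed(self, member):
        """Drop every service owned by a member that has left the cluster."""
        # Collect the names first so the dict is not mutated while iterating.
        for service in [s for s, m in self.services.items() if m == member]:
            del self.services[service]


directory = ServiceDirectory()
directory.advertise("billing", "member-1")
directory.advertise("search", "member-2")
directory.advertise("reports", "member-2")

directory.member_removed("member-2")  # simulate the membership event
print(directory.services)
```

Compared with the TTL approach this removes services as soon as the departure is noticed, at the cost of having to store the owning member alongside each service.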
If you don't like either of these, you could create your own SPI-based implementation. The SPI is the lower-level infrastructure used by Hazelcast to create its distributed data structures. It is a lot more work, but also gives a lot of flexibility.
So there are many solutions.
I am investigating the use of a graph database (like Neo4j - mainly because I need the python bindings) for modeling a real physical network. However, one of the requirements is to be able to track the history of where machines were, the state of network ports etc.
In a relational database, I can quite easily create an 'archive' table that I can use to do historical queries, however, I've been bitten many times with the issues of fixed table schemas and rather awkward left joins all over the place.
Does anyone have any suggestions on how it would be best to maintain the historical relations and node properties in a graph database?
Depending on the number of nodes, you might be able to take snapshots of the graph network. Then index each node so that you can query it in each revision of the network.
You could also try versioning each node. Each time a node or one of its edges changes, copy the node with references to the current version of each node it connects to. Then bump the version number of the node you just modified.
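The per-node versioning scheme can be sketched with an in-memory structure. This is plain Python rather than Neo4j code, and the class, node ids and property names are invented for illustration. Every change appends a new snapshot of the node that pins the version each neighbour had at that moment.

```python
class VersionedGraph:
    """Keeps every historical version of each node, oldest first."""

    def __init__(self):
        self.versions = {}  # node id -> list of snapshots

    def current(self, node_id):
        """Version number of the latest snapshot of a node."""
        return len(self.versions[node_id]) - 1

    def put(self, node_id, props, neighbours=()):
        """Store a new version of the node, pinning each neighbour's current version."""
        snapshot = {
            "props": dict(props),
            "links": {n: self.current(n) for n in neighbours},
        }
        self.versions.setdefault(node_id, []).append(snapshot)
        return self.current(node_id)


g = VersionedGraph()
g.put("switch-1", {"rack": "A"})
g.put("host-7", {"port_state": "up"}, neighbours=["switch-1"])

g.put("switch-1", {"rack": "B"})                                  # switch moves
g.put("host-7", {"port_state": "down"}, neighbours=["switch-1"])  # port changes

for number, snap in enumerate(g.versions["host-7"]):
    print(number, snap)
```

A historical query then walks the pinned versions instead of the current ones, so the state of host-7 in version 0 still points at the switch while it was in rack A.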
Since Neo4j stores its data in files on disk, you can keep versions of your graph database in Git and then go back and forth between versions to see what the graph looked like at each point.
I know that Sones provides version control within the database.
"... place them under version control and administer various editions ..." Link