JCR locking in a multi-node clustered environment - multithreading

I have been struggling to come up with a working solution for adding content to the same JCR nodes from different instances of the same cluster.
It has been explained here that "when multiple cluster nodes write to the same nodes, those nodes must be locked first".
I have done that, but I still get stale item exceptions like the one below:
javax.jcr.InvalidItemStateException: Unable to update a stale item: item.save()
at org.apache.jackrabbit.core.ItemSaveOperation.perform(ItemSaveOperation.java:262)
at org.apache.jackrabbit.core.session.SessionState.perform(SessionState.java:216)
at org.apache.jackrabbit.core.ItemImpl.perform(ItemImpl.java:91)
at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:329)
at org.apache.jackrabbit.core.session.SessionSaveOperation.perform(SessionSaveOperation.java:65)
at org.apache.jackrabbit.core.session.SessionState.perform(SessionState.java:216)
at org.apache.jackrabbit.core.SessionImpl.perform(SessionImpl.java:361)
at org.apache.jackrabbit.core.SessionImpl.save(SessionImpl.java:812)
I also followed the suggested approach for locking nodes described here (see 17.10 Locks and Transactions).
Here is a simplified version of my locking code:
session.getRootNode().addNode("one").addMixin("mix:lockable");
session.save();
session.getWorkspace().getLockManager().lock("/one", true, true, 5000, session.getUserID());
session.save();// usually it explodes here
session.getNode("/one").addNode("two").addMixin("mix:lockable");
session.save();
session.getWorkspace().getLockManager().unlock("/one");
Please note this would be executed on two different instances (clustered) at the same time.
As you can see in the code above, it explodes when I attempt to save the session right after locking the node, even though this is the recommendation stated in the link I shared earlier.
I understand why it explodes: two instances were trying to lock the same node. When a lock is added to a node, the node is modified by the addition of two properties (jcr:lockOwner and jcr:lockIsDeep). So if instance 1 adds a lock, then instance 2 adds a lock, and instance 1 then attempts to save, you get a stale item exception, because instance 2 has modified the node in the meantime by locking it. How would I prevent this from happening?
Many thanks for your support!

I somehow found this topic and have checked your code. What you are doing wrong here is obtaining a session-scoped lock, so your clustered repositories have no idea the lock exists, since it is not applied across the cluster nodes.
What you should do instead is the following:
session.getWorkspace().getLockManager().lock("/one", true, false, 5000, session.getUserID());
More information can be found at:
https://wiki.apache.org/jackrabbit/Clustering#Concurrent_Write_Behavior
See especially the part about limitations.
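Not part of the original answer, but a rough sketch of how that open-scoped lock could be combined with a simple retry loop, assuming the same /one path from the question; a LockException signals that another cluster node currently holds the lock:

import javax.jcr.Node;
import javax.jcr.Session;
import javax.jcr.lock.LockException;
import javax.jcr.lock.LockManager;

// Sketch only: acquire an open-scoped (cluster-visible) lock on /one, retry while
// another instance holds it, then add the child node and release the lock.
void addChildWithClusterLock(Session session) throws Exception {
    LockManager lockManager = session.getWorkspace().getLockManager();
    while (true) {
        try {
            // isDeep = true, isSessionScoped = false, i.e. open-scoped
            lockManager.lock("/one", true, false, 5000, session.getUserID());
            break;
        } catch (LockException alreadyLocked) {
            Thread.sleep(200);        // back off briefly before retrying
            session.refresh(false);   // drop any stale local state
        }
    }
    try {
        Node one = session.getNode("/one");
        one.addNode("two").addMixin("mix:lockable");
        session.save();
    } finally {
        lockManager.unlock("/one");
    }
}

Note that locking through the LockManager takes effect immediately, so you should not need to call session.save() right after acquiring the lock, which is the call that was blowing up in the question.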

Maybe you could try node.refresh(false) to force the node to refresh its cached state and pick up the new modifications.
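For instance (a sketch, not from the original answer), before retrying the failed save:

import javax.jcr.Node;
import javax.jcr.Session;

// Sketch: discard pending local changes on the node and re-read the latest
// persisted state from the repository before retrying the save.
static void refreshBeforeRetry(Session session) throws Exception {
    Node one = session.getNode("/one");
    one.refresh(false);   // false = do not keep local modifications
}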

Related

Bootstrap many new Cassandra nodes into a cluster with no errors

I have a cluster of about 100 nodes and it keeps growing. I need to add 10-50 nodes on request. As far as I know, Cassandra defaults to cassandra.consistent.rangemovement=true, which means multiple nodes cannot bootstrap at the same moment.
Anyway, when I add many nodes using Terraform and a fairly default configuration (via Puppet), at least 2-3 of them end up in the UJ state and eventually only one bootstraps successfully. Earlier I used a random delay before starting cassandra.service, but that no longer works when adding 10+ nodes.
I'm trying to figure out how to implement a kind of "lock" for bootstrapping.
I have Consul and can take a kind of bootstrap lock in its KV store, for instance acquiring it via the systemd ExecStartPre hook, but I can't work out how to release it after the bootstrap finishes.
I'm looking for any solutions for that.
I've done something similar using Rundeck before. Basically, we had Rundeck kick off a bash script, taking parameters about the deployment of our nodes as well as how many.
What we did was parse the output of nodetool status. We'd count the total number of nodes as well as the number of UN (Up/Normal) indicators. If those two numbers didn't match, we'd sleep for 30s and try again.
Once those numbers matched, we knew that it was safe to add another node. The total operation could take a while to add all nodes, but it worked.
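The original script was bash, but roughly the same wait loop, sketched here in Java for illustration (it shells out to nodetool status and assumes the usual output format where node rows start with a two-letter state code such as UN or UJ):

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Sketch: block until every node reported by "nodetool status" is in the UN
// (Up/Normal) state, at which point it is safe to bootstrap the next node.
static void waitUntilClusterStable() throws Exception {
    while (true) {
        int total = 0;
        int upNormal = 0;
        Process p = new ProcessBuilder("nodetool", "status").start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.matches("^[UD][NJLM]\\s.*")) {   // a node row
                    total++;
                    if (trimmed.startsWith("UN")) {
                        upNormal++;
                    }
                }
            }
        }
        p.waitFor();
        if (total > 0 && total == upNormal) {
            return;                   // every node Up/Normal, safe to add another
        }
        Thread.sleep(30_000);         // same 30s back-off as the original script
    }
}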

Hazelcast load data from RDBMS in client-server topology

I am using a client-server topology for the Hazelcast cache. I have multiple maps which I load eagerly using MapLoaders. When there is a cache miss, the MapLoader's load(key) method is called. The MapLoader.load(key) method seems to be executed by the partition thread, which means that all other operations on that partition are blocked until loading is done. A very common use case for the MapLoader is to load data from a DB, which can arguably take some time. So what is the best approach to take so that other operations on the partition are not blocked while the load is taking place? Is there any other way to load missing data at runtime? (Hazelcast version: 4.0.3)
There's a good answer to this question that gives a few options.
MapLoader.load(key) only loads a single entry, but if the remote source is really slow or there are lots of cache misses it's going to mount up.
Another alternative to @mike-yawn's answer would be to have a Runnable that fetches needed items from the database and writes them directly into the map. You can still have the MapLoader.load(key) as well, but the chance of a cache miss is reduced if your fetcher code is good at predicting which entries will be needed.
If you don't cache 100% of the records, then a cache miss is inevitable. If it's prohibitively slow, you could always return a placeholder value containing some sort of flag marking it as such, and launch a thread to do the actual load. Your code then has to deal with that placeholder and try again later, noting that when it tries later the eventual result of the database query could be that no record is found.
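A rough sketch of that fetcher idea, assuming a Hazelcast 4.x client and a hypothetical loadRecentlyChangedRows() stand-in for your own JDBC query:

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: periodically pre-populate the map from the database on a client-side
// thread, so the partition threads never block waiting on the RDBMS.
public class MapPreFetcher {
    public static void main(String[] args) {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        IMap<Long, String> cache = client.getMap("orders");   // "orders" is illustrative

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            Map<Long, String> fresh = loadRecentlyChangedRows();
            cache.putAll(fresh);   // bulk write, spread across partitions
        }, 0, 30, TimeUnit.SECONDS);
    }

    // Placeholder for the real database fetch that predicts which entries
    // will be needed soon.
    private static Map<Long, String> loadRecentlyChangedRows() {
        return Map.of();
    }
}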

Node.js - Scaling with Redis atomic updates

I have a Node.js app that performs the following:
get data from Redis
perform a calculation on the data
write the new result back to Redis
This process may take place several times per second. The issue I now face is that I wish to run multiple instances of this process, and I am obviously seeing out-of-date data being written because each instance updates after another has already fetched the last value.
How would I make the above process atomic?
I cannot add the operation to a transaction within Redis as I need to get the data (which would force a commit) before I can process and update.
Can anyone advise?
Apologies for the lack of clarity with the question.
After further reading, I can indeed use transactions. The part I was struggling to understand was that I need to separate the read from the update, wrap only the update in the transaction, and use WATCH on the key I read. This causes the update transaction to fail if another update has taken place in the meantime.
So the workflow is:
WATCH key
GET key
MULTI
SET key
EXEC
Hopefully this is useful for anyone else looking to do an atomic get-and-update.
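The question is about Node.js, but here is the same WATCH / GET / MULTI / SET / EXEC loop sketched with the Jedis client in Java to show the retry logic; in Jedis, exec() returns null when the watched key was modified by someone else:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

// Sketch: optimistic get-compute-set loop; retries whenever another instance
// updates the key between our read and our EXEC.
static void atomicUpdate(Jedis jedis, String key) {
    while (true) {
        jedis.watch(key);                    // WATCH key
        String current = jedis.get(key);     // GET key
        String updated = compute(current);   // perform the calculation

        Transaction tx = jedis.multi();      // MULTI
        tx.set(key, updated);                // SET key
        if (tx.exec() != null) {             // EXEC: null means the key changed
            return;                          // committed successfully
        }
        // the key was modified concurrently; loop and try again
    }
}

// Placeholder for the real calculation from the question.
static String compute(String value) {
    return value == null ? "1" : String.valueOf(Long.parseLong(value) + 1);
}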
Redis supports atomic transactions http://redis.io/topics/transactions

Global Redis Twemproxy Architecture

I'm launching a global service on a Node/Mongo/Redis stack. I've got an architecture question about my Redis/Twemproxy config. Here's the crux of the issue: all of my data is 'global' - that is to say, users from anywhere around the world need to access the same data. Unfortunately, it's a ~300ms hop across an ocean - so, to avoid slow reads, I need to host a copy of all my data on a server that's 'local' to the user.
This is pretty easy to accomplish with MongoDB. You simply create a replica set, with members all over the globe, and you set readPreference to 'nearest' (least lag). Done.
However, with Redis/Twemproxy, it's not that easy...
My current solution is to take a serious hit on write performance by writing to every global server (within the req/res cycle). This does lead to faster reads since I can let every user read from a local set of the data. If you do it the other way around, write 'local', read 'global' -- you save a bunch of space (you only have one copy of the data), but reads take a huge performance hit. If I had to choose, I need faster reads.
I've tried creating a 'master' cluster (AMER) and then slaving other 'global' clusters (ASIA, EUROPE) to that, but when I tried to read from the 'global' clusters, it returned nothing. This works with a single Redis instance, so, I'm assuming this has to do with the addition of Twemproxy, and key mapping.
Does anyone have any suggestions or ideas? What's the optimal way to configure a global Redis/Twemproxy architecture?

Multiple node instances with a single database

I'm currently writing a Node app and I'm thinking ahead about scaling. As I understand it, horizontal scaling is one of the easier ways to scale an application to handle more concurrent requests. My working copy currently uses MongoDB on the backend.
My question is this: I have a data structure that resembles a linked list and requires the order to be strictly maintained. My (imaginary) concern is that when there is a race condition on the database between multiple Node instances, the resolution of the linked list could be incorrect.
To give an example: imagine the server holding the list a->b. Instance 1 comes in with object c and instance 2 comes in with object d. It is possible that both instances read a->b and decide to append their own objects to the list. Instance 1 will then believe its insertion produced a->b->c while instance 2 thinks it produced a->b->d, when the database actually holds a->b->c->d.
In general, this sounds like a job for optimistic locking; however, as I understand it, neither MongoDB nor Redis does transactions in the SQL manner.
I therefore imagine the solution to be one of the below:
Implement my own transaction in MongoDB using flags. The client does a findAndModify on the lock variable and if successful, performs the operations. If unsuccessful, the client retries after a certain timeout.
Use Redis transactions and pubsub to achieve the same effect. I'm not exactly sure how to do this yet, but it sounds like it might be plausible.
Implement some sort of smart load balancing. If multiple clients are operating on the same item, route them to the same instance. Since JS is single threaded, the problem would be solved. Unfortunately, I didn't find a straightforward way to do that.
I'm sure there exists a better, more elegant way to achieve the above, and I would love to hear any solutions or suggestions. Thank you!
If I understood correctly, and the list is being stored as one single document, you might be looking at row versioning. Add a property to the document that holds the version; when you update, you increment (or change) the version and make the update conditional on the version you originally read:
//update(condition, value)
update({version: whateverYouReceivedWhenYouDidFind}, newValue)
Hope it helps.
Gus
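A sketch of that conditional update with the MongoDB Java driver (the question uses Node, and the field names here are illustrative); a modified count of zero means someone else bumped the version first and the read-modify-write should be retried:

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import com.mongodb.client.result.UpdateResult;
import org.bson.Document;
import java.util.List;

// Sketch: optimistic update of a versioned list document.
static boolean saveList(MongoCollection<Document> coll, Object id,
                        long versionWeRead, List<String> newList) {
    UpdateResult result = coll.updateOne(
            Filters.and(Filters.eq("_id", id), Filters.eq("version", versionWeRead)),
            Updates.combine(Updates.set("items", newList), Updates.inc("version", 1)));
    return result.getModifiedCount() == 1;   // false => stale read, retry
}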
You want the findAndModify command in MongoDB, which guarantees an atomic modification while returning the newly modified document. Since the changes are serial and atomic, instance 1 will see a->b->c and instance 2 will see a->b->c->d.
Cheers
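In the same Java-driver terms (again just an illustration, with made-up field names), an atomic append that returns the document as it looks after the modification could be:

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import com.mongodb.client.model.Updates;
import org.bson.Document;

// Sketch: atomically push a new element onto the list and return the updated document.
static Document appendToList(MongoCollection<Document> coll, Object id, String element) {
    return coll.findOneAndUpdate(
            Filters.eq("_id", id),
            Updates.push("items", element),
            new FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER));
}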
If all you are doing is adding new elements to the list, you could use a Redis list and include the time in every value you add. The list may be unsorted in Redis but should be quickly sortable when retrieved.
