How to synchronize an object between multiple instances of a Node.js application

Is there any way to lock an object in a Node.js application?
When multiple instances of the application are running, some functions shouldn't run concurrently. When instance A finishes the function, it should unlock the object, key, or other identifier; instance B should check whether it is unlocked and only then run the function.
Any object or key can be used to identify what is being locked and unlocked.
How can I do that in a Node.js application that has multiple instances?

As mentioned above Redis may be your answer, however, it really depends on the resources available to you. There are some other possibilities less complicated and certainly less powerful which may also do the trick.
node-cache may also do the trick, if you set it up correctly. It is nowhere near as powerful as Redis, but on the bright side it does not require as much setup and interaction with your environment. Keep in mind that node-cache stores its data inside a single Node.js process, so on its own it only coordinates code running in that process.
So there are Redis and node-cache for in-memory locks. I should mention that quite a few npm packages do caching. It depends on what you need and how intricate your cache needs to be.
However, there are less elegant ways to do what you want, though less elegant is not necessarily worse.
You could use a JSON file based system and hold locks on the files for a TTL. lockfile or proper-lockfile will accomplish the task. You can read the information from the files when needed, delete when required, give them a TTL. Basically a cache system to disk.
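The idea behind a lock file can be sketched with nothing but Node's built-in fs module. This is not the API of lockfile or proper-lockfile; the function names and the lock path are made up for illustration, and stale-lock expiry (the TTL) is deliberately left out because doing it safely takes more care than fits here. It assumes all instances can see the same directory.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Try to take the lock by creating the lock file exclusively.
// Returns true if this process now holds the lock, false if someone else does.
function tryAcquire(lockPath) {
  try {
    // The 'wx' flag makes the open fail with EEXIST when the file already
    // exists, so only one process can win the creation.
    const fd = fs.openSync(lockPath, 'wx');
    fs.writeSync(fd, String(process.pid)); // record the owner for debugging
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err; // permissions, missing directory, and so on
  }
}

// Release the lock by deleting the file; a missing file is not an error.
function release(lockPath) {
  try {
    fs.unlinkSync(lockPath);
  } catch (err) {
    if (err.code !== 'ENOENT') throw err;
  }
}

// Run fn only if the lock could be taken, and always release afterwards.
function withLock(lockPath, fn) {
  if (!tryAcquire(lockPath)) return { ran: false };
  try {
    return { ran: true, value: fn() };
  } finally {
    release(lockPath);
  }
}

// Hypothetical usage: every instance must agree on the same lock path.
const demoLock = path.join(os.tmpdir(), 'demo-job.lock');
const result = withLock(demoLock, () => 'job done');
console.log(result.ran ? result.value : 'another instance holds the lock');
```

If an instance crashes while holding the lock, the file stays behind, which is the problem the TTL mentioned above is meant to solve.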
The memory system is obviously faster. The file system requires just as much planning in your code as the memory system.
There is yet another way. This is possibly the most dangerous one, and you would have to think long and hard on the consequences in terms of security and need.
Node.js has its own process.env. As most know, this holds the environment variables of the process, which you read by writing process.env.foo, where foo has been declared as an environment variable. A package such as dotenv lets you add to these variables by way of a .env text file. Thus if you put sam=mongoDB in that file, process.env.sam in your code evaluates to mongoDB. Plenty of variables can be set up this way.
So what good does that do, you may ask? These variables can be changed in mid-flight, so flipping one is a simple way to mark something as locked. Beware of the gotchas, though. A change made through process.env is only visible inside the process that made it (and in child processes started afterwards), so separately started instances will not see each other's changes. And once the process stops and is started again, the variables return to the defaults in the .env file.
Additionally, unless you are running on a reasonably secure system such as AWS or Azure, I would not feel safe having my .env file open to the world. There is a way around this one too: encrypt the values before putting them in the file, and decrypt them when you read them, before actually using the variable.
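Before relying on process.env as a lock between instances, it is worth checking how far a change actually travels. The small experiment below uses only Node's built-in child_process module; the variable name DEMO_LOCK is made up. The child process inherits the parent's environment, changes the flag, and the parent then looks at its own copy.

```javascript
const { spawnSync } = require('node:child_process');

// The parent sets a flag before starting the child, so the child inherits it.
process.env.DEMO_LOCK = 'free';

// The child changes the flag in its own environment and prints what it sees.
const child = spawnSync(
  process.execPath,
  ['-e', 'process.env.DEMO_LOCK = "taken"; console.log(process.env.DEMO_LOCK);'],
  { encoding: 'utf8' }
);

const seenByChild = child.stdout.trim();
const seenByParent = process.env.DEMO_LOCK;

console.log('child sees: ', seenByChild);  // taken
console.log('parent sees:', seenByParent); // free
```

Each process works on its own copy of the environment, so the parent never notices the child's change. For a lock that several instances must agree on, the state has to live somewhere they all share, such as a file or Redis.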
There are probably many more ways to lock and unlock, not the least of which is to use Node.js's native modules: combine file system events with crypto. But this demands a much deeper understanding of the Node.js libraries and structures.
Hope some of this helped.

I strongly recommend Redis in your case.
There are several ways to create an object shared between applications or processes; using locks is one of them, as you mentioned.
But they are all complicated. Unless you really need to do it yourself, Redis will be good enough: it gives you atomic operations across multiple processes, transactions, and so on.
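A common way to build such a lock on Redis is a single SET with the NX and PX options, plus a small Lua script for a safe release. The sketch below is one possible shape, not a definitive implementation. It assumes `client` is an already connected node-redis v4 client (that library's `set` and `eval` option objects are used); the function names and the key are hypothetical.

```javascript
const { randomUUID } = require('node:crypto');

// Lua script: delete the key only if it still holds our token, so one
// instance can never release a lock that another instance has since taken.
const RELEASE_SCRIPT = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
end
return 0
`;

// Assumption: `client` is a connected node-redis v4 client.
// Returns a token if the lock was taken, or null if it is held elsewhere.
async function acquireLock(client, key, ttlMs) {
  const token = randomUUID();
  // SET key token NX PX ttl: set only if absent, and expire automatically
  // so a crashed instance cannot hold the lock forever.
  const reply = await client.set(key, token, { NX: true, PX: ttlMs });
  return reply === 'OK' ? token : null;
}

// Returns true if this call actually removed the lock.
async function releaseLock(client, key, token) {
  const deleted = await client.eval(RELEASE_SCRIPT, {
    keys: [key],
    arguments: [token],
  });
  return deleted === 1;
}

// Run fn only if this instance gets the lock; skip it otherwise.
async function runExclusive(client, key, ttlMs, fn) {
  const token = await acquireLock(client, key, ttlMs);
  if (token === null) return { ran: false };
  try {
    return { ran: true, value: await fn() };
  } finally {
    await releaseLock(client, key, token);
  }
}
```

Instance A and instance B would both call something like `runExclusive(client, 'lock:nightly-job', 30000, doWork)`; whichever gets the key first runs, and the other sees `ran: false` and can retry later. The TTL should be longer than the work is expected to take.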

Old thread, but I didn't want to use Redis, so I made my own open source solution, which uses websocket connections:
https://github.com/OneAndonlyFinbar/sync-cache

Related

How can tokio tasks access shared data in Rust?

I am creating a webserver using tokio. Whenever a client connection comes in, a green thread is created via tokio::spawn.
The main function of my web server is to act as a proxy. The target server information for the proxy is stored in a global variable, and all tasks must access that data. Since there are multiple target servers, they must be selected by round robin, so the global variable (a struct) must hold the index of the most recently selected server.
Concurrency problems occur because shared information can be read/written by multiple tasks at the same time.
According to the docs, there seems to be a way to use Mutex and Arc or a way to use channel to solve this.
I'm curious which one you usually prefer, or if there is another way to solve the problem.

If it's shared data, you generally do want Arc, or you can leak a box to get a 'static reference (assuming that the data is going to exist until the program exits), or you can use a global variable (though global variables tend to impede testability and should generally be considered an anti-pattern).
As far as what goes in the Arc/Box/global, that depends on what your data's access pattern will be. If you will often read but rarely write, then Tokio's RwLock is probably what you want; if you're going to be updating the data every time you read it, then use Tokio's Mutex instead.
Channels make the most sense when you have separate parts of the program with separate responsibilities. It doesn't work as well to update multiple workers with the same changes to data, because then you get into message ordering problems that can result in each worker's state disagreeing about something. (You get many of the problems of a distributed system without any of the benefits.)
Channels can work if there is a single entity responsible for maintaining the data, but at that point there isn't much benefit over using some kind of mutual exclusion mechanism; it winds up being the same thing with extra steps.

Libgit2 global state and thread safety

I'm trying to revise our codebase which seems to be using libgit2 wrong (at least TSAN is going crazy over how we use it).
I understand that most operations are object based (aka, operations on top of repo are localized to that repo), but I'm unclear when it comes to the global state and which operations need to be synchronized globally.
Is there a list of functions that require global synchronization?
Also when it comes to git_repository_open(), do I need to ensure that one path is only ever held by a single thread? I.e. do I need to prevent multiple threads accessing the same repo?

Can LMDB be made concurrent for writes as well under specific circumstances?

MDB_NOLOCK as described at mdb_env_open() apidoc:
MDB_NOLOCK Don't do any locking. If concurrent access is anticipated, the caller must manage all concurrency itself. For proper operation the caller must enforce single-writer semantics, and must ensure that no readers are using old transactions while a writer is active. The simplest approach is to use an exclusive lock so that no readers may be active at all when a writer begins.
1. What if an RW txnA intends to modify a set of keys which has no key in common with another set of keys which another RW txnB intends to modify? Couldn't they be sent concurrently?
2. Isn't the single-writer semantic wasteful for such situations? As one txn is waiting for the previous one to finish, even though they intend to operate in entirely separate regions in an lmdb env.
3. In an environment opened with MDB_NOLOCK, what if the client app calculates in the domainland, that two write transactions are intending to RW to mutually exclusive set of keys anywhere in an lmdb environment, and sends only such transactions concurrently anyway? What could go wrong?
4. Could such concurrent writes scale linearly with cores? Like RO txns do? Given the app is able to manage these concurrent writes, in the manner described in 3.

1. No, since modifying key/value pairs requires also modifying the b-tree structure, and the two transactions would conflict with each other.
2. You should avoid doing long-running computations in the middle of a write transaction. Try to do as much as possible beforehand. If you can't do this, then LMDB might not be a great fit for your application. Usually you can, though.
3. Very bad stuff. Application crashes and DB corruption.
4. Writes are generally IO bound, and will not scale with many cores anyway. There are some very hacky things you can do with LMDB's writemap and/or pwrite(2), but you are very much on your own here.

I'm going to assume that writing to the value part of a pre-existing key does not modify the b-tree, because you are not modifying the keys. So Doug Hoyte's answer stands, except possibly point 3:
Key phrase here is "are intending to RW to mutually exclusive set of keys". So assuming that the keys are pre-allocated, and already in the DB, changing the values should not matter. I don't even know if LMDB can store variable sized values, in which case it could matter if the values are different sizes.
So, it should be possible to write with MDB_NOLOCK concurrently as long as you can guarantee to never modify, add, or delete any keys during the concurrent writes.

Empirically I can state that working with LMDB opened with MDB_NOLOCK (or lock=False in Python) and simply modifying the values of pre-existing keys, or even only adding new key/value pairs, seems to work well, even if LMDB itself is mounted across an NFS-like medium and queried from different machines.
@Doug Hoyte - I would appreciate more context as to what specific circumstances might lead to a crash or corruption. In my case there are many small, short-lived writes to the same DB.

Nodejs - How to maintain a global datastructure

So I have a backend implementation in node.js which mainly contains a global array of JSON objects. The JSON objects are populated by user requests (POSTS). So the size of the global array increases proportionally with the number of users. The JSON objects inside the array are not identical. This is a really bad architecture to begin with. But I just went with what I knew and decided to learn on the fly.
I'm running this on an AWS micro instance with 6GB RAM.
How to purge this global array before it explodes?
Options that I have thought of:
1. At a periodic interval write the global array to a file and purge. Disadvantage here is that if there are any clients in the middle of a transaction, that transaction state is lost.
2. Restart the server every day and write the global array into a file at that time. Same disadvantage as above.
3. Follow 1 or 2, and for every incoming request - if the global array is empty look for the corresponding JSON object in the file. This seems absolutely absurd and stupid.
Somehow I can't think of any other solution without having to completely rewrite the nodejs application. Can you guys think of any .. ? Will greatly appreciate any discussion on this.

I see that you are using memory as storage. If that is the case and your code is synchronous (you don't seem to use a database, so it might be), then option 1 is actually correct. This is because JavaScript is single-threaded, which means that while one piece of code is running, no other code can run. There is no concurrency in JavaScript. It is only an illusion, because Node.js is sooooo fast.
So your cleaning code won't fire until the transaction is over. This is of course assuming that your code is synchronous (and from what I see it might be).
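A minimal sketch of option 1 under that assumption could look like the following. The names `records`, `addRecord` and `flushAndPurge` are made up for illustration, as are the file path and the interval. Because the function is synchronous from start to finish, no request handler can run halfway through it.

```javascript
const fs = require('node:fs');

// Stand-in for the global array of JSON objects described in the question.
let records = [];

function addRecord(record) {
  records.push(record);
}

// Append everything collected so far to a file (one JSON object per line),
// then start over with an empty array. Returns how many records were written.
function flushAndPurge(filePath) {
  if (records.length === 0) return 0;
  const lines = records.map((r) => JSON.stringify(r)).join('\n') + '\n';
  // Write first and purge second, so a failed write does not lose the data.
  fs.appendFileSync(filePath, lines);
  const written = records.length;
  records = [];
  return written;
}

// Hypothetical scheduling, once a minute:
// setInterval(() => flushAndPurge('/path/to/records.ndjson'), 60 * 1000);
```

This only holds while a "transaction" is a single synchronous piece of code. If a client's state spans several requests, a purge can still land between them.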
But still there are like 150 reasons for not doing that. The most important is that you are reinventing the wheel! Let the database do the hard work for you. Using a proper database will save you all the trouble in the future. There are many possibilities: MySQL, PostgreSQL, MongoDB (my favourite), CouchDB and many, many others. It shouldn't matter at this point which one. Just pick one.

I would suggest that you start saving your JSON to a non-relational DB like http://www.couchbase.com/.
Couchbase is extremely easy to setup and use even in a cluster. It uses a simple key-value design so saving data is as simple as:
couchbaseClient.set("someKey", "yourJSON")
then to retrieve your data:
data = couchbaseClient.get("someKey")
The system is also extremely fast and is used by OMGPOP for Draw Something. http://blog.couchbase.com/preparing-massive-growth-revisited

Is it worth using memcached on node.js

Is there any particular reason to use memcached for fast access to cached data instead of just creating a global CACHE variable in the node program and using that?
Assume that the application will be running in one instance and not distributed across multiple machines.
The global variable option seems like it would be faster and more efficient but I wasn't sure if there was a good reason to not do this.

It depends on the size and number of items. If you're working with a few items of modest size and they don't need to be accessible to other node instances, then using an object as a key/value store is fine. The one trick is that when you delete items from the cache object, make sure you don't keep any other references to them, otherwise you will have a leak.
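As a rough illustration of the in-process approach, here is a small cache built on Map with a cap on the number of entries, so it cannot grow without bound. The class name and the default limit are made up; it is a sketch rather than a replacement for memcached.

```javascript
// A small in-process cache with a size cap. Map remembers insertion order,
// so the first key is always the oldest entry.
class SimpleCache {
  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    return this.entries.get(key);
  }

  set(key, value) {
    // Deleting first makes a re-inserted key count as the newest entry.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the oldest entry to stay within the cap.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }

  delete(key) {
    return this.entries.delete(key);
  }

  get size() {
    return this.entries.size;
  }
}

const cache = new SimpleCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.set('c', 3); // over the cap, so 'a' is evicted
console.log(cache.get('a'), cache.get('c')); // undefined 3
```

Deleting a key only drops the cache's own reference. If other code still holds the cached object, it stays in memory, which is the leak described above.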
