How does nodejs-redis(&connect-redis) deal with sync and async? - node.js

I use connect-redis as my session store, and when I use req.session, all operations on it appear to be synchronous: it behaves like an ordinary JavaScript variable, and the code runs in order. But when I checked the source code, it uses asynchronous calls, so I wonder why req.session behaves like that.
Another question is that if I have multiple redis queries,
client.sadd('test', 1);
client.del('test');
client.sadd('test', 2);
client.sadd('test', 3);
no matter where I put the del operation, the result is always the same from run to run. I thought these queries might be run in any order, right? Since they are all called asynchronously, I expected the results to be different every time.
Thanks for your help

The fact that roundtrips to the Redis server are managed asynchronously does not mean the queries will be sent in random order.
Redis (and therefore most Redis client libraries) supports pipelining, generally used to optimize the number of roundtrips. The idea is to send multiple queries, and then wait for the replies. The order is critical, because it is used by the client to match queries and replies.
Node.js is very well suited to support this kind of mechanism. Matt Ranney's node_redis client supports pipelining in a transparent way. Provided the same client object is used, all the queries will be serialized and executed in order.
In your example, it is normal that the queries are always executed in the same order. You can check this point by using the monitor command to display the flow of queries sent to Redis.
Now, it is important that the last query of the pipeline is associated with a callback, otherwise your program will never know when the last query is complete.
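To make the ordering argument concrete, here is a minimal sketch of a pipelining client. It is a toy in-memory model, not node_redis: ToyServer, ToyClient and runExample are hypothetical names, and the "server" is just a Map. It only illustrates that asynchronous calls are queued in call order and that replies are matched to callbacks in that same order.

```javascript
// Toy in-memory model of a pipelining client. This is NOT node_redis;
// ToyServer and ToyClient are hypothetical names used only for illustration.
class ToyServer {
  constructor() {
    this.sets = new Map();
  }

  // Executes one command and returns its reply, one command at a time.
  execute([name, key, value]) {
    if (name === 'sadd') {
      if (!this.sets.has(key)) this.sets.set(key, new Set());
      const set = this.sets.get(key);
      const added = set.has(value) ? 0 : 1;
      set.add(value);
      return added;
    }
    if (name === 'del') return this.sets.delete(key) ? 1 : 0;
    if (name === 'smembers') return [...(this.sets.get(key) || [])];
    throw new Error('unknown command: ' + name);
  }
}

class ToyClient {
  constructor(server) {
    this.server = server;
    this.pending = []; // FIFO of { command, callback }
    this.scheduled = false;
  }

  // Returns immediately; the command is only appended to the queue here.
  send(command, callback) {
    this.pending.push({ command, callback });
    if (!this.scheduled) {
      this.scheduled = true;
      setImmediate(() => this.flush());
    }
  }

  // Commands are processed in arrival order, and the n-th reply
  // is handed to the n-th queued callback.
  flush() {
    this.scheduled = false;
    const batch = this.pending;
    this.pending = [];
    for (const { command, callback } of batch) {
      const reply = this.server.execute(command);
      if (callback) callback(null, reply);
    }
  }
}

function runExample(done) {
  const client = new ToyClient(new ToyServer());
  client.send(['sadd', 'test', 1]);
  client.send(['del', 'test']);
  client.send(['sadd', 'test', 2]);
  client.send(['sadd', 'test', 3]);
  // Only the last command needs a callback to know the batch is complete.
  client.send(['smembers', 'test'], done);
}

runExample((err, members) => console.log(members)); // [ 2, 3 ]
```

Moving the del call changes the final set, but for a given placement the outcome is the same on every run, because nothing reorders the queue.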

Related

QLDB Transaction Isolation

I was looking at the sample code here: https://docs.aws.amazon.com/qldb/latest/developerguide/getting-started.python.step-5.html, and noticed that the three functions get_document_id_by_gov_id, is_secondary_owner_for_vehicle and add_secondary_owner_for_vin are executed separately in three driver.execute_lambda calls. So if there are two concurrent requests that are trying to add a secondary owner, would this trigger a serialization conflict for one of the requests?
The reason I'm asking is that I initially thought we would have to run all three functions within the same execute_lambda call in order for the serialization conflict to happen since each execute_lambda call uses one session, which in turn uses one transaction. If they are run in three execute_lambda calls, then they would be spread out into three transactions, and QLDB wouldn't be able to detect a conflict. But it seems like my assumption is not true, and the only benefit of batching up the function calls would just be better performance?
Got an answer from a QLDB specialist so going to answer my own question: the operations should have been wrapped in a single transaction so my original assumption was actually true. They are going to update the code sample to reflect this.

Does node-cache use locks?

I'm trying to understand if the node-cache package uses locks for the cache object and can't find anything.
I tried to look at the source code and it doesn't look like it, but this answer suggests otherwise with the quote:
So there is Redis and node-cache for memory locks.
This cache is used in a CRUD server and I want to make sure that GET/UPDATE requests will not create a race condition on the data.
I don't see any evidence of locking in the code.
If two requests for the same key which is not in the cache are made one after the other, then it will launch two separate fetch() operations and whichever request comes back last is the one that will remain in the cache. This is probably not normally a problem, but an improved implementation could make only one request for that same key and have the second request just wait for the first request to provide the value that was already in flight.
Since the cache itself is all in-memory, all access to the cache is synchronous and thus regulated by Javascript's single threaded nature. So, the only place concurrency issues could affect things in the cache code itself are when they launch an asynchronous fetch() operation.
There are, of course, race conditions waiting to happen in how one uses the code that accesses the data just like there are with a database interface so the calling code has to be smart about how it uses the interface to avoid creating race conditions because of how it calls things.
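The "improved implementation" mentioned above could look roughly like the following sketch. It is not part of node-cache: createSingleFlightCache and fetchValue are hypothetical names, and a real cache would also need expiry and error policy. It only shows the idea of sharing one in-flight promise per key so that a second request waits for the first fetch instead of starting its own.

```javascript
// Hypothetical helper, not part of node-cache: de-duplicates concurrent
// loads of the same key by sharing a single in-flight promise.
function createSingleFlightCache(fetchValue) {
  const values = new Map();   // key -> cached value
  const inFlight = new Map(); // key -> promise of a fetch in progress

  return async function get(key) {
    if (values.has(key)) return values.get(key);
    // A fetch for this key is already running: wait for it.
    if (inFlight.has(key)) return inFlight.get(key);

    const promise = Promise.resolve()
      .then(() => fetchValue(key))
      .then((value) => {
        values.set(key, value);
        return value;
      })
      // Clear the in-flight entry on success and on failure,
      // so a failed fetch can be retried by a later call.
      .finally(() => inFlight.delete(key));

    inFlight.set(key, promise);
    return promise;
  };
}
```

The Map accesses are synchronous, so no lock is needed around them; the only window for duplicate work is the asynchronous fetch, which the in-flight map closes.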
Unfortunately no; you can write a unit test to confirm it.
I have written a library to fix that, and also added a read-through method to make the code easier to use:
https://github.com/KhanhPham2411/node-cache-async-lock

node_redis: is SMEMBERS blocking?

Under Redis's SCAN documentation, it mentions this about SMEMBERS:
However while blocking commands like SMEMBERS are able to provide all the elements that are part of a Set in a given moment, The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process.
Surprisingly, I can't find any additional information about how SMEMBERS is blocking and when to avoid using it. If SMEMBERS is a blocking call, is it safe to use in node_redis or will blocking Redis end up blocking Node's thread as well?
Slightly related, if SSCAN is the best practice instead of calling SMEMBERS, is there an equivalent SCAN call for SINTER?
Thanks in advance
Almost all of Redis' commands are blocking, SCAN included (it guarantees short execution time however). The only commands that are non-blocking are those performed by other threads (currently persistence-related only, e.g. BGSAVE).
Specifically, SMEMBERS is blocking. This can be ok if your Set isn't too large (a few K's perhaps). If the Set becomes too large, Redis will block while preparing the reply and will consume RAM to buffer it before sending it back. In such cases, iterating through the Set with SSCAN is advisable, to allow other requests to interleave between calls to it.
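A minimal sketch of the SSCAN iteration described above. The client interface is an assumption: client.sscan(key, cursor) is taken to return a promise that resolves to [nextCursor, members], so adapt the call to whatever client library you actually use. scanSetMembers is a hypothetical helper name.

```javascript
// Collects the members of a set by iterating with SSCAN instead of SMEMBERS.
// ASSUMPTION: client.sscan(key, cursor) returns a promise resolving to
// [nextCursor, members]; this is not tied to a specific library version.
async function scanSetMembers(client, key) {
  const members = new Set(); // SSCAN may return the same element more than once
  let cursor = '0';
  do {
    // Each call is a short command; other clients' requests can be served
    // by Redis between two iterations.
    const [nextCursor, batch] = await client.sscan(key, cursor);
    for (const member of batch) members.add(member);
    cursor = String(nextCursor);
  } while (cursor !== '0'); // a returned cursor of 0 ends the iteration
  return [...members];
}
```

Note that, as the SCAN documentation quoted in the question says, the result is not a point-in-time snapshot: elements added or removed during the iteration may or may not appear.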

Nodejs - How to maintain a global datastructure

So I have a backend implementation in node.js which mainly contains a global array of JSON objects. The JSON objects are populated by user requests (POSTS). So the size of the global array increases proportionally with the number of users. The JSON objects inside the array are not identical. This is a really bad architecture to begin with. But I just went with what I knew and decided to learn on the fly.
I'm running this on an AWS micro instance with 6GB RAM.
How to purge this global array before it explodes?
Options that I have thought of:
At a periodic interval write the global array to a file and purge. Disadvantage here is that if there are any clients in the middle of a transaction, that transaction state is lost.
Restart the server every day and write the global array into a file at that time. Same disadvantage as above.
Follow 1 or 2, and for every incoming request - if the global array is empty look for the corresponding JSON object in the file. This seems absolutely absurd and stupid.
Somehow I can't think of any other solution without having to completely rewrite the nodejs application. Can you guys think of any .. ? Will greatly appreciate any discussion on this.
I see that you are using memory as storage. If that is the case and your code is synchronous (you don't seem to use a database, so it might be), then solution 1 is actually correct. This is because JavaScript is single-threaded, which means that while one piece of code is running, no other code can run. There is no concurrency in JavaScript; it is only an illusion, because Node.js is sooooo fast.
So your cleaning code won't fire until the transaction is over. This is of course assuming that your code is synchronous (and from what I see it might be).
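As a rough sketch of option 1 under that assumption: flushAndPurge, the store argument and the file path are hypothetical names, not taken from the question's code. The function runs entirely synchronously, so no request handler can run between the write and the purge; this only protects a "transaction" that itself completes within one synchronous handler.

```javascript
const fs = require('fs');

// Hypothetical helper: appends every object in `store` to a file as one
// JSON document per line, then empties the array in place.
// Returns the number of objects written.
function flushAndPurge(store, filePath) {
  if (store.length === 0) return 0;
  const lines = store.map((item) => JSON.stringify(item)).join('\n') + '\n';
  fs.appendFileSync(filePath, lines); // synchronous on purpose
  const flushed = store.length;
  store.length = 0; // purge without replacing the array reference
  return flushed;
}

// Example wiring (commented out; the path and interval are placeholders):
// setInterval(() => flushAndPurge(globalStore, '/path/to/dump.jsonl'), 60 * 60 * 1000);
```

The synchronous write blocks the event loop while it runs, which is the price of the simple atomicity argument.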
But still there are like 150 reasons for not doing that. The most important is that you are reinventing the wheel! Let the database do the hard work for you. Using a proper database will save you all the trouble in the future. There are many possibilities: MySQL, PostgreSQL, MongoDB (my favourite), CouchDB and many, many others. It shouldn't matter at this point which one. Just pick one.
I would suggest that you start saving your JSON to a non-relational DB like http://www.couchbase.com/.
Couchbase is extremely easy to set up and use, even in a cluster. It uses a simple key-value design, so saving data is as simple as:
couchbaseClient.set("someKey", "yourJSON")
then to retrieve your data:
data = couchbaseClient.get("someKey")
The system is also extremely fast and is used by OMGPOP for Draw Something. http://blog.couchbase.com/preparing-massive-growth-revisited

Is there a blocking redis library for node.js?

Redis is very fast. For the most part, on my machine it is as fast as, say, native JavaScript statements or function calls in node.js. It is easy and painless to write regular JavaScript code in node.js because no callbacks are needed. I don't see why it should not be that easy to get/set key/value data in Redis using node.js.
Assuming node.js and Redis are on the same machine, are there any npm libraries out there that allow interacting with Redis on node.js using blocking calls? I know this has to be a C/C++ library interfacing with V8.
I suppose you want to ensure all your Redis insert operations have been performed. To achieve that, you can use the MULTI command to insert keys or perform other operations.
The https://github.com/mranney/node_redis module queues up the commands pushed onto the multi object and executes them in order.
That way you only require one callback, at the end of the exec call.
This seems like a common bear-trap for developers who are trying to get used to Node's evented programming model.
What happens is this: you run into a situation where the async/callback pattern isn't a good fit, you figure what you need is some way of doing blocking code, you ask Google/StackExchange about blocking in Node, and all you get is admonishment on how bad blocking is.
They're right - blocking, ("wait for the result of this before doing anything else"), isn't something you should try to do in Node. But what I think is more helpful is to realize that 99.9% of the time, you're not really looking for a way to do blocking, you're just looking for a way to make your app, "wait for the result of this before going on to do that," which is not exactly the same thing.
Try looking into the idea of "flow control" in Node rather than "blocking" for some design patterns that could be a clearer fit for what you're trying to do. Here's a list of libraries to check out:
https://github.com/joyent/node/wiki/modules#wiki-async-flow
I'm new to Node too, but I'm really digging Async: https://github.com/caolan/async
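To illustrate "wait for the result of this before going on to do that" without blocking, here is a small sketch using promises and async/await. fakeGet is a stand-in for any callback-style client call and is purely hypothetical, as are getAsync and loadInOrder.

```javascript
// Hypothetical stand-in for a callback-style, non-blocking client call.
function fakeGet(key, callback) {
  setImmediate(() => callback(null, 'value-of-' + key));
}

// Wraps the callback API in a promise so it can be awaited.
function getAsync(key) {
  return new Promise((resolve, reject) => {
    fakeGet(key, (err, value) => (err ? reject(err) : resolve(value)));
  });
}

// Reads the keys one after another. This function pauses at each await,
// but the event loop stays free to serve other clients in the meantime.
async function loadInOrder(keys) {
  const results = [];
  for (const key of keys) {
    results.push(await getAsync(key));
  }
  return results;
}

loadInOrder(['a', 'b']).then((values) => console.log(values));
// [ 'value-of-a', 'value-of-b' ]
```

The code reads top to bottom like blocking code, yet nothing blocks: only this one function is suspended while it waits.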
Blocking code creates a MASSIVE bottleneck.
If you use blocking code your server will become INCREDIBLY slow.
Remember, node is single threaded. So any blocking code will block node for every connected client.
Your own benchmarking shows it's fast enough for one client. Have you benchmarked it with 1000 clients? If you try this, you will see why blocking code is bad.
Whilst Redis is quick, it is not instantaneous ... this is why you must use a callback if you want to continue execution while ensuring your values are there.
The only way I think you could achieve this (and I am not suggesting you do) is to use a callback that sets a variable, with that variable serving as the predicate for leaving a timer.

Resources