I know that in PHP objects are created for each request and destroyed when the processing is finished.
And in Java, depending on configuration, objects can remain in memory and be either associated with a single user (through server sessions) or shared between multiple users.
Is there a general rule for this in Node.js?
I see many projects instantiating all app objects in the entry script, in which case they will be shared between requests.
Others will keep object creation inside functions, so AFAIK objects are destroyed after processing each request.
What are the downsides of each approach? Obviously, things like memory usage and information sharing should be considered, but are there any other things specific to Node.js that we should pay attention to?
Javascript has no such thing as objects that are tied to a given request. The language is garbage collected and all objects are garbage collected when there are no more references to them and no code can reach them. This has absolutely nothing to do with request handlers.
so AFAIK objects are destroyed after processing each request.
No. The lifetime of objects in Javascript has absolutely nothing to do with requests.
Instead, think in terms of function scopes. If you create an object in a request handler, use it only in that request handler, and don't store it somewhere that creates a long-lasting reference to it, then, just like in ANY other function in Javascript, once that request handler function finishes, returns, and has no more asynchronous operations still in-flight, any objects created within that function that are not stored in some other scope will be cleaned up by the garbage collector.
It is the exact same rules for a request handler as it is for any other function call in the language.
So, please forget anything you know about PHP as its request-specific architecture will only mess you up in Javascript/node.js. There is no such thing in node.js.
Instead, think of a node.js server as one, long running process with a garbage collector. All objects that are created will be garbage collected when they are no longer reachable by live code (e.g. there are no live references to them that any code can get to). This is the same whether the object is created at startup of the server, in a request handler on the server, in a recurring timer on the server or any other event on the server. The language has one garbage collector that works the same everywhere and has no special behavior for server requests.
The usual way to do things in a node.js server is to create objects as local variables in the request handler function (or in any functions it calls), or occasionally to assign them as properties of the request or response objects (middleware will often do this). Since everything is scoped to a function call in the request chain, when those function calls are done, the things you created as local variables in them become eligible for garbage collection.
In general, you do not use many higher scoped variables outside the request handler except for purposeful long term storage (session state, database connections, or other server-wide state).
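As a rough illustration (a minimal sketch assuming an Express-style server; loadUserFromDb is a hypothetical helper), the difference between request-local data and purposeful higher-scoped data looks like this:

const express = require('express');
const app = express();

// Module scope: lives for the lifetime of the server process and is
// shared by every request that touches it.
let requestCount = 0;

app.get('/profile/:id', async (req, res) => {
  // Function scope: this object exists only for this request and becomes
  // eligible for garbage collection once the handler (and any async work
  // it awaits) has finished and nothing else references it.
  const user = await loadUserFromDb(req.params.id); // hypothetical helper

  requestCount++; // purposeful long-lived, shared state
  res.json({ user, requestsServed: requestCount });
});

app.listen(3000);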
Is there a general rule for this in Node.js?
Not really in the sense you were asking since Javascript is really just about the scope that a variable is declared in and then garbage collection from there, but I will offer some guidelines down below.
If data is stored in a scope higher than the request handler (module scope or global scope), then it probably lasts for a long time because there is a lasting reference that future request handlers can access so it will not be garbage collected.
If objects are created and used within a request handler and not attached to any higher scope, then they will be garbage collected by the language automatically when the function is done executing.
Session frameworks typically create a specific mechanism for storing server-side state that persists on the server on a per-user basis. A popular node.js session manager, express-session, does exactly this. There, you follow the rules of the session framework for how to store or remove data from each user's session. This isn't really a language feature so much as a specific library built in the language. Even the session manager relies on the garbage collector: data persists in the session manager when desired because there are lasting references to it that make it available to future requests.
node.js has no such thing as "per-user" or "per-request" data built into the language or the environment. A session manager builds "per-user" data artificially by making persistent data that can be requested or accessed on a per-user basis.
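For instance, with express-session, the "per-user" data is just data the middleware keeps a lasting reference to and looks up by a session cookie on each request. A minimal sketch (the secret and route here are placeholders):

const express = require('express');
const session = require('express-session');

const app = express();

// The middleware keeps a lasting reference to each user's session data
// (in memory by default, or in a configured store such as Redis) and
// looks it up by the session cookie on every request.
app.use(session({
  secret: 'replace-with-a-real-secret', // placeholder
  resave: false,
  saveUninitialized: false
}));

app.get('/visits', (req, res) => {
  // "Per-user" data: nothing language-level, just data the session
  // manager keeps alive between this user's requests.
  req.session.visits = (req.session.visits || 0) + 1;
  res.send('You have visited ' + req.session.visits + ' times');
});

app.listen(3000);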
Some general rules for node.js:
Define in your head and in your design which data is local to a specific request handler, which data is meant for long-term storage, and which data is meant for user-specific sessions. You should be very clear about that.
Don't ever put request-specific variables into any higher scope that other request handlers can access unless they are purposely shared variables meant to be accessed by multiple requests. Accidentally sharing variables between requests creates concurrency issues, race conditions, and very hard-to-track-down server bugs: one request may write to the variable while doing its work and then another request may come along and also write to it, trouncing what the first request was working on. Keep these kinds of request-specific variables local to the request handler (local to the function for the request handler) so that can never happen.
If you are storing data for long-term use (beyond the lifetime of a specific request), which would generally mean storing it in a module-scoped variable or a global variable (you should generally avoid globals), then be very, very careful about how that data is stored and accessed to avoid race conditions or inconsistent state that might mess up some other request handler reading/writing that data. node.js makes this simpler because it runs your Javascript single-threaded, but once your request handler makes some sort of asynchronous call (like a database call), other request handlers get to run, so you have to be careful about modifications to shared state across asynchronous boundaries.
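To illustrate that last rule, here is a sketch of the kind of race that shows up once shared state is modified across an await (assuming an Express app; db.save is a hypothetical async database call):

// Module scope: shared by every request handler.
let stats = { totalOrders: 0 };

app.post('/order', async (req, res) => {
  const before = stats.totalOrders;   // 1. read shared state

  await db.save(req.body);            // 2. hypothetical async DB call;
                                      //    other request handlers run here

  stats.totalOrders = before + 1;     // 3. write based on the stale read:
                                      //    another request's increment made
                                      //    during step 2 is silently lost
  res.sendStatus(201);
});

// Safer: do the read-modify-write in one synchronous step after the await
// (e.g. stats.totalOrders++), so no other handler can interleave between
// the read and the write.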
I see many projects instantiating all app objects in the entry script, in which case they will be shared between requests.
In the example of a web server using the Express framework, there is one app object that all requests have access to. The only request-specific variables are the request and response objects that the web server framework creates and passes into your request handler; those are unique to each new request. All other server state is accessible by all requests.
What are the downsides of each approach?
If you're asking for a comparison of the Apache/PHP web server model to the node.js/Express web server model, that's a really giant question. They are very different architectures and the topic has been widely discussed and debated before. I'd suggest you do some searching on that topic, read what has been previously written and then ask a more specific question about things you don't quite understand or need some clarification on.
Related
I just read this article from Node.js: Don't Block the Event Loop
The Ask
I'm hoping that someone can read over the use case I describe below and tell me whether I'm understanding how the event loop gets blocked, and whether I'm actually blocking it here. Also, any tips on how I can find this out for myself would be useful.
My use case
I think I have a use case in my application that could potentially cause problems. I have functionality that lets a group add members to its roster. Each member that doesn't represent an existing system user (the common case) gets an account created, including a dummy password.
The password is hashed with argon2 (using the default hash type), which means that even before I get to waiting on a DB promise to resolve (with a Prisma transaction), I have to wait for each member's password to be generated.
I'm using Prisma for the ORM and Sendgrid for the email service and no other external packages.
A take-away that I get from the article is that this is blocking the event loop. Since there could potentially be hundreds of records generated (such as importing contacts from a CSV or cloud contact service), this seems significant.
To sum up what the route in question does, including some details omitted before:
Remove duplicates (requires one DB request & then some synchronous checking)
Check remaining for existing user
For non-existing users:
Create many records and push each to a separate array (mostly synchronous work, but one of these records requires async password generation for each non-existing user)
Once the arrays are populated, send a DB transaction with all records
Once the transaction is cleared, create invitation records for each member
Once the invitation records are created, send emails in a MailData[] through SendGrid.
Clearly, there are quite a few tasks that must be done sequentially. If it matters, the asynchronous functions are also nested: createUsers calls createInvites, which calls sendEmails. In fact, from the controller, the full chain is: updateRoster calls createUsers, which calls createInvites, which calls sendEmails.
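Roughly, the flow might look like the following sketch (the model names and the helpers removeDuplicates, filterOutExistingUsers, makeDummyPassword, and sendEmails are illustrative, not my real code):

const argon2 = require('argon2');
const { PrismaClient } = require('@prisma/client');
const prisma = new PrismaClient();

async function updateRoster(groupId, incoming) {
  const members = await removeDuplicates(groupId, incoming);   // one DB query + sync checks
  const newMembers = await filterOutExistingUsers(members);
  await createUsers(groupId, newMembers);
}

async function createUsers(groupId, newMembers) {
  const userRecords = [];
  for (const member of newMembers) {
    // Each new member needs a dummy password hashed before the transaction.
    const password = await argon2.hash(makeDummyPassword());
    userRecords.push({ email: member.email, password });
  }
  // One Prisma transaction for all accumulated records.
  const users = await prisma.$transaction(
    userRecords.map((data) => prisma.user.create({ data }))
  );
  await createInvites(groupId, users);
}

async function createInvites(groupId, users) {
  const invites = users.map((u) => ({ groupId, userId: u.id }));
  await prisma.invitation.createMany({ data: invites });
  await sendEmails(users); // builds a MailData[] and sends through SendGrid
}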
There are architectural patterns aimed at avoiding the issues brought on by potentially long-running operations. Note that while your example is specific, any long-running process could be harmful here.
The first obvious pattern is clustering. If your app is handled by multiple concurrent, independent event loops in a cluster, blocking one, ten, or even a thousand of them can be insignificant if your app is scaled to handle it.
Imagine a scenario where you have 10 concurrent loops: one is blocked for a longer time, but the 9 remaining loops are still serving short requests. Chances are, users would not even notice the temporary bottleneck caused by the one long-running request.
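A minimal sketch of that idea with Node's built-in cluster module (isPrimary requires Node 16+, older versions use isMaster; ./server is a hypothetical module that calls app.listen()):

const cluster = require('cluster');
const os = require('os');

if (cluster.isPrimary) {
  // Fork one worker (one event loop) per CPU core; connections are
  // balanced across the workers.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  // If a worker dies (or is stuck and killed), replace it.
  cluster.on('exit', (worker) => {
    console.log('worker ' + worker.process.pid + ' exited, forking a new one');
    cluster.fork();
  });
} else {
  // Each worker runs its own copy of the server; a long-running request
  // only ties up this worker's event loop, not the others.
  require('./server'); // hypothetical module that calls app.listen()
}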
Another, more general pattern is a separate service for long-running processes, or Command-Query Responsibility Segregation (I bring up CQRS here because the pattern's description may introduce interesting ideas you are not yet familiar with).
In this approach, long-running operations are not handled directly by the backend servers. Instead, the backend servers use a message queue to send requests to yet another service layer of your app, one dedicated solely to running those long-running jobs. The message queue is configured with a specific throughput, so if multiple long-running requests arrive in a short time they are queued: some of them may be delayed, but your resources always stay under control. The backend that sends requests to the message queue doesn't wait synchronously; instead you need some other form of return communication.
This auxiliary process service can be maintained and scaled independently. The important part here is that the service is never accessed directly from the frontend, it's always behind a message queue with controlled throughput.
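As a sketch of the queue-based approach (using BullMQ purely as an example of a Redis-backed queue; the Express app, queue name, route, and importRoster worker function are all made up here):

const { Queue, Worker } = require('bullmq');

const connection = { host: 'localhost', port: 6379 }; // Redis behind the queue

// In the backend/web process: enqueue the long-running work and answer
// immediately; the client learns the result later (polling, websocket, email...).
const rosterQueue = new Queue('roster-import', { connection });

app.post('/groups/:id/members', async (req, res) => {
  const job = await rosterQueue.add('import', {
    groupId: req.params.id,
    members: req.body.members,
  });
  res.status(202).json({ jobId: job.id });
});

// In a separate worker service, scaled and throttled independently:
new Worker('roster-import', async (job) => {
  await importRoster(job.data); // hypothetical long-running import
}, { connection, concurrency: 2 });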
Note that while the second approach is often implemented in real-life systems and solves most issues, it can still fail to handle some edge cases, e.g. when long-running requests arrive faster than they are handled and the queue grows infinitely.
Such cases require careful maintenance: either you scale your app to handle the traffic, or you introduce other rules that prevent users from running long processes too often.
I have a backend NodeJS API and I am trying to set a trace id. What I have been thinking is that I would generate a UUID through a Singleton module and then use it across the app for logging. But since NodeJS is single-threaded, would that mean the UUID will always remain the same for all clients?
For example: if the API gets a request from https://www.example.com/client-1 and https://www.example-two.com/client-2, would it spin up a new process and thereby generate separate UUIDs, or is it just one process running with a single thread? If it's just one process with one thread, then I think both client apps will get the same UUID assigned.
Is this understanding correct?
Nodejs uses only one single thread to run all your Javascript (unless you specifically create a WorkerThread or child_process). Nodejs uses some threads internally for use in some of the library functions, but those aren't used for running your Javascript and are transparent to you.
So, unlike some other environments, each new request runs in the same thread. There is no new process or thread created for an incoming request.
If you use some singleton, it will have the same value for every request.
But since NodeJS is single-threaded, would that mean the UUID will always remain the same for all clients?
Yes, the UUID would be the same for all requests.
For example: if the API gets a request from https://www.example.com/client-1 and https://www.example-two.com/client-2, would it spin up a new process and thereby generate separate UUIDs?
No, it would not spin a new process and would not generate a new UUID.
or is it just one process running with a single thread? If it's just one process with one thread, then I think both client apps will get the same UUID assigned.
One process. One thread. Same UUID from a singleton.
If you're trying to put some request-specific UUID in every log statement, then there aren't many options. The usual option is to coin a new UUID for each new request in some middleware and attach it to the req object as a property such as req.uuid and then pass the req object or the uuid itself as a function argument to all code that might want to have access to it.
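A minimal sketch of that middleware approach (assuming an Express app and Node's built-in crypto.randomUUID; the log and fetchOrders helpers are just placeholders):

const { randomUUID } = require('crypto');

// Middleware: every incoming request gets its own trace id.
app.use((req, res, next) => {
  req.uuid = randomUUID();
  res.setHeader('X-Request-Id', req.uuid); // optionally echo it to the client
  next();
});

app.get('/orders', async (req, res) => {
  log(req.uuid, 'fetching orders');           // pass the id (or req) to anything that logs
  const orders = await fetchOrders(req.uuid); // placeholder for your own code
  res.json(orders);
});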
There is also a technology called "async local storage" that could serve you here. Here's the doc. It can be used somewhat like "thread local storage" works in environments that do use a thread for each new request. It provides storage tied to an execution context, which each incoming request that is still being processed has, even as it goes through various asynchronous operations and even when it temporarily returns control back to the event loop.
As best I know, the async local storage interface has undergone several different implementations and is still considered experimental.
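For completeness, a sketch of what the async local storage approach could look like (AsyncLocalStorage from the async_hooks module, again assuming an Express app; someAsyncWork is a placeholder):

const { AsyncLocalStorage } = require('async_hooks');
const { randomUUID } = require('crypto');

const requestContext = new AsyncLocalStorage();

// Everything triggered (synchronously or asynchronously) inside run()
// sees the same store for that request.
app.use((req, res, next) => {
  requestContext.run({ traceId: randomUUID() }, next);
});

function log(message) {
  const store = requestContext.getStore(); // undefined outside a request
  console.log('[' + (store ? store.traceId : '-') + '] ' + message);
}

app.get('/orders', async (req, res) => {
  log('fetching orders'); // logs this request's trace id
  await someAsyncWork();  // placeholder; the context survives the await
  log('done');
  res.sendStatus(200);
});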
See this diagram to understand how a Node.js server handles requests compared to servers in other languages.
So in your case there won't be a separate thread. And unless you are creating a separate process by running your app with pm2, or explicitly creating one using the built-in modules (such as child_process), it won't be a separate process either.
Node.js is a single-threaded runtime environment, although internally it uses a pool of threads for certain operations that would otherwise block the event loop.
What I have been thinking is that I would generate a UUID through a Singleton module
Yes, it will generate the UUID only once, and every time a new request comes in it will reuse the same UUID; that is the main aim of the Singleton design pattern.
would it spin up a new process and thereby generate separate UUIDs, or is it just one process running with a single thread?
A process is an instance of a computer program and can contain one or more threads; in this case the process is Node.js. Your Javascript runs on a single main thread within that process, where the event loop and the call stack live. Every time a request is received, it goes through the event loop and its handler is pushed onto the stack for execution.
You can create a separate process in Node.js using the child_process module.
Is this understanding correct?
Yes, your understanding of the UUID Singleton pattern is correct. I would recommend you look at how Node.js processes requests. This video helps you understand how the event loop works.
I'd like to store some info in a node.js array variable (to act as a local cache) that my middleware would check before making a database query.
I know that I can do this with Redis, and it's generally the preferred method because Redis offers snapshots for persistence and is quite performant, but I can't imagine anything being more performant than a variable stored in memory.
Every time someone brings up this topic, however, folks say "memory leaks" make this a bad idea. But why? Why is node.js bad at managing server-side vars?
Is there a preferred method (outside of an external k/v db store) of managing a server-side array/cache through node.js?
The problem with using a node variable as storage is that by using it you have made your application unable to scale. Consider a large application which serves thousands of requests per second, and cannot be run on a single machine. If you spin up a second node process, it has different values for your node storage variable.
Let's say a user making an API call to your application hits machine 1, and stores a session variable. They make a second API call and this time are routed by your load balancer to machine 2. Their session variable is not found and you throw an error.
If you are writing a small application and have no expectations of scaling up in the near term, by all means use a node variable - I've done this myself before for auth tokens on small websites. You can always switch to redis later if you need to. Of course, you need to be aware that if your node process restarts, the contents of your variable will be lost.
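If you do go the in-process route, here is a rough sketch of the idea: a module-scoped Map checked before hitting the DB, with a simple TTL (assuming an Express app; getItemFromDb is hypothetical):

// Module scope: lives as long as this node process, lost on restart,
// and not shared with any other process behind a load balancer.
const cache = new Map(); // key -> { value, expiresAt }
const TTL_MS = 60 * 1000;

app.get('/items/:id', async (req, res) => {
  const key = req.params.id;
  const hit = cache.get(key);

  if (hit && hit.expiresAt > Date.now()) {
    return res.json(hit.value); // served from memory, no DB query
  }

  const value = await getItemFromDb(key); // hypothetical DB lookup
  cache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  res.json(value);
});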
I know that there is an analogous question, but it is about ASP.NET, not ASP.NET Core. Its answers are 7-9 years old and mix discussion of ASP.NET and ASP.NET Core, so relying on them may not be a good idea.
What I mean by thread safe in this case:
Is it safe to use the read/write methods (like Set(...)) of the Session (accessed via HttpContext, which is accessed via an injected IHttpContextAccessor) in multiple requests belonging to the same session?
The obvious answer would be yes, because if it were not safe, then every developer would have to make their session-accessing code thread safe...
I took a look at the DistributedSession source code, which seems to be the default (the session in my debugger, accessed as described above, is an instance of DistributedSession), and found no traces of any locking or other synchronization techniques... even the private _store member is a plain Dictionary...
How could be this thread safe for concurrent modification usage? What am I missing?
DistributedSession is created by DistributedSessionStore which is registered as a transient dependency. That means that the DistributedSessionStore itself is implicitly safe because it isn’t actually shared between requests.
The session uses a dictionary as the underlying data source, which is also local to the DistributedSession object. The session initializes the _store dictionary lazily, when the session is first accessed, by deserializing the stored data from the cache. That looks like this:
var data = _cache.Get(_sessionKey);
if (data != null)
{
    Deserialize(new MemoryStream(data));
}
So the access to _cache here is a single operation. The same applies when writing to the cache.
As for IDistributedCache implementations, you can usually expect them to be thread-safe to allow parallel access. The MemoryCache for example uses a concurrent collection as the backing store.
What all this means for concurrent requests is basically that you should not expect one request to directly impact the session of another request. Session data is usually deserialized only once per request, so updates made by other requests in the meantime will not appear.
In our ColdFusion application we have stateless model objects.
All the data I want I can get with one method call (it calls other internally without saving the state).
Methods usually ask the database for the data. All methods are read only, so I don't have to worry about thread safety (please correct me if I'm wrong).
So there is no need to instantiate objects at all. I could call them statically, but ColdFusion doesn't have static methods - calling the method would mean instantiating the object first.
To improve performance I have created singletons for every Model object.
So far it works great - each object is created once and then accessed as needed.
Now my worry is that all requests for data would go through only 1 model object.
Should I be worried? I mean, what if my object has a method getOfferData() that is time-consuming?
What if a couple of clients want to access it?
Will the second one wait for the first request to finish, or is it executed in a separate thread?
It's the same object after all.
Should I implement some kind of object pool for this?
The singleton pattern you are using won't cause the problem you are describing. If getOfferData() is still running when another request calls that function, the second call will not queue unless one of the following happens:
You use cflock to grant an exclusive lock
Your database calls queue up because of locking / transactions
You have so much running that all the concurrent threads available to ColdFusion are in use
So the way you are going about it is fine.
Hope that helps.