We have done some experiments with the continuation-local-storage, and it seems to solve a specific problem nicely.
What makes me nervous is that it relies on a polyfill of process.addAsyncListener that seems to have been deprecated ages ago.
This makes me worry a bit about using it for production code, so is it safe to use?
Our particular code base still has to run on node 6 on some servers for some time yet (it runs on node 8 on most servers).
The problem
This module helps us creating an execution context that is kept during async operations.
What we specifically add to this context are bunyan child loggers, initialized with context data, such as request ID/message ID/correlation ID.
This helps us to avoid having to pass tramp data through the layers, yet still being able to track all log messages associated with a specific request.
Related
I am having a small problem with an application ran by pm2 cluster mode. Normally everything is working fine, but due to the logic of my application and recently switching to cluster mode i am now facing an issue, i can't handle properly without refactoring my application from the ground.
My application uses express for http-request handling and uses also global variables to store data, timers, etc. Now after switching to pm2 cluster mode, only one of the instances has a value, but the others don't. Thats resulting in problems, because of inconsistencies over the different instances. The behaviour is clear, but i would have to refactor many things to make the application in whole work properly again.
I already saw things like the INSTANCE_VAR, but could not find out how that could help me.
All i can think of at the moment is, am i able to force pm2 to send a http request to all instances simultanously, or if not can i tell pm2 to handle my request with a specific instance, which i define on the runtime from the outside and without interfering the other instances?
I am actually having something similar were i want a specific api request to make an update available registered only on process. Here i found a way to send a message/event to a specific process, i think it may help
I'm using node-cache to cache data from a CLI application that observes changes in files and caches them to avoid new data processing.
the problem is that I noticed that this cache is destroyed on each command, since each time the tool is called in the terminal a new instance is generated and the old one is destroyed. probably, the data is also destroyed.
I need to keep, for a specific TTL, two things in cache/memory, even if the process ends:
the processed data
the specific instance of fs.watcher, watching and executing caching operations
the question is: how do i do it? I've been searching for days on the internet and trying alternatives and I can't find a solution.
I need to keep ... things in cache/memory, even if the process ends
That's, pretty much by definition, not possible. When a process terminates, all its resources are freed up for use by something else (barring a memory-leak bug in the OS itself).
It sounds like you need to refactor your app into a service that can run in the background and separate front-end that can communicate with it.
I got several applications working with Node on the back-end and React on the front-end, it works great, I do axios get and post requests from React to Express and I get data back and forth, then on production I use pm2 to get everything up and running.
My question is when two users access the same application at the same time, how does Node treat this, as two separated instances or just one?.
I am considering using socket.io to be able to notify the front-end on changes that are happening on Node, and I wonder if those notifications will be emitted from the back-end no matter what another user might be doing or not.
Thanks.
As you have probably heard node.js is addressed as a "single-threaded" runtime. This is only partially true. Even though node runs on a single thread of your processor it runs the majority of its tasks in a thread pool which can process up to 4 tasks at the same time.
If you want to know about this you might want to look into the node event loop which describes the steps node goes through on each "tick".
So as you see node can often not process one but up to 4 actions on each loop cycle. But there is more, to solve the performance issues that might occur on big applications you can run node on a cluster mode. This allows you to extend the thread pool and add multiple node instances and therefore handle high demand efficiently.
One note to your socket.io question. As you see a high demand of tasks gets queued until it is handled in the node event loop, so sometimes you need to wait. Fortunatly we are in a race of big tech to create the fastest JS-runtime so this thing is pretty fast.
I'm currently writing a Node library to execute untrusted code within Docker containers. It basically maintains a pool of containers running, and provides an interface to run code in one of them. Once the execution is complete, the corresponding container is destroyed and replaced by a new one.
The four main classes of the library are:
Sandbox. Exposes a constructor with various options including the pool size, and two public methods: executeCode(code, callback) and cleanup(callback)
Job. A class with two attributes, code and callback (to be called when the execution is complete)
PoolManager, used by the Sandbox class to manage the pool of containers. Provides the public methods initialize(size, callback) and executeJob(job, callback). It has internal methods related to the management of the containers (_startContainer, _stopContainer, _registerContainer, etc.). It uses an instance of the dockerode library, passed in the constructor, to do all the docker related stuff per se.
Container. A class with the attributes tmpDir, dockerodeInstance, IP and a public method executeCode(code, callback) which basically sends a HTTP POST request against ContainerIP:3000/compile along with the code to compile (a minimalist API runs inside each Docker container).
In the end, the final users of the library will only be using the Sandbox class.
Now, my question is: how should I test this?
First, it seems pretty clear to my that I should begin by writing functional tests against my Sandbox class:
it should create X containers, where X is the required pool size
it should correctly execute code (including the security aspects: handling timeouts, fork bombs, etc. which are in the library's requirements)
it should correctly cleanup the resources it uses
But then I'm not sure what else it would make sense to test, how to do it, and if the architecture I'm using is suitable to be correctly tested.
Any idea or suggestion related to this is highly appreciated! :) And feel free to ask for a clarification if anything looks unclear.
Christophe
Try and separate your functional and unit testing as much as you can.
If you make a minor change to your constructor on Sandbox then I think testing will become easier. Sandbox should take a PoolManager directly. Then you can mock the PoolManager and test Sandbox in isolation, which it appears is just the creation of Jobs, calling PoolManager for Containers and cleanup. Ok, now Sandbox is unit tested.
PoolManager may be harder to unit test as the Dockerode client might be hard to mock (API is fairly big). Regardless if you mock it or not you'll want to test:
Growing/shrinking the pool size correctly
Testing sending more requests than available containers in the pool
How stuck containers are handled. Both starting and stopping
Handling of network failures (easier when you mock things)
Retries
Any of failure cases you can think of
The Container can be tested by firing up the API from within the tests (in a container or locally). If it's that minimal recreating it should be straightforward. Once you have that it's really just testing an HTTP client it sounds like.
The source code for the actual API within the container can be tested however you like with standard unit tests. Because you're dealing with untrusted code there are a lot of possibilities:
Doesn't compile
Never completes execution
Never starts
All sorts of bombs
Uses all host's disk space
Is a bot and talks over the network
The code could do basically anything. You'll have to pick the things you care about. Try and restrict everything else.
Functional tests are going to be important too, there are a log of pieces to deal with here and mocking Docker isn't going to be easy.
Code isolation is a difficult problem; I wish Docker was around last time I had to deal with it. Just remember that your customers will always do things you didn't expect! Good luck!
I have a Node app which accesses a static, large (>100M), complex, in-memory data structure, accepts queries, and then serves out little slices of that data to the client over HTTP.
Most queries can be answered in tenths of a second. Hurray for Node!
But, for certain queries, searching this data structure takes a few seconds. This sucks because everyone else has to wait.
To serve more clients efficiently, I would like to use some sort of parallelism.
But, because this data structure is so large, I would like to share it among the workers or threads or what have you, so I don't burn hundreds of megabytes. This would be perfectly safe, because the data structure is not going to be written to. A typical 'fork()' in any other language would do it.
However, as far as I can tell, all the standard ways of doing parallelism in Node explicitly make this impossible. For safety, they don't want you to share anything.
But is there a way?
Background:
It is impractical to put this data structure in a database, or use memcached, or anything like that.
WebWorker API libraries and similar only allow short serialized messages to be passed in and out of the workers.
Node's Cluster uses a call named 'fork', but it is not really a fork of the existing process, it is spawning a new one. So once again, no shared memory.
Probably the really correct answer would be to use filesystem-like access to shared memory, aka tmpfs, or mmap. There are some node libraries that make mount() and mmap() available for exactly something like this. Unfortunately then one has to implement complex data structure access on top of synchronous seeks and reads. My application uses arrays of arrays of dicts and so on. It would be nice to not have to reimplement all that.
I tried write a C/C++ binding of shared memory access from nodejs. https://github.com/supipd/node-shm
Still work in progress (but working for me), maybe usefull, if bug or suggestion, inform me.
building with waf is old style (node 0.6 and below), new build is with gyp.
You should look at node cluster (http://nodejs.org/api/cluster.html). Not clear this is going to help you without having more details, but this runs multiple node processes on the same machine using fork.
Actually Node does support spawning processes. I'm not sure how close Node's fork is to real fork, but you can try it:
http://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options
By the way: it is not true that Node is unsuited for that. It is as suited as any other language/web server. You can always fire multiple instances of your server on different ports and put a proxy in front.
If you need more memory - add more memory. :) It is as simple as that. Also you should think about putting all of that data on a dedicated in-memory database like Redis or Memcached ( or even Couchbase if you need complex queries ). You won't have to worry about duplicating that data any more.
Most web applications spend the majority of their life waiting for network buffers and database reads. Node.js is designed to excel at this io bound work. If your work is truly bound by the CPU, you might be served better by another platform.
With that out of the way...
Use process.nextTick (perhaps even nested blocks) to make sure that expensive CPU work is properly asynchronous and not allowed to block your thread. This will make sure one client making expensive requests doesn't negatively impact all the others.
Use node.js cluster to add a worker process for each CPU in the system. Worker processes can all bind to a single HTTP port and use Memcached or Redis to share memory state. Workers also have a messaging API that can be used to keep an in-process memory cache synchronized, however it has some consistency limitations.