Share memory between multiple processes in a Node.js environment - node.js

So here is the problem I'm thinking about:
One physical server runs a Node.js HTTP server through the cluster module, which means there are multiple separate processes. Every second I receive a large number of requests (5000-10000); each process counts its incoming requests separately, and then they aggregate these statistics in memcache.
Such an architecture costs additional processor time on I/O operations, plus a large extra service running on the same server.
What I'm thinking about is creating a small service that allocates some memory for the request counters. When the HTTP server processes start, they connect to this service and receive from it a pointer to the memory where the counter is located, so they can increment and read the number directly, without intermediate service commands.
Question: Is there any way to allocate memory in one process and then hand a pointer to that memory to a set of other processes, so that those processes can read and write the memory directly? And this should be possible in Node.js.
Answer: After some research I came across the shared-memory system calls and used them in a self-written Node.js addon, which allowed me to use a single memory block among multiple processes. The disadvantage of this method is that only primitive types (char, int) are allowed.
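The same counter idea is available with built-in APIs for worker threads inside a single process: a SharedArrayBuffer plus Atomics gives lock-free shared counters. A minimal sketch (note this shares memory between threads, not between separate cluster processes, which is what the native addon above solved):

const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
  const shared = new SharedArrayBuffer(4);        // one 32-bit counter
  const counter = new Int32Array(shared);
  const workers = Array.from({ length: 4 }, () =>
    new Worker(__filename, { workerData: shared }));
  Promise.all(workers.map(w => new Promise(res => w.on('exit', res))))
    .then(() => console.log('total requests:', Atomics.load(counter, 0))); // 4000
} else {
  const counter = new Int32Array(workerData);
  for (let i = 0; i < 1000; i++) {
    Atomics.add(counter, 0, 1);                   // atomic, lock-free increment
  }
}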

Related

Do the "get" operation consume additional memory

I have a key in Redis, let's call it 'key', so it consumes some amount of memory.
Then I have a Node.js application with a Redis driver, and I get that 'key' through a simple API call:
var data = await redis.get('key')
So my question is: does the 'data' variable create new memory consumption, or does it use the original memory address of 'key' that is currently used by Redis?
Yes, it is new and also different memory.
Redis runs in one process and your Node application runs in another. Two processes have different memory spaces; they are not shared. So when you grab the key, that data now also exists somewhere in your Node.js memory.
That's why you are using a client that goes over the network, over TCP, to grab the data: the memory is not shared.
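A quick way to see the copy in action (a sketch assuming the ioredis client; the question doesn't name a specific driver): mutating the local variable leaves the value stored in Redis untouched.

const Redis = require('ioredis');
const redis = new Redis();                    // assumes Redis on localhost:6379

async function demo() {
  await redis.set('key', 'original');
  let data = await redis.get('key');          // the value is copied into Node's heap
  data = data.toUpperCase();                  // changes only the local copy
  console.log(data);                          // 'ORIGINAL'
  console.log(await redis.get('key'));        // still 'original' inside Redis
  redis.disconnect();
}
demo();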

Worker thread communication protocol

I stumbled upon worker threads in Node.js and started investigating, at a lower level of abstraction, how intercommunication and data sharing work between them, especially the postMessage function that is used to send message data between threads.
Looking at this line of code, const { Worker, isMainThread, parentPort } = require('worker_threads');, one would guess that it uses sockets to communicate, since the keyword port is used, but I found no open port connections when searching for them through the command prompt.
I want to understand which communication protocol the worker_threads mechanism uses. Is it TCP, or some other mechanism for sharing data and messages between threads? This is research I want to undertake in order to understand the efficiency of transmitting large amounts of data between worker threads versus IPC communication between child processes using memory sharing/TCP.
Workers don't communicate with their parents through TCP/IP or any other interprocess communication protocol. The messages passed between workers, and between workers and parents, pass data back and forth directly. Workers and their parents share a single address space and V8 instance.
.postMessage() looks like this, where transferList is optional.
port.postMessage(value, transferList)
The items in the first parameter are cloned, and the copies are passed from the sender to the receiver of the message.
The items in the second parameter are passed without cloning. Only certain data types (such as ArrayBuffer) can be transferred this way. The sender loses access to these items and the receiver gains access. This saves the cloning time and makes it much quicker to pass large data structures like images. This sort of messaging works in both browser and nodejs code.
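A minimal sketch of the transfer mode, using the worker_threads API shown above: the ArrayBuffer placed in the transferList is moved rather than cloned, and the sender's handle is detached afterwards.

const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  const worker = new Worker(__filename);
  const buf = new ArrayBuffer(8 * 1024 * 1024);   // e.g. image-sized data
  worker.postMessage(buf, [buf]);                 // transferred, not cloned
  console.log(buf.byteLength);                    // 0: the sender lost access
  worker.on('message', msg => { console.log(msg); worker.terminate(); });
} else {
  parentPort.on('message', buf => {
    parentPort.postMessage(`worker received ${buf.byteLength} bytes`);
  });
}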
Child processes can pass data back and forth between the spawned process and the spawning process. That data is copied and pushed through the interprocess communication channel set up at spawn time. In many cases the IPC mechanism uses OS-level pipes; those are as efficient as any IPC mechanism, especially on UNIX-derived OSs. Child processes do not share memory with their parents, so they can't use a transferList change-of-ownership scheme directly.
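For comparison, a sketch of that copy-based channel between a parent and a child created with child_process.fork(); every message is serialized and copied across the IPC pipe rather than shared.

const { fork } = require('child_process');

if (!process.send) {
  // Parent: fork this same file; fork() sets up the IPC channel automatically.
  const child = fork(__filename);
  child.send({ big: 'payload' });                 // serialized and copied
  child.on('message', msg => {
    console.log('parent got:', msg);
    child.disconnect();                           // close the IPC channel
  });
} else {
  // Child: receives a copy, never a shared reference.
  process.on('message', msg => process.send({ echo: msg }));
}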

Why is Redis single threaded (event driven)?

I am trying to understand the basics of Redis.
One thing that keeps coming up everywhere is that Redis is single threaded, and that this is what makes operations atomic. But I am unable to imagine how this works internally. I have the following doubt.
Don't we design a server as single threaded when it is an I/O-bound application (like Node.js), where the thread is freed for another request after initiating an I/O operation, and data is returned to the client once the I/O operation finishes (providing concurrency)? But in the case of Redis all data is available in main memory, so we are not going to do I/O operations at all. Then why is Redis single threaded? What happens if the first request takes too much time; will the remaining requests have to keep waiting?
TL;DR: Single thread makes redis simpler, and redis is still IO bound.
Memory is I/O. Redis is still I/O bound. When redis is under heavy load and reaches maximum requests per second it is usually starved for network bandwidth or memory bandwidth, and is usually not using much of the CPU. There are certain commands for which this won't be true, but for most use cases redis will be severely I/O bound by network or memory.
Unless memory and network speeds suddenly get orders of magnitude faster, being single threaded is usually not an issue. If you need to scale beyond one or a few threads (ie: master<->slave<->slave setup) you are already looking at Redis Cluster. In that case you can set up a cluster instance per CPU core if you are somehow CPU starved and want to maximize the number of threads.
I am not very familiar with the redis source or internals, but I can see how using a single thread makes it easy to implement lockless atomic actions. Threads would make this more complex and don't appear to offer large advantages, since redis is not CPU bound. Implementing concurrency at a level above a redis instance seems like a good solution, and is what Redis Sentinel and Redis Cluster help with.
What happens to other requests when redis takes a long time?
Those other requests will block while redis completes the long request. If needed, you can test this using the CLIENT PAUSE command.
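A sketch of that test (assuming the ioredis client; CLIENT PAUSE itself is a standard Redis command): while the server is paused, a second client's GET simply waits in line.

const Redis = require('ioredis');

async function demo() {
  const a = new Redis();
  const b = new Redis();
  await a.call('CLIENT', 'PAUSE', '2000');         // stop processing commands for 2 s
  const t0 = Date.now();
  await b.get('key');                              // blocks until the pause ends
  console.log(`GET took ~${Date.now() - t0} ms`);  // roughly 2000 ms
  a.disconnect();
  b.disconnect();
}
demo();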
The correct answer is Carl's, of course. However.
In Redis v4 we're seeing the beginning of a shift from being mostly single threaded to selectively and carefully multi threaded. Modules and thread-safe contexts are one example of that. Another two are the new UNLINK command and ASYNC mode for FLUSHDB/FLUSHALL. Future plans are to offload more work that's currently being done by the main event loop (e.g. IO-bound tasks) to worker threads.
From the redis website:
Redis uses a mostly single threaded design. This means that a single process serves all the client requests, using a technique called multiplexing. This means that Redis can serve a single request in every given moment, so all the requests are served sequentially. This is very similar to how Node.js works as well. However, both products are not often perceived as being slow. This is caused in part by the small amount of time to complete a single request, but primarily because these products are designed to not block on system calls, such as reading data from or writing data to a socket.
I said that Redis is mostly single threaded since actually from Redis 2.4 we use threads in Redis in order to perform some slow I/O operations in the background, mainly related to disk I/O, but this does not change the fact that Redis serves all the requests using a single thread.
Memory is not an I/O operation.

AWS kernel is killing my node app

Problem:
I am executing a test of my mongoose query, but the kernel kills my node app for OutOfMemory reasons.
Flow scenario (for a single request):
GET request
-> READ the user document (e.g. schema) [this schema has a ref to the user schema in one of its fields]
-> COMPILE/REARRANGE the output of the query read from MongoDB [this involves filtering and looping over data] into the response format required by the client
-> UPDATE a field of this document and SAVE it back to MongoDB
-> UPDATE Redis
-> SEND the response (the compiled response above) back to the requesting client
The above fails when 100 concurrent customers do the same:
MEM - drops very low (<10 MB)
CPU - maxes out (>98%)
What I could figure out is that the rate at which reads and writes occur is choking MongoDB by queuing all requests, thereby delaying Node.js, which causes such drastic CPU and MEM values until the app finally gets killed by the kernel.
PLEASE suggest how I should proceed to achieve concurrency in such flows.
You've now met the Linux OOM killer. Basically, all Linux kernels (not just Amazon's) need to take action when they've run out of RAM, so they need to find a process to kill. Generally, this is the process that has been asking for the most memory.
Your 3 main options are:
Add swap space. You can create a swapfile on the root disk if it has enough space, or create a small EBS volume, attach it to the instance, and configure it as swap.
Move to an instance type with more RAM.
Decrease your memory usage on the instance, either by stopping/killing unused processes or reconfiguring your app.
Option 1 is probably the easiest for short-term debugging. For production performance, you'd want to look at optimizing your app's memory usage or getting an instance with more RAM.
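For option 3, one app-level change that also speaks to the asker's "concurrency in such flows" question is to cap how many request flows run at once, so excess requests queue cheaply instead of piling up memory. A sketch (the createLimiter helper, the limit of 20, and handleUserFlow are illustrative, not part of the answer):

function createLimiter(max) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    task().then(resolve, reject).finally(() => { active--; next(); });
  };
  return task => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}

const limit = createLimiter(20);   // at most 20 flows touch MongoDB at once

// Usage inside a request handler (handleUserFlow is hypothetical):
// app.get('/users/:id', (req, res) => {
//   limit(() => handleUserFlow(req.params.id)).then(result => res.json(result));
// });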

Is node.js a one-process server?

Is node.js a one-process server, or can it emulate Apache's bunch of child processes, each serving a different request and each independent from the others (with cycling of child processes to avoid long-term memory leaks)?
Is that at all needed when using node.js?
Node.js by default is a one process server. For most purposes that's all that's needed (IO limits and memory limits are typically reached before CPU limits).
If you need more processes you can use http://learnboost.github.com/cluster/
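That library has since been superseded by Node's built-in cluster module; a minimal sketch of one worker per core, with crashed workers replaced (which also covers the process-cycling idea from the question):

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) {                          // `isMaster` on older Node versions
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
  cluster.on('exit', () => cluster.fork());       // replace any worker that dies
} else {
  http.createServer((req, res) => {
    res.end(`handled by pid ${process.pid}\n`);
  }).listen(3000);                                // workers share the same port
}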
It's a single process and single threaded, due to the fact that Node is non-blocking and event-based. This means the single process can handle many requests at the same time, sending a response back whenever the response is ready.
The key point to note is that Node is non-blocking.
