Nodejs multithreading vs nodejs single thread - node.js

I don't understand the difference between Java's multithreading system and Node.js's in terms of performance and resource sharing. Node.js runs your program on a single-threaded event loop, but behind the scenes it assigns tasks like file reads or DB queries to different threads, so it does use multiple threads and a thread pool (similar to Java?).
But whenever we compare performance, Node.js apps come out far better than other multithreaded systems.
How does Node.js actually handle classic multithreading challenges such as overflow or thread locking? How does it share resources between threads? For example, if I access the same file at the same time with two I/O operations, there will be two threads accessing one resource; does that apply in Node.js's multithreading system, or have I misunderstood this point?

Node.js uses libuv for this purpose, which is written in C.
That's why you can't directly compare Java and Node.js; we can say that Node.js uses a low-level mechanism to do async I/O.
libuv was designed for Node.js, but it can be used in any project.
You mentioned async disk operations - you can find a good post about it here.
Short version:
use asynchronous disk I/O, instead of the synchronous disk calls in the disk thread in 0.16.x versions.
What does this mean? It means that you can use the same approach (async low-level I/O operations), and I bet you can reach the same speed with, for example, Java.
The other thing you mentioned is the event loop. There is nothing hard about it - it is easy to understand; for example, you can read this good post.

here is my 2 pence worth ...
Multi-threading capability
Truth: Node.js (currently) does not provide native support for multi-threading in the sense of low-level execution/processing threads. Java, and its implementations/frameworks, provides native support for multi-threading, and extensively too (pre-emption, multi-tenancy, synchronous multi-threading, multi-tasking, thread pools, etc.).
Pants on Fire(ish): the lack of multi-threading in Node.js is a show-stopper. Node.js is built around an event-driven architecture, where events are produced and consumed as quickly as possible. There is native support for functional callbacks. Depending on the application design, this high-level functionality can support what would otherwise be done by threads.
For server-side applications, what matters at the application level is the ability to perform multiple tasks concurrently, i.e. multi-tasking. There are several ways to implement multi-tasking, multi-threading being one of them, and a natural fit for the task. That said, the concept of "multi-threading" is a low-level platform aspect. For instance, a multi-threaded platform such as Java, hosted on a single-core server (a server with one CPU core), still supports multi-tasking at the application level, mapped to multi-threading at the low level, but in reality only a single thread can execute at any one time. On a multi-core machine with, say, 4 cores, the same multi-tasking at the application level is supported, with up to 4 threads executing simultaneously at any given time. The point is, in most cases, what really matters is the support for multi-tasking, which is not always synonymous with multi-threading.
Back to Node.js: the real discussion should be about application design and architecture, and more specifically about support for MULTI-TASKING. In general, there is a whole paradigm shift between server-side applications and client-side or standalone applications, more so in terms of design and process flow. Among other things, server-side applications need to run alongside other applications on the server, need to be resilient and self-contained (not affect the rest of the server when the application fails or crashes), need to perform robust exception handling (i.e. recover from errors, even critical ones), and need to perform multiple tasks.
Just the ability to support multi-tasking is a critical capability for any server-side technology, and Node.js has this capability, presented in very easy-to-use packaging. This all means that the design of server-side applications needs to focus more on multi-tasking and less on just multi-threading. Yes, granted, working on a server-side platform that supports multi-threading has its obvious benefits (enhanced functionality, performance), but that alone does not address the need to support multi-tasking at the application level. Any solid design for server-side applications, Node.js included, must be based on multi-tasking through event generation and consumption (event processing). In Node.js, the key is the use of function callbacks and small event processors (as functions), with data checkpointing (saving processing data in files or databases) across event-processing instances.

Related

Node.js vs ASP.Net Core: Response times when doing I/O heavy operations under stress test?

Let's assume we are stress testing 2 servers that perform database read/write operations without caching, and make network calls. One is a Node.js server, and the other is an ASP.Net Core one that utilizes Tasks. Assume that both perform the same operations, receive the same requests/second, and run on machines with equal processing power.
I've read that Node.js is very performant on I/O operations, but I can't get my head around it. Sure, the awaited DB and network calls run in a non-blocking way, but each of these calls is handled by an individual thread from a limited thread pool. That doesn't sound performant to me at all. I would be forced to implement a caching mechanism to mitigate it, which is something I don't really like. So either Node.js is not the greatest choice for these kinds of operations, or I have incorrect knowledge.
As for ASP.NET Core, I don't know its internals, but I'm pretty sure it doesn't have the thread-limitation issues Node.js has, so by that logic it should have shorter response times. Yet I still can't know for sure, because of concerns about resource consumption and context-switching costs.
So which of these 2 would theoretically have shorter response times, and why exactly?

Efficient way to process many threads of same Application

I have a multi-client, single-server application where clients and the server are connected through sockets. Client and server are on different machines.
In the client application, the client socket connects to the server and sends data to it periodically.
In the server application, the server socket listens for clients to connect. When a client connects, a new thread is created for that client to receive data.
For example: 1 client = 1 thread created by the server for receiving data. With 10,000 clients, the server creates 10,000 threads. This seems neither good nor scalable.
My application is in Java.
Is there an alternate method for this problem?
Thanks in advance
This is a typical C10K problem. There are patterns to solve it; one example is the Reactor pattern.
Java NIO is another way, where incoming requests can be processed in a non-blocking way. See a reference implementation here.
Yes, you should not need a separate thread for each client. There's a good tutorial here that explains how to use await to handle asynchronous socket communication. Once you receive data over the socket, you can use a fixed number of threads. The tutorial also covers techniques to handle thousands of simultaneous connections.
Unfortunately, given the complexity, it's not possible to post the code here, so although link-only answers are frowned upon...
I would say it's a perfect candidate for an Erlang/Elixir application. WhatsApp, RabbitMQ...
Erlang processes are cheap and fast to start; Erlang manages the scheduling for you, so you don't have to think about the number of threads, CPUs, or even machines; and Erlang handles garbage collection for each process once you don't need it anymore.
Haskell is slow; Erlang is fast enough for most applications that are not doing heavy calculations, and even then you can use it and hand the heavy lifting off to a C process.
What are you writing in?
Yes, you can use the Actor model, with e.g. Akka or Akka.net. This allows you to create millions of actors that run on e.g. 4 threads. Erlang is a programming language that implements the actor model natively.
However, actors and non-blocking code won't do you much good if you are relying on blocking library calls for backend services that you rely on, such as (the most prominent example in the JVM world) JDBC calls.
There is also a rather interesting approach that Haskell uses, called green threads. It means that the runtime threads are very lightweight and are dynamically mapped to OS threads. It also means that you get a certain amount of scalability "for free", with no need to write non-blocking IO code. It does however require a good IO manager in the runtime to schedule the IO operations efficiently, and GHC Haskell has had a substantial amount of work put into that in recent years.

Play Framework and Node.js non-blocking behaviour for relational databases

Play Framework advises relaying blocking I/O to an appropriately sized thread pool, as in:
https://www.playframework.com/documentation/2.5.x/ThreadPools
This is the case for relational database access because there are no non-blocking JDBC drivers available (with few exceptions).
I am currently learning about Node.js and I couldn't figure out how this is handled in Node. I didn't see any need to write code with thread pools in mind in Node.
So, are the relational database drivers used in node.js able to do non-blocking IO? Or are these computations being relayed to some kind of worker threads behind the scenes?
In a broader sense: what is the correct way to code a Node.js app that is very DB (relational) intensive?
Node is single-threaded, so there are no user thread pools[1]. Instead you need to scale horizontally with more Node servers. And you can do that within a Node app: https://devcenter.heroku.com/articles/node-concurrency
And on another note, I've had good success with the async-JDBC-ish postgresql-async driver. I've used it with jdub-async and scalikejdbc. Here is a blog I wrote on using it with scalikejdbc: https://www.jamesward.com/2015/04/07/reactive-postgres-with-play-framework-scalikejdbc
[1] User code runs single-threaded (though you can use web workers to get threads); libuv itself, however, is multi-threaded. Read more: How the single threaded non blocking IO model works in Node.js
I think you basically answered your own question: in Node.js you don't have to code in terms of thread pools and the like. DB thread pools in Play are inherent to the Java JDBC API. Pure Node.js DB drivers are asynchronous by design; the architecture of a Node.js wrapper driver depends on that of the wrapped library.
The answer to the broader question is:
There is not much difference between how you code DB-intensive apps in Node.js or Java, as your bottleneck will most probably be the persistent storage behind your DB, regardless of platform. But in asynchronous architectures:
it is more natural to design a system that doesn't overwhelm your DB with too much load
in case of a DB slowdown, the application itself usually will not demand more system resources
A good DB driver will let you achieve the points above with managed connection pools, per-query timeouts, and per-connection query queues, though some of those could also be features of the native DB interface.
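To illustrate the "managed connection pool + query queue" point, here is a simplified, hypothetical model (`SimplePool` is invented for this sketch, not a real driver API; drivers such as node-postgres provide this for you): queries beyond the pool size wait in a queue instead of piling load onto the DB.

```javascript
// Hypothetical, simplified pool: at most `size` queries run at once;
// the rest wait in a FIFO queue. Real drivers add timeouts, connection
// reuse, validation, etc.
class SimplePool {
  constructor(size) {
    this.size = size;
    this.active = 0;
    this.queue = [];
  }

  // `runQuery` is an async function representing one DB query.
  run(runQuery) {
    return new Promise((resolve, reject) => {
      this.queue.push({ runQuery, resolve, reject });
      this.drain();
    });
  }

  drain() {
    while (this.active < this.size && this.queue.length > 0) {
      const { runQuery, resolve, reject } = this.queue.shift();
      this.active += 1;
      Promise.resolve()
        .then(runQuery)
        .then(resolve, reject)
        .finally(() => {
          this.active -= 1;
          this.drain(); // start the next queued query, if any
        });
    }
  }
}
```

This is the mechanism behind the first bullet point: load beyond the pool size queues up inside the application instead of overwhelming the database.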

Nodejs to utilize all cores on all CPUs

I'm going to create a multithreaded application that heavily utilizes all cores on all CPUs, doing some intensive I/O (web crawling) and then intensive CPU work (analysis of the crawled streams). Is Node.js good for that, since it's single-threaded and I don't want to run a couple of Node.js instances (one per core) and sync between them? Or should I consider some other platform?
Node is perfect for that; it is actually named Node as a reference to the intended topology of its apps: multiple (distributed) nodes that communicate with each other.
Take a look at the built-in cluster module, which handles forking multiple worker processes and sharing server ports between them.
Further reading
Multi Core NodeJS App, is it possible in a single thread framework? by Cristian Ramirez on Codeburst
Scaling NodeJS Applications by Samer Buna on FreeCodeCamp
The JavaScript V8 engine was made to work with async tasks running on one core. However, that doesn't mean you can't have multiple cores running the same, or different, applications that communicate with each other.
You just have to be aware of some multi-core problems that might occur.
For example, if you are going to share LOTS of information between threads, then perhaps this is not the best language for you.
On the multi-core front, I have recently been introduced to Elixir, based on Erlang (http://elixir-lang.org/).
It is a really cool language, developed 100% with multi-threaded applications in mind. It was made to make it easy to build very fast applications that can scale across as many cores as you want/can have.
Back to Node: the answer is yes, it can use multiple cores (via multiple processes), but it's up to you to decide what to continue with. Take a look at this answer, and you might clarify your mind: Node.js on multi-core machines

Why are event-based network applications inherently faster than threaded ones?

We've all read the benchmarks and know the facts - event-based asynchronous network servers are faster than their threaded counterparts. Think lighttpd or Zeus vs. Apache or IIS. Why is that?
I think event-based vs. thread-based is not the question - it is a non-blocking, multiplexed I/O (selectable sockets) solution vs. a thread-pool solution.
In the first case you are handling all input as it comes in, regardless of what is using it, so there is no blocking on the reads - a single 'listener'. The single listener thread passes data on to worker threads of different types, rather than one per connection. Again, there is no blocking on writing any of the data, so the data handler can just run with it separately. Because this solution is mostly I/O reads/writes, it doesn't occupy much CPU time - your application can use that time to do whatever it wants.
In a thread-pool solution you have individual threads handling each connection, so they have to share time to context-switch in and out - each one 'listening'. In this solution the CPU and I/O operations are in the same thread, which gets a time slice, so you end up waiting on I/O operations to complete per thread (blocking), which could otherwise be done without using CPU time.
Google non-blocking I/O for more detail - and you can probably find some comparisons vs. thread pools too.
(if anyone can clarify these points, feel free)
Event-driven applications are not inherently faster.
From Why Events Are a Bad Idea (for High-Concurrency Servers):
We examine the claimed strengths of events over threads and show that the weaknesses of threads are artifacts of specific threading implementations and not inherent to the threading paradigm. As evidence, we present a user-level thread package that scales to 100,000 threads and achieves excellent performance in a web server.
This was in 2003. Surely the state of threading on modern OSs has improved since then.
Writing the core of an event-based server means re-inventing cooperative multitasking (Windows 3.1 style) in your code, most likely on an OS that already supports proper pre-emptive multitasking, and without the benefit of transparent context switching. This means that you have to manage state on the heap that would normally be implied by the instruction pointer or stored in a stack variable. (If your language has them, closures ease this pain significantly. Trying to do this in C is a lot less fun.)
This also means you get all of the caveats cooperative multitasking implies. If one of your event handlers takes a while to run for any reason, it stalls that event thread, and totally unrelated requests lag. Even lengthy CPU-intensive operations have to be sent somewhere else to avoid this. When you're talking about the core of a high-concurrency server, 'lengthy operation' is a relative term: on the order of microseconds for a server expected to handle 100,000 requests per second. I hope the virtual memory system never has to pull pages from disk for you!
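The stall is easy to demonstrate in any event-driven runtime; a Node.js sketch, where a 100 ms busy loop stands in for the "lengthy operation":

```javascript
// A timer that wants to fire in 1 ms...
const t0 = Date.now();
setTimeout(() => {
  console.log(`timer fired after ${Date.now() - t0} ms (asked for 1 ms)`);
}, 1);

// ...but a synchronous "event handler" hogs the loop for ~100 ms,
// so the timer (and every other pending event) must wait.
const busyUntil = Date.now() + 100;
while (Date.now() < busyUntil) { /* burn CPU */ }
```

The timer callback cannot run until the busy loop yields control back to the event loop, so it fires over 100 ms late; every other connection's events would be delayed the same way.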
Getting good performance from an event-based architecture can be tricky, especially when you consider latency and not just throughput. (Of course, there are plenty of mistakes you can make with threads as well. Concurrency is still hard.)
A couple important questions for the author of a new server application:
How do threads perform on the platforms you intend to support today? Are they going to be your bottleneck?
If you're still stuck with a bad thread implementation: why is nobody fixing this?
It really depends on what you're doing; event-based programming is certainly tricky for nontrivial applications. Being a web server is a very trivial, well-understood problem, and both event-driven and threaded models work pretty well on modern OSs.
Correctly developing more complex server applications in an event model is generally pretty tricky - threaded applications are much easier to write. This may be the deciding factor rather than performance.
It isn't really about the threads. It's about the way the threads are used to service requests. For something like lighttpd you have a single thread that services multiple connections via events. Older versions of Apache had a process per connection, and each process woke up on incoming data, so you ended up with a very large number of processes when there were lots of requests. Now, however, Apache is event-based as well with the event MPM; see Apache MPM event.
