Nodejs to utilize all cores on all CPUs - multithreading

I'm going to create a multithreaded application that fully utilizes all cores on all CPUs, doing some intensive I/O (web browsing) and then intensive CPU work (analysis of the crawled streams). Is NodeJS good for that, given that it's single-threaded and I don't want to run several NodeJS instances (one per core) and sync between them? Or should I consider some other platform?

Node is perfect for that; it is actually named Node in reference to the intended topology of its apps: multiple (distributed) nodes that communicate with each other.
Take a look at the built-in cluster module, which forks multiple worker processes (typically one per core) that share the same server port. Note that these are separate processes, not threads, so they do not share memory.
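As a rough illustration of the cluster approach, here is a minimal sketch. The helper names (workerCount, startPrimary, startWorker) and port 3000 are assumptions made up for this example, not part of the cluster API; only cluster.fork, cluster.isPrimary, the 'exit' event and http.createServer come from Node itself.

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// Hypothetical helper: one worker per logical core, never fewer than one.
function workerCount() {
  const cores = typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;
  return Math.max(1, cores);
}

// Runs in the primary process: fork the workers and replace any that die.
function startPrimary(count = workerCount()) {
  for (let i = 0; i < count; i++) cluster.fork();
  cluster.on('exit', () => cluster.fork());
}

// Runs in each worker process: all workers listen on the same port,
// and incoming connections are distributed among them.
function startWorker(port = 3000) {
  return http
    .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
    .listen(port);
}

// Entry point, left commented out so that loading this file has no side effects:
// if (cluster.isPrimary) startPrimary(); else startWorker();
```

Uncommenting the last line turns the file into a runnable server. Each worker is a full Node process with its own memory, so anything the workers need to share has to go through messages or an external store.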
Further reading
Multi Core NodeJS App, is it possible in a single thread framework? by Cristian Ramirez on Codeburst
Scaling NodeJS Applications by Samer Buna on FreeCodeCamp

The V8 JavaScript engine was made to run async tasks on one core. However, that doesn't mean you can't have multiple cores running the same application, or different applications, that communicate with each other.
You just have to be aware of some multi-core problems that might occur.
For example, if you are going to share LOTS of information between threads, then perhaps this is not the best language for you.
Speaking of languages built for multiple cores, I was recently introduced to Elixir, which is based on Erlang (http://elixir-lang.org/).
It is a really cool language, designed from the ground up for concurrent applications. It aims to make it easy to write very fast applications that scale to as many cores as you want or have.
Back to Node: the answer is yes, it supports multi-threading, but it is up to you to decide which to go with. Take a look at this answer, which might clear things up: Node.js on multi-core machines
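For the CPU-heavy part of the question, one option that exists in current Node versions is the built-in worker_threads module. The sketch below is only an illustration: the function name sumInWorker and the toy summation workload are made up for this example, and the worker source is kept inline (eval: true) purely so the example fits in one file.

```javascript
const { Worker } = require('node:worker_threads');

// Worker code as a string; it runs on its own thread with its own event loop.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i; // stand-in for real CPU work
  parentPort.postMessage(sum);
`;

// Hypothetical helper: run the computation off the main thread
// and resolve with the result the worker posts back.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

Calling sumInWorker(100).then(console.log) keeps the main thread free to serve I/O while the loop runs elsewhere. Data passed through workerData and postMessage is copied, which is the cost the answer above warns about when a lot of information has to be shared.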

Related

Would it be really an advantage to achieve multi-threading using web workers in NodeJS?

I think the question is pretty explicit. JavaScript is single-threaded, and NodeJS still achieves impressive performance. It might seem obvious that multi-threading would improve NodeJS performance further, but that may be wrong in some cases.
For example, I'm currently building a starter project using NextJS. I wonder if handling each request in a separate thread would be worth it.
Thank you!
As far as I know, in production NodeJS is "usually" deployed as:
an nginx server (used as a security layer and as an HTTPS proxy)
a number of child NodeJS processes (amount === number of cores)
That means that all cores are used,
each request is processed by a single core,
the server processes several requests at once
=== UPDATE ===
If you want to split the processing of a single request across several threads, remember that cross-process communication is expensive in NodeJS, so delegate only large tasks to other threads/web workers.
If you see the need to split the app into several threads, consider a microservices architecture, e.g. using http://senecajs.org/

Sockets server made with Erlang vs others

I am learning Erlang and trying to understand how its sockets work as it is meant to be one of the strongest parts of the language and OTP.
I have experience with NodeJS and wonder how applications made with NodeJS and Erlang differ with regard to how multiple socket connections are managed.
As I understand it, while JavaScript is single-threaded, V8 manages all the simultaneous connections for it, whereas Erlang can manage multiple connections itself.
So, I wonder: if Erlang has excellent support for managing multiple connections at a time, how is it different from other technologies for a programmer? I mean, when I write an app for NodeJS, it can have as many connections open and well-managed as if I had written the code in Erlang, can't it?
Please share your thoughts, links to some articles about the specialties of Erlang in this context are welcome too.
I am by no means an expert in Erlang, but I think I know Erlang and NodeJs on the same level.
What you say is all correct. Both can handle multiple connections very efficiently, "well-managed" as you put it.
But the problem is not only handling multiple concurrent connections. The problems Erlang tries to solve very well are fail safety and distribution. I don't think NodeJs, as it is now, is as good at them.
Don't get me wrong, I'm not saying no one can code a distributed app in NodeJs, but considering the tools Erlang gives you, it may be a better choice.
For fail safety, as an example, Erlang lets you link your processes, so when one fails, the others also fail or get notified. That is not very practical by itself, but when you look at it alongside supervisors and shared-nothing processes, it is a great tool.
For distribution, Erlang lets you connect nodes together. Connected nodes can talk to each other as if they were on the same machine, and they can spawn processes on the other side too. Combine this with the ability to restart a failed app from a failed node on another, healthy node, and you get great uptime.
And not to mention that these tools have years of experience behind them.
Just try to solve these issues in another ecosystem. I say ecosystem because Erlang as a language is not the whole picture; the tools and frameworks (mostly OTP) have to be considered too. Then you can also say that Erlang really shines in these areas.
But Erlang also is not very good when it comes to linear processing, number crunching, image/sound processing, etc. That would be better implemented in another system.
I think, in these areas, the big difference between NodeJs and Erlang is their runtime model. NodeJs has one process with one thread that works asynchronously on I/O-related tasks. Of course, you can run multiple processes, but that is the basic model. Erlang, on the other hand, has a VM called BEAM. Erlang uses special, very lightweight processes inside this VM. BEAM schedules them itself, because they are not OS processes. This lets BEAM run hundreds of thousands of processes at the same time, each doing a task, be it I/O or anything else.
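To make the Node side of that comparison concrete, here is a small sketch of one thread juggling many pending operations. The I/O is simulated with timers, and the names fakeRequest and handleMany are invented for this example.

```javascript
// Simulated I/O: resolves after `ms` milliseconds without blocking the thread.
function fakeRequest(id, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(id), ms));
}

// Start `count` simulated requests at once and wait for all of them.
// They overlap in time even though only one thread runs JavaScript.
async function handleMany(count, ms = 20) {
  const started = Date.now();
  const pending = [];
  for (let i = 0; i < count; i++) pending.push(fakeRequest(i, ms));
  const results = await Promise.all(pending);
  return { completed: results.length, elapsedMs: Date.now() - started };
}
```

With handleMany(1000), the total elapsed time stays close to a single wait rather than a thousand of them, because the thread is idle while the timers are pending. The same thread would stall on a long CPU-bound loop, which is where BEAM's preemptively scheduled processes behave differently.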
You see the difference now, I think. Erlang is more battle-tested and better when fail safety or distribution is a must. NodeJs may be better when you need faster development and deployment.

Scaling Nodejs server to multiple systems?

I want to build chat servers in Node.js using Express. I have used the cluster module to scale the server across multiple cores, but how do I scale up to different systems?
Since Node.js processes do not share memory, distributing Node.js processes across multiple machines provides the same experience as using a cluster to distribute processes across multiple cores. If your application can run as multiple independent processes within a single system, then it can also be distributed to run as multiple independent processes across multiple systems.
Great, so that's one less thing to worry about! Now, there are many infrastructure solutions out there that abstract running clusters on several systems, but your application is otherwise oblivious to whichever one you pick.
What will concern you, though, within the realm of your application and any single process, is discovering external services, communicating with processes across the infrastructure and communicating with processes within a cluster. Again, there are many solutions out there that will cater to any particular requirement your application needs to address.
So far, the Node.js community has favored simple approaches that are highly specialized for solving a particular problem and then get out of your way. For instance:
Web socket clients and servers: low latency within a cluster; also works well across the whole network when you can just send some data and get on with your life, but it will bring things down to a crawl if you need to synchronize processes, such as sending some data, waiting and idling until a result eventually comes back
Redis: clusters are easy to set up, instances handle discovery on their own, enough atomic operations to provide a solid approach to sharing data among different instances and the pub-sub support provides for low-latency IPC
ZMQ: lauded for its intelligent, highly available connections; you can devise any messaging protocol with a few dozen lines of code that the next human being maintaining your application will be able to reason about
etcd: distributed, consistent key-value store; low infrastructural overhead, allows for implementing straightforward service discovery on top that will integrate nicely with every infrastructure solution out there
Consul: built on Serf; similar to etcd, but strongly opinionated, provides for service discovery on steroids with many additional niceties; if you like managing things on your own and have the time to invest up front, I would heartily recommend further investigation
While this certainly doesn't cover all the options available, it should be enough to get you going in the right direction. With just these simple building blocks that are ridiculously easy to reason about, you should be able to distribute your application across several systems, running across several machines in several datacenters.
If you're using a process manager like PM2, it will take care of starting your Node app on the same or different machines, but to manage multiple machines at scale you should look into Puppet, Chef or Ansible. If you're on AWS, EC2 can be set up to do it automatically.
Actually, there can be multiple answers to this question, because the answer depends on how you want to communicate among nodes, how you want to assign tasks to nodes and how you manage failures.
You may want to research how other cluster managers work and then try to design something similar in your application.
A few approaches:
1) Use a load balancer in front and distribute the load among the machines. I think this is the simplest approach.
2) Use a messaging system like RabbitMQ/ActiveMQ (or any other AMQP broker) for inter-node communication, and have a pool of master nodes that assign tasks to specific nodes and communicate with them via the AMQP protocol.
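The first approach usually relies on an existing load balancer, but the core idea is small enough to sketch. The function createRoundRobin and the backend addresses below are invented for illustration; a real balancer would also track health and retry failed backends.

```javascript
// Returns a function that hands out backends in round-robin order.
function createRoundRobin(backends) {
  if (backends.length === 0) throw new Error('need at least one backend');
  let next = 0;
  return function pick() {
    const backend = backends[next];
    next = (next + 1) % backends.length; // wrap around after the last backend
    return backend;
  };
}

// Example pool: three machines running the same chat server (made-up addresses).
const pick = createRoundRobin(['10.0.0.1:3000', '10.0.0.2:3000', '10.0.0.3:3000']);
```

Each call to pick() returns the next machine in turn. For a chat server, long-lived connections stay on whichever machine first accepted them, so messages between users on different machines still need one of the messaging options discussed above.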

Monolithic (vs) Micro-services ==> Threads (vs) Process

I have a monolithic application: a single process with 5 threads. Each thread accomplishes a specific task. I am thinking of moving this application to microservices using Docker. If I look at the architecture, each worker thread would become a process in its own Docker container. So, in some way, monolithic vs microservices becomes more like a thread vs process discussion in my case.
The original reason for the monolith was to use threads for performance and to share the same memory. Now, with a microservices architecture, I am pushed to a process model that may not be suitable from a performance point of view.
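To illustrate what "threads sharing the same memory" buys, here is a sketch in JavaScript (used here only as an example language; the question does not say what the application is written in). Several worker threads increment one counter that lives in a SharedArrayBuffer, with no copying between them. The name countWithThreads is made up for this example.

```javascript
const { Worker } = require('node:worker_threads');

// Each worker increments the same shared counter `times` times.
const incrementSource = `
  const { workerData } = require('node:worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.times; i++) Atomics.add(counter, 0, 1);
`;

// Hypothetical helper: start `threads` workers on one shared buffer
// and resolve with the final counter value once all of them have exited.
function countWithThreads(threads, times) {
  const shared = new SharedArrayBuffer(4); // one 32-bit integer, visible to all threads
  const jobs = [];
  for (let t = 0; t < threads; t++) {
    jobs.push(new Promise((resolve, reject) => {
      const worker = new Worker(incrementSource, {
        eval: true,
        workerData: { shared, times },
      });
      worker.once('error', reject);
      worker.once('exit', resolve);
    }));
  }
  return Promise.all(jobs).then(() => new Int32Array(shared)[0]);
}
```

Split into separate processes or containers, the same counter would have to live in an external store or be passed around in messages, which is the overhead the question is worried about.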
I am kind of stuck on how to approach this problem.
What you are missing here is that microservices are not suitable for every software system in the world! Think about the drivers for migrating your current monolithic system to microservices before doing anything. Are you seeking high availability and scalability? Do you want the freedom to write each thread in a different programming language? Is your system so complicated that it cannot be comprehended in a monolithic style? And finally, are you ready to pay the costs of a microservices style?
Microservices bring many complexities to the system and, due to the chattiness of services, may cause performance penalties in exchange for higher scalability. If performance is an important concern, the system is not that large, and your answer to most of the above questions is "NO", I strongly suggest that you do not go for a microservices style. Instead, try to modularize your current code base and refactor the code for better quality and comprehensibility.
Regarding Docker, you can use it even with the monolithic style in order to remove some of the deployment barriers and inconsistency in the development and the deployment environments. If the mentioned issues around deployment do not bother you, do not go for docker either since it will be just a layer of computational overhead.
Microservices can give your application more power, but this depends on the size of your project, the degree of availability you need, whether you have many teams, many languages, and so on.
For some projects, microservices will be overkill and the work can be handled with multithreading, so think about your goals before migrating to this architecture.

performance - multithreaded or multiprocess applications

In order to develop a highly network-intensive server application on Linux, what sort of architecture is preferred? The idea is that this app would typically run on machines with multiple cores (either virtual or physical). Considering that performance is the key criterion, is it better to go for a multi-threaded application or one with a multi-process design? I do know that sharing resources and synchronizing access to them from multiple processes is a lot of programming overhead, but as mentioned earlier, overall performance is the key requirement, so we can ignore those things. The programming language would be C/C++.
I have heard that even multi-threaded applications (single process) can take advantage of multiple cores and run each thread on a different core independently (as long as there are no sync issues), and that this scheduling is done by the kernel. If so, is there not much difference in performance between multi-threaded and multi-process applications? Nginx uses a multi-process architecture and is really quick, but can one get the same performance with a multi-threaded application?
Thanks.
Processes and threads on Linux are very similar to each other. The main differences are that threads share the whole virtual address space and that certain things, like signal handling, differ.
This makes for cheaper context switches between threads (no need for costly MMU reloads etc.) but doesn't necessarily cause much difference in speed (especially outside of thread creation).
For designing a highly network-intensive application, basically the only solution is to use an evented architecture (otherwise you'll bog down the system with a huge number of processes/threads and spend more time managing them than actually running work code), where you react to I/O on sockets and perform the appropriate operations based on which sockets exhibit activity.
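The question targets C/C++, where this would typically be built on something like epoll, but the shape of an evented server is easy to show in a few lines of JavaScript, whose runtime has such a loop built in. This is only a sketch of the pattern; the name createEchoServer is made up for this example.

```javascript
const net = require('node:net');

// One callback per connection; the runtime multiplexes all sockets on a
// single thread and only calls back when a socket actually has activity.
function createEchoServer() {
  return net.createServer((socket) => {
    socket.on('data', (chunk) => socket.write(chunk)); // react to readable data
    socket.on('error', () => socket.destroy());        // drop broken connections
  });
}
```

Calling createEchoServer().listen(7000) would accept any number of clients without creating a thread or process per connection, which is exactly the trade the paragraph above describes.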
A famous writeup about the problems faced in such situations is "The C10k problem", available from http://www.kegel.com/c10k.html - it describes different I/O approaches, so despite being a bit dated, it's a very good introduction.
Be careful before jumping deeply into reactor-like designs, though. They can get unwieldy and complex, so see if you can use a library or language that provides a nicer abstraction over them (Erlang is my personal favourite here; languages with coroutines, like Go, can be useful too).
If your threads do their jobs independently of one another, then under Linux there is simply no reason not to go with multiple processes instead. Multiple processes increase your memory usage, as each process has its own private memory space, but on the other hand, sharing the memory space between independent threads is the worse decision. Context switching is usually handled better for processes than for threads, although this is somewhat architecture- and code-dependent. Processes are not at risk of being serialized by locks and mutexes. Processes are easier to manage and interact with in Linux. Here is a good document you might find interesting (http://elinux.org/images/1/1c/Ben-Yossef-GoodBadUgly.pdf).

Resources