Managing multiple-processes: What are the common strategies? - multithreading

While multithreading is faster in some cases, sometimes we just want to spawn multiple worker processes to do work. This has the benefits of not crashing the main app if one of the worker crashes, and that the user doesn't need to worry a lot about inter-locking stuffs.
COM+'s Application Pooling seems like a good way to achieve this on Windows. The downside is that we need to write a COM+ wrapper for the worker process.
However, when I search for Application Pooling on Google, it seems like most of its usages are related to IIS. Don't other applications (such as scientific/graphics) find it useful to spawn multiple worker processes?
So there are several questions:
Why isn't COM+ more popular in areas other than IIS? If I write a non-IIS application and want to use process management on Windows, should I go with COM+ or are there better alternatives out there?
What would be the cross platform way to do it? Are there libraries out there that give me a "process pool" (worker processes will intelligently pick up work, can be managed, etc.)

I can't offer any answers to the COM aspect of your question, but it's worth noting there's another world (besides HPC MPI) where multi-processing (rather than the more common multi-threading approach) is apparently alive, well and thriving: Python.
Why ? Python's GIL ("global interpreter lock") cripples most attempts to multithread python code so badly that multiprocessing is the generally recommended approach to parallelising Python on SMP. The standard library includes process pools; there are various other options too.
Python certainly ought to satisfy any multi-platform requirement!

You might want to investigate how the apache web server manages process pools. From version 2.0 it runs natively on windows and one of the multi-processing models it supports are process pools. A part of apache is also APR (apache portable runtime), which handles platform-specific issues.

No one can answer why something is not popular because may be no body is looking for what you are looking for. After .NET came in picture, people shifted from COM to Managed Environment, before .NET, COM and ATL and relative other technologies were quite painful to implement and they would crash and were also quite difficult to debug.
That is the reason, managed environment came in existence.
However, .NET 4 onwards, parallel libraries give much more power to user for parallel programming and also you can spawn and control other proceeses.
For multiplatform, you can look for zvrba's answer.

Yes, other applications--especially science applications--find it useful to spawn multiple processes. Since few super-computers run Microsoft Windows, scientists generally avoid using anything that ties them to a Microsoft platform. Nothing related to COM will help scientists leverage their enormous existing code base written in Fortran.
People who choose to run IIS have generally already drunk the Microsoft Koolaid, so they have fewer inhibitions to tying themselves to Microsoft's proprietary platforms, which is why COM-specific terminology will get lots of hits related to IIS.
One of the open standards for doing what you want is the Message Passing Interface. Several implementations exist and some of them run on supercomputers using Fortran. Some of them run on cheaper computers using sexier languages.
See http://en.wikipedia.org/wiki/Message_Passing_Interface

There hasn't been a mob rushing through the doors of COM application pooling primarily because of two factors:
COM is a pain in the ass to deal with compared to just about anything else
Threading can be a headache, but it's a lot easier and more convenient to manage than inter-process communication
COM application pooling was essentially created for IIS. It has one very specific benefit over normal multithreading: the multiple processes are fully isolated from each other. This is important for data security and for app stability when dealing with third party plugins of questionable stability.
Scientific computing generally doesn't need strong data security isolation between operations, and I would venture to guess that scientific computing doesn't rely much on third party plugins of questionable stability. When doing big math operations, you're either using a sexy numerics library that had better be rock solid to be taken seriously, or you're using your own code, in which case crashes should be fixed and repeat offenders should be spanked.
Oh, and all crashes except stack overflow can be trapped and dealt with within a multithreaded app, especially if it's your own code.
In short, COM app pooling is overkill for just about anything other than IIS.

Google's webbrowser chrome is a multi-process architecture software. It is open source, so you can check out its code and see how to manage processes.

Related

How to create a Node.js program that runs the same on all supported platforms

I want to create a Node.js application that runs on Windows, Mac and most linuxes. Is that easy? Are there any good examples of such? What do I need to take into account to do it? I understand file-path separator is one important issue. Are there others?
I'd like to hear if anybody has actual experiences
and "gotchas" they've encountered when creating
a cross-platform Node.js application. Thanks
I agree with https://stackoverflow.com/users/3731501/estus, the question is a bit broad in regards to what functionality you'd like to have in your application.
With that said, it may very well be impossible to create any application that executes the same across all platforms, but you should be able to achieve near functional parity with a bit of understanding and effort.
The main issues you'll encounter are around file systems. The node.js team has created a great guide on working with different filesystems, and would be a good start in at least understanding some of the best practices and approaches to to handling the differences and utilizing the fs module on different platforms.
Whatever other intricacies and considerations around platform dependent operations you may have are inevitably tied to what your application is trying to do. Once that's determined, you'll need to address those differences by reviewing whatever module you're using to execute the expected functionality and coding for the deviations. The documentation for the api's in the node.js common library are very good at exposing any behavioral or functional differences across operating systems, so if using those, you should at the very least know how those modules and corresponding methods behave on host systems. Hope that helps.

Sockets server made with Erlang vs others

I am learning Erlang and trying to understand how its sockets work as it is meant to be one of the strongest parts of the language and OTP.
I have experience with NodeJS, and wonder, how the applications made with NodeJS and Erlang differ in regards on how multiple sockets connections are managed.
As I understand, while JavaScript is single-threaded, V8 manages all the multiple simultaneous connections for it, though Erlang can manage multiple connections itself.
So, I wonder, if Erlang has excellent support for managing multiple connections at a time, how is it different from other technologies for a programmer? I mean, when I write an app for NodeJS, it can have as many connections open and well-managed as if I wrote code in Erlang, isn't it?
Please share your thoughts, links to some articles about the specialties of Erlang in this context are welcome too.
I am by no means an expert in Erlang, but I think I know Erlang and NodeJs on the same level.
The things you say, are all correct. Bot can handle multiple connections very efficiently, well well-managed you say.
But the thing is, the problems are not only handling multiple concurrent connections. The problems Erlang tries to solve very good, are fail safety, and distribution. I don't think NodeJs will be as good at it, as it is now.
Don't take it wrong, I'm not saying no one can code a distributed app in NodeJs, but considering the tools Erlang gives you, it maybe is a better choice.
For fail safety, as an example, Erlang let's you link your processes, so when one fails, other also fails or gets notified. That is not very practical by itself, but when you look at it alongside supervisors and shared-nothing processes, it is a great tool.
For distribution, Erlang let's you link nodes together. Linked nodes can talk together as if they were on the same machine, and they can spawn processes on other side too. Consider this, with the ability to start a failed app from a failed node on another node that is healthy. Gives you a great uptime.
And not to mention that these tools have years of experience behind them.
Just try to solve these issues on another ecosystem. I say ecosystem, because Erlang as a language is not complete, but the tools and frameworks (mostly OTP) have to be considered too. Then you can also say that Erlang really shines in this areas.
But Erlang also is not very good when it comes to linear processing, number crunching, image/sound processing, etc. That would be better implemented in another system.
I think, in this areas, the big difference between NodeJs and Erlang is their runtime model. NodeJs has one process, one thread that is working async on io-related tasks. Of course, you can run multiple processes, but that is the basic thing. On the other hand, Erlang has a VM called BEAM. Erlang uses special processes inside this VM, very light processes. BEAM schedules them itself, because they are not OS processes. This gives BEAM the advantage to have hundreds of thousands processes at the same time, each doing a task, be it io or anything else.
You see the difference now, I think. Erlang is more battle-tested, more better when fail safety or distribution is a must. NodeJs maybe better when you need faster development, and deployment.

Running a single threaded app across multiple cores/cpu's

I have a single-threaded java application that I'd like to make run on multiple cores. Is there a software that'll allow me to essentially "combine" multiple cores into one logical unit for the software to run on, with minimal to no changes to the underlying software?
If not, how would I go about addressing this issue without changing too much of the java application?
No, this is basically a fundamental problem in hardware utilization. There is currently no uniform way of automatically parallelizing single-threaded applications (aside from running the application as multiple processes).
This is well established and has been the focus of compiler research for decades. That being said, it is probably not impossible, but evidently very hard to achieve.
More info:
Automatic parallelization
Does the JVM have the ability to detect opportunities for parallelization?

hesitation between two technologies for a little program

I want to make a program (more precisely, a service) that periodically scans directories to find some video files (.avi, .mkv, etc) and automatically download some associated files (mostly subtitles) from one or several websites.
This program could run on linux or windows as well.
On one hand, I know well Qt from a long time and I know all its benefits, but on the other hand, I'm attracted by node.js and it extreme flexibility and liveliness.
I need to offer some interactivity with the end user of my program (for instance, chose the scans directories, etc).
What would be the best choice in your opinion in 2013?
I advise against Node.js for "small tools and programs". Especially for iterative tasks.
The long story
The reason is quite simply the way Node.js works. Its asynchronous model makes simple tasks unnecessarily convoluted. Additionally, because many callbacks are called from the Node.js event loop, you can't just use try/catch structures so every tiny error will crash your whole Application.
Of course there are ways to catch those errors or work with them, but the docs advise you against all of them and advise you to restart the application gracefully in any case to prevent memory leaks. This means you have to implement yet another piece of code.
The only real solution in Node.js would be writing your Application as a Cluster, which is a great concept but of course would require you to use some kind of IPC to get your data back to a process that can handle it.
Also, since you wrote about "periodically scan"ning a directory, I want to point out that you should...
Use file system watchers for services
Almost every language kit has those now and I strongly suggest using those and only use a fallback full-scan.
In Qt there is a system-independent class QFileSystemWatcher that provides a handy callback whenever specified files are changed
In Java there is the java.nio.file.FileSystem.getWatchService()
Node.js has the fs.watch function, if you really want to go for it

Node.js event vs thread programming on server side

We are planning to start a fairly complex web-portal which is expected to attract good local traffic and I've been told by my boss to consider/analyse node.js for the serve side.
I think scalability and multi-core support can be handled with an Nginx or Cherokee in front.
1) Is this node.js ready for some serious/big business?
2) Does this 'event/asynchronous' paradigm on server side has the potential to support the heavy traffic and data operation ? considering the fact that 'everything' is being processed in a single thread and all the live connections would be lost if it got crashed (though its easy to restart).
3) What are the advantages of event based programming compared to thread based style ? or vice-versa.
(I know of higher cost associated with thread switching but hardware can be squeezed with event model.)
Following are interesting but contradicting (to some extent) papers:-
1) http://www.usenix.org/events/hotos03/tech/full_papers/vonbehren/vonbehren_html
2) http://pdos.csail.mit.edu/~rtm/papers/dabek:event.pdf
Node.js is developing extremely rapidly, and most of its functionality is sturdy and ready for business. However, there are a lot of places where its lacking, like database drivers, jquery and DOM, multiple http headers, etc. There are plenty of modules coming up tackling every aspect, but for a production environment you'll have to be careful to pick ones that are stable.
Its actually much MUCH more efficient using a single thread than a thousand (or even fifty) from an operating system perspective, and benchmarks I've read (sorry, don't have them on hand -- will try to find them and link them later) show that it's able to support heavy traffic -- not sure about file-system access though.
Event based programming is:
Cleaner-looking code than threaded code (in JavaScript, that is)
The JavaScript engine is extremely efficient with processing events and handling callbacks, and its easily one of the languages seeing the most runtime optimization right now.
Harder to fit when you are thinking in terms of control flow. With events, you can never be sure of the flow. However, you can also come to think of it as more dynamic programming. You can treat each event being fired as independent.
It forces you to be more security-conscious when programming, for the above reason. In that sense, its better than linear systems, where sometimes you take sanitized input for granted.
As for the two papers, both are relatively old. The first benchmarks against this, which as you can see, has a more recent note about these studies:
http://www.eecs.harvard.edu/~mdw/proj/seda/
It also cites the second paper you linked about what they have done, but refuses to comment on its relevance to the comparison between event-based systems and thread-based ones :)
Try yourself to discover the truth
See What is Node.js? where we cover exactly that:
Node in production is definitely possible, but far from the "turn-key" deployment seemingly promised by the docs. With Node v0.6.x, "cluster" has been integrated into the platform, providing one of the essential building blocks, but my "production.js" script is still ~150 lines of logic to handle stuff like creating the log directory, recycling dead workers, etc. For a "serious" production service, you also need to be prepared to throttle incoming connections and do all the stuff that Apache does for PHP. To be fair, Rails has this exact problem. It is solved via two complementary mechanisms: 1) Putting Rails/Node behind a dedicated webserver (written in C and tested to hell and back) like Nginx (or Apache / Lighttd). The webserver can efficiently serve static content, access logging, rewrite URLs, terminate SSL, enforce access rules, and manage multiple sub-services. For requests that hit the actual node service, the webserver proxies the request through. 2) Using a framework like "Unicorn" that will manage the worker processes, recycle them periodically, etc. I've yet to find a Node serving framework that seems fully baked; it may exist, but I haven't found it yet and still use ~150 lines in my hand-rolled "production.js".

Resources