web application multiple calls to executable - multithreading

Sorry, but I am a complete noob to web applications, and I was just wondering: what happens if my web application makes a call to an external binary executable which can take some time to process an input file, and multiple users try to call it at the same time, or one user tries to call it while the previous process is still running?
I think this has something to do with threading, but I'm not sure how that applies to external executables. If someone could provide a resource for me to learn how this works, that would be great too!

When a process is launched, it is isolated from other processes, and the same executable can be launched several times; the only limitations are CPU and memory utilization. You have to use some concurrent-access protection if you are writing to a shared file, but if you do database access, the DB engine takes care of concurrent access, so it's not a problem.
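As a rough illustration of what this looks like in practice, here is a minimal sketch in Node.js with Express (the ./converter binary, the route, and its arguments are hypothetical placeholders, not part of the question). Each incoming request spawns its own independent OS process, so concurrent users simply get concurrent processes:

    const express = require('express');
    const { execFile } = require('child_process');

    const app = express();

    app.get('/convert/:file', (req, res) => {
      // Each request spawns its own OS process; two users hitting this
      // route at the same time get two independent converter processes.
      execFile('./converter', [req.params.file], (err, stdout, stderr) => {
        if (err) return res.status(500).send(stderr);
        res.send(stdout);
      });
    });

    app.listen(3000);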

Related

Electron running multiple main processes vs multiple browser windows

I'm running Electron on a Linux server for web scraping, and currently I run a new electron command for each task. But this results in high CPU usage. I'm now thinking about running a single Electron instance and creating a new BrowserWindow for each task. It will take some time to adapt the code base to this style, so I wanted to ask here first: will it make a difference in CPU usage, and how much?
Basically, creating a new NodeJS process will result in re-parsing your application's code, which will significantly affect your CPU usage. Creating only a new BrowserWindow will just create a new renderer process, which is far more efficient.
If your application is packaged, e.g. with electron-packager, then creating a new instance will also drive up your CPU usage like creating another NodeJS process, because the packaged (aka compiled) application has a copy of NodeJS in it, which is enough to run your code but still affects CPU usage.
But the decision depends on how you use the server. If you only run the Electron application to carry out tasks you have defined yourself, adapting your working code would bring little to no benefit. If you want to release this application, and/or the server is used for other tasks, e.g. a web server, adapting your code would be a real benefit.
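For what it's worth, a minimal sketch of the single-instance approach might look like this (the task URLs and the extraction step are placeholders, not part of the original question):

    const { app, BrowserWindow } = require('electron');

    function runTask(url) {
      // Each window is only a new renderer process, far cheaper than a
      // fresh `electron` launch that re-parses the whole application.
      const win = new BrowserWindow({ show: false });
      win.loadURL(url);
      win.webContents.once('did-finish-load', () => {
        // ... extract whatever the task needs, then free the renderer
        win.destroy();
      });
    }

    app.whenReady().then(() => {
      runTask('https://example.com/page-1');
      runTask('https://example.com/page-2');
    });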
Running multiple instances of the main NodeJS process with the default configuration is not actually supported or tested. You'll find that any features that persist data to disk either don't work or don't work as expected (i.e. localStorage, IndexedDB, sessions, etc.).
https://github.com/electron/electron/issues/2493
You can work around this by changing the data directory for each instance so they don't trample over each other but this is likely to use a lot of disk space and you'd need a way to keep track of all these data directories.
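Sketched out, that workaround amounts to something like the following, run before the app is ready (the INSTANCE_ID variable and the naming scheme are assumptions, not an established convention):

    const path = require('path');
    const { app } = require('electron');

    // Give each instance its own userData directory so localStorage,
    // IndexedDB, sessions, etc. don't collide between instances.
    const instanceId = process.env.INSTANCE_ID || process.pid;
    app.setPath('userData',
      path.join(app.getPath('appData'), `my-scraper-${instanceId}`));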
A single main process with multiple renderers is nearly always the answer.

Nodejs scaling and prioritising functions

We have a node application running on the server that gets hit a lot and has to compile a zip file for download. That works well so far, but I am nervous we will hit a point where performance becomes an issue.
(The application is currently running with forever on an Ubuntu 14.04 machine.)
I am now asked to add all kinds of new features to the app which are secondary and should not decrease the performance of the main function (the zip download). It would be OK for those additional features to fail if the app is hit too many times, in favour of the main zipping process.
What is the best practice here? Creating a REST API for the secondary features and putting everything into a waiting list? It surely isn't enough to just create a second app and spawn a new process each time the main zip process finishes? How can I ensure the most redundancy? I'm not talking about a multi-core cluster setup or load balancing on NGINX, but a smart way of prioritising application functions at the application level.
I hope this is not too broad. Cheers
First off, everything should be using async I/O, no synchronous I/O anywhere in your server. That's the #1 rule for building a scalable node.js server.
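As a sketch of the difference (the file name is a placeholder): the synchronous call blocks the entire event loop for every connected user, while the async version lets the server keep handling requests during the read:

    const fs = require('fs');

    // Bad: blocks every other request while the file is read.
    // const data = fs.readFileSync('big-input.dat');

    // Good: the read happens off the event loop; the callback runs when done.
    fs.readFile('big-input.dat', (err, data) => {
      if (err) throw err;
      // ... feed `data` into the zip being built
    });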
Second off, the highest-priority tasks that have any significant CPU usage should be allowed to use multiple cores. If, as you say, the highest-priority task is creating the zip download, then you should make sure that operation can take advantage of multiple cores.
You can accomplish that either with clustering (your whole server runs multiple instances, each of which can be on a separate core) or by creating a set of processes specifically for creating the zip files, plus a work queue in the main process that feeds those processes work and collects the results. This second option is likely a bit more complex to code than clustering, but it prioritizes zip file creation: only one core serves other server needs while all the other cores work on zip file creation. Clustering shares all cores across all server responsibilities.
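A bare-bones sketch of that second option, assuming a hypothetical zip-worker.js that receives a job via process.on('message') and replies with process.send():

    const { fork } = require('child_process');
    const os = require('os');

    const queue = [];     // pending zip jobs
    const idle = [];      // workers waiting for a job

    // Leave one core for the rest of the server; fork workers on the others.
    for (let i = 1; i < os.cpus().length; i++) {
      const w = fork('./zip-worker.js');
      w.on('message', (result) => {
        // ... deliver `result` to the waiting HTTP response, then reuse worker
        const next = queue.shift();
        next ? w.send(next) : idle.push(w);
      });
      idle.push(w);
    }

    function enqueueZipJob(job) {
      const w = idle.pop();
      w ? w.send(job) : queue.push(job);
    }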
At the pure application level, your server can maintain a work queue of all incoming work, no matter what kind, and it can prioritize that work. For example, if an API call comes in and there are already N zip file requests in the queue, you could immediately fail the API call to keep it from building up on the server. I wouldn't personally recommend that solution unless your API calls are really heavy operations, because it's very hard for a developer to use your API reliably if it regularly just fails on them. They would generally find it better for the API to be slow sometimes than to fail regularly.
You might not even have to use a queue; you could just use a counter to keep track of how many zip file requests are "in process", but you'd have to make absolutely sure the counter stays accurate in all cases. If there were ever an accumulating error in the counter, you might end up failing all API requests until your server was restarted.
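A hedged sketch of that counter approach as Express middleware (MAX_ZIPS is an arbitrary threshold, and the zip body is stubbed out); note the guard that makes sure the counter is decremented exactly once no matter how the response ends, which is exactly the accuracy problem described above:

    const express = require('express');
    const app = express();

    const MAX_ZIPS = 20;
    let zipsInProcess = 0;

    app.post('/zip', (req, res) => {
      if (zipsInProcess >= MAX_ZIPS) {
        return res.status(503).send('Server busy, try again later');
      }
      zipsInProcess++;
      let released = false;
      const release = () => {
        if (!released) { released = true; zipsInProcess--; }
      };
      res.on('finish', release);  // response completed normally
      res.on('close', release);   // client disconnected mid-stream
      // ... build and stream the zip here
      res.send('zip goes here');
    });

    app.listen(3000);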

Any limitations creating processes under Azure Web Sites (specifically Web Jobs)?

Are there any limitations on creating separate processes from an Azure Web Site (specifically, from a continuous Web Job)? I have an executable that often (about 20% of the time) stalls and eventually fails with exit code -1073741819 (0xC0000005, an access violation), but only when run as a separate process. If the work is retried later, it eventually succeeds (usually on the first retry).
When instead I call this logic directly via a .NET method call (so within the same process and app domain), the code succeeds 100% of the time. The same code also always succeeds when run locally, even when it creates a separate process.
Is there anything going on at the Azure Web Sites/Web Jobs level that I should be aware of, such as using Windows job objects or other security mechanisms to limit the creation or runtime of spawned processes? If not, any suggestions on how to diagnose further what might be going wrong? (I believe remote desktop to a web site isn't possible; anything else that would help "see" what's failing, such as whether there's a WER dialog appearing?)
In case it matters, the logic (in both cases) includes P/Invoking custom native code, and the web site I'm using is Always On, x64, Basic pricing tier.
@David Ebbo, thanks for the suggestion. I used it to help isolate the problem, and I ultimately found this was non-determinism in the code that the Azure Web Sites environment made more likely, though it was not 100% restricted to that context.

Node.js Clusters with Additional Processes

We use clustering with our Express apps on multi-CPU boxes. It works well; we get the maximum use out of AWS Linux servers.
We inherited an app we are fixing up. It's unusual in that it has two processes. It has an Express API portion to take incoming requests, but the process that acts on those requests can run for several minutes, so it was built as a separate background process, node calling python and maya.
Originally the two were tightly coupled, with the python script called by the request to upload the data. But this of course was suboptimal, as it left the client waiting for a response for as long as the script took to run, so it was rewritten as a background process that runs in a loop, checking for new uploads and processing them sequentially.
So my question is this: if we have this separate node process running in the background, and we run clusters, which start up a process for each CPU, how is that going to work? Are we not going to get two node processes competing for the same CPU? We were getting a bit of weird behaviour and crashing yesterday, without a lot of error messages (god I love node), so it's a bit concerning. I'm assuming Linux will just swap the processes in and out as they are needed, but I wonder if that will be problematic, and I also wonder about someone's web session getting swapped out for several minutes while the longer-running process runs.
The smart thing to do would be to rewrite this to run on two different servers, but the files that maya uses/creates are on the server's file system, and we were not given the budget to rebuild the way we should. So, we're stuck with this architecture for now.
Any thoughts on possible problems and how to avoid them would be appreciated.
From an overall architecture perspective, spawning one nodejs process per core is a great way to go. You have a lot of interdependencies, though: the nodejs processes are calling maya, which may use multiple threads (keep that in mind).
The part that concerns me is your random crashes and your "process that runs in a loop". If that process is just checking the file system, you probably have a race condition where the nodejs processes are competing to work on the same input/output files.
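One way to take that race off the table, sketched here with made-up names: have each worker "claim" an upload by renaming it before processing. fs.rename is atomic on a single filesystem, so only one process can win the rename and the losers simply skip the file:

    const fs = require('fs');

    function tryClaim(uploadPath, workerId, callback) {
      const claimedPath = `${uploadPath}.claimed-by-${workerId}`;
      fs.rename(uploadPath, claimedPath, (err) => {
        if (err) return callback(null);  // another process claimed it first
        callback(claimedPath);           // we own it; safe to process
      });
    }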
In theory, one nodejs process per core will work great and should help utilize all your CPU capacity. Linux swaps processes in and out as needed, so that is not an issue. You could even start multiple nodejs processes per core and still not have a problem.
One last note: be sure to keep an eye on your memory usage. Several Linux distributions on EC2 do not enable a swap file by default, and running out of memory can be another silent app killer, so it's best to add a swap file in case you run into memory issues.

How can I manage use of a shared resource used by several Perl programs?

I am looking for a good way to manage access to an external FTP server from various programs on a single server.
Currently I am working with a lock file, so that only one process can use the FTP server at a time. What would be a good way to allow 2-3 parallel processes to access the FTP server simultaneously? Unfortunately the provider does not allow more sessions, and locks my account for a day if too many processes access their server.
The platforms used are Solaris and Linux; all FTP access is encapsulated in a single library, so there is only one function I need to change. It would be nice if there were something on CPAN.
I'd look into perlipc(1) for System V semaphores, or modules like POSIX::RT::Semaphore for POSIX semaphores. I'd create a semaphore with a resource count of 2-3, and then have the different processes try to acquire the semaphore.
Instead of making a bunch of programs wait in line, could you create one local program that handles all the remote communication while the local programs talk to it? You'd effectively create a proxy and push that complexity away from your programs, so you don't have to deal with it in every one of them.
I don't know the other constraints on your problem, but this has worked for me on similar issues.
