Service-monitoring tool - health-monitoring

We have several Linux processes implemented in various technologies (Java, C++, etc.). They interact with each other by passing messages over WebSphere MQ. If any process crashes, we would like it to be restarted automatically, up to a configured number of times.
Would it involve a change in the applications, such as periodically raising a heartbeat to indicate that the application is in good health?
Thanks,
Yash

At my previous job we had a similar problem.
We developed our own solution: a watcher program implemented in two different technologies, one in Java and one in C++ using Qt. Each had a list of programs to watch. For every watched program we configured a maximum time between two heartbeats, what program to run on every heartbeat, and what program to run if that maximum was exceeded.
The watcher in Java had an entry for the watcher in C++, and vice versa.
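A minimal sketch of that heartbeat-watchdog idea in Java follows; the Watched record, the heartbeat transport, and the restart commands are illustrative assumptions, not the original implementation. It also caps restarts, as the question asks:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of a watchdog: watched programs report heartbeats, and a
    // periodic check restarts any program whose heartbeat has gone stale,
    // up to a configured number of restarts.
    public class Watchdog {

        // Hypothetical per-program configuration.
        record Watched(String name, long maxSilenceMs, int maxRestarts, String restartCommand) {}

        private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        // Call this from whatever receives the heartbeat messages (e.g. an MQ listener).
        public void onHeartbeat(String program) {
            lastHeartbeat.put(program, System.currentTimeMillis());
        }

        public void watch(Watched w) {
            AtomicInteger restarts = new AtomicInteger();
            onHeartbeat(w.name());
            scheduler.scheduleAtFixedRate(() -> {
                long silence = System.currentTimeMillis() - lastHeartbeat.get(w.name());
                if (silence > w.maxSilenceMs() && restarts.getAndIncrement() < w.maxRestarts()) {
                    try {
                        // Restart the stale program with its configured command.
                        new ProcessBuilder(w.restartCommand().split(" ")).start();
                        onHeartbeat(w.name()); // give the restarted program a fresh window
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }, w.maxSilenceMs(), w.maxSilenceMs(), TimeUnit.MILLISECONDS);
        }
    }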

Related

Communication between multiple programs

I am currently planning a complicated networking project on Windows IoT Enterprise. Basically, I will have a C program keeping alive a special network interface. This C program should receive tasks in some way from other programs on the same host, which will generally be written in all sorts of languages (e.g. node.js). I have never done this kind of cooperation between processes. Do you have any advice on how a node.js server can pass information to an already running C program, and preferably receive a success code or an error message?
It is very important to me that this process is as fast as possible, as this solution will handle several thousand requests per second.
In one of the comments I was pointed towards ZeroMQ, and I am now using it successfully in my application, thank you for the help!
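For reference, ZeroMQ's request/reply pattern maps well onto this task-passing problem. Below is a minimal sketch, written in Java with the JeroMQ library to match the other examples in this collection; in the asker's setup the REP side would be the C program and the REQ side the node.js server. The endpoint and message format are assumptions:

    import org.zeromq.SocketType;
    import org.zeromq.ZContext;
    import org.zeromq.ZMQ;

    // REP side: the long-running program binds a socket and answers each
    // task with a success code or an error message.
    public class TaskServer {
        public static void main(String[] args) {
            try (ZContext context = new ZContext()) {
                ZMQ.Socket socket = context.createSocket(SocketType.REP);
                socket.bind("tcp://127.0.0.1:5555"); // hypothetical endpoint

                while (!Thread.currentThread().isInterrupted()) {
                    String task = socket.recvStr(); // blocks until a task arrives
                    // ... perform the task here ...
                    socket.send("OK: " + task);     // reply with a status
                }
            }
        }
    }

    // REQ side: any other process connects, sends a task, and waits for the status.
    public class TaskClient {
        public static void main(String[] args) {
            try (ZContext context = new ZContext()) {
                ZMQ.Socket socket = context.createSocket(SocketType.REQ);
                socket.connect("tcp://127.0.0.1:5555");
                socket.send("reconfigure-interface");
                System.out.println(socket.recvStr()); // "OK: reconfigure-interface"
            }
        }
    }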

(Tomcat) Web service: OutOfMemoryError: unable to create new native thread

I am creating a web service that creates a huge number of small Java timer threads (over 10k). I can only seem to create 2k timer threads before I get the OutOfMemoryError: unable to create new native thread. How do I solve this? I am using a MacBook Pro to run my Tomcat server. I've configured the ulimit (-u) max user processes to double what it used to be, but I still get the same problem. What are my options, if any, to make this doable?
It's often a bad idea for web applications to start their own (few) threads, let alone 10K threads - and then "as timers"? Seriously? Don't go there.
What can you do?
Don't rely on the ability to create those threads.
Change your architecture! Use a scheduler library that has solved this problem already (e.g. Quartz or others).
If you don't want to use an external library (why wouldn't you?): implement a single timer thread that executes the scheduled operations when they're due. Do not use a new thread for each scheduled operation (see the sketch below).
If you wanted to boil 100 eggs, would you buy 100 timers?
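To make the single-timer-thread advice concrete, here is a minimal sketch using the JDK's ScheduledExecutorService; the pool size, delay, and task body are placeholders:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class SharedScheduler {
        public static void main(String[] args) {
            // A small fixed pool replaces thousands of java.util.Timer threads:
            // scheduled tasks wait in the executor's queue, not in their own thread.
            ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);

            for (int i = 0; i < 10_000; i++) {
                final int id = i;
                scheduler.schedule(
                        () -> System.out.println("task " + id + " fired"),
                        30, TimeUnit.SECONDS); // placeholder delay
            }
            // In a web app, shut the scheduler down from a ServletContextListener.
        }
    }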

Node.js Clusters with Additional Processes

We use clustering with our Express apps on multi-CPU boxes. It works well; we get the maximum use out of our AWS Linux servers.
We inherited an app we are fixing up. It's unusual in that it has two processes. It has an Express API portion to take incoming requests, but the process that acts on those requests can run for several minutes, so it was built as a separate background process: Node calling Python and Maya.
Originally the two were tightly coupled, with the Python script called by the request that uploads the data. But this of course was suboptimal, as it left the client waiting for a response for as long as the script took to run, so it was rewritten as a background process that runs in a loop, checking for new uploads and processing them sequentially.
So my question is this: if we have this separate Node process running in the background, and we run clusters which start up a process for each CPU, how is that going to work? Are we not going to get two Node processes competing for the same CPU? We were getting a bit of weird behaviour and crashing yesterday, without a lot of error messages (god I love Node), so it's a bit concerning. I'm assuming Linux will just swap the processes in and out as they are being used, but I wonder if that will be problematic, and I also wonder about someone's web session getting swapped out for several minutes while the longer-running process runs.
The smart thing to do would be to rewrite this to run on two different servers, but the files that Maya uses/creates are on the server's file system, and we were not given the budget to rebuild the way we should. So we're stuck with this architecture for now.
Any thoughts on possible problems and how to avoid them would be appreciated.
From an overall architecture perspective, spawning one Node.js process per core is a great way to go. You have a lot of interdependencies, though: the Node.js processes are calling Maya, which may use multiple threads (keep that in mind).
The part that concerns me is your random crashes and your "process that runs in a loop". If that process is just polling the file system, you probably have a race condition where the Node.js processes are competing to work on the same input/output files (one common fix is sketched below).
In theory, one Node.js process per core will work great and should help you utilize all your CPUs. Linux swaps processes in and out all the time, so that is not an issue; you could even start multiple Node.js processes per core and still not have a problem.
One last note: be sure to keep an eye on your memory usage. Several Linux distributions on EC2 do not have a swap file enabled by default, and running out of memory can be another silent app killer, so it's best to add a swap file in case you run into memory issues.
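If the race on shared files is the culprit, one common fix is to make each worker atomically claim an upload by renaming it before processing, so only one competing poller can win. Here is a sketch of the idea (shown in Java to match the other examples in this collection; the directory names are hypothetical):

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class UploadClaimer {
        // Atomically claim an uploaded file by renaming it into a worker's
        // own directory; only one of several competing workers can win the
        // rename, so no upload is ever processed twice.
        static Path claim(Path upload, Path workDir) {
            try {
                return Files.move(upload, workDir.resolve(upload.getFileName()),
                        StandardCopyOption.ATOMIC_MOVE);
            } catch (IOException alreadyClaimedOrGone) {
                return null; // another worker got it first
            }
        }

        public static void main(String[] args) throws IOException {
            Path incoming = Paths.get("uploads");    // hypothetical inbox
            Path workDir = Files.createDirectories(Paths.get("processing"));
            try (DirectoryStream<Path> files = Files.newDirectoryStream(incoming)) {
                for (Path f : files) {
                    Path claimed = claim(f, workDir);
                    if (claimed != null) {
                        System.out.println("processing " + claimed);
                    }
                }
            }
        }
    }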

Concurrency issue with AWS Simple Email Service on Mono

I am in the process of moving a newsletter service from a Windows server running Microsoft.NET 4.5 to a Linux server running Mono 3.0.3. The service uses Amazon's "Simple Email Service" (SES) to deliver the emails, via the official .NET SDK (wrapping a REST interface).
While sending emails via SES sequentially from Mono turns out to be slightly faster than Microsoft.NET using similar hardware, I am running into serious performance trouble when attempting to deliver multiple mails in parallel. Below is a chart showing the time required to send 128 emails on both platforms using a varying number of threads. As you can see, performance on Mono degrades rapidly after 8 threads, and with 128 threads I get only HTTP timeouts – not a single email is delivered.
Profiling via console output, it turns out that the first "batch" of mails is the source of the slowdown. With two threads, each sending one email, both threads finish in around 2200 ms. With four threads, each sending one email, they all finish in around 4400 ms. Eight threads, around 8800 ms, and so on. It seems as if the web service calls, while spawned simultaneously, run sequentially and have to wait for one another before returning.
Any ideas what might be triggering this behavior? The source code for the Amazon SDK is available on GitHub, but I have not been able to pinpoint anything suspicious. Maybe the use of the async methods on HttpWebRequest?
Yes, stop using async HttpWebRequest* for now, because there is a bug being discussed on the Mono list. A patch has been provided, but apparently it is not good enough and has been reverted from master.
If you're good with low-level code, it would be nice if you contributed a patch.
* The fastest way to stop using the async infrastructure is calling mono with the environment variable MONO_DISABLE_AIO=1. By the way, if you're using more than one thread anyway, maybe a Parallel.For with non-asynchronous code would be enough? The best use case of async is actually to avoid threading and still achieve parallelization (or rather, avoid blocking waits).

JUnit Thread Testing

I have client and server threads in my application. When I run them as standalone apps, the threads communicate properly.
But when I run the client under JUnit and the server standalone, the client thread dies within a few seconds.
I can't figure out why the behavior is different.
When the JUnit runner terminates, all spawned threads etc. are killed too (the test is most likely run in a separate JVM instance).
Here is a (rather old) article describing the problem you are experiencing (though the GroboUtils library it recommends seems to have been abandoned a long time ago), and another, more recent one with a more modern solution using the new Java concurrency framework.
The gist of the latter solution is that it runs the threads via an executor, which publishes the results of the runs via Futures. Future.get blocks until the thread finishes its task, automatically keeping the JUnit test alive. You may be able to adapt this trick to your case (a sketch follows below).
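A minimal sketch of that executor/Future trick, in JUnit 4 form; runClient() is a hypothetical stand-in for the real client logic:

    import static org.junit.Assert.assertEquals;

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.junit.Test;

    public class ClientThreadTest {

        @Test
        public void clientStaysAliveUntilDone() throws Exception {
            ExecutorService executor = Executors.newSingleThreadExecutor();
            try {
                // The Future keeps the JUnit runner alive until the
                // background work completes.
                Future<String> result = executor.submit(() -> runClient());
                assertEquals("OK", result.get()); // blocks; rethrows any exception
            } finally {
                executor.shutdownNow();
            }
        }

        private String runClient() {
            // hypothetical stand-in for the real client/server exchange
            return "OK";
        }
    }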
