why Linux keeps killing Apache? - linux

I have a long running apache webserver with lots of requests after sometime I find the apache server stopped with
Killed line at the end
what can I do to solve this problem or prevent the system from killing the apache instance ??

Linux usually kills processes when resources like memory are getting low. You might want to have a look at the memory consumption of your apache process over time.
You might find some more details here:
https://unix.stackexchange.com/questions/136291/will-linux-start-killing-my-processes-without-asking-me-if-memory-gets-short
Also you can monitor your processes using the MMonit software, have a look here: https://serverfault.com/questions/402834/kill-processes-if-high-load-average

The is a utility top which shows the process consumption (e.g. Mem, CPU, User etc), you can use it to keep an eye on apache process.

Related

Unable to Run Multiple Node Child Processes without Choking on DigitalOcean

I've been struggling to run multiple instances of Puppeteer on DigitalOcean for quite some time with little luck. I'm able to run ~5 concurrently using tools like puppeteer-cluster, but for some reason the whole thing just chokes with little helpful messaging. So, I switched to spawning ~5 child processes without any additional library -- just Puppeteer itself. Same issue. Chokes with no helpful errors.
I'm able to run all of these jobs just fine locally, but after I deploy, I hit these walls. So, my hunch is that it's a resource/performance issue, but I can't say for sure.
I'm running a droplet with 1GB and 3CPUs on Digital Ocean.
Basically, I'm just looking for ways to start troubleshooting something like this. is there a way I can know for sure that I'm hitting resource walls? I've tried pm2 and the DO dashboard graphs, but I feel like those are all leaving a lot of information out, or else I'm missing something else altogether.
Author of puppeteer-cluster here. You are right, 1 GB of memory is likely not enough for running 5 browser windows (or tabs) in addition to your operating system and maybe even other background tasks.
Here is a list of resources you should check:
Memory: Use a tool like htop to check your memory usage while your application is running.
CPU: Again, you can use htop for that, 3 vCPUs should be more than enough for 5 windows.
Disk space: Use a tool like df to check if there is enough space on the disk. I know of multiple cases in which there was not enough space on the disk (like some old kernels filling the disk), and Chrome needs at least some space to run.
Network throughput: Rarely the problem, but sometimes the network just does not have the bandwidth to support many open browser. Use a tool like nload to check the network throughput.
To use htop or nload, you start your script in the background (node script.js &) or use a terminal multiplexer (like tmux). Resource problems should then be easy to spot.
Most probably you're running out of memory, 5 puppeteer processes are a lot for a 1GB VM.
You can run
grep -i 'killed process' /var/log/messages
to confirm that the OOM killer terminated your processes.

Unable to locate the memory hog on openvz container

i have a very odd issue on one of my openvz containers. The memory usage reported by top,htop,free and openvz tools seems to be ~4GB out of allocated 10GB.
when i list the processes by memory usage or use ps_mem.py script, i only get ~800MB of memory usage. Similarily, when i browse the process list in htop, i find myself unable to pinpoint the memory hogging offender.
There is definitely a process leaking ram in my container, but even when it hits critical levels and i stop everything in that container (except for ssh, init and shells) i cannot reclaim the ram back. Only restarting the container helps, otherwise the OOM starts kicking in in the container eventually.
I was under the assumption that leaky process releases all its ram when killed, and you can observe its misbehavior via top or similar tools.
If anyone has ever experienced behavior like this, i would be grateful for any hints. The container is running icinga2 (which i suspect for leaking ram) , although at most times the monitoring process sits idle, as it manages to execute all its scheduled checks in more than timely manner - so i'd expect the ram usage to drop at those times. It doesn't though.
I had a similar issue in the past and in the end it was solved by the hosting company where I had my openvz container. I think the best approach would be to open a support ticket to your hoster, explain them the problem and ask them to investigate. Maybe they use some outdated kernel version or they did changes on the server that have impact on your ovz container.

Shorting the time between process crash and shooting server in the head?

I have a routine that crashes linux and force a reboot using a system function.
Now I have the problem that I need to crash linux when a certain process dies. Using a script starting the process and if the script ends restart the server is not appropriate since it takes some ms.
Another idea is spawning the shooting processes alongside and use polling of a counter and if the counter is not incremented reboot the server would be another idea.
This would result in an almost instant reaction.
Now the question is what would be a good timeframe. I have no idea how the scheduler of linux would guarantee a certain update of any such counter and what a good timeout would be.
Also I would like to hear some alternatives to this second process spawning. Is there a possibility to advice linux to run a certain routine in case of a crash of the given process or a listener meachanism for the even of problems with a given process?
The timeout idea is already implemented in the kernel. You can register any application as a software watchdog, but you'll have to lower the default timeout. Have a look at http://linux.die.net/man/8/watchdog for some ideas. That application can also handle user-defined tests. Realistically unless you're trying to run kernel like linux-rt, having timeouts lower than 100ms can be dangerous on a system with heavy load - especially if the check needs to poll another application.
In cases of application crashes, you can handle them if your init supports notifications. For example both upstart and systemd can do that by monitoring files (make sure coredumps are created in the right place).
But in any case, I'd suggest rethinking the idea of millisecond-resolution restarts. Do you really need to kill the system in that time, or do you just need to isolate it? Just syncing the disks will take a few extra milliseconds and you probably don't want to miss that step. Instead of just killing the host, you could make sure the affected app isn't working (SIGABRT?) and kill all networking (flush iptables, change default to DROP).

Node.js Clusters with Additional Processes

We use clustering with our express apps on multi cpu boxes. Works well, we get the maximum use out of AWS linux servers.
We inherited an app we are fixing up. It's unusual in that it has two processes. It has an Express API portion, to take incoming requests. But the process that acts on those requests can run for several minutes, so it was build as a seperate background process, node calling python and maya.
Originally the two were tightly coupled, with the python script called by the request to upload the data. But this of course was suboptimal, as it would leave the client waiting for a response for the time it took to run, so it was rewritten as a background process that runs in a loop, checking for new uploads, and processing them sequentially.
So my question is this: if we have this separate node process running in the background, and we run clusters which starts up a process for each CPU, how is that going to work? Are we not going to get two node processes competing for the same CPU. We were getting a bit of weird behaviour and crashing yesterday, without a lot of error messages, (god I love node), so it's bit concerning. I'm assuming Linux will just swap the processes in and out as they are being used. But I wonder if it will be problematic, and I also wonder about someone getting their web session swapped out for several minutes while the longer running process runs.
The smart thing to do would be to rewrite this to run on two different servers, but the files that maya uses/creates are on the server's file system, and we were not given the budget to rebuild the way we should. So, we're stuck with this architecture for now.
Any thoughts now possible problems and how to avoid them would be appreciated.
From an overall architecture prospective, spawning 1 nodejs per core is a great way to go. You have a lot of interdependencies though, the nodejs processes are calling maya which may use mulitple threads (keep that in mind).
The part that is concerning to me is your random crashes and your "process that runs in a loop". If that process is just checking the file system you probably have a race condition where the nodejs processes are competing to work on the same input/output files.
In theory, 1 nodejs process per core will work great and should help to utilize all your CPU usage. Linux always swaps the processes in and out so that is not an issue. You could start multiple nodejs per core and still not have an issue.
One last note, be sure to keep an eye on your memory usage, several linux distributions on EC2 do not have a swap file enabled by default, running out of memory can be another silent app killer, best to add a swap file in case you run into memory issues.

Get sharepoint w3wp.exe process memory dump using the winDBG

I try to use the winDBG (adplus) to dump the w3wp process.
When I run this command adplus.vbs -hang -quiet -p ****, I found it create a folder with a big size file, and the size was growing. Then suddenly, the big size file disappeared and the process re-start again. Does anyone know about it?
Best Regards,
Yongwei,
Colin is right; in effect, you're racing against IIS as it is recylcing the application pool. As you're snapping your process snapshot, you're either hitting a memory-threshold for recycling, or health checks are perceiving the process to be hung and instituting a recycle (possibly due to ADPlus locking the process)
Here's how I would modify your application pool characteristics prior to attempting your next capture. You only need these changes in effect for as long as it takes to capture your dump:
Turn off memory-based recycling limits (physical and virtual)
Turn off the idle timeout limit (if it's on)
Disable both Pinging and Rapid Fail Protection
In effect: you need to turn off all of the features that try to keep your app pools running well. Capturing a memory snapshot takes time (as you know).
I would also recommend checking out ProcDump (http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx) from the SysInternals guys. It was just released last month, and it makes process memory captures a bit easier. An article on using it to capture the W3WP is here: http://blogs.msdn.com/webtopics/archive/2009/08/08/using-procdump-exe-to-monitor-w3wp-exe-for-cpu-spikes.aspx
I hope this helps!
I can only imagine the memory usage of the w3wp process got to much which triggered an app pool recycle, which means restarting w3wp.

Resources