Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Good day
In my current situation, when the JVM crashes or needs to be restarted, the Apache Tomcat server has to be started manually. I was wondering if there is a way to force Tomcat to start when the JVM finishes starting up. I'm on an Ubuntu Linux machine.
You are probably a little bit confused - or there is a critical piece of information missing from your question: Tomcat is not something separate from the JVM - it is written in Java and its code is executed within the JVM. On Linux, you will typically have one JVM process for each Java application, such as a Tomcat instance.
Therefore you cannot "start Tomcat once the JVM finishes starting up" - whichever JVM your Tomcat setup is using will just start executing the server bytecode as soon as it is loaded. The Tomcat start-up scripts will launch it with the correct parameters as soon as they are invoked.
I believe that there are four parts to your actual problem:
Determine what the exact behavior of your server is. Is the JVM actually crashing? Or is the Tomcat server encountering a critical exception? Or, perhaps, you just find your server in an unresponsive state? The Linux system logs and the Tomcat log files should contain enough information to tell what is happening.
Or is your Tomcat server just not starting once the OS boots, and you just need to fix your Linux boot configuration?
Determine why that behavior is happening. Is the JVM running out of memory and being terminated by the kernel? Is it crashing due to another issue? Is your web application stuck waiting on e.g. a dead DB server?
Determine how to fix the actual problem. Having to restart the application server on a regular basis is a good indication that you need to fix your Tomcat setup - or your application code.
When you have done all you can with the previous steps, only then should you consider an automated solution to help restart your server. There are several service monitoring tools, such as Monit, that you could use, although they usually need someone at least moderately experienced with Linux to set up correctly.
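Monitoring tools like Monit, or an init system's restart policy, rerun a service when it exits. The Python sketch below shows that idea as a minimal supervisor. It also records why the process stopped on each run, so the diagnostic steps above are not lost.

A few assumptions and facts behind it:
- The `supervise` and `describe_exit` names and the Tomcat path in the final comment are hypothetical.
- A negative return code from `subprocess` means the child was killed by that signal.
- The kernel OOM killer sends `SIGKILL`.

```python
import signal
import subprocess
import time


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable reason."""
    if returncode < 0:
        # Negative means "killed by signal N"; SIGKILL is what the OOM killer sends
        return "killed by " + signal.Signals(-returncode).name
    return f"exited with status {returncode}"


def supervise(cmd, max_restarts=3, delay=1.0):
    """Run cmd in the foreground, restarting it each time it exits.

    Gives up after max_restarts restarts and returns the reason for every stop,
    so the restarts do not hide the underlying problem.
    """
    history = []
    for attempt in range(max_restarts + 1):
        result = subprocess.run(cmd)  # blocks until the child exits
        reason = describe_exit(result.returncode)
        history.append(reason)
        print(f"attempt {attempt}: {reason}")
        if attempt < max_restarts:
            time.sleep(delay)  # back off before restarting
    return history


# Hypothetical usage for Tomcat (the install path is an assumption;
# "catalina.sh run" keeps Tomcat in the foreground):
# supervise(["/opt/tomcat/bin/catalina.sh", "run"], max_restarts=5, delay=10)
```

A repeated "killed by SIGKILL" in the history is a strong hint to check the kernel log for the OOM killer before automating anything further.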
I have an API application that is running in a Docker container, and since moving to AWS, the API stops daily with the error: Erlang closed the connection. I've monitored the server during that time and IOPS do not seem to be causing this issue. Beyond that, when the API fails, it won't restart on its own on one of our clusters. I'm not sure where to find the logs to get more context and could use any input that may be helpful. For more context, this API worked fairly well before in our data center on physical servers, but now in AWS it fails daily. Any thoughts or suggestions as to why it may be failing to restart?
I've looked at the syslogs and the application server logs and don't see any kind of failures. Maybe I'm not looking in the proper place. At this point, someone from one of our teams has to manually restart the api with an init.d command. I don't want to create a cron job to "fix" this because that's a band-aid and not a true fix.
There really isn't enough information on the structure of your app or its connections, so no one can give a "true fix".
The problem may be as small as configuring your nodes differently, changing some of the server's local configurations, or you may need some "keep alive" handler towards AWS.
Maybe try adding a manual periodic dump to see if it's an accumulating problem. But I believe that if Erlang closed the connection, there's a problem between your API and AWS, and your API can't figure out where it is located.
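One way to do the periodic dump is to sample the process's own resource usage on a timer and log it. A number that only ever climbs points to an accumulating problem, for example open file descriptors from leaked sockets, or peak memory. The question does not say what language the API is written in, so this is a Python sketch of the idea. It assumes Linux (`/proc/self/fd`), and `sample_resources` and `dump_periodically` are hypothetical names.

```python
import os
import resource
import time


def sample_resources():
    """Take one snapshot of this process's resource usage (Linux)."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    fd_dir = "/proc/self/fd"  # one entry per open file descriptor on Linux
    return {
        "time": time.time(),
        # Peak resident set size so far; reported in kilobytes on Linux
        "max_rss_kb": usage.ru_maxrss,
        "open_fds": len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None,
    }


def dump_periodically(interval, samples, sink=print):
    """Pass `samples` snapshots, `interval` seconds apart, to `sink`."""
    for i in range(samples):
        sink(sample_resources())
        if i < samples - 1:
            time.sleep(interval)
```

In a long-running service this would run in a background thread, with an interval of a minute or so and a sink that writes to the application log.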
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I have a Node.js and Express app running with Nginx in front of it. The app is pretty big and we have around a million users per day. The app's memory keeps growing as the load increases, and at some point requests start getting dropped because there is no more memory left on the server.
My initial guess was that some module or snippet is leaking memory, so I explored the memory heap and profiled the app, but I still haven't found the culprit. Any suggestions?
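A common way to hunt such a leak is to take two heap snapshots around some load and diff them. The question is about Node.js, where the same idea applies to heap snapshots compared in Chrome DevTools. To keep one language here, this sketch shows the technique with Python's standard `tracemalloc` module. `leaky_request` is a hypothetical handler that leaks on purpose.

```python
import tracemalloc

_leaky_cache = []  # hypothetical module-level cache that is never cleared


def leaky_request():
    """Hypothetical request handler that leaks roughly 1.5 MB per call."""
    for i in range(10_000):
        _leaky_cache.append("x" * 100 + str(i))


def top_growth(workload, limit=5):
    """Run workload between two heap snapshots and return the top growth sites."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    workload()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to() groups allocations by source line and sorts the
    # differences by how much they changed, biggest first
    return after.compare_to(before, "lineno")[:limit]


for stat in top_growth(leaky_request):
    print(stat)
```

The first line printed names the file and line that allocated the most new memory. Under real load, that line is where to start reading.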
You can spawn a few more machines with more RAM, then use HAProxy with sticky sessions to balance the load accordingly.
You can also use cluster mode and the pm2 tools.
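Sticky sessions mean that every request carrying the same session is routed to the same backend. HAProxy normally does this with a cookie or a stick table. The Python sketch below only illustrates the idea using a hash of the session id, and the backend addresses are made up.

```python
import hashlib


def pick_backend(session_id, backends):
    """Route every request of one session to the same backend."""
    # hashlib gives the same digest in every process; built-in hash() of a
    # str is randomized per process and would break stickiness
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["10.0.0.11:3000", "10.0.0.12:3000", "10.0.0.13:3000"]  # made-up addresses
print(pick_backend("session-abc", backends))
```

One drawback of this plain modulo: adding or removing a backend remaps most sessions. Consistent hashing, or HAProxy's cookie-based stickiness, avoids that.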
I have a routine that crashes Linux and forces a reboot using a system function.
Now I need to crash Linux when a certain process dies. Using a script that starts the process and restarts the server when the script ends is not appropriate, since that takes a few milliseconds.
Another idea would be to spawn the "shooting" process alongside it, have it poll a counter, and reboot the server if the counter is not incremented.
This would result in an almost instant reaction.
Now the question is what would be a good timeframe. I have no idea how the Linux scheduler would guarantee that such a counter is updated in time, or what a good timeout would be.
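The counter-polling idea can be sketched in Python with a shared `multiprocessing.Value`. The monitored process bumps the counter, and the watchdog declares a stall if it has not changed after one timeout. The function names are hypothetical, and `on_stall` stands in for the question's reboot routine. Polling is subject to the scheduler, so the timeout should be a comfortable multiple of the heartbeat interval.

```python
import multiprocessing as mp
import time


def heartbeat_worker(counter, beats, interval):
    """Stand-in for the monitored process: bump the counter, then stop.

    A real process would increment it forever from its main loop.
    """
    for _ in range(beats):
        with counter.get_lock():
            counter.value += 1
        time.sleep(interval)


def watchdog(counter, timeout, on_stall, max_checks=1000):
    """Poll counter every `timeout` seconds; call on_stall once it stops changing.

    Returns True if a stall was detected, False if max_checks ran out first.
    """
    last = counter.value
    for _ in range(max_checks):
        time.sleep(timeout)
        current = counter.value
        if current == last:
            on_stall()  # the question's reboot routine would go here
            return True
        last = current
    return False
```

To use it, a caller would:
1. Create `counter = mp.Value("i", 0)`.
2. Start `heartbeat_worker` (or the real program) in an `mp.Process` with that counter.
3. Run `watchdog(counter, ...)` in the supervising process.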
I would also like to hear some alternatives to spawning this second process. Is there a way to tell Linux to run a certain routine when a given process crashes, or a listener mechanism for the event of problems with a given process?
The timeout idea is already implemented in the kernel. You can register any application as a software watchdog, but you'll have to lower the default timeout. Have a look at http://linux.die.net/man/8/watchdog for some ideas. That application can also handle user-defined tests. Realistically, unless you're running a real-time kernel such as linux-rt, timeouts lower than 100 ms can be dangerous on a heavily loaded system, especially if the check needs to poll another application.
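The kernel interface behind this is `/dev/watchdog`. Once it is opened, the machine is reset unless something writes to the device within the timeout. Below is a minimal Python sketch of a process that keeps feeding it only while a health check passes; `is_healthy` is a hypothetical callback.

Two cautions:
- Do not point it at a real watchdog device casually: when the loop stops, the machine will reboot.
- The device can typically be opened by only one process. This sketch would therefore replace the watchdog daemon from the man page rather than run alongside it.

```python
import os
import time


def feed_watchdog(is_healthy, device="/dev/watchdog", interval=1.0, max_feeds=None):
    """Write to the watchdog device for as long as is_healthy() returns True.

    Returns the number of writes made. Once writes stop, the kernel resets the
    machine after its watchdog timeout.
    """
    fd = os.open(device, os.O_WRONLY)  # opening a real device arms the timer
    feeds = 0
    try:
        while is_healthy() and (max_feeds is None or feeds < max_feeds):
            os.write(fd, b"\0")  # any write counts as a keep-alive ("ping")
            feeds += 1
            time.sleep(interval)
    finally:
        # Deliberately no magic close ("V") before closing: on drivers that
        # support it, closing without it leaves the timer running, so a
        # failed health check still ends in a reset
        os.close(fd)
    return feeds
```

The `interval` must be well below the device's timeout, for the same scheduling reasons given above.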
In case of application crashes, you can handle them if your init system supports notifications. For example, both upstart and systemd can do that by monitoring files (make sure core dumps are created in the right place).
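With systemd specifically, a service can also opt in to its watchdog:
- `WatchdogSec=` in the unit makes systemd expect periodic `WATCHDOG=1` messages on the socket named in `$NOTIFY_SOCKET`.
- If the messages stop, systemd kills the service and marks it failed.
- `Restart=` or `OnFailure=` can then react.

libsystemd's `sd_notify()` does this in C. The hand-rolled Python sketch below sends the same datagram; the function name mirrors the C one but is only an illustration.

```python
import os
import socket


def sd_notify(state):
    """Send a state string such as "READY=1" or "WATCHDOG=1" to systemd.

    Returns False when not started by systemd with notifications enabled.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # Linux abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.send(state.encode("utf-8"))
    return True
```

A service would call `sd_notify("WATCHDOG=1")` from its main loop well within the configured interval. systemd exposes that interval to the service as `$WATCHDOG_USEC`.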
But in any case, I'd suggest rethinking the idea of millisecond-resolution restarts. Do you really need to kill the system in that time, or do you just need to isolate it? Just syncing the disks will take a few extra milliseconds and you probably don't want to miss that step. Instead of just killing the host, you could make sure the affected app isn't working (SIGABRT?) and kill all networking (flush iptables, change default to DROP).
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
There are advantages to daemonizing a process, as it is detached from the terminal. But the same thing can also be achieved with a cron job. [Kindly correct me if not]
What is the best criterion for deciding when to use a cron job and when to use a daemon process?
In general, if your task needs to run more than a few times per hour (maybe <10 minutes) you probably want to run a daemon.
A daemon, which is always running, has the following benefits:
It can run at frequencies greater than 1 per minute
It can remember state from its previous run more easily, which makes programming simpler (if you need to remember state) and can improve efficiency in some cases
On an infrastructure with many hosts, it does not cause a "stampeding herd" effect
Multiple invocations can be avoided more easily (perhaps?)
BUT
If it quits (e.g. following an error), it won't automatically be restarted unless you implemented that feature
It uses memory even when not doing anything useful
Memory leaks are more of a problem.
In general, robustness favours "cron", and performance favours a daemon. But there is a lot of overlap (where either would be ok) and counter-examples. It depends on your exact scenario.
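To make the trade-off concrete, here is a minimal Python sketch of a daemon-style loop. It can run more often than cron's one-minute granularity and keeps state between runs. It also has to survive its own errors, because nothing restarts it if it exits. `run_daemon` is a hypothetical name, and `max_runs` exists only so the loop can be tested.

```python
import logging
import time


def run_daemon(task, interval, max_runs=None):
    """Call task(state) every `interval` seconds, keeping `state` between runs.

    A failing run is logged instead of ending the loop: unlike cron, which just
    starts the next run, nothing restarts a daemon that exits.
    """
    state = {}
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            task(state)
        except Exception:
            logging.exception("run %d failed; continuing", runs)
        runs += 1
        # Sleep only for what is left of the interval (may be well under a minute)
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
    return state
```

The same `task` run from cron would instead start with an empty state every time and rely on cron for the next invocation.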
The difference between a cronjob and a daemon is the execution time frame.
A cron job is a process that is executed once in a while. An example of a cron job could be a script that removes the contents of a temporary folder periodically, or a program that sends push notifications to a bunch of devices every day at 9:00 am.
Whereas a daemon is a process running detached from any user, but it won't be relaunched if it exits.
If you need a service that is permanently available to others, then you need to run a daemon. This is a fairly complicated programming task, since the daemon needs to be able to communicate with the world on a permanent basis (e.g. by listening on a socket or TCP port), and it needs to be written to handle each job cleanly without leaking or even locking up resources for a long time.
By contrast, if you have a specific job whose description can be determined well enough in advance, and which can act automatically without further information, and is self-contained, then it may be entirely sufficient to have a cron job that runs the task periodically. This is much simpler to design for, since you only need a program that runs once for a limited time and then quits.
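One detail that commonly matters with cron jobs is making sure a slow run does not overlap with the next one. Below is a minimal sketch using an exclusive `flock` on a lock file. It is Unix-only, and the lock path is up to you. The lock is released automatically when the process exits, even if it crashes. From a crontab line, the util-linux `flock(1)` command offers the same thing.

```python
import fcntl
import os


def acquire_run_lock(lock_path):
    """Return an fd holding an exclusive lock, or None if another run holds it."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB: fail immediately instead of waiting for the other run
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd  # keep open for the whole job; closing it (or exiting) releases the lock
```

A cron-started script would call it first and exit quietly when it returns `None`.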
In a nutshell: A daemon is a single process that runs forever. A cron job is a mechanism to start a new, short-lived process periodically.
A daemon can take advantage of its longevity by caching state, deferring disk writes, or engaging in prolonged sessions with a client.
A daemon must also be free of memory leaks, as they are likely to accumulate over time and cause a problem.
We have a CherryPy service that integrates with several backend web services. During load testing, the CherryPy process regularly crashes after a while (45 minutes). We know the bottleneck is the backend web services we are using. Before the crash we see 500 and 503 errors when accessing the backend services, but I can't figure out why CherryPy itself crashes (the whole process is killed). Can you give me ideas on how to investigate where the problem is? Is it possible that the thread_pool (50) is queueing up too many requests?
In my early CherryPy days I had it crash once. I mean a Python process crash caused by a segfault. When I investigated, I found that I had mishandled MySQLdb connections by caching them in objects that CherryPy threads accessed interchangeably. Because a MySQLdb connection is not thread-safe, it should be accessed only from the thread it was created in. Also, because of the concurrency involved, the crashes seemed nondeterministic and only appeared under load testing. So load testing can work as a debugging tool here: try Apache JMeter or Locust (Pythonic).
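The usual fix for that kind of crash is to give each worker thread its own connection. Below is a minimal sketch with `threading.local`. It uses the standard library's `sqlite3` as a stand-in for MySQLdb, which is not installed everywhere, and `get_connection` is a hypothetical helper. sqlite3 itself refuses by default to use a connection from a thread other than the one that created it.

```python
import sqlite3
import threading

_local = threading.local()  # each thread sees its own attributes


def get_connection(db_path=":memory:"):
    """Return the calling thread's own connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(db_path)
        _local.conn = conn
    return conn
```

Each CherryPy handler would call `get_connection()` instead of reading a shared cached connection. In a real app the `":memory:"` default would be replaced by the actual database settings.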
When a process crashes, you can instruct Linux to write a core dump, which will contain a stack trace (e.g. on the MySQLdb C-code side in my example). However alien the low-level C environment is to you (it is to me), the stack trace can help find which library is causing the crash, or at least narrow the circle of suspects. Here is an article about it.
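On the Python side, two standard-library switches help here:
- `faulthandler` prints the Python traceback of every thread when the process dies from a fatal signal such as a segfault.
- Raising the `RLIMIT_CORE` soft limit lets the kernel write a core dump. An unprivileged process can only raise it as far as the hard limit, so a hard limit of 0 still means no core file.

Where the core file ends up depends on `/proc/sys/kernel/core_pattern`; on Ubuntu it may be handed to apport or systemd-coredump. Below is a minimal sketch to call early at startup; `enable_crash_diagnostics` is a hypothetical name.

```python
import faulthandler
import resource
import sys


def enable_crash_diagnostics():
    """Make a hard crash (e.g. a segfault in a C extension) leave evidence behind."""
    # On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, dump every thread's
    # Python traceback to stderr
    faulthandler.enable(file=sys.stderr, all_threads=True)
    # Raise the core-file size soft limit to the hard limit so the kernel
    # is allowed to write a core dump
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return hard
```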
I also want to note that the problem is unlikely to be in CherryPy. It is actually very stable.