When I run the Nutch crawl as a background process on Ubuntu in local mode, the Fetcher aborts with hung threads. The message is something like:
WARN fetcher.Fetcher - Aborting with "X" hung threads.
I start the script with nohup and & because I want to log off the session and have the crawler keep running on the server. Otherwise, when the crawl at a certain depth finishes and the crawldb is being updated, the SSH session times out. I've tried configuring keep-alive messages without much success. The command is something like:
nohup ./bin/nutch crawl ....... &
Has anybody experienced this before? It seems to happen only when I use nohup or &.
The hung-threads message is logged by the Fetcher class when some requests hang longer than intended.
In Fetcher.java, lines 926-930:
if ((System.currentTimeMillis() - lastRequestStart.get()) > timeout) {
  if (LOG.isWarnEnabled()) {
    LOG.warn("Aborting with "+activeThreads+" hung threads.");
  }
  return;
}
The timeout for these requests is defined by mapred.task.timeout, whose default value is 10 minutes. You could increase it, though I'm not sure that would be a 100% clean fix.
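If you want to try raising it, I believe the override can go in conf/nutch-site.xml; a minimal sketch (the 30-minute value is only an example, not a recommendation):
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value> <!-- 30 minutes, in milliseconds -->
  <description>Raised from the 10-minute default to give slow fetches more time.</description>
</property>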
When I observed this behaviour, I added loggers to the code to find which URL's request hung for more than 10 minutes, and concluded that it happened with large files, especially when the remote server was slow to transfer the data.
Related
I have a Node.js application running on Linux. As we all know, whenever I restart the Node.js app it gets a new PID. Suppose that while the app is running, a client connects to it and starts some process whose status is 'processing'. If the Node.js app restarts (on the server side) at that point, how can we make sure the client reconnects to the previous processing state?
What is happening now is that whenever the server restarts, the process gets stuck in 'processing' forever.
Please just point me to an example of how this scenario is handled in real life.
Thank You.
If I'm understanding you correctly, then the answer is you can't...
The reason is that when you restart the process, the event loop is restarted too, so any work that was running or waiting in the event loop is gone. Restarting essentially clears out the event loop.
I would say, though, that if you know the process is crashing Node, you probably want to look into that process and see why it is crashing, and wrap it in a try/catch so it won't kill the server.
Now, with that said (and without knowing what 'processing state' really means), you could set a flag in your DB for, say, 'job1', with a status column set to 'running' when it is kicked off. When the Node server restarts, it can read the job status for 'running' jobs; if a job is in the 'running' state, you can fire it off again and update the table to 'completed' once it finishes. A rough sketch of the idea is below.
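This is only a sketch under assumptions: the jobs table, its columns, the db.query helper, and doTheWork are all made up for illustration.
// On startup, pick up any jobs the previous process left in the 'running' state
// and fire them off again, then mark them 'completed'.
async function resumeInterruptedJobs(db) {
  const stuck = await db.query("SELECT id, payload FROM jobs WHERE status = 'running'");
  for (const job of stuck) {
    await doTheWork(job.payload);                                      // re-run the job
    await db.query("UPDATE jobs SET status = 'completed' WHERE id = ?", [job.id]);
  }
}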
This is probably not the most efficient way, as it's much better to figure out why the process is crashing, but as a fallback it could work. In a clustered environment it could cause issues, though, because server 1 may fail while server 2 is processing, and server 1 does not know what server 2 is doing. More details about the use case, environment, etc. would probably allow for a better answer.
I have a web application running as a service on an Ubuntu EC2 instance. For the past 24 hours, the application has been crashing randomly 2 to 4 hours after starting, with the message shown in the attached error log. The error is:
[nodemon] app crashed - waiting for file changes before starting...
I have run into this error before, but usually it is a syntax error and the application will not start at all. In this case, the app functions normally for several hours before crashing. I have no idea where to even start, as there is nothing above it in the log that looks like it could be causing the crash. The only thing I can see is that the website receives three GET / requests right before the crash, before the server can respond. Most of the posts I've found online about this error also involve the application failing to start, and don't mention the app running normally and then crashing.
Any help would be greatly appreciated.
Thanks!
[Image: error log from journalctl]
It looks like a silent error. I would try to log every input (e.g. HTTP requests and timeouts) with a timestamp, and also log the crash with its time. When a crash occurs, compare the crash time with the events that happened right before it.
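For example, if it is an Express app, something like this sketch (not specific to your code) timestamps every request and the crash itself so the two can be correlated:
// Log every incoming request with a timestamp.
app.use((req, res, next) => {
  console.log(`${new Date().toISOString()} ${req.method} ${req.url}`);
  next();
});

// Log the reason for a crash, with a timestamp, before the process dies.
process.on('uncaughtException', (err) => {
  console.error(`${new Date().toISOString()} uncaughtException`, err);
  process.exit(1);
});
process.on('unhandledRejection', (err) => {
  console.error(`${new Date().toISOString()} unhandledRejection`, err);
});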
Also check /var/log/ to see whether the program was terminated by the system or by another program.
I've changed a long-running process on a Node/Express route to use Bull/Redis.
I've pretty much copied this tutorial on Heroku docs.
The gist of that tutorial: the Express route schedules the job, immediately returns 200 to the client, and the browser long-polls for the job status (via a ping route on Express). When the client sees a 'completed' status, it displays it in the UI. The worker is a separate file and is run with an additional yarn run worker.js.
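Roughly, the scheduling/polling part looks like this (a simplified sketch; the queue name, routes, and env var are placeholders rather than my exact code):
// Express side: schedule the job and return 200 right away.
const Queue = require('bull');
const workQueue = new Queue('work', process.env.REDIS_URL);

app.post('/job', async (req, res) => {
  const job = await workQueue.add({ payload: req.body });
  res.status(200).json({ id: job.id });          // the browser then long-polls /job/:id
});

// Ping route the browser polls until the job reports 'completed'.
app.get('/job/:id', async (req, res) => {
  const job = await workQueue.getJob(req.params.id);
  res.json({ id: req.params.id, state: job ? await job.getState() : 'unknown' });
});

// worker.js (separate process): workQueue.process(async (job) => { /* the long-running work */ });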
Notice the end where it recommends using Throng for clustering your workers.
I'm using this Bull Dashboard to monitor jobs/queues. The dashboard shows the available workers and their status (Idle when not running, not Idle when running).
I've got the MVP working, but the operation is super slow. The average time to complete is 1 minute 30 seconds, whereas before adding Bull it took only seconds.
Another strange thing: it seems to take at least 30 seconds for a worker's status to change from Idle to not Idle, so a lot of the latency appears to be spent waiting for the worker.
Given that the new operation runs in a separate file (worker.js) and Throng enables clustering, I was expecting it to be fast, but it is just the opposite.
Does anyone have any experience with this, or pointers to help figure out what is causing it to be so slow?
I have dynamic values in an array (say 100). All of the values are Neo4j database queries (Cypher), and each query brings back 30 values from the server.
Sometimes an error occurs in, say, query number xyz. To handle those errors I have used process.exit() in the catch() of the Cypher query's session, because I want to stop the complete execution and discard the previous values, but it also kills the running server.
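Roughly what I am doing now (a simplified sketch; the driver setup is omitted and cypherQuery is a placeholder):
session.run(cypherQuery)
  .then(result => {
    // collect the ~30 values from result.records
  })
  .catch(err => {
    console.error(err);
    process.exit(1);   // stops everything, but also kills the whole server
  });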
I want to evaluate everything from the start,
i.e. I want the server to restart again automatically,
or
is there anything else I can call, instead of process.exit(), so that my current execution stops and the current request restarts from scratch?
In short: the server should not stop when an error occurs.
Can anyone help, please?
If you want Node.js to start again when it is terminated, use a process manager to run your Node process.
For example, PM2.
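A minimal example, assuming your entry point is app.js (the file and process names are placeholders):
# PM2 restarts the process automatically whenever it exits or crashes.
pm2 start app.js --name my-api

# Check status and logs.
pm2 status
pm2 logs my-api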
SOLUTION:
The solution I found: use the low-level nohup program, which ignores the hangup signal sent by PuTTY when the connection is closed.
So, instead of ./gearman-manager start I did nohup ./gearman-manager start
NOTE: Still, I would like to know why it was slowing down when I closed PuTTY, or why it continues running in the first place if it received the hangup signal.
I have a problem with the execution of a Gearman worker after I close a PuTTY session.
This is what I have:
a Gearman client that is started by a cron job and checks something in the DB (infinite loop)
a Gearman manager, started with the gearman-manager start command, that receives the client's tasks and manages the calls to a worker
a Gearman worker that reads from / writes to the DB and echoes the status of the current job
When I start gearman-manager I can see the echoes from my worker when it receives tasks and when it executes them. Tasks (updates in the DB) are executed at roughly one per second...
A) When I close the PuTTY session, the rate of changes in the DB drops enormously (to roughly one per 10 seconds)! Could you tell me why this is?
B) When I log back in with PuTTY, I don't get the output of gearman-manager back on the screen. I expected to log back in and see it continuing to echo the status like it did before I closed PuTTY. Could this be because gearman-manager is started as root while the echoes come from a .php script run as the user gearman? Or is it because, when I log back in, the process is in the background?
You don't see the output when you create a new tty because the process was bound to the previous tty. Unless you use something like screen to keep the tty alive, you aren't going to see that output with a new terminal.
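For example, with screen (a sketch of the idea; the session name is arbitrary):
# Start the manager inside a named screen session so it keeps its tty after you disconnect.
screen -S gearman
./gearman-manager start

# Detach with Ctrl-A then D, close PuTTY, and later re-attach to see the output:
screen -r gearman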