curl hanging, despite connect-timeout and max-time - linux

I have some scripts retrieving resources (image files etc) using system calls to curl. Occasionally, these will fail to finish, and will show as pipe_w in process listings.
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S root 4378 4086 0 82 2 - 16002 pipe_w Jan10 ? 00:00:00 curl -JO --max-time 60 --connect-timeout 60 https://address/path/to/resource?identifier=tag
If I understand correctly, I can use connect-timeout to set the number of seconds allowed for making the connection, and max-time to limit the total time the whole operation may take, including waiting for the response from the remote machine.
curl -JO --max-time 60 --connect-timeout 60 https://address/path/to/resource?identifier=tag
Any suggestions as to how I can force curl to continue past this? Or pointers on what might cause this?
This is using curl 7.21.0, on a stock ubuntu 10.10.
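One possible safety net, independent of curl's own options, is to run the command under an external wall-clock limit. The sketch below assumes the GNU coreutils timeout utility is installed; run_with_deadline is a hypothetical helper name, not something from curl or from the scripts described above.

```shell
#!/bin/sh
# Hypothetical helper: run any command under a hard wall-clock limit.
# $1 = limit in seconds, remaining arguments = the command to run.
run_with_deadline() {
    limit=$1
    shift
    # timeout sends SIGTERM after $limit seconds, then SIGKILL 5 seconds
    # later if the command is still alive.
    timeout -k 5 "$limit" "$@"
}
```

It could be used as run_with_deadline 90 curl -JO --max-time 60 --connect-timeout 60 'https://address/path/to/resource?identifier=tag'. An exit status of 124 means the limit was reached and the command was terminated.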

Related

Can't seem to kill a process with bash script

I've been trying to kill a process with a bash script and I can't get it working. I've read a lot of tutorials online and tried different things, but the process never gets killed.
how it's run: (crontab)
* * * * * /home/pi/status.sh > /home/pi/logs/status.log 2>&1
log:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
^M 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0^M100 10 100 10 0 0 73 0 --:--:-- --:--:-- --:--:-- 153^M100 10 100 10 0 0 72 0 --:--:-- --:--:-- --:--:-- $
/home/pi/status.sh: 6: /home/pi/status.sh: 18645: not found
status.sh:
Bridge=$(curl http://www.mywebsite.com/dir/cache/timestamp.txt)
timestamp=$( date +%s )
total=`expr $timestamp - $Bridge`
if (($total > 300));
then
#p=$(pidof cgminerEU)
#sudo killall -9 cgminerEU
#sudo kill -9 $(pidof cgminerEU)
sudo pkill -f cgminerEU
fi
the process in question
pi@raspberrypi ~ $ ps ax | grep cgminerEU
26018 ? Ss 0:13 SCREEN -dm ./cgminerEU
26019 pts/0 Ssl+ 89:32 ./cgminerEU
30989 pts/2 S+ 0:00 grep --color=auto cgminerEU
does the
/home/pi/status.sh: 6: /home/pi/status.sh: 18645: not found
mean that it's trying to kill pid 18645? I'm sorry, I'm new to bash scripting and it's all very confusing.
I suspect you'll find that this is due to a race condition.
If you kill screen, cgminerEU will immediately die, and vice versa.
Because -f matches against the full command line, you've made pkill send a signal to both processes.
pkill is then in a race to kill the second process before it dies on its own.
I suggest you try removing the -f from pkill so that it matches only the process name and not the full command line.
This way, it will kill only the cgminerEU process and not the screen process whose command line contains the same name (which will die as a dependent anyway).
PS: curl has a -s option that suppresses the progress indicator, so you avoid getting it emailed to you.
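A further possibility, offered as an assumption rather than a confirmed diagnosis: the "18645: not found" message looks like what a plain POSIX sh prints when it meets (( ... )). Cron runs a script without a #! line under /bin/sh, which may not support bash's arithmetic command, so (($total > 300)) is read as nested subshells that try to run the number as a command, with "> 300" treated as a redirection. A minimal sketch of the staleness check in portable sh follows; is_stale is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the last update is older than the limit.
# $1 = current time, $2 = last update time, $3 = limit (all in seconds)
is_stale() {
    [ $(( $1 - $2 )) -gt "$3" ]
}

# Possible wiring into status.sh (left commented out in this sketch):
# Bridge=$(curl -s http://www.mywebsite.com/dir/cache/timestamp.txt)
# if is_stale "$(date +%s)" "$Bridge" 300; then
#     sudo pkill cgminerEU
# fi
```

Starting the script with #!/bin/bash would be another way to keep the original (( )) syntax.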

how to use procps-3.2.8 in listing all the running processes?

Does anyone know how to use procps-3.2.8 in listing all the running processes of ubuntu/linux?
And how to kill them using procps-3.2.8?
please provide the step-by-step procedure and provide useful links about procps.
procps is a package that provides many command-line utilities. You can find complete information about each utility and its options at the locations below.
On the homepage we can get the following information about procps:
procps is the package that has a bunch of small useful utilities that give information about processes using the /proc filesystem. The package includes the programs ps, top, vmstat, w, kill, free, slabtop, and skill.
http://www.linuxfromscratch.org/lfs/view/7.2/chapter06/procps.html
http://procps.sourceforge.net/
How to use procps-3.2.8 in listing all the running processes?
ps is part of the procps package, and there are numerous ways to list all running processes (for detailed information, see man ps).
mantosh@mantosh4u:~/practice$ ps -V
procps version 3.2.8
mantosh@mantosh4u:~/practice$ ps -AF
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
root 1 0 0 6143 2544 3 14:38 ? 00:00:00 /sbin/init
root 2 0 0 0 0 1 14:38 ? 00:00:00 [kthreadd]
.............................................................................
root 3320 2 0 0 0 0 15:13 ? 00:00:00 [kworker/u:2]
root 3334 2 0 0 0 1 15:18 ? 00:00:00 [kworker/1:0]
How to kill them using procps-3.2.8?
pkill is part of the procps package and has numerous command-line options for killing a process. For detailed information, run man pkill in your terminal.
mantosh@mantosh4u:~/practice$ pkill -V
pkill (procps version 3.2.8)
mantosh@mantosh4u:~/practice$ pkill -f gedit
In the above example, gedit is the pattern identifying the process to kill; with -f, pkill matches it against the full command line rather than only the process name.

What could be the reason time info in ps command don't change while the process is active

I'm running a java process (doing some database manipulations) and I ran ps -lp 5631232
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
202001 A 205 5631232 263213 0 60 20 3f46b46120 70156 * pts/6 1:09 java
The TIME value has not changed for a long while. The status is A (active), so I don't think the process has halted.
I just don't know how to find out what's going wrong. Can anyone tell me how to detect the problem and/or what the problem could be?
I'm using an AIX system.

Getting CPU utilization information

How could I get the CPU utilization with time info of a process in linux? Basically I want to let my application run overnight. At the same time, I would like to monitor the CPU utilization during the period the application is run.
I tried top | grep appName >& log, it does not seem to return me anything in the log. Could someone help me with this?
Thanks.
vmstat and iostat can both give you periodic information of this nature; I would suggest either setting up the number of times manually, or putting a single poll into a cron job, and then redirecting the output to a file:
vmstat 20 4320 >> cpu_log_file
This would give you a snapshot of usage every 20 seconds for 24 hours.
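The count argument is simply the monitoring period divided by the sampling interval. A small sketch of that arithmetic, where samples_for is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: number of samples needed to cover a period.
# $1 = hours to cover, $2 = seconds between samples
samples_for() {
    echo $(( $1 * 3600 / $2 ))
}
```

For example, samples_for 24 20 prints 4320, so vmstat 20 "$(samples_for 24 20)" covers a full day.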
Install the sysstat package and run sar:
nohup sar -o output.file 12 8 >/dev/null 2>&1 &
Use the top or watch command:
PID COMMAND %CPU TIME #TH #WQ #PORT #MREG RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEINS USER
10764 top 8.4 00:01.04 1/1 0 24 33 2000K 244K 2576K 17M 2378M 10764 10719 running 0 9908+ 54 564790+ 282365+ 3381+ 283412+ 838+ 27 root
10763 taskgated 0.0 00:00.00 2 0 25 27 432K 244K 1004K 27M 2387M 10763 1 sleeping 0 376 60 140 60 160 109 11 0 root
Write a program that invokes your process and then calls getrusage(2) and reports statistics for its children.
You can monitor the time used by your program with top while it is running.
Alternatively, you can launch your application with the time command, which will print the total amount of CPU time used by your program at the end of its execution. Just type time ./my_app instead of just ./my_app
For more info, man 1 time
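For a timestamped log of one process's CPU time, a minimal sketch is to sample the kernel's counters directly. This assumes a Linux /proc filesystem; cpu_ticks and log_cpu are hypothetical helper names. The value printed is utime plus stime in clock ticks (divide by the output of getconf CLK_TCK to get seconds).

```shell
#!/bin/sh
# Hypothetical helper: print user + system CPU time of PID $1 in clock ticks.
cpu_ticks() {
    stat=$(cat "/proc/$1/stat") || return 1
    # Drop "pid (comm) " so a command name containing spaces cannot shift fields.
    rest=${stat##*) }
    # rest now starts at field 3 (state), so fields 14 and 15 (utime, stime)
    # become positional parameters 12 and 13.
    set -- $rest
    echo $(( ${12} + ${13} ))
}

# Hypothetical helper: log a timestamp and the tick count until the process exits.
# $1 = PID, $2 = seconds between samples
log_cpu() {
    while [ -r "/proc/$1/stat" ]; do
        echo "$(date '+%F %T') $(cpu_ticks "$1")"
        sleep "$2"
    done
}
```

Something like log_cpu 1234 60 >> cpu_log_file could then be left running overnight, with 1234 standing in for the real PID. The difference between consecutive samples shows how much CPU the process used in that interval.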

How is it possible that kill -9 for a process on Linux has no effect?

I'm writing a plugin to highlight text strings automatically as you visit a web site. It's like the highlight search results but automatic and for many words; it could be used for people with allergies to make words really stand out, for example, when they browse a food site.
But I have a problem. When I try to close an empty, fresh FF window, it somehow blocks the whole process. When I kill the process, all the windows vanish, but the Firefox process stays alive (parent PID is 1, doesn't listen to any signals, has lots of resources open, still eats CPU, but won't budge).
So two questions:
How is it even possible for a process not to listen to kill -9 (neither as user nor as root)?
Is there anything I can do but a reboot?
[EDIT] This is the offending process:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
digulla 16688 4.3 4.2 784476 345464 pts/14 D Mar28 75:02 /opt/firefox-3.0/firefox-bin
Same with ps -ef | grep firefox
UID PID PPID C STIME TTY TIME CMD
digulla 16688 1 4 Mar28 pts/14 01:15:02 /opt/firefox-3.0/firefox-bin
It's the only process left. As you can see, it's not a zombie, it's running! It doesn't listen to kill -9, no matter if I kill by PID or name! If I try to connect with strace, then the strace also hangs and can't be killed. There is no output, either. My guess is that FF hangs in some kernel routine but which?
[EDIT2] Based on feedback by sigjuice:
ps axopid,comm,wchan
can show you in which kernel routine a process hangs. In my case, the offending plugin was the Beagle Indexer (openSUSE 11.1). After disabling the plugin, FF was a quick and happy fox again.
As noted in comments to the OP, a process status (STAT) of D indicates that the process is in an "uninterruptible sleep" state. In real-world terms, this generally means that it's waiting on I/O and can't/won't do anything - including dying - until that I/O operation completes.
Processes in a D state will normally only be there for a fraction of a second before the operation completes and they return to R/S. In my experience, if a process gets stuck in D, it's most often trying to communicate with an unreachable NFS or other remote filesystem, trying to access a failing hard drive, or making use of some piece of hardware by way of a flaky device driver. In such cases, the only way to recover and allow the process to die is to either get the fs/drive/hardware back up and running so the I/O can complete or to give up and reboot the system. In the specific case of NFS, the mount may also eventually time out and return from the I/O operation (with a failure code), but this is dependent on the mount options and it's very common for NFS mounts to be set to wait forever.
This is distinct from a zombie process, which will have a status of Z.
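To look for stuck processes without relying on ps, the state letter can be read straight from the kernel. The sketch below assumes a Linux /proc filesystem; procs_in_state is a hypothetical helper name, and the wait channel may be unreadable or shown as 0 on some systems.

```shell
#!/bin/sh
# Hypothetical helper: print "PID wchan" for every process whose state
# letter (D, S, R, Z, ...) equals $1.
procs_in_state() {
    want=$1
    for f in /proc/[0-9]*/stat; do
        # A process may exit while we iterate; skip it if the read fails.
        line=$(cat "$f" 2>/dev/null) || continue
        pid=${line%% *}
        # Drop "pid (comm) " so the state letter is the first word left.
        rest=${line##*) }
        state=${rest%% *}
        if [ "$state" = "$want" ]; then
            wchan=$(cat "/proc/$pid/wchan" 2>/dev/null) || true
            echo "$pid ${wchan:-?}"
        fi
    done
}
```

Running procs_in_state D lists processes in uninterruptible sleep together with the kernel routine they are waiting in, and procs_in_state Z lists zombies.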
Double-check that the parent-id is really 1. If not, and this is firefox, first try sudo killall -9 firefox-bin. After that, try killing the specific process IDs individually with sudo kill -9 [process-id].
How is it even possible for a process not to listen to kill -9 (neither as user nor as root)?
If a process has become a zombie (shown as <defunct>) with a parent of 1, you can't kill it manually; only init can reap it. Zombie processes are already dead and gone - they've lost the ability to be killed as they are no longer processes, only a process table entry and its associated exit code, waiting to be collected. You need to kill the parent, and you can't kill init for obvious reasons.
But see here for more general information. A reboot will kill everything, naturally.
Is it possible, that this process is restarted (for example by init) just at the time you kill it?
You can check this easily. If the PID is the same after kill -9 PID then the process wasn't killed, but if it has changed the process has been restarted.
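A minimal sketch of that check, assuming a Linux /proc filesystem; pid_exists is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a process table entry for PID $1 exists.
# Testing /proc avoids the permission errors that kill -0 can report for
# processes owned by other users.
pid_exists() {
    [ -d "/proc/$1" ]
}
```

After kill -9 PID, a successful pid_exists PID means the same PID is still in the process table. Note that a zombie waiting to be reaped by its parent also still has an entry, so this alone does not distinguish "still running" from "dead but not yet collected".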
I recently fell into the double-fork pitfall and landed on this page before finally finding my answer. The symptoms are identical even though the problem is not the same:
WYKINWYT: What You Kill Is Not What You Thought
The minimal test code shown below is based on an example for an SNMP daemon.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
int main(int argc, char* argv[])
{
    //We omit the -f option (do not Fork) to reproduce the problem
    char * options[]={"/usr/local/sbin/snmpd",/*"-f",*/"-d","--master=agentx", "-Dagentx","--agentXSocket=tcp:localhost:1706", "udp:10161", (char*) NULL};
    pid_t pid = fork();
    if ( 0 > pid ) return -1;
    switch(pid)
    {
        case 0:
        { //Child launches SNMP daemon
            execv(options[0],options);
            exit(-2); //Only reached if execv() fails
            break;
        }
        default:
        {
            sleep(10); //Simulate "long" activity
            kill(pid,SIGTERM);//kill what should be child,
                              //i.e the SNMP daemon I assume
            printf("Signal sent to %d\n",(int)pid);
            sleep(10); //Simulate "long" operation before closing
            waitpid(pid, NULL, 0); //Reap the child so it stops showing as defunct
            printf("SNMP should be now down\n");
            getchar();//Blocking (for observation only)
            break;
        }
    }
    printf("Bye!\n");
    return 0;
}
During the first phase, the main process (7699) launches the SNMP daemon (7700), but we can see that this one is now defunct (a zombie). Besides that, we can see another process (7702) with the options we specified:
[nils@localhost ~]$ ps -ef | tail
root 7439 2 0 23:00 ? 00:00:00 [kworker/1:0]
root 7494 2 0 23:03 ? 00:00:00 [kworker/0:1]
root 7544 2 0 23:08 ? 00:00:00 [kworker/0:2]
root 7605 2 0 23:10 ? 00:00:00 [kworker/1:2]
root 7698 729 0 23:11 ? 00:00:00 sleep 60
nils 7699 2832 0 23:11 pts/0 00:00:00 ./main
nils 7700 7699 0 23:11 pts/0 00:00:00 [snmpd] <defunct>
nils 7702 1 0 23:11 ? 00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils 7727 3706 0 23:11 pts/1 00:00:00 ps -ef
nils 7728 3706 0 23:11 pts/1 00:00:00 tail
After the simulated 10-second delay, we try to kill the only process we know of (7700), which is finally reaped by waitpid(). But process 7702 is still here:
[nils@localhost ~]$ ps -ef | tail
root 7431 2 0 23:00 ? 00:00:00 [kworker/u256:1]
root 7439 2 0 23:00 ? 00:00:00 [kworker/1:0]
root 7494 2 0 23:03 ? 00:00:00 [kworker/0:1]
root 7544 2 0 23:08 ? 00:00:00 [kworker/0:2]
root 7605 2 0 23:10 ? 00:00:00 [kworker/1:2]
root 7698 729 0 23:11 ? 00:00:00 sleep 60
nils 7699 2832 0 23:11 pts/0 00:00:00 ./main
nils 7702 1 0 23:11 ? 00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils 7751 3706 0 23:12 pts/1 00:00:00 ps -ef
nils 7752 3706 0 23:12 pts/1 00:00:00 tail
After giving a character to the getchar() function, our main process terminates, but the SNMP daemon with PID 7702 is still here:
[nils@localhost ~]$ ps -ef | tail
postfix 7399 1511 0 22:58 ? 00:00:00 pickup -l -t unix -u
root 7431 2 0 23:00 ? 00:00:00 [kworker/u256:1]
root 7439 2 0 23:00 ? 00:00:00 [kworker/1:0]
root 7494 2 0 23:03 ? 00:00:00 [kworker/0:1]
root 7544 2 0 23:08 ? 00:00:00 [kworker/0:2]
root 7605 2 0 23:10 ? 00:00:00 [kworker/1:2]
root 7698 729 0 23:11 ? 00:00:00 sleep 60
nils 7702 1 0 23:11 ? 00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils 7765 3706 0 23:12 pts/1 00:00:00 ps -ef
nils 7766 3706 0 23:12 pts/1 00:00:00 tail
Conclusion
The fact that we ignored the double-fork mechanism made us think that the kill action did not succeed. But in fact we simply killed the wrong process!
By adding the -f option (do not (double) fork), everything works as expected.
ps -ef | grep firefox
and you can see 3 processes; kill them all:
sudo killall -9 firefox
That should work.
EDIT: [PID] changed to firefox
You can also do a pstree and kill the parent. This makes sure that you get the entire offending process tree and not just the leaf.
