Counter loop piped to grep seems unexpectedly random

An experiment to make grep stop a while loop after 5 iterations, so that /tmp/foo should end up only 5 lines long:
n=1
while [ "$n" -le 2000 ]
do
    echo "$n"
    n=$(( n + 1 ))
done | tee /tmp/foo | grep -q '^5'
Check count:
wc -l < /tmp/foo
Outputs:
34
Repeated runs of the above print a different number almost every time, but the result is far from uniformly random: running it 5000 times in bash yields about 1500 runs that print 9, for example, while running it 5000 times in dash yields 157 runs that print 106.
These results seem more interesting than the initial experiment. What's happening in this code?

Pipes are asynchronous. tee is killed by SIGPIPE the first time it tries to write to the pipe after grep has exited and closed the read end, but there is no way to know how many lines tee will manage to write before that happens. It is entirely up to the OS scheduler.
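If the goal is a deterministic 5-line file, a minimal sketch (using seq and head instead of the original loop) bounds the stream on the writer side rather than relying on when grep's exit is noticed:

seq 2000 | head -n 5 | tee /tmp/foo > /dev/null
wc -l < /tmp/foo   # always prints 5: head closes the pipe after exactly 5 lines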


Why does xtrace show piped commands executing out of order?

Here's a simple reproducer:
cat >sample_pipeline.sh << EOF
set -x
head /dev/urandom | awk '{\$2=\$3=""; print \$0}' | column -t | grep 123 | wc -l >/dev/null
EOF
watch -g -d -n 0.1 'bash sample_pipeline.sh 2>&1 | tee -a /tmp/watchout'
wc -l /tmp/watchout
tail /tmp/watchout
As expected, usually the commands execute in the order they are written in:
+ head /dev/urandom
+ awk '{$2=$3=""; print $0}'
+ column -t
+ grep 123
+ wc -l
...but some of the time the order is different, e.g. awk before head:
+ awk '{$2=$3=""; print $0}'
+ head /dev/urandom
+ column -t
+ grep 123
+ wc -l
I can understand if the shell pre-spawns processes waiting for input, but why wouldn't it spawn them in order?
Replacing bash with dash (Ubuntu's default /bin/sh) shows the same behavior.
When you write a pipeline like that, the shell forks a child process for each subcommand. Each child prints the command it is about to execute just before it calls exec. Since each child is an independent process, the OS may schedule them in any order, and various factors (cache misses, thrashing between CPU cores) can delay some children and not others. So the order the messages come out in is unpredictable.
This happens because of how pipelines are implemented. The shell first forks N subshells, and each subshell (when it is scheduled) prints its xtrace output and execs its command. The order of the output is therefore the result of a race.
You can see a similar effect with
for i in {1..5}; do echo "$i" & done
Even though each echo command is spawned in order, the output may still be 3 4 5 2 1.
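To see that the forking itself is sequential, serialize the children; in this sketch each echo is reaped by wait before the next one is forked, so the output always comes out 1 through 5 in order:

for i in {1..5}; do echo "$i" & wait; done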

Shell script + time dependency

I wanted to write a shell script for the following:
I want to check if a service is running: if it is running, exit 1; otherwise, after 5 minutes of it not running, exit -1.
something like:
while (for 5 minutes) {
    if service running, exit 1
}
exit -1  // service is not running even after 5 minutes, so exit -1
I am able to check whether the service is running or not, but I am not able to add the time-constraint part. This is what I attempted:
if (( $(ps -ef | grep -v grep | grep tomcat7 | wc -l) > 0 )); then
    echo "running"
else
    echo "NOT running"
fi
You should use the sleep command (an external utility, not a bash builtin). An excerpt from the man page:
NAME
    sleep - delay for a specified amount of time
DESCRIPTION
    Pause for NUMBER seconds. SUFFIX may be 's' for seconds (the default), 'm' for minutes, 'h' for hours or 'd' for days. Unlike most implementations that require NUMBER be an integer, here NUMBER may be an arbitrary floating point number. Given two or more arguments, pause for the amount of time specified by the sum of their values.
You can put sleep 5m in your script to wait for 5 minutes before taking an action.
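For example, with GNU sleep, which sums multiple arguments:

sleep 2m 30   # pause for 2 minutes plus 30 seconds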
A proper way to build your solution would be:
#!/bin/bash
maxAttempts=0
maxCounter=2   # number of attempts; each failed check adds a 5-minute wait
while [ "$maxAttempts" -lt "$maxCounter" ]; do
    if ps -ef | grep -v grep | grep "tomcat7" > /dev/null
    then
        echo "tomcat7 service running"
        exit 1
    else
        maxAttempts=$((maxAttempts + 1))
        sleep 5m   # wait 5 minutes before checking again
    fi
done
exit -1   # shell exit codes are 0-255, so -1 is actually reported as 255
The if condition works off the exit status of ps -ef | grep -v grep | grep "tomcat7": grep exits with success (0) when it finds a match, so the condition passes exactly when a tomcat7 process exists. The > /dev/null discards grep's standard output so that we work only with the exit code of the command.
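As an aside, here is a sketch of the same check using pgrep (available on most Linux systems), which avoids the grep -v grep dance; polling every 30 seconds keeps the total wait close to the requested 5 minutes:

#!/bin/bash
# Poll every 30 seconds, up to 10 times (about 5 minutes in total).
for attempt in {1..10}; do
    if pgrep -f tomcat7 > /dev/null; then
        echo "tomcat7 service running"
        exit 1
    fi
    sleep 30
done
exit 255   # equivalent to "exit -1": exit codes wrap into the range 0-255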

Run tail -f for a specific time in bash script

I need a script that will run a series of tail -f commands and output them into a file.
What I need is for tail -f to run for a certain amount of time to grep specific words. The reason it's a certain amount of time is because some of these values don't show up right away as this is a live log.
How can I run something like this for, let's say, 20 seconds, write out what grep finds, and then continue on to the next command?
tail -f /example/logs/auditlog | grep test
Thanks
timeout 20 tail -f /example/logs/auditlog | grep test
Here timeout sends tail a SIGTERM after 20 seconds; grep then sees end-of-file on the pipe and exits, so the whole pipeline finishes on its own.
tail -f /example/logs/auditlog | grep test &
pid=$!     # $! is the PID of the last command in the pipeline, i.e. grep
sleep 20
kill $pid  # grep dies now; tail exits on its next write, via SIGPIPE
What about this (tail -f never exits on its own, so each iteration is bounded with timeout):
for (( N=0; N < 20; N++ )) ; do timeout 1 tail -n 0 -f /example/logs/auditlog | grep test ; done
EDIT: I misread your question, sorry. You want something like this:
tail -f /example/logs/auditlog | grep test &
sleep 20
kill %1   # stop the whole background pipeline after 20 seconds
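To run a series of such bounded greps into one file, the same idea chains naturally; a sketch assuming the same log path and a second, hypothetical pattern "error":

{
    timeout 20 tail -f /example/logs/auditlog | grep test
    timeout 20 tail -f /example/logs/auditlog | grep error
} >> /tmp/results.txt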

Bash free command stops working

I am trying to do following:
Get the output of free -mo, take the 2nd line, and log it to a file every 30 seconds.
When I run
$ free -mo -s 30
It runs and displays output every 30 seconds.
But when I run
$ free -mo -s 30 | head -2 | tail -1
It runs only once, and I am not able to figure out what is wrong. The free manual says free -s 30 runs the command every 30 seconds.
head -2 prints only the first 2 lines of its input, then exits. tail -1 prints the last of those lines, then exits. Once head exits, the write end of the pipe is closed, and free is killed by SIGPIPE the next time it tries to write, so it stops after the first iteration.
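It follows that a filter which never exits keeps free running. A minimal sketch, assuming an awk that provides fflush() (gawk and mawk both do), so each Mem: line is flushed to the file as it arrives:

free -mo -s 30 | awk '/^Mem:/ { print; fflush() }' >> test.txt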
Use free -mo -s 30 &> test.txt &
This takes all of the output from the free command (&> is bash's redirection for both stdout and stderr), writes it to test.txt, and runs it in the background.
Try
free -mos 30 | grep 'Mem:' > yourlog.txt
(but you might be better off considering something like sar to capture this kind of data; it can also report lots of other things. Just postpone the filtering/extraction until you generate a report from the data).
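One caveat with the grep variant: when grep writes to a file its output is block-buffered, so the log can lag many lines behind. Assuming GNU grep, --line-buffered flushes each matching line as it is found:

free -mos 30 | grep --line-buffered 'Mem:' >> yourlog.txt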
Will Hartung is right. Instead do this:
while true; do free -mo | head -2 | tail -1; sleep 30; done
Thanks for your answers. I was trying to monitor the memory utilization of a process. I think I got it:
START_TIME=$(date)
cd /data
INPUT_DATA=$1
CODE_FILE=$2
TIMES=$3
echo "$START_TIME" > "$CODE_FILE.freeMemory_$TIMES.log"
# log memory usage every 30 seconds in the background
free -mo -s 30 >> "$CODE_FILE.freeMemory_$TIMES.log" &
freepid=$!
sleep 1m
#echo "PID generated for free command -- $freepid"
START_TIME=$(date)
i=0
while [ "$i" -le "$TIMES" ]
do
    sh runCode.sh "$CODE_FILE" "output.csv" "$INPUT_DATA"
    i=$((i + 1))
done
END_TIME=$(date)
echo "process started at $START_TIME and ended at $END_TIME"
sleep 1m
kill -9 "$freepid"   # stop the background free logger
END_TIME=$(date)
echo "$END_TIME" >> "$CODE_FILE.freeMemory_$TIMES.log"

Start a command, count lines of output after 10 seconds, then either restart it or let it run

I have an interesting situation I am trying to script. I have a program that outputs 26,000 lines after 10 seconds when it starts successfully. Otherwise I have to kill it and start it again. I tried doing something like this:
test $(./long_program | wc -l) -eq 26000 && echo "Started successfully"
but that only works if the program finishes running. Is there a clever way to watch the output stream of a command and make decisions accordingly? I'm at a loss, not quite sure even how to start searching for this. Thanks!
What about
./long_program > mylogfile &
pid=$!
sleep 10
# then test mylogfile's length and kill $pid if needed, e.g.:
if [ "$(wc -l < mylogfile)" -ne 26000 ]; then
    kill $pid   # did not start cleanly, so kill it and try again
fi
count=0
until [ "$count" -eq 26000 ]; do
    # kill any previous instance (killall matches a process name, not a path;
    # on the first pass there is nothing to kill, so errors are suppressed)
    killall longrun 2>/dev/null
    # start in background
    ./longrun > output.$$ &
    sleep 10
    count=$(wc -l < output.$$)
done
echo "done"
# disown so it continues after the current login quits
disown -h
