I have a process that dumps millions of lines to the console while it runs. I'd like to run this in a cronjob but to avoid sending multi-MB mails, I'd like to restrict the output in the case of a success (exit == 0) to 0 lines and in case of an error (exit != 0) to the last 20 lines.
Any ideas to achieve this with little effort? Maybe a few lines of perl or a smart use of standard tools?
Just pipe output to tail, either directly in the crontab or in a wrapper script. e.g.
10 * * * * myprogram 2>&1 | tail -20
That'll always output the last 20 lines, success or not.
If you want no output on success and some on error, you can create a wrapper script that you call from cron e.g.
#!/bin/bash
# tail keeps only the last 20 lines; ${PIPESTATUS[0]} holds myprogram's
# exit status, since plain $? would report tail's status instead
myprogram 2>&1 | tail -20 >/tmp/myprogram.log
if [ "${PIPESTATUS[0]}" != 0 ] ; then
    echo "Failed!"
    cat /tmp/myprogram.log
fi
rm -f /tmp/myprogram.log
Would the tail command be a good fit for what you're trying to do, perhaps combined with tee so the console output is also available in a file?
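A minimal sketch of that tee idea (the log path is made up; adjust to taste): keep the full output on disk, send nothing to stdout on success, and print only the tail on failure:
#!/bin/bash
# keep the full log on disk via tee; nothing reaches stdout (and thus the mail)
myprogram 2>&1 | tee /var/log/myprogram.log >/dev/null
# ${PIPESTATUS[0]} is myprogram's exit status, not tee's
if [ "${PIPESTATUS[0]}" != 0 ]; then
    echo "Failed!"
    tail -20 /var/log/myprogram.log
fi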
I need to repeatedly parse 70 identically formatted files (with different data) to process some information on demand from each file. As a simplified example...
find /dir -name "MYDATA.bam" | while read filename; do
dir=$(echo ${filename} | awk -F"/" '{ print $(NF-1)}')
ARRAY[$dir]=$(samtools view ${filename} | head -1)
done
Since it's 70 files, I wanted each samtools view command to run as an independent thread...so I didn't have to wait for each command to finish (each command takes around 1 second.) Something like...
# $filename will = "/dir/DATA<id#>/MYDATA.bam"
# $dir then = "DATA<id#>" as the ARRAY key.
find /dir -name "MYDATA.bam" | while read filename; do
dir=$(echo ${filename} | awk -F"/" '{ print $(NF-1)}')
command="$(samtools view ${filename} | head -1)"
ARRAY[$dir]=$command &
done
wait # To get the array loaded
(... do stuff with $ARRAY...)
But I can't seem to find the syntax that runs all the commands in the background while still having the array receive the (correct) output.
I'd be running this on a slurm cluster, so I WOULD actually have 70 cores available to run each command independently (theoretically making that step take 1-2 seconds concurrently, instead of 70 seconds consecutively).
You can do this simply with GNU Parallel like this:
#!/bin/bash
doit() {
dir=$(echo "$1" | awk -F"/" '{print $(NF-1)}')
result=$(samtools view "$1" | head -1)
echo "$dir:$result"
}
# export doit() function for subshells of "parallel" to use
export -f doit
# find the files and pass, null-terminated, to GNU Parallel
find somewhere -name "MYDATA.bam" -print0 | parallel -0 doit {}
It will run one copy of samtools per available CPU core, but you can easily change that with, for example, parallel -j 8 if you just want 8 at a time.
If you want the outputs in order, use parallel -k ...
I am not familiar with slurm clusters, so you may have to read up on how to tell GNU Parallel about your nodes, or let it just run 8 at a time or however many cores your main node has.
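If you need the results back in a Bash associative array like the original loop intended, you can read the dir:result lines in the parent shell. A sketch, assuming your directory names contain no colon:
declare -A ARRAY
while IFS=: read -r dir result; do
    ARRAY[$dir]=$result
done < <(find somewhere -name "MYDATA.bam" -print0 | parallel -0 doit {})
Because the loop runs in the current shell (via < <(...) rather than a pipe), the array is still populated after the loop finishes.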
Capturing the output of a process blocks the shell, even when the process is spawned in the background. Here is a small example:
echo "starting to sleep in the background"
sleep 2 &
echo "some printing in the foreground"
wait
echo "done sleeping"
This will produce the following output:
starting to sleep in the background
some printing in the foreground
<2 second wait>
done sleeping
If however you capture like this:
echo "starting to sleep in the background"
output=$(sleep 2 &)
echo "some printing in the foreground"
wait
echo "done sleeping"
The following happens:
starting to sleep in the background
<2 second wait>
some printing in the foreground
done sleeping
The actual waiting happened on the assignment of the output. By the time the wait statement is reached there is no more background process and thus no waiting.
So one way would be to pipe the output into files and stitch them back together
after the wait. This is a bit awkward.
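A sketch of that temp-file approach (the glob follows the question's /dir/DATA<id#>/MYDATA.bam layout; adjust to your paths):
tmpdir=$(mktemp -d)
i=0
for filename in /dir/DATA*/MYDATA.bam; do
    samtools view "$filename" | head -1 > "$tmpdir/$i" &
    i=$((i+1))
done
wait              # block until every background pipeline has finished
cat "$tmpdir"/*   # stitch the pieces back together
rm -r "$tmpdir"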
A simpler solution might be to use GNU Parallel, a tool that deals with
collecting the output of parallel processes. It works particularly well when the output is line based.
You should be able to do this with just Bash. This snippet shows how you can run each command in the background and write the results to stdout. The inner loop reads these results back in and adds them to your array. You'll probably have to tweak this to make it work.
declare -A ARRAY  # must be associative, since the keys are directory names

while read -r dir && read -r data; do
    ARRAY[$dir]="$data"
done < <(
    # subshell level one
    find /dir -name "MYDATA.bam" | while read -r filename; do
        (
            # subshell level two
            # run each task in parallel; output will be in the following format
            # "directory"
            # "result"
            # ...
            dir=$(awk -F"/" '{ print $(NF-1)}' <<< "$filename")
            printf "%s\n%s\n" \
                "$dir" "$(samtools view "$filename" | head -1)"
        ) &
    done
)
The key is that ( command; command ) & runs each group of commands in a new subshell in the background, so the top-level shell can continue to the next task.
The < <(command) construct redirects the output of command to the loop's stdin while the loop itself stays in the current shell. This is how we can read the results into our array and still have it available later.
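To see why this matters, a small self-contained demonstration: variables assigned inside cmd | while vanish with the subshell, while the < <(cmd) form keeps them:
count=0
printf 'a\nb\n' | while read -r _; do count=$((count+1)); done
echo "$count"    # prints 0: the loop ran in a subshell

count=0
while read -r _; do count=$((count+1)); done < <(printf 'a\nb\n')
echo "$count"    # prints 2: the loop ran in the current shell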
I have a program which accepts 2 prompts (y/n). For example:
stopprogram
do you want to stop the program (Y/N)? y
do you want to send an email to the admin about it (Y/N)? y
Now, I'd like to automate that using the 'at' command. The following works on Solaris but not on RHEL Linux:
at now +5 minutes << EOF
for i in {1..2}
do
    echo 'y'
done | stopprogram
EOF
commands will be executed using /usr/bin/bash
...
...
Any idea? Thanks!
A space between << and EOF is harmless in a heredoc, so that is unlikely to be the problem; the difference is more likely the environment of the shell that at uses to run the job.
Note that there is a special program, yes, for repeatedly outputting a line composed of all of its arguments; by default it outputs 'y'. It was created precisely for forcing a scripted flow through prompts like these.
Thus the short version of your command will look like this:
at now +5 minutes <<EOF
yes | stopprogram
EOF
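If you'd rather answer exactly two prompts instead of streaming y forever, head can cap the output (same hypothetical stopprogram as above):
at now +5 minutes <<EOF
yes | head -n 2 | stopprogram
EOF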
I found the solution. This will work:
at now+5 min <<EOF
bash -l -c 'yes | stopprogram'
EOF
That's it!
I use screen to run a minecraft server .jar, and I would like to write a bash script to see if the most recent line has changed every five minutes or so. If it has, then the script would start from the beginning and make the check again in another five minutes. If not, it should kill the java process.
How would I go about getting the last line of text from a screen via a bash script?
If I understand correctly, you can redirect your program's output to a file with the > operator and work on that file.
Try running:
ls -l > myoutput.txt
and open the file it creates.
You want to use the tail command. tail -n 1 will give you the last line of the file or redirected standard output, while tail -f will keep the tail program going until you cancel it yourself.
For example:
echo -e "Jello\nPudding\nSkittles" | tail -n 1 | if grep -q Skittles ; then echo yes; fi
The first section simply prints three lines of text:
Jello
Pudding
Skittles
The tail -n 1 finds the last line of text ("Skittles") and passes that to the next section.
grep -q simply returns TRUE if your pattern was found or FALSE if not, without actually dumping or outputting anything to screen.
So the if grep -q Skittles section checks the result of that grep and, if Skittles was found, prints 'yes' to the screen. If not, nothing gets printed (try replacing Skittles with Pudding: even though it is in the original input, it never makes it out the other end of the tail -n 1 call).
Maybe you can use that logic: send your .jar's output to standard output, then check that output every 5 minutes?
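For instance, a rough sketch of that five-minute check, assuming the server's console is also written to a log file (both the logs/latest.log path and the pkill pattern are assumptions; adjust them to your setup):
#!/bin/bash
LOG=logs/latest.log   # assumed log path; point this at your server's log
prev=""
while true; do
    last=$(tail -n 1 "$LOG")
    if [ -n "$prev" ] && [ "$last" = "$prev" ]; then
        # no new output since the last check: assume the server has hung
        pkill -f 'minecraft.*\.jar'   # the match pattern is an assumption
        break
    fi
    prev=$last
    sleep 300   # five minutes
done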
I am monitoring the new files created in a folder in Linux. Every now and then I issue an "ls -ltr" in it. But I wish there were a program/script that would automatically print only the latest entries. I wrote a short while loop to list them, but it repeated entries that were not new, and it kept my screen scrolling when there were no new files. I've learned about "watch", which does show what I want and refreshes every N seconds, but I don't want an ncurses interface; I'm looking for something like tail:
continuous
shows only the new stuff
prints in my terminal, so I can run it in the background and do other things and see the output every now and then getting mixed with whatever I'm doing :D
Summarizing: get the input, compare to a previous input, output only what is new.
Something that does that doesn't sound like such an odd tool; I can see it being used in other situations too, so I would expect it to already exist, but I couldn't find anything. Suggestions?
You can use the very handy command watch:
watch -n 10 "ls -ltr"
And you will get an ls listing every 10 seconds.
If you add a tail -10, you will only get the 10 newest entries:
watch -n 10 "ls -ltr|tail -10"
If you have access to inotifywait (available from the inotify-tools package if you are on Debian/Ubuntu) you could write a script like this:
#!/bin/bash
WATCH=/tmp
# -m keeps inotifywait running; %f prints just the created file's name
inotifywait -q -m -e create --format %f "$WATCH" | while read -r event
do
    ls -ltr "$WATCH/$event"
done
This is a one-liner that won't give you the same information that ls does, but it will print out the filename:
inotifywait -q -m -e create --format %w%f /some/directory
This works in Cygwin and Linux. Some of the previous solutions, which write a file, will cause the disk to thrash.
This script does not have that problem:
# poll a checksum of the directory listing; print the newest entry on change
SIG=""
while true ; do
    NEW=$(ls -1 | md5sum | cut -c1-32)
    if [ "$NEW" != "$SIG" ] ; then
        SIG=$NEW
        ls -lrt | tail -n 1
    fi
    sleep 10
done
I have a program which will be running continuously writing output to stdout, and I would like to continuously take the last set of output from it and write this to a file. Each time a new set of output comes along, I'd like to overwrite the same file.
An example would be the following command:
iostat -xkd 5
Every 5 seconds this prints some output, with a blank line after each one. I'd like the last "set" to be written to a file, and I think this should be achievable with something simple.
So far I've tried using xargs, which I could only get to do this with a single line, rather than a group of lines delimited by something.
I think it might be possible with awk, but I can't figure out how to get it to either buffer the data as it goes along so it can write it out using system, or get it to close and reopen the same output file.
EDIT: To clarify, this is a single command which will run continuously. I do NOT want to start a new command and read the output of that.
SOLUTION
I adapted the answer from #Bittrance to get the following:
iostat -xkd 5 | (while read -r r; do
    if [ -z "$r" ]; then
        mv -f /tmp/foo.out.tmp /tmp/foo.out
    else
        echo "$r" >> /tmp/foo.out.tmp
    fi
done)
This was basically the same, apart from detecting the end of a "section" and writing to a temp file so that whenever an external process tried to read the output file it would be complete.
Partial answer:
./your_command | (while read -r r ; do
    if ... ; then
        rm -f foo.out
    fi
    echo "$r" >> foo.out
done)
If there is a condition (the ...) such that you know that you are receiving the first line of the "set" this would work.
Why not:
while :; do
    iostat -xkd > FILE
    sleep 5
done
(Note that without an interval argument, iostat reports averages since boot rather than 5-second samples, and the question's edit rules out restarting the command.)
If you're set on using awk, the following writes each output block from iostat to a numbered file (a multi-character RS like this requires gawk):
iostat -xkd 5 | awk -v RS=$'\n\n' '{ print >NR; close(NR) }'
Note: the first record is the header that iostat outputs.
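If you would rather keep overwriting one file, closer to what the question asks, a variant along the same lines (again gawk; the output path is made up):
iostat -xkd 5 | gawk -v RS='\n\n' '{ print > "/tmp/iostat.out"; close("/tmp/iostat.out") }'
Because close() is called after each record, the next print reopens and truncates the file, so it always holds just the latest block.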
Edit: in reply to a comment, I tested this Perl solution and it works:
#!/usr/bin/perl
use strict;
use warnings;
$|=1;  # unbuffered output, so the file is always up to date
# reopen STDOUT onto the snapshot file
open(STDOUT, '>', '/tmp/iostat.running') or die $!;
while(<>)
{
    # each blank line marks the start of a new block: rewind and overwrite
    seek (STDOUT, 0, 0) if (m/^$/);
    print
}
So now you can
iostat -xkd 5 | ./myscript.pl &
And watch the snapshots:
watch -n10 cat /tmp/iostat.running
If you want to make sure that enough space is overwritten at the end (since the output length might vary slightly from block to block), you could print some padding before rewinding, e.g.:
print "***" x 40 and seek (STDOUT, 0, 0) if (m/^$/);