How to send process progress information from a long-running Linux command

I am planning to write a Linux command that will process (with custom logic) a very big file. Once this command is run, it will take hours to finish the task.
What is the best way to give ETA output?
What about the idea of writing a 100-byte progress status file? At specific stages I can write bytes to the file: say at 40% I write 40 bytes. Once the file reaches 100 bytes, the process is finished, so to monitor progress we only have to check the size of this file.
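For concreteness, a minimal sketch of that idea (the file name, the helper name and the stage percentages are made up for illustration; stat -c %s assumes GNU coreutils):
progress_file=/tmp/myjob.progress
: > "$progress_file"                      # truncate at start
mark_progress() {                         # pad the file up to $1 bytes
    local pct=$1
    local size=$(stat -c %s "$progress_file")
    head -c $((pct - size)) /dev/zero >> "$progress_file"
}
# ... do 40% of the work ...
mark_progress 40
# ... do the rest ...
mark_progress 100
A monitor then only needs stat -c %s /tmp/myjob.progress to read the percentage.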
Is there any general way of handling such progress information? I don't want to include too much logic in my program just for displaying progress; I am looking for a simple solution.
Any suggestions?

The normal way is to set up a signal handler, like dd does: you send the process a SIGUSR1 or suchlike using the kill command, and it catches the signal and outputs a status message. Try
man dd
So, on my iMac, for example, dd uses SIGINFO, so
# Move a whole load of nothing to nowhere in the background
dd if=/dev/zero of=/dev/null &
[1] 11232
# No output, till I want some a few seconds later
kill -SIGINFO 11232
12875835+0 records in
12875834+0 records out
6592427520 bytes transferred in 9.380158 secs (702805581 bytes/sec)
# No more output, till I want some a few more seconds later
kill -SIGINFO 11232
19163866+0 records in
19163865+0 records out
9811898880 bytes transferred in 14.015095 secs (700095068 bytes/sec)
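(On Linux, GNU dd reports on SIGUSR1 instead, so there you would use kill -USR1 11232.)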
A signal handler is pretty easy to set up, even in a shell script. For example:
#!/bin/bash
# Print a progress message whenever SIGINT arrives
trap "echo Progress report..." SIGINT
echo "My process id is $$"
while : ; do
    sleep 10 # Do something - admittedly little :-)
done
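Run that, note the PID it prints, then poke it from another terminal (12345 stands in for whatever PID your copy printed):
kill -INT 12345
Each signal fires the trap and prints the progress line; the loop carries on afterwards.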

I'm afraid you can't do this nicely in Linux, but if your program's result can be an output stream, as with unzip, then you can simply use pv, the Pipe Viewer command. Redirect the output through pv and it will show you the progress.
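For example, assuming your program reads the big file on stdin (verybigfile and mycommand are placeholders):
pv verybigfile | ./mycommand > results
By default pv shows a progress bar, throughput, elapsed time and an ETA, because it knows the size of the file it is feeding in.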
In Windows you can use Taskbar Extensions API with methods SetProgressState and SetProgressValue.

Related

Resource Linux script, disable/flush extra input

Good afternoon everyone,
I have created a resource monitoring tool that works fairly well.
Pulls CPU average usage
Pulls average Memory usage
Even calculates % of NIC throughput (if you have a 1Gb NIC, it'll show the percentage being used at a time)... and yes, I know this is more of a rough estimate against the theoretical max of a NIC.
I am having one issue with my script, though. The portion of the code I am experiencing the issue with is below (I converted some of it to pseudocode for simplicity).
COUNT=1
read -rsp "When you are ready to begin, please press any key" -n1
echo "processing"
sleep 3
while [ "${COUNT}" = "1" ]; do
    # If any key is pressed within the 1-second window, stop recording
    read -t 1 -n 1
    if [ $? -eq 0 ]; then
        exit 0
    else
        `Resource command` > ${cpulog}    # file for future graphs
        `Resource command` > ${memlog}    # file for future graphs
        `Resource command` > ${network}   # log file for future graphs
        `etc`
    fi
done
Basically, you hit any key to start the program, and whenever you press any key on the keyboard after the program has started (While loop), the program stops recording information and moves on.
Now this script works and does everything I need it to do. The issue I have come across is when you "press any key".
Note that there are two points in the script waiting for a key press.
If I press a key more than once at the first point, the extra key press gets consumed by my read -t 1 -n 1 command at the second point, which then succeeds and exits before any of my resource pulls have run. Since that happens immediately, the script fails.
Basically, I am trying to figure out whether I can shut down input for a limited time after that first key stroke while I retrieve a limited amount of data, or flush any input that was given before reaching my read -t 1 command. Thank you.
The script can call a small program, written in C, Perl or similar, which calls the FIONREAD ioctl on stdin.
Then the unexpected extra characters can be read and thrown away with a read call; see the Perl Cookbook recipe on determining unread bytes.
You can even enter the Perl code on the command line with perl -e, to keep it all within the bash script.
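Alternatively, to stay in pure bash (4.0 or later) rather than shelling out to Perl, read -t 0 reports whether input is pending without consuming anything, so you can drain the queue with no one-second penalty (note this polls bash's input rather than using FIONREAD):
# Consume and discard any queued key presses
while read -r -t 0; do read -r -n 1; done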
At the "processing line," add the following loop:
# Eat any remaining input
while read -t 1 -n 1
do
# Do nothing here
:
done
# Continue processing now that all input has been consumed...
It will add about 1 second delay to startup (more if the user is sitting there pressing keys), but otherwise does what you want.

Bash: How to record highest memory/cpu consumption during execution of a bash script?

I have a function in a bash script that executes a long process called runBatch. Basically runBatch takes a file as an argument and loads the contents into a db. (runBatch is just a wrapper function for a database command that loads the content of the file)
My function has a loop that looks something like the one below, where I currently record the start time and the elapsed time of the process in variables.
for batchFile in "$batchFilesDir"/*
do
    echo "Batch file is $batchFile"
    START_TIME=$(($(date +%s%N)/1000000))   # milliseconds
    runBatch "$batchFile"
    ELAPSED_TIME=$(($(($(date +%s%N)/1000000))-START_TIME))
    IN_SECONDS=$(awk "BEGIN {printf \"%.2f\",${ELAPSED_TIME}/1000}")
done
Then I am writing some information on each batch (such as time, etc.) to a table in a html page I am generating.
How would I go about recording the highest memory/cpu usage while the runBatch is running, along with the time, etc?
Any help appreciated.
Edit: I managed to get this done. I added a wrapper script around this script that runs it in the background. I pass its PID (via $!) to another script in the wrapper that monitors the process's CPU and memory usage with top every second. I compile everything into an html page at the end, once the PID is no longer alive. Cheers for the pointers.
You should be able to get the PID of the process using $!,
runBatch $batchFile &
myPID=$!
and then you can run top -b -p $myPID to print out a ticking summary of CPU usage.
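To record the peak rather than just watch it tick, you can poll ps(1) in a loop. A rough sketch - runBatch and batchFile are from the question, maxRss is our own name, and ps reports rss in KiB:
runBatch "$batchFile" &
myPID=$!
maxRss=0
while kill -0 "$myPID" 2>/dev/null; do
    rss=$(ps -o rss= -p "$myPID" 2>/dev/null)    # resident set size, KiB
    [ -n "$rss" ] && (( rss > maxRss )) && maxRss=$rss
    sleep 1
done
echo "Peak memory during runBatch: ${maxRss} KiB"
One-second sampling can miss short spikes between samples, but it matches what the top-based approach above does.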
Memory:
cat /proc/meminfo
Then grep for whatever you want.
CPU is more complicated - see "/proc/stat explained".
Average load:
cat /proc/loadavg
For timing runBatch, use the time builtin:
time runBatch
as in, for example:
time sleep 10
Once you've got the pid of your process (e.g. as answered here) you can use watch(1) with cat(1) or grep(1) on the proc(5) file system, e.g.
watch cat /proc/$myPID/stat
(or use /proc/$myPID/status or /proc/$myPID/statm, or /proc/$myPID/maps for the address space, etc...)
BTW, to run batch jobs you should consider batch (and you might look into crontab(5) to run things periodically)
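(batch reads commands from standard input and runs them when the system load permits, e.g. echo './runAllBatches.sh' | batch - the script name here is only an example.)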

When executing "tar" on a directory with over a billion files, the process stayed in D status

I was doing some experiments to learn more about Linux process states.
So, there's a directory (named big_dir) with over a billion files in it (the directory has many sub-directories, recursively), and I ran tar -cv big_dir | ssh anotherServer "tar -xv -C big_dir" and found, by running top, that the tar process stayed in D status. Meanwhile, the tar command kept outputting the paths of the files.
I know the process was in D status because it was doing disk I/O, but why didn't its status keep switching between D and R? Printing the file names under the directory must have used some CPU, mustn't it? Otherwise how could the tar command know that it should print something?
If I run dd if=/dev/zero of=/dev/null, the dd process stays in R status in the top output. But why isn't it in D status? Isn't it doing I/O all the time?
/dev/zero and /dev/null are pseudo-devices. So there's no physical device behind them.
If I do
dd if=/dev/zero of=/tmp/zeroes
then top does show me dd in the D status. However, it spends a lot of its time in R (using CPU). top simply samples the process table, so you may need to watch for a while to catch transient states.
I suspect that in your tar example the time spent writing to stdout is negligible compared to the disk time. Note also that output to stdout involves the windowing system doing the writing, and while that happens the process sleeps. E.g. I'm running yes right now, and the majority of the work is being performed by my X server; the yes process is sleeping most of the time I watch it (via top).
I'm sure your tar process SOMETIMES goes into R, but probably only for very short periods, because it doesn't do that much - particularly since you are sending the data over a network. Unless that's a 10Gb/s network card [and everything else to "anotherServer" is really working at 1GB/s], this will be the slowest link in the chain. ssh itself adds a little overhead as it encrypts the data.
It probably takes tar a few microseconds to ask for some data from the disk, and a few milliseconds for the disk to move its head and read the actual data. So you spend about 0.1% of the time in "R" and the rest in "D".
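If you want to catch those brief R periods yourself, you can sample the state letter straight from /proc faster than top refreshes. A sketch - 12345 stands in for your tar PID, and reading field 3 this way assumes the process name contains no spaces:
pid=12345
while [ -d /proc/$pid ]; do
    awk '{print $3}' /proc/$pid/stat   # third field is the state letter (R, S, D, ...)
    sleep 0.1
done | sort | uniq -c                  # rough histogram of observed states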

Logging VMStat data to file

I am trying to create some capacity planning reports, and one of the requirements is to have info on memory usage for a few Unix servers.
Now, my knowledge of Unix is very limited. I usually just log on and run a few scripts.
But for this report I need to gather vmstat data and produce reports based on the previous week's data, broken down by hour, as an average of vmstat samples taken every 10 seconds.
So, first question: is vmstat logging enabled by default, and if so, where on the server is the data written?
If not how can I set this up?
Thanks
vmstat is a command that you run.
One week of virtual memory stats at ten-second intervals is 7 × 24 × 3600 / 10 = 60,480 samples, or 60,479 if you drop the last one.
So the command you want is:
nohup vmstat 10 60479 > myvmstatfile.dat &
This will make a very big file, myvmstatfile.dat.
EDIT (RobKielty): The & puts this job in the background, and nohup prevents the task from being killed by the hangup signal when you log out of the command shell. If you run this command it would be prudent to monitor the disk partition the file is being written to; use df -h /path/to/directory/where/outputfile/resides to monitor disk space usage.
I have no idea what you need to do with the data, so I can't help you there.
Create a crontab entry (crontab -e) like this:
0 0 * * 0 /path/to/my/vmstat_script.sh
The file vmstat_script.sh contains the following bash commands:
#!/bin/bash
# vmstat_script.sh - log one week of ten-second vmstat samples, then date-stamp the file
vmstat 10 60479 > myvmstatfile.dat
mv myvmstatfile.dat myvmstatfile.dat.`date +%Y-%m-%d`
This will create one file per week with a name like myvmstatfile.dat.2012-07-01
The command I use for monitoring Linux vm metrics is below:
nohup vmstat 10 720 | (while read -r line; do echo "$(date '+%d-%m-%Y %H:%M:%S') $line"; done) >> nameofLogfile.log &
Here nohup keeps the process running after you log out, and the trailing & puts it in the background.
It will run for 2 hours at a 10-second interval.
This works well for generating graphs and reports, since each line carries a timestamp alongside the metrics, so the logs can be filtered accordingly.

Time taken by `less` command to show output

I have a script that produces a lot of output. The script pauses for a few seconds at point T.
Now I am using the less command to analyze the output of the script.
So I execute ./script | less. I leave it running long enough that the script must have finished executing.
Now I go through the output of less by pressing the Page Down key. Surprisingly, while scrolling at point T of the output I notice the same pause of a few seconds.
The script does not expect any input and would definitely have completed by the time I start analyzing the output in less.
Can someone explain how a pause of a few seconds is noticeable in the output of less when the script has already finished executing?
Your script communicates with less via a pipe. A pipe is an in-memory stream of bytes connecting two endpoints, your script and the less program: the former writes to it, the latter reads from it.
Because pipes are in-memory, it would be bad if they could grow arbitrarily large. So, by default, there is a limit on how much data can sit inside the pipe (written but not yet read) at any given moment; on Linux it's 64k. If the pipe is full and your script tries to write to it, the write blocks. So your script isn't actually working; it has stopped partway through a write() call.
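You can watch this blocking happen (status=none assumes GNU dd; the sizes are illustrative): dd tries to push 128 KiB into a pipe that nobody reads, fills the roughly 64 KiB buffer, and then sits blocked in write() until sleep exits and the pipe is torn down:
dd if=/dev/zero bs=1k count=128 status=none | sleep 10
The pipeline takes the full 10 seconds, even though dd on its own would finish instantly.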
How to overcome this? Adjusting the defaults is a bad option; what is used instead is a buffer in the reader, which reads input into the buffer, freeing the pipe and letting the writing program continue, while showing you (or handling) only part of the output. less has such a buffer and, by default, expands it automatically. However, it doesn't fill it in the background; it only fills it as you read the input.
So what would solve your problem is reading the file until the end (like you would normally press G), and then going back to the beginning (like you would normally press g). The thing is that you may specify these commands via command line like this:
./script | less +Gg
You should note, however, that you will have to wait until the whole of the script's output has been loaded into memory, so you won't be able to view it immediately. less is not sophisticated enough for that. But if that's what you really need (browsing the beginning of the output while ./script is still computing its end), you might want to use a temporary file:
./script >x & less x ; rm x
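As a refinement, less +F x follows the file as it grows, like tail -f; press Ctrl-C to stop following and scroll normally.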
The pipe is full at the OS level, so script blocks until less consumes some of it.
Flow control. Your script is effectively being paused while less is paging.
If you want to make sure that your command completes before you use less interactively, invoke less as less +G and it will read to the end of the input; you can then return to the start by typing 1G into less.
For some background information there's also a nice article by Alexander Sandler called "How less processes its input": http://www.alexonlinux.com/how-less-processes-its-input
Can I externally enforce line buffering on the script?
Is there an off-the-shelf pseudo-tty utility I could use?
You may try using the script command to turn on line-buffered output mode.
script -q /dev/null ./script | less # FreeBSD, Mac OS X
script -c "./script" /dev/null | less # Linux
For more alternatives in this respect please see: Turn off buffering in pipe.
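Another option from GNU coreutils is stdbuf, which works for programs that buffer through C stdio:
stdbuf -oL ./script | less
Here -oL forces the script's stdout into line-buffered mode.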
