Why does piping into BASH command groups sometimes work? - linux

I have used the following command for a while to keep headers on ps output.
ps aux | { head -1; grep root; }
The output will look something like the following.
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 142 0.0 0.0 1234567 2520 ?? Ss 3:14AM 0:08.03 /usr/sbin/notifyd
root 55 0.0 0.0 7890123 2460 ?? Ss 3:14AM 0:01.94 /usr/sbin/syslogd
...
However, when used with other command line programs the output is not as expected.
Take the following df example.
df -h
Outputs the following.
Filesystem Size Used Avail Use% Mounted on
/dev/disk1s1 466G 103G 362G 22% /
/dev/disk1s4 466G 1.1G 362G 1% /blah/blah/blah
Using df with the same syntax as the ps example above:
df -h | { head -1; grep disk1; }
Outputs the following.
Filesystem Size Used Avail Use% Mounted on
I expected the output to look essentially the same as plain df -h.
Why does this differ from ps?
I feel that knowing these differences will help me understand how Bash processes pipelines more completely.
Thank you!

It's because head buffers its input. It reads a large block from the pipe into a buffer, then extracts lines from that buffer. After it has printed the first N lines, it exits. Only then does grep start reading from the pipe, and anything head had already read into its buffer is gone: data read from a pipe cannot be pushed back.
The reason it seems to work with ps is that ps produces a lot of output, more than fits into that buffer, so grep gets to process the rest. But if you check carefully, you'll see that the result is incomplete: any matching lines that were sitting in head's buffer are missing.
The output of df is much smaller. It all fits into the buffer that head uses, so there's nothing left for grep to process.
The buffer size depends on the head implementation; it is typically a few kilobytes (for example 4 KiB or 8 KiB).
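If you want to keep the { ...; grep ...; } shape, one workaround is to read the header with the shell's read builtin instead of head. When its input is a pipe, bash and other common shells read one byte at a time, so read stops right after the first newline and leaves the rest of the data for grep. A minimal sketch; the function name keep_header_grep is hypothetical, and the sample lines stand in for a live df -h call.

```shell
# Print the first line of stdin unchanged, then grep the remaining lines.
# The read builtin consumes only the header line, even from a pipe.
keep_header_grep() {
  IFS= read -r header || return 1   # no input at all
  printf '%s\n' "$header"
  grep -- "$1"                      # grep reads whatever read left behind
}

# Sample df -h lines from the question stand in for a live df call.
printf '%s\n' \
  'Filesystem      Size  Used Avail Use% Mounted on' \
  '/dev/disk1s1    466G  103G  362G  22% /' \
  '/dev/disk1s4    466G  1.1G  362G   1% /blah/blah/blah' |
  keep_header_grep disk1
```

With real commands this becomes df -h | keep_header_grep disk1 or ps aux | keep_header_grep root.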
You can do what you want with awk:
df -h | awk 'NR == 1 || /disk1/'
ps aux | awk 'NR == 1 || /root/'
NR is the line number, so this prints the line if it's the first line or it matches the regexp.
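To reuse the awk approach with any pattern, the pattern can be passed in as an awk variable instead of being edited into the program. A sketch with the hypothetical function name filter_with_header; note that awk's -v processes backslash escapes, so a literal backslash in the pattern has to be doubled. Because the whole test is one condition joined with ||, a header line that also matches the pattern is still printed only once.

```shell
# Print line 1 (the header) plus every later line matching the regex in $1.
# $0 ~ re is a dynamic regex match, so the pattern needs no /.../ delimiters.
filter_with_header() {
  awk -v re="$1" 'NR == 1 || $0 ~ re'
}

# Header plus every filesystem whose device name starts with /dev/.
df -h | filter_with_header '^/dev/'
```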

sed(1) can also be used to filter output in this case:
ps aux | sed -n '1p; /root/p'
-n: don't print each input line automatically; only explicit p commands produce output.
1p: the address 1 selects line 1, and p prints the pattern space (the current line).
/root/p: the address /root/ selects lines matching the regular expression "root", and p prints them.
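Alternatively, leave out -n and end the script with d, which deletes each line before it can be printed automatically: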
ps aux | sed '1p; /root/p; d;'
Some systems may require ps -aux, i.e. a dash (-) before the options. Linux and *BSD systems do not (I can't be sure how macOS behaves, as I don't have such a system to test on).

Related

How can I *only* get the number of bytes available on a disk in bash?

df does a great job for an overview. But what if I want to set a variable in a shell script to the number of bytes available on a disk?
Example:
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda 1111111111 2222222 33333333 10% /
tmpfs 44444444 555 66666666 1% /dev/shm
But I just want to return 33333333 (bytes available on /), not the whole df output.
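Note that df reports 1K-blocks by default, so the Available figure is in KiB, not bytes. With GNU df you can get the exact number of bytes using -B1: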
df -B1 /
Filesystem 1B-blocks Used Available Use% Mounted on
/dev/mapper/cl-root 32893632512 13080072192 18119061504 42% /
You can also use awk:
df | awk '$1=="/dev/sda"{print $4}'
Portably:
df -P /dev/sda1 | awk 'NR==2 {print $4}'
The -P option ensures that df will print output in the expected format, and will in particular not break the line after the device name even if it's long. Passing the device name as an argument to df removes any danger from parsing, such as getting information for /dev/sda10 when you're querying /dev/sda1. df -P just prints two lines, the header line (which you ignore) and the one data line where you print the desired column.
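Since the question asks for a shell variable, here is a sketch that wraps the portable df -P form in a function. Assumptions: the name avail_bytes is hypothetical; -k is added so the column is in 1024-byte units (POSIX df -P without -k reports 512-byte units); the resulting byte count is therefore only accurate to 1 KiB.

```shell
# Print the space available to ordinary users, in bytes, on the filesystem
# that contains the path or device given in $1.
avail_bytes() {
  # -P: one line per filesystem in POSIX format; -k: 1024-byte units.
  # Line 1 is the header, line 2 the data; column 4 is "Available".
  kib=$(df -kP -- "$1" | awk 'NR == 2 { print $4 }')
  [ -n "$kib" ] || return 1          # df failed or printed nothing useful
  echo $(( kib * 1024 ))
}

free=$(avail_bytes /)
echo "Available on /: $free bytes"
```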
There is a risk that df will display a device name containing spaces, for example if the volume is mounted by name and the name contains spaces, or for an NFS volume whose remote mount point contains spaces. In this case, there's no fully portable way to parse the output of df. If you're confident that df will display the exact device name you pass to it (this isn't always the case), you can strip it:
df -P -- "$device" | awk -v n="${#device}" 'NR==2 {$0 = substr($0, n+1); print $3}'
Only with GNU coreutils df (e.g. on Linux), which prints just the selected column:
df --output=avail
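Or combine grep and awk (this matches every line containing sda, so it prints several values if the disk has several mounted partitions):
df | grep sda | awk '{print $4}'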
You can query filesystem status with GNU stat as well. To query the number of free blocks on the filesystem mounted at /:
stat -f -c '%f' /
To get the result in bytes instead of blocks, multiply by the block size (%S) using shell arithmetic:
echo $(( $(stat -f -c '%f*%S' /) ))
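%f counts every free block, including blocks reserved for root, so it can be larger than df's Available column. GNU stat also offers %a, the blocks available to non-root users, which corresponds to what df reports as Available. A small sketch comparing the two, assuming GNU coreutils stat; both counts come from a single stat call so they describe the same moment.

```shell
# GNU coreutils stat: -f reports on the filesystem containing the given path.
#   %a = free blocks available to non-root users (df's "Available" column)
#   %f = all free blocks, including any reserved for root
#   %S = fundamental block size, the unit both block counts are expressed in
read -r avail_blocks free_blocks block_size <<EOF
$(stat -f -c '%a %f %S' /)
EOF

usable_bytes=$(( avail_blocks * block_size ))
free_bytes=$(( free_blocks * block_size ))
echo "Available to users: $usable_bytes bytes (free incl. reserved: $free_bytes bytes)"
```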
If you don't know the name of the filesystem and just want the total available space across all mounted filesystems (not a single partition), GNU df's --total option adds a grand-total line:
df -P --total | grep 'total' | awk '{print $4}'

How to execute command when df -h return 98% full

I have a disk with the following df -h line:
/dev/sdb1 917G 816G 55G 94% /disk1
If it reaches 98% full, I would like to run the following:
find . -size +80M -delete
How do I do this? I will run the shell script from cron:
* * * * * sh /root/checkspace.sh
Run df -h, pipe the output to grep to select the "/dev/sdb1" line, and process that line with awk, checking whether the numeric part of column 5 ($5 in awk terms) is greater than or equal to 98. Use "greater than or equal" rather than "equal": by the time the check runs, usage may already be above 98%.
You need to schedule your script, check the disk utilization, and, if it is at or above 98%, delete files.
For scheduling your script you can reference the Wikipedia Cron entry.
There is an example of using the find command to delete files on the Unix & Linux site:
"How to delete directories based on find output?"
For your test, you'll need test constructs and command substitution. Backticks work in both sh and bash, but the $(...) form is preferred in both: it is specified by POSIX and nests more cleanly than backticks.
To get your disk utilization you could use:
df | grep -F "/dev/sdb1" | awk '{print $5}' | tr -d %
That's grep -F (a fixed-string match) to select your specific disk, awk to pull out the 5th column, and tr with the delete flag (-d) to strip the percent sign.
And your test might look something like this:
if [ "$(df | grep -F "/dev/sdb1" | awk '{print $5}' | tr -d %)" -ge 98 ]; then
    echo "Insert your specific cleanup command here."
fi
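Putting the pieces together, here is a sketch of what /root/checkspace.sh could look like. It runs df -P on the mount point /disk1 from the question rather than grepping for the device name, and does nothing when df gives no usable number. The function names usage_percent and over_threshold are hypothetical. It also searches an absolute path, because cron typically starts jobs in the user's home directory, so the question's find . would search /root rather than the full disk. Try it without -delete first.

```shell
#!/bin/sh
# Print the Use% of the filesystem containing $1 as a bare integer (e.g. 94).
# df -P keeps each filesystem on one line, so the data is always on line 2.
usage_percent() {
  df -P -- "$1" 2>/dev/null | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Succeed (exit 0) if the filesystem containing $1 is at least $2 percent full.
over_threshold() {
  usage=$(usage_percent "$1")
  case $usage in
    ''|*[!0-9]*) return 2 ;;   # df failed or printed something unexpected
  esac
  [ "$usage" -ge "$2" ]
}

# Absolute path instead of "."; -type f keeps find from touching directories.
if over_threshold /disk1 98; then
  find /disk1 -type f -size +80M -delete
fi
```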
There are many ways to tackle the issue of course, but hope that helps!

How to get first record of top command in linux?

How can I get the first record of top's output on Linux using the following line of code?
$ top -b|tee aorpprkd004.out| grep 'Cpu(s): | head -1'
The above is not working.
This:
grep 'Cpu(s): | head -1'
Should probably be this:
grep 'Cpu(s):' | head -1
Note the quotes.
First up, you need to move the quotes since you don't want to be searching for the head command in the output. The text you're looking for is simply Cpu(s): with the output filtered through head.
Secondly, batch mode by default runs forever. If you're only going to be getting the first one anyway (as per your head -1 filter), you may as well explicitly limit it with the -n option so that it exits as soon as it has done that:
$ top -b -n1 | tee aorpprkd004.out | grep 'Cpu(s):'
Cpu(s): 2.0% user, 2.5% system, 0.0% nice, 95.5% idle
With a small change this also finds the line, but without -n it keeps running and prints the Cpu(s) line from every iteration:
top -b|tee aorpprkd004.out| grep 'Cpu'
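If you do leave top running without -n, grep -m 1 (supported by GNU and BSD grep) makes grep exit after the first matching line; the commands upstream then get SIGPIPE on their next write and stop too. In this sketch, yes stands in for top -b as a producer that never ends on its own, and the sample Cpu(s) text is made up.

```shell
# `yes` stands in for `top -b`: both keep writing until something stops them.
# grep -m 1 exits after the first match; the producer's next write then fails
# with SIGPIPE, so the whole pipeline terminates.
first_cpu=$(yes 'Cpu(s):  2.0% user,  2.5% system' | grep -F -m 1 'Cpu(s):')
echo "$first_cpu"
```

With the real command that would be top -b | tee aorpprkd004.out | grep -F -m 1 'Cpu(s):', although top -b -n1 remains the simpler choice.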

Unix/Linux extract remaining available space on mount point from df output

I am writing a Perl script and would like to get the remaining KB on a mount point. The command df -k returns more information than I need.
~ df -k /var
Filesystem kbytes used avail capacity Mounted on
/dev/vx/dsk/bootdg/var
8267957 5749576 2435702 71% /var
Is there some way to cut the result with awk to get just the available space, in a way that gives the same result whether I run it on Linux or Unix?
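In Perl (-k asks for 1024-byte blocks; with -P alone, POSIX specifies 512-byte units, although GNU df still uses 1024 unless POSIXLY_CORRECT is set):
my %df = map { $_ = [ split ]; $_->[-1] => $_ } `df -kP`;
print "Free space for /var: $df{'/var'}[3]\n";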
man df:
-P, --portability
use the POSIX output format
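The same lookup-by-mount-point idea as the Perl hash can be sketched in plain shell with awk. Assumptions: the function name avail_kib_for_mount is hypothetical, and mount points containing spaces are not handled, because $NF is only the last whitespace-separated field. As with the Perl hash, if several lines share a mount point, the last one wins.

```shell
# Print the available space (1024-byte blocks) of the filesystem mounted at $1.
# -k: 1024-byte units; -P: one line per filesystem, so $NF is the mount point.
avail_kib_for_mount() {
  df -kP | awk -v mp="$1" '
    NR > 1 && $NF == mp { avail = $4 }   # later lines overwrite earlier ones
    END { if (avail != "") print avail }'
}

avail_kib_for_mount /
```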
Try the following (-k rather than -h, so that the figures are in KB):
df -k | awk '{print $4}'   # $4 should be the free space; change the field number to suit your needs

linux RSS from ps RES from TOP

Linux : RedHat/Fedora
What is the difference between these memory values:
RES from top command
RSS from ps command
If you are talking about the difference between the RES column in top -p $(pidof process) and the RSS column in the output of ps aux | grep $(pidof process), there is no difference: both tools read this value from the kernel's per-process files under /proc/$(pidof process)/ (stat, statm or status, depending on the procps version).
You can always cat /proc/$(pidof process)/status for a human readable format.
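To compare the two sources directly without grepping ps aux output, you can ask ps for a single PID's RSS and read the VmRSS line from /proc. A Linux-only sketch; it uses the shell's own PID ($$) as a stand-in for $(pidof process). Both figures are in KiB, and they can differ slightly because they are sampled at different moments.

```shell
pid=$$   # stand-in for $(pidof process)

# rss= prints only the value, with no header line; strip the padding spaces.
ps_rss=$(ps -o rss= -p "$pid" | tr -d ' ')

# VmRSS in /proc/<pid>/status is reported in kB.
proc_rss=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status")

echo "ps RSS: $ps_rss KiB, /proc VmRSS: $proc_rss KiB"
```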
