PCAP management solution that takes dynamic traffic into account? - linux

I'm currently trying to find a good solution to a PCAP storage problem I'm encountering. Right now, I have an LVM volume on RHEL that stores PCAPs captured with netsniff. As you can imagine, this drive fills up quickly and somewhat unpredictably, depending on how much traffic flows across my network.
Currently, I'm using an inelegant solution to my problem: a custom shell script that checks the percentage of disk space used, then removes the 100 oldest captures by invoking logrotate. This is set to run every 30 minutes or so.
#!/bin/bash
# Rotate out old captures once disk usage crosses this threshold (percent used).
declare -i ALERT=80

df -H | grep -vE '^Filesystem|tmpfs|udev' | awk '{print $1 " " $5}' | while read -r output;
do
    usage=$(echo "$output" | awk '{print $2}' | cut -d '%' -f1)
    if [ "$usage" -ge "$ALERT" ]; then
        echo "Running out of space: ${usage}% used"
        # logrotate needs its config file as an argument
        logrotate -v /etc/logrotate.conf
    else
        echo "Plenty of space: ${usage}% used"
    fi
done
I was wondering if there is a better solution out there? Something that might take fluctuations in traffic into account and adjust the offloading of PCAPs accordingly.

What about compressing all the pcaps? This would probably save a lot of space.
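If compression is the route taken, a minimal sketch that could run from the same cron job (the capture directory and the one-hour age cutoff here are assumptions, not part of the original setup):

#!/bin/bash
# Hypothetical capture directory; point this at wherever netsniff writes PCAPs.
PCAP_DIR=/var/pcap

# Compress captures older than 60 minutes; gzip replaces each file with file.pcap.gz.
find "$PCAP_DIR" -name '*.pcap' -mmin +60 -exec gzip {} +

Wireshark can open gzip-compressed captures directly, so the files remain usable without an explicit decompression step.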

Related

How can I query the number of the virtual desktop on which the bash script is running in Linux Mint via bash?

Environment:
Linux Mint, Cinnamon desktop manager, with multiple workspaces=virtual desktops, e.g. 4.
Bash script
What is known:
How to determine the number of workspaces:
wmctrl -d | wc -l
What I need:
Get the number of the virtual desktop the bash script is running on into a variable, using pure bash (or something simple like grep, not awk or similar), and echo that variable.
With awk (imho still the most appropriate choice for the task at hand):
nr_of_active_workspace=$(wmctrl -d | awk '/\*/{print $NF}')
echo $nr_of_active_workspace
Or a pure bash solution:
nr_of_active_workspace=$(wmctrl -d | while read -r line; do [[ $line =~ '*' ]] && echo ${line: -1} ; done)
echo $nr_of_active_workspace
You can use POSIX shell features and the xprop(1) command to get both details with no other external utilities.
To get the ID number of the current/active desktop:
curdesk=$(xprop -root -notype _NET_CURRENT_DESKTOP)
curdesk="${curdesk##* }"
To get the count/quantity of desktops defined:
deskcnt=$(xprop -root -notype _NET_NUMBER_OF_DESKTOPS)
deskcnt="${deskcnt##* }"
Both depend on xprop(1) giving the answer in the form "foo = 0" (separated by spaces), and use shell pattern-matching parameter expansion to match the longest substring ending in a space and remove it, leaving only the last token (the value after the equals sign).
Note that desktops are numbered from 0 (zero), so the count will be a number one higher than the ID number of the last desktop.
This should work with any window manager that adheres to the Extended Window Manager Hints (EWMH) specification (which is practically all of them, these days):
https://specifications.freedesktop.org/wm-spec/1.3/ar01s03.html
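Putting the two properties together, a small sketch (the report format is mine) that echoes both values:

#!/bin/sh
# Current desktop ID and total desktop count, via EWMH root-window properties.
curdesk=$(xprop -root -notype _NET_CURRENT_DESKTOP)
curdesk="${curdesk##* }"
deskcnt=$(xprop -root -notype _NET_NUMBER_OF_DESKTOPS)
deskcnt="${deskcnt##* }"

# Desktops are numbered from 0, so print a 1-based position for readability.
echo "On desktop $((curdesk + 1)) of $deskcnt"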
Here is a solution which needs awk:
nr_of_active_workspace=$(wmctrl -d | grep "*" | awk '{print $11}')
echo $nr_of_active_workspace
A solution without awk should also be possible some other way.
Based on KamilCuk's answer, it's possible to extract the last field of the line marking the active desktop as follows (rev reverses each line so cut can take the last space-separated field, and the second rev restores its character order):
nr_of_active_desktop=$(wmctrl -d | grep "*" | rev | cut -d ' ' -f1 | rev)
echo $nr_of_active_desktop

How to execute command when df -h returns 98% full

I have a disk which df -h reports as:
/dev/sdb1 917G 816G 55G 94% /disk1
If it returns 98% full, I would like to do the following:
find . -size +80M -delete
How do I do it? I will run the shell script using cron:
* * * * * sh /root/checkspace.sh
Execute df -h, pipe the command output to grep matching "/dev/sdb1", and process that line by awk, checking to see if the numeric portion of column 5 ($5 in awk terms) is larger than or equal to 98. Don't forget to check for the possibility that it's over 98.
You need to schedule your script, check the disk utilization, and if the utilization is about 98% then delete files.
For scheduling your script you can reference the Wikipedia Cron entry.
There is an example of using the find command to delete files on the Unix & Linux site:
"How to delete directories based on find output?"
For your test, you'll need test constructs and command substitution. Note that you can use backticks with sh, but in bash the $(...) form has superseded backticks for command substitution.
To get your disk utilization you could use:
df | grep -F "/dev/sdb1" | awk '{print $5}' | tr -d %
That's a fixed-string grep to get your specific disk, awk to pull out the 5th column, and tr with the delete flag to get rid of the percent sign.
And your test might look something like this:
if [ `df | grep -F "/dev/sdb1" | awk '{print $5}' | tr -d %` -ge 98 ];
then echo "Insert your specific cleanup command here.";
fi
There are many ways to tackle the issue of course, but hope that helps!
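Putting those pieces together, a minimal checkspace.sh might look like this (the 98% threshold, device, and mount point come from the question; the rest is a sketch, not a hardened script):

#!/bin/sh
# Delete files over 80M under /disk1 once /dev/sdb1 reaches 98% usage.
usage=$(df | grep -F "/dev/sdb1" | awk '{print $5}' | tr -d %)
if [ "$usage" -ge 98 ]; then
    find /disk1 -size +80M -delete
fi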

Simpler way of extracting text from file

I've put together a batch script to generate panoramas using the command line tools used by Hugin. One interesting thing about several of those tools is they allow multi-core usage, but this option has to be flagged within the command.
What I've come up with so far:
#get the last fields of each line in the file, initialize the line counter
results=$(more /proc/cpuinfo | awk '{print ($NF)}')
count=0
#loop through the results till the 12th line for cpu core count
for result in $results; do
    if [ $count == 12 ]; then
        echo "Core Count: $result"
    fi
    count=$((count+1))
done
Is there a simpler way to do this?
result=$(awk 'NR==12{print $NF}' /proc/cpuinfo)
To answer your question about getting the first/last so many lines, you could use head and tail, e.g.:
cat /proc/cpuinfo | awk '{print ($NF)}' | head -12 | tail -1
But instead of searching for the 12th line, how about searching semantically for any line containing cores. For example, some machines may have multiple cores, so you may want to sum the results:
cat /proc/cpuinfo | grep "cores" | awk '{s+=$NF} END {print s}'
count=$(getconf _NPROCESSORS_ONLN)
see getconf(1) and sysconf(3) constants.
According to the Linux manpage, _SC_NPROCESSORS_ONLN "may not be standard". My guess is this requires glibc, or even a Linux system specifically. If that doesn't work, I'd probably look at /sys/class/cpuid (perhaps there's something better?) before parsing /proc/cpuinfo. None of the above are completely portable.
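A hedged combination of the two approaches, falling back to counting processor entries if getconf doesn't know the constant (note this counts logical processors, not physical cores):

# Prefer getconf; fall back to /proc/cpuinfo if the constant is unsupported.
count=$(getconf _NPROCESSORS_ONLN 2>/dev/null || grep -c '^processor' /proc/cpuinfo)
echo "$count"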
There are many ways:
head -n 12 /proc/cpuinfo | tail -1 | awk -F: '{print $2}'
grep 'cpu cores' /proc/cpuinfo | head -1 | awk -F: '{print $2}'
and so on.
But I must note that you take only the information from the first section of /proc/cpuinfo and I am not sure that that is what you need.
And what if cpuinfo changes its format? ;) Maybe something like this will be better:
sed -n 's/cpu cores\s\+:\s\+\(.*\)/\1/p' /proc/cpuinfo | tail -n 1
And make sure to sum the cores. Mine has got like 12 or 16 of them ;)
I'm unsure what you are trying to do, and why what ormaaj said above wouldn't work either. My instinct, based on your description, would have been something much simpler, along the lines of:
grep processor /proc/cpuinfo | wc -l
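As a side note (not from the original thread): with GNU coreutils installed, nproc reports the same thing directly:

# Print the number of processing units available to the current process.
nproc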

Bash script to get server health

I'm looking to monitor some aspects of a farm of servers that are necessary for the application that runs on them.
Basically, I'm looking to have a file on each machine which, when accessed via HTTP (on a VLAN) with curl, will spit out the information I'm looking for, which I can log into the database with a daemon that sits in a loop and checks the health of all the servers one by one.
The info I'm looking to get is:
<load>server load</load>
<free>md0 free space in MB</free>
<total>md0 total space in MB</total>
<processes># of nginx processes</processes>
<time>timestamp</time>
What's the best way of doing that?
EDIT: We are using Cacti and OpenNMS; however, what I'm looking for here is data that is necessary for the application that runs on these servers. I don't want to complicate it by having it rely on any third-party software to fetch this basic data, which can be gotten with a few Linux commands.
Make a cron entry that:
executes a shell script every few minutes (or whatever frequency you want)
saves the output in a directory that's published by the web server
Assuming your text is literally what you want, this will get you 90% of the way there:
#!/usr/bin/env bash
# 1-minute load average (field position depends on uptime's output format)
LOAD=$(uptime | cut -d: -f5 | cut -d, -f1)
# free and total space in MB on the root filesystem
FREE=$(df -m / | tail -1 | awk '{ print $4 }')
TOTAL=$(df -m / | tail -1 | awk '{ print $2 }')
# the [n] trick keeps the grep process itself out of the match
PROCESSES=$(ps aux | grep '[n]ginx' | wc -l)
TIME=$(date)
cat <<-EOF
<load>$LOAD</load>
<free>$FREE</free>
<total>$TOTAL</total>
<processes>$PROCESSES</processes>
<time>$TIME</time>
EOF
Sample output:
<load> 0.05</load>
<free>9988</free>
<total>13845</total>
<processes>6</processes>
<time>Wed Apr 18 22:14:35 CDT 2012</time>
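For completeness, a sketch of the cron entry and the consumer side (the script path, document root, five-minute interval, and hostname are all assumptions):

# Run every 5 minutes, writing into a directory the web server publishes.
*/5 * * * * /usr/local/bin/health.sh > /var/www/html/health.xml

# The monitoring daemon then polls each server over the VLAN:
curl -s http://server01.example.internal/health.xml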

Give the mount point of a path

The following, very non-robust shell code will give the mount point of $path:
(for i in $(df|cut -c 63-99); do case $path in $i*) echo $i;; esac; done) | tail -n 1
Is there a better way to do this in shell?
Postscript
This script is really awful, but has the redeeming quality that it Works On My Systems. Note that several mount points may be prefixes of $path.
Examples
On a Linux system:
cas@txtproof:~$ path=/sys/block/hda1
cas@txtproof:~$ for i in $(df -a|cut -c 57-99); do case $path in $i*) echo $i;; esac; done| tail -1
/sys
On a Mac OS X system:
cas local$ path=/dev/fd/0
cas local$ for i in $(df -a|cut -c 63-99); do case $path in $i*) echo $i;; esac; done| tail -1
/dev
Note the need to vary cut's parameters, because of the way df's output differs; using awk solves this, but even awk is non-portable, given the range of result formatting various implementations of df return.
Answer
It looks like munging tabular output is the only way within the shell, but
df -P "$path" | tail -1 | awk '{ print $NF}'
based on ghostdog74's answer, is a big improvement on what I had. Note two new issues: firstly, df "$path" insists that $path names an existing file, whereas the script I had above doesn't care; secondly, there are no worries about dereferencing symlinks. This doesn't work if you have mount points with spaces in them, which occurs if one has removable media with spaces in their volume names.
It's not difficult to write Python code to do the job properly.
df takes the path as a parameter, so something like this should be fairly robust:
df "$path" | tail -1 | awk '{ print $6 }'
In theory stat will tell you the device the file is on, and there should be some way of mapping the device to a mount point.
For example, on Linux, this should work:
stat -c '%m' "$path"
I've always been a fan of using a program's formatting options, as that can be more robust than manipulating output (e.g. if the mount point has spaces). GNU df allows the following:
df --output=target "$path" | tail -1
Unfortunately there is no option I can see to prevent the printing of a header, so the tail is still required.
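If dropping the header explicitly reads better than keeping the last line, tail can also start at line 2 (same result for a single path):

# tail -n +2 starts output at line 2, i.e. skips df's header row.
df --output=target "$path" | tail -n +2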
I don't know what your desired output is, therefore this is a guess:
#!/bin/bash
path=/home
df | awk -v path="$path" 'NR>1 && $NF~path{
print $NF
}'
Using cut with -c is not really reliable, since the output of df will vary; say a 5% changes to 10% and you will miss some characters. Since the mount point is always the last field, you can use fields and field delimiters. In the above, $NF is the last column, which is the mount point.
I would take the source code to df and find out what it does besides calling stat as Douglas Leeder suggests.
Line-by-line parsing of the df output will cause problems as those lines often look like
/dev/mapper/VOLGROUP00-logical--volume
1234567 1000000 200000 90% /path/to/mountpoint
With the added complexity of parsing those kinds of lines as well, probably calling stat and finding the mountpoint is less complex.
If you want to use only df and awk to find the filesystem device/remote share or the mount point, and they include spaces, you can cheat by defining awk's field separator as a regular expression that matches the format of the numeric size columns (total size, used space, available space, and capacity percentage). With those columns as the field separator, you are left with $1 representing the filesystem device/remote share and $NF representing the mount path.
Take this for example:
[root@testsystem ~] df -P
Filesystem 1024-blocks Used Available Capacity Mounted on
192.168.0.200:/NFS WITH SPACES 11695881728 11186577920 509303808 96% /mnt/MOUNT WITH SPACES
If you attempt to parse this with the quick and dirty awk '{print $1}' or awk '{print $NF}' you'll only get a portion of the filesystem/remote share path and mount path and that's no good. Now make awk use the four numeric data columns as the field separator.
[root@testsystem ~] df -P "/mnt/MOUNT WITH SPACES/path/to/file/filename.txt" | \
awk 'BEGIN {FS="[ ]*[0-9]+%?[ ]+"}; NR==2 {print $1}'
192.168.0.200:/NFS WITH SPACES
[root@testsystem ~] df -P "/mnt/MOUNT WITH SPACES/path/to/file/filename.txt" | \
awk 'BEGIN {FS="[ ]*[0-9]+%?[ ]+"}; NR==2 {print $NF}'
/mnt/MOUNT WITH SPACES
Enjoy :-)
Edit: These commands are based on RHEL/CentOS/Fedora but should work on just about any distribution.
Just had the same problem. If some mount point (or the mounted device) is sufficient, as in my case, you can do:
DEVNO=$(stat -c '%d' /srv/sftp/testconsumer)
MP=$(findmnt -n -f -o TARGET /dev/block/$((DEVNO/2**8)):$((DEVNO&2**8-1)))
(or split the hex DEVNO %D with /dev/block/$((0x${DEVNO:0:${#DEVNO}-2})):$((0x${DEVNO:2:2})))
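(As a worked example of that arithmetic: if stat -c '%d' prints 2065, then 2065/2**8 = 8 and 2065 & 255 = 17, so the device node is /dev/block/8:17.)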
Alternatively, the following loop came to my mind; I am out of ideas as to why I cannot find a proper basic command for this.
TARGETPATH="/srv/sftp/testconsumer"
TARGETPATHTMP=$(readlink -m "$TARGETPATH")
[[ ! -d "$TARGETPATHTMP" ]] && TARGETPATHTMP=$(dirname "$TARGETPATH")
TARGETMOUNT=$(findmnt -d backward -f -n -o TARGET --target "$TARGETPATHTMP")
while [[ -z "$TARGETMOUNT" ]]
do
    TARGETPATHTMP=$(dirname "$TARGETPATHTMP")
    echo "$TARGETPATHTMP"
    TARGETMOUNT=$(findmnt -d backward -f -n -o TARGET --target "$TARGETPATHTMP")
done
This should always work, but it is much more than I would expect for such a simple task.
(Edited to use readlink -m to allow for nonexistent files; readlink -f or -e could be used instead if all but the last component, or all components, must exist.)
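For what it's worth, on reasonably recent util-linux, findmnt can resolve the containing mount point by itself, which would make the loop unnecessary; a minimal sketch:

# findmnt --target walks up the given path to the nearest containing mount point.
mountpoint=$(findmnt -n -o TARGET --target /srv/sftp/testconsumer)
echo "$mountpoint"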
mount | grep "^$path" | awk '{print $3}'
I missed this when I looked over prior questions: Python: Get Mount Point on Windows or Linux, which says that os.path.ismount(path) tells if path is a mount point.
My preference is for a shell solution, but this looks pretty simple.
I use this:
df -h $path | cut -f 1 -d " " | tail -1
Linux has this, which will avoid problem with spaces:
lsblk -no MOUNTPOINT ${device}
Not sure about BSD land.
f () { echo $6; }; f $(df -P "$path" | tail -n 1)
