using wget to download log file every 5 mins & detect changes - linux

I am writing a bash script to accomplish the following:
1. script runs wget every five minutes to download a small log from a static url.
2. script uses diff to see if there are any new entries made to the log file (new entries are made at the end of the log file).
3. if new log entries are found - extract the new entries to a new file, format them properly, send me an alert, return to #1.
4. if no new log entries are found, go back to #1.
wget "https://url-to-logs.org" -O new_log
if diff -q new_log old_log; then
    echo "no new log entries to send."
else
    echo "new log entries found, sending alert."
    diff -u new_log old_log > new_entries
    # some logic I have to take the output of "new_entries", properly format the text and send the alert.
    rm -rf old_log new_entries
    cp new_log old_log
    rm -rf new_log
fi
There is one additional thing - every night at midnight the server hosting the logs deletes all entries and displays a blank file until new log entries are made for the new day.
I guess I could always run a cron job at midnight to run "rm -rf" and "touch" the old_log file, but I am curious if an easier way to do this exists.
Thanks in advance for any/all input and help.

If your logs are not rotating - i.e. the old log is guaranteed to be a prefix of the new log - you can just use tail to get the new suffix, something like this:
tail -n +$(( $(wc -l < old_log) + 1 )) new_log > new_entries
If there are no new lines in new_log, the new_entries file will be empty, which you can check using stat or some other way.
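For example, test's -s flag succeeds only if a file exists and is non-empty (send_alert below is a stand-in for whatever alerting you use):
if [ -s new_entries ]; then
    send_alert < new_entries
fi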
If your logs are rotating, you should first use grep to check if the last line from the old log exists in the new log, and if not - assume the entire new log is new (a fixed-string, whole-line match is safest here, since log lines may contain regex metacharacters):
if ! grep -qxF "$(tail -n1 old_log)" new_log; then cat new_log > new_entries; fi
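Putting the pieces together, here is a minimal sketch of the whole polling loop, including the midnight reset: a freshly truncated log fails the last-line check, so everything in it is treated as new and no separate midnight cron job is needed. The URL is the one from the question; send_alert is again a placeholder for your own formatting/alert logic:
#!/bin/bash
[ -f old_log ] || : > old_log
while sleep 300; do
    wget -q "https://url-to-logs.org" -O new_log || continue
    if [ -s old_log ] && grep -qxF "$(tail -n1 old_log)" new_log; then
        # old_log is still a prefix of new_log: keep only the new suffix
        tail -n +$(( $(wc -l < old_log) + 1 )) new_log > new_entries
    else
        # log was truncated (e.g. the midnight reset), so everything is new
        cat new_log > new_entries
    fi
    if [ -s new_entries ]; then
        send_alert < new_entries   # placeholder: format and send the alert
    fi
    mv new_log old_log
done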

Related

Unable to output script results with column/table formatting

Answered - previously titled 'Cron job for shell script not running'
I recently downloaded Speedtest onto my Raspberry Pi and wrote a script to output the results, in CSV format, to a file.
I'm trying to do this regularly via a cron job, but for some reason, it won't execute the shell script as intended.
Here's the script below. I've commented/cut out a lot to try and find the issue:
#!/bin/bash
# Commented out if statement detects presence of data file and creates one if it doesn't exist. Was going to adjust later to include variables/input options if I wanted to use the script on alternate systems, but commented out while working on the main issue.
file='/home/User/Documents/speedtestdata.csv'
# have tried this with and without quotes, does not seem to make a difference either way
#HEADERS='/usr/bin/speedtest-cli --csv-header'
SPEEDTEST='/usr/bin/speedtest-cli --csv'
# Used absolute path for the executable
#LOG=/home/User/scripts/testreclog.txt
#DATE=$( date )
# Was using the above to log steps of script running successfully with timestamp, commented out
#if [ ! -f $file ]
#then
# echo "Creating results file">>$LOG
# touch $file
# $HEADERS > $file
#fi
#echo "Running speedtest">>$LOG
$SPEEDTEST >> $file
#echo "Formatting results">>$LOG
#column -s, -t < $file
# this step was used to format the log file neatly
#echo "Time completed ",$DATE>>$LOG
And here's how the crontab currently looks
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h dom mon dow command
*/5 * * * * /bin/bash /home/User/scripts/testandrec.sh
# 2> /home/User/scripts/testrecerror.txt
# Was attempting to log errors to this file, nothing seen so commented out on a newline.
#* * * * * /home/User/scripts/testscript.sh test to verify cron works (it does)
I've added my scripts folder to the end of my PATH, but for some reason this only shows up when I'm using the Pi directly; when I SSH in, the scripts folder is missing from the end.
However, given that I've used absolute paths for everything, I'm not sure why this would be an issue.
First I tested whether a simple cron job would work, so I created testscript.sh, which simply writes 'Test' and a timestamp to a specific file; it uses the same shebang and absolute paths, and it functioned as intended.
I have checked systemctl for Cron, restarted Cron with sudo service cron restart and made sure a new line is in place in the crontab.
I have tried with and without /bin/bash in the cron tab entry, it seemingly hasn't made a difference.
I tried cd /home/User/scripts && ./testandrec.sh but no luck.
I changed the run time to every 5 then every 10 minutes, which has not worked.
I have noticed that when I ran the script manually with column -s, -t < $file left in, the results file is formatted as intended when I cat it.
However, the next time the cron job should run, this reverts to CSV with a , as a delimiter, so clearly something is running.
To confuse matters further, I think the script may be firing once after restarting cron, and then not working when it should run subsequently. When I leave the column line in, this appears to just revert the formatting, but if I comment it out it appears to run a speed test and append the results, but only once. However, I may be wrong on this, as I have had trouble reproducing it.
If I instead try 0 * * * * /usr/bin/speedtest-cli --csv >> /home/User/Documents/speedtestdata.csv && column -s, -t < /home/User/Documents/speedtestdata.csv, it appeared to perform the speedtest and append the results, but it did not action the column command.
I would much rather tie the process up neatly in a shell script, though, rather than use the above, which isn't very DRY.
I've looked extensively, but none of the solutions I've found on this site or others have fixed the issue.
Any troubleshooting suggestions/help would be greatly appreciated.
Here you go - the solution is simple:
#!/bin/bash
# Commented out if statement detects presence of data file and creates one if it doesn't exist. Was going to adjust later to include variables/input options if I wanted to use the script on alternate systems, but commented out while working on the main issue.
file='/home/User/Documents/speedtestdata.csv'
# have tried this with and without quotes, does not seem to make a difference either way
#HEADERS='/usr/bin/speedtest-cli --csv-header'
SPEEDTEST='/usr/bin/speedtest-cli --csv'
# Used absolute path for the executable
#LOG=/home/User/scripts/testreclog.txt
#DATE=$( date )
# Was using the above to log steps of script running successfully with timestamp, commented out
#if [ ! -f $file ]
#then
# echo "Creating results file">>$LOG
# touch $file
# $HEADERS > $file
#fi
#echo "Running speedtest">>$LOG
$SPEEDTEST | column -s, -t >> $file
Just check the last line ;)
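Note that once the output is piped through column, the stored file is no longer comma-separated, and rows appended by later runs may not line up with earlier ones. An alternative sketch, using the same paths as above, keeps the stored file as raw CSV and formats it only when you view it:
/usr/bin/speedtest-cli --csv >> /home/User/Documents/speedtestdata.csv
column -s, -t < /home/User/Documents/speedtestdata.csv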

Bash script deletes files older than N days using lftp - but does not remove recursive directories and files

I have finally got this script working: it logs on to my remote FTP and removes files in a folder that are older than N days. I cannot, however, get it to remove files and directories recursively. What can be changed or added to make this script remove files in subfolders, as well as subfolders that are also older than N days? I have tried adding the -r flag in a few places, but it did not work. I think it needs to be added where the script builds the list of files to be removed. Any help would be greatly appreciated. Thank you in advance!
#!/bin/bash
# Simple script to delete files older than a specific number of days from FTP.
# This script uses 'lftp', and 'date' with the '-d' option, which is not POSIX compatible.
# FTP credentials and path
FTP_HOST="xxxxxxxxxxxx"
FTP_USER="xxxxxx"
FTP_PASS="xxxxxxxxxxxxxxxxx"
FTP_PATH="/directadmin"
# Full path to lftp executable
LFTP=`which lftp`
# Read the number of days to keep from the first argument, or hardcode it; uncomment one to use
STORE_DAYS=${1:? "Usage ${0##*/} X, where X - count of daily archives to store"}
# STORE_DAYS=7
function removeOlderThanDays() {
# Make some temp files to store intermediate data
LIST=`mktemp`
DELLIST=`mktemp`
# Connect to ftp get file list and store it into temp file
${LFTP} << EOF
open ${FTP_USER}:${FTP_PASS}@${FTP_HOST}
cd ${FTP_PATH}
cache flush
cls -q -1 --date --time-style="+%Y%m%d" > ${LIST}
quit
EOF
# Print obtained list, uncomment for debug
# echo "File list"
# cat ${LIST}
# Delete list header, uncomment for debug
# echo "Delete list"
# Let's find date to compare
STORE_DATE=$(date -d "now - ${STORE_DAYS} days" '+%Y%m%d')
while IFS= read -r LINE; do
if [[ ${STORE_DATE} -ge ${LINE:0:8} && "${LINE}" != *\/ ]]; then
echo "rm -f \"${LINE:9}\"" >> ${DELLIST}
# Print files which are subject to deletion, uncomment for debug
#echo "${LINE:9}"
fi
done < ${LIST}
# More debug strings
# echo "Delete list complete"
# Print a notice and exit if the list doesn't exist or is empty.
if [ ! -f ${DELLIST} ] || [ -z "$(cat ${DELLIST})" ]; then
echo "Delete list doesn't exist or is empty, nothing to delete. Exiting"
exit 0;
fi
# Connect to ftp and delete files by previously formed list
${LFTP} << EOF
open ${FTP_USER}:${FTP_PASS}@${FTP_HOST}
cd ${FTP_PATH}
$(cat ${DELLIST})
quit
EOF
}
removeOlderThanDays
I have addressed this sort of thing a few times.
How to connect to a ftp server via bash script?
Provide commands automatically to ftp in bash script
Bash FTP upload - events to log
Better to use scp and/or ssh when you can, especially if you can set up passwordless access with public keys. Otherwise, I recommend a more robust language like Python or Perl that lets you check the return codes of these steps individually and respond accordingly.
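For example, with passwordless keys in place, a minimal sketch of the recursive cleanup over SSH (user, host and the 7-day cutoff here are placeholders; the path is the same /directadmin as above):
ssh user@host 'find /directadmin -type f -mtime +7 -delete'
# then prune any directories the first pass left empty
ssh user@host 'find /directadmin -mindepth 1 -type d -empty -delete'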

How to redirect both OUT and ERR to one file and only ERR to another

Hi experts, I want a command's stdout and stderr appended to one file, like this: command >> logOutErr.txt 2>&1
but I also want stderr alone appended to another file, as in: command 2>> logErrOnly.txt
# This is a non working redirection
exec 1>> logOutErr.txt 2>> logOutErr.txt 2>> logErrOnly.txt
# This should be in Out log only
echo ten/two: $((10/2))
# This should be in both Out and Out+Err log files
echo ten/zero: $((10/0))
I understand that the last redirect 2>> overrides the preceding ones... so what then? tee? But how?
I have to do this once at the beginning of the script, without modifying the rest of the script (because it is dynamically generated and any modification is too complicated)
Please don't answer only with links to the theory; I have already spent two days reading everything with no good results. I would like a working example.
Thanks
With the understanding that you lose ordering guarantees when doing this:
#!/usr/bin/env bash
exec >>logOutErr.txt 2> >(tee -a logErrOnly.txt)
# This should be in OutErr
echo "ten/two: $((10/2))"
# This should be in Err and OutErr
echo "ten/zero: $((10/0))"
This works because redirections are processed left-to-right: when tee is started, its stdout already points at logOutErr.txt, so everything tee passes through is appended there in addition to being written to logErrOnly.txt.
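For example, if the script above is saved as redirect-demo.sh (a name chosen here just for illustration):
bash redirect-demo.sh
cat logErrOnly.txt   # only the division-by-zero error
cat logOutErr.txt    # "ten/two: 5" plus the error line (relative order not guaranteed)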

How to parse last value from CSV every five minutes?

I need to run a script to parse values from a CSV every five minutes, but it should not parse all the values every time; it should continue from the last point.
Really just putting some flesh on @hek2mgl's suggestion and implementing it for you.
I store the last known length of the logfile in another file called lastlength.
#!/bin/bash
LOGFILE=somelog.csv
LASTLEN=lastlength
# Pre-set seek to start of file...
seek=0
# ... but overwrite if there was a previously seen value
[ -f "$LASTLEN" ] && seek=$(cat "$LASTLEN")
echo DEBUG: Starting from offset $seek
# Update last seen length into file - parameters to "stat" will differ on Linux
stat -f "%Dz" "$LOGFILE" > "$LASTLEN"
# Get last line of file starting from previous position
# (dd needs a non-zero block size, so fall back to plain tail on the first pass)
if [ "$seek" -gt 0 ]; then
    dd if="$LOGFILE" bs="$seek" skip=1 2> /dev/null | tail -1
else
    tail -1 "$LOGFILE"
fi
I am using OSX, so if you are using Linux, the parameters to the stat command above will be different, probably
stat -c%s "$LOGFILE" > "$LASTLEN"
I'll leave you to put it into your crontab so it gets called every 5 minutes.
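For example, a crontab entry along these lines (the script path is hypothetical) runs it every five minutes:
*/5 * * * * /bin/bash /home/you/parse-last.sh >> /home/you/parsed-values.log 2>&1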

Rollover shell script

Assume a shell script (commands.sh) with a few commands.
I need to write a script which sends the output of commands executed by commands.sh to a file f1.csv.
If the file size exceeds 1 MB, the output should instead go to file f2.csv.
If that file also exceeds 1 MB, the output should go to file f3.csv.
If f3.csv exceeds 1 MB, the old f1 should be deleted, a new f1 created, and output written to it again. This process should go on.
I can write the crontab file; just the shell script is a bit tricky.
I have been experimenting:
#!/usr/bin/env bash
PREFIX="f"
# Maximum size after which you want a new file in bytes
MAX_SIZE=1048576
LAST_FILE=$(ls "$PREFIX"*.csv 2>/dev/null | tail -1)
# Check if file exists and if it does not, create it.
if [[ -z "$LAST_FILE" ]]
then
LAST_FILE=$PREFIX"1.csv"
touch $LAST_FILE
fi
LAST_FILE_NO=$(echo "$LAST_FILE" | sed "s/$PREFIX//" | sed 's/\.csv//')
LAST_FILE_SIZE=$(stat -c %s "$LAST_FILE")
if [ "$LAST_FILE_SIZE" -lt "$MAX_SIZE" ]
then
    /bin/sh ./sam.sh >> "$LAST_FILE"
else
    UPCOMING_FILE_NO=$((LAST_FILE_NO+1))
    /bin/sh ./sam.sh >> "$PREFIX$UPCOMING_FILE_NO.csv"
fi
help is appreciated guys.
EDIT: Have got the secondary shell script to work too...
Now if anyone could help me with resetting after 3 files are done and starting from f1.
thanks
It sounds like you'd be better off using logrotate, depending on how your script is running. If you are running 'commands.sh' on a cron, you can have logrotate rotate out the logs. There is a good guide on logrotate here:
http://linuxers.org/howto/howto-use-logrotate-manage-log-files
If your commands.sh isn't going to be on a cron, meaning it's not a regular time interval that triggers it, you could manually set up a log rotation at the beginning of your script. I once had to do something similar. I found this guide really useful:
http://wazem.blogspot.com/2013/11/simple-bash-log-rotate-function.html
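For the cron case, a minimal logrotate sketch (the watched file and the config path are hypothetical) that approximates the three-file, 1 MB cycle described above:
# /etc/logrotate.d/commands-output
/home/user/f1.csv {
    size 1M
    rotate 2
    missingok
    notifempty
}
With rotate 2, logrotate keeps f1.csv.1 and f1.csv.2 and drops the oldest on each rotation, giving the same three-file cap, just with numeric suffixes instead of separate f2/f3 names.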
