tail -f to several files piped to "while read" gives unexpected results - linux

I wrote a quick bash script to monitor several log files and send their contents to a Mattermost chat. This is the script:
#!/bin/bash
# HOOKS
hook1=...#Mattermost hook
hook2=...#Mattermost hook
hook3=...#Mattermost hook
date=$1
year=$2
unbuffer tail -f -n0 /var/log/$year/hook1_*$date.log \
/var/log/$year/hook2_*$date.log \
/var/log/$year/hook3_*$date.log | grep "ERROR" | while read -r msg
do
if [[ $msg = *"hook1"* ]]
then
hook=$hook1
elif [[ $msg = *"hook2"* ]]
then
hook=$hook2
else
hook=$hook3
fi
msg=$(echo "$msg" | sed -r "s/\"/\\\\\"/g")
curl -i -X POST --data-urlencode "payload={\"text\": \"$msg\"}" $hook
done
The sed part escapes quotes so I can replace the message into curl.
However, several problems arise during execution:
There seems to be 2 processes in htop after I run the script once.
Sometimes messages are not sent to Mattermost as they appear, but instead they come out in batches.
Messages sometimes appear cut and some content is lost.

Related

Multiple process curl command for urls to output to individual files

I am attempting to curl multiple urls in a bash command. Eventually I will be curling a large number of Urls so I am using xargs to use multiple processes to speed up the process.
My file consists of x number of URLs:
https://someurl.com
https://someotherurl.com
My issue comes when attempting to output the results to separate files named after the URLs I curl.
The bash command I have is:
xargs -P 5 -n 1 -I% curl -k -L % -0 % < urls.txt
When I run this I get 'Failed to create file https://someotherurl.com'
You cannot create a file with / in the filename. You could do it this way:
#!/bin/bash
while IFS= read -r line
do
echo "LINE: $line"
if [[ "$line" != "" ]]
then
filename="${line#https://}"
echo "FILENAME: $filename"
# YOUR CURL COMMAND HERE, USING $filename
fi
done < url.txt
it ignores empty lines
variable substitution is used to remove the https:// part of each URL
this will allow you to create the file
Note: if your URLs containt sub-directories, they must be removed as well.
Ex: you want to do https://www.exemple.com/some/sub/dir
The script I suggested here would try to create a file named "www.exemple.com/some/sub/dir". In this case, you could replace the / with _ using tr.
The script would become:
#!/bin/bash
while IFS= read -r line
do
echo "LINE: $line"
if [[ "$line" != "" ]]
then
filename=$(echo "$line" | tr '/' '_')
filename2=${filename#https:__}
echo "FILENAME: $filename2"
# YOUR CURL COMMAND HERE, USING $filename2
fi
done < url.txt
Because your question is ambiguous, I would assume:
You have a file urls.txt that contains URLs separated by LF.
You want to download all URLs by curl and use each URL as its filename.
Unfortunately, that's not possible because URL contains invalid characters like slash /. Alternatively, for this case, I would suggest you use Bsse64 safe mode to decode URL before saving to file based on RFC 3548.
After applying this requirement, your script would become like:
seq 100 | xargs -I# echo 'https://example.com?#' > urls.txt
xargs -P0 -L1 sh -c 'curl -SskL0 -o $(printf %s "$1" | uuencode -m /dev/stdout | sed "1d;\$d" | tr +/ -_) "$1"' sh < urls.txt

Trying to make a live /proc/ reader using bash script for live process monitoring

Im trying to make a little side project script to sit and monitor all of the /proc/ directories, for the most part I have the concept running and it works(to a degree). What im aiming for here is to scan through all the files and cat their status files and pull out the appropriate info, and then I would like to run this process in an infinite loop to give me live updates of when something is running on and dropping off of the scheduler. Right now every time you run the script, it will print 50+ blank lines and every single time it hits the proper regex it will print it correctly, but Im aiming for it to not roll down the screen the way it does. Any help at all would be appreciated.
regex="[0-9]"
temp=""
for f in /proc/*; do
if [[ -d $f && $f =~ /proc/$regex ]]; then
output=$(cat $f/status | grep "^State") #> /dev/null
process_id=$(cut -b 7- <<< $f)
state=$(cut -b 10-19 <<< $output)
tabs 4
if [[ $state =~ "(running)" ]]; then
echo -e "$process_id:$state\n" | sort >> temp
fi
fi
done
cat temp
rm temp````
To get the PID and status of running all processes, try:
awk -F':[[:space:]]*' '/State:/{s=$2} /Pid:/{p=$2} ENDFILE{if (s~/running/) print p,s; p="X"; s="X"}' OFS=: /proc/*/status
To get this output updated every second:
while sleep 1; do awk -F':[[:space:]]*' '/State:/{s=$2} /Pid:/{p=$2} ENDFILE{if (s~/running/) print p,s; p="X"; s="X"}' OFS=: /proc/*/status; done

Multiple scripts making rest calls interfering

So I am running into a problem with unix scripts that use curl to make rest calls. I have one script, that runs two other scripts inside of it.
cat example.sh
FILE="file1.txt"
RECIP="wilfred#blamagam.com"
rm -f $FILE
./script1.sh > $FILE
mail -s "subject" $RECIP < $FILE
RECIP="bob#blamagam.com"
rm -f $FILE
./script2.sh > $FILE
mail -s "subject" $RECIP < $FILE
exit 0
Each script makes rest calls to the same service. It is my understanding that script1.sh should completely finish before script2.sh is ran, however that is not the case. In the logs for the rest service I see a rest call from the second script in the middle of the first one still executing. The second script then fails because of this (it does not get any data returned).
I am modifying this process so I am not the one who originally wrote it. I am not seeing any forked processes, or background processes at all and I have been banging my head against the wall.
I do know that script2.sh works. Whenever script1.sh takes under a minute script2.sh works just fine, but more often than not script1.sh takes over a min, causing the second script to fail.
This is ran by a cron, and the contents of the files are mailed out, so I cant just default to running them manually. Any suggestions for what to look into would be much appreciated!
EDIT: Here is a high pseudo code example
script1.sh
ITEMS=`/usr/bin/curl -m 10 -k -u userName:passWord -L https://server/rest-service/rest?where=clause=value;clause2=value2&sel=field 2>/dev/null | sed s/<\/\?Attribute[^>]*>/\n/g | grep -v '^<' | grep -v '^$' | sed 's/ //g'`
echo "\n Subject for these metrics"
echo "$ITEMS"
Both scripts have lots of entries like this. There are 2 or 3 for loops but they are simple and I do not see any background processes being called. Its a large script so I could only provide a snippet. Could the rest call into pipes be causing an issue?
Edit:
Just tested this on my system and it seems to work.
cat example.sh
FILE="file1.txt"
RECIP="wilfred#blamagam.com"
rm -f "$FILE"
(./script1.sh > "$FILE") &
procscript1=$!
wait "$procscript1"
mail -s "subject" "$RECIP" < "$FILE"
RECIP="bob#blamagam.com"
rm -f "$FILE"
(./script2.sh > "$FILE") &
procscript2=$!
wait "$procscript2"
mail -s "subject" "$RECIP" < "$FILE"
exit 0
Put the script executions in the background with the &.
Get the process id's for each script execution.
Use the wait command to block until the execution is done.

How to check status of URLs from text file using bash shell script

I have to check the status of 200 http URLs and find out which of these are broken links. The links are present in a simple text file (say URL.txt present in my ~ folder). I am using Ubuntu 14.04 and I am a Linux newbie. But I understand the bash shell is very powerful and could help me achieve what I want.
My exact requirement would be to read the text file which has the list of URLs and automatically check if the links are working and write the response to a new file with the URLs and their corresponding status (working/broken).
I created a file "checkurls.sh" and placed it in my home directory where the urls.txt file is also located. I gave execute privileges to the file using
$chmod +x checkurls.sh
The contents of checkurls.sh is given below:
#!/bin/bash
while read url
do
urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$url" )
echo "$url $urlstatus" >> urlstatus.txt
done < $1
Finally, I executed it from command line using the following -
$./checkurls.sh urls.txt
Voila! It works.
#!/bin/bash
while read -ru 4 LINE; do
read -r REP < <(exec curl -IsS "$LINE" 2>&1)
echo "$LINE: $REP"
done 4< "$1"
Usage:
bash script.sh urls-list.txt
Sample:
http://not-exist.com/abc.html
https://kernel.org/nothing.html
http://kernel.org/index.html
https://kernel.org/index.html
Output:
http://not-exist.com/abc.html: curl: (6) Couldn't resolve host 'not-exist.com'
https://kernel.org/nothing.html: HTTP/1.1 404 Not Found
http://kernel.org/index.html: HTTP/1.1 301 Moved Permanently
https://kernel.org/index.html: HTTP/1.1 200 OK
For everything, read the Bash Manual. See man curl, help, man bash as well.
What about to add some parallelism to the accepted solution. Lets modify the script chkurl.sh to be little easier to read and to handle just one request at a time:
#!/bin/bash
URL=${1?Pass URL as parameter!}
curl -o /dev/null --silent --head --write-out "$URL %{http_code} %{redirect_url}\n" "$URL"
And now you check your list using:
cat URL.txt | xargs -P 4 -L1 ./chkurl.sh
This could finish the job up to 4 times faster.
Herewith my full script that checks URLs listed in a file passed as an argument e.g. 'checkurls.sh listofurls.txt'.
What it does:
check url using curl and return HTTP status code
send email notifications when url returns other code than 200
create a temporary lock file for failed urls (file naming could be improved)
send email notification when url becoms available again
remove lock file once url becomes available to avoid further notifications
log events to a file and handle increasing log file size (AKA log
rotation, uncomment echo if code 200 logging required)
Code:
#!/bin/sh
EMAIL=" your#email.com"
DATENOW=`date +%Y%m%d-%H%M%S`
LOG_FILE="checkurls.log"
c=0
while read url
do
((c++))
LOCK_FILE="checkurls$c.lock"
urlstatus=$(/usr/bin/curl -H 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code}' "$url" )
if [ "$urlstatus" = "200" ]
then
#echo "$DATENOW OK $urlstatus connection->$url" >> $LOG_FILE
[ -e $LOCK_FILE ] && /bin/rm -f -- $LOCK_FILE > /dev/null && /bin/mail -s "NOTIFICATION URL OK: $url" $EMAIL <<< 'The URL is back online'
else
echo "$DATENOW FAIL $urlstatus connection->$url" >> $LOG_FILE
if [ -e $LOCK_FILE ]
then
#no action - awaiting URL to be fixed
:
else
/bin/mail -s "NOTIFICATION URL DOWN: $url" $EMAIL <<< 'Failed to reach or URL problem'
/bin/touch $LOCK_FILE
fi
fi
done < $1
# REMOVE LOG FILE IF LARGER THAN 100MB
# alow up to 2000 lines average
maxsize=120000
size=$(/usr/bin/du -k "$LOG_FILE" | /bin/cut -f 1)
if [ $size -ge $maxsize ]; then
/bin/rm -f -- $LOG_FILE > /dev/null
echo "$DATENOW LOG file [$LOG_FILE] has been recreated" > $LOG_FILE
else
#do nothing
:
fi
Please note that changing order of listed urls in text file will affect any existing lock files (remove all .lock files to avoid confusion). It would be improved by using url as file name but certain characters such as : # / ? & would have to be handled for operating system.
I recently released deadlink, a command-line tool for finding broken links in files. Install with
pip install deadlink
and use as
deadlink check /path/to/file/or/directory
or
deadlink replace-redirects /path/to/file/or/directory
The latter will replace permanent redirects (301) in the specified files.
Example output:
if your input file contains one url per line you can use a script to read each line, then try to ping the url, if ping success then the url is valid
#!/bin/bash
INPUT="Urls.txt"
OUTPUT="result.txt"
while read line ;
do
if ping -c 1 $line &> /dev/null
then
echo "$line valid" >> $OUTPUT
else
echo "$line not valid " >> $OUTPUT
fi
done < $INPUT
exit
ping options :
-c count
Stop after sending count ECHO_REQUEST packets. With deadline option, ping waits for count ECHO_REPLY packets, until the timeout expires.
you can use this option as well to limit waiting time
-W timeout
Time to wait for a response, in seconds. The option affects only timeout in absense
of any responses, otherwise ping waits for two RTTs.
curl -s -I --http2 http://$1 >> fullscan_curl.txt | cut -d: -f1 fullscan_curl.txt | cat fullscan_curl.txt | grep HTTP >> fullscan_httpstatus.txt
its work me

Bash | curl | curls 2 URL's then stops

I am trying to write a simple bash script that will use a list from a text document and curl each URL that is on the list in order to see what the contents of each URL is. It allows me to cURL 2 sites and creates the text documents for the rest however it only downloads the first 2. I have already manage to write the script that pulls there IP's and places them in a seperate file using the grep command. At first i tried
#!/bin/bash
for var in `cat host.txt`; do
curl -s $var >> /tmp/ping/html/$var.html
done
I have tried with and without the silent switch. I then tried the following:
#!/bin/bash
for var in `head -2 host.txt`; do
curl $var >> /tmp/ping/html/$var.html
wait
done
for var in `head -4 host.txt | tail -2`; do
curl $var >> /tmp/ping/html/$var.html
done
This would try and do them all at the same time again stopping after 2
#!/bin/bash
for var in `head -2 host.txt`; do
curl $var >> /tmp/ping/html/$var.html
done
wait
for var in `head -4 host.txt | tail -2`; do
curl $var >> /tmp/ping/html/$var.html
done
This would do the same, I am new to bash scripting and only know some of the basics, any help would be appreciated
Start with the simple: verify that you are in fact iterating over the entire list:
# This is the recommended way to iterate over the file. See
# http://mywiki.wooledge.org/BashFAQ/001
while read -r var; do
echo "$var"
done < hosts.txt
Then add in the call to curl, checking its exit status
while read -r var; do
echo "$var"
curl "$var" >> /tmp/ping/html/$var.html || echo "curl failed: $?"
done < hosts.txt
You pipe into $var, which could result in a wrong filename, because of the two slashes in the URL. Additionally i would quote the URL. For Example it works with the basename of the URL.
#!/bin/bash
for var in `cat host.txt`; do
name=$(basename $var)
curl -v -s "$var" -o "/tmp/ping/html/$name.html"
done
You may also want to skip blank lines and Comments (#)
#!/bin/bash
file="host.txt"
curl="curl"
while read -r line
do
[[ $line = \#* ]] || [[ -z "${line}" ]] && continue
filename=$(basename $line)
$curl -s "$line" >> "/tmp/ping/html/$filename.html"
done < "$file"

Resources