Block User Agent when it is a number

I've been receiving a lot of visits to my site from bad bots.
The pattern is this:
190.204.58.162 - - [20/Oct/2014:16:46:54 +0200] "GET / HTTP/1.0" 200 318 mysite.com "-" "881087" "-"
201.243.204.1 - - [20/Oct/2014:16:46:54 +0200] "GET / HTTP/1.0" 200 318 mysite.com "-" "442762" "-"
200.109.59.218 - - [20/Oct/2014:16:46:54 +0200] "GET / HTTP/1.0" 200 318 mysite.com "-" "717724" "-"
113.140.25.4 - - [20/Oct/2014:16:46:54 +0200] "GET / HTTP/1.1" 200 318 mysite.com "-" "360319" "-"
183.136.221.6 - - [20/Oct/2014:16:46:54 +0200] "GET / HTTP/1.1" 200 318 mysite.com "-" "989851" "-"
195.154.78.122 - - [20/Oct/2014:16:46:54 +0200] "GET / HTTP/1.0" 200 318 mysite.com "-" "122984" "-"
59.151.103.52 - - [20/Oct/2014:16:46:54 +0200] "GET / HTTP/1.1" 200 318 mysite.com "-" "375843" "-"
Each request comes from a different IP with a different user agent.
However, the user agent is always numeric and is normally 6 characters long.
For example, on the first line the user agent is "881087" instead of something like "Chrome", "Opera", "Safari", etc.
Does anyone know how to block it via .htaccess?

You can block that; how depends on your platform (PHP or .NET).
Personally, I would check whether the user agent is numeric (is_numeric() in PHP). If it is, you could stop processing the request: return in JSP, die() in PHP, or Response.End() in .NET.
In .htaccess, you could try a regex on the user agent instead.
Please let me know if you want the exact script for any of the above.
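Before putting a regex into .htaccess, it can help to run it over the access log and see which requests it would catch. This is a sketch that assumes the log format shown in the question, where the user agent is the second-to-last double-quoted field; numeric_ua_ips is a hypothetical helper name. The anchored pattern ^[0-9]+$ is the same one you would test %{HTTP_USER_AGENT} against in a mod_rewrite RewriteCond.

```shell
#!/bin/sh
# Print the client IP of every access-log line whose user agent is purely numeric.
# Assumption: the user agent is the second-to-last double-quoted field, as in
#   ... mysite.com "-" "881087" "-"
# Splitting each line on '"' puts that field at position NF-3.
numeric_ua_ips() {
    awk -F'"' 'NF >= 4 && $(NF-3) ~ /^[0-9]+$/ {
        split($1, f, " ")   # the first space-separated word is the client IP
        print f[1]
    }' "$@"
}
```

For example, `numeric_ua_ips access.log | sort -u` (the file name is hypothetical) lists each offending IP once.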

Related

Ignoring requests to images etc. when grepping server logs

I'm looking to pull out various metrics from some server logs. The first is the total number of requests to just pages, not images, CSS files etc.
So I want to include requests like:
140.77.167.177 - - [01/Apr/2016:22:40:09 +1100] "GET /bad-credit-loans/abc/ HTTP/1.1" 200 7532 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
but ignore requests like:
158.165.213.180 - - [01/Apr/2016:23:00:55 +1100] "GET /assets/img/lenders/png/insurance.png HTTP/1.1" 200 17866 "https://www.example.au/lp/tradie-loans/?utm_source=facebook&utm_medium=cpc&utm_content=mobilead&utm_campaign=abcs/" "Mozilla/5.0 (Linux; Android 5.1.1; SM-G920I Build/LMY47X; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/48.0.2564.106 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/70.0.0.22.83;]"
grep "GET " | wc -l gets me all requests; how do I disregard requests for certain file types (*.png, *.css, *.jpg and *.js), and how do I extend this to ignore any file?
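You can exclude those extensions with an extended regex and count what is left in one go (-v inverts the match, -c counts the remaining lines). Matching the extension right after the request path, followed by a space or a query string, keeps lines whose referrer or user agent merely contains .js or .png from being dropped:
grep '"GET ' file.log | grep -Evc '"GET [^ ]*\.(png|jpg|css|js)( |\?)'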
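For the second part of the question (ignoring any file, not just a fixed list of extensions), one option is to count only requests whose last path segment contains no dot at all. This is a sketch under the assumption that every page on the site lives at an extensionless URL such as /bad-credit-loans/abc/; a site that serves .php or .html pages would miss those. count_extensionless_gets is a hypothetical name, and it reads one log file or stdin.

```shell
# Count GET requests whose last path segment has no file extension,
# e.g. /bad-credit-loans/abc/ or /, but not /assets/img/x.png or /app.js?v=2.
# Assumption: all real pages are served from extensionless URLs.
count_extensionless_gets() {
    # After "GET ", match up to the last "/" of the path, then require a
    # segment free of ".", "/", "?" and spaces, ending at a space or a "?".
    grep -Ec '"GET [^ ?]*/[^ ?/.]*( |\?)' "$@"
}
```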

Using echo and grep together

I want to use echo and grep together. I have tried most things but couldn't get exactly the output I want.
aa=$(grep -A100000 "2010-03-24" log.txt|grep "ORA")
echo "Ip-Address|Directory Name|${aa}" > output.txt
I grep for the date because I want all the lines after the current date, and then grep for "ORA" in those. There are other ways, but given my log file this is the most suitable one.
I am getting output like this:
10.46.162.86|ASD----Exception|2010-03-24 07 ORA-00001 - 80 - 173.45.230.59
2010-03-24 07:00:47 ORA-00942 - 80 - 173.45.230.59
2010-03-24 07:01:15 ORA-00001 - 80 - 173.45.230.59
2010-03-24 07:02:17 ORA-12849 - 80 - 173.45.230.59
2010-03-24 07:05:09 ORA-00001 - 80 - 173.45.230.59
The ideal output would be:
10.46.162.86|ASD----Exception|2010-03-24 07 ORA-00001 - 80 - 173.45.230.59
10.46.162.86|ASD----Exception|2010-03-24 07:00:47 ORA-00942 - 80 - 173.45.230.59
10.46.162.86|ASD----Exception|2010-03-24 07:01:15 ORA-00001 - 80 - 173.45.230.59
10.46.162.86|ASD----Exception|2010-03-24 07:02:17 ORA-12849 - 80 - 173.45.230.59
10.46.162.86|ASD----Exception|2010-03-24 07:05:09 ORA-00001 - 80 - 173.45.230.59
I am fetching ORA lines from log files in different directories.
The input looks like this:
2010-03-22 07:00:47 ZZZZC941948879 RUFFLES 222.222.222.222 ORA-00001 - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)
2010-03-22 07:00:47 ZZZZC941948879 RUFFLES 222.222.222.222 GET /2009/10/yep-twitter-down.ht
2010-03-22 07:00:48 ZZZZC941948879 RUFFLES 222.222.222.222 GET /img/input-bg.jpg - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319+Firefox/3.9.2
2010-03-23 07:00:48 ZZZZC941948879 RUFFLES 222.222.222.222 ORA-00001 - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319+Firefox/3.9.2
2010-03-23 07:00:48 ZZZZC941948879 RUFFLES 222.222.222.222 GET /img/topnav-about.jpg - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319
2010-03-23 07:00:48 ZZZZC941948879 RUFFLES 222.222.222.222 GET /img/entry-hr.jpg - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319+Firefox
2010-03-23 07:00:48 ZZZZC941948879 RUFFLES 222.222.222.222 ORA-00001 - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319+Firefox/3.9.2
2010-03-24 07:00:48 ZZZZC941948879 RUFFLES 222.222.222.222 GET /img/header-bg.jpg - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319
2010-03-24 07:00:48 ZZZZC941948879 RUFFLES 222.222.222.222 GET /img/bullet.gif - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319+Firefox
2010-03-24 07:00:49 ZZZZC941948879 RUFFLES 222.222.222.222 ORA-00001 - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319+Firefox/3.9.2
2010-03-24 07:00:49 ZZZZC941948879 RUFFLES 222.222.222.222 GET /img/bg-module.jpg - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319
2010-03-24 07:00:50 ZZZZC941948879 RUFFLES 222.222.222.222 ORA-00942 - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319+Firefox/3.9.2
2010-03-24 07:00:50 ZZZZC941948879 RUFFLES 222.222.222.222 GET /img/bg-sidebarul.jpg - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319
2010-03-24 07:00:50 ZZZZC941948879 RUFFLES 222.222.222.222 ORA-00001 - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319+Firefox/3.9.2
2010-03-24 07:00:51 ZZZZC941948879 RUFFLES 222.222.222.222 ORA-00942 - 80 - 98.88.35.133 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+9.0;+en-US;+rv:1.9.2.2)+Gecko/20100319+Firefox/3.9.2
The problem is that the grep returns 100 or more lines, depending on the exceptions, and I can only get the IP address and node name onto the first line.
Also, the IP address and node name are generated at run time.
Please suggest a way to get the desired output.
Thanks.
Since I just know that special characters are going to show up in the directory names, I'd prefer awk over sed for this to avoid code injection problems:
grep -A100000 "2010-03-24" log.txt | awk -v prefix="IP-Address|Directory name|" '/ORA/ { print prefix $0 }' > output.txt
The relevant part is
awk -v prefix="IP-Address|Directory name|" '/ORA/ { print prefix $0 }'
With -v prefix=value, a variable named prefix with the given value is made known to awk, and /ORA/ { print prefix $0 } instructs awk to process all lines that match the regex ORA by printing prefix followed by the line (which is $0).
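Since the IP address and node name are only known at run time, here is a sketch that takes them as arguments. It passes the prefix through the environment and reads it with awk's ENVIRON array, because awk processes backslash escapes in -v values (a directory name containing \t, for instance, would be altered), while ENVIRON values are used verbatim. prefix_ora_lines is a hypothetical helper name.

```shell
# Prefix every line containing "ORA" with "IP|DIR|".
# Usage: prefix_ora_lines IP DIR [FILE...]   (reads stdin when no FILE is given)
prefix_ora_lines() {
    ip=$1
    dir=$2
    shift 2
    # PREFIX is set only in awk's environment; ENVIRON values are not
    # subject to escape processing, unlike values passed with -v.
    PREFIX="$ip|$dir|" awk '/ORA/ { print ENVIRON["PREFIX"] $0 }' "$@"
}
```

With the question's pipeline this becomes `grep -A100000 "2010-03-24" log.txt | prefix_ora_lines "$ip" "$dir" > output.txt`, where $ip and $dir hold whatever values your script computes.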
@etanreisner gave you the answer.
One way:
grep -A100000 "2010-03-24" log.txt | grep "ORA" |
while IFS= read -r aa
do
    echo "Ip-Address|Directory Name|${aa}"
done > output.txt

Can't increment a 0-padded number past 8 in busybox sh

This is the code I am using to save files from a camera and name them from 0001 onward. The camera runs Busybox, which provides an ash shell.
The code is based on a previous answer by Charles Duffy.
#!/bin/sh
# Snapshot script
cd /mnt/0/foto
sleep 1
set -- *.jpg # put the (alphabetically sorted) list of picture file names into the positional parameters; $# holds their count
while [ $# -gt 1 ]; do # as long as there is more than one...
shift # ...drop the first one, until only the last (highest-numbered) file remains
done
if [ "$1" = "*.jpg" ]; then # if no jpg file exists, the glob stays unexpanded; seed $1 so the sequence starts from 0
set -- snapfull0000.jpg
else
echo "More than a jpg file found."
fi
num=${1#*snapfull} # $1 is now the latest file name; strip everything up to and including "snapfull"
num=${num%.*} # strip the extension (.jpg)
num=$(printf "%04d" "$(($num + 1))") # increment the number and zero-pad it to 4 digits
# echoes for debugging
echo "variable num=$num" # the next number, derived from the latest file name
echo "\$#=$#" # number of positional parameters
echo "\$1=$1" # the first positional parameter
wget http://127.0.0.1/snapfull.php -O "snapfull${num}.jpg" # request a snapshot from the camera and save it under the next sequential name
This is what I get on the command line while the script runs. I ran the script manually nine times, but after snapfull0008.jpg was saved, the next file was named snapfull0000.jpg, as you can see in the last lines.
# ./snap4.sh
variable num=0001
$#=1
$1=snapfull0000.jpg
Connecting to 127.0.0.1 (127.0.0.1:80)
127.0.0.1 127.0.0.1 - [05/Dec/2014:20:22:22 +0000] "GET /snapfull.php HTTP/1.1" 302 0 "-" "Wget"
snapfull0001.jpg 100% |*******************************| 246k --:--:-- ETA
# ./snap4.sh
More than a jpg file found.
variable num=0002
$#=1
$1=snapfull0001.jpg
Connecting to 127.0.0.1 (127.0.0.1:80)
127.0.0.1 127.0.0.1 - [05/Dec/2014:20:22:32 +0000] "GET /snapfull.php HTTP/1.1" 302 0 "-" "Wget"
snapfull0002.jpg 100% |*******************************| 249k --:--:-- ETA
# ./snap4.sh
More than a jpg file found.
variable num=0003
$#=1
$1=snapfull0002.jpg
Connecting to 127.0.0.1 (127.0.0.1:80)
127.0.0.1 127.0.0.1 - [05/Dec/2014:20:22:38 +0000] "GET /snapfull.php HTTP/1.1" 302 0 "-" "Wget"
snapfull0003.jpg 100% |*******************************| 248k --:--:-- ETA
# ./snap4.sh
More than a jpg file found.
variable num=0004
$#=1
$1=snapfull0003.jpg
Connecting to 127.0.0.1 (127.0.0.1:80)
127.0.0.1 127.0.0.1 - [05/Dec/2014:20:22:43 +0000] "GET /snapfull.php HTTP/1.1" 302 0 "-" "Wget"
snapfull0004.jpg 100% |*******************************| 330k --:--:-- ETA
# ./snap4.sh
More than a jpg file found.
variable num=0005
$#=1
$1=snapfull0004.jpg
Connecting to 127.0.0.1 (127.0.0.1:80)
127.0.0.1 127.0.0.1 - [05/Dec/2014:20:22:51 +0000] "GET /snapfull.php HTTP/1.1" 302 0 "-" "Wget"
snapfull0005.jpg 100% |*******************************| 308k --:--:-- ETA
# ./snap4.sh
More than a jpg file found.
variable num=0006
$#=1
$1=snapfull0005.jpg
Connecting to 127.0.0.1 (127.0.0.1:80)
127.0.0.1 127.0.0.1 - [05/Dec/2014:20:22:55 +0000] "GET /snapfull.php HTTP/1.1" 302 0 "-" "Wget"
snapfull0006.jpg 100% |*******************************| 315k --:--:-- ETA
# ./snap4.sh
More than a jpg file found.
variable num=0007
$#=1
$1=snapfull0006.jpg
Connecting to 127.0.0.1 (127.0.0.1:80)
127.0.0.1 127.0.0.1 - [05/Dec/2014:20:22:59 +0000] "GET /snapfull.php HTTP/1.1" 302 0 "-" "Wget"
snapfull0007.jpg 100% |*******************************| 316k --:--:-- ETA
# ./snap4.sh
More than a jpg file found.
variable num=0008
$#=1
$1=snapfull0007.jpg
Connecting to 127.0.0.1 (127.0.0.1:80)
127.0.0.1 127.0.0.1 - [05/Dec/2014:20:23:04 +0000] "GET /snapfull.php HTTP/1.1" 302 0 "-" "Wget"
snapfull0008.jpg 100% |*******************************| 317k --:--:-- ETA
# ./snap4.sh
More than a jpg file found.
variable num=0000
$#=1
$1=snapfull0008.jpg
Connecting to 127.0.0.1 (127.0.0.1:80)
127.0.0.1 127.0.0.1 - [05/Dec/2014:20:23:10 +0000] "GET /snapfull.php HTTP/1.1" 302 0 "-" "Wget"
snapfull0000.jpg 100% |*******************************| 318k --:--:-- ETA
What could be the cause of the sequence stopping after file number 8?
The problem is that leading 0s cause a number to be read as octal in shell arithmetic, and 0008 is not a valid octal number, since 8 is not an octal digit.
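You can see the octal interpretation directly in a POSIX shell such as dash or busybox ash: values up to 0007 happen to come out right, which is why the first runs of the script looked fine.

```shell
# Shell arithmetic follows the C rules for integer constants:
# a leading 0 means octal.
echo $((0007 + 1))   # prints 8  (octal 7 is decimal 7)
echo $((0010 + 1))   # prints 9  (octal 10 is decimal 8, not 10)
```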
In bash, using $((10#$num)) will force decimal. Thus:
num=$(printf "%04d" "$((10#$num + 1))")
To work with busybox ash, you'll need to strip the leading 0s. One way to do this that works even in busybox ash:
while [ "${num:0:1}" = 0 ]; do
num=${num:1}
done
num=$(printf '%04d' "$((num + 1))")
See the below transcript showing use (tested with ash from busybox v1.22.1):
$ num=0008
$ while [ "${num:0:1}" = 0 ]; do
> num=${num:1}
> done
$ num=$(printf '%04d' "$((num + 1))")
$ echo "$num"
0009
If your shell doesn't support the ${num:0:1} and ${num:1} substring expansions (they are a common extension rather than part of POSIX), you could instead use:
num=$(echo "$num" | sed -e 's/^0*//')
num=$(printf '%04d' "$(($num + 1))")
...though this would imply that your busybox was built with a shell other than ash, a decision I would strongly suggest reconsidering.
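For a shell limited to the expansions POSIX does require, the prefix-removal form ${num#0} can strip the zeros one at a time, without substring expansion or an external sed. A minimal sketch; next_seq is a hypothetical helper name.

```shell
# Increment a zero-padded 4-digit counter using only POSIX parameter
# expansion: ${n#0} removes one leading "0" per iteration.
next_seq() {
    n=$1
    while [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    # An all-zero input leaves n empty; treat that as 0.
    printf '%04d\n' "$(( ${n:-0} + 1 ))"
}
```

In the camera script it would replace the increment line as `num=$(next_seq "$num")`.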

How to parse sniffer results on the fly in Linux?

I want to sort and count the files (3 types) that clients downloaded from my server.
I installed tshark and ran the following command, which should capture GET requests:
`./tshark 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -R'http.request.method == "GET"'`
The sniffer starts working and I get a new row every second. Here is the result:
0.000000 144.137.136.253 -> 192.168.4.7 HTTP GET /pids/QE13_593706_0.bin HTTP/1.1
8.330354 1.1.1.1 -> 2.2.2.2 HTTP GET /pids/QE13_302506_0.bin HTTP/1.1
17.231572 1.1.1.2 -> 2.2.2.2 HTTP GET /pids/QE13_382506_0.bin HTTP/1.0
18.906712 1.1.1.3 -> 2.2.2.2 HTTP GET /pids/QE13_182406_0.bin HTTP/1.1
19.485199 1.1.1.4 -> 2.2.2.2 HTTP GET /pids/QE13_302006_0.bin HTTP/1.1
21.618113 1.1.1.5 -> 2.2.2.2 HTTP GET /pids/QE13_312106_0.bin HTTP/1.1
30.951197 1.1.1.6 -> 2.2.2.2 HTTP GET /nginx_status HTTP/1.1
31.056364 1.1.1.7 -> 2.2.2.2 HTTP GET /nginx_status HTTP/1.1
37.578005 1.1.1.8 -> 2.2.2.2 HTTP GET /pids/QE13_332006_0.bin HTTP/1.1
40.132006 1.1.1.9 -> 2.2.2.2 HTTP GET /pids/PE_332006.bin HTTP/1.1
40.407742 1.1.2.1 -> 2.2.2.2 HTTP GET /pids/QE13_452906_0.bin HTTP/1.1
What do I need to do to store the result types and counts, like /pids/*****.bin, in another file?
I'm not strong in Linux, but I'm sure it can be done with 1-3 lines of script.
Maybe with awk, but I don't know the technique for reading the sniffer's output.
Thank you,
Can't you just grep the log file of your webserver?
Anyway, to extract the lines of captured HTTP traffic for your server's files, try:
./tshark 'tcp port 80 and \
(((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' \
-R'http.request.method == "GET"' | \
grep -E 'HTTP GET /pids/.*\.bin'
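To get the per-type counts the question asks for, the filtered stream can be piped into awk, which keeps a running tally. A sketch under these assumptions: the request path is the 7th whitespace-separated field (as in the sample rows), a file's "type" is the part of its name before the first underscore (QE13, PE, ...), and tally_pids and the output file name are hypothetical. When piping tshark into another program, tshark's -l option flushes its output after each packet, which lets the counts update as traffic arrives.

```shell
# Keep a running count of /pids/*.bin requests per "type" and rewrite the
# totals to the file named by $1 after every matching line.
tally_pids() {
    awk -v out="$1" '
        $7 ~ /^\/pids\/.*\.bin$/ {
            type = $7
            sub(/^\/pids\//, "", type)   # QE13_593706_0.bin
            sub(/_.*/, "", type)         # QE13  (PE_332006.bin -> PE)
            sub(/\.bin$/, "", type)      # names without "_" just lose ".bin"
            count[type]++
            # The first print after close() reopens and truncates the file.
            for (t in count) print t, count[t] > out
            close(out)
        }'
}
```

Usage would be `./tshark -l` with the same filter arguments as above, piped into `tally_pids pid_counts.txt`. The order of lines in the output file is unspecified; pipe it through sort if that matters.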

Question about how to make a filter using script

I'm trying to write a filter script that does this:
Before:
123.125.66.126 - - [05/Apr/2010:09:18:12 -0300] "GET / HTTP/1.1" 302 290
66.249.71.167 - - [05/Apr/2010:09:18:13 -0300] "GET /robots.txt HTTP/1.1" 404 290
66.249.71.167 - - [05/Apr/2010:09:18:13 -0300] "GET /~leonardo_campos/IFBA/Web_Design_Aula_17.pdf HTTP/1.1" 404 324
After:
[05/Apr/2010:09:18:12 -0300] / 302 290
[05/Apr/2010:09:18:13 -0300] /robots.txt 404 290
[05/Apr/2010:09:18:13 -0300] /~leonardo_campos/IFBA/Web_Design_Aula_17.pdf 404 324
If someone could help, that would be great.
Thanks in advance!
Supporting all HTTP methods:
sed 's#.*\(\[[^]]*\]\).*"[A-Z]* \(.*\) HTTP/[0-9.]*" \(.*\)#\1 \2 \3#'
It seems a perfect job for sed.
You can easily construct a pair of s replacement commands to remove the unwanted pieces of each line.
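One way to write that pair of s commands, as a sketch assuming every line follows the common log format shown in the question (strip_log_fields is a hypothetical name): the first command deletes everything before the opening [ of the timestamp, and the second replaces the quoted request line with just the requested path, whatever the HTTP method.

```shell
# 1st s: drop the client IP, ident and user fields (everything before "[").
# 2nd s: replace "METHOD /path HTTP/x.y" with just /path.
strip_log_fields() {
    sed -e 's/^[^[]*//' \
        -e 's/"[A-Z]* \([^ ]*\) HTTP\/[0-9.]*"/\1/' "$@"
}
```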
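sed is your friend here, with regexps. Since each line starts with the client IP, the pattern has to skip it before capturing the timestamp:
sed 's/^[^[]*\(\[[^]]*\]\) "GET \([^ ]*\) [^"]*" \(.*\)$/\1 \2 \3/'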
If your file structure is always like that, you can just use fields; there's no need for a complex regex:
$ awk '{print $4,$5,$7,$9,$10}' file
[05/Apr/2010:09:18:12 -0300] / 302 290
[05/Apr/2010:09:18:13 -0300] /robots.txt 404 290
[05/Apr/2010:09:18:13 -0300] /~leonardo_campos/IFBA/Web_Design_Aula_17.pdf 404 324
