AWK command to print the fields below from the provided log line - linux

I have the following pattern in a Unix file:
{1.11.111.111 - 2017-10-06T00:00:00+00:00 111111 1 302 "GET /abcd/z1/bcdfgggg?values" uri="/abcd/v2/nano" 111 111 0 "-" "abcd/2.1.0 (Linux; U; Android 8.1.0; Redmi Note 6 Pro MIUI/V10.2.2.0.bcdwvc)" "1111:1111:111:1111:11:d11e:c11c:111a" cu=0.011 nano=0.011 var="-12345" "1111:1111:111:1111:11:d11e:c11c:111a, 11.111.111.111"}
I am trying to print the result below, but it is not printed as expected.
Code:
cat test.txt | awk -F'"' '{ print $1,$9 }' | awk -F' ' '{ print $3,$6,$24 }'
Actual Result: 2017-10-06T00:00:00+00:00 302
Expected Result: 2017-10-06T00:00:00+00:00 302 cu=0.011

With GNU sed and a regex with three capture groups:
sed -r 's/.* ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:+]{14}) [0-9]+ [0-9]+ ([0-9]{3}) .*(cu=[0-9.]+).*/\1 \2 \3/' file
Output:
2017-10-06T00:00:00+00:00 302 cu=0.011
See: The Stack Overflow Regular Expressions FAQ
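As a quick sanity check, the same command can be run against the sample line inline (a sketch; the -r option requires GNU sed):

```shell
# Sample line from the question, stored verbatim (it contains no single quotes).
line='{1.11.111.111 - 2017-10-06T00:00:00+00:00 111111 1 302 "GET /abcd/z1/bcdfgggg?values" uri="/abcd/v2/nano" 111 111 0 "-" "abcd/2.1.0 (Linux; U; Android 8.1.0; Redmi Note 6 Pro MIUI/V10.2.2.0.bcdwvc)" "1111:1111:111:1111:11:d11e:c11c:111a" cu=0.011 nano=0.011 var="-12345" "1111:1111:111:1111:11:d11e:c11c:111a, 11.111.111.111"}'

# Three captures: the timestamp, the status code, and the cu= pair.
out=$(printf '%s\n' "$line" |
  sed -r 's/.* ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:+]{14}) [0-9]+ [0-9]+ ([0-9]{3}) .*(cu=[0-9.]+).*/\1 \2 \3/')
echo "$out"
```

The three captured groups feed the \1 \2 \3 replacement directly.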

Assuming the log entry will always look as presented by the OP:
pattern='{1.11.111.111 - 2017-10-06T00:00:00+00:00 111111 1 302 "GET /abcd/z1/bcdfgggg?values" uri="/abcd/v2/nano" 111 111 0 "-" "abcd/2.1.0 (Linux; U; Android 8.1.0; Redmi Note 6 Pro MIUI/V10.2.2.0.bcdwvc)" "1111:1111:111:1111:11:d11e:c11c:111a" cu=0.011 nano=0.011 var="-12345" "1111:1111:111:1111:11:d11e:c11c:111a, 11.111.111.111"}'
awk -F ' ' '{print $3,$6,$25}' <<< "$pattern"
Output: 2017-10-06T00:00:00+00:00 302 cu=0.011
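When a field number like $25 has to be found by counting, it is easier to let awk number the fields first (a small helper sketch, not tied to this log format):

```shell
# Print every whitespace-separated field with its index,
# so the right $N can be read off instead of counted by hand.
out=$(printf '%s\n' 'alpha beta gamma delta' |
  awk '{for (i = 1; i <= NF; i++) printf "%2d  %s\n", i, $i}')
echo "$out"
```

Run it on the real log line to see which index holds cu=0.011.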

Is there a way to remove a line fully?

I'm using a one-line command to compile and print all of the animal names listed in a log file.
The WILD names are all listed in capital letters under the /wild directory.
The output should appear in the format of one name per line, with no duplicates:
ANT
BAT
CAT
I tried
grep 'wild' animal.txt | awk '{print $7}' | sed 's/[a-z0-9./]//g' | sort -u
It showed mostly what I want, but I also want to remove every string that contains special characters such as -, #, ? and %.
Below is a sample of the file animal.txt
191.21.66.100 - - [21/Aug/1995:05:17:57 -0400] "GET /wild/elvpage.htm#ZOO HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:35 -0400] "GET /wild/S/s_26s.jpg HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:41 -0400] "GET /wild/struct.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:34 -0400] "GET /wild/elvpage.htm HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:36 -0400] "GET /wild/endball.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:37 -0400] "GET /wild/hot.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:38 -0400] "GET /wild/elvhead3.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:38 -0400] "GET /wild/PEGASUS/minpeg1.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/ATLAS/atlas.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:40 -0400] "GET /wild/LIZARD/lizard.gif HTTP/1.0"
Below is a sample of my output after running the command:
ATLAS
ATLAS-
CAT_
DOG
%FACT
-KWM
?TIL-
#ZOO
Why not allow only capital A-Z and remove everything else:
grep 'wild' animal.txt | awk '{print $7}' | sed 's/[^A-Z]//g'
From your example input, this will return:
PEGASUS
DOGDOG
SWANSWAN
ATLAS
LIZARD
If you need to, you can further clean up empty lines by appending | sed '/^$/d' and then piping to sort.
You can use a single GNU sed command:
sed -n 's!.*/wild/\([A-Z][A-Z]\+\)/.*!\1!p' animal.txt
Means:
-n: Do not print every line.
s!X!Y! Substitute X with Y.
.*/wild/\([A-Z][A-Z]\+\)/.*: match a capital letter followed by at least one more capital letter, preceded by /wild/ and followed by a / and anything else. Capture (remember) the capital letters.
!\1!: Replace whatever you found with the capital letter sequence.
p: If it was a match then print the line.
Gives:
PEGASUS
DOG
SWAN
ATLAS
LIZARD
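If GNU sed is unavailable, the same extraction can be sketched in POSIX awk using match() with the RSTART/RLENGTH built-ins (two sample lines are inlined here for illustration; in practice the script would read animal.txt):

```shell
out=$(printf '%s\n' \
  '191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"' \
  '191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0"' |
awk 'match($0, /\/wild\/[A-Z][A-Z]+\//) {
       # RSTART/RLENGTH cover "/wild/NAME/": trim the 6-char prefix
       # "/wild/" and the trailing "/" to keep just NAME.
       print substr($0, RSTART + 6, RLENGTH - 7)
     }' | sort -u)
echo "$out"
```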
This might work for you (GNU sed):
sed -E '/.*\/wild\/[^A-Z ]*([A-Z]+).*/!d # delete lines with no uppercase letters
s//\1/ # reduce the line to the captured uppercase name
H # append word to the hold space
$!d # delete all lines but the last
x # swap to the hold space
:a # label for the loop
s/((\n[^\n]+).*)\2/\1/ # remove duplicates
ta # repeat until failure
s/.//' file # remove introduced newline
GNU awk to get the result:
grep 'wild' animal.txt | awk '
($0 = $7) {gsub(/\//, " ", $0)}          # keep only $7, replace "/" with spaces so $0 re-splits into $1, $2, $3
(NF == 3 && length($2) > 2) {print $2}   # exactly three parts, and the middle one longer than 2 characters
' | sort -u
Answer:
grep 'wild' animal.txt | awk '
($0 = $7) {gsub(/\//, " ", $0)};
(NF == 3 && length($2) > 2) {print $2}' | sort -u

Retrieving User Agent from Apache/Nginx Access.log

I have the command below which prints out hits, host IP (local server/load balancer) and external IP (the one causing the hit). I would also like to print the User-Agent information alongside the information given. How can this be achieved, please?
cat access.log | sed -e 's/^\([[:digit:]\.]*\).*"\(.*\)"$/\1 \2/' | sort -n | uniq -c | sort -nr | head -20
What I get is below...
Hits, Host IP, External IP
What I'd like if possible...
Hits, IP (host example), External IP (causing the hit), User Agent
10000 192.168.1.1 148.285.xx.xx Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/98 Safari/537.4
Attached below is an excerpt from the log
192.168.xxx.x - - [10/Jun/2019:12:40:15 +0100] "GET /company-publications/152005 HTTP/1.1" 200 55848 "google.com" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080219 Firefox/2.0.0.12 Navigator/9.0.0.6" "xx.xx.xx.xx"
If GNU AWK (gawk) is available, please try the following:
awk -v FPAT='(\"[^"]+\")|(\\[[^]]+])|([^ ]+)' '
{ gsub("\"", "", $9); gsub("\"", "", $10); print $1 " " $10 " " $9 }
' access.log | sort -n | uniq -c | sort -nr | head -20
The value of FPAT represents the regex for each field in access.log, that is: a string surrounded by double quotes, a string surrounded by square brackets, or a whitespace-separated string.
Then you can split each line of access.log into fields: $1 for host IP,
$10 for external IP, and $9 for user agent.
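If gawk's FPAT is unavailable, a plain-awk sketch that splits on double quotes reaches the same fields. This assumes the exact shape of the excerpt above: the user agent is the third quoted string and the external IP the last one (the sample line is inlined for illustration).

```shell
out=$(printf '%s\n' '192.168.xxx.x - - [10/Jun/2019:12:40:15 +0100] "GET /company-publications/152005 HTTP/1.1" 200 55848 "google.com" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080219 Firefox/2.0.0.12 Navigator/9.0.0.6" "xx.xx.xx.xx"' |
awk -F'"' '{
  split($1, a, " ")          # host IP is the first word before any quote
  print a[1], $(NF-1), $6    # host, external IP (last quoted), user agent (3rd quoted)
}')
echo "$out"
```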

parse httpd log in bash

my httpd log has the following format
123.251.0.000 - - [05/Sep/2014:18:19:24 -0700] "GET /myapp/MyService?param1=value1&param2=value2&param3=value3 HTTP/1.1" 200 15138 "-" "-"
I need to extract the following fields and display on a line:
IP value1 httpResponseCode(eg.200), dataLength
what's the most efficient way to do this in bash?
As you're using Linux, chances are that you also have GNU awk installed. If so:
$ awk 'match($7, /param1=([^& ]*)/, m) { print $1, m[1], $9",", $10 }' http.log
gives:
123.251.0.000 value1 200, 15138
This works as long as value1 hasn't got an ampersand or space in it, which it shouldn't if the request has been escaped correctly.
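Without gawk's three-argument match(), two sub() calls on $7 give the same result in any POSIX awk (a sketch; it assumes param1 appears in the query string as in the sample line, which is inlined here):

```shell
out=$(printf '%s\n' '123.251.0.000 - - [05/Sep/2014:18:19:24 -0700] "GET /myapp/MyService?param1=value1&param2=value2&param3=value3 HTTP/1.1" 200 15138 "-" "-"' |
awk '{
  sub(/.*param1=/, "", $7)   # drop everything up to and including param1=
  sub(/&.*/, "", $7)         # drop the remaining query parameters
  print $1, $7, $9",", $10   # IP, value1, status code, data length
}')
echo "$out"
```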
$ cat tmp.txt
123.251.0.000 - - [05/Sep/2014:18:19:24 -0700] "GET /myapp/MyService?param1=value1&param2=value2&param3=value3 HTTP/1.1" 200 15138 "-" "-"
$ awk '{ print "IP", $1, $9, $10 }' tmp.txt
IP 123.251.0.000 200 15138

awk, from last column to \r\n (CRNL)

I have a file with lines like so:
Internet Protocol Version 4, Src: 192.168.0.29 (192.168.0.29), Dst: www.l.google.com (64.233.187.104)
Time to live: 128
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n
If I use $NF I end up with:
rv:1.7.5)
but I want:
Firefox/1.0
I want to make my script, below, do that:
awk '
/ User-Agent/{brow=$NF}
END{
print brow;
}
'
Any suggestions would be appreciated!
Full script: (fixed)
#!/bin/bash
echo $1;
awk '/ User-Agent/{print}' $1 > useragents_$1;
echo '----------------------------------------------------' >> useragents_$1;
sort useragents_$1 | uniq >> useragents_$1;
awk '
/Internet Protocol Version 4, Src:/{ip=$(NF-4)}
/ Time to live/{ttl++}
/ Time to live/{sttl=$NF}
/ User-Agent/{os=$(NF-6)" "$(NF-5)}
/ User-Agent/{brow=$NF}
/ User-Agent/{agent++}
/ User-Agent/{stringtemp=sttl"\t"ip"\t"os"\t"brow}
/Windows/{windows++}
/Linux/{linux++}
/Solaris/{solaris++}
END{
sub(/\\r.*$/, "", brow);
print "TTL\tIP\t\tOS\t\tBROWSER";
print stringtemp;
print "\nSUMMARY";
print "\tttl\t=\t"ttl; print "\twindows\t=\t"windows;
print "\tlinux\t=\t"linux; print "\tsolaris\t=\t"solaris;
print "\tagent\t=\t"agent
}
' $1 > useragents_$1;
more useragents_$1;
Output:
examplehttppacket.txt
TTL IP OS BROWSER
128 192.168.0.29 Windows NT Firefox/1.0\r\n
SUMMARY
ttl = 1
windows = 3
linux =
solaris =
agent = 1
Thanks for all your help, everybody; looks like it was mostly a text-file problem!
This awk should work:
awk '/User-Agent/{brow=$NF} END{sub(/\\r.*$/, "", brow); print brow;}' file
If I assume that your sample script has a typo (i.e., that you mean /User-Agent/, with no leading spaces), then given this input file:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
And this script:
awk '
/User-Agent/{brow=$NF}
END{
print brow;
}
'
Then I get this output:
Firefox/1.0
Which seems to be exactly what you want. If you're seeing different behavior, please update your question with information about your operating system and an example of actual input and actual output that demonstrates the problem.
awk '/User-Agent/{brow=$NF}; END{print brow;}' file_name
Works fine.
I guess the first thing to try is to remove the \r chars
awk '
{gsub(/^M/, "", $0)}
/ User-Agent/{brow=$NF}
END{
print brow;
}' file
If using the vi(m) editor, enter the Ctrl-M (^M above) as one character using vim's escape-character feature: press Ctrl-V and then Ctrl-M.
IHTH
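An alternative to editing the file is stripping the carriage returns in the pipeline with tr before awk sees them (a sketch on a synthetic line):

```shell
# The \r before \n is what makes $NF carry a stray CR on DOS-style files.
out=$(printf 'User-Agent: Mozilla/5.0 (Windows; U) Gecko/20041107 Firefox/1.0\r\n' |
  tr -d '\r' |
  awk '/User-Agent/ {brow = $NF} END {print brow}')
echo "$out"
```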

Find text in files and get the needed content

I have many access_log files. This is a line from a file of them.
access_log.20111215:111.222.333.13 - - [15/Dec/2011:05:25:00 +0900] "GET /index.php?uid=01O9m5s23O0p&p=nutty&a=check_promotion&guid=ON HTTP/1.1" 302 - "http://xxx.com/index.php?uid=xxx&p=mypage&a=index&sid=&fid=&rand=1681" "Something/2.0 qqq(xxx;yyy;zzz)" "-" "-" 0
How do I extract the uid "01O9m5s23O0p" from the lines that contain the occurrence of "p=nutty&a=check_promotion" and write it to a new file?
For example, The "output.txt" file should be:
01O9m5s23O0p
01O9m5s0999p
01O9m5s3249p
fFDSFewrew23
SOMETHINGzzz
...
I tried the:
grep "p=nutty&a=check_promotion" access* > using_grep.out
and
fgrep -o "p=nutty&a=check_promotion" access* > using_fgrep.out
but it prints the whole line; I just want the uid.
Summary:
1) Find the lines which have "p=nutty&a=check_promotion"
2) Extract uid from those lines.
3) Print them to a file.
Do exactly that, in three stages:
(formatted to avoid the scroll)
grep 'p=nutty&a=check_promotion' access* \
| grep -o '[[:alnum:]]\{4\}m5s[[:alnum:]]\{4\}p' \
> output.txt
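A single sed pass can also do the match and the extraction together (a sketch; it assumes uid= immediately precedes p=nutty&a=check_promotion, as in the sample line, which is inlined here):

```shell
# -n + p prints only lines where the substitution matched,
# so non-matching lines are filtered out in the same pass.
out=$(printf '%s\n' 'access_log.20111215:111.222.333.13 - - [15/Dec/2011:05:25:00 +0900] "GET /index.php?uid=01O9m5s23O0p&p=nutty&a=check_promotion&guid=ON HTTP/1.1" 302 - "http://xxx.com/index.php?uid=xxx&p=mypage&a=index&sid=&fid=&rand=1681" "Something/2.0 qqq(xxx;yyy;zzz)" "-" "-" 0' |
  sed -n 's/.*uid=\([^&]*\)&p=nutty&a=check_promotion.*/\1/p')
echo "$out"
```

Redirect to output.txt when running over the real access* files.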
If the lines containing p=nutty&a=check_promotion all share this shape, we can set the field delimiters and use awk to extract the uid and write it to a file.
awk -v FS="[?&=]" '
$0~/p=nutty&a=check_promotion/{ print $3 > "output_file"}' input_file
Test:
[jaypal:~/Temp] cat file
access_log.20111215:210.136.161.13 - - [15/Dec/2011:05:25:00 +0900] "GET /index.php?uid=01O9m5s23O0p&p=nutty&a=check_promotion&guid=ON HTTP/1.1" 302 - "http://xxx.com/index.php?uid=xxx&p=mypage&a=index&sid=&fid=&rand=1681" "Something/2.0 qqq(xxx;yyy;zzz)" "-" "-" 0
[jaypal:~/Temp] awk -v FS="[?&=]" '
$0~/p=nutty&a=check_promotion/{ print $3 > "output_file"}' input_file
[jaypal:~/Temp] cat output_file
01O9m5s23O0p
