I have a file with lines like so:
Internet Protocol Version 4, Src: 192.168.0.29 (192.168.0.29), Dst: www.l.google.com (64.233.187.104)
Time to live: 128
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n
If I use $NF, I end up with:
rv:1.7.5)
but I want:
Firefox/1.0
I want the script below to do that:
awk '
/ User-Agent/{brow=$NF}
END{
print brow;
}
'
Any suggestions would be appreciated!
Full script: (fixed)
#!/bin/bash
echo $1;
awk '/ User-Agent/{print}' $1 > useragents_$1;
echo '----------------------------------------------------' >> useragents_$1;
sort useragents_$1 | uniq >> useragents_$1;
awk '
/Internet Protocol Version 4, Src:/{ip=$(NF-4)}
/ Time to live/{ttl++}
/ Time to live/{sttl=$NF}
/ User-Agent/{os=$(NF-6)" "$(NF-5)}
/ User-Agent/{brow=$NF}
/ User-Agent/{agent++}
/ User-Agent/{stringtemp=sttl"\t"ip"\t"os"\t"brow}
/Windows/{windows++}
/Linux/{linux++}
/Solaris/{solaris++}
END{
sub(/\\r.*$/, "", brow);
print "TTL\tIP\t\tOS\t\tBROWSER";
print stringtemp;
print "\nSUMMARY";
print "\tttl\t=\t"ttl; print "\twindows\t=\t"windows;
print "\tlinux\t=\t"linux; print "\tsolaris\t=\t"solaris;
print "\tagent\t=\t"agent
}
' $1 > useragents_$1;
more useragents_$1;
Output:
examplehttppacket.txt
TTL IP OS BROWSER
128 192.168.0.29 Windows NT Firefox/1.0\r\n
SUMMARY
ttl = 1
windows = 3
linux =
solaris =
agent = 1
Thanks for all your help everybody, looks like it was mostly a text file problem!
This awk should work:
awk '/User-Agent/{brow=$NF} END{sub(/\\r.*$/, "", brow); print brow;}' file
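As a rough sketch of why that sub() is there: assuming the capture file holds the two-character text \r (a backslash followed by r) rather than a real carriage return, as the posted output suggests, the last field still carries that suffix. The sample line and the variable names line and brow are made up for illustration; the extra gsub() is an assumption on my part to also cover real carriage returns from a DOS-format file.

```shell
# Sample line ending in the literal text \r\n, as in a packet dump exported to text
line='User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n'

brow=$(printf '%s\n' "$line" | awk '
/User-Agent/ { b = $NF }        # last whitespace-separated field
END {
    sub(/\\r.*$/, "", b)        # drop a literal backslash-r and everything after it
    gsub(/\r/, "", b)           # drop real carriage returns, if any
    print b
}')

echo "$brow"
```

If $NF still gives something other than the browser token after this, the line is probably wrapped or split differently in the real file, so it is worth inspecting it with od -c or cat -A.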
If I assume that your sample script has a typo (i.e., that you mean /User-Agent/, with no leading spaces), then given this input file:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
And this script:
awk '
/User-Agent/{brow=$NF}
END{
print brow;
}
'
Then I get this output:
Firefox/1.0
Which seems to be exactly what you want. If you're seeing different behavior, please update your question with information about your operating system and an example of actual input and actual output that demonstrates the problem.
awk '/User-Agent/{brow=$NF}; END{print brow;}' file_name
Works fine.
I guess the first thing to try is to remove the \r chars
awk '
{gsub(/^M/, "", $0)}
/ User-Agent/{brow=$NF}
END{
print brow;
}' file
The ^M above stands for a single carriage-return character, not a caret followed by M. In vi(m), type it by pressing Ctrl-V and then Ctrl-M.
IHTH
Related
I have the input file (myfile) as:
/data/152.18224487:2,S/proforma invoice.doc
/data/152.916612:2,/proforma invoice.doc
/data/152.48152834/Bank T.T Copy 12 d3d.doc
/data/155071755/Bank T.T Copy.doc
/data/1521/Quotation Request.doc
/data/15.462/Quotation Request 2ds.doc
/data/15.22649962_test4/Quotation Request 33 zz (.doc
/data/15.226462_test6/Quotation Request.doc
and I need to remove everything from the last "/" to the end of each row, to get this output:
/data/152.18224487:2,S
/data/152.916612:2,
/data/152.48152834
/data/155071755
/data/1521
/data/15.462
/data/15.22649962_test4
/data/15.226462_test6
How can I do this from the Linux command line?
This is a follow-up question related to extract last section of data from file using linux command
Could you please try the following.
awk 'match($0,/\/.*\//){print substr($0,RSTART,RLENGTH-1)}' Input_file
The above matches from the first / up to the last occurrence of /. If lines in your Input_file can start with something other than /, try the following instead.
awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}' Input_file
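To make the role of RSTART and RLENGTH a bit more concrete, here is a small sketch on one of the sample paths (the variable name out is just for illustration). match() sets RSTART to the position where the match begins and RLENGTH to its length; because .* is greedy, the match runs up to the last /, and RLENGTH-1 leaves that slash out.

```shell
out=$(printf '%s\n' '/data/1521/Quotation Request.doc' |
    awk 'match($0, /.*\//) { print RSTART, RLENGTH, substr($0, RSTART, RLENGTH - 1) }')

echo "$out"    # prints: 1 11 /data/1521
```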
This one combines this with your previous question, i.e. with data like:
>> Vi 'x' found in file /data/152.916612:2,/proforma invoice.doc
>> Vi 'x' found in file /data/152.48152834/Bank T.T Copy 12 d3d.doc
>> Vi 'x' found in file /data/155071755/Bank T.T Copy.doc
...
awk:
$ awk '
(s=match($0,/found in file /)+RLENGTH) && (match(substr($0,s),/.*\//)) {
print substr($0,s,RLENGTH-1)
}' file
Output:
/data/152.18224487:2,S
/data/152.916612:2,
/data/152.48152834
...
Try
sed 's:/[^/]*$::' < inputfile > outputfile
You stated in a comment elsewhere that you also need only the rest after the last slash, so here we go:
sed 's:^.*/::' < inputfile > outputfile
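A quick way to check both expressions on a single path from the question (the variable names path, dir and base are only for this sketch). The first sed removes the last / and whatever follows it; the second removes everything up to and including the last /.

```shell
path='/data/152.48152834/Bank T.T Copy 12 d3d.doc'

dir=$(printf '%s\n' "$path" | sed 's:/[^/]*$::')    # directory part
base=$(printf '%s\n' "$path" | sed 's:^.*/::')      # file name part

echo "$dir"     # /data/152.48152834
echo "$base"    # Bank T.T Copy 12 d3d.doc
```

Using : as the s delimiter avoids having to escape the slashes in the pattern.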
awk -F/ '{print "/"$1$2"/"$3}' file
/data/152.18224487:2,S
/data/152.916612:2,
/data/152.48152834
/data/155071755
/data/1521
/data/15.462
/data/15.22649962_test4
/data/15.226462_test6
Hoping someone can help me with a Linux bash script that generates a report from HTTP logs.
Logs format:
domain.com 101.100.144.34 - r.c.bob [14/Feb/2017:11:31:20 +1100] "POST /webmail/json HTTP/1.1" 200 1883 "https://example.domain.com/webmail/index-rui.jsp?v=1479958955287" "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko" 1588 2566 "110.100.34.39" 9FC1CC8A6735D43EF75892667C08F9CE 84670 - - - -
Required output:
time in epoch,host,Resp Code,count
1485129842,101.100.144.34,200,4000
1485129842,101.101.144.34,404,1889
This is what I have so far, but it is nowhere near what I am trying to achieve:
tail -100 httpd_access_*.log | awk '{print $5 " " $2 " " $10}' | sort | uniq
awk 'BEGIN{
# print header
print "time in epoch,host,Resp Code,count"
# prepare month conversion array
split( "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", tmp)
for (i in tmp) M[tmp[i]]=i
}
{
# prepare time conversion for mktime() using the month array
# from [14/Feb/2017:11:31:20
# to "YYYY MM DD HH MM SS"
# splitting on "[", "/" and ":" leaves aT[1] empty (the text before "[")
split( $5, aT, "[[/:]")
t = aT[4] " " M[aT[3]] " " aT[2] " " aT[5] " " aT[6] " " aT[7]
# count per (epoch, host, response code); mktime() uses the local time zone and ignores the +1100 offset
Count[ sprintf( "%s, %s, %s", mktime( t), $2, $10)]++
}
END{
# display the counted results
for( e in Count) printf( "%s, %d\n", e, Count[e])
}
' httpd_access_*.log
The counting criterion needs to be described more precisely (with a per-second timestamp in the key, most counts will be small).
This needs GNU awk for the mktime() function.
It assumes the time is always in this format.
There is no input validation or filtering (that is not the purpose here).
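Since the counting criterion is the unclear part, here is a minimal sketch of the counting step on its own, assuming the key is just host plus response code and leaving the epoch conversion out (so it runs with any awk, not only gawk). The three log lines are shortened, invented samples in the same field layout as the question, and the names log, counts, n and k are only for this sketch.

```shell
log='domain.com 101.100.144.34 - r.c.bob [14/Feb/2017:11:31:20 +1100] "POST /webmail/json HTTP/1.1" 200 1883
domain.com 101.100.144.34 - r.c.bob [14/Feb/2017:11:31:21 +1100] "POST /webmail/json HTTP/1.1" 200 1883
domain.com 101.101.144.34 - r.c.bob [14/Feb/2017:11:31:22 +1100] "GET /missing HTTP/1.1" 404 512'

# $2 is the host, $10 the response code; for-in order is unspecified, so sort the result
counts=$(printf '%s\n' "$log" |
    awk '{ n[$2 "," $10]++ } END { for (k in n) print k "," n[k] }' |
    sort)

echo "$counts"
# 101.100.144.34,200,2
# 101.101.144.34,404,1
```

Adding a time component to the key (for example the epoch rounded down to the hour) would then be a matter of prefixing it inside the n[...] subscript.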
Sure, the pure awk solution above will be much faster and more complete.
But it can also be done in smaller steps:
First get date and convert it to EPOCH:
$ dt=$(awk '{print $5,$6}' file.log)
$ ep=$(date -d "$(sed -e 's,/,-,g' -e 's,:, ,' <<<"${dt:1:-1}")" +"%s")
$ echo "$ep"
1487032280
Now that you have the epoch date in the bash variable $ep, you can continue with your initial awk like this:
$ awk -v edt=$ep '{print edt","$2","$10}' file.log
1487032280,101.100.144.34,200
If you want a header, you can print one before the last awk with a simple echo.
This is the content of file.txt:
hello bro
my nam§
is Jhon Does
The file could also contain non-printable characters (for example \x00 or \x02) and, as you can see, the lines do not all have the same length.
I want to read it 5 characters at a time, ignoring line boundaries (a line break counts as just another character). I thought of something like this using awk:
awk -v RS='' '{
s=s $0;
}END{
n=length(s);
for(x=1; x<n; x=x+5){
# Here I will put some calcs and stuff
i++;
print "line " i ": #" substr(s,x,5) "#"
}
}' file.txt
The output is the following:
line 1: #hello#
line 2: # bro
#
line 3: #my na#
line 4: #m§
is#
line 5: # Jhon#
line 6: # Does#
It works perfectly, but the input file will be very large, so the performance is important.
In short, I'm looking for something like this:
awk -v RS='.{5}' '{ # Here I will put some calcs and stuff }'
But it doesn't work.
Another alternative that works ok:
xxd -ps mifile.txt | tr -d '\n' | fold -w 10 | awk '{print "23" $0 "230a"}' | xxd -ps -r
Do you have any idea or alternative? Thank you.
I'm not sure I understand what you want but this outputs the same as the script in your question that you say works perfectly so hopefully this is it:
$ awk -v RS='.{5}' 'RT!=""{ print "line", NR ": #" RT "#" }' file
line 1: #hello#
line 2: # bro
#
line 3: #my na#
line 4: #m§
is#
line 5: # Jhon#
line 6: # Does#
The above uses GNU awk for multi-char RS and RT.
If you are okay with Python, you may try this (Python 3):
with open('filename', 'r') as f:
    # read the file 5 characters at a time until read() returns an empty string
    w = f.read(5)
    while w != '':
        print(w)
        w = f.read(5)
You can use perl and binmode assuming you are using normal characters.
use strict;
use warnings;
open my $fh, '<', 'test';
#open the file.
binmode $fh;
# Set to binary mode
$/ = \5;
#Read a record as 5 bytes
while(<$fh>){
#Read records
print "$_#"
#Do whatever calculations you want here
}
For extended character sets you can use UTF8 and read every 5 characters instead of bytes.
use strict;
use warnings;
open my $fh, '<:utf8', 'test';
#open file in utf8.
binmode(STDOUT, ":utf8");
# Set stdout to utf8 as well
while ((read($fh, my $data, 5)) != 0){
#Read 5 characters into variable data
print "$data#";
#Do whatever you want with data here
}
So you asked how to read a file every n characters instead of every line using awk.
Solution:
If you have a modern gawk implementation, use FPAT. From the gawk manual:
Normally, when using FS, gawk defines the fields as the parts of the
record that occur in between each field separator. In other words, FS
defines what a field is not, instead of what a field is. However,
there are times when you really want to define the fields by what they
are, and not by what they are not.
Code:
gawk 'BEGIN{FS="\n";RS="";FPAT=".{,5}"}
{for (i=1;i<=NF;i++){
printf("$%d = <%s>\n", i, $i)}
}' file
The following awk command prints the lines of the file from the line that contains port XNT1 up to the END OF COMMAND line:
# awk '/\/stats\/port XNT1\/if/,/END OF COMMAND/' /var/tmp/test
>> SW_02_03 - Main# /stats/port XNT1/if
------------------------------------------------------------------
Interface statistics for port XNT1:
IBP/CBP Discards: 0
L3 Discards: 0
>> SW_02_03 - Port Statistics# END OF COMMAND
#
#
#
Now I pass an external variable to awk with -v XNTF=XNT1, but for some reason XNTF does not get the value "XNT1" inside the pattern, and awk prints no lines:
# awk -v XNTF=XNT1 '/\/stats\/port XNTF\/if/,/END OF COMMAND/' /var/tmp/test
Please advise: why does awk not work when I set an external variable, and how can I fix it?
I normally try to avoid the range pattern in awk, since it is not very flexible. This should do:
awk -v XNTF=XNT1 '$0~"/stats/port " XNTF "/if" {f=1} f; /END OF COMMAND/ {f=0}' file
>> SW_02_03 - Main# /stats/port XNT1/if
------------------------------------------------------------------
Interface statistics for port XNT1:
IBP/CBP Discards: 0
L3 Discards: 0
>> SW_02_03 - Port Statistics# END OF COMMAND
Inside //, variables are not expanded. You'll have to use the ~ operator to match against an assembled regex:
awk -v XNTF=XNT1 '$0 ~ "/stats/port " XNTF "/if",/END OF COMMAND/' /var/tmp/test
Generally, $0 ~ some_string matches $0 (the line) against some_string interpreted as a regex.
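As a small sketch of that, the same range can be driven from a shell variable. The two-block input below is an invented, shortened version of the switch output, and the names input, port and out are only for illustration. Note that the value of XNTF is interpreted as part of a regex, so a port name containing regex metacharacters would need escaping (or a literal test with index() instead).

```shell
input='>> SW_02_03 - Main# /stats/port XNT1/if
Interface statistics for port XNT1:
>> SW_02_03 - Port Statistics# END OF COMMAND
>> SW_02_03 - Main# /stats/port XNT2/if
Interface statistics for port XNT2:
>> SW_02_03 - Port Statistics# END OF COMMAND'

port=XNT2

# the start of the range is a regex assembled from strings and the awk variable
out=$(printf '%s\n' "$input" |
    awk -v XNTF="$port" '$0 ~ ("/stats/port " XNTF "/if"), /END OF COMMAND/')

echo "$out"
```

With port=XNT2 only the second block (three lines) is printed; the XNT1 block is skipped because its first line never matches the assembled regex.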
I am trying to run a script on a Linux machine that was written on Solaris.
It shows warnings such as: nawk: cmd. line:7: warning: escape sequence `\<' treated as plain `<'
I can't change the version of awk. Is there any other way to get rid of this warning?
EDIT:
My awk file simply prints the XML tags inside a function.
function PrintExamHeader()
{
print "<exam"; #I have removed the \
}
Now it gives warnings at line numbers where there is no such pattern.
BEGIN { # here it's giving warning
OFS = "";
# Indexes for series structure
idx = 1;
Number = idx++;
ItDate = idx++; # and 3 more at such lines
Time = idx++;
Date = idx++;
Here's a simple example that triggers the warning (GNU awk; on some Linux systems, nawk is a symlink to GNU awk, gawk):
awk 'BEGIN { print "\<exam" }' # -> '<exam'
If your output is OK, and all you need to do is to get rid of the warning, simply remove the \:
awk 'BEGIN { print "<exam" }' # -> '<exam'
If instead you wanted to print \<exam, you'd have to double the backslash:
awk 'BEGIN { print "\\<exam" }' # -> '\<exam'
What the warning is trying to tell you is that the \ prefix is essentially a no-op in this context, and that it is not needed.