I have this command which outputs 2 columns separated by ⎟. The first column is the number of occurrences, the second is the IP address. The whole thing is sorted by ascending number of occurrences.
awk '{ips[$1]++} END {for (ip in ips) { printf "%5s %-1s %-3s\n", ips[ip], "⎟", ip}}' "${ACCESSLOG}" | sort -nk1
19 ⎟ 76.20.221.34
19 ⎟ 76.9.214.2
22 ⎟ 105.152.107.118
26 ⎟ 24.185.179.32
26 ⎟ 42.117.198.229
26 ⎟ 83.216.242.69
etc.
Now I would like to add a third column. In the bash shell, if you run, for instance:
host 72.80.99.43
you'll get:
43.99.80.72.in-addr.arpa domain name pointer pool-72-80-99-43.nycmny.fios.verizon.net.
So for every IP appearing in the list, I want to show its associated host in the third column, and I want to do that from within awk, i.e. by calling host from awk and passing it the parameter ip. Ideally, it would skip all the standard stuff and only show the hostname, like so: nycmny.fios.verizon.net.
So my final command would look like this:
awk '{ips[$1]++} END {for (ip in ips) { printf "%5s %-1s %-3s %20s\n", ips[ip], "⎟", ip, system( "host " ip )}}' "${ACCESSLOG}" | sort -nk1
Thanks
You wouldn't use system(), since you want to combine the shell command's output with your awk output. Instead, build the command as a string and read its result into a variable with getline, e.g.:
awk '{ips[$1]++}
END {
    for (ip in ips) {
        cmd = "host " ip
        if ( (cmd | getline host) <= 0 ) {
            host = "N/A"
        }
        close(cmd)
        printf "%5s %-1s %-3s %20s\n", ips[ip], "⎟", ip, host
    }
}' "${ACCESSLOG}" | sort -nk1
I assume you can figure out how to use *sub() to get just the part of the host output you care about.
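For the trimming step, a minimal sketch with sub() might look like the following. The function name strip_host_line is made up for illustration, and the sample line is the host output quoted in the question. The same two sub() calls could be applied to the host variable inside the END block instead.

```shell
# Hypothetical helper: reduce one line of `host` output to the bare name.
strip_host_line() {
    awk '{
        name = $0
        sub(/^.* domain name pointer /, "", name)  # drop everything up to the pointer target
        sub(/\.$/, "", name)                       # drop the trailing root dot, if any
        print name
    }'
}

# Sample line taken from the question.
echo '43.99.80.72.in-addr.arpa domain name pointer pool-72-80-99-43.nycmny.fios.verizon.net.' |
    strip_host_line
```

A line that does not contain " domain name pointer " passes through unchanged (apart from a trailing dot), so failed lookups would still need their own handling.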
I am trying to get last 2 values from right to left from cut command
I have a large database for about 110 Million domains and subdomains.
Like
yahoo.com
mail.yahoo.com
a.yahoo.com
a.yahoo.co.uk
In simple words I am trying to remove subdomains from domains
echo a.yahoo.aa | cut -d '.' -f 2,3
yahoo.aa
but when I try
echo yahoo.aa | cut -d '.' -f 2,3
aa
it gives me only aa
Required output is
yahoo.com
yahoo.com
yahoo.com
yahoo.co.uk
Edit: thanks anubhava for the suggestion.
a TLD property is like
xxxx.xx
xxx.xx
xx.xx
i.e. a ccTLD always has 2 characters in last.
A long solution, but I think it does what you want:
Executable file domain.awk:
#! /usr/bin/awk -f
BEGIN {
    FS="."
}
{
    ret = $NF
    if (NF >= 2 && (length($(NF - 1)) == 2 || length($(NF - 1)) == 3)) {
        ret = $(NF - 1) "." ret
        if (NF >= 3) {
            ret = $(NF - 2) "." ret
        }
    } else if (NF >= 2) {
        ret = $(NF - 1) "." ret
    }
    print ret
}
with domains.lst file:
yahoo.com
mail.yahoo.com
a.yahoo.com
a.yahoo.co.uk
aus.co.au
Used like that:
./domain.awk domains.lst
Output:
yahoo.com
yahoo.com
yahoo.com
yahoo.co.uk
aus.co.au
Using the sample input you provided, and accepting your statement that "a ccTLD always has 2 characters in last" as your criterion for printing the last 3 instead of the last 2 segments of the input:
Using GNU grep for -o:
$ grep -Eo '[^.]+\.[^.]+(\.[^.]{2})?$' file
yahoo.com
yahoo.com
yahoo.com
yahoo.co.uk
or using any awk:
$ awk 'match($0,/[^.]+\.[^.]+(\.[^.]{2})?$/){print substr($0,RSTART)}' file
yahoo.com
yahoo.com
yahoo.com
yahoo.co.uk
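As a hedged sketch, the same rule can be wrapped in a small shell function for reuse. The name base_domain is invented here, and the interval {2} is spelled out as [^.][^.] on the assumption that some older awk builds lack interval expressions.

```shell
# Hypothetical helper: print the last two labels of each name, or the last
# three when the final label is exactly two characters long.
base_domain() {
    awk 'match($0, /[^.]+\.[^.]+(\.[^.][^.])?$/) { print substr($0, RSTART) }'
}

# Sample input from the question.
printf '%s\n' yahoo.com mail.yahoo.com a.yahoo.com a.yahoo.co.uk | base_domain
```

The rule only looks at label lengths, so a name such as a.yahoo.de comes back unchanged: nothing in the pattern can tell a two-letter ccTLD preceded by a registered name from one preceded by a second-level suffix like co.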
Try
echo a.yahoo.aa | awk -F'.' '{print $(NF-1)"."$NF}'
You mention a "large database for about 110 Million domains and subdomains."
Because of this I suggest using sed here. Let the content of file.txt be
yahoo.com
mail.yahoo.com
a.yahoo.com
then
sed 's/^.*\.\([^.]*\.[^.]*\)$/\1/' file.txt
output
yahoo.com
yahoo.com
yahoo.com
Explanation: in a regular expression spanning the whole line (^ is the start, $ is the end) I use a single capturing group, which contains zero or more (*) non-dots, followed by a literal dot (\.), followed by zero or more non-dots adjacent to the end of the line. I replace the whole line with the content of that group. Disclaimer: this solution assumes there is always at least one dot in each line.
(tested in GNU sed 4.2.2)
You are selecting only fields 2 and 3. You need to select from field 2 up to the end:
... | cut -d '.' -f 2-
I am trying to output the size of an ARP table from a FW using an Expect script so it can be graphed. After the below code the output displayed to screen is shown:
/usr/bin/expect -f -<< EOD
spawn ssh test@1.2.3.4
sleep 1
expect {
"*word:" {send "password\r"}
}
sleep 1
expect {
"*>" {send "show arp all | match \"total ARP entries in table\"\r"}
}
sleep 1
expect {
"*>" {send "exit\r"}
}
expect eof
EOD
spawn ssh test@1.2.3.4
FW-Active
Password:
Number of failed attempts since last successful login: 0
test@FW-Active(active)> show arp all | match "total ARP entries in table"
total ARP entries in table : 2861
What I am trying to do is output only the numeric value from the "total ARP entries in table" line. I assume I need to somehow use "cut" or "awk" or something similar to extract only the number, but I am not having any luck. Any help is greatly appreciated.
You store the output of that whole command in a variable, let's say a.
Something like this will probably work. Since you're using expect, you might want to figure out how to store that output as a variable that way you can manipulate it. I stored the output as $a in my example.
$ echo $a
total ARP entries in table : 2861
$ echo ${a% *}
total ARP entries in table :
$ echo ${a% *}-
total ARP entries in table : -
$ echo ${a##* }
2861
Logic explanation (parameter expansion in bash):
1) To strip from the left-hand side, use # to remove the shortest matching prefix and ## to remove the longest matching prefix. The pattern is given as *<value> inside the { } braces.
2) To strip from the right-hand side, use % to remove the shortest matching suffix and %% to remove the longest matching suffix. The pattern is given as <value>* inside the { } braces.
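Put together, the two expansions can be sketched as follows. The variable names a, count and label are only illustrative; the string is the sample line from the question.

```shell
# Sample line from the question, stored in a variable.
a='total ARP entries in table : 2861'

count=${a##* }   # remove the longest prefix ending in a space, leaving the number
label=${a% *}    # remove the shortest suffix starting with a space, leaving the label

echo "$count"
echo "$label"
```

Quoting the expansions in echo keeps the shell from re-splitting the result, which matters once the captured text contains more than single spaces.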
Or if you don't want to store the output or anything, then simply do this:
show arp all | match "total ARP entries in table" | grep -o "[0-9][0-9]*"
Or (the following assumes that you don't change
show arp all | match "total ARP entries in table" | sed "s/ *//g"|cut -d':' -f2
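If the filtering is done on the local machine against the captured session output instead, a sketch along these lines may help. The function name arp_count is hypothetical, and the sketch assumes the count is the last field of the "total ARP entries in table :" line, possibly followed by a carriage return from the terminal session.

```shell
# Hypothetical helper: read a captured session on stdin and print only the
# ARP entry count. The pattern requires the colon after "table", so the
# echoed command line (which ends in a double quote) is skipped.
arp_count() {
    awk '/total ARP entries in table *:/ {
        n = $NF
        gsub(/[^0-9]/, "", n)   # strip a trailing carriage return or other non-digits
        print n
    }'
}
```

The output of the expect script in the question could then be piped into arp_count, so that only the number reaches the graphing tool.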
Does anyone know of a way to replace blanks with 0's? Here's what I'm trying to do...
Basically I have a script that pulls an IP address and manipulates the address to make a port number out of it.
192.168.202.3 = Port 23
What I need is a sed command smart enough to add two 0's in front of the 3, making it a full value.
192.168.202.3 = Port 2003
or:
192.168.202.003 = Port 2003
The catch is, if the last octet already has three digits then I don't want it to add 0's.
192.168.202.254 = Port 2254
instead of:
192.168.202.254 = Port 200254
Any ideas on how to do it?
Relevant Portion of the script:
# Retrieve local-ipv4 address from meta-data
GET http://169.254.169.254/latest/meta-data/local-ipv4 > /metadata
# Create a manipulated version of ipv4 to use as a port number
sed "s/192.168.20//" /metadata > /metaport
sed -i "s/\.//g" /metaport
If you have another way without using sed, I'm open to those suggestions as well!
Thanks!
I would prefer using awk for number manipulation rather than sed
awk -F'.' '{printf "%03s%03s\n", $3, $4}' /metadata | cut -c3-6 > /metaport
Input IP:
192.168.202.3
192.168.202.23
192.168.202.254
Output Port:
2003
2023
2254
EDIT
More concise awk only solution avoiding need of cut (Suggested by Jonathan Leffler)
awk -F'.' '{printf "%d%03d\n", $3 % 10, $4}' /metadata > /metaport
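As a quick check of that formula, here is a minimal sketch that wraps it in a shell function. The name ip_to_port is invented for illustration; the awk program is the one shown above.

```shell
# Hypothetical helper: last digit of the third octet, then the fourth octet
# zero-padded to three digits.
ip_to_port() {
    printf '%s\n' "$1" | awk -F'.' '{ printf "%d%03d\n", $3 % 10, $4 }'
}

ip_to_port 192.168.202.3     # prints 2003
ip_to_port 192.168.202.254   # prints 2254
```

Because awk reads the fields as decimal numbers, an already padded octet such as 003 gives the same port as 3.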
If the input file contains only an IP address, then brute force and ignorance can do the job:
sed -e 's/\([0-9]\)\.\([0-9]\)$/& = Port \100\2/' \
-e 's/\([0-9]\)\.\([0-9][0-9]\)$/& = Port \10\2/' \
-e 's/\([0-9]\)\.\([0-9][0-9][0-9]\)$/& = Port \1\2/'
The first expression deals with 1 digit; the second with 2 digits; the third with 3.
Given input data:
192.168.202.3
192.168.203.13
192.168.202.003
192.168.202.254
the output is:
192.168.202.3 = Port 2003
192.168.203.13 = Port 3013
192.168.202.003 = Port 2003
192.168.202.254 = Port 2254
If you have a different input data format, you have to work harder to isolate the relevant section of the IP address, but you should really, really show what the input data looks like.
Just for fun, bash:
while IFS=. read a b c d; do
printf "%d%03d\n" $((c%10)) $d
done <<END
192.168.202.3
192.168.202.003
192.168.209.123
127.0.0.1
END
2003
2003
9123
0001
Given the description (only insert two zeros when the port has just 2 digits), the following should work:
sed -r '/Port [0123456789]{2}$/s/Port (.)/Port \100/'
So this only matches when Port is followed by 2 digits. If it does match, replace the first digit with that digit and two zeros.
If you need to handle 3 digits, another match section that does just 3 digits could be trivially added.
My target is to match IP addresses whose first three octets match exactly, while the fourth octet must be a valid octet, between 0 and 255.
For example, I have the following IPs in a file:
$ more file
192.9.200.10
192.9.200.100
192.9.200.1555
192.9.200.1
192.9.200.aaa
192.9.200.#
192.9.200.:
192.9.200
192.9.200.
I need to match the first three octets (192.9.200), while the fourth octet must be valid (0-255).
So the expected result is:
192.9.200.10
192.9.200.100
192.9.200.1
the basic syntax should be as the following:
IP_ADDRESS_THREE_OCTETS=192.9.200
cat file | grep -x $IP_ADDRESS_THREE_OCTETS.[ grep Regular Expression syntax ]
Please advise how to write the right grep regular expression for the fourth octet, so that the first three octets match and the fourth octet is valid.
You'd need to use some high-level tools to convert the text to a regex pattern, so you might as well use just that.
perl -ne'
BEGIN { $base = shift(@ARGV); }
print if /^\Q$base\E\.([0-9]+)$/ && 0 <= $1 && $1 <= 255;
' "$IP_ADDRESS_THREE_OCTETS" file
If hardcoding the base is acceptable, that reduces to:
perl -ne'print if /^192\.9\.200\.([0-9]+)$/ && 0 <= $1 && $1 <= 255' file
Both of these snippets also accept input from STDIN.
For a full IP address:
perl -ne'
BEGIN { $ip = shift(@ARGV); }
print if /^\Q$ip\E$/;
' 1.2.3.4 file
or
perl -nle'
BEGIN { $ip = shift(@ARGV); }
print if $_ eq $ip;
' 1.2.3.4 file
Regexp is not good for comparing numbers, I'd do this with awk:
$ awk -F. '$1==192 && $2==9 && $3==200 && $4>=0 && $4<=255 && NF==4' file
192.9.200.10
192.9.200.100
192.9.200.1
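One caveat: a fourth field such as 1a is compared as a string and can slip through the range test. A slightly stricter sketch, which first checks that the fourth field is all digits, could look like this. The function name filter_octet and passing the base via -v are assumptions made for illustration.

```shell
# Hypothetical helper: print lines of the form BASE.N where N is a run of
# digits with a value in 0..255. BASE is the three-octet prefix.
filter_octet() {
    awk -F. -v base="$1" '
        BEGIN { split(base, want, ".") }
        NF == 4 && $1 == want[1] && $2 == want[2] && $3 == want[3] &&
        $4 ~ /^[0-9]+$/ && $4 + 0 <= 255
    '
}
```

It would be called as filter_octet 192.9.200 < file, printing the same three lines as above for the sample input.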
If you really want to use grep you need the -E flag for extended regexp or use egrep because you need alternation:
$ grep -Ex '192\.9\.200\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])' file
192.9.200.10
192.9.200.100
192.9.200.1
$ IP='192\.9\.200\.'
$ grep -Ex "$IP(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])" file
Note: You must escape . to mean a literal period, and quote the assignment so the shell keeps the backslashes.
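If the prefix arrives unescaped in a variable, as in the question, the dots can be escaped on the fly. A minimal sketch, assuming a hypothetical function name grep_octet and input on stdin:

```shell
# Hypothetical helper: escape the dots in a three-octet prefix, then let
# grep -Ex match whole lines of the form PREFIX.N with N in 0..255.
grep_octet() {
    base_re=$(printf '%s\n' "$1" | sed 's/\./\\./g')   # 192.9.200 becomes 192\.9\.200
    octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])'
    grep -Ex "${base_re}\.${octet}"
}
```

With the question's variable this would be run as grep_octet "$IP_ADDRESS_THREE_OCTETS" < file.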
grep -E '^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$'
This expression will not match IP addresses with leading 0s. e.g., it won't match 192.168.1.01
This expression will not match IP addresses with more than 4 octets. e.g., it won't match 192.168.1.2.3
If you really want to be certain that what you have is a valid IPv4 address, you can always check the return value of inet_aton() (part of Perl's Socket core module).
I am trying to grep a log file for entries within the last 24 hours. I came up with the following command:
grep "$(date +%F\ '%k')"\|"$(date +%F --date='yesterday')\ [$(date +%k)-23]" /path/to/log/file
I know regular expressions can be used in grep, but I am not very familiar with regex. As you can see, I am grepping for anything from today, or anything from yesterday at the current hour or later. This isn't working, and I am guessing it is due to the way I am trying to pass a command as a variable in the grep regex. I also wouldn't be opposed to using awk. With awk I came up with the following, but it is not checking the variables properly:
t=$(date +%F) | y=$(date +%F --date='yesterday') | hr=$(date +%k) | awk '{ if ($1=$t || $1=$y && $2>=$hr) { print $0 }}' /path/to/log/file
I would assume systime could be used with awk rather than setting variables, but I am not familiar with systime at all. Any suggestions with either command would be greatly appreciated! Oh, and here's the log formatting:
2012-12-26 16:33:16 SMTP connection from [127.0.0.1]:46864 (TCP/IP connection count = 1)
2012-12-26 16:33:16 SMTP connection from (localhost) [127.0.0.1]:46864 closed by QUIT
2012-12-26 16:38:19 SMTP connection from [127.0.0.1]:48451 (TCP/IP connection count = 1)
2012-12-26 16:38:21 SMTP connection from [127.0.0.1]:48451 closed by QUIT
2012-12-26 16:38:21 SMTP connection from [127.0.0.1]:48860 (TCP/IP connection count = 1)
Here's one way using GNU awk. Run like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
time = systime()
}
{
spec = $1 " " $2
gsub(/[-:]/, " ", spec)
}
time - mktime(spec) < 86400
Alternatively, here's the one-liner:
awk 'BEGIN { t = systime() } { s = $1 " " $2; gsub(/[-:]/, " ", s) } t - mktime(s) < 86400' file
Also, the correct way to pass shell vars to awk is to use the -v flag. I've made a few adjustments to your awk command to show you what I mean, but I recommend against doing this:
awk -v t="$(date +%F)" -v y="$(date +%F --date='yesterday')" -v hr="$(date +%k)" '$1==t || $1==y && $2>=hr' file
Explanation:
Before awk starts processing the file, the BEGIN block is run. In this block we create a variable called time / t, which is set using the systime() function. systime() simply returns the current time as the number of seconds since the system epoch. Then, for every line in your log file, awk creates another variable called spec / s, which is set to the first and second fields separated by a single space. Additionally, characters like - and : need to be globally substituted with spaces for the mktime() function to work correctly, and this is done using gsub(). Then it's just a little arithmetic to test whether the datetime in the log line is within the last 24 hours (that is, less than 86400 seconds old). If the test is true, the line is printed. Maybe a little extra reading would help; see Time Functions and String Manipulation Functions. HTH.
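Where GNU awk's time functions are not available, a different technique can give a similar result: because the log timestamps are in "YYYY-MM-DD HH:MM:SS" form, they sort as plain strings, so a cutoff computed once in the shell can be compared textually. This is a sketch of that alternative, not the method above; the function name lines_since is invented, and the usage line assumes GNU date.

```shell
# Hypothetical helper: keep log lines whose leading "YYYY-MM-DD HH:MM:SS"
# timestamp is at or after the cutoff string given as the first argument.
lines_since() {
    awk -v cutoff="$1" '($1 " " $2) >= cutoff'
}

# Usage sketch (GNU date assumed for -d):
#   lines_since "$(date -d '24 hours ago' '+%F %T')" < /path/to/log/file
```

The comparison is a string comparison, so it relies on every line starting with a zero-padded timestamp in exactly this format.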