Extract parent domain names from a list of URLs with Bash shell scripting - linux

I have a list of urls like this:
http://noto.zrobimystrone.pl/pucenter/images/NGdocs/
http://visionwebmkt.com/unsubscribe.php?M=879552&C=b744d324e38f5f3b0bcf549f1d57a3ab&L=20&N=497
http://www.meguiatramandai.com.br/unsubscribe.php?M=722&C=8410431be55bf12faac13d18982d71cd&L=1&N=3
http://www.contatoruy.in/link.php?M=86457&N=4&L=1&F=H
http://www.maxxivrimoveis.com.br/
http://www.meguiatramandai.com.br/unsubscribe.php?M=722&C=8410431be55bf12faac13d18982d71cd&L=1&N=2
http://arm.smilecire.com/ch+urch38146263923bpa.stor/imp-roved258021029his+health212149011
http://hurl.zonalrems.com/ge.tyo-ur584372780599hea+lth247408058un/der+control21211901
http://harp.doomyjupe.com/see.this-better/life+58291551346csexdrive663295668+better/how.981692016
http://beefy.toneyvaws.com/no+tice/how/35306640b+see/app=5429204last/attempt=457943182
http://kirk.yournjuju.com/shop/sam.sclub-win=ter/58387369768esame+673844946.bett.er-loo.k981686408
http://idly.theirpoem.com/veri-fy/notice-7853508818b2glob/al=who.43639603inc.lusion-610549278
http://wva188.suleacatan.com/credit-score/review/-551694841511001sfdghsfdgsdfg63887839
http://cop.forterins.com/app.lyto=face962540097dtolo+oko.ung268570307yo.un-ger8752507
http://vni116.gaelsyaray.com/qertqetert//-dghjghjghd5531864856415612229498430
http://ticket.prategama.com/shop/sam.sclub-win=ter/752490935same+226373195.bett.er-loo.k212801
http://cbu125.quetxviii.com/cvbnvbn7551116db537203--swrtytry664896546
http://c5a.dicadodia.com.br/pass4sp09/NetAffProTeste-1.html
http://snub.woadsbevy.com/ama/zing-753773417oppe-tun/ity+217801.is-here/now=236922473
http://mkt.livrariacultura.com.br/pub/cc?_ri_=X0Gzc2X%3DWQpglLjHJlYQGgzfB7tPi0PuyyJ71ES
I want to extract only the parent domain names, for example:
http://noto.zrobimystrone.pl/pucenter/images/NGdocs/
http://visionwebmkt.com/unsubscribe.php?M=879552&C=b744d324e38f5f3b0bcf549f1d57a3ab&L=20&N=497
http://www.meguiatramandai.com.br/unsubscribe.php?M=722&C=8410431be55bf12faac13d18
Into
zrobimystrone.pl
visionwebmkt.com
meguiatramandai.com.br
I have tried
awk '{gsub("http://|/.*","")}1' list.txt
and got the following results:
noto.zrobimystrone.pl
visionwebmkt.com
www.meguiatramandai.com.br
www.contatoruy.in
www.maxxivrimoveis.com.br
www.meguiatramandai.com.br
arm.smilecire.com
hurl.zonalrems.com
harp.doomyjupe.com
beefy.toneyvaws.com
but I don't know how to get only the parent name from noto.zrobimystrone.pl, for instance.

Using awk
awk -F \/ '{l=split($3,a,"."); print (a[l-1]=="com"?a[l-2] OFS:X) a[l-1] OFS a[l]}' OFS="." file|sort -u
contatoruy.in
dicadodia.com.br
doomyjupe.com
forterins.com
gaelsyaray.com
livrariacultura.com.br
maxxivrimoveis.com.br
meguiatramandai.com.br
prategama.com
quetxviii.com
smilecire.com
suleacatan.com
theirpoem.com
toneyvaws.com
visionwebmkt.com
woadsbevy.com
yournjuju.com
zonalrems.com
zrobimystrone.pl

You can use this awk:
awk -F'.' '{gsub("http://|/.*","")} NF>2{$1="";$0=substr($0, 2)}1' OFS='.' list.txt
zrobimystrone.pl
visionwebmkt.com
meguiatramandai.com.br
contatoruy.in
maxxivrimoveis.com.br
meguiatramandai.com.br
smilecire.com
zonalrems.com
doomyjupe.com
toneyvaws.com
yournjuju.com
theirpoem.com
suleacatan.com
forterins.com
gaelsyaray.com
prategama.com
quetxviii.com
dicadodia.com.br
woadsbevy.com
livrariacultura.com.br

A "simple" bash solution. Tested in bash shell on Solaris 11.2 x86.
#!/bin/bash
while IFS=/ read HTTP NULL FQDN PAGE
do
PARENT=${FQDN#*.}
if [[ $PARENT != *"."* ]]
then echo $FQDN
else echo $PARENT
fi
done < fileOfURLs.txt
Without the test for a remaining dot, too much of the domain could be stripped away. The if block can be reduced, so the whole script now looks like this:
#!/bin/bash
while IFS=/ read HTTP NULL FQDN PAGE
do
PARENT=${FQDN#*.}
[[ $PARENT != *"."* ]] && echo $FQDN || echo $PARENT
done < fileOfURLs.txt
The bash variable substitution is taking the contents of the variable FQDN and stripping from the left any character up to and including the first dot.
The test condition asks whether the contents of the PARENT variable do not contain a dot. If there is no dot anywhere in the value, the test evaluates to true and the original FQDN contents are displayed. If the test evaluates to false (there is still a dot in the value), the contents of PARENT are displayed.
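As a rough illustration of the expansion described above, the same logic can be wrapped in a small function. This is only a sketch: the name strip_first_label is made up for this example, and a case pattern is used instead of [[ ]] so that it also runs under plain sh.

```shell
#!/bin/bash
# strip_first_label is a hypothetical helper name, not part of the answer above.
strip_first_label() {
    # ${1#*.} removes the shortest prefix ending in a dot, i.e. the first label.
    parent=${1#*.}
    case $parent in
        *.*) printf '%s\n' "$parent" ;;  # a dot remains: the stripped name is still a domain
        *)   printf '%s\n' "$1" ;;       # no dot left: too much was stripped, keep the input
    esac
}
```

Calling strip_first_label noto.zrobimystrone.pl prints zrobimystrone.pl, while visionwebmkt.com is printed unchanged. Like the script it is based on, it removes exactly one label, so a name such as a.b.example.com would come out as b.example.com.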

I guess it depends on what you mean by parent. If by "parent" you mean the zone apex in DNS (e.g., zrobimystrone.pl), then the right way to do this is to look that up in DNS. There's a trick with DNS where you get back the enclosing zone's SOA record if you ask for the SOA for any name. So, try this:
for i in $(awk '{gsub("http://|/.*","")}1' list.txt); do dig soa $i | grep -v ^\; | grep SOA | awk '{print $1}'; done
This will give you a much more accurate list, but it runs far slower because it makes one DNS query per URL. The other answers don't take into account all the possible variations of second-level names used within TLDs, e.g., www.somecompany.org.uk, so it all depends on how accurate you need this to be.
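If DNS lookups are too slow, an offline middle ground is to keep a list of known two-label suffixes and keep one extra label when the host ends in one of them. The sketch below assumes a tiny hand-written list (TWO_LABEL_SUFFIXES) and a made-up function name (parent_domain); a real list would have to be far longer, so treat the result as an approximation.

```shell
#!/bin/bash
# Hypothetical, deliberately incomplete list of two-label suffixes (assumption).
TWO_LABEL_SUFFIXES="com.br org.uk co.uk"

parent_domain() {
    host=${1#*://}          # drop the scheme, e.g. http://
    host=${host%%/*}        # drop everything from the first slash of the path
    case $host in
        *.*.*) ;;           # three or more labels: decide below what to keep
        *) printf '%s\n' "$host"; return ;;
    esac
    last=${host##*.}        # final label, e.g. br
    rest=${host%.*}
    second=${rest##*.}      # second-to-last label, e.g. com
    rest=${rest%.*}
    third=${rest##*.}       # third-to-last label, e.g. meguiatramandai
    case " $TWO_LABEL_SUFFIXES " in
        *" $second.$last "*) printf '%s\n' "$third.$second.$last" ;;
        *)                   printf '%s\n' "$second.$last" ;;
    esac
}
```

It could be applied to the whole list with something like: while read -r url; do parent_domain "$url"; done < list.txt | sort -u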

An easy solution to get parent domain name
echo http://www.humkinar.pk | awk -F '/' '{print $3}'
www.humkinar.pk

Related

Linux Scripting with Spaces in Filenames

I am currently working with a vendor-provided software that is trying to handle sending attachment files to another script that will text-extract from the listed file. The script fails when we receive files from an outside source that contain spaces, as the vendor-supplied software does not surround the filename in quotes - meaning when the text-extraction script is run, it receives a filename that will split apart on the space and cause an error on the extractor script. The vendor-provided software is not editable by us.
This whole process is designed to be an automated transfer, so having this wrench that could be randomly thrown into the gears is an issue.
What we're trying to do, is handle the spaced name in our text extractor script, since that is the piece we have some control over. After a quick Google, it seems like changing the IFS value for the script would be the quick solution, but unfortunately, that script would take effect after the extensions have already mutilated the incoming data.
The script I'm using takes in a -e value, a -i value, and a -o value. These values are sent from the vendor supplied script, which I have no editing control over.
#!/bin/bash
usage() { echo "Usage: $0 -i input -o output -e encoding" 1>&2; exit 1; }
while getopts ":o:i:e:" o; do
case "${o}" in
i)
inputfile=${OPTARG}
;;
o)
outputfile=${OPTARG}
;;
e)
encoding=${OPTARG}
;;
*)
usage
;;
esac
done
shift $((OPTIND-1))
...
...
<Uses the inputfile, outputfile, and encoding variables>
I admit, there may be pieces to this I don't fully understand, and it could be a simple fix, but my end goal is to be able to extract -o, -i, and -e values that each contain one value, regardless of the spaces within each section. I can handle quoting inside the script once I can extract the filename value.
The script fragment that you have posted does not have any issues with spaces in the arguments.
The following, for example, does not need quoting (since it's an assignment):
inputfile=${OPTARG}
All other uses of $inputfile in the script should be double quoted.
What matters is how this script is called.
This would fail and would assign only hello to the variable inputfile:
$ ./script.sh -i hello world.txt
The string world.txt would prompt the getopts function to stop processing the command line and the script would continue with the shift (world.txt would be left in $1 afterwards).
The following would correctly assign the string hello world.txt to inputfile:
$ ./script.sh -i "hello world.txt"
as would
$ ./script.sh -i hello\ world.txt
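To make the point about calling conventions concrete, here is a minimal sketch that wraps the same getopts loop in a function so it can be called with different argument lists. The function name parse_args is made up for this example; the option letters are the ones from the question.

```shell
#!/bin/bash
# parse_args is a hypothetical wrapper around the getopts loop from the question.
parse_args() {
    OPTIND=1                       # reset so the function can be called repeatedly
    inputfile= outputfile= encoding=
    while getopts ":i:o:e:" opt; do
        case $opt in
            i) inputfile=$OPTARG ;;
            o) outputfile=$OPTARG ;;
            e) encoding=$OPTARG ;;
            *) return 1 ;;         # unknown option or missing argument
        esac
    done
}
```

With parse_args -i "hello world.txt" -o out.txt -e utf-8 the variable inputfile holds the full name including the space. With parse_args -i hello world.txt -o out.txt only hello is stored, and getopts stops at world.txt, so outputfile stays empty.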
The following script uses awk to split the arguments while including spaces in the file names. The arguments can be in any order. It does not handle multiple consecutive spaces in an argument, it collapses them to one.
#!/bin/bash
IFS=' '
str=$(printf "%s" "$*")
istr=$(echo "${str}" | awk 'BEGIN {FS="-i"} {print $2}' | awk 'BEGIN {FS="-o"} {print $1}' | awk 'BEGIN {FS="-e"} {print $1}')
estr=$(echo "${str}" | awk 'BEGIN {FS="-e"} {print $2}' | awk 'BEGIN {FS="-o"} {print $1}' | awk 'BEGIN {FS="-i"} {print $1}')
ostr=$(echo "${str}" | awk 'BEGIN {FS="-o"} {print $2}' | awk 'BEGIN {FS="-e"} {print $1}' | awk 'BEGIN {FS="-i"} {print $1}')
inputfile="${istr}"
outputfile="${ostr}"
encoding="${estr}"
# call the jar
There was an issue when calling the jar where Java threw a MalformedUrlException on a filename with a space.
So after reading through the commentary, we decided that although it may not be the right answer for every scenario, the right answer for this specific scenario was to extract the pieces manually.
Because we are building this for a pre-built script passing to it, and we aren't updating that script any time soon, we can accept with certainty that this script will always receive a -i, -o, and -e flag, and there will be spaces between them, which causes all the pieces passed in to be stored in different variables in $*.
And we can assume that the text after a flag is the response to the flag, until another flag is referenced. This leaves us 3 scenarios:
The variable contains one of the flags
The variable contains the first piece of a parameter immediately after the flag
The variable contains part 2+ of a parameter, and the space in the name was interpreted as a split, and needs to be reinserted.
One of the other issues I kept running into was trying to get string literals to equate to variables in my IF statements. To resolve that issue, I pre-stored all relevant data in array variables, so I could test $variable == $otherVariable.
Although I don't expect it to change, we also handled what to do if the three flags appear in a different order than we anticipate (our assumption was that they are listed as i, o, e, but we can't see exactly what is passed). The parameters are dumped into an array in the order they were read in, and a parallel array tracks whether the items in slots 0, 1, 2 relate to i, o, e.
The final result still has one flaw: if there is more than one consecutive space in the filename, the whitespace is trimmed before processing, and I can only account for one space. But seeing as we processed over 4000 files before encountering one with a space, I find it unlikely with the naming conventions that we would encounter something with more than one space.
At that point, we would have to be stepping in for a rare intervention anyways.
Final code change is as follows:
#!/bin/bash
IFS='|'
position=-1
ioeArray=("" "" "")
previous=""
flagArr=("-i" "-o" "-e" " ")
ioePattern=(0 1 2)
#echo "for loop:"
for i in $*; do
#printf "%s\n" "$i"
if [ "$i" == "${flagArr[0]}" ] || [ "$i" == "${flagArr[1]}" ] || [ "$i" == "${flagArr[2]}" ]; then
((position += 1));
previous=$i;
case "$i" in
"${flagArr[0]}")
ioePattern[$position]=0
;;
"${flagArr[1]}")
ioePattern[$position]=1
;;
"${flagArr[2]}")
ioePattern[$position]=2
;;
esac
continue;
fi
if [[ $previous == "-"* ]]; then
ioeArray[$position]=${ioeArray[$position]}$i;
else
ioeArray[$position]=${ioeArray[$position]}" "$i;
fi
previous=$i;
done
echo "extracting (${ioeArray[${ioePattern[0]}]}) to (${ioeArray[${ioePattern[1]}]}) with (${ioeArray[${ioePattern[2]}]}) encoding."
inputfile="${ioeArray[${ioePattern[0]}]}";
outputfile="${ioeArray[${ioePattern[1]}]}";
encoding="${ioeArray[${ioePattern[2]}]}";
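The same idea (treat every word after a flag as part of that flag's value until the next flag appears) can be sketched more compactly. This is not the code that was deployed; rejoin_args is a made-up name, and the sketch shares the limitations described above: runs of spaces have already collapsed to one, and a filename piece that is literally -i, -o or -e would be mistaken for a flag.

```shell
#!/bin/bash
# rejoin_args is a hypothetical helper: it glues split words back together per flag.
rejoin_args() {
    inputfile= outputfile= encoding= current=
    for word in "$@"; do
        case $word in
            -i|-o|-e) current=$word; continue ;;   # remember which flag we are filling
        esac
        case $current in
            # ${var:+$var } adds a separating space only when var is already non-empty
            -i) inputfile=${inputfile:+$inputfile }$word ;;
            -o) outputfile=${outputfile:+$outputfile }$word ;;
            -e) encoding=${encoding:+$encoding }$word ;;
        esac
    done
}
```

Called as rejoin_args "$@" at the top of the extractor script, it works regardless of the order in which the three flags arrive.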

How to grep string and show previous word in a Linux file

I have a file with a lot of IPs, and each IP has an ID, like this:
"id":340,"ip":"10.38.6.25"
"id":341,"ip":"10.38.6.26"
"id":345,"ip":"10.38.6.27"
"id":346,"ip":"110.38.6.27"
Before and after these IPs the file has more information; it is the output of an API call.
I need to grep for an IP and have the command show the ID, just the number, like this:
345
EDIT: More information: the IP will be different every time, so I need to pass the IP as an argument. I can't parse the IP to the syntax X/X/X/X...
Any ideas?
Since your current requirement is to get the IDs from your broken JSON file, I am re-formatting my earlier answer.
Though I do NOT recommend this solution to get the ID, a hacky way to do this would be to use grep in PCRE mode. The logic is to match the IP string together with the few characters before it. I am not sure how to extract the digits of the id alone with grep, which returns
317,"ip":"10.38.6.2"
So using process-substitution to get the value before the first , as below.
IFS="," read -r id _< <(grep -Po ".{0,4}\"ip\":\"10.38.6.2\"" file); printf "%s\n" "$id"
317
IFS="," read -r id _< <(grep -Po ".{0,4}\"ip\":\"10.38.6.3\"" file); printf "%s\n" "$id"
318
Just add the IP you need as part of the grep string.
The logic below applies only to your initial input.
Using multiple delimiters, : and , in awk, we can do something like:
awk -F'[:,]' '/10\.38\.6\.27/{print $2}' file
345
A better way would be to use match(), the function equivalent of awk's // regex syntax, so that the pattern can come from a variable of your choice. Provide the input IP you want in the following format.
input='"10\\.38\\.6\\.25"'
awk -F'[:,]' -v var="$input" '{ if ( match( $0, var )) {print $2};}' file
340
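A more robust way to avoid matching incorrect lines would be to use " as a delimiter as well and do a direct string comparison with the IP, as suggested by hek2mgl. For a direct comparison the variable must hold the plain IP, without the quotes and escaped dots used above.
awk -F'[:,"]' -v var="10.38.6.25" '$9==var{print $4}' file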
340
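Since the question asks for the IP to be passed as an argument, the exact-match variant can be wrapped in a small function. This is a sketch that assumes the line-per-record layout shown in the question; the name id_for_ip is made up, and real JSON is better handled with a JSON parser.

```shell
#!/bin/bash
# id_for_ip IP FILE  (hypothetical helper; assumes lines like "id":340,"ip":"10.38.6.25")
id_for_ip() {
    # Splitting on : , and " puts the id in field 4 and the bare IP in field 9.
    # The == comparison is a plain string match, so the dots need no escaping
    # and 10.38.6.27 does not match 110.38.6.27.
    awk -F'[:,"]' -v ip="$1" '$9 == ip { print $4 }' "$2"
}
```

Inside a script this would be called as id_for_ip "$1" file, with the IP taken from the first command-line argument.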
If you want to look up a single IP, use this:
jq ".collection|.[]|select(.ip==\"10.38.6.3\").id" data.json
If you must set IP in an argument, then write a one-liner bash script like this:
jq ".collection|.[]|select(.ip==\"$2\").id" "$1"
And call it like this:
./script data.json 10.38.6.3
grep
grep -Po ':\K\d+(?=,"ip":"xx\.xx\.xx\.xx")' file
awk -F, '/10\.38\.6\.25/ {gsub("\"","");split($1,a,":") ;print a[2]}' ip
340
or
awk -F, -v ipin="10.38.6.25" '$0 ~ ipin {gsub("\"","");split($1,a,":") ;print a[2]}' ip
$ awk -F, -v grep="10.38.6.26" '$2 ~ "\"" grep "\"" && sub(/^.*:/,"",$1) {print $1}' foo
341
Grep, SED, and AWK are inappropriate tools for JSON parsing. You either need a tool specially designed for working with JSON data (e.g. jq), or need to write a script in a language that supports JSON parsing in one way or another (examples: PHP, Perl, JavaScript).
JQ
One of the easiest ways is to use the jq tool (as mentioned in the comments to the question), e.g.:
jq '.collection[] | if .ip == "10.38.6.3" then .id else empty end' < file.json
PHP
Alternatively, you can write a simple tool in PHP, for example. PHP has a built-in JSON support.
ip-ids.php
<?php
$ip = trim($argv[1]);
$json = file_get_contents('file.json');
$json = json_decode($json, true);
foreach ($json['collection'] as $e) {
if ($e['ip'] == $ip)
echo $e['id'], PHP_EOL;
}
(sanity checks are skipped for the sake of simplicity)
Usage
php ip-ids.php '10.38.6.3'
Node.js
If you have Node installed, the following script can be used as a universal solution. You can pass any IP as the first argument, and the script will output a list of corresponding IDs.
ip-ids.js
#!/usr/bin/node
var fs = require('fs');
var ip = process.argv[2];
var json = fs.readFileSync('file.json', 'utf-8');
json = JSON.parse(json);
for (var i = 0; i < json.collection.length; i++) {
if (json.collection[i]['ip'] === ip)
console.log(json.collection[i]['id']);
}
Usage
node ip-ids.js '10.38.6.3'
or, if the executable permissions are set (chmod +x ip-ids.js):
./ip-ids.js '10.38.6.3'
Note, I have skipped sanity checks in the script for the sake of simplicity.
Conclusion
Now you can see that it is pretty easy to use jq. Scripting solutions are slightly more verbose, but not too difficult either. Both approaches are flexible. You don't have to rely on positions of sub-strings in the JSON string, or to resort to hacks that you will most likely forget after a couple of weeks. The script solutions are reliable and readable (and thus easily maintainable), as opposed to tricky AWK/GREP/SED expressions.
Original answer
This is the original answer for the case of a file in the following format (I didn't know that the input is in JSON format). Still, this solution seems to work even with the partial JSON you currently pasted into the question.
"id":340,"ip":"10.38.6.25"
"id":341,"ip":"10.38.6.26"
"id":345,"ip":"10.38.6.27"
Perl version:
perl -ne '/"id":(\d+).*"ip":"10\.38\.6\.27"/ and print "$1\n"' file
Your example is not valid JSON. In order to get valid JSON you have to add curly braces. This is done by the sed in the following example.
$ sed 's/^/{/;s/$/}/' <<EOF | jq -s 'map(select(.ip == "10.38.6.27")) | map(.id) | .[]'
> "id":340,"ip":"10.38.6.25"
> "id":341,"ip":"10.38.6.26"
> "id":345,"ip":"10.38.6.27"
> "id":346,"ip":"110.38.6.27"
> EOF
345
Normally jq processes each input object separately. With the option -s (slurp), jq reads all objects into a single array, which is what you want for a list input. The first map iterates over that array and selects only those objects with the matching ip attribute; this is the equivalent of a grep. The second map takes just the id attribute from each result, and the final .[] does the opposite of the -s option: it unpacks the array into separate values.
If you can make your json pretty and then do cat file, below command might help
cat /tmp/file|grep -B 1 "ipaddress"|grep -w id|tr ' ' '\0'|cut -d: -f2|cut -d, -f1

Country and External IP Bash script

I created a script, based on what I could find on the Internet and in some bash tutorials, that will show me my external IP and the country it's located in.
#
Script looks like this:
#!/bin/bash
wanip=$(dig +short myip.opendns.com @resolver1.opendns.com);
echo "$wanip" > /root/Documents/filewanip;
iplist="/root/Documents/filewanip"
while read IP;do
whois "$IP"
done < "$iplist" | grep "country" >geoloc
cat geoloc filewanip
rm filewanip geoloc
#
Output looks like this:
country: Holland
183.64.132.80
#
Problem is that I don't want to use files to do this as file structure obviously changes from system to system.
How can I make it in an elegant way so the check is made and stored into a variable(s) and then displayed directly into the shell?
John Connor
As I understand it, your goal is to eliminate the use of temporary files. In that case:
#!/bin/bash
wanip=$(dig +short myip.opendns.com @resolver1.opendns.com);
echo "$wanip" | while read ip; do
echo "$ip $(whois "$ip" | awk ' /[Cc]ountry/{print $2}')"
done
The above was written as if dig returns more than one address for your IP. If that is not the case, then the while loop is superfluous.
If you are only expecting one IP, then:
#!/bin/bash
ip=$(dig +short myip.opendns.com @resolver1.opendns.com);
echo "$ip $(whois "$ip" | awk ' /[Cc]ountry/{print $2}')"
Notes:
I converted IP to ip because it is best practice to use lower or mixed case names for your shell variables.
My whois returns a line with Country, not country. So, I made the search for country case insensitive.
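Because whois output differs between registries, it can help to isolate the country extraction in a small filter that can be tried on canned text without network access. The sketch below assumes the field looks like "country: XX" at the start of a line with either capitalisation; the function name country_from_whois is made up for this example.

```shell
#!/bin/bash
# country_from_whois reads whois output on stdin (hypothetical helper).
country_from_whois() {
    # Split each line on the first colon plus following spaces, compare the key
    # case-insensitively, and stop at the first match.
    awk -F': *' 'tolower($1) == "country" { print $2; exit }'
}
```

It would slot into the script above as: echo "$ip $(whois "$ip" | country_from_whois)"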

Mail output with Bash Script

I want to SSH from Host A to a few hosts (only one listed below right now) using the SSH key I generated, go to a specific file, and grep for a specific word with yesterday's date. Then I want to email this output to myself.
It is sending an email but it is giving me the command as opposed to the output from the command.
#!/bin/bash
HOST="XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXX"
DATE=$(date -d "yesterday")
INVALID=' cat /xxx/xxx/xxxxx | grep 'WORD' | sed 's/$/.\n/g' | grep "$DATE"'
COUNT=$(echo "$INVALID" | wc -c)
for x in $HOSTS
do
ssh BLA@"$x" $COUNT
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT=""
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT="$INVALID"
fi
fi
done | echo -e "$EMAILTEXT" | mail XXXXXXXXXXX.com
This isn't properly an attempt to answer your question, but I think you should be aware of some fundamental problems with your code.
INVALID=' cat /xxx/xxx/xxxxx | grep 'WORD' | sed 's/$/.\n/g' | grep "$DATE"'
This assigns a simple string to the variable INVALID. Because of quoting issues, s/$/.\n/g is not quoted at all, and will probably be mangled by the shell. (You cannot nest single quotes -- the first single-quoted string extends from the first quote to the next one, and then WORD is outside of any quotes, followed by the next single-quoted string, etc.)
If your intent is to execute this as a command at this point, you are looking for a command substitution; with the multiple layers of uselessness peeled off, perhaps something like
INVALID=$(sed -n -e '/WORD/!d' -e "/$DATE/s/$/./p" /xxx/xxx/xxxx)
which looks for a line matching WORD and $DATE and prints the match with a dot appended at the end -- I believe that's what your code boils down to, but without further insights into what this code is supposed to do, it's impossible to tell if this is what you actually need.
COUNT=$(echo "$INVALID" | wc -c)
This assigns a number to $COUNT. With your static definition of INVALID, the number will always be 62; but I guess that's not actually what you want here.
for x in $HOSTS
do
ssh BLA@"$x" $COUNT
This attempts to execute that number as a command on a number of remote hosts (except the loop is over HOSTS and the variable containing the hosts is named just HOST). This cannot possibly be useful, unless you have a battery of commands named as natural numbers which do something useful on these remote hosts; but I think it's safe to assume that that is not what is supposed to be going on here (and if it was, it would absolutely be necessary to explain this in your question).
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT=""
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT="$INVALID"
fi
fi
So EMAILTEXT is either an empty string or the value of INVALID. You assigned it to be a static string above, which is probably the source of your immediate question. But even if it was somehow assigned to a command on the local host, why do you need to visit remote hosts and execute something there? Or is your intent actually to execute the command on each remote host and obtain the output?
done | echo -e "$EMAILTEXT" | mail XXXXXXXXXXX.com
Piping into echo makes no sense at all, because it does not read its standard input. You should probably just have a newline after done; though a possibly more useful arrangement would be to have your loop produce output which we then pipe to mail.
Purely speculatively, perhaps something like the following is what you actually want.
for host in $HOSTS; do
ssh BLA@"$host" sed -n -e '/WORD/!d' -e "/$DATE/s/$/./p" /xxx/xxx/xxxx |
grep . || echo INVALID
done | mail XXXXXXXXXXX.com
If you want to check that there is strictly more than one line of output (which is what the -gt 1 suggests) then this may need to be a little bit more complicated.
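The sed filter suggested above can be tried on a local file before involving ssh or mail. The sketch below only wraps that filter in a function; the name matching_lines and its argument order are assumptions made for this example, and the word and date are used as sed patterns, so they must not contain a slash.

```shell
#!/bin/bash
# matching_lines WORD DATE FILE  (hypothetical helper around the sed filter above)
matching_lines() {
    # First expression: drop lines that do not contain WORD.
    # Second expression: on lines that also contain DATE, append a dot and print.
    sed -n -e "/$1/!d" -e "/$2/s/\$/./p" "$3"
}
```

Unlike assigning the pipeline to a variable inside single quotes, calling the function inside $( ) actually runs the command, for example: INVALID=$(matching_lines WORD "$DATE" /xxx/xxx/xxxx)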
Your command substitution is not working. You should read up on how it works but here are the problem lines:
COUNT=$(echo "$INVALID" | wc -c)
[...]
ssh BLA@"$x" $COUNT
should be:
COUNT_CMD="'${INVALID} | wc -c'"
[...]
COUNT=$(ssh BLA@"$x" $COUNT_CMD)
This inserts the value of $INVALID into the string, and puts the whole thing in single quotes. The single quotes are necessary for the ssh call so the pipes aren't evaluated in the script but on the remote host. (COUNT is changed to COUNT_CMD for readability/clarity.)
EDIT:
I misread the question and have corrected my answer.

Is there any better way to get mac address from arp table?

I want to get a mac address from arp table by using ip address. Currently I am using this command
arp -a $ipAddress | awk '{print $4}'
This command prints what I want. But I am not comfortable with it and I wonder if there is any built-in way or more stable way to do this.
You can parse the /proc/net/arp file using awk:
awk "/^${ipAddress//./\.}\>/"' { print $4 }' /proc/net/arp
but I'm not sure it's simpler (it saves one fork and a subshell, though).
If you want a 100% bash solution:
while read ip _ _ mac _; do
[[ "$ip" == "$ipAddress" ]] && break
done < /proc/net/arp
echo "$mac"
Well, you could write a program (such as in C) to actually use the ARP protocol (yes, I know that's redundant, like ATM machine or PIN number) itself to get you the information but that's likely to be a lot harder than a simple pipeline.
Perhaps you should examine your comfort level a little more critically, since it's likely to cause you some unnecessary effort :-)
The manpage for the Linux ARP kernel module lists several methods for manipulating or reading the ARP tables, ioctl probably being the easiest.
The output of arp -a is locale dependent (i.e. it changes with your system language). So it might be a good idea to at least force it to the default locale:
LC_ALL=C arp -a $ipAddress | awk '{print $4}'
However, I share your fear that the output of arp -a is not meant to be parsed. If your program is restricted to Linux systems, another option is to parse the file /proc/net/arp. This file is exported by the kernel and is what arp itself parses to get its information. The format of this file is described in the manpage proc(5), see man 5 proc.
This can be easily done with awk:
awk -v ip="$ipAddress" '$1==ip {print $4}' /proc/net/arp
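For scripting, the lookup can be wrapped in a function that takes the table file as an optional second argument, which makes it possible to try it against a saved copy of the table. This is a sketch: the name mac_for_ip is made up, and it assumes the column layout of /proc/net/arp with the hardware address in the fourth field.

```shell
#!/bin/bash
# mac_for_ip IP [FILE]  (hypothetical helper; FILE defaults to /proc/net/arp)
mac_for_ip() {
    # Exact string comparison on the first field, so 192.168.2.24 does not
    # match 192.168.2.240; stop after the first hit.
    awk -v ip="$1" '$1 == ip { print $4; exit }' "${2:-/proc/net/arp}"
}
```

Usage would be something like mac=$(mac_for_ip "$ipAddress"); the result is empty when the address is not in the table.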
Here's an awk + sed solution which doesn't assume the column number is always 4.
#!/bin/bash
cat /proc/net/arp |\
# remove space from column headers
sed 's/\([^ ]\)[ ]\([^ ]\)/\1_\2/g' |\
# find HW_address column number and/or print that column
awk '{
if ( !column ) {
for (i = 1; i <= NF; i++ ) {
if ( $i ~ /HW_address/ ) { column=i }
};
print $column
}
else {
print $column
}
}'
There are still fragile assumptions here, such as the column name being "HW address".
Update, removed PIPE
sed -nr 's/^'${ipAddress//./\.}'.*(([0-9A-Za-z]{2}:){5}[0-9A-Za-z]{2}).*$/\1/p' /proc/net/arp
Solution for non-fixed column;
arp -a $ipAddress | sed -n 's/^.*\(\([0-9A-Z]\{2\}:\)\{5\}[0-9A-Z]\{2\}\).*$/\1/p'
Explanation
^.* - Match start of string ^ followed by any character .*.
[0-9A-Z]\{2\}: - Match two characters that are each a digit or an uppercase letter, followed by a colon.
\([0-9A-Z]\{2\}:\)\{5\} - Match the pattern between the ( ) five times.
[0-9A-Z]\{2\} - Match two characters that are each a digit or an uppercase letter.
.*$ - Match any characters zero or more times .* until end of string $.
\1/p - Return capture pattern 1 / p print the match.
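The same pattern can be written with extended regular expressions and limited to hexadecimal digits, which is closer to what a MAC address actually contains. The sketch below is a filter that reads any text on stdin; the name extract_mac is made up, and it relies on the -o option, which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/bash
# extract_mac prints the first MAC-like token found on stdin (hypothetical helper).
extract_mac() {
    # Five "two hex digits plus colon" groups followed by two final hex digits;
    # -i accepts upper and lower case, -o prints only the matching part.
    grep -Eio '([0-9a-f]{2}:){5}[0-9a-f]{2}' | head -n 1
}
```

It could be combined with the earlier command as: LC_ALL=C arp -a "$ipAddress" | extract_mac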
You can use this one for scripting:
awk ' $1~/[[:digit:]]/ {print $4}' /proc/net/arp
What it does:
reads /proc/net/arp (standard arp output)
searches for lines whose first field contains a digit [0-9]
prints the 4th "column", which holds the MAC addresses
Enjoy!
I prefer to use the arping command to explicitly query the MAC of some IP address (this also updates the local ARP cache):
arping -c 1 192.168.2.24 | grep -Eo "([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}"
It's very useful to find if there exist two or more hosts using the same IP address (add -D option), or to check the current IP addresses used in the local VLAN with a simple script like:
for i in $(seq 1 254); do
IP="192.168.5.$i"
MAC=$(arping -c 1 $IP | grep -Eo "([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}")
if [ "$MAC" ] ; then
echo "$IP $MAC"
fi
done
Note that arping can't detect the IP address of the local host in this way (but we can add checks in the script to show it if exists in the range).
There exist several versions of arping with slightly different options and output. On Ubuntu Linux there is one in the package iputils-arping and another in the package arping.
Note: To answer the question and not the problem, when filtering /proc/net/arp you must use a regex that ensures the full match, like ending the expression with a space (otherwise, in this example, it will show also 2.240-2.249 addresses if present):
ipaddress="192.168.2.24"
grep "^${ipaddress} " /proc/net/arp | grep -Eo "([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}"

Resources