perl + Replace IP address only if octet is VALID IP [duplicate] - linux

The goal of the following Perl one-liner is to replace the first three octets of an IP address when the fourth octet is a number (xxx.xxx.xxx.digit).
Remark: I use Linux and Solaris machines.
The problem is that the one-liner also replaces the first three octets when the fourth octet is not a valid IP octet (for example 5.5.5.555).
The following example shows the one-liner replacing the first three octets even though the fourth octet is not valid:
# export OLD_IP=1.1.1
# export NEW_IP=5.5.5
# echo 1.1.1.555 | perl -i -pe 'next if /^ *#/; s/(?<![\d.])\Q$ENV{OLD_IP}\E(?=\.\d)/$ENV{NEW_IP}/g'
5.5.5.555
Please advise what I need to add to my Perl one-liner
so that it replaces the first three octets
only if the fourth octet is valid (between 0 and 255).

If the final octet is always just a single digit then you can do the same as at the start of the pattern and ensure that there is no digit after the first.
s/(?<![\d.])\Q$ENV{OLD_IP}\E(?=\.\d(?!\d))/$ENV{NEW_IP}/g
This way is much easier than checking for a valid final octet from 0 to 255, which would look like
s/(?<![\d.])\Q$ENV{OLD_IP}\E(?=\.(?:1?\d?\d|2[0-4][0-9]|25[0-5])(?!\d))/$ENV{NEW_IP}/g

My first thought is to use a regular expression.
The following expression should get you started. It checks the different groups: single digits 0 to 9, two-digit numbers, three-digit numbers starting with 1, three-digit numbers starting with 2 whose second digit is 0 to 4, and three-digit numbers starting with 25 whose third digit is restricted to 0 to 5:
^(?:[0123456789]|[0123456789][0123456789]|1[0123456789][0123456789]|2[01234][0123456789]|25[012345])$
I tested it in an online regex tester and it seems to filter nicely.
My knowledge of Perl has been dormant for over twelve years, but shouldn't it be possible to capture the last octet and check its value:
/\d+\.\d+\.\d+\.(\d+)/ && $1 <= 255

Here is a simple solution using the /e (eval) modifier for substitution:
perl -i -pe 's/\b(1\.1\.1\.(\d+))\b/ $2 >= 0 && $2 <= 255 ? "5.5.5.$2" : $1/ge'
Test:
echo 1.1.1.12 | perl -pe 's/\b(1\.1\.1\.(\d+))\b/ $2 >= 0 && $2 <= 255 ? "5.5.5.$2" : $1/ge'
5.5.5.12
echo 1.1.1.256 | perl -pe 's/\b(1\.1\.1\.(\d+))\b/ $2 >= 0 && $2 <= 255 ? "5.5.5.$2" : $1/ge'
1.1.1.256

Related

How to return only integers from a variable in Shell Script and discard letters and leading zeros?

In my shell script there is a parameter that comes from certain systems and it gives an answer similar to this one: PAR0000008.
And I need to send only the trailing number of this parameter to another variable, i.e. VAR=8.
I used the command VAR=$( echo ${PAR} | cut -c 10 ) and it worked perfectly.
The problem is when the PAR parameter returns a two-digit number, like PAR0000012. I need to discard the leading zeros and send only the number 12 to the variable, but I don't know how to write the logic in the shell to discard all the characters to the left of the number.
Edit Using grep To Handle 0 As Part Of Final Number
Since you are using POSIX shell, making use of a utility like sed or grep (or cut) makes sense. grep is quite a bit more flexible in parsing the string allowing a REGEX match to handle the job. Say your variable v=PAR0312012 and you want the result r=312012. You can use a command substitution (e.g. $(...)) to parse the value assigning the result to r, e.g.
v=PAR0312012
r=$(echo $v | grep -Eo '[1-9].*$')
echo $r
The grep expression is:
-Eo - use Extended REGEX and only return matching portion of string,
[1-9].*$ - from the first character in [1-9] return the remainder of the string.
This will work for PAR0000012 or PAR0312012 (with result 312012).
Result
For PAR0312012
312012
Another Solution Using expr
If your variable can have zeros as part of the final number portion, then you must find the index where the first [1-9] character occurs, and then assign the substring beginning at that index to your result variable.
The expr utility provides a set of string parsing tools that can do this (index and substr are GNU extensions, not part of POSIX expr). The needed commands are:
expr index string charlist
and
expr substr string pos length
Where pos is the 1-based position of the first character to extract and length is the number of characters to extract. length just has to be large enough to cover the rest of the string, so you can just use the total length of your string, e.g.
v=PAR0312012
ndx=$(expr index "$v" "123456789")
r=$(expr substr "$v" "$ndx" 10)
echo $r
Result
312012
This will handle 0 anywhere after the first [1-9].
(note: the old expr ... isn't the fastest way of handling this, but if you are only concerned with a few tens of thousands of values, it will work fine. A billion numbers and another method will likely be needed)
This can be done easily using parameter expansion. Both patterns below assume that no 0 appears after the first nonzero digit (PAR0000010, for example, would yield an empty string).
var='PAR0000008'
echo "${var##*0}"
# prints 8
echo "${var##*[^1-9]}"
# prints 8
var="${var##*0}"
echo "$var"
# prints 8
var='PAR0000012'
echo "${var##*0}"
# prints 12
echo "${var##*[^1-9]}"
# prints 12
var="${var##*[^1-9]}"
echo "$var"
# prints 12
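If the number can contain zeros after its first nonzero digit, two nested expansions handle that case without any external command. This is a sketch using only POSIX parameter expansion; the function name strip_to_number and the choice to print 0 for an all-zero number are assumptions for illustration.

```shell
# strip_to_number VALUE (hypothetical helper): prints the trailing number of
# VALUE without its letter prefix and without leading zeros.
strip_to_number() {
  digits=${1#"${1%%[0-9]*}"}             # drop everything before the first digit
  number=${digits#"${digits%%[1-9]*}"}   # drop the leading zeros
  printf '%s\n' "${number:-0}"           # an all-zero number becomes 0
}

strip_to_number PAR0000008   # prints 8
strip_to_number PAR0000010   # prints 10
strip_to_number PAR0312012   # prints 312012
```

The inner expansion computes the part to remove (the non-digit prefix, then the run of zeros), and the outer one strips exactly that prefix.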

Convert carriage return (\r) to actual overwrite

Questions
Is there a way to convert the carriage returns to actual overwrite in a string so that 000000000000\r1010 is transformed to 101000000000?
Context
1. Initial objective:
Having a number x (between 0 and 255) in base 10, I want to convert it to base 2, add trailing zeros to get a 12-digit binary representation, generate 12 different numbers (each made of the first n digits in base 2, with n between 1 and 12), and print the base 10 representation of these 12 numbers.
2. Example:
With x = 10
Base 2 is 1010
With trailing zeros 101000000000
Extract the 12 "leading" numbers: 1, 10, 101, 1010, 10100, 101000, ...
Convert to base 10: 1, 2, 5, 10, 20, 40, ...
3. What I have done (it does not work):
x=10
x_base2="$(echo "obase=2;ibase=10;${x}" | bc)"
x_base2_padded="$(printf '%012d\r%s' 0 "${x_base2}")"
for i in {1..12}
do
t=$(echo ${x_base2_padded:0:${i}})
echo "obase=10;ibase=2;${t}" | bc
done
4. Why it does not work
Because the variable x_base2_padded contains the whole sequence 000000000000\r1010. This can be confirmed using hexdump for instance. In the for loop, when I extract the first 12 characters, I only get zeros.
5. Alternatives
I know I can find alternatives by literally adding zeros to the variable as follows:
x_base2=1010
x_base2_padded="$(printf '%s%0.*d' "${x_base2}" $((12-${#x_base2})) 0)"
Or by padding with zeros using printf and rev
x_base2=1010
x_base2_padded="$(printf '%012s' "$(printf "${x_base2}" | rev)" | rev)"
Although these alternatives solve my problem for now and let me continue my work, they do not really answer my question.
Related issue
The same problem may be observed in different contexts. For instance if one tries to concatenate multiple strings containing carriage returns. The result may be hard to predict.
str=$'bar\rfoo'
echo "${str}"
echo "${str}${str}"
echo "${str}${str}${str}"
echo "${str}${str}${str}${str}"
echo "${str}${str}${str}${str}${str}"
The first echo will output foo. Although you might expect the other echo commands to output foofoofoo..., they all output foobar.
The following function overwrite transforms its argument such that after each carriage return \r the beginning of the string is actually overwritten:
overwrite() {
local segment result=
while IFS= read -rd $'\r' segment; do
result="$segment${result:${#segment}}"
done < <(printf '%s\r' "$@")
printf %s "$result"
}
Example
$ overwrite $'abcdef\r0123\rxy'
xy23ef
Note that the printed string is actually xy23ef, unlike echo $'abcdef\r0123\rxy' which only seems to print the same string, but still prints \r which is then interpreted by your terminal such that the result looks the same. You can confirm this with hexdump:
$ echo $'abcdef\r0123\rxy' | hexdump -c
0000000 a b c d e f \r 0 1 2 3 \r x y \n
000000f
$ overwrite $'abcdef\r0123\rxy' | hexdump -c
0000000 x y 2 3 e f
0000006
The function overwrite also supports overwriting by arguments instead of \r-delimited segments:
$ overwrite abcdef 0123 xy
xy23ef
To convert variables in-place, use command substitution: myvar=$(overwrite "$myvar")
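Applied to the string from the question, the function can be used as follows. This is a sketch assuming bash; the function is repeated so the snippet is self-contained, and the binary value 1010 is hard-coded instead of being computed with bc.

```shell
#!/usr/bin/env bash
overwrite() {
  local segment result=
  # Each \r-terminated segment overwrites the start of the result so far.
  while IFS= read -rd $'\r' segment; do
    result="$segment${result:${#segment}}"
  done < <(printf '%s\r' "$@")
  printf %s "$result"
}

# Twelve zeros, a carriage return, then the binary digits of 10.
padded=$(overwrite "$(printf '%012d\r%s' 0 1010)")
echo "$padded"   # prints 101000000000

# Decimal value of each leading substring: 1, 2, 5, 10, 20, ...
for ((i = 1; i <= ${#padded}; i++)); do
  echo "$((2#${padded:0:i}))"
done
```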
With awk, you'd set the field delimiter to \r and iterate through fields printing only the visible portions of them.
awk -F'\r' '{
offset = 1
for (i=NF; i>0; i--) {
if (offset <= length($i)) {
printf "%s", substr($i, offset)
offset = length($i) + 1
}
}
print ""
}'
This is too long to put inline in a command substitution, so you had better wrap it in a function and pipe the lines to be resolved into it.
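Wrapped in a function, the awk program could look like the sketch below. The function name resolve_cr is hypothetical; the awk body is the one shown above.

```shell
#!/usr/bin/env bash
# resolve_cr (hypothetical name): reads lines on stdin and prints each line
# the way a terminal would show it after processing the carriage returns.
resolve_cr() {
  awk -F'\r' '{
    offset = 1
    # Later fields overwrite earlier ones, so walk the fields backwards and
    # print only the part of each field that sticks out past what is printed.
    for (i = NF; i > 0; i--) {
      if (offset <= length($i)) {
        printf "%s", substr($i, offset)
        offset = length($i) + 1
      }
    }
    print ""
  }'
}

printf 'abcdef\r0123\rxy\n' | resolve_cr   # prints xy23ef
```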
To answer the specific question, how to convert 000000000000\r1010 to 101000000000, refer to Socowi's answer.
However, I wouldn't introduce the carriage return in the first place and solve the problem like this:
#!/usr/bin/env bash
x=$1
# Start with 12 zeroes
var='000000000000'
# Convert input to binary
binary=$(bc <<< "obase = 2; $x")
# Rightpad with zeroes: ${#binary} is the number of characters in $binary,
# and ${var:x} removes the first x characters from $var
var=$binary${var:${#binary}}
# Print 12 substrings, convert to decimal: ${var:0:i} extracts the first
# i characters from $var, and $((x#$var)) interprets $var in base x
for ((i = 1; i <= ${#var}; ++i)); do
echo "$((2#${var:0:i}))"
done

Isolate product names from strings by matching string after (including) first letter in a variable

I have a bunch of strings of following pattern in a text file:
201194_2012110634 Appliance 130 AB i Some optional (Notes )
300723_2017050006(2016111550) Device 16 AB i Note
The first part is serial, the second is date. Device/Appliance name and model (about 10 possible different names) is the string after date number and before (including AB i).
I was able to isolate dates and serials using
SERIAL=${line:0:6}
YEAR=${line:7:4}
I'm trying to isolate Device name and note after that:
#!/bin/bash
while IFS= read line || [[ -n $line ]]; do
NAME=${line#*[a-zA-Z]}
STRINGAP='Appliance '"${line/#*Appliance/}"
The first approach is to take everything after the first letter appearing in line, which gives me
NAME = ppliance 130 AB i Some optional (Notes )
The second approach is to write tests for each of the ~10 possible appliance/device names and then append appliance name after the subtracted test. Then test variable which actually matched Appliance / Device (or other name) and use that to input into the database.
Is it possible to write a line that would select everything, including first letter in a line, in text file? Then I would subtract everything after AB i to get notes and everything before AB i would become appliance name.
Remove the ${line#*[a-zA-Z]} line (which will, as you see, remove the first character of the name), and instead use
STRINGAP=$(echo "$line" | sed 's/^[0-9_]* \(.*\) AB i.*/\1/')
This drops the leading digits and underscore, and everything from " AB i" to the end.
Edit: The details are unclear - do you want to keep the "AB i", and will it always be "AB i"? If you want it, change the line to
STRINGAP=$(echo "$line" | sed 's/^[0-9_]* \(.* AB i\).*/\1/')
I also forgot the double quotes round the text line.
You can use sed and read to give you more control of parsing.
tmp> line2="300723_2017050006(2016111550) Device 16 AB i Note"
tmp> read serial date type val <<<"$(echo "$line2" | sed 's/\([0-9]*\)_\([0-9]*\)[^A-Z]*\(Device\|Appliance\) \([0-9]*\).*/\1 \2 \3 \4/')"
tmp> echo "$serial|$date|$type|$val"
300723|2017050006|Device|16
Basically, read allows you to assign multiple variables in one line. The sed statement parses the line and gives you space-delimited output of its results. You can also read each variable separately if you don't mind running sed a few extra times:
device="$(echo $line2 | sed -e 's/^.*Device \([0-9]*\).*/\1/;t;d')"
appliance="$(echo $line2 | sed -e 's/^.*Appliance \([0-9]*\).*/\1/;t;d')"
This way $device is populated with the device number if present, and is blank otherwise (note the -e and the ;t;d at the end of the sed script, which prevent it from printing the line if it doesn't match).
Your question isn't clear but it seems like you might be trying to parse strings into substrings. Try this with GNU awk for the 3rd arg to match() and let us know if there's something else you were looking for:
$ awk 'match($0,/^([0-9]+)_([0-9]+)(\([0-9]+\))?\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)/,a) {
for (i=1; i<=8; i++) {
print i, a[i]
}
print "---"
}' file
1 201194
2 2012110634
3
4 Appliance
5 130
6 AB
7 i
8 Some optional (Notes )
---
1 300723
2 2017050006
3 (2016111550)
4 Device
5 16
6 AB
7 i
8 Note
---
If you wanted a CSV output, for example, then it'd just be:
$ awk -v OFS=',' 'match($0,/^([0-9]+)_([0-9]+)(\([0-9]+\))?\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)/,a) {
for (i=1; i<=8; i++) {
printf "%s%s", a[i], (i<8?OFS:ORS)
}
}' file
201194,2012110634,,Appliance,130,AB,i,Some optional (Notes )
300723,2017050006,(2016111550),Device,16,AB,i,Note
Massage to suit...
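For comparison, the name and the notes can also be split with bash parameter expansion alone. This is a sketch assuming every line has a space after the serial/date token and contains the literal marker " AB i"; the function name parse_product and the variables name and notes are hypothetical.

```shell
#!/usr/bin/env bash
# parse_product LINE (hypothetical helper): sets the global variables
# name (text between the serial/date token and " AB i") and notes (the rest).
parse_product() {
  local rest="${1#* }"        # drop the leading serial_date token
  name="${rest%%" AB i"*}"    # text before the first " AB i"
  notes="${rest#*" AB i"}"    # text after the first " AB i"
  notes="${notes# }"          # trim the single separating space
}

parse_product "201194_2012110634 Appliance 130 AB i Some optional (Notes )"
echo "$name|$notes"   # prints Appliance 130|Some optional (Notes )
```

If a line lacks " AB i", both variables end up holding the whole remainder, so real input may need an extra check.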

Generating a 'unique' 3 character tag from 4 Hex digits

I'm working on a project using Raspberry Pi in a stateless net-booted environment for a local museum. I want the Raspberry Pi to generate an easy-to-remember 3-character code from the last 4 digits of the MAC address. This code will be read by a person and entered into a database describing what the machine will do, and on future reboots the Raspberry Pi will look up its own code to identify its tasks.
It could be done with part of the MAC address alone, but I would like to generate a more human compatible code to reduce errors. The conversion only has to work in one direction.
I think I've broken it down into logical steps, but I'm not sure how I could implement them in an efficient way.
Take the last 4 chars from the MAC address
Convert to binary - 16 bits
Delete MSB to leave 15 bits
Reverse the order of the bits - in my mind this would randomize the characters a bit more to make the final string more memorable
Break the 15 bits into 3 chunks of 5 bits
Map each 5 bit chunk to a letter/number skipping I,1,O & 0 to avoid confusion
Output the 3 character string
You can divide the last 13 bits into groups of 5, 3, and 5 bits. Choose a consonant using the first and the last group (there are 21 consonants, fewer than the 32 values of 5 bits, so the values from the 22nd onwards wrap around to the start), and choose a vowel for the middle 3 bits (there are 5 vowels, wrapping around in the same way). This will give mostly pronounceable "names", and some of them may be hilarious too.
I worked through the responses above and came up with a solution. It's not perfect, but it works...
if [ ! -f /home/pi/MAC_ADDRESS ]; then
BASE28="ABC3DEFG4HJK6LMN7PQRT8UVW9XY";
arg1=$(ifconfig | grep eth0 | awk '{if ($4 == "HWaddr") print $5 }');
echo $arg1 > /home/pi/MAC_ADDRESS;
arg1="${arg1//:}";
arg1=${arg1:8:4};
arg1=$(echo $arg1 | tr '[:lower:]' '[:upper:]');
arg1=$(bc <<< "obase=28; ibase=16; $arg1");
c="";
for i in $arg1; do
if [ $i -eq 08 ]; then
i=8;
fi
if [ $i -eq 09 ]; then
i=9;
fi
b=${BASE28:$i:1};
c=$c$b;
done
c=${c:(-3)};
echo $c > /home/pi/CLIENT_ID;
fi
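In the for loop I needed to fix 08 and 09: bash treats a number with a leading zero as octal when it evaluates the substring offset, and 08 and 09 are not valid octal numbers.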
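The steps listed in the question can also be followed literally with bash arithmetic. This is a sketch, not the solution posted above: the function name mac_tag, the 32-symbol alphabet, and the sample MAC address are assumptions for illustration.

```shell
#!/usr/bin/env bash
# 32 symbols (one per 5-bit value), leaving out I, 1, O and 0.
ALPHABET="ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

# mac_tag MAC (hypothetical helper): prints a 3-character tag derived from
# the last 4 hex digits of MAC.
mac_tag() {
  local hex=${1//:/}                     # strip the colons
  hex=${hex: -4}                         # keep the last 4 hex digits
  local value=$(( 16#$hex & 0x7FFF ))    # 16 bits with the MSB dropped
  local reversed=0 i
  for ((i = 0; i < 15; i++)); do         # reverse the order of the 15 bits
    reversed=$(( (reversed << 1) | ((value >> i) & 1) ))
  done
  local tag="" chunk
  for ((i = 10; i >= 0; i -= 5)); do     # three 5-bit chunks, high to low
    chunk=$(( (reversed >> i) & 31 ))
    tag+=${ALPHABET:chunk:1}
  done
  printf '%s\n' "$tag"
}

mac_tag "b8:27:eb:12:34:56"   # prints PJY
```

Because one of the 16 bits is dropped, two MAC addresses that differ only in that bit map to the same tag, which matches the question's note that the conversion only has to work in one direction.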

grep + match exactly IP address with Regular Expression

My goal is to match an IP address exactly by its first three octets, while the fourth octet must be a valid octet (between 0 and 255).
For example, I have the following IPs in a file:
$ more file
192.9.200.10
192.9.200.100
192.9.200.1555
192.9.200.1
192.9.200.aaa
192.9.200.#
192.9.200.:
192.9.200
192.9.200.
I need to match the first three octets (192.9.200) while the fourth octet must be valid (0-255),
so the expected result is:
192.9.200.10
192.9.200.100
192.9.200.1
The basic syntax should be as follows:
IP_ADDRESS_THREE_OCTETS=192.9.200
cat file | grep -x $IP_ADDRESS_THREE_OCTETS.[ grep Regular Expression syntax ]
Please advise how to write the right grep regular expression for the fourth octet, so that the three octets match exactly and the fourth octet is valid.
You'd need to use some high-level tools to convert the text to a regex pattern, so you might as well use just that.
perl -ne'
BEGIN { $base = shift(@ARGV); }
print if /^\Q$base\E\.([0-9]+)$/ && 0 <= $1 && $1 <= 255;
' "$IP_ADDRESS_THREE_OCTETS" file
If hardcoding the base is acceptable, that reduces to:
perl -ne'print if /^192\.9\.200\.([0-9]+)$/ && 0 <= $1 && $1 <= 255' file
Both of these snippets also accept input from STDIN.
For a full IP address:
perl -ne'
BEGIN { $ip = shift(@ARGV); }
print if /^\Q$ip\E$/;
' 1.2.3.4 file
or
perl -nle'
BEGIN { $ip = shift(@ARGV); }
print if $_ eq $ip;
' 1.2.3.4 file
Regexp is not good for comparing numbers, I'd do this with awk:
$ awk -F. '$1==192 && $2==9 && $3==200 && $4>=0 && $4<=255 && NF==4' file
192.9.200.10
192.9.200.100
192.9.200.1
If you really want to use grep you need the -E flag for extended regexp or use egrep because you need alternation:
$ grep -Ex '192\.9\.200\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])' file
192.9.200.10
192.9.200.100
192.9.200.1
$ IP='192\.9\.200\.'
$ grep -Ex "$IP(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])" file
Note: You must escape . to match a literal period, and quote the assignment so that the shell keeps the backslashes.
grep -E '^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$'
This expression will not match IP addresses with leading 0s. e.g., it won't match 192.168.1.01
This expression will not match IP addresses with more than 4 octets. e.g., it won't match 192.168.1.2.3
If you really want to be certain that what you have is a valid IPv4 address, you can always check the return value of inet_aton() (part of the Socket core module).
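To keep the three-octet prefix in a variable, as the question asks, the dots can be escaped automatically before the pattern is handed to grep. This is a sketch assuming bash and a grep that supports -E and -x; the names OCTET and match_prefix are hypothetical.

```shell
#!/usr/bin/env bash
# Extended regex for one octet from 0 to 255 without leading zeros.
OCTET='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# match_prefix PREFIX (hypothetical helper): prints the stdin lines that
# consist of PREFIX, a dot, and a valid fourth octet.
match_prefix() {
  local prefix=${1//./\\.}     # escape dots so they match literally
  grep -Ex "${prefix}\\.${OCTET}"
}

printf '%s\n' 192.9.200.10 192.9.200.100 192.9.200.1555 192.9.200.1 \
  192.9.200.aaa 192.9.200 192.9.200. | match_prefix 192.9.200
# prints 192.9.200.10, 192.9.200.100 and 192.9.200.1, one per line
```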
