Issue in sed command - linux

I am trying to change nw_src in the following string:
cookie=0xb868a1f26498cddd, duration=5327.613s, table=0, n_packets=199, n_bytes=19502, priority=30,icmp,in_port="qvo2495b490-33",nw_src=10.0.0.133,nw_dst=8.8.8.0/24 actions=group:2
command used:
sed -i -e
's/nw_src=(^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$,)/nw_src=1.1.1.1/g' flows.txt
It seems to not work however when i generate data from regex: it shows all the permutation of generated data right.
I am just trying to replace nw_src=<any_ipv4_address> to nw_src=1.1.1.1
Also nw_src appears only once in line
What am i missing.
Please help

Let's start with the regular expression that verifies that a text fragment is a one, two or three digit substring, representing a numeric value between 0 and 255:
2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9]
For example, run the following command and inspect its output:
seq 0 999 | grep -E '^(2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9])$'
seq generates all integers from 0 to 999, one per line, and grep will return only the integers from 0 to 255. Note that I had to enclose the regexp in parentheses, so that the anchors ^ and $ apply to the whole thing. In general, you need to have the regexp preceded and followed by non-digits, or by some sort of zero-length assertions to make sure the string is not a substring of a longer sequence of digits.
If you must replace the value of nw_src, but only when it is a valid IP4 address (and leave it as is if it's not a valid IP4 address), you could do this:
sed -iE 's/\bnw_src=(2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9])(\.\1){3}\b/nw_src=1.1.1.1/'
If it is guaranteed that the "old" value is always a valid IP4 address (so you don't need to verify that in this command), or if you need to replace the "old" value regardless of whether it is a valid address or not, the command can be simplified quite a bit; for example
... 's/\bnw_src=[^,]*/nw_src=1.1.1.1/'

If all you are trying to do is replace nw_src=<any_ipv4_address> to nw_src=1.1.1.1 then the following will suffice
sed 's/\(.*nw_src=\)\([^,]*\),\(.*\)/\11.1.1.1,\2/' network_file
Output
$ sed 's/\(.*nw_src=\)\([^,]*\),\(.*\)/\11.1.1.1,\2/' network_file
cookie=0xb868a1f26498cddd, duration=5327.613s, table=0, n_packets=199, n_bytes=19502, priority=30,icmp,in_port="qvo2495b490-33",nw_src=1.1.1.1,10.0.0.133

Related

How to replace a range of numbers from a range of strings with sed

I'm trying to modify a given text file, wherein I want to change/alter the following strings, eg:
lcl|NC_018257.1_cds_XP_003862892.1_5067
lcl|NC_018241.1_cds_XP_003859498.1_1683
lcl|NC_018256.1_cds_XP_003862456.1_4633
lcl|NC_018237.1_cds_XP_003858978.1_1163
lcl|NC_018254.1_cds_XP_003861926.1_4104
so that it only contains the XP_n.1 part of the string.
I have successfully removed the lcl|NC\_*.1_cds\_ part out of the strings for which
I used the following sed command:
sed 's/lcl|NC\_.\*_cds_//g' cds.fa > cds4.fa
The resultant text file contains strings like XP_003862892.1_5067.
There are about 8014 strings like this ranging from XP_*.1_1 to XP_*.1_8014. I want to delete the _1 to _8014 part of the string and replace it with 1.
I tried using
sed 's/1\_./1/g'
and it seemed to have worked, however when I scrolled further down the list of strings, the double digit numbers didn't get replaced - only one of the digits was replaced, which immediately followed the '_', resulting in the first digit turning into 1 and the rest retaining their original identity. Same with triple and quadruple digit numbers.
eg:
XP_003857837.1_23 ---> XP_003857837.13
XP_003857942.1_228 ---> XP_003857942.128
I have absolutely no idea how to remove this, all my attempts have led to failure. Some people have asked me for what my desired output should look like, the ideal output would be: XP_003857837.1, each string should be followed by a .1 instead of .1_SomeNumberRangingFrom1to8014
You can do everything in one go with a slightly more complex regex.
sed 's/lcl|NC_.*_cds_\(XP_[0-9.]*\)_.*/\1/' cds.fa > cds4.fa
The backslashed parentheses create a capturing group, and \1 in the replacement recalls the first captured group (\2 for the second, etc, if you have more than one). The regex inside the group looks for XP_ followed by digits and dots, and the expression after matches the rest of the line from the next uderscore on.
In other words, this basically says "replace the whole line with just the part we care about".
By the by, there is no reason to backslash underscores anywhere, and the /g option to the s command only makes sense when you want to replace multiple occurrences on the same input line.
Using sed
$ sed 's/.*_\?\(XP_[^.]*\.\)[^_]*_[0-9]\(.*\)/\11\2/'
XP_003862892.1067
XP_003859498.1683
XP_003862456.1633
XP_003858978.1163
XP_003861926.1104
XP_003857837.13
XP_003857942.128

Confusion with regex expression for valid IPv4 addresses [duplicate]

This question already has answers here:
Validating IPv4 addresses with regexp
(44 answers)
Closed 2 years ago.
I'm trying to write a Regex expression for selecting the valid IPv4 addresses out of a file which contains many valid, invalid(both) type of addresses.
I have already written the Regex for doing that but two of invalid IPv4 addresses are still printing out - 255.255.256.255 and 8.234.88,55
Can anyone help me understanding why these two are printing out with regex that I have put.
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){1,3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
I am using this regex to filter valid IPv4 addresses through the file containing, below listed IPv4 addresses.
12.12.12.12
127.0.0.0
255.255.256.255
344.19.0.1.
12.255.12.255
138.168.5.193
256.123.256.123
195.45.13.0
8.234.88.55
1334.0.1.234
196.83.83.191
133.133.133.133
8.234.88,55
203.26.27.38
88.173.71.66
136.186.20.9
241.92.88.103
I want to know why this regex expression is matching with 255.255.256.255 and 8.234.88,55 IPv4 addresses.
why this regex expression is matching with 255.255.256.255 and 8.234.88,55 IPv4 addresses.
It doesn't. It matches parts of that string. Most probably you did:
$ echo '255.255.256.255' | grep -E '((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){1,3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
255.255.256.255
Yay, it works. But the pattern doesn't match the whole like, it matches parts 255.255.25 and 6.255 separately. The {1,3} allows the first part to match only once or twice, not necessarily 3 times. Like:
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.)((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.)(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
25 5 . 25 5 . 2 5 6.255
^^^^^ - left over
Because of the {1,3} the first part may be matched only once. Because grep applies regex to part of the string and because the full regex matched, the line is printed.
Similarly for 8.234.88,55 the part 8.234.88 is matched and ,55 is not matched. Is cool to see:
$ echo '8.234.88,55' | grep --color -E '(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){1,3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){1}'
8.234.88,55
^^^^^^^^ - is red
To match the whole line do grep -x or add anchors ^....$ or most probably you want to change {1,3} to {3} to match exactly 3 parts.
Your regular expression is not anchored to the beginning and end of the strings. It matches fragments of each line, not the entire line.
Put your regex between ^ and $.
^ matches the beginning of the string; $ matches the end of the string.
If multi-line matching is enabled, ^ matches the beginning of a line, $ matches the end of a line.
Also, the regex slightly incorrect and this makes it match less than it should. An IPv4 address always has 4 components. Because of {1,3}, your regex allows 2 to 4 components. Combined with the lack of anchors, it finds two matches in the lines you mentioned.
Take a look at regex101.com.
The regex should be:
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.)
I've tried your expression in C++.
Adding an extra slash before the dot solved here for the comma issue.
It parsed a comma because you are missing a slash, the way it is being written interpretes the dot as "parse any character but EOL".
Also your expression is allowing values to be prefixed by a 0 when you put [01]?
There goes a suggestion on how to tackle the expression: if the it has only one digit, how can it be written? Then 2 digits then 3...
(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\\.){3}([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])

Grep filtering of the dictionary

I'm having a hard time getting a grasp of using grep for a class i am in was hoping someone could help guide me in this assignment. The Assignment is as follows.
Using grep print all 5 letter lower case words from the linux dictionary that have a single letter duplicated one time (aabbe or ababe not valid because both a and b are in the word twice). Next to that print the duplicated letter followed buy the non-duplicated letters in alphabetically ascending order.
The Teacher noted that we will need to use several (6) grep statements (piping the results to the next grep) and a sed statement (String Editor) to reformat the final set of words, then pipe them into a read loop where you tear apart the three non-dup letters and sort them.
Sample Output:
aback a bck
abaft a bft
abase a bes
abash a bhs
abask a bks
abate a bet
I haven't figured out how to do more then printing 5 character words,
grep "^.....$" /usr/share/dict/words |
Didn't check it thoroughly, but this might work
tr '[:upper:]' '[:lower:]' | egrep -x '[a-z]{5}' | sed -r 's/^(.*)(.)(.*)\2(.*)$/\2 \1\3\4/' | grep " " | egrep -v "(.).*\1"
But do your way because someone might see it here.
All in one sed
sed -n '
# filter 5 letter word
/[a-zA-Z]\{5\}/ {
# lower letters
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxya/
# filter non single double letter
/\(.\).*\1/ !b
/\(.\).*\(.\).*\1.*\1/ b
/\(.\).*\(.\).*\1.*\2/ b
/\(.\).*\(.\).*\2.*\1/ b
# extract peer and single
s/\(.\)*\(.\)\(.*\)\2\(.*\)/a & \2:\1\3\4/
# sort singles
:sort
s/:\([^a]*\)a\(.*\)$/:\1\2a/
y/abcdefghijklmnopqrstuvwxyz/zabcdefghijklmnopqrstuvwxy/
/^a/ !b sort
# clean and print
s/..//
s/:/ /p
}' YourFile
posix sed so --posix on GNU sed
The first bit, obviously, is to use grep to get it down to just the words that have a single duplication in. I will give you some clues on how to do that.
The key is to use backreferences, which allow you to specify that something that matched a previous expression should appear again. So if you write
grep -E "^(.)...\1...\1$"
then you'll get all the words that have the starting letter reappearing in fifth and ninth positions. The point of the brackets is to allow you to refer later to whatever matched the thing in brackets; you do that with a \1 (to match the thing in the first lot of brackets).
You want to say that there should be a duplicate anywhere in the word, which is slightly more complicated, but not much. You want a character in brackets, then any number of characters, then the repeated character (with no ^ or $ specified).
That will also include ones where there are two or more duplicates, so the next stage is to filter them out. You can do that by a grep -v invocation. Once you've got your list of 5-character words that have at least one duplicate, pipe them through a grep -v call that strips out anything with two (or more) duplicates in. That'll have a (.), and another (.), and a \1, and a \2, and these might appear in several different orders.
You'll also need to strip out anything that has a (.) and a \1 and another \1, since that will have a letter with three occurrences.
That should be enough to get you started, at any rate.
Your next step should be to find the 5-letter words containing a duplicate letter. To do that, you will need to use back-references. Example:
grep "[a-z]*\([a-z]\)[a-z]*\$1[a-z]*"
The $1 picks up the contents of the first parenthesized group and expects to match that group again. In this case, it matches a single letter. See: http://www.thegeekstuff.com/2011/01/advanced-regular-expressions-in-grep-command-with-10-examples--part-ii/ for more description of this capability.
You will next need to filter out those cases that have either a letter repeated 3 times or a word with 2 letters repeated. You will need to use the same sort of back-reference trick, but you can use grep -v to filter the results.
sed can be used for the final display. Grep will merely allow you to construct the correct lines to consider.
Note that the dictionary contains capital letters and also non-letter characters, plus that strange characters used in Southern Europe. say "è".
If you want to distinguish "A" and "a", it's automatic, on the other hand if "A" and "a" are the same letter, in ALL grep invocations you must use the -i option, to instruct grep to ignore case.
Next, you always want to pass the -E option, to avoid the so called backslashitis gravis in the regexp that you want to pass to grep.
Further, if you want to exclude the lines matching a regexp from the output, the correct option is -v.
Eventually, if you want to specify many different regexes to a single grep invocation, this is the way (just an example btw)
grep -E -i -v -e 'regexp_1' -e 'regexp_2' ... -e 'regexp_n'
The preliminaries are after us, let's look forward, use the answer from chiastic-security as a reference to understand the procedings
There are only these possibilities to find a duplicate in a 5 character string
(.)\1
(.).\1
(.)..\1
(.)...\1
grep -E -i -e 'regexp_1' ...
Now you have all the doubles, but this doesn't exclude triples etc that are identified by the following patterns (Edit added a cople of additional matching triples patterns)
(.)\1\1
(.).\1\1
(.)\1.\1
(.)..\1\1
(.).\1.\1
(.)\1\1\1
(.).\1\1\1
(.)\1\1\1\1\
you want to exclude these patterns, so grep -E -i -v -e 'regexp_1' ...
at his point, you have a list of words with at least a couple of the same character, and no triples, etc and you want to drop double doubles, these are the regexes that match double doubles
(.)(.)\1\2
(.)(.)\2\1
(.).(.)\1\2
(.).(.)\2\1
(.)(.).\1\2
(.)(.).\2\1
(.)(.)\1.\2
(.)(.)\2.\1
and you want to exclude the lines with these patterns, so its grep -E -i -v ...
A final hint, to play with my answer copy a few hundred lines of the dictionary in your working directory, head -n 3000 /usr/share/dict/words | tail -n 300 > ./300words so that you can really understand what you're doing, avoiding to be overwhelmed by the volume of the output.
And yes, this is not a complete answer, but it is maybe too much, isn't it?

Accessing last x characters of a string in Bash

I found out that with ${string:0:3} one can access the first 3 characters of a string. Is there a equivalently easy method to access the last three characters?
Last three characters of string:
${string: -3}
or
${string:(-3)}
(mind the space between : and -3 in the first form).
Please refer to the Shell Parameter Expansion in the reference manual:
${parameter:offset}
${parameter:offset:length}
Expands to up to length characters of parameter starting at the character
specified by offset. If length is omitted, expands to the substring of parameter
starting at the character specified by offset. length and offset are arithmetic
expressions (see Shell Arithmetic). This is referred to as Substring Expansion.
If offset evaluates to a number less than zero, the value is used as an offset
from the end of the value of parameter. If length evaluates to a number less than
zero, and parameter is not ‘#’ and not an indexed or associative array, it is
interpreted as an offset from the end of the value of parameter rather than a
number of characters, and the expansion is the characters between the two
offsets. If parameter is ‘#’, the result is length positional parameters
beginning at offset. If parameter is an indexed array name subscripted by ‘#’ or
‘*’, the result is the length members of the array beginning with
${parameter[offset]}. A negative offset is taken relative to one greater than the
maximum index of the specified array. Substring expansion applied to an
associative array produces undefined results.
Note that a negative offset must be separated from the colon by at least one
space to avoid being confused with the ‘:-’ expansion. Substring indexing is
zero-based unless the positional parameters are used, in which case the indexing
starts at 1 by default. If offset is 0, and the positional parameters are used,
$# is prefixed to the list.
Since this answer gets a few regular views, let me add a possibility to address John Rix's comment; as he mentions, if your string has length less than 3, ${string: -3} expands to the empty string. If, in this case, you want the expansion of string, you may use:
${string:${#string}<3?0:-3}
This uses the ?: ternary if operator, that may be used in Shell Arithmetic; since as documented, the offset is an arithmetic expression, this is valid.
Update for a POSIX-compliant solution
The previous part gives the best option when using Bash. If you want to target POSIX shells, here's an option (that doesn't use pipes or external tools like cut):
# New variable with 3 last characters removed
prefix=${string%???}
# The new string is obtained by removing the prefix a from string
newstring=${string#"$prefix"}
One of the main things to observe here is the use of quoting for prefix inside the parameter expansion. This is mentioned in the POSIX ref (at the end of the section):
The following four varieties of parameter expansion provide for substring processing. In each case, pattern matching notation (see Pattern Matching Notation), rather than regular expression notation, shall be used to evaluate the patterns. If parameter is '#', '*', or '#', the result of the expansion is unspecified. If parameter is unset and set -u is in effect, the expansion shall fail. Enclosing the full parameter expansion string in double-quotes shall not cause the following four varieties of pattern characters to be quoted, whereas quoting characters within the braces shall have this effect. In each variety, if word is omitted, the empty pattern shall be used.
This is important if your string contains special characters. E.g. (in dash),
$ string="hello*ext"
$ prefix=${string%???}
$ # Without quotes (WRONG)
$ echo "${string#$prefix}"
*ext
$ # With quotes (CORRECT)
$ echo "${string#"$prefix"}"
ext
Of course, this is usable only when then number of characters is known in advance, as you have to hardcode the number of ? in the parameter expansion; but when it's the case, it's a good portable solution.
You can use tail:
$ foo="1234567890"
$ echo -n $foo | tail -c 3
890
A somewhat roundabout way to get the last three characters would be to say:
echo $foo | rev | cut -c1-3 | rev
Another workaround is to use grep -o with a little regex magic to get three chars followed by the end of line:
$ foo=1234567890
$ echo $foo | grep -o ...$
890
To make it optionally get the 1 to 3 last chars, in case of strings with less than 3 chars, you can use egrep with this regex:
$ echo a | egrep -o '.{1,3}$'
a
$ echo ab | egrep -o '.{1,3}$'
ab
$ echo abc | egrep -o '.{1,3}$'
abc
$ echo abcd | egrep -o '.{1,3}$'
bcd
You can also use different ranges, such as 5,10 to get the last five to ten chars.
1. Generalized Substring
To generalise the question and the answer of gniourf_gniourf (as this is what I was searching for), if you want to cut a range of characters from, say, 7th from the end to 3rd from the end, you can use this syntax:
${string: -7:4}
Where 4 is the length of course (7-3).
2. Alternative using cut
In addition, while the solution of gniourf_gniourf is obviously the best and neatest, I just wanted to add an alternative solution using cut:
echo $string | cut -c $((${#string}-2))-
Here, ${#string} is the length of the string, and the trailing "-" means cut to the end.
3. Alternative using awk
This solution instead uses the substring function of awk to select a substring which has the syntax substr(string, start, length) going to the end if the length is omitted. length($string)-2) thus picks up the last three characters.
echo $string | awk '{print substr($1,length($1)-2) }'

sed regex not being greedy?

In bash I have a string variable tempvar, which is created thus:
tempvar=`grep -n 'Mesh Tally' ${meshtalfile}`
meshtalfile is a (large) input file which contains some header lines and a number of blocks of data lines, each marked by a beginning line which is searched for in the grep above.
In the case at hand, the variable tempvar contains the following string:
5: Mesh Tally Number 4 977236: Mesh Tally Number 14 1954467: Mesh Tally Number 24 4354479: Mesh Tally Number 34
I now wish to extract the line number relating to a particularly mesh tally number - so I define a variable meshnum1 as equal to 24, and run the following sed command:
echo ${tempvar} | sed -r "s/^.*([0-9][0-9]*):\sMesh\sTally\sNumber\s${meshnum1}.*$/\1/"
This is where things go wrong. I expect the output 1954467, but instead I get 7. Trying with number 34 instead returns 9 instead of 4354479. It seems that sed is returning only the last digit of the number - which surely violates the principle of greedy matching? And oddly, when I move the open parenthesis ( left a couple of characters to include .*, it returns the whole line up to and including the single character it was previously returning. Surely it cannot be greedy in one situation and antigreedy in another? Hopefully I have just done something stupid with the syntax...
The problem is that the .* is being greedy too, which means that it will get all numbers too. Since you force it to get at least one digit in the [0-9][0-9]* part, the .* before it will be greedy enough to leave only one digit for the expression after it.
A solution could be:
echo ${tempvar} | sed -r "s/^.*\s([0-9][0-9]*):\sMesh\sTally\sNumber\s${meshnum1}.*$/\1/"
Where now the \s between the .* and the [0-9][0-9]* explictly forces there to be a space before the digits you want to match.
Hope this helps =)
Are the values in $tempvar supposed to be multiple or a single line? Because if it is a single line, ".*$" should match to the end of line, meaning all the other values too, right?
There's no need for sed, here's one way using GNU grep:
echo "$tempvar" | grep -oP "[0-9]+(?=:\sMesh\sTally\sNumber\s${meshnum1}\b)"

Resources