sed: Find pattern over two lines, not replace after that pattern - linux

Wow, this one has really got me. Gonna need some tricky sed skill here I think. Here is the output value of command text I'm trying to replace:
...
fast
n : abstaining from food
The value I'd like to replace it with, is:
...
Noun
: abstaining from food
This turns out to be tricker that I thought. Because 'fast' is listed a number of times and because it is listed in other places at the beginning of the line. So I came up with this to define the range:
sed '/fast/,/^ n : / s/fast/Noun/'
Which I thought would do, but... Unfortunately, this doesn't end the replacement and the rest of the output following this match are replaced with Noun. How to get sed to stop replacement after the match? Even better, can I find a two line pattern match and replace it?

Try this:
sed "h; :b; \$b ; N; /^${1}\n n/ {h;x;s//Noun\n/; bb}; \$b ; P; D"
Unfortunately, Paul's answer reads the whole file in which makes any additional processing you might want to do difficult. This version reads the lines in pairs.
By enclosing the sed script in double quotes instead of single quotes, you can include shell variables such as positional parameters. I would recommend surrounding them with curly braces so they are set apart from the adjacent characters. When using double quotes, you'll have to be careful of the shell wanting to do its various expansions. In this example, I've escaped the dollar signs that signify the last line of the input file for the branch commands. Otherwise the shell will try to substitute the value of a variable $b which is likely to be null thus making sed unhappy.
Another technique would be to use single quotes and close and open them each time you have a shell variable:
sed 'h; :b; $b ; N; /^'${1}'\n n/ {h;x;s//Noun\n/; bb}; $b ; P; D'
# ↑open close↑ ↑open close↑
I'm assuming that the "[/code]" in your expected result is a typo. Let me know if it's not.

This seems to do what you want:
sed -e ':a;N;$!ba;s/fast\n n/Noun\n/'
I essentially stole the answer from here.

This might work for you:
sed '$!N;s/^fast\n\s*n :/Noun\n :/;P;D' file
...
Noun
: abstaining from food

Related

prevent sed replacements from overwriting each other

I want to replace A with T and T with A
sed -e 's/T/A/g;s/A/T/g
as an example above line changes A:T to T:T
I am hoping to get T:A.
How do I do this?
If you want to change single characters, it is simply:
sed 'y/TA/AT/'
If you want to change longer (non-overlapping) strings, you need a temporary value that you know is never used. Conveniently, newline can never appear. So:
sed '
s/T/\n/g
s/A/T/g
s/\n/A/g
'
I'm not a SED expert - so not sure if that can be done as a single command. Just wondering if you've thought about doing that swap like you would in a programming language that would need a temporary variable to do the switch?
Maybe like change the A to a value you know you don't have in the string like Y for example. Then change the T to A and then change Y to T. Would something like that work?
Edit: I did a quick search just out of curiosity. Found this: https://unix.stackexchange.com/questions/528994/swapping-words-with-sed
In case that helps, but with regex stuff, the result is highly dependent on how structured and unique your inputs are. Not sure how to just swap two arbitrary sub-strings or characters throughout an entire string if there's no particular structure that tells you when you're about to get that sub-string or character like the answer above looking for the parenthesis.
Use this Perl one-liner for case-sensitive replacement:
echo 'TATAtata' | perl -pe 'tr{AT}{TA}'
ATATtata
Or this one-liner for case-insensitive replacement:
echo 'TATAtata' | perl -pe 'tr{ATat}{TAta}'
ATATatat
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc tr
perldoc tr/SEARCHLIST/REPLACEMENTLIST/cdsr

Linux Sed command replace after special character

How can I use sed command in Linux to replace key value pair. I want to replace characters that occur after “:”
For example
App.log.level: “xyz”
It sounds like you just want something like sed 's/:.*$/: YOURTEXTHERE/' where the general format is sed 's/REPLACE_THIS/WITH_THIS/g'
The /:.*$/ bit means I want to replace all text from a colon to the end of the line. The : YOURTEXTHERE is what you're replacing with. (I'm putting the colon back in and putting the extra text.) Since I'm only doing one replacement per line, I don't need the g at the end (although it wouldn't hurt anything.)
A real example:
>> echo App.log.level: \"xyz\" | sed 's/:.*$/: YOURTEXTHERE/'
App.log.level: YOURTEXTHERE

expr bash for sed a line in log does not work

my goal is to sed the 100th line and convert it to a string, then separate the data of the sentence to word
#!/bin/bash
fid=log.txt;
sentence=`expr sed -n '100p' ${fid}`;
for word in $sentence
do
echo $word
done
but apparently this has failed.
expr: syntax error
would somebody please let me know what have I done wrong? previously for number it worked.
The expr does not seem to serve a useful purpose here, and if it did, a sed command would certainly not be a valid or useful thing to pass to it, under most circumstances. You should probably just take it out.
However, the following loop is also problematic. Unquoted variables in shell script are very frequently an error. In this case, you can't quote the thing you pass to the for loop (that would cause the loop to only run once, with the loop variable set to the quoted string) but you also cannot prevent the shell from performing wildcard expansion on the unquoted string. So if the string happened to contain *, the shell will expand that to a list of files in the current directory, for example.
Fortunately, this can all be done in an only slightly more complicated sed script.
sed '100!d;s/[ \t]\+/\n/g;q' "$fid"
That is, if the line number is not 100, delete this line and start over with the next line. Otherwise, we are at line 100; replace runs of whitespace with newlines, (print) and quit.
(The backslash escape codes \t and \n are not universally portable; and \+ for repetition is also an optional extension. I believe there are also sed variants which dislike semicolon as a command separator. Consult your sed manual page, experiment, and if everything else fails, maybe switch to Awk or Perl. Just in case, here is a version which works even on Mac OSX:
sed '100!d
s/[ ][ ]*/\
/g;q' log.txt
The stuff inside the square brackets are a space and a literal tab; in Bash, with default keybindings, type ctrl-V, tab to produce a literal tab.)
Incidentally, this also gets rid of the variable capture antipattern. There are good reasons to capture output to a variable, but if it can be avoided, you often end up with a simpler, more robust and efficient, as well as more idiomatic and elegant script. (I see no reason to put the log file name in a variable, either, in this isolated case; but in a larger script, it might make sense.)
I don't think you need expr command in this case.
expr is used to do calculations. Something like:
expr 1 + 1
Just this one is fine:
sentence=`sed -n '100p' ${fid}`;
#!/bin/bash
fid=log.txt;
sentence=$(sed -n '100p' ${fid});
for word in $sentence
do
echo $word
done
put a dollar sign and parenthesis solve the problem

sed regex with variables to replace numbers in a file

Im trying to replace numbers in my textfile by adding one to them. i.e.
sed 's/3/4/g' path.txt
sed 's/2/3/g' path.txt
sed 's/1/2/g' path.txt
Instead of this, Can i automate it, i.e. find a /d and add one to it in the replace.
Something like
sed 's/\([0-8]\)/\1+1/g' path.txt
Also wanted to capture more than one digit i.e. ([0-9])\t([0-9]) and change each one keeping the tab inbetween
Thanks
edited #2
Using the perl example,
I also would like it to work with more digits i.e.
perl -pi~ -e 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/ ($1+1)\.($2+1)\.($3+1)\.($4+1) /ge' output.txt
Any tips on making the above work?
There is no support for arithmetic in sed, but you can easily do this in Perl.
perl -pe 's/(\d+)/ $1+1 /ge'
With the /e option, the replacement expression needs to be valid Perl code. So to handle your final updated example, you need
perl -pi~ -e 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/ $1+1 . "." $2+1 . "." . $3+1 . "." . $4+1 /ge'
where strings are properly quoted and adjacent strings are concatenated together with the . Perl string concatenation operator. (The arithmetic numbers are coerced into strings as well when they are concatenated with a string.)
... Though of course, the first script already does that more elegantly, since with the /g flag it already increments every sequence of digits with one, anywhere in the string.
Triplee's perl solution is the more generic answer, but Michal's sed solution works well for this particular case. However, Michal's sed solution is more easily written:
sed y/12345678/23456789/ path.txt
and is better implemented as
tr 12345678 23456789 < path.txt
This utterly fails to handle 2 digit numbers (as in the edited question).
You can do it with sed but it's not easy, see this thread.
And it's hard with awk too, see this.
I'd rather use perl for this (something like this can be seen in action # ideone):
perl -pe 's/([0-8])/$1+1/e'
(The ideone.com example must have some looping as ideone does not sets -pe by default.)
You can't do addition directly in sed - you could do it in awk by matching numbers using a regex in each line and increasing the value, but it's quite complicated. If do not need to handle arbitrary numbers but a limited set, like only single-digit numbers from 0 to 8, you can just put several replacement commands on a single sed command line by separating them with semicolons:
sed 's/8/9/g ; s/7/8/g; s/6/7/g; s/5/6/g; s/4/5/g; s/3/4/g; s/2/3/g; s/1/2/g; s/0/1/g' path.txt
This might work for you (GNU sed & Bash):
sed 's/[0-9]/$((&+1))/g;s/.*/echo "&"/e' file
This will add one to every individual digit, to increment numbers:
sed 's/[0-9]\+/$((&+1))/g;s/.*/echo "&"/e' file
N.B. This method is fraught with problems and may cause unexpected results.

A Linux Shell Script Problem

I have a string separated by dot in Linux Shell,
$example=This.is.My.String
I want to
1.Add some string before the last dot, for example, I want to add "Good.Long" before the last dot, so I get:
This.is.My.Goood.Long.String
2.Get the part after the last dot, so I will get
String
3.Turn the dot into underscore except the last dot, so I will get
This_is_My.String
If you have time, please explain a little bit, I am still learning Regular Expression.
Thanks a lot!
I don't know what you mean by 'Linux Shell' so I will assume bash. This solution will also work in zsh, etcetera:
example=This.is.My.String
before_last_dot=${example%.*}
after_last_dot=${example##*.}
echo ${before_last_dot}.Goood.Long.${after_last_dot}
This.is.My.Goood.Long.String
echo ${before_last_dot//./_}.${after_last_dot}
This_is_My.String
The interim variables before_last_dot and after_last_dot should explain my usage of the % and ## operators. The //, I also think is self-explanatory but I'd be happy to clarify if you have any questions.
This doesn't use sed (or even regular expressions), but bash's inbuilt parameter substitution. I prefer to stick to just one language per script, with as few forks as possible :-)
Other users have given good answers for #1 and #2. There are some disadvantages to some of the answers for #3. In one case, you have to run the substitution twice. In another, if your string has other underscores they might get clobbered. This command works in one go and only affects dots:
sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//'
It splits the line before the last dot by inserting a newline and copies the result into hold space:
s/\(.*\)\./\1\n./;h
removes everything up to and including the newline from the copy in pattern space and swaps hold space and pattern space:
s/[^\n]*\n//;x
removes everything after and including the newline from the copy that's now in pattern space
s/\n.*//
changes all dots into underscores in the copy in pattern space and appends hold space onto the end of pattern space
s/\./_/g;G
removes the newline that the append operation adds
s/\n//
Then the sed script is finished and the pattern space is output.
At the end of each numbered step (some consist of two actual steps):
Step Pattern Space Hold Space
This.is.My\n.String This.is.My\n.String
This.is.My\n.String .String
This.is.My .String
This_is_My\n.String .String
This_is_My.String .String
Solution
Two versions of this, too:
Complex: sed 's/\(.*\)\([.][^.]*$\)/\1.Goood.Long\2/'
Simple: sed 's/.*\./&Goood.Long./' - thanks Dennis Williamson
What do you want?
Complex: sed 's/.*[.]\([^.]*\)$/\1/'
Simpler: sed 's/.*\.//' - thanks, glenn jackman.
sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g'
With 3, you probably need to run the substitute (in its entirety) at least twice, in general.
Explanation
Remember, in sed, the notation \(...\) is a 'capture' that can be referenced as '\1' or similar in the replacement text.
Capture everything up to a string starting with a dot followed by a sequence of non-dots (which you also capture); replace by what came before the last dot, the new material, and the last dot and what came after it.
Ignore everything up to the last dot followed by a capture of a sequence of non-dots; replace with the capture only.
Find and capture a sequence of non-dots, a dot (not captured), followed by a sequence of non-dots and a dot; replace the first dot with an underscore. This is done globally, but the second and subsequent matches won't touch anything already matched. Therefore, I think you need ceil(log2N) passes, where N is the number of dots to be replaced. One pass deals with 1 dot to replace; two passes deals with 2 or 3; three passes deals with 4-7, and so on.
Here's a version that uses Bash's regex matching (Bash 3.2 or greater).
[[ $example =~ ^(.*)\.(.*)$ ]]
echo ${BASH_REMATCH[1]//./_}.${BASH_REMATCH[2]}
Here's a Bash version that uses IFS (Internal Field Separator).
saveIFS=$IFS
IFS=.
array=($e) # * split the string at each dot
lastword=${array[#]: -1}
unset "array[${#array}-1]" # *
IFS=_
echo "${array[*]}.$lastword" # The asterisk as a subscript when inside quotes causes IFS (an underscore in this case) to be inserted between each element of the array
IFS=$saveIFS
* use declare -p array after these steps to see what the array looks like.
1.
$ echo 'This.is.my.string' | sed 's}[^\.][^\.]*$}Good Long.&}'
This.is.my.Good Long.string
before: a dot, then no dot until the end. after: obvious, & is what matched the first part
2.
$ echo 'This.is.my.string' | sed 's}.*\.}}'
string
sed greedy matches, so it will extend the first closure (.*) as far as possible i.e. to the last dot.
3.
$ echo 'This.is.my.string' | tr . _ | sed 's/_\([^_]*\)$/\.\1/'
This_is_my.string
convert all dots to _, then turn the last _ to a dot.
(caveat: this will turn 'This.is.my.string_foo' to 'This_is_my_string.foo', not 'This_is_my.string_foo')
You don't need regular expressions at all (those complex things hurt my eyes!) if you use Awk and are a little creative.
1. echo $example| awk -v ins="Good.long" -F . '{OFS="."; $NF = ins"."$NF;print}'
What this does:
-v ins="Good.long" tells awk to create a variable called 'ins' with "Good.long" as content,
-F . tells awk to use the dot as a separator for your fields for input,
-OFS tells awk to use the dot as a separator for your fields as output,
NF is the number of fields, so $NF represents the last field,
the $NF=... part replaces the last field, it appends the current last string to what you want to insert (the variable called "ins" declared earlier).
2. echo $example| awk -F . '{print $NF}'
$NF is the last field, so that's all!
3. echo $example| awk -F . '{OFS="_"; $(NF-1) = $(NF-1)"."$NF; NF=NF-1; print}'
Here we have to be creative, as Awk AFAIK doesn't allow deleting fields. Of course, we set the output field separateor to underscore.
$(NF-1) = $(NF-1)"."$NF: First, we replace the second last field with the last glued to the second last, with a dot between.
Then, we fool awk to make it think the Number of fields is equal to the number of fields minus one, hence deleting the last field!
Note you can't say $NF="", because then it would display two underscores.

Resources