How to split lines using shell sed or something similar - linux

I have a file containing the following:
String, SomeotherString Additional, StringNew String
I would like to have the following output:
String, Someother
String Additional, String
New String
The delimiter is always a capital letter directly following a lowercase letter, with no space in between. I tried:
sed 's/\([a-z][A-Z]\)/\n\1/g' <<< 'String, SomeotherString Additional, StringNew String'
However, this leads to:
String, Someothe
rString Additional, Strin
gNew String
Thanks for your help

With sed:
sed 's/\([a-z]\)\([A-Z]\)/\1\n\2/g'
Matches a small letter (sub-expression 1) followed by a capital letter (sub-expression 2) and replaces them with the part matching sub-expression 1, a newline character, and the part matching sub-expression 2.
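The previous command uses only basic regular expressions, so the regex part works in any sed. The \n in the replacement is not portable, though: GNU sed turns it into a newline, while BSD/macOS sed needs a backslash followed by an actual newline. With GNU sed and others that support it, you can use -E (also -r in GNU sed) to enable extended regexps, so that you don't have to put backslashes before the parentheses.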
sed -E 's/([a-z])([A-Z])/\1\n\2/g'
GNU sed (like other POSIX seds) also supports named character classes such as [[:lower:]] and [[:upper:]], so you can easily match letters other than a-z and A-Z too:
sed -E 's/([[:lower:]])([[:upper:]])/\1\n\2/g'
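As a minimal sketch, the POSIX-class version can be wrapped in a reusable shell function. The name split_camel is hypothetical, and the sketch assumes GNU sed, because it relies on \n in the replacement producing a newline.

```shell
#!/usr/bin/env bash
# split_camel (hypothetical helper): start a new line wherever a lowercase
# letter is immediately followed by an uppercase letter.
# Assumes GNU sed, which turns \n in the replacement into a newline.
split_camel() {
  printf '%s\n' "$1" | sed -E 's/([[:lower:]])([[:upper:]])/\1\n\2/g'
}

split_camel 'String, SomeotherString Additional, StringNew String'
```

Run on the question's input, this prints the three lines of the desired output.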

There's more than one way to do this; here's one that uses Perl:
echo 'StringSomeotherstringAdditionalString' | perl -pe 's/([A-Z])/\n$1/g'
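[A-Z] matches a capital letter;
\n$1 replaces it with a newline and the capital letter.
Note that this inserts a newline before every capital letter, including the first one and ones that follow a space, so the output starts with an empty line and doesn't follow the question's lowercase-then-uppercase rule.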

Related

Replace the word with "\" using sed command fails? [duplicate]

I am using sed in a shell script to edit filesystem path names. Suppose I want to replace
/foo/bar
with
/baz/qux
However, sed's s/// command uses the forward slash / as the delimiter. If I do that, sed emits an error message like:
▶ sed 's//foo/bar//baz/qux//' FILE
sed: 1: "s//foo/bar//baz/qux//": bad flag in substitute command: 'b'
Similarly, sometimes I want to select line ranges, such as the lines between a pattern foo/bar and baz/qux. Again, I can't do this:
▶ sed '/foo/bar/,/baz/qux/d' FILE
sed: 1: "/foo/bar/,/baz/qux/d": undefined label 'ar/,/baz/qux/d'
What can I do?
You can use an alternative regex delimiter in an address (search pattern) by putting a backslash before the first delimiter:
sed '\,some/path,d'
And just use it as is for the s command:
sed 's,some/path,other/path,'
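Here is a quick demonstration of both forms, plus the line range from the question. The sample input is made up for illustration, and the sketch assumes GNU sed:

```shell
#!/usr/bin/env bash
# Made-up sample input, one path per line.
input=$(printf '%s\n' /foo/bar/file /other/path /foo/bar)

# s command: any character except backslash or newline can be the delimiter.
printf '%s\n' "$input" | sed 's|/foo/bar|/baz/qux|'

# Address: a custom delimiter must be introduced with a backslash (\cREc).
printf '%s\n' "$input" | sed '\,/foo/bar,d'

# Range address with custom delimiters: delete from foo/bar through baz/qux.
printf '%s\n' a foo/bar b baz/qux c | sed '\,foo/bar,,\,baz/qux,d'

# Inside the RE, the delimiter itself is literal when preceded by a backslash.
printf '%s\n' 'a,b' | sed 's,a\,b,X,'
```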
You probably want to protect other metacharacters, though; this is a good place to use Perl and quotemeta, or equivalents in other scripting languages.
From man sed:
/regexp/
Match lines matching the regular expression regexp.
\cregexpc
Match lines matching the regular expression regexp. The c may be any character other than backslash or newline.
s/regular expression/replacement/flags
Substitute the replacement string for the first instance of the regular expression in the pattern space. Any character other than backslash or newline can be used instead of a slash to delimit the RE and the replacement. Within the RE and the replacement, the RE delimiter itself can be used as a literal character if it is preceded by a backslash.
Perhaps the closest to a standard, the POSIX/IEEE Open Group Base Specification says:
[2addr] s/BRE/replacement/flags
Substitute the replacement string for instances of the BRE in the
pattern space. Any character other than backslash or newline can
be used instead of a slash to delimit the BRE and the replacement.
Within the BRE and the replacement, the BRE delimiter itself can be
used as a literal character if it is preceded by a backslash.
When there is a slash (/) in the original string or the replacement string, we need to escape it with \. The following command works on Ubuntu 16.04 (sed 4.2.2):
sed 's/\/foo\/bar/\/baz\/qux/' file

sed doesn't replace variable

I'm trying to replace a regex line in an Apache config file.
I define:
OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"
When I execute:
sed -i 's/$OLD1/$NEW1/g' demo.conf
there's no change.
This is what I tried:
sed -i "s/${OLD1}/${NEW1}/g" 001-kms.conf
sed -i "s/"$OLD1"/"$NEW1"/g" 001-kms.conf
sed -i "s~${OLD1}~${NEW1}~g" 001-kms.conf
I expect the file to end up with $OLD1 replaced by $NEW1.
OLD1="[0-9]*.[0-9]+"
Because [, * and . all have special meaning in a sed regex, we need to escape them. For a simple case like this, something like the following could work:
OLD2=$(<<<"$OLD1" sed 's/[][\*\.]/\\&/g')
It will set OLD2 to \[0-9\]\*\.\[0-9\]+. Note that it doesn't handle all possible cases: it only escapes ], [, \, * and ., so for example ^, $ or / in OLD1 would still be interpreted by sed. Implementing a proper regex to properly escape, well, other regex I leave as an exercise to others.
Now you can:
sed "s/$OLD2/$NEW1/g"
Tested with:
OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"
sed "s/$(sed 's/[][\*\.]/\\&/g' <<<"$OLD1")/$NEW1/g" <<<'XYZ="[0-9]*.[0-9]+"'
will output:
XYZ="[a-z]*.[0-9]"
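A slightly more general sketch along the same lines, assuming single-line values that will be used in a basic regex (BRE) with / as the s delimiter. The helper names escape_bre and escape_repl are hypothetical. escape_bre escapes [ \ / . ^ $ * and deliberately leaves + alone, because \+ is an operator in GNU BRE; escape_repl escapes the characters that are special in the replacement (\ / &).

```shell
#!/usr/bin/env bash
# escape_bre (hypothetical): backslash-escape the BRE metacharacters
# [ \ / . ^ $ * so the value matches literally. Uses | as the s delimiter
# so the / inside the bracket expression needs no escaping.
escape_bre() {
  printf '%s\n' "$1" | sed 's|[[\\/.^$*]|\\&|g'
}

# escape_repl (hypothetical): escape \ / & for use in the replacement part.
escape_repl() {
  printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"
printf '%s\n' 'XYZ="[0-9]*.[0-9]+"' |
  sed "s/$(escape_bre "$OLD1")/$(escape_repl "$NEW1")/g"
```

This prints XYZ="[a-z]*.[0-9]". Values containing newlines are not handled.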
You need matching on an exact string
You would need something that can match on the exact string [0-9]*.[0-9]+, which sed does not support well.
Therefore I am instead using this pipeline, which replaces one character at a time (I think it is also easier to read):
echo "[0-9]*.[0-9]+" | sed 's/0/a/' | sed 's/9/z/' | sed 's/+//'
You would have to cat your files, or use find with -exec, to apply this pipeline.
I had tried the following (from other SO answers):
- sed 's/\<exact_string\>/replacement/' doesn't work, as \< and \> are left and right word boundaries respectively.
- sed 's/(CB)?exact_string/replacement/' was found in one answer but appears nowhere in the documentation.
I used Win 10 bash, git bash, and online Linux tools with the same results.
When I thought matching was on the pattern rather than the exact string
Replacement cannot be a regex - at most it can reference parts of the regex expression which matched. From man sed:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
Additionally, you have to escape . as \. in your regex if you want to match a literal full stop rather than any character; this applies in both basic and extended mode. In basic mode + is a literal character, so GNU sed needs \+ to mean "one or more"; with option -E for extended regex (as per the comment under your question), a plain + works.
$ echo "01.234--ZZZ" | sed 's/[0-9]*\.[0-9]\+/REPLACEMENT/g'
REPLACEMENT--ZZZ
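For comparison, here is the same substitution in both regex flavours, plus what happens when the dot is left unescaped. A small sketch assuming GNU sed, since \+ in a basic regex is a GNU extension:

```shell
#!/usr/bin/env bash
# BRE: \. is a literal full stop; \+ ("one or more") is a GNU extension.
echo "01.234--ZZZ" | sed 's/[0-9]*\.[0-9]\+/REPLACEMENT/g'

# ERE (-E): + is an operator as written; \. is still needed for a literal dot.
echo "01.234--ZZZ" | sed -E 's/[0-9]*\.[0-9]+/REPLACEMENT/g'

# Without the backslash, . matches any character, here the 'x'.
echo "01x234--ZZZ" | sed -E 's/[0-9]*.[0-9]+/REPLACEMENT/g'
```

All three commands print REPLACEMENT--ZZZ.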

How to correctly detect and replace apostrophe (') with sed?

I have a directory with many files whose names contain special characters and spaces. I want to perform an operation on all these files, so I'm trying to store all the filenames in list.txt and then run the command with this list.
The special characters in my list are & []'.
So basically I want to use sed to prefix each occurrence with \ (i.e. replace it with \ followed by the character in question).
E.g.: filename .txt => filename\ .txt, etc.
The thing is I have trouble handling apostrophes.
Here is my command so far:
ls | sed 's/\ /\\ /g' | sed 's/\&/\\&/g' | sed "s/\'/\\'/g" | sed 's/\[/\\[/g' | sed 's/\]/\\]/g'
At first I had issues with, I believe, the apostrophes in the sed script conflicting with the apostrophes surrounding it. So I used double quotes instead, but it still doesn't work.
I've tried all of these and none worked:
sed "s/\'/\\'/g" (escaping the apostrophe)
sed "s/'/\'/g" (escaping nothing)
sed "s/'/\\'/g" (escaping the backslash)
sed 's/"'"/\"'"/g' (double quoting single quote)
As a disclaimer, I must say I'm completely new to sed. I just ran my first sed command today, so maybe I'm doing something wrong that I didn't realize.
PS: I've seen these threads, but no answer worked for me:
https://unix.stackexchange.com/questions/157076/how-to-remove-the-apostrophe-and-delete-the-space
How to replace to apostrophe ' inside a file using SED
This may do:
cat file
avbadf
test&rr
more [ yes
this ]
and'df
sed -r 's/(\x27|&|\[|\])/\\\1/g' file
avbadf
test\&rr
more \[ yes
this \]
and\'df
\x27 is equal to single quote '
\x22 is equal to double quote "
Whoops, I found the answer to my question. Here is the working command:
sed "s/'/\\\'/g"
This will effectively replace any ' with \'.
However I'm having trouble understanding exactly what's happening here.
So if I understand correctly, we are escaping the backslash and the apostrophe in the replacement string. Now, if somebody could answer some of these, I would be grateful:
Why don't we need to escape the first quote (the one in the pattern to find) ?
Why do we have to escape the backslash whereas for the other characters, there's no need ?
Why do we need to escape the second quote (the one in the replacement string) ?
I think all of your sed matches actually need that replacement pattern. This one seems to work for all examples:
ls | sed "s/\ /\\\ /g" | sed "s/\&/\\\&/g" | sed "s/\[/\\\[/g" | sed "s/\]/\\\]/g" | sed "s/'/\\\'/g"
So it is s/regex/replacement/flags, and 'regex' and 'replacement' have different sets of special characters.
The apostrophe itself is not special to sed in either part, which is why the one in the regex needs no escaping. The extra backslashes are for the shell: inside double quotes, the shell turns \\ into \ and leaves \' and \& unchanged, so "s/'/\\\'/g" reaches sed as s/'/\\'/g. In the replacement, sed reads \\ as a literal backslash, followed by the literal '. With only two backslashes ("s/'/\\'/g"), sed receives s/'/\'/g, where the lone backslash is treated as an escape, so no backslash appears in the output.
The same applies to other characters. For example, \5 is a back-reference in the replacement expression, so to replace:
filename5.txt -> filename\5.txt
you would also need, as with the apostrophe:
sed "s/5/\\\5/g"
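One way to check what is going on: printf '%s\n' shows the exact script text the shell hands to sed after double-quote processing. A small sketch, assuming a POSIX shell and GNU sed:

```shell
#!/usr/bin/env bash
# Show the script text sed receives after the shell's double-quote processing.
printf '%s\n' "s/'/\\'/g"    # prints s/'/\'/g   (one backslash reaches sed)
printf '%s\n' "s/'/\\\'/g"   # prints s/'/\\'/g  (sed's escape for a literal \)

# Only the second form puts a backslash in front of the apostrophe.
printf '%s\n' "and'df" | sed "s/'/\\\'/g"
```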
Please try the following:
sed 's/[][ &'\'']/\\&/g' file
By using the same example by @Jotne, the result will be:
avbadf
test\&rr
more\ \[\ yes
this\ \]
and\'df
[How it works]
The regex part in the sed s command above just defines a character
class of & []', which should be escaped with a backslash.
The right square bracket ] does not need escaping when put
immediately after the left square bracket [.
The tricky part is the handling of the single quote.
We cannot put a single quote within single quotes even if we escape it.
The workaround is as follows: Say we have an assignment str='aaabbb'.
To put a single quote between "aaa" and "bbb", we can write
str='aaa'\''bbb'.
It may look puzzling, but it just concatenates three sequences:
1) to close the single-quoted string as 'aaa'.
2) to put a single quote with an escaping backslash as \'.
3) to restart the single-quoted string as 'bbb'.
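A short sketch of both halves of this explanation, first the concatenation on its own and then inside the sed command (assuming GNU sed; the sample lines come from the example above):

```shell
#!/usr/bin/env bash
# 'aaa', \' and 'bbb' are adjacent words, so the shell joins them into one.
str='aaa'\''bbb'
printf '%s\n' "$str"    # aaa'bbb

# Same trick inside the sed script: sed receives s/[][ &']/\\&/g, i.e. a
# bracket expression of ] [ space & ' and a replacement of \ plus the match.
printf '%s\n' 'more [ yes' "and'df" | sed 's/[][ &'\'']/\\&/g'
```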
Hope this helps.

How to replace from the nth occurrence to the mth occurrence in a very long string using sed

I have a very long string, with over 2000 occurrences, and it looks like this:
Input:
1a2a3a4a5a6a7a8absoad8ryaa90thneas... and more than
I want to replace the 3rd through the 450th occurrences in the string, to get:
1a2a3A4A5A6A7A8AbsoAd8ryAA90thneAs... and more than
I replaced "a" with "A", but that replaced everything from the 3rd occurrence to the end, and I only want to replace from the 3rd to the 450th. This is my old script:
echo "1a2a3a4a5a6a7a8absoad8ryaa90thneas..." | sed 's/a/A/3g';
Can anyone help me? Or is there another way? Thanks!
Save the string to a variable, then use parameter (substring) expansion, ${string:offset:length}, to pick out the character you want to replace, together with bash's global replacement ${string//pattern/replacement}. Consider the following example:
## Our sample string. ##
string="abcde01234"
## New, changed string. ##
echo "${string//${string:0:1}/${string:5:1}}"
In this example, when run as a bash script, every occurrence of the first character of $string (here only the leading a) is replaced with the sixth character of $string (0). Knowing this little trick, I am sure you will figure out a way to do what you need without having to use sed. Or, you can use the same parameter expansion inside a sed command.
## Our sample string ##
string="abcde01234"
## New, changed, string. Output is the same as above example. ##
sed -e "s/${string:0:1}/${string:5:1}/g" <(echo "$string")
This should be enough to get you headed in the right direction as far as figuring out the best way for your needs.
Choose a character that you know for sure is not present in your input, for example the ASCII NUL character (written \x0 in GNU sed); then you could do:
$ # replacing 3rd to 4th
$ echo "a1a23a5a62a34a235a" | sed 's/a/\x0/3g; s/\x0/a/3g; s/\x0/A/g'
a1a23A5A62a34a235a
$ # replacing 3rd to 5th
$ echo "a1a23a5a62a34a235a" | sed 's/a/\x0/3g; s/\x0/a/4g; s/\x0/A/g'
a1a23A5A62A34a235a
$ # replacing 3rd to 6th
$ echo "a1a23a5a62a34a235a" | sed 's/a/\x0/3g; s/\x0/a/5g; s/\x0/A/g'
a1a23A5A62A34A235a
This might work for you (GNU sed):
sed -r 's/a/\n&/3;s//&\n/450;h;y/a/A/;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/' file
Split the string up into pieces using a marker (newline cannot occur in an unadulterated string because it is how sed splits up files), make a copy and then make changes, append the copy and rearrange the pieces.
Alternatively, use GNU sed's number-plus-g flag, which replaces from the Nth match onward:
sed 's/a/\n/451g;s//A/3g;s/\n/a/g' file
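The same three-step idea generalises to any range. A minimal sketch, assuming GNU sed and a single plain character with no regex or replacement meaning; replace_range is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# replace_range (hypothetical helper): on each input line, replace the
# FROM-th through TO-th occurrences (1-based, inclusive) of CHAR with REPL.
# Assumes GNU sed (\n in the replacement, number combined with the g flag)
# and that CHAR and REPL have no special meaning to sed.
replace_range() {
  local from=$1 to=$2 char=$3 repl=$4
  # 1) park every occurrence after TO as a newline,
  # 2) replace the remaining occurrences from FROM onward,
  # 3) turn the parked newlines back into CHAR.
  sed "s/$char/\n/$((to + 1))g; s/$char/$repl/${from}g; s/\n/$char/g"
}

echo "a1a23a5a62a34a235a" | replace_range 3 5 a A
```

This prints a1a23A5A62A34a235a; for the question, the call would be replace_range 3 450 a A < file.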
sed -E ':again ; s/^(.{4,449})a/\1A/ ; t again' file
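The t command branches to a label if a substitution has been made since the last input line was read or the last t branch was taken, so it works as a conditional goto :label. For a character-position range [x:y], the regexp ^(.{x-1,y-1})a matches greedily, so each pass replaces the rightmost a still in range, and the loop repeats until there are no more matches. Note that this counts character positions, not occurrences of a.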
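To see that the loop counts character positions rather than occurrences, here is a small check with the range shrunk to positions 3 through 5 (so the prefix is .{2,4}). The script is the answer's, written one command per line; upper_a_positions is a hypothetical wrapper name, and GNU sed is assumed:

```shell
#!/usr/bin/env bash
# upper_a_positions (hypothetical): uppercase every 'a' at character
# positions 3..5. Each pass rewrites the rightmost 'a' still in range;
# 't again' jumps back to the label while substitutions keep happening.
upper_a_positions() {
  sed -E ':again
s/^(.{2,4})a/\1A/
t again'
}

echo 'xaxaxaxa' | upper_a_positions
```

This prints xaxAxaxa: only the a at position 4 lies in the range, while the 3rd a by count (position 6) is left alone.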

Fedora sed command: replace special characters

I am totally new to sed, and as part of writing a script I am trying to replace a specific string in a file. I know special characters need to be escaped with a backslash, but the problem is that if the special character is the first character of the word, the word is not replaced.
For example, my file contains:
sldgfkls $bdxcv sldflksd
If I run the following:
sed -i 's/\b\$bdxcv\b/abcd/' filename
the word is not replaced. But if the file contains:
sldgfkls a$bdxcv sldflksd
and I run:
sed -i 's/\ba\$bdxcv\b/abcd/' filename
the word is replaced.
Please help me here.
Clearly, \b does not consider a dollar sign to be a word character, so there is no word boundary for it to match between space and $.
Perhaps you want this instead:
sed -i 's/\(^\|[\t ]\)\$bdxcv\b/\1abcd/' filename
Assuming yours is GNU sed, see https://www.gnu.org/software/sed/manual/sed.html which contains this definition:
A “word” character is any letter or digit or the underscore character.
and thus not dollar sign.
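If you want to stay with sed, one sketch (assuming GNU sed -E) is to spell out the boundary yourself: start of line or a blank before the token, and a blank or end of line after it. replace_bdxcv is a hypothetical wrapper around the token from the question:

```shell
#!/usr/bin/env bash
# replace_bdxcv (hypothetical): replace the token $bdxcv with abcd when it is
# delimited by blanks or the line ends, instead of relying on \b.
# No g flag: the trailing blank consumed by one match would be needed as the
# leading blank of an adjacent token, so this replaces one token per line.
replace_bdxcv() {
  sed -E 's/(^|[[:blank:]])\$bdxcv([[:blank:]]|$)/\1abcd\2/'
}

printf '%s\n' 'sldgfkls $bdxcv sldflksd' 'sldgfkls a$bdxcv sldflksd' | replace_bdxcv
```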
sed cannot operate on strings, only on regular expressions. Trying to figure out which characters need to be escaped to disable their regexp (or sed delimiter or sed backreference) meaning to make a regexp in sed behave as if it were a string is a fool's errand; just use a tool that can operate on strings, e.g. awk.
$ awk '{for (i=1;i<=NF;i++) if ($i == "$bdxcv") $i="abcd"} 1' file
sldgfkls abcd sldflksd
The above uses string comparison and string assignment - no need to escape anything unless one of the strings contained the string delimiter, ".
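A variant of the same awk idea as a reusable function, with the search and replacement strings passed in rather than hard-coded. replace_field is a hypothetical name; passing the values through the environment and reading them with ENVIRON (instead of -v) keeps awk from interpreting backslash escapes in them:

```shell
#!/usr/bin/env bash
# replace_field (hypothetical): replace every whitespace-separated field that
# equals OLD with NEW. Values arrive via the environment, so no escaping is
# needed in the shell, in awk, or in a regex.
replace_field() {
  OLD=$1 NEW=$2 awk '{
    for (i = 1; i <= NF; i++)
      if ($i == ENVIRON["OLD"]) $i = ENVIRON["NEW"]
  } 1'
}

printf '%s\n' 'sldgfkls $bdxcv sldflksd' | replace_field '$bdxcv' abcd
```

Changing a field makes awk rebuild the line with single spaces (OFS), so runs of whitespace on changed lines are collapsed.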
