sed pattern command - linux

I have this sed command, applied to $1, which is a file.
What I'd like to know is how sed evaluates the whole succession of slashes and backslashes,
and what each character in the pattern means:
sed '/^\/\*/d/.*\*\//d' $1
As far as I know,
'/^ ....../d'
deletes lines matching some pattern at the beginning of the line (considering the second d).
What does the first d stand for, and what about the dot and the / or /\ (unescaped chars?)
Could someone please explain this to me?

This is actually two sed commands back to back:
/^\/\*/d
/.*\*\//d
^ matches start of line, \/ matches a literal forward-slash, \* matches a literal asterisk. (The forward-slash is the delimiter of the /regex/ address and the asterisk is a regex meta-character, so both need to be escaped with a backslash to match literally.)
.* matches any sequence of characters, \* matches a literal asterisk again, \/ matches a literal slash again.
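Put it all together: the first command deletes every line that starts with /*, and the second deletes every line that contains */. Because d deletes the whole line rather than just the matched text, a line like this one is removed entirely, including the code after the comment:
/* This will go away */This will go away too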
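A small sketch of that behavior, assuming GNU sed; the sample lines and the input/output variable names are made up for illustration. The two commands are separated with a semicolon here because, as far as I can tell, GNU sed rejects them when they are written back to back with no separator.

```shell
# Hypothetical sample input: two comment-related lines and two plain ones.
input='/* leading comment */
int x;
foo(); /* trailing comment */
int y;'

# /^\/\*/d   deletes lines that START with /*
# /.*\*\//d  deletes lines that CONTAIN */ anywhere
output=$(printf '%s\n' "$input" | sed '/^\/\*/d;/.*\*\//d')

# Only the two lines without comment markers survive.
printf '%s\n' "$output"
```

Note that the line with the trailing comment disappears completely, code included, which is why d is a blunt tool for stripping comments.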

Related

How to correctly detect and replace apostrophe (') with sed?

I have a directory with many files whose names contain special characters and spaces. I want to perform an operation on all these files, so I'm trying to store all the filenames in a list.txt and then run the command with this list.
The special characters in my list are & []'.
So basically I want to use sed to replace each occurrence with \ + the character in question.
E.g.: filename .txt => filename\ .txt etc...
The thing is, I have trouble handling apostrophes.
Here is my command as of now :
ls | sed 's/\ /\\ /g' | sed 's/\&/\\&/g' | sed "s/\'/\\'/g" | sed 's/\[/\\[/g' | sed 's/\]/\\]/g'
At first I had issues with, I believe, the apostrophes in the command string conflicting with the apostrophes surrounding the string. So I used double quotes instead, but it still doesn't work.
I've tried all these and nothing worked :
sed "s/\'/\\'/g" (escaping the apostrophe)
sed "s/'/\'/g" (escaping nothing)
sed "s/'/\\'/g" (escaping the backslash)
sed 's/"'"/\"'"/g' (double quoting single quote)
As a disclaimer, I must say I'm completely new to sed. I just ran my first sed command today, so maybe I'm doing something wrong that I didn't realize.
PS: I've seen these threads, but no answer worked for me:
https://unix.stackexchange.com/questions/157076/how-to-remove-the-apostrophe-and-delete-the-space
How to replace to apostrophe ' inside a file using SED
This may do:
cat file
avbadf
test&rr
more [ yes
this ]
and'df
sed -r 's/(\x27|&|\[|\])/\\\1/g' file
avbadf
test\&rr
more \[ yes
this \]
and\'df
\x27 is equal to single quote '
\x22 is equal to double quote "
Whoops, I found the answer to my question. Here is the working input :
sed "s/'/\\\'/g"
This will effectively replace any ' with \'.
However I'm having trouble understanding exactly what's happening here.
So if I understand correctly, we are escaping the backslash and the apostrophe in the replacement string. Now, if somebody could answer some of these questions, I would be grateful:
Why don't we need to escape the first quote (the one in the pattern to find) ?
Why do we have to escape the backslash whereas for the other characters, there's no need ?
Why do we need to escape the second quote (the one in the replacement string) ?
I think all of your sed matches actually need that replacement pattern. This one seems to work for all examples:
ls | sed "s/\ /\\\ /g" | sed "s/\&/\\\&/g" | sed "s/\[/\\\[/g" | sed "s/\]/\\\]/g" | sed "s/'/\\\'/g"
So it is s/regex/replacement/flags, and 'regex' and 'replacement' have different sets of special characters.
The only one that's different is s/'/\\\'/g, and only because ' is not special in the regex, so the pattern side needs no escape. (GNU sed does document an obscure \' operator, but according to the docs it belongs to the regex side, where it matches the end of the buffer in multi-line mode, not to the replacement.) In the replacement, a backslash is itself special, so a literal backslash has to reach sed as \\.
For example, \5 is a special character in the replacement expression, so to replace:
filename5.txt -> filename\5.txt
You would also need, as with apostrophe:
sed "s/5/\\\5/g"
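The reason is more likely the shell than sed's parser: inside double quotes the shell itself collapses \\ into a single \ before sed ever sees the script. So "\\'" reaches sed as \', which it reads as a plain ', while "\\\'" reaches sed as \\', an escaped backslash followed by '.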
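To see what sed actually receives, it can help to print each script after the shell has processed the double quotes. This is only a sketch; the variable names two, three and escaped are illustrative.

```shell
# What the shell hands to sed for each double-quoted script.
# Inside double quotes, \\ collapses to one backslash; \' is left untouched.
two=$(printf '%s\n' "s/'/\\'/g")
three=$(printf '%s\n' "s/'/\\\'/g")

printf '%s\n' "$two"     # s/'/\'/g   (sed sees an escaped quote)
printf '%s\n' "$three"   # s/'/\\'/g  (sed sees an escaped backslash, then a quote)

# With the three-backslash form the replacement is a literal backslash plus '.
escaped=$(printf '%s\n' "and'df" | sed "s/'/\\\'/g")
printf '%s\n' "$escaped"
```

Writing the sed script in single quotes wherever possible avoids this extra layer of shell processing.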
Please try the following:
sed 's/[][ &'\'']/\\&/g' file
By using the same example by @Jotne, the result will be:
avbadf
test\&rr
more\ \[\ yes
this\ \]
and\'df
[How it works]
The regex part in the sed s command above just defines a character
class of & []', which should be escaped with a backslash.
The right square bracket ] does not need escaping when put
immediately after the left square bracket [.
The confusing part is the handling of the single quote.
We cannot put a single quote within single quotes, even if we escape it.
The workaround is as follows: say we have an assignment str='aaabbb'.
To put a single quote between "aaa" and "bbb", we can write
str='aaa'\''bbb'.
It may look puzzling, but it just concatenates three sequences:
1) close the single-quoted string: 'aaa'.
2) add a single quote escaped with a backslash: \'.
3) restart the single-quoted string: 'bbb'.
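A minimal sketch of the two ideas above, the bracket expression and the '\'' idiom, run on sample names taken from the example; the variable names are illustrative.

```shell
# Close the single-quoted string, add an escaped quote, reopen the string.
str='aaa'\''bbb'
printf '%s\n' "$str"      # aaa'bbb

# Same idiom inside the sed script: the class holds ] [ space & and '
# In the replacement, \\ is a literal backslash and & is the matched character.
escaped=$(printf '%s\n' "and'df" 'more [ yes' 'test&rr' |
  sed 's/[][ &'\'']/\\&/g')
printf '%s\n' "$escaped"
```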
Hope this helps.

Remove text between one string and 1st occurrence of another string

I have found several solutions to remove text between two strings but I guess my case is a little different.
I am trying to convert this:
/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume
To this:
/nz/kit/bin/adm/tools/hostaekresume
Basically remove the version specific information from the filename.
The solutions I have found remove everything from the word kit to the last occurrence of /. I need something to remove from kit to the first occurrence.
The most common solution I have seen is:
sed -e 's/\(kit\).*\(\/\)/\1\2/'
Which produces:
/nz/kit/hostaekresume
How can I remove only up to the first /? I assume this can be done with sed or awk, but I'm open to suggestions.
$ sed 's|\(kit\)[^/]*|\1|' <<< '/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'
/nz/kit/bin/adm/tools/hostaekresume
This uses a different delimiter (| instead of /) so we don't have to escape the /. Then, for non-greedy matching, it uses [^/]*: any number of characters other than /, which matches everything between kit and the next /.
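The difference between the greedy .* and the bounded [^/]* can be seen side by side in this sketch; the variable names path, greedy and bounded are only for illustration.

```shell
path='/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'

# Greedy: .* runs to the LAST / on the line, so whole directories are lost.
greedy=$(printf '%s\n' "$path" | sed 's|\(kit\).*\(/\)|\1\2|')

# Bounded: [^/]* cannot cross a /, so it stops before the first one.
bounded=$(printf '%s\n' "$path" | sed 's|\(kit\)[^/]*|\1|')

printf '%s\n' "$greedy"    # /nz/kit/hostaekresume
printf '%s\n' "$bounded"   # /nz/kit/bin/adm/tools/hostaekresume
```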
Alternatively, if you know that what you want to remove consists of dots and digits, and nothing else in the string contains them, you can use parameter expansion:
$ var='/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'
$ echo "${var//[[:digit:].]}"
/nz/kit/bin/adm/tools/hostaekresume
The syntax is ${parameter/pattern/string}, where pattern in the expanded parameter is replaced by string. If we use // instead of /, all occurrences instead of just the first are replaced.
In our case, parameter is var, the pattern is [[:digit:].] (digits or a dot – this is a glob pattern, not a regular expression, by the way), and we've skipped the /string part, which just removes the pattern (replaces it with nothing).
You need perl for non-greedy regex. sed doesn't do that yet.
Also, use | as a delimiter since / can cause confusion when you have it in your regex.
perl -pe 's|(kit).*?(/.*)|$1$2|'
The ? after the .* makes the pattern non-greedy, so it stops at the first instance of /.
echo "/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume" | perl -pe 's|(kit).*?(/.*)|$1$2|'
returns
/nz/kit/bin/adm/tools/hostaekresume
echo "/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume" | awk '{sub(/\.7\.2\.0\.7/,"")}1'
/nz/kit/bin/adm/tools/hostaekresume

How cut characters from string and put it at the end- In shell

I want to be able to do the following:
String1= "HELLO 3002_3322 3.2.1.log"
And get output like:
output = "3002_3322 3.2.1.log HELLO"
I know the command sed is able to do this but I need some guidance.
Thanks!
AWK
awk is one tool to do something like that:
echo "HELLO 3002_3322 3.2.1.log" | awk '{print $2" "$3" "$1}'
What it does:
awk, without the -F delimiter flag, splits on whitespace sequences
that means HELLO, 3002_3322 and 3.2.1.log will be seen as separate fields
HELLO is referred to by $1; 3002_3322 is $2 and so on
we print $2, one space, $3, one space, then $1
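An equivalent sketch that lets awk supply the spaces itself: separating the fields with commas makes print join them with the output field separator, which is a single space by default. The variable names are illustrative.

```shell
line='HELLO 3002_3322 3.2.1.log'

# The commas tell awk to put the output field separator between the fields.
output=$(printf '%s\n' "$line" | awk '{ print $2, $3, $1 }')

printf '%s\n' "$output"   # 3002_3322 3.2.1.log HELLO
```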
SED
I have a not-so-pretty sed example for you:
echo "HELLO 3002_3322 3.2.1.log" | sed 's_\(.*\)\s\(.*\)\s\(.*\)_\2 \3 \1_'
What it does:
nomenclature is s_<pattern>_<replacement>_
first s stands for substitute
_ is the delimiter
(.*) is parenthesis dot star parenthesis. That is the first group of characters we are asking sed to match. .* means match any sequence of characters or no characters at all. Ignore the \ before ( and ) for now
Notice the \s after the group. \s matches one whitespace character (a GNU sed extension). So, we are asking sed to separate out (.*)\s - i.e. a group followed by a space
We repeat that to tell sed - (group1) space (group2) space (group3)
First group's shorthand is \1, group2's shorthand is \2 etc.
For replacement, we tell sed to arrange \2 (group2) first, then \3 (group3) and then \1 (group1)
In sed's default (basic) regex syntax a bare ( is a literal parenthesis; it becomes a grouping operator only when escaped with a backslash. So, (.*)\s(.*)\s(.*) becomes \(.*\)\s\(.*\)\s\(.*\). Oh so pretty!
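The same reordering can be sketched without the backslash clutter, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do, as far as I know). A literal space and [^ ]* are used here instead of \s so the example does not rely on GNU-only escapes; the variable names are illustrative.

```shell
line='HELLO 3002_3322 3.2.1.log'

# With -E, ( ) group without backslashes.
# Groups: 1 = first word, 2 = second word, 3 = rest of the line.
output=$(printf '%s\n' "$line" | sed -E 's_([^ ]*) ([^ ]*) (.*)_\2 \3 \1_')

printf '%s\n' "$output"   # 3002_3322 3.2.1.log HELLO
```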
In sed you can do:
sed 's/\([^[:blank:]]*\)[[:blank:]]*\(.*\)/\2 \1/'
Which outputs 3002_3322 3.2.1.log HELLO.
Explanation
The first word is captured by
\([^[:blank:]]*\)
The \(\) means I want to capture this group to use later. [:blank:] is a POSIX character class for horizontal whitespace (space and tab). You can see the other POSIX character classes here:
http://www.regular-expressions.info/posixbrackets.html
The outer [] means match any one of the listed characters, and the ^ negates it: any character except those listed in the character class. Finally the * means any number of occurrences (including 0) of the previous item. So in total [^[:blank:]]* means match a run of characters that are not whitespace, i.e. the first word. We have to use this somewhat complicated regex because POSIX sed only supports BRE (basic regex), which matches greedily, and to find the first word with .* we would need non-greedy matching.
[[:blank:]]* uses the same class without the negation, so it matches a run of consecutive whitespace.
\(.*\) This means capture the rest of the line. The . means any single character, so combined with the * it means match the rest of the characters.
For the replacement, the \2 \1 means replace the pattern we matched with the 2nd capture group, a space, then the first capture group.
This might work for you (GNU sed):
sed -r 's/^(\S+)(\s+)(.*)/\3\2\1/' file
Pattern match non-spaces, spaces and what is left and then use the remembered patterns (back references) in the replacement part of the substitution command.
N.B. The -r argument just removes the need for copious backslashes, so the same solution may be written as:
sed 's/^\(\S\S*\)\(\s\s*\)\(.*\)/\3\2\1/' file
This also removes the syntactic sugar of the metacharacter +, which means one or more of the preceding pattern.
Further note, that \S and \s may be replaced by [^[:space:]] and [[:space:]] respectively. Leading to:
sed 's/^\([^[:space:]][^[:space:]]*\)\([[:space:]][[:space:]]*\)\(.*\)/\3\2\1/' file
You can do this too (without awk or sed):
#!/bin/sh
String1="HELLO 3002_3322 3.2.1.log"
start="${String1%% *}"
end="${String1#* }"
output="$end $start"
echo "$output"
Or using cut (in Bash):
#!/bin/bash
String1="HELLO 3002_3322 3.2.1.log"
rstr="$(echo "$String1" |cut -d" " -f1)"
output="${String1/$rstr /} $rstr"
echo "$output"

How can I use sed to get an xml value

How can I use sed to get the SOMETHING in <version.suffix>SOMETHING</version.suffix>?
I tried sed 's#.*>\(.*\)\<version\.suffix\>#\1#', but it fails.
Try this one:
sed 's/<.*>\(.*\)<.*>/\1/'
It should be general enough to get every xml value.
If you need to eliminate the indentation add \s* at the beginning like this:
sed 's/\s*<.*>\(.*\)<.*>/\1/'
Alternatively if you only want version.suffix's value, you can make the command more specific like this:
sed 's/<version\.suffix>\(.*\)<.*>/\1/'
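One caveat with the commands above is that sed prints every input line by default, so on a multi-line file the non-matching lines come through unchanged. A sketch that prints only the extracted value, using -n together with the p flag; the sample XML and the variable names are made up for illustration.

```shell
# Hypothetical multi-line input with the tag of interest on one line.
xml='<project>
  <version.suffix>SOMETHING</version.suffix>
  <name>demo</name>
</project>'

# -n suppresses automatic printing; the p flag prints only lines where
# the substitution succeeded. The leading and trailing .* drop indentation.
value=$(printf '%s\n' "$xml" |
  sed -n 's/.*<version\.suffix>\(.*\)<\/version\.suffix>.*/\1/p')

printf '%s\n' "$value"   # SOMETHING
```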
You could use the below sed command,
$ echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#^<[^>]*>\(.*\)<\/[^>]*>$#\1#'
SOMETHING
^<[^>]*> Matches the first tag string <version.suffix>.
\(.*\)<\/[^>]*>$ Characters up to the closing tag are captured, and the closing tag itself is matched by <\/[^>]*>.
Finally all the matched characters are replaced by the characters captured in group 1.
Your regex is almost correct; the only problem is that you forgot the / inside the closing tag (and the \< and \> should be plain < and >).
$ echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#.*>\(.*\)</version\.suffix>#\1#'
|<-Here
SOMETHING
Many ways possible, e.g:
with sed
echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#<[^>]*>##g'
or grep
echo '<version.suffix>SOMETHING</version.suffix>' | grep -oP '<version\.suffix>\K[^<]+(?=</version\.suffix>)'
Assuming the formatting of the question is accurate, when I run the example in the question as-is:
$ echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#.*>\(.*\)\<version\.suffix\>#\1#'
I see the following output:
SOMETHING</>
In case my formatting skills fail me, this output ends with the trailing left angle bracket, a forward slash, and finally the right angle bracket.
So, why this "failure"? Well, on my system (Linux with GNU grep 2.14), grep(1) includes the following snippet:
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word.
Other answers suggest good alternatives to extract the value in XML tag syntax; use them.
I just wanted to point out why the RE in the original problem fails on current Linux systems: some escape sequences match no actual characters, but instead match empty word boundaries in tools that support these GNU regex extensions. So, in this example, the brackets in the source are matched in unexpected ways:
the \(.*\) has matched SOMETHING</, to be printed by the \1 back-reference
the left-hand side of version.suffix is matched by \<
version.suffix is matched by version\.suffix
the right-hand side of version.suffix is matched by \>
the trailing > character remains in sed's pattern space and is printed.
TL;DR -"\X" does not mean "just match an X" for all X!
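The two spellings can be compared directly in a short sketch. It assumes GNU sed, where \< and \> are word-boundary operators; other sed implementations may treat them differently. The variable names are illustrative.

```shell
xml='<version.suffix>SOMETHING</version.suffix>'

# \< and \> match empty word boundaries here, not angle brackets, so the
# closing tag's "</" ends up inside the captured group and the final ">"
# is left unmatched. The answer above reports SOMETHING</> for this case.
wrong=$(printf '%s\n' "$xml" | sed 's#.*>\(.*\)\<version\.suffix\>#\1#')

# Plain < and > (plus the missing /) match the literal closing tag.
right=$(printf '%s\n' "$xml" | sed 's#.*>\(.*\)</version\.suffix>#\1#')

printf '%s\n' "$wrong"
printf '%s\n' "$right"   # SOMETHING
```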

Bash - Changing configuration file with sed

I've been having some problems with a shell script that changes a configuration file named ".backup.conf".
The configuration file looks like this:
inputdirs=(/etc /etc/apm /usr/local)
outputdir="test_outputdir"
backupmethod="test_outputmethod"
loglocation="test_loglocation"
My script needs to change one of the configuration file variables, and I've had no trouble with the last 3 variables.
If I wanted to change variable "inputdirs" /etc/ to /etc/perl, what expression should I use?
If I use echo with append, it will only append it to the end of the file.
I've tried using sed in the following format:
sed -i 's/${inputdirs[$((izbor-1))]}/$novi/g' .backup.conf
where "izbor" is the index of the entry in inputdirs that I want to change and "novi" is the new path (e.g. /etc/perl).
So, with the configuration file above, and with the variables izbor=1 and novi=/etc/perl, I should change the first entry of inputdirs from /etc to /etc/perl,
and the variable inputdirs should finally look like inputdirs=(/etc/perl /etc/apm /usr/local)
Thank you for your help!
You could try this:
enovi="$(printf '%s\n' "$novi" | sed -e 's/[\\&/]/\\&/g')"
izbor1="$(expr "$izbor" - 1)"
sed -r -i -e "s/([(]([^ ]* ){$izbor1})[^ )]*/\\1$enovi/" config.txt
A summary of the commands:
The first line generates a variable $enovi that has the escaped contents of $novi. Basically, the following characters are escaped: &, \, and /. So /etc/perl becomes \/etc\/perl.
We create a new variable decrementing $izbor.
This is the actual substitute expression. I'll explain it in parts:
First we match the parenthesis character [(].
We will now search for a sequence of non-spaces followed by a space ([^ ]* ).
This search (identified by grouping in the inner parenthesis) is repeated $izbor1 times ({$izbor1})
The previous expressions are grouped into an outer parenthesis group in order to be captured into an auxiliary variable \1.
We now match the word we want to replace. It is formed by a sequence of characters that are neither spaces nor a closing parenthesis (this is to handle the case of the last word).
The replacement is formed by the captured value \1, followed by our new string.
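The steps above can be tried on a single line without touching any file. This is a sketch: replace_nth is a hypothetical helper name, it uses -E (the portable spelling of GNU's -r), and it pipes the line through sed instead of editing in place.

```shell
# replace_nth LINE N NEW: hypothetical helper that replaces the N-th
# (1-based) entry inside the parentheses of LINE with NEW.
replace_nth() {
  line=$1 izbor=$2 novi=$3
  # Escape \, & and / so the new path is safe inside the replacement.
  enovi=$(printf '%s\n' "$novi" | sed -e 's/[\\&/]/\\&/g')
  izbor1=$((izbor - 1))
  # Skip "(" plus izbor1 space-terminated words, then replace the next word.
  printf '%s\n' "$line" |
    sed -E "s/([(]([^ ]* ){$izbor1})[^ )]*/\\1$enovi/"
}

first=$(replace_nth 'inputdirs=(/etc /etc/apm /usr/local)' 1 /etc/perl)
last=$(replace_nth 'inputdirs=(/etc /etc/apm /usr/local)' 3 /etc/perl)

printf '%s\n' "$first"   # inputdirs=(/etc/perl /etc/apm /usr/local)
printf '%s\n' "$last"    # inputdirs=(/etc /etc/apm /etc/perl)
```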
Hope this helps =)
If you are trying to use $izbor as an index, it will probably want to be a flag to s///. Assuming your input matches ^inputdirs=( (with no whitespace), you can probably get away with:
sed -i '/^inputdirs=(/{
s/(/( /; s/)/ )/; # Insert spaces inside parentheses
s# [^ ][^ ]*# '"$novi#$izbor"';
s/( /(/; s/ )/)/; } # Remove inserted spaces
' .backup.conf
The first two expressions ensure that you have whitespace inside the parentheses, so they may not be necessary if your input already has whitespace there. It's a bit obfuscated above, but basically the replacement you are doing is something like:
s# [^ ][^ ]*# /etc/perl#2
where the 2 flag tells sed to replace only the second occurrence of the match. The pattern deliberately matches only the space in front of each word: if it also consumed the space after the word, the following word could no longer match, so the occurrence count would skip entries and the replacement would swallow the separator. This is really fragile, since it requires no whitespace before inputdirs and does not handle tabs, but it should work for you. Also, some seds allow [^ ][^ ]* to be written more simply as [^ ]\+ (or [^ ]+ with -E), but that is not universal.
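A sketch of this occurrence-flag approach on a single line, written so that the pattern matches only the space in front of each word and every entry counts as its own occurrence. replace_occurrence is a hypothetical helper name, and the sketch assumes $novi contains no #, & or backslash.

```shell
# replace_occurrence LINE N NEW: hypothetical helper that replaces the
# N-th word inside the parentheses using the numeric flag of s///.
replace_occurrence() {
  line=$1 izbor=$2 novi=$3
  printf '%s\n' "$line" | sed '/^inputdirs=(/{
s/(/( /
s/)/ )/
s# [^ ][^ ]*# '"$novi#$izbor"'
s/( /(/
s/ )/)/
}'
}

first=$(replace_occurrence 'inputdirs=(/etc /etc/apm /usr/local)' 1 /etc/perl)
second=$(replace_occurrence 'inputdirs=(/etc /etc/apm /usr/local)' 2 /etc/perl)

printf '%s\n' "$first"    # inputdirs=(/etc/perl /etc/apm /usr/local)
printf '%s\n' "$second"   # inputdirs=(/etc /etc/perl /usr/local)
```

The temporary space before ) keeps the closing parenthesis from being swallowed when the last entry is the one being replaced.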

Resources