grep is not working as expected

I have tried to grep in a file.
The file has 5 entries:
vivek
vivek.a
a.vivek
vivek_a
a_vivek
When I run grep -iw vivek filename, it should give me
vivek only, but it gives:
vivek
vivek.a
a.vivek

Looks fine to me. . is a non-word character. If you meant something else then you should have used a more-specific regex instead of using -w.

It does that because the definition of a word (which is what the -w option uses) permits . to separate words, though _ is considered part of a word. This definition is useful for programming languages, but not so useful for English text.

A word is a sequence of letters, digits and underscores, so any other character marks a word boundary. Therefore, in the line "vivek.a", the dot marks the end of a word, and the characters before it form the word "vivek", which matches the word you are searching for with -w.
So, one way is to define your own word boundaries like this:
$ grep -i -e "[[:space:]]vivek[[:space:]]" -e "^vivek[[:space:]]" -e "[[:space:]]vivek$" -e "^vivek$" file
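As a rough illustration of the difference (a sketch only, assuming a grep that supports -w, such as GNU or BSD grep; the variable names are made up for this example), the five entries from the question can be run through -w, through -x (match the whole line), and through an explicit whitespace-delimited pattern:

```shell
# The five entries from the question, one per line.
entries='vivek
vivek.a
a.vivek
vivek_a
a_vivek'

# -w: "vivek" must be delimited by non-word characters, and . is one of them.
word_match=$(printf '%s\n' "$entries" | grep -iw 'vivek')

# -x: the pattern must match the entire line.
line_match=$(printf '%s\n' "$entries" | grep -ix 'vivek')

# Custom boundaries: only whitespace or the line edges may surround the name.
space_match=$(printf '%s\n' "$entries" | grep -iE '(^|[[:space:]])vivek($|[[:space:]])')

printf 'with -w:\n%s\n' "$word_match"
printf 'with -x:\n%s\n' "$line_match"
printf 'with custom boundaries:\n%s\n' "$space_match"
```

If every name sits on its own line, -x is probably the shortest fix; the whitespace-delimited pattern is the one to reach for when several names share a line.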

Related

Output the names of all files from file.txt, having the .conf extension

I need to output from a file file.txt the names of all files with the .conf extension.
grep .conf file.txt
But in the end, I get a file called dconf and a file with the config extension. How can I output everything else, but without these two?

The '.' has a special meaning in a regular expression: it matches any single character. If you want to match only the dot itself, you have to escape it with a backslash:
grep "\.conf" file.txt
The pattern must also be quoted so that the shell passes the backslash through to grep unchanged.
To experiment with regular expressions, you can use an online regex tester.
Add on:
From the comments: how can I exclude a file named xyz.config from the list?
Answer: You have to tell grep that the match must end at the end of a word with:
grep "\.conf\>" file.txt

TL;DR: you should instead do:
grep "\.conf\>" file.txt
grep uses regular expressions. The . character in a regex is a metacharacter which means "match any one character." So your command means "match any string which contains one character followed by c o n f in that order."
So, your regular expression will match what you are looking for, but it will also match strings that have things after your match (your .config example) as well as any character followed by "conf" (your dconf example).
So instead you want to tell grep that you are looking for a literal . by escaping that character in your regular expression with a preceding backslash (\), and you want to describe what the end of your match looks like, which may be the end of the line or simply the end of a word.
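To make the three behaviours concrete, here is a small sketch. The file names in the list are invented for illustration, \> is assumed to be available (it is a GNU extension, also accepted by several other greps), and the $ variant assumes one file name per line:

```shell
# Hypothetical contents of file.txt, one name per line.
list='httpd.conf
dconf
app.config
nginx.conf
notes.conf.bak'

# Unescaped dot: any character followed by "conf", so everything matches.
loose=$(printf '%s\n' "$list" | grep '.conf')

# Literal dot plus end-of-word: drops dconf and app.config,
# but still accepts notes.conf.bak because . ends the word "conf".
word_end=$(printf '%s\n' "$list" | grep '\.conf\>')

# Literal dot plus end-of-line: only names that really end in .conf.
line_end=$(printf '%s\n' "$list" | grep '\.conf$')

printf 'loose:\n%s\n\nword end:\n%s\n\nline end:\n%s\n' "$loose" "$word_end" "$line_end"
```

Single quotes are used around the patterns so the shell leaves both the backslash and the $ alone.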

sed doesn't replace variable

I'm trying to replace a line that contains a regex in an Apache config file.
I define:
OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"
When I execute:
sed -i 's/$OLD1/$NEW1/g' demo.conf
there's no change.
This is what I also tried:
sed -i "s/${OLD1}/${NEW1}/g" 001-kms.conf
sed -i "s/"$OLD1"/"$NEW1"/g" 001-kms.conf
sed -i "s~${OLD1}~${NEW1}~g" 001-kms.conf
I'm expecting every occurrence of the text in $OLD1 to be replaced with the text in $NEW1 in the file.

OLD1="[0-9]*.[0-9]+"
Because [ * . are all characters with special meaning in sed regular expressions, we need to escape them. For such a simple case, something like this could work:
OLD2=$(<<<"$OLD1" sed 's/[][\*\.]/\\&/g')
It will set OLD2 to \[0-9\]\*\.\[0-9\]+. Note that it doesn't handle all the possible cases: for example, the / delimiter and the anchors ^ and $ are left unescaped. Implementing a proper regex to properly escape, well, other regex I leave as an exercise to others.
Now you can:
sed "s/$OLD2/$NEW1/g"
Tested with:
OLD1="[0-9]*.[0-9]+"
NEW1="[a-z]*.[0-9]"
sed "s/$(sed 's/[][\*\.]/\\&/g' <<<"$OLD1")/$NEW1/g" <<<'XYZ="[0-9]*.[0-9]+"'
will output:
XYZ="[a-z]*.[0-9]"
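A slightly fuller sketch of the same idea, escaping both the search string and the replacement before handing them to sed. This assumes basic (non -E) regex syntax, / as the delimiter of the final s command, and strings without newlines; old_escaped and new_escaped are names made up for this example:

```shell
old='[0-9]*.[0-9]+'
new='[a-z]*.[0-9]'

# Search side: put a backslash in front of ] [ \ . * ^ $ and the / delimiter.
# | is used as the delimiter of this helper command so that / can appear
# inside the bracket expression without ending the pattern.
old_escaped=$(printf '%s\n' "$old" | sed 's|[][\.*^$/]|\\&|g')

# Replacement side: only \ / and & are special there.
new_escaped=$(printf '%s\n' "$new" | sed 's|[\\/&]|\\&|g')

# Double quotes let the shell expand the variables before sed sees the script.
result=$(printf '%s\n' 'XYZ="[0-9]*.[0-9]+"' | sed "s/$old_escaped/$new_escaped/g")

printf '%s\n' "$old_escaped"
printf '%s\n' "$result"
```

The double quotes around the final sed script matter: with single quotes, as in the question's first attempt, sed receives the literal text $OLD1 and the shell variables are never expanded.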

you need matching on exact string
You would need something that can match on the exact string [0-9]*.[0-9]+, which sed does not support well.
Therefore I am instead using this pipeline, replacing one character at a time (it is also easier to read, I think):
echo "[0-9]*.[0-9]+" | sed 's/0/a/' | sed 's/9/z/' | sed 's/+//'
You would have to cat your files, or use find with -exec, to then apply this pipe.
I had tried the following (from other SO answers):
- sed 's/\<exact_string\>/replacement/' doesn't work, as \< and \> are left and right word boundaries respectively.
- sed 's/(CB)?exact_string/replacement/' was found in one answer but nowhere in the documentation.
I used Win 10 bash, git bash, and online Linux tools with the same results.
when I thought matching was on the pattern rather than exact string
Replacement cannot be a regex - at most it can reference parts of the regex expression which matched. From man sed:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
Additionally, you have to escape some characters in your regex: . must be written \. if you want to match a literal full stop rather than any character, and in basic regex the "one or more" operator is written \+ (a GNU extension). With option -E for extended regex, as per the comment under your question, + is written unescaped, but the full stop still needs its backslash.
$ echo "01.234--ZZZ" | sed 's/[0-9]*\.[0-9]\+/REPLACEMENT/g'
REPLACEMENT--ZZZ
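For comparison, a short sketch of the same substitution in basic and extended syntax, plus the two kinds of back-reference the man page excerpt mentions (& and \1 through \9). It assumes GNU sed for the \+ form; the variable names are illustrative only:

```shell
input='01.234--ZZZ'

# Basic regex: "one or more" is spelled \+ (GNU extension).
bre=$(printf '%s\n' "$input" | sed 's/[0-9]*\.[0-9]\+/REPLACEMENT/g')

# Extended regex (-E): + is an operator by default; \. is still a literal dot.
ere=$(printf '%s\n' "$input" | sed -E 's/[0-9]*\.[0-9]+/REPLACEMENT/g')

# & in the replacement stands for the whole matched text.
wrapped=$(printf '%s\n' "$input" | sed -E 's/[0-9]*\.[0-9]+/<&>/')

# \1 and \2 refer to the first and second parenthesised groups.
swapped=$(printf '%s\n' "$input" | sed -E 's/([0-9]*)\.([0-9]+)/\2.\1/')

printf '%s\n%s\n%s\n%s\n' "$bre" "$ere" "$wrapped" "$swapped"
```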

Remove text between one string and 1st occurrence of another string

I have found several solutions to remove text between two strings but I guess my case is a little different.
I am trying to convert this:
/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume
To this:
/nz/kit/bin/adm/tools/hostaekresume
Basically remove the version specific information from the filename.
The solutions I have found remove everything from the word kit to the last occurrence of /. I need something to remove from kit to the first occurrence.
The most common solution I have seen is:
sed -e 's/\(kit\).*\(\/\)/\1\2/'
Which produces:
/nz/kit/hostaekresume
How can I remove only up to the first /? I assume this can be done with sed or awk, but I am open to suggestions.

$ sed 's|\(kit\)[^/]*|\1|' <<< '/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'
/nz/kit/bin/adm/tools/hostaekresume
This uses a different delimiter (| instead of /) so we don't have to escape the /. Then, for non-greedy matching, it uses [^/]*: any number of characters other than /, which matches everything between kit and the next /.
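Side by side, the greedy pattern from the question and the negated-class pattern behave like this (a minimal sketch; the variable names are invented for the example):

```shell
path='/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'

# Greedy: .* runs to the LAST / in the line, so the middle directories vanish.
greedy=$(printf '%s\n' "$path" | sed 's|\(kit\).*\(/\)|\1\2|')

# Negated class: [^/]* cannot cross a /, so it stops before the FIRST one.
first=$(printf '%s\n' "$path" | sed 's|\(kit\)[^/]*|\1|')

printf 'greedy: %s\nfirst:  %s\n' "$greedy" "$first"
```

The negated bracket expression is the usual way to get "stop at the first delimiter" behaviour in tools whose regex syntax has no non-greedy quantifier.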
Alternatively, if you know that what you want to remove consists of dots and digits, and nothing else in the string contains them, you can use parameter expansion:
$ var='/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume'
$ echo "${var//[[:digit:].]}"
/nz/kit/bin/adm/tools/hostaekresume
The syntax is ${parameter/pattern/string}, where pattern in the expanded parameter is replaced by string. If we use // instead of /, all occurrences instead of just the first are replaced.
In our case, parameter is var, the pattern is [[:digit:].] (digits or a dot – this is a glob pattern, not a regular expression, by the way), and we've skipped the /string part, which just removes the pattern (replaces it with nothing).

You need perl for non-greedy regex. sed's regex syntax has no non-greedy quantifiers.
Also, use | as a delimiter since / can cause confusion when you have it in your regex.
perl -pe 's|(kit).*?(/.*)|$1$2|'
The ? after the .* makes the pattern non-greedy, so it stops at the first instance of /.
echo "/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume" | perl -pe 's|(kit).*?(/.*)|$1$2|'
returns
/nz/kit/bin/adm/tools/hostaekresume

echo "/nz/kit.7.2.0.7/bin/adm/tools/hostaekresume" | awk '{sub(/\.7\.2\.0\.7/,"")}1'
/nz/kit/bin/adm/tools/hostaekresume

How can I use sed to get an xml value

How can I use sed to get the SOMETHING in <version.suffix>SOMETHING</version.suffix>?
I tried sed 's#.*>\(.*\)\<version\.suffix\>#\1#', but it fails.

Try this one:
sed 's/<.*>\(.*\)<.*>/\1/'
It should work for any single element that sits on one line, though it is not a general XML parser.
If you need to eliminate the indentation add \s* at the beginning like this:
sed 's/\s*<.*>\(.*\)<.*>/\1/'
Alternatively if you only want version.suffix's value, you can make the command more specific like this:
sed 's/<version\.suffix>\(.*\)<.*>/\1/'

You could use the below sed command,
$ echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#^<[^>]*>\(.*\)<\/[^>]*>$#\1#'
SOMETHING
^<[^>]*> Matches the first tag string <version.suffix>.
\(.*\)<\/[^>]*>$ Characters up to the closing tag are captured, and the closing tag itself is matched by the <\/[^>]*> part of the regex.
Finally all the matched characters are replaced by the characters which are present inside the group index 1.
Your regex is nearly correct; the only problem is the closing tag, which needs a literal </ and > instead of \< and \>:
$ echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#.*>\(.*\)</version\.suffix>#\1#'
SOMETHING
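When the element sits inside a larger file, a common variation is to combine -n with the p flag so that only lines where the substitution succeeded are printed. The sketch below assumes the element is on a single line and that its value contains no <; the surrounding XML is invented for the example:

```shell
# Hypothetical multi-line input; only the version.suffix line is of interest.
xml_doc='<project>
  <version.suffix>SOMETHING</version.suffix>
  <name>demo</name>
</project>'

# -n suppresses the default output; the trailing p prints a line only
# if the substitution was applied to it.
value=$(printf '%s\n' "$xml_doc" |
  sed -n 's#.*<version\.suffix>\([^<]*\)</version\.suffix>.*#\1#p')

printf '%s\n' "$value"
```

For anything beyond simple one-line elements, a real XML tool is the safer choice; this is only a text-level shortcut.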

Many ways possible, e.g:
with sed
echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#<[^>]*>##g'
or grep
echo '<version.suffix>SOMETHING</version.suffix>' | grep -oP '<version\.suffix>\K[^<]*(?=</version\.suffix>)'

Assuming the formatting of the question is accurate, when I run the example in the question as-is:
$ echo '<version.suffix>SOMETHING</version.suffix>' | sed 's#.*>\(.*\)\<version\.suffix\>#\1#'
I see the following output:
SOMETHING</>
In case my formatting skills fail me, this output ends with the trailing left angle bracket, a forward slash, and finally the right angle bracket.
So, why this "failure"? Well, on my system (Linux with GNU grep 2.14), grep(1) includes the following snippet:
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word.
Other answers suggest good alternatives to extract the value in XML tag syntax; use them.
I just wanted to point out why the RE in the original problem fails on current Linux systems: some escape sequences match no actual characters, but instead match empty boundaries in tools that support these GNU regular expression extensions. So, in this example, the angle brackets in the source are matched in unexpected ways:
- the \(.*\) has matched SOMETHING</, to be printed by the \1 back-reference
- the left-hand side of version.suffix is matched by \<
- version.suffix is matched by version\.suffix
- the right-hand side of version.suffix is matched by \>
- the trailing > character remains in sed's pattern space and is printed.
TL;DR: "\X" does not mean "just match an X" for all X!
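The behaviour described above can be reproduced with a few lines of shell. This is a sketch that assumes GNU sed, where \< and \> are zero-width word-boundary matches; the variable names are made up for the example:

```shell
xml='<version.suffix>SOMETHING</version.suffix>'

# Original attempt: \< and \> match empty word boundaries, not angle brackets,
# so "</" ends up inside the captured group and the final ">" is left over.
broken=$(printf '%s\n' "$xml" | sed 's#.*>\(.*\)\<version\.suffix\>#\1#')

# Literal "</" and ">" consume the closing tag completely.
fixed=$(printf '%s\n' "$xml" | sed 's#.*>\(.*\)</version\.suffix>#\1#')

# The same boundaries seen in isolation: . separates words, _ does not.
dotted=$(printf '%s\n' 'a.vivek' | sed 's/\<vivek\>/[&]/')
underscored=$(printf '%s\n' 'a_vivek' | sed 's/\<vivek\>/[&]/')

printf '%s\n%s\n%s\n%s\n' "$broken" "$fixed" "$dotted" "$underscored"
```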

Using the tr command to change a single word into uppercase

I'm having difficulty interpreting the tr --help.
I know that
tr '[:lower:]' '[:upper:]' <inputfile
turns all the characters in the file into uppercase.
How do I turn a single word into uppercase?
I am not limited to using tr. I am looking for a way to scan a file (or input) for a set sequence of letters and, once it finds a match, turn those letters into uppercase.

Can sed solve your problem?
sed 's/sequence/SEQUENCE/g' < inputfile

If ghost is the word you are looking to upper-case, the following might do the trick with GNU sed. Here \< and \> represent word boundaries, \( and \) delineate the capture group, and \U upper-cases the captured group \1.
sed 's/\(\<ghost\>\)/\U\1/g'
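The two approaches can also be combined: let tr compute the upper-case form of the word, then let sed substitute it. The sketch below assumes GNU sed for \<, \> and \U, and a search word without regex metacharacters; the sample sentence and variable names are invented:

```shell
word='ghost'
text='the ghost saw ghostly ghosts and a ghost'

# tr upper-cases the word itself, so no \U is needed in the replacement.
upper_word=$(printf '%s' "$word" | tr '[:lower:]' '[:upper:]')
via_tr=$(printf '%s\n' "$text" | sed "s/\<$word\>/$upper_word/g")

# GNU-only shortcut: \U upper-cases the matched text (&) directly.
via_gnu=$(printf '%s\n' "$text" | sed 's/\<ghost\>/\U&/g')

printf '%s\n%s\n' "$via_tr" "$via_gnu"
```

Because of the word boundaries, ghostly and ghosts are left alone in both variants; dropping \< and \> would upper-case the ghost part of those words too.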
