Grep : find patterns in text file

I have to find the following patterns in a text file:
\xc2d
d\xa0
\xe7
\xc3\ufffdd
\xc3\ufffdd
\xc2\xa0
\xc3\xa7
\xa0\xa0
I started by searching for occurrences of \x like this:
grep "\\x" *.log | more
but it returns nothing. Is this query correct?

I think you'll want to use single quotes instead of double quotes:
grep '\\x' *.log | more
Inside double quotes the shell collapses \\ into a single backslash, so grep only receives \x. grep needs both backslashes to match a literal backslash followed by x.
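A quick way to check this is to print what each quoting style hands to grep. This is only a sketch: it assumes the logs contain the literal characters \x (a backslash followed by x) rather than raw bytes, and the sample lines and temporary file are made up for illustration.

```shell
# What the shell passes on: double quotes collapse \\ to \, single quotes keep both.
dq_arg=$(printf '%s' "\\x")
sq_arg=$(printf '%s' '\\x')
printf 'double quotes -> %s\nsingle quotes -> %s\n' "$dq_arg" "$sq_arg"

# Hypothetical sample data: two lines contain a literal backslash-x, one does not.
tmp=$(mktemp)
printf '%s\n' 'foo \xc2d bar' 'hex value' 'd\xa0' > "$tmp"

# With single quotes grep gets \\x, i.e. "a literal backslash, then x".
matches=$(grep -c '\\x' "$tmp")
printf 'lines containing a literal backslash-x: %s\n' "$matches"
rm -f "$tmp"
```

If the files hold raw bytes such as 0xC2 0xA0 instead of the text \xc2\xa0, this pattern will not find them, and a different approach is needed.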

How to correctly detect and replace apostrophe (') with sed?

I have a directory with many files whose names contain special characters and spaces. I want to perform an operation on all these files, so I'm trying to store all the filenames in a list.txt and then run the command with this list.
The special characters in my list are & []'.
So basically I want to use sed to replace each occurrence with \ + the character in question.
E.g. : filename .txt => filename\ .txt etc...
The thing is I have trouble handling apostrophes.
Here is my command as of now :
ls | sed 's/\ /\\ /g' | sed 's/\&/\\&/g' | sed "s/\'/\\'/g" | sed 's/\[/\\[/g' | sed 's/\]/\\]/g'
At first I had issues with, I believe, the apostrophes in the string command in conflict with the apostrophes surrounding the string. So I used double quotes instead, but it still doesn't work.
I've tried all these and nothing worked :
sed "s/\'/\\'/g" (escaping the apostrophe)
sed "s/'/\'/g" (escaping nothing)
sed "s/'/\\'/g" (escaping the backslash)
sed 's/"'"/\"'"/g' (double quoting single quote)
As a disclaimer, I must say I'm completely new to sed. I ran my first sed command today, so maybe I'm doing something wrong without realizing it.
PS : I've seen these threads, but no answer worked for me :
https://unix.stackexchange.com/questions/157076/how-to-remove-the-apostrophe-and-delete-the-space
How to replace to apostrophe ' inside a file using SED

This may do:
cat file
avbadf
test&rr
more [ yes
this ]
and'df
sed -r 's/(\x27|&|\[|\])/\\\1/g' file
avbadf
test\&rr
more \[ yes
this \]
and\'df
\x27 is equal to single quote '
\x22 is equal to double quote "

Whoops, I found the answer to my question. Here is the working input :
sed "s/'/\\\'/g"
This will effectively replace any ' with \'.
However I'm having trouble understanding exactly what's happening here.
So if I understand correctly, we are escaping the backslash and the apostrophe in the replacement string. Now, if somebody could answer some of these questions, I would be grateful :
Why don't we need to escape the first quote (the one in the pattern to find) ?
Why do we have to escape the backslash whereas for the other characters, there's no need ?
Why do we need to escape the second quote (the one in the replacement string) ?

I think all of your sed matches actually need that replacement pattern. This one seems to work for all examples:
ls | sed "s/\ /\\\ /g" | sed "s/\&/\\\&/g" | sed "s/\[/\\\[/g" | sed "s/\]/\\\]/g" | sed "s/'/\\\'/g"
The command has the form s/regex/replacement/flags, and the regex and the replacement have different sets of special characters.
The apostrophe needs no escaping on the regex side because ' has no special meaning in a sed regex. The three backslashes on the replacement side come from two layers of processing: inside double quotes the shell turns \\ into a single \ and leaves \' untouched, so sed receives s/'/\\'/g, and sed then reads \\ in the replacement as one literal backslash, followed by the apostrophe.
The same applies to whatever follows the backslashes. For example, \5 is a backreference in the replacement expression, so to replace:
filename5.txt -> filename\5.txt
You would also need, as with the apostrophe:
sed "s/5/\\\5/g"
So this is not a quirk of sed's parsing: the shell consumes one level of backslashes before sed ever sees the command.
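To see where the three backslashes go, it can help to print the sed script exactly as the shell passes it on. A minimal sketch, using one line from the sample file above as input; the variable names are only for illustration:

```shell
# Inside double quotes the shell turns \\ into \ and leaves \' alone,
# so the three backslashes reach sed as two.
script=$(printf '%s' "s/'/\\\'/g")
printf 'sed receives: %s\n' "$script"

# sed then reads \\ in the replacement as one literal backslash.
escaped=$(printf '%s\n' "and'df" | sed "s/'/\\\'/g")
printf 'result: %s\n' "$escaped"
```

Writing the same command in single quotes would need only two backslashes, but then the apostrophe itself has to be smuggled in with the '\'' trick.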

Please try the following:
sed 's/[][ &'\'']/\\&/g' file
Using the same example as @Jotne, the result will be:
avbadf
test\&rr
more\ \[\ yes
this\ \]
and\'df
[How it works]
The regex part in the sed s command above just defines a character
class of & []', which should be escaped with a backslash.
The right square bracket ] does not need escaping when put
immediately after the left square bracket [.
The confusing part is the handling of the single quote.
We cannot put a single quote within single quotes, even if we escape it.
The workaround is as follows: say we have an assignment str='aaabbb'.
To put a single quote between "aaa" and "bbb", we can write
str='aaa'\''bbb'.
It may look puzzling, but it just concatenates three sequences:
1) to close the single-quoted string as 'aaa'.
2) to put a single quote with an escaping backslash as \'.
3) to restart the single-quoted string as 'bbb'.
Hope this helps.
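The concatenation trick and the character class can be tried together in a few lines. This is just a sketch; the input lines are borrowed from the sample file above and the variable names are made up:

```shell
# Close the quote, add an escaped ', reopen the quote.
str='aaa'\''bbb'
printf '%s\n' "$str"

# The class [][ &'] matches ], [, space, & and ' ;
# \\& in the replacement means "a backslash, then whatever was matched".
escaped=$(printf '%s\n' "and'df" 'more [ yes' 'test&rr' | sed 's/[][ &'\'']/\\&/g')
printf '%s\n' "$escaped"
```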

Do bash script and command grep treat single quote differently

In Advanced Bash-Scripting Guide, I find
Within single quotes, every special character except ' gets
interpreted literally.
So I thought grep '\<the\>' file.txt would search for the literal string \<the\> instead of the word the. But it does search for the word the.
#!/bin/bash
grep '\<the\>' file.txt
Added
Maybe I didn't describe my question clearly. In the man page:
Enclosing characters in single quotes preserves the literal value of each character within the quotes.
So my question is: since bash treats the characters enclosed in single quotes as literal values, why does grep treat '\<the\>' as the word the? Is this grep's own behavior, separate from bash?

Indeed, bash will pass your string literally.
It is grep that interprets the string (as a regular expression). If you want to avoid that, use grep -F. With that option, grep will search literally for the given string.
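The two behaviours can be compared side by side. A small sketch, assuming GNU grep (where \< and \> are word anchors); the sample lines and temporary file are invented for the demonstration:

```shell
tmp=$(mktemp)
printf '%s\n' 'the cat' 'other things' 'literal \<the\> text' > "$tmp"

# As a regex, \<the\> matches the whole word "the" (lines 1 and 3, not "other").
regex_hits=$(grep -c '\<the\>' "$tmp")

# With -F the same argument is a fixed string, so only line 3 matches.
fixed_hits=$(grep -cF '\<the\>' "$tmp")

printf 'regex: %s line(s), fixed string: %s line(s)\n' "$regex_hits" "$fixed_hits"
rm -f "$tmp"
```

In both calls bash passes exactly the same characters; only grep's interpretation changes.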

To match the literal text, you need to add another backslash \ before each one (for example '\\<the\\>'), because the symbols \< and \> are special to grep. Quoting man grep:
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end
of a word.

How to grep for this string that contains an equal sign?

Below is the string I am trying to grep for in the bash shell:
'#Hostname=sometext.company.com, sometext.company.com' filename
I want to only find the string if it matches that exact pattern. I already tried the command below and a few others.
grep -Fx "#Hostname=sometext.company.com, sometext.company.com" filename

Did you specify the -x option on purpose? It makes grep match only when the pattern is the entire line.
grep -F '#Hostname=sometext.company.com, sometext.company.com' filename
is most likely what you want. Also, it's better to use single quotes instead of double quotes, just in case your search pattern happens to contain special shell characters.
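The difference between -F and -Fx shows up as soon as the line has anything else on it. A minimal sketch with made-up file contents (the second line adds leading spaces and a trailing comment):

```shell
tmp=$(mktemp)
pat='#Hostname=sometext.company.com, sometext.company.com'
printf '%s\n' "$pat" "  $pat # trailing comment" > "$tmp"

# -F: fixed string anywhere in the line, so both lines match.
substring_hits=$(grep -cF -- "$pat" "$tmp")

# -Fx: the fixed string must be the whole line, so only the first matches.
whole_line_hits=$(grep -cFx -- "$pat" "$tmp")

printf -- '-F: %s, -Fx: %s\n' "$substring_hits" "$whole_line_hits"
rm -f "$tmp"
```

So if -Fx finds nothing, stray whitespace or a carriage return at the end of the line is a likely suspect.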

find words in two quotes unix

I would like to display the last word on each of these lines. I tried searching for the word value, for example, but got no result, so I thought of looking for the words between quotes. However, my HTML file contains other quoted words that I don't need; I only want to display the values of the select tag.
grep '*' hosts.html | awk '{print $NF}'
For example:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
I would like to get:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn

You need to set the field separator to >, which you do with the -F option:
$ awk -F'>' '{print $NF}' hosts.html
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
Note: I'm not sure what you are trying to achieve by grep '*' hosts.html?

Interpreting the comment liberally, you have input lines which might contain:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
and you would like the names which are repeated on a line as the output:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
This can be done using sed and capturing parentheses.
sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p"
The -n says "don't print unless I say to do so". The s///p command prints if the substitute works. The pattern looks for a stream of 'anything' (.*), a single quote, captures what's inside up to the next single quote ('\([^']*\)') followed by any text, the captured text (the first \1), and anything. The replacement text is what was captured (the second \1).
Example:
$ cat data
www and wotnot
value='www.visit-tunisia.com'>www.visit-tunisia.com
blah
value='www.watania1.tn'>www.watania1.tn
hooplah
value='www.watania2.tn'>www.watania2.tn
if 'nothing' is required, nothing will be done.
$ sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p" data
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
nothing
$
Clearly, you can refine the [^']* part of the match if you want to. I used double quotes around the expression since the pattern matches on single quotes. Life is trickier if you need to allow both single and double quotes; at that point, I'd put the script into a file and run sed -f script data to make life easier.

sed 's/.*>\(.*\)/\1/g' your_file

shell scripting for token replacement in all files in a folder

Hi,
I am not very good with Linux shell scripting. I am trying the following shell script to replace the
revision number token $rev -<rev number> in all HTML files under a specified directory:
cd /home/myapp/test
set repUpRev = "`svnversion`"
echo $repUpRev
grep -lr -e '\$rev -'.$repUpRev.'\$' *.html | xargs sed -i 's/'\$rev -'.$repUpRev.'\$'/'\$rev -.*$'/g'
This does not seem to work. What is wrong with the above code?

rev=$(svnversion)
sed -i.bak "s/$rev/some other string/g" *.html

What is $rev in the regexp string? Is it another variable, or are you looking for the literal string '$rev'? If the latter, I would suggest adding '\' before the $, otherwise it is treated as a special regexp character.

This is how you show the last line:
grep -lr -e '\$rev -'.$repUpRev.'\$' *.html | xargs sed -i 's/'\$rev -'.$repUpRev.'\$'/'\$rev -.*$'/g'
It would help if you showed some input data.
The -r option makes the grep recursive. That means it will operate on files in the directory and its subdirectories. Is that what you intend?
The dots in your grep and sed stand for any character. If you want literal dots, you'll need to escape them.
The final escaped dollar sign in the grep and sed commands will be seen as a literal dollar sign. If you want to anchor to the end of the line you should remove the escape.
On the right-hand side of a sed s command, .* is just a literal string. If you want to include what was matched on the left side, you need to use capture groups. The g modifier on the s command is only needed if the pattern appears more than once in a line.
Using quote, unquote, quote, unquote is hard to read. Use double quotes to permit variable expansion.
Try your grep command by itself without the xargs and sed to see if it's producing a list of files.
This may be closer to what you want:
grep -lr -e "\$rev -.$repUpRev.$" *.html | xargs sed -i "s/\$rev -.$repUpRev.$/\$rev -REPLACEMENT_TEXT/g"
but you'll still need to determine if the g modifier, the dots, the final dollar signs, etc., are what you intend.
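Putting these points together, one possible shape for the replacement is sketched below. It rests on assumptions the question does not confirm: that the token looks like $rev -100 (digits only) and that the new revision is a plain number. The directory, file name and the value of new_rev are hypothetical stand-ins; in the real script new_rev would come from svnversion.

```shell
# Hypothetical test directory with one HTML file containing the token.
dir=$(mktemp -d)
printf '%s\n' '<p>Build $rev -100</p>' > "$dir/index.html"

new_rev=123   # stand-in; in the real script: new_rev=$(svnversion)

# Double quotes allow $new_rev to expand. \\\$ reaches sed as \$ (a literal
# dollar sign in the regex) and \$ reaches sed as a plain $ in the replacement.
sed -i.bak "s/\\\$rev -[0-9]*/\$rev -$new_rev/g" "$dir"/*.html

result=$(cat "$dir/index.html")
printf '%s\n' "$result"
rm -rf "$dir"
```

If svnversion returns something like 120:123M, the [0-9]* part of the pattern would need widening, and a value containing / or & would have to be escaped before going into the replacement.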
