Why does "grep" behave differently in this example?

I am trying to grep a line from a file that starts with 'Residue XXX'. It works when the pattern is only 'Residue', but it does not work when the pattern is 'Residue XXX'. Is there a reason for this behavior?
Here it is working:
grep '^Residue' log.txt
Residue XXX highDenisity
and not working:
grep '^Residue XXX' log.txt

Hold up your hand if you think there's a tab character between Residue and XXX.

Use a whitespace-agnostic regex instead.
grep '^Residue\s*XXX' log.txt
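This regex covers multiple tabs, or a tab and a space, between the words. Note that \s is a GNU extension (the portable spelling is [[:space:]]), and that \s* also matches when there is no whitespace at all between the words.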
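To check the tab theory before changing the pattern, something like the following sketch may help. The sample file is made up for illustration (it assumes a single tab between the two words), and the pattern uses the POSIX class [[:space:]] rather than \s so that it does not depend on GNU grep.

```shell
# Hypothetical sample file: a tab (not a space) separates the two words.
tmp=$(mktemp)
printf 'Residue\tXXX highDenisity\n' > "$tmp"

# 'sed -n l' prints the line unambiguously; a tab shows up as \t.
visible=$(sed -n l "$tmp")
echo "$visible"

# A literal space in the pattern does not match the tab (count is 0).
# '|| true' keeps the script going, since grep exits 1 on no match.
space_hits=$(grep -c '^Residue XXX' "$tmp" || true)

# POSIX character class: one or more whitespace characters between the words.
class_hits=$(grep -c '^Residue[[:space:]]\{1,\}XXX' "$tmp" || true)

echo "space pattern: $space_hits, class pattern: $class_hits"
rm -f "$tmp"
```

If the sed output shows \t between the words, the space in the original pattern is the culprit.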

Or you can do a logical AND with awk, like this:
awk '/^Residue/ && /XXX/' file
This will only output lines that start with Residue and also contain XXX.
You can also express the AND as a single pattern:
awk '/^Residue.*XXX/' file

Related

Can grep show output only if the line contain another search string? [duplicate]

I am trying to extract text from a file between a < and a >, but only on a line starting with another specific pattern.
So in a file that looks like:
XXX Something here
XXX Something more here
XXX <\Lines like this are a problem>
ZZZ something <\This is the text I need>
XXX Don't need any of this
I would like to print only the <\This is the text I need>.
If I do
sed -n '/^ZZZ/p' FILENAME
it pulls the correct lines I need to look at, but obviously prints the whole line.
sed -n '/<\/,/>/p' FILENAME prints way too much.
I have looked into grouping and tried
sed -n '/^ZZZ/{/<\/,/>/} FILENAME
but this doesn't seem to work at all.
Any suggestions? They will be much appreciated.
(Apologies for formatting, never posted on here before)
sed -n '/^ZZZ/ { s/^.*\(<.*>\).*$/\1/p }' FILENAME
If it does not have to be sed and you have a fairly recent grep, you may use grep's option -o as in
grep '^ZZZ' FILENAME | grep -o '<[^>]*>'
An awk version:
awk -F"<|>" '/^ZZZ/ {print "<"$2">"}' file
<\This is the text I need>
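The three answers above can be compared side by side with a sketch like this one. It recreates the sample input from the question in a temporary file (the file location is the only assumption) and runs each command against it.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
XXX Something here
XXX Something more here
XXX <\Lines like this are a problem>
ZZZ something <\This is the text I need>
XXX Don't need any of this
EOF

# sed: on ZZZ lines only, replace the whole line with the <...> part.
sed_out=$(sed -n '/^ZZZ/ { s/^.*\(<.*>\).*$/\1/p }' "$tmp")

# grep: select ZZZ lines first, then print only the <...> match.
grep_out=$(grep '^ZZZ' "$tmp" | grep -o '<[^>]*>')

# awk: split on < or >, so field 2 is the text between them.
awk_out=$(awk -F'<|>' '/^ZZZ/ {print "<" $2 ">"}' "$tmp")

printf '%s\n' "$sed_out" "$grep_out" "$awk_out"
rm -f "$tmp"
```

On this input all three should print the same single line. They differ on lines with more than one <...> pair: the sed version keeps only the last pair, grep -o prints each pair on its own line, and the awk version prints only the first.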

How to find / \ (slashes) in a text file using shell script

I have a file a.txt and I want to find slashes (/ and \) in it, so I used this command in my a.ksh script: slashcheck=`cat a.txt | grep '/'`.
But how do I check for the '\' backward slash at the same time?
You can try this: cat a.txt | grep "[\/]"
Please try with:
cat a.txt | egrep '(\/|\\)'
Try using a regular expression.
You could use something like grep -E '[/\\]'.
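Note that the doubled backslash is a safe habit: outside brackets, or inside double quotes, the first backslash escapes the second. Inside a single-quoted bracket expression, POSIX grep already treats a backslash literally, so the pair simply matches a backslash.
Using square brackets allows any character in these brackets to be a match.
The -E stands for extended and enables extended regular expressions in the search pattern; a bracket expression like this one also works in plain grep without it.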
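As a rough sketch of how this could be used from a script such as the a.ksh mentioned in the question: the sample lines below are invented, and the variable name slashcheck is borrowed from the question. grep -q is used for the yes/no test, so no output needs to be captured.

```shell
# Hypothetical sample data: one line per case.
tmp=$(mktemp)
printf '%s\n' 'plain line' 'unix/path' 'windows\path' > "$tmp"

# Count the lines containing either kind of slash.
count=$(grep -c '[/\\]' "$tmp")
echo "lines with a slash: $count"

# In a script, grep -q reports the result through its exit status.
if grep -q '[/\\]' "$tmp"; then
  slashcheck=found
else
  slashcheck=none
fi
echo "slashcheck=$slashcheck"
rm -f "$tmp"
```

Passing the file name to grep directly also avoids the extra cat process.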

Filter out only matched values from a text file in each line

I have a file "test.txt" with the lines below, plus a lot of extra stuff after the "version" field:
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
soainfra_metrics{metric_group="sca_composite",partition="bolo",is_active="true",state="on",is_default="true",composite="3415"} map:stats version:3.1
soainfra_metrics{metric_group="sca_composite",partition="solo",is_active="true",state="on",is_default="true",composite="hji"} map:stats version:1.1
I tried:
egrep -r 'partition|is_active|state|is_default|composite' test.txt
It displays every line in full, but I need only the specific fields shown below, ignoring the rest of the data on each line.
In a nutshell, I want to display only these fields from each line and nothing else:
partition="test",is_active="true",state="on",is_default="true",composite="test123"
partition="gello",is_active="true",state="on",is_default="true",composite="test234"
partition="bolo",is_active="true",state="on",is_default="true",composite="3415"
partition="solo",is_active="true",state="on",is_default="true",composite="hji"
If your version of grep supports Perl-style regular expressions, then I'd use this:
grep -oP '.*?,\K[^}]+' file
It removes everything up to the first comma (\K discards whatever has been matched so far, so that part is left out of the output) and prints everything up to the }.
Alternatively, using awk:
awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' file
This sets the field separator to } so the part you're interested in is the first field. It then uses sub to remove the part up to the first comma.
For completeness, you could also use sed:
sed 's/[^,]*,\([^}]*\).*/\1/' file
This captures the part after the first , up to the } and replaces the content of the line with it.
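A quick way to sanity-check the awk and sed variants is to feed them one line from the question. This sketch does that with a shell variable instead of a file; grep -oP is left out only because it needs a grep built with PCRE support.

```shell
# One sample line from the question, held in a variable for the test.
line='soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0'

# awk: strip everything through the first comma, then print the text before }.
awk_out=$(printf '%s\n' "$line" | awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }')

# sed: capture what lies between the first comma and the }.
sed_out=$(printf '%s\n' "$line" | sed 's/[^,]*,\([^}]*\).*/\1/')

printf '%s\n' "$awk_out" "$sed_out"
```

Both rely on metric_group being the first field, so that the wanted fields start right after the first comma.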
After the grep to pick out the lines you want, use sed to edit the lines:
sed 's/.*\(partition[^}]*\)} map.*/\1/'
This means: whenever you see anything (.*), followed by partition and
any number of non-} characters, then } map and anything else, grab the part
from partition up to but not including the brace, \(...\), as group 1.
The replacement text is just group 1, \1.
Use a pipe | to connect the output of egrep to the input of sed:
egrep ... | sed ...
As far as I understand, your file might have more lines you don't want to see, so I would use:
sed -n 's/.*\(partition.*\)}.*/\1/p' file
We use -n together with the p flag to print only the lines where a substitution was made. The substitution replaces the whole line with the part of it that you need.
This might work for you (GNU sed):
sed -r 's/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g;s/,$//' file
Treat the problem as if it were a "decomposed club sandwich". Identify the fillings, remove the bread and tidy up.

How to grep exact literal string (no regex)

Is there a way to grep (or use another command) to find exact strings, using NO regex?
For example, if I want to search for (literally):
/some/file"that/has'lots\of"invalid"chars/and.triggers$(#2)[*~.old][3].html
I don't want to go through and escape every single "escapable". Essentially, I want to pass it through, like I would with echo:
$ echo "/some/file\"that/has'lots\of\"invalid\"chars/and.triggers$(#2)[*~.old][3].html"
/some/file"that/has'lots\of"invalid"chars/and.triggers$(#2)[*~.old][3].html
Use grep -F, which matches a fixed string. fgrep is the older name for the same thing, but recent GNU grep releases print a warning that it is obsolescent.
Well, you can put the information you want to match, each in a line, and then use grep:
grep -F -f patterns.txt file.txt
Notice the usage of the flag -F, which causes grep to consider each line of the file patterns.txt as a fixed-string to be searched in file.txt.
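One way to avoid shell quoting altogether is to write the search string into the pattern file with a quoted here-document, as in this sketch. The file names follow the answer above; the working directory and the contents of file.txt are assumptions made for the demonstration.

```shell
work=$(mktemp -d)

# The quoted 'EOF' delimiter stops the shell from expanding or unescaping anything.
cat > "$work/patterns.txt" <<'EOF'
/some/file"that/has'lots\of"invalid"chars/and.triggers$(#2)[*~.old][3].html
EOF

# Hypothetical file to search: one unrelated line plus the literal string.
{
  echo 'unrelated line'
  cat "$work/patterns.txt"
} > "$work/file.txt"

# -F treats each line of patterns.txt as a fixed string.
fixed_hits=$(grep -c -F -f "$work/patterns.txt" "$work/file.txt")

# Without -F the same text is read as a regex; [*~.old] becomes a bracket
# expression, so the pattern no longer matches its own literal text.
regex_hits=$(grep -c -f "$work/patterns.txt" "$work/file.txt" 2>/dev/null || true)

echo "fixed: $fixed_hits, regex: $regex_hits"
rm -rf "$work"
```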

Replacing a line in a csv file?

I have a set of 10 CSV files, which normally have entries of this kind:
a,b,c,d
d,e,f,g
Now, due to some error, the entries in these files look like this:
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Is there a command you would recommend that can replace the erroneous lines in all the files?
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
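Since the question involves several files, the grep -v command can be wrapped in a loop. The following is only a sketch: the directory and the file names data1.csv and data2.csv are made up, and each file is rewritten through a temporary copy.

```shell
# Hypothetical working directory with two damaged CSV files.
work=$(mktemp -d)
printf '%s\n' 'a,b,c,d' 'd,e,f,g' ',,,' 'h,i,j,k' > "$work/data1.csv"
cp "$work/data1.csv" "$work/data2.csv"

# Filter each file into a temporary copy, then move it over the original.
# The && ensures the original is only replaced if grep succeeded.
for f in "$work"/*.csv; do
  grep -v '^,,,$' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cleaned=$(cat "$work/data1.csv")
echo "$cleaned"
rm -rf "$work"
```

Note that grep -v exits with status 1 when every line of a file is filtered out, in which case this loop leaves that file untouched and a .tmp file behind.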
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about trying to keep only lines which are matching the desired format instead of handling one exception ?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it, the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
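For reference, the portable form described in the EDIT might look like the sketch below. It uses \{1,\} instead of \+ and shell redirection instead of -i, with made-up sample data.

```shell
# Hypothetical sample: two comma-only lines of different lengths.
tmp=$(mktemp)
printf '%s\n' 'a,b,c,d' ',,,' ',' 'h,i,j,k' > "$tmp"

# \{1,\} is the POSIX interval equivalent of GNU's \+ (one or more commas).
# Reading via redirection and writing to stdout avoids the -i option.
portable=$(sed '/^,\{1,\}$/d' < "$tmp")
echo "$portable"
rm -f "$tmp"
```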
Most simply (the erroneous lines contain three commas, and this drops every line that has three in a row):
$ grep -v ',,,' oldfile > newfile
$ mv newfile oldfile
Yes, awk and grep are very good options if you are working on a Linux platform. On other platforms you can use Perl regular expressions instead, together with its join and split functions.
