I created a file in a Linux terminal and named it people. Its contents are as follows:
Mr. Smith
Mrs. Jenn Bewlite
Ms Carmichael
Dr Ivan James
Mrs Holly Alva Beswol
Mrs James Sheepwool
Mr. Hitchcock
How do I display lines that have the letter H followed later on the line by the letter o?
I have tried the following commands, but they didn't work. Maybe I have a typo.
$ egrep -w 'H|o' /home/liveuser/people
$ grep "H|o" people
You do not have a typo. What you need here is a regular expression that says "H, then anything, then o".
How do I display lines that have the letter H followed later on the line by the letter o?
I assume you want to match H case-insensitively, with any characters following until the character "o". Please post a comment to correct me if I've misunderstood.
grep -i "h.*o" /home/liveuser/people
This will do the trick in your case.
You can replace /home/liveuser/people with whatever path you want. I see you also tried just "people", which is fine if your current directory is /home/liveuser.
Note that you can use "egrep" too.
Explanation
The -i flag makes grep case insensitive
As for the "h.*o" there are 4 things going on:
The "h" matches the character "h"
The "." matches any character. (Except for newline)
The "*" makes the previous character or expression match zero or more times. In this case, ".*" matches any number of characters (including none) that aren't newlines.
The "o" makes sure that we have an "o" after the "h" with any characters in between.
A link to a page explaining regex in grep: Regular Expressions in grep
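To see what the -i flag changes, here is a small sketch that rebuilds the sample file from the question in a temporary file (the variable names are only for illustration) and runs the pattern both ways:

```shell
# Recreate the sample file from the question in a temporary location.
people=$(mktemp)
cat > "$people" <<'EOF'
Mr. Smith
Mrs. Jenn Bewlite
Ms Carmichael
Dr Ivan James
Mrs Holly Alva Beswol
Mrs James Sheepwool
Mr. Hitchcock
EOF

# Case-insensitive: an "h" or "H", then any characters, then an "o" or "O".
insensitive=$(grep -i 'h.*o' "$people")

# Case-sensitive: only an uppercase "H" can start the match.
sensitive=$(grep 'H.*o' "$people")

printf 'with -i:\n%s\n\nwithout -i:\n%s\n' "$insensitive" "$sensitive"
rm -f "$people"
```

With -i the line "Mrs James Sheepwool" is also selected, because the lowercase "h" in "Sheepwool" is followed by an "o". Drop the flag if only a capital H should count.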
You have the wrong regexp syntax. Use H.*o, not H|o.
Like this:
$ grep "H.*o" foo
Mrs Holly Alva Beswol
Mr. Hitchcock
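The difference between H|o and H.*o can be checked by counting matching lines. This is only a sketch on a temporary copy of the sample file; the variable names are made up for the example:

```shell
people=$(mktemp)
cat > "$people" <<'EOF'
Mr. Smith
Mrs. Jenn Bewlite
Ms Carmichael
Dr Ivan James
Mrs Holly Alva Beswol
Mrs James Sheepwool
Mr. Hitchcock
EOF

# Basic grep: "|" is an ordinary character, so this looks for the literal text H|o.
# grep exits non-zero when nothing matches, hence the "|| true".
literal=$(grep -c 'H|o' "$people" || true)

# Extended grep: "|" means "or", so any line with an "H" or an "o" matches.
either=$(grep -Ec 'H|o' "$people" || true)

# Sequence: an "H", then any characters, then an "o" on the same line.
ordered=$(grep -c 'H.*o' "$people" || true)

echo "literal H|o: $literal, H or o: $either, H then o: $ordered"
rm -f "$people"
```

So neither attempt from the question expresses "H followed later by o": the quoted form without -E searches for a literal bar, and the egrep form accepts either letter on its own.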
Related
I am relatively new to Linux. I want to search a file for a pattern that starts with "Leonard is" and ends with "champion".
The pattern may also span multiple lines.
The input file (input.txt) may look like:
1 rabbit eats carrot Leonard is a champion
2 loin is the king of
3 jungle Leonard is a
4 Champion
5 Leonard is An exemplary
6 Champion
I want the output file to contain every occurrence of the pattern and nothing else:
1 Leonard is a champion
3 Leonard is a
4 Champion
5 Leonard is An exemplary
6 Champion
I have come very close with the following command:
cat input.txt | grep -ioE "Leonard.*Champion$"
but it only returns
1 Leonard is a champion
and ignores every occurrence that spans multiple lines.
If an approach other than grep would work better, kindly let me know. Thanks!
Perl to the rescue:
perl -l -0777 -e 'print for <> =~ /(.*Leonard(?s:.*?)[Cc]hampion.*)/g' -- input.txt
-l adds newlines to prints
-0777 reads the whole file instead of processing it line by line
the diamond operator <> reads the input
.*? is like .*, i.e. it matches anything, but the ? means the shortest possible match is enough. That prevents the regex from matching everything between the first Leonard and last Champion.
. in a regex doesn't match a newline normally, but it does with the s modifier. (?s:.*?) localizes the changed behaviour, so other dots still don't match newlines.
You're looking for \s which stands for whitespace. + stands for one or more
Pattern: Leonard is a\s+Champion
See: https://regex101.com/r/qiNXhf/1
I use this tool even though I know almost no regex, and it helps me a lot. See the notes at the bottom right, where all these symbols are explained.
The "." is documented as "any character except newline", so what you're trying to achieve with . is not possible. I suggest using \s followed by * or + instead (as suggested above), though I still need to find out how to express that in grep's regular expressions. There are also nice tools for testing regexes, https://regexr.com/ for example.
My elder brother studies Statistics and is writing his thesis in LaTeX. Almost all of the content is written. He reported every value used in his calculations to 5 decimal places (e.g. 5.55534), but at the last minute his instructor asked him to change them to 3 decimal places (e.g. 5.555). Finding and correcting the values by hand is not easy, so he asked me to help.
I believe there is an easy solution that I just don't know. A portion of the thesis looks like this:
&se($\hat\beta_1$)&0.35581&0.35573&0.35573\\
&mse($\hat\beta_1$)&.12945&.12947&.12947\\
\addlinespace
&$\hat\beta_2$&0.03329&0.03331&0.03331 \\
&se($\hat\beta_2$)&0.01593&0.01592&0.01591\\
&mse($\hat\beta_2$)&.000265&.000264&.000264 \\
\midrule
{n=100} & $\hat\beta_1$&-.52006&-.52001&-.51946\\
&se($\hat\beta_1$)&.22819&.22814&.22795\\
&mse($\hat\beta_1$)&.05247&.05244&.05234\\
\addlinespace
&$\hat\beta_2$&0.03134&0.03134&0.03133 \\
&se($\hat\beta_2$)&0.00979&0.00979&0.00979\\
&mse($\hat\beta_2$)&.000098&.000098&.000098
I want:
&se($\hat\beta_1$)&0.355&0.355&0.355\\
&mse($\hat\beta_1$)&.129&.129&.129\\
...
Note: Don't be put off by the syntax (it is LaTeX).
If anybody has a solution or suggestion, please share it. Thank you.
In sed:
$ sed 's/\(\.[0-9]\{3\}\)[0-9]*/\1/g' file
&se($\hat\beta_1$)&0.355&0.355&0.355\\
&mse($\hat\beta_1$)&.129&.129&.129\\
i.e. for every run of digits that starts with a period and has at least three digits, keep the period and the first three digits and drop the rest. The digits are truncated, not rounded.
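Here is a sketch of the same substitution on three rows taken from the question, written to a temporary file so nothing real is modified (the variable names are only for illustration):

```shell
table=$(mktemp)
cat > "$table" <<'EOF'
&se($\hat\beta_1$)&0.35581&0.35573&0.35573\\
&mse($\hat\beta_2$)&.000265&.000264&.000264 \\
{n=100} & $\hat\beta_1$&-.52006&-.52001&-.51946\\
EOF

# Keep the period and the first three digits; drop any digits that follow.
trimmed=$(sed 's/\(\.[0-9]\{3\}\)[0-9]*/\1/g' "$table")

printf '%s\n' "$trimmed"
rm -f "$table"
```

Two things are worth checking in the result before running this over the whole thesis: -.51946 becomes -.519 rather than the rounded -.520, and small values such as .000265 collapse to .000.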
Here is the command in vim:
:%s/\.\d\{3}\zs\d\+//g
Explanation:
: entering command-mode
% is the range of all lines of the file
s substitution command
\.\d\{3}\zs\d\+ pattern you would like to change
\. literal point (.)
\d\{3} match 3 consecutive digits
\zs set the start of the match here, so only what follows is replaced
\d\+ one or more digits
g Replace all occurrences in the line
As for grep and cat, they have nothing to do with replacing text. These commands only search and print the contents of files.
What you are looking for is substitution, and plenty of commands in Linux can do that: sed, perl, awk, ex and others.
I want to extract the first instance of a string per line in Linux. I am currently trying grep, but it yields all the instances per line. In the example below I want the string (numbers and letters) after "tn=", but only the first one per line. The actual characters could be any combination of numbers or letters. There is a space after them, and also a space before the tn=.
Given the following file:
hello my name is dog tn=12g3 fun 23k3 hello tn=1d3i9 cheese 234kd dks2 tn=6k4k ksk
1263 chairs are good tn=k38493kd cars run vroom it95958 tn=k22djd fair gold tn=293838 tounge
Desired output:
12g3
k38493kd
Here's one way you can do it if you have GNU grep, which (mostly) supports Perl Compatible Regular Expressions with -P. Also, the non-standard switch -o is used to only print the part matching the pattern, rather than the whole line:
grep -Po '^.*?tn=\K\S+' file
The pattern matches the start of the line ^, followed by any characters .*?, where the ? makes the match non-greedy. After the first match of tn=, \K "kills" the previous part so you're only left with the bit you're interested in: one or more non-space characters \S+.
As in Ed's answer, you may wish to add a space before tn to avoid accidentally matching something like footn=.... You might also prefer to use something like \w to match "word" characters (equivalent to [[:alnum:]_]).
Just split each line on the separator tn= and pick the second field. Then split that field again to get everything up to the first space:
$ awk -F"tn=" '{split($2,a, " "); print a[1]}' file
12g3
k38493kd
$ awk 'match($0,/ tn=[[:alnum:]]+/) {print substr($0,RSTART+4,RLENGTH-4)}' file
12g3
k38493kd
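If the awk at hand lacks POSIX character classes such as [[:alnum:]], a field-by-field loop is another option. This is a sketch and not taken from the answers above; the third input line is made up to show that footn= is skipped:

```shell
file=$(mktemp)
cat > "$file" <<'EOF'
hello my name is dog tn=12g3 fun 23k3 hello tn=1d3i9 cheese 234kd dks2 tn=6k4k ksk
1263 chairs are good tn=k38493kd cars run vroom it95958 tn=k22djd fair gold tn=293838 tounge
skip footn=zzz but keep tn=abc and not tn=def
EOF

# Walk the whitespace-separated fields and print the first one that starts with "tn=".
first_tn=$(awk '{
    for (i = 1; i <= NF; i++) {
        if (index($i, "tn=") == 1) {   # the field begins with tn=
            print substr($i, 4)        # drop the 3-character prefix
            break                      # ignore later tn= fields on this line
        }
    }
}' "$file")

printf '%s\n' "$first_tn"
rm -f "$file"
```

Since awk splits fields on whitespace by default, a field can only begin with tn= if a space (or the start of the line) precedes it, which is the same guard as the leading space in the match() version.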
I want to find the word 'on' as a prefix or suffix of a string, but not where it is in the middle.
As an example,
I have a text which has words like 'on', 'one', 'cron', 'stone'. I want to find lines which contain the exact word 'on' and also words like 'one' and 'cron', but it should not match 'stone'.
I'm surprised nobody has proposed the simple, obvious
grep -E '\<on|on\>' files ...
The metacharacter sequences \< and \> match a left and right word boundary, respectively. I believe it should be portable to any modern platform (though I would be unsurprised if Solaris, HP-UX, or AIX required some tweaks in order to get it to work).
If you've got GNU grep or BSD grep, then it is relatively straightforward:
grep -E '\b(on[[:alpha:]]*|[[:alpha:]]*on)\b'
This looks for a word boundary followed by 'on' and zero or more alphabetic characters, or for zero or more alphabetic characters followed by 'on', followed by a word boundary.
For example, given the data:
on line should be selected
cron line should be selected
stone line should not be selected
station wagon
onwards, ever onwards.
on24 is not selected
24on is not selected
Example run:
$ grep -E '\b(on[[:alpha:]]*|[[:alpha:]]*on)\b' data
on line should be selected
cron line should be selected
station wagon
onwards, ever onwards.
$
With a strict POSIX-compatible grep, you would have to work a lot harder, if it can be done at all.
Note that this solution is assuming that mixed digits and letters are not a 'word' in this context (so neither on24 nor 24on should be selected). If you don't mind digits appearing as part of a word starting or ending 'on', then you can use either of two other answers:
triplee's answer
alfasin's answer
or you can hack this one into shape so it does what one of theirs does.
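On the remark about a strict POSIX grep: for selecting whole lines, the word boundaries can be spelled out with ordinary bracket expressions instead of \b. The following is a sketch under that assumption, using the sample data from this answer; the variable names before and after are only for illustration:

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
on line should be selected
cron line should be selected
stone line should not be selected
station wagon
onwards, ever onwards.
on24 is not selected
24on is not selected
EOF

# Spell out the word boundaries instead of using \b:
#   before the word: start of line, or a character that is not a letter, digit or underscore
#   after the word:  such a character, or end of line
before='(^|[^[:alnum:]_])'
after='([^[:alnum:]_]|$)'

selected=$(grep -E "${before}(on[[:alpha:]]*|[[:alpha:]]*on)${after}" "$data")

printf '%s\n' "$selected"
rm -f "$data"
```

Unlike \b, these groups consume the neighbouring character, so this form suits line selection but would give slightly different text with -o.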
You can use egrep (regex) in order to catch the exact phrases: by using \b (word boundary) you can make sure to not catch anything else other than the required 3 words:
egrep -e '\b(on|one|cron)\b' <filename>
UPDATE:
Since the question was edited & clarified that the OP is looking to have on "as a prefix or suffix of a string":
egrep -e '\bon|on\b' <filename>
If you're just going 'all out' and searching for any line with the substring 'on' in it (leaving out 'stone')...
grep 'on' <your file name> | grep -v 'stone'
Piping into grep -v hides every line that contains 'stone', including lines that also contain a word you do want.
I'm trying to write a grep (or egrep) command that will find and print any lines in "words.txt" which contain the same lower-case letter three times in a row. The three occurrences of the letter may appear consecutively (as in "mooo") or separated by one or more spaces (as in "x x x") but not separated by any other characters.
words.txt contains:
The monster said "grrr"!
He lived in an igloo only in the winter.
He looked like an aardvark.
Here's what I think the command should look like:
grep -E '\b[^ ]*[[:alpha:]]{3}[^ ]*\b' 'words.txt'
I know this is wrong, but I don't know enough of the syntax to figure it out. Could someone please help me do this with grep?
Does this work for you?
grep '\([[:lower:]]\) *\1 *\1'
It takes a lowercase character [[:lower:]] and remembers it with \( ... \). It then tries to match any number of spaces (0 included), the remembered character \1, any number of spaces again, and the remembered character once more. And that's it.
You can try running it with --color=auto to see what parts of the input it matched.
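Here is a sketch of that command run against a temporary copy of the words.txt shown in the question (the variable names are only for illustration):

```shell
words=$(mktemp)
cat > "$words" <<'EOF'
The monster said "grrr"!
He lived in an igloo only in the winter.
He looked like an aardvark.
EOF

# \( \) captures one lowercase letter; each \1 must repeat that same letter,
# with any number of spaces (including none) allowed in between.
triples=$(grep '\([[:lower:]]\) *\1 *\1' "$words")

printf '%s\n' "$triples"
rm -f "$words"
```

The first line matches on "rrr" and the second on the "oo o" that straddles "igloo only". The third line has only doubled letters ("oo", "aa"), so it is left out.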
Try this. Note that it will not match "mooo", because the word boundary (\b) occurs before the "m", not before the first "o".
grep -E '\b([[:alpha:]]) *\1 *\1 *\b' words.txt
[:alpha:] is an expression of a character class. To use as a regex charset, it needs the extra brackets. You may have already known this, as it looks like you started to do it, but left the open bracket unclosed.