How do I use basic grep commands in Unix?

I need to display all the lines using the grep command that contain 2-6 'x's
Also need to know how to display all lines with 3 consecutive 'x's
I have tried grep x{2,6} example.txt but I keep getting an error saying that x6 is not found in the directory. My example file contains 7 lines increasing in the amount of 'x's by one in each line.

The Bash shell uses Brace Expansion to expand:
grep x{2,6} example.txt
into:
grep x2 x6 example.txt
Unless you have a file called x6 in your directory, you will get an error from grep telling you it can't open it.
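To see the expansion for yourself, here is a minimal sketch. It assumes bash is installed and uses echo as a stand-in for grep, so nothing is actually searched; the variable names are only illustrative.

```shell
# Ask bash to show the argument list after brace expansion.
# echo stands in for grep here, so no file is opened.
unquoted=$(bash -c 'echo grep x{2,6} example.txt')
quoted=$(bash -c "echo grep 'x{2,6}' example.txt")

echo "unquoted: $unquoted"   # brace expansion has split the pattern in two
echo "quoted:   $quoted"     # the pattern reaches the command intact
```

The first line printed shows the two separate words x2 and x6, which is why grep then treats x6 as a file name.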
Rule 1: enclose regular expressions to grep inside quotes — single quotes whenever possible.
Hence, use:
grep 'x{2,6}' example.txt
This deals with getting a regex to grep. Now we need to consider what it means. By default, grep uses basic regular expressions, in which this pattern means: look for the literal characters x, {, 2, ,, 6, } in sequence on a single line. Adding the -E option switches to extended regular expressions, and the command then looks for anything from 2 to 6 consecutive x's on a single line in the file:
grep -E 'x{2,6}' example.txt
However, it might be worth noting that this is pretty much the same as selecting 'xx' unless you have colouration on, or are selecting 'only' the matched text (the GNU grep extension -o option).
These are all for 2-6 adjacent x's, which is roughly what your proposed regex wanted.
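The difference between 'x{2,6}' and 'xx' only becomes visible with -o. A small sketch, assuming a grep that supports the -o extension (GNU and BSD grep do); the sample line is made up for illustration:

```shell
# One line holding seven adjacent x's.
line='xxxxxxx'

# -o prints each match on its own line.
interval_matches=$(printf '%s\n' "$line" | grep -oE 'x{2,6}')
pair_matches=$(printf '%s\n' "$line" | grep -o 'xx')

echo "$interval_matches"   # one greedy match of six x's; the leftover x is too short to match
echo "$pair_matches"       # three separate matches of xx
```

Without -o, both patterns select exactly the same lines.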
You ask about three adjacent x's:
grep 'xxx' example.txt
The single quotes aren't 100% necessary, but they do no harm and remind you to use them for the regex in general.
Now we face the dilemma that you probably meant "between 2 and 6 x's on a single line, not necessarily adjacent, and not 0 or 1, nor 7 or more".
Rule 2: describe your required result precisely
Imprecise requirements lead to incorrect, or unintended, results. Meeting that requirement needs a more complex regex:
grep -E '^([^x]*x){2,6}[^x]*$' example.txt
That looks for 2-6 occurrences of zero or more non-x's followed by an x at the start of the line, followed by zero or more non-x's up to the end of line.
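The patterns discussed so far can be compared side by side on a file like the one described in the question. This is only a sketch: the file is recreated in a temporary location, and the variable names are illustrative.

```shell
# Recreate the file from the question: 7 lines holding 1 to 7 x's.
example=$(mktemp)
printf '%s\n' x xx xxx xxxx xxxxx xxxxxx xxxxxxx > "$example"

# grep -c prints the number of matching lines. It exits non-zero when the
# count is 0, so "|| true" keeps that from looking like a failure.
basic=$(grep -c 'x{2,6}' "$example" || true)                  # BRE: literal text x{2,6}
extended=$(grep -cE 'x{2,6}' "$example" || true)              # ERE: at least 2 adjacent x's
exact=$(grep -cE '^([^x]*x){2,6}[^x]*$' "$example" || true)   # 2 to 6 x's in total
triple=$(grep -c 'xxx' "$example" || true)                    # 3 adjacent x's

echo "basic=$basic extended=$extended exact=$exact triple=$triple"
rm -f "$example"
```

On this file the basic pattern matches nothing, the extended pattern matches every line except the first, and only the anchored pattern excludes the line with seven x's.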

I need to display all the lines using GREP command that contain 2-6 'x's
grep -P '^(?:[^x]*x[^x]*){2,6}$' file
Also need to know how to display all lines with 3 consecutive 'x's
grep -P 'xxx' file

Related

Grep for specific numbers within a text file and output per number text file

I have a text file chunk_names.txt that looks like this:
chr1_12334_64321
chr1_134435_77474
chr10_463252_74754
chr10_54265_423435
chr13_5464565_547644567
This is an example, but all chromosomes are represented (1...22, X and Y). All entries follow the same format: chr{1..22, X or Y}_*string of numbers*_*string of numbers*.
I would like to split these into per chromosome files e.g. all of the chunks starting chr10 to be put into a file called chr10.txt:
In Linux I have tried :
for i in {1..22}
do
grep chr$i chunk_names.txt > chr$i.txt
done
However, the chr1.txt output file now contains all the chromosome chunks with 1 in them (1,10,11,12, etc).
How would I modify this script to separate out the chromosomes?
I also haven't tackled how to include chromosome X or Y within the same script and am currently running that separately
Things I have tried :
grep -o gives me just "chr$i" as an output
grep 'chr$i' gives me blank files
grep "chr$i" has the initial problem
Many thanks for your time.
Your 'for' loop will mean parsing your file N times (where N is the number of chromosomes/contigs in your list). Here's an agnostic approach using awk that will parse the file just once:
awk -F '_' '{ print > ($1 ".txt") }' chunk_names.txt
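A minimal sketch of the single-pass awk approach on the sample data from the question. The scratch directory and variable names are assumptions for illustration; the parentheses around the output file name are there because an unparenthesised concatenation after > is parsed differently by different awk implementations.

```shell
workdir=$(mktemp -d)
printf '%s\n' chr1_12334_64321 chr1_134435_77474 chr10_463252_74754 \
    chr10_54265_423435 chr13_5464565_547644567 > "$workdir/chunk_names.txt"

# Run inside the scratch directory so the per-chromosome files land there.
# $1 is the text before the first underscore, e.g. chr1 or chr10.
( cd "$workdir" && awk -F '_' '{ print > ($1 ".txt") }' chunk_names.txt )

chr1_lines=$(cat "$workdir/chr1.txt")
chr10_lines=$(cat "$workdir/chr10.txt")
echo "$chr1_lines"
rm -rf "$workdir"
```

Because the whole first field is used as the file name, chr1 and chr10 cannot be confused, and X and Y need no special handling.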
If you include the _ following the number you can distinguish between chr1_ and e.g. chr10_. To include X and Y, simply include these in the loop
for i in {1..22} X Y
do
grep "chr${i}_" chunk_names.txt > chr$i.txt
done
To search at the beginning of the line only you can add a leading ^ to the pattern
grep "^chr${i}_" chunk_names.txt > chr$i.txt
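Put together, the anchored loop could look like the sketch below. It uses seq instead of {1..22} so that it also runs in shells without brace expansion, and it deletes the output file when a chromosome has no entries; both are choices made for this example, not part of the original answer. The chrX line in the sample data is invented.

```shell
workdir=$(mktemp -d)
printf '%s\n' chr1_12334_64321 chr1_134435_77474 chr10_463252_74754 \
    chr10_54265_423435 chr13_5464565_547644567 chrX_1000_2000 \
    > "$workdir/chunk_names.txt"

for i in $(seq 1 22) X Y
do
    out="$workdir/chr$i.txt"
    # ^ anchors at the start of the line; the trailing _ stops chr1 matching chr10.
    # grep exits non-zero when nothing matches, in which case the empty file is removed.
    grep "^chr${i}_" "$workdir/chunk_names.txt" > "$out" || rm -f "$out"
done

chr1_lines=$(cat "$workdir/chr1.txt")
chrX_lines=$(cat "$workdir/chrX.txt")
file_count=$(ls "$workdir" | grep -c '^chr.*\.txt$')
echo "$chr1_lines"
rm -rf "$workdir"
```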
Explanation about your attempts:
grep chr$i searches for the pattern anywhere in the line. The shell replaces $i with the value of the variable i, so you get chr1, chr2 etc.
If you enclose the pattern in double quotes as grep "chr$i" the shell will not do any file name globbing or splitting of the string, but still expand variables. In your case it is the same as without quotes.
If you use single quotes, the shell takes the literal string as is, so you always search for a line that contains chr$i (instead of chr1 etc.) which does not occur in your file.
Explanation about quotes:
The quotes in my proposed solution are not necessary in your case, but it is a good habit to quote everything. If your pattern contained spaces or characters that are special to the shell, the quoting would make a difference.
Example:
If your pattern were chr${i}* instead of chr${i}_, the unquoted pattern would be replaced by the list of matching file names.
Once you have created your output files chr1.txt etc., try these commands
$ i=1; echo chr$i*
chr10.txt chr11.txt chr12.txt chr13.txt chr14.txt chr15.txt chr16.txt chr17.txt chr18.txt chr19.txt chr1.txt
$ i=1; echo "chr$i*"
chr1*
In the first case, the grep command
grep chr${i}* chunk_names.txt
would be expanded as
grep chr10.txt chr11.txt chr12.txt chr13.txt chr14.txt chr15.txt chr16.txt chr17.txt chr18.txt chr19.txt chr1.txt chunk_names.txt
which would search for the pattern chr10.txt in files chr11.txt ... chr1.txt and chunk_names.txt.

Linux tail command includes more lines than intended

so I want to get a little into Linux scripting and started by a simple example in a book. In this book, the author wants me to grab the five lines before "Step #6: Configure output plugins" from snort.conf.
Analogous to the author I determined where the line is that I want, which returns 445 for me. If I then use tail the result returns more text than I expect and the searched line that should be in line 5 is at line 88. I fail to understand how I use the tail command and start at the specific line but then more text is included.
To search for the line I used
nl /etc/snort/snort.conf | grep output.
To get the 5 lines before including the searched line:
tail -n+440 /etc/snort/snort.conf | head -n+6
The tail statement seems to be the problem. Any help on why my answer is not working is appreciated!
Your tail command is correct in principle.
The problem lies in the way you obtain the line number with nl. By default, nl does not count empty lines, while tail does. Tell nl to count empty lines as well by using the -b (--body-numbering) option with a as the style. This would look as follows:
nl -ba /etc/snort/snort.conf | grep output.
From nl --help:
Usage: nl [OPTION]... [FILE]...
Write each FILE to standard output, with line numbers added.
With no FILE, or when FILE is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
-b, --body-numbering=STYLE use STYLE for numbering body lines
[...]
By default, selects -v1 -i1 -l1 -sTAB -w6 -nrn -hn -bt -fn. CC are
two delimiter characters for separating logical pages, a missing
second character implies :. Type \\ for \. STYLE is one of:
a number all lines
t number only nonempty lines
Number all lines and use that line number in tail.
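The effect is easy to reproduce on a small file. The sketch below uses a made-up file containing blank lines rather than snort.conf, and asks for two lines of context instead of five to keep it short:

```shell
conf=$(mktemp)
printf 'a\n\nb\n\nc\nd\ntarget line\ne\n' > "$conf"

# Default nl (-bt) skips empty lines, so its number lags behind the real one.
default_no=$(nl "$conf" | awk '/target line/ { print $1 }')
# -ba numbers every line, which is what tail counts.
all_no=$(nl -ba "$conf" | awk '/target line/ { print $1 }')

# Two lines of context before the target, plus the target itself.
start=$((all_no - 2))
context=$(tail -n +"$start" "$conf" | head -n 3)

echo "nl: $default_no, nl -ba: $all_no"
echo "$context"
rm -f "$conf"
```

The two numbers differ by the number of blank lines above the target, which is exactly the offset seen in the question.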
I am working through the same book. I didn't find a neat solution with tail or nl, but grep's context switches -B (before) and -A (after) do the job.
I solved it by typing
grep -B 5 "Step #6: Configure output plugins" /etc/snort/snort.conf
This prints the 5 lines before the matching line; use -A in the same way for the lines after it.
Hope this helps someone. Stay safe and happy learning 🙂
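A small sketch of the context switch on a made-up file (the contents below are invented, not taken from snort.conf). It assumes a grep with the -B extension, such as GNU or BSD grep, and adds -F so the search text is treated as a fixed string rather than a regex:

```shell
conf=$(mktemp)
printf '%s\n' 'line 1' 'line 2' 'line 3' \
    '# Step #6: Configure output plugins' 'line 5' > "$conf"

# -B 2 prints two lines of leading context before each match.
before=$(grep -F -B 2 'Step #6: Configure output plugins' "$conf")
echo "$before"
rm -f "$conf"
```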

Line numbering in Grep

I have command in Grep:
cat nastava.html | grep '<td>[A-Z a-z]*</td><td>[0-9/]*</td>' | sed 's/[ \t]*<td>\([A-Z a-z]*\)<\/td><td>\([0-9]\{1,3\}\)\/[0-9]\{2\}\([0-9]\{2\}\)<\/td>.*/\1 mi\3\2 /'
|sort|grep -n ".*" | sed -r 's/(.*):(.*)/\1. \2/' >studenti.txt
I don't understand the second line. sort is OK, and grep -n numbers the sorted list, but why do we use ".*" here? It won't work without it, and I don't understand why.
The grep is used here purely for the side effect of the line numbering that -n provides, so all that matters is that the regular expression matches every input line. For that purpose .* is not very elegant: ^ matches at the start of every line without consuming any characters, and $ trivially matches every line as well. Since you know the input lines are not empty, and thus contain at least one character, the simple regular expression . would work perfectly, too.
However, as the end goal is to perform line numbering, a better solution is to use a dedicated tool for this purpose.
... | sort | nl -ba -s '. '
The -ba option specifies to number all lines (the default is to only add a line number to non-empty lines; we know there are no empty lines, so it's not strictly necessary here, but it's good to know) and the -s option specifies the separator string to put after the number.
A possible minor complication is that the line number format is whitespace-padded, so in the end, this solution may not work for you if you specifically want unpadded numbers. (But a sed postprocessor to fix that up is a lot simpler than the postprocessor for grep you have now -- just sed 's/^ *//' will remove leading whitespace).
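Both numbering approaches can be compared on a few sample lines. This is a sketch with invented names as input; the second pipeline uses a basic-regex sed in place of the sed -r from the question:

```shell
names=$(printf '%s\n' carol alice bob)

# nl pads the number with leading spaces by default; sed strips the padding.
with_nl=$(printf '%s\n' "$names" | sort | nl -ba -s '. ' | sed 's/^ *//')

# The same result via grep -n, using ^ as the match-everything pattern.
with_grep=$(printf '%s\n' "$names" | sort | grep -n '^' | sed 's/^\([0-9]*\):/\1. /')

echo "$with_nl"
echo "$with_grep"
```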
... As an aside, the ugly cat | grep | sed pipeline can be abbreviated to just
sed -n 's%[ \t]*<td>\([A-Z a-z]*\)</td><td>\([0-9]\{1,3\}\)/[0-9]\{2\}\([0-9]\{2\}\)</td>.*%\1 mi\3\2 %p' nastava.html
The cat was never necessary in the first place, and the sed script can easily be refactored to only print when a substitution was performed (your grep regular expression was not exactly equivalent to the one you have in the sed script but I assume that was the intent). Also, using a different separator avoids having to backslash the slashes.
... And of course, if nastava.html is your own web page, the whole process is umop apisdn. You should have the students' results in a machine-readable form, and generate a web page from that, rather than the other way around.
grep needs a regular expression to match. You can't run grep with no expression at all. If you want to number all the lines, just specify an expression that matches anything. I'd probably use ^ instead of .*.

Is there a way to put the following logic into a grep command?

For example suppose I have the following piece of data
ABC,3,4
,,ExtraInfo
,,MoreInfo
XYZ,6,7
,,XyzInfo
,,MoreXyz
ABC,1,2
,,ABCInfo
,,MoreABC
It's trivial to get grep to extract the ABC lines. However if I want to also grab the following lines to produce this output
ABC,3,4
,,ExtraInfo
,,MoreInfo
ABC,1,2
,,ABCInfo
,,MoreABC
Can this be done using grep and standard shell scripting?
Edit: Just to clarify there could be a variable number of lines in between. The logic would be to keep printing while the first column of the CSV is empty.
grep -A 2 {Your regex} will output the two lines following the found strings.
Update:
Since you specified that it could be any number of lines, this is not possible with grep alone, as grep matches one line at a time. See the following questions:
How can I search for a multiline pattern in a file?
Regex (grep) for multi-line search needed
Why can't i match the pattern in this case?
Selecting text spanning multiple lines using grep and regular expressions
You can use this, although a bit hackity due to the grep at the end of the pipeline to mute out anything that does not start with 'A' or ',':
$ sed -n '/^ABC/,/^[^,]/p' yourfile.txt| grep -v '^[^A,]'
Edit: A less hackity way is to use awk:
$ awk '/^ABC/ { want = 1 } !/^ABC/ && !/^,/ { want = 0 } { if (want) print }' f.txt
You can understand what it does if you read out loud the pattern and the thing in the braces.
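The rule stated in the question ("keep printing while the first column of the CSV is empty") can also be written against the first field directly. The sketch below is a variant, not the answer's exact program: it compares the whole first field with a key passed in through -v instead of using a regex prefix match, and the variable names are illustrative.

```shell
data=$(mktemp)
printf '%s\n' 'ABC,3,4' ',,ExtraInfo' ',,MoreInfo' 'XYZ,6,7' ',,XyzInfo' \
    ',,MoreXyz' 'ABC,1,2' ',,ABCInfo' ',,MoreABC' > "$data"

result=$(awk -F ',' -v key='ABC' '
    $1 != "" { want = ($1 == key) }  # a non-empty first column starts a new record
    want                             # print every line of a wanted record
' "$data")

echo "$result"
rm -f "$data"
```

Continuation lines have an empty first field, so they leave the flag untouched and inherit the decision made for the record header above them.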
The manpage has explanations for the options, of which you want to look at -A under Context Line Control.

Highlight text similar to grep, but don't filter out text [duplicate]

When using grep, it will highlight any text in a line with a match to your regular expression.
What if I want this behaviour, but have grep print out all lines as well? I came up empty after a quick look through the grep man page.
Use ack and check out its --passthru option. It has the added benefit of allowing full Perl regular expressions.
$ ack --passthru 'pattern1' file_name
$ command_here | ack --passthru 'pattern1'
You can also do it using grep like this:
$ grep --color -E '^|pattern1|pattern2' file_name
$ command_here | grep --color -E '^|pattern1|pattern2'
This will match all lines and highlight the patterns. The ^ matches every start of line, but won't get printed/highlighted since it's not a character.
(Note that most of the setups will use --color by default. You may not need that flag).
You can make sure that all lines match but there is nothing to highlight on irrelevant matches
egrep --color 'apple|' test.txt
Notes:
egrep can also be spelled grep -E
--color is usually default in most distributions
some variants of grep will "optimize" the empty match, so you might want to use "apple|$" instead (see: https://stackoverflow.com/a/13979036/939457)
EDIT:
This works with OS X Mountain Lion's grep:
grep --color -E 'pattern1|pattern2|$'
This is better than '^|pattern1|pattern2' because the ^ part of the alternation matches at the beginning of the line whereas the $ matches at the end of the line. Some regular expression engines won't highlight pattern1 or pattern2 because ^ already matched and the engine is eager [1].
Something similar happens for 'pattern1|pattern2|' because the regex engine notices the empty alternation at the end of the pattern string matches the beginning of the subject string.
[1]: http://www.regular-expressions.info/engine.html
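To check that the 'pattern|$' form really keeps every line, here is a sketch on a made-up three-line file. It forces colour with --color=always (supported by GNU and BSD grep) because the output is captured rather than sent to a terminal, then strips the escape sequences again to confirm that only colour was added:

```shell
sample=$(mktemp)
printf '%s\n' 'apple pie' 'banana split' 'crab apple' > "$sample"

# The $ alternative matches (emptily) on every line, so nothing is filtered out.
highlighted=$(grep --color=always -E 'apple|$' "$sample")
line_count=$(printf '%s\n' "$highlighted" | grep -c '')

# Remove the colour escape sequences to recover the plain text.
esc=$(printf '\033')
plain=$(printf '%s\n' "$highlighted" | sed "s/${esc}\[[0-9;]*[mK]//g")

printf '%s\n' "$highlighted"
rm -f "$sample"
```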
FIRST EDIT:
I ended up using perl:
perl -pe 's:pattern:\033[31;1m$&\033[30;0m:g'
This assumes you have an ANSI-compatible terminal.
ORIGINAL ANSWER:
If you're stuck with a strange grep, this might work:
grep -E --color=always -A500 -B500 'pattern1|pattern2' | grep -v '^--'
Adjust the numbers to get all the lines you want.
The second grep just removes extraneous -- lines inserted by the BSD-style grep on Mac OS X Mountain Lion, even when the context of consecutive matches overlap.
I thought GNU grep omitted the -- lines when context overlaps, but it's been awhile so maybe I remember wrong.
You can use my highlight script from https://github.com/kepkin/dev-shell-essentials
It's better than grep because you can highlight each match with its own color.
$ command_here | highlight green "input" | highlight red "output"
Since you want matches highlighted, this is probably for human consumption (as opposed to piping to another program for instance), so a nice solution would be to use:
less -p <your-pattern> <your-file>
And if you don't care about case sensitivity:
less -i -p <your-pattern> <your-file>
This also has the advantage of having pages, which is nice when having to go through a long output
You can do it using only grep by:
reading the file line by line
matching a pattern in each line and highlighting pattern by grep
if there is no match, echo the line as is
which gives you the following:
while IFS= read -r line ; do printf '%s\n' "$line" | grep PATTERN || printf '%s\n' "$line" ; done < inputfile
If you want to print "all" lines, there is a simple working solution:
grep "test" -A 9999999 -B 9999999
A => After
B => Before
If you are doing this because you want more context in your search, you can do this:
cat BIG_FILE.txt | less
Doing a search in less should highlight your search terms.
Or pipe the output to your favorite editor. One example:
cat BIG_FILE.txt | vim -
Then search/highlight/replace.
If you are looking for a pattern in a recursive directory listing, you can either first save the listing to a file:
ls -1R ./ > list-of-files.txt
And then grep that, or pipe it to the grep search
ls -1R | grep --color -E '[A-Z]|'
This looks like a listing of all files, but with the uppercase letters in the names coloured. If you remove the last | you will only see the matching names.
I use this to find images named badly with upper case for example, but normal grep does not show the path for each file just once per directory so this way I can see context.
Maybe this is an XY problem, and what you are really trying to do is to highlight occurrences of words as they appear in your shell. If so, you may be able to use your terminal emulator for this. For instance, in Konsole, start Find (ctrl+shift+F) and type your word. The word will then be highlighted whenever it occurs in new or existing output until you cancel the function.

Resources