How to get lines that don't contain certain patterns - linux

I have a file that contain many lines :
ABRD0455252003666
JLKS8568875002886
KLJD2557852003625
.
.
.
AION9656532007525
BJRE8242248007866
I want to extract the lines that start with (ABRD or AION) and in column 12 to 14 the numbers (003 or 007).
The output should be
KLJD2557852003625
BJRE8242248007866
I have tried this and it works but it s too long command and I want to optimise it for performance concerns:
egrep -a --text '^.{12}(?:003|007)' file.txt > result.txt |touch results.txt && chmod 777 results.txt |egrep -v -a --text "ABRD|AION" result.txt > result2.text

The -a option is a non-standard extension for dealing with binary files, it is not needed for text files.
grep -E '^.{11}(003|007)' file.txt | grep -Ev '^(ABRD|AION)'
The first stage matches any line with either 003 or 007 in the twelfth through fourteenth column.
The second stage filters out removes any line starting with either ABRD or AION.

You really need to just read a regexp tutorial but meantime try this:
grep -E "^(ABRD|AION).{7}00[37]"

Related

Grep only a part of a text file

How can I apply the following command to only a part of a text file? For example from the beginning to the line 5000.
grep "^ A : 11 B : 10" filename | wc -l
I cannot use head and then apply the above command since the text file is huge.
You could try using the sed command, which I believe does better for large files, from this question and pipe to grep.
sed -n 1,5000p file | grep ...
You can try combination of -n (prefixing each line of output with line number) and -m (limiting number of matching lines). Something like this:
grep -n -m 5000 pattern file.txt | grep -B 5000 "^5000:" | wc -l
First grep search for pattern, add line numbers and limit output to first 5000 matching lines (worst case scenario: all lines from range match pattern). Second grep match line number 5000, and print all lines before this line.
I don't know if it is more efficient solution

Find line number in a text file - without opening the file

In a very large file I need to find the position (line number) of a string, then extract the 2 lines above and below that string.
To do this right now - I launch vi, find the string, note it's line number, exit vi, then use sed to extract the lines surrounding that string.
Is there a way to streamline this process... ideally without having to run vi at all.
Maybe using grep like this:
grep -n -2 your_searched_for_string your_large_text_file
Will give you almost what you expect
-n : tells grep to print the line number
-2 : print 2 additional lines (and the wanted string, of course)
You can do
grep -C 2 yourSearch yourFile
To send it in a file, do
grep -C 2 yourSearch yourFile > result.txt
Use grep -n string file to find the line number without opening the file.
you can use cat -n to display the line numbers and then use awk to get the line number after a grep in order to extract line number:
cat -n FILE | grep WORD | awk '{print $1;}'
although grep already does what you mention if you give -C 2 (above/below 2 lines):
grep -C 2 WORD FILE
You can do it with grep -A and -B options, like this:
grep -B 2 -A 2 "searchstring" | sed 3d
grep will find the line and show two lines of context before and after, later remove the third one with sed.
If you want to automate this, simple you can do a Shell Script. You may try the following:
#!/bin/bash
VAL="your_search_keyword"
NUM1=`grep -n "$VAL" file.txt | cut -f1 -d ':'`
echo $NUM1 #show the line number of the matched keyword
MYNUMUP=$["NUM1"-1] #get above keyword
MYNUMDOWN=$["NUM1"+1] #get below keyword
sed -n "$MYNUMUP"p file.txt #display above keyword
sed -n "$MYNUMDOWN"p file.txt #display below keyword
The plus point of the script is you can change the keyword in VAL variable as you like and execute to get the needed output.

Bash Sorting STDIN

I want to write a bash script that sorts the input by rules in different files. The first rule is to write all chars or strings in file1. The second rule is to write all numbers in file2. The third rule is to write all alphanumerical strings in file3. All specials chars must be ignored. Because I am not familiar with bash I don t know how to realize this.
Could someone help me?
Thanks,
Haniball
Thanks for the answers,
I wrote this script,
#!/bin/bash
inp=0 echo "Which filename for strings?"
read strg
touch $strg
echo "Which filename for nums?"
read nums
touch $nums
echo "Which filename for alphanumerics?"
read alphanums
touch $alphanums
while [ "$inp" != "quit" ]
do
echo "Input: "
read inp
echo $inp | grep -o '\<[a-zA-Z]+>' > $strg
echo $inp | grep -o '\<[0-9]>' > $nums
echo $inp | grep -o -E '\<[0-9]{2,}>' > $nums
done
After I ran it, it only writes string in the stringfile.
Greetings, Haniball
Sure can help. See here:
How To Ask Questions The Smart Way
Help Vampires: A Spotter’s Guide
cool site about the bash is here: http://wiki.bash-hackers.org/doku.php
for sorting try man sort
for pattern matching try man grep
other useful tools: man sed man awk man strings man tee
And it is always correct tag your homework as "homework" ;)
You can try something like:
<input_file strings -1 -a | tee chars_and_strings.txt |\
grep "^[A-Za-z0-9][A-Za-z0-9]*$" | tee alphanum.txt |\
grep "^[0-9][0-9]*$" > numonly.txt
The above is only for USA - no international (read unicode) chars, where things coming a little bit more complicated.
grep is sufficient (your question is a bit vague. If I got something wrong, let me know...)
Using the following input file:
this is a string containing words,
single digits as in 1 and 2 as well
as whole numbers 42 1066
all chars or strings
$ grep -o '\<[a-zA-Z]\+\>' sorting_input
this
is
a
string
containing
words
single
digits
as
in
and
as
well
all single digit numbers
$ grep -o '\<[0-9]\>' sorting_input
1
2
all multiple digit numbers
$ grep -o -E '\<[0-9]{2,}\>' sorting_input
42
1066
Redirect the output to a file, i.e. grep ... > file1
Bash really isn't the best language for this kind of task. While possible, ild highly recommend the use of perl, python, or tcl for this.
That said, you can write all of stdin from input to a temporary file with shell redirection. Then, use a command like grep to output matches to another file. It might look something like this.
#!/bin/bash
cat > temp
grep pattern1 > file1
grep pattern2 > file2
grep pattern3 > file3
rm -f temp
Then run it like this:
cat file_to_process | ./script.sh
I'll leave the specifics of the pattern matching to you.

Pipe output to use as the search specification for grep on Linux

How do I pipe the output of grep as the search pattern for another grep?
As an example:
grep <Search_term> <file1> | xargs grep <file2>
I want the output of the first grep as the search term for the second grep. The above command is treating the output of the first grep as the file name for the second grep. I tried using the -e option for the second grep, but it does not work either.
You need to use xargs's -i switch:
grep ... | xargs -ifoo grep foo file_in_which_to_search
This takes the option after -i (foo in this case) and replaces every occurrence of it in the command with the output of the first grep.
This is the same as:
grep `grep ...` file_in_which_to_search
Try
grep ... | fgrep -f - file1 file2 ...
If using Bash then you can use backticks:
> grep -e "`grep ... ...`" files
the -e flag and the double quotes are there to ensure that any output from the initial grep that starts with a hyphen isn't then interpreted as an option to the second grep.
Note that the double quoting trick (which also ensures that the output from grep is treated as a single parameter) only works with Bash. It doesn't appear to work with (t)csh.
Note also that backticks are the standard way to get the output from one program into the parameter list of another. Not all programs have a convenient way to read parameters from stdin the way that (f)grep does.
I wanted to search for text in files (using grep) that had a certain pattern in their file names (found using find) in the current directory. I used the following command:
grep -i "pattern1" $(find . -name "pattern2")
Here pattern2 is the pattern in the file names and pattern1 is the pattern searched for
within files matching pattern2.
edit: Not strictly piping but still related and quite useful...
This is what I use to search for a file from a listing:
ls -la | grep 'file-in-which-to-search'
Okay breaking the rules as this isn't an answer, just a note that I can't get any of these solutions to work.
% fgrep -f test file
works fine.
% cat test | fgrep -f - file
fgrep: -: No such file or directory
fails.
% cat test | xargs -ifoo grep foo file
xargs: illegal option -- i
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr]
[-L number] [-n number [-x]] [-P maxprocs] [-s size]
[utility [argument ...]]
fails. Note that a capital I is necessary. If i use that all is good.
% grep "`cat test`" file
kinda works in that it returns a line for the terms that match but it also returns a line grep: line 3 in test: No such file or directory for each file that doesn't find a match.
Am I missing something or is this just differences in my Darwin distribution or bash shell?
I tried this way , and it works great.
[opuser#vjmachine abc]$ cat a
not problem
all
problem
first
not to get
read problem
read not problem
[opuser#vjmachine abc]$ cat b
not problem xxy
problem abcd
read problem werwer
read not problem 98989
123 not problem 345
345 problem tyu
[opuser#vjmachine abc]$ grep -e "`grep problem a`" b --col
not problem xxy
problem abcd
read problem werwer
read not problem 98989
123 not problem 345
345 problem tyu
[opuser#vjmachine abc]$
You should grep in such a way, to extract filenames only, see the parameter -l (the lowercase L):
grep -l someSearch * | xargs grep otherSearch
Because on the simple grep, the output is much more info than file names only. For instance when you do
grep someSearch *
You will pipe to xargs info like this
filename1: blablabla someSearch blablabla something else
filename2: bla someSearch bla otherSearch
...
Piping any of above line makes nonsense to pass to xargs.
But when you do grep -l someSearch *, your output will look like this:
filename1
filename2
Such an output can be passed now to xargs
I have found the following command to work using $() with my first command inside the parenthesis to have the shell execute it first.
grep $(dig +short) file
I use this to look through files for an IP address when I am given a host name.

Linux using grep to print the file name and first n characters

How do I use grep to perform a search which, when a match is found, will print the file name as well as the first n characters in that file? Note that n is a parameter that can be specified and it is irrelevant whether the first n characters actually contains the matching string.
grep -l pattern *.txt |
while read line; do
echo -n "$line: ";
head -c $n "$line";
echo;
done
Change -c to -n if you want to see the first n lines instead of bytes.
You need to pipe the output of grep to sed to accomplish what you want. Here is an example:
grep mypattern *.txt | sed 's/^\([^:]*:.......\).*/\1/'
The number of dots is the number of characters you want to print. Many versions of sed often provide an option, like -r (GNU/Linux) and -E (FreeBSD), that allows you to use modern-style regular expressions. This makes it possible to specify numerically the number of characters you want to print.
N=7
grep mypattern *.txt /dev/null | sed -r "s/^([^:]*:.{$N}).*/\1/"
Note that this solution is a lot more efficient that others propsoed, which invoke multiple processes.
There are few tools that print 'n characters' rather than 'n lines'. Are you sure you really want characters and not lines? The whole thing can perhaps be best done in Perl. As specified (using grep), we can do:
pattern="$1"
shift
n="$2"
shift
grep -l "$pattern" "$#" |
while read file
do
echo "$file:" $(dd if="$file" count=${n}c)
done
The quotes around $file preserve multiple spaces in file names correctly. We can debate the command line usage, currently (assuming the command name is 'ngrep'):
ngrep pattern n [file ...]
I note that #litb used 'head -c $n'; that's neater than the dd command I used. There might be some systems without head (but they'd pretty archaic). I note that the POSIX version of head only supports -n and the number of lines; the -c option is probably a GNU extension.
Two thoughts here:
1) If efficiency was not a concern (like that would ever happen), you could check $status [csh] after running grep on each file. E.g.: (For N characters = 25.)
foreach FILE ( file1 file2 ... fileN )
grep targetToMatch ${FILE} > /dev/null
if ( $status == 0 ) then
echo -n "${FILE}: "
head -c25 ${FILE}
endif
end
2) GNU [FSF] head contains a --verbose [-v] switch. It also offers --null, to accomodate filenames with spaces. And there's '--', to handle filenames like "-c". So you could do:
grep --null -l targetToMatch -- file1 file2 ... fileN |
xargs --null head -v -c25 --

Resources