Finding the similarity between two files using grep: why does grep -i "-ffile1" file2 work? - linux

I have two files, file1 and file2, and want to find the similarity between the two files using grep.
Why must -f be followed by the file name without a space when I surround -f file1 with quotes?
This works:
grep -i "-ffile1" file2
But this does not:
grep -i "-f file1" file2
If I remove the quotes, both forms work:
grep -i -ffile1 file2
grep -i -f file1 file2

By convention, a one-letter option that takes an argument can be given in either of two ways:
(1) as two parameters:
Nth parameter to grep: -f
(N+1)st parameter to grep: the file name
or
(2) as a single parameter, where the file name immediately follows the option letter: -ffilename
In your second attempt, "-f xxxx", you are passing a single parameter (which corresponds to case (2)), but what immediately follows the option letter is a space. You are therefore specifying a file name that starts with a space. No such file exists, which is why this case behaves differently.
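One way to see this is to print the parameters exactly as the shell hands them to a command. The sketch below uses a hypothetical helper function, show_args (it is not part of grep or of the original post), to display how each form is split:

```shell
# show_args is a hypothetical helper: it prints the number of parameters
# it received, followed by each parameter in brackets.
show_args() {
    printf '%d:' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

show_args -i "-ffile1" file2    # one parameter: option letter plus file name
show_args -i "-f file1" file2   # one parameter: the file name starts with a space
show_args -i -f file1 file2     # two parameters: option, then file name
```

In the quoted form the command receives "-f file1" as one parameter, so everything after -f, including the space, is taken as the file name.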

The -f option can be used in two ways: either by specifying the file name in the next argument, or by specifying the file name in the same argument right after -f.
When you quote the argument like "-f file1", the second case kicks in and grep looks for a file called " file1", with a leading space. Since your file is called "file1", without a leading space, grep fails to find it.
If the file did have a leading space in its name, it would work:
$ echo findthis > " file1"
$ echo findthis > file2
$ grep -nH "-f file1" file2
file2:1:findthis

Related

How to remove lines contained in file 1 from file 2 if in file 2 they are prefixed?

I have the following situation:
source.txt
ID1:email1#domain1.com
ID2:email2#domain2.com
ID3:email3#domain3.com
...
IDs are numeric strings, e.g. 1234, 23412, 897... (one or more digits).
exclude.txt
emailX#domainX.com
emailY#domainY.com
emailZ#domainZ.com
...
i.e. only emails, no IDs.
I want to remove all lines from source.txt which contain emails listed in exclude.txt, preserving the ID:email pairs for the lines which are not removed.
How can I do that with linux command line tools (or simple bash script if needed)?
You can do it easily with awk:
awk -F":" 'NR==FNR{a[$1];next}(!($2 in a))' exclude.txt source.txt
Alternative with grep:
grep -v -F -f exclude.txt source.txt
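Use grep with care: it matches anywhere in the line rather than comparing the whole email field. With -F the patterns are fixed strings instead of regular expressions, but an entry in exclude.txt still matches any longer address that contains it. You might also need to add the -w option (word matching).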
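To compare the two approaches, here is a small sketch run against made-up sample data in a temporary directory (the addresses and IDs are hypothetical and only mirror the format in the question):

```shell
tmp=$(mktemp -d)
# Hypothetical data: the address on line 2 contains the excluded address as a substring.
printf '%s\n' '1:bob#example.com' '2:jimbob#example.com' '3:amy#example.com' > "$tmp/source.txt"
printf '%s\n' 'bob#example.com' > "$tmp/exclude.txt"

# awk compares the whole second field, so only ID 1 is dropped.
awk_out=$(awk -F: 'NR==FNR{a[$1];next} !($2 in a)' "$tmp/exclude.txt" "$tmp/source.txt")

# grep -F matches substrings, so ID 2 is dropped as well.
grep_out=$(grep -v -F -f "$tmp/exclude.txt" "$tmp/source.txt")

# Adding -w requires the match to sit on word boundaries.
grep_w_out=$(grep -v -F -w -f "$tmp/exclude.txt" "$tmp/source.txt")

printf 'awk:\n%s\ngrep -F:\n%s\ngrep -F -w:\n%s\n' "$awk_out" "$grep_out" "$grep_w_out"
rm -rf "$tmp"
```

Note that -w is only a partial safeguard: an excluded address followed by a non-word character such as "." would still match a longer address, so the exact field comparison in awk is the safer choice.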

grep: Invalid regular expression

I have a text file which looks like this:
haha1,haha2,haha3,haha4
test1,test2,test3,test4,[offline],test5
letter1,letter2,letter3,letter4
output1,output2,[offline],output3,output4
check1,[core],check2
num1,num2,num3,num4
I need to exclude all lines that contain "[ ]" and write the remaining lines to another file.
I'm currently using this command:
grep ",[" loaded.txt | wc -l > newloaded.txt
But it's giving me an error:
grep: Invalid regular expression
Use grep -F to treat the search pattern as a fixed string. You could also replace wc -l with grep -c.
grep -cF ",[" loaded.txt > newloaded.txt
If you're curious, [ is a special character. If you don't use -F then you'll need to escape it with a backslash.
grep -c ",\[" loaded.txt > newloaded.txt
By the way, I'm not sure why you're using wc -l anyways...? From your problem description, it sounds like grep -v might be more appropriate. -v inverts grep's normal output, printing lines that don't match.
grep -vF ",[" loaded.txt > newloaded.txt
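The three behaviours can be checked side by side. This is a minimal sketch that recreates a shortened, hypothetical loaded.txt in a temporary directory and assumes GNU grep (other implementations may word the error differently):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'haha1,haha2' 'test1,[offline],test5' 'check1,[core],check2' 'num1,num2' > "$tmp/loaded.txt"

# An unescaped "[" starts a bracket expression that is never closed,
# so grep reports an error (exit status 2) instead of searching.
status=0
grep ',[' "$tmp/loaded.txt" >/dev/null 2>&1 || status=$?

# With -F the pattern is a literal string: count the matching lines...
count=$(grep -cF ',[' "$tmp/loaded.txt")
# ...or keep only the lines that do not contain it.
kept=$(grep -vF ',[' "$tmp/loaded.txt")

printf 'status=%s count=%s\n%s\n' "$status" "$count" "$kept"
rm -rf "$tmp"
```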
An alternative to grep
It's unclear whether you want to remove lines that contain either bracket, [ or ], or only lines where the brackets surround some characters. Whichever you intend, sed can easily remove lines that fit a definite pattern:
To delete only lines that contain both brackets surrounding characters [...]:
sed '/\[.*\]/d' loaded.txt > newloaded.txt
Another approach is to remove any line that contains either bracket:
sed '/\[/d;/\]/d' loaded.txt > newloaded.txt
(i.e. lines containing either [ or ] are deleted)
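The difference between the two sed commands only shows up on lines with an unpaired bracket. A small sketch with hypothetical sample lines (the third line is invented to have only an opening bracket):

```shell
tmp=$(mktemp -d)
# Hypothetical sample: line 2 has a matched pair, line 3 only an opening bracket.
printf '%s\n' 'num1,num2' 'check1,[core],check2' 'odd1,[odd2' > "$tmp/loaded.txt"

# Deletes only lines where "[" is later followed by "]".
pair_only=$(sed '/\[.*\]/d' "$tmp/loaded.txt")

# Deletes lines containing either bracket.
either=$(sed '/\[/d;/\]/d' "$tmp/loaded.txt")

printf 'pair only:\n%s\neither bracket:\n%s\n' "$pair_only" "$either"
rm -rf "$tmp"
```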
Your grep command doesn't seem to be excluding anything. Also, why are you using wc? I thought you want the lines, not their count.
So if you just want the lines, as you say, that don't have [], then this should work:
grep -v "\[" loaded.txt > new.txt
You can also use awk for this:
awk -F\[ 'NF==1' file > newfile
cat newfile
haha1,haha2,haha3,haha4
letter1,letter2,letter3,letter4
num1,num2,num3,num4
Or this:
awk '!/\[/' file

How do I grep in a list of files targeted by a previous grep?

I am using grep to get a list of files that I want to use for another grep search (and not simply piping it).
For example I got as an output:
file1.h:XXX: linecontent
file2.h:XXX: linecontent
file3.h:XXX: linecontent
file4.h:XXX: linecontent
and I want to grep only file1.h, file2.h ...
I'm assuming you want to search for files that contain two different patterns. If so this is what you want:
grep 'your pattern 2' `grep -l 'your pattern 1' *`
The contents of the back quotes will be executed first and the output substituted into the command line. Use of the -l flag will restrict the output of the grep command to just the file names.
If a very large number of files match pattern 1, this can fail because the expanded command line becomes too long. The solution is to use xargs:
grep -l 'your pattern 1' * | xargs grep 'your pattern 2'
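Plain xargs splits its input on whitespace, so file names containing spaces get broken apart. A hedged variant, assuming a grep that supports -Z (NUL-terminated file names, as in GNU grep) and an xargs that supports -0; the file names and patterns below are made up for illustration:

```shell
tmp=$(mktemp -d)
# Hypothetical files: only the first one contains both patterns.
printf 'alpha\nbeta\n' > "$tmp/both terms.h"
printf 'alpha\n' > "$tmp/only_alpha.h"
printf 'beta\n' > "$tmp/only_beta.h"

# -l lists matching files, -Z terminates each name with a NUL byte,
# xargs -0 reads NUL-separated names, -H always prints the file name.
result=$(cd "$tmp" && grep -lZ 'alpha' ./*.h | xargs -0 grep -H 'beta')

printf '%s\n' "$result"
rm -rf "$tmp"
```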
Assuming what you want is the names of files that contain 'lineofcontent', you could use:
grep -l 'lineofcontent' file*.h

grep -o and display part of filenames using ls

I have a directory which has many directories inside it with the pattern of their name as :
YYYYDDMM_HHMISS
Example: 20140102_120202
I want to extract only the YYYYDDMM part.
I tried ls -l|awk '{print $9}'|grep -o ^[0-9]* and got the answer.
However, I have the following questions:
Why doesn't this return any results: ls -l|awk '{print $9}'|grep -o [0-9]* ? In fact, it should have returned all the directories.
Strangely, just including '^' before [0-9] works fine:
ls -l|awk '{print $9}'|grep -o ^[0-9]*
Is there any other (simpler) way to achieve the result?
Why doesn't this return any results: ls -l|awk '{print $9}'|grep -o [0-9]*
If there are files in your current directory whose names match [0-9]*, the shell will expand the pattern to those names before calling grep. For example, if I have three files a1, a2 and a3 and run this:
ls | grep a*
After the filenames are expanded, the shell will run this:
ls | grep a1 a2 a3
The result of which is that it will print the lines in a2 and a3 that match the text "a1". It will also ignore whatever is coming from stdin, because when you specify filenames for grep (2nd argument and beyond), it will ignore stdin.
Next, consider this:
ls | grep ^a*
Here, ^ has no special meaning to the shell, so it uses it verbatim. Since I don't have filenames starting with ^a, it will use ^a* as the pattern. If I did have filenames like ^asomething or ^another, then again, ^a* would be expanded to those filenames and grep would do something I didn't really intend.
This is why you have to quote search patterns, to prevent the shell from expanding them. The same goes for patterns in find /path -name 'pattern'.
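The expansion can be made visible by replacing grep with a command that just prints its arguments. A minimal sketch, using a temporary directory with two hypothetical directory names in the question's format:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/20140102_120202" "$tmp/20140203_101010"

# printf stands in for grep here: it prints each argument it receives in <>.
unquoted=$(cd "$tmp" && printf '<%s>' grep -o [0-9]*)
quoted=$(cd "$tmp" && printf '<%s>' grep -o '[0-9]*')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
rm -rf "$tmp"
```

In the unquoted case the first directory name would become grep's pattern and the second a file to search, which is why nothing from stdin is reported.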
As for a simpler way for what you want, I think this should do it:
ls | sed -ne 's/_.*//p'
To show only the YYYYDDMM part of the directory names:
for i in ./*; do basename "${i%%_*}"; done
Not sure what you want to do with it once you've got it though...
You should avoid parsing ls output.
A simple alternative is printf:
printf "%s\n" [0-9]*_[0-9]* | grep -Eo '^[0-9]+'

How can I grep in a loop?

I have a file containing text in separate lines.
text1
text2
text3
textN
I have a directory with many files. I want to grep for each line of this file in the files of that directory. What is an easy way to do this?
There is no need to loop; you can use grep with the -f option to get patterns from a file:
grep -f pattern_file files*
From man grep:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero
patterns, and therefore matches nothing. (-f is specified by POSIX.)
Test
$ cat a1
hello
how are you?
$ cat a2
bye
hello
$ cat pattern
hello
bye
$ grep -f pattern a*
a1:hello
a2:bye
a2:hello
You can also use a standard bash loop for this:
for i in text*; do grep "pattern" "$i"; done
or, even better, without a loop:
grep "pattern" text*
If you press tab after the *, the shell will expand it to the files that match.
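If you do want a per-pattern report, a loop over the pattern file is one option. This is only a sketch: it reuses the a1, a2 and pattern test files from the answer above, recreated in a temporary directory, and treats each line as a fixed string (an assumption, since the question does not say whether the lines are regular expressions):

```shell
tmp=$(mktemp -d)
printf 'hello\nhow are you?\n' > "$tmp/a1"
printf 'bye\nhello\n' > "$tmp/a2"
printf 'hello\nbye\n' > "$tmp/pattern"

report=$(
    while IFS= read -r pat; do
        # -l lists matching files, -F treats the line as a fixed string,
        # and -- protects patterns that begin with "-".
        files=$(cd "$tmp" && grep -lF -- "$pat" a1 a2 | paste -s -d ' ' -)
        printf '%s: %s\n' "$pat" "$files"
    done < "$tmp/pattern"
)

printf '%s\n' "$report"
rm -rf "$tmp"
```

This starts one grep per pattern, so for long pattern files the single grep -f call is usually the better choice.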

Resources