List strings not found with grep - linux

I want to search in a file for specific words and then show only the words that were not found. So far, with my limited skills in this area, I can find which words are found:
egrep -w "^bower_components|^npm-debug.log" .gitignore
If my .gitignore file contained bower_components but not npm-debug.log, it would return bower_components. I want to find out how to ask it to return only the part of the pattern that was not found, that is, only npm-debug.log. I don't want to see all of the text from the file that does not match the search, only the single word from the pattern that was not found. How do I do that?

I don't have a one-liner for you, but a short script would work for your use case.
file=$1
shift
for var in "$@"
do
    found=$(grep "^$var" "$file")
    if [[ -z $found ]]
    then
        echo "$var"
    fi
done
Explanation: take the first argument as the file name and any subsequent arguments as strings to test for in the file. Loop through those arguments and test whether the grep command returned anything. If it returned nothing, the string wasn't found and we print it to the console.
Usage: [script name] .gitignore bower_components npm-debug.log
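With the example from the question (a .gitignore that contains bower_components but not npm-debug.log), the script should print only the missing entry:
npm-debug.log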

I don't think this can be done with grep alone; here's a clumsy awk one-liner:
awk -v p1="npm-debug.log" -v p2="bower_components" 'BEGIN{a[p1]=0;a[p2]=0} $0 ~ p1 {a[p1]+=1}; $0 ~ p2 {a[p2]+=1}; END{for(i in a){if( a[i]==0){print i}}}' .gitignore
I'm passing the search patterns into awk as variables, pre-populating an array using the patterns as indices, incrementing a pattern's count when it matches, and only printing the ones that didn't get any hits (are still 0).
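If there are more than a couple of patterns, the same idea generalizes by reading them from a file instead of passing each one in as a variable. A rough sketch along the same lines, assuming the patterns are stored one per line in a hypothetical patterns.txt:
awk 'NR==FNR { pat[$0]=0; next }
     { for (p in pat) if ($0 ~ p) pat[p]++ }
     END { for (p in pat) if (pat[p] == 0) print p }' patterns.txt .gitignore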

If you have just simple words like in your example, you could do it with two grep commands in a one-liner. For example:
file:
first line
_another line
just another line
STRANGE LINE
last line
words:
first
last
row
other
^another
LINE$
We can first get the matched patterns:
> grep -owf words file
first
LINE
last
and then we grep -v for them in words, adding a uniq in the middle to remove any duplicates.
So finally we get the patterns from words that were not matched in file:
> grep -owf words file | uniq | grep -vf - words
row
other
^another
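Note that uniq only removes adjacent duplicate lines, and the matches come out in file order, so repeats are not guaranteed to sit next to each other; sort -u is a safer drop-in here:
> grep -owf words file | sort -u | grep -vf - words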

Related

How to sort and print array listing of specific file type in shell

I am trying to write a loop that extracts the text file names in all sub-directories and appends certain strings to them. Additionally, I want the text file names sorted by the number after ^.
For example, I have three sub directories mydir1, mydir2, mydir3. I have,
in mydir1,
file223^1.txt
file221^2.txt
file666^3.txt
in mydir2,
file111^1.txt
file4^2.txt
In mydir3,
file1^4.txt
file5^5.txt
The expected result final.csv:
STRINGmydir1file223^1
STRINGmydir1file221^2
STRINGmydir1file666^3
STRINGmydir2file111^1
STRINGmydir2file4^2
STRINGmydir3file1^4
STRINGmydir3file5^5
This is the code I tried:
for dir in my*/; do
array=(${dir}/*.txt)
IFS=$'\n' RGBASE=($(sort <<<"${array[@]}"));
for RG in ${RGBASE[@]}; do
RGTAG=$(basename ${RG/.txt//})
echo "STRING${dir}${RGTAG}" >> final.csv
done
done
Can someone please explain what is wrong with my code? Also, there could be other better ways to do this, but I want to use the for-loop.
The output with this code:
$ cat final.csv
STRINGdir1file666^3.txt
STRINGdir2file4^2.txt
STRINGdir3file5^5.txt
As a starting point which works for your special case, I have a two-liner for this.
mapfile -t array < <( find my* -name "*.txt" -printf "STRING^^%H^^%f\n" | cut -d"." -f1 | LANG=C sort -t"^" -k3,3 -k6 )
printf "%s\n" "${array[@]//^^/}"
To restrict the directory depth, you can add -maxdepth with the number of subdirs to search. The find command can also use a regex in the search, which is applied to the whole path; this can be used to work on a more complex directory tree.
The difficulty was sorting on two positions with the delimiter.
My idea was to add a marker which can easily be removed afterwards.
The sort command only accepts a single-character delimiter, so I used the double hat (^^) as the marker, which can be removed without removing the single hat in the filename.
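To make the decoration visible: after the find and cut stages, and before the markers are stripped, each entry looks like
STRING^^mydir1^^file223^1
so splitting on the single ^ puts the directory in field 3 and the trailing number in field 6, which are exactly the two sort keys; the final printf then deletes the ^^ markers.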
A solution using decorate-sort-undecorate idiom could be:
printf "%s\n" my*/*.txt |
sed -E 's_(.*)/(.*)\^([0-9]+).*_\1\t\3\tSTRING\1\2^\3_' |
sort -t$'\t' -k1,1 -k2,2n |
cut -f3
assuming filenames don't contain tab or newline characters.
A basic explanation: The printf prints each pathname on a separate line. The sed converts the pathname dir/file^number.txt into dir\tnumber\tSTRINGdirfile^number (\t represents a tab character). The aim is to use the tab character as a field separator in the sort command. The sort sorts the lines by the first field (lexicographically) and the second field (numerically). The cut discards the first and second fields; the remaining field is what we want.
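For example, mydir1/file223^1.txt is decorated by the sed into mydir1\t1\tSTRINGmydir1file223^1 (with \t standing for a tab); the first two fields drive the sort and the third field is what ends up in the output.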

Read in file line by line and search another file for a line with a partial match

I have a file of strings that partially match lines in another file. To find those lines I was trying a while loop with read, substituting each line of partial matches as a variable into a grep command that searches a database file, but for some reason I am not getting any output (outputfile.txt is empty).
Here is my current script
while read -r line; do
grep $line /path/to/databasefile >> /path/to/folder/outputfile.txt
done < "/partial_matches.txt"
The database has multiple entries, each with a sequence name on one line and the DNA sequence after it:
>transcript_ab
AGTCAGTCATGTC
>transcript_ac
AGTCAGTCATGTC
>transctipt_ad
AGTCAGTCATGTC
and the partial matching search file has lines of text:
ab
ac
and I'm looking for a return of:
>transcript_ab
>transcript_ac
Any help would be appreciated. Thanks.
If you are using GNU grep, then its -f option is what you are looking for:
grep -f /partial_matches.txt /path/to/databasefile
(if partial_matches.txt contains only fixed strings rather than regex patterns, use grep -F instead of grep)
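With the sample database and search file shown above, this should print exactly the two header lines the question asks for:
>transcript_ab
>transcript_ac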
You can use a for loop instead:
for i in $(cat partial_matches.txt); do
grep $i /path/to/databasefile >> /path/to/folder/outputfile.txt
done
Also, check if you have a typo:
"/partial_matches.txt" -> "./partial_matches.txt"

Excluding lines from a .csv based on pattern in another .csv

I want to compare values from 2 .csv files on Linux, excluding lines from the first file when the first column (which is always an IP) matches any of the IPs from the second file.
Any way of doing that from the command line (via grep, for example) would be OK by me.
File1.csv is:
10.177.33.157,IP,Element1
10.177.33.158,IP,Element2
10.175.34.129,IP,Element3
10.175.34.130,IP,Element4
10.175.34.131,IP,Element5
File2.csv:
10.177.33.157 < Exists on the first file
10.177.33.158 < Exists on the first file
10.175.34.129 < Exists on the first file
80.10.2.42 < Does not exist on the first file
80.10.3.194 < Does not exist on the first file
Output file desired:
10.175.34.130,IP,Element4
10.175.34.131,IP,Element5
Simply with awk:
awk -F',' 'NR==FNR{ a[$1]; next }!($1 in a)' file2.csv file1.csv
The output:
10.175.34.130,IP,Element4
10.175.34.131,IP,Element5
Use grep's -f option to read the patterns from a file, -v to invert the match, and -F for fixed strings. man grep goes a long way.
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains
zero patterns, and therefore matches nothing. (-f is specified by POSIX.)
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
-F, --fixed-strings, --fixed-regexp
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX,
--fixed-regexp is an obsoleted alias, please do not use it in new scripts.)
Result:
$ grep -vFf f2.csv f1.csv
10.175.34.130,IP,Element4
10.175.34.131,IP,Element5
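One caveat: -F does plain substring matching, so an entry such as 10.175.34.13 in File2.csv would also filter out 10.175.34.130 and 10.175.34.131. If that can occur in your data, adding -w restricts matches to whole words, which is enough for dotted IPs:
$ grep -vwFf f2.csv f1.csv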

Linux command to grab lines similar between files

I have one file that has one word per line.
I have a second file that has many words per line.
I would like to go through each line in the first file and, for every line in the second file that contains it, copy that line from the second file into a new third file.
Is there a way to do this simply with a Linux command?
Edit: Thanks for the input. But, I should specify better:
The first file is just a list of numbers (one number per line).
463463
43454
33634
The second file is very messy, and I am only looking for that number string to appear in a line in any way (not necessarily as an individual word). So, for instance,
ewjleji jejeti ciwlt 463463.52%
would return a hit. I think what was suggested to me does not work in this case (please forgive my having to edit for not being detailed enough).
If n is the number of lines in your first file and m is the number of lines in your second file, then you can solve this problem in O(nm) time in the following way:
cat firstfile | while read word; do
    grep "$word" secondfile >>thirdfile
done
If you need to solve it more efficiently than that, I don't think there are any built-in utilities for it, however.
As for your edit, this method does work the way you describe.
Here is a short script that will do it. It takes 3 command line arguments: 1) a file with one word per line, 2) the file with many lines you want to match against each word in file1, and 3) your output file:
#!/bin/bash
## test input and show usage on error
test -n "$1" && test -n "$2" && test -n "$3" || {
    printf "Error: insufficient input, usage: %s file1 file2 file3\n" "${0//*\//}"
    exit 1
}

while read line || test -n "$line" ; do
    grep "$line" "$2" 1>>"$3" 2>/dev/null
done <"$1"
example:
$ cat words.txt
me
you
them
$ cat lines.txt
This line is for me
another line for me
maybe another for me
one for you
another for you
some for them
another for them
here is one that doesn't match any
$ bash ../lines.sh words.txt lines.txt outfile.txt
$ cat outfile.txt
This line is for me
another line for me
maybe another for me
some for them
one for you
another for you
some for them
another for them
(Yes, I know that me also matches some in the example file, but that's not really the point.)

bash: check if multiple files in a directory contain strings from a list

Folks,
I have a text file which contains multiple lines with one string per line:
str1
str2
str3
etc..
I would like to read every line of this file and then search for those strings inside multiple files located in a different directory.
I am not quite sure how to proceed.
Thanks very much for your help.
awk 'NR==FNR{a[$0];next} { for (word in a) if ($0 ~ word) print FILENAME, $0 }' fileOfWords /wherever/dir/*
for wrd in $(cut -d, -f1 < testfile.txt); do grep $wrd dir/files* ; done
Use GNU grep's --file Option
According to grep(1):
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
The -H and -n flags will print the filename and line number of each match. So, assuming you store your patterns in /tmp/foo and want to search all files in /tmp/bar, you could use something like:
# Find regular files with GNU find and grep them all using a pattern
# file.
find /tmp/bar -type f -exec grep -Hnf /tmp/foo {} +
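If you only need to know which files contain at least one of the strings, rather than the matching lines themselves, grep's -l option prints just the file names; a sketch under the same assumptions (patterns in /tmp/foo, files under /tmp/bar):
find /tmp/bar -type f -exec grep -lf /tmp/foo {} +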
while read -r str
do
    echo "$str"
    grep "$str" /path/to/other/files
done < inputfile
