Reading words from an input file and grepping the lines containing the words from another file - linux

I have a file containing list of 4000 words (A.txt). Now I want to grep lines from another file (sentence_per_line.txt) containing those 4000 words mentioned in the file A.txt.
The shell script I wrote for the above problem is
#!/bin/bash
file="A.txt"
while IFS= read -r line
do
# display $line or do somthing with $line
printf '%s\n' "$line"
grep $line sentence_per_line.txt >> output.txt
# tried printing the grep command to check its working or not
result=$(grep "$line" sentence_per_line.txt >> output.txt)
echo "$result"
done <"$file"
And A.txt looks like this
applicable
available
White
Black
..
The code is neither working nor does it shows any error.

Grep has this built in:
grep -f A.txt sentence_per_line.txt > output.txt
Remarks to your code:
Looping over a file to execute grep/sed/awk on each line is typically an antipattern, see this Q&A.
If your $line parameter contains more than one word, you have to quote it (doesn't hurt anyway), or grep tries to look for the first word in a file named after the second word:
grep "$line" sentence_per_line.txt >> output.txt
If you write output in a loop, don't redirect within the loop, do it outside:
while read -r line; do
grep "$line" sentence_per_line.txt
done < "$file" > output.txt
but remember, it's usually not a good idea in the first place.
If you'd like to write to a file and at the same time see what you're writing, you can use tee:
grep "$line" sentence_per_line.txt | tee output.txt
writes to output.txt and stdout.
If A.txt contains words which you want to match only if the complete word matches, i.e., pattern should not match longerpattern, you can use grep -wf – the -w matches only complete words.
If the words in A.txt aren't regular expressions, but fixed strings, you can use grep -fF – the -F option looks for fixed strings and is faster. These two can be combined: grep -WfF

Related

Bash - Piping output of command into while loop

I'm writing a Bash script where I need to look through the output of a command and do certain actions based on that output. For clarity, this command will output a few million lines of text and it may take roughly an hour or so to do so.
Currently, I'm executing the command and piping it into a while loop that reads a line at a time then looks for certain criteria. If that criterion exists, then update a .dat file and reprint the screen. Below is a snippet of the script.
eval "$command"| while read line ; do
if grep -Fq "Specific :: Criterion"; then
#pull the sixth word from the line which will have the data I need
temp=$(echo "$line" | awk '{ printf $6 }')
#sanity check the data
echo "\$line = $line"
echo "\$temp = $temp"
#then push $temp through a case statement that does what I need it to do.
fi
done
So here's the problem, the sanity check on the data is showing weird results. It is printing lines that don't contain the grep criteria.
To make sure that my grep statement is working properly, I grep the log file that contains a record of the text that is output by the command and it outputs only the lines that contain the specified criteria.
I'm still fairly new to Bash so I'm not sure what's going on. Could it be that the command is force feeding the while loop a new $line before it can process the $line that met the grep criteria?
Any ideas would be much appreciated!
How does grep know what line looks like?
if ( printf '%s\n' "$line" | grep -Fq "Specific :: Criterion"); then
But I cant help feel like you are overcomplicating a lot.
function process() {
echo "I can do anything I want"
echo " per element $1"
echo " that I want here"
}
export -f process
$command | grep -F "Specific :: Criterion" | awk '{print $6}' | xargs -I % -n 1 bash -c "process %";
Run the command, filter only matching lines, and pull the sixth element. Then if you need to run an arbitrary code on it, send it to a function (you export to make it visible in subprocesses) via xargs.
What are you applying the grep on ?
Modify
if grep -Fq "Specific :: Criterion"; then
as below
if ( echo $line | grep -Fq "Specific :: Criterion" ); then

bash: grep in loop does not grep

I have (probably a obvious/stupid) problem:
I want to loop over a list of paths, cut them and use the strings to grep in log files.
While every step works fine on its own and 'processed manually' results in hits - grep does not find anything when in the loop?
for FILE in `awk -F "/" '{print $13}' /tmp/files_not_visible.uniq`; do
echo -e "\n\n$FILE\n";
grep "$FILE" /var/log/PATH/FILENAME-2015.12.*;
done
I also tried to do a while loop as reverse exercise, but fails with the same non-result
while read FILE; do
echo $FILE;
echo $FILE | awk -F "/" '{print $13}' | grep -f - /var/log/PATH/FILENAME-2015.12.* ;
done < /tmp/files_not_visible.uniq/tmp/files_not_visible.uniq
So, I guess there is some systematic issue, how I handle the search string with grep?
Found it: the list of files contained invisible characters as the last character of the line! Probably the user, who send me the list of files, created it on some other OS! And I only copied -of course- the visible characters when testing by hand!
Fixed the loop by cutting the last character of a line with
> sed -e 's/.$//'

Concatenation of huge number of selective files from a directory in Shell

I have more than 50000 files in a directory such as file1.txt, file2.txt, ....., file50000.txt. I would like to concatenate of some files whose file numbers are listed in the following text file (need.txt).
need.txt
1
4
35
45
71
.
.
.
I tried with the following. Though it is working, but I look for more simpler and short way.
n1=1
n2=$(wc -l < need.txt)
while [ $n1 -le $n2 ]
do
f1=$(awk 'NR=="$n1" {print $1}' need.txt)
cat file$f1.txt >> out.txt
(( n1++ ))
done
This might also work for you:
sed 's/.*/file&.txt/' < need.txt | xargs cat > out.txt
Something like this should work for you:
sed -e 's/.*/file&.txt/' need.txt | xargs cat > out.txt
It uses sed to translate each line into the appropriate file name and then hands the filenames to xargs to hand them to cat.
Using awk it could be done this way:
awk 'NR==FNR{ARGV[ARGC]="file"$1".txt"; ARGC++; next} {print}' need.txt > out.txt
Which adds each file to the ARGV array of files to process and then prints every line it sees.
It is possible do it without any sed or awk command. Directly using bash built-in functions and cat (of course).
for i in $(cat need.txt); do cat file${i}.txt >> out.txt; done
And as you want, it is quite simple.

Extract strings in a text file using grep

I have file.txt with names one per line as shown below:
ABCB8
ABCC12
ABCC3
ABCC4
AHR
ALDH4A1
ALDH5A1
....
I want to grep each of these from an input.txt file.
Manually i do this one at a time as
grep "ABCB8" input.txt > output.txt
Could someone help to automatically grep all the strings in file.txt from input.txt and write it to output.txt.
You can use the -f flag as described in Bash, Linux, Need to remove lines from one file based on matching content from another file
grep -o -f file.txt input.txt > output.txt
Flag
-f FILE, --file=FILE:
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
-o, --only-matching:
Print only the matched (non-empty) parts of a matching line, with
each such part on a separate output line.
for line in `cat text.txt`; do grep $line input.txt >> output.txt; done
Contents of text.txt:
ABCB8
ABCC12
ABCC3
ABCC4
AHR
ALDH4A1
ALDH5A1
Edit:
A safer solution with while read:
cat text.txt | while read line; do grep "$line" input.txt >> output.txt; done
Edit 2:
Sample text.txt:
ABCB8
ABCB8XY
ABCC12
Sample input.txt:
You were hired to do a job; we expect you to do it.
You were hired because ABCB8 you kick ass;
we expect you to kick ass.
ABCB8XY You were hired because you can commit to a rational deadline and meet it;
ABCC12 we'll expect you to do that too.
You're not someone who needs a middle manager tracking your mouse clicks
If You don't care about the order of lines, the quick workaround would be to pipe the solution through a sort | uniq:
cat text.txt | while read line; do grep "$line" input.txt >> output.txt; done; cat output.txt | sort | uniq > output2.txt
The result is then in output.txt.
Edit 3:
cat text.txt | while read line; do grep "\<${line}\>" input.txt >> output.txt; done
Is that fine?

How can I prepend a string to the beginning of each line in a file?

I have the following bash code which loops through a text file, line by line .. im trying to prefix the work 'prefix' to each line but instead am getting this error:
rob#laptop:~/Desktop$ ./appendToFile.sh stusers.txt kp
stusers.txt
kp
./appendToFile.sh: line 11: /bin/sed: Argument list too long
115000_210org#house.com,passw0rd
This is the bash script ..
#!/bin/bash
file=$1
string=$2
echo "$file"
echo "$string"
for line in `cat $file`
do
sed -e 's/^/prefix/' $line
echo "$line"
done < $file
What am i doing wrong here?
Update:
Performing head on file dumps all the lines onto a single line of the terminal, probably related?
rob#laptop:~/Desktop$ head stusers.txt
rob#laptop:~/Desktop$ ouse.com,passw0rd
a one-line awk command should do the trick also:
awk '{print "prefix" $0}' file
Concerning your original error:
./appendToFile.sh: line 11: /bin/sed: Argument list too long
The problem is with this line of code:
sed -e 's/^/prefix/' $line
$line in this context is file name that sed is running against. To correct your code you should fix this line as such:
echo $line | sed -e 's/^/prefix/'
(Also note that your original code should not have the < $file at the end.)
William Pursell addresses this issue correctly in both of his suggestions.
However, I believe you have correctly identified that there is an issue with your original text file. dos2unix will not correct this issue, as it only strips the carriage returns Windows sticks on the end of lines. (However, if you are attempting to read a Linux file in Windows, you would get a mammoth line with no returns.)
Assuming that it is not an issue with the end of line characters in your text file, William Pursell's, Andy Lester's, or nullrevolution's answers will work.
A variation on the while read... suggestion:
while read -r line; do echo "PREFIX " $line; done < $file
This could be run directly from the shell (no need for a batch / script file):
while read -r line; do echo "kp" $line; done < stusers.txt
The entire loop can be replaced by a single sed command that operates on the entire file:
sed -e 's/^/prefix/' $file
A Perl way to do it would be:
perl -p -e's/^/prefix' filename
or
perl -p -e'$_ = "prefix $_"' filename
In either case, that reads from filename and prints the prefixed lines to STDOUT.
If you add a -i flag, then Perl will modify the file in place. You can also specify multiple filenames and Perl will magically do all of them.
Instead of the for loop, it is more appropriate to use while read...:
while read -r line; do
do
echo "$line" | sed -e 's/^/prefix/'
done < $file
But you would be much better off with the simpler:
sed -e 's/^/prefix/' $file
Use sed. Just change the word prefix.
sed -e 's/^/prefix/' file.ext
If you want to save the output in another file
sed -e 's/^/prefix/' file.ext > file_new.ext
You don't need sed, just concatenate the strings in the echo command
while IFS= read -r line; do
echo "prefix$line"
done < filename
Your loop iterates over each word in the file:
for line in `cat file`; ...
sed -i '1a\
Your Text' file1 file2 file3
A solution without sed/awk and while loops:
xargs -n1 printf "$prefix%s\n" < "$file"

Resources