grep - limit number of files read - linux

I have a directory with over 100,000 files. I want to know if the string "str1" exists as part of the content of any of these files.
The command:
grep -l 'str1' * takes too long as it reads all of the files.
How can I ask grep to stop reading any further files if it finds a match? Any one-liner?
Note: I have tried grep -l 'str1' * | head but the command takes just as much time as the previous one.

Passing 100,000 filenames as command-line arguments is going to cause a problem; it probably exceeds the maximum length of a shell command line.
But you don't have to name all the files if you use the recursive option with just the name of the directory the files are in (which is . if you want to search files in the current directory):
grep -l -r 'str1' . | head -1

Use grep -m 1 so that grep stops after finding the first match in a file. It is extremely efficient for large text files.
grep -m 1 str1 * /dev/null | head -1
If only a single file matches *, the /dev/null argument ensures that grep still prints the file name in its output (grep omits file names when given just one file argument).
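For example (a quick sketch; file.txt is a hypothetical file name):
grep -m 1 str1 file.txt            # single file argument: only the matching line is printed
grep -m 1 str1 file.txt /dev/null  # two arguments: output becomes file.txt:<matching line>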
If you want to stop after finding the first match in any file:
for file in *; do
if grep -q -m 1 str1 "$file"; then
echo "$file"
break
fi
done
The for loop also saves you from the "argument list too long" issue when you have a directory with a very large number of files.
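An alternative sketch of the same "stop at the first matching file" idea, assuming GNU find (its -quit action ends the search as soon as one match has been printed):
find . -maxdepth 1 -type f -exec grep -q 'str1' {} \; -print -quit
Like the for loop, this never builds a huge argument list.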

Related

Search, match and copy directories into another based on names in a txt file

My goal is copy a bulk of specific directories whose names are in a txt file as follows:
$ cat names.txt
raw1
raw2
raw3
raw4
raw5
These directories have subdirectories, hence it is important to copy all the contents. When I list in my terminal it looks like this:
$ ls -l
raw3
raw7
raw1
raw8
raw5
raw6
raw2
raw4
To perform this task, I have tried the following:
cat names.txt | while read line; do grep -l '$line' | xargs -r0 cp -t <desired_destination>; done
But I get this error:
cp: cannot stat No such file or directory
I suppose it's because the names in the file list (names.txt) don't come in the same order as the ones in the terminal. Notice that they are unsorted, and the while read line approach doesn't work. Thank you for taking the time and commitment to help me.
I'm having problems following the logic of the current code, so in the name of K.I.S.S. I propose:
tgtdir=/my/target/directory
while read -r srcdir
do
[[ -d "${srcdir}" ]] && cp -rp "${srcdir}" "${tgtdir}"
done < <(tr -d '\r' < names.dat)
NOTES:
the < <(tr -d '\r' < names.dat) is used to remove Windows/DOS line endings from names.dat (per comments from the OP); if names.dat is updated to remove the \r characters then the tr -d will be a no-op (i.e., a bit of overhead to spawn the subprocess, but the script should still read names.dat correctly); a quick way to check for those line endings is sketched after these notes
assumes the script is run from the directory where the source directories reside; otherwise the code can be modified to either cd to said directory or prefix the ${srcdir} references with said directory
OP can add/modify the cp flags as needed, but I'm assuming at a minimum -r will be needed in order to recursively copy the directories
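If you are not sure whether names.dat really has those line endings, one quick check (GNU cat) is:
cat -A names.dat | head -5   # CRLF endings show up as ^M$ at the end of each line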
UUoC (Useless Use of cat).
cat names.txt | while read line; do ...; done
is better written
while read line; do ...; done < names.txt
The grep -l '$line' | in your loop is eating your input.
printf "%s\n" 1 2 3 |while read line; do echo "Read: [$line]"; grep . | cat; done
Read: [1]
2
3
In your case, it is likely finding no lines that match the literal string $line, which you have embedded in single-quote marks; single quotes prevent the variable from being expanded. Use double quotes, "$line". And grep -l wouldn't be helpful even if it did match:
$: printf "%s\n" 1 2 3 | grep -l .
(standard input)
You didn't tell grep what file to read from, so -l is pointless: it's reading the same stdin stream that the read is.
I think what you want is a little simpler -
xargs cp -Rt /your/desired/target/directory/ < names.txt
Assuming you wanted to leave the originals where they were.
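If any directory names in names.txt contain spaces, a variant of the same command (assuming GNU xargs) treats each input line as a single argument:
xargs -d '\n' cp -Rt /your/desired/target/directory/ < names.txt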

fast way to find text files not ending with a specified string

I have many, many XML files and want to check their completeness by verifying that they end with a </root> tag.
grep -L "</root>" *.xml
does the trick but is rather slow (too many large files). Is there a quicker solution?
For large files, if you are sure that the target string is near the end of them, use tail:
tail -n 10 filename.xml | grep "</root>" # will check the last 10 lines for the pattern
Tested on a ~7 GB text file: a plain grep took ~20 s, with tail less than 0.01 s.
For a number of files (printing the names of the files which do NOT contain the pattern):
for f in *.xml ; do tail -n 10 "$f" | grep -q "</root>" || echo "$f" ; done
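If the closing tag may sit on a very long final line, a byte-based variant of the same idea works too (a sketch; the 100-byte window is an assumption about how close to the end </root> appears):
for f in *.xml ; do tail -c 100 "$f" | grep -q "</root>" || echo "$f" ; done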

Linux commands to get Latest file depending on file name

I am new to Linux. I have a folder with many files in it and I need to get the latest file depending on the file name. Example: I have 3 files RAT_20190111.txt RAT_20190212.txt RAT_20190321.txt. I need a Linux command to move the latest file here, RAT_20190321.txt, to a specific directory.
If the file pattern remains the same then you can try the command below:
mv $(ls RAT*|sort -r|head -1) /path/to/directory/
As pointed out by @wwn, there is no need to use sort: since the files are lexicographically sortable, ls already sorts them, so the command becomes:
mv $(ls RAT*|tail -1) /path/to/directory
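A sketch that avoids parsing the output of ls altogether (bash 4.3+ for the negative array index; same RAT_*.txt naming as in the question):
files=( RAT_*.txt )                        # the glob expands in lexicographic order
mv -- "${files[-1]}" /path/to/directory/   # the last element is the newest by name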
The following command works.
ls -p | grep -v '/$' | sort | tail -n 1 | xargs -d '\n' -r mv -t /path/to/directory --
The command lists the entries one per line (ls -p marks directories with a trailing / so grep -v can filter them out), sorts them, takes the last file, and moves it to the required directory (mv -t is used because xargs appends the filename at the end).
Hope it helps.
Use the command below:
cp "$(ls | tail -n 1)" /data...

Printing the number of lines

I have a directory that contains only .txt files. I want to print the number of lines for every file. When I write cat file.txt | wc -l the number of lines appears but when I want to make a script it's more complicated. I have this code:
for fis in `ls -R $1`
do
echo `cat $fis | wc -l`
done
I tried wc -l $fis, and also awk and grep, and it doesn't work. It reports:
cat: fis1: No such file or directory
0
How can I do to print the number of lines?
To find files recursively in subdirectories, use the find command, not ls -R, which is mainly intended for human reading.
find "$1" -type f -exec wc -l {} +
The problems with looping over the output of ls -R are:
Filenames with whitespace won't be parsed correctly.
It prints other output besides just the filenames.
Not the problem here, but the echo command is also unnecessary:
You can use
wc -l "${fis}"
What goes wrong?
You have a subdir called fis1. Look at the output of ls:
# ls -R fis1
fis1:
file1_in_fis1.txt
When you are parsing this output, your script will try
echo `cat fis1: | wc -l`
The cat will tell you No such file or directory and wc counts 0.
As @Barmar explained, ls prints additional output you do not want.
Do not try to patch your attempt with | grep .txt or if [ -f "${fis}" ]; then ..; these will still fail on filename with spaces.txt. So use find or shopt (and accept the answer of @Barmar or @Cyrus).
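A sketch of the shopt route mentioned above (bash 4+; $1 is the directory argument, as in the question):
shopt -s globstar nullglob   # ** matches files in subdirectories; nullglob drops patterns that match nothing
for f in "$1"/**/*.txt; do
wc -l < "$f"                 # reading via stdin prints the count without the filename
done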

How can I loop through a directory and get wc of every txt file?

The shell is sh.
I have been using a for loop:
for F in *.txt
do
echo `wc -w $F`
done
This has been returning the number of words and the name of the file. I don't understand why it keeps returning the name of the file; it looks like it should only return the number of words in the file.
This is the default behavior of wc: it shows the filename after the count.
If you just want the count, feed the file to wc on stdin instead of naming it as an argument:
wc -w <filename
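Applied to the loop from the question, a minimal sketch looks like:
for F in *.txt
do
wc -w < "$F"
done
The stdin redirection means wc never sees a file name, so only the count is printed.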
Also, instead of iterating over the files with for, you could just use globbing to get all the filenames at once; wc takes multiple arguments, so that is not a problem:
wc -w *.txt
In this case, to get rid of the filenames, use some text-processing:
wc -w *.txt | awk '{print $1}'
This should be faster than the for loop approach you already have.
