How can I loop through a directory and get wc of every txt file? - linux

The shell is sh.
I have been using a for loop:
for F in *.txt
do
echo `wc -w $F`
done
This has been returning the number of words and the name of the file. I don't understand why it keeps returning the name of the file; it looks like it should only return the number of words in the file.

This is the default behavior of wc: it prints the filename after the count.
If you just want the count, pass the filename via STDIN:
wc -w <filename
Also, instead of iterating over the files with a for loop, you can let the glob supply all the filenames at once; wc accepts multiple file arguments, so this works fine:
wc -w *.txt
In this case, to get rid of the filenames, use some text processing (note that with more than one file, wc also prints a final total line, which stays in the output):
wc -w *.txt | awk '{print $1}'
This should be faster than the for loop approach you already have.
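A minimal sketch of both variants, run against a scratch directory. The file names a.txt and b.txt and their contents are made up for the demonstration; they are not from the question.

```shell
#!/bin/sh
# Scratch directory with two sample files (illustrative names only).
dir=$(mktemp -d)
printf 'one two three\n' > "$dir/a.txt"
printf 'four five\n' > "$dir/b.txt"

# Variant 1: loop, feeding each file to wc on stdin so no name is printed.
loop_counts=$(cd "$dir" && for f in *.txt; do wc -w < "$f"; done)

# Variant 2: a single wc call; keep column 1 and skip the "total" row
# that wc appends when it is given more than one file.
glob_counts=$(cd "$dir" && wc -w -- *.txt | awk '$2 != "total" {print $1}')

printf '%s\n' "$loop_counts"
printf '%s\n' "$glob_counts"
rm -rf "$dir"
```

Some wc implementations pad the count with leading blanks when reading stdin, so strip whitespace if you need a bare number. The awk filter would also drop a file that is literally named total.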

Related

Move a file list based upon grep pattern in command line [duplicate]

I want to pass each line of output from a command as an argument to a second command, e.g.:
grep "pattern" input
returns:
file1
file2
file3
and I want to copy these outputs, e.g:
cp file1 file1.bac
cp file2 file2.bac
cp file3 file3.bac
How can I do that in one go? Something like:
grep "pattern" input | cp $1 $1.bac
You can use xargs:
grep 'pattern' input | xargs -I% cp "%" "%.bac"
You can use $() to interpolate the output of a command. So, you could use kill -9 $(grep -hP '^\d+$' $(ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }')) if you wanted to.
In addition to Chris Jester-Young's good answer, I would say that xargs is also a good solution for these situations:
grep ... `ls -lad ... | awk '{ print $9 }'` | xargs kill -9
will do it. All together:
grep -hP '^\d+$' `ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }'` | xargs kill -9
For completeness, I'll also mention command substitution and explain why this is not recommended:
cp $(grep -l "pattern" input) directory/
(The backtick syntax cp `grep -l "pattern" input` directory/ is roughly equivalent, but it is obsolete and unwieldy; don't use that.)
This will fail if the output from grep produces a file name which contains whitespace or a shell metacharacter.
Of course, it's fine to use this if you know exactly which file names the grep can produce, and have verified that none of them are problematic. But for a production script, don't use this.
For the OP's scenario, where you need to refer to each match individually and add an extension to it, the xargs or while read alternatives are superior anyway.
In the worst case (meaning problematic or unspecified file names), pass the matches to a subshell via xargs:
grep -l "pattern" input |
xargs -r sh -c 'for f; do cp "$f" "$f.bac"; done' _
... where obviously the script inside the for loop could be arbitrarily complex.
In the ideal case, the command you want to run is simple (or versatile) enough that you can simply pass it an arbitrarily long list of file names. For example, GNU cp has a -t option to facilitate this use of xargs (the -t option allows you to put the destination directory first on the command line, so you can put as many files as you like at the end of the command):
grep -l "pattern" input | xargs cp -t destdir
which will expand into
cp -t destdir file1 file2 file3 file4 ...
for as many matches as xargs can fit onto the command line of cp, repeated as many times as it takes to pass all the files to cp. (Unfortunately, this doesn't match the OP's scenario; if you need to rename every file while copying, you need to pass in just two arguments per cp invocation: the source file name and the destination file name to copy it to.)
So in other words, if you use the command substitution syntax and grep produces a really long list of matches, you risk bumping into ARG_MAX and "Argument list too long" errors; but xargs will specifically avoid this by instead copying only as many arguments as it can safely pass to cp at a time, and running cp multiple times if necessary instead.
The above will still work incorrectly if you have file names which contain newlines. Perhaps see also https://mywiki.wooledge.org/BashFAQ/020
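If newlines in file names are a real concern, one possible approach is to have grep end each name with a NUL byte and have xargs split on NUL. This is only a sketch: it assumes a grep that supports -Z (GNU grep does) and an xargs that supports -0 and -r, and the file names below are invented for the demonstration.

```shell
#!/bin/sh
dir=$(mktemp -d)
# One ordinary name, one with a space, one with an embedded newline.
printf 'pattern here\n' > "$dir/plain.txt"
printf 'pattern again\n' > "$dir/with space.txt"
printf 'pattern too\n' > "$dir/new
line.txt"
printf 'nothing\n' > "$dir/other.txt"

# -l lists matching files; -Z ends each name with NUL instead of newline.
# xargs -0 splits on NUL only; -r skips the command on empty input.
grep -lZ 'pattern' "$dir"/*.txt |
    xargs -0 -r sh -c 'for f; do cp -- "$f" "$f.bac"; done' _

# Count the backups via the glob, which is safe for odd names.
set -- "$dir"/*.bac
backup_count=$#
printf '%s\n' "$backup_count"
rm -rf "$dir"
```

Three of the four sample files contain the pattern, so three .bac copies are created, including the one whose name contains a newline.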
#!/bin/bash
for f in files; do
if grep -q PATTERN "$f"; then
echo cp -v "$f" "${f}.bac"
fi
done
Here files can be a glob such as *.txt or *.text (that is, files whose names end in .txt or .text), or whatever else you need; replace PATTERN with your own pattern. Remove the echo once you're satisfied with the output. For a recursive solution, take a look at the bash shell option globstar.

grep - limit number of files read

I have a directory with over 100,000 files. I want to know if the string "str1" exists as part of the content of any of these files.
The command:
grep -l 'str1' * takes too long as it reads all of the files.
How can I ask grep to stop reading any further files if it finds a match? Any one-liner?
Note: I have tried grep -l 'str1' * | head but the command takes just as much time as the previous one.
Naming 100,000 filenames in your command args is going to cause a problem. It probably exceeds the size of a shell command-line.
But you don't have to name all the files if you use the recursive option with just the name of the directory the files are in (which is . if you want to search files in the current directory):
grep -l -r 'str1' . | head -1
Use grep -m 1 so that grep stops after finding the first match in a file. It is extremely efficient for large text files.
grep -m 1 str1 * /dev/null | head -1
If there is a single file, then /dev/null above ensures that grep does print out the file name in the output.
If you want to stop after finding the first match in any file:
for file in *; do
if grep -q -m 1 str1 "$file"; then
echo "$file"
break
fi
done
The for loop also saves you from the too many arguments issue when you have a directory with a large number of files.
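The loop can also be wrapped in a small function. This is a sketch only: first_match is a hypothetical name, and the sample files are made up. It relies on grep -q exiting at the first hit, so -m 1 is not needed in addition.

```shell
#!/bin/sh
# first_match PATTERN DIR: print the first regular file in DIR whose
# content matches PATTERN, then stop without reading the remaining files.
first_match() {
    for file in "$2"/*; do
        [ -f "$file" ] || continue            # skip directories and the like
        if grep -q -e "$1" -- "$file"; then   # -q exits at the first hit
            printf '%s\n' "$file"
            return 0
        fi
    done
    return 1                                  # no file matched
}

dir=$(mktemp -d)
printf 'nothing here\n' > "$dir/a"
printf 'has str1 inside\n' > "$dir/b"
printf 'str1 again\n' > "$dir/c"

found=$(first_match 'str1' "$dir")
printf '%s\n' "${found##*/}"
rm -rf "$dir"
```

The glob expands in sorted order, so file a is checked first, b matches, and c is never opened.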

How do I use the pipe command to display attributes in a file?

I'm currently making a shell program and I want to display the total amount of bytes in a specific file using the pipe command. I know that the pipe command takes whatever is on the left side and gives it to the right as input. (Assuming you are in the directory the file is in)
I know that the command (wc -c) displays the number of bytes in a file but I'm not sure how to pipe it. What I've tried was:
ls fileName.sh | wc -c
wc takes the filename as argument, not as input. Try this:
wc -c fileName.sh
The wc program takes multiple arguments. You can do this to apply it to all entries in the current working directory:
wc -c $(ls)
Another approach is to use xargs to convert input to arguments:
ls | xargs wc -c
You may need to use a more complex line if you have spaces in your filenames. ls can output a single file per line, and xargs can be told to split only on \n:
ls -1 | xargs -d '\n' wc -c
If you prefer to use find instead of ls (a more powerful tool), the -print0 option for find plays along with the -0 option to xargs.
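A sketch of that find and xargs combination, assuming a find that supports -print0 and an xargs that supports -0 (both GNU and BSD versions do). The scratch files are made up for the demonstration.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'hello\n' > "$dir/plain.sh"          # 6 bytes
printf 'hi\n' > "$dir/with space.sh"        # 3 bytes

# -print0 ends each path with a NUL byte; xargs -0 splits on NUL only,
# so the space in the second name does not break it into two arguments.
sizes=$(find "$dir" -type f -print0 | xargs -0 wc -c)
printf '%s\n' "$sizes"

# With more than one file, the last line of wc's output is the total.
total=$(printf '%s\n' "$sizes" | awk '{n = $1} END {print n}')
rm -rf "$dir"
```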

Can find push the filenames of the found files into the pipe?

I would like to run find in some directory, run awk on the files in that directory, and then replace each original file with the result.
find dir | xargs cat | awk ... | mv ... > filename
So I need the filename (of each of the files found by find) in the last command. How can I do that?
I would use a loop, like:
for filename in `find . -name "*test_file*" -print0 | xargs -0`
do
# some processing, then
echo "what you like" > "$filename"
done
EDIT: as noted in the comments, the benefits of -print0 | xargs -0 are lost because of the for loop, and filenames containing whitespace are still not handled correctly.
The following while loop does not handle unusual filenames either (worth knowing, though it was not part of the question), but it does at least handle filenames with ordinary spaces, so it works better:
find . -name "*test*file*" -print > files_list
while IFS= read -r filename
do
# some process
echo "what you like" > "$filename"
done < files_list
You could do something like this (but I wouldn't recommend it at all).
find dir -print0 |
xargs -0 -n 2 awk -v OFS='\0' '<process the input and write to temporary file>
END {print "temporaryfile", FILENAME}' |
xargs -0 -n 2 mv
This passes the files to awk directly two at a time (which avoids the problem with your original where cat will get hundreds (perhaps more) files as arguments all at once and spit all their content at awk via standard input at once and thus lose their individual contents and filenames entirely).
It then has awk write the processed output to a temporary file and then outputs the temporary filename and the original filename where xargs picks them up (again two at a time) and runs mv on the pairs of temporary file/original file names.
As I said at the beginning however this is a terrible way to do this.
If you have a new enough version of GNU awk (version 4.1.0 or newer) then you could just use the -i inplace option (which loads gawk's in-place editing extension) and use (I believe):
find dir | xargs awk -i inplace '......'
Without that I would use a while loop of the form in Bash FAQ 001 to read the find output line-by-line and operate on it in the loop.
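A minimal sketch of such a while loop, writing each result to a temporary file and then moving it over the original. The awk program (keep only the first field) and the sample files are placeholders chosen for illustration, not the asker's actual processing.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'a b\nc d\n' > "$dir/one test_file.txt"
printf 'e f\n' > "$dir/test_file2.txt"

# Read find's output one line at a time; IFS= and -r keep leading
# blanks and backslashes in the names intact.
find "$dir" -type f -name '*test_file*' -print |
while IFS= read -r filename; do
    tmp=$(mktemp) || exit 1
    # Placeholder transformation (an assumption): keep the first field.
    if awk '{print $1}' "$filename" > "$tmp"; then
        mv -- "$tmp" "$filename"     # replace the original on success
    else
        rm -f -- "$tmp"              # leave the original untouched
    fi
done

result=$(cat "$dir/one test_file.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

File names containing newlines would still break this loop, and the replaced file takes on the permissions of the temporary file, so adjust those afterwards if they matter.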

Linux using grep to print the file name and first n characters

How do I use grep to perform a search which, when a match is found, will print the file name as well as the first n characters in that file? Note that n is a parameter that can be specified and it is irrelevant whether the first n characters actually contains the matching string.
grep -l pattern *.txt |
while IFS= read -r line; do
echo -n "$line: ";
head -c "$n" "$line";
echo;
done
Change -c to -n if you want to see the first n lines instead of bytes.
You need to pipe the output of grep to sed to accomplish what you want. Here is an example:
grep mypattern *.txt | sed 's/^\([^:]*:.......\).*/\1/'
The number of dots is the number of characters you want to print. Many versions of sed provide an option, like -r (GNU/Linux) and -E (FreeBSD), that allows you to use modern-style regular expressions. This makes it possible to specify numerically the number of characters you want to print.
N=7
grep mypattern *.txt /dev/null | sed -r "s/^([^:]*:.{$N}).*/\1/"
Note that this solution is a lot more efficient than the others proposed, which invoke multiple processes.
There are few tools that print 'n characters' rather than 'n lines'. Are you sure you really want characters and not lines? The whole thing can perhaps be best done in Perl. As specified (using grep), we can do:
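pattern="$1"
shift
# after the first shift, n is the new first argument
n="$1"
shift
grep -l "$pattern" "$@" |
while IFS= read -r file
do
# bs=1 makes count a number of bytes; dd's statistics go to stderr
echo "$file:" $(dd if="$file" bs=1 count="$n" 2>/dev/null)
done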
The quotes around $file preserve multiple spaces in file names correctly. We can debate the command line usage, currently (assuming the command name is 'ngrep'):
ngrep pattern n [file ...]
I note that @litb used 'head -c $n'; that's neater than the dd command I used. There might be some systems without head (but they'd be pretty archaic). I note that the POSIX version of head only supports -n and the number of lines; the -c option is probably a GNU extension.
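Putting the head -c variant together with the ngrep usage above gives something like the following. It is a sketch under assumptions: it takes head -c to be available (see the portability note above), and the sample files are invented.

```shell
#!/bin/sh
# ngrep PATTERN N [FILE ...]: for every file that contains PATTERN,
# print "name: " followed by the first N bytes of that file.
ngrep() {
    pattern=$1
    n=$2
    shift 2
    for file in "$@"; do
        if grep -q -e "$pattern" -- "$file"; then
            printf '%s: ' "$file"
            head -c "$n" "$file"     # first N bytes, not lines
            echo                     # end the output line
        fi
    done
}

dir=$(mktemp -d)
printf 'hello world, needle here\n' > "$dir/a.txt"
printf 'no match in this one\n' > "$dir/b.txt"

out=$(cd "$dir" && ngrep needle 5 *.txt)
printf '%s\n' "$out"
rm -rf "$dir"
```

Looping over "$@" instead of piping grep -l into while read avoids reading file names back from a pipe, at the cost of one grep process per file.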
Two thoughts here:
1) If efficiency was not a concern (like that would ever happen), you could check $status [csh] after running grep on each file. E.g.: (For N characters = 25.)
foreach FILE ( file1 file2 ... fileN )
grep targetToMatch ${FILE} > /dev/null
if ( $status == 0 ) then
echo -n "${FILE}: "
head -c25 ${FILE}
endif
end
2) GNU [FSF] head contains a --verbose [-v] switch. GNU grep and xargs also offer --null, to accommodate filenames with spaces. And there's '--', to handle filenames like "-c". So you could do:
grep --null -l targetToMatch -- file1 file2 ... fileN |
xargs --null head -v -c25 --
