How do you search for files containing DOS line endings (CRLF) with grep on Linux?

I want to search for files containing DOS line endings with grep on Linux. Something like this:
grep -IUr --color '\r\n' .
The above seems to match the literal characters rn, which is not what is desired.
The output of this will be piped through xargs into fromdos to convert CRLF to LF, like this:
grep -IUrl --color '^M' . | xargs -ifile fromdos 'file'

grep probably isn't the tool you want for this: it will print a line for every matching line in every file. Unless you want to, say, run fromdos 10 times on a 10-line file, grep isn't the best way to go about it. Using find to run file on every file in the tree, then grepping through that for "CRLF", will get you one line of output for each file that has DOS-style line endings:
find . -not -type d -exec file "{}" ";" | grep CRLF
will get you something like:
./1/dos1.txt: ASCII text, with CRLF line terminators
./2/dos2.txt: ASCII text, with CRLF line terminators
./dos.txt: ASCII text, with CRLF line terminators
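To feed that list into fromdos as intended in the question, one sketch (assuming GNU xargs and that no file name contains a colon or a newline) is to strip everything from the first colon on:
find . -not -type d -exec file "{}" ";" | grep CRLF | cut -d: -f1 | xargs -d '\n' fromdos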

Use Ctrl+V, Ctrl+M to enter a literal Carriage Return character into your grep string. So:
grep -IUr --color "^M"
will work - if the ^M there is a literal CR that you input as I suggested.
If you want the list of files, you want to add the -l option as well.
Explanation
-I ignore binary files
-U prevents grep from stripping CR characters. By default it strips them if it decides the file is a text file.
-r read all files under each directory recursively.

Using ripgrep (depending on your shell, you might need to quote the last argument):
rg -l \r
-l, --files-with-matches
Only print the paths with at least one match.
https://github.com/BurntSushi/ripgrep

If your version of grep supports -P (--perl-regexp) option, then
grep -lUP '\r$'
could be used.
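Combined with -r and xargs, this can drive the conversion directly. A sketch, assuming GNU grep and xargs and file names without newlines:
grep -rlUP '\r$' . | xargs -d '\n' fromdos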

# list files containing dos line endings (CRLF)
cr="$(printf "\r")" # alternative to ctrl-V ctrl-M
grep -Ilsr "${cr}$" .
grep -Ilsr $'\r$' . # yet another & even shorter alternative

dos2unix has a file information option which can be used to show the files that would be converted:
dos2unix -ic /path/to/file
To do that recursively you can use bash’s globstar option, which for the current shell is enabled with shopt -s globstar:
dos2unix -ic ** # all files recursively
dos2unix -ic **/file # files called “file” recursively
Alternatively you can use find for that:
find -type f -exec dos2unix -ic {} + # all files recursively (ignoring directories)
find -name file -exec dos2unix -ic {} + # files called “file” recursively

You can use the file command on Unix. It reports the character encoding of the file along with the line terminators.
$ file myfile
myfile: ISO-8859 text, with CRLF line terminators
$ file myfile | grep -ow CRLF
CRLF

The query was about searching... I have a similar issue: somebody submitted mixed line endings into version control, so now we have a bunch of files with 0x0d 0x0d 0x0a line endings. Note that
grep -P '\x0d\x0a'
finds all lines, whereas
grep -P '\x0d\x0d\x0a'
and
grep -P '\x0d\x0d'
finds no lines, so there may be something "else" going on inside grep
when it comes to line-ending patterns... unfortunately for me!
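When grep's line handling gets in the way like this, it can help to bypass it and inspect the raw bytes instead; a sketch with od (sample.txt is a hypothetical file name):
od -c sample.txt | less
Here \r and \n are printed visibly, so a \r \r \n sequence is easy to spot.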

If, like me, your minimalist unix doesn't include niceties like the file command, and backslashes in your grep expressions just don't cooperate, try this:
$ for file in `find . -type f` ; do
> dump "$file" | cut -c9-50 | egrep -m1 -q ' 0d|0d '
> if [ $? -eq 0 ] ; then echo "$file" ; fi
> done
Modifications you may want to make to the above include:
tweak the find command to locate only the files you want to scan
change the dump command to od or whatever file dump utility you have
confirm that the cut command includes both a leading and trailing space as well as just the hexadecimal character output from the dump utility
limit the dump output to the first 1000 characters or so for efficiency
For example, something like this may work for you using od instead of dump:
od -t x2 -N 1000 "$file" | cut -c8- | egrep -m1 -q ' 0d|0d |0d$'

Related

Wildcard in sed command to replace string not working

I'm trying to use the sed command in terminal to replace a specific line in all my text files with a certain extension by a specific string:
sed -i.bak '35s/^.*$/5\) 1\-4/' fitting_file*.feedme
So I am trying to replace line 35 in each of these files with the string "5) 1-4". When I run an ls fitting_file*.feedme | wc -l command in this directory, I get 221 files. However, when I run the above sed command, it only edits the FIRST file in the order of ls fitting_file*.feedme. I know this because grep '5) 1-4' fitting_file*.feedme continually only returns the first file on the list after I run the replacement command. I also tried replacing fitting_file*.feedme with a space-separated list of a couple of these files in my sed command as a test, but it still only operated on the one I chose to list first. Why is this happening?
sed operates on a single stream: it essentially concatenates all the files together and treats that as a single stream. So it replaces the 35th line of the big concatenated stream.
To see this, make a 20 line file called A and a 20 line file called B. Apply your sed command as
sed -i.bak '35s/^.*$/5\) 1\-4/' A B
and you will see the 15th line of B replaced.
I think this should answer your direct question. As for how to get done what you'd like, I assume you've already figured out that wrapping your sed command in a for loop is one way to do it. :)
Try
Create a file containing your sed instruction like this
#!/bin/bash
sed -i.bak '35s/^.*$/5\) 1\-4/' "$1"
exit 0
and call it prog.sh. Next make it executable :
chmod u+x prog.sh
now you can solve your problem using
find . -name fitting_file\*.feedme -exec ./prog.sh {} \;
You could do all this on one line but frankly the number of escapes required is a bit much. Good luck.
To do what you're trying to do without using a shell loop is:
awk -i inplace -v inplace::suffix=.bak 'FNR==35{$0="5) 1-4"}1' fitting_file*.feedme
Note that unlike sed, which just counts lines across all input files, awk has NR to track the number of records (lines, by default) across all files, and FNR for the same count within just the current file.
The above uses GNU awk for inplace editing just like GNU sed has -i for that. The default awk on MacOS is BSD awk, not GNU awk, but you should install GNU awk as it doesn't have all the bugs/quirks that BSD awk does and it has a ton of extremely useful extensions.
If you just want to use MacOS's awk then it'd be something like:
find . -name 'fitting_file*.feedme' -exec sh -c "\
awk 'FNR==35{\$0=\"5) 1-4\"}1' \"\$1\" > \"\$1.bak\" &&
mv -- \"\$1.bak\" \"\$1\"
" sh {} \;
which is obviously getting kinda complicated - I'd probably put the awk+mv script in a file to execute from sh -c or just resort to a shell loop myself if faced with that alternative (or a similar quoting nightmare with xargs)!
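For reference, that shell-loop fallback might look like this (a sketch reusing the asker's original sed command):
for f in fitting_file*.feedme; do
    sed -i.bak '35s/^.*$/5\) 1\-4/' "$f"
done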

parse grep output and run vim with result

I'm current using command line to grep a pattern in a source tree. A line of grep output is in the form:
path/to/a/file.java:123: some text here
If I want to open the file at the location specified in the grep output, I would have to manually enter the vim command as:
$ vim +123 path/to/a/file.java
Is there an easier method that would allow me to use the raw grep output and have the relevant components parsed, and run vim for the file at the line#?
I am interested in a command line solution. I am aware that I can do greps inside vim.
Thanks
The file-line plugin is exactly what you want. With that installed, you can just run
vim path/to/a/file.java:123
You could simply run grep from Vim itself and benefit from the quickfix list/window:
:grep -Rn foo **/*.h
:cw
(scroll around)
<CR>
Or you could pass your grep output to Vim for the same benefits:
$ vim -q <(grep -Rn foo **/*.h)
:cw
(scroll around)
<CR>
Or, if you are already in Vim, you could insert the output of your grep in a buffer and use gF to jump to the right line of the right file:
:r !grep -Rn foo **/*.h
(scroll around)
gF
Or, from your shell:
$ vim <(grep -Rn foo **/*.h)
(scroll around)
gF
Or, if you just ran your grep, you can reuse it like so:
$ vim <(!!)
(scroll around)
gF
Or, if you know its number in history:
$ vim <(!884)
(scroll around)
gF
> vim $(cat the.file | grep xxx)
The shell evaluates the $(): it finds xxx in the.file and passes the matching lines to vim as arguments.
This is also possible with backticks ``:
> vim `cat the.file | grep xxx`
Try this:
grep -nr --null pattern | { IFS= read -rd "" f; IFS=: read -d "" n match; vim +$n "$f" </dev/tty; }
grep does a recursive search for pattern. For the first file that it finds, vim is started with the +linenum parameter to put you on the line of interest.
This approach uses NUL-separated i/o. It should be safe for all file names, even ones that contain white space or other difficult characters.
This was tested on GNU tools (Linux). It may work on BSD/OSX as well.
Multiline version
For those who prefer their commands spread over multiple lines:
grep -nr --null pattern | {
    IFS= read -rd "" f
    IFS=: read -d "" n match
    vim +$n "$f" </dev/tty
}
Convenience function
Because the above command is long, one may want to put it in a shell function:
vigrep() { grep -nr --null "$1" | { IFS= read -rd "" f; IFS=: read -d "" n match; vim +$n "$f" </dev/tty; }; }
Once this has been defined, it can be used to search for a file containing any pattern. For example:
vigrep 'some text here'
To make the definition of vigrep permanent, put it in your ~/.bashrc file.
How it works
grep -nr --null pattern
-r tells grep to search recursively.
-n tells grep to return line number of the matches.
--null tells grep to use NUL-separated output.
pattern is the regex to search for.
IFS= read -rd "" f
This reads the first NUL-separated section of input (which will be a file name) and assigns it to the shell variable f.
IFS=: read -d "" n match
This reads the next NUL-separated section of input using : as the word separator. The first word (which is the line number) is assigned to shell variable n. The rest of this line will be ignored.
vim +$n "$f" </dev/tty
This starts vim on line number $n of file $f using the terminal, /dev/tty, for input.
Generally, when running vim, one really wants to have vim accept input from the keyboard. That is why, for this case, we hard-coded input from /dev/tty.
Using cut-and-paste to launch vim
Start the following and cut-and-paste a line of grep -n output to it:
IFS=: read f n rest; vim +$n "$f"
The read command will wait for a line on standard input. The type of input it expects looks like:
path/to/a/file.java:123: some text here
Because IFS=:, it divides up the line on colons and assigns the file name to shell variable f and the line number to shell variable n. When this is done, it launches the vim command.
This command could also, if desired, be saved as a shell function:
grvim() { IFS=: read f n rest; vim "+$n" "$f"; }
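For example, after starting grvim, paste a line of grep -n output:
$ grvim
path/to/a/file.java:123: some text here
and vim opens path/to/a/file.java at line 123.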
I have this function in my .bashrc:
grep_edit(){
    grep "$@" | sed 's/:/ +/;s/:/ /';
}
So, the output is in the form:
path/to/a/file.java +123 some text here
Then I can directly use
$ vi path/to/a/file.java +123
Note: I have also heard of the file-line plugin, but I was not sure how it would work with the netrw plugin.
e.g. vi can open remote files with this syntax:
vi scp://root@remote-system//var/log/daemon.log
But if that is not a concern, then the file-line plugin is the better choice.

How to find a windows end of line (EOL) character

I have several hundred GB of data that I need to paste together using the unix paste utility in Cygwin, but it won't work properly if there are windows EOL characters in the files. The data may or may not have windows EOL characters, and I don't want to spend the time running dos2unix if I don't have to.
So my question is, in Cygwin, how can I figure out whether these files have windows EOL CRLF characters?
I've tried creating some test data and running
sed -r 's/\r\n//' testdata.txt
But that appears to match regardless of whether dos2unix has been run or not.
Thanks.
The file(1) utility knows the difference:
$ file * | grep ASCII
2: ASCII text
3: ASCII English text
a: ASCII C program text
blah: ASCII Java program text
foo.js: ASCII C++ program text
openssh_5.5p1-4ubuntu5.dsc: ASCII text, with very long lines
windows: ASCII text, with CRLF line terminators
file(1) has been optimized to try to read as little of a file as possible, so you may be lucky and drastically reduce the amount of disk IO you need to perform when finding and fixing the CRLF terminators.
Note that some cases of CRLF should stay in place: captures of SMTP will use CRLF. But that's up to you. :)
#!/bin/bash
for i in $(find . -type f); do
    if file "$i" | grep CRLF ; then
        echo "$i"
        file "$i"
        #dos2unix "$i"
    fi
done
Uncomment the #dos2unix "$i" line when you are ready to convert them.
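One caveat: the unquoted $(find . -type f) splits file names on whitespace. A space-safe sketch of the same idea, using the sh -c wrapper pattern that appears in the xargs answers later on this page:
find . -type f -exec sh -c 'file "$1" | grep -q CRLF && dos2unix "$1"' sh {} \;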
You can find out using file:
file /mnt/c/BOOT.INI
/mnt/c/BOOT.INI: ASCII text, with CRLF line terminators
CRLF is the significant value here.
If you expect sed's exit code to tell you whether it matched, it won't: it will perform a substitution or not depending on the match, but the exit code will be true (0) unless there's an error.
You can get a usable exit code from grep, however.
#!/bin/bash
for f in *
do
    if head -n 10 "$f" | grep -qs $'\r'
    then
        dos2unix "$f"
    fi
done
grep recursive, with file pattern filter
grep -Pnr --include='*file.sh' '\r$' .
output file name, line number and line itself
./test/file.sh:2:here is windows line break
You can use dos2unix's -i option to get information about DOS Unix Mac line breaks (in that order), BOMs, and text/binary without converting the file.
$ dos2unix -i *.txt
6 0 0 no_bom text dos.txt
0 6 0 no_bom text unix.txt
0 0 6 no_bom text mac.txt
6 6 6 no_bom text mixed.txt
50 0 0 UTF-16LE text utf16le.txt
0 50 0 no_bom text utf8unix.txt
50 0 0 UTF-8 text utf8dos.txt
With the "c" flag dos2unix will report files that would be converted, iow files have have DOS line breaks. To report all txt files with DOS line breaks you could do this:
$ dos2unix -ic *.txt
dos.txt
mixed.txt
utf16le.txt
utf8dos.txt
To convert only these files you simply do:
dos2unix -ic *.txt | xargs dos2unix
If you need to go recursive over directories you do:
find -name '*.txt' | xargs dos2unix -ic | xargs dos2unix
See also the man page of dos2unix.
As stated above, the 'file' solution works. Maybe the following code snippet will help.
#!/bin/ksh
EOL_UNKNOWN="Unknown"         # Unknown EOL
EOL_MAC="Mac"                 # File EOL Classic Apple Mac (CR)
EOL_UNIX="Unix"               # File EOL UNIX (LF)
EOL_WINDOWS="Windows"         # File EOL Windows (CRLF)
SVN_PROPFILE="name-of-file"   # Filename to check.
...
# Finds the EOL used in the requested File
# $1 Name of the file (requested filename)
# $r EOL_FILE set to enumerated EOL-values.
getEolFile() {
    EOL_FILE=$EOL_UNKNOWN

    # Check for Windows EOL
    EOL_CHECK=`file "$1" | grep "ASCII text, with CRLF line terminators"`
    if [[ -n $EOL_CHECK ]] ; then
        EOL_FILE=$EOL_WINDOWS
        return
    fi

    # Check for Classic Mac EOL
    EOL_CHECK=`file "$1" | grep "ASCII text, with CR line terminators"`
    if [[ -n $EOL_CHECK ]] ; then
        EOL_FILE=$EOL_MAC
        return
    fi

    # Check for Unix EOL
    EOL_CHECK=`file "$1" | grep "ASCII text"`
    if [[ -n $EOL_CHECK ]] ; then
        EOL_FILE=$EOL_UNIX
        return
    fi

    return
} # getFileEOL
...
# Using this snippet
getEolFile "$SVN_PROPFILE"
echo "Found EOL: $EOL_FILE"
exit -1
Thanks for the tip to use the file(1) command; however, it does need a bit more refinement. I had the situation where not only plain text files but also some ".sh" scripts had the wrong EOL, and "file" reports them as follows regardless of EOL:
xxx/y/z.sh: application/x-shellscript
So the "file -e soft" option was needed (at least for Linux):
bash$ find xxx -exec file -e soft {} \; | grep CRLF
This finds all the files with DOS EOLs in directory xxx and its subdirectories.

How to find dos format files in a linux file system

I would like to find out which of my files in a directory are dos text files (as opposed to unix text files).
What I've tried:
find . -name "*.php" | xargs grep ^M -l
It's not giving me reliable results... so I'm looking for a better alternative.
Any suggestions, ideas?
Thanks
Clarification
In addition to what I've said above, the problem is that I have a bunch of DOS files with no ^M characters in them (hence my note about reliability).
The way i currently determine whether a file is dos or not is through Vim, where at the bottom it says:
"filename.php" [dos] [noeol]
How about:
find . -name "*.php" | xargs file | grep "CRLF"
I don't think it is reliable to try and use ^M to try and find the files.
Not sure what you mean exactly by "not reliable" but you may want to try:
find . -name '*.php' -print0 | xargs -0 grep -l '^M$'
This uses the more atrocious-filenames-with-spaces-in-them-friendly options and only finds carriage returns immediately before the end of line.
Keep in mind that the ^M is a single Ctrl-M (carriage return) character, not two characters.
And also that it'll list files where even one line is in DOS mode, which is probably what you want anyway since those would have been UNIX files mangled by a non-UNIX editor.
Based on your update that vim is reporting your files as DOS format:
If vim is reporting it as DOS format, then every line ends with CRLF. That's the way vim works. If even one line doesn't have CR, then it's considered UNIX format and the ^M characters are visible in the buffer. If it's all DOS format, the ^M characters are not displayed:
Vim will look for both dos and unix line endings, but Vim has a built-in preference for the unix format.
- If all lines in the file end with CRLF, the dos file format will be applied, meaning that each CRLF is removed when reading the lines into a buffer, and the buffer 'ff' option will be dos.
- If one or more lines end with LF only, the unix file format will be applied, meaning that each LF is removed (but each CR will be present in the buffer, and will display as ^M), and the buffer 'ff' option will be unix.
If you really want to know what's in the file, don't rely on a too-smart tool like vim :-)
Use:
od -xcb input_file_name | less
and check the line endings yourself.
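For example, a DOS line ending is easy to recognize in od's -c output (printf stands in for a real file here):
$ printf 'hi\r\n' | od -c
0000000   h   i  \r  \n
0000004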
I had good luck with
find . -name "*.php" -exec grep -Pl "\r" {} \;
This is much like your original solution; therefore, it may be easier for you to remember:
find . -name "*.php" | xargs grep "\r" -l
Thought process:
In Vim, to remove the ^M you type:
:%s/^M//g
where the ^M is entered by pressing Ctrl-V and then Enter (or Ctrl-M). But I could never remember the keys to type to produce that sequence, so I've always removed them using:
:%s/\r//g
So my deduction is that the \r and ^M are equivalent, with the former being easier to remember to type.
If your dos2unix command has the -i option, you can use that feature to find files in a directory that have DOS line breaks.
$ man dos2unix
.
.
.
-i[FLAGS], --info[=FLAGS] FILE ...
Display file information. No conversion is done.
The following information is printed, in this order:
number of DOS line breaks,
number of Unix line breaks,
number of Mac line breaks,
byte order mark,
text or binary, file name.
.
.
.
Optionally extra flags can be set to change the (-i) output.
.
.
.
c Print only the files that would be converted.
The following one-liner script reads:
find all files in this directory tree,
run dos2unix on all files to determine the files to be changed,
run dos2unix on files to be changed
$ find . -type f | xargs -d '\n' dos2unix -ic | xargs -d '\n' dos2unix
I've been using cat -e to see what line endings files have.
Using ^M as a single Ctrl-M character didn't really work out for me (it behaved as if I had just pressed Return, without actually inserting the non-printable ^M line ending; tested with echo '^M' | cat -e, where the ^M was typed as Ctrl-V Ctrl-M), so what I ended up doing will probably seem like too much, but it did the job nevertheless:
grep '$' *.php | cat -e | grep '\^M\$' | sed 's/:.*//' | uniq
, where
the first grep just prepends filenames to each line of each file (can be replaced with awk '{print FILENAME, $0}', but grep worked faster on my set of files);
cat -e explicitly prints non-printable line endings;
the second grep finds lines ending with ^M$, and ^M are two characters;
the sed part keeps only the file names (can be replaced with cut -d ':' -f 1);
uniq just keeps each file name once.
GNU find
find . -type f -iname "*.php" -exec file "{}" + | grep CRLF
I don't know what you want to do after you find those DOS php files, but if you want to convert them to unix format, then
find . -type f -iname "*.php" -exec dos2unix "{}" +;
will suffice. There's no need to specifically check whether they are DOS files or not.
If you prefer vim to tell you which files are in this format you can use the following script:
"use this script to check which files are in dos format according to vim
"use: in the folder that you want to check
"create a file, say res.txt
"> vim -u NONE --noplugins res.txt
"> in vim: source this_script.vim
python << EOF
import os
import vim
cur_buf = vim.current.buffer
IGNORE_START = ''.split()
IGNORE_END = '.pyc .swp .png ~'.split()
IGNORE_DIRS = '.hg .git dd_ .bzr'.split()
for dirpath, dirnames, fnames in os.walk(os.curdir):
    for dirn in dirnames:
        for diri in IGNORE_DIRS:
            if dirn.endswith(diri):
                dirnames.remove(dirn)
                break
    for fname in fnames:
        skip = False
        for fstart in IGNORE_START:
            if fname.startswith(fstart):
                skip = True
        for fend in IGNORE_END:
            if fname.endswith(fend):
                skip = True
        if skip is True:
            continue
        fname = os.path.join(dirpath, fname)
        vim.command('view {}'.format(fname))
        curr_ff = vim.eval('&ff')
        if vim.current.buffer != cur_buf:
            vim.command('bw!')
        if curr_ff == 'dos':
            cur_buf.append('{} {}'.format(curr_ff, fname))
EOF
your vim needs to be compiled with Python support (Python is used to loop over the files in the folder; there is probably an easier way of doing this, but I don't really know it).

How can I use xargs to copy files that have spaces and quotes in their names?

I'm trying to copy a bunch of files below a directory and a number of the files have spaces and single-quotes in their names. When I try to string together find and grep with xargs, I get the following error:
find .|grep "FooBar"|xargs -I{} cp "{}" ~/foo/bar
xargs: unterminated quote
Any suggestions for a more robust usage of xargs?
This is on Mac OS X 10.5.3 (Leopard) with BSD xargs.
You can combine all of that into a single find command:
find . -iname "*foobar*" -exec cp -- "{}" ~/foo/bar \;
This will handle filenames and directories with spaces in them. You can use -name to get case-sensitive results.
Note: The -- flag passed to cp prevents it from processing files starting with - as options.
find . -print0 | grep -z 'FooBar' | xargs -0 ...
I don't know whether grep supports -z (--null-data), nor whether xargs supports -0, on Leopard, but on GNU it's all good.
The easiest way to do what the original poster wants is to change the delimiter from any whitespace to just the end-of-line character like this:
find whatever ... | xargs -d "\n" cp -t /var/tmp
This is more efficient as it does not run "cp" multiple times:
find -name '*FooBar*' -print0 | xargs -0 cp -t ~/foo/bar
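Note that cp -t is a GNU extension; on macOS (where the asker is), BSD xargs offers -J to splice the file names in before the destination instead. A sketch:
find . -name '*FooBar*' -print0 | xargs -0 -J {} cp {} ~/foo/bar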
I ran into the same problem. Here's how I solved it:
find . -name '*FoooBar*' | sed 's/.*/"&"/' | xargs cp ~/foo/bar
I used sed to substitute each line of input with the same line, but surrounded by double quotes. From the sed man page, "...An ampersand (``&'') appearing in the replacement is replaced by the string matching the RE..." -- in this case, .*, the entire line.
This solves the xargs: unterminated quote error.
This method works on Mac OS X v10.7.5 (Lion):
find . | grep FooBar | xargs -I{} cp {} ~/foo/bar
I also tested the exact syntax you posted. That also worked fine on 10.7.5.
Just don't use xargs. It is a neat program, but it doesn't go well with find when faced with non-trivial cases.
Here is a portable (POSIX) solution, i.e. one that doesn't require find, xargs or cp GNU specific extensions:
find . -name "*FooBar*" -exec sh -c 'cp -- "$#" ~/foo/bar' sh {} +
Note the ending + instead of the more usual ;.
This solution:
correctly handles files and directories with embedded spaces, newlines or whatever exotic characters.
works on any Unix and Linux system, even those not providing the GNU toolkit.
doesn't use xargs which is a nice and useful program, but requires too much tweaking and non standard features to properly handle find output.
is also more efficient (read: faster) than the accepted answer and most, if not all, of the other answers.
Note also that, despite what is stated in some other replies or comments, quoting {} is useless (unless you are using the exotic fish shell).
Look into using the --null commandline option for xargs with the -print0 option in find.
For those who rely on commands other than find, e.g. ls:
find . | grep "FooBar" | tr \\n \\0 | xargs -0 -I{} cp "{}" ~/foo/bar
find | perl -lne 'print quotemeta' | xargs ls -d
I believe that this will work reliably for any character except line-feed (and I suspect that if you've got line-feeds in your filenames, then you've got worse problems than this). It doesn't require GNU findutils, just Perl, so it should work pretty-much anywhere.
I have found that the following syntax works well for me.
find /usr/pcapps/ -mount -type f -size +1000000c | perl -lpe ' s{ }{\\ }g ' | xargs ls -l | sort +4nr | head -200
In this example, I am looking for the largest 200 files over 1,000,000 bytes in the filesystem mounted at "/usr/pcapps".
The Perl line-liner between "find" and "xargs" escapes/quotes each blank so "xargs" passes any filename with embedded blanks to "ls" as a single argument.
Frame challenge — you're asking how to use xargs. The answer is: you don't use xargs, because you don't need it.
The comment by user80168 describes a way to do this directly with cp, without calling cp for every file:
find . -name '*FooBar*' -exec cp -t /tmp -- {} +
This works because:
the cp -t flag allows to give the target directory near the beginning of cp, rather than near the end. From man cp:
-t, --target-directory=DIRECTORY
copy all SOURCE arguments into DIRECTORY
The -- flag tells cp to interpret everything after as a filename, not a flag, so files starting with - or -- do not confuse cp; you still need this because the -/-- characters are interpreted by cp, whereas any other special characters are interpreted by the shell.
The find -exec command {} + variant essentially does the same as xargs. From man find:
-exec command {} +
This variant of the -exec action runs the specified command on
the selected files, but the command line is built by appending
each selected file name at the end; the total number of invoca‐
tions of the command will be much less than the number of
matched files. The command line is built in much the same way
that xargs builds its command lines. Only one instance of `{}'
is allowed within the command, and (when find is being invoked
from a shell) it should be quoted (for example, '{}') to protect
it from interpretation by shells. The command is executed in
the starting directory. If any invocation returns a non-zero
value as exit status, then find returns a non-zero exit status.
If find encounters an error, this can sometimes cause an immedi‐
ate exit, so some pending commands may not be run at all. This
variant of -exec always returns true.
Using this in find directly avoids the need for a pipe or a shell invocation, so you don't need to worry about any nasty characters in filenames.
With Bash (not POSIX) you can use process substitution to get the current line inside a variable. This enables you to use quotes to escape special characters:
while IFS= read -r line ; do cp "$line" ~/bar ; done < <(find . | grep foo)
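A NUL-safe variant of the same loop (Bash with GNU find; a sketch along the lines of the read -d '' trick shown in an earlier answer):
while IFS= read -rd '' f ; do cp "$f" ~/bar ; done < <(find . -name '*FooBar*' -print0)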
Be aware that most of the options discussed in other answers are not standard on platforms that do not use the GNU utilities (Solaris, AIX, HP-UX, for instance). See the POSIX specification for 'standard' xargs behaviour.
I also find the behaviour of xargs whereby it runs the command at least once, even with no input, to be a nuisance.
I wrote my own private version of xargs (xargl) to deal with the problems of spaces in names (only newlines separate), though the 'find ... -print0' and 'xargs -0' combination is pretty neat, given that file names cannot contain ASCII NUL '\0' characters. My xargl isn't as complete as it would need to be to be worth publishing, especially since GNU has facilities that are at least as good.
For me, I was trying to do something a little different. I wanted to copy my .txt files into my tmp folder. The .txt filenames contain spaces and apostrophe characters. This worked on my Mac.
$ find . -type f -name '*.txt' | sed 's/'"'"'/\'"'"'/g' | sed 's/.*/"&"/' | xargs -I{} cp -v {} ./tmp/
If the find and xargs versions on your system don't support the -print0 and -0 switches (for example AIX find and xargs), you can use this terrible-looking code:
find . -name "*foo*" | sed -e "s/'/\\\'/g" -e 's/"/\\"/g' -e 's/ /\\ /g' | xargs cp /your/dest
Here sed will take care of escaping the spaces and quotes for xargs.
Tested on AIX 5.3
I created a small portable wrapper script called "xargsL" around "xargs" which addresses most of the problems.
Contrary to xargs, xargsL accepts one pathname per line. The pathnames may contain any character except (obviously) newline or NUL bytes.
No quoting is allowed or supported in the file list - your file names may contain all sorts of whitespace, backslashes, backticks, shell wildcard characters and the like - xargsL will process them as literal characters, no harm done.
As an added bonus feature, xargsL will not run the command once if there is no input!
Note the difference:
$ true | xargs echo no data
no data
$ true | xargsL echo no data # No output
Any arguments given to xargsL will be passed through to xargs.
Here is the "xargsL" POSIX shell script:
#! /bin/sh
# Line-based version of "xargs" (one pathname per line which may contain any
# amount of whitespace except for newlines) with the added bonus feature that
# it will not execute the command if the input file is empty.
#
# Version 2018.76.3
#
# Copyright (c) 2018 Guenther Brunthaler. All rights reserved.
#
# This script is free software.
# Distribution is permitted under the terms of the GPLv3.
set -e
trap 'test $? = 0 || echo "$0 failed!" >& 2' 0
if IFS= read -r first
then
    {
        printf '%s\n' "$first"
        cat
    } | sed 's/./\\&/g' | xargs ${1+"$@"}
fi
Put the script into some directory in your $PATH and don't forget to
$ chmod +x xargsL
the script there to make it executable.
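Usage then mirrors plain xargs; for example (a sketch; cp -t assumes GNU cp, since xargsL, like xargs, appends the names at the end):
find . | grep "FooBar" | xargsL cp -t ~/foo/bar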
bill_starr's Perl version won't work well for embedded newlines (it only copes with spaces). For those on e.g. Solaris where you don't have the GNU tools, a more complete version might be (using sed)...
find -type f | sed 's/./\\&/g' | xargs grep string_to_find
adjust the find and grep arguments or other commands as you require, but the sed will fix your embedded newlines/spaces/tabs.
I used Bill Star's answer slightly modified on Solaris:
find . -mtime +2 | perl -pe 's{^}{\"};s{$}{\"}' > ~/output.file
This will put quotes around each line. I didn't use the '-l' option although it probably would help.
The file list I was going through might have '-', but not newlines. I haven't used the output file with any other commands, as I want to review what was found before I just start massively deleting them via xargs.
I played with this a little, started contemplating modifying xargs, and realised that for the kind of use case we're talking about here, a simple reimplementation in Python is a better idea.
For one thing, having ~80 lines of code for the whole thing means it is easy to figure out what is going on, and if different behaviour is required, you can just hack it into a new script in less time than it takes to get a reply on somewhere like Stack Overflow.
See https://github.com/johnallsup/jda-misc-scripts/blob/master/yargs and https://github.com/johnallsup/jda-misc-scripts/blob/master/zargs.py.
With yargs as written (and Python 3 installed) you can type:
find .|grep "FooBar"|yargs -l 203 cp --after ~/foo/bar
to do the copying 203 files at a time. (Here 203 is just a placeholder, of course, and using a strange number like 203 makes it clear that this number has no other significance.)
If you really want something faster and without the need for Python, take zargs and yargs as prototypes and rewrite in C++ or C.
You might need to grep for the FooBar directory, like:
find . -name "file.ext"| grep "FooBar" | xargs -i cp -p "{}" .
If you are using Bash, you can convert stdout to an array of lines with mapfile:
find . | grep "FooBar" | (mapfile -t; cp "${MAPFILE[@]}" ~/foobar)
The benefits are:
It's built-in, so it's faster.
It executes the command with all file names at one time, so it's faster.
You can append other arguments to the file names. For cp, you can also:
find . -name '*FooBar*' -exec cp -t ~/foobar -- {} +
however, some commands don't have such a feature.
The disadvantages:
It may not scale well if there are too many file names. (The limit? I don't know, but I had tested with a 10 MB list file which included 10000+ file names with no problem, under Debian.)
Well... who knows if Bash is available on OS X?
