I'm using xargs, but the argument list is too long - linux

I'm using Linux. I have a directory tree with over 100,000 files that originated on a MS Windows system. Some of the files have spaces in their names. I want to convert those files to unix. I ran this command
find . -type f | xargs -0 dos2unix
And received this error message
xargs: argument line too long
How can I fix this?

If you want to use xargs with -0 to prevent issues with spaces and special characters in file names, you must also use -print0 with find so that it delimits its output with null bytes:
find . -type f -print0 | xargs -0 dos2unix

You don't need xargs here, you can do
find . -type f -exec dos2unix '{}' +
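Why the -print0/-0 pairing matters can be seen with a quick sketch (temporary files, GNU findutils assumed): a name containing a space splits into two arguments under default xargs parsing, but survives NUL delimiting intact.

```shell
# Sketch: compare default (whitespace-split) xargs parsing with
# NUL-delimited parsing when a filename contains a space.
tmp=$(mktemp -d)
touch "$tmp/has space.txt"

# Default parsing: xargs sees two "arguments" for one file.
find "$tmp" -type f | xargs -n1 echo | wc -l              # 2 lines

# NUL-delimited: the filename stays one argument.
find "$tmp" -type f -print0 | xargs -0 -n1 echo | wc -l   # 1 line

rm -rf "$tmp"
```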

cp: invalid option -- 'D'

My goal is to find all .pdf files from multiple subfolder structures and then move them to another folder.
For this I have assembled the following.
find /mnt/user/Data/01_Persönliche_Dokumente/01_Firmen -iname \*.pdf -type f | xargs cp -t /mnt/user/Data/01_Persönliche_Dokumente/Paperless_input/
But I get the following error:
root#Tower:/mnt/user/Data/01_Persönliche_Dokumente/01_Firmen# find "/mnt/user/Data/01_Persönliche_Dokumente/01_Firmen" -iname \*.pdf -type f | xargs cp -t "/mnt/user/Data/01_Persönliche_Dokumente/Paperless_input"
cp: invalid option -- 'D'
Try 'cp --help' for more information.
I tried different options and got some help in the Unraid Discord.
A friend of mine gave me the hint.
The correct command looks like this:
find "/mnt/user/Data/01_Persönliche_Dokumente/01_Firmen" -iname \*.pdf -type f -print0 | xargs -0 cp -t "/mnt/user/Data/01_Persönliche_Dokumente/Paperless_input"
For the find command, I added -print0, which means:
print the full file name on the standard output, followed by a null
character (instead of the newline character that -print uses). This
allows file names that contain newlines or other types of white space
to be correctly interpreted by programs that process the find output.
For the xargs command I added -0, which means:
-0 : input items are terminated by a null character instead of by whitespace.
This keeps whitespace inside filenames from splitting them into separate arguments for the command being run.
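The same job can also be done with find alone, avoiding xargs entirely; a sketch using the question's paths, assuming GNU cp for the -t option:

```shell
# Let find hand the matched PDFs straight to cp; the + terminator
# batches many files per cp invocation, and -t names the target first.
find "/mnt/user/Data/01_Persönliche_Dokumente/01_Firmen" -iname '*.pdf' -type f \
  -exec cp -t "/mnt/user/Data/01_Persönliche_Dokumente/Paperless_input/" {} +
```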

Linux Shell Command: Find. How to Sort and Exec without using Pipes?

Linux command find with argument exec does a GREAT job executing commands on files/folders regardless whether they contain spaces and special characters. For example:
find . -type f -exec md5sum {} \;
Works great to run md5sum on each file in a directory tree, but executes in an arbitrary order. find does not sort the results, and requires piping to sort to get results in a more human-readable ordering. However, piping to sort eliminates the benefits of -exec.
This does not work:
find . -type f | sort | md5sum
Because some filenames contain spaces and special characters.
Also does not work:
find . -type f | sort | sed 's/ /\\ /g' | md5sum
It still does not treat the spaces as part of the filename.
I suppose I can always sort the final result later, but wonder if someone knows an easy way to avoid that extra step by sorting within find?
With BSD find
A -s argument is available to request lexicographic sort order.
find . -s -type f -exec md5sum -- '{}' +
With GNU find
Use NUL delimiters to allow filenames to be processed unambiguously. Assuming you have GNU tools:
find . -type f -print0 | sort -z | xargs -0 md5sum
Found a working solution
find . -type f -exec md5sum {} + | sort -k 1.33
Sorts the results by comparing the characters starting after the 32-character md5sum hash, producing a readable, sorted list.
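An alternative to counting characters is to sort on the filename field itself; a sketch, assuming GNU sort (filenames containing whitespace split into extra fields, but the resulting order is still usable):

```shell
# Checksum first, then sort on everything from the second field
# (the filename) onward.
find . -type f -exec md5sum -- {} + | sort -k 2
```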

search a string in a file with case insensitive file name

I want to grep for a string in all the files which have a particular patter in their name and is case-insensitive.
For eg if I have two files ABC.txt and aBc.txt, then I want something like
grep -i 'test' *ABC*
The above command should look in both the files.
You can use find and then grep on the results of that:
find . -iname "*ABC*" -exec grep -i "test" {} \;
Note that this will run grep once on each file found. If you want to run grep once on all the files (in which case you risk running into the command line length limit), you can use a plus at the end:
find . -iname "*ABC*" -exec grep -i "test" {} \+
You can also use xargs to process a really large number of results more efficiently:
find . -iname "*ABC*" -print0 | xargs -0 grep -i test
The -print0 makes find output 0-terminated results, and the -0 makes xargs able to deal with this format, which means you don't need to worry about any special characters in the filenames. However, it is not totally portable, since it's a GNU extension.
If you don't have a find that supports -print0 (for example SVR4), you can still use -exec as above or just
find . -iname "*ABC*" | xargs grep -i test
But you should be sure your filenames don't have newlines in them, otherwise xargs will treat each line of a filename as a new argument.
You can use find to match the file names and grep, which supports regular expressions, to search for the string. For this question, a command like the following works (\<test\> matches test only as a whole word):
find . -iname "*ABC*" -exec grep '\<test\>' {} \;
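If GNU grep is available, its recursive mode can do this without find at all. Note that --include globs are case-sensitive, so a bracket-class glob (an assumption here, to emulate -iname) covers the mixed-case names:

```shell
# Recursive, case-insensitive content search, restricted to names
# matching ABC in any letter case via a bracket-class glob.
grep -ri 'test' --include='*[Aa][Bb][Cc]*' .
```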

How to find total size of all files under the ownership of a user?

I'm trying to find out the total size of all files owned by a given user.
I've tried this:
find $myfolder -user $myuser -type f -exec du -ch {} +
But this gives me an error:
missing argument to exec
and I don't know how to fix it. Can somebody can help me with this?
You just need to terminate the -exec. If you want the totals for each directory,
-type d is probably what you need:
find $myfolder -user $myuser -type d -exec du -ch {} \;
Use:
find $myfolder -user gisi -type f -print0 | xargs -0 du -sh
where user gisi is my cat ;)
Note the option -s for summarize
Further note that I'm using find ... -print0, which separates the filenames with NUL bytes (one of the few characters not allowed in filenames), paired with xargs -0, which uses the NUL byte as its delimiter. This makes sure that even exotic filenames won't be a problem.
Some versions of find do not accept + as the terminator of -exec.
Use \; instead of +.
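If GNU du is available, its --files0-from option reads a NUL-delimited file list directly, which sidesteps both the -exec terminator issue and any command-line length limit, and yields a single grand total (using the question's $myfolder and $myuser variables):

```shell
# Stream NUL-delimited filenames from find into du; "-" means stdin.
# tail keeps only the final "total" line.
find "$myfolder" -user "$myuser" -type f -print0 |
  du -ch --files0-from=- | tail -n 1
```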

Linux: Redirecting output of a command to "find"

I have a list of file names as output of certain command.
I need to find each of these files in a given directory.
I tried following command:
ls -R /home/ABC/testDir/ | grep "\.java" | xargs find /home/ABC/someAnotherDir -iname
But it is giving me following error:
find: paths must precede expression: XYZ.java
What would be the right way to do it?
ls -R /home/ABC/testDir/ | grep -F .java |
while read f; do find . -iname "$(basename $f)"; done
You can also use ${f##*/} instead of basename. Or:
find /home/ABC/testDir -iname '*.java*' |
while read f; do find . -iname "${f##*/}"; done
Note that, undoubtedly, many people will object to parsing the output of ls or find without using a null byte as the filename separator, claiming that whitespace in filenames will cause problems. Those people usually ignore newlines in filenames, and their objections can be safely ignored. (As long as you don't allow whitespace in your filenames, that is!)
A better option is:
find /home/ABC/testDir -iname '*.java' -exec find . -iname {} \;
The reason xargs doesn't work is that you cannot pass two arguments to -iname within find.
find /home/ABC/testDir -name "*.java"
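With GNU find, the basename extraction can happen inside find itself via -printf '%f', which avoids parsing ls output and the two-argument -iname problem; a sketch, assuming GNU findutils:

```shell
# Emit only the basename of each match, NUL-terminated, then run an
# inner find once per name via xargs -I{}.
find /home/ABC/testDir -iname '*.java' -printf '%f\0' |
  xargs -0 -I{} find . -iname {}
```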