Extract arguments from stdout and pipe - linux

I was trying to execute a script n times, with a different file as the argument each time, using this:
ls /user/local/*.log | xargs script.pl
(script.pl accepts file name as argument)
But the script is executed only once. How do I resolve this? Am I not doing it correctly?

ls /user/local/*.log | xargs -rn1 script.pl
I guess your script only expects one parameter; you need to tell xargs about that.
Passing -r makes xargs skip running the command entirely if the input list is empty.
Note that something like the following is, in general, better:
find /user/local/ -maxdepth 1 -type f -name '*.log' -print0 |
xargs -0rn1 script.pl
It handles filenames containing spaces, quotes, and newlines safely (via -print0/-0) and will not pick up subdirectories (via -maxdepth 1 -type f).
To see what xargs actually executes, use the -t flag.
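For example, with echo standing in as a harmless substitute for script.pl (the two .log names are illustrative):
printf '%s\0' one.log two.log | xargs -0rtn1 echo
xargs then prints each command line (echo one.log, echo two.log) to stderr just before running it, confirming one invocation per file.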

I hope this helps:
for i in $(ls /usr/local/*.log)
do
script.pl "$i"
done

I would avoid using ls, as it is often an alias on many systems. E.g. if your ls produces colored output, then this will not work:
for i in $(ls); do ls $i; done
Rather, you can just let the shell expand the wildcard for you:
for i in /usr/local/*.log; do script.pl "$i"; done
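One bash caveat with the bare glob, as a minimal sketch: if no .log files exist, the unmatched pattern is passed through literally and script.pl receives the string /usr/local/*.log itself. Enabling nullglob makes the loop simply not run in that case:
shopt -s nullglob
for i in /usr/local/*.log; do script.pl "$i"; done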

Is it possible to pipe the results of FIND to a COPY command CP?

Is it possible to pipe the results of find to a COPY command cp?
Like this:
find . -iname "*.SomeExt" | cp Destination Directory
Searching around, I always find this kind of formula instead, such as in this post:
find . -name "*.pdf" -type f -exec cp {} ./pdfsfolder \;
This raises some questions:
Why can't you just use the | pipe? Isn't that what it's for?
Why does everyone recommend -exec?
How do I know when to use -exec over the pipe |?
There's a little-used option for cp: -t destination -- see the man page:
find . -iname "*.SomeExt" | xargs cp -t Directory
Good question!
why can't you just use | pipe? isn't that what it's for?
You can pipe, of course; xargs is made for these cases:
find . -iname "*.SomeExt" | xargs cp Destination_Directory/
Why does everyone recommend -exec?
The -exec is good because it provides more control of exactly what you are executing. Whenever you pipe, there may be problems with corner cases: file names containing spaces or newlines, etc.
how do I know when to use -exec over pipe |?
It is really up to you; there can be many cases. I would use -exec whenever the action to perform is simple. I am not a big fan of xargs; I tend to prefer an approach in which the find output is fed to a while loop, such as:
while IFS= read -r result
do
# do things with "$result"
done < <(find ...)
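Applied to this question, the null-delimited form of that loop would be (reusing the Destination_Directory name from above); the -print0/-d '' pairing keeps filenames with spaces or newlines intact:
while IFS= read -r -d '' result
do
    cp -- "$result" Destination_Directory/
done < <(find . -iname '*.SomeExt' -print0)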
You can use | like below:
find . -iname "*.SomeExt" | while read line
do
cp "$line" DestDir/
done
Answering your questions:
| can be used to solve this issue, but as seen above it involves more code and, moreover, creates two processes: one for find and another for cp.
Using -exec inside find instead keeps everything within a single find command.
Try this:
find . -iname "*.SomeExt" -print0 | xargs -0 cp -t Directory
# ........................^^^^^^^..........^^
In case there is whitespace in filenames.
I like the spirit of the response from @fedorqui 'SO stop harming', but it needed a tweak to work in my bash terminal.
In this version...
find . -iname "*.SomeExt" | xargs cp Destination_Directory/
The cp command incorrectly takes Destination_Directory/ as the first argument. I needed to add a replacement string in order to get xargs to insert the argument in the right position for cp. I used a percent symbol for the replacement string, but you can use anything that doesn't conflict with the input from the pipe. This version works for me.
find . -iname "*.SomeExt" | xargs -I % cp % Destination_Directory/
This SOLVED my problem.
find . -type f | grep '\.pdf' | while read line
do
cp "$line" REPLACE_WITH_TARGET_DIRECTORY
done
If there are spaces in the filenames, try:
find . -iname '*.ext' > list.txt
cat list.txt | awk 'BEGIN {a="'"'"'"}{print "cp "a$0a" Directory"}' > script.sh
sh script.sh
You can inspect list.txt and script.sh before running sh script.sh. Remember to delete list.txt and script.sh afterwards.
I had some files with parentheses and wanted a progress bar, so I replaced the cat line with:
cat list.txt | awk -v X='"' '{print "rsync -Pa "X$0X" /Volumes/Untitled/"}' > script.sh

$(find -X) equivalent on linux

I'm trying to use the bash find command to create an array of elements and search elements inside it with a for loop.
I would need to do something like this:
for file in $(find -? dirname) ; do
echo element contains $file
done
I know that you can do find -X on a Mac (which guards against filenames with spaces and \n, the way xargs needs), but is there any way to do so on Linux?
Thank you in advance for your reply
The OSX find manpage says of -X:
However, you may wish to consider the -print0 primary in conjunction
with ``xargs -0'' as an effective alternative.
So you could take that advice:
find dirname -print0 | xargs -0 grep foo # or whatever it is you wanted to do
Alternatively, find can execute a command for each found file itself:
find dirname -exec echo found {} \;
Note the escaped ; to terminate -- it's just something you have to suck up about find -exec.
Or for xargs-like chunking:
find dirname -exec grep foo {} +

Remove files not containing a specific string

I want to find the files not containing a specific string (in a directory and its sub-directories) and remove those files. How can I do this?
The following will work:
find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null rm
This will first use find to print the names of all the files in the current directory and any subdirectories. These names are printed with a null terminator rather than the usual newline separator (try piping the output to od -c to see the effect of the -print0 argument).
Then the --null parameter to xargs tells it to accept null-terminated inputs. xargs will then call grep on a list of filenames.
The -Z argument to grep works like the -print0 argument to find, so grep will print out its results null-terminated (which is why the final call to xargs needs a --null option too). The -L argument to grep causes grep to print the filenames of those files on its command line (that xargs has added) which don't match the regular expression:
my string
If you want simple matching without regular-expression magic, add the -F option. If you want more powerful regular expressions, give the -E argument. It's a good habit to use single quotes rather than double quotes, as this protects you against any shell magic being applied to the string (such as variable substitution).
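For instance, the same pipeline with -F treats the pattern as a fixed string, so a dot or bracket in it is matched literally:
find . -type f -print0 | xargs --null grep -Z -L -F 'my string' | xargs --null rm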
Finally you call xargs again to get rid of all the files that you've found with the previous calls.
The problem with calling grep directly from the find command with the -exec argument is that grep then gets invoked once per file rather than once for a whole batch of files as xargs does, and batching is much faster when you have lots of files. Also don't be tempted to do stuff like:
rm $(some command that produces lots of filenames)
It's always better to pass it to xargs as this knows the maximum command-line limits and will call rm multiple times each time with as many arguments as it can.
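On GNU xargs you can inspect those limits directly (-r avoids running anything on the empty input):
xargs --show-limits -r < /dev/null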
Note that this solution would have been simpler without the need to cope with files containing white space and new lines.
Alternatively
grep -r -L -Z 'my string' . | xargs --null rm
will work too (and is shorter). The -r argument to grep causes it to read all files in the directory and recursively descend into any subdirectories. Use the find ... approach if you want to apply some other tests to the files as well (such as age or permissions).
Note that any of the single-letter arguments, each introduced with a single dash, can be grouped together (for instance as -rLZ). But note also that find does not use the same conventions; it has multi-letter arguments introduced with a single dash. This is for historical reasons and has never been fixed because it would break too many scripts.
GNU grep and bash.
grep -rLZ "$str" . | while IFS= read -rd '' x; do rm "$x"; done
Use a find-based solution if portability is needed; this grep -r approach is slightly faster.
EDIT: This is how you SHOULD NOT do this! The reason is given here. Thanks to @ormaaj for pointing it out!
find . -type f | grep -v "exclude string" | xargs rm
Note: the grep pattern matches against the full file path relative to the current directory (see the output of find . -type f).
One possibility is
find . -type f '!' -exec grep -q "my string" {} \; -exec echo rm {} \;
You can remove the echo if the output of this preview looks correct.
The equivalent with -delete is
find . -type f '!' -exec grep -q "user_id" {} \; -delete
but then you don't get the nice preview option.
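A middle ground, sketched from the two commands above: keep -print as the preview action, then swap in -delete once the list looks right:
find . -type f '!' -exec grep -q "user_id" {} \; -print
find . -type f '!' -exec grep -q "user_id" {} \; -delete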
To remove files not containing a specific string:
Bash:
To use extended glob patterns, enable the extglob shell option as follows:
shopt -s extglob
And just remove all files whose names don't contain the string "fix":
rm !(*fix*)
To delete all files whose names contain neither "fix" nor "class":
rm !(*fix*|*class*)
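As with the -exec answers above, you can preview what the pattern matches before removing anything, e.g.:
printf '%s\n' !(*fix*|*class*)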
Zsh:
To use extended globs in zsh, enable the extended_glob option as follows:
setopt extended_glob
Remove all files whose names don't contain the string, in this example "fix":
rm -- ^*fix*
To delete all files whose names contain neither "fix" nor "class":
rm -- ^(*fix*|*class*)
It's possible to use this for extensions as well; you only need to change the pattern: *.zip, *.doc, etc.
Here are the sources:
https://www.tecmint.com/delete-all-files-in-directory-except-one-few-file-extensions/
https://codeday.me/es/qa/20190819/1296122.html
I can think of a few ways to approach this. Here's one: find and grep to generate a list of files with no match, and then xargs rm them.
find yourdir -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' rm
This assumes GNU tools (grep -L and xargs -d are non-portable) and of course no filenames with newlines in them. It has the advantage of not running grep and rm once per file, so it'll be reasonably fast. I recommend testing it with "echo" in place of "rm" just to make sure it picks the right files before you unleash the destruction.
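That dry run would look like this:
find yourdir -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' echo rm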
This worked for me; you can remove the -f test if you're okay with it trying to delete directories too.
myString="keepThis"
for x in $(find ./)
do if [[ -f "$x" && ! "$x" =~ $myString ]]
then rm "$x"
fi
done
Another solution (although not as fast). The top solution didn't work in my case because the string I needed to use in place of 'my string' has special characters.
find -type f ! -name "*my string*" -exec rm {} \; -print

"find" and "ls" with GNU parallel

I'm trying to use GNU parallel to post a lot of files to a web server. In my directory, I have some files:
file1.xml
file2.xml
and I have a shell script that looks like this:
#! /usr/bin/env bash
CMD="curl -X POST -d#$1 http://server/path"
eval $CMD
There's some other stuff in the script, but this was the simplest example. I tried to execute the following command:
ls | parallel -j2 script.sh {}
Which is what the GNU parallel pages show as the "normal" way to operate on files in a directory. This seems to pass the name of the file into my script, but curl complains that it can't load the data file passed in. However, if I do:
find . -name '*.xml' | parallel -j2 script.sh {}
it works fine. Is there a difference between how ls and find are passing arguments to my script? Or do I need to do something additional in that script?
GNU parallel is a variant of xargs. They both have very similar interfaces, and if you're looking for help on parallel, you may have more luck looking up information about xargs.
That being said, the way they both operate is fairly simple. With their default behavior, both programs read input from STDIN, then break the input up into tokens based on whitespace. Each of these tokens is then passed to a provided program as an argument. The default for xargs is to pass as many tokens as possible to the program, and then start a new process when the limit is hit. I'm not sure how the default for parallel works.
Here is an example:
> echo "foo bar \
baz" | xargs echo
foo bar baz
There are some problems with the default behavior, so it is common to see several variations.
The first issue is that because whitespace is used to tokenize, any files with white space in them will cause parallel and xargs to break. One solution is to tokenize around the NULL character instead. find even provides an option to make this easy to do:
> echo "Success!" > bad\ filename
> find . "bad\ filename" -print0 | xargs -0 cat
Success!
The -print0 option tells find to separate filenames with the NULL character instead of newlines.
The -0 option tells xargs to use the NULL character to tokenize each argument.
Note that parallel is a little better than xargs here, in that its default behavior is to tokenize around newlines only, so there is less need to change the default behavior.
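A quick way to see the difference, in an otherwise empty directory (the filename is illustrative):
touch 'file with spaces.xml'
ls | xargs -n1 echo      # three invocations: file / with / spaces.xml
ls | parallel echo       # one invocation: file with spaces.xml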
Another common issue is that you may want to control how the arguments are passed to xargs or parallel. If you need a specific placement of the argument passed to the program, you can use {} to specify where it goes (with xargs this requires the -I option):
> mkdir new_dir
> find -name '*.xml' | xargs -I{} mv {} new_dir
This will move all .xml files in the current directory and its subdirectories into the new_dir directory. It actually breaks down into the following:
> find -name '*.xml' | xargs -I{} echo mv {} new_dir
> mv foo.xml new_dir
> mv bar.xml new_dir
> mv baz.xml new_dir
So taking into consideration how xargs and parallel work, you should hopefully be able to see the issue with your command. find . -name '*.xml' will generate a list of xml files to be passed to the script.sh program.
> find . -name '*.xml' | parallel -j2 echo script.sh {}
> script.sh foo.xml
> script.sh bar.xml
> script.sh baz.xml
However, ls | parallel -j2 script.sh {} will generate a list of ALL files in the current directory to be passed to the script.sh program.
> ls | parallel -j2 echo script.sh {}
> script.sh some_directory
> script.sh some_file
> script.sh foo.xml
> ...
A more correct variant on the ls version would be as follows:
> ls *.xml | parallel -j2 script.sh {}
However, an important difference between this and the find version is that find will search through all subdirectories for files, while ls will only search the current directory. The equivalent find version of the above ls command would be as follows:
> find -maxdepth 1 -name '*.xml'
This will only search the current directory.
Since it works with find, you probably want to see what command GNU Parallel is running (using -v or --dryrun) and then try running the failing commands manually.
ls *.xml | parallel --dryrun -j2 script.sh
find -maxdepth 1 -name '*.xml' | parallel --dryrun -j2 script.sh
I have not used parallel, but there is a difference between ls and find . -name '*.xml'. ls will list all the files and directories, whereas find . -name '*.xml' will list only the files (and directories) whose names end with .xml.
As suggested by Paul Rubel, just print the value of $1 in your script to check this. Additionally you may want to consider filtering the input to files only in find with the -type f option.
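With that filter applied, the command would be:
find . -name '*.xml' -type f | parallel -j2 script.sh {}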
Hope this helps!
Neat.
I had never used parallel before. It appears, though, that there are two of them. One is GNU Parallel; the one that was installed on my system lists Tollef Fog Heen as the author in its man pages.
As Paul mentioned, you should use
set -x
Also, the paradigm that you mentioned above doesn't seem to work with my parallel; rather, I have to do the following:
$ cat ../script.sh
+ cat ../script.sh
#!/bin/bash
echo $@
$ parallel -ij2 ../script.sh {} -- $(find -name '*.xml')
++ find -name '*.xml'
+ parallel -ij2 ../script.sh '{}' -- ./b.xml ./c.xml ./a.xml ./d.xml ./e.xml
./c.xml
./b.xml
./d.xml
./a.xml
./e.xml
$ parallel -ij2 ../script.sh {} -- $(ls *.xml)
++ ls --color=auto a.xml b.xml c.xml d.xml e.xml
+ parallel -ij2 ../script.sh '{}' -- a.xml b.xml c.xml d.xml e.xml
b.xml
a.xml
d.xml
c.xml
e.xml
find does provide different input: it prepends the relative path to the name.
Maybe that is what is messing up your script?

How can I use aliased commands with xargs?

I have the following alias in my .aliases:
alias gi grep -i
and I want to look for foo case-insensitively in all the files that have the string bar in their name:
find -name \*bar\* | xargs gi foo
This is what I get:
xargs: gi: No such file or directory
Is there any way to use aliases in xargs, or do I have to use the full version:
find -name \*bar\* | xargs grep -i foo
Note: This is a simple example. Besides gi I have some pretty complicated aliases that I can't expand manually so easily.
Edit: I use tcsh, so please specify if an answer is shell-specific.
Aliases are shell-specific - in this case, most likely bash-specific. To execute an alias, you need to execute bash, but aliases are only loaded for interactive shells (more precisely, .bashrc will only be read for an interactive shell).
bash -i runs an interactive shell (and sources .bashrc).
bash -c cmd runs cmd.
Put them together:
bash -ic cmd runs cmd in an interactive shell, where cmd can be a bash function/alias defined in your .bashrc.
find -name \*bar\* | xargs bash -ic gi foo
should do what you want.
Edit: I see you've tagged the question as "tcsh", so the bash-specific solution is not applicable. With tcsh, you don't need the -i, as it appears to read .tcshrc unless you give -f.
Try this:
find -name \*bar\* | xargs tcsh -c gi foo
It worked for my basic testing.
This solution worked perfectly for me in bash:
https://unix.stackexchange.com/a/244516/365245
Problem
[~]: alias grep='grep -i'
[~]: find -maxdepth 1 -name ".bashrc" | xargs grep name # grep alias not expanded
[~]: ### no matches found ###
Solution
[~]: alias xargs='xargs ' # create an xargs alias with trailing space
[~]: find -maxdepth 1 -name ".bashrc" | xargs grep name # grep alias gets expanded
# Name : .bashrc
Why it works
[~]: man alias
alias: alias [-p] [name[=value] ... ]
(snip)
A trailing space in VALUE causes the next word to be checked for
alias substitution when the alias is expanded.
Turn "gi" into a script instead
e.g., in /home/$USER/bin/gi:
#!/bin/sh
exec /bin/grep -i "$@"
don't forget to mark the file executable.
The suggestion here is to avoid xargs and use a "while read" loop instead:
find -name \*bar\* | while read file; do gi foo "$file"; done
See the accepted answer in the link above for refinements to deal with spaces or newlines in filenames.
This is special-character safe:
find . -print0 | xargs -0 bash -ic 'gi foo "$@"' --
The -print0 and -0 use \0 or NUL-terminated strings so you don't get weird things happening when filenames have spaces in them.
bash sets the first argument after the command string as $0, so we pass it a dummy argument (--) so that the first file listed by find doesn't get consumed by $0.
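A quick illustration of how bash -c assigns its arguments, which is why the dummy -- is needed:
bash -c 'echo "0=$0 rest=$@"' first second third
# prints: 0=first rest=second third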
For tcsh (which does not have functions), you could use:
gi foo `find -name "*bar*"`
For bash/ksh/sh, you can create a function in the shell.
function foobar
{
gi "$1" `find . -type f -name "*$2*"`
}
foobar foo bar
Remember that using backquotes in the shell has advantages over xargs from several perspectives. Place the function in your .bashrc.
Using Bash you may also specify the number of args being passed to your alias (or function) like so:
alias myFuncOrAlias='echo' # alias defined in your ~/.bashrc, ~/.profile, ...
echo arg1 arg2 | xargs -n 1 bash -cil 'myFuncOrAlias "$1"' arg0
(should work for tcsh in a similar way)
# alias definition in ~/.tcshrc
echo arg1 arg2 | xargs -n 1 tcsh -cim 'myFuncOrAlias "$1"' arg0 # untested
The simplest solution in your case would be to expand your alias inline, but that is valid for csh/tcsh only.
find -name \*bar\* | xargs `alias gi` foo
For bash it is trickier and not as handy, but it still might be useful:
find -name \*bar\* | xargs `alias gi | cut -d "'" -f2` foo
After trying many xargs-based solutions that didn't work for me, I went for an alternative with a loop; see the examples below:
for file in $(git ls-files '*.txt'); do win2unix "$file"; done
for file in $(find . -name '*.txt'); do win2unix "$file"; done
Put the expression that generates the list of files inside $(), as in the examples above. I've used win2unix, which is a function in my .bashrc that takes a file path and converts it to Linux line endings. Aliases would be expected to work as well.
Note that I did not have spaces in my paths or filenames.
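If spaces or other special characters might appear, a null-delimited while loop is a sketch that still works with shell functions like win2unix (which xargs cannot call directly):
find . -name '*.txt' -print0 | while IFS= read -r -d '' file; do win2unix "$file"; done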
