bash aliases in xargs: "bash -c" does not pass arguments to command - linux

To recognize aliases in "xargs", I have set an alias
alias xargs="xargs bash -ic"
If I now execute the snippet below, no arguments are passed to the command run by xargs.
find . -name pom.xml | xargs grep projectid
In fact, no arguments are passed to the command even in this simpler case.
bash -ic grep projectid pom.xml
The documentation for bash says
-c If the -c option is present, then commands are read from the first non-option
argument command_string. If there are arguments after the command_string, they
are assigned to the positional parameters, starting with $0.
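A quick check of that numbering (a minimal illustration): the first argument after the command string becomes $0, not $1, which is exactly why grep above receives no operands.

```shell
# The command string is 'echo ...'; projectid lands in $0 and pom.xml in $1.
# In "bash -ic grep projectid pom.xml" the same thing happens: the command
# string is just 'grep', so grep runs with no pattern or file at all.
bash -c 'echo "\$0=$0 \$1=$1"' projectid pom.xml
# prints: $0=projectid $1=pom.xml
```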
So what am I doing wrong?
bash --version
GNU bash, version 4.3.39(2)-release (x86_64-unknown-cygwin)
UPDATE:
Thanks to @knittl for his input. A workaround for now that avoids all the extra punctuation in @knittl's answer:
1. Download xargs_bash_alias.sh
2. Set an alias
alias xargs="<path>/xargs_bash_alias.sh"
Now your xargs commands will recognize your other bash aliases.

There are two things you need to be aware of. First, proper quoting:
find . -name pom.xml -print0 | xargs -0 bash -c "grep projectid"
Second, you need to pass your positional arguments somehow:
find . -name pom.xml -print0 | xargs -0 bash -c 'grep projectid "$@"' -
Use - as the first argument to bash, so that positional arguments start at $1, just like in a normal shell script.
"$@" expands to the quoted positional parameters, starting from $1.
Since xargs passes multiple arguments at once, you either need to use "$@" (quoted!) inside your bash script or run xargs with the -n1 option.
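A self-contained sketch of the dummy-argument idiom (printf stands in for find, so it runs anywhere):

```shell
# The lone - becomes $0, so the names fed in by xargs land in $1, $2, ...
# "$@" then hands them all to the command, each one properly quoted:
printf '%s\n' a.txt b.txt | xargs bash -c 'printf "got: %s\n" "$@"' -
# prints:
# got: a.txt
# got: b.txt
```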

Related

using xargs to pass a variable to alias command

I'm trying to write a one-liner that creates an alias 'cd="cd dir_name"' which will change directory to that dir_name
pwd | xargs -i alias cd{}='cd $PWD'
but I get:
xargs: alias: No such file or directory
is it that alias cannot be played with xargs or am I not using xargs correctly?
alias is a shell builtin. xargs needs an external command to run. Normally, you can run a new shell in xargs to interpret the builtins or keywords:
pwd | xargs -i bash -c 'alias cd{}="cd $PWD"'
but it's useless in this case, as the alias would live only in the shell you run from xargs, not in the current one.
Moreover, an alias can't be named cd/home/user, which is what cd{} expands to here. Maybe you meant
... alias cd='cd {}'
Use pushd and popd to remember the current directory and return to it later.
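For completeness, a minimal pushd/popd round trip (these are bash builtins):

```shell
pushd /tmp > /dev/null   # save the current directory and switch to /tmp
pwd                      # now in /tmp
popd > /dev/null         # return to the saved directory
```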

What is Flow Control in Bash?

In this question the answer was
find . -type f -name \*.mp4 -exec sh -c 'ffprobe "$0" 2>&1 |
grep -q 1920x1080 && echo "$0"' {} \;
which will output all mp4 files that are 1920x1080.
I don't understand why sh -c is there. If I remove it, then it doesn't find anything.
The author says
The new shell is required to handle the flow control inside the exec'd
command.
but I guess I am missing some fundamental knowledge to understand the answer.
Question
Can anyone explain why sh -c has to be there, and why it only works when ffprobe is run in a new shell?
The -exec option takes a sequence of arguments:
find . -exec arg0 arg1 arg2 ... \;
If you put the arguments in quotes
find . -exec "arg0 arg1 arg2" \;
then "arg0 arg1 arg2" are treated as a single argument. It would expect a command called arg0 arg1 arg2, with the spaces, to exist on your system, instead of a command called arg0 with parameters of arg1 and arg2.
If you were to use find without the sh -c, then you would have this:
find . -type f -name \*.mp4 -exec 'ffprobe "{}" 2>&1 |
grep -q 1920x1080 && echo "{}"' \;
This would mean that find would look for a single command whose name is the entire quoted string ffprobe "{}" ...., passing no arguments -- there is no such command. There is a command called ffprobe, which takes arguments, and that is what you need. One possibility would be to do something like this:
find . -type f -name \*.mp4 -exec ffprobe {} 2>&1 |
grep -q 1920x1080 && echo {} \;
However, that doesn't work, since the output redirection 2>&1 and the pipe | and the command sequence operator && would all be treated differently than what you want.
To get around this, they use another shell. This is similar to creating a script to do the work:
find . -type f -name \*.mp4 -exec myscript {} \;
But instead of a separate script, everything is all on one line.
find executes the arguments given to -exec as a command directly: it invokes the first argument as a program, passing the following arguments to that program, without doing any processing on them other than replacing {} with the file name. In particular, it does not implement any shell features such as pipes or redirections. In your case the command contains both, so you need to run the command through sh, so that sh handles those.
The difference lies in the implementation of find. find uses the fork() and exec() family of Linux/Unix (POSIX) calls to spawn a new process and pass it the command with its arguments. The exec() call runs an executable (a binary, or a shebang-interpreted program), passing the argument strings to it directly. No processing of those arguments is done.
Therefore shell conventions are not interpreted: $0, 2>&1, '|' and so on are passed to ffprobe as plain arguments, which is not what you want. By adding sh -c, you are telling find to exec a shell, passing it the rest of the line for interpretation.
Note that you could get the same effect by putting your command in a shebang script and calling this script from find. Here the exec'ed script would load the shell.
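To see the mechanics without ffprobe installed, here is a stand-in sketch (echo replaces ffprobe) that exercises the same pipe and && inside sh -c:

```shell
tmp=$(mktemp -d)
touch "$tmp/video.mp4"
# find substitutes {} into $0 of the sh script; the pipe and && inside
# the quoted string are interpreted by sh, not by find:
find "$tmp" -name '*.mp4' -exec sh -c 'echo "$0" | grep -q mp4 && echo "match: $0"' {} \;
```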

"find" and "ls" with GNU parallel

I'm trying to use GNU parallel to post a lot of files to a web server. In my directory, I have some files:
file1.xml
file2.xml
and I have a shell script that looks like this:
#! /usr/bin/env bash
CMD="curl -X POST -d#$1 http://server/path"
eval $CMD
There's some other stuff in the script, but this was the simplest example. I tried to execute the following command:
ls | parallel -j2 script.sh {}
Which is what the GNU parallel pages show as the "normal" way to operate on files in a directory. This seems to pass the name of the file into my script, but curl complains that it can't load the data file passed in. However, if I do:
find . -name '*.xml' | parallel -j2 script.sh {}
it works fine. Is there a difference between how ls and find are passing arguments to my script? Or do I need to do something additional in that script?
GNU parallel is a variant of xargs. They both have very similar interfaces, and if you're looking for help on parallel, you may have more luck looking up information about xargs.
That being said, the way they both operate is fairly simple. With their default behavior, both programs read input from STDIN, then break the input up into tokens based on whitespace. Each of these tokens is then passed to a provided program as an argument. The default for xargs is to pass as many tokens as possible to the program, and then start a new process when the limit is hit. I'm not sure how the default for parallel works.
Here is an example:
> echo "foo bar \
baz" | xargs echo
foo bar baz
There are some problems with the default behavior, so it is common to see several variations.
The first issue is that, because whitespace is used to tokenize, any file with whitespace in its name will cause parallel and xargs to break. One solution is to tokenize on the NUL character instead. find even provides an option to make this easy to do:
> echo "Success!" > "bad filename"
> find . -name "bad filename" -print0 | xargs -0 cat
Success!
The -print0 option tells find to separate file names with the NUL character instead of newlines.
The -0 option tells xargs to tokenize on the NUL character.
Note that parallel is a little better than xargs in that its default behavior is to tokenize on newlines only, so there is less of a need to change the default behavior.
Another common issue is that you may want to control how the arguments are passed to xargs or parallel. If you need to have a specific placement of the arguments passed to the program, you can use {} to specify where the argument is to be placed.
> mkdir new_dir
> find -name '*.xml' | xargs -I{} mv {} new_dir
This will move all xml files in the current directory and its subdirectories into the new_dir directory. It actually breaks down into the following:
> find -name '*.xml' | xargs -I{} echo mv {} new_dir
> mv foo.xml new_dir
> mv bar.xml new_dir
> mv baz.xml new_dir
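A runnable sketch of the same move in a throwaway directory (-I{} makes xargs run one mv per input line, substituting {} with the file name):

```shell
tmp=$(mktemp -d)
mkdir "$tmp/new_dir"
touch "$tmp/foo.xml" "$tmp/bar.xml"
# One mv per file name read from stdin:
find "$tmp" -maxdepth 1 -name '*.xml' | xargs -I{} mv {} "$tmp/new_dir"
ls "$tmp/new_dir"    # foo.xml and bar.xml are now here
```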
So taking into consideration how xargs and parallel work, you should hopefully be able to see the issue with your command. find . -name '*.xml' will generate a list of xml files to be passed to the script.sh program.
> find . -name '*.xml' | parallel -j2 echo script.sh {}
> script.sh foo.xml
> script.sh bar.xml
> script.sh baz.xml
However, ls | parallel -j2 script.sh {} will generate a list of ALL files in the current directory to be passed to the script.sh program.
> ls | parallel -j2 echo script.sh {}
> script.sh some_directory
> script.sh some_file
> script.sh foo.xml
> ...
A more correct variant on the ls version would be as follows:
> ls *.xml | parallel -j2 script.sh {}
However, an important difference between this and the find version is that find will search through all subdirectories for files, while ls will only search the current directory. The equivalent find version of the above ls command would be as follows:
> find -maxdepth 1 -name '*.xml'
This will only search the current directory.
Since it works with find, you probably want to see what command GNU Parallel is running (using -v or --dryrun) and then try to run the failing commands manually.
ls *.xml | parallel --dryrun -j2 script.sh
find -maxdepth 1 -name '*.xml' | parallel --dryrun -j2 script.sh
I have not used parallel, but there is a difference between ls and find . -name '*.xml': ls will list all the files and directories, whereas find . -name '*.xml' will list only the files (and directories) whose names end with .xml.
As suggested by Paul Rubel, just print the value of $1 in your script to check this. Additionally you may want to consider filtering the input to files only in find with the -type f option.
Hope this helps!
Neat.
I had never used parallel before. It appears, though, that there are two of them.
One is GNU Parallel, and the one that was installed on my system has Tollef Fog Heen
listed as the author in the man pages.
As Paul mentioned, you should use
set -x
Also, the paradigm that you mentioned above doesn't seem to work on my parallel, rather, I have
to do the following:
$ cat ../script.sh
+ cat ../script.sh
#!/bin/bash
echo $1
$ parallel -ij2 ../script.sh {} -- $(find -name '*.xml')
++ find -name '*.xml'
+ parallel -ij2 ../script.sh '{}' -- ./b.xml ./c.xml ./a.xml ./d.xml ./e.xml
./c.xml
./b.xml
./d.xml
./a.xml
./e.xml
$ parallel -ij2 ../script.sh {} -- $(ls *.xml)
++ ls --color=auto a.xml b.xml c.xml d.xml e.xml
+ parallel -ij2 ../script.sh '{}' -- a.xml b.xml c.xml d.xml e.xml
b.xml
a.xml
d.xml
c.xml
e.xml
find does provide different input: it prepends the relative path ./ to each name.
Maybe that is what is messing up your script?

Extract arguments from stdout and pipe

I was trying to execute a script n times with a different file as argument each time using this:
ls /user/local/*.log | xargs script.pl
(script.pl accepts file name as argument)
But the script is executed only once. How to resolve this ? Am I not doing it correctly ?
ls /user/local/*.log | xargs -rn1 script.pl
I guess your script only expects one parameter, so you need to tell xargs about that with -n1.
Passing -r helps if the input list is empty: the command is not run at all.
Note that something like the following is, in general, better:
find /user/local/ -maxdepth 1 -type f -name '*.log' -print0 |
xargs -0rn1 script.pl
It will handle quoting and directories more safely.
To see what xargs actually executes use the -t flag.
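A small illustration of both flags (echo stands in for script.pl so it runs anywhere):

```shell
# -n1: one invocation per argument; -t: print each command line to stderr
# before running it; -r: do nothing if stdin is empty (GNU extension).
printf '%s\n' a.log b.log | xargs -rn1 -t echo processing
# stdout:
# processing a.log
# processing b.log
```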
I hope this helps:
for i in $(ls /usr/local/*.log)
do
script.pl $i
done
I would avoid using ls as this is often an alias on many systems. E.g. if your ls produces colored output, then this will not work:
for i in ls; do ls $i;done
Rather, you can just let the shell expand the wildcard for you:
for i in /usr/local/*.log; do script.pl $i; done

How can I use aliased commands with xargs?

I have the following alias in my .aliases:
alias gi grep -i
and I want to look for foo case-insensitively in all the files that have the string bar in their name:
find -name \*bar\* | xargs gi foo
This is what I get:
xargs: gi: No such file or directory
Is there any way to use aliases in xargs, or do I have to use the full version:
find -name \*bar\* | xargs grep -i foo
Note: This is a simple example. Besides gi I have some pretty complicated aliases that I can't expand manually so easily.
Edit: I used tcsh, so please specify if an answer is shell-specific.
Aliases are shell-specific - in this case, most likely bash-specific. To execute an alias, you need to execute bash, but aliases are only loaded for interactive shells (more precisely, .bashrc will only be read for an interactive shell).
bash -i runs an interactive shell (and sources .bashrc).
bash -c cmd runs cmd.
Put them together:
bash -ic cmd runs cmd in an interactive shell, where cmd can be a bash function/alias defined in your .bashrc.
find -name \*bar\* | xargs bash -ic gi foo
should do what you want.
Edit: I see you've tagged the question as "tcsh", so the bash-specific solution is not applicable. With tcsh, you don't need the -i, as it appears to read .tcshrc unless you give -f.
Try this:
find -name \*bar\* | xargs tcsh -c gi foo
It worked for my basic testing.
This solution worked perfect for me in bash:
https://unix.stackexchange.com/a/244516/365245
Problem
[~]: alias grep='grep -i'
[~]: find -maxdepth 1 -name ".bashrc" | xargs grep name # grep alias not expanded
[~]: ### no matches found ###
Solution
[~]: alias xargs='xargs ' # create an xargs alias with trailing space
[~]: find -maxdepth 1 -name ".bashrc" | xargs grep name # grep alias gets expanded
# Name : .bashrc
Why it works
[~]: man alias
alias: alias [-p] [name[=value] ... ]
(snip)
A trailing space in VALUE causes the next word to be checked for
alias substitution when the alias is expanded.
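A scriptable demonstration of that rule (in a script you must also enable expand_aliases; interactively the two alias lines alone suffice; "say" is a stand-in for any alias such as grep -i):

```shell
shopt -s expand_aliases
alias xargs='xargs '      # trailing space: the next word is alias-checked too
alias say='echo said:'    # stand-in for an alias like grep -i
echo world | xargs say    # expands to: xargs echo said:
# prints: said: world
```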
Turn "gi" into a script instead
eg, in /home/$USER/bin/gi:
#!/bin/sh
exec /bin/grep -i "$@"
don't forget to mark the file executable.
The suggestion here is to avoid xargs and use a "while read" loop instead of xargs:
find -name \*bar\* | while read file; do gi foo "$file"; done
See the accepted answer in the link above for refinements to deal with spaces or newlines in filenames.
This is special-character safe:
find . -print0 | xargs -0 bash -ic 'gi foo "$@"' --
The -print0 and -0 use \0 or NUL-terminated strings so you don't get weird things happening when filenames have spaces in them.
bash sets the first argument after the command string as $0, so we pass it a dummy argument (--) so that the first file listed by find doesn't get consumed as $0.
For tcsh (which does not have functions), you could use:
gi foo `find -name "*bar*"`
For bash/ksh/sh, you can create a function in the shell.
function foobar
{
gi $1 `find . -type f -name "*"$2"*"`
}
foobar foo bar
Remember that using backquotes in the shell is more advantageous than using xargs from multiple perspectives. Place the function in your .bashrc.
Using Bash you may also specify the number of args being passed to your alias (or function) like so:
alias myFuncOrAlias='echo' # alias defined in your ~/.bashrc, ~/.profile, ...
echo arg1 arg2 | xargs -n 1 bash -cil 'myFuncOrAlias "$1"' arg0
(should work for tcsh in a similar way)
# alias definition in ~/.tcshrc
echo arg1 arg2 | xargs -n 1 tcsh -cim 'myFuncOrAlias "$1"' arg0 # untested
The simplest solution in your case would be to expand your alias inline. But that is valid for csh/tcsh only.
find -name \*bar\* | xargs `alias gi` foo
For bash it will be trickier and not as handy, but it still might be useful:
find -name \*bar\* | xargs `alias gi | cut -d "'" -f2` foo
After trying many solutions with xargs that didn't work for me, went for an alternative with a loop, see examples below:
for file in $(git ls-files '*.txt'); do win2unix $file; done
for file in $(find . -name '*.txt'); do win2unix $file; done
Put your expression that generates a list of files inside $() as in the examples above. I've used win2unix which is a function in my .bashrc that takes a file path and converts it to Linux endings. Would expect aliases to also work.
Note that I did not have spaces in my paths or filenames.
