Does bash -c or zsh -c have a limit on a string it executes? - linux

It appears that there is a 240 character limit on the expanded string. This quick test works for short file names, but does not work for longer names.
ls | xargs -I {} zsh -c "echo '---------------------------------------------------------------------------------------------------------------------------------------------------------{}'; echo '==============================================================================={}'"
Is there a way to expand this limit on Mac and/or Linux?

No, bash and zsh have no such limit.
Instead, here's man xargs (emphasis mine):
-I replstr
Execute utility for each input line, replacing one or more occurrences of replstr in up to replacements (or 5 if no -R flag is specified) arguments to utility with the entire line of input. The resulting arguments, after replacement is done, will not be allowed to grow beyond 255 bytes; this is implemented by concatenating as much of the argument containing replstr as possible, to the constructed arguments to utility, up to 255 bytes. The 255 byte limit does not apply to arguments to utility which do not contain replstr, and furthermore, no replacement will be done on utility itself. Implies -x.
The source code is more direct:
Replaces str with a string consisting of str with match replaced with replstr as many times as can be done before the constructed string is maxsize bytes large.
So if the string is already 255+ characters long, the number of times it can replace the string is zero.
This is not a problem in practice since you would never use the replstr in the argument to *sh -c due to the security and robustness issues it causes.
Instead, pass the arguments separately and reference them from the shell command:
find . -print0 | xargs -0 sh -c 'for arg; do echo "Received: $arg"; done' _
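Applied to the question's example, the same pattern might look like this (a sketch: the long prefixes are abbreviated, and find is used instead of ls so filenames with spaces or newlines survive the trip):
find . -maxdepth 1 -type f -print0 | xargs -0 sh -c '
    for f; do
        # The long prefix lives inside the quoted script, so the 255-byte
        # replstr limit never applies; each filename arrives as its own argument.
        echo "---------------------------------------------$f"
        echo "=============================================$f"
    done
' _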

This depends on the operating system, not on the shell. You can find this limit on Linux-like systems by
getconf ARG_MAX
On my platform, this is 32000.
Actually, this is not just the limit for a single command argument, but for the whole command line.
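To see that this is a limit on the whole command line, you can watch xargs split a long argument list into several invocations of echo; the exact line count depends on your platform (and note that xargs may cap each invocation well below ARG_MAX by default):
# One output line per echo invocation; xargs starts a new invocation
# whenever adding another argument would exceed its command-line size limit.
seq 100000 | xargs echo | wc -l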

Related

Improve performance of Bash loop that removes windows line endings

Editor's note: This question was always about loop performance, but the original title led some answerers - and voters - to believe it was about how to remove Windows line endings.
The bash loop below just removes the Windows line endings and converts them to Unix. It appears to be running, but it is slow. The input files are small (4 files ranging from 167 bytes to 1 kB), and all have the same structure (a list of names); the only thing that varies is the length (i.e. some files have 10 names, others 50). Is it supposed to take over 15 minutes to complete this task using a Xeon processor? Thank you :)
for f in /home/cmccabe/Desktop/files/*.txt ; do
bname=`basename $f`
pref=${bname%%.txt}
sed 's/\r//' $f - $f > /home/cmccabe/Desktop/files/${pref}_unix.txt
done
Input .txt files
AP3B1
BRCA2
BRIP1
CBL
CTC1
EDIT
This is not a duplicate, as I was asking why my bash loop that uses sed to remove Windows line endings was running so slow. I did not mean to ask how to remove them; I was asking for ideas that might speed up the loop, and I got many. Thank you :). I hope this helps.
Use the utilities dos2unix and unix2dos to convert between unix and windows style line endings.
Your 'sed' command looks wrong. I believe the trailing $f - $f should simply be $f. Running your script as written hangs for a very long time on my system, but making this change causes it to complete almost instantly.
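For reference, here is the question's loop with only that change applied (still one sed process per file, so the performance discussion below still applies):
for f in /home/cmccabe/Desktop/files/*.txt ; do
    bname=$(basename "$f")
    pref=${bname%%.txt}
    sed 's/\r//' "$f" > "/home/cmccabe/Desktop/files/${pref}_unix.txt"
done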
Of course, the best answer is to use dos2unix, which was designed to handle this exact thing:
cd /home/cmccabe/Desktop/files
for f in *.txt ; do
pref=$(basename -s '.txt' "$f")
dos2unix -q -n "$f" "${pref}_unix.txt"
done
This always works for me:
perl -pe 's/\r\n/\n/' inputfile.txt > outputfile.txt
You can use dos2unix as stated before, or use this small sed command:
sed 's/\r//' file
The key to performance in Bash is to avoid loops in general, and in particular those that call one or more external utilities in each iteration.
Here is a solution that uses a single GNU awk command:
awk -v RS='\r\n' '
BEGINFILE { outFile=gensub("\\.txt$", "_unix&", 1, FILENAME) }
{ print > outFile }
' /home/cmccabe/Desktop/files/*.txt
-v RS='\r\n' sets CRLF as the input record separator; by virtue of leaving ORS, the output record separator, at its default, \n, simply printing each input line terminates it with \n.
the BEGINFILE block is executed every time processing of a new input file starts; in it, gensub() is used to insert _unix before the .txt suffix of the input file at hand to form the output filename.
{print > outFile} simply prints the \n-terminated lines to the output file at hand.
Note that the use of a multi-character RS value, the BEGINFILE block, and the gensub() function are GNU extensions to the POSIX standard.
Switching from the OP's sed solution to a GNU awk-based one was necessary in order to provide a single-command solution that is both simpler and faster.
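If GNU awk isn't available, a similar single-command approach can be sketched with POSIX awk features only - tracking FILENAME changes instead of relying on BEGINFILE, and stripping the trailing \r from each line. Treat it as an outline rather than a drop-in replacement:
awk '
    FILENAME != prev {                    # a new input file: derive its output name
        prev = FILENAME
        outFile = FILENAME
        sub(/\.txt$/, "_unix.txt", outFile)
    }
    { sub(/\r$/, ""); print > outFile }   # drop the CR and write the LF-terminated line
' /home/cmccabe/Desktop/files/*.txt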
Alternatively, here's a solution that relies on dos2unix for conversion of Windows line endings (for instance, you can install dos2unix with sudo apt-get install dos2unix on Debian-based systems); except for requiring dos2unix, it should work on most platforms (no GNU utilities required):
It uses a loop only to construct the array of filename arguments to pass to dos2unix - this should be fast, given that no call to basename is involved; Bash-native parameter expansion is used instead.
then uses a single invocation of dos2unix to process all files.
# cd to the target folder, so that the operations below do not need to handle
# path components.
cd '/home/cmccabe/Desktop/files'
# Collect all *.txt filenames in an array.
inFiles=( *.txt )
# Derive output filenames from it, using Bash parameter expansion:
# '%.txt' matches '.txt' at the end of each array element, and replaces it
# with '_unix.txt', effectively inserting '_unix' before the suffix.
outFiles=( "${inFiles[@]/%.txt/_unix.txt}" )
# Create an interleaved array of *input-output filename pairs* to be passed
# to dos2unix later.
# To inspect the resulting array, run `printf '%s\n' "${fileArgs[@]}"`
# You'll see pairs like these:
# file1.txt
# file1_unix.txt
# ...
fileArgs=(); i=0
for inFile in "${inFiles[@]}"; do
fileArgs+=( "$inFile" "${outFiles[i++]}" )
done
# Now, use a *single* invocation of dos2unix, passing all input-output
# filename pairs at once.
dos2unix -q -n "${fileArgs[@]}"
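To spot-check the result afterwards, file(1) reports CRLF line terminators explicitly on most platforms, so something along these lines can confirm the conversion:
file *.txt
# The original files typically report "..., with CRLF line terminators";
# the *_unix.txt copies should not.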

find command works on prompt, not in bash script - pass multiple arguments by variable

I've searched around questions with similar issues but haven't found one that quite fits my situation.
Below is a very brief script that demonstrates the problem I'm facing:
#!/bin/bash
includeString="-wholename './public_html/*' -o -wholename './config/*'"
find . \( $includeString \) -type f -mtime -7 -print
Basically, we need to search inside a folder, but only in certain of its subfolders. In my longer script, includeString gets built from an array. For this demo, I kept things simple.
Basically, when I run the script, it doesn't find anything. No errors, but also no hits. If I manually run the find command, it works. If I remove \( $includeString \) it also works, though obviously it doesn't limit itself to the folders I want.
So why would the same command work from the command line but not from the bash script? What is it about passing in $includeString that way that causes it to fail?
You're running into an issue with how the shell handles variable expansion. In your script:
includeString="-wholename './public_html/*' -o -wholename './config/*'"
find . \( $includeString \) -type f -mtime -7 -print
This results in find looking for files where -wholename matches the literal string './public_html/*'. That is, a filename that contains single quotes. Since you don't have any whitespace in your paths, the easiest solution here would be to just drop the single quotes:
includeString="-wholename ./public_html/* -o -wholename ./config/*"
find . \( $includeString \) -type f -mtime -7 -print
Unfortunately, you'll probably get bitten by wildcard expansion here (the shell will attempt to expand the wildcards before find sees them).
But as Etan pointed out in his comment, this appears to be needlessly complex; you can simply do:
find ./public_html ./config -type f -mtime -7 -print
If you want to store a list of arguments and expand it later, the correct form to do that with is an array, not a string:
includeArgs=( -wholename './public_html/*' -o -wholename './config/*' )
find . '(' "${includeArgs[@]}" ')' -type f -mtime -7 -print
This is covered in detail in BashFAQ #50.
Note: As Etan points out in a comment, the better solution in this case may be to reformulate the find command, but passing multiple arguments via variable(s) is a technique worth exploring in general.
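Since the question mentions that includeString is built from an array in the longer script, here is a hypothetical sketch of building the find arguments that way (the list of subfolders is made up):
subdirs=( './public_html' './config' )
includeArgs=()
for d in "${subdirs[@]}"; do
    # separate the clauses with -o, except before the first one
    [ "${#includeArgs[@]}" -gt 0 ] && includeArgs+=( -o )
    includeArgs+=( -wholename "$d/*" )
done
find . '(' "${includeArgs[@]}" ')' -type f -mtime -7 -print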
tl;dr:
The problem is not specific to find, but to how the shell parses command lines.
Quote characters embedded in variable values are treated as literals: They are neither recognized as argument-boundary delimiters nor are they removed after parsing, so you cannot use a string variable with embedded quoting to pass multiple arguments simply by directly using it as part of a command.
To robustly pass multiple arguments stored in a variable,
use array variables in shells that support them (bash, ksh, zsh) - see below.
otherwise, for POSIX compliance, use xargs - see below.
Robust solutions:
Note: The solutions assume presence of the following script, let's call it echoArgs, which prints the arguments passed to it in diagnostic form:
#!/usr/bin/env bash
for arg; do # loop over all arguments
echo "[$arg]" # print each argument enclosed in [] so as to see its boundaries
done
Further, assume that the equivalent of the following command is to be executed:
echoArgs one 'two three' '*' last # note the *literal* '*' - no globbing
with all arguments but the last passed by variable.
Thus, the expected outcome is:
[one]
[two three]
[*]
[last]
Using an array variable (bash, ksh, zsh):
# Assign the arguments to *individual elements* of *array* args.
# The resulting array looks like this: [0]="one" [1]="two three" [2]="*"
args=( one 'two three' '*' )
# Safely pass these arguments - note the need to *double-quote* the array reference:
echoArgs "${args[@]}" last
Using xargs - a POSIX-compliant alternative:
The POSIX utility xargs, unlike the shell itself, is capable of recognizing quoted strings embedded in a string:
# Store the arguments as *single string* with *embedded quoting*.
args="one 'two three' '*'"
# Let *xargs* parse the embedded quoted strings correctly.
# Note the need to double-quote $args.
echo "$args" | xargs -J {} echoArgs {} last
Note that {} is a freely chosen placeholder that allows you to control where in the resulting command line the arguments provided by xargs go.
If all xargs-provided arguments go last, there is no need to use -J at all.
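For example, if the fixed argument came first and the variable arguments went last, plain xargs would do (a sketch, relying on xargs' default parsing of quotes):
args="one 'two three' '*'"
# Without -J, xargs appends the parsed arguments at the end of the command line:
echo "$args" | xargs echoArgs first
# -> [first] [one] [two three] [*]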
For the sake of completeness: eval can also be used to parse quoted strings embedded in another string, but eval is a security risk: arbitrary commands could end up getting executed; given the safe solutions discussed above, there is no need to use eval.
Finally, Charles Duffy mentions another safe alternative in a comment, which, however, requires more coding: encapsulate the command to invoke in a shell function, pass the variable arguments as separate arguments to the function, then manipulate the all-arguments array "$@" inside the function to supplement the fixed arguments (using set), and invoke the command with "$@".
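A rough sketch of that function-based approach, reusing the find example from the question (the function name is made up):
find_recent() {
    # "$@" initially holds just the variable -wholename clauses passed to the
    # function; use set -- to wrap them in the fixed arguments, then invoke find.
    set -- . '(' "$@" ')' -type f -mtime -7 -print
    find "$@"
}
find_recent -wholename './public_html/*' -o -wholename './config/*'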
Explanation of the shell's string-handling issues involved:
When you assign a string to a variable, embedded quote characters become part of the string:
var='one "two three" *'
$var now literally contains one "two three" *, i.e., the following 4 - instead of the intended 3 - words, separated by a space each:
one
"two-- " is part of the word itself!
three"-- " is part of the word itself!
*
When you use $var unquoted as part of an argument list, the above breakdown into 4 words is exactly what the shell does initially - a process called word splitting. Note that if you were to double-quote the variable reference ("$var"), the entire string would always become a single argument.
Because $var is expanded to its value, one of the so-called parameter expansions, the shell does NOT attempt to recognize embedded quotes inside that value as marking argument boundaries - this only works with quote characters specified literally, as a direct part of the command line (assuming these quote characters aren't themselves quoted).
Similarly, only such directly specified quote characters are removed by the shell before passing the enclosed string to the command being invoked - a process called quote removal.
However, the shell additionally applies pathname expansion (globbing) to the resulting 4 words, so any of the words that happen to match filenames will expand to the matching filenames.
In short: the quote characters in $var's value are neither recognized as argument-boundary delimiters nor are they removed after parsing. Additionally, the words in $var's value are subject to pathname expansion.
This means that the only way to pass multiple arguments is to leave them unquoted inside the variable value (and also leave the reference to that variable unquoted), which:
won't work with values with embedded spaces or shell metacharacters
invariably subjects the values to pathname expansion
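To see both effects at once, feed the unquoted variable to the echoArgs script from above (a sketch; the exact output depends on which files exist in the current directory):
var='one "two three" *'
echoArgs $var
# [one]
# ["two]
# [three"]
# ...plus one bracketed entry per file in the current directory,
# because the unquoted * undergoes pathname expansion.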

Get first character of a string SHELL

I want to get the first character of a string, for example:
$>./first $foreignKey
And I want to get "$"
I googled it and found some solutions, but they concern only bash and not sh!
This should work on any POSIX-compatible shell (including sh). printf is not required to be a builtin, but it often is, so this may save a fork or two:
first_letter=$(printf %.1s "$1")
Note: (Possibly I should have explained this six years ago when I wrote this brief answer.) It might be tempting to write %c instead of %.1s; that produces exactly the same result except in the case where the argument "$1" is empty. printf %c "" actually produces a NUL byte, which is not a valid character in a POSIX shell; different shells might treat this case differently. Some will allow NULs as an extension; others, like bash, ignore the NUL but generate an error message to tell you it has happened. The precise semantics of %.1s is "at most 1 character at the start of the argument", which means that first_letter is guaranteed to be set to the empty string if the argument is the empty string, without raising any error indication.
Well, you'll probably need to escape that particular value to prevent it being interpreted as a shell variable but, if you don't have access to the nifty bash substring facility, you can still use something like:
name=paxdiablo
firstchar=`echo $name | cut -c1-1`
If you do have bash (it's available on most Linux distros and, even if your login shell is not bash, you should be able to run scripts with it), it's the much easier:
firstchar=${name:0:1}
For escaping the value so that it's not interpreted by the shell, you need to use:
./first \$foreignKey
and the following first script shows how to get it:
letter=`echo $1 | cut -c1-1`
echo ".$letter."
Maybe it is an old question.
Recently I ran into the same problem; according to the POSIX shell manual section on parameter expansion (substring processing), this is my solution without involving any subshell/fork:
a="some string here"
printf 'first char is "%s"\n' "${a%"${a#?}"}"
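The nesting is easier to follow when split into two steps; this is the same pure-POSIX parameter expansion, just spelled out:
a="some string here"
rest=${a#?}            # strip the first character: "ome string here"
first=${a%"$rest"}     # remove that suffix from the original, leaving "s"
printf 'first char is "%s"\n' "$first"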
For plain sh:
echo "hello" | cut -b 1 # -b 1 extract the 1st byte
h
echo "hello" |grep -o "." | head -n 1
h
echo "hello" | awk -F "" '{print $1}'
h
You can try this for bash:
s='hello'; echo ${s:0:1}
h
printf -v first_character "%c" "${variable}"

how to replace "/" in a POSIX sh string

To replace substring in the bash string str I use:
str=${str/$pattern/$new}
However, I'm presently writing a script which will be executed with ash.
I have a string containing '/' and I want to use the above syntax in order to replace the '/' in my string, but it does not work.
I tried:
str=${str///a}
str=${str/\//a}
str=${str/'/'/a}
But they do not work.
How can I fix that?
This parameter expansion is a bash extension to POSIX sh. If you review the relevant section of IEEE standard 1003.1, you'll see that it isn't a required feature, so shells which promise only POSIX compliance, such as ash, have no obligation to implement it, and no obligation for their implementations to hew to any particular standard of correctness should they do so anyhow.
If you want bash extensions, you need to use bash (or other ksh derivatives which are extended similarly).
In the interim, you can use other tools. For instance:
str=$(printf '%s' "$str" | tr '/' 'a')
or
str=$(printf '%s' "$str" | sed -e 's#/#a#g')
POSIX string substitutions can be used to create a 100% POSIX-compatible function that does the replacement. For short strings, this is considerably faster than command substitution, especially under Cygwin, where fork(2) copies the parent process's address space on top of process creation being generally slow on Windows.
# replace_all STRING PATTERN REPLACEMENT
# Replaces every occurrence of PATTERN in STRING with REPLACEMENT and
# stores the result in the global variable R.
replace_all() {
    RIGHT=$1
    R=
    while [ -n "$RIGHT" ]; do
        LEFT=${RIGHT%%$2*}              # everything before the first match
        if [ "$LEFT" = "$RIGHT" ]; then # no match left: append the rest and stop
            R=$R$RIGHT
            return
        fi
        R=$R$LEFT$3                     # keep the prefix, substitute the match
        RIGHT=${RIGHT#*$2}              # continue after the first match
    done
}
It works like this:
$ replace_all ' foo bar baz ' ' ' .
$ echo $R
.foo.bar.baz.
With regards to performance, replacing 25% of characters in a 512 byte string runs roughly 50 times faster with replace_all() than command substitution under the Cygwin dash(1). However, the execution time evens out around 4 KiB.
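Applied to the original question (replacing / in a string), the function would be used like this; the example string is made up, and note that the result comes back in the global variable R rather than on stdout:
str='some/path/here'
replace_all "$str" '/' 'a'
str=$R
echo "$str"    # -> someapathahere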

Can xargs be used to run several arbitrary commands in parallel?

I'd like to be able to provide a long list of arbitrary/different commands (varying binary/executable and arguments) and have xargs run those commands in parallel (xargs -P).
I can use xargs -P fine when only varying arguments. It's when I want to vary the executable and arguments that I'm having difficulty.
Example: command-list.txt
% cat command-list.txt
binary_1 arg_A arg_B arg_C
binary_2 arg_D arg_E
.... <lines deleted for brevity>
binary_100 arg_AAA arg_BBB
% xargs -a command-list.txt -P 4 -L 1
** I know the above command will only echo my command-list.txt **
I am aware of GNU parallel but can only use xargs for now. I also can't just background all the commands since there could be too many for the host to handle at once.
Solution is probably staring me in the face. Thanks in advance!
If you don't have access to parallel, one solution is just to use sh with your command as the parameter.
For example:
xargs -a command-list.txt -P 4 -I COMMAND sh -c "COMMAND"
The -c for sh basically just executes the string given (instead of looking for a file). The man page explanation is:
-c string If the -c option is present, then commands are read from
string. If there are arguments after the string, they are
assigned to the positional parameters, starting with $0.
And the -I for xargs tells it to run one command at a time (like -L 1) and to search and replace the parameter (COMMAND in this case) with the current line being processed by xargs. Man page info is below:
-I replace-str
Replace occurrences of replace-str in the initial-arguments with
names read from standard input. Also, unquoted blanks do not
terminate input items; instead the separator is the newline
character. Implies -x and -L 1.
sh seems to be very forgiving with commands containing quotation marks ("), so you don't appear to need to rewrite them into escaped quotations first.
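Note that -a is a GNU xargs option; on implementations without it (e.g. BSD/macOS), you can feed the file on standard input instead - keeping in mind the 255-byte replstr limit of BSD xargs -I discussed in the first question above:
xargs -P 4 -I CMD sh -c "CMD" < command-list.txt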
