Unix 'alias' fails with 'awk' command - linux

I'm creating an alias in Unix and have found that the following command fails:
alias logspace='find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; | awk '{print $5, $9 }''
I get the following:
awk: cmd. line:1: {print
awk: cmd. line:1: ^ unexpected newline or end of string
Any ideas why the piped awk command fails?
Thanks,
Shaun.

To complement @Dropout's helpful answer:
tl;dr
The problem is the OP's attempt to use ' inside a '-enclosed (single-quoted) string.
The most robust solution in this case is to replace each interior ' with '\'' (sic):
alias logspace='find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; | awk '\''{print $5, $9 }'\'''
Bourne-like (POSIX-compatible) shells do not support using ' chars inside single-quoted ('...'-enclosed) strings AT ALL - not even with escaping.
(By contrast, you CAN escape " inside a double-quoted string as \", and, as in @Dropout's answer, you can directly embed ' chars. there, but see below for pitfalls.)
The solution above effectively builds the string from multiple single-quoted strings into which literal ' chars. (escaped outside the single-quoted strings as \') are spliced in.
Another way of putting it, as @Etan Reisinger has done in a comment: '\'' means: "close string", "escape single quote", "start new string".
When defining an alias, you usually want single quotes around its definition so as to delay evaluation of the command until the alias is invoked.
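A quick way to confirm what the shell actually stored is to read the alias back. This is a bash-specific sketch: it relies on bash's BASH_ALIASES associative array (bash 4+), and the alias is only defined, never run, so /apps/ and /opt/ need not exist:

```shell
#!/usr/bin/env bash
# Each interior ' is written as '\'' : close the string, add an escaped ', reopen.
alias logspace='find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; | awk '\''{print $5, $9 }'\'''

# Print the stored definition: the awk program keeps its single quotes,
# and $5/$9 are still literal text, evaluated only when the alias runs.
printf '%s\n' "${BASH_ALIASES[logspace]}"
```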
Other solutions and their pitfalls:
The following discusses alternative solutions, based on this alias:
alias foo='echo A '\''*'\'' is born at $(date)'
Note how the * is effectively enclosed in single quotes - using the above technique - so as to prevent pathname expansion when the alias is invoked later.
When invoked, this alias prints the literal text A * is born at, followed by the then-current date and time, e.g.: A * is born at Mon Jun 16 11:33:19 EDT 2014.
Use a feature called ANSI C quoting with shells that support it: bash, ksh, zsh
ANSI C-quoted strings, which are enclosed in $'...', DO allow escaping embedded ' chars. as \':
alias foo=$'echo A \'*\' is born at $(date)'
Pitfalls:
This feature is not part of POSIX.
By design, escape sequences such as \n, \t, ... are interpreted, too (in fact, that's the purpose of the feature).
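To check that both spellings store the same text, here is a small bash sketch; the alias names foo1 and foo2 are hypothetical, chosen only so that both definitions can coexist:

```shell
#!/usr/bin/env bash
# The same alias written two ways ($'...' also works in ksh and zsh).
alias foo1='echo A '\''*'\'' is born at $(date)'
alias foo2=$'echo A \'*\' is born at $(date)'

# Both store identical text: '*' stays quoted and $(date) stays unexpanded.
printf '%s\n' "${BASH_ALIASES[foo1]}" "${BASH_ALIASES[foo2]}"
```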
Use of alternating quoting styles, as in @Dropout's answer:
Pitfall:
'...' and "..." have different semantics, so substituting one for the other can have unintended side-effects:
alias foo="echo A '*' is born at $(date)" # DOES NOT WORK AS INTENDED
While syntactically correct, this will NOT work as intended, because the use of double quotes causes the shell to expand the command substitution $(date) right away, and thus hardwires the date and time at the time of the alias definition into the alias.
As stated: When defining an alias, you usually want single quotes around its definition so as to delay evaluation of the command until the alias is invoked.
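The difference is easy to see by comparing what bash stores for each form. A sketch under the same BASH_ALIASES assumption as above; foo_dq and foo_sq are hypothetical names:

```shell
#!/usr/bin/env bash
# Double quotes: $(date) runs NOW, at definition time, and its output is baked in.
alias foo_dq="echo A '*' is born at $(date)"
# Single quotes (with '\''): $(date) is stored literally and runs at invocation.
alias foo_sq='echo A '\''*'\'' is born at $(date)'

printf 'double-quoted: %s\n' "${BASH_ALIASES[foo_dq]}"
printf 'single-quoted: %s\n' "${BASH_ALIASES[foo_sq]}"
```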
Finally, a caveat:
The tricky thing in a Bourne-like shell environment is that embedding ' inside a single-quoted string sometimes - falsely - APPEARS to work (instead of generating a syntax error, as in the question), when it instead does something different:
alias foo='echo a '*' is born at $(date)' # DOES NOT WORK AS EXPECTED.
This definition is accepted (no syntax error), but won't work as expected: the right-hand side of the definition is effectively parsed as 3 strings, 'echo a ', *, and ' is born at $(date)', which, due to how the shell parses strings (merging adjacent strings, quote removal), results in the following single, literal string: echo a * is born at $(date). Since the * is unquoted in the resulting alias definition, it will expand to a list of all file/directory names in the current directory (pathname expansion) when the alias is invoked.
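A bash sketch that makes the unquoted * visible; the scratch directory and the two hypothetical files file1 and file2 exist only so the pathname expansion has something to match:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" && touch file1 file2

# Looks quoted, but this is really: 'echo a ' + bare * + ' is born at $(date)'
alias foo='echo a '*' is born at $(date)'
printf '%s\n' "${BASH_ALIASES[foo]}"   # the * is now unquoted

# Non-interactive bash does not expand aliases unless this option is set.
shopt -s expand_aliases
foo   # the * expands to file1 file2
```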

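You could use different quotes for surrounding the whole text and for inner strings.
Try changing it to
alias logspace="find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; | awk '{print $5, $9 }'"
In other words, your outer quotes should be different from the inner ones, so they don't mix. Note, however, that inside double quotes the shell expands $5 and $9 when the alias is defined (normally to empty strings), so they need to be escaped as \$5 and \$9.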

Community wiki update:
The redeeming feature of this answer is recognizing that the OP's problem lies in unescaped use of the string delimiters (') inside a string.
However, while this answer states general string-handling truths, they do NOT apply to (Bourne-like, POSIX-compatible) shell programming specifically, so it does not address the OP's problem directly - see the comments.
Note: Code snippets are meant to be pseudo code, not shell language.
Basic strings: You canNOT use the same quote within the string as the entire string is delimited with:
foo='hello, 'world!'';
    ^--start string
            ^-- end string
             ^^^^^^^^--- unknown "garbage" causing syntax error.
You have to escape the internal quotes:
foo='hello, \'world!\'';
            ^--escape
This is true of pretty much EVERY programming language on the planet. If they don't provide escaping mechanisms, such as \, then you have to use alternate means, e.g.
quotechar=chr(39); // single quote is ascii #39
foo='hello, ' & quotechar & 'world!' & quotechar;
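For comparison, the same "alternate means" idea can be sketched in actual POSIX shell by keeping a ' in a variable (q is a hypothetical name) and splicing it in between single-quoted pieces:

```shell
#!/bin/sh
# A lone ' can live inside double quotes, so store it in a variable...
q="'"
# ...and splice it between single-quoted pieces; adjacent strings are concatenated.
foo='hello, '"$q"'world!'"$q"
printf '%s\n' "$foo"
```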

Escape the $ signs, not awk's single quotes, and use double quotes for the alias.
Try this:
alias logspace="find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; | awk '{print \$5, \$9 }'"
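To see why the escaping matters, this bash sketch defines both variants (logspace_bad is a hypothetical name) and prints what was stored; set -- clears the positional parameters so $5 and $9 are known to be empty:

```shell
#!/usr/bin/env bash
set --   # no positional parameters, so an unescaped $5 or $9 expands to nothing

# Unescaped: $5 and $9 are expanded when the alias is DEFINED.
alias logspace_bad="find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; | awk '{print $5, $9 }'"
# Escaped: \$5 and \$9 survive as literal text for awk to see later.
alias logspace="find /apps/ /opt/ -type f -size +100M -exec ls -lh {} \; | awk '{print \$5, \$9 }'"

printf '%s\n' "${BASH_ALIASES[logspace_bad]}" "${BASH_ALIASES[logspace]}"
```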

Related

Linux - How can I replace some string with the same string enclosed in parentheses?

I have some files in a directory; when I grep for some string, I get results like the ones below.
scripts/FileReplace> grep -r "case" *
dir1/file2:case 'a'
dir2/file3:case "ssss"
file1:case 1
After running a replace command, I expect the strings in the files to be updated as below:
CASE ('a')
CASE ("ssss")
CASE (1)
i.e., "case" is replaced with "CASE" and the text after the space is enclosed in parentheses, as above.
Any suggestions on how I can do this with a shell command?
You can use sed and its substitution:
find . -type f -exec sed -i 's/case \(.\+\)/CASE (\1)/' {} +
.\+ matches one or more characters of any kind (\+ is a GNU extension to basic regular expressions).
\(...\) creates a capture group; you can reference the first capture group as \1.
Running with -i~ instead of -i will create backups of the files (with a ~ suffix); this is recommended especially if you're just experimenting.
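A self-contained way to try the command safely, assuming GNU sed (both -i and \+ are GNU features): it recreates the question's files in a scratch directory and runs the same substitution there:

```shell
#!/usr/bin/env bash
# Sketch assuming GNU sed. Work in a scratch directory so nothing real is touched.
set -eu
cd "$(mktemp -d)"
mkdir dir1 dir2
printf '%s\n' "case 'a'" > dir1/file2
printf '%s\n' 'case "ssss"' > dir2/file3
printf '%s\n' 'case 1' > file1

# -i~ edits in place and keeps a backup (file1~ etc.); any *~ files are skipped.
find . -type f ! -name '*~' -exec sed -i~ 's/case \(.\+\)/CASE (\1)/' {} +

cat dir1/file2 dir2/file3 file1
```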

How do I use quotes in sed within quotes?

How do I get this
ls -1 | sed 's/\(.*\)/alias \1 "shot \1"/'
into an alias?
Example:
alias asdf "ls -1 | sed 's/\(.*\)/alias \1 "shot \1"/'"
The problem is when I get to the quotes for the alias.
Don't use aliases, use a function:
my_func() {
ls -1 | sed 's/.*/alias & "shot &"/'
}
You should, however, avoid parsing the output of ls.
In your case, assuming there are no newlines in the file names, you can use printf instead^0:
my_func() {
printf '%s\n' * | sed 's/\(.*\)/alias \1 "shot \1"/'
}
^0 This leaves you with the same problems that parsing ls would, but without invoking an extra process, as printf is a builtin.
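A quick check of the printf-based function in a scratch directory holding two hypothetical files, a and b:

```shell
#!/usr/bin/env bash
# The printf-based function from above (printf is a builtin, so no extra process).
my_func() {
  printf '%s\n' * | sed 's/\(.*\)/alias \1 "shot \1"/'
}

cd "$(mktemp -d)" && touch a b
my_func
```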
You missed the equals sign, and you must escape the inner double quotes. Try this:
alias asdf="ls -1 | sed 's/\(.*\)/alias \1 \"shot \1\"/'"
You should avoid parsing the output of ls. Another pitfall is the use of a pipeline in your command: any aliases it defined would exist only in a subshell, not in the current shell.
Assuming your filenames only contain alphanumerics, dots, or underscores, you can use this for loop instead:
for f in *; do
alias $f="shot $f"
done
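A small bash sketch confirming that the loop defines the aliases in the current shell; the files a and b are hypothetical, and their simple names satisfy the loop's assumption:

```shell
#!/usr/bin/env bash
# No pipeline, so the aliases are created in this shell, not in a subshell.
cd "$(mktemp -d)" && touch a b
for f in *; do
  alias $f="shot $f"
done
alias a b   # prints the two definitions
```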
Like this:
alias asdf="ls -1 | sed 's/.*/alias & \"shot &\"/'"
You forgot the assignment operator (=); to get double quotes within double quotes, you have to escape them with \".
Also, I've changed your capture group and backreference to using &, which stands for the complete match.
Notice that programmatically processing the output of ls is not recommended. A robust solution would, for example, use find or fileglobs (like anubhava's answer), but the main point of the question is about escaping double quotes.
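To verify that the escaped inner quotes end up as plain " characters in the alias, the stored definition can be printed (bash sketch, using the BASH_ALIASES array):

```shell
#!/usr/bin/env bash
# \" inside the double-quoted definition leaves a plain " in the stored alias.
alias asdf="ls -1 | sed 's/.*/alias & \"shot &\"/'"
printf '%s\n' "${BASH_ALIASES[asdf]}"
```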

Understanding sed expression 's/^\.\///g'

I'm studying Bash programming and found this example, but I don't understand what it means:
filtered_files=`echo "$files" | sed -e 's/^\.\///g'`
In particular, the argument passed to sed after -e.
It's a bad example; you shouldn't follow it.
First, let's understand the sed expression at hand.
s/pattern/replacement/flags is a sed command, described in detail in man sed. In this case, pattern is a regular expression; replacement is what that pattern gets replaced with when/where found; and flags describe details about how that replacement should be done.
In this case, the s/^\.\///g breaks down as follows:
s is the sed command being run.
/ is the sigil used to separate the sections of this command. (Any character can be used as a sigil, and the person who chose to use / for this expression was, to be charitable, not thinking about what they were doing very hard).
^\.\/ is the pattern to be replaced. The ^ anchors the match to the beginning of the line; \. matches only a literal period, vs . (which in a regex matches any character); and \/ matches only a / (vs an unescaped /, which would end this section of the sed command, being the selected sigil).
The next section is an empty string, which is why there's no content between the two following sigils.
g in the flags section indicates that more than one replacement can happen per line. In conjunction with ^, this has no effect, since there can only be one beginning-of-line per line; further evidence that the person who wrote your example wasn't thinking much.
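A quick demonstration that the g flag changes nothing here, using hypothetical sample input (two find-style paths, one containing a later ./):

```shell
#!/bin/sh
files='./a.txt
./dir/./b.txt'

# With ^ the pattern can only match at the start of each line, so g is redundant.
with_g=$(printf '%s\n' "$files" | sed -e 's/^\.\///g')
without_g=$(printf '%s\n' "$files" | sed -e 's/^\.\///')
printf '%s\n' "$with_g"
[ "$with_g" = "$without_g" ] && echo "identical"
```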
Using the same data structures, doing it better:
All of the below are buggy when handling arbitrary filenames, because storing arbitrary filenames in scalar variables is buggy in general.
Still using sed:
# Use printf instead of echo to avoid bugginess if your "files" string is "-n" or "-e"
# Use "#" as your sigil so the "/" in the pattern needs no backslash escape
filtered_files=$(printf '%s\n' "$files" | sed -e 's#^[.]/##')
Replacing sed with a bash builtin:
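# This is much faster than shelling out to any external tool
# Caveat: unlike the sed version, this removes every "./" in the string, not just a leading one per line
filtered_files=${files//.\//}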
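The caveat matters for names that contain ./ in the middle. A bash sketch with a single hypothetical path comparing the sed version, the global replacement, and the single-prefix removal ${files#./} (which only helps when the variable holds one name):

```shell
#!/usr/bin/env bash
files='./foo/a period./bar'

# sed with ^ strips only a leading "./" on each line.
printf '%s\n' "$files" | sed -e 's#^[.]/##'
# ${files//.\//} removes EVERY "./", including the one inside the name.
printf '%s\n' "${files//.\//}"
# ${files#./} removes a single leading "./" from the whole string.
printf '%s\n' "${files#./}"
```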
Using better data structures
Instead of running
files=$(find .)
...instead:
files=( )
while IFS= read -r -d '' filename; do
files+=( "$filename" )
done < <(find . -print0)
That stores files in an array; it looks complex, but it's far safer -- works correctly even with filenames containing spaces, quote characters, newline literals, etc.
Also, this means you can do the following:
# Remove the leading ./ from each name; don't remove ./ at any other position in a name
filtered_files=( "${files[@]#./}" )
This means that a file named
./foo/this directory name (which has spaces) ends with a period./bar
will correctly be transformed to
foo/this directory name (which has spaces) ends with a period./bar
rather than
foo/this directory name (which has spaces) ends with a periodbar
...which would have happened with the original approach.
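The array approach can be tried end to end in a scratch directory; the directory name below is hypothetical, chosen to contain spaces and to end with a period:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
mkdir -p 'foo/this dir ends with a period.'
touch 'foo/this dir ends with a period./bar'

# Collect NUL-delimited names from find into an array.
files=( )
while IFS= read -r -d '' filename; do
  files+=( "$filename" )
done < <(find . -type f -print0)

# Strip only the leading ./ from each element.
filtered_files=( "${files[@]#./}" )
printf '[%s]\n' "${filtered_files[@]}"
```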
See man sed, in particular:
-e script, --expression=script
add the script to the commands to be executed
And:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
In this case, it replaces any occurrence of ./ at the beginning of a line with the empty string, in other words removing it.

find command works on prompt, not in bash script - pass multiple arguments by variable

I've searched around questions with similar issues but haven't found one that quite fits my situation.
Below is a very brief script that demonstrates the problem I'm facing:
#!/bin/bash
includeString="-wholename './public_html/*' -o -wholename './config/*'"
find . \( $includeString \) -type f -mtime -7 -print
Basically, we need to search inside a folder, but only in certain of its subfolders. In my longer script, includeString gets built from an array. For this demo, I kept things simple.
Basically, when I run the script, it doesn't find anything. No errors, but also no hits. If I manually run the find command, it works. If I remove ( $includeString ) it also works, though obviously it doesn't limit itself to the folders I want.
So why would the same command work from the command line but not from the bash script? What is it about passing in $includeString that way that causes it to fail?
You're running into an issue with how the shell handles variable expansion. In your script:
includeString="-wholename './public_html/*' -o -wholename './config/*'"
find . \( $includeString \) -type f -mtime -7 -print
This results in find looking for files where -wholename matches the literal string './public_html/*'. That is, a filename that contains single quotes. Since you don't have any whitespace in your paths, the easiest solution here would be to just drop the single quotes:
includeString="-wholename ./public_html/* -o -wholename ./config/*"
find . \( $includeString \) -type f -mtime -7 -print
Unfortunately, you'll probably get bitten by wildcard expansion here (the shell will attempt to expand the wildcards before find sees them).
But as Etan pointed out in his comment, this appears to be needlessly complex; you can simply do:
find ./public_html ./config -type f -mtime -7 -print
If you want to store a list of arguments and expand it later, the correct form to do that with is an array, not a string:
includeArgs=( -wholename './public_html/*' -o -wholename './config/*' )
find . '(' "${includeArgs[@]}" ')' -type f -mtime -7 -print
This is covered in detail in BashFAQ #50.
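A bash sketch contrasting the two forms in a scratch tree (public_html, config, and other are hypothetical folders; -wholename is supported by GNU find):

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
mkdir public_html config other
touch public_html/index.html config/app.conf other/skip.txt

# String with embedded quotes: the ' characters become part of the patterns,
# so nothing matches and find prints nothing.
includeString="-wholename './public_html/*' -o -wholename './config/*'"
find . \( $includeString \) -type f -mtime -7 -print

# Array: each element reaches find as exactly one argument.
includeArgs=( -wholename './public_html/*' -o -wholename './config/*' )
find . '(' "${includeArgs[@]}" ')' -type f -mtime -7 -print | sort
```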
Note: As Etan points out in a comment, the better solution in this case may be to reformulate the find command, but passing multiple arguments via variable(s) is a technique worth exploring in general.
tl;dr:
The problem is not specific to find, but to how the shell parses command lines.
Quote characters embedded in variable values are treated as literals: They are neither recognized as argument-boundary delimiters nor are they removed after parsing, so you cannot use a string variable with embedded quoting to pass multiple arguments simply by directly using it as part of a command.
To robustly pass multiple arguments stored in a variable,
use array variables in shells that support them (bash, ksh, zsh) - see below.
otherwise, for POSIX compliance, use xargs - see below.
Robust solutions:
Note: The solutions assume presence of the following script, let's call it echoArgs, which prints the arguments passed to it in diagnostic form:
#!/usr/bin/env bash
for arg; do # loop over all arguments
echo "[$arg]" # print each argument enclosed in [] so as to see its boundaries
done
Further, assume that the equivalent of the following command is to be executed:
echoArgs one 'two three' '*' last # note the *literal* '*' - no globbing
with all arguments but the last passed by variable.
Thus, the expected outcome is:
[one]
[two three]
[*]
[last]
Using an array variable (bash, ksh, zsh):
# Assign the arguments to *individual elements* of *array* args.
# The resulting array looks like this: [0]="one" [1]="two three" [2]="*"
args=( one 'two three' '*' )
# Safely pass these arguments - note the need to *double-quote* the array reference:
echoArgs "${args[@]}" last
Using xargs - a POSIX-compliant alternative:
POSIX utility xargs, unlike the shell itself, is capable of recognizing quoted strings embedded in a string:
# Store the arguments as *single string* with *embedded quoting*.
args="one 'two three' '*'"
# Let *xargs* parse the embedded quoted strings correctly.
# Note the need to double-quote $args.
echo "$args" | xargs -J {} echoArgs {} last
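Note that {} is a freely chosen placeholder that allows you to control where in the resulting command line the arguments provided by xargs go. Be aware, however, that -J is a BSD/macOS extension: it is not specified by POSIX, and GNU xargs does not support it.
If all xargs-provided arguments go last, there is no need to use -J at all.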
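Where -J is unavailable, one portable sketch (a swapped-in technique, not part of the original answer) hands the xargs-parsed words to an inline sh -c script, whose "$@" places them before the fixed last argument; printf stands in for echoArgs so the example is self-contained:

```shell
#!/bin/sh
args="one 'two three' '*'"
# xargs parses the embedded quotes; the trailing "sh" fills $0 of the inline
# script, so the parsed words arrive as "$@" (quoted, hence no globbing).
printf '%s\n' "$args" | xargs sh -c 'printf "[%s]\n" "$@" last' sh
```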
For the sake of completeness: eval can also be used to parse quoted strings embedded in another string, but eval is a security risk: arbitrary commands could end up getting executed; given the safe solutions discussed above, there is no need to use eval.
Finally, Charles Duffy mentions another safe alternative in a comment, which, however, requires more coding: encapsulate the command to invoke in a shell function, pass the variable arguments as separate arguments to the function, then manipulate the all-arguments array $@ inside the function to supplement the fixed arguments (using set), and invoke the command with "$@".
Explanation of the shell's string-handling issues involved:
When you assign a string to a variable, embedded quote characters become part of the string:
var='one "two three" *'
$var now literally contains one "two three" *, i.e., the following 4 - instead of the intended 3 - words, separated by a space each:
one
"two-- " is part of the word itself!
three"-- " is part of the word itself!
*
When you use $var unquoted as part of an argument list, the above breakdown into 4 words is exactly what the shell does initially - a process called word splitting. Note that if you were to double-quote the variable reference ("$var"), the entire string would always become a single argument.
Because $var is expanded to its value, one of the so-called parameter expansions, the shell does NOT attempt to recognize embedded quotes inside that value as marking argument boundaries - this only works with quote characters specified literally, as a direct part of the command line (assuming these quote characters aren't themselves quoted).
Similarly, only such directly specified quote characters are removed by the shell before passing the enclosed string to the command being invoked - a process called quote removal.
However, the shell additionally applies pathname expansion (globbing) to the resulting 4 words, so any of the words that happen to match filenames will expand to the matching filenames.
In short: the quote characters in $var's value are neither recognized as argument-boundary delimiters nor are they removed after parsing. Additionally, the words in $var's value are subject to pathname expansion.
This means that the only way to pass multiple arguments is to leave them unquoted inside the variable value (and also leave the reference to that variable unquoted), which:
won't work with values with embedded spaces or shell metacharacters
invariably subjects the values to pathname expansion
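The breakdown can be observed directly with a bash sketch. Here echoArgs is redefined as a function equivalent to the script above so the example is self-contained, and a scratch directory with one hypothetical file, somefile, gives the bare * something to match:

```shell
#!/usr/bin/env bash
# Print each argument in [] to show where the shell put the boundaries.
echoArgs() { for arg; do printf '[%s]\n' "$arg"; done; }

cd "$(mktemp -d)" && touch somefile
var='one "two three" *'

echoArgs $var     # unquoted: word splitting + globbing; the " chars stay literal
echoArgs "$var"   # quoted: the whole value is a single argument
```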

Escape file name for use in sed substitution

How can I fix this:
abc="a/b/c"; echo porc | sed -r "s/^/$abc/"
sed: -e expression #1, char 7: unknown option to `s'
The substitution of variable $abc is done correctly, but the problem is that $abc contains slashes, which confuse sed. Can I somehow escape these slashes?
Note that sed(1) allows you to use different characters for your s/// delimiters:
$ abc="a/b/c"
$ echo porc | sed -r "s|^|$abc|"
a/b/cporc
$
Of course, if you go this route, you need to make sure that the delimiters you choose aren't used elsewhere in your input.
The GNU manual for sed states that "The / characters may be uniformly replaced by any other single character within any given s command."
Therefore, just use another character instead of /, for example, ::
abc="a/b/c"; echo porc | sed -r "s:^:$abc:"
Do not use a character that can be found in your input. We can use : above, since we know that the input (a/b/c) doesn't contain :.
Be careful with character escaping.
If using "", Bash will interpret some characters specially, e.g. ` (used for command substitution), ! (used for history expansion), $ (used for accessing variables).
If using '', Bash will take all characters literally, even $.
The two approaches can be combined, depending on whether you need escaping or not, e.g.:
abc="a/b/c"; echo porc | sed 's!^!'"$abc"'!'
You don't have to use / as pattern and replace separator, as others already told you. I'd go with : as it is rather rarely used in paths (it's a separator in PATH environment variable). Stick to one and use shell built-in string replace features to make it bullet-proof, e.g. ${abc//:/\\:} (which means replace all : occurrences with \: in ${abc}) in case of : being the separator.
$ abc="a/b/c"; echo porc | sed -r "s:^:${abc//:/\\:}:"
a/b/cporc
Escape the slashes with backslashes:
abc='a\/b\/c'
As for the escaping part of the question, I had the same issue and resolved it with a double sed that can possibly be optimized:
escaped_abc=$(echo "$abc" | sed "s/\//\\\AAA\//g" | sed "s/AAA//g")
The triple A is a temporary placeholder that works around the shell's own backslash processing inside double quotes: there, \\ collapses to \ before sed sees the script, so a plain "s/\//\\\//g" reaches sed as s/\//\\//g, which is a syntax error. With single quotes, a single sed 's/\//\\\//g' does the job.
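A more general sketch escapes every character that sed treats specially in the replacement part of an s/// command with / as delimiter (backslash, / and &), in a single pass. The helper name escape_sed_replacement is hypothetical, and it assumes the value contains no newlines:

```shell
#!/bin/sh
# Prefix each \, / and & with a backslash; | is used as this sed's own
# delimiter so the bracket expression needs no escaped /.
escape_sed_replacement() {
  printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

abc='a/b&c\d'
escaped_abc=$(escape_sed_replacement "$abc")
echo porc | sed "s/^/$escaped_abc/"
```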
