Function in pipe chain - linux

I have this function whose input parameters are a search string and an input file. The function works with files:
f_highlite() {
sed -e 's/\($1\)/\o033[91m\1\o033[39m/g' $2
}
Now I would like to use this function in a pipe. How should it be modified?
ps aux | grep java | f_highlite "Xms" -
PS: I'm not sure how exactly to name this question. If you have a better suggestion, say it. ;]

First, you need to use double quotes, otherwise $1 wouldn't get expanded:
f_highlite() {
sed -e "s/\($1\)/\o033[91m\1\o033[39m/g" "$2"
}
Btw, you need to make sure that $1 won't contain characters that are understood by sed as syntax elements. For Xms that's fine.
To the topic, you can pass - as the second argument to the function because sed understands - as stdin:
ps aux | grep java | f_highlite "Xms" -
(thanks @chepner!)
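Regarding the caveat above about sed syntax characters: if the search string may contain regex metacharacters, one option is to escape them before building the s/// expression. A sketch (the escaping expression is an assumption, not part of the original answer):
f_highlite() {
    local pat
    # escape BRE metacharacters so the search string is matched literally (assumed set)
    pat=$(printf '%s' "$1" | sed 's/[][\.*^$/]/\\&/g')
    sed -e "s/\($pat\)/\o033[91m\1\o033[39m/g" "$2"
}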

There are two other approaches that you might want to know about, as not all commands would support the - trick.
The first one is having a function that works on streams and does not take a file as input. You can do that by removing the "$2" at the end (keeping the double-quote fix from above) and changing how you call the function:
f_highlite() {
sed -e "s/\($1\)/\o033[91m\1\o033[39m/g"
}
f_highlite "Xms" < yourfile.txt
This will redirect the content of your file and connect it to the standard input of the function (and hence to that of sed).
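With this stream variant, the original pipeline from the question works directly:
ps aux | grep java | f_highlite "Xms"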
The other approach is to keep your function as is (I am reusing a correction to the quoting suggested in another answer), but feed it a file by using process substitution.
f_highlite() {
sed -e "s/\($1\)/\o033[91m\1\o033[39m/g" "$2"
}
f_highlite < <(<"Xms")
This (conceptually at least) creates a FIFO that has its input fed with the content of your file, and its output connected to the input of the function. The key here is that <(<"Xms") becomes a filename (you can try printing its name to validate that).
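For example, printing it shows an ordinary-looking path (the exact name varies by system):
$ echo <(< yourfile.txt)
/dev/fd/63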

Related

How can I use xargs to run a function in a command substitution for each match?

While writing Bash functions for string replacements I have encountered a strange behaviour when using xargs. This is actually driving me mad currently as I cannot get it to work.
Fortunately I have been able to nail it down to the following simple example:
Define a simple function which doubles every character of the given parameter:
function subs { echo $1 | sed -E "s/(.)/\1\1/g"; }
Call the function:
echo $(subs "ABC")
As expected the output is:
AABBCC
Now call the function using xargs:
echo "ABC" | xargs -I % echo $(subs "%")
Surprisingly the result now is:
ABCABC
It seems as if the sed command inside the function now treats the whole string as a single character.
Why does this happen and how can it be prevented?
You might ask, why I use xargs at all. Of course, this is a simplified example and the actual use case is much more complex.
In the original use case, I have a program which produces lots of output. I pipe the output through several greps to get the lines of interest. Afterwards, I pipe the lines to sed to extract the data I need from them. Because some transformations I need to do on the data are too complex for regular expressions alone, I'd like to use a function for these. So, my original idea was to simply pipe into the function, but I couldn't get that to work and ended up with the xargs solution. My original idea was something like this:
command | grep ... | grep ... | grep ... | sed ... | subs
BTW: I do not do this from the command line but from within a script. The function is defined in the very same script in which it is used.
I'm using Bash 3.2 (Mac OS X default), so fancy Bash 4.x stuff won't help me, sorry.
I'll be happy about everything which might shed some light on this topic.
Best regards
Frank
If you really need to do this (and you probably don't, but we can't help without a more representative sample), a better-practice approach might look like:
subs() { sed -E "s/(.)/\1\1/g" <<<"$1"; }
export -f subs
echo "ABC" | xargs bash -c 'for arg; do subs "$arg"; done' _
The use of echo "$(subs "$arg")" instead of just subs "$arg" adds nothing but bugs (consider what happens if one of your arguments is -n -- and that's assuming a relatively tame echo; they're allowed to consume backslashes even without a -e argument and to do all manner of other surprising things). You could do it above, but it slows your program down and makes it more prone to surprising behaviors; there's no point.
Running export -f subs exports your function to the environment, so it can be run by other instances of bash invoked as child processes (all programs invoked by xargs run outside your shell, so they can't see shell-local variables or functions).
Without -I -- which is to say, in its default mode of operation -- xargs appends arguments to the end of the command it's given. This permits a much more efficient usage mode where, instead of invoking one command per line of input, it passes as many arguments as possible to the smallest possible number of subprocesses.
This also avoids major security bugs that can happen when using xargs -I in conjunction with bash -c '...' or sh -c '...'. (If you ever use -I% sh -c '...%...', then your filenames become part of your code and can be used in injection attacks on your system.)
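To see the batching, feed several words at once; they all land in a single bash invocation (illustrative, relying on the subs definition and export -f above):
$ printf '%s\n' one two three | xargs bash -c 'for arg; do subs "$arg"; done' _
oonnee
ttwwoo
tthhrreeee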
That's because the construct $(subs "%") gets expanded by the shell when it parses the pipeline, so xargs actually runs echo %%; the -I % replacement then turns each % into ABC, which is why you see ABCABC.
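You can observe the two steps separately (a quick check):
$ subs "%"
%%
$ echo "ABC" | xargs -I % echo %%
ABCABC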

sed not working on a variable within a bash script; requesting a file. Simple example

If I declare a variable within a bash script and then try to operate on it with sed, I keep getting errors. I've tried double quotes, backticks, and avoiding single quotes on my variable. Here is what I'm essentially doing.
Call my script with multiple parameters
./myScript.sh apples oranges ilike,apples,oranges,bananas
My objective is to use sed to replace each "," in $3 with " ", then use wc -w to count how many words are in $3.
MyScript.sh
fruits="$3"
checkFruits= sed -i 's/,/ /g' <<< "$fruits"
echo $checkFruits
And the result after running the script in the terminal:
ilike,apples,oranges,bananas
sed: no input files
P.S. After countless Google searches, reading suggestions and playing with my code, I simply cannot get this easy sample of code to work, and I'm not really sure why. And I can't try to implement the wc -w until I move past this block.
You can do
fruits="$3"
checkFruits="${3//,/ }"
# or
echo "${3//,/ }"
The -i flag to sed requires a file argument; without it, the sed command does what you expect.
However, I'd consider using tr instead of sed for this simple replacement:
fruits="$3"
checkFruits="$(tr , ' ' <<< $fruits)"
echo $checkFruits
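Either way, the wc -w step from the original objective then follows directly (sketch):
fruits="$3"
checkFruits="$(tr , ' ' <<< "$fruits")"
wc -w <<< "$checkFruits"    # prints 4 for ilike,apples,oranges,bananas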
Looking at the larger picture, do you want to count comma-separated strings, or the number of words once you have changed commas into spaces? For instance, do you want the string "i like,apples,oranges,and bananas" to return a count of 4, or 6? (This question is moot if you are 100% sure you will never have spaces in your input data.)
If 6, then the other answers (including mine) will already work.
However, if you want the answer to be 4, then you might want to do something else, like:
fruits="$3"
checkFruits="$(tr , \\n <<< $fruits)"
itemCount="$(wc -l <<< $checkFruits)"
Of course this can be condensed a little, but just throwing out the question as to what you're really doing. When asking a question here, it's good to post your expected results along with the input data and the code you've already used to try to solve the problem.
The -i option is for in-place editing of the input file; you don't need it here.
To assign a command's output to a variable, use command substitution, like var=$(command).
fruits="$3"
checkFruits=$(sed 's/,/ /g' <<< "$fruits")
echo "$checkFruits"
You don't need sed at all.
IFS=, read -a things <<< "$3"
echo "${#things[@]}"

Making a bash script to accept input from file OR piping output

I have the following bash script, which takes tabular data as input,
gets the first line, and spits the fields out vertically:
#!/bin/bash
# my_script.sh
export LC_ALL=C
file=$1
head -n1 $file |
tr "\t" "\n" |
awk '{print $1 " " NR-1}'
The problem is that I can only execute it this way:
$ myscript.sh some_tab_file.txt
What I want to do is on top of the above capability also allows you to do this:
$ cat some_tab_file.txt | myscript.sh
Namely, take the input from a pipe. How can I achieve that?
I'd normally write:
export LC_ALL=C
head -n1 "$#" |
tr "\t" "\n" |
awk '{print $1 " " NR-1}'
This works with any number of arguments, or none if there are none. Using "$@" is important in this and many other contexts. See the Bash manual on special parameters and shell parameter expansion for more information on the many and varied notations available for controlling how shell parameters are handled. Generally, double quotes are a good idea, especially if the file names may contain spaces.
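Both call styles from the question then work (illustrative):
$ ./my_script.sh some_tab_file.txt          # "$@" hands the filename to head
$ cat some_tab_file.txt | ./my_script.sh    # no arguments, so head reads stdin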
A common idiom is to fall back to the pseudo-file - (standard input) if there are no parameters. There is a convenient shorthand for that:
file=${1--}
The substitution ${variable-fallback} evaluates to the variable's value, or fallback if it's unset.
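Putting the idiom into the original script gives a version that accepts a file argument or piped input (a minimal sketch; GNU head treats - as standard input):
#!/bin/bash
# my_script.sh
export LC_ALL=C
file=${1--}     # first argument if given, otherwise "-" (stdin)
head -n1 "$file" |
tr "\t" "\n" |
awk '{print $1 " " NR-1}'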
I believe your script should work as-is, though; head will read standard input if the (unquoted!) file name you pass in evaluates to the empty string.
Take care to properly double-quote all interpolations of "$file", by the way; otherwise, your script won't work on filenames containing spaces or shell metacharacters. (Then you break the fortunate side effect of not passing a filename to head if your script did not receive one, though.)

Is there a way to put the following logic into a grep command?

For example, suppose I have the following piece of data:
ABC,3,4
,,ExtraInfo
,,MoreInfo
XYZ,6,7
,,XyzInfo
,,MoreXyz
ABC,1,2
,,ABCInfo
,,MoreABC
It's trivial to get grep to extract the ABC lines. However, I also want to grab the following lines to produce this output:
ABC,3,4
,,ExtraInfo
,,MoreInfo
ABC,1,2
,,ABCInfo
,,MoreABC
Can this be done using grep and standard shell scripting?
Edit: Just to clarify there could be a variable number of lines in between. The logic would be to keep printing while the first column of the CSV is empty.
grep -A 2 {Your regex} will output the two lines following each match.
Update:
Since you specified that it could be any number of lines, this is not possible with grep alone, as grep matches on a single line; see the following questions:
How can I search for a multiline pattern in a file?
Regex (grep) for multi-line search needed
Why can't i match the pattern in this case?
Selecting text spanning multiple lines using grep and regular expressions
You can use this, although it's a bit hackity due to the grep at the end of the pipeline, which mutes out anything that does not start with 'A' or ',':
$ sed -n '/^ABC/,/^[^,]/p' yourfile.txt | grep -v '^[^A,]'
Edit: A less hackity way is to use awk:
$ awk '/^ABC/ { want = 1 } !/^ABC/ && !/^,/ { want = 0 } { if (want) print }' f.txt
You can understand what it does if you read out loud the pattern and the thing in the braces.
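Run against the sample data from the question (saved as f.txt, as above), it keeps exactly the ABC blocks:
$ awk '/^ABC/ { want = 1 } !/^ABC/ && !/^,/ { want = 0 } { if (want) print }' f.txt
ABC,3,4
,,ExtraInfo
,,MoreInfo
ABC,1,2
,,ABCInfo
,,MoreABC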
The grep manpage has explanations for the options; the one you want here is -A, under Context Line Control.

How to pass the value of a variable to the standard input of a command?

I'm writing a shell script that should be somewhat secure, i.e., does not pass secure data through parameters of commands and preferably does not use temporary files. How can I pass a variable to the standard input of a command?
Or, if it's not possible, how can I correctly use temporary files for such a task?
Passing a value to standard input in Bash is as simple as:
your-command <<< "$your_variable"
Always make sure you put quotes around variable expressions!
Be cautious: this will probably work only in bash and will not work in sh.
Simple, but error-prone: using echo
Something as simple as this will do the trick:
echo "$blah" | my_cmd
Do note that this may not work correctly if $blah contains -n, -e, -E, etc., or if it contains backslashes (bash's copy of echo preserves literal backslashes in the absence of -e by default, but will treat them as escape sequences and replace them with the corresponding characters even without -e if the optional XSI extensions are enabled).
More sophisticated approach: using printf
printf '%s\n' "$blah" | my_cmd
This does not have the disadvantages listed above: all possible C strings (strings not containing NULs) are printed unchanged.
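A quick way to see the difference is to count the bytes each variant sends downstream (illustrative):
blah="-n"
echo "$blah" | wc -c           # 0 in bash: echo swallows -n as an option
printf '%s\n' "$blah" | wc -c  # 3: the two characters plus a trailing newline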
(cat <<END
$passwd
END
) | command
The cat is not really needed, but it helps to structure the code better and allows you to use more commands in parentheses as input to your command.
Note that the echo "$var" | command operations mean that standard input is limited to the line(s) echoed. If you also want the terminal to be connected, then you'll need to be fancier:
{ echo "$var"; cat - ; } | command
( echo "$var"; cat - ) | command
This means that the first line(s) will be the contents of $var but the rest will come from cat reading its standard input. If the command does not do anything too fancy (try to turn on command line editing, or run like vim does) then it will be fine. Otherwise, you need to get really fancy - I think expect or one of its derivatives is likely to be appropriate.
The command line notations are practically identical; the difference is that the second semicolon is necessary with the braces, whereas it is not with parentheses.
This robust and portable way has already appeared in comments. It should be a standalone answer.
printf '%s' "$var" | my_cmd
or
printf '%s\n' "$var" | my_cmd
Notes:
It's better than echo; the reasons are here: Why is printf better than echo?
printf "$var" is wrong. The first argument is format where various sequences like %s or \n are interpreted. To pass the variable right, it must not be interpreted as format.
Usually variables don't contain trailing newlines. The former command (with %s) passes the variable as it is. However, tools that work with text may ignore or complain about an incomplete line (see Why should text files end with a newline?). So you may want the latter command (with %s\n), which appends a newline character to the content of the variable. Non-obvious facts:
A here string in Bash (my_cmd <<< "$var") does append a newline.
Any method that appends a newline results in non-empty stdin of my_cmd, even if the variable is empty or undefined.
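Both facts are easy to check by counting bytes (a quick sketch):
var="abc"
printf '%s' "$var" | wc -c    # 3 -- nothing appended
printf '%s\n' "$var" | wc -c  # 4 -- newline appended
wc -c <<< "$var"              # 4 -- the here string appends one too
wc -c <<< ""                  # 1 -- even an empty value produces a newline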
I liked Martin's answer, but the quoting deserves care depending on what is in the variable. With
your-command <<< "$your_variable"
the double quotes around the variable are what protect embedded whitespace and characters like " or !, so keep them in place. (Writing """$your_variable""" is no different: the extra empty strings simply concatenate away.)
As per Martin's answer, there is a Bash feature called Here Strings (which itself is a variant of the more widely supported Here Documents feature):
3.6.7 Here Strings
A variant of here documents, the format is:
<<< word
The word is expanded and supplied to the command on its standard
input.
Note that Here Strings would appear to be Bash-only, so, for improved portability, you'd probably be better off with the original Here Documents feature, as per PoltoS's answer:
( cat <<EOF
$variable
EOF
) | cmd
Or, a simpler variant of the above:
(cmd <<EOF
$variable
EOF
)
You can omit ( and ), unless you want to have this redirected further into other commands.
Try this:
echo "$variable" | command
If you came here from a duplicate, you are probably a beginner who tried to do something like
"$variable" >file
or
"$variable" | wc -l
where you obviously meant something like
echo "$variable" >file
echo "$variable" | wc -l
(Real beginners also forget the quotes; usually use quotes unless you have a specific reason to omit them, at least until you understand quoting.)
