xargs pipe non empty stdin lines to command while preserving double quotes - linux

I'm trying to have a script listen to stdin (so that I can run it and it doesn't immediately exit), execute only when stdin is non-empty, and then pipe each stdin line to another command.
Right now I'm using the command from the answer here:
xargs -I {} sh -c 'echo {} | foo'
I want to preserve double quotes from stdin; for that, people suggest using -d '\n', but this causes foo to run on empty lines.
I looked into possible GNU Parallel solutions but couldn't find anything.
Here is my stdout:
>xargs -I {} sh -c 'echo {} | foo'
bar
I have executed for 'bar'
"bar"
I have executed for 'bar' //notice the double quotes missing
^C
>xargs -I {} sh -c "echo '{}' | foo"
bar
I have executed for 'bar'
"bar"
I have executed for 'bar' //Same thing, double quotes missing
^C
>xargs -d '\n' -I {} sh -c "echo {} | foo"
i have executed for '' //doesn't ignore empty lines anymore
i have executed for ''
bar
i have executed for 'bar'
"bar"
i have executed for 'bar'
Desired output:
bar
I have executed for 'bar'
"bar"
I have executed for '"bar"'
Running
echo '"bar"' | foo
gets me
I have executed for '"bar"'

If, as your tags suggest, you are running on Linux, you have GNU xargs, which supports the -0 option. Then you can pass in completely arbitrary text, even including newlines:
printf '%s\0' "foo" "'bar"' '"baz"' 'quux
with a newline' | xargs -0 foo
Removing empty lines can be accomplished with a simple grep in front. There is also xargs -r, which says not to run the command at all if xargs receives empty input (this too is a GNU extension).
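A minimal sketch of that combination (GNU xargs assumed; foo stands in for your command):
# Filter out empty lines, then run foo once per remaining line.
# -r skips the run entirely if nothing survives the filter,
# and -d '\n' keeps the quotes intact.
grep -v '^$' | xargs -r -d '\n' -I{} sh -c 'printf "%s\n" "$1" | foo' _ {}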
Your attempts are slightly problematic, though; you should pass the arguments as command-line arguments rather than have xargs interpolate them into the sh -c '... {} ...' string literally.
Slightly depending on your requirements, this could even work portably on other platforms:
xargs sh -c 'if [ $# -gt 0 ]; then echo "$@" | foo; fi' _
The _ is just a placeholder; the arguments after sh -c '...' are used to populate $0, $1, $2, etc., so we put in something, anything, to occupy the slot for $0.
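A quick demonstration of how those slots get filled:
sh -c 'echo "0=$0 1=$1 2=$2"' placeholder first second
# prints: 0=placeholder 1=first 2=second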

GNU Parallel uses this internally:
perl -e 'if(sysread(STDIN,$buf,1)){open($fh,"|-",@ARGV)||die;syswrite($fh,$buf);if($read=sysread(STDIN,$buf,131071)){syswrite($fh,$buf);}while($read=sysread(STDIN,$buf,131072)){syswrite($fh,$buf);}close$fh;exit($?&127?128+($?&127):1+$?>>8)}' /usr/bin/bash -c 'wc -l'
If you only want a single line try:
seq 3 | parallel --pipe -N1 wc -c
echo "'foo'" | parallel --pipe -N1 --rrs "echo -n i have executed for \"'\";cat;echo \"'\""
echo '"foo"' | parallel --pipe -N1 --rrs "echo -n i have executed for \"'\";cat;echo \"'\""

I want to preserve double quotes from stdin; for that, people suggest using -d '\n', but this causes foo to run on empty lines.
xargs performs quote processing by default unless you specify a delimiter via either -d/--delimiter or -0/--null. You must use one of these to avoid xargs removing the quotes you are trying to preserve.
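You can see the default quote processing in action with a one-liner like this:
printf '%s\n' '"bar"' | xargs echo          # prints: bar   (quotes consumed)
printf '%s\n' '"bar"' | xargs -d '\n' echo  # prints: "bar" (quotes preserved)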
What's more, supposing that you manage to pass the quoted input through xargs unchanged, the shell that xargs launches to run the command will perform its own quote removal, as well as parameter expansion, variable assignments, redirection processing, etc. You can observe the effects of that directly with this variation on your command:
$ xargs -d '\n' -I{} sh -c 'echo {} >>tmp.txt'
bar
'bar'
$ cat tmp.txt
bar
bar
$
Note that the quotes are removed despite specifying a delimiter to xargs.
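A sketch of the usual workaround: pass the line as a positional parameter, so the inner shell never re-parses it (printf is used instead of echo to sidestep lines that happen to look like echo options):
xargs -d '\n' -I{} sh -c 'printf "%s\n" "$1" >>tmp.txt' _ {}
# the quotes survive, because the value of "$1" undergoes no quote removal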
It's a bit hard to parse your exact requirements, but it sounds like you just want to filter empty lines out of the standard input to some command. sed can do that pretty easily:
foo() {
    while IFS= read -r line; do
        echo "I have executed for '$line'"
    done
}
$ sed '/\S/!d' | foo
bar
"bar"
A whole line with "quotes" and 'quotes' and metacharacters > < !
I have executed for 'bar'
I have executed for '"bar"'
I have executed for 'A whole line with "quotes" and 'quotes' and metacharacters > < !'
$
Explanation of the sed command: the regex /\S/ matches any non-whitespace character, anywhere on the line. The ! negates the match, and the d deletes lines matching the (negated) pattern -- that is, any line that does not contain at least one non-whitespace character.
As you can see in the example run transcript, there is a difference in buffering between your example command and the effect of filtering with sed. It's unclear whether that's important to you.
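One more note: \S is a GNU extension; a portable sketch of the same filter uses a POSIX character class with grep:
grep '[^[:space:]]' | foo   # keep only lines containing a non-blank character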

pipe then hyphen (stdin) as an alternative to for loop

I wrote a few sed and awk commands to extract a set of IDs that are associated with file names. I would like to run a set of commands using these filenames from id.txt:
cat id.txt
14235.gz
41231.gz
41234.gz
I usually write for loops as follows:
for i in $(cat id.txt); do
    command <options> $i
done
I thought I could also do cat id.txt | command <options> -
Is there a way to pipe the output of cat, awk, sed, etc, line by line into a command?
Use a while read loop; see Don't read lines with for:
while IFS= read -r line_in_text_file; do
    echo "$line_in_text_file"
done < id.txt
Commands don't usually get their filename arguments on standard input. Using - as an argument means to read the file contents from standard input instead of from a named file; it doesn't mean to get the filename from stdin.
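A quick illustration of that distinction:
echo "some text" | cat -        # prints "some text": '-' reads data from stdin
printf '%s\n' id.txt | cat -    # prints "id.txt" itself, not the file's contents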
You can use command substitution to use the contents of the file as all the filename arguments to the command:
command <options> $(cat id.txt)
or you can use xargs
xargs command <options> < id.txt
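For example, with the id.txt above (gunzip is just a hypothetical stand-in for your command and options):
xargs -n1 gunzip < id.txt   # runs gunzip once per filename listed in id.txt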
Is there a way to pipe the output of cat, awk, sed, etc, line by line into a command?
Compound commands can be placed in a pipe; the syntax is not very strict. The usual:
awk 'some awk script' |
while IFS= read -r line; do
    echo "$line"
done |
sed 'some sed script'
I avoid reading input line by line with a while read loop - it's very slow. It's much faster to use awk scripts and other commands.
Command groups can be used too:
awk 'some awk script' |
{ # or '(', but there is no need for a subshell
    echo "header1,header2"
    # remove the first line by consuming it with read
    IFS= read -r first_line
    # remove the last line
    sed '$d'
} |
sed 'some sed script'
Remember that piped commands run in subshells, so variable changes there will not affect the parent shell.
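A short demonstration of that pitfall:
count=0
seq 3 | while IFS= read -r _; do count=$((count + 1)); done
echo "$count"   # prints 0: the loop ran in a subshell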
Bash has a process substitution extension that lets you run the while loop inside the parent shell:
var=1
while IFS= read -r line; do
    if [[ "$line" == 2 ]]; then
        var=2
    fi
done < <(
    seq 10 |
    sed '$d'
)
echo "$var" # will output 2
xargs can do this
cat id.txt | xargs command
From xargs help
$ xargs --help
Usage: xargs [OPTION]... COMMAND [INITIAL-ARGS]...
Run COMMAND with arguments INITIAL-ARGS and more arguments read from input.
Mandatory and optional arguments to long options are also
mandatory or optional for the corresponding short option.
-0, --null items are separated by a null, not whitespace;
disables quote and backslash processing and
logical EOF processing
-a, --arg-file=FILE read arguments from FILE, not standard input
-d, --delimiter=CHARACTER items in input stream are separated by CHARACTER,
not by whitespace; disables quote and backslash
...

Preserve '\n' newline in returned text over ssh

If I execute a find command, with grep and sort etc. in the local command line, I get returned lines like so:
# find ~/logs/ -iname 'status' | xargs grep 'last seen' | sort --field-separator=: -k 4 -g
0:0:line:1
0:0:line:2
0:0:line:3
If I execute the same command over ssh, the returned text prints without newlines, like so:
# VARcmdChk="$(ssh ${VARuser}#${VARserver} "find ~/logs/ -iname 'status' | xargs grep 'last seen' | sort --field-separator=: -k 4 -g")"
# echo ${VARcmdChk}
0:0:line:1 0:0:line:2 0:0:line:3
I'm trying to understand why ssh is sanitising the returned text, so that newlines are converted to spaces. I have not yet tried outputting to a file and then using scp to pull that back. That seems a waste, since I just want to view the remote results locally.
When you echo the variable VARcmdChk, you should enclose it in double quotes:
$ VARcmdChk=$(ssh ${VARuser}@${VARserver} "find tmp/ -iname status -exec grep 'last seen' {} \; | sort --field-separator=: -k 4 -g")
$ echo "${VARcmdChk}"
last seen:11:22:33:44:55:66:77:88:99:00
last seen:00:99:88:77:66:55:44:33:22:11
Note that I've replaced your xargs with -exec.
OK, the question is a duplicate of this one, Why does shell Command Substitution gobble up a trailing newline char?, so it is partly answered.
However, I say partly, as the answers there explain why this happens, but the only clue to a solution is a small answer right at the end.
The solution is to quote the echo argument, as the solution suggests:
# VARcmdChk="$(ssh ${VARuser}#${VARserver} "find ~/logs/ -iname 'status' | xargs grep 'last seen' | sort --field-separator=: -k 4 -g")"
# echo "${VARcmdChk}"
0:0:line:1
0:0:line:2
0:0:line:3
but there is no explanation as to why this works, since the assumption is that the variable is a string and so should print as expected. However, reading Expansion of variable inside single quotes in a command in Bash provides the clue about preserving newlines etc. in a string. Placing the variable to be printed by echo into quotes preserves its contents exactly, and you get the expected output.
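The underlying mechanism in a nutshell: an unquoted expansion undergoes word splitting, which collapses the embedded newlines into single spaces:
v=$'a\nb'
echo $v     # prints: a b
echo "$v"   # prints: a
            #         b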
The echo of the variable is why it's all put onto one line. Running the following command will output the results as expected:
ssh ${VARuser}@${VARserver} "find ~/logs/ -iname 'status' | xargs grep 'last seen' | sort --field-separator=: -k 4 -g"
To get the command output to have each result on a new line, as it does when you run the command locally, you can use awk to split the results onto new lines:
awk '{print $1"\n"$2}'
This method can be appended to your command like this:
echo ${VARcmdChk} | awk '{print $1"\n"$2"\n"$3"\n"$4}'
Alternatively, you can put quotes around the variable as per your answer:
echo "${VARcmdChk}"

xargs bash -c unexpected token

I'm experiencing an issue calling xargs inside a bash script to parallelize the launch of a function.
I have this line:
grep -Ev '^#|^$' "$listOfTables" | xargs -d '\n' -l1 -I args -P"$parallels" bash -c "doSqoop 'args'"
that launches the function doSqoop that I previously exported.
I am passing to xargs and then to bash -c a single, very long line, containing fields that I split and handle inside the function.
It is something like schema|tab|dest|desttab|query|splits|... that I read from a file via the grep command above. I am fine with this solution; I know xargs can split the line on |, but I'm OK this way.
It worked well until I had to add another field at the end, which contains this kind of value:
field1='varchar(12)',field2='varchar(4)',field3='timestamp',....
Now I have this error:
bash: -c: line 0: syntax error near unexpected token '('
I tried to escape the parentheses and single quotes, without success.
It appears to me that bash -c is interpreting the arguments.
Use GNU Parallel, which can call exported functions, and also has an easier syntax and many more capabilities.
Your sample command could be replaced with:
grep -Ev '^#|^$' file | parallel doSqoop
Test with below script:
#!/bin/bash
doSqoop() {
    printf "%s\n" "$@"
}
export -f doSqoop
grep -Ev '^#|^$' file | parallel doSqoop
You can also set the number of processes with the -P option; otherwise it defaults to the number of cores in your system:
grep -Ev '^#|^$' file | parallel -P "$num" doSqoop
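For completeness, the original xargs form can also be repaired without switching tools, by passing each line as a positional parameter so that bash -c never parses the parentheses inside it; a sketch (assumes doSqoop was exported with export -f, as in the question):
grep -Ev '^#|^$' "$listOfTables" |
    xargs -d '\n' -I{} -P"$parallels" bash -c 'doSqoop "$1"' _ {}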

optimize xargs argument enumeration

Can this usage of xargs argument enumeration be optimized?
The aim is to inject a single argument into the middle of the actual command.
I do:
echo {1..3} | xargs -I{} sh -c 'for i in {};do echo line $i here;done'
or
echo {1..3} | for i in $(xargs -n1);do echo line $i here; done
I get:
line 1 here
line 2 here
line 3 here
which is what I need, but I wondered if the loop and the temporary variable could be avoided?
You need to separate the input to xargs by newlines:
echo {1..3}$'\n' | xargs -I% echo line % here
For array expansions, you can use printf:
ar=({1..3})
printf '%s\n' "${ar[#]}" | xargs -I% echo line % here
(and if it's just for output, you can use it without xargs:
printf 'line %s here\n' "${ar[@]}"
)
Try without xargs. For most situations xargs is overkill.
Depending on what you really want, you can choose a solution like:
# Normally you want to avoid for and use while, but here you want the splitting.
for i in $(echo {1..3}); do
    echo line $i here
done
# When you want 1 line turned into three, `tr` can help
echo {1..3} | tr " " "\n" | sed 's/.*/line & here/'
# printf will repeat itself when there are parameters left
printf "line %s here\n" $(echo {1..3})
# Using the printf feature you can avoid the echo
printf "line %s here\n" {1..3}
Maybe this?
echo {1..3} | tr " " "\n" | xargs -n1 sh -c ' echo "line $0 here"'
The tr replaces the spaces with newlines, so xargs sees three lines. I would not be surprised if there were a better (more efficient) solution, but this one is quite simple.
Please note I have modified my previous answer to remove the use of {}, which was suggested in the comments to eliminate a potential code injection vulnerability.
There is a not-well-known feature of GNU sed: you can add the e flag to the s command, and then sed executes whatever is in the pattern space and replaces the pattern space with the output of that command.
If you are really only interested in the output of the echo commands, you might try this GNU sed example, which eliminates the temporary variable, the loop (and the xargs as well):
echo {1..3} | sed -r 's/([^ ]+)/echo "line \1 here"\n/ge'
it fetches one token (i.e. whatever is separated by the spaces)
replaces it with an echo "line \1 here" command, with \1 replaced by the token
then executes that echo
puts the output of the echo command back into the pattern space
which means it outputs the result of the three echoes
But an even better way to get the desired output is to skip the execution and do the transformation directly in sed, like this:
echo {1..3} | sed -r 's/([^ ]+) ?/line \1 here\n/g'

UNIX shell script to run a list of grep commands from a file and getting result in a single delimited file

I am a beginner in Unix programming, looking for a way to automate my work.
I want to run a list of grep commands and get the output of all the grep commands in a single delimited file.
I am using the following bash script, but it's not working.
Mockup sh file:
#!/bin/sh
grep -l abcd123
grep -l abcd124
grep -l abcd125
and while running, I used the following command:
$ ./Mockup.sh > output.txt
Is it the right command?
How can I get both the grep command and its output in the output file?
How can I delimit the output after each command and result?
How can I get both the grep command and its output in the output file?
You can use bash -v (verbose) to print each command on stderr before it is executed; its output will, as usual, be available on stdout:
bash -v ./Mockup.sh > output.txt 2>&1
cat output.txt
A suitable shell script could be
#!/bin/sh
grep -l 'abcd123\|abcd124\|abcd125' "$#"
provided that the filenames you pass on the invocation of the script are "well behaved", that is, no whitespace in them. (Edit: Using the "$@" expansion takes care of generic whitespace in the filenames; thanks to triplee for his/her comment.)
This kind of invocation (with alternative matching strings, as per the \| syntax) has the added advantage that each filename occurs exactly once in your final list, because grep -l prints the filename just once, as soon as it finds the first occurrence of any of the three strings in a file.
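For instance, with two hypothetical files:
printf 'abcd123\nabcd124\n' > file1   # matches twice
printf 'no match here\n' > file2      # matches never
grep -l 'abcd123\|abcd124\|abcd125' file1 file2   # prints only: file1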
Addendum about "$#"
% ff () { for i in "$@" ; do printf "[%s]\n" "$i" ; done ; }
% # NB "a s d" below is indeed "a SPACE s TAB d"
% ff "a s d" " ert " '345
345'
[a s d]
[ ert ]
[345
345]
%
cat myscript.sh
########################
#!/bin/bash
echo "Trying to find the files containing the below string; replace 'string' with your string"
grep "string" /path/to/folder/* -R -l
########################
Save the above file and run it as below:
sh myscript.sh > output.txt
Once the command prompt returns, you can check output.txt for the required output.
Another approach, less efficient, that tries to address the OP's question:
How can I get both the grep command and output in the output file?
% cat Mockup
#!/bin/sh
grep -o -e string1 -e string2 -e string3 "$@" 2> /dev/null | sort -t: -k2 | uniq
Output: (mocked up as well)
% sh Mockup file{01..99}
file01:string1
file17:string1
file44:string1
file33:string2
file44:string2
file48:string2
%
Looking at the output from the point of view of a consumer, one foresees problems with search strings and/or file names containing colons... oh well, that's another question, maybe.
