Why doesn't xargs in parallel mode work with grep? - linux

cat gg
gives
192
and
cat tmpfilelist
gives
android/app/src/main/res/drawable-mdpi/src_assets_images_alerts_iconnotificationsmartpluggreen.png
android/app/src/main/res/drawable-mdpi/src_assets_images_ic_biggames.png
android/app/src/main/res/drawable-xhdpi/src_assets_images_alerts_iconnotificationsmartpluggreen.png
android/app/src/main/res/drawable-xhdpi/src_assets_images_ic_biggames.png
android/app/src/main/res/drawable-xxhdpi/src_assets_images_alerts_iconnotificationsmartpluggreen.png
android/app/src/main/res/drawable-xxhdpi/src_assets_images_ic_biggames.png
gg
ios/WebRTC.framework/Headers/RTCCallbackLogger.h
ios/WebRTC.framework/Headers/RTCFileLogger.h
ios/WebRTC.framework/Headers/RTCLogging.h
I run xargs with parallel mode on, and this does not find the required text "192":
cat tmpfilelist | \xargs -P0 -t -I {} \bash -c "\grep -C 2 -H -I -r 192 {}" |& \grep -C 3 "192 gg"
bash -c '\grep -C 2 -H -I -r 192 android/app/src/main/res/drawable-xhdpi/src_assets_images_ic_biggames.png'
bash -c '\grep -C 2 -H -I -r 192 android/app/src/main/res/drawable-xxhdpi/src_assets_images_alerts_iconnotificationsmartpluggreen.png'
bash -c '\grep -C 2 -H -I -r 192 android/app/src/main/res/drawable-xxhdp/src_assets_images_ic_biggames.png'
bash -c '\grep -C 2 -H -I -r 192 gg'
bash -c '\grep -C 2 -H -I -r 192 ios/WebRTC.framework/Headers/RTCCallbackLogger.h'
bash -c '\grep -C 2 -H -I -r 192 ios/WebRTC.framework/Headers/RTCFileLogger.h'
bash -c '\grep -C 2 -H -I -r 192 ios/WebRTC.framework/Headers/RTCLogging.h'
When I disable parallel mode, it successfully finds the text "192" in the file "gg":
cat tmpfilelist | \xargs -t -I {} \bash -c "\grep -C 2 -H -I -r 192 {}" |& \grep -C 3 "192 gg"
bash -c '\grep -C 2 -H -I -r 192 android/app/src/main/res/drawable-xhdpi/src_assets_images_ic_biggames.png'
bash -c '\grep -C 2 -H -I -r 192 android/app/src/main/res/drawable-xxhdpi/src_assets_images_alerts_iconnotificationsmartpluggreen.png'
bash -c '\grep -C 2 -H -I -r 192 android/app/src/main/res/drawable-xxhdp/src_assets_images_ic_biggames.png'
bash -c '\grep -C 2 -H -I -r 192 gg'
gg:192
bash -c '\grep -C 2 -H -I -r 192 ios/WebRTC.framework/Headers/RTCCallbackLogger.h'
bash -c '\grep -C 2 -H -I -r 192 ios/WebRTC.framework/Headers/RTCFileLogger.h'
Is there any reason why parallel mode would break grep? Or have I made a mistake somewhere?
Many thanks

The issue is indeed due to the parallel execution: when you run the commands in parallel, the order of their output is not deterministic. I created a similar test:
bash-5.0# ls -alh
total 32K
drwxr-xr-x 9 root root 288 May 20 17:00 .
drwxr-xr-x 1 root root 4.0K May 20 17:11 ..
-rw-r--r-- 1 root root 3 May 20 17:00 a
-rw-r--r-- 1 root root 3 May 20 17:00 b
-rw-r--r-- 1 root root 3 May 20 17:00 c
-rw-r--r-- 1 root root 4 May 20 17:00 d
-rw-r--r-- 1 root root 3 May 20 17:00 e
-rw-r--r-- 1 root root 3 May 20 17:00 f
-rw-r--r-- 1 root root 11 May 20 17:00 files
bash-5.0# tail -n +1 *
==> a <==
192
==> b <==
193
==> c <==
192
==> d <==
195
==> e <==
200
==> f <==
198
==> files <==
a
b
c
d
e
f
Now, if we run your two commands without the last grep, the outputs look like this:
Without parallel
bash-5.0# cat files | \xargs -t -I {} \bash -c "\grep -C 2 -H -I -r 192 {}"
bash -c \grep -C 2 -H -I -r 192 a
a:192
bash -c \grep -C 2 -H -I -r 192 b
bash -c \grep -C 2 -H -I -r 192 c
c:192
bash -c \grep -C 2 -H -I -r 192 d
bash -c \grep -C 2 -H -I -r 192 e
bash -c \grep -C 2 -H -I -r 192 f
With Parallel
bash-5.0# cat files | \xargs -P4 -t -I {} \bash -c "\grep -C 2 -H -I -r 192 {}"
bash -c \grep -C 2 -H -I -r 192 a
bash -c \grep -C 2 -H -I -r 192 b
bash -c \grep -C 2 -H -I -r 192 c
bash -c \grep -C 2 -H -I -r 192 d
bash -c \grep -C 2 -H -I -r 192 e
a:192
bash -c \grep -C 2 -H -I -r 192 f
c:192
Hopefully you can see how this affects the output and the order of the lines. When you run grep -C 3 "192 gg", you get 3 lines before and 3 lines after the line containing 192 gg, exactly as requested.
But gg:192 is printed some time later, because every command writes its output in parallel to the same output, so it can end up outside those 3 lines of context.

Not an answer, but pointing to a possible problem.
The code runs two filters in a pipeline:
grep -C 2 -H -I -r 192 FILENAME
grep -C 3 "192 gg"
The output from the first filter will follow the format 'FILENAME:DATA'. In the example above:
gg:192
The second filter is not going to find the pattern '192 gg' in that line.
So the question may be: how does the non-parallel version produce any output?

Short answer
Your final grep -C 3 requests 3 lines of context around the 192 gg. In the parallel case, this might or might not be enough to find the line containing gg:192.
Details
Before the final grep, your output will consist of a number of lines like:
bash -c \grep -C 2 -H -I -r 192 <filename>
which are echoed to stderr just before xargs launches each bash -c ... command, and the line
gg:192
is echoed to stdout when the relevant bash -c ... command (i.e. the one involving the file gg) finds the match.
In the parallel case, the whole output that is piped into your final grep (once stderr and stdout are combined using |&) might look something like this, where I have replaced the other filenames for brevity:
bash -c \grep -C 2 -H -I -r 192 some_file
bash -c \grep -C 2 -H -I -r 192 some_other_file
bash -c \grep -C 2 -H -I -r 192 another_file
bash -c \grep -C 2 -H -I -r 192 gg
bash -c \grep -C 2 -H -I -r 192 and_another_file
bash -c \grep -C 2 -H -I -r 192 yet_another_file
gg:192
bash -c \grep -C 2 -H -I -r 192 and_yet_another_file
In this example, the bash -c ... commands involving and_another_file and yet_another_file were launched after the one involving gg was launched, but before the one involving gg wrote its output (or at least, before any stdio buffers associated with that output were flushed), so they appear between the lines containing 192 gg and gg:192.
The number of such intervening lines between the lines containing 192 gg and the gg:192 (in this example, 2) will depend on timings and the number of other parallel tasks being launched after the one involving gg. This will vary, and for example if you inserted a sleep statement (e.g. ... \bash -c "sleep 1; \grep -C ...) then there would tend to be more such lines. In any case, you are then piping it to a grep -C 3 to extract 3 lines of context. If it so happens that there are fewer than 3 intervening lines, then this grep -C will find the line containing gg:192, but if there are 3 or more, then it will be outside the requested amount of context and will not be in the final output.
In the serial case, however, the gg:192 line is guaranteed to always be immediately after the 192 gg line, as follows:
bash -c \grep -C 2 -H -I -r 192 some_file
bash -c \grep -C 2 -H -I -r 192 some_other_file
bash -c \grep -C 2 -H -I -r 192 another_file
bash -c \grep -C 2 -H -I -r 192 gg
gg:192
bash -c \grep -C 2 -H -I -r 192 and_another_file
bash -c \grep -C 2 -H -I -r 192 yet_another_file
bash -c \grep -C 2 -H -I -r 192 and_yet_another_file
and so it is always within the 3 lines of context.
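If the goal is only to find which files contain 192, one way to sidestep the ordering problem is to drop -t and filter on the grep results themselves, which carry the filename thanks to -H. A minimal sketch, assuming GNU xargs and GNU grep; find_matches is a hypothetical helper name, and the a..f files recreate the small test from the first answer:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: search every file named in list file $1 for the fixed
# string $2, running the greps in parallel (-P0 = as many at once as possible).
find_matches() {
  local list=$1 pattern=$2
  # -H prefixes each match with its filename, so a line like "c:192" is
  # self-describing and its position in the stream no longer matters.
  # grep exits 1 for files without a match, which makes xargs exit 123;
  # "|| true" keeps that from aborting the script under set -e/pipefail.
  xargs -P0 -I {} grep -H -I -F -- "$pattern" {} < "$list" | sort || true
}

# Recreate the a..f test files in a throwaway directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
printf '192\n' > a; printf '193\n' > b; printf '192\n' > c
printf '195\n' > d; printf '200\n' > e; printf '198\n' > f
printf '%s\n' a b c d e f > files

find_matches files 192   # sorted, so always a:192 then c:192
```

In the question's setting this would be find_matches tmpfilelist 192, and piping that into grep '^gg:' picks out the gg match no matter how the parallel jobs interleave.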

Related

Pass arrays to SSH connections

I want to pass an array to a script that is on a remote computer. I'm using SSH for this. I tried the below code and I'm getting an error saying that the parameter is not available.
ssh -i location/to/keyfile -o StrictHostKeyChecking=no -T ubuntu@18.220.20.50 ./script.sh -m 1G -s 0 -d 120 -w 60 -j 512M -k 512M -l 515M -b "${array_1[*]}" -u "${array_2[*]}"
Here ${array_1} and ${array_2} are indexed arrays.
If I understand the situation correctly, you have two arrays containing numbers, something like:
array_1=(1 2 3)
array_2=(21 22 23)
...and want to pass those lists of numbers to the script as space-separated lists, something like running this on the remote computer:
./script.sh -m 1G -s 0 -d 120 -w 60 -j 512M -k 512M -l 515M -b "1 2 3" -u "21 22 23"
If this is correct, try the following command:
ssh -i location/to/keyfile -o StrictHostKeyChecking=no -T ubuntu@18.220.20.50 ./script.sh -m 1G -s 0 -d 120 -w 60 -j 512M -k 512M -l 515M -b "'${array_1[*]}'" -u "'${array_2[*]}'"
Explanation: commands passed via ssh get parsed twice; first by the local shell, and then the result of that gets parsed again by the remote shell. In each of these parsing phases, quotes (and escapes) get applied and removed. Your original command had only one level of quotes, so the local shell parses, applies, and removes it, so the remote shell doesn't see any quotes, so it treats each of the numbers as a separate thing.
In more detail: the original command:
ssh -i location/to/keyfile -o StrictHostKeyChecking=no -T ubuntu@18.220.20.50 ./script.sh -m 1G -s 0 -d 120 -w 60 -j 512M -k 512M -l 515M -b "${array_1[*]}" -u "${array_2[*]}"
has the array references expanded, giving the equivalent of (assuming the array contents I listed above):
ssh -i location/to/keyfile -o StrictHostKeyChecking=no -T ubuntu@18.220.20.50 ./script.sh -m 1G -s 0 -d 120 -w 60 -j 512M -k 512M -l 515M -b "1 2 3" -u "21 22 23"
The local shell parses and removes the quotes, but they have the effect of passing 1 2 3 and 21 22 23 to the ssh program as single arguments. But then ssh just pastes the list of command arguments it got back together with spaces in between, so this is what it sends to the remote shell:
./script.sh -m 1G -s 0 -d 120 -w 60 -j 512M -k 512M -l 515M -b 1 2 3 -u 21 22 23
...which confuses the script.
My solution, adding single-quotes around the array references, doesn't change the local parsing (the single-quotes are inside the double-quotes, so they have no special effect); they just get passed through, resulting in this command being sent to the remote shell:
./script.sh -m 1G -s 0 -d 120 -w 60 -j 512M -k 512M -l 515M -b '1 2 3' -u '21 22 23'
The single-quotes here have the same effect that double-quotes would (since there are no other quotes, escapes, dollar signs, or other special characters inside them), so this should give the result you want.
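The double parsing can be reproduced locally. In this sketch, the hypothetical fake_ssh function imitates ssh by joining its arguments with spaces and handing the result to a second bash, and the hypothetical show_args stands in for script.sh and prints each argument it receives in angle brackets. The last part shows bash's printf '%q', an alternative to the single-quote trick (not from the answer above) that escapes the value for re-parsing; it assumes the remote shell is bash, since %q produces bash-style quoting.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for ssh: join all arguments with spaces, then let a
# second shell parse the joined string, as the remote shell would.
fake_ssh() { bash -c "$*"; }

# Hypothetical stand-in for ./script.sh: print each argument it receives.
show_args() { printf '<%s>\n' "$@"; }
export -f show_args   # make the function visible inside "bash -c"

array_1=(1 2 3)

# One level of quoting: the second shell sees -b 1 2 3 (four arguments).
fake_ssh show_args -b "${array_1[*]}"

# The answer's fix: the inner single quotes survive to the second shell.
fake_ssh show_args -b "'${array_1[*]}'"

# Alternative: printf %q escapes the value (e.g. as 1\ 2\ 3) for re-parsing.
printf -v quoted '%q' "${array_1[*]}"
fake_ssh show_args -b "$quoted"
```

Unlike hand-written single quotes, %q also copes with values that themselves contain single quotes.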
Another solution, slightly different from the answers in the question nominated by @jww:
the idea is to pass the array definition as text and then evaluate it through the stdin device.
A sample code piece is below. You need to replace the echo part with your own array definition script, and put source /dev/stdin inside script.sh.
echo 'array_1[id]=3.14'|ssh ubuntu@18.220.20.50 'source /dev/stdin; echo ${array_1[id]}'
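A minimal sketch of the same stdin idea, assuming bash 4 or later on both ends: the bash builtin declare -p prints array definitions that bash can read back, so the arrays do not have to be written out by hand. Here bash -c stands in for the remote shell; in real use it would be replaced by the ssh command.

```shell
#!/usr/bin/env bash
array_1=(1 2 3)
array_2=(21 22 23)

# declare -p prints re-readable definitions such as:
#   declare -a array_1=([0]="1" [1]="2" [2]="3")
# The receiving bash sources them from stdin and gets real arrays back.
# "bash -c" stands in for: ssh ubuntu@18.220.20.50 'source /dev/stdin; ...'
declare -p array_1 array_2 |
  bash -c 'source /dev/stdin; echo "${array_1[1]} ${#array_2[@]}"'
```

This prints the second element of array_1 and the length of array_2, showing that both arrays arrived intact.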

Parsing a non-regular command output

I have the following output of iperf version 2 running on LEDE OS. I am trying to parse it to get the number before Mbits/sec, which is the average throughput of the iperf session. However, the columns are not separated by a fixed number of spaces or tabs. In addition, the CSV format generated by iperf gives strange results, so I have to rely on the regular output of iperf. Any suggestions on how to parse the output using either a regular expression or awk?
The iperf command:
iperf -c 10.0.0.7 -t 10 -i 0.1 -f m
The output:
[ 3] 0.00-10.00 sec 1889 MBytes 1584 Mbits/sec 15114/0 0
2483K/3302 us
You can use grep for those.
iperf -c 10.0.0.7 -t 10 -i 0.1 -f m | grep -o -E '\w+ Mbits/sec'
OR to be more accurate:
iperf -c 10.0.0.7 -t 10 -i 0.1 -f m | grep -o -E '[0-9]+ Mbits/sec'
To get only the digits, you can use yet another regex:
iperf -c 10.0.0.7 -t 10 -i 0.1 -f m | grep -Po '[[:digit:]]+ *(?=Mbits/sec)'
Above, [[:digit:]]+ and [0-9]+ are equivalent and match the digits in the line.
With the FreeBSD-derived grep on Mac OS X, -P does not work; use perl directly instead:
iperf -c 10.0.0.7 -t 10 -i 0.1 -f m | perl -nle 'print $& if m{\d+ *(?=Mbits/sec)}'
You can try using awk, which prints everything after MBytes, starting with the throughput:
iperf -c 10.0.0.7 -t 10 -i 0.1 -f m | awk -F 'MBytes' '{print $2}'
If this is iperf 2, try -fb for bits formatting. This format is easier to parse with regular expressions, as the value is just a number. From the man page:
GENERAL OPTIONS
-f, --format [abkmgBKMG]
format to report: adaptive, bits, Bytes, Kbits, Mbits, Gbits, KBytes, MBytes, GBytes (see NOTES for more)
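Since awk splits fields on any run of spaces or tabs by default, irregular column widths need not be a problem if you look for the Mbits/sec field itself rather than a fixed column. A minimal sketch; mbits is a hypothetical helper name, and the sample line is modelled on the output in the question (its exact spacing is an assumption):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the field that immediately precedes "Mbits/sec".
# awk's default field splitting treats any run of blanks as one separator.
mbits() {
  awk '{ for (i = 2; i <= NF; i++) if ($i == "Mbits/sec") print $(i - 1) }'
}

sample='[  3]  0.00-10.00 sec  1889 MBytes  1584 Mbits/sec  15114/0          0'
printf '%s\n' "$sample" | mbits
```

Usage would be iperf -c 10.0.0.7 -t 10 -i 0.1 -f m | mbits. With -i 0.1, iperf also prints one report per interval, so appending | tail -n 1 keeps only the last value, assuming the whole-session summary is printed last.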

Bash: store redis-benchmark result to var generate strange string

I am trying to parse redis-benchmark output in a shell script. I wrote the script, but it fails when I run it.
Environment
$ bash --version
GNU bash, version 4.2.24(1)-release (x86_64-pc-linux-gnu)
$ cat /etc/issue
Ubuntu 12.04 LTS \n \l
$ dpkg -l |grep redis
2:2.8.19-rwky1~precise
$ cat demo.sh
OUTPUT=`redis-benchmark -n 1000 -r 100000 -d 32 -c 30 -t GET -p 6379 -q |grep 'per second'`
R=$(echo "$OUTPUT" | cut -f 1 -d'.')
S=$(echo $R | awk '{print $2}')
echo $S
The shell debug output shows some confusing information:
$ bash -x demo.sh
++ redis-benchmark -n 1000 -r 100000 -d 32 -c 30 -t GET -p 6379 -q
++ grep 'per second'
GET: 166666.67 requests per second'
GET: 166666.67 requests per second'
++ cut -f 1 -d.
GET: 166666'an
++ echo GET: $'-nan\rGET:' 166666
++ awk '{print $2}'
+ S=$'-nan\rGET:'
+ echo $'-nan\rGET:'
GET:
Am I missing something?
Comments
It looks like the redis-benchmark output itself contains something strange; I don't know why:
$ redis-benchmark -n 1000 -r 100000 -d 32 -c 30 -t GET -p 6379 -q |grep per > todo
$ vim todo
GET: -nan^MGET: 166666.67 requests per second
If you cannot fix the redis-benchmark output, this will parse both the correct and the strange formats:
redis-benchmark -n 1000 -r 100000 -d 32 -c 30 -t GET -p 6379 -q | grep 'per second' | sed 's/.*GET: \(.*\) requests .*/\1/'
But you should probably fix the input :D
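The strange string comes from the carriage return (^M, i.e. \r) that vim revealed: a terminal jumps back to the start of the line on \r, so the later GET: text overwrites -nan on screen, but both are still in the variable. A minimal sketch that reproduces the line from the comment with a hard-coded string (no real redis-benchmark run) and extracts the rate with the answer's sed, plus a pure-bash alternative that keeps only the text after the last \r:

```shell
#!/usr/bin/env bash
# The line as shown by vim in the comment: "GET: -nan^MGET: 166666.67 requests per second"
raw=$'GET: -nan\rGET: 166666.67 requests per second'

# The answer's sed: the greedy ".*GET: " runs to the last "GET: ".
printf '%s\n' "$raw" | sed 's/.*GET: \(.*\) requests .*/\1/'

# Alternative: strip everything up to the last \r, then split on blanks.
last=${raw##*$'\r'}              # "GET: 166666.67 requests per second"
read -r _ rate _ <<< "$last"     # the second word is the rate
echo "$rate"
```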

wc -c and wc -m give the same output all the time?

I have the following question: wc -m and wc -c always give the same output. I also tried with floating-point numbers, but the output is the same for both commands.
cat test | wc -m
541
cat test | wc -c
541
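An ASCII character takes one byte, but non-ASCII characters take more in UTF-8; these Polish letters take 2 bytes each. wc -m counts characters according to the current locale, so in a UTF-8 locale the two counts differ only when the text contains non-ASCII characters: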
echo -n "ŻÓŹŁŃĘ"|wc -m
6
echo -n "ŻÓŹŁŃĘ"|wc -c
12
P.S. You can run wc -m test directly and save the cat.
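A small sketch comparing the two counts; counts is a hypothetical helper. It assumes a C.UTF-8 locale is installed, because under LC_ALL=C wc -m counts bytes and both numbers would match. $((...)) only strips any padding wc adds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the character and byte counts of "$1".
counts() {
  local chars bytes
  # wc -m decodes characters using the locale (assumed: C.UTF-8 exists).
  chars=$(printf '%s' "$1" | LC_ALL=C.UTF-8 wc -m)
  # wc -c always counts raw bytes.
  bytes=$(printf '%s' "$1" | wc -c)
  echo "chars=$((chars)) bytes=$((bytes))"
}

counts 'hello'    # plain ASCII: one byte per character
counts 'ŻÓŹŁŃĘ'   # Polish letters: two bytes each in UTF-8
```

In a UTF-8 locale the first call reports 5 characters and 5 bytes and the second 6 characters and 12 bytes, matching the echo -n examples above.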

command hangs head / netcat

I am using two linux machines to simulate some firewall tests... I execute the tests by running nc through ssh on a remote machine... if I spawn the ssh like this, it works...
ssh -i id_dsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
-p 2224 root@a2-idf-lab nc -s 10.26.216.82 10.195.18.132 \
21 < /var/log/messages
However, if I try to limit how much of /var/log/messages is sent, using head -c 20 /var/log/messages, the command hangs, and I don't understand why...
ssh -i id_dsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
-p 2224 root@a2-idf-lab nc -s 10.26.216.82 10.195.18.132 \
21 < head -c 20 /var/log/messages
I also tried this with no better success...
ssh -i id_dsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
-p 2224 root@a2-idf-lab nc -s 10.26.216.82 10.195.18.132 \
21 < (head -c 20 /var/log/messages)
Question: Why does the second command hang, and how can I accomplish what I need?
FYI, these experiments were really in preparation for sending cat /dev/urandom | base64 | head -c 20 - into nc... bonus points if you can give me a command line that works with nc through an ssh session...
< is shell redirection: it makes the command read its input from a file; it does not execute a command. Try:
head -c 20 /var/log/messages | ssh -i id_dsa -o UserKnownHostsFile=/dev/null \
-o StrictHostKeyChecking=no \
-p 2224 root@a2-idf-lab nc -s 10.26.216.82 10.195.18.132 21
This pipes the first 20 bytes of /var/log/messages on the local machine into nc on the remote machine.
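The question's third attempt, < (head ...), is missing one piece of bash's process substitution: <(cmd), written with no space, expands to a file name from which cmd's output can be read, and the usual < then redirects from it, giving < <(cmd). A minimal local sketch; consumer is a hypothetical stand-in for the remote nc that just counts the bytes it receives:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "ssh ... nc ...": count bytes read from stdin.
consumer() { wc -c; }

# A small throwaway file playing the role of /var/log/messages.
src=$(mktemp)
trap 'rm -f "$src"' EXIT
printf 'line one\nline two\nline three\n' > "$src"

# 1. Pipe, as in the answer.
head -c 5 "$src" | consumer
# 2. Process substitution: "< <(...)", not "< (...)".
consumer < <(head -c 5 "$src")
```

Both forms feed the same 5 bytes to consumer. Process substitution is a bash (and ksh/zsh) feature, not POSIX sh.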
If you want to use the /var/log/messages file on the remote machine, put quotes around the command:
ssh -i id_dsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
-p 2224 root@a2-idf-lab "head -c 20 /var/log/messages |\
nc -s 10.26.216.82 10.195.18.132 21"
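For the bonus, the same local-pipe approach works with random data. A minimal sketch; random_payload is a hypothetical helper. Reading a fixed number of bytes from /dev/urandom first, instead of cat /dev/urandom, means nothing has to read an endless stream, and since base64 output is always longer than its input, the final head -c still gets enough characters:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print exactly $1 random base64 characters (no newline).
random_payload() {
  local n=$1
  # base64 turns n bytes into at least n characters; tr drops the line
  # wraps base64 inserts in long output, so head -c counts only payload.
  head -c "$n" /dev/urandom | base64 | tr -d '\n' | head -c "$n"
}

random_payload 20; echo
```

It would then be piped into the remote nc exactly like head above: random_payload 20 | ssh -i id_dsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -p 2224 root@a2-idf-lab nc -s 10.26.216.82 10.195.18.132 21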
Try to use
head -n 20
My guess is the problem is the lack of carriage return at the end.
