I need to parse a big log file in one pass and print the id, address and service_name of the matching requests. The problem is that service_name is in the request body, which is quite big.
If I list all the patterns with the -e option,
grep -e 'ID: [0-9]\+' -e 'Address: .*' -e ':Body><[^ ]*'
the full request body will be printed.
What is needed is
grep -e 'ID: [0-9]\+' -e 'Address: .*' -o ':Body><[^ ]*'
or
grep -o 'ID: [0-9]\+' -o 'Address: .*' -o ':Body><[^ ]*'
to print only the first word of the request body, which is the name of the service;
but in this case the error grep: :Body><[^ ]*: No such file or directory is received (unlike -e, -o takes no pattern argument, so grep treats the pattern after it as a file name).
UPD: the solution with -oe and a combined regex works, but as it turned out, -o slows the operation down significantly.
If you wish to print only the bits of a file that match these 3 regular expressions, and those three are never on the same line, you may use \|, which is grep's logical OR:
grep -o 'ID: [0-9]\+\|Address: .*\|:Body><[^ ]*' my.log
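For illustration, with made-up log lines (the real format is not shown in the question), the command prints only the matching fragments and skips the rest of the body:
printf 'ID: 42\nAddress: 10.0.0.1\n<soap:Body><GetUserRequest param="x">\n' > my.log
grep -o 'ID: [0-9]\+\|Address: .*\|:Body><[^ ]*' my.log
ID: 42
Address: 10.0.0.1
:Body><GetUserRequest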
Without seeing your log it's difficult to fully understand what your example will return. You might try -Eo, giving each pattern with -e, and see if that helps get what you want. You may need to adjust the regex accordingly (with -E, write + rather than \+). Passing each pattern with -e should also resolve the "grep: :Body><[^ ]*: No such file or directory" error you're receiving, since -o itself takes no pattern argument.
grep -Eo -e 'ID: [0-9]+' -e 'Address: .*' -e ':Body><[^ ]*' myLog.log
I want to get the run times of some processes. Here is what I am doing:
ps -ef | grep "python3 myTask.py" | awk '{print $2}' | xargs -n1 ps -p {} -o etime
I want to get the PIDs with
ps -ef | grep "python3 myTask.py" | awk '{print $2}'
and then pass each of them along to
ps -p {} -o etime
using xargs, but it's not working. I get:
error: process ID list syntax error
Usage:
ps [options]
Try 'ps --help <simple|list|output|threads|misc|all>'
or 'ps --help <s|l|o|t|m|a>'
for additional help text.
For more details see ps(1).
error: process ID list syntax error
Usage:
ps [options]
Try 'ps --help <simple|list|output|threads|misc|all>'
or 'ps --help <s|l|o|t|m|a>'
for additional help text.
For more details see ps(1).
What am I doing wrong?
You can use the following command:
pgrep -f "python3 myTask.py" | xargs -i{} ps -p {} -o etime
pgrep - Look up or signal processes based on name and other attributes.
-f, --full -
The pattern is normally only matched against the process name. When -f is set, the full command line is used.
For further reading, see man pgrep.
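As an aside, pgrep can also hand ps a ready-made PID list via its -d delimiter option, which avoids xargs entirely (a sketch; it fails with a usage error if no process matches, since ps then receives an empty list):
ps -o etime -p "$(pgrep -d, -f 'python3 myTask.py')"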
The missing part from the xargs segment was -i{}, which invokes the command once for each argument, whilst {} is replaced by it.
-i[replace-str], --replace[=replace-str] -
This option is a synonym for -Ireplace-str if replace-str is specified.
For further reading, see man xargs.
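To see the replacement behavior in isolation, here is a minimal sketch with dummy input:
printf '101\n102\n' | xargs -I{} echo "inspecting PID {}"
inspecting PID 101
inspecting PID 102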
You must provide -I{} to xargs to set the placeholder; otherwise {} is not replaced.
Nevertheless, your command is too complicated and involves too many intermediate steps (and a race condition: a process can exit between the moment the first ps lists it and the moment the second ps asks for it). Simply get your processes, including the elapsed time, and filter the lines you need:
ps -eo etime,cmd | awk '/[p]ython3 myTask\.py/{print $1}'
(no xargs anymore; the [p] bracket trick keeps the awk process itself from matching its own command line in the ps output, and escaping the dot stops it from matching any character)
Doing the following:
First console
touch /tmp/test
Second console
tail -f /tmp/test |grep propo |grep -v miles
Third console
echo propo >> /tmp/test
The second console should show "propo", but it doesn't show anything. If you instead run this in the second console:
tail -f /tmp/test |grep propo
and do echo propo >> /tmp/test, it will show propo. But the grep -v is for miles, not for propo.
Why?
Test it in your own environment if you want; it seems like it should obviously work, but it doesn't.
Most probably because the output of a command, when piped to another command, is fully buffered rather than line-buffered. The output could be buffered in the first pipe or by grep.
Use stdbuf -oL to force line buffering, and grep --line-buffered for a line-buffered grep.
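Applied to the pipeline in question, that could look like this (a sketch; only the middle grep needs it, since the final grep writes to the terminal, where output is line-buffered by default):
tail -f /tmp/test | stdbuf -oL grep propo | grep -v miles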
The problem is that grep does not use line buffering by default when writing to a pipe, so its output is block-buffered and only shows up in chunks. You can use grep --line-buffered:
tail -f /tmp/test | grep --line-buffered propo | grep -v miles
I want to remove duplicate lines in wget output.
I use this code:
wget -q "http://www.sawfirst.com/selena-gomez" -O -|tr ">" "\n"|grep 'selena-gomez-'|cut -d\" -f2|cut -d\# -f1|while read url;do wget -q "$url" -O -|tr ">" "\n"|grep 'name=.*content=.*jpg'|cut -d\' -f4|sort |uniq;done
and the output looks like this:
http://www.sawfirst.com/wp-content/uploads/2018/03/Selena-Gomez-12.jpg
http://www.sawfirst.com/wp-content/uploads/2018/03/Selena-Gomez-12.jpg
http://www.sawfirst.com/wp-content/uploads/2018/03/Selena-Gomez-12.jpg
http://www.sawfirst.com/wp-content/uploads/2018/03/Selena-Gomez-12.jpg
http://www.sawfirst.com/wp-content/uploads/2018/02/Selena-Gomez-760.jpg
http://www.sawfirst.com/wp-content/uploads/2018/02/Selena-Gomez-760.jpg
I want to remove the duplicate lines from the output (the sort | uniq inside the loop only deduplicates within a single page, not across the whole run).
Better try:
mech-dump --images "http://www.sawfirst.com/selena-gomez" |
grep -i '\.jpg$' |
sort -u
mech-dump is provided by the package libwww-mechanize-perl on Debian and derivatives.
Output:
http://www.sawfirst.com/wp-content/uploads/2018/03/Selena-Gomez-12.jpg
http://www.sawfirst.com/wp-content/uploads/2018/02/Selena-Gomez-760.jpg
http://www.sawfirst.com/wp-content/uploads/2018/02/Selena-Gomez-404.jpg
...
In some cases, tools like Beautiful Soup become more appropriate.
Trying to do this with only wget & grep becomes an interesting exercise. This is my naive try, but I am pretty sure there are better ways of doing it:
$ wget -q "http://www.sawfirst.com/selena-gomez" -O - |
  grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" |
  grep -i "selena-gomez" |
  while read url; do
    if [[ $url == *jpg ]]
    then
      echo "$url"
    else
      wget -q "$url" -O - |
        grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" |
        grep -i "selena-gomez" |
        grep "\.jpg$" &
    fi
  done | sort -u > selena-gomez
In the first round:
wget -q "http://www.sawfirst.com/selena-gomez" -O -|
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" |
grep -i "selena-gomez"
URLs matching the desired name will be extracted. In the while loop it could be the case that $url already ends with .jpg, in which case it is simply printed instead of the content being fetched again.
This approach just goes 1 level deep, and to try to speed things up it uses & at the end, with the intention of making multiple requests in parallel:
grep "\.jpg$" &
It would still need to be checked whether the pipeline waits for all the background jobs to finish.
It ends with sort -u to return a unique list of items found.
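sort -u should keep reading until every backgrounded grep closes its end of the pipe, but one way to make the synchronization explicit is to group the loop with the wait builtin (a structural sketch only; some_producer and fetch_and_filter are hypothetical placeholders):
some_producer |
{
  while read url; do
    fetch_and_filter "$url" &   # hypothetical per-URL work, backgrounded
  done
  wait                          # block until all background jobs have finished
} | sort -u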
I've got two test files, namely ttt.txt and ttt2.txt, the content of which is shown below:
#ttt.txt
(132) 123-2131
543-732-3123
238-3102-312
#ttt2.txt
1
2
3
I've already tried the following commands in bash and they work fine:
if grep -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" ttt.txt ; then echo "found"; fi
# with output 'found'
if grep -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" ttt2.txt ; then echo "found"; fi
But when I combine the above command with xargs, it complains with the error '-bash: syntax error near unexpected token `then''. Could anyone give me some explanation? Thanks in advance!
ll | awk '{print $9}' | grep ttt | xargs -I $ if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" $; then echo "found"; fi
$ is a special character in bash (it marks variables), so don't use it as your xargs marker; you'll only get confused.
The real problem here, though, is that you are passing if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" $ as the argument to xargs, and the remainder of the line is treated as a new command, because the shell splits the line at the ; (which is why the then that follows triggers the syntax error).
You can wrap the whole thing in a sub-invocation of bash, so that xargs sees the whole command:
$ ll | awk '{print $9}' | grep ttt | xargs -I xx bash -c 'if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" xx; then echo "found"; fi'
found
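As an aside, splicing the replacement token straight into the bash -c string can break if a file name contains spaces or quote characters; a slightly safer variant (a sketch) passes it to the script as a positional parameter instead:
$ ll | awk '{print $9}' | grep ttt | xargs -I{} bash -c 'if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" "$1"; then echo "found"; fi' _ {}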
Finally, ll | awk '{print $9}' | grep ttt is a needlessly complicated way of listing the files that you're looking for. You actually don't need any of the code above; just do this:
$ if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" ttt*; then echo "found"; fi
found
Alternatively, if you want to process each file in turn (which you don't need here, but you might want when this gets more complicated):
for file in ttt*
do
    if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" "$file"
    then
        echo "found"
    fi
done
I believe this is a simple syntax issue on my part, but I have been unable to find another example similar to what I'm trying to do. I have a variable taking in a specific disk location, and I need to use that location in an hdparm/grep command to pull out the max LBA:
targetDrive=$1 #/dev/sdb
maxLBA=$(hdparm -I /dev/sdb |grep LBA48 |grep -P -o '(?<=:\s)[^\s]*') # this works perfectly
maxLBA=$(hdparm -I $1 |grep LBA48 |grep -P -o '(?<=:\s)[^\s]*') # this fails
I have also tried
maxLBA=$(hdparm -I 1 |grep LBA48 |grep -P -o '(?<=:\s)[^\s]*')
maxLBA=$(hdparm -I "$1" |grep LBA48 |grep -P -o '(?<=:\s)[^\s]*')
Thanks for the help
So I think here is the solution to your problem. I did basically the same as you, but changed the way I pipe the results into one another:
grep with a regular expression to find the line containing LBA48
cut to retrieve the second field when the resulting string is split on the colon ":"
then strip the spaces from the result
Here is my resulting bash script.
#!/bin/bash
target_drive=$1
max_lba=$(sudo hdparm -I "$target_drive" | grep -P -o ".+LBA48.+:.+(\d+)" | cut -d: -f2 | tr -d ' ')
echo "Drive: $target_drive MAX LBA48: $max_lba"