Parsing token value from gitlab-runner output containing ANSI escape attributes - linux

Currently I'm trying to extract a token from the output of gitlab-runner list, which looks something like this:
Listing configured runners ConfigFile=/etc/gitlab-runner/config.toml
Ursain Bolt Executor=docker Token=abcdef678901234567890123456789 URL=https://my.gitlab.com
I am searching for the existence of a Token: Token=abcdef678901234567890123456789.
I tried several patterns (I'm familiar with regular expressions); the one I'd prefer to use looks as follows:
gitlab-runner list 2>&1 >/dev/null | grep -E 'Token=[a-f0-9]{30}'
I redirect stderr to stdout since it seems gitlab-runner prints to stderr. Leaving out the >/dev/null redirection of stdout makes no difference here.
This pattern does not match.
The following patterns matched (grep returned 0):
[a-f0-9]+
([a-f0-9])+
([a-f0-9]){30}
Token
Token.
However, the following matches did not work (grep returned 1):
(Token.[a-f0-9]){30}
Token=[a-f0-9]{30}
Token=
Token\=
What am I missing? Why does the straightforward combination of both patterns (Token and [a-f0-9]{30}) with an equal sign not work?
Raw Gitlab-Runner Output
bash-4.3# gitlab-runner list 2>&1 >/dev/null
Listing configured runners ConfigFile=/etc/gitlab-runner/config.toml
Ursain Bolt Executor=docker Token=abcdef678901234567890123456789 URL=https://my.gitlab.com
Partial Hexdump Update
...
000000a0: 2045 7865 6375 746f 721b 5b30 3b6d 3d64 Executor.[0;m=d
000000b0: 6f63 6b65 7220 546f 6b65 6e1b 5b30 3b6d ocker Token.[0;m
000000c0: 3d65 6466 3834 3062 3436 6166 6434 3333 =edf840b46afd433
...
Version Numbers
bash-4.3# cat /etc/alpine-release
3.6.2
bash-4.3# bash --version
GNU bash, version 4.3.48(1)-release (x86_64-alpine-linux-musl)
bash-4.3# grep --version
grep (GNU grep) 3.0
Copyright (C) 2017 Free Software Foundation, Inc.
Trying to reproduce this by echoing the output copied from the terminal doesn't seem to work; in that case grep matches the pattern as intended.

If you are using GNU grep, you can use its PCRE support. The -P flag enables Perl-compatible regular expressions, and the -o flag prints only the matched part rather than the entire line. \K is an escape sequence that drops everything matched up to that point from the reported match.
grep -Po 'Token=\K[a-f0-9]{30}'
The problem with grep -E 'Token=[a-f0-9]{30}' is that it returns the entire line matching the regex and not the token alone, since the string you are looking for is part of a line with other words. You can of course use the -o flag with your original expression, but it would still return Token=abcdef678901234567890123456789.
As a side note, you might want to check which output stream your Token= string is written to, because with your current redirection 2>&1 >/dev/null you are discarding all of stdout and sending only stderr into the pipe, so grep is acting only on stderr.
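To see what that redirection order does, here is a minimal sketch. The emit function is hypothetical and only stands in for a program that writes to both streams; it is not how gitlab-runner itself behaves.

```shell
# Hypothetical stand-in for a program that writes to both streams.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

# 2>&1 first points stderr at the pipe, then >/dev/null discards stdout only.
only_err=$(emit 2>&1 >/dev/null)

# Discarding stderr instead leaves only stdout in the pipe.
only_out=$(emit 2>/dev/null)

printf 'captured with 2>&1 >/dev/null: %s\n' "$only_err"
printf 'captured with 2>/dev/null:     %s\n' "$only_out"
```

Redirections are applied left to right, which is why the order of 2>&1 and >/dev/null matters.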
Update
So if your problem is with ANSI escape sequences, you need to clean the output with sed (or any tool of your choice) before applying the grep filter, something like:
gitlab-runner list 2>&1 >/dev/null | sed 's/.\[0;m//g' | grep -Po 'Token=\K[a-f0-9]{30}'

I fetched a hexdump of the output using gitlab-runner list 2>&1 >/dev/null | xxd (xxd usually ships with vim). It turned out that gitlab-runner inserts the ANSI escape sequence ESC[0;m (shown as .[0;m in the dump) before every equal sign:
...
000000a0: 2045 7865 6375 746f 721b 5b30 3b6d 3d64 Executor.[0;m=d
000000b0: 6f63 6b65 7220 546f 6b65 6e1b 5b30 3b6d ocker Token.[0;m
000000c0: 3d65 6466 3834 3062 3436 6166 6434 3333 =edf840b46afd433
...
The pattern that finally works is Token.{5}=[a-f0-9]{30}, where .{5} consumes the five bytes of the escape sequence (1b 5b 30 3b 6d).
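For anyone who wants to reproduce this without a gitlab-runner installation, here is a rough sketch. The sample line is rebuilt with printf from the hexdump above, so it is only an approximation of the real output, and it uses grep -Eo plus cut rather than grep -P so it does not depend on PCRE support.

```shell
# Simulated output line; the real one would come from:
#   gitlab-runner list 2>&1 >/dev/null
line=$(printf 'Ursain Bolt Executor\033[0;m=docker Token\033[0;m=abcdef678901234567890123456789 URL\033[0;m=https://my.gitlab.com')

# A literal ESC character, so the sed expression stays portable.
esc=$(printf '\033')

# Strip the SGR escape sequences (ESC [ ... m), then extract the token.
token=$(printf '%s\n' "$line" \
  | sed "s/${esc}\[[0-9;]*m//g" \
  | grep -Eo 'Token=[a-f0-9]{30}' \
  | cut -d= -f2)

printf '%s\n' "$token"
```

Stripping the escape sequences first keeps the token pattern itself readable, at the cost of one extra process in the pipeline.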

Related

grep and cut a specific pattern [duplicate]

Is there a way to make grep output "words" from files that match the search expression?
If I want to find all the instances of, say, "th" in a number of files, I can do:
grep "th" *
but the output will be something like (bold is by me);
some-text-file : the cat sat on the mat
some-other-text-file : the quick brown fox
yet-another-text-file : i hope this explains it thoroughly
What I want it to output, using the same search, is:
the
the
the
this
thoroughly
Is this possible using grep? Or using another combination of tools?
Try grep -o:
grep -oh "\w*th\w*" *
Edit: matching from Phil's comment.
From the docs:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
Cross distribution safe answer (including windows minGW?)
grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"
If you're using older versions of grep (like 2.4.2) which do not include the -o option, then use the above. Else use the simpler to maintain version below.
Linux cross distribution safe answer
grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'
To summarize: -oh prints only the parts of the file content that match the regular expression (and not the filename), just as you would expect a regular expression to work in vim and similar tools. Which word or regular expression you search for is up to you, as long as you stay with POSIX and not Perl syntax (see below).
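A quick way to check the POSIX variant without touching any files is to feed it the three sample lines from the question on stdin. This is just a sketch; with a single input stream there is no filename prefix, so -h is not needed here.

```shell
# The three sample lines from the question, piped in instead of read from files.
words=$(printf '%s\n' \
  'the cat sat on the mat' \
  'the quick brown fox' \
  'i hope this explains it thoroughly' \
  | grep -o '[[:alpha:]]*th[[:alpha:]]*')

printf '%s\n' "$words"
```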
More from the manual for grep
-o Print each match, but only the match, not the entire line.
-h Never print filename headers (i.e. filenames) with output lines.
-w The expression is searched for as a word (as if surrounded by
`[[:<:]]' and `[[:>:]]';
The reason why the original answer does not work for everyone
Support for \w varies from platform to platform, as it is an extended "perl" syntax. grep installations that are limited to POSIX character classes need [[:alpha:]] instead of its perl near-equivalent \w. See the Wikipedia page on regular expressions for more.
Ultimately, the POSIX answer above will be a lot more reliable regardless of platform (being the original) for grep
As for support of grep without -o option, the first grep outputs the relevant lines, the tr splits the spaces to new lines, the final grep filters only for the respective lines.
(PS: I know most platforms by now would have been patched for \w.... but there are always those that lag behind)
Credit for the "-o" workaround goes to @AdamRosenfield's answer
It's more simple than you think. Try this:
egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)
egrep -iwo 'th.[a-z]*' filename.txt ### (Case Insensitive)
Where,
egrep: Grep will work with extended regular expression.
w : Matches only word/words instead of substring.
o : Display only matched pattern instead of whole line.
i : Ignore case when matching.
You could translate spaces to newlines and then grep, e.g.:
cat * | tr ' ' '\n' | grep th
Just awk, no need combination of tools.
# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
grep command for only matching and perl
grep -o -P 'th.*? ' filename
I was unsatisfied with awk's hard to remember syntax but I liked the idea of using one utility to do this.
It seems like ack (or ack-grep if you use Ubuntu) can do this easily:
# ack-grep -ho "\bth.*?\b" *
the
the
the
this
thoroughly
If you omit the -h flag you get:
# ack-grep -o "\bth.*?\b" *
some-other-text-file
1:the
some-text-file
1:the
the
yet-another-text-file
1:this
thoroughly
As a bonus, you can use the --output flag to do this for more complex searches with just about the easiest syntax I've found:
# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file
1, 5, 12/27/2010
cat *-text-file | grep -Eio "th[a-z]+"
You can also try pcregrep. There is also a -w option in grep, but in some cases it doesn't work as expected.
From Wikipedia:
cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple
grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple
I had a similar problem, looking for grep/pattern regex and the "matched pattern found" as output.
At the end I used egrep (same regex on grep -e or -G didn't give me the same result of egrep) with the option -o
so, I think that could be something similar to (I'm NOT a regex Master) :
egrep -o "the*|this{1}|thoroughly{1}" filename
To search all the words with start with "icon-" the following command works perfect. I am using Ack here which is similar to grep but with better options and nice formatting.
ack -oh --type=html "\w*icon-\w*" | sort | uniq
You could pipe your grep output into Perl like this:
grep "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'
grep --color -o -E "Begin.{0,}?End" file.txt
? - Match as few as possible until the End
Tested on macos terminal
$ grep -w
Excerpt from grep man page:
-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character.
ripgrep
Here are the example using ripgrep:
rg -o "(\w+)?th(\w+)?"
It'll match all words containing th.

Fail to exclude words from grep command

For filtering errors in log files I have a command something like that
sudo grep -R --color=always -ri "err" *.log | grep -v "terry"
but the output isn't what I want. I still see lines like
mail.log:Mar 27 10:31:44 (removed) postfix/smtp[5449]: 4EB0822348: to=<terry@(removed)>, relay=(removed), delay=6.6, delays=0.55/0.02/3.4/2.6, dsn=2.0.0, status=sent (250 OK id=1csFlH-00010k-6T)
Why is that line here when I have excluded "terry" from it?
Your "--color=always" is why you are still getting the result. Remember that a pipe sends the stdout of one program to the stdin of another, and with --color=always the first grep emits color codes even when writing into a pipe. To show color on the screen, a program sends escape codes to the terminal, which interprets them, like this:
echo -e "This is \e[31mRed"
The word "Red" will be red when it is echo'ed. So grep is sending the escape characters to the second grep command. Go ahead and try it for yourself by redirecting your first grep command to a file and then examining the file.
grep -R --color=always -ri 'err' /tmp/log/syslog > /tmp/log/syslog2
Now open the file in a text editor (Don't cat the file out as you will just see the colors).
Mar 26 10:30:59 zipmaster07 cinnamon-screensaver-dialog: pam_ecryptfs: seteuid ^[[01;31m^[[Kerr^[[m^[[Kor
Mar 26 14:27:19 zipmaster07 cinnamon-screensaver-dialog: pam_ecryptfs: seteuid ^[[01;31m^[[Kerr^[[m^[[Kor
t^[[01;31m^[[Kerr^[[m^[[Ky was here with an ^[[01;31m^[[Kerr^[[m^[[Kor.
mail.log:Mar 27 10:31:44 (removed) postfix/smtp[5449]: 4EB0822348: to=<t^[[01;31m^[[Kerr^[[m^[[Ky@(removed
The text "terry@...." is no longer terry; it is now "t^[[01;31m^[[Kerr^....". The inverted match for "terry" does not find the literal string "terry" in that line, so grep -v keeps it.
You need to remove the color option.
jschaeffer@zipmaster07 ~ $ grep -R -ri 'err' /tmp/log/sys2.log
pam_ecryptfs: seteuid err
pam_ecryptfs: seteuid err
terry was here with an error.
mail.log:Mar 27 10:31:44 (removed) postfix/smtp[5449]: 4EB0822348: to=<terry@(removed)>, relay=(removed), delay=6.6, delays=0.55/0.02/3.4/2.6, dsn=2.0.0, status=sent (250 OK id=1csFlH-00010k-6T)
Now with the second grep.
jschaeffer@zipmaster07 ~ $ grep -R -ri 'err' /tmp/log/sys2.log | grep -v terry
pam_ecryptfs: seteuid err
pam_ecryptfs: seteuid err
Hopefully this all makes sense.
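The effect can also be reproduced without any log files. In this sketch the colored line is built by hand with printf to mimic the escape codes shown above; the address example.org is made up for illustration.

```shell
# The same address, once plain and once with color codes wrapped around "err",
# as grep --color=always would emit them. The address is a made-up example.
plain='to=<terry@example.org>'
colored=$(printf 'to=<t\033[01;31m\033[Kerr\033[m\033[Ky@example.org>')

# grep -vc counts the lines that survive the "terry" exclusion.
# "|| true" only hides grep's exit status 1 when the count is 0.
kept_plain=$(printf '%s\n' "$plain" | grep -vc 'terry' || true)
kept_colored=$(printf '%s\n' "$colored" | grep -vc 'terry' || true)

echo "plain lines kept: $kept_plain, colored lines kept: $kept_colored"
```

The plain line is filtered out, while the colored one slips through because the escape codes split the word.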

How to check latest kernel version by bash

How can I check latest kernel version by bash?
Is there any command to check latest kernel from https://www.kernel.org/ ?
If you are looking for the latest kernel version on the website and not the one on your system, you can use this command. It will work fine unless they change their page layout later. If they do, in that case, you will have to tweak your command:
[root@slave2 gc]# curl -s https://www.kernel.org/ | grep -A1 'mainline:' | grep -oP '(?<=strong>).*(?=</strong.*)'
3.16-rc7
It will return you the 'mainline' release. You can search for 'stable' release using the same logic.
Explanation:
-o Option to print only what matches the pattern.
-P Interpret the pattern as a Perl regular expression.
(?=pattern) A zero-width positive look-ahead assertion. To put it in simple words using an example, q(?=u) matches a q that is followed by a u.
(?<=pattern) A zero-width positive look-behind assertion. To put it in simple words using an example, (?<=a)b matches the b (and only the b) in cab, but does not match bed or debt
So, whatever the look-behind and look-ahead assertions match has to be present around the result but is not part of the reported match, and that's how we get only the version number. :)
You can refer to these links for more detail:
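If your grep was built without -P, a sed capture group gives a similar result. This is a swapped-in technique, not the lookaround approach itself, and the HTML snippet below is a made-up stand-in for one line of the kernel.org page rather than its real markup.

```shell
# Hypothetical one-line stand-in for the table cell that holds the version.
html='<td><strong>3.16-rc7</strong></td>'

# Keep only what sits between <strong> and </strong>; -n plus the p flag
# prints a line only when the substitution succeeded.
version=$(printf '%s\n' "$html" \
  | sed -n 's/.*<strong>\([^<]*\)<\/strong>.*/\1/p')

printf '%s\n' "$version"
```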
http://perldoc.perl.org/perlre.html#Extended-Patterns
http://www.regular-expressions.info/lookaround.html
This has worked since 2017:
curl -s https://www.kernel.org | grep -A1 latest_link | tail -n1 | egrep -o '>[^<]+' | egrep -o '[^>]+'

Is there a way to look for a flag in a man page?

I'm trying to come up with a way to find a specific flag in a man-page. Usually, I type '/'
to search for something, followed by something like '-Werror' to find a specific flag.
The thing is though that there are man-pages (gcc is the one motivating me right now) that
have a LOT of references to flags in their text, so there are a lot of occurrences.
It's not that big of a deal, but maybe it can be done a bit better. I thought of looking for
something like '-O\n' but it didn't work (probably because the man program doesn't use C escapes?)
Then I tried something like man gcc | grep $'-O\n', since I kind of recall that a
single-quoted string preceded by a dollar sign makes bash interpret common C escapes...
It didn't work; grep echoed the whole man page.
That's what has brought me here: why? or rather, can this be done?
rici's helpful answer explains the problem with the original approach well.
However, there's another thing worth mentioning:
man's output contains formatting control characters, which interfere with text searches.
If you pipe to col -b before searching, these control characters are removed - note the side effect that the search results will be plain-text too.
However, grep is not the right tool for this job; I suggest using awk as follows to obtain the description of -O:
man gcc | col -b | awk -v RS= '/^\s+-O\n/'
RS= (an empty input-record separator) is an awk idiom that breaks the input into blocks of non-empty lines, so matching the option at the start of such a block ensures that all lines comprising the description of the option are returned.
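To see paragraph mode in isolation, here is a small sketch that runs on a made-up two-option excerpt instead of a real man page. It uses [ \t] rather than \s or [[:blank:]] only so that it also runs on older awk builds; that choice is mine and not part of the answer.

```shell
# Made-up excerpt: two option blocks separated by an empty line.
page=$(printf '%s\n' \
  '       -O' \
  '           Optimize for speed.' \
  '' \
  '       -g' \
  '           Produce debugging information.')

# With RS set to the empty string, each block of non-empty lines is one
# record, so the whole description of -O is printed, not just its first line.
desc=$(printf '%s\n' "$page" | awk -v RS= '/^[ \t]+-O\n/')

printf '%s\n' "$desc"
```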
If you have a POSIX-features-only awk such as BSD/OSX awk, use this version:
man gcc | col -b | awk -v RS= '/^[[:blank:]]+-O\n/'
Obviously, such a command is somewhat cumbersome to type, so find generic bash function manopt below, which returns the description of the specified option for the specified command from its man page. (There can be false positives and negatives, but overall it works pretty well.)
Examples:
manopt gcc O # search `man gcc` for description of `-O`
manopt grep regexp # search `man grep` for description of `--regexp`
manopt find '-exec.*' # search `man find` for all actions _starting with_ '-exec'
bash function manopt() - place in ~/.bashrc, for instance:
# SYNOPSIS
# manopt command opt
#
# DESCRIPTION
# Returns the portion of COMMAND's man page describing option OPT.
# Note: Result is plain text - formatting is lost.
#
# OPT may be a short option (e.g., -F) or long option (e.g., --fixed-strings);
# specifying the preceding '-' or '--' is OPTIONAL - UNLESS with long option
# names preceded only by *1* '-', such as the actions for the `find` command.
#
# Matching is exact by default; to turn on prefix matching for long options,
# quote the prefix and append '.*', e.g.: `manopt find '-exec.*'` finds
# both '-exec' and '-execdir'.
#
# EXAMPLES
# manopt ls l # same as: manopt ls -l
# manopt sort reverse # same as: manopt sort --reverse
# manopt find -print # MUST prefix with '-' here.
# manopt find '-exec.*' # find options *starting* with '-exec'
manopt() {
  local cmd=$1 opt=$2
  [[ $opt == -* ]] || { (( ${#opt} == 1 )) && opt="-$opt" || opt="--$opt"; }
  man "$cmd" | col -b | awk -v opt="$opt" -v RS= '$0 ~ "(^|,)[[:blank:]]+" opt "([[:punct:][:space:]]|$)"'
}
fish implementation of manopt():
Contributed by Ivan Aracki.
function manopt
  set -l cmd $argv[1]
  set -l opt $argv[2]
  if not echo $opt | grep '^-' >/dev/null
    if [ (string length $opt) = 1 ]
      set opt "-$opt"
    else
      set opt "--$opt"
    end
  end
  man "$cmd" | col -b | awk -v opt="$opt" -v RS= '$0 ~ "(^|,)[[:blank:]]+" opt "([[:punct:][:space:]]|$)"'
end
I suspect you didn't actually use grep $'-O\n', but rather some flag recognized by grep.
From grep's point of view, you are simply passing an argument, and that argument starts with a - so it's going to be interpreted as an option. You need to do something like grep -- -O$ to explicitly flag the end of the list of options, or grep -e -O$ to explicitly flag the pattern as a pattern. In any event, you cannot include a newline in a pattern because grep patterns are actually lists of patterns separated by newline characters, so the argument $'foo\n' is actually two patterns, foo and the empty string, and the empty string will match every line.
Perhaps you searched for the flag -e since that takes a pattern as an argument, and giving it a newline as an argument will cause grep to find every line in the whole file.
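Both forms can be tried on a few made-up lines instead of a full man page. This is only a sketch of the two invocations mentioned above; the sample text is invented.

```shell
# Made-up sample: one line that ends in -O, one that merely contains it.
text=$(printf '%s\n' '       -O' '       -O2' 'no option here')

# "--" ends option parsing, so -O$ is taken as the pattern.
with_dd=$(printf '%s\n' "$text" | grep -- '-O$')

# "-e" marks the next argument as a pattern, even though it starts with "-".
with_e=$(printf '%s\n' "$text" | grep -e '-O$')

printf '[%s]\n[%s]\n' "$with_dd" "$with_e"
```

Both print only the line that ends in -O.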
For most GNU programs, such as gcc, you might find the info interface easier to navigate in, since it includes reference links, tables of contents, and even indices. The info gcc document includes an index of options, which is very useful. In some linux distributions, and somewhat surprisingly since they call themselves GNU/linux distributions, it's necessary to separately install info packages although man files are distributed with the base software. The debian/ubuntu package containing the gcc info files is called gcc-doc, for example. (The use of the -doc suffix to the package name is quite common.)
In the case of gcc you can rapidly find an option using a command like:
info gcc "option index" O
or
info gcc --index-search=funroll-loops
For programs with fewer options, it's usually good enough to use info's -O option:
info -O gawk
The thing is that 'man' uses a pager, commonly 'less', whose man-page states:
/pattern
Search forward in the file for the N-th line containing the pattern.
N defaults to 1. The pattern is a regular expression, as recognized by the
regular expression library supplied by your system. The search starts at the
first line displayed (but see the -a and -j options, which change this).
So one could try searching for '-O$' in a man page to find a flag that sits alone on its
own line. However, it is common for a flag to be followed by text on the very same line,
so this is not guaranteed to work.
The issue with grep and $'-O\n' is still a mystery though.
man gcc | grep "\-"
This works pretty well, as it displays all flags and usually not much more.
Edit: I notice I didn't completely answer your question, but I hope my suggestion can be considered as a nice alternative.
I use the following:
man some_command | col -b | grep -A5 -- 'your_request'
Examples:
man man | col -b | grep -A5 -- '-K'
man grep | col -b | grep -A5 -- '-e patt'
You can make alias for it.
The manly Python utility is very convenient for getting a quick explanation of all options used in a given command.
Note that it only outputs the first paragraph of the option descriptions.
pip install manly
$ manly blkid /dev/sda -o value -p
blkid - locate/print block device attributes
============================================
-o, --output format
Use the specified output format. Note that the order of vari‐
ables and devices is not fixed. See also option -s. The format
parameter may be:
-p, --probe
Switch to low-level superblock probing mode (bypassing the
cache).
A double dash (--) is used by most bash built-in commands and many other commands to signify the end of command options:
https://unix.stackexchange.com/a/11382/204245
Without the double-dash, grep is trying to use whatever flag you are looking for:
$ man curl | grep -c # Looks for this c flag, but can't find one so throws the error below.
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]
If you use a double dash to signify the end of grep's options, it works a bit better, but you still end up with every occurrence of the match:
$ man curl | grep -- -c
--cacert <file>
certs file named 'curl-ca-bundle.crt', either in the same direc-
--capath <dir>
curl to make SSL-connections much more efficiently than using
--cacert if the --cacert file contains many CA certificates.
--cert-status
--cert-type <type>
-E, --cert <certificate[:password]>
--ciphers <list of ciphers>
# ...many more matches......
So simply wrap the flag in quotes and throw a space before it to only match the -c flag:
$ man curl | grep -- " -c"
-c, --cookie-jar <filename>
This has driven me insane for years. Hope this helps.
man chooses its pager based on an environment variable (MANPAGER, falling back to PAGER; not EDITOR). You can change it from the default pager to another viewer you prefer, and then search and browse the man page there as you like.

Question about shell commands and grep

Does anyone know why
grep "p\{2\}" textfile
will find "apple" if it's in the file, but
grep p\{2\} textfile
won't?
I'm new to using a command line and regular expressions, and this is puzzling me.
Although this has already been answered, but since you are new to all this stuff, here is how to debug it:
-- get the pid of current shell (using ps).
PID TTY TIME CMD
1611 pts/0 00:00:00 su
1619 pts/0 00:00:00 bash
1763 pts/0 00:00:00 ps
-- from some other shell, attach strace (system call tracer) to the required pid (here 1619):
strace -f -o <output_file> -p 1619
-- Run both the commands that you tried
-- open the output file and look for exec family calls for the required process, here: grep
The output on my machine is some thing like:
1723 execve("/bin/grep", ["grep", "--color=auto", "p{2}", "foo"], [/* 19 vars */]) = 0
1725 execve("/bin/grep", ["grep", "--color=auto", "p\\{2\\}", "foo"], [/* 19 vars */]) = 0
Now you can see the difference how grep was executed in both the cases and can figure out the problem yourself. :)
still the -e flag mystery is yet to be solved....
Without the quotes, the shell processes the argument before grep sees it. Curly brackets '{}' have a special meaning in the shell (brace expansion), much like the asterisk '*', which expands as a wildcard, and the backslash is the shell's own escape character.
With quotes, your complete regex gets passed directly to grep. Without the quotes, grep sees your regex as p{2}.
Edit:
To clarify, without the quotes your backslashes are removed by the shell before your regex is passed to grep.
Try:
echo grep p\{2\} test.txt
And you'll see your output as...
grep p{2} test.txt
The quotes prevent the shell from removing the backslashes before they get to grep. You could also escape your backslashes and it will work without quotes: grep p\\{2\\} test.txt
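A small sketch that captures exactly what each form hands over to the command, using printf in place of grep so the argument itself becomes visible:

```shell
# printf '%s' prints its argument exactly as the shell passed it on.
unquoted=$(printf '%s' p\{2\})     # the shell strips the backslashes
quoted=$(printf '%s' "p\{2\}")     # the quotes keep them

printf 'unquoted argument: %s\n' "$unquoted"
printf 'quoted argument:   %s\n' "$quoted"

# Only the quoted form is a BRE repetition, so only it finds "pp" in "apple".
# "|| true" only hides grep's exit status 1 when the count is 0.
hits_quoted=$(printf 'apple\n' | grep -c "p\{2\}" || true)
hits_unquoted=$(printf 'apple\n' | grep -c p\{2\} || true)

printf 'matches: quoted=%s unquoted=%s\n' "$hits_quoted" "$hits_unquoted"
```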
The first one treats \{2\} as a repetition count, so it matches pp:
echo "apple" | grep 'p\{2\}'
The second one searches for the pattern literally, so it matches p{2}:
echo "ap{2}le" | grep p\{2\}
From the grep man page
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
so these two become functionally equivalent
egrep p{2}
and
grep "p\{2\}"
The first uses EREs (Extended Regular Expressions), the second uses BREs (Basic Regular Expressions). In your example you're using grep (which uses BREs when you don't pass the -E switch) and the pattern is enclosed in quotes, so "\{" is interpreted as a special BRE character.
Your second instance doesn't work because you're just looking for the literal string p{2}, which doesn't exist in your file.
You can demonstrate that grep is interpreting your string as a BRE by trying:
grep "p\{2"
grep will complain
grep: Unmatched \{
