grep date from string - linux

I'm trying to print any date value from a string. For example:
echo "08/08/2018 text here" | grep '/(0\d{1}|1[0-2])\/([0-2]\d{1}|3[0-1])\/(19|20)\d{2}/'
This returns no result. I want to print out only the date value, excluding the text here.

grep doesn't use / delimiters around the regexp, and doesn't need you to escape embedded /.
You need to use the -P option to use a PCRE regexp with GNU grep, so it will recognize \d for digits.
You should put \b around the regexp, to match word boundaries. Otherwise, if the input contains 108/08/2018 it will match the date that starts after 1.
You need the -o option to print only the part of the line that matches, rather than the whole matching line.
echo "08/08/2018 text here" | grep -Po '\b(0\d|1[0-2])/([0-2]\d|3[0-1])/(19|20)\d{2}\b'
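For a grep built without PCRE support, the same idea can be sketched with extended regular expressions. This is only an illustration: extract_dates is a hypothetical helper name, [0-9] stands in for \d, and -w stands in for the \b anchors. It assumes a grep that supports -o and -w (GNU and BSD grep both do).

```shell
# extract_dates (hypothetical name) reads text on stdin and prints each
# MM/DD/YYYY date it finds, one per line.
extract_dates() {
  # -o prints only the matched part, -w requires a word boundary on both
  # sides of the match, -E enables extended regex syntax.
  grep -owE '(0[0-9]|1[0-2])/([0-2][0-9]|3[01])/(19|20)[0-9]{2}'
}

echo "08/08/2018 text here" | extract_dates                       # prints 08/08/2018
echo "108/08/2018 text here" | extract_dates || echo "no match"   # prints no match
```

The second call shows the effect of the word boundary: the date is preceded by the digit 1, so it is not reported.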

Related

To grep string from file and read first matching string with full variable name

I have a file named file.txt, and I am trying to read the first string in it that matches my search pattern. The problem is that my command prints the entire line, whereas I want only the variable that matches the pattern, with its full variable name. In this example that is warning_duration=""; and my search pattern is duration *=. I have posted the commands I tried, their results, and the expected result.
Please help !!!
file.txt
warning_type="";warning_threshold="";warning_duration="";oemhp_power_micro_ver="";previous_warning_threshold="";
duration=19;
duration =1;
Commands I tried:
cat file.txt | grep -m1 "duration *="
warning_type="";warning_threshold="";warning_duration="";oemhp_power_micro_ver="";previous_warning_threshold="";
cat file.txt | grep -oP -m1 "duration *="
duration=
Expected result:
warning_duration="";
You may use this grep command:
grep -m1 -woE "[_[:alnum:]]*duration *=[^;]*" file
warning_duration=""
Details:
-o: Only show matches
-E: Enable extended regex
-w: Word search
[_[:alnum:]]*: Match 0 or more of a _ or alphanumeric characters
duration *=: Match duration followed by 0 or more spaces and =
[^;]*: Match 0 or more of any character that are not ;
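To see these options working together, the sketch below recreates the sample file from the question in a temporary location and runs the same command on it. The helper name first_duration_var is hypothetical and only there to make the call reusable.

```shell
# Recreate the sample file from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
warning_type="";warning_threshold="";warning_duration="";oemhp_power_micro_ver="";previous_warning_threshold="";
duration=19;
duration =1;
EOF

# first_duration_var (hypothetical name) prints the first variable whose
# name ends in "duration", together with its value, from the file in $1.
first_duration_var() {
  # -m1 stops after the first matching line; -w, -o and -E are as listed above.
  grep -m1 -woE '[_[:alnum:]]*duration *=[^;]*' "$1"
}

first_duration_var "$sample"   # prints warning_duration=""
rm -f "$sample"
```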
Could you please try the following, written and tested with the shown samples. Simple explanation: use awk's match function with the regex ;[_[:alnum:]]+duration="" to get the value the OP requires, e.g. warning_duration=""
awk 'match($0,/;[_[:alnum:]]+duration=""/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
With GNU grep and with your shown samples, you could try the following.
grep -oP '.*?;.*?;\K[_[:alnum:]]+duration=""' Input_file
Explanation: GNU grep's -o option prints only the matched part, and -P enables PCRE regex. The regex uses non-greedy matches to reach the 2nd semicolon, \K discards everything matched so far, and [_[:alnum:]]+ then matches one or more alphanumeric characters (or _) followed by duration="" to get the matched value in the current line.

How to trim a string to either specific character in a bash script

I want to trim a string from one character, the last /, to either : or #, whichever appears first. An example would be:
https://www.example.com/?client=safari/this-text:not-this:or_this
would be trimmed to:
this-text
and
https://www.example.com/?client=safari/this-text#not-this:or_this
would be trimmed to:
this-text
I know I can trim text in bash from a specific character to another character, but is there a way to trim from one character to either of 2 characters?
Use grep like so: grep -Po '^.*/\K[^:#]*'
Examples:
echo 'https://www.example.com/?client=safari/this-text:not-this:or_this' | grep -Po '^.*/\K[^:#]*'
or:
echo 'https://www.example.com/?client=safari/this-text#not-this:or_this' | grep -Po '^.*/\K[^:#]*'
Output:
this-text
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print only the matched parts, each on its own output line, not the entire lines.
The regex ^.*/\K[^:#]* does the following:
^.*/ : Match from the beginning of the string (^) all the way up to the last slash ('/').
\K : Pretend that the match started at this position.
[^:#]* : zero or more occurrences (greedy) of any characters except : or #. This matches either until the end of the line, or until the next : or #, whichever comes first.
SEE ALSO:
grep manual
NOTE:
This works with GNU grep, which may need to be installed, depending on your system. For example, to install GNU grep on macOS, see this answer: https://apple.stackexchange.com/a/357426/329079
With a little Bash function:
trim() {
local str=${1##*/}
printf '%s\n' "${str%%[:#]*}"
}
This first trims everything up to and including the last /, then everything starting from the first occurrence of : or #.
In use:
$ trim 'https://www.example.com/?client=safari/this-text:not-this:or_this'
this-text
$ trim 'https://www.example.com/?client=safari/this-text#not-this:or_this'
this-text
Another way is to use sed: sed -e 's,^.*/,,' -e 's,[:#].*$,,'.
The first -e command (of the form s/regex/replacement/, written here with , as the delimiter so that / needs no escaping) removes text from the start to the last /, then the second -e removes from : or # to the end of the text.
echo 'https://www.example.com/?client=safari/this-text:not-this:or_this' | sed -e 's,^.*/,,' -e 's,[:#].*$,,'
this-text

Do not print unmatched text with sed

I want to print only matched lines and strip unmatched ones, but with the following:
$ echo test12 test | sed -n 's/^.*12/**/p'
I always get:
** test
instead of:
**
What am I doing wrong?
[edit1]
I will provide more information about what I need, which is actually where I should have started. I have a command that produces many lines of output, and I want to grab only the parts of the lines that match and strip the rest. In the above example, 12 was meant to mark the end of the matched part of the line, and instead of ** I should have used &, which represents the matched string. So the full example is:
echo test12 test | sed -n 's/^.*12/&/p'
which produces exactly the same output as input:
test12 test
the expected output is:
test12
As suggested I started to find a grep alternative and the following looks promising:
$ echo test12 test | grep -Eo "^.*12"
but I don't see how to format the matched part; this only strips unmatched text.
EDIT: In some cases, the -E flag might be needed for sed, but then the parentheses don't need to be escaped anymore. Check your sed's man page.
I think what you are looking for is this:
echo test12 test | sed -n 's/^\(.*12\).*$/\1/p'
If you want to discard the rest of the line, you have to match it as well, but not include it in the output. The \( and \) denote a group that is then referenced by \1.
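As a small sketch of the two spellings, the helpers below (hypothetical names) show the escaped-group form and the -E form side by side. The second one also wraps the kept part in brackets to illustrate formatting it. It assumes a sed that accepts -E, which GNU and BSD sed do.

```shell
# keep_through_12 and bracket_through_12 are hypothetical helper names.

# Basic regex syntax: the group is written \( ... \).
keep_through_12() {
  sed -n 's/^\(.*12\).*$/\1/p'
}

# Extended regex syntax (-E): the group is written ( ... ).
# The replacement wraps the captured text in brackets.
bracket_through_12() {
  sed -nE 's/^(.*12).*$/[\1]/p'
}

echo 'test12 test' | keep_through_12      # prints test12
echo 'test12 test' | bracket_through_12   # prints [test12]
echo 'no digits here' | keep_through_12   # prints nothing
```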
Good luck :)
Additional information on sed:
sed works on lines, and the ampersand character (&) represents the part of the line that was matched by the given regular expression. If a regex is "open" at the end (i.e. doesn't end with the end-of-line anchor $), the text after the match is not part of the match, so the s command leaves it in place; that is why the trailing text still shows up in the output.
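One way to check what & actually covers is to wrap it in brackets and look at what stays outside them. This is a minimal experiment; mark_match is a hypothetical helper name.

```shell
# mark_match (hypothetical name) wraps whatever the regex matched in brackets.
mark_match() {
  sed 's/^.*12/[&]/'
}

echo 'test12 test' | mark_match   # prints [test12] test
```

Only test12 ends up inside the brackets, and the rest of the line is passed through untouched.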
Try:
echo test12 test | sed -n 's/^.*/**/p'
Here ^.* matches the entire line, so nothing is left over after the replacement. Note that this no longer tests for 12; to restrict the substitution to lines that contain 12, add an address: sed -n '/12/s/^.*/**/p'.
Your regular expression matches everything from the beginning of the line up to '12'. All the matched text is replaced with '**', which is why you get '** test'. If you only want the match, I recommend using grep.

find words in two quotes unix

I would like to display the last word on these lines. I tried searching for the word value, for example, but got no result, so I thought of looking for the words between quotes; however, my file contains other quoted words that I don't need. What I actually want is to display the values of the select tag in my HTML file.
grep '*' hosts.html | awk '{print $NF}'
For example:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
I would have
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
You need to set the field separator to >, which you do with the -F option:
$ awk -F'>' '{print $NF}' hosts.html
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
Note: I'm not sure what you are trying to achieve by grep '*' hosts.html?
Interpreting the comment liberally, you have input lines which might contain:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
and you would like the names which are repeated on a line as the output:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
This can be done using sed and capturing parentheses.
sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p"
The -n says "don't print unless I say to do so". The s///p command prints if the substitute works. The pattern looks for a stream of 'anything' (.*), a single quote, captures what's inside up to the next single quote ('\([^']*\)') followed by any text, the captured text (the first \1), and anything. The replacement text is what was captured (the second \1).
Example:
$ cat data
www and wotnot
value='www.visit-tunisia.com'>www.visit-tunisia.com
blah
value='www.watania1.tn'>www.watania1.tn
hooplah
value='www.watania2.tn'>www.watania2.tn
if 'nothing' is required, nothing will be done.
$ sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p" data
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
nothing
$
Clearly, you can refine the [^']* part of the match if you want to. I used double quotes around the expression since the pattern matches on single quotes. Life is trickier if you need to allow both single and double quotes; at that point, I'd put the script into a file and run sed -f script data to make life easier.
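As a rough sketch of that script-file approach, the helper below writes a two-line sed script to a temporary file and runs it with sed -f. The helper name quoted_repeats and the second (double-quote) rule are assumptions added for illustration, not part of the original answer. A line that happens to satisfy both rules would be printed twice.

```shell
# quoted_repeats (hypothetical name) prints the quoted text that is repeated
# later on the same line, for both single- and double-quoted values.
quoted_repeats() {
  script=$(mktemp)
  # The quoted heredoc keeps the shell from touching quotes and backslashes.
  cat > "$script" <<'EOF'
s/.*'\([^']*\)'.*\1.*/\1/p
s/.*"\([^"]*\)".*\1.*/\1/p
EOF
  sed -n -f "$script" "$@"
  rm -f "$script"
}

printf '%s\n' "value='www.watania1.tn'>www.watania1.tn" \
              'value="www.watania2.tn">www.watania2.tn' \
              'blah' | quoted_repeats
# prints www.watania1.tn and www.watania2.tn, one per line
```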
sed 's/.*>\(.*\)/\1/g' your_file

Prepend to regex match

I have a variable in a bash script that I need to replace. The only constant in the line is that it ends in "_(*x)xxxp.mov", where the x's are digits and there can be either 3 or 4 of them. For example, I know how to replace the value, but only if it is a constant:
echo 'whiteout-tlr1_1080p.mov' | sed 's/_[0-9]*[0-9][0-9][0-9]p.mov/_h1080p.mov/g'
How can I carry over the regex match to replacement line?
Edit:
OK, I just learned that grep can print only the match. Would it be better to do something like this?
urltrail=$(echo $# | grep -o [0-9]*[0-9][0-9][0-9]p.mov)
newurl=$(sed 's/$urltrail/h$urltrail/g')
Hmm, tried the above but am getting a hang.
Back Reference
sed 's/_\([0-9]*[0-9][0-9][0-9]\)p\.mov/_h\1p.mov/g'
The back-reference \n, where n is a single digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.
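A short sketch of the back-reference in use, covering both a 4-digit and a 3-digit resolution. The helper name add_h_prefix and the second file name are made up for illustration.

```shell
# add_h_prefix (hypothetical name) inserts an "h" before the resolution
# digits; \1 carries the captured digits over into the replacement.
add_h_prefix() {
  sed 's/_\([0-9]*[0-9][0-9][0-9]\)p\.mov/_h\1p.mov/g'
}

echo 'whiteout-tlr1_1080p.mov' | add_h_prefix   # prints whiteout-tlr1_h1080p.mov
echo 'whiteout-tlr1_720p.mov' | add_h_prefix    # prints whiteout-tlr1_h720p.mov
```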
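You're not piping the old path into sed, so sed hangs waiting for input. Also, the shell does not expand variables inside single quotes, so the sed expression needs double quotes.
newurl=$(echo $# | sed "s/$urltrail/h$urltrail/g")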

Resources