How to grep the string with specific pattern - linux

I am trying to grep file.txt for lines that contain two strings, cp and (target file name), where each line in the file looks like this:
cp (source file name) (target file name)
The problem is that '(target file name)' follows a specific pattern: /path/to/file/TC_12_IT_(6 digits)_(6 digits)_TC_12_TEST _(2 digits).tc12.tc12
I am using the grep command below to find lines containing these two strings:
grep -E cp.*/path/to/file/TC_12_IT_ file.txt
How can I be more specific about (target file name) in the grep command so that it matches the whole pattern? Something like this:
grep -E 'cp.*/path/to/file/TC_12_IT_*_*_TC_12_TEST_*.tc12.tc12' file.txt
Can we use wildcards in grep to search for a string in a file, just as we can use a wildcard like * when listing files? For example:
ls -lrt TC_12_*_12345678.txt
Please suggest any other ways to achieve this.

More specifically:
grep -P '^cp\s+.+\s+\S+/TC_12_IT_\d{6}_\d{6}_TC_12_TEST _\d{2}[.]tc12[.]tc12$' in_file > out_file
^ : beginning of the line.
\s+ : 1 or more whitespace characters.
.+ : 1 or more any characters.
\S+ : 1 or more non-whitespace characters.
\d{6} : exactly 6 digits.
[.] : literal dot (.). Note that just plain . inside a regular expression means any character, unless it is inside a character class ([.]) or escaped (\.).
$ : end of the line.
SEE ALSO:
GNU grep manual
perlre - Perl regular expressions
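Where grep -P is not available, the same idea can be sketched with POSIX extended regular expressions (grep -E). This is only an illustration: the sample lines are made up, and the target name is assumed to have no space before the final underscore.

```shell
# Hypothetical sample data; the source paths and digit groups are invented.
sample='cp /src/a.txt /path/to/file/TC_12_IT_123456_654321_TC_12_TEST_01.tc12.tc12
cp /src/b.txt /path/to/file/TC_12_IT_12345_654321_TC_12_TEST_01.tc12.tc12
mv /src/c.txt /path/to/file/TC_12_IT_123456_654321_TC_12_TEST_01.tc12.tc12'

# [0-9]{6} stands in for \d{6}, [[:space:]] for \s, and [^[:space:]] for \S.
matches=$(printf '%s\n' "$sample" |
  grep -E '^cp[[:space:]]+.+[[:space:]]+[^[:space:]]+/TC_12_IT_[0-9]{6}_[0-9]{6}_TC_12_TEST_[0-9]{2}[.]tc12[.]tc12$')

printf '%s\n' "$matches"
```

Only the first sample line should be printed: the second has a 5-digit group and the third does not start with cp.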

Like this, using GNU grep:
grep -P 'cp.*TC_12_IT_\d{6}_\d{6}_TC_12_TEST_\d{2}.tc12.tc12' file
The regular expression matches as follows:
Node : Explanation
cp : 'cp'
.* : any character except \n (0 or more times, matching as much as possible)
TC_12_IT_ : 'TC_12_IT_'
\d{6} : digits (0-9) (6 times)
_ : '_'
\d{6} : digits (0-9) (6 times)
_TC_12_TEST_ : '_TC_12_TEST_'
\d{2} : digits (0-9) (2 times)
. : any character except \n
tc12 : 'tc12'
. : any character except \n
tc12 : 'tc12'
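To answer the wildcard part of the question: in a regular expression, * repeats the preceding item and . matches any single character, which is different from shell globbing. A small sketch of the difference, using two made-up names and grep -c to count matching lines:

```shell
# Hypothetical names, used only to show the difference.
names='TC_12_TEST_01.tc12.tc12
TC_12_TEST_01Xtc12Ytc12'

# Unescaped dots match any character, so both lines match.
loose=$(printf '%s\n' "$names" | grep -cE 'TEST_[0-9]{2}.tc12.tc12')

# Bracketed dots match only a literal dot, so one line matches.
strict=$(printf '%s\n' "$names" | grep -cE 'TEST_[0-9]{2}[.]tc12[.]tc12')

# Glob-style "_*" means "zero or more underscores" in a regex, so nothing matches.
# grep exits non-zero when nothing matches, hence the "|| true".
glob_style=$(printf '%s\n' "$names" | grep -cE 'TC_12_TEST_*.tc12' || true)

# The regex spelling of "anything" is ".*".
regex_any=$(printf '%s\n' "$names" | grep -cE 'TC_12_TEST_.*[.]tc12')

echo "loose=$loose strict=$strict glob_style=$glob_style regex_any=$regex_any"
```

If the names must contain real dots, [.] or \. is the safer spelling.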

Related

Linux sed regular expression

I have a string:
2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB
I want to get the number 9135, the first value that occurs between '=' and '/'. My current command, shown below, works, but I don't think it's perfect:
sed -r 's/.* = ([0-9]+) .* = .*/\1 /'
I need a neater one; please advise.
You can use
sed -En 's~.*= ([0-9]+) /.*=.*~\1~p'
An awk solution:
awk -F= '{gsub(/\/.*|[^0-9]/,"",$2);print $2}'
Details:
-En - E (or r as in your example) enables the POSIX ERE syntax and n suppresses the default line output
.*= ([0-9]+) /.*=.* - matches any text, = + space, captures one or more digits into Group 1, then matches a space, /, then any text, = and again any text
\1 - replaces with Group 1 value
p - prints the result of the substitution.
Here, ~ are used as regex delimiters in order not to escape / in the pattern.
awk:
-F= - sets the input field separator to =
gsub(/\/.*|[^0-9]/,"",$2) - removes the first / together with the rest of the field, and any other non-digit character
print $2 - prints the modified Field 2 value.
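Both commands can be tried on the sample line from the question. A minimal sketch, assuming a sed that accepts -E (GNU and BSD sed do); the variable names are only for illustration:

```shell
# Sample log line from the question.
line='2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB'

# sed: capture the digits after the first "= " that is still followed by another "=".
from_sed=$(printf '%s\n' "$line" | sed -En 's~.*= ([0-9]+) /.*=.*~\1~p')

# awk: split on "=", then strip everything except the leading digits of field 2.
from_awk=$(printf '%s\n' "$line" | awk -F= '{gsub(/\/.*|[^0-9]/,"",$2);print $2}')

echo "sed: $from_sed"
echo "awk: $from_awk"
```

Both variables should hold 9135 for this input.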
You could also get the first match with grep using -P for Perl-compatible regular expressions.
grep -oP "^.*? = \K\d+(?= /)"
^ Start of string
.*? Match as few characters as possible
= Match a space, then =, then a space
\K Forget what is matched so far
\d+ Match 1 or more digits
(?= /) Assert a space and / to the right
Output
9135
Since you want the material between the first = and the first /, ignoring the spaces, you could use:
sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%'
This uses Extended Regular Expressions (ERE) (-E; -r also works with GNU sed), and searches from the start of the line for a sequence of 'not =' characters, the = character, a space, anything that's not a slash (which is remembered), another space, a slash, and anything that follows, replacing it all with what was remembered. The ^ and $ anchors aren't crucial; it will work the same without them. The % symbols are used instead of / because the searched-for pattern includes a /. If you're sure there'll never be any spaces other than the first and last ones between the = and /, you can use [^ /]* in place of [^/]* and there should be some small (probably immeasurable) performance benefit.
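A quick check of this variant, as a sketch. The second input line (short) is hypothetical; it is there to show that this pattern does not need a second = later in the line.

```shell
# Sample log line from the question.
line='2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB'
# Hypothetical line with only one "=".
short='mem = 42 / 100 MB'

# Capture everything that is not a slash between "= " and " /".
first=$(printf '%s\n' "$line" | sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%')

# Tighter class: the captured text may contain neither spaces nor slashes.
tight=$(printf '%s\n' "$line" | sed -E -e 's%^[^=]*= ([^ /]*) /.*$%\1%')

# Same command on the single-"=" line.
single=$(printf '%s\n' "$short" | sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%')

echo "$first $tight $single"
```

Because [^=]* cannot cross an = sign, the match is pinned to the first = on the line without relying on backtracking.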

Partial replace with sed command

We have a file with some UTF-16 decimal characters and we would like to replace them in the following manner:
Test Line in a file \u343- ? some random words \u1233? 300 \u241? \u208?\cell
The required output is
Test Line in a file \u343- ? some random words UTF16-1233| 300 UTF16-241| UTF16-208|\cell
The requirement is to change \u[0-9]+? to UTF16-[0-9]+|
Replace the initial \u to UTF16- and the ending ? with a pipe |.
Please note that if there is any non-digit character between \u and ?, the sequence should not be replaced.
Using sed to modify the file in place, you can:
Match \\u([0-9]+)\?:
Match a literal \u, match and capture one or more digits, match a literal ?.
Replace UTF16-\1|:
Replace with the string UTF16- followed by the captured group and a literal |.
$ sed -i -E 's/\\u([0-9]+)\?/UTF16-\1|/g' file
$ cat file
Test Line in a file \u343- ? some random words UTF16-1233| 300 UTF16-241| UTF16-208|\cell
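To preview the substitution without touching the file, the same expression can be run on standard input. A minimal sketch, assuming a sed that accepts -E; note that -i takes no argument in GNU sed, while BSD/macOS sed expects a backup suffix after it (possibly an empty one).

```shell
# Sample line from the question; single quotes keep the backslashes literal.
line='Test Line in a file \u343- ? some random words \u1233? 300 \u241? \u208?\cell'

# Same substitution as above, written to stdout instead of editing in place.
converted=$(printf '%s\n' "$line" | sed -E 's/\\u([0-9]+)\?/UTF16-\1|/g')

printf '%s\n' "$converted"
```

The \u343- ? sequence is left alone because a - sits between the digits and the ?.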

grep string with special characters in file

my file contains:
/*uid:68160*/\n SELECT
/*uid:68160*/SELECT
I tried with below:
grep -vF "/[*]uid::[[:digit:]][*]/SELECT"
which helps to remove the 2nd line.
How can I remove the 1st line with grep?
I also tried:
grep -vF "/[*]uid::[[:digit:]][*]/\n SELECT"
Assuming you have a literal text like that,
s='/*uid:68160*/\n SELECT
/*uid:68160*/SELECT
Text'
and you want to remove lines 1 and 2, you may use
grep -Ev '/[*]uid:[[:digit:]]+[*]/(\\n *)?SELECT'
Details
-Ev - E enables POSIX ERE and v will negate the result
/[*]uid:[[:digit:]]+[*]/(\\n *)?SELECT - matches
/[*]uid: - a /*uid: string
[[:digit:]]+ - 1+ digits
[*]/ - a */ string
(\\n *)? - an optional group matching 1 or 0 occurrences of \n two-char combination and then any 0 or more spaces
SELECT - a string
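The pattern can be tried on the three sample lines as follows. This is a sketch; the second command is included only to show why -F does not help here: with -F the brackets are searched for as literal text, so nothing matches and -v keeps every line.

```shell
# The three sample lines; \n here is a literal backslash followed by n.
s='/*uid:68160*/\n SELECT
/*uid:68160*/SELECT
Text'

# -E: bracket expressions are interpreted, so both SELECT lines are excluded.
kept=$(printf '%s\n' "$s" | grep -Ev '/[*]uid:[[:digit:]]+[*]/(\\n *)?SELECT')

# -F: the pattern is a fixed string that never occurs; count the lines -v keeps.
kept_fixed=$(printf '%s\n' "$s" | grep -cvF '/[*]uid:[[:digit:]]+[*]/SELECT')

printf '%s\n' "$kept"
echo "lines kept with -F: $kept_fixed"
```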

Get Text after word at specific position

I have file like this
TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word
I want output like this
TT;12-11-18;text;abc;def;word
TA;12-11-18;teet abc;def;word
I want to match word only if it occurs at position 5 after the date 12-11-18. I do not want the line if word is found later, at the 6th or 7th position. The position count starts from the date 12-11-18.
I tried this command
cat file.txt|grep "word" -n1
This prints every line in which the pattern word is matched. How should I solve my problem?
Try this(GNU awk):
awk -F"[; ]" '/12-11-18/ && $6=="word"' file
Or sed one:
sed -n '/12-11-18;\([^; ]*[; ]\)\{3\}word/p' file
Or grep with basically the same regex(different escape):
grep -E "12-11-18;([^; ]*[; ]){3}word" file
[^; ] means any character that's not ; or a space.
* means match zero or more repetitions of the preceding character/group.
-- [^; ]* means a string of any length that doesn't contain ; or a space; the ^ in [^; ] negates the class.
[; ] means ; or a space, exactly one occurrence.
() groups the above together.
{3} matches three repetitions of the preceding character/group.
As a whole, ([^; ]*[; ]){3} means three ;/space-separated fields, including the delimiters.
As @kvantour points out, if there can be multiple spaces in one place, these commands will give wrong results.
To consider multiple spaces as one separator, then:
awk -F"(;| +)" '/12-11-18/ && $6=="word"'
and
grep -E "12-11-18;([^; ]*(;| +)){3}word"
or GNU sed (POSIX basic regular expressions do not support |, so -r is used to switch to extended regular expressions):
sed -rn '/12-11-18;([^; ]*(;| +)){3}word/p'
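As a quick check, the single-separator awk and grep variants can be run against the four sample lines from the question. A minimal sketch; the variable names are only for illustration:

```shell
# Sample records from the question.
records='TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word'

# awk: split on ; or space and require the 6th field to be exactly "word".
by_awk=$(printf '%s\n' "$records" | awk -F'[; ]' '/12-11-18/ && $6=="word"')

# grep: after the date, allow exactly three delimited fields before "word".
by_grep=$(printf '%s\n' "$records" | grep -E '12-11-18;([^; ]*[; ]){3}word')

printf '%s\n' "$by_awk"
```

Both should select the first and third records. One difference to keep in mind: awk compares the whole field with "word", while the grep pattern would also accept a field that merely starts with word.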

Two pattern match on same sed command

I have the following sed command:
sed -n '/^out(/{n;p}' ${filename} | sed -n '/format/ s/.*format=//g; s/),$//gp; s/))$//gp'
I tried to do it as one line as in:
sed -n '/^out(/{n;}; /format/ s/.*format=//g; s/),$//gp; s/))$//gp' ${filename}
But that also displays the lines I don't want (those that do not match).
What I have is a file with some strings as in:
entry(variable=value)),
format(variable=value)),
entry(variable=value)))
out(variable=value)),
format(variable=value)),
...
I just want the format lines that come right after an out entry, with the trailing )) or ), removed.
You can use this sed command:
sed -nr '/^out[(]/ {n ; s/.*[(]([^)]+)[)].*/\1/p}' your_file
Once an out line is found, it advances to the next line (n) and uses the s command with the p flag to extract only what is inside the parentheses.
Explanation:
I used [(] instead of \(. In an extended regular expression (-r), an unescaped ( outside brackets means grouping; if you want a literal (, you need to escape it as \( or put it inside brackets. Most RE special characters don't need escaping when put inside brackets.
([^)]+) means a group (the parentheses here are RE metacharacters, not literal parentheses) that consists of one or more (+) characters that are not ) (a literal closing parenthesis); the ^ inverts the character class [ ... ].
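The command can be tried on a small made-up input. In this sketch the variable names and values are hypothetical, -E is used in place of -r (GNU sed accepts both), and a ; is added before the closing } so that stricter sed implementations accept the script too.

```shell
# Hypothetical input shaped like the file in the question.
input='entry(x=1)),
format(skip=me)),
entry(z=3)))
out(y=2)),
format(keep=this)),'

# On a line starting with "out(", load the next line (n), then print only
# the text between the first "(" and the following ")".
extracted=$(printf '%s\n' "$input" |
  sed -nE '/^out[(]/ {n ; s/.*[(]([^)]+)[)].*/\1/p;}')

printf '%s\n' "$extracted"
```

Only keep=this should be printed; the format line that follows an entry line is skipped because -n suppresses everything that is not explicitly printed.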
