Why is my vim zero width positive lookbehind storing a backreference?

I am using Vim, and I have the following code:
print "Number 1 = $no1\n";
print "Number 2 = $no2\n";
When I apply the following substitute command
:%s/.*\(\d\\n\)\@<=\(";\)/\1
the result is
1\n
2\n
and when I substitute with backreference \2 instead
:%s/.*\(\d\\n\)\@<=\(";\)/\2
the result is
";
";
I thought that I only had one backreference in the regex (the ";). What was stored in \1 appears to be the pattern I used inside my zero-width positive lookbehind, which I thought would NOT be stored in a backreference.
Am I mistaken?

I think \( always starts a capturing group, even inside a lookbehind. From what I can see from a few attempts, what you want is \%(, which is a non-capturing group.
So basically, rewriting your substitute as:
:%s/.*\%(\d\\n\)\@<=\(";\)/\1
will put
";
into backreference \1, rather than \2.
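The same numbering rule can be checked outside Vim. The sketch below uses Python's re module purely as a stand-in, on the assumption that Python's (?<=...) and (?:...) play the roles of Vim's \@<= and \%(...\). A capturing group inside the lookbehind still takes number 1; making it non-capturing moves (";) down to group 1.

```python
import re

# The two lines from the question; raw strings keep the literal backslash-n.
lines = [
    r'print "Number 1 = $no1\n";',
    r'print "Number 2 = $no2\n";',
]

# Like Vim's \(\d\\n\)\@<= : the group inside the lookbehind is still counted,
# so (";) ends up as group 2.
capturing = re.compile(r'.*(?<=(\d\\n))(";)')

# Like Vim's \%(\d\\n\)\@<= : non-capturing, so (";) is group 1.
non_capturing = re.compile(r'.*(?<=(?:\d\\n))(";)')

for line in lines:
    # Mirror the three substitutions: \1 and \2 with the capturing version,
    # \1 with the non-capturing version.
    print(capturing.sub(r'\1', line),
          capturing.sub(r'\2', line),
          non_capturing.sub(r'\1', line))
```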

When you apply the following substitute command:
:%s/.*\(\d\\n\)\@<=\(";\)/\1
... the result is:
1\n
2\n
This is expected, because you've captured the following expression in the first capturing group (\1):
\(\d\\n\)
... and when you substitute with backreference \2 instead
:%s/.*\(\d\\n\)\@<=\(";\)/\2
... the result is:
";
";
This is expected, because you've captured the following expression in the second capturing group (\2):
\(";\)
I'm unclear what you're trying to do. What output were you expecting from the above substitutions?

Related

Linux sed regular expression

I have a string:
2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB
I want to get the number 9135, i.e. the first occurrence between '=' and '/'. Right now my command is as below; it works, but I don't think it's perfect:
sed -r 's/.* = ([0-9]+) .* = .*/\1 /'
I need a neater one; please advise.

You can use
sed -En 's~.*= ([0-9]+) /.*=.*~\1~p'
An awk solution:
awk -F= '{gsub(/\/.*|[^0-9]/,"",$2);print $2}'
Details:
-En - -E (or -r, as in your example) enables POSIX ERE syntax, and -n suppresses the default line output
.*= ([0-9]+) /.*=.* - matches any text, = and a space, captures one or more digits into Group 1, then matches a space, /, then any text, = and again any text
\1 - replaces with Group 1 value
p - prints the result of the substitution.
Here, ~ are used as regex delimiters in order not to escape / in the pattern.
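To check the pattern outside sed, here is a rough Python translation of the same expression. It assumes Python's backtracking re gives the same match as sed's ERE engine for this pattern and input, which holds here because only the first "= digits /" is followed by another "=". The if m: test stands in for -n plus the p flag, which print only when the substitution succeeded.

```python
import re

# The input line from the question, split across literals for readability.
line = ("2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:"
        "TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: "
        "mem[cur/hi/max] = 9135 / 96586 / 96576 MB, "
        "VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB")

# Same pattern as the sed command: .*= ([0-9]+) /.*=.*
# The trailing /.*=.* forces the captured number to be followed by another "=",
# so only the first "= N /" qualifies.
pattern = re.compile(r'.*= ([0-9]+) /.*=.*')

m = pattern.match(line)
if m:  # like sed -n ... p: print only when the substitution matched
    print(m.group(1))
```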
awk:
-F= - sets the input field separator to =
gsub(/\/.*|[^0-9]/,"",$2) - removes the first / and everything after it, as well as every non-digit character, from Field 2
print $2 - prints the modified Field 2 value.
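The awk steps can be mimicked in Python to see what Field 2 looks like before and after gsub. This is only a sketch of the logic, assuming str.split("=") matches awk -F= here; note that awk's $2 is index 1 in Python.

```python
import re

line = ("2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:"
        "TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: "
        "mem[cur/hi/max] = 9135 / 96586 / 96576 MB, "
        "VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB")

# awk -F= : split the record on "=".
fields = line.split("=")
field2 = fields[1]  # awk's $2

# gsub(/\/.*|[^0-9]/, "", $2): delete the first "/" with everything after it,
# plus every remaining non-digit character (here, the spaces around 9135).
value = re.sub(r'/.*|[^0-9]', '', field2)
print(value)
```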

You could also get the first match with grep using -P for Perl-compatible regular expressions.
grep -oP "^.*? = \K\d+(?= /)"
^ Start of string
.*? Match as few chars as possible
' = ' Match a space, = and a space
\K Forget what has been matched so far
\d+ Match one or more digits (this is what gets printed)
(?= /) Assert a space and / to the right
Output
9135
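Python's re module has no \K, so a direct port of the grep pattern is not possible. The sketch below swaps in a capturing group for the part that \K would keep; the lazy .*? and the (?= /) lookahead stay the same.

```python
import re

line = ("2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:"
        "TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: "
        "mem[cur/hi/max] = 9135 / 96586 / 96576 MB, "
        "VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB")

# ^.*? = (\d+)(?= /)
#   .*?      as few characters as possible, so we stop at the FIRST " = "
#   (\d+)    capture group standing in for grep's \K\d+
#   (?= / )  lookahead: a space and "/" must follow, but are not consumed
m = re.search(r'^.*? = (\d+)(?= /)', line)
print(m.group(1) if m else None)
```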

Since you want the material between the first = and the first /, ignoring the spaces, you could use:
sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%'
This uses Extended Regular Expressions (ERE) (-E; -r also works with GNU sed), and searches from the start of the line for a sequence of 'not =' characters, the = character, a space, anything that's not a slash (which is remembered), another space, a slash, and anything that follows, replacing it all with what was remembered. The ^ and $ anchors aren't crucial; it will work the same without them. The % symbols are used instead of / because the searched-for pattern includes a /. If you're sure there'll never be any spaces other than the first and last ones between the = and /, you can use [^ /]* in place of [^/]* and there should be some small (probably immeasurable) performance benefit.
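The same substitution can be sketched in Python with re.sub, assuming Python's re behaves like GNU sed's ERE for this pattern. The interesting step is the backtracking: [^/]* first grabs "9135 " including the trailing space, then gives that space back so the literal " /" can match.

```python
import re

line = ("2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:"
        "TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: "
        "mem[cur/hi/max] = 9135 / 96586 / 96576 MB, "
        "VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB")

# ^[^=]*     everything up to the first "="
# = ([^/]*)  "= " then remember everything that is not a "/"
#  /.*$      a space, the slash, and the rest of the line
# The whole line is replaced by the remembered part.
result = re.sub(r'^[^=]*= ([^/]*) /.*$', r'\1', line)
print(result)
```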

Bash: extract a part of a string, after a number

I have a few strings like this:
var1="string one=3423423 and something which i don't care"
var2="another bigger string=413145 and something which i don't care"
var3="the longest string ever=23442 and something which i don't care"
These strings are the output of a Python script (which I am not allowed to touch), and I need a way to extract the first part of each string, up to and including the number. Basically, my outputs should be:
"string one=3423423"
"another bigger string=413145"
"the longest string ever=23442"
As you can see, I can't use positions or anything like that, because the number and the string length are not always the same. I assume I would need a regex or something similar, but I don't really understand regexes. Can you please help with a command that can do this?

grep -oP '^.*?=\d+' inputfile
string one=3423423
another bigger string=413145
the longest string ever=23442
Here the -o flag makes grep print only the matching part, and -P enables Perl-compatible regular expressions. \d+ means one or more digits. So ^.*?=\d+ matches from the start of the line, taking as few characters as possible, up to the first = that is followed by digits, and then all of those digits (first match only).
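The same lazy match can be tried in Python, whose re module also supports .*? and \d. This is a sketch for illustration; re.match anchors at the start of the string, which plays the role of the ^.

```python
import re

strings = [
    "string one=3423423 and something which i don't care",
    "another bigger string=413145 and something which i don't care",
    "the longest string ever=23442 and something which i don't care",
]

# .*?=\d+ : as few characters as possible up to an "=" that is followed by
# digits, then all of those digits (\d+ is greedy).
matches = []
for s in strings:
    m = re.match(r'.*?=\d+', s)
    if m:
        matches.append(m.group(0))

for text in matches:
    print(text)
```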

You could use parameter expansion, for example:
var1="string one=3423423 and something which i don't care"
name=${var1%%=*}
value=${var1#*=}
value=${value%%[^0-9]*}
echo "$name=$value"
# prints: string one=3423423
Explanation of ${var1%%=*}:
%% - remove the longest matching suffix
= - match =
* - match everything
Explanation of ${var1#*=}:
# - remove the shortest matching prefix
* - match everything
= - match =
Explanation of ${value%%[^0-9]*}:
%% - remove the longest matching suffix
[^0-9] - match any non-digit
* - match everything
To perform the same thing on more than one value easily,
you could wrap this logic into a function:
extract_and_print() {
local input=$1
local name=${input%%=*}
local value=${input#*=}
value=${value%%[^0-9]*}
echo "$name=$value"
}
extract_and_print "$var1"
extract_and_print "$var2"
extract_and_print "$var3"
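For comparison, the three parameter expansions map onto ordinary string operations. The Python sketch below is a translation of extract_and_print, not part of the original answer; the function name extract is hypothetical. str.partition handles both the "before the first =" and "after the first =" steps, and a leading-digits match replaces ${value%%[^0-9]*}.

```python
import re

def extract(text: str) -> str:
    """Hypothetical helper mirroring extract_and_print from the Bash answer."""
    # ${input%%=*}  -> everything before the first "="
    # ${input#*=}   -> everything after the first "=" (unchanged if there is no "=")
    name, sep, rest = text.partition("=")
    value = rest if sep else text
    # ${value%%[^0-9]*} removes the longest suffix starting with a non-digit,
    # which leaves only the leading run of digits.
    value = re.match(r"[0-9]*", value).group(0)
    return f"{name}={value}"

for s in [
    "string one=3423423 and something which i don't care",
    "another bigger string=413145 and something which i don't care",
    "the longest string ever=23442 and something which i don't care",
]:
    print(extract(s))
```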

$ shopt -s extglob
$ echo "${var1%%+([^0-9])}"
string one=3423423
$ echo "${var2%%+([^0-9])}"
another bigger string=413145
$ echo "${var3%%+([^0-9])}"
the longest string ever=23442
+([^0-9]) is an extended pattern that matches one or more non-digits.
${var%%+([^0-9])} with %%pattern will remove the longest match of that pattern from the end of the variable value.
Refs: Bash Reference Manual, sections "Pattern Matching" and "Shell Parameter Expansion"
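The extglob trick strips the longest suffix made only of non-digits. A rough Python equivalent, assuming a regex anchored at the very end (\Z) is an acceptable stand-in for the %% expansion, looks like this; strip_trailing_non_digits is a hypothetical name.

```python
import re

def strip_trailing_non_digits(s: str) -> str:
    """Hypothetical equivalent of ${s%%+([^0-9])}: drop the trailing run of non-digits."""
    # [^0-9]+\Z only matches a run of non-digits that reaches the end of the
    # string, so everything up to and including the last digit is kept.
    return re.sub(r'[^0-9]+\Z', '', s)

for s in [
    "string one=3423423 and something which i don't care",
    "another bigger string=413145 and something which i don't care",
    "the longest string ever=23442 and something which i don't care",
]:
    print(strip_trailing_non_digits(s))
```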

perl: print remaining string only if there is no character before the matched value.

The following prints the entire content of the line after "B. "
perl -ne'print if /B[.] (.*)/s' $string > file
How can I match/print the line only if there is no other character before the "B. "? In other words, if there is a character before the "B. " (i.e. "TAB."), skip the line and do not print it.
The correct "B." is always on a new line, the only correct line to match appears as follows:
B. some text here

A regex with a leading caret anchors the match to the start of the line, so it only matches if "B. " is the first thing on the line. The pattern /^B[.] (.*)/s should get you the result you're looking for.

Put ^ in front of the B. It anchors the match, so the line must start with B. So your regex should be /^B\. (.*)/. Then you don't need the s flag in your pattern match.
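The effect of the anchor can be shown with Python's re.search, which, like Perl's =~, looks for a match anywhere unless the pattern is anchored. The sample lines are made up for illustration.

```python
import re

# Hypothetical sample lines; only the first should be accepted.
lines = [
    "B. some text here",
    "TAB. this line should be skipped",
    "  B. indented, also skipped",
]

unanchored = re.compile(r'B\. (.*)')   # like /B[.] (.*)/  : "B. " anywhere
anchored = re.compile(r'^B\. (.*)')    # like /^B\. (.*)/  : "B. " at line start

loose_hits = [line for line in lines if unanchored.search(line)]
anchored_hits = []
rests = []
for line in lines:
    m = anchored.search(line)
    if m:
        anchored_hits.append(line)
        rests.append(m.group(1))  # the text after "B. "

print(loose_hits)
print(anchored_hits)
print(rests)
```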

Repeat a number in string twice using sed command

Let's say I have hell0 w0rld, I want it to become hell00 w0rld.
I tried sed s/0/00/, but that only handles 0; it wouldn't work for he1lo wor1d (expected: he11lo wor1d). What can I do so that it doubles the first digit, whatever it is, instead of just 0?

Since you don't want to match just 0 but any digit, you want to use [0-9]. This stands for "any one of the digits 0-9". You put this in escaped parentheses \( \) to "capture" it, and in the replacement string you can then use backreferences:
$ sed 's/\([0-9]\)/\1\1/' <<< "he1lo wor1d"
he11lo wor1d
If you want to repeat the first number (as per the title) and not just a single digit, append \+ to your character class. This stands for "one or more of these" (in a basic regular expression, \+ is a GNU sed extension):
$ sed 's/\([0-9]\+\)/\1\1/' <<< "he12o wor1d"
he1212o wor1d
An alternative to the backreference \1, which refers to the capture group \(...\), is &, which stands for the complete match, i.e.,
sed 's/[0-9]/&&/' <<< "he1lo wor1d"
and
sed 's/[0-9]\+/&&/' <<< "he12lo wor1d"
where the \(...\) are no longer needed.
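In Python the same idea uses re.sub with count=1, because a sed s command without the g flag only replaces the first match. Python has no &, so \g<0> (the whole match) takes its place. A minimal sketch with hypothetical function names:

```python
import re

def double_first_digit(s: str) -> str:
    # Like sed 's/\([0-9]\)/\1\1/': only the first digit is doubled.
    return re.sub(r'([0-9])', r'\1\1', s, count=1)

def double_first_number(s: str) -> str:
    # Like sed 's/[0-9]\+/&&/': \g<0> is Python's spelling of sed's &.
    return re.sub(r'[0-9]+', r'\g<0>\g<0>', s, count=1)

print(double_first_digit("hell0 w0rld"))
print(double_first_digit("he1lo wor1d"))
print(double_first_number("he12o wor1d"))
```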

How to grep/split a word in middle of %% or $$

I have a variable from which I need to extract the word between the % signs and the word that starts with $$. I used split and it works... but only for some scenarios.
Example:
#!/usr/bin/perl
my $lastline ="%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)";
my @lastline_temp = split(/%/,$lastline);
print @lastline_temp;
my @var=split("\\$\\$",$lastline_temp[2]);
print @var;
I get the output as expected. But can I get the same using grep? I mean, I don't want to use array[2] or array[1], so that I can replace the values easily.

I don't really see how you can get the output you expect. Because you put your data in "busy" quotes (interpolating, double, ...), it comes out being stored as:
'%Filters_LN_RESS_DIR%ARCOptionsPegaCHF_Vega$01212_GV_DATE_LDN)'
See Quote and Quote-like Operators and perhaps read Interpolation in Perl
Notice that the backslashes are gone. A backslash in interpolating quotes simply means "treat the next character as literal", so you get literal 'A', literal 'O', literal 'P', ....
That '0' is the value of $( (aka $REAL_GROUP_ID) which you unwittingly asked it to interpolate. So there is no sequence '$$' to split on.
Can you get the same using grep? It depends on what "the same" is. You save the results in arrays; the purpose of grep is to filter elements out of a list. You will have neither the same arrays nor the same output unless you use a trivial grep such as grep {; 1 } @data.
Actually, you can get the exact same result with this regular expression, assuming that the single string in @var is the "result":
m/%([^%]*)$/
Of course, that's no more than
substr( $lastline, rindex( $lastline, '%' ) + 1 );
which can run 8-10 times faster.
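The equivalence between the regex and the substr/rindex version can be checked in Python, using str.rfind in place of rindex. This sketch assumes the single-quoted (non-interpolated) form of the string; like rindex, rfind returns -1 when there is no %, so the slice then keeps the whole string, whereas the regex would simply fail to match.

```python
import re

# Single-quoted form of the string (no Perl interpolation), as a raw string.
lastline = r'%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)'

# m/%([^%]*)$/ : everything after the last "%"
m = re.search(r'%([^%]*)$', lastline)
via_regex = m.group(1) if m else None

# substr($lastline, rindex($lastline, '%') + 1)
via_slice = lastline[lastline.rfind('%') + 1:]

print(via_regex)
print(via_slice)
```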

First, be very careful in your use of quotes. I wonder whether you actually mean
'%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)'
instead of
"%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)"
which might be a different string. For example, if evaluated, "$$" means the variable $PROCESS_ID.
After trying to solve the riddle (I'm not sure I did), and quoting your string differently:
my $lastline =
  '%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)';
I'd use the following to extract your words:
my ($w1, $w2) = $lastline =~ m{ %         # the % char at the start
                                ([^%]+)   # CAPTURE everything until the next %
                                [^(]+     # scan to the first opening parenthesis
                                \(        # hit the parenthesis
                                ([^)]+)   # CAPTURE everything up to the closing parenthesis
                              }x;
print "$w1\n$w2";
Result:
Filters_LN_RESS_DIR
1212_GV_DATE_LDN
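The commented /x pattern translates almost one-to-one to Python's re.VERBOSE flag, where whitespace is ignored and # starts a comment. This is a sketch assuming the single-quoted form of the string.

```python
import re

lastline = r'%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)'

pattern = re.compile(r"""
    %         # the % char at the start
    ([^%]+)   # CAPTURE everything until the next %
    [^(]+     # scan to the first opening parenthesis
    \(        # hit the parenthesis
    ([^)]+)   # CAPTURE everything up to the closing parenthesis
""", re.VERBOSE)

m = pattern.search(lastline)
w1, w2 = m.groups()
print(w1)
print(w2)
```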
But what do you mean by "replace the values easily"? Which values?
Addendum
Now let's extract the "words" delimited by '\'. Using a simple split:
my @words = split /\\/,   # use substr to start splitting after the first '\'
            substr $lastline, index($lastline, '\\') + 1;
you'll get the words between the backslashes if you drop the last entry (which is the $$(..) string):
pop @words;                 # remove the last element '$$(..)'
print join "\n", @words;    # print the other elements
Result:
ARC
Options
Pega
CHF_Vega
Does this work better with grep? Seems to:
my @words = grep /^[^\$%]+$/, split /\\/, $lastline;
and
print join "\n", @words;
also results in:
ARC
Options
Pega
CHF_Vega
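Both Perl approaches (split then pop, and split then grep) can be reproduced in Python for comparison. This sketch assumes the single-quoted form of the string; a list comprehension with re.fullmatch plays the role of Perl's grep /^[^\$%]+$/.

```python
import re

lastline = r'%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)'

# split /\\/, substr($lastline, index($lastline, '\\') + 1); then pop
words = lastline[lastline.index('\\') + 1:].split('\\')
words.pop()  # drop the trailing '$$(1212_GV_DATE_LDN)' element

# grep /^[^\$%]+$/, split /\\/, $lastline
# keep only the pieces that contain neither "$" nor "%"
filtered = [w for w in lastline.split('\\') if re.fullmatch(r'[^$%]+', w)]

print("\n".join(words))
print("\n".join(filtered))
```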
Maybe that is what you are after? What do you want to do with these?
Regards
rbo
