Can somebody tell me the difference between "^string" and "string" in Varnish when we match with ~?
The two do not evaluate the same way,
and when we add a \ it is different again.
e.g.
~ "^index.php/sting" is different from ~ "^/string"
I am totally confused.
Can somebody point me to a cheat sheet or something?
In terms of regex, ^ means the beginning of the string, meaning:
^/string will match /string123 but not /application/string
/string will match both /string123 and /application/string
^string will not match /string but will match string/123 or string123
string will match /string, string, 123string and anything else that contains the word string
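The four cases above can be checked with a short Python sketch. It uses re.search() as a stand-in for Varnish's ~ operator, on the assumption that both look for the pattern anywhere in the input unless it is anchored; the helper name matching and the sample list are made up for illustration.

```python
import re

# The example paths from the explanation above.
paths = ["/string123", "/application/string", "string/123", "123string"]

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    return [c for c in candidates if re.search(pattern, c)]

# Print which paths each pattern accepts.
for pattern in (r"^/string", r"/string", r"^string", r"string"):
    print(f"{pattern!r:12} -> {matching(pattern, paths)}")
```

Only the patterns starting with ^ care about where in the path the text appears; the unanchored ones accept it at any position.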
Please suggest a wildcard for the Firstjson list below.
Firstjson = { p10_7_8 , p10_7_2 , p10_7_3 , p10_7_4 }
I tried the wildcard p10.7.* for the Secondjson list below and it worked. But when I tried p10_7_* for the Firstjson list above, it did not work.
Secondjson = { p10.7.8 , p10.7.2 , p10.7.3 , p10.7.4 }
You are attempting to use wildcard syntax, but Groovy expects regular expression syntax for its pattern matching.
What went wrong with your attempt:
Attempt #1: p10.7.*
A regular expression of . matches any single character and .* matches 0 or more characters. This means:
p10{exactly one character of any kind here}7{zero or more characters of any kind here}
You didn't realize it, but the . character in your first attempt was acting like a single-character wildcard too. This might match with p10x7abcdefg for example. It also does match p10.7.8 though. But be careful, it also matches p10.78, because the .* expression at the end of your pattern will happily match any sequence of characters, thus any and all characters following p10.7 are accepted.
Attempt #2: p10_7_*
_ matches only a literal underscore. But _* means to match zero or more underscores. It does not mean to match characters of any kind. So p10_7_* matches things like p10_7_______. Literally:
p10_7{zero or more underscores here}
What you can do instead:
You probably want a regular expression like p10_7_\d+
This will match things like p10_7_3 or p10_7_422. It works by matching the literal text p10_7_ followed by one or more digits where a digit is 0 through 9. \d matches any digit, and + means to match one or more of the preceding thing. Literally:
p10_7_{one or more digits here}
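The three patterns can be tried out with a small sketch. It is written in Python rather than Groovy and uses re.fullmatch(), on the assumption that the pattern has to cover the whole name; the helper full_matches is a made-up name, and the extra test strings are the ones mentioned in the explanation above.

```python
import re

first = ["p10_7_8", "p10_7_2", "p10_7_3", "p10_7_4"]
second = ["p10.7.8", "p10.7.2", "p10.7.3", "p10.7.4"]

def full_matches(pattern, names):
    """Return the names that the pattern matches from start to end."""
    return [n for n in names if re.fullmatch(pattern, n)]

# Attempt #1: each . is a one-character wildcard, .* accepts anything after it.
print(full_matches(r"p10.7.*", second + ["p10x7abcdefg", "p10.78"]))

# Attempt #2: _* only repeats the underscore, so a trailing digit is rejected.
print(full_matches(r"p10_7_*", first + ["p10_7_______"]))

# Suggested fix: literal prefix followed by one or more digits.
print(full_matches(r"p10_7_\d+", first + ["p10_7_422"]))
```

If literal dots are wanted for the second list, they would need escaping as well, e.g. p10\.7\.\d+.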
I want to use input from a user as a regex pattern for a search over some text. It works, but how I can handle cases where user puts characters that have meaning in regex?
For example, the user wants to search for Word (s): regex engine will take the (s) as a group. I want it to treat it like a string "(s)" . I can run replace on user input and replace the ( with \( and the ) with \) but the problem is I will need to do replace for every possible regex symbol.
Do you know some better way ?
Use the re.escape() function for this:
4.2.3 re Module Contents
escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
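A simplistic example: search for any occurrence of the provided string, optionally followed by 's', and return the match object.
import re

def simplistic_plural(word, text):
    # Escape the word so its characters are matched literally, then allow an optional 's'.
    word_or_plural = re.escape(word) + 's?'
    # re.search() scans the whole text; re.match() would only try the very beginning.
    return re.search(word_or_plural, text)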
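Applied to the Word (s) case from the question, this might look like the sketch below. The function name find_literal and the sample text are invented for illustration.

```python
import re

def find_literal(user_input, text):
    """Search text for user_input, treating every character literally."""
    return re.search(re.escape(user_input), text)

text = "Plural form: Word (s) is allowed"

# Escaped: the parentheses are matched as ordinary characters.
escaped_hit = find_literal("Word (s)", text)

# Unescaped: (s) is a group, so the pattern actually looks for "Word s".
raw_hit = re.search("Word (s)", text)

print(escaped_hit.group(0) if escaped_hit else None)
print(raw_hit)
```

The unescaped search finds nothing here because the text never contains "Word s" without the parenthesis.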
You can use re.escape():
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
>>> import re
>>> re.escape('^a.*$')
'\\^a\\.\\*\\$'
If you are using a Python version < 3.7, this will escape non-alphanumerics that are not part of regular expression syntax as well.
If you are using a Python version < 3.7 but >= 3.3, this will escape non-alphanumerics that are not part of regular expression syntax, except for specifically underscore (_).
Unfortunately, re.escape() is not suited for the replacement string, because the backslash it adds ends up in the output (shown here with '.', since re.escape() no longer escapes '_' as of Python 3.3):
>>> re.sub('a', re.escape('.'), 'aa')
'\\.\\.'
A solution is to put the replacement in a lambda:
>>> re.sub('a', lambda _: '.', 'aa')
'..'
because the return value of the lambda is treated by re.sub() as a literal string.
Usually you escape the string that you feed into a regex so that the regex treats those characters literally. Remember that you type strings into your computer and the computer inserts the specific characters. When you see \n in your editor, it's not really a newline until a parser decides it is. It's two characters: a backslash followed by n. In a normal Python string literal, "\n" is turned into a newline, which is what print then displays. If you write r"\n", Python keeps the raw thing you typed in (as far as I understand).
To complicate things further, there is another syntax/grammar going on with regexes. The regex parser interprets the string it receives differently than Python's print would. I believe this is why we are recommended to pass raw strings like r"(\n+)", so that the regex receives what you actually typed. However, the regex will receive a parenthesis and won't match it as a literal parenthesis unless you tell it to explicitly, using the regex's own syntax rules. For that you need something like r"(\fun \( x : nat \) :)": here the first pair of parens won't be matched literally, since without backslashes it is a capture group, but the second pair will be matched as literal parens.
Thus we usually do re.escape(regex) to escape things we want to be interpreted literally, i.e. things that the regex parser would otherwise treat specially (parens, spaces etc.) will be escaped. E.g. code I have in my app:
# Escapes the special characters so the string matches as an arbitrary literal. I think the reason this is here
# is to help tell the escaped literal text apart from the regex we are inserting in the next line.
__ppt = re.escape(_ppt)  # e.g. a parenthesis ( is then matched literally instead of opening a group
e.g. see these strings:
_ppt
Out[4]: '(let H : forall x : bool, negb (negb x) = x := fun x : bool =>HEREinHERE)'
__ppt
Out[5]: '\\(let\\ H\\ :\\ forall\\ x\\ :\\ bool,\\ negb\\ \\(negb\\ x\\)\\ =\\ x\\ :=\\ fun\\ x\\ :\\ bool\\ =>HEREinHERE\\)'
print(rf'{_ppt=}')
_ppt='(let H : forall x : bool, negb (negb x) = x := fun x : bool =>HEREinHERE)'
print(rf'{__ppt=}')
__ppt='\\(let\\ H\\ :\\ forall\\ x\\ :\\ bool,\\ negb\\ \\(negb\\ x\\)\\ =\\ x\\ :=\\ fun\\ x\\ :\\ bool\\ =>HEREinHERE\\)'
the double backslashes I believe are there so that the regex receives a literal backslash.
btw, I am surprised it printed double backslashes instead of a single one. If anyone can comment on that it would be appreciated. I'm also curious how to match literal backslashes now in the regex. I assume it's 4 backslashes but I honestly expected only 2 would have been needed due to the raw string r construct.
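On the doubled backslashes: as far as I know this is just the repr() form of the string, which is what Out[...] and the = specifier in f-strings display, and each stored backslash is shown as two. A minimal sketch, using a shorter made-up input than the one above:

```python
import re

escaped = re.escape("(x)")

print(len(escaped))   # number of characters actually stored
print(escaped)        # str() form: one backslash before each parenthesis
print(repr(escaped))  # repr() form: every backslash is shown doubled
print(f"{escaped=}")  # the = specifier uses repr() too

# Matching one literal backslash: the regex needs the two characters \\ ,
# written r"\\" as a raw string or "\\\\" as a normal string literal.
backslash_pattern = r"\\"
found = re.search(backslash_pattern, r"a\b")
print(found.group(0))
```

So the regex itself needs two backslashes for one literal backslash; four are only needed when the pattern is written without the r prefix.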
I'm looking for a little help on some Lua. I need some code to match this exact line:
efs.test efs.test.gpg
Here's what I have so far, which matches "efs.test":
if string.match(a.message, "%a+%a+%a+.%%a+%a+%a+%a+") then
    print(a.message)
else
    print("Does not match")
end
I've also tried this, which matches:
if string.match(a.message, "efs.test") then
    print(a.message)
else
    print("Does not match")
end
But when I try to add the extra text my compiler errors with "Number expected, got string" when running this code:
if string.match(a.message, "efs.test", "efs") then
    print(a.message)
else
    print("Does not match")
end
Any pointers would be great!
Thanks.
if string.match(a.message, "%a+%a+%a+.%%a+%a+%a+%a+") then
Firstly, this is a wrong use of quantifiers. From PiL 20.2:
+ 1 or more repetitions
* 0 or more repetitions
- also 0 or more repetitions
? optional (0 or 1 occurrence)
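In other words, you ask for more %a+ right after a %a+ that can already match the whole word, so the extra ones are redundant. Note also that an unescaped . matches any character and %% matches a literal percent sign; a literal dot is written %.
To match efs.test efs.test.gpg: I suppose we have 2 filenames here. Assuming, strictly, that file names contain only %w (alphanumeric characters, A-Za-z0-9), this would correctly match efs.test: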
string.match(message, "%w+%.%w+")
Going one step further, match efs.test as filename and the following filename:
string.match(message, "%w+%.%w+ %w+%.%w+%.gpg")
While this would match both filenames, you would need to check if matched filenames are the same. We can go one step further yet:
local file, gpgfile = string.match(message, "(%w+%.%w+) (%1%.gpg)")
This pattern will return any <filename> <filename>.gpg where the filenames are equal.
With the use of capture groups, we capture the filename: it will be returned as the first variable and can be referred to later in the pattern as %1. Then after the space char, we try to match %1 (the captured filename) followed by .gpg. Since that part is also enclosed in parentheses, it becomes the second capture group and is returned as the second variable. Done!
PS: You may want to grab ".gpg" by case-insensitive [Gg][Pp][Gg] pattern.
PPS: File names may contain spaces, dashes, UTF-8 characters etc. E.g. ext4 only forbids \0 and / characters.
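For comparison, the same back-reference idea can be sketched in Python. This is only an illustration, not a translation of the Lua API: the names PAIR and split_pair are invented, \w is slightly broader than Lua's %w (it also accepts _ and non-ASCII letters), and re.IGNORECASE stands in for the [Gg][Pp][Gg] suggestion.

```python
import re

# Rough Python counterpart of the Lua pattern "(%w+%.%w+) (%1%.gpg)".
# \1 refers back to whatever the first group captured.
PAIR = re.compile(r"(\w+\.\w+) (\1\.gpg)", re.IGNORECASE)

def split_pair(message):
    """Return (filename, gpg_filename) if both are present, else None."""
    m = PAIR.search(message)
    return m.groups() if m else None

print(split_pair("efs.test efs.test.gpg"))
print(split_pair("efs.test other.test.gpg"))
```

The second call finds nothing because the name before .gpg differs from the first captured name.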
string.match's optional third argument is the index in the given string at which to start searching, which is why passing "efs" there fails with "Number expected, got string". If you are looking for exactly efs.test efs.test.gpg in that order with that given spacing, why not just use:
string.match(a.message, "efs%.test efs%.test%.gpg")
If you want to match the entire line containing that substring:
string.match(a.message, ".*efs%.test efs%.test%.gpg.*")
For reference
If you are trying to match that exact line, it's way easier to just use:
if "efs.test efs.test.gpg" == a.message then
    print(a.message)
else
    print("string does not match!")
end
Of course this wouldn't find any other strings than this.
Another interpretation I see for your question is that you want to know if it has efs.test in the string, which you should be able to accomplish by doing:
if string.match(a.message, "%w+%.%w+") == "efs.test" then
...
end
Also, look into regex: Lua's string patterns are basically a simplified version of it, with some differences.
I have a string column [VEHICLE] that contains row variations of "car", "CAR", "car" and "car1". I'm trying to use a limit data by expression to exclude all of those variations. I've tried Lower([VEHICLE]) ~= "*car*" but it isn't working. Any ideas?
You were very close. In Limit Data Using Expression use this instead.
IF(Lower([Vehicle]) ~= "car*",true,false)
or even better... in case you have car$ or something that isn't a-z
IF(Lower([Vehicle]) ~= "car.*",true,false)
or if you expect something to come before car... like thisCar1 use this:
IF(Lower([Vehicle]) ~= ".*car.*",true,false)
In the second example, . is any character and * says to match 0 or more of the preceding item. Without the ., which is what you had, it's saying match 0 or more instances of... nothing. You just have to give it something to reference.
Remember, ~= uses regular expressions.
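The difference between these patterns can be seen in a short Python sketch. It uses Python's re module with whole-value matching purely to make the differences visible; whether ~= in the tool above matches the whole value or searches inside it is not something this example claims. The helper names keep and is_valid are invented, and thisCar1 and truck are sample values added for contrast.

```python
import re

values = ["car", "CAR", "car1", "thisCar1", "truck"]

def keep(pattern, rows):
    """Return the rows whose lower-cased value is matched in full by the pattern."""
    return [v for v in rows if re.fullmatch(pattern, v.lower())]

def is_valid(pattern):
    """Report whether Python's regex engine accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(keep(r"car*", values))    # "ca" followed by zero or more "r"
print(keep(r"car.*", values))   # "car" followed by anything
print(keep(r".*car.*", values)) # "car" anywhere in the value
print(is_valid("*car*"))        # a leading * has nothing to repeat
```

In Python the wildcard-style "*car*" is rejected outright; other engines may react differently to the same pattern.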
I'm trying to find exact matches of strings in Lua, including special characters. I want the example below to report an exact match, but because of the - character it returns nil:
index = string.find("test-string", "test-string")
returns nil
index = string.find("test-string", "test-")
returns 1
index = string.find("test-string", "test")
also returns 1
How can I get it to do full matching?
- is a pattern operator in a Lua string pattern, so when you say test-string, you're telling find() to match the t just before it as few times as possible. So what happens is it looks at test-string, finds tes and t in there, and since - isn't an actual minus sign in this case, it then expects string to follow immediately: it's really looking for something like teststring.
Do as Mike has said and escape it with the % character.
I found this helpful for better understanding patterns.
You can also ask for a plain substring match that ignores magic characters:
string.find("test-string", "test-string",1,true)
You need to escape special characters in the pattern with the % character.
So in this case you are looking for
local index = string.find('test-string', 'test%-string')
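If the patterns are generated outside Lua, the % escaping can be automated. The sketch below is a hypothetical Python helper (escape_lua_pattern is an invented name) that prefixes the Lua magic characters ^ $ ( ) % . [ ] * + - ? with %; inside Lua itself, the plain-find variant shown earlier is probably simpler.

```python
import re

# The magic characters of Lua string patterns.
LUA_MAGIC = re.compile(r"[\^\$\(\)%\.\[\]\*\+\-\?]")

def escape_lua_pattern(text):
    """Prefix every Lua magic character in text with %."""
    # A function is used as the replacement so its result is inserted literally.
    return LUA_MAGIC.sub(lambda m: "%" + m.group(0), text)

print(escape_lua_pattern("test-string"))
print(escape_lua_pattern("efs.test efs.test.gpg"))
```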