Up to a few days ago my Sublime text 3 was working just fine. I could search/replace regular strings and use regular expressions patterns as well and when a capture group got a match, all of them were highlighted perfectly.
However, since yesterday, everything I search is matching... reversely. Here:
image:\s*"?(.*?)"?
This should match a fixed string image, followed by a colon, any number of spaces, if any, and anything between optional quotes.
Not a big deal, right? However Sublime is capturing the string image instead of what I've defined to be captured. Even if there are no spaces or quotes, it should at least match what's after the colon, not before it:
I did a fresh install, reinstalling and reconfiguring the very few plugins I use, trying to, maybe, get rid of any sort of caching, without luck.
And this is a major setback for me 'cause I can't do batch replacements all over a project.
There are only two things I did differently than my regular development routine:
Installed String 2 Lower Hyphen Plugin to speed-up the creation of some dashed separated URI slugs BUT when fresh installing I didn't add it back and the problem persisted.
For the first time, I used the expression <open files> to do a batch replacement in a specific set of files I had manually opened since they're in different directories.
Nothing more than that.
I can workaround the issue by changing the .*? to a .* but this is a palliative measure since I always used the non-greedy version without problems
Does anyone know what could be happening?
I'm not sure how your regex used to match any differently, let's think about what the regex is saying:
image: - the literal image:
\s* - any amount of whitespace, including none
"? - an optional quote
(.*?) - lazily capture anything except a newline character into capture group 1
"? an optional quote
So for your example text to match, it matches image: and the space after it, then, there is no quote, the next instruction is lazy so it captures nothing into capture group 1, then there is no quote, so that is the full extent of the match.
If you always want to capture the value in capture group 1, regardless of whether it was a quoted or unquoted string, you could instead consider using an expression like:
\bimage:\s*"?((?(?<=")[^"]*|.*$))"?
\b word boundary, to ignore image: not being the start of a word
image: literal image:
\s* any amount of whitespace, including none (depending on your source document and requirements, it may be better/more defensive to specify a literal space so that newlines won't be matched here)
"? optional quote
( begin capture group 1
(?(?<=") conditional - lookbehind to see if a quote matched
[^"]* if a quote matched, then match all non-quote characters (of course, we could also check for escaped quotes if your file is YAML format or similar, but URLs shouldn't contain quotes, so we're leaving it out as per the original regex.)
| otherwise, if the conditional didn't match, i.e. there was no quote after image:
.*$ match everything until the newline - again, if this is YAML, you may want to consider excluding comments etc.
) end conditional
) end capture group
"? optional quote (this will never match at $ if the conditional fails)
Related
I am new here. I wanted to ask a question on using REGEX for an entity in DialogFlow
I wanted the entity to accept all text and spaces except for the symbol *
I have tried to use [A-Za-z0-9 ][^*], but it is not working. Any advice. thanks!
In your Regex expression, [^*] means "capture any character at the start of the line." To refer to a literal asterisk rather than matching any character, you need to use \*
If you want to match a line of letters or numbers as in the [A-Za-z0-9] example you give, but only if that string does not include an asterisk, then this expression should work for you:
^[a-zA-Z0-9]+$
This means "match a whole line of text if it only contains one or more of the characters a-z, A-Z, or 0-9".
If you want to match any character or group of characters in a line except for the asterisk, then you could use something like this:
(?!\*)([a-zA-Z0-9]+)(?<!\*)
The first part is called a "negative lookahead," and it looks forward to ensure we're not matching the asterisk. The last part is called a "negative lookbehind," and it looks backwards to make sure we're not matching the asterisk. The middle part is your "capture group," and confirms that you're matching any letters or numbers in a given string, but excluding the * character.
If this Regex gets input like *abc, it will capture abc. If it encounters abc*, it will still capture abc. If it encounters abc*def, it will capture abc and def separately in two capture groups, because it will break around the asterisk.
This link explains the concept of lookarounds in Regex. You can also use this Regex tester to get started practicing your Regular Expressions with explanations of what each block of characters does.
EDITED TO ADD If you're just interested in matching single characters rather than groups of characters, you can use [A-Za-z0-9] and match any upper or lowercase letter and any single digit. You don't need to exclude the * character, because the character group is already exclusive.
This is a slight duplicate of the question below, so responses here may also help you. Hope this helps!
How can I exclude asterisk in a regex expression
[A-Za-z0-9 ][^*]
What you regex will do is match 2 consecutive characters. First, it will look for anything A-Za-z0-9 . Then, it will look at the negated set that includes *, and will match ANY character except *.
You can type your regex into https://regexr.com/ to see a breakdown of how it matches and test some strings.
For example, your regex would match these:
Aa
AA
a&
A1
0_
But would not match these:
A*
a*
1*
And WOULD NOT match anything longer than 2 characters. If you really want to match any string with any characters except *, this should work:
[^\*]+
What that will do is match any number of consecutive characters that are not *. (The + means match 1 or more characters in the set). It is also a good idea to escape * because it is also a reserved character in regex. Even though most regex parsers are smart enough to know that inside a group you probably mean the literal char *, it is still a best practice to escape it. (And by that same token, you would want to use \s instead of the blank space in your original regex.)
https://regex101.com/r/sB9wW6/1
(?:(?<=\s)|^)#(\S+) <-- the problem in positive lookbehind
Working like this on prod: (?:\s|^)#(\S+), but I need a correct start index (without space).
Here is in JS:
var regex = new RegExp(/(?:(?<=\s)|^)#(\S+)/g);
Error parsing regular expression: Invalid regular expression:
/(?:(?<=\s)|^)#(\S+)/
What am I doing wrong?
UPDATE
Ok, no lookbehind in JS :(
But anyways, I need a regex to get the proper start and end index of my match. Without leading space.
Make sure you always select the right regex engine at regex101.com. See an issue that occurred due to using a JS-only compatible regex with [^] construct in Python.
JS regex - at the time of answering this question - did not support lookbehinds. Now, it becomes more and more adopted after its introduction in ECMAScript 2018. You do not really need it here since you can use capturing groups:
var re = /(?:\s|^)#(\S+)/g;
var str = 's #vln1\n#vln2\n';
var res = [];
while ((m = re.exec(str)) !== null) {
res.push(m[1]);
}
console.log(res);
The (?:\s|^)#(\S+) matches a whitespace or the start of string with (?:\s|^), then matches #, and then matches and captures into Group 1 one or more non-whitespace chars with (\S+).
To get the start/end indices, use
var re = /(\s|^)#\S+/g;
var str = 's #vln1\n#vln2\n';
var pos = [];
while ((m = re.exec(str)) !== null) {
pos.push([m.index+m[1].length, m.index+m[0].length]);
}
console.log(pos);
BONUS
My regex works at regex101.com, but not in...
First of all, have you checked the Code Generator link in the Tools pane on the left?
All languages - "Literal string" vs. "String literal" alert - Make sure you test against the same text used in code, literal string, at the regex tester. A common scenario is copy/pasting a string literal value directly into the test string field, with all string escape sequences like \n (line feed char), \r (carriage return), \t (tab char). See Regex_search c++, for example. Mind that they must be replaced with their literal counterparts. So, if you have in Python text = "Text\n\n abc", you must use Text, two line breaks, abc in the regex tester text field. Text.*?abc will never match it although you might think it "works". Yes, . does not always match line break chars, see How do I match any character across multiple lines in a regular expression?
All languages - Backslash alert - Make sure you correctly use a backslash in your string literal, in most languages, in regular string literals, use double backslash, i.e. \d used at regex101.com must written as \\d. In raw string literals, use a single backslash, same as at regex101. Escaping word boundary is very important, since, in many languages (C#, Python, Java, JavaScript, Ruby, etc.), "\b" is used to define a BACKSPACE char, i.e. it is a valid string escape sequence. PHP does not support \b string escape sequence, so "/\b/" = '/\b/' there.
All languages - Default flags - Global and Multiline - Note that by default m and g flags are enabled at regex101.com. So, if you use ^ and $, they will match at the start and end of lines correspondingly. If you need the same behavior in your code check how multiline mode is implemented and either use a specific flag, or - if supported - use an inline (?m) embedded (inline) modifier. The g flag enables multiple occurrence matching, it is often implemented using specific functions/methods. Check your language reference to find the appropriate one.
line-breaks - Line endings at regex101.com are LF only, you can't test strings with CRLF endings, see regex101.com VS myserver - different results. Solutions can be different for each regex library: either use \R (PCRE, Java, Ruby) or some kind of \v (Boost, PCRE), \r?\n, (?:\r\n?|\n)/(?>\r\n?|\n) (good for .NET) or [\r\n]+ in other libraries (see answers for C#, PHP). Another issue related to the fact that you test your regex against a multiline string (not a list of standalone strings/lines) is that your patterns may consume the end of line, \n, char with negated character classes, see an issue like that. \D matched the end of line char, and in order to avoid it, [^\d\n] could be used, or other alternatives.
php - You are dealing with Unicode strings, or want shorthand character classes to match Unicode characters, too (e.g. \w+ to match Стрибижев or Stribiżew, or \s+ to match hard spaces), then you need to use u modifier, see preg_match() returns 0 although regex testers work - To match all occurrences, use preg_match_all, not preg_match with /...pattern.../g, see PHP preg_match to find multiple occurrences and "Unknown modifier 'g' in..." when using preg_match in PHP?- Your regex with inline backreference like \1 refuses to work? Are you using a double quoted string literal? Use a single-quoted one, see Backreference does not work in PHP
phplaravel - Mind you need the regex delimiters around the pattern, see https://stackoverflow.com/questions/22430529
python - Note that re.search, re.match, re.fullmatch, re.findall and re.finditer accept the regex as the first argument, and the input string as the second argument. Not re.findall("test 200 300", r"\d+"), but re.findall(r"\d+", "test 200 300"). If you test at regex101.com, please check the "Code Generator" page. - You used re.match that only searches for a match at the start of the string, use re.search: Regex works fine on Pythex, but not in Python - If the regex contains capturing group(s), re.findall returns a list of captures/capture tuples. Either use non-capturing groups, or re.finditer, or remove redundant capturing groups, see re.findall behaves weird - If you used ^ in the pattern to denote start of a line, not start of the whole string, or used $ to denote the end of a line and not a string, pass re.M or re.MULTILINE flag to re method, see Using ^ to match beginning of line in Python regex
- If you try to match some text across multiple lines, and use re.DOTALL or re.S, or [\s\S]* / [\s\S]*?, and still nothing works, check if you read the file line by line, say, with for line in file:. You must pass the whole file contents as the input to the regex method, see Getting Everything Between Two Characters Across New Lines. - Having trouble adding flags to regex and trying something like pattern = r"/abc/gi"? See How to add modifers to regex in python?
c#, .net - .NET regex does not support possessive quantifiers like ++, *+, ??, {1,10}?, see .NET regex matching digits between optional text with possessive quantifer is not working - When you match against a multiline string and use RegexOptions.Multiline option (or inline (?m) modifier) with an $ anchor in the pattern to match entire lines, and get no match in code, you need to add \r? before $, see .Net regex matching $ with the end of the string and not of line, even with multiline enabled - To get multiple matches, use Regex.Matches, not Regex.Match, see RegEx Match multiple times in string - Similar case as above: splitting a string into paragraphs, by a double line break sequence - C# / Regex Pattern works in online testing, but not at runtime - You should remove regex delimiters, i.e. #"/\d+/" must actually look like #"\d+", see Simple and tested online regex containing regex delimiters does not work in C# code - If you unnecessarily used Regex.Escape to escape all characters in a regular expression (like Regex.Escape(#"\d+\.\d+")) you need to remove Regex.Escape, see Regular Expression working in regex tester, but not in c#
dartflutter - Use raw string literal, RegExp(r"\d"), or double backslashes (RegExp("\\d")) - https://stackoverflow.com/questions/59085824
javascript - Double escape backslashes in a RegExp("\\d"): Why do regex constructors need to be double escaped?
- (Negative) lookbehinds unsupported by most browsers: Regex works on browser but not in Node.js - Strings are immutable, assign the .replace result to a var - The .replace() method does change the string in place - Retrieve all matches with str.match(/pat/g) - Regex101 and Js regex search showing different results or, with RegExp#exec, RegEx to extract all matches from string using RegExp.exec- Replace all pattern matches in string: Why does javascript replace only first instance when using replace?
javascriptangular - Double the backslashes if you define a regex with a string literal, or just use a regex literal notation, see https://stackoverflow.com/questions/56097782
java - Word boundary not working? Make sure you use double backslashes, "\\b", see Regex \b word boundary not works - Getting invalid escape sequence exception? Same thing, double backslashes - Java doesn't work with regex \s, says: invalid escape sequence - No match found is bugging you? Run Matcher.find() / Matcher.matches() - Why does my regex work on RegexPlanet and regex101 but not in my code? - .matches() requires a full string match, use .find(): Java Regex pattern that matches in any online tester but doesn't in Eclipse - Access groups using matcher.group(x): Regex not working in Java while working otherwise - Inside a character class, both [ and ] must be escaped - Using square brackets inside character class in Java regex - You should not run matcher.matches() and matcher.find() consecutively, use only if (matcher.matches()) {...} to check if the pattern matches the whole string and then act accordingly, or use if (matcher.find()) to check if there is a single match or while (matcher.find()) to find multiple matches (or Matcher#results()). See Why does my regex work on RegexPlanet and regex101 but not in my code?
scala - Your regex attempts to match several lines, but you read the file line by line (e.g. use for (line <- fSource.getLines))? Read it into a single variable (see matching new line in Scala regex, when reading from file)
kotlin - You have Regex("/^\\d+$/")? Remove the outer slashes, they are regex delimiter chars that are not part of a pattern. See Find one or more word in string using Regex in Kotlin - You expect a partial string match, but .matchEntire requires a full string match? Use .find, see Regex doesn't match in Kotlin
mongodb - Do not enclose /.../ with single/double quotation marks, see mongodb regex doesn't work
c++ - regex_match requires a full string match, use regex_search to find a partial match - Regex not working as expected with C++ regex_match - regex_search finds the first match only. Use sregex_token_iterator or sregex_iterator to get all matches: see What does std::match_results::size return? - When you read a user-defined string using std::string input; std::cin >> input;, note that cin will only get to the first whitespace, to read the whole line properly, use std::getline(std::cin, input); - C++ Regex to match '+' quantifier - "\d" does not work, you need to use "\\d" or R"(\d)" (a raw string literal) - This regex doesn't work in c++ - Make sure the regex is tested against a literal text, not a string literal, see Regex_search c++
go - Double backslashes or use a raw string literal: Regular expression doesn't work in Go - Go regex does not support lookarounds, select the right option (Go) at regex101.com before testing! Regex expression negated set not working golang
groovy - Return all matches: Regex that works on regex101 does not work in Groovy
r - Double escape backslashes in the string literal: "'\w' is an unrecognized escape" in grep - Use perl=TRUE to PCRE engine ((g)sub/(g)regexpr): Why is this regex using lookbehinds invalid in R?
oracle - Greediness of all quantifiers is set by the first quantifier in the regex, see Regex101 vs Oracle Regex (then, you need to make all the quantifiers as greedy as the first one)] - \b does not work? Oracle regex does not support word boundaries at all, use workarounds as shown in Regex matching works on regex tester but not in oracle
firebase - Double escape backslashes, make sure ^ only appears at the start of the pattern and $ is located only at the end (if any), and note you cannot use more than 9 inline backreferences: Firebase Rules Regex Birthday
firebasegoogle-cloud-firestore - In Firestore security rules, the regular expression needs to be passed as a string, which also means it shouldn't be wrapped in / symbols, i.e. use allow create: if docId.matches("^\\d+$").... See https://stackoverflow.com/questions/63243300
google-data-studio - /pattern/g in REGEXP_REPLACE must contain no / regex delimiters and flags (like g) - see How to use Regex to replace square brackets from date field in Google Data Studio?
google-sheets - If you think REGEXEXTRACT does not return full matches, truncates the results, you should check if you have redundant capturing groups in your regex and remove them, or convert the capturing groups to non-capturing by add ?: after the opening (, see Extract url domain root in Google Sheet
sed - Why does my regular expression work in X but not in Y?
word-boundarypcrephp - [[:<:]] and [[:>:]] do not work in the regex tester, although they are valid constructs in PCRE, see https://stackoverflow.com/questions/48670105
snowflake-cloud-data-platform snowflake-sql - If you are writing a stored procedure, and \\d does not work, you need to double them again and use \\\\d, see REGEX conversion of VARCHAR value to DATE in Snowflake stored procedure using RLIKE not consistent.
I have the following string in the code at multiple places,
m_cells->a[ Id ]
and I want to replace it with
c(Id)
where the string Id could be anything including numbers also.
A regular expression replace like below should do:
%s/m_cells->a\[\s\(\w\+\)\s\]/c(\1)/g
If you wish to apply the replacement operation on a number of files you could use the :bufdo command.
Full explanation of #BasBossink's answer (as a separate answer because this won't fit in a comment), because regexes are awesome but non-trivial and definitely worth learning:
In Command mode (ie. type : from Normal mode), s/search_term/replacement/ will replace the first occurrence of 'search_term' with 'replacement' on the current line.
The % before the s tells vim to perform the operation on all lines in the document. Any range specification is valid here, eg. 5,10 for lines 5-10.
The g after the last / performs the operation "globally" - all occurrences of 'search_term' on the line or lines, not just the first occurrence.
The "m_cells->a" part of the search term is a literal match. Then it gets interesting.
Many characters have special meaning in a regex, and if you want to use the character literally, without the special meaning, then you have to "escape" it, by putting a \ in front.
Thus \[ and \] match the literal '[' and ']' characters.
Then we have the opposite case: literal characters that we want to treat as special regex entities.
\s matches white*s*pace (space, tab, etc.).
\w matches "*w*ord" characters (letters, digits, and underscore _).
(. matches any character (except a newline). \d matches digits. There are more...)
If a character is not followed by a quantifier, then exactly one such character matches. Thus, \s will match one space or tab, but not fewer or more.
\+ is a quantifier, and means "one or more". (\? matches 0 or 1; * (with no backslash) matches any number: zero or more. Warning: matching on zero occurrences takes a little getting used to; when you're first learning regexes, you don't always get the results you expected. It's also possible to match on an arbitrary exact number or range of occurrences, but I won't get into that here.)
\( and \) work together to form a "capturing group". This means that we don't just want to match on these characters, we also want to remember them specially so that we can do something with them later. You can have any number of capturing groups, and they can be nested too. You can refer to them later by number, starting at 1 (not 0). Just start counting (escaped) left-parantheses from the left to determine the number.
So here, we are matching a space followed by a group (which we will capture) of at least one "word" character followed by a space, within the square brackets.
Then section between the second and third / is the replacement text.
The "c" is literal.
\1 means the first captured group, which in this case will be the "Id".
In summary, we are finding text that matches the given description, capturing part of it, and replacing the entire match with the replacement text that we have constructed.
Perhaps a final suggestion: c after the final / (doesn't matter whether it comes before or after the 'g') enables *c*onfirmation: vim will highlight the characters to be replaced and will show the replacement text and ask whether you want to go ahead. Great for learning.
Yes, regexes are complicated, but super powerful and well worth learning. Once you have them internalized, they're actually fairly easy. I suggest that, as with learning vim itself, you start with the basics, get fluent in them, and then incrementally add new features to your repertoire.
Good luck and have fun.
I have several functions that start with get_ in my code:
get_num(...) , get_str(...)
I want to change them to get_*_struct(...).
Can I somehow match the get_* regex and then replace according to the pattern so that:
get_num(...) becomes get_num_struct(...),
get_str(...) becomes get_str_struct(...)
Can you also explain some logic behind it, because the theoretical regex aren't like the ones used in UNIX (or vi, are they different?) and I'm always struggling to figure them out.
This has to be done in the vi editor as this is main work tool.
Thanks!
To transform get_num(...) to get_num_struct(...), you need to capture the correct text in the input. And, you can't put the parentheses in the regular expression because you may need to match pointers to functions too, as in &get_distance, and uses in comments. However, and this depends partially on the fact that you are using vim and partially on how you need to keep the entire input together, I have checked that this works:
%s/get_\w\+/&_struct/g
On every line, find every expression starting with get_ and continuing with at least one letter, number, or underscore, and replace it with the entire matched string followed by _struct.
Darn it; I shouldn't answer these things on spec. Note that other regex engines might use \& instead of &. This depends on having magic set, which is default in vim.
For an alternate way to do it:
%s/get_\(\w*\)(/get_\1_struct(/g
What this does:
\w matches to any "word character"; \w* matches 0 or more word characters.
\(...\) tells vim to remember whatever matches .... So, \(w*\) means "match any number of word characters, and remember what you matched. You can then access it in the replacement with \1 (or \2 for the second, etc.)
So, the overall pattern get_\(\w*\)( looks for get_, followed by any number of word chars, followed by (.
The replacement then just does exactly what you want.
(Sorry if that was too verbose - not sure how comfortable you are with vim regex.)
I have a line in a source file: [12 13 15]. In vim, I type:
:%s/\([0-90-9]\) /\0, /g
wanting to add a coma after 12 and 13. It works, but not quite, as it inserts an extraspace [12 , 13 , 15].
How can I achieve the desired effect?
Use \1 in the replacement expression, not \0.
\1 is the text captured by the first \(...\). If there were any more pairs of escaped parens in your pattern, \2 would match the text capture between the pair starting at the second \(, \3 at the third \(, and so on.
\0 is the entire text matched by the whole pattern, whether in parentheses or not. In your case this includes the space at the end of your pattern.
Also note that [0-90-9] is the same as [0-9]: each [...] collection matches just one character. It happens to work anyway, because in your data ‘a digit followed by a space’ matches in the same places as ‘2 digits followed by a space’. (If you actually needed to only insert commas after 2 digits, you could write [0-9][0-9].)
"I have a line in a source file:..."
then you type :%s/... this will do the substitution on all lines, if it matched. or that is the single line in your file?
If it is the single line, you don't have to group, or [0-9], just :%s/ \+/,/g will do the job.
The fine answers already point interesting solutions, but here's another one,
making use of the \zs, which marks the start of the match. In this pattern:
/[0-9]\zs /
The searched text is /[0-9] /, but only the space counts as a match. Note
that you can use the class \d to simplify the digit character class, so the
following command shall work for your needs:
:s/\d\d\zs /, /g ; matches only the space, replace by `, '
You said you have multiple lines and these changes are only to certain lines.
You can either visually select the lines to be changed or use the :global
command, which searches for lines matching a pattern and applies a command to
them. Now you'd need to build an expression to match the lines to be changed
in a less precise as possible way. If the lines that begins with optional
spaces, a [ and two digits are the only lines to be matched and no other
ones, then this would work for you:
:g/\s*[\d\d/s/\d\d\zs /, /g
Check the help for pattern.txt for \ze and similar and
:global.
Homework: use the help to understand \zs and see how this works:
:s/\d\d\zs\ze /,/g