How to capture a string between parentheses?

str = "fa, (captured)[asd] asf, 31"
for word in str:gmatch("\(%a+\)") do
print(word)
end
Hi! I want to capture a word between parentheses.
My code should print the string "captured", but I get this syntax error instead:
lua: /home/casey/Desktop/test.lua:3: invalid escape sequence near '\('
Of course, I could just find the positions of the parentheses and use string.sub, but I'd prefer simpler code.
Also, square brackets give me a similar error.

The escape character in Lua patterns is %, not \. So use this:
word=str:match("%((%a+)%)")
If you only need one match, there is no need for a gmatch loop.
To capture the string in square brackets, use a similar pattern:
word=str:match("%[(%a+)%]")
If the captured string is not entirely composed of letters, use .- instead of %a+.

lhf's answer likely gives you what you need, but I'd like to mention one more option that I feel is underused and may work for you as well. One issue with using %((%a+)%) is that it doesn't work for nested parentheses: if you apply it to something like "(text(more)text)", you'll get "more" even though you may expect "text(more)text". Note that you can't fix it by asking to match to the first closing parenthesis (%(([^%)]+)%)) as it will give you "text(more".
However, you can use the %bxy pattern item, which balances occurrences of x and y and will match (text(more)text) in this case (you'd need to wrap it in a capture, as in (%b()), to get it back from string.match inside a larger pattern). Again, this may be overkill for your case, but it is useful to keep in mind and may help someone else who comes across this problem.
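To make the balancing idea concrete, here is a minimal sketch of what a %b()-style match does, written in Python only so that it can be run and checked on its own. The function name match_balanced is hypothetical, and the two re.search calls are merely Python analogues of the Lua patterns discussed above, not Lua code.

```python
import re

nested = "(text(more)text)"

# Analogue of %((%a+)%): only the innermost, letters-only group matches.
innermost = re.search(r"\(([A-Za-z]+)\)", nested).group(1)    # "more"

# Analogue of %(([^%)]+)%): stops at the first closing parenthesis.
first_close = re.search(r"\(([^)]+)\)", nested).group(1)      # "text(more"


def match_balanced(text, open_ch="(", close_ch=")"):
    """Return the first balanced open_ch...close_ch span in text, or None.

    Hypothetical helper: counts nesting depth the way a %bxy item does,
    and retries from the next opener if the current one never closes.
    """
    start = text.find(open_ch)
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == open_ch:
                depth += 1
            elif text[i] == close_ch:
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
        start = text.find(open_ch, start + 1)
    return None


print(match_balanced(nested))                                   # (text(more)text)
print(match_balanced("fa, (captured)[asd] asf, 31", "[", "]"))  # [asd]
```

Like %b(), the result includes the outer delimiters; slicing with [1:-1] drops them.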

Related

Split only when quotes contain 3 characters or more

I'm trying to split a given string wherever two quotes appear that contain at least 3 characters (I also have to split wherever . or , appears). So, something like hello'example'hello,cat should return [hello;example;hello;cat].
I came up with:
re.split("\'(...+)\'|\.|,","hello'example'hello,cat")
This works fine with the quotes, but whenever it splits on . or , this happens:
['hello', 'example', 'hello', None, 'cat']
I found out the capture group is the one that causes it (the None in the middle of the list), but it is the only way I know to keep the content.
Please keep in mind that I have to do as few computations as possible because the program has to work with huge files. Also, I'm not very experienced with Python, so sorry if I did something obviously wrong.
Try just:
re.split("\'|\.|,", "hello'example'hello,cat")
It's tricky because the opening quote and the closing quote are the exact same character. I think you'd have to use a negative lookbehind to exclude any single quote that is preceded by 0, 1 or 2 characters and another single quote. In addition, you'd have to use a positive lookahead. This pattern works in JavaScript:
re.split("(?<!'(.?|..))'(?=[^']{3,})|\.|\,", "hello'example'hello,cat")
But Python's re module does not support variable-length lookbehinds, so it rejects this pattern. Also, this won't work if there is a lone single quote (an apostrophe).
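Another option, sketched here under a few assumptions, is to keep the capturing group from the question and simply drop the None entries that re.split produces when the . or , branch matches. The helper name split_fields is hypothetical, and the group uses [^']{3,} instead of the question's ...+ so that one quoted run cannot swallow a later quote; that change is an assumption about the intended input.

```python
import re

# Quoted runs of 3+ non-quote characters are captured, so re.split keeps them.
# "." and "," are plain delimiters; for those matches the group is None.
_SPLITTER = re.compile(r"'([^']{3,})'|[.,]")


def split_fields(text):
    """Split on quoted runs (kept) and on '.' or ',' (dropped)."""
    return [part for part in _SPLITTER.split(text) if part is not None]


print(split_fields("hello'example'hello,cat"))
# ['hello', 'example', 'hello', 'cat']
```

The pattern is compiled once, and the filter is a single pass over the result, which keeps the extra work small for large inputs.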

Nodejs equivalent of c sscanf

I need a function that behaves similarly to sscanf.
For example, let's suppose we have a format string that looks like this (the function I'm looking for doesn't have to be exactly like this, but something similar):
"This is normal text that has to exactly match, but here is a ${var}"
and have it return or modify a variable to look like
{'var': <whatever was there>}
After researching this for a while, the only thing I could actually find was scanf, but that takes input from stdin, not from a string.
I am aware that there is a regex solution for this, but I'm looking for a function that does this without the need for regex (regex is slow). However, if there is no other solution for this, I will accept a regex solution.
The normal solution for this in most languages that have regular expressions built in is to use regular expressions.
If you're not used to or don't like regular expressions, I'm sorry. Most of the programming world has assumed that knowledge of regular expressions is mandatory.
In any case, the normal solution to this is String.prototype.match:
let text = get_string_to_scan();
let match = text.match(/This is normal text that has to exactly match, but here is a (.+)/);
if (match) { // match is null if no match is found
    // The result you want is in match[1]
    console.log('value of var is:', match[1]);
}
What pattern you put in your capture group (the (..) part) depends on what you want. The code above captures anything at all including spaces and special characters.
If you just want to capture a "word", that is, a run of letters, digits and underscores without spaces, then you can use (\w+):
text.match(/This is normal text that has to exactly match, but here is a (\w+)/)
If you want to capture a word with only letters but not numbers you can use ([a-zA-Z]+):
text.match(/This is normal text that has to exactly match, but here is a ([a-zA-Z]+)/)
The flexibility of regular expressions is why other methods of string scanning are usually not supported in languages that have had regular expressions built in since the beginning. But of course, flexibility comes with complexity.
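If the ${var} placeholder syntax from the question matters, the pattern can be generated from the format string instead of written by hand. The following is a minimal sketch of that idea in Python rather than Node.js, chosen only so it can be run and checked in isolation; compile_template and scan are hypothetical names, and the sketch assumes placeholder names are valid identifiers and that the whole input must match the template.

```python
import re

# Matches ${name}; the name is captured so re.split returns it between literals.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_]\w*)\}")


def compile_template(template):
    """Build a regex from a template: literal text is escaped, ${name} becomes a named group."""
    pieces = _PLACEHOLDER.split(template)  # literal, name, literal, name, ..., literal
    pattern = ""
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            pattern += re.escape(piece)            # literal text must match exactly
        else:
            pattern += "(?P<%s>.+?)" % piece       # placeholder captures 1+ characters
    return re.compile(pattern)


def scan(template, text):
    """Return a dict of captured placeholders, or None if text does not fit the template."""
    match = compile_template(template).fullmatch(text)
    return match.groupdict() if match else None


fmt = "This is normal text that has to exactly match, but here is a ${var}"
print(scan(fmt, "This is normal text that has to exactly match, but here is a value"))
# {'var': 'value'}
```

The same approach carries over to JavaScript, where named groups are written (?<name>...) and the result is read from match.groups.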
Do you mean for ${var} to act as a placeholder? If so, you could replace the double quotes with backticks to get a template literal:
console.log(`This is normal text that has to exactly match, but here is a ${"whatever was there"}`)

vim non-greedy unexpected behavior

I'm using vim (version 7.3).
On the following line
1xAxBx4
where A and B can be any alphanumerical character, I want to replace xBx4 with foo. I tried the following substitution command
:s/x.\{-}x4/foo/
and get 1foo instead of what I expected (1xAfoo). I can get 1xAfoo if I use this substitution command
:s/x[^A]x4/foo/
but this is too specific and won't be helpful if I want to replace on multiple lines, as "A" could be a different character on each line.
Why the unexpected behavior with \{-}? Or is this exactly what one would expect, and I'm just misunderstanding the syntax?
Though you've correctly used the non-greedy \{-} quantifier, nothing is consumed before it, so the match still starts at the first x in the line and only then matches as few characters as possible. Because that attempt succeeds, the engine never moves on to a later starting position.
Now, you need to add a greedy match before your expression, yet keep those characters out of the replaced text. This can be achieved with \zs, which lets the match start only after it:
:s/.*\zsx.\{-}x4/foo/
This is not the use case for "non-greedy".
x.\{-}x4 makes sense when, for example, you want to replace:
xAAAx4BBBx4CCCx4 -> ######BBBx4CCCx4
Without \{-} (that is, with x.*x4), the result would be ######.
If it is known that there is only a single character between x and x4, just use x.x4, or, to keep a space from being matched, x\Sx4.
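The leftmost-start behavior is not specific to vim. As a rough illustration, here is the same experiment with Python's re module, which is an assumption of this sketch and not something the question uses. Python has no \zs, so a greedy captured prefix that is put back in the replacement stands in for it.

```python
import re

line = "1xAxBx4"

# Non-greedy, but the match still starts at the leftmost x,
# so everything from the first x onward is replaced.
leftmost = re.sub(r"x.*?x4", "foo", line, count=1)

# A greedy prefix pushes the start of the replaced part as far right as possible.
# The prefix is captured and re-inserted because Python lacks \zs.
rightmost = re.sub(r"(.*)x.*?x4", r"\1foo", line, count=1)

# The example from the second answer: non-greedy stops at the first x4.
shortest = re.sub(r"x.*?x4", "######", "xAAAx4BBBx4CCCx4", count=1)

print(leftmost)   # 1foo
print(rightmost)  # 1xAfoo
print(shortest)   # ######BBBx4CCCx4
```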

Add parenthesis to vim string

I'm having a bit of trouble using parentheses in a vim substitution. I just need to add a set of parentheses around 3 digits, but I can't seem to find where I'm supposed to place them. For example, I would have to place them around the area code of a phone number such as 2015551212.
Right now I have a substitution that separates the numbers and puts a hyphen between them, giving, for example, 201 555-1212. So I just need the parentheses. The final result should look like: (201) 555-1212
The command I have so far is this: s/\(\d\{3}\)\(\d\{3}\)/\1 \2-/g
How might I go about doing this?
Thanks
Just add the parens around the \1 in your replacement.
s/\(\d\{3\}\)\(\d\{3\}\)/(\1) \2-/g
If you want to go in reverse, and change "(800) 555-1212" to "8005551212", you can use something like this:
s/(\(\d\d\d\))\ \(\d\d\d\)-\(\d\d\d\d\)/\1\2\3/g
Instead of the \d\d\d, you could use \d\{3\}, but that is more trouble to type.

replacing part of regex matches

I have several functions that start with get_ in my code:
get_num(...) , get_str(...)
I want to change them to get_*_struct(...).
Can I somehow match the get_* regex and then replace according to the pattern so that:
get_num(...) becomes get_num_struct(...),
get_str(...) becomes get_str_struct(...)
Can you also explain some of the logic behind it? The theoretical regexes aren't like the ones used in UNIX (or vi; are they different?), and I'm always struggling to figure them out.
This has to be done in the vi editor, as it is my main work tool.
Thanks!
To transform get_num(...) to get_num_struct(...), you need to capture the correct text in the input. And, you can't put the parentheses in the regular expression because you may need to match pointers to functions too, as in &get_distance, and uses in comments. However, and this depends partially on the fact that you are using vim and partially on how you need to keep the entire input together, I have checked that this works:
%s/get_\w\+/&_struct/g
On every line, find every expression starting with get_ and continuing with at least one letter, number, or underscore, and replace it with the entire matched string followed by _struct.
Darn it; I shouldn't answer these things on spec. Note that other regex engines might use \& instead of &. This depends on having magic set, which is default in vim.
For an alternate way to do it:
%s/get_\(\w*\)(/get_\1_struct(/g
What this does:
\w matches to any "word character"; \w* matches 0 or more word characters.
\(...\) tells vim to remember whatever matches .... So, \(\w*\) means "match any number of word characters, and remember what you matched". You can then access it in the replacement with \1 (or \2 for the second group, etc.).
So, the overall pattern get_\(\w*\)( looks for get_, followed by any number of word chars, followed by (.
The replacement then just does exactly what you want.
(Sorry if that was too verbose - not sure how comfortable you are with vim regex.)
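The difference between the two substitutions is easier to see side by side. Below is a small sketch using Python's re module as a stand-in for vim, which is an assumption made purely so the behavior can be run and checked; \g<0> plays the role of vim's &, and the sample line of C-like code is made up for the illustration.

```python
import re

# Hypothetical input: two calls plus a function pointer, as mentioned in the first answer.
code = "x = get_num(a) + get_str(b); cb = &get_distance;"

# Whole-match reference (vim: %s/get_\w\+/&_struct/g):
# every get_ name is renamed, including the function pointer.
whole = re.sub(r"get_\w+", r"\g<0>_struct", code)

# Capture-group reference (vim: %s/get_\(\w*\)(/get_\1_struct(/g):
# only names immediately followed by "(" are renamed.
calls_only = re.sub(r"get_(\w*)\(", r"get_\1_struct(", code)

print(whole)
# x = get_num_struct(a) + get_str_struct(b); cb = &get_distance_struct;
print(calls_only)
# x = get_num_struct(a) + get_str_struct(b); cb = &get_distance;
```

Which one is right depends on whether uses without a following parenthesis, such as function pointers and mentions in comments, should be renamed too.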
