What is the right pattern for numbers with a negative sign?

I have a string of numbers separated by spaces and I need to store them in a table, but for some reason the negative sign is not being recognized.
cord = "-53 2 -21"
map = {}
for num in cord:gmatch("%w+") do
    table.insert(map, num)
end
map[1], map[2], map[3] = tonumber(map[1]), tonumber(map[2]), tonumber(map[3])
print(map[1])
print(map[2])
print(map[3])
This is the output I'm getting:
53
2
21
I think the problem is with the pattern I'm using. What should I change?

The pattern "%w" matches alphanumeric characters, which do not include -. Use this pattern instead:
"%-?%w+"
or better:
"%-?%d+"
since numbers are all you need.

%w+ does not attempt to match only numbers, and it never matches -. Try %S+ to get all "words", that is, all sequences of non-space characters.
If you want to match only numbers, try %-?%d+. Note the optional minus sign in the pattern. Note also that you must escape the minus sign with %, because an unescaped - is a magic character in Lua patterns (a lazy repetition operator).
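For comparison, the same idea carries over to regular expressions in other languages. The following is a minimal Python sketch, not Lua; the variable name mirrors the question, and it assumes the input contains only plain integers separated by spaces. In Python's re syntax the minus sign needs no escaping outside a character class.

```python
import re

cord = "-53 2 -21"

# -? makes the leading minus sign optional; \d+ matches one or more digits.
numbers = [int(token) for token in re.findall(r"-?\d+", cord)]

print(numbers)  # [-53, 2, -21]
```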

Related

How to substitute a repeating character with the same number of a different character in regex python?

Assume there's a string
"An example striiiiiing with other words"
I need to replace the 'i's with '*'s like 'str******ng'. The number of '*' must be same as 'i'. This replacement should happen only if there are consecutive 'i' greater than or equal to 3. If the number of 'i' is less than 3 then there is a different rule for that. I can hard code it:
import re
text = "An example striiiiing with other words"
out_put = re.sub(re.compile(r'i{3}', re.I), r'*'*3, text)
print(out_put)
# An example str***iing with other words
But number of i could be any number greater than 3. How can we do that using regex?

The i{3} pattern only matches exactly iii anywhere in the string. You need i{3,} to match three or more i characters. However, to make it all work, you need to pass a callable as the replacement argument to re.sub; it receives each match, so you can take the length of the matched text and repeat the * that many times.
Also, it is advisable to declare the regex outside of re.sub, or just use a string pattern since patterns are cached.
Here is the code that fixes the issue:
import re
text = "An example striiiiing with other words"
rx = re.compile(r'i{3,}', re.I)
out_put = rx.sub(lambda x: r'*'*len(x.group()), text)
print(out_put)
# => An example str*****ng with other words
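The same technique can be wrapped in a small helper for any character and minimum run length. This is only a sketch: the function name mask_runs and its parameters are made up for illustration, and shorter runs are left unchanged because the question does not say which rule applies to them.

```python
import re

def mask_runs(text, char="i", min_run=3, mask="*"):
    """Replace each run of at least min_run copies of char with as many masks."""
    # re.escape keeps characters such as "." or "*" literal in the pattern.
    pattern = re.compile(re.escape(char) + "{%d,}" % min_run, re.IGNORECASE)
    # The callable receives each match object; its length sets the mask count.
    return pattern.sub(lambda m: mask * len(m.group()), text)

print(mask_runs("An example striiiiing with other words"))
# An example str*****ng with other words
print(mask_runs("An example striing"))
# An example striing
```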

Why doesn't this RegEx match anything?

I've been trying for about two hours now to write a regular expression which matches a single character that's not preceded or followed by the same character.
This is what I've got: (\d)(?<!\1)\1(?!\1); but it doesn't seem to work! (testing at https://regex101.com/r/whnj5M/6)
For example:
In 1111223 I would expect to match the 3 at the end, since it's not preceded or followed by another 3.
In 1151223 I would expect to match the 5 in the middle, and the 3 at the end for the same reasons as above.
The end goal for this is to be able to find pairs (and only pairs) of characters in strings (e.g. to find 11 in 112223 or 44 in 123544) and I was going to try and match single isolated characters, and then add a {2} to it to find pairs, but I can't even seem to get isolated characters to match!
Any help would be much appreciated, I thought I knew RegEx pretty well!
P.S. I'm testing in JS on regex101.com because it wouldn't let me use variable length lookbacks in Python on there, and I'm using the regex library to allow for this in my actual implementation.

Your regex is close, but by using simply (\d) you are consuming characters, which prevents the other match from occurring. Instead, you can use a positive lookahead to set the capture group and then test for any occurrences of the captured digit not being surrounded by copies of itself:
(?=.*?(.))(?<!\1)\1(?!\1)
By using a lookahead you avoid consuming any characters and so the regex can match anywhere in the string.
Note that in 1151223 this returns 5, 1 and 3 because the third 1 is not adjacent to any other 1s.
This requires a JavaScript engine that supports variable-width lookbehinds.

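The pattern you tried does not match because this part, (\d)(?<!\1), cannot match.
It reads as:
Capture a digit in group 1. Then, at the position right after that captured digit, assert that what was captured is not directly to the left.
Since the captured digit is itself directly to the left, the assertion always fails.
You could make the pattern work by adding, for example, a dot after the backreference, (?<!\1.), to assert that the character before the digit you have just matched is not the same as group 1. Because \1 then consumes a second copy of the digit, this pattern finds the pairs that are your end goal.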
Pattern
(\d)(?<!\1.)\1(?!\1)
Note that you have selected ECMAscript on regex101.
Python re does not support variable width lookbehind.
To make this work in Python, you need the regex module from PyPI.
Example code
import regex
pattern = r"(\d)(?<!\1.)\1(?!\1)"
test_str = ("1111223\n"
            "1151223\n\n"
            "112223\n"
            "123544")
matches = regex.finditer(pattern, test_str)
for matchNum, match in enumerate(matches, start=1):
    print(match.group())
Output
22
11
22
11
44
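
If the third-party regex module is not available, the same runs can be found without any regular expression. The sketch below swaps in itertools.groupby from the standard library, which is a different technique from the lookbehind pattern; the function name runs_of_length is hypothetical. It returns every run of identical characters whose length is exactly n, so n=2 gives the pairs and n=1 gives the isolated characters.

```python
from itertools import groupby

def runs_of_length(text, n):
    """Return each maximal run of one repeated character that is exactly n long."""
    result = []
    for _, group in groupby(text):
        run = "".join(group)
        if len(run) == n:
            result.append(run)
    return result

print(runs_of_length("1151223", 2))  # ['11', '22']
print(runs_of_length("123544", 2))   # ['44']
print(runs_of_length("1151223", 1))  # ['5', '1', '3']
```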

@Theforthbird has provided a good explanation of why your regular expression does not match the characters of interest.
Each character matched by the following regular expression is neither preceded nor followed by the same character (including characters at the beginning and end of the string).
r'^.$|^(.)(?!\1)|(?<=(.))(?!\2)(.)(?!\3)'
Python's re regex engine performs the following operations.
^.$        match the first char if it is the only char in the line
|          or
^          match beginning of line
(.)        match a char in capture group 1...
(?!\1)     ...that is not followed by the same character
|          or
(?<=(.))   save the previous char in capture group 2...
(?!\2)     ...that is not equal to the next char
(.)        match a character and save to capture group 3...
(?!\3)     ...that is not equal to the following char
Suppose the string were "cat".
The internal string pointer is initially at the beginning of the line.
"c" is not at the end of the line so the first part of the alternation fails and the second part is considered.
"c" is matched and saved to capture group 1.
The negative lookahead asserting that "c" is not followed by the content of capture group 1 succeeds, so "c" is matched and the internal string pointer is advanced to a position between "c" and "a".
"a" fails the first two parts of the alternation, because both are anchored at the beginning of the line, so the third part is considered.
The positive lookbehind (?<=(.)) saves the preceding character ("c") in capture group 2.
The negative lookahead (?!\2), which asserts that the next character ("a") is not equal to the content of capture group 2, succeeds. The string pointer remains just before "a".
The next character ("a") is matched and saved in capture group 3.
The negative lookahead (?!\3), which asserts that the following character ("t") does not equal the content of capture group 3, succeeds, so "a" is matched and the string pointer advances to just before "t".
The same steps are performed when evaluating "t" as were performed when evaluating "a". Here the last token ((?!\3)) succeeds, however, because no characters follow "t".
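The walkthrough above can be checked with a few lines of Python. This sketch uses only the standard re module with the expression from this answer; the helper name isolated_chars is made up for illustration.

```python
import re

ISOLATED = re.compile(r'^.$|^(.)(?!\1)|(?<=(.))(?!\2)(.)(?!\3)')

def isolated_chars(text):
    """Return the characters that have no identical neighbour on either side."""
    return [m.group() for m in ISOLATED.finditer(text)]

print(isolated_chars("cat"))      # ['c', 'a', 't']
print(isolated_chars("1151223"))  # ['5', '1', '3']
print(isolated_chars("1111223"))  # ['3']
```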

How to write a better regex in python?

I have two scenarios to match. Length should be exactly 16.
The pattern should contain A-F, a-f, 0-9 and, in the first case, '-'.
AC-DE-48-23-45-67-AB-CD
ACDE48234567ABCD
I have tried r'^([0-9A-Fa-f]{16})$|(([0-9A-Fa-f]{2}\-){7}[0-9A-Fa-f]{2})$', which is working fine. I am looking for a better expression.

You can simplify the regex by considering the string to be a group of two hex digits followed by an optional -, followed by 6 similar groups (i.e. if the first group had a -, the subsequent ones must too), followed by a group of 2 hex digits:
^[0-9A-Fa-f]{2}(-?)([0-9A-Fa-f]{2}\1){6}[0-9A-Fa-f]{2}$
Use of the re.I flag allows you to remove the a-f from the character classes:
^[0-9A-F]{2}(-?)([0-9A-F]{2}\1){6}[0-9A-F]{2}$
You can also simplify slightly further by replacing 0-9 with \d in the character classes (although personally I find 0-9 easier to read, and in Python 3 \d also matches non-ASCII Unicode digits in str patterns unless re.ASCII is set):
^[\dA-F]{2}(-?)([\dA-F]{2}\1){6}[\dA-F]{2}$
Sample python code:
import re
strs = ['AC-DE-48-23-45-67-AB-CD',
        'ACDE48234567ABCD',
        'AC-DE48-23-45-67-AB-CD',
        'ACDE48234567ABC',
        'ACDE48234567ABCDE']
for s in strs:
    print(s + (' matched' if re.match(r'^[0-9A-F]{2}(-?)([0-9A-F]{2}\1){6}[0-9A-F]{2}$', s, re.I) else ' didn\'t match'))
Output
AC-DE-48-23-45-67-AB-CD matched
ACDE48234567ABCD matched
AC-DE48-23-45-67-AB-CD didn't match
ACDE48234567ABC didn't match
ACDE48234567ABCDE didn't match
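
To make the role of the backreference easier to see, the case-insensitive pattern above can also be written with re.VERBOSE and one comment per piece. This is a sketch of the same expression rather than a different approach; the names HEX_PAIRS and is_valid are hypothetical, and it uses fullmatch instead of the ^ and $ anchors.

```python
import re

# Hypothetical name; the pattern is the case-insensitive one from the answer.
HEX_PAIRS = re.compile(r"""
    [0-9A-F]{2}            # first pair of hex digits
    (-?)                   # group 1: the separator, either "-" or empty
    (?:[0-9A-F]{2}\1){6}   # six more pairs, each followed by that same separator
    [0-9A-F]{2}            # last pair, with no trailing separator
""", re.IGNORECASE | re.VERBOSE)

def is_valid(s):
    # fullmatch anchors at both ends, so ^ and $ are not needed.
    return HEX_PAIRS.fullmatch(s) is not None

for s in ['AC-DE-48-23-45-67-AB-CD', 'ACDE48234567ABCD', 'AC-DE48-23-45-67-AB-CD']:
    print(s, is_valid(s))
```

Because group 1 captures whichever separator follows the first pair, every later \1 must repeat it, which is what rejects the mixed form in the third string.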

Efficient way to insert characters between other characters in a string

What is an efficient way in MATLAB to replace/insert one symbol (in series of symbols) with several others that correspond to the one that is being replaced?
For example, consider having a string Eq: Eq = 'A*exp(-((x-xc)/w)^2)'. Is there a way to replace * with .*, / with ./, \ with .\, and ^ with .^ without writing four separate strrep() lines?

Regular expressions will do the job nicely. Regular expressions simply find patterns in text. You specify what kind of pattern you are looking for by a regular expression, and the output gives you the locations of where the pattern occurred.
For our particular case, not only do we want to find where patterns occur, we also want to replace those patterns with something else. Specifically, use the function regexprep from MATLAB to replace matches in a string with something else. What you want to do is replace all *, /, \ and ^ symbols by adding a . in front of each.
regexprep works as follows. The first input is the string you're looking at, and the second input is the pattern that you're trying to find. In our case, we want to find any of *, /, \ and ^. To specify this pattern, you put the desired symbols in [] brackets. Regular expressions reserve \ as an escape character, which makes the character after it be taken literally instead of as regular expression syntax. As such, you need to use \\ for the \ character and \^ for the ^ character.
The third input is what you want to replace each match with. In our case, we simply want to reuse each matched character, but with a . added in front. This is done with \.$0 in the replacement syntax. $0 refers to the entire match, which here is the single matched symbol. . is also a reserved character in regular expressions, so we prepend it with a \ character so that it is taken literally.
Without further ado:
>> Eq = 'A*exp(-((x-xc)/w)^2)';
>> out = regexprep(Eq, '[*/\\\^]', '\.$0')
out =
A.*exp(-((x-xc)./w).^2)
The pattern we are looking for is [*/\\\^], which means that we want to find any of *, /, \ (written as \\ in the regex) and ^ (written as \^ in the regex). We want to find any of these symbols and replace each with the same symbol preceded by a . character, which is what \.$0 does.
As a more complicated example, let's make sure that we include all of the symbols you're looking for in a sample equation:
>> A = 'A*exp(-((x-xc)/w)^2) \ b^2';
>> out = regexprep(A, '[*/\\\^]', '\.$0')
out =
A.*exp(-((x-xc)./w).^2) .\ b.^2
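
For readers more familiar with Python than MATLAB, the same single-pass replacement can be sketched with re.sub. This is an analogue for illustration only, not MATLAB code; the function name add_dots is hypothetical, and \g<0> plays the role that $0 plays in regexprep.

```python
import re

def add_dots(eq):
    """Prefix each of * / \\ ^ with a dot in a single pass."""
    # Inside [...] only the backslash needs escaping here; ^ is literal
    # because it is not the first character of the class.
    return re.sub(r"[*/\\^]", r".\g<0>", eq)

print(add_dots("A*exp(-((x-xc)/w)^2)"))
# A.*exp(-((x-xc)./w).^2)
```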

I'd go with regexprep as in rayryeng's answer. But here's another approach, just to provide an alternative.
ops = '*/\^'; %// operators that need a dot
ii = find(ismember(Eq, ops)); %// find where dots should be inserted
[~, jj] = sort([1:numel(Eq) ii-.5]); %// will be used to properly order the result
result = [Eq repmat('.',1,numel(ii))]; %// insert dots at the end
result = result(jj); %// properly order the result
And a variant:
ops = '*/\^'; %// operators that need a dot
ii = find(ismember(Eq, ops)); %// find where dots should be inserted
jj = sort([1:numel(Eq) ii-.5]); %// dot locations are marked with fractional part
result = Eq(ceil(jj)); %// repeat characters where the dots will be placed
result(mod(jj,1)>0) = '.'; %// place dots at indices with fractional part

The vectorize function already does almost all of what you want except that it does not convert mldivide (\) to ldivide (.\).
By "efficient," do you mean fewer lines of code or faster? Regular expressions are almost always slower than other approaches and less readable. I don't think they're necessary or a good choice in this case. If you only need to convert your string once, then speed is less of a concern than readability (strrep will still be faster). If you need to do it many times, this simple code that you alluded to is 4–5 times faster than regexprep for short strings like your example (and much faster for longer strings):
out = strrep(Eq,'*','.*');
out = strrep(out,'/','./');
out = strrep(out,'\','.\');
out = strrep(out,'^','.^');
If you want one line, use:
out = strrep(strrep(strrep(strrep(Eq,'*','.*'),'/','./'),'\','.\'),'^','.^');
which will also be slightly faster still. Or create your own version of vectorize and call that.
Where regular expressions shine is in more complex cases, e.g., if your string is already partially vectorized: Eq = 'A.*exp(-((x-xc)/w)^2)'. Even still, the vectorize function just uses strrep and then calls strfind to "remove any possible '..*', '../', etc." and replace them with the proper element-wise operators because it's faster (symbolic math strings can get very large, for example).
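
The partially vectorized case can be handled in one pass with a negative lookbehind. The sketch below shows the idea in Python rather than MATLAB; the function name vectorize_ops is hypothetical, and it only checks whether the character directly before the operator is a dot.

```python
import re

def vectorize_ops(eq):
    """Add a dot before * / \\ ^ unless one is already there."""
    # (?<!\.) is a negative lookbehind: skip operators already preceded by a dot.
    return re.sub(r"(?<!\.)[*/\\^]", r".\g<0>", eq)

print(vectorize_ops("A.*exp(-((x-xc)/w)^2)"))
# A.*exp(-((x-xc)./w).^2)
```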

What does the backward slash (\) do in a format mask?

Can't seem to find the answer in Google. Is it in a similar category of symbols like $ and ! or something else entirely?
The formula I found on Google is:
=text(A2,"\0.0,,\M")
which converts 1500000 to 1.5M.

In your given example (a format mask), a backslash acts as an escape character. Basically, it is equivalent to wrapping the next character in double quotes. This is done to get the literals for 0 and M, since there are a number of characters that have special meanings unless you escape them.
Date-formatting and time-formatting characters (a, c, d, h, m, n, p, q, s, t, w, y, /, and :), the numeric-formatting characters (#, 0, %, E, e, comma, and period), and the string-formatting characters (#, &, <, >, and !) all must be escaped to be used literally.
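To make the effect of the mask from the question concrete, here is a rough Python sketch that reproduces the stated result for 1500000. It is not how the spreadsheet implements format masks; the function name is hypothetical, and it assumes that each trailing comma in the mask divides the value by 1,000 and that the escaped M is appended as a literal. Rounding of borderline values may differ from the spreadsheet's.

```python
def format_millions(value):
    """Mimic the result stated in the question: one decimal place, in millions, then a literal M."""
    scaled = value / 1_000_000  # two commas: divide by 1,000 twice
    return f"{scaled:.1f}M"     # M is a literal because it was escaped

print(format_millions(1500000))  # 1.5M
```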
Due to the slight confusion as to what context you don't understand \ in, I have added a bit of additional information.
In cell formatting:
The backslash \ is used to escape special characters, like a colon.
For instance, if you wanted
100 : 1
since the colon is a special character, you would have to use \ (an escape sequence) to access it as a literal, like this:
100 \: 1
which outputs 100 : 1 as desired.
Another example of use would be the \n VB newline character (different language obviously), which will escape to a new line in your output. In this case, the \ escapes the literal of the key n to access the special constant vbNewLine.
Similarly, in many programming languages, \ followed by a character is an escape sequence, used to suppress or to access a special meaning of that key or character.
In VBA:
It is also worth noting that in VBA, the backslash character can be used to force an evaluation of the integer equivalent of a quotient, for instance:
100\33 = Int(100/33)
This is an often overlooked way to divide as well as round down to an integer in a single step.
In workbooks:
One of the options to create a range name in your workbook is to precede the name of your desired range name with a backslash. For instance, a valid range name would be \HLF1
