I have a string that contains only the digits 2 to 9, like '223488875662264442'. It is guaranteed not to contain more than 3 identical adjacent digits: for example, it cannot contain '7777', but it can contain '27747772'.
I want a pattern that matches each run of identical consecutive digits, for example:
> str = '44788895532244474568884511123331566';
> for n in string.gmatch(str,pat) do -- pat is the pattern
>> print(n);
> end
44
7
888
9
55
...
I tried patterns like '(%d)%1*' without success.
I cannot use regex; I need to do it with Lua patterns.
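For reference only, here is the intended splitting expressed in Python rather than Lua, so it is not an answer in Lua patterns. It shows the backreference idea behind '(%d)%1*': Python's re module lets a quantifier follow the backreference \1, whereas Lua patterns only allow quantifiers after a single character class, not after %1. The groupby version reaches the same result without any pattern.

```python
import re
from itertools import groupby

s = '44788895532244474568884511123331566'

# (\d) captures one digit and \1* repeats that same digit zero or more times.
# finditer is used because findall would return only the captured group.
runs_re = [m.group(0) for m in re.finditer(r'(\d)\1*', s)]

# Pattern-free equivalent: group consecutive equal characters.
runs_groupby = [''.join(g) for _, g in groupby(s)]

print(runs_re[:5])  # ['44', '7', '888', '9', '55']
print(runs_re == runs_groupby)
```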
I pipe different values to bc.
If the value is a number, it works fine. If it's a string of lowercase letters, bc returns 0, which makes sense to me, but if it's uppercase letters, bc outputs one 9 per input character:
echo 1 | bc
1
echo aaa | bc
0
echo AAA | bc
999
echo FO | bc
99
echo null | bc
0
echo NULL | bc
9999
Why does bc have this behavior? What's the best way to work with unexpected string values?
According to the GNU bc manual (https://www.gnu.org/software/bc/manual/html_mono/bc.html):
A simple expression is just a constant. bc converts constants into internal decimal numbers using the current input base, specified by the variable ibase. (There is an exception in functions.) The legal values of ibase are 2 through 16. Assigning a value outside this range to ibase will result in a value of 2 or 16. Input numbers may contain the characters 0-9 and A-F. (Note: They must be capitals. Lower case letters are variable names.) Single digit numbers always have the value of the digit regardless of the value of ibase. (i.e. A = 10.) For multi-digit numbers, bc changes all input digits greater or equal to ibase to the value of ibase-1. This makes the number FFF always be the largest 3 digit number of the input base.
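Assuming your ibase is 10 (the default), this explains your observation: in a multi-digit number, each uppercase letter is read as a digit with a value of 10 or more, so each one is replaced by ibase-1, which is 9. That is why AAA gives 999 and NULL gives 9999.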
It is unrelated to "unexpected string values" or "the length of the input characters". bc treats these inputs as (somewhat odd) attempts to write numeric constants and converts them according to the quoted rule. Lowercase input such as aaa or null is a variable name instead, and an unset variable evaluates to 0.
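To make the quoted rule concrete, here is a small Python model of how bc values an unsigned integer constant. It is a hypothetical illustration, not bc's source code. One assumption to note: the quoted manual text lists only A-F, but the NULL and FO outputs in the question suggest that GNU bc accepts every uppercase letter as a digit, so the model maps A=10 through Z=35.

```python
import string

# Hypothetical digit table: 0-9, then A=10 ... Z=35 (assumption, see above).
DIGITS = string.digits + string.ascii_uppercase


def bc_constant(text: str, ibase: int = 10) -> int:
    """Model of the quoted rule for an unsigned integer constant."""
    # Lowercase letters are variable names in bc, so index() raises ValueError here.
    values = [DIGITS.index(ch) for ch in text]
    if len(values) == 1:
        # Single-digit numbers keep their digit value regardless of ibase (A = 10).
        return values[0]
    total = 0
    for v in values:
        # Multi-digit numbers: any digit >= ibase is replaced by ibase - 1.
        total = total * ibase + min(v, ibase - 1)
    return total


for s in ["1", "AAA", "FO", "NULL", "A"]:
    print(s, bc_constant(s))
```

With ibase=16 the same model gives FF = 255, because F (15) is then a valid digit and is not clamped.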
I would like to find all digit sequences of at least 4 digits that do not contain three zeros in a row.
There might be delimiters (at most 2 characters) between the digits.
I'm using Python and would like to do this with a regex.
Accepted numbers:
This is number 1234 which should be accepted.
12-45
1 2 0 0 3 4 5
Not accepted numbers:
1
12
123
1000
1000-2000
30000-31000
21 000-32 000-50 000
21 00 03 00 00
The regex I came up with is:
([\s\-]{0,2}\d(?!000)){4,}
My regex finds all the accepted numbers, but it doesn't filter out all the numbers that should be rejected.
Actually, this regex is used in Python to remove the matched numbers from the text.
P.S. Delimiters are not only spaces; they should include at least \s and the dash.
P.P.S. The numbers can appear in the middle of the string, so I don't think I can use ^ and $ in my regex.
You could assert that there are no three zeros in a row (allowing optional delimiters between them), and then match the digits with optional delimiters in between.
\b(?![\d\s-]*?0(?:[\s-]*0){2})\d(?:[\s-]*\d){3,}\b
Explanation
\b A word boundary
(?! Negative lookahead, assert that what follows is not
[\d\s-]*? Match any digit, whitespace character or -, as few times as possible
0(?:[\s-]*0){2}) Match a zero followed by two more zeros, with optional delimiters in between, then close the lookahead
\d Match a digit
(?:[\s-]*\d){3,} Repeat 3 or more times matching a digit with optional delimiters in between
\b A word boundary
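A quick way to check the pattern against the examples from the question is to run it with Python's re module. This sketch only reuses the regex from the answer; the example strings are the ones listed in the question, and the last line mirrors the question's use case of removing matches with re.sub.

```python
import re

# Reject any digit run containing three zeros (delimiters allowed between them)
# and require at least 4 digits, as in the answer above.
pattern = re.compile(r'\b(?![\d\s-]*?0(?:[\s-]*0){2})\d(?:[\s-]*\d){3,}\b')

accepted = [
    "This is number 1234 which should be accepted.",
    "12-45",
    "1 2 0 0 3 4 5",
]
rejected = ["1", "12", "123", "1000", "1000-2000", "30000-31000",
            "21 000-32 000-50 000", "21 00 03 00 00"]

# The pattern has no capturing groups, so findall returns the whole matches.
for line in accepted + rejected:
    print(repr(line), "->", pattern.findall(line))

# Removing the matched numbers from the text, as in the question.
print(pattern.sub("", accepted[0]))
```

Note that [\s-]* allows any number of delimiters, not at most 2; tighten it to [\s-]{0,2} if that limit matters.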
I'm not great with regex and the following has me stumped.
I need to find all matches in a string that are 2 to 5 characters long, consist only of [A-Z0-9], and contain at least one letter [A-Z].
For example:
A1 - Match
AAA - Match
AAAAAA - No Match
A1234 - Match
123 - No Match
A123A - Match
A - No Match
1 - No Match
A1B2C3 - No Match
I have tried this:
([A-Z0-9]*[A-Z][A-Z0-9]*){2,5}
But it doesn't limit the total length of the match to between 2 and 5 characters.
You can use either of these:
\b(?=\d*[A-Z])[A-Z\d]{2,5}\b
\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b
Details:
\b - word boundary
(?=\d*[A-Z]) - after zero or more digits, there must be an uppercase ASCII letter
(?=[A-Z0-9]{2,5}\b) - there must be 2 to 5 uppercase ASCII letters or digits up to the next word boundary
[A-Z0-9]* - zero or more uppercase ASCII letters or digits
[A-Z] - an uppercase ASCII letter
[A-Z\d]{2,5} - two to five uppercase ASCII letters or digits
\b - word boundary.
See the Python demo:
import re
text = "A1 AAA....A1234!!!!~A123A abc,AAAAAA,123,A,1,A1B2C3"
print(re.findall(r'\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b', text))
# => ['A1', 'AAA', 'A1234', 'A123A']
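Both patterns can also be checked token by token against the example list from the question. This sketch uses re.fullmatch so each token is tested on its own; \b still matches at the start and end of a token that begins and ends with a word character. The EXAMPLES table is just the question's list written as a dict.

```python
import re

# The two patterns from the answer above.
PATTERNS = [
    r'\b(?=\d*[A-Z])[A-Z\d]{2,5}\b',
    r'\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b',
]
# token -> whether it should match, taken from the question.
EXAMPLES = {
    "A1": True, "AAA": True, "AAAAAA": False, "A1234": True, "123": False,
    "A123A": True, "A": False, "1": False, "A1B2C3": False,
}

mismatches = []
for pattern in PATTERNS:
    for token, expected in EXAMPLES.items():
        actual = re.fullmatch(pattern, token) is not None
        if actual != expected:
            mismatches.append((pattern, token))

print(mismatches)  # []
```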
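Try this one. Because of the ^ and $ anchors, it checks a whole string (one token at a time) rather than finding matches inside a longer text:
^([A-Z][A-Z0-9]{1,4}|[A-Z0-9][A-Z][A-Z0-9]{0,3}|[A-Z0-9]{2}[A-Z][A-Z0-9]{0,2}|[A-Z0-9]{3}[A-Z][A-Z0-9]?|[A-Z0-9]{4}[A-Z])$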
Given a string like "213_str_12", I want to extract only the digits within the first 5 characters, so I would get 213. Is that possible in bash?
Using only bash, you can do:
s='213_str_12'
f5="${s:0:5}"
echo "${f5//[^0-9]/}"
213
${s:0:5} stores the first 5 characters in f5, and ${f5//[^0-9]/} then replaces every non-digit with an empty string.
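The same two steps, slicing first and filtering second, look like this in Python. This is only an analogy to the bash answer, for readers who need the logic outside the shell.

```python
import re

s = '213_str_12'
f5 = s[:5]                      # like ${s:0:5}: the first 5 characters
digits = re.sub(r'\D', '', f5)  # like ${f5//[^0-9]/}: drop every non-digit
print(digits)  # 213
```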
I have two scripts, A and B.
I want to run them and read their outputs into two variables, V and VALS, respectively.
V is just a floating-point number, let's say 0.5.
VALS has the following format:
1 10
2 20
3 60
4 45
and so on.
What I'm trying to do is to get a new variable where the second column of VALS (10, 20, ...) is divided by V.
As I understand it, this can be done with a mix of xargs and cut, but I'm not really familiar with these tools.
#!/bin/bash
V=`./A`
VALS=`./B`
RESULT=#magic happens
The final result with the previous data should be:
1 20
2 40
3 120
4 90
Bash's builtin arithmetic expansion only works with integers. You can use awk both to extract the columns and to do the floating-point division.
V=`./A`
# No VALS needed
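RESULT=$(./B | awk "{print \$1, \$2 / $V}")
Note the escaped dollar signs in \$1 and \$2: they reach awk as field references, while $V is expanded by the shell before awk runs. Alternatively, pass the value with awk -v v="$V" and single-quote the program: '{print $1, $2 / v}'.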
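For comparison, here is the same transformation sketched in Python. It assumes V and VALS already hold the text printed by ./A and ./B; the values below are just the sample data from the question, and the script names are not invoked.

```python
# Sample outputs of ./A and ./B from the question (assumption: already captured).
V = "0.5"
VALS = """1 10
2 20
3 60
4 45"""

v = float(V)
lines = []
for row in VALS.splitlines():
    key, value = row.split()
    # :g drops a trailing ".0", similar to awk's default number output.
    lines.append(f"{key} {float(value) / v:g}")
RESULT = "\n".join(lines)
print(RESULT)
```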