Lua string space, letters and digits - string

I want to check whether a string only contains letters, digits and space. I found this function to check that it's only letters and digits, but I'm not sure how to allow spaces too:
string:match( "%W" );
thanks.

The pattern %w matches an alphanumeric character. The uppercase version %W matches its complement, i.e., a non-alphanumeric character.
To get the complement of alphanumeric and whitespace, try the pattern [^%w%s].
str:match("[^%w%s]")
returns a non-nil value when a string is NOT only letters, digits, and spaces.
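For comparison, the same "search for one disallowed character" idea can be sketched in Python. This is only an analogue of the Lua pattern, not Lua code. The function name is made up for illustration, and the character class is spelled out as ASCII because Python's \w would also accept an underscore.

```python
import re

# Any character that is not an ASCII letter, a digit, or whitespace.
# (Python's \w also accepts "_", so the class is written out explicitly.)
DISALLOWED = re.compile(r"[^A-Za-z0-9\s]")

def only_letters_digits_spaces(s: str) -> bool:
    """Return True when s holds nothing but letters, digits and whitespace."""
    # Hypothetical helper: finding no disallowed character means the string is fine.
    return DISALLOWED.search(s) is None

print(only_letters_digits_spaces("Hello World 42"))  # True
print(only_letters_digits_spaces("Hello, World!"))   # False
```

As with the Lua version, an empty string passes the check because it contains no disallowed character.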

Related

Append characters based on the count of a match in Vim

I would like to append - at the end of each matched word. The number of - characters appended should depend on the length of the match, so that the total number of characters in each line remains constant.
As shown in the example below, the total number of characters should be 6.
e.g.
ab
xyz
abcde
The above text should be replaced to:
ab----
xyz---
abcde-
You can use \= to substitute with an expression, see :h sub-replace-expression.
When the substitute string starts with \=, the remainder is interpreted as an expression.
The submatch() function can be used to obtain matched text. The whole matched text can be accessed with submatch(0). The text matched with the first pair of () with submatch(1). Likewise for further sub-matches in ().
So you can achieve it like this:
:[range]s//\=submatch(0) . repeat('-', 6-strlen(submatch(0)))/
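The padding logic can be checked outside Vim with a small Python sketch. The callback passed to re.sub plays the role of the \= expression, and m.group(0) corresponds to submatch(0). The pattern ^\w+$ and the function name are assumptions made for this example, since the Vim command above leaves the pattern empty.

```python
import re

WIDTH = 6  # target line length taken from the example in the question

def pad_lines(text: str, width: int = WIDTH) -> str:
    # Assumed pattern: a whole line made of word characters.
    # The lambda mirrors Vim's \= expression; m.group(0) is like submatch(0).
    return re.sub(
        r"^\w+$",
        lambda m: m.group(0) + "-" * (width - len(m.group(0))),
        text,
        flags=re.MULTILINE,
    )

print(pad_lines("ab\nxyz\nabcde"))
# ab----
# xyz---
# abcde-
```

A word longer than the target width is left unchanged, because multiplying a string by a negative count yields an empty string.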

Regex - Stop after finding the first pattern

For a string like this:
1. Jane, Doe2. Good, Jay3. Turn, Bob[key]
Either Jane, Doe (whatever is between 1. and 2.) needs to be extracted if no [key] is present
(or)
Turn, Bob if [key] is present
Put another way:
If [key] is present, then the person before [key] needs to be extracted and the process stopped.
If [key] is not present, then pick up whoever is after 1.
I tried this but it pulls up both Jane, Doe and Turn, Bob
(\.([^\.])(.+)\[key\])|(1\.(.+)2\.)
How can I stop after the first successful pattern, knowing that patterns are read left to right? [key] can follow any of the entries: 1, 2 or 3.
Thanks.
For these requirements, you may use this regex in Python with an alternation:
(?<=\d\.\s)[a-zA-Z, ]+(?=\[key])|(?<=1\.\s)(?!.*\[key])[a-zA-Z, ]+
RegEx Details:
(?<=\d\.\s): Positive lookbehind to assert that there is a digit followed by dot followed by a whitespace before the current position
[a-zA-Z, ]+: Match 1+ of letter, space or comma characters
(?=\[key]): Positive lookahead to assert that there is a text [key] after the current position
|: OR
(?<=1\.\s): Positive lookbehind to assert that there is a digit 1 followed by dot followed by a whitespace before the current position
(?!.*\[key]): Negative lookahead to assert that there is no [key] text after the current position
[a-zA-Z, ]+: Match 1+ of letter, space or comma characters
Not sure why you put .+ into your regex, but it is greedy: in the left part of the alternation it runs from just after the first dot all the way to [key], so it captures everything from Jane, Doe through Turn, Bob.
Suggest you remove the .+ on both sides of the alternation ( | ).
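A short Python check of the lookaround regex from the answer above, as a sketch. The function name and the extra test strings are made up for illustration; only the first input comes from the question.

```python
import re

PATTERN = re.compile(
    r"(?<=\d\.\s)[a-zA-Z, ]+(?=\[key])"    # the name directly before [key]
    r"|(?<=1\.\s)(?!.*\[key])[a-zA-Z, ]+"  # otherwise the name after "1. "
)

def extract(s: str):
    """Return the first match of the alternation, or None."""
    m = PATTERN.search(s)
    return m.group(0) if m else None

print(extract("1. Jane, Doe2. Good, Jay3. Turn, Bob[key]"))  # Turn, Bob
print(extract("1. Jane, Doe2. Good, Jay3. Turn, Bob"))       # Jane, Doe
print(extract("1. Jane, Doe2. Good, Jay[key]3. Turn, Bob"))  # Good, Jay
```

Using search rather than findall returns only the first successful match, which is the "stop after the first pattern" behaviour asked for.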

Regex to match between 2 and 5 characters, one of which must be alphabetic

I'm not great with regex and the following has me stumped.
I need to find all the matches in a string that are between 2 and 5 characters [A-Z0-9] only, and must contain at least one alphabetic character [A-Z]
So
A1 - Match
AAA - Match
AAAAAA - No Match
A1234 - Match
123 - No Match
A123A - Match
A - No Match
1 - No Match
A1B2C3 - No Match
I have tried this:
([A-Z0-9]*[A-Z][A-Z0-9]*){2,5}
But it doesn't limit the total length of the match to between 2 and 5 characters.
You can use
\b(?=\d*[A-Z])[A-Z\d]{2,5}\b
\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b
Details:
\b - word boundary
(?=\d*[A-Z]) - after zero or more digits, there must be an uppercase ASCII letter
(?=[A-Z0-9]{2,5}\b) - there must be 2 to 5 alnum chars up to the word boundary
[A-Z0-9]* - zero or more uppercase ASCII letters or digits
[A-Z] - an uppercase ASCII letter
[A-Z\d]{2,5} - two to five uppercase ASCII letters or digits
\b - word boundary.
See the Python demo:
import re
text = "A1 AAA....A1234!!!!~A123A abc,AAAAAA,123,A,1,A1B2C3"
print(re.findall(r'\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b', text))
# => ['A1', 'AAA', 'A1234', 'A123A']
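Both patterns can also be checked against the Match / No Match table from the question. The sketch below applies each pattern with fullmatch to every entry; the names EXPECTED and agrees_with_table are made up for this example.

```python
import re

PATTERNS = [
    re.compile(r"\b(?=\d*[A-Z])[A-Z\d]{2,5}\b"),
    re.compile(r"\b(?=[A-Z0-9]{2,5}\b)[A-Z0-9]*[A-Z][A-Z0-9]*\b"),
]

# The table from the question: candidate -> should it match?
EXPECTED = {
    "A1": True, "AAA": True, "AAAAAA": False, "A1234": True, "123": False,
    "A123A": True, "A": False, "1": False, "A1B2C3": False,
}

def agrees_with_table(pattern) -> bool:
    """True when the pattern accepts exactly the entries marked as Match."""
    return all(
        (pattern.fullmatch(s) is not None) == expected
        for s, expected in EXPECTED.items()
    )

print([agrees_with_table(p) for p in PATTERNS])  # [True, True]
```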
Try this one (it is anchored with ^ and $, so apply it to each candidate string separately):
^([A-Z][A-Z0-9]{1,4}|[0-9][A-Z][A-Z0-9]{0,3}|[0-9]{2}[A-Z][A-Z0-9]{0,2}|[0-9]{3}[A-Z][A-Z0-9]?|[0-9]{4}[A-Z])$

XML Schema validation pattern shall not allow string

I want to allow alphanumeric characters, except for the word "AAAA".
I am using the regexes below:
To allow alphanumeric characters <xs:pattern value="[A-Za-z0-9]{2,4}"/>
Not to allow AAAA as <xs:pattern value="[^A]{4}"/>
But if I combine both it does not work.
Please help
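Excluding one specific string is not easy with a regex. The pattern [^A]{4} does not mean "not 4 occurrences of A"; it means 4 occurrences of "not A".
I think something like this should work (the line breaks and spaces are only for readability; an XSD pattern treats whitespace literally, so remove them in the real pattern):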
[A-Za-z0-9]{2,3} |
[B-Za-z0-9][A-Za-z0-9]{3} |
[A-Za-z0-9][B-Za-z0-9][A-Za-z0-9]{2} |
[A-Za-z0-9]{2}[B-Za-z0-9][A-Za-z0-9] |
[A-Za-z0-9]{3}[B-Za-z0-9]
which means,
a 2-char or 3-char alphanumeric string or
a 4 char alphanumeric string with the 1st char not 'A' or
a 4 char alphanumeric string with the 2nd char not 'A' or
a 4 char alphanumeric string with the 3rd char not 'A' or
a 4 char alphanumeric string with the 4th char not 'A'
There might be an easier solution, but I cannot think of it.
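The five alternatives can be sanity-checked in Python, as a sketch. XSD has its own regex dialect, but the constructs used here behave the same way in Python's re module. fullmatch stands in for the fact that an XSD pattern must match the whole value. The names ALTERNATIVES and is_valid are made up for this example.

```python
import re

# The five alternatives from the answer, joined without any whitespace.
ALTERNATIVES = [
    "[A-Za-z0-9]{2,3}",
    "[B-Za-z0-9][A-Za-z0-9]{3}",
    "[A-Za-z0-9][B-Za-z0-9][A-Za-z0-9]{2}",
    "[A-Za-z0-9]{2}[B-Za-z0-9][A-Za-z0-9]",
    "[A-Za-z0-9]{3}[B-Za-z0-9]",
]
PATTERN = re.compile("|".join(ALTERNATIVES))

def is_valid(value: str) -> bool:
    # fullmatch mimics XSD's implicit anchoring at both ends of the value.
    return PATTERN.fullmatch(value) is not None

for value in ["AB", "AAA", "AAAB", "AAAA", "A", "ABCDE"]:
    print(value, is_valid(value))
```

Only the exact string AAAA is rejected among 4-character values; aaaa is still accepted, because the alternatives exclude the uppercase A only.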

Matching only a <tab> that is between two numbers

How to match a tab only when it is between two numbers?
Sample script
209.65834 27.23204908
119.37987 15.03317082
74.240635 8.30561924
29.1014 0
931.8861 -100.00000
-16.03784 -8.30562
;
_mirror
l
;
29.1014 0
1028.10 0.00
n
_spline
935.4875 250
924.2026913 269.8820375
912.9178825 277.4506484
890.348265 287.3181854
(in the above script, the tabs are between the numbers, not the spaces) (blank lines are significant; there is nothing in them, but I can't lose them)
I wish to get a "," between the numbers. Tried with :%s/\t/\,/ but that will touch the empty lines too, and the end of lines.
Try this:
:%s/\(\d\)\t\(-\?\d\)/\1,\2/
\d matches any digit. -\? means "an optional -". Each pair of (escaped) parentheses captures part of the match; \1 refers to the first captured group and \2 to the second.
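The same capture-and-reinsert substitution can be tried in Python, as a sketch. The function name and the sample text are made up for illustration, with real tab characters written as \t. Unlike the Vim command above, which has no g flag, re.sub replaces every match on a line.

```python
import re

def tabs_to_commas(text: str) -> str:
    # (\d) and (-?\d) mirror the two Vim capture groups;
    # \1 and \2 put the captured characters back around the comma.
    return re.sub(r"(\d)\t(-?\d)", r"\1,\2", text)

sample = "209.65834\t27.23204908\n\n931.8861\t-100.00000\n;\n_mirror\t\nl\n"
print(repr(tabs_to_commas(sample)))
# '209.65834,27.23204908\n\n931.8861,-100.00000\n;\n_mirror\t\nl\n'
```

Blank lines and the trailing tab after _mirror are left untouched, because a tab is only replaced when it sits between a digit and a (possibly negative) number.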
google://vim+regex -> http://vimregex.com/ ->
:%s/\([0-9]\)\t\(-\?[0-9]\)/\1,\2/gc
You have 2 groups of numbers here ([0-9]) and tab-symbols \t between them. Add some escape symbols and you have the answer.
g replaces every occurrence on a line; c asks for confirmation before each substitution.
\1 and \2 are matching groups (numbers in your case).
It's not really hard to find answer for questions like that by yourself.
Try
:%s/\([0-9]\)\t\(-\?[0-9]\)/\1,\2/g
Explanation: search for the pattern <digit>\t<digit> (with an optional minus sign before the second digit) and remember the parts on either side of the tab.
\( ... \) captures and remembers the part that matches.
\1 recalls the first captured digit, \2 the second captured digit (including its minus sign, if any).
So if the input is 123\t789, <digit>\t<digit> matches 3\t7,
and the 3 and 7 are remembered as \1 and \2.
or
:g/[0-9]/ s/\t/,/g
explanation - filter all lines with a digit, then substitute tabs with a comma on those lines
