XSD regular expression to match the first 4 characters

I need an XSD regular expression that matches the first 4 characters and does not match any other characters for the group. I have tried [A]{4}[^A]*, but the remaining values are being taken by the group.

Related

Is there a way to stop g4 from tokenizing a variable name as a lexer rule when we want to?

I defined some lexer rules as given below:
DATE: D A T E ;
ID : '&'*? IDENTIFIER ;
IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]*;
But consider the following line of code:
keep date column1 column2;
Here, date is a variable name rather than the keyword DATE. So my question is: is it possible to make g4 treat date as an ID token rather than a DATE token?
The ANTLR Lexer is, in no way, influenced by your parser rules.
It operates directly on the input stream of characters, and, if multiple rules match a sequence of characters, the tie is broken by these two rules:
1 - The rule that matches the longest stream of input characters takes precedence. (In your case, the IDENTIFIER rule and the DATE rule both match the "date" sequence of characters.)
2 - If two rules match a character sequence of the same length, the first rule "wins". (In your case, the DATE rule occurs first, so the "date" sequence of characters will be recognized as a DATE token.)
It makes absolutely no difference that a parse rule might be looking for an IDENTIFIER; the Lexer has tokenized the input without influence from the parser rules, and the parser rules match the input stream of tokens generated from the Lexer.
If you want "date" to be acceptable in this context, then your parser rule needs to accept both an IDENTIFIER and a DATE token.
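The two tie-breaking rules, and the parser-side fix, can be simulated in a few lines. This is a minimal Python sketch, not ANTLR's API: the rule table, `tokenize`, and `parse_keep` are hypothetical, it assumes the D, A, T, E fragments are case-insensitive, and it adds assumed SEMI and WS rules so the sample line can be tokenized.

```python
import re

# Hypothetical token rules, in the same order as the question's grammar.
# DATE assumes the D, A, T, E fragments are case-insensitive ([dD], [aA], ...).
# SEMI and WS are assumed extra rules so the sample line can be tokenized.
RULES = [
    ("DATE", re.compile(r"[dD][aA][tT][eE]")),
    ("ID", re.compile(r"&*[a-zA-Z_][a-zA-Z_0-9]*")),
    ("IDENTIFIER", re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")),
    ("SEMI", re.compile(r";")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Mimic the lexer's tie-breaking: longest match wins, then the earliest rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise SyntaxError(f"no lexer rule matches at position {pos}")
        if best_name != "WS":  # whitespace is dropped, like a `-> skip` rule
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# Parser-side fix: a variable name may be any of these token types,
# analogous to a parser rule such as `name : ID | IDENTIFIER | DATE ;`.
NAME_TOKENS = {"ID", "IDENTIFIER", "DATE"}


def parse_keep(tokens):
    """Hypothetical `keep` statement: 'keep' name+ ';' -> list of names."""
    if not tokens or tokens[0][1] != "keep" or tokens[-1][0] != "SEMI":
        raise SyntaxError("expected: keep name+ ;")
    names = []
    for kind, text in tokens[1:-1]:
        if kind not in NAME_TOKENS:
            raise SyntaxError(f"unexpected {kind} {text!r}")
        names.append(text)
    if not names:
        raise SyntaxError("keep needs at least one name")
    return names


print(tokenize("keep date column1 column2;"))
print(parse_keep(tokenize("keep date column1 column2;")))
```

Here date still comes out of the lexer as a DATE token, yet `parse_keep` returns ['date', 'column1', 'column2'] because the parser accepts DATE wherever a name is expected. In contrast, "dates" becomes a single ID token, since the longer match wins. Because ID is listed before IDENTIFIER and matches the same text, this simulation labels plain names as ID.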

Regex: table line matcher

I want to parse a table line using regex.
Input
|---|---|---|
|---|---|---|
So far I've come up with this regex:
/^(?<indent>\s*)\|(?<cell>-+|)/g
Regex101 Link: https://regex101.com/r/wzMYxd/1
But this regex is incomplete.
This only finds the first cell, but I want to capture each of the following cells, such as ----|, as a separate match.
Question: can the regex capture the following cells with the same pattern?
Expected output: an array of the matched cells: ["---|", "----|", "---|"]
Note: the number of - characters in each cell is not fixed.
How about first verifying that the line matches the pattern:
^[ \t]*\|(?:-+\|)+$
See this demo at regex101. If it matches, extract the parts:
^(?<indent>[\t ]*)\||(?<cell>-+)\|
Another demo at regex101 (explanation on the right side)
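The validate-then-extract approach can be sketched in Python. This is an illustration, not the answer's own code: Python spells named groups (?P<name>...), and `table_cells` is a hypothetical helper that re-attaches the "|" so the cells look like the expected output.

```python
import re

# Step 1: the whole line must be an optional indent, "|", then one or more "-+|" cells.
LINE_RE = re.compile(r"^[ \t]*\|(?:-+\|)+$")
# Step 2: the extraction pattern from the answer, in Python's (?P<name>...) syntax.
EXTRACT_RE = re.compile(r"^(?P<indent>[\t ]*)\||(?P<cell>-+)\|")


def table_cells(line):
    """Return (indent, cells) for a valid separator line, otherwise None."""
    if not LINE_RE.match(line):
        return None
    indent, cells = "", []
    for m in EXTRACT_RE.finditer(line):
        if m.group("cell") is not None:
            cells.append(m.group("cell") + "|")  # re-attach "|" to match the expected output
        else:
            indent = m.group("indent")
    return indent, cells


print(table_cells("  |---|----|---|"))
print(table_cells("|---|x|"))
```

The first call yields ('  ', ['---|', '----|', '---|']); the second yields None because the validation step rejects the line before anything is extracted.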
It can also be done with a single regex, using the sticky flag y and a lookahead for validation:
/^(?<indent>[ \t]*)\|(?=(?:-+\|)+$)|(?!^)(?<cell>-+)\|/gy
One more demo at regex101
The lookahead checks once, after the first |, whether the rest of the string matches the pattern. If this first match fails, the rest of the pattern fails too, because with the y flag the matches are "glued" to each other.
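Python's re module has no y flag, but the sticky behaviour can be emulated by calling `match()` at the exact position where the previous match ended. This is a sketch under that assumption; `sticky_cells` is a hypothetical name, and the pattern is the one-regex answer rewritten with (?P<name>...) groups.

```python
import re

# Branch 1: indent and first "|", then a lookahead that validates the rest of the line.
# Branch 2: one cell; (?!^) stops it from matching at the very start of the string.
STICKY_RE = re.compile(r"^(?P<indent>[ \t]*)\|(?=(?:-+\|)+$)|(?!^)(?P<cell>-+)\|")


def sticky_cells(line):
    """Emulate the y flag: each match must start exactly where the previous one ended."""
    cells = []
    pos = 0
    while pos < len(line):
        m = STICKY_RE.match(line, pos)  # match() only tries position pos, like a sticky regex
        if m is None:
            break  # a failed match ends the scan, so nothing after it is picked up
        if m.group("cell") is not None:
            cells.append(m.group("cell") + "|")
        pos = m.end()
    return cells


print(sticky_cells("|---|----|---|"))
print(sticky_cells("|---|x|"))
```

For an invalid line such as "|---|x|", the lookahead in the first match fails, so the loop stops immediately and returns an empty list, which mirrors the "glued" behaviour described above.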

Mark words in Notepad++ including dash (-)

I would like to use Notepad++ to mark the SQL script names in a text log. The SQL files appear in the text in this format:
AAAAAAAA.BBBBBBBBBBB.sql
So I run this expression from the Search menu:
\w*.sql
With it I get BBBBBBBBBBB.sql. The problem is that some script names contain dashes (-), and in those cases I don't get the whole name, only the part after the last dash.
For example, in:
AAAAAAAA.BBBBB-CCCCCCC.sql
I would like to get BBBBB-CCCCCCC.sql, but I just get CCCCCCC.sql
Is there an expression that matches the whole name?
If the match cannot start or end with a hyphen:
\w+(?:-\w+)*\.sql
\w+ Match 1+ word characters
(?:-\w+)* Optionally repeat matching - followed by 1+ word characters
\.sql Match .sql
See a regex demo.
Note that in your pattern, \w* can also match zero characters, and an unescaped . matches any character.
Another option is to use a character class that matches either - or a word character, but this would also allow mixes such as --a--.sql
[\w-]+\.sql
See another regex demo.
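The difference between the two patterns can be checked quickly outside Notepad++. This sketch uses Python's re module as a stand-in for Notepad++'s regex engine and assumes plain ASCII log text, where \w behaves the same; the log line is made up for illustration.

```python
import re

# Made-up log text containing both name styles and one hyphen-heavy name.
LOG = "run AAAAAAAA.BBBBB-CCCCCCC.sql, then AAAAAAAA.BBBBBBBBBBB.sql, then x.--a--.sql"

# No leading/trailing hyphen: word chars, optionally repeated "-word" parts, then ".sql".
STRICT = re.compile(r"\w+(?:-\w+)*\.sql")
# Character-class variant: any mix of word chars and hyphens before ".sql".
LOOSE = re.compile(r"[\w-]+\.sql")

print(STRICT.findall(LOG))
print(LOOSE.findall(LOG))
```

Both patterns find BBBBB-CCCCCCC.sql and BBBBBBBBBBB.sql, but only the character-class version also returns --a--.sql.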

What do you understand by this RegEx?

I'm working with VBA and trying to split a string into three columns. Almost all strings look like Company Name 3567782 Agent Name.pdf
With this pattern I want to match all the text before a space and digits (1st group), the digits (2nd group) and all the text after the space and before the .pdf (3rd group).
strPattern = "^(.+)\n(\d{4,10})\n(.+).pdf"
I recall that in Python a space is \s, but I saw that in VBA it is \n.
Can you help me find the right pattern for what I'm looking for?
As I put in my comment, I use the https://regex101.com site. There are others but I find this one the most helpful to me.
When I put in your regex
^(.+)\n(\d{4,10})\n(.+).pdf
and test string
Company Name 3567782 Agent Name.pdf
the first thing I notice is that the regex does not match the test string (see right side under MATCH INFORMATION).
Here are a couple things that I saw:
\n matches a newline, not a space. In a regex, a space is matched by a literal " ".
Your last "." in ".pdf" is not registering as a literal period, it's a token that matches any character. To match a literal period, you need \.
If we change those two things it returns three groups that seem to match what you are looking for.
^(.+) (\d{4,10}) (.+)\.pdf
It looks like you want between 4 and 10 digits. If that's correct, your regex is good. You could put a handful of example strings into the TEST STRING area and make sure that it works in all cases.
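The corrected pattern can also be tried in Python, whose re module treats these particular tokens the same way as VBA's RegExp object. This is a sketch for checking the pattern, not VBA code; `split_name` is a hypothetical helper.

```python
import re

# Corrected pattern: literal spaces instead of \n, and an escaped "." before "pdf".
FIXED = re.compile(r"^(.+) (\d{4,10}) (.+)\.pdf")
# Original pattern, for comparison: \n only matches a newline character.
ORIGINAL = re.compile(r"^(.+)\n(\d{4,10})\n(.+).pdf")


def split_name(name, pattern=FIXED):
    """Return the three captured columns, or None when the pattern does not match."""
    m = pattern.match(name)
    return m.groups() if m else None


print(split_name("Company Name 3567782 Agent Name.pdf"))
print(split_name("Company Name 3567782 Agent Name.pdf", ORIGINAL))
```

The first call gives ('Company Name', '3567782', 'Agent Name') and the second gives None. Because the first (.+) is greedy, a name like "Acme 2000 Ltd 1234567 Jane Doe.pdf" splits as ('Acme 2000 Ltd', '1234567', 'Jane Doe'), using the last space-delimited run of digits that still leaves a valid third column.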
I'd use either of these:
(?:(?:([a-zA-Z]+\.?)|(\d+)))
Captures letters (a-z, A-Z) greedily, with an optional . to allow for the .pdf, or captures digits.
This version excludes the space ([ ] or \s).
Or keep the search structured so you can control what goes into each column:
^(\w+\s\w+)|(\d+)|(\w+\s\w+\.\w+$)
\b or ^ - word boundary or start of string
(\w+\s\w+) - 1st capture: \w+ matches word characters (letters, digits, underscore) greedily, followed by one whitespace character (use \s* or \s+ for more), followed again by word characters greedily
|(\d+) - alternation: \d+ captures just digits
|(\w+\s\w+\.\w+$) - similar to the 1st group, but allows for the '.' of .pdf and anchors to the end of the string ($).
You could optionally build the '.' into the 1st group, as in my first pattern, but for neatness and better control I prefer the 2nd.
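Because each alternative of the structured pattern fills a different capture group, every match sets exactly one group. A minimal Python sketch of collecting them into columns; `columns` is a hypothetical helper, and in VBA the equivalent would loop over the Matches collection and its SubMatches.

```python
import re

# The structured pattern from the answer: one alternative per column.
COLUMNS_RE = re.compile(r"^(\w+\s\w+)|(\d+)|(\w+\s\w+\.\w+$)")


def columns(name):
    """Put the group that each match filled into the corresponding column."""
    cols = ["", "", ""]
    for m in COLUMNS_RE.finditer(name):
        for i, text in enumerate(m.groups()):
            if text is not None:  # groups from the other alternatives are None
                cols[i] = text
    return cols


print(columns("Company Name 3567782 Agent Name.pdf"))
```

This yields ['Company Name', '3567782', 'Agent Name.pdf']. Note that the third column keeps the .pdf extension, since the third group includes \.\w+.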

Show zeroes in a Regular Expression decimal

Can someone tell me how to get the zeroes of the decimal part to show when using a regular expression?
For example, with 1,320.00, the .00 disappears when I apply the regular expression. I need the zeroes to show. Here is the expression I was working with:
(^\d*\.?\d*[0-9]+\d*$)|(^[0-9]+\d*\.\d*$)
Any help would be appreciated.
Thanks,
Here's a pattern which will capture the full number, including commas and decimals:
^(\d{1,3},)*\d{1,3}\.\d\d$
The first group, (\d{1,3},), matches groups of one to three digits followed by a comma. It is followed by a *, so the pattern matches 0 or more of these groups (i.e. it will still match 320.00 and 12,312,122.00).
The second part, \d{1,3}\., will match the 1-3 digits preceding the decimal point.
Finally, \d\d$ matches the two decimal digits. It looked like you're trying to match US currency, so I hard-coded 2 digits for readability, but if you need to match, say, one or more decimal digits, try this:
^(\d{1,3},)*\d{1,3}\.\d+$
Here's a demo.
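Both patterns can be checked against a few sample amounts. This Python sketch assumes the goal is to keep the matched text as written; `amount_text` is a hypothetical helper that returns the whole match, so trailing zeroes are never dropped.

```python
import re

# Exactly two decimal digits, with optional comma-separated thousands groups.
TWO_DECIMALS = re.compile(r"^(\d{1,3},)*\d{1,3}\.\d\d$")
# Variant that accepts one or more decimal digits.
ANY_DECIMALS = re.compile(r"^(\d{1,3},)*\d{1,3}\.\d+$")


def amount_text(text, pattern=TWO_DECIMALS):
    """Return the matched amount as text (trailing zeroes intact), or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


for s in ["1,320.00", "320.00", "12,312,122.00", "1,320.5"]:
    print(s, amount_text(s), amount_text(s, ANY_DECIMALS))
```

The first three amounts match both patterns unchanged, including the .00; 1,320.5 matches only the one-or-more-digits variant.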
