Regular expression (Sass variables) - Node.js

(Node.js) I have to match all Sass variables in a file, but a file can contain both variables and mixins. I need to update my regular expression so that it does not match variables in a mixin directive or inside (nested) mixin/function bodies.
So only these should match:
$test: true;
$white: #fff !default;
$sizes: (25: 0.25rem, 50: 0.5rem) !default;
Regular expression: /\$([^:]*)\s*:\s*([^;]*)\s*;/g
https://regex101.com/r/oRuWjS/1
$test: true;
$white: #fff !default;
$sizes: (
  25: 0.25rem,
  50: 0.5rem
) !default;
@mixin parent ($first, $second: "") {
  .#{$first} {
    @content;
  }
}

With
/@mixin[^(]*\([\s\S]*?^}$|\$([^:]*?)\s*:\s*([^;]*?)\s*;/gm
you can match everything from @mixin up to the } that stands alone on its own line and skip that match, while collecting all the other matches. See the regex demo.
const s = "$test: true;\n\n$white: #fff !default;\n\n$sizes: (\n  25: 0.25rem,\n  50: 0.5rem\n) !default;\n\n@mixin parent ($first, $second: \"\") {\n  .#{$first} {\n    @content;\n  }\n}\n";
const rx = /@mixin[^(]*\([\s\S]*?^}$|\$([^:]*?)\s*:\s*([^;]*?)\s*;/gm;
const res = [];
let m;
while ((m = rx.exec(s)) !== null) {
  // Group 1 is only set by the variable alternative; skipped @mixin matches leave it undefined
  if (m[1] !== undefined) {
    res.push([m[1], m[2]]);
  }
}
console.log(res);
Details
@mixin[^(]*\([\s\S]*?^}$ - the alternative whose matches are skipped:
@mixin - a literal substring
[^(]* - 0+ chars other than (
\( - a (
[\s\S]*? - any 0+ chars, as few as possible
^}$ - a } that is alone on its own line
| - or
\$ - a $ char
([^:]*?) - Group 1: 0+ chars other than :, as few as possible
\s*:\s* - a : enclosed with 0+ whitespace chars
([^;]*?) - Group 2: 0+ chars other than ;, as few as possible
\s*; - 0+ whitespace chars followed by ;.
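For comparison, the same "match and skip" alternation can be sketched outside Node.js with Go's regexp package, which supports lazy quantifiers and the (?m) multiline flag. This is a sketch that assumes the directive is spelled @mixin, as in standard Sass; topLevelVars is a hypothetical helper name. It relies on FindAllStringSubmatchIndex reporting -1 for a group that did not participate, which is how the skipped @mixin matches are told apart from variable matches.

```go
package main

import (
	"fmt"
	"regexp"
)

// Alternative 1 consumes a whole @mixin block (up to a '}' alone on its own
// line) without capturing; alternative 2 captures a variable name (group 1)
// and its value (group 2). Assumes the standard Sass "@mixin" spelling.
var rx = regexp.MustCompile(`(?m)@mixin[^(]*\([\s\S]*?^}$|\$([^:]*?)\s*:\s*([^;]*?)\s*;`)

// topLevelVars (hypothetical helper) returns [name, value] pairs for the
// variables that are not inside a @mixin block.
func topLevelVars(src string) [][2]string {
	var res [][2]string
	for _, m := range rx.FindAllStringSubmatchIndex(src, -1) {
		// m[2] is -1 when group 1 did not participate, i.e. the @mixin
		// alternative matched, so the match is skipped.
		if m[2] < 0 {
			continue
		}
		res = append(res, [2]string{src[m[2]:m[3]], src[m[4]:m[5]]})
	}
	return res
}

func main() {
	src := "$test: true;\n\n$white: #fff !default;\n\n$sizes: (\n  25: 0.25rem,\n  50: 0.5rem\n) !default;\n\n@mixin parent ($first, $second: \"\") {\n  .#{$first} {\n    @content;\n  }\n}\n"
	for _, v := range topLevelVars(src) {
		fmt.Printf("%s => %q\n", v[0], v[1])
	}
}
```

The variables inside the mixin header ($first, $second) are never reported, because the scan reaches @mixin first and that alternative swallows the whole block.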

Related

Golang: Determine if a string contains a string (with wildcards)

With Go, how would you determine if a string contains a certain string that includes wildcards? Example:
We're looking for t*e*s*t (the *s can be any characters, of any length).
Input True: ttttteeeeeeeesttttttt
Input False: tset
Use the regexp package by converting the * in your pattern to the .* of regular expressions.
import (
	"regexp"
	"strings"
)

// wildCardToRegexp converts a wildcard pattern to a regular expression pattern.
func wildCardToRegexp(pattern string) string {
	var result strings.Builder
	for i, literal := range strings.Split(pattern, "*") {
		// Replace * with .*
		if i > 0 {
			result.WriteString(".*")
		}
		// Quote any regular expression meta characters in the
		// literal text.
		result.WriteString(regexp.QuoteMeta(literal))
	}
	return result.String()
}
Use it like this:
func match(pattern string, value string) bool {
	// The generated pattern is always valid (literals are quoted), so the error is ignored.
	result, _ := regexp.MatchString(wildCardToRegexp(pattern), value)
	return result
}
Run it on the Go Playground.
Good piece of code. I would offer one minor change: if you're using wildcards, then a pattern without any wildcards should mean an exact match. To accomplish this, I use an early return, and I anchor the result with ^ and $ so that the pattern must match the whole string:
func wildCardToRegexp(pattern string) string {
	components := strings.Split(pattern, "*")
	if len(components) == 1 {
		// if len is 1, there are no *'s, return exact match pattern
		// (quoted, so characters such as '.' or '+' match literally)
		return "^" + regexp.QuoteMeta(pattern) + "$"
	}
	var result strings.Builder
	for i, literal := range components {
		// Replace * with .*
		if i > 0 {
			result.WriteString(".*")
		}
		// Quote any regular expression meta characters in the
		// literal text.
		result.WriteString(regexp.QuoteMeta(literal))
	}
	return "^" + result.String() + "$"
}
Run it on the Go Playground
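A consolidated, runnable sketch of the anchored variant, exercised on the inputs from the question. Once every literal is passed through regexp.QuoteMeta and the result is wrapped in ^...$, a pattern with no * already yields an exact match, so this version drops the early return. Compiling with regexp.MustCompile is a choice made here, not part of either answer.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildCardToRegexp turns a wildcard pattern (* = any run of characters) into
// an anchored regular expression. Literal parts are quoted, so a pattern
// without * becomes an exact, literal match.
func wildCardToRegexp(pattern string) string {
	var result strings.Builder
	result.WriteString("^")
	for i, literal := range strings.Split(pattern, "*") {
		if i > 0 {
			result.WriteString(".*")
		}
		result.WriteString(regexp.QuoteMeta(literal))
	}
	result.WriteString("$")
	return result.String()
}

// match reports whether value matches the whole wildcard pattern.
func match(pattern, value string) bool {
	return regexp.MustCompile(wildCardToRegexp(pattern)).MatchString(value)
}

func main() {
	fmt.Println(match("t*e*s*t", "ttttteeeeeeeesttttttt")) // true
	fmt.Println(match("t*e*s*t", "tset"))                  // false
	fmt.Println(match("a.b", "a.b"))                       // true
	fmt.Println(match("a.b", "axb"))                       // false: '.' is quoted
}
```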

non-fragment lexer rule x can match the empty string

What's wrong with the following ANTLR lexer?
I get this warning:
warning(146): MySQL.g4:5685:0: non-fragment lexer rule VERSION_COMMENT_TAIL can match the empty string
Source code:
VERSION_COMMENT_TAIL:
{ VERSION_MATCHED == False }? // One level of block comment nesting is allowed for version comments.
((ML_COMMENT_HEAD MULTILINE_COMMENT) | . )*? ML_COMMENT_END { self.setType(MULTILINE_COMMENT); }
| { self.setType(VERSION_COMMENT); IN_VERSION_COMMENT = True; }
;
Are you trying to convert my ANTLR3 grammar for MySQL to ANTLR4? Remove all the comment rules in the lexer and insert this instead:
// There are 3 types of block comments:
// /* ... */ - The standard multi line comment.
// /*! ... */ - A comment used to mask code for other clients. In MySQL the content is handled as normal code.
// /*!12345 ... */ - Same as the previous one except code is only used when the given number is a lower value
// than the current server version (specifying so the minimum server version the code can run with).
VERSION_COMMENT_START: ('/*!' DIGITS) (
{checkVersion(getText())}? // Will set inVersionComment if the number matches.
| .*? '*/'
) -> channel(HIDDEN)
;
// inVersionComment is a variable in the base lexer.
MYSQL_COMMENT_START: '/*!' { inVersionComment = true; setChannel(HIDDEN); };
VERSION_COMMENT_END: '*/' {inVersionComment}? { inVersionComment = false; setChannel(HIDDEN); };
BLOCK_COMMENT: '/*' ~[!] .*? '*/' -> channel(HIDDEN);
POUND_COMMENT: '#' ~([\n\r])* -> channel(HIDDEN);
DASHDASH_COMMENT: DOUBLE_DASH ([ \t] (~[\n\r])* | LINEBREAK | EOF) -> channel(HIDDEN);
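You need an inVersionComment member variable and a checkVersion() function in your lexer (I have both in the base lexer from which the generated lexer derives). checkVersion() returns true if the current server version is equal to or higher than the given version, and false otherwise.
And for your question: the second alternative of VERSION_COMMENT_TAIL consists only of an action, so it matches no input at all; that is why ANTLR warns that the rule can match the empty string. Also, unlike in ANTLR3, you cannot have actions in alternatives: actions can only appear at the end of an entire rule (at least before ANTLR 4.2, which allows lexer actions anywhere in a rule).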
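The answer does not show checkVersion() itself. Below is a hypothetical Go sketch of the behavior it describes, not the author's Java base lexer: versionLexer, serverVersion and the exact parsing are assumptions. It takes the matched "/*!NNNNN" text, compares the number with the server version (MySQL encodes 5.7.8 as 50708), and sets inVersionComment when the comment's code applies.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionLexer stands in for the base lexer; both fields are hypothetical.
type versionLexer struct {
	serverVersion    int  // e.g. 50708 for MySQL 5.7.8
	inVersionComment bool // set when a versioned comment's code applies
}

// checkVersion receives the matched "/*!NNNNN" text and reports whether the
// server version is equal to or higher than NNNNN, setting inVersionComment
// in that case (mirroring the comment in the grammar).
func (l *versionLexer) checkVersion(text string) bool {
	version, err := strconv.Atoi(strings.TrimPrefix(text, "/*!"))
	if err != nil {
		return false // no usable version number
	}
	if l.serverVersion >= version {
		l.inVersionComment = true
		return true
	}
	return false
}

func main() {
	l := &versionLexer{serverVersion: 50708}
	ok := l.checkVersion("/*!50100")
	fmt.Println(ok, l.inVersionComment) // true true

	l = &versionLexer{serverVersion: 50708}
	ok = l.checkVersion("/*!80000")
	fmt.Println(ok, l.inVersionComment) // false false
}
```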

ANTLR4 lexer semantic predicate issue

I'm trying to use a semantic predicate in the lexer to look ahead one token but somehow I can't get it right. Here's what I have:
lexer grammar
lexer grammar TLLexer;
DirStart
: { getCharPositionInLine() == 0 }? '#dir'
;
DirEnd
: { getCharPositionInLine() == 0 }? '#end'
;
Cont
: 'contents' [ \t]* -> mode(CNT)
;
WS
: [ \t]+ -> channel(HIDDEN)
;
NL
: '\r'? '\n'
;
mode CNT;
CNT_DirEnd
: '#end' [ \t]* '\n'?
{ System.out.println("--matched end--"); }
;
CNT_LastLine
: ~ '\n'* '\n'
{ _input.LA(1) == CNT_DirEnd }? -> mode(DEFAULT_MODE)
;
CNT_Line
: ~ '\n'* '\n'
;
parser grammar
parser grammar TLParser;
options { tokenVocab = TLLexer; }
dirs
: ( dir
| NL
)*
;
dir
: DirStart Cont
contents
DirEnd
;
contents
: CNT_Line* CNT_LastLine
;
Essentially, each line in CNT mode is free-form, but it never begins with #end followed by optional whitespace. I want to keep matching the #end tag in the default lexer mode.
My test input is as follows:
#dir contents
..line..
#end
If I run this in grun I get the following
$ grun TL dirs test.txt
--matched end--
line 3:0 extraneous input '#end\n' expecting {CNT_LastLine, CNT_Line}
So clearly CNT_DirEnd gets matched, but somehow the predicate doesn't detect it.
I know that this particular task doesn't require a semantic predicate, but that's just the part that doesn't work. The actual parser could be written without the predicate, but it would be a lot less clean if I simply moved the matching of the #end tag into mode CNT.
Thanks,
Kesha.
I think I figured it out. The member _input is the character stream of the original input, so _input.LA returns character codes, not token types. The token types the lexer hands to the parser have nothing to do with the values returned by _input.LA, so the predicate fails unless, by sheer luck, the character code returned by _input.LA(1) happens to equal the token type of CNT_DirEnd.
I modified the lexer as shown below, and now it works, although it is not as elegant as I had hoped (maybe someone knows a better way?):
lexer grammar TLLexer;
@lexer::members {
    private static final String END_DIR = "#end";
    private boolean isAtEndDir() {
        StringBuilder sb = new StringBuilder();
        int n = 1;
        int ic;
        // read characters until EOF
        while ((ic = _input.LA(n++)) != -1) {
            char c = (char) ic;
            // we're interested in the next line only
            if (c == '\n') break;
            if (c == '\r') continue;
            sb.append(c);
        }
        // Does the line begin with #end ?
        if (sb.indexOf(END_DIR) != 0) return false;
        // Is the #end followed by whitespace only?
        for (int i = END_DIR.length(); i < sb.length(); i++) {
            switch (sb.charAt(i)) {
                case ' ':
                case '\t':
                    continue;
                default: return false;
            }
        }
        return true;
    }
}
[skipped .. nothing changed in the default mode]
mode CNT;
/* removed CNT_DirEnd */
CNT_LastLine
: ~ '\n'* '\n'
{ isAtEndDir() }? -> mode(DEFAULT_MODE)
;
CNT_Line
: ~ '\n'* '\n'
;
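The check that isAtEndDir() performs can be seen in isolation in this Go sketch. It is a port for illustration only: instead of peeking with _input.LA(n), it takes the remaining input as a string (an assumption of the sketch). Like the Java helper, it looks only at the next line, ignores '\r', and accepts "#end" followed by nothing but spaces and tabs.

```go
package main

import (
	"fmt"
	"strings"
)

const endDir = "#end"

// isAtEndDir reports whether the next line of rest is "#end" followed only
// by spaces or tabs. rest stands in for the characters _input.LA(n) would
// return in the Java version.
func isAtEndDir(rest string) bool {
	// Only the next line matters; cut at the first '\n' (or take everything at EOF).
	line := rest
	if i := strings.IndexByte(rest, '\n'); i >= 0 {
		line = rest[:i]
	}
	line = strings.ReplaceAll(line, "\r", "") // the Java loop skips '\r'
	if !strings.HasPrefix(line, endDir) {
		return false
	}
	// Whatever follows #end must consist of spaces and tabs only.
	return strings.Trim(line[len(endDir):], " \t") == ""
}

func main() {
	fmt.Println(isAtEndDir("#end\n"))          // true
	fmt.Println(isAtEndDir("#end \t\r\nmore")) // true
	fmt.Println(isAtEndDir("#ending\n"))       // false
	fmt.Println(isAtEndDir("..line..\n#end"))  // false
}
```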

ANTLR semantic predicate - consume only part of the match

I need to handle these sequences: <1>, <1-2>, <3-5 /0.5/>.
In ANTLR v3 I used these rules:
LPOINTY : ('<' REPEAT (PROBABILITY)? '>') => '<' // will consume only '<'
repeatOperator : LPOINTY_OR_ABNF_URI (XML_NM_TOKEN (weightOrProbability'>')?
ANTLR v4 does not allow the "=>" operator, so I wrote it like this:
LPOINTY_OR_ABNF_URI // will return only digit, ex: 1, 1-2, 3-5
: '<' REPEAT '>' { setText(getText().substring(1, getText().length() - 1)); }
| '<' REPEAT WS+ { setText(getText().substring(1, getText().length())); }
;
repeatOperator
: LPOINTY_OR_ABNF_URI (WEIGHT_OR_PROBABILITY)? SHARP_BRACKET_RIGHT?
;
where the tokens are:
XML_NM_TOKEN - matches the content of '<...>'
weightOrProbability and WEIGHT_OR_PROBABILITY - match /0.5/
PROBABILITY - matches /0.5/
WS - matches whitespace
SHARP_BRACKET_RIGHT - matches '>'
Is there a better way to do this? I would like to use lookahead and consume only the first character, as in the old version. Is there a way to do this?
My solution:
REPEAT_OP1
: '<' REPEAT '>' { setText(getText().substring(1, getText().length()-1)); }
;
REPEAT_OP2
: '<' REPEAT { setText(getText().substring(1, getText().length())); }
;
repeatOperator
: REPEAT_OP1
| REPEAT_OP2 WEIGHT_OR_PROBABILITY? SHARP_BRACKET_RIGHT
| REPEAT_OP2 WEIGHT_OR_PROBABILITY? {notifyErrorListeners("Missing closing '>'!");}
;
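To make the token split in this solution concrete, here is a Go sketch that mimics the three repeatOperator alternatives with regular expressions. The grammar's REPEAT and WEIGHT_OR_PROBABILITY definitions are not shown, so the sketch assumes REPEAT is digits with an optional "-digits" range, a probability is a decimal number between slashes, and whitespace between tokens is skipped. parseRepeat is a hypothetical name; this is not ANTLR output.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Assumed shapes for REPEAT and WEIGHT_OR_PROBABILITY (not given in the grammar).
var (
	repeatOp1 = regexp.MustCompile(`^<(\d+(?:-\d+)?)>`)  // like REPEAT_OP1: '<' REPEAT '>'
	repeatOp2 = regexp.MustCompile(`^<(\d+(?:-\d+)?)`)   // like REPEAT_OP2: '<' REPEAT
	weightOp  = regexp.MustCompile(`^/(\d+(?:\.\d+)?)/`) // like WEIGHT_OR_PROBABILITY
)

// parseRepeat returns the repeat text without '<' and '>' (what setText keeps),
// the optional probability, and an error message when '>' is missing.
func parseRepeat(s string) (repeat, prob, errMsg string) {
	if m := repeatOp1.FindStringSubmatch(s); m != nil {
		return m[1], "", ""
	}
	m := repeatOp2.FindStringSubmatch(s)
	if m == nil {
		return "", "", "not a repeat operator"
	}
	// Assumption: whitespace between tokens is skipped.
	rest := strings.TrimLeft(s[len(m[0]):], " \t")
	if w := weightOp.FindStringSubmatch(rest); w != nil {
		prob = w[1]
		rest = strings.TrimLeft(rest[len(w[0]):], " \t")
	}
	if !strings.HasPrefix(rest, ">") {
		return m[1], prob, "Missing closing '>'!"
	}
	return m[1], prob, ""
}

func main() {
	for _, in := range []string{"<1>", "<1-2>", "<3-5 /0.5/>", "<3-5 /0.5/"} {
		r, p, e := parseRepeat(in)
		fmt.Printf("%-14q repeat=%q prob=%q err=%q\n", in, r, p, e)
	}
}
```

REPEAT_OP1 is tried first, mirroring how the lexer prefers the longer '<' REPEAT '>' match when the closing bracket follows immediately.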

ANTLR4 lexer rule with @init block

I have this lexer rule defined in my ANTLR v3 grammar file; it matches text in double quotes.
I need to convert it to ANTLR v4. The ANTLR compiler throws the error 'syntax error: mismatched input '@' expecting COLON while matching a lexer rule' (on the @init line). Can a lexer rule contain an @init block? How should this be rewritten?
DOUBLE_QUOTED_CHARACTERS
@init
{
int doubleQuoteMark = input.mark();
int semiColonPos = -1;
}
: ('"' WS* '"') => '"' WS* '"' { $channel = HIDDEN; }
{
RecognitionException re = new RecognitionException("Illegal empty quotes\"\"!", input);
reportError(re);
}
| '"' (options {greedy=false;}: ~('"'))+
('"'|';' { semiColonPos = input.index(); } ('\u0020'|'\t')* ('\n'|'\r'))
{
if (semiColonPos >= 0)
{
input.rewind(doubleQuoteMark);
RecognitionException re = new RecognitionException("Missing closing double quote!", input);
reportError(re);
input.consume();
}
else
{
setText(getText().substring(1, getText().length()-1));
}
}
;
Sample data:
" " -> throws error "Illegal empty quotes!";
"asd -> throws error "Missing closing double quote!"
"text" -> returns text (valid input, content of "...")
I think this is the right way to do this.
DOUBLE_QUOTED_CHARACTERS
:
{
int doubleQuoteMark = input.mark();
int semiColonPos = -1;
}
(
('"' WS* '"') => '"' WS* '"' { $channel = HIDDEN; }
{
RecognitionException re = new RecognitionException("Illegal empty quotes\"\"!", input);
reportError(re);
}
| '"' (options {greedy=false;}: ~('"'))+
('"'|';' { semiColonPos = input.index(); } ('\u0020'|'\t')* ('\n'|'\r'))
{
if (semiColonPos >= 0)
{
input.rewind(doubleQuoteMark);
RecognitionException re = new RecognitionException("Missing closing double quote!", input);
reportError(re);
input.consume();
}
else
{
setText(getText().substring(1, getText().length()-1));
}
}
)
;
There are some other errors in the code above as well, such as the WS ... => ... syntactic predicate (ANTLR v4 no longer supports =>), but I am not correcting them in this answer, to keep things simple. I took the hint from here.
In case that link moves or becomes invalid over time, here is the text as is:
Lexer actions can appear anywhere as of 4.2, not just at the end of the outermost alternative. The lexer executes the actions at the appropriate input position, according to the placement of the action within the rule. To execute a single action for a role that has multiple alternatives, you can enclose the alts in parentheses and put the action afterwards:
END : ('endif'|'end') {System.out.println("found an end");} ;
The action conforms to the syntax of the target language. ANTLR copies the action’s contents into the generated code verbatim; there is no translation of expressions like $x.y as there is in parser actions.
Only actions within the outermost token rule are executed. In other words, if STRING calls ESC_CHAR and ESC_CHAR has an action, that action is not executed when the lexer starts matching in STRING.
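The intended behavior of DOUBLE_QUOTED_CHARACTERS for the question's sample data can be summarized in plain Go, independent of ANTLR. This is a hypothetical, simplified emulation: the grammar detects a missing quote via a ';' followed by a line break, whereas this sketch treats reaching the end of the line (or input) without a closing quote as missing. The error texts are copied from the sample data.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// classifyQuoted (hypothetical) reports empty quotes and unclosed quotes as
// errors and otherwise returns the text between the quotes.
func classifyQuoted(s string) (string, error) {
	if !strings.HasPrefix(s, `"`) {
		return "", errors.New("not a double-quoted string")
	}
	body := s[1:]
	// Simplification: only the current line is considered.
	if i := strings.IndexAny(body, "\r\n"); i >= 0 {
		body = body[:i]
	}
	end := strings.IndexByte(body, '"')
	if end < 0 {
		return "", errors.New("Missing closing double quote!")
	}
	content := body[:end]
	// '"' WS* '"' in the grammar: only spaces/tabs between the quotes.
	if strings.Trim(content, " \t") == "" {
		return "", errors.New("Illegal empty quotes!")
	}
	return content, nil
}

func main() {
	for _, in := range []string{`" "`, `"asd`, `"text"`} {
		text, err := classifyQuoted(in)
		fmt.Printf("%-8s text=%q err=%v\n", in, text, err)
	}
}
```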
I encountered this problem when my .g4 grammar imported a lexer file. Importing grammar files seems to trigger many undocumented shortcomings in ANTLR4, so I ultimately had to stop using import.
In my case, once I merged the lexer grammar into the parser grammar (a single .g4 file), my #input and #after parsing errors vanished. I should submit a test case and a bug report, at least to get this documented, and will update here once I do.
I vaguely recall 2-3 issues related to importing a lexer grammar into my parser that triggered undocumented behavior. Much of this is covered here on Stack Overflow.
