how to swallow anything till the semicolon but ignore semicolons in quotes - antlr4

How can I consume everything up to and including the first semicolon ;, while ignoring any semicolons that appear inside quotes?
E.g. This is example to swallow ';' character.; It should ignore this part.
It should give me This is example to swallow ';' character.;

You could use negative lookbehind to check if there's a quote:
^(?<!['"]).*;
Here's a demo on regex101: https://regex101.com/r/u0TLYn/2/
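If a regex turns out to be too brittle for this, a small scanner that tracks whether it is currently inside quotes is another option. Below is a minimal sketch in Python (not ANTLR). The function name take_until_semicolon is made up for illustration, and it assumes quotes are never escaped inside a quoted part.

```python
def take_until_semicolon(text: str) -> str:
    """Return text up to and including the first semicolon outside quotes.

    Assumption: single and double quotes both delimit quoted parts, and
    there are no escaped quotes inside them. If no unquoted semicolon is
    found, the whole input is returned unchanged.
    """
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(text):
        if quote is not None:
            if ch == quote:
                quote = None  # closing quote
        elif ch in "'\"":
            quote = ch  # opening quote
        elif ch == ";":
            return text[: i + 1]
    return text


sample = "This is example to swallow ';' character.; It should ignore this part."
print(take_until_semicolon(sample))
```

The same idea carries over to a lexer rule: match either a non-quote, non-semicolon character or a whole quoted chunk, repeatedly, and then the semicolon.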

Related

rlang::parse_expr with string with escape characters

How could I use eval(rlang::parse_expr(string)), or an alternative, with an expression string such as string <- "print('A\s*B')"? I am getting an unrecognized escape character error. The expression is evaluated inside a function; print is just an example, I am using grepl in a similar manner.
Simply using second escape slash helped "print('A\s*B')"

String lexical rule in ANTLR with greedy wildcard and escape character

From the book "The Definitive ANTLR 4 Reference":
Our STRING rule isn’t quite good enough yet because it doesn’t allow
double quotes inside strings. To support that, most languages define
escape sequences starting with a backslash. To get a double quote
inside a double-quoted string, we use \". To support the common escape
characters, we need something like the following:
STRING : '"' ( ESC | . )*? '"' ;
fragment
ESC : '\\"' | '\\\\' ; // 2-char sequences \" and \\
ANTLR itself needs to escape the escape character, so that’s why we need \\ to
specify the backslash character. The loop in STRING now matches either
an escape character sequence, by calling fragment rule ESC, or any
single character via the dot wildcard. The *? subrule operator
terminates the (ESC |.)*?
That sounds fine, but when I read that I noticed a certain ambiguity in the choice between ESC and .. As far as STRING is concerned, it is possible to match an input "Hi\"" by matching the escape character \ to the ., and to consider the following escaped double-quote as closing the string. This would even be less greedy and so would conform better to the use of ?.
The problem, of course, is that if we do that, then we have an extra double-quote at the end that does not get matched to anything.
So I wrote the following grammar:
grammar String;
anything: STRING '"'? '\r\n';
STRING: '"' (ESC|.)*? '"';
fragment
ESC: '\\"' | '\\\\';
which accepts an optional lonely double-quote character right after the string. This grammar still parses "Orange\"" as a full string.
So my question is: why is this the accepted parse, as opposed to the one taking "Orange\" as the STRING, followed by an isolated double-quote "? Note that the latter would be less greedy, which would seem to conform better to the use of ?, so one could think it would be preferable.
After some more experimentation, I realize the explanation is that the choice operator | is order-dependent (but only under the non-greedy operator ?): ESC is tried before the dot. If I invert the two and write (.|ESC)*?, I do get the other parse, with "Orange\" as the STRING followed by an isolated double-quote.
This is not really surprising, but an interesting reminder that ANTLR is not as declarative as we may sometimes expect (in the sense that logic-or is order-independent but | is not). It is also a good reminder that the non-greedy operator ? does not extend its minimization capabilities to all choices, but just to the first one that matches the input (@sepp2k adds that order dependency only applies to the non-greedy case).
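The same order dependence can be reproduced outside ANTLR. The sketch below uses Python's re module purely as an analogy: it is a backtracking regex engine, not ANTLR's lexer, so it only illustrates the idea that a non-greedy loop takes the first alternative that lets the match finish. The variable names are made up for the example.

```python
import re

# The 10 characters "Orange\"" (quotes and backslash included).
text = r'"Orange\""'

# Escape alternatives listed first, then the wildcard.
esc_first = re.compile(r'"(?:\\"|\\\\|.)*?"')
# Wildcard listed first, then the escape alternatives.
dot_first = re.compile(r'"(?:.|\\"|\\\\)*?"')

esc_match = esc_first.match(text).group(0)
dot_match = dot_first.match(text).group(0)

print(esc_match)  # the whole input: \" was consumed as an escape
print(dot_match)  # stops one character earlier: \ was consumed by the dot
```

With the escape alternative first, the backslash and the quote after it are consumed together, so the loop only ends at the final quote. With the dot first, the backslash is consumed on its own and the very next quote closes the string.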

ANTLR: How to write a rule for enforcing line continuation character while writing a string?

I want to write a rule for parsing a string inside double quotes. I want to allow any character, with the only condition being that there MUST be a line continuation character \, when splitting the string on multiple lines.
Example:
variable = "first line \n second line \
still second line \n \
third line"
If the line continuation character is not found before a newline character is found, I want the parser to barf.
My current rule is this:
STRING : '"' (ESC|.)*? '"';
fragment ESC : '\\' [btnr"\\] ;
So I am allowing the string to contain any character, including a bunch of escape sequences. But I am not really enforcing that the line continuation character \ is required when splitting the text across lines.
How can I make the grammar enforce that rule?
Even though there is already an accepted answer, let me put in my two cents. I strongly recommend not handling this type of error in a lexer rule. The reason is that you will not be able to give the user a good error message. First, lexer errors are usually not reported separately in ANTLR4; they appear as follow-up parser errors. Second, the produced error (likely something like: "no viable alt at \n") is anything but helpful.
The better solution is to accept both variants (line break with or without escape) and do a semantic check afterwards. Then you know exactly what is wrong and can tell the user what you really expected.
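A semantic check of that kind could look roughly like the following. This is a minimal sketch in Python and not tied to any ANTLR API; it assumes you already have the raw text of the string token, and the function name find_unescaped_newlines is hypothetical.

```python
from typing import List


def find_unescaped_newlines(token_text: str) -> List[int]:
    """Return the offsets of newlines that are not preceded by a backslash.

    Assumption: a backslash always escapes the single character after it,
    so backslash + newline counts as a valid line continuation.
    """
    offsets = []
    i = 0
    while i < len(token_text):
        ch = token_text[i]
        if ch == "\\" and i + 1 < len(token_text):
            i += 2  # skip the escaped character, including an escaped newline
            continue
        if ch == "\n":
            offsets.append(i)
        i += 1
    return offsets


good = '"second line \\\nstill second line"'  # backslash before the newline
bad = '"second line\nthird line"'            # bare newline inside the string
print(find_unescaped_newlines(good))
print(find_unescaped_newlines(bad))
```

With the offsets in hand you can report something specific, for example that a line continuation character is missing at a given position, instead of a generic lexer error.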
Solution
fragment ESCAPE
: '\\' .
;
STRING
: '"' (ESCAPE | ~[\n"])* '"'
;
Explanation
Fragment ESCAPE will match escaped characters (especially backslash and a new line character acting as a continuation sign).
Token STRING will match inside double quotation marks:
Escaped characters (fragment ESCAPE)
Everything except new line and double quotation marks.
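To experiment with this rule without generating a lexer, the same pattern can be approximated with a regular expression. This is only a sketch in Python under the assumption that the regex mirrors the two rules above closely enough; the names STRING and is_valid_string are chosen for the example.

```python
import re

# Rough equivalent of:
#   fragment ESCAPE : '\\' . ;
#   STRING : '"' (ESCAPE | ~[\n"])* '"' ;
# re.DOTALL lets "." match a newline, so backslash + newline is an escape.
STRING = re.compile(r'"(?:\\.|[^\n"])*"', re.DOTALL)


def is_valid_string(text: str) -> bool:
    """True if the whole text is one string literal under the rule above."""
    return STRING.fullmatch(text) is not None


continued = '"first line \\n second line \\\nstill second line"'
broken = '"first line\nsecond line"'
print(is_valid_string(continued))
print(is_valid_string(broken))
```

The first literal uses a backslash before the real line break and is accepted; the second contains a bare line break and is rejected.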

How can I remove the last character of a string variable in ksh?

I've a string variable and I want to remove the last character of it.
For example: pass from "testing1" to "testing".
How can I do this in KSH?
var="testing1"
print ${var%?}
output
testing
${var%?} is a parameter editing feature. The '%' says to remove from the right side and expects a pattern to follow. In your example the pattern could be just the character '1' (without the quotes). I am using the wildcard character '?' so that any single character is removed. You can use '*' to match any run of characters, but typically you want to 'bundle' that with some preceding characters; with your example, echo ${var%i*} would give you just test as a result. There are also '%%' variants of this, as well as '#' and '##', which work from the left side of the string.
I hope this helps.

PowerShell using variables in strings passed as parameters

I have several PHP files in a directory, and I want to replace a few words in all of them with different text. This is part of my code:
$replacements_table=
("hr_table", "tbl_table"),
('$users', "tbl_users")
foreach ($file in $phpFiles){
foreach($replacement in $replacements_table){
(Get-Content $file) | Foreach-Object{$_ -replace $replacement} | Set-Content $file
}
}
It works fine for replacing "hr_table", but doesn't work at all for '$users'. Any suggestions would be welcome.
The string is actually a regular expression and so needs to be escaped using '\'. See this thread
$replacements_table= ("hr_table", "tbl_table"), ('\$users', "tbl_users")
will work.
The dollar sign is a special regular expression character that matches the end of a string, so you need to escape it. Escaping a character in a regex is done by putting a '\' in front of the character you want to escape. A safer method to escape characters (especially when you don't know whether the string might contain special characters) is to use the Escape method.
$replacements_table= ('hr_table', 'tbl_table'), ([regex]::Escape('$users'), 'tbl_users')
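The effect of the unescaped dollar sign is easy to see in isolation. The sketch below uses Python's re module rather than PowerShell and .NET regexes, so treat it only as an illustration of the same regex behavior; re.escape plays the role that [regex]::Escape plays above, and the sample line is made up.

```python
import re

line = "SELECT * FROM $users"

# "$" is an end-of-string anchor here, so "$users" can never match.
unescaped = re.sub("$users", "tbl_users", line)

# re.escape turns "$users" into a pattern that matches the literal text.
escaped = re.sub(re.escape("$users"), "tbl_users", line)

print(unescaped)  # unchanged
print(escaped)    # literal $users replaced
```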
Try escaping '$' with a backslash: '\$users'
The $ symbol tells the regular expression to match at the end of the string. The backslash is the regular expression escape character.
Try using double quotes around your variable name instead of single quotes.
EDIT
Try something along these lines ....
$x = $x.Replace($originalText, '$user')
