Why does EOF not match the grammar? - antlr4

I have this simple grammar
grammar Monto;
import util;
documento: .*? monto .+;
monto: SYMBOL INT+;
SYMBOL: '$';
And when I run it I get this error:
line 1:0 mismatched input '<EOF>'
I added EOF to my main rule, but it still does not work. I tried this:
documento: .*? monto .+ EOF;
and this:
documento: .*? monto .+? EOF;
Curiously, when I run it from the command line (the ANTLR4 tool), it works.
EDITED
I'm using ANTLR 4.7.1, and this is how I create the lexers and parsers:
public GrammarModule(String text) {
    CharStream input = CharStreams.fromString(text);
    demandantesLexer = new DemandantesLexer(input);
    demandantesParser = new DemandantesParser(new CommonTokenStream(demandantesLexer));
    demandadosLexer = new DemandadosLexer(input);
    demandadosParser = new DemandadosParser(new CommonTokenStream(demandadosLexer));
    direccionLexer = new DireccionLexer(input);
    direccionParser = new DireccionParser(new CommonTokenStream(direccionLexer));
    fechaLexer = new FechaLexer(input);
    fechaParser = new FechaParser(new CommonTokenStream(fechaLexer));
    montoLexer = new MontoLexer(input);
    montoParser = new MontoParser(new CommonTokenStream(montoLexer));
    numCuentaLexer = new NumCuentaLexer(input);
    numCuentaParser = new NumCuentaParser(new CommonTokenStream(numCuentaLexer));
    oficioLexer = new OficioLexer(input);
    oficioParser = new OficioParser(new CommonTokenStream(oficioLexer));
    referenciaLexer = new ReferenciaLexer(input);
    referenciaParser = new ReferenciaParser(new CommonTokenStream(referenciaLexer));
}
Invoking the parsers
fechaParser.documento().fecha().getText();
montoParser.documento().monto().getText();
and so on...

All of your lexers read from the same stream and presumably all of your grammars consume the entire input (at least Monto does and I expect Fecha does as well). You also don't appear to reset the input stream between the invocations of the different parsers. So after you invoke the Fecha parser, the input stream will be empty because the parser consumed all the input. So when you invoke the Monto parser, it reads from an empty stream and produces an error because the grammar does not match the empty input.
Instead you should just create a different CharStream instance for each lexer.
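The effect is not specific to ANTLR: any consumer that reads a shared stream to its end leaves nothing for the next one. As a rough analogy in Python (io.StringIO stands in for the CharStream, and read_all is a hypothetical stand-in for a lexer draining its input; neither is ANTLR API), compare one shared stream with one fresh stream per consumer:

```python
import io


def read_all(stream):
    """Consume the stream to its end, the way a lexer drains its input."""
    return stream.read()


text = "pagar $ 100 hoy"

# Shared stream: the first consumer leaves the read position at end of input.
shared = io.StringIO(text)
first = read_all(shared)
second = read_all(shared)  # nothing left, the second consumer only sees EOF

# One fresh stream per consumer: each of them sees the complete text.
fresh_first = read_all(io.StringIO(text))
fresh_second = read_all(io.StringIO(text))
```

In the Java code from the question, the equivalent change would be to call CharStreams.fromString(text) once per lexer instead of passing the same input object to all of them.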

Python - Reading YAML file with escape characters and escape them

I have a YAML file with LaTeX strings in its entries, in particular with many unescaped backslashes (\). The file could look like this:
content:
  - "explanation" : "\text{Explanation 1} "
    "formula" : "\exp({{a}}^2) = {{d}}^2 - {{b}}^2"
  - "explanation" : "\text{Explanation 2}"
    "formula" : "{{b}}^2 = {{d}}^2 - \exp({{a}}^2) "
The desired output form (in python) looks like that:
config = {
    "content" : [
        {"explanation" : "\\text{Now} ",
         "formula" : "\\exp({{a}}^2) = {{d}}^2 - {{b}}^2"},
        {"explanation" : "\\text{With}",
         "formula" : "{{a}}^2 = {{d}}^2 + 3 ++ {{b}}^2"}
    ]
}
where the \ have been escaped, but not the "{" and "}" as you would have when using re.escape(string).
import codecs
import yaml
from yaml import Loader
path = "config.yml"
# Plain open() with different encodings
with open(path, "r", encoding='latin1') as stream:
    config1 = yaml.safe_load(stream)
with open(path, "r", encoding='utf-8') as stream:
    config2 = yaml.safe_load(stream)
# Codecs
with codecs.open(path, "r", encoding='unicode_escape') as stream:
    config3 = yaml.safe_load(stream)
with codecs.open(path, "r", encoding='latin1') as stream:
    config4 = yaml.safe_load(stream)
with codecs.open(path, 'r', encoding='utf-8') as stream:
    config5 = yaml.safe_load(stream)
# Read the whole file into a string first
with open(path, "r", encoding='utf-8') as stream:
    text = stream.read()
config6 = yaml.safe_load(text)
# Full loader instead of safe_load
with open(path, "r", encoding='utf-8') as stream:
    config7 = yaml.load(stream, Loader=Loader)
None of these solutions seems to work, e.g. the "unicode-escape" option still reads in
\x1bxp({{a}}^2) instead of \exp({{a}}^2).
What can I do? (The dictionary entries are later given to a Latex-Parser but I can't escape all the \ signs by hand.).
\n, \e and \t are all special characters when double-quoted in YAML, and if you're going to treat them literally you're basically asking the YAML parser to blindly treat double-quoted text as plain text, which means that you're going to have to write your own non-conforming YAML parser.
Instead of writing a parser from the ground up, however, an easier approach would be to customize an existing YAML parser by monkey-patching the method that scans double-quoted texts and making it the same as the method that scans plain texts. In case of PyYAML, that can be done with a simple override:
yaml.scanner.Scanner.fetch_double = yaml.scanner.Scanner.fetch_plain
If you want to avoid affecting other parts of the code that may parse YAML normally, you can use unittest.mock.patch as a context manager to patch the fetch_double method temporarily just for the loader call:
import yaml
from unittest.mock import patch
with patch('yaml.scanner.Scanner.fetch_double', yaml.scanner.Scanner.fetch_plain):
    with open('config.yml') as stream:
        config = yaml.safe_load(stream)
With your sample input, config would become:
{
    'content': [
        {'"explanation"': '"\\text{Explanation 1} "',
         '"formula"': '"\\exp({{a}}^2) = {{d}}^2 - {{b}}^2"'},
        {'"explanation"': '"\\text{Explanation 2}"',
         '"formula"': '"{{b}}^2 = {{d}}^2 - \\exp({{a}}^2) "'}
    ]
}
Demo: https://replit.com/#blhsing/WaryDirectWorkplaces
Note that this approach comes with the obvious consequence that you lose all the capabilities of double quotes within the same call. If the configuration file has other double-quoted texts that need proper escaping, this will not parse them correctly. But if the configuration file has only the kind of input you posted in your question, it will help parse it in the way you prefer without having to modify the code that generates such an (improper) YAML file (since presumably you're asking this question because you don't have the authorization to modify the code that generates the YAML file).
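If patching the scanner feels too invasive, a different technique (not the one described above) is to pre-process the raw text and double every backslash before handing it to the parser, since \\ inside a YAML double-quoted string denotes one literal backslash. A minimal sketch, assuming the file contains no intentional escapes such as \" or \n that must keep their meaning; double_backslashes is a hypothetical helper name:

```python
def double_backslashes(yaml_text):
    """Double every backslash so a YAML double-quoted scalar reads it literally."""
    return yaml_text.replace("\\", "\\\\")


# Raw string, so the backslashes below are exactly what the file would contain.
raw = r'''content:
  - "explanation" : "\text{Explanation 1} "
    "formula" : "\exp({{a}}^2) = {{d}}^2 - {{b}}^2"
'''

prepared = double_backslashes(raw)
# prepared can now be passed to yaml.safe_load(prepared) instead of the raw text.
```

With this approach the quotes are still interpreted by the parser, so the keys and values should come out without the surrounding double quotes that the patched scanner leaves in place. The trade-off is the same in spirit: every backslash in the file is treated as literal.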

node uri regex not capturing capture groups

I know there are a billion regex questions on stackoverflow, but I can't understand why my uri matcher isn't working in node.
I have the following:
var uri = "file:tmp.db?mode=ro"
function parseuri2db(uri){
    var regex = new RegExp("(?:file:)(.*)(?:\\?.*)");
    let dbname = uri.match(regex)
    return dbname
}
I'm trying to identify only the database name, which I expect to be:
After an uncaptured file: group
Before an optional ? + parameters to end of string.
While I'm using:
var regex1 = new RegExp("(?:file:)(.*)(?:\\?.*)");
I thought the answer was actually more like:
var regex2 = new RegExp("(?:file:)(.*)(?:\\??.*)");
With a 0 or 1 ? quantifier on the \\? literal. But the latter fails.
Anyway, my result is:
console.log(parseuri2db(conf.db_in.filename))
[ 'file:tmp.db?mode=ro',
'tmp.db',
index: 0,
input: 'file:tmp.db?mode=ro' ]
Which seems to be capturing the whole string in the first argument, rather than just the single capture group I asked for.
My questions are:
What am I doing wrong that I'm getting multiple captures?
How can I rephrase this to capture my capture groups with names?
I expected something like the following to work for (2):
function parseuri2db(uri){
    // var regex = new RegExp("(?:file:)(.*)(?:\\?.*)");
    // let dbname = uri.match(regex)
    var regex = new RegExp("(?<protocol>file:)(?<fname>.*)(<params>\\?.*)");
    let [, protocol, fname, params] = uri.match(regex)
    return dbname
}
console.log(parseuri2db(conf.db_in.filename))
But:
SyntaxError: Invalid regular expression: /(?<protocol>file:)(?<fname>.*)(<params>\?.*)/: Invalid group
Update 1
The answer to my first question is that I needed to keep the ? literal out of the second capture group:
"(?:file:)([^?]*)(?:\\??.*)"
As for the second question, that particular Node regex engine does not support named groups.
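For comparison, here is the named-group version of the same pattern sketched in Python, where the syntax is (?P<name>...); JavaScript engines that do support named groups write (?<name>...) instead. Note that the pattern posted in the question has (<params>...) without the leading ?, which is an ordinary group that matches the literal text <params>. The names URI_PATTERN and parse_uri are made up for this illustration:

```python
import re

# [^?]* stops at the first "?", and the whole params group is optional.
URI_PATTERN = re.compile(r"(?P<protocol>file:)(?P<fname>[^?]*)(?P<params>\?.*)?")


def parse_uri(uri):
    """Return the named parts of a file: URI, or None if it does not match."""
    match = URI_PATTERN.fullmatch(uri)
    return match.groupdict() if match else None


with_params = parse_uri("file:tmp.db?mode=ro")
without_params = parse_uri("file:tmp.db")
```

As in JavaScript, where index 0 of the array returned by match always holds the entire matched text and the capture groups follow from index 1, the full match is available separately from the groups here (match.group(0)), so seeing the whole string next to the captured name is expected behaviour rather than an extra capture.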

Creating if/else statements and scanners based on the keywords in a string.

I'm trying to have my code search for specific keywords, and based on those specific keywords create a scanner user-input prompt to replace such keywords.
For example, in the txt file:
Hi my name is < name>, What is your name? is your name < name>?
I like to eat < food>. Do you?
The program should detect the "< name>" and prompt the user to enter in a name twice for different keywords.
So far I have this:
// Java program to illustrate reading from Text File
// Using scanner class
import java.io.File;
import java.util.Scanner;
public class TxtOutput{
    public static void main(String[] args) throws Exception
    {
        // pass the path to the file as a parameter
        File file = new File("C:\\Users\\aaron\\Documents\\TestTXT\\test.txt");
        Scanner sc = new Scanner(file);
        //Types of keywords
        //<adjective>, <plural-noun>, <place>, <noun>, <funny-noise>, <person's-name>, <job>, <CITY>, , <Color!>
        //, <Exciting-adjective>, <Interersting-Adjective>, <aDvErB>, <NUMBER>, <Plural-noun>, <body-part>, <verb>,
        //<Number>, <verB>, <job-title>,
        String data1 = sc.nextLine();
        if (data1.contains("<job>"));
        Scanner user_input = new Scanner (System.in);
        String job1;
        System.out.println("Enter a profession");
        job1 = user_input.next();
        String replacedData1 = data1.replace("<job>", job1 );
        System.out.println(replacedData1);
    }
}
The program can only detect one keyword and it has a pre-made if and else statement. Is there a way to make an if and else statement with a scanner based on the keywords such as "< name>" or "< food>" in a line?
I don't want to bombard this program with an unnecessary amount of pre-made if and else statements. I was wondering if there's a more efficient way to do this.
You could search the whole data file for <..> keyword templates using a regex, add the keywords found to a Set so that each one appears only once, and then loop over the keywords to ask for replacements. I think that is what you are after.
I suggest specifying the keyword templates explicitly, using alternations | in the regex, like this:
<adjective>|<plural-noun>|<place>|<noun>|<funny-noise>|<person's-name>|<job>|<CITY>|<Color!>|<Exciting-adjective>|<Interersting-Adjective>|<aDvErB>|<NUMBER>|<Plural-noun>|<body-part>|<verb>|<Number>|<verB>|<job-title>
We could use a generic regex like <[^<>]+> but I don't know what else is in your file. Give it a try.
Putting everything together, complete sample:
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Ideone {
    public static void main(String[] args) throws Exception {
        // LinkedHashSet keeps each keyword once, in order of first appearance
        Set<String> uniqueKeywords = new LinkedHashSet<String>();
        final String regex = "<adjective>|<plural-noun>|<place>|<noun>|<funny-noise>|<person's-name>|<job>|<CITY>|<Color!>|<Exciting-adjective>|<Interersting-Adjective>|<aDvErB>|<NUMBER>|<Plural-noun>|<body-part>|<verb>|<Number>|<verB>|<job-title>";
        final String filecontent = "Text template containing all sorts of .. <adjective>, <plural-noun>, <place>, <noun>, <funny-noise>, <person's-name>, <job>, <CITY>, , <Color!> <Exciting-adjective>, <Interersting-Adjective>, <aDvErB>, <NUMBER>, <Plural-noun>, <body-part>, <verb>, <Number>, <verB>, <job-title>, String data1 = sc.nextLine(); blah blah";
        final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
        final Matcher matcher = pattern.matcher(filecontent);
        // Collect every keyword that occurs in the text
        while (matcher.find()) {
            uniqueKeywords.add(matcher.group(0));
        }
        Scanner user_input = new Scanner(System.in);
        // Accumulate the replacements in one string so earlier answers are not lost
        String replacedData = filecontent;
        for (String keyword : uniqueKeywords) {
            System.out.println("Enter a " + keyword);
            String replacement = user_input.next();
            replacedData = replacedData.replace(keyword, replacement);
        }
        System.out.println(replacedData);
    }
}
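The generic pattern <[^<>]+> mentioned above removes the need to list every keyword by hand. A minimal sketch of that variant, written in Python for brevity (the same pattern string should work unchanged in java.util.regex); find_placeholders, fill and the sample template are hypothetical, and the answers are passed in as a dictionary rather than read from the keyboard so the example stays self-contained:

```python
import re

# Anything between angle brackets that contains no further angle bracket.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def find_placeholders(text):
    """Return each distinct placeholder once, in order of first appearance."""
    return list(dict.fromkeys(PLACEHOLDER.findall(text)))


def fill(text, answers):
    """Replace every placeholder that has an answer and leave the rest untouched."""
    return PLACEHOLDER.sub(lambda m: answers.get(m.group(0), m.group(0)), text)


template = "Hi my name is <name>. I like to eat <food>. Is your name <name>?"
keys = find_placeholders(template)
filled = fill(template, {"<name>": "Ana", "<food>": "rice"})
```

In an interactive program, the dictionary would be built by prompting once for each entry in keys. The caveat from the answer still applies: the generic pattern also matches any other <...> text that happens to be in the file.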

how to replace a string/word in a text file in groovy

Hello, I am using Groovy 2.1.5 and I have to write code that shows the contents (files) of a directory at a given path, then makes a backup of a file and replaces a word/string in that file.
Here is the code I have used to try to replace a word in the selected file:
String contents = new File( '/geretd/resume.txt' ).getText( 'UTF-8' )
contents = contents.replaceAll( 'visa', 'viva' )
Here is also my complete code, in case anyone would like to modify it to be more efficient. I would appreciate it, since I am learning.
def dir = new File('/geretd')
dir.eachFile {
    if (it.isFile()) {
        println it.canonicalPath
    }
}
copy = { File src,File dest->
    def input = src.newDataInputStream()
    def output = dest.newDataOutputStream()
    output << input
    input.close()
    output.close()
}
//File srcFile = new File(args[0])
//File destFile = new File(args[1])
File srcFile = new File('/geretd/resume.txt')
File destFile = new File('/geretd/resumebak.txt')
copy(srcFile,destFile)
x = " "
println x
def dire = new File('/geretd')
dir.eachFile {
    if (it.isFile()) {
        println it.canonicalPath
    }
}
String contents = new File( '/geretd/resume.txt' ).getText( 'UTF-8' )
contents = contents.replaceAll( 'visa', 'viva' )
As with nearly everything Groovy, AntBuilder is the easiest route:
def ant = new AntBuilder()
ant.replace(file: "myFile", token: "NEEDLE", value: "replacement")
As an alternative to loading the whole file into memory, you could do each line in turn
new File( 'destination.txt' ).withWriter { w ->
    new File( 'source.txt' ).eachLine { line ->
        w << line.replaceAll( 'World', 'World!!!' ) + System.getProperty("line.separator")
    }
}
Of course this (and dmahapatro's answer) rely on the words you are replacing not spanning across lines
I use this code to replace port 8080 to ${port.http} directly in certain file:
def file = new File('deploy/tomcat/conf/server.xml')
def newConfig = file.text.replace('8080', '${port.http}')
file.text = newConfig
The first line opens the file, the second reads its text and performs the replacement, and the third writes the result back to the file.
Answers that use File objects are good and quick, but in a sandboxed environment (for example a Jenkins Pipeline) they usually cause the following error, which can of course be avoided, but only at the cost of loosening security:
Scripts not permitted to use new java.io.File java.lang.String.
Administrators can decide whether to approve or reject this signature.
This solution avoids all problems presented above:
String filenew = readFile('dir/myfile.yml').replaceAll('xxx','YYY')
writeFile file:'dir/myfile2.yml', text: filenew
Refer to this answer, where patterns are replaced. The same principle can be used to replace strings.
Sample
def copyAndReplaceText(source, dest, Closure replaceText){
    dest.write(replaceText(source.text))
}
def source = new File('source.txt') //Hello World
def dest = new File('dest.txt') //blank
copyAndReplaceText(source, dest) {
    it.replaceAll('World', 'World!!!!!')
}
assert 'Hello World' == source.text
assert 'Hello World!!!!!' == dest.text
Another simple solution would be the following closure:
def replace = { File source, String toSearch, String replacement ->
    source.write(source.text.replaceAll(toSearch, replacement))
}

why does 'a'..'z' in ANTLR match wildcards like $ or £

When I run the following grammar:
test : WORD+;
WORD : ('a'..'z')+;
WS : ' '+ {$channel = HIDDEN;};
and I give the input "?test" why does antlr accept this as valid input? I thought the ('a'..'z') would only match characters within the lowercase alphabet?
ANTLR does produce an error when parsing the input string ?test with the grammar you posted. As is usually the case, the error lies with the tool being used around ANTLR (I see it happen a lot with ANTLRWorks as well, unfortunately!).
To test it yourself (properly), create a file Test.g:
grammar Test;
test : WORD+;
WORD : ('a'..'z')+;
WS : ' '+ {$channel = HIDDEN;};
and a file Main.java:
import org.antlr.runtime.*;
public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("?test");
        TestLexer lexer = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        parser.test();
    }
}
and download a copy of the ANTLR 3.2 JAR in the same directory.
Now generate a lexer & parser:
java -cp antlr-3.2.jar org.antlr.Tool Test.g
compile all Java source files:
javac -cp antlr-3.2.jar *.java
and run the Main class:
java -cp .:antlr-3.2.jar Main
(replace the : with ; if you're on Windows!)
which will produce the following error message:
line 1:0 no viable alternative at character '?'
