Simple grammar looping infinitely - antlr4

I would expect this simple grammar to match strings such as 'abc':
grammar Hello;
entry
: LETTER+
;
LETTER : [a-z] ;
But it seems to enter an infinite loop:
C:\Code\antlr\hello>antlr4 Hello.g4 -encoding utf8
C:\Code\antlr\hello>javac Hello*.java
C:\Code\antlr\hello>grun Hello entry -tree
asdf^Z
Terminate batch job (Y/N)? y
Why?

Related

Lexer and Parser rules for a simple command processor

I am attempting to build a simple command processor for a legacy language.
I am attempting to work with C# and antlr4 (version "ANTLR", "4.6.6").
I am unable to make progress on one scenario out of several.
The following examples show various sample invocations of the command PKS.
PKS
PKS?
PKStext_that_is_a_filename
The scenario that I cannot solve is the PKS command followed by a filename.
Command:
PKS
(block (line (expr (command PKS)) (eol \r\n)) <EOF>)
Command:
PKS?
(block (line (expr (command PKS) (query ?)) (eol \r\n)) <EOF>)
Command:
PKSFILENAME
line 1:0 mismatched input 'PKSFILENAME' expecting COMMAND
(block PKSFILENAME \r\n)
Command:
What I believe to be the relevant snippet of the grammar:
block : line+ EOF;
line : (expr eol)+;
expr : command file
| command listOfDouble
| command query
| command
;
command : COMMAND
;
query : QUERY;
file : TEXT ;
eol : EOL;
listOfDouble: DOUBLE (COMMA DOUBLE)* ;
From the lexer:
COMMAND : PKS;
PKS :'PKS' ;
QUERY : '?'
;
fragment LETTER : [A-Z];
fragment DIGIT : [0-9];
fragment UNDER : [_];
TEXT : (LETTER) (LETTER|DIGIT|UNDER)* ;
The main problem here is that your TEXT rule also matches what PKS is supposed to match. Since PKStext_that_is_a_filename can be matched entirely by the TEXT rule, it is preferred over the PKS rule, even though PKS appears first in the grammar: the lexer always takes the longest match, and only when two rules match input of the same length does the one listed first win.
In order to fix that problem you have 2 options:
Require whitespace(s) between the keyword (PKS) and the rest of the expression.
Change the TEXT rule to explicitly exclude "PKS" as valid input.
Option 2 is certainly possible, but will get very messy if you have more keywords (as they would all have to be excluded). With whitespace between the keywords and the text, the lexer does that for you automatically.
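To see why option 1 works, here is a toy bash model of the lexer's rule selection: at each position every rule is tried, the longest match wins, and on equal length the rule listed first wins. It is not ANTLR's implementation; the tokenize function and the regexes are hypothetical, hand-written approximations of the grammar's rules, and the model assumes a whitespace-skipping rule such as WS : [ \t]+ -> skip ; which the reworked grammar below does not yet contain.

```shell
#!/usr/bin/env bash
# Toy model of an ANTLR lexer (not ANTLR itself): at each position every
# rule is tried, the longest match wins, and on equal length the rule
# listed first wins. Rules are "NAME=ERE" pairs in grammar order; the
# regexes are hand-written bash approximations of the grammar's rules.
# Whitespace is skipped, as a "WS : [ \t]+ -> skip ;" rule would do.
tokenize() {
  local input=$1 pos=0 rest rule pat len name best_len
  local ws='^[[:space:]]+'
  shift
  while (( pos < ${#input} )); do
    rest=${input:pos}
    if [[ $rest =~ $ws ]]; then
      (( pos += ${#BASH_REMATCH[0]} ))
      continue
    fi
    name="" best_len=0
    for rule in "$@"; do
      pat="^(${rule#*=})"
      if [[ $rest =~ $pat ]]; then
        len=${#BASH_REMATCH[0]}
        # Only a strictly longer match replaces the winner, so an
        # earlier rule keeps the token on a tie.
        if (( len > best_len )); then name=${rule%%=*} best_len=$len; fi
      fi
    done
    if (( best_len == 0 )); then
      echo "token recognition error at: '${rest:0:1}'" >&2
      return 1
    fi
    printf '%s:%s\n' "$name" "${rest:0:best_len}"
    (( pos += best_len ))
  done
}

# A subset of the rules, in grammar order.
rules=( 'COMMAND=PKS' 'PKS=PKS' 'QUERY=\?' 'TEXT=[a-zA-Z][a-zA-Z0-9_]*' )

tokenize 'PKS?' "${rules[@]}"                        # COMMAND:PKS then QUERY:?
tokenize 'PKStext_that_is_a_filename' "${rules[@]}"  # a single TEXT token
tokenize 'PKS text_that_is_a_filename' "${rules[@]}" # COMMAND:PKS then TEXT:...
```

Without the space, TEXT matches 26 characters against COMMAND's 3, so TEXT wins; with the space, both match only "PKS" at position 0 and COMMAND wins the tie because it is listed first.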
And let me give you a hint for approaching this kind of problem: always check the token list produced by the lexer to see whether it generated the tokens you expected. I reworked your grammar a bit, added the missing tokens and ran it through my ANTLR4 debugger, which gave me:
Parser error (5, 1): extraneous input 'PKStext_that_is_a_filename' expecting {<EOF>, COMMAND, EOL}
Tokens:
[@0,0:2='PKS',<1>,1:0]
[@1,3:3='\n',<8>,1:3]
[@2,4:4='\n',<8>,2:0]
[@3,5:7='PKS',<1>,3:0]
[@4,8:8='?',<3>,3:3]
[@5,9:9='\n',<8>,3:4]
[@6,10:10='\n',<8>,4:0]
[@7,11:36='PKStext_that_is_a_filename',<7>,5:0]
[@8,37:37='\n',<8>,5:26]
[@9,38:37='<EOF>',<-1>,6:0]
For this input:
PKS
PKS?
PKStext_that_is_a_filename
Here's the grammar I used:
grammar Example;
start: block;
block: line+ EOF;
line: expr? eol;
expr: command (file | listOfDouble | query)?;
command: COMMAND;
query: QUERY;
file: TEXT;
eol: EOL;
listOfDouble: DOUBLE (COMMA DOUBLE)*;
COMMAND: PKS;
PKS: 'PKS';
QUERY: '?';
fragment LETTER: [a-zA-Z];
fragment DIGIT: [0-9];
fragment UNDER: [_];
COMMA: ',';
DOUBLE: DIGIT+ (DOT DIGIT*)?;
DOT: '.';
TEXT: LETTER (LETTER | DIGIT | UNDER)*;
EOL: [\n\r];
and the generated visual parse tree: [image not available]

antlr4 can't extract literal into token

I have the following grammar and am trying to start out slowly, working up to more complex arguments.
grammar Command;
commands : command+ EOF;
command : NAME args NL;
args : arg | ;
arg : DASH LOWER | LOWER;
//arg : DASH 'a' | 'x';
NAME : [_a-zA-Z0-9]+;
NL : '\n';
WS : [ \t\r]+ -> skip ; // spaces, tabs, carriage returns (newlines are NL)
DASH : '-';
LOWER: [a-z];//'a' .. 'z';
I was hoping (for now) to parse files like this:
cmd1
cmd3 -a
If I run that input through grun I get an error:
$ java org.antlr.v4.gui.TestRig Command commands -tree
...
line 3:6 mismatched input 'a' expecting LOWER
It seems like LOWER should match 'a'. If I change the arg definition to be the commented out line it works fine and I get the '-a' as an arg. What's the difference between using LOWER and using a 'a' explicitly?
As soon as you get a "mismatched" error, add -tokens to grun to display the tokens. It helps you find the discrepancy between what you THINK the lexer will do and what it actually DOES. With your grammar:
$ alias grun='java org.antlr.v4.gui.TestRig'
$ grun Command commands -tokens -diagnostics t.text
[@0,0:3='cmd1',<NAME>,1:0]
[@1,4:4='\n',<'
'>,1:4]
[@2,5:8='cmd3',<NAME>,2:0]
[@3,10:10='-',<'-'>,2:5]
[@4,11:11='a',<NAME>,2:6]
[@5,12:12='\n',<'
'>,2:7]
[@6,13:12='<EOF>',<EOF>,3:0]
line 2:6 mismatched input 'a' expecting LOWER
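You immediately see that the letter a is a NAME and not the expected LOWER: both rules match the single character a, and when two rules match input of the same length, the one defined first (NAME) wins.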
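The tie-breaking can be reproduced with a small bash model (the pick_rule helper is hypothetical and not part of ANTLR; the regexes approximate the lexer rules): it tries each rule at the start of a word, keeps the longest match, and lets the earlier rule keep a tie. The last call models the question's commented-out variant: in a combined grammar, ANTLR turns literals used in parser rules, such as 'a', into implicit tokens (named like T__0) that are defined before the explicit lexer rules, which is why that version works.

```shell
#!/usr/bin/env bash
# Toy model of ANTLR's lexer tie-breaking for one word: every rule is
# tried at the start of the word, the longest match wins, and on equal
# length the rule listed first wins. Rules are "NAME=ERE" pairs given
# in grammar order (bash regexes approximating the ANTLR rules).
pick_rule() {
  local word=$1 rule pat len best="" best_len=0
  shift
  for rule in "$@"; do
    pat="^(${rule#*=})"
    if [[ $word =~ $pat ]]; then
      len=${#BASH_REMATCH[0]}
      # Only a strictly longer match replaces the current winner.
      if (( len > best_len )); then best=${rule%%=*} best_len=$len; fi
    fi
  done
  printf '%s\n' "${best:-no match}"
}

# Question's order: NAME before LOWER.
pick_rule a 'NAME=[_a-zA-Z0-9]+' 'DASH=-' 'LOWER=[a-z]'    # NAME
# Answer's order: LOWER before NAME.
pick_rule a 'LOWER=[a-z]' 'NAME=[_a-zA-Z0-9]+' 'DASH=-'    # LOWER
# A longer word is still a NAME, whatever the order (longest match).
pick_rule abc 'LOWER=[a-z]' 'NAME=[_a-zA-Z0-9]+' 'DASH=-'  # NAME
# Implicit literal token from the parser rule, defined first.
pick_rule a 'T__0=a' 'NAME=[_a-zA-Z0-9]+' 'LOWER=[a-z]'    # T__0
```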
Also watch out for rules with an empty alternative:
args
: arg
|
;
Such rules may lead to problems in some circumstances. I prefer to explicitly add the ? suffix, which means zero or one time. So my solution would be:
grammar Command;
commands
@init {System.out.println("Question last update 1829");}
: command+ EOF
;
command
: NAME args? NL
;
args
: arg
;
arg : DASH? LOWER ;
LOWER : [a-z] ;
NAME : [_a-zA-Z0-9]+;
DASH : '-' ;
NL : '\n' ;
WS : [ \t\r]+ -> skip ;
Execution :
$ grun Command commands -tokens -diagnostics t.text
[@0,0:3='cmd1',<NAME>,1:0]
[@1,4:4='\n',<'
'>,1:4]
[@2,5:8='cmd3',<NAME>,2:0]
[@3,10:10='-',<'-'>,2:5]
[@4,11:11='a',<LOWER>,2:6]
[@5,12:12='\n',<'
'>,2:7]
[@6,13:12='<EOF>',<EOF>,3:0]
Question last update 1829

How to parse keywords as normal words some of the time in ANTLR4

I have a language with keywords like hello that are only keywords in certain types of sentences. In other types of sentences, these words should be matched as an ID, for example. Here's a super simple grammar that tells the story:
grammar Hello;
file : ( sentence )* ;
sentence : 'hello' ID PERIOD
| INT ID PERIOD;
ID : [a-z]+ ;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
PERIOD : '.' ;
I'd like these sentences to be valid:
hello fred.
31 cheeseburgers.
6 hello.
but that last sentence doesn't work in this grammar. The word hello is a token of type hello and not of type ID. It seems like the lexer grabs all the hellos and turns them into tokens of that type.
Here's a crazy way to do it, to explain what I want:
sentence : 'hello' ID PERIOD
| INT crazyID PERIOD;
crazyID : ID | 'hello' ;
but in my real language, there are a lot of keywords like hello to deal with, so, yeah, that way seems crazy.
Is there a reasonable, compact, target-language-independent way to handle this?
A standard way of handling keywords:
file : ( sentence )* EOF ;
sentence : key=( KEYWORD | INT ) id=( KEYWORD | ID ) PERIOD ;
KEYWORD : 'hello' | 'goodbye' ; // list others as alts
PERIOD : '.' ;
ID : [a-z]+ ;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
The seeming ambiguity between the KEYWORD and ID rules is resolved based on the KEYWORD rule being listed before the ID rule.
In the generated parser, SentenceContext gets Token fields named key and id which, after parsing, hold the matched tokens, so you can easily tell which position each token was matched in.
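The effect of listing KEYWORD before ID, and of accepting KEYWORD in the id position, can be sketched with a toy bash model. The token_type and check_sentence helpers are hypothetical and only approximate what the generated lexer and parser do; the model assumes input of the form "word word." with the period attached to the second word.

```shell
#!/usr/bin/env bash
# Lexer side (toy model, not ANTLR): a whole word is tried against the
# rules in grammar order and the first match wins, mirroring KEYWORD
# being listed before ID. Whole words only, so "hellox" is an ID, which
# agrees with ANTLR's longest-match rule.
token_type() {
  if   [[ $1 =~ ^(hello|goodbye)$ ]]; then echo KEYWORD
  elif [[ $1 =~ ^[a-z]+$ ]];          then echo ID
  elif [[ $1 =~ ^[0-9]+$ ]];          then echo INT
  else echo ERROR
  fi
}

# Parser side: sentence : key=(KEYWORD | INT) id=(KEYWORD | ID) PERIOD ;
check_sentence() {
  local first second t1 t2
  [[ $1 == *. ]] || { echo "syntax error: missing PERIOD"; return 1; }
  read -r first second <<< "${1%.}"
  t1=$(token_type "$first")
  t2=$(token_type "$second")
  if [[ $t1 =~ ^(KEYWORD|INT)$ ]] && [[ $t2 =~ ^(KEYWORD|ID)$ ]]; then
    echo "ok: $t1 $t2 PERIOD"
  else
    echo "syntax error: $t1 $t2 PERIOD"
    return 1
  fi
}

check_sentence 'hello fred.'         # ok: KEYWORD ID PERIOD
check_sentence '31 cheeseburgers.'   # ok: INT ID PERIOD
check_sentence '6 hello.'            # ok: INT KEYWORD PERIOD
```

The word hello is still lexed as KEYWORD in "6 hello."; the sentence is accepted because the parser rule allows KEYWORD where an ID is expected.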

Bash script string processing

I wrote a script that reads a plain text and a key, then loops through each character of the plain text and shifts it by the value of the corresponding character in the key text, with a=0, b=1, c=2, ..., z=25.
The code works fine, but with a string of 1K characters it takes almost 3 s to execute.
This is the code:
small="abcdefghijklmnopqrstuvwxyz" ## used to search and return the position of some small letter in a string
capital="ABCDEFGHIJKLMNOPQRSTUVWXYZ" ## used to search and return the position of some capital letter in a string
read Plain_text
read Key_text
## saving the length of each string
length_1=${#Plain_text}
length_2=${#Key_text}
printf " Your Plain text is: %s\n The Key is: %s\n The resulting Cipher text is: " "$Plain_text" "$Key_text"
for(( i=0,j=0;i<$length_1;++i,j=`expr $(($j + 1)) % $length_2` )) ## variable 'i' is the index for the first string, 'j' is the index of the second string
do
## return a substring starting at position 'i' with length 1
c=${Plain_text:$i:1}
d=${Key_text:$j:1}
## function index takes two parameters, the string to search in and a substring,
## and returns the index of the first occurrence of the substring (1-based)
x=`expr index "$small" $c`
y=`expr index "$small" $d`
## shift the current letter to the right by the value of the corresponding letter in the key, mod 26
z=`expr $(($x + $y - 2)) % 26`
##print the resulting letter from capital letter string
printf "%s" "${capital:$z:1}"
done
echo ""
How can I improve the performance of this code?
Thank you.
You are creating 4 new processes in each iteration of your for loop by using command substitution (3 substitutions in the body, 1 in the head). You should use arithmetic expansion instead of calling expr (search for $(( in the bash(1) manpage). Note that you don't need the $ to substitute variables inside $(( and )).
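A sketch of the loop with no process created per character, assuming (as the original does) that the plain text and a non-empty key contain only lowercase a-z. Besides $(( )) arithmetic, it uses the printf builtin with -v and a leading quote ("'c") to get a character's code; that replacement for expr index is an addition, not part of the advice above. The encrypt function name is hypothetical.

```shell
#!/usr/bin/env bash
# Same shift as the question (a=0 ... z=25, result printed in capitals),
# but with no external process per character:
#   - $(( )) arithmetic replaces every expr call;
#   - printf -v with "'c" yields a character's code, replacing expr index.
# Assumes lowercase a-z plain text and a non-empty lowercase key.
encrypt() {
  local plain=$1 key=$2 capital="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  local len_p=${#plain} len_k=${#key} i j x y z out=""
  for (( i = 0, j = 0; i < len_p; i++, j = (j + 1) % len_k )); do
    printf -v x '%d' "'${plain:i:1}"   # character code of the plain letter
    printf -v y '%d' "'${key:j:1}"     # character code of the key letter
    z=$(( (x + y - 2 * 97) % 26 ))    # 97 is the code of 'a'
    out+=${capital:z:1}
  done
  printf '%s\n' "$out"
}

encrypt attackatdawn lemon   # prints LXFOPVEFRNHR
```

With the question's read statements, the call would be encrypt "$Plain_text" "$Key_text".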
You can change a character like this:
a=( soheil )
echo ${a/${a:0:1}/${a:1:1}}
To change all characters, use a loop such as for.
And to change characters to upper case:
echo soheil | tr "[:lower:]" "[:upper:]"
I hope I understood your question.
Be at peace.
You will have a lot of repeating characters in a 1K string.
Imagine the input was 1M.
You should precompute all input/output pairs up front, so your routine only has to look up the replacement.
I think a solution with arrays is the best approach here.
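One way to sketch the array idea in bash, assuming bash 4+ (for declare -A) and lowercase a-z input with a non-empty key: build all 26*26 (plain letter, key letter) pairs once, then the per-character work is a single associative-array lookup. The table and encrypt names are hypothetical.

```shell
#!/usr/bin/env bash
# Precompute, then look up: map every (plain letter, key letter) pair to
# its cipher letter once, so the main loop only indexes an array.
# Needs bash 4+ for declare -A. Assumes lowercase a-z input, non-empty key.
small="abcdefghijklmnopqrstuvwxyz"
capital="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
declare -A table
for (( p = 0; p < 26; p++ )); do
  for (( k = 0; k < 26; k++ )); do
    idx=$(( (p + k) % 26 ))
    table["${small:p:1}${small:k:1}"]=${capital:idx:1}
  done
done

encrypt() {
  local plain=$1 key=$2 out="" pair i
  local len_p=${#plain} len_k=${#key}
  for (( i = 0; i < len_p; i++ )); do
    pair=${plain:i:1}${key:i % len_k:1}   # e.g. "al" for plain a, key l
    out+=${table[$pair]}
  done
  printf '%s\n' "$out"
}

encrypt attackatdawn lemon   # prints LXFOPVEFRNHR
```

The table costs 676 assignments up front, independent of input length, which is the trade-off the answer describes for large inputs.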

Replace a substring by another one in bash

I have the following bash script
pass="kall"
cnumb="000000000000"
for (( i=0; i<${#pass}; i++))
do
code=`printf '%03d' "'${pass:i:i+1}"` #generate the ASCII code of the letter as a 3-character string
cnumb = .... #put the ASCII code of "k" in the first block of 3 chars, the ASCII code of "a" in the second block of 3 chars, ...
done
As described in the code, in each iteration of the loop I want to replace a block of 3 characters in cnumb with another block of 3 characters. How can I do this in bash?
Is it possible to replace the sub string ${cnumb:i:i+3} by the code?
No need to pre-fill cnumb with zeroes. Also, use the %03d template for printf:
#! /bin/bash
pass="kall"
cnumb=''
for (( i=0; i<${#pass}; i++))
do
code=$(printf '%03d' "'${pass:i:1}") # ASCII code of the i-th letter as a 3-character string
cnumb+=$code
done
echo "$cnumb"
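If you do want to overwrite a 3-character block of cnumb in place, as the question asks, you can rebuild the string from the part before the block, the new code, and the part after it. Block i starts at offset i*3 and has length 3; ${cnumb:i:i+3} would instead take i+3 characters starting at offset i. This is a minimal sketch using the question's example values.

```shell
#!/usr/bin/env bash
# Overwrite the i-th 3-character block of cnumb in place:
# prefix (before the block) + new code + suffix (after the block).
pass="kall"
cnumb="000000000000"
for (( i = 0; i < ${#pass}; i++ )); do
  printf -v code '%03d' "'${pass:i:1}"         # e.g. k -> 107, a -> 097
  cnumb=${cnumb:0:i*3}${code}${cnumb:i*3+3}
done
echo "$cnumb"   # prints 107097108108
```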
