Why is ANTLR not respecting the grammar order? - antlr4

In my grammar I have
zLine
: Z fahrtnummer verwaltung takanzahl? taktZeitInMinuten? COMMENT ANYTHING NL
;
fahrtnummer
: INT
;
verwaltung
: ASCII
;
For an input line like this
*Z 00006 003849 % 00006 003849 00
003849 is recognised as INT instead of ASCII. If I change the order of INT and ASCII in the lexer, everything is recognised as ASCII, which is also wrong. How can I make ANTLR respect the order given by the parser?
Full grammar
grammar FPLAN3;
fplan
: zLine*;
zLine
: Z fahrtnummer verwaltung takanzahl? taktZeitInMinuten? COMMENT ANYTHING NL
;
fahrtnummer
: INT
;
verwaltung
: ASCII
;
takanzahl
: INT
;
taktZeitInMinuten
: INT
;
Z: '*Z';
INT: [0-9]+;
ASCII: [0-9a-zA-Z]+;
//ASCII: [\P{Cc}\P{Cn}\P{Cs}]+;
COMMENT: '%';
ANYTHING: .*?;
NL: '\r'? '\n' | '\r';
IGNORE_SPACE
: [ ] -> skip
;
Input File
*Z 00006 003849 % 00006 003849 00
*Z 00007 003849 % 00007 003849 00
*Z 00008 003849 % 00008 003849 00
Test rig output
[#0,0:1='*Z',<'*Z'>,1:0]
[#1,3:7='00006',<INT>,1:3]
[#2,9:14='003849',<INT>,1:9]
[#3,58:58='%',<'%'>,1:58]
[#4,60:64='00006',<INT>,1:60]
[#5,66:71='003849',<INT>,1:66]
[#6,76:77='00',<INT>,1:76]
[#7,78:79='\r\n',<NL>,1:78]
[#8,80:81='*Z',<'*Z'>,2:0]
[#9,83:87='00007',<INT>,2:3]
[#10,89:94='003849',<INT>,2:9]
[#11,138:138='%',<'%'>,2:58]
[#12,140:144='00007',<INT>,2:60]
[#13,146:151='003849',<INT>,2:66]
[#14,156:157='00',<INT>,2:76]
[#15,158:159='\r\n',<NL>,2:78]
[#16,160:161='*Z',<'*Z'>,3:0]
[#17,163:167='00008',<INT>,3:3]
[#18,169:174='003849',<INT>,3:9]
[#19,218:218='%',<'%'>,3:58]
[#20,220:224='00008',<INT>,3:60]
[#21,226:231='003849',<INT>,3:66]
[#22,236:237='00',<INT>,3:76]
[#23,238:237='<EOF>',<EOF>,3:78]
line 1:9 missing ASCII at '003849'
line 1:60 mismatched input '00006' expecting ANYTHING
line 2:9 missing ASCII at '003849'
line 2:60 mismatched input '00007' expecting ANYTHING
line 3:9 missing ASCII at '003849'
line 3:60 mismatched input '00008' expecting ANYTHING
(fplan (zLine *Z (fahrtnummer 00006) (verwaltung <missing ASCII>) (takanzahl 003849) % 00006 003849 00 \r\n) (zLine *Z (fahrtnummer 00007) (verwaltung <missing ASCII>) (takanzahl 003849) % 00007 003849 00 \r\n) (zLine *Z (fahrtnummer 00008) (verwaltung <missing ASCII>) (takanzahl 003849) % 00008 003849 00))

Parser rules have no impact on lexer rule evaluation.
During the lexing phase, all rules are evaluated against the input stream of characters. If multiple rules match, the following come into play:
1 - If a rule matches a longer sequence of input characters, the lexer will produce a token for that rule.
2 - If multiple rules match sequences of the same length, then the first lexer rule will be used to generate the token.
Since INT and ASCII can both match a sequence of digits, the lexer will produce a token for whichever rule appears first in the grammar.
Note, while the parser is recursive descent, it runs against the token stream, so all tokens are determined before the parser has any involvement. It won't matter which parser rule path you follow; the token type has already been determined. In short, the lexer can't "respect the order given by the parser". The parser acts on the output of the lexer.
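One common way to resolve this (a minimal sketch of mine, not part of the original answer): keep INT before ASCII so that digit-only runs always become INT tokens, and let every parser rule that should accept an alphanumeric field match either token type, since a run of digits is also valid ASCII content:
verwaltung
: INT // a digits-only field arrives as an INT token, never as ASCII
| ASCII
;
With that change, 003849 can be consumed by verwaltung even though the lexer has already classified it as INT.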

Related

Is there a way to convert a string (ASCII) to Hex in CMake

Is there a way to convert a string (ASCII "a-z, A-Z, 0-9") to Hex in CMake?
For example (ASCII to hex):
"HELLO" --> 0x48 0x45 0x4C 0x4C 0x4F
It should be the opposite operation of the following command:
string(ASCII <number> [<number> ...] <output variable>)
I tried some CMake math operations, but it didn't seem to work on strings.
I can implement a function with a big "if" that compares the char input of "a-z, A-Z, 0-9" and returns its hex according to the ASCII table, but I am looking for a smarter/shorter solution.
EDIT: As of CMake 3.18, the inverse operation of string(ASCII ...) now exists. Use string(HEX ...):
set(TEST_STRING "HELLO")
# Convert the string to hex.
string(HEX ${TEST_STRING} HEX_STRING)
message(${HEX_STRING})
This prints the following:
48454c4c4f
so you have to add the 0x prefixes manually (the formatting steps are described below in the answer applicable to earlier CMake versions).
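For completeness, here is a minimal sketch (my addition, assuming CMake 3.18 or later) that applies the same splitting and joining steps shown below to the string(HEX ...) output:
set(TEST_STRING "HELLO")
string(HEX ${TEST_STRING} HEX_STRING)
# Split the lowercase hex dump into two-character bytes.
string(REGEX MATCHALL "([0-9a-f][0-9a-f])" SEPARATED_HEX "${HEX_STRING}")
# Re-join the bytes, prefixing each with "0x".
list(JOIN SEPARATED_HEX " 0x" FORMATTED_HEX)
string(PREPEND FORMATTED_HEX "0x")
message("${FORMATTED_HEX}")
This prints 0x48 0x45 0x4c 0x4c 0x4f.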
For CMake 3.17 and earlier, I am not aware of any support for ASCII to hex conversions that is native to CMake (i.e. the inverse operation for string(ASCII ... ) doesn't exist). One work-around is to leverage CMake's file() commands to write the ASCII to a file, then read it as hex. With some additional formatting using string(REGEX MATCHALL ...) and list(JOIN ...), we can get a string of hex values representing the ASCII inputs:
set(TEST_STRING "HELLO")
# Write the ASCII file, then read it as hex.
file(WRITE ${CMAKE_BINARY_DIR}/asciiToHexFile.txt "${TEST_STRING}")
file(READ ${CMAKE_BINARY_DIR}/asciiToHexFile.txt HEX_CONTENTS HEX)
message("HEX_CONTENTS: ${HEX_CONTENTS}")
# Separate into individual bytes.
string(REGEX MATCHALL "([A-Za-z0-9][A-Za-z0-9])" SEPARATED_HEX "${HEX_CONTENTS}")
message("SEPARATED_HEX: ${SEPARATED_HEX}")
# Append the "0x" to each byte.
list(JOIN SEPARATED_HEX " 0x" FORMATTED_HEX)
# JOIN misses the first byte's "0x", so add it here.
string(PREPEND FORMATTED_HEX "0x")
message("FORMATTED_HEX: ${FORMATTED_HEX}")
With the input HELLO, the output prints the following:
HEX_CONTENTS: 48454c4c4f
SEPARATED_HEX: 48;45;4c;4c;4f
FORMATTED_HEX: 0x48 0x45 0x4c 0x4c 0x4f
I ended up creating this function, based on @squareskittles' answer:
function(STRING_HEX_KEY_TO_C_BYTE_ARRAY STRING_HEX VARIABLE_NAME)
# Separate into individual bytes.
string(REGEX MATCHALL "([A-Fa-f0-9][A-Fa-f0-9])" SEPARATED_HEX ${STRING_HEX})
# Append the "0x" to each byte.
list(JOIN SEPARATED_HEX ", 0x" FORMATTED_HEX)
# Prepend "{ 0x"
string(PREPEND FORMATTED_HEX "{ 0x")
# Append " }"
string(APPEND FORMATTED_HEX " }")
set(${VARIABLE_NAME} ${FORMATTED_HEX} PARENT_SCOPE)
message(${VARIABLE_NAME}=${FORMATTED_HEX})
endfunction()
Use this function to convert a hex string to a C byte array, like this:
STRING_HEX_KEY_TO_C_BYTE_ARRAY("FFFF020200000030" "DEVICE_EUI")
Output message:
DEVICE_EUI={ 0xFF, 0xFF, 0x02, 0x02, 0x00, 0x00, 0x00, 0x30 }

Why is my data converted to ASCII by the Serial.print function in Arduino?

I am writing a small program to send data with an RN2483 transceiver, and I have realised that my data is converted to ASCII when I send it through serial. That is to say, I have the following part in the sender; the data has to be hex:
String aux = String(message.charAt(i),HEX);
dataToBeTx = "radio tx " + aux+ "\r\n";
Serial1.print(dataToBeTx);
On the receiver I am reading Serial1 until I get the message, which I receive properly; however, it is an ASCII representation of the hex data, and I would like to have it in hex. I mean, I send HI, which is converted to hex (H I => 0x48 0x49); on the receiver, if I translate that value to hex again, I get different things than my H or I, so I guess it is being encoded in ASCII. How can I get rid of that?
Thanks in advance,
regards
It is very unclear what you are trying to achieve. The first line in your code converts a single character into a string in hexadecimal. For example:
void setup ()
{
  Serial.begin (115200);
  Serial.println ();
  String aux = String('A', HEX);
  Serial.print ("aux = ");
  Serial.println (aux);
} // end of setup
void loop ()
{
} // end of loop
Output:
aux = 41
So the 'A' in my code (internally represented as 0x41) has now become two ASCII characters: 4 and 1. That is, a string which is two bytes long.
So, in a sense, you can say it is already in hex.
if I translate that value to hex again, I get different things than my H or I
Well, yes, if you translate it "again" then you would get 0x34 and 0x31.
Do you want to send A in this case, 41, or something else?
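To make the distinction concrete, here is a small sketch of my own (not part of the original answer): Serial.write sends the raw byte value, while printing the String(..., HEX) conversion sends an ASCII rendering of that byte:
void setup ()
{
  Serial.begin (115200);
  byte c = 'H';                    // internally 0x48
  Serial.write (c);                // one raw byte on the wire: 0x48
  Serial.println ();
  Serial.println (String(c, HEX)); // two ASCII bytes on the wire: '4' then '8'
}
void loop ()
{
}
If the receiver needs the original byte values, send them with Serial.write and skip the hex conversion entirely.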

Arduino convert ascii characters to string

I'm using this sensor with an Arduino board.
On page 2, it describes the serial output from pin 5.
http://www.maxbotix.com/documents/HRXL-MaxSonar-WR_Datasheet.pdf
The output is an ASCII capital "R", followed by four ASCII character
digits representing the range in millimeters,followed by a carriage
return (ASCII 13). The serial data format is 9600 baud, 8 data bits, no parity,
with one stop bit (9600-8-N-1).
This is my Arduino code (which isn't correct). It only outputs 82, which is the capital R.
void setup()
{
  Serial.begin(9600);
}
void loop()
{
  int data = Serial.read();
  Serial.println(data);
  delay(1000);
}
How do I get a distance reading to a string?
Many thanks
Did you try the readBytesUntil method?
You should use it like this:
byte DataToRead [6];
Serial.readBytesUntil(char(13), DataToRead, 6);
Your data is contained in DataToRead (your 'R' in DataToRead[0], etc.).
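Building on that, here is a minimal sketch of my own (assuming the frame format quoted from the datasheet: 'R', four digits, carriage return) that reads one frame and converts the digits to an integer range in millimeters:
void setup()
{
  Serial.begin(9600);
}
void loop()
{
  if (Serial.available() > 0)
  {
    char frame[7] = {0};  // 'R' + 4 digits + room for a terminating NUL
    size_t n = Serial.readBytesUntil('\r', frame, 6);  // '\r' is ASCII 13
    if (n >= 5 && frame[0] == 'R')
    {
      int rangeMm = atoi(frame + 1);  // parse the four digits after the 'R'
      Serial.println(rangeMm);
    }
  }
}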
As I read it, the question was:
How do you convert a byte (ASCII) representation of a character into a readable alphanumeric character, like "a" versus 97?
The actual question: Arduino convert ascii characters to string.
Why do people post responses that don't answer the question?
Not an exact answer, but casting with (char) will get you on the way there.
char inByte = 0;
inByte = (char)Serial.read(); // ascii 97 received
Serial.println((char)inByte); // => prints an 'a' without the quotes

How to define a token which is all those characters in set A, except those in sub-set B?

In RFC2616 (HTTP/1.1) the definition of a 'token' in section '2.2 Basic Rules' is given as:
token = 1*<any CHAR except CTLs or separators>
From that section, I've got the following fragments, and now I want to define 'TOKEN':
lexer grammar AcceptEncoding;
TOKEN: /* (CHAR excluding (CTRL | SEPARATORS)) */
fragment CHAR: [\u0000-\u007f];
fragment CTRL: [\u0000-\u001f] | '\u007f';
fragment SEPARATORS: [()<>@,;:\\"/\[\]?={}] | SP | HT;
fragment SP: ' ';
fragment HT: '\t';
How do I approximate my hypothetical 'excluding' operator for the definition of TOKEN?
There is no set/range math in ANTLR. You can only combine several sets/ranges via the OR operator. A typical rule for a number of disjoint ranges looks like:
fragment LETTER_WHEN_UNQUOTED:
'0'..'9'
| 'A'..'Z'
| '$'
| '_'
| '\u0080'..'\uffff'
;
One approach is to 'do the math' on the set of characters, so that we can define lexer rules which only ever combine characters:
lexer grammar RFC2616;
TOKEN: (DIGIT | UPALPHA | LOALPHA | NON_SEPARATORS)+;
/*
* split up ASCII 0-127 into 'atoms' of
* relevance per '2.2 Basic Rules'. Regions
* not requiring to be referenced are not
* given a name.
*/
// [\u0000-\u0008]; /* (control chars) */
fragment HT: '\u0009'; /* (tab) */
fragment LF: '\u000a'; /* (LF) */
// [\u000b-\u000c]; /* (control chars) */
fragment CR: '\u000d'; /* (CR) */
// [\u000e-\u001f]; /* (control chars) */
fragment SP: '\u0020'; /* (space) */
// [\u0021-\u002f]; /* !"#$%&'()*+,-./ */
fragment DIGIT: [\u0030-\u0039]; /* 0123456789 */
// [\u003a-\u0040]; /* :;<=>?@ */
fragment UPALPHA: [\u0041-\u005a]; /* ABCDEFGHIJKLMNOPQRSTUVWXYZ */
// [\u005b-\u0060]; /* [\]^_` */
fragment LOALPHA: [\u0061-\u007a]; /* abcdefghijklmnopqrstuvwxyz */
// [\u007b-\u007e]; /* {|}~ */
// '\u007f'; /* (del) */
/*
* Considering 'all relevant gaps' and the characters we
* cannot use per RFC 2616 Section 2.2 Basic Rules definition
* of 'separators', what does that leave us with?
* (manually determined)
*/
fragment SEPARATORS: [()<>@,;:\\"/\[\]?={}];
fragment NON_SEPARATORS: [!#$%&'*+\-.^_`|~];
I don't find this approach especially satisfying. Another rule in RFC 2616 wants to be defined like:
TEXT: <any OCTET except CTLs, but including LWS>
qdtext = <any TEXT except <">>
This would force me to further refactor my expedient 'SEPARATORS' fragment, above, like:
fragment QUOT: '"';
fragment SEPARATORS_OTHER_THAN_QUOT: [()<>@,;:\\/\[\]?={}];
fragment SEPARATORS: SEPARATORS_OTHER_THAN_QUOT | QUOT;
fragment LWS: SP | HT;
TEXT: DIGIT | UPALPHA | LOALPHA | LWS | SEPARATORS | NON_SEPARATORS;
QDTEXT: DIGIT | UPALPHA | LOALPHA | LWS | SEPARATORS_OTHER_THAN_QUOT | NON_SEPARATORS;
Perhaps this is part of the work of writing a lexer, and can't be avoided, but it feels more like solving the problem the wrong way!
(NB: I won't be marking this answer as 'correct'.)
Spurred on by the answer from @mike-lischke (because LETTER_WHEN_UNQUOTED really felt wrong still), I hunted for the surely-common treatment of quoted string literals in other grammars. In Terence Parr's own Java 1.6 ANTLR3 grammar (er, not properly served as text/plain) (via the ANTLR3 Grammar List), he reaches for a 'match any character other than' tilde operator ~ in a lexer rule:
STRINGLITERAL
: '"'
( EscapeSequence
| ~( '\\' | '"' | '\r' | '\n' )
)*
'"'
;
// Copyright (c) 2007-2008 Terence Parr and possibly Yang Jiang.
NOTE: the above code is licenced under a BSD licence, but I am not re-distributing this fragment under the BSD license (since this post itself is under CC-BY-SA). Instead, I am using it within the terms of 'fair use' as I understand them.
So the ~ gives me a way to express 'all those characters in Unicode, except those in set B'. 'Annoying that I don't get to choose the set which is excluded from', I thought. But then I realised that
TOOHIGH: [\u007f-\uffff];
TOKEN: (~( TOOHIGH | SP | HT | CTRL | SEPARATORS ))+;
... should be fine. Although, in practice, ANTLR4 doesn't 'like' lexer sub-rules appearing in 'sets', and only handles sets of literals, so that ultimately becomes:
TOKEN:
/* this is given in '2.2 Basic Rules' as:
*
* token = 1*<any CHAR except CTLs or separators>
*
* which I am reducing down to:
* any character in ASCII 0-127 but _excluding_
* CTRL (0-31,127)
* SEPARATORS
* space (32)
* and tab (9) (which is a CTRL character anyhow)
*/
( ~( [\u0000-\u001f] | '\u007f' /*CTRL,HT*/ | [()<>@,;:\\"/\[\]?={}] /*SEPARATORS*/ | '\u0020' /*SP*/ | [\u0080-\uffff] /*NON_ASCII*/ ) )+
;
The trick was to express the set I do want (Unicode 0-127) in terms of excluding the set I don't want (Unicode 128 and above).
This is much more succinct than my other answer. If it actually works, I'll mark it as correct.
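For reference, a quick way to check whether the rule behaves as intended (a sketch assuming the usual antlr4 and grun shell aliases from the ANTLR documentation) is to dump the token stream for a sample file:
antlr4 RFC2616.g4
javac RFC2616*.java
grun RFC2616 tokens -tokens sample.txt
The -tokens option prints each token with its type, so it is easy to see which spans TOKEN matched.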

Conversion from String to Char in visual c++

I am taking input from the user in Visual C++ through the following code:
Console::WriteLine("Select the Cache Size.\n a. 1 Kb \n b. 2 Kb \n c. 4 Kb \n d. 8 Kb\n");
String^ CACHE_SIZEoption = Console::ReadLine();
Char wh= Char(CACHE_SIZEoption);
switch (wh)
{
case 'a':
    break;
case 'b':
    break;
case 'c':
    break;
case 'd':
    break;
}
Here the conversion from String to Char gives an error:
error C2440: '<function-style-cast>' : cannot convert from 'System::String ^' to 'wchar_t'
It's unrealistic to expect to be able to convert a string into a character. A string can contain 0, 1 or more characters. Which character do you want?
If you want the first character, use CACHE_SIZEoption[0], after having checked that the string is not empty.
In your case you probably want to add a check that the string's length is exactly 1 because otherwise that means the user's input is invalid. Check CACHE_SIZEoption->Length.
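Putting both suggestions together, a minimal sketch (my own illustration, not from the original answer):
String^ CACHE_SIZEoption = Console::ReadLine();
if (CACHE_SIZEoption->Length == 1)
{
    Char wh = CACHE_SIZEoption[0];  // safe: the string has exactly one character
    switch (wh)
    {
    case 'a':
        break;  // 1 Kb
    case 'b':
        break;  // 2 Kb
    case 'c':
        break;  // 4 Kb
    case 'd':
        break;  // 8 Kb
    }
}
else
{
    Console::WriteLine("Please enter exactly one of a, b, c or d.");
}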
I would try
Char wh = CACHE_SIZEoption[0];
or
Char wh = Convert::ToChar(CACHE_SIZEoption); // throws a FormatException unless the string has exactly one character
Found here: http://msdn.microsoft.com/en-us/library/bb335877%28v=vs.110%29.aspx
