Error while parsing a date with Antlr4 - antlr4

I'm trying to parse dates using the following grammar:
grammar Dates;
formattedDate : (DATE '/' MONTH '/' year);
year : SHORT_YEAR | FULL_YEAR;
SHORT_YEAR : DIGIT DIGIT;
FULL_YEAR : ('19' | '20' | '21') DIGIT DIGIT;
DATE : (('0'..'2')? DIGIT) | '30' | '31';
MONTH : ('0'? DIGIT) | '11' | '12';
fragment DIGIT : ('0' .. '9');
But it fails to parse the values that I would expect to work. For example, an input of 11/04/2017 produces the error:
line 1:0 mismatched input '11' expecting DATE
My first guess was that for some values (1-12) the lexer can't decide whether the token is a DATE or a MONTH, and that this causes the problem. But when I tried to fix it by using parser rules instead, I had the same problem:
formattedDate : (dateNum '/' monthNum '/' year);
year : shortYear | fullYear;
shortYear : DIGIT DIGIT;
fullYear : ('19' | '20' | '21') DIGIT DIGIT;
dateNum : (('0'..'2')? DIGIT) | '30' | '31';
monthNum : ('0'? DIGIT) | '11' | '12';
fragment DIGIT : ('0' .. '9');
And it still seems to struggle on the first value, even if it is something like 31, which is outside the range of ambiguity.
What am I doing wrong here?

As you say, "the tokens overlap" (note that 31 is ambiguous too: it could be a short year). In cases like this, the lexer chooses the rule that matches the longest input. If two or more rules match the same length, it chooses the one that appears first in the grammar. (I think I read this some time ago on www.antlr.org.)
So just changing the order of the rules "solves" the problem – or pushes it forward (note DATE is before SHORT_YEAR and MONTH):
grammar Dates;
formattedDate : (DATE '/' MONTH '/' year);
year : SHORT_YEAR | FULL_YEAR;
DATE : (('0'..'2')? DIGIT) | '30' | '31';
SHORT_YEAR : DIGIT DIGIT;
FULL_YEAR : ('19' | '20' | '21') DIGIT DIGIT;
MONTH : ('0'? DIGIT) | '11' | '12';
fragment DIGIT : ('0' .. '9');
This now yields line 1:3 mismatched input '04' expecting MONTH, because '04' also matches DATE, which now comes first.
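The tie-breaking behaviour described above (longest match wins, and the first rule wins on a tie) can be sketched in Python. This is only a rough simulation, not how ANTLR is implemented: the regular expressions are hand translations of the lexer rules, and the names `SLASH`, `longest_match`, `tokenize` and `rules_in_order` are made up for the illustration.

```python
import re

def longest_match(pattern, text, pos):
    """Length of the longest prefix of text[pos:] that the whole pattern matches."""
    for end in range(len(text), pos, -1):
        if pattern.fullmatch(text, pos, end):
            return end - pos
    return 0

def tokenize(rules, text):
    """Pick the longest match at each position; the first rule wins on a tie."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            length = longest_match(pattern, text, pos)
            if length > best_len:  # strict '>' keeps the earlier rule on a tie
                best_name, best_len = name, length
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

D = "[0-9]"
# Hand-translated from the grammar; SLASH stands in for the literal '/'.
PATTERNS = {
    "SLASH": "/",
    "SHORT_YEAR": D + D,
    "FULL_YEAR": "(?:19|20|21)" + D + D,
    "DATE": "[0-2]?" + D + "|30|31",
    "MONTH": "0?" + D + "|11|12",
}

def rules_in_order(*names):
    return [(name, re.compile(PATTERNS[name])) for name in names]

original = rules_in_order("SLASH", "SHORT_YEAR", "FULL_YEAR", "DATE", "MONTH")
reordered = rules_in_order("SLASH", "DATE", "SHORT_YEAR", "FULL_YEAR", "MONTH")

print(tokenize(original, "11/04/2017"))   # '11' and '04' both become SHORT_YEAR
print(tokenize(reordered, "11/04/2017"))  # '11' and '04' both become DATE
```

In both orders the lexer never produces a MONTH token for this input, which is why reordering alone only moves the error.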
A possible solution is to use lexer grammar modes:
DatesLexer.g4:
lexer grammar DatesLexer;
// Mode expecting DATE (default mode)
DATE : (('0'..'2')? DIGIT) | '30' | '31';
DATE_BAR : '/'
-> pushMode(readingMonth);
// Mode expecting MONTH
mode readingMonth;
MONTH : ('0'? DIGIT) | '11' | '12';
MONTH_BAR : '/'
-> popMode, pushMode(readingYear);
// Mode expecting *_YEAR
mode readingYear;
SHORT_YEAR : DIGIT DIGIT
-> popMode;
FULL_YEAR : ('19' | '20' | '21') DIGIT DIGIT
-> popMode;
fragment DIGIT : ('0' .. '9');
DatesParser.g4:
parser grammar DatesParser;
options { tokenVocab=DatesLexer; }
formattedDate : (DATE DATE_BAR MONTH MONTH_BAR year);
year : SHORT_YEAR | FULL_YEAR;
Result:
Only for reference:
> antlr4 DatesLexer.g4 [-o outDir]
> antlr4 DatesParser.g4 [-o outDir]
> [cd outDir]
> javac *.java
> grun Dates formattedDate -tokens <file> [-gui]
[#0,0:1='11',<1>,1:0]
[#1,2:2='/',<2>,1:2]
[#2,3:4='04',<3>,1:3]
[#3,5:5='/',<4>,1:5]
[#4,6:9='2017',<6>,1:6]
[#5,10:9='<EOF>',<-1>,1:10]
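To see why the modes help, here is a small Python sketch of a lexer with a mode stack. It is an approximation under stated assumptions, not ANTLR's implementation: the `MODES` table, the `"DEFAULT"` key and the `tokenize` helper are invented for the illustration, and the patterns mirror the grammar as written (including its quirk that '10' is not a single MONTH token).

```python
import re

# mode name -> ordered list of (token name, pattern, lexer commands)
MODES = {
    "DEFAULT": [
        ("DATE", re.compile(r"30|31|[0-2]?[0-9]"), []),
        ("DATE_BAR", re.compile(r"/"), [("push", "readingMonth")]),
    ],
    "readingMonth": [
        ("MONTH", re.compile(r"11|12|0?[0-9]"), []),
        ("MONTH_BAR", re.compile(r"/"), [("pop", None), ("push", "readingYear")]),
    ],
    "readingYear": [
        ("SHORT_YEAR", re.compile(r"[0-9]{2}"), [("pop", None)]),
        ("FULL_YEAR", re.compile(r"(?:19|20|21)[0-9]{2}"), [("pop", None)]),
    ],
}

def tokenize(text):
    stack = ["DEFAULT"]  # the top of the stack is the current mode
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        # Only the rules of the current mode compete for the next token.
        for name, pattern, commands in MODES[stack[-1]]:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m, commands)
        if best is None:
            raise ValueError(f"no token at offset {pos} in mode {stack[-1]}")
        name, m, commands = best
        tokens.append((name, m.group()))
        pos = m.end()
        for command, mode in commands:
            if command == "push":
                stack.append(mode)
            else:
                stack.pop()
    return tokens

print(tokenize("11/04/2017"))
```

Because DATE, MONTH and the year rules never compete in the same mode, the overlap between them no longer matters.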

Related

Difference between 2 consecutive values in Kusto

I have the following script:
let StartTime = datetime(2022-02-18 10:10:00 AM);
let EndTime = datetime(2022-02-18 10:15:00 AM);
MachineEvents
| where Timestamp between (StartTime .. EndTime)
| where Id == "00112233" and Name == "Higher"
| top 2 by Timestamp
| project Timestamp, Value
I got the following result:
What I want to do next is check whether the last Value received (in this case, for example, 15451.433) is less than 30,000. If it is, I then need to check the difference between the last two consecutive values (in this case: 15451.433 - 15457.083). If the difference is < 0, the result should be true; otherwise it should be false. In other words, Value should be returned as a boolean instead of a double as shown in the figure.
datatable(Timestamp:datetime, Value:double)
[
datetime(2022-02-18 10:15:00 AM), 15457.083,
datetime(2022-02-18 10:14:00 AM), 15451.433,
datetime(2022-02-18 10:13:00 AM), 15433.333,
datetime(2022-02-18 10:12:00 AM), 15411.111
]
| top 2 by Timestamp
| project Timestamp, Value
| extend nextValue=next(Value)
| extend finalResult = iff(Value < 30000, nextValue - Value < 0, false)
| top 1 by Timestamp
| project finalResult
Output:
finalResult
1
You can use the prev() function (or next()) to process the values in the other rows.
...
| extend previous = prev(Value)
| extend diff = Value - previous
| extend isPositive = diff > 0
prev() and next() only work on a serialized row set, so add the serialize operator first if no earlier operator such as top has already ordered the rows.
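For readers who want to check the logic outside Kusto, the datatable query above can be mirrored in plain Python. This is a sketch only: the function name `final_result` and the `limit` parameter are made up here, and returning False when fewer than two rows exist is an assumption rather than Kusto's null behaviour.

```python
from datetime import datetime

# Same sample rows as the datatable above.
rows = [
    (datetime(2022, 2, 18, 10, 15), 15457.083),
    (datetime(2022, 2, 18, 10, 14), 15451.433),
    (datetime(2022, 2, 18, 10, 13), 15433.333),
    (datetime(2022, 2, 18, 10, 12), 15411.111),
]

def final_result(rows, limit=30000.0):
    # "top 2 by Timestamp": the two newest rows, newest first
    newest = sorted(rows, key=lambda row: row[0], reverse=True)[:2]
    if len(newest) < 2:
        return False  # assumption: nothing to compare against
    value = newest[0][1]       # Value of the newest row
    next_value = newest[1][1]  # next(Value): the row after it
    # iff(Value < 30000, nextValue - Value < 0, false)
    return value < limit and (next_value - value) < 0

print(final_result(rows))
```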

Add "invisible" decimal places to end of number?

I am printing a "Table" to the console. I will be using this same table structure for several different variables. However, as you can see from the Output below, the lines don't all align.
One way to resolve it would be to increase the number of decimal places (e.g. 6.730000 for Standard Deviation) which would push the line into place.
However, I do not want this many decimal places.
Is it possible to add extra 0s to the end of a number, and make these invisible?
I am planning on using this table structure for several variables, and the length of Mean, Stddev, and Median will likely never be more than 6 characters.
EDIT - I would really like to ensure that each value which appears in the table will be 6 characters long, and if it is not 6 characters long, add additional "invisible" zeros.
Input
# Create and structure Table to store descriptive statistics for each variable.
subtitle = "| Mean | Stddev | Median |"
structure = '| {:0.2f} | {:0.2f} | {:0.2f} |'
lines = '=' * len(subtitle)
# Print table.
print(lines)
print(subtitle)
print(lines)
print(structure.format(mean, std, median))
print(lines)
Output:
======================================
| Mean | Stddev | Median |
======================================
| 181.26 | 6.73 | 180.34 |
======================================
Didn't really figure this out - but found a workaround.
I just did the following:
"| {:^6} | {:^6} | {:^6} | {:^6} | {:^6} |"
The ^6 in each field centers the value in a column 6 characters wide, which keeps the width between the | separators consistent.
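A width and a precision can also be combined in one format spec, which pads with spaces rather than "invisible" zeros. A minimal sketch for the three columns in the question, assuming the sample values from the Output above and building the header with the same widths:

```python
mean, std, median = 181.26, 6.73, 180.34  # sample values from the question's output

# ^ centers, 6 is the minimum field width, .2f keeps two decimal places
header = "| {:^6} | {:^6} | {:^6} |".format("Mean", "Stddev", "Median")
row = "| {:^6.2f} | {:^6.2f} | {:^6.2f} |".format(mean, std, median)
lines = "=" * len(header)

print(lines)
print(header)
print(lines)
print(row)
print(lines)
```

Values longer than 6 characters still widen their column, so the width would need to be raised if larger numbers are expected.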

How to make awk print one of 2 different fields based on which of it matches the condition

I have an ASCII table in Linux which looks like this:
Oct Dec Hex Char Oct Dec Hex Char
-------------------------------------------------------------
056 46 2E . 156 110 6E n
I want to build a one-liner in awk that matches the 3rd and 7th fields against a given hex code, say "2E". If the 3rd field matches, it should print the 4th field, i.e. ".". Otherwise, if the 7th field matches "2E", it should print the corresponding 8th field.
I have written something like this:
man ascii | awk '$3 == "2E"{print $4};$7 == "2E"{print $8}'
Output:
.
But the above works only if the match happens in 3rd field. If it happens in 7th field it prints nothing. For example for this case:
man ascii | awk '$3 == "6E"{print $4};$7 == "6E"{print $8}'
Expected output:
n
Output I'm getting:
nothing
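The intended lookup can be written out in Python to make the field logic explicit. This sketch only restates what the one-liner is meant to do on a whitespace-separated row like the sample above; it does not diagnose why the awk command prints nothing, and the function name `char_for_hex` is made up for the illustration.

```python
def char_for_hex(line, hex_code):
    """Return the Char next to a matching Hex column, or None if neither matches."""
    fields = line.split()  # like awk's default splitting on runs of whitespace
    # awk's $3/$4 and $7/$8 are fields[2]/fields[3] and fields[6]/fields[7]
    for hex_index, char_index in ((2, 3), (6, 7)):
        if len(fields) > char_index and fields[hex_index] == hex_code:
            return fields[char_index]
    return None

sample = "056 46 2E . 156 110 6E n"
print(char_for_hex(sample, "2E"))
print(char_for_hex(sample, "6E"))
```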

How to disable operator associativity in antlr4

I have a few rules for an expression:
e:
e '*' e |
e '+' e |
e '<' e |
'2';
I can specify the associativity of the '+' operator, using <assoc=right> for ex., but how can I specify that expressions like 2 < 2 < 2 should be invalid?
Rather late in answering this, but...
The best way to handle this seems to be to split your expression into two parts: a boolean expression and a numeric expression.
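exp:
numeric |
booleanExpr;
// Named booleanExpr because 'boolean' is a Java keyword, which ANTLR rejects as a rule name for the Java target.
booleanExpr:
numeric '<' numeric;
numeric:
numeric '*' numeric |
numeric '+' numeric |
'2';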
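Because both operands of '<' must be numeric, and a comparison is not itself numeric, this will allow things like 1 + 2 < 3 but not 1 < 2 < 3 (assuming the literal rule is extended beyond '2').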
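The same split can be tried out with a tiny hand-written recognizer in Python. This is a sketch of the idea, not generated ANTLR code: it accepts any integer literal rather than just '2', it only decides whether the input is accepted (no tree, no precedence), and the helper names are invented for the illustration.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[*+<])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_numeric(tokens, pos):
    """numeric: NUMBER (('*' | '+') NUMBER)* ; returns the position after it."""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError("expected a number")
    pos += 1
    while pos < len(tokens) and tokens[pos] in ("*", "+"):
        if pos + 1 >= len(tokens) or not tokens[pos + 1].isdigit():
            raise SyntaxError("expected a number after the operator")
        pos += 2
    return pos

def accepts(text):
    """exp: numeric ('<' numeric)? ; at most one comparison, so '<' cannot chain."""
    try:
        tokens = tokenize(text)
        pos = parse_numeric(tokens, 0)
        if pos < len(tokens) and tokens[pos] == "<":
            pos = parse_numeric(tokens, pos + 1)
        return pos == len(tokens)
    except SyntaxError:
        return False

print(accepts("1 + 2 < 3"))
print(accepts("1 < 2 < 3"))
```

The second '<' is rejected because, after one comparison has been read, the grammar has no rule that lets another one follow.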

Detect overlapping ranges and correct then in oracle

After googling a bit, I found this to be an interesting question. I'd like to hear your take on it.
Having my table
USER | MAP | STARTDAY | ENDDAY
1 | A | 20110101 | 20110105
1 | B | 20110106 | 20110110
2 | A | 20110101 | 20110107
2 | B | 20110105 | 20110110
What I want is to fix user 2's case, where maps A and B overlap by a couple of days (from 20110105 until 20110107).
I would like to query that table in a way that never returns overlapping ranges. My input data is flaky already, so I don't have to worry about how the conflict is resolved; I just want a BETWEEN lookup on these dates to return a single value.
Possible outputs for the query I'm trying to build would be like
USER | MAP | STARTDAY | ENDDAY
2 | B | 20110108 | 20110110 -- pushed overlapping days ahead..
2 | A | 20110101 | 20110104 -- shrunk overlapping range
It doesn't even matter if the algorithm causes "invalid ranges", e.g. Start = 20110105, End = 20110103, I'll just put null when I get to these cases.
What would you guys say? Is there any straightforward way to get this done?
Thanks!
f.
Analytic functions could help:
select userid, map
, case when prevend >= startday then prevend+1 else startday end newstart
, endday
from
( select userid, map, startday, endday
, lag(endday) over (partition by userid order by startday) prevend
from mytable
)
order by userid, startday
Gives:
USERID MAP NEWSTART ENDDAY
1 A 01/01/2011 01/05/2011
1 B 01/06/2011 01/10/2011
2 A 01/01/2011 01/07/2011
2 B 01/08/2011 01/10/2011
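The lag() step in the query can be mirrored in Python to show what happens row by row. This is a sketch under assumptions: the rows are the question's sample data converted to dates, and the function name `push_overlaps_forward` is made up for the illustration. Like the SQL, it compares each start with the previous row's original end date.

```python
from datetime import date, timedelta
from itertools import groupby

# (userid, map, startday, endday), taken from the question's table
rows = [
    (1, "A", date(2011, 1, 1), date(2011, 1, 5)),
    (1, "B", date(2011, 1, 6), date(2011, 1, 10)),
    (2, "A", date(2011, 1, 1), date(2011, 1, 7)),
    (2, "B", date(2011, 1, 5), date(2011, 1, 10)),
]

def push_overlaps_forward(rows):
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    # partition by userid, order by startday
    for _, group in groupby(ordered, key=lambda r: r[0]):
        prev_end = None  # lag(endday) is null on the first row of a partition
        for userid, map_name, start, end in group:
            if prev_end is not None and prev_end >= start:
                start = prev_end + timedelta(days=1)  # prevend + 1
            result.append((userid, map_name, start, end))
            prev_end = end
    return result

for row in push_overlaps_forward(rows):
    print(row)
```

If a range lies entirely inside the previous one, the new start ends up after its end, which is the "invalid range" case the question says it can tolerate.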
