antlr4: Grammar ambiguity, left-recursion, both?
My grammar, shown below, does not compile. The returned error (from the antlr4 maven plugin) is:
[INFO] --- antlr4-maven-plugin:4.3:antlr4 (default-cli) # beebell ---
[INFO] ANTLR 4: Processing source directory /Users/kodecharlie/workspace/beebell/src/main/antlr4
[INFO] Processing grammar: DateRange.g4
org\antlr\v4\parse\GrammarTreeVisitor.g: node from line 13:87 mismatched tree node: startTime expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from after line 13:87 mismatched tree node: RULE expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from line 13:87 mismatched tree node: startTime expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from after line 13:87 mismatched tree node: RULE expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from line 13:87 mismatched tree node: startTime expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from after line 13:87 mismatched tree node: RULE expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from line 13:87 mismatched tree node: startTime expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from after line 13:87 mismatched tree node: RULE expecting <UP>
[ERROR] error(20): internal error: Rule HOUR undefined
[ERROR] error(20): internal error: Rule MINUTE undefined
[ERROR] error(20): internal error: Rule SECOND undefined
[ERROR] error(20): internal error: Rule HOUR undefined
[ERROR] error(20): internal error: Rule MINUTE undefined
I can see how the grammar might be confused, e.g., about whether two digits are a MINUTE, SECOND, or HOUR (or maybe the start of a year). But a few articles suggest this error results from left-recursion.
Can you tell what's going on?
Thanks. Here's the grammar:
grammar DateRange;
range : startDate (THRU endDate)? | 'Every' LONG_DAY 'from' startDate THRU endDate ;
startDate : dateTime ;
endDate : dateTime ;
dateTime : GMTOFF | SHRT_MDY | YYYYMMDD | (WEEK_DAY)? LONG_MDY ;
// Dates.
GMTOFF : YYYYMMDD 'T' HOUR ':' MINUTE ':' SECOND ('-'|'+') HOUR ':' MINUTE ;
YYYYMMDD : YEAR '-' MOY '-' DOM ;
SHRT_MDY : MOY ('/' | '-') DOM ('/' | '-') YEAR ;
LONG_MDY : (SHRT_MNTH '.'? | LONG_MNTH) WS DOM ','? (WS YEAR (','? WS TIMESPAN)? | WS startTime)? ;
YEAR : DIGIT DIGIT DIGIT DIGIT ; // year
MOY : (DIGIT | DIGIT DIGIT) ; // month of year.
DOM : (DIGIT | DIGIT DIGIT) ; // day of month.
TIMESPAN : startTime (WS THRU WS endTime)? ;
// Time-of-day.
startTime : TOD ;
endTime : TOD ;
TOD : NOON | HOUR2 (':' MINUTE)? WS? MERIDIAN ;
NOON : 'noon' ;
HOUR2 : (DIGIT | DIGIT DIGIT) ;
MERIDIAN : 'AM' | 'am' | 'PM' | 'pm' ;
// 24-hour clock. Sanity-check range in listener.
HOUR : DIGIT DIGIT ;
MINUTE : DIGIT DIGIT ;
SECOND : DIGIT DIGIT ;
// Range verb.
THRU : WS ('-'|'to') WS -> skip ;
// Weekdays.
WEEK_DAY : (SHRT_DAY | LONG_DAY) ','? WS ;
SHRT_DAY : 'Sun' | 'Mon' | 'Tue' | 'Wed' | 'Thu' | 'Fri' | 'Sat' -> skip ;
LONG_DAY : 'Sunday' | 'Monday' | 'Tuesday' | 'Wednesday' | 'Thursday' | 'Friday' | 'Saturday' -> skip ;
// Months.
SHRT_MNTH : 'Jan' | 'Feb' | 'Mar' | 'Apr' | 'May' | 'Jun' | 'Jul' | 'Aug' | 'Sep' | 'Oct' | 'Nov' | 'Dec' ;
LONG_MNTH : 'January' | 'February' | 'March' | 'April' | 'May' | 'June' | 'July' | 'August' | 'September' | 'October' | 'November' | 'December' ;
DIGIT : [0-9] ;
WS : [ \t\r\n]+ -> skip ;
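I resolved this issue by setting up a unique token rule for each sequence of digits (of length 1, 2, 3, or 4) and letting parser rules such as year, hour, and minute decide what each sequence means. I also simplified several rules to make the alternatives more straightforward. The original also had lexer rules (LONG_MDY, TIMESPAN) that referenced the parser rules startTime and endTime, which ANTLR 4 does not allow and which probably explains the tree-node errors naming startTime; in the version below those are all parser rules. Anyway, here is the final result, which does compile: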
grammar DateRange;
range : 'Every' WS longDay WS 'from' WS startDate THRU endDate
| startDate THRU endDate
| startDate
;
startDate : dateTime ;
endDate : dateTime ;
dateTime : utc
| shrtMdy
| yyyymmdd
| longMdy
| weekDay ','? WS longMdy
;
// Dates.
utc : yyyymmdd 'T' hour ':' minute ':' second ('-'|'+') hour ':' minute ;
yyyymmdd : year '-' moy '-' dom ;
shrtMdy : moy ('/' | '-') dom ('/' | '-') year ;
longMdy : longMonth WS dom ','? optYearAndOrTime?
| shrtMonth '.'? WS dom ','? optYearAndOrTime?
;
optYearAndOrTime : WS year ','? WS timespan
| WS year
| WS timespan
;
fragment DIGIT : [0-9] ;
ONE_DIGIT : DIGIT ;
TWO_DIGITS : DIGIT ONE_DIGIT ;
THREE_DIGITS : DIGIT TWO_DIGITS ;
FOUR_DIGITS : DIGIT THREE_DIGITS ;
year : FOUR_DIGITS ; // year
moy : ONE_DIGIT | TWO_DIGITS ; // month of year.
dom : ONE_DIGIT | TWO_DIGITS ; // day of month.
timespan : (tod THRU tod) | tod ;
// Time-of-day.
tod : noon | (hour2 (':' minute)? WS? meridian?) ;
noon : 'noon' ;
hour2 : ONE_DIGIT | TWO_DIGITS ;
meridian : ('AM' | 'am' | 'PM' | 'pm' | 'a.m.' | 'p.m.') ;
// 24-hour clock. Sanity-check range in listener.
hour : TWO_DIGITS ;
minute : TWO_DIGITS ;
second : TWO_DIGITS ; // we do not use seconds.
// Range verb.
THRU : WS? ('-'|'–'|'to') WS? ;
// Weekdays.
weekDay : shrtDay | longDay ;
shrtDay : 'Sun' | 'Mon' | 'Tue' | 'Wed' | 'Thu' | 'Fri' | 'Sat' ;
longDay : 'Sunday' | 'Monday' | 'Tuesday' | 'Wednesday' | 'Thursday' | 'Friday' | 'Saturday' ;
// Months.
shrtMonth : 'Jan' | 'Feb' | 'Mar' | 'Apr' | 'May' | 'Jun' | 'Jul' | 'Aug' | 'Sep' | 'Oct' | 'Nov' | 'Dec' ;
longMonth : 'January' | 'February' | 'March' | 'April' | 'May' | 'June' | 'July' | 'August' | 'September' | 'October' | 'November' | 'December' ;
WS : ~[a-zA-Z0-9,.:]+ ;
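To illustrate why naming tokens by length helps, here is a rough sketch in Python (not ANTLR) of the rule an ANTLR lexer follows: the longest match wins, and on a tie the rule defined first wins. The two rule lists are hypothetical, cut-down stand-ins for the grammars above, not the full thing.

```python
import re

def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Strict '>' means a later rule never displaces an earlier one on a tie.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical rule sets for illustration only.
CONTEXT_NAMED = [("MOY", r"\d\d?"), ("HOUR", r"\d\d"), ("MINUTE", r"\d\d"), ("COLON", r":")]
LENGTH_NAMED = [("ONE_DIGIT", r"\d"), ("TWO_DIGITS", r"\d\d"), ("COLON", r":")]

print(tokenize("10:30", CONTEXT_NAMED))
print(tokenize("10:30", LENGTH_NAMED))
```

With the context-named rules, every two-digit run comes out as MOY, so a parser waiting for HOUR or MINUTE never sees one. With the length-named rules the lexer only reports how many digits it saw, and the parser rules decide what they mean.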
Related
Why is my django unittest failing a constraint?
I have this model:
class TestopiaEvent(Model):
    event_id = AutoField(primary_key=True)
    name = CharField(max_length=255)
    start_date = DateField()
    end_date = DateField()
    testers_required = IntegerField()

    class Meta:
        constraints = [
            CheckConstraint(
                check=Q(start_date__lte=F('end_date'), start_date__gte=datetime.now().date()),
                name='correct_datetime'
            )
        ]
And this test:
class TestopiaEventTestCase(TestCase):
    def setUp(self):
        self.default_values = {
            'name': 'Testopia 1',
            'start_date': datetime.now().date(),
            'end_date': datetime.now().date() + timedelta(days=1),
            'testers_required': 1
        }
        self.testopia_event = TestopiaEvent(**self.default_values)

    def test_save_with_valid_model_check_database(self):
        self.assertIsNone(self.testopia_event.save())
And it fails with this error:
django.db.utils.IntegrityError: new row for relation "webserver_testopiaevent" violates check constraint "correct_datetime"
DETAIL: Failing row contains (1, Testopia 1, 2020-07-24 00:00:00+00, 2020-07-25 00:00:00+00, 1).
I don't understand why it is failing: it should only fail if the start date is earlier than today's date or later than the end date, which isn't the case. What have I done wrong?
Thanks
Edit: Here are the postgresdb constraints:
testopia=# \d+ webserver_testopiaevent
Table "public.webserver_testopiaevent"
      Column      |          Type          | Collation | Nullable |                          Default                          | Storage  | Stats target | Description
------------------+------------------------+-----------+----------+-----------------------------------------------------------+----------+--------------+-------------
 event_id         | integer                |           | not null | nextval('webserver_testopiaevent_event_id_seq'::regclass) | plain    |              |
 name             | character varying(255) |           | not null |                                                           | extended |              |
 start_date       | date                   |           | not null |                                                           | plain    |              |
 end_date         | date                   |           | not null |                                                           | plain    |              |
 testers_required | integer                |           | not null |                                                           | plain    |              |
Indexes:
    "webserver_testopiaevent_pkey" PRIMARY KEY, btree (event_id)
Check constraints:
    "correct_datetime" CHECK (start_date >= statement_timestamp() AND start_date <= end_date)
Access method: heap
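Now() returns a DateTimeField(), so the constraint compares start_date against a full timestamp (statement_timestamp() in the table definition above). A DateField set to today's date is compared as midnight of that day, which is earlier than the current timestamp, so the start_date >= now check fails.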
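A small plain-Python sketch of that comparison, with no Django or Postgres involved. The helper names and the sample time are made up for illustration. It promotes the date to midnight, as the failing row shows (00:00:00), and compares it to a timestamp later the same day.

```python
from datetime import date, datetime, time

def date_as_timestamp(d: date) -> datetime:
    """Promote a date to a timestamp at midnight of that day."""
    return datetime.combine(d, time.min)

def passes_timestamp_check(start: date, now: datetime) -> bool:
    """Mimics start_date >= <current timestamp>."""
    return date_as_timestamp(start) >= now

def passes_date_check(start: date, now: datetime) -> bool:
    """Mimics start_date >= <current date>, i.e. date compared to date."""
    return start >= now.date()

now = datetime(2020, 7, 24, 10, 30)  # hypothetical "current time"
today = now.date()

print(passes_timestamp_check(today, now))  # False
print(passes_date_check(today, now))       # True
```

So a start date of today only passes when both sides of the comparison are dates.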
Generate cron expression for two years
I need to generate a cron expression that fires every 10 minutes in the date range October 2017 to February 2018. I tried the following expression: 0 10 0 ? 10-2 * 2017-2018, but it's not a valid expression. I get this error message: ((Month) - Unsupported value '10-2' for range. ). Please help.
A month range cannot wrap around the end of the year (and 2-10 would mean February through October), so try one expression per year:
*/10 * * 10-12 * 2017
*/10 * * 1-2 * 2018
The field layout, from the nncron.ru page:
* * * * * *
| | | | | |
| | | | | +-- Year              (range: 1900-3000)
| | | | +---- Day of the Week   (range: 1-7, 1 standing for Monday)
| | | +------ Month of the Year (range: 1-12)
| | +-------- Day of the Month  (range: 1-31)
| +---------- Hour              (range: 0-23)
+------------ Minute            (range: 0-59)
If you want to test another format, use this page: http://cronsandbox.com/
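Since October 2017 to February 2018 crosses a year boundary, the months involved differ per year. Here is a minimal sketch that works out the month range for each year and prints one expression per year. It assumes the six-field layout quoted above (minute, hour, day of month, month, day of week, year). The function names are made up for illustration.

```python
def month_span(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1

def per_year_expressions(start, end, minute_field="*/10"):
    """Build one six-field expression per calendar year in the span."""
    months_by_year = {}
    for year, month in month_span(start, end):
        months_by_year.setdefault(year, []).append(month)
    expressions = []
    for year, months in months_by_year.items():
        # Months within one year of a contiguous span are themselves contiguous.
        if len(months) == 1:
            month_field = str(months[0])
        else:
            month_field = f"{months[0]}-{months[-1]}"
        expressions.append(f"{minute_field} * * {month_field} * {year}")
    return expressions

for expression in per_year_expressions((2017, 10), (2018, 2)):
    print(expression)
```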
Remove Lines With Number Less Than X In Nth Field
I have a file consisting of lines like this:
ExampleText | En | 1.0
ExampledText | Es | 0.9
ExamplesText | En | 0.9994
ExampleTexts | Br | 0.991
ExampledText | Es | 0.83324
ExamplerText | En | 0.4494
Using grep .*| En, I can get all the lines containing En. However, how can I also remove all lines whose value in the last column is less than 0.5? Thus, the output is:
ExampleText | En | 1.0
ExamplesText | En | 0.9994
Your positive input is highly appreciated.
awk '$2 == "En" && $3 >= .5' FS=' \\| '
Set the field separator to | (with a space on each side).
Match if field 2 equals En and field 3 is greater than or equal to .5.
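For comparison, the same filter can be sketched in Python. This is just an illustration of the logic, not a replacement for the awk one-liner. The function name and default arguments are made up.

```python
def filter_lines(lines, lang="En", threshold=0.5):
    """Keep lines whose 2nd field equals lang and whose 3rd field is >= threshold."""
    kept = []
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3:
            continue  # skip lines that do not have three columns
        if fields[1] == lang and float(fields[2]) >= threshold:
            kept.append(line)
    return kept

sample = [
    "ExampleText | En | 1.0",
    "ExampledText | Es | 0.9",
    "ExamplesText | En | 0.9994",
    "ExampleTexts | Br | 0.991",
    "ExampledText | Es | 0.83324",
    "ExamplerText | En | 0.4494",
]

for line in filter_lines(sample):
    print(line)
```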
pandas - create new columns based on existing columns / conditional average
I am new to Pandas and I am trying to learn column creation based on conditions applied to already existing columns. I am working with cellular data, and this is what my source data looks like (the 2 columns to the right are empty to begin with):
DEVICE_ID | MONTH  | TYPE  | DAY | COUNT | LAST_MONTH | SEASONAL_AVG
8129      | 201601 | VOICE | 1   | 8     |            |
8129      | 201502 | VOICE | 1   | 5     |            |
8129      | 201501 | VOICE | 1   | 2     |            |
8321      | 201403 | DATA  | 3   | 1     |            |
2908      | 201302 | TEXT  | 5   | 4     |            |
8129      | 201406 | VOICE | 2   | 3     |            |
8129      | 201306 | VOICE | 2   | 7     |            |
3096      | 201501 | DATA  | 5   | 6     |            |
8129      | 201301 | VOICE | 1   | 2     |            |
I created a dataframe with this data and named it df.
df = pd.DataFrame({
    'DEVICE_ID' : [8129, 8129, 8129, 8321, 2908, 8129, 8129, 3096, 8129],
    'MONTH' : [201601, 201502, 201501, 201403, 201302, 201406, 201306, 201501, 201301],
    'TYPE' : ['VOICE', 'VOICE', 'VOICE', 'DATA', 'TEXT', 'VOICE', 'VOICE', 'DATA', 'VOICE'],
    'DAY' : [1, 1, 1, 3, 5, 2, 2, 5, 1],
    'COUNT' : [8, 5, 2, 1, 4, 3, 7, 6, 2]
})
I am trying to add two columns to df: 'LAST_MONTH' and 'SEASONAL_AVG'. Logic for these two columns:
LAST_MONTH: for the corresponding DEVICE_ID & TYPE & DAY combination, return the previous month's COUNT. Ex: for row 1 (DEVICE_ID: 8129, TYPE: VOICE, DAY: 1, MONTH: 201502), LAST_MONTH will be the COUNT from row 2 (DEVICE_ID: 8129, TYPE: VOICE, DAY: 1, MONTH: 201501). If there is no record for the previous month, LAST_MONTH will be zero.
SEASONAL_AVG: for the corresponding DEVICE_ID & TYPE & DAY combination, return the average of the corresponding month from all previous years (data starts from 201301). Ex: SEASONAL_AVG for row 0 = average of the COUNTs of rows 2 and 8. There will always be at least one record for the corresponding month from the past. It need not exist for all TYPE and DAY combinations, but at least some of the possible combinations will be present for all DEVICE_IDs.
Your help is greatly appreciated! Thanks!
EDIT1:
def last_month(record):
    year = int(str(record['MONTH'])[:4])
    month = int(str(record['MONTH'])[-2:])
    if month in (2,3,4,5,6,7,8,9,10):
        x = str(0)+str(month-1)
        y = int(str(year)+str(x))
        last_month = int(y)
    elif month == 1:
        last_month = int(str(year-1)+str(12))
    else:
        last_month = int(str(year)+str(month-1))
    day = record['DAY']
    cellular_type = record['TYPE']
    #return record['COUNT']
    return record['COUNT'][(record['MONTH'] == last_month) & (record['DAY'] == day) & (record['TYPE'] == cellular_type)]

df['last_month'] = df.apply (lambda record: last_month(record),axis=1)
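To pin down the two rules described above, here is a sketch in plain Python, deliberately without pandas so it has no dependencies. It assumes each (DEVICE_ID, TYPE, DAY, MONTH) combination appears at most once. Returning None when no earlier year has data for that combination is my own assumption, since the question does not say what should happen then. The function names are hypothetical.

```python
def previous_month(yyyymm):
    """201601 -> 201512, 201502 -> 201501."""
    year, month = divmod(yyyymm, 100)
    return (year - 1) * 100 + 12 if month == 1 else year * 100 + month - 1

def add_columns(rows):
    """rows: list of dicts with DEVICE_ID, MONTH, TYPE, DAY, COUNT."""
    counts = {(r["DEVICE_ID"], r["TYPE"], r["DAY"], r["MONTH"]): r["COUNT"] for r in rows}
    result = []
    for r in rows:
        key = (r["DEVICE_ID"], r["TYPE"], r["DAY"])
        # LAST_MONTH: COUNT of the same combination one month earlier, else 0.
        last = counts.get(key + (previous_month(r["MONTH"]),), 0)
        # SEASONAL_AVG: mean COUNT of the same calendar month in earlier years.
        year, month = divmod(r["MONTH"], 100)
        earlier = [count for (dev, typ, day, m), count in counts.items()
                   if (dev, typ, day) == key and m % 100 == month and m // 100 < year]
        avg = sum(earlier) / len(earlier) if earlier else None  # None is an assumption
        result.append({**r, "LAST_MONTH": last, "SEASONAL_AVG": avg})
    return result

columns = ("DEVICE_ID", "MONTH", "TYPE", "DAY", "COUNT")
data = [
    (8129, 201601, "VOICE", 1, 8), (8129, 201502, "VOICE", 1, 5),
    (8129, 201501, "VOICE", 1, 2), (8321, 201403, "DATA", 3, 1),
    (2908, 201302, "TEXT", 5, 4), (8129, 201406, "VOICE", 2, 3),
    (8129, 201306, "VOICE", 2, 7), (3096, 201501, "DATA", 5, 6),
    (8129, 201301, "VOICE", 1, 2),
]
rows = [dict(zip(columns, values)) for values in data]

for row in add_columns(rows):
    print(row["MONTH"], row["LAST_MONTH"], row["SEASONAL_AVG"])
```

In pandas the same lookups would more naturally be done by merging on the key columns with a shifted month, but the dictionary version shows the intended result for each row.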
How to insert a new record(row or line) after the last line of input file using awk?
The marks of the students are given as a table in the following format:
Name | rollno | marks in exam1 | marks in exam 2 ...
i.e. there is one record per line and each column is separated by a | (pipe) character. At the end of all the records I want to add extra lines which contain information about max, min, mean, and so on. So my question is: how would one add a new record at the end of the input file?
Example: here is a sample input
Piyush | 12345 | 5 | 5 | 4
James | 007 | 0 | 0 | 7
Knuth | 31415 | 100 | 100 | 100
for which the output is
Piyush | 12345 | 5 | 5 | 4 | 14
James | 007 | 0 | 0 | 7 | 7
Knuth | 31415 | 100 | 100 | 100 | 300
max | | 100 | 100 | 100 | 300
min | | 0 | 0 | 4 | 7
mean | | 35.00 | 35.00 | 37.00 | 107.00
sd | | 46.01 | 46.01 | 44.56 | 136.50
awk '
BEGIN { FS=OFS="|" }
{
    sum = 0
    for (i=3;i<=NF;i++) {
        tot[i] += $i
        sum += $i
        max[i] = ( (i in max) && (max[i] > $i) ? max[i] : $i )
    }
    print $0, sum
    # i is now NF+1, the index of the appended sum column
    max[i] = ( (i in max) && (max[i] > sum) ? max[i] : sum )
}
END {
    printf "max" OFS OFS    # label, then the empty rollno column
    nf = NF+1
    for (i=3; i<=nf; i++) {
        printf "%s%s", max[i], (i<nf?OFS:ORS)
    }
}'
Repeat for min and whatever else you need to calculate, and check the printf formatting flags for whatever spacing you need, if any.
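To cross-check the numbers, here is a sketch of the whole summary in Python using the standard statistics module. The sd figures in the expected output match the population standard deviation (pstdev), so that is what it uses. The function names and the exact output spacing are my own choices, not part of the question.

```python
from statistics import mean, pstdev

def summarize(lines):
    """Parse pipe-separated records; return rows with a sum column plus column stats."""
    records = []
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        marks = [float(value) for value in fields[2:]]
        records.append((fields[0], fields[1], marks + [sum(marks)]))
    columns = list(zip(*(marks for _, _, marks in records)))
    summary = {
        "max": [max(column) for column in columns],
        "min": [min(column) for column in columns],
        "mean": [mean(column) for column in columns],
        "sd": [pstdev(column) for column in columns],  # population standard deviation
    }
    return records, summary

def render(lines):
    """Return the original records with sums, followed by one line per statistic."""
    records, summary = summarize(lines)
    out = [" | ".join([name, roll] + [f"{m:g}" for m in marks])
           for name, roll, marks in records]
    for label, values in summary.items():
        fmt = "{:.2f}" if label in ("mean", "sd") else "{:g}"
        out.append(" | ".join([label, ""] + [fmt.format(v) for v in values]))
    return out

sample = [
    "Piyush | 12345 | 5 | 5 | 4",
    "James | 007 | 0 | 0 | 7",
    "Knuth | 31415 | 100 | 100 | 100",
]

for line in render(sample):
    print(line)
```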