Why is the 'no viable alternative at input' error thrown? - antlr4

The following is a portion of a larger grammar. What I need is to understand why it doesn't work the way I think it should. I don't need a solution (I already found an alternative).
Here is the grammar:
grammar CommaSeparatorField;
document : field ( COMMA field)* EOF ;
field : value[false]+ ; //in real grammar, "value[false]+" is "value[getCustomLogic()]+"
value[boolean commaAllow]
    : TEXT
    | NUMBER
    | {$commaAllow}? COMMA
    ;
TEXT : [a-zA-Z]+ ;
NUMBER : [0-9]+ ;
COMMA : ',' ;
If the input text is:
ab12cd,ef34gh
My reasoning is that since commaAllow is false, the parser will never enter the 'value' rule on a COMMA, so the following tree should be built:
                               document
                                   |
          ---------------------------------------------------
        field               (COMMA field)*                <EOF>
          |                        |                        |
          |                ------------------               |
          |                |                |               |
    value[false]+          |          value[false]+         |
          |                |                |               |
   ---------------         |         ---------------        |
 value  value  value       |       value  value  value      |
   |      |      |         |         |      |      |        |
  TEXT NUMBER   TEXT     COMMA      TEXT NUMBER   TEXT      |
   |      |      |         |         |      |      |        |
   ab     12     cd        ,         ef     34     gh     <EOF>
but instead I get:
line 1:6 no viable alternative at input ','
From my point of view, it should work as follows:
enter document, LT(1)=ab
enter field, LT(1)=ab
enter value, LT(1)=ab
consume [#0,0:1='ab',<1>,1:0] rule value
exit value, LT(1)=12
enter value, LT(1)=12
consume [#1,2:3='12',<2>,1:2] rule value
exit value, LT(1)=cd
enter value, LT(1)=cd
consume [#2,4:5='cd',<1>,1:4] rule value
exit value, LT(1)=, <- it should exit 'value' and 'field' without entering 'value' again, because 'commaAllow' is false
exit field, LT(1)=,
consume [#3,6:6=',',<3>,1:6] rule document
enter field, LT(1)=ef
enter value, LT(1)=ef
consume [#4,7:8='ef',<1>,1:7] rule value
exit value, LT(1)=34
enter value, LT(1)=34
consume [#5,9:10='34',<2>,1:9] rule value
exit value, LT(1)=gh
enter value, LT(1)=gh
consume [#6,11:12='gh',<1>,1:11] rule value
exit value, LT(1)=<EOF>
exit field, LT(1)=<EOF>
consume [#7,13:12='<EOF>',<-1>,1:13] rule document
exit document, LT(1)=<EOF>
but it is actually working this way:
enter document, LT(1)=ab
enter field, LT(1)=ab
enter value, LT(1)=ab
consume [#0,0:1='ab',<1>,1:0] rule value
exit value, LT(1)=12
enter value, LT(1)=12
consume [#1,2:3='12',<2>,1:2] rule value
exit value, LT(1)=cd
enter value, LT(1)=cd
consume [#2,4:5='cd',<1>,1:4] rule value
exit value, LT(1)=,
enter value, LT(1)=, <- why does it try to enter 'value' when commaAllow==false and LT(1)=, ?
Shouldn't it instead:
- exit field
- consume ',' (COMMA) in rule 'document'
- enter rule 'field' again
?
**** 'no viable alternative at input ','' is thrown here
exit value, LT(1)=,
enter value, LT(1)=,
exit value, LT(1)=ef
enter value, LT(1)=ef
consume [#4,7:8='ef',<1>,1:7] rule value
exit value, LT(1)=34
enter value, LT(1)=34
consume [#5,9:10='34',<2>,1:9] rule value
exit value, LT(1)=gh
enter value, LT(1)=gh
consume [#6,11:12='gh',<1>,1:11] rule value
exit value, LT(1)=<EOF>
exit field, LT(1)=<EOF>
consume [#7,13:12='<EOF>',<-1>,1:13] rule document
exit document, LT(1)=<EOF>
(Note: I need to keep the {$commaAllow}? COMMA alternative in the value rule for a reason that is not relevant here.)
What is wrong with my logic? Am I misunderstanding something?
If I remove this alternative:
| {$commaAllow}? COMMA // from 'value' rule
it works. But if it is not removed and 'commaAllow' is always false, shouldn't it work as if the alternative didn't exist?

Related

The difference between awk $1 and $NF?

case 1:
echo 'ABC-dev-test.zip' | awk -F'^ABC' '{print $1}'
output : null
case 2:
echo 'ABC-dev-test.zip' | awk -F'^ABC' '{print $NF}'
output : -dev-test.zip
I wonder why it comes out like this. As I understand it, $1 is the first record and $NF is the number of records. In the end they both point to the first, so I thought running them would give the same value.
Why is it different?
You are getting an empty result in the first case because there is nothing before ABC in your input. The field separator is ^ABC, that is, ABC anchored at the start of the line and not any ABC in the line, so this will always be the case. Whatever comes before the separator is the first field ($1), and here there is nothing before it, so your first command prints an empty string.
Let's run the following command to see how many fields we get with your sample and what their values are:
echo 'ABC-dev-test.zip' | awk -F'^ABC' '{for(i=1;i<=NF;i++){print "Field Number:"i " Field value is:" $i}}'
Field Number:1 Field value is:
Field Number:2 Field value is:-dev-test.zip
You can clearly see that the first field is empty once ^ABC is the field separator, while $NF (which means the last field of the current line, not the number of fields) works because -dev-test.zip comes after ABC in your sample.
Additional note: you are using ABC anchored to the start of the line as the separator. If you instead use plain ABC as the field separator, then for input such as XYZ-ABC-dev-test.zipABC you will get XYZ- as the first field.
Let's test this with the string ABC-dev-test.zipABC-resvalues, which contains ABC twice.
With ^ABC as the field separator, the first field is empty and the second ABC is not treated as a field separator:
echo 'ABC-dev-test.zipABC-resvalues' | awk -F'^ABC' '{for(i=1;i<=NF;i++){print "Field Number:"i " Field value is:" $i}}'
Field Number:1 Field value is:
Field Number:2 Field value is:-dev-test.zipABC-resvalues
When we change the field separator to ABC, every occurrence of ABC in the line is treated as a field separator:
echo 'ABC-dev-test.zipABC-resvalues' | awk -F'ABC' '{for(i=1;i<=NF;i++){print "Field Number:"i " Field value is:" $i}}'
Field Number:1 Field value is:
Field Number:2 Field value is:-dev-test.zip
Field Number:3 Field value is:-resvalues
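For readers more comfortable with Python, the same splitting behaviour can be approximated with re.split. This is only a rough analogue of awk's -F handling, not awk itself, and the helper name awk_fields is made up for this sketch:

```python
import re


def awk_fields(line, fs):
    """Split a line on the regex fs, roughly like awk -F does
    for a multi-character (regex) field separator."""
    return re.split(fs, line)


fields = awk_fields("ABC-dev-test.zip", r"^ABC")
print(len(fields))   # plays the role of NF
print(repr(fields[0]))   # plays the role of $1 (empty here)
print(repr(fields[-1]))  # plays the role of $NF (the last field)

# Anchored separator: only the ABC at the start of the line splits.
print(awk_fields("ABC-dev-test.zipABC-resvalues", r"^ABC"))

# Unanchored separator: every ABC splits.
print(awk_fields("ABC-dev-test.zipABC-resvalues", r"ABC"))
```

The first element of the list is the empty string in every case above, which mirrors why $1 prints nothing while $NF prints the text after the separator.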

Perform Left Search function using Pandas

This is my very first question, related to my first Python project.
Put simply, I have 2 columns of data in Excel like this (first 6 rows):
destination_area | destination_code
SG37.D0 | SG37.D
SG30.C0 | SG30.C
SG4.A3.P | SG4.A
SG15.C16 | SG15.C
SG35.D02 | SG35.D
SG8.A5.BC | SG8.A
In Excel, I'm using a function to get the destination code by finding the first "." in the cell and returning all characters to the left of it, plus 1 character after it:
=IfError(left(E2,search(".",E2)+1),"")
Now I want to do the same using str.extract:
df1['destination_code'] = df1['destination_area'].str.extract(r"(?=(.*[0-9][.][A-Z]))", expand = False)
print(df1['destination_area'].head(6),df1['destination_code'].head(6))
I almost got what I need, but the code still matches too much for values that have more than one ".":
destination_area | destination_code
SG37.D0 | SG37.D
SG30.C0 | SG30.C
SG4.A3.P | SG4.A3.P
SG15.C16 | SG15.C
SG35.D02 | SG35.D
SG8.A5.BC | SG8.A5.BC
I realize that my regex matches the pattern {a number + "." + a letter} as late in the string as possible, which returns all characters for "SG4.A3.P" and "SG8.A5.BC".
So how should I modify my code? Or is there a better way to do what the Excel formula does? Thanks in advance.
There is no need for a lookahead. Use
df1['destination_code'] = df1['destination_area'].str.extract(r"^([^.]+\..)", expand=False)
Mind the capturing group: it is enough here to return the value you need.
Explanation:
--------------------------------------------------------------------------------
^          the beginning of the string
--------------------------------------------------------------------------------
(          group and capture to \1:
--------------------------------------------------------------------------------
[^.]+      any character except: '.' (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
\.         '.'
--------------------------------------------------------------------------------
.          any character except \n
--------------------------------------------------------------------------------
)          end of \1
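As a quick check, here is a small sketch that runs the same pattern over the six sample values from the question using only Python's re module. The helper name destination_code and the choice to return an empty string when there is no "." (to mimic IfError(..., "")) are assumptions of this sketch; with pandas, the same pattern string is what gets passed to str.extract:

```python
import re

# Same pattern as in the answer: everything up to the first ".",
# the "." itself, and exactly one character after it.
PATTERN = re.compile(r"^([^.]+\..)")


def destination_code(area):
    """Return the captured prefix, or "" when the pattern does not match."""
    match = PATTERN.match(area)
    return match.group(1) if match else ""


areas = ["SG37.D0", "SG30.C0", "SG4.A3.P", "SG15.C16", "SG35.D02", "SG8.A5.BC"]
for area in areas:
    print(area, "|", destination_code(area))
```

Because [^.]+ cannot cross a ".", the match always stops one character after the first ".", so "SG4.A3.P" gives "SG4.A" rather than the whole string.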

Subtracting part of a cell

Let's say that in one row I have data in 2 cells, and I want to extract the data after the second "_" character:
| | A | B |
|---|:----------:|:---------------------:|
| 1 | 75875_QUWR | LALAHF_FHJ_75378_WZ44 | <- Input
| 2 | 75875_QUWR | 75378_WZ44 | <- Expected output
I tried using the =RIGHT() function, but then it would also remove text from the first cell, and so on. How can I write this function? Maybe I could compare against the old cell and, if the result in the second row comes out empty because the function deleted everything, copy the value from the first row instead? No idea.
Try:
=MID("_"&A1,FIND("#",SUBSTITUTE("_"&A1,"_","#",LEN("_"&A1)-LEN(SUBSTITUTE("_"&A1,"_",""))-1))+1,100)
Regardless of how many times "_" appears in your string, this returns the last two "words" of the string.
Use the following formula, which returns the text after the second "_":
=TRIM(MID(A1,SEARCH("#",SUBSTITUTE(A1,"_","#",2))+1,100))
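To make the difference between the two formulas easier to see, here is a rough Python sketch of what each one computes. The function names are invented for illustration, and returning the text unchanged when there are fewer than two underscores is an assumption of this sketch, not something the second formula does:

```python
def last_two_words(text):
    """Analogue of the first formula: keep the last two "_"-separated words."""
    return "_".join(text.split("_")[-2:])


def after_second_underscore(text):
    """Analogue of the second formula: keep everything after the second "_".

    Assumption: if there are fewer than two underscores, return the text as is.
    """
    parts = text.split("_", 2)
    return parts[2] if len(parts) == 3 else text


for cell in ["75875_QUWR", "LALAHF_FHJ_75378_WZ44"]:
    print(cell, "->", last_two_words(cell), "|", after_second_underscore(cell))
```

Both approaches give 75378_WZ44 for the second sample cell and leave 75875_QUWR untouched; they would differ for a string with four or more words.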

Excel: select the last number or numbers from a cell

Given the following examples,
16A6
ECCB15
I would only like to extract the last number or numbers from the string value. So the end result that I'm looking for is:
6
15
I've been trying to find a way, but can't seem to find the correct one.
Use this formula:
=MID(A1,AGGREGATE(14,7,ROW($Z$1:INDEX($ZZ:$ZZ,LEN(A1)))/(NOT(ISNUMBER(--MID(A1,ROW($Z$1:INDEX($ZZ:$ZZ,LEN(A1))),1)))),1)+1,LEN(A1))
Try this:
=--RIGHT(A2,SUMPRODUCT(--ISNUMBER(--RIGHT(SUBSTITUTE(A2,"E",";"),ROW(INDIRECT("1:"&LEN(A2)))))))
or this (avoid using INDIRECT):
=--RIGHT(A2,SUMPRODUCT(--ISNUMBER(--RIGHT(SUBSTITUTE(A2,"E",";"),ROW($A$1:INDEX($A:$A,LEN(A2)))))))
Replace A2 in the above formula to suit your case.
Here are the data for testing:
| String |
|-----------|
| 16A6 |
| ECCB15 |
| BATT5A6 |
| 16 |
| A1B2C3E0 |
| 16E |
| TEST00004 |
I have an even shorter version: --RIGHT(A2,SUMPRODUCT(--ISNUMBER(--RIGHT(SUBSTITUTE(A2,"E",";"),ROW(INDIRECT("1:"&LEN(A2)))))))
The difference is the use of SUBSTITUTE in my final formula. I used SUBSTITUTE to replace the letter E with a symbol because, for the fifth string in the list above, the RIGHT function in my formula returns the following: {"0";"E0";"3E0";"C3E0";"2C3E0";"B2C3E0";"1B2C3E0";"A1B2C3E0"}. The third string, 3E0, is read as a number in scientific notation, so ISNUMBER returns TRUE for it, and this results in an incorrect answer. Therefore I need to get rid of the letter E first.
Let me know if you have any questions. Cheers :)
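For comparison, the same "trailing digits" idea can be sketched in Python with a regular expression. This is not a translation of the formulas above; the function name trailing_digits and the choice to return an empty string when the value does not end in a digit are assumptions of this sketch. It also returns text, so leading zeros are kept, unlike the formulas that start with "--":

```python
import re


def trailing_digits(text):
    """Return the run of digits at the very end of text, or "" if there is none."""
    match = re.search(r"[0-9]+$", text)
    return match.group(0) if match else ""


samples = ["16A6", "ECCB15", "BATT5A6", "16", "A1B2C3E0", "16E", "TEST00004"]
for value in samples:
    print(value, "->", repr(trailing_digits(value)))

# The scientific-notation pitfall mentioned above exists in Python too:
# float() accepts "3E0" as a number.
print(float("3E0"))
```

Matching digits explicitly avoids the "E" problem altogether, because nothing is ever converted to a number while searching.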

Exact frequency of a specific word in a single cell (excluding suffix and prefix)

I earlier worked out a good solution for this with the help of the community. It works really well, but I found out that it only handles words with a suffix correctly (it doesn't ignore words with a prefix).
Formula:
=IF(B1<>"";(LEN(A1)-LEN(SUBSTITUTE(A1;B1&" ";"")))/(LEN(B1)+1)+IF(RIGHT(A1;LEN(B1))=B1;1;0);"")
A contains sentences, multiple words (without punctuation)
B contains the word I want to count the exact frequency of.
C is where the formula is placed and where I get the result.
Sample table:
| A | B | C |
|:-------------------------:|:----:|:--------:|
| boots | shoe | 0 |
| shoe | shoe | 1 |
| shoes | shoe | 0 |
| ladyshoe dogshoe catshoe | shoe | 3 |
In column C I am getting the correct output in rows 1, 2 and 3, but not in row 4. I want C4 to return 0, not 3.
The problem is that the formula does not match shoexxxxxxxxxxx (correct) but does match xxxxxxxxxxxshoe (wrong).
I only want the formula to count exact matches of shoe; no other word should be counted.
You want this formula:
=IF(B1<>"",(LEN(A1)-LEN(SUBSTITUTE(A1," "&B1&" ","")))/(LEN(B1)+2),"")+IF(A1=B1,1,0)+IF(LEFT(A1,LEN(B1)+1)=B1&" ",1,0)+IF(RIGHT(A1,LEN(B1)+1)=" "&B1,1,0)
I'll denote a space by * to make the following clearer:
There are four cases to consider:
string; the word has no spaces on either side (and is therefore the only word in cell A1).
string*; the word appears at the start of a list of words.
*string; the word appears at the end of a list of words.
*string*; the word is in the middle of a list of words.
First we count the number of occurrences of *string* by replacing "*string*" with "", subtracting the length of the new string from the length of the old one, and dividing by len(string)+2 (which is the length of *string*).
Then we add one more to our count if A1 is exactly string, with no spaces either side.
Then we add one more if A1 starts with string*, and one more if A1 ends with *string.
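The goal of the four cases above, counting only whole words that are exactly equal to the search word, can be sketched in Python by splitting on whitespace. This illustrates the intended result rather than translating the Excel formula, and the function name count_exact_word is made up for this sketch:

```python
def count_exact_word(sentence, word):
    """Count how many whitespace-separated words in sentence equal word exactly."""
    if not word:
        # Mirrors the B1<>"" guard: nothing to count for an empty search word.
        return 0
    return sentence.split().count(word)


rows = [
    ("boots", "shoe"),
    ("shoe", "shoe"),
    ("shoes", "shoe"),
    ("ladyshoe dogshoe catshoe", "shoe"),
]
for sentence, word in rows:
    print(sentence, "|", word, "|", count_exact_word(sentence, word))
```

Because each word is compared as a whole, neither a prefix (ladyshoe) nor a suffix (shoes) is counted.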
