How to write the rule: “some characters but except” in ANTLR4?

ID: (['_'a-zA-Z])(['_'a-zA-Z0-9])*;
INT_LIT: 'INT';
FLOAT: 'FLOAT';
I want ID never to match 'INT' or 'FLOAT'.
What should I do? Thanks.

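Just move the INT_LIT and FLOAT rules above ID. When two lexer rules match the same text (that is, a match of the same length), the rule defined first wins, so "INT" will always be INT_LIT rather than ID with this setup. A longer match still takes priority, so input such as INTEGER is still tokenized as a single ID.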
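The tie-break described above can be imitated outside ANTLR. The following is only a rough Python sketch of the "longest match, then first rule" behaviour, not ANTLR's actual implementation. The tokenize helper and the rule tables are hypothetical names, and the ID pattern is simplified to [_a-zA-Z][_a-zA-Z0-9]* for illustration.

```python
import re

def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between tokens
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            # Strictly greater: a later rule only wins with a longer match.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Keyword rules listed before ID, as the answer suggests.
KEYWORDS_FIRST = [
    ("INT_LIT", r"INT"),
    ("FLOAT", r"FLOAT"),
    ("ID", r"[_a-zA-Z][_a-zA-Z0-9]*"),
]
# Same rules, but with ID listed first.
ID_FIRST = [KEYWORDS_FIRST[2], KEYWORDS_FIRST[0], KEYWORDS_FIRST[1]]

print(tokenize("INT FLOAT INTEGER x1", KEYWORDS_FIRST))
print(tokenize("INT", ID_FIRST))
```

With the keyword rules first, INT and FLOAT come out as keyword tokens while INTEGER stays an ID because its match is longer. With ID first, the same input INT is reported as an ID.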

Related

PySpark - data mismatch error when trying to split a column content

I'm trying to use PySpark's split() method on a column that has data formatted like:
[6b87587f-54d4-11eb-95a7-8cdcd41d1310, 603, landing-content, landing-content-provider]
My intent is to extract the 4th element, the one after the last comma.
I'm using a syntax like:
mydf.select("primary_component").withColumn("primary_component_01",f.split(mydf.primary_component, "\,").getItem(0)).limit(10).show(truncate=False)
But I'm consistently getting this error:
"cannot resolve 'split(mydf.primary_component, ',')' due to data
type mismatch: argument 1 requires string type, however,
'mydf.primary_component' is of
struct<uuid:string,id:int,project:string,component:string>
type.;;\n'Project [primary_component#17,
split(split(primary_component#17, ,)[1], \,)...
I've also tried escaping the "," with \, or \\, or not escaping it at all, and this doesn't make any difference. Removing the ".getItem(0)" makes no difference either.
What am I doing wrong? I feel silly, but I don't know how to fix this...
Thank you for any suggestions

You are getting the error:
"cannot resolve 'split(mydf.`primary_component`, ',')' due to data
type mismatch: argument 1 requires string type, however,
'mydf.`primary_component`' is of
struct<uuid:string,id:int,project:string,component:string>"
because your column primary_component has a struct type, whereas split expects a string column.
Since primary_component is already a struct and you are interested in the value after the last comma, you can select that field directly using dot notation:
mydf.withColumn("primary_component_01", f.col("primary_component.component"))
In the error message, Spark has shared the schema for your struct as
struct<uuid:string,id:int,project:string,component:string>
i.e.
column     data type
uuid       string
id         int
project    string
component  string
For future debugging purposes, you may use mydf.printSchema() to show the schema of the spark dataframe in use.

invalid input syntax for type numeric: " "

I'm getting this message in Redshift: invalid input syntax for type numeric: " " , even after trying to implement the advice found in SO.
I am trying to convert text to number.
In my inner join, I try to make sure that the text being processed is first converted to null when there is an empty string, like so:
nullif(trim(atl.original_pricev::text),'') as original_price
... I noticed from a related post on coalesce that you have to convert the value to text before you can try and nullif it.
Then in the outer join, I test to see that there's a limited set of acceptable characters and if this test is met I try to do the to_number conversion:
,case
when regexp_instr(trim(atl.original_price),'[^0-9.$,]')=0
then to_number(atl.original_price,'FM999999999D00')
else null
end as original_price2
At this point I get the above error, and unfortunately I can't see the details in DataGrip to find the offending value.
So my questions are:
1. I notice that there is a space in my error message: invalid input syntax for type numeric: " ". Does this error have exactly the same meaning as invalid input syntax for type numeric: '', which is what I see in similar posts?
2. And of course: what am I doing wrong?
Thanks!

It's hard to know for sure without some data and the complete code to try and reproduce the example, but as some have mentioned in the comments the most likely cause is the to_number() function you are using.
In the earlier code fragment you convert original_price to text (string) and then use nullif() to turn an empty string ('') into NULL. Any value that still reaches the to_number() function as an empty string will give you the error described.
Without the full SQL statement it's not clear why you're putting the nullif() function around original_price in the "inner join", or whether the CASE statement is really in an outer join clause or is one of the columns returned by the query. However, you could perhaps wrap the nullif() in coalesce() to substitute a value that can be converted to a number, e.g. '0.00', instead of NULL.

Sorry I couldn't share real data. I spent the weekend testing small sets to try and trap the error. I found that the error was caused by the input string having no numbers, which is permitted by my regex filter:
when regexp_instr(trim(atl.original_price),'[^0-9.$,]') .
I wrongly expected that a non-numeric string like "$" would evaluate to NULL and that the to_number function would then return NULL. But from experimenting, it seems that it needs at least one digit somewhere in the string. Otherwise it reduces the string argument to an empty string before applying the to_number format, and then chokes.
For example select to_number(trim('$1'::text),'FM999999999999D00') will evaluate to 1 but select to_number(trim('$A'::text),'FM999999999999D00') will throw the empty string error.
My fix was to add an additional regex to my initial filter:
and regexp_instr(atl.original_price2,'[0-9]')>0 .
This ensures that at least one number will be in the string and after that the empty string error went away.
Hope my learning experience helps someone else.
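The two-part filter described above (no disallowed characters, and at least one digit) can be sketched outside SQL as well. This is a Python stand-in for illustration only, not Redshift code. The parse_price function is a hypothetical name, and stripping "$" and "," before converting is an assumption about what the to_number format is meant to achieve.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

DISALLOWED = re.compile(r"[^0-9.$,]")  # same character class as the SQL filter
HAS_DIGIT = re.compile(r"[0-9]")       # the additional "at least one digit" check

def parse_price(raw: Optional[str]) -> Optional[Decimal]:
    """Return a Decimal for price-like text, or None if it cannot be a number."""
    if raw is None:
        return None
    text = raw.strip()
    # Reject anything with unexpected characters, and anything without a digit
    # (this covers "", " ", "$" and "," which would otherwise slip through).
    if DISALLOWED.search(text) or not HAS_DIGIT.search(text):
        return None
    cleaned = text.replace("$", "").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # e.g. "1.2.3" passes both regex checks but is still not a number
        return None

for sample in ["$1", "1,234.50", "$", " ", "$A"]:
    print(repr(sample), parse_price(sample))
```

The last branch shows that even both regex checks together do not guarantee a convertible value, so a final guarded conversion is still useful.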

GraphQL Schema Definition Error

I am trying to define GraphQL schema like this:
type Obj {
id: Int
0_100: Int
}
But it gives following exception.
'GraphQLError: Syntax Error: Expected Name, found Int "0"',
How can I define a field whose name starts with a digit, or with a - or + sign?

This is the regexp for names in GraphQL: /[_A-Za-z][_0-9A-Za-z]*/. Anything that does not match is not allowed.
Reference:
http://facebook.github.io/graphql/June2018/#sec-Names
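To check a candidate name against that pattern before putting it in a schema, a small Python sketch like the following could be used. The is_valid_graphql_name helper is a hypothetical name; the regular expression is the one quoted above, applied to the whole string.

```python
import re

# Name pattern quoted above, applied to the entire candidate via fullmatch.
NAME_RE = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")

def is_valid_graphql_name(name: str) -> bool:
    """True if the whole string is a valid GraphQL name under that pattern."""
    return NAME_RE.fullmatch(name) is not None

for candidate in ["id", "0_100", "_0_100", "-total", "range_0_100"]:
    print(candidate, is_valid_graphql_name(candidate))
```

Names such as 0_100 and -total fail the check, while _0_100 and range_0_100 pass.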

Names that start with a digit do not work in GraphQL.
You can probably prefix the name with an underscore, as in _0_100, but that is fairly unusual and I'd recommend against it. Consider using words to name your fields instead.

The lexer chooses the wrong Token

Hi, I am new to ANTLR and have a problem that I have not been able to solve over the last few days:
I want to write a grammar that recognizes this text (in reality I want to parse something different, but for this question I have simplified it):
100abc
150100
200def
Here each row starts with 3 digits that identify the type of the line (header, content, trailer), followed by 3 characters that are the payload of the line.
I thought I could recognize this with this grammar:
grammar Types;
file : header content trailer;
A : [a-z|A-Z|0-9];
NL: '\n';
header : '100' A A A NL;
content: '150' A A A NL;
trailer: '200' A A A NL;
But this does not work. When the lexer reads the "100" in the second line ("150100"), it reads it as a single token with the value 100 and not as three tokens of type A. So the parser sees a '100' token where it expects an A token.
I am pretty sure this happens because the lexer wants to match the longest possible text for one token, so it clusters the '1', '0', '0' together. I found no way to solve this. Putting rule A above the parser rule that contains the string literal '100' did not work. Factoring the '100' out into its own lexer rule, as follows, did not work either.
grammar Types;
file : header content trailer;
A : [a-z|A-Z|0-9];
NL: '\n';
HUNDRED: '100';
header : HUNDRED A A A NL;
content: '150' A A A NL;
trailer: '200' A A A NL;
I also read some other posts like this:
antlr4 mixed fragments in tokens
Lexer, overlapping rule, but want the shorter match
But I don't think they solve my problem, or at least I don't see how they could help me.

One of your token definitions is incorrect: A : [a-z|A-Z|0-9]; Don't use a vertical bar inside a [] character set. A correct definition is: A : [a-zA-Z0-9];. ANTLR versions >= 4.6 will warn about the duplicated | character inside the set.
As I understand it, you have mixed up the concepts of tokens and parser rules. Token (lexer rule) names start with an uppercase letter, whereas parser rule names start with a lowercase letter. Your header, content and trailer should be tokens, not parser rules.
So the final version of the correct grammar, in my opinion, is
grammar Types;
file : Header Content Trailer;
A : [a-zA-Z0-9];
NL: '\r' '\n'? | '\n' | EOF; // Or leave only one type of newline.
Header : '100' A A A NL;
Content: '150' A A A NL;
Trailer: '200' A A A NL;
Your input text will be parsed to (file 100abc\n 150100\n 200def)
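Why the original grammar fails and the token-based one works can be seen by imitating longest-match tokenization in Python. This is a rough simulation under simplifying assumptions, not ANTLR itself: the tokenize helper and the ORIGINAL and FIXED rule tables are hypothetical, the string literals from the parser rules are modelled as ordinary token rules, and the EOF alternative of NL is approximated with \Z.

```python
import re

def tokenize(text, rules):
    """Return token names, always taking the longest match at each position."""
    names, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        names.append(best_name)
        pos += best_len
    return names

# Question's grammar: '100', '150', '200' are literal tokens next to A and NL.
ORIGINAL = [("'100'", r"100"), ("'150'", r"150"), ("'200'", r"200"),
            ("A", r"[a-zA-Z0-9]"), ("NL", r"\n")]

# Answer's grammar: each whole line is a single token.
LINE_END = r"(?:\r\n?|\n|\Z)"
FIXED = [("A", r"[a-zA-Z0-9]"), ("NL", r"\r\n?|\n"),
         ("Header", r"100[a-zA-Z0-9]{3}" + LINE_END),
         ("Content", r"150[a-zA-Z0-9]{3}" + LINE_END),
         ("Trailer", r"200[a-zA-Z0-9]{3}" + LINE_END)]

text = "100abc\n150100\n200def\n"
print(tokenize(text, ORIGINAL))
print(tokenize(text, FIXED))
```

In the first run the payload "100" of the second line comes out as one '100' token instead of three A tokens, which is what the parser rule for content cannot accept. In the second run each line is a single Header, Content or Trailer token, so the ambiguity never arises.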

string mutation

I am trying to create a string mutation that, after a prompt for a city and state, would output the state in uppercase, followed directly by the city in lowercase, followed directly by the state again in uppercase.
I have tried many types of mutations but nothing is working.
Can anyone help me?

Use the String#toUpperCase() and String#toLowerCase() methods,
e.g. System.out.println(state.toUpperCase());

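Here is one in Java:
import java.util.Scanner;

public class StringMutation {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter City =");
        String city = sc.nextLine();
        System.out.println("Enter State =");
        String state = sc.nextLine();
        // State in uppercase, city in lowercase, state again, with no separators.
        System.out.println(state.toUpperCase() + city.toLowerCase() + state.toUpperCase());
    }
}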
Independent of any programming language (but dependent on the character representation), you could also walk over every character of the string and add 32 to convert an uppercase letter to lowercase, or subtract 32 to go the other way. In ASCII, a-z are 97-122 and A-Z are 65-90, so the two cases differ by exactly 32.
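The ASCII offset approach can be sketched as follows, here in Python purely for illustration. The helper names are hypothetical, only the ASCII letters A-Z and a-z are handled, and any other character is passed through unchanged.

```python
ASCII_CASE_OFFSET = ord("a") - ord("A")  # 97 - 65 = 32

def to_lower_ascii(text: str) -> str:
    """Add the offset to every ASCII uppercase letter."""
    return "".join(
        chr(ord(ch) + ASCII_CASE_OFFSET) if "A" <= ch <= "Z" else ch
        for ch in text
    )

def to_upper_ascii(text: str) -> str:
    """Subtract the offset from every ASCII lowercase letter."""
    return "".join(
        chr(ord(ch) - ASCII_CASE_OFFSET) if "a" <= ch <= "z" else ch
        for ch in text
    )

def mutate(city: str, state: str) -> str:
    """State in uppercase, city in lowercase, state in uppercase, no separators."""
    return to_upper_ascii(state) + to_lower_ascii(city) + to_upper_ascii(state)

print(mutate("Austin", "tx"))
```

In practice the built-in case conversion methods are the safer choice, since they also handle characters outside ASCII.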
