Using Presto's Coalesce function with a row on AWS Athena - presto

I am using AWS Web Application Firewall (WAF). The logs are written to an S3 Bucket and queried by AWS Athena.
Some log fields are not simple data types but complex JSON types. For example "rulegrouplist". It contains a JSON array of complex types and the array might have 0 or more elements.
So I am using Presto's try() function to convert errors to NULLs and the coalesce() function to put a dash in their place. (Keeping NULL values causes problems when using GROUP BY.)
try() is working fine but coalesce() is causing a type mismatch problem.
The function call below:
coalesce(try(waf.rulegrouplist[1].terminatingrule),'-')
causes this error:
All COALESCE operands must be the same type: row(ruleid varchar,action varchar,rulematchdetails varchar)
How can I convert "-" to a row or what else can I use that will count as a row?

Apparently you can create a row and cast it to a typed row.
This worked...
coalesce(try(waf.rulegrouplist[1].terminatingrule),CAST(row('null','null','null') as row(ruleid varchar,action varchar,rulematchdetails varchar)))
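If you want the dash from the original question rather than the string 'null', the same CAST pattern should accept '-' in each field (a sketch, not verified against Athena):

```sql
coalesce(
  try(waf.rulegrouplist[1].terminatingrule),
  CAST(row('-', '-', '-') AS row(ruleid varchar, action varchar, rulematchdetails varchar))
)
```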

Related

I have to find the file with maximum size in Azure Data Factory

I created an array variable and tried to pass it to the max math function in ADF, but I'm getting an error. How can I use the max function there?
Array is one of the datatypes supported in ADF for both parameters and variables, so if you have a valid array then max will work. A simple example:
create a valid parameter of the Array datatype:
Create a variable and add a Set Variable activity. Use this expression as the Value argument:
@string(max(pipeline().parameters.pInputArray))
NB: I'm using the max function directly on the array and then string, as only the String, Array and Boolean datatypes are supported for variables (at this time).
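Wired into a pipeline, the Set Variable activity from the steps above might look roughly like this in pipeline JSON (a sketch; vMaxValue and pInputArray are placeholder names):

```json
{
  "name": "Set max value",
  "type": "SetVariable",
  "typeProperties": {
    "variableName": "vMaxValue",
    "value": {
      "value": "@string(max(pipeline().parameters.pInputArray))",
      "type": "Expression"
    }
  }
}
```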

(Oracle) LIKE Clause not Working with Numeric Bind Variables

I'm having problems getting Oracle's LIKE clause working with numeric bind variables.
In the examples below the TABLEID column is numeric.
In SQLDeveloper I can write SELECT * FROM TABLEX WHERE TABLEID LIKE ('201%'); which works fine. However, when I try the same query in code using a bind variable: SELECT * FROM TABLEX WHERE TABLEID LIKE :bindVar
I get an ORA-01722: invalid number error.
I've tried surrounding the bind var with () and have tried adding the % symbol to the end of the bind value with no luck.
I'm using NodeJS to make the database calls.
Any ideas as to what I might be doing wrong here?
Datatypes should match.
If the TABLEID column's datatype is NUMBER and you're comparing it to a string (in LIKE '201%', '201%' is a string), then apply the TO_CHAR function to that column:
select * from tablex
where to_char(tableid) like :bindVar
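The same datatype fix can be illustrated with Python's stdlib sqlite3 module, where CAST(... AS TEXT) plays the role of Oracle's TO_CHAR (table name and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablex (tableid INTEGER)")
conn.executemany("INSERT INTO tablex VALUES (?)", [(20101,), (20102,), (30001,)])

# Cast the numeric column to text so LIKE compares string to string,
# mirroring to_char(tableid) like :bindVar in Oracle
rows = conn.execute(
    "SELECT tableid FROM tablex WHERE CAST(tableid AS TEXT) LIKE ?",
    ("201%",),
).fetchall()
print(rows)  # [(20101,), (20102,)]
```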
So as it turns out the problem ended up being not with my code but the data within the table itself. Sorry to waste yall's time!

Multiple parameter in IN clause of Spark SQL from parameter file

I am trying to run a Spark query that creates a curated table from a source table, based on values in a parameter file.
properties_file.properties contains below key values:
substatus,allow,deny
SparkQuery is
//Code to load property file in parseConf
spark.sql(s"""insert into curated.table select * from source.table where
substatus='${parseConf.substatus}'""")
The above works with a single value of substatus. But what should I do if ${parseConf.substatus} needs to carry multiple values from the param file, as below?
spark.sql(s"""insert into curated.table select * from source.table where substatus in ('${parseConf.substatus}')""")
To resolve my problem, I updated my property file as:
substatus,'allow'-'deny'
Then in scala code, I implemented below logic:
val subStatus = parseConf.substatus.replace('-', ',')
spark.sql(s"""insert into curated.table select * from source.table where substatus in (${subStatus})""")
This strategy breaks the single string into multiple parameters for the IN clause.
The equals operator (=) expects a single value, and reading the value straight from the parameter file passes it as one string. You need to split the values and use an IN clause in place of equals (=).
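The replace-then-interpolate step can be sketched in plain Python (the property value is hardcoded here for illustration):

```python
# Value read from properties_file.properties for the substatus key
substatus = "'allow'-'deny'"

# Swap the '-' separators for commas to form a valid SQL IN list
in_list = substatus.replace("-", ",")

query = (
    "insert into curated.table select * from source.table "
    f"where substatus in ({in_list})"
)
print(query)
# insert into curated.table select * from source.table where substatus in ('allow','deny')
```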

pySpark dataframe filter method

I use Databricks runtime 6.3 and use pySpark. I have a dataframe df_1. SalesVolume is an integer but AveragePrice is a string.
When I execute below code, code runs and I get the correct output.
display(df_1.filter('SalesVolume>10000 and AveragePrice>70000'))
But the code below ends in an error: "py4j.Py4JException: Method and([class java.lang.Integer]) does not exist"
display(df_1.filter(df_1['SalesVolume']>10000 & df_1['AveragePrice']>7000))
Why does the first one work but not the second one?
You have to wrap your conditions in parentheses:
display(df_1.filter((df_1['SalesVolume']>10000) & (df_1['AveragePrice']>7000)))
filter() accepts either SQL-like string syntax or DataFrame column syntax. The first works because it is a valid SQL-like expression; the second fails because, without parentheses, Python's operator precedence groups the expression incorrectly.
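Concretely, Python's & binds more tightly than >, so without parentheses the expression groups as df_1['SalesVolume'] > (10000 & df_1['AveragePrice']) > 7000, which is why py4j ends up trying to call and() with a plain Integer. Plain integers show the same grouping (made-up values):

```python
a, b = 20000, 70001  # stand-ins for SalesVolume and AveragePrice

# Without parentheses, & is evaluated before >, giving a > (10000 & b) > 7000
print(a > 10000 & b > 7000)      # False: (10000 & 70001) is 272, and 272 > 7000 fails
# With parentheses, each comparison happens first, then & combines the booleans
print((a > 10000) & (b > 7000))  # True
```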

How do I make a WHERE clause with SQLalchemy to compare to a string?

Objective
All I am trying to do is retrieve a single record from a specific table where the primary key matches. I have a feeling I'm greatly overcomplicating this, as it seems like a simple enough task. My theory is that it may not be using the variable's value from the Python code, but is instead trying to find a variable of the same name in the database.
EDIT: Is it possible that I need to wrap my where clause in an expression statement?
Attempted
My Python code is
def get_single_record(name_to_search):
    my_engine = super_secret_inhouse_engine_constructor("sample_data.csv")
    print("Searching for " + name_to_search)
    statement = my_engine.tables["Users"].select().where(my_engine.tables["Users"].c.Name == name_to_search)
    # Print out the raw SQL so we can see what exactly it's checking for
    print("You are about to run: " + str(statement))
    # Print out each result (should only be one)
    print("Results:")
    for item in my_engine.execute(statement):
        print(item)
I tried hard coding a string in its place.
I tried using like instead of where.
All to the same end result.
Expected
I expect it to generate something along the lines of SELECT * FROM MyTable WHERE Name='Todd'.
Actual Result
Searching for Todd
STATEMENT: SELECT "Users"."Name", ...
FROM "Users"
WHERE "Users"."Name" = ?
That is an actual question mark appearing in my statement, not simply my own confusion. This is then followed by it printing out a collection of all the records from the table, as though it had successfully matched everything.
EDIT 2: Running either my own hard coded SQL string or the generated query by Alchemy returns every record from the table. I'm beginning to think the issue may be with the engine I've set up not accepting the query.
Why I'm Confused
According to the official documentation and third party sources, I should be able to compare to hardcoded strings and then, by proxy, be able to compare to a variable.
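One thing worth clearing up: the ? in the printed statement is not an error. It is the DB-API's positional bind-parameter placeholder, and the driver substitutes name_to_search when the statement executes. A minimal sketch of the same mechanism using Python's stdlib sqlite3 module (table and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Users" ("Name" TEXT)')
conn.executemany('INSERT INTO "Users" VALUES (?)', [("Todd",), ("Alice",)])

name_to_search = "Todd"
# The ? is a placeholder; the value is bound safely at execution time
rows = conn.execute(
    'SELECT * FROM "Users" WHERE "Name" = ?',
    (name_to_search,),
).fetchall()
print(rows)  # [('Todd',)]
```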
