Dynamic Query in Spanner - google-cloud-spanner

I am implementing a search screen with several optional parameters. In Oracle I used to add a true or 1=1 condition for each optional parameter, but that approach does not seem to be supported in Spanner.
How can we achieve the same in Spanner SQL?
Sample query:
select mark.*
from abc mark
join xyz mchhier on mark.X = mchhier.X
where
--Mandatory
mark.X = 123 and
--Below params are optional
mchhier.G in (null) and mchhier.C in (null) and mchhier.D in (null)

Cloud Spanner does support conditions like WHERE TRUE and WHERE 1=1, so it should be possible to use the same strategy as in Oracle.
The following, for example, is a valid Spanner query:
select mark.*
from abc mark
join xyz mchhier on mark.X=mchhier.X
where
--Mandatory
mark.X=123
--#SomeParam could be NULL
AND CASE
-- If param is null, the condition will always be true
WHEN #SomeParam IS NULL THEN TRUE
ELSE mchhier.G in (#SomeParam)
END
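The null-means-skip-the-filter pattern in the CASE expression above is not Spanner-specific and can be exercised with any SQL engine that supports named parameters. A minimal sketch using Python's built-in sqlite3 (table name taken from the question, the data is invented for illustration):

```python
import sqlite3

# Toy stand-in for the question's mchhier table, with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mchhier (x INTEGER, g TEXT)")
conn.executemany("INSERT INTO mchhier VALUES (?, ?)",
                 [(123, "a"), (123, "b"), (456, "a")])

def search(x, g=None):
    # When :g is NULL the CASE collapses to TRUE and the optional filter
    # is skipped, mirroring the Spanner query above.
    sql = """
        SELECT x, g FROM mchhier
        WHERE x = :x
          AND CASE WHEN :g IS NULL THEN 1 ELSE g = :g END
        ORDER BY g
    """
    return conn.execute(sql, {"x": x, "g": g}).fetchall()

print(search(123))       # optional param omitted -> [(123, 'a'), (123, 'b')]
print(search(123, "a"))  # optional param bound   -> [(123, 'a')]
```

The same shape works with Spanner's @param-style query parameters: bind every optional parameter on every execution and let the CASE (or an equivalent `@p IS NULL OR col = @p` disjunction) decide whether it applies.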

Related

Spark partition filter is skipped when table is used in where condition, why?

Maybe someone has observed this behavior and knows why Spark takes this route.
I wanted to read only a few partitions from a partitioned table.
SELECT *
FROM my_table
WHERE snapshot_date IN('2023-01-06', '2023-01-07')
results in (part of) the physical plan:
-- Location: PreparedDeltaFileIndex [dbfs:/...]
-- PartitionFilters: [cast(snapshot_date#282634 as string) IN (2023-01-06,2023-01-07)]
This is very fast (~1 s); in the execution plan I can see that the provided dates are used as arguments for the partition filters.
If I instead provide the filter predicate as a one-column table, Spark does a full table scan, which takes 100x longer:
SELECT *
FROM
my_table
WHERE snapshot_date IN (
SELECT snapshot_date
FROM (VALUES('2023-01-06'), ('2023-01-07')) T(snapshot_date)
)
-- plan
Location: PreparedDeltaFileIndex [dbfs:/...]
PartitionFilters: []
ReadSchema: ...
I was unable to find any query hint that would force Spark to push down this predicate.
One could easily write a for loop in Python, wrap the table-read logic, and read the desired dates one by one, but I'm not sure the same is possible in SQL.
Is there any option/switch I have missed?
I don't think pushing down this kind of predicate is something Spark's HiveMetaStore client supports today.
So in the first case, the HiveShim.convertFilters(...) method will transform
WHERE snapshot_date IN ('2023-01-06', '2023-01-07')
into a filtering predicate understood by HMS as
snapshot_date="2023-01-06" or snapshot_date="2023-01-07"
but in the second, sub-select case, the condition is skipped altogether.
/**
* Converts catalyst expression to the format that Hive's getPartitionsByFilter() expects, i.e.
* a string that represents partition predicates like "str_key=\"value\" and int_key=1 ...".
*
* Unsupported predicates are skipped.
*/
def convertFilters(table: Table, filters: Seq[Expression]): String = {
  lazy val dateFormatter = DateFormatter()
  // ...
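Until sub-select pushdown is supported, one workaround is to materialize the date list in the driver program and interpolate it into the query as literals, so that convertFilters sees a plain IN list it can translate. A minimal sketch (the helper name and the validation step are my own; validating each value as a real date keeps the string interpolation safe):

```python
from datetime import date

def in_clause(column, dates):
    # Accept only parseable ISO dates before interpolating them as SQL
    # literals, then build the IN (...) predicate that HiveShim can
    # convert into an HMS partition filter.
    literals = ", ".join(f"'{date.fromisoformat(d).isoformat()}'" for d in dates)
    return f"{column} IN ({literals})"

predicate = in_clause("snapshot_date", ["2023-01-06", "2023-01-07"])
print(predicate)  # snapshot_date IN ('2023-01-06', '2023-01-07')

# In PySpark the predicate would then be spliced into the query, e.g.:
# spark.sql(f"SELECT * FROM my_table WHERE {predicate}")
```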

Snowflake interprets boolean values in parquet as NULL?

Parquet Entry Example (All entries have is_active_entity as true)
{
"is_active_entity": true,
"is_removed": false
}
Query that demonstrates all values are taken as NULL
select $1:IS_ACTIVE_ENTITY::boolean, count(*) from #practitioner_delta_stage/part-00000-49224c02-150b-493b-8036-54ab30a8ff40-c000.snappy.parquet group by $1:IS_ACTIVE_ENTITY::boolean ;
Output has only one group for NULL
$1:IS_ACTIVE_ENTITY::BOOLEAN COUNT(*)
NULL 4930277
I don't know where I am going wrong: Spark writes the correct schema to the parquet file, as the example shows, but Snowflake reads the values as NULL.
How do I fix this?
The column names in your file are quoted. As a consequence, "is_active_entity" is not the same as "IS_ACTIVE_ENTITY".
Please try this query:
select $1:is_active_entity::boolean, count(*) from #practitioner_delta_stage/part-00000-49224c02-150b-493b-8036-54ab30a8ff40-c000.snappy.parquet group by $1:is_active_entity::boolean ;
More info: https://docs.snowflake.com/en/sql-reference/identifiers-syntax.html#:~:text=The%20identifier%20is%20case%2Dsensitive.

How to query CosmosDB for nested object value

How can I retrieve objects which match order_id = 9234029m, given this document in CosmosDB:
{
"order": {
"order_id": "9234029m",
"order_name": "name"
}
}
I have tried to query in CosmosDB Data Explorer, but it's not possible to simply query the nested order_id object like this:
SELECT * FROM c WHERE c.order.order_id = "9234029m"
(Err: "Syntax error, incorrect syntax near 'order'")
This seems like it should be so simple, yet it's not! (In CosmosDB Data Explorer, all queries need to start with SELECT * FROM c, but REST SQL is an alternative as well.)
As you discovered, order is a reserved keyword, which was tripping up the query parsing. However, you can get past that, and still query your data, with slightly different syntax (bracket notation):
SELECT *
FROM c
WHERE c["order"].order_id = "9234029m"
Again, this is because order is a reserved keyword in Cosmos DB's SQL grammar; bracket notation lets you reference the property without tripping the parser.
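If you build such queries programmatically, you can fall back to bracket notation whenever a path segment collides with a reserved word. A toy helper illustrating the idea (the reserved-word set below is deliberately incomplete; Cosmos DB's actual list is much longer):

```python
# A few of Cosmos DB SQL's reserved keywords, for illustration only.
RESERVED = {"order", "group", "by", "select", "from", "where", "value"}

def json_path(*segments):
    # Use bracket notation for reserved words, dot notation otherwise.
    out = "c"
    for s in segments:
        out += f'["{s}"]' if s.lower() in RESERVED else f".{s}"
    return out

print(json_path("order", "order_id"))   # c["order"].order_id
print(json_path("customer", "name"))    # c.customer.name
```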

Presto map<varchar,set<varchar>>: How to query a field in Presto which is of type map<varchar,set<varchar>>

I am trying to search a column having the data type map<varchar,set<varchar>>, but I keep getting Query failed (#20190809_163618_00200_yyc4a) in your-presto: null.
Any help is appreciated.
The query below works perfectly:
SELECT event_type_id FROM cassandra.data_integration_hub.my_table WHERE event_type_id = 123 limit 5
When I add the business_keys field, the query fails:
SELECT event_type_id,business_keys FROM cassandra.data_integration_hub.my_table WHERE event_type_id = 123 limit 5
The business_keys is of type
Type: map<varchar,set<varchar>>
Sample Value:
{
"rule_id" : [ "12345" ]
}
Query failed (#20190809_163618_00200_yyc4a) in your-presto: null -- this is the (odd) way the Cassandra connector reports "column type not supported".
We improved error reporting since then and extended support for certain types in the Cassandra connector. Please try Presto 317 or Starburst Presto 312e.

How to configure Presto searches to be case-insensitive?

In my case, Presto connects to a MySQL database which has been configured to be case-insensitive. But any search through Presto seems to be case-sensitive.
Questions:
1) Is there a way to configure Presto searches to be case-insensitive? If not, can something be changed in the Presto-MySQL connector to make the searches case-insensitive?
2) If underlying DB is case-insensitive, shouldn't Presto searches also be case-insensitive? (I presume that Presto only generates the query plan and the actual execution happens on the underlying database)
Example: Consider the below table on MySQL.
name
____
adam
Alan
select * from table where name like '%a%'
// returns adam, Alan on MySQL
// returns only adam on Presto
select * from table where name = 'Adam'
// returns adam on MySQL
// returns NIL on Presto
You have to ask explicitly for a case-insensitive comparison by normalizing the compared values either to lower or to upper case, like this:
select * from table where lower(name) like '%a%';
select * from table where lower(name) = lower('Adam');
You can also use regexp_like() and prepend the regexp with (?i) for case insensitivity:
select
*
from table_name
where
regexp_like(column_name, '(?i)fOO'); -- matches rows where the column contains 'foo' in any letter case
or
select
*
from table_name
where
regexp_like(column_name, '(?i)^Foo'); -- matches rows where the column starts with 'foo' in any letter case
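The (?i) inline flag is standard Java-compatible regex syntax, which is what Presto's regexp functions accept, so its behavior can be checked with any engine that shares it. Python's re module supports the same flag; a quick check with invented sample data:

```python
import re

names = ["adam", "Alan", "Foo", "xfOO"]

# (?i) makes the whole pattern case-insensitive, as in regexp_like().
contains_foo = [n for n in names if re.search(r"(?i)foo", n)]
starts_with_foo = [n for n in names if re.search(r"(?i)^foo", n)]

print(contains_foo)     # ['Foo', 'xfOO']
print(starts_with_foo)  # ['Foo']
```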
