I am trying to run a simple SQL query using the Airflow Snowflake provider (1.3.0):
SnowflakeOperator(
    task_id=f'task',
    snowflake_conn_id='snowflake_conn',
    parameters={
        "table": "dummy_table",
    },
    sql=["delete from %(table)s"],
    autocommit=True,
    dag=dag,
)
The SQL it renders is delete from 'dummy_table'. I want to get rid of the single quotes, but I have tried everything and nothing seems to work.
To parametrize a table name, IDENTIFIER should be used:
To use an object name specified in a literal or variable, use IDENTIFIER().
sql=["delete from IDENTIFIER(%(table)s)"],
The query DELETE FROM 'dummy' is not correct, but DELETE FROM IDENTIFIER('dummy') will work.
CREATE TABLE dummy(id INT);
DELETE FROM 'dummy';
-- Syntax error: unexpected ''dummy''. (line 4)
DELETE FROM IDENTIFIER('dummy');
-- number of rows deleted: 0
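Putting it together with the operator from the question, a sketch (same connection, parameters, and DAG as above) would be:
SnowflakeOperator(
    task_id='task',
    snowflake_conn_id='snowflake_conn',
    parameters={
        "table": "dummy_table",
    },
    # IDENTIFIER() makes Snowflake treat the bound string as an object name
    sql=["delete from IDENTIFIER(%(table)s)"],
    autocommit=True,
    dag=dag,
)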
If you are using parameters, then the binding is handled by SQLAlchemy. You can find more information about it in How to render a .sql file with parameters in MySqlOperator in Airflow?
Alternatively, you can use Airflow rendering (Jinja engine) with params:
SnowflakeOperator(
    task_id=f'task',
    snowflake_conn_id='snowflake_conn',
    params={
        "table": "dummy_table",
    },
    sql=["delete from {{ params.table }}"],
    autocommit=True,
    dag=dag,
)
This will be rendered by the Jinja engine, thus the query that will be submitted to Snowflake is:
delete from dummy_table
I am trying to run a Great Expectations suite on a Delta table in Databricks, but I want to run it on part of the table using a query. The validation runs fine, but it runs on the full table data.
I know that I can load a DataFrame and pass it to the batch request, but I would like to load the data directly with a query.
batch_request = RuntimeBatchRequest(
    datasource_name="datasource",
    data_connector_name="data_quality_run",
    data_asset_name="Input Data",
    runtime_parameters={"path": "/delta table path"},
    batch_identifiers={"data_quality_check": f"data_quality_check_{datetime.date.today().strftime('%Y%m%d')}"},
    batch_spec_passthrough={"reader_method": "delta", "reader_options": {"header": True}, "query": {"name": "John"}},
)
The above batch request loads the data but ignores the query option. Is there any way to pass a query for a Delta table in the batch request?
You can try to put the query inside runtime_parameters.
This works for me when I am querying data in SQL Server:
batch_request = RuntimeBatchRequest(
    datasource_name="my_mssql_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="default_name",
    runtime_parameters={
        "query": "SELECT * from dbo.MyTable WHERE Created = GETDATE()"
    },
    batch_identifiers={"default_identifier_name": "default_identifier"},
)
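If it helps, here is a minimal sketch of how such a batch request is typically consumed afterwards; the suite name "my_suite" is an assumption, and it presumes a recent Great Expectations version with the v3 (Batch Request) API:
import great_expectations as ge

context = ge.get_context()  # assumes an existing Data Context
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="my_suite",  # hypothetical suite name
)
results = validator.validate()
print(results.success)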
How can this PostgreSQL query:
CREATE INDEX idx_lu_suggest_street_query_street ON fr_suggest_street_query (lower(f_unaccent(street))) INCLUDE (street);
be written in SQLAlchemy? So far I have tried:
Index(
    "idx_suggest_street_street",
    sa.func.lower(sa.func.f_unaccent("street")).label("street"),
    postgresql_ops={
        "street": "text_pattern_ops",
    },
)
But I am missing the INCLUDE part. How can I achieve this?
UPDATE:
I achieved the INCLUDE part using
postgresql_include=["street"],
Still, when I run:
SELECT
    indexname,
    indexdef
FROM
    pg_indexes
WHERE
    tablename = 'lu_suggest_street_query';
The index that is created using sqlalchemy looks like:
CREATE INDEX idx_suggest_street_street_text_pattern ON public.lu_suggest_street_query USING btree (lower(f_unaccent('street'::text))) INCLUDE (street)
But it should be:
CREATE INDEX idx_suggest_street_street_text_pattern ON public.lu_suggest_street_query USING btree (lower(f_unaccent((street)::text))) INCLUDE (street)
I need to mention that I am using sqlalchemy declarative metadata.
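Regarding the 'street'::text vs (street)::text difference: a likely explanation (an assumption based on how SQLAlchemy renders expressions in DDL, not something confirmed above) is that passing the plain string "street" into sa.func.f_unaccent() is rendered as a text literal, while passing the mapped column renders a column reference. A sketch with declarative metadata, using a hypothetical model class for the table:
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class FrSuggestStreetQuery(Base):  # hypothetical model for fr_suggest_street_query
    __tablename__ = "fr_suggest_street_query"
    id = sa.Column(sa.Integer, primary_key=True)
    street = sa.Column(sa.Text)

# Functional index on lower(f_unaccent(street)) with INCLUDE (street);
# the column object (not the string "street") is passed to the functions.
sa.Index(
    "idx_lu_suggest_street_query_street",
    sa.func.lower(sa.func.f_unaccent(FrSuggestStreetQuery.street)),
    postgresql_include=["street"],
)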
My requirement is to use ADF to read data (columnA) from an xlsx/csv file in the storage account, use that columnA value to query my DB, and write the output of the query (which includes columnA) to a file in the storage account.
I was able to read the data from the storage account, but I am getting it as a table. I need to use it as an individual entry, like select * from table where id=columnA.
Then the next task: if I am able to read each entry, how do I write it to a file?
I used a Lookup activity to read data from Excel; the below is the sample output. I need to use only the SKU number in my query next, but I am not able to proceed with this. Kindly suggest a solution.
I set a variable to the output of the Lookup as suggested here https://www.mssqltips.com/sqlservertip/6185/azure-data-factory-lookup-activity-example/ and tried to use that variable in my query, but I am getting a "bad template" error when I trigger it.
Please try this:
I created a sample like yours and there is no need to use a Set Variable activity.
Details:
Below is the Lookup output:
{
    "count": 3,
    "value": [
        {
            "SKU": "aaaa"
        },
        {
            "SKU": "bbbb"
        },
        {
            "SKU": "ccc"
        }
    ]
}
Settings of the Copy Data activity:
Query SQL:
select * from data_source_table where Name = '#{activity('Lookup1').output.value[0].SKU}'
You can also use this SQL, if you need it:
select * from data_source_table where Name in('#{activity('Lookup1').output.value[0].SKU}','#{activity('Lookup1').output.value[1].SKU}','#{activity('Lookup1').output.value[2].SKU}')
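With the sample Lookup output above, these expressions evaluate at runtime to the literal SKU values, so the first query becomes select * from data_source_table where Name = 'aaaa'.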
This is my test data in my SQL database:
Here is the result:
1,"aaaa",0,2017-09-01 00:56:00.0000000
2,"bbbb",0,2017-09-02 05:23:00.0000000
Hope this can help you.
Update:
You can try to use Data Flow.
source1 is your csv file, source2 is the SQL database.
This is the setting of the lookup:
Filter condition: !isNull(PersonID) (a column in your SQL database).
Then, use a Select transformation to delete the SKU column.
Finally, output to a single file.
How can I retrieve objects which match order_id = 9234029m, given this document in CosmosDB:
{
    "order": {
        "order_id": "9234029m",
        "order_name": "name"
    }
}
I have tried to query in CosmosDB Data Explorer, but it's not possible to simply query the nested order_id object like this:
SELECT * FROM c WHERE c.order.order_id = "9234029m"
(Err: "Syntax error, incorrect syntax near 'order'")
This seems like it should be so simple, yet it's not! (In CosmosDB Data Explorer, all queries need to start with SELECT * FROM c, but REST SQL is an alternative as well.)
As you discovered, order is a reserved keyword, which was tripping up the query parsing. However, you can get past that, and still query your data, with slightly different syntax (bracket notation):
SELECT *
FROM c
WHERE c["order"].order_id = "9234029m"
This was due, apparently, to order being a reserved keyword in CosmosDB SQL, even if used as above.
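For completeness, here is a minimal sketch of running the same bracket-notation query through the azure-cosmos Python SDK; the endpoint, key, database, and container names are placeholders, not from the question:
from azure.cosmos import CosmosClient

# Placeholder connection details
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("<database>").get_container_client("<container>")

# Bracket notation sidesteps the reserved keyword "order"; the value is bound as a parameter
items = container.query_items(
    query='SELECT * FROM c WHERE c["order"].order_id = @order_id',
    parameters=[{"name": "@order_id", "value": "9234029m"}],
    enable_cross_partition_query=True,
)
for item in items:
    print(item)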
We have an HDInsight cluster running HBase (Ambari)
We have created a table using Phoenix:
CREATE TABLE IF NOT EXISTS Results (
    Col1 VARCHAR(255) NOT NULL,
    Col2 INTEGER NOT NULL,
    Col3 INTEGER NOT NULL,
    Destination VARCHAR(255) NOT NULL
    CONSTRAINT pk PRIMARY KEY (Col1, Col2, Col3)
) IMMUTABLE_ROWS=true
We have loaded some data into this table (using some Java code).
Later, we decided we wanted to create a local index on the Destination column, as follows:
CREATE LOCAL INDEX DESTINATION_IDX ON RESULTS (destination) ASYNC
We have run the index tool to fill the index as follows
hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table RESULTS --index-table DESTINATION_IDX --output-path DESTINATION_IDX_HFILES
When we run queries and filter using the Destination column, everything is OK. For example:
select /*+ NO_CACHE, SKIP_SCAN */ COL1, COL2, COL3, DESTINATION
from Results
where COL1 = 'data' and DESTINATION = 'some value';
But if we do not use DESTINATION in the WHERE clause, then we get a NullPointerException in BaseResultIterators.class
(from phoenix-core-4.7.0-HBase-1.1.jar)
This exception is thrown only when we use the new local index. If we query ignoring the index, like this:
select /*+ NO_CACHE, SKIP_SCAN, NO_INDEX */ COL1, COL2, COL3, DESTINATION
from Results
where COL1 = 'data' and DESTINATION = 'some value';
we do not get the exception.
Here is some relevant code from the area where we get the exception:
...
catch (StaleRegionBoundaryCacheException e2) {
    // Catch only to try to recover from region boundary cache being out of date
    if (!clearedCache) { // Clear cache once so that we rejigger job based on new boundaries
        services.clearTableRegionCache(physicalTableName);
        context.getOverallQueryMetrics().cacheRefreshedDueToSplits();
    }
    // Resubmit just this portion of work again
    Scan oldScan = scanPair.getFirst();
    byte[] startKey = oldScan.getAttribute(SCAN_ACTUAL_START_ROW);
    byte[] endKey = oldScan.getStopRow();
    // ==================== Note: isLocalIndex is true ====================
    if (isLocalIndex) {
        endKey = oldScan.getAttribute(EXPECTED_UPPER_REGION_KEY);
        // endKey is null for some reason at this point and the next function
        // will fail inside it with an NPE
    }
    List<List<Scan>> newNestedScans = this.getParallelScans(startKey, endKey);
We must use this version of the jar since we run inside Azure HDInsight and cannot select a newer jar version.
Any ideas how to solve this?
What does "recover from region boundary cache being out of date" mean? It seems to be related to the problem.
It appears that the version of phoenix-core that Azure HDInsight ships (phoenix-core-4.7.0.2.6.5.3004-13.jar) has the bug, but if I use a slightly newer version (phoenix-core-4.7.0.2.6.5.8-2.jar, from http://nexus-private.hortonworks.com:8081/nexus/content/repositories/hwxreleases/org/apache/phoenix/phoenix-core/4.7.0.2.6.5.8-2/) we do not see the bug any more.
Note that it is not possible to take a much newer version like 4.8.0, since in that case the server will throw a version mismatch error.