Combine UNION and WITH statements in Azure Stream Analytics

I'm trying to combine different sources with the UNION statement in Azure Stream Analytics.
In general, this works fine:
SELECT
    date
  , value
FROM source1
UNION
SELECT
    date
  , value
FROM source2
But now I need some calculations which require a WITH statement, so I hoped this would work:
SELECT
    date
  , value
FROM source1
UNION
(WITH tempTab AS (
    SELECT
        date
      , value
    FROM source2
)
SELECT
    date
  , value
FROM tempTab
)
(I'm aware that this example for the WITH statement is contrived, but let's assume I have a real-world scenario where it is necessary. Let's further assume the WITH statement works on its own, i.e. if I omit the lines from the first SELECT until after the UNION.)
In this version I get a notification that there is a syntax error near the WITH statement. Is there a way to resolve the syntax error and make the WITH and UNION statements work together in Stream Analytics?

With current ASA syntax/semantics, unlike T-SQL, the WITH clause is only allowed to appear first in the query.
You can only write "WITH step1 AS (...), step2 AS (...), ..." followed by SELECT clauses that use any of step1, step2, ... as sources in their FROM clauses.
UNION can then be used either in the SELECT clauses after the WITH, or inside individual step definitions.
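Restructured to follow that rule, a sketch of a working version of the example above would be:
-- the WITH clause comes first; the UNION is expressed in the SELECT clauses after it
WITH tempTab AS (
    SELECT
        date
      , value
    FROM source2
)
SELECT
    date
  , value
FROM source1
UNION
SELECT
    date
  , value
FROM tempTab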

Related

Select where multiple fields are not in subquery (excluding join)

I have a requirement to pull records that do not have history in an archive table. Two fields of one record need to be checked for in the archive.
In a technical sense, my requirement is a left join where the right side is NULL (a.k.a. an excluding join), which in ABAP OpenSQL is commonly implemented like this (for my scenario, anyway):
select * from xxxx "xxxx is the result of a multiple-table join
  where xxxx~key not in ( select key from archive_table where [conditions] )
    and xxxx~foreign_key not in ( select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with FOR ALL ENTRIES, but I would still like to know if it is possible to use the results of a sub-query to check more than 1 field, or to use another form of excluding-join logic on multiple fields using SQL (without involving the application server).
I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with a multiple-field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance is acceptable (processing time is about 5 seconds), although it's noticeably slower than the same query when excluding based on 1 field.
select * from xxxx "xxxx is the result of multiple inner joins and 1 left join ( 1-* relation )
  where NOT EXISTS (
    select key from archive_table
      where key = xxxx~key OR key = xxxx~foreign_key
  )
EDIT:
With changing requirements (more filtering), a lot has changed, so I figured I would update this. The construct I marked as xxxx in my example contained a single left join (where the main-to-secondary table relation is 1-*), and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors without financial records in 3 tables.
Additional requirements: also exclude based on alternative payers (1-* relationship). This is what the example above is based on.
More requirements: also exclude based on alternative payees (*-* relationship between payer and payee).
The many-to-many join exponentially increased the record count within the construct I labeled xxxx, which in turn produced a lot of unnecessary work. For instance: a single customer with 3 payers and 3 payees produced 9 rows (3 × 3), with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values (1 + 3 + 3).
At this point, moving the left-joined tables from the main query into sub-queries and splitting them gave significantly better performance than any smarter-looking alternatives.
select * from lfa1 inner join lfb1 on lfa1~lifnr = lfb1~lifnr
  where
  ( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
    and lfa1~lifnr not in ( select wyt3~lifnr from wyt3
                              inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
                              inner join bsik on bsik~lifnr = wyt3~lifn2
                              where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
    and lfa1~lifnr not in ( select lfza~lifnr from lfza
                              inner join bsik on bsik~lifnr = lfza~empfk
                              where lfza~lifnr = lfa1~lifnr )
  )
  and [3 more sets of sub-queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both NOT IN and NOT EXISTS work. One might be better than the other, depending on the filters you use.
When exclusion is based on 2 or more fields and you don't have a many-to-many join in the main query, NOT EXISTS ( select .. from table where id = a.id or id = b.id or ... ) appears to be the best.
The moment your exclusion criteria involve a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even a sub-query for each key-table combination will perform better than a many-to-many join with 1 good-looking sub-query).
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off-topic, given that my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on in order to expand it. I learned that a proper excluding join works; I just failed horribly at implementing it the first time.
select headers~key
  from headers left join items on headers~key = items~key
  where items~key is null
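Hypothetically, the same excluding-join pattern extends to the original two-field requirement with one left join per checked field. A minimal sketch, reusing the placeholder names xxxx and archive_table from the question (the aliases arc1 and arc2 are invented for illustration):
select xxxx~key
  from xxxx
    left join archive_table as arc1 on arc1~key = xxxx~key
    left join archive_table as arc2 on arc2~key = xxxx~foreign_key
  " any [conditions] on the archive rows would move into the respective on clauses
  where arc1~key is null
    and arc2~key is null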
if it is possible to use results of a sub-query to check more than 1 field or use another form of excluding join logic on multiple fields
No, it is not possible to check two columns in a subquery, as the SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar subquery.
Scalar is the keyword here, i.e. it must return exactly one column.
Your subquery can have a multi-column key, and such syntax is completely legit:
SELECT planetype, seatsmax
  FROM saplane AS plane
  WHERE seatsmax < @wa-seatsmax AND
        seatsmax >= ALL ( SELECT seatsocc
                            FROM sflight
                            WHERE carrid = @wa-carrid AND
                                  connid = @wa-connid )
However, you say that these two fields should be checked against different tables:
Those 2 fields are also checked against two more tables
so it's not the case for you. Your only choice seems to be a multi-join.
P.S. FOR ALL ENTRIES does not support negation logic; you cannot just use some sort of NOT IN FOR ALL ENTRIES. It won't be that easy.

Is there a workaround for the maximum length of an ODBCConnection.CommandText string in VBA?

I have a VBA script that generates a query string for a SAP HANA ODBC Connection in Excel. The query is determined by user inputs and can vary greatly in length. The query itself uses many versions of a similar query appended to one another using UNION ALL syntax.
The script sometimes throws a runtime error when trying to refresh. From my research, it has become clear that the reason for this is that the CommandText string exceeds the maximum allowed length of 32,767 characters (https://ask.sqlservercentral.com/questions/50819/too-long-sql-in-excel-vba.html).
I wondered whether there is a workaround for this, other than using a stored procedure. (I am not against that if there is a way to create a stored procedure at runtime and then execute it, but I cannot use a predefined stored procedure, as my query is always different, hence the need for VBA to create it.)
Some more info about the dynamic query in VBA:
Column names, as well as parameters, are created dynamically and can be different every time
The query uses groups of lists of product numbers to generate an IN statement for each product group, then sums the sales for those products under the name of the group. These are then all UNION'd together to create one table with grouped records
Example of resulting query:
WITH SOME_CTE (SOME_FIELDS) AS
(SELECT SOME_STUFF
FROM SOME_TABLE
WHERE SOME_STUFF_IS_GOING_ON)
SELECT GEND "Gender", 'Attribute 1' "Attribute", SUM(UNITS) "Units", SUM(VAL) "Value", SUM(MARGIN) "Margin"
FROM SOME_CTE
WHERE PRODUCT IN ('12345', '23456', '34567', '45678')
GROUP BY GEND
UNION ALL
SELECT GEND, 'Attribute 2' ATTR_NAME, SUM(UNITS), SUM(VAL), SUM(MARGIN)
FROM SOME_CTE
WHERE PRODUCT IN ('01234', '02345', '03456', '03567')
GROUP BY GEND
ORDER BY "Gender", "Attribute"
...and so on.
As you can see, with 2 attribute groups containing 4 products each there is no problem, but when we get to about 30 with several hundred each, it could be too long.
Note: I have tried things like shortening field references in the repeated parts of the query string to 1 character etc. which helps but does not solve the problem.
Any help would be greatly appreciated.
One workaround is to send multiple queries. Since you are using UNION ALL, you can execute each single SELECT statement on its own:
Create a table in (for example) the master database (don't create temporary tables, as they will be dropped after every query), but before that, make sure to delete the old one if it exists (and also drop the table once you are done with it). Then change every single SELECT statement into an INSERT statement, which will insert its records into your so-called temporary table.
This way you'll avoid lengthy queries; you'll just send single INSERT INTO ... SELECT statements.
At the end, to get all the results, you just need a simple SELECT query. After fetching this data, you should drop the table, as it's no longer needed.
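A minimal sketch of the pattern, assuming a scratch table named TMP_RESULTS (the table and column definitions are illustrative, not from the original post); each statement is sent as its own, much shorter CommandText:
-- recreate the scratch table (drop the old copy first; ignore the error if none exists)
DROP TABLE TMP_RESULTS;
CREATE TABLE TMP_RESULTS (
    GENDER    NVARCHAR(20),
    ATTRIBUTE NVARCHAR(40),
    UNITS     DECIMAL(15,2),
    VAL       DECIMAL(15,2),
    MARGIN    DECIMAL(15,2)
);

-- one short INSERT ... SELECT per attribute group instead of one giant UNION ALL
-- (a CTE cannot span statements, so each INSERT inlines or repeats it)
INSERT INTO TMP_RESULTS
SELECT GEND, 'Attribute 1', SUM(UNITS), SUM(VAL), SUM(MARGIN)
FROM SOME_TABLE
WHERE PRODUCT IN ('12345', '23456', '34567', '45678')
GROUP BY GEND;

-- ...repeat for each remaining attribute group...

-- finally, fetch everything with one simple query, then clean up
SELECT * FROM TMP_RESULTS ORDER BY GENDER, ATTRIBUTE;
DROP TABLE TMP_RESULTS;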

Case Insensitive group by in Presto

By default, Presto performs a case-sensitive GROUP BY, but I wanted to know how to do a case-insensitive GROUP BY. One method is to convert everything in the column to lower case and then perform the GROUP BY, i.e.
select * from ( select lower(name_of_the_column) name_of_the_column, other_columns from table )
where conditions..
group by name_of_the_column
One way we can reduce time is by putting the conditions into the select statement inside the brackets. Is there any better method?
You don't need to push lower(...) into a subquery. If you simply write:
SELECT lower(name_of_the_column), ...
FROM ...
GROUP BY lower(name_of_the_column) -- or just "GROUP BY 1"
Presto will do the conversion to lowercase only once for each row (not twice).

SSIS Change data on Import from Excel to SQL Server table

I'm converting all of the Excel data to numbers before importing it into my SQL table with SSIS.
This is my original table, and this is what I need to achieve (both were shown as screenshots, omitted here).
Is it possible to do it with SSIS?
In your data flow, you need to provide a translation or mapping from your source item's value to the target's value. This is generally accomplished through the use of a Lookup Transformation. By default, a Lookup will fail if no match is found, and it performs case-sensitive matches, so an EX value would not match ex.
Since the Lookup gives you the ability to write a query, if you don't have these mappings in a table, you can "cheat" and write the query like:
SELECT D.CutId, D.Cut
FROM
(
VALUES
(1, 'EX')
, (2, 'ZZ')
) D(CutId, Cut);

Hive query string case

Is there a way to match all the case variants of a string while doing this:
select count(word) from table where word="abcd"
Actually, when doing this, it is not the same as this:
select count(word) from table where word="ABCD"
Ignoring case in a WHERE clause is very simple. You can, for example, convert both sides of the comparison to all-caps notation:
SELECT COUNT(word)
FROM table
WHERE UPPER(word) = UPPER('ABCD')
Regardless of the capitalization used for the search term, the UPPER function makes them match as desired.
select count(word) from table where lower(word) = "abcd"
However, this assumes it's not a partitioned table. If the table is partitioned by word, you would end up doing a full table scan because of the lower().
Another option is a case-insensitive regular expression, where the (?i) flag makes the whole pattern ignore case:
SELECT count(word) FROM table
WHERE word RLIKE "(?i)WOrd1|wOrd2"
