Azure Data Lake: auto-generated column in U-SQL - azure

I want to add an auto-generated column in my U-SQL SELECT statement.
How can we do this?
Something like an identity column in SQL Server.
Regards,
Manish

The closest would be ROW_NUMBER. Here is a simple example:
@output =
SELECT
ROW_NUMBER() OVER () AS rn,
*
FROM @input;
You cannot use ROW_NUMBER directly with EXTRACT at this time. Simply extract the data first, then add the row number in a subsequent statement, like this:
// Get raw input
@input =
EXTRACT rawData string
FROM "/input/yourFile.txt"
USING Extractors.Tsv();
// Add a row number
@working =
SELECT ROW_NUMBER() OVER() AS rn,
*
FROM @input;
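As a rough illustration of what the row-number step produces, here is a minimal Python sketch. It is not U-SQL; the sample rows are made up, and it only shows the idea of attaching a 1-based sequence number to rows that have already been read. As far as I know, without an ORDER BY inside OVER() the numbering order is not something to rely on.

```python
# Hypothetical rows standing in for the already extracted rowset.
rows = ["alpha", "beta", "gamma"]

# Attach a 1-based sequence number to each row, as ROW_NUMBER() would.
numbered = [(rn, raw_data) for rn, raw_data in enumerate(rows, start=1)]

for rn, raw_data in numbered:
    print(rn, raw_data)
```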

Related

Power Query - How to run multiple SQL queries against a single SQL query result?

I have a SQL server query that has 7 joins and pulls over 50 columns. I run this in SSMS, query results are stored in a local temp table, and then there are a dozen or so SELECT statements within the query that run based off the temp table. I have no control over the data itself, and only have read access to the tables. So basically:
SELECT *
INTO #myTempTable
FROM myTableAnd7Joins
SELECT <stuff>
FROM #myTempTable
SELECT <different stuff>
FROM #myTempTable
<like 10 more SELECT statements>
The temp table can easily be 500,000 rows and can take 5-10 minutes to return.
I run this query once, where it creates the single temp table and then multiple sets of results are returned. I then copy and paste each set of results to various tabs in an Excel file, and the rest of the Excel file is already populated with formulas and such which read the data from those tabs. I want to get rid of having to copy and paste from SSMS to Excel and do everything in Excel instead. I also want to use parameterized queries, so I can change some of the query variables on the fly. The parameters are stored in a range on a tab named 'Parameters'.
I currently have an Excel workbook saved that has the dozen individual Power Query queries, but each one has to do that initial query into #myTempTable first before performing its specific query.
Each of my current dozen queries look like this:
let
myParamTable = Excel.CurrentWorkbook(){[Name="tParams"]}[Content],
fieldList = myParamTable[#"Parameter"],
valueList = myParamTable[Value],
param1 = valueList{List.PositionOf(fieldList, "First Parameter")},
param2 = valueList{List.PositionOf(fieldList, "Second Parameter")},
param3 = valueList{List.PositionOf(fieldList, "Third Parameter")},
Source = Sql.Database("mySqlServer", "DatabaseName"),
GoQuery = Value.NativeQuery(Source,
"
DECLARE
@<a bunch of different variables>
BEGIN
SET <a bunch of different variables>
END
SELECT <50 different columns from all the tables>
INTO #myTempTable
FROM myTable t1
JOIN otherTable t2 on t1.thing = t2.thing
JOIN <6 or 7 other tables to t1>
SELECT <complex stuff that's more than just a filter or two>
FROM #myTempTable
WHERE <something> = @p1
DROP TABLE #myTempTable",
[p1 = param1,
p2 = param2,
p3 = param3])
in
GoQuery
Is it possible to create #myTempTable just once, and then have the dozen other queries use that as the source, instead of having to create #myTempTable each time? I tried to put #myTempTable into a table and then get the data From Table/Range but don't see how I can replicate my SQL as suggested in a previous question of mine (Power Query - Can a range be queried using SQL?)

Using result of a set in a select query in python

I have a select query to fetch specific columns from a mysql table. These specific columns must come from the intersection of the two sets.
Suppose I have two sets
A = {'age', 'description', 'name', 'payment_mode', 'id'}
# has all the column names of the table
and
B={'name', 'id', 'class'}
# input column names coming from a file (where values can be changed)
Now I do
inter= A.intersection(B)
# would result in {'name', 'id'}
So I would want the select query to be select {inter} from customers.
How can I write a dynamic query so that the column names come from the result of the intersection?
Since you already have the desired columns in the set inter, all you need to do is construct the query string.
In Python 3.6 and above, you can use f-strings to generate the query as follows:
query = f"select {', '.join(inter)} from customers"
Or, you can do normal concatenation as follows:
query = 'select ' + ', '.join(inter) + ' from customers'
Both cases produce a query such as "select name, id from customers" when inter is {'name', 'id'}. Because sets are unordered, the column order may vary.
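If a stable column order matters, or the requested names come from an untrusted file, a small helper can sort the intersection and refuse an empty result. This is only a sketch: build_select is a hypothetical name, not part of any library, and it assumes the set of real table columns (A) is trusted.

```python
def build_select(table_columns, requested_columns, table="customers"):
    """Build a SELECT using only requested names that exist in the table."""
    # The intersection acts as a whitelist; sorting makes the order stable.
    columns = sorted(set(table_columns) & set(requested_columns))
    if not columns:
        raise ValueError("none of the requested columns exist in the table")
    return f"select {', '.join(columns)} from {table}"


A = {'age', 'description', 'name', 'payment_mode', 'id'}
B = {'name', 'id', 'class'}

query = build_select(A, B)
print(query)
```

Since every name that reaches the query string is taken from A, a stray value in the input file (such as 'class' here) is silently dropped rather than injected into the SQL.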

Performance difference between SELECT sum(column_name) FROM and SELECT column_name in CQL

I would like to know the performance difference between the following two queries on a table cycling.cyclist_points containing thousands of rows:
SELECT sum(race_points)
FROM cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
select *
from cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
If sum(race_points) causes the query to be expensive, I will have to look for other solutions.
Performance difference between your queries:
Both queries need to scan the same number of rows (the number of rows in that partition).
The first query selects only a single column, so it is slightly faster.
Instead of calculating the sum at run time, try to precompute it.
If race_points is an int or bigint, use a counter table like the one below:
CREATE TABLE race_points_counter (
id uuid PRIMARY KEY,
sum counter
);
Whenever a new row is inserted into cyclist_points, also increment the sum by the points of that row:
UPDATE race_points_counter SET sum = sum + ? WHERE id = ?
Now you can simply select the sum for that id:
SELECT sum FROM race_points_counter WHERE id = ?
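The trade-off can be sketched in plain Python, with no Cassandra involved. The two dictionaries are hypothetical stand-ins for cyclist_points and race_points_counter, and insert_points is an illustrative name: each write updates the running total, so a read is a single lookup instead of a scan over the partition.

```python
from collections import defaultdict

points_rows = defaultdict(list)  # stand-in for cyclist_points (one list per id)
points_sum = defaultdict(int)    # stand-in for race_points_counter


def insert_points(cyclist_id, race_points):
    """Store the row and bump the precomputed total in the same step."""
    points_rows[cyclist_id].append(race_points)
    points_sum[cyclist_id] += race_points


for race_points in (10, 25, 5):
    insert_points("e3b19ec4", race_points)

# Read-time aggregation: touches every row of the partition.
total_by_scan = sum(points_rows["e3b19ec4"])
# Precomputed aggregation: a single lookup.
total_by_counter = points_sum["e3b19ec4"]
print(total_by_scan, total_by_counter)
```

The cost moves from read time to write time, and the two stores have to be kept in step by whatever code performs the inserts.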

WHERE variable = ( subquery ) in OpenSQL

I'm trying to retrieve rows from a table where a subquery matches a variable. However, it seems as if the WHERE clause only lets me compare fields of the selected tables against a constant, variable or subquery.
I would expect to write something like this:
DATA(lv_expected_lines) = 5.
SELECT partner contract_account
INTO TABLE lt_bp_ca
FROM table1 AS tab1
WHERE lv_expected_lines = (
SELECT COUNT(*)
FROM table2
WHERE partner = tab1~partner
AND contract_account = tab1~contract_account ).
But obviously this select treats my local variable as a field name and it gives me the error "Unknown column name "lv_expected_lines" until runtime, you cannot specify a field list."
But in standard SQL this is perfectly possible:
SELECT PARTNER, CONTRACT_ACCOUNT
FROM TABLE1 AS TAB1
WHERE 5 = (
SELECT COUNT(*)
FROM TABLE2
WHERE PARTNER = TAB1.PARTNER
AND CONTRACT_ACCOUNT = TAB1.CONTRACT_ACCOUNT );
So how can I replicate this logic in RSQL / Open SQL?
If there's no way I'll probably just write native SQL and be done with it.
The program below might lead you to an Open SQL solution. It uses the SAP demo tables to determine the plane types that are used on a specific number of flights.
REPORT zgertest_sub_query.
DATA: lt_planetypes TYPE STANDARD TABLE OF s_planetpp.
PARAMETERS: p_numf TYPE i DEFAULT 62.
START-OF-SELECTION.
SELECT planetype
INTO TABLE lt_planetypes
FROM sflight
GROUP BY planetype
HAVING COUNT( * ) EQ p_numf.
LOOP AT lt_planetypes INTO DATA(planetype).
WRITE: / planetype.
ENDLOOP.
It only works if you don't need to read fields from TAB1. If you do you will have to gather these with other selects while looping at your results.
For those who find this question in 2020: this construct is supported since ABAP 7.50. No workarounds are needed:
SELECT kunnr, vkorg
FROM vbak AS v
WHERE 5 = ( SELECT COUNT(*)
FROM vbap
WHERE kunnr = v~kunnr
AND vkorg = v~vkorg )
INTO TABLE @DATA(customers).
This selects all customers who placed 5 sales orders within a given sales organization.
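The shape of such a query (a host value on the left-hand side, compared with a correlated COUNT(*) subquery) can be tried outside SAP. The sketch below uses Python's built-in sqlite3 module with made-up data; the table and column names are borrowed from the question, and SQLite is only a stand-in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (partner TEXT, contract_account TEXT);
CREATE TABLE table2 (partner TEXT, contract_account TEXT);
INSERT INTO table1 VALUES ('P1', 'C1'), ('P2', 'C2');
INSERT INTO table2 VALUES ('P1', 'C1'), ('P1', 'C1'), ('P2', 'C2');
""")

expected_lines = 2  # plays the role of lv_expected_lines

# The bound value is compared with a count computed per outer row.
rows = conn.execute(
    """
    SELECT partner, contract_account
    FROM table1 AS tab1
    WHERE ? = (SELECT COUNT(*)
               FROM table2
               WHERE partner = tab1.partner
                 AND contract_account = tab1.contract_account)
    """,
    (expected_lines,),
).fetchall()
print(rows)
conn.close()
```

Only the partner whose pair appears exactly expected_lines times in table2 is returned.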
In ABAP there is no way to write the query the way you would in native SQL.
I would advise against using native SQL; instead, give the SELECT/ENDSELECT statement a try.
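DATA: ls_table1 TYPE table1,
lt_table1 TYPE TABLE OF table1,
lv_count TYPE i.
* Loop over table1 one row at a time.
SELECT partner contract_account
INTO CORRESPONDING FIELDS OF ls_table1
FROM table1.
* Count the matching rows in table2 for the current row.
SELECT COUNT(*)
INTO lv_count
FROM table2
WHERE partner = ls_table1-partner
AND contract_account = ls_table1-contract_account.
* Skip rows whose count is not 5.
CHECK lv_count EQ 5.
APPEND ls_table1 TO lt_table1.
ENDSELECT.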
Here you append to lt_table1 only those rows for which the count in table2 equals 5.
Hope it helps.

how to join two or more tables and result set having all distinct values

I have some 20 Excel files containing data. All the tables have the same columns, like id, name, age, location, etc. Each file has distinct data, but I don't know whether data in one file is repeated in another file. So I want to join all the files, and the result set should contain distinct values. Please help me out with this problem as soon as possible. I want the result set to be stored in an Access database.
I would recommend either linking the sheets in Access or importing the sheets as tables.
From there, use a DISTINCT select across the tables/sheets to determine the keys required, and select only the records you need.
In SQL, you can use JOIN or NATURAL JOIN to join tables. I would look into NATURAL JOIN since you said all tables have the same columns.
After that you can use DISTINCT to get distinct values.
I'm not sure if this is what you're looking for though: your question asks about Excel but you've tagged it with SQL.
If you can use all the tables in one query, you can use a union to get the distinct rows:
select id, name, age, location from Table1
union
select id, name, age, location from Table2
union
select id, name, age, location from Table3
union
...
You can insert the records directly from the result:
insert into ResultTable
select id, name, age, location from Table1
union
....
If you can only select from one table at a time, you can skip the insert of rows that are already in the table:
insert into ResultTable
select t.id, t.name, t.age, t.location from Table1 as t
left join ResultTable as r on r.id = t.id
where r.id is null
(Assuming that id is a unique field identifying the record.)
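To see the de-duplicating effect of UNION on a small scale, here is a hedged sketch using Python's built-in sqlite3 module rather than Access. The sample rows are invented, and the Access SQL dialect differs in places, so treat it as an illustration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER, name TEXT, age INTEGER, location TEXT);
CREATE TABLE Table2 (id INTEGER, name TEXT, age INTEGER, location TEXT);
CREATE TABLE ResultTable (id INTEGER, name TEXT, age INTEGER, location TEXT);
INSERT INTO Table1 VALUES (1, 'Ann', 30, 'Oslo'), (2, 'Bob', 41, 'Rome');
INSERT INTO Table2 VALUES (2, 'Bob', 41, 'Rome'), (3, 'Cy', 25, 'Lima');
""")

# UNION (unlike UNION ALL) drops rows that are identical in every column.
conn.execute("""
INSERT INTO ResultTable
SELECT id, name, age, location FROM Table1
UNION
SELECT id, name, age, location FROM Table2
""")

rows = conn.execute("SELECT id, name FROM ResultTable ORDER BY id").fetchall()
print(rows)
conn.close()
```

The row for Bob exists in both source tables but ends up in ResultTable once. Rows that differ in any column (for example the same id with a different location) would both be kept.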
It seems the unique set of data you want is this:
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;
].[Sheet1$] AS T1
UNION
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;
].[Sheet1$] AS T1
...but that you then want to arbitrarily apply a sequence of integers as id (rather than using the id values from the Excel tables).
Because Access Database Engine does not support common table expressions and Excel does not support VIEWs, you will have to repeat that UNION query as derived tables (hopefully the optimizer will recognize the repeat?) e.g. using a correlated subquery to get the row number:
SELECT (
SELECT COUNT(*) + 1
FROM (
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;
].[Sheet1$] AS T1
UNION
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;
].[Sheet1$] AS T1
) AS DT1
WHERE DT1.name < DT2.name
) AS id,
DT2.name, DT2.loc
FROM (
SELECT T2.name, T2.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;
].[Sheet1$] AS T2
UNION
SELECT T2.name, T2.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;
].[Sheet1$] AS T2
) AS DT2;
Note:
"i want the result set to be stored in an access database"
Then maybe you should migrate the Excel data into a staging table in your Access database and do the data scrubbing from there. At least you could put that derived table into a VIEW :)
A join combines two tables by matching the values in corresponding columns. As a result, you get a merged table that consists of the first table plus the matched rows copied from the second table. You can use the DIGBD add-in for Excel.

Resources