I currently have multiple Azure SQL databases set up on a single server, mostly with different schemas.
External queries work fine on small tables. I am currently using an INNER JOIN across multiple tables in two of the databases.
This works well for small tables with limited data sets, since the engine appears to physically copy the remote tables into a temp table before running the query.
However, when I join against a large table of roughly 500K rows, the query fails: copying a table of that size into the temp table causes a timeout.
Is there a way to execute the query without copying the joined table into a temp table first?
I have previously tried creating stored procedures on the database that holds the large table, but that database will eventually be sunset and I would be back where I am now, so I would like a longer-term solution.
Alternatively, consider consolidating your separate databases into one single database, e.g. using schemas to provide the separation. Ask yourself why the databases are split and whether they need to be. Is joining across them an occasional thing or a regular task? If it is a regular task, then consolidating them makes sense.
Alternatively, consider Azure SQL Managed Instance. It is a PaaS service, but it gives you an experience closer to traditional SQL Server: you can have multiple databases in one instance, and cross-database joins are as easy as they are in the boxed-product SQL Server.
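On a Managed Instance, a cross-database join is plain three-part naming. A minimal sketch, assuming two databases called SalesDb and ArchiveDb and made-up table names:
-- Cross-database join on a SQL Managed Instance (illustrative only;
-- the database, table, and column names are assumptions).
SELECT o.OrderId, o.Quantity, p.ProductName
FROM SalesDb.dbo.Orders AS o
INNER JOIN ArchiveDb.dbo.Products AS p
    ON o.ProductId = p.ProductId;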
I ended up using CROSS APPLY (which behaves like an inner join) and OUTER APPLY (which behaves like a left join), and moved the join logic into the WHERE clause of the applied subquery:
SELECT c.Id, a.Quantity, b.Name
FROM Table1 a
CROSS APPLY
    (SELECT * FROM Table2 WHERE Table2.x = a.x) c  -- join logic moved into the WHERE clause
INNER JOIN Table3 b ON c.x = b.x
-- further joins can be chained here in the same way
This executed the joins on the remote target without having to bring the whole table into a temp table.
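For the left-join case mentioned above, the OUTER APPLY form looks much the same. A minimal sketch, assuming the same table and column names as in the query above:
-- OUTER APPLY keeps every row of Table1 even when the applied
-- subquery returns nothing, like a LEFT JOIN (names as above).
SELECT a.Quantity, c.Id
FROM Table1 a
OUTER APPLY
    (SELECT Id FROM Table2 WHERE Table2.x = a.x) c;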
I have 100-150 Azure SQL databases with the same table schema; each database has 300-400 tables, and separate reports run against each of them.
Now I want to merge these databases into a centralized database and generate some different Power BI reports from it.
The approach I am thinking of is:
- There will be a master table on the target database holding DatabaseID and Name.
- All the tables on the target database will have a composite primary key made up of the source primary key and the DatabaseID (see the sketch below).
- There will be multiple (30-35) instances of the Azure Data Factory pipeline, and each instance will be responsible for merging data from 10-15 databases.
- These ADF pipelines will be scheduled to run weekly.
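To make the composite-key idea concrete, the target schema might look roughly like this (a sketch only; all table and column names are assumed):
-- Master table: one row per source database (names are assumed).
CREATE TABLE dbo.SourceDatabase (
    DatabaseID int NOT NULL PRIMARY KEY,
    Name nvarchar(128) NOT NULL
);

-- Example merged table: the source primary key plus DatabaseID
-- together form the composite primary key.
CREATE TABLE dbo.Orders (
    DatabaseID int NOT NULL REFERENCES dbo.SourceDatabase (DatabaseID),
    OrderID int NOT NULL,
    OrderDate datetime2 NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (DatabaseID, OrderID)
);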
Can anyone please advise whether the above approach would be feasible in this scenario, or whether there is another option we could go for?
Thanks in advance.
You are trying to create a data warehouse.
I hope you never simply merge all 150 Azure SQL databases as they are, because as soon as you try to query that beefy archive you will hit errors.
That is because Power BI, like any other tool, comes with limitations:
Limitation of Distinct values in a column: there is a 1,999,999,997 limit on the number of distinct values that can be stored in a column.
Row limitation: If the query sent to the data source returns more than one million rows, you see an error and the query fails.
Column limitation: The maximum number of columns allowed in a dataset, across all tables in the dataset, is 16,000 columns.
A data warehouse is not just a merge of ALL of your data. You need to clean the data and import only the most useful parts.
So the approach you are proposing is overall OK, but import only what you need.
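For example, the source query of each ADF copy activity could project only the columns the reports actually use and tag each row with its source database, rather than copying every table wholesale. A sketch; the table, columns, and DatabaseID value are assumptions:
-- Source query for an ADF copy activity (names and the DatabaseID
-- value are assumptions): pull only what the report needs.
SELECT 5 AS DatabaseID, OrderID, CustomerID, OrderDate, TotalAmount
FROM dbo.Orders
WHERE OrderDate >= DATEADD(day, -7, SYSUTCDATETIME());  -- weekly load window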
My Excel application exports data (6 columns by 300+ rows) to an Access database once a minute.
After exporting, I try to import two sets of older values from Access back into Excel.
The primary query is:
SELECT qryStDevPct.StDevPct, atblExcelIntraday.ValueClose
FROM atblExcelIntraday
LEFT JOIN qryStDevPct
ON atblExcelIntraday.SecurityID = qryStDevPct.SecurityID
WHERE atblExcelIntraday.RecordDate Between #2022-05-04 13:54:59# And #2022-05-04 13:55:59#
ORDER BY atblExcelIntraday.Ticker;
The internal Access qryStDevPct is:
SELECT atblExcelIntraday.SecurityID,
Avg(atblExcelIntraday.ValueClose) AS AvgP,
StDev(atblExcelIntraday.ValueClose) AS STD,
[STD]/[AvgP] AS StDevPct
FROM atblExcelIntraday
INNER JOIN atblSecuritiesOpenPriceDaily
ON atblExcelIntraday.SecurityID = atblSecuritiesOpenPriceDaily.SecurityID
GROUP BY atblExcelIntraday.SecurityID;
The price data from the table always imports correctly, but StDevPct comes back as 0 or 1 about 30% of the time.
I can fix it with a manual "Data Refresh", but neither .Calculate nor .QueryTables("qryPriceDB_1").Refresh works.
I could split it into two separate queries, but aside from the extra programming it would take more time on the Excel end, and I'm trying to keep my main procedure under 1,500 ms.
Please advise.
Business Case:
I have a list of key IDs in an excel spreadsheet. I want to use Power Query to join these IDs with a details table in a SQL Server database.
Problem
Currently, using Power Query, I only know how to import the entire table, which has more than 1 million records, and then do a left join against an existing query that targets a local table of IDs.
What I want to do is send that set of IDs in the original query so I'm not pulling back the entire table and then filtering it.
Question
Is there an example of placing an IN clause targeting a local table similar to what is shown below?
= Sql.Database("SQLServer001", "SQLDatabase001",
[Query="SELECT * FROM DTree WHERE ParentID
IN(Excel.CurrentWorkbook(){[Name="tbl_IDs"]}[Content])"])
I would first build a "Connection only" query on the Excel spreadsheet of key IDs.
Then I would start a new query by connecting to the SQL table. In that query I would add a Merge step to apply the key-ID query as an Inner Join (i.e. as a filter).
This will download the 1m rows to apply the filter, but it is surprisingly quick as this is mostly done in memory. It will only write the filtered result to an Excel table.
To improve performance, filter the rows and columns as much as you can before the Merge step.
I've got 2 connections in a worksheet, querying an .accdb. How can I get an inner join on them within Excel?
Restrictions:
One of the existing connections returns a crosstab query. Crosstab queries cannot be used as subqueries directly within an Access SQL string (the TRANSFORM keyword cannot follow the FROM keyword); the crosstab must be saved as a querydef first, and that querydef is then referenced by name (illustrated below).
Both connection strings have WHERE criteria that change dynamically and are generated on the fly, so it is not practical to open the .accdb and keep changing a crosstab querydef, especially in a multiuser environment.
Hence I stress I want the join to be done within Excel. Since all the required data gets pulled in and stored locally in the spreadsheet after a Refresh All, surely we have all we need to perform a join in Excel...
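For reference, the Access-side workaround being ruled out above would mean saving the crosstab as a querydef and joining to it by name, roughly like this (a sketch; the query, table, and column names are assumptions):
-- Works only if qryCrosstab already exists as a saved querydef in the
-- .accdb; TRANSFORM cannot appear inline after FROM (names are assumed).
SELECT d.*, x.[2022], x.[2023]
FROM tblDetails AS d
INNER JOIN qryCrosstab AS x ON d.ID = x.ID;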
I would like to allow two threads to write to a table at the same time (I know about the problem of updating the same row, but that is a separate story). I need this in order to speed up the operations in my application: one thread could write to row X while another writes to row X+n, instead of waiting for the first to finish.
So, can I lock rows instead of tables with LINQ to SQL?
Thanks.
LINQ to SQL does not lock any tables itself. It simply sends SQL statements to your SQL Server database, and it is up to SQL Server to decide how to lock. When your database has a proper design (the right indexes, etc.), SQL Server will not lock the whole table when you query a single row and then update it. Therefore, you can expect that scenario to just work.
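To illustrate what typically happens underneath at the T-SQL level: with a usable index, each session takes row-level locks, so two sessions can update different rows at the same time. A minimal sketch with assumed table and column names:
-- Session 1 (table and column names are assumptions); with an index
-- on Id, only this row is locked until the transaction commits.
BEGIN TRANSACTION;
UPDATE dbo.Items SET Quantity = Quantity + 1 WHERE Id = 10;
COMMIT;

-- Session 2 can update a different row concurrently without blocking.
BEGIN TRANSACTION;
UPDATE dbo.Items SET Quantity = Quantity + 1 WHERE Id = 42;
COMMIT;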