Stored procedure in an Azure SQL database that queries tables in another Azure SQL database

I have two Azure SQL databases, say DB1 and DB2.
DB1 has a few tables, and I want to create a stored procedure in DB2 that cross-queries the tables in DB1 with joins. I have seen examples of cross-database queries, but they mostly show only a single table. My stored procedure query looks like:
select
u.UserID as [UserID],
u.Username as [UserName],
u.LastLoginDate,
ISNULL(au.IsDeleted, 1) as [IsDeleted]
from DB1.[sec].[User] u
join DB1.[sec].[AppUser] au on (u.UserID = au.UserID)
join DB1.[sec].[Application] a on (au.ApplicationID = a.ApplicationID)
where (a.Name = 'name')

Just follow the method in this blog post and set up an external table for each remote table instead of just one. Elastic query external tables are currently the only way to do a cross-database query in Azure SQL Database.
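For reference, a minimal sketch of the elastic query setup in DB2 (the server name, credentials and column definitions below are placeholders and assumptions; adjust them to your actual schema):

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL DB1Cred
WITH IDENTITY = '<sql login>', SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE DB1Source
WITH
(
    TYPE = RDBMS,
    LOCATION = '<server>.database.windows.net',
    DATABASE_NAME = 'DB1',
    CREDENTIAL = DB1Cred
);

-- One external table per remote table referenced in the join (column lists are assumptions)
CREATE EXTERNAL TABLE [sec].[User]
(
    UserID INT,
    Username NVARCHAR(100),
    LastLoginDate DATETIME2
)
WITH (DATA_SOURCE = DB1Source, SCHEMA_NAME = 'sec', OBJECT_NAME = 'User');

CREATE EXTERNAL TABLE [sec].[AppUser]
(
    UserID INT,
    ApplicationID INT,
    IsDeleted BIT
)
WITH (DATA_SOURCE = DB1Source, SCHEMA_NAME = 'sec', OBJECT_NAME = 'AppUser');

CREATE EXTERNAL TABLE [sec].[Application]
(
    ApplicationID INT,
    Name NVARCHAR(100)
)
WITH (DATA_SOURCE = DB1Source, SCHEMA_NAME = 'sec', OBJECT_NAME = 'Application');

The stored procedure in DB2 can then join [sec].[User], [sec].[AppUser] and [sec].[Application] exactly as in the query above, without the DB1. prefix.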

Related

How to structure schemas better in Synapse SQL dedicated pool

Using Azure Synapse, dedicated SQL pool.
How can I group my tables underneath a node that represents their schema?
This is a small issue that makes a big impact when many tables and schemas are used in the database and users need to navigate to the correct schema quickly.
I tried dragging the schema over the tables section, but nothing worked.
To structure the tables, first we need to create a schema in the SQL dedicated pool using the command below:
CREATE SCHEMA <schemaName>
Then we need to create the table in the schema created above, with the required columns and suitable data types.
Table creation:
CREATE TABLE <schemaName>.<tableName> (col1 dataType, col2 dataType)
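For example, with hypothetical names:
CREATE SCHEMA sales;
GO
CREATE TABLE sales.Orders
(
    OrderId INT NOT NULL,
    OrderDate DATE
);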
I created an external table in the dedicated SQL pool in Synapse following the steps below.
Schema creation (as shown above).
Created an external data source:
CREATE EXTERNAL DATA SOURCE [DATASOURCE] WITH
(
    LOCATION = '<location>'
)
Created an external file format:
CREATE EXTERNAL FILE FORMAT [FileFormat1] WITH
(
FORMAT_TYPE = DELIMITEDTEXT
)
I created an external table with the above data source and file format using the code below:
CREATE EXTERNAL TABLE [wwi].[information2]
(
    [Id] INT
)
WITH
(
    LOCATION = '<folder/file>',
    DATA_SOURCE = [DATASOURCE],
    FILE_FORMAT = [FileFormat1]
)
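Once created, the external table can be queried like any other table in its schema, for example:
SELECT * FROM [wwi].[information2];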
In this way we can structure tables by schema in a Synapse dedicated SQL pool.

Find All the tables related to a stored procedure in Azure Synapse Datawarehouse

Is there a simple way to find all the tables referenced in a stored procedure in an Azure Synapse Analytics data warehouse, other than parsing the stored procedure code? I tried a few commands like sp_tables and sp_depends, but none of them seems to work in the Azure data warehouse.
sys.sql_expression_dependencies is supported in Azure Synapse Analytics dedicated SQL pools, but it only supports tables, views and functions at this time. A simple example:
SELECT * FROM sys.sql_expression_dependencies;
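For completeness, on platforms where procedure dependencies are tracked, the view can be filtered to a single module like this (given the limitation above, it may return no rows in a dedicated SQL pool; dbo.usp_test is the same example name used further down):
SELECT referenced_schema_name, referenced_entity_name
FROM sys.sql_expression_dependencies
WHERE referencing_id = OBJECT_ID('dbo.usp_test');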
So you are left parsing sys.sql_modules. Something like this is imperfect (i.e. it doesn't deal with schema names, square brackets, partial matches, etc.) but could serve as a starting point:
SELECT
    sm.[definition],
    OBJECT_SCHEMA_NAME(t.object_id) AS schemaName,
    OBJECT_NAME(t.object_id) AS tableName
FROM sys.sql_modules sm
CROSS JOIN sys.tables t
WHERE sm.object_id = OBJECT_ID('dbo.usp_test')
  AND sm.[definition] LIKE '%' + t.name + '%';
I actually use SQL Server Data Tools (SSDT) with dedicated SQL pools, so dependencies can't get out of step and are trackable via the project.

Handling partitioned data in Azure?

I have some containers in ADLS (Gen2), each with multiple folders. I would like a mechanism to scan those folders, infer their schemas, detect partitions, and update them in a data catalog. How do I achieve this functionality in Azure?
Sample:
- container1
---table1-folder
-----10-12-1970
-------files1.parquet
-------files2.parquet
-------files3.parquet
-----10-13-1970
-------files1.parquet
-------files2.parquet
-------files3.parquet
-----10-14-1970
-------files1.parquet
-------files2.parquet
----table2-folder
-----zipcode1
-------files1.parquet
-------files2.parquet
-------files3.parquet
-----zipcode2
-------files1.parquet
-------files2.parquet
...
So, what I expect is that the catalog will create two tables (table1 and table2), where table1 has date-based partitions (3 dates in this case) and the underlying data within that table. The same goes for table2, which will have two partitions and their underlying data.
In the AWS world, I can run a Glue crawler that crawls these files, infers schemas and partitions, and populates the Glue data catalog; later I can query the data through Athena. What's the Azure equivalent approach to achieve something similar?
I would recommend looking at Azure Synapse Analytics Serverless SQL. You can create a view which consumes the folders and does partition elimination if you follow this approach:
-- If you do not have a Master Key on your DW you will need to create one
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>' ;
GO
CREATE DATABASE SCOPED CREDENTIAL msi_cred
WITH IDENTITY = 'Managed Service Identity' ;
GO
CREATE EXTERNAL DATA SOURCE ds_container1
WITH
(
    TYPE = HADOOP,
    LOCATION = 'abfss://container1@mystorageaccount.dfs.core.windows.net',
    CREDENTIAL = msi_cred
);
GO
CREATE VIEW Table2
AS
SELECT *, f.filepath(1) AS [zipcode]
FROM OPENROWSET(
    BULK 'table2-folder/*/*.parquet',
    DATA_SOURCE = 'ds_container1',
    FORMAT = 'PARQUET'
) AS f;
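A query that filters on the derived partition column should then only read the matching folder, e.g.:
SELECT * FROM Table2 WHERE [zipcode] = 'zipcode1';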
Then set up Azure Purview as your data catalog and have it index your Synapse serverless SQL pool.

ADF Copy Data FIRE_TRIGGERS

I read that the ADF Copy Data activity uses bulk insert, but it does not fire SQL triggers.
In a SQL bulk insert statement I can specify 'FIRE_TRIGGERS' to solve this problem; is there a way to use the ADF Copy Data activity with SQL triggers?
You can use bulk insert with SQL triggers by specifying FIRE_TRIGGERS.
First, make sure you have the right permissions to use BULK commands. Grant bulk operations permissions in the SQL database to the user that ADF connects as:
GRANT ADMINISTER DATABASE BULK OPERATIONS TO [user];
ADF pipeline:
In the Copy data activity, connect the source to the source database and select the 'Query' option under the Use query property.
In the query, write the bulk insert script with FIRE_TRIGGERS.
In the sink, connect the database into which the data is copied from the source.
Source query:
BULK INSERT Sales
FROM 'Sales.csv'
WITH (
DATA_SOURCE = 'MyAzureBlobStorage',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR='\n',
FIRE_TRIGGERS);
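For context, a hypothetical AFTER INSERT trigger like the one below is what would populate SalesLog when FIRE_TRIGGERS is specified (the table and column names here are assumptions):
-- Hypothetical trigger on Sales that logs each inserted row into SalesLog
CREATE TRIGGER trgSalesInsert ON Sales
AFTER INSERT
AS
BEGIN
    INSERT INTO SalesLog (SaleId, LoggedAt)
    SELECT SaleId, SYSUTCDATETIME()
    FROM inserted;
END;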
To verify that the trigger fired, query the table it populates:
select * from SalesLog

ADF copy data activity - check for duplicate records before inserting into SQL db

I have a very simple ADF pipeline to copy data from a local MongoDB (self-hosted integration runtime) to an Azure SQL database.
My pipeline is able to copy the data from MongoDB and insert it into the SQL database.
Currently, if I run the pipeline multiple times, it inserts duplicate data.
I have made the _id column unique in the SQL database, and now running the pipeline throws an error because the SQL constraint won't let it insert the record.
How do I check for a duplicate _id before inserting into the SQL database?
Should I use a pre-copy script / stored procedure?
Some guidance / directions on where to add the extra steps would be helpful. Thanks.
Azure Data Factory Data Flow can help you achieve that.
You can follow these steps:
Add two sources: the Cosmos DB table (source1) and the SQL database table (source2).
Use a Join transformation to get all the data from the two tables (left join/full join/right join) on Cosmos table.id = SQL table.id.
Use an Alter Row transformation to filter out duplicate _id values; if a row is not a duplicate, insert it.
Then map the non-duplicate rows to the sink SQL database table.
Hope this helps.
You should implement your SQL logic to eliminate duplicates in the pre-copy script.
Currently I got a solution using a stored procedure, which looks like a lot less work as far as this requirement is concerned.
I followed this article:
https://www.cathrinewilhelmsen.net/2019/12/16/copy-sql-server-data-azure-data-factory/
I created a table type and used it in the stored procedure to check for duplicates.
My sproc is very simple, as shown below:
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[spInsertIntoDb]
(@sresults dbo.targetSensingResults READONLY)
AS
BEGIN
    MERGE dbo.sensingresults AS target
    USING @sresults AS source
    ON (target._id = source._id)
    WHEN NOT MATCHED THEN
        INSERT (_id, sensorNumber, applicationType, place, spaceType, floorCode, zoneCountNumber, presenceStatus, sensingTime, createdAt, updatedAt, _v)
        VALUES (source._id, source.sensorNumber, source.applicationType, source.place, source.spaceType, source.floorCode,
                source.zoneCountNumber, source.presenceStatus, source.sensingTime, source.createdAt, source.updatedAt, source._v);
END
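For reference, the dbo.targetSensingResults table type used as the table-valued parameter would be defined along these lines (the column data types here are assumptions):
CREATE TYPE dbo.targetSensingResults AS TABLE
(
    _id NVARCHAR(50),
    sensorNumber INT,
    applicationType NVARCHAR(50),
    place NVARCHAR(100),
    spaceType NVARCHAR(50),
    floorCode NVARCHAR(20),
    zoneCountNumber INT,
    presenceStatus NVARCHAR(20),
    sensingTime DATETIME2,
    createdAt DATETIME2,
    updatedAt DATETIME2,
    _v INT
);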
I think using a stored proc should do for now, and it will also help in the future if I need to do more transformation.
Please let me know if using a sproc in this case has any potential risks in the future.
To remove the duplicates you can use the pre-copy script. Alternatively, you can land the incremental/new data in a temp (staging) table using the copy activity, then use a stored procedure to delete from the main table only those IDs that exist in the temp table, insert the temp table data into the main table, and then drop the temp table.
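A minimal sketch of that staging approach (table names are assumptions; the copy activity is assumed to land new rows in dbo.sensingresults_staging):
CREATE PROCEDURE dbo.spMergeFromStaging
AS
BEGIN
    -- Remove rows from the main table that are about to be re-inserted
    DELETE main
    FROM dbo.sensingresults AS main
    INNER JOIN dbo.sensingresults_staging AS stg
        ON main._id = stg._id;

    -- Move everything over from staging
    INSERT INTO dbo.sensingresults
    SELECT * FROM dbo.sensingresults_staging;

    -- Clean up the staging table
    DROP TABLE dbo.sensingresults_staging;
END;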
