Force replication of replicated tables - Azure

Some of my tables are of type REPLICATE. I would like these tables to actually be replicated (not pending) before I start querying my data. This will help me avoid data movement.
I have a script, which I found online, that runs in a loop and does a SELECT TOP 1 on all the tables that are set for replication, but sometimes the script runs for hours. It seems as if the server sometimes won't trigger replication even if you do a SELECT TOP 1 FROM foo.
How can you force SQL Data Warehouse to complete replication?
The script looks something like this:
BEGIN
    -- Collect one warm-up query per replicated table whose cache is not built yet.
    CREATE TABLE #tbl
    WITH (DISTRIBUTION = ROUND_ROBIN)
    AS
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Sequence,
           CONCAT('SELECT TOP(1) * FROM ', s.name, '.', t.[name]) AS sql_code
    FROM sys.pdw_replicated_table_cache_state AS p
    JOIN sys.tables AS t
        ON t.object_id = p.object_id
    JOIN sys.schemas AS s
        ON t.schema_id = s.schema_id
    WHERE p.[state] = 'NotReady';

    DECLARE @nbr_statements INT = (SELECT COUNT(*) FROM #tbl),
            @i INT = 1;

    -- Touch each table once; reading a replicated table triggers the cache build.
    WHILE @i <= @nbr_statements
    BEGIN
        DECLARE @sql_code NVARCHAR(4000) = (SELECT sql_code FROM #tbl WHERE Sequence = @i);
        EXEC sp_executesql @sql_code;
        SET @i += 1;
    END;

    DROP TABLE #tbl;

    -- Poll until no table is left in the 'NotReady' state.
    SET @i = 0;
    WHILE
    (
        SELECT TOP (1) p.[state]
        FROM sys.pdw_replicated_table_cache_state AS p
        JOIN sys.tables AS t
            ON t.object_id = p.object_id
        JOIN sys.schemas AS s
            ON t.schema_id = s.schema_id
        WHERE p.[state] = 'NotReady'
    ) = 'NotReady'
    BEGIN
        IF @i % 100 = 0
        BEGIN
            RAISERROR('Replication in progress', 0, 0) WITH NOWAIT;
        END;
        SET @i = @i + 1;
    END;
END

Henrik, if 'select top 1' doesn't trigger a replicated table build, then that would be a defect. Please file a support ticket.
Without looking at your system, it is impossible to know exactly what is going on. Here are a couple of things that could be factoring into extended build times:
The replicated tables are large (size, not necessarily rows) requiring long build times.
There are a lot of secondary indexes on the replicated table requiring long build times.
Replicated table builds require staticrc20 (2 concurrency slots). If the slots are not available, the build will queue behind other running queries (a quick way to check for this is sketched after this list).
The replicated tables are constantly being modified with inserts, updates and deletes. Modifications require the table to be built again.
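
One way to check whether builds are queuing: the cache rebuild shows up as its own request in sys.dm_pdw_exec_requests. A hedged sketch; the 'BuildReplicatedTableCache' command label is an assumption about how the rebuild is reported, so adjust the filter to whatever you see in the command column:
-- Look for replicated-table cache builds and their status / resource class.
SELECT r.request_id, r.[status], r.command, r.resource_class, r.total_elapsed_time
FROM sys.dm_pdw_exec_requests AS r
WHERE r.command LIKE 'BuildReplicatedTableCache%'
ORDER BY r.start_time DESC;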

The best way is to run a command like this as part of the job which creates/updates the table:
select top 1 * from <table>
That will force the replicated cache to be rebuilt at the correct time, without the slow loop through the stored procedure.
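To verify the caches are actually built after the warm-up queries, you can query the same catalog view the script above uses:
-- List each replicated table and its cache state ('Ready' / 'NotReady').
SELECT s.name AS schema_name, t.name AS table_name, p.[state]
FROM sys.pdw_replicated_table_cache_state AS p
JOIN sys.tables AS t ON t.object_id = p.object_id
JOIN sys.schemas AS s ON t.schema_id = s.schema_id
ORDER BY s.name, t.name;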

Related

SQL Azure Schema issue error code 208 with temporary table

I have a couple of stored procedures that create different temporary tables.
At the end of the procedure I drop them (I know that is not required, but it's good practice).
The stored procedures are executed as part of an SSIS package. I have four different SQL jobs that execute the same SSIS package, running in parallel.
When logging into the Azure portal and using the performance recommendation feature, I get a recommendation to fix the schema issues. It states SQL error code 208. According to the documentation, that means "object not found".
Temporary tables are valid within the scope of the stored procedure and get a unique name in the database, so I do not think there are any conflicts.
I have no idea what causes this, and the stored procedures seem to work alright. Does anyone know what the cause could be here?
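As a side note, you can observe that per-session uniqueness yourself: local temp tables live in tempdb under a padded, suffixed name, so two sessions creating #tmpTransEan do not collide. A minimal sketch, run while the procedures are executing:
-- Each session's #tmpTransEan appears with its own padded, suffixed name.
SELECT name, create_date
FROM tempdb.sys.tables
WHERE name LIKE '#tmpTransEan%';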
Simplified sample of one of the procedures:
-- (procedure header, parameters and opening BEGIN omitted from this simplified sample)
SET NOCOUNT ON;

CREATE TABLE #tmpTransEan
(
    Ean_Art_Str_id BIGINT,
    Artikler_id BIGINT
);

INSERT INTO #tmpTransEan
    (Ean_Art_Str_id, Artikler_id)
SELECT DISTINCT
    eas.Ean_Art_Str_id,
    a.Artikler_id
FROM dbo.Artikkel_Priser ap
JOIN Ean_Art_Str eas ON eas.artikler_id = ap.Artikler_id
JOIN wsKasse_Til_Kasselogg ktk ON eas.Ean_Art_Str_id = ktk.ID_Primary
JOIN dbo.Artikler a ON a.Artikler_id = eas.artikler_id
JOIN dbo.Felles_Butikker b ON b.Butikker_id = ap.butikker_id
WHERE ktk.ID_Table = OBJECT_ID('Ean_Art_Str')
    AND LEN(a.Artikkelnr) >= 8
    AND ktk.Tidspunkt >= @tidspunkt
    AND ( ( ap.butikker_id = @nButikker_id1
            AND @Alle_artikler_til_kasse = 'N' )
          OR ( b.Databaser_id = @Databaser_id
               AND @Alle_artikler_til_kasse = 'J' ) )
    AND b.Akt_kode = 'A'
    AND a.Akt_kode = 'A'
    AND a.Databaser_id IN ( -1, @Databaser_id );

SELECT DISTINCT
    a.Artikkelnr,
    s.Storrelse,
    eas.*,
    EAN_12 = LEFT(eas.EAN_13, 12)
FROM dbo.Ean_Art_Str eas
JOIN #tmpTransEan t ON t.Artikler_id = eas.artikler_id
JOIN Artikler a ON a.Artikler_id = eas.artikler_id
JOIN dbo.Felles_Storrelser s ON s.Storrelser_id = eas.storrelser_id;

DROP TABLE #tmpTransEan;
END;

Is there multithreading in Oracle PL/SQL?

Now I run data transformations sequentially in my PL/SQL ETL scripts.
I've noticed that some transformations do not depend on each other.
Is there a way to open a thread in PL/SQL and offload some of the processing so the job finishes faster?
I tried some performance options, including setting the degree of parallelism, but it is capped per session.
I cannot use Java, and I cannot design jobs outside Oracle - it has to be PL/SQL.
Since you have already tried the degree-of-parallelism settings, there is another way to run some of your ETL operations in parallel: you have a "task" that waits for its child threads, and each thread processes only its own "chunk" (a subset of all records). Since Oracle 11g Release 2 there is a package, DBMS_PARALLEL_EXECUTE, which manages this. Here is a simple example.
Preparing test data:
create table test_parallel_processing(
    col_id int,
    col    number
);

begin
    for i in 1..1000 loop
        execute immediate 'insert into test_parallel_processing
                           select :p1, 100 from dual' using i;
    end loop;
    commit;
end;
Process all 1000 records in 4 chunks with parallel degree of 2:
declare
    v_sql          varchar2(1000);
    v_chunking_sql varchar2(1000);
begin
    /* we can use our own logic for dividing the whole scope into chunks: */
    v_chunking_sql := ' select 1 as start_id, 250 as end_id from dual
                        union all
                        select 251 as start_id, 500 as end_id from dual
                        union all
                        select 501 as start_id, 750 as end_id from dual
                        union all
                        select 751 as start_id, 1000 as end_id from dual';

    DBMS_PARALLEL_EXECUTE.CREATE_TASK('mytask');

    /* creating chunks with our sql */
    DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL(
        task_name => 'mytask',
        sql_stmt  => v_chunking_sql,
        by_rowid  => false
    );

    /* SQL which will be performed for each chunk */
    v_sql := 'update test_parallel_processing
              set col = col*2
              where col_id between :start_id and :end_id';

    /* running the task with any parallel degree, for example, 2 */
    DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', v_sql, DBMS_SQL.NATIVE,
                                   parallel_level => 2);

    /* I leave this commented out, because dropping the task would also delete
       the information about the task and its chunks from the views
       USER_PARALLEL_EXECUTE_TASKS and USER_PARALLEL_EXECUTE_CHUNKS */
    -- DBMS_PARALLEL_EXECUTE.drop_task('mytask');
end;
You can monitor the state and result of the processing using these views:
USER_PARALLEL_EXECUTE_TASKS
USER_PARALLEL_EXECUTE_CHUNKS
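
For example, after running the task you could check how each chunk went (a small sketch; 'mytask' is the task name created above):
-- Overall task status:
select task_name, status
from user_parallel_execute_tasks
where task_name = 'mytask';

-- Per-chunk status, with any error captured for failed chunks:
select chunk_id, status, start_id, end_id, error_message
from user_parallel_execute_chunks
where task_name = 'mytask'
order by chunk_id;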

Azure SQL Data Warehouse hanging or not responding to simple query after large BCP operation

I have a preview version of Azure SQL Data Warehouse which was working fine until I imported a large table (~80 GB) through BCP. Now none of the tables, including small ones, respond even to a simple query:
select * from <MyTable>
Queries against system tables still work:
select * from sys.objects
The BCP process was left running over the weekend, so any statistics update should have been done by now. Is there any way to figure out what is making this happen? Or at least to see what is currently running, to check whether anything is blocking?
I'm using SQL Server Management Studio 2014 to connect to the Data Warehouse and executing queries.
@user5285420 - run the code below to get a good view of what's going on. You should be able to find the query easily by looking at the value in the "command" column. Can you confirm if the BCP command still shows as status="Running" when the query steps are all complete?
select top 50
(case when requests.status = 'Completed' then 100
when progress.total_steps = 0 then 0
else 100 * progress.completed_steps / progress.total_steps end) as progress_percent,
requests.status,
requests.request_id,
sessions.login_name,
requests.start_time,
requests.end_time,
requests.total_elapsed_time,
requests.command,
errors.details,
requests.session_id,
(case when requests.resource_class is NULL then 'N/A'
else requests.resource_class end) as resource_class,
(case when resource_waits.concurrency_slots_used is NULL then 'N/A'
else cast(resource_waits.concurrency_slots_used as varchar(10)) end) as concurrency_slots_used
from sys.dm_pdw_exec_requests AS requests
join sys.dm_pdw_exec_sessions AS sessions
on (requests.session_id = sessions.session_id)
left join sys.dm_pdw_errors AS errors
on (requests.error_id = errors.error_id)
left join sys.dm_pdw_resource_waits AS resource_waits
on (requests.resource_class = resource_waits.resource_class)
outer apply (
select count (steps.request_id) as total_steps,
sum (case when steps.status = 'Complete' then 1 else 0 end ) as completed_steps
from sys.dm_pdw_request_steps steps where steps.request_id = requests.request_id
) progress
where requests.start_time >= DATEADD(hour, -24, GETDATE())
ORDER BY requests.total_elapsed_time DESC, requests.start_time DESC
Check out the resource utilization, and possibly other issues, from https://portal.azure.com/
You can also run sp_who2 from SSMS to get a snapshot of what threads are active and whether there's some crazy blocking chain that's causing problems.
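
On the data warehouse side you can also look at the wait DMV directly. A hedged sketch, assuming sys.dm_pdw_waits is available on your preview instance:
-- Sessions waiting on objects; a long-lived, non-granted wait on a user table
-- would point at the blocker.
SELECT w.session_id, w.type, w.object_type, w.object_name, w.request_id, w.[state]
FROM sys.dm_pdw_waits AS w
ORDER BY w.session_id;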

Efficiency of using multiple subselects

I'm trying to gather multiple related pieces of data for a master account and create a view (e.g. overdue balance, account balance, debt recovery status, interest hold). Will this approach be efficient? The database platforms are Informix, Oracle and SQL Server. Looking at the query statistics on Informix, I'm getting just one sequential scan of auubmast. I assume the sub-selects are quite efficient because they filter down to the account number immediately. I may need many more sub-selects before I'm finished. Besides the question of efficiency, are there any other 'tidy' approaches?
Thank you.
select
    auubmast.acc_num,
    auubmast.cls_cde,
    auubmast.acc_typ,
    (select sum(auubtrnh.trn_bal)
     from auubtrnh, aualtrcd
     where aualtrcd.trn_cde = auubtrnh.trn_cde
       and auubtrnh.acc_num = auubmast.acc_num
       and (auubtrnh.due_dte < current or aualtrcd.trn_typ = 'I')
    ) as ovd_bal,
    (select sum(auubytdb.ytd_bal)
     from auubytdb, auubsvgr
     where auubytdb.acc_num = auubmast.acc_num
       and auubsvgr.svc_grp = auubmast.svc_grp
       and auubytdb.bil_yer = auubsvgr.bil_yer
    ) as acc_bal,
    (select max(cur_stu)
     from audemast
     where mdu_acc = auubmast.acc_num
       and mdu_ref = 'UB'
    ) as drc_stu,
    (select hol_typ
     from aualhold
     where mdu_acc = auubmast.acc_num
       and mdu_ref = 'UB'
       and pro_num = 2601
       and (hol_til is null or hol_til > current)
    ) as int_hld
from auubmast
In general, the answer to this is that correlated subqueries should be avoided whenever possible.
Using them will result in a full table scan for your view, which is bad. The only time you want to use subqueries like this is if you can limit the range of the main select to only a few rows, or if there really is no other choice.
When you run into situations like this, consider adding columns and precalculating them with an update trigger rather than using subqueries; that will save your database a thrashing. A join-based rewrite is another option, sketched below.
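A hedged sketch of that join-based rewrite for the overdue-balance column only, using the tables from the original query (whether the optimizer does better with this varies by platform):
-- Pre-aggregate the transactions once, then outer-join to the master table,
-- instead of re-running a correlated sub-select per account row.
select
    m.acc_num,
    m.cls_cde,
    m.acc_typ,
    ovd.ovd_bal
from auubmast m
left join (
    select t.acc_num, sum(t.trn_bal) as ovd_bal
    from auubtrnh t
    join aualtrcd c on c.trn_cde = t.trn_cde
    where t.due_dte < current or c.trn_typ = 'I'
    group by t.acc_num
) ovd on ovd.acc_num = m.acc_num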

How to optimize DELETE .. NOT IN .. SUBQUERY in Firebird

I have this kind of delete query:
DELETE
FROM SLAVE_TABLE
WHERE ITEM_ID NOT IN (SELECT ITEM_ID FROM MASTER_TABLE)
Is there any way to optimize this?
You can use EXECUTE BLOCK to sequentially scan the detail table and delete the records that have no matching master record.
EXECUTE BLOCK
AS
    DECLARE VARIABLE C CURSOR FOR
        (SELECT d.id
         FROM detail d
         LEFT JOIN master m ON d.master_id = m.id
         WHERE m.id IS NULL);
    DECLARE VARIABLE I INTEGER;
BEGIN
    OPEN C;
    WHILE (1 = 1) DO
    BEGIN
        FETCH C INTO :I;
        IF (ROW_COUNT = 0) THEN
            LEAVE;
        DELETE FROM detail WHERE id = :I;
    END
    CLOSE C;
END
(NOT) IN can usually be optimized by using (NOT) EXISTS instead. Note that the subquery must be correlated with the outer table; left unqualified, ITEM_ID would resolve to MASTER_TABLE's own column and the predicate would compare the column to itself:
DELETE
FROM SLAVE_TABLE
WHERE NOT EXISTS (SELECT 1 FROM MASTER_TABLE M WHERE M.ITEM_ID = SLAVE_TABLE.ITEM_ID)
I am not sure what you are trying to do here, but to me this query indicates that you should be using foreign keys to enforce these kinds of constraints, not running queries to clean up the mess afterwards.
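For example, a cascading foreign key would keep the slave rows from outliving their master rows in the first place. A sketch; the constraint name is made up, and it assumes MASTER_TABLE.ITEM_ID is its primary key:
-- FK_SLAVE_MASTER is a hypothetical name; any unique name works.
ALTER TABLE SLAVE_TABLE
    ADD CONSTRAINT FK_SLAVE_MASTER
    FOREIGN KEY (ITEM_ID) REFERENCES MASTER_TABLE (ITEM_ID)
    ON DELETE CASCADE;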
