Execute stored procedure within Azure Logic App fails with Gateway Timeout - azure

I've been trying to develop an Azure Logic App that imports files from an FTP server and parses their contents with a stored procedure in an Azure SQL database.
Currently I'm struggling to execute this stored procedure from the Logic App; it can take up to 10 minutes to run.
I've tried a few approaches when setting up the Execute Stored Procedure action in the Logic App:
- Adding Execute stored procedure as an action with an asynchronous timeout of one hour (PT1H)
- Surrounding it with a Do-Until loop that checks the return code.
Neither of these resolves the issue. Does anyone have anything else I can try when developing this Azure Logic App?

If you can reduce the stored procedure's run time by reducing the amount of data in the tables under the JOINs, you can use pagination to get it to execute successfully from the Logic App.
For example, let's say you have a stored procedure sp_UpdateAColumn which updates columns on tableA based on JOINs with tableB, tableC and tableD.
It runs, but takes more than 2 minutes to finish because of the large number of rows in tableA.
You can reduce the time this SP takes by adding a new column isUpdated to tableA, a bit flag that defaults to 0.
Then, if you use
SELECT TOP 100 * FROM tableA WHERE isUpdated = 0
instead of the whole of tableA in the JOINs, you should be able to update those 100 rows in under two minutes.
So change the procedure definition from sp_UpdateAColumn to
sp_UpdateAColumnSomeRows(@pageSize int), and inside it, everywhere the JOINs reference tableA, use
(SELECT TOP (@pageSize) * FROM tableA WHERE isUpdated = 0) instead.
Now you need to ensure that the new SP is called enough times to process all records. For that, use a Do-Until loop in the Logic App (roughly total rows in tableA / pageSize iterations) and call the SP inside the loop.
Try tweaking the @pageSize parameter to find the optimal page size.
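A minimal T-SQL sketch of what the paged procedure could look like (the actual column being updated and the join conditions aren't shown in the question, so those parts are placeholders):
CREATE PROCEDURE sp_UpdateAColumnSomeRows (@pageSize INT)
AS
BEGIN
    SET NOCOUNT ON;
    WITH page AS
    (
        SELECT TOP (@pageSize) * FROM tableA WHERE isUpdated = 0
    )
    UPDATE p
    SET p.SomeColumn = b.SomeValue,   -- placeholder for the real update
        p.isUpdated  = 1              -- mark this page so the next call skips it
    FROM page AS p
    JOIN tableB AS b ON b.AId = p.Id; -- placeholder join condition
END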

Related

Suggested way for ADF to trigger pipeline by SQL table change

I have a tracking SQL table which has the following schema:
CREATE TABLE [dbo].[TEST_TABLE](
[id] [int] IDENTITY(1,1) NOT NULL,
[value] [nvarchar](50) NULL,
[status] [nvarchar](50) NULL,
[source] [nvarchar](50) NULL,
[timestamp] [datetime] NULL
)
My application code will automatically maintain the table by inserting record and updating the field status.
My target is to trigger an ADF pipeline based on the result of the following query:
SELECT COUNT(1) AS cnt FROM [dbo].[TEST_TABLE] WHERE [status] = 'active'
If the result is >0, then trigger an ADF pipeline.
Current status:
My current work:
1. Set up a stored procedure SP_TEST that returns 1 if the condition is met, otherwise 0 (a rough sketch follows this list).
2. Set up a pipeline like below: the result of the SP is parsed and used for routing to trigger the later stages (which mark the SQL table status 'inactive' to avoid duplicate processing).
3. Associate the pipeline with a schedule trigger that runs every 5 minutes.
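For reference, a rough sketch of what SP_TEST amounts to (the exact procedure isn't shown here):
CREATE PROCEDURE SP_TEST
AS
BEGIN
    SET NOCOUNT ON;
    -- return 1 if there is at least one 'active' row, otherwise 0
    IF EXISTS (SELECT 1 FROM [dbo].[TEST_TABLE] WHERE [status] = 'active')
        SELECT 1 AS result;
    ELSE
        SELECT 0 AS result;
END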
My current work is "working", in the sense that it detects whether there is a DB change every 5 minutes and executes the subsequent processing.
Problem:
However, the schedule trigger may be too frequent and consumes activity-run units on every execution, which could be costly. Is there any trigger like a "SQL table change trigger"?
What I have tried:
A quick Google search points me to this link, but it seems there is no answer yet.
I am also aware of storage event triggers and custom event triggers. Unfortunately, we are not permitted to create other Azure resources; only the existing ADF and SQL Server are provided to us.
Appreciate any insights/directions in advance.
Polling using ADF can be expensive, so we want to avoid that. Instead, have the polling take place within an Azure Logic App; it's much cheaper. The idea is to listen to a SQL Server DB (Azure SQL included) and then trigger an ADF pipeline when a table change is found.
On Azure Logic Apps pricing: the SQL trigger uses a standard connector, so it works out to 12.5 cents (USD) per 1,000 firings of the app and 2.5 cents (USD) per 1,000 actions triggered.
For ADF it is $1 (USD) per 1,000 activity runs, so polling from ADF is much more expensive.
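As a rough worked example using the figures above (and assuming one standard-connector poll every 5 minutes, i.e. 12 × 24 × 30 = 8,640 checks in a 30-day month): the Logic App trigger cost is about 8,640 / 1,000 × $0.125 ≈ $1.08 per month, whereas the same number of polls run as ADF activities would be about 8,640 / 1,000 × $1 ≈ $8.64 per month.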
Please let me know if you have any issues at all!

How to use synchronous messages on a RabbitMQ queue?

I have a Node.js function that needs to be executed for each order in my application. In this function my app gets an order number from an Oracle database, processes the order and then adds 1 to that number in the database (this needs to be the last thing in the function because the order can fail, in which case the number will not be used).
If all orders received at time T are processed at the same time (asynchronously), then the same order number will be used for multiple orders, and I don't want that.
So I used RabbitMQ to try to remedy this situation, since it is a queue. The processes seem to finish in the order they should, but a second process does NOT wait for the first one to finish (ack) before it begins, so in the end I have the same problem of the same order number being used multiple times.
Is there any way I can configure my queue to process one message at a time? To only start processing message n+1 once message n has been acknowledged?
This would be a life saver to me!
If the problem is to avoid duplicate order numbers, then use an Oracle sequence, or use an identity column when you insert into a table to generate the order number:
CREATE TABLE mytab (
id NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY(START WITH 1),
data VARCHAR2(20));
INSERT INTO mytab (data) VALUES ('abc');
INSERT INTO mytab (data) VALUES ('def');
SELECT * FROM mytab;
This will give:
ID DATA
---------- --------------------
1 abc
2 def
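If you would rather have the application fetch the next number explicitly, a sequence does the same job; a minimal sketch (the sequence name is just illustrative):
CREATE SEQUENCE order_number_seq START WITH 1 INCREMENT BY 1;
-- fetch the next order number when an order is created
SELECT order_number_seq.NEXTVAL FROM dual;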
If the problem is that you want orders to be processed sequentially, then don't pull an order from the queue until the previous one is finished. This will limit your throughput, so you need to understand your requirements and make some architectural decisions.
Overall, it sounds like Oracle Advanced Queuing would be a good fit. See the node-oracledb documentation on AQ.

Running a repetitive task in Node.js for each row in a postgres table on a different interval for each row

What would be a good approach to running a repetitive task for each row in a large Postgres table, on a different per-row interval, in Node.js?
To give you some more context, here's a quick description of the application:
It's a chat based customer support app.
It consists of teams, which can be either a client team or a support team. Teams have users, which can be either client users or support users.
Client users send messages to a support team and wait for one of that team's users to answer their question.
When there's an unanswered client message waiting for a response, every agent for the receiving support team will receive a notification every n seconds (n being set on a per-team basis by the team admin).
So this task needs to infinitely loop through the rows in the teams table and send notifications if:
The team has messages waiting to be answered.
N seconds have passed since the last notification was sent (N being the number of seconds set by the team admin).
There might be a better approach to this condition altogether.
So my questions are:
What is an efficient way to infinitely loop through a postgres table with no upper limit on the number of rows?
Should I load 1 row at a time? Several at a time?
What would be a good way to do this in Node?
I'm using Knex. Does Knex provide a mechanism for lazy loading a table and iterating through the rows?
A) Running a repetitive task in Node can be done with the built-in JavaScript function setInterval.
// run intervalFnc() every 5 seconds
const timerId = setInterval(intervalFnc, 5000);
function intervalFnc() { console.log("Hello"); }
// to stop running it:
clearInterval(timerId);
Then your interval function can do the actual work. An alternative would be to use cron (Linux) or some other OS process scheduler to trigger the function. I would use setInterval if you want to run it every minute, and a cron job if you want to run it every hour (in between these times it becomes more debatable).
B) An efficient way...
B-1) Retrieving a block of records from a DB will be more efficient than one at a time. Knex has .offset and .limit clauses to choose a group of records to retrieve. A sample from the knex doc:
knex.select('*').from('users').limit(10).offset(30)
B-2) Database indexed access is important for performance if your tables are very large. I would recommend including a status-flag field in your table to note which records are 'in-process', and also a next_review_timestamp field, with both fields indexed. Retrieve the records that have status_flag = 'in-process' AND next_review_timestamp <= now(). Sample:
knex('users').where('status_flag', 'in-process').whereRaw('next_review_timestamp <= now()')
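The supporting index could be as simple as the following (a sketch assuming the column names above; the index name is just illustrative):
CREATE INDEX idx_users_status_next_review ON users (status_flag, next_review_timestamp);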
Hope this helps!

Inserting 1000 rows into Azure Database takes 13 seconds?

Can anyone please tell me why it might be taking 12+ seconds to insert 1000 rows into a SQL database hosted on Azure? I'm just getting started with Azure, and this is (obviously) absurd...
Create Table xyz (ID int primary key identity(1,1), FirstName varchar(20))
GO
create procedure InsertSomeRows as
set nocount on
Declare @StartTime datetime = getdate()
Declare @x int = 0;
While @x < 1000
Begin
insert into xyz (FirstName) select 'john'
Set @x = @x + 1;
End
Select count(*) as Rows, DateDiff(SECOND, @StartTime, GetDate()) as SecondsPassed
from xyz
GO
Exec InsertSomeRows
Exec InsertSomeRows
Exec InsertSomeRows
GO
Drop Table xyz
Drop Procedure InsertSomeRows
Output:
Rows SecondsPassed
----------- -------------
1000 11
Rows SecondsPassed
----------- -------------
2000 13
Rows SecondsPassed
----------- -------------
3000 14
It's likely the performance tier you are on that is causing this. With a Standard S0 tier you only have 10 DTUs (Database throughput units). If you haven't already, read up on the SQL Database Service Tiers. If you aren't familiar with DTUs it is a bit of a shift from on-premises SQL Server. The amount of CPU, Memory, Log IO and Data IO are all wrapped up in which service tier you select. Just like on premises if you start to hit the upper bounds of what your machine can handle things slow down, start to queue up and eventually start timing out.
Run your test again just as you have been doing, but use the Azure Portal to watch the DTU % used while the test is underway. If you see that the DTU % is getting maxed out, then the issue is that you've chosen a service tier that doesn't have enough resources to handle the load you've applied without slowing down. If the speed isn't acceptable, move up to the next service tier until it is. You pay more for more performance.
I'd recommend not paying too close attention to the service tier based on this test, but rather on the actual load you want to apply to the production system. This test will give you an idea and a better understanding of DTUs, but it may or may not represent the actual throughput you need for your production loads (which could be even heavier!).
Don't forget that in Azure SQL DB you can also scale your Database as needed so that you have the performance you need but can then back down during times you don't. The database will be accessible during most of the scaling operations (though note it can take a time to do the scaling operation and there may be a second or two of not being able to connect).
Two factors made the biggest difference. First, I wrapped all the inserts into a single transaction. That got me from 100 inserts per second to about 2500. Then I upgraded the server to a PREMIUM P4 tier and now I can insert 25,000 per second (inside a transaction.)
It's going to take some getting used to using an Azure server and what best practices give me the results I need.
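For reference, a minimal sketch of the transaction-wrapped version of the test procedure above (the procedure name is just illustrative):
create procedure InsertSomeRowsInTran as
set nocount on
Declare @StartTime datetime = getdate()
Declare @x int = 0;
begin transaction
While @x < 1000
Begin
insert into xyz (FirstName) select 'john'
Set @x = @x + 1;
End
commit transaction
Select count(*) as Rows, DateDiff(SECOND, @StartTime, GetDate()) as SecondsPassed
from xyz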
My theory: Each insert is one log IO. Here, this would be 100 IOs/sec. That sounds like a reasonable limit on an S0. Can you try with a transaction wrapped around the inserts?
So wrapping the inserts in a single transaction did indeed speed this up. Inside the transaction it can insert about 2,500 rows per second.
So that explains it. Now the results are no longer catastrophic. I would now advise looking at metrics such as the Azure dashboard DTU utilization and wait stats. If you post them here I'll take a look.
One way to improve performance is to look at the wait stats of the query.
Looking at wait stats will show you the exact bottleneck while a query is running; in your case it turned out to be log IO. Look here to learn more about this approach: SQL Server Performance Tuning Using Wait Statistics.
I also recommend changing the while loop to something set-based, if this is not a pseudo query and you are running it very often.
Set-based solution:
create proc usp_test
(
@n int
)
as
begin
begin try
begin tran
insert into yourtable
select n, 'John'
from numbers
where n < @n
commit tran
end try
begin catch
if @@trancount > 0 rollback tran
--handle/log errors here
end catch
end
You will have to create a numbers table for this to work.
I had terrible performance problems with updates & deletes in Azure until I discovered a few techniques:
Copy data to a temporary table and make updates in the temp table, then copy back to a permanent table when done.
Create a clustered index on the table being updated (partitioning didn't work as well)
For inserts, I am using bulk inserts and getting acceptable performance.

C# 2 instances of same app reading from same SQL table, each row processed once

I'm writing a Windows Service in C# on .NET 4.0 to fulfil the following functionality:
At a set time every night the app connects to SQL Server, opens a User table and, for each record, retrieves the user's IP address, does a WCF call to the user's PC to determine if it's available for transacting, and inserts a record into a State table (with y/n and the error if there is one).
Once all users have been processed, the app then reads each record in the State table where IsPcAvailable = true, retrieves a list of reports for that user from another table and, for each report, fetches the report from the enterprise document repository, calls the user's PC via WCF to push the report onto their hard drive, then updates the State table with its success.
The above scenario is easy enough to code if it's single threaded and running on one app server, but due to redundancy and performance there will be at least 2 app servers doing exactly the same thing at the same time.
So how do I make sure that each user is processed only once, first in the User table and then in the State table (same problem), given that fetching the reports and pushing them out to PCs all across the country is a lengthy process? Optimally the app should be multithreaded, so for example having 10 threads running on 2 servers processing all the users.
I would prefer a C# solution as I'm not a database guru :) The closest I've found to my problem is:
SQL Server Process Queue Race Condition - it uses SQL code
and multithreading problems with Entity Framework; I'm probably going to have to go one layer down and use ADO.NET?
I would recommend using the techniques at http://rusanu.com/2010/03/26/using-tables-as-queues/ That's an excellent read for you at this time.
Here is some SQL for a FIFO queue:
create procedure usp_dequeueFifo
as
set nocount on;
with cte as (
select top(1) Payload
from FifoQueue with (rowlock, readpast)
order by Id)
delete from cte
output deleted.Payload;
go
And one for a heap (order does not matter)
create procedure usp_dequeueHeap
as
set nocount on;
delete top(1) from HeapQueue with (rowlock, readpast)
output deleted.payload;
go
This reads so beautifully it's almost poetry.
You could simply have each application server poll a common table (work_queue). You can use a common table expression to read/update a row so the servers don't step on each other.
;WITH t AS
(
SELECT TOP 1 *
FROM work_queue WITH (ROWLOCK, READPAST)
WHERE NextRun <= GETDATE()
AND IsPcAvailable = 1
)
UPDATE t
SET
IsProcessing = 1,
MachineProcessing = 'TheServer'
OUTPUT INSERTED.*
Now you could have a producer thread in your application checking for unprocessed records periodically. Once that thread finishes its work, it pushes the item into a ConcurrentQueue and consumer threads can process the work as it becomes available. You can set the number of consumer threads yourself to the optimum level. Once a consumer thread is done, it simply sets IsProcessing = 0 to show that the PC was updated.
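The completion step in that last sentence could be as simple as the following (a sketch assuming work_queue has a key column, say Id, which isn't shown above; the consumer takes the key value from the row it received via OUTPUT INSERTED.*):
UPDATE work_queue
SET IsProcessing = 0
WHERE Id = @Id; -- @Id is the key of the row this consumer dequeued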
