ADF Stored Procedure call takes too long - Azure

I have an Azure Data Factory loop activity that executes a stored procedure on every iteration, passing two JSON objects as arguments.
The stored procedure reads the JSON objects with the OPENJSON function, using CROSS APPLY a few times, then builds a #temp table and, after some SQL massaging, inserts the data from the #temp table into a static SQL table.
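A simplified sketch of that pattern, with purely illustrative procedure, table, and JSON property names (not the actual procedure):

CREATE PROCEDURE dbo.LoadFromJson      -- hypothetical name
    @Header NVARCHAR(MAX),             -- first JSON argument
    @Lines  NVARCHAR(MAX)              -- second JSON argument
AS
BEGIN
    -- Shred both JSON objects into a #temp table.
    SELECT h.OrderId, l.ItemId, l.Qty
    INTO #staging
    FROM OPENJSON(@Header) WITH (OrderId INT '$.orderId') AS h
    CROSS APPLY OPENJSON(@Lines) WITH (ItemId INT '$.itemId', Qty INT '$.qty') AS l;

    -- ...some SQL massaging of #staging happens here...

    -- Finally, insert into the static target table (assumed to exist).
    INSERT INTO dbo.OrderLines (OrderId, ItemId, Qty)
    SELECT OrderId, ItemId, Qty
    FROM #staging;
END;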
If I run the same task from SQL Server Management Studio it takes no time at all; I can hit the execute button 20-50 times and the total time always shows as 0.
However, in Data Factory every execution of the activity that calls the stored procedure takes 5-6 seconds, which inside a loop adds up to a very long time.
Is there any way to tune this so that every execution takes less time?

I also tried with a sample stored procedure that executes in 0 seconds in SSMS and takes 4-6 seconds in ADF. If your stored procedure is taking 5-6 seconds, that is a normal execution time for the Azure Data Factory Stored Procedure activity.
As a workaround to reduce the time taken, you can break the work up into a series of sub-procedures, wrap them in a master procedure, and call that master procedure from Data Factory.
This approach reduces the time taken by the procedure in one go by doing the work in sub-procedures. If it is still taking longer than expected, please check this link, which explains how to raise a support ticket.
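A minimal sketch of that split, assuming hypothetical procedure and table names (the real division depends on what the original procedure actually does):

-- Staging and target tables used by the sub-procedures (illustrative).
CREATE TABLE dbo.Staging (Payload NVARCHAR(MAX));
CREATE TABLE dbo.Target  (Payload NVARCHAR(MAX));
GO

-- Sub-procedure 1: parse/massage step. In practice this is where the
-- OPENJSON shredding from the question would live (simplified here).
CREATE PROCEDURE dbo.Load_Prepare
    @Header NVARCHAR(MAX),
    @Lines  NVARCHAR(MAX)
AS
BEGIN
    INSERT INTO dbo.Staging (Payload)
    SELECT value FROM OPENJSON(@Lines);   -- simplified stand-in
END;
GO

-- Sub-procedure 2: move the prepared rows into the target table.
CREATE PROCEDURE dbo.Load_Finalize
AS
BEGIN
    INSERT INTO dbo.Target (Payload)
    SELECT Payload FROM dbo.Staging;

    TRUNCATE TABLE dbo.Staging;
END;
GO

-- Master procedure: the only thing the ADF Stored Procedure activity calls.
CREATE PROCEDURE dbo.Load_Master
    @Header NVARCHAR(MAX),
    @Lines  NVARCHAR(MAX)
AS
BEGIN
    EXEC dbo.Load_Prepare @Header = @Header, @Lines = @Lines;
    EXEC dbo.Load_Finalize;
END;
GO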

Related

How to sequence stored procedure process on Azure Data Factory?

I have a stored procedure that has a DateTime parameter, and I want to execute the pipeline sequentially, so it runs first for 'JAN-FEB', then 'MAR-APR', then 'MEI-JUN'.
How can I do that without hard-coding the pipeline for each stored procedure call?
For example: I have three stored procedure activities with different DateTime values, and I don't want them to run like this.
What can I do to solve this? Which Azure Data Factory feature can I use for this case?
Note:
The reason I want to sequence these stored procedure calls is to prevent Azure Synapse from crashing. The script will process around 1 billion rows from the source table, and I need to batch the process to prevent errors.
First, you can define an array-type variable in ADF, e.g. ['JAN-FEB','MAR-APR','MEI-JUN'].
Traverse this array with a ForEach activity. Select Sequential, which makes the inner activities run one iteration at a time. Add dynamic content and select your declared variable name.
Inside the ForEach activity, add a Stored Procedure activity; clicking Import will import the parameters of your stored procedure. Then set the parameter value with the dynamic content @item().
ADF will execute the stored procedure sequentially.
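On the SQL side, a hypothetical shape for such a procedure, taking a single DateTime parameter as the batch start so each sequential iteration only touches one slice of the ~1 billion source rows (all object names here are illustrative):

CREATE PROCEDURE dbo.LoadSalesByPeriod   -- hypothetical name
    @PeriodStart DATETIME                -- one DateTime parameter per batch
AS
BEGIN
    -- Each call processes only a two-month slice starting at @PeriodStart,
    -- so the source rows are handled batch by batch instead of all at once.
    INSERT INTO dbo.SalesSummary (SaleDate, Amount)   -- hypothetical target
    SELECT SaleDate, Amount
    FROM dbo.SalesSource                              -- hypothetical source
    WHERE SaleDate >= @PeriodStart
      AND SaleDate <  DATEADD(MONTH, 2, @PeriodStart);
END;

With this shape, the ForEach array could hold the actual period start dates rather than plain labels, and the Stored Procedure activity would pass @item() into @PeriodStart on each iteration.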

Aggregate function taking too long to execute in PostgreSQL

I have recently deployed a PostgreSQL database on a Linux server.
One of the stored procedures is taking around 20 to 24 seconds. I have executed the same stored procedure against a blank database as well (no rows returned) and it takes the same time. I found that the slowness is caused by an aggregate function.
If I remove the ARRAY_TO_JSON(ARRAY_AGG( wrapper, the result is fetched within a second.
Below is my code snippet:
SELECT ARRAY_TO_JSON(ARRAY_AGG(ROW_TO_JSON(A)))
FROM (
    SELECT billservice.billheaderid, billservice.billserviceid AS billserviceid, ... (around 120 columns in the select) ...
) A;
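To make the comparison concrete, here is a minimal stand-in with only two columns in place of the real ~120-column SELECT (the actual table definition is not shown in the question):

-- Slow form, as reported above:
SELECT ARRAY_TO_JSON(ARRAY_AGG(ROW_TO_JSON(a)))
FROM (
    SELECT billservice.billheaderid, billservice.billserviceid
    FROM billservice
) a;

-- The same inner query without the JSON aggregation, which reportedly
-- returns within a second:
SELECT billservice.billheaderid, billservice.billserviceid
FROM billservice;

For what it's worth, json_agg(a) over the same subquery produces the equivalent JSON array in a single aggregate and may be worth comparing against the ARRAY_TO_JSON(ARRAY_AGG(ROW_TO_JSON(a))) form.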
Explain execution plan: [screenshot]
Previously I had deployed the PostgreSQL database on a Windows server, and the same stored procedure took only around 1 to 1.5 seconds.
In both cases I tested with the same database and the same amount of data, and both servers have the same configuration (RAM, processor) as well as the same PostgreSQL configuration.
While the stored procedure executes on the Linux server, CPU usage goes to 100%.
Let me know if you have any solution for this.

Azure Data Factory and Data Flow taking too much time to process data from staging to database

I have a data factory that runs every day; it selects around 80M records from an on-premises Oracle database and moves them to a Parquet file, which takes around 2 hours. I want to speed up this process, and also the data flow that inserts and updates the data in the database.
[screenshot: Parquet file settings]
The next step takes the Parquet file and calls the data flow, which upserts the data into the database, but this also takes too much time.
[screenshot: data flow settings]
Let me know which compute type to use for the data flow:
Memory Optimized
Compute Optimized
General Purpose
Update after switching to round-robin partitioning:
[screenshot: sink time]
Can you open the monitoring detailed execution plan for the data flow? Click on each stage in your data flow and look to see where the bulk of the time is being spent. You should see at the top of the view how much time was spent setting up the compute environment and how much time was taken to read your source, and also check the total write time on your sinks.
I have some examples of how to view and optimize this here.
Well, I would surmise that 45 min to stuff 85M rows into a SQL DB is not horrible. You can break the task down into chunks and see what's taking the longest time to complete. Do you have access to Databricks? I do a lot of pre-processing with Databricks, and I have found Spark to be super-super-fast!! If you can pre-process in Databricks and push everything into your SQL world, you may have an optimal solution there.
As per the documentation (https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-performance#partitioning-on-sink), can you try modifying your partition settings under the Optimize tab of your sink?
I faced a similar issue with the default partitioning setting, where the data load was taking 30+ minutes for just 1M records; after changing the partition strategy to round robin and setting the number of partitions to 5 (in my case), the load completes in less than a minute.
Try experimenting with both the source partition settings (https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-performance#partitioning-on-source) and the sink partition settings to come up with the optimal strategy. That should improve the data load time.

Run VoltDB stored procedures at regular interval from VoltDB

Is there any way to execute VoltDB stored procedures at a regular interval, or to schedule a stored procedure to run at a specific time?
I am exploring VoltDB to move our product from an RDBMS to VoltDB. Our product is written in Java.
Most of the queries can be migrated into VoltDB stored procedures, but our product has cron jobs in Oracle that execute at regular intervals, and I cannot find such a feature in VoltDB.
I know VoltDB stored procedures can be called from the application at a regular interval, but our product is deployed in an active-active mode, so every application instance would call the stored procedure at that interval, which is not a good solution; otherwise we would have to develop some mechanism to run the procedure from only one instance.
So it would be good to have a cron-job-like feature in VoltDB.
I work at VoltDB. There isn't currently a feature like this in VoltDB, such as DBMS_JOB in Oracle.
You could certainly use a cron job on one of the servers in your cluster, or on some other server within your network, that invokes sqlcmd to run a script, or echoes individual SQL statements or execute-procedure commands through sqlcmd to the database. Making cron jobs highly available is a general problem. You might find these other discussions helpful:
How to convert Linux cron jobs to "the Amazon way"?
https://www.reddit.com/r/linuxadmin/comments/3j3bz4/run_cronjob_only_on_one_node_in_cluster/
You could also look into something like rcron.
One thing to be careful of when converting from an RDBMS to VoltDB is that VoltDB is optimized for processing many small transactions in parallel across many partitions. While the architecture of serialized execution per partition excels for many operational and streaming workloads, it is not designed to perform bulk operations on many rows at a time, especially transactions that need to perform writes on many rows that may be in different partitions within one transaction.
If you have a periodic job that does something like "process all the new rows that meet some criteria" you may find this transaction is slow and every time it runs it could delay other parts of the workload, especially if many rows have accumulated. It would be more the "VoltDB Way" to replace a simple INSERT statement that you may be using to ingest data (to be processed later by a scheduled job) with a procedure that inserts and immediately processes the row of data. You might even need a procedure that checks for other records and processes small sets of rows as a group, for example stitching together segments of data that go together but may have arrived out of order. By operating on fewer records at a time within one partition at a time, this type of procedure would be more scalable and would keep the data closer to your desired finished state in real time, rather than always having some data waiting to be processed.
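To make that last suggestion concrete, here is a rough sketch in VoltDB DDL with hypothetical table, column, and procedure names. A real "insert and immediately process" procedure would usually be written as a Java stored procedure and registered with CREATE PROCEDURE FROM CLASS; the simple single-statement form below only shows the partitioned-ingest shape being described:

-- Hypothetical ingest table, partitioned on device_id.
CREATE TABLE readings (
    device_id BIGINT    NOT NULL,
    ts        TIMESTAMP NOT NULL,
    value     FLOAT     NOT NULL
);
PARTITION TABLE readings ON COLUMN device_id;

-- Single-partition ingest procedure declared directly in SQL.
-- In the pattern described above, the follow-up processing would also
-- happen here (typically in a Java procedure), so each row is handled in
-- the same single-partition transaction that inserts it.
CREATE PROCEDURE insert_reading
    PARTITION ON TABLE readings COLUMN device_id
    AS INSERT INTO readings (device_id, ts, value) VALUES (?, ?, ?);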

Slow performance on first queries on SQL Azure

I have a very small database (50MB) and I'm on a Basic plan. There will be only a single user, but we need to create many databases (always one per user) since they will be used for training purposes. Each database is created by running the following statement:
CREATE DATABASE Training1 AS COPY OF ModelDatabase1
We seem to get very, very slow performance the first time we query this database; afterwards it seems acceptable.
To give you an idea: we have a stored procedure, StartupEvents, that runs when the application starts. This query takes 25 seconds to run the first time. This seems incredible since the database is very small and the tables the query touches don't contain many records. If we run this procedure afterwards it executes immediately...
How can we avoid this?
