I'm doing a data load into Azure SQL Database using Azure Data Factory v2. I started the load with the database on the Standard pricing tier at 800 DTUs. It was slow, so I increased the DTUs to 1600. (My pipeline had already been running for 7 hours.)
I then decided to change the pricing tier. I moved to the Premium tier with 1000 DTUs and made no other changes.
The pipeline failed because it lost the connection, so I reran it.
Now, when I monitor the pipeline, it is working fine, but when I monitor the database, the average DTU usage does not go above 56%.
I am dealing with a tremendous amount of data. How can I speed up the process?
I expected the DTUs to max out, but average utilization stays around 56%.
Please follow this document: Copy activity performance and scalability guide.
It walks through the performance tuning steps.
One option is to increase the Azure SQL Database tier to get more DTUs. You have already increased the tier to 1000 DTUs, but average utilization is only around 56%, so I don't think you need a higher (more expensive) tier.
You need to think about other ways to improve the performance, such as setting more Data Integration Units (DIUs).
A Data Integration Unit is a measure that represents the power (a combination of CPU, memory, and network resource allocation) of a single unit in Azure Data Factory. Data Integration Units apply only to the Azure integration runtime, not to a self-hosted integration runtime.
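For illustration, here is where the relevant throughput settings live in a Copy activity definition. This is a minimal sketch expressed as a Python dict that mirrors the pipeline JSON; the dataset names and the source type are placeholders, and the numeric values are examples to tune rather than recommendations.

```python
import json

# Minimal sketch of a Copy activity with explicit throughput settings.
# The dict mirrors the pipeline JSON that ADF uses; only the throughput-related
# properties matter here -- dataset names and the source type are placeholders.
copy_activity = {
    "name": "CopyToAzureSqlDb",
    "type": "Copy",
    "inputs": [{"referenceName": "SourceDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "AzureSqlDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "AzureSqlSource"},   # assumed source type
        "sink": {
            "type": "AzureSqlSink",
            "writeBatchSize": 10000              # larger batches reduce round trips
        },
        "dataIntegrationUnits": 32,              # request more DIUs (default is Auto)
        "parallelCopies": 8                      # degree of copy parallelism
    }
}

print(json.dumps(copy_activity, indent=2))
```

The same values can also be set on the copy activity's Settings tab in the authoring UI. Note that, as discussed further below, higher DIU values only take effect for certain source/sink combinations.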
Hope this helps.
The standard answer from Microsoft seems to be that you need to tune the target database or scale up to a higher tier, which suggests that Azure Data Factory is not the limiting factor in copy performance.
However, we've done some testing on a single table, with a single copy activity and ~15 GB of data. The table contained no varchar(max) or high-precision columns, just plain and simple data.
Conclusion: it barely matters what tier you choose (not too low, of course); roughly above S7 / 800 DTU / 8 vCores, the throughput of the copy activity is ~10 MB/s and does not go up, while the load on the target database sits at 50%-75%.
Our assumption is that, since we could keep throwing higher database tiers at this problem without seeing any improvement in copy activity performance, the limit is on the Azure Data Factory side.
Our solution, since we are loading a lot of separate tables, is to scale out instead of scale up, via a ForEach loop with the batch count set to at least 4.
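For reference, a minimal sketch of that scale-out pattern, again as a Python dict mirroring the pipeline JSON; the parameter name and the inner activity are placeholders, and the per-table copy details are omitted.

```python
import json

# Sketch of the scale-out pattern: one ForEach that runs the per-table copy
# in parallel batches instead of relying on a bigger database tier.
# "pipeline().parameters.tableList" and "CopyOneTable" are placeholder names.
foreach_activity = {
    "name": "CopyAllTables",
    "type": "ForEach",
    "typeProperties": {
        "items": {"value": "@pipeline().parameters.tableList", "type": "Expression"},
        "isSequential": False,   # run iterations concurrently
        "batchCount": 4,         # at least 4 concurrent copies, per the approach above
        "activities": [
            {"name": "CopyOneTable", "type": "Copy"}  # per-table copy (details omitted)
        ]
    }
}

print(json.dumps(foreach_activity, indent=2))
```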
The approach of increasing DIUs is only applicable in some cases:
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance#data-integration-units
Setting of DIUs larger than four currently applies only when you copy
multiple files from Azure Storage, Azure Data Lake Storage, Amazon S3,
Google Cloud Storage, cloud FTP, or cloud SFTP to any other cloud data
stores.
In our case we are copying data from relational databases.
I need to calculate the cost of Azure Synapse Analytics. I have used the Azure Pricing Calculator but could not figure it out; it shows close to USD 2,100.
I have the following components as part of Azure Synapse Analytics:
Synapse workspace
Self-hosted agent - Standard_B2s
Synapse SQL pool
How do I calculate the cost of Azure Synapse Analytics?
This is a very difficult question to answer, because most of the costs are consumption/runtime oriented.
The pricing calculator defaults are not great, so you'll really want to fine-tune it. For instance, you cannot remove Dedicated Pools, but you can set the hours to 0. It also includes Data Explorer, which cannot be removed. To exclude these prices in the calculator, deselect "Auto select engine instances" and, under both Engine V-Cores and Data Management V-Cores, set the hours to 0.
The calculator will NOT include any time for Spark pools (notebooks) or Data Flows. These are both heavily consumption oriented and will vary greatly based on runtime choices like pool size. Their costs are based on minutes of consumption, so good luck predicting that.
Here is a sample pricing calculator filled out to describe your situation. The assumptions are below, followed by a rough cost sketch.
you are using a dedicated SQL pool, not a serverless SQL pool
you have scaled the dedicated SQL pool to DWU100c and left it running 24 hours a day (if you programmatically pause it, that would reduce the cost)
you do not want to commit to running it 24 hours a day for 1 or 3 years and get reserved pricing discounts
in the dedicated SQL pool you have under 1TB of data (compressed) and you have geo-redundant backups enabled
you are running under 1,000 pipeline activities per month on the self-hosted integration runtime, copy activities run less than an hour per month, and other activity hours are less than 7 hours per month.
you are not using other parts of Synapse like Spark pools, data flows, Data Explorer pools, Synapse Serverless SQL, etc.
you are in the East US Azure region
you have a B2s virtual machine, with a 128 GB premium SSD OS disk and no other attached disks, where the self-hosted IR is installed. It is running 24 hours a day. (The VM cost, but not the storage cost, could be lowered if you pause and resume it programmatically.)
on the B2s virtual machine you do not want to commit to running it 24 hours a day for 1 or 3 years to get a reserved pricing discount and you are renting the Windows license with the VM rather than bringing your license with Azure Hybrid Benefit
this is retail pricing
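To sanity-check the calculator output against these assumptions, here is a rough back-of-the-envelope sketch. Every rate in it is a placeholder, not a current Azure price, so substitute the figures the pricing calculator shows for your region.

```python
# Rough monthly cost sketch matching the assumptions above.
# All rates below are PLACEHOLDERS, not current Azure prices --
# plug in the numbers the pricing calculator shows for your region.
HOURS_PER_MONTH = 730

dwu100c_per_hour = 1.20                  # placeholder: dedicated SQL pool at DWU100c
b2s_vm_per_hour = 0.05                   # placeholder: B2s VM incl. rented Windows license
premium_ssd_per_month = 20.0             # placeholder: 128 GB premium SSD OS disk
storage_backup_per_month = 150.0         # placeholder: <1 TB data + geo-redundant backup
pipeline_orchestration_per_month = 5.0   # placeholder: <1,000 activities + IR hours

monthly = (
    dwu100c_per_hour * HOURS_PER_MONTH
    + b2s_vm_per_hour * HOURS_PER_MONTH
    + premium_ssd_per_month
    + storage_backup_per_month
    + pipeline_orchestration_per_month
)
print(f"Estimated monthly total: ~${monthly:,.0f}")
```

With most realistic rates, the always-on dedicated SQL pool hours dominate the total, which is why the pause option mentioned in the assumptions matters most.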
Recently we upgraded our SSAS resources. Currently our SSAS runs on an Azure VM, and we are charged based on the VM type 'Standard E32-8s_v3'.
I am looking for a way to save more cost by selecting a better option.
What would be a good option to save cost and at the same time get better efficiency?
What factors/differences should be considered if we move to Azure Analysis Services instead of SSAS on an Azure VM?
Our SQL Server is also on an Azure VM.
We have our reports on Power BI Report Server and SSRS.
Data comes from different sources such as SAP, external parties, etc., using SSIS.
Can you please advise/suggest better options for our data architecture?
Thank you.
Your VM is 8 cores and 256GB RAM.
One factor in pricing you haven’t mentioned is SQL licensing. You didn’t specify whether you are renting the SQL license with the VM or bringing your own license and what that costs. I’m going to assume you are renting it. With Azure Analysis Services the license is included in the price.
In Azure Analysis Services, 100 QPU is roughly equivalent to 5 cores. So 200 QPU (an S2) would be an equivalent amount of CPU and a similar price, but it only has 50 GB RAM.
To get the equivalent amount of RAM the S8 would get you close (200GB RAM) but for substantially more cost.
If you have one large model which (at least during peak usage or processing) uses most of the 256GB RAM then it may be tough to move to Azure Analysis Services for a similar price. If you have several models on that one server then you could split them across several smaller Azure Analysis Services servers and it may be a reasonable price for you. Or you could scale up for processing when RAM is needed most and scale down for the rest of the day to save cost.
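As a quick sizing sketch using the figures above (100 QPU is roughly 5 cores, S2 = 200 QPU / 50 GB RAM, S8 roughly 200 GB RAM):

```python
# Back-of-the-envelope comparison using the rules of thumb quoted above.
vm_cores, vm_ram_gb = 8, 256                 # Standard E32-8s_v3

qpu_for_equivalent_cpu = vm_cores / 5 * 100  # 100 QPU ~ 5 cores
print(f"CPU equivalent: ~{qpu_for_equivalent_cpu:.0f} QPU -> an S2 (200 QPU) covers it")

s2_ram_gb, s8_ram_gb = 50, 200
print(f"RAM: VM has {vm_ram_gb} GB; S2 offers {s2_ram_gb} GB, S8 roughly {s8_ram_gb} GB")
```

So CPU maps comfortably onto an S2, but RAM is the constraint that pushes you toward an S8 or toward splitting models across several smaller servers.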
I have a SQL database on Azure. My service tier is Standard, and I have already scaled the max data size to 500 GB (the Standard tier is 200 GB by default). Now I want to scale it to 750 GB. I have read the documentation, but I'm not sure how long it will take and whether any data loss is possible. Also, will I have to change my configuration, or do all connection strings etc. stay the same?
Data loss is not a byproduct of changing performance tiers or storage size, no. In most of the standard tiers, you are running on remote storage (similar to how SQL Server in a VM would run over blob storage backed disks). So, if all you are doing is increasing the max size on remote storage, that's morally equivalent to the same operation on-premises and it is immediate.
If you are crossing tiers (say, standard to premium), or moving within premium tiers where there is no space available on the local nodes to satisfy your request, the operation can take time, since new space needs to be provisioned and your database needs to be seeded (copied) into the new space. This is done in the background and is related to the size of your database, the performance tier (as IOPS are based on that), and the current transaction load (as this also has to be replicated to the N new nodes).
When things are replicated and up to date, your current connections are closed from the server (which means any active transactions there are aborted) and you can reconnect and retry those on the newly seeded database replicas. This usually takes minutes to tens of minutes, but for very large databases it can take an hour or longer.
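If you prefer to script the change rather than use the portal, here is a minimal sketch using pyodbc; the server, database, and credential values are placeholders, and the ALTER DATABASE statement is issued while connected to master.

```python
import pyodbc

# Sketch of issuing the max-size change with T-SQL instead of the portal.
# Server/database names and credentials are placeholders; adjust the driver
# name to whatever ODBC driver you have installed.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=master;"                 # run the ALTER while connected to master
    "Uid=admin_user;Pwd=<password>;Encrypt=yes;"
)

with pyodbc.connect(conn_str, autocommit=True) as conn:  # DDL cannot run in a transaction
    conn.execute("ALTER DATABASE [mydb] MODIFY (MAXSIZE = 750 GB);")
```

The server name and connection strings are not affected by a max-size or tier change; at most, active connections are dropped at the cutover described above and need to reconnect and retry.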
At present we have three stages (Dev, QA & Prod) in our Azure resources. All three use SQL Database 'Standard S6: 400 DTUs'. Because of the Dev and QA databases, our monthly cost is running over 700 euros. I am planning to move from DTU to vCore serverless. Below are my queries:
Is going into the portal -> Compute and storage -> and changing from DTU to vCore serverless the right process?
Do I need to take care of anything else before doing this operation?
Is my existing Azure SQL DB going to be affected by this operation?
If things don't work out as per my requirements, can I go back to the DTU model the same way?
Thanks in advance.
You can have a look at this MS doc for details: Migrate Azure SQL Database from the DTU-based model to the vCore-based model
Is going into the portal -> Compute and storage -> and changing from DTU to vCore serverless the right process?
Yes! Just pick the required option from the dropdown and click Apply; if you prefer, you can also script the change, as sketched after the quote below.
Migrating a database from the DTU-based purchasing model to the
vCore-based purchasing model is similar to scaling between service
objectives in the Basic, Standard, and Premium service tiers, with
similar duration and a minimal downtime at the end of the migration
process.
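Here is a minimal scripted sketch using pyodbc; the names and credentials are placeholders, and 'GP_S_Gen5_2' (serverless General Purpose, Gen5, 2 vCores) is just one example service objective, so pick the size that matches your workload.

```python
import pyodbc

# Sketch of the same migration done with T-SQL rather than the portal.
# 'GP_S_Gen5_2' = serverless General Purpose, Gen5, 2 vCores (example only).
# Server/database names and credentials are placeholders.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=master;Uid=admin_user;Pwd=<password>;Encrypt=yes;"
)

with pyodbc.connect(conn_str, autocommit=True) as conn:
    conn.execute(
        "ALTER DATABASE [mydb] MODIFY "
        "(EDITION = 'GeneralPurpose', SERVICE_OBJECTIVE = 'GP_S_Gen5_2');"
    )
```

Reverting later works the same way, for example with EDITION = 'Standard' and SERVICE_OBJECTIVE = 'S6' (except for databases moved to Hyperscale, as noted further down).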
Do I need to take care of anything else before doing this operation?
Some hardware generations may not be available in every region. Check availability under Hardware generations for SQL
Database.
In the vCore model, the supported maximum database size may differ depending on hardware generation. For large databases, check the supported maximum sizes in the vCore model for single databases and for elastic pools.
If you have geo-replicated databases, during migration you don't have to stop geo-replication, but you must upgrade the secondary database first and then upgrade the primary. When downgrading, reverse the order.
Also go through the doc once.
Is my existing Azure SQL DB going to be affected by this operation?
You can copy any database with a DTU-based compute size to a database
with a vCore-based compute size without restrictions or special
sequencing as long as the target compute size supports the maximum
database size of the source database. Database copy creates a
transactionally consistent snapshot of the data as of a point in time
after the copy operation starts. It doesn't synchronize data between
the source and the target after that point in time.
If things don't work out as per my requirements, can I go back to the DTU model the same way?
A database migrated to the vCore-based purchasing model can be
migrated back to the DTU-based purchasing model at any time in the
same fashion, with the exception of databases migrated to the
Hyperscale service tier.
I'm looking at high-scale Azure Table operations, and I am looking for documentation that describes the max IOPS to expect from Azure instance sizes for Azure Web Apps, Functions, etc.
The Web Roles and their corresponding limitations are well documented. For example, see this comment in the linked question:
so, we ran our tests on different instance sizes and yes that makes a huge difference. at medium we get around 1200 writes per second, on extra large we get around 7200. We are looking at building a distributed read/write controller possibly using the dcache as the middle man. – JTtheGeek Aug 9 '13 at 22:39
Question
What is the corresponding limitation for Web Apps (logic, mobile, etc.) and Azure Table IOPS?
According to the official documentation, the maximum request rate per storage account (assuming 1 KB object size) is up to 20,000 IOPS, entities per second, or messages per second. We can also get the max IOPS limits for VMs from the Azure VM sizes document. Web Apps run on an App Service plan, and within the plan you can choose different pricing tiers that map to different VM sizes, which you may be able to use as a reference. For more Azure limits, please refer to Azure subscription and service limits, quotas, and constraints.
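If you want to measure what a given App Service plan size actually achieves rather than rely on documented ceilings, a simple benchmark like the sketch below, run from the Web App or Function itself, gives a rough single-threaded baseline. It uses the azure-data-tables package; the connection string is a placeholder.

```python
import time
import uuid
from azure.core.exceptions import ResourceExistsError
from azure.data.tables import TableClient  # pip install azure-data-tables

# Rough single-threaded write benchmark against Azure Table storage.
# Real throughput depends on partition key design, batching, parallelism,
# and the per-account limit (20,000 entities/sec) cited above.
conn_str = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
client = TableClient.from_connection_string(conn_str, table_name="perftest")

try:
    client.create_table()
except ResourceExistsError:
    pass  # table already exists

N = 1000
partition = str(uuid.uuid4())  # single partition: a worst case for scale-out
start = time.perf_counter()
for i in range(N):
    client.create_entity({"PartitionKey": partition, "RowKey": str(i), "payload": "x" * 1024})
elapsed = time.perf_counter() - start
print(f"{N} writes in {elapsed:.1f}s -> {N / elapsed:.0f} entities/sec")
```

Real-world numbers will vary heavily with batching, parallelism, and partition key distribution, in addition to the instance size of the plan you run it from.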