We currently have an Azure SQL database that supports a web application, and it is read-only from the users' perspective. The problem is that we have to drop the tables and reload them with new data each hour, which makes the application unavailable for about 5 minutes every hour; that is unacceptable.
The documentation for Azure active geo-replication seems a little vague. I thought I might be able to use active geo-replication to fail over to a secondary database, take the primary offline to do the update, and then, when the update is complete, switch back to the original primary and let the secondary re-sync automatically. However, it is not clear from what I have read about active geo-replication whether that scenario is possible.
Could someone provide some help with this problem or point me toward another possible solution?
Thanks
You could give Azure Data Factory a try, since it allows you to append data to a destination table or invoke a stored procedure with custom logic during the copy when SQL Server/SQL Azure is used as a "sink". You can learn more here.
Azure Data Factory also allows you to incrementally load data (a delta) after an initial full load by using a watermark column that holds the last-updated timestamp or an incrementing key. The delta-loading solution loads the data that changed between an old watermark and a new watermark. You can learn more about how to do that with Azure Data Factory in this article.
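Independent of the exact Data Factory setup, the watermark pattern itself is easy to express in plain T-SQL. A minimal sketch, where WatermarkTable, SourceOrders, TargetOrders and LastModifiedTime are made-up names for illustration, not part of the ADF tutorial:
-- Watermark bookkeeping table: one row per source table.
CREATE TABLE dbo.WatermarkTable
(
    TableName      sysname      NOT NULL PRIMARY KEY,
    WatermarkValue datetime2(3) NOT NULL
);
-- One delta run: copy only rows changed since the last run, then advance the watermark.
DECLARE @OldWatermark datetime2(3) =
    (SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = N'SourceOrders');
DECLARE @NewWatermark datetime2(3) = SYSUTCDATETIME();
INSERT INTO dbo.TargetOrders (OrderId, CustomerId, Amount, LastModifiedTime)
SELECT OrderId, CustomerId, Amount, LastModifiedTime
FROM dbo.SourceOrders
WHERE LastModifiedTime > @OldWatermark
  AND LastModifiedTime <= @NewWatermark;
UPDATE dbo.WatermarkTable
SET WatermarkValue = @NewWatermark
WHERE TableName = N'SourceOrders';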
If the setup takes 5 minutes and the data is read-only, then I would create a new, empty DB every hour using some C# code or a PowerShell script, run the data import against this new empty DB, and once it is done, change the connection string in the running production application to point to the newly loaded DB and then drop the old one. This way you won't have any downtime: while the import is running, the application connects to the old DB, and when the import is done, it connects to the new DB.
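The database-side half of that rotation can also be scripted in T-SQL against the logical server's master database; the connection-string switch itself stays in the application or its configuration. A rough sketch with hypothetical names, where AppDb_A is currently serving traffic and AppDb_B is the one being loaded:
-- Execute on the master database of the logical server.
-- Create the next database to run the hourly import against.
CREATE DATABASE AppDb_B;
-- ... run the import against AppDb_B, then repoint the application's
-- connection string from AppDb_A to AppDb_B ...
-- Once nothing connects to the old database any more, drop it.
DROP DATABASE AppDb_A;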
I am trying to design a timer-triggered processor (all in Azure) that will process a set of records that are set out for it to consume. It will group them based on a column, create files out of them, and dump those files in a blob container. The records it consumes are generated based on an event: when the event is raised it contains a key, which can be used to query the different services for the data that makes up the record.
This is what I am thinking currently:
An event is raised to the event-grid-topic.
An Azure Function (ConsumerApp) is event triggered; it reads the key, calls a service API to get all the data, and stores that record in a storage table with a flag saying it is ready to be consumed.
An Azure Function (ProcessorApp) is timer triggered; it reads from the storage table, groups the records based on another column, creates the files, and dumps them in the blob container. It can then mark the records as processed, if they have not already been updated by ConsumerApp.
Apart from suggestions for a different, better way to do this overall, my questions are:
The table storage is going to fill up quickly, which will in turn slow down reading the 'ready' records, so is there a better approach for storing this intermediate and temporary data? One thing I thought of was to regularly flush the table, or to delete the record from the consumer app instead of marking it as 'processed'.
The service API is called for each event, which might increase the strain on that service and its database. Should I group the records into a single API call, since the processor only runs after a set interval, or is there a better approach here?
Any feedback on this approach or a new design will be appreciated.
If you don't have to process the data from step 2 individually, you can try saving it in a blob as well and store only the blob path in Azure Table Storage, to keep the row count minimal.
Azure Table Storage has partitions that you can use to partition your data and keep your read operations fast; a partition scan is faster than a full table scan. Azure Table Storage is also cheap, but if pricing is a concern you can write a clean-up function that periodically removes the processed rows. Keeping the processed rows around for a reasonable time is usually a good idea, though, because you may need them for debugging issues.
By batching multiple requests into a single API call you can reduce network I/O delay, but the resource contention at the service level will remain. If possible, move that API into a separate service so it can be scaled independently.
I have a DWH running on an Azure Synapse dedicated SQL pool.
In addition to the existing nightly/daily ETL processes, I need to add another one in parallel that will kill the performance of the current instance. That process only needs to run one week per month, during the daytime.
Similar to a Snowflake approach, is it possible to set up independent Azure Synapse compute to process the same data as the first instance? Not a copy of data, but the same data in the same files.
Or should I simply change the instance size twice a day for one week per month? (That requires pausing all activity.)
Any advice will be appreciated!
Thanks!
I agree that scaling up or using a serverless SQL pool is a good option.
Before implementing it, I would also evaluate whether the additional (and/or existing) process you are adding is properly optimized for MPP. First validate that you are co-locating data as much as possible by using common HASH distributions; ETL written first for SQL Server (SMP) often needs some refactoring to truly leverage the power of MPP. (A short T-SQL sketch of these checks follows the list below.)
Look at the query plans for long-running jobs: is there excessive data broadcasting or shuffling? Fix that by updating the table distributions.
Are statistics available and up to date?
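Here is that sketch for a dedicated SQL pool; FactSales, CustomerId and the column list are placeholder names, not a recommendation for your schema:
-- Co-locate large tables that are joined or aggregated on the same key.
CREATE TABLE dbo.FactSales
(
    SaleId     bigint        NOT NULL,
    CustomerId int           NOT NULL,
    Amount     decimal(18,2) NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX);
-- Inspect the distributed plan of a slow statement; broadcast or shuffle
-- move steps in the output are a sign the distributions need rework.
EXPLAIN
SELECT CustomerId, SUM(Amount) AS Total
FROM dbo.FactSales
GROUP BY CustomerId;
-- Keep statistics current on join and filter columns.
UPDATE STATISTICS dbo.FactSales;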
I have a SQL Azure database. Currently I am using the "Web" SQL database, since my DB is small (about 300 MB) and that edition's maximum size is 5 GB. When I learned that the Web service tier will be retired in September 2015, I restored my live DB as a "Standard" S0, which has a maximum size of 2 GB. What I noticed is that the performance of the new Standard database is poor compared to the retired Web edition. For instance, it used to take about 40 seconds to delete 60 thousand records in the Web edition, and it now takes two to three minutes on the new Standard tier. Has anyone else experienced this kind of thing, or is it just me?
Please give me your suggestions.
I had a similar issue: I migrated SQL 2008 to Azure Web and got a performance hit, then switched from Web to S0 and got another hit. I think I'm now at S1.
I figured it was probably missing indexes, but with the ability to trace and tune gone in Azure, I had to do things a bit more manually.
First, look at http://msdn.microsoft.com/en-us/library/azure/ff394114.aspx; you want to get to the part where you can find the long-running queries.
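The long-running queries can also be pulled straight from the dynamic management views in the database itself; a sketch, where the TOP value and the ordering are arbitrary choices:
-- Run in the user database: top statements by average elapsed time.
SELECT TOP (10)
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_time DESC;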
Then, for each long-running query, look at its execution plan. To view a query's execution plan you need to include it explicitly before executing the query: right-click the query window and select Include Actual Execution Plan.
If this does not help, you need to do more work. Export the database (it comes out as a .bacpac file) and import it into SQL 2012 on a local server somewhere (I used an Azure VM): right-click the Connection > Databases node and select "Import Data-tier Application...". Then hook up an application or website to it, enable the query analyzer, and tune it the old way. This will reveal all the non-clustered indexes that magically disappeared; once you add those back to your SQL Azure DB, you will get the performance back.
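As a lighter-weight alternative to the full export, the missing-index DMVs in SQL Azure will often point at the same disappeared indexes; treat the output as suggestions to review, not something to apply blindly:
-- Indexes the optimizer would have liked to use, ranked by estimated impact.
SELECT
    d.statement AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.avg_user_impact,
    s.user_seeks
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.avg_user_impact DESC;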
Sure, you could just increase your Standard tier, but that can get expensive; it's better to tune the database and find out where things went wrong.
What is the best way to do a one-time job on Azure?
Say we want to extend a table in the associated database with a double column. All the new entries will have this value computed by the worker(s) at insertion, but somebody has to take care of the entries that are already in the table. I thought of two alternatives:
a method called by the worker only if a database entry (say, "JobRun") is set to true, and the method would flip the entry to false.
a separate app that does the job, and which is downloaded and run manually using remote desktop (I cannot connect the local app to the Azure SQL server).
The first alternative is messy (how should I deal with the code at the next deployment? Delete it? Comment it out? Leave it there? Also, what if I have another job in the future? Do I create a new database entry, "Job2Run"?). The second one looks like a cheap hack. I am sure there is a better way that I could not think of.
If you want to run a job once, you'll need to take the following into account:
Concurrency: while the job is running, make sure no other worker picks it up and runs it at the same time (you can use leases for this; more info here).
Tracking: once the job is done, you'll need to record somewhere (in Table Storage, SQL Azure, ...) that the job completed successfully. The next time a worker tries to pick up the job, it will look in Table Storage / SQL Azure / ..., see that the job completed, and skip it.
Failure: your worker might crash during the job, in which case another worker should be able to pick up the job without any issue.
A minimal sketch of covering these three points with a SQL Azure table follows below.
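If you track the job in SQL Azure, the sketch could look like this; the table, the job name and the 30-minute stale-claim window are made up for illustration:
-- Hypothetical bookkeeping table for one-time jobs.
CREATE TABLE dbo.OneTimeJobs
(
    JobName      nvarchar(100) NOT NULL PRIMARY KEY,
    Status       nvarchar(20)  NOT NULL,  -- 'Pending', 'Running' or 'Done'
    ClaimedAtUtc datetime2(0)  NULL
);
INSERT INTO dbo.OneTimeJobs (JobName, Status) VALUES (N'BackfillDoubleColumn', N'Pending');
-- A worker claims the job atomically; if @@ROWCOUNT is 0, another worker has it.
-- The stale-claim check lets a new worker take over after a crash.
UPDATE dbo.OneTimeJobs
SET Status = N'Running', ClaimedAtUtc = SYSUTCDATETIME()
WHERE JobName = N'BackfillDoubleColumn'
  AND (Status = N'Pending'
       OR (Status = N'Running' AND ClaimedAtUtc < DATEADD(MINUTE, -30, SYSUTCDATETIME())));
-- ... run the backfill of the existing rows here ...
UPDATE dbo.OneTimeJobs SET Status = N'Done' WHERE JobName = N'BackfillDoubleColumn';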
In your specific use case I would also consider using a tool like dbup to manage updates to your schema and existing data with SQL scripts. Tools like these keep track of which scripts have been executed by recording them in a table in the database.
We are using SQL Azure for our application and need some input on how to handle queries that scan a lot of data for reporting. Our application is both read and write intensive, so we don't want the report queries to block the rest of the operations.
To avoid connection-pooling issues caused by long-running queries, we moved the code that queries the DB for reporting onto a worker role. This still does not prevent the database from getting hit with a bunch of read-only queries.
Is there something we are missing here? Could we set up a read-only replica that all the reporting calls hit?
Any suggestions would be greatly appreciated.
Have a look at SQL Azure Data Sync. It will allow you to incrementally update your reporting database.
Here are a couple of links to get you started:
http://msdn.microsoft.com/en-us/library/hh667301.aspx
http://social.technet.microsoft.com/wiki/contents/articles/1821.sql-data-sync-overview.aspx
I think it is still in CTP though.
How about this:
Create a separate connection string for reporting; for example, use a different Application Name.
For your reporting queries use SET TRANSACTION ISOLATION LEVEL SNAPSHOT
This should prevent your long-running queries from blocking your operational queries, and it will also give your reports a consistent read.
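A minimal sketch of the reporting side; the table name is only an example, and note that ALLOW_SNAPSHOT_ISOLATION is already ON by default in SQL Azure (on an on-premises SQL Server you would have to enable it first):
-- On on-premises SQL Server you would first run:
-- ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    -- The report sees a consistent snapshot and takes no shared locks,
    -- so it does not block the operational writes.
    SELECT Region, SUM(Amount) AS Total
    FROM dbo.Orders
    GROUP BY Region;
COMMIT TRANSACTION;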
Since you're talking about reporting I'm assuming you don't need real time data. In that case, you can consider creating a copy of your production database at a regular interval (every 12 hours for example).
In SQL Azure it's very easy to create a copy:
-- Execute on the master database.
-- Start copying.
CREATE DATABASE Database1B AS COPY OF Database1A;
Your reporting would happen on Database1B without impacting the actual production database (Database1A).
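The copy runs asynchronously after the statement returns, so before pointing reports at Database1B you may want to check from the master database that it has finished:
-- Execute on the master database.
-- The copy is usable once its state is ONLINE rather than COPYING.
SELECT name, state_desc FROM sys.databases WHERE name = 'Database1B';
-- Progress details for copies that are still in flight.
SELECT database_id, start_date, modify_date, percent_complete
FROM sys.dm_database_copies;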
You say you have a lot of read-only queries... any possibility of caching them? (A perfect fit, since the data is read-only.)
What reporting tool are you using? You can output cache the queries as well if needed.