SQL Azure: distributing heavy read queries for reporting

We are using SQL Azure for our application and need some input on how to handle queries that scan a lot of data for reporting. Our application is both read- and write-intensive, so we don't want the report queries to block the rest of the operations.
To avoid connection-pooling issues caused by long-running queries, we moved the reporting code that queries the DB onto a worker role. That still doesn't prevent the database from being hit with a bunch of read-only queries.
Is there something we are missing here? Could we set up a read-only replica that all the reporting calls hit?
Any suggestions would be greatly appreciated.

Have a look at SQL Azure Data Sync. It will allow you to incrementally update your reporting database.
Here are a couple of links to get you started:
http://msdn.microsoft.com/en-us/library/hh667301.aspx
http://social.technet.microsoft.com/wiki/contents/articles/1821.sql-data-sync-overview.aspx
I think it is still in CTP though.

How about this:
Create a separate connection string for reporting, for example one with a different Application Name
For your reporting queries, use SET TRANSACTION ISOLATION LEVEL SNAPSHOT
This should prevent your long-running queries from blocking your operational queries. It will also give your reports a consistent read.
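A minimal sketch of a reporting session set up this way (the server, database, and table names are hypothetical):

-- Reporting connections get their own connection string, distinguished by Application Name, e.g.:
--   Server=tcp:myserver.database.windows.net;Database=MyAppDb;Application Name=Reporting;...
-- Then, at the start of each reporting session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
-- Reads a transactionally consistent snapshot and takes no shared locks,
-- so it neither blocks nor is blocked by the operational writes.
SELECT CustomerId, SUM(Amount) AS TotalAmount
FROM dbo.Orders
GROUP BY CustomerId;
COMMIT TRANSACTION;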

Since you're talking about reporting, I'm assuming you don't need real-time data. In that case, you can consider creating a copy of your production database at a regular interval (every 12 hours, for example).
In SQL Azure it's very easy to create a copy:
-- Execute on the master database.
-- Start copying.
CREATE DATABASE Database1B AS COPY OF Database1A;
Your reporting would happen on Database1B without impacting the actual production database (Database1A).
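The copy completes asynchronously, so a rough sketch of the refresh cycle (Database1A/Database1B as above; sys.dm_database_copies is the DMV that reports copy progress):

-- Refresh the reporting copy on a schedule: drop the old copy, then recreate it.
-- Execute on the master database.
DROP DATABASE Database1B;
CREATE DATABASE Database1B AS COPY OF Database1A;
-- The copy runs in the background; poll its progress before pointing reports at it.
SELECT database_id, start_date, percent_complete
FROM sys.dm_database_copies;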

You say you have a lot of read-only queries... any possibility of caching them? (Perfect, since they're read-only.)
What reporting tool are you using? You could output-cache the query results as well if needed.

Related

Azure database performance degraded with 100% DTU usage and showing additional tables and stored procedures

We are having issues with one of our Azure databases, where the DTU is hitting 100% almost all the time.
[Screenshot: DTU percentage at max]
This causes connection failures when other applications try to connect, because the database takes too long to respond and the connections time out.
We had a quick look at the schema and saw some tables and stored procedures that were not created by us but have somehow been generated. Looking at these procedures, we found that some of them are running and appearing in the high-execution-queries list. I am pasting screenshots of these tables and SPs here. Does anyone know how these get created/auto-generated in the database, and if they are auto-generated, is there a way to find out which application created them and is executing them?
[Screenshots: stored procedures and tables]

Pacemaker/Corosync/PostgreSQL cluster failovers during heavy load

At our company we're running PostgreSQL on 4-node clusters using Pacemaker and Corosync.
During heavy batch loads we suffer cluster failovers, because the built-in resource monitoring times out when trying to access the database because, well, server overload...
On the one hand it's understandable cluster behaviour that a 'self-induced denial of service' should trigger a master switchover; on the other hand, we'd rather not see our batches and service (temporarily) aborted because of it. A standalone server would have just pulled through. Obviously we are looking into optimizing and spreading the batches, but that's like putting out one fire while another pops up elsewhere.
I looked into Linux cgroups, but this doesn't seem to be a viable solution, as all it does is CPU/IO-limit your PostgreSQL resource, which is part of the problem :-)
Any ideas or suggestions very much appreciated!

Adding compute instance to Azure Synapse (dedicated pool)

I have a DWH running on an Azure Synapse dedicated pool.
In addition to the existing nightly/daily ETL processes, I need to add another one in parallel that will kill the performance of the current instance. That process only needs to run one week per month, during the daytime.
Similar to the Snowflake approach, is it possible to set up independent Azure Synapse compute to process the same data as the first instance? Not a copy of the data, but the same data in the same files.
Or should I simply change the instance size twice a day for one week per month? (This requires pausing all activity.)
Any advice will be appreciated!
Thanks!
I agree that scaling up or using a serverless SQL pool is a good option.
Before implementing, I would also evaluate whether the additional (and/or existing) process is properly optimized for MPP. First validate that you are co-locating data as much as possible by leveraging common HASH distributions. Oftentimes ETL written first for SQL Server (SMP) needs some amount of refactoring to truly leverage the power of MPP.
Look at query plans for long-running jobs: is there excessive data broadcasting or shuffling? Fix it by updating table distributions (see the sketch after this list)
Are statistics available and up to date?
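A rough sketch of both checks on a dedicated pool (the table, column, and statistics names are hypothetical):

-- Co-locate the large joined tables on a common hash key to avoid shuffle/broadcast moves.
CREATE TABLE dbo.FactSales
WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM dbo.FactSales_RoundRobin;

-- Inspect the distributed plan of a slow query for ShuffleMove/BroadcastMove steps.
EXPLAIN
SELECT s.CustomerId, SUM(s.Amount)
FROM dbo.FactSales s
JOIN dbo.DimCustomer c ON c.CustomerId = s.CustomerId
GROUP BY s.CustomerId;

-- Keep statistics current so the optimizer can choose good data-movement strategies.
CREATE STATISTICS stat_FactSales_CustomerId ON dbo.FactSales (CustomerId);
UPDATE STATISTICS dbo.FactSales;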

Is using an Audit Table in Postgres to create triggers for NOTIFY/LISTEN a good idea?

So I have a Postgres database on which I have installed an audit table - source: https://wiki.postgresql.org/wiki/Audit_trigger_91plus
Now my question is as follows:
I have been wanting to create a sort of stream that notifies me of any changes made by any application that has access to my DB. Now, I know I can create a trigger and a pub/sub via pg, but that will cost performance, and that cost can become significant as the DB scales.
So instead of slowing down the actual DB, I was wondering about installing the same NOTIFY/LISTEN functionality I would have put on the main tables on the audit tables instead, roughly as sketched below.
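Something like this minimal sketch (assuming the audit.logged_actions table and its table_name/action columns from the wiki trigger above):

CREATE OR REPLACE FUNCTION audit.notify_audit_event() RETURNS trigger AS $$
BEGIN
  -- Forward each audit row as a lightweight notification payload.
  PERFORM pg_notify('audit_events', NEW.table_name || ':' || NEW.action);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_notify
AFTER INSERT ON audit.logged_actions
FOR EACH ROW EXECUTE PROCEDURE audit.notify_audit_event();

-- Consumers then subscribe with:
LISTEN audit_events;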
Has anyone ever done this? If so, what have you experienced: pros? cons? Or if anyone knows why I should or should not do this, please let me know.
Thanks
Via NOTIFY/LISTEN, the pros:
Light communication with the server; no need to poll for data changes.
Via NOTIFY/LISTEN, the cons:
Practice shows that it is not sufficient to just set it up and listen for events, because every so often the channel goes down due to various communication problems. For a serious system you would need an additional monitoring service that verifies your listeners are still operating and, if not, destroys the existing ones and creates new ones. This can be tricky, and you probably won't find a good example of doing it.
Via scheduled data pulls, the pros:
Simplicity - you just check for data changes according to the schedule;
Reliability - there is nothing to break there once the pull implementation is working.
Via scheduled data pulls, the cons:
Additional traffic for the server, depending on how quickly you need to see the data changes and how that would interfere (if at all) with other requests to the server.
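For the scheduled-pull option, a rough sketch (event_id comes from the wiki audit table; the myapp.poll_state bookmark table is hypothetical):

-- Each scheduled run fetches only the audit rows it has not seen yet.
SELECT event_id, table_name, action, action_tstamp_tx
FROM audit.logged_actions
WHERE event_id > (SELECT last_event_id FROM myapp.poll_state)
ORDER BY event_id;

-- After processing, advance the bookmark to the highest event_id just handled.
UPDATE myapp.poll_state SET last_event_id = 12345;  -- the max event_id from the batch above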

SQL query takes time

Hi, I have my application running perfectly on my production server. I updated the application about 2 days ago, and since then I have experienced some performance issues. The issue is: when you click the button, the query behind that button takes a minute or more to pull the result, so my application shows a timeout error. Yet the same application runs fine locally.
I don't think it's a query-optimization issue, since it's a simple SELECT with joins between 2 tables, pulling some 40-50 records.
I am using a SQL Server 2012 database. Is there any setting that needs to be changed on it?
Could be anything, dude, without having all the information: e.g. a recently degraded disk on the DB server, fragmented indexes, inefficient joins, an inefficient query...
Here are some general tips...
Use Performance Monitor (PerfMon) to capture and review any long-running queries
DBCC FREEPROCCACHE
http://msdn.microsoft.com/en-AU/library/ms174283.aspx
DBCC INDEXDEFRAG
http://msdn.microsoft.com/en-au/library/ms177571.aspx
or, more intrusive, rebuild the indexes with DBCC DBREINDEX
http://msdn.microsoft.com/en-us/library/ms181671.aspx
Review the health of the database server, in particular the hard drives where the data and log files reside. Also check the server's CPU usage.
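A rough sketch of those maintenance commands (database, table, and index names are hypothetical; note that DBCC INDEXDEFRAG and DBCC DBREINDEX are deprecated in SQL Server 2012 in favor of ALTER INDEX):

-- Clear the plan cache so queries recompile with fresh plans.
DBCC FREEPROCCACHE;

-- Defragment a single index online (deprecated but still available in SQL Server 2012).
DBCC INDEXDEFRAG ('MyDatabase', 'dbo.MyTable', 'IX_MyTable_MyColumn');

-- The non-deprecated equivalents:
ALTER INDEX IX_MyTable_MyColumn ON dbo.MyTable REORGANIZE;
ALTER INDEX ALL ON dbo.MyTable REBUILD;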
