I have a very small database (50MB) and I'm on a basic plan. There will be only a single user, but we need to create many databases (always one per user), since they will be used for training purposes. Each database is created by executing the following statement:
CREATE DATABASE Training1 AS COPY OF ModelDatabase1
We are seeing very slow performance the first time we query this database; afterwards it seems acceptable.
To give you an idea: we have a stored procedure, StartupEvents, that runs when the application is started. It takes 25 seconds to run the first time, which seems incredible since the database is very small and the tables the procedure touches don't contain many records. If we run the procedure again afterwards it executes immediately...
How can we avoid this?
Related
We are running an elastic pool in Azure with multiple databases, and when running one of our larger imports it takes longer than we are used to. During these imports we ran at 6 cores as a test; all databases are allowed to use all cores.
On our local environment it inserts about 100k records per second; however, the same dataset on Azure does about 1k per second (our VM) to 4k per second (dev laptop).
During this insert, the database only uses 14% log IO, 5% CPU and 0% DataIO.
When setting up a new database using the DTU model at P2 we see the same behaviour, so we are not even hitting the limits of the database.
The table contains about 36 columns which are all required.
We have tried this using BulkInsert in the following way, with different batch sizes:
BulkConfig b = new BulkConfig();
b.BatchSize = 100000;
await dbcontext.BulkInsertAsync(entities, b);
As well as using standard Entity Framework AddRange with smaller batches. We even went as far as writing the SqlBulkCopy calls manually, but none of it helped.
Now the question is mainly: is this a software issue? Are we running into issues in our Azure DB? Do we need to change the way we do bulk imports?
Edit:
We attempted to run the import using the TempDB setting in BulkInsert, but this does not increase performance either; log IO is still at 14%.
Iterate through the dataset on the application layer, invoking a stored procedure for each row that will perform an INSERT/UPDATE action based on the existence of a record with a certain key. If the number of records to upsert is limited, this strategy may work well; otherwise, roundtrips and log writes will have a major influence on speed.
To minimise roundtrips and log writes and increase throughput, use bulk insert approaches like the SqlBulkCopy class in ADO.NET to upload the full dataset to Azure SQL Database and then execute all the INSERT/UPDATE (or MERGE) operations in a single batch. Overall execution times may be reduced from hours to minutes/seconds using this method.
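For illustration, here is a rough sketch of that second approach. The names (dbo.Target, dbo.Target_Staging, the Id/Name columns, connectionString and dataTable) are placeholders, not taken from the question; the real table has ~36 required columns.
// requires: using System.Data; using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1. Push the whole dataset into a staging table with SqlBulkCopy (few round trips).
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Target_Staging", BatchSize = 100000 })
    {
        bulk.WriteToServer(dataTable);   // dataTable: a DataTable shaped like the staging table
    }

    // 2. Apply all INSERT/UPDATE logic in one server-side batch via MERGE.
    const string mergeSql = @"
        MERGE dbo.Target AS t
        USING dbo.Target_Staging AS s ON t.Id = s.Id
        WHEN MATCHED THEN
            UPDATE SET t.Name = s.Name          -- repeat for the remaining columns
        WHEN NOT MATCHED THEN
            INSERT (Id, Name) VALUES (s.Id, s.Name);";

    using (var cmd = new SqlCommand(mergeSql, conn) { CommandTimeout = 600 })
    {
        cmd.ExecuteNonQuery();
    }
}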
Here is a discussion related to the same scenario: Optimize Azure SQL Database Bulk Upsert scenarios - link.
I'm using an Azure Function as a scheduled job with a cron timer trigger. At a specific time each morning it calls a stored procedure.
The function now takes 4 minutes to run a stored procedure that takes a few seconds in SSMS, and this time keeps increasing despite successful efforts to improve the speed of the stored procedure itself.
The function is not doing anything intensive.
using (SqlConnection conn = new SqlConnection(str))
{
    conn.Open();
    using (var cmd = new SqlCommand("Stored Proc Here", conn) { CommandType = CommandType.StoredProcedure, CommandTimeout = 600 })
    {
        cmd.Parameters.Add("@Param1", SqlDbType.DateTime2).Value = DateTime.Today.AddDays(-30);
        cmd.Parameters.Add("@Param2", SqlDbType.DateTime2).Value = DateTime.Today;
        var result = cmd.ExecuteNonQuery();
    }
}
I've checked and the database is not under load with another process when the stored procedure is running.
Is there anything I can do to speed up the Azure function? Or any approaches to finding out why it's so slow?
UPDATE.
I don't believe Azure Functions is at fault; the issue seems to be with SQL Server.
I eventually ran the production SP and had a look at the execution plan. I noticed that the statistics were way out; for example, a join expected the number of returned rows to be 20, but the actual figure was closer to 800k.
The solution for my issue was to update the statistics on a specific table each week.
As for why the stats were out by so much: the client does a batch update each night and inserts several hundred thousand rows. I can only assume this affected the stats, and the effect is cumulative, so it seems to get worse with time.
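As a sketch of how that weekly fix could be scheduled from the same timer-triggered function (the table name dbo.ImportedRows is a placeholder, not from the original post):
// requires: using System.Data.SqlClient;
using (SqlConnection conn = new SqlConnection(str))
{
    conn.Open();
    // Refresh statistics on the table that receives the nightly batch insert, so the
    // optimizer stops estimating ~20 rows where ~800k now exist.
    using (var cmd = new SqlCommand("UPDATE STATISTICS dbo.ImportedRows WITH FULLSCAN;", conn) { CommandTimeout = 600 })
    {
        cmd.ExecuteNonQuery();
    }
}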
Please be careful adding WITH RECOMPILE hints. Compilation is often far more expensive than execution for a simple query, so you may not get decent performance for all apps with this approach.
There are different possible reasons for your experience. One common reason for this kind of scenario is that you got different query plans on the app vs. SSMS paths. This can happen for various reasons (I will summarize them below). You can determine whether you are getting different plans by using the Query Store (which records summary data about queries, plans, and runtime stats). Please review a summary of it here:
https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-2017
You need a recent SSMS to get the UI, though you can run direct queries from any TDS client.
Now for a summary of some possible reasons:
One possible reason for plan differences is SET options. These are session-level settings for a query, such as ANSI_NULLS being on or off. Each different setting can change the plan choice and thus perf. Unfortunately the defaults for the different language drivers differ (historical artifacts from when each was built - hard to change now without breaking apps). You can review the Query Store to see whether there are different "context settings" (each unique combination of SET options is a unique context settings row in the Query Store). Each different set implies different possible plans and thus potential perf changes.
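As a quick check (a sketch only; connectionString is assumed to exist), the Query Store catalog views can be queried directly from any client. This lists, per query, its context settings id and how many distinct plans it has, which also surfaces the parameter-sniffing case described next:
// requires: using System; using System.Data.SqlClient;
const string sql = @"
    SELECT q.query_id,
           q.context_settings_id,
           COUNT(DISTINCT p.plan_id) AS plan_count
    FROM sys.query_store_query AS q
    JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
    GROUP BY q.query_id, q.context_settings_id
    ORDER BY plan_count DESC;";   // join sys.query_store_query_text for the query text

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(sql, conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine("query {0}: context {1}, {2} plan(s)", reader[0], reader[1], reader[2]);
        }
    }
}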
The second major reason for plan changes like the one you describe in your post is parameter sniffing. Depending on the scope of compilation (for example, inside a sproc vs. ad hoc query text), SQL will sometimes look at the current parameter value during compilation to infer the frequency of that value in future executions. Instead of ignoring the value and just using a default frequency, using a specific value can generate a plan that is optimal for a single value (or set of values) but potentially slower for values outside that set. You can see this in the query plan choice in the Query Store as well, by the way.
There are other possible reasons for performance differences beyond what I mentioned. Sometimes there are perf differences when running in MARS mode vs. not in the client. There may be differences in how you call the client drivers that impact perf beyond this.
I hope this gives you a few tools to debug possible reasons for the difference. Good luck!
For a project I worked on we ran into the same thing. It's not a Functions issue but a SQL Server issue. We were updating sprocs during development, and it turns out that SQL Server caches an execution plan (the access paths/indexes it chose, in layman's terms) and that cached plan gets out of sync with the new sproc.
We resolved it by specifying WITH RECOMPILE on the sproc, after which the API call and SSMS had the same timings.
Once the system has settled, that hint can and should be removed.
Search for "slow sproc, fast in SSMS" and the like to find others who have run into this situation.
I am new to DocumentDB. I wrote a stored procedure that checks all records and updates them under certain circumstances.
Current scenario:
It runs 100 records at a time, updates them, and after running a few times (taking 100 records at a time and updating them) it times out.
Expectation
Run the script on all the records without timing out.
The collection has close to a million records, so running the same script multiple times manually is not the approach I am looking for.
Can anyone please advise me how I can achieve that?
tl;dr: Keep calling the sproc, passing the query continuation token back and forth.
A few thoughts:
There is no RU capacity for a collection that will allow you to process all million documents in one call to the sproc.
Sprocs run in isolation on a single replica. This means that they can be transactional but their use will have lower throughput than a regular query that can use all replicas to satisfy the request, so unless you need it to be in a sproc, I recommend using direct queries for reads that don't need to be transactional with writes. Even then, with a million documents, your queries will max out and you'll have to run the query again with a continuation token.
If you must use a sproc... As you are probably aware since you have done the 100 at a time thing, each query returns a continuation token. You can actually add that to the package that you send back from your sproc when it times out. Then you can pass that back into another call to the same sproc and write your sproc to pick up where you left off. The documentdb-utils library for node.js automatically re-calls the sproc until done as long as you follow this pattern for writing your sprocs. If you are using node.js, you could use that (but it has not yet been upgraded to support partitioned collections) or you could write the equivalent in whatever platform you are using.
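To make the token hand-off concrete, here is a rough sketch of the calling loop using the .NET SDK (Microsoft.Azure.Documents.Client). The database, collection and sproc names are placeholders, 'client' is an existing DocumentClient, and the sproc is assumed to return a body shaped like { continuation, processed } as described above:
// requires: using System; using Microsoft.Azure.Documents.Client;
class BulkUpdateResult
{
    public string continuation { get; set; }   // token returned by the sproc when it stops early
    public int processed { get; set; }         // number of documents handled in this call
}

Uri sprocUri = UriFactory.CreateStoredProcedureUri("mydb", "mycoll", "bulkUpdate");
string continuation = null;

do
{
    // Pass the last continuation token back in so the sproc resumes where it left off.
    StoredProcedureResponse<BulkUpdateResult> response =
        await client.ExecuteStoredProcedureAsync<BulkUpdateResult>(sprocUri, continuation);

    continuation = response.Response.continuation;
}
while (!string.IsNullOrEmpty(continuation));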
We are using MVC4, ASP.NET 4.5, Entity Framework 6.
When we used Azure SQL Database v11, initial record inserts and deletes via EF worked fine and quickly. However, now on v12, I notice that initial inserts and deletes can be very slow, especially if we choose a new value when inserting. If we insert a new record with the same value, the response is rapid. The delay I am talking about can be about 30 secs on S1, 15 secs on S2, 7 secs on S3.
As I say, we never encountered this on v11.
Any ideas gratefully received.
EDIT1
I've just been doing some diagnostics and it seems that a view I was using now runs very slowly the first time:
db.ExecuteStoreCommand("DELETE FROM Vw_Widget where Id={0}", ID);
Do I need to rejig views in any way for Azure SQL Database v12?
EDIT2
Looking at the code a little more, I see that I have added a DELETE trigger to the view; basically I set up the view so I could use this trigger code in certain situations. I am now trying to take the trigger code out and run it from the app, which does run a lot quicker. Perhaps this code should be a stored procedure.
You definitely need to do some diagnostics on your view to check the performance of the query, and you may need to tune it. The times you are reporting are far too high for such an operation. Please make sure to do inserts or deletes on your target tables and not on views; the best practice is not to use views for inserts or deletes.
Use views only in SELECT statements.
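For example (the base table and procedure names below are placeholders, not from the question), using the same ExecuteStoreCommand pattern as in EDIT1:
// Delete against the base table rather than the view...
db.ExecuteStoreCommand("DELETE FROM Widgets WHERE Id={0}", ID);
// ...and run the logic that used to live in the view's delete trigger as an explicit call.
db.ExecuteStoreCommand("EXEC dbo.Widget_Cleanup @WidgetId={0}", ID);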
I had a similar problem when migrating a SQL database from v2 to v12. I was working with the Business edition and tried to migrate to S0, and the performance of the DB was not good. After some time I discovered that the DTU model has particular views for monitoring what provisioning level you need. If the problem only happens the first time, your application is probably making a lot of queries to load data into memory, and these can be affecting the performance of your CRUD statements.
SELECT end_time
, (SELECT Max(v)
FROM (VALUES (avg_cpu_percent)
, (avg_data_io_percent)
, (avg_log_write_percent)
) AS value(v)) AS [avg_DTU_percent]
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
More information about that can be found on this page:
https://azure.microsoft.com/en-us/documentation/articles/sql-database-upgrade-server-portal/
I'm not a MongoDB expert, so I'm a little unsure about the server setup now.
I have a single instance running mongo 3.0.2 with WiredTiger, accepting both read and write ops. It collects logs from clients, so the write load is decent. Once a day I want to process these logs and calculate some metrics using the aggregation framework; the data set to process is roughly all logs from the last month, and the whole calculation takes about 5-6 hours.
I'm thinking about splitting writes and reads to avoid locks on my collections (the server continues to write logs while I'm reading; newly written logs may match my queries, but I can skip them because I don't need 100% accuracy).
In other words, I want a setup with a secondary for reads, where replication does not run continuously but starts at a configured time or, better, is triggered before the read operations start.
I do all my processing from node.js, so one option I see is to export the data created in some period like [yesterday, today], import it into the read instance myself, and run the calculations after the import is done. I looked at replica sets and master/slave replication as possible setups, but I didn't work out how to configure them to achieve the described scenario.
So maybe I'm wrong and am missing something here? Are there any other options to achieve this?
Your idea of using a replica-set is flawed for several reasons.
First, a replica-set always replicates the whole mongod instance. You can't enable it for individual collections, and certainly not only for specific documents of a collection.
Second, deactivating replication and enabling it before you start your report generation is not a good idea either. When you enable replication, the new secondary will not be immediately up to date; it will take a while until it has processed the changes since its last contact with the primary. There is no way to tell how long this will take (you can check how far a secondary is behind the primary using rs.status() and comparing the secondary's optimeDate with its lastHeartbeat date).
But when you want to perform data-mining on a subset of your documents selected by timespan, there is another solution.
Transfer the documents you want to analyze to a new collection. You can do this with an aggregation pipeline consisting only of a $match which selects the documents from the last month, followed by an $out. The $out operator specifies that the results of the aggregation are not sent to the application/shell, but instead written to a new collection (which is automatically emptied before this happens). You can then perform your reporting on the new collection without locking the actual one. This also has the advantage that you are now operating on a much smaller collection, so queries will be faster, especially those which can't use indexes. And your data won't change between your aggregations, so your reports won't have inconsistencies caused by data changing while they run.
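Purely as an illustration of that $match + $out pipeline, here is a sketch using the MongoDB .NET driver (the connection string, database, collection and field names are placeholders; the same two-stage pipeline can be written in the mongo shell or the node.js driver):
// requires: using System; using MongoDB.Bson; using MongoDB.Driver;
var client = new MongoClient("mongodb://localhost:27017");                     // placeholder connection string
var logs = client.GetDatabase("logging").GetCollection<BsonDocument>("logs");  // placeholder names

var since = DateTime.UtcNow.AddMonths(-1);

// $match the last month's documents on a timestamp field ("createdAt" is a placeholder),
// then $out writes the matches into a scratch collection that the long-running
// aggregation job can read without touching the live collection.
logs.Aggregate()
    .Match(Builders<BsonDocument>.Filter.Gte("createdAt", since))
    .Out("logs_last_month");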
When you are certain that you will need a second server for report generation, you can still use replication and perform the aggregation on the secondary. However, I would really recommend that you build a proper replica set (consisting of a primary, a secondary and an arbiter) and leave replication active at all times. Not only will that make sure that your data isn't outdated when you generate your reports, it also gives you the important benefit of automatic failover should your primary go down for some reason.