I have an application that requires large RUs, but for some reason I cannot get the client app to handle more than 1000-1500 RUs, although the collection is set to 10000 RUs. Obviously I can add more clients, but I need one client to give me at least 10000 RUs then scale that.
My requests are simple
var query = connection.CreateDocumentQuery<DocumentDBProfile>(
CollectionUri, //cached
"SELECT * FROM Col1 WHERE Col1.key = '" + partitionKey + "' AND Col1.id ='" + id + "'",
new FeedOptions
{
MaxItemCount = -1,
MaxDegreeOfParallelism = 10000000,
MaxBufferedItemCount = 1000,
}).AsDocumentQuery();
var dataset = await query.ExecuteNextAsync().ConfigureAwait(false);
The query above is hitting 150,000 partitions, each is within own task (awaiting all at the end) and the client is initialized with TCP and direct mode:
var policy = new ConnectionPolicy
{
EnableEndpointDiscovery = false,
ConnectionMode = ConnectionMode.Direct,
ConnectionProtocol = Protocol.Tcp,
};
The CPU on the client appears to max out, mostly to service the call query.ExecuteNextAsync()
Am I doing anything wrong? Any optimization tips? Is there a lower level API I can use? Is there a way to pre-parse queries or make Json parsing more optimal?
UPDATE
I was able to get up to 3000-4000 RU on one client by lowering the number of concurrent requests, and stripping down my deserialized class to one with a single property (id), but I am still 10% of the limit of 50,000 RUs mentioned in the performance guidelines.
Not sure what else I could do. Is there any security checks or overhead I can disable in the .Net SDK?
UPDATE2
All our tests are run on Azure in the same region D11_V2. Running multiple clients scales well, so we are client bound not server bound.
Still not able to achieve 10% of the performance outlined in the CosmosDB performance guideline
By default the SDK will use a retry policy to mask throttling errors. Have you looked at the RU metrics available on Azure portal to confirm if you are being throttled or not? For more details on this, see tutorial here.
Not sure why the REST API would perform better than the .NET SDK. Can you give some more details on the operation you used here?
The example query you provided is querying a single document with a known partitionkey and id per request. For this kind of point-read operation, it would be better to use DocumentClient.ReadDocumentAsnyc, as it should be cheaper than a query.
It sounds like your sole purpose has become to disprove the documentation of Microsoft. Don't overrate this "50.000 RU/S" value for how you should scale your clients.
I don't think you can get a faster & lower level API than using .NET SDK with TCP & direct mode. The critical part is to use the TCP protocol (which you are). Only Java SDK also has direct mode, i doubt its faster. Maybe .NET Core...
How can your requirement be to "have large RU/s"? That is equivalent to "the application should require us to pay X$ for CosmosDB every month". The requirement should rather be "needs to complete X queries per second" or something like this. You then go on from there. See also the request unit calculator.
A request unit is the cost of your transaction. It depends on how large your documents are, how your collection is configured and on what your are doing. Inserting documents is usually much more expensive than retrieving data. Retrieving data across partitions within one query is more expensive than touching only a single one. A rule of thumb is that writing data is about 5 times more expensive than reading it.
I suggest you read the documentation about request units.
The problem with the performance tip of Microsoft is that they don't mention anything about which request should incur those RU/s. I would not expect it to mean: "The most basic request possible will not max out the CPU on the client system if you are still below 50.000 RU/s". Inserting data will get you to those numbers much more easily. I did a very quick test on my local machine, and got the official benchmarking sample up to about 7-8k RU/s using TCP+direct. I did not do anything apart from downloading the code and running it from Visual Studio. So my guess would be that the tips are also about inserting, since the performance testing examples are as well. The example achieves 100.000RU/s incidentally.
There are some good samples from Azure about "Benchmarking" and "Request Units". They should also be good sources for further experiments.
Only one actual tip on how to improve your query: Maybe ditch deserialization to your class, using CreateDocumentQuery(..) or CreateDocumentQuery<dynamic>. Could help your CPU. My first guess would be that your CPU is doing a bunch of that.
Hope this helps in any way.
Related
I'm using an Azure function like a scheduled job, using the cron timer. At a specific time each morning it calls a stored procedure.
The function is now taking 4 mins to run a stored procedure that takes a few seconds to run in SSMS. This time is increasing despite efforts to successfully improve the speed of the stored procedure.
The function is not doing anything intensive.
using (SqlConnection conn = new SqlConnection(str))
{
conn.Open();
using (var cmd = new SqlCommand("Stored Proc Here", conn) { CommandType = CommandType.StoredProcedure, CommandTimeout = 600})
{
cmd.Parameters.Add("#Param1", SqlDbType.DateTime2).Value = DateTime.Today.AddDays(-30);
cmd.Parameters.Add("#Param2", SqlDbType.DateTime2).Value = DateTime.Today;
var result = cmd.ExecuteNonQuery();
}
}
I've checked and the database is not under load with another process when the stored procedure is running.
Is there anything I can do to speed up the Azure function? Or any approaches to finding out why it's so slow?
UPDATE.
I don't believe Azure functions is at fault, the issue seems to be with SQL Server.
I eventually ran the production SP and had a look at the execution plan. I noticed that the statistic were way out, for example a join expected the number of returned rows to be 20, but actual figure was closer to 800k.
The solution for my issue was to update the statistic on a specific table each week.
Regarding why that stats were out so much, well the client does a batch update each night and inserts several hundred thousand rows. I can only assume this affected the stats and it's cumulative, so it seems to get worse with time.
Please be careful adding with recompile hints. Often compilation is far more expensive than execution for a given simple query, meaning that you may not get decent perf for all apps with this approach.
There are different possible reasons for your experience. One common reason for this kind of scenario is that you got different query plans in the app vs ssms paths. This can happen for various reasons (I will summarize below). You can determine if you are getting different plans by using the query store (which records summary data about queries, plans, and runtime stats). Please review a summary of it here:
https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-2017
You need a recent ssms to get the ui, though you can use direct queries from any tds client.
Now for a summary of some possible reasons:
One possible reason for plan differences is set options. These are different environment variables for a query such as enabling ansi nulls on or off. Each different setting could change the plan choice and thus perf. Unfortunately the defaults for different language drivers differ (historical artifacts from when each was built - hard to change now without breaking apps). You can review the query store to see if there are different “context settings” (each unique combination of set options is a unique context settings in query store). Each different set implies different possible plans and thus potential perf changes.
The second major reason for plan changes like you explain in your post is parameter sniffing. Depending on the scope of compilation (example: inside a sproc vs as hoc query text) sql will sometimes look at the current parameter value during compilation to infer the frequency of the common value in future executions. Instead of ignoring the value and just using a default frequency, using a specific value can generate a plan that is optimal for a single value (or set of values) but potentially slower for values outside that set. You can see this in the query plan choice in the query store as well btw.
There are other possible reasons for performance differences beyond what I mentioned. Sometimes there are perf differences when running in mars mode vs not in the client. There may be differences in how you call the client drivers that impact perf beyond this.
I hope this gives you a few tools to debug possible reasons for the difference. Good luck!
For a project I worked on we ran into the same thing. Its not a function issue but a sql server issue. For us we were updating sprocs during development and it turns out that per execution plan, sql server will cache certain routes/indexes (layman explanation) and that gets out of sync for the new sproc.
We resolved it by specifying WITH (RECOMPILE) at the end of the sproc and the API call and SSMS had the same timings.
Once the system is settled, that statement can and should be removed.
Search on slow sproc fast ssms etc to find others who have run into this situation.
I'm trying out different pricing tiers on SQL Server.
Im inserting 4000 rows distributed over 4 tables in 10 seconds
My problem: I don't any performance improvements from a small D2S_V3 to D8S_V3
My application need to insert many rows (bulking is not an option), and this kind of performance is not acceptable
I wonder why I dont see improvements.
So my noob question: Do I need to configure something to see improvements? My naive thinking says I should some difference :-)
What am I doing wrong?
Without knowing much about your schema, it looks like you are storage bound or network bound.
Storage:
Try to mount the database to the local (temporary disk) and see if you notice any difference, if it is faster then your bottleneck is the mounted disk.
Network bound:
Where is the client that's inserting these transaction? On same machine? On Azure?
I suggest you setup a client in the same region and do the tests.
inserting 4000 rows distributed over 4 tables in 10 seconds.I don't any performance improvements from a small D2S_V3 to D8S_V3
I would approach this problem using wait stats approach rather than throwing hardware first with out knowing problem..
For example,running below insert
insert into #t
select orderid from orders o
join
Customers c
on c.custid=o.custid
showed me below wait stats
Wait WaitType="SOS_SCHEDULER_YIELD" WaitTimeMs="1" WaitCount="167"
Wait WaitType="PAGEIOLATCH_SH" WaitTimeMs="12" WaitCount="3" />
Wait WaitType="MEMORY_ALLOCATION_EXT" WaitTimeMs="21" WaitCount="4975" />
most of the time, the query spent time on
PAGEIOLATCH_SH:
getting data from disk into memory
MEMORY_ALLOCATION_EXT :allocating memory for the query to run
based on this i will try to troubleshoot by seeing if i have memory pressure on my system,since this query is trying to get data from disk.
This is just one example,but hopefully this will give you an idea..
Further i will try to see if select is returing data fast
Performance can be directly linked to your hardware or configuration, but it's more likely that it has to do with the structures and the queries. Take a look at the execution plan for the INSERT operation to see how it is being resolved by the optimizer. Also, capture the query metrics using extended events to see how many resources are being used by the operation. These are more likely to lead to a resolution on why the query is performing slowly and enable you to scale the hardware to best serve the query.
We migrated our mobile app (still being developed) from Parse to Azure. Everything is running, but the price of DocumentDB is so high that we can't continue with Azure without fix that. Probably we're doing something wrong.
1) Price seams to have a bottleneck in the DocumentDB requests.
Running a process to load the data (about 0.5 million documents), memory and CPU was ok, but the DocumentDB request limit was a bottleneck, and the price charged was very high.
2) Even after the end of this data migration (few days of processing), azure continue to charge us every day.
We can't understand what is going on here. The graphic for use are flat, but the price is still climbing, as you can see in the imagens.
Any ideas?
Thanks!
From your screenshots, you have 15 collections under the Parse database. With Parse: Aside from the system classes, each of your user-defined classes gets stored in its own collection. And given that each (non-partitioned) collection has a starting run-rate of ~$24/month (for an S1 collection), you can see where the baseline cost would be for 15 collections (around $360).
You're paying for reserved storage and RU capacity. Regardless of RU utilization, you pay whatever the cost is for that capacity (e.g. S2 runs around $50/month / collection, even if you don't execute a single query). Similar to spinning up a VM of a certain CPU capacity and then running nothing on it.
The default throughput setting for the parse collections is set to 1000 RUPS. This will cost $60 per collection (at the rate of $6 per 100 RUPS). Once you finish the parse migration, the throughput can be lowered if you believe the workload decreased. This will reduce the charge.
To learn how to do this, take a look at https://azure.microsoft.com/en-us/documentation/articles/documentdb-performance-levels/ (Changing the throughput of a Collection).
The key thing to note is that DocumentDB delivers predictable performance by reserving resources to satisfy your application's throughput needs. Because application load and access patterns change over time, DocumentDB allows you to easily increase or decrease the amount of reserved throughput available to your application.
Azure is a "pay-for-what-you-use" model, especially around resources like DocumentDB and SQL Database where you pay for the level of performance required along with required storage space. So if your requirements are that all queries/transactions have sub-second response times, you may pay more to get that performance guarantee (ignoring optimizations, etc.)
One thing I would seriously look into is the DocumentDB Cost Estimation tool; this allows you to get estimates of throughput costs based upon transaction types based on sample JSON documents you provide:
So in this example, I have an 8KB JSON document, where I expect to store 500K of them (to get an approx. storage cost) and specifying I need throughput to create 100 documents/sec, read 10/sec, and update 100/sec (I used the same document as an example of what the update will look like).
NOTE this needs to be done PER DOCUMENT -- if you're storing documents that do not necessarily conform to a given "schema" or structure in the same collection, then you'll need to repeat this process for EVERY type of document.
Based on this information, I cause use those values as inputs into the pricing calculator. This tells me that I can estimate about $450/mo for DocumentDB services alone (if this was my anticipated usage pattern).
There are additional ways you can optimize the Request Units (RUs -- metric used to measure the cost of the given request/transaction -- and what you're getting billed for): optimizing index strategies, optimizing queries, etc. Review the documentation on Request Units for more details.
Each index batch is limited from 1 to 1000 documents. When I call it from my local machine or azure VM, I got 800ms to 3000ms per 1000 doc batch. If I submit multiple batches with async, the time spent is roughly the same. That means it would take 15 - 20 hours for my ~50M document collection.
Is there a way I can make it faster?
It looks like you are using our Standard S1 search service and although there are a lot of things that can impact how fast data can be ingested. I would expect to see ingestion to a single partition search service at a rate of about 700 docs / second for an average index, so I think your numbers are not far off from what I would expect, although please note that these are purely rough estimates and you may see different results based on any number of factors (such as number of fields, quantity of facets, etc)..
It is possible that some of the extra time you are seeing is due to the latency of uploading the content from your local machine to Azure, and it would likely be faster if you did this directly from Azure but if this is just a one time-upload that probably is not worth the effort.
You can slightly increase the speed of data ingestion by increasing the number of partitions you have and the S2 Search Service will also ingest data faster. Although both of these come at a cost.
By the way, if you have 50 M documents, please make sure that you allocate enough partitions since a single S1 partition can handle 15M documents or 25GB so you will definitely need extra partitions for this service.
Also as another side note, when you are uploading your content (and especially if you choose to do parallelized uploads), keep an eye on the HTTP responses because if the search service exceeds the resources available you could get HTTP 207 (indicating one or more item failed to apply) or 503's indicating the whole batch failed due to throttling. If throttling occurs, you would want to back off a bit to let the service catch up.
I think you're reaching the request capacity:
https://azure.microsoft.com/en-us/documentation/articles/search-limits-quotas-capacity/
I would try another tier (s1, s2). If you still face the same problem, try get in touch with support team.
Another option:
Instead of pushing data, try to add your data to the blob storage, documentDb or Sql Database, and then, use the pull approach:
https://azure.microsoft.com/en-us/documentation/articles/search-howto-indexing-azure-blob-storage/
Two somewhat related questions.
1) Is there anyway to get an ID of the server a table entity lives on?
2) Will using a GUID give me the best partition key distribution possible? If not, what will?
we have been struggling for weeks on table storage performance. In short, it's really bad, but early on we realized that using a randomish partition key will distribute the entities across many servers, which is exactly what we want to do as we are trying to achieve 8000 reads per second. Apparently our partition key wasn't random enough, so for testing purposes, I have decided to just use a GUID. First impression is it is waaaaaay faster.
Really bad get performance is < 1000 per second. Partition key is Guid.NewGuid() and row key is the constant "UserInfo". Get is execute using TableOperation with pk and rk, nothing else as follows: TableOperation retrieveOperation = TableOperation.Retrieve(pk, rk); return cloudTable.ExecuteAsync(retrieveOperation). We always use indexed reads and never table scans. Also, VM size is medium or large, never anything smaller. Parallel no, async yes
As other users have pointed out, Azure Tables are strictly controlled by the runtime and thus you cannot control / check which specific storage nodes are handling your requests. Furthermore, any given partition is served by a single server, that is, entities belonging to the same partition cannot be split between several storage nodes (see HERE)
In Windows Azure table, the PartitionKey property is used as the partition key. All entities with same PartitionKey value are clustered together and they are served from a single server node. This allows the user to control entity locality by setting the PartitionKey values, and perform Entity Group Transactions over entities in that same partition.
You mention that you are targeting 8000 requests per second? If that is the case, you might be hitting a threshold that requires very good table/partitionkey design. Please see the article "Windows Azure Storage Abstractions and their Scalability Targets"
The following extract is applicable to your situation:
This will provide the following scalability targets for a single storage account created after June 7th 2012.
Capacity – Up to 200 TBs
Transactions – Up to 20,000 entities/messages/blobs per second
As other users pointed out, if your PartitionKey numbering follows an incremental pattern, the Azure runtime will recognize this and group some of your partitions within the same storage node.
Furthermore, if I understood your question correctly, you are currently assigning partition keys via GUID's? If that is the case, this means that every PartitionKey in your table will be unique, thus every partitionkey will have no more than 1 entity. As per the articles above, the way Azure table scales out is by grouping entities in their partition keys inside independent storage nodes. If your partitionkeys are unique and thus contain no more than one entity, this means that Azure table will scale out only one entity at a time! Now, we know Azure is not that dumb, and it groups partitionkeys when it detects a pattern in the way they are created. So if you are hitting this trigger in Azure and Azure is grouping your partitionkeys, it means your scalability capabilities are limited to the smartness of this grouping algorithm.
As per the the scalability targets above for 2012, each partitionkey should be able to give you 2,000 transactions per second. Theoretically, you should need no more than 4 partition keys in this case (assuming that the workload between the four is distributed equally).
I would suggest you to design your partition keys to group entities in such a way that no more than 2,000 entities per second per partition are reached, and drop using GUID's as partitionkeys. This will allow you to better support features such as Entity Transaction Group, reduce the complexity of your table design, and get the performance you are looking for.
Answering #1: There is no concept of a server that a particular table entity lives on. There are no particular servers to choose from, as Table Storage is a massive-scale multi-tenant storage system. So... there's no way to retrieve a server ID for a given table entity.
Answering #2: Choose a partition key that makes sense to your application. just remember that it's partition+row to access a given entity. If you do that, you'll have a fast, direct read. If you attempt to do a table- or partition-scan, your performance will certainly take a hit.
See
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx for more info on key selection (Note the numbers are 3 years old, but the guidance is still good).
Also this talk can be of some use as far as best practice : http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/WAD-B406#fbid=lCN9J5QiTDF.
In general a given partition can support up to 2000 tps, so spreading data across partitions will help achieve greater numbers. Something to consider is that atomic batch transactions only apply to entities that share the same partitionkey. Additionally, for smaller requests you may consider disabling Nagle as small requests may be getting held up at the client layer.
From the client end, I would recommend using the latest client lib (2.1) and Async methods as you have literally thousands of requests per second. (the talk has a few slides on client best practices)
Lastly, the next release of storage will support JSON and JSON no metadata which will dramatically reduce the size of the response body for the same objects, and subsequently the cpu cycles needed to parse them. If you use the latest client libs your application will be able to leverage these behaviors with little to no code change.