I am finding that connecting to and querying Azure Cosmos DB with C# .NET is very slow. It takes about 2.5 seconds to connect and 8 seconds to run a simple query returning about 600 rows (no WHERE clause). Is there a more efficient way to do this? Or is it best to use a client-side connection pool so connections are re-used and we don't have to connect as often? This will mainly be used from an Azure web service (ASP.NET).
Interestingly, if I don't call the ReadThroughputAsync() method after getting the Cosmos container, then Initialize() only takes 438 ms but the query takes longer (8.9 seconds). Does anyone know why this is?
With calling await _container.ReadThroughputAsync():
Initialize() in 2482 ms
Found 598 results in 8171 ms
Without calling await _container.ReadThroughputAsync():
Initialize() in 438 ms
Found 598 results in 8937 ms
private const string ContainerId = "Items";
private const string DatabaseId = "Results";
private const string EndpointUri = "https://myServer.documents.azure.com:443/";
private const string PrimaryKey ="xxxxxxx==";
private Container _container;
private CosmosClient _cosmosClient;
private Database _database;
public async Task Initialize()
{
var tickCount = Environment.TickCount;
_cosmosClient = new CosmosClient(EndpointUri, PrimaryKey,
new CosmosClientOptions {ApplicationName = "DataImporter"});
_database = _cosmosClient.GetDatabase(DatabaseId);
_container = _database.GetContainer(ContainerId);
var throughput = await _container.ReadThroughputAsync();
tickCount = Environment.TickCount - tickCount;
WriteLine($"Initialize() in {tickCount} ms");
var tickCount2 = Environment.TickCount;
var sqlQueryText = "SELECT * FROM c";
var queryDefinition = new QueryDefinition(sqlQueryText);
var queryResultSetIterator = _container.GetItemQueryIterator<SampleData>(queryDefinition);
var results = new List<SampleData>();
while (queryResultSetIterator.HasMoreResults)
{
var currentResultSet = await queryResultSetIterator.ReadNextAsync();
foreach (var result in currentResultSet)
results.Add(result);
}
tickCount2 = Environment.TickCount - tickCount2;
WriteLine($"Found {results.Count} results in {tickCount2} ms");
}
Thank you Gaurav Mantri, David Makogon and Mark Brown. Posting your suggestions as an answer to help other community members.
Practices that you can adopt to make the connection faster:
Initialize the Cosmos connection at startup. You would need to use the CreateAndInitializeAsync method of the CosmosClient (see the sketch below).
Make sure to run the code in the same region as the Cosmos DB account.
Always keep a reference to the Cosmos client and container alive. This ensures that subsequent calls (after the first) are faster.
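As a minimal sketch (assuming a newer v3 Microsoft.Azure.Cosmos SDK, where CreateAndInitializeAsync is available), reusing the constants from the question:
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;
// Open connections and warm caches for the listed containers up front,
// so the first query does not pay the initialization cost.
var containersToInitialize = new List<(string databaseId, string containerId)>
{
    (DatabaseId, ContainerId)
};
_cosmosClient = await CosmosClient.CreateAndInitializeAsync(
    EndpointUri, PrimaryKey, containersToInitialize,
    new CosmosClientOptions { ApplicationName = "DataImporter" });
_container = _cosmosClient.GetContainer(DatabaseId, ContainerId);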
Reference:
Why the first request takes so much time and how you can speed that up.
https://stackoverflow.com/questions/67943528/asp-net-core-3-application-slow-to-load-cosmos-db-query#:~:text=First%2C%20I%20would,from%2Dportal%22%2C%0AcontainersToInitialize)
CreateContainerIfNotExistsAsync is throwing an exception with status code "Bad Request" if the container does not exist in the db. If the container exists in the db, then no exception is thrown. Can anyone help me understand why this is happening?
(URL and key hidden for posting online.)
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;
using System.Threading.Tasks;
namespace CosmosDB // Note: actual namespace depends on the project name.
{
class Program
{
public static async Task Main(string[] args)
{
var cosmosUrl = "###########################";
var cosmoskey = "###########################";
var databaseName = "TestDB";
// var containerId = "ToDo";
CosmosClient client = new CosmosClient(cosmosUrl, cosmoskey);
Database database = await client.CreateDatabaseIfNotExistsAsync(databaseName);
Container container = await database.CreateContainerIfNotExistsAsync(
id: "ToDoList",
partitionKeyPath: "/category",
throughput: 100
);
}
}
}
The command fails because your input is invalid. The throughput must be a value between 400 and 10,000 RU/s (for a normal database or container), and since you are passing 100, the call throws the exception.
The error won't occur if your container already exists, because in that case the throughput is not checked (server-side) or updated.
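As a minimal fix sketch, request at least the 400 RU/s minimum (the rest of the call is unchanged from the question):
Container container = await database.CreateContainerIfNotExistsAsync(
    id: "ToDoList",
    partitionKeyPath: "/category",
    throughput: 400 // minimum provisioned throughput for a container
);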
Edit:
Link to Microsoft documentation regarding service limits.
Link to Microsoft REST API (used by the SDK).
In the Cosmos DB v3 SDK, I'm getting an IOrderedQueryable<T> using GetItemLinqQueryable<T>. This allows me to write custom queries. The problem is I'd like to track request charges whenever a query is materialized. How can this be accomplished?
When I execute methods like ReadItemAsync and ExecuteStoredProcedureAsync, the returned object has a RequestCharge property, but I need to detect charges with LINQ queries.
You can use the extension method ToFeedIterator on your IOrderedQueryable.
using Microsoft.Azure.Cosmos.Linq;
var query = container.GetItemLinqQueryable<MyClass>()
.Where(c => true)
.ToFeedIterator();
while (query.HasMoreResults)
{
var response = await query.ReadNextAsync();
Console.WriteLine(response.RequestCharge);
foreach (var myClassInstance in response)
{
// do stuff
}
}
Edit: if you need a count or any other aggregate function:
var query = container.GetItemLinqQueryable<MyClass>()
.Where(c => true);
Response<int> x = await query.CountAsync();
Console.WriteLine(x.RequestCharge);
int count = x; // implicit conversion defined on Response<T>
You can find the full list of available extension functions on GitHub.
We upgraded to the next version of the SDK we use to access our Azure Table storage.
We observed performance degradation in our application after that. We even created test applications with an identical usage pattern to isolate it, and we still see this performance hit.
We are using .NET Framework code, reading data from an Azure table.
Old client: Microsoft.WindowsAzure.Storage - 9.3.2
New client: Microsoft.Azure.Cosmos.Table - 1.0.6
Here is one of the sample tests we tried to run:
public async Task ComparisionTest1()
{
var partitionKey = CompanyId.ToString();
{
// Microsoft.Azure.Cosmos.Table
var storageAccount = Microsoft.Azure.Cosmos.Table.CloudStorageAccount.Parse(ConnectionString);
var tableClient = Microsoft.Azure.Cosmos.Table.CloudStorageAccountExtensions.CreateCloudTableClient(storageAccount);
var tableRef = tableClient.GetTableReference("UserStatuses");
var query = new Microsoft.Azure.Cosmos.Table.TableQuery<Microsoft.Azure.Cosmos.Table.TableEntity>()
.Where(Microsoft.Azure.Cosmos.Table.TableQuery.GenerateFilterCondition("PartitionKey", "eq", partitionKey));
var result = new List<Microsoft.Azure.Cosmos.Table.TableEntity>(20000);
var stopwatch = Stopwatch.StartNew();
var tableQuerySegment = await tableRef.ExecuteQuerySegmentedAsync(query, null);
result.AddRange(tableQuerySegment.Results);
while (tableQuerySegment.ContinuationToken != null)
{
tableQuerySegment = await tableRef.ExecuteQuerySegmentedAsync(query, tableQuerySegment.ContinuationToken);
result.AddRange(tableQuerySegment.Results);
}
stopwatch.Stop();
Trace.WriteLine($"Cosmos table client. Elapsed: {stopwatch.Elapsed}");
}
{
// Microsoft.WindowsAzure.Storage
var storageAccount = Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse(ConnectionString);
var tableClient = storageAccount.CreateCloudTableClient();
var tableRef = tableClient.GetTableReference("UserStatuses");
var query = new Microsoft.WindowsAzure.Storage.Table.TableQuery<Microsoft.WindowsAzure.Storage.Table.TableEntity>()
.Where(Microsoft.WindowsAzure.Storage.Table.TableQuery.GenerateFilterCondition("PartitionKey", "eq", partitionKey));
var result = new List<Microsoft.WindowsAzure.Storage.Table.TableEntity>(20000);
var stopwatch = Stopwatch.StartNew();
var tableQuerySegment = await tableRef.ExecuteQuerySegmentedAsync(query, null);
result.AddRange(tableQuerySegment.Results);
while (tableQuerySegment.ContinuationToken != null)
{
tableQuerySegment = await tableRef.ExecuteQuerySegmentedAsync(query, tableQuerySegment.ContinuationToken);
result.AddRange(tableQuerySegment.Results);
}
stopwatch.Stop();
Trace.WriteLine($"Old table client. Elapsed: {stopwatch.Elapsed}");
}
}
Has anyone else observed this? Any thoughts about it?
The performance issue will be resolved in Table SDK 1.0.7, as verified with large entities.
On 1.0.6, the workaround is to disable the Table SDK trace by adding a diagnostics section to app.config if it's a .NET Framework app. It will still be slower than the Storage SDK, but much better than without the workaround, depending on the usage.
I think your data is stored in the legacy Storage Table service. Just in case this is a Cosmos DB-backed table, you may get better performance if you set TableClientConfiguration.UseRestExecutorForCosmosEndpoint to true.
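A hedged sketch of that setting, assuming the Microsoft.Azure.Cosmos.Table overload of CreateCloudTableClient that accepts a TableClientConfiguration:
var configuration = new TableClientConfiguration
{
    // Only helps when the endpoint is Cosmos DB-backed, not legacy Storage Tables.
    UseRestExecutorForCosmosEndpoint = true
};
var tableClient = Microsoft.Azure.Cosmos.Table.CloudStorageAccountExtensions
    .CreateCloudTableClient(storageAccount, configuration);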
If it's the legacy Storage Table store, the Cosmos DB Table SDK 1.0.6 is about 15% slower than Storage Table SDK 9.3.3. In addition, it has an extra second of overhead on the first CRUD operation. The higher query duration has been resolved in 1.0.7, which is on par with the Storage SDK. The one-second initialization is still required when using Cosmos DB Table SDK 1.0.7, which should be acceptable.
We are planning to release 1.0.7 during the week of 4/13.
I'm seeing a lot of exceptions in the collectionSelfLink when making DocumentDB calls -- see image below.
I'm able to connect to DocumentDB and read data, but these exceptions concern me -- especially in something that's pretty straightforward like a collectionSelfLink.
Any idea what may be causing them and how to fix them?
Here's the function that's using the selfLink
public async Task<IEnumerable<T>> ReadQuery<T>(string dbName, string collectionId, SqlQuerySpec query)
{
// Prepare collection self link
// IMPORTANT: This is where I'm seeing those exceptions when I inspect the collectionLink. Otherwise, I'm NOT getting any errors.
var collectionLink = UriFactory.CreateDocumentCollectionUri(dbName, collectionId);
var result = _client.CreateDocumentQuery<T>(collectionLink, query, null);
return await result.QueryAsync();
}
And here's the QueryAsync() extension method
public async static Task<IEnumerable<T>> QueryAsync<T>(this IQueryable<T> query)
{
var docQuery = query.AsDocumentQuery();
var batches = new List<IEnumerable<T>>();
do
{
var batch = await docQuery.ExecuteNextAsync<T>();
batches.Add(batch);
}
while (docQuery.HasMoreResults);
var docs = batches.SelectMany(b => b);
return docs;
}
So SelfLink is an internal property that is set by DocumentDB. It cannot be set by the user and is only populated on resources that have been returned from a call to the server.
The UriFactory code that you are using constructs a link that can be used to execute operations, but it is not a SelfLink.
If you are looking at the SelfLink property on a newly initialized DocumentCollection() object, the SelfLink will be null because it has not been persisted on the server yet. This would explain all those errors in the debug watch.
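A small sketch to illustrate the difference, assuming the classic DocumentDB SDK and the _client from the question:
// Returned from the server, so SelfLink is populated by the service.
var response = await _client.ReadDocumentCollectionAsync(
    UriFactory.CreateDocumentCollectionUri(dbName, collectionId));
Console.WriteLine(response.Resource.SelfLink); // e.g. "dbs/.../colls/.../"
// Created locally and never persisted, so SelfLink is null.
var local = new DocumentCollection { Id = collectionId };
Console.WriteLine(local.SelfLink == null); // True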
I'm using EF 5.0 with SQL Server 2008. I have two databases on the same server instance. I need to update tables in both databases and want the updates to be part of the same transaction, so I used TransactionScope. Below is the code -
public void Save()
{
var MSObjectContext = ((IObjectContextAdapter)MSDataContext).ObjectContext;
var AWObjectContext = ((IObjectContextAdapter)AwContext).ObjectContext;
using (var scope = new TransactionScope(TransactionScopeOption.Required,
new TransactionOptions
{
IsolationLevel = IsolationLevel.ReadUncommitted
}))
{
MSObjectContext.SaveChanges(SaveOptions.DetectChangesBeforeSave);
AWObjectContext.SaveChanges(SaveOptions.DetectChangesBeforeSave);
scope.Complete();
}
}
When I use the above code, the transaction gets promoted to DTC. After searching on the internet, I found that this happens because of two different connection strings / connections. But what I don't understand is: if I write a stored procedure on one database which updates a table in a different database (on the same server), no DTC is required. So why do EF and TransactionScope promote this to DTC? Is there any other workaround for this?
With plain DbConnections, you can prevent DTC escalation for multiple databases on the same server by using the same connection string (with any database you like) and manually changing the database on the opened connection object, like so:
using (var tx = new TransactionScope())
{
using (var conn = new SqlConnection(connectStr))
{
conn.Open();
new SqlCommand("INSERT INTO atest VALUES (1)", conn).ExecuteNonQuery();
}
using (var conn = new SqlConnection(connectStr))
{
conn.Open();
conn.ChangeDatabase("OtherDB");
new SqlCommand("INSERT INTO btest VALUES (2)", conn).ExecuteNonQuery();
}
tx.Complete();
}
This will not escalate to DTC, but it would if you used different values for connectStr.
I'm not familiar with EF and how it manages connections and contexts, but using the above insight, you might be able to avoid DTC escalation by calling conn.ChangeDatabase(..) and then creating your context like new DbContext(conn, ...).
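A speculative sketch of that idea (MsContext and AwContext are hypothetical DbContext subclasses that forward to EF's DbContext(DbConnection, bool contextOwnsConnection) constructor):
using (var tx = new TransactionScope())
using (var conn = new SqlConnection(connectStr))
{
    conn.Open();
    using (var msContext = new MsContext(conn, contextOwnsConnection: false))
    {
        msContext.SaveChanges(); // first database, shared physical connection
    }
    conn.ChangeDatabase("OtherDB"); // hypothetical second database name
    using (var awContext = new AwContext(conn, contextOwnsConnection: false))
    {
        awContext.SaveChanges(); // same connection, so no DTC escalation
    }
    tx.Complete();
}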
But please note that even with a shared connection string, as soon as you have more than one connection open at the same time, the DTC will get involved, as in this modified example:
using (var tx = new TransactionScope())
{
using (var conn = new SqlConnection(connectStr))
{
conn.Open();
new SqlCommand("INSERT INTO atest VALUES (1)", conn).ExecuteNonQuery();
using (var conn2 = new SqlConnection(connectStr))
{
conn2.Open();
conn2.ChangeDatabase("otherdatabase");
new SqlCommand("INSERT INTO btest VALUES (2)", conn2).ExecuteNonQuery();
}
}
tx.Complete();
}