How to measure RU in DocumentDB? - azure

Given that Azure DocumentDB uses Request Units as a measurement of throughput, I would like to make sure my queries use as few RUs as possible to increase my throughput. Is there a tool that will tell me how many RUs a query will take, and whether the query is actually using an index?

As you discovered, certain tools will report the RU charge upon completion of a query. This is also available programmatically, as the x-ms-request-charge header is returned in the response and is easily retrievable via the DocumentDB SDKs.
For example, here's a snippet showing RU retrieval using JS/node:
var queryIterator = client.queryDocuments(collLink, querySpec);
queryIterator.executeNext(function (err, results, headers) {
    if (err) {
        // deal with error...
    } else {
        // deal with payload...
        var ruConsumed = headers['x-ms-request-charge'];
    }
});
As for your question regarding indexing, and determining whether a property is indexed (which should then answer your question about a query using or not using an index): you may read the collection, which returns the indexing policy details as part of the collection resource.
For example: given some path dbs/<databaseId>/colls/<collectionId>:
var collLink = 'dbs/' + databaseId + '/colls/' + collectionId;
client.readCollection(collLink, function (err, coll) {
    if (err) {
        // deal with error
    } else {
        // compare indexingPolicy with your property, to see if it's included or excluded
        // this just shows you what these properties look like
        console.log("Included: " + JSON.stringify(coll.indexingPolicy.includedPaths));
        console.log("Excluded: " + JSON.stringify(coll.indexingPolicy.excludedPaths));
    }
});
You'll see includedPaths and excludedPaths looking something like this, and you can then search for your given property in any way you see fit:
Included: [{"path":"/*","indexes":[{"kind":"Range","dataType":"Number","precision":-1},{"kind":"Hash","dataType":"String","precision":3}]}]
Excluded: []
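If you want to check a specific property programmatically, here is a minimal sketch (my own simplification: it only handles exact path matches and the /* catch-all, not the full wildcard matching rules), meant to run inside the readCollection callback above:
// naive check, for illustration only: a path counts as indexed if it is not explicitly
// excluded and either matches an included path exactly or is covered by the /* catch-all
function isPathIndexed(indexingPolicy, path) {
    var excluded = indexingPolicy.excludedPaths.some(function (p) {
        return p.path === path;
    });
    if (excluded) return false;
    return indexingPolicy.includedPaths.some(function (p) {
        return p.path === '/*' || p.path === path;
    });
}

// e.g. check a hypothetical property path:
console.log(isPathIndexed(coll.indexingPolicy, '/name/?'));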

I found DocumentDb Studio, which shows the response headers that provide the RU charge for every query.

Another option is to use the emulator with the trace collection option turned on.
https://learn.microsoft.com/en-us/azure/cosmos-db/local-emulator
I was trying to profile LINQ aggregate queries, which currently seems to be impossible with the C# SDK.
Using the trace output from the emulator I was able to identify the request charges and a host of other metrics. There is a lot of data to wade through.
I found the request charge stored under this event key:
DocDBServer/Transport_Channel_Processortask/Genericoperation
Example output:
ThreadID="141,928" FormattedMessage="EndRequest DocumentServiceId localhost, ResourceType 2, OperationType 15, ResourceId 91M7AL+QPQA=, StatusCode 200, HRESULTHex 0, ResponseLength 61, Duration 70,546, HasQuery 1, PartitionId a4cb495b-38c8-11e6-8106-8cdcd42c33be, ReplicaId 1, ConsistencyLevel 3, RequestSessionToken 0:594, ResponseSessionToken 594, HasContinuation 0, HasPreTrigger 0, HasPostTrigger 0, IsFeedUnfiltered 0, IndexingDirective 5, XDate Fri, 09 Jun 2017 08:49:03 GMT, RetryAfterMilliseconds 0, MaxItemCount -1, ActualItemCount 1, ClientVersion 2017-02-22, UserAgent Microsoft.Azure.Documents.Common/1.13.58.2, RequestLength 131, NetworkBucket 2, SubscriptionId 00000000-0000-0000-0000-000000000000, Region South Central US, IpAddress 192.168.56.0, ChannelProtocol RNTBD, RequestCharge 51.424, etc...
This can then be correlated with data from another event which contains the query info:
DocDBServer/ServiceModuletask/Genericoperation
Note you need PerfView to view the ETL log files. See here for more info:
https://github.com/Azure/azure-documentdb-dotnet/blob/master/docs/documentdb-sdk_capture_etl.md

Related

Can we log a SQL query having bind parameters in node-oracledb?

const query = `INSERT INTO countries VALUES (:country_id, :country_name)`;
try {
    const result = await connection.execute(query, { country_id: 90, country_name: "Tonga" });
} catch (error) {
    console.error(`error while executing: ${query}`);
}
Is there any way to log the query along with the bind parameter data, so that I can log INSERT INTO countries VALUES (90, "Tonga")?
I think there's currently no built-in option to do that, but according to the docs you could create a wrapper around the execute function and log the actual query there. From the docs:
Sometimes it is useful to trace the bind data values that have been used when executing statements. Several methods are available.
In the Oracle Database, the view V$SQL_BIND_CAPTURE can capture bind information. Tracing with Oracle Database’s DBMS_MONITOR.SESSION_TRACE_ENABLE() may also be useful.
You can also write your own wrapper around execute() and log any parameters.
Eventually, I found a package known as bind-sql-string, which has a queryBindToString method that solves my problem. 🎉

Cosmos DB: Gremlin API Request too large exception. How to retry the call

I have a throughput of 1000 RU/s in my Azure Cosmos DB and I have around 290 queries to be executed. I keep getting a request too large exception.
Each query has 12 properties and 1 partition key, but I still think that with 1000 RU/s the queries should execute properly.
I have a gremlinClient
public static GremlinClient GetGremlinClient()
{
    var gremlinServer = new GremlinServer(Endpoint, Port, enableSsl: true,
        username: "/dbs/" + Databasename + "/colls/" + Collectionname, password: Authkey);
    var gremlinClient = new GremlinClient(gremlinServer, new GraphSON2Reader(), new GraphSON2Writer(),
        GremlinClient.GraphSON2MimeType);
    return gremlinClient;
}
A sample query; I am just trying to add vertices:
g.addV('Experience').property('_test', 'dummy').property('someProperty', 'dummy').property('someProperty', 'dummy').property('someProperty', 'Documentation of the business processes
of all departments as well as the management level for an informed
selection of an ERP-system for a medium-sized industrial enterprise;
Role: Project management ').property('someProperty',
'2016').property('someProperty', 'Offen').property('someProperty',
'Dummy').property('someProperty', 'EN').property('someProperty',
'Industry').property('someProperty', 'Process documentation of
the whole company for a profounded selection of an ERP-System.')
This foreach executes all the queries:
foreach (string query in queries)
{
    await gremlinClient.SubmitAsync<dynamic>(query);
}
The error I get:
Server error: \r\n\nActivityId : 2312f64f-b865-49cc-bb26-843d46313199\nExceptionType : RequestRateTooLargeException\nExceptionMessage :\r\n\tMessage: {\"Errors\":[\"Request rate is large\"]}\r\n\tActivityId: 157daf87-3238-4e1c-9a81-41bcd6d7c2e1, Request URI: /apps/413f848b-ce17-40fc-ad7f-14c0e21e9633/services/29abd22a-4e74-48c1-aab3-b311be968829/partitions/9e4cb405-4f74-4d7f-8d12-26e79b910143/replicas/132142016542682221s/, RequestStats: \r\n\tRequestStartTime: 2019-10-24T09:27:38.2395067Z, RequestEndTime: 2019-10-24T09:27:38.2395067Z, Number of regions attempted:1\r\n\tResponseTime: 2019-10-24T09:27:38.2395067Z
It's simple code; I don't understand what I should change in it.
Is there a way to retry the request, or to somehow avoid the error?
The exception you receive is RequestRateTooLargeException (note the Rate), meaning you are submitting too many requests in a short period of time.
For running bulk operations you should use the vendor-specific tooling.
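Regardless of the SDK, the usual way to handle this is to catch the "Request rate is large" (429) error and retry after a delay, backing off between attempts. A minimal sketch of that pattern (in JavaScript for brevity; submitQuery is a hypothetical placeholder for whatever call actually sends the query, not the Gremlin.Net API from the question):
// generic retry-with-backoff sketch; submitQuery is a hypothetical async function
// that sends one query and throws when the server answers "Request rate is large"
async function submitWithRetry(submitQuery, query, maxAttempts = 5) {
    let delayMs = 1000;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return await submitQuery(query);
        } catch (err) {
            const rateLimited = String(err).includes("Request rate is large");
            if (!rateLimited || attempt === maxAttempts) throw err;
            await new Promise(resolve => setTimeout(resolve, delayMs));
            delayMs *= 2; // back off before the next attempt
        }
    }
}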

How to get more customer sources using Stripe NodeJS SDK?

I am able to get only the first 10 customer sources via the get customer API:
# stripe.customers.retrieve
{
  "id": "cus_DE8HSMZ75l2Dgo",
  ...
  "sources": {
    "object": "list",
    "data": [
    ],
    "has_more": false,
    "total_count": 0,
    "url": "/v1/customers/cus_DE8HSMZ75l2Dgo/sources"
  },
  ...
}
But how do I get more? Is the only way via an AJAX call? I was thinking there should be a function somewhere in the SDK?
When you retrieve a Customer object via the API, Stripe will return the sources property which is a List object. The data property will be an array with up to 10 sources in it.
If you want the ability to get more sources than the 10 most recent ones, you will need to use Pagination. The idea is that you will first get a list of N objects (10 by default). Then you will request the next "page" from Stripe by asking for N objects again but using the parameter starting_after set to the id of the last object in the previous page. You will continue doing that until the has_more property in the page returned is false indicating you retrieved all the objects.
For example if your Customer has 35 sources, you would get the first page (10), then call list to get 10 more (20), then 10 more again (30) and then the last call would return only 5 sources (35) and has_more would be false.
To decrease the number of calls, you can also set limit to a higher value. The maximum value is 100 in that case.
Here's what the code would look like:
// list those cards 3 at a time
var listOptions = {limit: 3};
while (1) {
    var sources = await stripe.customers.listSources(
        customer.id,
        listOptions
    );
    var nbSourcesRetrieved = sources.data.length;
    var lastSourceId = sources.data[nbSourcesRetrieved - 1].id;
    console.log("Received " + nbSourcesRetrieved + " - last source: " + lastSourceId + " - has_more: " + sources.has_more);
    // Leave if we are done with pagination
    if (sources.has_more == false) {
        break;
    }
    // Store the last source id in the options for the next page
    listOptions['starting_after'] = lastSourceId;
}
You can see a full running example on Runkit here: https://runkit.com/5a6b26c0e3908200129fbb5d/5b49eabda462940012c33880
Taking a quick look into the source of the stripe-node package, it seems there is a stripe.customers.listSources method, which takes a customerId as a parameter and makes a request to the correct URL. I suppose it works similarly to the listCards method. But I couldn't find it in the docs, so you have to treat it as an undocumented feature ... but maybe it's just an error in the docs. You could contact support about it. We used Stripe in an old project and they appreciated any input on their documentation.
As of stripe-node 6.11.0, you may auto-paginate list methods, including customer sources. Stripe provides a few different APIs for this to aid with a variety of node versions and styles.
See the docs here
The important part to notice is .autoPagingEach:
await stripe.customers.listSources(customer.id, { limit: 100 }).autoPagingEach(async (source) => {
    doSomethingWithYourSource(source);
});

Data tracking in DocumentDB

I am trying to keep a history of data (at least one step back) in DocumentDB.
For example, say I have a property called Name in a document with the value "Pieter". If I now change it to "Sam", I have to maintain the history that it was "Pieter" previously.
As of now I am thinking of a pre-trigger. Any other solutions?
Cosmos DB (formerly DocumentDB) now offers change tracking via Change Feed. With Change Feed, you can listen for changes on a particular collection, ordered by modification within a partition.
Change feed is accessible via:
Azure Functions
DocumentDB (SQL) SDK
Change Feed Processor Library
For example, here's a snippet from the Change Feed documentation, on reading from the Change Feed, for a given partition (full code example in the doc here):
IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
    collectionUri,
    new ChangeFeedOptions
    {
        PartitionKeyRangeId = pkRange.Id,
        StartFromBeginning = true,
        RequestContinuation = continuation,
        MaxItemCount = -1,
        // Set reading time: only show change feed results modified since StartTime
        StartTime = DateTime.Now - TimeSpan.FromSeconds(30)
    });

while (query.HasMoreResults)
{
    FeedResponse<dynamic> readChangesResponse = query.ExecuteNextAsync<dynamic>().Result;
    foreach (dynamic changedDocument in readChangesResponse)
    {
        Console.WriteLine("document: {0}", changedDocument);
    }
    checkpoints[pkRange.Id] = readChangesResponse.ResponseContinuation;
}
If you're trying to make an audit log I'd suggest looking into Event Sourcing. Building your domain from events ensures a correct log: instead of overwriting state, you append an event describing each change and derive the current state from the events. See https://msdn.microsoft.com/en-us/library/dn589792.aspx and http://www.martinfowler.com/eaaDev/EventSourcing.html
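For the Name example from the question, a minimal sketch of that idea with the DocumentDB Node SDK might look like this (the event shape and ids are my own illustration, and client/collLink are assumed to be set up as in the earlier snippets):
// instead of updating Name in place, append an immutable event describing the change;
// the current value can always be rebuilt by replaying the events for this entity
var changeEvent = {
    entityId: 'person-1',               // hypothetical id of the document being tracked
    type: 'NameChanged',
    oldValue: 'Pieter',
    newValue: 'Sam',
    occurredAt: new Date().toISOString()
};

client.createDocument(collLink, changeEvent, function (err, created) {
    if (err) {
        // deal with error
    } else {
        console.log('event stored: ' + created.id);
    }
});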

Azure Storage Table Does not return whole partition

I ran into a situation in production where
CloudContext.TableData.Where( A => A.PartitionKey == "MYKEY").ToList();
where TableData is
public DataServiceQuery<T> TableData { get { return CreateQuery<T>( _TableName ); } }
does not return the whole partition (I have fewer than 1000 records there).
In my case it returns 367 records, while in VS2010 Server Explorer or in Azure Storage Explorer I get 414 records (the condition is the same).
Did anyone experience the same problem?
Also, if I change the query and add RowKey to the condition, I get the required record with no problem.
You have to better understand the Table service. The official documentation here lists other conditions which affect the number of records returned. If you want to retrieve the whole partition, you have to inspect the query result for a continuation token and use the provided continuation token to execute the same query over and over again, until all the results come back.
You can use an approach similar to the following:
private IEnumerable<MyEntityType> GetAllEntities()
{
    var result = this._tables.GetSegmentedEntities(100, null); // null is for continuation token
    while (result.Results.Count > 0)
    {
        foreach (var ufs in result.Results)
        {
            yield return new MyEntityType(ufs.RowKey, ufs.WhateverOtherPropertyINeed);
        }
        if (result.ContinuationToken != null)
        {
            result = this._tables.GetSegmentedEntities(100, result.ContinuationToken);
        }
        else
        {
            break;
        }
    }
}
Where GetSegmentedEntities(100, result.ContinuationToken) is defined as:
public TableQuerySegment<MyEntityType> GetSegmentedEntities(int pageSize, TableContinuationToken token)
{
    var partKey = "My_Desired_Partition_key_passed_via_Const_or_method_Param";
    TableQuery<MyEntityType> query = new TableQuery<MyEntityType>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partKey));
    query.TakeCount = pageSize;
    return this.azureTableReference.ExecuteQuerySegmented<MyEntityType>(query, token);
}
You can use and modify this code for your case.
This is a known and documented behavior. The Table service API will either return 1000 entities or as many entities as possible within 5 seconds. If the query takes longer than 5 seconds to execute, it'll return a continuation token.
With the addition of RowKey you are making the query more specific, and hence faster, and as a result you are getting all the entities.
See Timeouts and Pagination on MSDN for details.
If you are getting partial result sets, one of the following factors is the cause:
i) You have more than 1000 records matching the filter
ii) The query took more than 5 seconds
iii) The query crosses a partition boundary
As you have fewer than 1000 records, the first factor won't be an issue. And as you are retrieving based on PartitionKey equality, the third one also won't cause any problem. You are facing this problem because of the second factor.
To handle this you need to work with the continuation token. You can refer to this link for more info.
