Cosmos DB: Gremlin API Request too large exception. How to retry the call - azure

I have a throughput of 1000 RU/s in my Azure Cosmos DB and around 290 queries to execute. I keep getting a request too large exception.
Each query has 12 properties and 1 partition key, but I still think that with 1000 RU/s the queries should execute properly.
I have a GremlinClient:
public static GremlinClient GetGremlinClient()
{
    var gremlinServer = new GremlinServer(Endpoint, Port, enableSsl: true,
        username: "/dbs/" + Databasename + "/colls/" + Collectionname, password: Authkey);
    var gremlinClient = new GremlinClient(gremlinServer, new GraphSON2Reader(), new GraphSON2Writer(),
        GremlinClient.GraphSON2MimeType);
    return gremlinClient;
}
A sample query; I am just trying to add vertices:
g.addV('Experience')
 .property('_test', 'dummy')
 .property('someProperty', 'dummy')
 .property('someProperty', 'dummy')
 .property('someProperty', 'Documentation of the business processes of all departments as well as the management level for an informed selection of an ERP-system for a medium-sized industrial enterprise; Role: Project management ')
 .property('someProperty', '2016')
 .property('someProperty', 'Offen')
 .property('someProperty', 'Dummy')
 .property('someProperty', 'EN')
 .property('someProperty', 'Industry')
 .property('someProperty', 'Process documentation of the whole company for a profounded selection of an ERP-System.')
This foreach executes all the queries:
foreach (string query in queries)
{
    await gremlinClient.SubmitAsync<dynamic>(query);
}
The error I get
Server error: \r\n\nActivityId : 2312f64f-b865-49cc-bb26-843d46313199\nExceptionType : RequestRateTooLargeException\nExceptionMessage :\r\n\tMessage: {\"Errors\":[\"Request rate is large\"]}\r\n\tActivityId: 157daf87-3238-4e1c-9a81-41bcd6d7c2e1, Request URI: /apps/413f848b-ce17-40fc-ad7f-14c0e21e9633/services/29abd22a-4e74-48c1-aab3-b311be968829/partitions/9e4cb405-4f74-4d7f-8d12-26e79b910143/replicas/132142016542682221s/, RequestStats: \r\n\tRequestStartTime: 2019-10-24T09:27:38.2395067Z, RequestEndTime: 2019-10-24T09:27:38.2395067Z, Number of regions attempted:1\r\n\tResponseTime: 2019-10-24T09:27:38.2395067Z
It's simple code; I don't understand what I should change in it.
Is there a way to retry the request for the failed query, or to somehow avoid the error altogether?

The exception you receive is Request*Rate*TooLargeException, meaning you are submitting too many requests in a short period of time.
For running bulk operations you should use the vendor-specific tooling.
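For the retry part of the question, a minimal sketch of a retry helper with exponential backoff is shown below. This is only an assumption-laden illustration: the throttling check here is a crude string match on the exception message ("Request rate is large"), and SubmitWithRetryAsync is a hypothetical helper name, not part of Gremlin.Net.
// Hedged sketch: retry a Gremlin query with exponential backoff when Cosmos DB reports throttling.
// Requires System, System.Threading.Tasks and Gremlin.Net.Driver.
private static async Task SubmitWithRetryAsync(GremlinClient client, string query, int maxRetries = 5)
{
    var delay = TimeSpan.FromMilliseconds(500);
    for (var attempt = 0; ; attempt++)
    {
        try
        {
            await client.SubmitAsync<dynamic>(query);
            return;
        }
        catch (Exception ex) when (attempt < maxRetries && ex.Message.Contains("Request rate is large"))
        {
            // Crude throttling check based on the error text; back off, then try again.
            await Task.Delay(delay);
            delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2); // double the wait each time
        }
    }
}
You would then call await SubmitWithRetryAsync(gremlinClient, query); inside the foreach instead of calling SubmitAsync directly. Spreading the inserts out over time, or raising the provisioned RU/s, also reduces how often the limit is hit.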

Related

How to Speed Up Contract API CustomerID Search?

I'm trying to search the existing Customers and return the CustomerID if it exists. This is the code I'm using which works:
var CustomerToFind = new Customer
{
    MainContact = new Contact
    {
        Email = new StringSearch { Value = emailIn }
    }
};
var sw = new Stopwatch();
sw.Start();
//see if any results
var result = (Customer)soapClient.Get(CustomerToFind);
sw.Stop();
Debug.WriteLine(sw.ElapsedMilliseconds);
However, I'm finding it appears extremely slow, to the point of being unusable. For example, on the DEMO dataset, on my i7-6700K @ 4 GHz with 24 GB RAM and an SSD, running SQL Server 2016 Developer Edition locally, a simple email search takes between 3-4 seconds. However, on my production dataset with 10k Customer records, it takes over 60 seconds and times out.
Is this typical using contract-based SOAP? Screen-based SOAP seems much faster and almost instant. If I perform a SQL select on the database tables in SQL Server Management Studio I can also return the result instantly.
Is there a better, quick way to query whether a Customer with email address = "test@test.com" exists and return the CustomerID?
Try using GetList instead of Get. It's better suited for "search for something" scenarios.
When using GetList, depending on which endpoint you're using, there are two more optimizations. In the Default/5.30.001 endpoint there's a second parameter to GetList which you should set to false. In the Default/6.00.001 endpoint there's no second parameter, but there is an additional property on the entity itself, called ReturnBehavior. Either set it to OnlySpecified and then add *Return to the required fields, like this:
var CustomerToFind = new Customer
{
    ReturnBehavior = ReturnBehavior.OnlySpecified,
    CustomerID = new StringReturn(),
    MainContact = new Contact
    {
        Email = new StringSearch { Value = emailIn }
    }
};
or set it to OnlySystem and then use the ID on the returned entity to request the full entity.
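For completeness, a hedged sketch of the GetList call itself, assuming the Default/6.00.001 endpoint and the CustomerToFind entity built as above (GetList returns an array of entities):
// Search by email and read back only the CustomerID (OnlySpecified + StringReturn above).
// Requires System.Linq for OfType/FirstOrDefault.
var matches = soapClient.GetList(CustomerToFind);            // Entity[] with 0..n matches
var customer = matches.OfType<Customer>().FirstOrDefault();  // null when no customer matches
var customerId = customer?.CustomerID?.Value;                // the ID you were after, or null if not found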

Data tracking in DocumentDB

I am trying to keep a history of data (at least one step back) in DocumentDB.
For example, if I have a property called Name in a document with the value "Pieter" and I change it to "Sam", I have to maintain the history that it was "Pieter" previously.
As of now I am thinking of a pre-trigger. Any other solutions?
Cosmos DB (formerly DocumentDB) now offers change tracking via Change Feed. With Change Feed, you can listen for changes on a particular collection, ordered by modification within a partition.
Change feed is accessible via:
Azure Functions
DocumentDB (SQL) SDK
Change Feed Processor Library
For example, here's a snippet from the Change Feed documentation, on reading from the Change Feed, for a given partition (full code example in the doc here):
IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
    collectionUri,
    new ChangeFeedOptions
    {
        PartitionKeyRangeId = pkRange.Id,
        StartFromBeginning = true,
        RequestContinuation = continuation,
        MaxItemCount = -1,
        // Set reading time: only show change feed results modified since StartTime
        StartTime = DateTime.Now - TimeSpan.FromSeconds(30)
    });

while (query.HasMoreResults)
{
    FeedResponse<dynamic> readChangesResponse = query.ExecuteNextAsync<dynamic>().Result;
    foreach (dynamic changedDocument in readChangesResponse)
    {
        Console.WriteLine("document: {0}", changedDocument);
    }
    checkpoints[pkRange.Id] = readChangesResponse.ResponseContinuation;
}
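If you'd rather not manage partition key ranges and continuation tokens yourself, the Azure Functions route listed above does that for you. A minimal sketch, assuming the Microsoft.Azure.WebJobs.Extensions.CosmosDB trigger binding; the database, collection and connection-setting names are placeholders:
using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ChangeFeedFunction
{
    [FunctionName("TrackChanges")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "MyDatabase",                       // placeholder
            collectionName: "MyCollection",                   // placeholder
            ConnectionStringSetting = "CosmosDBConnection",   // app setting name
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)]
        IReadOnlyList<Document> changedDocuments,
        ILogger log)
    {
        // Invoked with a batch of documents each time something is created or updated.
        foreach (var doc in changedDocuments)
        {
            log.LogInformation("Changed document id: {0}", doc.Id);
        }
    }
}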
If you're trying to make an audit log, I'd suggest looking into Event Sourcing. Building your domain from events ensures a correct log. See https://msdn.microsoft.com/en-us/library/dn589792.aspx and http://www.martinfowler.com/eaaDev/EventSourcing.html
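To illustrate the idea with a hypothetical example (these types are not from any specific library): instead of overwriting Name, you append one small event document per change and derive the current value by replaying them, so the old value ("Pieter") is never lost:
// Hypothetical event document: one per change, appended, never overwritten.
// Requires System, System.Collections.Generic and System.Linq.
public class NameChanged
{
    public string DocumentId { get; set; }
    public string OldName { get; set; }      // "Pieter"
    public string NewName { get; set; }      // "Sam"
    public DateTime ChangedAtUtc { get; set; }
}

// Current state is rebuilt by replaying the events in order;
// the full history stays available for auditing.
public static string CurrentName(IEnumerable<NameChanged> events) =>
    events.OrderBy(e => e.ChangedAtUtc).Last().NewName;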

How to measure RU in DocumentDB?

Given that Azure DocumentDB uses Request Units as a measurement of throughput, I would like to make sure my queries use as few RUs as possible to increase my throughput. Is there a tool that will tell me how many RUs a query will take and whether the query is actually using an index?
As you discovered, certain tools will provide RUs upon completion of a query. This is also available programmatically, as the x-ms-request-charge header is returned in the response and is easily retrievable via the DocumentDB SDKs.
For example, here's a snippet showing RU retrieval using JS/node:
var queryIterator = client.queryDocuments(collLink, querySpec);
queryIterator.executeNext(function (err, results, headers) {
    if (err) {
        // deal with error...
    } else {
        // deal with payload...
        var ruConsumed = headers['x-ms-request-charge'];
    }
});
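The same charge is also surfaced in the .NET SDK on the response object; a short sketch, assuming the Microsoft.Azure.DocumentDB client SDK (Microsoft.Azure.Documents.Client plus Microsoft.Azure.Documents.Linq for AsDocumentQuery):
// Execute one page of a query and read the request charge (RUs) for that page.
var query = client.CreateDocumentQuery<dynamic>(collLink, querySpec).AsDocumentQuery();
FeedResponse<dynamic> page = await query.ExecuteNextAsync<dynamic>();
double ruConsumed = page.RequestCharge;   // same value as the x-ms-request-charge header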
As far as your question regarding indexing, and determining if a property is indexed (which should then answer your question about a query using or not using an index): you may read the collection, which returns its indexing policy details in the response.
For example: given some path dbs/<databaseId>/colls/<collectionId>:
var collLink = 'dbs/' + databaseId + '/colls/' + collectionId;
client.readCollection(collLink, function (err, coll) {
    if (err) {
        // deal with error
    } else {
        // compare indexingPolicy with your property, to see if it's included or excluded
        // this just shows you what these properties look like
        console.log("Included: " + JSON.stringify(coll.indexingPolicy.includedPaths));
        console.log("Excluded: " + JSON.stringify(coll.indexingPolicy.excludedPaths));
    }
});
You'll see includedPaths and excludedPaths looking something like this, and you can then search for your given property in any way you see fit:
Included: [{"path":"/*","indexes":[{"kind":"Range","dataType":"Number","precision":-1},{"kind":"Hash","dataType":"String","precision":3}]}]
Excluded: []
I found DocumentDb Studio, which shows the response headers that provide the RUs for every query.
Another option is to use the emulator with the trace collection option turned on.
https://learn.microsoft.com/en-us/azure/cosmos-db/local-emulator
I was trying to profile LINQ aggregate queries, which currently seems to be impossible with the C# SDK.
Using the trace output from the emulator I was able to identify the request charges and a host of other metrics. There is a lot of data to wade through.
I found the request charge stored under this event key
DocDBServer/Transport_Channel_Processortask/Genericoperation
Example output:
ThreadID="141,928" FormattedMessage="EndRequest DocumentServiceId localhost, ResourceType 2, OperationType 15, ResourceId 91M7AL+QPQA=, StatusCode 200, HRESULTHex 0, ResponseLength 61, Duration 70,546, HasQuery 1, PartitionId a4cb495b-38c8-11e6-8106-8cdcd42c33be, ReplicaId 1, ConsistencyLevel 3, RequestSessionToken 0:594, ResponseSessionToken 594, HasContinuation 0, HasPreTrigger 0, HasPostTrigger 0, IsFeedUnfiltered 0, IndexingDirective 5, XDate Fri, 09 Jun 2017 08:49:03 GMT, RetryAfterMilliseconds 0, MaxItemCount -1, ActualItemCount 1, ClientVersion 2017-02-22, UserAgent Microsoft.Azure.Documents.Common/1.13.58.2, RequestLength 131, NetworkBucket 2, SubscriptionId 00000000-0000-0000-0000-000000000000, Region South Central US, IpAddress 192.168.56.0, ChannelProtocol RNTBD, RequestCharge 51.424, etc...
This can then be correlated with data from another event which contains the query info:
DocDBServer/ServiceModuletask/Genericoperation
Note you need perfview to view the ETL log files. See here for more info:
https://github.com/Azure/azure-documentdb-dotnet/blob/master/docs/documentdb-sdk_capture_etl.md

Using Redis to store results from a query from neo4j

I have a query in Neo4j with some aggregation functions which takes around 10 seconds to retrieve the information. What I would like to do is store the query results in Redis and, from time to time, have the Redis data refreshed from Neo4j.
One record will be like:
{ entry: "123", model: "abc", reactants: [{specie: "abc#12", color: "black"}], .... }
I'm using Node.js with Express; thank you in advance for your attention.
UPDATE: my query is quite extensive. I had to add the UNWIND part to be able to search by the reactants (I wanted the products too, but I didn't know how to do it). I don't know if it can be optimized down to around 2 seconds, but here it goes:
MATCH (rx:ModelReaction),
(rx)-[l:left_component]->(lc:MetaboliteSpecie),
(rx)-[r:right_component]->(rc:MetaboliteSpecie)
OPTIONAL MATCH (rx)-[:has_gpr]-(gpr:Root)
OPTIONAL MATCH (rx)-[:has_crossreference_to]-(cr)-[:has_ec_number]-(ec)
WITH rx,r,cr,ec,gpr,
COLLECT(DISTINCT {specie: l.cpdEntry, stoichiometry: l.stoichiometry}) as reacts
UNWIND reacts as rcts
WITH rx,r,cr,ec,gpr, rcts, reacts
WHERE rcts.specie =~ {searchText} OR rx.entry =~ {searchText} OR
rx.name =~ {searchText} OR (ec.entry IS NOT NULL AND
ec.entry =~ {searchText}) OR rx.geneRule =~ {searchText}
RETURN {entry: rx.entry,
reactants: reacts,
products:COLLECT(DISTINCT {specie: r.cpdEntry,
stoichiometry: r.stoichiometry}),
orientation: rx.orientation, name: rx.name, ecnumber: ec.entry,
gpr_rule: rx.geneRule, gpr_normalized: gpr.normalized_rule}
ORDER BY ' + reactionsTableMap[sortCol] + ' ' + order + ' SKIP {offset} LIMIT {number}'
The easiest approach is to store the result from Neo4j as a JSON string in Redis and set an expiry time on that key. When you need to retrieve the data, you check whether the key is there: if it is, Redis serves as a cache, and if not, you ask Neo4j, store the result in Redis and return it to your Node.js program.
Pseudo code since I don't know Node.js specifics regarding Neo4J and Redis:
var result = redis.get("Record:123");
if (result == null) {
    result = neo4j.query("...");
    redis.setex("Record:123", 10, toJson(result)); // set with expiry time (key, seconds, value)
}
return result;
Redis will handle the expiry so you don't have to.
If you want to store them all, you can store them in a LIST or a ZSET (sorted set by record Id for example) and just call redis LRANGE/ZRANGE to retrieve a part of that list/set
Example with list:
var exist = redis.exist("Records"); // check if something is stored in redis
if (!exist) {
    var queryResult = neo4j.query("..."); // get a list of results from neo4j
    queryResult.forEach(result => redis.lpush("Records", toJson(result))); // add the results to the redis list
}
return redis.lrange("Records", 0, 50); // get the first 50 items
Now just iterate over that using the two parameters of redis.lrange, fetching ten items, then the next ten, and so on.
You can also call redis EXPIRE to set an expiry time on the redis list.

Azure Storage Table Does not return whole partition

I ran into a situation in production where
CloudContext.TableData.Where( A => A.PartitionKey == "MYKEY").ToList();
where TableData is
public DataServiceQuery<T> TableData { get { return CreateQuery<T>( _TableName ); } }
does not return the whole partition (I have fewer than 1000 records there).
In my case it returns 367 records while in VS2010 Server Explorer or in Azure Storage Explorer I get 414 records (condition is the same).
Did anyone experience the same problem?
Also, if I change the query and add RowKey to the condition, I get the required record with no problem.
This is expected behavior of the Table service. The official documentation (here) lists the conditions that affect the number of records returned. If you want to retrieve the whole partition you have to inspect the query result for a continuation token and use the provided continuation token to execute the same query over and over again, until all the results come back.
You can use an approach similar to the following:
private IEnumerable<MyEntityType> GetAllEntities()
{
    var result = this._tables.GetSegmentedEntities(100, null); // null is for continuation token
    while (result.Results.Count > 0)
    {
        foreach (var ufs in result.Results)
        {
            yield return new MyEntityType(ufs.RowKey, ufs.WhateverOtherPropertyINeed);
        }
        if (result.ContinuationToken != null)
        {
            result = this._tables.GetSegmentedEntities(100, result.ContinuationToken);
        }
        else
        {
            break;
        }
    }
}
Where GetSegmentedEntities(100, result.ContinuationToken) is defined as:
public TableQuerySegment<MyEntityType> GetSegmentedEntities(int pageSize, TableContinuationToken token)
{
    var partKey = "My_Desired_Partition_key_passed_via_Const_or_method_Param";
    TableQuery<MyEntityType> query = new TableQuery<MyEntityType>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partKey));
    query.TakeCount = pageSize;
    return this.azureTableReference.ExecuteQuerySegmented<MyEntityType>(query, token);
}
You can use and modify this code for your case.
This is a known and documented behavior. The Table service API will either return 1000 entities or as many entities as possible within 5 seconds. If the query takes longer than 5 seconds to execute, it'll return a continuation token.
With the addition of RowKey you are making the query more specific, and hence faster, and as a result you are getting all the entities.
See TimeOuts and Pagination on MSDN for details
If you are getting partial result sets, there are a few possible factors:
i) You have more than 1000 records matching the filter
ii) The query took more than 5 seconds
iii) The query crosses a partition boundary
As you have fewer than 1000 records, the first factor won't be an issue, and as you are retrieving based on PartitionKey equality, the third one also won't cause any problem. You are facing this problem because of the second factor.
To handle this you need to work with the continuation token. You can refer to this link for more info.
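For reference, a compact hedged sketch of that continuation-token loop using the Microsoft.WindowsAzure.Storage table SDK (the same idea as the code in the earlier answer; table here is assumed to be a CloudTable reference):
// Drain every segment of the partition, following the continuation token until it is null.
var query = new TableQuery<MyEntityType>()
    .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "MYKEY"));

var all = new List<MyEntityType>();
TableContinuationToken token = null;
do
{
    // Each segment holds at most 1000 entities, or whatever the service could read in ~5 seconds.
    TableQuerySegment<MyEntityType> segment = table.ExecuteQuerySegmented(query, token);
    all.AddRange(segment.Results);
    token = segment.ContinuationToken;
} while (token != null);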
