How do we handle concurrent writes to Azure Search documents?

Use Case and Question
Our use case involves:
1. making one network call to fetch a document from search into memory,
2. changing that document in memory, and
3. making a second network call to update that document in search.
How can we guarantee that step (3) fails if the document has changed since step (1)?
Using an ETag in the request header seems ideal. We found docs on doing that for non-document resources (e.g., index fields), but we found scant documentation on doing that with document resources.
Our Research
The .NET v11 SearchClient provides .NET methods for the update and the merge operations, but does not explicitly mention concurrency in the XML comments of those methods.
In the source code for the v11 SearchClient, both the upload and merge operations call into IndexDocumentsInternal. That method has no explicit ETag usage (and it ends up calling into the REST API).
The REST API headers documentation states that ETags are supported for indexers, indexes, and data sources, but not for documents.
The REST API response documentation includes a 409; however, the docs do not give much information on when this response would be returned. The 409 means that a version conflict was detected when attempting to index a document; this can happen when you're trying to index the same document more than once concurrently.
This related SO item does not answer our question: Does Azure Search Provides Etags for managing concurrency for Add, Update or Delete Documents?
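For reference, here is a minimal sketch of the v11 read-modify-write path described above, assuming a hypothetical Hotel POCO and an index named "hotels"; note that neither GetDocumentAsync nor MergeDocumentsAsync exposes an ETag/If-Match parameter:

using System;
using Azure;
using Azure.Search.Documents;

var client = new SearchClient(
    new Uri("https://<service>.search.windows.net"),
    "hotels",
    new AzureKeyCredential("<api-key>"));

// Step 1: fetch the document from search into memory.
Response<Hotel> read = await client.GetDocumentAsync<Hotel>("1");
Hotel doc = read.Value;

// Step 2: change the document in memory.
doc.Rating = 5;

// Step 3: write it back. No overload accepts a precondition, so a
// concurrent writer's changes can be silently overwritten here.
await client.MergeDocumentsAsync(new[] { doc });

public class Hotel
{
    public string HotelId { get; set; }
    public int Rating { get; set; }
}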

Related

Performance Tips: To update a single document in cosmos db what to use server side javascript or .net api?

I have to update one of the properties of a document in Cosmos DB. This is a frequent call, and a concurrent one as well, per the business logic. For better performance, should I use server-side JavaScript to update the document, which will also handle the race condition, or should I use the .NET Cosmos DB API? Please suggest.
I also want the race condition handled in a better way, as the same record can be updated from multiple services. Also, what should I do if the race condition fails? Should I put in a retry mechanism or simply return the error to the caller?
Should I use server side javascript to update the document which will also handle race condition or should I use .Net Cosmos Db API?
I don't think it would matter as both of them will eventually call the REST API to replace the document (https://learn.microsoft.com/en-us/rest/api/cosmos-db/replace-a-document).
I also want the race condition handled in a better way as the same record can be updated from multiple services?
Cosmos DB provides etag-based optimistic concurrency handling. In your scenario, you would include an If-Match request header with the document's _etag property value. If the value matches the etag value of the document on the server, the update will succeed; otherwise it will fail.
Also what should I do if the race condition fails? Should I put retry mechanism or simply return the error to the caller?
I don't think retrying with the same parameters would help if you're implementing optimistic concurrency. In case an update fails because of this, you should fetch the latest document from the server, update it, and then try to save it again.
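For illustration, a minimal sketch of that fetch-update-retry loop with the v3 .NET SDK (Microsoft.Azure.Cosmos); the MyDoc type, its Status property, and the container/id/partition-key values are hypothetical:

using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class DocumentUpdater
{
    public static async Task UpdateWithRetryAsync(Container container, string id, string pk)
    {
        while (true)
        {
            // Fetch the latest copy; the response carries the current _etag.
            ItemResponse<MyDoc> read =
                await container.ReadItemAsync<MyDoc>(id, new PartitionKey(pk));

            MyDoc doc = read.Resource;
            doc.Status = "processed"; // the business-logic property update

            try
            {
                // If-Match: the replace fails with 412 (PreconditionFailed)
                // if the document changed on the server since our read.
                await container.ReplaceItemAsync(
                    doc, id, new PartitionKey(pk),
                    new ItemRequestOptions { IfMatchEtag = read.ETag });
                return;
            }
            catch (CosmosException ex)
                when (ex.StatusCode == HttpStatusCode.PreconditionFailed)
            {
                // Another writer won the race: loop to re-fetch and re-apply.
            }
        }
    }
}

public class MyDoc
{
    public string id { get; set; }
    public string Status { get; set; }
}

In practice you would bound the loop (a maximum attempt count, perhaps with backoff) before surfacing the error to the caller.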

Syncing Problems with Xamarin Forms and Azure Easy Tables

I've been working on a Xamarin.Forms application in Visual Studio using Azure for the backend for a while now, and I've come across a really strange issue.
Please note, that I am following the methods mentioned in this blog
For some strange reason the PullAsync() method seems to have some bizarre problems. Any data that I create and sync will only be pulled by PullAsync() from that solution. What I mean by that is that if I create another solution that accesses the exact same backend, it can create/sync its own data, but it will not bring over the data generated by the other solution, even though both seem to have the exact same access. This appears to be some kind of security feature/issue, but I can't quite make sense of it.
Has anyone else encountered this at all? Was there a work-around at all? This could potentially cause problems down the road if I were to ever want to create another solution that accesses the same system/data for whatever reason.
For some strange reason the PullAsync() method seems to have some bizarre problems. Any data that I create and sync will only be pulled by PullAsync() from that solution.
According to your provided tutorial, I found that the related PullAsync is using Incremental Sync.
await coffeeTable.PullAsync("allCoffees", coffeeTable.CreateQuery());
Incremental Sync:
the first parameter to the pull operation is a query name that is used only on the client. If you use a non-null query name, the Azure Mobile SDK performs an incremental sync. Each time a pull operation returns a set of results, the latest updatedAt timestamp from that result set is stored in the SDK local system tables. Subsequent pull operations retrieve only records after that timestamp.
Here is my test, you could refer to it for a better understanding of Incremental Sync:
Client: await todoTable.PullAsync("todoItems-02", todoTable.CreateQuery());
The client SDK checks whether the __config table of your SQLite local store has a record whose id equals deltaToken|{table-name}|{query-id}.
If there is no such record, the SDK sends a request like the following to pull your records:
https://{your-mobileapp-name}.azurewebsites.net/tables/TodoItem?$filter=(updatedAt%20ge%20datetimeoffset'1970-01-01T00%3A00%3A00.0000000%2B00%3A00')&$orderby=updatedAt&$skip=0&$top=50&__includeDeleted=true
Note: the $filter would be set as (updatedAt ge datetimeoffset'1970-01-01T00:00:00.0000000+00:00')
If there is such a record, the SDK picks up its value as the latest updatedAt timestamp and sends a request like the following:
https://{your-mobileapp-name}.azurewebsites.net/tables/TodoItem?$filter=(updatedAt%20ge%20datetimeoffset'2017-06-26T02%3A44%3A25.3940000%2B00%3A00')&$orderby=updatedAt&$skip=0&$top=50&__includeDeleted=true
Per my understanding, if you issue the same logical query with the same (non-null) query ID from different mobile clients, you need to make sure the local database is newly created by each client. Also, if you want to opt out of incremental sync, pass null as the query ID, as shown below. In this case, all records are retrieved on every call to PullAsync, which is potentially inefficient. For more details, you could refer to How offline synchronization works.
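For example, a one-line sketch of opting out, reusing the coffeeTable instance from the blog's sample:

// A null query ID disables incremental sync: every pull retrieves all
// records instead of only those newer than the stored updatedAt timestamp.
await coffeeTable.PullAsync(null, coffeeTable.CreateQuery());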
Additionally, you could leverage Fiddler to capture the network traces when you invoke PullAsync, in order to troubleshoot your issue.

Hard-Coding Categories or Fetching from API

What is the recommended method of getting CategoryIds? I understand Foursquare provides this list: https://developer.foursquare.com/categorytree. My question is: should I just use this list and hard-code the values, or fetch the IDs on first opening of the app and cache the results?
From the venues/categories API documentation:
When designing client applications, please download this list only once per session, but also avoid caching this data for longer than a week to avoid stale information.
So fetch on app launch and cache for the current session to ensure the hierarchy is always up to date.
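A minimal sketch of once-per-session caching in C# (the endpoint shape follows the v2 venues/categories API; CLIENT_ID, CLIENT_SECRET, and the v date are placeholders):

using System;
using System.Net.Http;
using System.Threading.Tasks;

static class CategoryCache
{
    static readonly HttpClient Http = new HttpClient();

    // Lazy<Task<T>> makes the download happen at most once per app session;
    // later callers await the same cached task.
    static readonly Lazy<Task<string>> Categories = new Lazy<Task<string>>(() =>
        Http.GetStringAsync(
            "https://api.foursquare.com/v2/venues/categories" +
            "?client_id=CLIENT_ID&client_secret=CLIENT_SECRET&v=20170101"));

    public static Task<string> GetCategoriesJsonAsync() => Categories.Value;
}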

Issue with CouchDB

In the TAMA implementation, I came across an issue with CouchDB (version 1.2.0).
We are using named documents to maintain unique-constraint logic in the application. (Named documents: those whose _id is user-defined, not Couch-generated.)
We are using the REST API to add the documents to CouchDB, where we found strange behavior:
When we try to recreate documents via HTTP PUT that were deleted in the past (because of a bug in the code), the documents are not created the first time.
The first HTTP PUT returns HTTP 200, but the doc is not saved in CouchDB.
Trying the same request again, the HTTP PUT returns HTTP 200 and adds the doc to the database.
So the HTTP PUT request needs to be sent twice to create and save the doc.
I have checked that the above bug is reproducible for deleted docs, i.e., the response for GET _id is {"error":"not_found","reason":"deleted"}.
This looks like a bug in CouchDB to me. Could you please let us know if you can think of any scenario where the above error might occur, and any possible workarounds/solutions?
CouchDB has a built-in mechanism to ensure that you do not overwrite the same document as someone else.
If you PUT any existing document, you have to accompany the request with the current doc._rev value, so that CouchDB can confirm the document you are updating is based on the most recent version in the database.
I've not come across this case with deletions, but it makes sense to me that CouchDB should not allow you to overwrite a deleted document, as the assumption should be that you just don't know about the deletion.
Have you tried whether you can access the revision of the deleted document, and if so, whether adding it to the new document lets the PUT succeed on the first call?
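A hedged sketch of that suggestion against CouchDB's REST API, via HttpClient in C# (the database and document names are placeholders, and the JSON parsing is omitted):

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class RecreateDeletedDoc
{
    static async Task Main()
    {
        var http = new HttpClient { BaseAddress = new Uri("http://localhost:5984/") };
        http.DefaultRequestHeaders.Accept.Add(
            new MediaTypeWithQualityHeaderValue("application/json"));

        // A plain GET returns {"error":"not_found","reason":"deleted"}, but
        // open_revs=all also returns deleted leaf revisions.
        string leaves = await http.GetStringAsync("db/mydoc?open_revs=all");
        string deletedRev = "2-xxxx"; // placeholder: parse "_rev" out of leaves

        // Recreate the document on top of the deleted revision; the PUT
        // should then succeed on the first attempt.
        HttpResponseMessage put = await http.PutAsync(
            "db/mydoc?rev=" + deletedRev,
            new StringContent("{\"name\":\"recreated\"}", Encoding.UTF8, "application/json"));
        Console.WriteLine(put.StatusCode);
    }
}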

RestKit and Core Data - How to POST data?

I am using RestKit .22.0 with Core Data integration, both of which I'm pretty unfamiliar with. I followed the RKGist tutorial and was able to learn how to get objects from a REST endpoint, set up object mappings, add routes, and see the data from the web service correctly insert into the Core Data sqlite database.
Now I'm starting to work on persisting objects to the web service, but can't find any information on how best to do this. It seems like there are multiple ways to skin a cat with RestKit, so I wanted to see what the best practices are for POST/PUTting data.
1. When POSTing a new object, do you usually save the object in the managed object context first, then call [[RKObjectManager sharedManager] postObject:path:parameters:success:failure:]? Or is there some RestKit method that performs both of these operations at once?
2. If you first save the object in Core Data and then POST it to the web service, is RestKit able to update the already-inserted object with the service's database identification attributes? Does the [[RKObjectManager sharedManager] postObject:path:parameters:success:failure:] method do this for you?
3. If there was an error POSTing the object, what is the typical way you'd retry the POST? Would you look for some sort of flag in the Core Data managed object and retry in a separate thread?
Thanks!
1. Yes; the response from the POST then updates that same object (perhaps filling in the server-specified unique id).
2. Yes, updating the POSTed object is the default behaviour (you need to specify the response mapping, and the response must be a single object).
3. No separate thread, generally, and it depends on what caused the error. Have a flag that indicates whether it's uploaded, and retry when the network connection is reestablished.
