Get Timestamp after Insert/Update - Azure

In Azure Table Storage, is there a way to get the new timestamp value after an update or insert? I am writing a 3-phase commit protocol to get Table Storage to support distributed transactions, and it involves multiple writes to the same entity. The operation order goes like this: Read Entity, Write Entity (lock item), Write Entity (commit new values). I would like to get the new timestamp after the lock operation so I don't have to unnecessarily read the entity again before committing the new values. So does anyone know how to efficiently get the new timestamp value after a SaveChanges operation?

I don't think you need to do anything special/extra. When you read your entity you will get an ETag for it. When you save that entity (setting someLock = true), that save will only succeed if nobody else has updated the entity since your read. Hence you know you have the lock, and you can then do your second write as you please.
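For illustration, a minimal sketch of that flow with the Microsoft.WindowsAzure.Storage SDK (table, key and property names here are hypothetical):
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Replace sends If-Match with the entity's ETag, so the lock write fails
// with 412 (Precondition Failed) if anyone updated the entity after our read.
public static bool TryLock(CloudTable table, string partitionKey, string rowKey)
{
    var retrieved = table.Execute(
        TableOperation.Retrieve<DynamicTableEntity>(partitionKey, rowKey));
    var entity = (DynamicTableEntity)retrieved.Result;

    entity.Properties["Locked"] = new EntityProperty(true);
    try
    {
        table.Execute(TableOperation.Replace(entity));
        return true;  // lock acquired; safe to proceed with the commit write
    }
    catch (StorageException e) when (e.RequestInformation.HttpStatusCode == 412)
    {
        return false; // somebody else updated the entity first
    }
}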

I don't believe it is possible. I would suggest using your own timestamp and/or GUID to mark entries.

If you're willing to drop down to the Update Entity REST API call, it does return the time the response was generated. It probably won't be exactly the same as the timestamp on the record, but it should be close.

You may need to hack your Azure Table Storage driver.
In the Azure Python library (TableStorage), for example, the Timestamp is simply skipped over:
# exclude the Timestamp since it is auto added by azure when
# inserting entity. We don't want this to mix with real properties
if name in ['Timestamp']:
    continue

Related

Does Cosmos DB update delete the record even if only a single field changes?

I am trying to understand how Cosmos DB update works. In Cosmos DB, there is an upsert operation that updates or inserts depending on whether the item exists in the container or not. Usually the flow is like this:
record = client.read_item(id, partition_key)
record['one_field'] = 'new_value'
client.upsert(record)
My doubt here is whether such an update operation will delete the original record even if only a single field is changed. If that is the case, then updates become expensive when the record is large. Is my understanding correct here?
Cosmos DB updates a document by replacing it, not by in-place update.
If you query (or read) a document, and then update some properties, you would then replace the document. Or, as you've done, call upsert() (which is similar to a replace, except that it will create a new document if the specified partition+id doesn't exist already).
The notion of "expensive" is not exactly easy to quantify; look at the returned headers to see the RU charge for a given upsert/replace, to determine the overall cost, and whether you'll need to scale your RU/sec setting based on overall usage patterns.
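As an illustration, a sketch with the .NET SDK (Microsoft.Azure.Cosmos; the question's snippet is Python, and all names here are illustrative): the response to each operation reports its RU charge directly.
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public record Item(string id, string pk, string one_field);

public static async Task ShowUpsertChargeAsync(CosmosClient client)
{
    Container container = client.GetContainer("mydb", "mycontainer");
    var item = new Item("item-1", "partition-1", "new_value");

    // The whole document is written, and the response reports the cost.
    ItemResponse<Item> response =
        await container.UpsertItemAsync(item, new PartitionKey(item.pk));
    Console.WriteLine($"Upsert consumed {response.RequestCharge} RU");
}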

How to seed data with CloudKit?

I need to create some records in CloudKit for each user when they start an app.
I can't just write a seed function that creates records, because when the user starts the app on two devices, each device will write its own seed records.
What I want instead is for the first device that writes to CloudKit to create the records, and for the second device to simply update the values of those records rather than recreate them.
How can I achieve this?
You have a few options available to you, but all could potentially lead to race conditions when both devices attempt to write at the same time, though the likelihood of that actually happening is minimal.
No matter which approach is taken, you should always take the stance of query first: check if the record exists, update it if needed, then write the new/updated values.
So, in your example:
The first app would query for the record, and create the record - because no record exists.
The second app to launch would query for the record, find it, then do nothing, because the record exists.
Each record in CloudKit maintains a modificationDate. So if you are really concerned about overwriting data that shouldn't be overwritten, you can add additional queries and date checks to determine whether the write should happen.

How to update fields automatically

In my CouchDB database I'd like all documents to have an 'updated_at' timestamp added when they're changed (and have this enforced).
I can't modify the document with validation functions.
Update functions won't run unless they're called explicitly (so it would be possible to update the document without calling the specific update function).
How should I go about implementing this?
There is no way to do this without triggering _update handlers. It's a nice idea for tracking document modification times, but it runs into problems with replication.
Replication works on top of the public API, which means that:
If such a trigger is enforced, replication breaks, since it becomes impossible to sync data as-is without modifying the documents. Because each replicated document gets modified, it receives a new revision, which can easily lead to an endless loop if you replicate from database A to B and from B to A in continuous mode.
If, on the other hand, replication is exempted from the trigger, there will always be a way to work around the trigger.
I can suggest one workaround: you can create a view which emits the current date as a key (or part of it):
function (doc) {
  emit(new Date(), null);
}
This will assign the current date to every document as soon as view generation is triggered (which happens on the first request to the view) and will assign a new date each time a specific document is updated.
Although the above should solve your issue, I would advise against it for the reasons already explained by Kxepal: if you're on a replicated network, each node will assign its own dates. Taking this into account, the best I can recommend is to solve the issue on the client side and just post the documents with a date already embedded.
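A minimal sketch of that client-side approach over CouchDB's plain HTTP API (the URL and document shape are placeholders):
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static async Task SaveWithTimestampAsync(
    HttpClient http, string docUrl, Dictionary<string, object> doc)
{
    // Stamp the document yourself before every save; every node then
    // agrees on the value because it travels with the document.
    doc["updated_at"] = DateTime.UtcNow.ToString("o"); // ISO 8601

    var body = new StringContent(
        JsonSerializer.Serialize(doc), Encoding.UTF8, "application/json");
    var response = await http.PutAsync(docUrl, body); // e.g. http://localhost:5984/mydb/mydoc
    response.EnsureSuccessStatusCode();
}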

Is it possible to make conditional inserts with Azure Table Storage

Is it possible to make a conditional insert with the Windows Azure Table Storage Service?
Basically, what I'd like to do is to insert a new row/entity into a partition of the Table Storage Service if and only if nothing changed in that partition since I last looked.
In case you are wondering, I have Event Sourcing in mind, but I think that the question is more general than that.
Basically I'd like to read part of, or an entire, partition and make a decision based on the content of the data. In order to ensure that nothing changed in the partition since the data was loaded, an insert should behave like normal optimistic concurrency: the insert should only succeed if nothing changed in the partition - no rows were added, updated or deleted.
Normally in a REST service, I'd expect to use ETags to control concurrency, but as far as I can tell, there's no ETag for a partition.
The best solution I can come up with is to maintain a single row/entity for each partition in the table which contains a timestamp/ETag and then make all inserts part of a batch consisting of the insert as well as a conditional update of this 'timestamp entity'. However, this sounds a little cumbersome and brittle.
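For concreteness, the cumbersome version would look roughly like this (a sketch; entity and property names are illustrative, and both entities must share a PartitionKey, since batches are atomic only within one partition):
using System;
using Microsoft.WindowsAzure.Storage.Table;

public static void InsertIfPartitionUnchanged(
    CloudTable table, ITableEntity newRow, DynamicTableEntity marker)
{
    // Bump the marker so concurrent writers invalidate each other's ETags.
    marker.Properties["Version"] = new EntityProperty(Guid.NewGuid().ToString());

    var batch = new TableBatchOperation();
    batch.Insert(newRow);      // the actual row
    batch.Replace(marker);     // conditional on marker.ETag from the earlier read
    table.ExecuteBatch(batch); // the whole batch fails (412) if the marker changed
}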
Is this possible with the Azure Table Storage Service?
The view from a thousand feet
Might I share a small tale with you...
Once upon a time someone wanted to persist events for an aggregate (of Domain-Driven Design fame) in response to a given command. This person wanted to ensure that an aggregate would only be created once and that any form of optimistic concurrency violation could be detected.
To tackle the first problem - that an aggregate should only be created once - he did an insert into a transactional medium that threw when a duplicate aggregate (or more accurately the primary key thereof) was detected. The thing he inserted was the aggregate identifier as primary key and a unique identifier for a changeset. A collection of events produced by the aggregate while processing the command is what is meant by "changeset" here. If someone or something else beat him to it, he would consider the aggregate already created and leave it at that. The changeset would be stored beforehand in a medium of his choice. The only promise this medium must make is to return what has been stored as-is when asked. Any failure to store the changeset would be considered a failure of the whole operation.
To tackle the second problem - detection of optimistic concurrency in the further life-cycle of the aggregate - he would, after having written yet another changeset, update the aggregate record in the transactional medium if and only if nobody had updated it behind his back (i.e. compared to what he last read just before executing the command). The transactional medium would notify him if such a thing happened. This would cause him to restart the whole operation, rereading the aggregate (or changesets thereof) to make the command succeed this time.
Of course, now that he had solved the writing problems, along came the reading problems. How would one be able to read all the changesets of an aggregate that made up its history? After all, he only had the last committed changeset associated with the aggregate identifier in that transactional medium. And so he decided to embed some metadata as part of each changeset. Among the metadata - which is not so uncommon to have as part of a changeset - would be the identifier of the previous last committed changeset. This way he could "walk the line" of changesets of his aggregate, like a linked list, so to speak.
As an additional perk, he would also store the command message identifier as part of the metadata of a changeset. This way, when reading changesets, he could know in advance if the command he was about to execute on the aggregate was already part of its history.
All's well that ends well ...
P.S.
1. The transactional medium and changeset storage medium can be the same.
2. The changeset identifier MUST NOT be the command identifier.
3. Feel free to punch holes in the tale :-)
4. Although not directly related to Azure Table Storage, I've implemented the above tale successfully using AWS DynamoDB and AWS S3.
How about storing each event at a PartitionKey/RowKey built from AggregateId/AggregateVersion, where AggregateVersion is a sequential number based on how many events the aggregate already has?
This is very deterministic, so when adding a new event to the aggregate, you can be sure you were using the latest version of it, because otherwise you'll get an error saying that the row for that partition already exists. At that point you can drop the current operation and retry, or try to figure out whether you could merge the operation anyway, if the new updates to the aggregate do not conflict with the operation you just did.
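A sketch of that scheme (illustrative types and names): the insert itself is the concurrency check, because inserting an existing PartitionKey/RowKey pair fails with 409 Conflict.
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public class EventEntity : TableEntity
{
    public string Payload { get; set; }
}

public static bool TryAppend(CloudTable table, string aggregateId, long version, string payload)
{
    var entity = new EventEntity
    {
        PartitionKey = aggregateId,
        RowKey = version.ToString("D19"), // zero-padded so rows sort by version
        Payload = payload
    };

    try
    {
        table.Execute(TableOperation.Insert(entity));
        return true;
    }
    catch (StorageException e) when (e.RequestInformation.HttpStatusCode == 409)
    {
        return false; // another writer appended this version first: reload and retry
    }
}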

Updating an object to Azure Table Storage - is there any way to get the new Timestamp?

I'm updating an object in Azure Table Storage using the StorageClient library with
context.UpdateObject(obj);
context.SaveChangesWithRetries();
when I do this, is there any way to get hold of the new timestamp for obj without making another request to the server?
Thanks
Stuart
To supplement Seva Titov's answer: the excerpt quoted was valid at least until May 2013, but as of November 2013 it has changed (emphasis added):
The Timestamp property is a DateTime value that is maintained on the server side to record the time an entity was last modified. The Table service uses the Timestamp property internally to provide optimistic concurrency. The value of Timestamp is a monotonically increasing value, meaning that each time the entity is modified, the value of Timestamp increases for that entity. This property should not be set on insert or update operations (the value will be ignored).
Now the Timestamp property is no longer regarded as opaque, and it is documented that its value increases with each edit. This suggests that Timestamp could now be used to track subsequent updates (at least with regard to a single entity).
Nevertheless, as of November 2013, another request to Table Storage is still needed to obtain the new timestamp when you update an entity (see the documentation of the Update Entity REST method). Only when inserting an entity does the REST service return the entire entity with the timestamp (though I don't remember whether this is exposed by the StorageClient/Windows Azure Storage library).
The MSDN page has some guidance on the usage of the Timestamp field:
Timestamp Property
The Timestamp property is a DateTime value that is maintained on the server side to record the time an entity was last modified. The Table service uses the Timestamp property internally to provide optimistic concurrency. You should treat this property as opaque: it should not be read, nor set on insert or update operations (the value will be ignored).
This implies that it is really an implementation detail of Table Storage, and you should not rely on the Timestamp field to represent the time of the last update.
If you want a field which is guaranteed to represent the time of the last write, create a new field and set it on every update operation. I understand this is more work (and more storage space) to maintain the field, but that would automatically resolve your question of how to get the timestamp back, because you would already know it when calling context.UpdateObject().
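A sketch with the old StorageClient API the question uses (the entity type and property name are hypothetical):
using System;
using Microsoft.WindowsAzure.StorageClient;

public class MyEntity : TableServiceEntity
{
    public DateTime LastModifiedUtc { get; set; }
}

public static void UpdateWithStamp(TableServiceContext context, MyEntity entity)
{
    // You set the value yourself, so you already know it after the save
    // without a second round-trip to the server.
    entity.LastModifiedUtc = DateTime.UtcNow;
    context.UpdateObject(entity);
    context.SaveChangesWithRetries();
}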
The Timestamp property is actually a Lamport timestamp. It is guaranteed to always grow over time, and while it is presented as a DateTime value, it's really not.
On the server side, that is, in Windows Azure Storage, each change does this:
nextTimestamp = Math.Max(currentTimestamp + 1, DateTime.UtcNow)
This is all there is to it. And it's of course guaranteed to happen in a transactional manner. The point of all this is to provide a logical clock (monotonic function) that can be used to ensure that the order of events happen in the intended order.
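In C# terms, a sketch of that logical clock (the real implementation is of course internal to the service, and Math.Max doesn't apply to DateTime directly):
using System;

// Strictly increasing even if the wall clock stalls or jumps backwards,
// yet tracking real time whenever it can.
public static DateTime NextTimestamp(DateTime current)
{
    DateTime candidate = current.AddTicks(1);
    DateTime now = DateTime.UtcNow;
    return candidate > now ? candidate : now;
}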
Here's a link to a version of the actual WAS paper, and while it doesn't contain any information on the timestamp scheme specifically, it has enough in it that you quickly realize there's only one logical conclusion you can draw from it. Anything else would be stupid. Also, if you have any experience with LevelDB, Cassandra, memtables and their ilk, you'll see that the WAS team went the same route.
Though I should clarify: since WAS provides a strong consistency model, the only way to maintain the timestamp is to do it under lock and key, so there's no way you can guess the correct next timestamp; you have to query WAS for the information. There's no way around that. You can, however, hold on to an old value and presume that it didn't change. WAS will tell you if it did, and then you can resolve the race condition any way you see fit.
I am using Windows Azure Storage 7.0.0, and you can check the result of the operation to get the ETag and Timestamp properties:
var tableResult = cloudTable.Execute(TableOperation.Replace(entity));
var updatedEntity = tableResult.Result as ITableEntity;
var eTag = updatedEntity.ETag;
var timestamp = updatedEntity.Timestamp;
I don't think so; as far as I know, Timestamp and ETag are set by Azure Storage itself.
