In LMDB, does a transaction read the value it just wrote, or the old value?

A transaction provides a consistent view of the data. Now suppose there is an RW transaction which has acquired a handle and is currently operating. If it changes a key and later reads it back within the same transaction, but before committing or aborting, will it read the old value or the value it just wrote?

It will read the value it just wrote. Within a single write transaction, LMDB reads see that transaction's own uncommitted changes (read-your-own-writes).
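For illustration, a minimal sketch with the py-lmdb bindings (the throwaway tempfile directory is just to keep the demo self-contained):

import lmdb
import tempfile

# Open an environment in a fresh temporary directory (demo only).
env = lmdb.open(tempfile.mkdtemp())

# First transaction: write and commit an initial value.
with env.begin(write=True) as txn:
    txn.put(b"key", b"old")

# Second transaction: overwrite, then read back before committing.
with env.begin(write=True) as txn:
    txn.put(b"key", b"new")
    # Still inside the uncommitted write transaction: the read returns
    # the value written above, not the previously committed b"old".
    assert txn.get(b"key") == b"new"
# The with-block commits here; other readers now see b"new".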

Related

Optimistic concurrency control clarification

I am new to ES7 and trying to understand optimistic concurrency control.
I think I understand that when I get-request a document and send its _seq_no and _primary_term values in a later write-request to the same document, if the values differ, the write will be completely ignored.
But what happens to the document in the default case where I don't send the _seq_no and _primary_term values? Will the write go through even if it has older _seq_no and _primary_term values (therefore making the index inconsistent), or only be processed if the values are newer?
If the former, will the document eventually be consistent?
I'm trying to figure out if I need to send these values to get eventual consistency or if I get it for free without sending those values.
It's a great distributed-systems question. Let me break the problem into sub-parts for readability, and first explain what _seq_no and _primary_term are, since there isn't much explanation of them on the ES site.
_seq_no is an incremental counter assigned to an ES document for each operation (update, delete, index). For example, the first time you index a doc it will have value 1, the next update 2, the next delete 3, and so on. A read operation doesn't change it.
_primary_term is also an incremental counter, but it changes only when a replica shard is promoted to primary due to a network or other failure. If everything is healthy in your cluster it will not change, but if a failure causes another replica to be promoted to primary, it is incremented.
Coming to the first question,
Q: What happens to the document in the default case where I don't send the _seq_no and _primary_term values?
Ans: You can run into the lost-update problem. Suppose you have a counter you are updating, and two requests simultaneously read its value as 1 and each try to increment it by 1. When you don't specify these terms explicitly, ES computes them itself.
Both requests reach ES, and ES (the primary shard) processes them one by one, increasing the sequence number each time, so at the end your counter holds 2 instead of 3. To make sure this doesn't happen, you pass these term values explicitly; when ES applies an update and sees a sequence number that doesn't match, it rejects the request.
To prevent such lost updates, it is always recommended to send explicit version values, as sketched below.
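For illustration, a minimal sketch of the explicit-terms pattern with the official elasticsearch Python client (v8-style document= argument; on a 7.x client the document would be passed as body=). The index name "counters", id "1", and the value field are hypothetical:

from elasticsearch import Elasticsearch, ConflictError

es = Elasticsearch("http://localhost:9200")

doc = es.get(index="counters", id="1")
new_value = doc["_source"]["value"] + 1

try:
    # The write is applied only if the document has not changed since we
    # read it; otherwise ES rejects it with a 409 conflict.
    es.index(
        index="counters",
        id="1",
        document={"value": new_value},
        if_seq_no=doc["_seq_no"],
        if_primary_term=doc["_primary_term"],
    )
except ConflictError:
    # Another writer got there first: re-read and retry the increment.
    pass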
Q: I'm trying to figure out if I need to send these values to get eventual consistency or if I get it for free without sending those values.
Ans: These values relate to concurrency control and have nothing to do with eventual consistency. In ES, writes always go to the primary shard, but reads can be served by any replica (which may contain stale data); that is what makes ES eventually consistent.
Important read
https://www.elastic.co/blog/elasticsearch-sequence-ids-6-0

CouchDB watch changes feed in clustered mode returning random changes for the same since value

According to the internet, you make a request to /_changes?since=0&limit=1, do what you want with the change, then take the last_seq value, pass it as since, and request again.
My problem is, this skips changes. You can keep requesting /_changes?since=0&limit=1 and get a different change over and over, only occasionally actually getting the first change to the database. Sometimes you get the 7th change, or the 4th, etc. If you then repeat using the last_seq value, it skips ahead further; as far as I can tell, it never goes back and gets the changes it skipped.
Is there a proper way to periodically poll a CouchDB changes feed, without resorting to the continuous (socket) method, when using clusters?
What we have right now is a PHP script that runs on a cron task and requests the last 1000 changes, then works through them and syncs up SQL databases to match what was in CouchDB. With CouchDB skipping changes, this is a big problem.
The CouchDB 2.x documentation states:
"The results returned by _changes are partially ordered. In other words, the order is not guaranteed to be preserved for multiple calls."
So when you call /_changes?since=0&limit=1 you can obtain a different result each time, because the order is not guaranteed.
The _changes response contains a pending attribute with the number of elements left out of the response. If you take the last_seq value from the last request and use it as the since attribute in the next request, you'll get the next batch of changes, and the pending value decreases consistently.
Also, you should be careful with the next documentation note:
If the specified replicas of the shards in any given since value are unavailable, alternative replicas are selected, and the last known checkpoint between them is used. If this happens, you might see changes again that you have previously seen. Therefore, an application making use of the _changes feed should be ‘idempotent’, that is, able to receive the same data multiple times, safely.
Reading changes in batches is a recommendation of the CouchDB Replication Protocol, used by CouchDB-compatible clients such as Cloudant Sync, so the approach you described should be correct.
Please don't use the numeric value of the change seq as a reference to infer that there are missed changes, as this number is computed from cluster state, which may vary between calls. You can check this answer for more detail.
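For illustration, a minimal polling sketch in Python using the requests library. The database URL, batch size, and the sync_row helper are hypothetical; the key points are treating last_seq as an opaque token and processing each change idempotently:

import requests

BASE = "http://localhost:5984/mydb"  # hypothetical database URL

def drain_changes(since="0", batch=1000):
    """Fetch batches until 'pending' reaches 0; return the next 'since' token."""
    while True:
        resp = requests.get(f"{BASE}/_changes",
                            params={"since": since, "limit": batch})
        resp.raise_for_status()
        body = resp.json()
        for change in body["results"]:
            # Must be idempotent: the same change may be delivered twice.
            sync_row(change["id"], change["changes"])
        since = body["last_seq"]  # opaque token; never compare numerically
        if body.get("pending", 0) == 0:
            return since

def sync_row(doc_id, revs):
    ...  # hypothetical: upsert the document into the SQL mirror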

Does CQL3 "IF" make my update not idempotent?

It seems to me that using IF would make the statement possibly fail if re-tried. Therefore, the statement is not idempotent. For instance, given the CQL below, if it fails because of a timeout or system problem and I retry it, then it may not work because another person may have updated the version between retries.
UPDATE users
SET name = 'foo', version = 4
WHERE userid = 1
IF version = 3
Best practices for updates in Cassandra are to make updates idempotent, yet the IF operator is in direct opposition to this. Am I missing something?
If your application is idempotent, then generally you wouldn't need to use the expensive IF clause, since all your clients would be trying to set the same value.
For example, suppose your clients were aggregating some values and writing the result to a roll up table. Each client would calculate the same total and write the same value, so it wouldn't matter if multiple clients wrote to it, or what order they wrote to it, since it would be the same value.
If what you are actually looking for is mutual exclusion, such as keeping a bank balance, then the IF clause could be used. You might read a row to get the current balance, then subtract some money and update the balance only if the balance hadn't changed since you read it. If another client was trying to add a deposit at the same time, then it would fail and would have to try again.
But another way to do that without mutual exclusion is to write each withdrawal and deposit as a separate clustered transaction row, and then calculate the balance as an idempotent result of applying all the transaction rows.
You can use the IF clause for idempotent writes, but it seems pointless. The first client to do the write would succeed and Cassandra would return the value "applied=True". And the next client to try the same write would get back "applied=False, version=4", indicating that the row had already been updated to version 4 so nothing was changed.
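For illustration, a minimal sketch of inspecting that applied flag with the DataStax Python driver (the contact point and keyspace "app" are hypothetical):

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("app")

result = session.execute(
    "UPDATE users SET name = %s, version = %s "
    "WHERE userid = %s IF version = %s",
    ("foo", 4, 1, 3),
)
# For conditional (LWT) statements the driver reports whether the
# condition held; a concurrent update to the row makes this False.
if result.was_applied:
    print("update applied")
else:
    print("conflict: row is no longer at version 3")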
This question is more about linearizability (ordering) than idempotency, I think. This query uses Paxos to determine the state of the system before applying a change. If the state of the system is identical, then the query can be retried many times without a change in the results. This provides a weak form of ordering (and is expensive), unlike most Cassandra writes. Generally you should only use CAS operations if you are attempting to record the state of a system (rather than a history or log).
Do not use many of these queries if you can help it; the guidelines suggest having only a small percentage of your queries rely on this behavior.

Updating and retrieving keys in Redis

I am using Redis key-value pairs for storing data. The data against a particular key can change at any point in time, so after every retrieval request I asynchronously update the data stored against the requested key, so that the next request can be served with updated data.
I have done quite a bit of testing but still I am wondering if there could be any case where this approach might have some negative consequences?
PS: The data is consolidated from multiple servers.
Thanks in advance for any help/suggestions.
If you already know the value to be stored, you can use GETSET (or a transaction if it is not a simple string type).
If the new value is some manipulation of the current value, i.e. f(value), you should do it in a Lua script.
Otherwise some other client might read the old value before you update it.
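For illustration, a minimal sketch of both cases with redis-py (the key name and the append transform stand in for your real f(value)):

import redis

r = redis.Redis()

# Case 1: the new value is known up front. GETSET atomically returns
# the old value while installing the new one.
old = r.getset("data", b"base")

# Case 2: new value = f(old value). Run f inside a Lua script so the
# read and the write execute as one atomic step on the server.
update = r.register_script("""
local v = redis.call('GET', KEYS[1]) or ''
local new = v .. ARGV[1]  -- f(value): hypothetical transform
redis.call('SET', KEYS[1], new)
return new
""")
update(keys=["data"], args=["-suffix"])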

Get Timestamp after Insert/Update

In Azure Table Storage, is there a way to get the new Timestamp value after an update or insert? I am writing a 3-phase commit protocol to get Table Storage to support distributed transactions, and it involves multiple writes to the same entity. The operation order goes like this: Read Entity, Write Entity (lock item), Write Entity (commit new values). I would like to get the new Timestamp after the lock-item operation so I don't have to unnecessarily read the item again before the commit-new-values operation. So does anyone know how to efficiently get the new Timestamp value after a SaveChanges operation?
I don't think you need to do anything special or extra. When you read your entity you get an ETag for it. When you save that entity (setting someLock=true), the save will only succeed if nobody else has updated the entity since your read, hence you know you have the lock. Then you can do your second write as you please.
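For illustration, a minimal sketch of that ETag dance with the modern azure-data-tables Python client; the question predates this SDK, so treat the exact names as an assumption, and the table and keys are hypothetical:

from azure.core import MatchConditions
from azure.data.tables import TableClient, UpdateMode

client = TableClient.from_connection_string(
    "<connection-string>", table_name="locks")

entity = client.get_entity(partition_key="p1", row_key="r1")
entity["locked"] = True

# The update succeeds only if the stored ETag still matches the one we
# read, i.e. nobody else wrote the entity in between.
metadata = client.update_entity(
    entity,
    mode=UpdateMode.MERGE,
    etag=entity.metadata["etag"],
    match_condition=MatchConditions.IfNotModified,
)
# The service returns the new ETag, so the follow-up conditional write
# can proceed without re-reading the entity.
new_etag = metadata["etag"]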
I don't believe it is possible. I would use your own timestamp and/or guid to mark entries.
If you're willing to go back to the Update REST API call, it does return the time that the response was generated. It probably won't be exactly the same as the time stamp on the record, but it will be close I'm sure.
You may need to hack your Azure Table Storage drivers. In the Azure Python lib (TableStorage), for example, the Timestamp is simply skipped over:
# exclude the Timestamp since it is auto added by azure when
# inserting entity. We don't want this to mix with real properties
if name in ['Timestamp']:
    continue
