S3 - Eventual consistency and multiple clients - GET

If I PUT an object in S3 (US East) and then repeatedly request the object's metadata until I can GET it, then at that point am I guaranteed that all other clients can now GET the object? OR is it possible that another client's request is somehow routed to a different server that has not yet registered the PUT? I'm trying to understand the consequence of eventual consistency specifically in the case where one client has been able to GET.

S3 does not provide any specific guarantee as to when all replicas converge and all GETs are guaranteed to return the latest data; if a network partition occurs between S3 replicas, replication can be delayed until the partition is repaired. In practice, however, consistency is usually achieved within seconds. So a successful GET by one client does not guarantee that every other client can now GET the object: another client's request may be routed to a replica that has not yet converged.
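To make that concrete, here is a minimal sketch with the AWS SDK for Java v2 (bucket and key names are placeholders): one client PUTs an object and polls until it becomes visible, but under the eventual-consistency model that only tells you what this client's replica has seen.

    import software.amazon.awssdk.core.sync.RequestBody;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
    import software.amazon.awssdk.services.s3.model.PutObjectRequest;
    import software.amazon.awssdk.services.s3.model.S3Exception;

    public class S3VisibilityProbe {
        public static void main(String[] args) throws InterruptedException {
            String bucket = "my-bucket"; // placeholder
            String key = "my-object";    // placeholder

            try (S3Client s3 = S3Client.builder().region(Region.US_EAST_1).build()) {
                s3.putObject(PutObjectRequest.builder().bucket(bucket).key(key).build(),
                        RequestBody.fromString("hello"));

                // Poll HEAD until *this* client can see the object.
                while (true) {
                    try {
                        s3.headObject(HeadObjectRequest.builder().bucket(bucket).key(key).build());
                        break;
                    } catch (S3Exception e) {
                        if (e.statusCode() != 404) throw e; // a real error, not "not visible yet"
                        Thread.sleep(200);
                    }
                }
                // At this point only THIS client has observed the object. Under the
                // eventual-consistency model, another client's GET may still be served
                // by a replica that has not converged and return 404.
            }
        }
    }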

Related

Azure Cosmos Session consistency Vs Bounded staleness consistency for microservices with multiple instances in same region

Suppose I am developing a social media app. We have multiple instances of a backend service deployed in the same region. Now, whenever the app page is reloaded, one of the instances of the backend service receives the request and queries Cosmos. The problem is as follows:
T1 - App page loaded. Instance 1 of service issued read request to cosmos.
T2 - Comment added by user A. Instance 2 of service issued a write in cosmos.
T3 - App page again loaded. Instance 3 of service issued read request to cosmos.
If we use session consistency, the session tokens at times T2 and T3 will be different, since the write is issued by one instance and the read by another. Because of this, it may happen that at time T3, when the app page is loaded, the comment added by user A is not shown: with mismatched session tokens in the same region, session consistency dilutes to consistent prefix.
To solve this, we could use bounded staleness, but I think that may be overkill.
How can we handle these multi-instance service scenarios with session consistency?
The only way to do this with session tokens is to implement a distributed mutex. While this will work, it may adversely impact performance because the token resource may be locked when there is high concurrency of requests.
All writes in Cosmos DB are majority quorum (3 of 4 replicas). For accounts with just a single region, bounded staleness ensures no stale reads by doing a 2-replica read and comparing the LSN from each replica. If the LSNs match, the data is the latest and is returned to the client. If the LSNs do not match, the read is served from the replica with the higher LSN, because that one has the latest data.
The trade-off between a distributed mutex and bounded staleness is this: building a distributed mutex takes time and effort and can adversely impact latency, whereas bounded staleness requires zero effort but every read costs 2x that of session consistency, because it is a 2-replica read rather than the single-replica read used with session consistency.
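Whichever coordination scheme you choose, the session-token plumbing underneath looks the same: capture the token returned by the write and attach it to later reads. A hedged sketch with the Azure Cosmos Java SDK v4, where the endpoint, key, database, container, and item fields are all placeholders:

    import com.azure.cosmos.ConsistencyLevel;
    import com.azure.cosmos.CosmosClient;
    import com.azure.cosmos.CosmosClientBuilder;
    import com.azure.cosmos.CosmosContainer;
    import com.azure.cosmos.models.CosmosItemRequestOptions;
    import com.azure.cosmos.models.CosmosItemResponse;
    import com.azure.cosmos.models.PartitionKey;

    public class SessionTokenRelay {
        // Hypothetical item shape, partitioned by postId.
        public static class Comment {
            public String id;
            public String postId;
            public String text;
        }

        public static void main(String[] args) {
            CosmosClient client = new CosmosClientBuilder()
                    .endpoint("https://<account>.documents.azure.com:443/") // placeholder
                    .key("<key>")                                           // placeholder
                    .consistencyLevel(ConsistencyLevel.SESSION)
                    .buildClient();
            CosmosContainer comments = client.getDatabase("app").getContainer("comments");

            // "Instance 2": write the comment and capture the session token.
            Comment c = new Comment();
            c.id = "c1"; c.postId = "p1"; c.text = "hello";
            CosmosItemResponse<Comment> writeResp = comments.createItem(c);
            String token = writeResp.getSessionToken();
            // ...hand `token` back to the app client, e.g. in a cookie or header...

            // "Instance 3": attach the relayed token so this read observes the write.
            CosmosItemRequestOptions opts = new CosmosItemRequestOptions();
            opts.setSessionToken(token);
            Comment read = comments.readItem("c1", new PartitionKey("p1"), opts, Comment.class)
                    .getItem();
        }
    }

The hard part, as noted above, is not attaching the token but coordinating it across instances; that is what the distributed mutex (or a client-side relay via cookie/header) is for.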

How is the coordinator node in Cassandra determined by a client driver?

I don't understand the load balancing algorithm in Cassandra.
It seems that the TokenAwarePolicy can be used to route my request to the coordinator node holding the data. In particular, the documentation (https://docs.datastax.com/en/developer/java-driver/3.6/manual/load_balancing/) states that this works when the driver is able to automatically compute a routing key. If it can, I am routed to a node holding the data; if not, I am routed to some other node. I can still specify the routing key myself if I really want to reach the data without any extra hop.
What does not make sense to me:
If the driver cannot calculate the routing-key automatically, then why can the coordinator do this? Does it have more information than the client driver? Or does the coordinator node then ask every other node in the cluster on my behalf? This would then not scale, right?
I thought that the gossip protocol is used to share the topology of the ring among all nodes (AND the client driver). The client driver then has the complete ring structure and should know as much as any 'hop' node.
Load balancing makes sense to me when the client driver determines the N replicas holding the data and then prioritizes them (host distance, etc.), but it doesn't make sense when I reach a random node that is unlikely to have my data.
Token-aware load balancing happens only for statements that carry routing information. For example, for prepared queries, the driver receives information from the cluster about the fields in the query, including the partition key(s), so it is able to calculate the token for the data and select the node. You can also specify the routing key yourself, and the driver will send the request to the corresponding node.
It's all explained in the documentation:
For simple statements, routing information can never be computed automatically
For built statements, the keyspace is available if it was provided while building the query; the routing key is available only if the statement was built using the table metadata, and all components of the partition key appear in the query
For bound statements, the keyspace is always available; the routing key is only available if all components of the partition key are bound as variables
For batch statements, the routing information of each child statement is inspected; the first non-null keyspace is used as the keyspace of the batch, and the first non-null routing key as its routing key
When a statement doesn't have routing information, the request is sent to a node selected by the nested load balancing policy. That node acts as coordinator: it parses the statement, extracts the necessary information, calculates the token, and forwards the request to the correct node.
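As a concrete illustration with the Java driver 3.x that the linked documentation describes (keyspace and table names are made up):

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class TokenAwareExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    // Route straight to a replica when the routing key is known;
                    // otherwise fall back to the child (DC-aware round-robin) policy.
                    .withLoadBalancingPolicy(new TokenAwarePolicy(
                            DCAwareRoundRobinPolicy.builder().build()))
                    .build();
            Session session = cluster.connect("ks"); // hypothetical keyspace

            // Prepared statement: the driver learns from cluster metadata that
            // "id" is the partition key, so binding it lets the driver compute
            // the token and pick a replica directly -- no extra hop.
            PreparedStatement ps = session.prepare("SELECT * FROM users WHERE id = ?");
            BoundStatement bound = ps.bind(42);
            session.execute(bound);

            // A plain string statement carries no routing information: it goes
            // to whichever node the child policy selects, and that coordinator
            // forwards it to the right replica.
            session.execute("SELECT * FROM users WHERE id = 42");

            cluster.close();
        }
    }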

Cassandra timeout during write query but entry present in database

We are using Cassandra 3.0 on our system. For insertions into the database, we are using the DataStax C# driver.
We have a question regarding timeouts and retries during insertion. We hit an instance where a timeout was thrown during an insert, yet the entry is present in the database. All our settings are the defaults, both in the cassandra.yaml file and in the driver.
How can we know the actual status of the insert when there is a timeout? If a timeout was thrown, how could the insert possibly have gone through? Whether the insert itself succeeded or some default retry policy was applied and retried it, we currently have no tangible answer, and we need to know exactly what happened.
How do we make sure that the insertion actually succeeded or failed, with or without a timeout?
A write timeout is not necessarily a failure to write; rather, it is a notification that not enough replicas acknowledged the write within a time period. The write will still eventually happen on all replicas.
If you do observe a write timeout, it indicates that not enough replicas responded for the configured consistency level within the configured write_request_timeout_in_ms value in cassandra.yaml, the default being 2 seconds. Keep in mind, however, that the write will still happen.
The coordinating Cassandra node responsible for that write sends write mutations to all replicas and responds to the client as soon as enough have replied or the timeout is reached. Because of this, if you get a WriteTimeoutException you should assume the write happened. If any of the replicas are down, the coordinator maintains a hint for that write, which will be delivered to the replica when it becomes available again.
Cassandra also employs read repair, and operators should run recurring repairs to help keep data consistent.
If your operations are idempotent, you can simply retry the write until it succeeds. Or you can attempt to read the data back to make sure the write was processed. However, depending on your application requirements, you may not need to employ these strategies and you can safely assume the write did or will happen.
Note, on the other hand, that unavailable errors (i.e. "Not enough replicas available at consistency level X") indicate that not enough replicas were available to perform the write, in which case the write is never attempted.
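As a rough sketch of the retry strategy for idempotent writes (shown here with the DataStax Java driver; the C# driver exposes analogous exception types), distinguishing the two error cases above:

    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.Statement;
    import com.datastax.driver.core.exceptions.UnavailableException;
    import com.datastax.driver.core.exceptions.WriteTimeoutException;

    public class IdempotentInsert {
        // Retries an idempotent insert. A WriteTimeoutException means "not
        // enough replicas acknowledged in time", not "the write failed", so
        // re-issuing the same insert is safe for idempotent statements.
        static void insertWithRetry(Session session, Statement insert, int maxAttempts) {
            for (int attempt = 1; ; attempt++) {
                try {
                    session.execute(insert);
                    return; // enough replicas acknowledged
                } catch (WriteTimeoutException e) {
                    // The write may still be applied eventually; if we run out
                    // of attempts, the caller can read the row back to verify.
                    if (attempt >= maxAttempts) throw e;
                } catch (UnavailableException e) {
                    throw e; // the write was never attempted; retrying now won't help
                }
            }
        }
    }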

Is it possible to have external trigger in Cassandra?

I need a worker to subscribe to new data entries in a column family.
Currently I either have to invoke the consuming services from the producer side, or poll the column family for new data, which wastes resources and also adds latency.
I want some external service to be invoked when new data is written to the column family. Is it possible to invoke an external service, such as a REST endpoint, upon new data arrival?
There are two features, triggers and CDC (change data capture), that may work. You can create a trigger that receives updates and executes the HTTP request, or you can use CDC to get a per-replica copy of the mutations as a log to walk through.
CDC is better for consistency: since a trigger fires before the mutation is applied, your API endpoint may be notified and then the mutation may fail to apply, leaving you in an inconsistent state. But triggers are easier, since you don't need to worry about deduplication: a trigger fires once per query, whereas CDC produces one copy per replica. Or you can use both: triggers that update a cached state, plus CDC with a map-reduce job to fix any inconsistencies.
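For the trigger half, a rough sketch against Cassandra 3.x's ITrigger interface (the endpoint URL is made up; a real deployment also needs the jar dropped into the triggers directory and a CREATE TRIGGER statement run against the table):

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Collection;
    import java.util.Collections;

    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.db.partitions.Partition;
    import org.apache.cassandra.triggers.ITrigger;

    // Pings an external endpoint on each write to the table the trigger is
    // attached to. Note: this runs on the write path, *before* the mutation is
    // applied, so the notified service may learn of a write that later fails.
    public class HttpNotifyTrigger implements ITrigger {
        @Override
        public Collection<Mutation> augment(Partition update) {
            try {
                // Placeholder endpoint; in practice, hand off to a queue or a
                // background thread to keep the write path fast.
                HttpURLConnection conn = (HttpURLConnection)
                        new URL("http://localhost:8080/notify").openConnection();
                conn.setRequestMethod("POST");
                conn.getResponseCode(); // forces the request to be sent
                conn.disconnect();
            } catch (Exception e) {
                // Swallow errors: a notification failure must not break the write.
            }
            return Collections.emptyList(); // no additional mutations to apply
        }
    }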

Cassandra DB - Node is down and a request is made to fetch data in that Node

If we configured our replication factor such that there are no replica nodes (data is stored in one place/node only) and the node containing the requested data is down, how will the request be handled by Cassandra?
Will it return no data, or will the other nodes gossip and somehow pick up the data from the failed node's storage and send the required response? If data is picked up, will the data transfer between nodes happen as soon as the node goes down (via the gossip protocol), or only after a request is made?
I have researched for a long time how gossip works and how Cassandra provides high availability, but I was wondering about the availability of data in the "no replicas" case, since I do not want to spend additional storage on occasional failures, yet at the same time I need availability and no data loss (even if delayed).
I assume that when you say there are "no replica nodes" you mean you have set the replication factor to 1. In this case, if the request is a read, it will fail; if the request is a write, it will be stored as a hint, up to the maximum hint window, and replayed once the node comes back. If the node is down for longer than the hint window, that write will be lost. See: Hinted Handoff: repair during write path
In general, having only a single replica of your data in a C* cluster goes against some of the basic design of how C* is meant to be used and is an anti-pattern. Data duplication is a normal and expected part of using C* and is what enables its high availability. RF=1 introduces a single point of failure: the server containing the data can go down for any of a variety of reasons (including routine maintenance), and requests will then fail.
If you are truly looking for a solution that provides high availability and no data loss, then you need to increase your replication factor (the standard I usually see is RF=3) and set up your cluster's hardware in such a manner as to reduce or remove potential single points of failure.
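For example, raising the replication factor is a one-line CQL change, shown here via the Java driver (keyspace and datacenter names are placeholders), but note that existing data is not redistributed automatically; you must run a repair afterwards:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class RaiseReplicationFactor {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Placeholder keyspace and datacenter names.
                session.execute("ALTER KEYSPACE my_ks WITH replication = "
                        + "{'class': 'NetworkTopologyStrategy', 'dc1': 3}");
                // Existing data is NOT streamed to the new replicas automatically:
                // run `nodetool repair -full my_ks` on each node afterwards.
            }
        }
    }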
