When a read-from-follower enabled query ends up reading from a tablet leader for some parts of the query (because some tablet isn't caught up to desired recency), does the database still read from selected read point (e.g., now() - <selected_stateless>) and doesn’t end up reading the latest value from the leader (because that can cause an inconsistent cut on multi-tablet reads, where some parts are served from local tablets, and some from leader zone tablets).
Yes.
For read-from-follower, there are two settings that set the functionality: yb_read_from_followers to enable reading from followers, and yb_follower_read_staleness_ms to set the consistency timestamp back in time, to allow a follower to have obtained and apply a change.
If a follower is not able to provide the data consistent with the timestamp minus the yb_follower_read_staleness_ms time, the read is redirected to the leader to provide the data.
The re-routing is performed by the tablet_rpc/RpcRetrier. The Request sent to the yb-tserver remains the same. Specifically, it uses the same read_hybrid_time. And it also uses the CONSISTENT_PREFIX setting even though it may end up going to the leader.
Related
According to the internet. You make a request to /_changes?since=0&limit=1 do what you want with the change, then use the last_seq value and pass to since and request again.
My problem is, this skips changes. You can keep requesting /_changes?since=0&limit=1 and get a different change over and over. Only occasionally actually getting the first change to the database. Sometimes you get the 7th change, or the 4th, etc. If you then repeat but using the last_seq value, it skips ahead further, far as I can tell, it never goes back and gets the changes it skipped.
Is there a proper way to periodically watch a couchdb changes feed without using the sockets method instead when using clusters?
What we have right now is a php script that runs on a cron task and requests the last 1000 changes, then it works through them and syncs up SQL databases to match what was in couchdb. With couchdb skipping changes, this is a big problem.
CouchDB 2.x doc states that (see):
"The results returned by _changes are partially ordered. In other words, the order is not guaranteed to be preserved for multiple calls."
So, when you call /_changes?since=0&limit=1 you obtain a different result as the order is not guaranteed.
The _changes response contains a pending attribute with the number of elements that are out of the response. If you take the last_seq value from the last request and use that value as the since attribute in the next request you'll get the next bunch of changes and the pending value is decreased consistently.
Also, you should be careful with the next documentation note:
If the specified replicas of the shards in any given since value are unavailable, alternative replicas are selected, and the last known checkpoint between them is used. If this happens, you might see changes again that you have previously seen. Therefore, an application making use of the _changes feed should be ‘idempotent’, that is, able to receive the same data multiple times, safely.
Read changes in batches is a recommendation of the CouchDB Replication Protocol (see) used by CouchDB compatible clients as Cloudant Sync, so the approach you described should be correct.
Please, don't use the numeric value of the change seq as a reference to infer that there are missed changes as this number is computed from cluster state which may vary between calls. You can check this answer for more detail.
I'm using Cassandra Java driver with a fetch size set to 1k. I need to query all records in a table and perform some time consuming action for a every row.
What will happen if I'll keep the ResultSet open (not fully iterated) for a one day?
What I don't care about:
consistency. If some new record will be written in the meantime, I'm ok to fetch it. However, I'm fine if I won't get it
fault tolerance. If during that process some node will fail, I'm fine if the query will fail too. However, I would like to detect that from the client perspective.
What I care about:
Cassandra resource utilization - I don't want to cause cluster outage due to some blocked resources
lateness - I don't want to block (or slow down much) cluster for other consumers of that table
I would like to get all records which existed when I started the query (assuming no deletions). However, they don't have to be up to date
The paging state is the information about the last read data (literally serialized partition key, clustering, and remaining). When sent to coordinator it will look for everything greater than that. So there are no resources in the server spent for this and no performance impact vs a normal read.
Cassandra does not have any features to allow isolation even within a single query. If data has changed from when the first query was made and the second, you will get the up to date information.
I am thinking about using Kafka connect to stream updates from Cassandra to a Kafka topic. The existing connector from StreamReactor seems to use a timestamp or uuidtimestamp to extract new changes since the last poll. The value of the timestamp is inserted using now() in the insert statement. The connector then saves the maximum time is received last time.
Since Cassandra is eventually consistent I am wondering what actually happens when doing repeated queries using a time range to get new changes. Is there not risk to miss rows inserted into Cassandra because it "arrived late" to the node queried when using WHERE create >= maxTimeFoundSoFar?
Yes it might happen that you have newer data in front of your "cursor" when you already went on with processing if you are using consistency level one for reading and writing, but even if you use higher consistency you might run into "problems" depending on the setup that you have. Basically there are a lot of things that can go wrong.
You can increase the chances of not doing this by using an old cassandra formula NUM_NODES_RESPONDING_TO_READ + NUM_NODES_RESPONDING_TO_WRITE > REPLICATION_FACTOR but since you are using now() from cassandra the node clocks might have millisecond offsets between them so you might even miss data if you have high frequency data. I know of some systems where people are actually using raspberry pi's with gps modules to keep the clock skew really tight :)
You would have to provide more about your use case but in reality yes you can totally skip some inserts if you are not "careful" but even then there is no 100% guarantee other then you process the data with some offset that would be enough for the new data to come in and settle.
Basically you would have to keep some moving time window in the past and then move it along plus making sure that you don't take into account anything newer than the let's say last minute. That way you are making sure the data is "settling".
I had some use cases where we processed sensory data that would came in with multiple days of delay. On some projects we simply ignored it on some the data was for reporting on the month level so we always processed the old data and added it to reporting database. i.e. we kept a time window 3 days back in history.
It just depends on your use case.
I'm using libpq C Library for testing PG + BDR replica set. I'd like to get acknowledgement of the CRUD operations' replication. My purpose is to make my own log of the replication time in milliseconds or if possible in microseconds.
The program:
Starts 10-20 threads witch separate connections, each thread makes 1000-5000 cycles of basic CRUD operations on three tables.
Which would be the best way?
Parsing some high verbosity logs if they have proper data with time stamp or in my C api I should start N thread (N = {number of nodes} - {the master I'm connected to}) after every CRUD op. and query the nodes for the data.
You can't get replay confirmation of individual xacts easily. The system keeps track of the log sequence number replayed by peer nodes but not what transaction IDs those correspond to, since it doesn't care.
What you seem to want is near-synchronous or semi-synchronous replication. There's some work coming there for 9.6 that will hopefully benefit BDR in time, but that's well in the future.
In the mean time you can see the log sequence number as restart_lsn in pg_replication_slots. This is not the position the replica has replayed to, but it's the oldest point it might have to restart replay at after a crash.
You can see the other LSN fields like replay_location only when a replica is connected in pg_stat_replication. Unfortunately in 9.4 there's no easy way to see which slot in pg_replication_slots is associated with which active connection in pg_stat_replication (fixed in 9.5, but BDR is based on 9.4 still). So you have to use the application_name set by BDR if you want to pick out individual nodes, and it's ... "interesting" to parse. Also often truncated.
You can get the current LSN of the server you committed an xact on after committing it by calling SELECT pg_current_xlog_location(); which will return a value like 0/19E0F060 or whatever. You can then look that value up in the pg_stat_replication of peer nodes until you see that the replay_location for the node you committed on has reached or passed the LSN you captured immediately after commit.
It's not perfect. There could be other work done between when you commit and when you capture the server's current LSN. There's no way around that, but at worst you wait slightly too long. If you're using BDR you shouldn't be caring about micro or even milliseconds anyway, since it's an asynchronous replication solution.
The principles are pretty similar to measuring replication lag for normal physical standby servers, so I suggest reading some docs on that. Except that pg_last_xact_replay_timestamp() won't work for logical replication, so you can't get lag using that, you have to use the LSNs and do your own timing client-side.
I'm not a mongodb expert, so I'm a little unsure about server setup now.
I have a single instance running mongo3.0.2 with wiredtiger, accepting both read and write ops. It collects logs from client, so write load is decent. Once a day I want to process this logs and calculate some metrics using aggregation framework, data set to process is something like all logs from last month and all calculation takes about 5-6 hours.
I'm thinking about splitting write and read to avoid locks on my collections (server continues to write logs while i'm reading, newly written logs may match my queries, but i can skip them, because i don't need 100% accuracy).
In other words, i want to make a setup with a secondary for read, where replication is not performing continuously, but starts in a configured time or better is triggered before all read operations are started.
I'm making all my processing from node.js so one option i see here is to export data created in some period like [yesterday, today] and import it to read instance by myself and make calculations after import is done. I was looking on replica set and master/slave replication as possible setups but i didn't get how to config it to achieve the described scenario.
So maybe i wrong and miss something here? Are there any other options to achieve this?
Your idea of using a replica-set is flawed for several reasons.
First, a replica-set always replicates the whole mongod instance. You can't enable it for individual collections, and certainly not only for specific documents of a collection.
Second, deactivating replication and enabling it before you start your report generation is not a good idea either. When you enable replication, the new slave will not be immediately up-to-date. It will take a while until it has processed the changes since its last contact with the master. There is no way to tell how long this will take (you can check how far a secondary is behind the primary using rs.status() and comparing the secondaries optimeDate with its lastHeartbeat date).
But when you want to perform data-mining on a subset of your documents selected by timespan, there is another solution.
Transfer the documents you want to analyze to a new collection. You can do this with an aggregation pipeline consisting only of a $match which matches the documents from the last month followed by an $out. The out-operator specifies that the results of the aggregation are not sent to the application/shell, but instead written to a new collection (which is automatically emptied before this happens). You can then perform your reporting on the new collection without locking the actual one. It also has the advantage that you are now operating on a much smaller collection, so queries will be faster, especially those which can't use indexes. Also, your data won't change between your aggregations, so your reports won't have any inconsistencies between them due to data changing between them.
When you are certain that you will need a second server for report generation, you can still use replication and perform the aggregation on the secondary. However, I would really recommend you to build a proper replica-set (consisting of primary, secondary and an arbiter) and leave replication active at all times. Not only will that make sure that your data isn't outdated when you generate your reports, it also gives you the important benefit of automatic failover should your primary go down for some reason.