I have a CouchDB cluster: 3 nodes, all shards on all nodes, and W = 2. Our code creates a document in CouchDB and reads it back from a view. However, the view intermittently returns no corresponding data. The data is there when we check CouchDB directly. So my question is: why is the third node taking so long to write a value, and how long should I expect the write latency to be?
Thanks in advance.
If you query a view without the stale parameter, the view is supposed to always return fresh data. The view first updates itself against the database, and then returns the results for your query.
A view can get results from any node. If you query a view, and don't get expected fresh data, it means that the updates are not yet available on the node used.
If you write a document with W = 2, then at least two of the three nodes must successfully update the document. If all nodes are up, internal synchronization between nodes should bring the update to all nodes within milliseconds or seconds. So the latency should be no more than several seconds.
How long was the latency that you experienced? Was your view finally able to produce the expected results after this latency?
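To answer the "how long was the latency" question with a number, you could poll the view after the write and record how long the row takes to show up. This is only a rough sketch: `query_view` is a hypothetical callable that you would implement yourself (for example a GET against your view with `key` set) and that returns the list of rows; the timeout and interval values are arbitrary.

```python
import time
from typing import Callable, Optional


def wait_for_view_row(
    query_view: Callable[[str], list],
    key: str,
    timeout_s: float = 5.0,
    interval_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the view returns a row for `key`.

    Returns the elapsed seconds when the row appears, or None on timeout.
    `query_view` is a hypothetical helper: it should issue the view request
    and return the rows as a list.
    """
    start = time.monotonic()
    while True:
        rows = query_view(key)
        elapsed = time.monotonic() - start
        if rows:
            return elapsed
        if elapsed >= timeout_s:
            return None
        sleep(interval_s)
```

If this regularly returns more than a few seconds, or None, the problem is probably not ordinary replication lag between the nodes.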
Consider a growing volume of data, and let's choose between two extreme options:
1. Evenly distribute all data across all nodes in the cluster
2. Pack the data onto as few nodes as possible
I prefer option 1 because as the volume of data grows, we can scatter it with all nodes, so that when each node is queried, it has the lowest load.
However, some resources state that we shouldn't query all the nodes because that will slow down the query. Why would that slow the query? Isn't that just a normal scatter and gather? They even claim this hurts linear scalability as adding more nodes will further drag down the query.
(Maybe I am missing how Cassandra performs the query; a background reference would be appreciated.)
On the contrary, some resources state that we should go with option 2 because it queries the least number of nodes.
Of course there are no black-and-white choices here; everything has a tradeoff.
I want to know what the real difference between option 1 and option 2 is. Plus, regarding the network querying, why would option 1 be slow?
I prefer option 1 because as the volume of data grows, we can scatter it with all nodes, so that when each node is queried, it has the lowest load.
You definitely want to go with option #1. This is also preferable, in that new or replacement nodes will stream much faster than a cluster made of fewer, dense nodes.
However, some resources state that we shouldn't query all the nodes because that will slow down the query.
And those resources are absolutely correct. First of all, if you read through the resources which Alex posted above you'll discover how to build your tables so that your queries can be served by a single node. Running queries which only hit a single node is the best way around that problem.
Why would that slow the query?
Because in a distributed database environment, query time becomes network time. There are many people out there who like to run multi-key or unbound queries against Cassandra. When that happens, and the query is unable to find a single node with the data, Cassandra picks one node to designate as a "coordinator."
That node builds the result set with data from the other nodes. That means that in a 30-node cluster, that one node is now pulling data from the other 29. Assuming that these requests don't time out, the likelihood that the coordinator will crash from trying to manage too much data is very high.
The bottom line is that this is one of those tradeoffs between a CA relational database and an AP partitioned row store. Build your tables to support your queries, store data together which is queried together, and Cassandra will perform just fine.
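To make the single-node point concrete, here is a toy model of how a partition key maps to an owning node. It is a deliberate simplification and not how Cassandra is implemented: real clusters use the Murmur3 partitioner, token ranges and replicas, whereas this sketch uses MD5 and a modulo over a hypothetical list of 30 node names.

```python
import hashlib
from typing import Iterable, List, Set

# Hypothetical 30-node cluster, matching the example in the answer.
NODES: List[str] = [f"node{i}" for i in range(30)]


def owner(partition_key: str, nodes: List[str] = NODES) -> str:
    """Map a partition key to one node (toy stand-in for token ownership)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return nodes[token % len(nodes)]


def nodes_touched(partition_keys: Iterable[str], nodes: List[str] = NODES) -> Set[str]:
    """Return the set of nodes a query over these partition keys must contact."""
    return {owner(key, nodes) for key in partition_keys}


# A query restricted to one partition key is served by one node.
single = nodes_touched(["user:42"])
# A multi-key or unbound query fans out across the cluster.
many = nodes_touched(f"user:{i}" for i in range(1000))
print(len(single), len(many))
```

The data is still spread evenly over all nodes (option 1); the difference is that each individual query only needs to go to the node that owns its partition.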
I'm using the Cassandra Java driver with a fetch size of 1k. I need to query all records in a table and perform a time-consuming action for every row.
What will happen if I keep the ResultSet open (not fully iterated) for a day?
What I don't care about:
consistency. If some new record is written in the meantime, I'm OK with fetching it. However, I'm also fine if I don't get it
fault tolerance. If some node fails during the process, I'm fine if the query fails too. However, I would like to detect that from the client side.
What I care about:
Cassandra resource utilization - I don't want to cause a cluster outage due to some blocked resources
latency - I don't want to block (or significantly slow down) the cluster for other consumers of that table
I would like to get all records which existed when I started the query (assuming no deletions). However, they don't have to be up to date
The paging state is the information about the last read data (literally the serialized partition key, clustering, and remaining). When it is sent to the coordinator, the coordinator looks for everything greater than that. So no resources are held on the server for this, and there is no performance impact versus a normal read.
Cassandra does not have any features to allow isolation even within a single query. If data has changed between the first query and the second, you will get the up-to-date information.
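A small model may help show why an idle ResultSet costs the server nothing. This is an illustration only: the "table" is a sorted Python list of integer keys and the paging state is just the last key returned, whereas Cassandra orders partitions by token and serializes more than that into the state.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def fetch_page(
    sorted_keys: List[int],
    paging_state: Optional[int],
    fetch_size: int,
) -> Tuple[List[int], Optional[int]]:
    """Return the next page and the new paging state (None when exhausted).

    The server keeps nothing between calls: every call is a fresh read for
    "everything greater than the paging state".
    """
    start = 0 if paging_state is None else bisect_right(sorted_keys, paging_state)
    page = sorted_keys[start:start + fetch_size]
    more = start + fetch_size < len(sorted_keys)
    return page, (page[-1] if more else None)


table = [1, 2, 3, 4, 5]
page1, state = fetch_page(table, None, 2)   # first page
table.append(6)                             # a write arrives while the client is busy
page2, state = fetch_page(table, state, 2)  # resumes after the last key seen
page3, state = fetch_page(table, state, 2)  # includes the row written in between
print(page1, page2, page3)
```

Because each page is an independent read, rows written ahead of the cursor while you are processing can show up in later pages, which matches the lack of isolation described above.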
I am trying to request a large number of documents from my database (which has over 400k documents). I started using _all_docs built-in view. I first tried with this query:
http://database:port/databasename/_all_docs?limit=100&include_docs=true
No problem. Completes as expected. Now to ramp it up:
http://database:port/databasename/_all_docs?limit=1000&include_docs=true
Still fine. Took longer, more data, etc. as expected. Ramp it up again:
http://database:port/databasename/_all_docs?limit=10000&include_docs=true
The request never completes. The dev tools in Chrome show Size = 5.3MB (which seems to be significant), and this occurs for any value of the limit parameter over roughly 6500. Whether I specify 6500 or 10,000, it always reports 5.3MB downloaded, and the request stalls.
I have also tried other combinations, such as "skip" and it seems that limit + skip must be < 6500 or I get the same stall.
My environment: Couchdb 1.6.1, Ubuntu 14.04.3 LTS, Azure A1 standard
You have to pre-warm your queries. Just throwing 100K or more docs at CouchDB and expecting to get them all back out in one request won't work.
When you ask for some items from a view (in your case the default view), at the first read CouchDB will notice that the B-tree for the view doesn't exist yet, so it goes ahead and builds it on that first read. Depending on how many documents you have in your database, that can take a while and puts a heavy workload on your database.
On every subsequent read, CouchDB will check whether documents have changed since the last read, and throw the changed documents at the map and reduce functions. So if you only query some views from time to time, but have lots of changes in between, expect some delays on the next read.
There are 2 ways to handle this situation:
1. Pre-warm your view - run a cronjob that does reads to make sure that your view has the B-Tree for this View.
2. Prepare your view in advance for a particular query before inserting the data in the couchdb.
For now, if you really want to read all your docs, don't read them all at once; rather, use skip and limit range queries.
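As a rough sketch of reading `_all_docs` in chunks: the loop below asks for one row more than the page size and uses that extra row's id as the `startkey` of the next request. Note that this swaps in `startkey` for a growing `skip`, which is my substitution rather than what the answer literally suggests. `fetch` is a hypothetical callable you would write yourself; it takes the query parameters, performs the GET against `/databasename/_all_docs`, and returns the `rows` list from the response.

```python
import json
from typing import Callable, Dict, Iterator, List


def iter_all_docs(
    fetch: Callable[[Dict[str, str]], List[dict]],
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every _all_docs row, requesting page_size rows at a time.

    `fetch` is a hypothetical helper: given query parameters, it should
    return the "rows" list of the _all_docs response.
    """
    params = {"limit": str(page_size + 1), "include_docs": "true"}
    while True:
        rows = fetch(params)
        for row in rows[:page_size]:
            yield row
        if len(rows) <= page_size:
            return  # no extra row, so this was the last page
        # The extra row is the first row of the next page; keys are JSON-encoded.
        params = dict(params, startkey=json.dumps(rows[page_size]["id"]))
```

With a page size of 1000, which already worked in the question, each request stays well below the point where the single large request stalled.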
I understand that CouchDB hashes the source of each design document and uses the hash as the name of the index file. Whenever I change the source code, the index needs to be rebuilt. CouchDB does this when a view in the design document is requested for the first time.
What I'd expect to happen and want to happen
Each time I change a design doc, the first call to a view will take significantly longer than usual and may time out. The index will continue to build. Once this is completed, the view will only process changes and will be very fast.
What actually happens
1. When running an amended view for the first time, I see the process in the status window slowly reach 100%. This takes about 2 hours. During this time all CPUs are fully utilized.
2. Once the process reaches 99% it remains there for about an hour and then disappears. CPU utilization drops to just one CPU.
3. When the process has disappeared, the data file for the view keeps growing for about half an hour to an hour. CPU utilization is near 0%.
4. The index file suddenly stops increasing in size.
If I request the view again when I've reached state 4), the characteristics of 3) start again. I have to repeat this process between 5 and 50 times until I can finally retrieve the view values.
If the view gets requested a second time whilst still in stage 1 or 2, it will almost certainly run out of memory and I have to restart the CouchDB service. This is despite my DB rarely using more than 2 GByte when running just one job and having more than 4 GByte free in usual operation.
I have tried to tweak configuration settings, add more memory, but nothing seems to have an impact.
My Question
Do I misunderstand the concept of running views or is something wrong with my setup?
If this is expected, is there anything I can tweak to reduce the number of reruns?
Context
My documents are pretty large (1 to 20 MByte). The data they contain is well structured, they are usually web-analytics reports and would in a relational database be stored as several 10k rows of data.
My map function extracts these rows. It returns the dimensions as a key array. The key array sometimes exceeds 20 columns. Most views will have fewer than 10 columns.
The reduce function will aggregate (sum) all values in rows with identical keys. The metrics are stored in a dictionary and may contain different keys. The reduce function identifies missing keys in one document and adds these to the aggregate as 0.
I am using CouchDB 1.5.0 on Windows Server 2008 R2 with 2CPUs and 8 GByte memory.
The views are written in javascript using the couchjs query server.
My design documents usually consist of several views, with a '_lib' view that does not emit any data, but contains an exhaustive library of functions accessed by the actual views.
It is a known issue, but just in case: if you have gigabytes of docs, you can forget about custom reduce functions. Only the built-in ones will work fast enough.
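One way the reduce described in the question might be reshaped so that a built-in reduce could take over is to have the map side emit a fixed-order list of numbers, with 0 for missing metrics, so that the reduce only has to add lists element by element. The sketch below shows that idea in Python rather than in the view's JavaScript. The metric names are hypothetical, and whether the built-in `_sum` accepts arrays of numbers in your CouchDB version is an assumption you would need to check.

```python
from typing import Dict, List, Sequence

# Hypothetical fixed list of metric names shared by all rows.
METRICS: List[str] = ["visits", "pageviews", "bounces"]


def to_vector(metrics: Dict[str, float], names: Sequence[str] = METRICS) -> List[float]:
    """Map side: turn a metrics dictionary into a fixed-order list, 0 for missing keys."""
    return [metrics.get(name, 0) for name in names]


def sum_vectors(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Reduce side: element-wise sum of equal-length lists."""
    return [sum(column) for column in zip(*vectors)]


rows = [{"visits": 3, "pageviews": 10}, {"visits": 1, "bounces": 1}]
total = sum_vectors([to_vector(row) for row in rows])
print(dict(zip(METRICS, total)))
```

The missing-key handling moves into the map step, which runs once per document, instead of being repeated in every reduce and rereduce call.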
It is possible to set os_process_timeout to an extra-low value (1 second, for example). This way you can detect which doc takes a long time to be indexed and optimize your map function for performance.
It may be too much turkey over the holidays, but I've been thinking about a potential problem that we could have with Couchbase.
Currently we paginate based on time, but I'm thinking a similar issue could occur with other values used for paging, for example the atomic counter. I'll try to explain as best I can; this would only occur in a load-balanced environment.
For example, say we have 4 load-balanced servers storing data to our Couchbase cluster. We currently sort our records based on timestamps. If any of the 4 servers writing the data starts to lag behind the others, then our pagination could miss records when retrieving client side. A SQL DB auto-increment or timestamp, for example, can be created when the record is stored to the DB, which avoids similar issues. Using a NoSQL DB like Couchbase, you define the data you need to retrieve on before it is stored to the DB. So what I am getting at is that if there is a delay in storing to the DB and you are retrieving in a pagination fashion while this delay occurs, you run the real possibility of missing data. Since we are paging, that data may never be viewed.
Interested in what other thoughts people have on this.
EDIT**
Response to Andrew:
Example: a Facebook or Pinterest type app is storing data to a DB, and they have many load-balanced frontend servers writing to the DB. If for some reason writing is delayed, it's a non-issue with a SQL DB because a timestamp or auto-increment value is assigned when the data is actually stored to the DB. There will be no missing data when paging. Asking for 1-7 will give you only data that is stored in the DB; 7-* will contain anything that is delayed, because an auto-increment value has not been created for that record because it is not actually stored yet.
In Couchbase it's different: you actually get your auto-increment value (atomic counter) first and then save the record. So for example, say a record is going to be stored as atomic counter number 4. For some reason this is delayed in storing to the DB. Other servers are grabbing 5, 6, 7 and storing that data just fine. The client now asks for all data between 1 and 7, and 4 is still not stored. Then the next paging request is 7 to *. 4 will never be viewed.
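The scenario can be reproduced in a few lines. This is a plain in-memory simulation, not Couchbase code: a dictionary stands in for the bucket, `itertools.count` stands in for the atomic counter, and the page boundaries are the ones from the example above.

```python
from itertools import count
from typing import Dict, List


def page(stored: Dict[int, str], low: int, high: int) -> List[int]:
    """Return the counter values currently stored in [low, high]."""
    return sorted(k for k in stored if low <= k <= high)


counter = count(1)                 # stand-in for the atomic counter
stored: Dict[int, str] = {}        # stand-in for the bucket

reserved = [next(counter) for _ in range(7)]   # writers reserve 1..7 up front
for n in reserved:
    if n != 4:                                 # the writer holding 4 is delayed
        stored[n] = f"record {n}"

first_page = page(stored, 1, 7)    # client asks for 1-7; 4 is not stored yet
stored[4] = "record 4"             # the delayed write finally lands
second_page = page(stored, 7, 100) # next request is 7 to *

seen = set(first_page) | set(second_page)
missed = sorted(set(stored) - seen)
print(missed)                      # [4]
```

The record ends up in the store, but no page the client ever requested contained it.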
Is there a way around this? Can it be modelled differently in CB, or is this just a potential weakness in CB when needing to page results? As I mentioned, our paging is timestamp sensitive.
Michael,
Couchbase is an eventually consistent database with respect to views. It is ACID with respect to documents. There are durability interfaces that let you manage this. This means that you can rest assured you won't lose data and that indexes will catch up eventually.
In my experience with Couchbase, you need to expect that the nodes will never be in sync. There are many things the database is doing, such as compaction and replication. The most important thing you can do to enhance performance is to put your views on a separate spindle from the data. And you need to ensure that your main data spindles across your cluster can sustain 3 to 4 times your ingestion bandwidth. Also, make sure your main document key hashes appropriately to distribute the load.
It sounds like you are discussing a situation where the data exists in your system for less time than it takes to be processed through the view system. If you are removing data that fast, you need either a bigger cluster or faster disk arrays. Of the two choices, I would expand the size of your cluster. I like to think of Couchbase as building a RAIS, Redundant Array of Independent Servers. By expanding the cluster, you reduce the coincidence of hotspots and gain disk bandwidth. My ideal node has two local drives, one each for data and views, and enough RAM for my working set.
Anon,
Andrew