CouchDB 2 global_changes system table is getting insanely big - couchdb

We have a system that basically writes 250MB of data into our CouchDB 2 instance, which generates ~50GB/day in the global_changes database.
This makes CouchDB2 consumes all the disk.
Once you get to this state, CouchDB2 goes and never comes back.
We would like to know if there is any way of limiting the size of the global_changes table, or if there is a way of managing this table, like a set of best practices.

tldr: just delete it
http://docs.couchdb.org/en/latest/install/setup.html#single-node-setup
States:
Note that the last of these (referring to _global_changes) is not necessary if you do not expect to be using the global changes feed. Feel free to delete this database if you have created it, it has grown in size, and you do not need the function (and do not wish to waste system resources on compacting it regularly.

Related

Running two instances of MongoDB

I am working on a highly I/O Intensive application (A selection based on the availability of seats) using MERN Stack.
The app is expected to get 2000 concurrent users.
I want to know whether it's wise to use two instances of MongoDB, one on the RAM (in memory) and another on the Hard drive.
The RAM one to be used to store the available seats.
And the Hard drive one to backup the data after regular intervals.
But at the same time I know that if the server crashes my MongoDB data on the RAM is lost.
Could anyone guide me please?
I am using Socket IO instead of AJAX...
I don't think you need this. You can get a good server, with a good amount of RAM, and if you create your indexes correctly, everything should work fine.
Also Mongo 3 won't lock the entire database on each update, like Mongo 2 used to do.
I believe the best approach would be using something like Memcached in order to improve reads. Also, in order to improve database performance and have automated failover use sharding and replica sets.
Consider also that you would have headaches when your server restarted and you lose your data...
This seems unnecessary, because MongoDB already behaves exactly like that out-of-the-box.
The old engine (MMAPv1) was using memory-mapped files, which means that if you have as much RAM as you have data, it practically behaves like an in-memory database with automatic hard-drive backing.
The new engine (Wired Tiger) works a bit different in detail, but the same in general. It allows you to set a cache size (config key storage.wiredTiger.engineConfig.cacheSizeGB). When the cache size is as large enough, you again have an in-memory database with automatic hard-drive mirroring.
More about that in the storage FAQ.
What you are talking about is a scaling problem. You have two options when it comes to scaling: Add resources causing the bottleneck to your existing setup (more RAM and faster disks, usually) or expand your setup. You should first add resources, almost up to the point where adding resources does not give you an according bang for the buck.
At some point, this "scaling up" will not be feasible any more and you have to distribute the load amongst more nodes.
MongoDB comes with a feature for distributing load amongst (logical) nodes: sharding.
Basically, it works like this: multiple replica sets each form a logical node called a shard. Each shard in turn only holds a subset of your data. Instead of connecting to the shards directly, you acres your data via a mongos query router which is aware of which shard holds the data to answer the query and where to write new data.
By carefully selecting your shard key, your reads and writes should be evenly distributed between the shards.
Side note: putting production data on a standalone instance instead of a replica set crosses the border of negligence in my book. Given the prices of today's (rented) hardware, it has never been easier to eliminate a single point of failure than with a MongoDB replica set.

fake fsync calls to improve performance [duplicate]

I am switching to PostgreSQL from SQLite for a typical Rails application.
The problem is that running specs became slow with PG.
On SQLite it took ~34 seconds, on PG it's ~76 seconds which is more than 2x slower.
So now I want to apply some techniques to bring the performance of the specs on par with SQLite with no code modifications (ideally just by setting the connection options, which is probably not possible).
Couple of obvious things from top of my head are:
RAM Disk (good setup with RSpec on OSX would be good to see)
Unlogged tables (can it be applied on the whole database so I don't have change all the scripts?)
As you may have understood I don't care about reliability and the rest (the DB is just a throwaway thingy here).
I need to get the most out of the PG and make it as fast as it can possibly be.
Best answer would ideally describe the tricks for doing just that, setup and the drawbacks of those tricks.
UPDATE: fsync = off + full_page_writes = off only decreased time to ~65 seconds (~-16 secs). Good start, but far from the target of 34.
UPDATE 2: I tried to use RAM disk but the performance gain was within an error margin. So doesn't seem to be worth it.
UPDATE 3:*
I found the biggest bottleneck and now my specs run as fast as the SQLite ones.
The issue was the database cleanup that did the truncation. Apparently SQLite is way too fast there.
To "fix" it I open a transaction before each test and roll it back at the end.
Some numbers for ~700 tests.
Truncation: SQLite - 34s, PG - 76s.
Transaction: SQLite - 17s, PG - 18s.
2x speed increase for SQLite.
4x speed increase for PG.
First, always use the latest version of PostgreSQL. Performance improvements are always coming, so you're probably wasting your time if you're tuning an old version. For example, PostgreSQL 9.2 significantly improves the speed of TRUNCATE and of course adds index-only scans. Even minor releases should always be followed; see the version policy.
Don'ts
Do NOT put a tablespace on a RAMdisk or other non-durable storage.
If you lose a tablespace the whole database may be damaged and hard to use without significant work. There's very little advantage to this compared to just using UNLOGGED tables and having lots of RAM for cache anyway.
If you truly want a ramdisk based system, initdb a whole new cluster on the ramdisk by initdbing a new PostgreSQL instance on the ramdisk, so you have a completely disposable PostgreSQL instance.
PostgreSQL server configuration
When testing, you can configure your server for non-durable but faster operation.
This is one of the only acceptable uses for the fsync=off setting in PostgreSQL. This setting pretty much tells PostgreSQL not to bother with ordered writes or any of that other nasty data-integrity-protection and crash-safety stuff, giving it permission to totally trash your data if you lose power or have an OS crash.
Needless to say, you should never enable fsync=off in production unless you're using Pg as a temporary database for data you can re-generate from elsewhere. If and only if you're doing to turn fsync off can also turn full_page_writes off, as it no longer does any good then. Beware that fsync=off and full_page_writes apply at the cluster level, so they affect all databases in your PostgreSQL instance.
For production use you can possibly use synchronous_commit=off and set a commit_delay, as you'll get many of the same benefits as fsync=off without the giant data corruption risk. You do have a small window of loss of recent data if you enable async commit - but that's it.
If you have the option of slightly altering the DDL, you can also use UNLOGGED tables in Pg 9.1+ to completely avoid WAL logging and gain a real speed boost at the cost of the tables getting erased if the server crashes. There is no configuration option to make all tables unlogged, it must be set during CREATE TABLE. In addition to being good for testing this is handy if you have tables full of generated or unimportant data in a database that otherwise contains stuff you need to be safe.
Check your logs and see if you're getting warnings about too many checkpoints. If you are, you should increase your checkpoint_segments. You may also want to tune your checkpoint_completion_target to smooth writes out.
Tune shared_buffers to fit your workload. This is OS-dependent, depends on what else is going on with your machine, and requires some trial and error. The defaults are extremely conservative. You may need to increase the OS's maximum shared memory limit if you increase shared_buffers on PostgreSQL 9.2 and below; 9.3 and above changed how they use shared memory to avoid that.
If you're using a just a couple of connections that do lots of work, increase work_mem to give them more RAM to play with for sorts etc. Beware that too high a work_mem setting can cause out-of-memory problems because it's per-sort not per-connection so one query can have many nested sorts. You only really have to increase work_mem if you can see sorts spilling to disk in EXPLAIN or logged with the log_temp_files setting (recommended), but a higher value may also let Pg pick smarter plans.
As said by another poster here it's wise to put the xlog and the main tables/indexes on separate HDDs if possible. Separate partitions is pretty pointless, you really want separate drives. This separation has much less benefit if you're running with fsync=off and almost none if you're using UNLOGGED tables.
Finally, tune your queries. Make sure that your random_page_cost and seq_page_cost reflect your system's performance, ensure your effective_cache_size is correct, etc. Use EXPLAIN (BUFFERS, ANALYZE) to examine individual query plans, and turn the auto_explain module on to report all slow queries. You can often improve query performance dramatically just by creating an appropriate index or tweaking the cost parameters.
AFAIK there's no way to set an entire database or cluster as UNLOGGED. It'd be interesting to be able to do so. Consider asking on the PostgreSQL mailing list.
Host OS tuning
There's some tuning you can do at the operating system level, too. The main thing you might want to do is convince the operating system not to flush writes to disk aggressively, since you really don't care when/if they make it to disk.
In Linux you can control this with the virtual memory subsystem's dirty_* settings, like dirty_writeback_centisecs.
The only issue with tuning writeback settings to be too slack is that a flush by some other program may cause all PostgreSQL's accumulated buffers to be flushed too, causing big stalls while everything blocks on writes. You may be able to alleviate this by running PostgreSQL on a different file system, but some flushes may be device-level or whole-host-level not filesystem-level, so you can't rely on that.
This tuning really requires playing around with the settings to see what works best for your workload.
On newer kernels, you may wish to ensure that vm.zone_reclaim_mode is set to zero, as it can cause severe performance issues with NUMA systems (most systems these days) due to interactions with how PostgreSQL manages shared_buffers.
Query and workload tuning
These are things that DO require code changes; they may not suit you. Some are things you might be able to apply.
If you're not batching work into larger transactions, start. Lots of small transactions are expensive, so you should batch stuff whenever it's possible and practical to do so. If you're using async commit this is less important, but still highly recommended.
Whenever possible use temporary tables. They don't generate WAL traffic, so they're lots faster for inserts and updates. Sometimes it's worth slurping a bunch of data into a temp table, manipulating it however you need to, then doing an INSERT INTO ... SELECT ... to copy it to the final table. Note that temporary tables are per-session; if your session ends or you lose your connection then the temp table goes away, and no other connection can see the contents of a session's temp table(s).
If you're using PostgreSQL 9.1 or newer you can use UNLOGGED tables for data you can afford to lose, like session state. These are visible across different sessions and preserved between connections. They get truncated if the server shuts down uncleanly so they can't be used for anything you can't re-create, but they're great for caches, materialized views, state tables, etc.
In general, don't DELETE FROM blah;. Use TRUNCATE TABLE blah; instead; it's a lot quicker when you're dumping all rows in a table. Truncate many tables in one TRUNCATE call if you can. There's a caveat if you're doing lots of TRUNCATES of small tables over and over again, though; see: Postgresql Truncation speed
If you don't have indexes on foreign keys, DELETEs involving the primary keys referenced by those foreign keys will be horribly slow. Make sure to create such indexes if you ever expect to DELETE from the referenced table(s). Indexes are not required for TRUNCATE.
Don't create indexes you don't need. Each index has a maintenance cost. Try to use a minimal set of indexes and let bitmap index scans combine them rather than maintaining too many huge, expensive multi-column indexes. Where indexes are required, try to populate the table first, then create indexes at the end.
Hardware
Having enough RAM to hold the entire database is a huge win if you can manage it.
If you don't have enough RAM, the faster storage you can get the better. Even a cheap SSD makes a massive difference over spinning rust. Don't trust cheap SSDs for production though, they're often not crashsafe and might eat your data.
Learning
Greg Smith's book, PostgreSQL 9.0 High Performance remains relevant despite referring to a somewhat older version. It should be a useful reference.
Join the PostgreSQL general mailing list and follow it.
Reading:
Tuning your PostgreSQL server - PostgreSQL wiki
Number of database connections - PostgreSQL wiki
Use different disk layout:
different disk for $PGDATA
different disk for $PGDATA/pg_xlog
different disk for tem files (per database $PGDATA/base//pgsql_tmp) (see note about work_mem)
postgresql.conf tweaks:
shared_memory: 30% of available RAM but not more than 6 to 8GB. It seems to be better to have less shared memory (2GB - 4GB) for write intensive workloads
work_mem: mostly for select queries with sorts/aggregations. This is per connection setting and query can allocate that value multiple times. If data can't fit then disk is used (pgsql_tmp). Check "explain analyze" to see how much memory do you need
fsync and synchronous_commit: Default values are safe but If you can tolerate data lost then you can turn then off
random_page_cost: if you have SSD or fast RAID array you can lower this to 2.0 (RAID) or even lower (1.1) for SSD
checkpoint_segments: you can go higher 32 or 64 and change checkpoint_completion_target to 0.9. Lower value allows faster after-crash recovery

web development - deletion of user data?

I have finished my first complex web application and I have found out it is probably better to use "isDeleted" flags in db than hard-deleting records. But I wonder what is the recommended approach for data that are stored on filesystem (e.g. photos). Should I delete them when their related entity is (soft-)deleted or keep them as they are? Can junk accumulation cause running out of storage in practice?
It definitely can - you'll need to gather some stats on how much data the typical account generates, and then figure out how many deletions you're seeing to sort out how much junk data will pile up and/or when you'll fill up your storage.
You might also want to try using something like S3 to store your data - at that point, the only reason you would need to delete things would be because it was costing you too much to store it.

Replicating CouchDB to local couch reduces size - why?

I recently started using Couch for a large app I'm working on.
I database with 7907 documents, and wanted to rename the database. I poked around for a bit, but couldn't figure out how to rename it, so I figured I would just replicate it to a local database of the name I wanted.
The first time I tried, the replication failed, I believe the error was a timeout. I tried again, and it worked very quickly, which was a little disconcerting.
After the replication, I'm showing that the new database has the correct amount of records, but the database size is about 1/3 of the original.
Also a little odd is that if I refresh futon, the size of the original fluctuates between 94.6 and 95.5 mb
This leaves me with a few questions:
Is the 2nd database storing references to the first? If so, can I delete the first without causing harm?
Why would the size be so different? Had the original built indexes that the new one eventually will?
Why is the size fluctuating?
edit:
A few things that might be helpful:
This is on a cloudant couchdb install
I checked the first and last record of the new db, and they match, so I don't believe futon is underreporting.
Replicating to a new database is similar to compaction. Both involve certain side-effects (incidentally, and intentionally, respectively) which reduce the size of the new .couch file.
The b-tree indexes get balanced
Data from old document revisions is discarded.
Metadata from previous updates to the DB is discarded.
Replications store to/from checkpoints, so if you re-replicate from the same source, to the same location (i.e. re-run a replication that timed out), it will pick up where it left off.
Answers:
Replication does not create a reference to another database. You can delete the first without causing harm.
Replicating (and compacting) generally reduces disk usage. If you have any views in any design documents, those will re-build when you first query them. View indexes use their own .view file which also consumes space.
I am not sure why the size is fluctuating. Browser and proxy caches are the bane of CouchDB (and web) development. But perhaps it is also a result of internal Cloudant behavior (for example, different nodes in the cluster reporting slightly different sizes).

CouchDB .view file growing out of control?

I recently encountered a situation where my CouchDB instance used all available disk space on a 20GB VM instance.
Upon investigation I discovered that a directory in /usr/local/var/lib/couchdb/ contained a bunch of .view files, the largest of which was 16GB. I was able to remove the *.view files to restore normal operation. I'm not sure why the .view files grew so large and how CouchDB manages .view files.
A bit more information. I have a VM running Ubuntu 9.10 (karmic) with 512MB and CouchDB 0.10. The VM has a cron job which invokes a Python script which queries a view. The cron job runs once every five minutes. Every time the view is queried the size of a .view file increases. I've written a job to monitor this on an hourly basis and after a few days I don't see the file rolling over or otherwise decreasing in size.
Does anyone have any insights into this issue? Is there a piece of documentation I've missed? I haven't been able to find anything on the subject but that may be due to looking in the wrong places or my search terms.
CouchDB is very disk hungry, trading disk space for performance. Views will increase in size as items are added to them. You can recover disk space that is no longer needed with cleanup and compaction.
Every time you create update or delete a document then the view indexes will be updated with the relevant changes to the documents. The update to the view will happen when it is queried. So if you are making lots of document changes then you should expect your index to grow and will need to be managed with compaction and cleanup.
If your views are very large for a given set of documents then you may have poorly designed views. Alternatively your design may just require large views and you will need to manage that as you would any other resource.
It would be easier to tell what is happening if you could describe what document updates (inc create and delete) are happening and what your view functions are emitting, especially for the large view.
That your .view files grow, each time you access a view is because CouchDB updates views on access. CouchDB views need compaction like databases too. If you have frequent changes to your documents, resulting in changes in your view, you should run view compaction from time to time. See http://wiki.apache.org/couchdb/HTTP_view_API#View_Compaction
To reduce the size of your views, have a look at the data, you are emitting. When you emit(foo, doc) the entire document is copied to the view to it is very instantly available when you query the view. the function(doc) { emit(doc.title, doc); } will result in a view as big as the database itself. You could also emit(doc.title, nil); and use the include_docs option to let CouchDB fetch the document from the database when you access the view (which will result in a slightly performance penalty). See http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
Use sequential or monotonic id's for documents instead of random
Yes, couchdb is very disk hungry, and it needs regular compactions. But there is another thing that can help reducing this disk usage, specially sometimes when it's unnecessary.
Couchdb uses B+ trees for storing data/documents which is very good data structure for performance of data retrieval. However use of B-tree trades in performance for disk space usage. With completely random Id, B+-tree fans out quickly. As the minimum fill rate is 1/2 for every internal node, the nodes are mostly filled up to the 1/2 (as the data spreads evenly due to its randomness) generating more internal nodes. Also new insertions can cause a rewrite of full tree. That's what randomness can cause ;)
Instead, use of sequential or monotonic ids can avoid all.
I've had this problem too, trying out CouchDB for a browsed-based game.
We had about 100.000 unexpected visitors on the first day of a site launch, and within 2 days the CouchDB database was taking about 40GB in space. This made the server crash because the HD was completely full.
Compaction brought that back to about 50MB. I also set the _revs_limit (which defaults to 1000) to 10 since we didn't care about revision history, and it's running perfectly since. After almost 1M users, the database size is usually about 2-3GB. When i run compaction it's about 500MB.
Setting document revision limit to 10:
curl -X PUT -d "10" http://dbuser:dbpassword#127.0.0.1:5984/yourdb/_revs_limit
Or without user:password (not recommended):
curl -X PUT -d "10" http://127.0.0.1:5984/yourdb/_revs_limit

Resources