Running two instances of MongoDB - node.js

I am working on a highly I/O Intensive application (A selection based on the availability of seats) using MERN Stack.
The app is expected to get 2000 concurrent users.
I want to know whether it's wise to use two instances of MongoDB, one on the RAM (in memory) and another on the Hard drive.
The RAM one to be used to store the available seats.
And the Hard drive one to backup the data after regular intervals.
But at the same time I know that if the server crashes my MongoDB data on the RAM is lost.
Could anyone guide me please?
I am using Socket IO instead of AJAX...

I don't think you need this. You can get a good server, with a good amount of RAM, and if you create your indexes correctly, everything should work fine.
Also Mongo 3 won't lock the entire database on each update, like Mongo 2 used to do.
I believe the best approach would be using something like Memcached in order to improve reads. Also, in order to improve database performance and have automated failover use sharding and replica sets.
Consider also that you would have headaches when your server restarted and you lose your data...

This seems unnecessary, because MongoDB already behaves exactly like that out-of-the-box.
The old engine (MMAPv1) was using memory-mapped files, which means that if you have as much RAM as you have data, it practically behaves like an in-memory database with automatic hard-drive backing.
The new engine (Wired Tiger) works a bit different in detail, but the same in general. It allows you to set a cache size (config key storage.wiredTiger.engineConfig.cacheSizeGB). When the cache size is as large enough, you again have an in-memory database with automatic hard-drive mirroring.
More about that in the storage FAQ.

What you are talking about is a scaling problem. You have two options when it comes to scaling: Add resources causing the bottleneck to your existing setup (more RAM and faster disks, usually) or expand your setup. You should first add resources, almost up to the point where adding resources does not give you an according bang for the buck.
At some point, this "scaling up" will not be feasible any more and you have to distribute the load amongst more nodes.
MongoDB comes with a feature for distributing load amongst (logical) nodes: sharding.
Basically, it works like this: multiple replica sets each form a logical node called a shard. Each shard in turn only holds a subset of your data. Instead of connecting to the shards directly, you acres your data via a mongos query router which is aware of which shard holds the data to answer the query and where to write new data.
By carefully selecting your shard key, your reads and writes should be evenly distributed between the shards.
Side note: putting production data on a standalone instance instead of a replica set crosses the border of negligence in my book. Given the prices of today's (rented) hardware, it has never been easier to eliminate a single point of failure than with a MongoDB replica set.

Related

Azure SQL virtual machine performance - Inserts very slow

I'm trying out different pricing tiers on SQL Server.
Im inserting 4000 rows distributed over 4 tables in 10 seconds
My problem: I don't any performance improvements from a small D2S_V3 to D8S_V3
My application need to insert many rows (bulking is not an option), and this kind of performance is not acceptable
I wonder why I dont see improvements.
So my noob question: Do I need to configure something to see improvements? My naive thinking says I should some difference :-)
What am I doing wrong?
Without knowing much about your schema, it looks like you are storage bound or network bound.
Storage:
Try to mount the database to the local (temporary disk) and see if you notice any difference, if it is faster then your bottleneck is the mounted disk.
Network bound:
Where is the client that's inserting these transaction? On same machine? On Azure?
I suggest you setup a client in the same region and do the tests.
inserting 4000 rows distributed over 4 tables in 10 seconds.I don't any performance improvements from a small D2S_V3 to D8S_V3
I would approach this problem using wait stats approach rather than throwing hardware first with out knowing problem..
For example,running below insert
insert into #t
select orderid from orders o
join
Customers c
on c.custid=o.custid
showed me below wait stats
Wait WaitType="SOS_SCHEDULER_YIELD" WaitTimeMs="1" WaitCount="167"
Wait WaitType="PAGEIOLATCH_SH" WaitTimeMs="12" WaitCount="3" />
Wait WaitType="MEMORY_ALLOCATION_EXT" WaitTimeMs="21" WaitCount="4975" />
most of the time, the query spent time on
PAGEIOLATCH_SH:
getting data from disk into memory
MEMORY_ALLOCATION_EXT :allocating memory for the query to run
based on this i will try to troubleshoot by seeing if i have memory pressure on my system,since this query is trying to get data from disk.
This is just one example,but hopefully this will give you an idea..
Further i will try to see if select is returing data fast
Performance can be directly linked to your hardware or configuration, but it's more likely that it has to do with the structures and the queries. Take a look at the execution plan for the INSERT operation to see how it is being resolved by the optimizer. Also, capture the query metrics using extended events to see how many resources are being used by the operation. These are more likely to lead to a resolution on why the query is performing slowly and enable you to scale the hardware to best serve the query.

Choosing a NoSQL database

I need a NoSQL database that will run on Windows Azure that works well for the following parameters. Right now Azure Table Storage, HBase and Cassandra seems to be the most promising options.
1 billion entities
up to 100 reads per second, though caching will mostly make it much less
around 10 - 50 writes per second
Strong consistency would be a plus, so perhaps HBase would be better than Cassandra in that regard.
Querying will often be done on a secondary in-memory database with various indexes in addition to ElasticSearch or Windows Azure Search for fulltext search and perhaps some filtering.
Azure Table Storage looks like it could be nice, but from what I can tell, the big difference between Azure Table Storage and HBase is that HBase supports updating and reading values for a single property instead of the whole entity at once. I guess there must be some disadvantages to HBase however, but I'm not sure what they would be in this case.
I also think crate.io looks like it could be interesting, but I wonder if there might be unforseen problems.
Anyone have any other ideas of the advantages and disadvantages of the different databases in this case, and if any of them are really unsuited for some reason?
I currently work with Cassandra and I might help with a few pros and cons.
Requirements
Cassandra can easily handle those 3 requirements. It was designed to have fast reads and writes. In fact, Cassandra is blazing fast with writes, mostly because you can write without doing a read.
Also, Cassandra keeps some of its data in memory, so you could even avoid the secondary database.
Consistency
In Cassandra you choose the consistency in each query you make, therefore you can have consistent data if you want to. Normally you use:
ONE - Only one node has to get or accept the change. This means fast reads/writes, but low consistency (You can have other machine delivering the older information while consistency was not achieved).
QUORUM - 51% of your nodes must get or accept the change. This means not as fast reads and writes, but you get FULL consistency IF you use it in BOTH reads and writes. That's because if more than half of your nodes have your data after you inserted/updated/deleted, then, when reading from more than half your nodes, at least one node will have the most recent information, which would be the one to be delivered.
Both this options are the ones recommended because they avoid single points of failure. If all machines had to accept, if one node was down or busy, you wouldn't be able to query.
Pros
Cassandra is the solution for performance, linear scalability and avoid single points of failure (You can have machines down, the others will take the work). And it does most of its management work automatically. You don't need to manage the data distribution, replication, etc.
Cons
The downsides of Cassandra are in the modeling and queries.
With a relational database you model around the entities and the relationships between them. Normally you don't really care about what queries will be made and you work to normalize it.
With Cassandra the strategy is different. You model the tables to serve the queries. And that happens because you can't join and you can't filter the data any way you want (only by its primary key).
So if you have a database for a company with grocery stores and you want to make a query that returns all products of a certain store (Ex.: New York City), and another query to return all products of a certain department (Ex.: Computers), you would have two tables "ProductsByStore" and "ProductsByDepartment" with the same data, but organized differently to serve the query.
Materialized Views can help with this, avoiding the need to change in multiple tables, but it is to show how things work differently with Cassandra.
Denormalization is also common in Cassandra for the same reason: Performance.

fake fsync calls to improve performance [duplicate]

I am switching to PostgreSQL from SQLite for a typical Rails application.
The problem is that running specs became slow with PG.
On SQLite it took ~34 seconds, on PG it's ~76 seconds which is more than 2x slower.
So now I want to apply some techniques to bring the performance of the specs on par with SQLite with no code modifications (ideally just by setting the connection options, which is probably not possible).
Couple of obvious things from top of my head are:
RAM Disk (good setup with RSpec on OSX would be good to see)
Unlogged tables (can it be applied on the whole database so I don't have change all the scripts?)
As you may have understood I don't care about reliability and the rest (the DB is just a throwaway thingy here).
I need to get the most out of the PG and make it as fast as it can possibly be.
Best answer would ideally describe the tricks for doing just that, setup and the drawbacks of those tricks.
UPDATE: fsync = off + full_page_writes = off only decreased time to ~65 seconds (~-16 secs). Good start, but far from the target of 34.
UPDATE 2: I tried to use RAM disk but the performance gain was within an error margin. So doesn't seem to be worth it.
UPDATE 3:*
I found the biggest bottleneck and now my specs run as fast as the SQLite ones.
The issue was the database cleanup that did the truncation. Apparently SQLite is way too fast there.
To "fix" it I open a transaction before each test and roll it back at the end.
Some numbers for ~700 tests.
Truncation: SQLite - 34s, PG - 76s.
Transaction: SQLite - 17s, PG - 18s.
2x speed increase for SQLite.
4x speed increase for PG.
First, always use the latest version of PostgreSQL. Performance improvements are always coming, so you're probably wasting your time if you're tuning an old version. For example, PostgreSQL 9.2 significantly improves the speed of TRUNCATE and of course adds index-only scans. Even minor releases should always be followed; see the version policy.
Don'ts
Do NOT put a tablespace on a RAMdisk or other non-durable storage.
If you lose a tablespace the whole database may be damaged and hard to use without significant work. There's very little advantage to this compared to just using UNLOGGED tables and having lots of RAM for cache anyway.
If you truly want a ramdisk based system, initdb a whole new cluster on the ramdisk by initdbing a new PostgreSQL instance on the ramdisk, so you have a completely disposable PostgreSQL instance.
PostgreSQL server configuration
When testing, you can configure your server for non-durable but faster operation.
This is one of the only acceptable uses for the fsync=off setting in PostgreSQL. This setting pretty much tells PostgreSQL not to bother with ordered writes or any of that other nasty data-integrity-protection and crash-safety stuff, giving it permission to totally trash your data if you lose power or have an OS crash.
Needless to say, you should never enable fsync=off in production unless you're using Pg as a temporary database for data you can re-generate from elsewhere. If and only if you're doing to turn fsync off can also turn full_page_writes off, as it no longer does any good then. Beware that fsync=off and full_page_writes apply at the cluster level, so they affect all databases in your PostgreSQL instance.
For production use you can possibly use synchronous_commit=off and set a commit_delay, as you'll get many of the same benefits as fsync=off without the giant data corruption risk. You do have a small window of loss of recent data if you enable async commit - but that's it.
If you have the option of slightly altering the DDL, you can also use UNLOGGED tables in Pg 9.1+ to completely avoid WAL logging and gain a real speed boost at the cost of the tables getting erased if the server crashes. There is no configuration option to make all tables unlogged, it must be set during CREATE TABLE. In addition to being good for testing this is handy if you have tables full of generated or unimportant data in a database that otherwise contains stuff you need to be safe.
Check your logs and see if you're getting warnings about too many checkpoints. If you are, you should increase your checkpoint_segments. You may also want to tune your checkpoint_completion_target to smooth writes out.
Tune shared_buffers to fit your workload. This is OS-dependent, depends on what else is going on with your machine, and requires some trial and error. The defaults are extremely conservative. You may need to increase the OS's maximum shared memory limit if you increase shared_buffers on PostgreSQL 9.2 and below; 9.3 and above changed how they use shared memory to avoid that.
If you're using a just a couple of connections that do lots of work, increase work_mem to give them more RAM to play with for sorts etc. Beware that too high a work_mem setting can cause out-of-memory problems because it's per-sort not per-connection so one query can have many nested sorts. You only really have to increase work_mem if you can see sorts spilling to disk in EXPLAIN or logged with the log_temp_files setting (recommended), but a higher value may also let Pg pick smarter plans.
As said by another poster here it's wise to put the xlog and the main tables/indexes on separate HDDs if possible. Separate partitions is pretty pointless, you really want separate drives. This separation has much less benefit if you're running with fsync=off and almost none if you're using UNLOGGED tables.
Finally, tune your queries. Make sure that your random_page_cost and seq_page_cost reflect your system's performance, ensure your effective_cache_size is correct, etc. Use EXPLAIN (BUFFERS, ANALYZE) to examine individual query plans, and turn the auto_explain module on to report all slow queries. You can often improve query performance dramatically just by creating an appropriate index or tweaking the cost parameters.
AFAIK there's no way to set an entire database or cluster as UNLOGGED. It'd be interesting to be able to do so. Consider asking on the PostgreSQL mailing list.
Host OS tuning
There's some tuning you can do at the operating system level, too. The main thing you might want to do is convince the operating system not to flush writes to disk aggressively, since you really don't care when/if they make it to disk.
In Linux you can control this with the virtual memory subsystem's dirty_* settings, like dirty_writeback_centisecs.
The only issue with tuning writeback settings to be too slack is that a flush by some other program may cause all PostgreSQL's accumulated buffers to be flushed too, causing big stalls while everything blocks on writes. You may be able to alleviate this by running PostgreSQL on a different file system, but some flushes may be device-level or whole-host-level not filesystem-level, so you can't rely on that.
This tuning really requires playing around with the settings to see what works best for your workload.
On newer kernels, you may wish to ensure that vm.zone_reclaim_mode is set to zero, as it can cause severe performance issues with NUMA systems (most systems these days) due to interactions with how PostgreSQL manages shared_buffers.
Query and workload tuning
These are things that DO require code changes; they may not suit you. Some are things you might be able to apply.
If you're not batching work into larger transactions, start. Lots of small transactions are expensive, so you should batch stuff whenever it's possible and practical to do so. If you're using async commit this is less important, but still highly recommended.
Whenever possible use temporary tables. They don't generate WAL traffic, so they're lots faster for inserts and updates. Sometimes it's worth slurping a bunch of data into a temp table, manipulating it however you need to, then doing an INSERT INTO ... SELECT ... to copy it to the final table. Note that temporary tables are per-session; if your session ends or you lose your connection then the temp table goes away, and no other connection can see the contents of a session's temp table(s).
If you're using PostgreSQL 9.1 or newer you can use UNLOGGED tables for data you can afford to lose, like session state. These are visible across different sessions and preserved between connections. They get truncated if the server shuts down uncleanly so they can't be used for anything you can't re-create, but they're great for caches, materialized views, state tables, etc.
In general, don't DELETE FROM blah;. Use TRUNCATE TABLE blah; instead; it's a lot quicker when you're dumping all rows in a table. Truncate many tables in one TRUNCATE call if you can. There's a caveat if you're doing lots of TRUNCATES of small tables over and over again, though; see: Postgresql Truncation speed
If you don't have indexes on foreign keys, DELETEs involving the primary keys referenced by those foreign keys will be horribly slow. Make sure to create such indexes if you ever expect to DELETE from the referenced table(s). Indexes are not required for TRUNCATE.
Don't create indexes you don't need. Each index has a maintenance cost. Try to use a minimal set of indexes and let bitmap index scans combine them rather than maintaining too many huge, expensive multi-column indexes. Where indexes are required, try to populate the table first, then create indexes at the end.
Hardware
Having enough RAM to hold the entire database is a huge win if you can manage it.
If you don't have enough RAM, the faster storage you can get the better. Even a cheap SSD makes a massive difference over spinning rust. Don't trust cheap SSDs for production though, they're often not crashsafe and might eat your data.
Learning
Greg Smith's book, PostgreSQL 9.0 High Performance remains relevant despite referring to a somewhat older version. It should be a useful reference.
Join the PostgreSQL general mailing list and follow it.
Reading:
Tuning your PostgreSQL server - PostgreSQL wiki
Number of database connections - PostgreSQL wiki
Use different disk layout:
different disk for $PGDATA
different disk for $PGDATA/pg_xlog
different disk for tem files (per database $PGDATA/base//pgsql_tmp) (see note about work_mem)
postgresql.conf tweaks:
shared_memory: 30% of available RAM but not more than 6 to 8GB. It seems to be better to have less shared memory (2GB - 4GB) for write intensive workloads
work_mem: mostly for select queries with sorts/aggregations. This is per connection setting and query can allocate that value multiple times. If data can't fit then disk is used (pgsql_tmp). Check "explain analyze" to see how much memory do you need
fsync and synchronous_commit: Default values are safe but If you can tolerate data lost then you can turn then off
random_page_cost: if you have SSD or fast RAID array you can lower this to 2.0 (RAID) or even lower (1.1) for SSD
checkpoint_segments: you can go higher 32 or 64 and change checkpoint_completion_target to 0.9. Lower value allows faster after-crash recovery

Potential issue with Couchbase paging

It may be too much turkey over the holidays, but I've been thinking about a potential problem that we could have with Couchbase.
Currently we paginate based on time, but I'm thinking a similar issue could occur with other values used for paging for example the atomic counter. I'll try to explain best I can, this would only occur in a load balanced environment.
For example say we have 4 servers load balanced and storing data to our Couchbase cluster. We sort our records based on timestamps currently. If any of the 4 servers writing the data starts to lag behind the others than our pagination would possibly be missing records when retrieving client side. A SQL DB auto-increment and timestamps for example can be created when the record is stored to the DB which will avoid similar issues. Using a NoSql DB like Couchbase you define the data you need to retrieve on before it is stored to the DB. So what I am getting at is if there is a delay in storing to the DB and you are retrieving in a pagination fashion while this delay has occurred, you run the real possibility of missing data. Since we are paging that data may never be viewed.
Interested in what other thoughts people have on this.
EDIT**
Response to Andrew:
Example a facebook or pintrest type app is storing data to a DB, they have many load balanced servers from the frontend writing to the db. If for some reason writing is delayed its a non issue with a SQL DB because a timestamp or auto increment happens when the data is actually stored to the DB. There will be no missing data when paging. asking for 1-7 will give you data that is only stored in the DB, 7-* will contain anything that is delayed because an auto-increment value has not been created for that record becuase it is not actually stored.
In Couchbase its different, you actually get your auto increment value (atomic counter) and then save it. So for example say a record is going to be stored as atomic counter number 4. For some reasons this is delayed in storing to the DB. Other servers are grabbing 5, 6, 7 and storing that data just fine. The client now asks for all data between 1 and 7, 4 is still not stored. Then the next paging request is 7 to *. 4 will never be viewed.
Is there a way around this? Can it be modelled differently in CB, or is this just a potential weakness in CB when needing to page results. As I mentioned are paging is timestamp sensitive.
Michael,
Couchbase is an eventually consistent database with respect to views. It is ACID with respect to documents. There are durability interfaces that let you manage this. This means that you can rest assured you won't lose data and that indexes will catch up eventually.
In my experience with Couchbase, you need to expect that the nodes will never be in-sync. There are many things the database is doing, such as compaction and replication. The most important thing you can do to enhance performance is to put your views on a separate spindle from the data. And you need to ensure that your main data spindles across your cluster can sustain between 3-4 times your ingestion bandwidth. Also, make sure your main document key hashes appropriately to distribute the load.
It sounds like you are discussing a situation where the data exists in your system for less time than it takes to be processed through the view system. If you are removing data that fast, you need either a bigger cluster or faster disk arrays. Of the two choices, I would expand the size of your cluster. I like to think of Couchbase as building a RAIS, Redundant Array of Independent Servers. By expanding the cluster, you reduce the coincidence of hotspots and gain disk bandwidth. My ideal node has two local drives, one each for data and views, and enough RAM for my working set.
Anon,
Andrew

Distributed database, many lightly loaded nodes

I'm working on a hobby project involving a rather CPU-intensive calculation. The problem is embarrassingly parallel. This calculation will need to happen on a large number of nodes (say 1000-10000). Each node can do its work almost completely independently of the others. However, the entire system will need to answer queries from outside the system. Approximately 100000 such queries per second will have to be answered. To answer the queries, the system needs some state that is sometimes shared between two nodes. The nodes need at most 128MB RAM for their calculations.
Obviously, I'm probably not going to afford to actually build this system in the scale described above, but I'm still interested in the engineering challenge of it, and thought I'd set up a small number of nodes as proof-of-concept.
I was thinking about using something like Cassandra and CouchDB to have scalable persistent state across all nodes. If I run a distributed database server on each node, it would be very lightly loaded, but it would be very nice from an ops perspective to have all nodes be identical.
Now to my question:
Can anyone suggest a distributed database implementation that would be a good fit for a cluster of a large number of nodes, each with very little RAM?
Cassandra seems to do what I want, but http://wiki.apache.org/cassandra/CassandraHardware talks about recommending at least 4G RAM for each node.
I haven't found a figure for the memory requirements of CouchDB, but given that it is implemented in Erlang, I figure maybe it isn't so bad?
Anyway, recommendation, hints, suggestions, opinions are welcome!
You should be able to do this with cassandra, though depending on your reliability requirements, an in memory database like redis might be more appropriate.
Since the data set is so small (100 MBs of data), you should be able to run with less than 4GB of ram per node. Adding in cassandra overhead you probably need 200MB of ram for the memtable, and another 200MB of ram for the row cache (to cache the entire data set, turn off the key cache), plus another 500MB of ram for java in general, which means you could get away with 2 gigs of ram per machine.
Using a replication factor of three, you probably only need a cluster on the order of 10's of nodes to serve the number of reads/writes you require (especially since your data set is so small and all reads can be served from the row cache). If you need the computing power of 1000's of nodes, have them talk to the 10's of cassandra nodes storing you data rather than try to split cassandra to run across 1000's of nodes.
I've not used CouchDB myself, but I am told that Couch will run in as little as 256M with around 500K records. At a guess that would mean that each of your nodes might need ~512M, taking into account the extra 128M they need for their calculations. Ultimately you should download and give each a test inside a VPS, but it does sound like Couch will run in less memory than Cassandra.
Okay, after doing some more read-up after posting the question, and trying some thing out, I decided to go with MongoDB.
So far I'm happy. I have very little load, and MongoDB is using very little system resources (~200MB at most). However, my dataset isn't nearly as large as described in the question, and I am only running 1 node, so this doesn't mean anything.
CouchDB doesn't seem to support sharding out-of-the-box, so is not (it turns out) a good fit for the problem described in the question (I know there are addons for sharding).

Resources