I've written an application with a CouchDB backend. I have invested a lot of time into CouchDB and so I'm reluctant to move everything over to a different NoSQL database (like Redis).
The problem is that I now need to implement a rate limiting (based on IP address) feature.
There are plenty of examples on how good Redis is for this kind of task, however because I don't want to drop CouchDB for other tasks this means I would essentially be running (and supporting) two databases (1 for most data, 1 for rate limiting) and so...
Is running CouchDB in tandem with Redis unheard of?
Is CouchDB itself suitable for handling rate limiting itself?
Is running CouchDB in tandem with Redis unheard of?
Redis is commonly used in complement with other storage solutions (MySQL, PostgreSQL, MongoDB, CouchDB, etc ...). Like many other NoSQL solutions, Redis is not adapted to all kind of workloads or situations. The authors of Redis are pragmatic and open people, and they routinely suggest to use other solutions rather than Redis, when they are more adapted to the situation.
Redis is therefore a good team player, and it is generally easy to integrate in an existing infrastructure.
Here is an example of usage of Redis with CouchDB.
Is CouchDB itself suitable for handling rate limiting itself?
CouchDB has a number of useful features to implement the rate limiting strategy described in Chris O'Hara's article. For instance, it supports bulk operations on several documents (with optional atomicity). A "bucket span" can be stored in a single document. In-place incrementation of counters can be covered by using update handlers.
IMO, the main missing feature would be automatic item expiration (which CouchDB does not provide AFAIK). So you would have to design a clever mechanism to get rid of obsolete data on top of CouchDB.
The main problem is CouchDB is not really designed for this kind of workload: it is a log structured document oriented database. Each time a counter has to be incremented, it would involve JSON unpacking/packing operations, some Javascript code to be executed, and writing a new revision of the whole document in append only files. You can find a good article describing how CouchDB stores its data here.
I suspect a rate limiting strategy implemented on top of CouchDB would not scale very well (too many I/Os, too much CPU consumption, inefficient network protocol). For instance, CouchDB is a RESTful server; I would not feel comfortable to initiate client HTTP operations (REST queries to CouchDB) to rate limit each incoming HTTP query of my system.
Redis is much more adapted to this kind of workload (fast, in-memory, no I/O, efficient client protocol, no JSON parsing/formatting, incrementations are native atomic operations, etc ...)
You can do rate limiting with Memcached - it has a nice counter increment command as you mention, plus obsolete data is automatically purged from the cache in due course, so it has all the benefits of Redis for this application without the annoying duplication of capability (and complexity) that running Redis on top of CouchDB would bring.
http://simonwillison.net/2009/jan/7/ratelimitcache/
You could add memcached to your own setup easily enough or you could investigate CouchBase whose current server product integrates a CouchDB derived database with Memcached compatibility baked in:
http://www.couchbase.com/memcached
Personally I dislike the way Couchbase forked from CouchDB, but for your application it might be a perfect fit.
Related
I have been reading through some instructions on using redis and memcache. Can someone input on which would be more suitable?
Our scenario, here is to cache results from a database query and store it for a period of time, if the same value is called it should be used from the cache if its within the time interval.
Can someone share an example of how this can be achieved . just a simple example would do it?
Both Redis and Memcache should work for your use case. But Redis has more features compared to Memcache. Redis can provide persistence of cached data, Memcahe doesn't have persistence. Redis can provide high availability as well as clustering. Memcache doesn't have clusering and HA (some libraries impose HA, clustering from client side, but that is less reliable). Redis provides more data structures and features. Overall, Redis is a better choice as it is kind of Memcache++++.
Need to setup a server backend web-service and contemplating either some MongoDB solution or other NoSQL and cache concoction. I've read several articles indicating how Couchbase is so much faster than MongoDB which isn't a slouch itself. Here's for reference:
http://www.couchbase.com/press-releases/couchbase-dominates-cassandra-datastax-and-mongodb-newly-released-nosql-performance-benchmark
http://prnewswire.com/news-releases/mongodb-30-with-wired-tiger-new-benchmark-measures-performance-vs-couchbase-server-302-300053144.html
So my question how true is this? Has anyone else tested and can confirm such orders of magnitude performance difference?
If so, is there a way to improve MongoDB performance by integrating some cache for it? I think Couchbase is actually a 'cache' with CouchDB store added, how can MongoDB be used/integrated in some manner to provide similar performance?
Why not just use Couchbase if its better?
Well, I was concerned by reading many places about its "lack of documentation". Then I was alarmed by reading this:
"...Couchbase forum threads which are habitually abandoned by Couchbase reps when a developer points out a pretty huge flaw in their code, intentionally or unintentionally..."
http://scalabilitysolved.com/dont-use-couchbase-unless-you-really-really-want-to/
Just go to the bottom of that article linked above and read the entire comment at the bottom by Erutan. Basically if one goes to Couchbase website it does seem that the company is really pushing their "Enterprise" version mainly which is fine, but it is worry-some when people think that they might be purposefully not providing documentation and perhaps I misunderstood, but from what I gather from that Couchbase user's comments, some think that bugs might be left in the code "intentionally" to steer people to the enterprise version?
On the PLUS side, it does seem that all the code is Apache licensed so anyone is free to fix any bugs.
Anyway, for me, I was leaning towards MongoDB for various reasons, although performance was one of them, until happened on some couchbase benchmarks. Looking forward to some affirmations or challenges to these couchbase performance superiority claims and possible solutions to bolster MongoDB setup.
So is Couchbase way faster than any other non-memory proven/stable NoSql?
CouchBase is fast but not the fastest one. I tested it, and in my scenarios Tarantool was 20% faster in terms of requests per second. Both of them are at order of magnitude faster than MongoDB. Maybe you should consider using one of the in-memory with persistence databases instead of MongoDB as your primary data store. One database is more consistent than a database and a cache layer on top of it.
I've been doing some research but have reached the point where I think MongoDB/Mongoose (on Node.js) is not the right tool for the job. Here is the scenario...
Two documents: Account (money) information and Inventory information
Check if user's account has enough money
If so, check and deduct inventory
Deduct funds from Account Information
It seems like I really need a transaction system to prevent other events from altering the data in between steps.
Am I correct, or can this still be handled in MongoDB/Mongoose? If not, is there a NoSQL db that I should check out, preferably with Node.JS support?
Implementing transactional safety is usually tricky and requires more than just transactions on the database, e.g. if you need to communicate with external parties in a reliable fashion or if the transaction runs over minutes, hours or even days. But that's leading to far.
Anyhow, on the db side you can do transactions in MongoDB using two-phase commits, but it's not exactly trivial.
There's a ton of NoSQL databases with transaction support, e.g. redis, cassandra (using the Paxos protocol) and foundationdb.
However, this seems rather random to me because the idea of NoSQL databases is to use one that fits your particular problem. If you just need 'anything' with transactions, an SQL db might do the job, right?
You can always implement your own locking mechanism within your application to lock out other sections of the app while you are making your account and inventory checks and updates. That combined with findAndModify() http://docs.mongodb.org/manual/reference/command/findAndModify/#dbcmd.findAndModify may be enough for your transaction needs while also maintaining the flexibility of a NoSQL solution.
For the distributed lock I'd look at Warlock https://www.npmjs.org/package/node-redis-warlock I've not used it myself but it's node.js based and built on top of Redis, although implementing your own via Redis is not that hard to begin with.
The Setup:
Imagine a 'twitter like' service where a user submits a post, which is then read by many (hundreds, thousands, or more) users.
My question is regarding the best way to architect the cache & database to optimize for quick access & many reads, but still keep the historical data so that users may (if they want) see older posts. The assumption here is that 90% of users would only be interested in the new stuff, and that the old stuff will get accessed occasionally. The other assumption here is that we want to optimize for the 90%, and its ok if the older 10% take a little longer to retrieve.
With this in mind, my research seems to strongly point in the direction of using a cache for the 90%, and then to also store the posts in another longer-term persistent system. So my idea thus far is to use Redis for the cache. The advantages is that Redis is very fast, and also it has built in pub/sub which would be perfect for publishing posts to many people. And then I was considering using MongoDB as a more permanent data store to store the same posts which will be accessed as they expire off of Redis.
Questions:
1. Does this architecture hold water? Is there a better way to do this?
2. Regarding the mechanism for storing posts in both the Redis & MongoDB, I was thinking about having the app do 2 writes: 1st - write to Redis, it then is immediately available for the subscribers. 2nd - after successfully storing to Redis, write to MongoDB immediately. Is this the best way to do it? Should I instead have Redis push the expired posts to MongoDB itself? I thought about this, but I couldn't find much information on pushing to MongoDB from Redis directly.
It is actually sensible to associate Redis and MongoDB: they are good team players. You will find more information here:
MongoDB with redis
One critical point is the resiliency level you need. Both Redis and MongoDB can be configured to achieve an acceptable level of resiliency, and these considerations should be discussed at design time. Also, it may put constraint on the deployment options: if you want master/slave replication for both Redis and MongoDB you need at least 4 boxes (Redis and MongoDB should not be deployed on the same machine).
Now, it may be a bit simpler to keep Redis for queuing, pub/sub, etc ... and store the user data in MongoDB only. Rationale is you do not have to design similar data access paths (the difficult part of this job) for two stores featuring different paradigms. Also, MongoDB has built-in horizontal scalability (replica sets, auto-sharding, etc ...) while Redis has only do-it-yourself scalability.
Regarding the second question, writing to both stores would be the easiest way to do it. There is no built-in feature to replicate Redis activity to MongoDB. Designing a daemon listening to a Redis queue (where activity would be posted) and writing to MongoDB is not that hard though.
I'm looking at building an application which has many data sources, each of which put events into my system. Events have a well defined data structure and could be encoded using JSON or XML.
I would like to be able to guarantee that events are saved persistently, and that the events are used as a part of a publish/subscribe bus with multiple subscribers possible per event.
For the database, availability is very important even as it scales to multiple nodes, and partition tolerance is important so that I can scale the number of places which can store my events. Eventual consistency is good enough for me.
I was thinking of using a JMS enterprise messaging bus (e.g. Mule) or an AMQP enterprise messaging bus (such as RabbitMQ or ZeroMQ).
But for my application, it seems that if I could set up a publish subscribe system with CouchDB or something similar, it would solve my problem without having to integrate a enterprise messaging bus and a persistent storage system.
Which would work better, CouchDB + scaling + loadbalancing + some kind of PubSub mechanism, or an explicit PubSub messaging system with attached eventually-consistent , Available, partition-tolerant storage? Which one is easier to set up, administer, and operate? Which solution will have high throughput for a given cost? Why?
Also, are there any more questions I should ask before selecting my technologies? (BTW, Java is the server-side and client-side language).
I am using a CouchDB message queue in production. (It is not pub/sub, so I do not consider this answer complete.)
Currently (June 2011), CouchDB has huge potential as a messaging substrate:
Good data persistence
Well-poised for clustering (on a LAN, using BigCouch or Lounge)
Well-poised for distribution (between data centers, world-wide)
Good platform. Despite the shortcomings listed below, I love CQS because I can re-use my DB and it works from Erlang, NodeJS, and every web browser.
The _changes query
Continuous feeds, instant delivery without polling
Network going down is no problem, just retry later from the previous position
Still, even a low-volume message system in CouchDB requires careful planning and maintenance. CouchDB is potentially a great messaging server. (It is inspired by Lotus notes, which handles high email volume.)
However, these are the challenges with CouchDB:
Append-only database files grow fast
Be mindful about disk capacity
Be mindful about disk i/o. Compaction will read and re-write all live documents
Deleted documents are not really deleted. They are marked deleted=true and kept forever, even after compaction! This is in fact uniquely good about CouchDB, because the deleted action will propagate through the cluster, even if the network goes down for a time.
Propagating (replicating) deletes is great, but what about the buildup of deleted docs? Eventually it will outstrip everything else. The solution is to purge them, which actually removes them from disk. Unfortunately, if you do 2 or more purges before querying a map/reduce view, the view will completely rebuild itself. That may take too much time, depending on your needs.
As usual, we hear NoSQL databases shouting "free lunch!", "free lunch!" while CouchDB says "you are going to have to work for this."
Unfortunately, unless you have compelling pressure to re-use CouchDB, I would use a dedicated messaging platform. I had a good experience with ejabberd as a messaging platform and to communicate to/from Google App Engine.)
I think that the best solution would be CouchDB + Jabber/XMPP server (ejabberd) + book: http://professionalxmpp.com
JSON is the natural storing mechanism for CouchDB
Jabber/XMPP server includes pubsub support
The book is a must read
While you can use a database as an alternative to a message queueing system, no database is a message queuing system, not even CouchDB. A message queueing system like AMQP provides more than just persistence of messages, in fact with RabbitMQ, persistence is just an invisible service under the hood that takes care of all of the challenges that you have to deal with by yourself on CouchDB.
Take a good look at the RabbitMQ website where there is lots of information about AMQP and how to make use of it. They have done a great job of collecting together articles and blogs about message queueing.