I have been using CouchDB on some prototype applications and it has been brilliant: very easy to use and extremely quick. I was wondering if anyone has been using it in production and has any views on its reliability, performance, suitability for operational management, etc.? I am considering using it to support a service layer and would make use of its replication functionality.
Any comments/experiences would be most welcome.
I've used CouchDB for a few small in-house applications - it's been very stable and I've had no serious complaints. That said, a few small gripes:
1) Databases can be synchronized, but not nodes. That is, if you have four servers and twenty databases, you have to specify each server and each database to synchronize. A minor gripe, but I prefer less management to more (a scripted workaround is sketched at the end of this answer).
2) Since databases are append-only, a database with a lot of activity gets really big really quickly. Compacting fixes this, but isn't exactly fast, especially on a big (e.g. 20 GB) database. Scheduling compaction for the weekends solved this for us, but that is probably less of an option for high-availability applications.
3) JavaScript is the de facto view language. What is not well advertised is that, since CouchDB is written in Erlang, it also supports Erlang views, which are faster because they are "native". For applications doing a lot of work in views, Erlang probably makes more sense.
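For anyone who hasn't written one, a view is just a map function stored in a design document. A minimal JavaScript sketch (the "type", "created_at", and "amount" fields are hypothetical document fields):

```javascript
// Sketch of a CouchDB map function, stored in a design document.
// The doc fields used here are hypothetical.
function (doc) {
  if (doc.type === "order") {
    // emit(key, value): rows are sorted by key and can be queried by range
    emit(doc.created_at, doc.amount);
  }
}
```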
Setting those minor issues aside, I'd wholeheartedly recommend it.
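Since the question mentions replication, here is the scripted workaround for gripe 1: a minimal sketch (host URLs and database names are placeholders; assumes Node 18+ for fetch) that drives CouchDB's standard /_replicate endpoint for every database on every peer.

```javascript
// Sketch: trigger one-shot replication for every database on every peer.
// Host URLs and database names are placeholders.
const hosts = ["http://couch1:5984", "http://couch2:5984"];
const databases = ["users", "orders"];

async function replicateAll(source) {
  for (const target of hosts) {
    if (target === source) continue;
    for (const db of databases) {
      // POST /_replicate starts a one-shot replication on the source server
      const res = await fetch(`${source}/_replicate`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ source: db, target: `${target}/${db}` })
      });
      console.log(`${source}/${db} -> ${target}/${db}:`, res.status);
    }
  }
}

hosts.forEach(replicateAll);
```

Run something like that from cron (or switch to continuous replication) and the four-servers-by-twenty-databases bookkeeping at least lives in one place.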
CouchDB ships in Ubuntu and is a fundamental component of the Ubuntu One service.
I have developed an automation web tool (a SaaS app). Right now I'm using a MongoDB Atlas cloud database with an Amazon EC2 XLarge instance (quad-core, EBS-enabled, 16 GB RAM). Is Atlas best, or a local MongoDB instance? If so, why, and which will give me better performance? I could use some serious help here.
MongoDB:
Being a non-relational database, it is much easier to build the model of the database architecture, which shortens development time considerably. When working with JavaScript, or with JSON objects and collections, MongoDB makes the connection of services for queries much lighter and optimizes application performance. If you do not know the console commands, you can also work with a graphical desktop database administrator. Learning times are much faster, which helps a project scale. For a development department, this shortens delivery times to clients and makes projects much more feasible in terms of deadlines.
PROS:
Queries and documents are plain JSON, which speeds up query response handling; you can build query logic directly in the calling service (a small example follows the summary below).
You can install a local database environment, something that real-time non-relational services such as Firebase do not allow. A local environment is paramount, since you can work without relying on an internet connection.
Forming collections in MongoDB is relatively simple, and you do not need to know a query language to work with it, since it has a simple graphical environment that lets non-experts manage databases without the console.
CONS:
Honestly, MongoDB seems to be one of the most complete tools in its field; I believe it already has all the features a non-relational database should have.
Perhaps because it is a relatively new tool, there are very few experts in the field of MongoDB.
To Summarize:
MongoDB is better placed in large projects with great scalability. It also lets you work quite comfortably on projects based on languages such as JavaScript, Angular, TypeScript, and C#. I believe its performance is much better with technologies that handle similar programming idioms. With languages like Java or PHP, for example, it is often better to work with relational databases like PostgreSQL or MySQL.
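To illustrate the JSON-style querying mentioned in the pros, here is a minimal sketch with the official MongoDB Node.js driver (connection URI, database, collection, and fields are all placeholders):

```javascript
// Sketch: JSON-style queries with the official MongoDB Node.js driver.
const { MongoClient } = require("mongodb");

async function main() {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  const users = client.db("app").collection("users");

  // The query is a plain JSON-like object; no SQL involved.
  const adults = await users.find({ age: { $gt: 21 } }).toArray();
  console.log(adults.length, "users over 21");

  await client.close();
}

main().catch(console.error);
```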
MongoDB-atlas:
My department at the company I work for was running a MongoDB cluster that we had set up on our own servers. It reached a point where it became hard to manage and to scale. MongoDB Atlas came to our aid with the ability to scale free of management overhead, which saves us a lot of effort.
PROS:
No infrastructure on our side. Free of management.
Easy to scale up and down.
It has strong authentication and encryption features that make sure developers don't get lazy and leave data out in the open on unguarded servers.
CONS:
Billing could be more granular.
The alerting system could be more specific.
One of the drawbacks of MongoDB Atlas is the cost. Hopefully more competition will bring costs down over time.
To Summarize:
I would recommend MongoDB Atlas to any person or company that has a significant need for a NoSQL database and does not want to manage the infrastructure. Using MongoDB Atlas can significantly reduce your management time and cost, which frees up valuable resources for other tasks. It also suits smaller companies, as MongoDB Atlas scales up and down very quickly.
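For anyone evaluating it: connecting to Atlas from application code is essentially the same as connecting to a local instance; only the connection string changes. A sketch (the cluster hostname and credentials are placeholders; Atlas shows you the exact mongodb+srv:// URI in its UI):

```javascript
// Sketch: connecting to a MongoDB Atlas cluster.
// Hostname and credentials are placeholders.
const { MongoClient } = require("mongodb");

const uri = "mongodb+srv://appUser:<password>@cluster0.example.mongodb.net/app";
const client = new MongoClient(uri);

async function ping() {
  await client.connect();
  // Cheap round-trip to verify the cluster is reachable.
  await client.db("admin").command({ ping: 1 });
  console.log("connected to Atlas");
  await client.close();
}

ping().catch(console.error);
```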
Hopefully I answered your question. Good luck!
I am implementing a system composed of a collection of small systems, i.e. Raspberry Pi, Yun, BeagleBone, the occasional PC. Crossbar.io has real promise ... but, as I understand it, doesn't currently support multiple nodes. Am I correct? Does anyone know when that might happen?
In the meantime it occurred to me that each individual node can offer an HTTP interface that I might be able to use for my purposes. My initial thought is to create workers that wrap access to the web interface on subsidiary nodes. This fits the overall architecture of the system I want to create - does it have any merit? Is it tractable? I'm new to WebSockets, and any insight would be a great help.
Thanks for your time,
Al
In general that does sound like a fit for Crossbar.io.
There is no timeline on multi-node (i.e. multiple routers), but we hope to have at least hot-standby nodes for high availability ready in Q1. Other than for high availability, I think a single instance should provide sufficient performance for most applications out there: on a single current (non-high-end) Xeon we're talking tens of thousands of events a second, and concurrent connections are limited mostly by RAM (hundreds of thousands on a single box are definitely not a problem). If you need more than that, I'd be very interested in your specific use case - we want to learn more about our users.
I don't completely understand the second part of your question: what precisely is the architecture you're planning here? If you're asking about the integrated web server, then with recent optimizations (it can now use multiple cores) it should be enough for even moderately big sites, and with SPAs you're unlikely to ever run into performance issues.
Hope this helps, and I'll be glad to answer in more detail once you've clarified the second part.
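In the meantime, if the idea is simply to have a worker near each small node that exposes that node's HTTP interface as a WAMP procedure, a minimal sketch with Autobahn|JS might look like this (router URL, realm, procedure name, and the node's HTTP endpoint are all placeholders; assumes Node 18+ for fetch):

```javascript
// Sketch: a worker that wraps a subsidiary node's HTTP interface
// and exposes it as a WAMP procedure on a Crossbar.io router.
const autobahn = require("autobahn");

const connection = new autobahn.Connection({
  url: "ws://crossbar-host:8080/ws", // placeholder router address
  realm: "realm1"
});

connection.onopen = function (session) {
  // Any WAMP client on the realm can now call this procedure; the worker
  // forwards the call to the node's local HTTP interface.
  session.register("com.example.node1.status", async function () {
    const res = await fetch("http://localhost:8000/status"); // placeholder
    return await res.json();
  });
};

connection.open();
```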
I'm using DDD for a service-oriented application intended to transmit a high volume of messages between a high volume of web clients (i.e., browsers).
Because in the context of required functionality, the need for transmission outweighs the need for storage, I love the idea of relying on RAM primarily and minimizing use of the database.
However I'm unclear on how to architect this from a scalability point of view. A web farm creates high availability of service endpoints and domain logic processing. But no matter how many servers I have, it seems they must all share a common repository so that their data is consistent.
How do I build this repository so that it's as scalable as possible? How can it be spread across an array of physical machines in such a way that all machines are consistent and none cares if another goes down?
Also since touching the database will be required occasionally (e.g., when a client goes missing and messages intended for it must be stored until it returns), how should I organize my memory-based code and data access layer? Are they both considered "the repository"?
There are several ways to solve this issue. No single answer can really cover it all...
One method to ensure your scalability is to simply scale the hardware. Write your web services to be stateless so that you can run a web farm (all running the same identical services, pointing to the same DB) and turn your DB into a cluster. Clustered databases run over multiple servers and work on the same storage. However, this scenario can get complicated and expensive quite quickly.
Some interesting links:
http://scale-out-blog.blogspot.com/2009/09/future-of-database-clustering.html
http://en.wikipedia.org/wiki/Server_farm
Another method is to look at architecture. CQRS (Command/Query Responsibility Segregation) is a common architectural model for achieving scalability. It builds different databases for reading and writing. This seems contradictory, but if you study it, it becomes natural and you wonder why you never thought of it before. Simply put, most apps do a lot more reading than writing, and writing tends to be a lot more complicated than reading (requiring business-rule validation, etc.), so why not separate the two? You can use your expensive transactional database for writing, and then a cheap, perhaps NoSQL-based or open-source, database across multiple reading servers. The read model is then optimized for the screens of your application(s), whereas the write model is optimized solely for writing and is in fact a DDD-based set of repositories.
There's just not enough room here to cover this option in detail, but CQRS is a good way of achieving scalability and even ease of development, once you have a CQRS framework in place. There are many other advantages to CQRS, such as ease of auditing (if you combine it with the "event sourcing" technique, which is common in CQRS-based environments).
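There isn't room for a full framework here, but the core shape of CQRS fits in a few lines. A sketch with in-memory stand-ins for the stores (all names are illustrative):

```javascript
// Minimal CQRS sketch: commands validate and mutate the write model,
// then publish events; a projection maintains a denormalized read model
// that queries hit directly. All stores are in-memory stand-ins.
const events = [];            // event log (stand-in for a message bus)
const writeModel = new Map(); // transactional store: orders by id
const readModel = [];         // denormalized store, shaped for screens

// Command side: enforce business rules, then record the change.
function placeOrder(id, customer, amount) {
  if (amount <= 0) throw new Error("amount must be positive");
  writeModel.set(id, { id, customer, amount });
  publish({ type: "OrderPlaced", id, customer, amount });
}

// Projection: turn events into read-optimized rows.
function publish(event) {
  events.push(event);
  if (event.type === "OrderPlaced") {
    readModel.push({ id: event.id, label: `${event.customer}: $${event.amount}` });
  }
}

// Query side: cheap reads, no business logic, scales independently.
function listOrders() {
  return readModel;
}

placeOrder("o1", "alice", 42);
console.log(listOrders()); // [ { id: 'o1', label: 'alice: $42' } ]
```

In a real system the read model would live in its own database(s), fed asynchronously by the event stream, which is exactly what lets the reading side scale independently.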
Some interesting links:
http://cqrsinfo.com
http://abdullin.com/cqrs
http://blog.fossmo.net/post/Command-and-Query-Responsibility-Segregation-(CQRS).aspx
Are you ready for some reading? There are a lot of options, but I believe you should start by learning about the advantages of modern distributed NoSQL databases, and learn from the experience gained at Facebook, LinkedIn, and other friends. Start here:
http://highscalability.com/
http://nosql-database.org/
I am looking for an eventually consistent data store and it looks like it may be coming down to Riak or Cassandra. Has anyone got experiences of, or a view on, this?
As you probably know, they are both architecturally strongly influenced by Dynamo (eventually consistent, no single point of failure, etc.). Both also go beyond Dynamo in providing a "richer than pure K/V" data model: in Cassandra's case, a Bigtable-like ColumnFamily model; in Riak's, a document-oriented one. I have seen sane people choose both.
I believe points that favor Cassandra include
speed
support for clusters spanning multiple data centers
big names using it (digg, twitter, facebook, webex, ... -- http://n2.nabble.com/Cassandra-users-survey-tp4040068p4040393.html)
Points that favor Riak include
map/reduce support out of the box (a rough sketch follows below)
/Cassandra dev, fwiw
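For reference on the map/reduce point above: Riak exposes MapReduce over HTTP, with phases written in JavaScript. A rough sketch, reflecting the Basho-era API as I understand it (host, bucket name, and the trivial phases are placeholders; 8098 is Riak's default HTTP port):

```javascript
// Sketch: counting the objects in a bucket via Riak's /mapred endpoint.
// Bucket name and host are placeholders.
const job = {
  inputs: "sales", // hypothetical bucket
  query: [
    // Map phase: runs inside Riak's JS VM, one call per object.
    { map: { language: "javascript",
             source: "function (v) { return [1]; }" } },
    // Reduce phase: Riak's built-in sum over the mapped values.
    { reduce: { language: "javascript", name: "Riak.reduceSum" } }
  ]
};

fetch("http://localhost:8098/mapred", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(job)
})
  .then(res => res.json())
  .then(result => console.log("object count:", result[0]));
```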
Riak is used by
Mozilla Foundation
Ask.com sponsored listings
Comcast
Citigroup
Bet365
I think they both pass the test of credible reference customers/users.
Cassandra seems more mature, and is currently doing better in benchmarks. Riak seems easier to add a node to as your cluster grows.
For completeness: A good (probably biased) comparison between the two can be found at http://docs.basho.com/riak/1.3.2/references/appendices/comparisons/Riak-Compared-to-Cassandra/
Being listed as a user and merely downloading are different things; it is best to get actual references.
Perhaps a private conversation could be arranged in which Riak references at these companies could be shared? I'm not sure how to get the same for Cassandra, but there is a community of companies that support Cassandra, and since they probably include participants in Cassandra's development, that may be a really reasonable place to start.
I would also like to hear from Riak about recent, large deployments with happy customers.
I also would like to see the roadmap for each product. Cassandra is a bit easier to track (http://wiki.apache.org/cassandra/) than Riak in my view, as Cassandra's wiki discusses limitations and things that will probably change going forward, but neither outlines its future well. I could understand that of an open-source community, perhaps, but not for a product for which I must pay.
I also would suggest researching Cloudant, which has what appears to be a very nice layering of capabilities, and which looks like it is bringing to bear capabilities from elsewhere in the Apache world. CouchDB is the Apache platform on which Cloudant is based, but the Lucene-based indexing seems only the tip of the iceberg of where Cloudant could go. Creating and managing an index is a very systematic process, a kind of data pipeline, that could be scripted using other Apache community assets. And capabilities like NLP could also be added, indirectly through Lucene or perhaps directly into what is persisted.
It would be nice to see a proposed Cloudant roadmap, especially since the team could mine the riches of the Apache community and integrate such into Cloudant. Such probably exists as there is an operational component to the Cloudant revenue model that will require it, if for no other reason.
Another area of interest is Cloudant's pricing model: it is clear their revenue model is based not on software but on service. That is quite attractive, and it seems consistent with the ecosystem surrounding Cassandra too. I don't know whether the Basho folks have won over enough of the NoSQL community yet; I don't see much buzz around their web site or product.
I like this Cloudant web page (https://cloudant.com/the-data-layer/). I was surprised to see the embedded Erlang capability; I did not know CouchDB was written in Erlang, which seems unusual to me in the Apache community (my ignorance), and CouchDB appears to be older than the other NoSQL products I now know to be written in Erlang. Whatever their strategy, they at least count Amazon EC2 and Microsoft Azure as hosting partners, indicating an appreciation of both the Microsoft and non-Microsoft worlds - all very important if one properly recognizes the middleware value potential (beyond cache or hash-table applications) that these types of data stores could have.
Finally, while I don't know the board well, Andy Palmer's guidance looks like it will be valuable. He can bring guidance on structured data (through VoltDB) to a world that, rightly or wrongly, may be unfairly branded as KVP hash tables of unstructured data. The need for structure and an ecosystem surrounding NoSQL "databases" is being recognized; witness Google's efforts with Spanner, where KVP storage, little structure, and the need for searchability motivated Google's investment. While we may not all need something like Spanner, we probably do need an improving and robust "enterprise" management and interoperability capability in these NoSQL databases to make it reasonable to incorporate them into modern cloud architectures. The needed structure can come from ease of interoperability and functional richness. It can also come from new capabilities that support converting unstructured data to structured data (e.g. indexes, or the use of NLP to create structured, parsed renderings of the contents of a KVP blob), and plenty of other things that, if put into a published roadmap, could entice and grow a user base. Cloudant looks like it has a good chance of success; I will take a closer look at it ...
And look what I found about CouchDB ...
CouchDB comes with a suite of features, such as on-the-fly document transformation and real-time change notifications, that make web app development a breeze. It even comes with an easy-to-use web administration console - you guessed it, served up directly out of CouchDB! We care a lot about distributed scaling: CouchDB is highly available and partition tolerant, but is also eventually consistent. And we care a lot about your data: CouchDB has a fault-tolerant storage engine that puts the safety of your data first.
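The "real-time change notifications" mentioned there come from CouchDB's _changes feed. A minimal sketch of consuming it with long polling (host and database name are placeholders; assumes Node 18+ for fetch):

```javascript
// Sketch: follow a CouchDB database's _changes feed via long polling.
// Host and database name are placeholders.
async function watchChanges(db, since = 0) {
  // feed=longpoll blocks until at least one change arrives after `since`
  const url = `http://localhost:5984/${db}/_changes?feed=longpoll&since=${since}`;
  const { results, last_seq } = await (await fetch(url)).json();
  for (const change of results) {
    console.log("doc changed:", change.id, change.changes);
  }
  return watchChanges(db, last_seq); // resume from the new sequence
}

watchChanges("app-db").catch(console.error);
```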