Mongodb vs Postgres in Nodejs [closed] - node.js

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I'm building a Node.js application and am utterly torn between NoSQL (MongoDB) and an RDBMS (PostgreSQL). My project is to create an open source example project for logging visitors and displaying visitor statistics in real time on a webpage using Node.js. I was planning on using MongoDB at first, because a lot of Node.js examples and tutorials, albeit mostly older ones, used it, and PaaS hosts with a free tier abound. However, I have seen a lot of bashing of MongoDB recently and found that people who tried MongoDB ended up switching to Postgres:
http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
http://dieswaytoofast.blogspot.com/2012/09/mysql-vs-postgres-vs-mongodb.html
http://www.plotprojects.com/why-we-use-postgresql-and-slick/
I'm also a fan of Heroku and have heard a lot about Postgres because of that, and I find that SQL queries can be nice sometimes.
I'm not a database expert, so I can't tell for the life of me which way to go. I would really appreciate it if you could give some advice on which one to consider and why.
I have a few criteria:
Since I want this to be an example, it would be nice to have a way to host a decently sized amount of data. I know that MongoDB definitely offers this, but Postgres PaaS offerings like Heroku seem to have pretty small databases (which matters since I am logging every visitor to the website).
A database that is simplistic and easy to explain to others.
Performance doesn't really matter, but speed can't hurt
Thanks for all of the help!
Note: Please no flame wars, everyone has their own opinion :)

Choosing between an SQL database and a NoSQL database is certainly being debated heavily right now, and there are plenty of good articles on the subject. I'll list a couple at the end. I have no problem recommending SQL over NoSQL for your particular needs.
NoSQL has a niche group of use cases where data is stored in large, tightly coupled packages called documents rather than in a relational model. In a nutshell, data that is tightly coupled to a single entity (like all the text documents used by a single user) is better stored in a NoSQL document model. Data that behaves like Excel spreadsheets, fits nicely in rows, and is subject to aggregate calculations is better stored in a SQL database, of which PostgreSQL is only one of several good choices.
A third option that you might consider is Redis (http://redis.io/), a simple key-value data store that, like SQL, is extremely fast to query but is not as rigidly typed.
The example you cite seems to be a straightforward row/column type problem. You will find the SQL syntax is much less arcane than the query syntax for MongoDB. Node has many things to recommend it, and the toolset has matured significantly in the past year. If you do go with MongoDB, I would recommend using the mongoose npm package, as it reduces the amount of boilerplate code that is required with native mongodb and I have not noticed any performance degradation.
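To make the "row/column type problem" concrete, here is a minimal in-memory sketch of a visitor log stored as rows and summarized with an aggregate. The `Visit` shape, the field names, and the sample data are assumptions made for illustration; a real project would run the equivalent `GROUP BY` query in the database rather than in application code.

```typescript
// One row per visit, the way a relational "visits" table might store it.
// Field names are illustrative assumptions, not taken from the question.
interface Visit {
  page: string;
  visitorId: string;
  visitedAt: number; // Unix timestamp in milliseconds
}

// Count visits per page: the in-memory analogue of
//   SELECT page, COUNT(*) FROM visits GROUP BY page;
function visitsPerPage(visits: Visit[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const v of visits) {
    counts[v.page] = (counts[v.page] || 0) + 1;
  }
  return counts;
}

const sampleVisits: Visit[] = [
  { page: "/home", visitorId: "a", visitedAt: 1000 },
  { page: "/about", visitorId: "b", visitedAt: 2000 },
  { page: "/home", visitorId: "c", visitedAt: 3000 },
];

const perPage = visitsPerPage(sampleVisits);
console.log(perPage);
```

Every visit has the same flat shape, which is the property that makes the data a natural fit for a table.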
http://slashdot.org/topic/bi/sql-vs-nosql-which-is-better/
http://www.infoivy.com/2013/07/nosql-database-comparison-chart-only.html

Related

Relational Database User trying to understand Non-Relational and how to implement CRUD

I'm currently involved in an app project, and I'm in charge of setting up the backend.
What I'm used to is a MySQL database plus PHP for cleaning and managing the data sent to and from the front end, which I have much more experience in. However, because of certain preferences of my bosses, on this project I've found myself looking at IBM's Bluemix and Cloudant software. Cloudant is a NoSQL database (like CouchDB), and my experience with NoSQL is severely lacking. All I've managed to do so far is create a few JSON documents and some basic views.
What I need to figure out is how to perform the CRUD (create, read, update, delete) actions on a NoSQL database, or at least what it would look like.
In addition, I need to know whether there are ways to implement security measures (security and anti-hacking functions) on a NoSQL database without an external source, or whether I will need to route the data through some sort of PHP function first, if I want it cleaned, before sending it to the Cloudant server where my database sits.
Let me know if my attempt to explain my problem is lacking in clarity. I'll try my best to state it a different way, if need be.
Generally speaking, there is nothing equivalent to an ANSI standard for NoSQL databases. In other words, NoSQL databases are not as standardized as SQL databases. Standards are only starting to appear. You can think of it as a technology still in the making.
What you have in general is an API with methods such as put_record or delete_record, or a REST interface that is logically equivalent. Also, in general you CRUD the whole record, not parts of the record.
Take a look at the reference: Cloudant - Reading and Writing
That said, in your case I would recommend abstracting away from the specific implementation of the NoSQL database you want to use if you care about avoiding vendor lock-in. So I would suggest wrapping the CRUD operations in PHP functions that can later be replaced if you want to change the NoSQL database flavor.
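As a rough illustration of that wrapping idea, here is a minimal sketch written in TypeScript rather than PHP; the same shape carries over. The `DocumentStore` interface and the `InMemoryStore` class are hypothetical names invented for this example and are not part of Cloudant or any other product's API. A real implementation would make HTTP calls to the database inside these methods.

```typescript
// A JSON-like document; every document carries its own id.
interface Doc {
  id: string;
  [field: string]: unknown;
}

// Hypothetical abstraction over "some NoSQL backend".
// Application code depends only on this interface.
interface DocumentStore {
  create(doc: Doc): void;
  read(id: string): Doc | undefined;
  update(doc: Doc): void; // replaces the whole document, not parts of it
  remove(id: string): boolean;
}

// In-memory stand-in used here so the sketch runs without a database.
class InMemoryStore implements DocumentStore {
  private docs: Record<string, Doc> = {};

  create(doc: Doc): void {
    if (this.docs[doc.id] !== undefined) {
      throw new Error("document already exists: " + doc.id);
    }
    this.docs[doc.id] = { ...doc };
  }

  read(id: string): Doc | undefined {
    const found = this.docs[id];
    return found === undefined ? undefined : { ...found };
  }

  update(doc: Doc): void {
    if (this.docs[doc.id] === undefined) {
      throw new Error("no such document: " + doc.id);
    }
    this.docs[doc.id] = { ...doc };
  }

  remove(id: string): boolean {
    if (this.docs[id] === undefined) return false;
    delete this.docs[id];
    return true;
  }
}

const store: DocumentStore = new InMemoryStore();
store.create({ id: "user-1", displayName: "Ann", visits: 1 });
// Whole-record update: fields left out of the new document are gone.
store.update({ id: "user-1", visits: 2 });
const user = store.read("user-1");
console.log(user);
```

Swapping the backend then means writing another class that implements `DocumentStore`, and input cleaning or access checks can live in the same layer.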
This approach has the additional advantage of giving you an abstraction in which to implement your own security. Some important NoSQL databases have no concept of multi-tenancy or have only just implemented it. Again, it is a technology in the making.
When your mindset is the relational one, you tend to think of the database as something that will help you guarantee data consistency as much as possible. But NoSQL databases are not like that. Think of them as a simple repository of documents (in a JSON or XML structure, for instance), without cross references.
Then the obvious question is perhaps: why would anyone want such a thing? One of the possible answers is because NoSQL databases may hold an aggregate of consolidated data. You can then retrieve aggregates to save time reprocessing or re-retrieving data unnecessarily.
As for security, most (if not all) NoSQL databases have some pretty good authentication mechanisms.

Couchbase fastest NoSQL (no Redis)? Can MongoDB performance be increased by using with some cache product? Is Couchbase so much faster than MongoDB?

I need to set up a server backend web service and am contemplating either a MongoDB solution or some other NoSQL and cache concoction. I've read several articles indicating that Couchbase is much faster than MongoDB, which isn't a slouch itself. For reference:
http://www.couchbase.com/press-releases/couchbase-dominates-cassandra-datastax-and-mongodb-newly-released-nosql-performance-benchmark
http://prnewswire.com/news-releases/mongodb-30-with-wired-tiger-new-benchmark-measures-performance-vs-couchbase-server-302-300053144.html
So my question is: how true is this? Has anyone else tested and can confirm such orders-of-magnitude performance differences?
If so, is there a way to improve MongoDB performance by integrating a cache with it? I think Couchbase is actually a 'cache' with a CouchDB store added, so how can MongoDB be used or integrated in some manner to provide similar performance?
Why not just use Couchbase if it's better?
Well, I was concerned by reading many places about its "lack of documentation". Then I was alarmed by reading this:
"...Couchbase forum threads which are habitually abandoned by Couchbase reps when a developer points out a pretty huge flaw in their code, intentionally or unintentionally..."
http://scalabilitysolved.com/dont-use-couchbase-unless-you-really-really-want-to/
Just go to the bottom of the article linked above and read the entire comment by Erutan. Basically, the Couchbase website does seem to push mainly the "Enterprise" version, which is fine, but it is worrisome when people think the company might be purposefully withholding documentation. Perhaps I misunderstood, but from what I gather from that Couchbase user's comments, some think that bugs might be left in the code "intentionally" to steer people to the enterprise version?
On the PLUS side, it does seem that all the code is Apache licensed so anyone is free to fix any bugs.
Anyway, I was leaning towards MongoDB for various reasons, performance being one of them, until I happened on some Couchbase benchmarks. I'm looking forward to some affirmations of or challenges to these Couchbase performance claims, and to possible ways of bolstering a MongoDB setup.
So is Couchbase way faster than any other non-memory proven/stable NoSql?
Couchbase is fast but not the fastest one. I tested it, and in my scenarios Tarantool was 20% faster in terms of requests per second. Both of them are an order of magnitude faster than MongoDB. Maybe you should consider using one of the in-memory databases with persistence instead of MongoDB as your primary data store. One database is more consistent than a database with a cache layer on top of it.
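The consistency point can be illustrated with a minimal cache-aside sketch. Both the "database" and the "cache" below are plain in-memory objects standing in for real services, and the function names are hypothetical; this is not any product's API, only an outline of how the pattern behaves.

```typescript
// Stand-in for the primary database (for example, a MongoDB collection).
const primaryDb: Record<string, string> = { "user:1": "Ann" };
// Stand-in for a separate cache layer in front of it.
const cacheLayer: Record<string, string> = {};
let databaseReads = 0;

// Cache-aside read: try the cache first, fall back to the database.
function readThroughCache(key: string): string | undefined {
  const cached = cacheLayer[key];
  if (cached !== undefined) return cached;
  databaseReads += 1;
  const value = primaryDb[key];
  if (value !== undefined) cacheLayer[key] = value;
  return value;
}

// Correct write path: update the database, then drop the cached copy.
function writeAndInvalidate(key: string, value: string): void {
  primaryDb[key] = value;
  delete cacheLayer[key];
}

const firstRead = readThroughCache("user:1");  // goes to the database
const secondRead = readThroughCache("user:1"); // served from the cache

// A write that bypasses invalidation leaves the cache stale.
primaryDb["user:1"] = "Bob";
const staleRead = readThroughCache("user:1");  // still the old value

writeAndInvalidate("user:1", "Bob");
const freshRead = readThroughCache("user:1");  // goes to the database again

console.log(firstRead, secondRead, staleRead, freshRead, databaseReads);
```

The cache saves database reads, but every write path has to remember to invalidate it, which is the extra consistency burden the answer alludes to.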

Is there a schema versioning tool for cassandra [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
In the SQL world, it's quite common to have a tool that goes through a folder of schema scripts to set up a schema. A widely used approach is to have a table holding the current database version number, plus DDL scripts, so that we can start from any version of the database and update to any subsequent version in a controlled manner. Visual Studio has database projects, and Redgate has similar tools.
I was wondering if there's something similar for Cassandra. I know it wouldn't be too difficult to implement something basic for Cassandra, but I was wondering if somebody has already done it.
Pillar manages migrations for your Cassandra data stores.
Pillar grew from a desire to automatically manage Cassandra schema as code. Managing schema as code enables automated build and deployment, a foundational practice for an organization striving to achieve Continuous Delivery.
Pillar is to Cassandra what Rails ActiveRecord migrations or Play Evolutions are to relational databases with one key difference: Pillar is completely independent from any application development framework.
https://github.com/comeara/pillar
Your initial question doesn't specify a language, though you later indicate you'd like C#. I don't have a C# answer, but I've extracted the Java versioning component that I'm using for my project. I also created a small sample project that shows how to integrate it. It's bare-bones. There are different approaches to this problem, so I picked one that was simple to build and does what I need. Here are the two GitHub projects:
https://github.com/DonBranson/cql_schema_versioning
https://github.com/DonBranson/cql_schema_versioning_example
This component doesn't store a version # in the schema, but stores the list of scripts it's run. It depends on the sort order of the script names to determine run order. Very basic.
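The approach described above (record the names of scripts already run, and use name sort order to decide what runs next) can be sketched in a few lines. This is not the code from the linked project; `MigrationScript`, `runPending`, and the `execute` callback are hypothetical names, and the statements are treated as opaque strings that a real tool would send to Cassandra.

```typescript
interface MigrationScript {
  name: string;      // for example "001_create_visits.cql"
  statement: string; // CQL text; treated as opaque in this sketch
}

// Runs every script not yet recorded as applied, in name order,
// and returns the names that were applied during this call.
// `execute` is a hypothetical callback that would run the statement.
function runPending(
  scripts: MigrationScript[],
  alreadyApplied: string[],
  execute: (statement: string) => void
): string[] {
  const pending = scripts
    .filter((s) => alreadyApplied.indexOf(s.name) === -1)
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));

  const newlyApplied: string[] = [];
  for (const script of pending) {
    execute(script.statement);
    newlyApplied.push(script.name);
  }
  return newlyApplied;
}

const executedStatements: string[] = [];
const ranScripts = runPending(
  [
    { name: "002_add_referrer.cql", statement: "ALTER TABLE visits ADD referrer text" },
    { name: "001_create_visits.cql", statement: "CREATE TABLE visits (id uuid PRIMARY KEY, page text)" },
    { name: "003_add_index.cql", statement: "CREATE INDEX ON visits (page)" },
  ],
  ["001_create_visits.cql"],
  (statement) => {
    executedStatements.push(statement);
  }
);
console.log(ranScripts);
```

A real tool would also persist the returned names (for instance in a dedicated table) so the next run knows where to resume.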
Cassandra is by its nature 'schemaless'; it is a structured key-value store, so it is very different from a traditional RDBMS in that regard.
Cassandra has now evolved to be 'schema-optional' in that it allows you to describe general datatypes that live in a particular column family.
Try looking at Liquibase and/or Flyway to see if the extensions provide the versioning capability you require.
http://bungeedata.blogspot.com/2013/12/liquibase-and-cassandra.html
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
http://planetcassandra.org/blog/schema-vs-schema-less/
I was looking for a schema migration tool that could be used for the following scenarios:
Automated upgrade to schema when an application is deployed.
Allow test Cassandra databases to be populated for integration tests.
After some searching, I've found the following two that look like potential candidates:
https://github.com/Contrast-Security-OSS/cassandra-migration
https://github.com/DonBranson/cql_schema_versioning
I'm not aware of anything that exists today.
To the extent that you're using CQL you could probably come up with something, but you'll likely run into problems with the limited ability of CQL to modify tables, and then with the data transformation phase.
When I've used these types of tools with SQL, I always ended up with a bunch of SQL to update the data set after the application of updated DDL.
With CQL, I've had to write code to be applied after the schema change.
If all you're doing is adding or dropping tables, columns and indexes, it should be do-able.

Are relational databases a poor fit for Node.js?

Recently I've been playing around with Node.js a little bit. In my particular case I wound up using MongoDB, partly because it made sense for that project because it was very simple, and partly because Mongoose seemed to be an extremely simple way to get started with it.
I've noticed that there seems to be a degree of antipathy towards relational databases when using Node.js. They seem to be poorly supported compared to non-relational databases within the Node.js ecosystem, but I can't seem to find a concise reason for this.
So, my question is, is there a solid technical reason why relational databases are a poorer fit for working with Node.js than alternatives such as MongoDB?
EDIT: Just want to clarify a few things:
I'm specifically not looking for details relating to a specific application I'm building
Nor am I looking for non-technical reasons (for example, I'm not after answers like "Node and MongoDB are both new so developers use them together")
What I am looking for is entirely technical reasons, ONLY. For instance, if there were a technical reason why relational databases performed unusually poorly when used with Node.js, then that would be the kind of thing I'm looking for (note that from the answers so far it doesn't appear that is the case)
No, there isn't a technical reason. It's mostly just opinion and using NoSQL with Node.js is currently a popular choice.
Granted, Node's ecosystem is largely community-driven. Everything beyond Node's core API requires community involvement. And, certainly, people will be more likely to support what aligns with their personal preferences.
But, many still use and support relational databases with Node.js. Some notable projects include:
mysql
pg
sequelize
I love Node.js, but with Node it actually makes more sense to use an RDBMS, as opposed to a non-relational DB. With a NoSQL/non-relational solution you often need to do manual joins in your Node.js code and sometimes work without transactions, the RDBMS feature that provides commit/rollback. Here are some potential problems with using non-relational DBs + Node.js servers:
(a) the joins are slower and responses are slower, because Node is not C/C++
(b) the expensive joins block your event loop, because the join is happening in your Node.js code, not on some database server
(c) manually writing joins is often difficult and error-prone; your NoSQL queries could easily be incorrect or your join code might be incorrect or suboptimal; optimized joins have been done before by the masters of RDBMSs, and joins in RDBMSs are proven to be correct, mathematically in most cases.
(d) Some non-relational databases, like MongoDB, do not support transactions - in my team's case, that means we have to use an external distributed lock so that multiple queries can be grouped together into an atomic transaction. It would be somewhat easier if we could just use transactions and avoid application-level locks.
With a more powerful relational database system that can do optimized joins in C/C++ on the database server rather than in your Node.js code, you let your Node.js server do what it's best at.
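For readers who have not written one, this is roughly what a manual join in application code looks like. The `User` and `Order` shapes and the sample data are assumptions for illustration; in practice the two arrays would be the results of two separate queries against a document store.

```typescript
interface User {
  id: string;
  displayName: string;
}

interface Order {
  id: string;
  userId: string;
  total: number;
}

interface OrderWithUser {
  orderId: string;
  displayName: string;
  total: number;
}

// Application-side hash join: index users by id, then look each order up.
// This runs synchronously, so a large input occupies the event loop
// for the whole duration of the join.
function joinOrdersToUsers(orders: Order[], users: User[]): OrderWithUser[] {
  const usersById: Record<string, User> = {};
  for (const u of users) {
    usersById[u.id] = u;
  }

  const joined: OrderWithUser[] = [];
  for (const o of orders) {
    const u = usersById[o.userId];
    if (u === undefined) continue; // inner-join semantics: skip unmatched orders
    joined.push({ orderId: o.id, displayName: u.displayName, total: o.total });
  }
  return joined;
}

const joinedRows = joinOrdersToUsers(
  [
    { id: "o1", userId: "u1", total: 10 },
    { id: "o2", userId: "u2", total: 25 },
    { id: "o3", userId: "missing", total: 5 },
  ],
  [
    { id: "u1", displayName: "Ann" },
    { id: "u2", displayName: "Bob" },
  ]
);
console.log(joinedRows);
```

In a relational database the same result is a single `JOIN` executed on the database server, which is the trade-off points (a) to (c) describe.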
With that being said, I think it's pretty f*ing stupid that many major noSQL vendors don't support joins (?) Complete de-normalization is only a dream as far as I can see it. And the lack of transactions can be a bit weird. Without transactions, only one query is atomic, you cannot make multiple queries atomic without an application level locking mechanism :/
Take-aways:
If you want non-relational persistence - why not simply de-normalize a relational database? There is nobody forcing you to use a traditional database in a relational manner.
If you use a relational DB with Node.js I recommend this ORM:
https://github.com/typeorm/typeorm
As an aside, I prefer the term "non-relational" as opposed to "noSQL".
In my experience Node tends to be popular with databases that have a stateless API, which fits very nicely with Node's async nature. Most relational databases use stateful connections for transactions, which minimizes the primary advantages of async non-blocking I/O.
Can you explain exactly what specific problems you are facing with your chosen database and node.js?
A few reasons why MongoDB could be more popular than relational databases:
MongoDB is essentially a JSON object store, so it translates very well for a JavaScript application. MongoDB functions are JavaScript functions.
I am just guessing here, but since NoSQL databases are newer and have more enthusiastic programmers experimenting with them, you probably have more involvement in those NPM modules.
Apart from this, Node.js technically is a perfect choice for any sort of database application. I have personally worked on a small Node.js/MySQL application and I didn't face any hurdles.
But back to my main point, we could talk about this all day, and that is not what this forum is for. If you have any specific issues in any code with Node.js and your database of choice, please ask those questions instead.
Edit: Strictly technical reasons, apart from the JSON compatibility on both sides: There are none.
For anyone wondering about the same question in 2021:
Node has nothing to do with the type of database you choose.
You can choose whichever database suits your requirements.
If you need to maintain a strict data structure, choose a relational database; otherwise you can go for NoSQL.
There are non-blocking NPM packages for PostgreSQL, MySQL, and other databases. These clients will not block the Node process while performing queries.

Cassandra vs Riak [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am looking for an eventually consistent data store, and it looks like it may be coming down to Riak or Cassandra. Has anyone got experience of, or a view on, this?
As you probably know, they are both architecturally strongly influenced by Dynamo (eventually consistent, no single points of failure, etc). Both also go beyond Dynamo in providing a "richer than pure K/V" data model -- in Cassandra's case, providing a Bigtable-like ColumnFamily model, in Riak's, a Document-oriented one. I have seen sane people choose both.
I believe points that favor Cassandra include
speed
support for clusters spanning multiple data centers
big names using it (digg, twitter, facebook, webex, ... -- http://n2.nabble.com/Cassandra-users-survey-tp4040068p4040393.html)
Points that favor Riak include
map/reduce support out of the box
/Cassandra dev, fwiw
Riak is used by
Mozilla Foundation
Ask.com sponsored listings
Comcast
Citigroup
Bet365
I think they both pass the test of credible reference customers/users.
Cassandra seems more mature, and is currently doing better in benchmarks. Riak seems easier to add a node to as your cluster grows.
For completeness: A good (probably biased) comparison between the two can be found at http://docs.basho.com/riak/1.3.2/references/appendices/comparisons/Riak-Compared-to-Cassandra/
Use and download are different. Best to get references.
Perhaps a private conversation could be had where Riak references in these companies could be shared? Not sure how to get such with Cassandra, but there is a community of companies that support Cassandra that seem like a good place to start. As these probably have community participants in Cassandra development, it may be a REALLY reasonable place to start.
I would like to hear Riak's answer to recent and large deployments where customers are happy.
I also would like to see the roadmap for each product. Cassandra is a bit easier to track (http://wiki.apache.org/cassandra/) than Riak in my view as Cassandra's wiki discusses limitations and things that are probably going to change going forward, but neither outline futures well. I could understand that of an open source community ... perhaps ... but I cannot for a product for which I must pay.
I also would suggest research of Cloudant, which has what appears to be a very nice layering of capabilities. It also looks like it is bringing to bear the capabilities elsewhere in Apache land. CouchDB is the Apache platform on which Cloudant is based. BUT the indexing with Lucene seems but the tip of the iceberg when it comes to where Cloudant could go. Creating and managing an index is a very systematic process, a kind of data pipeline, that could be scripted using other Apache community assets. AND capabilities like NLP also could be added through Lucene indirectly, or maybe directly into what is persisted.
It would be nice to see a proposed Cloudant roadmap, especially since the team could mine the riches of the Apache community and integrate such into Cloudant. Such probably exists as there is an operational component to the Cloudant revenue model that will require it, if for no other reason.
Another area of interest ... Cloudant's pricing model ... it is clear their revenue model is not based on software, but around service. That is quite attractive, and it seems consistent with the ecosystem surrounding Cassandra too. I don't know if the Basho folks have won over enough of the nosql community as yet ... don't see such from any buzz around their web site or product.
I like this Cloudant web page (https://cloudant.com/the-data-layer/). I was surprised to see the embedded Erlang capability ... I did not know CouchDB was written in Erlang as this seems unusual to me in the Apache community (my ignorance); CouchDB appears to be older than other nosql products I know (now) to be written in Erlang. Whatever their strategy, they at least count Amazon EC2 and Microsoft Azure as hosting partners, indicating an appreciation of Microsoft and !Microsoft worlds - all very important if properly recognizing the middleware value potential (beyond cache or hash table applications) that these types of data stores could have.
Finally, while I don't know the board well, Andy Palmer's guidance looks like it will be valuable. He can bring some guidance vis-a-vis structured data (through VoltDB) to a world that rightly or wrongly may be unfairly branded as KVP hash tables of unstructured data. The need for structure and ecosystem surrounding nosql "databases" is being recognized ... witness Google's efforts with Spanner ... KVP/little structure/need for search-ability motivated Google's investment in the Spanner space. While we all may not need something like Spanner, we probably do need an improving and robust "enterprise" management and interoperability capability in these nosql databases to make it reasonable to incorporate them into modern cloud architectures. The needed structure can come from ease of interoperability and functional richness. It can also come from new capabilities that support conversion of unstructured data to structured data (e.g. indexes, use of NLP to create structured and parsed renderings of things inside of a KVP blob, and plenty of other things that, if put into a roadmap and published, could entice and grow a user base). Cloudant looks like it has a good chance of success ... I will take a closer look at it ...
And look what I found about CouchDB ...
CouchDB comes with a suite of features, such as on-the-fly document transformation and real-time change notifications, that makes web app development a breeze. It even comes with an easy to use web administration console. You guessed it, served up directly out of CouchDB! We care a lot about distributed scaling. CouchDB is highly available and partition tolerant, but is also eventually consistent. And we care a lot about your data. CouchDB has a fault-tolerant storage engine that puts the safety of your data first.

Resources