Multi tenancy with Merb, DataMapper and CouchDB (or MongoDB) - couchdb

Does anyone know how to achieve multi-tenancy with these technologies, or have any resources about it?
Additionally, is it advisable to store sensitive data in a relational database and other kinds of data in NoSQL databases?
Thanks in advance.

Cloudant has been providing multitenancy clusters for a few years now. The technology is based on CouchDB but with a bunch of enhancements. Once you outgrow the multitenant cluster you can "hit a button" and flip over to a private cluster, all of which Cloudant will manage for you.
Feel free to reach out to me offline if you want more info or check out http://www.cloudant.com.
I'm less familiar with the hosting options for those other technologies, so I wouldn't feel good about recommending one. In full disclosure I work at Cloudant, but I often recommended them before they hired me. :)
Cheers.

Related

MongoDB Atlas database vs MongoDB local, which is best for SaaS in terms of transaction speed (Querying)

I have developed an automation web tool (a SaaS app). Right now I'm using the MongoDB Atlas cloud database with an Amazon EC2 xlarge instance (quad-core, EBS-enabled, 16 GB RAM). Which is better, Atlas or a local MongoDB, and why? Which will give me better performance? I would appreciate some serious help here.
MongoDB:
Because MongoDB is a non-relational database, it is much easier to model your data, which shortens development time. When you work with JavaScript, or with JSON objects and collections, MongoDB makes it lighter to connect services to queries and helps application performance. If you do not know the console commands, you can also manage the database graphically with a desktop administration tool. It is quick to learn and it scales well as the project grows. For a development team, this shortens delivery times to clients, which makes projects more feasible.
PROS:
Because it uses a JSON-like document format, you can build the query logic directly in the service itself, which helps query response times.
You can install a local database environment, which hosted real-time non-relational databases such as Firebase do not allow. A local environment is paramount because you can work without relying on the internet.
Creating collections in MongoDB is relatively simple. You do not need to know a query language to work with it, since a simple graphical environment lets people who are not experts in the console manage databases.
CONS:
MongoDB seems to be one of the most complete tools in its field; I believe it has all the features that a non-relational database should have.
Perhaps because it is a relatively new tool, there are very few MongoDB experts.
To Summarize:
MongoDB is best suited to large projects that need to scale. It also lets you work quite comfortably in projects based on languages and frameworks such as JavaScript, Angular, TypeScript and C#. I believe it works much better with technologies that share a similar programming model. If you use languages like Java or PHP, for example, it is better to work with relational databases like PostgreSQL or MySQL.
MongoDB-atlas:
My department at the company where I work was running a MongoDB cluster that we set up on our own servers. It reached the point where it became hard to manage and to scale. MongoDB Atlas offered the ability to scale without the management burden, which saves us a lot of effort.
PROS:
No infrastructure on our side. Free of management.
Easy to scale up and down.
It has strong authentication and encryption features that make sure that developers don't get lazy and leave out data in the open by leaving their servers unguarded.
CONS:
Billing could be more granular.
The alerting system could be more specific.
One of the drawbacks of MongoDB Atlas is the cost. Hopefully more competition will bring costs down over time.
To Summarize:
I would recommend MongoDB Atlas to every person or company that has a significant need for a NoSQL database and does not want to manage its own infrastructure. Using MongoDB Atlas can significantly reduce your management time and cost, which frees valuable resources for other tasks. It also suits a smaller company, as MongoDB Atlas scales up and down very quickly.
Hopefully I answered your question, Good Luck!

DataStax Cassandra seems expensive, is there a best practice configuration to use Apache Cassandra in Production?

DataStax seems expensive. Is there a best practice configuration that is available to use Apache Cassandra in production? I am trying to setup Cassandra on EC2.
Thanks
Instead of giving you a commercial for some other product, let me give you some practical advice on choosing between OSS and commercially licensed products.
You have two things to spend when using any technology: time or money. Ultimately time is money, but for the sake of this let's say they are different. Judging by your question, you have more time, so let's focus on that.
Spend the time to learn the fundamentals. The term black magic is FUD. Some of the world's largest workloads are running on Cassandra. You can do it too.
Seek out peers and learn from those who have been successful. There are organizations that have been running Cassandra in prod for years.
Focus on a single use case/project. Nothing worse than trying to replace all of your infrastructure with a new technology when you are learning. Pick one thing and become proficient. Use that experience for the next projects.
You can get some free training at DataStax Academy. http://academy.datastax.com
Learn from peers by watching talks from the community of awesome users.
You can find something in these 135 talks here: https://www.youtube.com/playlist?list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk
If you need to ask questions, Stack Overflow, the Cassandra mailing list, and the DataStax Academy Slack are all good resources.
Using a commercial product or spending the time is up to you, but don't let anyone try to convince you that it's too hard and you should use something else. We are all here to help if we can.
Disclaimer: I'm a ScyllaDB employee.
There are several alternatives for running Cassandra/Scylla-like workloads.
Use open source Cassandra with best practices. Most of them, unfortunately, were created a couple of years ago, so you'll need to learn the black magic of tuning the JVM and Cassandra loads.
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
There are no "official" AMIs on AWS for recent releases of Cassandra.
Use Scylla open source. It is a drop-in replacement for Cassandra. Scylla autotunes itself to minimize operator intervention in day-to-day operations. Scylla also provides open source AMIs for EC2 deployment, so all you need is an AWS account.
Scylla is a C++ implementation of Cassandra that takes full advantage of the great (and costly) resources on AWS, and thus offers a better ops/$ ratio. ScyllaDB highly recommends I3 instances: you'd be using contemporary CPU technology, excellent I/O (NVMe-based) and lots of memory at a fraction of the cost of other EC2 instances.
You can read more about it here:
http://www.scylladb.com/2017/05/10/faster-and-better-what-to-expect-running-scylla-on-aws-i3-instances/
ScyllaDB is committed to providing open source, optimized AMI versions.
Buy enterprise licenses from DataStax or Scylla.
Hire consultants to help you install a Cassandra setup.
Companies like "the last pickle" or Pythian can help you in that regard.
Use DBaaS offerings from the following companies:
Scylla:
IBM Compose: https://www.compose.com/databases/scylladb
Joyent Triton: https://www.joyent.com/blog/free-trial-managed-scylladb-beta-on-triton
Scylla and Cassandra
Instaclustr: https://www.instaclustr.com/
Hope this helps.

cassandra,solr,lucandra,solandra

I am developing a site using the following technologies:
Ruby on Rails (Ruby 1.8.7, Rails 2.3.5)
Cassandra 0.6.8
I want to index the Cassandra database using Lucandra. How do I do this?
Are there any RESTful APIs or web services available for this, so that I can push the data to the index database?
Please share any RoR example that uses Lucandra; that would really help us move forward. Otherwise, please guide me through the steps to achieve this.
I have been googling for 3 days and have not found any examples of using Lucandra with RoR.
Your help would be appreciated.
The Solandra project, which is replacing Lucandra, no longer uses Thrift, only Solr. http://github.com/tjake/Lucandra
This means you can use any of the Solr-supported gems, like acts_as_solr.
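Because Solandra speaks Solr, documents can also be pushed over plain HTTP using Solr's XML update message format, which is what the question's "RESTful API" request amounts to. Below is a minimal sketch in Python (not Ruby) that only builds the `<add>` payload. The helper name `build_solr_add` and the field names `id` and `name` are hypothetical, and the URL of the update handler depends on your installation, so it is deliberately left out.

```python
import xml.etree.ElementTree as ET


def build_solr_add(docs):
    """Build a Solr XML <add> update message from a list of dicts.

    Hypothetical helper: each dict maps a field name to a value.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(add, encoding="unicode")


# Example document; the field names are assumptions, not a real schema.
payload = build_solr_add([{"id": "user-1", "name": "Alice"}])
print(payload)
```

The resulting string would be sent as the body of an HTTP POST with a `text/xml` content type to the update handler, followed by a commit. A gem like acts_as_solr does the equivalent for you from Rails.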
I recommend Elasticsearch. It has a REST API and Ruby and Rails clients.
https://github.com/angelf/escargot
https://github.com/grantr/rubberband
Elasticsearch is the most advanced free search solution in the world today. It's based on Lucene and is highly available, fault tolerant, partitioned, high performance, scalable, state of the art, open source and simpler than Solr. Its success is due to its author, Shay Banon, who has years of experience as an architect in this field. Solr (and Solandra) is nowhere near it. Simply investigate both and you'll see for yourself.
my best
Serdar

Cassandra vs Riak [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am looking for an eventually consistent data store, and it looks like it may be coming down to Riak or Cassandra. Does anyone have experience with, or a view on, this?
As you probably know, they are both architecturally strongly influenced by Dynamo (eventually consistent, no single points of failure, etc). Both also go beyond Dynamo in providing a "richer than pure K/V" data model -- in Cassandra's case, a Bigtable-like ColumnFamily model, in Riak's, a document-oriented one. I have seen sane people choose both.
I believe points that favor Cassandra include
speed
support for clusters spanning multiple data centers
big names using it (digg, twitter, facebook, webex, ... -- http://n2.nabble.com/Cassandra-users-survey-tp4040068p4040393.html)
Points that favor Riak include
map/reduce support out of the box
/Cassandra dev, fwiw
Riak is used by
Mozilla Foundation
Ask.com sponsored listings
Comcast
Citigroup
Bet365
I think they both pass the test of credible reference customers/users.
Cassandra seems more mature, and is currently doing better in benchmarks. Riak seems easier to add a node to as your cluster grows.
For completeness: A good (probably biased) comparison between the two can be found at http://docs.basho.com/riak/1.3.2/references/appendices/comparisons/Riak-Compared-to-Cassandra/
Use and download are different. Best to get references.
Perhaps a private conversation could be had where Riak references in these companies could be shared? Not sure how to get such with Cassandra, but there is a community of companies that support Cassandra that seem like a good place to start. As these probably have community participants in Cassandra development, it may be a REALLY reasonable place to start.
I would like to hear Riak's answer to recent and large deployments where customers are happy.
I would also like to see the roadmap for each product. Cassandra is a bit easier to track (http://wiki.apache.org/cassandra/) than Riak in my view, as Cassandra's wiki discusses limitations and things that are probably going to change going forward, but neither outlines its future well. I could understand that of an open source community ... perhaps ... but I cannot for a product for which I must pay.
I would also suggest researching Cloudant, which has what appears to be a very nice layering of capabilities. It also looks like it is bringing to bear capabilities from elsewhere in Apache land. CouchDB is the Apache platform on which Cloudant is based, but the indexing with Lucene seems to be only the tip of the iceberg when it comes to where Cloudant could go. Creating and managing an index is a very systematic process, a kind of data pipeline, that could be scripted using other Apache community assets. Capabilities like NLP could also be added through Lucene indirectly, or maybe directly into what is persisted.
It would be nice to see a proposed Cloudant roadmap, especially since the team could mine the riches of the Apache community and integrate such into Cloudant. Such probably exists as there is an operational component to the Cloudant revenue model that will require it, if for no other reason.
Another area of interest ... Cloudant's pricing model ... it is clear their revenue model is not based on software, but around service. That is quite attractive, and it seems consistent with the ecosystem surrounding Cassandra too. I don't know if the Basho folks have won over enough of the nosql community as yet ... don't see such from any buzz around their web site or product.
I like this Cloudant web page (https://cloudant.com/the-data-layer/). I was surprised to see the embedded Erlang capability ... I did not know CouchDB was written in Erlang as this seems unusual to me in the Apache community (my ignorance); CouchDB appears to be older than other nosql products I know (now) to be written in Erlang. Whatever their strategy, they at least count Amazon EC2 and Microsoft Azure as hosting partners, indicating an appreciation of Microsoft and !Microsoft worlds - all very important if properly recognizing the middleware value potential (beyond cache or hash table applications) that these types of data stores could have.
Finally, while I don't know the board well, Andy Palmer's guidance looks like it will be valuable. He can bring some guidance vis-a-vis structured data (through VoltDB) to a world that rightly or wrongly may be unfairly branded as KVP hash tables of unstructured data. The need for structure and ecosystem surrounding nosql "databases" is being recognized ... witness Google's efforts with Spanner ... KVP/little structure/need for search-ability motivated Google's investment in the Spanner space. While we all may not need something like Spanner, we probably do need an improving and robust "enterprise" management and interoperability capability in these nosql databases to make it reasonable to incorporate them into modern cloud architectures. The needed structure can come from ease of interoperability and functional richness. It can also come from new capabilities that support conversion of unstructured data to structured data (e.g. indexes, use of NLP to create structured and parsed renderings of things inside of a KVP blob, and plenty of other things that, if put into a roadmap and published, could entice and grow a user base). Cloudant looks like it has a good chance of success ... I will take a closer look at it ...
And look what I found about CouchDB ...
CouchDB comes with a suite of features, such as on-the-fly document transformation and real-time change notifications, that makes web app development a breeze. It even comes with an easy to use web administration console. You guessed it, served up directly out of CouchDB! We care a lot about distributed scaling. CouchDB is highly available and partition tolerant, but is also eventually consistent. And we care a lot about your data. CouchDB has a fault-tolerant storage engine that puts the safety of your data first.

What are the best practices in building multi-tenancy applications?

What are the best practices in building applications that support multiple tenants such as Software as a Service?
Links to white papers that expand on this topic are greatly appreciated.
For the database:
A. Put everything on the same database, put a tenant_id column on your tables
Pros: Easy to do
Cons: Very prone to bugs: it's easy to leak data from one tenant to another.
B. Put everything on the same database, but put each tenant in its own namespace (postgresql calls them schemas)
Pros: Provides better data leak protection than option A
Cons: Not supported by all databases. AFAIK PostgreSQL and Oracle support it.
C. Set up one database per tenant
Pros: Absolutely no chance of data leaking from one tenant to another
Cons: Setting up new tenants is more complicated. Database connections are expensive.
I only learned the above ideas from Guy Naor. Here's a link to his presentation:
http://aac2009.confreaks.com/06-feb-2009-14-30-writing-multi-tenant-applications-in-rails-guy-naor.html
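To make the main risk of option A concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table `notes` and the class `TenantScopedNotes` are hypothetical names chosen for illustration. The idea is to route every query through one object that always adds the tenant_id filter, so that no caller can forget it.

```python
import sqlite3


class TenantScopedNotes:
    """Hypothetical data-access helper: every query is filtered by tenant_id."""

    def __init__(self, conn, tenant_id):
        self.conn = conn
        self.tenant_id = tenant_id

    def add(self, body):
        # The tenant_id comes from the helper, never from the caller.
        self.conn.execute(
            "INSERT INTO notes (tenant_id, body) VALUES (?, ?)",
            (self.tenant_id, body),
        )

    def all(self):
        rows = self.conn.execute(
            "SELECT body FROM notes WHERE tenant_id = ? ORDER BY id",
            (self.tenant_id,),
        )
        return [row[0] for row in rows]


# One shared database and one shared table (option A).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, body TEXT)"
)

acme = TenantScopedNotes(conn, tenant_id=1)
globex = TenantScopedNotes(conn, tenant_id=2)
acme.add("acme invoice")
globex.add("globex invoice")

acme_notes = acme.all()      # only tenant 1's rows
globex_notes = globex.all()  # only tenant 2's rows
```

Any query written outside such a helper and missing the WHERE clause would return every tenant's rows, which is the leak described under option A. Options B and C move that boundary into the database itself, by schema or by connection.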
You might find some valuable advice in a series of blog posts by Oren Eini.
This is one of the last posts in the series, with links to previous posts: http://ayende.com/Blog/archive/2008/08/16/Multi-Tenancy--Approaches-and-Applicability.aspx

Resources