Using Cassandra as a "schemaless NoSQL database"

I'm looking at using Cassandra for an enterprise website I'm working on, which could be used by up to 250 million users. Cassandra seems like an obvious choice because of the way it scales, although I was a little sad not to be able to use a schema-less database like Couch (for political reasons I won't go into).
I've read that you can still use Cassandra like a schema-less database, using either a super column or simply serializing objects into normal columns. At the moment I'm using .NET for my front end.
Are there any libraries out there already that help with using Cassandra in this way?
Has anyone done anything like this already using .NET? Any tips?
Any advice gratefully received!
Thanks,
Steve.

Datomic is schemaless. Attributes are modeled and generic objects can be created, saved, queried with any combination of attributes.
http://www.datomic.com
http://docs.datomic.com/storage.html#cassandra
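The "serialize objects into normal columns" idea from the question can be sketched independently of any Cassandra client. The following is a minimal illustration in Python rather than .NET; `UserProfile`, `to_columns` and `from_columns` are hypothetical names, and JSON is assumed as the serialization format. Each attribute becomes one column name with a serialized value, so new attributes need no schema change.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Dict


@dataclass
class UserProfile:
    """Hypothetical application object; not tied to any Cassandra library."""
    user_id: str
    name: str
    preferences: dict = field(default_factory=dict)


def to_columns(obj) -> Dict[str, str]:
    """Flatten a dataclass into {column name: JSON-encoded value}."""
    return {name: json.dumps(value, sort_keys=True)
            for name, value in asdict(obj).items()}


def from_columns(cls, columns: Dict[str, str]):
    """Rebuild the object from the stored column map."""
    return cls(**{name: json.loads(raw) for name, raw in columns.items()})


profile = UserProfile("u1", "Steve", {"theme": "dark"})
columns = to_columns(profile)          # what would be written to one row
restored = from_columns(UserProfile, columns)
```

The column map would then be written to a single row with whichever client library is in use; that step is deliberately left out here.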

Related

How to back up a SQLAlchemy database?

I am trying to back up a database through SQLAlchemy and save it as a file. I tried using the extension Flask-AlchemyDumps, but it appears to no longer be supported.
I must be missing something obvious, as this is surely an action a lot of developers want to do. Does anyone know how I should be backing up the database?
Thanks in advance
J Kirkman
SQLAlchemy is an ORM which sits between your code and the database. It's useful if you want to interact with specific rows and relationships without having to keep track of lots of ids and joins.
What you're looking for is a way to dump the entire contents of your DB to disk, presumably so you can restore it later or elsewhere. This is a bulk action, which is your first clue that an ORM may not be a suitable tool. (ORMs tend to be fast enough for small to medium operations, but slow and not ideal for actions which affect tens of thousands of rows at once.) Indeed, this isn't usually something you'd use an ORM for; it's a feature of your DB, presumably Postgres or MySQL, each of which ships its own dump utility (pg_dump and mysqldump respectively). If you happen to be using Heroku, you can use their command line tool to do this.
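As a rough sketch of delegating the backup to the database's own tooling, the snippet below shells out to `pg_dump`. It assumes a PostgreSQL database, `pg_dump` available on the PATH, and a plain `postgresql://` connection URL; the function names are hypothetical, not part of SQLAlchemy or Flask.

```python
import subprocess
from typing import List


def build_pg_dump_command(database_url: str, output_path: str) -> List[str]:
    """Build the pg_dump invocation (custom archive format, written to a file)."""
    return [
        "pg_dump",
        "--format=custom",           # compressed archive, restorable with pg_restore
        f"--file={output_path}",
        f"--dbname={database_url}",  # assumed to be a plain postgresql:// URL
    ]


def backup_database(database_url: str, output_path: str) -> None:
    """Run pg_dump and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pg_dump_command(database_url, output_path), check=True)
```

A call such as `backup_database("postgresql://localhost/mydb", "backup.dump")` would then produce a file that `pg_restore` can load. For MySQL the same pattern would apply with `mysqldump`.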

Mongoose Schema Design approach

I am new to NoSQL databases. I am trying to build a project and am stuck on whether to choose a SQL database or a NoSQL database for it.
The requirements of my project are as follows: a legal firm has many clients, each client can have different matter types such as Immigration, Conveyancing, Family and so on, and each matter type can have different fields, which are never constant and may well change in future.
Because of this, I thought a NoSQL database might be a good choice: it is document based, so I can add new fields to the document structure instead of dynamically adding new columns to a SQL table, which is not a good approach (at least I think so).
Can anyone please advise me, or refer me to an article that can help me decide on my approach?
To clarify my question, let me explain with an example.
For a client named xyz with matter type Immigration, I may have fields such as firstName, lastName and Dob at the moment, but later on I might have to add Dependants and their details for the same client.
For a client named def with matter type Conveyancing, I would have different fields, but those fields should also be added dynamically depending on the matter type.
Thank you in advance
Regards
Anand
In my opinion, you shouldn't consider only this feature in order to decide between NoSQL and an RDBMS.
This flexibility sounds very good, but it can be dangerous: as systems grow, things can get out of hand.
I have a system where I use MongoDB, but even so, I decided to create a schema for my collections.
I would suggest you finish modeling first, and only then conclude whether it's really necessary to use NoSQL.
I would suggest looking into PostgreSQL if you are expecting large datasets. It offers some of the advantages of NoSQL databases, such as support for key-value pairs, while also keeping the rigid data structure of SQL databases. The following articles may help you decide which approach to choose:
NoSql vs Sql
postgres vs mongodb
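The hybrid idea above (fixed columns for what every matter shares, plus a flexible document for fields that vary by matter type) can be sketched as follows. This is only an illustration: it uses SQLite from the Python standard library as a stand-in, storing the flexible part as JSON text, whereas in PostgreSQL a JSON-typed column would play that role. The table and function names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE matters (
           id INTEGER PRIMARY KEY,
           client_name TEXT NOT NULL,
           matter_type TEXT NOT NULL,
           details TEXT NOT NULL  -- JSON document with type-specific fields
       )"""
)


def add_matter(client_name: str, matter_type: str, details: dict) -> int:
    """Insert a matter; the varying fields go into the JSON column."""
    cur = conn.execute(
        "INSERT INTO matters (client_name, matter_type, details) VALUES (?, ?, ?)",
        (client_name, matter_type, json.dumps(details)),
    )
    conn.commit()
    return cur.lastrowid


def get_details(matter_id: int) -> dict:
    """Load and decode the flexible fields of one matter."""
    row = conn.execute(
        "SELECT details FROM matters WHERE id = ?", (matter_id,)
    ).fetchone()
    return json.loads(row[0])


def update_details(matter_id: int, **new_fields) -> None:
    """Add or overwrite flexible fields without altering the table."""
    details = get_details(matter_id)
    details.update(new_fields)
    conn.execute(
        "UPDATE matters SET details = ? WHERE id = ?",
        (json.dumps(details), matter_id),
    )
    conn.commit()


matter_id = add_matter("xyz", "Immigration",
                       {"firstName": "X", "lastName": "Y", "Dob": "1980-01-01"})
update_details(matter_id, Dependants=[{"firstName": "Z"}])
```

Adding `Dependants` later requires no schema change, while `client_name` and `matter_type` stay ordinary, constrained columns.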

Confusion between Thrift API and CQL

I am working on a Java web application using NoSQL (the target is Cassandra). I use Astyanax as the Cassandra client, since it is said to be the best Cassandra client for now. I have only been working with Cassandra for two weeks, so many things are still strange to me.
During my work I ran into some problems that I do not know how to overcome:
1. Is a table created from CQL like a column family created by the Thrift API? I feel they are similar, but maybe there are some differences underneath. For example:
   - A table created by a CQL command cannot be accessed by the Thrift API.
   - Thrift-based APIs cannot work with tables created by CQL, but CQL methods can access a column family created by the Thrift API!
2. Does the primary key in a table correspond to the row key in a column family?
3. In CQL I can declare a table which contains a collection (list, set or map). Can I do the same thing in the Thrift API?
4. If my application needs both of them (column families and tables), how can they work with each other?
5. I have noticed one thing: I cannot use the Thrift API to manipulate data in tables created by CQL, and vice versa. So how can I remember which table or column family was created in which way, so that I can use the correct API to process the data? For the time being, we don't have a general way to handle both of them, do we? AFAIK, the Thrift API and CQL do not share an interface, so they cannot understand each other?!
Could you please help explain these things? Thank you so much.
1. Yes. It's impossible to update the Thrift APIs to be CQL-aware without breaking existing applications. So if you use CQL you are committing to using CQL clients only, like the Java driver, and not Astyanax, Hector, et al. But this is no great sacrifice since CQL is much more usable.
2. For a simple PK (i.e., single column), yes. For a compound PK, it's a bit more complicated.
3. No. The Thrift API operates at a lower level, by design. (So you'd see the individual storage cells that make up the Map, for instance.)
4. I don't understand the question. With CQL you can do everything you could do with Thrift, but more easily.
5. Simple; don't mix the two. Stick with one or the other.
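To make the points about compound primary keys and "individual storage cells" more concrete, here is a deliberately simplified Python model of how a CQL row could look from the lower-level, Thrift-style view. It assumes the older storage layout in which the partition key acts as the row key and each cell name is a composite of the clustering values, the column name and, for a map, the map key. The function name is hypothetical, and real storage involves further details (encoded composites, row markers) that are omitted here.

```python
from typing import Any, Dict, Tuple


def to_storage_cells(
    partition_key: str,
    clustering: Tuple[Any, ...],
    columns: Dict[str, Any],
) -> Tuple[str, Dict[Tuple[Any, ...], Any]]:
    """Map one CQL row to (row key, {composite cell name: value})."""
    cells: Dict[Tuple[Any, ...], Any] = {}
    for name, value in columns.items():
        if isinstance(value, dict):
            # A CQL map becomes one cell per entry, keyed by the map key.
            for map_key, map_value in value.items():
                cells[clustering + (name, map_key)] = map_value
        else:
            cells[clustering + (name,)] = value
    return partition_key, cells


# Row of a hypothetical table with PRIMARY KEY (user_id, posted_at).
row_key, cells = to_storage_cells(
    "alice",
    ("2013-01-01",),
    {"body": "hello", "tags": {"lang": "en"}},
)
```

Under this model only the first component of a compound primary key corresponds to the Thrift row key; the remaining components end up inside the cell names.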
In my opinion, the focus is shifting towards making Cassandra look like an RDBMS with SQL queries in order to gain wider adoption.
But given the inconsistencies between work done using Hector/Astyanax (Thrift) and CQL, I think it will hurt adoption. It's almost a U-turn from Hector/Astyanax to CQL in the middle of the journey.
At least, CQL should have been planned in such a way that the Thrift API (and the high-level Java APIs on top of it) would have no problem transitioning.

NoSQL database with high read performances (write accesses are not significant)?

I'm working on a "real-time" website using Node.js. Currently, I'm using Redis because I need high performance for read access. Write access is not really significant for my use case.
However, Redis does not have a query language for searching, so I create my indexes manually and use unions, intersections and so on to find values.
I think it would be easier to use MongoDB, with its built-in query system and an ORM-like library (Mongoose, for example). The problem is that I'm not sure MongoDB is the best choice for my use case.
What is your advice about the NoSQL DB that I need? Redis? CouchDB? MongoDB? Cassandra? Something else?
To repeat: I want really good performance for reads and searches (writes are not significant), and the simplest possible setup (ORM-like library, built-in query system, etc.).
Thanks.
I believe that Redis would be the better solution, for the following reasons.
You require fast read access, and Redis is one of the fastest solutions, if not the fastest, since the data is held in memory.
Although MongoDB is easier to query in the general case, your problem domain is narrow, and once you decide how you would like to query the data, you can put the correct data structures and indexes in place.
I would say that Redis is a good fit for your DB, and you should look at something like Solr or Elasticsearch to provide your searching.
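The manual indexing described in the question (one set per searchable value, combined with intersections and unions) can be illustrated without a Redis server. The sketch below uses plain Python sets to mimic what the Redis commands SADD, SINTER and SUNION do; the key naming scheme `field:value` and the function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Set

# One set of item ids per "field:value" key, like one Redis set per key.
index: Dict[str, Set[str]] = defaultdict(set)


def index_item(item_id: str, attributes: Dict[str, str]) -> None:
    """Add the item id to one set per attribute (the role of SADD)."""
    for field, value in attributes.items():
        index[f"{field}:{value}"].add(item_id)


def find_all(*keys: str) -> Set[str]:
    """Items matching every key (the role of SINTER)."""
    sets = [index.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()


def find_any(*keys: str) -> Set[str]:
    """Items matching at least one key (the role of SUNION)."""
    return set().union(*(index.get(key, set()) for key in keys))


index_item("item:1", {"category": "sports", "country": "fr"})
index_item("item:2", {"category": "sports", "country": "us"})
index_item("item:3", {"category": "music", "country": "fr"})

sports_in_france = find_all("category:sports", "country:fr")
music_or_us = find_any("category:music", "country:us")
```

With a real Redis client the same structure would apply, with the sets living in Redis and the set operations executed server-side.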
CouchDB will do better in a write-heavy environment, though I don't use it myself.
MongoDB will do better in a read-heavy environment.
For search and indexing:
MongoDB would require a separate index for each of your search criteria for better performance (at least this is what I remember).
Proper indexes are important in MongoDB. And there are no joins!
Here are some links you might go through:
http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
http://www.snailinaturtleneck.com/blog/2009/06/29/couchdb-vs-mongodb-benchmark/
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
I hope these help you find the right DB.
Good luck!

Querying with Redis?

I've been learning Node.js, so I decided to make a simple ad network, but I can't seem to decide on a database to use. I've been messing around with Redis, but I can't find a way to query the database by specific criteria; I can only get the value of a key, or a list or set inside a key.
Am I missing something, or should I be using a more robust database like MongoDB?
I would recommend reading this tutorial about Redis in order to understand its concepts and data types. I also had trouble understanding why there is no querying support similar to other (No)SQL databases, until I read a few articles and tried testing and comparing Redis with other solutions. It may not be the right database for your use case: it is very fast and supports advanced data structures, but it lacks the querying that is crucial for you. If you are looking for a database that lets you query your data, you should try MongoDB or maybe Riak.
Redis is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
If you can (that is, if it is easy to implement), you should use these primitives (strings, hashes, lists, sets and sorted sets). The main advantage of Redis is that it is lightning fast, but it is a rather primitive key-value store (although a little more advanced than a plain one). This also means that it cannot be queried the way, for example, an SQL database can.
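As an illustration of answering a "query" with one of these primitives, the sketch below mimics a sorted set in plain Python: members are kept ordered by score, and a range lookup plays the role of the Redis command ZRANGEBYSCORE. The class and its method names are hypothetical, and unlike a real sorted set it does not handle re-adding an existing member with a new score.

```python
import bisect
from typing import List, Tuple


class SortedSetIndex:
    """Tiny in-memory stand-in for a Redis sorted set."""

    def __init__(self) -> None:
        self._entries: List[Tuple[float, str]] = []  # kept sorted by (score, member)

    def add(self, member: str, score: float) -> None:
        """Insert a member at its score (the role of ZADD for new members)."""
        bisect.insort(self._entries, (score, member))

    def range_by_score(self, low: float, high: float) -> List[str]:
        """Members with low <= score <= high, in ascending score order."""
        scores = [score for score, _ in self._entries]
        start = bisect.bisect_left(scores, low)
        end = bisect.bisect_right(scores, high)
        return [member for _, member in self._entries[start:end]]


# Hypothetical ads indexed by bid price.
ads_by_bid = SortedSetIndex()
ads_by_bid.add("ad:1", 0.5)
ads_by_bid.add("ad:2", 1.5)
ads_by_bid.add("ad:3", 2.0)
ads_by_bid.add("ad:4", 3.0)

mid_range_ads = ads_by_bid.range_by_score(1.0, 2.0)
```

The point is that each kind of question you want to ask needs its own structure maintained on write, which is the work a database with a query language does for you.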
It would probably be easier to use a more advanced store such as MongoDB, which is a document-oriented database. The trade-off you make in this case is performance, but I believe you should only tackle that once it becomes a problem, which it probably will not, because MongoDB is also pretty fast and has the advantage that it can be queried. Since you read more than you write, I think it would be advisable to have proper indexes for your queries to keep them fast.
I think the main answer comes from the data structure. Check this article about NoSQL data modelling, which I found very helpful: NoSql Data Modelling.
A second good article about data modelling, which compares SQL and NoSQL, is the following: The Relational model anti pattern.

Resources