I have a few questions regarding the integration of the two tools. They are not technical questions about setup (I will have my fun with that later) but about the course and direction of the project, seeing that JanusGraph is still very young.
I am starting a new project and have already decided to use Cassandra for storage, and using a graph on top sounds very appealing to me.
There are a couple of things that I would like to know in advance before I take that road.
JanusGraph is very young and it picks up from where Titan left off about a year or so ago. There is a gap there, but the fact that it is part of the Linux Foundation and all the big players are going to support it sounds promising. Is it safe to assume at this point that JanusGraph is here to stay? Would it be safe to depend on Janus as a startup project, following development, of course, and staying as up to date as possible?
Cassandra. Titan/JanusGraph integrates with Cassandra 2.1.9 using the Thrift API, which will be deprecated eventually in Cassandra 4. I know that work is being done at the moment to make Janus work with Cassandra 3 and eventually with CQL as well. Is it safe to start with the existing Janus and Cassandra 2.1.9 and deal with the migration later on? Will it be a huge task for a startup to handle?
Production-ready JanusGraph. (This question relates to any kind of software in its early stages and whether it's safe for a startup to use.) As I understand it, it will take some time for JanusGraph to be production ready and catch up with the rest of the tools it integrates with (although work is being done as we speak :)). Again, would it be safe to start using Janus at this point, follow development, and finally migrate to a production-ready version? What is the overall roadmap for JanusGraph?
My concern in general is whether the combination of the tools is a safe choice for a startup. The whole stack is already new to us and we are excited to try and learn, but we will hit a migration period pretty quickly. Is it something that you would do or recommend? Is it suicide?
Please share your thoughts and keep in mind that it doesn't have to be about the stack I am talking about. It could be any startup company dealing with any kind of software in its early stages.
Cheers
Full disclosure, I'm a developer for JanusGraph on Compose.
It's as safe as any other OSS project with a large number of backers. Everyone could jump on some new toy tomorrow, but I doubt it. Companies are putting money into it and the development community is very active.
There is a CQL backend for Janus that's compatible with the Thrift data model. Migration to CQL should be simple and pretty painless when 0.2.0 is released.
I know there are already people using Titan for production applications. With JanusGraph being forked from Titan, I think it's pretty reasonable to start with JanusGraph, from everything I've seen. As for a roadmap, I'd check the JanusGraph mailing lists (dev/users) to see what's going on and what's being talked about.
Disclosure: I am one of the co-founders of the JanusGraph project; I am also seeking out and adding production users to our GitHub repo and website, so I may be slightly biased. :)
Regarding your questions:
Is it safe to use?
The project is young, but it is built on a foundation of Titan, a very popular graph database that's been around since 2012 and has already been running in production. We have contributors from a number of well-known companies, and several companies are building their business-critical applications directly on JanusGraph, e.g.,
GRAKN.AI is building their knowledge graph on JanusGraph
IBM's Compose.io has built a managed JanusGraph service
Uber is already running JanusGraph in production (having previously run Titan)
several other companies run JanusGraph as a core part of their production environment
We are also starting to identify companies who will provide consulting services around JanusGraph in case someone needs production-level support for their own self-managed deployments.
So as you can see, there is significant interest in and support for this project.
Cassandra upgrade
@pantalohnes answered this question; I won't repeat it here.
Production readiness
As I linked above (GitHub repo and website), we already have production users of JanusGraph which you can find there. Those are just the companies that are publicly willing to lend their name/logo to the project; I'm sure there are more. Also, Titan has been running in many production environments for several years; JanusGraph is a more up-to-date version of Titan, despite the low version number.
I am also speaking with other companies who are planning to migrate to JanusGraph soon; look for announcements via the @JanusGraph Twitter handle to learn about more production deployments.
Related
DataStax seems expensive. Is there a best-practice configuration available for running Apache Cassandra in production? I am trying to set up Cassandra on EC2.
Thanks
Instead of giving you a commercial for some other product, let me give you some practical advice on choosing between OSS and commercially licensed products.
You have two things to spend when using any technology. Time or money. Ultimately time is money, but for the sake of this let's say they are different. By your question, you have more time so let's focus on that.
Spend the time to learn the fundamentals. The term black magic is FUD. Some of the world's largest workloads are running on Cassandra. You can do it too.
Seek out peers and learn from those who have been successful. There are organizations that have been running Cassandra in prod for years.
Focus on a single use case/project. Nothing worse than trying to replace all of your infrastructure with a new technology when you are learning. Pick one thing and become proficient. Use that experience for the next projects.
You can get some free training at DataStax Academy. http://academy.datastax.com
Learn from peers by watching talks from the community of awesome users.
You can find something in these 135 talks here: https://www.youtube.com/playlist?list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk
If you need to ask questions, Stack Overflow, the Cassandra mailing list, and the DataStax Academy Slack are all good resources.
Using a commercial product or spending the time is up to you, but don't let anyone try to convince you that it's too hard and you should use something else. We are all here to help if we can.
Disclaimer: I'm a ScyllaDB employee.
There are several alternatives for operating Cassandra/Scylla-like workloads.
Use open-source Cassandra, with best practices. Most of them, unfortunately, were created a couple of years ago, so you'll need to learn the black magic of tuning the JVM and Cassandra loads.
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
There are no "official" AMIs on AWS for recent releases of Cassandra.
Use Scylla Open Source. It is a drop-in replacement for Cassandra. Scylla autotunes itself to minimize operator intervention in day-to-day operations. Scylla also provides open-source AMIs for EC2 deployment, so all you need is an AWS account.
Scylla is a C++ implementation of Cassandra that benefits from the great (and costly) resources on AWS and thus offers a better ops/$ ratio. Scylla highly recommends I3 instances: you'd be using contemporary CPU technology, excellent I/O (NVMe based) and lots of memory at a fraction of the cost of other EC2 instances.
You can read more about it here:
http://www.scylladb.com/2017/05/10/faster-and-better-what-to-expect-running-scylla-on-aws-i3-instances/
ScyllaDB is committed to providing open-source, optimized AMI versions.
Buy enterprise licenses from DataStax or Scylla.
Hire consultants to help you install a Cassandra setup.
Companies like The Last Pickle or Pythian can help you in that regard.
Use DBaaS offerings from the following companies:
Scylla:
IBM Compose: https://www.compose.com/databases/scylladb
Joyent Triton: https://www.joyent.com/blog/free-trial-managed-scylladb-beta-on-triton
Scylla and Cassandra:
Instaclustr: https://www.instaclustr.com/
Hope this helps.
In my organisation we are planning to use Cassandra, and we are currently running some experimental tests against a custom configuration to find the better, more stable version of Cassandra. We are using the DataStax drivers.
Our tests run INSERT INTO and SELECT * FROM CQL statements in a very tight loop under high load, around 10K qps.
Does anyone have experience with which Cassandra version is better and more stable, and which driver should be used?
Thanks in advance.
You cannot go wrong with the latest 2.0 release (2.0.9). You can get that version from either the Apache Cassandra project or DataStax. The Apache Cassandra download page also has links for the latest release candidates (RC5 is the latest) of 2.1, but those are still in development, so consider that before installing them.
As for the driver, there are drivers available for more than a dozen languages. Chances are that you probably know or use one of them. There is no one driver (at least that I am aware of) that significantly out-performs all of the others. So pick the driver for the language that either:
You have the most thorough knowledge of.
Complies with the usage standards of your team.
For instance, you could make an argument for using Java. After all, Cassandra is written in Java and all of the examples on the original DataStax Academy are done with the Java CQL Driver. But that argument loses ground quickly if you have never done Java before, or if your team is a .NET shop and there's nobody else who understands Java. InfoWorld's Andrew Oliver put it best when he wrote:
The lesson to be learned here is: Don't solve a simple problem with a
completely unfamiliar technology and apply it to use cases it isn't
especially appropriate for.
Again, you cannot go wrong with using a "DataStax Supported Driver" from their downloads page.
“You should not deploy a Cassandra version X.Y.Z to production where Z <= 5.”
Source:
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
Hence go with 2.0.x. Currently it's 2.0.10.
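The rule of thumb quoted above can be turned into a quick check. This is only a minimal sketch in Python, assuming plain X.Y.Z version strings. The function name `is_production_candidate` is hypothetical, and treating anything with a suffix such as `-rc5` as "not ready" is my own assumption rather than part of the quoted rule.

```python
import re

# Matches a plain "X.Y.Z" release string; anything with a suffix (e.g. "-rc5") is rejected.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def is_production_candidate(version: str) -> bool:
    """Apply the quoted rule of thumb: avoid X.Y.Z in production while Z <= 5."""
    match = _VERSION.match(version.strip())
    if match is None:
        # Assumption: release candidates and malformed strings are not candidates.
        return False
    patch = int(match.group(3))
    return patch > 5


if __name__ == "__main__":
    for candidate in ("2.0.9", "2.0.10", "2.1.0", "2.1.0-rc5"):
        print(candidate, is_production_candidate(candidate))
```

By this rule the 2.0.9 and 2.0.10 releases mentioned in this thread pass, while a fresh 2.1.0 or a 2.1 release candidate would not.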
I have been using CouchDB on some prototype applications and it has been brilliant, very easy to use and extremely quick. I was wondering if anyone has been using it in production and has any views on its reliability, performance, and suitability for operational management. I am considering using it to support a service layer and would make use of its replication functionality.
Any comments/experiences would be most welcome.
I've used CouchDB for a few small in-house applications - it's been very stable and I've had no serious complaints. Setting that aside, a few small gripes -
1) Databases can be synchronized, but not nodes. That is, if you have four servers and twenty databases, you have to specify each server, and each database to synchronize. A minor gripe, but I prefer less management to more.
2) Since databases are append only, a database with a bunch of activity gets really big really quickly. Compacting fixes this, but isn't exactly fast, especially on a big (e.g. 20-gigabyte) database. Scheduling compaction for the weekends solved this, but doing that is probably less of an option for high-availability applications.
3) Javascript is the de facto view language. What is not well advertised is that since CouchDB is written in Erlang, it also supports Erlang views, which are faster as they are "native". For applications doing a lot of operations in views, Erlang probably makes more sense.
Setting those minor issues aside, I'd wholeheartedly recommend it.
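To make gripe 1) concrete, here is a rough sketch of how many replication requests the four-server, twenty-database example implies. It only builds the JSON bodies that would be POSTed to CouchDB's `/_replicate` endpoint (with its `source`, `target` and `continuous` fields) and sends nothing. The host names, database names and the helper `replication_jobs` are hypothetical, and the full mesh (every server pushing to every other server) is an assumption; a ring or hub topology would need fewer jobs.

```python
import json
from itertools import permutations


def replication_jobs(servers, databases, continuous=True):
    """Yield (endpoint, body) pairs, one per ordered server pair and database.

    Each body is what would be POSTed to the source server's /_replicate endpoint.
    """
    for source, target in permutations(servers, 2):
        for db in databases:
            body = {
                "source": f"{source}/{db}",  # database on the pushing server
                "target": f"{target}/{db}",  # same-named database on the receiving server
                "continuous": continuous,
            }
            yield f"{source}/_replicate", body


if __name__ == "__main__":
    # Hypothetical hosts and database names, for illustration only.
    servers = [f"http://couch{i}.example.com:5984" for i in range(1, 5)]
    databases = [f"db{n:02d}" for n in range(1, 21)]

    jobs = list(replication_jobs(servers, databases))
    print(len(jobs), "replication requests for a full mesh")
    endpoint, body = jobs[0]
    print("POST", endpoint)
    print(json.dumps(body, indent=2))
```

Under these assumptions that is 12 ordered server pairs times 20 databases, so 240 separate requests, which is the management overhead the gripe is about.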
CouchDB ships in Ubuntu and is a fundamental component of the Ubuntu One service.
Is the latest MongoDB native driver mature enough to use with, for instance, GridFS in a production environment, or as a specification in a large project?
Referring to http://mongodb.github.com/node-mongodb-native
I would like to consider the rapidly changing conventions, as opposed to the maturity of the technology. In short, is it safe to select a version as the specification for a high-profile production environment?
My limited experience with the technology does not allow me to determine whether it would be safe to use in a locked-down specification scenario, or even a version lockdown in the style of long-term support à la Ubuntu, where fixes and security patches are OK but version changes are not.
Yes. This driver is mature enough to use in production. It is being used in many high profile Node.js deployments already and supports a feature set on par with existing MongoDB drivers. It is also put through the same testing as other MongoDB drivers and performs sufficiently well.
On the MongoDB side there should not be any concern about rapidly changing conventions. The API has shown stability over the past few releases and hasn't introduced any breaking changes through many releases.
Are you really sure that you want to use young technology in the kind of setting you are describing? It requires a lot of maturity for a project to start doing long term support of older versions.
Also, in the open source world you rarely see the project itself providing any kind of long-term support. Instead you have companies like Canonical and Red Hat backporting patches to their specific versions of, e.g., MySQL. 10gen is the company behind MongoDB and mongodb-native, and they would be the right ones to ask about long-term support.
My experience with mongodb-native is that it is a very rapidly improving project and you really need to keep up with what is going on. I would not like to support anything where the mongodb-native version is set in stone for the next n years.
Having said that, MongoDB, Node.js, and mongodb-native are certainly production ready if you are prepared to stay abreast of their rapid development.
I am looking for an eventually consistent data store and it looks like it may be coming down to Riak or Cassandra. Has anyone got experience of, or a view on, this?
As you probably know, they are both architecturally strongly influenced by Dynamo (eventually consistent, no single points of failure, etc). Both also go beyond Dynamo in providing a "richer than pure K/V" data model -- in Cassandra's case, providing a Bigtable-like ColumnFamily model, in Riak's, a Document-oriented one. I have seen sane people choose both.
I believe points that favor Cassandra include
speed
support for clusters spanning multiple data centers
big names using it (digg, twitter, facebook, webex, ... -- http://n2.nabble.com/Cassandra-users-survey-tp4040068p4040393.html)
Points that favor Riak include
map/reduce support out of the box
/Cassandra dev, fwiw
Riak is used by
Mozilla Foundation
Ask.com sponsored listings
Comcast
Citigroup
Bet365
I think they both pass the test of credible reference customers/users.
Cassandra seems more mature, and is currently doing better in benchmarks. Riak seems easier to add a node to as your cluster grows.
For completeness: A good (probably biased) comparison between the two can be found at http://docs.basho.com/riak/1.3.2/references/appendices/comparisons/Riak-Compared-to-Cassandra/
Use and download are different. Best to get references.
Perhaps a private conversation could be had where Riak references in these companies could be shared? Not sure how to get such with Cassandra, but there is a community of companies that support Cassandra that seem like a good place to start. As these probably have community participants in Cassandra development, it may be a REALLY reasonable place to start.
I would like to hear Riak's answer to recent and large deployments where customers are happy.
I also would like to see the roadmap for each product. Cassandra is a bit easier to track (http://wiki.apache.org/cassandra/) than Riak in my view, as Cassandra's wiki discusses limitations and things that are probably going to change going forward, but neither outlines its future well. I could understand that of an open source community ... perhaps ... but I cannot for a product for which I must pay.
I also would suggest research of Cloudant, which has what appears to be a very nice layering of capabilities. It also looks like it is bringing to bear the capabilities elsewhere in Apache land. CouchDB is the Apache platform on which Cloudant is based. BUT the indexing with Lucene seems but the tip of the iceberg when it comes to where Cloudant could go. Creating and managing an index is a very systematic process, a kind of data pipeline, that could be scripted using other Apache community assets. AND capabilities like NLP also could be added through Lucene indirectly, or maybe directly into what is persisted.
It would be nice to see a proposed Cloudant roadmap, especially since the team could mine the riches of the Apache community and integrate such into Cloudant. Such probably exists as there is an operational component to the Cloudant revenue model that will require it, if for no other reason.
Another area of interest ... Cloudant's pricing model ... it is clear their revenue model is not based on software, but around service. That is quite attractive, and it seems consistent with the ecosystem surrounding Cassandra too. I don't know if the Basho folks have won over enough of the nosql community as yet ... don't see such from any buzz around their web site or product.
I like this Cloudant web page (https://cloudant.com/the-data-layer/). I was surprised to see the embedded Erlang capability ... I did not know CouchDB was written in Erlang as this seems unusual to me in the Apache community (my ignorance); CouchDB appears to be older than other nosql products I know (now) to be written in Erlang. Whatever their strategy, they at least count Amazon EC2 and Microsoft Azure as hosting partners, indicating an appreciation of Microsoft and !Microsoft worlds - all very important if properly recognizing the middleware value potential (beyond cache or hash table applications) that these types of data stores could have.
Finally, while I don't know the board well, Andy Palmer's guidance looks like it will be valuable. He can bring some guidance vis-a-vis structured data (through VoltDB) to a world that rightly or wrongly may be unfairly branded as KVP hash tables of unstructured data. The need for structure and ecosystem surrounding nosql "databases" is being recognized ... witness Google's efforts with Spanner ... KVP/little structure/need for search-ability motivated Google's investment in the Spanner space. While we all may not need something like Spanner, we probably do need an improving and robust "enterprise" management and interoperability capability in these nosql databases to make it reasonable to incorporate them into modern cloud architectures. The needed structure can come from ease of interoperability and functional richness. It can also come from new capabilities that support conversion of unstructured data to structured data (e.g. indexes, use of NLP to create structured and parsed renderings of things inside of a KVP blob, and plenty of other things that, if put into a roadmap and published, could entice and grow a user base). Cloudant looks like it has a good chance of success ... I will take a closer look at it ...
And look what I found about CouchDB ...
CouchDB comes with a suite of features, such as on-the-fly document transformation and real-time change notifications, that makes web app development a breeze. It even comes with an easy to use web administration console. You guessed it, served up directly out of CouchDB! We care a lot about distributed scaling. CouchDB is highly available and partition tolerant, but is also eventually consistent. And we care a lot about your data. CouchDB has a fault-tolerant storage engine that puts the safety of your data first.