Connecting open-source Cassandra to Tableau Desktop - cassandra

I am trying to use the Business Intelligence (BI) software Tableau Desktop to see into a local Cassandra cluster. The Cassandra cluster is the open-source version and not the proprietary version that one pays to use. The version of Cassandra I am using is 2.2.x.
I can successfully connect Cassandra and Tableau after configuring the 64-bit ODBC driver. However, actually querying the tables in the NoSQL database throws errors. For instance, in the 'Data Source' view, selecting 'Update Now' results in an error from a SQL statement that starts with SELECT 1... I do not think Cassandra can parse or process SELECT 1 statements.
Errors are also thrown when trying to build graphs of the data as this also results in failed queries.
In the 'Advanced Options' for the ODBC driver I selected CQL as the 'Query Mode', but there were still problems with the queries Tableau was sending to Cassandra.
Does anyone know how to get these two technologies to work together? I found this tutorial, but it was made almost a year ago, as of this writing, and in my experience it does not work. Please see the link to what I am talking about here: http://www.datastax.com/dev/blog/datastax-odbc-cql-connector-apache-cassandra-datastax-enterprise In this article they say to download the driver from here: https://academy.datastax.com/downloads/download-drivers?dxt=DX I am wondering whether this specific version of the ODBC driver is the problem.
I also read a previous post on this, and it was not helpful, as in my experience it is also obsolete. The post I am referring to is "Connecting cassandra to Tableau Software". The first answer is probably the obsolete one, but the second one recommends using the Simba driver, which is some type of proprietary driver. My current hypothesis is that the Simba driver is needed to use Tableau and Cassandra together.
Thank you for reading this.

DataStax licenses Simba Technologies' ODBC driver, but the version on their website may be behind the latest version available from Simba. Please download a free evaluation version of the driver and see if you have the same issue: http://www.simba.com/drivers/cassandra-odbc-jdbc/
'SELECT 1' is not a valid CQL query (http://docs.datastax.com/en/cql/3.1/cql/cql_reference/select_r.html).
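To make the point concrete: CQL's SELECT grammar requires a FROM clause, which is why a bare `SELECT 1` probe is rejected. Below is a minimal Python sketch, not part of any driver, that flags such statements. The helper name `looks_like_cql_select` is hypothetical, and the probe query against `system.local` is only an assumption about what a client could send instead.

```python
import re

# Hypothetical helper (not from any driver): CQL's SELECT grammar requires
# a FROM clause, so a bare "SELECT 1" probe cannot be parsed by Cassandra.
_SELECT_WITH_FROM = re.compile(
    r"^\s*SELECT\b.+?\bFROM\b\s+\S+", re.IGNORECASE | re.DOTALL
)


def looks_like_cql_select(statement: str) -> bool:
    """Return True if the statement has the SELECT ... FROM <table> shape."""
    return bool(_SELECT_WITH_FROM.match(statement))


# Assumption: a probe that CQL accepts, reading one column from a system table.
CQL_PROBE = "SELECT release_version FROM system.local"

if __name__ == "__main__":
    for stmt in ("SELECT 1", CQL_PROBE):
        print(stmt, "->", looks_like_cql_select(stmt))
```

This is only a shape check; it does not validate the rest of the statement.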

Related

Datastax Studio License

I was looking for a visualization client for Cassandra on Mac when I found DataStax Studio, which can be set up locally to view Cassandra metadata and tables visually. I downloaded DataStax Studio using the link below:
https://academy.datastax.com/quick-downloads
I looked at a few links but couldn't understand the licensing.
Can someone please confirm whether it is free to use in a non-production environment within an organization?
Thanks!
The DataStax Studio License Terms can be found here https://www.datastax.com/terms/datastax-studio-license-terms.
I won't try to interpret or summarize, but I will highlight this portion:
...only in conjunction with the use of other DataStax commercial software on a trial basis, or for which you have an active, paid subscription.
DataStax Studio will only work with DataStax DSE clusters; connections to OSS Cassandra are not supported. For more details beyond that, I'd recommend asking through DataStax Support.

How to generate the query to clone an existing table

I use the community edition of Cassandra (not DSE). Earlier I used a tool called DevCenter. When I clicked "clone" in DevCenter on an existing table, it gave me the CQL necessary to re-create that table.
Now, on my new macOS (High Sierra) machine, the DevCenter tool is broken. I searched the forums and found that DataStax has no intention of maintaining the tool.
So I am now using the command-line CQLSH. In CQLSH, given an existing table, how do I generate the query to clone it?
I only need the table structure. I don't need any data.
I cannot log in to the Cassandra server directly, so everything needs to be done by connecting to Cassandra remotely using CQLSH.
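The DESCRIBE command should help you: in CQLSH, DESCRIBE TABLE followed by the keyspace-qualified table name prints the CQL needed to re-create the table, without any data.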
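As a rough illustration of the DESCRIBE approach, here is a Python sketch that builds a remote `cqlsh` invocation using its `-e` (execute) option and rewrites the resulting CREATE TABLE statement to a new name. The function names, the host and the table names are hypothetical, and port 9042 is assumed to be the native-protocol port of the cluster.

```python
import re


def describe_command(host, keyspace, table, port=9042):
    """Build the cqlsh argv that prints the table's CREATE TABLE statement.

    Assumption: cqlsh is on PATH and the cluster listens on `port`.
    """
    return ["cqlsh", host, str(port), "-e", f"DESCRIBE TABLE {keyspace}.{table}"]


def rename_table(create_stmt, new_name):
    """Point a CREATE TABLE statement at a new, keyspace-qualified name."""
    pattern = r'(CREATE TABLE\s+)(?:IF NOT EXISTS\s+)?[\w."]+'
    # A function replacement avoids backslash surprises in new_name.
    return re.sub(
        pattern,
        lambda m: m.group(1) + new_name,
        create_stmt,
        count=1,
        flags=re.IGNORECASE,
    )


if __name__ == "__main__":
    # Hypothetical host and table names, for illustration only.
    print(" ".join(describe_command("10.0.0.5", "shop", "orders")))
    ddl = "CREATE TABLE shop.orders (id uuid PRIMARY KEY)"
    print(rename_table(ddl, "shop.orders_copy"))
```

The argv list could be passed to `subprocess.run(..., capture_output=True, text=True)` to capture the DDL; the renamed statement can then be executed through another `cqlsh -e` call.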

Possibilities of Hadoop with MSSQL Reporting

I have been evaluating Hadoop on Azure HDInsight to find a big data solution for our reporting application. The key part of this technology evaluation is that I need to integrate with MSSQL Reporting Services, as that is what our application already uses. We are very short on developer resources, so the more I can make this into an engineering exercise the better. What I have tried so far:
Use an ODBC connection from MSSQL mapped to Hive on HDInsight.
Use an ODBC connection from MSSQL using HBase on HDInsight.
Use Spark SQL locally over Remote Desktop on the Azure HDInsight cluster.
What I have found is that HBase and Hive are far slower to use with our reports. For test data I used a table with 60k rows and found that the report on MSSQL ran in less than 10 seconds. I ran the query on the Hive query console and over the ODBC connection and found that it took over a minute to execute. Spark was faster (30 seconds), but there is no way to connect to it externally since ports cannot be opened on the HDInsight cluster.
Big data and Hadoop are all new to me. My question is: am I looking for Hadoop to do something it is not designed to do, and are there ways to make this faster? I have considered caching results and periodically refreshing them, but it sounds like a management nightmare. Kylin looks promising, but we are pretty married to Windows Azure, so I am not sure that is a viable solution.
Look at this documentation on optimizing Hive queries: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-optimize-hive-query/
Specifically look at ORC and using Tez. I would create a cluster that has Tez on by default and then store your data in ORC format. Your queries should be much more performant then.
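A minimal sketch of what that could look like, written as a Python helper that emits the HiveQL to run. The table names are hypothetical; the statements themselves (`SET hive.execution.engine=tez;` and a CREATE TABLE ... STORED AS ORC AS SELECT) are standard Hive, but whether they help depends on the cluster and the data.

```python
def orc_migration_statements(source_table, orc_table):
    """Return HiveQL that switches the session to Tez and copies a table to ORC.

    Both table names are placeholders supplied by the caller.
    """
    return [
        # Use Tez instead of MapReduce for this session.
        "SET hive.execution.engine=tez;",
        # Create an ORC-backed copy of the source table (CTAS).
        f"CREATE TABLE {orc_table} STORED AS ORC AS SELECT * FROM {source_table};",
    ]


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    for stmt in orc_migration_statements("report_rows", "report_rows_orc"):
        print(stmt)
```

The report query would then be pointed at the ORC copy rather than the original table.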
If going through Spark is fast enough, you should consider using the Microsoft Spark ODBC driver. I am using it and the performance is not comparable to what you'll get with MSSQL, other RDBMS or something like ElasticSearch but it does work pretty reliably.

Apache Cassandra vs Datastax Cassandra [closed]

Closed 6 years ago.
Is DataStax Cassandra the only available Cassandra that can be used in a production environment? Are there any free alternatives available? What about the Cassandra available on the Apache site?
DataStax Community Edition is also free; it contains a basic version of OpsCenter: http://planetcassandra.org/cassandra/
Here is the difference between the community edition and DSE:
http://www.datastax.com/download/dse-vs-dsc
They can both be used in production. DataStax Enterprise comes with a bunch of extra features on top of Apache Cassandra, and also comes with support.
DataStax is a commercial company that supports C*. The base source code of Cassandra is taken from the Apache repositories, and then some of their own code is merged in. Besides this, as already mentioned by others, the DataStax version comes with some additional tools for maintaining a Cassandra cluster.
One of the benefits of DataStax Enterprise is its seamless integration with Solr, another great Apache Foundation project.
Cassandra comes with a query language called CQL (Cassandra Query Language), which is "similar" to SQL. You should, however, think of CQL as a cousin of SQL, not a brother.
One of the great features of the Enterprise edition is that you can query a Solr index through their CQL integration. A Cassandra cluster also shares its resources with Solr, so you don't need a second cluster for Solr.
You could set up either Apache or DataStax Cassandra and get almost the same thing. But if you need something similar to the SQL LIKE statement (not natively available in Cassandra), or you have a heavily denormalized database and need search capabilities, then DataStax Enterprise (DSE) is your only viable choice.
As someone already has mentioned, DSE is free for startups until they reach an annual revenue of 3m USD, or are funded with 30m. This should give everybody the opportunity to leverage the power of NoSQL and use one of the most reliable databases for big data out there.
For the Cassandra product, you can use the Apache open source offering in production, if your organisation is comfortable with open source.
You can also use the Datastax Community version of Cassandra, which is also open source and free to deploy; that gives you a bit more assurance from DataStax who offer commercial support.
Then there is DataStax Enterprise, which is the version that you pay to use, with a support model included. This still uses open source Cassandra, with additional code from DataStax. They have also put this release through their internal test processes, so that they are happy to support it. That generally means the releases will lag the Apache and Community versions, if that matters to you.
The DataStax 'Dev Center' product is a GUI tool that allows you to enter CQL commands against a Cassandra installation - it is free to use against any release. You may find it useful, though the CQLSH command-line should offer much of what you may need (and Cassandra CLI).
The DataStax 'Ops Center' product is available in a free version, which can run against any Cassandra with the associated 'DataStax Agent' used to collect data from each node. The Enterprise version of Ops Center includes additional functionality; that is available if you purchase the fully supported DSE (DataStax Enterprise) stack.
Hope that helps. Much more information available at Planet Cassandra and the DataStax web sites.
Besides Apache Cassandra, there's Scylla, which is a drop-in replacement for Cassandra written in C++. It claims to be 10 times faster than Apache Cassandra. However, Scylla is still in alpha, and you should stay away from it in a production environment.
Scylla aims to support all Cassandra features together with the tooling. It also supports JMX monitoring.
Apache Cassandra has the same features as the DataStax Community Edition, so you can run Apache Cassandra in a production environment.
Another good feature of DSE is the ability to do backup and recovery of your Cassandra database which I would think is very important if you are planning to use this in a production setup.

Differences between Hector Cassandra and JDBC

I'm currently starting a project that uses Apache Cassandra, so I'm interested in accessing my Cassandra database from Java. For that, I'm using Hector. However, I have some doubts about the differences between access via Hector and via JDBC for Cassandra (specifically this: https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/).
I believe the following (although I'm not sure if I'm right):
One difference could be that they are APIs at different levels (I consider Hector to be a higher-level API than JDBC for Cassandra).
JDBC for Cassandra uses CQL for accessing and modifying the database, while Hector doesn't use CQL (it only uses the methods it provides for that).
I'd be thankful if someone could tell me whether I'm right or wrong in the previous lines, and point out further differences between the two (Hector and JDBC for Cassandra).
Thanks in advance!
Official Cassandra Java Driver (https://github.com/datastax/java-driver) is probably the best (IMHO, the only) choice for a new project for several reasons:
New features
All other Cassandra clients (Hector, Astyanax, etc.) are based on the legacy Thrift RPC protocol. The RPC "one response per request" model has severe limitations; for example, it doesn't allow processing several requests at the same time on a single connection, or streaming large ResultSets.
So, DataStax developed a new protocol that doesn't have RPC limitations. Thrift API won't be getting new features, it's only kept for backward-compatibility. In contrast, Java Driver is actively developed to incorporate the new features of Cassandra 2.0, like conditional updates, batching prepared statements, etc. The overview of new features is here: http://www.datastax.com/dev/blog/cql-in-cassandra-2-0
Convenience
In the early Cassandra days (0.7), our company used an in-house low-level Thrift client. Later on we used Hector, Pelops and Astyanax in various projects. I can say that the clients based on the Java Driver look the simplest and cleanest to me.
Performance
We did some performance testing of the Cassandra Java Driver against other clients. In most scenarios the performance is roughly the same. However, there are certain situations where the Cassandra Java Driver significantly outperforms other clients due to its asynchronous nature.
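To illustrate the difference from the one-response-per-request model, here is a toy Python model (not driver code) of a single connection carrying several in-flight requests. The native protocol tags each frame with a stream id so that responses can be matched to requests even when they arrive out of order; the class and method names below are hypothetical.

```python
import itertools


class MultiplexedConnection:
    """Toy model of one connection with many requests in flight at once."""

    def __init__(self):
        self._ids = itertools.count()
        self._pending = {}  # stream id -> query text awaiting a response

    def send(self, query):
        """Register a request and return the stream id that tags it."""
        stream_id = next(self._ids)
        self._pending[stream_id] = query
        return stream_id

    def receive(self, stream_id, payload):
        """Match a response to its request, in whatever order it arrives."""
        query = self._pending.pop(stream_id)
        return query, payload

    @property
    def in_flight(self):
        return len(self._pending)


if __name__ == "__main__":
    conn = MultiplexedConnection()
    first = conn.send("SELECT * FROM ks.a")
    second = conn.send("SELECT * FROM ks.b")
    # The second response arrives first; the stream id still pairs it correctly.
    print(conn.receive(second, "rows for b"))
    print(conn.receive(first, "rows for a"))
```

With a strict request/response model, the second query could not be sent on the same connection until the first response had been read.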
Btw, there are a couple of related questions with excellent answers:
Advantages of using cql over thrift
Cassandra Client Java API's
EDIT: When I wrote this, I wasn't aware that Achilles (https://github.com/doanduyhai/Achilles), mentioned in another answer, has a CQL implementation that works via the Java Driver. For the sake of completeness I must say that Achilles' DAO on top of CQL might be (or might one day become) a viable alternative to plain CQL via the Java Driver.
@mol
Why do you restrict yourself to Hector and cassandra-jdbc if you're starting a new project?
There are many other interesting choices:
Astyanax as Martin mentioned (Thrift & CQL3)
FireBrand (Thrift via Hector)
Achilles I've just developed (CQL3 & Cassandra 2.0 via Java driver core)
Java Driver Core for plain CQL3
Hector is indeed a higher-level API. Internally it will use Cassandra's Thrift API to execute its functions. It will not convert them to equivalent CQL calls. But its API also provides access to CQL. In this case it will pass the CQL (via Thrift) to Cassandra's APIs for CQL.
CQL in Cassandra is a SQL-like language that works via the Cassandra APIs. So it does not provide any capability beyond what the APIs offer, but it does make Cassandra easier to use at times. If you are considering using Hector, I would also look at Astyanax, which is a newer take on a high-level Java API for Cassandra.
Since you are starting a new project, it is best to start with CQL via the native Java driver:
http://www.datastax.com/documentation/developer/java-driver/1.0/webhelp/index.html#common/drivers/introduction/introArchOverview_c.html
Per DataStax, it is 10-15% faster than the Thrift APIs, as it uses the binary protocol.

Resources