Can't use Tableau on an EMR Spark cluster

I have a client that wants to use Tableau on their EMR Spark cluster.
The documentation seems straightforward but I'm getting errors when I try to connect.
Here is the setup:
The EMR cluster's master node doesn't have a public IP, but from the Tableau Desktop EC2 instance I can ping it and telnet to port 10001, where the Thrift server is running.
I can test Thrift with beeline, and it connects fine.
I am not using SSL or authentication, given the limited access the cluster has.
I have installed both DataDirect 8.0 and the Simba ODBC driver.
I'm using emr-5.13.0, the Hadoop distribution is Amazon 2.8.3 and the Spark version is 2.3.0.
The error is
Unable to connect to the ODBC Data Source. Check that the necessary drivers are installed and that the connection properties are valid.
[Simba][ThriftExtension] (5) Error occurred while contacting server: No more data to read.. This could be because you are trying to establish a non-SSL connection to an SSL-enabled server.
Unable to connect to the server "IP". Check that the server is running and that you have access privileges to the requested database."
I simply followed the documentation provided by Tableau, which says to install the driver only (without touching the ODBC configuration) and then use it in Tableau. I have verified that SSL and authentication are both disabled before trying to connect. I also verified the connection by running a query with DataGrip from the Tableau EC2 instance, which works as expected.

I resolved the issue by ignoring the documentation and just setting up the ODBC driver, then choosing it as the source in Tableau instead of Spark SQL.
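To confirm that an ODBC data source works outside Tableau, a quick query through the same driver can help. The following is only a minimal sketch: it assumes a DSN has already been configured in the ODBC administrator, that the third-party pyodbc package is installed, and the DSN name `EMR_Spark` is a hypothetical placeholder, not something from the original setup.

```python
def dsn_connection_string(dsn_name):
    """Build an ODBC connection string that refers to a preconfigured DSN."""
    return f"DSN={dsn_name}"


def run_test_query(dsn_name, sql="SELECT 1"):
    """Run one query through the DSN and return all rows."""
    # pyodbc is a third-party package (pip install pyodbc); it is imported
    # lazily so the helper above can be used without it.
    import pyodbc

    # autocommit=True is an assumption: many Hive/Spark ODBC drivers do not
    # support transactions, so the connection is opened in autocommit mode.
    conn = pyodbc.connect(dsn_connection_string(dsn_name), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


# Example call (hypothetical DSN name):
# rows = run_test_query("EMR_Spark")
```

If the query succeeds here but Tableau still fails, the problem is more likely in how Tableau's connector configures the driver than in the network or the Thrift server.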

Related

cqlsh "Unable to connect to any servers" on Windows installation

I installed Cassandra on Windows 10. When I try to run cqlsh from /bin/,
I get the following error:
Connection error: ('Unable to connect to any servers', {'127.0.0.1': \
error(10061, "Tried connecting to [('127.0.0.1', 9042)].
Last error: No connection could be made because the target machine \
actively refused it")})
I installed Cassandra from the official apache.org site and also followed the guide at
https://phoenixnap.com/kb/install-cassandra-on-windows. Everything looks good according to that guide.
Can anyone help me solve this? Thanks in advance.
The error states that cqlsh can't connect to the local Cassandra instance. The default configuration in conf/cassandra.yaml is for Cassandra to listen for CQL clients on localhost (127.0.0.1) and CQL port 9042:
native_transport_port: 9042
rpc_address: localhost
Since you're getting a "connection refused" error, the most likely issue is that Cassandra is not running on your Windows machine. Check the Cassandra logs (usually in logs/system.log) for errors which would provide clues as to why Cassandra couldn't start.
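The "connection refused" diagnosis can be checked without cqlsh by testing whether anything is listening on the CQL port. This is a minimal sketch using only the Python standard library; the helper name `port_is_open` is made up for illustration, and the host and port are the defaults quoted above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default CQL endpoint from conf/cassandra.yaml
if port_is_open("127.0.0.1", 9042):
    print("Something is listening on 127.0.0.1:9042")
else:
    print("Nothing is listening on 127.0.0.1:9042; Cassandra is probably not running")
```

A successful connection only shows that some process accepts TCP connections on that port, not that Cassandra is healthy, so the logs are still worth checking.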
As a side note, there is very limited Windows support in Cassandra 3.x and there are several known issues that will not be fixed due to limitations in the operating system.
Furthermore, Windows support has been completely dropped in Cassandra 4.0 due to lack of maintainers and testing (CASSANDRA-16171).
As a workaround, we recommend the following:
Deploy Cassandra in Docker
Deploy Cassandra in a VM using software like VirtualBox
Deploy K8ssandra.io
If you just want to build apps with Cassandra as a backend, Astra DB has a free tier that lets you launch a Cassandra cluster in a few clicks with no credit card required. Cheers!
Did you keep the terminal where you started Cassandra open, and is Cassandra still running when you try to connect? Note that you have to launch cqlsh from a different terminal window.
Please check the steps again; most probably Cassandra simply isn't running. Pay particular attention to point 4.

How to connect to multiple Cassandra instances using a single ODBC driver (from SAS ETL)

We are having trouble connecting to multiple Cassandra instances using a single ODBC driver. We have a SAS ETL server from which we want to connect to multiple Cassandra instances, but we cannot figure out how to do this.
If you have the ODBC driver installed, you can connect to different Cassandra clusters as long as you configure the appropriate ODBC URL/DSN connection for each cluster.
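As an illustration of "one driver, one DSN per cluster", the sketch below renders an odbc.ini-style file with a section for each cluster. It is only a sketch under assumptions: the `Driver`, `Host` and `Port` key names follow common unixODBC conventions and should be checked against the driver's own documentation, and the DSN names, addresses and driver path are hypothetical.

```python
import configparser
import io


def render_odbc_ini(clusters, driver_path):
    """Render one DSN section per cluster, all pointing at the same driver."""
    cp = configparser.ConfigParser()
    cp.optionxform = str  # keep key case instead of lowercasing
    for dsn_name, (host, port) in clusters.items():
        cp[dsn_name] = {
            "Driver": driver_path,  # same driver library for every DSN
            "Host": host,           # assumed key name
            "Port": str(port),      # assumed key name
        }
    buf = io.StringIO()
    cp.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Hypothetical clusters and driver path, for illustration only
clusters = {
    "CassandraProd": ("10.0.0.10", 9042),
    "CassandraTest": ("10.0.1.10", 9042),
}
ini_text = render_odbc_ini(clusters, "/opt/example/lib/libcassandraodbc.so")
print(ini_text)
```

Each DSN can then be referenced by name from the client, so the ETL server needs only one driver installation.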
If, for example, you want to configure the driver to use multiple contact points, you can only do so when connecting to a DataStax Enterprise cluster, since that is an enterprise-only feature of the Simba Spark ODBC driver, which connects to the AlwaysOn SQL service in DSE. Cheers!

Connect PowerBI Desktop with Apache Spark local machine installation

Can someone guide me on how to connect Power BI Desktop to Apache Spark installed on a local Windows machine? What server details should I pass?
I have read that Thrift connections are very slow, so I would like to avoid them unless they are the only choice.
Edit -
Based on the suggestion, I tried to set up a Thrift connection following this link: medium.com/#waqasrafiq327/… . Mine is a Windows installation, and the given paths seem to be for Linux. I can't see a hive-site.xml file under the /spark/conf folder, and I also don't see an /apachehive/conf folder in my Spark installation. My Spark installation is the latest available Spark release. Please guide.
You have to use the thrift server as it is required if you want to connect via ODBC or JDBC. This is the only way to connect from Power BI to Apache Spark.

Unable to Connect apache Cassandra on local machine

I am new to Apache Cassandra. I have just installed Apache Cassandra 3.9.0 (64-bit) on my local Windows 8.1 64-bit laptop, along with NoSQL Manager for Cassandra.
When I tried to register the server, this error message was displayed: "None of the hosts tried for query are available".
Can anyone help me out.
Thanks.

Unable to connect Cassandra Test Cluster

I have created a one-node Cassandra cluster (DataStax Enterprise) on Ubuntu. I am able to connect to it via cqlsh, but when I try to connect via OpsCenter or from VS2012 via the C# driver, I get errors like "Unable to connect to cluster" or "All host tried for query are in error". Can anyone help me with this error?
EDIT:
I installed DataStax Enterprise following the instructions given at datastax.com/documentation/datastax_enterprise/4.0/… and left all keys in the yaml file at their default values except the cluster name. When I run cqlsh, it connects with this message:
Connected to fptestcluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.6.28 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
But as specified in original question, with OpsCenter and C# driver I am not able to connect.
I found the fix for this: I had to change "listen_address" and "rpc_address" to the IP of the machine rather than localhost, and then it works. Thanks, everyone.
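A configuration like the one that caused this can be spotted with a small check for address keys that are still bound to loopback. This is a rough sketch that scans simple `key: value` lines rather than using a full YAML parser; the function name and the sample text are made up for illustration.

```python
LOOPBACK = {"localhost", "127.0.0.1", "::1"}


def loopback_bound_keys(yaml_text, keys=("listen_address", "rpc_address")):
    """Return the address keys whose value is a loopback name or address."""
    found = []
    for line in yaml_text.splitlines():
        # Drop trailing comments and surrounding whitespace
        stripped = line.split("#", 1)[0].strip()
        if ":" not in stripped:
            continue
        key, value = stripped.split(":", 1)
        key = key.strip()
        value = value.strip().strip("'\"")
        if key in keys and value in LOOPBACK:
            found.append(key)
    return found


# Sample text mimicking the default settings described above
sample = (
    "cluster_name: 'fptestcluster'\n"
    "listen_address: localhost\n"
    "rpc_address: localhost\n"
)
print(loopback_bound_keys(sample))
```

Any key reported here accepts connections only from the machine itself, which is consistent with a local cqlsh working while remote clients fail.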
