Connect Power BI Desktop with Apache Spark local machine installation

Can someone guide me on how to connect Power BI Desktop to Apache Spark installed on a local Windows machine? What server details should I pass?
I have read that Thrift connections are very slow, so I would like to avoid them unless they are the only choice.
Edit -
Based on the suggestion, I tried to set up a Thrift connection following this link: medium.com/#waqasrafiq327/… . Mine is a Windows installation, and the given paths seem to be for Linux. I can't see a hive-site.xml file under the /spark/conf folder, and I also don't see an /apachehive/conf folder in my Spark installation. My Spark installation is the latest available Spark release. Please guide.

You have to use the Thrift server, because it is required if you want to connect via ODBC or JDBC. This is the only way to connect from Power BI to Apache Spark.
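
As a rough illustration of the "server details" the question asks about, here is a minimal sketch that builds the endpoint strings a client would typically be pointed at. It assumes the Spark Thrift server runs on the same machine and listens on port 10000, which is the usual default for HiveServer2-compatible servers; the host, port, and database in your setup may differ, and the function names `thrift_endpoint` and `jdbc_url` are made up for this example.

```python
def thrift_endpoint(host="localhost", port=10000):
    """Return the host:port string for a Thrift server.

    The defaults are assumptions: a server on the local machine
    listening on port 10000.
    """
    return f"{host}:{port}"


def jdbc_url(host="localhost", port=10000, database="default"):
    """Return a HiveServer2-style JDBC URL, the form beeline accepts."""
    return f"jdbc:hive2://{thrift_endpoint(host, port)}/{database}"
```

For example, `thrift_endpoint()` gives `localhost:10000` and `jdbc_url()` gives `jdbc:hive2://localhost:10000/default`. Testing the JDBC URL with beeline first is a cheap way to confirm the Thrift server is up before configuring an ODBC client.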

Related

How do I configure Talend Open Studio to connect to a Cassandra cluster?

I referred to this documentation
https://www.javatpoint.com/talend-jdbc-connection
for how to configure a DB connection in Talend. The documentation uses the MySQL JDBC Connector to connect a MySQL DB to Talend. In my case, I use the Cassandra JDBC Connector to connect the Cassandra DB to Talend, and the connection is established successfully.
The documentation says that right-clicking the database connection shows a pop-up menu with a Retrieve Schema option, which is used to show the tables. But when I right-click the DB connection in Talend Open Studio, no pop-up menu appears. How can I fix this issue?
I suspect the problem is that you're using the wrong JDBC driver, although I can't confirm that since you didn't say which one you're using.
You will need to download the Simba JDBC Driver for Apache Cassandra from DataStax Downloads in order to connect to a Cassandra cluster. Then you'll need to install the driver in Talend.
I don't have the detailed steps for doing that, but I've previously written instructions for connecting clients like Pentaho Data Integration and DBeaver to Astra DB, which is a Cassandra-as-a-service. The instructions for those should give you an idea of how to configure Talend. Cheers!
I encountered the same problem. You're supposed to make the connection under the 'NoSQL Connections' tab, since Cassandra is a NoSQL database.
I followed the instructions here

Can't use Tableau on an EMR Spark cluster

I have a client that wants to use Tableau on their EMR Spark cluster.
The documentation seems straightforward but I'm getting errors when I try to connect.
Here is the setup:
The EMR cluster's master doesn't have a public IP, but from the Tableau Desktop EC2 instance I am able to ping it and telnet to port 10001, where Thrift is running
I am able to test Thrift with beeline, and it connects fine
I am not using SSL or authentication, given the limited access to the cluster
I have installed both DataDirect 8.0 and the Simba ODBC driver
I'm using emr-5.13.0; the Hadoop distribution is Amazon 2.8.3 and the Spark version is 2.3.0.
The error is
Unable to connect to the ODBC Data Source. Check that the necessary drivers are installed and that the connection properties are valid.
[Simba][ThriftExtension] (5) Error occurred while contacting server: No more data to read.. This could be because you are trying to establish a non-SSL connection to an SSL-enabled server.
Unable to connect to the server "IP". Check that the server is running and that you have access privileges to the requested database."
I simply followed the documentation provided by Tableau, which says to install the driver only (not mess with ODBC) and then use it in Tableau. I have verified that I set no SSL and no authentication before trying to connect. I also verified connectivity by running DataGrip on the Tableau EC2 instance and doing a query, which works as expected.
I resolved the issue by ignoring the documentation and just setting up the ODBC driver, then choosing it instead of Spark SQL as the source.
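
The reachability check described in the setup above (telnet to port 10001) can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `port_is_open` is hypothetical, and the host you pass would be the master's private address.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A call such as `port_is_open("<master-private-ip>", 10001)` only shows that the port accepts TCP connections. It says nothing about SSL, authentication, or the Thrift transport mode, which is the layer where the Simba error above is raised.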

Connecting Spark from my local machine to a remote HiveServer

How can I connect Spark from my local machine in Eclipse to a remote HiveServer?
Get a copy of hive-site.xml from the remote server and add it to $SPARK_HOME/conf.
Then, assuming Spark 2, call enableHiveSupport() when building the SparkSession, and any spark.sql() queries should be able to communicate with Hive.
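
The two steps above can be sketched in PySpark as follows (the same builder calls exist in the Scala and Java APIs, which an Eclipse project is more likely to use). This is a minimal sketch, assuming PySpark 2.x or later is installed and that hive-site.xml has already been copied into $SPARK_HOME/conf; the helper names and the application name are made up for this example.

```python
import os


def hive_site_path(spark_home):
    """Path where the copied hive-site.xml is expected, per the answer above."""
    return os.path.join(spark_home, "conf", "hive-site.xml")


def create_hive_session(app_name="remote-hive-example"):
    """Build a SparkSession with Hive support enabled."""
    # Imported lazily so hive_site_path() works without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)      # hypothetical application name
        .enableHiveSupport()    # lets spark.sql() use the Hive metastore
        .getOrCreate()
    )


def show_databases():
    """List the databases visible through the configured Hive metastore."""
    spark = create_hive_session()
    spark.sql("SHOW DATABASES").show()
    spark.stop()
```

Checking that `hive_site_path(os.environ["SPARK_HOME"])` exists before calling `show_databases()` helps rule out the most common cause of Spark silently falling back to a local metastore.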
Also see my answer here

Unable to connect to Apache Cassandra on local machine

I am new to Apache Cassandra. I have just installed Apache Cassandra 3.9.0 (64-bit) on my local Windows 8.1 64-bit laptop, along with NoSQL Manager for Cassandra.
When I tried to register the server, an error message was displayed: "None of the hosts tried for query are available".
Can anyone help me out?
Thanks.

Microsoft PowerBI with Hortonworks Hive/HBase/Spark Integration

I'm thrilled with Microsoft's Power BI offering, but I have not been able to find a direct way to integrate it with my Hortonworks Hadoop cluster.
I went through the tutorials and found two things:
Power BI can fetch data from an Azure HDInsight cluster using Thrift. If that's possible, is it also possible to connect to any other Hadoop distro?
We can connect using the ODBC driver offered by Simba Technologies, but I was wondering whether it's possible to connect using Apache Phoenix, which offers JDBC drivers for HBase.
Appreciate your thoughts/suggestions/help!
Splice Machine has an ODBC driver for retrieving data that is ultimately stored in HBase.
Check out the ODBC driver on this page:
http://community.splicemachine.com/
Yes, it is possible to connect Power BI to other Hadoop distros via an ODBC driver, e.g. http://www.simba.com/webinar/powerbi-demo/
Power BI doesn't support JDBC drivers, but if you are interested in testing an ODBC driver for Phoenix, please contact Simba Technologies.
