I am using Spark 1.5.0 to run Spark SQL through the Spark Thrift Server, and I connect with the Simba Spark ODBC Driver.
From Tableau, I can connect and run Spark SQL queries without problems.
However, when I connect Excel to Spark SQL, the connection succeeds but no database or table names are listed. Following the documentation, I also tried Excel's Microsoft Query option to run a custom SQL query (select * from default.airline), but it fails because the query is sent as select * from SPARK.default.airline, with the catalog name SPARK prepended.
How can I remove that catalog name from the generated query? I have tried every option available.
I work as a Sales Engineer with Simba. The Simba Spark driver should work in Excel with both MS Query and through the Connection Wizard.
Can you please provide more information on this problem? You can enable driver logging through the configuration options in ODBC Administrator. Choose your DSN, go to logging options, and set it to TRACE.
Then restart Excel and try the query again.
Send the logs and a screenshot of your DSN to sales#simba.com
Thanks,
Jeff
I followed this documentation on how to configure a database connection in Talend:
https://www.javatpoint.com/talend-jdbc-connection
The documentation uses the MySQL JDBC connector to connect a MySQL database to Talend. In my case, I use a Cassandra JDBC connector to connect a Cassandra database to Talend, and the connection is established successfully.
According to the documentation, right-clicking the database connection opens a pop-up menu with a Retrieve Schema option, which is used to display the tables. But when I right-click the DB connection in Talend Open Studio, no pop-up menu appears. How can I fix this?
I suspect the problem is that you're using the wrong JDBC driver, although I'm unable to confirm that since you didn't say which one you're using.
You will need to download the Simba JDBC Driver for Apache Cassandra from DataStax Downloads in order to connect to a Cassandra cluster, and then install the driver in Talend.
I don't have detailed steps for doing that, but I've previously written instructions for connecting clients like Pentaho Data Integration and DBeaver to Astra DB, which is a Cassandra-as-a-service. Those instructions should give you an idea of how to configure Talend. Cheers!
I encountered the same problem. You're supposed to create the connection under the 'NoSQL Connections' tab, since Cassandra is a NoSQL database.
I followed the instructions here.
I have two questions about Spark and the Snowflake data warehouse.
1) Is there any way to list and query Snowflake tables from Spark, the way I can with Hive (in either new or old versions of Spark)? For example:
val hiveTables = hiveContext.sql("show tables").collect()  // collect() brings the rows to the driver
hiveTables.foreach(println)  // so println output appears locally rather than in executor logs
2) Is there a way to create Snowflake tables from Spark, as with hiveContext.sql("create table....")?
The first question is about finding out which tables exist for a particular user under a particular role. I'm asking because I can query the table through the Snowflake web UI, but not through Spark:
Exception in thread "main" net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
Object 'mytable' does not exist.
You should double-check the database, schema and role in your JDBC connection settings. If you don't see a table via JDBC, one of these is the likely culprit.
You can validate the current settings by running, e.g., show roles, show schemas and show databases on the established JDBC connection.
In general, I highly recommend using the Spark-Snowflake connector for communicating with Snowflake from Spark. It also provides Utils.runQuery() for running simple queries such as DDL.
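To make the database/schema/role advice concrete, here is a minimal sketch of an option map for the Spark-Snowflake connector in which every context setting is spelled out and checked. The sf* keys are the connector's documented option names. The SnowflakeOptions object, its parameters and the account value are hypothetical, and nothing here contacts Snowflake.

```scala
// Hypothetical helper (not part of the connector): builds the connector's
// option map and refuses to continue if database, schema or role is left
// blank, so "Object 'mytable' does not exist" cannot come from a silent default.
object SnowflakeOptions {
  def build(
      account: String,   // hypothetical account identifier, e.g. "myorg-myacct"
      user: String,
      password: String,
      database: String,
      schema: String,
      role: String,
      warehouse: String
  ): Map[String, String] = {
    // Fail fast on the three settings the answer says to double-check.
    Seq("database" -> database, "schema" -> schema, "role" -> role).foreach {
      case (name, value) =>
        require(value.trim.nonEmpty, s"$name must be set explicitly")
    }
    Map(
      "sfURL"       -> s"${account}.snowflakecomputing.com",
      "sfUser"      -> user,
      "sfPassword"  -> password,
      "sfDatabase"  -> database,
      "sfSchema"    -> schema,
      "sfRole"      -> role,
      "sfWarehouse" -> warehouse
    )
  }

  // Statements to run on the same connection to see which context it really has.
  val diagnostics: Seq[String] = Seq(
    "select current_role(), current_database(), current_schema()",
    "show roles",
    "show databases",
    "show schemas"
  )
}
```

Assuming the connector is on the classpath, such a map would typically be passed as sqlContext.read (or spark.read in Spark 2.x) .format("net.snowflake.spark.snowflake").options(opts).option("dbtable", "mytable").load(), and the diagnostics statements could be run with Utils.runQuery(). Check the connector documentation for your version before relying on either call.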
I have created a table, ztest7, in the default database in Hive. I can query it with Beeline, and in Tableau I can query it using Custom SQL.
However, the table does NOT show up when I search for it.
Am I missing something here?
Tableau Desktop Version = v10.1.1
Hive = v2.0.1
Spark = v2.1.0
Best Regards
I have the same issue connecting Tableau Desktop 10 (Mac) to Hive 2.1.1 via Spark SQL 2.1 (on a CentOS 7 server).
This is what I got from Tableau Support:
In Tableau Desktop, the ability to connect to Spark SQL without defining a default schema is not currently built into the product. As a preliminary step, to define a default schema, configure the Spark SQL hivemetastore to utilize a SchemaRDD or DataFrame. This must be defined in the Hive Metastore for Tableau Desktop to be able to access it. Pure schema-less Spark RDDs can not be queried by Spark SQL because of the lack of a schema. RDDs can be converted into SchemaRDDs, which have additional schema metadata as Spark SQL provides access to SchemaRDDs. When a SchemaRDD is created, it is only available in the local namespace or context, and is unavailable to external services accessing Spark through ODBC and the Spark Thrift Server. For Tableau to have access, the SchemaRDD needs to be registered in a catalog that is available outside of just the local context; the Hive Metastore is currently the only supported service.
I don't know how to check or implement this.
PS: I'd have posted this as a comment, but I'm not allowed to yet because I'm new to Stack Overflow.
In the field labeled Table on the left side of the screen, try selecting Contains, entering part of your table name, and pressing Enter.
I ran into a similar issue. In my case, I had loaded the tables using Hive, but the Tableau connection to the data source was made through Impala.
To fix the tables not appearing in the Tableau drop-down, try running INVALIDATE METADATA database.table_name in the Impala interface. This fixed the problem for me. (Impala caches metastore metadata, so tables created through Hive are not visible to Impala until that cache is invalidated.)
For more on why this fixes the issue, refer to this link.
I am trying to connect Tableau Desktop 10 (Mac) to Spark SQL 2.1 (on a CentOS 7 server) through the Simba ODBC driver, with Authentication = Username and Username = . There is no error, but I don't see the tables that are available in Hive. After searching for and choosing the 'default' schema and then searching for tables, I only see a table named default (default.default). However, when I use Beeline on the server to connect to Spark SQL, the Hive tables are visible.
If I use the Custom SQL feature, I can query the tables and use the data, but I still have no way to list the tables in Tableau.
I am not sure if the issue is on Tableau side or Spark side. I'd greatly appreciate any help with troubleshooting this issue.
The reason for this behaviour is the following:
In Spark 2.0, the show tables output columns are: 'tableName', 'isTemporary'
In Spark 2.1, the show tables output columns are: 'database', 'tableName', 'isTemporary'
Tableau 10.2.3 and later can parse the Spark 2.1 output, but Tableau 10.2.1 and earlier cannot parse this new format.
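The layout change can be modelled without Spark. This sketch represents show tables results as plain string rows using the column names from the answer. It contrasts a reader that looks up the tableName column by name with a hypothetical positional reader written for the 2.0 layout, and it checks a Tableau version against the 10.2.3 threshold. Tableau's actual parsing logic is not known here. The positional reader is only one plausible failure mode, although it would be consistent with seeing only a table called default.

```scala
// Illustrative model only: rows are Seq[String], not Spark Rows.
object ShowTablesLayouts {
  // Column layouts as described in the answer.
  val spark20Header: Seq[String] = Seq("tableName", "isTemporary")
  val spark21Header: Seq[String] = Seq("database", "tableName", "isTemporary")

  // Layout-aware reader: locates the tableName column by name, so it
  // handles both the 2.0 and the 2.1 output.
  def tableNames(header: Seq[String], rows: Seq[Seq[String]]): Seq[String] = {
    val idx = header.indexWhere(_.equalsIgnoreCase("tableName"))
    require(idx >= 0, s"no tableName column in: ${header.mkString(", ")}")
    rows.map(row => row(idx))
  }

  // Hypothetical positional reader written against the 2.0 layout: it assumes
  // column 0 holds the table name, so on 2.1 output it returns the database.
  def positionalTableNames(rows: Seq[Seq[String]]): Seq[String] =
    rows.map(row => row.head)

  // Compares a dotted version string (optionally prefixed with "v") with a
  // minimum version, component by component; missing components count as 0.
  def atLeast(version: String, minimum: Seq[Int]): Boolean = {
    val parts = version.stripPrefix("v").split('.').toSeq
      .map(_.trim.toInt)
      .padTo(minimum.length, 0)
    parts.zip(minimum).find { case (a, b) => a != b } match {
      case Some((a, b)) => a > b
      case None         => true
    }
  }
}
```

For example, atLeast("v10.1.1", Seq(10, 2, 3)) is false, which matches the Tableau version reported in the question above.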
I'm currently driving an R&D project that tests heavily against Azure's HDInsight Hadoop service. We use SQL Server Integration Services (SSIS) to manage ETL workflows, so making HDInsight work with SSIS is a must.
I've had good success with a few of the Azure Feature Pack tasks. But there is no native HDInsight/Hadoop Destination task for use with DFTs.
Problem With Microsoft's Hive ODBC Driver Within An SSIS DFT
I created a DFT with a simple SQL Server "OLE DB Source" feeding an "ODBC Destination" that points to the cluster through the Microsoft Hive ODBC Driver.
I've tested the cluster ODBC connection after entering all parameters, and it tests "OK". It is even able to read the Hive table and map all the columns. The problem arises at run time: the task generally just locks up with no rows in the counter, or it gets a handful of rows into the buffer and then freezes.
What I've tried while troubleshooting:
Verified connection string and Hadoop cluster username/password.
Recreated cluster and task several times.
The source is SQL Server, and it runs fine if I point it only at a file destination or a Recordset destination.
Tested a smaller number of rows to see if it is a simple performance issue (SELECT TOP 100 * FROM stupidTable). Also tested with only 4 columns.
Tested on a separate workstation to make sure it wasn't related to the machine.
Having tried all that, I can't figure out what else to try. I'm not doing much differently from examples on the web like this one, except that I'm using ODBC as a destination rather than a source.
Has anyone had success using the Hive driver, or another one, within an SSIS destination task? Thanks in advance.