When I run the query select * from hive.information_schema.columns; in the Presto client, I get this error:
Query 20170208_085534_00061_ny9tu failed: outputFormat should not be accessed from a null StorageFormat
However, selecting from other tables in information_schema succeeds, e.g. select * from hive.information_schema.tables; works fine.
Can anybody help?
Thanks.
This is a bug caused by a table that has metadata we aren't expecting.
I scanned all the tables in Hive and found that some tables have a null InputFormat/OutputFormat. Running DESCRIBE TABLENAME in Presto on any of those tables produces the same error.
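In case it helps others, this is roughly how such a scan can be done from Spark; a sketch assuming a Hive-enabled SparkSession and the default database (the exact row layout of DESCRIBE FORMATTED output is an assumption and varies between versions):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Walk every table and flag those whose storage format is null.
for (row <- spark.sql("SHOW TABLES IN default").collect()) {
  val table = row.getAs[String]("tableName")
  // DESCRIBE FORMATTED emits (col_name, data_type, comment) rows;
  // the storage handler appears as InputFormat/OutputFormat rows.
  val desc = spark.sql(s"DESCRIBE FORMATTED default.$table").collect()
  val nullFormat = desc.exists { r =>
    val key = Option(r.getString(0)).getOrElse("")
    val value = Option(r.getString(1)).getOrElse("null")
    (key.startsWith("InputFormat") || key.startsWith("OutputFormat")) &&
      value.trim == "null"
  }
  if (nullFormat) println(s"null storage format: $table")
}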
I'm trying to fetch data from DB2 using:
df = spark.read.format("jdbc").option("user", "user").option("password", "password") \
    .option("driver", "com.ibm.db2.jcc.DB2Driver") \
    .option("url", "jdbc:db2://url:<port>/<DB>") \
    .option("query", query) \
    .load()
Locally, the query option works, but on the server it asks me to use dbtable instead.
When I use dbtable, I get a SQL syntax error (SQLCODE=-104, SQLSTATE=42601) and it picks up the wrong columns.
Can someone help me with this?
You can use the AS400 driver to fetch DB2 data using Spark.
Your DB2 URL will look something like this: jdbc:as400://<DBIPAddress>
val query = "(select * from db.temptable) temp"
val df = spark.read.format("jdbc")
  .option("url", <YourURL>)
  .option("driver", "com.ibm.as400.access.AS400JDBCDriver")
  .option("dbtable", query)
  .option("user", <Username>)
  .option("password", <Password>)
  .load()
Please note that you will need to keep the query format as shown above (i.e. give an alias to the query). Hope this resolves your issue.
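If you'd rather stay on the original com.ibm.db2.jcc.DB2Driver, the same trick should work there too: Spark substitutes the dbtable value directly into the FROM clause of the SQL it generates, so a parenthesized subquery needs an alias to be valid SQL. A sketch, with placeholder URL and credentials:

val query = "(select * from db.temptable) temp"
val df = spark.read.format("jdbc")
  .option("url", "jdbc:db2://<host>:<port>/<DB>")
  .option("driver", "com.ibm.db2.jcc.DB2Driver")
  .option("dbtable", query) // aliased subquery in place of a table name
  .option("user", "<user>")
  .option("password", "<password>")
  .load()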
I am trying to write data to IBM DB2 (10.5, fix pack 11) using PySpark (2.4).
When I try to execute the piece of code below:
df.write.format("jdbc") \
    .mode("overwrite") \
    .option("url", "jdbc:db2://<host>:<port>/<DB>") \
    .option("driver", "com.ibm.db2.jcc.DB2Driver") \
    .option("sslConnection", "true") \
    .option("sslCertLocation", "</location/***_ssl.crt>") \
    .option("numPartitions", 1) \
    .option("batchsize", 1000) \
    .option("truncate", "true") \
    .option("dbtable", "<TABLE>") \
    .option("user", "<user>") \
    .option("password", "<PW>") \
    .save()
the job throws the following exception:
File "/usr/local/Cellar/apache-spark/3.0.1/libexec/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o97.save.
: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=END-OF-STATEMENT;ABLE<SEHEMA.TABLE>;IMMEDIATE, DRIVER=4.19.80
at com.ibm.db2.jcc.am.b5.a(b5.java:747)
The job is trying to perform a truncate, but it seems DB2 expects the IMMEDIATE keyword.
In my code above, all I pass is the name of the dbtable. Is there a way to pass the IMMEDIATE keyword?
Also, on the DB2 side, is there a way to set this while opening the session?
Just FYI, my code without truncate works, but then Spark drops the table, recreates it, and reloads the data, which I don't want to do in the production environment.
Any thoughts on how to solve this issue are highly appreciated.
DB2Dialect in Spark 2.4 doesn't override the default JdbcDialect implementation of the TRUNCATE TABLE query. The comments in the code suggest overriding this method to return a query that suits your database engine:
/**
 * The SQL query that should be used to truncate a table. Dialects can override this method to
 * return a query that is suitable for a particular database. For PostgreSQL, for instance,
 * a different query is used to prevent "TRUNCATE" affecting other tables.
 * @param table The table to truncate
 * @param cascade Whether or not to cascade the truncation
 * @return The SQL query to use for truncating a table
 */
@Since("2.4.0")
def getTruncateQuery(
    table: String,
    cascade: Option[Boolean] = isCascadingTruncateTable): String = {
  s"TRUNCATE TABLE $table"
}
Perhaps in the DB2 case you can extend DB2Dialect itself, add your own getTruncateQuery() implementation, and define a "custom" JDBC protocol, "jdbc:mydb2" for example. You could then use this protocol in the JDBC connection URL: .option("url", 'jdbc:mydb2://<host>:<port>/<DB>').
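A minimal sketch of that idea, with one caveat: a made-up protocol prefix may be rejected by the DB2 driver itself, so this version instead matches the standard jdbc:db2 prefix and relies on dialects registered via JdbcDialects.registerDialect being consulted before the built-in ones:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Sketch only: make Spark's truncate=true option emit
// TRUNCATE TABLE ... IMMEDIATE, which DB2 requires.
object DB2TruncateDialect extends JdbcDialect {

  // Claim DB2 JDBC URLs; registered dialects win over built-in ones.
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:db2")

  override def getTruncateQuery(
      table: String,
      cascade: Option[Boolean] = isCascadingTruncateTable): String = {
    s"TRUNCATE TABLE $table IMMEDIATE"
  }
}

// Register once, before calling df.write...save().
JdbcDialects.registerDialect(DB2TruncateDialect)

For what it's worth, newer Spark versions ship an IMMEDIATE-aware truncate query in the built-in DB2Dialect, so upgrading Spark may also resolve this.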
I am getting an error while inserting data into a Hive table, although the data is inserted into the table successfully.
act = sqlContext.createDataFrame(df,schema)
act.createOrReplaceTempView("act_view")
sqlContext.sql("insert into table project_defect.biweb_t_activity select * from act_view")
It gives me the following error:
KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider
I am using the Hortonworks platform. If anyone has faced this issue, please advise.
I created a new table in the Bluemix SQL Database service by uploading a CSV file (baseball.csv) and took the default table name of "baseball".
I created a simple app in Node.js which just tries to select data from the table with select * from baseball, but I keep getting the following error:
[IBM][CLI Driver][DB2/NT] SQL0204N "USERxxxx.BASEBALL" is an undefined name
Why can't it find my database table?
This issue seems independent of Bluemix; rather, it is a usage error.
This error is typically caused by the following:
The object identified by name is not defined in the database.
User response:
Ensure that the object name (including any required qualifiers) is correctly specified in the SQL statement and that it exists.
Try running "list tables" from the command prompt to check whether your table name is spelled correctly.
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.messages.sql.doc/doc/msql00204n.html?cp=SSEPGG_9.7.0%2F2-6-27-0-130
I created the table from the SQL Database web UI in Bluemix and took the default name of baseball. It turns out this creates a case-sensitive table name.
Unfortunately for me, the sql_db library (and, I believe, all DB2 clients) folds the unquoted table name to upper case, so the query becomes "SELECT * FROM BASEBALL".
The solution was to either:
A. Explicitly name my table BASEBALL in the web UI; or
B. Quote the table name in my SQL query:
select * from "baseball"
More info at http://www.ibm.com/developerworks/data/library/techarticle/0203adamache/0203adamache.html#N10121
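For illustration, the same fix through plain JDBC (a sketch; host, port, and credentials are placeholders, assuming the standard DB2 JDBC driver is on the classpath):

import java.sql.DriverManager

val conn = DriverManager.getConnection(
  "jdbc:db2://<host>:<port>/<DB>", "<user>", "<password>")
val stmt = conn.createStatement()

// Unquoted identifiers are folded to upper case, so this would look for BASEBALL:
//   stmt.executeQuery("SELECT * FROM baseball")

// Quoting preserves the lower-case name the web UI created:
val rs = stmt.executeQuery("SELECT * FROM \"baseball\"")
while (rs.next()) println(rs.getString(1))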
USE users_tracking;

SELECT user_name FROM visits
WHERE port_name IN
    (SELECT port_name FROM ports WHERE location = 'NY'); // as temp
It gives an error:
mismatched input 'SELECT' expecting RULE_T_R_PAREN
Is there any way I can store the inner query in a variable and then use that?
I tried using set @varname := query but it does not recognize the set command.
Nested queries are not allowed in Cassandra CQL. For this kind of complex querying you'll need to use Hive or Spark SQL.
Here is the full CQL reference:
http://cassandra.apache.org/doc/cql3/CQL.html
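For example, with Spark SQL the nested query from the question becomes straightforward. A sketch, assuming the DataStax spark-cassandra-connector is on the classpath (keyspace and table names are taken from the question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Expose both Cassandra tables to Spark SQL.
val visits = spark.read.format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "users_tracking", "table" -> "visits"))
  .load()
val ports = spark.read.format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "users_tracking", "table" -> "ports"))
  .load()
visits.createOrReplaceTempView("visits")
ports.createOrReplaceTempView("ports")

// Spark SQL supports the IN (subquery) form that CQL rejects.
spark.sql(
  """SELECT user_name FROM visits
    |WHERE port_name IN (SELECT port_name FROM ports WHERE location = 'NY')
    |""".stripMargin).show()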