How to send multiple queries in single execute statement for example simplified version of my query is (which i am trying to execute using jaydebeapi )
Create temp table tempTable as
select * from table1 where x=y;
select * from tempTable ;
UPDATE : I am using Netezza JDBC
Related
I have create a tempview as such as:
df.createOrReplaceTempView("table_test")
However, when I run the following command it doesn't work
%sql select top 10 * from table_test;
Try the following
%sql select * from table_test limit 10;
top 10 is more specific to sql server and not the sql engine being used by your notebooks.
I have an hql file. I want to run it using pyspark with Hive warehouse connector. There is an executeQuery method to run queries. I want to know whether hql files can be run like that. Can we run complex queries like that.
Please suggest.
Thanks
I have following solution where i have assumed that there will be multiple queries in hql file.
HQL File : sample_query.hql
select * from schema.table;
select * from schema.table2;
Code : Iterate over each query. You can do as you wish(in terms of HWC operation) in each iteration.
with open('sample_query.hql', 'r') as file:
hql_file = file.read().rstrip()
for query in [x.lstrip().rstrip() for x in hql_file.split(";") if len(x) != 0] :
hive.executeQuery("{0}".format(query))
I have a small log dataframe which has metadata regarding the ETL performed within a given notebook, the notebook is part of a bigger ETL pipeline managed in Azure DataFactory.
Unfortunately, it seems that Databricks cannot invoke stored procedures so I'm manually appending a row with the correct data to my log table.
however, I cannot figure out the correct sytnax to update a table given a set of conditions :
the statement I use to append a single row is as follows :
spark_log.write.jdbc(sql_url, 'internal.Job',mode='append')
this works swimmingly however, as my Data Factory is invoking a stored procedure,
I need to work in a query like
query = f"""
UPDATE [internal].[Job] SET
[MaxIngestionDate] date {date}
, [DataLakeMetadataRaw] varchar(MAX) NULL
, [DataLakeMetadataCurated] varchar(MAX) NULL
WHERE [IsRunning] = 1
AND [FinishDateTime] IS NULL"""
Is this possible ? if so can someone show me how?
Looking at the documentation this only seems to mention using select statements with the query parameter :
Target Database is an Azure SQL Database.
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
just to add this is a tiny operation, so performance is a non-issue.
You can't do single record updates using jdbc in Spark with dataframes. You can only append or replace the entire table.
You can do updates using pyodbc- requires installing the MSSQL ODBC driver (How to install PYODBC in Databricks) or you can use jdbc via JayDeBeApi (https://pypi.org/project/JayDeBeApi/)
I am newbie to Hadoop and Hive. My current requirement is to collect the stats of number of records loaded in 15 tables on each run day. Instead of executing each select Count(*) query and copy output manually to XL. Could anyone suggest what is the best method to automate this task please?
Note: we are not having any GUI to run Hive Queries, submitting Hive queries in normal Unix terminal.
Export to the CSV or TSV file, then open file in Excel. Normally it generates TSV file (tab-separated). This is how to transform it to comma-separated if you prefer CSV;
hive -e "SELECT 'table1' as source, count(*) cnt FROM db.table1
UNION ALL
SELECT 'table2' as source, count(*) cnt FROM db.table2" | tr "\t" "," > mydata.csv
Add more tables to the query.
You can mount directory in which you are writing output file in Windows using SAMBA/NFS. Schedule the command using crontab and voila, every day you have updated file.
Also you can connect directly using ODBC drivers:
https://mapr.com/blog/connecting-apache-hive-to-odbc/
https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-connect-excel-hive-odbc-driver
Error connecting Hortonworks Hive ODBC in Excel 2013
I just started to automate our test process. I made the following case:
Example:
* Settings *
Library Selenium2Library
Library Remote http://localhost:0000/ WITH NAME JDBCDB2 #DB2
Library ExcelLibrary
* Test Cases *
Connect to database connection parm login password
Store Query Result To File select * from table where x=y :\Testresults.txt
what keyword to use (instead of Store query results to file) so i can read the query from a file, instead of writing the full select statement.
How to write the sql results to an excel. Results seperated in columns/records