spark magic - enter sql context as string - python-3.x

Connecting to Spark over Livy works fine in Jupyter,
as does the following spark magic:
%%spark -c sql
select * from some_table
Now how can I use string variables to query tables?
The following does not work:
query = 'select * from some_table'
Next cell:
%%spark -c sql
query
Nor does the following work:
%%spark -c sql
'select * from some_table'
Any ideas? Is it possible to "echo" the content of a string variable into a cell?

It seems I found a solution.
IPython has a function that runs a string through a cell magic:
%%local
from IPython import get_ipython
ipython = get_ipython()
line = '-c sql -o df'
query = 'select * from some_table'
ipython.run_cell_magic(magic_name='spark', line=line, cell=query)
After this, the query result is in the pandas DataFrame df.
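Building on that, the cell body itself can be assembled from variables before it is handed to run_cell_magic. A minimal sketch (the table name is assumed; the magic call is shown commented out because it needs a live Livy session):

```python
# Build the query string from variables, then hand it to the spark magic.
table = "some_table"        # assumed table name
line = "-c sql -o df"       # same magic arguments as above
query = f"select * from {table}"

# In a notebook cell this would then run:
# get_ipython().run_cell_magic(magic_name="spark", line=line, cell=query)
```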

Related

Dask read_sql_query did not execute sql that I put in

Hi all, I'm new to Dask.
I ran into an error when I tried using read_sql_query to get data from an Oracle database.
Here is my Python script:
con_str = "oracle+cx_oracle://{UserID}:{Password}@{Domain}/?service_name={Servicename}"
sql = """
column_a, column_b
from
database.tablename
where
mydatetime >= to_date('1997-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
"""
from sqlalchemy.sql import select, text
from dask.dataframe import read_sql_query
sa_query = select(text(sql))
ddf = read_sql_query(sql=sa_query, con=con_str, index_col="index", head_rows=5)
I referred to this post: Reading an SQL query into a Dask DataFrame, and removed the "select" keyword from my query.
I then got a cx_Oracle.DatabaseError about a missing expression: [SQL: SELECT FROM DUAL WHERE ROWNUM <= 5]
But I don't get where that query came from.
It seems like it didn't execute the SQL code I provided.
I'm not sure which part I didn't configure correctly.
*Note: using pandas.read_sql is OK; it only fails when using dask.dataframe.read_sql_query.
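For what it's worth, a hedged sketch of the shape read_sql_query seems to expect (not verified against an Oracle instance): Dask wraps the given selectable in its own outer SELECT (and in a ROWNUM-limited head query), so it wants the column expressions rather than a hand-written SELECT string, and index_col must be one of the selected columns. The column and table names below come from the question; the SQLAlchemy/Dask calls are shown commented because they need a live database:

```python
# Columns only, no SELECT keyword; Dask adds its own outer SELECT.
cols = ["column_a", "column_b"]
source = "database.tablename"
where = "mydatetime >= to_date('1997-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')"

# With SQLAlchemy/Dask this would be (not executed here):
# from sqlalchemy import column, select, text
# from dask.dataframe import read_sql_query
# sa_query = (select(*[column(c) for c in cols])
#             .select_from(text(source))
#             .where(text(where)))
# ddf = read_sql_query(sql=sa_query, con=con_str, index_col="column_a")

# A rough preview of the head-sample SQL Dask might send:
head_preview = (f"SELECT {', '.join(cols)} FROM {source} "
                f"WHERE {where} AND ROWNUM <= 5")
```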

psycopg2 SELECT query with inbuilt functions

I have the following SQL statement where I am reading the database to get the records for one day. Here is what I tried in the pgAdmin console:
SELECT * FROM public.orders WHERE createdat >= now()::date AND type='t_order'
I want to convert this to psycopg2 syntax, but somehow it throws errors:
Database connection failed due to invalid input syntax for type timestamp: "now()::date"
Here is what I am doing:
query = f"SELECT * FROM {table} WHERE (createdat>=%s AND type=%s)"
cur.execute(query, ("now()::date", "t_order"))
records = cur.fetchall()
Any help is deeply appreciated.
DO NOT use f-strings. Use proper Parameter Passing.
now()::date is better expressed as current_date. See Current Date/Time.
You want:
query = "SELECT * FROM public.orders WHERE (createdat>=current_date AND type=%s)"
cur.execute(query, ["t_order"])
If you want dynamic identifiers (table/column names), then:
from psycopg2 import sql
query = sql.SQL("SELECT * FROM {} WHERE (createdat>=current_date AND type=%s)").format(sql.Identifier(table))
cur.execute(query, ["t_order"])
For more information see sql.

Spark SQL passing variables - Synapse (Spark pool)

I have the following Spark SQL (Spark pool, Spark 3.0) code and I want to pass a variable to it. How can I do that? I tried the following:
# cell 1 (toggle parameter cell):
%%pyspark
stat = 'A'
# cell 2:
select * from silver.employee_dim where Status = '$stat'
When you're running your cell as PySpark, you can pass a variable to your query like this:
# cell 1 (toggle parameter cell):
%%pyspark
stat = 'A'  # define variable
# cell 2:
%%pyspark
query = "select * from silver.employee_dim where Status='" + stat + "'"
spark.sql(query)  # execute SQL
Since you're executing a SELECT statement, I assume you might want to load the result to a DataFrame:
sqlDf = spark.sql(query)
sqlDf.head(5) #select first 5 rows
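The concatenation above can equally be written with an f-string, which avoids the quote-juggling (same assumed table and column names as in the question):

```python
stat = "A"
# f-string builds an equivalent query to the concatenated version above
query = f"select * from silver.employee_dim where Status = '{stat}'"
# spark.sql(query) would then execute it (requires a live Spark session)
```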

Store result of a multi-line Athena query in a variable

I am using https://github.com/finklabs/jupyter-athena-sql to query Athena from Jupyter Lab. I need to store the result of a multi-line query in a variable. I can do this for a single-line query as follows:
pd = %athena select 1
pd
However, I can't seem to figure out how to access the result of a multi-line query, such as this:
%%athena
select col1, count(*)
from my_table
group by col1
In the implementation of the Athena extension I can see that it returns a DataFrame, and I wonder if there is a standard variable in JupyterLab that it gets bound to?
Thanks!
I ended up looking up and calling the magic function that's registered to execute Athena queries in the jupyter-athena-sql extension as follows:
athena = get_ipython().find_cell_magic('athena')
df = athena("""select col1, count(*)
from my_table
group by col1""")

Python - Sqlite3 - Select query with database double dot operator

I'm trying to run a SELECT query from Python 3.6.5 code using sqlite3 in PyCharm.
The query uses the double-dot operator. Here is a sample:
#!/usr/bin/python
import sqlite3
connection = sqlite3.connect('ddbserver')
c = connection.cursor()
sql_query = """ select * from database..tablename """
res = c.execute(sql_query)
When I run the script, it throws the following error:
sqlite3.OperationalError: near ".": syntax error
The same query runs fine in SQL Server Management Studio, as well as in an XML file.
I've tried using the full path to the table as well i.e.
select * from database.schemaname.tablename
But it throws the same error.
Any sort of help would be appreciated. Thank you!
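Worth noting: the double-dot shorthand (database..tablename) is SQL Server syntax, and the standard-library sqlite3 module only opens local SQLite database files, so sqlite3.connect('ddbserver') just creates a file named ddbserver rather than reaching a SQL Server instance (that would need a driver such as pyodbc). In SQLite itself, a second database is ATTACHed under a schema name and addressed as schema.table; a runnable sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Attach a second database under the schema name "otherdb"
cur.execute("ATTACH DATABASE ':memory:' AS otherdb")
cur.execute("CREATE TABLE otherdb.tablename (x INTEGER)")
cur.execute("INSERT INTO otherdb.tablename VALUES (1)")
# Schema-qualified access uses a single dot, never ".."
rows = cur.execute("SELECT * FROM otherdb.tablename").fetchall()
```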
