Hi all, I'm new to Dask. I ran into an error when I tried using read_sql_query to pull data from an Oracle database.
Here is my Python script:
con_str = "oracle+cx_oracle://{UserID}:{Password}#{Domain}/?service_name={Servicename}"
sql= "
column_a, column_b
from
database.tablename
where
mydatetime >= to_date('1997-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
"
from sqlalchemy.sql import select, text
from dask.dataframe import read_sql_query
sa_query= select(text(sql))
ddf = read_sql_query(sql=sa_query, con=con, index_col="index", head_rows=5)
I referred to this post: Reading an SQL query into a Dask DataFrame, and removed the "select" keyword from my query.
I then got a cx_Oracle.DatabaseError: missing expression [SQL: SELECT FROM DUAL WHERE ROWNUM <= 5]
But I don't understand where that query came from; it seems Dask didn't execute the SQL I provided.
I'm not sure which part I configured incorrectly.
*Note: pandas.read_sql works fine; it only fails with dask.dataframe.read_sql_query.
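For reference, a minimal, untested sketch of the shape dask.dataframe.read_sql_query generally expects, reusing the table and column names from the question: the query is built from explicit sqlalchemy column() objects so Dask can discover the result columns, and index_col must be one of those columns rather than a literal "index".

import dask.dataframe as dd
import sqlalchemy as sa

# Explicit column objects let Dask inspect the result columns; the index
# column used for partitioning (mydatetime here) must be one of them.
query = (
    sa.select(
        sa.column("column_a"),
        sa.column("column_b"),
        sa.column("mydatetime"),
    )
    .select_from(sa.text("database.tablename"))
    .where(sa.text("mydatetime >= to_date('1997-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')"))
)

ddf = dd.read_sql_query(sql=query, con=con_str, index_col="mydatetime")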
I have the following SQL statement where I am reading the database to get the records for one day. Here is what I tried in the pgAdmin console -
SELECT * FROM public.orders WHERE createdat >= now()::date AND type='t_order'
I want to convert this to psycopg2 syntax, but somehow it throws me an error -
Database connection failed due to invalid input syntax for type timestamp: "now()::date"
Here is what I am doing -
query = f"SELECT * FROM {table} WHERE (createdat>=%s AND type=%s)"
cur.execute(query, ("now()::date", "t_order"))
records = cur.fetchall()
Any help is deeply appreciated.
DO NOT use f-strings. Use proper Parameter Passing.
now()::date is better expressed as current_date. See Current Date/Time.
You want:
query = "SELECT * FROM public.orders WHERE (createdat>=current_date AND type=%s)"
cur.execute(query, ["t_order"])
If you want dynamic identifiers (table/column names), then:
from psycopg2 import sql
query = sql.SQL("SELECT * FROM {} WHERE (createdat>=current_date AND type=%s)").format(sql.Identifier(table))
cur.execute(query, ["t_order"])
For more information see the psycopg2 sql module documentation.
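If the cutoff date needs to come from Python rather than current_date, it can be bound as an ordinary parameter alongside the dynamic identifier; a small sketch under the same assumptions (an open cursor cur and a table variable):

from datetime import date
from psycopg2 import sql

# Dynamic identifier for the table name, bound parameters for the values.
query = sql.SQL(
    "SELECT * FROM {} WHERE (createdat >= %s AND type = %s)"
).format(sql.Identifier(table))

cur.execute(query, [date.today(), "t_order"])
records = cur.fetchall()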
I wish to execute a statement in my Python script to delete records older than 45 days from a PostgreSQL table, as below:
Consider only the code below:
import psycopg2
from datetime import datetime
cur = conn.cursor()
mpath = None
sql1 = cur.execute(
"Delete from table1 where mdatetime < datetime.today() - interval '45 days'")
This causes the following error:
psycopg2.errors.InvalidSchemaName: schema "datetime" does not exist
LINE 1: Delete from logsearch_maillogs2 where mdatetime <
datetime.t...
How exactly do I change the format or resolve this? Do I need to convert the value? I saw a few posts saying that DateTime doesn't exist in PostgreSQL, etc., but didn't find exact code to resolve this issue. Please guide.
The query runs in Postgres, not Python, so if you are writing a hard-coded string you need to use SQL timestamp functions, not Python ones. So datetime.today() --> now(), per Current Date/Time.
sql1 = cur.execute(
"Delete from table1 where mdatetime < now() - interval '45 days'")
Or, if you want a dynamic query, use parameters (per Parameters here) to pass in a Python datetime.
sql1 = cur.execute(
"Delete from table1 where mdatetime < %s - interval '45 days'", [datetime.today()])
I am using pyodbc to establish a connection with Azure Synapse SQL DW. The connection is established successfully. However, when it comes to inserting a pandas dataframe into the database, I get an error when I try inserting multiple rows as values, whereas it works if I insert rows one by one. Inserting multiple rows together as values used to work fine with AWS Redshift and MS SQL, but it fails with Azure Synapse SQL DW. I think Azure Synapse SQL is T-SQL rather than MS SQL; either way, I am unable to find any relevant documentation.
I have a pandas df named 'df' that looks like this:
student_id admission_date
1 2019-12-12
2 2018-12-08
3 2018-06-30
4 2017-05-30
5 2020-03-11
This code below works fine
import pandas as pd
import pyodbc
#conn object below is the pyodbc 'connect' object
batch_size = 1
i = 0
chunk = df[i:i+batch_size]
conn.autocommit = True
sql = 'insert INTO {} values {}'.format('myTable', ','.join(
str(e) for e in zip(chunk.student_id.values, chunk.admission_date.values.astype(str))))
print(sql)
cursor = conn.cursor()
cursor.execute(sql)
As you can see, it inserts just one row of the df. So yes, I can loop through and insert rows one by one, but that takes a very long time for larger dataframes.
This code below doesn't work when I try to insert all rows together
import pandas as pd
import pyodbc
batch_size = 5
i = 0
chunk = df[i:i+batch_size]
conn.autocommit = True
sql = 'insert INTO {} values {}'.format('myTable', ','.join(
str(e) for e in zip(chunk.student_id.values, chunk.admission_date.values.astype(str))))
print(sql)
cursor = conn.cursor()
cursor.execute(sql)
The error I get is the one below:
ProgrammingError: ('42000', "[42000]
[Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Parse error at
line: 1, column: 74: Incorrect syntax near ','. (103010)
(SQLExecDirectW)")
This is the sample SQL query for 2 rows which fails:
insert INTO myTable values (1, '2009-12-12'),(2, '2018-12-12')
That's because Azure Synapse SQL does not support multi-row insert via the values constructor.
One workaround is to chain "select (value list) union all" statements. Your pseudo-SQL would look like this:
insert INTO {table}
select {chunk.student_id.values}, {chunk.admission_date.values.astype(str)} union all
...
select {chunk.student_id.values}, {chunk.admission_date.values.astype(str)}
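A rough, untested sketch of how that UNION ALL statement could be built from the chunk dataframe in the question, using pyodbc parameter markers instead of formatting the values into the SQL string:

# One "SELECT ?, ?" per row, chained with UNION ALL; the values are passed
# to pyodbc as parameters in row order.
rows = list(zip(chunk.student_id.values.tolist(),
                chunk.admission_date.values.astype(str).tolist()))
selects = " UNION ALL ".join(["SELECT ?, ?"] * len(rows))
sql = "INSERT INTO myTable " + selects
params = [value for row in rows for value in row]

cursor = conn.cursor()
cursor.execute(sql, params)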
The COPY statement in Azure Synapse Analytics is a better way to load your data into a Synapse SQL pool.
COPY INTO test_parquet
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.parquet'
WITH (
FILE_FORMAT = myFileFormat,
CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>')
)
You can save your pandas dataframe to blob storage, and then trigger the COPY command using the cursor's execute method.
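A rough sketch of that flow (untested; the container, blob path, connection string, and SAS token are placeholders, and the CSV variant of COPY is used so no external file format object is needed):

import pandas as pd
from azure.storage.blob import BlobClient

# Placeholder storage account, container, and credentials.
blob = BlobClient.from_connection_string(
    conn_str="<storage-connection-string>",
    container_name="myblobcontainer",
    blob_name="folder1/students.csv",
)
blob.upload_blob(df.to_csv(index=False), overwrite=True)

copy_sql = """
COPY INTO myTable
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/students.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2,
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<Your_SAS_Token>')
)
"""
cursor = conn.cursor()
cursor.execute(copy_sql)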
Assuming the data.xlsx looks like this:
Column_Name | Table_Name
CUST_ID_1 | Table_1
CUST_ID_2 | Table_2
Here are the SQL statements that I'm trying to generate using bind_param for Db2 in Python:
SELECT CUST_ID_1 FROM TABLE_1 WHERE CUST_ID_1 = 12345
SELECT CUST_ID_2 FROM TABLE_2 WHERE CUST_ID_2 = 12345
And this is how I'm trying to generate the query:
import ibm_db
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
validate_sql = "SELECT ? FROM ? WHERE ?=12345"
validate_stmt = ibm_db.prepare(conn, validate_sql)
df = pd.read_excel("data.xlsx", sheet_name='Sheet1')
for i in df.index:
    ibm_db.bind_param(validate_stmt, 1, df['Column_Name'][i])
    ibm_db.bind_param(validate_stmt, 2, df['Table_Name'][i])
    ibm_db.bind_param(validate_stmt, 3, df['Column_Name'][i])
    ibm_db.execute(validate_stmt)
    validation_result = ibm_db.fetch_both(validate_stmt)
    while validation_result != False:
        print(validation_result[0])
        validation_result = ibm_db.fetch_both(validate_stmt)
When I try to execute this code, I'm hitting an SQLCODE=-104 error.
Any idea how the syntax should be for parameter binding?
Thanks,
Ganesh
There are 2 major errors:
1. You can't use a parameter marker for a table or column name (the 2nd and 3rd parameters).
2. You must specify the data type of a parameter marker if it can't be inferred from the query (the 1st parameter), using something like cast(? as desired-data-type). But that is just for your information, since here you are trying to use it as a column name, which is not possible, as described in 1).
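A sketch of one workable pattern (untested; it assumes the conn and df from the question): the column and table names are built into the SQL text per row, and only the comparison value is bound as a parameter.

for i in df.index:
    column = df['Column_Name'][i]
    table = df['Table_Name'][i]
    # Identifiers go into the SQL text; only the value uses a parameter marker.
    validate_sql = f"SELECT {column} FROM {table} WHERE {column} = ?"
    validate_stmt = ibm_db.prepare(conn, validate_sql)
    ibm_db.bind_param(validate_stmt, 1, 12345)
    ibm_db.execute(validate_stmt)
    validation_result = ibm_db.fetch_both(validate_stmt)
    while validation_result != False:
        print(validation_result[0])
        validation_result = ibm_db.fetch_both(validate_stmt)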
I'm trying to run a SELECT query from Python 3.6.5 code using sqlite3 in PyCharm.
The query uses the double-dot operator. Here is a sample of the code:
#!/usr/bin/python
import sqlite3
connection = sqlite3.connect('ddbserver')
c = connection.cursor()
sql_query = """ select * from database..tablename """
res = c.execute(sql_query)
When I run the script, it throws the following error:
sqlite3.OperationalError: near ".": syntax error
The same query runs fine in SQL Server Management Studio, as well as in an XML file.
I've tried using the full path to the table as well, i.e.
select * from database.schemaname.tablename
But it throws the same error.
Any sort of help would be appreciated. Thank you!
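For what it's worth, sqlite3 only opens local SQLite database files, and database..tablename is SQL Server (T-SQL) syntax that SQLite's parser rejects, which is why the query works in SSMS but not in sqlite3. If the table actually lives in SQL Server, a SQL Server driver such as pyodbc would be needed instead; a rough sketch with placeholder server and database names:

import pyodbc

# Placeholder server/database; adjust driver name and authentication as needed.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<servername>;"
    "DATABASE=<database>;"
    "Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM schemaname.tablename")
rows = cursor.fetchall()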