Getting SQLCODE=-104 on binding a parameter for DB2 query in Python - python-3.x

Assuming the data.xlsx looks like this:
Column_Name | Table_Name
CUST_ID_1 | Table_1
CUST_ID_2 | Table_2
Here are the SQL statements I'm trying to generate using bind_param for DB2 in Python:
SELECT CUST_ID_1 FROM TABLE_1 WHERE CUST_ID_1 = 12345
SELECT CUST_ID_2 FROM TABLE_2 WHERE CUST_ID_2 = 12345
And this is how I'm trying to generate these queries:
import ibm_db
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile

validate_sql = "SELECT ? FROM ? WHERE ?=12345"
validate_stmt = ibm_db.prepare(conn, validate_sql)
df = pd.read_excel("data.xlsx", sheet_name='Sheet1')
for i in df.index:
    ibm_db.bind_param(validate_stmt, 1, df['Column_Name'][i])
    ibm_db.bind_param(validate_stmt, 2, df['Table_Name'][i])
    ibm_db.bind_param(validate_stmt, 3, df['Column_Name'][i])
    ibm_db.execute(validate_stmt)
    validation_result = ibm_db.fetch_both(validate_stmt)
    while validation_result != False:
        print(validation_result[0])
        validation_result = ibm_db.fetch_both(validate_stmt)
When I try to execute this code, I'm hitting a SQLCODE=-104 error.
Any idea what the correct syntax for parameter binding should be?
Thanks,
Ganesh

There are two major errors here:
1. You can't use a parameter marker for a table or column name (your 2nd and 3rd parameters).
2. You must specify the data type of a parameter marker if it can't be inferred from the query (your 1st parameter), using something like "CAST(? AS data-type-desired)". But that's just for your information, since here you're trying to use it as a column name, which is not possible as described in 1).
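Given those two constraints, a common workaround is to interpolate the (validated) column and table names into the SQL text and keep a parameter marker only for the value. A minimal sketch, reusing conn and data.xlsx from the question; the isidentifier check is just one simple way to whitelist the names before interpolating them:

import ibm_db
import pandas as pd

df = pd.read_excel("data.xlsx", sheet_name='Sheet1')

for i in df.index:
    col_name = df['Column_Name'][i]
    tab_name = df['Table_Name'][i]
    # Identifiers can't be parameter markers, so they go into the SQL text.
    # Whitelist-validate them first so spreadsheet contents can't inject SQL.
    if not (col_name.isidentifier() and tab_name.isidentifier()):
        raise ValueError(f"Suspicious identifier: {col_name!r} / {tab_name!r}")
    validate_sql = f"SELECT {col_name} FROM {tab_name} WHERE {col_name} = ?"
    validate_stmt = ibm_db.prepare(conn, validate_sql)
    cust_id = 12345
    ibm_db.bind_param(validate_stmt, 1, cust_id)  # only the value is bound
    ibm_db.execute(validate_stmt)
    row = ibm_db.fetch_both(validate_stmt)
    while row is not False:
        print(row[0])
        row = ibm_db.fetch_both(validate_stmt)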

Related

Dask read_sql_query did not execute sql that I put in

Hi all, I'm new to Dask.
I ran into an error when I tried using read_sql_query to get data from an Oracle database.
Here is my python script:
con_str = "oracle+cx_oracle://{UserID}:{Password}#{Domain}/?service_name={Servicename}"
sql= "
column_a, column_b
from
database.tablename
where
mydatetime >= to_date('1997-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
"
from sqlalchemy.sql import select, text
from dask.dataframe import read_sql_query
sa_query= select(text(sql))
ddf = read_sql_query(sql=sa_query, con=con, index_col="index", head_rows=5)
I referred to this post: Reading an SQL query into a Dask DataFrame, and removed the "select" keyword from my query.
I got a cx_Oracle.DatabaseError with a missing expression [SQL: SELECT FROM DUAL WHERE ROWNUM <= 5].
But I don't understand where that query came from. It seems like it didn't execute the SQL code I provided, and I'm not sure which part I didn't configure right.
*Note: using pandas.read_sql is OK; it only fails when using dask.dataframe.read_sql_query.
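For reference, a minimal sketch of how the selectable is often built for dask.dataframe.read_sql_query: explicit sqlalchemy column objects plus an index_col that actually exists in the query, since Dask needs a selectable it can introspect and wrap. This assumes SQLAlchemy 1.4+; using mydatetime as the index column is an assumption, and the connection string is a placeholder:

import sqlalchemy as sa
from dask.dataframe import read_sql_query

# Dask's read_sql_query wants the URI string, not an engine object.
con_str = "oracle+cx_oracle://{UserID}:{Password}@{Domain}/?service_name={Servicename}"

# Lightweight table/column constructs let Dask introspect and wrap the query.
tbl = sa.table(
    "tablename",
    sa.column("column_a"),
    sa.column("column_b"),
    sa.column("mydatetime"),
    schema="database",
)
query = sa.select(tbl.c.column_a, tbl.c.column_b, tbl.c.mydatetime).where(
    tbl.c.mydatetime >= sa.text("to_date('1997-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')")
)

# index_col must name a column that is actually present in the query.
ddf = read_sql_query(sql=query, con=con_str, index_col="mydatetime", head_rows=5)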

Any optimize way to iterate excel and provide data into pd.read_sql() as a string one by one

# here I have to apply a loop which can provide me the queries from Excel for the respective reports:
df1 = pd.read_sql(SQLqueryB2, con=con1)
df2 = pd.read_sql(ORCqueryC2, con=con2)
if df1.equals(df2):
    print(Report2 + " : is Pass")
Can we achieve the above by doing something like this (by iterating the ndarray)?
df = pd.read_excel(path)
for col, item in df.iteritems():
Or is the only option left to read the Excel file with the "openpyxl" library and iterate over rows and columns, then provide the values? I hope the question is clear; if anything is in doubt, please comment.
You are trying to loop through an Excel file, run the two queries, check whether they match, and output the result, correct?
import pandas as pd
from sqlalchemy import create_engine

# add user, pass, host, database name
con = create_engine(f"mysql+pymysql://{USER}:{PWD}@{HOST}/{DB}")

file = pd.read_excel('excel_file.xlsx')
file['Result'] = ''  # placeholder
for i, row in file.iterrows():
    df1 = pd.read_sql(row['SQLQuery'], con)
    df2 = pd.read_sql(row['Oracle Queries'], con)
    file.loc[i, 'Result'] = 'Pass' if df1.equals(df2) else 'Fail'
file.to_excel('results.xlsx', index=False)
This will save a file named results.xlsx that mirrors the original data but adds a column named Result that will be Pass or Fail.
Example results.xlsx:
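One note on the sketch above: the question compares a SQL Server query against an Oracle query, so in practice you would likely need two engines, one per database. A variant under that assumption (both connection URLs are placeholders):

import pandas as pd
from sqlalchemy import create_engine

# One engine per database; both URLs below are placeholders.
con1 = create_engine("mssql+pyodbc://user:pwd@my_sqlserver_dsn")            # SQL Server
con2 = create_engine("oracle+cx_oracle://user:pwd@host/?service_name=svc")  # Oracle

file = pd.read_excel('excel_file.xlsx')
file['Result'] = ''  # placeholder
for i, row in file.iterrows():
    df1 = pd.read_sql(row['SQLQuery'], con1)        # SQL Server query column
    df2 = pd.read_sql(row['Oracle Queries'], con2)  # Oracle query column
    file.loc[i, 'Result'] = 'Pass' if df1.equals(df2) else 'Fail'
file.to_excel('results.xlsx', index=False)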

How to get values based on 2 user inputs in Python

Given the data below, how can I use Python to get the Headers column value corresponding to a given input from the DB and Table columns?
DB Table Headers
Oracle Cust Id,Name,Mail,Phone,City,County
Oracle Cli Cid,shopNo,State
Oracle Addr Street,Area,City,Country
SqlSer Usr Name,Id,Addr
SqlSer Log LogId,Env,Stg
MySql Loc Flat,Add,Pin,Country
MySql Data Id,Txt,TaskId,No
Output: if I pass Oracle and Cli as parameters, it should return the value "Cid,shopNo,State" in a list.
I tried a Python dictionary, but it takes two values, a key and a value, while I have three values. How can I do this?
Looks like your data is in some sort of tabular format. In that case I would recommend the pandas package, which is very convenient for working with tabular data.
pandas can read data into a DataFrame from a CSV file using pandas.read_csv. You can then filter this dataframe using the column names and the required values.
In the example below I assume that your data is tab (\t) separated. I read the data from a string using io.StringIO; normally you would just use pandas.read_csv('filename.csv').
import pandas as pd
import io
data = """DB\tTable\tHeaders
Oracle\tCust\tId,Name,Mail,Phone,City,County
Oracle\tCli\tCid,shopNo,State
Oracle\tAddr\tStreet,Area,City,Country
SqlSer\tUsr\tName,Id,Addr
SqlSer\tLog\tLogId,Env,Stg
MySql\tLoc\tFlat,Add,Pin,Country
MySql\tData\tId,Txt,TaskId,No"""
dataframe = pd.read_csv(io.StringIO(data), sep='\t')
db_is_oracle = dataframe['DB'] == 'Oracle'
table_is_cli = dataframe['Table'] == 'Cli'
filtered_dataframe = dataframe[db_is_oracle & table_is_cli]
print(filtered_dataframe)
This will result in:
DB Table Headers
1 Oracle Cli Cid,shopNo,State
Or to get the actual headers of the first match:
print(filtered_dataframe['Headers'].iloc[0])
>>> Cid,shopNo,State
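On the dictionary idea from the question: a plain dict does handle the three values if you key it by the (DB, Table) pair. A small sketch building such a lookup from the same dataframe as above:

# Build a lookup dict keyed by (DB, Table) tuples; values are the header lists.
lookup = {
    (row['DB'], row['Table']): row['Headers'].split(',')
    for _, row in dataframe.iterrows()
}
print(lookup[('Oracle', 'Cli')])
# ['Cid', 'shopNo', 'State']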

Multiple WHERE conditions in Pandas read_sql

I've got my data in an SQLite3 database, and now I'm trying to work on a little script to access the data I want for given dates. I got the SELECT statement to work with the date ranges, but I can't seem to add another condition to fine-tune the search.
DB columns: id, date, driverid, drivername, pickupStop, pickupPkg, delStop, delPkg
What I've got so far:
import pandas as pd
import sqlite3
sql_data = 'driverperformance.sqlite'
conn = sqlite3.connect(sql_data)
cur = conn.cursor()
date_start = "2021-12-04"
date_end = "2021-12-10"
df = pd.read_sql_query("SELECT DISTINCT drivername FROM DriverPerf WHERE date BETWEEN :dstart and :dend", params={"dstart": date_start, "dend": date_end}, con=conn)
drivers = df.values.tolist()
for d in drivers:
    driverDF = pd.read_sql_query("SELECT * FROM DriverPerf WHERE drivername = :driver AND date BETWEEN :dstart and :dend", params={"driver": d, "dstart": date_start, "dend": date_end}, con=conn)
I've tried a few different versions of the "WHERE drivername" part but it always seems to fail.
Thanks!
If I'm not mistaken, drivers will be a list of lists. Have you tried
.... params={"driver": d[0] ....
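Spelled out, the suggested fix looks like this (a sketch; everything else is unchanged from the question's loop, and d[0] unwraps the one-element list that df.values.tolist() produces per row):

for d in drivers:
    driverDF = pd.read_sql_query(
        "SELECT * FROM DriverPerf WHERE drivername = :driver AND date BETWEEN :dstart AND :dend",
        params={"driver": d[0], "dstart": date_start, "dend": date_end},
        con=conn,
    )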

How can I avoid this TypeError when creating a new column for my dataframe which multiplies the values in another column

I am trying to add a column to my dataframe in PySpark that converts bitcoin values to GBP, but when I run the code I get a TypeError. I have tried to create a variable with the same type as the column to avoid this, but I am unable to resolve the error.
bc_value = DecimalType("4000")
df_j2 = df_j2.withColumn("value",df_j2["value"].cast(DecimalType()))
df_group = df_j2.groupBy("pubkey").sum("value")
df_final = df_group.sort(df_group["sum(value)"].desc()).limit(10)
df_with_pound = df_final.withColumn("pound", col(bc_value*("value")
df_with_pound.show()
Here is the error shown on screen:
There are some syntax errors in your code, including how decimal columns are defined and how columns are used. You can try the code below:
from pyspark.sql.types import *
from pyspark.sql.functions import col, lit
bc_value = lit(4000).cast(DecimalType())
df_with_pound = df_final.withColumn("pound", col("value") * bc_value)
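For context, a sketch of the full pipeline from the question with those fixes applied (df_j2 and the 4000 rate come from the question). One detail worth noting: groupBy().sum("value") names the aggregated column "sum(value)", so that is the column being multiplied here:

from pyspark.sql.types import DecimalType
from pyspark.sql.functions import col, lit

# Constant BTC -> GBP rate as a decimal literal column (4000 per the question).
bc_value = lit(4000).cast(DecimalType())

df_j2 = df_j2.withColumn("value", df_j2["value"].cast(DecimalType()))
df_group = df_j2.groupBy("pubkey").sum("value")
df_final = df_group.sort(df_group["sum(value)"].desc()).limit(10)

# The aggregate is named "sum(value)" after the groupBy, so multiply that column.
df_with_pound = df_final.withColumn("pound", col("sum(value)") * bc_value)
df_with_pound.show()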