Multiple WHERE conditions in Pandas read_sql


I've put my data into an SQLite3 database, and now I'm working on a little script to pull the data I want for given dates. I got the SELECT statement to work with the date range, but I can't seem to add another condition to fine-tune the search.
The table columns are: id, date, driverid, drivername, pickupStop, pickupPkg, delStop, delPkg
What I've got so far:
import pandas as pd
import sqlite3
sql_data = 'driverperformance.sqlite'
conn = sqlite3.connect(sql_data)
cur = conn.cursor()
date_start = "2021-12-04"
date_end = "2021-12-10"
df = pd.read_sql_query("SELECT DISTINCT drivername FROM DriverPerf WHERE date BETWEEN :dstart and :dend", params={"dstart": date_start, "dend": date_end}, con=conn)
drivers = df.values.tolist()
for d in drivers:
    driverDF = pd.read_sql_query("SELECT * FROM DriverPerf WHERE drivername = :driver AND date BETWEEN :dstart and :dend", params={"driver": d, "dstart": date_start, "dend": date_end}, con=conn)
I've tried a few different versions of the "WHERE drivername" part but it always seems to fail.
Thanks!

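If I'm not mistaken, drivers will be a list of lists, because df.values.tolist() turns every row into its own list. Each d is therefore something like ['Some Name'] rather than a string, and sqlite3 cannot bind a list to :driver. Have you tried passing the first element instead?
.... params={"driver": d[0] ....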
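To make the suggestion concrete, here is a minimal sketch that runs against an in-memory SQLite database. The sample rows and the reduced column set are assumptions made for illustration; only the table name, the column names and the two queries come from the question. It selects the drivername column as a flat list of strings, which has the same effect as using d[0].

```python
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for driverperformance.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DriverPerf (id INTEGER PRIMARY KEY, date TEXT, drivername TEXT, delPkg INTEGER)"
)
conn.executemany(
    "INSERT INTO DriverPerf (date, drivername, delPkg) VALUES (?, ?, ?)",
    [
        ("2021-12-05", "Alice", 40),
        ("2021-12-06", "Bob", 35),
        ("2021-12-07", "Alice", 42),
        ("2021-12-20", "Alice", 50),  # outside the date range
    ],
)

date_range = {"dstart": "2021-12-04", "dend": "2021-12-10"}

# Selecting the column gives a flat list of strings instead of a list of lists.
names = pd.read_sql_query(
    "SELECT DISTINCT drivername FROM DriverPerf "
    "WHERE date BETWEEN :dstart AND :dend ORDER BY drivername",
    conn,
    params=date_range,
)["drivername"].tolist()

# One DataFrame per driver, each bound to a plain string.
per_driver = {}
for name in names:
    per_driver[name] = pd.read_sql_query(
        "SELECT * FROM DriverPerf "
        "WHERE drivername = :driver AND date BETWEEN :dstart AND :dend",
        conn,
        params={"driver": name, **date_range},
    )
```

Keeping the per-driver frames in a dictionary is only one option; it avoids overwriting driverDF on each pass through the loop.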

Related

Any optimize way to iterate excel and provide data into pd.read_sql() as a string one by one

#here I have to apply the loop which can provide me the queries from excel for respective reports:
df1 = pd.read_sql(SQLqueryB2, con=con1)
df2 = pd.read_sql(ORCqueryC2, con=con2)
if (df1.equals(df2)):
    print(Report2 + " : is Pass")
Can we achieve the above by iterating over the DataFrame, with something like this?
df = pd.read_excel(path)
for col, item in df.items():  # iteritems() was removed in pandas 2.0
Or is the only option left to read the Excel file with the openpyxl library, iterate over its rows and columns, and pass the values along? I hope the question is clear; if anything is in doubt, please leave a comment.

You are trying to loop through an excel file, run the 2 queries, see if they match and output the result, correct?
import pandas as pd
from sqlalchemy import create_engine
# fill in USER, PWD, HOST and DB with your own credentials
con = create_engine(f"mysql+pymysql://{USER}:{PWD}@{HOST}/{DB}")
file = pd.read_excel('excel_file.xlsx')
file['Result'] = ''  # placeholder
for i, row in file.iterrows():
    # both queries run on the same connection here; use a second engine
    # if they target different databases
    df1 = pd.read_sql(row['SQLQuery'], con)
    df2 = pd.read_sql(row['Oracle Queries'], con)
    file.loc[i, 'Result'] = 'Pass' if df1.equals(df2) else 'Fail'
file.to_excel('results.xlsx', index=False)
This will save a file named results.xlsx that mirrors the original data but adds a column named Result that will be Pass or Fail.
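The same idea can be tried without an Excel file or a MySQL server. The sketch below is only an illustration: the two in-memory SQLite connections stand in for the two databases from the question, and the checks DataFrame stands in for what pd.read_excel would return. The table t, the Report column and the sample queries are all made up.

```python
import sqlite3

import pandas as pd

# Two hypothetical databases with identical content.
con1 = sqlite3.connect(":memory:")
con2 = sqlite3.connect(":memory:")
for con in (con1, con2):
    con.execute("CREATE TABLE t (x INTEGER)")
    con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# Stand-in for pd.read_excel('excel_file.xlsx'): one pair of queries per row.
checks = pd.DataFrame(
    {
        "Report": ["Report1", "Report2"],
        "SQLQuery": [
            "SELECT x FROM t ORDER BY x",
            "SELECT x FROM t WHERE x > 1 ORDER BY x",
        ],
        "Oracle Queries": [
            "SELECT x FROM t ORDER BY x",
            "SELECT x FROM t ORDER BY x",
        ],
    }
)


def compare_row(row):
    """Run both queries from one row and report whether the results match."""
    df1 = pd.read_sql(row["SQLQuery"], con1)
    df2 = pd.read_sql(row["Oracle Queries"], con2)
    return "Pass" if df1.equals(df2) else "Fail"


checks["Result"] = checks.apply(compare_row, axis=1)
```

Note that DataFrame.equals also compares column names and dtypes, so two queries that return the same values under different column aliases would be reported as Fail.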

Python for loop for multiple postgres queries based on dates

I want to create a for loop that queries data from my database one week at a time for three months.
For example:
import pandas as pd
import sqlalchemy
DB_URL = 'db/url/here:1234'
engine = sqlalchemy.create_engine(DB_URL)
conn = engine.connect()
# dates are quoted so that Python does not evaluate them as subtractions
queries = {
    'd_week1_start': '2017-12-1',
    'd_week1_end': '2017-12-7',
    'd_week2_start': '2017-12-7',
    'd_week2_end': '2017-12-14',
    'd_week3_start': '2017-12-14',
    'd_week3_end': '2017-12-21',
    'd_week4_start': '2017-12-21',
    'd_week4_end': '2017-12-31',
}
for qr in queries:
    one_week_df = pd.read_sql(qr, conn)
    one_week_df.to_pickle('one_week.pkl')
How would I go about doing something like this?
Update: I realized I wasn't passing any queries. I changed the items in the dictionary to queries like:
'''SELECT * FROM table WHERE time > '20180201T120000' AND time < '20180207T120000' ORDER BY time ASC'''
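One possible way to avoid writing each week by hand is to generate the date ranges and bind them as parameters. The sketch below is an assumption-laden illustration, not the asker's setup: it uses an in-memory SQLite database instead of Postgres, a made-up events table, and a hypothetical helper named weekly_ranges. With a SQLAlchemy connection the placeholder style may differ.

```python
import sqlite3
from datetime import date, timedelta

import pandas as pd


def weekly_ranges(start, end):
    """Yield (week_start, week_end) pairs covering [start, end) in 7-day steps."""
    current = start
    while current < end:
        following = min(current + timedelta(days=7), end)
        yield current, following
        current = following


# Hypothetical table and rows, used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (time TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2017-12-02", 1), ("2017-12-09", 2), ("2017-12-10", 3), ("2017-12-30", 4)],
)

frames = {}
for week_start, week_end in weekly_ranges(date(2017, 12, 1), date(2018, 1, 1)):
    frames[week_start] = pd.read_sql_query(
        "SELECT * FROM events WHERE time >= :start AND time < :end ORDER BY time ASC",
        conn,
        params={"start": week_start.isoformat(), "end": week_end.isoformat()},
    )
    # To persist each week separately, a per-week file name could be used, e.g.
    # frames[week_start].to_pickle(f"week_{week_start}.pkl")
```

Using a half-open interval (>= start, < end) keeps a row that falls exactly on a week boundary from being counted twice.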

Error while getting user input and using Pandas DataFrame to extract data from LEFT JOIN

I am trying to create an SQLite3 statement in Python 3 to collect data from two tables called FreightCargo and Train, where a train ID is the input value. I want to use Pandas since it makes the tables easy to read.
The code below works perfectly fine, but it's static and only looks for one hard-coded value in the statement.
import pandas as pd
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = 2;'''
cursor = conn.cursor()
cursor.execute( SQL )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp
I want to be able to fill a variable from user input and have the result depend on what the user entered. For example, the user is asked for a train_id, which is a primary key in a table, and the cargo related to that train is listed.
I expanded the code, but I am getting an error: ValueError: operation parameter must be str
Train_ID = input('Train ID')
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = ?;''', (Train_ID)
cursor = conn.cursor()
cursor.execute( SQL )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp

The problem lies in how you define the SQL variable.
You are creating a tuple of two elements rather than a string. If you print SQL you will see something like ('''SELECT...?;''', 'your_input'), and type(SQL) is tuple.
cursor.execute(sql[, parameters]) expects a string as its first argument, followed by the parameters. Here the parameters are not really optional, because the query contains a ? placeholder that needs a value. The parameters must be a sequence, for example a tuple; note that (Train_ID) without a trailing comma is just the string itself, so write (Train_ID,).
You can unpack the tuple with cursor.execute(*SQL), which passes each element of SQL as a separate argument, or you can move the parameters into the execute call:
Train_ID = input('Train ID')
SQL = '''SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
FROM FreightCargo F LEFT JOIN [Train] T
ON F.Cargo_ID = T.Cargo_ID
WHERE Train_ID = ?;'''
cursor = conn.cursor()
cursor.execute( SQL, (Train_ID,) )
names = [x[0] for x in cursor.description]
rows = cursor.fetchall()
Temp = pd.DataFrame( rows, columns=names)
Temp
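Since the goal is a DataFrame, the cursor handling can also be left to pandas. The following is a small sketch under assumptions: the schema and the sample rows are invented to match the column names in the query, and cargo_for_train is a hypothetical helper name.

```python
import sqlite3

import pandas as pd

# Invented schema and rows that match the columns used in the question's query.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FreightCargo (Cargo_ID INTEGER PRIMARY KEY, Name TEXT, Weight REAL);
    CREATE TABLE Train (Train_ID INTEGER, Cargo_ID INTEGER, Assembly_date TEXT);
    INSERT INTO FreightCargo VALUES (1, 'Coal', 120.5), (2, 'Timber', 80.0);
    INSERT INTO Train VALUES (2, 1, '2020-01-15'), (3, 2, '2020-02-01');
    """
)


def cargo_for_train(conn, train_id):
    """Return the cargo rows linked to one train as a DataFrame."""
    sql = """SELECT F.Cargo_ID, F.Name, F.Weight, T.Train_ID, T.Assembly_date
             FROM FreightCargo F LEFT JOIN Train T
             ON F.Cargo_ID = T.Cargo_ID
             WHERE T.Train_ID = ?"""
    # params must be a sequence, hence the one-element tuple.
    return pd.read_sql_query(sql, conn, params=(train_id,))


# In an interactive script the ID could come from int(input('Train ID')).
result = cargo_for_train(conn, 2)
```

pandas takes the column names from the cursor description itself, so the names and rows variables are no longer needed.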

Getting SQLCODE=-104 on binding a parameter for DB2 query in Python

Assuming the data.xlsx looks like this:
Column_Name | Table_Name
CUST_ID_1 | Table_1
CUST_ID_2 | Table_2
Here are the SQLs that I'm trying to generate by using the bind_param for db2 in Python:
SELECT CUST_ID_1 FROM TABLE_1 WHERE CUST_ID_1 = 12345
SELECT CUST_ID_2 FROM TABLE_2 WHERE CUST_ID_2 = 12345
And this is how I'm trying to generate these queries:
import ibm_db
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
validate_sql = "SELECT ? FROM ? WHERE ?=12345"
validate_stmt = ibm_db.prepare(conn, validate_sql)
df = pd.read_excel("data.xlsx", sheet_name='Sheet1')
for i in df.index:
    ibm_db.bind_param(validate_stmt, 1, df['Column_Name'][i])
    ibm_db.bind_param(validate_stmt, 2, df['Table_Name'][i])
    ibm_db.bind_param(validate_stmt, 3, df['Column_Name'][i])
    ibm_db.execute(validate_stmt)
    validation_result = ibm_db.fetch_both(validate_stmt)
    while validation_result != False:
        print(validation_result[0])
        validation_result = ibm_db.fetch_both(validate_stmt)
When I try to execute this code, I'm hitting a SQLCODE=-104 error.
Any idea how the syntax should be for parameter binding?
Thanks,
Ganesh

There are two major errors.
1. You can't use a parameter marker for a table or column name (the 2nd and 3rd parameters). Identifiers have to be part of the SQL text before the statement is prepared.
2. You must specify the data type of a parameter marker when it cannot be inferred from the query (the 1st parameter), using something like CAST(? AS data-type). This is only for your information, though, since you are trying to use that marker as a column name, which is not possible as described in 1.
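A common workaround, sketched here as an assumption rather than as the only approach, is to put the identifiers into the SQL text after checking them and to keep a parameter marker only for the value. The helper build_validate_sql and its regular expression are hypothetical, and the demonstration runs on an in-memory SQLite table so that it can be executed without a DB2 server; with ibm_db the generated string would be handed to ibm_db.prepare instead.

```python
import re
import sqlite3

# Conservative pattern for plain identifiers; anything else is rejected.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_validate_sql(column, table):
    """Build the SELECT text with checked identifiers and one value marker."""
    for name in (column, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"SELECT {column} FROM {table} WHERE {column} = ?"


sql = build_validate_sql("CUST_ID_1", "Table_1")

# Demonstration on SQLite with a made-up table; only the value is bound.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 (CUST_ID_1 INTEGER)")
conn.executemany("INSERT INTO Table_1 VALUES (?)", [(12345,), (67890,)])
rows = conn.execute(sql, (12345,)).fetchall()
```

Checking the names matters because they come from a spreadsheet; formatting unchecked text into a query would open the door to SQL injection.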

Loop through columns - SQLAlchemy Core

I am trying to loop through the columns of all the tables in my database to select empty columns. I finally used raw sql and .format to get it to work, but how do I use SQLAlchemy to achieve the same result? Here is the code I've written:
from sqlalchemy import MetaData, create_engine, select
from sqlalchemy.sql import func
engine = create_engine('...')
conn = engine.connect()
tablemeta = MetaData(bind=engine, reflect=True)
for t in tablemeta.sorted_tables:
    for col in t.c:
        s = select([func.count(t.c[str(col)].distinct())])
        s = s.scalar()
        if s <= 1:
            print(s)
But this results in a KeyError.

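OK, I got it to work by indexing with col.name. str(col) returns the table-qualified name ("table.column"), which is not a key in t.c:
for t in tablemeta.sorted_tables:
    for col in t.c:
        s = select([func.count(t.c[col.name].distinct())])
        s = s.scalar()
        if s <= 1:
            print(s)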
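The code above relies on bound metadata and on passing a list to select(), which newer SQLAlchemy releases no longer accept. As a rough sketch, assuming SQLAlchemy 1.4 or later, the same loop could look like the following; the demo table, its rows and the helper name near_empty_columns are made up for illustration.

```python
from sqlalchemy import MetaData, create_engine, func, select, text

# Hypothetical in-memory database with one fully NULL column.
engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE demo (id INTEGER PRIMARY KEY, filled TEXT, empty TEXT)"))
    conn.execute(
        text("INSERT INTO demo (id, filled, empty) VALUES (1, 'a', NULL), (2, 'b', NULL)")
    )

# Reflection is an explicit call instead of MetaData(bind=..., reflect=True).
metadata = MetaData()
metadata.reflect(bind=engine)


def near_empty_columns(engine, metadata):
    """List (table, column, distinct_count) for columns with at most one distinct value."""
    found = []
    with engine.connect() as conn:
        for table in metadata.sorted_tables:
            for col in table.c:
                # The loop variable is already a Column, so no lookup by name is needed.
                stmt = select(func.count(col.distinct()))
                distinct_count = conn.execute(stmt).scalar()
                if distinct_count <= 1:
                    found.append((table.name, col.name, distinct_count))
    return found


candidates = near_empty_columns(engine, metadata)
```

COUNT(DISTINCT ...) ignores NULLs, so a column that contains only NULL reports 0.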
