Querying from Microsoft SQL to a Pandas Dataframe - python-3.x

I am trying to write a program in Python3 that will run a query on a table in Microsoft SQL and put the results into a Pandas DataFrame.
My first attempt was the code below, but for some reason I don't understand, the columns do not appear in the order I selected them in the query; the order they appear in (and therefore the labels they are given) changes between runs, which breaks the rest of my program:
import pandas as pd, pyodbc
result_port_mapl = []
# Use pyodbc to connect to SQL Database
con_string = 'DRIVER={SQL Server};SERVER=' + <server> + ';DATABASE=' + <database>
cnxn = pyodbc.connect(con_string)
cursor = cnxn.cursor()
# Run SQL Query
cursor.execute("""
SELECT <field1>, <field2>, <field3>
FROM result
""")
# Put data into a list
for row in cursor.fetchall():
    temp_list = [row[2], row[1], row[0]]
    result_port_mapl.append(temp_list)
# Make list of results into dataframe with column names
## FOR SOME REASON HERE row[1] AND row[0] DO NOT CONSISTENTLY APPEAR IN THE
## SAME ORDER AND SO THEY ARE MISLABELLED
result_port_map = pd.DataFrame(result_port_mapl, columns={'<field1>', '<field2>', '<field3>'})
I have also tried the following code
import pandas as pd, pyodbc
# Use pyodbc to connect to SQL Database
con_string = 'DRIVER={SQL Server};SERVER='+ <server> +';DATABASE=' + <database>
cnxn = pyodbc.connect(con_string)
cursor = cnxn.cursor()
# Run SQL Query
cursor.execute("""
SELECT <field1>, <field2>, <field3>
FROM result
""")
# Put data into DataFrame
# This becomes one column with a list in it with the three columns
# divided by a comma
result_port_map = pd.DataFrame(cursor.fetchall())
# Get column headers
# This gives the error "AttributeError: 'pyodbc.Cursor' object has no
# attribute 'keys'"
result_port_map.columns = cursor.keys()
If anyone could suggest why either of those errors is happening, or could provide a more efficient way to do it, it would be greatly appreciated.
Thanks

Why not just use read_sql? Like:
import pandas as pd, pyodbc
con_string = 'DRIVER={SQL Server};SERVER='+ <server> +';DATABASE=' + <database>
cnxn = pyodbc.connect(con_string)
query = """
SELECT <field1>, <field2>, <field3>
FROM result
"""
result_port_map = pd.read_sql(query, cnxn)
result_port_map.columns.tolist()
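As a side note, the scrambled labels in the first snippet most likely come from passing a Python set ({'<field1>', '<field2>', '<field3>'}) to columns, since sets have no defined order. If you want to keep the raw cursor, here is a minimal sketch (using the same placeholders as the question) that takes the labels from cursor.description, which lists the columns in SELECT order:
import pandas as pd, pyodbc
con_string = 'DRIVER={SQL Server};SERVER=' + <server> + ';DATABASE=' + <database>
cnxn = pyodbc.connect(con_string)
cursor = cnxn.cursor()
cursor.execute("SELECT <field1>, <field2>, <field3> FROM result")
# item 0 of each cursor.description tuple is the column name
columns = [col[0] for col in cursor.description]
rows = [list(row) for row in cursor.fetchall()]
result_port_map = pd.DataFrame(rows, columns=columns)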

Related

Pandas .to_sql fails silently randomly

I have several large pandas dataframes (about 30k+ rows) and need to upload a different version of them daily to a MS SQL Server db. I am trying to do so with the to_sql pandas function. On occasion, it will work. Other times, it will fail - silently - as if the code uploaded all of the data despite not having uploaded a single row.
Here is my code:
class SQLServerHandler(DataBaseHandler):
    ...
    def _getSQLAlchemyEngine(self):
        '''
        Get an sqlalchemy engine
        from the connection string
        The fast_executemany fails silently:
        https://stackoverflow.com/questions/48307008/pandas-to-sql-doesnt-insert-any-data-in-my-table/55406717
        '''
        # escape special characters as required by sqlalchemy
        dbParams = urllib.parse.quote_plus(self.connectionString)
        # create engine
        engine = sqlalchemy.create_engine(
            'mssql+pyodbc:///?odbc_connect={}'.format(dbParams))
        return engine

    @logExecutionTime('Time taken to upload dataframe:')
    def uploadData(self, tableName, dataBaseSchema, dataFrame):
        '''
        Upload a pandas dataFrame
        to a database table <tableName>
        '''
        engine = self._getSQLAlchemyEngine()
        dataFrame.to_sql(
            tableName,
            con=engine,
            index=False,
            if_exists='append',
            method='multi',
            chunksize=50,
            schema=dataBaseSchema)
Switching the method to None seems to work properly, but the data takes an insane amount of time to upload (30+ minutes). Having multiple tables (20 or so) of this size each day rules this solution out.
The proposed solution here of adding the schema as a parameter doesn't work. Neither does creating a sqlalchemy session and passing it to the con parameter with session.get_bind().
I am using:
ODBC Driver 17 for SQL Server
pandas 1.2.1
sqlalchemy 1.3.22
pyodbc 4.0.30
Does anyone know how to make it raise an exception if it fails?
Or why it is not uploading any data?
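Not a fix for the root cause, but one way to at least surface the failure: a sketch along the lines of the class above (the before/after count check is my addition, not part of the original code, and assumes sqlalchemy is imported as in _getSQLAlchemyEngine):
def uploadDataChecked(self, tableName, dataBaseSchema, dataFrame):
    '''
    Same upload as uploadData, but compare the table row count before and
    after the call and raise if fewer rows arrived than were sent.
    '''
    engine = self._getSQLAlchemyEngine()
    count_sql = sqlalchemy.text(
        'SELECT COUNT(*) FROM {}.{}'.format(dataBaseSchema, tableName))
    with engine.connect() as conn:
        before = conn.execute(count_sql).scalar()
    dataFrame.to_sql(
        tableName,
        con=engine,
        index=False,
        if_exists='append',
        method='multi',
        chunksize=50,
        schema=dataBaseSchema)
    with engine.connect() as conn:
        after = conn.execute(count_sql).scalar()
    if after - before != len(dataFrame):
        raise RuntimeError(
            'Expected {} new rows in {}.{}, found {}'.format(
                len(dataFrame), dataBaseSchema, tableName, after - before))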
In rebuttal to this answer: if to_sql() were to fall victim to the issue described in
SQL Server does not finish execution of a large batch of SQL statements
then it would have to be constructing large anonymous code blocks of the form
-- Note no SET NOCOUNT ON;
INSERT INTO gh_pyodbc_262 (id, txt) VALUES (0, 'row0');
INSERT INTO gh_pyodbc_262 (id, txt) VALUES (1, 'row1');
INSERT INTO gh_pyodbc_262 (id, txt) VALUES (2, 'row2');
…
and that is not what to_sql() is doing. If it were, then it would start to fail well below 1_000 rows, at least on SQL Server 2017 Express Edition:
import pandas as pd
import pyodbc
import sqlalchemy as sa
print(pyodbc.version) # 4.0.30
table_name = "gh_pyodbc_262"
num_rows = 400
print(f" num_rows: {num_rows}") # 400
cnxn = pyodbc.connect("DSN=mssqlLocal64", autocommit=True)
crsr = cnxn.cursor()
crsr.execute(f"TRUNCATE TABLE {table_name}")
sql = "".join(
[
f"INSERT INTO {table_name} ([id], [txt]) VALUES ({i}, 'row{i}');"
for i in range(num_rows)
]
)
crsr.execute(sql)
row_count = crsr.execute(f"SELECT COUNT(*) FROM {table_name}").fetchval()
print(f"row_count: {row_count}") # 316
Using to_sql() for that same operation works
import pandas as pd
import pyodbc
import sqlalchemy as sa
print(pyodbc.version) # 4.0.30
table_name = "gh_pyodbc_262"
num_rows = 400
print(f" num_rows: {num_rows}") # 400
df = pd.DataFrame(
    [(i, f"row{i}") for i in range(num_rows)], columns=["id", "txt"]
)
engine = sa.create_engine(
    "mssql+pyodbc://@mssqlLocal64", fast_executemany=True
)
df.to_sql(
    table_name,
    engine,
    index=False,
    if_exists="replace",
)
with engine.connect() as conn:
    row_count = conn.execute(
        sa.text(f"SELECT COUNT(*) FROM {table_name}")
    ).scalar()
print(f"row_count: {row_count}")  # 400
and indeed will work for thousands and even millions of rows. (I did a successful test with 5_000_000 rows.)
Ok, this seems to be an issue with SQL Server itself.
SQL Server does not finish execution of a large batch of SQL statements
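For completeness, a hedged sketch of the workaround hinted at by the "-- Note no SET NOCOUNT ON;" comment above: prepending SET NOCOUNT ON; suppresses the per-statement row-count messages that appear to cause the driver to stop consuming the batch early (same table, cursor, and num_rows as in the first snippet):
# same raw batch as above, but with row-count messages suppressed
sql = "SET NOCOUNT ON;" + "".join(
    [
        f"INSERT INTO {table_name} ([id], [txt]) VALUES ({i}, 'row{i}');"
        for i in range(num_rows)
    ]
)
crsr.execute(sql)
row_count = crsr.execute(f"SELECT COUNT(*) FROM {table_name}").fetchval()
print(f"row_count: {row_count}")  # expected: 400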

How to apply multiple whereclauses in sqlalchemy in dask while fetching a large dataset from Teradata

I am trying to fetch a larger dataset from Teradata using dask and sqlalchemy. I am able to apply a single whereclause and fetch data. Below is the working code:
td_engine = create_engine(connString)
metadata = MetaData()
t = Table(
    "table",
    metadata,
    Column("c1"),
    schema="schema",
)
sql = select([t]).where(
    t.c.c1 == 'abc',
)
start = perf_counter()
df = dd.read_sql_table(sql, connString, index_col="c1", schema="schema")
end = perf_counter()
print("Time taken to execute the code {}".format(end - start))
print(df.head())
but when I try to apply and_ in the whereclause I get an error:
sql = select([t]).where(
    and_(
        t.c.c1 == 'abc',
        t.c.c2 == 'xyz'
    )
)
More context would be helpful. If you simply need to execute the query, have you considered using the pandas read_sql function and composing the SQL request yourself?
import teradatasql
import pandas as pd
with teradatasql.connect(host="whomooz", user="guest", password="please") as con:
    df = pd.read_sql("select c1 from mytable where c1='abc' and c2='xyz'", con)
    print(df.head())
Or is there a specific need to use the pandas functions to construct the SQL request?
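If the sqlalchemy expression is only there to build the WHERE clause, one hedged option (a sketch; the table, column, and connString placeholders follow the question) is to compile the and_() expression to a plain SQL string and hand that to pandas, at the cost of losing dask's partitioned reading:
from sqlalchemy import create_engine, MetaData, Table, Column, select, and_
import pandas as pd

td_engine = create_engine(connString)
metadata = MetaData()
t = Table("table", metadata, Column("c1"), Column("c2"), schema="schema")
sql = select([t]).where(
    and_(
        t.c.c1 == 'abc',
        t.c.c2 == 'xyz'
    )
)
# render the expression as plain SQL with the literal values inlined
query = str(sql.compile(compile_kwargs={"literal_binds": True}))
df = pd.read_sql(query, td_engine)
print(df.head())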

Why are utf-8 emojis not getting rendered in my pandas dataframe when I read from the SQL database?

I have the following line to read from a csv file:
coronavirus_df = pd.read_csv('Path\coronavirus_March-3-2020.csv')
I have these other lines to read from MSSQL:
import pandas as pd
import pyodbc
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=MyServer;'
                      'Database=Mydb;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()
sql_tweets_df = pd.read_sql_query('SELECT * FROM my table',conn)
In both cases, I can get the data from the data sources and create a data frame, but there is an important difference:
coronavirus_df['text'].loc[9] gives the result:
-> 'YEP 👍some more text.'
sql_tweets_df['Text'].loc[9] gives this other result:
-> 'YEP ðŸ‘\x8d some more text'
Why is this happening? The emoji is not rendered when I'm getting the information from the database.
In both the database and in the Excel file, that record seems to be precisely the same.
I'm using Python 3 and Jupyter notebooks.
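No answer was recorded for this one, but a hedged guess based on the symptoms: 'ðŸ‘\x8d' is exactly what the four UTF-8 bytes of 👍 look like when they are decoded with a single-byte Windows code page, which the legacy 'SQL Server' ODBC driver tends to do. A sketch of the usual pyodbc suggestions (the driver name and encodings are assumptions to check against your setup):
import pandas as pd
import pyodbc

conn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};'  # newer driver, better Unicode handling
                      'Server=MyServer;'
                      'Database=Mydb;'
                      'Trusted_Connection=yes;')
# tell pyodbc how to decode CHAR/VARCHAR and NCHAR/NVARCHAR data coming back
conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-16le')

sql_tweets_df = pd.read_sql_query('SELECT * FROM my table', conn)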

PYODBC - Type Error: the first argument to execute must be a string or unicode query

I've been trying to connect to our ERP ODBC using PYODBC. Although I got the syntax correct, the only error I'm getting at this point is 'TypeError: the first argument to execute must be a string or unicode query'.
I've tried adding .decode('utf-8').
import pyodbc
import pandas as pd
conn = pyodbc.connect(
    'DRIVER={SQL Server};'
    'SERVER=192.168.1.30;'
    'DATABASE=Datamart;'
    'Trusted_Connection=yes;')
cursor = conn.cursor()
for row in cursor.tables(tableType='TABLE'):
    print(row)
sql = """SELECT * FROM ETL.Dim_FC_UPS_Interface_Detail"""
cursor.execute(row, sql)
df = pd.read_sql(sql, conn)
df.head()
I think your ordering of commands is off a bit for use of pyodbc cursor execute function. See the docs.
cursor = conn.cursor()
sql = """SELECT * FROM ETL.Dim_FC_UPS_Interface_Detail"""
cursor.execute(sql)
for row in cursor:
    print(row)
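Also worth noting: pd.read_sql executes the query itself, so if the goal is just the DataFrame, the cursor round-trip can be dropped entirely (a minimal sketch reusing the question's connection details):
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    'DRIVER={SQL Server};'
    'SERVER=192.168.1.30;'
    'DATABASE=Datamart;'
    'Trusted_Connection=yes;')

sql = "SELECT * FROM ETL.Dim_FC_UPS_Interface_Detail"
df = pd.read_sql(sql, conn)  # read_sql runs the query and builds the DataFrame
print(df.head())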

How to insert a row into a table in MS SQL using Python pandas

When trying to insert a row into a table in MS SQL using Python pandas, I got the error "'NoneType' object is not iterable" when trying to execute the INSERT query in Python. I use Python 3.6 and Microsoft SQL Server Management Studio 2008.
My code:
import pyodbc
import pandas as pd
server = 'ACER'
db = 'fin'
# Create the connection
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
# query db
sql = """INSERT INTO [fin].[dbo].[items] (itemdate, itemtype, name, amount) VALUES('2017-04-01','income','bonus',350) """
#df = pd.read_sql(sql, conn)
df = pd.read_sql(sql, conn)
print(df.to_string())
Somebody suggested using SET NOCOUNT ON, so I tried to modify the query to:
sql = """ SET NOCOUNT ON
---
INSERT INTO [fin].[dbo].[items] (itemdate, itemtype, name, amount) VALUES('2017-04-01','income','bonus',350) """.split("---")
but the execution failed.
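No answer was recorded here, but a hedged observation: pd.read_sql expects a statement that returns rows, and an INSERT returns no result set, which is a plausible source of the "'NoneType' object is not iterable" error. A minimal sketch of the usual cursor-based insert, reusing the connection details from the question:
import pyodbc

server = 'ACER'
db = 'fin'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
cursor = conn.cursor()
sql = """INSERT INTO [fin].[dbo].[items] (itemdate, itemtype, name, amount)
         VALUES ('2017-04-01', 'income', 'bonus', 350)"""
cursor.execute(sql)
conn.commit()  # pyodbc does not autocommit by default, so the insert must be committed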
