How to insert a row into a table in MS SQL using Python pandas - python-3.x

when trying to insert a row into a table in MS SQL using Python pandas, I got the error " 'nonetype' object is not iterable" when trying to execute the INSERT query in python.I use Python 3.6 and microsoft sql server management studio 2008
my code:
import pyodbc
import pandas as pd
server = 'ACER'
db = 'fin'
# Create the connection
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
# query db
sql = """INSERT INTO [fin].[dbo].[items] (itemdate, itemtype, name, amount) VALUES('2017-04-01','income','bonus',350) """
#df = pd.read_sql(sql, conn)
df = pd.read_sql(sql, conn)
print(df.to_string())
Somebody suggested using SET NOCOUNT ON, so I tried to modify the query to:
sql = """ SET NOCOUNT ON
---
INSERT INTO [fin].[dbo].[items] (itemdate, itemtype, name, amount) VALUES('2017-04-01','income','bonus',350) """.split("---")
but the execution failed.

Related

Pandas .to_sql fails silently randomly

I have several large pandas dataframes (about 30k+ rows) and need to upload a different version of them daily to a MS SQL Server db. I am trying to do so with the to_sql pandas function. On occasion, it will work. Other times, it will fail - silently - as if the code uploaded all of the data despite not having uploaded a single row.
Here is my code:
class SQLServerHandler(DataBaseHandler):
...
def _getSQLAlchemyEngine(self):
'''
Get an sqlalchemy engine
from the connection string
The fast_executemany fails silently:
https://stackoverflow.com/questions/48307008/pandas-to-sql-doesnt-insert-any-data-in-my-table/55406717
'''
# escape special characters as required by sqlalchemy
dbParams = urllib.parse.quote_plus(self.connectionString)
# create engine
engine = sqlalchemy.create_engine(
'mssql+pyodbc:///?odbc_connect={}'.format(dbParams))
return engine
#logExecutionTime('Time taken to upload dataframe:')
def uploadData(self, tableName, dataBaseSchema, dataFrame):
'''
Upload a pandas dataFrame
to a database table <tableName>
'''
engine = self._getSQLAlchemyEngine()
dataFrame.to_sql(
tableName,
con=engine,
index=False,
if_exists='append',
method='multi',
chunksize=50,
schema=dataBaseSchema)
Switching the method to None seems to work properly but the data takes an insane amount of time to upload (30+ mins). Having multiple tables (20 or so) a day of this size discards this solution.
The proposed solution here to add the schema as a parameter doesn't work. Neither does creating a sqlalchemy session and passsing it to the con parameter with session.get_bind().
I am using:
ODBC Driver 17 for SQL Server
pandas 1.2.1
sqlalchemy 1.3.22
pyodbc 4.0.30
Does anyone know how to make it raise an exception if it fails?
Or why it is not uploading any data?
In rebuttal to this answer, if to_sql() was to fall victim to the issue described in
SQL Server does not finish execution of a large batch of SQL statements
then it would have to be constructing large anonymous code blocks of the form
-- Note no SET NOCOUNT ON;
INSERT INTO gh_pyodbc_262 (id, txt) VALUES (0, 'row0');
INSERT INTO gh_pyodbc_262 (id, txt) VALUES (1, 'row1');
INSERT INTO gh_pyodbc_262 (id, txt) VALUES (2, 'row2');
…
and that is not what to_sql() is doing. If it were, then it would start to fail well below 1_000 rows, at least on SQL Server 2017 Express Edition:
import pandas as pd
import pyodbc
import sqlalchemy as sa
print(pyodbc.version) # 4.0.30
table_name = "gh_pyodbc_262"
num_rows = 400
print(f" num_rows: {num_rows}") # 400
cnxn = pyodbc.connect("DSN=mssqlLocal64", autocommit=True)
crsr = cnxn.cursor()
crsr.execute(f"TRUNCATE TABLE {table_name}")
sql = "".join(
[
f"INSERT INTO {table_name} ([id], [txt]) VALUES ({i}, 'row{i}');"
for i in range(num_rows)
]
)
crsr.execute(sql)
row_count = crsr.execute(f"SELECT COUNT(*) FROM {table_name}").fetchval()
print(f"row_count: {row_count}") # 316
Using to_sql() for that same operation works
import pandas as pd
import pyodbc
import sqlalchemy as sa
print(pyodbc.version) # 4.0.30
table_name = "gh_pyodbc_262"
num_rows = 400
print(f" num_rows: {num_rows}") # 400
df = pd.DataFrame(
[(i, f"row{i}") for i in range(num_rows)], columns=["id", "txt"]
)
engine = sa.create_engine(
"mssql+pyodbc://#mssqlLocal64", fast_executemany=True
)
df.to_sql(
table_name,
engine,
index=False,
if_exists="replace",
)
with engine.connect() as conn:
row_count = conn.execute(
sa.text(f"SELECT COUNT(*) FROM {table_name}")
).scalar()
print(f"row_count: {row_count}") # 400
and indeed will work for thousands and even millions of rows. (I did a successful test with 5_000_000 rows.)
Ok, this seems to be an issue with SQL Server itself.
SQL Server does not finish execution of a large batch of SQL statements

How to insert multiple rows of a pandas dataframe into Azure Synapse SQL DW using pyodbc?

I am using pyodbc to establish connection with Azure Synapse SQL DW. The connection is successfully established. However when it comes to inserting a pandas dataframe into the database, I am getting an error when I try inserting multiple rows as values. However, it works if I insert rows one by one. Inserting multiple rows together as values used to work fine with AWS Redshift and MS SQL, but fails with Azure Synapse SQL DW. I think the Azure Synapse SQL is T-SQL and not MS-SQL. Nonetheless, I am unable to find any relevant documentation as well.
I have a pandas df named 'df' that looks like this:
student_id admission_date
1 2019-12-12
2 2018-12-08
3 2018-06-30
4 2017-05-30
5 2020-03-11
This code below works fine
import pandas as pd
import pyodbc
#conn object below is the pyodbc 'connect' object
batch_size = 1
i = 0
chunk = df[i:i+batch_size]
conn.autocommit = True
sql = 'insert INTO {} values {}'.format('myTable', ','.join(
str(e) for e in zip(chunk.student_id.values, chunk.admission_date.values.astype(str))))
print(sql)
cursor = conn.cursor()
cursor.execute(sql)
As you can see, it's inserting just 1 row of the 'df'. So, yes, I can loop through and insert one by one but it takes hell lot of time when it comes dataframes of larger sizes
This code below doesn't work when I try to insert all rows together
import pandas as pd
import pyodbc
batch_size = 5
i = 0
chunk = df[i:i+batch_size]
conn.autocommit = True
sql = 'insert INTO {} values {}'.format('myTable', ','.join(
str(e) for e in zip(chunk.student_id.values, chunk.admission_date.values.astype(str))))
print(sql)
cursor = conn.cursor()
cursor.execute(sql)
The error I get this one below:
ProgrammingError: ('42000', "[42000]
[Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Parse error at
line: 1, column: 74: Incorrect syntax near ','. (103010)
(SQLExecDirectW)")
This is the sample SQL query for 2 rows which fails:
insert INTO myTable values (1, '2009-12-12'),(2, '2018-12-12')
That's because Azure Synapse SQL does not support multi-row insert via the values constructor.
One work around is to chain "select (value list) union all". Your pseudo SQL should look like so:
insert INTO {table}
select {chunk.student_id.values}, {chunk.admission_date.values.astype(str)} union all
...
select {chunk.student_id.values}, {chunk.admission_date.values.astype(str)}
COPY statement in Azure Synapse Analytics is a better way for loading your data in Synapse SQL Pool.
COPY INTO test_parquet
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.parquet'
WITH (
FILE_FORMAT = myFileFormat,
CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>')
)
You can save your pandas dataframe into blob storage, and then trigger the copy command using execute method.

PYODBC - Type Error: the first argument to execute must be a string or unicode query

Been trying to connect our ERP ODBC by using PYODBC, Although I got the syntax correct the only error I'm getting at this point is this 'TypeError: the first argument to execute must be a string or unicode query'
I've tried adding .decode('utf-8').
import pyodbc
import pandas as pd
conn = pyodbc.connect(
'DRIVER={SQL Server};'
'SERVER=192.168.1.30;'
'DATABASE=Datamart;'
'Trusted_Connection=yes;')
cursor = conn.cursor()
for row in cursor.tables(tableType='TABLE'):
print(row)
sql = """SELECT * FROM ETL.Dim_FC_UPS_Interface_Detail"""
cursor.execute(row, sql)
df = pd.read_sql(sql, conn)
df.head()
I think your ordering of commands is off a bit for use of pyodbc cursor execute function. See the docs.
cursor = conn.cursor()
sql = """SELECT * FROM ETL.Dim_FC_UPS_Interface_Detail"""
cursor.execute(sql)
for row in cursor:
print(row)

Querying from Microsoft SQL to a Pandas Dataframe

I am trying to write a program in Python3 that will run a query on a table in Microsoft SQL and put the results into a Pandas DataFrame.
My first try of this was the below code, but for some reason I don't understand the columns do not appear in the order I ran them in the query and the order they appear in and the labels they are given as a result change, stuffing up the rest of my program:
import pandas as pd, pyodbc
result_port_mapl = []
# Use pyodbc to connect to SQL Database
con_string = 'DRIVER={SQL Server};SERVER='+ <server> +';DATABASE=' +
<database>
cnxn = pyodbc.connect(con_string)
cursor = cnxn.cursor()
# Run SQL Query
cursor.execute("""
SELECT <field1>, <field2>, <field3>
FROM result
""")
# Put data into a list
for row in cursor.fetchall():
temp_list = [row[2], row[1], row[0]]
result_port_mapl.append(temp_list)
# Make list of results into dataframe with column names
## FOR SOME REASON HERE row[1] AND row[0] DO NOT CONSISTENTLY APPEAR IN THE
## SAME ORDER AND SO THEY ARE MISLABELLED
result_port_map = pd.DataFrame(result_port_mapl, columns={'<field1>', '<field2>', '<field3>'})
I have also tried the following code
import pandas as pd, pyodbc
# Use pyodbc to connect to SQL Database
con_string = 'DRIVER={SQL Server};SERVER='+ <server> +';DATABASE=' + <database>
cnxn = pyodbc.connect(con_string)
cursor = cnxn.cursor()
# Run SQL Query
cursor.execute("""
SELECT <field1>, <field2>, <field3>
FROM result
""")
# Put data into DataFrame
# This becomes one column with a list in it with the three columns
# divided by a comma
result_port_map = pd.DataFrame(cursor.fetchall())
# Get column headers
# This gives the error "AttributeError: 'pyodbc.Cursor' object has no
# attribute 'keys'"
result_port_map.columns = cursor.keys()
If anyone could suggest why either of those errors are happening or provide a more efficient way to do it, it would be greatly appreciated.
Thanks
If you just use read_sql? Like:
import pandas as pd, pyodbc
con_string = 'DRIVER={SQL Server};SERVER='+ <server> +';DATABASE=' + <database>
cnxn = pyodbc.connect(con_string)
query = """
SELECT <field1>, <field2>, <field3>
FROM result
"""
result_port_map = pd.read_sql(query, cnxn)
result_port_map.columns.tolist()

How to use passed parameter as table Name in Select query python?

i have the following function which extracts data from table, but i want to pass the table name in function as parameter...
def extract_data(table):
try:
tableName = table
conn_string = "host='localhost' dbname='Aspentiment' user='postgres' password='pwd'"
conn=psycopg2.connect(conn_string)
cursor = conn.cursor()
cursor.execute("SELECT aspects_name, sentiments FROM ('%s') " %(tableName))
rows = cursor.fetchall()
return rows
finally:
if conn:
conn.close()
when i call function as extract_data(Harpar) : Harpar is table name
but it give an error that 'Harpar' is not defined.. any hepl ?
Update: As of psycopg2 version 2.7:
You can now use the sql module of psycopg2 to compose dynamic queries of this type:
from psycopg2 import sql
query = sql.SQL("SELECT aspects_name, sentiments FROM {}").format(sql.Identifier(tableName))
cursor.execute(query)
Pre < 2.7:
Use the AsIs adapter along these lines:
from psycopg2.extensions import AsIs
cursor.execute("SELECT aspects_name, sentiments FROM %s;",(AsIs(tableName),))
Without the AsIs adapter, psycopg2 will escape the table name in your query.

Resources