Python pymssql insert error

My table has 24 columns; around half of them are of float datatype. The insert statement specifies all 24 fields, but I have truncated it here.
csv_data = csv.reader(file('filename.csv'))
for row in csv_data:
    cursor.execute('insert into ddreplication (CTX, Mode,...,Max_repl_streams) values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', tuple(row))
Error:
File "pymssql.pyx", line 467, in pymssql.Cursor.execute
(pymssql.c:7561)
pymssql.OperationalError: (8114, 'Error converting data type varchar to float.DB-Lib error message 20018, severity 16:\nGeneral SQL
Server error: Check messages from the SQL Server\n')
I have almost the same code in another script, which runs fine without any issues.
Output of "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='ddreplication' ORDER BY ORDINAL_POSITION"
[(u'CTX', u'int'), (u'Mode', u'nvarchar'), (u'Destination', u'nvarchar'), (u'Connection_Host', u'nvarchar'), (u'Enabled', u'nvarchar'), (u'Low_bandwidth_optimization', u'nvarchar'), (u'Replication_encryption', u'nvarchar'), (u'Replication_propagate_retention_lock', u'nvarchar'), (u'Local_fs_status', u'nvarchar'), (u'Connection', u'nvarchar'), (u'State', u'nvarchar'), (u'Error', u'nvarchar'), (u'Network_bytes_to_destination', u'float'), (u'PreComp_bytes_written_to_source', u'float'), (u'PreComp_bytes_sent_to_destination', u'float'), (u'Bytes_after_synthetic_optimization', u'float'), (u'Bytes_after_filtering_by_destination', u'float'), (u'Bytes_after_low_bandwidth_optimization', u'float'), (u'Bytes_after_local_comp', u'float'), (u'PreComp_bytes_remaining', u'float'), (u'Compression_ration', u'float'), (u'Synced_as_of_time', u'nvarchar'), (u'Current_throttle', u'nvarchar'), (u'Max_repl_streams', u'nvarchar')]

The function below helped me find the errors in my original data. It gives me the data type of each value I'm trying to insert and the count of values being passed into the insert statement. With that I managed to get the data inserted into the DB, excluding the problem record.
Now I'm working on fixing the problem record and getting it into the DB.
def is_float(s):
    try:
        if s is None:
            return False
        float(s)
        return True
    except ValueError:
        return False

print(len(row))
for i, v in enumerate(row):
    print(i, is_float(v), v)

As you have discovered, the troublesome line in your source data contains 'N/A' for the float column "Compression_ration", and 'N/A' cannot be parsed to a numeric value. You can replace 'N/A' with None to insert a null by changing your query parameters from
tuple(row)
to
tuple(None if x == 'N/A' else x for x in row)
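For example, the substitution can be checked on a small sample row (the values here are made up) before wiring it into the insert loop:

```python
# Map the CSV's 'N/A' sentinel to None so pymssql sends SQL NULL
# instead of a varchar value that can't be converted to float.
row = ['5', 'collection', '1024.0', 'N/A', '00:00:00']   # sample row
params = tuple(None if x == 'N/A' else x for x in row)
print(params)   # ('5', 'collection', '1024.0', None, '00:00:00')
```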


Using parameters in query to search between dates

Silly newbie here.
So I'm banging my head on this:
I can't quite figure out the parameterized query and whether it's properly formatted.
import sqlite3

def readSqliteTable():
    try:
        sqliteConnection = sqlite3.connect('testDB.sqlite')
        cursor = sqliteConnection.cursor()
        print("Connected to SQLite")
        startdate = "2022-11-05"
        enddate = "2022-11-25"
        print("startdate =", startdate, "enddate =", enddate)
        cursor.execute("SELECT * FROM tz WHERE UL_Time BETWEEN '%s' AND '%s'" % (startdate, enddate))
        records = cursor.fetchall()
        print("Total rows are: ", len(records))
        print("Printing each row")
        for row in records:
            print("Id: ", row[0])
            print("Updated: ", row[1])
            print("Title: ", row[2])
            print("UL_Time: ", row[3])
            print("Size: ", row[4])
            print("\n")
        cursor.close()
    except sqlite3.Error as error:
        print("Failed to read data from sqlite table", error)
    finally:
        if sqliteConnection:
            sqliteConnection.close()
            print("The SQLite connection is closed")
It works fine if I substitute arbitrary dates as:
cursor.execute("SELECT * FROM tz WHERE UL_Time BETWEEN 2022-11-01 AND 2022-11-25")
but won't work in this form
First of all, that is not a parameterized query; read the official Python documentation and work through some tutorials.
Your expression
"SELECT * FROM tz WHERE UL_Time BETWEEN '%s' AND '%s'" % (startdate, enddate)
is string interpolation and has nothing to do with parameterized queries. It builds a textual filter, which does work correctly as long as all your dates are formatted as YYYY-MM-DD.
The second query is meaningless: the unquoted dates are evaluated as integer arithmetic, so the WHERE clause becomes an integer filter:
WHERE UL_Time BETWEEN 2010 AND 1986
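For comparison, a parameterized version of the same query uses sqlite3's ? placeholders and passes the dates separately. A minimal self-contained sketch with an in-memory database and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE tz (Id INTEGER, UL_Time TEXT)")
cur.executemany("INSERT INTO tz VALUES (?, ?)",
                [(1, '2022-11-01'), (2, '2022-11-10'), (3, '2022-12-01')])

# '?' placeholders: sqlite3 binds the values itself, so no quoting is
# needed and the dates cannot be misread as arithmetic expressions.
startdate, enddate = '2022-11-05', '2022-11-25'
cur.execute("SELECT Id FROM tz WHERE UL_Time BETWEEN ? AND ?",
            (startdate, enddate))
rows = cur.fetchall()
print(rows)   # [(2,)]
conn.close()
```

Because ISO dates sort lexicographically, the textual BETWEEN behaves like a date range as long as every value is stored as YYYY-MM-DD.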

Solution found: I'm getting this error when executing my code --> TypeError: not enough arguments for format string

Not sure how to fix this problem. When I step through with the debugger, it is the execute call that fails.
# this function updates the record chosen by the user
def update_record(vector_to_update):
    logging.debug("Executing: update_record")
    try:
        with mydb.cursor() as cursor:
            # prepared statement to update a record in the database
            update_query = ('UPDATE myTable SET ref_date = %s, geo = %s, sex = %s, age_group = %s, '
                            'student_response = %s, uom = %s, uom_id = %s, scalar_factor = %s, scalar_id = %s, '
                            'vector = %s, coordinate = %s, value_field = %s, decimals = %s WHERE vector = "%s"')
            # calls on the user_input function in the dataModel file and stores input in "data"
            data = dataModel.user_input()
            # execute the query using the vector_to_update in the query
            cursor.execute(update_query, (data, vector_to_update))
            # commit changes to database
            mydb.commit()
            print('Updating data for vector: {}'.format(vector_to_update))
            cursor.close()
    except pymysql.DatabaseError as error:
        # if no connection to database
        print("Update record failed to execute {}".format(error))
        # tells the user the input is invalid and goes back thru the delete_record function
        menu_update_record()
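A likely cause, for reference: the statement contains 14 %s placeholders, but execute() received a 2-tuple in which the whole data sequence counts as a single parameter, hence "not enough arguments for format string". A sketch of the fix (the sample values are made up; it assumes dataModel.user_input() returns the 13 column values in order), which also drops the quotes around the final %s since pymysql quotes string parameters itself:

```python
update_query = ('UPDATE myTable SET ref_date = %s, geo = %s, sex = %s, age_group = %s, '
                'student_response = %s, uom = %s, uom_id = %s, scalar_factor = %s, scalar_id = %s, '
                'vector = %s, coordinate = %s, value_field = %s, decimals = %s WHERE vector = %s')

# stand-in for dataModel.user_input(): 13 values, one per SET column
data = ['2015-01', 'Canada', 'Both sexes', '15 to 19 years', 'Yes',
        'Persons', '249', 'units', '0', 'v123', '1.1.1', '42', '0']
vector_to_update = 'v123'

# flatten: the 13 SET values plus the single WHERE value
params = tuple(data) + (vector_to_update,)
print(update_query.count('%s'), len(params))   # 14 14
# cursor.execute(update_query, params)         # now matches the placeholder count
```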

psycopg2 doesn't save data in to postgres database

So I am web scraping certain articles from a news site, and I am using psycopg2 to connect to a Postgres database and save data from each article.
with conn.cursor() as cur:
    query = """INSERT INTO
        articles (title, article_body, author, author_title, source_date, "createdAt", "updatedAt")
        VALUES (%s, %s, %s, %s, %s, %s, %s);"""
    cur.execute(query, (articleTitle, parsedText, articleAuthor, articleAuthorTitle, articlePostDate, now, now))
    cur.execute('SELECT author FROM articles')
    rows = cur.fetchall()
    print('')
    print(rows)
    print('')
The thing is that when the second query is executed it returns the data from the articles table, but when I query through the psql terminal it shows that the articles table is empty.
The INSERT runs inside a transaction that is never committed, so Postgres discards it when the connection closes. Your own session can see its uncommitted row, which is why the SELECT works, but psql (a different session) cannot. Call conn.commit() after the INSERT:
with conn.cursor() as cur:
    query = """INSERT INTO
        articles (title, article_body, author, author_title, source_date, "createdAt", "updatedAt")
        VALUES (%s, %s, %s, %s, %s, %s, %s);"""
    cur.execute(query, (articleTitle, parsedText, articleAuthor, articleAuthorTitle, articlePostDate, now, now))
    conn.commit()
    cur.execute('SELECT author FROM articles')
    rows = cur.fetchall()
    print('')
    print(rows)
    print('')

How to avoid SQL injection if I insert data from CSV file with variables in Python3, Pymsql?

My ENV is:
MySQL (MariaDB) version 5.5.56
Python version 3.6
Situation:
I have a telephone statistics CSV file that is generated every day, and I need to insert its data into my MySQL DB.
Type: Extension Statistic Report,,,,,,,,
From 2018/4/17 上午 12:00:00 To 2018/4/18 上午 12:00:00
Agent Extension: Any number
,,,,,,,,
Agent Extension,,Inbound,,Outbound,,Total,,Total Talking time
,, Answered,Unanswered,Answered,Unanswered,Answered,Unanswered,
100 MeetingRoom,,0,0,0,0,0,0,00:00:00
101 Build,,0,0,0,0,0,0,00:00:00
102 Lead,,0,0,2.00,1.00,2.00,1.00,01:36:09
103 Discover,,0,0,0,0,0,0,00:00:00
105 Fatto,,1.00,0,28.00,9.00,29.00,9.00,01:07:27
106 Meditare,,0,0,0,0,0,0,00:00:00
Total:,,122.00,41.00,152.00,49.00,274.00,90.00,10h 43m 17s
This is my Code:
import csv, sys, os
import pymysql
from datetime import datetime, timedelta

# DB Config
dbconn = pymysql.connect(host='192.168.X.X',
                         port=3306,
                         user='root',
                         passwd='********',
                         db='test',
                         charset='utf8')
cursor = dbconn.cursor()

# Get a date relative to today.
def get_date(d):
    toDay = timedelta(days=d)
    yesDay = datetime.now() + toDay
    return yesDay.strftime("%Y%m%d")

# Get the date str values.
yesterday = get_date(-1)
beforeyesterday = get_date(-2)

with open("/Users/fiona/Downloads/statistics_1704_v1nNHbvGjnIQ2mVwsMLr.csv") as file:
    readCSV = csv.reader(file)
    extensionCodes = []      # Store extension Number
    usersName = []           # Store User Name
    inboundsAnswered = []    # Store Inbound Answered
    inboundsUnanswered = []  # Store Inbound Unanswered
    outboundsAnswered = []   # Store Outbound Answered
    outboundsUnanswered = [] # Store Outbound Unanswered
    totalsAnswered = []      # Store Total Answered
    totalsUnanswered = []    # Store Total Unanswered
    totalsTalkingTime = []   # Store Total Talking time
    for index, rows in enumerate(readCSV):
        if index not in range(0, 7) and rows[0] != "":
            if str(rows[0])[:3] != "Tot":
                extensionCode = str(rows[0])[:3]  # Store every row's extension number
            elif str(rows[0])[:5] == "Total":
                break
            userName = rows[0]  # Store every row's name
            inboundAnswered = float(rows[2])
            inboundUnanswered = float(rows[3])
            outboundAnswered = float(rows[4])
            outboundUnanswered = float(rows[5])
            totalAnswered = float(rows[6])
            totalUnanswered = float(rows[7])
            totalTalkingTime = rows[8]
            sql = """
            INSERT INTO
                test (extension_number, username, inbound_answered, inbound_unanswered,
                      outbound_answered, outbound_unanswered, total_answered, total_unanswered,
                      total_talking_time, createtime)
            VALUES
                (%d, %s, %d, %d, %d, %d, %d, %d, %s, %s);
            """ % (int(extensionCode), "'" + userName + "'", int(inboundAnswered), int(inboundUnanswered),
                   int(outboundAnswered), int(outboundUnanswered), int(totalAnswered),
                   int(totalUnanswered), "'" + totalTalkingTime + "'", yesterday)
            print(sql)  # Testing SQL Syntax
            cursor.execute(sql)

dbconn.commit()
cursor.close()
dbconn.close()
Using the above code I can insert my data into the DB, but I also want to solve the SQL injection problem. So I have done some research and changed my code, but still cannot get it working.
Python best practice and securest to connect to MySQL and execute queries
How can I escape the input to a MySQL db in Python3?
How to use variables in SQL statement in Python?
Python MySQL Parameterized Queries
Now I know that if I want to avoid SQL injection, I cannot use % to interpolate my values; I have to pass them to execute() as a separate argument (with a comma).
But when I pass them that way, the values seem to be treated as strings, which makes my %d placeholders fail.
My DB design is shown in a screenshot (picture not reproduced here).
Is there anyone who can give me some advice or direction?
Thank you for your help!
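As an aside, the %d failure can be demonstrated without a database: printf-style %d is applied by Python itself and rejects the strings that csv.reader produces, whereas the driver-level %s placeholder is not printf formatting at all (pymysql converts and escapes each parameter itself). A small illustration with a made-up value:

```python
# csv.reader yields strings, so a printf-style %d fails in plain Python
# before pymysql ever sees the statement:
value = "28.00"          # as read from the CSV
try:
    "VALUES (%d)" % value
    raised = False
except TypeError:        # %d requires a number, not a str
    raised = True
print(raised)            # True

# pymysql's %s placeholder is different: the driver converts and escapes
# every parameter itself, so %s works for numeric columns too.
```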
Update 1:
if I use reference 4.
sql = """
    INSERT INTO test (extension_number, username, inbound_answered, inbound_unanswered,
                      outbound_answered, outbound_unanswered, total_answered, total_unanswered,
                      total_talking_time, createtime)
    VALUES (%d, %s, %d, %d, %d, %d, %d, %d, %s, %s)
    """, (int(extensionCode), userName, int(inboundAnswered), int(inboundUnanswered),
          int(outboundAnswered), int(outboundUnanswered), int(totalAnswered),
          int(totalUnanswered), totalTalkingTime, yesterday)
it shows:
packet = prelude + sql[:packet_size-1]
TypeError: can't concat tuple to bytes
('\n INSERT INTO test (extension_number, username, inbound_answered, inbound_unanswered, \n outbound_answered, outbound_unanswered, total_answered, total_unanswered, \n total_talking_time, createtime)\n VALUES (%d, %s, %d, %d, %d, %d, %d, %d, %s, %s)\n ', (100, 'MeetingRoom', 0, 0, 0, 0, 0, 0, '00:00:00', '20180423'))
Process finished with exit code 1
Update 2:
I tried another way,
sql = "INSERT INTO test (extension_number, username, inbound_answered, inbound_unanswered, " \
      "outbound_answered, outbound_unanswered, total_answered, total_unanswered, total_talking_time, " \
      "createtime) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", \
      (int(extensionCode), userName, int(inboundAnswered), int(inboundUnanswered),
       int(outboundAnswered), int(outboundUnanswered), int(totalAnswered),
       int(totalUnanswered), totalTalkingTime, yesterday)
cursor.execute(sql)
but it's still not working:
packet = prelude + sql[:packet_size-1]
TypeError: can't concat tuple to bytes
Update 3:
Finally, I find out the way,
sql = "INSERT INTO test (extension_number, username, inbound_answered, " \
      "inbound_unanswered, outbound_answered, outbound_unanswered, " \
      "total_answered, total_unanswered, total_talking_time, createtime) " \
      "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
data = (extensionCode, userName, inboundAnswered, inboundUnanswered,
        outboundAnswered, outboundUnanswered, totalAnswered,
        totalUnanswered, totalTalkingTime, yesterday)
cursor.execute(sql, data)
So it seems that if I want to use variables in cursor.execute(), I have to pass the SQL statement and the values separately.
If I want them in a single call, I can pass both directly to cursor.execute(); double quotes and triple quotes both work,
such as:
cursor.execute("""INSERT INTO test (extension_number, username, inbound_answered, inbound_unanswered,
                  outbound_answered, outbound_unanswered, total_answered, total_unanswered, total_talking_time,
                  createtime) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)""",
               (extensionCode, userName, inboundAnswered, inboundUnanswered, outboundAnswered,
                outboundUnanswered, totalAnswered, totalUnanswered, totalTalkingTime, yesterday))
Which is more secure: separating the SQL statement and the values, or putting them all together in cursor.execute()?
Thank you for your advice, let me find the right direction!

Why is my data insertion in my cassandra database so slow?

This is how I query whether the current data ID is already present in the Cassandra database:
row = session.execute("SELECT * FROM articles where id = %s", [id])
I read messages from Kafka and then determine whether each message already exists in the Cassandra database: if it does not exist, it should be inserted; if it does exist, it should not be.
messages = consumer.get_messages(count=25)
if len(messages) == 0:
    print 'IDLE'
    sleep(1)
    continue
for message in messages:
    try:
        message = json.loads(message.message.value)
        data = message['data']
        if data:
            for article in data:
                source = article['source']
                id = article['id']
                title = article['title']
                thumbnail = article['thumbnail']
                #url = article['url']
                text = article['text']
                print article['created_at'], type(article['created_at'])
                created_at = parse(article['created_at'])
                last_crawled = article['last_crawled']
                channel = article['channel']  # userid
                category = article['category']
                #scheduled_for = created_at.replace(minute=created_at.minute + 5, second=0, microsecond=0)
                scheduled_for = (datetime.utcnow() + timedelta(minutes=5)).replace(second=0, microsecond=0)
                row = session.execute("SELECT * FROM articles where id = %s", [id])
                if len(list(row)) == 0:
                    # id parse base62
                    ids = [id[0:2], id[2:9], id[9:16]]
                    idstr = ''
                    for argv in ids:
                        num = int(argv)
                        idstr = idstr + encode(num)
                    url = 'http://weibo.com/%s/%s?type=comment' % (channel, idstr)
                    session.execute("INSERT INTO articles(source, id, title, thumbnail, url, text, created_at, last_crawled, channel, category) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", (source, id, title, thumbnail, url, text, created_at, scheduled_for, channel, category))
                    session.execute("INSERT INTO schedules(source, type, scheduled_for, id) VALUES (%s, %s, %s, %s) USING TTL 86400", (source, 'article', scheduled_for, id))
                    log.info('%s %s %s %s %s %s %s %s %s %s' % (source, id, title, thumbnail, url, text, created_at, scheduled_for, channel, category))
    except Exception, e:
        log.exception(e)
        #log.info('error %s %s' % (message['url'], body))
        print e
        continue
Edit:
Each ID should map to exactly one row in the table, which is what I want. As soon as I add different scheduled_for times for the same ID, my system crashes. Adding if len(list(row)) == 0: was the right idea, but my system is very slow with that check.
This is my table description:
DROP TABLE IF EXISTS schedules;
CREATE TABLE schedules (
    source text,
    type text,
    scheduled_for timestamp,
    id text,
    PRIMARY KEY (source, type, scheduled_for, id)
);
This scheduled_for is changeable. Here is a concrete example:
Hao article 2016-01-12 02:09:00+0800 3930462206848285
Hao article 2016-01-12 03:09:00+0801 3930462206848285
Hao article 2016-01-12 04:09:00+0802 3930462206848285
Hao article 2016-01-12 05:09:00+0803 3930462206848285
Thanks for your replies!
Why don't you use INSERT ... IF NOT EXISTS?
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/insert_r.html
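A lightweight-transaction INSERT moves the existence check to the server, so the extra SELECT round trip per message disappears. A minimal sketch of the statement (same columns as the question's articles table; the session call is commented out because it needs a live cluster):

```python
# INSERT ... IF NOT EXISTS: Cassandra only applies the write when no row
# with this primary key exists, replacing the SELECT-then-INSERT pattern.
insert_cql = (
    "INSERT INTO articles (source, id, title, thumbnail, url, text, "
    "created_at, last_crawled, channel, category) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s) IF NOT EXISTS"
)

# result = session.execute(insert_cql, params)
# the returned row's "[applied]" column is True only when the insert happened
print(insert_cql.count("%s"))   # 10
```

Note that lightweight transactions have their own cost (a Paxos round per statement), but they remove the client-side read-before-write and the race between checking and inserting.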
