So I am web scraping from a news site for certain articles. And I am using psycopg2 to connect to postgres database and save data from this article.
with conn.cursor() as cur:
query = """INSERT INTO
articles (title, article_body, author, author_title, source_date, "createdAt", "updatedAt")
VALUES (%s, %s, %s, %s, %s, %s, %s);"""
cur.execute(query, (articleTitle, parsedText, articleAuthor, articleAuthorTitle, articlePostDate, now, now))
cur.execute('SELECT author FROM articles')
rows = cur.fetchall()
print ('')
print (rows)
print ('')
The thing is that when second query is executed, it returns the data from the articles table, but when I make a query through terminal psql it shows that articles table is empty.
Try this. Hope it helps.
with conn.cursor() as cur:
query = """INSERT INTO
articles (title, article_body, author, author_title, source_date, "createdAt", "updatedAt")
VALUES (%s, %s, %s, %s, %s, %s, %s);"""
cur.execute(query, (articleTitle, parsedText, articleAuthor, articleAuthorTitle, articlePostDate, now, now))
conn.commit()
cur.execute('SELECT author FROM articles')
rows = cur.fetchall()
print ('')
print (rows)
print ('')
My ENV is:
MySQL(mariadb) DB Version is 5.5.56
Python3 Version is 3.6
Situation:
I have a telephone statistics CSV file that will generate everyday, and I need to insert those data in my MYSQL DB.
Type: Extension Statistic Report,,,,,,,,
From 2018/4/17 上午 12:00:00 To 2018/4/18 上午 12:00:00
Agent Extension: Any number
,,,,,,,,
Agent Extension,,Inbound,,Outbound,,Total,,Total Talking time
,, Answered,Unanswered,Answered,Unanswered,Answered,Unanswered,
100 MeetingRoom,,0,0,0,0,0,0,00:00:00
101 Build,,0,0,0,0,0,0,00:00:00
102 Lead,,0,0,2.00,1.00,2.00,1.00,01:36:09
103 Discover,,0,0,0,0,0,0,00:00:00
105 Fatto,,1.00,0,28.00,9.00,29.00,9.00,01:07:27
106 Meditare,,0,0,0,0,0,0,00:00:00
Total:,,122.00,41.00,152.00,49.00,274.00,90.00,10h 43m 17s
This is my Code:
import csv, sys, os
import pymysql
from datetime import datetime, timedelta
# DB Config
dbconn = pymysql.connect(host='192.168.X.X',
port=3306,
user='root',
passwd='********',
db='test',
charset='utf8')
cursor = dbconn.cursor()
# Get today's date.
def get_date(d):
toDay = timedelta(days=d)
yesDay = datetime.now() + toDay
return yesDay.strftime("%Y%m%d")
# Get today's str value.
yesterday = get_date(-1)
beforeyesterday = get_date(-2)
with open("/Users/fiona/Downloads/statistics_1704_v1nNHbvGjnIQ2mVwsMLr.csv") as file:
readCSV = csv.reader(file)
extensionCodes = [] # Store extension Number
usersName = [] # Store User Name
inboundsAnswered = [] # Store Inbound Answered
inboundsUnanswered = [] # Store Inbound Unanswered
outboundsAnswered = [] # Store Outbound Answered
outboundsUnanswered = [] # Store Outbound Unanswered
totalsAnswered = [] # Store Total Answered
totalsUnanswered = [] # Store Total Unanswered
totalsTalkingTime = [] # Store Total Talking time
for index, rows in enumerate(readCSV):
if index not in range(0, 7) and rows[0] != "":
if str(rows[0])[:3] != "Tot":
extensionCode = str(rows[0])[:3] # Store every rows extension number
elif str(rows[0])[:5] == "Total":
break
userName = rows[0] # Store every rows name
inboundAnswered = float(rows[2])
inboundUnanswered = float(rows[3])
outboundAnswered = float(rows[4])
outboundUnanswered = float(rows[5])
totalAnswered = float(rows[6])
totalUnanswered = float(rows[7])
totalTalkingTime = rows[8]
sql = """
INSERT INTO
test (extension_number, username, inbound_answered, inbound_unanswered,
outbound_answered, outbound_unanswered, total_answered, total_unanswered,
total_talking_time, createtime)
VALUES
(%d, %s, %d, %d, %d, %d, %d, %d, %s, %s);
""" % (int(extensionCode), "'"+userName+"'", int(inboundAnswered), int(inboundUnanswered),
int(outboundAnswered), int(outboundUnanswered), int(totalAnswered),
int(totalUnanswered), "'"+totalTalkingTime+"'", yesterday)
print(sql) # Testing SQL Syntax
cursor.execute(sql)
dbconn.commit()
cursor.close()
dbconn.close()
Using above code I can insert my data into DB, but I also want to save the SQL injection problem. So I have done some research and change my code, but still can not successful.
Python best practice and securest to connect to MySQL and execute queries
How can I escape the input to a MySQL db in Python3?
How to use variables in SQL statement in Python?
Python MySQL Parameterized Queries
Now, I known if I want to avoid SQL injection, I can not use % to get my variable values, I have to use , to get values.
But, I find out that using , seems the values will become str that make my %d will failed.
My DB Design is like:
Picture
Is there anyone who can give me some advice or direction?
Thank you for your help!
Update 1:
if I use reference 4.
sql = """
INSERT INTO test (extension_number, username, inbound_answered, inbound_unanswered, outbound_answered, outbound_unanswered, total_answered, total_unanswered,
total_talking_time, createtime)
VALUES (%d, %s, %d, %d, %d, %d, %d, %d, %s, %s)
""", (int(extensionCode), userName, int(inboundAnswered), int(inboundUnanswered), int(outboundAnswered), int(outboundUnanswered),
int(totalAnswered), int(totalUnanswered), totalTalkingTime, yesterday)
it will shows:
packet = prelude + sql[:packet_size-1]
TypeError: can't concat tuple to bytes
('\n INSERT INTO test (extension_number, username, inbound_answered, inbound_unanswered, \n outbound_answered, outbound_unanswered, total_answered, total_unanswered, \n total_talking_time, createtime)\n VALUES (%d, %s, %d, %d, %d, %d, %d, %d, %s, %s)\n ', (100, 'MeetingRoom', 0, 0, 0, 0, 0, 0, '00:00:00', '20180423'))
Process finished with exit code 1
Update 2:
I tried another way,
sql = "INSERT INTO test (extension_number, username, inbound_answered, inbound_unanswered, " \
"outbound_answered, outbound_unanswered, total_answered, total_unanswered, total_talking_time, " \
"createtime) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", \
(int(extensionCode), userName, int(inboundAnswered), int(inboundUnanswered),
int(outboundAnswered), int(outboundUnanswered), int(totalAnswered),
int(totalUnanswered), totalTalkingTime, yesterday)
cursor.execute(sql)
but, still not working
packet = prelude + sql[:packet_size-1]
TypeError: can't concat tuple to bytes
Update 3:
Finally, I find out the way,
sql = "INSERT INTO test (extension_number, username, inbound_answered, " \
"inbound_unanswered, outbound_answered, outbound_unanswered, " \
"total_answered, total_unanswered, total_talking_time, createtime) " \
"VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
data = (extensionCode, userName, inboundAnswered, inboundUnanswered,
outboundAnswered, outboundUnanswered, totalAnswered,
totalUnanswered, totalTalkingTime, yesterday)
cursor.execute(sql, data)
So, It seems like if I want to use variable in cursor.execute(), I have to separate the sql syntax and value.
If I want to use sql syntax and value in one line, I have to use cursor.execute(sql syntax and value) directly and double quotes or triple quotes are both fine.
such as:
cursor.execute("""INSERT INTO test (extension_number, username, inbound_answered, inbound_unanswered,
outbound_answered, outbound_unanswered, total_answered, total_unanswered, total_talking_time,
createtime) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)""",
(extensionCode, userName, inboundAnswered, inboundUnanswered, outboundAnswered, outboundUnanswered, totalAnswered, totalUnanswered, totalTalkingTime, yesterday))
separate sql syntax and values or just put them all together in cursor.execute which one is more security?
Thank you for your advice, let me find the right direction!
This is my query if the current data ID is present or absent in the Cassandra database
row = session.execute("SELECT * FROM articles where id = %s", [id])
Resolved messages in Kafka, then determine whether or not this message exists in the cassandra database if it does not exist, then it should perform an insert operation, if it does exist, it should not be inserted in the data.
messages = consumer.get_messages(count=25)
if len(messages) == 0:
print 'IDLE'
sleep(1)
continue
for message in messages:
try:
message = json.loads(message.message.value)
data = message['data']
if data:
for article in data:
source = article['source']
id = article['id']
title = article['title']
thumbnail = article['thumbnail']
#url = article['url']
text = article['text']
print article['created_at'],type(article['created_at'])
created_at = parse(article['created_at'])
last_crawled = article['last_crawled']
channel = article['channel']#userid
category = article['category']
#scheduled_for = created_at.replace(minute=created_at.minute + 5, second=0, microsecond=0)
scheduled_for=(datetime.utcnow() + timedelta(minutes=5)).replace(second=0, microsecond=0)
row = session.execute("SELECT * FROM articles where id = %s", [id])
if len(list(row))==0:
#id parse base62
ids = [id[0:2],id[2:9],id[9:16]]
idstr=''
for argv in ids:
num = int(argv)
idstr=idstr+encode(num)
url='http://weibo.com/%s/%s?type=comment' % (channel,idstr)
session.execute("INSERT INTO articles(source, id, title,thumbnail, url, text, created_at, last_crawled,channel,category) VALUES (%s,%s, %s, %s, %s, %s, %s, %s, %s, %s)", (source, id, title,thumbnail, url, text, created_at, scheduled_for,channel,category))
session.execute("INSERT INTO schedules(source,type,scheduled_for,id) VALUES (%s, %s, %s,%s) USING TTL 86400", (source,'article', scheduled_for, id))
log.info('%s %s %s %s %s %s %s %s %s %s' % (source, id, title,thumbnail, url, text, created_at, scheduled_for,channel,category))
except Exception, e:
log.exception(e)
#log.info('error %s %s' % (message['url'],body))
print e
continue
Edit:
I have one ID which only has one unique table row, which I want to be like this. As soon as I add different scheduled_for times for the unique ID my system crashes. Add this if len(list(row))==0: is the right thought but my system is very slow after that.
This is my table description:
DROP TABLE IF EXISTS schedules;
CREATE TABLE schedules (
source text,
type text,
scheduled_for timestamp,
id text,
PRIMARY KEY (source, type, scheduled_for, id)
);
This scheduled_for is changeable. Here is also a concrete example
Hao article 2016-01-12 02:09:00+0800 3930462206848285
Hao article 2016-01-12 03:09:00+0801 3930462206848285
Hao article 2016-01-12 04:09:00+0802 3930462206848285
Hao article 2016-01-12 05:09:00+0803 3930462206848285
Thanks for your replies!
Why don't you use insert if not exists ?
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/insert_r.html