Using SQLite with Python, fetchall() - python-3.x

I am trying to compare a list of links to the links stored in an SQLite database.
Assuming the links in the database are:
link.com\page1
link.com\page2
link.com\page3
I have written the following code to check if a given link exists in the database and add it if it does not:
links = ['link.com\page2', 'link.com\page4']
c.execute('SELECT link FROM ads')
previouslinks = c.fetchall()
for l in links:
    if l not in previouslinks:
        c.execute('''INSERT INTO ads(link) VALUES(?)''', (l))
        conn.commit()
    else:
        pass
The problem is that even though the link is in the database, the script does not recognise it!
When I try to print the previouslinks variable, the results look something like this:
[('link.com\page1',), ('link.com\page2',), ('link.com\page3',)]
I think the problem is with the extra parentheses and commas, but I am not exactly sure.

fetchall() returns a list of rows, where each row is a tuple containing all column values. A tuple containing a string is not the same as the string.
You have to extract the values from the rows (and you don't need fetchall() when iterating over a cursor):
previouslinks = [row[0] for row in c]
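For completeness, here is a minimal sketch of the whole check-and-insert loop with that fix applied, assuming the same ads table, conn, c and links list as in the question; the rows are collected into a set purely for faster membership tests, and the insert parameter is passed as a one-element tuple (l,):
c.execute('SELECT link FROM ads')
previouslinks = {row[0] for row in c}   # plain strings now, not 1-tuples

for l in links:
    if l not in previouslinks:
        c.execute('INSERT INTO ads(link) VALUES (?)', (l,))  # note the trailing comma
        previouslinks.add(l)
conn.commit()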

Related

Python3 print selected values of dict

In this simple code to read a TSV file with many columns:
InColnames = ['Chr', 'Pos', 'Ref', 'Alt']
tsvin = csv.DictReader(fin, delimiter='\t')
for row in tsvin:
    print(', '.join(row[InColnames]))
How can I make the print work?
The following will do:
for row in tsvin:
    print(', '.join(row[col] for col in InColnames))
You cannot pass a list of keys to the dict's item-lookup and magically get a list of values. You have to somehow iterate the keys and retrieve each one's value individually. The approach at hand uses a generator expression for that.
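A self-contained sketch of the fixed loop, assuming a hypothetical tab-separated file data.tsv that contains at least the Chr, Pos, Ref and Alt columns:
import csv

InColnames = ['Chr', 'Pos', 'Ref', 'Alt']
with open('data.tsv', newline='') as fin:          # hypothetical file name
    tsvin = csv.DictReader(fin, delimiter='\t')
    for row in tsvin:
        # row is a dict keyed by column name, so look up each wanted column
        print(', '.join(row[col] for col in InColnames))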

Appending values to dictionary/list

I have a mylist = [[a,b,c,d],...[]] with 650 lists inside. I am trying to insert this into a relational database using dictionaries. I have the following code:
for i in mylist:
    if len(i) == 4:
        cve_ent = {'state': [], 'muni': [], 'area': []}
        cve_ent['state'].append(i[1])
        cve_ent['muni'].append(i[2])
        cve_ent['area'].append(i[3])
However, this code only keeps the last list of mylist in the dictionary. I have also tried a counter and a while loop, but I cannot make it work.
I do not know if this is the fastest way to store the data; what I will do is compare the values of the first and second keys with other tables to multiply the values of the third key.
First of all, pull
cve_ent = {'state': [], 'muni': [], 'area': []}
out of your for loop. That will stop the dictionary from being re-created (and your earlier appends thrown away) on every iteration.
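A minimal sketch of the corrected loop, using the same mylist and keys as in the question:
cve_ent = {'state': [], 'muni': [], 'area': []}   # created once, before the loop
for i in mylist:
    if len(i) == 4:
        cve_ent['state'].append(i[1])
        cve_ent['muni'].append(i[2])
        cve_ent['area'].append(i[3])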

Joining tuples within a list without converting them to Str

I'm trying to create a MySQL INSERT query like this for inserting millions of records:
INSERT INTO mytable (fee, fi) VALUES
('data1',96)
,('data2',33)
,('boot',17)
My values are stored as tuples in a list:
datatuplst = [("data1",96), ("data2", 33),("data3", 17)]
My code:
c3 = con.cursor()
c3.execute("INSERT INTO `bigdataloadin` (`block_key`, `id`) VALUES %s" %','.join(datatuplst))
This is not working and I'm getting the error:
TypeError: sequence item 0: expected str instance, tuple found
I need help on how to create the dynamic query with the values stored in the tuple list.
You can generate the string that you need. The error is clear enough: with ','.join(datatuplst), join is asked to concatenate tuples, which it cannot do. Using a list comprehension you can say this instead:
','.join([str(el) for el in datatuplst])
The output for this statement is going to be: "('data1', 96),('data2', 33),('data3', 17)"
Then your actual INSERT statement will be interpreted as follows:
"INSERT INTO `bigdataloadin` (`block_key`, `id`) VALUES ('data1', 96),
('data2', 33),('data3', 17)"
Good luck!
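As a side note, string-building like this is only safe for trusted, well-formed values. If the MySQL driver follows the usual DB-API (e.g. PyMySQL or mysqlclient, where the placeholder style is %s), a parameterized executemany() is an alternative sketch that avoids quoting and escaping issues:
datatuplst = [("data1", 96), ("data2", 33), ("data3", 17)]
c3 = con.cursor()
# each tuple is bound to one (%s, %s) placeholder pair by the driver
c3.executemany(
    "INSERT INTO `bigdataloadin` (`block_key`, `id`) VALUES (%s, %s)",
    datatuplst)
con.commit()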

Can sqlite3 reference primary keys during a for loop?

I have a database where I have imported texts as primary keys.
I then have columns with keywords that can pertain to the texts, for example the column "arson". Each of these columns has a default value of 0.
I am trying to get the SQLite3 database to read the texts, check for the presence of specific keywords, and then assign a value of 1 to the keyword's column for each row where the text contains that keyword.
The example below is me trying to change the values in the arson column only for rows where the text contains the word "Arson".
The program is reading the texts and printing yes 3 times, indicating that three of the texts have the word "Arson" in them. However, I cannot get the individual rows to update with 1s. I have tried a few variations of the code below but seem to be stuck on this one.
#!python3
import sqlite3

sqlite_file = 'C:\\Users\\xxxx\\AppData\\Local\\Programs\\Python\\Python35-32\\database.sqlite'
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()
texts = c.execute("SELECT texts FROM database")
for articles in texts:
    for words in articles:
        try:
            if "Arson" in words:
                print('yes')
                x = articles
                c.execute("UPDATE database SET arson = 1 WHERE ID = ?" (x))
        except TypeError:
            pass
conn.commit()
conn.close()
This expression:
c.execute("UPDATE database SET arson = 1 WHERE ID = ?" (x))
will always raise a TypeError, because you are trying to treat the string as a function. You are basically doing "..."(argument), as if "..." were callable.
You'd need to add some commas for it to be an attempt to pass in x as a SQL parameter:
c.execute("UPDATE database SET arson = 1 WHERE ID = ?", (x,))
The first comma separates the two arguments passed to c.execute(), so now you pass a query string, and a separate sequence of parameters.
The second comma makes (..,) a tuple with one element in it. It is the comma that matters there, although the (...) parentheses are still needed to disambiguate what the comma represents.
You can drop the try...except TypeError altogether. If the code is still raising TypeError exceptions, you still have a bug.
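A quick illustration of what that trailing comma changes (x here is just a throwaway example value):
x = 42
print(type((x)))    # <class 'int'>   -- parentheses alone do not make a tuple
print(type((x,)))   # <class 'tuple'> -- the comma does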
Four hours later I have finally been able to fix this. I added the commas as recommended above; however, this led to other issues, as the code did not execute the entire loop correctly. To fix this, I had to add another cursor object and use the second cursor inside my loop. The revised code may be seen below:
#!python3
import sqlite3

sqlite_file = 'C:\\Users\\xxxx\\AppData\\Local\\Programs\\Python\\Python35-32\\database.sqlite'
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()
c2 = conn.cursor()
atexts = c.execute("SELECT texts FROM database")
for articles in atexts:
    for words in articles:
        if "arson" in words:
            print('yes')
            c2.execute("UPDATE database SET arson = 1 WHERE texts = ?", (words,))
conn.commit()
conn.close()
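If the second cursor feels awkward, an alternative sketch (same table and columns as above) is to materialize the SELECT result first with fetchall(), so a single cursor is free for the UPDATE statements and the loop is no longer cut short by re-executing on the cursor it is iterating over:
rows = c.execute("SELECT texts FROM database").fetchall()
for (text,) in rows:                 # each row is a 1-tuple
    if "arson" in text:
        c.execute("UPDATE database SET arson = 1 WHERE texts = ?", (text,))
conn.commit()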

Generators for processing large result sets

I am retrieving information from a sqlite DB that gives me back around 20 million rows that I need to process. This information is then transformed into a dict of lists which I need to use. I am trying to use generators wherever possible.
Can someone please take a look at this code and suggest optimizations? I am either getting a "Killed" message or it takes a really long time to run. The SQL result-set part is working fine. I tested the generator code in the Python interpreter and it doesn't have any problems. I am guessing the problem is with the dict generation.
EDIT/UPDATE FOR CLARITY:
I have 20 million rows in my result set from my sqlite DB. Each row is of the form:
(2786972, 486255.0, 4125992.0, 'AACAGA', '2005')
I now need to create a dict that is keyed by the fourth element of the row, 'AACAGA'. The value that the dict will hold is the third element, but it has to hold the values for all the occurrences in the result set. So, in our case here, 'AACAGA' will hold a list containing multiple values from the SQL result set. The problem here is to find tandem repeats in a genome sequence. A tandem repeat is a genome read ('AACAGA') that is repeated at least three times in succession. For me to calculate this, I need all the values in the third index as a list keyed by the genome read, in our case 'AACAGA'. Once I have the list, I can subtract successive values in the list to see if there are three consecutive matches to the length of the read. This is what I aim to accomplish with the dictionary and lists as values.
#!/usr/bin/python3.3
import sqlite3 as sql

sequence_dict = {}
tandem_repeat = {}

def dict_generator(large_dict):
    dkeys = large_dict.keys()
    for k in dkeys:
        yield (k, large_dict[k])

def create_result_generator():
    conn = sql.connect('sequences_mt_test.sqlite', timeout=20)
    c = conn.cursor()
    try:
        conn.row_factory = sql.Row
        sql_string = "select * from sequence_info where kmer_length > 2"
        c.execute(sql_string)
    except sql.Error as error:
        print("Error retrieving information from the database : ", error.args[0])
    result_set = c.fetchall()
    if result_set:
        conn.close()
    return (row for row in result_set)

def find_longest_tandem_repeat():
    sortList = []
    for entry in create_result_generator():
        sequence_dict.setdefault(entry[3], []).append(entry[2])
    for key, value in dict_generator(sequence_dict):
        sortList = sorted(value)
        for i in range(0, (len(sortList) - 1)):
            if ((sortList[i+1] - sortList[i]) == (sortList[i+2] - sortList[i+1])
                    == (sortList[i+3] - sortList[i+2]) == (len(key))):
                tandem_repeat[key] = True
                break
    print(max(k for k, v in tandem_repeat.items() if v))

if __name__ == "__main__":
    find_longest_tandem_repeat()
I got some help with this on Code Review, as #hivert suggested. Thanks. This is much better solved in SQL than in plain Python code. I was new to SQL and hence could not write complex queries; someone helped me out with that.
SELECT *
FROM sequence_info AS middle
JOIN sequence_info AS preceding
ON preceding.sequence_info = middle.sequence_info
AND preceding.sequence_offset = middle.sequence_offset -
length(middle.sequence_info)
JOIN sequence_info AS following
ON following.sequence_info = middle.sequence_info
AND following.sequence_offset = middle.sequence_offset +
length(middle.sequence_info)
WHERE middle.kmer_length > 2
ORDER BY length(middle.sequence_info) DESC, middle.sequence_info,
middle.sequence_offset;
Hope this helps someone working on a similar problem. Here is a link to the thread on codereview.stackexchange.com
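For reference, a sketch of how the query might be run from Python, assuming the same sequences_mt_test.sqlite database and sequence_info table as in the question; because of the ORDER BY, the first row already corresponds to the longest repeated sequence:
import sqlite3 as sql

query = """
SELECT *
FROM sequence_info AS middle
JOIN sequence_info AS preceding
  ON preceding.sequence_info = middle.sequence_info
 AND preceding.sequence_offset = middle.sequence_offset - length(middle.sequence_info)
JOIN sequence_info AS following
  ON following.sequence_info = middle.sequence_info
 AND following.sequence_offset = middle.sequence_offset + length(middle.sequence_info)
WHERE middle.kmer_length > 2
ORDER BY length(middle.sequence_info) DESC, middle.sequence_info, middle.sequence_offset
"""

conn = sql.connect('sequences_mt_test.sqlite', timeout=20)
row = conn.execute(query).fetchone()   # first row, i.e. the longest repeat
if row is not None:
    print("Longest tandem repeat:", row)
conn.close()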
