Update SQLite with Python: InterfaceError: Error binding parameter 0 and NoneType is not subscriptable - python-3.x

I've scraped some websites and stored the html info in a sqlite database. Now, I want to extract and store the email addresses. I'm able to successfully extract and print the id and emails. But, I keep getting TypeError: "'NoneType' object is not subscriptable" and "sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type" when I try to update the database with these new email addresses.
I've verified that the data types I'm using in the update statement match my database (id is int and email is str). I've googled a bunch of different examples and mucked around with the syntax a lot.
I also tried removing the WHERE clause in the update statement but got the same errors.
import sqlite3
import re

conn = sqlite3.connect('spider.sqlite')
cur = conn.cursor()

x = cur.execute('SELECT id, html FROM Pages WHERE html is NOT NULL and email is NULL ORDER BY RANDOM()').fetchone()
#print(x)#for testing purposes

for row in x:
    row = cur.fetchone()
    id = row[0]
    html = row[1]
    email = re.findall(b'[a-z0-9\.\-+_]+#[a-z0-9\.\-+_]+\.[a-z]+', html)
    #print(email)#testing purposes
    if not email:
        email = 'no email found'
    print(id, email)
    cur.execute('''UPDATE pages SET email = ? WHERE id = ? ''', (email, id))
conn.commit
I want the update statement to update the database with the extracted email addresses for the appropriate row.

There are a few things going on here.
First off, you don't want to do this:
for row in x:
    row = cur.fetchone()
If you want to iterate over the results returned by the query, you should consider something like this:
for row in cur.fetchall():
    id = row[0]
    html = row[1]
    # ...
To understand the rest of the errors you are seeing, let's take a look at them step by step.
TypeError: "'NoneType' object is not subscriptable":
This is likely generated here:
row = cur.fetchone()
id = row[0]
Cursor.fetchone returns None if the executed query doesn't match any rows or if there are no rows left in the result set. The next line, then, is trying to do None[0] which would raise the error in question.
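As a minimal sketch (assuming the same cursor as above), guarding against that looks like:
row = cur.fetchone()
if row is None:
    # the query matched nothing, or the result set is exhausted
    print('no rows to process')
else:
    id, html = row  # safe to unpack now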
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type:
re.findall returns a list of non-overlapping matches, not an individual match. There's no support for binding a Python list to a sqlite3 text column type. To fix this, you'll need to get the first element from the matched list (if it exists) and then pass that as your email parameter in the UPDATE.
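For illustration, a hedged sketch of that fix, keeping the question's original bytes pattern verbatim (findall on a bytes pattern returns a list of bytes, so the first match is decoded before binding):
matches = re.findall(b'[a-z0-9\.\-+_]+#[a-z0-9\.\-+_]+\.[a-z]+', html)
if matches:
    email = matches[0].decode()  # first match only; bytes -> str for binding
else:
    email = 'no email found'     # fallback mirrors the question's code
cur.execute('UPDATE Pages SET email = ? WHERE id = ?', (email, id))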

.findall() returns a list.
You want to iterate over that list:
for email in re.findall(..., str(html)):
    print(id, email)
    cur.execute(...)
Not sure what's going on with that b'[a-z...' expression.
Recommend you use a raw string instead: r'[a-z...'.
It handles regex \ backslashes nicely.
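Putting that together, a sketch of the per-match update, assuming the SELECT from the question has just been executed and that the # in the pattern was meant to be a literal @:
pattern = r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+'  # raw string; @ assumed intended
for row in cur.fetchall():
    id, html = row
    for email in re.findall(pattern, str(html)):
        print(id, email)
        cur.execute('UPDATE Pages SET email = ? WHERE id = ?', (email, id))
conn.commit()  # note the parentheses: commit is a method call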

Related

Elastic search parse as object but found nested values

I'm currently working on a project where I have data from previous processing stored in a CSV, and I'd like to give ElasticSearch + Kibana a try to analyse my data. The problem is I have a column with JSON values and some None values that I send with nested type. To clean the Nones I replaced them with 'null', but I get the following error:
Tried to parse field as object but found a concrete value
I think ES doesn't like fields that can hold either a 'NULL' string or a nested value. How can I solve this and keep the notion of a null value to allow filtering later? Thanks for the help :)
I'm using Python and the eland module, which handles sending a pandas DataFrame to ES.
ES version:
{'version': {'number': '7.7.0',
  'build_flavor': 'default',
  'build_type': 'deb',
  'build_hash': '81a1e9eda8e6183f5237786246f6dced26a10eaf',
  'build_date': '2020-05-12T02:01:37.602180Z',
  'build_snapshot': False,
  'lucene_version': '8.5.1',
  'minimum_wire_compatibility_version': '6.8.0',
  'minimum_index_compatibility_version': '6.0.0-beta1'},
 'tagline': 'You Know, for Search'}
EDIT
I'm sending my data using the code extract below (Python 3), which is now working thanks to @Gibbs' answer:
import json
import logging

import eland
import pandas as pd
from elasticsearch import Elasticsearch

logger = logging.getLogger(__name__)

ES_HOST = 'localhost:9200'  # adjust to your cluster
INDEX_NAME = 'my_index'
DATA_PATH = './data4analysis.csv'

def csv_jsonconverter_todict(field):
    if not field:
        return {'null_value': 'NULL'}
    if "'" in field:  # cleaning if bad json column, ok for me
        field = field.replace("'", '"')
    try:
        return json.loads(field)
    except Exception as e:
        logger.exception('json.loads(field) failed on field= %s', field, exc_info=True)
        raise e

def loadNprepare_data(path, sep=';'):
    df = pd.read_csv(path, sep=sep, encoding='cp1252',
                     converters={'ffprobe': csv_jsonconverter_todict})
    # cleaning NaNs to avoid "json_parse_exception Non-standard token 'NaN'"
    df = df.applymap(lambda cell: 'null_value' if pd.isna(cell) or not cell else cell)
    return df

if __name__ == '__main__':
    es_client = Elasticsearch(hosts=[ES_HOST], http_compress=True)
    if es_client.indices.exists(INDEX_NAME):
        logger.info(f"deleting '{INDEX_NAME}' index...")
        res = es_client.indices.delete(index=INDEX_NAME)
        logger.info(f"response: '{res}'")
    # since we are running locally, use one shard and no replicas
    request_body = {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        }
    }
    logger.info(f"creating '{INDEX_NAME}' index...")
    res = es_client.indices.create(index=INDEX_NAME, body=request_body)
    logger.info(f" response: '{res}'")
    logger.info("Sending data to ES")
    data = loadNprepare_data(DATA_PATH)
    try:
        el_df = eland.pandas_to_eland(data, es_client,
                                      es_dest_index=INDEX_NAME,
                                      es_if_exists='replace',
                                      es_type_overrides={'ffprobe': 'nested'})
    except Exception as e:
        logger.error('Elasticsearch error', exc_info=True)
        raise e
The problem is that you defined a type for the column, and you are trying to insert the string 'null' into that column. Two different types are not supported in the same field. It will accept a null value if you do as mentioned here:
A null value cannot be indexed or searched. When a field is set to null (or an empty array, or an array of null values) it is treated as though that field has no values.
The null_value parameter allows you to replace explicit null values with the specified value so that it can be indexed and searched.
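For illustration, a minimal sketch of such a mapping in Python, adapted from the Elasticsearch null_value docs (the index and field names here are placeholders, not from the question):
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=['localhost:9200'])  # hypothetical cluster address

# explicit nulls in status_code will be indexed and searchable as "NULL"
mapping = {
    "mappings": {
        "properties": {
            "status_code": {
                "type": "keyword",
                "null_value": "NULL"
            }
        }
    }
}
es.indices.create(index="my-index", body=mapping)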
<disclosure: I'm maintainer of Eland and employed by Elastic>
Are you trying to import the DataFrame into Elasticsearch into an existing index? Otherwise it may be worth looking at the mapping that Eland created for you and seeing which field is mapped to a type that you're not expecting. You may need to make certain numeric fields "nullable" if you're planning on having null values in Elasticsearch.
from elasticsearch import Elasticsearch
es = Elasticsearch(<your cluster info>)
resp = es.indices.get_mapping("<your index>")
print(resp)
If you are able to post your index mapping and an example of the CSV rows you're inserting, that would go a long way toward helping me help you.

Trouble with parsing JSON msg

I am calling a QUANDL API for data and getting a JSON msg back, which I am having trouble parsing before sending to a database. My parsing code is clearly not reading the JSON correctly.
Via the below code, I am getting the following (truncated for simplicity) JSON:
{"datatable":{"data":[["AAPL","MRY","2018-09-29",265595000000],["AAPL","MRY","2017-09-30",229234000000],["AAPL","MRY","2016-09-24",215639000000],["AAPL","MRY","2015-09-26",233715000000],["AAPL","MRY","2014-09-27",182795000000],["AAPL","MRY","2013-09-28",170910000000],["AAPL","MRT","2018-09-29",265595000000],["AAPL","MRT","2018-06-30",255274000000],["AAPL","MRT","2018-03-31",247417000000],["AAPL","MRT","2017-12-30",239176000000],["AAPL","MRT","2017-09-30",229234000000],["AAPL","MRT","2017-07-01",223507000000],["AAPL","MRT","2017-04-01",220457000000],["AAPL","MRT","2016-12-31",218118000000],["AAPL","MRT","2016-09-24",215639000000],["AAPL","MRT","2016-06-25",220288000000],["AAPL","MRT","2016-03-26",227535000000],["AAPL","MRT","2015-12-26",234988000000],["AAPL","MRT","2015-09-26",233715000000],["AAPL","MRT","2015-06-27",224337000000],["AAPL","MRT","2015-03-28",212164000000],["AAPL","MRT","2014-12-27",199800000000],["AAPL","MRT","2014-09-27",182795000000],["AAPL","MRT","2014-06-28",178144000000],["AAPL","MRT","2014-03-29",176035000000],"columns":[{"name":"ticker","type":"String"},{"name":"dimension","type":"String"},{"name":"datekey","type":"Date"},{"name":"revenue","type":"Integer"}]},"meta":{"next_cursor_id":null}}
import quandl, requests
from flask import request
from cs50 import SQL

db = SQL("sqlite:///formula.db")

data = requests.get(f"https://www.quandl.com/api/v3/datatables/SHARADAR/SF1.json?ticker=AAPL&qopts.columns=ticker,dimension,datekey,revenue&api_key=YOURAPIKEY")
responses = data.json()
print(responses)

for response in responses:
    ticker=str(response["ticker"])
    dimension=str(response["dimension"])
    datekey=str(response["datekey"])
    revenue=int(response["revenue"])
    db.execute("INSERT INTO new(ticker, dimension, datekey, revenue) VALUES(:ticker, :dimension, :keydate, :revenue)", ticker=ticker, dimension=dimension, datekey=datekey, revenue=revenue)
I'm getting the following error msg (which I have gotten in the past and successfully addressed), so I strongly believe I am not reading the JSON correctly:
File "new2.py", line 12, in
ticker=str(response["ticker"])
TypeError: string indices must be integers
I want to be able to loop through the json and be able to isolate specific data to then populate a database.
For your response structure, you have a nested dict object:
datatable
    data
        list of lists of data
So, this will happen:
responses = data.json()
datatable = responses['datatable'] # will get you the information mapped to the 'datatables' key
datatable_data = datatable['data'] # will get you the list mapped to the 'data' key
Now, datatable_data is a list of lists, right? And lists can only be accessed by index, not by string keys.
So, let's say you want the first response:
first_response = datatable_data[0]
That will result in:
first_response = ["AAPL","MRY","2018-09-29",265595000000]
which you can now access by index:
for idx, val in enumerate(first_response):
    print(f'{idx}\t{val}')
which will print out
0 AAPL
1 MRY
2 2018-09-29
3 265595000000
So, with all this information, you need to alter your program to ensure you're accessing the data key in the response, and then iterate over the list of lists.
So, something like this:
data = responses['datatable']['data']
for record in data:
    ticker, dimension, datekey, revenue = record  # unpack list into named variables
    db.execute(...)
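Filling in the db.execute(...) placeholder, a hedged sketch of the complete loop (table and column names are taken from the question's INSERT; the question's :keydate placeholder looks like a typo for :datekey and is corrected here):
data = responses['datatable']['data']
for record in data:
    ticker, dimension, datekey, revenue = record  # unpack list into named variables
    db.execute("INSERT INTO new(ticker, dimension, datekey, revenue) "
               "VALUES(:ticker, :dimension, :datekey, :revenue)",
               ticker=ticker, dimension=dimension, datekey=datekey, revenue=revenue)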

Fetching specific data from a database using a WHERE clause with user-provided values to print that user's data - python

I'm trying to print the files related to a user, using their user name and password.
I'm reading a value from an entry box as the user's password, then trying to fetch that user's data from the database using a WHERE clause.
#func i'm using to fetch data
def fun():
    p=int(password.get())
    d=sign.execute("select * from login_details where Name=?",p)
    for x in d:
        print(x)

#entry box to get the password
password=tk.Entry()
password.pack()
This is the error I'm getting:
sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 0 supplied.
Try this out:
def fun():
    p=int(password.get())
    sign.execute('select * from login_details where Name= ?',(p,))
    all_rows = sign.fetchall()
    print(all_rows)

#entry box to get the password
password=tk.Entry()
password.pack()
You were missing the comma when you passed in the p parameter: (p,) is a one-element tuple, while (p) is just p. I also went ahead and showed another way of printing all the rows in the table.
Also, I noticed that you are using a WHERE clause on Name (which I assume is a string) but passing in an integer. Is that a mistake?
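If Name really is a text column, a minimal sketch without the int conversion (still assuming the question's sign cursor and login_details table) would be:
def fun():
    name = password.get()  # bind the entry's text directly; Name is assumed to be TEXT
    sign.execute('SELECT * FROM login_details WHERE Name = ?', (name,))
    print(sign.fetchall())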

using sqlite with python, fetchall()

I am trying to compare a list of links to the links stored in an SQLite database.
assuming the links in the database are:
link.com\page1
link.com\page2
link.com\page3
I have written the following code to check whether a given link exists in the database, and to add it if it does not.
links = ['link.com\page2', 'link.com\page4']

c.execute('SELECT link FROM ads')
previouslinks = c.fetchall()

for l in links:
    if l not in previouslinks:
        c.execute('''INSERT INTO ads(link) VALUES(?)''', (l))
        conn.commit()
    else:
        pass
The problem is that even though the link is in the database, the script does not recognise it!
When I try to print the previouslinks variable, the results look something like this:
[('link.com\page1',), ('link.com\page2',), ('link.com\page3',)]
I think the problem is with the extra parentheses and commas, but I am not exactly sure.
fetchall() returns a list of rows, where each row is a tuple containing all column values. A tuple containing a string is not the same as the string.
You have to extract the values from the rows (and you don't need fetchall() when iterating over a cursor):
previouslinks = [row[0] for row in c]
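A minimal sketch of the fixed comparison (the set makes the membership test cheap; the (l,) tuple in the INSERT is an extra fix beyond what the question hit):
c.execute('SELECT link FROM ads')
previouslinks = {row[0] for row in c}  # unwrap each one-column row into a plain string

for l in links:
    if l not in previouslinks:
        c.execute('INSERT INTO ads(link) VALUES(?)', (l,))
conn.commit()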

Can sqlite3 reference primary keys during a for loop?

I have a database where I have imported texts as primary keys.
I then have columns with keywords that can pertain to the texts, for example column "arson". Each of these columns has a default value of 0.
I am trying to get the SQLite3 database to read the texts, check for the presence of specific keywords, and then assign a 1 value to the keywords column, for the row where the text contained the keyword.
The below example is of me trying to change the values in the arson column only for rows where the text contains the word "Arson".
The program is reading the texts and printing yes 3 times, indicating that three of the texts have the word "Arson" in them. However, I cannot get the individual rows to update with 1's. I have tried a few variations of the code below but seem to be stuck on this one.
# Python 3
import sqlite3

sqlite_file = 'C:\\Users\\xxxx\\AppData\\Local\\Programs\\Python\\Python35-32\\database.sqlite'
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

texts = c.execute("SELECT texts FROM database")
for articles in texts:
    for words in articles:
        try:
            if "Arson" in words:
                print('yes')
                x = articles
                c.execute("UPDATE database SET arson = 1 WHERE ID = ?" (x))
        except TypeError:
            pass
conn.commit()
conn.close()
This expression:
c.execute("UPDATE database SET arson = 1 WHERE ID = ?" (x))
always will raise a TypeError, because you are trying to treat the string as a function. You are basically doing "..."(argument), as if "..." were callable.
You'd need to add some commas for it to be an attempt to pass in x as a SQL parameter:
c.execute("UPDATE database SET arson = 1 WHERE ID = ?", (x,))
The first comma separates the two arguments passed to c.execute(), so now you pass a query string, and a separate sequence of parameters.
The second comma makes (..,) a tuple with one element in it. It is the comma that matters there, although the (...) parentheses are still needed to disambiguate what the comma represents.
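A quick interpreter session makes the difference visible:
>>> type(('x'))   # parentheses alone don't make a tuple
<class 'str'>
>>> type(('x',))  # the trailing comma does
<class 'tuple'>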
You can drop the try...except TypeError altogether. If the code is still raising TypeError exceptions, you still have a bug.
Four hours later I have finally been able to fix this. I added the commas as recommended above; however, this led to other issues, as the code did not execute the entire loop correctly. To do this, I had to add another cursor object and use the second cursor inside my loop. The revised code may be seen below:
# Python 3
import sqlite3

sqlite_file = 'C:\\Users\\xxxx\\AppData\\Local\\Programs\\Python\\Python35-32\\database.sqlite'
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()
c2 = conn.cursor()

atexts = c.execute("SELECT texts FROM database")
for articles in atexts:
    for words in articles:
        if "arson" in words:
            print('yes')
            c2.execute("UPDATE database SET arson = 1 WHERE texts = ?", (words,))
conn.commit()
conn.close()
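The second cursor is needed because executing the UPDATE on c would replace the result set the outer loop is still reading. As a hedged alternative, materializing the rows first with fetchall() lets one cursor do both jobs:
c.execute("SELECT texts FROM database")
rows = c.fetchall()  # materialize the result set, freeing the cursor for updates

for (words,) in rows:  # each row is a one-element tuple
    if "arson" in words:
        c.execute("UPDATE database SET arson = 1 WHERE texts = ?", (words,))

conn.commit()
conn.close()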
