inserting into jsonb type column - python-3.x

import psycopg2, json, requests, hidden
# Load secrets
secrets = hidden.secrets()
conn = psycopg2.connect(host=secrets['host'],port=secrets['port'],
....,connect_timeout=3)
cur = conn.cursor()
defaulturl = 'https://pokeapi.co/api/v2/pokemon?limit=100&offset=0'
sql = '''
CREATE TABLE IF NOT EXISTS pokeapi
(id INTEGER, body JSONB);
'''
print(sql)
cur.execute(sql)
response = requests.get(defaulturl)
js = json.loads(response.text)
# js is a dictionary and I'm interested in the values of the 'results' key.
results = js['results']
# 'results' is a list of dictionaries and I want to loop through each element
# of the list and extract the value of the 'url' key.
# I NEED TO INSERT EACH VALUE INTO pokeapi (body); note that 'body' is of type JSONB
for x in range(len(results)):
    body = requests.get(results[x]['url'])
    js_body = json.loads(body.text)
    sql = f"INSERT INTO pokeapi (body) VALUES ('{js_body}')::JSONB"
    cur.execute(sql, (defaulturl))
print('Closing database connection...')
conn.commit()
cur.close()
This script keeps throwing back an error:
CREATE TABLE IF NOT EXISTS pokeapi (id INTEGER, body text);
Traceback (most recent call last):
  File "pokeapi.py", line 45, in <module>
    cur.execute(sql, (defaulturl))
psycopg2.errors.SyntaxError: syntax error at or near "{"
LINE 1: INSERT INTO pokeapi (body) VALUES {'abilities': [{'ability':...
I have tried to insert into pokeapi (body) without casting to JSONB, but I keep getting the same error back. Is there something fundamental that I'm missing?

You should pass the JSON string as a query parameter, without parsing it, and without the quotes and the cast:
js_body = body.text
sql = "INSERT INTO pokeapi (body) VALUES (%s)";
cur.execute(sql, [js_body])
IMPORTANT: DO NOT USE string formatting on random internet data! Always use psycopg2's built-in parameter handling. It will correctly handle SQL injection risks for you.
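To make the risk concrete, here is a small illustration (assuming an open psycopg2 cursor cur); payloads from this API can contain quote characters, which break manual interpolation but are escaped correctly by parameter binding:

payload = '{"name": "farfetch\'d"}'  # JSON text containing a single quote
# An f-string such as f"... VALUES ('{payload}')" would emit invalid SQL here;
# a bound parameter is escaped by the driver instead:
cur.execute("INSERT INTO pokeapi (body) VALUES (%s)", [payload])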
Currently you aren't using defaulturl; if you want to insert it, then you need a column to insert it into. Also, you need to make the id auto-increment:
sql = '''
CREATE TABLE IF NOT EXISTS pokeapi
("id" int8 NOT NULL GENERATED BY DEFAULT AS IDENTITY, body JSONB);
'''
If not, you will have to supply an id along with the body.
Finally, you should generally avoid executing one statement per loop iteration. If you have the memory for it, you should just loop over the payloads and then use execute_values(): https://www.psycopg.org/docs/extras.html
from psycopg2.extras import execute_values

rows = list()
for result in results:
    response = requests.get(result['url'])
    rows.append([response.text])

sql = "INSERT INTO pokeapi (body) VALUES %s"
sql_template = "(%s)"
execute_values(cur, sql, rows, sql_template)
(Also, for future reference, the requests library has a .json() method on responses which can load the JSON string into Python primitives for you. That said, you don't need to parse the JSON in this case. https://docs.python-requests.org/en/master/user/quickstart/#json-response-content)
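For instance, a tiny illustration against the same endpoint:

# .json() parses the response body straight into Python dicts/lists.
data = requests.get(defaulturl).json()
print(data['results'][0]['url'])  # detail URL of the first pokemon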

Here is the solution I ended up coming up with. What I learnt was the need to deserialize the response into a proper Python dictionary, and then to serialize that dictionary before casting it to the JSONB type.
import psycopg2, json, requests, hidden
# Load secrets
secrets = hidden.secrets()
conn = psycopg2.connect(host=secrets['host'],port=secrets['port'],
....,connect_timeout=3)
cur = conn.cursor()
defaulturl = 'https://pokeapi.co/api/v2/pokemon?limit=100&offset=0'
sql = '''
CREATE TABLE IF NOT EXISTS pokeapi
(id SERIAL, body JSONB);  -- <== CREATING id OF SERIAL TYPE HELPS AUTO-
                          --     GENERATE ids OF INTEGER TYPE.
'''
print(sql)
cur.execute(sql)
response = requests.get(defaulturl)
js = response.json()  # <== THIS IS ONE OF THE CORRECTIONS: I NEEDED TO
                      #     DESERIALIZE THE RESPONSE SO THAT IT'S A PROPER
                      #     PYTHON DICTIONARY
# js is a dictionary and I'm interested in the values of the 'results' key.
# 'results' is a list of dictionaries and I want to loop through each element
# of the list and extract the value of the 'url' key.
# I NEED TO INSERT EACH VALUE INTO pokeapi (body); note that 'body' is of type JSONB
for x in range(len(results)):
    body = requests.get(results[x]['url'])
    js_body = json.dumps(body.json())  # <== 2ND MAJOR CORRECTION: I HAVE TO
                                       #     SERIALIZE THE PYTHON DICTIONARY/LIST
                                       #     TO BE ABLE TO CAST IT TO JSONB BELOW
    sql = f"INSERT INTO pokeapi (body) VALUES ('{js_body}'::JSONB)"
    cur.execute(sql)
print('Closing database connection...')
conn.commit()
cur.close()
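For what it's worth, a sketch of the same loop with parameter binding (as the answer above recommends) avoids the manual quoting entirely; PostgreSQL casts the bound text to JSONB automatically for a JSONB column:

for result in results:
    js_body = json.dumps(requests.get(result['url']).json())
    cur.execute("INSERT INTO pokeapi (body) VALUES (%s)", [js_body])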

Related

How to add list values from JSON to a sqlite column

I receive a JSON message through RabbitMQ and I want to store it in the database.
The message looks like this:
message = {
    "ticket": "ticket_12334345",
    "items": ["item1", "item2", "item3"],
    "prices": [10, 20, 15],
    "full price": [45]
}
Storing to database looks like this:
def callback(ch, method, properties, body):
    print("%r" % body)
    body = json.loads(body)
    conn = sqlite3.connect('pythonDB.db')
    c = conn.cursor()
    c.execute('CREATE TABLE IF NOT EXISTS Table_3 (ticket TEXT, items TEXT, prices INTEGER, '
              'FullPrice INTEGER)')
    c.execute("INSERT INTO Table_3 VALUES (?,?,?,?)",
              (body["ticket"], body["items"], body["prices"], body["full price"]))
    conn.commit()
I get an error sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.
I assume it's because I'm not storing the list data correctly. I want all the values from the list to go into one cell of the column.
Use the str() function, like this:
def callback(ch, method, properties, body):
    print("%r" % body)
    body = json.loads(body)
    conn = sqlite3.connect('pythonDB.db')
    c = conn.cursor()
    c.execute('CREATE TABLE IF NOT EXISTS Table_3 (ticket TEXT, items TEXT, prices INTEGER, '
              'FullPrice INTEGER)')
    c.execute("INSERT INTO Table_3 VALUES (?,?,?,?)",
              (body["ticket"], str(body["items"]), str(body["prices"]),
               str(body["full price"][0])))
    conn.commit()
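As a side note (my own addition, not part of the original answer): if you later need to read the lists back as Python objects, serializing with json.dumps and parsing with json.loads round-trips more cleanly than str(), which produces Python repr strings that are awkward to parse back. A minimal sketch, using an in-memory database and a TEXT column for the JSON-encoded prices:

import json
import sqlite3

message = {"ticket": "ticket_12334345", "items": ["item1", "item2", "item3"],
           "prices": [10, 20, 15], "full price": [45]}

conn = sqlite3.connect(':memory:')  # throwaway DB just for the sketch
c = conn.cursor()
c.execute('CREATE TABLE IF NOT EXISTS Table_3 (ticket TEXT, items TEXT, '
          'prices TEXT, FullPrice INTEGER)')
c.execute("INSERT INTO Table_3 VALUES (?,?,?,?)",
          (message["ticket"], json.dumps(message["items"]),
           json.dumps(message["prices"]), message["full price"][0]))
conn.commit()

row = c.execute("SELECT items FROM Table_3").fetchone()
print(json.loads(row[0]))  # ['item1', 'item2', 'item3'] -- a real list again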

how to avoid duplication in BigQuery by streaming insert

I made a function that inserts .CSV data into BigQuery every 5~6 seconds. I've been looking for a way to avoid duplicating the data in BigQuery after inserting. I want to remove data that has the same luid, but I have no idea how to remove it. Is it possible to check whether each row of the .CSV already exists in the BigQuery table before inserting?
I passed the row_ids parameter to avoid duplicate luids, but it doesn't seem to work well.
Could you give me any ideas? Thanks.
import csv
import time

import schedule
from google.cloud import bigquery

def stream_upload():
    # BigQuery
    client = bigquery.Client()
    project_id = 'test'
    dataset_name = 'test'
    table_name = "test"
    full_table_name = dataset_name + '.' + table_name
    json_rows = []
    with open('./test.csv', 'r') as f:
        for line in csv.DictReader(f):
            del line[None]
            line_json = dict(line)
            json_rows.append(line_json)
    errors = client.insert_rows_json(
        full_table_name, json_rows, row_ids=[row['luid'] for row in json_rows]
    )
    if errors == []:
        print("New rows have been added.")
    else:
        print("Encountered errors while inserting rows: {}".format(errors))
    print("end")

schedule.every(0.5).seconds.do(stream_upload)
while True:
    schedule.run_pending()
    time.sleep(0.1)
BigQuery doesn't have a native way to deal with this. You could either create a view over this table that performs deduplication, or keep an external cache of luids: look up whether a luid has already been written to BigQuery before writing, and update the cache after writing new data. This could be as simple as a file cache, or you could use an additional database.
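Here is a minimal sketch of the file-cache variant; the cache file name is an assumption, and json_rows, full_table_name, and client are carried over from the question's code, so this is an illustration rather than a definitive implementation:

import os

CACHE_FILE = './inserted_luids.txt'  # hypothetical local cache of sent luids

def load_seen_luids():
    if not os.path.exists(CACHE_FILE):
        return set()
    with open(CACHE_FILE) as f:
        return {line.strip() for line in f}

def remember_luids(luids):
    with open(CACHE_FILE, 'a') as f:
        for luid in luids:
            f.write(luid + '\n')

seen = load_seen_luids()
new_rows = [row for row in json_rows if row['luid'] not in seen]
if new_rows:
    errors = client.insert_rows_json(
        full_table_name, new_rows,
        row_ids=[row['luid'] for row in new_rows]
    )
    if not errors:
        remember_luids(row['luid'] for row in new_rows)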

How to convert sql query to list?

I am trying to convert my SQL query output into a list that looks a certain way.
Here is my code:
def get_sf_metadata():
    import sqlite3
    # Tables I want to be dynamically created
    table_names = ['AcceptedEventRelation', 'Asset', 'Book']
    # SQLite connection
    conn = sqlite3.connect('aaa_test.db')
    c = conn.cursor()
    # Select the metadata table records
    c.execute("select name, type from sf_field_metadata1 limit 10")
    print(list(c))

get_sf_metadata()
Here is my output:
[('Id', 'id'), ('RelationId', 'reference'), ('EventId', 'reference')]
Is there any way to make the output look like this:
[Id id, RelationId reference, EventId reference]
You can try
print(["{} {}".format(i[0], i[1]) for i in list(c)])
That will print:
['Id id', 'RelationId reference', 'EventId reference']
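On Python 3.6+, an f-string with tuple unpacking reads slightly cleaner (same output, assuming the cursor c from the question):

print([f"{name} {col_type}" for name, col_type in c])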

Need help fetching data from a column

Sorry for this, but I'm really new to SQLite: I've created a database from an Excel sheet I had, and I can't seem to fetch the values of the column I need.
query = """ SELECT GNCR from table"""
cur.execute(query)
This actually works, but
query = """ SELECT ? from table"""
cur.execute(query, my_tuple)
doesn't
Here's my code:
def print_col(to_print):
    db = sqlite3.connect('my_database.db')
    cur = db.cursor()
    query = "SELECT ? FROM my_table"
    cur.execute(query, to_print)
    results = cur.fetchall()
    print(results)

print_col(('GNCR',))
The result is:
[('GNCR',), ('GNCR',), ('GNCR',), ('GNCR',), [...]]
instead of the actual values
What's the problem? I can't figure it out.
the "?" character in query is used for parameter substitution. Sqlite will escape the parameter you passed and replace "?" with the send text. So in effect you query after parameter substitution will be SELECT 'GNCR' FROM my_table where GNCR will be treated as text so you will get the text for each row returned by you query instead of the value of that column.
Basically you should use the query parameter where you want to substitute the parameter with escaped string like in where clause. You can't use it for column name.
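If the column name has to be dynamic, a common workaround (a sketch of my own, not part of the original answer) is to validate it against a whitelist of known column names and only then interpolate it into the query:

import sqlite3

ALLOWED_COLUMNS = {'GNCR'}  # hypothetical set of real column names

def print_col(column):
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unknown column: " + column)
    db = sqlite3.connect('my_database.db')
    cur = db.cursor()
    # Safe to interpolate: the name was checked against the whitelist above.
    cur.execute("SELECT " + column + " FROM my_table")
    print(cur.fetchall())
    db.close()

print_col('GNCR')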

An issue with inserting blob data into SQL tables

I'm trying to write a piece of code that inserts an object I've created (to store data in a very specific way) into an SQL table as a blob type, and it keeps giving me a sqlite3.InterfaceError: Error binding parameter 1 - probably unsupported type. error.
Have any of you encountered something similar before? Do you have any ideas on how to deal with it?
conn = sqlite3.connect('my_database.db')
c = conn.cursor()
params = (self.question_id, i)  # i is the object in question
c.execute('''
    INSERT INTO ''' + self.current_test_name + ''' VALUES (?, ?)
''', params)
conn.commit()
conn.close()
For starters, this would be a more appropriate execute statement as it is way cleaner:
c.execute("INSERT INTO "+self.current_test_name+" VALUES (?, ?)", (self.question_id, i))
You are also missing the table you are inserting into (or the columns, if self.current_test_name is the table name).
Also, is the column in the database set up to handle the data type of the provided input for self.question_id and i? (It's not expecting TEXT when you provided INT?)
Example of a working script to insert into a table that has 2 columns named test and test2:
import sqlite3

conn = sqlite3.connect('my_database.db')
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS test(test INT, test2 INT)")
conn.commit()
for i in range(10):
    params = (i, i)  # i is the object in question
    c.execute("INSERT INTO test (test, test2) VALUES (?, ?)", params)
conn.commit()
conn.close()
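If the goal really is to store the custom object itself, one sketch (assuming the object is picklable; the Answer class and the table name here are made up for illustration) is to serialize it to bytes and store those bytes in a BLOB column:

import pickle
import sqlite3

class Answer:  # hypothetical stand-in for the asker's custom object
    def __init__(self, text):
        self.text = text

conn = sqlite3.connect('my_database.db')
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS answers (question_id INT, obj BLOB)")

obj = Answer("42")
blob = sqlite3.Binary(pickle.dumps(obj))  # serialize the object to raw bytes
c.execute("INSERT INTO answers (question_id, obj) VALUES (?, ?)", (1, blob))
conn.commit()

# Reading it back: unpickle the stored bytes into an Answer again.
row = c.execute("SELECT obj FROM answers").fetchone()
restored = pickle.loads(row[0])
print(restored.text)
conn.close()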
