creating python pop function for sqlite3 - python-3.x

I'm trying to create a pop function that gets a row of data from a SQLite database and then deletes that same row. I would like to avoid creating an ID column, so I am using ROWID. I want to always get the first row and return it. This is the code I have:
import sqlite3

db = sqlite3.connect("Test.db")
c = db.cursor()

def sqlpop():
    c.execute("SELECT * from DATA WHERE ROWID=1")
    data = c.fetchall()
    c.execute("DELETE from DATA WHERE ROWID=1")
    db.commit()
    return(data)
When I call the function, it gets the first item correctly, but after the first call the function returns nothing, like this:
>>> sqlpop()
[(1603216325, 'placeholder IP line 124', 'placeholder Device line 124', '1,2,0', 1528, 1564)]
>>> sqlpop()
[]
>>> sqlpop()
[]
>>> sqlpop()
[]
What do I need to change for this function to work correctly?
Update:
Using what Schwern said, I got the function to work:
def sqlpop():
    c.execute("SELECT * from DATA ORDER BY ROWID LIMIT 1")
    data = c.fetchone()
    c.execute("DELETE from DATA ORDER BY ROWID LIMIT 1")
    db.commit()
    return data

rowid is not the row order; it is a unique identifier for the row, created by SQLite unless you say otherwise.
SQL rows have no inherent order. You could grab just one row...
select * from table limit 1;
But you'll get them in no guaranteed order. And without a rowid you have no way to identify it again to delete it.
If you want to get the "first" row, you must define what "first" means. To do that you need something to order by. For example, a timestamp. Or perhaps an auto-incrementing integer. You cannot use rowid; rowids are not guaranteed to be assigned in any particular order.
select *
from table
where created_at = (select max(created_at) from table)
limit 1
So long as created_at is indexed, that should work fine. Then delete by its rowid.
You also don't need to use fetchall to fetch one row; use fetchone. In general, fetchall should be avoided as it risks consuming all your memory by slurping all the data in at once. Instead, use iterators.
for row in c.execute(...):
    ...  # process each row as it is fetched
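Putting those pieces together, a minimal sketch of the pop (assuming DATA has the created_at column suggested above) might look like this:
import sqlite3

db = sqlite3.connect("Test.db")
c = db.cursor()

def sqlpop():
    # Grab the oldest row (by created_at) together with its rowid,
    # then delete exactly that row and return its data columns.
    c.execute("SELECT rowid, * FROM DATA ORDER BY created_at LIMIT 1")
    row = c.fetchone()
    if row is None:
        return None               # table is empty
    c.execute("DELETE FROM DATA WHERE rowid = ?", (row[0],))
    db.commit()
    return row[1:]                # drop the rowid, keep the data columns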

Related

mariadb python - executemany using SELECT

I'm trying to insert many rows into a table in a MariaDB database.
To do this I want to use executemany() to increase speed.
The inserted row depends on another table, which is found with SELECT.
I have found statements that SELECT doesn't work in an executemany().
Are there other ways to solve this problem?
import mariadb
connection = mariadb.connect(host=HOST,port=PORT,user=USER,password=PASSWORD,database=DATABASE)
cursor = connection.cursor()
query="""INSERT INTO [db].[table1] ([col1], [col2] ,[col3])
VALUES ((SELECT [colX] from [db].[table2] WHERE [colY]=? and
[colZ]=(SELECT [colM] from [db].[table3] WHERE [colN]=?)),?,?)
ON DUPLICATE KEY UPDATE
[col2]= ?,
[col3] =?;"""
values=[input_tuplets]
When running the code I get the same value for [col1] (the SELECT statement), which corresponds to the values from the first tuplet.
If SELECT doesn't work in an executemany(), is there another workaround for what I'm trying to do?
Thanks a lot!
I think the solution is to read out the tables needed, do the search in Python, and then use executemany() to insert all the data, as sketched below.
It will require two more queries (to read the tables), but it will be fine when it comes to computation time.
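A rough sketch of that plan, reusing the placeholder table and column names from the question (so treat every identifier here as an assumption):
# Read the lookup tables once, resolve col1 in Python, then executemany()
# a plain INSERT that no longer needs a subquery.
cursor.execute("SELECT colN, colM FROM table3")
colm_by_coln = dict(cursor)

cursor.execute("SELECT colY, colZ, colX FROM table2")
colx_by_key = {(coly, colz): colx for coly, colz, colx in cursor}

rows = [(colx_by_key[(coly, colm_by_coln[coln])], col2, col3, col2, col3)
        for coly, coln, col2, col3 in input_tuplets]

cursor.executemany(
    "INSERT INTO table1 (col1, col2, col3) VALUES (?, ?, ?) "
    "ON DUPLICATE KEY UPDATE col2=?, col3=?",
    rows)
connection.commit()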
Thanks for your first question on Stack Overflow, which identified a bug in MariaDB Server.
Here is a simple script to reproduce the problem:
CREATE TABLE t1 (a int);
CREATE TABLE t2 LIKE t1;
INSERT INTO t2 VALUES (1),(2);
Python:
>>> cursor.executemany("INSERT INTO t1 VALUES ((SELECT a FROM t2 WHERE a=?))", [(1,), (2,)])
>>> cursor.execute("SELECT a FROM t1")
>>> cursor.fetchall()
[(1,), (1,)]
I have filed an issue in MariaDB Bug tracking system.
As a workaround, I would suggest reading the country table once into a dictionary (according to Wikipedia there are 195 different countries) and using these values instead of a subquery.
e.g.
countries = {}
cursor.execute("SELECT country, id FROM countries")
for row in cursor:
    countries[row[0]] = row[1]
and then in executemany:
cursor.executemany("INSERT INTO region (region, id_country) VALUES ('south', ?)",
                   [(countries["fra"],), (countries["ger"],)])

How can I convert from SQLite3 format to dictionary

How can I convert my SQLite3 table to a Python dictionary, where the names and values of the table's columns become the keys and values of the dictionary?
I have made a package to solve this issue, in case anyone else runs into this problem:
aiosqlitedict
Here is what it can do
Easy conversion between sqlite table and Python dictionary and vice-versa.
Get values of a certain column in a Python list.
Order your list ascending or descending.
Insert any number of columns to your dict.
Getting Started
We start by connecting to our database, along with the reference column:
from aiosqlitedict.database import Connect
countriesDB = Connect("database.db", "user_id")
Make a dictionary
The dictionary should be inside an async function.
async def some_func():
    countries_data = await countriesDB.to_dict("my_table_name", 123, "col1_name", "col2_name", ...)
You can insert any number of columns, or you can get all by specifying
the column name as '*'
countries_data = await countriesDB.to_dict("my_table_name", 123, "*")
So now you have made some changes to your dictionary and want to export it back to SQL format?
Convert dict to sqlite table
async def some_func():
    ...
    await countriesDB.to_sql("my_table_name", 123, countries_data)
But what if you want a list of values for a specific column?
Select method
You can get a list of all the values of a certain column:
country_names = await countriesDB.select("my_table_name", "col1_name")
To limit your selection, use the limit parameter:
country_names = await countriesDB.select("my_table_name", "col1_name", limit=10)
You can also arrange your list by using the ascending parameter and/or the order_by parameter, specifying a certain column to order your list by:
country_names = await countriesDB.select("my_table_name", "col1_name", order_by="col2_name", ascending=False)
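For reference, the same row-to-dictionary conversion can also be done with nothing but the standard sqlite3 module. A minimal sketch, assuming the table and reference column are named as in the example above:
import sqlite3

conn = sqlite3.connect("database.db")
conn.row_factory = sqlite3.Row        # rows now behave like mappings

cur = conn.execute("SELECT * FROM my_table_name WHERE user_id = ?", (123,))
row = cur.fetchone()
row_as_dict = dict(row) if row is not None else None
# e.g. {'user_id': 123, 'col1_name': ..., 'col2_name': ...}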

Getting NONE in the last row of dataframe when using pd.read_sql_query

I am trying to create a database using sqlite3. I created methods to read, write, delete, and show a table. However, in order to view the table in a proper format on the command line, I decided to use pandas (pd.read_sql_query). When I do that, I get None in the last row of the first column.
I tried writing the table to a CSV and there was no None value there.
def show_table():
    df = pd.read_sql_query("SELECT * FROM ticket_info", SQLITEDB.conn, index_col='resource_id')
    print(df)
    df.to_csv('hahaha.csv')

def fetch_from_db(query):
    df = pd.read_sql_query('SELECT * FROM ticket_info WHERE {}'.format(query), SQLITEDB.conn, index_col='resource_id')
    print(df)
Here's the output as a picture: [output image]
Everything is correct except the last None value. Where is it coming from, and how do I get rid of it?
You are adding query as a variable. You might have a query that doesn't return any data from your table.
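If the interpolated query fragment is the suspect, one way to rule it out (a sketch, not from the original post; the resource_id filter is just an example) is to bind the value instead of formatting the WHERE clause into the SQL string:
import pandas as pd

def fetch_from_db(resource_id):
    # Hypothetical variant of fetch_from_db: the filter value is passed via
    # params, so a malformed or empty fragment cannot silently change the query.
    df = pd.read_sql_query(
        "SELECT * FROM ticket_info WHERE resource_id = ?",
        SQLITEDB.conn,
        params=(resource_id,),
        index_col="resource_id",
    )
    print(df)
    return df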

Iterate on page of returning execute_values

http://initd.org/psycopg/docs/extras.html
psycopg2.extras.execute_values has a page_size parameter.
I'm doing an INSERT INTO... ON CONFLICT... with RETURNING ID.
The problem is that cursor.fetchall() gives me back only the last "page", that is, 100 ids (the default page_size).
Without modifying the page_size parameter, is it possible to iterate over the results to get the total number of rows updated?
The best and shortest answer would be to use fetch=True as a parameter, as stated here:
all_ids = psycopg2.extras.execute_values(cur, query, data, template=None, page_size=10000, fetch=True)
# all_ids will return all affected rows with array like this [ [1], [2], [3] .... ]
I ran into the same issue. I work around this issue by batching my calls to execute_values(). I'll set my_page_size=1000, then iterate over my values, filling argslist until I have my_page_size items. Then I'll call execute_values(cur, sql, argslist, page_size=my_page_size), and iterate over cur to get those ids.
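A sketch of that batching pattern (assuming sql is an INSERT ... RETURNING id statement, so each page's ids can be read from the cursor):
from psycopg2.extras import execute_values

def insert_in_batches(cur, sql, values, my_page_size=1000):
    # Feed execute_values() at most one page at a time so the cursor only ever
    # holds the RETURNING rows of that page, then collect them before moving on.
    all_ids = []
    argslist = []
    for value in values:
        argslist.append(value)
        if len(argslist) == my_page_size:
            execute_values(cur, sql, argslist, page_size=my_page_size)
            all_ids.extend(row[0] for row in cur)
            argslist = []
    if argslist:
        execute_values(cur, sql, argslist, page_size=my_page_size)
        all_ids.extend(row[0] for row in cur)
    return all_ids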
Without modifying the page_size parameter, is it possible to iterate over
the results to get the total number of rows updated?
Yes.
import psycopg2
import psycopg2.extras

conn = None
try:
    conn = psycopg2.connect(...)
    cur = conn.cursor()
    query = """
        WITH
        items (eggs) AS (VALUES %s),
        inserted AS (
            INSERT INTO spam (eggs)
            SELECT eggs FROM items
            ON CONFLICT (eggs) DO NOTHING
            RETURNING id
        )
        SELECT id FROM spam
        WHERE eggs IN (SELECT eggs FROM items)
        UNION
        SELECT id FROM inserted
    """
    eggs = (('egg_{}'.format(i % 666),) for i in range(10_000))
    ids = psycopg2.extras.execute_values(cur, query, argslist=eggs, fetch=True)
    # Do whatever with `ids`. `len(ids)` I suppose?
finally:
    if conn:
        cur.close()
        conn.close()
I over-complicated the query on purpose to address some gotchas:
WITH items (eggs) AS (VALUES %s) is done to be able to use argslist in two places at once;
RETURNING with ON CONFLICT will return only the ids which were actually inserted; conflicting ones are omitted from the INSERT's direct results. To solve that, all this SELECT ... WHERE ... UNION SELECT mumbo jumbo is done;
to get all the values which you asked for: ids = psycopg2.extras.execute_values(..., fetch=True).
A horrible interface oddity considering that all other cases are done like
cur.execute(...) # or other kind of `execute`
rows = cur.fetchall() # or other kind of `fetch`
So if you want only the number of inserted rows, then do:
conn = None
try:
    conn = psycopg2.connect(...)
    cur = conn.cursor()
    query = """
        INSERT INTO spam (eggs)
        VALUES %s
        ON CONFLICT (eggs) DO NOTHING
        RETURNING id
    """
    eggs = (('egg_{}'.format(i % 666),) for i in range(10_000))
    ids = psycopg2.extras.execute_values(cur, query, argslist=eggs, fetch=True)
    print(len(ids))
finally:
    if conn:
        cur.close()
        conn.close()

Storing time ranges in cassandra

I'm looking for a good way to store data associated with a time range, in order to be able to efficiently retrieve it later.
Each entry of data can be simplified as (start time, end time, value). I will need to later retrieve all the entries which fall inside a (x, y) range. In SQL, the query would be something like
SELECT value FROM data WHERE starttime <= x AND endtime >= y
Can you suggest a structure for the data in Cassandra which would allow me to perform such queries efficiently?
This is an oddly difficult thing to model efficiently.
I think using Cassandra's secondary indexes (along with a dummy indexed value, which is unfortunately still needed at the moment) is your best option. You'll need to use one row per event with at least three columns: 'start', 'end', and 'dummy'. Create a secondary index on each of these. The first two can be LongType and the last can be BytesType. See this post on using secondary indexes for more details. Since you have to use an EQ expression on at least one column for a secondary index query (the unfortunate requirement I mentioned), the EQ will be on 'dummy', which can always be set to 0. (This means that the EQ index expression will match every row and essentially be a no-op.) You can store the rest of the event data in the row alongside start, end, and dummy.
In pycassa, a Python Cassandra client, your query would look like this:
from pycassa.index import *
start_time = 12312312000
end_time = 12312312300
start_exp = create_index_expression('start', start_time, GT)
end_exp = create_index_expression('end', end_time, LT)
dummy_exp = create_index_expression('dummy', 0, EQ)
clause = create_index_clause([start_exp, end_exp, dummy_exp], count=1000)
for result in entries.get_indexed_slices(clause):
    pass  # do stuff with result
There should be something similar in other clients.
The alternative that I considered first involved OrderPreservingPartitioner, which is almost always a Bad Thing. For the index, you would use the start time as the row key and the finish time as the column name. You could then perform a range slice with start_key=start_time and column_finish=finish_time. This would scan every row after the start time and only return those with columns before the finish_time. Not very efficient, and you have to do a big multiget, etc. The built-in secondary index approach is better because nodes will only index local data and most of the boilerplate indexing code is handled for you.
