Deleting millions of rows with Many-To-Many Relationship SQLAlchemy - python-3.x

I have a couple of tables with the following many-to-many relationship
relationship_table = db.Table(
    "relationship_table",
    db.Column("table_one_id", db.Integer, db.ForeignKey("table_one.id"), primary_key=True),
    db.Column("table_two_id", db.Integer, db.ForeignKey("table_two.id"), primary_key=True),
)

class TableOne(db.Model):
    __tablename__ = "table_one"
    id = db.Column(db.Integer, primary_key=True)
    table_twos = db.relationship(
        "TableTwo", secondary=relationship_table, lazy="subquery"
    )
    # Some other attributes

class TableTwo(db.Model):
    __tablename__ = "table_two"
    id = db.Column(db.Integer, primary_key=True)
Normally I've worked on small projects, where I can delete all the relationships as follows:
tables = db.session.query(TableOne).all()
for t in tables:
    t.table_twos = []
db.session.flush()
db.session.commit()

table_twos = db.session.query(TableTwo).all()
for t in table_twos:
    db.session.delete(t)
db.session.flush()
db.session.commit()
However, since I am working with millions of rows I can't load them all into memory. If I try to just delete all the TableTwo rows, it gives me an error about foreign keys.
How can I delete all of the relationships at once and then delete the TableTwo rows all at once?
Thank you

If you want to delete all rows in a table, it's much faster to use TRUNCATE, which simply discards the table's files on disk instead of deleting every row one by one. It also reclaims disk space, unlike DELETE, which only creates free space inside the table file.
If there are foreign keys:
You can TRUNCATE the referencing table (here, the relationship_table association table) on its own.
But you cannot truncate a referenced table on its own, because that would break the foreign key references. If you want to delete all rows in both the referencing and the referenced table, just truncate both:
TRUNCATE table1, table2;
If the referencing and referenced table are listed in the same truncate command, it will work. Do not use two independent TRUNCATE commands, or postgres will refuse to break your foreign keys (as it should!).
Note that if you want row deletions in table1 to also delete the referencing rows in table2, you must declare the foreign key with ON DELETE CASCADE. Then you could use DELETE on table1, but to delete all rows TRUNCATE is much faster.
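For the schema in the question, a minimal sketch of issuing that single TRUNCATE through Flask-SQLAlchemy could look like this (assuming PostgreSQL and the db object from the question):
from sqlalchemy import text

# Truncate the association table and table_two in one statement so the
# foreign keys between them do not block the operation.
db.session.execute(text("TRUNCATE relationship_table, table_two;"))
db.session.commit()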

Related

Delete multiple rows from Association Table in SQLAlchemy using db.session.execute syntax?

I have an association table that contains relationships between two other SQLAlchemy models that I would like to delete:
class ItemCategories(db.Model):
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey("item.id"))
    category_id = Column(Integer, ForeignKey("category.id"))
    # ... other fields
The old syntax was to use something like:
db.session.query(ItemCategories).filter_by(category_id=5).filter(ItemCategories.name == "Shelved").delete()
But with the newer syntax, I tried:
db.session.execute(db.select(ItemCategories).filter_by(category_id=5).filter(ItemCategories.name == "Shelved").delete())
But this errored with:
AttributeError: 'Select' object has no attribute 'delete'
Flask-SQLAlchemy suggests doing:
db.session.delete(Model Object)
But this only deletes a single row, and I would like to delete multiple rows at once. I know I can loop through all the rows and do a session delete one-by-one, but would prefer a bulk delete instead like with the session.query line.
Is there a way to do multiple deletes with db.session.execute()?
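One common approach in the 1.4/2.0 style (a sketch, not from the original thread; it assumes ItemCategories really has the name column used in the filters above) is to build a delete() statement instead of a select() and pass that to db.session.execute():
from sqlalchemy import delete

stmt = (
    delete(ItemCategories)
    .where(ItemCategories.category_id == 5)
    .where(ItemCategories.name == "Shelved")
)
db.session.execute(stmt)
db.session.commit()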

Trying to save a sqlite table inside another table using python

The problem now is that I can only enter one record. No errors are reported; it just takes the first record from one database and puts it in the other database. I am trying to create a machine-usable database from the user interface database. I will try to transfer around 100 records once it is working. I would appreciate any comments or suggestions. Thank you!
import sqlite3

sql = 'INSERT INTO heavenStream (scene, cascade, enclosure, sensor, streamer, dither) VALUES (?, ?, ?, ?, ?, ?)'

def dropTable(crs, conn):
    crs.execute("DROP TABLE IF EXISTS heavenStream")

def createTable(crs, conn):
    sql = '''CREATE TABLE heavenStream(
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        scene TEXT,
        cascade TEXT,
        enclosure TEXT,
        sensor TEXT,
        streamer TEXT,
        dither TEXT,
        timeStream TEXT,
        streamTime TEXT
    )'''
    crs.execute(sql)
    print("Table created successfully........")

def insert_one(conn, crs):
    crs.execute("SELECT * FROM animalStream")
    for row in crs:
        scene = row[1]
        cascade = row[2]
        enclosure = row[3]
        sensor = row[4]
        streamer = row[5]
        dither = row[6]
        print(f"{row[1]} {row[2]} {row[3]} {row[4]} {row[5]} {row[6]}")
        try:
            crs.execute(sql, (scene, cascade, enclosure,
                              sensor, streamer, dither))
        except sqlite3.IntegrityError as err:
            print('sqlite error: ', err.args[0])  # column name is not unique
    conn.commit()

def main():
    conn = sqlite3.connect("/home/harry/interface/wildlife.db")
    crs = conn.cursor()
    dropTable(crs, conn)
    createTable(crs, conn)
    insert_one(conn, crs)
    # conn.commit()
    conn.close()
    print('done')

main()
The user interface database has had records deleted. There is one record with an id of 64 and the rest are in the 90's.
The cursor (crs) changes here
crs.execute(sql, (scene, cascade, enclosure, sensor, streamer, dither))
after the first insert. Therefore, there are "no more rows to fetch" in the original crs.
One solution would be to instantiate another cursor for the insert. Another solution would be to fetchall() the rows into a variable and iterate over that variable as with:
rows = crs.execute("SELECT * FROM animalStream").fetchall()
for row in rows:
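A minimal sketch of the first suggestion (a second cursor dedicated to the inserts; the names follow the question's code):
def insert_one(conn, read_crs):
    write_crs = conn.cursor()  # separate cursor so the SELECT iteration is not disturbed
    read_crs.execute("SELECT * FROM animalStream")
    for row in read_crs:
        write_crs.execute(sql, (row[1], row[2], row[3], row[4], row[5], row[6]))
    conn.commit()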

Deletion of a row from an association table

I am working on an app using python3 and SqlAlchemy for SQLite3 database management. I have some tables that have a Many to Many relationship. I've created an association table to handle this relationship.
class Machine(Base):
    __tablename__ = 'machine'
    machine_ID = Column(Integer, primary_key=True)
    # etc...

class Options(Base):
    __tablename__ = 'options'
    options_ID = Column(Integer, primary_key=True)
    # etc...
The association table
Machine_Options = Table('machine_options', Base.metadata,
    Column('machine_FK', Integer, ForeignKey('machine.machine_ID'),
           primary_key=True),
    Column('options_FK', Integer, ForeignKey('options.options_ID'),
           primary_key=True))
All the items for the Machine and Options are inserted independently. When I want to associate a machine with an option I use an append query which works very well.
My problem is when I want to break this association between a machine and an option. I have tried a direct row deletion from the association table using a FILTER() clause on the machine_FK and the options_FK but SqlAlchemy gives me an error informing me that 'Machine_Options' table has no field 'machine_FK'.
I have tried to remove the row from 'Machine_Options' indirectly using joins with the machine and options table but received another error that I can not delete or update using joins.
I am looking for the code to only delete a row from the association table without affecting the original machine or options table.
So far my internet search has been fruitless.
The answer to my problem is to use myparent.children.remove(somechild)
The association is made using machine.children.append(option)
Using the same code as the 'append' and substituting 'remove' unmakes the association
The code:
def removeOption(machineKey, OptionKey):
    session = connectToDatabase()
    machineData = session.query(Machine).filter(Machine.machine_ID == machineKey).one()
    optionData = session.query(Options).filter(Options.options_ID == OptionKey).one()
    machineData.children.remove(optionData)
    session.add(machineData)
    session.commit()
    session.close()
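If you would rather delete the association row directly, a hedged sketch using a Core delete() against the Machine_Options table looks like this; the columns are reached through the table's .c collection, which is why filtering on a bare machine_FK attribute raised an error:
from sqlalchemy import delete

stmt = (
    delete(Machine_Options)
    .where(Machine_Options.c.machine_FK == machineKey)
    .where(Machine_Options.c.options_FK == OptionKey)
)
session.execute(stmt)
session.commit()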

How to insert values into an already created database table through pandas `df.to_sql()`

I'm creating a new table and then inserting values into it, because the TSV file doesn't have headers, so I need to create the table structure first and then insert the values. I'm using the df.to_sql function to insert the TSV values into the database table, but it only creates the table; it doesn't insert any values into it and it doesn't raise any kind of error either.
I have tried creating a new table through SQLAlchemy and inserting values, which worked, but it didn't work for the already created table.
import csv
import sys

import pandas as pd
from sqlalchemy import create_engine

conn, cur = create_conn()  # create_conn() is the asker's own helper
engine = create_engine('postgresql://postgres:Shubham#123@localhost:5432/walmart')

create_query = '''create table if not exists new_table(
    "item_id" TEXT, "product_id" TEXT, "abstract_product_id" TEXT,
    "product_name" TEXT, "product_type" TEXT, "ironbank_category" TEXT,
    "primary_shelf" TEXT, "apparel_category" TEXT, "brand" TEXT)'''
cur.execute(create_query)
conn.commit()

file_name = 'new_table'
new_file = "C:\\Users\\shubham.shinde\\Desktop\\wallll\\new_file.txt"
data = pd.read_csv(new_file, delimiter="\t", chunksize=500000, error_bad_lines=False,
                   quoting=csv.QUOTE_NONE, dtype="unicode", iterator=True)

with open(file_name + '_bad_rows.txt', 'w') as f1:
    sys.stderr = f1
    for df in data:
        df.to_sql('new_table', engine, if_exists='append')
    data.close()
I want to insert the values into the database table with df.to_sql().
I'm not 100% certain whether this argument works with PostgreSQL, but I had a similar issue doing this on MSSQL. .to_sql() already creates the table named by its first argument, new_table. if_exists='append' also doesn't check for duplicate values: if the data in new_file is overwritten, or run through your function again, it will simply be added to the table again. As for why you're seeing the table name but not the data in it, that might be due to the size of the df. Try passing fast_executemany=True as an argument to create_engine.
My suggestion: get rid of create_query and handle the data types after to_sql(). Once the SQL table is created, you can use your actual SQL table and join against this staging table to test for duplicates. The non-duplicates can then be written to the actual table, converting data types on UPDATE to match the actual table's data type structure.
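A rough sketch of that suggestion (column names are taken from the question's create_query; the connection string and staging-table name are placeholders): skip the manual CREATE TABLE, pass the column names to read_csv since the TSV has no header row, and let to_sql build the staging table itself:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/walmart")

cols = ["item_id", "product_id", "abstract_product_id", "product_name",
        "product_type", "ironbank_category", "primary_shelf",
        "apparel_category", "brand"]

for df in pd.read_csv("new_file.txt", delimiter="\t", names=cols,
                      chunksize=500000, dtype="unicode"):
    # to_sql creates the staging table on the first chunk and appends afterwards
    df.to_sql("new_table_staging", engine, if_exists="append", index=False)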

Foreign keys Sqlite3 Python3

I have been having some trouble with my understanding of how foreign keys work in sqlite3.
I'm trying to get the userid (james) in one table, userstuff, to appear as a foreign key in my otherstuff table. Yet when I query it, it returns None.
So far I have tried:
Enabling foreign key support
Rewriting a test script (the one being discussed here) to isolate the issue
I have re-written some code after finding issues in how I had initially written it
After some research I came across joins, but I do not think they are the solution, as my current query is an alternative to joins as far as I am aware.
Code
import sqlite3 as sq

class DATAB:
    def __init__(self):
        self.conn = sq.connect("Atest.db")
        self.conn.execute("pragma foreign_keys")
        self.c = self.conn.cursor()
        self.createtable()
        self.defaultdata()
        self.show_details()  # NOTE DEFAULT DATA ALREADY RAN

    def createtable(self):
        self.c.execute("CREATE TABLE IF NOT EXISTS userstuff("
                       "userid TEXT NOT NULL PRIMARY KEY,"
                       " password TEXT)")
        self.c.execute("CREATE TABLE IF NOT EXISTS otherstuff("
                       "anotherid TEXT NOT NULL PRIMARY KEY,"
                       "password TEXT,"
                       "user_id TEXT REFERENCES userstuff(userid))")

    def defaultdata(self):
        self.c.execute("INSERT INTO userstuff (userid, password) VALUES (?, ?)", ('james', 'password'))
        self.c.execute("INSERT INTO otherstuff (anotherid, password, user_id) VALUES (?, ?, ?)", ('aname', 'password', 'james'))
        self.conn.commit()

    def show_details(self):
        self.c.execute("SELECT user_id FROM otherstuff, userstuff WHERE userstuff.userid=james AND userstuff.userid=otherstuff.user_id")
        print(self.c.fetchall())
        self.conn.commit()

----- NOTE: THE CODE BELOW IS FROM A NEW FILE ---------

import test2 as ts

x = ts.DATAB()
Many thanks
A foreign key constraint is just that, a constraint.
This means that it prevents you from inserting data that would violate the constraint; in this case, it would prevent you from inserting a non-NULL user_id value that does not exist in the parent table.
By default, foreign key constraints allow NULL values. If you want to prevent otherstuff rows that have no parent userstuff row, add a NOT NULL constraint to the user_id column.
In any case, a constraint does not magically generate data (and the database cannot know which ID you want). If you want to reference a specific row of the parent table, you have to insert its ID.
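As a small illustration (a sketch, not part of the original answer) of how the constraint behaves in sqlite3 once foreign key enforcement is actually switched on with PRAGMA foreign_keys = ON:
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE userstuff (userid TEXT NOT NULL PRIMARY KEY, password TEXT)")
conn.execute("CREATE TABLE otherstuff (anotherid TEXT NOT NULL PRIMARY KEY, "
             "password TEXT, user_id TEXT REFERENCES userstuff(userid))")

conn.execute("INSERT INTO userstuff VALUES ('james', 'password')")
conn.execute("INSERT INTO otherstuff VALUES ('aname', 'password', 'james')")  # OK: parent row exists

try:
    conn.execute("INSERT INTO otherstuff VALUES ('bname', 'password', 'ghost')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # FOREIGN KEY constraint failed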
