Is Celery still necessary in Django - python-3.x

I'm creating a web app with Django 3.1 and there are lots of DB interactions, mostly between three tables. The queries mostly use results from recent inputs. So query1 will run and update table1, query2 will use table1 to update table2, and query3 will use the column updated by query2 to update other columns of table2. All of these run every time users input or update info.
Perhaps a visual will be clearer.
query1 = Model1.objects.filter(...).annotate(...)
query2 = Model2.objects.filter(...).update(A=query1)
query3 = Model2.objects.filter(...).update(B=A*C)
I'm beginning to worry about speed between Python and PostgreSQL, and about losing data when multiple users start using it at the same time. I read about Celery and Django's asynchronous support, but it's not clear whether I need Celery or not.
This is a very simplified version, but you get the gist. Can someone help me out here, please?

You would consider using Celery if your Django view has a long-running task and you don't want the user to wait for completion or the application server to time out. If the database updates are quick then you probably do not need it. PostgreSQL is a multi-user database, so you do not need to worry too much about users clobbering other users' changes.
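For illustration only, here is a minimal sketch of offloading such an update chain to Celery, assuming a Celery app is already configured; the model, field and task names are made up:

# tasks.py -- minimal sketch; Model2 and the field/filter names are hypothetical.
from celery import shared_task
from django.db.models import F

from .models import Model2


@shared_task
def recalculate_totals(user_id):
    # Runs in a Celery worker, so the web request can return immediately.
    Model2.objects.filter(user_id=user_id).update(B=F("A") * F("C"))


# views.py -- enqueue instead of blocking the request:
# recalculate_totals.delay(request.user.id)

If each request's updates finish in milliseconds, doing them inline (ideally inside transaction.atomic() so the chained updates commit together) is simpler, and Celery adds nothing.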

Related

Fetching 3.6 million records with Sequelize crashes Node, MariaDB Connector works. Any idea why?

As the title already says, I'm trying to run a raw SELECT query that returns 3.6 million records.
When I use the MariaDB Connector (https://github.com/mariadb-corporation/mariadb-connector-nodejs) it works just fine and is done in ~2 minutes.
But Sequelize takes much longer and in the end, Node crashes after 20 to 30 minutes.
Do you guys have any idea how to fix this?
Thank you very much and have a great day!
Take care,
Patrick
When you perform your request, Sequelize will run a SELECT on the underlying database.
Then two things will happen consecutively:
MariaDB will load all the data matching your criteria.
MariaDB will send all the data to Sequelize, which will:
Overload your app's memory (all the data is stored in Node.js memory)
Crash Sequelize, because it is not built to handle that much data
When you run a query over a huge dataset, use cursors. With cursors, MariaDB still loads all the data, but Sequelize fetches it in batches (for example, Sequelize loads 100 rows, you process them, then it loads the next 100, so at any point you only hold 100 rows in Node.js memory). A minimal batching sketch follows the links below.
https://github.com/Kaltsoon/sequelize-cursor-pagination
https://mariadb.com/kb/en/cursor-overview/
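If you do need to pull the rows into your application, the keyset-pagination idea behind those cursor links looks roughly like this. The sketch is in Python with the pymysql connector purely for illustration (the question's stack is Node/Sequelize); connection details, table and column names are made up.

# Keyset-pagination sketch: pull the rows in fixed-size batches instead of one
# 3.6M-row result set, so only one batch lives in memory at a time.
# Connection details, table and column names are hypothetical.
import pymysql

BATCH = 10_000


def handle_row(row_id, payload):
    # Placeholder for whatever per-row work you need to do.
    pass


conn = pymysql.connect(host="localhost", user="app", password="secret", database="mydb")
last_id = 0
with conn.cursor() as cur:
    while True:
        cur.execute(
            "SELECT id, payload FROM events WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, BATCH),
        )
        rows = cur.fetchall()
        if not rows:
            break
        for row_id, payload in rows:
            handle_row(row_id, payload)
        last_id = rows[-1][0]
conn.close()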
Redesign your app to do more of the work in the database. Downloading 3.6M rows is poor design for any application.
If you would like to explain what you are doing with that many rows, maybe we can help create a workaround. (And it may even work faster than 2 minutes!)

How to optimize knex migration?

I'm working on a project that has been using bookshelfjs (with the knexjs migration system) since its beginning (a year and a half ago).
We now have a little bit less than 80 migrations and it's starting to take a lot of time (more than 2 minutes) to run all migrations. We are deploying using continuous integration so the migrations have to be run in the test process and in the deployment process.
I'd like to know how to optimize that. Is it possible to start from a clean state? I don't care about losing rollback possibilities. The project is much more mature right now and we don't need to iterate much anymore on the data-structure side.
Is there any best practice? I'm coming from the Doctrine (PHP) world and it's really different.
Thanks for your advice!
Create a database dump from your current database state.
Always use that dump to initialize a new database for tests.
Run migrations on top of the already initialized database.
That way, the migration system applies only the newly added migrations on top of the existing initial dump (a setup sketch follows).
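As a rough illustration of that flow (assuming Postgres, the standard pg_dump/psql/createdb tools and the knex CLI; the database and file names are examples), a CI setup script could look like the sketch below. The important detail is that the dump must include knex's knex_migrations bookkeeping table, so knex knows which migrations are already applied.

# ci_db_setup.py -- sketch of "restore the baseline dump, then run only the new
# migrations". Assumes Postgres tools on PATH and the knex CLI in the project;
# database and file names are examples.
import subprocess


def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


DB = "app_test"
DUMP = "baseline.sql"  # created once, e.g.: pg_dump --no-owner -f baseline.sql app_dev

run(["dropdb", "--if-exists", DB])
run(["createdb", DB])
run(["psql", "-d", DB, "-f", DUMP])     # restores schema, data and the knex_migrations table
run(["npx", "knex", "migrate:latest"])  # applies only migrations newer than the baseline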
When using knex.schema.createTable to create a table with foreign keys referencing another table, and you later run knex migrate:latest, the referenced table has to be created before the table that uses its keys. For example, if table1 has a foreign key key1 referencing table2, then to make sure table2 is processed first you can add numbers in front of the migration file names, so your migrations folder contains 1table2.js and 2table1.js. This looks hacky and not pretty, but it works!

Postgres with elasticsearch (keep in sync) - nodeJS

I want to set up Postgres and Elasticsearch. But before throwing data into Elasticsearch, I want to prevent data loss when the network or server goes down. After reading on this topic: https://gocardless.com/blog/syncing-postgres-to-elasticsearch-lessons-learned/, I came up with 3 solutions.
Create a database table, e.g. store, and add any new/updated data to it.
During queries: insert data into store.
Select new data: SELECT data FROM store WHERE modified > (:last modified time from elasticsearch)
Send "new" data over to elasticsearch
Use redis to pub/sub requests, and make elasticsearch listen/subscribe for upcoming data. If elasticsearch breaks, the data will be in the queue
Catch any errors during transaction to elasticsearch and save data into a safe place (ei: store table mentioned above). Then have a cron job pushing this data back.
Of course the easiest thing would be to insert data to elasticsearch straight away. But doing so prevents data to be stored in a safe place during corruptions. 1 is too slow in my opinion, unlike 2. And 3 requires mantaining error handling code.
For now 2 is my option.
Are there better ways to do this? I'd like to hear your opinions and new suggestions
:D
Redis (2) isn't reliable.
What I decided to do: add data to Elasticsearch straight away and also add it to an updates table. Then run a sync() function right after connecting to the Elasticsearch client (in case the cluster went down earlier), plus a cron job every 24 hours that launches sync(). All sync() does is select the newest data (by time or id) from the updates table (A) and from Elasticsearch (B) and check whether A has records newer than B. If so, it inserts the data using the bulk API.
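A rough sketch of that sync() idea, in Python with the elasticsearch and psycopg2 clients purely for illustration (the question's stack is Node, and the index, table and column names here are made up):

# sync() sketch: find the newest document in Elasticsearch, pull anything newer
# from the Postgres "updates" table, and push it with the bulk API.
# Index/table/column names are hypothetical; adjust the search syntax to your client version.
import psycopg2
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
pg = psycopg2.connect("dbname=app user=app")
INDEX = "store"


def newest_in_es():
    res = es.search(index=INDEX, body={"size": 1, "sort": [{"modified": "desc"}]})
    hits = res["hits"]["hits"]
    return hits[0]["_source"]["modified"] if hits else "1970-01-01T00:00:00"


def sync():
    last = newest_in_es()
    with pg.cursor() as cur:
        cur.execute(
            "SELECT id, payload, modified FROM updates WHERE modified > %s ORDER BY modified",
            (last,),
        )
        actions = [
            {
                "_index": INDEX,
                "_id": row_id,
                "_source": {"payload": payload, "modified": modified.isoformat()},
            }
            for row_id, payload, modified in cur
        ]
    if actions:
        helpers.bulk(es, actions)  # the bulk API mentioned above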
Hope this helps :)
And I am still open to suggestions and feedback...

Cassandra Prepared Statement and adding new columns

We are using cached PreparedStatements for queries to DataStax Cassandra. But if we need to add new columns to a table, we need to restart our application server to re-cache the prepared statements.
I came across this bug report in Cassandra that explains a workaround:
https://datastax-oss.atlassian.net/browse/JAVA-420
It basically gives a workaround: do not use "SELECT * FROM table" in the query; use "SELECT column_names FROM table" instead.
But now we have come across the same issue with Delete statements. After adding a new column to a table, the Delete prepared statement does not delete a record.
I don't think we can use the same workaround as mentioned in the ticket for the Select statement, since * or column_names does not make sense when deleting a row.
Any help would be appreciated. We basically want to avoid having to restart our application server for any additions to database tables
We basically want to avoid having to restart our application server for any additions to database tables
An easy solution that requires a little bit of coding: use JMX.
Let me explain.
In your application code, keep a cache (you can use the Guava cache implementation, for example) of all prepared statements. The key to access the cache can be, for example, the query string.
Now, expose a JMX method to clear the cache and force the application to re-prepare the queries.
Every time you update the schema, just call the appropriate method(s) to clear the cache; you don't need to restart your application.
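JMX is specific to the Java stack (e.g. a Guava cache plus an exposed MBean method), but the idea carries over to any driver: cache prepared statements keyed by the query string and give yourself a hook to flush the cache after a schema change. Here is a rough, illustrative sketch with the DataStax Python driver; the keyspace, table and query are made up.

# Sketch of a prepared-statement cache with an explicit "clear" hook, analogous
# to the Guava-cache-plus-JMX approach described above. Names are examples.
from cassandra.cluster import Cluster


class PreparedStatementCache:
    def __init__(self, session):
        self._session = session
        self._cache = {}                       # query string -> PreparedStatement

    def get(self, query):
        ps = self._cache.get(query)
        if ps is None:
            ps = self._session.prepare(query)  # prepare once, reuse afterwards
            self._cache[query] = ps
        return ps

    def clear(self):
        # Call this after a schema change (from an admin endpoint, a signal
        # handler, etc.) so every query gets re-prepared against the new schema.
        self._cache.clear()


cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")
statements = PreparedStatementCache(session)

delete_user = statements.get("DELETE FROM users WHERE user_id = ?")
session.execute(delete_user, [42])
# ...after ALTER TABLE users ADD new_column, call statements.clear()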

Weird issue with Azure SQL Database v12: the database is always slow on the first insert or delete execution, but not with V11

We are using MVC4, ASP.NET 4.5, Entity Framework 6.
When we used Azure SQL Database v11, initial record inserts and deletes via EF worked fine and quickly. However now, on v12, I notice that initial inserts and deletes can be very slow, especially if we choose a new value when inserting. If we insert a new record with the same value, the response is rapid. The delay I am talking about can be about 30 secs on S1, 15 secs on S2, and 7 secs on S3.
As I say, we never encountered this on v11.
Any ideas gratefully received.
EDIT1
I've just been doing some diagnostics and it seems that a view I was using now runs very slowly the first time:
db.ExecuteStoreCommand("DELETE FROM Vw_Widget where Id={0}", ID);
Do I need to rejig views in anyway for Azure SQL Database v12?
EDIT2
Looking at the code a little more, I see that I have added a delete trigger to the view; basically I set up the view so I can use this trigger code in certain situations. I am now trying to take out the trigger code and run it from the app, which does run a lot quicker. Perhaps this code should be a stored procedure.
You definitely need to do some diagnostics on your view to check the performance of your query, and you may need to tune it. The times you are quoting are far too high for any single operation. Please make sure to do inserts or deletes on your target tables, not on views; the best practice is not to use views for inserts or deletes.
Use views only in SELECT statements.
I had a similar problem when migrating a SQL database from v2 to v12. I was working with the Business tier and tried to migrate to S0, and the performance of the DB was not good. After some time I discovered that the DTU model has specific views to monitor what type of provisioning tier you need. If the problem only occurs the first time, your application is probably making a lot of queries to load data into memory, and these can affect the performance of your CRUD statements. You can check recent DTU usage with:
SELECT end_time,
       (SELECT MAX(v)
        FROM (VALUES (avg_cpu_percent),
                     (avg_data_io_percent),
                     (avg_log_write_percent)) AS value(v)) AS [avg_DTU_percent]
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
More information about that can be found on this page:
https://azure.microsoft.com/en-us/documentation/articles/sql-database-upgrade-server-portal/
