Implications of sharing an in-memory SQLite database with multiple processes - python-3.x

Initially I created an SQLite database ('temp.db') and shared its connection with multiple processes. I started getting lots of "database is locked" errors.
I need the database for temporary storage only. The only operations performed are INSERT and SELECT on a single table, and no COMMIT is issued on the database.
To overcome the lock issue above, I created an in-memory (':memory:') SQLite database and shared its connection with multiple processes. I have not run into any database lock errors so far.
In neither case did I use any locking mechanism. Using one in the first case might have resolved the issue, but I don't want to increase execution time.
Is locking needed in the second case? What other pitfalls should I watch out for? What is the impact on a long-running application?
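For what it's worth, the sharing semantics of ':memory:' are easy to demonstrate: each plain ':memory:' connection gets its own private database, so "sharing" only works if every process uses the very same connection object. A minimal sketch in Python:

```python
import sqlite3

# Each ':memory:' connection gets its own private database:
# a second connection cannot see the first one's table.
a = sqlite3.connect(":memory:")
a.execute("CREATE TABLE t (x INTEGER)")
a.execute("INSERT INTO t VALUES (1)")

b = sqlite3.connect(":memory:")
try:
    b.execute("SELECT * FROM t")
    shared = True
except sqlite3.OperationalError:  # "no such table: t"
    shared = False

print(shared)  # False: the two databases are independent
```

Within a single process, several connections can share one in-memory database via `sqlite3.connect("file::memory:?cache=shared", uri=True)`, but shared cache is per-process; it does not extend an in-memory database across process boundaries.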

Related

Is it possible for a text storage file to be updated simultaneously in NodeJS, causing a data integrity violation?

I'm practicing NodeJS and working on a small application that has an endpoint that stores JSON objects in text files and can also search for objects in those files. I'm using a single file for that purpose; I know I should be using multiple files for many reasons, but let's consider the case of a single storage text file.
As far as I understand (not 100% sure), NodeJS is single-threaded, so it should not be possible for two simultaneous API calls to update the file at the same time.
Is my understanding correct? Or is there a possibility that the file gets updated simultaneously, causing a data integrity violation? If so, how do I handle this? Is there a way to lock a file until the process completes?
Yes, it can happen. While NodeJS is single-threaded, I/O is not. That is one of its core tenets.
You could mitigate such problems by locking files before writing to them (the link is just one example of how to do this).
Another approach would be to use SQLite. There's no database server to set up or administer (unlike MySQL, for example). The entire database is contained in a single file, but it handles things such as locking in case of multiple writes, crashes while writing, and so on.
"Think of SQLite not as a replacement for Oracle [a database server] but as a replacement for fopen() [working with plain files]"

Is sharing one SQLite connection inside a desktop app safe? Sharing one connection vs. creating new connections for each query

I found a similar question on Stack Overflow, but it focuses solely on performance, and the answer is fairly obvious: creating a new connection for each query = slower performance (how much slower? it depends).
I am more worried about the transaction isolation aspect. The SQLite documentation states that there is no isolation within a single connection. I am using the sqlite3 library in my Electron desktop app, and I was planning to share a single connection throughout the whole time my app is running (to make it a little faster), but now I am wondering whether that is safe. If there is no isolation within a single connection, is this scenario possible?
Client triggers 2 unrelated processes
1.
db.serialize(() => {
  db.run("BEGIN");
  try {
    db.run("foo");
    db.run("bar");
    db.run("COMMIT");
  } catch (e) {
    db.run("ROLLBACK");
  }
});
2.
db.run("another foobar")
1. and 2. run in parallel, so it is possible that 2. finishes somewhere between the BEGIN and the COMMIT/ROLLBACK of 1.
Does that mean that queries from 2. can be rolled back or committed by 1. even though they are entirely separate, or does 2. use some implicit transaction that prevents this?
I think it is possible, since there is no isolation within a single connection, but I might be missing something, because I have never worked with SQLite or sqlite3 (or I might have missed something more basic). So I would like to confirm whether this scenario is a potential danger of using a single sqlite3 connection.
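The scenario can be reproduced directly against SQLite itself. A minimal sketch with Python's built-in sqlite3 module (the engine behavior is the same one the Node sqlite3 driver wraps), simulating the two "processes" interleaving their statements on one connection:

```python
import sqlite3

# One connection in autocommit mode, so BEGIN/ROLLBACK are explicit.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE t (x INTEGER)")

db.execute("BEGIN")                       # "task 1" opens a transaction
db.execute("INSERT INTO t VALUES (1)")
db.execute("INSERT INTO t VALUES (99)")   # "task 2"'s unrelated write
                                          # joins the same open transaction
db.execute("ROLLBACK")                    # task 1 rolls back...

rows = db.execute("SELECT x FROM t").fetchall()
print(rows)  # [] -- the "unrelated" insert was rolled back too
```

So yes: on a single connection there is one transaction at a time, and any statement that executes while it is open becomes part of it. To isolate the two tasks, they need separate connections.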

DB for Pi Zero running multiple NodeJs processes

I need a local DB on a Pi Zero, with multiple processes running that need to read and write data. That kind of rules SQLite out (I think). From my experience, SQLite only allows one connection at a time and is tricky when multiple processes try to do database work. All of my data transmission is JSON-driven, so NoSQL makes sense, but I need something lightweight to store a few configs and to store data that will be synced up to the server. What NoSQL options would be best to run on a Pi with good Node support?
SQLite is generally fine when using it with multiple concurrent processes. From the SQLite FAQ:
We are aware of no other embedded SQL database engine that supports as much concurrency as SQLite. SQLite allows multiple processes to have the database file open at once, and for multiple processes to read the database at once. When any process wants to write, it must lock the entire database file for the duration of its update. But that normally only takes a few milliseconds. Other processes just wait on the writer to finish then continue about their business. Other embedded SQL database engines typically only allow a single process to connect to the database at once.
For the majority of applications, that should be fine. If only one of your processes does writes and the others only read, it should have no impact at all.
If you're looking for something NoSQL-specific, you can also consider LevelDB, which is used in Google Chrome. With Node, the best way to access it is through the levelup library.
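To stay with SQLite for a moment: the multi-process pattern the FAQ describes is one connection per process, a busy timeout, and (on modern SQLite) WAL journal mode so readers don't block on the writer. A sketch in Python, with two connections standing in for two processes:

```python
import os
import sqlite3
import tempfile

# Two independent connections to the same database file, as two
# processes would have. WAL mode lets readers proceed while a writer
# works; timeout=5.0 makes a blocked writer wait up to 5 s instead of
# failing immediately with "database is locked".
path = os.path.join(tempfile.mkdtemp(), "app.db")

writer = sqlite3.connect(path, timeout=5.0)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE config (k TEXT, v TEXT)")
writer.execute("INSERT INTO config VALUES ('mode', 'sync')")
writer.commit()

reader = sqlite3.connect(path, timeout=5.0)
print(reader.execute("SELECT v FROM config WHERE k='mode'").fetchone())
```

The same settings are exposed by the Node sqlite3 driver, since they live in the engine, not the language binding.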

Using a single QSqlDatabase connection in multiple qt threads

I have a multithreaded Qt application in which multiple threads access a single database. Am I required to create separate QSqlDatabase connections for performing SELECT / INSERT / UPDATE in each thread?
From the Qt documentation, I am unable to tell whether the following guideline discourages the approach I described:
"A connection can only be used from within the thread that created it.
Moving connections between threads or creating queries from a
different thread is not supported."
I have tried using the same connection in multiple QThreads and everything works in practice, but I wanted to understand whether it is the correct thing to do.
FYI, I am using SQLite3 from within Qt (via the QtSql API), which I understand supports serialized mode by
default: https://www.sqlite.org/threadsafe.html
The reason I want to use the same connection in multiple threads is that when I tried using different connections to the same database on multiple threads and performed SELECT / INSERT / UPDATE, I got "database is locked" errors quite frequently. However, when using the same connection in multiple threads, the issue was eliminated completely.
Kindly guide on the same.
Regards,
Saurabh Gandhi
The documentation is not merely discouraging it, it flatly states that you must not do it (emphasis mine):
A connection can only be used from within the thread that created it.
So, no, you can't use one connection from multiple threads. It might happen to work, but it's not guaranteed to work, and you're invoking what amounts to undefined behavior. It's not guaranteed to crash either, mind you.
You need to either:
Serialize the access to the database on your end, or
Change the connection parameters so that locks don't reject a query but block until the database becomes available. I'm not quite sure what the database locked "issue" is: you should never see that error code (I presume it is SQLITE_LOCKED) if you actually use multiple connections. Sqlite 3 can be easily used from multiple threads, it shouldn't require any effort on your end other than enabling multithreading and using separate connections.

SQL Azure distributing heavy read queries for reporting

We are using SQL Azure for our application and need some input on how to handle queries that scan a lot of data for reporting. Our application is both read- and write-intensive, so we don't want the report queries to block the rest of the operations.
To avoid connection pooling issues caused by long-running queries, we moved the code that queries the DB for reporting onto a worker role. This still does not prevent the database from being hit with a bunch of read-only queries.
Is there something we are missing here? Could we set up a read-only replica that all the reporting calls hit?
Any suggestions would be greatly appreciated.
Have a look at SQL Azure Data Sync. It will allow you to incrementally update your reporting database.
Here are a couple of links to get you started:
http://msdn.microsoft.com/en-us/library/hh667301.aspx
http://social.technet.microsoft.com/wiki/contents/articles/1821.sql-data-sync-overview.aspx
I think it is still in CTP though.
How about this:
Create a separate connection string for reporting, for example use a different Application Name
For your reporting queries use SET TRANSACTION ISOLATION LEVEL SNAPSHOT
This should prevent your long-running queries from blocking your operational queries. It will also allow your reports to get a consistent read.
Since you're talking about reporting I'm assuming you don't need real time data. In that case, you can consider creating a copy of your production database at a regular interval (every 12 hours for example).
In SQL Azure it's very easy to create a copy:
-- Execute on the master database.
-- Start copying.
CREATE DATABASE Database1B AS COPY OF Database1A;
Your reporting would happen on Database1B without impacting the actual production database (Database1A).
You say you have a lot of read-only queries... any possibility of caching them? (Perfect, since they are read-only.)
What reporting tool are you using? You can output-cache the queries as well if needed.
