SQLITE3 CLSQL multithreaded insert results in error - multithreading

I want to use my SQLite3 database from multiple threads in parallel. I read that using connection pools makes the access thread-safe, but I still get errors while inserting data.
(make-thread
 #'(lambda ()
     (dotimes (i 100)
       (with-database (db ("/path/to/db")
                          :database-type :sqlite3 :pool T)
         (do-stuff-with db)))))
When I use multiple threads in this fashion I get this error:
While accessing database #
with expression "INSERT INTO ...":
Error 5 / database is locked
Is it even possible to do a multi-threaded insert with an SQLite3 database? If so, how?

SQLite does not support concurrent write transactions. From the SQLite site:
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
CLSQL has been written to give a "unified" interface to typical client-server relational DBMSs, like other "standardized" libraries (e.g. JDBC or ODBC), but SQLite is an "untypical" database management system: in practice it is a library that offers SQL as the language to access a simple "database-in-a-file", plus a few other DBMS facilities. For instance, it has no real concurrency control (it uses the operating system's functions to lock the db file), so it cannot be considered a "real" DBMS, and CLSQL cannot offer anything more than the functionality of the underlying system.
So, if you need concurrent insertions into a database, you should use something else, for instance PostgreSQL.
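If you do stay on SQLite, the usual mitigation is to make writers wait instead of fail: error 5 is SQLITE_BUSY, which is returned immediately unless a busy timeout is set, and WAL journal mode at least lets readers run alongside the single writer. The sketch below shows the two PRAGMAs involved; it uses Node's better-sqlite3 package and a made-up samples table purely as an illustration (not the question's CLSQL setup), and the same PRAGMAs can be sent as ordinary SQL statements from any client.
import Database from "better-sqlite3";

// Illustration only: these PRAGMAs are plain SQLite settings; any client library can issue them.
const db = new Database("/path/to/db");

// WAL mode lets readers proceed while the single writer is active.
db.pragma("journal_mode = WAL");

// By default SQLITE_BUSY (error 5) is returned immediately; with a busy timeout,
// a writer waits up to 5 seconds for the current writer to finish instead.
db.pragma("busy_timeout = 5000");

// Writers still run one at a time, but they queue up instead of erroring.
// "samples" is a hypothetical table standing in for whatever do-stuff-with inserts.
const insert = db.prepare("INSERT INTO samples (value) VALUES (?)");
for (let i = 0; i < 100; i++) {
  insert.run(i);
}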

Related

MongoDB Performance: single collection vs multiple collections for concurrent read/writes

I'm utilizing a local database on my web server to sync certain data from external APIs. The local database would be used to serve the web application. The data I'm syncing is different for each user who would be visiting the web app. Since the sync job is periodically but continuously writing to the DB while users are accessing their data from the web page, I'm wondering what would give me the best performance here.
Since the sync job is continuously writing to the DB, I believe the collection is locked until it's done. I'm thinking that having multiple collections would help here since the lock would be on a particular collection that is being written to rather than on a single collection every time.
Is my thinking correct here? I basically don't want reads to get throttled since the write operation is continuously locking up one collection.
Collection level locking was never a thing in MongoDB. Before the WiredTiger storage engine became the default (MongoDB 3.2), there were plenty of occasions when the whole database would lock.
Nowadays, with WiredTiger, writing to a single collection from multiple threads and/or processes is extremely efficient. The right way to distribute a very heavy write load in MongoDB is to shard your collection.
To test a sharded vs unsharded config you can easily spin up both configurations in parallel with MongoDB Atlas.
There is an extensive amount of information regarding lock granularity and locking in MongoDB in general here.
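For completeness, here is a rough sketch of what sharding the collection looks like from the Node.js driver; the database name appdb, the collection userSync, and the hashed key on userId are made-up examples, and the commands only apply when connected to a sharded cluster through mongos (e.g. an Atlas sharded tier).
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");

async function shardUserSync(): Promise<void> {
  await client.connect();
  const admin = client.db("admin");

  // Standard admin commands: enable sharding on the database, then shard the collection.
  await admin.command({ enableSharding: "appdb" });
  await admin.command({
    shardCollection: "appdb.userSync",
    key: { userId: "hashed" }, // a hashed shard key spreads the continuous writes evenly
  });

  await client.close();
}

shardUserSync().catch(console.error);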
In general, writing to multiple collections (for a small to medium value of "multiple", and assuming all of the collections are created in advance) can be faster than using a single collection. The cost is that queries become awkward, and potentially slow, if you have to perform joins via the aggregation pipeline instead of a single collection/index scan, for example.
If you have so many collections that there are so many files open that either the DB or the OS starts evicting files out of their respective caches, performance will start dropping again.
Creating collections may also be relatively slow, so if this happens under load it may not be very good for performance.

How should I keep temporary data for socket.io interactions in node.js?

I am building a simple game in node.js using socket.io. My web experience with node.js has typically involved saving everything to a relational database and keeping nothing in memory. I set up a relational database for the state of a game. I am using sqlite3 for development and I might use something like PostgreSQL or MySQL for production.
My concern is that, every time an event is emitted from the socket, the whole game-state is loaded from the database into memory on the server. I feel that in practice this will be less efficient than just keeping all of the game-state data in memory. Events will probably be emitted every 5 seconds or so during a game. All of the game data is temporary; none of it will be needed after the game is over. A game-state consists of a set of about 120 groups of small strings and integers (about 10 per group, but subject to change).
Is it good practice to keep this type of data in memory?
If not, should I stick with relational databases or switch to a third option like a file-based storage structure?
Should I avoid loading the whole game-state for every event, even though that will lead to a lot more reads/writes (at least triple)?
I would not keep this data in the memory of your Node.js application. It's best to avoid storing state in your app server. If you really need faster read access than SQL provides, consider using a cache like Redis or Memcached as a layer between your app and DB.
All that being said, it's best not to prematurely optimize your code. Most SQL engines have their own form of caching, and optimizing your SQL queries is a better place to start if you're experiencing performance issues. PostgreSQL Query Optimization
But don't worry about it until it's an actual problem (because most likely it never will be).
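As a rough illustration of the "cache as a layer between your app and db" suggestion, here is a minimal cache-aside sketch. It assumes the node-redis v4 client and a placeholder loadGameStateFromDb() standing in for the existing SQL query; the key format and the 10-second expiry are arbitrary.
import { createClient } from "redis";

// Placeholder for the real SQL query that loads a game's state from the relational DB.
async function loadGameStateFromDb(gameId: string): Promise<string> {
  return JSON.stringify({ gameId, groups: [] });
}

const redis = createClient();       // assumes a local Redis instance
const connected = redis.connect();  // node-redis v4: connect once and reuse the client

async function getGameState(gameId: string): Promise<string> {
  await connected;
  const key = `game:${gameId}`;
  // 1. Try the cache first.
  const cached = await redis.get(key);
  if (cached !== null) return cached;
  // 2. On a miss, fall back to the database and cache the result briefly.
  const state = await loadGameStateFromDb(gameId);
  await redis.set(key, state, { EX: 10 }); // expire after 10 seconds; tune to the event rate
  return state;
}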
It sounds like a relational, SQL-type database is a huge overhead for your specifics. Do you have an idea of how big your data is and how many users you'd like to handle? Then you could compare that with your server's capacity. If the result is negative (it couldn't be handled in memory), I'd go with some quick NoSQL store, like MongoDB. For your example it sounds like the best choice: it'll be faster to get the data for a single session, easier to dump, and more elastic in structure.

node.js keep a small in-memory database

I have an API service in Node.js; basically what it does is get an id from the request, read the record with this id from the database, and return it in the response.
While there are many clients with different ids, usually only about 10-20 of them are used in a given timespan.
Is it a good idea to create an object with ids as keys and store the resulting record along with last_requested time to emulate a small database with fast-access? Whenever a record is requested I will update the last_requested field with new Date(). Also, create a setInterval() to delete those keys which were not used for some time.
Records in the database do not change often, and when they do I can restart the service (there are several instances running simultaneously via PM2, so they can be gracefully restarted).
If the required id is not found in this "database" a request to real database will be performed and the result will be stored in the object in a new key.
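For what it's worth, the design described above is small enough to sketch directly; the names (fetchRecordFromDb, a 10-minute idle limit, a 1-minute sweep interval) are made up for illustration.
// Placeholder for the real database lookup.
async function fetchRecordFromDb(id: string): Promise<unknown> {
  return { id };
}

type Entry = { record: unknown; lastRequested: Date };

const cache = new Map<string, Entry>();
const MAX_IDLE_MS = 10 * 60 * 1000; // evict entries unused for 10 minutes

// Periodically drop entries that have not been requested for a while.
setInterval(() => {
  const now = Date.now();
  for (const [id, entry] of cache) {
    if (now - entry.lastRequested.getTime() > MAX_IDLE_MS) cache.delete(id);
  }
}, 60 * 1000);

async function getRecord(id: string): Promise<unknown> {
  const hit = cache.get(id);
  if (hit) {
    hit.lastRequested = new Date(); // refresh last_requested on every access
    return hit.record;
  }
  const record = await fetchRecordFromDb(id); // miss: go to the real database
  cache.set(id, { record, lastRequested: new Date() });
  return record;
}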
You're talking about caching. And it's very useful if:
You have a lot of reads, but not a lot of writes. i.e. Lots of people request a record, and it changes rarely.
You have a lot of free memory, or not many records.
You have a good indication of when to invalidate the cache.
For trivial use cases (i.e. under 50 requests/second), you probably don't need an in-memory cache for the database. Moreover, database access is very fast if you use the tools the database gives you (like persistent connection pools, consistent parameterized queries, the query cache, etc.).
It all depends on your specific use case. But I wouldn't do it until I actually start encountering performance problems and determine that the database is the bottleneck.
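As a small aside on "use the tools the database gives you": with node-postgres, a single long-lived pool plus parameterized queries already covers the pooling and query-consistency points above. The table and column names below are hypothetical.
import { Pool } from "pg";

// One long-lived pool per process; connections are reused across requests.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Hypothetical table/column; the point is the pooled, parameterized query.
async function getRecordById(id: number) {
  const { rows } = await pool.query("SELECT * FROM records WHERE id = $1", [id]);
  return rows[0] ?? null;
}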
It's not just a good idea; caching is a necessity at different levels of a computational system. Caching starts at the CPU level (L1, L2, L3) and the OS level, and goes up to the application level, which must be handled by the developer.
Even if you have a well-structured database with good indexes, there is still overhead for TCP/IP communication between your app and the database. So if you are going to access some rows frequently, it's a must to have them in your app's process.
The good news is that a Node.js app is a single process resident in memory (unlike PHP or other scripting programs, which come and go), so you can keep frequently required data loaded and skip the database access.
The best mechanism to store the records is an LRU (least-recently-used) cache. There are several LRU cache packages available for Node.js:
https://github.com/adzerk/node-lru-native
https://github.com/isaacs/node-lru-cache
https://www.npmjs.com/package/simple-lru-cache
In an LRU cache you can define how much memory the cache can use, the expiry age of each item, and how many items it can store. Or you can write your own!
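If you do write your own, a capacity-bounded LRU can be sketched in a few lines on top of a Map, since a Map iterates keys in insertion order; the class and the limit of 20 entries below are only illustrative.
// Minimal LRU built on Map insertion order: re-inserting a key moves it to the
// "most recently used" end, and the first key in iteration order is the oldest.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private maxEntries: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    this.map.delete(key); // refresh recency
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      const oldest = this.map.keys().next().value as K; // least recently used
      this.map.delete(oldest);
    }
  }
}

// Usage: keep roughly the 10-20 hot records mentioned in the question.
const records = new LruCache<string, unknown>(20);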

Database for Embedded Linux, and Architecture

Below is the architecture of my application.
sensor ↔ parser app ↔ database ↔ application1 ↔ ethernet ↔ server
application2 and application3 sit at the same level as application1.
database = sqlite3
The problem is that too many transactions occur on the database.
The parser app and the applications each query the whole database every second to check for any differences.
So I would like to change the architecture or the database.
Is there any database with better performance than SQLite3?
Or which part do I have to change?
I would switch out SQLite3 in favor of MySQL or PostgreSQL. Those database systems are meant to handle multiple clients, whereas SQLite3 can't do this well because everything is stored in a single file: each write access therefore has to lock the entire database instead of only a single row of the table in question.

Rate limiting - using CouchDB with Redis or CouchDB on its own

I've written an application with a CouchDB backend. I have invested a lot of time into CouchDB and so I'm reluctant to move everything over to a different NoSQL database (like Redis).
The problem is that I now need to implement a rate limiting (based on IP address) feature.
There are plenty of examples on how good Redis is for this kind of task, however because I don't want to drop CouchDB for other tasks this means I would essentially be running (and supporting) two databases (1 for most data, 1 for rate limiting) and so...
Is running CouchDB in tandem with Redis unheard of?
Is CouchDB itself suitable for handling rate limiting?
Is running CouchDB in tandem with Redis unheard of?
Redis is commonly used to complement other storage solutions (MySQL, PostgreSQL, MongoDB, CouchDB, etc.). Like many other NoSQL solutions, Redis is not suited to every kind of workload or situation. The authors of Redis are pragmatic and open people, and they routinely suggest using other solutions rather than Redis when those are better suited to the situation.
Redis is therefore a good team player, and it is generally easy to integrate in an existing infrastructure.
Here is an example of usage of Redis with CouchDB.
Is CouchDB itself suitable for handling rate limiting?
CouchDB has a number of useful features to implement the rate limiting strategy described in Chris O'Hara's article. For instance, it supports bulk operations on several documents (with optional atomicity). A "bucket span" can be stored in a single document. In-place incrementation of counters can be covered by using update handlers.
IMO, the main missing feature would be automatic item expiration (which CouchDB does not provide AFAIK). So you would have to design a clever mechanism to get rid of obsolete data on top of CouchDB.
The main problem is that CouchDB is not really designed for this kind of workload: it is a log-structured, document-oriented database. Each time a counter has to be incremented, it would involve JSON unpacking/packing operations, some JavaScript code being executed, and a new revision of the whole document being written to append-only files. You can find a good article describing how CouchDB stores its data here.
I suspect a rate limiting strategy implemented on top of CouchDB would not scale very well (too many I/Os, too much CPU consumption, inefficient network protocol). For instance, CouchDB is a RESTful server; I would not feel comfortable initiating client HTTP operations (REST queries to CouchDB) to rate-limit each incoming HTTP query of my system.
Redis is much more adapted to this kind of workload (fast, in-memory, no I/O, efficient client protocol, no JSON parsing/formatting, increments are native atomic operations, etc.)
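To make the comparison concrete, here is one minimal way to use Redis's atomic INCR plus key expiry for per-IP rate limiting (a simple fixed-window counter, not the full bucket-span scheme from the article); the key format, the 100-requests-per-minute limit, and the node-redis client are assumptions for the sketch.
import { createClient } from "redis";

const redis = createClient();
const ready = redis.connect();

const LIMIT = 100;          // max requests per window (example value)
const WINDOW_SECONDS = 60;

// Fixed-window counter: one key per IP per minute, relying on Redis's atomic INCR
// and key expiry to do the bookkeeping.
async function allowRequest(ip: string): Promise<boolean> {
  await ready;
  const windowId = Math.floor(Date.now() / (WINDOW_SECONDS * 1000));
  const key = `ratelimit:${ip}:${windowId}`;
  const count = await redis.incr(key);
  if (count === 1) {
    await redis.expire(key, WINDOW_SECONDS); // first hit in the window: set its TTL
  }
  return count <= LIMIT;
}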
You can do rate limiting with Memcached - it has a nice counter increment command as you mention, plus obsolete data is automatically purged from the cache in due course, so it has all the benefits of Redis for this application without the annoying duplication of capability (and complexity) that running Redis on top of CouchDB would bring.
http://simonwillison.net/2009/jan/7/ratelimitcache/
You could add Memcached to your own setup easily enough, or you could investigate Couchbase, whose current server product integrates a CouchDB-derived database with Memcached compatibility baked in:
http://www.couchbase.com/memcached
Personally I dislike the way Couchbase forked from CouchDB, but for your application it might be a perfect fit.
