How to implement a fast, queryable and persistent database in PhantomJS? - node.js

I have been using PhantomJS to do some heavy lifting in a server-side DOM environment. Until now I have been keeping my data structures in memory (i.e. doing nothing special with them) and everything was fine.
But recently, under some use cases, I started running into the following problems:
memory usage becoming too high, making swap kick in and seriously affecting my performance.
not being able to resume from the last save point, since in-memory data structures are not persistent (obviously)
This forced me to look for a database solution to use on phantom, but again I am running into issues while deciding on one:
I don't want my performance to get too affected.
it has to be persistent and queryable
how do I even connect to a database from inside a phantom script?
Can anyone guide me to a satisfactory solution?
Note: I have almost decided on SQLite, but connecting to it from phantom is still an issue. Node.js provides the sqlite3 module; I am trying to browserify it for phantom.
Note Note: Browserify didn't work! Back to ground zero!! :-(
Thanks in advance!

PhantomJS's filesystem API allows you to read and write binary files with:
buf = fs.read(FILENAME, 'b') and
fs.write(FILENAME, buf, 'b')
sql.js (https://github.com/kripken/sql.js/) gives you a JavaScript SQLite implementation you can run in PhantomJS.
Combine the two and you have a fast, persistent, queryable SQL database.
Example walkthrough
Get javascript SQLite implementation (saving to /tmp/sql.js)
$ wget https://raw.githubusercontent.com/kripken/sql.js/master/js/sql.js -O /tmp/sql.js
Create a test SQLite database using the command-line sqlite3 app (showing it is persistent and external to your phantomjs application).
sqlite3 /tmp/eg.db
sqlite> CREATE TABLE IF NOT EXISTS test(id INTEGER PRIMARY KEY AUTOINCREMENT, created INTEGER NOT NULL DEFAULT CURRENT_TIMESTAMP);
sqlite> .quit
Save this test phantomjs script to add entries to the test database and verify behaviour.
$ cat /tmp/eg.js
var fs = require('fs'),
    sqlite3 = require('./sql.js'),
    dbfile = '/tmp/eg.db',
    sql = 'INSERT INTO test(id) VALUES (NULL)',
    // fs.read returns binary 'string' (not 'String' or 'Uint8Array')
    read = fs.read(dbfile, 'b'),
    // Database argument must be a 'string' (binary) not 'Uint8Array'
    db = new sqlite3.Database(read),
    write,
    uint8array;

try {
    db.run(sql);
} catch (e) {
    console.error('ERROR: ' + e);
    phantom.exit();
}

// db.export() returns 'Uint8Array' but we must pass binary 'string' to write
uint8array = db.export();
write = String.fromCharCode.apply(null, Array.prototype.slice.apply(uint8array));
fs.write(dbfile, write, 'b');
db.close();
phantom.exit();
Run the phantomjs script to test
$ /usr/local/phantomjs-2.0.0-macosx/bin/phantomjs /tmp/eg.js
Use external tool to verify changes were persisted.
sqlite3 /tmp/eg.db
sqlite> SELECT * FROM test;
id  created
1   2015-03-28 10:21:09
sqlite>
Some things to keep in mind:
The database is modified on disk only when you call fs.write.
Any changes you make are invisible to external programs accessing the same SQLite database file until you call fs.write.
The entire database is read into memory with fs.read.
You may want to have different OS files for different tables -- or versions of tables -- depending on your application and the amount of data in the tables, to address the memory requirements you mentioned.
Passing the raw Uint8Array returned by db.export() straight to fs.write will corrupt the SQLite database file on disk (it will no longer be a valid SQLite database file).
Uint8Array is NOT the correct type for the fs.write parameter; convert it to a binary string first (see the sketch below).
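One extra caveat (this helper is my own addition, not part of the original answer): calling String.fromCharCode.apply(null, ...) on the entire exported array can exceed the JavaScript engine's argument limit once the database grows large, so the conversion may need to be chunked. A minimal sketch:
// Convert a Uint8Array to a binary 'string' in chunks, to avoid
// argument-count / stack-size errors on large db.export() results.
function uint8ToBinaryString(uint8array) {
    var CHUNK = 8192,   // arbitrary chunk size, small enough for apply()
        parts = [],
        i;
    for (i = 0; i < uint8array.length; i += CHUNK) {
        parts.push(String.fromCharCode.apply(
            null,
            Array.prototype.slice.call(uint8array, i, i + CHUNK)
        ));
    }
    return parts.join('');
}

// Usage, same flow as the walkthrough above:
// fs.write(dbfile, uint8ToBinaryString(db.export()), 'b');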

Writing binary data in PhantomJS works like this:
var db_file = fs.open(db_name, {mode: 'wb', charset: ''});
db_file.write(String.fromCharCode.apply(null, db.export()));
db_file.close();
You have to set the charset to '' (empty), because otherwise the default character-set encoding mangles the binary data and the write goes wrong.

Related

Why can't I store a PriorityQueue into MongoDB

Recently I have decided to replace arrays with priority queues for storing my list of jobs for a user into MongoDB. I use NodeJS and ExpressJS for backend. The priority queue I attempted to store is from an external package which can be installed by running the following command in terminal:
yarn add js-priority-queue
For some reason the priority queue works perfectly prior to storing it into MongoDB. However, the next time I attempt to take it out of MongoDB and use it, its functionality is missing. I declare its type as Schema.Types.Mixed in the Schema. Am I doing something wrong or is it not possible to store instantiated class objects into MongoDB?
As far as I know, when you store things in MongoDB they are stored as extended JSON (EJSON) in binary format (BSON):
const { EJSON } = require('bson');
const test = EJSON.stringify({a: new Date(), foo:function(){console.log('foo');}})
console.log(test) // "{"a":{"$date":"2020-07-07T14:45:49.475Z"}}"
So any sort of function is lost.
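Since only plain data survives the round trip, one workaround is to persist the queue's contents as a plain array and rebuild the priority queue after reading it back. A rough sketch, assuming Mongoose and the js-priority-queue constructor options (comparator, initialValues); the jobs field and function names here are hypothetical:
const PriorityQueue = require('js-priority-queue');

// Rebuild the queue from the plain array stored on the user document.
// The comparator itself is never stored in MongoDB, so it is supplied here.
async function loadJobQueue(UserModel, userId) {
  const user = await UserModel.findById(userId).lean();
  return new PriorityQueue({
    comparator: (a, b) => a.priority - b.priority,
    initialValues: user.jobs || []
  });
}

// Drain the queue into a plain array before storing; only that array is persisted.
async function saveJobQueue(UserModel, userId, queue) {
  const jobs = [];
  while (queue.length > 0) {
    jobs.push(queue.dequeue());
  }
  await UserModel.updateOne({ _id: userId }, { $set: { jobs } });
  return jobs;
}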

atomically replace content of SQLite database, preferably in SQL or python3

I have a python script that supports an --edit-the-database option to invoke the user's preferred editor on a dump of the script's SQLite database. This option is intended to facilitate quick access to parts of the database that the script's other options don't provide access to, particularly during the development of this script.
Once the script has dumped the database's content, launched the editor and verified that the modified content is still valid then it needs to replace the existing database content.
First it removes all existing content by executing this SQL (using Python's sqlite3 module):
PRAGMA writable_schema = 1;
DELETE FROM sqlite_master WHERE type IN ('table', 'index', 'trigger');
PRAGMA writable_schema = 0;
VACUUM;
and then it loads the new content using the sqlite module's executescript() method:
cursor.executescript(sql_slurped_from_user_modified_dump)
The problem is that these two operations (deleting existing content, loading new content) are not executed atomically: press CTRL-C at the wrong moment and the database content has been lost.
If I try to execute those two blocks of code inside a transaction then I get the error:
Error: cannot VACUUM from within a transaction
And if I keep the transaction but remove the VACUUM then I get the error:
Error: table first_table already exists
I have an ugly workaround in place: prior to calling the editor, the script copies the dump file to a safe location, writes a warning message to the user:
WARNING: if anything goes wrong then a backup of the database
can be found in /some/path
and, if the script continues and completes loading the new content, then it deletes the copy of the dump. But this is pretty ugly!
I could use DROP TABLE instead of the DELETE FROM sqlite_master ..., but if I am trying to allow the database to be modified in this way then I am allowing that the list of tables itself may change. I.e. if the user adds this to the dump:
CREATE TABLE t3 (n INT);
then a hard-coded list of DROPs like this:
BEGIN TRANSACTION
DROP TABLE t1;
DROP TABLE t2;
DROP INDEX ...
...
cursor.executescript(sql_slurped_from_user_modified_dump)
...
END TRANSACTION;
isn't going to work second time round (because it doesn't delete table t3).
I could use filesystem-atomic operations (i.e. something like: load the modified dump into a new database file; hardlink new file to old file), but that would require the script to close its database connection and reopen it afterwards, which, for reasons beyond the scope of this question, I would prefer not to do.
Does anybody have any better ideas for atomically replacing the entire content of a database whose list of tables is not predictable?
In case Google leads you here ...
I managed to do the first half of the task (delete existing content inside a single transaction) with something like this pseudocode:
-- Make the order in which tables are dropped irrelevant. Unfortunately, this
-- cannot be done just around the table dropping because it doesn't work inside
-- transactions.
PRAGMA foreign_keys = 0;

BEGIN TRANSACTION;

indexes = (SELECT name
           FROM sqlite_master
           WHERE type = 'index' AND
                 name NOT LIKE 'sqlite_autoindex_%';)

triggers = (SELECT name
            FROM sqlite_master
            WHERE type = 'trigger';)

tables = (SELECT name
          FROM sqlite_master
          WHERE type = 'table';)

for thing in indexes + triggers + tables:
    DROP thing;
At which point I thought the second half (loading new content in the same transaction) would just be this:
cursor.executescript(sql_slurped_from_user_modified_dump)
END TRANSACTION;
-- Reinstate foreign key constraints.
PRAGMA foreign_keys = 1;
Unfortunately, pressing CTRL-C in the middle of the two blocks resulted in an empty database. The cause? cursor.executescript() does an immediate COMMIT before running the provided SQL. That turns the above code into two transactions!
This isn't the first time I've been caught out by this module's hidden transaction management, but this time I was motivated to try the apsw module instead. This switch was remarkably easy. The latter half of the code now looks like this:
cursor.execute(sql_slurped_from_user_modified_dump)
END TRANSACTION;
-- Reinstate foreign key constraints.
PRAGMA foreign_keys = 1;
and it works perfectly!

NodeJS + PostgreSQL integration testing

I would like to include Postgres interaction into my integration tests, i.e. not mock the database part, and I need help on figuring out the best way to do the test cleanup.
My setup is NodeJS, Postgres, Sequelize, Karma+Mocha. Currently, before running the tests a new database is created and migrated; after each test I run a raw query that truncates all the tables, and after all test cases are finished the test database is dropped. As you probably guessed, the execution time for running tests like this is pretty slow.
I was wondering if there is a way to speed the process up. Is there an in-memory psql database that I could use for my test cases (I've searched for one for a while but couldn't find one), or something like that?
To be more precise, I'm looking for a way to clear the database after a test wrote something to it, in a way that does not require truncating all the tables after every test case.
Incorporated https://stackoverflow.com/a/12082038/2018521 into my cleanup:
afterEach(async () => {
    await db.sequelize.query(`
        DO
        $func$
        BEGIN
            EXECUTE
            (SELECT 'TRUNCATE TABLE ' || string_agg(oid::regclass::text, ', ') || ' RESTART IDENTITY CASCADE'
             FROM pg_class
             WHERE relkind = 'r' -- only tables
             AND relnamespace = 'public'::regnamespace
            );
        END
        $func$;
    `);
});
Truncate now runs almost instantly.
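For context, here is a minimal sketch of how that truncate hook can sit alongside the rest of the test lifecycle described above; the ../models path and the use of sequelize.sync() instead of migrations are assumptions, not part of the original setup:
const db = require('../models'); // assumed to export the initialized Sequelize instance

before(async () => {
  // Create the schema in the dedicated test database once
  // (or run your migrations here instead).
  await db.sequelize.sync({ force: true });
});

// ... the afterEach truncate hook from above goes here ...

after(async () => {
  // Release the connection pool so the test runner can exit cleanly.
  await db.sequelize.close();
});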

Resource Conflict after syncing with PouchDB

I am new to CouchDB / PouchDB and until now I somehow could manage the start of it all. I am using the couchdb-python library to send initial values to my CouchDB before I start the development of the actual application. Here I have one database with templates of the data I want to include and the actual database of all the data I will use in the application.
couch = couchdb.Server()
templates = couch['templates']
couch.delete('data')
data = couch.create('data')
In Python I have a loop in which I send one value after another to CouchDB:
value = templates['Template01']
value.update({ '_id' : 'Some ID' })
value.update({'Other Attribute': 'Some Value'})
...
data.save(value)
It was working fine the whole time; I needed to run this several times as my data had to be adjusted. After I was satisfied with the results I started to create my application in JavaScript. Now I synced PouchDB with the data database and it was also working. However, I found out that I needed to change something in the Python code, so I ran the first Python script again, but now I get this error:
couchdb.http.ResourceConflict: (u'conflict', u'Document update conflict.')
I tried to destroy() the pouchDB database data and delete the CouchDB database as well. But I still get this error at this part of the code:
data.save(value)
What I also don't understand is that a few values are actually passed to the database before this error comes up, so some values do get save()d into the db.
I read that it has something to do with the _rev values of the documents, but I haven't been able to find an answer. Hope someone can help here.

Dropped and Recreated ArangoDB Databases Retain Collections

A deployment script creates and configures databases, collections, etc. The script includes code to drop the databases before beginning, so that testing can proceed from a clean state. After dropping the database and re-adding it:
var graphmodule = require("org/arangodb/general-graph");
var graphList = graphmodule._list();
var dbList = db._listDatabases();

for (var j = 0; j < dbList.length; j++) {
    if (dbList[j] == 'myapp')
        db._dropDatabase('myapp');
}

db._createDatabase('myapp');
db._useDatabase('myapp');
db._create('appcoll'); // Collection already exists error occurs here
The collections that had previously been added to mydb remain in mydb, but they are empty. This isn't exactly a problem for my particular use case since the collections are empty and I had planned to rebuild them anyway, but I'd prefer to have a clean slate for testing and this behavior seems odd.
I've tried closing the shell and restarting the database between the drop and the add, but that didn't resolve the issue.
Is there a way to cleanly remove and re-add a database?
The collections should be dropped when db._dropDatabase() is called.
However, if you run db._dropDatabase('mydb'); directly followed by db._createDatabase('mydb'); and then retrieve the list of collections via db._collections(), it will show the collections of the current database, which is most likely the _system database (you could only run those commands because _system was the current database).
That means you are probably looking at the collections in the _system database all the time, unless you switch databases via db._useDatabase(name);. Does this explain it?
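Putting that together, a sketch (my illustration, not from the original answer) of a drop-and-recreate sequence in arangosh that keeps the current database explicit:
// Administrative commands must run while _system is the current database.
db._useDatabase('_system');

if (db._listDatabases().indexOf('myapp') !== -1) {
    db._dropDatabase('myapp');   // drops the database and all of its collections
}
db._createDatabase('myapp');

// Switch into the new database before creating collections,
// otherwise they end up in _system.
db._useDatabase('myapp');
db._create('appcoll');

print(db._collections());        // now lists collections of 'myapp', not _system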
ArangoDB stores additional information for managed graphs;
therefore, when working with named graphs, you should use the graph management functions to delete graphs, to make sure nothing remains in the system:
var graph_module = require("org/arangodb/general-graph");
graph_module._drop("social", true);
The current implementation of the graph viewer in the management interface stores your view preferences (like the attribute that should become the label of a graph) in your browser's local storage, so that's out of the reach of these functions.
