Synchronization problems between WRITER and READER RDS servers - amazon-rds

I have a basic structure on AWS that has an Aurora MySQL cluster and two servers, a WRITER (db.t2.medium) and a READER (db.r5.large).
I have an application in NodeJS that runs the following routine:
1. Insert a row in the database using the WRITER server.
2. Look up this row's ID in the table using the READER server.
3. Insert relational information into other tables, using the generated ID, via the WRITER server.
In terms of code, it's structured like this (considering that db.writer is an instance of knex that executes queries on the WRITER server and db.reader is the instance that executes on the READER server):
let theNewId = await db.writer.raw("INSERT INTO users (`name`,`email`) VALUES ('John Doe','mail#contoso.org')")
  .catch(err => { console.log(err); })
  .then(async R => {
    return await db.reader.raw("SELECT * FROM users WHERE `email`='mail#contoso.org'")
      .catch(err => { console.log(err); })
      .then(async result => {
        let theId = result[0].id;
        await db.writer.raw("INSERT INTO users_emails (`user_id`,`email`) VALUES ('" + theId + "','mail#contoso.org')");
        return theId;
      });
  });
Note: I'm not actually executing raw queries; I'm just using them as an example.
The problem I am having in all similar parts of the code is this: the reader, even though it runs inside the callback of the insert, doesn't find the row that, when I check later, was correctly inserted by the WRITER.
Is there any configuration or programming best practice that would prevent this kind of asynchrony in this type of situation?

Just to finish this question: after a lot of research I discovered that there is an asynchrony (replication lag) between the reader and writer instances that forces me, in these specific cases, to use the same writer server to perform the searches.
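As a sketch of that workaround (assuming the knex instances from the question and MySQL's behaviour of returning the generated id from insert; table and column names are taken from the example above):

// Hedged sketch: read the generated ID straight from the insert result and keep
// any read-after-write on the writer connection, so reader replication lag never matters.
async function createUser(db, name, email) {
  // On MySQL, knex's insert() resolves to an array containing the generated id
  const [userId] = await db.writer('users').insert({ name, email });

  // Any follow-up read that must see this row also goes to the writer
  const user = await db.writer('users').where({ id: userId }).first();

  await db.writer('users_emails').insert({ user_id: userId, email });
  return user;
}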

Related

Datastax Node.js Cassandra driver When to use a Mapper vs. Query

I'm working with the Datastax Node.js driver and I can't figure out when to use a mapper vs. query. Both seem to be able to perform the same CRUD operations.
With a query:
const q = "SELECT * FROM mykeyspace.mytable WHERE id='12345'";
client.execute(q).then(result => console.log('This is the data', result));
With mapper:
const tableRow = await tableMapper.find({ id: '12345' });
When should I use the mapper over a query and vice versa?
The Mapper is a feature of cassandra-driver released in 2018. Using the Mapper, cassandra-driver can map your Cassandra table to an object in Node.js, so you can handle it in your application like a set of documents.
Using the Mapper you can make selects or inserts in your database as described in this article:
https://www.datastax.com/blog/2018/12/introducing-datastax-nodejs-mapper-apache-cassandra
With the query method, if you need to use or reuse any property from the returned JSON, you will need to call JSON.parse() yourself.
The short answer is: whatever you find more comfortable.
The Mapper lets you deal with database data as documents (JavaScript objects), builds the CQL query for you, executes the query and maps the results.
On the other hand, the core driver only supports executing CQL queries that you have to write yourself.
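For comparison, here is a minimal sketch of the two styles side by side. It is a sketch only: the contact point, data-center name and model name are assumptions, while the keyspace and table names come from the question.

const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'mykeyspace'
});

// Query style: you write the CQL and handle the rows yourself
async function findWithQuery(id) {
  const result = await client.execute(
    'SELECT * FROM mytable WHERE id = ?', [id], { prepare: true }
  );
  return result.first();
}

// Mapper style: the driver builds and executes the CQL for you
const mapper = new cassandra.mapping.Mapper(client, {
  models: { 'MyRow': { tables: ['mytable'] } }
});
const tableMapper = mapper.forModel('MyRow');

async function findWithMapper(id) {
  const result = await tableMapper.find({ id });
  return result.first();
}

Both functions return the same row; the difference is only in who writes the CQL and who maps the result.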

Do I have to write the reverse operation for migration rollback?

I use knex.js and it's a good query builder for PostgreSQL. I haven't found any docs that explain how to do a migration rollback the right way.
For now, I just write the reverse operations in the down function to roll back the migration. Is this the correct way?
import * as Knex from 'knex';

exports.up = async (knex: Knex): Promise<any> => {
  await knex.schema.raw(`
    ALTER TABLE IF EXISTS "GOOGLE_CHANNEL"
      ADD COLUMN IF NOT EXISTS google_channel_ad_group_cpc_bid INTEGER NOT NULL DEFAULT 0;
  `);
  await knex.schema.raw(`
    UPDATE "GOOGLE_CHANNEL" as gc
    SET google_channel_ad_group_cpc_bid = 7
    FROM "CAMPAIGN_TEMPLATE" as ct
    WHERE ct.campaign_channel_id = gc.campaign_channel_id;
  `);
};

exports.down = async (knex: Knex): Promise<any> => {
  // TODO: migration rollback
  await knex.schema.raw(``);
};
I have two concerns:
1. If there are a lot of SQL statements in the up function, I have to write a lot of SQL statements in the down function too in order to roll back the migration.
2. Why doesn't knex.js do the migration rollback without us writing the reverse operations? I mean, couldn't knex.js take a snapshot or record a savepoint of the database?
Yes, to roll back you use the down function of a migration script. When you run knex migrate:rollback, the down function will run. Knex has meta tables in the database that are used to figure out which migrations have run and which have not.
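For reference, the commands involved look like this (knex_migrations is knex's default name for the meta table):

$ knex migrate:latest    # applies every pending up() and records the batch in knex_migrations
$ knex migrate:rollback  # runs down() for the migrations in the most recent batch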
For example:
exports.up = function (knex, Promise) {
  return knex.schema
    .createTable('role', function (table) {
      table.increments('role_id').primary();
      table.string('title').notNullable().unique();
      table.string('description');
      table.integer('level').notNullable();
    })
    .createTable('user_account', function (table) {
      table.increments('user_id').primary();
      table.integer('role_id').references('role_id').inTable('role').notNullable();
      table.string('username').notNullable().unique();
      table.string('passwordHashed').notNullable();
      table.string('email', 50).notNullable().unique();
    });
};

exports.down = function (knex, Promise) {
  return knex.schema
    .dropTable('user_account')
    .dropTable('role');
};
Here I create two tables in the up function. The user_account table has a foreign key constraint linking it to the role table, which means I have to drop user_account before role in the down function.
In your case, you use an update statement. In the down function you would have to either run a new update with a hard-coded value (the old one from before the migration), or make sure you store the old values in a history table.
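As a minimal sketch, a possible down function for the migration in the question could simply drop the added column (the UPDATE itself cannot be undone unless the previous values were saved first):

exports.down = async (knex: Knex): Promise<any> => {
  // Dropping the column also discards the values set by the up migration's UPDATE.
  await knex.schema.raw(`
    ALTER TABLE IF EXISTS "GOOGLE_CHANNEL"
      DROP COLUMN IF EXISTS google_channel_ad_group_cpc_bid;
  `);
};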
As for your concerns:
Yes, if you add a lot of stuff, you also have to add a lot of code to reverse whatever you are doing. However, you can skip writing the down scripts, but then you won't be able to roll back. Some (many?) choose to only go forward and never roll back. If they have to fix something, they don't roll back but write a new migration script with the fix.
I would recommend creating the down functions from the beginning. You can consider skipping them when the time is right. People who don't write down functions usually have to test their migrations more thoroughly in a test or staging environment before deploying to production, to make sure they work, because they can't roll back after all.
I can't really answer for the Knex creators here. However, what you are describing as a potential solution is basically a backup of the database before a migration is run. After all, a migration does more than just change the layout of the tables: a migration script will typically add or remove rows as well. You can use the backup approach, but you have to take the backups yourself.
Knex is a fairly simple query builder. If you want the migration scripts to be written for you, you might want to go for a full-blown ORM.

Sync elasticsearch on connection with database - nodeJS

Aim: sync elasticsearch with postgres database
Why: sometimes the network or cluster/server breaks, so updates made in the meantime should be recorded
This article https://qafoo.com/blog/086_how_to_synchronize_a_database_with_elastic_search.html suggests that I should create a separate updates table that tracks elasticsearch's id, allowing me to select the new data (from the database) since the last record (in elasticsearch). So I thought: what if I could record elasticsearch's failed and successful connections: if the client pongs back successfully (the promise resolves), I could launch a function to sync records with my database.
Here's my elasticConnect.js
import elasticsearch from 'elasticsearch'
import syncProcess from './sync'

const client = new elasticsearch.Client({
  host: 'localhost:9200',
  log: 'trace'
});

client.ping({
  requestTimeout: Infinity,
  hello: "elasticsearch!"
})
  .then(() => syncProcess) // successful connection
  .catch(err => console.error(err))

export default client
This way, I don't even need to worry about running a cron job (if question 1 is correct), since I know the cluster is running.
Questions
1. Will syncProcess run before export default client? I don't want any requests coming in while syncing...
2. syncProcess should run only once (since it's cached/not exported), no matter how many times I import elasticConnect.js. Correct?
3. Are there any advantages to using the method with an updates table, instead of just selecting data from the parent/source table?
4. The article's comments say "don't use timestamps to compare new data!" Ehhh... why? It should be ok since the database is blocking, right?
For 1: As it is, you have no guarantee that syncProcess will have run by the time the client is exported. Instead you should do something like in this answer and export a promise instead.
For 2: With the solution I linked to in the above question, this would be taken care of.
For 3: An updates table would also catch record deletions, while simply selecting from the DB would not, since you don't know which records have disappeared.
For 4: The second comment after the article you linked to provides the answer (hint: timestamps are not strictly monotonic).
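A minimal sketch of that promise-export approach, reusing the names from the question (syncProcess is assumed to be a function that returns a promise once the sync finishes):

import elasticsearch from 'elasticsearch'
import syncProcess from './sync'

const client = new elasticsearch.Client({
  host: 'localhost:9200',
  log: 'trace'
});

// Export a promise that resolves to the client only after the sync has run,
// so consumers that await it never hit a client that is still syncing.
const clientReady = client
  .ping({ requestTimeout: 30000 })
  .then(() => syncProcess())
  .then(() => client);

export default clientReady

Consumers then do const client = await clientReady (or use .then) before issuing requests; since ES modules are cached, the ping and the sync still run only once per process, which also covers question 2.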

Delete multiple couchbase entities having common key pattern

I have a use case where I have to remove a subset of entities stored in couchbase, e.g. removing all entities with keys starting with "pii_".
I am using the NodeJS SDK, but there is only one remove method, which takes one key at a time: http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.0/Bucket.html#remove
In some cases thousands of entities need to be deleted, and it takes a very long time if I delete them one by one, especially because I don't keep a list of the keys in my application.
I agree with @ThinkFloyd when he says: a delete on the server should be a delete on the server, rather than requiring three steps like getting the data from the server, iterating over it on the client side, and finally firing a delete on the server for each record.
In this regard, I think old-fashioned RDBMSs were better: all you need to do is 'DELETE * from database where something=something'.
Fortunately, something similar to SQL is available in Couchbase, called N1QL (pronounced "nickel"). I am not aware of the JavaScript (and other language) syntax, but this is how I did it in Python.
Query to be used: DELETE from <bucketname> b where META(b).id LIKE "%"
layer_name_prefix = cb_layer_key + "|" + "%"
query = ""
try:
    query = N1QLQuery('DELETE from `test-feature` b where META(b).id LIKE $1', layer_name_prefix)
    cb.n1ql_query(query).execute()
except CouchbaseError, e:
    logger.exception(e)
To achieve the same thing, an alternate query could be as below if you are storing 'type' and/or other metadata like 'parent_id':
DELETE from <bucket_name> where type='Feature' and parent_id=8;
But I prefer the first version of the query as it operates on the key, and I believe Couchbase must have internal indexes that make operating/querying on the key (and other metadata) faster.
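For the Node.js SDK (2.x), the same N1QL approach might look roughly like this; treat it as a sketch, since the bucket name is an assumption and the pii_ prefix comes from the question. Note that the bucket needs a primary (or suitable secondary) index for a N1QL query like this to run.

var couchbase = require('couchbase');
var N1qlQuery = couchbase.N1qlQuery;

var cluster = new couchbase.Cluster('couchbase://localhost');
var bucket = cluster.openBucket('mybucket');

// Delete every document whose key starts with "pii_"
var query = N1qlQuery.fromString('DELETE FROM `mybucket` b WHERE META(b).id LIKE $1');

bucket.query(query, ['pii_%'], function (err, rows) {
  if (err) {
    console.error(err);
    return;
  }
  console.log('Deleted all documents with keys starting with "pii_"');
});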
The best way to accomplish this is to create a Couchbase view by key and then range query over that view via your NodeJS code, making deletes on the results.
http://docs.couchbase.com/admin/admin/Views/views-querySample.html
http://docs.couchbase.com/couchbase-manual-2.0/#couchbase-views-writing-querying-selection-partial
http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.8/ViewQuery.html
For example, your Couchbase view could look like the following:
function (doc, meta) {
  emit(meta.id, null);
}
Then in your NodeJS code, you could have something that looks like this:
var couchbase = require('couchbase');
var ViewQuery = couchbase.ViewQuery;

var query = ViewQuery.from('designdoc', 'by_id');
query.range("pii_", "pii_" + "\u0000", false);

var myBucket = myCluster.openBucket();
myBucket.query(query, function (err, results) {
  for (var i in results) {
    // Delete code in here
  }
});
Of course your Couchbase design document and view will be named differently than the example that I gave, but the important part is the ViewQuery.range function that was used.
All document ids prefixed with pii_ would be returned, in which case you can loop over them and start deleting.
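A rough sketch of what that delete loop could look like, calling remove on each row id returned by the view (names follow the example above):

myBucket.query(query, function (err, results) {
  if (err) {
    console.error(err);
    return;
  }
  results.forEach(function (row) {
    // row.id is the document key emitted by the view
    myBucket.remove(row.id, function (removeErr) {
      if (removeErr) {
        console.error('Failed to remove ' + row.id, removeErr);
      }
    });
  });
});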

How to implement a fast, queryable and persistant database in phantomjs?

I have been using phantomjs to do some heavy lifting for me in a server-side DOM environment. Until now I have been keeping my data structures in memory (i.e. doing nothing special with them) and everything was fine.
But recently, under some use cases, I started running into the following problems:
memory usage becoming too high, making swap kick in and seriously affecting my performance;
not being able to resume from the last save point, since in-memory data structures are not persistent (obviously).
This forced me to look for a database solution to be used in phantom, but again I am running into issues while deciding on a solution:
I don't want my performance to be affected too much.
It has to be persistent and queryable.
How do I even connect to a database from inside a phantom script?
Can anyone guide me to a satisfactory solution?
Note: I have almost decided on SQLite, but connecting to it from phantom is still an issue. Node.js provides the sqlite3 node module; I am trying to browserify it for phantom.
Note Note: Browserify didn't work! Back to ground zero!! :-(
Thanks in advance!
Phantomjs' filesystem API allows you to read and write binary files with:
buf = fs.read(FILENAME, 'b') and
fs.write(FILENAME, buf, 'b')
sql.js (https://github.com/kripken/sql.js/) gives you a javascript SQLite
implementation you can run in phantomjs.
Combine the 2 and you have a fast, persistent, queryable SQL database.
Example walkthrough
Get javascript SQLite implementation (saving to /tmp/sql.js)
$ wget https://raw.githubusercontent.com/kripken/sql.js/master/js/sql.js -O /tmp/sql.js
Create a test SQLite database using the command-line sqlite3 app (showing it is persistent and external to your phantomjs application).
sqlite3 /tmp/eg.db
sqlite> CREATE TABLE IF NOT EXISTS test(id INTEGER PRIMARY KEY AUTOINCREMENT, created INTEGER NOT NULL DEFAULT CURRENT_TIMESTAMP);
sqlite> .quit
Save this test phantomjs script to add entries to the test database and verify behaviour.
$ cat /tmp/eg.js
var fs = require('fs'),
    sqlite3 = require('./sql.js'),
    dbfile = '/tmp/eg.db',
    sql = 'INSERT INTO test(id) VALUES (NULL)',
    // fs.read returns binary 'string' (not 'String' or 'Uint8Array')
    read = fs.read(dbfile, 'b'),
    // Database argument must be a 'string' (binary) not 'Uint8Array'
    db = new sqlite3.Database(read),
    write,
    uint8array;

try {
    db.run(sql);
} catch (e) {
    console.error('ERROR: ' + e);
    phantom.exit();
}

// db.export() returns 'Uint8Array' but we must pass binary 'string' to write
uint8array = db.export();
write = String.fromCharCode.apply(null, Array.prototype.slice.apply(uint8array));
fs.write(dbfile, write, 'b');

db.close();
phantom.exit();
Run the phantomjs script to test
$ /usr/local/phantomjs-2.0.0-macosx/bin/phantomjs /tmp/eg.js
Use external tool to verify changes were persisted.
sqlite3 /tmp/eg.db
sqlite> SELECT * FROM test;
id created
1 2015-03-28 10:21:09
sqlite>
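If you also want to read data back from inside phantomjs, a small sketch using sql.js's db.exec (reusing the db object from the script above) could look like this:

// Query the in-memory copy of the database before writing it back to disk.
var res = db.exec('SELECT id, created FROM test ORDER BY id');
if (res.length > 0) {
    console.log(res[0].columns.join('\t'));   // column headers
    res[0].values.forEach(function (row) {
        console.log(row.join('\t'));          // one line per row
    });
}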
Some things to keep in mind:
The database is modified on disk only when you call fs.write.
Any changes you make are invisible to external programs accessing the same SQLite database file until you call fs.write.
The entire database is read into memory with fs.read.
You may want to have different OS files for different tables -- or versions of tables -- depending on your application and the amount of data in the tables, to address the memory requirements you mentioned.
Passing what is returned by db.export() directly to fs.write will corrupt the SQLite database file on disk (it will no longer be a valid SQLite database file): a Uint8Array is NOT the correct type for the fs.write parameter, so it must be converted to a binary string first.
Writing binary data in phantomjs works like this:
var db_file = fs.open(db_name, {mode: 'wb', charset: ''});
db_file.write(String.fromCharCode.apply(null, db.export()));
db_file.close();
You have to set the charset to '' because otherwise the writing goes wrong.
