Make sure firestore collection docChanges keeps alive - node.js

The final solution is at the bottom of this post.
I have a nodeJS server application that listens to a rather big collection:
//here was old code
This works perfectly fine: there are lots of documents, and the server can serve them from cache instead of the database, which saves me tons of document reads (and is a lot faster).
I want to make sure this collection listener stays alive forever, which means reconnecting if changes stop coming through.
Is there any way to create this certainty? This server might be online for years.
Final solution:
A Cloud Functions database trigger that saves the timestamp on every change:
export const lastRolesChange = functions.firestore
  .document(`${COLLECTIONS.ROLES}/{id}`)
  .onWrite(async (_change, context) => {
    return firebase()
      .admin.firestore()
      .collection('syncstatus')
      .doc(COLLECTIONS.ROLES)
      .set({
        lastModified: context.timestamp,
        docId: context.params.id
      });
  });
Logic that checks whether the server has the same last-updated timestamp as the database. If the listener is still alive, it should match; otherwise the listener has probably stalled and needs to be refreshed.
import { firebase } from '../google/auth';
import { COLLECTIONS } from '../../../configs/collections.enum';
class DataObjectTemplate {
  constructor() {
    for (const key in COLLECTIONS) {
      if (key) {
        this[COLLECTIONS[key]] = [] as { id: string; data: any }[];
      }
    }
  }
}
const dataObject = new DataObjectTemplate();
const timestamps: {
  [key in COLLECTIONS]?: Date;
} = {};
let unsubscribe: Function;
export const getCachedData = async (type: COLLECTIONS) => {
  return firebase()
    .admin.firestore()
    .collection(COLLECTIONS.SYNCSTATUS)
    .doc(type)
    .get()
    .then(async snap => {
      const lastUpdate = snap.data();
      /* we compare the last update of the roles collection with the last update we
       * got from the listener. If the listener would have failed to sync, we
       * will find out here and reset the listener.
       */
      // first check if we already have a timestamp, otherwise, we set it in the past.
      let timestamp = timestamps[type];
      if (!timestamp) {
        timestamp = new Date(2020, 0, 1);
      }
      // if we don't have a last update for some reason, there is something wrong
      if (!lastUpdate) {
        throw new Error('Missing sync data for ' + type);
      }
      const lastModified = new Date(lastUpdate.lastModified);
      if (lastModified.getTime() > timestamp.getTime()) {
        console.warn('Out of sync: refresh!');
        console.warn('Resetting listener');
        if (unsubscribe) {
          unsubscribe();
        }
        await startCache(type);
        return dataObject[type] as { id: string; data: any }[];
      }
      return dataObject[type] as { id: string; data: any }[];
    });
};
export const startCache = async (type: COLLECTIONS) => {
  // tslint:disable-next-line:no-console
  console.warn('Building ' + type + ' cache.');
  const timeStamps: number[] = [];
  // start with clean array
  dataObject[type] = [];
  return new Promise(resolve => {
    unsubscribe = firebase()
      .admin.firestore()
      .collection(type)
      .onSnapshot(querySnapshot => {
        querySnapshot.docChanges().map(change => {
          timeStamps.push(change.doc.updateTime.toMillis());
          if (change.oldIndex !== -1) {
            dataObject[type].splice(change.oldIndex, 1);
          }
          if (change.newIndex !== -1) {
            dataObject[type].splice(change.newIndex, 0, {
              id: change.doc.id,
              data: change.doc.data()
            });
          }
        });
        // tslint:disable-next-line:no-console
        console.log(dataObject[type].length + ' ' + type + ' in cache.');
        timestamps[type] = new Date(Math.max(...timeStamps));
        resolve(true);
      });
  });
};

If you want to be sure you have all changes, you'll have to (see the sketch after this list):
keep a lastModified type field in each document,
use a query to get documents that were modified since you last looked,
store the last time you queried on your server.
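A minimal sketch of that incremental-sync pattern, assuming the Admin SDK default app is already initialized, each document carries a lastModified timestamp field, and the in-memory checkpoint below stands in for whatever you persist on the server:

const admin = require('firebase-admin');

// Hypothetical in-memory checkpoint; persist this on your server in practice.
let lastSyncTime = new Date(2020, 0, 1);

async function fetchChangesSinceLastSync(collectionName) {
  const snapshot = await admin.firestore()
    .collection(collectionName)
    .where('lastModified', '>', lastSyncTime)
    .get();

  // Advance the checkpoint to the newest modification we have seen.
  snapshot.docs.forEach(doc => {
    const modified = doc.get('lastModified').toDate();
    if (modified > lastSyncTime) lastSyncTime = modified;
  });

  return snapshot.docs.map(doc => ({ id: doc.id, data: doc.data() }));
}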
Unrelated to that, you might also be interested in the recently launched ability to serve bundled Firestore content as it's another way to reduce the number of charged reads you have to do against the Firestore server.
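On that note, here is a rough sketch of building such a bundle server-side with the Node Admin SDK (this needs a reasonably recent firebase-admin / @google-cloud/firestore version; the collection, bundle id, and query name are placeholders for illustration):

const admin = require('firebase-admin');

async function buildRolesBundle() {
  const firestore = admin.firestore();
  const querySnapshot = await firestore.collection('roles').get();
  // The resulting buffer can be cached and served (e.g. from a CDN), so clients
  // can hydrate their local cache without charged reads against Firestore.
  return firestore
    .bundle('latest-roles')
    .add('latest-roles-query', querySnapshot)
    .build();
}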

Related

How to avoid overriding set data in Cloud Function

I have a function for adding a player, like so:
Inside my external js (player.js):
module.exports.player = function(appPlayer, db){
  appPlayer.post('/', async(req, res) => {
    const data = req.body
    // data
    const ipAdress = data.ipAdress
    const carslidekey = data.id
    const nickname = data.nickname
    // random color for
    const randomCol = Math.floor(Math.random()*16777215).toString(16)
    let player = {
      ipAdress: ipAdress,
      nickname: nickname,
      color: `#${randomCol}`
    }
    await db.collection('kartesian').doc(carslidekey)
      .collection('player')
      .doc(ipAdress)
      .set(player).then(() => {
        let dataResponse = {
          status: true,
          idDoc: ipAdress,
          ipAdress: ipAdress,
          color: `#${randomCol}`
        }
        const jsonStr_1 = JSON.stringify(dataResponse)
        res.send(jsonStr_1)
      }).catch((err) => {
        let dataResponse = {
          status: false,
          idDoc: ipAdress,
          ipAdress: ipAdress,
          msg: `Terjadi kesalahan ${err.data}`
        }
        const jsonStr_1 = JSON.stringify(dataResponse)
        res.send(jsonStr_1)
      })
  })
}
And in my index.js (cloud functions) I wrote the HTTPS request like below:
......
const playerHandler = require('./src/player')
playerHandler.player(app, db)
exports.player = functions.https.onRequest(app)
......
My problem is that sometimes this function is called by different devices, possibly concurrently. Even though I've created a set method with a different id, I sometimes find the previous document replaced by the one that came after it. How do I ensure there is no overlap? Thanks
Option 1:
I don't think there is a way besides reading the document as a pre-check before writing. I would recommend implementing a queue: the cloud function would push to the queue, and a separate cloud function would read the queue and execute on it synchronously.
Also, you're mixing await and promises (e.g. .then); you do not need to do that, see the code below.
For example:
Note: I didn't run this code, but it should be close to working
Note 2: @Nanda Z made a good point; it's probably not a good idea to run background functions without implementing them as Cloud Tasks and cloud events
let playerQueue = []
var isProcessingQueue = false

async function handlePlayerQueue(appPlayer, db) {
  if (isProcessingQueue) return
  // check the queue before taking the "lock" so it isn't left set on an empty queue
  if (playerQueue.length == 0) return
  isProcessingQueue = true
  // remove the first item from the queue
  const player = playerQueue.shift()
  let ref = db.collection('kartesian').doc(player.carslidekey)
    .collection('player')
    .doc(player.ipAdress)
  // Check if player exists
  var playerExists = false
  try {
    playerExists = (await ref.get()).exists
  } catch (err) {
    console.error("Could not get player record", err)
  }
  // exit early
  if (playerExists) {
    isProcessingQueue = false
    // recursively handle queue
    handlePlayerQueue(appPlayer, db)
    return
  }
  // wrap await in try...catch to catch exceptions.
  try {
    let setResponse = await ref.set(player)
    let dataResponse = {
      status: true,
      idDoc: player.ipAdress,
      ...player
    }
    const jsonStr_1 = JSON.stringify(dataResponse)
    console.log(`Player data pushed: ${jsonStr_1}`)
  } catch (err) {
    let dataResponse = {
      status: false,
      idDoc: player.ipAdress,
      ...player,
      msg: `Terjadi kesalahan ${err.data}`
    }
    const jsonStr_1 = JSON.stringify(dataResponse)
    console.error(`Failed to set player: ${jsonStr_1}`)
  }
  isProcessingQueue = false
  // recursively handle queue
  handlePlayerQueue(appPlayer, db)
}
module.exports.player = function(appPlayer, db){
  appPlayer.post('/', async(req, res) => {
    const data = req.body
    // data
    const ipAdress = data.ipAdress
    const carslidekey = data.id
    const nickname = data.nickname
    // random color for
    const randomCol = Math.floor(Math.random()*16777215).toString(16)
    let player = {
      carslidekey, // need to add this so that the queueHandler can use it
      ipAdress: ipAdress,
      nickname: nickname,
      color: `#${randomCol}`
    }
    // if javascript wasn't singlethreaded, this is where there'd be a lock =)
    playerQueue.push(player)
    // call the queue handler
    handlePlayerQueue(appPlayer, db)
    // Tell the client we received the request; they won't know the status, but you can set up another endpoint to check if the record exists
    res.send({message: "Request received and queue'd!"})
  })
}
Option 2:
You could instead add a Firestore rule to not allow writing to a document if it's already populated:
match /kartesian/{carslidekey}/player/{player} {
  // you can use the `resource.data` variable to access the player document:
  // resource will be null if the player doesn't exist. resource.data is overkill but I put it there just fyi
  allow write: if request.auth != null && (resource == null || resource.data == null);
}
I tested this rule to make sure it behaves as expected. It should let you write only if you're authenticated AND the resource is empty. So writes that come in later will fail.
It's important to note that rules only apply for online writing and will not be applied to offline/cached data.

How should a class close a connection pool?

I'm building a very rudimentary ORM while I learn Node. The constructor currently accepts either a name or an id parameter, but not both. If the name is provided, the class creates the record in the database. If the id is provided, it looks up the record. Here's the complete file.
const mysql = require('mysql2/promise');
const { v4: uuid } = require('uuid');
const pool = require('../helpers/pool.js');
const xor = require('../helpers/xor.js');
class List {
  constructor({ name = null, id = null } = {}) {
    if (!xor(name, id)) {
      throw new TypeError('Lists must have either a name or an id');
    }
    this.table_name = 'lists';
    if (name) {
      this.name = name;
      this.uuid = uuid();
      this.#insert_record();
      return;
    }
    this.id = id;
    this.#retrieve_record();
  }

  async #insert_record() {
    await pool.query(
      `INSERT INTO ${this.table_name} SET ?`,
      {
        name: this.name,
        uuid: this.uuid
      }
    ).then(async (results) => {
      this.id = results[0].insertId;
      return this.#retrieve_record();
    });
  }

  async #retrieve_record() {
    return await pool.execute(
      `SELECT * FROM ${this.table_name} WHERE id = ?`,
      [this.id]
    ).then(([records, fields]) => {
      this.#assign_props(records[0], fields);
      pool.end();
    })
  }

  #assign_props(record, fields) {
    fields.forEach((field) => {
      this[field.name] = record[field.name];
    })
    console.log(this);
  }
}
const list = new List({name: 'my list'});
const db_list = new List({id: 50});
You can probably see the problem if you run this as is. I get intermittent errors. Sometimes everything works fine. Generally I see a console log of the retrieved list first, and then I see a log of the new list. But sometimes the pool gets closed by the retrieval before the insert can happen.
I tried placing the pool within the class, but that just caused other errors.
So, what's the proper way to have an ORM class use a connection pool? Note that I'm building features as I learn, and eventually there'll be a Table class from which all of the entity classes will inherit. But I'm first just trying to get this one class to work properly on its own.
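A common pattern (a sketch, not from the original post) is to treat the pool as an application-wide resource: classes borrow connections for their queries but never call pool.end() themselves; the pool is drained exactly once when the process shuts down. Assuming the same ../helpers/pool.js module:

const pool = require('../helpers/pool.js');

// Close the pool once, at shutdown, rather than inside #retrieve_record().
async function shutdown() {
  await pool.end();
  process.exit(0);
}

process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);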

How do I run an update while streaming using pg-query-stream and pg-promise?

I am trying to load 50000 items from the database with text in them, tag them, and update the tags.
I am using pg-promise and pg-query-stream for this purpose.
I was able to get the streaming part working properly, but updating has become problematic with so many update statements.
Here is my existing code:
const QueryStream = require('pg-query-stream')
const JSONStream = require('JSONStream')

function prepareText(title, content, summary) {
  let description
  if (content && content.length) {
    description = content
  } else if (summary && summary.length) {
    description = summary
  } else {
    description = ''
  }
  return title.toLowerCase() + ' ' + description.toLowerCase()
}

async function tagAll({ db, logger, tagger }) {
  // you can also use pgp.as.format(query, values, options)
  // to format queries properly, via pg-promise;
  const qs = new QueryStream(
    'SELECT feed_item_id,title,summary,content FROM feed_items ORDER BY pubdate DESC, feed_item_id DESC'
  )
  try {
    const result = await db.stream(qs, (s) => {
      // initiate streaming into the console:
      s.pipe(JSONStream.stringify())
      s.on('data', async (item) => {
        try {
          s.pause()
          // eslint-disable-next-line camelcase
          const { feed_item_id, title, summary, content } = item
          // Process text to be tagged
          const text = prepareText(title, summary, content)
          const tags = tagger.tag(text)
          // Update tags per post
          await db.query(
            'UPDATE feed_items SET tags=$1 WHERE feed_item_id=$2',
            // eslint-disable-next-line camelcase
            [tags, feed_item_id]
          )
        } catch (error) {
          logger.error(error)
        } finally {
          s.resume()
        }
      })
    })
    logger.info(
      'Total rows processed:',
      result.processed,
      'Duration in milliseconds:',
      result.duration
    )
  } catch (error) {
    logger.error(error)
  }
}

module.exports = tagAll
The db object is the one from pg-promise, whereas the tagger simply extracts an array of tags from the text, which ends up in the variable tags.
Too many update statements are executing from what I can see in the diagnostics. Is there a way to batch them?
If you can do everything with one sql statement, you should! Here you're paying the price of a back and forth between node and your DB for each line of your table, which will take most of the time of your query.
Your request can be implemented in pure SQL:
update feed_items set tags = case
  when (content = '') is false then lower(title) || ' ' || lower(content)
  when (summary = '') is false then lower(title) || ' ' || lower(summary)
  else title
end;
This request will update your whole table at once. I'm sure it'd be some orders of magnitude faster than your method. On my machine, with a table containing 100000 rows, the update takes about 600 ms.
Some remarks:
you don't need to order to update. As ordering is quite slow, it's better not to.
I guess the limit part was there because it was too slow? If that is the case, you can drop it; 50000 rows is not a big table for postgres.
I bet this pg-stream thing does not really stream stuff out of the DB, it only allows you to use a stream-like API on the results it gathered earlier... No problem with that, but I thought maybe there was a misconception here.
This is the best I could come up with to batch the queries inside the stream so that we don't need to load all the data in memory or run too many queries. If anyone knows a better way to batch, especially with t.sequence, feel free to add another answer.
const BATCH_SIZE = 5000

async function batchInsert({ db, pgp, logger, data }) {
  try {
    // https://vitaly-t.github.io/pg-promise/helpers.ColumnSet.html
    const cs = new pgp.helpers.ColumnSet(
      [
        { name: 'feed_item_id', cast: 'uuid' },
        { name: 'tags', cast: 'varchar(64)[]' },
      ],
      {
        table: 'feed_items',
      }
    )
    const query =
      pgp.helpers.update(data, cs) + ' WHERE v.feed_item_id=t.feed_item_id'
    await db.none(query)
  } catch (error) {
    logger.error(error)
  }
}

async function tagAll({ db, pgp, logger, tagger }) {
  // you can also use pgp.as.format(query, values, options)
  // to format queries properly, via pg-promise;
  const qs = new QueryStream(
    'SELECT feed_item_id,title,summary,content FROM feed_items ORDER BY pubdate DESC, feed_item_id DESC'
  )
  try {
    const queryValues = []
    const result = await db.stream(qs, (s) => {
      // initiate streaming into the console:
      s.pipe(JSONStream.stringify())
      s.on('data', async (item) => {
        try {
          s.pause()
          // eslint-disable-next-line camelcase
          const { feed_item_id, title, summary, content } = item
          // Process text to be tagged
          const text = prepareText(title, summary, content)
          const tags = tagger.tag(text)
          queryValues.push({ feed_item_id, tags })
          if (queryValues.length >= BATCH_SIZE) {
            const data = queryValues.splice(0, queryValues.length)
            await batchInsert({ db, pgp, logger, data })
          }
        } catch (error) {
          logger.error(error)
        } finally {
          s.resume()
        }
      })
    })
    await batchInsert({ db, pgp, logger, data: queryValues })
    return result
  } catch (error) {
    logger.error(error)
  }
}

How to use MongoDB locally and directline-js for state management in Bot Framework using NodeJs and Mongoose?

I am maintaining the bot state in local MongoDB storage. When I try to hand off the conversation to an agent using directline-js, it shows the error BotFrameworkAdapter.sendActivity(): Missing Conversation ID. The conversation ID is being saved in MongoDB.
The issue arises when I change the middle layer from an Array to MongoDB. I have already successfully implemented the same bot-to-human hand-off using directline-js with an Array and the default MemoryStorage.
MemoryStorage in BotFramework
const { BotFrameworkAdapter, MemoryStorage, ConversationState, UserState } = require('botbuilder')
const memoryStorage = new MemoryStorage();
conversationState = new ConversationState(memoryStorage);
userState = new UserState(memoryStorage);
Middle Layer for Hand-Off to Agent
case '#connect':
  const user = await this.provider.connectToAgent(conversationReference);
  if (user) {
    await turnContext.sendActivity(`You are connected to
      ${ user.userReference.user.name }\n ${ JSON.stringify(user.messages) }`);
    await this.adapter.continueConversation(user.userReference, async (userContext) => {
      await userContext.sendActivity('You are now connected to an agent!');
    });
  }
  else {
    await turnContext.sendActivity('There are no users in the Queue right now.');
  }
The this.adapter.continueConversation call throws the error when using MongoDB.
While using an Array it works fine. The MongoDB and Array objects are similar in structure.
Since this works with MemoryStorage and not your MongoDB implementation, I'm guessing that there's something wrong with your MongoDB implementation. This answer will focus on that. If this isn't the case, please provide your MongoDb implementation and/or a link to your repo and I can work off that.
Mongoose is only necessary if you want to use custom models/types/interfaces. For storage that implements BotState, you just need to write a custom Storage adapter.
The basics of this are documented here. Although written for C#, you can still apply the concepts to Node.
1. Install mongodb
npm i -S mongodb
2. Create a MongoDbStorage class file
MongoDbStorage.js
var MongoClient = require('mongodb').MongoClient;

module.exports = class MongoDbStorage {
  constructor(connectionUrl, db, collection) {
    this.url = connectionUrl;
    this.db = db;
    this.collection = collection;
    this.mongoOptions = {
      useNewUrlParser: true,
      useUnifiedTopology: true
    };
  }

  async read(keys) {
    const client = await this.getClient();
    try {
      var col = await this.getCollection(client);
      const data = {};
      await Promise.all(keys.map(async (key) => {
        const doc = await col.findOne({ _id: key });
        data[key] = doc ? doc.document : null;
      }));
      return data;
    } finally {
      client.close();
    }
  }

  async write(changes) {
    const client = await this.getClient();
    try {
      var col = await this.getCollection(client);
      // Return the driver promises so Promise.all actually waits for the writes
      // to finish before the client is closed.
      await Promise.all(Object.keys(changes).map((key) => {
        const changesCopy = { ...changes[key] };
        const documentChange = {
          _id: key,
          document: changesCopy
        };
        const eTag = changes[key].eTag;
        if (!eTag || eTag === '*') {
          return col.updateOne({ _id: key }, { $set: { ...documentChange } }, { upsert: true });
        } else if (eTag.length > 0) {
          return col.replaceOne({ _id: eTag }, documentChange);
        } else {
          throw new Error('eTag empty');
        }
      }));
    } finally {
      client.close();
    }
  }

  async delete(keys) {
    const client = await this.getClient();
    try {
      var col = await this.getCollection(client);
      // keys is an array of storage keys, so map over it directly and return the delete promises.
      await Promise.all(keys.map((key) => {
        return col.deleteOne({ _id: key });
      }));
    } finally {
      client.close();
    }
  }

  async getClient() {
    const client = await MongoClient.connect(this.url, this.mongoOptions)
      .catch(err => { throw err; });
    if (!client) throw new Error('Unable to create MongoDB client');
    return client;
  }

  async getCollection(client) {
    return client.db(this.db).collection(this.collection);
  }
};
Note: I've only done a little testing on this--enough to get it to work great with the Multi-Turn-Prompt Sample. Use at your own risk and modify as necessary.
I based this off of a combination of these three storage implementations:
memoryStorage
blobStorage
cosmosDbStorage
3. Use it in your bot
index.js
const MongoDbStorage = require('./MongoDbStorage');
const mongoDbStorage = new MongoDbStorage('mongodb://localhost:27017/', 'testDatabase', 'testCollection');
const conversationState = new ConversationState(mongoDbStorage);
const userState = new UserState(mongoDbStorage);
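Not from the original answer, but as a sketch of how this storage gets exercised: state changes only reach MongoDbStorage when saveChanges is called in the turn handler. Assuming a restify/express server, an adapter, and a bot are set up elsewhere in index.js:

server.post('/api/messages', (req, res) => {
  adapter.processActivity(req, res, async (turnContext) => {
    await bot.run(turnContext);
    // Persist any state changes through the MongoDbStorage adapter.
    await conversationState.saveChanges(turnContext);
    await userState.saveChanges(turnContext);
  });
});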

Mongoose encryption middleware doesn't call after aggregation

I have a Mongoose schema with the 'mongoose-encryption' plugin, for example:
const encrypt = require('mongoose-encryption');
let someSchema = new Schema({ name: String, age: Number });
someSchema.plugin(encrypt, {
  encryptionKey: 'eKey',
  signingKey: 'sKey',
  encryptedFields: ['age'],
  decryptPostSave: false
});
After initializing the model and the repository, I tried to aggregate a query:
let aggregation = []; // just return all the docs.
someModel.aggregate(aggregation, (err, persons) => {
  return persons;
});
As a result I'm still getting the age field encrypted. A little reading revealed that the 'post' hook of the 'init' event isn't called after aggregation (as explained here - Mongoose Middleware Docs).
Is there a good solution, or any other workaround?
The data MUST be encrypted.
The aggregation is also required (in real life - a lookup to another collection).
As I didn't find a better answer, I changed my code (as a workaround, unfortunately) to decrypt the object by myself,
using the code of mongoose-encryption to decrypt after the aggregation has finished.
Most of the code was taken from GitHub (called decryptOne in my code):
the decryptSync function of mongoose-encryption.
The 'tricky' thing was to decrypt the inner lookup value - the inner document also has the "_ct" field that should be decrypted.
let lookup: { [innerField: string]: string[]; } = {
  user: ['bio']
};
this.decryptAggregation(aggregationResult, lookup);
My function gets a dictionary of the known lookup collections and their wanted fields after decryption. In this example, the other collection is named users and its encrypted field is just bio.
decryptAggregation(res: any[], innerLookup: { [innerField: string]: string[]; }) {
  for (let doc of res) {
    this.decryptSync(doc, innerLookup);
  }
}

private decryptSync(doc: any, innerLookup: { [innerField: string]: string[]; }) {
  this.decryptOne(doc, this.encryptedFields);
  for (let innerObj in innerLookup) {
    if (innerLookup.hasOwnProperty(innerObj)) {
      this.decryptOne(doc[innerObj], innerLookup[innerObj]);
    }
  }
};

private decryptOne(doc: any, fields: string[]) {
  let ct, ctWithIV, decipher, iv, idString, decryptedObject, decryptedObjectJSON, decipheredVal;
  if (doc._ct) {
    ctWithIV = doc._ct.hasOwnProperty('buffer') ? doc._ct.buffer : doc._ct;
    iv = ctWithIV.slice(this.VERSION_LENGTH, this.VERSION_LENGTH + this.IV_LENGTH);
    ct = ctWithIV.slice(this.VERSION_LENGTH + this.IV_LENGTH, ctWithIV.length);
    decipher = crypto.createDecipheriv(this.ENCRYPTION_ALGORITHM, this.encryptionKey, iv);
    try {
      decryptedObjectJSON = decipher.update(ct, undefined, 'utf8') + decipher.final('utf8');
      decryptedObject = JSON.parse(decryptedObjectJSON);
    } catch (err) {
      if (doc._id) {
        idString = doc._id.toString();
      } else {
        idString = 'unknown';
      }
      throw new Error('Error parsing JSON during decrypt of ' + idString + ': ' + err);
    }
    fields.forEach((field) => {
      decipheredVal = mpath.get(field, decryptedObject);
      // JSON.parse returns {type: "Buffer", data: Buffer} for Buffers
      // https://nodejs.org/api/buffer.html#buffer_buf_tojson
      if (_.isObject(decipheredVal) && decipheredVal.type === "Buffer") {
        this.setFieldValue(doc, field, decipheredVal.data);
      } else {
        this.setFieldValue(doc, field, decipheredVal);
      }
    });
    doc._ct = undefined;
    doc._ac = undefined;
  }
}
After those functions I got my wanted object fully decrypted; the last thing to do was to project the wanted fields back to the client - with lodash.pick.
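For illustration only (the decryptedDoc variable and field names here are assumptions), that final projection with lodash.pick might look like:

const _ = require('lodash');

// Return only the fields the client should see from the decrypted document.
const response = _.pick(decryptedDoc, ['_id', 'name', 'age']);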
