Mongoose encryption middleware isn't called after aggregation - node.js

I have a mongoose schema with the 'mongoose-encryption' plugin, for example:
const encrypt = require('mongoose-encryption');

let someSchema = new Schema({ name: String, age: Number });
someSchema.plugin(encrypt, {
  encryptionKey: 'eKey',
  signingKey: 'sKey',
  encryptedFields: ['age'],
  decryptPostSave: false
});
After initializing the model and the repository, I tried to run an aggregation:
let aggregation = []; // just return all the docs.
someModel.aggregate(aggregation, (err, persons) => {
  return persons;
});
As a result I'm still getting the age field encrypted. A little reading revealed that the 'post' hook of the 'init' event isn't called after aggregation (as explained here - Mongoose Middleware Docs).
Is there a good solution, or any other workaround?
The data MUST be encrypted.
The aggregation is also required (in real life - a lookup to another collection).

As I didn't find a better answer, I changed my code (as a workaround, unfortunately) to decrypt the objects myself -
using the code of mongoose-encryption to decrypt after the aggregation has finished.
Most of the code was taken from GitHub (called decryptOne in my code):
decryptSync function of mongoose-encryption
The 'tricky' part was decrypting the inner lookup value - the inner document also has a "_ct" field that should be decrypted.
let lookup: { [innerField: string]: string[]; } = {
  user: ['bio']
};
this.decryptAggregation(aggregationResult, lookup);
My function gets a dictionary of the known lookup collections and the fields to decrypt in each. In this example, the other collection is named users and its only encrypted field is bio.
decryptAggregation(res: any[], innerLookup: { [innerField: string]: string[]; }) {
  for (let doc of res) {
    this.decryptSync(doc, innerLookup);
  }
}

private decryptSync(doc: any, innerLookup: { [innerField: string]: string[]; }) {
  this.decryptOne(doc, this.encryptedFields);
  for (let innerObj in innerLookup) {
    if (innerLookup.hasOwnProperty(innerObj)) {
      this.decryptOne(doc[innerObj], innerLookup[innerObj]);
    }
  }
}
private decryptOne(doc: any, fields: string[]) {
  let ct, ctWithIV, decipher, iv, idString, decryptedObject, decryptedObjectJSON, decipheredVal;
  if (doc._ct) {
    ctWithIV = doc._ct.hasOwnProperty('buffer') ? doc._ct.buffer : doc._ct;
    iv = ctWithIV.slice(this.VERSION_LENGTH, this.VERSION_LENGTH + this.IV_LENGTH);
    ct = ctWithIV.slice(this.VERSION_LENGTH + this.IV_LENGTH, ctWithIV.length);
    decipher = crypto.createDecipheriv(this.ENCRYPTION_ALGORITHM, this.encryptionKey, iv);
    try {
      decryptedObjectJSON = decipher.update(ct, undefined, 'utf8') + decipher.final('utf8');
      decryptedObject = JSON.parse(decryptedObjectJSON);
    } catch (err) {
      if (doc._id) {
        idString = doc._id.toString();
      } else {
        idString = 'unknown';
      }
      throw new Error('Error parsing JSON during decrypt of ' + idString + ': ' + err);
    }
    fields.forEach((field) => {
      decipheredVal = mpath.get(field, decryptedObject);
      // JSON.parse returns {type: "Buffer", data: Buffer} for Buffers
      // https://nodejs.org/api/buffer.html#buffer_buf_tojson
      if (_.isObject(decipheredVal) && decipheredVal.type === "Buffer") {
        this.setFieldValue(doc, field, decipheredVal.data);
      } else {
        this.setFieldValue(doc, field, decipheredVal);
      }
    });
    doc._ct = undefined;
    doc._ac = undefined;
  }
}
After those functions I got my object fully decrypted; the last thing to do was to project the wanted fields back to the client - with lodash.pick.
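For that final projection, here is a minimal sketch of what it can look like, assuming lodash is available and using the field names from the example above (name, age, and the looked-up user.bio):
const _ = require('lodash');

// keep only the fields the client should see; anything else
// (including leftover internal fields) is dropped
const projected = aggregationResult.map(doc =>
  _.pick(doc, ['_id', 'name', 'age', 'user.bio'])
);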

Related

Dynamic function call from dynamic string using nodejs

I'm using Express with Node.js in my project, and this scenario came up while working on a specific module. I have made different functions for different roles, and based on the logged-in user's role I call the corresponding function dynamically. I want to keep the code to a minimum number of lines, so please suggest a good solution.
if (userRole.code === ROLE.REFERRINGPROVIDER) {
  query = await referralUser(tabCode, userId, query);
} else if (userRole.code === ROLE.CONSULTINGPROVIDER) {
  query = await consultingUser(tabCode, userId, query);
} else if (userRole.code === ROLE.PARENT) {
  query = await parentUser(tabCode, userId, query);
} else if (userRole.code === ROLE.PHYSICIAN) {
  query = await physicianUser(tabCode, userId, query);
}
As shown in the example above, I have to repeat that code for different user roles, so I want to reduce it to a simple one-line call.
You can use this solution :)
const userRole = { code: 'referral' };

async function referralUser(tabCode, userId, query) {
  console.log(tabCode, userId, query);
  return "referralUser Called!";
}

async function consultingUser(tabCode, userId, query) {
  console.log(tabCode, userId, query);
  return "consultingUser Called!";
}

async function parentUser(tabCode, userId, query) {
  console.log(tabCode, userId, query);
  return "parentUser Called!";
}

let functionName = userRole.code + 'User';
eval(functionName)("tabCode", "userId", "query").then((results) => {
  console.log(results);
});
You can call functions by their string name. For instance:
function funcOne() {
  console.log('funcOne');
}
function funcTwo() {
  console.log('funcTwo');
}
function funcThree() {
  console.log('funcThree');
}
function funcFour() {
  console.log('funcFour');
}
function funcFive() {
  console.log('funcFive');
}

const func: { [K: string]: Function } = {
  funcOne,
  funcTwo,
  funcThree,
  funcFour,
  funcFive
};

// console log output: "funcOne"
func['funcOne']();
// console log output: "funcFour"
func['funcFour']();
// console log output: "funcTwo"
func['funcTwo']();
In your case, use ROLE to map its keys to functions:
const func: { [K: string]: Function } = {
  [ROLE.REFERRINGPROVIDER]: referralUser,
  [ROLE.CONSULTINGPROVIDER]: consultingUser,
  [ROLE.PARENT]: parentUser,
  [ROLE.PHYSICIAN]: physicianUser
};

query = await func[userRole.code](tabCode, userId, query);
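One follow-up worth noting: if userRole.code has no entry in the map, func[userRole.code] is undefined and the call throws a TypeError. A minimal guarded version of that dispatch line, reusing the func map above:
const handler = func[userRole.code];
if (!handler) {
  // fail fast instead of calling undefined
  throw new Error('No handler registered for role: ' + userRole.code);
}
query = await handler(tabCode, userId, query);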

Make sure a Firestore collection docChanges listener stays alive

The final solution is at the bottom of this post.
I have a Node.js server application that listens to a rather big collection:
//here was old code
This works perfectly fine: there are lots of documents and the server can serve them from cache instead of the database, which saves me tons of document reads (and is a lot faster).
I want to make sure this collection listener stays alive forever, which means reconnecting if a change doesn't come through.
Is there any way to create this certainty? This server might be online for years.
Final solution:
A database listener that saves the timestamp on every change:
export const lastRolesChange = functions.firestore
  .document(`${COLLECTIONS.ROLES}/{id}`)
  .onWrite(async (_change, context) => {
    return firebase()
      .admin.firestore()
      .collection('syncstatus')
      .doc(COLLECTIONS.ROLES)
      .set({
        lastModified: context.timestamp,
        docId: context.params.id
      });
  });
Logic that checks whether the server has the same last-updated timestamp as the database. If it is still listening it should have it; otherwise refresh the listener, because it might have stalled:
import { firebase } from '../google/auth';
import { COLLECTIONS } from '../../../configs/collections.enum';

class DataObjectTemplate {
  constructor() {
    for (const key in COLLECTIONS) {
      if (key) {
        this[COLLECTIONS[key]] = [] as { id: string; data: any }[];
      }
    }
  }
}

const dataObject = new DataObjectTemplate();
const timestamps: {
  [key in COLLECTIONS]?: Date;
} = {};
let unsubscribe: Function;
export const getCachedData = async (type: COLLECTIONS) => {
  return firebase()
    .admin.firestore()
    .collection(COLLECTIONS.SYNCSTATUS)
    .doc(type)
    .get()
    .then(async snap => {
      const lastUpdate = snap.data();
      /* we compare the last update of the roles collection with the last update we
       * got from the listener. If the listener would have failed to sync, we
       * will find out here and reset the listener.
       */
      // first check if we already have a timestamp, otherwise, we set it in the past.
      let timestamp = timestamps[type];
      if (!timestamp) {
        timestamp = new Date(2020, 0, 1);
      }
      // if we don't have a last update for some reason, there is something wrong
      if (!lastUpdate) {
        throw new Error('Missing sync data for ' + type);
      }
      const lastModified = new Date(lastUpdate.lastModified);
      if (lastModified.getTime() > timestamp.getTime()) {
        console.warn('Out of sync: refresh!');
        console.warn('Resetting listener');
        if (unsubscribe) {
          unsubscribe();
        }
        await startCache(type);
        return dataObject[type] as { id: string; data: any }[];
      }
      return dataObject[type] as { id: string; data: any }[];
    });
};
export const startCache = async (type: COLLECTIONS) => {
  // tslint:disable-next-line:no-console
  console.warn('Building ' + type + ' cache.');
  const timeStamps: number[] = [];
  // start with clean array
  dataObject[type] = [];
  return new Promise(resolve => {
    unsubscribe = firebase()
      .admin.firestore()
      .collection(type)
      .onSnapshot(querySnapshot => {
        querySnapshot.docChanges().map(change => {
          timeStamps.push(change.doc.updateTime.toMillis());
          if (change.oldIndex !== -1) {
            dataObject[type].splice(change.oldIndex, 1);
          }
          if (change.newIndex !== -1) {
            dataObject[type].splice(change.newIndex, 0, {
              id: change.doc.id,
              data: change.doc.data()
            });
          }
        });
        // tslint:disable-next-line:no-console
        console.log(dataObject[type].length + ' ' + type + ' in cache.');
        timestamps[type] = new Date(Math.max(...timeStamps));
        resolve(true);
      });
  });
};
If you want to be sure you have all changes, you'll have to:
keep a lastModified type field in each document,
use a query to get the documents that were modified since you last looked (a sketch of such a catch-up query follows below),
store on your server the last time you queried.
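A minimal sketch of that catch-up query, reusing the firebase()/COLLECTIONS helpers from the code above and assuming each document carries a lastModified timestamp field; lastSync and updateCache are hypothetical placeholders for the stored timestamp and your cache-merge logic:
async function catchUp(lastSync) {
  // lastSync: the Date saved after the previous catch-up query (hypothetical)
  const snapshot = await firebase()
    .admin.firestore()
    .collection(COLLECTIONS.ROLES)
    .where('lastModified', '>', lastSync)
    .get();

  snapshot.docs.forEach(doc => {
    // merge each changed document into the local cache (updateCache is a hypothetical helper)
    updateCache(doc.id, doc.data());
  });

  // persist this and pass it to the next catch-up query
  return new Date();
}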
Unrelated to that, you might also be interested in the recently launched ability to serve bundled Firestore content as it's another way to reduce the number of charged reads you have to do against the Firestore server.

How do I run an update while streaming using pg-query-stream and pg-promise?

I am trying to load 50000 items from the database, each with text in them, tag them, and update the tags.
I am using pg-promise and pg-query-stream for this purpose.
I was able to get the streaming part working properly, but updating has become problematic with so many update statements.
Here is my existing code:
const QueryStream = require('pg-query-stream')
const JSONStream = require('JSONStream')

function prepareText(title, content, summary) {
  let description
  if (content && content.length) {
    description = content
  } else if (summary && summary.length) {
    description = summary
  } else {
    description = ''
  }
  return title.toLowerCase() + ' ' + description.toLowerCase()
}

async function tagAll({ db, logger, tagger }) {
  // you can also use pgp.as.format(query, values, options)
  // to format queries properly, via pg-promise;
  const qs = new QueryStream(
    'SELECT feed_item_id,title,summary,content FROM feed_items ORDER BY pubdate DESC, feed_item_id DESC'
  )
  try {
    const result = await db.stream(qs, (s) => {
      // initiate streaming into the console:
      s.pipe(JSONStream.stringify())
      s.on('data', async (item) => {
        try {
          s.pause()
          // eslint-disable-next-line camelcase
          const { feed_item_id, title, summary, content } = item
          // Process text to be tagged
          const text = prepareText(title, summary, content)
          const tags = tagger.tag(text)
          // Update tags per post
          await db.query(
            'UPDATE feed_items SET tags=$1 WHERE feed_item_id=$2',
            // eslint-disable-next-line camelcase
            [tags, feed_item_id]
          )
        } catch (error) {
          logger.error(error)
        } finally {
          s.resume()
        }
      })
    })
    logger.info(
      'Total rows processed:',
      result.processed,
      'Duration in milliseconds:',
      result.duration
    )
  } catch (error) {
    logger.error(error)
  }
}

module.exports = tagAll
The db object is the one from pg-promise, and the tagger simply extracts an array of tags from the text, which is stored in the tags variable.
Too many update statements are executing, from what I can see in the diagnostics. Is there a way to batch them?
If you can do everything with one SQL statement, you should! Here you're paying the price of a round trip between Node and your DB for each row of your table, which will take most of the time of your query.
Your request can be implemented in pure sql:
update feed_items set tags = case
  when (content = '') is false then lower(title) || ' ' || lower(content)
  when (summary = '') is false then lower(title) || ' ' || lower(summary)
  else title
end;
This statement will update your whole table at once. I'm sure it'd be an order of magnitude faster than your method. On my machine, with a table containing 100000 rows, the update takes about 600 ms.
Some remarks:
you don't need to order rows to update them; as ordering is quite slow, it's better not to.
I guess the limit part was there because the query was too slow? If that's the case, you can drop it; 50000 rows is not a big table for Postgres.
I bet this pg-stream thing does not really stream data out of the DB; it only lets you use a stream-like API on the results it gathered earlier... No problem with that, but I thought maybe there was a misconception here.
This is the best I could come up with to batch the queries inside the stream, so that we don't need to load all the data in memory or run too many queries. If anyone knows a better way to batch, especially with t.sequence, feel free to add another answer.
const BATCH_SIZE = 5000

async function batchInsert({ db, pgp, logger, data }) {
  try {
    // https://vitaly-t.github.io/pg-promise/helpers.ColumnSet.html
    const cs = new pgp.helpers.ColumnSet(
      [
        { name: 'feed_item_id', cast: 'uuid' },
        { name: 'tags', cast: 'varchar(64)[]' },
      ],
      {
        table: 'feed_items',
      }
    )
    const query =
      pgp.helpers.update(data, cs) + ' WHERE v.feed_item_id=t.feed_item_id'
    await db.none(query)
  } catch (error) {
    logger.error(error)
  }
}

async function tagAll({ db, pgp, logger, tagger }) {
  // you can also use pgp.as.format(query, values, options)
  // to format queries properly, via pg-promise;
  const qs = new QueryStream(
    'SELECT feed_item_id,title,summary,content FROM feed_items ORDER BY pubdate DESC, feed_item_id DESC'
  )
  try {
    const queryValues = []
    const result = await db.stream(qs, (s) => {
      // initiate streaming into the console:
      s.pipe(JSONStream.stringify())
      s.on('data', async (item) => {
        try {
          s.pause()
          // eslint-disable-next-line camelcase
          const { feed_item_id, title, summary, content } = item
          // Process text to be tagged
          const text = prepareText(title, summary, content)
          const tags = tagger.tag(text)
          queryValues.push({ feed_item_id, tags })
          if (queryValues.length >= BATCH_SIZE) {
            const data = queryValues.splice(0, queryValues.length)
            await batchInsert({ db, pgp, logger, data })
          }
        } catch (error) {
          logger.error(error)
        } finally {
          s.resume()
        }
      })
    })
    await batchInsert({ db, pgp, logger, data: queryValues })
    return result
  } catch (error) {
    logger.error(error)
  }
}

Pagination in DynamoDB using Node.js?

I've had a read through AWS's docs around pagination:
As their docs specify:
In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter)
This means that, given I have a table called Questions with an attribute called difficulty (which can take any numeric value from 0 to 2), I might end up with the following conundrum:
A client makes a request, think GET /questions?difficulty=0&limit=3
I forward that limit of 3 to the DynamoDB query, which might return 0 items, as the first 3 items in the collection might not have difficulty == 0
I then have to perform a new query to fetch more questions that match that criterion, without knowing whether I might return duplicates
How can I then paginate based on a query correctly? Something where I'll get as many results as I asked for, with the correct offset.
Using async/await.
const getAllData = async (params) => {
  console.log("Querying Table");
  let data = await docClient.query(params).promise();
  if (data['Items'].length > 0) {
    allData = [...allData, ...data['Items']];
  }
  if (data.LastEvaluatedKey) {
    params.ExclusiveStartKey = data.LastEvaluatedKey;
    return await getAllData(params);
  } else {
    return data;
  }
}
I am using a global variable allData to collect all the data.
The call to this function is enclosed within a try/catch:
try {
  await getAllData(params);
  console.log("Processing Completed");
  // console.log(allData);
} catch (error) {
  console.log(error);
}
I am using this from within a Lambda and it works fine.
The article here really helped and guided me. Thanks.
Here is an example of how to iterate over a paginated result set from a DynamoDB scan (it can be easily adapted for a query as well) in Node.js.
You could save the LastEvaluatedKey state server-side and pass an identifier back to your client, which it would send with its next request; your server would then pass that value as ExclusiveStartKey in the next request to DynamoDB (a sketch of this follows after the scan example below).
const AWS = require('aws-sdk');
AWS.config.logger = console;
const dynamodb = new AWS.DynamoDB({ apiVersion: '2012-08-10' });

let val = 'some value';
let params = {
  TableName: "MyTable",
  ExpressionAttributeValues: {
    ':val': {
      S: val,
    },
  },
  Limit: 1000,
  FilterExpression: 'MyAttribute = :val',
  // ExclusiveStartKey: thisUsersScans[someRequestParamScanID]
};

dynamodb.scan(params, function scanUntilDone(err, data) {
  if (err) {
    console.log(err, err.stack);
  } else {
    // do something with data
    if (data.LastEvaluatedKey) {
      params.ExclusiveStartKey = data.LastEvaluatedKey;
      dynamodb.scan(params, scanUntilDone);
    } else {
      // all results scanned. done!
      someCallback();
    }
  }
});
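As a minimal sketch of the server-side token idea mentioned above (the pageTokens store, getPage helper, and token format are hypothetical, not part of the original answer):
const crypto = require('crypto');
const pageTokens = new Map(); // in-memory token -> LastEvaluatedKey store (hypothetical)

async function getPage(scanParams, pageToken) {
  if (pageToken && pageTokens.has(pageToken)) {
    scanParams.ExclusiveStartKey = pageTokens.get(pageToken);
  }
  const data = await dynamodb.scan(scanParams).promise();
  let nextToken = null;
  if (data.LastEvaluatedKey) {
    nextToken = crypto.randomBytes(16).toString('hex');
    pageTokens.set(nextToken, data.LastEvaluatedKey);
  }
  // return the page plus an opaque token the client sends back to get the next page
  return { items: data.Items, nextToken };
}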
Avoid using recursion to prevent call stack overflow. An iterative solution extending #Roshan Khandelwal's approach:
const getAllData = async (params) => {
  const _getAllData = async (params, startKey) => {
    if (startKey) {
      params.ExclusiveStartKey = startKey
    }
    return this.documentClient.query(params).promise()
  }

  let lastEvaluatedKey = null
  let rows = []

  do {
    const result = await _getAllData(params, lastEvaluatedKey)
    rows = rows.concat(result.Items)
    lastEvaluatedKey = result.LastEvaluatedKey
  } while (lastEvaluatedKey)

  return rows
}
I hope you've figured it out already, but just in case others find it useful: AWS has QueryPaginator/ScanPaginator, as simple as this:
const paginator = new QueryPaginator(dynamoDb, queryInput);

for await (const page of paginator) {
  // do something with the first page of results
  break;
}
See more details at https://github.com/awslabs/dynamodb-data-mapper-js/tree/master/packages/dynamodb-query-iterator
2022-05-19:
For AWS SDK v3, see how to use the paginateXXXX helpers in this blog post: https://aws.amazon.com/blogs/developer/pagination-using-async-iterators-in-modular-aws-sdk-for-javascript/
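A minimal sketch along those lines, assuming the @aws-sdk/client-dynamodb and @aws-sdk/lib-dynamodb packages are installed (the table and key names are placeholders):
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, paginateQuery } = require('@aws-sdk/lib-dynamodb');

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function getAllItems() {
  const items = [];
  // paginateQuery follows LastEvaluatedKey/ExclusiveStartKey for you
  const pages = paginateQuery(
    { client },
    {
      TableName: 'mytable',
      KeyConditionExpression: 'mykey = :key',
      ExpressionAttributeValues: { ':key': 'myvalue' },
    }
  );
  for await (const page of pages) {
    items.push(...(page.Items || []));
  }
  return items;
}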
Query and Scan operations return LastEvaluatedKey in their responses. Absent concurrent insertions, you will not miss items nor will you encounter items multiple times, as long as you iterate calls to Query/Scan and set ExclusiveStartKey to the LastEvaluatedKey of the previous call.
To create pagination in a DynamoDB scan, use something like:
var params = {
  "TableName": "abcd",
  "FilterExpression": "#someexperssion=:someexperssion",
  "ExpressionAttributeNames": { "#someexperssion": "someexperssion" },
  "ExpressionAttributeValues": { ":someexperssion": "value" },
  "Limit": 20,
  "ExclusiveStartKey": { "id": "9ee10f6e-ce6d-4820-9fcd-cabb0d93e8da" }
};

DB.scan(params).promise();
where ExclusiveStartKey is the LastEvaluatedKey returned by the previous execution of this query.
Using async/await and returning the collected data.
Elaboration on #Roshan Khandelwal's answer.
const getAllData = async (params, allData = []) => {
  const data = await dynamodbDocClient.scan(params).promise()
  if (data['Items'].length > 0) {
    allData = [...allData, ...data['Items']]
  }
  if (data.LastEvaluatedKey) {
    params.ExclusiveStartKey = data.LastEvaluatedKey
    return await getAllData(params, allData)
  } else {
    return allData
  }
}
Call inside a try/catch:
try {
  const data = await getAllData(params);
  console.log("my data: ", data);
} catch (error) {
  console.log(error);
}
You can create a secondary index on difficulty and, in the query, set a KeyConditionExpression of difficulty = 0, like this:
var params = {
  TableName: questions,
  IndexName: 'difficulty-index',
  KeyConditionExpression: 'difficulty = :difficulty',
  ExpressionAttributeValues: { ':difficulty': 0 }
}
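If the difficulty-index doesn't exist yet, here is a minimal sketch of adding it with the AWS SDK v2 low-level client (the index definition and throughput values are assumptions, not part of the original answer; omit ProvisionedThroughput for on-demand tables):
var dynamodbAdmin = new AWS.DynamoDB();

dynamodbAdmin.updateTable({
  TableName: 'Questions',
  AttributeDefinitions: [{ AttributeName: 'difficulty', AttributeType: 'N' }],
  GlobalSecondaryIndexUpdates: [{
    Create: {
      IndexName: 'difficulty-index',
      KeySchema: [{ AttributeName: 'difficulty', KeyType: 'HASH' }],
      Projection: { ProjectionType: 'ALL' },
      ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
    }
  }]
}).promise();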
You can also achieve this using recursion instead of a global variable, like:
const getAllData = async (params, allData = []) => {
  let data = await db.scan(params).promise();
  return (data.LastEvaluatedKey)
    ? getAllData({ ...params, ExclusiveStartKey: data.LastEvaluatedKey }, [...allData, ...data['Items']])
    : [...allData, ...data['Items']];
};
Then you can simply call it like:
let test = await getAllData({ "TableName": "test-table"}); // feel free to add try/catch
Using DynamoDB pagination with async generators:
let items = []
let params = {
  TableName: 'mytable',
  Limit: 1000,
  KeyConditionExpression: 'mykey = :key',
  ExpressionAttributeValues: {
    ':key': { S: 'myvalue' },
  },
}

async function* fetchData(params) {
  let data
  do {
    data = await dynamodb.query(params).promise()
    yield data.Items
    params.ExclusiveStartKey = data.LastEvaluatedKey
  } while (typeof data.LastEvaluatedKey != 'undefined')
}

for await (const data of fetchData(params)) {
  items = [...items, ...data]
}

Is there a way to define a virtual field in Geddy model?

Is it possible to define a temp field / virtual field in a Geddy model?
For example, in the form I use the input fields tmpFirstName and tmpLastName, but when the form is submitted I want to store the information in a single column called name.
Thanks
This can be trivially achieved with the new lifecycle methods (thanks to you!).
In your controller:
this.create = function (req, resp, params) {
  var self = this
    , person = geddy.model.Person.create(params);

  person.firstname = params.firstname;
  person.lastname = params.lastname;

  if (!person.isValid()) {
    this.respondWith(person);
  }
  else {
    person.save(function(err, data) {
      if (err) {
        throw err;
      }
      self.respondWith(person, {status: err});
    });
  }
};
In your model:
this.defineProperties({
  name: {type: 'string'}
});

this.beforeSave = function () {
  this.name = this.firstname + ' ' + this.lastname;
}
Note that you don't declare the "virtual" properties in defineProperties; otherwise Geddy will store them in the database.
