I'm working on a test suite for a build tool and I'm finding that I'm wasting quite a bit of time updating what are essentially manual snapshots like the following:
export const defaultBuild = {
  'bundle.abcde.js': 1000,
  'bundle.abcde.css': 112,
}
The idea is that we do want to catch changes in the file hash and changes in file size beyond +/- 5%. But every time one of these does change, it's a manual effort to go and rewrite the entries. As there are a couple dozen entries in reality, this is rather time consuming. Ideally, I'd like to switch this over to Jest's snapshot tests since, importantly, they can be updated with a flag. Much quicker!
Now, I'm easily able to support testing stable file hashes. A little bit of readdir manipulation gets me a stringified directory structure and all is good. The part I'm struggling with is the file sizes, though. It really is quite important that +/- 5% be acceptable, else the snapshots will be updating almost constantly and be too noisy. But, from what I can tell, Jest offers no custom comparator for snapshot testing. It's exact matches only.
Is there any good method out there for storing something like file sizes in a snapshot so they can be auto-updated, but also allowing some wiggle room?
To give an idea of where I've started:
// toMatchSnapshot isn't in scope inside a custom matcher by default
import { toMatchSnapshot } from 'jest-snapshot';

expect.extend({
  // Pending more thoughtful name
  toSortaMatchSnapshot(dir) {
    const lines = dir.split('\n');
    for (const line of lines) {
      const match = line.match(/(^[^:]*):\s([0-9]*)/);
      if (match) {
        let fileName = match[1].trim();
        if (/\.\w{5}/.test(fileName)) {
          fileName = fileName.replace(/\.\w{5}/, '');
        }
        const fileSize = Number(match[2]);
        const expectedMin = fileSize * 0.95;
        const expectedMax = fileSize * 1.05;
        // const message = (comparator, val) =>
        //   `expected '${fileName}' to be ${comparator} than ${val}, but it's ${receivedSize}`;
        // This won't really ever work
        toMatchSnapshot.call(this, fileSize, fileName);
      }
    }
    return {
      message: () => '',
      pass: true,
    };
  },
});
Example input would look something like this:
`build
bundle.abcde.js: 1000
bundle.abcde.css: 112
`;
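For what it's worth, one direction (a rough sketch, using nothing beyond toMatchSnapshot from Jest's own API) is to normalize the listing before it reaches the snapshot: strip the hashes and quantize the sizes so that small drifts serialize to the same value. It isn't a true +/- 5% window around the stored number, since a value near a bucket boundary can still flip, but the snapshot stays auto-updatable with --updateSnapshot. The stringifyBuildDir helper below is hypothetical, standing in for the readdir-based listing described above.

// Hypothetical sketch: stringifyBuildDir is assumed to produce the listing shown
// above ("name.hash.ext: size" per line); the normalization is the point here.
function normalizeDirListing(listing, granularity = 0.1) {
  return listing
    .split('\n')
    .map((line) => {
      const match = line.match(/^(\s*)([^:]+):\s*([0-9]+)\s*$/);
      if (!match) return line;
      const [, indent, rawName, rawSize] = match;
      const fileName = rawName.trim().replace(/\.\w{5}(?=\.)/, ''); // drop the hash segment
      const size = Number(rawSize);
      // Quantize to roughly 10% steps so sub-5% drifts usually serialize identically.
      const step = Math.max(1, Math.pow(10, Math.floor(Math.log10(size))) * granularity);
      return `${indent}${fileName}: ~${Math.round(size / step) * step}`;
    })
    .join('\n');
}

test('build output stays roughly the same', () => {
  expect(normalizeDirListing(stringifyBuildDir('build'))).toMatchSnapshot();
});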
I'm trying, and failing, to learn to use this kdbxweb library. The documentation for it is confusing me, probably because I lack some prerequisite knowledge that it assumes, so the documentation isn't really written for me yet.
Below is my code where I'm trying to learn to use it. All I really want is a place to store passwords, rather than keeping them in plain text, so that I can send the script to a team member, they can set up a similar credentials database (either within the script or outside it), and it will pull in their various ODBC database passwords.
The idea eventually would be to name each entry after the given ODBC connection; then, when a connection is requested, the UID and PWD would be retrieved and added into the connection string. I'm trying to get away from MS Access/VBA for this sort of thing and learn to use Node.js/TypeScript for it instead.
import * as fs from 'fs';
import * as kdbx from 'kdbxweb';

(async () => {
  try {
    const database = kdbx.Kdbx.create(new kdbx.Credentials(kdbx.ProtectedValue.fromString('test')), 'credentials');
    // const group = database.createGroup(database.getDefaultGroup(), 'subgroup');
    // const entry = database.createEntry(group);
    // entry.fields.set('Password', kdbx.ProtectedValue.fromString('test'));
    // entry.pushHistory();
    // entry.times.update();
    await database.save();
    // fs.writeFileSync('credentials/credentials.kdbx', data);
  } catch (e: any) {
    throw e;
  }
})();
The error I'm getting when trying to do this is "argon2 not implemented", and while argon2 is mentioned at the top of the documentation, I don't understand what it's talking about at all. It sounded like it has to do with an additional cryptography API that I don't think I should even need. I tried to take the code of the example implementation, but I had no idea how to actually make use of it.
I also tried reading code for the web-app written using this library, but the way it's integrated into the application makes it completely impossible for me to parse at this point. I can't tell what type of objects are being passed around/etc. to trace the flow of information.
I found a better solution. It was painful to learn how to do this, but I eventually got it working: I'm using the Node C++ implementation of argon2 instead, and it no longer echoes the minified script into the console.
This is set up as argon/argon2node.ts and requires the argon2 Node library. Now that I've got this working, I think I could switch to the Rust version or something like that if I wanted to. It's mostly about figuring out exactly where the parameters need to go, since sometimes the names are a little different and you have to convert various parameters around.
import { Argon2Type, Argon2Version } from "kdbxweb/dist/types/crypto/crypto-engine";
import argon from 'argon2';

export default async function argon2(
  password: ArrayBuffer,
  salt: ArrayBuffer,
  memory: number,
  iterations: number,
  length: number,
  parallelism: number,
  type: Argon2Type,
  version: Argon2Version
): Promise<ArrayBuffer> {
  try {
    // https://github.com/keeweb/kdbxweb/blob/master/test/test-support/argon2.ts - reviewed this and eventually figured out how to switch to the C++ implementation below after much pain
    const hashArr = new Uint8Array(await argon.hash(
      Buffer.from(new Uint8Array(password)), {
        timeCost: iterations,
        memoryCost: memory,
        parallelism: parallelism,
        version: version,
        type: type,
        salt: Buffer.from(new Uint8Array(salt)),
        raw: true
      }
    ));
    return Promise.resolve(hashArr);
  } catch (e) {
    return Promise.reject(e);
  }
}
And below is my odbc credentials lookup based on it
import * as fs from 'fs';
import * as kdbx from 'kdbxweb';
import argon2 from './argon/argon2node';
import * as byteUtils from 'kdbxweb/lib/utils/byte-utils';

export default async (title: string) => {
  try {
    kdbx.CryptoEngine.setArgon2Impl(argon2);
    const readBuffer = byteUtils.arrayToBuffer(fs.readFileSync('./SQL/credentials/credentials.kdbx'));
    const database = await kdbx.Kdbx.load(
      readBuffer,
      new kdbx.Credentials(kdbx.ProtectedValue.fromString('CredentialsStorage1!'))
    );
    let result;
    database.getDefaultGroup().entries.forEach((e) => {
      if (e.fields.get('Title') === title) {
        const password = (<kdbx.ProtectedValue>e.fields.get('Password')).getText();
        const user = <string>e.fields.get('UserName');
        result = `UID=${user};PWD=${password}`;
        return;
      }
    });
    return result;
  } catch (e: any) {
    throw e;
  }
};
Old solution below
To resolve this, I had to do a bit of reading in the documentation to understand Buffers and ArrayBuffers and such, which wasn't easy, but I eventually figured it out and created the test below, which reads and writes entries. I still have a bit to learn, but this is close enough that I thought it worth sharing for anyone else who may try to use this library.
I also had to get a copy of argon2-asm.min.js and argon2.ts, which I pulled from the GitHub repo for KeeWeb, which is built with this library.
import * as fs from 'fs';
import * as kdbx from 'kdbxweb';
import { argon2 } from './argon/argon2';

function toArrayBuffer(buffer: Buffer) {
  return buffer.buffer.slice(buffer.byteOffset, buffer.byteOffset + buffer.byteLength);
}

function toBuffer(byteArray: ArrayBuffer) {
  return Buffer.from(byteArray);
}

(async () => {
  try {
    kdbx.CryptoEngine.setArgon2Impl(argon2);
    fs.unlinkSync('./SQL/credentials/credentials.kdbx');
    const database = kdbx.Kdbx.create(new kdbx.Credentials(kdbx.ProtectedValue.fromString('test')), 'credentials');
    const entry = database.createEntry(database.getDefaultGroup());
    entry.fields.set('Title', 'odbc');
    entry.fields.set('Password', kdbx.ProtectedValue.fromString('test'));
    const data = await database.save();
    fs.writeFileSync('./SQL/credentials/credentials.kdbx', new DataView(data));
    const readData = toArrayBuffer(fs.readFileSync('./SQL/credentials/credentials.kdbx'));
    console.log('hithere');
    const read = await kdbx.Kdbx.load(
      readData,
      new kdbx.Credentials(kdbx.ProtectedValue.fromString('test'))
    );
    console.log('bye');
    console.log(read.getDefaultGroup().entries[0].fields.get('Title'));
    const protectedPass = <kdbx.ProtectedValue>read.getDefaultGroup().entries[0].fields.get('Password');
    console.log(
      new kdbx.ProtectedValue(
        protectedPass.value,
        protectedPass.salt
      ).getText()
    );
  } catch (e: any) {
    console.error(e);
    throw e;
  }
})();
Things I'd like to understand better include why the argon2 implementation isn't built in. The library author says "Due to complex calculations, you have to implement it manually", but this just seems odd. Perhaps not appropriate for this forum, but it would be nice to know about alternatives if this one is slow or something.
Edit: Removing irrelevant code to improve readability
Edit 2: Reducing example to only uploadGameRound function and adding log output with times.
I'm working on a mobile multiplayer word game and was previously using the Firebase Realtime Database with fairly snappy performance apart from the cold starts. Saving an updated game and setting stats would take at most a few seconds. Recently I decided to switch to Firestore for my game data and player stats / top lists, primarily because of the more advanced queries and the automatic scaling with no need for manual sharding.
Now I've got things working on Firestore, but the time it takes to save an updated game and update a number of stats is just ridiculous. I'm clocking an average of 3-4 minutes before the game is updated, stats are added, and everything is available in the database for other clients and viewable in the web interface. I'm guessing and hoping that this is because of something I've messed up in my implementation, but the transactions all go through and there are no warnings or anything else to go on, really. Looking at the Cloud Functions log, the total time from function call to the completion log statement appears to be a bit more than a minute, but that log doesn't appear until after the same 3-4 minute wait for the data.
Here's the code as it is. If someone has time to have a look and maybe spot what's wrong I'd be hugely grateful!
This function is called from Unity client:
exports.uploadGameRound = functions.https.onCall((roundUploadData, response) => {
  console.log("UPLOADING GAME ROUND. TIME: ");
  var d = new Date();
  var n = d.toLocaleTimeString();
  console.log(n);

  // CODE REMOVED FOR READABILITY. JUST PREPARING SOME VARIABLES TO USE BELOW. NOTHING HEAVY, NO DATABASE TRANSACTIONS. //

  // Get a new write batch
  const batch = firestoreDatabase.batch();

  // Save game info to activeGamesInfo
  var gameInfoRef = firestoreDatabase.collection('activeGamesInfo').doc(gameId);
  batch.set(gameInfoRef, gameInfo);

  // Save game data to activeGamesData
  const gameDataRef = firestoreDatabase.collection('activeGamesData').doc(gameId);
  batch.set(gameDataRef, { gameDataCompressed: updatedGameDataGzippedString });

  if (foundWord !== undefined && foundWord !== null) {
    const wordId = foundWord.timeStamp + "_" + foundWord.word;
    // Save word to allFoundWords
    const wordRef = firestoreDatabase.collection('allFoundWords').doc(wordId);
    batch.set(wordRef, foundWord);
    exports.incrementNumberOfTimesWordFound(gameInfo.language, foundWord.word);
  }

  console.log("COMMITTING BATCH. TIME: ");
  var d = new Date();
  var n = d.toLocaleTimeString();
  console.log(n);

  // Commit the batch
  batch.commit().then(result => {
    return gameInfoRef.update({ roundUploaded: true }).then(function (result2) {
      console.log("DONE COMMITTING BATCH. TIME: ");
      var d = new Date();
      var n = d.toLocaleTimeString();
      console.log(n);
      return;
    });
  });
});
Again, any help with understanding this weird behaviour massively appreciated!
Ok, so I found the problem now and thought I should share it:
Simply adding a return statement before the batch commit fixed the function and reduced the time from 4 minutes to less than a second:
return batch.commit().then(result => {   // <-- this return is the fix
  return gameInfoRef.update({ roundUploaded: true }).then(function (result2) {
    console.log("DONE COMMITTING BATCH. TIME: ");
    var d = new Date();
    var n = d.toLocaleTimeString();
    console.log(n);
    return;
  });
});
Your function isn't returning a promise that resolves when all the work is done (and with any data to send to the client app). In the absence of a returned promise, Cloud Functions assumes the work is finished and may terminate immediately, with no guarantee that any pending asynchronous work will complete.
Calling then on a single promise isn't enough. You likely have lots of async work going on here, between commit() and other functions like incrementNumberOfTimesWordFound. You will need to handle all of those promises correctly and make sure your overall function returns a single promise that resolves only when all of that work is complete.
I strongly suggest taking some time to learn how promises work in JavaScript - this is crucial to writing effective functions. Without a full understanding, things will go wrong, or simply not work at all, in strange ways.
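To make that concrete, here is a rough sketch (mine, not the poster's actual code) of the handler restructured with async/await so that every pending write is part of the single returned promise. It assumes incrementNumberOfTimesWordFound is a plain helper that returns a promise, and the variable preparation elided in the question stays elided here:

exports.uploadGameRound = functions.https.onCall(async (roundUploadData, context) => {
  // ... prepare gameId, gameInfo, updatedGameDataGzippedString, foundWord (elided, as in the question) ...

  const batch = firestoreDatabase.batch();

  const gameInfoRef = firestoreDatabase.collection('activeGamesInfo').doc(gameId);
  batch.set(gameInfoRef, gameInfo);

  const gameDataRef = firestoreDatabase.collection('activeGamesData').doc(gameId);
  batch.set(gameDataRef, { gameDataCompressed: updatedGameDataGzippedString });

  const pendingWork = [];
  if (foundWord !== undefined && foundWord !== null) {
    const wordRef = firestoreDatabase
      .collection('allFoundWords')
      .doc(foundWord.timeStamp + "_" + foundWord.word);
    batch.set(wordRef, foundWord);
    // Assumed to return a promise so it can be awaited below.
    pendingWork.push(incrementNumberOfTimesWordFound(gameInfo.language, foundWord.word));
  }

  await batch.commit();
  await gameInfoRef.update({ roundUploaded: true });
  await Promise.all(pendingWork);

  // Whatever is returned here is what the Unity client receives.
  return { roundUploaded: true };
});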
Here is the test code (in an express environment just because that's what I happen to be messing around with):
const fs = require('fs-extra');
const fsPromises = fs.promises;
const express = require('express');
const app = express();

const speedtest = async function (req, res, next) {
  const useFsPromises = (req.params.promises == 'true');
  const jsonFileName = './json/big-file.json';
  const hrstart = process.hrtime();
  if (useFsPromises) {
    await fsPromises.readFile(jsonFileName);
  } else {
    fs.readFileSync(jsonFileName);
  }
  res.send(`time taken to read: ${process.hrtime(hrstart)[1] / 1000000} ms`);
};

app.get('/speedtest/:promises', speedtest);
The big-file.json file is around 16 MB. Using node 12.18.4.
Typical results (varies quite a bit around these values, but the following are "typical"):
https://dev.mydomain.com/speedtest/false
time taken to read: 3.948152 ms
https://dev.mydomain.com/speedtest/true
time taken to read: 61.865763 ms
UPDATE to include two more variants... plain fs.readFile() and also a promisified version of this:
const fs = require('fs-extra');
const fsPromises = fs.promises;
const util = require('util');
const readFile = util.promisify(fs.readFile);
const express = require('express');
const app = express();

const speedtest = async function (req, res, next) {
  const type = req.params.type;
  const jsonFileName = './json/big-file.json';
  const hrstart = process.hrtime();
  if (type == 'readFileFsPromises') {
    await fsPromises.readFile(jsonFileName);
  } else if (type == 'readFileSync') {
    fs.readFileSync(jsonFileName);
  } else if (type == 'readFileAsync') {
    return fs.readFile(jsonFileName, function (err, jsondata) {
      res.send(`time taken to read: ${process.hrtime(hrstart)[1] / 1000000} ms`);
    });
  } else if (type == 'readFilePromisified') {
    await readFile(jsonFileName);
  }
  res.send(`time taken to read: ${process.hrtime(hrstart)[1] / 1000000} ms`);
};

app.get('/speedtest/:type', speedtest);
I am finding that the fsPromises.readFile() is the slowest, while the others are much faster and all roughly the same in terms of reading time. I should add that in a different example (which I can't fully verify so I'm not sure what was going on) the time difference was vastly bigger than reported here. Seems to me at present that fsPromises.readFile() should simply be avoided because there are other async/promise options.
After stepping through each implementation in the debugger (fs.readFileSync and fs.promises.readFile), I can confirm that the synchronous version reads the entire file in one large chunk (the size of the file). Whereas fs.promises.readFile() reads 16,384 bytes at a time in a loop, with an await on each read. This is going to make fs.promises.readFile() go back to the event loop multiple times before it can read the entire file. Besides giving other things a chance to run, it's extra overhead to go back to the event loop every cycle through a for loop. There's also memory management overhead because fs.promises.readFile() allocates a series of Buffer objects and then combines them all at the end, whereas fs.readFileSync() allocates one large Buffer object at the beginning and just reads the entire file into that one Buffer.
So, the synchronous version, which is allowed to hog the entire CPU, is just faster from a pure time to completion point of view (it's significantly less efficient from a CPU cycles used point of view in a multi-user server because it blocks the event loop from doing anything else during the read). The asynchronous version is reading in smaller chunks, probably to avoid blocking the event loop too much so other things can effectively interleave and run while fs.promises.readFile() is doing its thing.
For a project I worked on awhile ago, I wrote my own simple asynchronous version of readFile() that reads the entire file at once and it was significantly faster than the built-in implementation. I was not concerned about event loop blockage in that particular project so I did not investigate if that's an issue.
In addition, fs.readFile() reads the file in 524,288 byte chunks (much larger chunks than fs.promises.readFile()) and does not use await, using just plain callbacks. It is apparently just coded more optimally than the promise implementation. I don't know why they rewrote the function in a slower way for the fs.promises.readFile() implementation. For now, it appears that wrapping fs.readFile() with a promise would be faster.
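To illustrate the "read the whole file at once" idea (this is my own sketch, not the code from that project), an asynchronous wrapper can stat the file, allocate a single Buffer, and issue one read():

const fs = require('fs');

function readFileWhole(path) {
  return new Promise((resolve, reject) => {
    fs.open(path, 'r', (err, fd) => {
      if (err) return reject(err);
      fs.fstat(fd, (err, stat) => {
        if (err) { fs.close(fd, () => reject(err)); return; }
        const buf = Buffer.allocUnsafe(stat.size);
        // One read for the entire file instead of a 16 KB loop with an await per chunk.
        fs.read(fd, buf, 0, stat.size, 0, (err, bytesRead) => {
          fs.close(fd, () => {
            if (err) return reject(err);
            resolve(buf.slice(0, bytesRead));
          });
        });
      });
    });
  });
}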
I've read up on module.exports and how it works but I'm not sure if I can accomplish what I want with it - or at least I'm not sure how to. I have some helper functions in a file, one of which is used in a majority of files in my project. I'm wondering if it is possible to just "require" the file one time and then just use it across the entirety of the project when needed.
My file looks something like this:
discord-utils.js
const { MessageEmbed, Permissions } = require('discord.js')

module.exports = {
  embedResponse (message, embedOptions, textChannel = null) {
    const embed = new MessageEmbed()
    if (embedOptions.color) embed.setColor(embedOptions.color)
    if (embedOptions.title) embed.setTitle(embedOptions.title)
    if (embedOptions.description) embed.setDescription(embedOptions.description)
    if (embedOptions.url) embed.setURL(embedOptions.url)
    if (embedOptions.author) embed.setAuthor(embedOptions.author)
    if (embedOptions.footer) embed.setFooter(embedOptions.footer)
    if (embedOptions.fields) {
      for (const field of embedOptions.fields) {
        embed.addFields({
          name: field.name,
          value: field.value,
          inline: field.inline ? field.inline : false
        })
      }
    }
    if (textChannel) {
      textChannel.send(embed)
      return
    }
    message.embed(embed)
  },

  inVoiceChannel (voiceState, message, response = null) {
    if (!voiceState.channel) {
      this.embedResponse(message, {
        color: 'RED',
        description: response === null ? 'You need to be in a voice channel to use this command.' : response
      })
      console.warn(`${message.author.tag} attempted to run a music command without being in a voice channel.`)
      return false
    }
    return true
  },

  isAdminOrHasPerms (user, permissionRole) {
    return user.hasPermission(Permissions.FLAGS.ADMINISTRATOR) || user.hasPermission(permissionRole)
  }
}
In pretty much every other file, I use the embedResponse function. So all over the project I have to do require('./discord-utils') and then do things like discordUtils.embedResponse(blahblah...), and while that's fine, it seems really redundant since I know I'm going to be using it just about everywhere. I'm wondering if there's a way I can use just one require statement and pull in the functions I need at any time?
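For what it's worth, each file still needs its own require, but destructuring at the require site at least drops the discordUtils. prefix. A small sketch (note the caveat on inVoiceChannel, which calls this.embedResponse internally):

// e.g. at the top of a command file
const { embedResponse, isAdminOrHasPerms } = require('./discord-utils')

embedResponse(message, { color: 'RED', description: 'Something went wrong.' })

// Caveat: inVoiceChannel uses this.embedResponse, so it should still be called
// through the exported object (discordUtils.inVoiceChannel(...)) unless it is
// rewritten to reference embedResponse directly.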
You may define a globally accessible variable using the global object in Node.js. However, this is neither a common nor a recommended pattern in Node.js.
global.foo = 1 // make the foo variable globally accessible
https://nodejs.org/api/globals.html#globals_global
Node.js actually has a neat little caching system that can be taken advantage of to achieve a singleton effect. The first time you require a file, it runs and sets module.exports. Every time you require that same file afterwards, it returns a reference to the same object that was returned the first time, instead of actually re-executing the file.
There are some caveats, though. The cache is keyed on the resolved file path, so it isn't an absolute guarantee that the file will only execute once: if the same module is reached through different resolved paths (for example via symlinks, duplicated copies in node_modules, or different casing on a case-insensitive filesystem), it can be executed again and you end up with a second exports object rather than the cached one.
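A throwaway example (not from the question's project) makes the caching visible: a tiny counter module required from two places hands back the same exports object both times.

// counter.js
let count = 0
module.exports = {
  increment () { return ++count }
}

// anywhere else in the project
const a = require('./counter')
const b = require('./counter')

a.increment()
console.log(b.increment()) // 2 -- both requires got the same cached exports object
console.log(a === b)       // true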
I would like to perform some arbitrarily expensive work on an arbitrarily large set of files. I would like to report progress in real-time and then display results after all files have been processed. If there are no files that match my expression, I'd like to throw an error.
Imagine writing a test framework that loads up all of your test files, executes them (in no particular order), reports on progress in real-time, and then displays aggregate results after all tests have been completed.
Writing this code in a blocking language (like Ruby for example), is extremely straightforward.
As it turns out, I'm having trouble performing this seemingly simple task in node, while also truly taking advantage of asynchronous, event-based IO.
My first design, was to perform each step serially.
Load up all of the files, creating a collection of files to process
Process each file in the collection
Report the results when all files have been processed
This approach does work, but doesn't seem quite right to me since it causes the more computationally expensive portion of my program to wait for all of the file IO to complete. Isn't this the kind of waiting that Node was designed to avoid?
My second design, was to process each file as it was asynchronously found on disk. For the sake of argument, let's imagine a method that looks something like:
function eachFileMatching(path, expression, callback) {
  // recursively, asynchronously traverse the file system,
  // calling callback every time a file name matches expression.
}
And a consumer of this method that looks something like this:
eachFileMatching('test/', /_test.js/, function(err, testFile) {
  // read and process the content of testFile
});
While this design feels like a very 'node' way of working with IO, it suffers from 2 major problems (at least in my presumably erroneous implementation):
I have no idea when all of the files have been processed, so I don't know when to assemble and publish results.
Because the file reads are nonblocking, and recursive, I'm struggling with how to know if no files were found.
I'm hoping that I'm simply doing something wrong, and that there is some reasonably simple strategy that other folks use to make the second approach work.
Even though this example uses a test framework, I have a variety of other projects that bump up against this exact same problem, and I imagine anyone writing a reasonably sophisticated application that accesses the file system in node would too.
What do you mean by "read and process the content of testFile"?
I don't understand why you have no idea when all of the files are processed. Are you not using Streams? A stream has several events, not just data. If you handle the end events then you will know when each file has finished.
For instance you might have a list of filenames, set up the processing for each file, and then when you get an end event, delete the filename from the list. When the list is empty you are done. Or create a FileName object that contains the name and a completion status. When you get an end event, change the status and decrement a filename counter as well. When the counter gets to zero you are done, or if you are not confident you could scan all the FileName object to make sure that their status is completed.
You might also have a timer that checks the counter periodically, and if it doesn't change for some period of time, report that the processing might be stuck on the FileName objects whose status is not completed.
... I just came across this scenario in another question and the accepted answer (plus the github link) explains it well. Check out for loop over event driven code?
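A rough sketch of the counter idea described above (the names are mine, and it assumes each file is read through a stream):

const fs = require('fs');

function processAll(fileNames, onAllDone) {
  let remaining = fileNames.length;
  let firstErr = null;
  if (remaining === 0) return onAllDone(new Error('no files matched'));

  fileNames.forEach((name) => {
    const stream = fs.createReadStream(name);
    stream.on('data', (chunk) => {
      // ... per-file work goes here ...
    });
    // 'end' fires when this file has been fully read; when the counter hits
    // zero, every file has finished and the aggregate result can be reported.
    stream.on('end', () => {
      if (--remaining === 0) onAllDone(firstErr);
    });
    stream.on('error', (err) => {
      // 'end' won't fire after an error, so count this file as done here too
      firstErr = firstErr || err;
      if (--remaining === 0) onAllDone(firstErr);
    });
  });
}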
As it turns out, the smallest working solution that I've been able to build is much more complicated than I hoped.
Following is code that works for me. It can probably be cleaned up or made slightly more readable here and there, and I'm not interested in feedback like that.
If there is a significantly different way to solve this problem, that is simpler and/or more efficient, I'm very interested in hearing it. It really surprises me that the solution to this seemingly simple requirement would require such a large amount of code, but perhaps that's why someone invented blocking io?
The complexity is really in the desire to meet all of the following requirements:
Handle files as they are found
Know when the search is complete
Know if no files are found
Here's the code:
/**
 * Call fileHandler with the file name and file Stat for each file found inside
 * of the provided directory.
 *
 * Call the optionally provided completeHandler with an array of files (mingled
 * with directories) and an array of Stat objects (one for each of the found
 * files).
 *
 * Following is an example of a simple usage:
 *
 *   eachFileOrDirectory('test/', function(err, file, stat) {
 *     if (err) throw err;
 *     if (!stat.isDirectory()) {
 *       console.log(">> Found file: " + file);
 *     }
 *   });
 *
 * Following is an example that waits for all files and directories to be
 * scanned and then uses the entire result to do something:
 *
 *   eachFileOrDirectory('test/', null, function(err, files, stats) {
 *     if (err) throw err;
 *     var len = files.length;
 *     for (var i = 0; i < len; i++) {
 *       if (!stats[i].isDirectory()) {
 *         console.log(">> Found file: " + files[i]);
 *       }
 *     }
 *   });
 */
var eachFileOrDirectory = function(directory, fileHandler, completeHandler) {
  var filesToCheck = 0;
  var checkedFiles = [];
  var checkedStats = [];

  directory = (directory) ? directory : './';

  var fullFilePath = function(dir, file) {
    return dir.replace(/\/$/, '') + '/' + file;
  };

  var checkComplete = function() {
    if (filesToCheck == 0 && completeHandler) {
      completeHandler(null, checkedFiles, checkedStats);
    }
  };

  var onFileOrDirectory = function(fileOrDirectory) {
    filesToCheck++;
    fs.stat(fileOrDirectory, function(err, stat) {
      filesToCheck--;
      if (err) return fileHandler(err);
      checkedFiles.push(fileOrDirectory);
      checkedStats.push(stat);
      fileHandler(null, fileOrDirectory, stat);
      if (stat.isDirectory()) {
        onDirectory(fileOrDirectory);
      }
      checkComplete();
    });
  };

  var onDirectory = function(dir) {
    filesToCheck++;
    fs.readdir(dir, function(err, files) {
      filesToCheck--;
      if (err) return fileHandler(err);
      files.forEach(function(file, index) {
        file = fullFilePath(dir, file);
        onFileOrDirectory(file);
      });
      checkComplete();
    });
  };

  onFileOrDirectory(directory);
};
There are two ways of doing this. The first, which would probably be considered the serial approach, goes something like this:
var files = [];
doFile(files, oncomplete);

function doFile(files, oncomplete) {
  if (files.length === 0) return oncomplete();
  var f = files.pop();
  processFile(f, function(err) {
    // Handle error if any
    doFile(files, oncomplete); // Recurse
  });
};

function processFile(file, callback) {
  // Do whatever you want to do and once
  // done call the callback
  ...
  callback();
};
The second way, let's call it the parallel approach, is similar and goes something like this:
var files = [];
doFiles(files, oncomplete);

function doFiles(files, oncomplete) {
  var exp = files.length;
  var done = 0;
  if (exp === 0) return oncomplete(); // Nothing to do; call back right away
  for (var i = 0; i < exp; i++) {
    processFile(files[i], function(err) {
      // Handle errors (but still need to increment counter)
      if (++done === exp) return oncomplete();
    });
  }
};

function processFile(file, callback) {
  // Do whatever you want to do and once
  // done call the callback
  ...
  callback();
};
Now it may seem obvious that you should use the second approach, but you'll find that for IO-intensive operations you don't really get any performance gains from parallelising. One disadvantage of the first approach is that the recursion can blow out your stack.
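On that last point: the serial version only grows the call stack if processFile ever invokes its callback synchronously. If that can happen, a small tweak (my own addition, not part of the original answer) is to defer the recursive step with setImmediate so the stack stays flat:

function doFile(files, oncomplete) {
  if (files.length === 0) return oncomplete();
  var f = files.pop();
  processFile(f, function(err) {
    // Handle error if any, then schedule the next file on a fresh stack frame
    setImmediate(function() {
      doFile(files, oncomplete);
    });
  });
};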