Update: OK, this seems to be linked to through2's "highWaterMark" option. Basically, it means "don't buffer more than x files; wait for someone to consume them, and only then accept another batch of files". Since the stream works this way by design, the snippet in this question needs rethinking. There must be a better way to handle many files.
Quick fix, allowing 8000 files:
through.obj({ highWaterMark: 8000 }, (file, enc, next) => { ... })
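For context: Node object-mode streams default to a highWaterMark of 16 objects, which matches the 16-file ceiling described below. A minimal, standalone sketch (not gulp-specific; the source data is made up) of how a through2 transform with no consumer stalls once its buffers fill, and how attaching a consumer keeps it draining:

const { Readable } = require('stream');
const through = require('through2');

// A source that produces 100 plain objects.
const source = Readable.from(Array.from({ length: 100 }, (_, i) => ({ i })));

// Object-mode streams buffer at most `highWaterMark` items (default 16).
const transform = through.obj((obj, enc, next) => {
  console.log('transforming', obj.i);
  next(null, obj);
});

source.pipe(transform);

// Without this handler, nothing consumes the transform's output, so the
// logging above stops after a couple of dozen objects once the internal
// buffers fill; with it, all 100 objects flow through.
transform.on('data', obj => console.log('consumed', obj.i));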
Original question
I'm using a gulp task to create translation files. It scans the src folder for *.i18n.json files and saves one .json file per language it finds within the source files.
It works fine - until it finds more than 16 files. It uses through2 to process each file. See the source code below. The method processAll18nFiles() is a custom pipe that receives the matching input files, reads the content of each file, constructs the resulting dictionaries on the fly, and finally hands them over to the on('finish') handler to write the dictionaries.
Tested on Windows and Mac. There seems to be a limit that my approach hits, because it works just fine with 16 files or fewer.
Still looking, clues welcome :-)
source file example: signs.i18n.json
{
  "path": "profile.signs",
  "data": {
    "title": {
      "fr": "mes signes précurseurs",
      "en": "my warning signs"
    },
    "add": {
      "fr": "ajouter un nouveau signe",
      "en": "add a new warning sign"
    }
  }
}
output file example: en.json
{"profile":{"signs":{"title":"my warning signs","add":"add a new warning sign"}}}
gulpfile.js
const fs = require('fs');
const path = require('path');
const gulp = require('gulp');
const watch = require('gulp-watch');
const through = require('through2');
const searchPatternFolder = 'src/app/**/*.i18n.json';
const outputFolder = path.join('src', 'assets', 'i18n');
gulp.task('default', () => {
  console.log('Ionosphere Gulp tasks');
  console.log(' > gulp i18n builds the i18n file.');
  console.log(' > gulp i18n:watch watches i18n file and trigger build.');
});
gulp.task('i18n:watch', () => watch(searchPatternFolder, { ignoreInitial: false }, () => gulp.start('i18n')));
gulp.task('i18n', done => processAll18nFiles(done));
function processAll18nFiles(done) {
  const dictionary = {};
  console.log('[i18n] Rebuilding...');
  gulp
    .src(searchPatternFolder)
    .pipe(
      through.obj((file, enc, next) => {
        console.log('doing ', file.path);
        const i18n = JSON.parse(file.contents.toString('utf8'));
        composeDictionary(dictionary, i18n.data, i18n.path.split('.'));
        next(null, file);
      })
    )
    .on('finish', () => {
      const writes = [];
      Object.keys(dictionary).forEach(langKey => {
        console.log('lang key ', langKey);
        writes.push(writeDictionary(langKey, dictionary[langKey]));
      });
      Promise.all(writes)
        .then(data => done())
        .catch(err => console.log('ERROR ', err));
    });
}
function composeDictionary(dictionary, data, path) {
  Object.keys(data)
    .map(key => ({ key, data: data[key] }))
    .forEach(({ key, data }) => {
      if (isString(data)) {
        setDictionaryEntry(dictionary, key, path, data);
      } else {
        composeDictionary(dictionary, data, [...path, key]);
      }
    });
}

function isString(x) {
  return Object.prototype.toString.call(x) === '[object String]';
}

function initDictionaryEntry(key, dictionary) {
  if (!dictionary[key]) {
    dictionary[key] = {};
  }
  return dictionary[key];
}
function setDictionaryEntry(dictionary, langKey, path, data) {
  initDictionaryEntry(langKey, dictionary);
  let subDict = dictionary[langKey];
  path.forEach(subKey => {
    const isLastToken = path[path.length - 1] === subKey;
    if (isLastToken) {
      subDict[subKey] = data;
    } else {
      subDict = initDictionaryEntry(subKey, subDict);
    }
  });
}
function writeDictionary(lang, data) {
  return new Promise((resolve, reject) => {
    fs.writeFile(
      path.join(outputFolder, lang + '.json'),
      JSON.stringify(data),
      'utf8',
      err => (err ? reject(err) : resolve())
    );
  });
}
OK, as explained here, one must consume the pipe. This is done by adding a handler for 'data' events, such as:
gulp
  .src(searchPatternFolder)
  .pipe(
    through.obj({ highWaterMark: 4, objectMode: true }, (file, enc, next) => {
      const { data, path } = JSON.parse(file.contents.toString('utf8'));
      next(null, { data, path });
    })
  )
  // The next line handles the "consumption" of upstream pipings
  .on('data', ({ data, path }) => ++count && composeDictionary(dictionary, data, path.split('.')))
  .on('end', () =>
    Promise.all(Object.keys(dictionary).map(langKey => writeDictionary(langKey, dictionary[langKey])))
      .then(() => {
        console.log(`[i18n] Done, ${count} files processed, language count: ${Object.keys(dictionary).length}`);
        done();
      })
      .catch(err => console.log('ERROR ', err))
  );
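For what it's worth, attaching the 'data' handler switches the stream into flowing mode, so the upstream transform's buffer never fills up. If you would rather stay inside the pipe chain, an equivalent way to consume it is a small terminal Writable; this is only a sketch, reusing the count, dictionary and composeDictionary names from the snippet above:

const { Writable } = require('stream');

// A terminal sink that consumes the parsed { data, path } objects, so the
// upstream through2 transform is never left waiting on backpressure.
const sink = new Writable({
  objectMode: true,
  write({ data, path }, enc, next) {
    ++count;
    composeDictionary(dictionary, data, path.split('.'));
    next();
  }
});

// gulp.src(...).pipe(through.obj(...)).pipe(sink).on('finish', ...) then
// replaces the 'data'/'end' handlers shown above.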
Related
I am trying to unzip multiple files using Node.js and StreamZip. This is what I am trying:
export const unpackZip = async (folder: string) => {
  const zipPath = await getFilesByExtension(folder, ".zip").then((zipFile) => {
    console.log("ZIPFILE", zipFile);
    return zipFile;
  });
  console.log("DEBUG PATH: ", zipPath);
  let zip: StreamZip;
  await Promise.all(zipPath.map((zipFile) => {
    return new Promise<string>((resolve, reject) => {
      zip = new StreamZip({ storeEntries: true, file: `${folder}/${zipFile}` });
      zip.on('error', function (err) { console.error('[ERROR]', err); });
      zip.on('ready', function () {
        console.log('All entries read: ' + zip.entriesCount);
        console.log(zip.entries());
      });
      zip.on('entry', function (entry) {
        const pathname = path.resolve('./tmp', entry.name);
        if (/\.\./.test(path.relative('./tmp', pathname))) {
          console.warn("[zip warn]: ignoring maliciously crafted paths in zip file:", entry.name);
          return;
        }
        if ('/' === entry.name[entry.name.length - 1]) {
          console.log('[DIR]', entry.name);
          return;
        }
        zip.extract(entry, `tmp/${entry.name}`, (err?: string, res?: number | undefined) => {
          resolve(entry.name);
        });
      });
    });
  }));
};
The problem is that it does indeed go through all the zip files in the folder (getFilesByExtension returns an array of filename strings like [asdf.zip, asdf1.zip, ...]), but the actual file content from all the unpacked zips is from the first zip. A screenshot may say more than I can:
Can someone spot the problem in the code? I am kind of clueless where the issue could be :/
Any help would be awesome!! Thanks!!
Never mind... I cannot use map here - if I use a for loop and await my new Promise, it works:
for (const zipFile of zipPath) {
  await new Promise<string>((resolve, reject) => {
    zip = new StreamZip({ storeEntries: true, file: `${folder}/${zipFile}` });
    zip.on('error', function (err) { console.error('[ERROR]', err); });
    zip.on('ready', function () {
      console.log('All entries read: ' + zip.entriesCount);
      console.log(zip.entries());
    });
    zip.on('entry', function (entry) {
      const pathname = path.resolve('./tmp', entry.name);
      if (/\.\./.test(path.relative('./tmp', pathname))) {
        console.warn("[zip warn]: ignoring maliciously crafted paths in zip file:", entry.name);
        return;
      }
      if ('/' === entry.name[entry.name.length - 1]) {
        console.log('[DIR]', entry.name);
        return;
      }
      zip.extract(entry, `tmp/${entry.name}`, (err?: string, res?: number | undefined) => {
        resolve(entry.name);
      });
    });
  });
}
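My guess at why map misbehaved: the single outer let zip variable was shared by every concurrently created promise, so each 'entry' handler extracted through whichever StreamZip instance happened to be assigned last. If that is right, the Promise.all/map version should also work once each iteration keeps its own instance; a trimmed, untested sketch (path-traversal and logging checks omitted):

await Promise.all(zipPath.map((zipFile) => {
  return new Promise((resolve, reject) => {
    // Declared inside the callback so each promise closes over its own
    // StreamZip rather than a shared, reassigned variable.
    const zip = new StreamZip({ storeEntries: true, file: `${folder}/${zipFile}` });
    zip.on('error', (err) => reject(err));
    zip.on('entry', (entry) => {
      if ('/' === entry.name[entry.name.length - 1]) return; // skip directories
      zip.extract(entry, `tmp/${entry.name}`, () => resolve(entry.name));
    });
  });
}));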
I am using Jest for testing a package I am creating, and I require 100% coverage in all aspects.
On Windows everything works fine and coverage is 100% as expected, but when I run it in Docker, even though the Node version is the same (14.18.1), Jest fails to detect 2 lines of code that I am sure were reached. Here is the code:
import callIf from 'src/tools/call-if';
import zlib from 'zlib';

export default class CompressorEngine {
  static preparaData<T>(data: T): Buffer {
    const isNullOrUndefined = [null, undefined].includes(data as any);
    if (isNullOrUndefined) {
      return Buffer.from('null', 'utf-8');
    }
    return Buffer.from(JSON.stringify(data), 'utf-8');
  }

  static compress(uncompressedData: Buffer, compressionLevel: number = 9) {
    return new Promise<Buffer>((resolve, reject) => {
      zlib.deflate(uncompressedData, { level: compressionLevel }, (error, result) => {
        callIf(error, reject, error);
        callIf(!error, resolve, result);
      });
    });
  }

  static decompress(compressedData: Buffer, compressionLevel: number = 9) {
    return new Promise<Buffer>((resolve, reject) => {
      zlib.inflate(compressedData, { level: compressionLevel }, (error, result) => {
        callIf(error, reject, error);
        callIf(!error, resolve, result);
      });
    });
  }
}
The src/tools/call-if code:
const callIf = (expression: any, callback: Function, ...args: any[]) => {
  if (expression) return callback(...args);
};

export default callIf;
And here are my tests for this class:
import CompressorEngine from 'src/core/compressor';

describe('CompressorEngine', () => {
  it('should compress and decompress data', async () => {
    const input = 'test';
    const compressed = await CompressorEngine.compress(CompressorEngine.preparaData(input));
    const uncompressed = await CompressorEngine.decompress(compressed);
    expect(uncompressed.toString('utf-8')).toEqual(`"${input}"`);
  });

  it('should compress and decompress null as "null"', async () => {
    const input = null;
    const compressed = await CompressorEngine.compress(CompressorEngine.preparaData(input));
    const uncompressed = await CompressorEngine.decompress(compressed);
    expect(JSON.parse(uncompressed.toString('utf-8'))).toEqual(null);
  });

  describe('prepareData', () => {
    it('should return a buffer for null value as "null"', () => {
      Array.prototype.includes = jest.fn(Array.prototype.includes);
      const buffer = CompressorEngine.preparaData(null);
      expect(buffer.toString('utf-8')).toBe('null');
      expect(Array.prototype.includes).toHaveBeenCalled();
      expect(Array.prototype.includes).toHaveReturnedWith(true);
    });

    it('should return a buffer for undefined value as "null"', () => {
      const buffer = CompressorEngine.preparaData(undefined);
      expect(buffer.toString('utf-8')).toBe('null');
    });
  });
});
The problem is in preparaData: the report acts as if the if (isNullOrUndefined) branch is not reached, but it is, as the first preparaData-specific test shows with expect(Array.prototype.includes).toHaveReturnedWith(true). Yet the generated coverage says it wasn't covered. How come?
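One thing I may still try, to check whether this is an instrumentation quirk rather than a real gap: temporarily replace the Array.prototype.includes call with explicit comparisons, so the branch no longer depends on a spied-on built-in. A debugging-only sketch of preparaData rewritten that way (type annotations dropped for brevity):

static preparaData(data) {
  // Explicit comparisons instead of [null, undefined].includes(...), to see
  // whether the uncovered lines follow the includes() call or the branch itself.
  if (data === null || data === undefined) {
    return Buffer.from('null', 'utf-8');
  }
  return Buffer.from(JSON.stringify(data), 'utf-8');
}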
Below is the working code, which can post images, but is there any way I can also share videos as an Instagram story?
The error I get when I try to post a video instead of an image:
error image
PS D:\Softwares\programming\Insta Bot\story> node index.js
18:45:11 - info: Dry Run Activated
18:45:11 - info: Post() called! ======================
18:45:11 - debug: 1 files found in ./images/
18:45:11 - warn: Record file not found, saying yes to D:\Softwares\programming\Insta Bot\story\images\meme.mp4
18:45:11 - debug: Read File Success
18:45:11 - error: undefined
(MAIN CODE)
index.js
const logger = require("./logger.js")
const { random, sleep } = require('./utils')
require('dotenv').config();
const { IgApiClient, IgLoginTwoFactorRequiredError } = require("instagram-private-api");
const ig = new IgApiClient();
const Bluebird = require('bluebird');
const inquirer = require('inquirer');
const { CronJob } = require('cron');
const path = require("path");
const fs = require("fs");
const fsp = fs.promises;
const sharp = require("sharp");
//==================================================================================
const statePath = "./etc/state.conf";
const recordPath = "./etc/usedfiles.jsonl";
const imgFolderPath = "./images/";
const dryrun = true;
const runOnStart = true;
//==================================================================================
(async () => { // FOR AWAIT
  // LOGIN TO INSTAGRAM
  if (!dryrun) {
    await login();
    logger.info("Log In Successful");
  } else {
    logger.info("Dry Run Activated");
  }

  // SCHEDULER
  // logger.silly("I'm a schedule, and I'm running!! :)");
  const job = new CronJob('38 43 * * * *', post, null, true); //https://crontab.guru/
  if (!runOnStart) logger.info(`Next few posts scheduled for: \n${job.nextDates(3).join("\n")}\n`);
  else post();

  // MAIN POST COMMAND
  async function post() {
    logger.info("Post() called! ======================");
    let postPromise = fsp.readdir(imgFolderPath)
      .then(filenames => {
        if (filenames.length < 1) throw new Error(`Folder ${imgFolderPath} is empty...`)
        logger.debug(`${filenames.length} files found in ${imgFolderPath}`);
        return filenames;
      })
      .then(filenames => filenames.map(file => path.resolve(imgFolderPath + file)))
      .then(filenames => pickUnusedFileFrom(filenames, filenames.length))
      .then(filename => {
        if (!dryrun) registerFileUsed(filename)
        return filename
      })
      .then(fsp.readFile)
      .then(async buffer => {
        logger.debug("Read File Success "); //TODO move this to previous then?
        return sharp(buffer).jpeg().toBuffer()
          .then(file => {
            logger.debug("Sharp JPEG Success");
            return file
          })
      })
      .then(async file => {
        if (!dryrun) {
          // await sleep(random(1000, 60000)) //TODO is this necessary?
          return ig.publish.story({ file })
            .then(fb => logger.info("Posting successful!?"))
        }
        else return logger.info("Data not sent, dryrun = true")
      })
      .then(() => logger.info(`Next post scheduled for ${job.nextDates()}\n`))
      .catch(logger.error)
  }
})();
//=================================================================================
async function login() {
  ig.state.generateDevice(process.env.IG_USERNAME);
  // ig.state.proxyUrl = process.env.IG_PROXY;

  //register callback?
  ig.request.end$.subscribe(async () => {
    const serialized = await ig.state.serialize();
    delete serialized.constants; // this deletes the version info, so you'll always use the version provided by the library
    await stateSave(serialized);
  });

  if (await stateExists()) {
    // import state accepts both a string as well as an object
    // the string should be a JSON object
    const stateObj = await stateLoad();
    await ig.state.deserialize(stateObj)
      .catch(err => logger.debug("deserialize: " + err));
  } else {
    let standardLogin = async function() {
      // login like normal
      await ig.simulate.preLoginFlow();
      logger.debug("preLoginFlow finished");
      await ig.account.login(process.env.IG_USERNAME, process.env.IG_PASSWORD);
      logger.info("Logged in as " + process.env.IG_USERNAME);
      process.nextTick(async () => await ig.simulate.postLoginFlow());
      logger.debug("postLoginFlow finished");
    }

    // Perform usual login
    // If 2FA is enabled, IgLoginTwoFactorRequiredError will be thrown
    return Bluebird.try(standardLogin)
      .catch(
        IgLoginTwoFactorRequiredError,
        async err => {
          logger.info("Two Factor Auth Required");
          const { username, totp_two_factor_on, two_factor_identifier } = err.response.body.two_factor_info;
          // decide which method to use
          const verificationMethod = totp_two_factor_on ? '0' : '1'; // default to 1 for SMS
          // At this point a code should have been sent
          // Get the code
          const { code } = await inquirer.prompt([
            {
              type: 'input',
              name: 'code',
              message: `Enter code received via ${verificationMethod === '1' ? 'SMS' : 'TOTP'}`,
            },
          ]);
          // Use the code to finish the login process
          return ig.account.twoFactorLogin({
            username,
            verificationCode: code,
            twoFactorIdentifier: two_factor_identifier,
            verificationMethod, // '1' = SMS (default), '0' = TOTP (google auth for example)
            trustThisDevice: '1', // Can be omitted as '1' is used by default
          });
        },
      )
      .catch(e => logger.error('An error occurred while processing two factor auth', e, e.stack));
  }

  return

  //================================================================================
  async function stateSave(data) {
    // here you would save it to a file/database etc.
    await fsp.mkdir(path.dirname(statePath), { recursive: true }).catch(logger.error);
    return fsp.writeFile(statePath, JSON.stringify(data))
      // .then(() => logger.info('state saved, daddy-o'))
      .catch(err => logger.error("Write error" + err));
  }

  async function stateExists() {
    return fsp.access(statePath, fs.constants.F_OK)
      .then(() => {
        logger.debug('Can access state info')
        return true
      })
      .catch(() => {
        logger.warn('Cannot access state info')
        return false
      });
  }

  async function stateLoad() {
    // here you would load the data
    return fsp.readFile(statePath, 'utf-8')
      .then(data => JSON.parse(data))
      .then(data => {
        logger.info("State load successful");
        return data
      })
      .catch(logger.error)
  }
}
async function registerFileUsed( filepath ) {
  let data = JSON.stringify({
    path: filepath,
    time: new Date().toISOString()
  }) + '\n';
  return fsp.appendFile(recordPath, data, { encoding: 'utf8', flag: 'a+' } )
    .then(() => {
      logger.debug("Writing filename to record file");
      return filepath
    })
}
function pickUnusedFileFrom( filenames, iMax = 1000) {
  return new Promise((resolve, reject) => {
    let checkFileUsed = async function ( filepath ) {
      return fsp.readFile(recordPath, 'utf8')
        .then(data => data.split('\n'))
        .then(arr => arr.filter(Boolean))
        .then(arr => arr.map(JSON.parse))
        .then(arr => arr.some(entry => entry.path === filepath))
    }

    let trythis = function( iMax, i = 1) {
      let file = random(filenames);
      checkFileUsed(file)
        .then(async used => {
          if (!used) {
            logger.info(`Unused file found! ${file}`);
            resolve(file);
          } else if (i < iMax) {
            logger.debug(`Try #${i}: File ${file} used already`);
            await sleep(50);
            trythis(iMax, ++i)
          } else {
            reject(`I tried ${iMax} times and all the files I tried were previously used`)
          }
        })
        .catch(err => {
          logger.warn("Record file not found, saying yes to " + file);
          resolve(file);
        })
    }( iMax );
  })
}
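I have not solved the video part yet, but if I remember the instagram-private-api examples correctly, story video uploads go through ig.publish.story with separate video and coverImage buffers rather than the single file buffer used above; treat the exact option names as an assumption to verify against the library's docs. A rough sketch of a video branch, reusing the ig, logger and fsp objects from this file (postVideoStory is a hypothetical helper):

// Hypothetical video branch: read the .mp4 and a cover image as buffers and
// publish them as a story. Option names follow the instagram-private-api
// README as I remember it; verify before relying on this.
async function postVideoStory(videoPath, coverPath) {
  const video = await fsp.readFile(videoPath);
  const coverImage = await fsp.readFile(coverPath);
  return ig.publish.story({ video, coverImage })
    .then(() => logger.info("Video story posted!?"));
}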
On my local dev machine accessing localhost, the following code works beautifully, even with network settings changed to "Slow 3G". However, when running on my VPS, it fails to process the file on the server. Here are two different code blocks I tried (again, both work without issue on my local dev machine accessing localhost):
profilePicUpload: async (parent, args) => {
  const file = await args.file;
  const fileName = `user-${nanoid(3)}.jpg`;
  const tmpFilePath = path.join(__dirname, `../../tmp/${fileName}`);
  file
    .createReadStream()
    .pipe(createWriteStream(tmpFilePath))
    .on('finish', () => {
      jimp
        .read(`tmp/${fileName}`)
        .then(image => {
          image.cover(300, 300).quality(60);
          image.writeAsync(`static/uploads/users/${fileName}`, jimp.AUTO);
        })
        .catch(error => {
          throw new Error(error);
        });
    });
}
It seems like this code block doesn't wait long enough for the file upload to finish, since when I check the storage location on the VPS, I see this:
I also tried the following with no luck:
profilePicUpload: async (parent, args) => {
  const { createReadStream } = await args.file;
  let data = '';
  const fileStream = await createReadStream();
  fileStream.setEncoding('binary');

  // UPDATE: 11-2
  let i = 0;
  fileStream.on('data', chunk => {
    console.log(i);
    i++;
    data += chunk;
  });
  fileStream.on('error', err => {
    console.log(err);
  });
  // END UPDATE

  fileStream.on('end', () => {
    const file = Buffer.from(data, 'binary');
    jimp
      .read(file)
      .then(image => {
        image.cover(300, 300).quality(60);
        image.writeAsync(`static/uploads/users/${fileName}`, jimp.AUTO);
      })
      .catch(error => {
        throw new Error(error);
      });
  });
}
With this code, I don't even get a partial file.
jimp is a JS library for image manipulation.
If anyone has any hints to get this working properly, I'd appreciate it very much. Please let me know if I'm missing some info.
I was able to figure out a solution by referring to this article: https://nodesource.com/blog/understanding-streams-in-nodejs/
Here is my final, working code:
const { createWriteStream, unlink } = require('fs');
const path = require('path');
const { once } = require('events');
const { promisify } = require('util');
const stream = require('stream');
const jimp = require('jimp');

profilePicUpload: async (parent, args) => {
  // have to wait while file is uploaded
  const { createReadStream } = await args.file;
  const fileStream = createReadStream();
  const fileName = `user-${args.uid}-${nanoid(3)}.jpg`;
  const tmpFilePath = path.join(__dirname, `../../tmp/${fileName}`);
  const tmpFileStream = createWriteStream(tmpFilePath, {
    encoding: 'binary'
  });
  const finished = promisify(stream.finished);

  fileStream.setEncoding('binary');

  // apparently async iterators is the way to go
  for await (const chunk of fileStream) {
    if (!tmpFileStream.write(chunk)) {
      await once(tmpFileStream, 'drain');
    }
  }

  tmpFileStream.end(() => {
    jimp
      .read(`tmp/${fileName}`)
      .then(image => {
        image.cover(300, 300).quality(60);
        image.writeAsync(`static/uploads/users/${fileName}`, jimp.AUTO);
      })
      .then(() => {
        unlink(tmpFilePath, error => {
          console.log(error);
        });
      })
      .catch(error => {
        console.log(error);
      });
  });

  await finished(tmpFileStream);
}
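As a side note, the manual write/drain loop can likely be replaced by stream.pipeline, which wires up backpressure and error propagation for you (promisifying it keeps the code await-friendly on Node versions without stream/promises). A condensed sketch of the same copy step, reusing fileStream and tmpFilePath from the resolver above:

const { pipeline } = require('stream');
const { promisify } = require('util');

const pump = promisify(pipeline);

// Copies the upload stream into the temp file while honouring backpressure,
// and resolves once the write side has finished.
await pump(fileStream, createWriteStream(tmpFilePath, { encoding: 'binary' }));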
I have a stream that's checking a CSV. It works fine, except that when it emits an error it hangs, even after I send the response back.
export function ValidateCSV(options) {
  let opt = options;
  if (!(this instanceof ValidateCSV)) return new ValidateCSV(opt);
  if (!opt) opt = {};
  opt.objectMode = true;
  opt.highWaterMark = 1000000;
  Transform.call(this, opt);
}
util.inherits(ValidateCSV, Transform);
ValidateCSV.prototype.destroy = function () {
  this.readable = false;
  this.writable = false;
  this.emit('end');
};
ValidateCSV.prototype._transform = function (chunk, encoding, done) {
  // Do some stuff to the chunk
  // Emit error
  if (required.length > 0) {
    this.emit('error', `The following columns are required: ${required.join(', ')}`);
  }
  done();
};
I was able to fix it by adding a destroy method, but it is still slow and hangs for a few seconds. Is there a better way to end/destroy a Transform stream?
ValidateCSV.prototype.destroy = function () {
  this.readable = false;
  this.writable = false;
  this.emit('end');
};
EDIT:
Here is how I'm using the stream with busboy:
function processMultipart(req, res) {
  const userId = req.query._userId;
  const busboy = new Busboy({ headers: req.headers, limits: { files: 1 } });
  const updateId = req.params.id;

  // Transform stream to validate the csv
  const validateCSV = new ValidateCSV();
  validateCSV
    .on('finish', () => {
      // Process the csv
    })
    .on('error', (er) => {
      //Do some logging
      res.status(500).json(er).end();
    });

  // Multipart upload handler
  busboy
    .on('file', (fieldname, file, filename) => {
      dataset.name = fieldname.length > 0 ?
        fieldname : filename.substr(0, filename.indexOf('.csv'));
      file
        .on('error', (er) => {
          //Send Error
        })
        .on('end', () => {
          // Save dataset to mongo
          if (dataset._update) {
            res.status(200).json(dataset).end();
          } else {
            Dataset.create(dataset, (er) => {
              if (er) {
                res.status(500).json(er).end();
              } else {
                res.status(200).json(dataset).end();
              }
            });
          }
        }).pipe(validateCSV);
    });

  req.pipe(busboy);
}
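A possibly simpler route, sketched below under the same assumption that required is computed inside _transform: hand the error to the transform callback instead of emitting it manually. In current Node streams that errors and destroys the stream for you, so no hand-rolled destroy() should be needed.

const { Transform } = require('stream');

class ValidateCSV extends Transform {
  constructor(options = {}) {
    super({ ...options, objectMode: true, highWaterMark: 1000000 });
  }

  _transform(chunk, encoding, done) {
    // Do some stuff to the chunk, computing `required` as before.
    if (required.length > 0) {
      // Passing the error to the callback emits 'error' and tears the
      // stream down, instead of emitting by hand and hanging.
      return done(new Error(`The following columns are required: ${required.join(', ')}`));
    }
    done();
  }
}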