I am trying to unzip multiple files using Node.js and StreamZip (node-stream-zip). This is what I am trying:
export const unpackZip = async (folder: string) => {
const zipPath = await getFilesByExtension(folder, ".zip").then((zipFile)=>{console.log("ZIPFILE", zipFile)
return zipFile})
console.log("DEBUG PATH: ", zipPath)
let zip: StreamZip
await Promise.all(zipPath.map((zipFile) => {
return new Promise<string>((resolve, reject) => {
zip = new StreamZip({ storeEntries: true, file: `${folder}/${zipFile}` });
zip.on('error', function (err) { console.error('[ERROR]', err); });
zip.on('ready', function () {
console.log('All entries read: ' + zip.entriesCount);
console.log(zip.entries());
});
zip.on('entry', function (entry) {
const pathname = path.resolve('./tmp', entry.name);
if (/\.\./.test(path.relative('./tmp', pathname))) {
console.warn("[zip warn]: ignoring maliciously crafted paths in zip file:", entry.name);
return;
}
if ('/' === entry.name[entry.name.length - 1]) {
console.log('[DIR]', entry.name);
return;
}
zip.extract(entry, `tmp/${entry.name}`, (err?: string, res?: number | undefined) => {
resolve(entry.name)
})
})
})
}))
};
The problem is that it does go through all the zip files in the folder (getFilesByExtension returns an array of filename strings like [asdf.zip, asdf1.zip, ...]), but the extracted file content of every archive comes from the first zip.
Can someone spot the problem in the code? I am kind of clueless about where the issue could be.
Any help would be awesome! Thanks!
Never mind... I cannot use map here the way I wrote it: zip is a single shared variable, so by the time the 'entry' handlers run, zip.extract points at whichever archive was assigned last, and all the extracted content comes from a single zip. If I use a for loop and await each new Promise, it works:
for (const zipFile of zipPath) {
await new Promise<string>((resolve, reject) => {
zip = new StreamZip({ storeEntries: true, file: `${folder}/${zipFile}` });
zip.on('error', function (err) { console.error('[ERROR]', err); });
zip.on('ready', function () {
console.log('All entries read: ' + zip.entriesCount);
console.log(zip.entries());
});
zip.on('entry', function (entry) {
const pathname = path.resolve('./tmp', entry.name);
if (/\.\./.test(path.relative('./tmp', pathname))) {
console.warn("[zip warn]: ignoring maliciously crafted paths in zip file:", entry.name);
return;
}
if ('/' === entry.name[entry.name.length - 1]) {
console.log('[DIR]', entry.name);
return;
}
zip.extract(entry, `tmp/${entry.name}`, (err?: string, res?: number | undefined) => {
resolve(entry.name)
})
})
})
}
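For the record, the map + Promise.all version can also work if each iteration gets its own zip instance instead of reassigning one shared outer variable. A rough sketch only (it assumes the same node-stream-zip callback API and getFilesByExtension helper as above and, like the original, resolves on the first extracted entry of each archive):

const zipPath = await getFilesByExtension(folder, ".zip");
await Promise.all(zipPath.map((zipFile) => {
  return new Promise<string>((resolve, reject) => {
    // const keeps this instance local to the iteration, so the handlers below
    // always refer to the archive they were registered on
    const zip = new StreamZip({ storeEntries: true, file: `${folder}/${zipFile}` });
    zip.on('error', (err) => reject(err));
    zip.on('entry', (entry) => {
      const pathname = path.resolve('./tmp', entry.name);
      if (/\.\./.test(path.relative('./tmp', pathname))) return; // skip maliciously crafted paths
      if (entry.name.endsWith('/')) return;                      // skip directory entries
      zip.extract(entry, `tmp/${entry.name}`, () => resolve(entry.name));
    });
  });
}));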
Related: there were similar questions and answers, but none recently and none with exactly these requirements.
I have many pictures for a dating app on Firebase Storage, uploaded by the users, with a downloadUrl saved in Firestore. I just noticed they are stored as very large images, which slows down loading for the users. Result: I need to resize all the pictures on Firebase Storage and reformat them to JPEG.
My research and trials over the past 2 months brought me to the following conclusions:
It's not possible through Cloud Functions, as the 9-minute timeout is too short to resize everything.
Sharp is the best library for the job, but it is better to run it locally.
I can use gsutil, as in this question here, to download all the pictures while keeping their paths, resize them, and upload them again later.
I was stuck on how to resize/reformat with Sharp and, since the file name will change and the metadata will probably be stripped, how to upload the images back and at the same time get a new downloadUrl that I can then write to the users collection in Firestore.
MY POTENTIAL SOLUTION (STEP 4):
Not sure if it will work, but I'd have a function listening for changed (finalized) objects, reading the info from the image, and writing a self-made downloadUrl back to Firestore.
MY NEW QUESTION: Is this going to work? I'm afraid of breaking the pictures of all my users...
For your better understanding, here is my process so far:
1. Download Images
gsutil cp -r gs://my-bucket/data [path where you want to download]
2. Script (typescript) to resize/reformat them.
import * as fs from "fs";
import sharp from "sharp";
import * as path from "path";
const walk = (dir: string, done: (err: NodeJS.ErrnoException | null, results?: string[]) => void) => {
let results: string[] = [];
fs.readdir(dir, (err, list) => {
if (err) return done(err);
let i = 0;
(function next() {
let file = list[i++];
if (!file) return done(null, results);
file = path.resolve(dir, file);
fs.stat(file, (err, stat) => {
if (stat && stat.isDirectory()) {
walk(file, (err, res) => {
results = results.concat(res);
next();
});
} else {
results.push(file);
next();
}
});
})();
});
};
const reformatImage = async (filesPaths: string[]) => {
let newFilesPaths: string[] = [];
await Promise.all(
filesPaths.map(async (filePath) => {
let newFileName = changeExtensionName(filePath);
let newFilePath = path.join(path.dirname(filePath), newFileName);
if (filePath === newFilePath) {
newFileName = "rszd-" + newFileName;
newFilePath = path.join(path.dirname(filePath), newFileName);
}
newFilesPaths.push(newFilePath);
try {
await sharp(filePath)
.withMetadata()
.resize(600, 800, {
fit: sharp.fit.inside,
})
.toFormat("jpeg")
.jpeg({
mozjpeg: true,
force: true,
})
.toFile(newFilePath)
.then(async (info) => {
console.log("converted file...", info);
})
.catch((error) => {
console.log("sharp error: ", error);
});
} catch (error) {
console.error("error converting...", error);
}
})
);
console.log("THIS IS THE RESIZED IMAGES");
console.log(newFilesPaths);
};
const changeExtensionName = (filePath: string) => {
const ext = path.extname(filePath || "");
const virginName = path.basename(filePath, ext);
const newName = virginName + ".jpg";
return newName;
};
walk("./xxxxxx.appspot.com", (err, results) => {
if (err) throw err;
console.log("THIS IS THE DOWNLOADED IMAGES");
console.log(results);
reformatImage(results);
});
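As a side note (my own suggestion, not part of the script above): the callback-based walk could be replaced with a promise-based version. A minimal sketch, reusing the fs and path imports at the top of the script and assuming Node 12+ so that fs.promises and Array.prototype.flat are available:

// Recursively collect all file paths under dir using fs.promises.
const walkAsync = async (dir: string): Promise<string[]> => {
  const entries = await fs.promises.readdir(dir, { withFileTypes: true });
  const nested = await Promise.all(
    entries.map((entry) => {
      const fullPath = path.resolve(dir, entry.name);
      return entry.isDirectory() ? walkAsync(fullPath) : Promise.resolve([fullPath]);
    })
  );
  return nested.flat();
};

The call at the bottom of the script would then become walkAsync("./xxxxxx.appspot.com").then(reformatImage).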
3. Re-upload the files
gsutil cp -r [path your images] gs://my-bucket/data
4. Listen for new file update through a Firebase Functions, and update the new downloadUrl
export const onOldImageResizedUpdateDowloadUrl = functions.storage
.object()
.onFinalize(async (object: any) => {
if (object) {
functions.logger.log('OBJECT: ', object);
const fileBucket = object.bucket;
const filePath: string = object.name;
const userId = path.basename(path.dirname(filePath));
const fileName = path.basename(filePath);
const isResized = fileName.startsWith('rszd-');
if (!isResized) {return;}
const token = object.metadata?.firebaseStorageDownloadTokens;
const downloadUrl = createDownloadUrl(
fileBucket,
token,
userId,
fileName
);
const pictureId = 'picture' + fileName.charAt(5); // pictures are named e.g. "rszd-1.jpeg"
await admin
.firestore()
.collection('users')
.doc(userId)
.update({ [pictureId]: downloadUrl });
}
});
function createDownloadUrl(
bucketPath: string,
downloadToken: string,
uid: string,
fileName: string) {
return `https://firebasestorage.googleapis.com/v0/b/${bucketPath}/o/pictures-profil%2F${uid}%2F${fileName}?alt=media&token=${downloadToken}`;
}
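One caveat with step 4 (this is my own assumption, not something from the original setup): objects re-uploaded with gsutil will not carry a firebaseStorageDownloadTokens entry in their metadata, so object.metadata.firebaseStorageDownloadTokens may be undefined. A hedged sketch of how the function could create and attach a token itself before building the URL, reusing the admin instance already imported in this functions file (setMetadata comes from @google-cloud/storage, and randomUUID needs Node 14.17+):

import { randomUUID } from 'crypto';

// Make sure the object has a Firebase download token; if it does not
// (e.g. it was re-uploaded with gsutil), generate one and store it as custom
// metadata so createDownloadUrl() has a valid token to embed.
async function ensureDownloadToken(fileBucket: string, filePath: string, existingToken?: string): Promise<string> {
  if (existingToken) return existingToken;
  const token = randomUUID();
  await admin
    .storage()
    .bucket(fileBucket)
    .file(filePath)
    .setMetadata({ metadata: { firebaseStorageDownloadTokens: token } });
  return token;
}

Inside onFinalize this would be used as const token = await ensureDownloadToken(fileBucket, filePath, object.metadata?.firebaseStorageDownloadTokens); and, as with everything here, it is worth testing on a copy of the bucket before touching the users' real pictures.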
I've been trying for a week, without success, to convert PSD files with this Node.js module: https://www.npmjs.com/package/psd, and to print a confirmation message after all the images have been converted.
I don't know whether the problem is in my code or in how I handle promises; I've changed just about every aspect of this code 50 times.
In psdconverter.js file
//! PSD COMPONENT MODULES
var PSD = require('psd');
//! file and extension MODULES
const fs = require('fs');
var path = require('path');
const PSDconverter = async (filename) => {
return new Promise((resolve, reject) => {
PSD.open('./img/' + filename).then(function (psd) {
let newfilename = filename.replace(/.psd/i, ""); //REPLACE CASE INSENSITIVE
psd.image.saveAsPng('./img/' + newfilename + '.png');
return newfilename;
}).then(function (res) {
console.log('PSD Conversion Finished!' + res);
resolve(res);
}).catch(function (err) {
console.log(err);
});
})
}
const EnumAndConvert = async () => {
return new Promise((resolve, reject) => {
//! READ DIR IMAGE AND CONVERSION PART
fs.readdir('./img/', (err, files) => {
if (err)
console.log(err + ' conversion error: the img folder was not found!');
else {
for (let filename of files){
var ext = path.extname('./img/' + filename);
if (ext === '.PSD' || ext === '.psd')
await PSDconverter(filename);
}
}
})
resolve("Everything is converted successfully");
})
}
exports.PSDconverter = PSDconverter;
exports.EnumAndConvert = EnumAndConvert;
in index.js file
function PSDconverter() {
//! PSD converter
let EnumPSDAndConvert = require('./psdconverter.js');
EnumPSDAndConvert.EnumAndConvert().then((res) => { // continue once the resolved Promise comes back
console.log(res+"ciao");
})
}
ERROR RESULT:
await PSDconverter(filename);
^^^^^
SyntaxError: await is only valid in async function
(I added the await because the confirmation message was printing first, when I want it to be the very last thing.)
Thank you for every help!
Ok the solution is:
index.js
function PSDconverter() {
//! PSD converter
let EnumPSDAndConvert = require('./psdconverter.js');
EnumPSDAndConvert.EnumAndConvert().then(() => {
console.log("Conversion Completed");
})
}
psdconverter.js
//! PSD COMPONENT MODULES
var PSD = require('psd');
//! file and extension MODULES
const fs = require('fs');
var path = require('path');
const PSDconverter = (filename) => { //without async
return PSD.open('./img/' + filename).then(function (psd) {
let newfilename = filename.replace(/\.psd$/i, ""); // strip the extension, case-insensitive
psd.image.saveAsPng('./img/' + newfilename + '.png');
return newfilename;
}).then(function (res) {
console.log('PSD Conversion Finished!' + res);
}).catch(function (err) {
console.log(err);
});
}
function readImgDir() {
return new Promise((resolve, reject) => {
fs.readdir('./img/', (err, files) => {
if (err)
console.log(err + ' conversion error: the img folder was not found!');
else {
resolve(files);
}
})
})
}
const EnumAndConvert = async () => {
var files = await readImgDir(); //! READ DIR IMAGE AND CONVERSION PART
for (let filename of files) {
var ext = path.extname('./img/' + filename);
if (ext === '.PSD' || ext === '.psd')
await PSDconverter(filename);
}
}
exports.PSDconverter = PSDconverter;
exports.EnumAndConvert = EnumAndConvert;
If there are any suggestions on how to improve the code, I'd be curious to hear them.
Thanks again!
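One possible cleanup (just a sketch, assuming the same psd package API and ./img/ layout as above): let fs.promises handle the directory read, filter by extension, and keep the per-file promise chain flat:

// psdconverter.js, tightened up a bit
const PSD = require('psd');
const fs = require('fs');
const path = require('path');

// Convert a single PSD in ./img/ to a PNG next to it and return the new base name.
const PSDconverter = (filename) => {
  const newfilename = filename.replace(/\.psd$/i, '');
  return PSD.open('./img/' + filename)
    .then((psd) => psd.image.saveAsPng('./img/' + newfilename + '.png'))
    .then(() => {
      console.log('PSD Conversion Finished! ' + newfilename);
      return newfilename;
    });
};

// Convert every .psd file in ./img/, one after another.
const EnumAndConvert = async () => {
  const files = await fs.promises.readdir('./img/');
  for (const filename of files) {
    if (path.extname(filename).toLowerCase() === '.psd') {
      await PSDconverter(filename);
    }
  }
};

module.exports = { PSDconverter, EnumAndConvert };

Errors are left to propagate to the caller, so index.js can decide whether to log them or show a failure message instead of the success one.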
I am new to Node.js and just started learning. I need to read 5 JSON files and place their contents in an array. I have created 2 functions: readDirectory and processFile.
let transactionArray = [];
router.get('/', (req,res) => {
//joining path of directory
const directoryPath = path.join(__dirname, '../data');
readDirectory(directoryPath);
res.send(JSON.stringify(transactionArray))
})
readDirectory takes the directory and reads the filenames.
function readDirectory(directoryPath){
//passsing directoryPath and callback function
fs.readdir(directoryPath, function (err, files) {
//handling error
if (err) {
return console.log('Unable to scan directory: ' + err);
}
//listing all files using map
let fileSummary = files.map(file => {
//get the filename
let categoryName = ''
if (file.includes('category1')) {
categoryName = 'category1'
} else if (file.includes('category2')) {
categoryName = 'category2'
} else {
categoryName = 'Others'
}
// read the file
const filePath = directoryPath +'/'+ file
fs.readFile(filePath, 'utf8', (err, fileContents) => {
if (err) {
console.error(err)
return
}
try {
let data = JSON.parse(fileContents, categoryName)
processFile(data, categoryName);
} catch(err) {
console.error(err)
}
})
})
});
}
Each parsed file is then handled by processFile.
function processFile(data, categoryName)
{
let paymentSource = ''
if (categoryName == 'category1'){
paymentSource = categoryName +': '+ categoryName +' '+ data.currency_code
} else if (categoryName == 'category2') {
paymentSource = categoryName +': '+ data.extra.payer +'-'+ data.currency_code
} else {
paymentSource = 'Others'
}
let transactionDetails = new Transaction(
data.id,
data.description,
categoryName,
data.made_on,
data.amount,
data.currency_code,
paymentSource)
transactionArray.push(transactionDetails)
console.log(transactionArray);
}
The console log is something like this:
[{Transaction1}] [{Transaction1},{Transaction2}] [{Transaction1},{Transaction2},{Transaction3}]
but the result in the UI is only [].
While debugging I noticed the files are read asynchronously and finish after the response is sent, so I tried readFileSync, but it did not work. How can I make both functions finish before responding, so the route does not return an empty array?
Do some playing around to understand what the fs functions do when they take callbacks, and when they're synchronous. Starting from the code you have, we can make a few changes so that you don't have to use the synchronous functions from the file system library.
First of all, you need to wait for all the asynchronous tasks to complete before returning the response.
router.get('/', async (req, res) => {
// joining path of directory
const directoryPath = path.join(__dirname, '../data')
readDirectory(directoryPath).then(() => {
res.send(JSON.stringify(transactionArray))
}).catch(err => {
res.status(500).json(err)
})
})
Secondly, to keep the code as is and teach you something about promises, let's wrap the first function in a promise.
function readDirectory (directoryPath) {
return new Promise((resolve, reject) => {
// passsing directoryPath and callback function
fs.readdir(directoryPath, function (err, files) {
// handling error
if (err) {
return reject(new Error('Unable to scan directory: ' + err))
}
// listing all files using map
const fileSummary = Promise.all(
files.map(file => {
return new Promise((resolve, reject) => {
// get the filename
let categoryName = ''
if (file.includes('category1')) {
categoryName = 'category1'
} else if (file.includes('category2')) {
categoryName = 'category2'
} else {
categoryName = 'Others'
}
// read the file
const filePath = directoryPath + '/' + file
fs.readFile(filePath, 'utf8', (err, fileContents) => {
if (err) {
console.error(err)
return reject(err)
}
try {
const data = JSON.parse(fileContents)
processFile(data, categoryName).then(() => {
resolve()
})
} catch (err) {
console.error(err)
reject(err)
}
})
})
})
).then(() => {
resolve()
}).catch(err => {
reject(err)
})
})
})
}
Please refer to MDN (the JavaScript bible) for promises: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise
And finally, wrap the processFile function in a promise:
function processFile (data, categoryName) {
return new Promise((resolve, reject) => {
let paymentSource = ''
if (categoryName == 'category1') {
paymentSource = categoryName + ': ' + categoryName + ' ' + data.currency_code
} else if (categoryName == 'category2') {
paymentSource = categoryName + ': ' + data.extra.payer + '-' + data.currency_code
} else {
paymentSource = 'Others'
}
const transactionDetails = new Transaction(
data.id,
data.description,
categoryName,
data.made_on,
data.amount,
data.currency_code,
paymentSource)
transactionArray.push(transactionDetails)
console.log(transactionArray)
resolve()
})
}
What the heck am I doing? I'm just making your code run its asynchronous tasks, but wait for them to complete before moving on. Promises are a way to handle this. You could easily pull this off with the synchronous fs functions, but this way you get to learn about promises!
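For comparison (my own addition, not part of the answer above): the same idea with fs.promises and async/await avoids the manual promise wrapping entirely. A sketch only, reusing the existing path import and assuming processFile is changed to return the Transaction instead of pushing to a module-level array:

const fsp = require('fs').promises;

// Read every JSON file in the directory, parse it, and collect the transactions
// before the route responds.
async function readTransactions(directoryPath) {
  const files = await fsp.readdir(directoryPath);
  const transactions = [];
  for (const file of files) {
    const categoryName = file.includes('category1') ? 'category1'
      : file.includes('category2') ? 'category2'
      : 'Others';
    const contents = await fsp.readFile(path.join(directoryPath, file), 'utf8');
    transactions.push(processFile(JSON.parse(contents), categoryName));
  }
  return transactions;
}

// router.get('/', async (req, res) => {
//   res.json(await readTransactions(path.join(__dirname, '../data')));
// });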
I have an S3 bucket with a folder containing some files, and I want to download all the files in that folder to a local folder on my machine. I got it working for a single file; how do I download multiple files?
As per the code below, the key folderA has 10 files, and I want to download all ten into the localfolder directory that I pipe to in s3.getObject(params).createReadStream().pipe(ws);
My code:
const downloadObject = () => {
var params = { Bucket: "Sample", Key:"folderA/"};
const ws = fs.createWriteStream(`${__dirname}/localfolder/`);
const s3Stream = s3.getObject(params).createReadStream().pipe(ws);
s3Stream.on("error", (err) => {
ws.end();
});
s3Stream.on("close", () => {
console.log(`downloaded successfully from s3 at ${new Date()}`);
ws.end();
});
};
expected output:
s3 -> bucket/folderA/<10 files>
localmachine -> localfolder/<need all 10 files in local>
There is quite a lot to it.
First you would need to list all the buckets and then loop over them (if you only want one, fine), creating a local directory if one is not found, and so on.
Then find all the files in each bucket, loop over them, and for each key get the object and store it.
Here is how I would do it with the minio JS client (the calls would be much the same); tweak it to your needs, obviously the folder paths will be different.
/**
* S3 images pull script
*/
const fs = require('fs')
const path = require('path')
const util = require('util')
const readFile = util.promisify(fs.readFile)
const writeFile = util.promisify(fs.writeFile)
//
const rootPath = path.join(__dirname, '..')
const publicPath = path.join(rootPath, 'public', 'images')
//
require('dotenv').config({
path: path.join(rootPath, '.env')
})
// minio client S3
const s3 = new(require('minio')).Client({
endPoint: process.env.S3_HOST,
port: parseInt(process.env.S3_PORT, 10),
useSSL: process.env.S3_USE_SSL === 'true',
accessKey: process.env.S3_ACCESS_KEY,
secretKey: process.env.S3_ACCESS_SECRET,
region: process.env.S3_REGION
})
/**
* Functions
*/
const mkdir = dirPath => {
dirPath.split(path.sep).reduce((prevPath, folder) => {
const currentPath = path.join(prevPath, folder, path.sep);
if (!fs.existsSync(currentPath)) {
fs.mkdirSync(currentPath);
}
return currentPath
}, '')
}
// list objects in bucket
const listObjects = bucket => new Promise((resolve, reject) => {
//
bucket.objects = []
bucket.total_objects = 0
bucket.total_size = 0
//
let stream = s3.listObjectsV2(bucket.name, '', true)
//
stream.on('data', obj => {
if (obj && (obj.name || obj.prefix)) {
bucket.objects.push(obj)
bucket.total_objects++
bucket.total_size = bucket.total_size + obj.size
}
})
//
stream.on('end', () => resolve(bucket))
stream.on('error', e => reject(e))
})
// get an objects data
const getObject = (bucket, name) => new Promise((resolve, reject) => {
s3.getObject(bucket, name, (err, stream) => {
if (err) return reject(err)
//
let chunks = []
stream.on('data', chunk => chunks.push(chunk))
stream.on('end', () => resolve(Buffer.concat(chunks || [])))
stream.on('error', e => reject(e))
})
})
/**
*
*/
async function main() {
// get buckets
console.log(`Fetching buckets from: ${process.env.S3_HOST}`)
let buckets = []
try {
buckets = await s3.listBuckets()
console.log(buckets.length + ' buckets found')
} catch (e) {
return console.error(e)
}
// create local folders if not exists
console.log(`Creating local folders in ./api/public/images/ if not exists`)
try {
for (let bucket of buckets) {
//
bucket.local = path.join(publicPath, bucket.name)
try {
await fs.promises.access(bucket.local)
} catch (e) {
if (e.code === 'ENOENT') {
console.log(`Creating local folder: ${bucket.local}`)
await fs.promises.mkdir(bucket.local)
} else
bucket.error = e.message
}
}
} catch (e) {
return console.error(e)
}
// fetch all bucket objects
console.log(`Populating bucket objects`)
try {
for (let bucket of buckets) {
bucket = await listObjects(bucket)
}
} catch (e) {
console.log(e)
}
// loop over buckets and download all objects
try {
for (let bucket of buckets) {
console.log(`Downloading bucket: ${bucket.name}`)
// loop over and download
for (let object of bucket.objects) {
// if object name has prefix
let dir = path.dirname(object.name)
if (dir !== '.') {
try {
await fs.promises.access(path.join(bucket.local, dir))
} catch (e) {
if (e.code === 'ENOENT') {
console.log(`Creating local folder: ${path.join(bucket.local, dir)}`)
mkdir(path.join(bucket.local, dir))
}
}
}
//
console.log(`Downloading object[${bucket.name}]: ${object.name}`)
await writeFile(path.join(bucket.local, object.name), await getObject(bucket.name, object.name))
}
}
console.log(`Completed!`)
} catch (e) {
console.log(e)
}
}
main()
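Since the question itself uses the AWS SDK rather than minio, here is a rough equivalent for a single prefix. A sketch only: it assumes aws-sdk v2 with credentials already configured, the Sample bucket and folderA/ prefix from the question, an existing ./localfolder directory, and fewer than 1000 keys (so no pagination):

const fs = require('fs');
const path = require('path');
const AWS = require('aws-sdk');

const s3 = new AWS.S3();

// List every object under the prefix, then stream each one into ./localfolder,
// keeping just the file name.
const downloadFolder = async () => {
  const listing = await s3.listObjectsV2({ Bucket: 'Sample', Prefix: 'folderA/' }).promise();
  for (const obj of listing.Contents || []) {
    if (obj.Key.endsWith('/')) continue; // skip the folder placeholder itself
    const target = path.join(__dirname, 'localfolder', path.basename(obj.Key));
    await new Promise((resolve, reject) => {
      s3.getObject({ Bucket: 'Sample', Key: obj.Key })
        .createReadStream()
        .on('error', reject)
        .pipe(fs.createWriteStream(target))
        .on('error', reject)
        .on('close', resolve);
    });
    console.log(`downloaded ${obj.Key} -> ${target}`);
  }
};

downloadFolder().catch(console.error);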
Update: OK, this seems to be linked to through2's highWaterMark option. Basically it means "don't buffer more than x files; wait for someone to consume them, and only then accept another batch of files". Since it works this way by design, the snippet in this question needs rethinking. There must be a better way to handle many files.
Quick fix, allowing 8000 files:
through.obj({ highWaterMark: 8000 }, (file, enc, next) => { ... })
Original question
I'm using a gulp task to create translation files. It scans an src folder for *.i18n.json files and saves one .json file per language it finds within the source files.
It works fine until it finds more than 16 files. It uses through2 to process each file; see the source code below. The method processAll18nFiles() is a custom pipe that receives the matching input files, reads the content of each file, builds the resulting dictionaries on the fly, and finally hands them over to the on('finish') handler, which writes the dictionaries out.
Tested on Windows and Mac. There seems to be a limit that my approach hits, because it works just fine with 16 files or fewer.
Still looking; clues welcome :-)
source file example: signs.i18n.json
{
"path": "profile.signs",
"data": {
"title": {
"fr": "mes signes précurseurs",
"en": "my warning signs"
},
"add": {
"fr": "ajouter un nouveau signe",
"en": "add a new warning sign"
}
}
}
output file example: en.json
{"profile":{"signs":{"title":"my warning signs","add":"add a new warning sign"}}}
gulpfile.js
const fs = require('fs');
const path = require('path');
const gulp = require('gulp');
const watch = require('gulp-watch');
const through = require('through2');
const searchPatternFolder = 'src/app/**/*.i18n.json';
const outputFolder = path.join('src', 'assets', 'i18n');
gulp.task('default', () => {
console.log('Ionosphere Gulp tasks');
console.log(' > gulp i18n builds the i18n file.');
console.log(' > gulp i18n:watch watches i18n file and trigger build.');
});
gulp.task('i18n:watch', () => watch(searchPatternFolder, { ignoreInitial: false }, () => gulp.start('i18n')));
gulp.task('i18n', done => processAll18nFiles(done));
function processAll18nFiles(done) {
const dictionary = {};
console.log('[i18n] Rebuilding...');
gulp
.src(searchPatternFolder)
.pipe(
through.obj((file, enc, next) => {
console.log('doing ', file.path);
const i18n = JSON.parse(file.contents.toString('utf8'));
composeDictionary(dictionary, i18n.data, i18n.path.split('.'));
next(null, file);
})
)
.on('finish', () => {
const writes = [];
Object.keys(dictionary).forEach(langKey => {
console.log('lang key ', langKey);
writes.push(writeDictionary(langKey, dictionary[langKey]));
});
Promise.all(writes)
.then(data => done())
.catch(err => console.log('ERROR ', err));
});
}
function composeDictionary(dictionary, data, path) {
Object.keys(data)
.map(key => ({ key, data: data[key] }))
.forEach(({ key, data }) => {
if (isString(data)) {
setDictionaryEntry(dictionary, key, path, data);
} else {
composeDictionary(dictionary, data, [...path, key]);
}
});
}
function isString(x) {
return Object.prototype.toString.call(x) === '[object String]';
}
function initDictionaryEntry(key, dictionary) {
if (!dictionary[key]) {
dictionary[key] = {};
}
return dictionary[key];
}
function setDictionaryEntry(dictionary, langKey, path, data) {
initDictionaryEntry(langKey, dictionary);
let subDict = dictionary[langKey];
path.forEach(subKey => {
const isLastToken = path[path.length - 1] === subKey;
if (isLastToken) {
subDict[subKey] = data;
} else {
subDict = initDictionaryEntry(subKey, subDict);
}
});
}
function writeDictionary(lang, data) {
return new Promise((resolve, reject) => {
fs.writeFile(
path.join(outputFolder, lang + '.json'),
JSON.stringify(data),
'utf8',
err => (err ? reject(err) : resolve())
);
});
}
OK, as explained here, one must consume the pipe. This is done by adding a handler for 'data' events, such as:
gulp
.src(searchPatternFolder)
.pipe(
through.obj({ highWaterMark: 4, objectMode: true }, (file, enc, next) => {
const { data, path } = JSON.parse(file.contents.toString('utf8'));
next(null, { data, path });
})
)
// The next line handles the "consumption" of upstream pipings
.on('data', ({ data, path }) => ++count && composeDictionary(dictionary, data, path.split('.')))
.on('end', () =>
Promise.all(Object.keys(dictionary).map(langKey => writeDictionary(langKey, dictionary[langKey])))
.then(() => {
console.log(`[i18n] Done, ${count} files processed, language count: ${Object.keys(dictionary).length}`);
done();
})
.catch(err => console.log('ERROR ', err))
);
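To make the 16-file limit less mysterious, here is a small self-contained illustration (my own, not from the answer above): an object-mode through2 stream stops calling its transform once its readable buffer reaches the default objectMode highWaterMark of 16, until something consumes the output:

const through = require('through2');

let transformed = 0;
const stream = through.obj((obj, enc, next) => {
  transformed++;
  next(null, obj); // pushed objects pile up until someone reads them
});

for (let i = 0; i < 100; i++) stream.write({ i });
stream.end();

setTimeout(() => {
  console.log('transformed without a consumer:', transformed); // stalls around 16
  stream.on('data', () => {}); // start consuming, the buffer drains
  stream.on('end', () => console.log('transformed after consuming:', transformed)); // reaches 100
}, 100);

That is exactly what the 'data' handler in the snippet above does for the gulp pipeline: it keeps the readable side drained, so the transform never stalls no matter how many files come through.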