Read Array of Files and Read Them One By One Nodejs

Read Array of Files and Read Them One By One Nodejs - node.js

I would like to start by saying that I am relatively new to JS and so I welcome any helpful suggestions.
Here's what it should do:
Reads directory files
Loops through the array of files
SHOULD: Read file & upload it to the s3
What it does:
Reads directory files ✅
Loops through the array of files ✅
Never passes this:
Files size are: 1 to 2 mb, 12 files length or 6 either way it does not work.
if (content.length < 1) return console.log("Content < 1")
code:
async s3() {
AWS.config.update({
accessKeyId: process.env.AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.AWS_SECRET_KEY,
})
const s3 = new AWS.S3()
const destination = path.join(Utils.getRootPath(), 'uploads', this.email)
try {
const files = fs.readdirSync(destination)
if (!files || files.length == 0) return console.log(`Provided folder '${destination}' is empty or does not exist.`);
for (const fileName of files) {
const filePath = path.join(destination, fileName)
let content = fs.readFileSync(filePath)
if (content.length < 1) return console.log("Content < 1")
s3.upload({
ACL: 'public-read',
Bucket: process.env.AWS_BUCKET_NAME,
Key: fileName,
Body: fileContent,
}).promise().then(async (uploadData) => {
try {
const headData = await s3.headObject({
Bucket: process.env.AWS_BUCKET_NAME,
Key: fileName,
}).promise();
return console.log(headData);
}
catch (err) {
console.log(err);
}
})
console.log(`${fileName} uploaded.`)
}
} catch (error) {
throw new Error(error)
}
Caller:
async container() {
if (this.email === null) throw new Error('Constructor of Job class is null')
try {
await this.placeOrder();
await Utils.downloadFile(this.email);
await this.s3();
return console.log("DONE!")
}
catch (err) {
// return new Error(err);
// OBRADA OVDE!
console.log(err)
}
}

There are a couple issues, assuming I understand this correctly. First, instead of returning from the for-of loop when content.length < 1, you should just continue. Otherwise you won't continue to process the remaining files. Second, it's generally not a good idea to mix promises and async/await. You can use await on s3.upload.
class FileManager {
async s3() {
console.log('ok')
AWS.config.update({
accessKeyId: process.env.AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.AWS_SECRET_KEY,
});
const s3 = new AWS.S3();
const destination = path.join(Utils.getRootPath(), "uploads", this.email);
try {
const files = fs.readdirSync(destination);
if (!files || files.length == 0)
return console.log(`Provided folder '${destination}' is empty or does not exist.`);
for (const fileName of files) {
const filePath = path.join(destination, fileName);
let content = fs.readFileSync(filePath);
if (content.length < 1) {
console.log("Content < 1");
continue;
}
await s3.upload({
ACL: "public-read",
Bucket: process.env.AWS_BUCKET_NAME,
Key: fileName,
Body: fileContent,
});
try {
const headData = await s3
.headObject({
Bucket: process.env.AWS_BUCKET_NAME,
Key: fileName,
})
.promise();
console.log(headData);
} catch (err) {
console.log(err);
}
console.log(`${fileName} uploaded.`);
}
} catch (error) {
throw new Error(error);
}
}
}

Related

read files and read directory in node.js

I want to know whtether the read file and read directory functions - fs.readdir(path, callback) and fs.readFile(path, options, callback) have similar functions without callback.Here, I first read all the files in given directory, and loop through all the files and upload their content to S3 bucket.
Please see the working code below.
const s3Upload = async (req, res) => {
const directoryName = "MAXIS_GAMING/Daily/"
var data = {}
let files = {}
await readFiles1(directoryName)
}
const readFiles1 = async(dirname) => {
let _files
fs.readdir(dirname, (err, files) => {
// On error, show it and return
if(err) return console.error(err);
// files is an array containing the names of all entries
// in the directory, excluding '.' (the directory itself)
// and '..' (the parent directory).
// Display directory entries
console.log(files.join(' '));
files.forEach(function(filename){
fs.readFile(dirname + filename, 'utf-8', function(err, content){
if(err) {
// onError(err);
throw err
return;
}
console.log('cont..............................',content)
console.log('filename', filename)
//await
uploadFiles(filename, content)
//onFileContent(filename, content);
})
})
})
}
const uploadFiles = async (fileName, fileContent) => {
console.log('in uploadd..........')
const GLOBAL_ACCESS_KEY_ID = 'AKIDAQWZX6B3XUBDIFHLPC5LYFTJF15XPIQ';
const GLOBAL_SECRET_ACCESS_KEY = 'Sv4Fe4h4QgErG5XoZbgeC63oczkdW3bMQfC0jvyR8bPbJ9Y97k+'
const GLOBAL_DEFAULT_REGION = 'ap-southeast-1';
const S3_IMAGE_BUCKET ='max-stg-image/stage/reports'//"max-stg-image";
const S3_IMAGE_PATH = "stage";
AWS.config.update({
accessKeyId: GLOBAL_ACCESS_KEY_ID,
secretAccessKey: GLOBAL_SECRET_ACCESS_KEY,
region: GLOBAL_DEFAULT_REGION,
});
const s3 = new AWS.S3()
const bucket = new AWS.S3()
const params = {
Bucket: S3_IMAGE_BUCKET,
Key: fileName,
Body: fileContent
};
// Uploading files to the bucket
s3.upload(params, function(err, data) {
if (err) {
throw err;
}
console.log(`File uploaded successfully. ${data.Location}`);
});
}
app.get('/home/s3Upload', s3Upload)

You can do something like this:
import { readdir } from 'fs/promises';
//or with require
const readdir = require('fs/promises').readdir;
try {
const files = await readdir(path);
for (const file of files)
console.log(file);
} catch (err) {
console.error(err);
}
Check here all the promise API provided by FS in Node.js.

AWS Lambda unzip from S3 to S3

I'm trying to write an Lambda function that unzips zip files in one S3 directory and extract into another. I had this working in Python but nobody else in my group likes Python so I'm converting it to Node.js which I'm not very good at.
I'm trying to use the unzipper package and I'm able to get a list of files in the zip file using unzipper.Open.S3, but I can't figure out how to stream the files in the zip file into S3.
The meat of the code looks like
const directory = await unzipper.Open.s3(s3,{Bucket: bucket, Key: zip_file});
directory.files.forEach(file => {
console.log("file name = " + file.path + ", type = " + file.type)
const key = dir[0] + "/output/" + file.path;
const params = { Bucket: bucket, Key: key };
const { writeStream, promise } = uploadStream(params)
file.stream().pipe(writeStream);
promise.then(() => {
console.log('upload completed successfully');
}).catch((err) => {
console.log('upload failed.', err.message);
});
});
const uploadStream = ({ Bucket, Key }) => {
const pass = new stream.PassThrough();
return {
writeStream: pass,
promise: s3.upload({ Bucket, Key, Body: pass }).promise()
};
}
I get the console.log for each file, but neither of the logs in promise.then and .catch comes out and no new files appear in S3.

Never mind, I found this code that works better:
exports.handler = async (event) => {
const params = {
Key: zip_directory + "/" + zip_file,
Bucket: input_bucket
};
const zip = s3
.getObject(params)
.createReadStream()
.pipe(unzipper.Parse({ forceStream: true }));
const promises = [];
let num = 0;
for await (const e of zip) {
const entry = e;
const fileName = entry.path;
const type = entry.type;
if (type === 'File') {
const uploadParams = {
Bucket: output_bucket,
Key: output_directory + fileName,
Body: entry,
};
promises.push(s3.upload(uploadParams).promise());
num++;
} else {
entry.autodrain();
}
}
await Promise.all(promises);
};

AWS S3 Loading many files

I need to upload a lot of files (about 65.000) splitted in subdirectory.
I tried to iterate and load every single file like this:
const fs = require("fs");
const path = require("path");
const async = require("async");
const AWS = require("aws-sdk");
const readdir = require("recursive-readdir");
const slash = require("slash");
const { BUCKET, KEY, SECRET } = process.env;
const rootFolder = path.resolve(__dirname, "./");
const uploadFolder = "./test_files/15";
const s3 = new AWS.S3({
signatureVersion: "v4",
accessKeyId: KEY,
secretAccessKey: SECRET,
});
function getFiles(dirPath) {
return fs.existsSync(dirPath) ? readdir(dirPath) : [];
}
async function deploy(upload) {
if (!BUCKET || !KEY || !SECRET) {
throw new Error("you must provide env. variables: [BUCKET, KEY, SECRET]");
}
const filesToUpload = await getFiles(path.resolve(__dirname, upload));
return new Promise((resolve, reject) => {
async.eachOfLimit(
filesToUpload,
10,
async.asyncify(async (file) => {
const Key = file.replace(rootFolder + path.sep, "");
console.log(`uploading: [${slash(Key)}]`);
var options = { partSize: 5 * 1024 * 1024, queueSize: 4 };
return new Promise((res, rej) => {
s3.upload(
{
Key: slash(Key),
Bucket: BUCKET,
Body: fs.readFileSync(file),
},
(err) => {
if (err) {
return rej(new Error(err));
}
res({ result: true });
}
);
});
}),
(err) => {
if (err) {
return reject(new Error(err));
}
resolve({ result: true });
}
);
});
}
deploy(uploadFolder)
.then(() => {
console.log("task complete");
process.exit(0);
})
.catch((err) => {
console.error(err);
process.exit(1);
});
but after a considerable number of uploads i have this:
Error: Error: NetworkingError: connect ETIMEDOUT IP_S3_AWS
I need to upload this set of files from ec2 instance (because its a result of a image processing). I have this behavior from my pc, i don't know if from ec2 have the same problem.
I have considered the way of zip all and upload but i need to keep the original directory structure.
I accept also new way to resolve the problem.
Sorry for my bad english.

It would probably be much simpler to use the AWS CLI aws s3 sync command instead of building this yourself.

node.js renaming s3 object via aws-sdk module

Is it possible to rename an object on s3 via aws-sdk? I couldn't find a method for that, maybe there is a provisionary solution ...

I will answer I guess since no one has - this one should work
// create a new s3 object
var s3 = new AWS.S3();
var BUCKET_NAME = 'your-bucket-name';
var OLD_KEY = '/original-file.js';
var NEW_KEY = '/new-file.js';
// Copy the object to a new location
s3.copyObject({
Bucket: BUCKET_NAME,
CopySource: `${BUCKET_NAME}${OLD_KEY}`,
Key: NEW_KEY
})
.promise()
.then(() =>
// Delete the old object
s3.deleteObject({
Bucket: BUCKET_NAME,
Key: OLD_KEY
}).promise()
)
// Error handling is left up to reader
.catch((e) => console.error(e))

This is just a flow on from #nf071590 answer. Which is awesome.
Below, gets the entire list of a bucket and then changes the image name of every image that isn't .jpg to .jpg
Hope this helps someone. :)
const start = new Date()
const AWS = require('aws-sdk')
const state = {}
AWS.config.update({ region: 'ADD_REGION_HERE' })
try {
var s3 = new AWS.S3();
var BUCKET_NAME = 'ADD_BUCKET_NAME_HERE';
var params = {
Bucket: BUCKET_NAME,
MaxKeys: 1000
};
s3.listObjects(params, function (err, data) {
if (err) {
console.log(err, err.stack); // an error occurred
} else {
console.log(data);
data.Contents.forEach(image => {
var OLD_KEY = image.Key
var NEW_KEY = ''
// split key
var keyArray = image.Key.split('.')
var keyArrayLength = keyArray.length
console.log(keyArrayLength);
var ext = keyArray[keyArrayLength - 1]
// console.log(ext);
if(ext != 'jpg') {
console.log('Change this ext FROM: ', OLD_KEY)
ext = 'jpg'
if (keyArrayLength == 2) {
NEW_KEY = `${keyArray[0]}.${ext}`
} else if (keyArrayLength == 3) {
NEW_KEY = `${keyArray[0]}.${keyArray[1]}.${ext}`
} else if (keyArrayLength == 4) {
NEW_KEY = `${keyArray[0]}.${keyArray[1]}.${keyArray[2]}.${ext}`
}
console.log('TO:: ', NEW_KEY);
// Copy the object to a new location
try {
s3.copyObject({
Bucket: BUCKET_NAME,
CopySource: `${BUCKET_NAME}/${OLD_KEY}`,
Key: NEW_KEY
}).promise()
.then((response) => {
console.log('Seemed to have worked??');
console.log(response);
// Delete the old object
s3.deleteObject({
Bucket: BUCKET_NAME,
Key: OLD_KEY
}).promise()
})
// Error handling is left up to reader
.catch((e) => console.error(e))
} catch (error) {
console.log('error::', error);
}
}
});
}
});
} catch (err) {
const end = new Date() - start
let seconds = end / 1000
state.seconds = seconds
state.error = err
state.status = "error"
state.message = err.message
console.log(err)
console.log(state);
return
}

Create a zip file on S3 from files on S3 using Lambda Node

I need to create a Zip file that consists of a selection of files (videos and images) located in my s3 bucket.
The problem at the moment using my code below is that I quickly hit the memory limit on Lambda.
async.eachLimit(files, 10, function(file, next) {
var params = {
Bucket: bucket, // bucket name
Key: file.key
};
s3.getObject(params, function(err, data) {
if (err) {
console.log('file', file.key);
console.log('get image files err',err, err.stack); // an error occurred
} else {
console.log('file', file.key);
zip.file(file.key, data.Body);
next();
}
});
},
function(err) {
if (err) {
console.log('err', err);
} else {
console.log('zip', zip);
content = zip.generateNodeStream({
type: 'nodebuffer',
streamFiles:true
});
var params = {
Bucket: bucket, // name of dest bucket
Key: 'zipped/images.zip',
Body: content
};
s3.upload(params, function(err, data) {
if (err) {
console.log('upload zip to s3 err',err, err.stack); // an error occurred
} else {
console.log(data); // successful response
}
});
}
});
Is this possible using Lambda, or should I look at a different
approach?
Is it possible to write to a compressed zip file on the fly, therefore eliminating the memory issue somewhat, or do I need to have the files collected before compression?
Any help would be much appreciated.

Okay, I got to do this today and it works. Direct Buffer to Stream, no disk involved. So memory or disk limitation won't be an issue here:
'use strict';
const AWS = require("aws-sdk");
AWS.config.update( { region: "eu-west-1" } );
const s3 = new AWS.S3( { apiVersion: '2006-03-01'} );
const _archiver = require('archiver');
//This returns us a stream.. consider it as a real pipe sending fluid to S3 bucket.. Don't forget it
const streamTo = (_bucket, _key) => {
var stream = require('stream');
var _pass = new stream.PassThrough();
s3.upload( { Bucket: _bucket, Key: _key, Body: _pass }, (_err, _data) => { /*...Handle Errors Here*/ } );
return _pass;
};
exports.handler = async (_req, _ctx, _cb) => {
var _keys = ['list of your file keys in s3'];
var _list = await Promise.all(_keys.map(_key => new Promise((_resolve, _reject) => {
s3.getObject({Bucket:'bucket-name', Key:_key})
.then(_data => _resolve( { data: _data.Body, name: `${_key.split('/').pop()}` } ));
}
))).catch(_err => { throw new Error(_err) } );
await new Promise((_resolve, _reject) => {
var _myStream = streamTo('bucket-name', 'fileName.zip'); //Now we instantiate that pipe...
var _archive = _archiver('zip');
_archive.on('error', err => { throw new Error(err); } );
//Your promise gets resolved when the fluid stops running... so that's when you get to close and resolve
_myStream.on('close', _resolve);
_myStream.on('end', _resolve);
_myStream.on('error', _reject);
_archive.pipe(_myStream); //Pass that pipe to _archive so it can push the fluid straigh down to S3 bucket
_list.forEach(_itm => _archive.append(_itm.data, { name: _itm.name } ) ); //And then we start adding files to it
_archive.finalize(); //Tell is, that's all we want to add. Then when it finishes, the promise will resolve in one of those events up there
}).catch(_err => { throw new Error(_err) } );
_cb(null, { } ); //Handle response back to server
};

I formated the code according to #iocoker.
main entry
// index.js
'use strict';
const S3Zip = require('./s3-zip')
const params = {
files: [
{
fileName: '1.jpg',
key: 'key1.JPG'
},
{
fileName: '2.jpg',
key: 'key2.JPG'
}
],
zippedFileKey: 'zipped-file-key.zip'
}
exports.handler = async event => {
const s3Zip = new S3Zip(params);
await s3Zip.process();
return {
statusCode: 200,
body: JSON.stringify(
{
message: 'Zip file successfully!'
}
)
};
}
Zip file util
// s3-zip.js
'use strict';
const fs = require('fs');
const AWS = require("aws-sdk");
const Archiver = require('archiver');
const Stream = require('stream');
const https = require('https');
const sslAgent = new https.Agent({
KeepAlive: true,
rejectUnauthorized: true
});
sslAgent.setMaxListeners(0);
AWS.config.update({
httpOptions: {
agent: sslAgent,
},
region: 'us-east-1'
});
module.exports = class S3Zip {
constructor(params, bucketName = 'default-bucket') {
this.params = params;
this.BucketName = bucketName;
}
async process() {
const { params, BucketName } = this;
const s3 = new AWS.S3({ apiVersion: '2006-03-01', params: { Bucket: BucketName } });
// create readstreams for all the output files and store them
const createReadStream = fs.createReadStream;
const s3FileDwnldStreams = params.files.map(item => {
const stream = s3.getObject({ Key: item.key }).createReadStream();
return {
stream,
fileName: item.fileName
}
});
const streamPassThrough = new Stream.PassThrough();
// Create a zip archive using streamPassThrough style for the linking request in s3bucket
const uploadParams = {
ACL: 'private',
Body: streamPassThrough,
ContentType: 'application/zip',
Key: params.zippedFileKey
};
const s3Upload = s3.upload(uploadParams, (err, data) => {
if (err) {
console.error('upload err', err)
} else {
console.log('upload data', data);
}
});
s3Upload.on('httpUploadProgress', progress => {
// console.log(progress); // { loaded: 4915, total: 192915, part: 1, key: 'foo.jpg' }
});
// create the archiver
const archive = Archiver('zip', {
zlib: { level: 0 }
});
archive.on('error', (error) => {
throw new Error(`${error.name} ${error.code} ${error.message} ${error.path} ${error.stack}`);
});
// connect the archiver to upload streamPassThrough and pipe all the download streams to it
await new Promise((resolve, reject) => {
console.log("Starting upload of the output Files Zip Archive");
streamPassThrough.on('close', resolve());
streamPassThrough.on('end', resolve());
streamPassThrough.on('error', reject());
archive.pipe(streamPassThrough);
s3FileDwnldStreams.forEach((s3FileDwnldStream) => {
archive.append(s3FileDwnldStream.stream, { name: s3FileDwnldStream.fileName })
});
archive.finalize();
}).catch((error) => {
throw new Error(`${error.code} ${error.message} ${error.data}`);
});
// Finally wait for the uploader to finish
await s3Upload.promise();
}
}

The other solutions are great for not so many files (less than ~60). If they handle more files, they just quit into nothing with no errors. This is because they open too many streams.
This solution is inspired by https://gist.github.com/amiantos/16bacc9ed742c91151fcf1a41012445e
It is a working solution, which works well even with many files (+300) and returns a presigned URL to the zip which contains the files.
Main Lambda:
const AWS = require('aws-sdk');
const S3 = new AWS.S3({
apiVersion: '2006-03-01',
signatureVersion: 'v4',
httpOptions: {
timeout: 300000 // 5min Should Match Lambda function timeout
}
});
const archiver = require('archiver');
import stream from 'stream';
const UPLOAD_BUCKET_NAME = "my-s3-bucket";
const URL_EXPIRE_TIME = 5*60;
export async function getZipSignedUrl(event) {
const prefix = `uploads/id123123/}`; //replace this with your S3 prefix
let files = ["12314123.png", "56787567.png"] //replace this with your files
if (files.length == 0) {
console.log("No files to zip");
return result(404, "No pictures to download");
}
console.log("Files to zip: ", files);
try {
files = files.map(file => {
return {
fileName: file,
key: prefix + '/' + file,
type: "file"
};
});
const destinationKey = prefix + '/' + 'uploads.zip'
console.log("files: ", files);
console.log("destinationKey: ", destinationKey);
await streamToZipInS3(files, destinationKey);
const presignedUrl = await getSignedUrl(UPLOAD_BUCKET_NAME, destinationKey, URL_EXPIRE_TIME, "uploads.zip");
console.log("presignedUrl: ", presignedUrl);
if (!presignedUrl) {
return result(500, null);
}
return result(200, presignedUrl);
}
catch(error) {
console.error(`Error: ${error}`);
return result(500, null);
}
}
Helper functions:
export function result(code, message) {
return {
statusCode: code,
body: JSON.stringify(
{
message: message
}
)
}
}
export async function streamToZipInS3(files, destinationKey) {
await new Promise(async (resolve, reject) => {
var zipStream = streamTo(UPLOAD_BUCKET_NAME, destinationKey, resolve);
zipStream.on("error", reject);
var archive = archiver("zip");
archive.on("error", err => {
throw new Error(err);
});
archive.pipe(zipStream);
for (const file of files) {
if (file["type"] == "file") {
archive.append(getStream(UPLOAD_BUCKET_NAME, file["key"]), {
name: file["fileName"]
});
}
}
archive.finalize();
})
.catch(err => {
console.log(err);
throw new Error(err);
});
}
function streamTo(bucket, key, resolve) {
var passthrough = new stream.PassThrough();
S3.upload(
{
Bucket: bucket,
Key: key,
Body: passthrough,
ContentType: "application/zip",
ServerSideEncryption: "AES256"
},
(err, data) => {
if (err) {
console.error('Error while uploading zip')
throw new Error(err);
reject(err)
return
}
console.log('Zip uploaded')
resolve()
}
).on("httpUploadProgress", progress => {
console.log(progress)
});
return passthrough;
}
function getStream(bucket, key) {
let streamCreated = false;
const passThroughStream = new stream.PassThrough();
passThroughStream.on("newListener", event => {
if (!streamCreated && event == "data") {
const s3Stream = S3
.getObject({ Bucket: bucket, Key: key })
.createReadStream();
s3Stream
.on("error", err => passThroughStream.emit("error", err))
.pipe(passThroughStream);
streamCreated = true;
}
});
return passThroughStream;
}
export async function getSignedUrl(bucket: string, key: string, expires: number, downloadFilename?: string): Promise<string> {
const exists = await objectExists(bucket, key);
if (!exists) {
console.info(`Object ${bucket}/${key} does not exists`);
return null
}
let params = {
Bucket: bucket,
Key: key,
Expires: expires,
};
if (downloadFilename) {
params['ResponseContentDisposition'] = `inline; filename="${encodeURIComponent(downloadFilename)}"`;
}
try {
const url = s3.getSignedUrl('getObject', params);
return url;
} catch (err) {
console.error(`Unable to get URL for ${bucket}/${key}`, err);
return null;
}
};

Using streams may be tricky as I'm not sure how you could pipe multiple streams into an object. I've done this several times using standard file object. It's a multistep process and it's quite fast. Remember that Lambda operates in Linux so you have all Linux resources at hand including the system /tmp directory.
Create a sub-directory in /tmp call "transient" or whatever works for you
Use s3.getObject() and write file objects to /tmp/transient
Use the GLOB package to generate an array[] of paths from /tmp/transient
Loop the array and zip.addLocalFile(array[i]);
zip.writeZip('tmp/files.zip');

I've used a similar approach, but I'm facing the issue that some of the files in the generated ZIP file don't have the correct size (and corresponding data). Is there any limitation on the size of the files this code can manage? In my case I'm zipping large files (a few larger than 1GB) and the overall amount of data may reach 10GB.
I do not get any error/warning message, so it seems it all works fine.
Any idea what may be hapenning?

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Read Array of Files and Read Them One By One Nodejs - node.js

Related

read files and read directory in node.js

AWS Lambda unzip from S3 to S3

AWS S3 Loading many files

node.js renaming s3 object via aws-sdk module

Create a zip file on S3 from files on S3 using Lambda Node

Categories

Resources