AWS Lambda unzip from S3 to S3 - node.js

I'm trying to write a Lambda function that unzips zip files in one S3 directory and extracts them into another. I had this working in Python, but nobody else in my group likes Python, so I'm converting it to Node.js, which I'm not very good at.
I'm trying to use the unzipper package. I'm able to get a list of the files in the zip using unzipper.Open.s3, but I can't figure out how to stream the files inside the zip back into S3.
The meat of the code looks like this:
const directory = await unzipper.Open.s3(s3, { Bucket: bucket, Key: zip_file });
directory.files.forEach(file => {
  console.log("file name = " + file.path + ", type = " + file.type);
  const key = dir[0] + "/output/" + file.path;
  const params = { Bucket: bucket, Key: key };
  const { writeStream, promise } = uploadStream(params);
  file.stream().pipe(writeStream);
  promise.then(() => {
    console.log('upload completed successfully');
  }).catch((err) => {
    console.log('upload failed.', err.message);
  });
});

const uploadStream = ({ Bucket, Key }) => {
  const pass = new stream.PassThrough();
  return {
    writeStream: pass,
    promise: s3.upload({ Bucket, Key, Body: pass }).promise()
  };
};
I get the console.log line for each file, but neither of the logs in promise.then/.catch ever appears, and no new files show up in S3.

Never mind, I found this code that works better:
exports.handler = async (event) => {
  const params = {
    Key: zip_directory + "/" + zip_file,
    Bucket: input_bucket
  };
  const zip = s3
    .getObject(params)
    .createReadStream()
    .pipe(unzipper.Parse({ forceStream: true }));
  const promises = [];
  let num = 0;
  for await (const e of zip) {
    const entry = e;
    const fileName = entry.path;
    const type = entry.type;
    if (type === 'File') {
      const uploadParams = {
        Bucket: output_bucket,
        Key: output_directory + fileName,
        Body: entry,
      };
      promises.push(s3.upload(uploadParams).promise());
      num++;
    } else {
      entry.autodrain();
    }
  }
  await Promise.all(promises);
};
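For what it's worth, one likely reason the first attempt logged the file names but never finished any uploads is that nothing awaits the upload promises: forEach fires them off and the handler returns, so Lambda can freeze the execution environment before the streams complete. A minimal sketch of the same unzipper.Open.s3 approach that collects and awaits the uploads (reusing the s3, bucket, zip_file and dir variables from the snippet above) would look something like this:

const directory = await unzipper.Open.s3(s3, { Bucket: bucket, Key: zip_file });

// Build one upload promise per file entry so the handler can await them all
// before returning; otherwise Lambda may freeze the environment mid-upload.
const uploads = directory.files
  .filter(file => file.type === 'File')
  .map(file => {
    const key = dir[0] + "/output/" + file.path;
    return s3.upload({ Bucket: bucket, Key: key, Body: file.stream() }).promise();
  });

await Promise.all(uploads);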

Related

How to read JSON from S3 by AWS Lambda Node.js 18.x runtime?

@TBA gave the solution.
The root cause is not the runtime; it comes from SDK v3.
Takeaway: don't change multiple things at once (like the runtime and the SDK version together 🥲).
Thanks again, TBA.
I was using a Lambda on the Node.js 14.x runtime to read a JSON file from S3.
The code, briefly, is below:
const AWS = require("aws-sdk");
const s3 = new AWS.S3();

exports.handler = (event) => {
  const { bucketName, objKey } = event;
  const params = {
    Bucket: bucketName,
    Key: objKey
  };
  return new Promise((resolve) => {
    s3.getObject(params, async (err, data) => {
      if (err) console.log(err, err.stack);
      else {
        const contents = JSON.parse(data.Body);
        resolve(contents);
      }
    });
  });
};
and it returned the JSON data as I expected.
Today I tried to create a new Lambda with the Node.js 18.x runtime, but it returned null or various errors...
Q) Could you give me some advice to solve this 🥲?
+) I used the same JSON file for each Lambda.
+) Not sure why, but in my case data.Body.toString() didn't work (I saw some answers on Stack Overflow suggesting that and tried it, but no luck).
Thanks in advance!
Case A (returns null)
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3Client = new S3Client({ region: "ap-northeast-2" });

export const handler = (event) => {
  const { objKey, bucketName } = event;
  const params = {
    Bucket: bucketName,
    Key: objKey
  };
  const getObjCommand = new GetObjectCommand(params);
  return new Promise((resolve) => {
    s3Client.send(getObjCommand, async (err, data) => {
      if (err) console.log(err, err.stack);
      else {
        const list = JSON.parse(data.Body);
        resolve(list);
      }
    });
  });
};
Case B (returns "Unexpected token o in JSON at position 1")
export const handler = async (event) => {
  const { objKey, bucketName } = event;
  const params = {
    Bucket: bucketName,
    Key: objKey
  };
  const getObjCommand = new GetObjectCommand(params);
  const response = await s3Client.send(getObjCommand);
  console.log("JSON.parse(response.Body)", JSON.parse(response.Body));
};
Case C (returns "TypeError: Converting circular structure to JSON")
export const handler = async (event) => {
  const { objKey, bucketName } = event;
  const params = {
    Bucket: bucketName,
    Key: objKey
  };
  const getObjCommand = new GetObjectCommand(params);
  try {
    const response = await s3Client.send(getObjCommand);
    return JSON.stringify(response.Body);
  } catch (err) {
    console.log("error", err);
    return err;
  }
};
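For context, in SDK v3 GetObjectCommand returns response.Body as a stream rather than a Buffer, so passing it straight to JSON.parse or JSON.stringify fails. A minimal sketch that reads the stream as text before parsing, assuming the object really is UTF-8 JSON, might look like this:

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3Client = new S3Client({ region: "ap-northeast-2" });

export const handler = async (event) => {
  const { objKey, bucketName } = event;
  const response = await s3Client.send(
    new GetObjectCommand({ Bucket: bucketName, Key: objKey })
  );
  // response.Body is a stream in SDK v3; read it into a string before parsing.
  const text = await response.Body.transformToString("utf-8");
  return JSON.parse(text);
};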

Node.js async calls do not run in sequence

I am completely new to Node.js.
I am trying to code the following steps:
Download a file from an AWS S3 folder.
Then upload it to another AWS S3 folder.
I searched online and put together similar code in Node.js, shown below.
What I see is that the downloadFile and uploadFile functions run in parallel, and uploadFile seems to run first.
How can I run them in sequence?
const aws = require('aws-sdk');
var s3 = new aws.S3();
var fs = require('fs');

// TODO implement
var params = { Bucket: "buckets3", Key: "input_pdf_img/Gas_bill_sample.pdf" };
const filename = 'Gas_bill_sample.pdf';
const bucketName = "translation-bucket-qa-v1";
const key = "input_pdf_img/Gas_bill_sample.pdf";
const key2 = "output_pdf2docx_img/" + filename;
//console.log(filename);
const tmp_filename = "/tmp/Gas_bill_sample.pdf";
console.log(filename);

const downloadFile = (tmp_filename, bucketName, key) => {
  const params2 = {
    Bucket: bucketName,
    Key: key
  };
  s3.getObject(params, (err, data) => {
    if (err) console.error(err);
    fs.writeFileSync(tmp_filename, data.Body.toString());
    //console.log(`${filePath} has been created!`);
  });
};
//downloadFile(tmp_filename, bucketName, key);
//console.log('download done');
//await sleep(1000);

//upload
const uploadFile = (tmp_filename) => {
  // Read content from the file
  const fileContent = fs.readFileSync(tmp_filename);
  // Setting up S3 upload parameters
  const params2 = {
    Bucket: bucketName,
    Key: key2, // File name you want to save as in S3
    Body: fileContent
  };
  // Uploading files to the bucket
  s3.upload(params2, function (err, data) {
    if (err) {
      throw err;
    }
    console.log(`File uploaded successfully. ${data.Location}`);
  });
};

downloadFile(tmp_filename, bucketName, key);
console.log('download done');
//setTimeout(() => {console.log("Let the download finish")}, 6000);
uploadFile(tmp_filename);
//setTimeout(() => {console.log("Let the download finish")}, 6000);
I tried timeouts and other workarounds, but nothing helped.
Since the calls run in parallel, the error is "No such file or directory", because uploadFile runs before downloadFile has finished writing the file.
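One way to sequence the two steps is to use the SDK's .promise() form and await each call, so the file is fully written to /tmp before the upload starts. A minimal sketch under that assumption, reusing the same constants as the code above, might look like this:

const aws = require('aws-sdk');
const fs = require('fs');
const s3 = new aws.S3();

const bucketName = "translation-bucket-qa-v1";
const key = "input_pdf_img/Gas_bill_sample.pdf";
const key2 = "output_pdf2docx_img/Gas_bill_sample.pdf";
const tmp_filename = "/tmp/Gas_bill_sample.pdf";

const downloadFile = async (tmp_filename, bucketName, key) => {
  const data = await s3.getObject({ Bucket: bucketName, Key: key }).promise();
  // Write the raw Buffer; calling toString() would corrupt a binary PDF.
  fs.writeFileSync(tmp_filename, data.Body);
};

const uploadFile = async (tmp_filename, bucketName, key2) => {
  const fileContent = fs.readFileSync(tmp_filename);
  const data = await s3.upload({ Bucket: bucketName, Key: key2, Body: fileContent }).promise();
  console.log(`File uploaded successfully. ${data.Location}`);
};

exports.handler = async () => {
  await downloadFile(tmp_filename, bucketName, key);
  console.log('download done');
  await uploadFile(tmp_filename, bucketName, key2);
};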

Extract zip file in S3 Bucket using AWS lambda functions in nodeJs "Error: Invalid CEN header (bad signature)"

I am struggling with unzipping contents in AWS S3. S3 does not provide the functionality to unzip a zip archive in the bucket directly, and I am facing the error below. My upload code is attached as well.
"Error: Invalid CEN header (bad signature)"
Any advice or guidance would be greatly appreciated.
My Node.js code to upload the zip file:
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ signatureVersion: 'v4' });

exports.handler = async (event, context) => {
  const bucket = 'bucket-name';
  console.log(event);
  const body = event.body;
  const key = JSON.parse(body).key;
  console.log(key);
  const params = {
    Bucket: bucket,
    Key: key,
    ContentType: 'application/zip',
    Expires: 60
  };
  try {
    const signedURL = await s3.getSignedUrl('putObject', params);
    const response = {
      err: {},
      body: "url send",
      url: signedURL
    };
    return response;
  } catch (e) {
    const response = {
      err: e.message,
      body: "error occured"
    };
    return response;
  }
};
My Node.js code to extract the zip file:
const S3Unzip = require('s3-unzip');

exports.s3_unzip = function (event, context, callback) {
  const filename = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
  const bucketname = event.Records[0].s3.bucket.name;
  console.log(event.Records[0].s3.object.key);
  new S3Unzip({
    bucket: bucketname,
    file: filename,
    deleteOnSuccess: true,
    verbose: true,
  }, function (err, success) {
    if (err) {
      callback(err);
    } else {
      callback(null);
    }
  });
};
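As a side note, "Invalid CEN header (bad signature)" generally means the extractor cannot find a valid zip central directory, i.e. the object in S3 is not a well-formed zip. With a presigned PUT like the one above, that usually happens when the client sends a transformed body (for example base64 text) instead of the raw zip bytes. A quick, hypothetical check that the uploaded object at least starts with the zip 'PK' signature:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Fetch only the first few bytes and check for the 'PK' local-file-header signature.
async function looksLikeZip(bucket, key) {
  const data = await s3.getObject({ Bucket: bucket, Key: key, Range: 'bytes=0-3' }).promise();
  return data.Body.toString('latin1').startsWith('PK');
}

// looksLikeZip('bucket-name', 'uploads/archive.zip').then(console.log);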

Unresolved Promise Assistance Node.js

I have the following code below, which is a Lambda function that gets content from an S3 zip object. I know for a fact that I am not resolving the list of promises and need a little direction on how to resolve them. I have read several examples on here but am having a hard time applying them to my code. Any assistance would be greatly appreciated.
// dependencies
const AWS = require('aws-sdk');
var JSZip = require('jszip');

// get reference to S3 client
const s3 = new AWS.S3();

exports.handler = async (event, context, callback) => {
  // Read options from the event parameter.
  const srcBucket = event.Records[0].s3.bucket.name;
  // Object key may have spaces or unicode non-ASCII characters.
  const srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

  // Download the file from the S3 source bucket.
  try {
    const params = {
      Bucket: srcBucket,
      Key: srcKey
    };
    const data = await s3.getObject(params).promise();
    var zip = JSZip.loadAsync(data.Body).then(function (content) {
      return content;
    });
    zip.then(function (result) {
      var entries = Object.keys(result.files).map(function (name) {
        if (name.indexOf("TestStatus") != -1) {
          return name;
        }
      }).filter(notUndefined => notUndefined !== undefined);
      var listOfPromises = entries.map(function (entry) {
        return result.file(entry).async("text").then(function (fileContent) {
          return fileContent;
        });
      });
      Promise.all(listOfPromises).then((values) => {
        values.forEach(function (value) {
          console.log(value);
        });
      });
    });
  } catch (error) {
    context.fail(error);
    return;
  }
};
Modified/Corrected code
// dependencies
const AWS = require('aws-sdk');
var JSZip = require('jszip');

// get reference to S3 client
const s3 = new AWS.S3();

exports.handler = async (event, context, callback) => {
  // Read options from the event parameter.
  const srcBucket = event.Records[0].s3.bucket.name;
  // Object key may have spaces or unicode non-ASCII characters.
  const srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

  // Download the file from the S3 source bucket.
  try {
    const params = {
      Bucket: srcBucket,
      Key: srcKey
    };
    const data = await s3.getObject(params).promise();
    var zip = JSZip.loadAsync(data.Body);
    return zip.then(function (result) {
      var entries = Object.keys(result.files).map((name) => {
        if (name.indexOf("TestStatus") != -1) {
          return result.files[name];
        }
      }).filter(notUndefined => notUndefined !== undefined);
      var listOfPromises = entries.map((entry) => {
        return entry.async("text")
          .then((u8) => {
            return [entry.name, u8];
          }).catch(error => console.error(error));
      });
      var promiseOfList = Promise.all(listOfPromises);
      promiseOfList.then(function (list) {
        console.log(list.toString());
      });
    });
  } catch (error) {
    context.fail(error);
    return;
  }
};
If you look closely, you are not returning anything; that is why it stays pending.
const AWS = require('aws-sdk');
var JSZip = require('jszip');

// get reference to S3 client
const s3 = new AWS.S3();

exports.handler = async (event, context, callback) => {
  // Read options from the event parameter.
  const srcBucket = event.Records[0].s3.bucket.name;
  // Object key may have spaces or unicode non-ASCII characters.
  const srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

  // Download the file from the S3 source bucket.
  try {
    const params = {
      Bucket: srcBucket,
      Key: srcKey
    };
    const data = await s3.getObject(params).promise();
    // here is the problem: the promise chain was never returned
    // var zip = JSZip.loadAsync(data.Body).then(function (content){
    //   return content;
    // });
    var zip = JSZip.loadAsync(data.Body);
    return zip.then(function (result) {
      var entries = Object.keys(result.files).map(function (name) {
        if (name.indexOf("TestStatus") != -1) {
          return name;
        }
      }).filter(notUndefined => notUndefined !== undefined);
      var listOfPromises = entries.map(function (entry) {
        return result.file(entry).async("text").then(function (fileContent) {
          return fileContent;
        });
      });
      console.log("Helo");
      return Promise.all(listOfPromises).then((values) => {
        values.forEach(function (value) {
          console.log(value);
        });
        return values;
      });
    });
  } catch (error) {
    context.fail(error);
    return;
  }
};
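Equivalently, under the same assumptions about the event and the zip contents, the handler can be written with async/await throughout, which makes it harder to forget the final return:

const AWS = require('aws-sdk');
const JSZip = require('jszip');
const s3 = new AWS.S3();

exports.handler = async (event) => {
  const srcBucket = event.Records[0].s3.bucket.name;
  const srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

  const data = await s3.getObject({ Bucket: srcBucket, Key: srcKey }).promise();
  const zip = await JSZip.loadAsync(data.Body);

  // Pick the entries of interest and read them all as text.
  const entries = Object.keys(zip.files).filter(name => name.indexOf("TestStatus") !== -1);
  const values = await Promise.all(entries.map(name => zip.file(name).async("text")));

  values.forEach(value => console.log(value));
  return values; // the resolved value becomes the Lambda result
};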

Create a zip file on S3 from files on S3 using Lambda Node

I need to create a zip file that consists of a selection of files (videos and images) located in my S3 bucket.
The problem at the moment, using my code below, is that I quickly hit the memory limit on Lambda.
async.eachLimit(files, 10, function (file, next) {
  var params = {
    Bucket: bucket, // bucket name
    Key: file.key
  };
  s3.getObject(params, function (err, data) {
    if (err) {
      console.log('file', file.key);
      console.log('get image files err', err, err.stack); // an error occurred
    } else {
      console.log('file', file.key);
      zip.file(file.key, data.Body);
      next();
    }
  });
},
function (err) {
  if (err) {
    console.log('err', err);
  } else {
    console.log('zip', zip);
    content = zip.generateNodeStream({
      type: 'nodebuffer',
      streamFiles: true
    });
    var params = {
      Bucket: bucket, // name of dest bucket
      Key: 'zipped/images.zip',
      Body: content
    };
    s3.upload(params, function (err, data) {
      if (err) {
        console.log('upload zip to s3 err', err, err.stack); // an error occurred
      } else {
        console.log(data); // successful response
      }
    });
  }
});
Is this possible using Lambda, or should I look at a different approach?
Is it possible to write to a compressed zip file on the fly, therefore eliminating the memory issue somewhat, or do I need to have the files collected before compression?
Any help would be much appreciated.
Okay, I got to do this today and it works: direct buffer to stream, no disk involved, so memory and disk limitations won't be an issue here:
'use strict';

const AWS = require("aws-sdk");
AWS.config.update({ region: "eu-west-1" });
const s3 = new AWS.S3({ apiVersion: '2006-03-01' });
const _archiver = require('archiver');

// This returns us a stream... consider it a real pipe sending fluid to the S3 bucket. Don't forget it.
const streamTo = (_bucket, _key) => {
  var stream = require('stream');
  var _pass = new stream.PassThrough();
  s3.upload({ Bucket: _bucket, Key: _key, Body: _pass }, (_err, _data) => { /* ...Handle Errors Here */ });
  return _pass;
};

exports.handler = async (_req, _ctx, _cb) => {
  var _keys = ['list of your file keys in s3'];

  var _list = await Promise.all(_keys.map(_key => new Promise((_resolve, _reject) => {
    s3.getObject({ Bucket: 'bucket-name', Key: _key }).promise()
      .then(_data => _resolve({ data: _data.Body, name: `${_key.split('/').pop()}` }));
  }))).catch(_err => { throw new Error(_err); });

  await new Promise((_resolve, _reject) => {
    var _myStream = streamTo('bucket-name', 'fileName.zip'); // Now we instantiate that pipe...
    var _archive = _archiver('zip');
    _archive.on('error', err => { throw new Error(err); });

    // Your promise gets resolved when the fluid stops running... so that's when you get to close and resolve.
    _myStream.on('close', _resolve);
    _myStream.on('end', _resolve);
    _myStream.on('error', _reject);

    _archive.pipe(_myStream); // Pass that pipe to _archive so it can push the fluid straight down to the S3 bucket
    _list.forEach(_itm => _archive.append(_itm.data, { name: _itm.name })); // And then we start adding files to it
    _archive.finalize(); // Tell it that's all we want to add; when it finishes, the promise resolves in one of those events up there
  }).catch(_err => { throw new Error(_err); });

  _cb(null, {}); // Handle response back to server
};
I formatted the code according to @iocoker.
main entry
// index.js
'use strict';

const S3Zip = require('./s3-zip');

const params = {
  files: [
    {
      fileName: '1.jpg',
      key: 'key1.JPG'
    },
    {
      fileName: '2.jpg',
      key: 'key2.JPG'
    }
  ],
  zippedFileKey: 'zipped-file-key.zip'
};

exports.handler = async event => {
  const s3Zip = new S3Zip(params);
  await s3Zip.process();

  return {
    statusCode: 200,
    body: JSON.stringify(
      {
        message: 'Zip file successfully!'
      }
    )
  };
};
Zip file util
// s3-zip.js
'use strict';

const fs = require('fs');
const AWS = require("aws-sdk");
const Archiver = require('archiver');
const Stream = require('stream');
const https = require('https');

const sslAgent = new https.Agent({
  keepAlive: true,
  rejectUnauthorized: true
});
sslAgent.setMaxListeners(0);

AWS.config.update({
  httpOptions: {
    agent: sslAgent,
  },
  region: 'us-east-1'
});

module.exports = class S3Zip {
  constructor(params, bucketName = 'default-bucket') {
    this.params = params;
    this.BucketName = bucketName;
  }

  async process() {
    const { params, BucketName } = this;
    const s3 = new AWS.S3({ apiVersion: '2006-03-01', params: { Bucket: BucketName } });

    // create read streams for all the input files and store them
    const s3FileDwnldStreams = params.files.map(item => {
      const stream = s3.getObject({ Key: item.key }).createReadStream();
      return {
        stream,
        fileName: item.fileName
      };
    });

    const streamPassThrough = new Stream.PassThrough();
    // Create a zip archive using streamPassThrough style for the linking request in the S3 bucket
    const uploadParams = {
      ACL: 'private',
      Body: streamPassThrough,
      ContentType: 'application/zip',
      Key: params.zippedFileKey
    };

    const s3Upload = s3.upload(uploadParams, (err, data) => {
      if (err) {
        console.error('upload err', err);
      } else {
        console.log('upload data', data);
      }
    });
    s3Upload.on('httpUploadProgress', progress => {
      // console.log(progress); // { loaded: 4915, total: 192915, part: 1, key: 'foo.jpg' }
    });

    // create the archiver
    const archive = Archiver('zip', {
      zlib: { level: 0 }
    });
    archive.on('error', (error) => {
      throw new Error(`${error.name} ${error.code} ${error.message} ${error.path} ${error.stack}`);
    });

    // connect the archiver to the upload streamPassThrough and pipe all the download streams into it
    await new Promise((resolve, reject) => {
      console.log("Starting upload of the output Files Zip Archive");

      // pass the handlers themselves, not their return values
      streamPassThrough.on('close', resolve);
      streamPassThrough.on('end', resolve);
      streamPassThrough.on('error', reject);

      archive.pipe(streamPassThrough);
      s3FileDwnldStreams.forEach((s3FileDwnldStream) => {
        archive.append(s3FileDwnldStream.stream, { name: s3FileDwnldStream.fileName });
      });
      archive.finalize();
    }).catch((error) => {
      throw new Error(`${error.code} ${error.message} ${error.data}`);
    });

    // Finally wait for the uploader to finish
    await s3Upload.promise();
  }
};
The other solutions are great for not so many files (fewer than ~60). If they have to handle more files than that, they just quit into nothing with no errors. This is because they open too many streams.
This solution is inspired by https://gist.github.com/amiantos/16bacc9ed742c91151fcf1a41012445e
It is a working solution, which works well even with many files (300+) and returns a presigned URL to a zip containing the files.
Main Lambda:
const AWS = require('aws-sdk');
const S3 = new AWS.S3({
  apiVersion: '2006-03-01',
  signatureVersion: 'v4',
  httpOptions: {
    timeout: 300000 // 5 min; should match the Lambda function timeout
  }
});
const archiver = require('archiver');
import stream from 'stream';

const UPLOAD_BUCKET_NAME = "my-s3-bucket";
const URL_EXPIRE_TIME = 5 * 60;

export async function getZipSignedUrl(event) {
  const prefix = `uploads/id123123`; // replace this with your S3 prefix
  let files = ["12314123.png", "56787567.png"]; // replace this with your files

  if (files.length == 0) {
    console.log("No files to zip");
    return result(404, "No pictures to download");
  }
  console.log("Files to zip: ", files);

  try {
    files = files.map(file => {
      return {
        fileName: file,
        key: prefix + '/' + file,
        type: "file"
      };
    });
    const destinationKey = prefix + '/' + 'uploads.zip';
    console.log("files: ", files);
    console.log("destinationKey: ", destinationKey);

    await streamToZipInS3(files, destinationKey);
    const presignedUrl = await getSignedUrl(UPLOAD_BUCKET_NAME, destinationKey, URL_EXPIRE_TIME, "uploads.zip");
    console.log("presignedUrl: ", presignedUrl);

    if (!presignedUrl) {
      return result(500, null);
    }
    return result(200, presignedUrl);
  } catch (error) {
    console.error(`Error: ${error}`);
    return result(500, null);
  }
}
Helper functions:
export function result(code, message) {
  return {
    statusCode: code,
    body: JSON.stringify(
      {
        message: message
      }
    )
  };
}

export async function streamToZipInS3(files, destinationKey) {
  await new Promise(async (resolve, reject) => {
    var zipStream = streamTo(UPLOAD_BUCKET_NAME, destinationKey, resolve);
    zipStream.on("error", reject);

    var archive = archiver("zip");
    archive.on("error", err => {
      throw new Error(err);
    });
    archive.pipe(zipStream);

    for (const file of files) {
      if (file["type"] == "file") {
        archive.append(getStream(UPLOAD_BUCKET_NAME, file["key"]), {
          name: file["fileName"]
        });
      }
    }
    archive.finalize();
  })
    .catch(err => {
      console.log(err);
      throw new Error(err);
    });
}

function streamTo(bucket, key, resolve) {
  var passthrough = new stream.PassThrough();
  S3.upload(
    {
      Bucket: bucket,
      Key: key,
      Body: passthrough,
      ContentType: "application/zip",
      ServerSideEncryption: "AES256"
    },
    (err, data) => {
      if (err) {
        console.error('Error while uploading zip');
        throw new Error(err);
      }
      console.log('Zip uploaded');
      resolve();
    }
  ).on("httpUploadProgress", progress => {
    console.log(progress);
  });
  return passthrough;
}

function getStream(bucket, key) {
  let streamCreated = false;
  const passThroughStream = new stream.PassThrough();

  passThroughStream.on("newListener", event => {
    if (!streamCreated && event == "data") {
      const s3Stream = S3
        .getObject({ Bucket: bucket, Key: key })
        .createReadStream();
      s3Stream
        .on("error", err => passThroughStream.emit("error", err))
        .pipe(passThroughStream);
      streamCreated = true;
    }
  });
  return passThroughStream;
}

export async function getSignedUrl(bucket: string, key: string, expires: number, downloadFilename?: string): Promise<string> {
  const exists = await objectExists(bucket, key);
  if (!exists) {
    console.info(`Object ${bucket}/${key} does not exist`);
    return null;
  }

  let params = {
    Bucket: bucket,
    Key: key,
    Expires: expires,
  };
  if (downloadFilename) {
    params['ResponseContentDisposition'] = `inline; filename="${encodeURIComponent(downloadFilename)}"`;
  }

  try {
    const url = S3.getSignedUrl('getObject', params);
    return url;
  } catch (err) {
    console.error(`Unable to get URL for ${bucket}/${key}`, err);
    return null;
  }
}
Using streams may be tricky, as I'm not sure how you could pipe multiple streams into an object. I've done this several times using standard file objects. It's a multi-step process and it's quite fast. Remember that Lambda operates on Linux, so you have all Linux resources at hand, including the system /tmp directory.
Create a sub-directory in /tmp, call it "transient" or whatever works for you.
Use s3.getObject() and write the file objects to /tmp/transient.
Use the glob package to generate an array[] of paths from /tmp/transient.
Loop the array and call zip.addLocalFile(array[i]).
zip.writeZip('/tmp/files.zip'). A sketch of these steps follows below.
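Here is a minimal sketch of those steps, assuming the adm-zip and glob packages (zip.addLocalFile and zip.writeZip are adm-zip methods) and hypothetical bucket and key names:

const fs = require('fs');
const path = require('path');
const AWS = require('aws-sdk');
const AdmZip = require('adm-zip');
const glob = require('glob'); // glob v7/v8-style API

const s3 = new AWS.S3();

exports.handler = async () => {
  const workDir = '/tmp/transient';
  fs.mkdirSync(workDir, { recursive: true });

  // Steps 1 and 2: download each object and write it into /tmp/transient.
  const keys = ['videos/a.mp4', 'images/b.jpg']; // hypothetical keys
  for (const key of keys) {
    const data = await s3.getObject({ Bucket: 'my-bucket', Key: key }).promise();
    fs.writeFileSync(path.join(workDir, path.basename(key)), data.Body);
  }

  // Step 3: collect the local paths.
  const files = glob.sync(`${workDir}/*`);

  // Steps 4 and 5: add them to a zip and write it to /tmp.
  const zip = new AdmZip();
  files.forEach(f => zip.addLocalFile(f));
  zip.writeZip('/tmp/files.zip');

  // Finally, push the zip back to S3.
  await s3.upload({
    Bucket: 'my-bucket',
    Key: 'zipped/files.zip',
    Body: fs.createReadStream('/tmp/files.zip')
  }).promise();
};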
I've used a similar approach, but I'm facing an issue where some of the files in the generated ZIP don't have the correct size (and corresponding data). Is there any limitation on the size of the files this code can manage? In my case I'm zipping large files (a few larger than 1 GB) and the overall amount of data may reach 10 GB.
I don't get any error or warning messages, so it seems it all works fine.
Any idea what may be happening?

Resources