aws sdk Multipart Upload to s3 with node.js - node.js

I am trying to upload large files to a s3 bucket using the node.js aws-sdk.
the V2 method upload integrally uploads the files in a multipart upload.
I want to use the new V3 aws-sdk. What is the way to upload large files in the new version? The method PutObjectCommand doesn't seem to be doing it.
I've seen there are methods such as CreateMultiPartUpload but I can't seem to find a full working example using them.
Thanks in advance.

As of 2021, I would suggest using the lib-storage package, which abstracts a lot of the implementation details.
Sample code:
import { Upload } from "#aws-sdk/lib-storage";
import { S3Client, S3 } from "#aws-sdk/client-s3";
const target = { Bucket, Key, Body };
try {
const parallelUploads3 = new Upload({
client: new S3({}) || new S3Client({}),
tags: [...], // optional tags
queueSize: 4, // optional concurrency configuration
partSize: 5MB, // optional size of each part
leavePartsOnError: false, // optional manually handle dropped parts
params: target,
});
parallelUploads3.on("httpUploadProgress", (progress) => {
console.log(progress);
});
await parallelUploads3.done();
} catch (e) {
console.log(e);
}
Source: https://github.com/aws/aws-sdk-js-v3/blob/main/lib/lib-storage/README.md

Here's what I came up with, to upload a Buffer as a multipart upload, using aws-sdk v3 for nodejs and TypeScript.
Error handling still needs some work (you might want to abort/retry in case of an error), but it should be a good starting point... I have tested this with XML files up to 15MB, and so far so good. No guarantees, though! ;)
import {
CompleteMultipartUploadCommand,
CompleteMultipartUploadCommandInput,
CreateMultipartUploadCommand,
CreateMultipartUploadCommandInput,
S3Client,
UploadPartCommand,
UploadPartCommandInput
} from '#aws-sdk/client-s3'
const client = new S3Client({ region: 'us-west-2' })
export const uploadMultiPartObject = async (file: Buffer, createParams: CreateMultipartUploadCommandInput): Promise<void> => {
try {
const createUploadResponse = await client.send(
new CreateMultipartUploadCommand(createParams)
)
const { Bucket, Key } = createParams
const { UploadId } = createUploadResponse
console.log('Upload initiated. Upload ID: ', UploadId)
// 5MB is the minimum part size
// Last part can be any size (no min.)
// Single part is treated as last part (no min.)
const partSize = (1024 * 1024) * 5 // 5MB
const fileSize = file.length
const numParts = Math.ceil(fileSize / partSize)
const uploadedParts = []
let remainingBytes = fileSize
for (let i = 1; i <= numParts; i ++) {
let startOfPart = fileSize - remainingBytes
let endOfPart = Math.min(partSize, startOfPart + remainingBytes)
if (i > 1) {
endOfPart = startOfPart + Math.min(partSize, remainingBytes)
startOfPart += 1
}
const uploadParams: UploadPartCommandInput = {
// add 1 to endOfPart due to slice end being non-inclusive
Body: file.slice(startOfPart, endOfPart + 1),
Bucket,
Key,
UploadId,
PartNumber: i
}
const uploadPartResponse = await client.send(new UploadPartCommand(uploadParams))
console.log(`Part #${i} uploaded. ETag: `, uploadPartResponse.ETag)
remainingBytes -= Math.min(partSize, remainingBytes)
// For each part upload, you must record the part number and the ETag value.
// You must include these values in the subsequent request to complete the multipart upload.
// https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html
uploadedParts.push({ PartNumber: i, ETag: uploadPartResponse.ETag })
}
const completeParams: CompleteMultipartUploadCommandInput = {
Bucket,
Key,
UploadId,
MultipartUpload: {
Parts: uploadedParts
}
}
console.log('Completing upload...')
const completeData = await client.send(new CompleteMultipartUploadCommand(completeParams))
console.log('Upload complete: ', completeData.Key, '\n---')
} catch(e) {
throw e
}
}

Here is the fully working code with AWS SDK v3
import { Upload } from "#aws-sdk/lib-storage";
import { S3Client, S3 } from "#aws-sdk/client-s3";
import { createReadStream } from 'fs';
const inputStream = createReadStream('clamav_db.zip');
const Bucket = process.env.DB_BUCKET
const Key = process.env.FILE_NAME
const Body = inputStream
const target = { Bucket, Key, Body};
try {
const parallelUploads3 = new Upload({
client: new S3Client({
region: process.env.AWS_REGION,
credentials: { accessKeyId: process.env.AWS_ACCESS_KEY, secretAccessKey: process.env.AWS_SECRET_KEY }
}),
queueSize: 4, // optional concurrency configuration
partSize: 5242880, // optional size of each part
leavePartsOnError: false, // optional manually handle dropped parts
params: target,
});
parallelUploads3.on("httpUploadProgress", (progress) => {
console.log(progress);
});
await parallelUploads3.done();
} catch (e) {
console.log(e);
}

Related

File chunk upload to azure storage blob, file seems broken

I'm trying to upload excel file to azure storage blob in chunks, using the stage block and commitblock from BlobBlockClient Class. File upload seems to success but when i try to download and open the file, there it seems to be broken.
I'm using react and node js to do this. Code follows below
In UI
const chunkSize = (1024 * 1024) * 25; // file chunk size
// here slicing the file and sending it to api method
const fileReader = new FileReader();
const from = currentChunkIndexRef.current * chunkSize;
const to = from + chunkSize;
const blob = file.slice(from, to);
fileReader.onload = ((e: any) => uploadChunksToBlob(e, file, obj));
fileReader.readAsDataURL(blob);
// api method
const uploadChunksToBlob = async (event: any, file: File, obj: any) => {
try {
const totalChunks = Math.ceil(file.size / chunkSize);
const uploadChunkURL = `/upload?currentChunk=${currentChunkIndexRef.current}&totalChunks=${totalChunks}&file=${file.name}&type=${file.type}`;
console.log(event.target.result)
const fileUpload = await fetch(uploadChunkURL, {
method: "POST",
headers: { "Content-Type": "application/octet-stream" },
body: JSON.stringify(event.target.result),
});
const fileUploadJson = await fileUpload.json();
const isLastChunk = (totalChunks - 1) === currentChunkIndexRef.current;
if(!isLastChunk) {
console.log({ Chunk: currentChunkIndexRef.current });
currentChunkIndexRef.current = currentChunkIndexRef.current + 1;
// eslint-disable-next-line #typescript-eslint/no-use-before-define
uploadFileToAzureBlob(file, obj);
} else {
console.log("File Uploaded")
}
//
} catch (error) {
console.log("uploadFileToAzureBlob Catch Error" + error);
}
}
// In Node
const sharedKeyCredential = new StorageSharedKeyCredential(
config.StorageAccountName,
config.StorageAccountAccessKey
);
const pipeline = newPipeline(sharedKeyCredential);
const blobServiceClient = new BlobServiceClient(
`https://${config.StorageAccountName}.blob.core.windows.net`,
pipeline
);
const containerName = getContainerName(req.headers.key, req.headers.clientcode);
const identifier = uuid.v4();
const blobName = getBlobName(identifier, file);
const containerClient = blobServiceClient.getContainerClient(containerName);
const blockBlobClient = containerClient.getBlockBlobClient(blobName);
try {
let bufferObj = Buffer.from(`${file}_${Number(currentChunk)}`, "utf8"); // Create buffer object, specifying utf8 as encoding
let base64String = bufferObj.toString("base64"); // Encode the Buffer as a base64 string
blockIds = [...blockIds, base64String];
const bufferedData = Buffer.from(req.body);
let resultOfUnitArray = new Uint8Array(bufferedData.length);
for (let j = 0; j < bufferedData.length; j++) {
resultOfUnitArray[j] = bufferedData.toString().charCodeAt(j);
} // Converting string to bytes
const stageBlockResponse = await blockBlobClient.stageBlock(base64String, resultOfUnitArray, resultOfUnitArray.length, {
onProgress: (e) => {
console.log("bytes sent: " + e.loadedBytes);
}
});
if ((Number(totalChunks) - 1) === (Number(currentChunk))) {
const commitblockResponse = await blockBlobClient.commitBlockList(blockIds, {blobHTTPHeaders: req.headers});
res.json({ uuid: identifier, message: 'File uploaded to Azure Blob storage.' });
} else {
res.json({ message: `Current Chunks ${currentChunk} is Successfully Uploaded` });
}
} catch (err) {
console.log({ err })
res.json({ message: err.message });
}
I don't know, what i'm doing wrong here.
Any help would be appreciated
Thank you
The problem is that you convert it into dataURL, that’s where things break.
It appears to me that you're under the wrong impression that you need to first encode a blob into string in order to send it. Well, you don't have to, browser fetch API is capable to handle raw binary payload.
So on the client (browser) side, you don’t need to go through FileReader. Just send the chunk blob directly.
const blob = file.slice(from, to);
// ...
fetch(uploadChunkURL, {
method: "POST",
headers: { "Content-Type": "application/octet-stream" },
body: blob,
});
On the server (node.js) side, you'll receive the blob in raw binary form, so you can simply forward that blob untouched to azure storage. There's no need to decode from string and move bytes onto resultOfUnitArray like you currently do.
const base64String = Buffer.from(`${file}_${Number(currentChunk)}`, "utf8").toString("base64");
const bufferedData = Buffer.from(req.body);
const stageBlockResponse = await blockBlobClient.stageBlock(
base64String,
bufferedData,
bufferedData.length
);

Delivering image from S3 to React client via Context API and Express server

I'm trying to download a photo from an AWS S3 bucket via an express server to serve to a react app but I'm not having much luck. Here are my (unsuccessful) attempts so far.
The Workflow is as follows:
Client requests photo after retrieving key from database via Context API
Request sent to express server route (important so as to hide the true location from the client)
Express server route requests blob file from AWS S3 bucket
Express server parses image to base64 and serves to client
Client updates state with new image
React Client
const [profilePic, setProfilePic] = useState('');
useEffect(() => {
await actions.getMediaSource(tempPhoto.key)
.then(resp => {
console.log('server resp: ', resp.data.data.newTest) // returns ����\u0000�\u0000\b\u0006\
const url = window.URL || window.webkitURL;
const blobUrl = url.createObjectURL(resp.data.data.newTest);
console.log("blob ", blobUrl);
setProfilePic({ ...profilePic, image : resp.data.data.newTest });
})
.catch(err => errors.push(err));
}
Context API - just axios wrapped into its own library
getMediaContents = async ( key ) => {
return await this.API.call(`http://localhost:5000/${MEDIA}/mediaitem/${key}`, "GET", null, true, this.state.accessToken, null);
}
Express server route
router.get("/mediaitem/:key", async (req, res, next) => {
try{
const { key } = req.params;
// Attempt 1 was to try with s3.getObject(downloadParams).createReadStream();
const readStream = getFileStream(key);
readStream.pipe(res);
// Attempt 2 - attempt to convert response to base 64 encoding
var data = await getFileStream(key);
var test = data.Body.toString("utf-8");
var container = '';
if ( data.Body ) {
container = data.Body.toString("utf-8");
} else {
container = undefined;
}
var buffer = (new Buffer.from(container));
var test = buffer.toString("base64");
require('fs').writeFileSync('../uploads', test); // it never wrote to this directory
console.log('conversion: ', test); // prints: 77+977+977+977+9AO+/vQAIBgYH - this doesn't look like base64 to me.
delete buffer;
res.status(201).json({ newTest: test });
} catch (err){
next(ApiError.internal(`Unexpected error > mediaData/:id GET -> Error: ${err.message}`));
return;
}
});
AWS S3 Library - I made my own library for using the s3 bucket as I'll need to use more functionality later.
const getFileStream = async (fileKey) => {
const downloadParams = {
Key: fileKey,
Bucket: bucketName
}
// This was attempt 1's return without async in the parameter
return s3.getObject(downloadParams).createReadStream();
// Attempt 2's intention was just to wait for the promise to be fulfilled.
return await s3.getObject(downloadParams).promise();
}
exports.getFileStream = getFileStream;
If you've gotten this far you may have realised that I've tried a couple of things from different sources and documentation but I'm not getting any further. I would really appreciate some pointers and advice on what I'm doing wrong and what I could improve on.
If any further information is needed then just let me know.
Thanks in advance for your time!
Maybe it be useful for you, that's how i get image from S3, and process image on server
Create temporary directory
createTmpDir(): Promise<string> {
return mkdtemp(path.join(os.tmpdir(), 'tmp-'));
}
Gets the file
readStream(path: string) {
return this.s3
.getObject({
Bucket: this.awsConfig.bucketName,
Key: path,
})
.createReadStream();
}
How i process file
async MainMethod(fileName){
const dir = await this.createTmpDir();
const serverPath = path.join(
dir,
fileName
);
await pipeline(
this.readStream(attachent.key),
fs.createWriteStream(serverPath + '.jpg')
);
const createFile= await sharp(serverPath + '.jpg')
.jpeg()
.resize({
width: 640,
fit: sharp.fit.inside,
})
.toFile(serverPath + '.jpeg');
const imageBuffer = fs.readFileSync(serverPath + '.jpeg');
//my manipulations
fs.rmSync(dir, { recursive: true, force: true }); //delete temporary folder
}

Use original file name in AWS s3 uploader

I have implemented a s3 uploader per these instructions https://aws.amazon.com/blogs/compute/uploading-to-amazon-s3-directly-from-a-web-or-mobile-application/
This is the Lambda function code
AWS.config.update({ region: process.env.AWS_REGION })
const s3 = new AWS.S3()
const URL_EXPIRATION_SECONDS = 300
// Main Lambda entry point
exports.handler = async (event) => {
return await getUploadURL(event)
}
const getUploadURL = async function(event) {
const randomID = parseInt(Math.random() * 10000000)
const Key = `${randomID}.jpg`
// Get signed URL from S3
const s3Params = {
Bucket: process.env.UploadBucket,
Key,
Expires: URL_EXPIRATION_SECONDS,
Currently the filename (key) is generated using a random ID.
I would like to change that to use the original filename of the uploaded file.
I tried a couple approaches such as using the the fs.readfile() to get the filename but have not had any luck.
There is a webpage with a form that works in conjunction with the Lambda to upload the file to s3.
How do I get the filename?
If you want to save the file with the original filename, you have to pass that filename as part of the key you use to request the signed url. You don't show how you're getting the file to upload, but if it is part of a web site, you get this from the client.
On the client side you have the user identify the file to upload and pass that to your code that calls getUploadURL(). Maybe in your code it is part of event? Then you send the signed URL back to the client and then the client can send the file to the signed URL.
Therefore to upload a file, your client has to send two requests to your server -- one to get the URL and one to upload the file.
You do mention that you're using fs.readFile() If you're able to get the file with this call, then you already have the file name. All you have to do is pass the same name to getUploadURL() as an additional parameter or as part of event. You may have to parse the filename first or within getUploadURL() if it includes a path to someplace other than your current working directory.
The code above looks like it may be a Lambda that's getting called with some event. If that event is a trigger of some sort that you can include a file name, then you can look pull it from that variable. For example:
const getUploadURL = async function(event) {
const randomID = parseInt(Math.random() * 10000000)
const Key = `${event.fileNameFromTrigger}`
// Get signed URL from S3
const s3Params = {
Bucket: process.env.UploadBucket,
Key,
Expires: URL_EXPIRATION_SECONDS.
...
}
If the file name includes the extension, then you don't need to append that as you were with the random name.
I modified the Lambda
changed this
const randomID = parseInt(Math.random() * 10000000)
const Key = `${randomID}.jpg`
to this
const Key = event.queryStringParameters.filename
And this the frontend code with my endpoint redacted. Note the query ?filename= appended to the endpoint and how I used this.filename = file.name
<script>
const MAX_IMAGE_SIZE = 1000000
/* ENTER YOUR ENDPOINT HERE */
const API_ENDPOINT = '{api-endpoint}/uploads?filename=' // e.g. https://ab1234ab123.execute-api.us-east-1.amazonaws.com/uploads
new Vue({
el: "#app",
data: {
image: '',
uploadURL: '',
filename: ''
},
methods: {
onFileChange (e) {
let files = e.target.files || e.dataTransfer.files
//let filename = files[0].name
if (!files.length) return
this.createImage(files[0])
},
createImage (file) {
// var image = new Image()
let reader = new FileReader()
reader.onload = (e) => {
//console.log(file.name)
console.log('length: ', e.target.result.includes('data:image/jpeg'))
if (!e.target.result.includes('data:image/jpeg')) {
return alert('Wrong file type - JPG only.')
}
if (e.target.result.length > MAX_IMAGE_SIZE) {
return alert('Image is loo large.')
}
this.image = e.target.result
this.filename = file.name
}
reader.readAsDataURL(file)
},
removeImage: function (e) {
console.log('Remove clicked')
this.image = ''
this.filename = ''
},
uploadImage: async function (e) {
console.log('Upload clicked')
// Get the presigned URL
const response = await axios({
method: 'GET',
url: API_ENDPOINT + this.filename
})
console.log('Response: ', response)
console.log('Uploading: ', this.image)
let binary = atob(this.image.split(',')[1])
let array = []
for (var i = 0; i < binary.length; i++) {
array.push(binary.charCodeAt(i))
}
let blobData = new Blob([new Uint8Array(array)], {type: 'image/jpeg'})
console.log('Uploading to: ', response.uploadURL)
const result = await fetch(response.uploadURL, {
method: 'PUT',
body: blobData
})
console.log('Result: ', result)
// Final URL for the user doesn't need the query string params
this.uploadURL = response.uploadURL.split('?')[0]
}
}
})
</script>

How to mark a file private before it's uploaded to Google Cloud Storage?

I'm using #google-cloud/storage package and generating signed url to upload file like this:
const path = require("path");
const { Storage } = require("#google-cloud/storage");
const GOOGLE_CLOUD_KEYFILE = path.resolve(
__dirname + "/../gcloud_media_access.json"
);
const storage = new Storage({
keyFilename: GOOGLE_CLOUD_KEYFILE,
});
exports.uploadUrlGCloud = async (bucketName, key, isPrivate = false) => {
let bucket = storage.bucket(bucketName);
let file = bucket.file(key);
const options = {
version: "v4",
action: "write",
expires: Date.now() + 15 * 60 * 1000 // 15 minutes
};
let signedUrl = (await file.getSignedUrl(options))[0];
if(isPrivate){
await file.makePrivate({strict: true});
}
return signedUrl;
};
However when I call this function like this:
const url = await uploadUrlGCloud(bucket, key, true);
I'm getting 404 api error like this:
ApiError: No such object: testbucket/account/upload/4aac0fb0-92dd-11eb-8723-6b3ad09f80fa_demo.jpg
What I want to ask is is there a way to generate the signedUrl private? Before the file is uploaded, I want to mark it as private and prevent public access.
Edit:
I uploaded a file to the created signed URL, and made makePrivate again to the uploaded file. This time I didn't get any errors. However, when I checked the file again, I realized that is still public.
This is the function I tried to make file private:
const makeFilePrivate = async (bucketName, key) => {
return new Promise((resolve, reject) => {
let bucket = storage.bucket(bucketName);
let file = bucket.file(key);
try {
file.makePrivate({strict: true}, err => {
if(!err) {
resolve(file.isPublic());
} else
reject(err);
})
} catch (err) {
reject(err);
}
})
};
console.log(await makeFilePrivate(bucket, remotePath));
// True
You can't make the objects of a public bucket private due to the way how IAM and ACLs interact with one another.

Read data from .xlsx file on S3 using Nodejs Lambda

I'm still new in NodeJs and AWS, so forgive me if this is a noob question.
I am trying to read the data from an excel file (.xlsx). The lambda function receives the extension of the file type.
Here is my code:
exports.handler = async (event, context, callback) => {
console.log('Received event:', JSON.stringify(event, null, 2));
if (event.fileExt === undefined) {
callback("400 Invalid Input");
}
let returnData = "";
const S3 = require('aws-sdk/clients/s3');
const s3 = new S3();
switch(event.fileExt)
{
case "plain":
case "txt":
// Extract text
const params = {Bucket: 'filestation', Key: 'MyTXT.'+event.fileExt};
try {
await s3.getObject(params, function(err, data) {
if (err) console.log(err, err.stack); // an error occurred
else{ // successful response
returnData = data.Body.toString('utf-8');
context.done(null, returnData);
}
}).promise();
} catch (error) {
console.log(error);
return;
}
break;
case "xls":
case "xlsx":
returnData = "Excel";
// Extract text
const params2 = {Bucket: 'filestation', Key: 'MyExcel.'+event.fileExt};
const readXlsxFile = require("read-excel-file/node");
try {
const doc = await s3.getObject(params2);
const parsedDoc = await readXlsxFile(doc);
console.log(parsedDoc)
} catch (err) {
console.log(err);
const message = `Error getting object.`;
console.log(message);
throw new Error(message);
}
break;
case "docx":
returnData = "Word doc";
// Extract text
break;
default:
callback("400 Invalid Operator");
break;
}
callback(null, returnData);
};
The textfile part works. But the xlsx part makes the function time out.
I did install the read-excel-file dependency and uploaded the zip so that I have access to it.
But the function times out with this message:
"errorMessage": "2020-11-02T13:06:50.948Z 120bfb48-f29c-4e3f-9507-fc88125515fd Task timed out after 3.01 seconds"
Any help would be appreciated! Thanks for your time.
using the xlsx npm library. here's how we did it.
assuming the file is under the root project path.
const xlsx = require('xlsx');
// read your excel file
let readFile = xlsx.readFile('file_example_XLSX_5000.xlsx')
// get first-sheet's name
let sheetName = readFile.SheetNames[0];
// convert sheets to JSON. Best if sheet has a headers specified.
console.log(xlsx.utils.sheet_to_json(readFile.Sheets[sheetName]));
You need to install xlsx (SheetJs) library into the project:
npm install xlsx
and then import the "read" function into the lambda, get the s3 object's body and send to xlsx like this:
const { read } = require('sheetjs-style');
const aws = require('aws-sdk');
const s3 = new aws.S3({ apiVersion: '2006-03-01' });
exports.handler = async (event) => {
const bucketName = 'excel-files';
const fileKey = 'Demo Data.xlsx';
// Simple GetObject
let file = await s3.getObject({Bucket: bucketName, Key: fileKey}).promise();
const wb = read(file.Body);
const response = {
statusCode: 200,
body: JSON.stringify({
read: wb.Sheets,
}),
};
return response;
};
(of course, you can receive the bucket and filekey from parameters if you send them...)
Very Important: Use the READ (not the readFile) function and send the Body property (with capital "B") as a paremeter
I changed the timeout to 20 seconds and it works. Only one issue remains: const parsedDoc = await readXlsxFile(doc); wants to receive a string (filepath) and not a file.
Solved by using xlsx NPM library. Using a stream and giving it buffers.

Resources