How to upload a string of JSON data into GCS from Node.js?

I am getting a result set from BigQuery and looping through it. One of the columns contains a string (JSON data) that needs to be uploaded to a GCS bucket as a file.
File content would be something like
{
"name":"sharath",
"country":"India"
}
I tried using the file.save() method and also a PassThrough stream, but nothing happened (not even an error).
file.save():
for (row of rows) {
  const contents = row.JSON_Content;
  const file = storage.bucket(gcsBucket).file("/" + process.env.FILE_TMP_PATH + fileName + '*.json');
  file.save(contents).then(() => console.messages.push(`file uploaded`));
}
passthroughStream:
for (row of rows) {
  const passthroughStream = new stream.PassThrough();
  passthroughStream.write(contents);
  passthroughStream.end();
  passthroughStream.pipe(file.createWriteStream())
    .on('error', (err) => {
      throw new Error(`File upload failed with error: ${err.message}`);
    })
    .on('finish', () => {
      // The file upload is complete
    });
}
Nothing is working out; neither approach created a file in the GCS bucket. I referred to this document.
My overall code looks like:
//import libraries...
const xxx = {
  myFunction: async () => {
    try {
      // ...get data from BigQuery...
      // ...loop through result set...
      // ...code that is not working is illustrated above...
    } catch (err) {
      throw new Error('error occurred');
    }
  }
};
module.exports = xxx;

To save the data to a file, try streaming it (createWriteStream):
const fs = require('fs');
const stream = fs.createWriteStream("/" + process.env.FILE_TMP_PATH + fileName + '*.json', { flags: 'a' });
for (row of rows) {
  stream.write(row.JSON_Content);
}
stream.end();
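Note that the snippet above writes to the local filesystem rather than GCS. If the goal is to write straight to GCS, file.save() from the question does work, but it returns a promise that has to be awaited before the function returns, and GCS object names are literal strings (a '*' in the name is not expanded). A minimal sketch under those assumptions, reusing storage, gcsBucket and rows from the question; the object naming scheme below is purely illustrative:

// Minimal sketch (assumptions: storage, gcsBucket and rows as in the question;
// the object name is illustrative only — each GCS object needs an explicit name).
async function uploadRows(rows) {
  const uploads = rows.map((row, i) => {
    const objectName = `${process.env.FILE_TMP_PATH}row-${i}.json`; // hypothetical naming
    return storage
      .bucket(gcsBucket)
      .file(objectName)
      .save(row.JSON_Content); // file.save() returns a promise
  });
  // Await every upload so the function does not return before the writes finish
  await Promise.all(uploads);
}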

Related

generated sitemaps are corrupted using sitemap library for node/js

I'm using a library called sitemap to generate files from an array of objects constructed during runtime. My goal is to upload these generated sitemaps to an S3 bucket.
So far, the function is hosted on AWS lambda and uploading generated files correctly to the bucket.
My problem is that the generated sitemaps are corrupted. When I run the function locally, they are generated correctly without any issues.
Here's my handler:
module.exports.handler = async () => {
  try {
    console.log("inside handler....");
    await clearGeneratedSitemapsFromTmpDir();
    const sms = new SitemapAndIndexStream({
      limit: 10000,
      getSitemapStream: (i) => {
        const sitemapStream = new SitemapStream({
          lastmodDateOnly: true,
        });
        const linkPath = `/sitemap-${i + 1}.xml`;
        const writePath = `/tmp/${linkPath}`;
        sitemapStream.pipe(createWriteStream(resolve(writePath)));
        return [new URL(linkPath, hostName).toString(), sitemapStream];
      },
    });
    const data = await generateSiteMap();
    sms.pipe(createWriteStream(resolve("/tmp/sitemap-index.xml")));
    // data.forEach((item) => sms.write(item));
    Readable.from(data).pipe(sms);
    sms.end();
    await uploadToS3();
    await clearGeneratedSitemapsFromTmpDir();
  } catch (error) {
    console.log("🚀 ~ file: index.js ~ line 228 ~ exec ~ error", error);
    Sentry.captureException(error);
  }
};
The data variable holds an array of around 11k items, so according to the code above, two sitemap files should be generated (the first 10k items in one file, the rest in the second), in addition to a sitemap index that lists the two generated sitemaps.
Here's my uploadToS3 function:
const uploadToS3 = async () => {
  try {
    console.log("uploading to s3....");
    const files = await getGeneratedXmlFilesNames();
    for (let i = 0; i < files.length; i += 1) {
      const file = files[i];
      const filePath = `/tmp/${file}`;
      // const stream = createReadStream(resolve(filePath));
      const fileRead = await readFileAsync(filePath, { encoding: "utf-8" });
      const params = {
        Body: fileRead,
        Key: `${file}`,
        ACL: "public-read",
        ContentType: "application/xml",
        ContentDisposition: "inline",
      };
      // const result = await s3Client.upload(params).promise();
      const result = await s3Client.putObject(params).promise();
      console.log(
        "🚀 ~ file: index.js ~ line 228 ~ uploadToS3 ~ result",
        result
      );
    }
  } catch (error) {
    console.log("uploadToS3 => error", error);
    // Sentry.captureException(error);
  }
};
And here's the function that cleans up the generated files from lambda's /tmp directory after upload to S3:
const clearGeneratedSitemapsFromTmpDir = async () => {
  try {
    console.log("cleaning up....");
    const readLocalTempDirDir = await readDirAsync("/tmp");
    const xmlFiles = readLocalTempDirDir.filter((file) =>
      file.includes(".xml")
    );
    for (const file of xmlFiles) {
      await unlinkAsync(`/tmp/${file}`);
      console.log("deleting file....");
    }
  } catch (error) {
    console.log(
      "🚀 ~ file: index.js ~ line 207 ~ clearGeneratedSitemapsFromTmpDir ~ error",
      error
    );
  }
};
My hunch is that the issue is related to streams as I haven't fully understood them yet.
Any help here is highly appreciated.
Side note: I tried to sleep for 10s before uploading, but that didn't work either.
As a workaround, I did this:
const data = await generateSiteMap();
const logger = createWriteStream(resolve("/tmp/all-urls.json.txt"), {
  flags: "a",
});
data.forEach((el) => {
  logger.write(JSON.stringify(el));
  logger.write("\n");
});
logger.end();
const stream = lineSeparatedURLsToSitemapOptions(
  createReadStream(resolve("/tmp/all-urls.json.txt"))
)
  .pipe(sms)
  .pipe(createWriteStream(resolve("/tmp/sitemap-index.xml")));
await new Promise((fulfill) => stream.on("finish", fulfill));
await uploadToS3();
await clearGeneratedSitemapsFromTmpDir();
Will keep question open in case somebody answers it correctly.
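If the corruption comes from uploadToS3 running before the per-sitemap write streams created inside getSitemapStream have flushed to /tmp, one option is to keep a handle on each write stream and await all of them (plus the index file) before uploading. A minimal sketch of that idea, meant to live inside the async handler, reusing hostName, data and uploadToS3 from the question and assuming Node 15+ for finished from stream/promises (on older Node, util.promisify(stream.finished) would do the same):

const { createWriteStream } = require("fs");
const { resolve } = require("path");
const { Readable } = require("stream");
const { finished } = require("stream/promises"); // Node 15+
const { SitemapAndIndexStream, SitemapStream } = require("sitemap");

const fileWrites = []; // one promise per sitemap file being written under /tmp

const sms = new SitemapAndIndexStream({
  limit: 10000,
  getSitemapStream: (i) => {
    const sitemapStream = new SitemapStream({ lastmodDateOnly: true });
    const ws = createWriteStream(resolve(`/tmp/sitemap-${i + 1}.xml`));
    sitemapStream.pipe(ws);
    fileWrites.push(finished(ws)); // resolves once this file is fully flushed
    return [new URL(`/sitemap-${i + 1}.xml`, hostName).toString(), sitemapStream];
  },
});

const indexWs = createWriteStream(resolve("/tmp/sitemap-index.xml"));
sms.pipe(indexWs);
Readable.from(data).pipe(sms);

// Wait for the index file and every sitemap file before touching S3
await Promise.all([finished(indexWs), ...fileWrites]);
await uploadToS3();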

Read data from .xlsx file on S3 using Nodejs Lambda

I'm still new to Node.js and AWS, so forgive me if this is a noob question.
I am trying to read the data from an Excel file (.xlsx). The Lambda function receives the file extension as input.
Here is my code:
exports.handler = async (event, context, callback) => {
  console.log('Received event:', JSON.stringify(event, null, 2));
  if (event.fileExt === undefined) {
    callback("400 Invalid Input");
  }
  let returnData = "";
  const S3 = require('aws-sdk/clients/s3');
  const s3 = new S3();
  switch (event.fileExt) {
    case "plain":
    case "txt":
      // Extract text
      const params = { Bucket: 'filestation', Key: 'MyTXT.' + event.fileExt };
      try {
        await s3.getObject(params, function(err, data) {
          if (err) console.log(err, err.stack); // an error occurred
          else { // successful response
            returnData = data.Body.toString('utf-8');
            context.done(null, returnData);
          }
        }).promise();
      } catch (error) {
        console.log(error);
        return;
      }
      break;
    case "xls":
    case "xlsx":
      returnData = "Excel";
      // Extract text
      const params2 = { Bucket: 'filestation', Key: 'MyExcel.' + event.fileExt };
      const readXlsxFile = require("read-excel-file/node");
      try {
        const doc = await s3.getObject(params2);
        const parsedDoc = await readXlsxFile(doc);
        console.log(parsedDoc);
      } catch (err) {
        console.log(err);
        const message = `Error getting object.`;
        console.log(message);
        throw new Error(message);
      }
      break;
    case "docx":
      returnData = "Word doc";
      // Extract text
      break;
    default:
      callback("400 Invalid Operator");
      break;
  }
  callback(null, returnData);
};
The text file part works, but the xlsx part makes the function time out.
I did install the read-excel-file dependency and uploaded the zip so that I have access to it.
But the function times out with this message:
"errorMessage": "2020-11-02T13:06:50.948Z 120bfb48-f29c-4e3f-9507-fc88125515fd Task timed out after 3.01 seconds"
Any help would be appreciated! Thanks for your time.
Using the xlsx npm library, here's how we did it, assuming the file is under the root project path.
const xlsx = require('xlsx');
// read your excel file
let readFile = xlsx.readFile('file_example_XLSX_5000.xlsx')
// get first-sheet's name
let sheetName = readFile.SheetNames[0];
// convert sheets to JSON. Best if sheet has a headers specified.
console.log(xlsx.utils.sheet_to_json(readFile.Sheets[sheetName]));
You need to install the xlsx (SheetJS) library into the project:
npm install xlsx
and then import the read function into the Lambda, get the S3 object's Body, and pass it to xlsx like this:
const { read } = require('sheetjs-style');
const aws = require('aws-sdk');
const s3 = new aws.S3({ apiVersion: '2006-03-01' });

exports.handler = async (event) => {
  const bucketName = 'excel-files';
  const fileKey = 'Demo Data.xlsx';
  // Simple GetObject
  let file = await s3.getObject({ Bucket: bucketName, Key: fileKey }).promise();
  const wb = read(file.Body);
  const response = {
    statusCode: 200,
    body: JSON.stringify({
      read: wb.Sheets,
    }),
  };
  return response;
};
(of course, you can receive the bucket and filekey from parameters if you send them...)
Very important: use the read function (not readFile) and pass the Body property (with a capital "B") as the parameter.
I changed the timeout to 20 seconds and it works. Only one issue remains: const parsedDoc = await readXlsxFile(doc); expects a string (a file path) and not a file object.
Solved by using the xlsx npm library, reading the data as buffers instead of a file path.
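For reference, a minimal sketch of that buffer approach, reusing the bucket and key naming from the question; the S3 response Body is a Buffer, which xlsx.read accepts directly:

const S3 = require('aws-sdk/clients/s3');
const xlsx = require('xlsx');

const s3 = new S3();

exports.handler = async (event) => {
  const params = { Bucket: 'filestation', Key: 'MyExcel.' + event.fileExt };
  const obj = await s3.getObject(params).promise(); // note .promise(), no callback
  // obj.Body is a Buffer; xlsx.read parses it in memory
  const workbook = xlsx.read(obj.Body, { type: 'buffer' });
  const sheetName = workbook.SheetNames[0];
  return xlsx.utils.sheet_to_json(workbook.Sheets[sheetName]);
};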

How to read file from createReadStream in Node.js?

I have a web application that can upload an Excel file. When a user uploads one, the app should parse it and return some of the rows it contains. So the application doesn't need to save the file to its filesystem; parsing the file and returning the rows is the whole job. But the code below, which I wrote this morning, saves the file to the server and then parses it, which I think wastes server resources.
I don't know how to read an Excel file with createReadStream. How can I parse the Excel file directly, without saving it? I am not familiar with fs; of course, I could delete the file after the job is finished, but is there a more elegant way?
import { createWriteStream } from 'fs'
import path from 'path'
import xlsx from 'node-xlsx'

// some graphql code here...

async singleUpload(_, { file }, context) {
  try {
    console.log(file)
    const { createReadStream, filename, mimetype, encoding } = await file
    await new Promise((res) =>
      createReadStream()
        .pipe(createWriteStream(path.join(__dirname, '../uploads', filename)))
        .on('close', res)
    )
    const workSheetsFromFile = xlsx.parse(path.join(__dirname, '../uploads', filename))
    for (const row of workSheetsFromFile[0].data) {
      console.log(row)
    }
    return { filename }
  } catch (e) {
    throw new Error(e)
  }
},
Using the express-fileupload library, which provides a buffer representation of uploaded files (through the data property), combined with ExcelJS, which accepts buffers, will get you there.
See express-fileupload and ExcelJS.
// read from a file
const workbook = new Excel.Workbook();
await workbook.xlsx.readFile(filename);
// ... use workbook
// read from a stream
const workbook = new Excel.Workbook();
await workbook.xlsx.read(stream);
// ... use workbook
// load from buffer // this is what you're looking for
const workbook = new Excel.Workbook();
await workbook.xlsx.load(data);
// ... use workbook
Here's a simplified example:
const app = require('express')();
const fileUpload = require('express-fileupload');
const { Workbook } = require('exceljs');

app.use(fileUpload());

app.post('/', async (req, res) => {
  if (!req.files || Object.keys(req.files).length === 0) {
    return res.status(400).send('No files were uploaded.');
  }
  // The name of the input field (i.e. "myFile") is used to retrieve the uploaded file
  await new Workbook().xlsx.load(req.files.myFile.data);
});

app.listen(3000);
var xlsx = require('xlsx')
//var workbook = xlsx.readFile('testSingle.xlsx')
var workbook = xlsx.read(fileObj);
You just need to use the xlsx.read method, which reads the data itself (for example a Buffer) rather than a file path.
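A minimal sketch of that idea, assuming the graphql-upload style file object from the question: buffer the upload stream in memory and hand the Buffer to xlsx.read, so nothing is written to disk.

const xlsx = require('xlsx');

async function parseUpload(file) {
  const { createReadStream, filename } = await file;
  // Collect the upload in memory instead of piping it to a file on disk
  const chunks = [];
  for await (const chunk of createReadStream()) {
    chunks.push(chunk);
  }
  const workbook = xlsx.read(Buffer.concat(chunks), { type: 'buffer' });
  const sheetName = workbook.SheetNames[0];
  return { filename, rows: xlsx.utils.sheet_to_json(workbook.Sheets[sheetName]) };
}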
You can add an event listener before you pipe the data, so you can do something with your file before it is uploaded. It looks like this:
async singleUpload(_, { file }, context) {
  try {
    console.log(file)
    const { createReadStream, filename, mimetype, encoding } = await file
    await new Promise((res) =>
      createReadStream()
        .on('data', (data) => {
          // do something with your data/file
          console.log({ data })
          // your code here
        })
        .pipe(createWriteStream(path.join(__dirname, '../uploads', filename)))
        .on('close', res)
    )
    return { filename }
  } catch (e) {
    throw new Error(e)
  }
},
You can see the Node.js stream documentation for more details.

upload to s3 with fileS buffer

I am trying to upload files to S3 in bulk.
If I upload with a callback it works properly, but I want to push all the returned data into an array and then do something afterwards, and that doesn't work.
I looked online and saw answers suggesting async/await or recursion, but it's still not working. I even tried using reduce, with no luck either.
Example using reduce:
return files.reduce((accumulator, current) => {
  const { path, buffer } = current;
  const s3 = new AWS.S3();
  // https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
  s3.putObject(awsS3sdkParams(path, buffer), function (err, data) {
    const { protocol, host } = this.request.httpRequest.endpoint;
    data.params = this.request.params;
    data.params.url = `${protocol}//${host}/${data.params.Key}`;
    return [...accumulator, data];
  });
}, []);
Example using recursion:
const result = [];
const helper = (files) => {
  const { path, buffer } = files[0];
  const s3 = new AWS.S3();
  // https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
  s3.putObject(UploadService.awsS3sdkParams(path, buffer), function (err, data) {
    const { protocol, host } = this.request.httpRequest.endpoint;
    data.params = this.request.params;
    data.params.url = `${protocol}//${host}/${data.params.Key}`;
    UtilsService.clDebug(data, 'data');
    result.push(data);
    files.shift();
    if (files.length > 0) return helper(files);
  });
};
helper(files);
return result;
Example using promises:
const result = [];
for (let { path, buffer } of files) {
  const s3 = new AWS.S3();
  s3.putObject(awsS3sdkParams(path, buffer)).promise()
    .then(file => {
      result.push(file);
    })
    .catch(err => {
      console.log(err, 'errs');
    });
}
I can pretty much understand why result is always [], but how can I make it work?
The reason I cannot use async/await is that when I tried it, the files were either uploaded with bad data (I couldn't even open them) or the keys would end up the same...
Does anyone have any other suggestions or advice?
Thanks in advance for any help.
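A minimal sketch of collecting the results with Promise.all, assuming awsS3sdkParams(path, buffer) returns valid putObject params including Bucket and Key (as in the question). The URL is rebuilt from the params, since this.request from the callback form is not available here; the standard virtual-hosted S3 endpoint is an assumption:

const AWS = require('aws-sdk');

async function uploadAll(files) {
  const s3 = new AWS.S3();
  return Promise.all(
    files.map(async ({ path, buffer }) => {
      const params = awsS3sdkParams(path, buffer);
      const data = await s3.putObject(params).promise();
      data.params = params;
      // Assumed standard S3 endpoint; adjust if a custom endpoint or region-specific URL is needed
      data.params.url = `https://${params.Bucket}.s3.amazonaws.com/${params.Key}`;
      return data;
    })
  );
}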

minizip-asm extract function takes forever to execute

I am trying to fetch an AES-encrypted, password-protected zip file from Google Cloud Storage and extract a CSV file from it. I am using Google Cloud Functions for this, with Node 6.
I've tried using the minizip-asm.js library to extract the file, but it only works intermittently. I am a newbie when it comes to Node, so I would really appreciate some help :).
Here's the relevant piece of code. Could someone help me figure out what's going wrong here?
exports.processFile = (event, callback) => {
  const file = event.data;
  const filename = file.name;
  const projectId = "abc1234";
  const bucketName = "abc_reports";
  const Storage = require('@google-cloud/storage');
  const storage = Storage({
    projectId: projectId
  });
  const folder = storage.bucket(bucketName);
  const minizip = require('minizip-asm.js');
  if (file.metageneration === '1' && filename.match(".zip") != null) {
    // metageneration attribute is updated on metadata changes.
    // on create value is 1
    console.log(`File ${file.name} uploaded.`);
    folder.file(filename).download().then(function(data) {
      console.log('Download of file complete');
      // create csv file
      var csvName = filename.split(".zip")[0] + ".csv";
      var mz = new minizip(data[0]);
      console.log(data[0]);
      console.log(mz.list());
      var extract = mz.extract(mz.list()[0].filepath, {
        password: 'ABC#123'
      });
      console.log("extracted");
      // write unzipped contents to file
      folder.file(csvName).save(extract, function(err) {
        if (!err) {
          console.log("unzipped csv");
        }
        else console.log("Error in saving csv : " + err);
      });
    });
  }
  callback(null, 'Success!');
};
Thanks for the help.
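Separately from minizip-asm itself, one thing worth ruling out: callback(null, 'Success!') runs before the download/extract/save promise chain settles, so the Cloud Function can be frozen or torn down mid-work, which would look like intermittent or never-finishing extraction. A minimal sketch that signals completion only once the chain resolves, keeping the same bucket, password, and minizip usage as in the question:

exports.processFile = (event, callback) => {
  const file = event.data;
  const filename = file.name;
  const Storage = require('@google-cloud/storage');
  const storage = Storage({ projectId: 'abc1234' });
  const folder = storage.bucket('abc_reports');
  const Minizip = require('minizip-asm.js');

  if (file.metageneration !== '1' || filename.match('.zip') === null) {
    return callback(null, 'Skipped');
  }

  folder.file(filename).download()
    .then(([data]) => {
      const mz = new Minizip(data);
      const extract = mz.extract(mz.list()[0].filepath, { password: 'ABC#123' });
      const csvName = filename.split('.zip')[0] + '.csv';
      return folder.file(csvName).save(extract); // save() returns a promise when no callback is given
    })
    .then(() => callback(null, 'Success!'))   // signal success only after the CSV is written
    .catch((err) => callback(err));           // surface any download/extract/save error
};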
