How to read file from createReadStream in Node.js?

I have a web application that lets users upload an Excel file. When a user uploads one, the app should parse it and return some of the rows the file contains, so the application doesn't need to save the file to its filesystem; parsing the file and returning the rows is the whole job. But the code below, which I wrote this morning, saves the file to the server and then parses it, which I think wastes server resources.
I don't know how to read an Excel file with createReadStream. How can I parse the Excel file directly, without saving it? I am not familiar with fs. Of course I could delete the file after the job is finished, but is there a more elegant way?
import { createWriteStream } from 'fs'
import path from 'path'
import xlsx from 'node-xlsx'

// some graphql code here...

async singleUpload(_, { file }, context) {
  try {
    console.log(file)
    const { createReadStream, filename, mimetype, encoding } = await file
    await new Promise((res) =>
      createReadStream()
        .pipe(createWriteStream(path.join(__dirname, '../uploads', filename)))
        .on('close', res)
    )
    const workSheetsFromFile = xlsx.parse(path.join(__dirname, '../uploads', filename))
    for (const row of workSheetsFromFile[0].data) {
      console.log(row)
    }
    return { filename }
  } catch (e) {
    throw new Error(e)
  }
},

Using the express-fileupload library, which provides a buffer representation of uploaded files (through the data property), combined with ExcelJS, which accepts buffers, will get you there.
See express-fileupload and ExcelJS.
// read from a file
const workbook = new Excel.Workbook();
await workbook.xlsx.readFile(filename);
// ... use workbook

// read from a stream
const workbook = new Excel.Workbook();
await workbook.xlsx.read(stream);
// ... use workbook

// load from buffer // this is what you're looking for
const workbook = new Excel.Workbook();
await workbook.xlsx.load(data);
// ... use workbook
Here's a simplified example:
const app = require('express')();
const fileUpload = require('express-fileupload');
const { Workbook } = require('exceljs');

app.use(fileUpload());

app.post('/', async (req, res) => {
  if (!req.files || Object.keys(req.files).length === 0) {
    return res.status(400).send('No files were uploaded.');
  }
  // The name of the input field (i.e. "myFile") is used to retrieve the uploaded file
  await new Workbook().xlsx.load(req.files.myFile.data);
});

app.listen(3000);
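To actually send the parsed rows back to the client, which is what the original question asks for, the handler above could be extended roughly like this. This is a minimal sketch: the JSON response shape and the use of the first worksheet are my own choices, not part of the original answer.
app.post('/', async (req, res) => {
  if (!req.files || Object.keys(req.files).length === 0) {
    return res.status(400).send('No files were uploaded.');
  }
  // Load the uploaded buffer directly; nothing is written to disk.
  const workbook = new Workbook();
  await workbook.xlsx.load(req.files.myFile.data);
  // row.values is 1-based: index 0 is empty, so slice it off.
  const rows = [];
  workbook.worksheets[0].eachRow((row) => rows.push(row.values.slice(1)));
  res.json({ rows });
});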

var xlsx = require('xlsx')
// var workbook = xlsx.readFile('testSingle.xlsx') // reads from a path on disk
var workbook = xlsx.read(fileObj); // fileObj: the file's contents already in memory (e.g. a Buffer)
You just need to use the xlsx.read method, which parses data that is already in memory, instead of xlsx.readFile, which reads from a path on disk.
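Applied to the original resolver, that means buffering the upload stream in memory first and handing the buffer to xlsx.read. A minimal sketch; the streamToBuffer and parseUpload helpers are mine, and { type: 'buffer' } is SheetJS's documented way of indicating a Node Buffer:
const xlsx = require('xlsx');

// Collect a readable stream into a single Buffer.
function streamToBuffer(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('error', reject);
    stream.on('end', () => resolve(Buffer.concat(chunks)));
  });
}

async function parseUpload(file) {
  const { createReadStream } = await file;
  const buffer = await streamToBuffer(createReadStream());
  // Parse the workbook straight from memory; nothing touches the filesystem.
  const workbook = xlsx.read(buffer, { type: 'buffer' });
  const firstSheet = workbook.Sheets[workbook.SheetNames[0]];
  return xlsx.utils.sheet_to_json(firstSheet, { header: 1 }); // rows as arrays
}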

You can add an event listener before you pipe the data, so you can do something with your file before it is uploaded. It looks like this:
async singleUpload(_, { file }, context) {
  try {
    console.log(file)
    const { createReadStream, filename, mimetype, encoding } = await file
    await new Promise((res) =>
      createReadStream()
        .on('data', (data) => {
          // do something with your data/file
          console.log({ data })
          // your code here
        })
        .pipe(createWriteStream(path.join(__dirname, '../uploads', filename)))
        .on('close', res)
    )
  } catch (e) {
    throw new Error(e)
  }
},
You can see the documentation:
Node.js streams
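Taking that a step further, you can consume the stream yourself and skip the write to disk entirely: node-xlsx's parse() accepts a Buffer as well as a path. A sketch of the resolver along those lines (the Mutation placement mirrors a typical graphql-upload setup and is an assumption here):
import xlsx from 'node-xlsx'

const resolvers = {
  Mutation: {
    async singleUpload(_, { file }, context) {
      const { createReadStream, filename } = await file
      // Buffer the upload in memory instead of piping it to a file on disk.
      const chunks = []
      for await (const chunk of createReadStream()) {
        chunks.push(chunk)
      }
      const workSheets = xlsx.parse(Buffer.concat(chunks))
      for (const row of workSheets[0].data) {
        console.log(row)
      }
      return { filename }
    },
  },
}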

Related

How to pipe a file during download with Puppeteer?

Is it possible to pipe a file while downloading it with Puppeteer?
The code attached is an example of a download with Puppeteer, while the second part shows how I extract a file during download (without Puppeteer).
I want to include part 2 in part 1 somehow.
const page = await browser.newPage(); // skipped other configs
const client = await page.target().createCDPSession(); // set directory of files
await client.send("Page.setDownloadBehavior", {
  behavior: "allow",
  downloadPath: process.cwd() + "\\src\\tempDataFiles\\rami",
});

// array of links from page
const fileUrlArray = await page.$$eval("selector", (files) => {
  return files.map((link) => link.getAttribute("href"));
});

// download files
const filtredFiles = fileUrlArray.filter((url) => url !== null);
for (const file of filtredFiles) {
  await page.click(`[href="${file}"]`);
}
This code works perfectly, but the files are zip archives and I want to extract them before saving.
When I download a file without Puppeteer, the extraction is as in the next code.
(In this case I am not able to use an https request yet, due to lack of knowledge.)
The code that unzips the file directly while downloading without Puppeteer (a simple http request):
const file = fs.createWriteStream(`./src/tempDataFiles/${store}/${fileName}.xml`);
const request = http.get(url, function (response) {
  response.pipe(zlib.createGunzip()).pipe(file);
  file.on("error", async (e) => {
    Log.error(`downloadAndUnZipFile error : ${e}`);
    await fileOnDb.destroy();
    file.end();
  });
  file.on("finish", () => {
    Log.success(`${fileName} download Completed`);
    file.close();
  });
});
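One way to combine the two parts, assuming the archives land in the downloadPath configured above, is to run the same gunzip pipeline over each file once Puppeteer has finished downloading it. A sketch using only Node core modules (the extractDownloadedFile name and the .gz/.xml naming are mine):
const fs = require("fs");
const path = require("path");
const zlib = require("zlib");
const { pipeline } = require("stream/promises");

// Gunzip a downloaded archive into an .xml file next to it, then remove the archive.
async function extractDownloadedFile(downloadDir, archiveName) {
  const archivePath = path.join(downloadDir, archiveName);
  const xmlPath = archivePath.replace(/\.gz$/, ".xml");
  await pipeline(
    fs.createReadStream(archivePath),
    zlib.createGunzip(),
    fs.createWriteStream(xmlPath)
  );
  await fs.promises.unlink(archivePath);
  return xmlPath;
}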

How to upload a string of JSON data into GCS from NodeJS?

I am getting a result set from BigQuery and looping through it, so I have a string (JSON data) in one of the columns that needs to be uploaded to a GCS bucket as a file.
The file content would be something like:
{
"name":"sharath",
"country":"India"
}
I tried using the file.save() method, and also a passthrough stream, but nothing happened (not even an error).
file.save():
for (row of rows) {
  const contents = row.JSON_Content;
  const file = storage.bucket(gcsBucket).file("/" + process.env.FILE_TMP_PATH + fileName + '*.json');
  file.save(contents).then(() => console.messages.push(`file uploaded`));
}
passthroughStream:
for (row of rows) {
  const passthroughStream = new stream.PassThrough();
  passthroughStream.write(contents);
  passthroughStream.end();
  passthroughStream.pipe(file.createWriteStream())
    .on('error', (err) => {
      throw new Error(`File upload failed with error: ${err.message}`);
    })
    .on('finish', () => {
      // The file upload is complete
    });
}
Nothing is working out; these didn't create any file in the GCS bucket. I referred to this document.
My overall code looks like:
//import libraries...
const xxx = {
  myFunction: async () => {
    try {
      // ...get data from BigQuery...
      // ...loop through resultset...
      // ...code not working is illustrated above...
    } catch (err) {
      throw new Error('error occurred');
    }
  }
};
module.exports = xxx;
To save the data to a file, try streaming it (createWriteStream):
const fs = require('fs');
const stream = fs.createWriteStream("/" + process.env.FILE_TMP_PATH + fileName + '*.json', { flags: 'a' });
for (row of rows) {
  stream.write(row.JSON_Content);
}
stream.end();
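If the goal is to end up with the object in the GCS bucket itself rather than in a local file, one thing worth checking in the original code is that the file.save() promises are awaited before the function returns. A minimal sketch with @google-cloud/storage (the per-row object name built from row.id is a placeholder):
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();

async function uploadRows(rows, gcsBucket) {
  for (const row of rows) {
    const contents = row.JSON_Content;
    // row.id is a placeholder for whatever makes the object name unique per row
    const file = storage.bucket(gcsBucket).file(`${process.env.FILE_TMP_PATH}${row.id}.json`);
    // Awaiting here keeps the function alive until GCS has acknowledged the object
    await file.save(contents);
  }
}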

how to create api route that will send a CSV file to the frontend in Next.js

As far as I know (correct me if I'm wrong, please), the flow of downloading a file should be that the frontend makes a call to an API route and everything else happens on the server.
My task was to read from Firestore and write the data to a CSV file. I populated the CSV file with the data, but now when I try to send it to the frontend, the only thing in the file after the download is the first line containing the headers, name and email (the file that was written on my computer is correctly filled with the data). This is my route:
import { NextApiHandler } from "next";
import fs from "fs";
import { stringify } from "csv-stringify";
import { firestore } from "../../firestore";
import { unstable_getServerSession } from "next-auth/next";
import { authOptions } from "./auth/[...nextauth]";
const exportFromFirestoreHandler: NextApiHandler = async (req, res) => {
const session = await unstable_getServerSession(req, res, authOptions);
if (!session) {
return res.status(401).json({ message: "You must be authorized here" });
}
const filename = "guestlist.csv";
const writableStream = fs.createWriteStream(filename);
const columns = ["name", "email"];
const stringifier = stringify({ header: true, columns });
const querySnapshot = await firestore.collection("paprockibrzozowski").get();
await querySnapshot.docs.forEach((entry) => {
stringifier.write([entry.data().name, entry.data().email], "utf-8");
});
stringifier.pipe(writableStream);
const csvFile = await fs.promises.readFile(
`${process.cwd()}/${filename}`,
"utf-8"
);
res.status(200).setHeader("Content-Type", "text/csv").send(csvFile);
};
export default exportFromFirestoreHandler;
Since I await querySnapshot and await readFile, I would expect the entire content of the file to be sent to the frontend. Can you please tell me what I am doing wrong?
Thanks
If anyone else struggles with this same issue, here is the answer, based on the one by Nelloverflowc (thank you for getting me this far). However, the files were not always populated with data; at first I tried it like so:
stringifier.on("close", async () => {
const csvFile = fs.readFileSync(`${process.cwd()}/${filename}`, "utf-8");
res
.status(200)
.setHeader("Content-Type", "text/csv")
.setHeader("Content-Disposition", `attachment; filename=${filename}`)
.send(csvFile);
});
stringifier.end();
The API of https://csv.js.org/ must have changed, because instead of on('finish') it is on('close') now. Reading the file synchronously did the job of always getting the file populated with the correct data, but along with it there was an error:
API resolved without sending a response for /api/export-from-db, this may result in stalled requests.
The solution to that is to convert the file into a readable stream, like so:
try {
  const csvFile = fs.createReadStream(`${process.cwd()}/${filename}`);
  res
    .status(200)
    .setHeader("Content-Type", "text/csv")
    .setHeader("Content-Disposition", `attachment; filename=${filename}`)
    .send(csvFile);
} catch (error) {
  res.status(400).json({ error });
}
Here is the thread and the discussion that helped me:
Node.js send file in response
The await on that forEach is most definitely not doing what you expect it to do; also, you probably shouldn't use await and forEach together.
Either switch to using the sync API of the csv-stringify library, or do something along these lines (assuming the first .get() actually resolves with the actual values from a promise):
[...]
stringifier.pipe(writableStream);
stringifier.on('finish', async () => {
  const csvFile = await fs.promises.readFile(
    `${process.cwd()}/${filename}`,
    "utf-8"
  );
  res.status(200).setHeader("Content-Type", "text/csv").send(csvFile);
});
for (const entry of querySnapshot.docs) {
  stringifier.write([entry.data().name, entry.data().email], "utf-8");
}
stringifier.end();
[...]
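For completeness, the sync API mentioned above also lets you skip the temporary file on disk entirely and send the generated CSV straight from memory. A sketch, assuming csv-stringify v6 where the sync variant is exposed at csv-stringify/sync:
import { stringify } from "csv-stringify/sync";

const exportFromFirestoreHandler = async (req, res) => {
  const querySnapshot = await firestore.collection("paprockibrzozowski").get();
  const records = querySnapshot.docs.map((entry) => [entry.data().name, entry.data().email]);
  // Build the whole CSV string in memory; no file on disk, no stream events to wait for.
  const csv = stringify(records, { header: true, columns: ["name", "email"] });
  res
    .status(200)
    .setHeader("Content-Type", "text/csv")
    .setHeader("Content-Disposition", 'attachment; filename="guestlist.csv"')
    .send(csv);
};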

how to make formidable not save to var/folders on nodejs and express app

I'm using formidable to parse incoming files and store them on AWS S3.
When I was debugging the code, I found out that formidable first saves the file to disk at /var/folders/, and over time some unnecessary files stack up on disk, which could lead to a big problem later.
It was very silly of me to use code without fully understanding it, and now I have to figure out how to either remove the parsed file after saving it to S3, or save it to S3 without storing it on disk.
But the question is, how do I do that?
I would appreciate it if someone could point me in the right direction.
This is how I handle the files:
import formidable, { Files, Fields } from 'formidable';

const form = new formidable.IncomingForm();
form.parse(req, async (err: any, fields: Fields, files: Files) => {
  let uploadUrl = await util
    .uploadToS3({
      file: files.uploadFile,
      pathName: 'myPathName/inS3',
      fileKeyName: 'file',
    })
    .catch((err) => console.log('S3 error =>', err));
});
This is how I solved this problem:
When I parse incoming multipart form data, I have access to all the details of the files, because they have already been parsed and saved to the local disk on the server/my computer. So, using the path given to me by formidable, I unlink/remove each file with Node's built-in fs.unlink function. Of course, I only remove the file after saving it to AWS S3.
This is the code:
import fs from 'fs';
import formidable, { Files, Fields } from 'formidable';

const form = new formidable.IncomingForm();
form.multiples = true;
form.parse(req, async (err: any, fields: Fields, files: Files) => {
  const pathArray = [];
  try {
    const s3Url = await util.uploadToS3(files);
    // do something with the s3Url
    pathArray.push(files.uploadFileName.path);
  } catch (error) {
    console.log(error);
  } finally {
    pathArray.forEach((element: string) => {
      fs.unlink(element, (err: any) => {
        if (err) console.error('error:', err);
      });
    });
  }
})
I also found a solution, which you can take a look at here, but due to the architecture I found it slightly hard to implement without changing my original code (or let's just say I didn't fully understand the given implementation).
I think I found it. According to the docs (see options.fileWriteStreamHandler): "you need to have a function that will return an instance of a Writable stream that will receive the uploaded file data. With this option, you can have any custom behavior regarding where the uploaded file data will be streamed for. If you are looking to write the file uploaded in other types of cloud storages (AWS S3, Azure blob storage, Google cloud storage) or private file storage, this is the option you're looking for. When this option is defined the default behavior of writing the file in the host machine file system is lost."
const form = formidable({
  fileWriteStreamHandler: someFunction,
});
EDIT: My whole code
import formidable from "formidable";
import { Writable } from "stream";
import { Buffer } from "buffer";
import { v4 as uuidv4 } from "uuid";
export const config = {
api: {
bodyParser: false,
},
};
const formidableConfig = {
keepExtensions: true,
maxFileSize: 10_000_000,
maxFieldsSize: 10_000_000,
maxFields: 2,
allowEmptyFiles: false,
multiples: false,
};
// promisify formidable
function formidablePromise(req, opts) {
return new Promise((accept, reject) => {
const form = formidable(opts);
form.parse(req, (err, fields, files) => {
if (err) {
return reject(err);
}
return accept({ fields, files });
});
});
}
const fileConsumer = (acc) => {
const writable = new Writable({
write: (chunk, _enc, next) => {
acc.push(chunk);
next();
},
});
return writable;
};
// inside the handler
export default async function handler(req, res) {
const token = uuidv4();
try {
const chunks = [];
const { fields, files } = await formidablePromise(req, {
...formidableConfig,
// consume this, otherwise formidable tries to save the file to disk
fileWriteStreamHandler: () => fileConsumer(chunks),
});
// do something with the files
const contents = Buffer.concat(chunks);
const bucketRef = storage.bucket("your bucket");
const file = bucketRef.file(files.mediaFile.originalFilename);
await file
.save(contents, {
public: true,
metadata: {
contentType: files.mediaFile.mimetype,
metadata: { firebaseStorageDownloadTokens: token },
},
})
.then(() => {
file.getMetadata().then((data) => {
const fileName = data[0].name;
const media_path = `https://firebasestorage.googleapis.com/v0/b/${bucketRef?.id}/o/${fileName}?alt=media&token=${token}`;
console.log("File link", media_path);
});
});
} catch (e) {
// handle errors
console.log("ERR PREJ ...", e);
}
}
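The original question was about AWS S3 rather than Firebase Storage; the same in-memory buffer can be handed to the AWS SDK instead. A sketch with @aws-sdk/client-s3 (the bucket name and region are placeholders):
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" }); // placeholder region

// contents is the Buffer.concat(chunks) from the handler above
async function uploadBufferToS3(contents, files) {
  await s3.send(
    new PutObjectCommand({
      Bucket: "your-bucket", // placeholder bucket name
      Key: files.mediaFile.originalFilename,
      Body: contents,
      ContentType: files.mediaFile.mimetype,
    })
  );
}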

download and untar file, then check the content; async/await problem, Node.js

I am downloading a file in tar format with the request-promise module. Then I untar that file with the tar module, using async/await syntax.
const list = new Promise(async (resolve, reject) => {
  const filePath = "somedir/myFile.tar.gz";
  if (!fs.existsSync(filePath)) {
    const options = {
      uri: "http://tarFileUrl",
      encoding: "binary"
    };
    try {
      console.log("download and untar");
      const response = await rp.get(options);
      const file = await fs.createWriteStream(filePath);
      file.write(response, 'binary');
      file.on('finish', () => {
        console.log('wrote all data to file');
        // here is the untar process
        tar.x(
          {
            file: filePath,
            cwd: "lists"
          }
        );
        console.log("extracted");
      });
      file.end();
    } catch (e) {
      reject();
    }
    console.log("doesn't exist");
  }
}
// here I check if the file exists: if so, there is no need to download or extract it (the try/catch block above)
// then an array is created which includes the list content line by line
if (fs.existsSync(filePath)) {
  const file = await fs.readFileSync("lists/alreadyExtractedFile.list").toString().match(/[^\r\n]+/g);
  if (file) {
    file.map(name => {
      if (name === checkingName) {
        blackListed = true;
        return resolve(blackListed);
      }
    });
  }
  else {
    console.log("err");
  }
}
The console.log output sequence is like so:
download and untar
file doesn't exist
UnhandledPromiseRejectionWarning: Error: ENOENT: no such file or directory, open '...lists/alreadyExtractedFile.list'
wrote all data to file
extracted
So the file lists/alreadyExtractedFile.list is being checked before it's created. My guess is that I am doing something wrong with async/await. As the console.logs point out, the second checking block somehow runs earlier than the file creation and untarring.
Please help me figure out what I am doing wrong.
Your problem is here
const file = await fs.readFileSync("lists/alreadyExtractedFile.list").toString().match(/[^\r\n]+/g);
the readFileSync function doesn't return a promise, so you shouldn't await it:
const file = fs.readFileSync("lists/alreadyExtractedFile.list")
.toString().match(/[^\r\n]+/g);
This should solve the issue
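If you do want an awaitable read there, fs.promises.readFile is the promise-returning counterpart:
const file = (await fs.promises.readFile("lists/alreadyExtractedFile.list", "utf8"))
  .match(/[^\r\n]+/g);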
You need to call resolve inside the new Promise() callback.
If you are writing a local utility and sync methods are acceptable, you can use sync methods wherever possible (in fs, tar, etc.).
This is a small example in which a small archive from the Node.js repository is asynchronously downloaded, synchronously written and unpacked, and then a file is synchronously read:
'use strict';

const fs = require('fs');
const rp = require('request-promise');
const tar = require('tar');

(async function main() {
  try {
    const url = 'https://nodejs.org/download/release/latest/node-v11.10.1-headers.tar.gz';
    const arcName = 'node-v11.10.1-headers.tar.gz';
    const response = await rp.get({ uri: url, encoding: null });
    fs.writeFileSync(arcName, response, { encoding: null });
    tar.x({ file: arcName, cwd: '.', sync: true });
    const fileContent = fs.readFileSync('node-v11.10.1/include/node/v8-version.h', 'utf8');
    console.log(fileContent.match(/[^\r\n]+/g));
  } catch (err) {
    console.error(err);
  }
})();
