I'm trying to download files from GitHub with the fetch() function.
Then I try to save the fetched file stream to disk with the fs module.
When I do, I get this error:
TypeError [ERR_INVALID_ARG_TYPE]: The "transform.writable" property must be an instance of WritableStream. Received an instance of WriteStream
My problem is that I don't know the difference between WriteStream and WritableStream, or how to convert one into the other.
This is the code I run:
async function downloadFile(link, filename = "download") {
var response = await fetch(link);
var body = await response.body;
var filepath = "./" + filename;
var download_write_stream = fs.createWriteStream(filepath);
console.log(download_write_stream.writable);
await body.pipeTo(download_write_stream);
}
Node.js: v18.7.0
You can use Readable.fromWeb to convert body, which is a ReadableStream from the web streams API, into a Node.js Readable stream that can be used with the fs methods.
Note that readable.pipe returns the destination stream immediately, before any data has been written. To wait for it to finish, you can use the promise version of stream.finished, or you could add listeners for the 'finish' and 'error' events to detect success or failure.
const fs = require('fs');
const { Readable } = require('stream');
const { finished } = require('stream/promises');
async function downloadFile(link, filepath = './download') {
const response = await fetch(link);
const body = Readable.fromWeb(response.body);
const download_write_stream = fs.createWriteStream(filepath);
await finished(body.pipe(download_write_stream));
}
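For completeness, here is a small sketch of the event-listener alternative mentioned above; the function name downloadFileWithEvents is just for illustration, and it resolves on 'finish' and rejects on 'error' without using stream/promises.
const fs = require('fs');
const { Readable } = require('stream');
async function downloadFileWithEvents(link, filepath = './download') {
  const response = await fetch(link);
  const body = Readable.fromWeb(response.body);
  const file = fs.createWriteStream(filepath);
  await new Promise((resolve, reject) => {
    body.pipe(file);
    file.on('finish', resolve); // all data has been flushed to disk
    body.on('error', reject);   // network / stream error
    file.on('error', reject);   // file system error
  });
}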
Good question. Web streams are relatively new and are a different way of handling streams. The WritableStream documentation tells us that we can create WritableStreams as follows:
import {
WritableStream
} from 'node:stream/web';
const stream = new WritableStream({
write(chunk) {
console.log(chunk);
}
});
Then, you could create a custom stream that writes each chunk to disk. An easy way could be:
const download_write_stream = fs.createWriteStream('./the_path');
const stream = new WritableStream({
write(chunk) {
download_write_stream.write(chunk);
},
});
async function downloadFile(link, filename = 'download') {
const response = await fetch(link);
const body = response.body; // response.body is already a ReadableStream; no await needed
await body.pipeTo(stream);
}
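One caveat with the snippet above: the underlying fs stream is never ended, and backpressure from the disk is ignored. A minimal sketch that adds a close() handler (createFileSink is a hypothetical helper, not part of the original answer) might look like this:
import fs from 'node:fs';
import { WritableStream } from 'node:stream/web';

// Hypothetical helper: a WritableStream sink that closes the fs stream when done.
function createFileSink(filepath) {
  const fileStream = fs.createWriteStream(filepath);
  return new WritableStream({
    write(chunk) {
      // Respect backpressure by resolving only once the chunk has been written.
      return new Promise((resolve, reject) => {
        fileStream.write(chunk, (err) => (err ? reject(err) : resolve()));
      });
    },
    close() {
      fileStream.end(); // flush remaining data and close the file descriptor
    },
    abort(err) {
      fileStream.destroy(err);
    },
  });
}

async function downloadFile(link, filename = 'download') {
  const response = await fetch(link);
  await response.body.pipeTo(createFileSink('./' + filename));
}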
Related
I'm currently trying to upload an image to Supabase Storage; this looks fairly simple from the docs:
const { data, error } = await supabase.storage
.from('avatars')
.upload('public/avatar1.png', avatarFile)
Unfortunately Supabase expects a File type.
In my API I have a URL that points to the image I want to save. What's the best way for me to get the image at that URL as a File in Node.js?
I have tried this:
let response;
try {
// fetch here is from the isomorphic-unfetch package so I can use it server-side
response = await fetch('https://example.com/image.jpeg');
} catch (err) {
throw new Error(err);
}
let data = await response?.blob();
let metadata = {
type: 'image/jpeg',
};
let file = new File([data], 'test.jpg', metadata);
return file;
But I get a ReferenceError: File is not defined, which leads me to believe only the browser has access to creating a new File().
All I can find are answers about fs, which I think is Google getting confused. I don't think I can use fs to return a File type.
Any ideas?
What you can do is send an HTTP request for the file:
const fs = require('fs');
const http = require('http'); // maybe https?
const fileStream = fs.createWriteStream('image.png');
const request = http.get('URL_HERE', function(response) {
response.pipe(fileStream);
});
The above code fetches the file from the URL and writes it to your server; then you need to read it back and pass it to the upload step.
const finalFile = fs.readFileSync( 'image.png', optionsObject );
Now you have your file contents; do your upload, and don't forget to remove the local file afterwards if it's no longer needed.
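To tie this together with the Supabase upload from the question, a rough sketch (assuming the supabase client from the question is in scope; supabase-js accepts a Buffer server-side) could wait for the download to finish before reading and uploading:
fileStream.on('finish', async function () {
  const finalFile = fs.readFileSync('image.png');
  const { data, error } = await supabase.storage
    .from('avatars')
    .upload('public/avatar1.png', finalFile, { contentType: 'image/jpeg' });
  if (error) console.error(error);
  fs.unlinkSync('image.png'); // remove the temporary file once uploaded
});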
You can do something like this:
const fspromise = require('fs').promises;
let response;
try {
// fetch here is from the isomorphic-unfetch package so I can use it server-side
response = await fetch('https://example.com/image.jpeg');
} catch (err) {
throw new Error(err);
}
let data = await response?.blob();
let metadata = {
type: 'image/jpeg',
};
const file = blob2file(data);
function blob2file(blobData) {
const fd = new FormData();
fd.set('a', blobData);
return fd.get('a');
}
const { data, error } = await supabase.storage
.from('avatars')
.upload('public/avatar1.png', file)
After a lot of research, this isn't actually possible. I've tried a lot of npm packages that advertise being able to convert blobs to Files, but none of them seemed to work.
The only actual solution is to download the file as the other answers have suggested, but in my situation it just wasn't doable.
Given a function that parses incoming streams:
async onData(stream, callback) {
const parsed = await simpleParser(stream)
// Code handling parsed stream here
// ...
return callback()
}
I'm looking for a simple and safe way to 'clone' that stream, so I can save it to a file for debugging purposes, without affecting the code. Is this possible?
Same question in fake code: I'm trying to do something like this. Obviously, this is a made-up example and doesn't work.
const fs = require('fs')
const wstream = fs.createWriteStream('debug.log')
async onData(stream, callback) {
const debugStream = stream.clone(stream) // Fake code
wstream.write(debugStream)
const parsed = await simpleParser(stream)
// Code handling parsed stream here
// ...
wstream.end()
return callback()
}
No, you can't clone a readable stream without consuming it. However, you can pipe it twice: one pipe for creating the file and the other as the 'clone'.
Code is below:
const stream = require('stream');
const Readable = stream.Readable;
const s = new Readable();
s.push('beep');
s.push(null);
const stream1 = s.pipe(new stream.PassThrough());
const stream2 = s.pipe(new stream.PassThrough());
// Use stream1 for creating the file, and use stream2 as the 'clone' of s.
// Here they are just piped to stdout for a quick demonstration.
stream1.pipe(process.stdout);
stream2.pipe(process.stdout);
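Applied to the onData handler from the question, a sketch could look like this (assuming simpleParser from the question accepts any readable stream):
const fs = require('fs');
const { PassThrough } = require('stream');

async function onData(stream, callback) {
  // Fork the incoming stream before anything consumes it.
  const parseStream = stream.pipe(new PassThrough());
  const debugStream = stream.pipe(new PassThrough());

  // One copy goes to disk for debugging.
  debugStream.pipe(fs.createWriteStream('debug.log'));

  // The other copy is parsed as before.
  const parsed = await simpleParser(parseStream);
  // Code handling parsed stream here
  // ...
  return callback();
}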
I've tried to implement the solution provided by @jiajianrong but was struggling to get it to work with a createReadStream, because the Readable throws an error when I try to push the createReadStream directly, like this:
s.push(createReadStream())
To solve this issue I have used a helper function to transform the stream into a buffer.
function streamToBuffer (stream: any) {
const chunks: Buffer[] = []
return new Promise((resolve, reject) => {
stream.on('data', (chunk: any) => chunks.push(Buffer.from(chunk)))
stream.on('error', (err: any) => reject(err))
stream.on('end', () => resolve(Buffer.concat(chunks)))
})
}
Below is the solution I found, using one pipe to generate a hash of the stream and the other pipe to upload the stream to cloud storage.
import stream from 'stream'
const Readable = stream.Readable
const s = new Readable()
s.push(await streamToBuffer(createReadStream()))
s.push(null)
const fileStreamForHash = s.pipe(new stream.PassThrough())
const fileStreamForUpload = s.pipe(new stream.PassThrough())
// Generating file hash
const fileHash = await getHashFromStream(fileStreamForHash)
// Uploading stream to cloud storage
await BlobStorage.upload(fileName, fileStreamForUpload)
My answer is mostly based on jiajianrong's answer.
We're trying to get an audio file from Google Text-to-Speech and save it to Firebase Storage, using a Google Cloud Function. The documentation for Google Text-to-Speech shows how to get an audio file and save it locally:
// Performs the Text-to-Speech request
const [response] = await client.synthesizeSpeech(request);
// Write the binary audio content to a local file
const writeFile = util.promisify(fs.writeFile);
await writeFile('output.mp3', response.audioContent, 'binary');
console.log('Audio content written to file: output.mp3');
This results in the error message Error: EROFS: read-only file system. The Google Cloud Functions file system is read-only, so we can't write the file locally.
Using Firebase Storage bucket.upload() has a few problems:
const destinationPath = 'Audio/Spanish/' + filename + '.ogg';
// Performs the Text-to-Speech request
const [response] = await client.synthesizeSpeech(request);
// response.audioContent is the downloaded file
await bucket.upload(response.audioContent, {
destination: destinationPath
});
The error message is TypeError: Path must be a string. The first parameter of bucket.upload() is documented as "the fully qualified path to the file you wish to upload to your bucket" and is expected to be a string, so response.audioContent doesn't work.
The documentation for bucket.upload() suggests that destination: destinationPath is where we should put the path to the Firebase Storage location. Is this correct?
How do we take the audio file from Google Text-to-Speech (response.audioContent) and get it somewhere that bucket.upload() can reference by a string path? Or should we use something else instead of bucket.upload()?
Here's our full cloud function:
exports.Google_T2S = functions.firestore.document('Users/{userID}/Spanish/T2S_Request').onUpdate((change, context) => {
if (change.after.data().word != undefined) {
// Performs the Text-to-Speech request
async function test() {
try {
const word = change.after.data().word; // the text
const longLanguage = 'Spanish';
const audioFormat = '.mp3';
// copied from https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries#client-libraries-usage-nodejs
const fs = require('fs');
const util = require('util');
const textToSpeech = require('@google-cloud/text-to-speech'); // Imports the Google Cloud client library
const client = new textToSpeech.TextToSpeechClient(); // Creates a client
let myWordFile = word.replace(/ /g,"_"); // replace spaces with underscores in the file name
myWordFile = myWordFile.toLowerCase(); // convert the file name to lower case
myWordFile = myWordFile + audioFormat; // append .mp3 to the file name;
// copied from https://cloud.google.com/blog/products/gcp/use-google-cloud-client-libraries-to-store-files-save-entities-and-log-data
const {Storage} = require('@google-cloud/storage');
const storage = new Storage();
const bucket = storage.bucket('myProject-cd99d.appspot.com');
const destinationPath = 'Audio/Spanish/' + myWordFile;
const request = { // Construct the request
input: {text: word},
// Select the language and SSML Voice Gender (optional)
voice: {languageCode: 'es-ES', ssmlGender: 'FEMALE'},
// Select the type of audio encoding
audioConfig: {audioEncoding: 'MP3'},
};
const [response] = await client.synthesizeSpeech(request);
// Write the binary audio content to a local file
const writeFile = util.promisify(fs.writeFile);
await writeFile('output.mp3', response.audioContent, 'binary');
console.log('Audio content written to file: output.mp3')
// response.audioContent is the downloaded file
await bucket.upload(response.audioContent, {
destination: destinationPath
});
}
catch (error) {
console.error(error);
}
}
test();
} // close if
return 0; // prevents an error message "Function returned undefined, expected Promise or value"
});
file.save() was the answer. util.promisify was unnecessary and caused an error message about the 'original' argument. Here's the finished cloud function:
const functions = require('firebase-functions');
// // Create and Deploy Your First Cloud Functions
// // https://firebase.google.com/docs/functions/write-firebase-functions
//
// exports.helloWorld = functions.https.onRequest((request, response) => {
// response.send("Hello from Firebase!");
// });
async function textToSpeechRequest(change)
{
try
{
const word = change.after.data().word; // the text
const longLanguage = 'Spanish';
const audioFormat = '.mp3';
// copied from https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries#client-libraries-usage-nodejs
const util = require('util');
const textToSpeech = require('@google-cloud/text-to-speech'); // Imports the Google Cloud client library
const client = new textToSpeech.TextToSpeechClient(); // Creates a client
let myWordFile = word.replace(/ /g,"_"); // replace spaces with underscores in the file name
myWordFile = myWordFile.toLowerCase(); // convert the file name to lower case
myWordFile = myWordFile + audioFormat; // append .mp3 to the file name;
// copied from https://cloud.google.com/blog/products/gcp/use-google-cloud-client-libraries-to-store-files-save-entities-and-log-data
const {Storage} = require('@google-cloud/storage');
const storage = new Storage();
const bucket = storage.bucket('myProject-cd99d.appspot.com');
const file = bucket.file('Audio/Spanish/' + myWordFile);
const request = { // Construct the request
input: {text: word},
// Select the language and SSML Voice Gender (optional)
voice: {languageCode: 'es-ES', ssmlGender: 'FEMALE'},
// Select the type of audio encoding
audioConfig: {audioEncoding: 'MP3'},
};
const options = { // construct the file to write
metadata: {
contentType: 'audio/mpeg',
metadata: {
source: 'Google Text-to-Speech'
}
}
};
// copied from https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries#client-libraries-usage-nodejs
const [response] = await client.synthesizeSpeech(request);
// Write the binary audio content to a local file
// response.audioContent is the downloaded file
return await file.save(response.audioContent, options)
.then(() => {
console.log("File written to Firebase Storage.")
return;
})
.catch((error) => {
console.error(error);
});
} // close try
catch (error) {
console.error(error);
} // close catch
} // close async function declaration
exports.Google_T2S = functions.firestore.document('Users/{userID}/Spanish/T2S_Request').onUpdate((change, context) => {
if (change.after.data().word !== undefined)
{
return textToSpeechRequest(change);
} // close if
}); // close Google_T2S
We're getting an error TypeError: [ERR_INVALID_ARG_TYPE]: The "original" argument must be of type function at Object.promisify. This error doesn't appear to affect the cloud function.
To reiterate the things that didn't work: fs.createWriteStream didn't work because Google Cloud Functions can't write to the local file system; instead, the Cloud Storage library provides its own methods that take the place of the Node file system calls. bucket.upload() will upload a local file to a bucket, but the path to the local file has to be a string, not a buffer or a stream coming from an API. file.save() is documented as
Write arbitrary data to a file.
This is a convenience method which wraps File#createWriteStream.
That's what I want! If there's one thing about my data, it's arbitrary. Or maybe contrary by nature. After that we just had to straighten out the contentType (audio/mpeg, not mp3) and the file path.
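For reference, since file.save() wraps File#createWriteStream, a roughly equivalent sketch using the write stream directly (with the same file and options objects as above, inside the same async function) would be:
// Sketch: the File#createWriteStream equivalent of file.save().
await new Promise((resolve, reject) => {
  const writeStream = file.createWriteStream(options);
  writeStream.on('error', reject);
  writeStream.on('finish', resolve);
  writeStream.end(response.audioContent); // write the audio buffer, then close
});
console.log('File written to Firebase Storage.');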
I am trying to create a function where I can pass a file path and read the file asynchronously. What I found out is that it supports streams.
const fs = require('fs');
var parse = require('csv-parse');
var async = require('async');
readCSVData = async (filePath): Promise<any> => {
let csvString = '';
var parser = parse({delimiter: ','}, function (err, data) {
async.eachSeries(data, function (line, callback) {
csvString = csvString + line.join(',')+'\n';
console.log(csvString) // I can see this value getting populated
})
});
fs.createReadStream(filePath).pipe(parser);
}
I got this code from here, but I am new to Node.js, so I am not sure how to use await to get the data once all the lines are parsed.
const csvData = await this.util.readCSVData(path)
My best workaround for this task is:
const csv = require('csvtojson')
const csvFilePath = 'data.csv'
const array = await csv().fromFile(csvFilePath);
This answer provides legacy code that uses the async library. Promise-based control flow with async/await doesn't need that library. Asynchronous processing with async.eachSeries doesn't serve a good purpose inside the csv-parse callback, because that callback is only invoked once data has already been filled with all of the parsed records.
If reading all data into memory is not an issue, the CSV stream can be converted to a promise:
const fs = require('fs');
const getStream = require('get-stream');
const parse = require('csv-parse');
readCSVData = async (filePath): Promise<any> => {
const parseStream = parse({delimiter: ','});
const data = await getStream.array(fs.createReadStream(filePath).pipe(parseStream));
return data.map(line => line.join(',')).join('\n');
}
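With that in place, awaiting the call from the question (a sketch, assuming readCSVData is exposed the same way as in the question) resolves once every line has been parsed and joined:
const csvData = await this.util.readCSVData(path); // all lines joined with '\n'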
I am using pdfkit on my Node server, typically creating PDF files and then uploading them to S3.
The problem is that the pdfkit examples pipe the PDF doc into a Node write stream, which writes the file to disk. I followed the example and it worked correctly, but my requirement now is to pipe the PDF doc to a memory stream rather than save it on disk (I am uploading to S3 anyway).
I've followed some Node memory stream approaches, but none of them seem to work with the PDF pipe; I could only write strings to memory streams.
So my question is: how do I pipe the pdfkit output to a memory stream (or something similar) and then read it as an object to upload to S3?
var fsStream = fs.createWriteStream(outputPath + fileName);
doc.pipe(fsStream);
An updated answer for 2020. There is no need to introduce a new memory stream because "PDFDocument instances are readable Node streams".
You can use the get-stream package to make it easy to wait for the document to finish before passing the result back to your caller.
https://www.npmjs.com/package/get-stream
const PDFDocument = require('pdfkit')
const getStream = require('get-stream')
const pdf = async () => {
const doc = new PDFDocument()
doc.text('Hello, World!')
doc.end()
return await getStream.buffer(doc)
}
// Caller could do this:
const pdfBuffer = await pdf()
const pdfBase64string = pdfBuffer.toString('base64')
You don't have to return a buffer if your needs are different. The get-stream readme offers other examples.
There's no need to use an intermediate memory stream1 – just pipe the pdfkit output stream directly into an HTTP upload stream.
In my experience, the AWS SDK is garbage when it comes to working with streams, so I usually use request.
var upload = request({
method: 'PUT',
url: 'https://bucket.s3.amazonaws.com/doc.pdf',
aws: { bucket: 'bucket', key: ..., secret: ... }
});
doc.pipe(upload);
1 - in fact, it is usually undesirable to use a memory stream because that means buffering the entire thing in RAM, which is exactly what streams are supposed to avoid!
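If you'd rather stay within the AWS SDK, a rough sketch using the v3 Upload helper from @aws-sdk/lib-storage (which accepts a readable stream as Body, so nothing is buffered in your own code) could look like this; the bucket name, key, and region are placeholders:
const PDFDocument = require('pdfkit');
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

const doc = new PDFDocument();
doc.text('Hello, World!');
doc.end();

const upload = new Upload({
  client: new S3Client({ region: 'us-east-1' }),
  params: {
    Bucket: 'bucket',
    Key: 'doc.pdf',
    Body: doc, // the pdfkit document is a readable stream
    ContentType: 'application/pdf',
  },
});
await upload.done(); // resolves once the upload has completed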
You could try something like this, and upload it to S3 inside the end event.
var pdfkit = require('pdfkit');
var MemoryStream = require('memorystream');
var doc = new pdfkit();
var memStream = new MemoryStream(null, {
readable : false
});
doc.pipe(memStream);
doc.on('end', function () {
var buffer = Buffer.concat(memStream.queue);
awsservice.putS3Object(buffer, fileName, fileType, folder).then(function () { }, reject);
})
A tweak of @bolav's answer worked for me when working with pdfmake rather than pdfkit. First you need to add memorystream to your project using npm or yarn.
const MemoryStream = require('memorystream');
const PdfPrinter = require('pdfmake');
const pdfPrinter = new PdfPrinter();
const docDef = {};
const pdfDoc = pdfPrinter.createPdfKitDocument(docDef);
const memStream = new MemoryStream(null, {readable: false});
const pdfDocStream = pdfDoc.pipe(memStream);
pdfDoc.end();
pdfDocStream.on('finish', () => {
console.log(Buffer.concat(memStream.queue));
});
My code to return a base64 for pdfkit:
import * as PDFDocument from 'pdfkit'
import getStream from 'get-stream'
const pdf = {
createPdf: async (text: string) => {
const doc = new PDFDocument()
doc.fontSize(10).text(text, 50, 50)
doc.end()
const data = await getStream.buffer(doc)
let b64 = Buffer.from(data).toString('base64')
return b64
}
}
export default pdf
Thanks to Troy's answer, mine worked with get-stream as well. The difference was that I did not convert it to a base64 string, but rather uploaded it to AWS S3 as a buffer.
Here is my code:
import PDFDocument from 'pdfkit';
import getStream from 'get-stream';
import { PutObjectCommand } from '@aws-sdk/client-s3';
import s3Client from 'your s3 config file';
const pdfGenerator = () => {
const doc = new PDFDocument();
doc.text('Hello, World!');
doc.end();
return doc;
}
const uploadFile = async () => {
const pdf = pdfGenerator();
const pdfBuffer = await getStream.buffer(pdf)
await s3Client.send(
new PutObjectCommand({
Bucket: 'bucket-name',
Key: 'filename.pdf',
Body: pdfBuffer,
ContentType: 'application/pdf',
})
);
}
uploadFile()