Streaming a large remote file - node.js

I need to serve files hosted on a different server from my Node app. The API endpoints are handled by Express.
The goal is to avoid holding the entire file in memory and instead stream the data, so the output is shown to the end user progressively.
By reading the stream API documentation I came up with this solution, in combination with the Express response. Here is the example:
const open = (req, res) => {
  const formattedUrl = new url.URL(
    "https://dl.bdebooks.com/Old%20Bangla%20Books/Harano%20Graher%20Jantra%20Manob%20-%20Shaktimoy%20Biswas.pdf"
  );
  const src = fs.createReadStream(formattedUrl);
  return src.pipe(res);
};
But when I hit this Express endpoint http://localhost:3000/open it throws the following error:
TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file
I would like to display the file content inline! What am I doing wrong? Any suggestions will be greatly appreciated. :)

fs.createReadStream() operates on the file system; it does not accept an http or https URL. Instead, you need to use something like http.get() (or https.get() for an https URL like this one) to make the request, which gives you a readable response stream that you can then pipe from.
const open = (req, res) => {
  const formattedUrl = new url.URL("https://dl.bdebooks.com/Old%20Bangla%20Books/Harano%20Graher%20Jantra%20Manob%20-%20Shaktimoy%20Biswas.pdf");
  https.get(formattedUrl, (stream) => {
    stream.pipe(res);
  }).on('error', (err) => {
    // send some sort of error response here
  });
};
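Since the goal is to display the file content inline, you may also want to forward the upstream content type and set a Content-Disposition header before piping. A rough sketch building on the answer above (the application/pdf fallback and the 502 error handling are suggestions, not part of the original answer):

const https = require('https');

const open = (req, res) => {
  const fileUrl = 'https://dl.bdebooks.com/Old%20Bangla%20Books/Harano%20Graher%20Jantra%20Manob%20-%20Shaktimoy%20Biswas.pdf';
  https.get(fileUrl, (upstream) => {
    // Reuse the remote server's content type and ask the browser to render the file inline.
    res.setHeader('Content-Type', upstream.headers['content-type'] || 'application/pdf');
    res.setHeader('Content-Disposition', 'inline');
    upstream.pipe(res);
  }).on('error', (err) => {
    // Assumption: a 502 is an acceptable way to report the upstream failure.
    res.status(502).end('Could not fetch the remote file');
  });
};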

Related

Download file from /tmp to client in Google Cloud Function

I am trying to download a file created by an HTTP Google Cloud Function and saved in the /tmp directory. Everything I try throws "Error: could not handle the request".
The file my code generates is saved at /tmp/output.wav and I can use fs.readdirSync('/tmp') to see the file. But if I try res.download('/tmp/output.wav', 'output.wav') it throws the "Error: could not handle the request".
If I do res.send( fs.readdirSync('/tmp')[0] ); I can see the file. So I know it's there and readable. Why can't I get it to download to the client?
I checked the logs and there's nothing additional. This code also works when executed locally on my machine.
I am wondering if this is due to a quota limitation. Would I get a better error if so?
Full code:
exports.master = async (req, res) => {
  const fs = require('fs');
  // Some code to make the file at /tmp/output.wav
  fs.readdirSync('/tmp').forEach(file => {
    console.log(file);
    files.push(file);
  });
  // Code will fire up to this point
  res.download('/tmp/output.wav', 'output.wav');
};
Your code is working on my end:
exports.master = async (req, res) => {
  const fs = require('fs');
  // Some code to make the file at /tmp directory
  fs.writeFileSync('/tmp/output.txt', 'testing output');
  var files = [{}];
  fs.readdirSync('/tmp', 'utf8').forEach(file => {
    console.log(file);
    files.push(file);
  });
  res.download('/tmp/output.txt', 'output.txt');
};
I believe the problem is in your async function, in the code that creates the files, or in the file(s) themselves.
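One way to get a more useful error than "could not handle the request" is to pass the optional callback that Express's res.download() accepts; it receives the underlying error. A small sketch using the path from the question (the logging and the 500 response are just suggestions):

res.download('/tmp/output.wav', 'output.wav', (err) => {
  if (err) {
    // The real reason (ENOENT, permissions, headers already sent, ...) shows up here and in the function logs.
    console.error('download failed:', err);
    if (!res.headersSent) {
      res.status(500).send('Could not send the file');
    }
  }
});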

Pipe data chunks as a Response to a clients terminal

My question is about Node.js piping. My backend looks like this: there is a simple route, the route calls a function and passes it a file path to an executable. That file is then run with childProcess.spawn, and there is data output that I can console.log:
const express = require("express");
const app = express();
// etc...

const runExecutable = (executableFile) => {
  const runFile = childProcess.spawn(executableFile);
  runFile.stdout.on('data', function(data) {
    console.log("DATA", data);
  });
  runFile.on('exit', function(code, signal) {
    // [some code here]
  });
};

app.get('/example', (req, res) => {
  var file = "./testFile.exe";
  runExecutable(file);
});
My question is: how can I pipe this output (the data chunks) to the client in real time? It's important for them to get the data as it comes out, rather than for me to write it to a file and send them the whole thing. One more thing to note: the client is accessing my route through curl (curl 123.45.678.901/example) in their terminal, and I want to pipe the data to their terminal.
From reading around, I know that, for example, the request module can do request.get(url).pipe(res) (with an Express res), so I'm wondering whether that is similar to what I need to do.
Thanks all!
Found the answer: any readable stream can be piped (readable.pipe(destination[, options])). childProcess.spawn(executableFile) itself is not a stream, but once the file starts executing, its stdout is a readable stream that emits "data" events. So if you are looking at these chunks of "data" coming out, like I am, like this:
runFile.stdout.on('data', function(data) {
  console.log("DATA", data);
})
then that's the stream you use, and that's where you attach the pipe.
The Node documentation basically says: attach .pipe to the stream and send it to its destination. Since I wanted to send these chunks of data to my client, I also had to pass res around, so my code now looks like this:
const express = require("express");
const app = express();
// etc...

const runExecutable = (executableFile, res) => {
  const runFile = childProcess.spawn(executableFile);
  runFile.stdout.on('data', function(data) {
    console.log("DATA", data);
  }).pipe(res); // .on() returns stdout itself, so this pipes stdout into the response
  runFile.on('exit', function(code, signal) {
    // [some code here]
  });
};

app.get('/example', (req, res) => {
  var file = "./testFile.exe";
  runExecutable(file, res);
});
and it works! I hope this is helpful to others - Thanks for the help Lee!
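Since stdout is itself a readable stream, a slightly leaner sketch of the same idea is to pipe it directly, without the separate data listener (the ./testFile.exe path is just the example from the question, and the error handling is an optional addition):

const express = require("express");
const childProcess = require("child_process");
const app = express();

app.get('/example', (req, res) => {
  const child = childProcess.spawn("./testFile.exe");
  // Pipe the process output to the curl client chunk by chunk, as it is produced.
  child.stdout.pipe(res);
  // If the executable cannot be started, tell the client instead of leaving the request hanging.
  child.on('error', (err) => {
    res.status(500).end(`Failed to run executable: ${err.message}`);
  });
});

app.listen(3000);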

Stream node requests to the cloud with file metadata

I'm using Koa to build a web app, and I want to allow users to upload files to it. The files need to be streamed to the cloud, but I would like to avoid saving them locally.
The problem is that I need some file metadata before I pipe the upload stream to the writable stream. I want to have the MIME type and, optionally, attach other data like the original file name.
I tried sending the binary data with the request's "content-type" header set to the file's type, but I would like the request to have the content type application/octet-stream so that I can know in the back end how to handle the request.
I read somewhere that the better option would be to use multipart/form-data, but I'm not sure how to structure the request, or how to parse the metadata so I can notify the cloud before I pipe to its write stream.
Here is the code I'm currently using. Basically, it just pipes the request as is, and I use the request header to know the type of the file:
module.exports = async ctx => {
  // Generate a random id that will be part of the filename.
  const id = pushid();
  // Get the content type from the header.
  const contentType = ctx.header['content-type'];
  // Get the extension for the file from the content type.
  const ext = contentType.split('/').pop();
  // This is the configuration for the upload stream to the cloud.
  const uploadConfig = {
    // I must specify a content type, or know the file extension.
    contentType
    // there is some other stuff here but it's not relevant.
  };
  // Create an upload stream for the cloud storage.
  const uploadStream = bucket
    .file(`assets/${id}/original.${ext}`)
    .createWriteStream(uploadConfig);
  // Here is what took me hours to get to work... dev life is hard
  ctx.req.pipe(uploadStream);
  // Return a promise so Koa doesn't shut down the request before it's finished uploading.
  return new Promise((resolve, reject) =>
    uploadStream.on('finish', resolve).on('error', reject)
  );
};
Please assume I don't know much about the uploading protocols and managing streams.
OK, so after a lot of searching I found out that there is a parser that works with streams called busboy. It is pretty easy to use, but before jumping into the code I highly suggest that everyone dealing with multipart/form-data requests read this article.
Here is how I solved it:
const Busboy = require('busboy');

module.exports = async ctx => {
  // Init busboy with the headers of the "raw" request.
  const busboy = new Busboy({ headers: ctx.req.headers });
  busboy.on('file', (fieldname, stream, filename, encoding, contentType) => {
    const id = pushid();
    const ext = path.extname(filename);
    const uploadStream = bucket
      .file(`assets/${id}/original${ext}`)
      .createWriteStream({
        contentType,
        resumable: false,
        metadata: {
          cacheControl: 'public, max-age=3600'
        }
      });
    stream.pipe(uploadStream);
  });
  // Pipe the request to busboy.
  ctx.req.pipe(busboy);
  // Return a promise that resolves to whatever you want.
  ctx.body = await new Promise(resolve => {
    busboy.on('finish', () => {
      resolve('done');
    });
  });
};
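If you also want extra metadata such as the original file name, busboy emits a 'field' event for ordinary form fields, so the client can include them in the same multipart/form-data request (ideally before the file part, since parts are parsed in the order they were sent). A rough sketch, assuming the same busboy instance as above and a hypothetical originalName field:

const fields = {};
busboy.on('field', (fieldname, value) => {
  // e.g. fields.originalName, fields.description, ...
  fields[fieldname] = value;
});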

Parse Remote CSV File using Nodejs / Papa Parse?

I am currently working on parsing a remote csv product feed from a Node app and would like to use Papa Parse to do that (as I have had success with it in the browser in the past).
Papa Parse Github: https://github.com/mholt/PapaParse
My initial attempts and web searching haven't turned up exactly how this would be done. The Papa readme says that Papa Parse is now compatible with Node, and as such Baby Parse (which used to provide some of the Node parsing functionality) has been deprecated.
Here's a link to the Node section of the docs for anyone stumbling on this issue in the future: https://github.com/mholt/PapaParse#papa-parse-for-node
From that doc paragraph it looks like Papa Parse in Node can parse a readable stream instead of a File. My question is:
Is there any way to utilize the readable stream functionality to have Papa download and parse a remote CSV in Node, somewhat similar to how Papa in the browser uses XMLHttpRequest to accomplish that same goal?
For Future Visibility
For those searching on the topic (and to avoid repeating a similar question): attempting to utilize the remote-file parsing functionality described here: http://papaparse.com/docs#remote-files will result in the following error in your console:
"Unhandled rejection ReferenceError: XMLHttpRequest is not defined"
I have opened an issue on the official repository and will update this Question as I learn more about the problems that need to be solved.
After lots of tinkering I finally got a working example of this using asynchronous streams and with no additional libraries (except fs/request). It works for remote and local files.
I needed to create a data stream, as well as a PapaParse stream (using papa.NODE_STREAM_INPUT as the first argument to papa.parse()), then pipe the data into the PapaParse stream. Event listeners need to be implemented for the data and finish events on the PapaParse stream. You can then use the parsed data inside your handler for the finish event.
See the example below:
const papa = require("papaparse");
const request = require("request");

const options = { /* options */ };

const dataStream = request.get("https://example.com/myfile.csv");
const parseStream = papa.parse(papa.NODE_STREAM_INPUT, options);

dataStream.pipe(parseStream);

let data = [];
parseStream.on("data", chunk => {
  data.push(chunk);
});

parseStream.on("finish", () => {
  console.log(data);
  console.log(data.length);
});
The data event for the parseStream happens to run once for each row in the CSV (though I'm not sure this behaviour is guaranteed). Hope this helps someone!
To use a local file instead of a remote file, you can do the same thing except the dataStream would be created using fs:
const dataStream = fs.createReadStream("./myfile.csv");
(You may want to use path.join and __dirname to specify a path relative to where the file is located rather than relative to where it was run)
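For example, a small sketch of that (assuming the CSV sits next to the script):

const path = require("path");
const fs = require("fs");

const dataStream = fs.createReadStream(path.join(__dirname, "myfile.csv"));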
OK, so I think I have an answer to this. But I guess only time will tell. Note that my file is .txt with tab delimiters.
var fs = require('fs');
var Papa = require('papaparse');

var file = './rawData/myfile.txt';
// When the file is a local file we need to read its contents first.
// This step may not be necessary when uploading via a UI.
var content = fs.readFileSync(file, "utf8");

var rows;
Papa.parse(content, {
  header: false,
  delimiter: "\t",
  complete: function(results) {
    //console.log("Finished:", results.data);
    rows = results.data;
  }
});
Actually, you could use a lightweight stream transformation library called scramjet; parsing CSV straight from an http stream is one of my main examples. It also uses PapaParse to parse CSVs.
Everything you wrote above, with any transforms in between, can be done in just a couple of lines:
const {StringStream} = require("scramjet");
const request = require("request");

request.get("https://srv.example.com/main.csv")     // fetch csv
  .pipe(new StringStream())                         // pass to stream
  .CSVParse()                                       // parse into objects
  .consume(object => console.log("Row:", object))   // do whatever you like with the objects
  .then(() => console.log("all done"))
In your own example you're saving the file to disk, which is not necessary even with PapaParse.
I am adding this answer (and will update it as I progress) in case anyone else is still looking into this.
It seems like previous users have ended up downloading the file first and then processing it. This SHOULD NOT be necessary, since Papa Parse should be able to process a read stream and it should be possible to pipe an 'http' GET into that stream.
Here is one instance of someone discussing what I am trying to do and falling back to downloading the file and then parsing it: https://forums.meteor.com/t/processing-large-csvs-in-meteor-js-with-papaparse/32705/4
Note: in the above, Baby Parse is discussed; now that Papa Parse works with Node, Baby Parse has been deprecated.
Download File Workaround
While downloading and then parsing with Papa Parse is not an answer to my question, it is the only workaround I have as of now, and someone else may want to use this methodology.
My code to download and then parse currently looks something like this:
// Papa Parse for parsing CSV files
var Papa = require('papaparse');
// HTTP and FS to enable Papa Parse to download remote CSVs via node streams.
var http = require('http');
var fs = require('fs');

var destinationFile = "yourdestination.csv";

var download = function(url, dest, cb) {
  var file = fs.createWriteStream(dest);
  var request = http.get(url, function(response) {
    response.pipe(file);
    file.on('finish', function() {
      file.close(cb); // close() is async, call cb after close completes.
    });
  }).on('error', function(err) { // Handle errors
    fs.unlink(dest, function() {}); // Delete the file async. (But we don't check the result.)
    if (cb) cb(err.message);
  });
};

// Parse the downloaded file once the download callback fires.
var parseMe = function() {
  var content = fs.readFileSync(destinationFile, "utf8");
  Papa.parse(content, {
    header: true,
    dynamicTyping: true,
    step: function(row) {
      console.log("Row:", row.data);
    },
    complete: function() {
      console.log("All done!");
    }
  });
};

download(feedURL, destinationFile, parseMe);
http(s).get() actually passes a readable stream (the response) to its callback, so here is a simple solution:
// Inside an async function:
try {
  var streamHttp = await new Promise((resolve, reject) =>
    https.get("https://example.com/yourcsv.csv", (res) => {
      resolve(res);
    }).on('error', reject)
  );
} catch (e) {
  console.log(e);
}

Papa.parse(streamHttp, config);
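The config object is not shown above; a rough sketch of what it might contain, using the usual Papa Parse callbacks (the specific options are just an illustration):

const config = {
  header: true,
  step: (row) => {
    console.log("Row:", row.data); // called once per parsed row
  },
  complete: () => {
    console.log("All done!");
  },
  error: (err) => {
    console.error(err);
  }
};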
const Papa = require("papaparse");
const { StringStream } = require("scramjet");
const request = require("request");

const req = request
  .get("https://example.com/yourcsv.csv")
  .pipe(new StringStream());

Papa.parse(req, {
  header: true,
  complete: (result) => {
    console.log(result);
  },
});
David Liao's solution worked for me; I did tweak it a little bit since I am using a local file. He did not include an example of how to solve the file access in Node if you get an Error: ENOENT: no such file or directory message in your console.
To check your actual working directory and to understand where you must point your path, console.log the following; this gave me a better understanding of the file location: console.log(process.cwd()).
const fs = require('fs');
const papa = require('papaparse');
const request = require('request');
const path = require('path');

const options = {
  /* options */
};

const fileName = path.resolve(__dirname, 'ADD YOUR ABSOLUTE FILE LOCATION HERE');
const dataStream = fs.createReadStream(fileName);
const parseStream = papa.parse(papa.NODE_STREAM_INPUT, options);

dataStream.pipe(parseStream);

let data = [];
parseStream.on('data', chunk => {
  data.push(chunk);
});

parseStream.on('finish', () => {
  console.log(data);
  console.log(data.length);
});

Adding base64encoded file using form-data (nodejs)

I'm looking to use the form-data Node.js module to build up a multipart/form-data request. The HTTP endpoint I am posting to requires a file.
The "file" I want to attach is actually a base64-encoded version of the file. I have the filename separately that I can use.
Looking at the form-data module, it looks from the examples like it relies on the file coming either from fs or from a request; is it possible to use append(field, value, options) to make it accept the base64-encoded version of the file, or do I need to decode it first? Ultimately the multipart body is encoded anyway, or at least it can be.
var upload = multer({ storage: multer.memoryStorage({}) });

app.post('/', upload.single('test'), function (req, res, next) {
  var raw = Buffer.from(req.file.buffer.toString(), 'base64');
  fs.writeFile('/tmp/upload.png', raw, function (err) {
    if (err) return next(err);
    res.end('Success!');
  });
});
Does this help?
You can get the file name from req.params or req.query, or wherever else it is available.
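To answer the original question more directly: form-data's append(field, value, options) accepts a Buffer, so you can decode the base64 string first and attach it with the filename and content type you already have. A rough sketch (the endpoint URL, field name, and file details are placeholders):

const FormData = require('form-data');

const form = new FormData();
// Decode the base64 payload into a Buffer before appending it.
const base64String = 'aGVsbG8='; // example payload ("hello")
const fileBuffer = Buffer.from(base64String, 'base64');

form.append('file', fileBuffer, {
  filename: 'upload.png',     // the filename you have separately
  contentType: 'image/png'
});

// form.submit() sends the multipart/form-data request with the correct headers.
form.submit('https://example.com/endpoint', (err, res) => {
  if (err) return console.error(err);
  console.log('Upload status:', res.statusCode);
});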
