I am looking at a few currently widely used request libraries and how I could use them to automate file downloads and make them reliable enough.
I stumbled over download (npm), but since it's based on got (npm), I thought I would try got directly first.
Problem
One problem I could encounter while downloading a file is that the source file (on the server) could be overwritten during the download. When I try to reproduce this behaviour with got, got just stops the download process without raising any errors.
What I have so far
The only solution I could come up with was to use got.stream, piping the request into a file WriteStream, and comparing total with transferred after the request has ended.
const app = require('express')();
const fs = require('fs');
const stream = require('stream');
const { promisify } = require('util');
const got = require('got');
const pipeline = promisify(stream.pipeline);
app.use('/files', require('express').static('files'));
app.listen(8080);
(async () => {
  try {
    let progress = null;

    // Set up the got request + event handlers
    const request = got.stream('http://localhost:8080/files/1gb.test')
      .on('downloadProgress', (p) => { progress = p; })
      .on('end', () => {
        console.log("GOT END");
        console.log(progress && progress.transferred === progress.total ? "COMPLETE" : "NOTCOMPLETE");
      })
      // this does not get fired when the source file is overwritten
      .on('error', (e) => {
        console.log("GOT ERROR");
        console.log(e.message);
      });

    // WriteStream + event handlers
    const writer = fs.createWriteStream('./downloaded/1gb_downloaded.test')
      .on('finish', () => {
        console.log("WRITER FINISH");
      })
      .on('error', (error) => {
        console.log("WRITER ERROR", error.message);
      })
      .on('end', () => {
        console.log("WRITER END");
      })
      .on('close', () => {
        console.log("WRITER CLOSE");
      });

    await pipeline(request, writer);
  } catch (e) {
    console.error(e.name, e.message);
  }
})();
Where do the files come from
In the real world, the files I am trying to download come from a server which I do not have access to and do not own. I have no information about how this server is set up. However, I added a simple local express server to the example code above to try things out.
const app = require('express')();
app.use('/files', require('express').static('files'));
app.listen(8080);
Question
Is this solution reliable enough to detect a non-finished download (i.e. the case where the source file gets overwritten during the download)? Or are there any other events I could listen to which I have missed?
The got request stream emits an error event whenever something goes wrong.
const request = got('http://localhost:8080/files/1gb.test')
  .on('downloadProgress', (p) => { progress = p; })
  .on('end', (e) => {
    console.log("GOT END");
  })
  .on('error', (err) => {
    // Handle error here
  });
Various properties are available on the error object here.
progress.total will not be available unless the server explicitly sets the Content-Length header (most servers do, but you might want to look out for that).
It seems there is no built-in way to safely check whether a download has completed 100% using got. I came to the conclusion that my best option for now is the Node.js http module: the response object (an IncomingMessage) has an aborted property, so when the ReadStream emits an end event I can check whether aborted is true or false.
I tested this method for the case where the source file gets overwritten during the download, and it works!
const http = require('http');
const app = require('express')();
const fs = require('fs');
app.use('/files', require('express').static('files'));
app.listen(8080);
http.get('http://localhost:8080/files/1gb.test', function (response) {
  // WriteStream + event handlers
  const writer = fs.createWriteStream('./downloaded/1gb_downloaded.test')
    .on('finish', () => {
      console.log("WRITER FINISH");
    })
    .on('error', (error) => {
      console.log("WRITER ERROR", error.message);
    })
    .on('end', () => {
      console.log("WRITER END");
    })
    .on('close', () => {
      console.log("WRITER CLOSE");
    });

  // ReadStream + event handlers
  response
    .on('error', (e) => {
      console.log("READER ERROR", e.message)
    })
    .on('end', () => {
      console.log("READER END")
      console.log(response.aborted ? "DOWNLOAD NOT COMPLETE" : "DOWNLOAD COMPLETE")
    })
    .on('close', () => {
      console.log("READER CLOSE")
    })

  response.pipe(writer);
});
On the upside, this removes one dependency, since I don't need got.
On the downside, this only assures me that a running download was not aborted because the source file was overwritten. When using the http module I need to add more error handling myself, for example for when the file is not found to begin with, which would have been more convenient with a request library like axios or got.
UPDATE
Realizing that the ReadableStream from http has something like the aborted property made me wonder why none of the request wrapper libraries like got offer something similar. So I tried axios again, with:
axios({
  method: 'get',
  url: 'http://localhost:8080/files/1gb.test',
  responseType: 'stream'
}).then(function (response) {
});
Here the ReadableStream comes in response.data, and it has the same aborted property!
Related
I am trying to use StreamingPull from Pub/Sub. Messages are being published, but I don't see any response for the following code:
const { v1 } = require('@google-cloud/pubsub');
const subClient = new v1.SubscriberClient();

const request = {
  subscription: 'projects/<projectId>/subscriptions/Temp',
  stream_ack_deadline_seconds: 600
};

console.log('Pulling Messages...');
const stream = await subClient.streamingPull({});
stream.on('data', response => {
  console.log(response);
});
stream.on('error', err => {
  console.log(err);
});
stream.on('end', () => {
  console.log("end");
});
stream.write(request);
stream.end();
I see the code silently finishing without the response being logged. Is there an attribute I am missing in my request? As per the doc of StreamingPullRequest, nothing else is mandatory. The only usage example is here in the test files.
I am implementing https.request per these instructions (https://nodejs.org/api/https.html#httpsrequesturl-options-callback), except instead of writing to stdout I am writing to a file. I want the end to wait until the file-writing process is complete. How can I do this?
process.stdout.write(d);
is changed to
fs.writeFile(path, d, err => {
  if (err) {
    console.error(err);
  } else {
    console.log("data => " + path)
  }
})
This is the entire code
const https = require('node:https');

const options = {
  hostname: 'encrypted.google.com',
  port: 443,
  path: '/',
  method: 'GET'
};

const req = https.request(options, (res) => {
  console.log('statusCode:', res.statusCode);
  console.log('headers:', res.headers);

  res.on('data', (d) => {
    process.stdout.write(d);
  });
});

req.on('error', (e) => {
  console.error(e);
});

req.end();
UPDATE
MTN posted a solution that works, but I realized that my code is slightly more complex: I read the file in chunks and save it at the end, so MTN's solution finishes early. Here is the code. Can anyone help me fix it?
const request = https.request(url, (response, error) => {
  if (error) { console.log(error) }
  let data = '';
  response.on('data', (chunk) => {
    data = data + chunk.toString();
  });
  response.on('end', () => {
    fs.writeFile(path, data, err => {
      if (err) {
        console.error(err);
      } else {
        console.log("data => " + path)
      }
    })
  })
})
request.on('error', (error) => {
  console.error(error);
})
request.end()
The immediate answer would be that whatever should happen after the file was written would have to go into the callback, after your console.log. (There is nothing in your code that looks like it's supposed to run afterwards though.)
But:
Your code would be a lot simpler if you'd...
Use a library for sending HTTP requests instead of the raw https module. (For example: got, axios, node-fetch, ...) - Not only do these take care of things like reading the body for you, they also have a promise interface which allows you to do point 2.
Rewrite the code to use async/await.
Here is an example with got:
import got from 'got'
import { writeFile } from 'fs/promises'
const response = await got('https://encrypted.google.com').text()
await writeFile('test.html', response)
// Whatever should happen after the file was written would simply go here!
Note: This has to be an ES module because I used top-level await and import, and got doesn't even support CommonJS anymore. So either your package.json would have to contain "type": "module", or the file extension would have to be .mjs.
You can use fs.writeFileSync() instead. It's synchronous, so it waits for the writing to be finished.
res.on('data', (d) => { fs.writeFile(/* enter params here */) })
Inside the fs.writeFile, add whatever you want to do in the last callback function.
I picked up some old stream code recently (written when 8.x was LTS) and attempted to update it to 12.x. This led to an interesting break in the way I dealt with ENOENT file errors.
Here's a simplification:
const { createServer } = require('http')
const { createReadStream } = require('fs')

const PORT = 3000

const server = createServer((req, res) => {
  res.writeHead(200, {
    'Content-Type': 'application/json'
  })
  const stream = createReadStream(`not-here.json`, {encoding: 'utf8'})
  stream.on('error', err => {
    stream.push(JSON.stringify({data: [1,2,3,4,5]}))
    stream.push(null)
  })
  stream.pipe(res)
})

server.listen(PORT)
server.on('listening', () => {
  console.log(`Server running at http://localhost:${PORT}/`)
})
In Node 8, the above code works fine. I'm able to intercept the error, write something to the stream and let it close normally.
In Node 10+ (tested 10, 12, and 13) the stream is already destroyed when my error callback is called. I can't push new things on the stream and handle the error gracefully for the client side.
Was this an intentional change, and can I still handle this error in a nice way for the client side?
One possibility: open the file yourself and create the stream only from the already successfully opened file. That lets you handle ENOENT (or any other error upon opening the file) before you get into the messy stream error-handling mechanics. The stream architecture seems most aligned with aborting upon error, not recovering with some alternate behavior.
const { createServer } = require('http');
const fs = require('fs');

const PORT = 3000;

const server = createServer((req, res) => {
  res.writeHead(200, {'Content-Type': 'application/json'});
  // fs.open takes flags, not an options object, as its second argument
  fs.open('not-here.json', 'r', (err, fd) => {
    if (err) {
      // send alternative response here
      res.end(JSON.stringify({data: [1,2,3,4,5]}));
    } else {
      const stream = fs.createReadStream(null, {fd, encoding: 'utf8'});
      stream.pipe(res);
    }
  });
});

server.listen(PORT);
server.on('listening', () => {
  console.log(`Server running at http://localhost:${PORT}/`)
});
You could also try experimenting with the autoDestroy or autoClose options on your stream to see if either flag will leave the stream open so you can still push data into it, even if the file produced an error on open or read. The docs on those flags are not very complete, so some combination of programming experiments and studying the code would be required to see whether they can be manipulated to still add data to the stream after it got an error.
The answer by jfriend00 pointed me in the right direction.
Here are two different ways I solved this. I wanted a function that returned a stream rather than handle the error in the req handler function. This is more like what I'm actually doing in real code.
Handling error from stream:
Just like above, except I took care to manually destroy the stream. Does this correctly take care of the internal file descriptor? I think it does.
const server = createServer((req, res) => {
  res.writeHead(200, {
    'Content-Type': 'application/json'
  })
  getStream().pipe(res)
})

function getStream() {
  const stream = createReadStream(`not-here.json`, {
    autoClose: false,
    encoding: 'utf8'
  })
  stream.on('error', err => {
    // handle "no such file" errors
    if (err.code === 'ENOENT') {
      // push JSON data to the stream
      stream.push(JSON.stringify({data: [1,2,3,4,5]}))
      // signal the end of the stream
      stream.push(null)
    }
    // destroy/close the stream regardless of error
    stream.destroy()
    console.error(err)
  })
  return stream
}
Handling the error during file open:
Like jfriend00 suggests.
const { promisify } = require('util')
const { Readable } = require('stream')
const { open, createReadStream } = require('fs')

const openAsync = promisify(open)

const server = createServer(async (req, res) => {
  res.writeHead(200, {
    'Content-Type': 'application/json'
  })
  const stream = await getStream()
  stream.pipe(res)
})

async function getStream() {
  try {
    const fd = await openAsync(`not-here.json`)
    return createReadStream(null, {fd, encoding: 'utf8'})
  } catch (error) {
    console.log(error)
    // set up a new stream
    const stream = new Readable()
    // push JSON data to the stream
    stream.push(JSON.stringify({data: [1,2,3,4,5]}))
    // signal the end of the stream
    stream.push(null)
    return stream
  }
}
I still like handling in the stream better but would love to hear reasons why you might do it one way or the other.
I need to download a big file from a remote server (for example, from Amazon).
But I have a very bad internet connection: it can disconnect for a few seconds and automatically reconnect, but in that case the download freezes and I can't catch this event programmatically.
I use code very similar to this:
const request = require('request')

request({
  uri: `${pathToRemoteFile}`,
  encoding: null
}).on('error', err => console.log(err)) // I suppose that here I will catch all possible errors like an internet disconnect, but seems like no
  .pipe(fs.createWriteStream(`${pathToStoreFileLocally}`))
  .on('finish', () => { console.log('yeah, successfully downloaded') })
  .on('error', err => console.log(err))
So, for example, if I need to download a 500 MB file and have already downloaded 100 MB when I suddenly lose the internet connection, the file will not download any further and no errors will be raised.
Please help.
I found a solution that works for my case.
Here is the code that lets me set a timeout on such a long-running request.
const request = require('request')

request({
  uri: `${pathToRemoteFile}`,
  encoding: null
}).on('error', err => console.log(err))
  .on('socket', socket => {
    socket.setTimeout(30000);
    socket.on('timeout', () => {
      // handle disconnect
    })
  })
  .pipe(fs.createWriteStream(`${pathToStoreFileLocally}`))
  .on('finish', () => { console.log('yeah, successfully downloaded') })
  .on('error', err => console.log(err))
So the part that I was looking for is:
.on('socket', socket => {
  socket.setTimeout(30000);
  socket.on('timeout', () => {
    // handle disconnect
  })
})
on the request stream.
So the timeout event will be raised after 30 seconds of idle time on the socket.
I am trying to call a REST API from a Firebase function which serves as the fulfillment for Actions on Google.
I tried the following approach:
const { dialogflow } = require('actions-on-google');
const functions = require('firebase-functions');
const http = require('https');

const host = 'wwws.example.com';
const app = dialogflow({debug: true});

app.intent('my_intent_1', (conv, {param1}) => {
  // Call the rate API
  callApi(param1).then((output) => {
    console.log(output);
    conv.close(`I found ${output.length} items!`);
  }).catch(() => {
    conv.close('Error occurred while trying to get vehicles. Please try again later.');
  });
});

function callApi (param1) {
  return new Promise((resolve, reject) => {
    // Create the path for the HTTP request to get the vehicle
    let path = '/api/' + encodeURIComponent(param1);
    console.log('API Request: ' + host + path);

    // Make the HTTP request to get the vehicle
    http.get({host: host, path: path}, (res) => {
      let body = ''; // var to store the response chunks
      res.on('data', (d) => { body += d; }); // store each response chunk
      res.on('end', () => {
        // After all the data has been received, parse the JSON for the desired data
        let response = JSON.parse(body);
        let output = {};
        // copy required response attributes to output here
        console.log(response.length.toString());
        resolve(output);
      });
      res.on('error', (error) => {
        console.log(`Error calling the API: ${error}`)
        reject();
      });
    }); // http.get
  }); // promise
}

exports.myFunction = functions.https.onRequest(app);
This almost works. The API is called and I get the data back. The problem is that, without async/await, the function does not wait for callApi to complete, and I get an error from Actions on Google that there was no response. After the error, I can see the console.log outputs in the Firebase log, so everything works; it is just out of sync.
I tried using async/await but got an error, which I think is because Firebase uses an old version of Node.js that does not support async.
How can I get around this?
Your function callApi returns a promise, but you don't return a promise in your intent handler. You should make sure you add the return so that the handler knows to wait for the response.
app.intent('my_intent_1', (conv, {param1}) => {
  // Call the rate API
  return callApi(param1).then((output) => {
    console.log(output);
    conv.close(`I found ${output.length} items!`);
  }).catch(() => {
    conv.close('Error occurred while trying to get vehicles. Please try again later.');
  });
});
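The effect of that return can be seen without the Actions SDK at all. In this stripped-down sketch, dispatch plays the role of the framework (it awaits whatever the handler returns), and callApi is a stand-in that resolves after a short delay; both names are invented for the illustration:

```javascript
// Stand-in for the framework: it awaits whatever the intent handler returns.
async function dispatch(handler) {
  await handler();
  console.log('response sent');
}

// Stand-in for callApi: resolves asynchronously, like the real HTTP call.
const callApi = () => new Promise((resolve) => setTimeout(() => resolve('output'), 50));

// Because the handler returns the promise chain, dispatch waits for it:
// "got output" is logged before "response sent". Without the return, the
// handler resolves to undefined immediately and the order flips.
dispatch(() => callApi().then((o) => console.log('got', o)));
```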