How to download files with node-fetch - node.js

I need help implementing a file downloader in Node.js.
I need to download over 25,000 files from a server. I'm using node-fetch, but I don't know exactly how to implement this. I tried using Promise.allSettled(), but I also need a way to limit the number of concurrent requests to the server; otherwise I get rate-limited.
This is my code so far:
const fetch = require('node-fetch')

async function main () {
  const urls = [
    'https://www.example.com/foo.png',
    'https://www.example.com/bar.gif',
    'https://www.example.com/baz.jpg',
    // ... many more (~25k)
  ]
  // how to save each file on the machine with the same file name and extension?
  // how to limit the number of concurrent requests to the server?
  const files = await Promise.allSettled(
    urls.map((url) => fetch(url))
  )
}
main()
So my questions are:
How do I limit the number of concurrent requests to the server? Can this be solved using a custom https agent with node-fetch and setting maxSockets to something like 10?
How do I check if a file exists on the server and, if it does, download it to my machine with the same file name and extension?
It would be very helpful if someone could show a small code example of how I would implement such functionality.
Thanks in advance.

To control how many simultaneous requests are running at once, you can use any of these three options:
mapConcurrent() here and pMap() here: These let you iterate an array, sending requests to a host, but manage things so that you only ever have N requests in flight at the same time, where you decide what the value of N is.
rateLimitMap() here: Lets you manage how many requests per second are sent.
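This isn't the code from those libraries, but a minimal sketch of the idea behind them: N worker loops pull items from a shared index, so at most N calls are ever in flight at once (the real libraries add error handling, cancellation, and more).

```javascript
// Minimal concurrency limiter: run fn over items with at most `limit`
// promises in flight at any moment. Results keep the input order.
async function mapConcurrent(items, limit, fn) {
  const results = new Array(items.length);
  let index = 0;
  async function worker() {
    while (index < items.length) {
      const i = index++; // claim the next item (safe: JS is single-threaded)
      results[i] = await fn(items[i], i);
    }
  }
  // start `limit` workers that pull items until the list is exhausted
  await Promise.all(
    Array.from({length: Math.min(limit, items.length)}, worker)
  );
  return results;
}
```

With 25k URLs you would call something like `mapConcurrent(urls, 10, downloadOneFile)` instead of `Promise.allSettled(urls.map(fetch))`.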
Can this be solved using a custom https agent with node-fetch and setting the maxSockets to something like 10?
I'm not aware of any solution using a custom https agent.
How do i check if the file exists on the server and if it does then download it on my machine with the same file name and extension?
You can't directly access a remote http server's file system. So, all you can do is make an http request for a specific resource (a url) and examine the http response to see if it returned data or returned some sort of http error such as a 404.
As for file names and extensions, that depends on whether you already know what to request (and the server supports the name being part of the URL) or whether the server returns that information in an http header. If you're requesting a specific file name and extension, you can simply create a file with that name and extension and save the http response data to it on your local drive.
As for coding examples, the doc for node-fetch() shows examples of downloading data to a file using streams here: https://www.npmjs.com/package/node-fetch#streams.
import {createWriteStream} from 'fs';
import {pipeline} from 'stream';
import {promisify} from 'util'
import fetch from 'node-fetch';
const streamPipeline = promisify(pipeline);
const url = 'https://github.githubassets.com/images/modules/logos_page/Octocat.png';
const response = await fetch(url);
if (!response.ok) throw new Error(`unexpected response ${response.statusText}`);
await streamPipeline(response.body, createWriteStream('./octocat.png'));
Personally, I wouldn't use node-fetch, as its design center is to mimic the browser implementation of fetch, which is not as friendly an API design as similar libraries built explicitly for nodejs. I use got(), and there are several other good libraries listed here. You can pick your favorite.
Here's a code example using the got() library:
import {promisify} from 'node:util';
import stream from 'node:stream';
import fs from 'node:fs';
import got from 'got';
const pipeline = promisify(stream.pipeline);
await pipeline(
  got.stream('https://sindresorhus.com'),
  fs.createWriteStream('index.html')
);

Related

node.js - download files and content from DB to client

I want to send to the client in the same request files from the dir and some content from the DB.
DB query -
const derivatives = await Derivative.findAll();
Here is the res -
res.status(200).send({
  data: derivatives.map((derivative) => ({
    id: derivative.id,
    date: derivative.date,
  })),
});
Here is the download -
const fileName = derivatives.map((name) => name.wex);
res.status(200).download(__dirname, `/assets/${fileName}`);
How can I add that to my response?
HTTP lets you specify a "content disposition" to indicate whether a response should be treated as a download, but it doesn't support sending downloads arbitrarily; a download has to be a response to a request. You can't have part of a response be a download and part not, for a single request.
So if you need a file to be downloaded, and some JSON used to display some UI, you need to handle that in the client somehow. Either:
The client sends a request, and server returns JSON containing a URL for the download as well as the other data you wanted to send, and then the client requests a download of the URL through JavaScript
The client sends two requests, one for the download and one for the other data; this may complicate things on the server if you need to associate the two requests (want to do a database lookup only once for instance), but is simplest on the client.
The client sends a request, and the server returns a response containing both the JSON data and the file data, packed together in some way (the file data could be inside the JSON, but that would be inefficient); the client unpacks it (using JavaScript) and then constructs a Blob URL to "download" (in this case the data is already downloaded, so this just means saving a file)
There are any number of ways you might pack the file and JSON data together, which is what /u/Quentin was alluding to. Sending both as one response may be better for performance, but you probably don't need to.

How do you include an SSL certificate in an API request in Node?

Here is the situation: I have a certificate, key, public key, and private key. I am building an app that integrates with another system. My system is a Next.js app running on Azure. It makes API requests to an external server running somewhere else. I know that my certificate/keys are good because the request works when I use the Insomnia API Client (200 response and data coming back).
However, trying to do this in Node, I keep getting 400 errors. I have tried doing this with core fetch and https functionalities and using the node-fetch package. I also looked at the request package (as there are examples of this on the internet), but that package was deprecated over a year ago, so I think it should not be used.
Here's an example of some of the code I've tried using. I'm storing the certs/keys in /tmp/certs on my local system & they are loading (I've been logging heavily). I've used a wide variety of different requestOptions, but nothing seems to be doing the trick.
import fs from 'fs'
import fetch from 'node-fetch'
const requestOptions = {
method: 'GET',
port: 443,
cert: fs.readFileSync("/tmp/certs/ct.crt", 'utf-8'),
key: fs.readFileSync("/tmp/certs/ct.key", 'utf-8')
};
const res = await fetch("https://url.com?unique_ID=1234&key=abcd", requestOptions);
Examining the res always turns up a 400. I have tried doing this a number of different ways at this point. This seems like it should be a trivial, common use case... What am I doing wrong and how should I be structuring this request?
Help me, Stack Overflow, you're my only hope!

NodeJS download file from WebServer using POST response

I am completely lost here regarding this.
I have a custom API where the endpoint is /api and my client NodeJS script calls towards this endpoint with some form data.
Let's say my client sends a POST with the parameters download_file and file_id,
how would the web server respond with the file data?
I'm sure someone has solved this before but I can't find any information about this.
In the client script where you're going to download the file, there are a few steps you need to take:
create a writestream to a file
make a request to the server
pipe the response stream to the file write stream
The code would look something like the following:
const request = require('request');
const fs = require('fs');
request('http://google.com/doodle.png').pipe(fs.createWriteStream('doodle.png'))

POST request hangs (timeout) when trying to parse request body, running Koa on Firebase Cloud Functions

I'm working on a small website, serving static files using Firebase Hosting (FH) and rewriting all requests to a single function on Firebase Cloud Functions (FCF), where I'm using Koa (with koa-router) to handle the requests. However, when I try to parse the body of a POST request using koa-bodyparser, the service just hangs until it eventually times out.
The same thing occurs when using other body parsers, such as koa-body, and it seems to persist no matter where I put the parser, unless I put it after the router, in which case the problem goes away, though I still can't access the data, since it never gets a chance to be parsed(?).
The following is a stripped-down version of the code that's causing the problem:
import * as functions from 'firebase-functions'
import * as Koa from 'koa'
import * as KoaRouter from 'koa-router'
import * as KoaBodyParser from 'koa-bodyparser'
const app = new Koa()
const router = new KoaRouter()
app.use(KoaBodyParser())
router.post('/', (context) => {
  // do some stuff with the data
})
app.use(router.routes())
export const serve = functions.https.onRequest(app.callback())
I'm still pretty new to all of these tools and I might be missing something completely obvious, but I can't seem to find the solution anywhere. If I'm not mistaken, FCF automatically parses requests, but Koa is unable to access that data unless it does the parsing itself, so I'd assume that something is going wrong between FCF's automatic parsing and the parser used by Koa.
I haven't been able to produce any actual errors or useful error messages, other than a Gateway Timeout (504), so I don't have much to go on and won't be able to provide you with much more than I already have.
How do I go about getting a hold of the data?
Firebase already parses the body.
https://firebase.google.com/docs/functions/http-events#read_values_from_the_request
It appears that the provided Koa body-parsing middlewares don't know what to do with an "already parsed" body (i.e. an object rather than an unparsed string), so the middleware gets confused and ends up in some sort of infinite loop.
A solution is to use ctx.req.body because it's already parsed. :)
Koa rocks!
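If downstream handlers still expect ctx.request.body, one option is a tiny middleware that copies the already-parsed body over instead of re-parsing it. A sketch; the helper name is mine, only ctx.req.body comes from the answer above.

```javascript
// Middleware that exposes the body Firebase already parsed (ctx.req.body)
// as ctx.request.body, so route handlers work unchanged.
function useFirebaseParsedBody() {
  return async (ctx, next) => {
    if (ctx.req.body !== undefined) {
      ctx.request.body = ctx.req.body; // already an object, nothing to parse
    }
    await next();
  };
}

// app.use(useFirebaseParsedBody()) in place of app.use(KoaBodyParser())
```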

Nodejs: downloading and transforming large data from mongodb

I use Node JS. (Sails JS framework, but that's less important).
Purpose
Download a CSV file that includes data transformed from MongoDB.
Here is the server-side scenario for responding to a large data-download request:
Read data from MongoDB.
Transform the data format (to CSV).
Send the data to response. (Download).
So, the user is expected to download this large data to their browser.
I'm not sure what would be the best way to handle such request.
MongoDB has built-in support for streaming and Node.js clients can provide a readable stream for the cursor. All HTTP response objects also provide a way to stream the response body through a series of writes or you can pipe to a socket when using WebSockets. Most of the work will be offloaded to the MongoDB server and Node.js was developed for exactly these kinds of requirements.
If you're going to use HTTP on the client side, you can use fetch() and stream the response body. Here is an excellent article that shows how to efficiently process a response for a large CSV file:
const response = await fetch('/big-data.csv')
const csvStream = response.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(new CSVDecoder()) // CSVDecoder is the transform stream from the article
const reader = csvStream.getReader()
for (;;) {
  const {value: row, done} = await reader.read()
  if (done) break
  drain(row)
}
Don't forget that both the server and client support encoded content, so make sure you compress the responses to further improve I/O.
It is always a good idea to post some code of what you have tried so far.
First, you would have to retrieve the data from MongoDB using something like mongoose or Waterline if you are using SailsJS
Secondly, you can use a library like csv to convert the data to a CSV file.
Once you have created the file you can return the file to the user using the sails response like so:
res.attachment('path/to/file.csv');