Request multiple URLs for web scraping - Node.js

I have this simple script:
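const request = require('request');
request({
  method: 'GET',
  url: 'https://www.link1.org/',
  url: 'https://link2.net/' // a duplicate key overwrites the first, so only this URL is requested
}, function (err, response, body) {
  // do stuff
});
It works fine with a single URL, but how can I add another one? I need it to fetch both URLs.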
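An options object can hold only one `url` key, so the usual approach is to issue one request per URL and wait for all of them. A minimal sketch, assuming Node.js 18+ so the built-in `fetch` can stand in for the `request` library; `getAll` is a hypothetical helper name, and the `fetchImpl` parameter exists only so the function can be exercised without network access:

```javascript
// Hypothetical helper: fetch several URLs in parallel and return the
// results in the same order as the input array.
// Assumes Node.js 18+ (global fetch); fetchImpl can be swapped for testing.
async function getAll(urls, fetchImpl = globalThis.fetch) {
  return Promise.all(
    urls.map(async (url) => {
      const res = await fetchImpl(url);
      // Read the body as text, e.g. HTML to scrape.
      const body = await res.text();
      return { url, status: res.status, body };
    })
  );
}
```

Usage: `getAll(['https://www.link1.org/', 'https://link2.net/']).then((pages) => { /* do stuff */ })`. `Promise.all` rejects as soon as any single request fails; `Promise.allSettled` is an alternative if one unreachable site should not discard the other result.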

Related

Node.js: how to send multipart form data in a POST request

I am trying to receive data in the form of an image, sent from elsewhere as multipart/form-data. However, when trying to understand this via the great sanctuary (Stack Overflow), there are missing pieces I don't quite understand.
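const fs = require("fs");
const request = require("request");
// Placeholder credentials: Basic auth expects base64("username:password")
const auth = Buffer.from("username:password").toString("base64");
const options = {
  method: "POST",
  url: "https://api.LINK.com/file", // the port (443) follows from https://
  headers: {
    "Authorization": "Basic " + auth
    // Content-Type is left out: with formData, request generates the
    // multipart/form-data header (including the boundary) itself
  },
  formData: {
    "image": fs.createReadStream("./images/scr1.png")
  }
};
request(options, function (err, res, body) {
  if (err) return console.log(err);
  console.log(body);
});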
Two questions:
1. What is the variable auth? What do I initialize it to, and where/how do I declare it?
2. What is the URL "api.LINK.com"? Is it just the URL of the site this code runs on?
Update: after your comments, I think I may be going about this the wrong way. The goal is to send data (an image) from somewhere else (like another website) to this Node app; the Node app then uses the image and sends something back.
In other words, I would be the one creating the API endpoint.
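Since the goal is for the Node app to receive the image, it needs an endpoint that parses multipart/form-data. A minimal sketch using only Node.js built-ins, assuming a recent Node.js (20 or later) whose global `Response` provides a `formData()` method that parses multipart bodies; the `/file` path, the `image` field name and the helper names `parseMultipart` and `createUploadServer` are assumptions for illustration. It buffers the whole body in memory, which suits small images; large uploads would call for a streaming parser:

```javascript
const http = require('http');

// Hypothetical helper: parse a fully buffered multipart/form-data body
// with the WHATWG Response.formData() parser built into recent Node.js.
async function parseMultipart(bodyBuffer, contentType) {
  return new Response(bodyBuffer, {
    headers: { 'content-type': contentType },
  }).formData();
}

// Hypothetical endpoint: POST /file with a file field named "image".
function createUploadServer() {
  return http.createServer(async (req, res) => {
    if (req.method !== 'POST' || req.url !== '/file') {
      res.writeHead(404).end();
      return;
    }
    try {
      // Collect the whole request body (fine for small images).
      const chunks = [];
      for await (const chunk of req) chunks.push(chunk);
      const form = await parseMultipart(
        Buffer.concat(chunks),
        req.headers['content-type'] || ''
      );
      const image = form.get('image');
      if (!image || typeof image === 'string') {
        res.writeHead(400, { 'content-type': 'application/json' });
        res.end(JSON.stringify({ error: 'missing "image" file field' }));
        return;
      }
      const bytes = Buffer.from(await image.arrayBuffer());
      // ...use the image bytes here, then send a result back.
      res.writeHead(200, { 'content-type': 'application/json' });
      res.end(JSON.stringify({ name: image.name, size: bytes.length }));
    } catch (err) {
      // formData() rejects with a TypeError on a malformed body.
      res.writeHead(400).end(String(err));
    }
  });
}
```

Usage: `createUploadServer().listen(3000)`; a client such as the request() snippet above could then POST to `http://localhost:3000/file`.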

Node.js request: get the original URL

I'm sending GET requests like this in Node.js, in a loop:
const request = require("request");
request({
  url: 'https://somelink.com',
  method: 'GET'
}, function (error, response, body) {
  console.log(response);
});
Since the callbacks run asynchronously, is it possible to get the original request URL from the response?
Thanks!
You can get the original request href in the response.request object, like so:
const request = require("request");
request({
url : 'https://google.com',
method: 'GET'
},
function (error, response, body) {
if (!error) {
console.log("Original url:", response.request.uri.href);
console.log("Original uri object:", response.request.uri);
}
});
response.request.uri is a parsed URL object, so besides href it also exposes fields such as port, path and host:
console.log("Port:", response.request.uri.port, "Path:", response.request.uri.path, "Host:", response.request.uri.host);
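If you keep track of the URL yourself, you don't need to read it back from the response: declaring the loop variable with `const` in a `for...of` loop gives each iteration its own binding, so each callback closes over the URL it started with, whatever order the responses arrive in. A sketch; `fakeRequest` is a hypothetical stand-in with the same `(options, callback)` shape as `request()`, used so the example runs offline, and `requestAll` is likewise a made-up helper name:

```javascript
// Hypothetical stand-in for request(): calls back after a random delay,
// so responses complete in an unpredictable order.
function fakeRequest(options, callback) {
  const delay = Math.floor(Math.random() * 20);
  setTimeout(() => callback(null, { statusCode: 200 }, 'body of ' + options.url), delay);
}

// Start one request per URL; each callback closes over its own `url`.
function requestAll(urls, requestFn, onDone) {
  const results = [];
  let pending = urls.length;
  if (pending === 0) return onDone(results);
  for (const url of urls) {
    requestFn({ url, method: 'GET' }, (error, response, body) => {
      results.push({ url, error, statusCode: response && response.statusCode, body });
      if (--pending === 0) onDone(results);
    });
  }
}
```

With the real library the call would be `requestAll(urls, request, (results) => { ... })`, since `request` uses the same callback signature. Note that `results` are collected in completion order, not input order.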

Node.js request library: how to get the full URL, including URI and query string params

How does one extract the full URL of an HTTP request made with the Node.js request library? This would be useful for logging purposes.
Here's a code example to demonstrate:
const request = require('request');
request({
baseUrl: 'https://foo.bar',
url: '/foobar',
qs: {
page: 1,
pagesize: 25
}
}, (err, res, body) => {
// Somewhere here I'd expect to find the full url from one of the parameters above
// Expected output: https://foo.bar/foobar?page=1&pagesize=25
console.log(res);
});
I can't seem to find any properties of the res param in the callback that contains the URL.
To clarify: by full URL I mean the URL constructed by the request library, which should include the following parts:
Base URL (or just URI/URL when no base URL was set)
URL (or URI)
Query string parameters
You can do this by keeping a reference to the request object that request() returns when you create it.
const request = require('request');
const myReq = request({
  baseUrl: 'https://foo.bar',
  url: '/foobar',
  qs: {
    page: 1,
    pagesize: 25
  }
}, (err, res, body) => {
  console.log(myReq.host);     // hostname only, e.g. "foo.bar"
  console.log(myReq.uri.href); // full URL with query string: https://foo.bar/foobar?page=1&pagesize=25
});
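If you only need the URL for logging before (or without) sending the request, you can also assemble it yourself. A rough sketch using the WHATWG `URL` API built into Node.js; `buildFullUrl` is a hypothetical helper that only approximates how request combines `baseUrl`, `url` and `qs`, and request's own query-string serialization (via the qs module) can differ for nested objects or arrays:

```javascript
// Hypothetical helper: combine a base URL, a path and query parameters
// roughly the way request's baseUrl/url/qs options do.
function buildFullUrl(baseUrl, path, qs = {}) {
  // Join with exactly one slash between base and path.
  const joined = baseUrl.replace(/\/+$/, '') + '/' + path.replace(/^\/+/, '');
  const url = new URL(joined);
  for (const [key, value] of Object.entries(qs)) {
    url.searchParams.append(key, String(value));
  }
  return url.href;
}

console.log(buildFullUrl('https://foo.bar', '/foobar', { page: 1, pagesize: 25 }));
// → https://foo.bar/foobar?page=1&pagesize=25
```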

Send a file from an API server through Node.js to the browser

I have an API server and a Node.js server. When a file is requested, Node.js forwards the request to the API server, the API server sends the file as raw data to Node.js, and Node.js relays the file to the browser.
But when I checked the network traffic with Wireshark, the data the browser received was not the same as what the API server sent. This works for text files, but not for images, videos, PDFs, DOC files, etc.
router.get('/GetCaseSupportDocument', function (req, res) {
var MyJsonData = {
DocId:parseInt(req.query.DocId) || 0
};
request({
url: 'http://somedomain/someurl', //URL to hit
method: 'POST',
json: MyJsonData
}, function (error, response, body) {
if (error) {
res.status(200).send('Failed');
} else {
res.status(200).send(body);
}
})
});
Can anyone tell me why the data changes between Node.js and the browser?
Is there a better solution for this kind of transfer?
Update: after finding the solution, this works:
router.get('/GetCaseSupportDocument', function (req, res) {
var MyJsonData = {
DocId:parseInt(req.query.DocId) || 0
};
request({
url: Url.CaseService + 'GetCaseSupportDocument', //URL to hit
method: 'POST',
json: MyJsonData
}).pipe(res);
})
There is a simple proxy using streams that you can try:
router.get('/GetCaseSupportDocument', function (req, res) {
var MyJsonData = {
DocId: parseInt(req.query.DocId) || 0
};
// pipe the API server's response straight to the client instead of buffering it as a string
request({
url: 'http://somedomain/someurl', //URL to hit
method: 'POST',
json: MyJsonData
}).pipe(res);
});
You can find more details about proxying in the request documentation: https://github.com/request/request
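The likely reason the first version corrupted binary files: by default request decodes the response body into a UTF-8 string (its `encoding` option returns a Buffer only when set to `null`), and bytes that are not valid UTF-8 are replaced with U+FFFD during that decode. Piping skips the string step entirely. A small sketch that simulates the round trip on the 8-byte PNG signature, using only `Buffer`:

```javascript
// The 8-byte signature at the start of every PNG file.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Simulate buffering the body as a UTF-8 string and sending that string on.
const asText = pngSignature.toString('utf8');
const resent = Buffer.from(asText, 'utf8');

console.log(pngSignature.length, resent.length); // 8 10
console.log(pngSignature.equals(resent));        // false: 0x89 became EF BF BD

// Plain ASCII text survives the same round trip unchanged.
const text = Buffer.from('hello, world');
console.log(text.equals(Buffer.from(text.toString('utf8'), 'utf8'))); // true
```

This matches the symptom in the question: text files arrive intact, binary files do not. Passing `encoding: null` to request would also keep the body as a Buffer; piping additionally avoids holding the whole file in memory.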

node.js request multi line

The Google Analytics Measurement Protocol documentation says to send multiple lines, one hit per line, to the /batch endpoint:
https://developers.google.com/analytics/devguides/collection/protocol/v1/devguide#batch
POST /batch HTTP/1.1
Host: www.google-analytics.com
v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fhome
v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fabout
v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fcontact
How would I do something like that with Node.js and request? Here's my current code, which I used for /collect:
request.post(
'http://www.google-analytics.com/batch',
{ form: { v:1,tid:'UA-xxxxx-1',cid:event.queryStringParameters.cid,t:'event',ec:'xxx',ea:"xxx", el:"xxx", ev:"xxx", dr:'xxx'} },
function (error, response, body) {
done(null,'Check for GA event');
}
);
Combine the hits into a single string, separated by "\n", and send it as the raw request body:
const request = require("request");
request({
url: "http://www.google-analytics.com/batch",
method: "post",
body: "v=1&t=pageview&tid=UA-XXXXXXXX-X&cid=555&dl=https%3A%2F%2Fmydomain.com%2Ftest&dt=Test\nv=1&t=pageview&tid=UA-XXXXXXXX-X&cid=554&dl=https%3A%2F%2Fmydomain.com%2Ftest2&dt=Test2"
}, function(error, response, body) {
if (error) { console.log(error); }
});
Your real-time report will show 2 active users on 2 different pages.
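Rather than writing the batch string by hand, you can build it from hit objects. A sketch, assuming hits are plain objects of Measurement Protocol parameters; `buildBatchBodies` is a hypothetical helper. `URLSearchParams` percent-encodes each value (for example `/home` becomes `%2Fhome`), and the hits are joined with "\n". The v1 batch documentation limits a request to 20 hits, so longer lists are split into several bodies:

```javascript
// Hypothetical helper: turn an array of hit objects into /batch request
// bodies, one URL-encoded hit per line, at most maxHits hits per body.
function buildBatchBodies(hits, maxHits = 20) {
  const bodies = [];
  for (let i = 0; i < hits.length; i += maxHits) {
    const lines = hits
      .slice(i, i + maxHits)
      .map((hit) => new URLSearchParams(hit).toString());
    bodies.push(lines.join('\n'));
  }
  return bodies;
}

const hits = ['/home', '/about', '/contact'].map((dp) => ({
  v: 1, tid: 'UA-XXXXX-Y', cid: 555, t: 'pageview', dp,
}));
console.log(buildBatchBodies(hits)[0]);
// v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fhome
// v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fabout
// v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fcontact
```

Each resulting body can then be sent as in the answer above, e.g. `request({ url: 'http://www.google-analytics.com/batch', method: 'POST', body })`.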
