I'm trying to implement a web scraper using the request module and Node.js. At some point during the scraping I must post a form, and the server always redirects to another page that I need to reach to continue scraping.
var jarEstados = requestEstados.jar();
options = {
url: urlPrincipal,
method: 'POST',
followRedirect: true,
maxRedirects: 10,
followAllRedirect: true,
jar: jarEstados,
form: requestObject
};
requestEstados(options,function (error, response, html) {
if (!error) {
console.log(html);
}
else {
console.error(error);
}
});
Response:
<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found here.</body>
headers:
{ 'cache-control': 'private',
'content-length': '152',
'content-type': 'text/html',
location: 'Resumo_Por_Estado_Municipio.asp',
server: 'Microsoft-IIS/8.5',
'x-powered-by': 'ASP.NET, ARR/2.5, ASP.NET',
'x-customname': 'ServidorANP',
'x-ua-compatible': 'IE=7',
date: 'Wed, 15 Jun 2016 16:08:42 GMT',
connection: 'close' },
statusCode: 302,
The request doesn't follow the redirect, even though I configured it as the module's site describes (Request module).
What am I doing wrong? Can't figure it out!
I solved it myself. I figured out that if I pass a valid User-Agent, I can use the 302 response to follow the redirect manually and keep the rest of the scraping process on the rails.
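A minimal sketch of that manual follow-up, assuming the fix was to read the location header of the 302 response and issue a GET to it with a browser-like User-Agent. The helper name buildRedirectOptions, the User-Agent string, and the example base URL are hypothetical; url, method, jar, and headers are documented request options. Note also that the documented option name is followAllRedirects (plural), so the followAllRedirect key in the question is probably not recognized, which may be why the POST redirect was not followed automatically.

```javascript
// Build options for a follow-up GET from a 3xx response.
// Returns null when the response is not a redirect or has no location header.
function buildRedirectOptions(response, originalUrl, jar) {
  const location = response.headers.location;
  const isRedirect = response.statusCode >= 300 && response.statusCode < 400;
  if (!isRedirect || !location) {
    return null;
  }
  return {
    // The location header may be relative, so resolve it against the original URL.
    url: new URL(location, originalUrl).href,
    method: 'GET',
    jar: jar, // reuse the same cookie jar as the POST
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; example-scraper)' } // hypothetical value
  };
}

// Example with the headers from the question and a hypothetical base URL.
const exampleOptions = buildRedirectOptions(
  { statusCode: 302, headers: { location: 'Resumo_Por_Estado_Municipio.asp' } },
  'http://example.com/preco/Index.asp',
  null
);
console.log(exampleOptions.url);
```

Inside the POST callback, the returned object could be passed to requestEstados(...) together with the original jarEstados so the session cookies are kept.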
I am trying to upload some videos to Cloudflare Stream API. Here is official documentation and its example request using curl: https://developers.cloudflare.com/stream/uploading-videos/upload-video-file/
I am doing the request in Node.js
const uploadVideo = async (video: Express.Multer.File): Promise<void> => {
const formData = new URLSearchParams();
formData.append('file', video);
let cloudflareResponse;
try {
cloudflareResponse = await axios.post(
`https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/stream/copy`,
formData,
{
headers: {
Authorization: `Bearer ${API_KEY}`,
'Content-Type': 'multipart/form-data',
//'Tus-Resumable': '1.0.0',
//'Upload-Length': '600',
//'Upload-Metadata': 'maxDurationSeconds 600'
}
}
);
} catch (e) {
console.log('Error while trying to upload video to Cloudflare API ', e);
}
}
I took the commented-out headers from this article, in which the request is made in Django, and tried to replicate it: https://medium.com/@berman82312/how-to-setup-cloudflare-stream-direct-creator-uploads-correctly-802c37cbfd0e
The error I am getting is a 400, and here is part of the response:
config: {
url: 'https://api.cloudflare.com/client/v4/accounts/...../stream/copy',
method: 'post',
data: 'file=%5Bobject+Object%5D',
headers: {
Accept: 'application/json, text/plain, */*',
'Content-Type': 'multipart/form-data',
Authorization: 'Bearer .....',
'Tus-Resumable': '1.0.0',
'Upload-Length': '600',
'Upload-Metadata': 'maxDurationSeconds 600',
'User-Agent': 'axios/0.21.1',
'Content-Length': 24
},
data: { result: null, success: false, errors: [Array], messages: null }
I am sure something is wrong in the request and hope someone could help me spot the mistake or suggest some modifications that might help. I have been stuck with this problem for hours and on Postman I am also getting a 400 response when trying to send with form-data.
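The logged data: 'file=%5Bobject+Object%5D' suggests the file was stringified rather than uploaded, because URLSearchParams converts every value to a string. A minimal sketch of the difference, assuming Node.js 18+ where FormData and Blob are globals. The byte content and filename are placeholders; in the question's code the bytes would presumably come from the Multer file (for example video.buffer when memory storage is used).

```javascript
// Placeholder bytes standing in for the uploaded video.
const fakeVideoBytes = Buffer.from('not a real video');

// URLSearchParams stringifies its values, so an object becomes "[object Object]".
const urlEncoded = new URLSearchParams();
urlEncoded.append('file', { buffer: fakeVideoBytes });
console.log(urlEncoded.toString()); // file=%5Bobject+Object%5D

// FormData with a Blob keeps the bytes and a filename as a real multipart part.
const multipart = new FormData();
multipart.append('file', new Blob([fakeVideoBytes]), 'video.mp4');
const filePart = multipart.get('file');
console.log(filePart.name, filePart.size);
```

Whichever HTTP client sends the form also needs to generate the multipart boundary itself; a hardcoded 'Content-Type': 'multipart/form-data' header without a boundary is likely to be rejected by the server.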
Tried 2 Node.js packages:
https://www.npmjs.com/package/payumoney_nodejs
https://www.npmjs.com/package/payumoney-node
Debugging on localhost.
Debugged in the index.js file under the node_modules in each package.
params {
key: 'x1FanfbP',
salt: 'Vs2GrDyaMQ',
service_provider: 'payu_paisa',
hash: '65f75ced566e2d76dbc6153a277c25f591fc3c0a00a8f51a0699f609d5cbbc94dc7acd5d3be5fe0c0a855c4c6dc7faef49d8b6a1d77dd09398058f800bab068d',
firstname: '',
lastname: '',
email: 'xxxxxxxx@xxxxx.xxx',
phone: XXXXXXXXXX,
amount: '100',
productinfo: '',
txnid: '5b51d253-5d6e-4512-951a-cd6d05bf9e6b',
surl: 'http://localhost:3000/member/contribution/success',
furl: 'http://localhost:3000/member/contribution/failure'
}
request.post(this.payUmoneyURL, { form: params, headers: this.headers },
function(error, response, body) {
if (!error) {
var result = response.headers.location;
callback(error, result);
}
});
request.post(payment_url[this.mode] + API.makePayment, { form: params, headers: this.headers }, function(error, response, body) {
if (!error) {
var result = response.headers.location;
callback(error, result);
}
});
Response of response.headers:
response.headers {
date: 'Fri, 28 Jun 2019 12:06:35 GMT',
server: 'Apache',
'x-powered-by': 'PHP/7.2.14',
p3p: 'CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"',
'set-cookie': [ 'PHPSESSID=naopga57qf58vl0hdfj5krq4n5; path=/; domain=.payu.in' ],
expires: 'Thu, 19 Nov 1981 08:52:00 GMT',
'cache-control': 'no-store, no-cache, must-revalidate',
pragma: 'no-cache',
vary: 'Accept-Encoding',
'content-length': '3129',
connection: 'close',
'content-type': 'text/html; charset=UTF-8'
}
The headers above do not contain a location key, so response.headers.location is undefined.
Can someone help me understand why location is not returned?
Is it because of development on local machine? If yes, then how to test it on localhost?
Any help is appreciated.
It turned out to be a silly mistake; I simply didn't know where in the response to look for the error.
After going through the entire response object, I found a body key at the very end that contains an HTML string. This HTML states whether an error has occurred and what exactly it is.
In my case, mandatory fields were missing:
Error Reason
One or more mandatory parameters are missing in the transaction request.
Corrective Action
Please ensure that you send all mandatory parameters in the transaction request to PayU.
Mandatory parameters which must be sent in the transaction are:
key, txnid, amount, productinfo, firstname, email, phone, surl, furl, hash.
The parameters which you have actually sent in the transaction are:
key, txnid, amount, surl, hash, email, phone.
Mandatory parameter missing from your transaction request are:
productinfo, firstname.
Please re-initiate the transaction with all the mandatory parameters.
After passing data for the missing parameters, I received the URL in the response.headers.location key as expected.
Integration is working as expected now.
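Since a missing field only shows up inside the HTML body, it may help to validate the parameters before posting. A minimal sketch: the helper findMissingParams is hypothetical (it is not part of either package), the sample values are placeholders, and the mandatory list is copied from the PayU error message above.

```javascript
// Mandatory parameters as listed in the PayU error page quoted above.
const MANDATORY_PARAMS = [
  'key', 'txnid', 'amount', 'productinfo', 'firstname',
  'email', 'phone', 'surl', 'furl', 'hash'
];

// Return the names of mandatory parameters that are absent or empty.
function findMissingParams(params) {
  return MANDATORY_PARAMS.filter((name) => {
    const value = params[name];
    return value === undefined || value === null || String(value).trim() === '';
  });
}

// Same shape as the params in the question: firstname and productinfo are empty strings.
const missing = findMissingParams({
  key: 'merchant-key',
  txnid: 'txn-1',
  amount: '100',
  productinfo: '',
  firstname: '',
  email: 'user@example.com',
  phone: '0000000000',
  surl: 'http://localhost:3000/member/contribution/success',
  furl: 'http://localhost:3000/member/contribution/failure',
  hash: 'placeholder-hash'
});
console.log(missing); // [ 'productinfo', 'firstname' ]
```

Calling this before request.post and failing early when the array is not empty avoids having to dig the reason out of the response body.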
I'm trying to create an app-managed bucket but am encountering the error: Invalid or nonexistent Content-Type, accepted values are {text/json, application/json}
I'm using node.js and request-promise package. The error is confusing to me because I am setting my content-type within the headers of the request to application/json.
Here's my function which makes the request:
let globalOptions = {
resolveWithFullResponse: true
};
function createAppManagedBucket(){
let forgeToken = "eyJhb..."
const options = Object.assign({}, globalOptions, {
method: 'POST',
uri: `https://developer.api.autodesk.com/oss/v2/buckets`,
headers: {
"Content-Type": "application/json",
'User-Agent': 'Request-Promise'
},
form: {
"bucketKey": `someTestBucket`,
"policyKey": `transient`
},
auth: {
'bearer': forgeToken
},
json: true
})
return rp(options)
.then((response) => {
return response.body
}).catch((err) => {
return err
})
}
It seems like even though I've set Content-Type: application/json within the header my request is being forced to have Content-Type: application/x-www-form-urlencoded. If I log the response of this, then I get the error and it looks like my request is actually correct since these are my headers:
rawHeaders:
[ 'Access-Control-Allow-Headers',
'Authorization, Accept-Encoding, Range, Content-Type',
'Access-Control-Allow-Methods',
'GET',
'Access-Control-Allow-Origin',
'*',
'Content-Type',
'application/json; charset=utf-8',
'Date',
'Tue, 09 Apr 2019 15:58:07 GMT',
'Strict-Transport-Security',
'max-age=31536000; includeSubDomains',
'Content-Length',
'99',
'Connection',
'Close' ],
But further down in the request I see
_header: 'POST /oss/v2/buckets HTTP/1.1\r\nContent-Type: application/x-www-form-urlencoded\r\nUser
e\r\nhost: developer.api.autodesk.com\r\nauthorization: Bearer eyJhb...\r\naccept: application/json\r\ncontent-length: 43\r\nConnection: close\r\n\r\n'
Here I can see that the content type is actually changing to application/x-www-form-urlencoded. So it looks like my content type is being forced to something other than what I set in the header. Has anyone encountered something like this before?
From the request options documentation,
form - when passed an object or a querystring, this sets body to a querystring representation of value, and adds Content-type: application/x-www-form-urlencoded header. When passed no options, a FormData instance is returned (and is piped to request). See "Forms" section above.
You can't mix the json and form request options the way you are doing without the Content-Type becoming ambiguous: form wins and forces application/x-www-form-urlencoded.
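A minimal sketch of the options with the form key removed, assuming the bucket payload should be sent as JSON. Per the request documentation, json: true together with an object body serializes the body as JSON and adds the Content-type: application/json header, so the explicit header is no longer needed. The helper name buildBucketOptions is hypothetical and the token is a placeholder.

```javascript
// Build request-promise options that send the payload as JSON instead of a form.
function buildBucketOptions(forgeToken, bucketKey) {
  return {
    method: 'POST',
    uri: 'https://developer.api.autodesk.com/oss/v2/buckets',
    headers: { 'User-Agent': 'Request-Promise' },
    // body + json: true replaces the form option from the question.
    body: { bucketKey: bucketKey, policyKey: 'transient' },
    auth: { bearer: forgeToken },
    json: true,
    resolveWithFullResponse: true
  };
}

const bucketOptions = buildBucketOptions('eyJhb...', 'someTestBucket');
console.log(Object.keys(bucketOptions));
```

The result would then be passed to rp(bucketOptions) exactly as in the original function.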
I'm trying to write a very simple solution to download and parse a calendar file from my Airbnb. Airbnb provides the calendar in ical format, with a unique url for each user such as:
https://www.airbnb.com/calendar/ical/1234.ics?s=abcd
Where those values (1234/abcd) are unique hex keys to provide some security.
Whenever I hit my (private) url it replies instantly with an ical if I'm using a browser. I can be using any browser, even one from a different country that has never visited airbnb.com before. (I've got remote access to a server I tried it from when debugging.)
In nodejs it works only about 10% of the time. Most of the time I get a 403 error with the text of You don't have permission to access (redacted url) on this server.
Example code:
const request = require('request');
request.get(url, (error, response, body) => {
if (!error && response.statusCode === 200) {
return callback(null, body);
}
return callback('error');
});
This is using the request package here: https://github.com/request/request
I've set it up in an async.whilst loop and it takes about 50 tries to pull down a success, if I set a multi-second delay between each one. (Btw, https://github.com/caolan/async is awesome, so check that if you haven't.)
If it failed EVERY time, that'd be different, but the fact that it fails only occasionally really has me stumped. Furthermore, browsers seem to succeed EVERY time as well.
curl [url] also works, every time. So is there something I'm not specifying in the request that I need to?
Edit 1:
As requested, more of the headers from the reply. I also thought it was rate-limiting me at first. The problem is this is all from the same dev-box. I can curl, or request from a browser without issue multiple times. I can come back in 24 hours and use the nodejs code and it'll fail the first time, or first 50 times.
headers:
{ server: 'AkamaiGHost',
'mime-version': '1.0',
'content-type': 'text/html',
'content-length': '307',
expires: 'Wed, 24 May 2017 17:23:28 GMT',
date: 'Wed, 24 May 2017 17:23:28 GMT',
connection: 'close',
'set-cookie': [ 'mdr_browser=desktop; expires=Wed, 24-May-2017 19:23:28 GMT; path=/; domain=.airbnb.com' ] },
rawHeaders:
[ 'Server',
'AkamaiGHost',
'Mime-Version',
'1.0',
'Content-Type',
'text/html',
'Content-Length',
'307',
'Expires',
'Wed, 24 May 2017 17:23:28 GMT',
'Date',
'Wed, 24 May 2017 17:23:28 GMT',
'Connection',
'close',
'Set-Cookie',
'mdr_browser=desktop; expires=Wed, 24-May-2017 19:23:28 GMT; path=/; domain=.airbnb.com' ],
trailers: {},
rawTrailers: [],
upgrade: false,
url: '',
method: null,
statusCode: 403,
statusMessage: 'Forbidden',
I use request to check whether a given URL is broken. However, I ran into a strange situation: one URL keeps redirecting to itself, and request never returns a response. When I open the URL in a browser, status code 200 is returned.
Does anyone know why request falls into the redirect loop and cannot get a response while the URL works fine in the browser? How can I deal with this problem? Thanks!
request({
uri: 'http://testurl'
}, function (error, response, body) {
......
});
The following is the output after setting "request.debug = true"
REQUEST { uri: 'http://testurl',
callback: [Function],
tunnel: false }
REQUEST make request http://testurl
REQUEST onResponse http://testurl 302 { 'x-cnection': 'close',
date: 'Wed, 12 Nov 2014 23:59:22 GMT',
'transfer-encoding': 'chunked',
location: 'http://testurl',
......,
'x-powered-by': 'Servlet/2.5 JSP/2.1' }
REQUEST redirect http://testurl
REQUEST redirect to http://testurl
REQUEST {}
REQUEST make request http://testurl
REQUEST response end http://testurl 302 { 'x-cnection': 'close',
date: 'Wed, 12 Nov 2014 23:59:22 GMT',
'transfer-encoding': 'chunked',
location: 'http://testurl',
......,
'x-powered-by': 'Servlet/2.5 JSP/2.1' }
REQUEST onResponse http://testurl 302 { 'x-cnection': 'close',
......
UPDATE:
After reading the request documentation, I realized it may have something to do with cookies. So I added the option jar: true to the request, and it finally works.
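A small self-contained simulation of why a cookie jar can break such a loop. It assumes (the debug output above does not confirm this) that the server sets a session cookie on the first response and keeps redirecting to the same URL until that cookie is sent back. fakeServer and followRedirects are hypothetical stand-ins, not the request module's implementation.

```javascript
// Hypothetical server: redirects to itself until the session cookie comes back.
function fakeServer(cookieHeader) {
  if (cookieHeader && cookieHeader.includes('session=abc')) {
    return { statusCode: 200, headers: {} };
  }
  return {
    statusCode: 302,
    headers: { location: 'http://testurl', 'set-cookie': ['session=abc; path=/'] }
  };
}

// Follow redirects, optionally storing cookies between hops (like jar: true).
function followRedirects(useJar, maxRedirects) {
  const jar = [];
  for (let hops = 0; hops <= maxRedirects; hops++) {
    const response = fakeServer(useJar ? jar.join('; ') : '');
    if (response.statusCode !== 302) {
      return { statusCode: response.statusCode, hops: hops, gaveUp: false };
    }
    if (useJar) {
      // Keep only the name=value part of each Set-Cookie header.
      for (const cookie of response.headers['set-cookie'] || []) {
        jar.push(cookie.split(';')[0]);
      }
    }
  }
  return { statusCode: 302, hops: maxRedirects, gaveUp: true };
}

console.log(followRedirects(false, 10)); // never leaves the 302 loop
console.log(followRedirects(true, 10));  // reaches 200 after one redirect
```

A browser stores cookies by default, which would explain why the same URL returns 200 there while a cookie-less client keeps looping.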
Try using these request options as explained in their documentation:
- followAllRedirects - follow non-GET HTTP 3xx responses as redirects (default: false)
- maxRedirects - the maximum number of redirects to follow (default: 10)
request({
uri: 'http://testurl',
followAllRedirects: true,
maxRedirects: 50 // some arbitrary value greater than 10
}, function (error, response, body) {
......
});
Is the example a public url? I would like to take a look if this is the case.