I'm trying to write a very simple solution to download and parse a calendar file from my Airbnb. Airbnb provides the calendar in ical format, with a unique url for each user such as:
https://www.airbnb.com/calendar/ical/1234.ics?s=abcd
Where those values (1234 and abcd) are unique hex keys to provide some security.
Whenever I hit my (private) url it replies instantly with an ical if I'm using a browser. I can be using any browser, even one from a different country that has never visited airbnb.com before. (I've got remote access to a server I tried it from when debugging.)
In Node.js it works only about 10% of the time. Most of the time I get a 403 error with the text "You don't have permission to access (redacted url) on this server."
Example code:
const request = require('request');
request.get(url, (error, response, body) => {
  if (!error && response.statusCode === 200) {
    return callback(null, body);
  }
  return callback('error');
});
This is using the request package here: https://github.com/request/request
I've set it up in an async.whilst loop and it takes about 50 tries to pull down a success, if I set a multi-second delay between each one. (Btw, https://github.com/caolan/async is awesome, so check that if you haven't.)
If it failed EVERY time, that'd be different, but the fact that it fails only occasionally really has me stumped. Furthermore, browsers seem to succeed EVERY time as well.
curl [url] also works, every time. So is there something I'm not specifying in the request that I need to?
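One thing worth checking is whether the request carries the headers that curl or a browser sends by default. The sketch below uses Node's built-in https module instead of the request package so that it has no dependencies. The header values and the helper names buildCalendarRequestOptions and fetchCalendar are assumptions for illustration; nothing above confirms that the server keys on any of these headers.

```javascript
const https = require('https');

// Hypothetical helper: returns request options carrying explicit headers.
// The header values are illustrative assumptions, not values known to be required.
function buildCalendarRequestOptions() {
  return {
    headers: {
      'User-Agent': 'curl/7.54.0',
      'Accept': '*/*',
    },
  };
}

// Hypothetical helper: downloads the calendar and reports the result
// through a Node-style callback.
function fetchCalendar(url, callback) {
  https.get(url, buildCalendarRequestOptions(), (response) => {
    const chunks = [];
    response.on('data', (chunk) => chunks.push(chunk));
    response.on('end', () => {
      const body = Buffer.concat(chunks).toString('utf8');
      if (response.statusCode === 200) {
        return callback(null, body);
      }
      // Surface the status code so a 403 is distinguishable from other failures.
      return callback(new Error('Unexpected status ' + response.statusCode));
    });
  }).on('error', callback);
}
```

Comparing the failure rate with and without the explicit headers should show whether they matter at all.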
Edit 1:
As requested, more of the headers from the reply. I also thought it was rate-limiting me at first. The problem is this is all from the same dev-box. I can curl, or request from a browser without issue multiple times. I can come back in 24 hours and use the nodejs code and it'll fail the first time, or first 50 times.
headers:
{ server: 'AkamaiGHost',
'mime-version': '1.0',
'content-type': 'text/html',
'content-length': '307',
expires: 'Wed, 24 May 2017 17:23:28 GMT',
date: 'Wed, 24 May 2017 17:23:28 GMT',
connection: 'close',
'set-cookie': [ 'mdr_browser=desktop; expires=Wed, 24-May-2017 19:23:28 GMT; path=/; domain=.airbnb.com' ] },
rawHeaders:
[ 'Server',
'AkamaiGHost',
'Mime-Version',
'1.0',
'Content-Type',
'text/html',
'Content-Length',
'307',
'Expires',
'Wed, 24 May 2017 17:23:28 GMT',
'Date',
'Wed, 24 May 2017 17:23:28 GMT',
'Connection',
'close',
'Set-Cookie',
'mdr_browser=desktop; expires=Wed, 24-May-2017 19:23:28 GMT; path=/; domain=.airbnb.com' ],
trailers: {},
rawTrailers: [],
upgrade: false,
url: '',
method: null,
statusCode: 403,
statusMessage: 'Forbidden',
I am working on a Remix project and have run into an issue where a loader requests data from a foreign endpoint that is gzip-encoded, and the response does not seem to be decoded.
The remix loader is fairly simple, with some simplification it looks like this:
export const loader = async () => {
  try {
    const [
      encodedData,
      [... <other responses>]
    ] = await Promise.all([
      gzippedEndpoint(),
      [... <other requests>]
    ]).catch((e) => {
      console.error(e);
    });
    return json([<loader data>]);
  } catch (error) {
    console.log("ERROR:", error);
    return json({});
  }
};
It's the gzippedEndpoint() that fails, where the error stack claims that the returned data is not valid json. I figured compression should not be a problem, but it seems like the fetch requests on the remix server side cannot correctly decode the gzipped data. I also see no option to enable decoding explicitly on remix. When I disable gzip on the foreign endpoint everything works fine for the remix server making the request and parsing the response.
Here is an example of the headers from a returned response (with some obfuscation):
200 GET https://dev.server.com/public/v1/endpoint {
'cache-control': 'no-store, must-revalidate, no-cache',
connection: 'close',
'content-encoding': 'gzip',
'content-type': 'application/json',
date: 'Mon, 12 Sep 2022 06:51:41 GMT',
expires: 'Mon, 12 Sep 2022 06:51:41 GMT',
pragma: 'no-cache',
'referrer-policy': 'no-referrer',
'strict-transport-security': 'max-age=31536000 ; includeSubDomains',
'transfer-encoding': 'chunked',
}
Is there some remix option or request header that I am missing here?
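Until the cause is found, one possible stopgap is to decode the body by hand. This is a minimal sketch, assuming the raw bytes of the response are available as a Buffer; parseMaybeGzippedJson is a hypothetical helper name, and this is a workaround rather than a Remix option. It checks the gzip magic bytes instead of trusting the content-encoding header, so it also copes with a body that has already been decoded.

```javascript
const zlib = require('zlib');

// Hypothetical helper: turns a raw response body into parsed JSON,
// gunzipping first when the bytes are still gzip-compressed.
function parseMaybeGzippedJson(buffer) {
  // Gzip streams start with the magic bytes 0x1f 0x8b.
  const isGzip = buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
  const raw = isGzip ? zlib.gunzipSync(buffer) : buffer;
  return JSON.parse(raw.toString('utf8'));
}
```

Inside gzippedEndpoint() the buffer could be obtained with Buffer.from(await response.arrayBuffer()) instead of calling response.json().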
I'm not seeing a 303 response for bulk actions using latest v4 api for the activities endpoint.
From The API Documentation:
Make a request to the action with an X-BULK header with the value true. The response will always be a 202 Accepted.
Poll the URL provided in the Location header of the response. This URL is for the Bulk Actions endpoint.
Once the action is complete, polling the URL will return a 303 See Other response.
Download the response from URL in the Location header of the 303 See other response.
Here's what I'm doing:
I issue the initial request to the activities endpoint with the X-BULK header set to 'true'.
I receive a 202 Accepted response with a Location header set to the polling url.
I begin polling the provided url from the Location header.
I receive a few 200 responses with the following data and headers:
Data:
{"data":{"id":26952539,"etag":"\\"434fa52f83b8e0bb72677f60b8297866\\""}}
Headers:
{
'content-type': 'application/json; charset=utf-8',
'transfer-encoding': 'chunked',
connection: 'close',
vary: 'Accept-Encoding',
status: '200 OK',
'last-modified': 'Sat, 02 Dec 2017 22:17:13 GMT',
'x-ratelimit-limit': '50',
'x-ratelimit-reset': '1512253080',
'x-ratelimit-remaining': '45',
'x-request-id': '4674a764-c417-448c-af09-c6dae1cabe15',
etag: '"434fa52f83b8e0bb72677f60b8297866"',
'x-frame-options': 'SAMEORIGIN',
'cache-control': 'no-cache, private, no-store',
'x-xss-protection': '1; mode=block',
'x-api-version': '4.0.5',
'x-content-type-options': 'nosniff',
date: 'Sat, 02 Dec 2017 22:17:13 GMT',
'set-cookie':
[ 'XSRF-TOKEN=oQqTKV8XKRm9oiMuY1OFZ6qleZyRyvtcs9v52%2FWyeiVXxvVszHLiXsQkWelnUHs3ErSsH64ctIpehxErulAWHg%3D%3D; path=/; secure',
'_session_id=7babc5f94bc48ecd5d18d4b40c17d6ca; path=/; secure; HttpOnly' ],
server: 'nginx',
'strict-transport-security': 'max-age=31536000; includeSubdomains'
}
However a 303 never comes. After a few of the above 200s I get another 200 with the payload:
Data:
{
"data": [
{
"data": [ {id: 1...}, {id: 2...}, {id: 3...}, ... ],
"status": 200
}
],
"status": "completed",
"requested": 46,
"performed": 46
}
Headers:
{
'x-amz-id-2': '1uiNt20Vd/X74JxKZKrt/hah7aof8xfhZlt7fhlDt8b3G2nA47Y8ZDaohb2drSF8ErniirRK2Es=',
'x-amz-request-id': '2B29557952779E29',
date: 'Sat, 02 Dec 2017 22:17:15 GMT',
'last-modified': 'Sat, 02 Dec 2017 22:17:14 GMT',
'x-amz-expiration': 'expiry-date="Wed, 06 Dec 2017 00:00:00 GMT", rule-id="Expiration rule (auto-generated)"',
etag: '"58e33e4eced83d145bf6dec9f72b97be-1"',
'x-amz-server-side-encryption': 'AES256',
'content-encoding': 'utf-8',
'x-amz-version-id': '2Ou7F__59Pz8WKOKZwFg_fOuhQjD5ro0',
'content-disposition': 'attachment; filename="activities 20171202.json";',
'accept-ranges': 'bytes',
'content-type': 'application/json',
'content-length': '9593',
server: 'AmazonS3',
connection: 'close'
}
It appears I can work around this sufficiently by testing for status === 'completed', or even checking for the presence of the content-disposition header.
Am I doing something wrong that prohibits a 303 response, or are there semantics for the activities endpoint that I'm ignoring?
Is it sufficient to test for status === 'completed' to work around this issue?
Note: I am passing the Authorization header for every request, which includes the access token.
Thanks!
This is a known bug with the Clio API-V4.
The best solution at this time is:
testing the payload of the 200 for status === 'completed'
We are working on resolving the lack of 303 response. In the meantime we will update the documentation.
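The completion check described above can be sketched as a small predicate. The function name isBulkActionComplete is hypothetical; it accepts both the documented 303 and the 200 with status "completed" seen in the question, so the polling code should keep working once the 303 is restored.

```javascript
// Hypothetical helper: decides whether a bulk action poll response signals completion.
// statusCode is the HTTP status; body is the parsed JSON payload (or null).
function isBulkActionComplete(statusCode, body) {
  // Documented behaviour: a 303 See Other means the result is ready.
  if (statusCode === 303) {
    return true;
  }
  // Workaround: a 200 whose payload reports status "completed".
  return statusCode === 200 && body != null && body.status === 'completed';
}
```

A polling loop would call this after each response and stop as soon as it returns true.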
Here's a workaround I've been using with very good results:
Use the ignore_redirect parameter when querying a bulk action status:
/api/v4/bulk_actions/?ignore_redirect=true
Use the ?fields=Response_Url parameter to get the actual URL of the response.
Download the result from the Response_Url property.
I've never had these methods fail.
I am using Unirest middleware from inside my node js script to make a GET request. However for some reason I am getting stale/old data from the resource being requested.
Even after updating the data, I am getting stale data from the resource.
Here's the code :
let unirest = require('unirest');
unirest.get('<resource_url>')
  .headers({"cache-control": "no-cache"})
  .end(function (response) {
    console.log('body===>',JSON.stringify(response.body));
    console.log('status=====>',response.status);
    console.log('response headers=====>',response.headers);
  });
response headers=====> { 'strict-transport-security': 'max-age=15768000; includeSubDomains ',
date: 'Fri, 15 Sep 2017 18:58:40 GMT',
cached_response: 'true',
'cache-control': 'no-transform, max-age=3600',
expires: 'Fri, 15 Sep 2017 12:10:53 GMT',
vary: 'Accept,Accept-Encoding',
'content-length': '1383',
connection: 'close',
'content-type': 'application/json',
'content-language': 'en-US' }
The same resource gives updated data instantly when tried via a Python script or curl.
Note: After some time, say 3 hours, the Node.js script returns updated data.
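The response headers above include cached_response: 'true' and cache-control with max-age=3600, which suggests a cache somewhere is serving the response, although where it sits is not established. One way to test that theory is to make every URL unique. This is a diagnostic sketch only; withCacheBuster and the parameter name "_" are assumptions, and a cache that ignores query strings would not be affected.

```javascript
// Hypothetical helper: appends a unique query parameter so the URL differs on every call.
// The parameter name "_" is an arbitrary choice.
function withCacheBuster(resourceUrl, now = Date.now()) {
  const url = new URL(resourceUrl);
  url.searchParams.set('_', String(now));
  return url.toString();
}
```

If unirest.get(withCacheBuster('<resource_url>')) returns fresh data, the stale responses come from a cache keyed on the URL rather than from the script itself.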
I'm trying to implement a web scraper using the request module and Node.js. At some point during the scraping I must post a form, and it always redirects to another page that I must reach to continue scraping.
var jarEstados = requestEstados.jar();
options = {
url: urlPrincipal,
method: 'POST',
followRedirect: true,
maxRedirects: 10,
followAllRedirect: true,
jar: jarEstados,
form: requestObject
};
requestEstados(options,function (error, response, html) {
if (!error) {
console.log(html);
}
else {
console.error(error);
}
});
Response:
<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found here.</body>
headers:
{ 'cache-control': 'private',
'content-length': '152',
'content-type': 'text/html',
location: 'Resumo_Por_Estado_Municipio.asp',
server: 'Microsoft-IIS/8.5',
'x-powered-by': 'ASP.NET, ARR/2.5, ASP.NET',
'x-customname': 'ServidorANP',
'x-ua-compatible': 'IE=7',
date: 'Wed, 15 Jun 2016 16:08:42 GMT',
connection: 'close' },
statusCode: 302,
The request doesn't follow the redirect, even though it is configured as the module's site describes (Request Module).
What am I doing wrong? Can't figure it out!
I solved it myself. I figured out that if I pass a valid User-Agent, I can use the 302 response to manually follow the redirection and keep the rest of the scraping process on the rails.
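Following the redirect by hand mostly comes down to resolving the Location header, which is relative in the response above, against the URL that was posted to. A minimal sketch, assuming the follow-up is a GET; buildRedirectRequest and the example URLs are hypothetical.

```javascript
// Hypothetical helper: builds the options for the follow-up request after a 3xx response.
function buildRedirectRequest(requestUrl, locationHeader, userAgent) {
  return {
    // Resolve a relative Location against the URL that produced the redirect.
    url: new URL(locationHeader, requestUrl).toString(),
    method: 'GET',
    headers: { 'User-Agent': userAgent },
  };
}
```

The returned object can be passed to the same request call together with the existing cookie jar. Separately, the request documentation spells the option followAllRedirects (plural), so the followAllRedirect key in the question's options is probably ignored, which would explain why the 302 after the POST was not followed.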
I use request to check whether a given URL is broken or not. However, I encountered a strange situation where one URL keeps redirecting to itself, and request fails to return any response. But when I open the URL in a browser, status code 200 is returned.
Does anyone know why request falls into the redirect loop and cannot get the response while the URL works fine in the browser? How can I deal with this problem? Thanks!
request({
  uri: 'http://testurl'
}, function (error, response, body) {
  ......
});
The following is the output after setting "request.debug = true"
REQUEST { uri: 'http://testurl',
callback: [Function],
tunnel: false }
REQUEST make request http://testurl
REQUEST onResponse http://testurl 302 { 'x-cnection': 'close',
date: 'Wed, 12 Nov 2014 23:59:22 GMT',
'transfer-encoding': 'chunked',
location: 'http://testurl',
......,
'x-powered-by': 'Servlet/2.5 JSP/2.1' }
REQUEST redirect http://testurl
REQUEST redirect to http://testurl
REQUEST {}
REQUEST make request http://testurl
REQUEST response end http://testurl 302 { 'x-cnection': 'close',
date: 'Wed, 12 Nov 2014 23:59:22 GMT',
'transfer-encoding': 'chunked',
location: 'http://testurl',
......,
'x-powered-by': 'Servlet/2.5 JSP/2.1' }
REQUEST onResponse http://testurl 302 { 'x-cnection': 'close',
......
UPDATE:
After reading the request documentation, I realized it might have something to do with cookies. So I added the option jar: true to the request, and it finally works.
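A likely explanation, though the debug output elides the relevant headers, is that the server sets a cookie on the 302 and keeps redirecting to the same URL until that cookie is sent back. A cookie jar does this automatically. The sketch below shows the core idea only; cookieHeaderFrom is a hypothetical helper and the cookie names in the usage note are made up.

```javascript
// Hypothetical helper: turns Set-Cookie header values into a Cookie request header value.
// It keeps only the name=value pair of each cookie and ignores attributes such as
// Path, Domain and Expires, which a real cookie jar would honour.
function cookieHeaderFrom(setCookieValues) {
  return setCookieValues
    .map((value) => value.split(';')[0].trim())
    .filter((pair) => pair.includes('='))
    .join('; ');
}
```

For example, the values 'JSESSIONID=abc123; Path=/; HttpOnly' and 'lang=en; Path=/' would be sent back on the next request as a single Cookie header containing both name=value pairs.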
Try using these request options as explained in their documentation:
- followAllRedirects - follow non-GET HTTP 3xx responses as redirects (default: false)
- maxRedirects - the maximum number of redirects to follow (default: 10)
request({
  uri: 'http://testurl',
  followAllRedirects: true,
  maxRedirects: 50 // some arbitrary value greater than 10
}, function (error, response, body) {
  ......
});
Is the example a public url? I would like to take a look if this is the case.