GET request failed in Node - node.js

I am sending a GET request to a third-party API service from my Node back end.
I am getting a 403 Forbidden response:
request("http://www.giantbomb.com/api/search/?api_key=my_api_key&field_list=name,image,id&format=json&limit=1&query=street%20fighter%203&resources=game",(err,res,body) => {
console.log(body);
})
Sending the same request from my browser returns the expected results.
Any idea why this happens?
EDIT:
Logging the response body, I receive the following page (without the JS):
<h1>Wordpress RSS Reader, Anonymous Bot or Scraper Blocked</h1>
<p>
Sorry we do not allow WordPress plugins to scrape our site. They tend to be used maliciously to steal our content. We do not allow scraping of any kind.
You can load our RSS feeds using any other reader but you may not download our content.
<a href='/feeds'>Click here more information on our feeds</a>
</p>
<p>
Or you're running a bot that does not provide a unique user agent.
Please provide a UNIQUE user agent that describes you. Do not use a default user agent like "PHP", "Java", "Ruby", "wget", "curl" etc.
You MUST provide a UNIQUE user agent. ...and for God's sake don't impersonate another bot like Google Bot that will for sure
get you permanently banned.
</p>
<p>
Or.... maybe you're running an LG Podcast player written by a 10 year old. Either way, Please stop doing that.
</p>

This service requires a User-Agent header; see this example:
const rp = require('request-promise')
const options = {
method: 'GET',
uri: 'http://www.giantbomb.com/api/search/?api_key=my_api_key&field_list=name,image,id&format=json&limit=1&query=street%20fighter%203&resources=game',
headers: { 'User-Agent': 'test' },
json: true
}
rp(options)
.then(result => {
// process result
})
.catch(e => {
// handle error
})

Include a User-Agent header in the request, like this:
var options = {
url: 'http://www.giantbomb.com/api/search/?api_key=my_api_key&field_list=name,image,id&format=json&limit=1&query=street%20fighter%203&resources=game',
headers: {
'User-Agent': 'request'
}
};
request(options, (err,res,body) => {
console.log(body);
})
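
Both answers above come down to sending a User-Agent header that identifies your application. As a minimal sketch, the options object could be built by a small helper; `buildSearchOptions` and the User-Agent string `my-game-search-app/1.0` are hypothetical names chosen for illustration, not something the service prescribes.

```javascript
// Hypothetical helper: builds the options object passed to request().
// URLSearchParams takes care of encoding the query, so no stray spaces
// or unescaped characters end up in the URL.
function buildSearchOptions(apiKey, query) {
  const params = new URLSearchParams({
    api_key: apiKey,
    field_list: 'name,image,id',
    format: 'json',
    limit: '1',
    query: query,
    resources: 'game'
  });
  return {
    url: `http://www.giantbomb.com/api/search/?${params}`,
    // Assumed value: use a string that describes your own application.
    headers: { 'User-Agent': 'my-game-search-app/1.0' },
    json: true
  };
}

const options = buildSearchOptions('my_api_key', 'street fighter 3');
console.log(options.url);
// The object can then be passed on as in the answers above:
// request(options, (err, res, body) => { console.log(body); });
```

The blocked-page text in the question asks for a unique, descriptive agent, so a generic value such as a library name may still be rejected.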

Related

Send a SAML request by POST using saml2-js in Node

I've gone through the documentation (however limited) to connect to an IDP. It's all configured and working properly except one thing. The IDP won't accept SAML Requests via GET.
Does saml2-js support HTTP POST for sending SAML requests to the IDP and if so, how is this coded? If not, is there an alternative NPM package that would work?
Currently I have:
sso.sp.create_login_request_url(sso.idp,{},(err, login_url, requestId) => {
console.log('err',err)
console.log('login_url',login_url)
console.log('requestId',requestId);
response.redirect(login_url);
});
An addition to jsub's answer:
The POST request to the IdP must be made by the browser, not by the server (by needle in jsub's case). Only the browser contains the IdP session cookie which authenticates the user with the IdP. res must contain an HTML page with an auto-submitting <form> with one <input> per param:
app.get("/login", function(req, res) {
sso.sp.create_login_request_url(sso.idp, {}, function(err, login_url, requestId) {
var url = new URL(login_url);
res.type("html");
res.write(`<!DOCTYPE html><html>
<body onload="document.querySelector('form').submit()">
<form action="${url.protocol}//${url.host}${url.pathname}" method="post">`);
for (const [param, value] of url.searchParams)
res.write(`<input name="${param}" value="${value}"/>`);
res.end(`</form></body></html>`);
});
});
I am trying to work around this as well. My attempts have not worked so far, but the approach is to make a separate HTTP POST request with a client (needle in my case) and then pipe its response into the handler's response, something like this:
sso.sp.create_login_request_url(sso.idp, {}, (err, login_url, requestId) => {
// response.redirect(login_url);
// Split the redirect URL into the endpoint and its query string
const [url, param] = login_url.split("?")
const postOptions = {
headers: {'Content-Type': 'application/x-www-form-urlencoded'}
}
needle.post(url, param, postOptions, (err, postResponse) => {
postResponse.pipe(res)
});
});
However, I am not having much luck and am still trying to work out why the pipe does not work.
EDIT: the piping seems to work when I do it in this short form:
needle.post(url, param, postOptions).pipe(res)
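
The browser-side approach from the first answer can also be written as a pure function that returns the HTML, which makes it easier to escape the parameter values before they are placed in attributes. This is only a sketch: `escapeAttr` and `buildAutoPostForm` are hypothetical helper names, and it assumes, as that answer does, that the IdP accepts the same parameters by POST that saml2-js generates for the redirect URL.

```javascript
// Escapes characters that are unsafe inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: turns a login URL with query parameters into an
// auto-submitting POST form, with one hidden input per parameter.
function buildAutoPostForm(loginUrl) {
  const url = new URL(loginUrl);
  const inputs = [...url.searchParams]
    .map(([name, value]) =>
      `<input type="hidden" name="${escapeAttr(name)}" value="${escapeAttr(value)}"/>`)
    .join('\n');
  return `<!DOCTYPE html><html>
<body onload="document.querySelector('form').submit()">
<form action="${escapeAttr(url.origin + url.pathname)}" method="post">
${inputs}
</form></body></html>`;
}

// Possible use inside the Express handler from the answer above:
// sso.sp.create_login_request_url(sso.idp, {}, (err, login_url) => {
//   res.type("html").send(buildAutoPostForm(login_url));
// });
```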

Setting a User Agent in scrape-it

I'm using scrape-it in my node.js scraping tool (for identifying proper keyword usage), but some websites identify it as a bot and return no content. Is there a way to configure a known user agent header for the GET request to bypass the block?
You can set the headers, including User-agent, by passing an options object to scrape-it:
scrapeIt({
url: "http://example.com"
, headers: { "User-agent": "known-user-agent-of-choice" }
},
{
// some scrapeHTML options ...
})
.then(
// some code ...
);

How to query the gitlab API from the browser?

Just to give some context, I'd like to implement a blog with gitlab pages, so I want to use snippets to store articles and comments. The issue is that querying the API from the browser triggers a CORS error. Here is the infamous code:
const postJson = function(url, body) {
const client = new XMLHttpRequest();
client.open('POST', url);
client.setRequestHeader('Content-Type', 'application/json');
return new Promise((resolve, reject) => {
client.onreadystatechange = () => {
if (client.readyState === 4) {
client.status === 200
? resolve(client.responseText)
: reject({status: client.status, message: client.statusText, response: client.responseText})
}
}
client.send(body)
})
};
postJson('https://gitlab.com/api/graphql', `query {
project(fullPath: "Boiethios/test") {
snippets {
nodes {
title
blob {
rawPath
}
}
}
}
}`).then(console.log, console.error);
That makes perfect sense, because otherwise any page could fraudulently use the user's session.
There are several options:
Ideally, I would like to have an option to disable all form of authentication (particularly the session), so I could only access the information that is public for everybody.
I could use a personal access token, but I'm not comfortable with this, because the scopes are not fine-grained at all, and leaking such a PAT would allow everybody to see everything in my account. (doesn't work)
I could use OAuth2 to ask for every reader the authorization to access their gitlab account, but nobody wants to authenticate to read something.
I could create a dummy account, and then create a PAT. That's the best IMO, but that adds some unnecessary complexity. (doesn't work)
What is the correct way to query the gitlab API from the browser?
After some research, I have found this way to get the articles and the comments. The CORS policy was triggered because of the POST request with a JSON content. A mere GET request does not have this restriction.
I was able to retrieve the information in two steps:
I created a dummy account, so that I could have a token to query the API for my public information only,
Then I used the API V4 instead of the GraphQL one:
// Gets the snippets information:
fetch('https://gitlab.com/api/v4/projects/7835068/snippets?private_token=AmPeG6zykNxh1etM-hN3')
.then(response => response.json())
.then(console.log);
// Gets the comments of a snippet:
fetch('https://gitlab.com/api/v4/projects/7835068/snippets/1742788/discussions?private_token=AmPeG6zykNxh1etM-hN3')
.then(response => response.json())
.then(console.log);
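
The difference between the two requests can be illustrated with a simplified check of the browser's rule for when a CORS preflight is sent. This is a rough sketch, not the full rule set (it ignores restrictions on header values and some special cases), and `needsPreflight` is a hypothetical name. Note that even a request without a preflight is only readable if the server's response allows the calling origin.

```javascript
// Methods and Content-Type values that a browser sends without a preflight.
const SIMPLE_METHODS = new Set(['GET', 'HEAD', 'POST']);
const SIMPLE_CONTENT_TYPES = new Set([
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain'
]);
// Request headers a script may set without triggering a preflight.
const SAFELISTED_HEADERS = new Set([
  'accept', 'accept-language', 'content-language', 'content-type'
]);

// Simplified sketch: returns true if the request would need a preflight.
function needsPreflight(method, headers = {}) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SAFELISTED_HEADERS.has(lower)) return true;
    if (lower === 'content-type') {
      // Ignore parameters such as "; charset=utf-8".
      const mime = String(value).split(';')[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mime)) return true;
    }
  }
  return false;
}

// The GraphQL call from the question: POST with a JSON body.
console.log(needsPreflight('POST', { 'Content-Type': 'application/json' }));
// The plain GET used in the answer.
console.log(needsPreflight('GET'));
```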

Getting a 403 error while sending a file to GitHub via REST using Node.js

I want to send multiple files to a GitHub repository via Node.js. I tried several approaches and ended up using the node-rest-client module. I tried the code below to send a sample file to a repository called 'metadata', but after the POST I get the error message "Request forbidden by administrative rules. Please make sure your request has a User-Agent header". Please let me know if anyone has faced this error before and how to get rid of it.
convertval = "somedata";
var dataObj = {
"message": "my commit message",
"committer": {
"name": "Scott Chacon",
"email": "ravindra.devagiri#gmail.com"
},
"content": "bXkgbmV3IGZpbGUgY29udGVudHM="
}
debugger;
var Client = require('node-rest-client').Client;
var client = new Client()
var args = {
data: dataObj,
headers: { 'Content-Type': 'application/json' },
};
client.post("https://api.github.com/repos/metadata/contents", args, function (data, response) {
console.log("file send: True : " + data);
});
According to the REST API:
All API requests MUST include a valid User-Agent header. Requests with
no User-Agent header will be rejected.
First of all, you need to set a 'User-Agent' header (for example with the value 'request') in your request. Refer to this link.
Second, the endpoint you are trying to call might require authentication. Generate a personal token from here and add that token to your request headers: 'Authorization': 'token '.
If you're using Git extensively in your code, I suggest using Nodegit.
Edit:
I don't think sending multiple files in a single request is possible with the 'Contents' group of endpoints (link).
You can check out the Git Data API (as discussed here).
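
Putting the two header suggestions together, the `args` object from the question might be built as follows. This is a sketch under assumptions: `buildGithubArgs` and the User-Agent value `my-metadata-uploader` are hypothetical, and the token is a placeholder you would supply yourself.

```javascript
// Hypothetical helper: builds the args object that node-rest-client expects,
// adding the User-Agent and Authorization headers the answer recommends.
function buildGithubArgs(token, data) {
  return {
    data: data,
    headers: {
      'Content-Type': 'application/json',
      // Assumed value: any string identifying your application.
      'User-Agent': 'my-metadata-uploader',
      'Authorization': `token ${token}`
    }
  };
}

const args = buildGithubArgs('YOUR_PERSONAL_TOKEN', {
  message: 'my commit message',
  content: 'bXkgbmV3IGZpbGUgY29udGVudHM='
});
console.log(args.headers);
// args can then be passed to the client call from the question.
```

GitHub's documentation describes creating a file as a PUT to a path of the form /repos/{owner}/{repo}/contents/{path}, so the URL and method in the question may need adjusting as well.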

How to follow a link in a get request header in Node.js Express

I am trying to retrieve paginated results from a 3rd party API after making an API call from my Node.js/Express server. I then want to send the data through to the client. I can retrieve the first page of results using the Request package and the following code:
var options = {
url: `https://theURL.com`,
headers: {
'authorization': `bearer ${user_token}`,
'user-agent': '***my details***'
}
}
function callback(error, response, body) {
if (!error) {
res.json({
data: body
});
} else {
res.send("an error occurred")
}
}
Request(options, callback);
I understand that the response will contain a Link header which I should follow to get the next page's data and to retrieve the link header for the page after that. I repeat this process until I reach a blank link header, at which point all the pages of data have been retrieved.
Firstly, I don't know how to approach this task. Should I follow all the link headers and compile all the results on my server before transferring them to the client? Or should I send each page's worth of data to the client as I get it and then deal with it there?
Secondly, how can an appropriate solution be achieved in code?
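
One possible approach, sketched under assumptions, is to collect every page on the server and respond once. `parseLinkHeader` and `fetchAllPages` are hypothetical helpers, `fetchPage` stands for whatever HTTP call you already use (it is assumed to resolve to `{ body, linkHeader }`), and the parser is simplified: it expects `rel` to be the first parameter after each URL.

```javascript
// Parses a Link header such as
//   <https://api.example.com/items?page=2>; rel="next", <...>; rel="last"
// into an object keyed by rel, e.g. { next: '...', last: '...' }.
function parseLinkHeader(header) {
  const links = {};
  if (!header) return links;
  const pattern = /<([^>]*)>\s*;\s*rel="?([^";,]+)"?/g;
  for (const match of header.matchAll(pattern)) {
    links[match[2].trim()] = match[1];
  }
  return links;
}

// Follows rel="next" links until there is none left, collecting each body.
// maxPages is a safety limit so a misbehaving API cannot loop forever.
async function fetchAllPages(firstUrl, fetchPage, maxPages = 50) {
  const pages = [];
  let url = firstUrl;
  while (url && pages.length < maxPages) {
    const { body, linkHeader } = await fetchPage(url);
    pages.push(body);
    url = parseLinkHeader(linkHeader).next;
  }
  return pages;
}

// Possible use in the Express handler (fetchPage wraps your HTTP client):
// fetchAllPages(options.url, fetchPage)
//   .then(pages => res.json({ data: pages }))
//   .catch(() => res.send("an error occurred"));
```

Collecting on the server keeps the client simple, at the cost of a slower first response; if the result set is large, forwarding each page to the client as it arrives may be the better trade-off.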
