I'm making an HTTPS request to read some hidden inputs on a sign-in page, using the Node.js request package. Once the request completes, a callback should hand the response to my parse function.
class h {
constructor(username, password){
this.username = username;
this.password = password;
this.secret12 = '';
}
init() {
//Loading H without cookie
request({
uri: "http://example.com",
method: "GET",
jar: jar,
followRedirect: true,
maxRedirects: 10,
timeout: 10000,
//Need to fake useragent, otherwise server will close connection without response.
headers: {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
},
this.getHiddenInputs());
}
getHiddenInputs(error, response, body) {
if (!error && response.statusCode === 200) {
//Parsing body of request, to get hidden inputs required to mock legit authentication.
const dom = new JSDOM(body);
this.secret12 = (dom.window.document.querySelector('input[value][type="hidden"][name="secret12"]').value);
}
else {
console.log(error);
console.log(response.statusCode)
}
};
}
const helper = new h("Username", "Password");
helper.init();
console.log(helper);
After calling request inside init(), I'm using the callback function to run the code that finds the hidden input once the request has completed. I'm following the example from here.
Am I missing something?
You are executing this.getHiddenInputs() instead of passing it to request as a callback, so there is no actual callback given to the request call.
You could pass it as this.getHiddenInputs.bind(this) or, as I'd prefer, wrap it in an arrow function: (error, response, body) => this.getHiddenInputs(error, response, body)
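A minimal sketch of both options, assuming a hypothetical stand-in named fakeRequest that invokes its callback synchronously so the example runs without network access (the real request library calls it later, once the response arrives). The Helper class and onResponse method are illustrative names, not the question's code. The last part shows why a bare method reference is not enough: it loses `this`.

```javascript
// Hypothetical stand-in for request(): calls the callback right away
// with a canned response instead of doing any network I/O.
function fakeRequest(options, callback) {
  if (typeof callback === 'function') {
    callback(null, { statusCode: 200 }, '<html>' + options.uri + '</html>');
  }
}

class Helper {
  constructor() {
    this.body = '';
  }

  // Option 1: bind the method so `this` still refers to the instance.
  initWithBind() {
    fakeRequest({ uri: 'http://example.com' }, this.onResponse.bind(this));
  }

  // Option 2: an arrow function keeps the surrounding `this`.
  initWithArrow() {
    fakeRequest({ uri: 'http://example.com' }, (error, response, body) =>
      this.onResponse(error, response, body));
  }

  onResponse(error, response, body) {
    if (!error && response.statusCode === 200) {
      this.body = body;
    }
  }
}

const viaBind = new Helper();
viaBind.initWithBind();

const viaArrow = new Helper();
viaArrow.initWithArrow();

// Passing the method without bind detaches it from the instance:
// class bodies are strict code, so `this` is undefined inside the call.
let unboundError = null;
try {
  const detached = new Helper();
  fakeRequest({ uri: 'http://example.com' }, detached.onResponse);
} catch (err) {
  unboundError = err;
}

console.log(viaBind.body, viaArrow.body, unboundError instanceof TypeError);
```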
Hi, I'm running an Express server with a .post route on /test that uses express-formidable and express.json() as middleware.
Express Server
const formidable = require('express-formidable');
app.use(express.json());
app.use(formidable());
app.post('/test', function(req, res){
console.log(req.fields);
})
Using AJAX (No Issues)
When I send a POST request using AJAX like so:
$.ajax({
url:'http://localhost:3000/test',
type: "POST",
crossDomain: true,
dataType: "json",
data: {
"file" : "background.js"
},
success: async function (response) {
}
})
The server outputs:
{ file: 'background.js' }
The Problem
However, when I send the same POST request using AXIOS
var fUrl = 'http://localhost:3000/test';
var fHeader = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36',
'content-type': 'application/x-www-form-urlencoded; charset=UTF-8'
};
var req = await axios({
method: "POST",
url: fUrl,
withCredentials: true,
data: {"file" : 'background.js'},
headers: fHeader
});
The server outputs the fields in the wrong format:
{ '{"file":"background.js"}': '' }
I suspect the content-type header is the cause; however, when I change it to application/json, the request never completes and waits, apparently indefinitely.
app.use(express.json());
app.use(formidable());
Never use both of these body parsers at the same time.
Also, that is not the way to send a file, but that would be another Q&A.
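The odd output in the question is consistent with a JSON body being read as URL-encoded form data. The sketch below reproduces that shape with Node's built-in querystring module, used here only as a stand-in parser; that it matches what express-formidable does internally is an assumption.

```javascript
const querystring = require('querystring');

const payload = { file: 'background.js' };

// A JSON string contains no "=" or "&", so a form parser treats the
// whole string as one key with an empty value.
const jsonBody = JSON.stringify(payload);
const jsonReadAsForm = querystring.parse(jsonBody);
const jsonKeys = Object.keys(jsonReadAsForm);

// Encoding the same payload as real form data yields the expected field.
const formBody = new URLSearchParams(payload).toString();
const formReadAsForm = querystring.parse(formBody);

console.log(jsonKeys, formBody, formReadAsForm.file);
```

With axios, one option is to pass a form-encoded string or a URLSearchParams instance as data when the header says application/x-www-form-urlencoded, so the body matches the declared content type.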
I'm trying to scrape HTML using the request library on Node.js. The response code is 200, but the data I get is unreadable. Here is my code:
var request = require("request");
const options = {
uri: 'https://www.wikipedia.org',
encoding: 'utf-8',
headers: {
"Accept": "text/html,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
"charset": "utf-8",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/78.0.3904.108 Chrome/78.0.3904.108 Safari/537.36"
}
};
request(options, function(error, response, body) {
console.log(body);
});
As you can see, I sent the request for html and utf-8 but got a large string like f��j���+���x��,�G�Y�l
My node version is v8.10.0 and the request version is 2.88.0.
Is something wrong with the code, or am I missing something?
Any hint to overcome this problem would be appreciated.
Updated Answer:
In response to your latest post:
The reason it is not working for Amazon is that the response is gzipped. To decompress the gzip response, you simply need to add gzip: true to the options object you are using. This will work for both Amazon and Wikipedia:
const request = require('request');
const options = {
uri: "https://www.amazon.com",
gzip: true
}
request(options, function(error, response, body) {
if (error) throw error;
console.log(body);
});
Lastly, if you want to scrape webpages like this, it is probably best to use a browser automation library such as Puppeteer, since it drives a real headless browser and is commonly used for web scraping.
See here for Puppeteer GitHub.
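To see why a gzipped body looks garbled when printed as text, here is a small offline sketch using Node's built-in zlib module. The sample HTML string is made up for illustration; no request is sent.

```javascript
const zlib = require('zlib');

// Illustrative page content, not taken from any real site.
const html = '<html><body>Hello, world</body></html>';

// What a server sends when it answers with Content-Encoding: gzip.
const compressed = zlib.gzipSync(html);

// Decoding the compressed bytes as UTF-8 gives unreadable text.
const garbled = compressed.toString('utf8');

// gzip streams start with the bytes 0x1f 0x8b.
const looksGzipped = compressed[0] === 0x1f && compressed[1] === 0x8b;

// Decompressing first recovers the original markup.
const restored = zlib.gunzipSync(compressed).toString('utf8');

console.log(looksGzipped, garbled === html, restored === html);
```

The gzip: true option mentioned above asks the request library to perform this decompression step on the response body.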
Original Answer:
Since you are just grabbing the HTML from the main page, you do not have to specify charset, encoding, or Accept-Encoding.
const request = require('request');
const options = {
uri: 'https://www.wikipedia.org',
//encoding: 'utf-8',
headers: {
"Accept": "text/html,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
//"charset": "utf-8",
//"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/78.0.3904.108 Chrome/78.0.3904.108 Safari/537.36"
}
};
request(options, function (error, response, body) {
if (error) throw error
console.log(body);
});
To take it a bit further, in this scenario you don't need to specify headers at all:
const request = require('request');
request('https://www.wikipedia.org', function (error, response, body) {
if (error) throw error
console.log(body);
});
Thank you for the reply. When I used that on the Wikipedia page it worked properly, but when I use it to scrape another website such as Amazon, I get the same bad result:
const request = require('request');
request('https://www.amazon.com', function (error, response, body) {
if (error) throw error
console.log(body);
});
Problem: Need to download a private repository that is within an organization hosted on GitHub Enterprise.
I created a personal access token for my account with scope repo and stored it as an environment variable, GITHUB_ACCESS_TOKEN.
I'm using NodeJS with the request library to make the GET request. However, with the following code, I get a 401 response when I run it.
(Note: I replaced <repo-name> with the actual name of the repository).
Can someone explain why this doesn't work and point me in the right direction?
My Function :
function downloadRepository(owner, repository, branch, accessToken) {
let options = {
method: "GET",
url: `https://api.github.com/orgs/${owner.toLowerCase()}/repos/${repository.toLowerCase()}/tarball/${branch}?access_token=${accessToken}`,
headers: {
'Accept': 'application/vnd.github.v3.raw',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
}
};
request(options, (error, response, body) => {
if(error || response.statusCode != 200) {
console.log("Could not download repository: %s", response.statusCode);
return;
}
return body;
});
}
My Main :
const request = require('request');
let wiki = downloadRepository("deep-learning-platform", "<repo-name>", "wiki",
process.env.GITHUB_ACCESS_TOKEN);
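No answer is recorded for this question, so the following is only a hedged sketch. GitHub's REST API documents the archive endpoint as /repos/{owner}/{repo}/tarball/{ref}, GitHub Enterprise Server serves its API under https://<hostname>/api/v3, and a token can be sent in the Authorization header instead of the query string. The helper below only builds the options object; the function name, the example hostname, and the example repository names are assumptions, and whether this resolves the 401 in the asker's setup is untested.

```javascript
// Builds options for the request library without sending anything.
// apiBase is an assumption: for GitHub Enterprise Server it would be
// something like 'https://<your-hostname>/api/v3'.
function buildTarballRequest(apiBase, owner, repository, ref, accessToken) {
  return {
    method: 'GET',
    url: `${apiBase}/repos/${owner}/${repository}/tarball/${ref}`,
    encoding: null, // keep the archive as a Buffer rather than a string
    headers: {
      'Accept': 'application/vnd.github.v3+json',
      'Authorization': `token ${accessToken}`,
      'User-Agent': 'repo-downloader' // hypothetical client name
    }
  };
}

// Hypothetical values for illustration only.
const exampleOptions = buildTarballRequest(
  'https://github.example.com/api/v3',
  'my-org',
  'my-repo',
  'main',
  'dummy-token'
);

console.log(exampleOptions.url);
```

Note also that returning body from inside the callback does not return it from downloadRepository; the caller would need a callback or a Promise to receive the data.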
I'm trying to do a get request for image search, and I'm not getting the same result that I am in my browser. Is there a way to get the same result using node.js?
Here's the code I'm using:
var keyword = "Photographie"
keyword = keyword.replace(/[^a-zA-Z0-9éàèùâêîôûçëïü]/g, "+")
var httpOptions = { hostname: 'yandex.com',
path: '/images/search?text=' + keyword, //path does not accept spaces or dashes
headers: { 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': 'Mozilla/5.0'}}
console.log(httpOptions.hostname + httpOptions.path)
https.get(httpOptions, (httpResponse) => {
console.log(`STATUS: ${httpResponse.statusCode}`);
httpResponse.setEncoding('utf8');
httpResponse.on('data', (htmlBody) => {
console.log(`BODY: ${htmlBody}`);
});
});
By switching to the request-promise library and using the proper capitalization of the User-Agent header name and an actual user agent string from the Chrome browser, this code works for me:
const rp = require('request-promise');
let keyword = "Photographie"
let options = { url: 'http://yandex.com/images/search?text=' + keyword,
headers: {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
};
rp(options).then(response => {
console.log(response);
}).catch(err => {
console.log(err);
});
When I try to run your actual code, I get a 302 redirect and a cookie set. I'm guessing that they are expecting you to follow the redirect and retain the cookie. But you can apparently just switch to the above code, which appears to work for me. I don't know exactly what makes my code work, but it could be that it has a more recognizable user agent.
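Separately from the user agent, the keyword in the question is patched with a regular expression before being appended to the path. A sketch of letting Node's built-in URL class do the percent-encoding instead; buildSearchUrl is a hypothetical helper name, and the second keyword is an invented example with a space and accents.

```javascript
// Builds the search URL and lets URLSearchParams encode the keyword,
// so spaces, accents and punctuation need no manual replacement.
function buildSearchUrl(keyword) {
  const url = new URL('https://yandex.com/images/search');
  url.searchParams.set('text', keyword);
  return url.toString();
}

const simpleUrl = buildSearchUrl('Photographie');
const accentedUrl = buildSearchUrl('photographie d\'été');

// Parsing the URL again recovers the original keyword unchanged.
const roundTripped = new URL(accentedUrl).searchParams.get('text');

console.log(simpleUrl, accentedUrl, roundTripped);
```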
I am trying to verify whether a link points to a valid image by checking its magic number. Most image links work fine, but there is a set of images on Trump's site that do not produce the correct magic numbers, even though they display fine in a browser. The magic number they produce is 3c21444f.
Below is my code. Any help would be appreciated:
var request = require('request');
var magic = {
jpg: 'ffd8ffe0',
jpg1: 'ffd8ffe1',
png: '89504e47',
gif: '47494638'
};
var options = {
method: 'GET',
url: 'https://assets.donaldjtrump.com/gallery/4749/screen_shot_2016-10-30_at_1.39.54_pm.png',
encoding: null // keeps the body as buffer
};
request(options, function (error, response, body) {
if(!error) {
var magicNumberInBody = body.toString('hex', 0, 4);
if (magicNumberInBody == magic.jpg ||
magicNumberInBody == magic.jpg1 ||
magicNumberInBody == magic.png ||
magicNumberInBody == magic.gif) {
console.log('Valid image');
} else {
console.log('Invalid Image', magicNumberInBody);
}
}
});
So apparently the issue was Cloudflare blocking my requests for the image. I fixed it by sending a User-Agent header with the requests for those images.
var options = {
method: 'GET',
url: 'https://assets.donaldjtrump.com/gallery/4749/screen_shot_2016-10-30_at_1.39.54_pm.png',
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
},
encoding: null // keeps the body as buffer
};
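The value 3c21444f from the question is the ASCII text <!DO, which suggests an HTML page (for example a block page) came back instead of image bytes. A small offline sketch of the check, assuming hypothetical helper names detectImageType and prefixAsText; the JPEG signature is shortened to its first three bytes so that variants other than ffd8ffe0 and ffd8ffe1 also match, which is a design choice rather than something stated in the thread.

```javascript
// Hex prefixes for the formats checked in the question.
const SIGNATURES = {
  jpg: 'ffd8ff',
  png: '89504e47',
  gif: '47494638'
};

// Returns the matching type name, or null when no signature matches.
function detectImageType(buffer) {
  const prefix = buffer.toString('hex', 0, 4);
  for (const [type, signature] of Object.entries(SIGNATURES)) {
    if (prefix.startsWith(signature)) {
      return type;
    }
  }
  return null;
}

// Shows the first four bytes as text, handy for spotting HTML responses.
function prefixAsText(buffer) {
  return buffer.toString('latin1', 0, 4);
}

const pngHeader = Buffer.from('89504e470d0a1a0a', 'hex');
const htmlPage = Buffer.from('<!DOCTYPE html><html></html>');
const reportedPrefix = Buffer.from('3c21444f', 'hex');

console.log(detectImageType(pngHeader), detectImageType(htmlPage), prefixAsText(reportedPrefix));
```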