I am having trouble setting a custom user agent for my phantom page. I have searched for possible solutions, but I seem to be missing something fundamental about how this should work: when I try to apply the setting, phantom just hangs and never completes the request or reaches the page.open call. Here is my code:
phantom.create().then(function(ph) {
ph.createPage().then(function(page) {
page.set('settings.userAgent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.256');
page.open(req.cookies.website).then(function(status) {
page.property('content').then(function(content) {
res.send(content);
page.close();
ph.exit();
});
});
});
});
In case anyone is wondering, I solved it... I just needed to look more closely at the phantom npm documentation. Here is the solution if anyone else has the same problem:
phantom.create().then(function(ph) {
ph.createPage().then(function(page) {
page.setting('userAgent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.256');
page.open(req.cookies.website).then(function(status) {
page.property('content').then(function(content) {
res.send(content);
page.close();
ph.exit();
});
});
});
});
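For readers who prefer a flatter structure, the same steps can be sketched with async/await. This is only a sketch under assumptions: `fetchContent` and its `phantomModule` parameter are names made up for this example (the module is passed in so the function can be exercised with a stub), and it assumes a version of the phantom package in which `page.setting` exists and returns a promise, as in the solution above.

```javascript
// Hypothetical helper: render a URL with a custom user agent and return the HTML.
// `phantomModule` is expected to be the result of require('phantom').
async function fetchContent(phantomModule, url, userAgent) {
  const ph = await phantomModule.create();
  try {
    const page = await ph.createPage();
    // Wait for the setting to be applied before opening the page.
    await page.setting('userAgent', userAgent);
    const status = await page.open(url);
    if (status !== 'success') {
      throw new Error('Failed to open ' + url + ' (status: ' + status + ')');
    }
    const content = await page.property('content');
    await page.close();
    return content;
  } finally {
    // Always shut the PhantomJS process down, even if something above threw.
    await ph.exit();
  }
}
```

Inside the route handler this could be called as fetchContent(require('phantom'), req.cookies.website, userAgent).then(function(content) { res.send(content); }), with a catch that sends an error response.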
I am trying to fetch a particular website, and I already mimic all the request headers that Chrome sends and I am still getting a pending promise that never resolves.
Here is my current code and headers:
const fetch = require('node-fetch');
(async () => {
console.log('Starting fetch');
const fetchResponse = await fetch('https://www.g2a.com/rocket-league-pc-steam-key-global-i10000003107015', {
method: 'GET',
headers: {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
'Accept-Language': 'en-US;q=0.7,en;q=0.3',
'Accept-Encoding': 'gzip, deflate, br'
}
})
console.log('I never see this console.log: ', fetchResponse);
if(fetchResponse.ok){
console.log('ok');
}else {
console.log('not ok');
}
console.log('Leaving...');
})();
These are the console logs I see:
Starting fetch
This is a pending promise: Promise { <pending> }
not ok
Leaving...
Is there something I can do here? I noticed in similar questions that, for this specific website, only the Accept-Language header should be needed. I already tried that, but the promise still never resolves.
I also read in another question that the site has protection against Node.js requests. Do I need to use another language?
You'll have a better time using async functions and await instead of then here.
I'm assuming your Node.js doesn't support top-level await, hence the last .then.
const fetch = require("node-fetch");
const headers = {
"User-Agent":
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
"Accept-Language": "en-US;q=0.7,en;q=0.3",
"Accept-Encoding": "gzip, deflate, br",
};
async function doFetch(url) {
console.log("Starting fetch");
const fetchResponse = await fetch(url, {
method: "GET",
headers,
});
console.log(fetchResponse);
if (!fetchResponse.ok) {
throw new Error("Response not OK");
}
const data = await fetchResponse.text(); // the product page is HTML, so read it as text rather than JSON
return data;
}
doFetch("https://www.g2a.com/rocket-league-pc-steam-key-global-i10000003107015").then((data) => {
console.log("All done", data);
});
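If the promise really never settles, it can help to put an upper bound on the wait so the script fails loudly instead of hanging. Below is a minimal sketch of that idea; `withTimeout` is a hypothetical helper name, not part of node-fetch, and it only stops waiting: it does not cancel the underlying request.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('Timed out after ' + ms + ' ms')), ms);
  });
  // Whichever settles first wins; always clear the timer so Node can exit.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In doFetch above, the call would become something like await withTimeout(fetch(url, { method: "GET", headers }), 10000), where the 10-second limit is an arbitrary choice.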
I'm brand new to Node.js (v10.9.0) and wanted to make a simple web scraping tool that gets statistics and ranks for players on this page. No matter what I try, I can't make it work with this website. I tried multiple request methods, including http.request and https.request, and every one of them works with 'http://www.google.com'. However, every attempt against this specific website gives me either a 301 redirect or a socket hang up error. The location given by the 301 response is the same link with a '/' on the end, and requesting that results in a socket hang up. I know the site runs on port 443. Do some sites just block Node.js? Why are browsers able to connect but not scripts like this?
Please don't link me to any other threads; I've seen every single one and none of them have helped.
var request = require('request');
var options = {
method: "GET",
uri: 'https://www.smashboards.com',
rejectUnauthorized: false,
port: '443'
};
request(options, function (error, response, body) {
console.log('error:', error); // Print the error if one occurred
console.log('statusCode:', response && response.statusCode); // Print the response status code if a response was received
console.log('body:', body); // Print the HTML of the requested page
});
Error:
error: { Error: socket hang up
at createHangUpError (_http_client.js:322:15)
at TLSSocket.socketOnEnd (_http_client.js:425:23)
at TLSSocket.emit (events.js:187:15)
at endReadableNT (_stream_readable.js:1085:12)
at process._tickCallback (internal/process/next_tick.js:63:19) code: 'ECONNRESET' }
EDIT:
Adding this to my options object fixed my problem:
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}
OP Here
All I did was add:
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}
to my options object, and it's working perfectly.
New code:
var request = require('request');
var options = {
method: 'GET',
uri: 'https://www.smashboards.com',
rejectUnauthorized: false,
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}
};
request(options, function (error, response, body) {
console.log('error:', error); // Print the error if one occurred
console.log('statusCode:', response && response.statusCode); // Print the response status code if a response was received
console.log('body:', body); // Print the HTML of the requested page
});
That's 12+ hours I'm never getting back.
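The same fix should also work without the request package. The sketch below uses only Node's built-in http and https modules; `getWithUserAgent` is a name invented for this example, and it does not follow redirects, so a 301 would still have to be handled by the caller.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: GET a URL with an explicit User-Agent header.
// Resolves with { statusCode, headers, body }; does not follow redirects.
function getWithUserAgent(url, userAgent) {
  return new Promise((resolve, reject) => {
    // Pick the client module that matches the URL scheme.
    const client = new URL(url).protocol === 'https:' ? https : http;
    const req = client.get(url, { headers: { 'User-Agent': userAgent } }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve({ statusCode: res.statusCode, headers: res.headers, body }));
    });
    // Connection errors such as ECONNRESET ("socket hang up") surface here.
    req.on('error', reject);
  });
}
```

A call such as getWithUserAgent('https://www.smashboards.com', 'Mozilla/5.0 ...') returns a promise, so the status code can be inspected before deciding whether to follow a Location header.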
I am struggling to make a successful request with the request-promise npm package to a site that requires a cookie before it will serve the page.
So I have looked into cookie jars in order to store all the cookies that are returned in the response once the request has been made.
const rp = require("request-promise")
var cookieJar = rp.jar()
function grabcfToken(){
let token = ""
let options = {
url : 'https://www.off---white.com/en/GB',
method: "GET",
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
resolveWithFullResponse : true
}
rp(options)
.then((response)=>{
console.log(response)
})
.catch((error)=>{
console.log(error)
})
}
Can someone tell me why the request isn't going through successfully? How do I apply the cookies that I initially receive before the request times out?
const rp = require("request-promise")
var cookieJar = rp.jar()
function grabcfToken(){
let token = ""
let options = {
url : 'https://www.off---white.com/en/GB',
method: "GET",
headers: { // request only sends headers that are placed under the headers option
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
},
resolveWithFullResponse : true,
jar: cookieJar
}
rp(options)
.then((response)=>{
console.log(response)
})
.catch((error)=>{
console.log(error)
})
}
If you're asking how to send the jar, filled with the cookies from the first request, along with your requests, you have to add jar: cookieJar as part of your options object before sending it.
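To make it clearer what the jar does for you, here is a rough sketch of the underlying idea: cookies arrive in Set-Cookie response headers and are sent back as a single Cookie request header. `toCookieHeader` is a hypothetical helper written for this illustration. It ignores attributes such as Path, Domain and Expires, which a real jar (request relies on the tough-cookie package) takes into account.

```javascript
// Hypothetical helper: turn an array of Set-Cookie header values into
// the value of a Cookie request header. Attributes after the first ';' are ignored.
function toCookieHeader(setCookieValues) {
  const cookies = new Map();
  for (const value of setCookieValues || []) {
    const pair = value.split(';')[0].trim();
    const eq = pair.indexOf('=');
    if (eq <= 0) continue; // skip malformed entries without a name
    // A later cookie with the same name replaces the earlier one.
    cookies.set(pair.slice(0, eq), pair.slice(eq + 1));
  }
  return Array.from(cookies, ([name, val]) => name + '=' + val).join('; ');
}
```

With resolveWithFullResponse set, the raw values are available as response.headers['set-cookie'], which Node exposes as an array. In practice the jar option is the better choice; the sketch is only meant to show what is happening.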
I have created a local Node server, and I am printing the User-Agent from the req of a GET request, like this:
router.get('**', function (req, res, next) {
if (req.header('User-Agent')) {
console.log('user-agent = ', (req.header('User-Agent')))
res.end(req.header('User-Agent'));
} else {
res.send('Hello World!!!')
}
});
It then prints a different User-Agent for the / path and the /favicon.ico path on my OnePlus device.
Result:
/ = Mozilla/5.0 (Linux; Android 8.0.0; ONEPLUS A3003 Build/OPR6.170623.013) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.109 Mobile Safari/537.36
/favicon.ico = Mozilla/5.0 (Linux; Android 8.0.0; Build/OPR6.170623.013) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.109 Mobile Safari/537.36
Why are these two different User-Agent strings coming from the same browser?
I'm testing this in the Chrome browser.
var phantom = require('phantom');
console.dir(phantom);
phantom.create(function(browser){
browser.createPage(function(page){
page.customHeaders={
"HTTP_USER_AGENT": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
};
console.dir(page.settings);
//undefined
page.settings={};
page.settings.userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36';
page.settings.HTTP_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36';
console.dir(page.settings);
page.open('http://example.com/req.php', function() {
setTimeout(function() {
var output = page.evaluate(function() {
return document;
});
console.dir(output);
//undefined
}, 1000);
});});});
When I use PhantomJS, I try to set the User-Agent header in three different ways, but when I visit the page and save the PHP $_SERVER object to a text file, I still see PhantomJS:
HTTP_USER_AGENT: Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.1-development Safari/538.1
Not only that, but the output of the page is also undefined.
It seems that the docs have changed, or I can't find the correct ones. I am looking at:
http://phantomjs.org/api/webpage/property/settings.html
https://www.npmjs.com/package/phantom
How is this used correctly?
According to the Functional Details in the docs, you have to set the user agent through page.set():
page.set('settings.userAgent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36');
It has to be done this way because the bridge has to communicate with the PhantomJS process, and it does so asynchronously. This could probably have been implemented with Object.defineProperty.
If you want to set multiple settings at once, you can do (ref):
page.set('settings', {
userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11",
javascriptEnabled: false,
loadImages: false
});
You can find a list of settings that you can set in page.settings.
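To illustrate the Object.defineProperty remark above, here is a speculative sketch of how a plain assignment could be forwarded to an asynchronous bridge. Everything in it (`makeSettingsProxy`, the `bridge` object and its `send` method) is hypothetical and is not how the phantom package is actually implemented.

```javascript
// Hypothetical: expose `settings.<name> = value` while forwarding each
// assignment to an asynchronous bridge (here just an object with a send method).
function makeSettingsProxy(bridge, names) {
  const settings = {};
  const local = {}; // last value assigned on the Node side
  for (const name of names) {
    Object.defineProperty(settings, name, {
      enumerable: true,
      get() { return local[name]; },
      set(value) {
        local[name] = value;
        // Fire-and-forget: an assignment cannot be awaited by the caller.
        bridge.send('settings.' + name, value);
      },
    });
  }
  return settings;
}
```

The weakness of this design is visible in the setter: the caller has no way to know when the value has reached the other process, which is one plausible reason for exposing an explicit method instead.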
Currently (27.01.2018), with these requirements:
phantom: ^4.0.12,
webpage: ^0.3.0
I use this method to set the property:
page.setting(key, value);
I checked the result with PHP in the $_SERVER array. It works correctly.
The complete code looks like this:
const phantom = require('phantom');
(async function() {
const instance = await phantom.create();
const page = await instance.createPage();
await page.setting('userAgent',"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11");
await page.on('onResourceRequested', function(requestData) {
//Dump request settings to view result of our changes:
console.info('Requesting', requestData);
});
const status = await page.open('https://stackoverflow.com');
const content = await page.property('content');
//console.log(content);
await instance.exit();
})();