How to add custom headers in Playwright - node.js

headers["user-agent"] = fakeUa();
console.log(fakeUa())
let firstReq = true;
page.route('**/*', route => {
const request = route.request()
//console.log(request.url(), JSON.stringify(request.headers()));
if("x-j3popqvx-a" in request.headers()){
headers = request.headers();
//console.log(headers);
console.log("exiting");
return;
}
else {
console.log("in");
return route.continue({headers: headers});
}
});
let pageRes = await page.goto(url, {waitUntil: 'load', timeout: 0});
I want to add a fake user agent when sending a request to the URL, but the fake user agent is not applied and the request goes out with the default one.

In Puppeteer you could apply a custom UA with page.setUserAgent() and set custom headers with page.setExtraHTTPHeaders(). In Playwright you can set a custom user agent (userAgent) and headers (extraHTTPHeaders) as options of browser.newPage() or browser.newContext(), like:
const page = await browser.newPage({ userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36' })
const page = await browser.newPage({
extraHTTPHeaders: {
'Cache-Control': 'no-cache'
}
})
Edit: If you are using newContext(), usage looks like this (make sure to set userAgent in the options of newContext() and not in newPage()):
const context = await browser.newContext({ userAgent: 'hello' })
const page = await context.newPage()
// to check the UA:
console.log(await page.evaluate(() => navigator.userAgent))
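If you do need per-request control, as in the question's page.route() handler, note that every intercepted request has to be continued (a bare return leaves it pending), and that the overrides should be merged into the request's own headers. A minimal sketch of that idea follows. The names mergeHeaders and installHeaderOverrides are hypothetical, and page is assumed to be a Playwright Page:

```javascript
// Hypothetical helper: merge override headers into the headers Playwright
// reports for a request. Playwright returns header names in lower case,
// so override names are lower-cased too to avoid duplicate entries.
function mergeHeaders(originalHeaders, overrides) {
  const merged = { ...originalHeaders };
  for (const [name, value] of Object.entries(overrides)) {
    merged[name.toLowerCase()] = value;
  }
  return merged;
}

// Hypothetical helper: `page` is assumed to be a Playwright Page.
// Every intercepted request is continued, so none is left hanging.
async function installHeaderOverrides(page, overrides) {
  await page.route('**/*', (route) => {
    const headers = mergeHeaders(route.request().headers(), overrides);
    return route.continue({ headers });
  });
}
```

You would call installHeaderOverrides(page, { 'User-Agent': fakeUa() }) before page.goto(). For a fixed user agent, the userAgent option shown above is the simpler choice.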

If you're using @playwright/test, you can set a user agent as follows:
import {expect, test} from "@playwright/test"; // ^1.30.0
const userAgent =
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
test.describe("with user agent", () => {
test.use({userAgent});
test("does stuff", async ({page}) => {
await page.goto("https://example.com/");
await expect(page.locator("h1")).toHaveText("Example Domain");
});
});

Related

Web scraping using fetch - promise doesn't resolve

I am trying to fetch a particular website, and I already mimic all the request headers that Chrome sends and I am still getting a pending promise that never resolves.
Here is my current code and headers:
const fetch = require('node-fetch');
(async () => {
console.log('Starting fetch');
const fetchResponse = await fetch('https://www.g2a.com/rocket-league-pc-steam-key-global-i10000003107015', {
method: 'GET',
headers: {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
'Accept-Language': 'en-US;q=0.7,en;q=0.3',
'Accept-Encoding': 'gzip, deflate, br'
}
})
console.log('I never see this console.log: ', fetchResponse);
if(fetchResponse.ok){
console.log('ok');
}else {
console.log('not ok');
}
console.log('Leaving...');
})();
This is the console logs I can read:
Starting fetch
This is a pending promise: Promise { <pending> }
not ok
Leaving...
Is there something I can do here? I notice on similar questions that for this specific website, I only need to use Accept-Language header, I already tried that, but still the promise never gets resolved.
Also read on another question that they have security against Node.js requests, maybe I need to use another language?
You'll have a better time using async functions and await instead of then here.
I'm assuming your Node.js doesn't support top-level await, hence the last .then.
const fetch = require("node-fetch");
const headers = {
"User-Agent":
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
"Accept-Language": "en-US;q=0.7,en;q=0.3",
"Accept-Encoding": "gzip, deflate, br",
};
async function doFetch(url) {
console.log("Starting fetch");
const fetchResponse = await fetch(url, {
method: "GET",
headers,
});
console.log(fetchResponse);
if (!fetchResponse.ok) {
throw new Error("Response not OK");
}
// The URL serves an HTML page, so read the body as text rather than JSON
const data = await fetchResponse.text();
return data;
}
doFetch("https://www.g2a.com/rocket-league-pc-steam-key-global-i10000003107015").then((data) => {
console.log("All done", data);
});
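Because the original problem is a request that never settles, it can help to put an upper bound on how long you wait. The following is a minimal sketch, not a fix for whatever the site does to block the request. It assumes a runtime with a global AbortController and a fetch implementation that accepts a signal option (Node 18+ global fetch, for example). The names fetchWithTimeout and fetchImpl are hypothetical:

```javascript
// Hypothetical helper: abort the request if it has not produced a response
// within `timeoutMs` milliseconds. `fetchImpl` defaults to the global fetch
// (an assumption: Node 18+), but any fetch-compatible function that honours
// the `signal` option can be passed in.
async function fetchWithTimeout(url, options = {}, timeoutMs = 10000, fetchImpl = globalThis.fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } finally {
    // Always clear the timer so it does not keep the process alive
    clearTimeout(timer);
  }
}
```

Inside doFetch you could then call fetchWithTimeout(url, { method: "GET", headers }, 15000) in place of fetch(url, ...), so a stalled request rejects instead of staying pending.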

Puppeteer scraping attempt always ends in undefined value

Simple code, should work, but it doesn't.
const puppeteer = require ('puppeteer');
async function scrapeProduct(url) {
const browser = await puppeteer.launch({ headless:false });
const page = await browser.newPage();
await page.setExtraHTTPHeaders({
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'
});
await page.goto(url)
const [el] = await page.$x('/html/body/main/div[1]/div/div/div[2]/h1');
const txt = await el.getProperty('txt')
const srcText = await txt.jsonValue()
console.log(srcText)
}
scrapeProduct('https://getbootstrap.com/')
//Same result on other urls as well.
I've also tried querySelector instead of XPath. That worked in some cases and logged the first value of the node as expected, but querySelectorAll on the same element would again return "undefined". I've looked everywhere, but simply can't find the solution.
I do it this way:
const puppeteer = require("puppeteer");
async function scrapeProduct(url) {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.setExtraHTTPHeaders({
"user-agent":
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
});
await page.goto(url);
// wait for the element matched by the XPath to appear in the page
await page.waitForXPath("/html/body/main/div[1]/div/div/div[2]/h1");
// evaluate the XPath expression (it returns an array of ElementHandle objects)
const headings = await page.$x("/html/body/main/div[1]/div/div/div[2]/h1");
// read the textContent of the first match inside the page context
let textContent = await page.evaluate((el) => el.textContent, headings[0]);
console.log(textContent);
}
scrapeProduct('https://getbootstrap.com/')
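The "undefined" results from querySelectorAll mentioned in the question are most likely because DOM nodes cannot be serialized out of the page context; only plain values such as strings can be returned. A minimal sketch of reading the text of every match with a CSS selector follows. The name getTexts is hypothetical, and page is assumed to be a Puppeteer Page:

```javascript
// Hypothetical helper: `page` is assumed to be a Puppeteer Page.
// Waits for at least one match, then reads the text of every match inside
// the browser context and returns plain strings (which are serializable).
async function getTexts(page, selector) {
  await page.waitForSelector(selector);
  return page.$$eval(selector, (elements) =>
    elements.map((el) => el.textContent.trim())
  );
}
```

For example, const headings = await getTexts(page, 'main h1'); where the selector 'main h1' is only an assumption about the target page's markup.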

CookieJars obtaining all cookies, nodeJS using request-promise

I am struggling to make a successful request with the request-promise npm package on a site that requires a cookie for the page to be viewed or for the request to succeed.
Hence, I have looked into cookie jars in order to store the cookies that are returned in the response after the request has been made.
const rp = require("request-promise")
var cookieJar = rp.jar()
function grabcfToken(){
let token = ""
let options = {
url : 'https://www.off---white.com/en/GB',
method: "GET",
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
resolveWithFullResponse : true
}
rp(options)
.then((response)=>{
console.log(response)
})
.catch((error)=>{
console.log(error)
})
}
Can someone tell me why the request isn't going through successfully? How do I apply the cookies that I initially get before being timed out?
const rp = require("request-promise")
var cookieJar = rp.jar()
function grabcfToken(){
let token = ""
let options = {
url : 'https://www.off---white.com/en/GB',
method: "GET",
// request reads HTTP headers from the `headers` option, not from top-level keys
headers: {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
},
resolveWithFullResponse : true,
jar: cookieJar
}
rp(options)
.then((response)=>{
console.log(response)
})
.catch((error)=>{
console.log(error)
})
}
If you're asking how to send the cookies stored in your jar along with the request, you have to add jar: cookieJar to your options object before sending it.
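To make it clearer what the jar option does for you, here is a toy illustration of the idea: remember the name=value pairs from Set-Cookie response headers and replay them in the Cookie header of the next request. This is not how request-promise implements it; the class SimpleCookieJar and its methods are hypothetical and ignore attributes such as Path, Domain and Expires:

```javascript
// Illustrative only: a toy cookie jar, not the implementation behind the
// `jar` option of request-promise.
class SimpleCookieJar {
  constructor() {
    this.cookies = new Map();
  }

  // Store the name=value pair from each Set-Cookie header value.
  // Attributes after the first ';' (Path, Expires, ...) are ignored here.
  store(setCookieHeaders) {
    for (const header of setCookieHeaders) {
      const pair = header.split(';')[0];
      const index = pair.indexOf('=');
      if (index === -1) continue;
      this.cookies.set(pair.slice(0, index).trim(), pair.slice(index + 1).trim());
    }
  }

  // Build the value of the Cookie request header for the next request.
  header() {
    return [...this.cookies]
      .map(([name, value]) => `${name}=${value}`)
      .join('; ');
  }
}
```

With jar: cookieJar in the options, the library does the equivalent bookkeeping across requests (including redirects), so you normally do not need to handle the headers yourself.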

node js request proxy

I send a request through a proxy and always receive a response like this:
tunneling socket could not be established, cause=read ECONNRESET
or
tunneling socket could not be established, cause= socket hang up
My code:
let settings = {
url: `url`,
headers: {
'Connection': 'keep-alive',
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"
},
method: 'POST',
proxy: `http://${ip}:${port}`,
strictSSL: false
}
request.request(settings, (err, response, body) => {
// err here
})
What am I doing wrong?
Now I get this error: Error: Tunnel creation failed. Socket error: Error: read ECONNRESET
My code:
const request = require('request'),
proxyingAgent = require('proxying-agent');
;
let settings = {
url: url,
headers: {
'Connection': 'keep-alive',
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"
},
method: 'POST',
// proxy: `http://${obj.proxy[obj.proxyIdx]}`,
agent: proxyingAgent.create(`http://${obj.proxy[obj.proxyIdx]}`, url),
}
As for your code, the problem possibly lies in your settings object.
You need to use syntax like this:
let settings = {
url,
headers: {
'Connection': 'keep-alive',
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"
},
method: 'POST',
proxy: `http://${ip}:${port}`,
strictSSL: false
}
Here we use the ES6 shorthand property syntax to keep the object short.
You can also establish the proxy connection with the npm package proxying-agent.
Your code should look something like this:
const proxyingAgent = require('proxying-agent');
const fetch = require('node-fetch');
// Placeholders: replace with your proxy's host and port
const host = '<your host>';
const port = '<proxy port>';
const creds = {
login: 'username',
password: 'pass'
};
const buildProxy = (url) => {
return {
agent: proxyingAgent.create(`http://${creds.login}:${creds.password}@${host}:${port}`, url)
};
};
// If your proxy doesn't require credentials, use this function instead
const buildProxyWithoutCreds = (url) => {
return {
agent: proxyingAgent.create(`http://${host}:${port}`, url)
};
};
And then you can use it with your URL and credentials. We'll use the node-fetch package.
const proxyGetData = async (url, type) => {
try {
const proxyData = buildProxyWithoutCreds(url);
// Make request with proxy. Here we use promise based library node-fetch
let req = await fetch(url, proxyData);
if (req.status === 200) {
return await req[type]();
}
return false;
} catch (e) {
throw new Error(`Error during request: ${e.message}`);
}
};
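One detail worth handling when you build the proxy URL by hand: if the login or password contains characters such as @ or :, they have to be percent-encoded or the URL will be parsed incorrectly. A minimal sketch follows. The function name buildProxyUrl and its parameter names are hypothetical, and it assumes an HTTP proxy:

```javascript
// Hypothetical helper: build an HTTP proxy URL, percent-encoding the
// credentials so characters like '@' or ':' cannot break the URL.
function buildProxyUrl({ host, port, username, password }) {
  const auth = username
    ? `${encodeURIComponent(username)}:${encodeURIComponent(password ?? '')}@`
    : '';
  return `http://${auth}${host}:${port}`;
}
```

The result can be passed wherever the template strings above are used, for example as the first argument of proxyingAgent.create().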

How to set the user agent string in the phantom module?

var phantom = require('phantom');
console.dir(phantom);
phantom.create(function(browser){
browser.createPage(function(page){
page.customHeaders={
"HTTP_USER_AGENT": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
};
console.dir(page.settings);
//undefined
page.settings={};
page.settings.userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36';
page.settings.HTTP_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36';
console.dir(page.settings);
page.open('http://example.com/req.php', function() {
setTimeout(function() {
var output = page.evaluate(function() {
return document;
});
console.dir(output);
//undefined
}, 1000);
});});});
When I use PhantomJS, I try to set the user agent header in three different ways, but when I visit the page and save the PHP $_SERVER object to a text file I still see PhantomJS:
HTTP_USER_AGENT: Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.1-development Safari/538.1
Not only that, but the output of the page is also undefined.
It seems that the docs have changed or I can't find the correct ones. I am looking at
http://phantomjs.org/api/webpage/property/settings.html
https://www.npmjs.com/package/phantom
How is this used correctly?
According to the Functional Details in the docs, you have to set the user agent through page.set():
page.set('settings.userAgent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36');
It has to be done this way because the bridge communicates with the PhantomJS process asynchronously, so a plain property assignment on page.settings never reaches PhantomJS. This could probably have been implemented with Object.defineProperty.
If you want to set multiple settings at once, you can do (ref):
page.set('settings', {
userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11",
javascriptEnabled: false,
loadImages: false
});
You can find a list of settings that you can set in page.settings.
As of 27.01.2018, with these dependencies:
phantom: ^4.0.12,
webpage: ^0.3.0
I use this method to set the property:
page.setting(key, value);
I checked the result with PHP in the $_SERVER array. It works correctly.
The complete code looks like this:
const phantom = require('phantom');
(async function() {
const instance = await phantom.create();
const page = await instance.createPage();
await page.setting('userAgent',"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11");
await page.on('onResourceRequested', function(requestData) {
//Dump request settings to view result of our changes:
console.info('Requesting', requestData);
});
const status = await page.open('https://stackoverflow.com');
const content = await page.property('content');
//console.log(content);
await instance.exit();
})();
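If you need to set several settings with this API, a small loop over page.setting(key, value) keeps it tidy. This is only a sketch: the name applySettings is hypothetical, page is assumed to be a page object from the phantom package (v4), and page.setting() is assumed to return a promise (awaiting a plain value would be harmless):

```javascript
// Hypothetical helper: apply several PhantomJS settings one after another.
// `page` is assumed to come from instance.createPage() in phantom v4.
async function applySettings(page, settings) {
  for (const [key, value] of Object.entries(settings)) {
    // Wait for each setting to be sent before sending the next one
    await page.setting(key, value);
  }
}
```

For example, await applySettings(page, { userAgent: 'my-agent', loadImages: false }); before calling page.open().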
