node.js puppeteer launch callback

Is there a way to use a callback for puppeteer.launch instead of wrapping it in an async IIFE?
(async () => {
const browser = await puppeteer.launch();
...
await browser.close();
})();
could be simplified to
puppeteer.launch(function(browser) {
...
//browser automatically closes ideally
});
This has probably been answered somewhere but I couldn't find anything with a quick google search.

If you want it to work in that way, you could probably monkey-patch puppeteer:
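const realLaunch = puppeteer.launch.bind(puppeteer); // keep `this` pointing at puppeteer
puppeteer.launch = async (callback) => {
const browser = await realLaunch();
try {
return await callback(browser); // wait for the callback's async work to finish
} finally {
await browser.close(); // close even if the callback throws
}
}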
But that's going to be confusing to contributors to your project that are used to the Puppeteer api as published in their documentation, so personally I wouldn't...
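If the goal is only to avoid repeating the launch/close boilerplate, a small wrapper gives you the callback style without replacing Puppeteer's documented launch(). A minimal sketch: withBrowser is a hypothetical helper name, and it takes the launch function as a parameter (in real code, puppeteer.launch.bind(puppeteer)) so the flow can be exercised without a browser. Closing in finally means an exception inside the callback doesn't leave a Chrome process running.

```javascript
// Hypothetical helper (not part of Puppeteer): launch a browser, hand it to
// `callback`, and always close it afterwards, even if the callback throws.
async function withBrowser(launch, callback, options) {
  const browser = await launch(options);
  try {
    // Await the callback so the browser stays open until its work is done
    return await callback(browser);
  } finally {
    await browser.close();
  }
}

// Usage with the real library (requires `npm install puppeteer`):
// const puppeteer = require('puppeteer');
// withBrowser(puppeteer.launch.bind(puppeteer), async (browser) => {
//   const page = await browser.newPage();
//   await page.goto('https://example.com');
//   return page.title();
// }).then(console.log);
```

Because the wrapper returns whatever the callback returns, results can flow back to the caller instead of being logged inside the callback.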

Related

playwright - get content from multiple pages in parallel

I am trying to get the page content from multiple URLs using playwright in a nodejs application. My code looks like this:
const getContent = async (url: string): Promise<string> => {
const browser = await firefox.launch({ headless: true });
const page = await browser.newPage();
try {
await page.goto(url, {
waitUntil: 'domcontentloaded',
});
return await page.content();
} finally {
await page.close();
await browser.close();
}
}
const items = [
{
urls: ["https://www.google.com", "https://www.example.com"]
// other props
},
{
urls: ["https://www.google.com", "https://www.example.com"]
// other props
},
// more items...
]
await Promise.all(
items.map(async (item) => {
const contents = [];
for (url in item.urls) {
contents.push(await getContent(url))
}
return contents;
})
);
I am getting errors like "error (Page.content): Target closed.", but I noticed that if I just run it without the loop:
const content = getContent('https://www.example.com');
It works.
It looks like the iterations of the loop share the same browser and/or page instance, so they close or navigate away from each other's pages.
To test it, I built a web API around the getContent function: when I send 2 requests at (almost) the same time, one of them fails, whereas if I send one request at a time it always works.
Is there a way to make playwright work in parallel?
I don't know if that solves it, but make sure both firefox.launch(...) and browser.newPage() are awaited, since both are asynchronous. Also note that for (url in item.urls) iterates over the array indices ("0", "1", ...), not the URLs; use for (const url of item.urls) instead.
Also, you don't need to launch a new browser so many times. Playwright has isolated browser contexts, which are much faster to create than a new browser. It's worth experimenting with launching the browser once, outside the getContent function, and using
const context = await browser.newContext();
const page = await context.newPage();
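A sketch of that idea, assuming one browser launched up front (for example with await firefox.launch({ headless: true })) and passed in: every URL gets its own context, so no two fetches share a page, and all URLs are fetched concurrently. The function names and the browser parameter are illustrative, not from the question; for a large number of URLs you may want to cap how many run at once.

```javascript
// Sketch: share ONE browser and give every URL its own isolated context.
// `browser` is assumed to be a Playwright Browser, e.g. from firefox.launch().
async function getContent(browser, url) {
  const context = await browser.newContext(); // isolated cookies/storage
  try {
    const page = await context.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.content();
  } finally {
    await context.close(); // also closes the pages that belong to it
  }
}

// Fetch all URLs of all items concurrently, iterating over the URL values
// (not over the indices, which is what `for...in` would give you).
async function getAllContents(browser, items) {
  return Promise.all(
    items.map((item) =>
      Promise.all(item.urls.map((url) => getContent(browser, url)))
    )
  );
}

// Usage:
// const { firefox } = require('playwright');
// const browser = await firefox.launch({ headless: true });
// try {
//   const contents = await getAllContents(browser, items);
// } finally {
//   await browser.close();
// }
```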

Simple web scraping with puppeteer / cheerio not working with params

I am trying to scrape https://www.premierleague.com/clubs/38/Wolverhampton-Wanderers/stats?se=274
The results being returned are for the page minus the ?se=274
This is applied by using the filter dropdown on the page and selecting 2019/20 season. I can navigate directly to the page and it works fine, but through code it does not work.
I have tried in cheerio and puppeteer. I was going to try nightmare too but this seems overkill I think. I am clearly not an expert! ;)
function getStats(callback){
var url = "https://www.premierleague.com/clubs/38/Wolverhampton-Wanderers/stats?se=274";
request(url, function (error, response, html) {
//console.log(html);
var $ = cheerio.load(html);
if(!error){
$('.allStatContainer.statontarget_scoring_att').filter(function(){
var data = $(this);
var vSOT = data.text();
//console.log(data);
console.log(vSOT);
});
}
});
callback;
}
This will return 564 instead of 2
It seems like you're calling callback before the request has returned. Move the callback call into the request's callback, where the task you need is completed (in your case, it looks like the filter block).
It also looks like you're missing the () on the callback call.
Also, a recommendation: pass the value you need to the callback as an argument instead of only logging it.
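The callback fix can be sketched like this. To keep it self-contained, fetchHtml(url, cb) is a hypothetical stand-in for request (simplified to an (error, html) callback) and parseStat(html) stands in for the cheerio lookup; the point is only where the callback is called and what it receives. Separately, even with the callback fixed, a plain HTTP request only sees the HTML the server sends. If the season filter is applied by JavaScript in the browser (the Puppeteer answer below, which waits before reading the value, suggests it is), the raw HTML may still contain the default numbers.

```javascript
// Sketch of the error-first callback pattern. `fetchHtml` and `parseStat`
// are hypothetical parameters standing in for `request` and cheerio.
function getStats(url, fetchHtml, parseStat, callback) {
  fetchHtml(url, (error, html) => {
    if (error) {
      callback(error); // report failures to the caller
      return;
    }
    // Only here is the HTML available, so this is where the callback belongs,
    // and the extracted value is handed to it.
    callback(null, parseStat(html));
  });
  // Calling callback() here instead would run before the request completes.
}
```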
So this code works....$10 from a rent-a-coder did the trick. Easy when you know how!
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('https://www.premierleague.com/clubs/4/Chelsea/stats?se=274')
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
await sleep(4000)
const element = await page.$(".allStatContainer.statontarget_scoring_att");
const text = await page.evaluate(element => element.textContent, element);
console.log("Shots on Target:"+text)
await browser.close()
})()
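A fixed 4 second sleep is either too long or, on a slow connection, too short. A sketch of the alternative is to wait for the element itself with page.waitForSelector() and read it with page.$eval(). This assumes the element only appears once the filtered numbers are in place; if the page first renders the default season and then updates it, you would need page.waitForFunction() on a condition that reflects the update instead. The helper name readStat is hypothetical.

```javascript
// Sketch: wait for the element instead of sleeping a fixed time.
// `page` is a Puppeteer Page; `readStat` is a hypothetical helper name.
async function readStat(page, selector, timeoutMs = 30000) {
  // Resolves once the element is in the DOM, rejects after timeoutMs
  await page.waitForSelector(selector, { timeout: timeoutMs });
  // $eval runs the callback in the browser with the matched element
  const text = await page.$eval(selector, (el) => el.textContent);
  return text.trim();
}

// Usage:
// const browser = await puppeteer.launch();
// try {
//   const page = await browser.newPage();
//   await page.goto('https://www.premierleague.com/clubs/4/Chelsea/stats?se=274');
//   const text = await readStat(page, '.allStatContainer.statontarget_scoring_att');
//   console.log('Shots on Target:', text);
// } finally {
//   await browser.close();
// }
```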

Running Puppeteer test multiple times with different scenarios Async/Await

Hi Guys, I need some help with a scenario.
I have an array filled with, say, 100 objects, each either Google or Yahoo.
If Google do X
Else Yahoo do Y
This is easy in, say, Java with Selenium: just loop through with an if statement and start/stop the browser. Given that Puppeteer runs async, how can I achieve this with JavaScript? I'm also using Jest.
I tried making the foreach loop async to be able to run await but the obvious issue is it launches all the browsers at once.
Would like to avoid .then promise chains for puppeteer.
describe('Sample Test', () => {
let browser
let page
beforeAll(async () => {
browser = await puppeteer.launch()
page = await browser.newPage()
})
afterAll(async () => {
await browser.close()
})
it('should search on google and navigate to domain', async () => {
jest.setTimeout(500000)
let numOfTotalVists = await helpers.getTotalVisits()
numOfTotalVists.forEach(element => {
if (element.includes('Google')) {
browser = puppeteer.launch()
page =browser.newPage()
browser.close()
}
console.log('no')
browser = puppeteer.launch()
page = browser.newPage()
browser.close()
})
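Figured it out, easy brain fart: I switched to a simple for loop, which (unlike forEach with an async callback) lets each await finish before the next iteration starts.
for (let i = 0; i < numOfTotalVists.length; i++) {
// await the launch / newPage / close for numOfTotalVists[i] here
}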
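Filled in, the loop could look like the sketch below: for...of with await runs one browser at a time, with the Google/Yahoo branch inside. runVisits, doGoogle and doYahoo are hypothetical names standing in for "do X" and "do Y", and launch is assumed to be puppeteer.launch.bind(puppeteer). Inside the Jest test you would then write await runVisits(puppeteer.launch.bind(puppeteer), numOfTotalVists).

```javascript
// Sketch: process the visits strictly one at a time. `for...of` + `await`
// pauses the loop until each browser is closed; `forEach` would not wait.
async function runVisits(launch, visits) {
  const results = [];
  for (const visit of visits) {
    const browser = await launch();
    try {
      const page = await browser.newPage();
      if (visit.includes('Google')) {
        results.push(await doGoogle(page)); // hypothetical "do X"
      } else {
        results.push(await doYahoo(page)); // hypothetical "do Y"
      }
    } finally {
      await browser.close();
    }
  }
  return results;
}

// Hypothetical placeholders for the two scenarios
async function doGoogle(page) {
  await page.goto('https://www.google.com');
  return 'google';
}

async function doYahoo(page) {
  await page.goto('https://www.yahoo.com');
  return 'yahoo';
}
```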

Why can't I access 'window' in an exposeFunction() function with Puppeteer?

I have a very simple Puppeteer script that uses exposeFunction() to run something inside headless Chrome.
(async function(){
var log = console.log.bind(console),
puppeteer = require('puppeteer');
const browser = await puppeteer.launch();
const page = await browser.newPage();
var functionToInject = function(){
return window.navigator.appName;
}
await page.exposeFunction('functionToInject', functionToInject);
var data = await page.evaluate(async function(){
console.log('woo I run inside a browser')
return await functionToInject();
});
console.log(data);
await browser.close();
})()
This fails with:
ReferenceError: window is not defined
Which refers to the injected function. How can I access window inside the headless Chrome?
I know I can do evaluate() instead, but this doesn't work with a function I pass dynamically:
(async function(){
var log = console.log.bind(console),
puppeteer = require('puppeteer');
const browser = await puppeteer.launch();
const page = await browser.newPage();
var data = await page.evaluate(async function(){
console.log('woo I run inside a browser')
return window.navigator.appName;
});
console.log(data);
await browser.close();
})()
evaluate the function
You can pass the dynamic script using evaluate.
(async function(){
var puppeteer = require('puppeteer');
const browser = await puppeteer.launch();
const page = await browser.newPage();
var functionToInject = function(){
return window.navigator.appName;
}
var data = await page.evaluate(functionToInject); // <-- Just pass the function
console.log(data); // outputs: Netscape
await browser.close();
})()
addScriptTag and readFileSync
You can save the function to a separate file and use the function using addScriptTag.
await page.addScriptTag({path: 'my-script.js'});
or evaluate with readFileSync.
await page.evaluate(fs.readFileSync(filePath, 'utf8'));
or, pass a parameterized function built from a string to page.evaluate.
await page.evaluate(new Function('foo', 'console.log(foo);'), {foo: 'bar'});
Make a new function dynamically
How about making it into a runnable function :D ?
function runnable(fn) {
return new Function("arguments", `return ${fn.toString()}(arguments)`);
}
The above will create a new function with provided arguments. We can pass any function we want.
Such as the following function with window, along with arguments,
function functionToInject() {
return window.location.href;
};
works flawlessly with promises too,
function functionToInject() {
return new Promise((resolve, reject) => {
setTimeout(() => {
resolve(window.location.href);
}, 5000);
});
}
and with arguments,
async function functionToInject(someargs) {
return someargs; // {bar: 'foo'}
};
Call the desired function with evaluate,
var data = await page.evaluate(runnable(functionToInject), {bar: "foo"});
console.log(data); // the page URL for the first two versions, {bar: 'foo'} for the last one
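Why only self-contained functions work with evaluate: Puppeteer sends the function's source text to the browser and rebuilds it there, so variables the function closed over in Node do not travel with it; pass them as extra evaluate() arguments instead. The sketch below imitates that round trip inside Node with new Function; it is an approximation for illustration, not Puppeteer's actual implementation, and rebuildFromSource and makeReader are hypothetical names.

```javascript
// Imitate what happens to a function passed to page.evaluate(): only its
// source text crosses over, and it is rebuilt in a fresh (global) scope.
function rebuildFromSource(fn) {
  return new Function(`return (${fn.toString()});`)();
}

function makeReader() {
  const prefix = 'location: '; // exists only in this Node closure
  return {
    usesClosure: () => prefix + 'x', // breaks once rebuilt elsewhere
    usesArgument: (p) => p + 'x', // works: the data arrives as an argument
  };
}
```

In Puppeteer terms, that means preferring page.evaluate((p) => p + window.location.href, prefix) over referring to prefix from the enclosing Node scope.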
exposeFunction() isn't the right tool for this job.
From the Puppeteer docs
page.exposeFunction(name, puppeteerFunction)
puppeteerFunction Callback function which will be called in Puppeteer's context.
'In puppeteer's context' is a little vague, but check out the docs for evaluate():
page.evaluate(pageFunction, ...args)
pageFunction Function to be evaluated in the page context
exposeFunction() doesn't expose a function to run inside the page; it exposes a function that runs in Node and can be called from the page.
You have to use evaluate() instead.
Your problem could be related to the fact that page.exposeFunction() makes your function return a Promise (requiring the use of async and await). This happens because your function does not run inside the browser but inside your Node.js application, and its arguments and results are sent back and forth between Node and the browser code. This is why your function passed to page.exposeFunction() now returns a promise instead of the actual result. It also explains why window is not defined: your function runs inside Node.js (not your browser), and Node.js has no window object.
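To make the direction concrete: the exposed function runs in Node (so it can use Node features but not window), while the code passed to page.evaluate() runs in the page (so it can read window) and calls the exposed function, getting a Promise back. A minimal sketch; the names nodeUpperCase and readAppNameViaNode are hypothetical.

```javascript
// Node side: exposed functions run in Node, so they can't touch `window`.
// Calls made from the page come back as Promises.
async function readAppNameViaNode(page) {
  await page.exposeFunction('nodeUpperCase', (text) => text.toUpperCase());

  // Browser side: evaluate() runs in the page, where `window` exists.
  // It reads the value there and awaits the round trip to Node.
  return page.evaluate(async () => {
    const name = window.navigator.appName;
    return window.nodeUpperCase(name);
  });
}

// Usage: console.log(await readAppNameViaNode(page)); // e.g. "NETSCAPE"
```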
Related questions:
exposeFunction() does not work after goto()
exposed function queryseldtcor not working in puppeteer
How to use evaluateOnNewDocument and exposeFunction?
exposeFunction remains in memory?
Puppeteer: pass variable in .evaluate()
Puppeteer evaluate function
allow to pass a parameterized funciton as a string to page.evaluate
Functions bound with page.exposeFunction() produce unhandled promise rejections
How to pass a function in Puppeteers .evaluate() method?
How can I dynamically inject functions to evaluate using Puppeteer?

How to pass dynamic page automation commands to puppeteer from external file?

I'm trying to pass dynamic page automation commands to puppeteer from an external file. I'm new to puppeteer and node so I apologize in advance.
// app.js
// ========
app.get('/test', (req, res) =>
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://testurl.com');
var events = require('./events.json');
for(var i=0;i<events.length;i++){
var tmp = events[i];
await page.evaluate((tmp) => { return Promise.resolve(tmp.event); }, tmp);
}
await browser.close();
})());
My events json file looks like:
// events.json
// ========
[
{
"event":"page.waitFor(4000)"
},
{
"event":"page.click('#aLogin')"
},
{
"event":"page.waitFor(1000)"
}
]
I've tried several variations of the above as well as importing a module that passes the page object to one of the module function, but nothing has worked. Can anyone tell me if this is possible and, if so, how to better achieve this?
The solution is actually very simple and straightforward. You just have to understand how this works.
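First of all, page.evaluate() runs inside the browser, where the Puppeteer page object does not exist, and your callback only returns the event string; it never executes it. Instead you can do the following.
In a separate file (events.js):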
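async function getCommands(page) {
// Each await finishes before the next command starts, so they run in order
await page.waitFor(4000);
await page.click("#aLogin");
await page.waitFor(1000);
}
module.exports = { getCommands }; // lets require('./events.js').getCommands(page) work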
Now on your main file,
await require('./events.js').getCommands(page);
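There, it's done! It'll execute all the commands for you one by one, just as you wanted. (Note that page.waitFor() is deprecated in newer Puppeteer versions; page.waitForSelector() or a setTimeout-based delay does the same job.)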
Here is a complete code with some adjustments,
const puppeteer = require("puppeteer");
async function getCommands(page) {
return Promise.all([
await page.title(),
await page.waitFor(1000)
]);
};
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://example.com");
let data = await getCommands(page);
console.log(data);
await page.close();
await browser.close();
})();
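If the commands really have to live in a JSON file, another option is to store them as data (an action name plus its parameters) and map each action to a page call yourself, instead of storing JavaScript source that has to be evaluated. The event format, the handlers table and runEvents below are hypothetical; extend them with whichever page methods you need.

```javascript
// Hypothetical events.json format:
// [{ "action": "wait", "ms": 4000 }, { "action": "click", "selector": "#aLogin" }]

// Whitelist of supported actions and how to perform them on a Puppeteer page
const handlers = {
  wait: (page, e) => new Promise((resolve) => setTimeout(resolve, e.ms)),
  click: (page, e) => page.click(e.selector),
  goto: (page, e) => page.goto(e.url),
};

// Run the events strictly in order; unknown actions fail loudly.
async function runEvents(page, events) {
  for (const event of events) {
    const handler = handlers[event.action];
    if (!handler) {
      throw new Error(`Unknown action: ${event.action}`);
    }
    await handler(page, event);
  }
}

// Usage: await runEvents(page, require('./events.json'));
```

Keeping the file as plain data means only the actions you listed can run, which is easier to validate than arbitrary code strings.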
