Hi guys, I need some help with a scenario.
I have an array filled with, say, 100 objects such as Google and Yahoo.
If Google do X
Else Yahoo do Y
This is easy in, say, Java with Selenium: just loop through with an if statement and start/stop the browser. Given that Puppeteer runs async, how can I achieve this in JavaScript? I'm also using Jest.
I tried making the forEach loop async so I could use await, but the obvious issue is that it launches all the browsers at once.
I'd like to avoid .then promise chains with Puppeteer.
describe('Sample Test', () => {
  let browser
  let page

  beforeAll(async () => {
    browser = await puppeteer.launch()
    page = await browser.newPage()
  })

  afterAll(async () => {
    await browser.close()
  })

  it('should search on google and navigate to domain', async () => {
    jest.setTimeout(500000)
    let numOfTotalVists = await helpers.getTotalVisits()
    numOfTotalVists.forEach(async element => {
      if (element.includes('Google')) {
        browser = await puppeteer.launch()
        page = await browser.newPage()
        await browser.close()
      } else {
        console.log('no')
        browser = await puppeteer.launch()
        page = await browser.newPage()
        await browser.close()
      }
    })
  })
})
Figured it out. Easy brain fart: I switched to a simple for loop, which waits for each await to finish before the next iteration starts.
for (let i = 0; i < numOfTotalVists.length; i++) {
}
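For anyone landing here later: the for loop works because a plain for (or for...of) loop pauses at each await, while forEach fires every callback immediately. A minimal runnable sketch of that pattern, with puppeteer.launch stubbed out so only the control flow is shown (visitAll and the stub are illustrative names, not from the original code):

```javascript
// Stub standing in for puppeteer.launch so the control flow runs on its own;
// in real code: const puppeteer = require('puppeteer'); and use puppeteer.launch().
const launch = async () => ({
  newPage: async () => ({}),
  close: async () => {},
});

async function visitAll(sites) {
  const actions = [];
  // for...of pauses at each await, so only one browser is open at a time.
  for (const site of sites) {
    const browser = await launch();
    const page = await browser.newPage();
    if (site.includes('Google')) {
      actions.push('X:' + site); // do X
    } else {
      actions.push('Y:' + site); // do Y
    }
    await browser.close();
  }
  return actions;
}
```

With forEach(async element => ...), every callback starts its launch() before any of them finishes, which is exactly the all-browsers-at-once behaviour described above.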
Related
I am trying to get the page content from multiple URLs using Playwright in a Node.js application. My code looks like this:
const getContent = async (url: string): Promise<string> => {
const browser = await firefox.launch({ headless: true });
const page = await browser.newPage();
try {
await page.goto(url, {
waitUntil: 'domcontentloaded',
});
return await page.content();
} finally {
await page.close();
await browser.close();
}
}
const items = [
{
urls: ["https://www.google.com", "https://www.example.com"]
// other props
},
{
urls: ["https://www.google.com", "https://www.example.com"]
// other props
},
// more items...
]
await Promise.all(
  items.map(async (item) => {
    const contents = [];
    for (const url of item.urls) {
      contents.push(await getContent(url));
    }
    return contents;
  })
);
I am getting errors like error (Page.content): Target closed. But I noticed that if I just run it without the loop:
const content = await getContent('https://www.example.com');
it works.
It looks like the iterations of the loops share the same instance of browser and/or page, so they are closing or navigating away from each other.
To test this I built a web API around the getContent function; when I send two requests at (almost) the same time one of them fails, whereas if I send one request at a time it always works.
Is there a way to make Playwright work in parallel?
I don't know if this solves it, but I noticed there are two missing awaits: both firefox.launch(...) and browser.newPage() are asynchronous and need an await in front.
Also, you don't need to launch a new browser so many times. Playwright has isolated browser contexts, which are created much faster than launching a browser. It's worth experimenting with launching the browser once, before any getContent calls, and using
const context = await browser.newContext();
const page = await context.newPage();
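Putting the two suggestions together, the shape could look like the sketch below: one browser for the batch, one context per URL. The Playwright object is stubbed here so the control flow runs standalone; in real code you would use const { firefox } = require('playwright'), and getContents is an illustrative name:

```javascript
// Stub with Playwright's shape (launch -> newContext -> newPage) so this runs
// standalone; in real code: const { firefox } = require('playwright');
const firefox = {
  launch: async () => ({
    newContext: async () => ({
      newPage: async () => ({
        goto: async () => {},
        content: async () => '<html></html>',
      }),
      close: async () => {},
    }),
    close: async () => {},
  }),
};

async function getContents(urls) {
  // One browser for the whole batch...
  const browser = await firefox.launch();
  try {
    // ...and one isolated context per URL, created in parallel.
    return await Promise.all(urls.map(async (url) => {
      const context = await browser.newContext();
      const page = await context.newPage();
      try {
        await page.goto(url, { waitUntil: 'domcontentloaded' });
        return await page.content();
      } finally {
        await context.close();
      }
    }));
  } finally {
    await browser.close();
  }
}
```

Because each URL gets its own context, the pages cannot close or navigate away from each other, which is the failure mode described in the question.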
Is there a way to use a callback for puppeteer.launch instead of wrapping it in an async IIFE?
(async () => {
const browser = await puppeteer.launch();
...
browser.close()
})();
could be simplified to
puppeteer.launch(function(browser) {
...
//browser automatically closes ideally
});
This has probably been answered somewhere, but I couldn't find anything with a quick Google search.
If you want it to work that way, you could monkey-patch Puppeteer (binding the original launch so it keeps its this, and awaiting the callback so the browser isn't closed mid-work):
const realLaunch = puppeteer.launch.bind(puppeteer);
puppeteer.launch = async (callback) => {
  const browser = await realLaunch();
  await callback(browser);
  await browser.close();
};
But that's going to be confusing to contributors to your project who are used to the Puppeteer API as published in its documentation, so personally I wouldn't...
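A less surprising alternative to patching is an ordinary helper that owns the launch/close lifecycle and hands the browser to a callback. A sketch, with puppeteer.launch stubbed so it runs standalone (withBrowser is an illustrative name, not a Puppeteer API):

```javascript
// Stub with Puppeteer's shape; in real code: const puppeteer = require('puppeteer');
const puppeteer = {
  launch: async () => ({
    newPage: async () => ({ title: async () => 'Example' }),
    close: async () => {},
  }),
};

// The callback gets a live browser; close is guaranteed even if the callback throws.
async function withBrowser(callback) {
  const browser = await puppeteer.launch();
  try {
    return await callback(browser);
  } finally {
    await browser.close();
  }
}
```

Call sites then read as const title = await withBrowser(async (browser) => { ... }), with no global patching and the automatic close the question asked for.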
I am trying to scrape https://www.premierleague.com/clubs/38/Wolverhampton-Wanderers/stats?se=274
The results being returned are for the page without the ?se=274.
That parameter is applied by using the filter dropdown on the page and selecting the 2019/20 season. I can navigate directly to the URL and it works fine, but through code it does not.
I have tried cheerio and puppeteer. I was going to try Nightmare too, but that seems like overkill. I am clearly not an expert! ;)
function getStats(callback) {
  var url = "https://www.premierleague.com/clubs/38/Wolverhampton-Wanderers/stats?se=274";
  request(url, function (error, response, html) {
    //console.log(html);
    var $ = cheerio.load(html);
    if (!error) {
      $('.allStatContainer.statontarget_scoring_att').filter(function () {
        var data = $(this);
        var vSOT = data.text();
        //console.log(data);
        console.log(vSOT);
      });
    }
  });
  callback;
}
This will return 564 instead of 2
It seems like you're calling callback before request returns. Move the callback call into the inner block, where the task you need is completed (in your case, that looks like the filter block).
It also looks like you're missing the () on the callback call.
Also, a recommendation: return the value you need through the callback.
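Applied to the question's code, that advice could look like the sketch below. The request and cheerio objects are stubbed here so the shape runs standalone; the point is only where the callback is invoked and what it carries:

```javascript
// Stubs standing in for the real request and cheerio modules, so this runs
// on its own; in real code: require('request') and require('cheerio').
const request = (url, cb) => cb(null, {}, '<span class="stat">2</span>');
const cheerio = {
  load: (html) => (sel) => ({ first: () => ({ text: () => '2' }) }),
};

function getStats(callback) {
  const url = 'https://www.premierleague.com/clubs/38/Wolverhampton-Wanderers/stats?se=274';
  request(url, function (error, response, html) {
    if (error) return callback(error);
    const $ = cheerio.load(html);
    // The callback is invoked here, with (), once the value exists,
    // and the value travels out through it.
    callback(null, $('.allStatContainer.statontarget_scoring_att').first().text());
  });
}
```

Note this only fixes the callback timing; it does not make the ?se=274 filter apply, since that page builds its stats with JavaScript that plain request/cheerio never runs.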
So this code works....$10 from a rent-a-coder did the trick. Easy when you know how!
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://www.premierleague.com/clubs/4/Chelsea/stats?se=274')
  const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
  await sleep(4000)
  const element = await page.$(".allStatContainer.statontarget_scoring_att");
  const text = await page.evaluate(element => element.textContent, element);
  console.log("Shots on Target: " + text)
  await browser.close()
})()
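A possible refinement: instead of a fixed four-second sleep, wait for the stat element itself to appear. The page object is stubbed below so the shape runs standalone, but waitForSelector and $eval are real Puppeteer methods with these names (getShotsOnTarget is an illustrative name):

```javascript
// Stub with Puppeteer's page shape so this runs standalone; the real page
// object from browser.newPage() has waitForSelector and $eval too.
const page = {
  waitForSelector: async (sel) => ({ sel }),
  $eval: async (sel, fn) => fn({ textContent: '2' }),
};

async function getShotsOnTarget() {
  // Resolves as soon as the element exists, rather than always waiting 4s.
  await page.waitForSelector('.allStatContainer.statontarget_scoring_att');
  return page.$eval(
    '.allStatContainer.statontarget_scoring_att',
    (el) => el.textContent.trim()
  );
}
```

This tends to be both faster and more reliable than a fixed sleep, since the stats are injected by the page's own JavaScript at an unpredictable moment.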
Background:
I am writing a Node.js script with Puppeteer to scrape data from a web page. I'm not familiar with Node.js, promises, or Puppeteer. I've tried many things and done research for a few days.
Application Flow:
With automation, go to a website
Scrape data from the page, push to an array
If there is a "next page" click the next page button
Scrape data from the page, push to same array
Repeat
Problem:
My problem is with #3: clicking the next page button through web automation.
All I want is to use Puppeteer's .click() method on the button's selector. However, .click() returns a Promise. Since it's a promise I need the await keyword, but you can't have await anywhere except inside an async function.
What Have I Tried:
I've tried creating another async function with statements like await page.click(); and calling it in the problem area. I've tried creating a regular function with page.click() and calling that in the problem area. Refactoring everything didn't work either. I'm not really understanding Promises and async/await even after reading about them for a few days.
What I Want Help With:
Help with invoking the .click() method inside the problem area or any help with selecting the 'Next Page' using web automation.
Pseudo Code:
let scrape = async () => {
await //do.some.automation;
const result = await page.evaluate(() => {
for (looping each page) {
if (there is a next page) {
for (loop through data) {
array.push(data);
//----PROBLEM----
//use automation to click the selector of the next page button
//--------------
}
}
}
return data;
});
//close browser
return result;
};
scrape().then((value) => {
//output data here;
});
All Code:
let scrape = async () => {
const browser = await puppeteer.launch({
headless: false
});
const page = await browser.newPage();
await page.goto("GO TO A WEBSITE");
await page.click("CLICK A BUTTON");
await page.waitFor(2000);
//Scraping
const result = await page.evaluate(() => {
let pages = document.getElementsByClassName("results-paging")[2];
let allPages = pages.getElementsByClassName("pagerLink");
let allJobs = [];
//Loop through each page
for (var j = 0; j < allPages.length; j++) {
let eachPage = pages.getElementsByClassName("pagerLink")[j].innerHTML;
if (eachPage) {
//Scrape jobs on single page
let listSection = document.getElementsByTagName("ul")[2];
let allList = listSection.getElementsByTagName("li");
for (var i = 0; i < allList.length; i++) {
let eachList = listSection.getElementsByTagName("li")[i].innerText;
allJobs.push(eachList);
//--------PROBLEM-------------
await page.click('#selector_of_next_page');
//----------------------------
}
}
else {
window.alert("Fail");
}
}
return allJobs;
});
browser.close();
return result;
};
scrape().then((value) => {
let data = value.join("\r\n");
console.log(data);
fs.writeFile("RESULTS.txt", data, function (err) {
console.log("SUCCESS MESSAGE");
});
});
Error Message:
SyntaxError: await is only valid in async function
You cannot use Puppeteer page methods inside the page.evaluate callback: that function is serialized and executed in the browser, where the page object does not exist.
Based on your example you should change
await page.click('#selector_of_next_page');
to native JS equivalent
document.getElementById('selector_of_next_page').click();
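More generally, the next-page click can live in Node, outside evaluate, with evaluate doing only the in-browser scraping. A sketch of that loop, with the page object stubbed so the control flow runs standalone (makeStubPage and scrapeAllPages are illustrative names; the real Puppeteer page has evaluate, $, and click with these signatures):

```javascript
// Stub page: each "page" of results is an array of job strings; click()
// advances to the next one. Stands in for the real Puppeteer page object.
function makeStubPage(pages) {
  let i = 0;
  return {
    evaluate: async (fn) => pages[i],                     // real code runs fn in the browser
    $: async (sel) => (i < pages.length - 1 ? {} : null), // next-page button, if any
    click: async (sel) => { i += 1; },
  };
}

// Scrape each page, keeping page.click outside evaluate where await is legal.
async function scrapeAllPages(page) {
  const allJobs = [];
  while (true) {
    const jobs = await page.evaluate(() =>
      [...document.querySelectorAll('ul li')].map((li) => li.innerText)
    );
    allJobs.push(...jobs);
    const next = await page.$('#selector_of_next_page');
    if (!next) break;                       // no next-page button: done
    await page.click('#selector_of_next_page');
  }
  return allJobs;
}
```

In real code you would likely also await a navigation or a selector between the click and the next evaluate, so the new page's content has loaded before scraping resumes.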
I'm trying to pass dynamic page automation commands to Puppeteer from an external file. I'm new to Puppeteer and Node, so I apologize in advance.
// app.js
// ========
app.get('/test', (req, res) =>
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://testurl.com');
var events = require('./events.json');
for(var i=0;i<events.length;i++){
var tmp = events[i];
await page.evaluate((tmp) => { return Promise.resolve(tmp.event); }, tmp);
}
await browser.close();
})());
My events json file looks like:
// events.json
// ========
[
{
"event":"page.waitFor(4000)"
},
{
"event":"page.click('#aLogin')"
},
{
"event":"page.waitFor(1000)"
}
]
I've tried several variations of the above, as well as importing a module that passes the page object to one of the module's functions, but nothing has worked. Can anyone tell me if this is possible and, if so, how to better achieve it?
The solution is actually very simple and straightforward; you just have to understand how this works.
First of all, you cannot pass page commands to evaluate as strings like that. Instead you can do the following.
On a seperate file,
module.exports = async function getCommands(page) {
  await page.waitFor(4000);
  await page.click("#aLogin");
  await page.waitFor(1000);
};
Now on your main file,
await require('./events.js').getCommands(page);
There, it's done! It'll execute all commands for you one by one just as you wanted.
Here is a complete code with some adjustments,
const puppeteer = require("puppeteer");
async function getCommands(page) {
  const title = await page.title();
  await page.waitFor(1000);
  return [title];
}
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://example.com");
let data = await getCommands(page);
console.log(data);
await page.close();
await browser.close();
})();
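If the commands really must stay in a data file, one option is to store them as {method, args} objects instead of raw code strings, and dispatch them through a whitelist. A sketch under that assumption (the {method, args} shape, the ALLOWED set, and runCommands are all hypothetical, not part of the original events.json):

```javascript
// Data-driven commands: method names plus arguments, no code strings.
const events = [
  { method: 'waitFor', args: [4000] },
  { method: 'click', args: ['#aLogin'] },
  { method: 'waitFor', args: [1000] },
];

// Only explicitly whitelisted page methods may be called from data.
const ALLOWED = new Set(['goto', 'click', 'waitFor']);

async function runCommands(page, commands) {
  const ran = [];
  for (const { method, args } of commands) {
    if (!ALLOWED.has(method)) throw new Error(`Command not allowed: ${method}`);
    await page[method](...args); // sequential: each command awaits the previous
    ran.push(method);
  }
  return ran;
}
```

This keeps the flexibility the question asked for while avoiding eval-style execution of strings from a JSON file.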