Puppeteer failing for more than 11 URLs - node.js

I would like to ask: what's the best way to capture screenshots of more than 20 different URLs?
I have tried the following code.
async function sCapture(url, site_name) {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 720 });
    await page.goto(url);
    await page.screenshot({
        path: `statusImage/${site_name}.jpg`
    });
    await browser.close();
}
I am getting the URLs from my DB like this.
db_connection.promise()
    .execute("SELECT * FROM `urls`")
    .then(([rows]) => {
        rows.forEach(user => {
            const url = user.link;
            const name = user.link_name;
            console.log(name);
            sCapture(url, name);
        });
        db_connection.end();
    }).catch(err => {
        console.log(err);
    });
My DB table contains more than 50 URLs.
Before, I was getting this error:
MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 exit listeners added. Use emitter.setMaxListeners() to increase limit
After I added the line below, the script just kills my server and I have to do a manual reboot for my site to work again.
require('events').EventEmitter.prototype._maxListeners = 100;
I will appreciate any help rendered.

I think your current code starts a new browser instance for each URL you want to fetch, and you don't need to do that: a separate page is enough. Also, you are currently making all those requests in parallel, which taxes your machine far more than doing them in sequence. Note that forEach does not wait for an async callback, so a for...of loop is needed to actually run the captures one after another (and to keep browser.close() from firing before they finish). Putting these changes together gives you something like this:
let browser;

async function sCapture(url, site_name) {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 720 });
    await page.goto(url);
    await page.screenshot({
        path: `statusImage/${site_name}.jpg`
    });
    await page.close();
}

const doit = async () => {
    try {
        const [rows] = await db_connection.promise()
            .execute("SELECT * FROM `urls`");
        for (const user of rows) {
            const url = user.link;
            const name = user.link_name;
            console.log(name);
            await sCapture(url, name);
        }
        db_connection.end();
    } catch (err) {
        console.log(err);
    }
};

(async () => {
    browser = await puppeteer.launch();
    await doit();
    await browser.close();
})();
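If a fully sequential run turns out to be too slow for 50+ URLs, a middle ground is to process them in small fixed-size batches. This is only a sketch, not part of the original answer; runInBatches and the batch size of 5 are my own names and choices:

```javascript
// Hypothetical helper: run an async task over items, at most `limit` at a time.
// Each batch runs in parallel; the batches themselves run in sequence.
async function runInBatches(items, limit, task) {
    const results = [];
    for (let i = 0; i < items.length; i += limit) {
        const batch = items.slice(i, i + limit);
        results.push(...await Promise.all(batch.map(task)));
    }
    return results;
}

// Usage with the capture function above might look like:
// await runInBatches(rows, 5, user => sCapture(user.link, user.link_name));
```

Tune the limit to what your server tolerates; a limit of 1 degenerates to the sequential loop above.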

Related

Can't click link using puppeteer - Thingiverse

I'm trying to automate away downloading multiple files on Thingiverse. I chose an object at random, but I'm having a hard time locating the link I need, clicking it, and then downloading. Has someone run into this before? Can I get some help?
I've tried several variations.
import puppeteer from 'puppeteer';

async function main() {
    const browser = await puppeteer.launch({
        headless: true,
    });
    const page = await browser.newPage();
    const response = await page.goto('https://www.thingiverse.com/thing:2033856/files');
    const buttons = await page.$x(`//a[contains(text(), 'Download')]`);
    if (buttons.length > 0) {
        console.log(buttons.length);
    } else {
        console.log('no buttons');
    }
    await wait(5000);
    await browser.close();
    return 'Finish';
}

async function wait(time: number) {
    return new Promise(function (resolve) {
        setTimeout(resolve, time);
    });
}

function start() {
    main()
        .then((test) => console.log('DONE'))
        .catch((reason) => console.log('Error: ', reason));
}

start();
I was able to get it to work.
The selector is: a[class^="ThingFile__download"]
Puppeteer is: const puppeteer = require('puppeteer-extra');
Before the await page.goto() I always recommend setting the viewport:
await page.setViewport({width: 1920, height: 720});
After that is set, change the await page.goto() to have a waitUntil option:
const response = await page.goto('https://www.thingiverse.com/thing:2033856/files', { waitUntil: 'networkidle0' }); // wait until page load
Next, this is a very important part: you have to wait for the element with waitForSelector() or waitForFunction().
I added both of these lines of code after the const response:
await page.waitForSelector('a[class^="ThingFile__download"]', {visible: true})
await page.waitForFunction("document.querySelector('a[class^=\"ThingFile__download\"]') && document.querySelector('a[class^=\"ThingFile__download\"]').clientHeight != 0");
Next, get the buttons. For my testing I just grabbed the button href.
const buttons = await page.$eval('a[class^="ThingFile__download"]', anchor => anchor.getAttribute('href'));
Lastly, do not check the .length of this variable: in this case $eval returns the href value, which is a string, so .length would be the string's length. If you fetch just the button instead, you get an ElementHandle (or null):
const button = await page.$('a[class^="ThingFile__download"]');
console.log(button)
if (button) { ... }
Now if you change that page.$ to page.$$, you get an Array<ElementHandle> and can use .length there:
const buttonsAll = await page.$$('a[class^="ThingFile__download"]');
console.log(buttonsAll)
if (buttonsAll.length > 0) { ... }
Hopefully this helps, and if you can't figure it out I can post my full source later if I have time to make it look better.

Why am I not able to navigate through iFrames using Apify/Puppeteer?

I'm trying to manipulate forms on sites with iframes in them using Puppeteer. I have tried different ways to reach a specific iframe, or even to count the iframes in a website, with no success.
Why isn't Puppeteer recognizing the iframes / child frames of the page I'm trying to navigate?
It's happening with other pages as well, such as https://www.veiculos.itau.com.br/simulacao
const Apify = require('apify');
const sleep = require('sleep-promise');

Apify.main(async () => {
    // Launch the web browser.
    const browser = await Apify.launchPuppeteer();
    // Create and navigate new page
    console.log('Open target page');
    const page = await browser.newPage();
    await page.goto('https://www.credlineitau.com.br/');
    await sleep(15 * 1000);
    for (const frame in page.mainFrame().childFrames()) {
        console.log('test');
    }
    await browser.close();
});
Perhaps you'll find some helpful inspiration below. Note that TIMEOUTS, SELECTORS, and waitForSelector here are helpers from the scraper this was taken from, not Puppeteer built-ins:
const waitForIframeContent = async (page, frameSelector, contentSelector) => {
    await page.waitForFunction((frameSelector, contentSelector) => {
        const frame = document.querySelector(frameSelector);
        const node = frame.contentDocument.querySelector(contentSelector);
        return node && node.innerText;
    }, {
        timeout: TIMEOUTS.ten,
    }, frameSelector, contentSelector);
};

const $frame = await waitForSelector(page, SELECTORS.frame.iframeNode).catch(() => null);
if ($frame) {
    const frame = page.frames().find(frame => frame.name() === 'content-iframe');
    const $cancelStatus = await waitForSelector(frame, SELECTORS.frame.membership.cancelStatus).catch(() => null);
    await waitForIframeContent(page, SELECTORS.frame.iframeNode, SELECTORS.frame.membership.cancelStatus);
}
Give it a shot.
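One thing worth flagging in the question's own loop: for...in iterates array indices, not the frames themselves, so for...of over page.frames() or childFrames() is what you want. A minimal sketch; findFrameByUrl is a hypothetical helper name, and note that for cross-origin iframes it is often suggested to launch Chromium with --disable-features=site-per-process before child frames show up at all:

```javascript
// Hypothetical helper: find a frame whose URL contains a substring.
// `frames` is the array returned by page.frames(); each frame has a url() method.
function findFrameByUrl(frames, substring) {
    // Use for...of (values), not for...in (which iterates array indices).
    for (const frame of frames) {
        if (frame.url().includes(substring)) return frame;
    }
    return null;
}

// Usage against a live page might look like:
// const frame = findFrameByUrl(page.frames(), 'credline');
// if (frame) await frame.waitForSelector('#someField');
```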

Puppeteer times out when headless is true on waitForNavigation and waitForSelector

I was able to successfully navigate through a website, log in, and get the information I need. However, when I switch to running the same code with headless set to true, it times out on every line where I am waiting.
const self = {
    browser: null,
    page: null,
    initialize: async () => {
        self.browser = await puppeteer.launch({
            headless: true,
        });
        self.page = await self.browser.newPage();
        /* Go to homepage */
        await self.page.goto(SHIPT_URL, { waitUntil: "networkidle0" });
    },
    getResults: async () => {
        await self.page.type("#email", email, { delay: 100 });
        await self.page.type("#password", password, { delay: 100 });
        let loginButton = await self.page.$("button.sc-fzplWN");
        if (loginButton) {
            await loginButton.click();
            await self.page.waitForNavigation({ waitUntil: "networkidle0" }); // Hangs here
            await self.page.waitForSelector("a.pointer.link.gray900"); // Hangs here if I remove above waitForNavigation
I've researched this and tried a few different approaches, like setting my user agent, putting the .click() and waitForNavigation in a Promise.all, removing the waitUntil option from waitForNavigation, and removing await on .click(), but nothing seems to work.
Edit: It looks like the login isn't working; it says wrong username or email, which makes no sense since it works when I am not headless. Going to need to look into this more.
Try taking a screenshot before running your commands. It could be that the default page size is different in headless mode and the selectors don't exist.
Puppeteer sets an initial page size of 800×600px, which also defines the screenshot size.
getResults: async () => {
    await self.page.screenshot({ path: 'example.png' });
    await self.page.type("#email", email, { delay: 100 });
    await self.page.type("#password", password, { delay: 100 });
    let loginButton = await self.page.$("button.sc-fzplWN");
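Since the edit says the login itself only fails in headless mode, another thing worth ruling out is the user agent: headless Chrome reports itself as HeadlessChrome, which some sites treat differently. A small sketch of a possible workaround (normalizeUserAgent is a hypothetical name; the fix is a plain string substitution):

```javascript
// Hypothetical workaround: strip the "Headless" marker from the default UA
// so the site sees an ordinary Chrome user agent string.
function normalizeUserAgent(ua) {
    return ua.replace('HeadlessChrome', 'Chrome');
}

// In initialize(), before self.page.goto():
// const ua = await self.browser.userAgent();
// await self.page.setUserAgent(normalizeUserAgent(ua));
```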

Puppeteer: How to get DOM after switching chromium Tabs?

Right now I am only getting the DOM from the default tab, but I want to capture the DOM from each tab after switching.
Here is my sample code.
async function RedirectToLogin(page) {
    console.log(page.content());
}

async function main() {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.setViewport({
        width: 2000,
        height: 1000,
    });
    await page.goto('https://www.google.com/');
    console.log('clicked');
    browser.on('targetchanged', () => RedirectToLogin(page));
    count++;
}

main();
main();
You can get the URL for the currently active page by doing the following (page.url() is synchronous, so no await is needed):
const url = page.url();
console.log(url);
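That only covers the URL, though. Two things trip up the question's code: targetchanged always hands the handler the same original page object, and page.content() returns a Promise, so console.log(page.content()) prints a pending Promise rather than the DOM. One way to capture the DOM of each new tab is to resolve the page behind each created target. A sketch; dumpTargetContent is a hypothetical name:

```javascript
// Sketch: log the DOM of every page-type target as it is created.
async function dumpTargetContent(target) {
    if (target.type() !== 'page') return null;
    const targetPage = await target.page(); // may be null for non-page targets
    if (!targetPage) return null;
    const html = await targetPage.content(); // content() is async, so await it
    console.log(html);
    return html;
}

// Wiring it up:
// browser.on('targetcreated', target => dumpTargetContent(target));
```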

How to handle more than one window in puppeteer?

I am using Puppeteer to run browser tests. What I've managed to do is access a page and click a DOM element; after the click, the browser shows another view, where I click a button that opens a pop-up to log in with Facebook.
My question is:
How can I handle the other window to do the login with Facebook? This is my code.
Example code:
import * as puppeteer from 'puppeteer';

const f = async () => {
    console.log('..');
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    page.setViewport({ width: 1200, height: 800 });
    await page.goto('https://goalgoodies.herokuapp.com').catch(err => { console.log('error ', err); });
    await page.screenshot({ path: 'screenshot.png' });
    const resp = await page.click('a').catch(err => { console.log('error click', err); });
    const inputElement = await page.$('.signin a').catch(err => { console.log('error selector', err); });
    await inputElement.click().catch(err => { console.log('error click', err); });
    await page.screenshot({ path: 'screenshot2.png' });
    const fbBtn = await page.$('button[name=facebook]');
    await fbBtn.click();
    // here it opens the pop-up to log in with Facebook
    await page.screenshot({ path: 'clickpopup.png' });
};
f();
Apparently there is no way with Puppeteer to interact with other windows.
Here is another related question.
In this forum post the user aslushnikov mentions something related to the Target domain, but I cannot understand what he means or how to use it.
Any help would be appreciated.
Thank you
I think you are looking for Browser Contexts.
https://chromedevtools.github.io/devtools-protocol/tot/Target/#method-createBrowserContext
A sample implementation is discussed in details here,
https://github.com/cyrus-and/chrome-remote-interface/issues/118
Hope it helps.
To allow changing between open pages I created a simple utility method (it assumes browser is in scope):
async function changePage(url) {
    const pages = await browser.pages();
    for (const p of pages) {
        if (p.url() === url) {
            return p; // return the new working page
        }
    }
    return undefined;
}
This assumes your timing is correct for any newly opened windows, but that would be a different topic.
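As an alternative to polling browser.pages(), Puppeteer (v1.12+) emits a 'popup' event on Page when a click opens a new window, which sidesteps the timing problem for the Facebook pop-up. A sketch; waitForPopup is my own name and the '#email' selector is an assumption about Facebook's login form:

```javascript
// Sketch: resolve with the new Page as soon as the current page opens a pop-up.
function waitForPopup(page) {
    return new Promise(resolve => page.once('popup', resolve));
}

// Usage around the Facebook button click might look like:
// const [popup] = await Promise.all([waitForPopup(page), fbBtn.click()]);
// await popup.waitForSelector('#email'); // assumed selector: then type credentials
```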
