I'm currently using Puppeteer to fetch cookies and headers from a page. The site uses a bot-prevention system that is only bypassed after a click on the page, and I don't want the click to always hit the same element, because a fixed sequence is detectable.
How can I make Puppeteer click somewhere random on the page, regardless of whether it hits a link, a button, or something else?
This is my current code:
const getCookies = async (state) => {
try {
state.browser = await launch_browser(state);
state.context = await state.browser.createIncognitoBrowserContext();
state.page = await state.context.newPage();
await state.page.authenticate({
username: proxies.username(),
password: proxies.password(),
});
await state.page.setViewport(functions.get_viewport());
state.page.on('response', response => handle_response(response, state));
await state.page.goto('https://www.website.com', {
waitUntil: 'networkidle0',
});
await state.page.waitFor('.unlockLink a', {
timeout: 5000
});
await state.page.click('.unlockLink a');
await state.page.waitFor('input[id="nondevice"]', {
timeout: 5000
});
state.publicIpv4Address = await state.page.evaluate(() => {
return sessionStorage.getItem("publicIpv4Address");
});
state.csrfToken = await state.page.evaluate(() => {
return sessionStorage.getItem("csrf-token");
});
//I NEED TO CLICK HERE! CAN BE WHITESPACE, LINK, IMAGE
state.browser_cookies = await state.page.cookies();
await state.browser.close();
for (const cookie of state.browser_cookies) {
if(cookie.name === "dtPC") {
state.dtpc = cookie.value;
}
await state.jar.setCookie(
`${cookie.name}=${cookie.value}`,
'https://www.website.com'
)
}
return state;
} catch(error) {
if(state.browser) {
await state.browser.close();
}
throw new Error(error);
}
};
The simplest way I can think of off the top of my head to choose a random element from the DOM is querySelectorAll(), which returns a NodeList of every element matching a selector (all <div>s, all <p>s, or anything else). You can then call click() on a random entry from the result, for example:
await page.evaluate(() => {
// Any selector works here; '.left-sidebar-toggle' is just an example.
const candidates = document.querySelectorAll('.left-sidebar-toggle');
if (candidates.length === 0) return; // nothing matched, so nothing to click
const randomElement = candidates[Math.floor(Math.random() * candidates.length)];
randomElement.click();
});
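Since the question also asks about clicking whitespace, here is a minimal sketch of a different technique: picking random viewport coordinates and clicking them with page.mouse.click(), which sends mouse events through the browser instead of calling the DOM click() method. The helper names randomPoint and clickRandomPoint are hypothetical, and the sketch assumes page.setViewport() has already been called, as it is in the question's code.

```javascript
// Pick a random integer point inside a viewport of the given size.
// `rng` defaults to Math.random and can be replaced for testing.
function randomPoint(viewport, rng = Math.random) {
  return {
    x: Math.floor(rng() * viewport.width),
    y: Math.floor(rng() * viewport.height),
  };
}

// Click a random point in the visible area of the page.
// Assumption: `page` is a Puppeteer Page whose viewport has been set,
// so page.viewport() does not return null.
async function clickRandomPoint(page, rng = Math.random) {
  const viewport = page.viewport();
  if (!viewport) {
    throw new Error("No viewport set; call page.setViewport() first");
  }
  const { x, y } = randomPoint(viewport, rng);
  await page.mouse.click(x, y);
  return { x, y };
}
```

In the question's code this could be called as await clickRandomPoint(state.page) at the marked spot. Note that a random point may land on a link and trigger a navigation, so the cookies should be read only after the page has settled.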
I'm trying to automate downloading multiple files from Thingiverse. I pick an object at random, but I'm having a hard time locating the link I need, clicking it, and then downloading the file. Has anyone run into this before? Any help would be appreciated.
I've tried several variations; this is the current one:
import puppeteer from 'puppeteer';
async function main() {
const browser = await puppeteer.launch({
headless: true,
});
const page = await browser.newPage();
const response = await page.goto('https://www.thingiverse.com/thing:2033856/files');
const buttons = await page.$x(`//a[contains(text(), 'Download')]`);
if(buttons.length > 0){
console.log(buttons.length);
} else {
console.log('no buttons');
}
await wait(5000);
await browser.close();
return 'Finish';
}
async function wait(time: number) {
return new Promise(function (resolve) {
setTimeout(resolve, time);
});
}
function start() {
main()
.then((test) => console.log('DONE'))
.catch((reason) => console.log('Error: ', reason));
}
start();
I was able to get it to work.
The selector is: a[class^="ThingFile__download"]
I load Puppeteer through puppeteer-extra: const puppeteer = require('puppeteer-extra');
Before the await page.goto() I always recommend setting the viewport:
await page.setViewport({width: 1920, height: 720});
After that is set, change the await page.goto() to have a waitUntil option:
const response = await page.goto('https://www.thingiverse.com/thing:2033856/files', { waitUntil: 'networkidle0' }); // wait until page load
Next comes the important part: wait for the element with waitForSelector() or waitForFunction() before querying it.
I added both of these lines right after the const response line:
await page.waitForSelector('a[class^="ThingFile__download"]', {visible: true})
await page.waitForFunction("document.querySelector('a[class^=\"ThingFile__download\"]') && document.querySelector('a[class^=\"ThingFile__download\"]').clientHeight != 0");
Next, get the buttons. For my testing I just grabbed the button href.
const buttons = await page.$eval('a[class^="ThingFile__download"]', anchor => anchor.getAttribute('href'));
Lastly, do not check the .length of this variable. In this case $eval returns the href value, which is a string. If you fetch just the button with page.$, you get a Promise of an ElementHandle (or null when nothing matches):
const button = await page.$('a[class^="ThingFile__download"]');
console.log(button)
if (button) { ... }
Now if you change that page.$ to be page.$$, you will be getting a Promise of an Array<ElementHandle>, and will be able to use .length there.
const buttonsAll = await page.$$('a[class^="ThingFile__download"]');
console.log(buttonsAll)
if (buttonsAll.length > 0) { ... }
Hopefully this helps, and if you can't figure it out I can post my full source later if I have time to make it look better.
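As a rough illustration of how the steps above fit together, the following sketch waits for the download links and returns all of their href values in one call. The function name getDownloadLinks is hypothetical, `page` is assumed to be a Puppeteer Page that has already navigated to the files tab, and the selector is the one given in this answer, which may stop matching if the site changes its class names.

```javascript
// Selector from the answer above; the class prefix is site-specific.
const DOWNLOAD_SELECTOR = 'a[class^="ThingFile__download"]';

// Wait until at least one download link is visible, then return the
// href attribute of every link that matches the selector.
async function getDownloadLinks(page, selector = DOWNLOAD_SELECTOR) {
  await page.waitForSelector(selector, { visible: true });
  return page.$$eval(selector, (anchors) =>
    anchors.map((anchor) => anchor.getAttribute("href"))
  );
}
```

Because $$eval runs the callback inside the page and returns plain strings, the result is a normal array, so checking links.length is safe here.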
I am trying to click the "Create File" button on Facebook's "Download Your Information" page. I am currently able to go to the page and wait for the login process to finish. However, when I try to detect the button using
page.$x("//div[contains(text(),'Create File')]")
nothing is found. The same thing occurs when I try to find it in the chrome dev tools console, both in a puppeteer window and in a regular window outside of the instance of chrome puppeteer is controlling:
This is the html info for the element:
I am able to find the element however after I have clicked on it using the chrome dev tools inspector tool:
(the second print statement is from after I have clicked on it with the element inspector tool)
How should I select this element? I am new to puppeteer and to xpath so I apologize if I just missed something obvious.
A few related questions I have already looked at:
Puppeteer can't find selector
puppeteer cannot find element
puppeteer: how to wait until an element is visible?
My Code:
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
(async () => {
let browser;
try {
puppeteer.use(StealthPlugin());
browser = await puppeteer.launch({
headless: false,
// path: "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
args: ["--disable-notifications"],
});
const pages = await browser.pages();
const page = pages[0];
const url = "https://www.facebook.com/dyi?referrer=yfi_settings";
await page.goto(url);
//Handle the login process. Since the login page is different from the url we want, I am going to assume the user
//has logged in if they return to the desired page.
//Wait for the login page to process
await page.waitForFunction(
(args) => {
return window.location.href !== args[0];
},
{ polling: "mutation", timeout: 0 },
[url]
);
//Since multifactor auth can temporarily send the user back to the desired url, use a little debouncing to make sure the user is completely done signing in
// make sure there is no redirect for mfa
await page.waitForFunction(
async (args) => {
// function to make sure there is a debouncing delay between checking the url
// Taken from: https://stackoverflow.com/a/49813472/11072972
function delay(delayInms) {
return new Promise((resolve) => {
setTimeout(() => {
resolve(2);
}, delayInms);
});
}
if (window.location.href === args[0]) {
await delay(2000);
return window.location.href === args[0];
}
return false;
},
{ polling: "mutation", timeout: 0 },
[url]
);
// await page.waitForRequest(url, { timeout: 100000 });
const requestArchiveXpath = "//div[contains(text(),'Create File')]";
await page.waitForXPath(requestArchiveXpath);
const [requestArchiveSelector] = await page.$x(requestArchiveXpath);
await page.click(requestArchiveSelector);
page.waitForTimeout(3000);
} catch (e) {
console.log("End Error: ", e);
} finally {
if (browser) {
await browser.close();
}
}
})();
Resolved using the comment above by @vsemozhebuty and source. Only the last few lines inside the try must change:
const iframeXpath = "//iframe[not(@hidden)]";
const requestArchiveXpath = "//div[contains(text(),'Create File')]";
//Wait for and get iframe
await page.waitForXPath(iframeXpath);
const [iframeHandle] = await page.$x(iframeXpath);
//content frame for iframe => https://devdocs.io/puppeteer/index#elementhandlecontentframe
const frame = await iframeHandle.contentFrame();
//Wait for and get button
await frame.waitForXPath(requestArchiveXpath);
const [requestArchiveSelector] = await frame.$x(requestArchiveXpath);
//click button
await requestArchiveSelector.click();
await page.waitForTimeout(3000);
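When it is not known in advance which iframe holds the element, one option is to try the same XPath in every frame. The sketch below is a hypothetical helper (findInAnyFrame is not part of Puppeteer) and assumes a Puppeteer version in which Frame.$x() is available, as in the code above.

```javascript
// Search every frame of the page (main frame included) for `xpath`.
// Returns { frame, handle } for the first match, or null if no frame matches.
// Assumption: each frame exposes $x(), as in the Puppeteer version used above.
async function findInAnyFrame(page, xpath) {
  for (const frame of page.frames()) {
    const [handle] = await frame.$x(xpath);
    if (handle) {
      return { frame, handle };
    }
  }
  return null;
}
```

A call such as findInAnyFrame(page, "//div[contains(text(),'Create File')]") only inspects frames that exist at that moment, so it still needs to run after the iframe has loaded.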
I am trying to scrape a webpage with Puppeteer: log in, navigate through some pages, and on the paginated data pages send POST data (emulating the form submission).
The request-interception handler can only be registered once, so every request would be affected by the POST data. (Node Puppeteer, page.on( "request" ) throw a "Request is already handled!")
I didn't find much information on this (how do POST request in puppeteer?), and finally did the following:
Create a function that is called on every request.
Check whether a property on the function holds an object.
If it does, send its contents as POST data and delete the property.
If the property does not exist, continue the request unchanged.
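// Assumption: plain puppeteer; the original snippet did not show its imports.
const puppeteer = require("puppeteer");
const openConnection = async () => {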
const browser = await puppeteer.launch({
headless: true,
args: ["--no-sandbox"],
});
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on("request", requestPost);
return { browser, page };
};
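const requestPost = async (req) => {
// Take the pending overrides and clear them before awaiting,
// so that only one request is turned into a POST.
const overrides = requestPost.data;
delete requestPost.data;
if (overrides && typeof overrides === "object") {
await req.continue({
...overrides,
headers: { ...req.headers(), ...overrides.headers },
});
} else {
await req.continue();
}
};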
const getData = async (m, y, p, l) => {
const { browser, page } = await openConnection();
let data = [];
let pagina = p;
do {
/* JUST because this attribute is being created, the next request that is created in the page.goto() that follows, will be altered with these attributes */
requestPost.data = {
method: "POST",
postData: `&pagina=${pagina}&mes=${m}&year=${y}`,
headers: { "Content-Type": "application/x-www-form-urlencoded" },
};
await page.goto("https://url.com/info.cgi", { waitUntil: "networkidle2" });
// Now I work the data and add it to the end
// data = data.push();
pagina++;
} while (pagina < p + l);
await closeConnection(page, browser);
return data;
};
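A possible variation on the function-property approach above keeps the pending POST data in a closure and applies it only to the next navigation request, using isNavigationRequest() so that images, scripts, and other sub-resources are never altered. This is a sketch under assumptions, not the original author's code: createPostInjector, setNext, and handler are hypothetical names.

```javascript
// Create a one-shot POST injector for Puppeteer request interception.
// setNext(overrides) stores method/postData/headers for the next navigation
// request; handler(req) is meant to be registered with page.on("request", ...).
function createPostInjector() {
  let pending = null;
  return {
    setNext(overrides) {
      pending = overrides;
    },
    async handler(req) {
      if (pending && req.isNavigationRequest()) {
        // Clear the stored data first so later requests continue untouched.
        const overrides = pending;
        pending = null;
        await req.continue({
          ...overrides,
          headers: { ...req.headers(), ...overrides.headers },
        });
      } else {
        await req.continue();
      }
    },
  };
}

// Usage sketch (assumes request interception is already enabled on `page`):
//   const injector = createPostInjector();
//   page.on("request", injector.handler);
//   injector.setNext({ method: "POST", postData: "pagina=1", headers: {...} });
//   await page.goto(url);
```

Keeping the state in a closure rather than on the handler function means several pages can each have their own injector without sharing pending data.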
I'm trying to fill in forms on sites that contain iframes using Puppeteer. I tried different ways to reach a specific iframe, or even just to count the iframes on a page, with no success.
Why doesn't Puppeteer's page object recognize the iframes / child frames of the page I'm trying to navigate through?
It's happening with other pages as well, such as https://www.veiculos.itau.com.br/simulacao
const Apify = require('apify');
const sleep = require('sleep-promise');
Apify.main(async () => {
// Launch the web browser.
const browser = await Apify.launchPuppeteer();
// Create and navigate new page
console.log('Open target page');
const page = await browser.newPage();
await page.goto('https://www.credlineitau.com.br/');
await sleep(15 * 1000);
for (const frame of page.mainFrame().childFrames()) {
console.log('test');
}
await browser.close();
});
Perhaps you'll find some helpful inspiration below. (TIMEOUTS, SELECTORS, and waitForSelector are defined elsewhere and are not shown here.)
const waitForIframeContent = async (page, frameSelector, contentSelector) => {
await page.waitForFunction((frameSelector, contentSelector) => {
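const frame = document.querySelector(frameSelector);
// contentDocument is null for cross-origin iframes, so guard before querying
const doc = frame && frame.contentDocument;
const node = doc && doc.querySelector(contentSelector);
return node && node.innerText;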
}, {
timeout: TIMEOUTS.ten,
}, frameSelector, contentSelector);
};
const $frame = await waitForSelector(page, SELECTORS.frame.iframeNode).catch(() => null);
if ($frame) {
const frame = page.frames().find(frame => frame.name() === 'content-iframe');
const $cancelStatus = await waitForSelector(frame, SELECTORS.frame.membership.cancelStatus).catch(() => null);
await waitForIframeContent(page, SELECTORS.frame.iframeNode, SELECTORS.frame.membership.cancelStatus);
}
Give it a shot.
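Because the snippet above depends on helpers that are not shown, here is a self-contained sketch of one way to wait for a frame: polling page.frames() until a frame matches a predicate. The names waitForFrame and delay are hypothetical, and the default timeout and interval values are arbitrary assumptions.

```javascript
// Resolve after `ms` milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll page.frames() until some frame satisfies `predicate`.
// Resolves with that frame, or rejects once `timeout` ms have passed.
async function waitForFrame(page, predicate, { timeout = 10000, interval = 250 } = {}) {
  const deadline = Date.now() + timeout;
  for (;;) {
    const frame = page.frames().find(predicate);
    if (frame) {
      return frame;
    }
    if (Date.now() >= deadline) {
      throw new Error(`No matching frame found within ${timeout} ms`);
    }
    await delay(interval);
  }
}
```

For example, waitForFrame(page, (f) => f.name() === 'content-iframe') would return the frame used in the answer once it has been attached, after which selectors can be queried on the returned frame rather than on the page.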
I am using Puppeteer to run browser tests. So far I can open a page and click a DOM element, which takes the browser to another view. In that view I click a button that opens a popup for logging in with Facebook.
My question is:
How can I handle the other window in order to log in with Facebook?
Example code:
import * as puppeteer from 'puppeteer';
const f = async () => {
console.log('..');
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.setViewport({ width: 1200, height: 800 });
await page.goto('https://goalgoodies.herokuapp.com').catch(err => { console.log('error ', err); });
await page.screenshot({ path: 'screenshot.png' });
const resp = await page.click('a').catch(err => { console.log('error click', err); });
const inputElement = await page.$('.signin a').catch(err => { console.log('error selector', err); });
await inputElement.click().catch(err => { console.log('error click', err); });
await page.screenshot({ path: 'screenshot2.png' });
const fbBtn = await page.$('button[name=facebook]');
await fbBtn.click();
// here it's open pop up for do login with facebook
await page.screenshot({ path: 'clickpopup.png' });
};
f();
Apparently there is no way in Puppeteer to interact with other windows.
Here is another related question.
In this forum post, the user aslushnikov mentions something related to the Target domain, but I cannot understand what he means or how to use it.
Any help would be appreciated.
Thank you
I think you are looking for Browser Contexts.
https://chromedevtools.github.io/devtools-protocol/tot/Target/#method-createBrowserContext
A sample implementation is discussed in detail here:
https://github.com/cyrus-and/chrome-remote-interface/issues/118
Hope it helps.
To allow changing between open pages I created a simple utility method:
// Returns the open page whose URL matches `url`, or undefined if none does.
// Assumes `browser` is the Browser instance from puppeteer.launch(), in scope here.
async function changePage(url) {
const pages = await browser.pages();
return pages.find((page) => page.url() === url);
}
This assumes your timing is correct for any newly opened windows, but that would be a different topic.
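On the timing point, one way to avoid guessing when the new window is ready is to listen for the 'popup' event that Puppeteer emits on the opener page, and start listening before triggering the click. The sketch below is an assumption-laden illustration: openPopup is a hypothetical helper, and the 10 second default timeout is arbitrary.

```javascript
// Run `trigger` (for example, a click that opens a login window) and resolve
// with the popup Page reported by the opener's 'popup' event.
// Rejects if no popup appears within `timeout` milliseconds.
async function openPopup(page, trigger, timeout = 10000) {
  let timer;
  const popupPromise = new Promise((resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error("Popup did not open in time")),
      timeout
    );
    // Register the listener before triggering, so the event is not missed.
    page.once("popup", resolve);
  });
  try {
    await trigger();
    return await popupPromise;
  } finally {
    clearTimeout(timer);
  }
}
```

With the question's code this could look like const popup = await openPopup(page, () => fbBtn.click()); after which popup can be used like any other page for typing and clicking.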