I am trying to fetch data and trigger an automated buying process on the following website: https://www.klwines.com/
I am using Puppeteer with Node.js to run the script. As the screenshot shows, I am stuck because I cannot select a single radio button from the list: all the radio buttons have the same id. I just want to select the last radio button in the list and then click the button shown in the image. This is the Puppeteer code I am using:
await page.waitForNavigation();
await page.waitForSelector('[name="continue"]');
const radio = await page.evaluate("table tr:nth-child(4) > td > input[type=radio]")
radio.click()
Please note that the page variable is defined as follows:
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
If someone can help me find a way to do this, I would be really grateful.
You can try it this way:
const puppeteer = require('puppeteer');
exports.yourStatus = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.klwines.com/');
  const data = await page.evaluate(() => {
    // getElementById returns at most one element, so use an attribute
    // selector to collect every element that shares the id.
    const items = document.querySelectorAll('[id="Shepmente_0__shepmentewayCode"]');
    // Return plain values: DOM nodes cannot be passed back out of evaluate().
    return [...items].map(item => item.value);
  });
  await browser.close();
  return data;
};
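Since the question asks for the last radio button specifically, one option that does not depend on the duplicated id is to collect every matching handle with page.$$ and click the final one. This is only a minimal sketch: the helper name clickLastMatching is hypothetical, and the selector in the usage note is assumed from the question's own code rather than checked against the site.

```javascript
// Hypothetical helper: clicks the last element matching `selector`.
// `page` is expected to be a Puppeteer Page (or Frame); page.$$ resolves
// to an array of ElementHandles, possibly empty.
async function clickLastMatching(page, selector) {
  const handles = await page.$$(selector);
  if (handles.length === 0) {
    throw new Error(`No element matches ${selector}`);
  }
  const last = handles[handles.length - 1];
  await last.click();
  return handles.length; // number of candidates found, handy for logging
}
```

It could be used as await clickLastMatching(page, 'input[type=radio]') followed by await page.click('[name="continue"]'), assuming those selectors match the page.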
Related
What I'm trying to accomplish is to open https://www.discoverpermaculture.com/permaculture-masterclass-video-1, wait until it loads, load all the Disqus comments (click the "Load more comments" button until it is no longer present) and save the page as MHTML for offline use.
I found a similar question, "Puppeteer / Node.js to click a button as long as it exists -- and when it no longer exists, commence action", but detecting the "Load more comments" button doesn't work for some reason.
It seems that waitForSelector('a.load-more__button') is not working, because all the script prints is "not visible".
Here's my code:
const puppeteer = require('puppeteer');
const url = "https://www.discoverpermaculture.com/permaculture-masterclass-video-1";
const isElementVisible = async (page, cssSelector) => {
let visible = true;
await page
.waitForSelector(cssSelector, { visible: true, timeout: 4000 })
.catch(() => {
console.log('not visible');
visible = false;
});
return visible;
};
async function run () {
let browser = await puppeteer.launch({
headless: true,
defaultViewport: null,
args: [
'--window-size=1920,10000',
],
});
const page = await browser.newPage();
const fs = require('fs');
await page.goto(url);
await page.waitForNavigation();
await page.waitForTimeout(4000)
const selectorForLoadMoreButton = 'a.load-more__button';
let loadMoreVisible = await isElementVisible(page, selectorForLoadMoreButton);
while (loadMoreVisible) {
console.log('load more visible');
await page
.click(selectorForLoadMoreButton)
.catch(() => {});
await page.waitForTimeout(4000);
loadMoreVisible = await isElementVisible(page, selectorForLoadMoreButton);
}
const cdp = await page.target().createCDPSession();
const { data } = await cdp.send('Page.captureSnapshot', { format: 'mhtml' });
fs.writeFileSync('page.mhtml', data);
browser.close();
}
run();
You're just waiting for an AJAX request to be processed. You could save the total number of comments (shown at the top left of the Disqus plugin) and compare it with the number of comments loaded so far; once the two are equal, you have retrieved every comment.
I posted an answer a while back on waiting for AJAX requests; you can see it here: https://stackoverflow.com/a/66092889/3645650.
Alternatively, a simpler approach would be to use the Disqus API directly.
Comments are publicly accessible. You can use the API key from the website:
https://disqus.com/api/3.0/threads/listPostsThreaded?limit=50&thread=7187962034&forum=pdc2018&order=popular&cursor=1%3A0%3A0&api_key=E8Uh5l5fHZ6gD8U3KycjAIAk46f68Zw7C6eW8WSjZvCLXebZ7p0r1yrYDrLilk2F
parameter | options
limit | Defaults to 50. Maximum is 100.
thread | Thread number, e.g. 7187962034.
forum | Forum id, e.g. pdc2018.
order | desc, asc, popular.
cursor | Probably the page number. Format is 1:0:0, e.g. page 2 would be 2:0:0.
api_key | The platform API key. Here the API key is E8Uh5l5fHZ6gD8U3KycjAIAk46f68Zw7C6eW8WSjZvCLXebZ7p0r1yrYDrLilk2F.
If you have to iterate through different pages you would need to intercept the xhr responses to retrieve the thread number.
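To make the parameters above concrete, here is a small sketch that assembles the request URL from them. The function name buildListPostsUrl and the page argument are hypothetical, and the mapping from a page number to a cursor of the form N:0:0 only follows the guess stated above, so treat it as an assumption rather than documented behaviour.

```javascript
// Builds a listPostsThreaded URL from the parameters described above.
// Assumption: page N maps to the cursor "N:0:0" (unverified guess).
function buildListPostsUrl({ thread, forum, apiKey, limit = 50, order = 'popular', page = 1 }) {
  const url = new URL('https://disqus.com/api/3.0/threads/listPostsThreaded');
  url.search = new URLSearchParams({
    limit: String(Math.min(limit, 100)), // 100 is the stated maximum
    thread,
    forum,
    order,
    cursor: `${page}:0:0`, // URLSearchParams encodes ":" as %3A
    api_key: apiKey,
  }).toString();
  return url.toString();
}
```

For example, buildListPostsUrl({ thread: '7187962034', forum: 'pdc2018', apiKey: key, page: 2 }) produces a URL of the same shape as the one shown earlier, with cursor=2%3A0%3A0.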
It turned out the problem was that the Disqus comments were inside an iframe.
// I needed to add these 2 lines
const elementHandle = await page.waitForSelector('iframe');
const frame = await elementHandle.contentFrame();
// and change 'page' to 'frame' below
let loadMoreVisible = await isElementVisible(frame, selectorForLoadMoreButton);
while (loadMoreVisible) {
  console.log('load more visible');
  await frame
    .click(selectorForLoadMoreButton)
    .catch(() => {});
  await frame.waitForTimeout(4000);
  loadMoreVisible = await isElementVisible(frame, selectorForLoadMoreButton);
}
After this change it works perfectly.
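The same loop can be folded into one reusable function that works on either a page or a frame. This is a sketch only: clickWhileVisible, delayMs and maxClicks are hypothetical names, and the 4000 ms defaults simply mirror the values used in the code above.

```javascript
// Hypothetical helper: keeps clicking `selector` inside `frame` until it is
// no longer visible. `frame` can be a Puppeteer Frame or Page.
async function clickWhileVisible(frame, selector, options = {}) {
  const { timeout = 4000, delayMs = 4000, maxClicks = 100 } = options;
  let clicks = 0;
  while (clicks < maxClicks) { // safety cap against endless loops
    try {
      await frame.waitForSelector(selector, { visible: true, timeout });
    } catch {
      // Any rejection (typically a timeout) is treated as "button gone".
      break;
    }
    await frame.click(selector);
    clicks += 1;
    // Give the newly requested comments time to load before checking again.
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  return clicks;
}
```

With the frame obtained through contentFrame() as above, the call would be await clickWhileVisible(frame, 'a.load-more__button'). If the page holds several iframes, a more specific selector than 'iframe' may be needed to pick the Disqus one.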
I want to scrape data from a mutual fund website so that I can track only selected funds instead of all of them.
I tried using Puppeteer to scrape the dynamic table generated by the website. I manage to get the table, but when I pass it to cheerio for parsing, nothing seems to happen.
const scrapeImages = async (username) => {
console.log("test");
const browser = await puppeteer.launch({
args: ['--no-sandbox']
});
const page = await browser.newPage();
await page.goto('https://www.publicmutual.com.my/Our-Products/UT-Fund-Prices');
await page.waitFor(5000);
const data = await page.evaluate( () => {
const tds = Array.from(document.querySelectorAll('div.form-group:nth-child(4) > div:nth-child(1) > div:nth-child(1)'))
return tds.map(td => td.innerHTML)
});
await browser.close();
console.log(data);
let $ = cheerio.load(data);
$('table > tbody > tr > td').each((index, element) => {
console.log($(element).text());
});
};
scrapeImages("test");
Ultimately, I am not sure how to do this with Puppeteer alone instead of handing the HTML over to cheerio. I would also like to scrape only selected funds. For instance, on https://www.publicmutual.com.my/Our-Products/UT-Fund-Prices
I would like to get only the funds with these abbreviations:
PAIF
PAGF
PCIF
instead of all of them. How can I do this with Puppeteer only?
That page already has jQuery, which is even better than cheerio:
const rows = await page.evaluate(() => {
  // One array of cell texts per table row
  return $('.fundtable tr').get().map(tr => $(tr).find('td').get().map(td => $(td).text()));
});
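To keep only the selected funds, the rows returned above can be filtered in Node after the evaluate call. The sketch below assumes that each abbreviation appears as the full text of one cell in its row, which has not been verified against the page; pickFunds is a hypothetical name.

```javascript
// Keeps only the rows in which some cell, once trimmed, exactly matches one
// of the wanted abbreviations (case-insensitive).
// Assumption: the abbreviation sits in a cell of its own.
function pickFunds(rows, wanted) {
  const wantedSet = new Set(wanted.map(w => w.toUpperCase()));
  return rows.filter(row =>
    row.some(cell => wantedSet.has(cell.trim().toUpperCase()))
  );
}
```

With rows as returned by page.evaluate, pickFunds(rows, ['PAIF', 'PAGF', 'PCIF']) would return just the matching rows. Header rows and rows without td cells yield empty arrays and are dropped by the filter.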
I'm trying to click through a cookie wall on a webpage, but Puppeteer refuses to recognize the short selector made of just the type and class (button.button-action). Switching to the full CSS selector fixes the problem, but that isn't a viable solution because any change in the parent elements can break the selector. As far as I know this shouldn't be a problem, because on the page in question document.querySelector("button.button-action") also returns the element I'm trying to click.
The code that doesn't work:
const puppeteer = require('puppeteer');
const main = async () => {
const browser = await puppeteer.launch({headless: false,});
const page = await browser.newPage();
await page.goto("https://www.euclaim.nl/check-uw-vlucht#/problem", { waitUntil: 'networkidle2' });
const cookiewall = await page.waitForSelector("button.button-action", {visible: true});
await cookiewall.click();
};
main();
The code that does work:
const puppeteer = require('puppeteer');
const main = async () => {
const browser = await puppeteer.launch({headless: false,});
const page = await browser.newPage();
await page.goto("https://www.euclaim.nl/check-uw-vlucht#/problem", { waitUntil: 'networkidle2' });
const cookiewall = await page.waitForSelector("#InfoPopupContainer > div.ipBody > div > div > div.row.actionButtonContainer.mobileText > button", {visible: true});
await cookiewall.click();
};
main();
The problem is that there are three button.button-action elements on the page, and the first match is not visible.
One thing you could do is call waitForSelector without the visible option (with it, only the first matching button is checked, and that one is hidden),
and then iterate through all the matches to find the one that is clickable.
await page.waitForSelector("button.button-action");
const actions = await page.$$("button.button-action");
for(let action of actions) {
if(await action.boundingBox()){
await action.click();
break;
}
}
I need to read data on https://www.cmegroup.com/tools-information/quikstrike/options-calendar.html
I tried to click on the FX tab with page.click in Puppeteer, but the page stays on the default tab.
Any help is welcome.
const puppeteer = require('puppeteer');
let scrape = async () => {
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
await page.goto('https://www.cmegroup.com/tools-information/quikstrike/options-calendar.html');
await page.waitFor(1000);
//div select FX
await page.click('#ctl00_MainContent_ucViewControl_IntegratedCMEOptionExpirationCalendar_ucViewControl_ucProductSelector_lvGroups_ctrl3_lbProductGroup');
//browser.close();
return result;
};
scrape().then((value) => {
console.log(value); // Success!
});
I couldn't find the element you're looking for on that page. However, this might be helpful:
Wait for the selector to appear on the page before clicking on it:
await page.waitForSelector(selector);
If you are still facing the issue, try using the JavaScript click method:
await page.$eval(selector, elem => elem.click());
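The two suggestions can be combined so that the DOM click is used only when the regular click fails. This is a rough sketch: waitAndClick is a hypothetical helper and the 10000 ms default timeout is an arbitrary choice.

```javascript
// Hypothetical helper: waits for `selector`, tries Puppeteer's own click
// first and falls back to calling click() on the element inside the page.
// Resolves to a label telling which path was taken.
async function waitAndClick(page, selector, timeout = 10000) {
  await page.waitForSelector(selector, { timeout });
  try {
    await page.click(selector); // scrolls into view and clicks with the mouse
    return 'puppeteer';
  } catch (err) {
    // e.g. the element is covered or not clickable at its position
    await page.$eval(selector, elem => elem.click());
    return 'dom';
  }
}
```

The returned label makes it easy to log which strategy worked for a given selector.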
I have been working on a web scraper in Node.js, using the puppeteer npm package, to get the URL, image and title of each news item on the page, but so far I only get the URL, image and title of the first item.
const puppeteer = require('puppeteer');
(async () => {
const brower = await puppeteer.launch();
const page = await brower.newPage();
const url = 'https://es.cointelegraph.com/category/latest';
await page.goto(url, { waitUntil: 'load' });
const datos = await page.evaluate(() => Array.from(document.querySelectorAll('.categories-page__list'))
.map( info => ({
titulo: info.querySelector('.post-preview-item-inline__title').innerText.trim(),
link: info.querySelector('.post-preview-item-inline__title-link').href,
imagen: info.querySelector('.post-preview-item-inline__figure .lazy-image__wrp img ').src
}))
)
console.log(datos);
await page.close();
await brower.close();
})()
That happens because there is just one .categories-page__list element on the page, while there are many .post-preview-list-inline__item elements.
You map over the array built from document.querySelectorAll('.categories-page__list'), but that array has a single element, so the map callback correctly runs just once.
So, replace
document.querySelectorAll('.categories-page__list')
with
document.querySelectorAll('.post-preview-list-inline__item')
and everything works.
Here you can find a working example.
Let me know if you need some more help 😉
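One thing to watch for once every item is mapped: if any item lacks a title, link or image, querySelector returns null and the property access throws inside page.evaluate. A defensive sketch follows, assuming the selectors from the question are correct; extractPost is a hypothetical name.

```javascript
// Hypothetical helper: reads one news item and returns null for any field
// whose element is missing, instead of throwing.
// `item` is a DOM element (anything exposing querySelector).
function extractPost(item) {
  const title = item.querySelector('.post-preview-item-inline__title');
  const link = item.querySelector('.post-preview-item-inline__title-link');
  const image = item.querySelector('.post-preview-item-inline__figure .lazy-image__wrp img');
  return {
    titulo: title ? title.innerText.trim() : null,
    link: link ? link.href : null,
    imagen: image ? image.src : null,
  };
}
```

Because page.evaluate serializes its callback and runs it in the browser, the helper has to be declared inside that callback (or otherwise injected into the page) before it can be used in the map.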