How to download pdf file that opens in new tab with puppeteer? - node.js

I am trying to download invoice from website using puppeteer, I just started to learn puppeteer. I am using node to create and execute the code. I have managed to login and navigate to the invoice page, but it opens in new tab, so, code is not detecting it since its not the active tab. This is the code I used:
const puppeteer = require('puppeteer')
const SECRET_EMAIL = 'emailid'
const SECRET_PASSWORD = 'password'
const main = async () => {
const browser = await puppeteer.launch({
headless: false,
})
const page = await browser.newPage()
await page.goto('https://my.apify.com/sign-in', { waitUntil: 'networkidle2' })
await page.waitForSelector('div.sign_shared__SignForm-sc-1jf30gt-2.kFKpB')
await page.type('input#email', SECRET_EMAIL)
await page.type('input#password', SECRET_PASSWORD)
await page.click('input[type="submit"]')
await page.waitForSelector('#logged-user')
await page.goto('https://my.apify.com/billing#/invoices', { waitUntil: 'networkidle2' })
await page.waitForSelector('#reactive-table-1')
await page.click('#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a')
const newPagePromise = new Promise(x => browser.once('targetcreated', target => x(target.page())))
const page2 = await newPagePromise
await page2.bringToFront()
await page2.screenshot({ path: 'apify1.png' })
//await browser.close()
}
main()
In the above code I am just trying to take screenshot. Can anyone help me?

Here is an example of a work-around for the chromium issue mentioned in the comments above. Adapt to fit your specific needs and use-case. Basically, you need to capture the new page (target) and then do whatever you need to do to download the file, possibly pass it as a buffer to Node as per the example below if no other means work for you (including a direct request to the download location via fetch or ideally some request library on the back-end)
const [PDF_page] = await Promise.all([
browser
.waitForTarget(target => target.url().includes('my.apify.com/account/invoices/' && target).then(target => target.page()),
ATT_page.click('#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a'),
]);
const asyncRes = PDF_page.waitForResponse(response =>
response
.request()
.url()
.includes('my.apify.com/account/invoices'));
await PDF_page.reload();
const res = await asyncRes;
const url = res.url();
const headers = res.headers();
if (!headers['content-type'].includes('application/pdf')) {
await PDF_page.close();
return null;
}
const options = {
// target request options
};
const pdfAb = await PDF_page.evaluate(
async (url, options) => {
function bufferToBase64(buffer) {
return btoa(
new Uint8Array(buffer).reduce((data, byte) => {
return data + String.fromCharCode(byte);
}, ''),
);
}
return await fetch(url, options)
.then(response => response.arrayBuffer())
.then(arrayBuffer => bufferToBase64(arrayBuffer));
},
url,
options,
);
const pdf = Buffer.from(pdfAb, 'base64');
await PDF_page.close();

Related

Can't click link using puppeteer - Thingiverse

I'm trying to automate away downloading multiple files on thingiverse. I choose an object at random. But I'm having a hard time locating the link I need, clicking and then downloading. Has someone run into this before can I get some help?
I've tried several other variations.
import puppeteer from 'puppeteer';
async function main() {
const browser = await puppeteer.launch({
headless: true,
});
const page = await browser.newPage();
const response = await page.goto('https://www.thingiverse.com/thing:2033856/files');
const buttons = await page.$x(`//a[contains(text(), 'Download')]`);
if(buttons.length > 0){
console.log(buttons.length);
} else {
console.log('no buttons');
}
await wait(5000);
await browser.close();
return 'Finish';
}
async function wait(time: number) {
return new Promise(function (resolve) {
setTimeout(resolve, time);
});
}
function start() {
main()
.then((test) => console.log('DONE'))
.catch((reason) => console.log('Error: ', reason));
}
start();
Download Page
Code
I was able to get it to work.
The selector is: a[class^="ThingFile__download"]
Puppeteer is: const puppeteer = require('puppeteer-extra');
Before the await page.goto() I always recommend setting the viewport:
await page.setViewport({width: 1920, height: 720});
After that is set, change the await page.goto() to have a waitUntil option:
const response = await page.goto('https://www.thingiverse.com/thing:2033856/files', { waitUntil: 'networkidle0' }); // wait until page load
Next, this is a very important part. You have to do what is called waitForSelector() or waitForFunction().
I added both of these lines of code after the const response:
await page.waitForSelector('a[class^="ThingFile__download"]', {visible: true})
await page.waitForFunction("document.querySelector('a[class^=\"ThingFile__download\"]') && document.querySelector('a[class^=\"ThingFile__download\"]').clientHeight != 0");
Next, get the buttons. For my testing I just grabbed the button href.
const buttons = await page.$eval('a[class^="ThingFile__download"]', anchor => anchor.getAttribute('href'));
Lastly, do not check the .length of this variable. In this case we are just returning the href value which is a string. You will get a Promise of an ElementHandle when you try getting just the button:
const button = await page.$('a[class^="ThingFile__download"]');
console.log(button)
if (button) { ... }
Now if you change that page.$ to be page.$$, you will be getting a Promise of an Array<ElementHandle>, and will be able to use .length there.
const buttonsAll = await page.$$('a[class^="ThingFile__download"]');
console.log(buttonsAll)
if (buttons.length > 0) { ... }
Hopefully this helps, and if you can't figure it out I can post my full source later if I have time to make it look better.

node - How to deploy script to Google Cloud functions

I need to deploy a node script to cloud functions. I've created a new function and pasted the code inside the editor, all the necessary dependencies are listed into the package.json and I've also created an .env file that will hold some informations.
The script initially was made to be executed from cli but now that I need to deploy it I'm not sure if it will work, it's a telegram bot script and I don't know if I need to create any action to be sure that it will be executed when the bot receive a commend from the users.
Anyone with more experience on gcloud can explain to me if my deployment is made correctly?Do I need to use cloud shell or I can just copy/paste my code into a new function and it will run?
This is the code I have at the moment, but it will be not executed in cloud functions
const process = require('process');
const puppeteer = require('puppeteer');
const TelegramBot = require('node-telegram-bot-api');
const bot = new TelegramBot(process.env.TELEGRAM_BOT_TOKEN, { polling: true });
const webHookURL = 'https://example.com/'
let baseURL = 'https://example.com/events.php';
let chatID;
let username;
let eventsList = [];
puppeteer.launch({
headless: true
}).then( async (browser) => {
const page = await browser.newPage()
const navigationPromise = page.waitForNavigation()
await page.goto(baseURL, {
waitUntil: 'networkidle0'
})
await page.setViewport({ width: 1280, height: 607 })
await page.waitForSelector('.sc-VigVT > #qc-cmp2-ui > .qc-cmp2-footer > .qc-cmp2-summary-buttons > .ljEJIv')
await page.click('.sc-VigVT > #qc-cmp2-ui > .qc-cmp2-footer > .qc-cmp2-summary-buttons > .ljEJIv')
await navigationPromise
await page.waitForSelector('.container > .row > .results-item > .kode_ticket_wraper > .container')
await navigationPromise
await page.waitForSelector('.container > .kode_ticket_list > li')
eventsList = await page.$$eval('ul.kode_ticket_list > li', el => {
return el.map( e => {
let url = e.querySelector('.ticket_btn > a').getAttribute('href')
return {
type: e.querySelector('.kode_ticket_text > h6').innerHTML,
time: e.querySelector('.kode_ticket_text > p').innerHTML,
url: url,
name: new URL(url).pathname.split('-').join(' ').slice(6)
}
})
})
await browser.close()
})
// I need to set the webhhok, I've seen the documentations of the library I'm using but seems not working
bot.setWebHook(`${webHookURL}/bot${process.env.TELEGRAM_BOT_TOKEN}`)
bot.onText(/\/dirette/, (msg, match) => {
username = msg.chat.first_name
chatID = msg.chat.id
let response = `Hi ${username}! Here is a list of today live events:\n\n`
eventsList.map( item => {
response += `<b>${item.type}</b>\n`
response += `<b>${item.time}</b>\n`
response += `<a href='${item.url}'>${item.name.toUpperCase()}</a>\n\n`
})
bot.sendMessage(chatID, response, { parse_mode: 'HTML', disable_web_page_preview: true })
})

puppeteer Error: No node found for selector: .olp-text-box

I'm trying to get the number of sellers (which are selling only NEW items) on the Amazon product page using puppeteer.
for some reason, I'm getting error in the first button '.olp-text-box'.
Any ideas?
here is my code
const pupUrl = 'https://www.amazon.com/dp/' + req.body.asinIdInput;
async function configureBrowser(){
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(pupUrl , {waitUntil: 'load', timeout: 0});
await page.click('.olp-text-box');
await page.click('#aod-filter-string');
await page.click('.a-size-base.a-color-base')
return page;
}
async function checkSellers(page){
await page.reload();
let html = await page.evaluate(()=> document.body.innerHTML);
$('#aod-filter-offer-count-string',html).each(function(){
var numberOfSellers = $(this).text();
console.log(numberOfSellers);
});
}
async function monitor(){
let page = await configureBrowser();
await checkSellers(page);
}
monitor();

How to intercept downloads of blob generated in client side of website through puppeteer?

I have a page on this link (https://master.d3tei1upkyr9mb.amplifyapp.com/report) with 3 export buttons.
These export buttons generate XLSX, CSV, PDF on the frontend, and hence there are no URLs for XLSX, CSV, PDF.
I need puppeteer to be able to download or get or intercept the blobs or buffers of these files in my node backend.
I tried different ways to achieve this but still haven't figured out.
It was possible through playwright library through the code written below. But I need to be able to do it with Puppeteer.
const {chromium} = require('playwright');
const fs = require('fs');
(async () => {
const browser = await chromium.launch();
const context = await browser.newContext({acceptDownloads: true});
const page = await context.newPage();
await page.goto('http://localhost:3000/');
const [ download ] = await Promise.all([
page.waitForEvent('download'), // <-- start waiting for the download
page.click('button#expoXLSX') // <-- perform the action that directly or indirectly initiates it.
]);
const path = await download.path();
console.log(path);
const newFile = await fs.readFileSync(path);
console.log(newFile);
fs.writeFile("test.xlsx", newFile, "binary",function(err) {
if(err) {
console.log(err);
} else {
console.log("The file was saved!");
}
});
await browser.close()
})();
Is there any way?
Any reason not to simulate the click on the frontend and allow puppeteer download the file to the location of your choice? You can easily download the file this way with the following:
Edit: You can determine when the file download completes by listening to the Page.downloadProgress event and checking for the completed state. Getting the actual filename saved to disk isn't 100% guaranteed with this method, but you are able to get what is termed the suggestedFileName from the Page.downloadWillBegin event, which in my tests thus far (at least on the example page in the question) does match the filename persisted to disk.
const puppeteer = require('puppeteer');
const path = require('path');
const downloadPath = path.resolve('./download');
(async ()=> {
let fileName;
const browser = await puppeteer.launch({
headless: false
});
const page = await browser.newPage();
await page.goto(
'https://master.d3tei1upkyr9mb.amplifyapp.com/report',
{ waitUntil: 'networkidle2' }
);
await page._client.send('Page.setDownloadBehavior', {
behavior: 'allow',
downloadPath: downloadPath
});
await page._client.on('Page.downloadWillBegin', ({ url, suggestedFilename }) => {
console.log('download beginning,', url, suggestedFilename);
fileName = suggestedFilename;
});
await page._client.on('Page.downloadProgress', ({ state }) => {
if (state === 'completed') {
console.log('download completed. File location: ', downloadPath + '/' + fileName);
}
});
await page.click('button#expoPDF');
})();

Why am I not able to navigate through iFrames using Apify/Puppeteer?

I'm trying to manipulate forms of sites w/ iFrames in it using Puppeteer. I tried different ways to reach a specific iFrame, or even to count iFrames in a website, with no success.
Why isn't Puppeteer's object recognizing the iFrames / child frames of the page I'm trying to navigate through?
It's happening with other pages as well, such as https://www.veiculos.itau.com.br/simulacao
const Apify = require('apify');
const sleep = require('sleep-promise');
Apify.main(async () => {
// Launch the web browser.
const browser = await Apify.launchPuppeteer();
// Create and navigate new page
console.log('Open target page');
const page = await browser.newPage();
await page.goto('https://www.credlineitau.com.br/');
await sleep(15 * 1000);
for (const frame in page.mainFrame().childFrames()) {
console.log('test');
}
await browser.close();
});
Perhaps you'll find some helpful inspiration below.
const waitForIframeContent = async (page, frameSelector, contentSelector) => {
await page.waitForFunction((frameSelector, contentSelector) => {
const frame = document.querySelector(frameSelector);
const node = frame.contentDocument.querySelector(contentSelector);
return node && node.innerText;
}, {
timeout: TIMEOUTS.ten,
}, frameSelector, contentSelector);
};
const $frame = await waitForSelector(page, SELECTORS.frame.iframeNode).catch(() => null);
if ($frame) {
const frame = page.frames().find(frame => frame.name() === 'content-iframe');
const $cancelStatus = await waitForSelector(frame, SELECTORS.frame.membership.cancelStatus).catch(() => null);
await waitForIframeContent(page, SELECTORS.frame.iframeNode, SELECTORS.frame.membership.cancelStatus);
}
Give it a shot.

Resources