How to fix 'page.target is not a function' in puppeteer? - node.js

I'm trying to use the Devtools Protocol with Puppeteer, but it throws the following error:
TypeError: page.target is not a function
This is my code:
const puppeteer = require('puppeteer');

(async () => {
  // Use Puppeteer to launch a browser and open a page.
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();
  // Create a raw DevTools protocol session to talk to the page.
  const session = await page.target().createCDPSession();
  await page.goto('https://www.google.com');
})();
Am I missing something?

Make sure you're using the latest version of Puppeteer.
"dependencies": {
"puppeteer": "latest"
}
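After updating (npm install will pull the newer release), the code from the question works unchanged. A minimal sketch of the session actually being used, with the CDP Performance domain chosen purely as an example command:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();

  // page.target() is available on current releases; recent versions also
  // offer page.createCDPSession() directly.
  const session = await page.target().createCDPSession();

  // Any raw CDP command can be sent through the session.
  await session.send('Performance.enable');
  await page.goto('https://www.google.com');
  const { metrics } = await session.send('Performance.getMetrics');
  console.log(metrics);

  await browser.close();
})();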

Related

Puppeteer - Failed to launch the browser process! in Windows Service

I have a small JavaScript setup that converts HTML to PDF using the Puppeteer library.
When I host the service by opening a command prompt and starting node index.js, everything works fine: the Express API exposes the service on the predefined port, and requesting it returns the converted PDF.
Now, when I install the script as a Windows service using the node-windows library and request the service, I get the following error message:
Failed to launch the browser process!
Now I'm not sure where to search for the root cause. Is it possible that this could be a permission issue?
Here is my Puppeteer code:
const ValidationError = require('./../errors/ValidationError.js')
const puppeteer = require('puppeteer-core');

module.exports = class PdfService {
  static async htmlToPdf(html) {
    if (!html) {
      throw new ValidationError("no html");
    }
    const browser = await puppeteer.launch({
      headless: true,
      executablePath: process.env.EDGE_PATH,
      args: ["--no-sandbox"]
    });
    const page = await browser.newPage();
    await page.setContent(html, {
      waitUntil: "networkidle2"
    });
    const pdf = await page.pdf({ format: 'A4', printBackground: true });
    await browser.close();
    return pdf;
  }
}
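One way to narrow down the root cause is to surface Chromium's own stderr during launch. A minimal diagnostic sketch, assuming the same EDGE_PATH environment variable as above, uses Puppeteer's dumpio launch option and logs the launch failure so it ends up wherever the Windows service captures console output:
const puppeteer = require('puppeteer-core');

(async () => {
  try {
    const browser = await puppeteer.launch({
      headless: true,
      executablePath: process.env.EDGE_PATH, // same variable as in the service code
      args: ["--no-sandbox"],
      dumpio: true // forward the browser's stdout/stderr to this process
    });
    await browser.close();
  } catch (err) {
    // A permissions or path problem on the service account shows up here.
    console.error('Launch failed:', err);
  }
})();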

How to download file with puppeteer if page doesn't send any request

I'm using puppeteer to download a file from a site. There is only an element that I click, and the file just downloads.
I googled how to download files with puppeteer, but everything I found is based on page.on('request', ...). That doesn't work for me, because the page doesn't send any request:
page.on('request', arg => {
  console.log(arg.url())
})
In the terminal I only see "https://some-site/images/csv.gif", but that's just a .gif. How does this file get downloaded at all? If the browser doesn't make any request, does that mean the file is already on the client side?
Try this (replace the variables at the top to fit your needs):
const puppeteer = require('puppeteer');

const url = 'your.url.com';
const buttonElementId = '#downloadButton';

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto(url);
  await page.click(buttonElementId);
  await browser.close();
})();
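If the file needs to end up in a known folder and the browser should not be closed before the download finishes, the snippet can be extended with the raw CDP command Page.setDownloadBehavior. A sketch along those lines; the downloads folder and the fixed 2-second wait are assumptions to adjust for your site:
const path = require('path');
const puppeteer = require('puppeteer');

const url = 'your.url.com';
const buttonElementId = '#downloadButton';

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();

  // Tell Chromium where clicked downloads should be written (raw CDP command).
  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: path.resolve('./downloads') // assumed target folder
  });

  await page.goto(url);
  await page.click(buttonElementId);

  // Give the download time to finish before closing the browser.
  await new Promise(resolve => setTimeout(resolve, 2000));
  await browser.close();
})();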

Puppeteer: Failed to launch the browser process! spawn

When I try to run node app.js, I get an error:
the message is Failed to launch the browser process! spawn
/Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app
EACCES
What I did
I checked the folder at /Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app and the file is not zipped. It can be run.
Note:
If I try to execute without the path, it works, but
I would like to use either Chrome or Chromium to open a new page.
const browser = await puppeteer.launch({headless: false});
const express = require('express');
const puppeteer = require('puppeteer');

const app = express();

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    executablePath: '/Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app'
  });
  const page = await browser.newPage();
  await page.goto('https://google.com', {waitUntil: 'networkidle2'});
})().catch((error) => {
  console.error("the message is " + error.message);
});

app.listen(3000, function () {
  console.log('server started');
})
If you navigate to the chrome://version/ page in that exact browser, it will show the Executable Path, which is the exact string you need to use as the executablePath Puppeteer launch option.
Usually, Chrome's path looks like this on macOS:
/Applications/Google Chrome.app/Contents/MacOS/Google Chrome
Or like this if Chromium is located in your node_modules folder:
/Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app/Contents/MacOS/Chromium
If you compare these with the string you used for executablePath, it differs from the one retrieved with the method above: exactly the /Contents/MacOS/Chromium part needs to be appended to your current path to make it work.
Note: the Chromium bundled with Puppeteer is the version guaranteed to work with that exact Puppeteer release; if you plan to use Chrome or other Chromium-based browsers, you might experience unexpected issues.
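Applied to the code from the question, only the suffix changes:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    // Same path as before, with /Contents/MacOS/Chromium appended.
    executablePath: '/Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app/Contents/MacOS/Chromium'
  });
  const page = await browser.newPage();
  await page.goto('https://google.com', {waitUntil: 'networkidle2'});
  await browser.close();
})();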
Following up on @theDavidBarton's answer:
The Chromium that shipped with Puppeteer did not work, but the Chrome installation on my MacBook did.
OS: OS-X 10.15.7 (Catalina)
Node version: v14.5.0
Failed code:
const browser = await puppeteer.launch({
  headless: true,
  executablePath: "/users/bert/Project/NodeJS/PuppeteerTest/node_modules/puppeteer/.local-chromium/mac-818858/chrome-mac/Chromium.app/Contents/MacOS/Chromium"
});
Successful code:
const browser = await puppeteer.launch({
  headless: true,
  executablePath: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
});
Full code, just the first example from the Puppeteer website:
const puppeteer = require('puppeteer');

(async () => {
  try {
    const browser = await puppeteer.launch({
      headless: true,
      executablePath: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
    });
    const page = await browser.newPage();
    await page.goto('https://example.com');
    await page.screenshot({path: 'example.png'});
    await browser.close();
  } catch (err) {
    console.log(err);
  }
})();
And, yes, I got the Screenshot !! :-)
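If you do want the Chromium that ships with Puppeteer rather than a hard-coded .local-chromium path, a small sketch, assuming the full puppeteer package (not puppeteer-core), is to ask Puppeteer for its own bundled binary:
const puppeteer = require('puppeteer');

(async () => {
  // Resolves to the bundled Chromium that matches the installed Puppeteer version.
  console.log(puppeteer.executablePath());

  const browser = await puppeteer.launch({
    headless: true,
    executablePath: puppeteer.executablePath()
  });
  await browser.close();
})();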
Using locate-chrome: https://www.npmjs.com/package/locate-chrome
const locateChrome = require('locate-chrome');
const executablePath = await new Promise(resolve => locateChrome(arg => resolve(arg)));
const browser = await puppeteer.launch({ executablePath });
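The same idea as a complete script, assuming puppeteer is installed alongside locate-chrome:
const puppeteer = require('puppeteer');
const locateChrome = require('locate-chrome');

(async () => {
  // locate-chrome resolves the path of an installed Chrome/Chromium binary.
  const executablePath = await new Promise(resolve => locateChrome(arg => resolve(arg)));
  const browser = await puppeteer.launch({ executablePath, headless: false });
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await browser.close();
})();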

How to catch a download with playwright?

I'm trying to download a file from a website using Playwright. The button that triggers the download does some js and then the download starts.
Clicking the button using the .click function triggers a download but it shows an error: Failed - Download error.
I've tried using the devtools protocol Page.setDownloadBehavior, but this doesn't seem to do anything.
const playwright = require("playwright");
const { /*chromium,*/ devices } = require("playwright");
const iPhone = devices["iPad (gen 7) landscape"];

(async () => {
  const my_chromium = playwright["chromium"];
  const browser = await my_chromium.launch({ headless: false });
  const context = await browser.newContext({
    viewport: iPhone.viewport,
    userAgent: iPhone.userAgent
  });
  const page = await context.newPage();
  const client = await browser.pageTarget(page).createCDPSession();
  console.log(client);
  await client.send("Page.setDownloadBehavior", {
    behavior: "allow",
    downloadPath: "C:/in"
  });
  //...and so on
  await page.click("#download-button");
  browser.close();
})();
Full file here
There is a proposal for a better download api in Playwright, but I can't find the current API.
There was a suggestion that something to do with the downloadWillBegin event would be useful, but I've no idea how to access that from Playwright.
I'm open to the suggestion that I should use Puppeteer instead, but I moved to Playwright because I couldn't work out how to download a file with Puppeteer either, and the issue related to it suggested that the whole team had moved to Playwright.
Take a look at the "download" page event (page.waitForEvent("download")):
const playwright = require("playwright");

(async () => {
  const browser = await playwright.chromium.launch({});
  const context = await browser.newContext({ acceptDownloads: true });
  const page = await context.newPage();
  await page.goto("https://somedownloadpage.weburl");
  await page.type("#password", password); // "password" is assumed to be defined elsewhere
  await page.click("text=Continue");
  const download = await page.waitForEvent("download");
  console.log("file downloaded to", await download.path());
})();
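download.path() points at a temporary file that Playwright removes when the context closes; if the file needs to end up somewhere specific, download.saveAs() copies it there. Continuing inside the same block, with the target folder as an assumption:
  // Copy the file out of Playwright's temporary download folder.
  await download.saveAs("./downloads/" + download.suggestedFilename());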
Embarrassingly, I was closing the browser before the download had started.
It turns out that the download error was caused by the client section (the CDP Page.setDownloadBehavior part). However, that means I have no control over where the file is saved.
The download works when headless: false but not when headless: true.
If anyone has a better answer, that'd be great!
You can use waitForTimeout.
I tried with { headless: true } and await page.waitForTimeout(1000); and it works fine.
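In context that just means pausing after the click that triggers the download, so the browser is not torn down too early. A sketch, reusing the #download-button selector from the question and the 1000 ms figure from above:
await page.click("#download-button");

// Give the download a moment to complete before closing the browser.
await page.waitForTimeout(1000);
await browser.close();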
To download a file (including its buffer) I highly recommend the Got node module. It's much easier, cleaner and lighter.
const got = require('got');

(async () => {
  const response = await got('https://sindresorhus.com')
    .on('downloadProgress', progress => {
      // Report download progress
    })
    .on('uploadProgress', progress => {
      // Report upload progress
    });
  console.log(response);
})();
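To save the download straight to disk instead of buffering the whole response, got also has a stream interface that can be piped into a file. A sketch, assuming a CommonJS-compatible got release (v11) and a Node version with stream/promises; URL and filename are placeholders:
const fs = require('fs');
const { pipeline } = require('stream/promises');
const got = require('got');

(async () => {
  // Stream the response body directly into a local file.
  await pipeline(
    got.stream('https://example.com/file.csv'),
    fs.createWriteStream('file.csv')
  );
})();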

Open Puppeteer with specific configuration (download PDF instead of PDF viewer)

I would like to open Chromium with a specific configuration.
I am looking for the configuration to activate the following option :
Settings => Site Settings => Permissions => PDF documents => "Download PDF files instead of automatically opening them in Chrome"
I searched this command-line switch page, but the only parameter that deals with PDFs is --print-to-pdf, which does not correspond to my need.
Do you have any ideas?
There is no option you can pass into Puppeteer to force PDF downloads. However, you can use the Chrome DevTools Protocol to add a content-disposition: attachment response header, which forces the download.
I'll include full example code below; it downloads PDF and XML files in headful mode.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });
  const page = await browser.newPage();
  const client = await page.target().createCDPSession();

  // Intercept all requests at the response stage.
  await client.send('Fetch.enable', {
    patterns: [
      {
        urlPattern: '*',
        requestStage: 'Response',
      },
    ],
  });

  client.on('Fetch.requestPaused', async (reqEvent) => {
    const { requestId } = reqEvent;

    let responseHeaders = reqEvent.responseHeaders || [];
    let contentType = '';
    for (let elements of responseHeaders) {
      if (elements.name.toLowerCase() === 'content-type') {
        contentType = elements.value;
      }
    }

    if (contentType.endsWith('pdf') || contentType.endsWith('xml')) {
      // Adding content-disposition: attachment makes the browser download the file.
      responseHeaders.push({
        name: 'content-disposition',
        value: 'attachment',
      });
      const responseObj = await client.send('Fetch.getResponseBody', {
        requestId,
      });
      await client.send('Fetch.fulfillRequest', {
        requestId,
        responseCode: 200,
        responseHeaders,
        body: responseObj.body,
      });
    } else {
      await client.send('Fetch.continueRequest', { requestId });
    }
  });

  await page.goto('https://pdf-xml-download-test.vercel.app/');
  await page.waitFor(100000);
  await client.send('Fetch.disable');
  await browser.close();
})();
For a more detailed explanation, please refer to the Git repo I've set up with comments. It also includes example code for Playwright.
Puppeteer currently does not support navigating (or downloading) PDFs
in headless mode that easily. Quote from the docs for the page.goto function:
NOTE Headless mode doesn't support navigation to a PDF document. See the upstream issue.
What you can do though, is detect if the browser is navigating to the PDF file and then download it yourself via Node.js.
Code sample
const puppeteer = require('puppeteer');
const http = require('http');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  page.on('request', req => {
    if (req.url() === '...') {
      const file = fs.createWriteStream('./file.pdf');
      http.get(req.url(), response => response.pipe(file));
    }
  });

  await page.goto('...');
  await browser.close();
})();
This navigates to a URL and monitors the ongoing requests. If the "matched request" is found, Node.js will manually download the file via http.get and pipe it into file.pdf. Please be aware that this is a minimal working example: you will want to catch errors when downloading, and you might also want to use something more sophisticated than http.get depending on the situation.
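For instance, the built-in http module cannot fetch https:// URLs, and the example above never checks the response status or waits for the write to finish. A slightly more defensive sketch of the same idea, still keyed on the same placeholder URL match:
const puppeteer = require('puppeteer');
const https = require('https');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  page.on('request', req => {
    if (req.url() === '...') { // same placeholder match as above
      https.get(req.url(), response => {
        if (response.statusCode === 200) {
          const file = fs.createWriteStream('./file.pdf');
          response.pipe(file);
          // In a real script, wait for the 'finish' event on the write stream
          // before closing the browser.
        } else {
          console.error('Download failed with status', response.statusCode);
        }
      }).on('error', err => console.error('Download failed:', err));
    }
  });

  await page.goto('...');
  await browser.close();
})();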
Future note
In the future there might be an easier way to do it: once Puppeteer supports response interception, you will be able to simply force the browser to download a document, but right now this is not supported (May 2019).
