This is driving me insane... I have the following code:
// Load a PUG template
const template = await loadTemplateRoute(pdfProps.layout);
// Generate HTML
const html = template(pdfProps);
// An explicit executable path is required for Puppeteer to work when running locally
const executablePath = process.env.EXECUTABLE_PATH || await chromium.executablePath;
console.log('executable path', executablePath);
// These are needed to run on WSL
chromium.args.push('--disable-gpu', '--single-process');
console.log('1');
const browser = await puppeteer.launch({
args: chromium.args,
defaultViewport: chromium.defaultViewport,
executablePath,
headless: true,
ignoreHTTPSErrors: true
});
console.log('2');
const page = await browser.newPage();
console.log('3');
// eslint-disable-next-line quote-props
await page.setContent(html, { 'waitUntil': 'networkidle2' });
console.log('4');
// here we can insert customizable features in the future using JSONB stored formats
const pdf = await page.pdf({
format: 'A4',
printBackground: true,
margin: {
top: '1cm',
right: '1cm',
bottom: '1cm',
left: '1cm'
}
});
await page.close();
console.log('5');
await browser.close();
console.log('6');
return pdf;
Running this gives me the PDF I want, but only about one time in ten. The rest of the time I get either this after console.log('4'):
Protocol error (IO.read): Target closed.
at .../node_modules/puppeteer-core/lib/Connection.js:183:56
at new Promise (<anonymous>)
at CDPSession.send (.../node_modules/puppeteer-core/lib/Connection.js:182:12)
at Function.readProtocolStream (.../node_modules/puppeteer-core/lib/helper.js:254:37)
at processTicksAndRejections (internal/process/task_queues.js:94:5)
at Page.pdf (.../node_modules/puppeteer-core/lib/Page.js:1021:12)
Or, less often, this after console.log('3'):
Navigation failed because browser has disconnected!
at CDPSession.<anonymous> (.../node_modules/puppeteer-core/lib/LifecycleWatcher.js:46:107)
at CDPSession.emit (events.js:223:5)
at CDPSession.EventEmitter.emit (domain.js:475:20)
at CDPSession._onClosed (.../node_modules/puppeteer-core/lib/Connection.js:215:10)
at Connection._onClose (.../node_modules/puppeteer-core/lib/Connection.js:138:15)
at WebSocket.<anonymous> (.../node_modules/puppeteer-core/lib/WebSocketTransport.js:48:22)
at WebSocket.onClose (.../node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:124:16)
at WebSocket.emit (events.js:223:5)
at WebSocket.EventEmitter.emit (domain.js:475:20)
at WebSocket.emitClose (.../node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:191:10)
at Socket.socketOnClose (.../node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:850:15)
at Socket.emit (events.js:223:5)
at Socket.EventEmitter.emit (domain.js:475:20)
at TCP.<anonymous> (net.js:664:12)
I run this on Ubuntu under WSL, but running it on a Mac also produces errors (though less frequently).
It seems to work better if I wait about 5 minutes between tries, but listing processes (ps -ef) shows nothing running or hanging...
EDIT: Logging what's happening in /node_modules/puppeteer-core/lib/Connection.js:182:56 gives:
send(); Page.printToPDF {
transferMode: 'ReturnAsStream',
landscape: false,
displayHeaderFooter: false,
headerTemplate: '',
footerTemplate: '',
printBackground: true,
scale: 1,
paperWidth: 8.27,
paperHeight: 11.7,
marginTop: 0.39375,
marginBottom: 0.39375,
marginLeft: 0.39375,
marginRight: 0.39375,
pageRanges: '',
preferCSSPageSize: false
}
send(); IO.read { handle: '1' }
send(); IO.read { handle: '1' }
Page.printToPDF works fine and the first IO.read also works, but the second IO.read throws the error...
After trying a bunch of things, I started to suspect external resources, since it worked fine with simple templates.
After reworking the templates to load no external CSS (all CSS now lives in <style> tags) and to "pre-parse" all images to base64 (<img src="data:image/png;base64,MIIlkaa3498asm..." />), the error no longer occurs...
Clearly the loading of some external resource was tripping things up...
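For anyone wanting to try the same workaround, here is a minimal sketch of how an image could be turned into a base64 data URI before it goes into the template. The helper names (toDataUri, fileToDataUri) and the MIME table are my own assumptions for illustration, not part of the original code:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical lookup table: file extension -> MIME type (extend as needed).
const MIME_TYPES = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.gif': 'image/gif',
  '.svg': 'image/svg+xml'
};

// Build a data URI from raw bytes and a MIME type.
function toDataUri(buffer, mimeType) {
  return `data:${mimeType};base64,${buffer.toString('base64')}`;
}

// Read an image from disk and return it as a data URI,
// so the template needs no network request for it.
function fileToDataUri(filePath) {
  const mimeType = MIME_TYPES[path.extname(filePath).toLowerCase()];
  if (!mimeType) {
    throw new Error(`Unsupported image type: ${filePath}`);
  }
  return toDataUri(fs.readFileSync(filePath), mimeType);
}
```

The result can then be passed to the template as an ordinary string and used as the src of an <img> tag.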
I had the same issue and resolved it by:
- replacing page.pdf() with page.createPDFStream()
- adding a 60-second timeout to page.createPDFStream() and page.setContent() (the default is 30 seconds)
- waiting for both the 'load' and 'domcontentloaded' events to fire
Example:
await page.setContent(html, {
timeout: 60000,
waitUntil: ['load', 'domcontentloaded'],
});
await page.emulateMediaType('screen');
// const pdf = await page.pdf({
const pdfStream = await page.createPDFStream({
timeout: 60000,
format: 'A4',
margin: { top: '0.5cm', right: '1cm', bottom: '0.8cm', left: '1cm' },
printBackground: true,
});
// ...
// do something with the PDF stream, e.g. save to file
pdfStream
.on('end', () => console.log('pdfStream done'))
.on('error', (err) => console.log(err))
.pipe(fs.createWriteStream('my-form.pdf'));
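If you need the PDF as a Buffer (as page.pdf() returns) rather than a file on disk, the stream can be collected in memory instead of piped. This is only a sketch: streamToBuffer is a hypothetical helper name, and it assumes the stream is async-iterable, which holds for Node.js readable streams (the exact stream type returned by createPDFStream may differ between Puppeteer versions):

```javascript
// Collect an async-iterable stream of binary chunks into one Buffer.
async function streamToBuffer(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    // Normalize Uint8Array or string chunks to Buffer before concatenating.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Usage (inside an async function, instead of piping to a file):
//   const pdf = await streamToBuffer(pdfStream);
```

Note that this holds the whole PDF in memory, so piping to a file is still the better choice for very large documents.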
I use Puppeteer with an HTTP proxy.
This is my Puppeteer config:
let config = {
userDataDir: `./puppeteer-cache/dev_chrome_profile_${hash}`,
headless: false,
args: [
`--proxy-server=${newProxyUrl}`,
'--ignore-certificate-errors',
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process,BlockInsecurePrivateNetworkRequests',
'--disable-site-isolation-trials'
],
defaultViewport: null,
ignoreHTTPSErrors: true
}
Sometimes I get this error:
This site can’t be reached. The webpage at https://some.site.com might be temporarily down or it may have moved permanently to a new web address.
ERR_SSL_BAD_RECORD_MAC_ALERT
when I try page.goto('https://some.site.com').
const page = await browser.newPage();
if (proxy && proxy.login && proxy.pass) {
await page.authenticate({
username: proxy.login,
password: proxy.pass,
});
}
try {
console.log('POINT 13');
await page.goto(films[i].url);
console.log('POINT 14');
return;
} catch (e) {
console.log('POINT 15');
return;
}
I see POINT 13 in my console, but neither POINT 14 nor POINT 15. The script seems to freeze between points 13 and 14, inside page.goto()...
I have tried changing the timeout for page.goto(), but it doesn't help.
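To at least make the hang visible, one option is to race page.goto() against an independent timer, so that the catch block (POINT 15) runs even if Puppeteer's own timeout never fires. This is a diagnostic sketch only, not a fix for the SSL error; withDeadline is a hypothetical helper name and the 45-second value is arbitrary:

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
// The timer is independent of Puppeteer's own navigation timeout.
function withDeadline(promise, ms, label = 'operation') {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} exceeded ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Usage inside the try block above:
//   await withDeadline(page.goto(films[i].url), 45000, 'page.goto');
```

The underlying navigation is not cancelled by this, so the page or browser should still be closed in the catch block.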
I am having trouble getting my Electron app to allow scrolling. Could someone please help? My code is below. Sorry, I am still fairly new to Electron and trying to get my head around it. The problem is that the content in the browser window extends past the bottom of the window, and I cannot scroll down using either the trackpad or the arrow keys.
function createWindow () {
server.run();
// Create the browser window.
mainWindow = new BrowserWindow({
width: 1024,
height: 768,
scrollBounce: false,
webPreferences: {
nodeIntegration: true,
contextIsolation: false,
zoomFactor: 0.8,
enableRemoteModule: false
},
frame: true,
minWidth: 1024,
minHeight: 768
});
// and load the index.html of the app.
mainWindow.loadURL('http://'+server.host+':'+server.port+'/')
/*
mainWindow.loadURL(url.format({
pathname: path.join(__dirname, 'index.php'),
protocol: 'file:',
slashes: true
}))
*/
const {shell} = require('electron')
shell.showItemInFolder('fullPath')
// Open the DevTools.
// mainWindow.webContents.openDevTools()
// Emitted when the window is closed.
mainWindow.on('closed', function () {
// Dereference the window object, usually you would store windows
// in an array if your app supports multi windows, this is the time
// when you should delete the corresponding element.
// PHP SERVER QUIT
server.close();
app.quit();
mainWindow = null;
})
}
I want to create a function to create a pdf from a HTML file and trigger it from AWS Lambda. I have put the function inside a docker image and deployed the docker image to AWS Lambda. It is working fine when I run the docker locally, but when I deploy it to AWS Lambda, it does not work.
My app.js which is responsible for creating the pdf file:
const fs = require('fs');
const puppeteer = require('puppeteer');
const path = require('path');
const chromium = require('chrome-aws-lambda');
const buildPaths = {
buildPathHtml: path.resolve('./index.html'),
buildPathPdf: path.resolve('./my-file.pdf')
};
const { buildPathHtml, buildPathPdf } = buildPaths;
const isFileExist = (filePath) => {
try {
fs.statSync(filePath);
return true;
} catch (error) {
return false;
}
};
const printPdf = async () => {
if (!isFileExist(buildPathHtml)) {
throw new Error('HTML file does not exist.');
}
console.log('Creating browser...');
const browser = await chromium.puppeteer.launch({
args: ['--no-sandbox', '--disable-dev-shm-usage', '--disable-web-security'],
executablePath: await chromium.executablePath,
headless: true,
});
console.log('Browser created. Creating page...');
const page = await browser.newPage();
console.log('Page created. Going to html file path...');
await page.goto(`file:${buildPathHtml}`, { waitUntil: 'networkidle0' });
console.log('Went to html file path. Creating pdf...');
const pdf = await page.pdf({
format: 'A4',
printBackground: true,
landscape: true,
displayHeaderFooter: true,
headerTemplate: ``,
footerTemplate: `
<div style="font-size: 10px; font-family: 'Raleway'; font-weight: bold; width: 1000px; text-align: center; color: grey; padding-left: 10px;">
<span>Page: </span>
<span class="pageNumber"></span> / <span class="totalPages"></span>
<span> - Date: </span>
<span class="date"></span>
</div>`,
margin: {
top: '20px',
right: '20px',
bottom: '40px',
left: '20px'
}
});
console.log('Pdf created. Closing the browser...');
await browser.close();
console.log('Successfully created the PDF');
fs.writeFileSync(buildPathPdf, pdf);
return {
headers: {
"Content-type": "application/pdf",
},
statusCode: 200,
body: pdf.toString("base64"),
isBase64Encoded: true,
};
};
exports.lambdaHandler = async (event) => {
try {
console.log('Handler invoked. Event: ', event);
const response = await printPdf();
console.log('Response generated. response: ', response);
return response;
} catch (error) {
console.log('Error generating PDF', error);
}
}
exports.lambdaHandler();
This is my Dockerfile:
FROM amazon/aws-lambda-nodejs:12
COPY google-chrome.repo /etc/yum.repos.d/
RUN yum -y update google-chrome-stable
RUN yum -y install google-chrome-stable
COPY . ./
RUN npm ci --quiet
CMD [ "app.lambdaHandler" ]
I build the docker with this command:
docker build -t my-function .
Then I run it like this:
docker run -p 9000:8080 my-function:latest
Now to test it locally I run this command:
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
And it works fine. But when I deploy it to AWS Lambda and invoke a test from there I get this error:
START RequestId: 3de1cb55-eee5-4c82-a3c4-4dabf7eb9570 Version: $LATEST
2021-03-18T22:06:30.748Z 3de1cb55-eee5-4c82-a3c4-4dabf7eb9570 INFO Handler invoked. Event: {}
2021-03-18T22:06:30.750Z 3de1cb55-eee5-4c82-a3c4-4dabf7eb9570 INFO Creating browser...
2021-03-18T22:06:36.269Z 3de1cb55-eee5-4c82-a3c4-4dabf7eb9570 INFO Error generating PDF Error: Protocol error (Target.setDiscoverTargets): Target closed.
at /var/task/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:71:63
at new Promise (<anonymous>)
at Connection.send (/var/task/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:70:16)
at Function.create (/var/task/node_modules/puppeteer/lib/cjs/puppeteer/common/Browser.js:116:26)
at ChromeLauncher.launch (/var/task/node_modules/puppeteer/lib/cjs/puppeteer/node/Launcher.js:101:56)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async printPdf (/var/task/app.js:27:12)
at async Runtime.exports.lambdaHandler [as handler] (/var/task/app.js:75:20)
END RequestId: 3de1cb55-eee5-4c82-a3c4-4dabf7eb9570
REPORT RequestId: 3de1cb55-eee5-4c82-a3c4-4dabf7eb9570 Duration: 5540.02 ms Billed Duration: 6366 ms Memory Size: 512 MB Max Memory Used: 377 MB Init Duration: 825.77 ms
It seems that google-chrome-stable has not been installed successfully in the AWS Lambda environment. Any ideas how to fix it?
I have a Firebase function that creates a PDF file. Lately, it times out with an error that mentions a "Chrome revision". I understand neither the error message nor what is actually wrong. The function works when I run it locally on macOS.
TimeoutError: Timed out after 30000 ms while trying to connect to the browser! Only Chrome at revision r818858 is guaranteed to work.
at Timeout.onTimeout (/workspace/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:204:20)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
The function:
const puppeteer = require('puppeteer');
const createPDF = async (html, outputPath) => {
let pdf;
try {
const browser = await puppeteer.launch({
args: ['--no-sandbox']
});
const page = await browser.newPage();
await page.emulateMediaType('screen');
await page.setContent(html, {
waitUntil: 'networkidle0'
});
pdf = await page.pdf({
// path: outputPath,
format: 'A4',
printBackground: true,
margin: {
top: "50px",
bottom: "50px"
}
});
await browser.close();
} catch (e) {
console.error(e);
}
return pdf;
};
TimeoutError: Timed out after 30000 ms while trying to connect to the browser!
This error relates to the following point from the documentation:
When you install Puppeteer, it downloads a recent version of Chromium
Every time you run Puppeteer, it starts a Chromium instance in the background and tries to connect to it; this error is raised when it cannot connect to the browser.
After multiple tests, I was able to run the Cloud Function by adding the headless parameter to the launch options. The documentation says it should be true by default, so I don't quite understand why setting it explicitly allows the Cloud Function to finish correctly.
At first I also set timeout to 0 to disable the launch timeout. That turned out not to be required, since adding headless alone was enough, but you can add it if you still run into timeouts.
In the end, my code looks like this:
const createPDF = async (html, outputPath) => {
let pdf;
try {
const browser = await puppeteer.launch({
args: ['--no-sandbox'],
headless: true,
timeout: 0
});
const page = await browser.newPage();
await page.emulateMediaType('screen');
await page.setContent(html, {
waitUntil: 'networkidle0'
});
pdf = await page.pdf({
// path: outputPath,
format: 'A4',
printBackground: true,
margin: {
top: "50px",
bottom: "50px"
}
});
await browser.close();
console.log("Download finished"); //Added this to debug that it finishes correctly
} catch (e) {
console.error(e);
}
return pdf;
};
I am using Puppeteer on Google App Engine with Node.js.
Whenever I run Puppeteer on App Engine, I get an error saying:
Navigation failed because browser has disconnected!
This works fine in my local environment, so I am guessing it is a problem with App Engine.
const browser = await puppeteer.launch({
ignoreHTTPSErrors: true,
headless: true,
args: ["--disable-setuid-sandbox", "--no-sandbox"],
});
This is my App Engine app.yaml file:
runtime: nodejs12
env: standard
handlers:
- url: /.*
secure: always
script: auto
-- EDIT --
It works when I add the --disable-dev-shm-usage argument, but then it always times out. Here is my code:
const browser = await puppeteer.launch({
ignoreHTTPSErrors: true,
headless: true,
args: [
"--disable-gpu",
"--disable-dev-shm-usage",
"--no-sandbox",
"--disable-setuid-sandbox",
"--no-first-run",
"--no-zygote",
"--single-process",
],
});
const page = await browser.newPage();
try {
const url = "https://seekingalpha.com/market-news/1";
const pageOption = {
waitUntil: "networkidle2",
timeout: 20000,
};
await page.goto(url, pageOption);
} catch (e) {
console.log(e);
await page.close();
await browser.close();
return resolve("error at 1");
}
try {
const ulSelector = "#latest-news-list";
await page.waitForSelector(ulSelector, { timeout: 30000 });
} catch (e) {
// ALWAYS TIMEOUTS HERE!
console.log(e);
await page.close();
await browser.close();
return resolve("error at 2");
}
...
It seems the problem was the App Engine instance's memory capacity.
When there is not enough memory for the Puppeteer crawl, App Engine automatically spins up another instance.
However, the newly created instance has a different Puppeteer browser, which results in "Navigation failed because browser has disconnected!".
The solution is simply to upgrade the App Engine instance class so that a single instance can handle the crawling job.
The default instance class is F1, which has 256 MB of memory, so I upgraded to F4, which has 1 GB of memory, and the error no longer appears.
runtime: nodejs12
instance_class: F4
handlers:
- url: /.*
secure: always
script: auto
For me, the error was solved when I stopped using the --use-gl=swiftshader argument.
It is included by default if you pass args: chromium.args from chrome-aws-lambda.
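If you still want the rest of the default arguments, one way to drop just that flag is to filter the array before passing it to launch(). A minimal sketch; withoutUseGl is a hypothetical helper name of my own:

```javascript
// Return a copy of `args` without any --use-gl=... flag,
// leaving all other launch arguments untouched.
function withoutUseGl(args) {
  return args.filter((arg) => !arg.startsWith('--use-gl'));
}

// Usage with chrome-aws-lambda (inside an async function):
//   const browser = await chromium.puppeteer.launch({
//     args: withoutUseGl(chromium.args),
//     executablePath: await chromium.executablePath,
//     headless: true
//   });
```

Filtering returns a new array, so chromium.args itself is not modified.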
I was getting that error in a deployment. The solution was to change the parameters passed to waitForNavigation:
{ waitUntil: "domcontentloaded", timeout: 60000 }
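In context, those options would typically be used like this. It is a sketch assuming the navigation is triggered by a click; clickAndWait and the selector are placeholders of my own, not from the original code:

```javascript
// Click an element and wait for the navigation it triggers.
// The wait is registered together with the click so that a fast
// navigation cannot finish before we start listening for it.
async function clickAndWait(page, selector) {
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'domcontentloaded', timeout: 60000 }),
    page.click(selector)
  ]);
  return response;
}

// Usage (inside an async function):
//   await clickAndWait(page, '#some-link');
```

Waiting for 'domcontentloaded' instead of the default 'load' means slow images or third-party scripts no longer hold up the navigation.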