When I run this function locally in the Firebase Emulator it works well, but after I deploy it to the cloud I get many of these errors:
TimeoutError: Navigation timeout of 30000 ms exceeded
The code is very simple:
const functions = require("firebase-functions");
const puppeteer = require("puppeteer");
exports.myFunc = functions
.runWith({ memory: '2GB' })
.https.onRequest(async (request, response) => {
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const page = await browser.newPage();
await page.goto("https://example.com", { waitUntil: 'networkidle2' });
const pageContent = await page.content();
await browser.close();
response.send(pageContent);
});
I get a lot of this in the Firebase Functions log:
You can see this in the Google Cloud report too:
Why are so many runs timing out?
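One thing worth ruling out in the code above: if page.goto throws, browser.close() is never reached, so a warm function instance may keep an orphaned Chromium process around and later runs get less of the 2GB to work with. This is an assumption, not a confirmed cause of the timeouts. The sketch below always closes the browser and turns a navigation timeout into a 504 response. scrapePage is a hypothetical helper, and launch is injected so the logic can run without Puppeteer; in the function you would pass () => puppeteer.launch({...}).

```javascript
// Sketch: always close the browser, even when navigation fails.
// `launch` is injected (hypothetical parameter) so this runs without Puppeteer;
// in the Cloud Function you would pass () => puppeteer.launch({ ... }).
async function scrapePage(launch, url, navTimeoutMs = 30000) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // Explicit timeout so it is obvious which limit is being hit.
    await page.goto(url, { waitUntil: "networkidle2", timeout: navTimeoutMs });
    return { status: 200, body: await page.content() };
  } catch (err) {
    // Puppeteer's navigation timeout error has the name "TimeoutError".
    if (err && err.name === "TimeoutError") {
      return { status: 504, body: `Navigation timed out after ${navTimeoutMs} ms` };
    }
    throw err;
  } finally {
    // Runs on success, on timeout, and on any other error.
    await browser.close();
  }
}
```

In the handler this could be used as const { status, body } = await scrapePage(() => puppeteer.launch({ headless: true, args: ['--no-sandbox', '--disable-setuid-sandbox'] }), "https://example.com"); followed by response.status(status).send(body);.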
I have been experiencing this issue for a long time now. I have a web scraper on a Windows VM that runs every few hours. It works most of the time, but quite often Puppeteer opens this page 👇 instead of the site I want.
Why does that happen, and how can I fix it?
Here is a minimal reproduction:
import puppeteer from 'puppeteer'
import { scheduleJob } from 'node-schedule';
async function run() {
const browser = await puppeteer.launch({
headless: false,
executablePath: chromePath,
defaultViewport: null,
timeout: 0,
args: ['--no-sandbox', '--start-maximized'],
});
const page = await browser.newPage();
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, 'webdriver', {
get: () => false,
});
});
await page.goto('https://aliexpress.com', {
waitUntil: 'networkidle0',
timeout: 0,
});
}
run();
scheduleJob('scrape aliexpress', `0 */${hours} * * *`, run);
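Two details in the reproduction may be worth checking: run() never closes the browser, so every scheduled run leaves another Chrome window open, and timeout: 0 lets a stuck navigation wait forever while the next run starts on top of it. Whether this explains the wrong page is an assumption. The sketch below skips a run while the previous one is still in progress and always closes the browser. makeScheduledRun is a hypothetical helper; launch and job are injected so it runs without Puppeteer.

```javascript
// Sketch: wrap a scrape job so runs never overlap and the browser is always closed.
// `launch` and `job` are injected (hypothetical parameters); in the real script
// `launch` would call puppeteer.launch({...}) and `job` would call page.goto(...).
function makeScheduledRun(launch, job) {
  let running = false;
  return async function run() {
    if (running) {
      // A previous run (e.g. a navigation with timeout: 0) is still in progress.
      return "skipped";
    }
    running = true;
    let browser;
    try {
      browser = await launch();
      const page = await browser.newPage();
      await job(page);
      return "done";
    } finally {
      // Close the browser even if launch, newPage, or the job failed.
      if (browser) await browser.close();
      running = false;
    }
  };
}
```

With node-schedule this could be wired up as scheduleJob('scrape aliexpress', rule, makeScheduledRun(() => puppeteer.launch({...}), (page) => page.goto('https://aliexpress.com', { waitUntil: 'networkidle0', timeout: 60000 }))); the 60-second timeout is an arbitrary finite value chosen for illustration.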
I have a small JavaScript setup that converts HTML to PDF using the Puppeteer library.
When I host the service by opening a command prompt and running node index.js, everything works fine: the Express API listens on the predefined port, and when I call the service I get the converted PDF back.
However, when I install the script as a Windows service with the node-windows library and then call the service, I get the following error:
Failed to launch the browser process!
I'm not sure where to look for the root cause. Could this be a permission issue?
Here is my Puppeteer code:
const ValidationError = require('./../errors/ValidationError.js')
const puppeteer = require('puppeteer-core');
module.exports = class PdfService{
static async htmlToPdf(html){
if(!html){
throw new ValidationError("no html");
}
const browser = await puppeteer.launch({
headless: true,
executablePath: process.env.EDGE_PATH,
args: ["--no-sandbox"]
});
const page = await browser.newPage();
await page.setContent(html, {
waitUntil: "networkidle2"
});
const pdf = await page.pdf({format: 'A4',printBackground: true});
await browser.close();
return pdf;
}
}
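Since the same code works from a console but not as a service, one thing to check is whether the service sees the same environment: a Windows service usually runs under a different account than your console session, so process.env.EDGE_PATH may be empty or point somewhere that account cannot reach. A minimal start-up check could log the reason before puppeteer.launch is called. checkBrowserPath is a hypothetical helper; note that on Windows fs.constants.X_OK only checks that the file exists.

```javascript
const fs = require("fs");

// Sketch of a start-up check (hypothetical helper) to log before puppeteer.launch.
// Returns null when the path looks usable, otherwise a human-readable reason.
function checkBrowserPath(executablePath) {
  if (!executablePath) {
    // e.g. EDGE_PATH was set for your user session but not for the service account.
    return "executablePath is empty; is EDGE_PATH set for the service account?";
  }
  if (!fs.existsSync(executablePath)) {
    return `No file at ${executablePath}`;
  }
  try {
    // On Windows, X_OK behaves like F_OK, so this mainly helps on POSIX systems.
    fs.accessSync(executablePath, fs.constants.X_OK);
  } catch (err) {
    return `Cannot execute ${executablePath}: ${err.code}`;
  }
  return null;
}
```

For example: const problem = checkBrowserPath(process.env.EDGE_PATH); if (problem) console.error(problem);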
I have a Node.js app with Express deployed on Heroku. The problem only affects dynamic webpages; static webpages load fine.
Loading dynamic webpages works on localhost, but on Heroku it fails with code=H12, desc="Request timeout", service=30000ms, status=503.
In addition, right after running heroku restart or deploying, there always seems to be exactly one request with status=200 that returns only the static portion of a dynamic webpage.
Screenshot of logs here.
I've tried the following, which have all led to either the same or other unexpected results when deployed on Heroku (such as Error R14 (Memory quota exceeded) and code=H13 desc="Connection closed without response"):
Switching the Puppeteer Heroku buildpack I was using. I've tried the ones mentioned in this troubleshooting guide and this comment.
Adding headless: true in Puppeteer's launch arguments.
Adding the --no-sandbox, --disable-setuid-sandbox, --single-process, and --no-zygote flags in args of Puppeteer's launch arguments. (Reference: this comment & this comment)
Setting the waitUntil argument in Puppeteer's goto function to domcontentloaded, networkidle0 and networkidle2. (Reference: this comment)
Passing a timeout argument in Puppeteer goto function; I've tried 30000 and 60000 specifically, as well as 0 per this comment.
Using the waitForSelector function.
Clearing Heroku's build cache, as per this article.
Printing the url variable (see my code below) in the console. Output is as expected.
I've observed that:
With the code I have right now (see below), the try-catch-finally block never catches any error. It's always one of the following: I get an incomplete result (the static portion of the requested dynamic webpage), or the app crashes (code=H13 desc="Connection closed without response"). So I haven't been able to learn anything by printing the exception to the console from within the catch block.
Any ideas on how I could get this to work?
const express = require("express");
const app = express();
const puppeteer = require("puppeteer");
let port = process.env.PORT || 3000;
let browser;
...
app.listen(port, async() => {
browser = await puppeteer
.launch({
timeout: 0,
headless: true,
args: [
"--no-sandbox",
"--disable-setuid-sandbox",
"--single-process",
"--no-zygote",
],
});
});
...
app.get("/appropriate-route-name", async (req, res) => {
let url = req.query.url;
let page = await browser.newPage();
try {
await page.goto(url, {
waitUntil: "networkidle2",
});
res.send({ data: await page.content() });
} catch (exception) {
res.send({ data: null });
} finally {
await browser.close();
}
});
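A detail in the code above that may matter: the finally block calls browser.close() on the single browser launched in app.listen, so after the first request every later request calls newPage() on a closed browser. Because newPage() sits outside the try, that failure is never caught and the request can hang until Heroku's 30 s router timeout. This would be consistent with the one successful response after each restart, though that is an inference. A sketch that keeps the shared browser and closes only the page per request is below; handleRequest is hypothetical, the browser is injected so the logic runs without Puppeteer, and the 25 s navigation timeout is an arbitrary value below Heroku's limit.

```javascript
// Sketch: reuse one long-lived browser, but open and close a page per request.
// `browser` is injected (hypothetical parameter); in the app it is the instance
// launched once in app.listen.
async function handleRequest(browser, url) {
  let page;
  try {
    // newPage() is inside the try so a failure here still produces a response.
    page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2", timeout: 25000 });
    return { data: await page.content() };
  } catch (exception) {
    // The error field is an addition for debugging; the original returns only data: null.
    return { data: null, error: String(exception && exception.message) };
  } finally {
    // Close only this request's page; the shared browser stays open.
    if (page) await page.close();
  }
}
```

The route would then reduce to app.get("/appropriate-route-name", async (req, res) => res.send(await handleRequest(browser, req.query.url)));.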
I was able to get it working by setting a user agent with the user-agents package. Dynamic pages now load fine on Heroku, and requests no longer time out every time.
const express = require("express");
const app = express();
const puppeteer = require("puppeteer");
let port = process.env.PORT || 3000;
const UserAgent = require("user-agents");
...
app.get("/route-name", async (req, res) => {
let url = req.query.url;
let browser = await puppeteer.launch({
args: ["--no-sandbox"],
});
let page = await browser.newPage();
try {
await page.setUserAgent(new UserAgent().toString()); // added this: a random, realistic user agent string
await page.goto(url, {
timeout: 30000,
waitUntil: "networkidle2", // or "networkidle0", depending on what you need
});
res.send({ data: await page.content() });
} catch (e) {
res.send({ data: null });
} finally {
await browser.close();
}
});
When I try to run node app.js, I get this error:
the message is Failed to launch the browser process! spawn /Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app EACCES
What I did
I checked /Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app: it is not zipped, and the app can be run.
Note:
If I launch without the path, it works, but
I would like to use either Chrome or Chromium to open a new page.
const browser = await puppeteer.launch({headless: false});
const express = require('express');
const puppeteer = require('puppeteer');
const app = express();
(async () => {
const browser = await puppeteer.launch({headless:false, executablePath:'/Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app'});
const page = await browser.newPage();
await page.goto('https://google.com', {waitUntil: 'networkidle2'});
})().catch((error) =>{
console.error("the message is " + error.message);
});
app.listen(3000, function (){
console.log('server started');
})
If you open the chrome://version/ page in that exact browser, it shows the Executable Path, which is the exact string you need for Puppeteer's executablePath launch option.
On macOS, Chrome's path usually looks like this:
/Applications/Google Chrome.app/Contents/MacOS/Google Chrome
Or something like this if Chromium is located in your node_modules folder:
/Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app/Contents/MacOS/Chromium
If you compare this with the string you used for executablePath, you will see that /Contents/MacOS/Chromium is missing from the end. Chromium.app is a bundle directory rather than the binary itself, which is why spawning it fails with EACCES; append /Contents/MacOS/Chromium to your current path to make it work.
Note: the Chromium bundled with Puppeteer is the version guaranteed to work with your installed Puppeteer version; if you use another Chrome or Chromium-based browser, you may run into unexpected issues.
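The path fix described above can be sketched as a small helper. bundleToExecutable is hypothetical, and it assumes the binary inside Contents/MacOS has the same name as the bundle, which holds for Google Chrome.app and Chromium.app but is not guaranteed for every app.

```javascript
const path = require("path");

// Sketch (hypothetical helper): turn a macOS .app bundle path into the binary inside it.
// Assumes the binary is named like the bundle (true for Chrome and Chromium).
function bundleToExecutable(appPath) {
  const trimmed = appPath.replace(/\/+$/, ""); // drop trailing slashes
  if (!trimmed.endsWith(".app")) {
    return appPath; // already points at a binary (or is not a bundle)
  }
  const name = path.posix.basename(trimmed, ".app");
  return path.posix.join(trimmed, "Contents", "MacOS", name);
}
```

For example, bundleToExecutable("/Applications/Google Chrome.app") gives the same path shown above for Chrome.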
Following up on @theDavidBarton's answer:
The Chromium shipped with Puppeteer did not work for me, but the Chrome installation on my MacBook did.
OS: OS-X 10.15.7 (Catalina)
Node version: v14.5.0
Failed code:
const browser = await puppeteer.launch({
headless: true,
executablePath: "/users/bert/Project/NodeJS/PuppeteerTest/node_modules/puppeteer/.local-chromium/mac-818858/chrome-mac/Chromium.app/Contents/MacOS/Chromium"
});
Successful code:
const browser = await puppeteer.launch({
headless: true,
executablePath: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
});
Full code (just the first example from the Puppeteer website):
const puppeteer = require('puppeteer');
(async () => {
try {
const browser = await puppeteer.launch({headless: true, executablePath: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"});
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({path: 'example.png'});
await browser.close();
} catch (err) {
console.log(err);
}
})();
And, yes, I got the Screenshot !! :-)
Using locate-chrome: https://www.npmjs.com/package/locate-chrome
const puppeteer = require('puppeteer');
const locateChrome = require('locate-chrome');
(async () => {
const executablePath = await new Promise(resolve => locateChrome(arg => resolve(arg)));
const browser = await puppeteer.launch({ executablePath });
})();
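If you would rather not add a dependency, a minimal alternative is to try a list of known locations and fall back to Puppeteer's bundled browser when none exists. firstExistingPath is hypothetical, and the candidate paths and the CHROME_PATH environment variable are assumptions to adjust for your machines.

```javascript
const fs = require("fs");

// Sketch (hypothetical helper): return the first candidate that exists on disk, or undefined.
function firstExistingPath(candidates) {
  // Skip empty entries such as an unset environment variable.
  return candidates.find((p) => Boolean(p) && fs.existsSync(p));
}

// Assumed locations; adjust for your machines.
const CANDIDATE_PATHS = [
  process.env.CHROME_PATH, // hypothetical override variable
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome", // macOS
  "/usr/bin/google-chrome", // common on Linux
  "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe", // Windows
];
```

Usage: const executablePath = firstExistingPath(CANDIDATE_PATHS); const browser = await puppeteer.launch(executablePath ? { executablePath } : {}); With the puppeteer package (not puppeteer-core), omitting executablePath launches the browser it downloaded at install time.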
I looked through the docs but couldn't find a way to set a maximum timeout for a test case. It seems like a simple feature.
import puppeteer from 'puppeteer'
test('App loads', async() => {
const browser = await puppeteer.launch({ headless: false, slowMo: 250 });
const page = await browser.newPage();
await page.goto('http://localhost:3000');
await browser.close();
});
Jest's test(name, fn, timeout) function takes an optional third parameter that specifies a custom timeout in milliseconds.
test('example', async () => {
...
}, 1000); // timeout of 1s (default is 5s)
Source: https://github.com/facebook/jest/issues/5055#issuecomment-350827560
You can also set the default timeout for every test in a file with jest.setTimeout(10000), for example in the beforeAll() function:
beforeAll(async () => {
jest.setTimeout(10000); // change timeout to 10 seconds
});
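According to the Jest docs, jest.setTimeout sets the default timeout for tests and hooks in the test file it is called from, and it is typically called at the top level of that file. To change the default for all test files at once, Jest also offers a testTimeout configuration option. A minimal jest.config.js, assuming a Jest version that supports it:

```javascript
// jest.config.js: default timeout in milliseconds for every test in every test file.
module.exports = {
  testTimeout: 10000,
};
```

A per-test third argument to test() still overrides this default.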