Install and run an X server to make Puppeteer work - node.js

On my Linux server I am running a Node.js project that should crawl a single-page app using the puppeteer npm module.
Here is an example of the code I use:
const puppeteer = require('puppeteer');
(async () => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://bvopen.abrickis.me/#/standings');
    await page.waitForSelector('.category', { timeout: 1000 });
    const body = await page.evaluate(() => {
      return document.querySelector('body').innerHTML;
    });
    console.log(body);
    await browser.close();
  } catch (error) {
    console.log(error);
  }
})();
But I get the following error:
0|www | Error: Failed to launch the browser process!
0|www | [5642:5642:0511/154701.856738:ERROR:browser_main_loop.cc(1485)] Unable to open X display.
0|www | [0511/154701.863486:ERROR:nacl_helper_linux.cc(308)] NaCl helper process running without a sandbox!
0|www | Most likely you need to configure your SUID sandbox correctly
0|www | TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/master/docs/troubleshooting.md
I searched a lot for how to install an X server and tried many things, such as sudo apt-get install xorg openbox, but it doesn't help.

It looks like Puppeteer is trying to start the browser in non-headless mode, and it fails because no X server (Xorg) is available. That is not what you want on a server anyway, so there is no need to install Xorg or anything else.
Try launching the Puppeteer browser with the following options:
await puppeteer.launch({
  headless: true,
  args: [
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-sandbox",
    "--disable-setuid-sandbox"
  ]
});
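As a minimal sketch of the suggestion above, the launch options can be kept in one helper so that the crawl from the question always starts headless. The names `headlessLaunchOptions` and `crawl` are made up for illustration, and the 10-second selector timeout is an assumption (the question used 1000 ms). Note that `--no-sandbox` switches off Chromium's sandbox, so it is only reasonable for pages you trust.

```javascript
// Hypothetical helper: the flags suggested above, collected in one place.
function headlessLaunchOptions(extraArgs = []) {
  return {
    headless: true, // no X server is needed in headless mode
    args: [
      '--disable-gpu',
      '--disable-dev-shm-usage',
      '--no-sandbox',
      '--disable-setuid-sandbox',
      ...extraArgs,
    ],
  };
}

// Hypothetical wrapper around the crawl from the question.
// puppeteer is required lazily, so the helper above works without it.
async function crawl(url, selector) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(headlessLaunchOptions());
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Assumption: 10 s is enough for the single-page app to render
    await page.waitForSelector(selector, { timeout: 10000 });
    return await page.evaluate(() => document.body.innerHTML);
  } finally {
    // Close the browser even if navigation or the selector wait fails
    await browser.close();
  }
}
```

With this in place, `crawl('https://bvopen.abrickis.me/#/standings', '.category')` would return the page body as a string.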

Related

Puppeteer Fails to launch the browser

After creating a directory with an index.js file with the following code:
const puppeteer = require('puppeteer');
async function main() {
const browser = await puppeteer.launch({
headless: false,
args: ['--no-sandbox']
});
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({
path: 'example.png'
});
await browser.close();
}
// Start the script
main();
and then running npm init, and npm install puppeteer, the following error is returned:
node index.js
/mnt/c/Users/trgre/OneDrive/Desktop/puppeteer test/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:214
reject(new Errors_js_1.TimeoutError(`Timed out after ${timeout} ms while trying to connect to the browser! Only Chrome at revision r${preferredRevision} is guaranteed to work.`));
^
TimeoutError: Timed out after 30000 ms while trying to connect to the browser! Only Chrome at revision r901912 is guaranteed to work.
at Timeout.onTimeout (/mnt/c/Users/trgre/OneDrive/Desktop/puppeteer test/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:214:20)
at listOnTimeout (node:internal/timers:557:17)
at processTimers (node:internal/timers:500:7)
Node.js v17.1.0
Any ideas on how to get a Puppeteer program to run? I am on Windows, using Ubuntu 20.
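Since the script asks for `headless: false`, one thing that may be worth ruling out is that no display is available inside the Ubuntu environment. The sketch below falls back to headless mode when the `DISPLAY` environment variable is empty. The helper names `resolveHeadless` and `launchOptions` are hypothetical, and treating an empty `DISPLAY` as "no X server" is an assumption that ignores Wayland setups.

```javascript
// Hypothetical helper: decide the `headless` option from the environment.
// Assumption: an empty or missing DISPLAY means no X server is reachable.
function resolveHeadless(wantHeaded, env = process.env) {
  const hasDisplay = typeof env.DISPLAY === 'string' && env.DISPLAY.length > 0;
  // Run headed only when it was requested and a display exists
  return !(wantHeaded && hasDisplay);
}

// Hypothetical helper: launch options matching the script above.
function launchOptions(wantHeaded, env = process.env) {
  return {
    headless: resolveHeadless(wantHeaded, env),
    args: ['--no-sandbox'],
  };
}
```

Calling `puppeteer.launch(launchOptions(true))` would then open a browser window only when a display is present, and run headless otherwise.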

Puppeteer newPage() freezes/ never resolves nor rejects Ubuntu Server

I am using Ubuntu Server 18.04.5 LTS and Puppeteer 10.0.0. My problem is that the browser.newPage() function never resolves. So basically the console always logs Start but never 1 or 2. I have tried a different Puppeteer version, and puppeteer-core with my own Chromium version. I even installed a VM on my PC, and it works there but not on my server.
var puppeteer = require('puppeteer')
var adresse = "https://www.google.de/"
async function test() {
try {
const browser = await puppeteer.launch({
"headless": true,
"args": [
'--disable-setuid-sandbox',
'--no-sandbox',
'--disable-gpu',
]
})
console.log("Start")
const page = await browser.newPage()
console.log("1")
await page.goto(adresse)
console.log("2")
console.log(page)
} catch (error) {
console.log(error)
}
}
test()
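A call that never settles is hard to debug because nothing reaches the catch block. One way to at least turn the hang into an error is a small timeout wrapper. This is a generic sketch, not part of Puppeteer, and the name `withTimeout` is made up for illustration.

```javascript
// Hypothetical debugging helper: reject if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In the code above this could be used as `const page = await withTimeout(browser.newPage(), 15000, 'newPage')`, where the 15-second limit is an arbitrary choice. Launching with Puppeteer's `dumpio: true` option, which pipes the browser's own output to the Node.js process, may also show what Chromium is complaining about.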

Firebase puppeteer PDF function times out due to Chrome revision

I have a Firebase function that creates a PDF file. Lately, it times out due to a "Chrome revision". I understand neither the error message nor what is wrong. The function works when I run it locally under macOS.
TimeoutError: Timed out after 30000 ms while trying to connect to the browser! Only Chrome at revision r818858 is guaranteed to work.
at Timeout.onTimeout (/workspace/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:204:20)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
The function:
const puppeteer = require('puppeteer');
const createPDF = async (html, outputPath) => {
let pdf;
try {
const browser = await puppeteer.launch({
args: ['--no-sandbox']
});
const page = await browser.newPage();
await page.emulateMediaType('screen');
await page.setContent(html, {
waitUntil: 'networkidle0'
});
pdf = await page.pdf({
// path: outputPath,
format: 'A4',
printBackground: true,
margin: {
top: "50px",
bottom: "50px"
}
});
await browser.close();
} catch (e) {
console.error(e);
}
return pdf;
};
TimeoutError: Timed out after 30000 ms while trying to connect to the browser!
This error comes from the fact that, as the documentation says:
When you install Puppeteer, it downloads a recent version of Chromium
Every time you run Puppeteer, it starts Chromium in the background and tries to connect to it; this error is raised when it cannot connect to the browser.
After several tests, I was able to run the Cloud Function by adding the headless parameter to the launch options. The documentation says it should be true by default, so I don't quite understand why setting it explicitly lets the Cloud Function finish correctly.
At first I set timeout to 0 to disable the timeout error. That does not seem to be required, since adding headless alone was enough, but you can add it if you still run into the timeout.
In the end, my code looks like this:
const createPDF = async (html, outputPath) => {
let pdf;
try {
const browser = await puppeteer.launch({
args: ['--no-sandbox'],
headless: true,
timeout: 0
});
const page = await browser.newPage();
await page.emulateMediaType('screen');
await page.setContent(html, {
waitUntil: 'networkidle0'
});
pdf = await page.pdf({
// path: outputPath,
format: 'A4',
printBackground: true,
margin: {
top: "50px",
bottom: "50px"
}
});
await browser.close();
console.log("Download finished"); //Added this to debug that it finishes correctly
} catch (e) {
console.error(e);
}
return pdf;
};

Puppeteer error: Navigation failed because browser has disconnected

I am using Puppeteer on Google App Engine with Node.js.
Whenever I run Puppeteer on App Engine, I get an error saying:
Navigation failed because browser has disconnected!
This works fine in my local environment, so I am guessing it is a problem with App Engine.
const browser = await puppeteer.launch({
ignoreHTTPSErrors: true,
headless: true,
args: ["--disable-setuid-sandbox", "--no-sandbox"],
});
This is my App Engine app.yaml file:
runtime: nodejs12
env: standard
handlers:
- url: /.*
secure: always
script: auto
-- EDIT --
It works when I add the --disable-dev-shm-usage argument, but then it always times out. Here is my code:
const browser = await puppeteer.launch({
ignoreHTTPSErrors: true,
headless: true,
args: [
"--disable-gpu",
"--disable-dev-shm-usage",
"--no-sandbox",
"--disable-setuid-sandbox",
"--no-first-run",
"--no-zygote",
"--single-process",
],
});
const page = await browser.newPage();
try {
const url = "https://seekingalpha.com/market-news/1";
const pageOption = {
waitUntil: "networkidle2",
timeout: 20000,
};
await page.goto(url, pageOption);
} catch (e) {
console.log(e);
await page.close();
await browser.close();
return resolve("error at 1");
}
try {
const ulSelector = "#latest-news-list";
await page.waitForSelector(ulSelector, { timeout: 30000 });
} catch (e) {
// ALWAYS TIMEOUTS HERE!
console.log(e);
await page.close();
await browser.close();
return resolve("error at 2");
}
...
It seems the problem was App Engine's memory capacity.
When there is not enough memory for the Puppeteer crawl, App Engine automatically starts another instance.
However, the newly created instance has a different Puppeteer browser, which results in "Navigation failed because browser has disconnected".
The solution is simply to upgrade the App Engine instance class so that a single instance can handle the crawling job.
The default instance class is F1, which has 256 MB of memory, so I upgraded to F4, which has 1 GB of memory. After that, the error message no longer appears.
runtime: nodejs12
instance_class: F4
handlers:
- url: /.*
secure: always
script: auto
For me, the error went away when I stopped using the --use-gl=swiftshader argument.
It is included by default if you use args: chromium.args from chrome-aws-lambda.
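If you still want the rest of the default arguments, one option is to filter the flag out rather than dropping the whole list. This is only a sketch: `withoutFlag` is a hypothetical helper, and `exampleArgs` is a stand-in for `chromium.args`, whose real contents come from chrome-aws-lambda.

```javascript
// Hypothetical helper: drop every argument that sets the given flag.
function withoutFlag(args, flag) {
  return args.filter((arg) => arg !== flag && !arg.startsWith(flag + '='));
}

// Stand-in for `chromium.args`; the real list comes from chrome-aws-lambda.
const exampleArgs = ['--no-sandbox', '--use-gl=swiftshader', '--single-process'];
console.log(withoutFlag(exampleArgs, '--use-gl'));
```

The filtered list would then be passed as `args` in the launch options in place of `chromium.args`.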
I had this error in a deployment. The solution was to change the waitForNavigation options:
{ waitUntil: "domcontentloaded", timeout: 60000 }
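For context, here is a sketch of how those options might be passed. The helper name `clickAndWait` and the idea that a click triggers the navigation are assumptions for illustration; only the options object comes from the answer.

```javascript
// Options quoted in the answer above
const NAV_OPTIONS = { waitUntil: 'domcontentloaded', timeout: 60000 };

// Hypothetical helper: click an element and wait for the navigation it triggers.
// `page` is a Puppeteer Page; the selector is supplied by the caller.
async function clickAndWait(page, selector) {
  // Start waiting before the click so the navigation is not missed
  const [response] = await Promise.all([
    page.waitForNavigation(NAV_OPTIONS),
    page.click(selector),
  ]);
  return response;
}
```

Waiting for `domcontentloaded` instead of a network-idle condition returns earlier, which can help on pages that keep connections open.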

Running Puppeteer with Xvfb and headless: false

I am running Puppeteer on a headless Ubuntu 16.04 AWS EC2 instance and would like to run it with a virtual display through Xvfb. Whenever I try to run it, I get this error:
/home/ubuntu/node_modules/xvfb/index.js:84
throw new Error('Could not start Xvfb.');
Error: Could not start Xvfb.
at Xvfb.startSync (/home/ubuntu/node_modules/xvfb/index.js:84:17)
at Object.<anonymous> (/home/ubuntu/puppeteer-works.js:39:6)
at Module._compile (internal/modules/cjs/loader.js:689:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
at Module.load (internal/modules/cjs/loader.js:599:32)
at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
at Function.Module._load (internal/modules/cjs/loader.js:530:3)
at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
at startup (internal/bootstrap/node.js:266:19)
at bootstrapNodeJSCore (internal/bootstrap/node.js:596:3)
My code is below:
const puppeteer = require('puppeteer');
const fs = require("fs");
const Xvfb = require('xvfb');
var xvfb = new Xvfb();
var text = fs.readFileSync("proxy.txt").toString('utf-8');
const textByLine = text.split(" ");
const preparePageForTests = async (page) => {
// The user agent is one string; the pieces are joined with single spaces
const userAgent = 'Mozilla/5.0 (X11; Linux x86_64) ' +
'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.39 Safari/537.36';
await page.setUserAgent(userAgent);
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, 'webdriver', {
get: () => false,
});
});
await page.evaluateOnNewDocument(() => {
window.chrome = {
runtime: {},
};
});
await page.evaluateOnNewDocument(() => {
const originalQuery = window.navigator.permissions.query;
return window.navigator.permissions.query = (parameters) => (
parameters.name === 'notifications' ?
Promise.resolve({ state: Notification.permission }) :
originalQuery(parameters)
);
});
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5],
});
});
}
xvfb.startSync();
(async () => {
  try {
    const browser = await puppeteer.launch({
      args: ['--no-sandbox', '--proxy-server=' + textByLine[0]],
      headless: true,
    });
    const page = await browser.newPage();
    // authenticate() returns a promise, so wait for it
    await page.authenticate({
      username: textByLine[1],
      password: textByLine[2]
    });
    await preparePageForTests(page);
    const testUrl = "https://publicindex.sccourts.org/abbeville/publicindex/";
    await page.goto(testUrl);
    const html = await page.content();
    await page.screenshot({path: 'result.png'});
    await browser.close();
    console.log(html);
  } finally {
    // Stop Xvfb only after the browser work has finished,
    // not immediately after the async function is started
    xvfb.stopSync();
  }
})();
I appreciate any help. I am pretty new to Node.js, so I apologize in advance for any formatting errors.
You seem to be trying to use the xvfb Node module. While the other answers definitely work, here's a snippet that works entirely within Node.js:
const puppeteer = require('puppeteer')
const Xvfb = require('xvfb');
(async () => {
var xvfb = new Xvfb({
silent: true,
xvfb_args: ["-screen", "0", '1280x720x24', "-ac"],
});
xvfb.start((err)=>{if (err) console.error(err)})
const browser = await puppeteer.launch({
headless: false,
defaultViewport: null, //otherwise it defaults to 800x600
args: ['--no-sandbox', '--start-fullscreen', '--display='+xvfb._display]
});
const page = await browser.newPage();
await page.goto(`https://wikipedia.org`,{waitUntil: 'networkidle2'});
await page.screenshot({path: 'result.png'});
await browser.close()
xvfb.stop();
})()
This isn't perfect in terms of handling errors (and possible race conditions) in xvfb.start(), but it should get you started, and it works pretty consistently for me.
Edit: Remember to install Xvfb first: sudo apt-get install xvfb (Thanks, @iamfrank)
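The race condition mentioned above comes from launching the browser before the callback passed to xvfb.start() has fired. A possible way around it is to wrap the callback in a Promise and await it. This is a sketch under that assumption; the helper names `startXvfb` and `withXvfb` are made up for illustration.

```javascript
// Hypothetical helper: wrap the callback-style xvfb.start() in a Promise
// so the browser is launched only after the display is ready.
function startXvfb(xvfb) {
  return new Promise((resolve, reject) => {
    xvfb.start((err) => (err ? reject(err) : resolve(xvfb)));
  });
}

// Hypothetical helper: run `task` with a started display and always stop it.
async function withXvfb(xvfb, task) {
  await startXvfb(xvfb);
  try {
    return await task(xvfb);
  } finally {
    // Stop the display even if the task throws
    xvfb.stop();
  }
}
```

With the snippet above, the puppeteer.launch() call and everything after it would move into the `task` callback, keeping the same `--display` argument.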
It seems that Xvfb is not installed properly.
You should install the xvfb (X virtual framebuffer) package for Ubuntu:
$ sudo apt-get update
$ sudo apt-get install xvfb
Run Xvfb in the background and specify a display number (10 in my example)
$ Xvfb :10 -ac &
Set the DISPLAY variable to the number you chose
$ export DISPLAY=:10
Alternatively, if xvfb and the other required packages are properly installed, xvfb-run can start the virtual display and run the command in one step:
xvfb-run -a --server-args="-screen 0 1280x800x24 -ac -nolisten tcp -dpi 96 +extension RANDR" command-that-runs-chrome
Cheers!!!
