Puppeteer is failing to launch the browser in local - node.js

I keep getting this error when launching the application. I have reinstalled puppeteer 8-9 times and even installed all the dependencies listed in the Troubleshooting link.
Error: Failed to launch the browser process! spawn /home/......./NodeJs/Scraping/code3/node_modules/puppeteer/.local-chromium/linux-756035/chrome-linux/chrome ENOENT
TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md
This code just takes a screenshot of google.com.
NodeJs Version- 14.0.0
Puppeteer Version- 4.0.1
Ubuntu Version- 20.04
I am using the Chromium that is bundled with puppeteer.
const puppeteer = require("puppeteer");
const chalk = require("chalk");
// MY OCD of colorful console.logs for debugging... IT HELPS
const error = chalk.bold.red;
const success = chalk.keyword("green");
(async () => {
  // declared outside try so the catch block can tell whether launch succeeded
  let browser;
  try {
    // open the browser (headless: false shows the window)
    browser = await puppeteer.launch({ headless: false });
    // open a new page
    const page = await browser.newPage();
    // enter url in page
    await page.goto(`https://www.google.com/`);
    // Google Say Cheese!!
    await page.screenshot({ path: "example.png" });
    await browser.close();
    console.log(success("Browser Closed"));
  } catch (err) {
    // Catch and display errors
    console.log(error(err));
    // browser is still undefined if launch itself failed
    if (browser) {
      await browser.close();
      console.log(error("Browser Closed"));
    }
  }
})();

As you said, puppeteer 2.x.x works perfectly for you but 4.x.x doesn't. This looks like a Linux dependency issue, which has become more common since puppeteer 3.x.x (usually libgbm1 is the culprit).
If you are not sure where your chrome executable is located, first run:
whereis chrome
(e.g.: /usr/bin/chrome)
Then, to find the missing dependencies, run:
ldd /usr/bin/chrome | grep not
Install the listed dependencies with sudo apt-get install.
After that you should be able to do a clean npm install on your project with the latest puppeteer as well (as of today that is 5.0.0).
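To make the ldd step above easier to repeat, here is a minimal sketch that pulls the missing library names out of ldd output. The helper name parseMissingLibs and the sample output are made up for illustration; only the "=> not found" marker comes from ldd itself.

```javascript
// Hypothetical helper (not part of puppeteer): list libraries that ldd marks as "not found".
function parseMissingLibs(lddOutput) {
  return lddOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /=>\s*not found$/.test(line))
    .map((line) => line.split("=>")[0].trim());
}

// Illustrative ldd output, not taken from the question.
const sampleOutput = [
  "\tlibgbm.so.1 => not found",
  "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a000000)",
  "\tlibxshmfence.so.1 => not found",
].join("\n");

console.log(parseMissingLibs(sampleOutput)); // [ 'libgbm.so.1', 'libxshmfence.so.1' ]
```

Each library name then has to be mapped to an apt package; for example, libgbm.so.1 is provided by libgbm1 on Ubuntu.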

Related

Puppeteer Waiting for target frame Ubuntu digitalocean

I have been building a web scraper in Node.js and running it on a DigitalOcean Ubuntu server. My program only has issues with Puppeteer on Ubuntu.
I originally had an issue running Puppeteer as the root user, so I switched to a new account I created on the server, and now I have this new issue.
Version: HeadlessChrome/105.0.5173.0
Error: Waiting for target frame D0E4A57B880331E15F232D467A28499A failed
at Timeout._onTimeout (/home/pricepal/priceServer-deployment/price-server/node_modules/puppeteer/lib/cjs/puppeteer/common/util.js:447:18)
at listOnTimeout (node:internal/timers:564:17)
at process.processTimers (node:internal/timers:507:7)
Node.js v18.7.0
Here is the block of code that the program stops at and eventually errors out:
try {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto(link)
const content = await page.content()
await browser.close()
return content
} catch (error) {
console.log(error)
}
It takes a little longer than normal to start the headless browser, but the error stems from a timeout at page.goto(link). All of the links fail to load, not just one in particular.
The links I am using work when run on my M1 Mac with the same Chromium and Node versions.
I have been researching and trying new things all day, but I cannot get it fixed and have found few resources relating to this issue.
I had the exact same problem, been pulling my hair out looking for answers the past few days. I know it's not exactly a proper answer (mods sorry if you have to delete this), but I found that switching from Ubuntu to Debian 10 magically fixed everything. FWIW the line causing the error is:
const page = await browser.newPage()
I suspect the issue lies somewhere within the version of Chromium that Puppeteer downloads, and its interaction with the OS. What exactly though I couldn't say. My results are as follows:
Didn't work:
Ubuntu 22.04
Ubuntu 20.04
Debian 11
Worked:
Debian 10

Heroku Could not find expected browser (chrome) locally. Run `npm install` to download the correct Chromium revision (970485)

I am building an API that uses puppeteer. It works locally because I had set an executable path there.
The docs say there is no need to set an executable path, but I get this error:
Could not find expected browser (chrome) locally. Run `npm install` to download the correct Chromium revision (970485).
My code for launch() is:
const browser = await puppeteer.launch({
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
],
});
const page = await browser.newPage();
I am deploying on Railway (a Heroku clone).
If I am right, I have made a mistake with the executable path.
Let me know where I went wrong.
Thank you.
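Since the question hinges on the executable path, here is a sketch of one way to keep local and hosted launch options in one place. CHROME_BIN is a hypothetical environment variable name chosen for this example, and buildLaunchOptions is a made-up helper; executablePath and args are existing puppeteer.launch() options. This does not install a browser on the host, it only avoids hard-coding a local path.

```javascript
// Sketch: build puppeteer.launch() options, adding executablePath only when one is configured.
// CHROME_BIN is a hypothetical variable name, not something puppeteer reads by itself.
function buildLaunchOptions(env = process.env) {
  const options = { args: ["--no-sandbox", "--disable-setuid-sandbox"] };
  const chromePath = env.CHROME_BIN;
  if (chromePath) {
    // Point puppeteer at an already installed browser instead of its bundled download.
    options.executablePath = chromePath;
  }
  return options;
}

// Usage (assumes puppeteer is installed):
//   const browser = await puppeteer.launch(buildLaunchOptions());
```

With no variable set, puppeteer falls back to the Chromium revision it downloads during npm install, which is the browser the error message says it cannot find.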

Headless Chrome (Puppeteer) different behaviour running in local docker and remote docker (AWS EC2)

I am trying to debug an issue which causes headless Chrome using Puppeteer to behave differently on my local environment and on a remote environment such as AWS or Heroku.
The application searches publicly available jobs on LinkedIn without authentication (no need to look at profiles). The url format is something like this: https://www.linkedin.com/jobs/search?keywords=Engineer&location=New+York&redirect=false&position=1&pageNum=0
When I open this url in my local environment I have no problems, but when I try to do the same thing on a remote machine such as AWS EC2 or Heroku Dyno I am redirected to a login form by LinkedIn. To debug this difference I've built a Docker image (based on this image) to have isolation from my local Chrome/profile:
Dockerfile
FROM buildkite/puppeteer
WORKDIR /app
COPY . .
RUN npm install
CMD node index.js
EXPOSE 9222
index.js
const puppeteer = require("puppeteer-extra");
puppeteer.use(require("puppeteer-extra-plugin-stealth")());
const testPuppeteer = async () => {
console.log('Opening browser');
const browser = await puppeteer.launch({
headless: true,
slowMo: 20,
args: [
'--remote-debugging-address=0.0.0.0',
'--remote-debugging-port=9222',
'--single-process',
'--lang=en-GB',
'--disable-dev-shm-usage',
'--no-sandbox',
'--disable-setuid-sandbox',
"--proxy-server='direct://'",
'--proxy-bypass-list=*',
'--disable-gpu',
'--allow-running-insecure-content',
'--enable-automation',
],
});
console.log('Opening page...');
const page = await browser.newPage();
console.log('Page open');
const url = "https://www.linkedin.com/jobs/search?keywords=Engineer&location=New+York&redirect=false&position=1&pageNum=0";
console.log('Opening url', url);
await page.goto(url, {
waitUntil: 'networkidle0',
});
console.log('Url open');
// page && await page.close();
// browser && await browser.close();
console.log("Done! Leaving page open for remote inspection...");
};
(async () => {
await testPuppeteer();
})();
The docker image used for this test can be found here.
I've run the image on my local environment with the following command:
docker run -p 9222:9222 spinlud/puppeteer-linkedin-test
Then from the local Chrome browser chrome://inspect it should be possible to inspect the GUI of the application (I have deliberately left open the page in headless browser):
As you can see even in local docker the page opens without authentication.
I've done the same test on an AWS EC2 (Amazon Linux 2) with Docker installed. It needs to be a public instance with SSH access and an inbound rule to allow traffic through port 9222 (for remote Chrome debugging).
I've run the same command:
docker run -p 9222:9222 spinlud/puppeteer-linkedin-test
Then again from the local Chrome browser's chrome://inspect, once I had added the remote public IP of the EC2 instance, I was able to inspect the GUI of the remote headless Chrome as well:
As you can see this time LinkedIn requires authentication. We can see also a difference in the cookies:
I can't understand the reason for this different behaviour between my local and remote environments. In theory Docker should provide isolation, and in both environments the headless browser should start with no cookies and a fresh (empty) session. Still there is a difference and I can't figure out why.
Does anyone have any clue?
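One way to narrow this down is to log the same navigation summary in both environments. The sketch below is a debugging aid only and does not explain the difference; describeNavigation is a made-up helper name, while status(), url(), request() and redirectChain() are existing Puppeteer response/request methods. Only HTTP redirects of the main request appear in the chain, so a redirect done by page script would not show up.

```javascript
// Hypothetical helper: summarise a navigation so local and remote runs can be compared.
// `response` is the value page.goto() resolves to (it can be null in some cases).
function describeNavigation(response) {
  if (!response) return { status: null, finalUrl: null, redirects: [] };
  return {
    status: response.status(),
    finalUrl: response.url(),
    // redirectChain() lists the requests that were redirected before the final one
    redirects: response.request().redirectChain().map((req) => req.url()),
  };
}

// Usage inside testPuppeteer():
//   const response = await page.goto(url, { waitUntil: 'networkidle0' });
//   console.log(describeNavigation(response), await browser.userAgent());
```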

Puppeteer scrapes news from google properly in local server but not in heroku

I have added the required buildpacks. There are also no errors shown in the Heroku logs. Locally the application works completely fine and scrapes the required news, but on Heroku the page just refreshes and displays nothing.
app.post("/news",function(req,res){
var pla= req.body.place;
var url='https://www.google.com/search?q=covid+19+'+pla+'&sxsrf=ALeKk02SupK-SO625SAtNAmqA5CHUj5xjg:1586447007701&source=lnms&tbm=nws&sa=X&ved=2ahUKEwikieXS19voAhXAxzgGHV5bCcQQ_AUoAXoECBwQAw&biw=1536&bih=535';
(async () => {
const browser = await puppeteer.launch({args: ['--no-sandbox']});
const page = await browser.newPage();
await page.goto(url);
var data = await page.evaluate(() =>
Array.from(document.querySelectorAll('div.g'))
.map(compact => ({
headline: compact.querySelector('h3').innerText.trim(),
img: compact.querySelector("img") === null ? 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/No_image_3x4.svg/1280px-No_image_3x4.svg.png' : compact.querySelector("img.th.BbeB2d").src,
url: compact.querySelector("h3.r.dO0Ag>a").href,
source: compact.querySelector("div.gG0TJc>div.dhIWPd>span.xQ82C.e8fRJf").innerText.trim(),
time: compact.querySelector("div.gG0TJc>div.dhIWPd>span.f.nsa.fwzPFf").innerText.trim(),
desc : compact.querySelector("div.st").innerText.trim()
}))
)
console.log(data);
res.render('news.ejs',{data: data});
await browser.close();
})();
});
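If Google serves different markup to the Heroku host (an assumption, not something the question confirms), any selector in the handler above that matches nothing makes `.innerText` or `.href` throw inside page.evaluate, and because the async function has no try/catch, res.render is never reached. A minimal sketch of a null-safe lookup follows; pick is a made-up helper name.

```javascript
// Hypothetical helper: read a property from the first match, or return a fallback
// instead of throwing when the selector matches nothing.
function pick(root, selector, prop, fallback = null) {
  const el = root.querySelector(selector);
  if (!el || el[prop] == null) return fallback;
  const value = el[prop];
  return typeof value === "string" ? value.trim() : value;
}

// Usage: define pick inside the page.evaluate callback (that callback runs in the
// browser, so functions from the Node side are not visible there), then e.g.
//   headline: pick(compact, 'h3', 'innerText'),
//   desc: pick(compact, 'div.st', 'innerText', ''),
```

Logging the length of the resulting array on Heroku would then show whether the selectors match anything there at all.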
I'd suggest adding the '--disable-setuid-sandbox' flag to your puppeteer launch command:
const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
I had a similar problem in the past, and if I recall correctly the flag helped.
Maybe this will help (copied from the official Puppeteer website); I had a similar problem and it worked for me.
Running Puppeteer on Heroku (https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#running-puppeteer-on-heroku)
Running Puppeteer on Heroku requires some additional dependencies that aren't included on the Linux box that Heroku spins up for you. To add the dependencies on deploy, add the Puppeteer Heroku buildpack to the list of buildpacks for your app under Settings > Buildpacks.
The url for the buildpack is https://github.com/jontewks/puppeteer-heroku-buildpack
Ensure that you're using '--no-sandbox' mode when launching Puppeteer. This can be done by passing it as an argument to your .launch() call: puppeteer.launch({ args: ['--no-sandbox'] });.
When you click add buildpack, simply paste that url into the input, and click save. On the next deploy, your app will also install the dependencies that Puppeteer needs to run.
If you need to render Chinese, Japanese, or Korean characters you may need to use a buildpack with additional font files like https://github.com/CoffeeAndCode/puppeteer-heroku-buildpack
There's also another simple guide from @timleland that includes a sample project: https://timleland.com/headless-chrome-on-heroku/.

Unhandled promise rejection (rejection id: 1): Error: kill ESRCH

I've done some research on the Web and Stack Overflow, but found nothing really helpful on this error.
I installed Node and Puppeteer under Bash on Ubuntu on Windows 10 but couldn't get it to work, although I did get it working on Windows without Bash on another machine.
My command is :
node index.js
My index.js tries to take a screenshot of a page :
const puppeteer = require('puppeteer');
async function run() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://github.com');
await page.screenshot({ path: 'screenshots/github.png' });
browser.close();
}
run();
Does anybody know the way I could fix this "Error: kill ESRCH" error?
I had the same issue, this worked for me.
Try updating your script to the following:
const puppeteer = require('puppeteer');
async function run() {
//const browser = await puppeteer.launch();
const browser = await puppeteer.launch({headless: true, args: ['--no-sandbox'] }); //WSL's chrome support is very new, and requires sandbox to be disabled in a lot of cases.
const page = await browser.newPage();
await page.goto('https://github.com');
await page.screenshot({ path: 'screenshots/github.png' });
await browser.close(); //As @Md. Abu Taher suggested
}
run();
const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
If you want to read all the details on this, this ticket has them (or links to them).
https://github.com/Microsoft/WSL/issues/648
Other puppeteer users with similar issues:
https://github.com/GoogleChrome/puppeteer/issues/290#issuecomment-322851507
I just fixed this issue. What you need to do is the following:
1) Install Debian dependencies
You can find them in this doc:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
sudo apt-get install all of those bad boys.
2) Add '--no-sandbox' flag when launching puppeteer
3) Make sure your Windows 10 is up to date. I was missing an important update that allows you to launch Chrome.
Points to consider:
Windows bash is not a complete drop-in replacement for Ubuntu bash (yet). There are many cases where GUI-based apps do not work properly. Also, the script might be confused by bash on Windows 10: it could think that the OS is Linux instead of Windows.
Windows 10 bash only supports 64-bit binaries, so make sure the node and chrome versions used inside it are 64-bit. On non-Windows platforms Puppeteer kills child processes via -child.pid (the process group) rather than child.pid. Make sure puppeteer is not getting confused by this mix of bash and Windows.
Back to your case.
You are using browser.close() in the function, but it should be await browser.close(), otherwise it does not execute in the proper order.
Also, you should try adding await page.close(); before browser.close();.
So the code should be:
await page.close();
await browser.close();
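Putting the close order above together with the '--no-sandbox' flag, a minimal sketch could look like this. screenshotWithCleanup is a made-up name, and the launcher is passed in as a parameter (an assumption made here so the function can be exercised without a real browser); in practice it would be require('puppeteer').

```javascript
// Sketch: take a screenshot and always close the page, then the browser, even on errors.
// `launcher` is anything with a launch() method, e.g. require('puppeteer').
async function screenshotWithCleanup(launcher, url, path) {
  const browser = await launcher.launch({ args: ["--no-sandbox"] });
  try {
    const page = await browser.newPage();
    try {
      await page.goto(url);
      await page.screenshot({ path });
    } finally {
      // close the page first, as suggested above
      await page.close();
    }
  } finally {
    // then close the browser so no child process is left behind
    await browser.close();
  }
}

// Usage (assumes puppeteer is installed):
//   screenshotWithCleanup(require("puppeteer"), "https://github.com", "screenshots/github.png")
//     .catch(console.error);
```

The nested try/finally keeps the order explicit: a navigation error still rejects the promise, but only after both close calls have been awaited.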
I worked around it by softlinking chrome.exe to node_modules/puppeteer/.../chrome as below
ln -s /mnt/c/Program\ Files\ \(x86\)/Google/Chrome/Application/chrome.exe node_modules/puppeteer/.local-chromium/linux-515411/chrome-linux/chrome

Resources