No matter what I try, I keep getting the same error when Puppeteer launches Chrome:
My code:
//....
let pathToExtension = require('path').join(__dirname, 'ext'); // doesn't help
// pathToExtension = none of these work either: the app directory, the chrome extension directory, the chrome user data directory ...
pathToExtension = `E:/ext`;
browser = await puppeteer.launch({
    headless: false, //true,
    executablePath: executablePath,
    args: [
        `--disable-extensions-except=${pathToExtension}`,
        `--load-extension=${pathToExtension}`,
    ],
    // ignoredDefaultArgs: ['--disable-extensions'],
    // I would expect the above to allow all extensions, but it seems to do nothing
});
//....
I have followed existing solutions here on SO:
Extension must be unpacked:
Failed to load extension and manifest file is missing or unreadable while trying to test Chrome extensions with Playwright
Both --disable-extensions-except and --load-extension need to be set:
puppeteer unable to load chrome extension in browser
Also, my code above is basically the one from the official docs: https://pptr.dev/guides/chrome-extensions/
UPDATE:
Switching headless: false to true made the error (hopefully not just the message) disappear. However, I still need the extensions to load in headful Chrome.
The doc page (above) states that "Extensions in Chrome/Chromium currently only work in non-headless mode and experimental Chrome headless mode.", so I'd expect it to work in headful/non-headless mode. I'm not sure about the "experimental Chrome" part.
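For reference, the "experimental Chrome headless mode" mentioned there can be selected like this (a sketch; headless: 'new' assumes a recent Puppeteer version):
const path = require('path');
const puppeteer = require('puppeteer');

const pathToExtension = path.join(__dirname, 'ext');

(async () => {
    const browser = await puppeteer.launch({
        headless: 'new', // experimental Chrome headless; use false for a regular headful window
        args: [
            `--disable-extensions-except=${pathToExtension}`,
            `--load-extension=${pathToExtension}`,
        ],
    });
    // ...
    await browser.close();
})();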
Related
I have a Chrome extension I want to install automatically on a Chrome profile stored on my Desktop.
Chrome profile path: C:\\Users\\user\\Desktop\\ChromeProfiles\\test
Chrome extension path: C:\\Users\\user\\Desktop\\SSDC Bot Chrome Console\\Extension Ver
I use the code below to launch Chrome and load the extension:
(async () => {
    const pathToExtension = require('path').join("C:\\Users\\user\\Desktop\\SSDC Bot Chrome Console", 'Extension Ver');
    const browser = await puppeteer.launch({
        headless: false,
        args: [
            `--disable-extensions-except=${pathToExtension}`,
            `--load-extension=${pathToExtension}`,
            `--user-data-dir=${'C:\\Users\\user\\Desktop\\ChromeProfiles' + '\\' + 'test'}`
        ],
        executablePath: arg[0]
    });
})();
What I want to achieve is the following:
1. Open that Chrome profile using Puppeteer and install the extension.
2. Open that Chrome profile using CMD (not controlled by Puppeteer) and have the Chrome extension be present.
However, after successfully running the code above, Chrome launches under Puppeteer's control with the extension present; but when I then launch the profile using CMD, the extension is gone.
Should I be using --load-extension? Is there a different flag to use or way to install the extension?
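For illustration, this is roughly how the same profile could be reopened outside Puppeteer with the extension still there (a sketch using Node's child_process instead of CMD; the chrome.exe path is an assumption), since --load-extension appears to apply only to the session it is passed to rather than installing anything into the profile:
const { spawn } = require('child_process');

// Assumed install location for chrome.exe; adjust to the real path.
const chromeExe = 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe';

spawn(chromeExe, [
    '--user-data-dir=C:\\Users\\user\\Desktop\\ChromeProfiles\\test',
    // Passed again on every launch, because the flag does not persist in the profile.
    '--load-extension=C:\\Users\\user\\Desktop\\SSDC Bot Chrome Console\\Extension Ver',
], { detached: true, stdio: 'ignore' });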
I'm making an API that does some work with Puppeteer. It works in my local environment because I had set the executable path.
The docs also say there is no need to set the executable path, but my error is:
Could not find expected browser (chrome) locally. Run `npm install` to download the correct Chromium revision (970485).
My code for the launch() call is:
const browser = await puppeteer.launch({
    args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
    ],
});
const page = await browser.newPage();
Here I'm using Railway (a Heroku clone).
If I'm right, I've made a mistake with the executable path.
Let me know where I made the error. Thank you.
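For reference, a minimal way to make the executable path explicit would be the following (a sketch; the PUPPETEER_EXECUTABLE_PATH variable and the /usr/bin/google-chrome-stable fallback are assumptions about the Railway environment):
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
        // If the bundled Chromium was not downloaded during the build,
        // fall back to a system Chrome installed on the host (assumed path).
        executablePath: process.env.PUPPETEER_EXECUTABLE_PATH || '/usr/bin/google-chrome-stable',
    });
    const page = await browser.newPage();
    // ...
    await browser.close();
})();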
I am trying to debug an issue which causes headless Chrome using Puppeteer to behave differently on my local environment and on a remote environment such as AWS or Heroku.
The application tries to search publicly available jobs on LinkedIn without authentication (no need to look at profiles); the URL format is something like this: https://www.linkedin.com/jobs/search?keywords=Engineer&location=New+York&redirect=false&position=1&pageNum=0
When I open this URL in my local environment I have no problems, but when I try to do the same thing on a remote machine such as an AWS EC2 instance or a Heroku dyno, I am redirected to a login form by LinkedIn. To debug this difference I've built a Docker image (based on this image) to isolate it from my local Chrome/profile:
Dockerfile
FROM buildkite/puppeteer
WORKDIR /app
COPY . .
RUN npm install
CMD node index.js
EXPOSE 9222
index.js
const puppeteer = require("puppeteer-extra");
puppeteer.use(require("puppeteer-extra-plugin-stealth")());
const testPuppeteer = async () => {
console.log('Opening browser');
const browser = await puppeteer.launch({
headless: true,
slowMo: 20,
args: [
'--remote-debugging-address=0.0.0.0',
'--remote-debugging-port=9222',
'--single-process',
'--lang=en-GB',
'--disable-dev-shm-usage',
'--no-sandbox',
'--disable-setuid-sandbox',
"--proxy-server='direct://",
'--proxy-bypass-list=*',
'--disable-gpu',
'--allow-running-insecure-content',
'--enable-automation',
],
});
console.log('Opening page...');
const page = await browser.newPage();
console.log('Page open');
const url = "https://www.linkedin.com/jobs/search?keywords=Engineer&location=New+York&redirect=false&position=1&pageNum=0";
console.log('Opening url', url);
await page.goto(url, {
waitUntil: 'networkidle0',
});
console.log('Url open');
// page && await page.close();
// browser && await browser.close();
console.log("Done! Leaving page open for remote inspection...");
};
(async () => {
await testPuppeteer();
})();
The docker image used for this test can be found here.
I've run the image on my local environment with the following command:
docker run -p 9222:9222 spinlud/puppeteer-linkedin-test
Then, from the local Chrome browser's chrome://inspect page, it should be possible to inspect the GUI of the application (I have deliberately left the page open in the headless browser):
As you can see, even in local Docker the page opens without authentication.
I've done the same test on an AWS EC2 (Amazon Linux 2) with Docker installed. It needs to be a public instance with SSH access and an inbound rule to allow traffic through port 9222 (for remote Chrome debugging).
I've run the same command:
docker run -p 9222:9222 spinlud/puppeteer-linkedin-test
Then again, from the local Chrome browser's chrome://inspect page, once I added the remote public IP of the EC2 instance, I was able to inspect the GUI of the remote headless Chrome as well:
As you can see, this time LinkedIn requires authentication. We can also see a difference in the cookies:
I can't understand the reasons behind this different behaviour between my local and remote environments. In theory Docker should provide isolation, and in both environments the headless browser should start with no cookies and a fresh (empty) session. Still, there is a difference and I can't figure out why.
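To compare the two environments more directly, I can also dump the cookies and the effective user agent right after navigation (a sketch using the page and browser objects from the code above):
// After page.goto(...) in testPuppeteer:
const cookies = await page.cookies();
console.log('Cookies:', JSON.stringify(cookies, null, 2));
console.log('User agent:', await browser.userAgent());
console.log('Final URL:', page.url());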
Does anyone have any clue?
I have added the required buildpacks. There are also no errors shown in the Heroku logs. Locally the deployed application works completely fine and scrapes the required news, but on Heroku the page just refreshes and displays nothing.
app.post("/news",function(req,res){
var pla= req.body.place;
var url='https://www.google.com/search?q=covid+19+'+pla+'&sxsrf=ALeKk02SupK-SO625SAtNAmqA5CHUj5xjg:1586447007701&source=lnms&tbm=nws&sa=X&ved=2ahUKEwikieXS19voAhXAxzgGHV5bCcQQ_AUoAXoECBwQAw&biw=1536&bih=535';
(async () => {
const browser = await puppeteer.launch({args: ['--no-sandbox']});
const page = await browser.newPage();
await page.goto(url);
var data = await page.evaluate(() =>
Array.from(document.querySelectorAll('div.g'))
.map(compact => ({
headline: compact.querySelector('h3').innerText.trim(),
img: compact.querySelector("img") === null ? 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/No_image_3x4.svg/1280px-No_image_3x4.svg.png' : compact.querySelector("img.th.BbeB2d").src,
url: compact.querySelector("h3.r.dO0Ag>a").href,
source: compact.querySelector("div.gG0TJc>div.dhIWPd>span.xQ82C.e8fRJf").innerText.trim(),
time: compact.querySelector("div.gG0TJc>div.dhIWPd>span.f.nsa.fwzPFf").innerText.trim(),
desc : compact.querySelector("div.st").innerText.trim()
}))
)
console.log(data);
res.render('news.ejs',{data: data});
await browser.close();
})();
});
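To see what Google actually serves on Heroku, I'm also thinking of capturing the page right after page.goto(url), before the evaluate call (a sketch to drop into the handler above; the /tmp path is arbitrary):
await page.screenshot({ path: '/tmp/news-debug.png', fullPage: true });
const html = await page.content();
console.log('Rendered HTML length:', html.length);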
I'd suggest adding the '--disable-setuid-sandbox' flag to your Puppeteer launch command:
const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
I had some problems in the past, and if I recall correctly, this flag helped.
Maybe this could help (copied from the official Puppeteer troubleshooting guide), because I had a similar problem and it worked for me.
Running Puppeteer on Heroku (https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#running-puppeteer-on-heroku)
Running Puppeteer on Heroku requires some additional dependencies that aren't included on the Linux box that Heroku spins up for you. To add the dependencies on deploy, add the Puppeteer Heroku buildpack to the list of buildpacks for your app under Settings > Buildpacks.
The url for the buildpack is https://github.com/jontewks/puppeteer-heroku-buildpack
Ensure that you're using '--no-sandbox' mode when launching Puppeteer. This can be done by passing it as an argument to your .launch() call: puppeteer.launch({ args: ['--no-sandbox'] });.
When you click add buildpack, simply paste that url into the input, and click save. On the next deploy, your app will also install the dependencies that Puppeteer needs to run.
If you need to render Chinese, Japanese, or Korean characters you may need to use a buildpack with additional font files like https://github.com/CoffeeAndCode/puppeteer-heroku-buildpack
There's also another simple guide from @timleland that includes a sample project: https://timleland.com/headless-chrome-on-heroku/.
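Putting those steps together, a minimal launch on Heroku would look roughly like this (a sketch; the buildpack above supplies the system libraries Chrome needs):
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
    });
    const page = await browser.newPage();
    await page.goto('https://example.com');
    console.log(await page.title());
    await browser.close();
})();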
Could anyone help me with setting a proxy server for headless Chrome while using the Lighthouse Chrome launcher in Node.js, as mentioned here?
const launcher = new ChromeLauncher({
port: 9222,
autoSelectChrome: true, // False to manually select which Chrome install.
additionalFlags: [
'--window-size=412,732',
'--disable-gpu',
'--proxy-server="IP:PORT"',
headless ? '--headless' : ''
]
});
However, the above script does not hit my proxy server at all. Chrome seems to fall back to DIRECT:// connections to the target website.
One other resource that talks about using an HTTP/HTTPS proxy server in the context of headless Chrome is this, but it does not give any example of how to use it from Node.js.
I tried it using regular exec and it works just fine, here is my snippet:
const exec = require('child_process').exec;
function launchHeadlessChrome(url, callback) {
// Assuming MacOSx.
const CHROME = '/Users/h0x91b/Desktop/Google\\ Chrome\\ Beta.app/Contents/MacOS/Google\\ Chrome';
exec(`${CHROME} --headless --disable-gpu --remote-debugging-port=9222 --proxy-server=127.0.0.1:8888 ${url}`, callback);
}
launchHeadlessChrome('https://www.chromestatus.com', (err, stdout, stderr) => {
console.log('callback', err, stderr, stdout)
});
Then I navigated to http://localhost:9222, and in Developer Tools I see a "Proxy connection error", which is OK because I don't have a proxy on this port, but it means that Chrome tried to connect via the proxy...
BTW, the Chrome version is 59.
I have checked the source code (https://github.com/GoogleChrome/lighthouse/blob/master/chrome-launcher/chrome-launcher.ts#L38-L44). I see no additionalFlags there, only chromeFlags; try using that...
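For example, with the standalone chrome-launcher package (an assumption about the setup; the chromeFlags option name comes from the source linked above, and IP:PORT is a placeholder):
const chromeLauncher = require('chrome-launcher');

(async () => {
    const chrome = await chromeLauncher.launch({
        port: 9222,
        chromeFlags: [
            '--window-size=412,732',
            '--disable-gpu',
            '--proxy-server=IP:PORT', // placeholder; note: no extra quotes around the value
            '--headless',
        ],
    });
    console.log('Chrome debugging port:', chrome.port);
    // chrome.kill() when done
})();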