Run Puppeteer on Chrome, not Chromium

I want to open the Telegram site with Puppeteer, but there is a problem.
My Telegram session only exists in my regular Chrome, so in the browser that Puppeteer launches I have to log in every time.
Is there a way for Puppeteer to attach to the Chrome instance that is already running, so that it picks up the existing session?
const browser = await puppeteer.launch({
  headless: false,
  executablePath: "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
  args: ["--lang=en-US,en", '--no-sandbox', '--disable-setuid-sandbox', '--disable-extensions']
})
This code works properly, but it runs on Chromium.

Yes, it's possible to run a Puppeteer instance on top of a pre-existing Chrome process.
To do this, first start the Chrome process with the remote-debugging-port option, usually written as: --remote-debugging-port=9222
This Medium article explains the steps in detail, but to summarize:
macOS:
Run:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --no-first-run --no-default-browser-check --user-data-dir=$(mktemp -d -t 'chrome-remote_data_dir')
Windows:
Right-click your Google Chrome shortcut icon => Properties
In the Target field, append --remote-debugging-port=9222 to the very end
It should look something like:
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
Then you'll be able to navigate to http://localhost:9222/json/version (using the same port you defined above) and see output like this:
{
  "Browser": "HeadlessChrome/87.0.4280.66",
  "Protocol-Version": "1.3",
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/87.0.4280.66 Safari/537.36",
  "V8-Version": "8.7.220.25",
  "WebKit-Version": "537.36 (#fd98a29dd59b36f71e4741332c9ad5bda42094bf)",
  "webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/000aaaa-bb08-55af-a8e3-760dd9998fc7"
}
Then, you can use the puppeteer connect() method (instead of the launch() method) like this:
const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://localhost:9222/devtools/browser/000aaaa-bb08-55af-a8e3-760dd9998fc7",
});
// now, 'browser' is connected to your Chrome window.
// get the opened pages
const openedPages = await browser.pages();
// pick the one you want (Telegram). I'm not sure this is the best way to do it, so please test it yourself
// find() returns the first matching page (or undefined); filter() would return an array
const telegramPage = openedPages.find(page => page.url().includes("telegram"));
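The browser ID at the end of webSocketDebuggerUrl changes every time Chrome starts, so a hard-coded endpoint stops working after a restart. A minimal sketch of one way around that, assuming Chrome was started with --remote-debugging-port=9222 as described above: pass browserURL to connect() and let Puppeteer look the endpoint up. The names findPageByUrl and attachToTelegram are made up for this example and are not part of Puppeteer.

```javascript
// Sketch: attach to a Chrome that was started with --remote-debugging-port=9222.
// findPageByUrl and attachToTelegram are illustrative names, not Puppeteer APIs.

// Return the first page whose URL contains the given fragment, or undefined.
function findPageByUrl(pages, fragment) {
  return pages.find((page) => page.url().includes(fragment));
}

async function attachToTelegram(port = 9222) {
  // Loaded lazily so the helper above stays dependency-free.
  const puppeteer = require('puppeteer');
  // With browserURL, Puppeteer reads webSocketDebuggerUrl from /json/version itself.
  const browser = await puppeteer.connect({
    browserURL: `http://localhost:${port}`,
    defaultViewport: null, // keep the real window size instead of the default viewport
  });
  const pages = await browser.pages();
  // Reuse the Telegram tab if one is open, otherwise open a new tab in the same profile.
  const page = findPageByUrl(pages, 'telegram') || (await browser.newPage());
  return { browser, page };
}
```

When the script is done, browser.disconnect() detaches Puppeteer and leaves the Chrome window open, whereas browser.close() would close the browser.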

Related

Chrome: 'manifest file is missing or unreadable' when launched by Puppeteer

I keep getting the same error no matter what, when Puppeteer launches Chrome:
My code:
//....
let pathToExtension = require('path').join(__dirname, 'ext'); // doesn't help
// pathToExtension = none of these work either: the app directory, the chrome extension directory, the chrome user data directory ...
pathToExtension = `E:/ext`
browser = await puppeteer.launch({
  headless: false, //true,
  executablePath: executablePath,
  args: [`--disable-extensions-except=${pathToExtension}'`,
    `--load-extension=${pathToExtension}`,
  ],
  // ignoredDefaultArgs: ['--disable-extensions'],
  // I would expect the above to allow all extensions, but seems to do nothing
})
//....
I have followed existing solutions here on SO:
Extension must be unpacked:
Failed to load extension and manifest file is missing or unreadable while trying to test Chrome extensions with Playwright
Both --disable-extensions-except and --load-extension need to be set:
puppeteer unable to load chrome extension in browser
Also, my code above is basically one from official docs https://pptr.dev/guides/chrome-extensions/ .
UPDATE:
Switching headless: false to true made the error (hopefully not just the message) disappear. However, I still need the extensions to load on headful Chrome.
The doc page (above) states that "Extensions in Chrome/Chromium currently only work in non-headless mode and experimental Chrome headless mode.", so I'd expect it to work in headful/non-headless mode. I'm not sure about the "experimental Chrome" part.
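Two details in the snippet above may be worth ruling out first. The --disable-extensions-except flag ends with a stray ' after ${pathToExtension}, so the two flags no longer name the same directory. Also, the Puppeteer launch option is spelled ignoreDefaultArgs (not ignoredDefaultArgs) and sits next to args rather than inside it. The following is a minimal sketch, not a confirmed fix: extensionArgs is a hypothetical helper that builds both flags from one path and fails early if manifest.json is not directly inside that directory.

```javascript
const fs = require('fs');
const path = require('path');

// Build the two extension flags from a single directory, failing early if
// manifest.json is not where Chrome will look for it.
function extensionArgs(extensionDir) {
  const dir = path.resolve(extensionDir);
  const manifest = path.join(dir, 'manifest.json');
  if (!fs.existsSync(manifest)) {
    throw new Error(`No manifest.json found in ${dir}`);
  }
  return [`--disable-extensions-except=${dir}`, `--load-extension=${dir}`];
}

// Example launch options (headful, as in the question):
// const browser = await puppeteer.launch({
//   headless: false,
//   args: extensionArgs(path.join(__dirname, 'ext')),
// });
```

If the helper throws, the path points at the wrong level (for example a parent folder or a packed .crx file) rather than at the unpacked extension itself.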

TimeoutError: Navigation timeout of 30000 ms exceeded in puppeteer in ubuntu

There is no issue on Windows, but on my Ubuntu production server I get this error after the goto call:
const browser = await puppeteer.launch({
  headless: true,
  args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
// lowercase `string` is the TypeScript primitive type that page.goto() expects
const url: string = login.url;
const page: any = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');
await page.goto(url, { waitUntil: 'networkidle2' });
await page.setViewport({
  width: 1520,
  height: 800,
  deviceScaleFactor: 1,
  isMobile: false
});
chromium-browser and puppeteer are installed, along with some other packages such as libgbm-dev.
Can anyone tell me what the issue is?
If you need any more information, please comment.

In my case, I was running an Ubuntu server with 512 MB of memory, which could not handle my scripts. I figured this out by writing a simple scraper that visited Google, which worked fine. I then ran my more intensive scrapers and watched memory usage in htop while they failed with a timeout error.
I upgraded the server to 2 GB of memory, and everything worked fine. You might not need to go all the way to 2 GB, but I did just in case.

Puppeteer sometimes needs a lot of time to respond.
In my case (Puppeteer 19.4.1 on an Ubuntu 20.04.1 LTS server with 1 GB of RAM), I solved the issue just by increasing the page.goto timeout to 2 minutes:
await page.goto(url, {'timeout': 120000});
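Both answers point at a slow or memory-starved server. A minimal sketch that combines them, under the assumption that a longer timeout plus one retry is acceptable for the script: gotoWithRetry is a hypothetical helper, and the timeout and retry counts are arbitrary. It logs free memory on each failure so that the low-memory case is easier to spot.

```javascript
const os = require('os');

// Navigate with a longer timeout and a limited number of retries.
// `page` is anything with a Puppeteer-style goto(url, options) method.
async function gotoWithRetry(page, url, { timeout = 120000, retries = 1 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await page.goto(url, { timeout, waitUntil: 'networkidle2' });
    } catch (err) {
      lastError = err;
      // Very low free memory here hints at the under-provisioned-server case.
      const freeMb = Math.round(os.freemem() / (1024 * 1024));
      console.warn(`goto attempt ${attempt + 1} failed (${err.message}); free memory: ${freeMb} MB`);
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

Usage would look like `await gotoWithRetry(page, url)` in place of the plain page.goto call.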

Puppeteer --load-extension does not actually install the extension

I have a Chrome Extension I want to install automatically on a Chrome Profile stored at my Desktop.
Chrome Profile Path: C:\\Users\\user\\Desktop\\ChromeProfiles\\test
Chrome Extension Path: C:\\Users\\user\\Desktop\\SSDC Bot Chrome Console\\Extension Ver
I use this code below to launch a Chrome and load in the Extension:
(async () => {
  const pathToExtension = require('path').join("C:\\Users\\user\\Desktop\\SSDC Bot Chrome Console", 'Extension Ver');
  const browser = await puppeteer.launch({
    headless: false,
    args: [
      `--disable-extensions-except=${pathToExtension}`,
      `--load-extension=${pathToExtension}`,
      `--user-data-dir=${'C:\\Users\\user\\Desktop\\ChromeProfiles'+'\\'+'test'}`
    ],
    executablePath : arg[0]
  });
})();
What I want to achieve is the following:
1. Open that Chrome profile using Puppeteer and install the extension.
2. Open that Chrome profile from CMD (not controlled by Puppeteer) and have the extension still be present.
However, although the code above runs successfully and the Puppeteer-controlled Chrome shows the extension, the extension is gone when I launch the same profile from CMD.
Should I be using --load-extension? Is there a different flag to use or way to install the extension?
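One possible explanation, stated here as an assumption rather than confirmed behaviour: --load-extension loads an unpacked extension only for the Chrome process it is passed to and does not record it in the profile. If that is the case, the CMD launch would need the same flag. The sketch below builds one flag list that can be used for both launches; profileLaunchArgs is a hypothetical helper, and the paths are the ones from the question.

```javascript
const path = require('path');

// Build the flags shared by the Puppeteer launch and a manual launch.
function profileLaunchArgs(profileDir, extensionDir) {
  return [
    `--user-data-dir=${profileDir}`,
    `--load-extension=${extensionDir}`,
  ];
}

// path.win32 keeps Windows separators even if this runs on another OS.
const profileDir = path.win32.join('C:\\Users\\user\\Desktop\\ChromeProfiles', 'test');
const extensionDir = path.win32.join('C:\\Users\\user\\Desktop\\SSDC Bot Chrome Console', 'Extension Ver');
const args = profileLaunchArgs(profileDir, extensionDir);

// For CMD, quote each flag because the extension path contains spaces.
const cmdLine = ['chrome.exe', ...args.map((a) => `"${a}"`)].join(' ');
console.log(cmdLine);
```

The same args array can be passed as the args option of puppeteer.launch, so that both launches see identical flags.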

Headless Chrome (Puppeteer) different behaviour running in local docker and remote docker (AWS EC2)

I am trying to debug an issue which causes headless Chrome using Puppeteer to behave differently on my local environment and on a remote environment such as AWS or Heroku.
The application searches publicly available jobs on LinkedIn without authentication (there is no need to look at profiles). The URL format is something like this: https://www.linkedin.com/jobs/search?keywords=Engineer&location=New+York&redirect=false&position=1&pageNum=0
When I open this url in my local environment I have no problems, but when I try to do the same thing on a remote machine such as AWS EC2 or Heroku Dyno I am redirected to a login form by LinkedIn. To debug this difference I've built a Docker image (based on this image) to have isolation from my local Chrome/profile:
Dockerfile
FROM buildkite/puppeteer
WORKDIR /app
COPY . .
RUN npm install
CMD node index.js
EXPOSE 9222
index.js
const puppeteer = require("puppeteer-extra");
puppeteer.use(require("puppeteer-extra-plugin-stealth")());
const testPuppeteer = async () => {
  console.log('Opening browser');
  const browser = await puppeteer.launch({
    headless: true,
    slowMo: 20,
    args: [
      '--remote-debugging-address=0.0.0.0',
      '--remote-debugging-port=9222',
      '--single-process',
      '--lang=en-GB',
      '--disable-dev-shm-usage',
      '--no-sandbox',
      '--disable-setuid-sandbox',
      "--proxy-server='direct://",
      '--proxy-bypass-list=*',
      '--disable-gpu',
      '--allow-running-insecure-content',
      '--enable-automation',
    ],
  });
  console.log('Opening page...');
  const page = await browser.newPage();
  console.log('Page open');
  const url = "https://www.linkedin.com/jobs/search?keywords=Engineer&location=New+York&redirect=false&position=1&pageNum=0";
  console.log('Opening url', url);
  await page.goto(url, {
    waitUntil: 'networkidle0',
  });
  console.log('Url open');
  // page && await page.close();
  // browser && await browser.close();
  console.log("Done! Leaving page open for remote inspection...");
};
(async () => {
  await testPuppeteer();
})();
The docker image used for this test can be found here.
I've run the image on my local environment with the following command:
docker run -p 9222:9222 spinlud/puppeteer-linkedin-test
Then from the local Chrome browser chrome://inspect it should be possible to inspect the GUI of the application (I have deliberately left open the page in headless browser):
As you can see even in local docker the page opens without authentication.
I've done the same test on an AWS EC2 (Amazon Linux 2) with Docker installed. It needs to be a public instance with SSH access and an inbound rule to allow traffic through port 9222 (for remote Chrome debugging).
I've run the same command:
docker run -p 9222:9222 spinlud/puppeteer-linkedin-test
Then again from local Chrome browser chrome://inspect, once added the remote public IP of the EC2, I was able to inspect the GUI of the remote headless Chrome as well:
As you can see this time LinkedIn requires authentication. We can see also a difference in the cookies:
I can't understand the reason for this difference in behaviour between my local and remote environments. In theory, Docker should provide isolation, and in both environments the headless browser should start with no cookies and a fresh (empty) session. Still, there is a difference, and I can't figure out why.
Does anyone have any clue?
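To narrow down where the two environments diverge, it may help to log the same navigation facts in both containers and compare them. A minimal diagnostic sketch, assuming the response object returned by page.goto(): summarizeNavigation is a hypothetical helper that reports the status, the final URL, and any HTTP redirects that happened on the way. It does not explain the cause by itself.

```javascript
// Summarize how a navigation ended: status, final URL and HTTP redirects on the way.
// `response` is the value returned by page.goto().
function summarizeNavigation(response) {
  const chain = response.request().redirectChain();
  return {
    status: response.status(),
    finalUrl: response.url(),
    redirects: chain.map((req) => req.url()),
  };
}

// Usage inside testPuppeteer():
// const response = await page.goto(url, { waitUntil: 'networkidle0' });
// console.log(JSON.stringify(summarizeNavigation(response), null, 2));
```

A redirect performed by page script rather than by an HTTP response would not appear in the chain, so logging page.url() after the load is a useful complement.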

Headless chrome proxy server settings

Could anyone help me set proxy-server for headless Chrome when using the Lighthouse Chrome launcher in Node.js, as mentioned here?
const launcher = new ChromeLauncher({
  port: 9222,
  autoSelectChrome: true, // False to manually select which Chrome install.
  additionalFlags: [
    '--window-size=412,732',
    '--disable-gpu',
    '--proxy-server="IP:PORT"',
    headless ? '--headless' : ''
  ]
});
However, the above script does not hit my proxy server at all. Chrome seems to fall back to DIRECT:// connections to the target website.
One other resource that talks about using HTTP/HTTPS proxy server in the context of headless chrome is this. But it does not give any example of how to use this from Node.js.

I tried it using a regular exec and it works just fine. Here is my snippet:
const exec = require('child_process').exec;
function launchHeadlessChrome(url, callback) {
  // Assuming macOS.
  const CHROME = '/Users/h0x91b/Desktop/Google\\ Chrome\\ Beta.app/Contents/MacOS/Google\\ Chrome';
  exec(`${CHROME} --headless --disable-gpu --remote-debugging-port=9222 --proxy-server=127.0.0.1:8888 ${url}`, callback);
}
launchHeadlessChrome('https://www.chromestatus.com', (err, stdout, stderr) => {
  console.log('callback', err, stderr, stdout)
});
Then I navigated to http://localhost:9222, and in Developer Tools I see:
Proxy connection Error, which is expected because I don't have a proxy on this port; it does mean that Chrome tried to connect via the proxy.
BTW, the Chrome version is 59.
I checked the source code at https://github.com/GoogleChrome/lighthouse/blob/master/chrome-launcher/chrome-launcher.ts#L38-L44
I see no additionalFlags option there, only chromeFlags, so try using that instead.
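A minimal sketch of that suggestion, assuming the standalone chrome-launcher npm package and its launch({ port, chromeFlags }) call; buildChromeFlags and launchWithProxy are hypothetical helper names. The proxy value is passed without inner double quotes, on the assumption that the flags reach Chrome without going through a shell, where the quotes would become part of the value. The empty string produced by `headless ? '--headless' : ''` is also left out.

```javascript
// Build the flag list, skipping anything that is not enabled.
function buildChromeFlags({ proxy, headless = true } = {}) {
  const flags = ['--window-size=412,732', '--disable-gpu'];
  if (proxy) flags.push(`--proxy-server=${proxy}`); // no inner quotes around IP:PORT
  if (headless) flags.push('--headless');
  return flags;
}

async function launchWithProxy(proxy) {
  // Assumed to be installed: npm install chrome-launcher
  const chromeLauncher = require('chrome-launcher');
  return chromeLauncher.launch({
    port: 9222,
    chromeFlags: buildChromeFlags({ proxy, headless: true }),
  });
}
```

For example, `launchWithProxy('127.0.0.1:8888')` would start headless Chrome with the same proxy flag as the exec snippet above.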
