I see that Puppeteer uses the DevTools Protocol. I want to see what requests are sent by Puppeteer.
https://github.com/puppeteer/puppeteer
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();
How can I modify the above simple program to print the devtools requests sent by Puppeteer?
Edit: as the code is in Node.js, I added the nodejs tag because the solution may be in Node.js rather than in Puppeteer itself.
Edit: Fiddler was mentioned as relevant, so I added that tag as well.
You could use chrome-protocol-proxy; it captures all the CDP messages. There are a few extra steps involved here:
1. Run Google Chrome in debug mode
2. Start chrome-protocol-proxy
3. Start Puppeteer using puppeteer.connect()
Run the following commands; you may have to change them accordingly:
google-chrome-stable --remote-debugging-port=9222 --headless # run chrome
chrome-protocol-proxy # to display CDP messages
Remove this line from your code
const browser = await puppeteer.launch();
Add this line
const browser = await puppeteer.connect({ browserURL: 'http://localhost:9223' });
Instead of browserURL you can pass browserWSEndpoint, which you can get by running cURL against localhost:9223/json/version.
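Putting it together, a minimal sketch of the modified script could look like this (the port assumes the proxy setup above, so adjust it to wherever chrome-protocol-proxy is actually listening):

const puppeteer = require('puppeteer');

(async () => {
  // Attach to the already-running Chrome through the proxy instead of launching a new one
  const browser = await puppeteer.connect({ browserURL: 'http://localhost:9223' });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });
  // disconnect() keeps the externally started Chrome alive; close() would terminate it
  await browser.disconnect();
})();

The CDP requests then show up in the terminal where chrome-protocol-proxy is running.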
If you want to go into more detail on CDP and Puppeteer, you might want to look at Getting Started with CDP.
Related
I have the following Node.js code to open Chromium in headless mode and record a web page to a video:
const { launch, getStream } = require("puppeteer-stream");
const fs = require("fs");
const { exec } = require("child_process");

async function test() {
  const browser = await launch({ headless: true });
  const page = await browser.newPage();
  await page.goto("https://www.someurl.com");
  const stream = await getStream(page, { audio: true, video: true });
  // record the web page to mp4 video
  const ffmpeg = exec('ffmpeg -y -i - output.mp4');
  stream.pipe(ffmpeg.stdin);
  setTimeout(async () => {
    await stream.destroy();
    stream.on("end", () => {});
  }, 1000 * 60);
}
The code above works properly but doesn't open Chromium in headless mode. No matter what I do, the browser is still opened and visible when browsing the page. No error is thrown.
Does anyone know why it's not opened in headless mode?
Thanks
It says in the documentation for puppeteer-stream:
Notice: This will only work in headful mode
This is due to a limitation of Chromium where the Tab Capture API for the extension doesn't work in headless mode. (There are a couple bug reports about this, but I can't find the links at the moment.)
I had the same issue where headless mode doesn't work with some websites and elements (blank page content, elements not found, etc.).
But there is another method to "simulate" headless mode: minimize the window and move it to a position the user cannot see.
This doesn't hide the Chrome task from the taskbar, but the Chrome window itself will still be hidden from the user.
Just use the following arguments:
var chromeOptions = new ChromeOptions();
chromeOptions.AddArguments(new List<string>() { "--window-size=1,1", "window-position=-2000,0" }); // This hides the chrome window
var chromeDriverService = ChromeDriverService.CreateDefaultService();
chromeDriverService.HideCommandPromptWindow = true; // This is to hide the console.
ChromeDriver driver = new ChromeDriver(chromeDriverService, chromeOptions);
driver.Navigate().GoToUrl("https://google.com");
In short, the important part:
chromeOptions.AddArguments(new List<string>() { "--window-size=1,1", "window-position=-2000,0" });
chromeDriverService.HideCommandPromptWindow = true;
//driver.Manage().Window.Minimize(); //use this if the code above does not work
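The snippet above uses Selenium's C# bindings; if you want the same off-screen trick directly in Puppeteer, a minimal sketch (flag values taken from the answer above, URL from the question) could look like this:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // puppeteer-stream needs headful mode
    args: [
      '--window-size=1,1',         // shrink the window
      '--window-position=-2000,0'  // and push it off-screen
    ]
  });
  const page = await browser.newPage();
  await page.goto('https://www.someurl.com');
})();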
I am attempting to open Chrome with Puppeteer with a Chrome extension enabled.
I have been searching for solutions and have tried to implement many with no success.
chrome://version/
Google Chrome: 91.0.4472.164 (Official Build) (x86_64)
Revision: 541163496c9982c98f61819bab7cf2183ea8180f-refs/branch-heads/4472#{#1569}
OS: macOS Version 10.15.7
JavaScript: V8 9.1.269.39
Executable Path: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
Puppeteer: 10.1.0
Code (one of many attempts, but latest):
const puppeteer = require('puppeteer');
const path = require('path');

const extension_id_string = 'copjnifc....example.....gaodgpbh';
const extension_version = '1.5.1_0';
const extension_path = path.resolve(__dirname, '../../..', `/Library/Application\ Support/Google/Chrome/Default/Extensions/${extension_id_string}/${extension_version}`);

(async() => {
  const browser = await puppeteer.launch({
    headless: false,
    args: [
      `--load-extension=${extension_path}`,
      `--disable-extensions-except=${extension_path}`
    ]
  })
  const page = await browser.newPage()
  await page.goto('http://google.com');
})();
I've used the Node.js path module to get the absolute path to the extensions directory.
On running the code with Node v14.17.1, Chromium opens a browser and an alert pops up saying:
alert => Failed to load extension from: . Manifest file is missing or unreadable
When I follow the extension_path (defined above) in the terminal, I can see a manifest.json file, so it does exist.
What am I missing here? Am I defining the path to the extension incorrectly? Or do I need to set the executablePath for my current Chrome path inside the options when launching a browser? (I did try this with no success).
const chrome_executablePath = path.resolve(__dirname, '../../..', '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome')
...
const browser = await puppeteer.launch({
  headless: false,
  executablePath: chrome_executablePath, // <-- added this line in previous attempts, but still didn't work
  args: [
    `--load-extension=${extension_path}`,
    `--disable-extensions-except=${extension_path}`
  ]
})
...
Any pointers / help greatly appreciated.
I think you can do three things here:
1. Check if the manifest.json is encoded as UTF-8.
2. Make sure your extension path points to the extension area (i.e. the development area).
3. After headless: false, add the devtools: true option.
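One thing worth double-checking for point 2 above: path.resolve() discards everything before a segment that starts with /, so the path in the question resolves to the system /Library rather than the user's Library folder. A hypothetical way to build it from the home directory (extension ID and version copied from the question) might be:

const path = require('path');
const os = require('os');

// Hypothetical path construction: join from the user's home directory instead of
// relying on path.resolve(), which ignores earlier segments once it sees an
// absolute segment such as '/Library/...'.
const extension_id_string = 'copjnifc....example.....gaodgpbh';
const extension_version = '1.5.1_0';
const extension_path = path.join(
  os.homedir(),
  'Library/Application Support/Google/Chrome/Default/Extensions',
  extension_id_string,
  extension_version
);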
I am trying to use a PAC file as an argument for Puppeteer proxy settings, as defined here: https://www.chromium.org/developers/design-documents/network-settings
--proxy-pac-url=<pac-file-url>
Here is my code:
const puppeteer = require('puppeteer');

(async() => {
  const proxyUrl = 'http://{IPAddress}:{Port}';
  const browser = await puppeteer.launch({
    args: [`--proxy-pac=url=${proxypacUrl}`],
    headless: false,
  });
  const page = await browser.newPage();
  await page.goto('https://stackoverflow.com/');
  await browser.close();
})();
However, when I execute the code, it just works and visits stackoverflow.com but completely ignores
--proxy-pac=url=${proxypacUrl}
I know this because I can monitor the proxy logs. The PAC file specifically says to use the proxy for all traffic.
Here is my proxy PAC file:
function FindProxyForURL(url, host) {
  return "PROXY IP:PORT; PROXY IP:PORT";
}
When I change --proxy-pac-url=<pac-file-url> to --proxy-server and specify the IP and port directly, the traffic goes through the proxy.
Can someone please let me know what I am doing wrong with the proxy PAC URL?
You have an error in your code:
--proxy-pac=url=${proxypacUrl} should be --proxy-pac-url=${proxypacUrl}.
Unfortunately, fixing it won't help, because proxy PAC files aren't supported in headless Chromium; here is an issue.
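For reference, a corrected launch call (flag name fixed, PAC URL shown as a placeholder since the real one isn't in the question) might look like the sketch below; whether the PAC file is actually honoured still depends on the headless caveat above:

const puppeteer = require('puppeteer');

(async () => {
  // Placeholder: point this at the real PAC file location
  const proxypacUrl = 'http://example.com/proxy.pac';
  const browser = await puppeteer.launch({
    headless: false, // PAC files are reported not to work in headless Chromium
    args: [`--proxy-pac-url=${proxypacUrl}`],
  });
  const page = await browser.newPage();
  await page.goto('https://stackoverflow.com/');
  await browser.close();
})();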
I am running a Node.js script using Puppeteer on my local machine to download some assets from the Internet. I want that script to run as a Google Cloud Function.
I just wanted to know: is there any local space associated with GCF where we can save these files and access them later, or can we specify a Cloud Storage bucket where the download can be saved?
#!/usr/bin/env node
const { program } = require('commander');
const puppeteer = require('puppeteer');
program
  .option('-e, --email <email>', 'Login Email Address', process.env.LOOKER_EMAIL || '')
  .option('-p, --password <password>', 'Login Password', process.env.LOOKER_PASSWORD || '')
  .option('-d, --dashboard <id>', 'Dashboard To Download');

program.parse(process.argv);
const fs = require('fs');
const basePath = 'C:\\card\\';
(async () => {
  const loginEmail = program.email;
  const loginPassword = program.password;
  const dashboardId = program.dashboard;

  // used puppeteer to download some files
  const browser = await puppeteer.launch({
    headless: true
  })
  let pages = await browser.pages();
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  await page.goto(loginUrl);
  await page.waitForSelector(loginEmailSelector);
  await page.type(loginEmailSelector, loginEmail);
  await page.type(loginPasswordSelector, loginPassword);
  await Promise.all([
    page.waitForNavigation(),
    page.click(loginButtonSelector)
  ]);

  await page.goto(`https://somewebsite/${dashboardId}`);
  await page.waitForSelector(menuSelector, {
    visible: true
  });
  await page.click(menuSelector);
  await page.waitForSelector(downloadSelector, {
    visible: true
  });

  const ts = Date.now()
  const downloadLoc = basePath + ts + '\\'
  console.log('downloadLoc ', downloadLoc)

  await page._client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: downloadLoc
  })

  console.log(`your file's on the way!`)
})();
So here in the script I am just downloading the file to the C drive. I want this to be stored in some cloud storage if possible. Please let me know if you have any suggestions.
The Cloud Functions model assumes that code should be stateless, which means any data should be stored externally, although it is possible to use the /tmp directory for temporary purposes. The recommended solution is Cloud Storage (reference).
However, Cloud Storage is not the only option for keeping state. It is the best choice for binary objects, meaning files.
On the other hand, if those files contain data, you could choose one of Google's NoSQL databases: Firestore, Datastore (actually Firestore in Datastore mode), or the Firebase Realtime Database. All of them have nice APIs for many languages, including of course Node.js. Additionally, if you plan to create larger solutions, you can even use Bigtable for massive data and BigQuery if you need analytics. All of this depends on what you need.
What is nice and very convenient about the above-mentioned Google APIs is that in Cloud Functions there is no need to authenticate to the individual products, which saves a lot of code and resources. All of these solutions are serverless, so you do not have to care about the servers underneath or about scaling as your solution grows. You also get extremely fast network speeds between resources when you keep everything inside GCP.
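As a sketch of the Cloud Storage route (the bucket name and file name below are placeholders, and the download directory is switched to /tmp since that's the only writable location inside a Cloud Function):

const path = require('path');
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();

// Hypothetical helper: save the download under /tmp, then copy it to a bucket.
async function uploadDownload(page, fileName) {
  const downloadDir = '/tmp';

  // Tell Chromium to drop downloads into /tmp (same CDP call as in the question)
  await page._client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: downloadDir
  });

  // ... wait for the download to finish, then upload it ...
  await storage.bucket('my-dashboard-exports').upload(path.join(downloadDir, fileName), {
    destination: fileName
  });
}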
If I have an existing Google Chrome window open, I'd like to tell Puppeteer to open a new tab instead of opening a new window. Is there a way to do that? Is there some option or flag I can pass to Puppeteer to accomplish this?
I have:
const puppeteer = require('puppeteer');
(async function () {
  const b = await puppeteer.launch({
    devtools: true,
    openInExistingWindow: true /// ? something like this?
  });
  const page = await b.newPage();
  await page.goto('https://example.com');
})();
const browser = await puppeteer.launch();
const page = await browser.newPage();
This will open a new tab (Puppeteer calls them "pages") in the browser instance Puppeteer is controlling. You can check out the Page class docs here and the Browser class docs here.
You'll need to use:
/usr/bin/google-chrome-stable --remote-debugging-port=9220
to get the WebSocket connection for debugging, which can then be fed to Puppeteer:
await puppeteer.connect({browserWSEndpoint: chromeWebsocket})
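Putting those two pieces together, a sketch of fetching the WebSocket endpoint from Chrome's /json/version endpoint (port 9220 as in the command above) and connecting to it might be:

const http = require('http');
const puppeteer = require('puppeteer');

// Fetch the webSocketDebuggerUrl from the already-running Chrome
function getWebSocketEndpoint() {
  return new Promise((resolve, reject) => {
    http.get('http://127.0.0.1:9220/json/version', (res) => {
      let body = '';
      res.on('data', (chunk) => (body += chunk));
      res.on('end', () => resolve(JSON.parse(body).webSocketDebuggerUrl));
    }).on('error', reject);
  });
}

(async () => {
  const chromeWebsocket = await getWebSocketEndpoint();
  const browser = await puppeteer.connect({ browserWSEndpoint: chromeWebsocket });
  const page = await browser.newPage(); // opens a tab in the existing window
  await page.goto('https://example.com');
  await browser.disconnect();
})();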