Puppeteer proxy connection with VPN Gate - node.js

I've started a small project with Node.js and Puppeteer that requires a proxy, and I've had some problems connecting through VPN Gate's proxy servers.
This is the code I've used so far:
const puppeteer = require('puppeteer');
// ipGeneration is my own CSV-parsing module (not shown here).
async function getIpTest(){
  const ips = await new ipGeneration(40);
  console.log(ips['#HostName']);
  const proxConnect = '--proxy-server=' + ips['#HostName'] + '.opengw.net';
  const browser = await puppeteer.launch({
    headless: false,
    ignoreHTTPSErrors: true,
    args: [proxConnect]
  });
  const page = await browser.newPage();
  // The Basic scheme needs a space before the base64 credentials.
  await page.setExtraHTTPHeaders({'Proxy-Authorization': 'Basic ' + Buffer.from('vpn:vpn').toString('base64')});
  await page.goto('http://www.whatsmyip.org/');
}
where IPGeneration() is just a module I made to parse their CSV file, and
proxConnect = '--proxy-server=' + ips['#HostName'] + '.opengw.net';
is part of the parsing; it yields the same results if I put the string directly into the puppeteer.launch args.
I tried changing the port, and not using one at all. I tried a dozen different proxy addresses, and tried connecting directly to both the IP and the hostname.
I've looked everywhere online but can't find why it isn't working (I should mention that everything works when I launch Puppeteer without the proxy).
Is it just VPN Gate that won't work with Puppeteer?
EDIT: While experimenting, I saw that they provide config data for connecting through OpenVPN. Could a simple working solution be Node > OpenVPN > VPN Gate servers? I'll try this now.
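A common way to authenticate against a proxy in Puppeteer is page.authenticate(), which answers the proxy's credential challenge, rather than a hand-built Proxy-Authorization header. The sketch below assumes a generic HTTP proxy at a host and port you supply; whether a given VPN Gate server accepts plain HTTP proxy traffic at all is not established here (as the EDIT notices, they publish OpenVPN configs). buildProxyArg and openThroughProxy are hypothetical helper names.

```javascript
// Sketch only: open a page through an HTTP proxy that requires credentials.
// The host, port and credentials are placeholders supplied by the caller.

// Build Chromium's --proxy-server flag with an explicit scheme and port.
function buildProxyArg(host, port, scheme = 'http') {
  if (!host || !port) {
    throw new Error('proxy host and port are required');
  }
  return `--proxy-server=${scheme}://${host}:${port}`;
}

async function openThroughProxy(url, { host, port, username, password }) {
  // Required here so buildProxyArg can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    headless: false,
    args: [buildProxyArg(host, port)],
  });
  try {
    const page = await browser.newPage();
    // Let Puppeteer answer the proxy's 407 challenge instead of
    // setting a Proxy-Authorization header by hand.
    await page.authenticate({ username, password });
    await page.goto(url);
    return await page.content();
  } finally {
    await browser.close();
  }
}

module.exports = { buildProxyArg, openThroughProxy };
```

For example: openThroughProxy('http://www.whatsmyip.org/', { host: 'proxy.example.com', port: 8080, username: 'vpn', password: 'vpn' }), where proxy.example.com:8080 is a placeholder.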

Related

Node Puppeteer Very slow when connected to VPN

Background: Win10 host running Hyper-V with a Win10 VM and a Linux/Docker VM (using the official Node Docker image with Xvfb, based on https://github.com/beemi/puppeteer-headful). Both are configured with PIA VPN (the Win10 VM using PIA's app, the Linux/Docker VM using OpenVPN).
See test code below:
const puppeteer = require('puppeteer')
async function getpage() {
  const browser = await puppeteer.launch({
    executablePath: process.env.PUPPETEER_EXEC_PATH,
    headless: false,
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--font-render-hinting=none', '--disable-dev-shm-usage'],
    ignoreDefaultArgs: ['--enable-automation'] // drop the --enable-automation default flag
  })
const page = await browser.newPage()
await page.setDefaultNavigationTimeout(0);
console.log(new Date().toLocaleTimeString())
await page.goto('https://www.stackoverflow.com', {waitUntil: 'networkidle0'})
console.log(new Date().toLocaleTimeString())
await page.close()
await browser.close()
}
getpage()
Win10 VM connected to the VPN: 3 seconds
Docker VM connected to the VPN: 132 seconds
Docker VM not connected to the VPN: 1 second
Note that headless: false/true does not affect the time.
Since the Win10 VM is fast, I don't think there is any issue with the VPN. I've tried curling a large file inside the Docker container over the VPN and got fast speeds, so I don't think it's an issue with the container (or the VPN). I've also tried a variety of modern websites and got similar results.
For some reason, Node or Puppeteer does not seem to like going through a VPN.
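One way to narrow this down (a diagnostic sketch, not a known fix) is to log how long each request takes. waitUntil: 'networkidle0' only resolves once there have been no network connections for at least 500 ms, so a few requests that hang over the VPN could account for most of the delay; the sketch tests that hypothesis. traceRequests and summarise are hypothetical helper names, and the launch flags mirror the question's.

```javascript
// Diagnostic sketch: record how long each request takes during a navigation,
// to see whether a few stalled requests are what delays 'networkidle0'.

// Attach listeners to a Puppeteer page; returns a Map of Request -> timing entry.
function traceRequests(page) {
  const log = new Map();
  page.on('request', (req) => {
    log.set(req, { url: req.url(), start: Date.now(), end: undefined });
  });
  const done = (req) => {
    const entry = log.get(req);
    if (entry) entry.end = Date.now();
  };
  page.on('requestfinished', done);
  page.on('requestfailed', done);
  return log;
}

// Summarise timing entries: the n slowest finished requests plus unfinished ones.
function summarise(entries, n = 5) {
  const list = [...entries];
  const finished = list
    .filter((e) => e.end !== undefined)
    .map((e) => ({ url: e.url, ms: e.end - e.start }))
    .sort((a, b) => b.ms - a.ms)
    .slice(0, n);
  const pending = list.filter((e) => e.end === undefined).map((e) => e.url);
  return { finished, pending };
}

async function main(url = 'https://www.stackoverflow.com') {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
  });
  const page = await browser.newPage();
  const log = traceRequests(page);
  const t0 = Date.now();
  try {
    await page.goto(url, { waitUntil: 'networkidle0', timeout: 180000 });
  } catch (err) {
    console.error('navigation failed:', err.message);
  }
  console.log(`navigation took ${Date.now() - t0} ms`);
  console.log(summarise(log.values()));
  await browser.close();
}

module.exports = { traceRequests, summarise, main };
```

Calling main() prints the total navigation time and the slowest or still-pending requests; comparing the Docker VM with and without the VPN may show whether specific hosts are stalling.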

Running my Puppeteer app within PM2's cluster mode doesn't take advantage of the multiple processes

While running my Puppeteer app with PM2's cluster mode enabled, during concurrent requests, only one of the processes seems to be utilized instead of all 4 (1 for each of my cores). Here's the basic flow of my program:
helpers.startChrome()
  .then((resp) => {
    http.createServer(async function (req, res) {
      // /json/version reports the endpoint as webSocketDebuggerUrl
      const {webSocketDebuggerUrl} = JSON.parse(resp.body);
      const browser = await puppeteer.connect({browserWSEndpoint: webSocketDebuggerUrl});
      const page = await browser.newPage();
      ... //do puppeteer stuff
      await page.close();
      await browser.disconnect();
    })
  })
and here is the startChrome() function:
startChrome: async function () {
  const opts = {
    //chromeFlags: ["--no-sandbox", "--headless", "--use-gl=egl"],
    userDataDir: "D:/pupeteercache",
    output: 'json'
  };
  // Launch chrome using chrome-launcher.
  const chrome = await chromeLauncher.launch(opts);
  opts.port = chrome.port;
  // Fetch the DevTools endpoint that puppeteer.connect() will use.
  const resp = await util.promisify(request)(`http://localhost:${opts.port}/json/version`);
  return resp;
}
First, I use a package called chrome-launcher to start Chrome, then I set up a simple HTTP server that listens for incoming requests to my app. When a request is received, I connect to the Chrome endpoint I set up through chrome-launcher at the beginning.
When I run this app in PM2's cluster mode, 4 separate Chrome tabs are opened (I'm not sure why it works this way, but all right), and everything seems to run fine. But when I send the server 10 concurrent requests to see whether all processes are used, only the first one is. I know this because when I run pm2 monit, only the first process is using any memory.
Can someone explain why all the processes aren't utilized? Is it because I'm using chrome-launcher to run only one browser with multiple tabs instead of multiple browsers?
You cannot use the same user data directory for multiple instances at the same time. If you pass a user data directory that is already in use, then no matter which launcher you use, Chrome hands the request to the running process and opens a new tab there instead.
When you launch the browser through Puppeteer without a user data directory, it creates a temporary profile each time. So if you want to use 4 instances, pass a different user data directory to each one.
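Following the answer, a minimal sketch that derives a distinct user data directory for each PM2 worker from NODE_APP_INSTANCE, the environment variable PM2 sets per instance in cluster mode. It assumes chrome-launcher's launch() with its userDataDir option and Node 18+ for the global fetch (used here in place of the request package); chrome-launcher is loaded with a dynamic import, which works whether the installed version is CommonJS or an ES module. userDataDirFor and the profile-N naming are hypothetical.

```javascript
// Sketch: one Chrome profile directory per PM2 cluster worker, so each worker
// starts its own Chrome instead of reusing the first one's process.
const path = require('path');

// PM2 cluster mode sets NODE_APP_INSTANCE to 0, 1, 2, ... for each worker.
function userDataDirFor(baseDir, instance = process.env.NODE_APP_INSTANCE || '0') {
  return path.join(baseDir, `profile-${instance}`);
}

async function startChrome(baseDir = 'D:/pupeteercache') {
  const { launch } = await import('chrome-launcher');
  const chrome = await launch({ userDataDir: userDataDirFor(baseDir) });
  // /json/version exposes the browser-level DevTools WebSocket URL.
  const res = await fetch(`http://localhost:${chrome.port}/json/version`);
  const { webSocketDebuggerUrl } = await res.json();
  return { chrome, webSocketDebuggerUrl };
}

module.exports = { userDataDirFor, startChrome };
```

Each worker then passes its own webSocketDebuggerUrl to puppeteer.connect({ browserWSEndpoint }), so concurrent requests handled by different workers use different Chrome processes.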

Firefox proxy server for Puppeteer Node.js

While setting up a proxy for Puppeteer on Node.js, I ran into a few problems. I'm on Linux Mint 19, running Puppeteer on Node.js. Everything works when I run this:
const puppeteer = require('puppeteer');
const pptrFirefox = require('puppeteer-firefox');
(async () => {
const browser = await puppeteer.launch({
headless: false,
args:[ '--proxy-server=socks5://127.0.0.1:9050']
});
const page = await browser.newPage();
await page.goto('http://www.whatismyproxy.com/');
await page.screenshot({path: 'example.png'}).then(()=>{console.log("I took screenshot")});
await browser.close();
})();
The proxy is the Tor app running on the system. While my IP is changed and privacy works, Google and other websites recognize me as a bot (even with the proxy off). When I switch to puppeteer-firefox, the proxy flags do not work, but I am not recognized as a bot.
My goal is to not be recognized as a bot and to run my Puppeteer session incognito (in the future from Tails Linux, through a proxy). I look forward to your answers :). I assure you this is only for development purposes. Regards to all.
Although Puppeteer and Puppeteer-Firefox share the same API, the flags you pass in args are browser-specific.
Firefox doesn't support passing a proxy as a command-line argument. But you can create a profile and launch Firefox using that profile. There are many posts explaining how to create a profile and launch Firefox with it. This is one of them.
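The answer's route is a Firefox profile whose preferences set the proxy. As a sketch of the same idea, the prefs below point Firefox at Tor's SOCKS5 port; they can be written into a profile's user.js, or, with a recent Puppeteer that can launch Firefox, passed through the extraPrefsFirefox launch option (a different mechanism from the profile approach). The browser: 'firefox' option and extraPrefsFirefox are assumptions about your Puppeteer version; torProxyPrefs and toUserJs are hypothetical helpers.

```javascript
// Sketch: Firefox takes its proxy settings from preferences rather than
// command-line flags, so express Tor's SOCKS5 proxy as prefs.

// Firefox preferences for a SOCKS5 proxy (the Tor daemon listens on 127.0.0.1:9050 by default).
function torProxyPrefs(host = '127.0.0.1', port = 9050) {
  return {
    'network.proxy.type': 1, // 1 = manual proxy configuration
    'network.proxy.socks': host,
    'network.proxy.socks_port': port,
    'network.proxy.socks_version': 5,
    'network.proxy.socks_remote_dns': true, // resolve hostnames through the proxy too
  };
}

// Serialise prefs as user.js lines, for the profile-based approach.
function toUserJs(prefs) {
  return Object.entries(prefs)
    .map(([key, value]) => `user_pref(${JSON.stringify(key)}, ${JSON.stringify(value)});`)
    .join('\n');
}

// Assumes a recent Puppeteer that can drive Firefox and accepts extraPrefsFirefox.
async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    browser: 'firefox', // older Puppeteer versions used product: 'firefox'
    headless: false,
    extraPrefsFirefox: torProxyPrefs(),
  });
  const page = await browser.newPage();
  await page.goto('http://www.whatismyproxy.com/');
  await page.screenshot({ path: 'example.png' });
  await browser.close();
}

module.exports = { torProxyPrefs, toUserJs, main };
```

For the profile route, write toUserJs(torProxyPrefs()) into the profile directory's user.js before launching Firefox with that profile.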

Running puppeteer with containerized chrome binary from another container

I want my Puppeteer code running in one container to use a Chrome binary from another container (perhaps via the executablePath launch option?). Is this possible? Is there any known solution?
Use case:
Worker code runs in multiple k8s pods (as containers). Sometimes (which might be often or rarely), the worker needs to run code that uses Puppeteer. I don't want to make the Docker image gigantic and as limited as the Puppeteer/Chrome container is (1.5 GB, if I recall correctly); I just want my code to be supplied with the needed binary from another running container.
Notice: this is not a question about containerizing puppeteer, I know that's a possibility
Building on this answer here and here, this is how you can do it. The idea is to run Chrome in one Docker container, connect to it from another, and use it whenever needed. It will need some maintenance, error handling, timeouts and concurrency handling, but that is beyond the scope here.
Master
On the master, install Puppeteer without downloading Chromium (set PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true) and use it to connect to the worker Puppeteer instances running in the other container.
const browser = await puppeteer.connect({
browserWSEndpoint: "ws://123.123.123.123:8080",
ignoreHTTPSErrors: true
});
Worker
Set up a fully working Chrome here and expose its WebSocket endpoint. There are several ways to do this; here is the simplest one.
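const http = require('http');
const httpProxy = require('http-proxy');
const puppeteer = require('puppeteer');
// createProxyServer() is a factory function, so `new` is not needed.
const proxy = httpProxy.createProxyServer();
proxy.on('error', (err) => console.error('proxy error:', err.message));
http
  .createServer()
  .on('upgrade', async (req, socket, head) => {
    // Launch a browser for each incoming WebSocket connection.
    const browser = await puppeteer.launch();
    const target = browser.wsEndpoint();
    // Close the browser when the client disconnects so instances don't pile up.
    socket.on('close', () => browser.close());
    proxy.ws(req, socket, head, { target });
  })
  .listen(8080);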
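The answer leaves error handling and timeouts to the reader. A minimal sketch for the master side, assuming the worker is reachable at a WebSocket URL like the one shown for the master: it wraps puppeteer.connect in a timeout and retries a few times before giving up. withTimeout and connectWithRetry are hypothetical helpers, and the defaults (3 attempts, 1 s apart, 10 s timeout) are illustrative.

```javascript
// Sketch: connect from the master to the worker's Chrome with a timeout and
// a few retries. Attempt counts and delays are illustrative defaults.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function connectWithRetry(endpoint, options = {}) {
  const { attempts = 3, delayMs = 1000, timeoutMs = 10000, connect } = options;
  // Defaults to puppeteer.connect; `connect` can be swapped out (e.g. in tests).
  const doConnect = connect || ((opts) => require('puppeteer').connect(opts));
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await withTimeout(doConnect({ browserWSEndpoint: endpoint }), timeoutMs);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError;
}

module.exports = { withTimeout, connectWithRetry };
```

On the master, const browser = await connectWithRetry('ws://123.123.123.123:8080'); then call browser.disconnect() when finished.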

Auto allow webcam access using Puppeteer for Node.js

I'm setting up a test that involves starting a webcam video session.
So far everything works and requires no user interaction except granting access to the webcam.
When the third party library I'm using makes the call: navigator.mediaDevices.getUserMedia({audio: true, video: true})
the browser opens a prompt asking the user to allow access.
What I'm looking for is a way to grant access without user interaction.
I've tried Puppeteer's page.on('dialog', ...), but it isn't called for the webcam access prompt.
Please let me know if you have any ideas.
Google Chrome has a launch option, --use-fake-ui-for-media-stream, that skips the getUserMedia permission prompt.
You can set it with Puppeteer like this:
const puppeteer = require('puppeteer')
;(async () => {
const browser = await puppeteer.launch({
args: [ '--use-fake-ui-for-media-stream' ]
})
const page = await browser.newPage()
await page.goto('http://localhost/start-video-test.html')
const startVideoButton = await page.$('#startVideoButton')
  await startVideoButton.click()
// video session starts without prompt
return browser.close()
})()
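Two related options may also help, sketched under assumptions: Chrome's --use-fake-device-for-media-stream flag supplies a synthetic camera and microphone (handy where no webcam exists), and Puppeteer's BrowserContext.overridePermissions can grant camera and microphone for an origin instead of relying on the prompt-skipping flag. The localhost URL and #startVideoButton come from the question; mediaArgs is a hypothetical helper, and the getUserMedia check assumes the page is served from localhost (a secure context).

```javascript
// Sketch: launch options for unattended webcam tests, plus an explicit
// permission grant as an alternative to the prompt-skipping flag.

function mediaArgs({ fakeDevice = false } = {}) {
  // Auto-accepts the getUserMedia permission prompt.
  const args = ['--use-fake-ui-for-media-stream'];
  // Feeds a synthetic camera/microphone, useful on CI machines without hardware.
  if (fakeDevice) args.push('--use-fake-device-for-media-stream');
  return args;
}

async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ args: mediaArgs({ fakeDevice: true }) });
  // Alternative to the flag: grant camera and microphone for this origin up front.
  await browser
    .defaultBrowserContext()
    .overridePermissions('http://localhost', ['camera', 'microphone']);
  const page = await browser.newPage();
  await page.goto('http://localhost/start-video-test.html');
  // Check directly that getUserMedia resolves without a prompt.
  const trackCount = await page.evaluate(async () => {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
    const count = stream.getTracks().length;
    stream.getTracks().forEach((track) => track.stop());
    return count;
  });
  console.log(`getUserMedia returned ${trackCount} track(s)`);
  await page.click('#startVideoButton');
  await browser.close();
}

module.exports = { mediaArgs, main };
```

Either the flag or overridePermissions alone should be enough to avoid the prompt; both appear here only to show where each goes.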

Resources