Puppeteer is detected on ubuntu server but not locally - node.js

I have a Puppeteer script that watches TikTok live streams. When I run it locally it works as expected, but on an Ubuntu 20.04 LTS server the live stream page loads, the stream never starts, and the page asks me to log in, which doesn't happen locally. Any ideas on how to bypass that detection?
Settings
const puppeteer = require("puppeteer-extra");
const { Cluster } = require("puppeteer-cluster");
// Use stealth plugin to bypass bot detection
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
const AnonymizeUA = require("puppeteer-extra-plugin-anonymize-ua");
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
puppeteer.use(StealthPlugin());
puppeteer.use(AnonymizeUA());
puppeteer.use(AdblockerPlugin({ blockTrackers: true }));
(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 60000,
    timeout: 86400000,
    puppeteer: puppeteer,
    retryLimit: 10,
    retryDelay: 1000,
    puppeteerOptions: {
      headless: true,
      timeout: 120000, //360000
      args: [
        "--start-maximized",
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
        "--disable-accelerated-2d-canvas",
        "--no-first-run",
        "--no-zygote",
        "--disable-gpu",
      ],
      executablePath: "/snap/bin/chromium",
      defaultViewport: null,
    },
  });
  // ... (rest of the script not shown)
})();
Even when I install a GUI on the machine and visit the live stream manually, I still can't view it, so maybe it has something to do with the machine being detected as a server?
Thanks a lot!

Related

Puppeteer ERR_SSL_BAD_RECORD_MAC_ALERT with proxy

I use Puppeteer with an HTTP proxy.
This is the Puppeteer config:
let config = {
  userDataDir: `./puppeteer-cache/dev_chrome_profile_${hash}`,
  headless: false,
  args: [
    `--proxy-server=${newProxyUrl}`,
    '--ignore-certificate-errors',
    '--disable-web-security',
    '--disable-features=IsolateOrigins,site-per-process,BlockInsecurePrivateNetworkRequests',
    '--disable-site-isolation-trials'
  ],
  defaultViewport: null,
  ignoreHTTPSErrors: true
}
Sometimes I get this error:
This site can’t be reached. The webpage at https://some.site.com might be temporarily down or it may have moved permanently to a new web address.
ERR_SSL_BAD_RECORD_MAC_ALERT
when I call page.goto('https://some.site.com'):
const page = await browser.newPage();
if (proxy && proxy.login && proxy.pass) {
  await page.authenticate({
    username: proxy.login,
    password: proxy.pass,
  });
}
try {
  console.log('POINT 13');
  await page.goto(films[i].url);
  console.log('POINT 14');
  return;
} catch (e) {
  console.log('POINT 15');
  return;
}
I see POINT 13 in my console, but neither POINT 14 nor POINT 15. The script seems to freeze between points 13 and 14, inside page.goto().
I have tried changing the timeout for page.goto(), but it doesn't help.
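One way to at least turn a silent hang like this into a visible error is to race the navigation against a timer. The following is only a minimal sketch: withTimeout is a hypothetical helper, not part of Puppeteer, and it does not address the underlying ERR_SSL_BAD_RECORD_MAC_ALERT. It only ensures that the catch block (POINT 15) is reached if page.goto() never settles.

```javascript
// Hypothetical helper (not a Puppeteer API): rejects if `promise` does not
// settle within `ms` milliseconds, so a stuck call surfaces as an error.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Assumed usage inside the try block from the question:
// await withTimeout(page.goto(films[i].url), 60000, 'page.goto');
```

Note that the original page.goto() call keeps running in the background after the timer wins, so the page or browser should probably be closed in the catch block.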

Nodejs puppeteer RequestError: self signed certificate

I'm running puppeteer-extra-plugin-stealth on localhost, and it threw RequestError: self signed certificate after Chromium opened.
Previously, when I faced this error, I solved it by adding --ignore-certificate-errors to the args array, but that no longer works. I saw some discussions suggesting adding --ignore-certificate-errors-spki-list as well, but it still doesn't work. Making npm ignore the CA doesn't help either.
const StealthPlugin = require('puppeteer-extra-plugin-stealth')();
const puppeteer = require('puppeteer-extra');
StealthPlugin.onBrowser = () => { };
StealthPlugin.enabledEvasions.delete('chrome.runtime')
StealthPlugin.enabledEvasions.delete('iframe.contentWindow')
puppeteer.use(StealthPlugin);
puppeteer.launch({
  headless: false,
  userDataDir: '/',
  args: [
    '--no-sandbox',
    '--proxy-server="direct://"',
    '--proxy-bypass-list=*',
    '--start-fullscreen',
    '--ignore-certificate-errors',
    '--ignore-certificate-errors-spki-list'
  ]
}).then(async browser => {})
Is there any way I can skip this certificate check? Thank you.

Puppeteer nodejs project keeps freezing

I have a Node.js project running Puppeteer v13.5.1 that does some web scraping.
After some time (mostly 40-80 minutes) the process freezes without throwing any error. It just stops.
I've added some logs, and the strange thing is that it freezes at a different call each time.
Sometimes it freezes on
const refreshedHtml = await page.evaluate(() => document.documentElement.innerHTML);
sometimes on
await page.click('button.swiper-button-next');
I've tried many different variations, last one being:
const browser = await puppeteer.launch({
  headless: true,
  devtools: true,
  args: [
    '--ignore-certificate-errors',
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-accelerated-2d-canvas',
    '--disable-gpu'
  ]
});
Any help appreciated.

Puppeteer error while running in ubuntu machine

When I run Puppeteer on Ubuntu, I get this error:
UnhandledPromiseRejectionWarning: Error: Unable to launch browser, error message: Failed to launch the browser process!
[2098647:2098647:0520/162023.317120:ERROR:vaapi_wrapper.cc(594)] Could not get a valid VA display
[2098647:2098647:0520/162023.317252:ERROR:gpu_init.cc(426)] Passthrough is not supported, GL is egl
TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md
at Cluster.<anonymous> (/root/Desktop/Copart/node_modules/puppeteer-cluster/dist/Cluster.js:119:23)
at Generator.throw (<anonymous>)
at rejected (/root/Desktop/Copart/node_modules/puppeteer-cluster/dist/Cluster.js:6:65)
at process._tickCallback (internal/process/next_tick.js:68:7)
Here are my puppeteer options:
pupOptions: {
  headless: false,
  args: [
    "--incognito",
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--disable-setuid-sandbox",
    "--no-first-run",
    "--no-sandbox",
    "--no-zygote",
  ],
  defaultViewport: null,
  slowMo: 10,
  sameDomainDelay: 1000,
  retryDelay: 3000,
  workerCreationDelay: 3000,
  timeout: 30000000,
  userDataDir: "/root/.config/google-chrome",
  executablePath: "/opt/google/chrome/google-chrome",
}
Also, here are the plugins that I use:
const puppeteer = require("puppeteer-extra");
const RecaptchaPlugin = require("puppeteer-extra-plugin-recaptcha");
I tried killing the Google Chrome instance before running the code, but it still didn't work.
Also, I would like to mention that it works when using "puppeteer-cluster"
Does anyone have an idea or a solution for this? Thanks a lot for the help!
I had to remove "--disable-gpu" from args.
If you are running Puppeteer on an Ubuntu server, try changing
headless: false
to
headless: true
If there is no GUI on the system, there is no display on which the browser window can be shown.
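To avoid editing the option by hand between a desktop and a server, the choice can be derived from the environment. This is a rough sketch under an assumption that is not from the answer itself: on Linux, a missing DISPLAY (and WAYLAND_DISPLAY) variable is treated as "no GUI". shouldRunHeadless is a hypothetical helper name.

```javascript
// Hypothetical helper: decide whether to launch headless.
// Assumption: on Linux, no DISPLAY and no WAYLAND_DISPLAY means there is
// no graphical session for the browser window to appear in.
function shouldRunHeadless(env = process.env, platform = process.platform) {
  if (platform !== "linux") return false; // assumed to be a desktop OS
  return !env.DISPLAY && !env.WAYLAND_DISPLAY;
}

const launchOptions = {
  headless: shouldRunHeadless(),
  args: ["--no-sandbox", "--disable-setuid-sandbox"],
};

// Assumed usage: const browser = await puppeteer.launch(launchOptions);
```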

Puppeteer error: Navigation failed because browser has disconnected

I am using Puppeteer on Google App Engine with Node.js.
Whenever I run Puppeteer on App Engine, I get an error saying
Navigation failed because browser has disconnected!
This works fine in my local environment, so I am guessing it is a problem with App Engine.
const browser = await puppeteer.launch({
  ignoreHTTPSErrors: true,
  headless: true,
  args: ["--disable-setuid-sandbox", "--no-sandbox"],
});
This is my App Engine app.yaml file:
runtime: nodejs12
env: standard
handlers:
  - url: /.*
    secure: always
    script: auto
-- EDIT --
It works when I add the --disable-dev-shm-usage argument, but then it always times out. Here is my code:
const browser = await puppeteer.launch({
  ignoreHTTPSErrors: true,
  headless: true,
  args: [
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--no-first-run",
    "--no-zygote",
    "--single-process",
  ],
});
const page = await browser.newPage();
try {
  const url = "https://seekingalpha.com/market-news/1";
  const pageOption = {
    waitUntil: "networkidle2",
    timeout: 20000,
  };
  await page.goto(url, pageOption);
} catch (e) {
  console.log(e);
  await page.close();
  await browser.close();
  return resolve("error at 1");
}
try {
  const ulSelector = "#latest-news-list";
  await page.waitForSelector(ulSelector, { timeout: 30000 });
} catch (e) {
  // ALWAYS TIMEOUTS HERE!
  console.log(e);
  await page.close();
  await browser.close();
  return resolve("error at 2");
}
...
It seems the problem was App Engine's memory capacity.
When there is not enough memory to handle the Puppeteer crawl,
App Engine automatically starts another instance.
However, the newly created instance has a different Puppeteer browser,
which results in "Navigation failed because browser has disconnected".
The solution is simply to upgrade the App Engine instance class so that a single instance can handle the crawling job.
The default instance class is F1, which has 256 MB of memory, so I upgraded to F4, which has 1 GB of memory, and the error message no longer appears.
runtime: nodejs12
instance_class: F4
handlers:
  - url: /.*
    secure: always
    script: auto
For me, the error was solved when I stopped using the --use-gl=swiftshader arg.
It is included by default if you use args: chromium.args from chrome-aws-lambda.
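Dropping that one flag while keeping the rest of the default arguments could look roughly like this. The exampleArgs array is a stand-in invented for illustration; in a real setup the list would be chromium.args from chrome-aws-lambda, and withoutFlag is a hypothetical helper name.

```javascript
// Hypothetical helper: return a copy of the launch arguments without `flag`.
function withoutFlag(args, flag) {
  return args.filter((arg) => arg !== flag);
}

// Stand-in for chromium.args (illustrative values only).
const exampleArgs = ["--no-sandbox", "--use-gl=swiftshader", "--single-process"];
const filteredArgs = withoutFlag(exampleArgs, "--use-gl=swiftshader");

// Assumed usage: puppeteer.launch({ args: withoutFlag(chromium.args, "--use-gl=swiftshader") })
```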
I was getting that error in a deployment. The solution was to change the options passed to waitForNavigation:
{ waitUntil: "domcontentloaded", timeout: 60000 }
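As a sketch of how those options might be applied, assuming the navigation is triggered by a click (the answer does not say what triggers it), the wait can be started before the click so the navigation is not missed. clickAndWaitForNavigation is a hypothetical helper name; page.waitForNavigation and page.click are the Puppeteer calls it relies on.

```javascript
// Options from the answer: resolve once the DOM has been parsed, and allow
// up to 60 seconds before giving up.
const NAV_OPTIONS = { waitUntil: "domcontentloaded", timeout: 60000 };

// Hypothetical helper: `page` is a Puppeteer Page and `selector` is whatever
// element triggers the navigation (an assumption for this example).
async function clickAndWaitForNavigation(page, selector) {
  // Start waiting first, then click, and await both together.
  const [response] = await Promise.all([
    page.waitForNavigation(NAV_OPTIONS),
    page.click(selector),
  ]);
  return response;
}
```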

Resources