I am using Puppeteer with an HTTP proxy. This is my Puppeteer config:
let config = {
  userDataDir: `./puppeteer-cache/dev_chrome_profile_${hash}`,
  headless: false,
  args: [
    `--proxy-server=${newProxyUrl}`,
    '--ignore-certificate-errors',
    '--disable-web-security',
    '--disable-features=IsolateOrigins,site-per-process,BlockInsecurePrivateNetworkRequests',
    '--disable-site-isolation-trials'
  ],
  defaultViewport: null,
  ignoreHTTPSErrors: true
}
Sometimes, when I call page.goto('https://some.site.com'), I get this error:
This site can’t be reached. The webpage at https://some.site.com might be temporarily down or it may have moved permanently to a new web address.
ERR_SSL_BAD_RECORD_MAC_ALERT
Here is the code:
const page = await browser.newPage();
if (proxy && proxy.login && proxy.pass) {
  await page.authenticate({
    username: proxy.login,
    password: proxy.pass,
  });
}
try {
  console.log('POINT 13');
  await page.goto(films[i].url);
  console.log('POINT 14');
  return;
} catch (e) {
  console.log('POINT 15');
  return;
}
I see POINT 13 in my console, but neither POINT 14 nor POINT 15. The script freezes between points 13 and 14, inside page.goto(). I have tried changing the timeout for page.goto(), but it doesn't help.
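One thing worth checking (an assumption, not a confirmed cause): Chromium's --proxy-server flag only accepts scheme://host:port, so credentials embedded in newProxyUrl can produce odd TLS-level failures. A minimal sketch of splitting a proxy URL so the credentials go to page.authenticate() instead (splitProxyUrl is a hypothetical helper, not part of Puppeteer):

```javascript
// Sketch: --proxy-server accepts only scheme://host:port, so strip any
// inline credentials and pass them to page.authenticate() instead.
// splitProxyUrl is an illustrative helper, not a Puppeteer API.
function splitProxyUrl(proxyUrl) {
  const u = new URL(proxyUrl);
  return {
    server: `${u.protocol}//${u.hostname}:${u.port}`, // for --proxy-server=...
    login: decodeURIComponent(u.username),            // for page.authenticate()
    pass: decodeURIComponent(u.password),
  };
}

// e.g. const { server, login, pass } = splitProxyUrl('http://user:secret@127.0.0.1:3128');
```

With that, `--proxy-server=${server}` carries no credentials and page.authenticate({ username: login, password: pass }) supplies them, which is the combination Puppeteer documents.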
I have a Puppeteer script to watch TikTok live streams. When I run it locally it works as expected, but on an Ubuntu 20.04 LTS server the page for the live stream loads, the stream never starts, and the site asks me to log in, which it doesn't do locally. Any ideas on how to bypass that detection?
Settings
const puppeteer = require("puppeteer-extra");
const { Cluster } = require("puppeteer-cluster");
// Use stealth plugins to bypass bot detection
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
const AnonymizeUA = require("puppeteer-extra-plugin-anonymize-ua");
const AdblockerPlugin = require("puppeteer-extra-plugin-adblocker");

puppeteer.use(StealthPlugin());
puppeteer.use(AnonymizeUA());
puppeteer.use(AdblockerPlugin({ blockTrackers: true }));

(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 60000,
    timeout: 86400000,
    puppeteer: puppeteer,
    retryLimit: 10,
    retryDelay: 1000,
    puppeteerOptions: {
      headless: true,
      timeout: 120000, //360000
      args: [
        "--start-maximized",
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
        "--disable-accelerated-2d-canvas",
        "--no-first-run",
        "--no-zygote",
        "--disable-gpu",
      ],
      executablePath: "/snap/bin/chromium",
      defaultViewport: null,
    },
  });
  // ...
})();
Even when I install a GUI on the machine and visit the live stream manually, I still can't view it, so maybe it has something to do with the server being detected as a server?
Thanks a lot!
I'm trying to access a website that only lets me through if I change the value of some cookies. However, whenever I try to change the value of the __Secure-3PSIDCC cookie, Puppeteer tells me it is an invalid cookie field:
DevTools listening on ws://127.0.0.1:33837/devtools/browser/e3f433a2-8a43-481b-9a2f-b6aa4a923228
MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
[27423:27516:0915/232908.850008:ERROR:nss_util.cc(349)] After loading Root Certs, loaded==false: NSS error code: -8018
/home/bonk/Kisa/testing/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:298
error: new Errors_js_1.ProtocolError(),
^
ProtocolError: Protocol error (Network.setCookies): Invalid cookie fields
at /home/bonk/Kisa/testing/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:298:24
at new Promise (<anonymous>)
at CDPSession.send (/home/bonk/Kisa/testing/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:294:16)
at Page.setCookie (/home/bonk/Kisa/testing/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:869:67)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async f (/home/bonk/Kisa/testing/test.js:75:5) {
originalMessage: 'Invalid cookie fields'
Here is my code:
const puppeteer = require('puppeteer');

const f = (async () => {
  const browser = await puppeteer.launch({
    dumpio: true,
    headless: false
  });
  const page = await browser.newPage();
  await page.setUserAgent(
    "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
  );
  const cookies = [
    {
      name: '__Secure-3PSIDCC',
      value: 'hello',
      domain: '.google.com'
    }
  ];
  await page.setCookie(...cookies);
  await page.goto("https://google.com");
  //await page._client.send("Page.setDownloadBehavior", {
  //  behavior: "allow",
  //  downloadPath: "./anime/" + titlee
  //})
  setTimeout(async () => {
    const link = await page.evaluate(() => document.querySelector('*').outerHTML);
    await page.mouse.click(782, 354);
    setTimeout(async () => {
      await page.bringToFront();
      await page.mouse.click(782, 354);
    }, 10);
  }, 10000);
});
f();
Changing the value of the __Secure-3PSIDCC cookie works in Firefox, but Chromium throws the error above when I try to change the cookie value.
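A likely explanation (an assumption based on Chromium's cookie name-prefix rules, which Firefox's tooling enforces less strictly): a cookie whose name starts with __Secure- must be marked Secure and set for a secure origin, so Network.setCookies rejects the cookie above because it lacks secure: true. A simplified sketch of the rule (violatesPrefixRules is an illustrative helper, not a browser API):

```javascript
// Simplified sketch of Chromium's cookie name-prefix checks:
// __Secure-* cookies require secure: true; __Host-* cookies additionally
// require path '/' and must not set an explicit domain.
function violatesPrefixRules(cookie) {
  if (cookie.name.startsWith('__Host-')) {
    return !cookie.secure || cookie.path !== '/' || 'domain' in cookie;
  }
  if (cookie.name.startsWith('__Secure-')) {
    return !cookie.secure;
  }
  return false;
}
```

If this is the cause, adding secure: true to the cookie object passed to page.setCookie() (keeping the https:// target) may get past the Invalid cookie fields error.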
I am using Ubuntu Server 18.04.5 LTS and Puppeteer 10.0.0. My problem is that the browser.newPage() function never resolves, so the console always logs Start but never 1 nor 2. I have tried a different Puppeteer version, and puppeteer-core with my own Chromium build. I even installed a VM on my PC, and it works there but not on my server.
var puppeteer = require('puppeteer')
var adresse = "https://www.google.de/"

async function test() {
  try {
    const browser = await puppeteer.launch({
      "headless": true,
      "args": [
        '--disable-setuid-sandbox',
        '--no-sandbox',
        '--disable-gpu',
      ]
    })
    console.log("Start")
    const page = await browser.newPage()
    console.log("1")
    await page.goto(adresse)
    console.log("2")
    console.log(page)
  } catch (error) {
    console.log(error)
  }
}
test()
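When newPage() silently hangs like this, it helps to wrap the call in a watchdog so the script fails with a clear error instead of freezing; launching with dumpio: true to surface Chromium's own stderr is also worth trying. This is a generic sketch (withTimeout is a hypothetical helper, not a Puppeteer API):

```javascript
// Hypothetical watchdog: reject if a promise has not settled within `ms`,
// so a hanging Puppeteer call fails loudly instead of blocking forever.
function withTimeout(promise, ms, label) {
  let timer;
  const watchdog = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  return Promise.race([promise, watchdog]).finally(() => clearTimeout(timer));
}

// e.g. const page = await withTimeout(browser.newPage(), 15000, 'newPage()');
```

The resulting error at least pins down which call is stuck, which is useful on a headless server where there is no UI to inspect.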
I have a Firebase function that creates a PDF file. Lately it times out due to a "Chrome revision"? I understand neither the error message nor what is wrong. The function works when I run it locally under macOS.
TimeoutError: Timed out after 30000 ms while trying to connect to the browser! Only Chrome at revision r818858 is guaranteed to work.
at Timeout.onTimeout (/workspace/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:204:20)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
The function:
const puppeteer = require('puppeteer');

const createPDF = async (html, outputPath) => {
  let pdf;
  try {
    const browser = await puppeteer.launch({
      args: ['--no-sandbox']
    });
    const page = await browser.newPage();
    await page.emulateMediaType('screen');
    await page.setContent(html, {
      waitUntil: 'networkidle0'
    });
    pdf = await page.pdf({
      // path: outputPath,
      format: 'A4',
      printBackground: true,
      margin: {
        top: "50px",
        bottom: "50px"
      }
    });
    await browser.close();
  } catch (e) {
    console.error(e);
  }
  return pdf;
};
TimeoutError: Timed out after 30000 ms while trying to connect to the browser!
The aforementioned error comes from the fact that, as the documentation mentions:
When you install Puppeteer, it downloads a recent version of Chromium
Every time you execute Puppeteer, a Chromium instance runs in the background and Puppeteer tries to connect to it; when it can't connect to the browser, this error is raised.
After several tests I was able to execute the Cloud Function by adding the headless parameter to the launch options. Since the documentation says it should be true by default, I don't quite understand why setting it manually allows the Cloud Function to finish correctly.
At the beginning I tried setting timeout to 0 to disable the timeout error, but it turned out not to be required: adding headless alone made the function finish correctly. If you still run into timeouts, though, you can add it.
In the end, my code looks like this:
const createPDF = async (html, outputPath) => {
  let pdf;
  try {
    const browser = await puppeteer.launch({
      args: ['--no-sandbox'],
      headless: true,
      timeout: 0
    });
    const page = await browser.newPage();
    await page.emulateMediaType('screen');
    await page.setContent(html, {
      waitUntil: 'networkidle0'
    });
    pdf = await page.pdf({
      // path: outputPath,
      format: 'A4',
      printBackground: true,
      margin: {
        top: "50px",
        bottom: "50px"
      }
    });
    await browser.close();
    console.log("Download finished"); // Added this to verify that it finishes correctly
  } catch (e) {
    console.error(e);
  }
  return pdf;
};
I am using Puppeteer on Google App Engine with Node.js. Whenever I run Puppeteer on App Engine, I get an error saying:
Navigation failed because browser has disconnected!
This works fine in my local environment, so I am guessing it is a problem with App Engine.
const browser = await puppeteer.launch({
  ignoreHTTPSErrors: true,
  headless: true,
  args: ["--disable-setuid-sandbox", "--no-sandbox"],
});
This is my app engine's app.yaml file
runtime: nodejs12
env: standard
handlers:
  - url: /.*
    secure: always
    script: auto
-- EDIT--
It works when I add the --disable-dev-shm-usage argument, but then it always times out. Here is my code.
const browser = await puppeteer.launch({
  ignoreHTTPSErrors: true,
  headless: true,
  args: [
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--no-first-run",
    "--no-zygote",
    "--single-process",
  ],
});
const page = await browser.newPage();
try {
  const url = "https://seekingalpha.com/market-news/1";
  const pageOption = {
    waitUntil: "networkidle2",
    timeout: 20000,
  };
  await page.goto(url, pageOption);
} catch (e) {
  console.log(e);
  await page.close();
  await browser.close();
  return resolve("error at 1");
}
try {
  const ulSelector = "#latest-news-list";
  await page.waitForSelector(ulSelector, { timeout: 30000 });
} catch (e) {
  // ALWAYS TIMEOUTS HERE!
  console.log(e);
  await page.close();
  await browser.close();
  return resolve("error at 2");
}
...
It turned out the problem was App Engine's memory capacity. When there isn't enough memory to handle the Puppeteer crawl, App Engine automatically spins up another instance. However, the newly created instance has a different Puppeteer browser, which results in Navigation failed because browser has disconnected.
The solution is simply to upgrade the App Engine instance class so a single instance can handle the crawling job. The default instance class is F1, which has 256 MB of memory, so I upgraded to F4, which has 1 GB of memory, and the error no longer appears.
runtime: nodejs12
instance_class: F4
handlers:
  - url: /.*
    secure: always
    script: auto
For me, the error was solved when I stopped using the --use-gl=swiftshader arg. It is included by default if you use args: chromium.args from chrome-aws-lambda.
I was getting this error in a deployment; the fix was to change some parameters in waitForNavigation:
{ waitUntil: "domcontentloaded", timeout: 60000 }
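The reasoning behind that fix: 'domcontentloaded' fires once the HTML is parsed, whereas 'networkidle0'/'networkidle2' can wait forever on pages that keep long-lived connections open. A small sketch that bundles those options into a helper (navOptions is illustrative, not a Puppeteer API):

```javascript
// Sketch: navigation options that wait only for 'domcontentloaded'
// (HTML parsed) instead of network idleness, with a generous timeout.
// navOptions is an illustrative helper, not part of Puppeteer.
function navOptions(overrides = {}) {
  return { waitUntil: 'domcontentloaded', timeout: 60000, ...overrides };
}

// Typical usage (sketch): start waiting for the navigation *before*
// triggering it, so the event is not missed:
//   await Promise.all([
//     page.waitForNavigation(navOptions()),
//     page.click('#submit'), // selector is illustrative
//   ]);
```

Pairing waitForNavigation with the action that causes the navigation inside Promise.all is the usual pattern, since awaiting them sequentially can miss a fast navigation event.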