I have been experiencing this issue for a long time now. I have a web scraper on a Windows VM that is set to run every few hours. It works most of the time, but quite often Puppeteer just opens this page 👇 instead of the site or page I want to open.
Why does that happen and what can be the fix for this?
A simple reproduction of this issue is the following code:
import puppeteer from 'puppeteer'
import { scheduleJob } from 'node-schedule';
async function run() {
const browser = await puppeteer.launch({
headless: false,
executablePath: chromePath,
defaultViewport: null,
timeout: 0,
args: ['--no-sandbox', '--start-maximized'],
});
const page = await browser.newPage();
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, 'webdriver', {
get: () => false,
});
});
await page.goto('https://aliexpress.com', {
waitUntil: 'networkidle0',
timeout: 0,
});
}
run();
scheduleJob('scrape aliexpress', `0 */${hours} * * *`, run);
I have this code:
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({
headless: false,
ignoreDefaultArgs: ["--disable-extensions"],
});
const page = await browser.newPage();
await page.goto("https://google.com");
await browser.close();
})();
When I run it, the Chromium browser opens, but:
Thank you!
I downgraded my Puppeteer version to 18.1.0 and ran my code again, and it worked as I expected. Thank you!
When I run this function on my local machine in the Firebase Emulator it works great, but when I deploy it to the cloud I get a lot of:
TimeoutError: Navigation timeout of 30000 ms exceeded
Code is very simple:
const functions = require("firebase-functions");
const puppeteer = require("puppeteer");
exports.myFunc = functions
.runWith({ memory: '2GB' })
.https.onRequest(async (request, response) => {
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const page = await browser.newPage();
await page.goto("https://example.com", { waitUntil: 'networkidle2' });
const pageContent = await page.content();
await browser.close();
response.send(pageContent);
});
I get a lot of this in the Firebase Functions log:
You can see this in the Google Cloud report too:
Why are so many runs timing out?
Imagine keeping track of a page like this (open it in Chrome, then right-click and select Translate to English):
http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=35366681030756042
When you press F12 and select the Network tab, note that responses arrive at an interval of about one per second, containing the latest prices and trades, with these HTTP header details:
{
...
connection: keep-alive
cookies: fooCookie
...
}
I have tried the GOT package with a keep-alive config:
const gotOption = {
keepAlive: true,
maxSockets: 10,
}
await got.get(url, {
agent: {
http: new HttpAgent(gotOption),
https: new HttpsAgent(gotOption),
},
})
I get just the first response, but how can I get new responses?
Is it possible to use Puppeteer for this purpose?
Well, there is a new XHR request being made every 3 to 5 seconds.
You could run a function that triggers on that specific event by intercepting the .aspx responses and running your script on each one. Here is a minimal snippet.
let puppeteer = require(`puppeteer`);
(async () => {
let browser = await puppeteer.launch({
headless: true,
});
let page = await browser.newPage();
(await browser.pages())[0].close(); // close the initial blank tab
let res = 0;
page.on('response', async (response) => {
if (response.url().includes(`.aspx`)) {
res++;
console.log(`\u001b[1;36m` + `Response ${res}: ${new Date(Date.now())}` + `\u001b[0m`);
};
});
await page.goto('http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=35366681030756042');
//await browser.close();
})();
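To act on each new response rather than only counting it, the handler can be factored out as in the minimal sketch below. The names createResponseWatcher and onMatch are hypothetical and not part of Puppeteer; the sketch only assumes that the object passed to the handler has a url() method, as Puppeteer's response objects do.

```javascript
// Hypothetical helper (not a Puppeteer API): builds a handler that can be
// passed to page.on('response', handler). It calls onMatch for every
// response whose URL contains urlFragment.
function createResponseWatcher(urlFragment, onMatch) {
  let count = 0; // number of matching responses seen so far
  return function handleResponse(response) {
    const url = response.url();
    if (!url.includes(urlFragment)) {
      return; // ignore images, scripts, and other unrelated requests
    }
    count += 1;
    // Pass the response along so the caller can read its body if needed.
    onMatch({ index: count, url, receivedAt: new Date(), response });
  };
}

// Usage with Puppeteer (commented out so the sketch runs without a browser):
// page.on('response', createResponseWatcher('.aspx', async (hit) => {
//   const body = await hit.response.text(); // may reject if the body is unavailable
//   console.log(`Response ${hit.index} at ${hit.receivedAt}`, body.length);
// }));
```

Keeping the filter separate from the Puppeteer wiring makes it easy to try the matching logic against plain objects before pointing it at the live page.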
When I try to run node app.js, I get this error:
the message is Failed to launch the browser process! spawn /Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app EACCES
What I did
I checked the folder at /Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app and the file is not zipped. It can be run.
Note:
If I launch without the path, as in the line below, it works, but I would like to use either Chrome or Chromium to open a new page.
const browser = await puppeteer.launch({headless: false});
const express = require('express');
const puppeteer = require('puppeteer');
const app = express();
(async () => {
const browser = await puppeteer.launch({headless:false, executablePath:'/Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app'});
const page = await browser.newPage();
await page.goto('https://google.com', {waitUntil: 'networkidle2'});
})().catch((error) =>{
console.error("the message is " + error.message);
});
app.listen(3000, function (){
console.log('server started');
})
If you navigate to the chrome://version/ page in this exact browser, it will show the Executable Path, which is the exact string you need to use as the executablePath launch option in Puppeteer.
Usually, Chrome's path looks like this on macOS:
/Applications/Google Chrome.app/Contents/MacOS/Google Chrome
Or something like this if Chromium is located in your node_modules folder:
/Users/iliebogdanbarbulescu/Downloads/firstProject/node_modules/chromium/lib/chromium/chrome-mac/Chromium.app/Contents/MacOS/Chromium
Now compare this with the string you used for executablePath: it differs from the one retrieved with the method mentioned above. Specifically, /Contents/MacOS/Chromium needs to be appended to the end of your current path to make it work.
Note: the Chromium bundled with Puppeteer is the version guaranteed to work with that Puppeteer release. If you plan to use other Chrome or Chromium-based browsers, you might experience unexpected issues.
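The path fix described above can be sketched as a small helper. This is only an illustration: resolveMacExecutable is a hypothetical name, and it assumes the binary inside the bundle has the same name as the .app folder. That holds for the Chromium.app and Google Chrome.app paths shown in this answer but is not guaranteed for every application, so chrome://version/ remains the reliable source.

```javascript
// Hypothetical helper: converts a macOS ".app" bundle path into the path of
// the binary inside it. Assumes the binary is named after the bundle.
function resolveMacExecutable(appPath) {
  const trimmed = appPath.replace(/\/+$/, ''); // drop any trailing slashes
  if (!trimmed.endsWith('.app')) {
    return trimmed; // not a bundle path, so leave it untouched
  }
  // "Chromium.app" -> "Chromium", "Google Chrome.app" -> "Google Chrome"
  const bundleName = trimmed.slice(trimmed.lastIndexOf('/') + 1, -'.app'.length);
  return `${trimmed}/Contents/MacOS/${bundleName}`;
}

// Usage (commented out so the sketch runs without Puppeteer installed):
// const browser = await puppeteer.launch({
//   headless: false,
//   executablePath: resolveMacExecutable('/Applications/Google Chrome.app'),
// });
```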
Following up on @theDavidBarton:
The Chromium that shipped with Puppeteer did not work, but the Chrome installation on my MacBook did.
OS: macOS 10.15.7 (Catalina)
Node version: v14.5.0
Failed code:
const browser = await puppeteer.launch({
headless: true,
executablePath: "/users/bert/Project/NodeJS/PuppeteerTest/node_modules/puppeteer/.local-chromium/mac-818858/chrome-mac/Chromium.app/Contents/MacOS/Chromium"
});
Successful code:
const browser = await puppeteer.launch({
headless: true,
executablePath: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
});
Full code, which is just the first example on the Puppeteer website:
const puppeteer = require('puppeteer');
(async () => {
try {
const browser = await puppeteer.launch({headless: true, executablePath: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"});
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({path: 'example.png'});
await browser.close();
} catch (err) {
console.log(err);
}
})();
And, yes, I got the screenshot! :-)
Using locate-chrome: https://www.npmjs.com/package/locate-chrome
const locateChrome = require('locate-chrome');
const executablePath = await new Promise(resolve => locateChrome(arg => resolve(arg)));
const browser = await puppeteer.launch({ executablePath });
I tried looking through the docs but didn't find a way to set a maximum timeout for a test case. It seems like a simple feature.
import puppeteer from 'puppeteer'
test('App loads', async() => {
const browser = await puppeteer.launch({ headless: false, slowMo: 250 });
const page = await browser.newPage();
await page.goto('http://localhost:3000');
await browser.close();
});
Jest's test(name, fn, timeout) function can take a 3rd parameter that specifies a custom timeout.
test('example', async () => {
...
}, 1000); // timeout of 1s (default is 5s)
Source: https://github.com/facebook/jest/issues/5055#issuecomment-350827560
You can also set the timeout globally for a suite using jest.setTimeout(10000); in the beforeAll() function:
beforeAll(async () => {
jest.setTimeout(10000); // change timeout to 10 seconds
});