How to improve puppeteer startup performance during tests - node.js

I've written a small crawler with the help of Puppeteer.
Now I'm facing the problem that my tests are rather slow (> 3 seconds for each test). I've been able to track it down to Puppeteer's launch function combined with Istanbul/nyc.
If I run the tests with mocha alone, they finish in under 400 ms.
But if I additionally use nyc, the duration of the tests exceeds 3000 ms.
All that I'm using is:
'use strict';
const puppeteer = require('puppeteer');
module.exports = async function startBrowser() {
  const options = {
    args: [
      // '--no-sandbox',
      // '--disable-setuid-sandbox',
      // '--disable-dev-shm-usage',
      // '--disable-accelerated-2d-canvas',
      // '--disable-gpu'
    ],
    headless: false // true
  };
  return await puppeteer.launch(options);
};
Here is the test I'm using:
'use strict';
/* global describe: false, before: false, it: false,
   beforeEach: false, afterEach: false, after: false, window: false, document: false */
const assert = require('assert').strict;
const startBrowser = require('../');
const util = require('util');
describe('Puppeteer', function() {
  let pageManager;
  it('start the browser', async function() {
    this.timeout(10000);
    console.time('startBrowser');
    const browser = await startBrowser();
    console.timeEnd('startBrowser');
    assert(browser);
    console.time('closeBrowser');
    await browser.close();
    console.timeEnd('closeBrowser');
  });
});
I've created a repository with this code and test here.
nyc _mocha ./test/*.test.js runs in ~3500ms, mocha ./test/*.test.js takes only 130ms.
What I've tried so far:
different combination of include/exclude nyc flags
updating to latest versions of Puppeteer, nyc and mocha
removing my Puppeteer arguments
searching for Puppeteer & Istanbul related issues (with not much success)
trying headless: true
bypassing all proxies, see this puppeteer issue
What can I do to have tests with coverage be as fast as the tests alone?
Using:
Ubuntu 19.04
node.js 10.15.3

I've started to debug Puppeteer and these are my findings:
Puppeteer is unsurprisingly using child_process.spawn() to spawn a new browser
nyc is using spawn-wrap for such child processes
spawn-wrap is reading the whole executable (./node_modules/puppeteer/.local-chromium/linux-686378/chrome-linux/chrome) into memory with fs.readFileSync which is taking an unusually long time to finish
spawn-wrap's README offers some explanation:
The initial wrap call uses synchronous I/O. Probably you should not be using this script in any production environments anyway. Also, this will slow down child process execution by a lot, since we're adding a few layers of indirection.
For me personally, the answer is that I cannot get the same performance for tests with and without code coverage as long as I use nyc/Istanbul.
I've given c8 a shot: the performance is nearly the same as without coverage, and I still get code coverage.
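To check whether the synchronous read is really the bottleneck on a given machine, the cost of reading a large binary with fs.readFileSync can be timed directly. This is only a minimal sketch, not spawn-wrap's actual code: the helper name timeSyncRead is hypothetical, and it reads the running Node binary (process.execPath) so that it works anywhere. Pass the Chromium path from the findings above to measure the real case.

```javascript
'use strict';
const fs = require('fs');

// Hypothetical helper: times one synchronous read of a whole file,
// the kind of I/O the findings above attribute to spawn-wrap.
function timeSyncRead(filePath) {
  const start = process.hrtime.bigint();
  const bytes = fs.readFileSync(filePath).length;
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { bytes, elapsedMs };
}

// Assumption: the Node binary stands in for the Chromium executable here.
const result = timeSyncRead(process.execPath);
console.log(`${result.bytes} bytes read in ${result.elapsedMs.toFixed(1)} ms`);
```

Running it a second time usually shows a much smaller number because the file is then served from the OS page cache, so the first run is the interesting one.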

Please also try these launch options.
'use strict'
const puppeteer = require('puppeteer')
module.exports = async function startBrowser() {
  const options = {
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      '--disable-accelerated-2d-canvas',
      '--no-first-run',
      '--no-zygote',
      '--single-process', // <- this one doesn't work on Windows
      '--disable-gpu'
    ],
    headless: true
  }
  return await puppeteer.launch(options)
}

Start Chromium once with remote debugging enabled and connect to it instead of launching a new browser each time:
./Chromium --headless --disable-gpu --remote-debugging-port=9222 --devtools=false
browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://127.0.0.1:9222/devtools/browser/........xxxxxx..',
});
Use Express to hold it.
PS: I can't hard-code the browserWSEndpoint, though, because the URL changes every time headless Chromium restarts.
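The changing URL does not have to be hard-coded: a Chromium started with --remote-debugging-port serves a small JSON document at /json/version whose webSocketDebuggerUrl field holds the current endpoint. The following is a sketch under that assumption. The helper names parseWsEndpoint and fetchWsEndpoint are hypothetical, and port 9222 is taken from the command above.

```javascript
'use strict';
const http = require('http');

// Hypothetical helper: extracts the WebSocket endpoint from the body
// returned by http://127.0.0.1:<port>/json/version.
function parseWsEndpoint(body) {
  const info = JSON.parse(body);
  if (typeof info.webSocketDebuggerUrl !== 'string') {
    throw new Error('webSocketDebuggerUrl missing from /json/version response');
  }
  return info.webSocketDebuggerUrl;
}

// Hypothetical helper: asks the running Chromium for its current endpoint.
function fetchWsEndpoint(port = 9222, host = '127.0.0.1') {
  return new Promise((resolve, reject) => {
    http.get({ host, port, path: '/json/version' }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try {
          resolve(parseWsEndpoint(body));
        } catch (err) {
          reject(err);
        }
      });
    }).on('error', reject);
  });
}

module.exports = { parseWsEndpoint, fetchWsEndpoint };
```

With Puppeteer this could be used as puppeteer.connect({ browserWSEndpoint: await fetchWsEndpoint() }). Puppeteer's connect() also accepts a browserURL option such as 'http://127.0.0.1:9222', which lets Puppeteer look up the endpoint itself.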

Related

Puppeteer nodejs project keeps freezing

I have a nodejs project running puppeteer v13.5.1 which does some webscraping.
After some time (mostly 40-80 minutes) the process freezes without throwing any error. It just stops.
I've added some logs, and the strange thing is that it freezes at different points in the execution.
Sometimes it freezes on
const refreshedHtml = await page.evaluate(() => document.documentElement.innerHTML);
sometimes on
await page.click('button.swiper-button-next');
I've tried many different variations, last one being:
const browser = await puppeteer.launch({
  headless: true,
  devtools: true,
  args: [
    '--ignore-certificate-errors',
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-accelerated-2d-canvas',
    '--disable-gpu'
  ]
});
Any help is appreciated.

Puppeteer newPage() freezes/ never resolves nor rejects Ubuntu Server

I am using Ubuntu Server 18.04.5 LTS and Puppeteer 10.0.0. My problem is that the browser.newPage() function never resolves. So basically the console always logs Start but never 1 or 2. I have tried a different Puppeteer version, and puppeteer-core with my own Chromium build. I even installed a VM on my PC and it works there, but not on my server.
var puppeteer = require('puppeteer')
var adresse = "https://www.google.de/"
async function test() {
  try {
    const browser = await puppeteer.launch({
      "headless": true,
      "args": [
        '--disable-setuid-sandbox',
        '--no-sandbox',
        '--disable-gpu',
      ]
    })
    console.log("Start")
    const page = await browser.newPage()
    console.log("1")
    await page.goto(adresse)
    console.log("2")
    console.log(page)
  } catch (error) {
    console.log(error)
  }
}
test()

Puppeteer error: Navigation failed because browser has disconnected

I am using Puppeteer on Google App Engine with Node.js.
Whenever I run Puppeteer on App Engine, I encounter an error saying:
Navigation failed because browser has disconnected!
This works fine in my local environment, so I am guessing it is a problem with App Engine.
const browser = await puppeteer.launch({
  ignoreHTTPSErrors: true,
  headless: true,
  args: ["--disable-setuid-sandbox", "--no-sandbox"],
});
This is my App Engine app.yaml file:
runtime: nodejs12
env: standard
handlers:
- url: /.*
  secure: always
  script: auto
EDIT:
It works when I add the --disable-dev-shm-usage argument, but then it always times out. Here is my code:
const browser = await puppeteer.launch({
  ignoreHTTPSErrors: true,
  headless: true,
  args: [
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--no-first-run",
    "--no-zygote",
    "--single-process",
  ],
});
const page = await browser.newPage();
try {
  const url = "https://seekingalpha.com/market-news/1";
  const pageOption = {
    waitUntil: "networkidle2",
    timeout: 20000,
  };
  await page.goto(url, pageOption);
} catch (e) {
  console.log(e);
  await page.close();
  await browser.close();
  return resolve("error at 1");
}
try {
  const ulSelector = "#latest-news-list";
  await page.waitForSelector(ulSelector, { timeout: 30000 });
} catch (e) {
  // ALWAYS TIMEOUTS HERE!
  console.log(e);
  await page.close();
  await browser.close();
  return resolve("error at 2");
}
...
It seems the problem was App Engine's memory capacity.
When there is not enough memory for the Puppeteer crawl, App Engine automatically spins up another instance. However, the newly created instance has a different Puppeteer browser, which results in "Navigation failed because browser has disconnected".
The solution is simply to upgrade the App Engine instance class so that a single instance can handle the crawling job.
The default instance class is F1, which has 256 MB of memory, so I upgraded to F4, which has 1 GB of memory. After that the error message no longer appeared.
runtime: nodejs12
instance_class: F4
handlers:
- url: /.*
  secure: always
  script: auto
For me the error was solved when I stopped using the --use-gl=swiftshader arg.
It is used by default if you use args: chromium.args from chrome-aws-lambda
I was getting that error in a deployment. The solution for me was to change the options passed to waitForNavigation:
{ waitUntil: "domcontentloaded", timeout: 60000 }
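As a rough illustration of where those options go, here is a sketch that clicks a link and waits for the resulting navigation. The helper name clickAndWaitForNavigation and the selector are hypothetical. The page argument is assumed to be a Puppeteer Page, and only the option values come from the answer above.

```javascript
'use strict';

// Options quoted from the answer above.
const NAV_OPTIONS = { waitUntil: 'domcontentloaded', timeout: 60000 };

// Hypothetical helper: clicks a selector and waits for the navigation it
// triggers. `page` is expected to be a Puppeteer Page.
async function clickAndWaitForNavigation(page, selector) {
  // Start waiting before clicking so the navigation event is not missed.
  const [response] = await Promise.all([
    page.waitForNavigation(NAV_OPTIONS),
    page.click(selector),
  ]);
  return response;
}

module.exports = { NAV_OPTIONS, clickAndWaitForNavigation };
```

Waiting for domcontentloaded returns earlier than networkidle2, which may be why it helps on pages that keep network connections open.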

JEST with Express does not finish

I started writing Jest tests for a (nano)express application. The test starts the server in beforeAll() and closes it in afterAll(). I can see that the code is executed, but the Jest process does not exit.
test.js
test('end to end test', async () => {
  const polls = await axios.get(`http://localhost:3000/bff/polls/last`);
  console.log(polls.data);
  expect(polls.data).toBeDefined();
});
beforeAll(() => {
  app.listen(3000, '0.0.0.0')
    .then(r => logger.info("Server started"));
});
afterAll(() => {
  if (app.close())
    logger.info("Server stopped");
});
Output from npm run test
Test Suites: 1 failed, 1 total
Tests: 1 failed, 1 total
Snapshots: 0 total
Time: 5.625s
Ran all test suites.
Jest did not exit one second after the test run has completed.
This usually means that there are asynchronous operations that weren't stopped in your tests. Consider running Jest with `--detectOpenHandles` to troubleshoot this issue.
When I run with jest --config jest.config.js --detectOpenHandles the test does not finish as well but there is no error and I need to kill it anyway.
The complete source code is there: https://github.com/literakl/mezinamiridici/blob/master/infrastructure/test/api.int.test.js
I have tested separately, outside of the tests, that nanoexpress terminates the process on an app.close() call. So it is Jest related.
Update: the same behaviour with promises
test('end to end test', () => {
  const polls = axios.get(`http://localhost:3000/bff/polls/last`);
  return expect(polls).resolves.toBeDefined();
});
Update:
Here you can find minimum reproducible repository: https://github.com/literakl/nano-options.git
I have switched from Axios to GotJS and the trouble is still there. When I run the test with npm run test from command line now, it fails with:
Timeout - Async callback was not invoked within the 20000ms timeout specified by jest.setTimeout.Timeout - Async callback was not invoked within the 20000ms timeout specified by jest.setTimeout.Error
When I start the test from WebStorm there is no error but the process keeps running.
UPDATE
My initial thought was that this was a winston-related issue, but it appears that Jest's testEnvironment has to be set to node in order for Axios to run properly using the axios/lib/adapters/http adapter. See the related issue "detect jest and use http adapter instead of XMLhttpRequest".
Set testEnvironment: 'node' inside jest.config.js.
Update create user test to run the done callback function at the end:
describe("user accounts", () => {
  // Note: newer Jest versions may reject a test that is both async and takes done;
  // in that case drop done and rely on the returned promise.
  test('create user', async (done) => {
    // let response = await axios.get(`${API}/users/1234`);
    let response = await axios.get(`${API}/users/1234`, getAuthHeader()); // TODO error with Authorization header
    expect(response.data.success).toBeTruthy();
    expect(response.data.data).toBeDefined();
    let profile = response.data.data;
    expect(profile.bio.nickname).toMatch("leos");
    expect(profile.auth.email).toMatch("leos#email.bud");
    // Call done here to let Jest know we're done with the async test call
    done();
  });
});
The root cause was an open handle of the MongoDB client. How did I find it?
1) Install the leaked-handles library:
npm install leaked-handles --save-dev
2) Import it in your test:
require("leaked-handles");
3) Check the output:
tcp handle leaked at one of:
    at makeConnection (infrastructure\node_modules\mongodb\lib\core\connection\connect.js:274:20)
tcp stream {
  fd: -1,
  readable: true,
  writable: true,
  address: { address: '127.0.0.1', family: 'IPv4', port: 54963 },
  serverAddr: null
}
If you cannot find the root cause, you can force Jest to exit explicitly with:
jest --config jest.config.js --runInBand --forceExit
Here is another cause that I ran into. I was using Puppeteer, and because my target element was hidden, the screenshot method threw an error:
const browser = await puppeteer.launch({
  headless: true,
  executablePath: chromiumPath
});
const page = await browser.newPage();
await page.goto(`file://${__dirname}\\my.html`);
const element = await page.$("#my-element");
// This may throw errors
await element.screenshot({path: snapshotFileName});
await browser.close();
So, I made sure that the browser.close() was called no matter what:
try {
  await element.screenshot({path: snapshotFileName});
} finally {
  await browser.close();
}
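The same try/finally idea can be wrapped in a small reusable helper so that every test closes its browser, whatever happens inside. This is only a sketch: withBrowser is a hypothetical name, and launch is any function that returns a browser-like object, for example () => puppeteer.launch({ headless: true }).

```javascript
'use strict';

// Hypothetical helper: runs `fn` with a freshly launched browser and always
// closes it, even when `fn` throws.
async function withBrowser(launch, fn) {
  const browser = await launch();
  try {
    return await fn(browser);
  } finally {
    // Runs on success and on failure, so no Chromium process is left behind
    // to keep Jest from exiting.
    await browser.close();
  }
}

module.exports = { withBrowser };
```

Passing the launch function in, rather than requiring Puppeteer inside the helper, keeps it easy to exercise with a fake browser object.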
The following works for my integration testing with Express, Node.js, and Jest. Nothing special in package.json either: "test": "jest --verbose". Cheers
afterAll(async () => {
  await mongoose.connection.close();
  jest.setTimeout(3000);
});
Tests: 6 passed, 6 total
Snapshots: 0 total
Time: 4.818 s, estimated 5 s
Ran all test suites matching /users.test.js/i.

Protractor azure pipeline No element found Error

My Protractor tests work correctly on my machine, but when I run them in an Azure pipeline all tests fail with "No element found".
Do you have any idea what the problem is?
Maybe I am missing something here. This is my conf.js:
browser.ignoreSynchronization = false;
exports.config = {
  allScriptsTimeout: 500000,
  // getPageTimeout: 15000,
  specs: ['specDAC.js'],
  rootElement: 'html',
  capabilities: {
    'browserName': 'chrome',
    chromeOptions: {
      args: ["--headless", "--disable-gpu", "--window-size=1200,900"],
      binary: process.env.CHROME_BIN
    }
  },
  directConnect: true,
  baseUrl: 'http://localhost:4200/',
  framework: 'jasmine',
  jasmineNodeOpts: {
    showColors: true,
    defaultTimeoutInterval: 1000000,
When I see 'element not found', it usually signals that the page/AUT has not even loaded. It's hard to say without seeing the actual code, but I assume that your test starts by navigating to some page. Try adding some logging or wrapping this part in a condition (e.g. if the 'login' button is present => click; else => console.log("something wrong")).
The problem is not in the code. The test works on my machine. The problem is something in the pipeline or in conf.js. The pipeline cannot find any elements. The page is loaded; I added a reasonable waiting time.
OK, maybe you are right. Here is my code so you can check it:
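The condition suggested above could be sketched like this. The helper name clickIfPresent is hypothetical, and el is assumed to be a Protractor ElementFinder such as element(by.css('#login')), whose isPresent() and click() methods return promises.

```javascript
'use strict';

// Hypothetical helper: clicks an element only if it is present,
// otherwise logs a hint that the page may not have loaded.
async function clickIfPresent(el, label) {
  if (await el.isPresent()) {
    await el.click();
    return true;
  }
  console.log(`something wrong: "${label}" not found, the page may not have loaded`);
  return false;
}

module.exports = { clickIfPresent };
```

In a pipeline run, the logged message shows up in the job output, which makes it easier to tell a page that never loaded apart from a wrong locator.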
it('first test', async function(){
  await sleep(2000);
  await browser.driver.manage().window().maximize();
  await browser.waitForAngularEnabled(false);
  await sleep(8000);
  // login user
  await loginPage.get(testConf.loginUrl);
  await sleep(4000);
  await loginPage.setLoginCredentials(testConf.mmmClientUser, testConf.password);
The error is that it cannot find the element where I enter my email, but locally it works.
