Puppeteer nodejs Error: input.on is not a function\n at new Interface - node.js

M generating a pdf from html using the npm module puppeteer.
When the running the following code m getting an error.
It is working properly on windows , but when the same is executed on linux red hat server , it is giving an error
let poptions = {
path: pdfPath, scale: 0.8, printBackground: true, format: "letter"
,"margin": {
"bottom": 70,
"left": 25,
"right": 35,
"top": 70,
},
landscape:true
}
console.log(htmlPath);
const browser = await puppeteer.launch({ args: [
'--no-sandbox'
],"dumpio": true})
const page = await browser.newPage();
page.on('console', (msg) => console.log('PAGE LOG:', msg.text()));
await page.goto(htmlPath);
// await page.emulateMedia('print');
poption=Object.assign(poptions,pageoptions)
if(pageStyle)await page.addStyleTag(pageStyle);
const pdf = await page.pdf(poptions);
await browser.close();
Error: input.on is not a function
at new Interface (readline.js:207:11)
at Object.createInterface (readline.js:75:10)
at Promise (/microservice/node_modules/puppeteer/lib/Launcher.js:329:25)
at new Promise ()
at waitForWSEndpoint (/microservice/node_modules/puppeteer/lib/Launcher.js:326:10)
at Launcher.launch (/microservice/node_modules/puppeteer/lib/Launcher.js:170:41)

Used the following parameters while launching the chrome.--disable-setuid-sandbox resolved the issue
const browser = await puppeteer.launch({ args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--headless',
'--disable-dev-shm-usage',
'--disable-gpu',
'--disable-features=NetworkService',
'--window-size=1920x1080',
'--disable-features=VizDisplayCompositor',
'--log-file=/home/ec2-user/credence/microservices/reporting-server/log/server.log',
'--log-level=0'
],"dumpio": true})

Related

Puppeteer performance issue

I'm testing Puppeteer for large scale PDF generation, I'm testing 5 requests per second for 1 min, the problem is that the puppeteer is causing the VM to consume all resources and crash.
code
async generatePdf({
url,
displayHeaderFooter,
headerContent,
footerContent,
width,
height,
marginTop,
marginRight,
marginBottom,
marginLeft}: IPDFRepositoryDTO): Promise<Buffer> {
if (!global.browser) {
await launchPuppeteer();
}
const page = await global.browser.newPage();
await page.goto(url, {
waitUntil: 'networkidle0',
});
const isDisplayHeaderFooter =
displayHeaderFooter != undefined ? displayHeaderFooter : true;
const pageOptions = {
printBackground: true,
format: 'Letter',
displayHeaderFooter: isDisplayHeaderFooter,
headerTemplate: headerContent,
footerTemplate: footerContent,
width,
height,
margin: {
top: marginTop || '80px',
bottom: marginBottom || '80px',
left: marginLeft || '20px',
right: marginRight || '20px',
},
} as PDFOptions;
const pdf = await page.pdf(pageOptions);
await page.close();
return pdf;}
function launch puppeteer:
export const launchPuppeteer = async () => {
global.browser = await launch({
args: [
// '--no-sandbox',
// '--disable-setuid-sandbox',
// '--disable-dev-shm-usage',
// '--disable-accelerated-2d-canvas',
// '--no-first-run',
// '--no-zygote',
// '--single-process',
// '--disable-gpu',
'--no-sandbox',
'--disable-accelerated-2d-canvas',
'--no-zygote',
'--single-process',
'--disable-gpu',
'--disable-canvas-aa', // Disable antialiasing on 2d canvas
'--disable-2d-canvas-clip-aa', // Disable antialiasing on 2d canvas clips
'--disable-gl-drawing-for-tests', // BEST OPTION EVER! Disables GL drawing operations which produce pixel output. With this the GL output will not be correct but tests will run faster.
'--disable-dev-shm-usage',
'--use-gl=swiftshader', // better cpu usage with --use-gl=desktop rather than --use-gl=swiftshader, still needs more testing.
'--enable-webgl',
'--hide-scrollbars',
'--mute-audio',
'--no-first-run',
'--disable-infobars',
'--disable-breakpad',
//'--ignore-gpu-blacklist',
// '--window-size=1280,1024', // see defaultViewport
// '--user-data-dir=./chromeData', // created in index.js, guess cache folder ends up inside too.
'--disable-setuid-sandbox'
],
});
console.log('browser ready');
};
For the tests I'm using a google cloud VM that has 4 cores and 4GB of ram, e2-highcpu-4 , running Ubuntu
enter image description here

Node puppeteer scraping YouTube and encountering redirected you too many times

I'm trying to scrape a YouTube playlists URL using Node / puppeteer. It was working, but now I'm getting ERR_TOO_MANY_REDIRECTS error. I can still access the page using chrome from my desktop.
I've tried using the chromium browser and chrome browsers. I've also tried using the puppeteer-extra stealth plugin and the random-useragent.
This is how my code stand at the moment:
const browser = await puppeteer.launch({
stealth: true,
headless: false // true,
executablePath: "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
args: [
'--disable-notifications', '--disable-features=site-per-process'
],
defaultViewport: null
});
const page = await browser.newPage()
await page.setUserAgent(random_useragent.getRandom());
await page.goto(<playlist-url, {
waitUntil: 'networkidle2',
timeout: 0
})
await page.waitForSelector('button[aria-label="Agree to the use of cookies and other data for the purposes described"')
It at the page.goto it bombs. And it happens even if I try going to https://www.youtube.com.
Any suggestions what I should try next. I tried a proxy server but couldn't get it to work. I suspect I need a proxy to actually route through.
If all you need is playlist IDs for a given channel, it's possible to query a feed at:
https://youtube.com/feeds/videos.xml?channel_id=<Channel ID>
To get IDs of videos you can query a feed at:
https://youtube.com/feeds/videos.xml?playlist_id=PLAYLIST_ID
You can get playlists (and Mixes) links from YouTube like in the code example below (also check full code the online IDE):
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());
const searchString = "java course";
const requestParams = {
baseURL: `https://www.youtube.com`,
encodedQuery: encodeURI(searchString), // what we want to search for in URI encoding
};
async function fillPlaylistsDataFromPage(page) {
const dataFromPage = await page.evaluate((requestParams) => {
const mixes = Array.from(document.querySelectorAll("#contents > ytd-radio-renderer")).map((el) => ({
title: el.querySelector("a > h3 > #video-title")?.textContent.trim(),
link: `${requestParams.baseURL}${el.querySelector("a#thumbnail")?.getAttribute("href")}`,
videos: Array.from(el.querySelectorAll("ytd-child-video-renderer a")).map((el) => ({
title: el.querySelector("#video-title")?.textContent.trim(),
link: `${requestParams.baseURL}${el.getAttribute("href")}`,
length: el.querySelector("#length")?.textContent.trim(),
})),
thumbnail: el.querySelector("a#thumbnail #img")?.getAttribute("src"),
}));
const playlists = Array.from(document.querySelectorAll("#contents > ytd-playlist-renderer")).map((el) => ({
title: el.querySelector("a > h3 > #video-title")?.textContent.trim(),
link: `${requestParams.baseURL}${el.querySelector("a#thumbnail")?.getAttribute("href")}`,
channel: {
name: el.querySelector("#channel-name a")?.textContent.trim(),
link: `${requestParams.baseURL}${el.querySelector("#channel-name a")?.getAttribute("href")}`,
},
videoCount: el.querySelector("yt-formatted-string.ytd-thumbnail-overlay-side-panel-renderer")?.textContent.trim(),
videos: Array.from(el.querySelectorAll("ytd-child-video-renderer a")).map((el) => ({
title: el.querySelector("#video-title")?.textContent.trim(),
link: `${requestParams.baseURL}${el.getAttribute("href")}`,
length: el.querySelector("#length")?.textContent.trim(),
})),
thumbnail: el.querySelector("a#thumbnail #img")?.getAttribute("src"),
}));
return [...mixes, ...playlists];
}, requestParams);
return dataFromPage;
}
async function getYoutubeSearchResults() {
const browser = await puppeteer.launch({
headless: false,
args: ["--no-sandbox", "--disable-setuid-sandbox"],
});
const page = await browser.newPage();
const URL = `${requestParams.baseURL}/results?search_query=${requestParams.encodedQuery}`;
await page.setDefaultNavigationTimeout(60000);
await page.goto(URL);
await page.waitForSelector("#contents > ytd-video-renderer");
const playlists = await fillPlaylistsDataFromPage(page);
await browser.close();
return playlists;
}
getYoutubeSearchResults().then(console.log);
📌Note: to get thumbnail you need to scroll playlist into view (using .scrollIntoView() method).
Output:
[
{
"title":"Java Complete Course | Placement Series",
"link":"https://www.youtube.com/watch?v=yRpLlJmRo2w&list=PLfqMhTWNBTe3LtFWcvwpqTkUSlB32kJop",
"channel":{
"name":"Apna College",
"link":"https://www.youtube.com/c/ApnaCollegeOfficial"
},
"videoCount":"35",
"videos":[
{
"title":"Introduction to Java Language | Lecture 1 | Complete Placement Course",
"link":"https://www.youtube.com/watch?v=yRpLlJmRo2w&list=PLfqMhTWNBTe3LtFWcvwpqTkUSlB32kJop",
"length":"18:46"
},
{
"title":"Variables in Java | Input Output | Complete Placement Course | Lecture 2",
"link":"https://www.youtube.com/watch?v=LusTv0RlnSU&list=PLfqMhTWNBTe3LtFWcvwpqTkUSlB32kJop",
"length":"42:36"
}
],
"thumbnail":null
},
{
"title":"Java Tutorials For Beginners In Hindi",
"link":"https://www.youtube.com/watch?v=ntLJmHOJ0ME&list=PLu0W_9lII9agS67Uits0UnJyrYiXhDS6q",
"channel":{
"name":"CodeWithHarry",
"link":"https://www.youtube.com/c/CodeWithHarry"
},
"videoCount":"113",
"videos":[
{
"title":"Introduction to Java + Installing Java JDK and IntelliJ IDEA for Java",
"link":"https://www.youtube.com/watch?v=ntLJmHOJ0ME&list=PLu0W_9lII9agS67Uits0UnJyrYiXhDS6q",
"length":"19:00"
},
{
"title":"Basic Structure of a Java Program: Understanding our First Java Hello World Program",
"link":"https://www.youtube.com/watch?v=zIdg7hkqNE0&list=PLu0W_9lII9agS67Uits0UnJyrYiXhDS6q",
"length":"14:09"
}
],
"thumbnail":null
}
]
You can read more about scraping YouTube playlists from blog post Web scraping YouTube secondary search results with Nodejs.

puppeteer: chrome is being controlled by automated test software [duplicate]

This question already has an answer here:
Puppeteer Chrome is being controlled by automated test software
(1 answer)
Closed 1 year ago.
I am not abel to disable the info bar using below arguments. Please help me fix this in puppeteer. Thanks
const getPage = async () => {
browser = await puppeteer.launch({
headless: false,
defaultViewport: null,
args: ['--start-maximized', 'disable-gpu', '--disable-infobars', '--disable-extensions', '--ignore-certificate-errors'],
}); // default is true
const [page] = await browser.pages();
return page;
};
You need to disable the enable-automation default launch arg as well:
const options = {
args: ['--start-maximized', 'disable-gpu', '--disable-infobars', '--disable-extensions', '--ignore-certificate-errors'],
headless: false,
ignoreDefaultArgs: ['--enable-automation'],
defaultViewport: null,
};
browser = await puppeteer.launch(options);

Package puppeteer don't work properly (centos 7)

I used the package puppeteer to generate pdf in nodejs in local developement i use lubuntu 18.04 and every thing works great but in production after deploying the code in centos 7 it works sometimes but the majority of time it doesn't work.
i installed necessary packages of chromium using this command:
sudo yum install pango.x86_64 libXcomposite.x86_64 libXcursor.x86_64 libXdamage.x86_64 libXext.x86_64 libXi.x86_64 libXtst.x86_64 cups-libs.x86_64 libXScrnSaver.x86_64 libXrandr.x86_64 GConf2.x86_64 alsa-lib.x86_64 atk.x86_64 gtk3.x86_64 ipa-gothic-fonts xorg-x11-fonts-100dpi xorg-x11-fonts-75dpi xorg-x11-utils xorg-x11-fonts-cyrillic xorg-x11-fonts-Type1 xorg-x11-fonts-misc
I added the argument {args:['--no-sandbox']} but the same problem.
This is my code :
const generator = (data, fileName, prepa) =>
new Promise(async (resolve, reject) => {
try {
const browser = await puppeteer.launch({ args: ["--no-sandbox"] });
const page = await browser.newPage();
let content;
if (prepa === true) {
content = templateCarnetPrepa(data);
} else {
content = templateCarnet(data);
}
await page.setContent(content);
await page.addStyleTag({ path: "./bootstrap.min.css" });
await page.emulateMediaType("screen");
await page.pdf({
path: `./${ENV.MEDIA_STORAGE}/carnet/${fileName}.pdf`,
format: "A4",
margin: { top: 20, left: 20, right: 20, bottom: 20 },
printBackground: true,
});
await browser.close();
//process.exit()
resolve(0);
} catch (e) {
console.log(e);
reject(e);
}
});
Does any one have a solution to this problem ?

While using capture-website, puppeteer with webpack throwing error:browser not downloaded, at ChromeLauncher.launch (webpack-internal)

I am trying to take screenshot by providing html file on node js.
I have used capture-website package.
Here is the code:
try{
await captureWebsite.file('file.html', 'file.png', {overwrite: true}, function (error) {
if (error) {
console.log('error',error);
}
});
}catch(e){
console.log('error in capture image:', e)
}
Version:
node:12.16.1
capture-website: "0.8.1"
Angular : 7
You can do something like:
async function screenshotFromHtml ({ html, timeout = 2000 }: ScreenshotOptions) {
const browser = await puppeteer.launch({
headless: !process.env.DEBUG_HEADFULL,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
]
})
const page = await browser.newPage()
// Set viewport to something big
// Prevents Carbon from cutting off lines
await page.setViewport({
width: 2560,
height: 1080,
deviceScaleFactor: 2
})
page.setContent(html)
const base64 = await page.screenshot({ encoding: "base64" }) as string;
// Wait some more as `waitUntil: 'load'` or `waitUntil: 'networkidle0'
await page.waitFor(timeout)
// Close browser
await browser.close()
return base64
}
This code is in typescript, but you can use the function body in your JS project
In this github file you can see a html render code too

Resources