Install a Firefox browser binary for Puppeteer in Docker - node.js

I'm trying to run a script that depends on https://github.com/pevers/images-scraper in a Docker container. The script runs fine on its own (on my machine, which has Node 14.15.4, node src/index.js prints A to the console), but inside Docker it does not work: I get an error message about a missing Firefox binary.
Dockerfile:
FROM node:14
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
RUN PUPPETEER_PRODUCT=firefox npm install puppeteer
COPY . .
CMD [ "node", "src/index.js" ]
src/index.js:
let Scraper = require("images-scraper");
const google = new Scraper({
puppeteer: {
userAgent:
"Mozilla/5.0 (X11; Linux i686; rv:64.0) Gecko/20100101 Firefox/64.0",
headless: true,
safe: true,
},
});
(async () => {
const A = await google.scrape("bananas", 200);
console.log(A)
})();
package.json
{
"name": "test",
"version": "1.0.0",
"scripts": {
"test": "node src/index.js"
},
"dependencies": {
"images-scraper": "^6.2.1"
}
}
When I /bin/bash into my container and do node src/index.js I get the error:
at ChromeLauncher.launch (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/node/Launcher.js:79:23)
at async GoogleScraper.scrape (/usr/src/app/node_modules/images-scraper/src/google/scraper.js:53:21)
at async /usr/src/app/src/index.js:19:13
New to using Node with Docker - I was wondering where I am going wrong here. Thanks

Install the Firefox executable in your container:
FROM node:14
RUN apt-get update \
&& apt-get install -y wget gnupg fonts-ipafont-gothic fonts-freefont-ttf firefox-esr --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
RUN PUPPETEER_PRODUCT=firefox npm install puppeteer
COPY . .
CMD [ "node", "src/index.js" ]

The error indicates that the failure comes from launching Chrome, not Firefox,
so make sure that all of Chrome's dependencies are installed in your container:
ca-certificates
fonts-liberation
libappindicator3-1
libasound2
libatk-bridge2.0-0
libatk1.0-0
libc6
libcairo2
libcups2
libdbus-1-3
libexpat1
libfontconfig1
libgbm1
libgcc1
libglib2.0-0
libgtk-3-0
libnspr4
libnss3
libpango-1.0-0
libpangocairo-1.0-0
libstdc++6
libx11-6
libx11-xcb1
libxcb1
libxcomposite1
libxcursor1
libxdamage1
libxext6
libxfixes3
libxi6
libxrandr2
libxrender1
libxss1
libxtst6
lsb-release
wget
xdg-utils
source : Here

Following on from @LinPy's comment, adapting my Dockerfile per the puppeteer documentation worked:
FROM alpine:edge
# Installs latest Chromium (89) package.
RUN apk add --no-cache \
chromium \
nss \
freetype \
freetype-dev \
harfbuzz \
ca-certificates \
ttf-freefont \
nodejs \
nodejs-npm \
yarn
# Tell Puppeteer to skip installing Chrome. We'll be using the installed package.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser
# Puppeteer v6.0.0 works with Chromium 89.
RUN yarn add puppeteer@6.0.0
...
I also added args: ['--no-sandbox', '--disable-setuid-sandbox'] to the Scraper constructor. Thanks
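
For reference, the constructor change described above might look like the following sketch. The helper name buildScraperOptions is hypothetical, and the sketch assumes that images-scraper forwards its puppeteer option to puppeteer.launch() and that the browser path comes from the PUPPETEER_EXECUTABLE_PATH variable set in the Dockerfile.

```javascript
// Hypothetical helper: builds the options object passed to `new Scraper(...)`.
function buildScraperOptions(extraArgs = []) {
  return {
    puppeteer: {
      headless: true,
      // Chromium typically will not start as root inside a container
      // unless its sandbox is disabled with these flags.
      args: ["--no-sandbox", "--disable-setuid-sandbox", ...extraArgs],
    },
  };
}

// Usage (requires the images-scraper package to be installed):
// const Scraper = require("images-scraper");
// const google = new Scraper(buildScraperOptions());
```

Disabling the sandbox is a trade-off; running the browser as a non-root user is the usual alternative when the scraped pages are untrusted.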

Related

Error: Failed to launch the browser process! spawn /usr/bin/chromium-browser

I want to generate a PDF with the puppeteer package. I am using Docker with NestJS. When I try to run the code, it fails with Error: Failed to launch the browser process! spawn /usr/bin/chromium-browser ENOENT
My current OS is macOS on an M1 chip.
I have tried many things.
I used the following, and also tried changing the headless property to false:
puppeteer.launch({
dumpio: true,
headless: true,
executablePath: "/usr/bin/chromium-browser",
args: ["--disable-setuid-sandbox", "--no-sandbox", "--disable-gpu"],
});
Dockerfile
FROM node:14.18.1 As development
WORKDIR /usr/src/app
COPY package*.json ./
RUN apt-get update && apt-get install -y build-essential libcairo2-dev libpango1.0-dev libjpeg-dev libgif-dev librsvg2-dev
RUN apt-get install -y \
fonts-liberation \
gconf-service \
libappindicator1 \
libasound2 \
libatk1.0-0 \
libcairo2 \
libcups2 \
libfontconfig1 \
libgbm-dev \
libgdk-pixbuf2.0-0 \
libgtk-3-0 \
libicu-dev \
libjpeg-dev \
libnspr4 \
libnss3 \
libpango-1.0-0 \
libpangocairo-1.0-0 \
libpng-dev \
libx11-6 \
libx11-xcb1 \
libxcb1 \
libxcomposite1 \
libxcursor1 \
libxdamage1 \
libxext6 \
libxfixes3 \
libxi6 \
libxrandr2 \
libxrender1 \
libxss1 \
libxtst6 \
xdg-utils
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install
RUN npm i puppeteer
RUN chmod -R o+rwx node_modules/puppeteer/.local-chromium
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser
RUN npm install --only=development
COPY . .
RUN npm install rimraf
RUN npm run build
FROM node:14.2.0 as production
ARG NODE_ENV=production
ENV NODE_ENV=${NODE_ENV}
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install --only=production
COPY . .
COPY --from=development /usr/src/app/dist ./dist
CMD ["node", "dist/main"]
Here is my code
try {
const browser = await puppeteer.launch({
dumpio: true,
headless: true,
executablePath: "/usr/bin/chromium-browser",
args: ["--disable-setuid-sandbox", "--no-sandbox", "--disable-gpu"],
});
// create a new page(opening a new tab with new page)
const page = await browser.newPage();
// set html inside page
await page.setContent(html);
// provide some options(eg: width, height) to page
const pdfBuffer = await page.pdf(options);
// clear out our Puppeteer instance to free memory
await page.close();
await browser.close();
console.log("PDF buffer is", pdfBuffer);
return pdfBuffer;
} catch (error) {
console.log("Error is", error);
}
There are many suggested solutions for this, and I tried most of the links, but none of them worked. I have spent almost 2 hours on it without finding a solution.
Some of the things I tried:
https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#running-puppeteer-in-docker
https://www.cloudsavvyit.com/13461/how-to-run-puppeteer-and-headless-chrome-in-a-docker-container/
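
ENOENT from spawn means that no file exists at the given executablePath. The Dockerfile above installs the google-chrome-stable package, which typically places its binary at /usr/bin/google-chrome-stable rather than /usr/bin/chromium-browser, so a mismatch between the two is one possible cause. A minimal diagnostic sketch follows; the function name and the candidate paths are assumptions, not something the question confirms.

```javascript
const fs = require("fs");

// Candidate paths are assumptions; adjust them to whatever the image installs.
const DEFAULT_CANDIDATES = [
  process.env.PUPPETEER_EXECUTABLE_PATH,
  "/usr/bin/google-chrome-stable",
  "/usr/bin/chromium-browser",
  "/usr/bin/chromium",
];

// Returns the first candidate path that exists, or null if none does.
// `exists` is injectable so the lookup can be tested without touching the disk.
function findBrowserExecutable(candidates = DEFAULT_CANDIDATES, exists = fs.existsSync) {
  for (const candidate of candidates) {
    if (candidate && exists(candidate)) return candidate;
  }
  return null;
}

// Usage: run inside the container before calling puppeteer.launch().
// const executablePath = findBrowserExecutable();
// if (!executablePath) throw new Error("No browser binary found in this image");
```

Running this inside the container (for example through docker exec) shows which path, if any, can be passed as executablePath.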

How to record Video with Audio from a puppeteer instance from a docker container

I'm trying to build a simple server that basically:
opens a webpage with a puppeteer instance.
records the webpage and save the video file.
I was able to implement this with puppeteer-stream and it works perfectly locally.
While trying to put this in a Docker container deployed to AWS Elastic Beanstalk, I ran into an issue where I couldn't start the Chrome browser from the container, but that was fixed with the help of xvfb: it now launches a fake UI for the Chrome tabs and records inside it.
At this point, I can record video perfectly but it has NO audio. I tried setting up PulseAudio as a virtual audio driver but it doesn't work either.
Another issue with PulseAudio is that it's going to record the entire process instead of a chrome tab so it's going to be chaotic if we decide to record multiple web pages concurrently.
Here's what my Dockerfile looks like at the moment:
FROM node:12
# Install dependencies
RUN apt-get update &&\
apt-get install -yq gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 \
libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 \
libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 \
libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget \
xvfb pulseaudio x11vnc x11-xkb-utils xfonts-100dpi xfonts-75dpi xfonts-scalable xfonts-cyrillic x11-apps
# Cd into /app
WORKDIR /app
# Copy package.json into the app folder
COPY package.json /app
# Install dependencies
RUN npm config set PUPPETEER_SKIP_CHROMIUM_DOWNLOAD false
RUN npm config set ignore-scripts false
RUN npm install
COPY . /app
# Start server on port 80
EXPOSE 80
# Creating Display
ENV DISPLAY :99
# Start script on Xvfb
CMD Xvfb :99 -screen 0 1920x1080x24 & pulseaudio --daemonize & yarn start
I also tried a solution where you have to create a new usergroup & give them audio and video access but it doesn't work either:
RUN groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser \
&& chown -R pptruser:pptruser ./node_modules
I also checked the related question Node puppeteer stream not recording audio out, but that’s not my case.
Is there any way I can achieve building a server that can record audio AND video on multiple webpages in a Docker container?
REMINDER - it works locally without Docker. It just doesn't work inside Docker, and we need Docker in order to run it on AWS.
Appreciate the help!

How to Fix Could not find browser revision 756035

I am using puppeteer to execute some test cases in Docker and I get this below error:
"before all" hook: codeceptjs.beforeSuite for "Validate_onbord_broker_business":
Could not find browser revision 756035. Run "npm install" or "yarn install" to download a browser binary.
at ChromeLauncher.launch (node_modules/puppeteer/lib/Launcher.js:59:23)
at async Puppeteer._startBrowser (node_modules/codeceptjs/lib/helper/Puppeteer.js:512:22)
And this is the Dockerfile I am using:
# Use whatever version you are running locally (see node -v)
FROM node:12.18
WORKDIR /app
RUN apt-get update \
&& apt-get install -y wget gnupg ca-certificates \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-stable \
&& rm -rf /var/lib/apt/lists/* \
&& wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
&& chmod +x /usr/sbin/wait-for-it.sh
# Install dependencies (you are already in /app)
COPY package.json package-lock.json ./
# RUN npm ci
RUN npm install
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
RUN npm install codeceptjs puppeteer
COPY . /app
RUN pwd
RUN ls
# RUN npx codeceptjs init
RUN npx codeceptjs run
# CMD ["npm", "start"]
Can someone please help me figure out what is going wrong?
I think the problem is that you're skipping downloading a version of chromium when you install your dependencies: ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true. Puppeteer is only guaranteed to work with the bundled version of chromium and it's generally a bad idea to use some other version of chromium.
Based on your comments on your post, I'm guessing that the reason you're not using the bundled version of Chromium is because you're having "fun" with dependencies. These can be manually installed using apt - here are the dependencies which are installed by the official puppeteer dockerfile:
apt-get -y install xvfb gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 \
libdbus-1-3 libexpat1 libfontconfig1 libgbm1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 \
libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 \
libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 \
libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
I think this is the updated Dockerfile - let me know if it works because I can't test it without the rest of your code!
# Use whatever version you are running locally (see node -v)
FROM node:12.18
WORKDIR /app
RUN apt-get update \
&& apt-get -y install xvfb gconf-service gnupg libasound2 libatk1.0-0 libc6 libcairo2 libcups2 \
libdbus-1-3 libexpat1 libfontconfig1 libgbm1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 \
libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 \
libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 \
libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget \
&& wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
&& chmod +x /usr/sbin/wait-for-it.sh
# Install dependencies (you are already in /app)
COPY package.json package-lock.json ./
# RUN npm ci
RUN npm install
RUN npm install codeceptjs puppeteer
COPY . /app
RUN pwd
RUN ls
# RUN npx codeceptjs init
RUN npx codeceptjs run
# CMD ["npm", "start"]

Running Headless Chrome Puppeteer and Xvfb in a Node.js Docker Container, trouble running image

I'm trying to run a Puppeteer script in a Docker container with Xvfb so that I can run headless: false in my production app, which is the only way my script will get the required output from the site I am scraping. I'm having trouble getting the Docker image to run after I've built it. I originally followed this article's process: http://www.smartjava.org/content/using-puppeteer-in-docker-copy-2/
but I was getting the error
Error: Could not find browser revision 818858 Run "PUPPETEER_PRODUCT=firefox npm install"
once I tried to run the image, which didn't make much sense to me, but it seems the bundled version of Chromium didn't have the proper dependencies, according to this article:
Running Puppeteer in Docker:
https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#running-puppeteer-in-docker
So I modified my Dockerfile with the ideas they used in their example, which allowed me to build my container with no errors. But when I run the image I get an error that I'm stumped on. I believe it has to do with Xvfb.
_XSERVTransmkdir: ERROR: euid != 0,directory /tmp/.X11-unix will not be created.
Dockerfile:
FROM node:latest
# update and add all the steps for running with xvfb
RUN apt-get update &&\
apt-get install -yq gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 \
libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 \
libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 \
libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget \
xvfb x11vnc x11-xkb-utils xfonts-100dpi xfonts-75dpi xfonts-scalable xfonts-cyrillic x11-apps
# this installs the necessary libs to make the bundled version of Chromium
# that Puppeteer installs, work.
RUN apt-get update \
&& apt-get install -y wget gnupg \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
# add the required dependencies
WORKDIR /app
COPY node_modules /app/node_modules
RUN npm install puppeteer \
# Add user so we don't need --no-sandbox.
# same layer as npm install to keep re-chowned files from using up several hundred MBs more space
&& groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser
# && chown -R pptruser:pptruser /node_modules
# Run everything after as non-privileged user.
USER pptruser
# Finally copy the build application
COPY . .
# make sure we can run without a UI
ENV DISPLAY :99
CMD Xvfb :99 -screen 0 1024x768x16 & node ./src/export.js
Package.json
{
"name": "puppeteer-headless",
"version": "1.0.0",
"description": "Headless crawler with simulated UI",
"devDependencies": {
"@types/node": "^14.4.0",
"@types/puppeteer": "^5.4.0"
},
"dependencies": {
"puppeteer": "^5.5.0"
}
}
launch method in my script:
const browser = await puppeteer.launch({
headless: false,
executablePath: 'google-chrome-stable'
});

Deploying a Dockerized node app onto Heroku and using Xvfb with puppeteer, cannot launch browser

Puppeteer version: 5.5
Platform / OS version: Docker container deployed on Heroku
Node.js version: latest
I have set up puppeteer scripts that live in a docker container, so that they can be run with Xvfb, a simulated UI so that the scripts can be run in headless:false mode. The site they crawl causes this to be a requirement.
The Docker container builds fine, and the image runs locally. I can run the scripts, and the Xvfb simulated display launches and allows the browser to launch as intended. The scripts perform as expected in my local Docker image.
I've deployed this to Heroku and spun up a dyno, and have tried to run the scripts on the dyno. The dyno runs fine; the server launches and listens for requests. When I try to run the Node scripts, it seems that the browser isn't able to connect to or launch the simulated display.
Xvfb:
https://www.x.org/releases/X11R7.6/doc/man/man1/Xvfb.1.xhtml
Error when trying to run scripts on the heroku dyno
steveszumski@Steves-MacBook-Pro crawler % heroku run node crawler.js
Running node crawler.js on ⬢ suresale-crawler... up, run.3678 (Free)
Running getReport Headless Script
Error: Failed to launch the browser process!
Fontconfig warning: "/etc/fonts/fonts.conf", line 100: unknown element "blank"
[15:15:1211/184429.509801:ERROR:browser_main_loop.cc(1439)] Unable to open X display.
[1211/184429.534086:ERROR:nacl_helper_linux.cc(307)] NaCl helper process running without a sandbox!
Most likely you need to configure your SUID sandbox correctly
TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md
at onClose (/app/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:193:20)
at ChildProcess.<anonymous> (/app/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:184:79)
at ChildProcess.emit (node:events:388:22)
at Process.ChildProcess._handle.onexit (node:internal/child_process:284:12)
Section of crawler.js that errors
try {
const browser = await puppeteer.launch({
headless: false,
args: [
'--no-sandbox'
]
});
const page = await browser.newPage();
Dockerfile
FROM node:latest
# update and add all the steps for running with xvfb
RUN apt-get update &&\
apt-get install -yq gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 \
libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 \
libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 \
libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget \
xvfb x11vnc x11-xkb-utils xfonts-100dpi xfonts-75dpi xfonts-scalable xfonts-cyrillic x11-apps \
&& apt-get install -y wget gnupg \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get install bash
RUN apt-get install python
RUN apt-get install curl
RUN apt-get install openssh-client
#remove zombie images
ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.2/dumb-init_1.2.2_x86_64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init
ENTRYPOINT ["dumb-init", "--"]
WORKDIR /app
# using /tmp in order to read/write to container fs as non-root user
# WORKDIR /tmp
# add the required dependencies - this breaks the build process for some reason, manually npm install below works
# COPY node_modules /app/node_modules
RUN npm install puppeteer \
# Add user so we don't need --no-sandbox.
# same layer as npm install to keep re-chowned files from using up several hundred MBs more space
&& groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser \
&& npm install express \
&& npm install basic-ftp
# && chown -R pptruser:pptruser /node_modules
# For Heroku exec
RUN rm /bin/sh && ln -s /bin/bash /bin/sh
ADD ./.profile.d /app/.profile.d
EXPOSE 3000
# Run everything after as non-privileged user.
USER pptruser
# USER root
# Finally copy the build application
COPY . .
# make sure we can run without a UI
ENV DISPLAY :99
CMD Xvfb :99 -screen 0 1024x768x16 -nolisten unix & node server.js
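
The "Unable to open X display" message means the browser could not reach an X server at the address in DISPLAY. One possibility worth ruling out is that Xvfb is not running in the process environment where the script executes, for example if the script is started with a command other than the image's CMD. The following sketch checks for the display's socket before launching the browser. The function names are hypothetical, and the check assumes Xvfb listens on the filesystem socket under /tmp/.X11-unix, which it will not do when started with -nolisten unix as in the CMD above.

```javascript
const fs = require("fs");

// Maps a local X display string such as ":99" or ":99.0" to its Unix socket path.
// Returns null for anything else (for example a remote "host:1" display).
function displaySocketPath(display) {
  const match = /^:(\d+)(\.\d+)?$/.exec(display || "");
  return match ? `/tmp/.X11-unix/X${match[1]}` : null;
}

// Polls until the socket for `display` exists; rejects once `timeoutMs` has passed.
function waitForDisplay(display, timeoutMs = 5000, intervalMs = 100, exists = fs.existsSync) {
  const socket = displaySocketPath(display);
  return new Promise((resolve, reject) => {
    if (!socket) {
      reject(new Error(`Unsupported DISPLAY value: ${display}`));
      return;
    }
    const deadline = Date.now() + timeoutMs;
    const check = () => {
      if (exists(socket)) {
        resolve(socket);
      } else if (Date.now() >= deadline) {
        reject(new Error(`X display ${display} not available (${socket} missing)`));
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}

// Usage: await waitForDisplay(process.env.DISPLAY) before puppeteer.launch({ headless: false }).
```

If the wait times out, the X server was never started for that process (or is not listening on a filesystem socket), which points at how the container command is run rather than at Puppeteer itself.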
