Using the default Chrome profile with Puppeteer that my Chrome app uses - node.js

I'm having issues getting Puppeteer to use the default profile that my Chrome browser uses. I've tried setting the path to the user profile, but when I visit a site with Puppeteer where I know data is saved in the Chrome app's userDataDir, nothing from that profile shows up. What am I doing wrong? I appreciate any help!
const browser = await puppeteer.launch({
headless: false,
userDataDir: 'C:\\Users\\Bob\\AppData\\Local\\Google\\Chrome\\User Data',
}).then(async browser => {
I've also tried userDataDir: 'C:/Users/Phil/AppData/Local/Google/Chrome/User Data', but still nothing.
UPDATED:
const username = os.userInfo().username;
(async () => {
try {
const browser = await puppeteer.launch({
headless: false, args: [
`--user-data-dir=C:/Users/${username}/AppData/Local/Google/Chrome/User Data`]
}).then(async browser => {

I had the same exact issue before. However, connecting my script to a real Chrome instance helped solve a lot of problems, especially the profile one.
You can see the steps here:
https://medium.com/@jaredpotter1/connecting-puppeteer-to-existing-chrome-window-8a10828149e0
// macOS:
/*
Open this instance first:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --no-first-run --no-default-browser-check --user-data-dir=$(mktemp -d -t 'chrome-remote_data_dir')
// Windows:
- Add --remote-debugging-port=9222 to the Target field of the Chrome launch shortcut
- Navigate to http://127.0.0.1:9222/json/version
- copy webSocketDebuggerUrl
More Info: https://medium.com/@jaredpotter1/connecting-puppeteer-to-existing-chrome-window-8a10828149e0
*/
// Puppeteer Part
// Always update this socket after running the instance in terminal (look up ^)
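The snippet those comments refer to boils down to puppeteer.connect. A minimal sketch (the endpoint below is a placeholder; paste the webSocketDebuggerUrl you copied from http://127.0.0.1:9222/json/version):
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.connect({
    // Placeholder: replace with the webSocketDebuggerUrl from /json/version
    browserWSEndpoint: 'ws://127.0.0.1:9222/devtools/browser/<id>',
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Disconnect rather than close, so the real Chrome window stays open
  browser.disconnect();
})();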
And this is an abstracted controller written in TypeScript that I always use in any project:
import * as puppeteer from 'puppeteer';
import { Browser } from 'puppeteer/lib/cjs/puppeteer/common/Browser';
import { Page } from 'puppeteer/lib/cjs/puppeteer/common/Page';
import { PuppeteerNode } from 'puppeteer/lib/cjs/puppeteer/node/Puppeteer';
import { getPuppeteerWSUrl } from './config/config';
export default class Puppeteer {
public browser: Browser;
public page: Page;
getBrowser = () => {
return this.browser;
};
getPage = () => {
return this.page;
};
init = async () => {
const webSocketUrl = await getPuppeteerWSUrl();
try {
this.browser = await ((puppeteer as unknown) as PuppeteerNode).connect({
browserWSEndpoint: webSocketUrl,
defaultViewport: {
width: 1920,
height: 1080,
},
});
console.log('BROWSER CONNECTED OK');
} catch (e) {
console.error('BROWSER CONNECTION FAILED', e);
}
this.page = await this.browser.newPage();
this.page.on('console', (log: any) => console.log(log._text));
};
}
Abstracted WebSocket fetcher:
import axios from "axios";
import { exit } from "process";
export const getPuppeteerWSUrl = async () => {
try {
const response = await axios.get("http://127.0.0.1:9222/json/version");
return response.data.webSocketDebuggerUrl;
} catch (error) {
console.error("Can not get puppeteer ws url. error %j", error);
console.info(
"Make sure you run this command (/Applications/Google Chrome.app/Contents/MacOS/Google Chrome --remote-debugging-port=9222 --no-first-run --no-default-browser-check --user-data-dir=$(mktemp -d -t 'chrome-remote_data_dir')) first on a different shell"
);
exit(1);
}
};
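Usage is then just a couple of calls; a minimal sketch (assuming the class above lives in ./Puppeteer and the debug Chrome instance is already running):
import Puppeteer from './Puppeteer';

(async () => {
  const controller = new Puppeteer();
  await controller.init();
  await controller.getPage().goto('https://example.com');
})();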
Feel free to adjust the template to suit whatever your environment/tools currently look like.

NodeJS contextBridge receiving results from index.js

I've been able to use preload.js to send messages to the API to get things done. I'm able to get responses just fine from IPC, but I'm not able to relay the responses from IPC back to the renderer, and I don't understand what I'm missing.
index.js (main)
// Modules to control application life and create native browser window
const { app, BrowserWindow, remote, ipcMain } = require('electron');
const path = require('path');
const { spawn } = require('child_process');
let mainWindow;
const createWindow = () => {
// Create the browser window.
mainWindow = new BrowserWindow({
width: 800,
height: 600,
webPreferences: {
preload: path.join(__dirname, 'preload.js')
}
});
// and load the index.html of the app.
mainWindow.loadFile('index.html');
// Open the DevTools.
// mainWindow.webContents.openDevTools()
}
// This method will be called when Electron has finished
// initialization and is ready to create browser windows.
// Some APIs can only be used after this event occurs.
app.on('ready',() => {
createWindow();
require('./request.js')(mainWindow);
app.on('activate', () => {
// On macOS it's common to re-create a window in the app when the
// dock icon is clicked and there are no other windows open.
if (BrowserWindow.getAllWindows().length === 0) createWindow();
});
});
// Quit when all windows are closed, except on macOS. There, it's common
// for applications and their menu bar to stay active until the user quits
// explicitly with Cmd + Q.
app.on('window-all-closed', () => {
if (process.platform !== 'darwin') app.quit();
});
request.js - this is used in the index/main thread above to process API requests; in this case, requests to winax, which I run with a separate 32-bit Node.js interpreter because I couldn't get it to build against the same Node version I can use for Electron
const { ipcMain, ipcRenderer } = require('electron');
const { spawn } = require('child_process');
module.exports = function(mainWindow){
let winax;
ipcMain.on('winax', (event,arguments) => {
console.log('winax=');
console.log(arguments);
if(winax === undefined) {
console.log('spawning winax');
winax = spawn(
'C:\\Program Files\\nvm\\v14.20.0\\node.exe',
[ 'winax_microamp/index.js' ], {
shell: false,
stdio: ['inherit', 'inherit', 'inherit', 'ipc' ],
windowsHide: true
}
);
}
console.log('sending winax arguments');
/*
arguments = {
method: 'functionToRun',
owner: 'requestingProcess',
params: { required_params ... }
}
*/
winax.send(arguments);
winax.once('message', (message) => {
console.log('winax response=')
console.log(message);
mainWindow.webContents.postMessage('response', message);
});
});
}
preload.js
'use strict';
// preload.js
// All the Node.js APIs are available in the preload process.
// It has the same sandbox as a Chrome extension.
// MAS: This function executes when the DOM is loaded so we should be able to add button interactions here
const { contextBridge, ipcRenderer } = require('electron');
let testAccDb = 'C:\\Users\\PZYVC7\\OneDrive - Ally Financial\\Access\\electron_microamp.accdb';
//const spawn = require('child_process').spawn
contextBridge.exposeInMainWorld(
'request', {
send: (func) => {
console.log('request.send.func=');
console.log(func);
if(func == 'openStagingTest'){
let args = {
method: 'requestStagingDatabase',
owner: 'ui',
params: {
path: testAccDb
}
};
ipcRenderer.send('winax',args);
} else if(func == 'closeStagingTest') {
let args = {
method: 'closeStagingDatabase',
owner: 'ui'
};
ipcRenderer.send('winax',args);
}
},
response: (message) => {
ipcRenderer.on('message', (message) => {
console.log('winax reply=');
console.log(message);
});
}
}
);
window.addEventListener('DOMContentLoaded', () => {
/*const replaceText = (selector, text) => {
const element = document.getElementById(selector);
if (element) element.innerText = text;
}
for (const dependency of ['chrome', 'node', 'electron']) {
replaceText(`${dependency}-version`, process.versions[dependency]);
}*/
});
render.js
'use strict';
onLoad();
function onLoad(){
/*document.querySelector('.one').addEventListener('click',() => {
writeLog()
});*/
/*var testButton = document.querySelector('button[class=test]');
testButton.addEventListener('click', (e) => {
//C:\\Program Files\\nvm\\v14.20.0\\node.exe
window.main.asyncProc( '"C:\\Program Files\\nvm\\v14.20.0\\node.exe" winax_microamp/index.js' );
});*/
var buttons = document.querySelectorAll('button');
buttons.forEach((button) => {
//console.log('button=');
//console.log(button.className);
button.addEventListener('click', (event) => {
window.request.send(button.className);
});
});
}
Ugh... OK, so there are a few things I eventually figured out. It took me hours, and I'm not 100% sure I'm understanding it properly, so I appreciate any correction.
I think one issue that threw me off for longer than it should have is that console.log output from preload.js goes to the developer tools console, rather than to the system console like the main-thread code does.
Another thing throwing me off was that I was writing the implementation inside preload.js rather than in render.js. I confirmed it works in both places.
If I want it in preload.js, I leave the implementation like this; I confirmed it was firing here by changing the log message a little:
preload.js
response: (message) => {
ipcRenderer.on('response', (message) => {
console.log('expose reply=');
console.log(message);
});
}
}
If I want it in render instead, I need my preload to be more basic
response: (message) => {
ipcRenderer.on('response', message);
}
And then I need render to have this to process it there instead.
window.request.response((event,message) => {
console.log('winax reply=');
console.log(message);
});
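For what it's worth, the same round trip can also be done with Electron's ipcMain.handle / ipcRenderer.invoke pair, which returns the reply as a promise instead of wiring up 'response' events by hand. A rough sketch (the channel name matches the code above, but sendToWinaxWorker is a hypothetical wrapper around the winax.send / winax.once logic):
// main process
const { ipcMain } = require('electron');
ipcMain.handle('winax', async (event, args) => {
  const reply = await sendToWinaxWorker(args); // hypothetical: wraps winax.send + once('message')
  return reply;
});

// preload.js
const { contextBridge, ipcRenderer } = require('electron');
contextBridge.exposeInMainWorld('request', {
  send: (args) => ipcRenderer.invoke('winax', args), // resolves with the reply
});

// render.js
window.request.send({ method: 'requestStagingDatabase', owner: 'ui' })
  .then((message) => console.log('winax reply=', message));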

Crontab not running SOME node js scripts

I have a VPS running Amazon Linux with Node.js. I'm trying to run a Node.js script that sends a GET request to an API to get data, then sends a POST request to another API with that data. The script works when run with node in the terminal, but not after adding it to sudo crontab -e.
Now, I know about the complications with absolute paths when using crontab; I've read many of the Stack Overflow questions, and I'm fairly certain pathing isn't the issue. To test this, I created a test script that creates a file in the root directory, and it works in crontab. This test script is in the SAME directory as the main script, which has me very confused.
The script uses two packages, Axios and Puppeteer. Here is the script with some information replaced with placeholders:
import axios from "axios";
import puppeteer from "puppeteer";
const apiKey = process.env.API_KEY
const USER_ID = process.env.USER_ID
const sitekey = process.env.SITE_KEY
const surl = 'url'
const url = 'url'
async function webSubmit (data) {
const browser = await puppeteer.launch({ headless: true, args: ['--no-sandbox'] });
const page = await browser.newPage();
await page.goto(url);
await page.$eval("input[name='data']", (el, data) => el.value = data, data)
await page.click('#submitButton')
setTimeout(() => {
browser.close()
}, 5000)
}
function checkStatus (id) {
axios.get(url)
.then(response => {
if (response.data === 'NOT_READY') {
setTimeout(() => {
checkStatus(id)
}, 10000)
} else {
const data = response.data.slice(3)
webSubmit(data)
}
})
.catch(e => console.log(e))
}
function post() {
const postUrl = url
axios.post(postUrl)
.then(response => {
const id = response.data.slice(3)
setTimeout(() => {
checkStatus(id)
}, 30000)
})
.catch(error => console.log(error))
}
post()
Test Script:
import * as fs from 'fs'
fs.writeFile("/vote" + Math.random(), "Hi", function(err) {
if(err){
return console.log(err)
}
console.log("File saved")
})
Crontab Jobs:
* * * * * /opt/bitnami/node/bin/node /home/bitnami/app/index.js
* * * * * cd /home/bitnami/app && /opt/bitnami/node/bin/node index.js
Both of these jobs work if I point them at test.js instead of index.js, but not with index.js.
Any help would be appreciated, thank you!
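A couple of things worth checking here: cron runs with a minimal environment, so the process.env.API_KEY / USER_ID / SITE_KEY values the script reads will be undefined unless they are set in the crontab itself, and any crash is invisible unless the job's output is captured somewhere. A logging variant of the second job above (the log path is arbitrary):
* * * * * cd /home/bitnami/app && /opt/bitnami/node/bin/node index.js >> /tmp/index-cron.log 2>&1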

Does top-level await have a timeout?

With top-level await accepted into ES2022, I wonder if it is safe to assume that await import("./path/to/module") has no timeout at all.
Here is what I’d like to do:
// src/commands/do-a.mjs
console.log("Doing a...");
await doSomethingThatTakesHours();
console.log("Done.");
// src/commands/do-b.mjs
console.log("Doing b...");
await doSomethingElseThatTakesDays();
console.log("Done.");
// src/commands/do-everything.mjs
await import("./do-a");
await import("./do-b");
And here is what I expect to see when running node src/commands/do-everything.mjs:
Doing a...
Done.
Doing b...
Done.
I could not find any mentions of top-level await timeout, but I wonder if what I’m trying to do is a misuse of the feature. In theory Node.js (or Deno) might throw an exception after reaching some predefined time cap (say, 30 seconds).
Here is how I’ve been approaching the same task before TLA:
// src/commands/do-a.cjs
import { autoStartCommandIfNeeded } from "@kachkaev/commands";
const doA = async () => {
console.log("Doing a...");
await doSomethingThatTakesHours();
console.log("Done.");
}
export default doA;
autoStartCommandIfNeeded(doA, __filename);
// src/commands/do-b.cjs
import { autoStartCommandIfNeeded } from "@kachkaev/commands";
const doB = async () => {
console.log("Doing b...");
await doSomethingThatTakesDays();
console.log("Done.");
}
export default doB;
autoStartCommandIfNeeded(doB, __filename);
// src/commands/do-everything.cjs
import { autoStartCommandIfNeeded } from "@kachkaev/commands";
import doA from "./do-a";
import doB from "./do-b";
const doEverything = async () => {
await doA();
await doB();
}
export default doEverything;
autoStartCommandIfNeeded(doEverything, __filename);
autoStartCommandIfNeeded() executes the function if __filename matches require.main?.filename.
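A minimal sketch of what such a helper could look like (hypothetical; the real @kachkaev/commands implementation may differ):
// Runs the command only when its file is the process entry point
const autoStartCommandIfNeeded = (command, filename) => {
  if (require.main && require.main.filename === filename) {
    command().catch((error) => {
      console.error(error);
      process.exit(1);
    });
  }
};
module.exports = { autoStartCommandIfNeeded };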
Answer: No, there is no timeout on a top-level await.
This feature is actually used in Deno, for example for a web server:
import { serve } from "https://deno.land/std@0.103.0/http/server.ts";
const server = serve({ port: 8080 });
console.log(`HTTP webserver running. Access it at: http://localhost:8080/`);
console.log("A");
for await (const request of server) {
let bodyContent = "Your user-agent is:\n\n";
bodyContent += request.headers.get("user-agent") || "Unknown";
request.respond({ status: 200, body: bodyContent });
}
console.log("B");
In this example, "A" gets printed in the console and "B" isn't until the webserver is shut down (which doesn't automatically happen).
As far as I know, there is no timeout by default in async/await. There is the await-timeout package, for example, which adds timeout behavior. Example:
import Timeout from 'await-timeout';
const timer = new Timeout();
try {
await Promise.race([
fetch('https://example.com'),
timer.set(1000, 'Timeout!')
]);
} finally {
timer.clear();
}
Taken from the docs: https://www.npmjs.com/package/await-timeout
As you can see, a Timeout is instantiated and its set method defines the timeout and the timeout message.
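The same pattern also works without the package; a rough dependency-free equivalent of the snippet above (the names are illustrative):
const withTimeout = (promise, ms, message = 'Timeout!') => {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // clearTimeout plays the role of timer.clear() in the finally block above
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
};

await withTimeout(fetch('https://example.com'), 1000);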

While using capture-website, puppeteer with webpack throwing error:browser not downloaded, at ChromeLauncher.launch (webpack-internal)

I am trying to take a screenshot of an HTML file in Node.js.
I have used the capture-website package.
Here is the code:
try{
await captureWebsite.file('file.html', 'file.png', {overwrite: true}, function (error) {
if (error) {
console.log('error',error);
}
});
}catch(e){
console.log('error in capture image:', e)
}
Versions:
node: 12.16.1
capture-website: 0.8.1
Angular: 7
You can do something like:
import puppeteer from 'puppeteer'

interface ScreenshotOptions {
  html: string
  timeout?: number
}

async function screenshotFromHtml ({ html, timeout = 2000 }: ScreenshotOptions) {
  const browser = await puppeteer.launch({
    headless: !process.env.DEBUG_HEADFULL,
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
    ]
  })
  const page = await browser.newPage()
  // Set viewport to something big
  // Prevents Carbon from cutting off lines
  await page.setViewport({
    width: 2560,
    height: 1080,
    deviceScaleFactor: 2
  })
  await page.setContent(html)
  // Wait some more, since `waitUntil: 'load'` or `waitUntil: 'networkidle0'` is not always enough
  await page.waitFor(timeout)
  const base64 = await page.screenshot({ encoding: "base64" }) as string
  // Close browser
  await browser.close()
  return base64
}
This code is in TypeScript, but you can use the function body in your JS project.
In this GitHub file you can see an HTML render example too.
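For the original file-to-file use case, calling it from plain Node could look like this minimal sketch (it assumes the function above is exported from a hypothetical ./screenshot module; the file names are the question's):
const { promises: fs } = require('fs');
const screenshotFromHtml = require('./screenshot'); // hypothetical module path

(async () => {
  const html = await fs.readFile('file.html', 'utf8');
  const base64 = await screenshotFromHtml({ html, timeout: 2000 });
  await fs.writeFile('file.png', Buffer.from(base64, 'base64'));
})();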

Cache-busting page-data.json files in Gatsby

I have a gatsby generated website on which I have replaced the contents of the homepage.
Unfortunately the previous version was serving /page-data/index/page-data.json with incorrect cache-control headers, resulting in that file being cached in client browsers (and stale data being shown unless force-refreshed). I have also discovered that page-data.json files are not hashed (see https://github.com/gatsbyjs/gatsby/issues/15080).
I've updated the cache-control headers so that versions from now on will not be cached, but this does not help clients that already have the cached version.
What can I do to force clients to request the latest version of this file?
I got there in the end... This is in my gatsby-node.js:
// Required at the top of gatsby-node.js (md5 and glob are npm dependencies)
const fs = require('fs')
const path = require('path')
const util = require('util')
const glob = require('glob')
const md5 = require('md5')
const hash = md5(`${new Date().getTime()}`)
const addPageDataVersion = async file => {
const stats = await util.promisify(fs.stat)(file)
if (stats.isFile()) {
console.log(`Adding version to page-data.json in ${file}..`)
let content = await util.promisify(fs.readFile)(file, 'utf8')
const result = content.replace(
/page-data.json(\?v=[a-f0-9]{32})?/g,
`page-data.json?v=${hash}`
)
await util.promisify(fs.writeFile)(file, result, 'utf8')
}
}
exports.onPostBootstrap = async () => {
const loader = path.join(__dirname, 'node_modules/gatsby/cache-dir/loader.js')
await addPageDataVersion(loader)
}
exports.onPostBuild = async () => {
const publicPath = path.join(__dirname, 'public')
const htmlAndJSFiles = glob.sync(`${publicPath}/**/*.{html,js}`)
for (let file of htmlAndJSFiles) {
await addPageDataVersion(file)
}
}
Check out this tutorial; it's the solution I've been using:
https://examsworld.co.in/programming/javascript/how-to-cache-bust-a-react-app/
It's basically a wrapper component that checks whether the browser's cached version matches the build's version number in package.json. If it doesn't, it clears the cache and reloads the page.
This is how I'm using it.
gatsby-browser.js
export const wrapRootElement = ({ element }) => (
<CacheBuster>
{({ loading, isLatestVersion, refreshCacheAndReload }) => {
if (loading) return null
if (!loading && !isLatestVersion) {
// You can decide how and when you want to force reload
refreshCacheAndReload()
}
return <AppProvider>{element}</AppProvider>
}}
</CacheBuster>
)
CacheBuster.js
import React from 'react'
import packageJson from '../../package.json'
global.appVersion = packageJson.version
// version from response - first param, local version second param
const semverGreaterThan = (versionA, versionB) => {
const versionsA = versionA.split(/\./g)
const versionsB = versionB.split(/\./g)
while (versionsA.length || versionsB.length) {
const a = Number(versionsA.shift())
const b = Number(versionsB.shift())
// eslint-disable-next-line no-continue
if (a === b) continue
// eslint-disable-next-line no-restricted-globals
return a > b || isNaN(b)
}
return false
}
class CacheBuster extends React.Component {
constructor(props) {
super(props)
this.state = {
loading: true,
isLatestVersion: false,
refreshCacheAndReload: () => {
console.info('Clearing cache and hard reloading...')
if (caches) {
// Service worker cache should be cleared with caches.delete()
caches.keys().then(function(names) {
for (const name of names) caches.delete(name)
})
}
// delete browser cache and hard reload
window.location.reload(true)
},
}
}
componentDidMount() {
fetch('/meta.json')
.then(response => response.json())
.then(meta => {
const latestVersion = meta.version
const currentVersion = global.appVersion
const shouldForceRefresh = semverGreaterThan(
latestVersion,
currentVersion
)
if (shouldForceRefresh) {
console.info(
`We have a new version - ${latestVersion}. Should force refresh`
)
this.setState({ loading: false, isLatestVersion: false })
} else {
console.info(
`You already have the latest version - ${latestVersion}. No cache refresh needed.`
)
this.setState({ loading: false, isLatestVersion: true })
}
})
}
render() {
const { loading, isLatestVersion, refreshCacheAndReload } = this.state
const { children } = this.props
return children({ loading, isLatestVersion, refreshCacheAndReload })
}
}
export default CacheBuster
generate-build-version.js
const fs = require('fs')
const packageJson = require('./package.json')
const appVersion = packageJson.version
const jsonData = {
version: appVersion,
}
const jsonContent = JSON.stringify(jsonData)
fs.writeFile('./static/meta.json', jsonContent, 'utf8', function(err) {
if (err) {
console.log('An error occurred while writing JSON Object to meta.json')
return console.log(err)
}
console.log('meta.json file has been saved with latest version number')
})
And in your package.json, add these scripts:
"generate-build-version": "node generate-build-version",
"prebuild": "npm run generate-build-version"
Outside of going to each client browser individually and clearing its cache, there is no other way to invalidate all of your clients' caches. If your webpage is behind a CDN you control, you may be able to force invalidation at the CDN level, so that new clients are always routed to the up-to-date page even if the CDN had a pre-existing, outdated copy cached.
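For example, if the site sits behind AWS CloudFront, an invalidation can be issued so outdated copies of the page data are dropped at the edge (the distribution ID is a placeholder):
aws cloudfront create-invalidation --distribution-id YOUR_DISTRIBUTION_ID --paths "/page-data/*" "/index.html"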
