I am trying to execute the below script from GitLab CI/CD.
This Puppeteer script is in a .js file which is called from the GitLab repository's .gitlab-ci.yml file.
The purpose of the script is to navigate to INITIAL_PAGE_URL, log in, and navigate to the HOME_PAGE. The sign-in button has a click handler which, on successful login, navigates to the HOME_PAGE.
The script runs perfectly on the local system, but when running from GitLab:
- no error is shown
- console.log("logged in") is executed and prints the message
- however, it does not navigate to the next page, and page.url() still shows the INITIAL_PAGE_URL
Any suggestions?
const HOME_PAGE = "https://www.abcd.com/home"
const INITIAL_PAGE_URL = "https://www.abcd.com/signin" // placeholder: referenced below but missing from the original snippet
const SIGN_IN_FORM = "#frmSignIn";
const USERNAME_SELECTOR = 'input#EmailAddress';
const PASSWORD_SELECTOR = 'input#Password';
const LOGIN_BUTTON_SELECTOR = '#sign-in-button';
const SECRET_EMAIL = 'username';
const SECRET_PASSWORD = 'password';
// Use CHROME_EXE_PATH from the environment when set; an explicit empty string
// presumably means "fall back to Puppeteer's bundled Chromium".
const CHROME_EXE_PATH =
  process.env.CHROME_EXE_PATH === "" ? "" : process.env.CHROME_EXE_PATH || "/usr/bin/chromium-browser";
const puppeteer = require('puppeteer')
const main = async () => {
  const browser = await puppeteer.launch({
    headless: true,
    executablePath: CHROME_EXE_PATH,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  })
  console.log("browser loaded")
  const page = await browser.newPage()
  await page.setViewport({ width: 1366, height: 768 })
  // Script for login page - start
  console.log("Navigating to initial page")
  await page.goto(INITIAL_PAGE_URL, { waitUntil: 'networkidle2' })
  await page.waitForSelector(SIGN_IN_FORM)
  await page.type(USERNAME_SELECTOR, SECRET_EMAIL)
  await page.type(PASSWORD_SELECTOR, SECRET_PASSWORD)
  await page.click(LOGIN_BUTTON_SELECTOR)
  console.log("logged in")
  console.log(page.url());
  await browser.close();
}
main()
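For reference, the usual Puppeteer pattern for a click that triggers a navigation is to await the navigation together with the click; a minimal sketch (not verified against this CI environment):
// Minimal sketch: wait for the click-triggered navigation to finish
// before reading page.url(), so a slower CI runner is accounted for.
await Promise.all([
  page.waitForNavigation({ waitUntil: 'networkidle2' }),
  page.click(LOGIN_BUTTON_SELECTOR),
])
console.log(page.url()) // should now report HOME_PAGE if the login succeeded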
I have this browser started like this:
async function startBrowser() {
  const browser = await puppeteer.launch({
    args: [
      '--no-sandbox',
      '--disable-gpu',
      '--disable-dev-shm-usage',
      '--disable-setuid-sandbox',
      '--no-first-run',
      '--no-zygote',
      '--single-process',
    ],
  });
  const page = await browser.newPage();
  return { browser, page };
}
and I have a function that is called which runs this code:
const { browser, page } = await startBrowser();
const tab = await browser.newPage()
const request = await tab.goto(``);
await page.goto(``);
console.log('scraping...');
console.log(page.url());
await page.screenshot({
  path: `frontend/public/assets/${coinId}.png`,
});
const text = await request.text();
const $ = cheerio.load(text);
const coinImage = $('#one img').attr('src');
const coinTitle = $('#three div h1').text();
const bids = $('#bidsrow td input').attr('value');
const timeLeft = $('#endsrow #endstext').text();
const currentBid = $('#currentbidtext').text();
const itemId = $('#itemidrow td .bolder').text();
const minBid = $('#minimumbidtext').text().replace('$', '');
On my local machine, it scrapes the data perfectly fine; however, when I deploy this app to Heroku or even AWS EC2, it doesn't seem to scrape anything and just returns empty data.
Is this because the browser is not started correctly, or something else?
I spent all day deploying this app to both Heroku and AWS and am still very confused about what is wrong.
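One common cause on Heroku or EC2 is that the page has not rendered the expected markup by the time it is read; datacenter traffic is also often served different or blocked content. A debugging sketch under that assumption, reusing the selectors above (targetUrl stands in for the elided URL, and the user-agent string is only illustrative):
// Debugging sketch: use a realistic user agent, wait for the markup being
// scraped, and dump what the server actually returned if it never appears.
await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
await page.goto(targetUrl, { waitUntil: 'networkidle2' });
try {
  await page.waitForSelector('#three div h1', { timeout: 15000 });
} catch (e) {
  console.log(await page.content()); // inspect the HTML the server served instead
}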
I am trying to download an invoice from a website using Puppeteer; I just started to learn Puppeteer. I am using Node to create and execute the code. I have managed to log in and navigate to the invoice page, but it opens in a new tab, so the code is not detecting it since it's not the active tab. This is the code I used:
const puppeteer = require('puppeteer')
const SECRET_EMAIL = 'emailid'
const SECRET_PASSWORD = 'password'
const main = async () => {
  const browser = await puppeteer.launch({
    headless: false,
  })
  const page = await browser.newPage()
  await page.goto('https://my.apify.com/sign-in', { waitUntil: 'networkidle2' })
  await page.waitForSelector('div.sign_shared__SignForm-sc-1jf30gt-2.kFKpB')
  await page.type('input#email', SECRET_EMAIL)
  await page.type('input#password', SECRET_PASSWORD)
  await page.click('input[type="submit"]')
  await page.waitForSelector('#logged-user')
  await page.goto('https://my.apify.com/billing#/invoices', { waitUntil: 'networkidle2' })
  await page.waitForSelector('#reactive-table-1')
  await page.click('#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a')
  const newPagePromise = new Promise(x => browser.once('targetcreated', target => x(target.page())))
  const page2 = await newPagePromise
  await page2.bringToFront()
  await page2.screenshot({ path: 'apify1.png' })
  //await browser.close()
}
main()
main()
In the above code I am just trying to take a screenshot. Can anyone help me?
Here is an example of a work-around for the Chromium issue mentioned in the comments above. Adapt it to fit your specific needs and use case. Basically, you need to capture the new page (target) and then do whatever you need to do to download the file, possibly passing it to Node as a buffer as in the example below if no other means work for you (including a direct request to the download location via fetch, or ideally some request library on the back-end).
const [PDF_page] = await Promise.all([
  browser
    .waitForTarget(target => target.url().includes('my.apify.com/account/invoices/'))
    .then(target => target.page()),
  page.click('#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a'), // "page" is the logged-in page from the question
]);
const asyncRes = PDF_page.waitForResponse(response =>
  response
    .request()
    .url()
    .includes('my.apify.com/account/invoices'));
await PDF_page.reload();
const res = await asyncRes;
const url = res.url();
const headers = res.headers();
if (!headers['content-type'].includes('application/pdf')) {
  await PDF_page.close();
  return null;
}
const options = {
  // target request options
};
const pdfAb = await PDF_page.evaluate(
  async (url, options) => {
    function bufferToBase64(buffer) {
      return btoa(
        new Uint8Array(buffer).reduce((data, byte) => {
          return data + String.fromCharCode(byte);
        }, ''),
      );
    }
    return await fetch(url, options)
      .then(response => response.arrayBuffer())
      .then(arrayBuffer => bufferToBase64(arrayBuffer));
  },
  url,
  options,
);
const pdf = Buffer.from(pdfAb, 'base64');
await PDF_page.close();
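If the invoice then needs to land on disk, the decoded buffer can be written out with Node's fs module; a minimal sketch (the filename is illustrative):
// Sketch: persist the decoded PDF buffer to disk
const fs = require('fs');
fs.writeFileSync('invoice.pdf', pdf);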
The following code is viable for reading the clipboard in headless/headful mode:
const context = browser.defaultBrowserContext();
await context.overridePermissions('http://localhost', ['clipboard-read']);
page = await browser.newPage();
await page.goto('http://localhost/test/', { waitUntil: 'load', timeout: 35000 });
// click button for clipboard..
let clipboard = await page.evaluate(`(async () => await navigator.clipboard.readText())()`);
But when you later start incognito, it's not working anymore:
const incognito = await browser.createIncognitoBrowserContext();
page = await incognito.newPage();
and you get:
DOMException: Read permission denied.
I am currently trying to figure out how to combine both. Does anybody know how to set overridePermissions inside of the new incognito window?
Please note I do not want to use the incognito Chrome arg at the start. I want to manually create new incognito pages inside of my scripts with the correct overridePermissions.
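For reference, overridePermissions is defined on BrowserContext itself, so in principle it can also be called on the incognito context before any page is opened; a minimal sketch of that combination (untested here):
const incognito = await browser.createIncognitoBrowserContext();
// Grant clipboard-read on the incognito context, mirroring the default-context call above
await incognito.overridePermissions('http://localhost', ['clipboard-read']);
page = await incognito.newPage();
await page.goto('http://localhost/test/', { waitUntil: 'load', timeout: 35000 });
let clipboard = await page.evaluate(`(async () => await navigator.clipboard.readText())()`);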
I am having the very same issue. Here's a minimal reproducible example.
Node.js version: v16.13.1
Puppeteer version: puppeteer#14.4.1
'use strict';
const puppeteer = require('puppeteer');
const URL = 'https://google.com';
(async () => {
  const browser = await puppeteer.launch();
  const context = browser.defaultBrowserContext();
  await context.overridePermissions(URL, ['clipboard-read', 'clipboard-write']);
  const page = await browser.newPage();
  await page.goto(URL, {
    waitUntil: 'networkidle2',
  });
  await page.evaluate(() => navigator.clipboard.writeText("Injected"));
  const value = await page.evaluate(() => navigator.clipboard.readText());
  console.log(value);
})();
I need to scrape, in headless mode, a site with a lot of debugger; statements.
Is there a way to prevent pausing on debugger?
On load, I tried to send Ctrl+F8 and F8 with this code, but without success:
await crt_page.keyboard.down('Control');
await crt_page.keyboard.press('F8');
await crt_page.keyboard.up('Control');
await crt_page.keyboard.press('F8');
Any advice?
Puppeteer is automatically pressing keys inside the page, not the browser.
So I think the solution is to install the npm package robotjs to do things outside the browser.
Hope this helps you!
Don't forget to select my answer as the correct answer if this code worked.
const puppeteer = require('puppeteer')
const robot = require('robotjs')
;(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    devtools: true
  })
  const [page] = await browser.pages()
  const open = await page.goto('https://www.example.com', { waitUntil: 'networkidle0', timeout: 0 })
  await page.waitFor(4000)
  await robot.keyToggle(']', 'down', 'control') // For Mac, change 'control' to 'command'
  await page.waitFor(500)
  await robot.keyToggle(']', 'down', 'control') // For Mac, change 'control' to 'command'
  await page.waitFor(500)
  await robot.keyToggle(']', 'up', 'control') // For Mac, change 'control' to 'command'
  await page.waitFor(1000)
  await robot.keyToggle('f8', 'down', 'control') // For Mac, change 'control' to 'command'
  await page.waitFor(500)
  await robot.keyToggle('f8', 'up', 'control') // For Mac, change 'control' to 'command'
})()
To debug whether your robotjs setup worked or not, try this code.
The code below runs Puppeteer and changes the URL using robotjs.
If this also does not work on your server, then I'm sorry, I can't help you.
const puppeteer = require('puppeteer')
const robot = require('robotjs')
const pageURL = 'https://www.google.com'
const normal_Strings = ['`','1','2','3','4','5','6','7','8','9','0','-','=','[',']','\\',';','\'',',','.','/']
const shiftedStrings = ['~','!','@','#','$','%','^','&','*','(',')','_','+','{','}','|',':','"','<','>','?']
;(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    devtools: true
  })
  const [page] = await browser.pages()
  const open = await page.goto('https://www.example.com', { waitUntil: 'networkidle0', timeout: 0 })
  console.log('First URL:')
  console.log(await page.url())
  await robot.keyToggle('l', 'down', 'control') // For Mac, change 'control' to 'command'
  await page.waitFor(500)
  await robot.keyToggle('l', 'up', 'control') // For Mac, change 'control' to 'command'
  await page.waitFor(1000)
  for (let num in pageURL) {
    if (shiftedStrings.includes(pageURL[num])) {
      var key = normal_Strings[shiftedStrings.indexOf(pageURL[num])]
      await robot.keyToggle(key, 'down', 'shift')
      await page.waitFor(300)
      await robot.keyToggle(key, 'up', 'shift')
      await page.waitFor(300)
      continue // shifted character already typed above; keyTap would reject it
    }
    await robot.keyTap(pageURL[num])
    await page.waitFor(200)
  }
  await page.waitFor(1000)
  await robot.keyTap('enter')
  await page.waitForSelector('img#hplogo[alt="Google"]', { timeout: 0 })
  console.log('Second URL:')
  console.log(await page.url())
})()
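As an aside, another approach sometimes used against debugger; pauses is the Chrome DevTools Protocol, which can skip all pauses outright; a minimal sketch, assuming page is the Puppeteer page:
// Sketch: disable all debugger pauses via a CDP session
const client = await page.target().createCDPSession();
await client.send('Debugger.enable');
await client.send('Debugger.setSkipAllPauses', { skip: true });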
I have created a Puppeteer script to run offline, and I have the below code to take a screenshot. While running the offline-login-check.js script from the command prompt, could someone please advise where the screenshots are saved?
const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    chromeWebSecurity: false, // note: this looks like a Cypress option; Puppeteer has no such launch option
    args: ['--no-sandbox']
  });
  try {
    // Create a new page
    const page = await browser.newPage()
    // Connect to Chrome DevTools
    const client = await page.target().createCDPSession()
    // Navigate and take a screenshot
    await page.waitFor(3000);
    await page.goto('https://sometestsite.net/home', { waitUntil: 'networkidle0' })
    //await page.goto(url, {waitUntil: 'networkidle0'});
    await page.evaluate('navigator.serviceWorker.ready');
    console.log('Going offline');
    await page.setOfflineMode(true);
    // Does === true for the main page but the fallback content isn't being served.
    page.on('response', r => console.log(r.fromServiceWorker()));
    await page.reload({ waitUntil: 'networkidle0' });
    await page.waitFor(5000);
    await page.screenshot({ path: 'screenshot.png', fullPage: true })
    await page.waitForSelector('mat-card[id="route-tile-card"]'); // fixed: closing quote was missing in the selector
    await page.click('mat-card[id="route-tile-card"]');
    await page.waitFor(3000);
  } catch (e) {
    // handle initialization error
    console.log("Timeout or other error: ", e)
  }
  await browser.close();
})();
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    chromeWebSecurity: false, // see note above: not a Puppeteer launch option
    args: ['--no-sandbox']
  });
  try {
    // Create a new page
    const page = await browser.newPage();
    // Connect to Chrome DevTools
    const client = await page.target().createCDPSession();
    // Navigate and take a screenshot
    await page.goto('https://example.com', { waitUntil: 'networkidle0' });
    // await page.evaluate('navigator.serviceWorker.ready');
    console.log('Going offline');
    await page.setOfflineMode(true);
    // Does === true for the main page but the fallback content isn't being served.
    page.on('response', r => console.log(r.fromServiceWorker()));
    await page.reload({ waitUntil: 'networkidle0' });
    await page.screenshot({ path: 'screenshot2.png', fullPage: true })
    // await page.waitForSelector('mat-card[id="route-tile-card"]');
    // await page.click('mat-card[id="route-tile-card"]');
  } catch (e) {
    // handle initialization error
    console.log("Timeout or other error: ", e)
  }
  await browser.close();
})();
Then, in the command line, run ls | grep .png and you should see the screenshot there. Be aware that I removed await page.evaluate('navigator.serviceWorker.ready'); which might be specific to your website.
Your script is perfect. There is no problem with it!
The screenshot.png should be in the directory from which you run the node offline-login-check.js command.
If it's not there, maybe you are getting some error or timeout before the page.screenshot command runs. Since your script is OK, this can be caused by network issues or issues with the page. For example, if your page keeps a never-ending connection open (like a WebSocket), change "networkidle0" to "networkidle2" or "load"; otherwise the first page.goto will get stuck.
Again, your script is perfect. You don't have to change it.
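If there is any doubt about the working directory, an absolute path makes the screenshot's location explicit; a minimal sketch:
// Sketch: write the screenshot next to the script itself, regardless of the cwd
const path = require('path');
await page.screenshot({ path: path.join(__dirname, 'screenshot.png'), fullPage: true });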