Cannot download file while headless: true, works when headless: false [Puppeteer] - node.js

I'm running a script that logs into an authenticated session on a website and clicks a button to download an Excel file. It runs with no problems with headless: false, but with headless: true the file does not download.
My research suggests that the browser may be closing before the download completes. I've added a wait of about 15 seconds, which is much longer than the download should need, but I'm still not getting anything. I also tried manually removing the HeadlessChrome substring from the userAgent in case the site was blocking it, but that didn't work either. Is it okay to use headless: false in a script that is part of a production web application deployed on Heroku?
const puppeteer = require('puppeteer');

// `username` and `password` are defined elsewhere in the script.
async function getData() {
  const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  let browser;
  try {
    browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('<url>');
    // Log in (username and password are submitted in two steps)
    await page.type('#username', username);
    await page.click('#signIn');
    await wait(4000);
    await page.type('#password', password);
    await page.click('#signIn');
    await page.waitForNavigation();
    await page.keyboard.press('Enter'); // click out of any pop-up
    // Go to the merchandising page
    await page.click('#m_69-link');
    await page.waitForSelector('#ExcelReportButton', { visible: true });
    // Click the "export as Excel" icon
    await wait(4000);
    await page.click('#ExcelReportButton');
    await wait(15000); // give the download time to finish
  } catch (error) {
    console.log(error);
  } finally {
    // Close the browser even if one of the steps above throws
    if (browser) await browser.close();
  }
}

Try adding additional headers; this worked for me:
await page.setExtraHTTPHeaders({
'Accept-Language': 'en-US,en;q=0.9'
});
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36');
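Another explanation often given for this symptom is that older headless Chrome silently drops downloads unless they are explicitly allowed through the DevTools Protocol. The sketch below assumes a Puppeteer version where `page.target().createCDPSession()` is available; it sends the `Page.setDownloadBehavior` command and then polls a download directory instead of sleeping for a fixed 15 seconds. The helper names `enableDownloads` and `waitForDownload` are hypothetical, the directory is assumed to start empty, and since the CDP documentation marks `Page.setDownloadBehavior` as deprecated in favor of `Browser.setDownloadBehavior`, newer Chrome builds may need the latter instead.

```javascript
const fs = require('fs');
const path = require('path');

// Ask Chrome, via the DevTools Protocol, to save downloads into `downloadPath`.
// Assumption: headless Chrome discards downloads unless this is set.
async function enableDownloads(page, downloadPath) {
  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath,
  });
}

// Poll `dir` until a finished file appears, or reject after `timeoutMs`.
// Chrome writes in-progress downloads with a ".crdownload" suffix, so those
// are ignored. Assumes `dir` contains no other files before the download.
function waitForDownload(dir, timeoutMs = 30000, intervalMs = 250) {
  fs.mkdirSync(dir, { recursive: true }); // make sure readdirSync cannot fail
  const start = Date.now();
  return new Promise((resolve, reject) => {
    const timer = setInterval(() => {
      const done = fs.readdirSync(dir).filter((f) => !f.endsWith('.crdownload'));
      if (done.length > 0) {
        clearInterval(timer);
        resolve(path.join(dir, done[0]));
      } else if (Date.now() - start > timeoutMs) {
        clearInterval(timer);
        reject(new Error(`No download in ${dir} after ${timeoutMs} ms`));
      }
    }, intervalMs);
  });
}
```

In the question's flow, `await enableDownloads(page, dir)` would run before `page.click('#ExcelReportButton')`, and `await waitForDownload(dir)` would replace the fixed `wait(15000)` so the browser is only closed once the file exists.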

Related

Does type input and click not support multiple lines?

I am trying to send a DM on Instagram, but each new line is sent as a separate message.
Do page.type() and page.click() not support multiple lines?
If so, does anyone have any insight into alternative solutions, such as patching this part of the Puppeteer source code?
The simplified code is below. It may not work as-is for you because the selector depends on the UI language (mine is Japanese). Sorry.
import pp from 'puppeteer';
import puppeteer from 'puppeteer-extra';

export const instagram = async () => {
  console.log('START!!!');
  const browser = await puppeteer.launch({
    executablePath: pp.executablePath(), // use the Chromium bundled with puppeteer
    args: [
      '--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    ],
    headless: true,
    slowMo: 100,
    timeout: 30000,
  });
  const page = await browser.newPage();
  await page.setViewport({
    height: 900,
    width: 1366,
  });
  await page.goto('https://www.instagram.com/accounts/login/');
  await page.waitForSelector('input[name="username"]');
  await page.type('input[name=username]', '___your_id___');
  await page.type('input[name=password]', '___your_password___');
  await page.click('button[type=submit]');
  await page.waitForNavigation({ waitUntil: 'domcontentloaded' });
  await page.goto('https://www.instagram.com/direct/__your_friend_dm_direct_url__');
  // Japanese UI: the placeholder "メッセージ..." means "Message..."
  await page.waitForSelector('textarea[placeholder="メッセージ..."]');
  // Each "\n" in this string ends up sending a separate message
  await page.type('textarea[placeholder="メッセージ..."]', "this is\nmultiple line");
  await browser.close();
};
I want to send a multi-line message in a single transmission.
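Puppeteer types a `\n` character as an Enter key press, and the DM box treats Enter as "send". A common workaround, assuming the Instagram message box behaves like most chat inputs and inserts a line break on Shift+Enter, is to type each line separately and press Shift+Enter between them. The helper name `typeMultiline` is hypothetical; it only relies on `page.focus()` and the `page.keyboard` methods `type`, `down`, `press` and `up`.

```javascript
// Type `text` into the element matching `selector`, producing line breaks
// with Shift+Enter instead of a plain Enter (which would send the message).
// `page` is a Puppeteer Page; only focus() and page.keyboard are used.
async function typeMultiline(page, selector, text) {
  await page.focus(selector);
  const lines = text.split('\n');
  for (let i = 0; i < lines.length; i++) {
    if (i > 0) {
      // Hold Shift while pressing Enter to insert a newline, not send
      await page.keyboard.down('Shift');
      await page.keyboard.press('Enter');
      await page.keyboard.up('Shift');
    }
    if (lines[i].length > 0) {
      await page.keyboard.type(lines[i]);
    }
  }
}
```

With this, the last step of the question's script would become `await typeMultiline(page, 'textarea[placeholder="メッセージ..."]', 'this is\nmultiple line')` followed by a single `await page.keyboard.press('Enter')` to send the whole message at once.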

Node JS, Captcha Solving

This is my code!
const puppeteer = require('puppeteer-extra');
const UserAgent = require('user-agents');
// Add the recaptcha plugin and provide it your 2captcha token (= their apiKey).
// 2captcha is the built-in solution provider, but others would work as well.
// Please note: you need to add funds to your 2captcha account for this to work.
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
puppeteer.use(
  RecaptchaPlugin({
    provider: { id: '2captcha', token: '93...' },
    visualFeedback: true, // colorize reCAPTCHAs (violet = detected, green = solved)
  })
);

async function login() {
  global.browser = await puppeteer.launch({
    headless: false,
    slowMo: 10,
    userDataDir: 'C:\\userData',
  });
  global.page = await browser.pages(); // array of open tabs
  // user-agents exports a class: instantiate it to get a random UA string
  await page[0].setUserAgent(new UserAgent().toString());
  console.log('Going to Website');
  await page[0].goto('https://www.google.com/recaptcha/api2/demo', {
    waitUntil: 'networkidle2',
  });
  console.log('Solving Captcha');
  await page[0].solveRecaptchas();
  await page[0].waitForNavigation();
}
This is roughly my code, and I do have a 2captcha balance.
The code above doesn't solve the captcha: it detects the captcha and turns it purple, but it never solves it. I need help.
After you get the response from 2captcha, you have to click the submit button manually using the following code:
await page.waitForSelector('#recaptcha-demo-submit')
await page.click('#recaptcha-demo-submit')
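Combining the answer with the question's code, a minimal sketch might look like the following. On its own, the question's `waitForNavigation()` waits for a navigation that nothing triggers; here the click on the submit button triggers it. It assumes `page` was created through `puppeteer-extra` with the recaptcha plugin registered, so `page.solveRecaptchas()` exists; the helper name `solveAndSubmit` is hypothetical. Starting `waitForNavigation()` in the same `Promise.all` as the click is the usual Puppeteer pattern for not missing a fast navigation.

```javascript
// Solve the reCAPTCHA(s) on `page`, then click submit and wait for the
// resulting navigation. Assumes the puppeteer-extra recaptcha plugin is
// registered; the default selector is the button on Google's demo page.
async function solveAndSubmit(page, submitSelector = '#recaptcha-demo-submit') {
  await page.solveRecaptchas();
  await page.waitForSelector(submitSelector);
  // Begin waiting for navigation before the click that causes it
  await Promise.all([
    page.waitForNavigation(),
    page.click(submitSelector),
  ]);
}
```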

How to fix the "please wait a few minutes before you try again" error on Instagram?

I'm using Puppeteer for the first time, running it on locally hosted Firebase Cloud Functions.
I've been trying with different accounts, and I waited hours in case the error would resolve itself, but no luck. The error I'm getting is the "Please wait a few minutes before you try again" message from the title.
I can't interact with the site, and even if I switch routes, this is the only thing that shows up.
What I did/tried:
I followed this tutorial and coded the exact same app: https://www.youtube.com/watch?v=dXjKh66BR2U
Searched Google for hours for anything like my problem, but found no solution that worked for me.
Edit:
The code I'm using is basically from fireship.io:
const puppeteer = require('puppeteer');

const scrapeImages = async (username) => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://www.instagram.com/accounts/login/');
  // Login form
  await page.screenshot({ path: '1.png' });
  await page.type('[name=username]', 'fireship_dev');
  await page.type('[name=password]', 'some-pa$$word');
  await page.screenshot({ path: '2.png' });
  await page.click('[type=submit]');
  // Social page. page.waitFor() is deprecated and removed in newer
  // Puppeteer versions, so sleep with a plain timer instead.
  await new Promise((resolve) => setTimeout(resolve, 5000));
  await page.goto(`https://www.instagram.com/${username}`);
  await page.waitForSelector('img', { visible: true });
  await page.screenshot({ path: '3.png' });
  // Execute code in the DOM: collect the src of every <img>
  const data = await page.evaluate(() => {
    const images = document.querySelectorAll('img');
    return Array.from(images).map((v) => v.src);
  });
  await browser.close();
  console.log(data);
  return data;
};
The error I'm getting in the console:
UnhandledPromiseRejectionWarning: TimeoutError: waiting for selector `input[name="username"]` failed: timeout 30000ms exceeded
Try adding extra headers before your page.goto() call, like this:
await page.setExtraHTTPHeaders({
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
'upgrade-insecure-requests': '1',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9,en;q=0.8'
})
These headers make Puppeteer's requests look like they come from a regular browser on a normal desktop OS.
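Instead of hard-coding an old Chrome version string, one option is to take the browser's real user agent and only swap the `HeadlessChrome` token for `Chrome`, so the advertised version matches the installed browser. This is a sketch with hypothetical helper names, and there is no guarantee that headers alone get past Instagram's rate limiting. Note that `page.setUserAgent()` also changes `navigator.userAgent` inside the page, whereas a `user-agent` entry in `setExtraHTTPHeaders()` only affects outgoing request headers.

```javascript
// Replace the "HeadlessChrome" token that headless Chrome advertises with
// "Chrome", keeping the real version number intact.
function toHeadfulUserAgent(ua) {
  return ua.replace(/HeadlessChrome/g, 'Chrome');
}

// Apply the adjusted UA plus an Accept-Language header to `page`.
// `browser` and `page` are Puppeteer objects; call this before page.goto().
async function applyBrowserLikeHeaders(browser, page, acceptLanguage = 'en-US,en;q=0.9') {
  const ua = toHeadfulUserAgent(await browser.userAgent());
  await page.setUserAgent(ua);
  await page.setExtraHTTPHeaders({ 'Accept-Language': acceptLanguage });
  return ua;
}
```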

How to enable outgoing HTTP connections through a proxy on Google Cloud App Engine for Puppeteer and Request (http modules)?

I am working on a scraping app that uses Puppeteer and is hosted on Google Cloud App Engine. The app works well locally, but it fails on Google Cloud. I suspect it has to do with a proxy that I need to configure. I have been searching, but nothing has worked yet. The site I am scraping is http://status.wsu.ac.za/status/statuscheck.php.
The following is the code I am using to scrape:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    args: [
      "--no-sandbox",
      "--headless",
      "--disable-gpu",
      "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0",
    ],
  });
  // Pause for 5 seconds before opening a page
  await new Promise((resolve) => setTimeout(resolve, 5000));
  const page = await browser.newPage();
  await page.goto("http://status.wsu.ac.za/status/statuscheck.php");
  await page.screenshot({ path: "example.png" });
  const content = await page.content(); // full HTML of the loaded page
  await browser.close();
})();
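If outbound traffic really does have to go through a proxy, Chromium accepts a `--proxy-server` command-line switch that Puppeteer can pass at launch. The sketch below reads the proxy address from a hypothetical `PROXY_SERVER` environment variable and only adds the switch when it is set; the question does not establish that App Engine actually requires a proxy here, so treat this as one thing to try. For a proxy that needs credentials, Puppeteer's `page.authenticate({ username, password })` can supply them.

```javascript
// Build Chrome launch args, routing traffic through an HTTP proxy when the
// (hypothetical) PROXY_SERVER environment variable is set,
// e.g. PROXY_SERVER=http://10.0.0.5:3128
function buildLaunchArgs(env = process.env) {
  const args = ['--no-sandbox', '--disable-gpu'];
  if (env.PROXY_SERVER) {
    // --proxy-server is a Chromium command-line switch
    args.push(`--proxy-server=${env.PROXY_SERVER}`);
  }
  return args;
}
```

Usage would then be `const browser = await puppeteer.launch({ args: buildLaunchArgs() });`, keeping the proxy address out of the code so the same script runs unchanged locally and on App Engine.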

Access to Windows upload file dialog

I'm using the Node.js Puppeteer library to automate WhatsApp Web. I've managed to handle the entire page, except when I try to upload a file via the upload dialog.
I've tried many ways to handle the Windows dialog, including VBS, batch files, SendKeys, etc.
Is there any way I could enter text into the file-name field of the dialog box and press "Open" (https://i.stack.imgur.com/cRVNJ.jpg) as well?
Here is my code up to the point of adding the file in WhatsApp (raw code):
// networkIdleTimeout (a delay in ms), user_chat_selector, pin_attach and
// add_image_icon are defined elsewhere in the script.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const page = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3264.0 Safari/537.36');
await page.goto('https://web.whatsapp.com/', { waitUntil: 'networkidle2', timeout: 0 });
await sleep(networkIdleTimeout);
// Open the chat and send a test message
await page.waitForSelector(user_chat_selector);
await page.click(user_chat_selector);
await sleep(networkIdleTimeout);
await page.keyboard.type('Testing');
await sleep(networkIdleTimeout);
await page.keyboard.press('Enter');
await sleep(networkIdleTimeout);
// Open the attachment menu, then click the "add image" icon,
// which opens the native Windows file dialog
await page.waitForSelector(pin_attach);
await page.click(pin_attach);
await sleep(networkIdleTimeout);
await page.waitForSelector(add_image_icon);
await page.click(add_image_icon);
// await sleep(networkIdleTimeout);
// await page.keyboard.type("a");
You don't need to open the real dialog; Puppeteer has a method for uploading files directly:
const elementHandle = await page.$('input[type="file"]'); // the page's file input
await elementHandle.uploadFile("/path/to/file");
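Newer Puppeteer releases also offer `page.waitForFileChooser()`, which intercepts the file chooser that a click would open, so the native Windows dialog never has to be driven at all. A minimal sketch, where the helper name `uploadViaChooser` and its parameters are hypothetical; in the question's flow `attachSelector` would be the `add_image_icon` selector.

```javascript
// Upload `filePath` through whatever button opens the page's file chooser.
// The wait must start before the click, so both run inside Promise.all.
async function uploadViaChooser(page, attachSelector, filePath) {
  const [chooser] = await Promise.all([
    page.waitForFileChooser(),
    page.click(attachSelector),
  ]);
  // Hand the file to the page instead of showing the OS dialog
  await chooser.accept([filePath]);
}
```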
