How to bypass the NLP captcha in Puppeteer
I am trying to get past the login page by entering the login details, but I don't know how to bypass the captcha. Can anyone please help me solve the captcha in the image and write the result into the text field?
const puppeteer = require('puppeteer');
const Tesseract = require('tesseract.js');
async function main() {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://xx/xxxx');
await page.type("#UserName", "xxxxxx");
await page.type("#AuthKey", "xxxxxx");
// Tesseract.recognize() manages its own worker, so no separate createWorker() call is needed
const { data: { text } } = await Tesseract.recognize('Image url', 'eng', {
logger: m => console.log(m),
});
console.log(text);
await page.click(".recaptcha-trigger-button.button.green.action-button.expand-right");
await page.goto('https://xx/xxxx');
}
main();
The whole purpose of a captcha is that it can't be (easily) bypassed.
The easiest solution is not to build the captcha into an environment where automated scripts run, typically a QA environment where automated tests run, although I'm not sure if this is your case.
There are some other ways to get around a captcha, but I haven't looked into them further.
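To illustrate the QA-environment idea, here is a minimal server-side sketch, assuming you control the application and its backend is Node with Express-style middleware. The environment variable names APP_ENV and CAPTCHA_DISABLED and the verifyCaptcha callback are hypothetical; the point is only that the captcha check is switched off by configuration in test environments and can never be switched off in production.

```javascript
// Hypothetical helper: decide whether captcha verification is enforced.
// APP_ENV and CAPTCHA_DISABLED are illustrative variable names, not a standard.
function isCaptchaRequired(env = process.env) {
  // Never skip the captcha in production, even if the flag is set by mistake
  if (env.APP_ENV === 'production') return true;
  return env.CAPTCHA_DISABLED !== 'true';
}

// Hypothetical Express-style middleware; verifyCaptcha(req) is your own
// function that resolves to true when the submitted captcha is valid.
function captchaGuard(verifyCaptcha, env = process.env) {
  return async (req, res, next) => {
    if (!isCaptchaRequired(env)) return next(); // QA: let automated tests through
    const ok = await verifyCaptcha(req);
    if (ok) return next();
    res.status(403).send('Captcha verification failed');
  };
}

module.exports = { isCaptchaRequired, captchaGuard };
```

With this in place, the QA deployment sets CAPTCHA_DISABLED=true and the Puppeteer script only has to fill in the username and password.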
Related
I'm making an automation script that fills in an online form with Puppeteer, and to avoid getting my IP blacklisted I decided to use a proxy for each request. This is the part that gives me an error:
console.log(`profile ${ii} started`)
let proxy = await proxy_rotation(proxy_data, ii);
console.log("using proxy: ", proxy);
let exec_path = path.resolve(path.dirname(process.execPath) + "/node_modules/puppeteer/.local-chromium/win64-869685/chrome-win/chrome.exe")
const browser = await puppeteer.launch({
executablePath: exec_path,
args: ['--disable-web-security']
});
const page = await browser.newPage();
console.log("1");
await page.setRequestInterception(true);
await useProxy(page, proxy);
console.log("2");
await page.goto(data[ii][0]); //this is where the error gets thrown
The part below never runs when I use a proxy; without the proxy, it runs smoothly:
console.log("3");
await page.type("#name", data[ii][1]);
await page.type("#yourEmail", data[ii][2]);
await page.type("#phone", data[ii][3]);
await page.type("#street", data[ii][4]);
await page.type("#city", data[ii][5]);
await page.type("#psc", data[ii][6]);
await page.select('select#state', data[ii][7]);
await page.select('select#prefered_size_sel', data[ii][8]);
await page.$eval('input[name="agreed_personal_info_tiny_contact_form"]', check => check.checked = true);
await page.evaluate(() => {
document.querySelector('input[name="agreed_personal_info_tiny_contact_form"]').click();
});
I logged a few numbers to the console to find where the script gets stuck. I also tested the proxy and the website I'm trying to access, both with a proxy tester and manually, and had no problem accessing it.
But when I run my script, I get this:
I understand it basically says it cannot access the URL, but there should be no reason for that. Do I need to change the way I'm accessing the URL when using a proxy? Or add some extra args to the browser? Can I get a more specific error message somehow? Thanks for any suggestions.
Also, this is the Puppeteer function that throws the error:
async function navigate(client, url, referrer, frameId) {
try {
const response = await client.send('Page.navigate', {
url,
referrer,
frameId,
});
ensureNewDocumentNavigation = !!response.loaderId;
return response.errorText
? new Error(`${response.errorText} at ${url}`)
: null;
}
catch (error) {
return error;
}
}
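As for getting a more specific error message: one option is to listen to Puppeteer's 'requestfailed' and 'response' page events, which report the network error text (for example a proxy tunnel failure) or the HTTP status of each request. The sketch below assumes `page` is a Puppeteer Page; the helper name attachNetworkDebugLogging is hypothetical.

```javascript
// Sketch: log failed requests and HTTP error responses so a bare navigation
// error comes with more context. `page` is a Puppeteer Page (an EventEmitter);
// 'requestfailed' and 'response' are standard Puppeteer page events.
function attachNetworkDebugLogging(page, log = console.log) {
  page.on('requestfailed', request => {
    // request.failure() returns { errorText } or null
    const failure = request.failure();
    log(`FAILED ${request.url()}: ${failure ? failure.errorText : 'unknown error'}`);
  });
  page.on('response', response => {
    // 407 here would point at proxy authentication problems
    if (response.status() >= 400) {
      log(`HTTP ${response.status()} ${response.url()}`);
    }
  });
}

module.exports = { attachNetworkDebugLogging };
```

Call it right after browser.newPage() and before page.goto(); wrapping page.goto() in try/catch and logging error.message then shows the navigation error next to the per-request details.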
That error indicates that something is off with how you are using your proxy. Is useProxy your own function or the one from puppeteer-page-proxy? You mention setting the proxy per request, but it seems you are setting it for the whole page; is that intentional? The way your proxy is formatted also seems off; check how I do it below.
You can try launching the browser with your proxy server and using page.authenticate() to handle authentication, like this:
let proxy = await proxy_rotation(proxy_data, ii);
const [host, port, username, password] = proxy.split(':');
const parsedProxy = new URL(`http://${username}:${password}@${host}:${port}`);
const browser = await puppeteer.launch({
executablePath: exec_path,
args: ['--disable-web-security', '--ignore-certificate-errors', `--proxy-server=${parsedProxy.host}`]
});
const page = await browser.newPage();
await page.authenticate({
username: parsedProxy.username,
password: parsedProxy.password,
});
Before trying that, I would change what you pass to useProxy so that it looks like http://username:pw@host:port (lines 2-3).
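Here is a small, self-contained sketch of that reformatting step, assuming proxy_rotation() returns strings in host:port:username:password form with an IPv4 address or hostname (an IPv6 host would contain extra colons and break the split). The helper name parseProxy is hypothetical. Unlike building the URL by hand, it percent-encodes the credentials, so a password containing '@' or ':' cannot corrupt the URL.

```javascript
// Hypothetical helper: split "host:port:username:password" into the pieces
// used for --proxy-server, page.authenticate() and URL-style proxy strings.
function parseProxy(proxyString) {
  const parts = proxyString.trim().split(':');
  if (parts.length !== 4) {
    throw new Error(`Expected host:port:username:password, got "${proxyString}"`);
  }
  const [host, port, username, password] = parts;
  if (!/^\d+$/.test(port)) {
    throw new Error(`Invalid port "${port}"`);
  }
  return {
    // For the --proxy-server launch flag; credentials are supplied separately
    server: `${host}:${port}`,
    // Plain (unencoded) credentials for page.authenticate()
    username,
    password,
    // URL form with '@' before the host, credentials percent-encoded
    url: `http://${encodeURIComponent(username)}:${encodeURIComponent(password)}@${host}:${port}`,
  };
}

module.exports = { parseProxy };
```

The launch args would then use `--proxy-server=${p.server}`, page.authenticate() would receive p.username and p.password, and p.url is the string to hand to useProxy.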
I am trying to use puppeteer to sign into TikTok. However, each time I try to sign into TikTok with puppeteer it says "You are visiting our site too frequently" as pictured below.
[Screenshot: TikTok after running the code]
Here are the things I've tried:
using puppeteer stealth
using firefox puppeteer
using both puppeteer stealth and firefox puppeteer
using a VPN
logging into the account on a different device, logging out on that device, and then running the code
waiting 4 hours between running the code
Puppeteer doesn't throw any errors either
Let me know what you guys think!
Here is the code too:
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
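puppeteer.use(StealthPlugin());
// EMAIL and PASSWORD were not defined in the snippet; reading them from
// environment variables (hypothetical names) keeps credentials out of the code
const EMAIL = process.env.TIKTOK_EMAIL;
const PASSWORD = process.env.TIKTOK_PASSWORD;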
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto("https://www.tiktok.com/login/phone-or-email/email");
await page.type("input[name=email]", EMAIL, { delay: 20 });
await page.type("input[name=password]", PASSWORD, { delay: 20 }); // log in w email and password
await page.evaluate(() => {
document.querySelector("button[type=submit]").click();
}); // press login button
await page.screenshot({ path: "example.png" });
await browser.close();
})();
Log in manually once, and then inject the saved cookies for each subsequent login :)
(Of course, you'll need to save all the cookies to do so, but only once!)
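A minimal sketch of that save-once, inject-later flow, assuming `page` is a Puppeteer Page. page.cookies() and page.setCookie() are Puppeteer methods; the helper names and the JSON file path are hypothetical. Note that page.cookies() returns cookies for the current URL, so call saveCookies() while the logged-in TikTok page is open.

```javascript
const fs = require('fs');

// Run once, right after logging in manually in a headful browser.
async function saveCookies(page, filePath) {
  const cookies = await page.cookies(); // cookies for the current URL
  fs.writeFileSync(filePath, JSON.stringify(cookies, null, 2));
  return cookies.length;
}

// Run in later sessions, before navigating to the site.
async function loadCookies(page, filePath) {
  if (!fs.existsSync(filePath)) return 0; // nothing saved yet
  const cookies = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  if (cookies.length > 0) {
    await page.setCookie(...cookies); // inject the saved session
  }
  return cookies.length;
}

module.exports = { saveCookies, loadCookies };
```

In the script above, `await loadCookies(page, 'cookies.json')` would go right after browser.newPage(), followed by a page.goto() to a page that requires login instead of the login form.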
I'm attempting to use Playwright (https://github.com/microsoft/playwright) and I'm met by the location popup when I try to test the library. Is there a way to bypass this popup or at least click either "Block" or "Allow"? I've tried using the Page.on("popup") event but it isn't quite working the way I was expecting it to.
You have to use the browser context's grantPermissions() method to grant the geolocation permission for the site:
await context.grantPermissions(['geolocation'], { origin: 'https://www.bestbuy.com' });
This is how I grant geolocation in my script:
const { chromium } = require("playwright");
(async () => {
// const browser = await chromium.launch({ headless: false });
const browser = await chromium.launch();
const context = await browser.newContext();
// origin must be a full origin, including the scheme
await context.grantPermissions(['geolocation'], { origin: 'https://yourPage.com' });
const page = await context.newPage();
await page.goto('https://yourPage.com');
await browser.close();
})();
Here is the documentation: playwright.dev
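If you also want to control which position the page sees, Playwright's browser.newContext() accepts `geolocation` and `permissions` options, so the permission can be granted when the context is created. The sketch below only builds that options object (the helper name is hypothetical, and the coordinates are whatever you choose); `permissions` given this way applies to all pages in the context rather than to one origin.

```javascript
// Hypothetical helper: options for browser.newContext() that grant geolocation
// and fix the reported position. Validates the coordinate ranges.
function geolocationContextOptions(latitude, longitude) {
  if (!Number.isFinite(latitude) || latitude < -90 || latitude > 90) {
    throw new RangeError('latitude must be a number in [-90, 90]');
  }
  if (!Number.isFinite(longitude) || longitude < -180 || longitude > 180) {
    throw new RangeError('longitude must be a number in [-180, 180]');
  }
  return {
    geolocation: { latitude, longitude },
    permissions: ['geolocation'],
  };
}

module.exports = { geolocationContextOptions };
```

Usage would look like `const context = await browser.newContext(geolocationContextOptions(52.52, 13.405));`, after which the location popup should not appear for pages in that context.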
Here is a simple Puppeteer program:
const puppeteer = require('puppeteer');
async function run() {
const browser = await puppeteer.launch({
headless: false,
args:[ `--proxy-server=104.233.50.38:3199`]
});
const page = await browser.newPage();
await page.authenticate({
username: 'myusername',
password: 'mypassword'
})
await page.goto('https://google.com')
};
run();
Note: I have tried the same thing with over 10 proxies, and none of them work in Puppeteer.
The credentials are exactly what is provided to me, I have checked multiple times.
This is what I get:
And this is the page's console:
Why is this happening?
I checked the addresses and username, password multiple times. There is no other error message except this.
It seems that page.authenticate() doesn't work for me either; instead, you can use page.setExtraHTTPHeaders():
const puppeteer = require('puppeteer');
async function run() {
const browser = await puppeteer.launch({
ignoreHTTPSErrors: true,
args: ['--proxy-server=104.233.50.38:3199']
});
const page = await browser.newPage();
// Send the proxy credentials as a Basic Proxy-Authorization header with every request
await page.setExtraHTTPHeaders({
'Proxy-Authorization': 'Basic ' + Buffer.from('username:password').toString('base64'),
});
await page.goto('https://google.com');
};
run();
You can use puppeteer-page-proxy, it offers username and password auth very easily. It also supports http, https, socks4 and socks5 proxies. https://github.com/Cuadrix/puppeteer-page-proxy
You can define the proxy this way:
const proxy = 'http://login:pass@IP:Port';
or
const proxy = 'socks5://IP:Port';
Then you can use it per request:
const useProxy = require('puppeteer-page-proxy');
await page.setRequestInterception(true);
page.on('request', async req => {
// useProxy returns a promise, so await it inside the handler
await useProxy(req, proxy);
});
I'm trying to use Puppeteer for scraping, and I need to use my current Chrome profile to keep all my credentials. However, Chrome doesn't remember the previous session, and I have to click the login button every time, even though it does remember the saved credentials. Is there a way to make it keep the session?
I'm using:
Node v12.16.1
chrome 80.0.3987.132 (Official Build) (64-bit) (cohort: Stable)
puppeteer-core 2.1.0 // see: https://github.com/puppeteer/puppeteer/blob/v2.1.0/docs/api.md
test.js:
const pptr = require('puppeteer-core');
(async () => {
const browser = await pptr.launch({
executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',//path to your chrome
headless: false,
args:[
'--user-data-dir=D:/Users/xxx/AppData/Local/Google/Chrome/User Data2',
]
});
const page = await browser.newPage();
await page.goto('https://hostloc.com');
await page.screenshot({path: 'example.png'});
await page.waitFor(10000);
await browser.close();
})();
You should use cookies so that you can restore the previous session from them. Here is a link about setting cookies in Puppeteer.
Below is an example of how to set cookies in Puppeteer. It sets the "login_email" property in a PayPal cookie so that the login screen is pre-filled with an email address.
const cookie = {
name: 'login_email',
value: 'set_by_cookie@domain.com',
domain: '.paypal.com',
url: 'https://www.paypal.com/',
path: '/',
httpOnly: true,
secure: true
}
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.setCookie(cookie)
await page.goto('https://www.paypal.com/signin')
await page.screenshot({ path: 'paypal_login.png' })
await browser.close()
})()
To get the cookies, you can create a Chrome DevTools Protocol session on the page target using target.createCDPSession(), then send Network.getAllCookies to obtain a list of all browser cookies.
The page.cookies() function only returns cookies for the current URL, so we can filter the current page's cookies out of all the browser cookies to obtain a list of third-party cookies only.
const client = await page.target().createCDPSession();
const all_browser_cookies = (await client.send('Network.getAllCookies')).cookies;
const current_url_cookies = await page.cookies();
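// Compare against every current-URL cookie domain (and avoid crashing when there are none)
const current_domains = new Set(current_url_cookies.map(cookie => cookie.domain));
const third_party_cookies = all_browser_cookies.filter(cookie => !current_domains.has(cookie.domain));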
console.log(all_browser_cookies); // All Browser Cookies
console.log(current_url_cookies); // Current URL Cookies
console.log(third_party_cookies); // Third-Party Cookies
For example, to get all of the cookies:
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://stackoverflow.com', { waitUntil: 'networkidle2' });
// Here we can get all of the cookies; page._client is private, so open a CDP session instead
const client = await page.target().createCDPSession();
console.log(await client.send('Network.getAllCookies'));
await browser.close();
})();
I hope this will help you.