How to get challenge key from a target website using geetest captcha - node.js

So I am scraping data from a target website using puppeteer.
The target website uses Geetest captcha; to solve it, I am using the 2captcha service.
Their documentation mentions that we need to fetch the challenge key every time.
That is where the problem begins: the target website has embedded the challenge key under
<iframe>
  <html>
    <head>
      <script>
and accessing the iframe through the DOM throws a CORS error.
I have also tried the approach from the ScraperBox blog (link below):
https://scraperbox.com/blog/solving-a-geetest-slider-captcha-with-puppeteer
but it throws: no selector '[aria-label="Click to verify"]' found.
I also tried the Codegrepper approach for capturing network requests (link below):
https://www.codegrepper.com/code-examples/whatever/puppeteer+get+network+requests
but it just throws an error in console.error().
Any help bypassing the Geetest captcha would be appreciated;
let me know if my question is unclear.

Thank you so much for the answer,
Building on the response above, the final solution is as follows.
When you load your page through Puppeteer:
await page.waitForSelector('iframe');
This waits until the iframe has loaded. In my case the target website uses an iframe with a hash link to access it.
const elementHandle = await page.$('iframe');
const frame = await elementHandle.contentFrame();
Now frame gives you access to the iframe's page, so the rest works the same way, e.g.:
await frame.waitForSelector("your selector")
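Putting those steps together, a minimal sketch of the flow (the target URL and the selector inside the frame are placeholders, not the real values):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/page-with-geetest'); // placeholder URL

  // Wait until the captcha iframe is attached to the DOM.
  await page.waitForSelector('iframe');

  // Grab the iframe element and get a Frame handle for its content.
  const elementHandle = await page.$('iframe');
  const frame = await elementHandle.contentFrame();

  // From here on, query inside the iframe like a normal page.
  await frame.waitForSelector('your selector'); // replace with the element that holds the challenge key
  const html = await frame.content(); // the frame's markup, including the embedded <script>
  console.log(html);

  await browser.close();
})();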

Related

Login to OAuth dialog from Facebook with Puppeteer

I'm trying to interface a website, and one of the ways you can log on there is by using a Facebook login (it opens a pop-up, you can enter username/password or simply confirm if you're already logged on, you know the drill...).
Well, I'm trying to interface it with Puppeteer, and the strange thing is that I don't get the page. I mean, I'm working in non-headless mode so I can SEE the popup, it just looks like Puppeteer can't see it...
A lot of pages said to try something like this:
const newPagePromise = new Promise(x => browser.once('targetcreated', target => x(target.page())));
await page.click('<some selector>');
const popup = await newPagePromise;
But this doesn't give me a "popup" I can use (popup: null). I also tried to make it wait for 10s, but no luck then either...
When looking at all_pages: let all_pages = await browser.pages();, this array has 1 page. My original page... No Facebook popup. (But it is displayed on my screen!)
What am I missing here? How can I get this information in my automation process?
BTW: the Facebook popup also shows 'Chrome is being controlled by automated test software.', so I would assume I can reach this information somehow.
Thanks for any assistance!
browser is used to connect to Chromium and is only needed at the beginning. Just as you interact with the main page through const page = await browser.newPage(),
you have to use that same page variable once the popup emerges. Also, 'targetcreated' is not the event you are looking for; 'popup' is. Keep in mind that page provides the methods for interacting with tabs, so the updated code is:
const newPagePromise = new Promise(event => page.once('popup', event));
This is covered in the Puppeteer documentation:
https://github.com/puppeteer/puppeteer/blob/v5.2.1/docs/api.md#event-popup
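In context, the flow would look something like the sketch below; the click selector and the field selectors in the popup are placeholders, not values taken from the question:

// Register the popup listener before triggering the click.
const newPagePromise = new Promise(resolve => page.once('popup', resolve));
await page.click('<some selector>'); // placeholder: the button that opens the Facebook dialog
const popup = await newPagePromise;

// The popup is a regular Page, so the usual methods work on it.
await popup.waitForSelector('#email');          // placeholder field selector
await popup.type('#email', 'user@example.com'); // placeholder credentials
await popup.type('#pass', 'secret');
await popup.click('[name="login"]');            // placeholder button selector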

DocuSign return parameters in embedded signing by breaking out of the iframe

How do I pass the parameters (envelope, PF, and r IDs) in the return URL when embedded DocuSign signing runs inside an iframe? If I open the PowerForm link directly in the browser, the return URL contains the parameters (envelope, PF, and r IDs), but if I run the code inside an iframe I'm unable to get the parameters. Please assist me with this issue.
You are opening the PowerForm inside an iframe, so the scope of the opened URL is inside the iframe only, and DocuSign cannot do anything to redirect the browser out of the iframe. You have to write code on your end to capture the redirect URL and break the flow out of the iframe; you can find a similar query here. Normally DocuSign does not recommend using an iframe for signing. Also, to capture data like the envelope ID, r ID, etc., it is better to configure DocuSign Connect with a listener on your side. Using a URL redirect is a fragile solution: the user might close the browser (or the browser hangs, or the network fails) and you might lose the data. With DocuSign Connect, DocuSign will publish the event to your listener and you will be able to capture all the required data there.
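A minimal sketch of such a Connect listener, assuming a Node.js/Express endpoint and Connect configured to POST JSON events to it (the route and field names here are assumptions, not DocuSign defaults):

const express = require('express');
const app = express();

app.use(express.json({ type: '*/*' })); // Connect can post JSON or XML; JSON is assumed here

// Hypothetical endpoint registered as the Connect listener URL.
app.post('/docusign/connect', (req, res) => {
  const envelopeId = req.body && req.body.envelopeId; // field name depends on the Connect configuration
  console.log('Envelope event received:', envelopeId);
  res.sendStatus(200); // acknowledge so DocuSign does not retry the delivery
});

app.listen(3000);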
<script>
// Display the form's action URL (which carries the return parameters) in the 'demo' element.
function myFunction() {
    var x = document.getElementById("form1").action;
    document.getElementById("demo").innerHTML = x;
}
</script>
This thing works for displaying the parameters.
window.parent.window.location.href = 'Parent URL' works to break out of an iframe and load the parent page.
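Combining the two, a rough sketch of a return page served inside the iframe that reads the DocuSign parameters and then breaks out to the parent (the parameter names and the parent URL are assumptions for illustration):

// Runs on the return/redirect page that DocuSign loads inside the iframe.
var params = new URLSearchParams(window.location.search);
var envelopeId = params.get('envelopeId'); // assumed parameter name
var event = params.get('event');           // assumed parameter name

// Hand the values to the parent page via the query string and break out of the iframe.
window.parent.window.location.href =
    'https://example.com/signing-complete' + // placeholder parent URL
    '?envelopeId=' + encodeURIComponent(envelopeId) +
    '&event=' + encodeURIComponent(event);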

Wait for page load for a non-Angular application using Protractor

I am new to Protractor and am testing a non-Angular login page. On clicking the login button, a new page appears on which I need to click a planning link, but after clicking the login button the application takes around 50 seconds. I want Protractor to wait until the planning link appears. I used browser.wait() and browser.driver.implicitlyWait() but had no success; I am only able to click the planning link using browser.sleep().
Please help me resolve the issue.
You need to wait for any WebElement on the page that loads after you perform the login operation.
var EC = protractor.ExpectedConditions;
browser.wait(EC.visibilityOf(element(by.id("someId"))), 60000);
It will wait for the element and throw an exception after waiting for 1 minute.
So what I understood from your question is that you have a non-Angular login page, and clicking the login button takes you to another page (is this one Angular or non-Angular?) which takes around 50 seconds to load and contains a link (planning). Right? And clicking on that link takes you to your Angular home page.
The issue you are facing now is that Protractor is not waiting 50 seconds for the page containing the planning link to load.
Please try this and let me know the result.
this.clickLoginBtn = function () {
    browser.driver.findElement(loginBtn).click();
    return browser.wait(function () {
        return browser.driver.isElementPresent(planningLink);
    }, 50000);
};
I used browser.driver.findElement since we are on a non-Angular page.
I wrote a blog post about this and have working examples on GitHub, specifically for testing non-Angular apps. It makes use of Expected Conditions and Page Objects.
If you're not using Page Objects yet, you'd do something like:
var EC = protractor.ExpectedConditions;
// Waits for the element with id 'loginBtn' to be clickable.
browser.wait(EC.elementToBeClickable($('#loginBtn')), 50000);
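If you do move to Page Objects, the same wait can live in the login page object; a minimal sketch, where the locators for the login button and the planning link are placeholders:

var EC = protractor.ExpectedConditions;

var LoginPage = function () {
    this.loginBtn = $('#loginBtn');          // placeholder locator
    this.planningLink = $('#planningLink');  // placeholder locator

    this.login = function () {
        this.loginBtn.click();
        // Wait up to 50s for the post-login page to expose the planning link.
        return browser.wait(EC.elementToBeClickable(this.planningLink), 50000);
    };
};

module.exports = new LoginPage();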

Scraping the YouTube Mix playlist ID for a video

So here is what I am trying to do:
The app in question that this problem relates to is here: https://github.com/viperfx/ng-juketube, to give everyone some context. I am loading a YouTube video in the background using the YouTube IFrame API, giving it a YouTube video ID. That works well. Next, I want to find the playlist ID of the YouTube Mixes that sometimes appear in the sidebar.
Current solution:
I am using Node.js on the backend to scrape the YouTube video page and then look in the sidebar for the string 'YouTube Mix'. When I run my server locally this works well. However, when I run the app from Heroku I do not get the same results (a mix does not show up); I assume the server location and IP address affect whether a YouTube Mix is shown.
So my question is: how can I obtain the YouTube Mix playlist ID using the client (browser) rather than the server?
I have tried things like loading the YouTube page in an iframe, but that does not work; iframes are only allowed for /embed*.
So here is how I solved this issue.
Using a service called http://www.corsproxy.com/ I was able to scrape YouTube and get the playlist ID using client-side code. Here is a snippet from my code showing the solution:
$.get('http://www.corsproxy.com/www.youtube.com/watch?v=' + newVal.videoId, function (response) {
    // Parse the returned HTML and pull the playlist ID out of the related-playlist link.
    var doc = new DOMParser().parseFromString(response, 'text/html');
    $scope.mixId = doc.querySelector('.related-playlist').getAttribute('href').split('list=')[1];
    $scope.$apply();
});

Website scraping with Jsoup

I have spent a few hours trying to sign in to a website using Jsoup, but it always returns the same login page. To clarify the issue, I tried the Facebook site; it gives the same result.
My code is below:
String url = "http://www.facebook.com/";
Document doc;
doc = Jsoup.connect(url)
        .data("email", "abc#gmail.com", "pass", "xyz")
        .userAgent("Mozilla")
        .post();
System.out.println(doc);
Can anybody point out where I made a mistake and how I can fix this issue?
In the data portion, "email" and "pass" are the input field IDs of the Facebook login page.
Thank you.
Try this:
String url = "http://www.facebook.com/";
Document doc;
doc = Jsoup.connect(url)
        .data("email", "abc#gmail.com")
        .data("pass", "xyz")
        .userAgent("Mozilla")
        .post();
Anyway, Jsoup is not bad at all; you only need to know how to use it properly. But keep in mind that Facebook expects a lot more parameters for a successful login via POST when emulating web page navigation.
For example:
charset_test
default_persistent
lgnjs
lgnrnd
locale
lsd
pass
persistent
timezone
If you need to authenticate and get proper data, I suggest you try the Facebook SDK for Android:
https://github.com/facebook/facebook-android-sdk/
