web site scraping through Jsoup - web

I have spent a few hours trying to sign in to a web site using jsoup, but it always returns the same login page. To isolate the issue I tried the Facebook site, and it gives the same result.
My code is below:
String url ="http://www.facebook.com/";
Document doc;
doc = Jsoup.connect(url)
.data("email","abc@gmail.com","pass","xyz")
.userAgent("Mozilla").post();
System.out.println(doc);
Can anybody point out where I made a mistake and how I can fix it?
In the data portion, "email" and "pass" are the input field ids on the Facebook login page.
Thank you.

Try this:
String url ="http://www.facebook.com/";
Document doc;
doc = Jsoup.connect(url)
.data("email","abc@gmail.com")
.data("pass","xyz")
.userAgent("Mozilla")
.post();
Jsoup itself is fine; you only need to know how to use it properly. Keep in mind, though, that Facebook expects many more parameters for a successful login via a POST that emulates web page navigation.
For example:
charset_test
default_persistent
lgnjs
lgnrnd
locale
lsd
pass
persistent
timezone
If you need to authenticate and get proper data, I suggest trying the Facebook SDK for Android:
https://github.com/facebook/facebook-android-sdk/

Related

How to fill login prompt with Webdriver IO?

I'm working on a CLI with OCLIF. In one of the commands, I need to simulate a couple of clicks on a web page (using the WebdriverIO framework for that). Before you're able to reach the desired page, there is a redirect to a page with a login prompt. When I use WebdriverIO methods related to alerts such as browser.getAlertText(), browser.sendAlertText() or browser.acceptAlert, I always get the error no such alert.
As an alternative, I tried to get the URL when I am on the page that shows the login prompt. With the URL, I wanted to do something like browser.url(https://<username>:<password>@<url>) to circumvent the prompt. However, browser.url() returns chrome-error://chromewebdata/ as the URL when I'm on that page. I guess that is because the focus is on the prompt, which doesn't have a URL. I also don't know the URL before I land on that page. When being redirected, a query string parameter containing a token is added to the URL that I need.
A screenshot of the prompt:
Is it possible to handle this scenario with WebdriverIO? And if so, how?
You are on the right track; there are probably a few details you need to fine-tune to get it working.
First off, regarding the chrome-error://chromewebdata errors, quoting the Chrome docs:
If you see errors with a location like chrome-error://chromewebdata/
in the error stack, these errors are not from the extension or from
your app - they are usually a sign that Chrome was not able to load
your app.
When you see these errors, first check whether Chrome was able to load
your app. Does Chrome say "This site can't be reached" or something
similar? You must start your own server to run your app. Double-check
that your server is running, and that the url and port are configured
correctly.
A lot of words that sum up to: Chrome couldn't load the URL you used inside the browser.url() command.
I tried myself on The Internet - Basic Auth page. It worked like a charm.
URL without basic auth credentials:
URL WITH basic auth credentials:
Code used:
it('Bypass HTTP basic auth', () => {
browser.url('https://admin:admin@the-internet.herokuapp.com/basic_auth');
browser.waitForReadyState('complete');
const banner = $('div.example p').getText().trim();
expect(banner).to.equal('Congratulations! You must have the proper credentials.');
});
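Since the question says the redirect URL (with its token query parameter) is not known in advance, the credentialed URL may have to be built at runtime rather than hard-coded. A minimal sketch, assuming the target URL can be obtained somehow before the prompt blocks the page; `withBasicAuth` is a hypothetical helper name, not part of WebdriverIO:

```javascript
// Hypothetical helper (not a WebdriverIO API): put basic-auth credentials
// in front of the host of an existing URL, keeping path and query intact.
function withBasicAuth(rawUrl, username, password) {
  const url = new URL(rawUrl); // WHATWG URL, available globally in Node.js
  url.username = username;     // the setters percent-encode reserved characters
  url.password = password;
  return url.toString();
}

// Example with the demo site used in the test above.
console.log(withBasicAuth('https://the-internet.herokuapp.com/basic_auth', 'admin', 'admin'));
```

The returned string can then be handed to browser.url() in the same way as the hard-coded URL in the test.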
What I'd do is manually go through each step, trying to emulate the same flow in the script you're using. From experience, I have dealt with some HTTP web apps that required a refresh after issuing the basic auth browser.url() call.
Another way to tackle this is to make use of some custom browser profiles (Firefox | Chrome) . I know I wrote a tutorial on it somewhere on SO, but I'm too lazy to find it. I reference a similar post here.
In short, manually complete the basic auth flow (logging in with credentials) in an incognito window (so as to isolate the configuration). Open chrome://version/ in another tab of that session and store the contents of the Profile Path. That folder is going to keep all your sessions and preserve cookies and other browser data.
Lastly, in your currentCapabilities, update the browser-specific options to start the sessions with a custom profile via '--user-data-dir=/path/to/your/custom/profile'. It should look something like this:
'goog:chromeOptions': {
args: [
'--user-data-dir=/Users/iamdanchiv/Desktop/scoped_dir18256_17319',
],
}
Good luck!

How To Get XSRF Token value from blogger.com to post content

I have content for blogger.com in my MongoDB database, and I want to create a Python script that posts it to blogger.com.
When I watch the developer console while publishing a post, I see that I need to pass some values:
{
"method":"editPost",
"params":"{\"1\":1,\"2\":\"wadaw\",\"3\":\"ffrdgd\",\"4\":\"3425436456546\",\"5\":0,\"6\":0,\"7\":1,\"9\":0,\"10\":2,\"11\":1,\"12\":[\"grdhth\"],\"13\":0,\"14\":{},\"15\":\"en\",\"16\":0,\"17\":{\"1\":2017,\"2\":12,\"3\":18,\"4\":21,\"5\":32},\"20\":0,\"21\":\"\",\"22\":{\"1\":1,\"2\":{\"1\":0,\"2\":0,\"3\":0,\"4\":0,\"5\":0,\"6\":0,\"7\":0,\"8\":0,\"9\":0,\"10\":\"0\"}},\"23\":1,\"27\":0,\"28\":0}",
"xsrf":"AOuZoY7tEYY0lUcn9E2mDmaJil5uHpTCnw:23543543141"
}
When I searched for what xsrf is, I read that it should be stored in a hidden field, the session, or a cookie, but I couldn't find it there.
Is there any way to get the xsrf value?
I have also looked at another approach, the Blogger API, but is it possible to get an OAuth2 token without the Google prompt?
You can extract it with a regex such as:
"xsrf":"(.+?)"
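A minimal sketch of applying that pattern, written in JavaScript (the same pattern works with Python's `re` module). It assumes the token appears in some text you have already fetched in the `"xsrf":"..."` form shown in the question; where that text comes from is not stated in the answer, and `extractXsrf` is a hypothetical helper name:

```javascript
// Hypothetical helper: return the xsrf value from a blob of text, or null.
function extractXsrf(text) {
  // Non-greedy group stops at the first closing quote after "xsrf":"
  const match = /"xsrf":"(.+?)"/.exec(text);
  return match ? match[1] : null;
}

// Sample input modelled on the payload in the question.
const sampleText = '{"method":"editPost","xsrf":"AOuZoY7tEYY0lUcn9E2mDmaJil5uHpTCnw:23543543141"}';
console.log(extractXsrf(sampleText));
```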

How can I get a token for the Drive API?

I want to integrate the Google Drive API into my web application using NodeJS, and I'm struggling to get a token via OAuth.
I copied the code from this guide and ran the script with Node, and it returns an error on this line:
var redirectUrl = credentials.installed.redirect_uris[0];
Googling around, I found that I can set that variable to http://localhost:8080 and set the same value in the Authorized redirect URIs configuration in the Google Developers Console. With that, the error goes away. Now it asks for a code that I should get by visiting a URL:
https://accounts.google.com/o/oauth2/auth?access_type=offline&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.metadata.readonly&response_type=code&client_id=CLIENT_ID&redirect_uri=http%3A%2F%2Flocalhost%3A8080
I added the client id and opened that URL in Chrome, which then returns a connection refused error. I have no clue what to do here; I searched for my problem and couldn't find an answer. Looking at the address bar in Chrome, I see a parameter called code followed by random numbers and letters, like this:
http://localhost:8080/?code=#/r6ntY87F8DAfhsdfadf78F7D765lJu_Vk-5qhc#
If I enter any of these values, it returns this error:
Error while trying to retrieve access token { [Error: invalid_request] code: 400 }
Any ideas on what I should do? Thanks.
Did you follow all the directions on the page you indicated, including all of those in Step 1 where you create the credentials in the console and download the JSON for it? There are a few things to note about creating those credentials and the JSON that you get from it:
The steps they give are a little different from what I went through. They're essentially correct, but the "Go to credentials" didn't put me on the page that has the "OAuth Consent Screen" and "Credentials" tabs on the top. I had to click on the "Credentials" left navigation for the project first.
Similarly, on the "Credentials" page, my button was labeled "Create Credentials", not "Add Credentials". But it was a blue button on the top of the page either way.
It is very important that you select "OAuth Client ID" and then Application Type of "Other". This will let you create an OAuth token that runs through an application and not through a server.
Take a look at the client_secret.json file it tells you to download. In there, you should see an entry that looks something like "redirect_uris":["urn:ietf:wg:oauth:2.0:oob","http://localhost"] which is the JSON entry that the line you reported having problems with was looking for.
That "urn:ietf:wg:oauth:2.0:oob" is a magic string that says that you're not going to redirect anywhere as part of the auth stage in your browser, but instead you're going to get back a code on the page that you will enter into the application.
I suspect that the "connection refused" error you're talking about is that you used "http://localhost:8080/" for that value, so it was trying to redirect your browser to an application running on localhost... and I suspect you didn't have anything running there.
The application will prompt you to enter the code, will convert the code into the tokens it needs, and then save the tokens for future use. See the getNewToken() function in the sample code for where and how it does all this.
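If you keep http://localhost:8080 as the redirect URI instead of the out-of-band value, the code still shows up in the address bar even though nothing answers on that port, as the question describes. A rough sketch of pulling it out of the copied URL; `extractAuthCode` is a hypothetical helper and the sample code value is made up:

```javascript
// Hypothetical helper: read the "code" parameter from the URL the browser
// was redirected to (for example, copied from the address bar).
function extractAuthCode(redirectedUrl) {
  const url = new URL(redirectedUrl);
  // searchParams.get() URL-decodes the value; a trailing "#..." fragment is
  // parsed separately and is not part of the code.
  return url.searchParams.get('code');
}

// Made-up example value, shaped like the redirect in the question.
console.log(extractAuthCode('http://localhost:8080/?code=4%2Fexample-code#'));
```

The decoded value is what would be typed at the application's prompt.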
You need to exchange this code for a token. I'm not sure how to go about this in Node.js, but in PHP I would POST the details to the token exchange URL. In JavaScript, your POST parameters would look similar to this:
var query = {'code': 'the code sent',
'client_id': 'your client id',
'client_secret': 'your client secret',
'redirect_uri': 'your redirect',
'grant_type': 'authorization_code' };
Hope this helps
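For illustration, here is a sketch that turns those parameters into a form-encoded request body. It only builds the body and does not send it; the token endpoint URL is listed in Google's OAuth documentation and is left out here. `buildTokenRequestBody` is a hypothetical helper, and every value passed in is a placeholder:

```javascript
// Hypothetical helper: build the form-encoded body for exchanging an
// authorization code for tokens. It does not perform the HTTP request.
function buildTokenRequestBody({ code, clientId, clientSecret, redirectUri }) {
  return new URLSearchParams({
    code,
    client_id: clientId,
    client_secret: clientSecret,
    redirect_uri: redirectUri,        // must match the URI used to obtain the code
    grant_type: 'authorization_code', // fixed value from RFC 6749, section 4.1.3
  }).toString();
}

// Placeholder values only.
const exampleBody = buildTokenRequestBody({
  code: 'the code sent',
  clientId: 'your client id',
  clientSecret: 'your client secret',
  redirectUri: 'http://localhost:8080',
});
console.log(exampleBody);
```

The body would be sent with a Content-Type of application/x-www-form-urlencoded.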
Change the redirect URI from http://localhost:8080 to https://localhost:8080.
For this, add SSL certificates to your server.

eBay API Grant Application Access return URL contains empty 'ebaytkn' parameter

I need some help. When I click the "I agree" button on the "Grant Application Access" page, it returns to the predefined return URL, https://localhost/app/return, just fine.
My problem is the query string that eBay sends to that return URL:
?ebaytkn=&tknexp=1970-01-01+00%3A00%3A00&username=testuser_USERNAME
The ebaytkn parameter is completely empty, and I cannot work out what is causing this.
Can someone help?
Thanks,
Grady
If your eBay token return method is FetchToken (the recommended method), then this shouldn't matter. When you get to your accept page all you have to do is make a FetchToken call with the same Session Id that you used to generate the url. This is all documented in the eBay dev docs here.
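As a rough sketch of what the accept page could do with the query string from the question: parse it, and treat an empty ebaytkn as the signal to make the FetchToken call. The FetchToken call itself is not shown; `parseEbayReturn` and the `needsFetchToken` flag are hypothetical names:

```javascript
// Hypothetical helper: parse the query string eBay appends to the return URL.
function parseEbayReturn(queryString) {
  const params = new URLSearchParams(queryString); // a leading "?" is ignored
  const token = params.get('ebaytkn') || '';
  return {
    token,
    expires: params.get('tknexp'),   // "+" and %3A are decoded
    username: params.get('username'),
    // Empty token: the return URL did not carry it, so fetch it separately.
    needsFetchToken: token === '',
  };
}

const returned = parseEbayReturn('?ebaytkn=&tknexp=1970-01-01+00%3A00%3A00&username=testuser_USERNAME');
console.log(returned.needsFetchToken, returned.username);
```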

How to provide information in the html link for Facebook open graph api call of "property name" when trying to post an action

I am trying to create an HTML object dynamically, with the necessary header information depending on the query string in the link I provide to Facebook. I am hoping that Facebook Open Graph will call this link exactly as I provided it. However, it seems the query string info is not being passed to my server. Does anyone know how to make this work, or what the more appropriate way to do this is? BTW, I am writing my code in Node.js.
To get more info about Facebook open graph api, look here, https://developers.facebook.com/docs/beta/opengraph/actions/.
For example, the link I am trying to pass to Facebook is, "http://xxx-url.com/getFacebookObject?objectId=&description=first dynamic post", so I sent a request with the link as, "https://graph.facebook.com/me/app-name:action-name?object=http://xxx-url.com/getFacebookObject?objectId=&description=first dynamic post". However, when I check the log on the server, I don't see anything in the query string.
Instead of using the query string, you can embed the data in the URL:
http://some-domain.com/getFacebookObject/id/description
Then, depending on what node.js packages you're using, extract the data from the request:
// express.js style
app.get("/getFacebookObject/:id/:description", function(req, res) {
var id = req.params.id,
desc = req.params.description;
// your code...
});
(See http://expressjs.com/guide.html.)
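The calling side of that route can be sketched as follows. The id value 42 is made up, and both helper names are hypothetical. Percent-encoding each path segment keeps spaces out of the URL, and encoding the whole object URL before placing it in the object parameter should keep its own characters from being read as part of the outer request:

```javascript
// Hypothetical helper: build the path-style object URL, one encoded segment per value.
function buildObjectUrl(base, id, description) {
  return [base, encodeURIComponent(id), encodeURIComponent(description)].join('/');
}

// Hypothetical helper: pass the object URL as a single encoded query value.
function buildActionUrl(graphEndpoint, objectUrl) {
  return graphEndpoint + '?object=' + encodeURIComponent(objectUrl);
}

const objectUrl = buildObjectUrl('http://some-domain.com/getFacebookObject', '42', 'first dynamic post');
console.log(objectUrl);
console.log(buildActionUrl('https://graph.facebook.com/me/app-name:action-name', objectUrl));
```

On the server, Express decodes route parameters, so req.params.description arrives with its spaces restored.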
Sorry, Facebook will strip off all query string information from the URL when they launch your site in the iframe. If it was a page tab app, then you could add it to the app_data query string parameters which in turn gets passed to your iframe's page tab app via the app_data part of the signed_request parameter.
