Scraping websphere website using node js with encrypted value - node.js

I am scraping a website that is built on WebSphere.
I see that whenever a user logs in, the browser hits 4 URLs before reaching the home page.
The 3rd URL contains an encrypted-looking value like this:
L0lDU0NTSUpKZ2tLQ2xFS0NXXXXXXXXXXXXXXXXXXX..XXXXXXXXXvZD1vbkxvYWQ!
The URL looks like this :
http://example.com/escares/wps/myportal/!ut/p/c1/XXXXXXXXXX/dl2/d1/L0lDU0NTSUpKZ2tLQ2xFS0NXXXXXXXXXXXXXXXXXXX..XXXXXXXXXvZD1vbkxvYWQ!
The problem is that this encrypted-looking value is the only part that changes on every login.
Is there an algorithm in WebSphere that generates this kind of URL? Or is there any way I can replicate this value?
Has anyone done crawling/scraping on a WebSphere site?

wps/myportal suggests a WebSphere web portal login. The 'encrypted' part of the URI you're seeing is most likely a hash used to maintain the user's login session.
The best way to replicate this is to supply your web scraping program with a username and password to access the portal section of the website so it can POST a login while scraping. The website itself will generate the session info. You will need to instruct your scraping application to follow any dynamic URLs that are generated. Usually this is done by following any URLs in the HTML supplied by the server after logging in.
As an example, scrapy can be configured to follow any URLs in target pages when scraping:
https://doc.scrapy.org/en/latest/intro/tutorial.html#following-links
Although you are using your own solution to scrape the contents of the portal for a logged in user, hopefully the logic and progression illustrated in my examples help steer you in the right direction for resolving what appears to be a session/cookie storage issue.

Chris's answer covers the question, and it helped me. The key line was:
Usually this is done by following any URLs in the HTML supplied by the server after logging in.
I just want to add a Node.js note: the same thing can be achieved with the request module, using cheerio to parse the HTML that comes back in the response.
P.S.: In case anyone is wondering where I found that dynamic URL, it was in an HTML form that came back in the response. It was the action attribute of that form.
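As a rough illustration of pulling the dynamic URL out of a form's action, here is a minimal sketch that uses only Node.js built-ins. The function name extractFormAction, the sample path, and the TOKEN placeholder are made up for the example. The regular expression is a simplification that only handles a quoted action attribute on the first form, so a real HTML parser such as cheerio is likely the more robust choice.

```javascript
// Find the first <form> in the HTML and return its action as an absolute URL.
// Hypothetical helper: a regex is a shortcut here, not a full HTML parser.
function extractFormAction(html, pageUrl) {
  const match = /<form\b[^>]*?\saction\s*=\s*(["'])(.*?)\1/i.exec(html);
  if (!match) {
    return null; // no form with a quoted action attribute was found
  }
  // Attribute values are HTML-escaped, so turn &amp; back into &.
  const action = match[2].replace(/&amp;/g, '&');
  // Resolve relative actions against the URL of the page that contained the form.
  return new URL(action, pageUrl).toString();
}

// Example with made-up values; TOKEN stands in for the per-login value.
const sampleHtml =
  '<form method="post" action="/escares/wps/myportal/!ut/p/c1/abc/dl2/d1/TOKEN!"></form>';
const nextUrl = extractFormAction(sampleHtml, 'http://example.com/escares/wps/myportal/login');
console.log(nextUrl);
```

The scraper would then send its next request to the returned URL, along with the session cookies from the login response, instead of trying to compute the value itself.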

Related

How is the data from PassportJS used/gathered from the front end?

I've been trying to get Steam logins working with a small website project I'm making. I've looked through some resources but keep coming to the same roadblock (or rather a question) about how this is useful.
For example, in this article: https://medium.com/geekculture/sign-in-through-steam-using-nodejs-e3202d4719
They do all of the setup and such, and when the user logs in, it sends them to the steam login, and if successful, redirects them to the nodeJS port (ex: 3001) + api, in the article it is "localhost:XXXX/", but my question is: How does this get used on the actual website/frontend? How does the website know when to grab from this API? How does it know if the login was successful or not? Would it be a useEffect that checks the localhost:XXXX/ api every time the page loads to see if the api returned valid data or if it was just NULL data?

Unable to log in to Azure web app via VS2015 web performance test

How do I correctly handle the login/authentication scenario for an Azure web app in my VS2015 web performance test?
I created an XML file as a data source for the WAAD username and password. I bind the username and password to the Form Post Parameters: login and passwd respectively at request: https://login.microsoftonline.com/xxxx/login
But when I run the test, the Web Browser tab shows this error:
We can't sign you in
Your browser is currently set to block JavaScript. You need to allow JavaScript to use this service.
To learn how to allow JavaScript or to find out whether your browser supports JavaScript, check the online help in your web browser.
I also get a number of errors like this:
The value of the ExpectedResponseUrl property Validation xxxx.azurewebsites.net/xxxx/docs/xxxx.aspx does not equal the actual response URL login.microsoftonline.com/xxxx/wsfed. QueryString parameters were ignored.
Any idea how I can successfully log in to the Azure web app via the web performance test?
There are several methods of login and authentication that can be used. Just binding values to form post parameters may not be sufficient or correct. You will find the login form has hidden session identities that must be passed as well as the login data. I find that recording a test two times using as nearly as possible the same inputs and doing the same activities helps. These two tests can then be compared to find the dynamic data that needs to be handled.
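The compare-two-recordings idea can be sketched in a few lines. This is only an illustration in JavaScript, not something Visual Studio provides: findDynamicParams is a hypothetical helper, and the parameter names and values below are invented. It assumes each recording's form post parameters have been copied into a plain object.

```javascript
// Return the names of parameters whose values differ between two recordings
// of the same user journey. Those are the candidates for dynamic data.
function findDynamicParams(firstRecording, secondRecording) {
  const names = new Set([
    ...Object.keys(firstRecording),
    ...Object.keys(secondRecording),
  ]);
  return [...names]
    .filter((name) => firstRecording[name] !== secondRecording[name])
    .sort();
}

// Made-up parameter sets from two recordings with the same inputs.
const recordingA = { login: 'user@example.com', passwd: 'secret', n1: 'abc', n2: '111' };
const recordingB = { login: 'user@example.com', passwd: 'secret', n1: 'xyz', n2: '222' };
const dynamicNames = findDynamicParams(recordingA, recordingB);
console.log(dynamicNames);
```

Each name it reports is a value that should be extracted from an earlier response rather than replayed as recorded.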
In a comment the questioner added "I noticed these parameters, n1-43 are different but I have no idea what they represent. How do I handle them?". I cannot know what they represent, as I do not know the website you are testing. You could ask the website developers. Or, better, treat them as dynamic data: find where the values come from, save them into context variables, and use them as needed. This is basic web test development. Here and here are two good articles on what to do.
The message about JavaScript not being supported can be ignored. Visual Studio web tests do not support JavaScript or any other "active" parts of a web page, they only support the html part. Your job as a tester is to simulate what the JavaScript does for the specific user journeys you are testing. That simulation is generally just filling in the correct values (via context parameters) in the recorded requests.
Unexpected response URLs can be due to earlier failures, such as the login not working. I suggest not worrying about them until all of the other test problems are solved. Then, if you still need help, ask a new question.

So how do I properly set up a Redirect URI?

A few days ago, I was playing around with a local API (not Google), and it required me to provide a Redirect URI while setting up my app in their dashboard.
I did some googling, and the top results led me to OAuth 2.0 and the Google Developers website. But the API I'm using is not related to any of Google's, so I thought it wouldn't be relevant.
Is the setup of Redirect Uri for most APIs universal or almost the same? What programming languages can I use to implement this?
The description also says I need to parse a subscriber_number and access_token in JSON format. How do I do that?
Please note that I have already found a free hosting site via Firebase and have provided my own link. I also did the initial steps from another user to fire the required access_token that I needed to parse from the Redirect Uri. But accessing it from the browser right after triggering doesn't give me anything. I'm so clueless. Any help is much appreciated!

Iframe - access information workaround

I have a site that is on another domain, so iframe access will not work. The site on the other domain is a questionnaire with a series of questions. Once the questions are completed, that site will give me a number.
My task is to allow a user on my site to go through those questions and to be able to access that number.
How can I achieve that? I was thinking to spin up a node server and have it fetch the complete site and serve it to me, something like mediator. That way I will not have CORS issues.
Is it possible to stream the complete page along with css/js to my frontend (backbone app)? Run it in the window for the user. Upon completion it can send the answer back to node, so node can post it and return the number to me.
I am open to any suggestions :)

Why does new Facebook Javascript SDK not violate the "same origin policy"?

The new Facebook Javascript SDK can let any website login as a Facebook user and fetch data of a user...
So it will be, www.example.com including some Javascript from Facebook, but as I recall, that script is considered to be of the origin of www.example.com and cannot fetch data from facebook.com, because it is a violation of the "same origin policy". Isn't that correct? If so, how does the script fetch data?
From here: https://developer.mozilla.org/en/Same_origin_policy_for_JavaScript
The same origin policy prevents a document or script loaded from one origin from getting or setting properties of a document from another origin. This policy dates all the way back to Netscape Navigator 2.0.
and explained slightly differently here: http://docs.sun.com/source/816-6409-10/sec.htm
The same origin policy works as follows: when loading a document from one origin, a script loaded from a different origin cannot get or set specific properties of specific browser and HTML objects in a window or frame (see Table 14.2).
The Facebook script is not attempting to interact with script from your domain or to read DOM objects. It's just going to do its own post to Facebook. It gets your site name not by interacting with your page or with script from your site, but because the name is baked into the code that is generated when you fill out the form to get the "like" button. I registered a site named "http://www.bogussite.com" and got the code to put on my website. The first thing in this code was
iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fwww.bogussite.com&
so the script is clearly getting your site info by hard-coded URL parameters in the link to the iFrame.
Facebook is far from alone in having you use scripts hosted on their servers. There are plenty of other scripts that work this way. All of the Google APIs, for example, including Google Gears, Google Analytics, etc., require you to use a script hosted on their server. Just last week, while I was trying to figure out how to do geolocation for our store finder for a mobile-friendly web app, I found a whole slew of geolocation services that had you use scripts hosted on their servers, rather than copying the script to your server.
I think, but am not sure, that they use the iframe method. At least the cross domain receiver and xfbml stuff for canvas apps uses that. Basically the javascript on your page creates an iframe within the facebook.com domain. That iframe then has permission to do whatever it needs with facebook. Communication back with the parent can be done with one of several methods, for example the url hash. But I'm not sure which if any method they use for that part.
If I recall, they use script tag insertion. So when a JS SDK call needs to call out to Facebook, it inserts a <script src="http://graph.facebook.com/whatever?params...&callback=some_function"> script tag into the current document. Then Facebook returns the data in JSON format as some_function({...}) where the actual data is inside the ... . This results in the function some_function being called in the origin of example.com using data from graph.facebook.com.
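The script tag insertion described above is commonly called JSONP. Below is a minimal sketch of the general technique, not the SDK's actual code. The helper names buildJsonpUrl and jsonpRequest and the fields parameter are assumptions for illustration. The first function runs anywhere, while the second needs a browser because it touches document and window.

```javascript
// Build the request URL: the normal query parameters plus the name of the
// global function the server should wrap its JSON response in.
function buildJsonpUrl(endpoint, params, callbackName) {
  const url = new URL(endpoint);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  url.searchParams.set('callback', callbackName);
  return url.toString();
}

// Browser-only: insert a <script> tag and receive the data in a callback.
function jsonpRequest(endpoint, params, onData) {
  const callbackName = 'jsonp_cb_' + Date.now();
  const script = document.createElement('script');
  // The server's response is a script that calls this global function.
  window[callbackName] = (data) => {
    delete window[callbackName]; // clean up the temporary global
    script.remove();
    onData(data);
  };
  script.src = buildJsonpUrl(endpoint, params, callbackName);
  document.head.appendChild(script);
}

// URL construction only, using the placeholder endpoint from the answer.
const jsonpUrl = buildJsonpUrl('http://graph.facebook.com/whatever', { fields: 'name' }, 'some_function');
console.log(jsonpUrl);
```

Because the browser loads the response as an ordinary script, no cross-origin read takes place. The page only receives whatever data the remote server chooses to pass to the callback.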
