Get URL of last redirected address in NodeJS

I'm trying to get the final redirected address of a LinkedIn URL: https://www.linkedin.com/school/18451/?legacySchoolId=18451, which in the browser ends up as https://www.linkedin.com/school/babson-college/
In NodeJS, I have tried the following (I have tried all the solutions in that post):
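const request = require('request');
// followRedirect: false returns the first response, so its Location header is visible
request({ url: 'https://www.linkedin.com/school/18451/?legacySchoolId=18451', followRedirect: false }, function (err, res, body) {
  if (err) return console.error(err);
  console.log(res.headers.location);
});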
But I still get the initial address (the legacySchoolId one) instead of the final one (babson-college). It seems that the redirect is performed by JavaScript, so I was wondering how I could get the final address in all cases.

I tested it and I see two obstacles here:
1) You get the final URL only when you are logged in, otherwise you get a JS redirect to an authwall.
2) The final URL that you see in the browser does not come from a redirect. The displayed URL is just rewritten using replaceState (HTML5 History API), and no navigation to a new page happens.
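Obstacle 2 explains why reading `res.headers.location` once never reaches the babson-college URL. For URLs that really are redirected over HTTP, you can follow the whole chain by hand. The sketch below assumes Node 18+ (global `fetch`), and `finalUrlAfterRedirects` is a hypothetical helper name. It still cannot see the LinkedIn case, because a replaceState rewrite never produces a new HTTP request.

```javascript
// Hypothetical helper: follow HTTP 3xx redirects by hand and return the last URL reached.
// fetchFn defaults to the global fetch (Node 18+). It is injectable so the logic
// can be exercised without network access.
async function finalUrlAfterRedirects(startUrl, fetchFn = globalThis.fetch, maxHops = 10) {
  let current = startUrl;
  for (let hop = 0; hop < maxHops; hop++) {
    // redirect: 'manual' stops fetch from following the redirect itself
    const res = await fetchFn(current, { redirect: 'manual' });
    const location = res.headers.get('location');
    if (res.status < 300 || res.status >= 400 || !location) {
      return current; // not a redirect, so this is the final HTTP URL
    }
    // Location may be relative, so resolve it against the current URL
    current = new URL(location, current).toString();
  }
  throw new Error(`More than ${maxHops} redirects starting from ${startUrl}`);
}
```

Pages that change their URL with JavaScript after loading (as LinkedIn does here) still need a real browser, which is what option 1 below is about.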
I see two options to solve this:
1) Use a headless browser like Puppeteer. Write code to log in with your username and password and then navigate to those URLs, wait a bit (for example until some company info gets rendered) and then read the current URL.
2) Simulate only the most necessary requests and extract the info from the page (not sure if it works with LinkedIn, though), using a library such as slimtomato.* You'd start by simulating a login with your username and password. Then use the same tomato object (or at least the same cookie jar) for the requests to those school links to get the final URLs. I didn't find a straightforward way to see the final URL in the page source. What would still work in this specific case is parsing the page for this meta tag...
<meta name="apple-itunes-app" content="app-id=288429040, affiliate-data=ct=campaign_vw_smart_app_banner&pt=10746, app-argument=voyager://school/babson-college/?trk=vw_smart_app_banner">
...and then taking the app-argument value (voyager://school/babson-college/?trk=vw_smart_app_banner), dropping the query and replacing voyager:// with https://www.linkedin.com/.
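The meta-tag approach can be sketched as a small function. It assumes the attribute order and format shown in the tag above, and it uses a regex rather than an HTML parser to stay dependency-free. `schoolUrlFromMeta` is a hypothetical name, and LinkedIn can change this markup at any time.

```javascript
// Hypothetical helper: derive the canonical school URL from LinkedIn's
// apple-itunes-app meta tag. It assumes name= comes before content=, as in the example tag.
function schoolUrlFromMeta(html) {
  const meta = html.match(/<meta\s+name="apple-itunes-app"\s+content="([^"]*)"/i);
  if (!meta) return null;
  // The content is a comma-separated list of key=value pairs; find app-argument
  const arg = meta[1]
    .split(',')
    .map((part) => part.trim())
    .find((part) => part.startsWith('app-argument='));
  if (!arg) return null;
  const appUrl = arg.slice('app-argument='.length); // voyager://school/babson-college/?trk=...
  const withoutQuery = appUrl.split('?')[0];        // drop the ?trk=... query
  return withoutQuery.replace(/^voyager:\/\//, 'https://www.linkedin.com/');
}
```

You would feed it the HTML returned for the school link after logging in (with the same cookie jar). For the tag above, it yields https://www.linkedin.com/school/babson-college/.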
*: Disclaimer: I wrote that library. But I didn't find a good alternative with the same scope.

Related

Rendering a user modified page using PhantomJS

My use case is: a user goes onto a webpage and modifies it by either filling in a form, populating the page with data from the database, or dragging around some draggables on the page. He can then download the page he modified as a PDF. I was thinking of using PhantomJS to do the conversion from HTML to PDF.
I understand the basic functionality of PhantomJS and got the basic example working but in all the examples I've seen, either a local file or a url is passed in. Example:
page.open('./test.html', function () { ... });
How would I render the page that is getting modified by a user using PhantomJS? I have 2 ideas:
Have the url change as the user modifies the page, and simply pass in the url. For example, the url contains the position of a draggable div.
Send the modified HTML to the back end, save it, and run PhantomJS on the saved file.
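Idea 1 could look roughly like this. A hypothetical pair of helpers (`stateToUrl`, `urlToState`) writes each draggable's x,y position into the query string and reads it back, so the same URL can be handed to PhantomJS's page.open. The sketch assumes the page's query string holds only positions. Very large states would run into URL length limits, which is where idea 2 becomes more attractive.

```javascript
// Hypothetical sketch of idea 1: encode draggable positions in the URL.
function stateToUrl(baseUrl, positions) {
  const url = new URL(baseUrl);
  for (const [id, pos] of Object.entries(positions)) {
    // e.g. box1=120,45 (x,y in pixels); URLSearchParams percent-encodes the comma
    url.searchParams.set(id, `${pos.x},${pos.y}`);
  }
  return url.toString();
}

// Inverse, run by the page on load (in the user's browser and inside PhantomJS)
// to restore the positions. It assumes every query parameter is a position.
function urlToState(urlString) {
  const positions = {};
  for (const [id, value] of new URL(urlString).searchParams) {
    const [x, y] = value.split(',').map(Number);
    positions[id] = { x, y };
  }
  return positions;
}
```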
Do these solutions make sense? I'm hoping there would be a simpler way.

Difference between the <a> tag and get request

I have a perhaps simple question: what is the difference between an <a> tag and a normal GET request sent from any element? I know the <a> tag automatically sends you to the URL specified in its href attribute, so I assume that a GET request does something similar in its success callback (as demonstrated below).
But let's say that I also want to send some information along with a normal get request when a for example <span> element is clicked on so I write:
$('span').click(() => {
  $.ajax({
    url: '/someurl',
    type: 'GET',
    data: {
      title: someTitle,
      email: someEmail
    },
    success: (data) => {
      window.location = '/someurl';
    }
  });
});
Is there any way to achieve this with an <a> tag? Sending information to the server so it's available in req.query.title and req.query.email ?
Doing the ajax request above runs my app.get('/someurl', (req, res) => {}) handler twice. It runs once for the GET request that sends the data (title and email), and again when I set window.location = '/someurl'. How can I redo this so that it sends the GET request only ONCE, while still sending the information so it is available on the req object AND making the browser display /someurl?
Just create the appropriate query string in the URL you put in the href of the <a> tag, and it will work just like your ajax call. Suppose someTitle has the value "The Hobbit" and someEmail has the value foo@whatever.com. Then you can construct that URL like this:
<a href="/someurl?title=The%20Hobbit&email=foo%40whatever.com">Click Me</a>
A number of non-letter characters have to be escaped in URLs. In the above URL, the space is replaced with %20 and the @ with %40. In your particular example, you could open the network tab in the Chrome debugger and see the EXACT URL that Chrome sent for your ajax call. Copy that to the clipboard and insert it into your <a> tag.
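Rather than copying the URL from the network tab, you can also build it in code. This is a minimal sketch, assuming the values are known when the link is rendered; `buildHref` is a hypothetical helper.

```javascript
// Hypothetical helper: build an href whose query string carries the same data
// the $.ajax call sent, so a single GET both delivers req.query and navigates.
function buildHref(path, params) {
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');
  return query ? `${path}?${query}` : path;
}
```

encodeURIComponent writes the space as %20 and the @ as %40, matching the URL above. URLSearchParams would write the space as + instead, which is the form-encoding convention for the same character. In the page you would then set the link's href to buildHref('/someurl', { title: someTitle, email: someEmail }).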
Here's a table that shows what characters have to be replaced in a query string component (the part after & or after =):
I'm just wondering then, aside from semantic reasons, is there any other advantages to using an a tag instead of anything else?
<a> tags are understood by all sorts of machines that may read your page such as screen readers for the disabled or crawlers indexing your site. In addition, they work automatically with browser keyboard support, Ctrl-click to open a new tab. Whereas a piece of Javascript may not automatically support any of that functionality. So, basically, if the <a> tag can do what you need it is widely preferred because it has so much other default functionality that can be necessary or handy for users.

Embedding a youtube video in a mobile site works, but 'Domains protocols and ports must match' error is jamming browser

I am trying to dynamically embed a youtube video into a mobile web page by injecting the following code via jQuery.
$("#tagetId").append("http://www.youtube.com/embed/oHg5SJYRHA0' frameborder='0'>");
I am testing this on Chrome for iOS, and the video does render correctly. However, some part of the web page seems to think the video hasn't rendered, and every half second or so I get a new instance of the following error.
Unsafe JavaScript attempt to access frame with URL http://mydomain.html from frame with URL http://www.youtube.com/embed/oHg5SJYRHA0. Domains, protocols and ports must match.
This seems to really jam up the browser and causes the load event callback function (i.e. 'first line of code') to trigger over and over.
$('iframe').load(function(){
  //first line of code
  $(this).load(function(){
    //second line of code
  })
});
Is there a better way to do this? Can anyone explain what I'm doing wrong?
This fixed it:
<iframe scrolling='no' class='youtube-player' style='height:200px;width:100%' src='https://www.youtube.com/embed/oHg5SJYRHA0?html5=1' frameborder='0'></iframe>
Not sure what you were doing with
$("#tagetId").append("http://www.youtube.com/embed/oHg5SJYRHA0' frameborder='0'>");
but that looks like malformed HTML being appended.
Maybe you just didn't append the whole iframe tag?
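To make sure the whole element gets appended rather than a fragment of it, a helper can build the tag from the accepted answer. This is a sketch: `youtubeIframeHtml` is a hypothetical name, and the video ID check assumes IDs consist only of letters, digits, `-` and `_`.

```javascript
// Hypothetical helper: return the complete iframe tag from the answer above.
function youtubeIframeHtml(videoId) {
  // Reject anything that does not look like a video ID before putting it into HTML
  if (!/^[A-Za-z0-9_-]+$/.test(videoId)) {
    throw new Error(`Unexpected video id: ${videoId}`);
  }
  return (
    "<iframe scrolling='no' class='youtube-player' style='height:200px;width:100%' " +
    `src='https://www.youtube.com/embed/${videoId}?html5=1' frameborder='0'></iframe>`
  );
}
```

In the page this would be used as $("#tagetId").append(youtubeIframeHtml("oHg5SJYRHA0"));.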

How to prevent CSRF attacks when a CRUD action uses a link instead of a form?

I have implemented CSRF protection on my website using a CSRF token in a hidden input field in my forms. However at some places in my website I don't use a form for certain actions, e.g. a user can click a link to delete something (e.g. /post/11/delete). Currently this is open to a CSRF attack, so I want to implement a prevention for these links. How can I do this? I can think of two possible ways:
Make all links (which for example delete something) into tiny forms with only one hidden field (the CSRF token) and one submit button (styled as a normal link).
Add the CSRF token to the query-string
I don't like either of those options:
Styling a submit button to behave exactly like a link might be hard to get right across platforms.
Although these URLs will never be picked up by search engines, I don't like having some random string in my URL (just aesthetics).
So is there a way I'm overlooking or are those two my options?
Add a token to your links.
Styling a submit button to look like a link is not hard, though there will obviously be issues with middle-click or the 'copy link location' command.
Facebook and Google are not afraid of putting 'random strings' in URLs, and neither should you be. Adding nofollow to those links and excluding them in robots.txt should resolve your SEO concerns, in case you for some reason show REST links to guest users or search engines.
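A token in the link does not have to be stored per link. One common approach, swapped in here rather than taken from the answer, is an HMAC of the session ID and the action path, which the server recomputes on each request. `SECRET`, the `csrf` parameter name and the helper names are all assumptions.

```javascript
const crypto = require('crypto');

// Hypothetical secret; in practice load a long random value from configuration.
const SECRET = 'replace-with-a-long-random-secret';

// Token bound to the user's session and to one specific action URL.
function linkToken(sessionId, actionPath) {
  return crypto.createHmac('sha256', SECRET).update(`${sessionId}:${actionPath}`).digest('hex');
}

// e.g. /post/11/delete?csrf=<token> (the "csrf" parameter name is an assumption)
function signedLink(sessionId, actionPath) {
  return `${actionPath}?csrf=${linkToken(sessionId, actionPath)}`;
}

// Constant-time comparison; returns false for missing or malformed tokens.
function isValidLinkToken(sessionId, actionPath, token) {
  if (typeof token !== 'string') return false;
  const expected = Buffer.from(linkToken(sessionId, actionPath), 'hex');
  const given = Buffer.from(token, 'hex');
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}
```

Because the action path is part of the HMAC input, a token copied from /post/11/delete is useless for /post/12/delete. A request whose token fails the check would be rejected, or redirected to a confirmation page.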
If you really don't want URL parameters with long random values, you could implement a confirmation page for each Delete action, and have a form with your hidden field there.
Requests received at /post/11/delete without valid token will make the server respond with the confirmation page.
Requests received at /post/11/delete with valid token will trigger the deletion.
Best practice is to not perform updates via a GET operation.
Here's a clever little script that will hook into all of your links and make them POST a single hidden variable in addition to the payload in the querystring. Hope this is helpful!
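// Run once the DOM is ready (assigning document.ready does not register a handler)
$(function () {
  // Hooks every link; narrow the selector to your state-changing links if needed
  $("a").on("click", function (event) {
    event.preventDefault();
    // Throwaway form that POSTs to the link's URL; the query string stays in action
    $("<form>", { action: this.href, method: "POST" })
      // The field needs a name to be submitted; "csrf_token" is a placeholder
      .append($("<input>", { type: "hidden", name: "csrf_token", value: "CSRF" }))
      .appendTo("body")
      .submit();
  });
});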

Preventing iframe caching in browser

How do you prevent Firefox and Safari from caching iframe content?
I have a simple webpage with an iframe to a page on a different site. Both the outer page and the inner page have HTTP response headers to prevent caching. When I click the "back" button in the browser, the outer page works properly, but no matter what, the browser always retrieves a cache of the iframed page. IE works just fine, but Firefox and Safari are giving me trouble.
My webpage looks something like this:
<html>
<head><!-- stuff --></head>
<body>
<!-- stuff -->
<iframe src="webpage2.html?var=xxx" />
<!-- stuff -->
</body>
</html>
The var variable always changes. Although the URL of the iframe has changed (and thus, the browser should be making a new request to that page), the browser just fetches the cached content.
I've examined the HTTP requests and responses going back and forth, and I noticed that even if the outer page contains <iframe src="webpage2.html?var=222" />, the browser will still fetch webpage2.html?var=111.
Here's what I've tried so far:
Changing iframe URL with random var value
Adding Expires, Cache-Control, and Pragma headers to outer webpage
Adding Expires, Cache-Control, and Pragma headers to inner webpage
I'm unable to do any JavaScript tricks because I'm blocked by the same-origin policy.
I'm running out of ideas. Does anyone know how to stop the browser from caching the iframed content?
Update
I installed Fiddler2 as Daniel suggested to perform another test, and unfortunately, I am still getting the same results.
This is the test I performed:
Outer page generates random number using Math.random() in JSP.
Outer page displays a random number on the webpage.
Outer page calls iframe, passing in a random number.
Inner page displays a random number.
With this test, I'm able to see exactly which pages are updating, and which pages are cached.
Visual Test
For a quick test, I load the page, navigate to another page, and then press "back." Here are the results:
Original Page:
Outer Page: 0.21300034290246206
Inner Page: 0.21300034290246206
Leaving page, then hitting back:
Outer page: 0.4470929019483644
Inner page: 0.21300034290246206
This shows that the inner page is being cached, even though the outer page is calling it with a different GET parameter in the URL. For some reason, the browser is ignoring the fact that the iframe is requesting a new URL; it simply loads the old one.
Fiddler Test
Sure enough, Fiddler confirms the same thing.
(I load the page.)
Outer page is called. HTML:
0.21300034290246206
<iframe src="http://ipv4.fiddler:1416/page1.aspx?var=0.21300034290246206" />
http://ipv4.fiddler:1416/page1.aspx?var=0.21300034290246206 is called.
(I navigate away from the page and then hit back.)
Outer page is called. HTML:
0.4470929019483644
<iframe src="http://ipv4.fiddler:1416/page1.aspx?var=0.4470929019483644" />
http://ipv4.fiddler:1416/page1.aspx?var=0.21300034290246206 is called.
Well, from this test, it looks as though the web browser isn't caching the page, but it's caching the URL of the iframe and then making a new request on that cached URL. However, I'm still stumped as to how to solve this issue.
Does anyone have any ideas on how to stop the web browser from caching iframe URLs?
This is a bug in Firefox:
https://bugzilla.mozilla.org/show_bug.cgi?id=356558
Try this workaround:
<iframe src="webpage2.html?var=xxx" id="theframe"></iframe>
<script>
var _theframe = document.getElementById("theframe");
_theframe.contentWindow.location.href = _theframe.src;
</script>
I have been able to work around this bug by setting a unique name attribute on the iframe - for whatever reason, this seems to bust the cache. You can use whatever dynamic data you have as the name attribute - or simply the current ms or ns time in whatever templating language you're using. This is a nicer solution than those above because it does not directly require JS.
In my particular case, the iframe is being built via JS (but you could do the same via PHP, Ruby, whatever), so I simply use Date.now():
return '<iframe src="' + src + '" name="' + Date.now() + '"></iframe>';
This fixes the bug in my testing; probably because the window.name in the inner window changes.
As you said, the issue here is not iframe content caching, but iframe url caching.
As of September 2018, it seems the issue still occurs in Chrome but not in Firefox.
I've tried many things (adding a changing GET parameter, clearing the iframe URL in onbeforeunload, detecting a "reload from cache" using a cookie, setting up various response headers), and these are the only two solutions that worked for me:
1- Easy way: create your iframe dynamically from javascript
For example:
const iframe = document.createElement('iframe')
iframe.id = ...
...
iframe.src = myIFrameUrl
document.body.appendChild(iframe)
2- Convoluted way
Server-side, as explained here, disable content caching for the content you serve for the iframe OR for the parent page (either will do).
AND
Set the iframe url from javascript with an additional changing search param, like this:
const url = myIFrameUrl + '?timestamp=' + new Date().getTime()
document.getElementById('my-iframe-id').src = url
(simplified version, beware of other search params)
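The "beware of other search params" caveat can be handled with the URL API. It appends to an existing query string and keeps any #fragment in place, which also avoids the hash-only change described further down. This is a sketch: `withCacheBuster` is a hypothetical name, and in the browser `base` would typically be `window.location.href`.

```javascript
// Hypothetical helper: add (or overwrite) a cache-busting parameter without
// clobbering existing query parameters or the #fragment.
// base is only needed for relative URLs.
function withCacheBuster(url, base, now = Date.now()) {
  const u = new URL(url, base);
  u.searchParams.set('timestamp', String(now));
  return u.toString();
}
```

In the page: document.getElementById('my-iframe-id').src = withCacheBuster(myIFrameUrl, window.location.href);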
After trying everything else (except using a proxy for the iframe content), I found a way to prevent iframe content caching, from the same domain:
Use .htaccess and a rewrite rule and change the iframe src attribute.
RewriteRule ^test/([0-9]+)/([a-zA-Z0-9]+)\.html$ /test/index.php?idEntity=$1&token=$2 [QSA]
The way I use this, the iframe's URL ends up looking like this: example.com/test/54/e3116491e90e05700880bf8b269a8cc7.html
Here the last path segment is the token, a randomly generated value. This URL prevents iframe caching because the token is never the same, so the browser treats each URL as a completely different page. Each refresh loads a different URL:
example.com/test/54/e3116491e90e05700880bf8b269a8cc7.html
example.com/test/54/d2cc21be7cdcb5a1f989272706de1913.html
both lead to the same page.
You can access the hidden URL parameters via $_SERVER["QUERY_STRING"] (or $_GET).
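The page that embeds the iframe needs a fresh token on every view. Below is a minimal sketch in Node; the answer's own server side is PHP, so treat this only as an illustration of the URL shape. `rewrittenIframePath` is a hypothetical name, and the 16-byte random token is an assumption chosen to match the 32-character example.

```javascript
const crypto = require('crypto');

// Hypothetical helper: build /test/<idEntity>/<token>.html, which the RewriteRule
// above maps back to /test/index.php?idEntity=...&token=...
function rewrittenIframePath(idEntity) {
  // 16 random bytes -> 32 lowercase hex characters, new on every call
  const token = crypto.randomBytes(16).toString('hex');
  return `/test/${idEntity}/${token}.html`;
}
```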
To get the iframe to always load fresh content, add the current Unix timestamp to the end of the GET parameters. The browser then sees it as a 'different' request and will seek new content.
In Javascript, it might look like:
var timestamp = Math.floor(Date.now() / 1000); // current Unix timestamp in seconds
frames['my_iframe'].location.href = 'load_iframe_content.php?group_ID=' + group_ID + '&timestamp=' + timestamp;
I found this problem in the latest Chrome as well as the latest Safari on Mac OS X as of Mar 17, 2016. None of the fixes above worked for me. That includes setting src to empty and then back to some site, adding a randomly named "name" attribute, adding a random number to the end of the URL after the hash, and assigning the src to the content window's href after setting the src.
In my case, it was because I was using Javascript to update the IFRAME, and only switching the hash in the URL.
The workaround in my case was that I created an interim URL that had a 0 second meta redirect to that other page. It happens so fast that I hardly notice the screen flash. Plus, I made the background color of the interim page the same as the other page, and so you notice it even less.
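The interim page can be as small as a meta refresh. The sketch below generates one. `interimRedirectPage` is a hypothetical helper, and `backgroundColor` is assumed to come from trusted code because it is not escaped.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical interim page: a 0-second meta refresh to the real target, with the
// same background colour as the target so the flash is less visible.
function interimRedirectPage(targetUrl, backgroundColor) {
  return [
    '<!DOCTYPE html>',
    '<html><head>',
    `<meta http-equiv="refresh" content="0; url=${escapeAttr(targetUrl)}">`,
    `<style>html, body { background: ${backgroundColor}; }</style>`,
    '</head><body></body></html>',
  ].join('\n');
}
```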
It is a bug in Firefox 3.5.
Have a look:
https://bugzilla.mozilla.org/show_bug.cgi?id=279048
I set iframe src attribute later in my app. To get rid of the cached content inside iframe at the start of the application I simply do:
myIframe.src = "";
... somewhere at the beginning of the JS code (for instance in a jQuery $() ready handler).
Thanks to
http://www.freshsupercool.com/2008/07/10/firefox-caching-iframe-data/
I also had this problem in 2016 with iOS Safari. What seemed to work for me was adding a GET parameter with a value to the iframe src, like this:
<iframe width="60%" src="../other/url?cachebust=1" allowfullscreen></iframe>
I also ran into this issue. After trying different browsers and a ton of trial and error, I came up with this solution, which works well in my case:
import { defineComponent } from 'vue'
import { v4 as uuid } from 'uuid'
export default defineComponent({
setup() {
return () => (
// append a uuid after `?` to prevent browsers from caching it
<iframe src={`https://www.example.com?${uuid()}`} frameborder='0' />
)
},
})
If you want to get really crazy, you could implement the page name as a dynamic URL that always resolves to the same page, rather than using the query string option.
Assuming you're in an office, check whether there's any caching going on at a network level. Believe me, it's a possibility. Your IT folks will be able to tell you if there's any network infrastructure around HTTP caching, although since this only happens for the iframe it's unlikely.
Have you installed Fiddler2?
It will let you see exactly what is being requested, what is being sent back, etc. It doesn't sound plausible that the browser would really hit its cache for different URLs.
Make the URL of the iframe point to a page on your site which acts as a proxy to retrieve and return the actual contents of the iframe. Now you are no longer bound by the same-origin policy (EDIT: does not prevent the iframe caching issue).
