Detecting whether a user has come from a specific website

I want to know whether it's possible to detect which website a user has come from and serve them different content based on the site they have just come from.
So if they've come from any other website on the internet and landed on my page, they will see my normal HTML and CSS page, but if they come from one specific website (which I also developed, so I have control over the code both server-side and client-side) then I want them to see something slightly different.
It's a very small difference, which is why I don't want to consider taking them to a different version of the website or a different page.
I'm also not sure whether this solution would be placed on the page they are coming from or the page they are arriving on.
Hope that's clear. Thanks!

I would add a URL parameter like http://example.com?source=othersite. This way you can easily adjust the parameter and use JavaScript to detect it and slightly alter your landing page.
Otherwise, you can use the HTTP referrer sent by the browser to detect where they came from, but you would need to tell us your back-end technology to get an example of that, as it differs a bit.
In JavaScript, you can do something as simple as
if (window.location.href.indexOf('source=othersite') !== -1)
{
    // alter DOM here
}
Or you can use a URL Parameter parser as suggested here: How to get the value from the GET parameters?
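For instance, a slightly more robust sketch using the built-in URLSearchParams API (assuming the parameter is still named source):
// Parse the query string instead of substring-matching the whole URL
var params = new URLSearchParams(window.location.search);
if (params.get('source') === 'othersite')
{
    // alter DOM here
}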

What you want is the Referer: HTTP header. It will give the URL of the page which the user came from. Bear in mind that the Referer can easily be spoofed, so don't take it as a guarantee if security is an issue.
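On the client side the same information is exposed to JavaScript as document.referrer; a minimal sketch, assuming the referring site's hostname is othersite.example.com (a placeholder):
// document.referrer holds the URL of the page that linked here (it may be empty)
if (document.referrer.indexOf('othersite.example.com') !== -1)
{
    // the visitor arrived from the specific site; alter the DOM here
}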
Browsers may disable the referer, though. Why not just use a URL parameter?

Related

Kentico: PortalTemplate.aspx explicitly throwing a 404 error when directly invoked

We work on a product that is a series of components that can be installed on different CMSs and provide different services. We take a CMS-agnostic approach and try to use the same code across all the CMSs as much as possible (we try to avoid using the CMS API as much as we can).
Some parts of the code need to work with the current URL for some redirections, so we use Request.Url.ToString(), which has worked fine in other environments. In Kentico, however, instead of the current page we always get a reference to CMSPages/PortalTemplate.aspx with a query string parameter aliasPath that holds the real URL. In addition, requesting the template page directly in a browser gives you a 404 error.
Example:
Real URL (this works fine in a browser):
(1) https://www.customer.com/Membership/Questionnaire?Id=7207f9f9-7354-df11-88d9-005056837252
Request.Url.ToString() (this gives you a 404 error in a browser):
(2) https://www.customer.com/CMSPages/PortalTemplate.aspx?Id=7207f9f9-7354-df11-88d9-005056837252&aliaspath=/Membership/Questionnaire
I've noticed that the 404 error is thrown explicitly by the template code when it is invoked directly. Please see the code below from the Page_Init method of PortalTemplate.aspx.cs:
var resolvedTemplatePage = URLHelper.ResolveUrl(URLHelper.PortalTemplatePage);
if (RequestContext.RawURL.StartsWithCSafe(resolvedTemplatePage, true))
{
    // Deny direct access to this page
    RequestHelper.Respond404();
}
base.OnInit(e);
So, if I comment the above code out my redirection works fine ((2) resolves to (1)). I know it is not an elegant solution, but since I cannot / don't want to use the Kentico API, it is the only workaround I could find.
Note that I know using the Kentico API would solve the issue, since I'm sure I would find an API method that returns the actual page. I'm trying to avoid that as much as possible.
Questions: Am I breaking something? Is there a better way of achieving what I'm trying to accomplish? Can you think of any good reason I shouldn't do what I'm doing (security, usability, etc.)?
This is a fairly broad question, so I was not able to find any useful information in the Kentico docs.
I'm testing all this on Kentico v8.2.50, which is the version one of my customers currently has.
Thanks in advance.
It's not really recommended to edit the source files of Kentico, as you may start to run into issues with future upgrades and also start to see some unexpected behaviour.
If you want to get the original URL sent to the server before Kentico's routing has done its work, you can use Page.Request.RawUrl. Using your above example, RawUrl would return a value of /Membership/Questionnaire?Id=7207f9f9-7354-df11-88d9-005056837252, whereas Url will return a Uri with a value of https://www.customer.com/CMSPages/PortalTemplate.aspx?Id=7207f9f9-7354-df11-88d9-005056837252&aliaspath=/Membership/Questionnaire (as you stated).
This should avoid needing to use the Kentico API and also avoid having to change a file that pretty much every request goes through when using the portal engine.
If you need to get the full URL to redirect to, you can use something like this:
var redirectUrl = Request.Url.GetLeftPart(UriPartial.Authority) + Request.RawUrl;

Handle custom URL portion after /

I am honestly not sure what this is called, but hopefully someone here knows what it is and how it works.
I want to be able to handle a URL sent to my server and display a different site based on the URL, as Facebook and Twitter do (i.e. facebook.com/usernamehere). I assume the server takes that link and parses it to load the right information based on the URL, but I am not exactly sure what that is called or how I can achieve that effect.
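What the question describes is usually called URL routing (or URL rewriting). As a minimal sketch, assuming a Node.js/Express server (the handler body is purely illustrative):
var express = require('express');
var app = express();

// ':username' is a route parameter: /alice, /bob, etc. all reach this handler
// (more specific routes like /login would be declared before this catch-all)
app.get('/:username', function (req, res) {
    // look up whatever matches req.params.username and render that page
    res.send('Profile page for ' + req.params.username);
});

app.listen(3000);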

Hiding route parameters

I am learning Node.js with Express and I am creating my first single-page application with the help of Knockout.js. I have a lot of routes and I am looking for a way to hide the parameters in the URL, other than encoding them. If I have header links like:
http://www.mywebapp.com/login
http://www.mywebapp.com/logout
http://www.mywebapp.com/signup
http://www.mywebapp.com/users/username
http://www.mywebapp.com/users/1-8
Can I still make those links appear as
http://www.mywebapp.com
no matter which route is being called?
If it is not possible, can someone please explain why?
My application is completely AJAX-driven.
Well, if your app is AJAX-driven then the URL won't change in the first place. You can also wrap the entire thing in an iframe and only navigate the inner frame, but keep in mind that this is generally bad practice, as history and bookmarks won't work.
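A minimal sketch of the AJAX approach (the element IDs and paths are placeholders): the browser stays on http://www.mywebapp.com while the page content is swapped in place.
// Fetch a view from the server and render it without changing the address bar
function loadView(path) {
    fetch(path) // e.g. '/signup' – the route only travels inside the request
        .then(function (response) { return response.text(); })
        .then(function (html) {
            document.getElementById('content').innerHTML = html;
        });
}

// Header links call loadView instead of navigating, so window.location never changes
document.getElementById('signup-link').addEventListener('click', function (e) {
    e.preventDefault();
    loadView('/signup');
});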

How to check if my website is being accessed using a crawler?

How can I check whether a certain page is being accessed by a crawler, or by a script that fires continuous requests?
I need to make sure that the site is only being accessed from a web browser.
Thanks.
This question is a great place to start:
Detecting 'stealth' web-crawlers
Original post:
This would take a bit of engineering to solve.
I can think of three things to look for right off the bat (a rough sketch of the first and third checks follows after this list):
One, the user agent. If the spider is Google or Bing or anything else legitimate, it will identify itself.
Two, if the spider is malicious, it will most likely emulate the headers of a normal browser. Fingerprint it: if it claims to be IE, use JavaScript to check for an ActiveX object.
Three, take note of what it is accessing and how regularly. If the content takes the average human X seconds to view, you can use that as a starting point when trying to determine whether it is humanly possible to consume the data that fast. This is tricky; you will most likely have to rely on cookies, since an IP can be shared by multiple users.
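As mentioned above, here is a rough sketch of the user-agent and request-rate checks, written as Express middleware (the bot list and the 500 ms threshold are illustrative assumptions, not recommendations):
// Flag self-identifying bots and unusually fast repeat requests from the same IP
var knownBots = /googlebot|bingbot|slurp|duckduckbot|baiduspider/i;
var lastSeen = {}; // ip -> timestamp of the previous request (in-memory, illustrative only)

function crawlerCheck(req, res, next) {
    var ua = req.headers['user-agent'] || '';
    if (knownBots.test(ua)) {
        req.isCrawler = true; // a well-behaved spider announcing itself
    }

    var now = Date.now();
    var prev = lastSeen[req.ip];
    lastSeen[req.ip] = now;
    if (prev !== undefined && now - prev < 500) {
        // Faster than a human could plausibly read the page; treat as suspicious
        req.isSuspicious = true;
    }
    next();
}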
You can use the robots.txt file to block access to crawlers, or you can use JavaScript to detect the user agent and switch based on that. If I understood correctly, the first option is more appropriate, so:
User-agent: *
Disallow: /
Save that as robots.txt at the site root, and no well-behaved automated system should crawl your site (note that robots.txt is advisory; malicious bots are free to ignore it).
I had a similar issue in my web application: I created some bulky data in the database for each user that browsed the site, and the crawlers were causing loads of useless data to be created. However, I didn't want to deny access to crawlers because I wanted my site indexed and found; I just wanted to avoid creating useless data and to reduce the time taken to crawl.
I solved the problem in the following ways:
First, I used the HttpBrowserCapabilities.Crawler property from the .NET Framework (available since 2.0), which indicates whether the browser is a search engine web crawler. You can access it from anywhere in the code:
ASP.NET C# code behind:
bool isCrawler = HttpContext.Current.Request.Browser.Crawler;
ASP.NET HTML:
Is crawler? = <%=HttpContext.Current.Request.Browser.Crawler %>
ASP.NET Javascript:
<script type="text/javascript">
var isCrawler = <%=HttpContext.Current.Request.Browser.Crawler.ToString().ToLower() %>
</script>
The problem with this approach is that it is not 100% reliable against unidentified or masked crawlers, but maybe it is useful in your case.
After that, I had to find a way to distinguish between automated robots (crawlers, screen scrapers, etc.) and humans, and I realised that the solution required some kind of interactivity, such as clicking on a button. Some crawlers do process JavaScript, and it is quite likely they would fire the onclick event of a button element, but not of a non-interactive element such as a div. The following is the HTML / JavaScript code I used in my web application www.so-much-to-do.com to implement this feature:
<div
class="all rndCorner"
style="cursor:pointer;border:3;border-style:groove;text-align:center;font-size:medium;font-weight:bold"
onclick="$TodoApp.$AddSampleTree()">
Please click here to create your own set of sample tasks to do
</div>
This approach has been working impeccably so far, although crawlers could become even more clever, maybe after reading this article :D

Secure isolated iFrame? Alternative?

I am running into a problem. I want to host an external page securely, meaning no JavaScript in the iframe, or only safe code that, for example, changes the text or the colour of its own page. And I want to keep the CSS alive.
The pages should look the same as the source, but with no malicious code running behind them: no ActiveX, no Flash, no plug-ins. I want them to look correct without all the security compromises.
I have tried jQuery load(), but it only works for internal pages, not external ones, and the CSS in that div overwrites my site's CSS, which is not what I wanted.
I am looking for an isolated frame like an iframe, but without the security problems. Is this possible?
HTML5 now has a 'sandbox' attribute for iframes.
This allows you to block code running inside the iframe.
You can learn more at:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe
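As a minimal sketch (the URL is a placeholder), an iframe created with an empty sandbox attribute applies every restriction, so the framed page's CSS still renders while its scripts, forms and plug-ins are blocked:
// An empty sandbox value is the most restrictive setting: scripts, forms and
// plug-ins in the framed page are blocked because no 'allow-*' keywords are set
var frame = document.createElement('iframe');
frame.setAttribute('sandbox', '');
frame.src = 'https://www.example.com/external-page.html';
document.body.appendChild(frame);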
You can create a server-side proxy, for instance a PHP script that reads the remote page and cleans out whatever you don't like. It is not a really simple thing to do, but I'm afraid there is no easy way around it.
I mean, for instance, you create proxy.php:
<?php
// Fetch the remote page so it is served from your own domain
$remote = file_get_contents($_GET['remote']);
// .. filter whatever you like in $remote (e.g. strip <script> tags) then print it
echo $remote;
And then link to a site using
<iframe src="proxy.php?remote=http://www.example.com"></iframe>
This is not a complete example, just a way of showing my idea.
