I am getting the path /#! requested regularly on my blog, and I was wondering why (it doesn't match any URL/resource on my blog). The user agent is always reported as IE7, but the requests come from multiple different IP addresses. I'm trying to work out whether I can ignore this or whether I need to do something about it.
I specifically want to know the following:
Is it some kind of special URL for certain web browsers/web servers?
Is it connected to a specific exploit?
Can I just ignore it?
If it's relevant, the site is hosted on Windows Azure and runs on MVC4.
It's a hash-bang URL. They're used by some AJAX web applications, like Facebook and Twitter. Google has some special treatment for them, to make normally uncrawlable AJAX sites crawlable.
However, if your site is not running an app that uses them, you shouldn't be seeing them. And you definitely shouldn't be seeing them on the server side, since the whole point is that everything following a # in a URL is a fragment identifier, and should be stripped off by the user agent before requesting the URL from the server.
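To illustrate the point about fragments, here's a quick check with Python's standard urllib.parse (the URL is just an example): everything after the # is parsed as the fragment and is not part of the request a well-behaved browser sends.
from urllib.parse import urlsplit

parts = urlsplit("http://example.com/#!")
print(parts.path)      # '/'
print(parts.fragment)  # '!'
# A well-behaved client would request only "GET / HTTP/1.1" here;
# the "#!" part stays in the browser and never reaches the server.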
Edit: If I had to guess what's requesting such URLs, I'd say it might be some buggy bot. The fact that it's apparently pretending to be IE suggests that it might not be up to anything good; maybe it's a spambot of some sort. Anyway, the requests as such are most likely harmless, and you can ignore them. If it makes you feel better, you could always set up a rewrite rule to explicitly reject them, something like:
RewriteRule \x23 - [F]
This should reject any requests for URLs containing the # character with a 403 Forbidden error.
Well, # is a valid anchor that just means "the page". You can also make a '!' anchor, e.g.
<!-- some html here -->
<a href="#!">Click me!</a>
<!-- lots more html -->
<div id="!">
Wooaaaah!
</div>
So my guess is that you can safely ignore it... but that's just a guess ;)
I'm configuring a desktop and a mobile version of my site and was looking to use JS to test for browser dimensions and then load the relevant version. The problem is that if someone shares a link from the mobile version with a desktop user, the check is circumvented. Is there a way to configure .htaccess (or some other method) so that the address bar shows 'mysite.com' even though I would be loading 'mysite.com/mobile.htm'? I know I can always use media queries, but those have the downside of loading unused assets, so this method would be a lot better.
Use a rewrite instead of a redirect. With a redirect, the browser is instructed to go to another address. With a URL rewrite, the server just responds with the contents of a different URL.
For just this one page it will be simple, but depending on how your site is structured it could get more complicated.
Another way is to include a little JS in every page to make sure you are on the right one for the device and redirect to the other if not. It would help if there was some pattern to easily determine the corresponding page.
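The question asks about .htaccess, but the rewrite-versus-redirect distinction is the same in any server-side stack. A minimal sketch, assuming a small Flask app and a crude User-Agent check purely for illustration (the file names are made up):
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/")
def home():
    # Crude mobile detection, illustration only; the real check could be anything
    is_mobile = "Mobi" in request.headers.get("User-Agent", "")
    if is_mobile:
        # Rewrite: serve the mobile page, but the address bar still shows mysite.com/
        return send_file("mobile.htm")
    # A redirect (flask.redirect("/mobile.htm", code=302)) would instead change
    # the address bar and reintroduce the link-sharing problem.
    return send_file("index.htm")
With the rewrite, the shared link is always mysite.com/ and each visitor gets the markup appropriate for their device.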
I'm sending emails to customers, and I'm providing a custom URL for each, which when they go to, will log them in.
This is fine, except if they are using a shared browser that will remember the URL.
Is there any way at all to suggest to the browser that it shouldn't remember a URL?
Edit: This question has nothing to do with caching of the page.
Have the link log them in once. Then make them create credentials that let them access the site in the future. What's to stop a random person from typing in the URL and gaining access to the content?
Yes. You can redirect them with a 301 or 302. Then the browser won't save the URL they went to. At least that works with Mozilla-based browsers, and I would imagine others too.
Another, uglier way is to reply with an error status and include a body that does a refresh. Whether that works in most browsers is doubtful. However, browsers do not cache pages that return an error (404 Not Found would work; you could also use 403 Forbidden).
Other than that, there isn't much you can do. JavaScript does not allow you to tamper with the history anymore...
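Putting the two answers above together, here is a hedged sketch (Flask, purely for illustration; the names and the in-memory storage are assumptions) of a single-use login link that logs the user in once and then immediately redirects away from the tokenised URL:
import secrets
from flask import Flask, abort, redirect, session

app = Flask(__name__)
app.secret_key = "change-me"   # needed for the session cookie

# token -> user id; in practice this would be a database table with an expiry
one_time_tokens = {}

def issue_login_token(user_id):
    # called when the email is generated
    token = secrets.token_urlsafe(32)
    one_time_tokens[token] = user_id
    return token

@app.route("/login/<token>")
def login_by_link(token):
    user_id = one_time_tokens.pop(token, None)   # consume it: the link works only once
    if user_id is None:
        abort(403)   # unknown or already-used token
    session["user_id"] = user_id
    # 302 away from the tokenised URL, so that is not the address the browser keeps;
    # future visits need real credentials (or the session).
    return redirect("/account", code=302)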
In Firefox or Chrome I'd like to prevent a private web page from making outgoing connections. That is, if the URL in a browser tab starts with http://myprivatewebpage/ or https://myprivatewebpage/, then that tab must be restricted so that it may load images, CSS, fonts, JavaScript, XmlHttpRequests, Java applets, flash animations and all other resources only from http://myprivatewebpage/ or https://myprivatewebpage/; e.g. an <img src="http://www.google.com/images/logos/ps_logo.png"> (or the corresponding <script>new Image(...)</script>) must not be able to load that image, because it's not on myprivatewebpage. I need a 100% foolproof solution: not even a single resource outside myprivatewebpage may be accessible, not even with low probability. There must be no resource-loading restrictions on web pages other than myprivatewebpage; e.g. http://otherwebpage/ must still be able to load images from google.com.
Please note that I assume that the users of myprivatewebpage are willing to cooperate to keep the web page private unless it's too much work for them. For example, they would be happy to install a Chrome or Firefox extension once, and they wouldn't be offended if they see an error message stating that access is denied to myprivatewebpage until they install the extension in a supported browser.
The reason why I need this restriction is to keep myprivatewebpage really private, without exposing any information about its use to the webmasters of other web pages. If http://www.google.com/images/logos/ps_logo.png were allowed, then the use of myprivatewebpage would show up in the access log for Google's ps_logo.png, so Google's webmasters would have some information about how myprivatewebpage is used, and I don't want that. (In this question I'm not interested in whether the restriction is reasonable; I'm only interested in the technical solutions and their strengths and weaknesses.)
My ideas how to implement the restriction:
Don't impose any restrictions; just rely on the same-origin policy. (This doesn't provide the necessary protection: the same-origin policy lets all images pass through.)
Change the web application on the server so that it generates HTML, JavaScript, Java applets, flash animations etc. which never attempt to load anything outside myprivatewebpage. (This is almost impossible to make foolproof everywhere on a complicated web application, especially with user-generated content.)
Over-sanitize the web page using an HTML output filter on the server, i.e. remove all <script>, <embed> and <object> tags, restrict the targets of <img src=, <link rel=, <form action= etc., and also restrict the links in the CSS files. (This can prevent all unwanted resources if I remember all the HTML tags properly, e.g. I mustn't forget about <video>. But it is too restrictive: it removes all dynamic web page functionality like JavaScript, Java applets and flash animations; without these, most web applications are useless.)
Sanitize the web page, i.e. add an HTML output filter into the webserver which removes all offending URLs from the generated HTML. (This is not foolproof, because there can be a tricky JavaScript which generates a disallowed URL. It also doesn't protect against URLs loaded by Java applets and flash animations.)
Install an HTTP proxy which blocks requests based on the URL and the HTTP Referer, and force all browser traffic (including myprivatewebpage, otherwebpage, google.com) through that HTTP proxy. (This would slow down traffic to sites other than myprivatewebpage, and it may not protect properly if XmlHttpRequests, Java applets or flash animations can forge the HTTP Referer.)
Find or write a Firefox or Chrome extension which intercepts all outgoing connections, and blocks them based on the URL of the tab and the target URL of the connection. I've found https://developer.mozilla.org/en/Setting_HTTP_request_headers and thinkahead.js in https://addons.mozilla.org/en-US/firefox/addon/thinkahead/ and http://thinkahead.mozdev.org/ . Am I correct that it's possible to write a Firefox extension using that? Is there such a Firefox extension already?
Some links I've found for the Chrome extension:
http://www.chromium.org/developers/design-documents/extensions/notifications-of-web-request-and-navigation
https://groups.google.com/a/chromium.org/group/chromium-extensions/browse_thread/thread/90645ce11e1b3d86?pli=1
http://code.google.com/chrome/extensions/trunk/experimental.webRequest.html
As far as I can see, only the Firefox or Chrome extension is feasible from the list above. Do you have any other suggestions? Do you have some pointers how to write or where to find such an extension?
I've found https://developer.mozilla.org/en/Setting_HTTP_request_headers and thinkahead.js in https://addons.mozilla.org/en-US/firefox/addon/thinkahead/ and http://thinkahead.mozdev.org/ . Am I correct that it's possible to write a Firefox extension using that? Is there such a Firefox extension already?
I am the author of the latter extension, though I have yet to update it to support newer versions of Firefox. My initial guess is that, yes, it will do what you want:
User visits your web page without the plugin. The web page contains a ThinkAhead block that would send a simple version header to the server, but this is ignored as the plugin is not installed.
Since the server does not see that header, it redirects the client to a page to install the plugin.
User installs plugin.
User visits web page with plugin. Page sends version header to server, so server allows access.
The ThinkAhead block matches all pages that are not myprivatewebpage, and does something like set the HTTP status to 403 Forbidden. Thus:
When the user visits any webpage that is in myprivatewebpage, there is normal behaviour.
When the user visits any webpage outside of myprivatewebpage, access is denied.
If you want to catch bad requests earlier, instead of modifying incoming headers, you could modify outgoing headers, perhaps screwing up "If-Match" or "Accept" so that the request is never honoured.
This solution is extremely lightweight, but might not be strong enough for your concerns. This depends on what you want to protect: given the above, the client would not be able to see blocked content, but external "blocked" hosts might still notice that a request has been sent, and might be able to gather information from the request URL.
I'm creating a site that allows users to add Keyword --> URL links. I want multiple users to be able to link to the same url (exactly the same, same object instance).
So if user 1 types in "http://www.facebook.com/index.php" and user 2 types in "http://facebook.com" and user 3 types in "www.facebook.com" how do I best "convert" them to what these all resolve to: "http://www.facebook.com/"
The back end is in Python...
How does a search engine keep track of URLs? Do they keep a URL then take what ever it resolves to or do they toss URLs that are different from what they resolve to and just care about the resolved version?
Thanks!!!
So if user 1 types in "http://www.facebook.com/index.php" and user 2 types in "http://facebook.com" and user 3 types in "www.facebook.com" how do I best "convert" them to what these all resolve to: "http://www.facebook.com/"
You'd resolve user 3 by fixing up invalid URLs. www.facebook.com isn't a URL, but you can guess that http:// should go on the start. An empty path part is the same as the / path, so you can be sure that needs to go on the end too. A good URL parser should be able to do this bit.
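A rough sketch of that fix-up step with Python's standard urllib.parse (the http:// default and the lower-cased host are assumptions about what "fixing up" should mean here):
from urllib.parse import urlsplit, urlunsplit

def fix_up(url):
    if "://" not in url:
        url = "http://" + url              # no scheme given: assume http
    scheme, netloc, path, query, fragment = urlsplit(url)
    if not path:
        path = "/"                         # an empty path is equivalent to "/"
    return urlunsplit((scheme, netloc.lower(), path, query, ""))   # drop the fragment too

print(fix_up("www.facebook.com"))          # -> http://www.facebook.com/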
You could resolve user 2 by making an HTTP HEAD request to the URL. If it comes back with a status code of 301, you've got a permanent redirect to the real URL in the Location response header. Facebook does this to send facebook.com traffic to www.facebook.com, and it's definitely something that sites should be doing (even though in the real world many aren't). You might also consider allowing other redirect status codes in the 3xx family to do the same; it's not really the right thing to do, but some sites use 302 instead of 301 for the redirect because they're a bit thick.
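A sketch of that HEAD-request step, assuming the third-party requests library is available; a real implementation would cap the number of redirects and handle network errors:
from urllib.parse import urljoin
import requests

def resolve_permanent_redirect(url):
    response = requests.head(url, allow_redirects=False, timeout=5)
    if response.status_code == 301:
        # Location may be relative, so resolve it against the original URL
        return urljoin(url, response.headers.get("Location", url))
    return url

# e.g. resolve_permanent_redirect("http://facebook.com/") would return
# "http://www.facebook.com/" if the site answers with a 301 as described above.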
If you have the time and network resources (plus more code to prevent the feature being abused to DoS you or others), you could also consider GETting the target web page and parsing it (assuming it turns out to be HTML). If there is a <link rel="canonical" href="..." /> element in the page, you should also treat that URL as being the proper one. (View Source: Stack Overflow does this.)
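And a sketch of the rel="canonical" check, again assuming requests plus the standard-library HTML parser; error handling is omitted:
from html.parser import HTMLParser
from urllib.parse import urljoin
import requests

class CanonicalFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")

def canonical_url(url):
    response = requests.get(url, timeout=5)
    if "html" not in response.headers.get("Content-Type", ""):
        return url                       # not an HTML page, nothing to parse
    finder = CanonicalFinder()
    finder.feed(response.text)
    return urljoin(url, finder.canonical) if finder.canonical else url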
However, unfortunately, user 1's case cannot be resolved. Facebook is serving a page at / and a page at /index.php, and though we can look at them and say they're the same, there is no technical method to describe that relationship. In an ideal world Facebook would include either a 301 redirect response or a <link rel="canonical" /> to tell people that / was the proper format URL to access a particular resource rather than /index.php (or vice versa). But they don't, and in fact most database-driven web sites don't do this yet either.
To get around this, some search engines(*) compare the content at different [sub]domains, and to a limited extent also different paths on the same host, and guess that they're the same if the content is sufficiently similar. Of course this is a lot of work, requires a lot of storage and processing, and is ultimately not terribly reliable.
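As a toy illustration of that "sufficiently similar" heuristic (real systems use shingling or fingerprinting rather than a straight diff):
from difflib import SequenceMatcher

def looks_like_same_page(html_a, html_b, threshold=0.9):
    # ratio() is 1.0 for identical strings and drops as they diverge
    return SequenceMatcher(None, html_a, html_b).ratio() >= threshold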
I wouldn't really bother with much of this, beyond fixing up URLs like in the user 3 case. From your description it doesn't seem that essential that pages that “are the same” have to share actual identity, unless there's a particular use-case you haven't mentioned.
(*: well, Google anyway; more traditional ones traditionally didn't and would happily serve up multiple links for the same page, but I'd assume the other majors are doing something similar now.)
There's no way to know, other than "magic" knowledge about the particular website, that "/index.php" is the same as fetching "/".
So, your problem, as stated, is impossible.
I'd save the 3 links separately, since you can never reliably tell that they resolve to the same page. It all depends on how the server (out of our control) resolves the URL.
Is there any way in IIS to map requests to a particular URL with no extension to a given application?
For example, in trying to port something from a Java servlet, you might have a URL like this...
http://[server]/MyApp/HomePage?some=parameter
Ideally I'd like to be able to map everything under MyApp to a particular application, but failing that, any suggestions about how to achieve the same effect would be really helpful.
You can set up IIS6 to handle all requests, but the key to handling files without extensions is to tell IIS not to look for the file.
http://weblogs.asp.net/scottgu/archive/2007/03/04/tip-trick-integrating-asp-net-security-with-classic-asp-and-non-asp-net-urls.aspx
You can also create an ISAPI filter that rewrites URLs. The user enters a URL with no extension, but the filter will interpret the request so that it does. Note that in IIS it's really easy to screw this up, so you might want to find a pre-written one. I haven't used any myself, so I can't recommend a specific product that's any different from what you'd find via Google, especially as I don't know your specific use case. But at least now you know what to search for.
You can also rewrite your URLs using ASP.NET:
http://msdn.microsoft.com/en-us/library/ms972974.aspx