Block user access to internals of a site using HTTP_REFERER - security

I have control over the HttpServer, but not over the ApplicationServer or the Java applications sitting there, yet I need to block direct access to certain pages on those applications. Specifically, I don't want users automating access to forms by issuing direct GET/POST HTTP requests to the appropriate servlet.
So, I decided to block users based on the value of HTTP_REFERER. After all, if the user is navigating inside the site, the request will have an appropriate HTTP_REFERER. Well, that was what I thought.
I implemented a rewrite rule in the .htaccess file that says:
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteRule (servlet1|servlet2)/.+\?.+ - [F]
I expected to forbid access to users that didn't navigate the site but issued direct GET requests to the "servlet1" or "servlet2" servlets using query strings. But my expectations ended abruptly because the regular expression (servlet1|servlet2)/.+\?.+ didn't work at all.
I was really disappointed when I changed that expression to (servlet1|servlet2)/.+ and it worked so well that my users were blocked whether they navigated the site or not.
So, my question is: how can I keep "robots" from directly accessing certain pages if I have no access/privileges/time to modify the application?

I'm not sure if I can solve this in one go, but we can go back and forth as necessary.
First, I want to repeat what I think you are saying and make sure I'm clear. You want to disallow requests to servlet1 and servlet2 if the request doesn't have the proper referer and it does have a query string? I'm not sure I understand (servlet1|servlet2)/.+\?.+ because it looks like you are requiring a file under servlet1 and servlet2. I think maybe you are combining PATH_INFO (before the "?") with a GET query string (after the "?"). It appears that the PATH_INFO part will work, but the GET query test will not.
I made a quick test on my server using script1.cgi and script2.cgi, and the following rules worked to accomplish what you are asking for. They are obviously edited a little to match my environment:
RewriteCond %{HTTP_REFERER} !^http://(www.)?example.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]
The above caught all wrong-referer requests to script1.cgi and script2.cgi that tried to submit data using a query string. However, you can also submit data using PATH_INFO and by POSTing data. I used this form to protect against any of the three methods being used with an incorrect referer:
RewriteCond %{HTTP_REFERER} !^http://(www.)?example.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]
Based on the example you were trying to get working, I think this is what you want:
RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule (servlet1|servlet2)\b - [F]
Hopefully this at least gets you closer to your goal. Please let us know how it works; I'm interested in your problem.
(BTW, I agree that referer blocking is poor security, but I also understand that reality forces imperfect and partial solutions sometimes, which you seem to already acknowledge.)

I don't have a solution, but I'm betting that relying on the referrer will never work because user-agents are free to not send it at all or spoof it to something that will let them in.

You can't tell users and malicious scripts apart by their HTTP requests. But you can analyze which users are requesting too many pages in too short a time, and block their IP addresses.
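If the Apache module mod_evasive happens to be available, exactly this kind of too-many-requests detection with automatic IP blocking can be configured at the web server. This is only a sketch: it normally belongs in the server or virtual-host configuration rather than .htaccess, and the thresholds below are illustrative guesses, not recommendations:
<IfModule mod_evasive20.c>
    # Block an IP for 300 seconds if it requests the same page more than 5 times
    # in 1 second, or more than 60 pages site-wide in 1 second.
    DOSHashTableSize   3097
    DOSPageCount       5
    DOSPageInterval    1
    DOSSiteCount       60
    DOSSiteInterval    1
    DOSBlockingPeriod  300
</IfModule>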

Using a referrer is very unreliable as a method of verification. As other people have mentioned, it is easily spoofed. Your best solution is to modify the application (if you can).
You could use a CAPTCHA, or set some sort of cookie or session cookie that keeps track of which page the user last visited (a session would be harder to spoof), keep a page-view history, and only allow users who have browsed the pages required to reach the page you want to block.
This obviously requires you to have access to the application in question; however, it is the most foolproof way (not completely, but "good enough", in my opinion).
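That said, a very rough approximation of the cookie idea can be sketched in .htaccess alone with mod_rewrite's CO flag, without touching the application. The page name, cookie name, and domain below are placeholders, and like the referer check this only raises the bar against casual automation; anyone who looks at their cookies can spoof it:
RewriteEngine On
# Set a session marker cookie when the entry form page is served (page name is a placeholder).
RewriteRule ^formpage\.jsp$ - [CO=navigated:1:.mywebaddress.cl,L]
# Refuse servlet requests that arrive without the marker cookie.
RewriteCond %{HTTP_COOKIE} !(^|;\s?)navigated=1
RewriteRule (servlet1|servlet2) - [F]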

JavaScript is another helpful tool to prevent (or at least delay) screen scraping. Most automated scraping tools don't have a JavaScript interpreter, so you can do things like setting hidden fields, etc.
Edit: Something along the lines of this Phil Haack article.

I'm guessing you're trying to prevent screen scraping?
In my honest opinion it's a tough one to solve, and trying to fix it by checking the value of HTTP_REFERER is just a sticking plaster. Anyone going to the bother of automating submissions is going to be savvy enough to send the correct referer from their 'automaton'.
You could try rate limiting, but without actually modifying the app to force some kind of is-this-a-human validation (a CAPTCHA) at some point, you're going to find this hard to prevent.

If you're trying to prevent search engine bots from accessing certain pages, make sure you're using a properly formatted robots.txt file.
Using HTTP_REFERER is unreliable because it is easily faked.
Another option is to check the user agent string for known bots (this may require code modification).
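For what it's worth, a user-agent check can also be expressed in .htaccess rather than in application code, along the same lines as the referer rule. The user-agent substrings below are only placeholders; check the access log to see what the automation actually sends, and keep in mind that User-Agent is just as easy to spoof as the referer:
RewriteEngine On
# Placeholder user-agent substrings; replace with whatever the automation really sends.
RewriteCond %{HTTP_USER_AGENT} (curl|wget|libwww|python-requests) [NC]
RewriteRule (servlet1|servlet2) - [F]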

To make things a little more clear:
Yes, I know that using HTTP_REFERER is completely unreliable and somewhat childish, but I'm pretty sure that the people who learned (from me, maybe?) to make automations with Excel VBA will not know how to subvert an HTTP_REFERER within the time it takes to get the final solution in place.
I don't have access/privilege to modify the application code. Politics. Do you believe that? So, I must wait until the rights holder makes the changes I requested.
From previous experience, I know that the requested changes will take two months to get into production. No, tossing Agile methodology books at their heads didn't improve anything.
This is an intranet app, so I don't have a lot of youngsters trying to undermine my prestige. But I'm young enough to try to undermine the prestige of "a very fancy global consultancy service that comes from India", where, curiously, there is not a single Indian working there.
So far, the best answer comes from "Michel de Mare": block users based on their IPs. Well, that I did yesterday. Today I wanted to make something more generic because I have a lot of kangaroo users (jumping from one IP address to another) because they use VPN or DHCP.
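For completeness, this is roughly what blocking by address or range looks like in .htaccess; the range below is a placeholder, and of course blocking a whole VPN/DHCP pool also hits the legitimate users inside it, which is exactly the limitation described above:
# Apache 2.4 syntax; the CIDR range is a placeholder for the offending pool.
<RequireAll>
    Require all granted
    Require not ip 203.0.113.0/24
</RequireAll>
# Apache 2.2 equivalent:
# Order Allow,Deny
# Allow from all
# Deny from 203.0.113.0/24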

You might be able to use an anti-CSRF token to achieve what you're after.
This article explains it in more detail: Cross-Site Request Forgeries

Related

htaccess block and allow sites from the same domain name

I run a service where I offer CSS files, scripts, and images for a third-party website, www.myfantasyleague.com, which is a hosting service for fantasy football, and they have gone through some changes over the last couple of years.
I am trying to block certain websites on their servers that are using my work fraudulently, while allowing the folks who purchase my work on the same domain to use it without being blocked by the .htaccess file. Once you create a football site, MFL gives it a permanent server number and a 5-digit code that never changes; it stays the same from year to year. Here is a link to an MFL search for the word "football": you can see there are many sites, and if you click on a few they all have different 5-digit IDs, and some have different server IDs.
The site I want to start by blocking is the URL below, and the MFL domain now has the option of both http and https, so catching both protocols would be ideal.
SITE TO BLOCK EXAMPLE
https://www67.myfantasyleague.com/2019/home/63928#0
SITE TO ALLOW EXAMPLE
http://www51.myfantasyleague.com/2019/home/46087#0
On myfantasyleague.com they give each site its own unique 5-digit code at the end of the URL, and many are on different server IDs, like the www67 and the www51; of the two links above, one is https and one is http.
In the past I used the code below, and it still works today; however, once I add it to my root .htaccess file, it takes out both sites, and I can't have that, as I want to be able to control which sites are blocked by the server number and the 5-digit league ID if possible.
CODE THAT I TRIED THAT WORKED BUT KILLS ALL SITES FROM THAT DOMAIN NAME.
RewriteEngine On
RewriteCond %{HTTP_REFERER} https?://(www\.)?www(67).myfantasyleague.com.+(63928) [NC,OR]
RewriteRule .*\.(jpe?g|gif|bmp|png|js|css)$ [L]
Maybe I can turn the URL to be blocked into the actual IP and try blocking the IP?
I don't know what else to try, and it might not even be possible. I appreciate any and all feedback.
Thank you
Though the pattern you posted can certainly be improved, there is no reason why it should "block" all referrers from that host, if those sites send a referrer header at all. Keep in mind that such a header is optional and can easily be modified, so anyone can work around limitations you implement based on it.
Blocking an IP, on the other hand, means you block all services from that host, which is not what you want, as I understand it. The numerical addition to the "www" prefix indicates that the service operator uses sharding to balance request load, an old and outdated approach. You can expect that to change at any time, either for individual sites or in general, so better not to rely on it. You are only interested in the numerical ID at the end of the referring URL.
The issue with the approach you posted, however, is the actual rewrite rule: it is syntactically invalid, so I would expect it to raise an internal error and thus block all requests. I would suggest something like this instead:
RewriteEngine On
# Whitelist referrers whose path ends in an allowed league ID (46087 is the "site to allow" example);
# add one more RewriteCond line, without [OR], for each additional ID you want to allow.
RewriteCond %{HTTP_REFERER} !/46087$
RewriteRule ^ - [F]
This actively whitelists specific sites by their numerical ID and blocks all other requests by sending out a "Forbidden" response.
Please note that I have not actually tested the above code; it might contain some minor glitch which you might have to fix. For such things it is important to have access to the HTTP server's error log file. Not sure if you have that in your situation...
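One refinement worth considering (my assumption, not part of the original suggestion): the whitelist above also blocks requests that carry no Referer header at all, for example someone opening a stylesheet URL directly or browsers configured to strip referrers. Adding a condition that exempts empty referrers keeps those working:
RewriteEngine On
# Let requests without any Referer header through; only non-empty,
# non-whitelisted referrers are forbidden.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !/46087$
RewriteRule ^ - [F]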

Redirect HTTP to HTTPS (Standard Domain) - with New Google Chrome Rules

I have used the following redirect within .htaccess to force HTTPS for a long time with no problem.
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
However, it seems that Google Chrome's latest update gives me a security warning.
Your connection is not private
Attackers might be trying to steal your information from ofertaclarocloud.com.co (for example, passwords, messages or credit cards). Learn more
NET::ERR_CERT_COMMON_NAME_INVALID
Automatically send some system information and page content to Google to help detect dangerous apps and sites. Privacy Policy
Does anyone know any way around this?
(Note: I am using a form on the page. It does not, however, have any sensitive information such as passwords, credit card information, etc. Just a simple form with name, phone, and email.)
(Note 2: I have also tried a few other solutions from around StackOverflow, such as this one.)
Thanks!

Hiding some GET parameters from URL

I am redirecting the page using a PHP header Location:
Current URL in browser
https://mywebsite/open/firstpage/php/start.php?&cnt=us&language=en&url=http://secureURL.com
but want to show
https://mywebsite/open/firstpage/php/start.php?&cnt=us&language=en
I am using the GET method on the other side to collect all the variables. I have to hide &url in the query string but want to receive it on the other side as $_GET['url'].
How can I pass my &url without showing it in the URL query string? How can I write the .htaccess?
Redirect for all URLs
RewriteEngine on
# Skip if url= is already present, to avoid an internal rewrite loop.
RewriteCond %{QUERY_STRING} !(^|&)url=
RewriteRule ^(.*) $1?%{QUERY_STRING}&url=http://secureURL.com [L]
Redirect for only /open/firstpage/php/start.php
RewriteEngine on
RewriteCond %{REQUEST_URI} ^/open/firstpage/php/start.php
# Skip if url= is already present, to avoid an internal rewrite loop.
RewriteCond %{QUERY_STRING} !(^|&)url=
RewriteRule ^(.*) $1?%{QUERY_STRING}&url=http://secureURL.com [L]
I think this is what you want.
You can't do that. If a parameter is not present in the query string, it won't be available anywhere, it's just not there. There's no such thing as "hiding" the query string.
You could, however, use some form of session mechanism to pass a piece of data from one page to another. You could put it in the $_SESSION, or use cookies. There may also be a way to achieve this through really arcane mod_rewrite magic, but you shouldn't go down that route. Really.
More importantly: what are you trying to achieve? Why are you trying to do this?
Aesthetic reasons? Then be aware that modern browsers tend to hide the query string part of the URI from the user.
Security reasons? Then you're doing it horribly wrong, you shouldn't use something so easily manipulated by the client.
User tracking? There are established solutions out there for that (say, Google Analytics).

Best way to restrict a website to a single browser (user agent)?

So I'm in the process of building my own web-application type project. However, I only want the website to be viewable through a web client of mine. I have set the web client's user agent setting to a custom name (MySecretClient) and am now attempting to only allow access from browsers with the user agent, MySecretClient. Everyone else gets redirected.
Is there a better way to go about doing this?
As with so many web technology questions, there is a strict, theoretical answer and a "good enough for what you probably want" answer. The strict answer is: you can't; it doesn't work that way. Since the client can send whatever user agent string it wants to, you have no way of knowing what client is actually behind any given request.
The "good enough" answer that will prevent the vast majority of users from seeing your site with the "wrong" user agent is documented here:
http://www.htaccesstools.com/articles/detect-and-redirect-iphone/
The relevant .htaccess block from the link, which redirects requests from iPhone user agents to an iPhone specific site is:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} iPhone
RewriteCond %{REQUEST_URI} !^/my-iPhone-site/
RewriteRule .* /my-iPhone-site/ [R]
You could modify this in your case to redirect users with the wrong client:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} !^MySecretClient$
RewriteRule .* <URL of a tropical island paradise> [R]
There is one other answer to what might be your intention in doing this. If this is part of your application's security strategy, it is a bad idea! This is what's known as "security through obscurity" and is a well-established anti-pattern that should be avoided. Any but the most casual attacker of your software will quickly realize what's going on, figure out what client your application is meant to run on, and spoof it.
<?php
// Redirect any request whose User-Agent header is missing or does not match the expected client.
define('MY_USER_AGENT', 'Custom User Agent');
define('REDIRECT_LOCATION', 'http://www.google.com');

if (!isset($_SERVER['HTTP_USER_AGENT']) || $_SERVER['HTTP_USER_AGENT'] !== MY_USER_AGENT) {
    header('Location: ' . REDIRECT_LOCATION);
    die();
}

Force HTTPS for specific URL

This should be a quick one... here is my current .htaccess file:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
What I need to do is make sure that if http://www.mydomain.com/cart/ is reached, it needs to force HTTPS ... so /cart/ and anything within /cart/
Once the request has been sent to http://www.mydomain.com/cart/, if there is any sensitive data in the request, it's too late. Force it to break! At least, it will give you an indication that there's something wrong with your links. More details in previous answers:
https://stackoverflow.com/a/8765067/372643
https://stackoverflow.com/a/8964190/372643
[ ... ] by the time the request reaches the server, it's too late. If there is a MITM, he has done his attack (or part of it) before you got the request.
The best you can do by then is to reply without any useful content. In this case, a redirection (using 301 or 302 and the Location header) could be appropriate. However, it may hide problems if the user (or even you as a developer) ignores the warnings (in this case, the browser will follow the redirection and retry the request almost transparently).
Therefore, I would simply suggest returning a 404 status:
http://yoursite/ and https://yoursite/ are effectively two distinct sites. There is no reason to expect a 1:1 mapping of all resources from the URI spaces from one to the other (just in the same way as you could have a completely different hierarchy for ftp://yoursite/).
More importantly, this is a problem that should be treated upstream: the link that led your user to this resource using http:// should be considered as broken. Don't make it work automatically.
Having a 404 status for a resource that shouldn't be there is fine. In addition, returning an error message when there is an error is good: it will force you (or at least remind you) as a developer that you need to fix the page/form/link that led to this problem.
EDIT: (Example)
Let's say you have http://example.com/, the non-secure section of your site that allows the user to browse items. They're not logged in at that stage, so it's fine to do it over plain HTTP.
Now, it's cart/payment time. You want HTTPS. You send the user to https://example.com/cart/. If one of the links that sends the user to the cart part is using plain HTTP (i.e. http://example.com/cart/), it's a development mistake. It just shouldn't be there. Making the process break when you thought you were going to be sent to https://example.com/cart/ allows the developer to see it (and, once fixed, the user should never have the problem).
If it's just a link pointing to the HTTPS section of your site (typically, an HTTP GET via a link somewhere), it's not necessarily that big a risk.
Where automatic redirects become even more dangerous is when they hide bigger problems.
For example, you're on https://example.com/cart/creditcarddetails and you've filled in some information that should really just stay over SSL. However, the developer has made a mistake and a plain http:// link is used in the form. In addition, the developer (a user/human after all) has clicked "don't show me this message again" in Firefox when it says "Warning: you're going from a secure page to a non-secure page" (by the way, unfortunately, Firefox warns a posteriori: it has already made the insecure request by the time it shows the user that message). Now, that GET/POST request with sensitive data is sent first to that incorrect plain http:// link, and the automatic rewrite tells the browser to try the request again over https://. It looks fine because, as far as the user is concerned, this all happened in a fraction of a second. However, it's not: sensitive data was sent in the clear.
Making the plain HTTP section of what should only be over HTTPS not do anything useful actually helps you see what's wrong more clearly. Since the users should never end up there anyway if the links are correctly implemented, this isn't really an issue for them.
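To tie this back to .htaccess, a minimal sketch of that "make it break" advice for this question's /cart/ section (assuming mod_rewrite, and using the 404 status suggested above) could be:
RewriteEngine On
# Answer plain-HTTP requests for the cart area with a 404 instead of silently redirecting.
RewriteCond %{HTTPS} off
RewriteRule ^cart/ - [R=404,L]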
Try adding this before the other rules (but after RewriteBase):
RewriteCond %{HTTPS} off
RewriteRule ^cart/(.*)$ https://www.mydomain.com/cart/$1 [R,L]
