htaccess block and allow sites from the same domain name - .htaccess

I run a service where I offer CSS files, scripts, and images for a third-party website, www.myfantasyleague.com, a fantasy football hosting service that has gone through some changes over the last couple of years.
I am trying to block certain websites on their servers that are using my work fraudulently, while allowing the folks who purchase my work on the same domain to keep using it without being blocked by the .htaccess file. Once you create a football site, MFL gives it a permanent server number and a 5-digit league ID that now stays the same from year to year. If you run an MFL search for the word "football", you can see there are many sites, and if you click on a few they all have different 5-digit IDs and some have different server IDs.
The site I want to block first is the URL below. The MFL domain now supports both http and https, so covering both protocols would be ideal.
SITE TO BLOCK EXAMPLE
https://www67.myfantasyleague.com/2019/home/63928#0
SITE TO ALLOW EXAMPLE
http://www51.myfantasyleague.com/2019/home/46087#0
On myfantasyleague.com, each site gets its own unique 5-digit code at the end of the URL, and many sites sit on different server IDs, like the www67 and the www51 above; of those 2 links, one is https and one is http.
In the past I used the code below, and it still works today; however, once I add it to my root .htaccess file it takes out both sites, and I can't have that. I want to be able to control which sites are blocked, by the server number and the 5-digit league ID if possible.
CODE THAT I TRIED THAT WORKED BUT KILLS ALL SITES FROM THAT DOMAIN NAME.
RewriteEngine On
RewriteCond %{HTTP_REFERER} https?://(www\.)?www(67).myfantasyleague.com.+(63928) [NC,OR]
RewriteRule .*\.(jpe?g|gif|bmp|png|js|css)$ [L]
Maybe I can turn the URL to be blocked into its actual IP and try blocking the IP instead?
I don't know what else to try, and it might not even be possible. I appreciate any and all feedback.
Thank you

Though the pattern you posted can certainly be improved, there is no reason why it should "block" all referrers from that host, assuming those sites send a referrer header at all. Keep in mind that this header is optional and can be modified easily, so anyone can work around limitations you implement based on it.
Blocking an IP, on the other hand, means you block all services from that host, which is not what you want, as I understand it. The numerical addition to the "www" prefix indicates that the service operator uses sharding to balance request load, an old and outdated approach. You can expect that to change at any time, either for individual sites or in general, so better not to rely on it. You are only interested in the numerical ID at the end of the referring URL.
Your issue with the approach you posted, however, is the actual rewriting rule: it is syntactically invalid, since the RewriteRule is missing its substitution argument. So I would expect it to raise an internal server error, thus blocking all requests. I would suggest something like this instead:
RewriteEngine On
# All conditions are ANDed: block unless the referrer ends in one of the whitelisted league IDs
RewriteCond %{HTTP_REFERER} !/63928$
RewriteCond %{HTTP_REFERER} !/63927$
RewriteCond %{HTTP_REFERER} !/63926$
RewriteRule ^ - [F]
This would actively whitelist specific sites by mentioning their numerical IDs and block all other requests with a 403 Forbidden response.
Please note that I have not actually tested the above code; it might contain some minor glitch which you might have to fix. For such things it is important to have access to the HTTP server's error log file. Not sure if you have it in your situation...
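If you would rather go the blacklist direction you originally described, blocking only league 63928 while leaving every other MFL site alone, a minimal untested sketch could look like the following. It assumes the league ID appears right after a slash in the referring URL, as in your example, and that you only want to protect static assets:
RewriteEngine On
# Match any server shard (www, www51, www67, ...) over http or https,
# but only when the referring URL contains league ID 63928
RewriteCond %{HTTP_REFERER} ^https?://www\d*\.myfantasyleague\.com/.*/63928([^0-9]|$) [NC]
# Refuse to serve static assets to that league
RewriteRule \.(jpe?g|gif|bmp|png|js|css)$ - [NC,F]
To block additional leagues later, you would add one RewriteCond per league ID and join them with [OR].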

Related

Blocking direct access to an URL (not a file)

A Drupal site is pushing international traffic over quota on my (Plesk 10.4) server, and it looks as though much of that (~250,000 visits/month) is direct access to the URL /user/register. We are already using the Botcha module to filter out spambot registrations, but that approach results in two full pages being served to each bot.
I'm thinking that a .htaccess rule which returns a 403 response to that URL unless the referer is from the site might be the way to go, but my .htaccess-fu is not strong, and I can only find examples for blocking hot-linking of images.
What do I need to add and where?
Thanks,
Richard
You'd be checking against the HTTP referer. It's not a guaranteed way to block incoming traffic linked from a site other than yours, since the field can easily be forged. But you can try adding this to the .htaccess file (above any rules that are already there):
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?your-domain\.com/ [NC]
RewriteRule ^user/register - [L,F]

htaccess redirecting by ip range

I'm trying to block off a range of IP addresses by using the commands below in my .htaccess file.
So 1.2.3.0 - 1.2.3.154 needs to be blocked, for example, but it's not working. Why is this the case?
<IfModule mod_rewrite.c>
RewriteCond %{REMOTE_HOST} ^1\.2\.3\.0/154
RewriteCond %{REQUEST_URI} !^/australia-ip-restriction?
RewriteCond %{REQUEST_URI} !^/templates/?
RewriteCond %{REQUEST_URI} !^/files/?
RewriteCond %{REQUEST_URI} /(.*)$
RewriteRule /(.*)$ /australia-ip-restriction [R=301,L]
</IfModule>
Redirecting and blocking IP addresses with regex is risky. You can accidentally block all traffic to your site or create a redirect loop, so you'll want some sort of countermeasures in place. Check log files to make sure the results you're seeking are actually happening and watch for false positives. Visit the pages and make sure they work as intended. Test with a proxy for out-of-country IP access restrictions.
There are regional registries around the world where you can look up various IP addresses: RIPE.net, APNIC.net, and ARIN.net, to name a few. The IP addresses these registries are responsible for come in non-sequential blocks. There are lists by country where people have tried to collect these blocks in one place; Nirsoft maintains some of them. They are not complete by any means, and IP addresses change hands on a daily basis. If a block of IPs originally assigned to a company in Australia is picked up by a corporation like Virgin Mobile or Microsoft, you may end up accidentally bouncing traffic in the UK or the US.
If you have access to the core server configs or have the ability to install packages, you can set up iptables to keep traffic down on the server and have it block repeated attempts against areas you do not want traffic on. You can also have it block any traffic that exceeds a rate threshold (say, 2 pages per second).
Let's say you have a hidden folder listed in robots.txt with a disallow-all rule. If you see traffic to that folder, you can assume it's someone who looked in robots.txt and then decided to take a peek; treat it as a honeypot. Additionally, you can hide links with no keyword value inside your documents and make them invisible to humans. When someone follows such a hidden link, you can add them to the list of offenders as well, since they're not obeying your robots.txt file.
You can also set up a database in MySQL and use a language like PHP to drop connections (or redirect offsite) for addresses that access a honeypot, or for addresses that try to brute-force a login form or an FTP account. I implement databases on web forms to keep track of people who have already completed the form; if there are multiple spam attempts, they're added to my blacklist.
It's much easier to block users with a reverse lookup, checking their TLD for a country code, or to block users whose browser asks for pages in a specific language, than it is to block ranges of IP addresses.
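As a rough .htaccess sketch of the language-based idea (assuming mod_rewrite is available, and using Russian purely as a placeholder language code):
RewriteEngine On
# Deny visitors whose browser lists Russian as its first preferred language
RewriteCond %{HTTP:Accept-Language} ^ru [NC]
RewriteRule ^ - [F]
Like the referer checks discussed elsewhere on this page, the Accept-Language header is entirely under the client's control, so treat this as a convenience filter rather than real access control.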
Back to your example
In your code, all of the RewriteCond lines must be true (they are ANDed together) unless you use the [OR] flag. The Apache documentation has a more in-depth reference for access restrictions.
REMOTE_HOST is for domain names... so this line:
RewriteCond %{REMOTE_HOST} ^1\.2\.3\.0/154
Should be using REMOTE_ADDR instead:
# Keep the restriction page itself reachable so blocked visitors don't get a redirect loop
RewriteCond %{REQUEST_URI} !^/australia-ip-restriction
RewriteCond %{REMOTE_ADDR} ^1\.2\.3\.(1[0-4][0-9]|15[0-4]|[0-9][0-9]|[0-9])$
RewriteRule ^(.*)$ /australia-ip-restriction [R=301,L]
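If the server runs Apache 2.4 or later and your host allows AuthConfig overrides, an alternative sketch is to deny the range with mod_authz_core instead of mod_rewrite. This returns a plain 403 rather than redirecting, so no loop prevention is needed; the CIDR blocks below cover exactly 1.2.3.0 - 1.2.3.154:
<RequireAll>
    # Allow everyone except the listed ranges
    Require all granted
    Require not ip 1.2.3.0/25 1.2.3.128/28 1.2.3.144/29 1.2.3.152/31 1.2.3.154
</RequireAll>
Whether you prefer the 403 or the redirect to /australia-ip-restriction is a design choice; the mod_rewrite version above keeps the redirect behaviour from the question.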

Redirecting all traffic from HTTP_REFERER to something else

Someone is scraping a friend's blog, word for word, picture for picture.
I am trying some code out in htaccess so that when they do this in the future they will scrape some fake content.
Here is what I have at the moment.
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://www.theirdomain.com
RewriteRule ^(.*)$ http://www.mydomain.com/rickrolling.html [R=301,L]
I want the scraping domain to always see the rickrolling.html page when they scrape.
Had a look on here and the examples I found don't seem to work. Above seems like it should work for me.
Thanks for your help
When they're scraping, do they actually set their own domain name as the HTTP Referer header? I suspect not. You'd be better off checking against their IP address or something. Do you have access to the server logs? Can you identify the requests that are coming from them?
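If you can pin the scraper down to an IP address from the logs, a sketch along those lines might look like this (203.0.113.45 is just a placeholder address; substitute whatever shows up in your logs):
RewriteEngine On
# Serve the decoy page to every request coming from the scraper's address
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.45$
# Don't rewrite the decoy page itself, or the rule would loop
RewriteCond %{REQUEST_URI} !^/rickrolling\.html$
RewriteRule ^ /rickrolling.html [L]
Using an internal rewrite (no R flag) means the scraper never sees a redirect and simply receives the decoy content in place of whatever it asked for.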
But ultimately, you're here looking for a technical solution to a legal problem. You'd be better off doing a whois lookup on their domain name, and threatening to sue them and their host. If both you and their host are in the USA, there's a special legal tool called a DMCA Takedown Notice. I'm not in the USA myself, so I don't know much about it. But look it up.

URL/Subdomain rewrites (htaccess)

Say I have the following file:
http://www.example.com/images/folder/image.jpg
I want to serve it on
http://s1.example.com/folder/image.jpg
How can I do a htaccess rewrite to point it to it?
For example, I make a subdomain s1.example.com, and then on that subdomain I add a .htaccess rule so that any file requested there is pulled from http://www.example.com/images/.
Does serving files this way count as serving content from a cookieless domain?
First, let me talk a bit about the concept of cookieless domains. Normally, when requesting anything over HTTP, any relevant cookies are sent with the request. Cookies are tied to the domain they come from. The idea of using a cookieless domain is that you relocate static content that doesn't need cookies, like images, to a separate domain so that no cookies are sent with those requests. This cuts out a small amount of traffic.
How much you gain from doing this depends on the type of page. The more images you have, the more you gain. If your site loads a big bunch of small images, such as avatars or image thumbnails, you might have a lot to gain. On the other hand, if your site doesn't use any cookies, you have nothing to gain. It's entirely possible that your page won't load noticeably faster if it only uses a small number of images, which will be cached between page loads anyway.
One thing to keep in mind, too, is that cookies set for example.com will also be sent with requests to s1.example.com, since "s1." is a subdomain of example.com. You need to use www. (or any other subdomain of your choice) for the main site in order to separate the cookie spaces.
Secondly, if you decide that a cookieless domain is actually something worth trying, let's talk about the implementation.
Shikhar's solution is bad! While it appears to work on the surface, it actually defeats the purpose of using a cookieless domain. For every image, the s1. URL is tried first; that URL then redirects to the www. domain, which triggers a second HTTP request. This is a loss, no matter how you look at it. What you need is a rewrite, which changes the URL internally on the web server without the browser even realizing it.
For simplicity, I'm assuming that all domains point to the same directory, so that www.example.com/something = example.com/something = s1.example.com/something = blub.example.com/something. This makes things simpler if you really need to store the images physically in "www.example.com/images".
I'd recommend a .htaccess that looks a little something like this:
# Turn on rewrites
RewriteEngine On
# Rewrite all requests for images from s1, so they are fetched from the right place
RewriteCond %{HTTP_HOST} ^s1\.example\.com
# Prevent an endless loop from ever happening
RewriteCond %{REQUEST_URI} !^/images
RewriteRule (.+) /images/$1 [L]
# Redirect http://s1.example.com/ to the main page (in case a user tries it)
RewriteCond %{HTTP_HOST} ^s1\.example\.com
RewriteRule ^$ http://www.example.com/ [R=301,L]
# Redirect all requests with other subdomains, or without a subdomain to www.
# Eg, blub.example.com/something -> www.example.com/something
# example.com/something -> www.example.com/something
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} !^s1\.example\.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# Place any additional rewrites below.
Just for the general info of people who, like me, may be investigating the benefits of this: from what I'm reading, it isn't just about cutting the upstream overhead of cookies sent with HTTP requests. Apparently many browsers limit concurrent connections to a single domain/server to 6. So if you have a separate domain on a different server, you get to double that to 12, which seems to me like the main potential here for a serious speed boost.
Anyway, if I'm understanding this correctly, the other domain serving the static content needs to be located on a different server from the main domain. That actually makes sense. I'm an avid Firefox user and tweaker: when you check the about:config settings in Firefox, the max connections per server is set to 6 by default. A person can manually bump it up to a maximum of 8, but most Firefox users probably don't spend enough time getting familiar with how to modify the browser and leave it at the default of 6.
I'm not sure what the other browsers default to, and then there are older browser versions still in use to consider. Bottom line: it makes perfect sense that enabling the browser to double the total number of connections by using two servers would be a load-time improvement. Using a subdomain on the same server, a person isn't going to be able to take advantage of that.
If you mean to redirect the traffic from www.example.com to s1.example.com, use the following htaccess on www.example.com
RewriteCond %{HTTP_HOST} ^(s1\.example\.com)
RewriteRule (.*) http://www.example.com%{REQUEST_URI} [R=301,NC,L]
If this is not what you are looking for, elaborate the question further.
I think you may have it backwards (or very possibly I do). To clarify: if you're implementing a cookieless subdomain and have a base URL of www., at least in this case, cookies are set on www. For example, a major cookie setter is Google Analytics, so when adding their script to my site it looks like this:
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'analytics-acc-#'],
  ['_setDomainName', 'www.valpocreative.com'],
  ['_trackPageview']);
You can see here that I set my main domain to www. Correct me if I'm wrong, but in my case I would need to redirect www to the non-www subdomain and not the other way around. This is also the CNAME setup made in my cPanel (CNAME "cdn" pointing to www.domain.com).

Block user access to internals of a site using HTTP_REFERER

I have control over the HttpServer but not over the ApplicationServer or the Java Applications sitting there but I need to block direct access to certain pages on those applications. Precisely, I don't want users automating access to forms issuing direct GET/POST HTTP requests to the appropriate servlet.
So, I decided to block users based on the value of HTTP_REFERER. After all, if the user is navigating inside the site, it will have an appropriate HTTP_REFERER. Well, that was what I thought.
I implemented a rewrite rule in the .htaccess file that says:
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteRule (servlet1|servlet2)/.+\?.+ - [F]
I expected to forbid access to users that didn't navigate the site but issued direct GET requests to the "servlet1" or "servlet2" servlets using query strings. But my expectations ended abruptly because the regular expression (servlet1|servlet2)/.+\?.+ didn't work at all.
I was really disappointed when I changed that expression to (servlet1|servlet2)/.+ and it worked so well that my users were blocked no matter if they navigated the site or not.
So, my question is: how can I accomplish this goal of not allowing "robots" direct access to certain pages, given that I have no access/privileges/time to modify the application?
I'm not sure if I can solve this in one go, but we can go back and forth as necessary.
First, I want to repeat what I think you are saying and make sure I'm clear. You want to disallow requests to servlet1 and servlet2 if the request doesn't have the proper referer and does have a query string? I'm not sure I understand (servlet1|servlet2)/.+\?.+ because it looks like you are requiring a file under servlet1 and servlet2. I think maybe you are combining PATH_INFO (before the "?") with a GET query string (after the "?"). The PATH_INFO part will work, but the GET query test will not, because mod_rewrite does not include the query string in the URL that a RewriteRule pattern matches against. I made a quick test on my server using script1.cgi and script2.cgi, and the following rules worked to accomplish what you are asking for. They are obviously edited a little to match my environment:
RewriteCond %{HTTP_REFERER} !^http://(www.)?example.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]
The above caught all wrong-referer requests to script1.cgi and script2.cgi that tried to submit data using a query string. However, you can also submit data using PATH_INFO and by POSTing it. I used this form to protect against any of the three methods being used with an incorrect referer:
RewriteCond %{HTTP_REFERER} !^http://(www.)?example.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]
Based on the example you were trying to get working, I think this is what you want:
RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule (servlet1|servlet2)\b - [F]
Hopefully this at least gets you closer to your goal. Please let us know how it works, I'm interested in your problem.
(BTW, I agree that referer blocking is poor security, but I also understand that reality sometimes forces imperfect and partial solutions, which you seem to already acknowledge.)
I don't have a solution, but I'm betting that relying on the referrer will never work because user-agents are free to not send it at all or spoof it to something that will let them in.
You can't tell users and malicious scripts apart by their HTTP requests. But you can analyze which users are requesting too many pages in too short a time, and block their IP addresses.
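If you can touch the main server configuration (this will not work from .htaccess), one way to automate that kind of rate-based blocking is mod_evasive; a rough sketch with illustrative values, and a module file name that may differ between distributions:
<IfModule mod_evasive20.c>
    # Block an IP that requests the same page more than 5 times within 1 second
    DOSPageCount      5
    DOSPageInterval   1
    # ...or more than 50 pages site-wide within 1 second
    DOSSiteCount      50
    DOSSiteInterval   1
    # Keep the offending IP blocked for 10 minutes
    DOSBlockingPeriod 600
</IfModule>
Blocked clients get a 403 for the duration of the blocking period, which keeps the load of repeated requests off the application.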
Using a referrer is very unreliable as a method of verification. As other people have mentioned, it is easily spoofed. Your best solution is to modify the application (if you can).
You could use a CAPTCHA, or set some sort of cookie or session cookie that keeps track of which pages the user has visited (a session would be harder to spoof), keep track of page view history, and only allow users who have browsed the pages required to get to the page you want to block.
This obviously requires you to have access to the application in question; however, it is the most foolproof way (not completely, but "good enough", in my opinion).
JavaScript is another helpful tool to prevent (or at least delay) screen scraping. Most automated scraping tools don't have a JavaScript interpreter, so you can do things like setting hidden fields, etc.
Edit: Something along the lines of this Phil Haack article.
I'm guessing you're trying to prevent screen scraping?
In my honest opinion it's a tough one to solve and trying to fix by checking the value of HTTP_REFERER is just a sticking plaster. Anyone going to the bother of automating submissions is going to be savvy enough to send the correct referer from their 'automaton'.
You could try rate limiting but without actually modifying the app to force some kind of is-this-a-human validation (a CAPTCHA) at some point then you're going to find this hard to prevent.
If you're trying to prevent search engine bots from accessing certain pages, make sure you're using a properly formatted robots.txt file.
Using HTTP_REFERER is unreliable because it is easily faked.
Another option is to check the user agent string for known bots (this may require code modification).
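If modifying the code isn't an option, a rough user-agent check can also be done at the web server level; a sketch (the agent list here is just an example, and anything on it is trivially spoofed):
RewriteEngine On
# Refuse the servlets to a few common automation user agents
RewriteCond %{HTTP_USER_AGENT} (curl|wget|libwww|python) [NC]
RewriteRule (servlet1|servlet2) - [F]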
To make things a little more clear:
Yes, I know that using HTTP_REFERER is completely unreliable and somewhat childish, but I'm pretty sure that the people who learned (from me, maybe?) to make automations with Excel VBA will not know how to subvert HTTP_REFERER within the time it takes to deliver the final solution.
I don't have access/privileges to modify the application code. Politics. Do you believe that? So I must wait until the rights holders make the changes I requested.
From previous experience, I know that the requested changes will take two months to get into production. No, tossing Agile methodology books at their heads didn't improve anything.
This is an intranet app, so I don't have a lot of youngsters trying to undermine my prestige. But I'm young enough to try to undermine the prestige of "a very fancy global consultancy services firm that comes from India" where, curiously, there is not a single Indian working.
So far, the best answer comes from "Michel de Mare": block users based on their IPs. Well, I did that yesterday. Today I wanted to make something more generic, because I have a lot of kangaroo users (jumping from one IP address to another) since they use VPN or DHCP.
You might be able to use an anti-CSRF token to achieve what you're after.
This article explains it in more detail: Cross-Site Request Forgeries
