URL/Subdomain rewrites (htaccess)

Say I have the following file:
http://www.example.com/images/folder/image.jpg
I want to serve it on
http://s1.example.com/folder/image.jpg
How can I set up an .htaccess rewrite to do that?
For example, I create a subdomain s1.example.com and then, on that subdomain, add an .htaccess rule so that any requested file is pulled from http://www.example.com/images/.
Does serving files this way count as serving content from a cookieless domain?

First let me talk a bit about the concept of cookieless domains. Normally, when requesting anything over HTTP, any relevant cookies are sent with the request. Cookies are tied to the domain they were set for. The idea of using a cookieless domain is that you relocate static content that doesn't need cookies, like images, to a separate domain so that no cookies are sent with those requests. This cuts out a small amount of traffic.
How much you gain from doing this depends on the type of page. The more images you have, the more you gain. If your site loads a large number of small images, such as avatars or thumbnails, you might have a lot to gain. Conversely, if your site doesn't use any cookies, you have nothing to gain. It's entirely possible that your page won't load noticeably faster if it only uses a small number of images, which will be cached between page loads anyway.
One thing to keep in mind, too, is that cookies set for example.com will also be sent with requests to s1.example.com, since s1.example.com is a subdomain of example.com. You need to use www. (or any other subdomain of your choice) for the cookied site in order to keep the cookie spaces separate.
Secondly, if you decide that a cookieless domain is actually something worth trying, let's talk about the implementation.
Shikhar's solution is bad! While it appears to work on the surface, it actually defeats the purpose of using a cookieless domain. For every image, the s1. URL is tried first. The s1. URL then redirects to the www. domain, which triggers a second HTTP request. This is a loss no matter how you look at it. What you need is a rewrite, which changes the URL internally on the web server, without the browser even noticing.
For simplicity, I'm assuming that all domains point to the same directory, so that www.example.com/something = example.com/something = s1.example.com/something = blub.example.com/something. This makes things simpler if you really need to store the images physically in "www.example.com/images".
I'd recommend a .htaccess that looks a little something like this:
# Turn on rewrites
RewriteEngine On
# Rewrite all requests for images from s1, so they are fetched from the right place
RewriteCond %{HTTP_HOST} ^s1\.example\.com
# Prevent an endless loop from ever happening
RewriteCond %{REQUEST_URI} !^/images
RewriteRule (.+) /images/$1 [L]
# Redirect http://s1.example.com/ to the main page (in case a user tries it)
RewriteCond %{HTTP_HOST} ^s1\.example\.com
RewriteRule ^$ http://www.example.com/ [R=301,L]
# Redirect all requests with other subdomains, or without a subdomain to www.
# Eg, blub.example.com/something -> www.example.com/something
# example.com/something -> www.example.com/something
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} !^s1\.example\.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# Place any additional rewrites below.

Just for the general info of people who, like me, may be investigating the benefits of this: from what I'm reading, the gain isn't just the upstream overhead saved by eliminating cookies sent with HTTP requests. Apparently many browsers limit concurrent connections to a single domain/server to 6. So if you serve static content from a separate domain on a different server, you get to double that to 12, which to me seems like the main potential here for a serious speed boost.
If I'm understanding this correctly, though, the other domain serving the static content needs to be located on a different server from the main domain. That actually makes sense to me as an avid Firefox user and tweaker: when you check the about:config settings in Firefox, the max connections per server is set to 6 by default. A person can manually bump it up to a maximum of 8, but most Firefox users probably don't spend enough time getting familiar with the browser's internals and leave it at the default of 6.
I'm not sure what the other browsers set by default, and there are older browser versions still in use to consider. Bottom line: it makes perfect sense that allowing the browser to double the total number of connections by using two servers would improve load time. With a subdomain on the same server, a person isn't going to be able to take advantage of that.

If you mean to redirect traffic arriving on s1.example.com over to www.example.com, use the following .htaccess on www.example.com:
RewriteCond %{HTTP_HOST} ^(s1\.example\.com)
RewriteRule (.*) http://www.example.com%{REQUEST_URI} [R=301,NC,L]
If this is not what you are looking for, elaborate the question further.

I think you may have it backwards (or very possibly I do). To clarify: if you're implementing a cookieless subdomain and have a base URL of www., at least in this case cookies are set on www. For example, a major cookie setter is Google Analytics, so when adding their script to my site it looks like this:
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'analytics-acc-#'],
    ['_setDomainName', 'www.valpocreative.com'],
    ['_trackPageview']);
You can see here that I set my main domain to www. Correct me if I'm wrong, but in my case I would need to redirect www to the non-www subdomain and not the other way around. This is also the CNAME setup made in my cPanel (CNAME "cdn" pointing to www.domain.com).

Related

htaccess - Transform get parameter value in a subdomain

In my web application, I currently have URLs like this:
https://example.com/mypage?company=companyname&otherparameter=othervalue&...
I would like to transform the above URL this way:
https://companyname.example.com/mypage?otherparameter=othervalue&...
so basically transforming the value of the GET parameter "company" into a subdomain while preserving the other GET parameters in the URL (and preserving, obviously, also the path of the file on the server).
I also need to exclude the "/api" directory from this rule (so all files under the "/api" subdirectory should be served as usual).
I know I need to use .htaccess but I can't find a way to get it to work. If someone's got a hint, that would be very helpful.
Thanks!
This will capture the "subdomain name" from any incoming request and add it as a query parameter to the internally rewritten target:
RewriteEngine on
RewriteCond %{HTTP_HOST} !^(?:www\.)?example\.com$
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com$
RewriteCond %{REQUEST_URI} !^/api/?
RewriteRule ^ %{REQUEST_URI}?company=%1 [QSA,L]
This takes care of handling incoming requests. It does not somehow magically change the references you hand out, such as links embedded in HTML markup or JavaScript.
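Conversely, if you also want old-style links that are still out there to land on the new host names, a hedged companion sketch is an external redirect in the opposite direction. This is only an illustration and assumes the "company" parameter comes first in the query string, as in the example URL from the question:
RewriteEngine on
# Only touch requests arriving on the main host names
RewriteCond %{HTTP_HOST} ^(?:www\.)?example\.com$ [NC]
# Leave the /api directory alone, as requested
RewriteCond %{REQUEST_URI} !^/api
# Capture the company value (%1) and the remaining parameters (%2);
# assumption: "company" is the first query parameter
RewriteCond %{QUERY_STRING} ^company=([^&]+)(?:&(.*))?$
# Redirect to the subdomain form, keeping the path and the other parameters
RewriteRule ^(.*)$ https://%1.example.com/$1?%2 [R=301,L]
Because the host conditions of the two blocks are mutually exclusive, they can coexist in the same .htaccess file without looping.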
You need to make sure that your HTTP server actually responds to requests for those "subdomain"-based host names; a default virtual host is usually used for that. You also need to take care that DNS resolution of such names works and points towards your HTTP server. And finally, you have to provide a valid SSL certificate for all those host names. A wildcard certificate is an option here, but unlike normal certificates it does not come free of charge.
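On the Apache side, that usually boils down to a catch-all virtual host plus a wildcard certificate. A minimal sketch, assuming wildcard DNS for *.example.com; the file paths and certificate names are placeholders, not anything from the question:
<VirtualHost *:443>
    # Catch-all host for the per-company subdomains
    ServerName example.com
    ServerAlias *.example.com
    DocumentRoot /var/www/example
    SSLEngine on
    # Wildcard certificate covering *.example.com (paths are placeholders)
    SSLCertificateFile /etc/ssl/certs/wildcard.example.com.crt
    SSLCertificateKeyFile /etc/ssl/private/wildcard.example.com.key
</VirtualHost>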
It is a good idea to implement such general rules in the actual host configuration of your http server. You can use a distributed configuration file for this (".htaccess"), but that comes with a few disadvantages.

htaccess block and allow sites from the same domain name

I run a service where I offer CSS files, scripts, and images for a third-party website, www.myfantasyleague.com, which is a hosting service for fantasy football and has gone through some changes over the last couple of years.
I am trying to block certain websites on their servers that are using my work fraudulently, while allowing the folks who purchase my work on the same domain to keep using it without being blocked by the .htaccess file. Once you create a football site, MFL gives it a permanent server number and a 5-digit code that never changes; it stays the same from year to year. Here is a link to an MFL search for the word football, and you can see there are many sites; if you click on a few, they all have different 5-digit IDs and some have different server IDs.
The site I want to start with blocking is the URL below, and the MFL domain now supports both http and https, so covering both protocols would be ideal.
SITE TO BLOCK EXAMPLE
https://www67.myfantasyleague.com/2019/home/63928#0
SITE TO ALLOW EXAMPLE
http://www51.myfantasyleague.com/2019/home/46087#0
On myfantasyleague domains each site gets its own unique 5-digit code at the end of the URL, and many are on different server IDs, like www67 and www51; of those two links above, one is https and one is http.
In the past I used to use the code below, and it still works today; however, once I add it to my root .htaccess file it takes out both sites, and I can't have that, as I want to be able to control which sites are blocked by the server number and the 5-digit league ID if possible.
CODE THAT I TRIED THAT WORKED BUT KILLS ALL SITES FROM THAT DOMAIN NAME.
RewriteEngine On
RewriteCond %{HTTP_REFERER} https?://(www\.)?www(67).myfantasyleague.com.+(63928) [NC,OR]
RewriteRule .*\.(jpe?g|gif|bmp|png|js|css)$ [L]
Maybe I can turn the URL to be blocked into the actual IP and try blocking the IP?
I don't know what else to try, and it might not even be possible. I appreciate any and all feedback.
Thank you
Though the pattern you posted certainly can be improved, there is no reason why it should "block" all referrers from that host, assuming those sites send a referrer header at all. Keep in mind that this header is optional and can easily be modified, so anyone can work around limitations you implement based on it.
Blocking an IP, on the other hand, means you block all services from that host, which is not what you want, as I understand it. The numerical addition to the "www" prefix indicates that the service operator uses sharding to balance request load, an old and outdated approach. You can expect that to change at any time, either for individual sites or in general, so better not to rely on it. You are only interested in the numerical ID at the end of the referring URL.
Your issue with the approach you posted, however, is the actual rewriting rule: it is syntactically invalid, so I would expect it to raise an internal error, thus blocking all requests. I would suggest something like this instead:
RewriteEngine On
RewriteCond %{HTTP_REFERER} !/63928$
RewriteCond %{HTTP_REFERER} !/63927$
RewriteCond %{HTTP_REFERER} !/63926$
RewriteRule ^ - [F]
This actively whitelists specific sites by their numerical ID and blocks all other requests with a "Forbidden" response.
Please note that I have not actually tested the above code; it might contain some minor glitch which you might have to fix. For such things it is important to have access to the HTTP server's error log file. I'm not sure whether you have that in your situation.
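If you would rather limit the restriction to requests that actually come from myfantasyleague pages (leaving empty or unrelated referrers alone), a variant along these lines should work; it is equally untested, and 46087 is the league ID the question wants to keep allowed:
RewriteEngine On
# Only act on requests referred from a myfantasyleague.com page
RewriteCond %{HTTP_REFERER} myfantasyleague\.com [NC]
# ...whose URL does not end in a whitelisted league ID
RewriteCond %{HTTP_REFERER} !/46087$
# Refuse the protected asset types for everything else from that domain
RewriteRule \.(jpe?g|gif|bmp|png|js|css)$ - [F,NC]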

Prevent Subdomains being viewed/crawled

cPanel only allows me to create 'AddOn' domains. I have pointed all my TLDs to the server, which saves them under 'public_html/main/sites' in directories such as '/site1.com', '/site2.com', etc. mainwebsite.com is served from 'public_html' and all my client sites from 'public_html/main/sites'.
It also creates subdomains like 'username.mainsite.com'. How can I prevent Google from indexing those subdomains yet still index the TLDs, and stop users from being able to access the TLD content via the subdomain too?
If I created a RewriteRule, would Google still index the TLD? Or is there a better way to go about this?
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.site1\.com$ [NC]
RewriteRule ^(.*)$ http://www.site1.com/$1 [L,R=301]
If the 301-redirect works (for you), it will work for search engines, too.
See Google’s documentation:
301 redirects are particularly useful in the following circumstances:
[…]
People access your site through several different URLs. If, for example, your home page can be reached in multiple ways - for instance, http://example.com/home, http://home.example.com, or http://www.example.com - it's a good idea to pick one of those URLs as your preferred (canonical) destination, and use 301 redirects to send traffic from the other URLs to your preferred URL.
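As a supplementary measure, if some of those 'username.mainsite.com' URLs are still reachable directly before the redirect is in place everywhere, you could also mark responses on those hosts as not indexable. This is only a sketch; it assumes mod_setenvif and mod_headers are available, and 'mainsite.com' stands in for the real domain:
# Flag requests whose Host header is one of the cPanel-generated subdomains
SetEnvIfNoCase Host ^[^.]+\.mainsite\.com$ NOINDEX_HOST
# ...but not the canonical www host
SetEnvIfNoCase Host ^www\.mainsite\.com$ !NOINDEX_HOST
# Ask crawlers not to index anything served under the flagged hosts
Header set X-Robots-Tag "noindex, nofollow" env=NOINDEX_HOST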

Blocking direct access to a URL (not a file)

A Drupal site is pushing international traffic over quota on my (Plesk 10.4) server, and it looks as though much of that (~250,000 visits/month) is direct access to the URL /user/register. We are already using the Botcha module to filter out spambot registrations, but that approach results in two full pages being served to each bot.
I'm thinking that a .htaccess rule which returns a 403 response to that URL unless the referer is from the site might be the way to go, but my .htaccess-fu is not strong, and I can only find examples for blocking hot-linking of images.
What do I need to add and where?
Thanks,
Richard
You'd be checking against the HTTP referer. It's not a guaranteed way to block incoming traffic linked from a site other than yours, since the field can easily be forged, but you can try adding this to the .htaccess file (above any rules that are already there):
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?your-domain\.com/ [NC]
RewriteRule ^user/register - [L,F]

Too many Rewrite Rules in .htaccess

I had to redesign a site last week. The problem is that the old URLs weren't SEO friendly, so, in order to avoid Google penalizing my site for too many 404 errors, I have to create a lot of RewriteRules, because all the content had awful URLs (and that content had a good position on SERPs).
For example:
RewriteRule ^documents/documents_for_subject/22-ecuaciones-exponenciales-y-logaritmicas http://%{HTTP_HOST}/1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas [R=301,L]
Is this a problem for performance? Is there another solution to my situation?
Thanks
They are in the same domain.
Then an internal redirect is much better. A header redirect sends the new URL to the browser and causes it to make a new request; an internal one is handled, as the name says, internally.
This should work:
RewriteRule ^documents/documents_for_subject/22-ecuaciones-exponenciales-y-logaritmicas /1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas [L]
Any performance issues are going to be negligible with this - except maybe if you have many thousands or tens of thousands of individual rules, which may slow down Apache. In that case, if you have access to the central server configuration, put the rules there instead of a .htaccess file, because the server configuration is parsed once at startup and kept in memory, whereas .htaccess files are re-read on every request.
A. Yes, using a 301 is the right way to notify search bots about changed URLs, and eventually your old URLs will be removed from search results.
B. You don't need to use %{HTTP_HOST} in your rewrite rule; just use it like this:
RewriteRule ^documents/documents_for_subject/22-ecuaciones-exponenciales-y-logaritmicas /1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas [R=301,L]
C. If you have lots of RewriteRules like the above, I recommend using RewriteMap, or else use some scripting support (like PHP) to issue the 301 redirects from the old to the new URLs.
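To illustrate option C, a hedged RewriteMap sketch; the map name "legacymap" and the file path are placeholders. Note that the RewriteMap directive itself is only allowed in the server or virtual-host configuration, while the rule that uses it can stay in the .htaccess file:
# In the server / virtual-host configuration:
RewriteMap legacymap txt:/etc/apache2/legacy-urls.map
# /etc/apache2/legacy-urls.map holds one "old-slug new-path" pair per line, e.g.:
# 22-ecuaciones-exponenciales-y-logaritmicas /1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas
# In the .htaccess file:
RewriteEngine On
# Only redirect if the old slug actually has an entry in the map
RewriteCond ${legacymap:$1} !^$
RewriteRule ^documents/documents_for_subject/(.+)$ ${legacymap:$1} [R=301,L]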
