RewriteRule Preventing Google from Indexing Site - .htaccess

I have a site built on ExpressionEngine (EE). By default, EE requires index.php to be present in the first segment of the URL. To pretty up my URLs, I use a .htaccess RewriteRule:
# Remove index.php from ExpressionEngine URLs
RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?/$1 [L]
The entire site is also served with SSL, which I accomplish with another RewriteRule:
# Force SSL
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R,L]
Recently, the client asked to move their RSS feeds to Feedburner. However, Feedburner doesn't like https URLs, so I had to modify my SSL RewriteRule to not force SSL on feed pages:
# Force SSL except on RSS feeds
RewriteCond %{SERVER_PORT} 80
RewriteCond %{REQUEST_URI} !^/feeds/ [NC]
RewriteCond %{REQUEST_URI} !^/index\.php [NC]
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R,L]
So my whole .htaccess file looks like this:
RewriteEngine On
RewriteBase /
# Force SSL except on RSS feeds
RewriteCond %{SERVER_PORT} 80
RewriteCond %{REQUEST_URI} !^/feeds/ [NC]
RewriteCond %{REQUEST_URI} !^/index\.php [NC]
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R,L]
# Remove index.php from ExpressionEngine URLs
RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?/$1 [L]
As soon as I added the feeds rule to the .htaccess file, however, Google stopped indexing the site's pages. The sitemap URL that's submitted to Google is /index.php/sitemap, so I'm thinking that index.php is playing a role here.
How can I adjust my .htaccess file to allow SSL on my feed pages, but not mess up Google's indexing?

This was happening because the condition
RewriteCond %{REQUEST_URI} !^/index\.php [NC]
was preventing any URL starting with /index.php from being redirected to HTTPS.
Google stopped indexing the site because the sitemap is dynamically generated and uses the current host URL, including the scheme, to build its links.
Since /index.php/sitemap was no longer being redirected to HTTPS, the sitemap listed HTTP URLs. As far as Google was concerned these were entirely new URLs, because it had been indexing HTTPS URLs up to that point.
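One way to keep the feed exception without exempting /index.php is to test the original request line rather than the rewritten URI. The following is a minimal sketch and not part of the original answer. It assumes the feeds live under /feeds/ as in the question, and it relies on %{THE_REQUEST}, which holds the raw request line (for example "GET /feeds/news HTTP/1.1") and does not change when the URL is internally rewritten to /index.php.

```apache
RewriteEngine On
RewriteBase /

# Force SSL except on RSS feeds.
# THE_REQUEST is the raw request line, so this check still sees /feeds/...
# after the internal rewrite to /index.php below. No blanket index.php
# exemption is needed. (Assumption: feeds are only reachable under /feeds/
# or /index.php/feeds/.)
RewriteCond %{SERVER_PORT} 80
RewriteCond %{THE_REQUEST} !^[A-Z]+\s+/(index\.php/)?feeds/ [NC]
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R,L]

# Remove index.php from ExpressionEngine URLs
RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?/$1 [L]
```

With this variant a plain-HTTP request for /index.php/sitemap should be redirected to HTTPS again, so the generated sitemap links keep the https scheme. It is worth trying on a staging copy before relying on it.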

Related

Force site to HTTPS except for some pages and Facebook crawler

There are a few similar questions to this, but none really covered everything I need to do and I'm a bit over my head!
I have an existing wordpress site. I want to force the home page and any new subpages to HTTPS but force existing subpages (about 20 of them) to HTTP. Reason being these subpages have long Facebook comment threads that I don't want to lose, and the canonical workarounds only retain likes/shares, not comments. To retain likes/shares, the Facebook crawler needs to be able to access the HTTP version of the home page.
So I need to work out the code for htaccess to enable:
1. Force site generally to be HTTPS
2. Force certain pages to be HTTP
3. Allow the Facebook crawler to access the HTTP version of the home page (only).
Any help greatly appreciated.
EDIT added code I thought I'd try, but haven't:
RewriteEngine On
# Go to https for all but existing subpages
RewriteCond %{SERVER_PORT} 80
RewriteCond %{REQUEST_URI} !^ page1 | page2 | page3 $ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R,L]
# Go to http for existing subpages
RewriteCond %{SERVER_PORT} !80
RewriteCond %{REQUEST_URI} ^ page1 | page2 | page3 $ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R,L]
Not sure where to put the Facebook crawler exception, nor whether I have the correct syntax to exclude pages, bearing in mind it's a wordpress site.
You can check for the Facebook crawler's user agent strings (facebookexternalhit and Facebot):
# Go to http for home page if Facebook Crawler
RewriteCond %{SERVER_PORT} !80
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit|Facebot
RewriteRule ^$ http://www.example.com/ [R,L]
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit|Facebot
RewriteRule ^$ - [L]
# Go to https for all but existing subpages
RewriteCond %{SERVER_PORT} 80
RewriteCond %{REQUEST_URI} !^/(page1|page2|page3)$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R,L]
# Go to http for existing subpages
RewriteCond %{SERVER_PORT} !80
RewriteCond %{REQUEST_URI} ^/(page1|page2|page3)$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R,L]
Thanks @ben, that worked almost perfectly. The only change I needed was to add a trailing slash to each page name I wanted to redirect, as I got 404 errors without them, presumably because of WordPress's URL formatting. So the full working code is:
# Go to http for home page if Facebook Crawler
RewriteCond %{SERVER_PORT} !80
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit|Facebot
RewriteRule ^$ http://www.example.com/ [R,L]
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit|Facebot
RewriteRule ^$ - [L]
# Go to https for all but existing subpages
RewriteCond %{SERVER_PORT} 80
RewriteCond %{REQUEST_URI} !^/(page1/|page2/|page3/)$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R,L]
# Go to http for existing subpages
RewriteCond %{SERVER_PORT} !80
RewriteCond %{REQUEST_URI} ^/(page1/|page2/|page3/)$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R,L]
I also added an og:url tag to the head of the home page, directing Facebook to crawl the HTTP version, which maintains the original like and share count.
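The trailing-slash fix can also be written so that both forms of each URL are covered. This is only a sketch of that variation, using the same placeholder names (page1, page2, page3, www.example.com) as the code above. The optional slash is expressed once with /? instead of being repeated inside every alternative.

```apache
# Go to https for all but existing subpages (with or without a trailing slash)
RewriteCond %{SERVER_PORT} 80
RewriteCond %{REQUEST_URI} !^/(page1|page2|page3)/?$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R,L]

# Go to http for existing subpages (with or without a trailing slash)
RewriteCond %{SERVER_PORT} !80
RewriteCond %{REQUEST_URI} ^/(page1|page2|page3)/?$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R,L]
```

The Facebook crawler rules stay in front of these two blocks, unchanged.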

Redirect a specific but wild card * URL (with folder(s)) to new domain (same structure)

I need to redirect a specific URL (with its structure) to the same URL(s) on a new domain, but not other URLs.
domainA.com/company/careers*
domainB.com/company/careers*
The reason for this is a 3rd-party vendor supplying a jQuery-based iframe app that performs a referrer check before loading.
I realize there is a bigger SEO/duplicate-content issue that needs to be addressed, but a lot of additional work has to happen before domainA.com is fully redirected to domainB.com, so for now it's only the "careers" section.
The site is using IIS6 with Helicon Tech's ISAPI_Rewrite 3:
http://www.helicontech.com/isapi_rewrite/doc/introduct.htm
Current Rules:
# Helicon ISAPI_Rewrite configuration file
# Version 3.1.0.59
<VirtualHost www.domainA.com www.domainB.com>
RewriteEngine On
#RewriteBase /
#RewriteRule ^pubs/(.+)\.pdf$ /404/?pub=$1.pdf [NC,R=301,L]
# Send away some bots
RewriteCond %{HTTP:User-Agent} (?:YodaoBot|Yeti|ZmEu|Morfeus\Scanner) [NC]
RewriteRule .? - [F]
# Ignore directories from FarCry Friendly URL processing
RewriteCond %{REQUEST_URI} !(^/measureone|^/blog|^/demo|^/_dev)($|/)
RewriteRule ^([a-zA-Z0-9\/\-\%:\[\]\{\}\|\;\<\>\?\,\*\!\#\#\$\ \(\)\^_`~]*)$ /index.cfm?furl=$1 [L,PT,QSA]
RewriteCond %{REQUEST_URI} ^/company/careers [NC]
RewriteRule ^company/careers/?(.*)$ http://www.domainname.com/company/careers/$1 [R=301,L]
# Allow CFFileServlet requests
RewriteCond %{REQUEST_URI} !(?i)^[\\/]CFFileServlet
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* /blog/index.php [L]
</VirtualHost>
<VirtualHost blog.domainA.com>
RewriteEngine On
#redirect old blog.domainA.com/* posts to www.domainB.com/blog/*
RewriteCond %{HTTP_HOST} ^blog.domainA\.com [nc]
RewriteRule (.*) http://www.domainB.com/blog$1 [R=301,L]
</VirtualHost>
It seems that the "RewriteBase /blog/" line breaks your "careers" rule, as it implies that the request should be domainA.com/blog/company/careers*
Please consider writing it like this:
<VirtualHost www.domainA.com www.domainB.com>
RewriteEngine On
RewriteBase /
#RewriteRule ^pubs/(.+)\.pdf$ /404/?pub=$1.pdf [NC,R=301,L]
# Send away some bots
RewriteCond %{HTTP:User-Agent} (?:YodaoBot|Yeti|ZmEu|Morfeus\Fucking\Scanner) [NC]
RewriteRule .? - [F]
# Ignore directories from FarCry Friendly URL processing
RewriteCond %{REQUEST_URI} !(^/measureone|^/blog|^/demo|^/_dev)($|/)
RewriteRule ^([a-zA-Z0-9\/\-\%:\[\]\{\}\|\;\<\>\?\,\*\!\#\#\$\ \(\)\^_`~]*)$ /index.cfm?furl=$1 [L,PT,QSA]
RewriteCond %{REQUEST_URI} ^/company/careers [NC]
RewriteRule ^company/careers/?(.*)$ http://www.domainname.com/company/careers/$1 [R=301,L]
# Allow CFFileServlet requests
RewriteCond %{REQUEST_URI} !(?i)^[\\/]CFFileServlet
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^blog/.* /blog/index.php [L]
</VirtualHost>
If you still have issues, enable logging in httpd.conf by adding
RewriteLogLevel 9
and check how your request is processed in rewrite.log.
Just check to see if the request starts with /company/careers
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/company/careers [NC]
RewriteRule ^company/careers/?(.*)$ http://domainB.com/company/careers/$1 [R=301,L]
See if that works for you.
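Two details may matter when applying either suggestion. This is a hedged sketch and has not been tested against ISAPI_Rewrite. First, the VirtualHost block in the question serves both www.domainA.com and www.domainB.com, so an unconditional redirect would likely loop once the browser arrives on domainB. Second, the FarCry friendly-URL rule carries [L] and its pattern also matches company/careers, so the redirect probably needs to come before it. The www.domainB.com target and the host pattern below are assumptions based on the host names in the question.

```apache
RewriteEngine On

# Assumption: www.domainA.com and www.domainB.com share this rule set,
# so only redirect requests that arrive on domainA. Without this check
# the redirect would fire again after the browser lands on domainB.
RewriteCond %{HTTP_HOST} ^(?:www\.)?domainA\.com$ [NC]
# The optional leading slash lets the pattern match with or without RewriteBase /.
RewriteRule ^/?company/careers(.*)$ http://www.domainB.com/company/careers$1 [NC,R=301,L]

# ... the FarCry friendly-URL rule and the remaining rules follow here ...
```

Requests for any other path on domainA fall through to the existing rules untouched.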

Temporary URL redirect loop

I just bought a shared hosting account from bluehost.com and they gave me a temporary URL like http://ipaddress/~username/. The problem is that I get redirect loops when I try to access my URLs through it. Everything works fine on my proper URL.
Proper URL = accessing the site through the domain.
If you have a solution to this problem, please share it; it would be a great help.
I have the following rewrite rules in my .htaccess files.
# redirect www to non-www version
RewriteCond %{HTTP_HOST} www\.mydomain\.com$ [NC]
RewriteRule ^(.*)/?$ http://mydomain.com/$1 [L,R=301]
# rewrite the domain.com/index.php to / (base-directory)
RewriteCond %{THE_REQUEST} /index\.php
RewriteRule ^(.*?)index\.php$ /$1 [R=301,NE,L]
# rewrite all requests to index.php
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^(.*)$ index.php?url=$1 [QSA,NE,L]

Subdomain is taking over requests to domain

I've created a subdomain called mobile for my website through cPanel. I redirect mobile devices to that subdomain, but there is JavaScript living there that makes AJAX calls to the actual domain. I have structured these calls to go to website.com/mobile/.... However, these aren't going through, and I suspect that it's because the server is looking for ... inside my /mobile directory, whereas the request is supposed to be rewritten in .htaccess to website.com/index.php?params=mobile/....
Here's the .htaccess:
# redirect phones/tablets to mobile site
RewriteCond %{HTTP_USER_AGENT} "android|blackberry|ipad|iphone|ipod|iemobile|opera mobile|palmos|webos|googlebot-mobile" [NC]
RewriteCond %{HTTP_HOST} !mobile\.website\.com [NC]
RewriteCond %{REQUEST_URI} !^/mobile [NC]
RewriteRule ^(.*)$ http://www.mobile.website.com/$1 [L,R=302]
# not a file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# website.com/home => website.com/index.php?params=home
RewriteRule ^(.+)(\?.+)?$ index.php?params=$1 [L,QSA]
This works on my local machine but not on the live server. I have created a subdomain locally via
<VirtualHost *:80>
DocumentRoot "C:/Program Files (x86)/Apache Software Foundation/Apache2.2/htdocs/website/mobile"
ServerName mobile.website.local
</VirtualHost>
and it works perfectly: when I go to mobile.website.local or website.local/mobile, I get the mobile site, and when I go to website.local/mobile/users/login I get the correct JSON output for the AJAX request.
How can I keep my mobile subdomain alive in /mobile/ but have requests to website.com/mobile/... be forwarded with the last rewrite rule?
Thanks!
Just add a specific rule for /mobile ahead of the others, so that those requests skip the "not a file or directory" conditions:
RewriteCond %{REQUEST_URI} ^/mobile [NC]
RewriteRule ^(.+)(\?.+)?$ index.php?params=$1 [L,QSA]
# redirect phones/tablets to mobile site
RewriteCond %{HTTP_HOST} !mobile\.website\.com [NC]
RewriteCond %{REQUEST_URI} !^/mobile [NC]
RewriteCond %{QUERY_STRING} !^params=mobile(.*)$
RewriteRule ^(.*)$ http://www.mobile.website.com/$1 [L]
# not a file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# website.com/home => website.com/index.php?params=home
RewriteRule ^(.+)(\?.+)?$ index.php?params=$1 [L,QSA]
If anything is unclear, let me know and I'll see if I can help :)

Apache rewrite all URLs to https with www + a few exceptions

I've tried all the answers to similar Stack Overflow questions and nothing has worked. I need to redirect everything to https://www except for example.com/blogs/* and example.com/page-name.
I currently have this:
RewriteCond %{HTTPS} =off
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
RewriteCond %{http_host} ^example.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
This redirects everything except https://example.com, which does NOT get the www added.
You can see for yourself at https://moblized.com
RewriteEngine On
RewriteCond %{HTTPS} =off
RewriteRule ^(.*)$ https://www.moblized.com/$1 [R=301,L]
RewriteCond %{http_host} ^moblized.com [NC]
RewriteRule ^(.*)$ https://www.moblized.com/$1 [R=301,L]
RewriteCond %{SERVER_PORT} 80
RewriteCond %{REQUEST_URI} blogs
RewriteRule ^(.*)$ http://moblized.com/blogs/$1 [R,L]
# Rewrite URLs of the form 'x' to the form 'index.php?q=x'.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
</IfModule>
# $Id: .htaccess,v 1.90.2.4 2009/12/07 12:00:40 goba Exp $
AddHandler php5-script .php
Thank you!
I hope I understood you correctly. You want to:
1. Redirect from example.com to www.example.com (except /blogs/ and /page-name)
2. Redirect all pages to HTTPS (except /blogs/ and /page-name)
Based on your current .htaccess, I assume that by /page-name you mean /favicon.ico.
Here are the rules for the above requirements -- put them into your .htaccess:
# activate rewrite engine
RewriteEngine On
# don't touch favicon.ico (always accept as is regardless of the domain or protocol)
RewriteRule ^favicon.ico$ - [L]
# don't touch /index.php (usually means already overwritten URL)
# otherwise we may enter into a loop
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^index\.php$ - [L]
# ensure trailing slash is present for /blogs -> /blogs/
RewriteRule ^blogs$ http://moblized.com/blogs/ [R=301,QSA,L]
# /blogs/ should only be accessible via http://example.com/blogs/
RewriteCond %{HTTP_HOST} !^moblized\.com$ [NC]
RewriteRule ^blogs/(.*)$ http://moblized.com/blogs/$1 [R=301,QSA,L]
RewriteCond %{HTTP_HOST} ^moblized\.com$ [NC]
RewriteCond %{HTTPS} =on
RewriteRule ^blogs/(.*)$ http://moblized.com/blogs/$1 [R=301,QSA,L]
RewriteRule ^blogs/.* - [L]
# redirect to www.example.com if necessary
RewriteCond %{HTTP_HOST} ^moblized\.com$ [NC]
RewriteCond %{REQUEST_URI} !=/client-ipad-contest
RewriteRule ^(.*)$ https://www.moblized.com/$1 [R=301,QSA,L]
# redirect to HTTPS if not there already
RewriteCond %{HTTPS} !=on
RewriteCond %{REQUEST_URI} !=/client-ipad-contest
RewriteRule ^(.*)$ https://www.moblized.com/$1 [R=301,QSA,L]
# Rewrite URLs of the form 'x' to the form 'index.php?q=x'.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?q=$1 [L,QSA]
BTW, the browser will most likely show an "Untrusted Certificate" warning if your customers go to https://example.com. This is because the HTTPS session has to be fully established before Apache's rewrite module starts processing the request.
If that is a problem, consider buying another SSL certificate (possibly from another vendor) that covers both example.com and www.example.com (GoDaddy does this for sure), or get a wildcard certificate that covers all subdomains (*.example.com), though this will most likely be much more expensive.
UPDATE: After simulating your requirements locally (sorry, I have no working Apache with SSL, so in my testing I substituted a different kind of rule/domain name), I have revised and updated the rules.
I've tested these rules locally (all pages are very simple: just one image, some CSS and a bit of text) and everything looks good. Let me know if something does not work.
