htaccess block pages based on query string for crawlers

I would like to block some specific pages from being indexed / accessed by Google. These pages have a GET parameter in common, and I would like to redirect bots to the equivalent page without the GET parameter.
Example - page to block for crawlers:
mydomain.com/my-page/?module=aaa
Should be blocked based on the presence of module= and redirected permanently to
mydomain.com/my-page/
I know that a canonical tag could spare me the trouble of doing this, but the problem is that those URLs are already in Google's index and I'd like to accelerate their removal. I added a noindex tag a month ago and I still see them in Google search results. It is also eating into my crawl budget.
What I wanted to try out is the following:
RewriteEngine on
RewriteCond %{QUERY_STRING} module=
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
Is this correct?
What should I add for the final redirection?
It's a tricky thing to do, so before implementing anything I'd like to make sure it's the right approach.
Thanks

That would be:
RewriteEngine On
RewriteCond %{QUERY_STRING} module= [NC]
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule ^ %{REQUEST_URI}? [L,R=301]
The trailing ? in %{REQUEST_URI}? removes the original query string from the redirect target.
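Because the asker wants to be sure before deploying, one cheap check is to replay the two conditions offline. The sketch below does this with Python's re module. It is not mod_rewrite itself. `redirect_target` is a hypothetical helper, and the sketch assumes Python's regex engine behaves like Apache's PCRE for these simple patterns.

```python
import re

# Patterns copied from the two RewriteCond lines above ([NC] -> IGNORECASE).
QUERY_COND = re.compile(r"module=", re.IGNORECASE)
BOT_COND = re.compile(r"(googlebot|bingbot|Baiduspider)", re.IGNORECASE)


def redirect_target(request_uri, query_string, user_agent):
    """Hypothetical helper: return the 301 target path, or None if the rule is skipped."""
    # RewriteCond lines are ANDed by default, so both must match.
    if QUERY_COND.search(query_string) and BOT_COND.search(user_agent):
        # %{REQUEST_URI}? -> the path alone; the trailing ? drops the query string.
        return request_uri
    return None


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(redirect_target("/my-page/", "module=aaa", bot))  # /my-page/
    print(redirect_target("/my-page/", "module=aaa", "Mozilla/5.0 Firefox/115.0"))  # None
```

Regular visitors with the same URL fall through untouched. Only requests whose user agent matches one of the listed bots are redirected.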

Related

.htaccess: rewriteCond to another page within same URL with params

I have a condition in my .htaccess for crawlers and search engines that takes them to a "static" page where they can scrape all content.
Up until now my domain has been {client}.realdomain.com, where {client} is a subdomain for one client.
When a client shares something on a social network, e.g. Facebook or LinkedIn, that network's crawler is handled by my .htaccess, which has the following conditions (and this works):
URL example: http://{client}.realdomain.com/s/token
RewriteCond %{HTTP_USER_AGENT} (LinkedInBot/[0-9]|facebookexternalhit/[0-9]|Facebot|Twitterbot|twitterbot|Pinterest|pinterest|Google.*snippet|baiduspider|rogerbot|embedly|quora\ link\ preview|showyoubot|outbrain|slackbot|vkShare|W3C_Validator)
RewriteCond %{HTTP_HOST} ^(.+?)\.realdomain\.com$
RewriteRule ^s/(.*)$ http://%1.realdomain.com/static.php?token=$1 [NC,L]
This ends up as http://{client}.realdomain.com/static.php?token=token
As I said, everything here works perfectly, but now I'm moving to multiple domains, so the host can be
{client}.real-domain.com and {client}.sunset.com
I essentially want the same thing in my .htaccess, but it should keep the whole domain when it rewrites, so a crawler that comes to {client}.sunset.com/s/my-secret-token goes to e.g. http://{client}.sunset.com/static.php?token=my-secret-token
How would I go about doing this? It seems like a simple solution, but for some reason I just can't get my head around it.
Thanks
RewriteCond %{HTTP_USER_AGENT} (LinkedInBot/[0-9]|facebookexternalhit/[0-9]|Facebot|Twitterbot|twitterbot|Pinterest|pinterest|Google.*snippet|baiduspider|rogerbot|embedly|quora\ link\ preview|showyoubot|outbrain|slackbot|vkShare|W3C_Validator)
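RewriteRule ^s/(.*)$ http://%{HTTP_HOST}/static.php?token=$1 [NC,L]
Can you test this out? %{HTTP_HOST} already holds the full host name the request came in on (e.g. {client}.sunset.com or {client}.real-domain.com), so you don't need to hard-code your domain or capture the subdomain. Note that server variables are expanded in the RewriteRule substitution and in the RewriteCond test string, but not inside a regex pattern, so %{HTTP_HOST} cannot be used in the CondPattern.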
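The idea in the answer is to reuse %{HTTP_HOST} instead of a hard-coded domain. The sketch below shows what that does to the target URL, assuming a rule of the form `RewriteRule ^s/(.*)$ http://%{HTTP_HOST}/static.php?token=$1 [NC,L]`. The `rewrite` helper is hypothetical, and the user-agent alternation is shortened for brevity. The sketch also assumes Python's `re` matches like Apache's PCRE here.

```python
import re

# Assumption: the user-agent list is shortened; paste the full alternation to test it.
BOT_UA = re.compile(r"LinkedInBot/[0-9]|facebookexternalhit/[0-9]|Facebot|Twitterbot")
# RewriteRule pattern; [NC] -> IGNORECASE. In .htaccess the path has no leading slash.
RULE = re.compile(r"^s/(.*)$", re.IGNORECASE)


def rewrite(http_host, path, user_agent):
    """Hypothetical helper mimicking:
    RewriteRule ^s/(.*)$ http://%{HTTP_HOST}/static.php?token=$1 [NC,L]
    """
    if not BOT_UA.search(user_agent):
        return None
    match = RULE.match(path)
    if match is None:
        return None
    # %{HTTP_HOST} is copied verbatim, so each client domain keeps its own host.
    return f"http://{http_host}/static.php?token={match.group(1)}"


if __name__ == "__main__":
    print(rewrite("acme.sunset.com", "s/my-secret-token", "Twitterbot/1.0"))
    # http://acme.sunset.com/static.php?token=my-secret-token
```

Here "acme" stands in for a hypothetical {client} subdomain. The same rule serves realdomain.com, real-domain.com and sunset.com without changes.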

rewriterule for multiple 301 redirects

I've tried looking my question up, but the closest answers I've found didn't work, especially since I'm VERY new to editing .htaccess files.
I have a site that has been programmed to dynamically generate copies of a page to fit a location. For instance, example.com/help/work/ was set up to generate about 100 duplicates that look like this: example.com/help/work/?city=Washington&state=DC, with the city and state changing on each page. There are tons of these variations, and I want to 301 redirect all the pages with a city and state parameter so they point to the original page (example.com/help/work/).
After some research, I found a RewriteRule that let me do this on a page-by-page basis, but only for the homepage:
RewriteCond %{QUERY_STRING} ^city=Philadelphia&state=PA$
RewriteRule ^$ http://example.com/? [R=301,L]
With all that said, I have a two part question:
Is there a way to write this so that it targets subdirectory pages? (I could only get it to do the index)
Is there a way I can use a wildcard like (.*) in a single RewriteRule so example.com/help/work/?city=Washington&state=DC and all its city/state variations point to the original page (example.com/help/work/)?
I'm a bit confused by your request. It appears you want to point every city and state to this single page: http://example.com/help/work/. See if this is what you're looking for.
RewriteCond %{QUERY_STRING} city=.+&state=.+
RewriteRule ^([^/]+)/([^/]+)/?$ http://example.com/$1/$2/? [R=301,L]
Yes, you can use a regex capture group in the RewriteRule to capture the requested path dynamically.
Like this
RewriteCond %{QUERY_STRING} ^city=.+&state=.+$
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]
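The sketch below replays the second answer's pair of directives in Python. It shows how one capture group covers every path while the trailing ? drops the city/state parameters. It assumes Python's `re` matches like Apache's PCRE for these patterns, and that the rules live in the document root, so the path has no leading slash. `redirect_target` is a hypothetical helper.

```python
import re

# RewriteCond %{QUERY_STRING} ^city=.+&state=.+$
QUERY_COND = re.compile(r"^city=.+&state=.+$")
# RewriteRule ^(.*)$ ... : captures the whole path (no leading slash in .htaccess).
RULE = re.compile(r"^(.*)$")


def redirect_target(path, query_string):
    """Hypothetical helper mimicking RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]."""
    if not QUERY_COND.search(query_string):
        return None
    match = RULE.match(path)
    # The trailing ? in the substitution drops the city/state query string.
    return f"http://example.com/{match.group(1)}"


if __name__ == "__main__":
    print(redirect_target("help/work/", "city=Washington&state=DC"))  # http://example.com/help/work/
    print(redirect_target("help/work/", "page=2"))  # None
```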

301 Redirect a URL Containing a ? (question mark)

I am currently in the final stages of redeveloping a website, but I'm having some trouble redirecting the old blog links to the new format.
We have inbound links to the old blog in the form of:
Index Page
http://www.domain_name.co.uk/blog-page/
Needs to become
http://www.domain_name.co.uk/news/
This is easy enough and has been done by using
RewriteRule ^blog-page$ /news/ [R=301,L]
Profile page
http://www.domain_name.co.uk/blog-page/index.php?/archives/1541-title-of-the-blog.html
The above needs to link to
http://www.domain_name.co.uk/news/1541-title-of-the-blog
However, the '?' in the middle of the URL appears to break my RewriteRule. I have read about QUERY_STRING, but I don't believe it solves my issue, as there are no actual parameters passed in the URL.
The code below works, but it also passes the '/?/archives/' part through.
RewriteRule ^blog-page/index.php(.*)$ /news/$1 [R=301,L]
Any help would be massively appreciated. There are several other sections of the previous site build which for some reason use the same URL structure.
You will need an additional rule that matches the query string. Have your DOCUMENT_ROOT/.htaccess like this:
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} ^/archives/(.+?)\.html$ [NC]
RewriteRule ^blog-page/index\.php$ /news/%1? [R=301,NC,L]
RewriteRule ^blog-page/?$ /news/ [R=301,L,NC]
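The key point is that everything after the ? in index.php?/archives/... is the query string. RewriteRule never sees the query string, so the slug has to be captured in a RewriteCond and referenced as %1. The Python sketch below replays the archive rule to show this. `redirect_target` is a hypothetical helper, and Python's `re` is assumed to match like Apache's PCRE here.

```python
import re

# RewriteCond %{QUERY_STRING} ^/archives/(.+?)\.html$ [NC]
QUERY_COND = re.compile(r"^/archives/(.+?)\.html$", re.IGNORECASE)
# RewriteRule pattern for the old archive script ([NC] -> IGNORECASE).
RULE = re.compile(r"^blog-page/index\.php$", re.IGNORECASE)


def redirect_target(path, query_string):
    """Hypothetical helper mimicking the archive rule: /news/%1? [R=301,NC,L]."""
    cond = QUERY_COND.search(query_string)
    if cond is None or RULE.match(path) is None:
        return None
    # %1 is the RewriteCond capture (the post slug); the trailing ? drops the old query.
    return "/news/" + cond.group(1)


if __name__ == "__main__":
    # Path and query string arrive separately, split at the first '?'.
    print(redirect_target("blog-page/index.php", "/archives/1541-title-of-the-blog.html"))
    # /news/1541-title-of-the-blog
```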

301 redirect get pagination

I'm trying to 301 redirect a paginated blog list from an old site onto a new url.
I think I'm getting pretty close with the RewriteRule but I'm not quite there yet, this is what I have:
RewriteCond %{QUERY_STRING} ^page=
RewriteRule ^(blog)?$ http://www.newdomain.com/news/page/$1? [R=301,L]
Using this rule if I go to
http://www.olddomain.com/blog?page=1
I currently get redirected to
http://www.newdomain.com/news/page/blog
I would like to be sent to
http://www.newdomain.com/news/page/1
I'm sure it's just something small and simple that I'm missing.
Edit
Expanding on the solution below, I've added tags/category support to the rewrite rule using $1.
RewriteCond %{QUERY_STRING} ^page=([^&]+) [NC]
RewriteRule ^blog/tag/([^/\.]+)/?$ http://www.newdomain.com/news/tag/$1/page/%1? [R=301,L,NC]
There are a few minor mistakes in your code:
You need to capture the page parameter's value from the query string first.
Then reference that captured value with %1 instead of $1.
There's no need to capture blog, since you don't use it.
Change your code with:
RewriteCond %{QUERY_STRING} ^page=([^&]+) [NC]
RewriteRule ^blog/?$ http://www.newdomain.com/news/page/%1? [R=301,L,NC]
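To see why the question's rule produced /news/page/blog, the sketch below replays both versions in Python. `original_target` and `fixed_target` are hypothetical helpers, and Python's `re` is assumed to match like Apache's PCRE here. $1 refers to a group in the RewriteRule pattern, while %1 refers to a group in the last matched RewriteCond.

```python
import re

PAGE_PRESENT = re.compile(r"^page=")                         # question's RewriteCond
PAGE_CAPTURE = re.compile(r"^page=([^&]+)", re.IGNORECASE)   # answer's RewriteCond
ORIGINAL_RULE = re.compile(r"^(blog)?$")
FIXED_RULE = re.compile(r"^blog/?$", re.IGNORECASE)
BASE = "http://www.newdomain.com/news/page/"


def original_target(path, query_string):
    """Question's version: $1 is the rule's own capture group, i.e. 'blog'."""
    match = ORIGINAL_RULE.match(path)
    if PAGE_PRESENT.search(query_string) is None or match is None:
        return None
    return BASE + (match.group(1) or "")


def fixed_target(path, query_string):
    """Answer's version: %1 is the page number captured by the RewriteCond."""
    cond = PAGE_CAPTURE.search(query_string)
    if cond is None or FIXED_RULE.match(path) is None:
        return None
    return BASE + cond.group(1)


if __name__ == "__main__":
    print(original_target("blog", "page=1"))  # http://www.newdomain.com/news/page/blog
    print(fixed_target("blog", "page=1"))     # http://www.newdomain.com/news/page/1
```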

How to redirect a first time visitor using htaccess

How can I redirect a first-time visitor to a special landing page with .htaccess, based on the referrer? I mean, if they came from another domain, then they are a first-time visitor, right?
I am a real noob at URL rewriting, so an explanation would be great.
Note: the landing page is nothing but a PHP script that detects the browser. On that page I will use a cookie, but I need to redirect the user if the referrer is empty or from another domain.
I suggest this:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^(www\.)?(https?://)?example\.com[NC]
RewriteCond %{REQUEST_URI} !^/welcome.html [NC]
Rewriterule ^(.*)$ http://example.com/welcome.html [r=307,L]
The first RewriteCond checks that the referer does not start with your domain name, and the second checks that the request is not already for welcome.html, i.e. that you have not just been redirected by the RewriteRule.
The RewriteRule then sends you to the welcome page, and [L] makes it the last rule processed.
How about redirecting the user if their referer is not your domain?
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^(www\.)?(https?://)?(?!example\.com) [NC]
Rewriterule ^(.*)$ http://example.com/welcome.html [r=307,NC]
That means the user will be redirected to welcome.html if they type example.com in the address bar or follow a link from another site. Once they are on your site, they won't be redirected again when they load another page.
P.S. AFAIK you can use cookies in a PHP script that generates a plain HTML page; see here.
Edit: updated with tested code.
Excuse me for reheating this old steak once more. I would still be interested to know whether anyone has a solution to this problem without using cookies or HTML5 features.
I have read here that HTTP_REFERER might be blank. Is that why this redirect method is not suitable for this application? I have experimented with this on my server, but the closest working result was always being redirected to my landing page, index.htm, which is not what I want.
Could this rule interfere with other rewrite rules?
Also, there is an error in the former snippet:
And I think the NC flag in the latter snippet does not make sense. Should it not be L?
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^(www\.)?(https?://)?example\.com[NC]
#missing space after .com and before [----------------here----^
RewriteCond %{REQUEST_URI} !^/welcome.html [NC]
Rewriterule ^(.*)$ http://example.com/welcome.html [r=307,L]
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^(www\.)?(https?://)?(?!example\.com) [NC]
Rewriterule ^(.*)$ http://example.com/welcome.html [r=307,NC]
#Should this flag not be L? ------------------------------^
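Besides the missing space, the first snippet's condition has a possible second problem. ^(www\.)?(https?://)?example\.com puts the optional www before the scheme. A real Referer such as http://www.example.com/page therefore never matches, and the negated condition always passes. That may explain the "always redirected to my landing page" result mentioned above, though this is a guess. The Python sketch below compares that pattern with a hypothetical reordered one, ^https?://(www\.)?example\.com. `should_redirect`, `ORIGINAL` and `REORDERED` are illustrative names, and Python's `re` is assumed to behave like Apache's PCRE for these patterns.

```python
import re

# First answer's condition (missing space before [NC] restored): [NC] -> IGNORECASE.
ORIGINAL = re.compile(r"^(www\.)?(https?://)?example\.com", re.IGNORECASE)
# Hypothetical reordering (an assumption): scheme first, then an optional www.
REORDERED = re.compile(r"^https?://(www\.)?example\.com", re.IGNORECASE)
# RewriteCond %{REQUEST_URI} !^/welcome.html [NC]
WELCOME = re.compile(r"^/welcome.html", re.IGNORECASE)


def should_redirect(referer_pattern, referer, request_uri):
    """Hypothetical helper: True when both negated RewriteConds pass, so the 307 fires."""
    if WELCOME.search(request_uri):
        return False  # already on the landing page: avoids a redirect loop
    # RewriteCond %{HTTP_REFERER} !pattern -> passes when the pattern does NOT match.
    return referer_pattern.search(referer) is None


if __name__ == "__main__":
    internal = "http://www.example.com/page"
    print(should_redirect(ORIGINAL, internal, "/about"))   # True: www referer never matches
    print(should_redirect(REORDERED, internal, "/about"))  # False
    print(should_redirect(REORDERED, "", "/about"))        # True: empty referer
```

Empty referers still trigger the redirect with either pattern. That matches the original request, but as noted above, browsers and proxies do not always send a Referer.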
