Block search spiders for specific directory with parameters - .htaccess

I'm trying to write an .htaccess rewrite for a page with categories and search filters.
I want to block crawlers from certain URLs with .htaccess. I have already listed those URLs in robots.txt, but spiders are still crawling them.
URLs I want to allow crawling of:
www.domain.com/path1.html
www.domain.com/path1/path2.html
www.domain.com/path1/path2/path3.html
www.domain.com/path4/path5.html
URLs I want to disallow crawling of:
www.domain.com/path1.html?search[param1]=value&...
www.domain.com/path1/path2.html?search=param2&...
www.domain.com/path1/path2/path3.html?searchHash=param3
As I understand it, the .htaccess code for the search parameter will look something like this, but it's not correct and I'm stuck:
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|yandex) [NC]
RewriteRule ^(.*).html\?search=.*$ http://www.domain.com/$1 [R=301,L]

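No, you cannot match the query string in a RewriteRule pattern; the pattern is matched against the URL path only. You need to use RewriteCond %{QUERY_STRING}, and the target needs a trailing ? so that the query string is dropped from the redirect (otherwise it is passed through unchanged and the rule matches again):
RewriteCond %{QUERY_STRING} ^search=.+ [NC]
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|yandex) [NC]
RewriteRule ^(.+?\.html)$ http://www.domain.com/$1? [R=301,L,NC]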
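The question lists three parameter shapes (search[param1]=, search=, and searchHash=), and the condition above only catches a plain search= at the start of the query string. A minimal sketch of a broader condition follows, assuming that every parameter whose name begins with "search" should be treated as a filter; that assumption is not stated in the question, so adjust the pattern if other parameters share the prefix.

```apache
RewriteEngine On

# Assumption: any parameter whose NAME starts with "search" is a filter,
# e.g. search=..., searchHash=..., search[param1]=... (raw or percent-encoded).
# (^|&) anchors the match at the start of a parameter name.
RewriteCond %{QUERY_STRING} (^|&)search[^=&]*= [NC]
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|yandex) [NC]
# The trailing "?" discards the query string from the redirect target.
RewriteRule ^(.+\.html)$ http://www.domain.com/$1? [R=301,L,NC]
```

Requests from other user agents are untouched, so regular visitors can still use the filters.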

Related

htaccess - syntax to remove multiple query strings?

I would like to remove certain URL parameters from my site, so that Googlebot doesn't get confused and treat the pages as duplicate content.
The parameters are:
?sort=
?limit=
?order=
Based on some examples I've come across, here's what I'm currently using in .htaccess:
RewriteCond %{QUERY_STRING} "sort=" [NC]
RewriteRule (.*) /$1? [R=301,L]
RewriteCond %{QUERY_STRING} "limit=" [NC]
RewriteRule (.*) /$1? [R=301,L]
RewriteCond %{QUERY_STRING} "order=" [NC]
RewriteRule (.*) /$1? [R=301,L]
What is the proper syntax to combine these parameters into one rule?
Removing the parameters is not a good solution if you need them.
The best way to avoid duplicate-content problems is to add a canonical link to the HTML <head>:
<link rel="canonical" href="http://www.domain.com/url-file.php?param=xxx">
The href should be the complete URL of the page, containing only the parameters you want Google to index.
You can use alternation in regex:
RewriteCond %{QUERY_STRING} ^(limit|sort|order)= [NC]
RewriteRule ^ %{REQUEST_URI}? [R=301,L,NE]
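Because the condition above is anchored with ^, it only fires when limit, sort, or order is the first parameter. A small variant, sketched under the assumption that the parameter may appear anywhere in the query string:

```apache
RewriteEngine On

# (^|&) matches either the start of the query string or a parameter
# separator, so "page=2&sort=asc" is caught as well as "sort=asc".
RewriteCond %{QUERY_STRING} (^|&)(limit|sort|order)= [NC]
# The trailing "?" drops the WHOLE query string, not just the matched parameter.
RewriteRule ^ %{REQUEST_URI}? [R=301,L,NE]
```

Like the original rule, this discards every parameter on the URL, so it only fits pages where nothing else in the query string needs to survive the redirect.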

Multipart htaccess redirect

I'm having a bit of trouble getting my .htaccess to redirect properly and was hoping for some help.
I'm expecting DEV-domain.com?CampID=AB12345 to redirect to
http://DEV-www.domain.com/landing/external-marketing/direct-mail/AB?CampId=AB12345
RewriteCond %{HTTP_HOST} ^DEV-(www\.)?domain\.com [NC]
RewriteCond %{QUERY_STRING} ^CampID=
RewriteRule (\w{2})(\w{5})$ http://DEV-www\.domain\.com/landing/external-marketing/direct-mail/$1?CampId=$1$2 [R=301,L]
Unfortunately, I can't get it working for some reason.
Because RewriteRule patterns are matched against the URL path, not the query string. Try this:
RewriteCond %{HTTP_HOST} ^DEV-(www\.)?domain\.com [NC]
RewriteCond %{QUERY_STRING} ^CampID=(\w{2})(\w{5})
RewriteRule .* http://DEV-www.domain.com/landing/external-marketing/direct-mail/%1?CampId=%1%2 [R=301,L]
Also, you don't need to escape dots in the target URL, only in matching patterns. Be aware that if you decide to use CampID instead of CampId in the target URL, you need to add another condition:
RewriteCond %{REQUEST_URI} !^/landing/external-marketing/direct-mail/
to avoid an infinite redirect, since a target containing CampID would match the RewriteCond on the query string again.
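For illustration, here is how the complete rule set might look with the guard condition in place and CampID kept in the target. This is a sketch, not a tested configuration. The guard is placed before the query string condition so that %1 and %2 clearly refer to the CampID captures, since back-references come from the last matched RewriteCond.

```apache
RewriteEngine On

RewriteCond %{HTTP_HOST} ^DEV-(www\.)?domain\.com [NC]
# Guard: skip URLs that are already under the landing path (prevents a loop).
RewriteCond %{REQUEST_URI} !^/landing/external-marketing/direct-mail/
# Keep this condition last: %1 and %2 are taken from it.
RewriteCond %{QUERY_STRING} ^CampID=(\w{2})(\w{5})
RewriteRule .* http://DEV-www.domain.com/landing/external-marketing/direct-mail/%1?CampID=%1%2 [R=301,L]
```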

Strip query_string

I want to get rid of some query string parameters across my whole website (specifically the Facebook share/like parameters, which usually begin with fb_action).
I thought about using .htaccess to do that.
I want this:
Link 1)
http://example.com/documents/de_desmarais_en_sirois?fb_action_ids=10151430962018298&fb_action_types=og.likes&fb_source=timeline_og&action_object_map={%2210151430962018298%22%3A157643874359578}&action_type_map={%2210151430962018298%22%3A%22og.likes%22}&action_ref_map=[]
to look like:
Link 2)
http://example.com/documents/de_desmarais_en_sirois
In my .htaccess, I added:
RewriteCond %{QUERY_STRING} fb_action
RewriteRule ^(.*) /$1? [R=301,L]
But when I visit the original link, I am redirected to my root folder (example.com/pages.php). I want to keep the first part of the URL, example.com/documents/de_desmarais_en_sirois, rather than go to the root.
What should I modify/do?
I assume you've added those rules to the htaccess file in the /documents/ directory?
You'll need to remove the leading slash in your rule's target and add a rewrite base:
RewriteBase /documents/
RewriteCond %{QUERY_STRING} fb_action
RewriteRule ^(.*) $1? [R=301,L]
The query string match could be improved, though. To remove only the Facebook parameters and preserve any others, capture what comes before and after the matched parameter and put both back (%1 is the part before it, %3 the part after it; each redirect strips one parameter):
RewriteCond %{QUERY_STRING} (.*)(^|&)fb_[^&]+(.*)$
RewriteRule ^(.*) $1?%1%3 [R=301,L]
RewriteCond %{QUERY_STRING} (.*)(^|&)action_[^&]+(.*)$
RewriteRule ^(.*) $1?%1%3 [R=301,L]

.htaccess protect urls with query string combinations

I need to protect a "single logical" URL in a Joomla CMS with .htaccess. In the question ".htaccess code to protect a single URL?" I found this solution, which works great for a specific URL:
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/index\.php$
RewriteCond %{QUERY_STRING} ^option=com_content&task=view&id=76$
RewriteRule ^(.*)$ /secure.htm
However, how can I make sure that the URL parts can't be swapped around or amended to circumvent the secure access? For example, I don't want to allow access to
option=com_content&task=view&id=76&dummy=1
option=com_content&id=76&task=view
either.
I have tried this, which doesn't seem to work:
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/index\.php$
RewriteCond %{QUERY_STRING} option=com_content
RewriteCond %{QUERY_STRING} task=view
RewriteCond %{QUERY_STRING} id=76
RewriteRule ^(.*)$ /secure.htm
Your rules work fine for me when I go to any of these URLs:
http://localhost/index.php?blah=blah&option=com_content&task=view&id=76
http://localhost/index.php?option=com_content&task=view&id=76&dummy=1
http://localhost/index.php?option=com_content&id=76&task=view
I get served the content at /secure.htm.
However, you could add boundaries to your query string conditions:
RewriteCond %{QUERY_STRING} (^|&)option=com_content(&|$)
RewriteCond %{QUERY_STRING} (^|&)task=view(&|$)
RewriteCond %{QUERY_STRING} (^|&)id=76(&|$)
so that you don't end up matching something like id=761.
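Put together with the rest of the rule set from the question, the bounded conditions could look like the sketch below. The [L] flag is an addition for illustration and is not part of the original rules.

```apache
RewriteEngine On

RewriteCond %{REQUEST_URI} ^/index\.php$
# Each condition must match somewhere in the query string, in any order.
# (^|&) and (&|$) make sure whole parameters are matched, so id=761 or
# myid=76 do not trigger the rule.
RewriteCond %{QUERY_STRING} (^|&)option=com_content(&|$)
RewriteCond %{QUERY_STRING} (^|&)task=view(&|$)
RewriteCond %{QUERY_STRING} (^|&)id=76(&|$)
RewriteRule ^(.*)$ /secure.htm [L]
```

Consecutive RewriteCond lines are combined with AND by default, which is why parameter order and extra parameters such as dummy=1 do not matter.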

Specific .htaccess redirect based on query_string

I need the following logic to work. These URLs should be redirected so that users see the short form:
http://mysite.com/index.php?cl=mykeystring
to
http://mysite.com/otherkey/
http://mysite.com/index.php?myvar=test&cl=mykeystring&mysecondvar=morevalue
to
http://mysite.com/otherkey/myvar=test&mysecondvar=morevalue
But when http://mysite.com/otherkey/ is requested, it should load
http://mysite.com/index.php?cl=mykeystring internally, without any redirect.
Is this possible? I cannot change anything in the application code, only .htaccess.
This logic is nearly implemented by this code:
RewriteCond %{QUERY_STRING} ^(.*?)cl=mykeystring(.*?)$ [NC]
RewriteRule ^index.php$ /otherkey/%1%2? [R,L]
RewriteRule ^otherkey/(.*?)$ /index.php?cl=mykeystring&$1
but I'm getting some unwanted ampersands from the first rewrite rule. Any solutions?
I think you can do something like this:
RewriteCond %{QUERY_STRING} cl=mykeystring [NC]
RewriteRule ^index.php /otherkey/ [QSA]
Docs here: http://httpd.apache.org/docs/2.0/misc/rewriteguide.html
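One way to get both the external redirect and the internal rewrite without a loop is to test %{THE_REQUEST}, which holds the original request line and is not changed by internal rewrites. The following is a rough, untested sketch of that approach. It handles the position of cl explicitly (only, first, last, middle) so that no stray ampersands end up in the new URL, and it assumes cl appears at most once in the query string.

```apache
RewriteEngine On

# Redirect only when the client itself asked for index.php. THE_REQUEST keeps
# the original request line, so the internal rewrite at the bottom does not
# trigger these rules again.

# cl is the only parameter
RewriteCond %{THE_REQUEST} \s/index\.php\?
RewriteCond %{QUERY_STRING} ^cl=mykeystring$ [NC]
RewriteRule ^index\.php$ /otherkey/? [R=302,L]

# cl is the first parameter: keep everything after it
RewriteCond %{THE_REQUEST} \s/index\.php\?
RewriteCond %{QUERY_STRING} ^cl=mykeystring&(.+)$ [NC]
RewriteRule ^index\.php$ /otherkey/%1? [R=302,L,NE]

# cl is the last parameter: keep everything before it
RewriteCond %{THE_REQUEST} \s/index\.php\?
RewriteCond %{QUERY_STRING} ^(.+)&cl=mykeystring$ [NC]
RewriteRule ^index\.php$ /otherkey/%1? [R=302,L,NE]

# cl is in the middle: join both sides with a single "&"
RewriteCond %{THE_REQUEST} \s/index\.php\?
RewriteCond %{QUERY_STRING} ^(.+)&cl=mykeystring&(.+)$ [NC]
RewriteRule ^index\.php$ /otherkey/%1&%2? [R=302,L,NE]

# Internal rewrite (no R flag, so the browser keeps showing /otherkey/...)
RewriteRule ^otherkey/(.*)$ /index.php?cl=mykeystring&$1 [L]
```

The NE flag keeps percent-encoded characters from the original query string from being escaped a second time when they are moved into the path.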
