Restrict indexing and remove current indexes - .htaccess

I have looked through a few questions on this topic, but I am still not sure if I am getting this right.
I have got a php file which returns xml/json responses based on the GET parameters.
http://someDomain.com/get.php?param=option1
Google has indexed quite a few of those urls already.
As I understand I can restrict robots from indexing any further urls on someDomain.com by adding someDomain.com/robots.txt:
User-agent: *
Disallow: /
I understand that once robots.txt blocks crawling, search engines will no longer be able to see a noindex meta tag, so it cannot be used to remove the currently indexed URLs.
But get.php does not return any meta/head information anyway, because it only returns JSON/XML data.
So how can I get google to remove the already indexed urls from search results?

Try the following code in your .htaccess:
RewriteEngine On
#If user agent is "googlebot"
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
#And query string is "param=anychar"
RewriteCond %{QUERY_STRING} ^param=(.+)$ [NC]
#Then 301 redirect "get.php" to "/backwhole"
RewriteRule ^get\.php$ /backwhole [L,R=301]
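
Redirecting Googlebot is one option. Another approach, which is not part of the answer above, is to send the noindex directive as an HTTP response header instead of a meta tag, since get.php returns no HTML head. The sketch below assumes mod_headers is enabled and that the .htaccess file sits in the same directory as get.php. Google has to be able to crawl get.php to see the header, so robots.txt must not block it until the URLs have dropped out of the index.

```apache
# Assumption: mod_headers is available; the IfModule guard avoids a 500 error if it is not
<IfModule mod_headers.c>
    # Apply only to get.php in this directory
    <Files "get.php">
        # Ask search engines not to index responses from this script
        Header set X-Robots-Tag "noindex"
    </Files>
</IfModule>
```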

Related

How to remove GET parameters from url with htaccess?

My website does not use any GET parameters except on one page. Nonetheless, I can see that Google has managed to index a bunch of my pages with GET parameters. This is not great for SEO (duplicate content)...
So I'm trying to edit my .htaccess to 301 redirect every URL with GET parameters to the same URL without GET parameters (except for one URL). Some examples:
example.com/?foo=42 => example.com/
example.com/about?bar=42 => example.com/about
example.com/r.php?foobar=42 => the url r.php should keep the GET parameters
So far I'm trying to remove all GET parameters, and it doesn't work.
RewriteEngine On
RewriteRule ^(.*)\?(.*)$ http://www.example.com/$1 [L,NC,R=301]
Any idea how to fix that?
You cannot match the query string using RewriteRule; its pattern is only tested against the URL path.
You can use this generic rule to strip the query string from every request except those whose path contains a dot (such as r.php):
RewriteEngine On
RewriteCond %{QUERY_STRING} .
RewriteRule ^([^.]*)$ /$1? [L,NE,R=301]
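
On Apache 2.4 or later, the same redirect can also be written with the QSD (query string discard) flag instead of the trailing ? on the substitution. A minimal sketch, assuming Apache 2.4+ and the same dot-based exception as the rule above:

```apache
RewriteEngine On
# Only act when the query string is non-empty
RewriteCond %{QUERY_STRING} .
# Paths without a dot are redirected to themselves; QSD drops the query string
RewriteRule ^([^.]*)$ /$1 [L,NE,R=301,QSD]
```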

Redirect for unwanted GET params

A search bot is currently scanning pages on my site with a lot of strange GET params, for example ?x?, ?728%10%02, ?%18%9B%D9%DF%05 etc. I don't know where the bot found those URLs, but it is overloading my CPU because the cache system doesn't handle URLs with GET params.
I am not able to modify the cache system, so I want to redirect requests with GET params to the same URL without GET params through .htaccess. But I have some important GET params that shouldn't be redirected: ?s=... for site search, and UTM labels.
In summary I want to redirect the following urls
/some-url?x?
/some-url?728%10%02
/some-url?%18%9B%D9%DF%05
and many other URLs with GET params to
/some-url
But keep untouched urls like this:
/some-url?s=search_term or
/some-url?utm_campaign=my_campaign
If only a known set of GET parameters is allowed, you can check for them in your .htaccess file and redirect every request whose query string does not start with one of them.
RewriteEngine On
# check that there is indeed a query string
RewriteCond %{QUERY_STRING} ^.+$
# check that it doesn't start with one of allowed parameters
RewriteCond %{QUERY_STRING} !^(utm_campaign|s|other|parameters|list)= [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]
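
The condition above only inspects the first parameter. If an allowed parameter may appear later in the query string, or if all utm_* labels should pass, the check can be loosened. The following is a sketch under those assumptions; the utm_[a-z_]+ pattern is an illustrative guess at the labels in use, not something stated in the question.

```apache
RewriteEngine On
# Only act when a query string is present
RewriteCond %{QUERY_STRING} .
# Skip the redirect if s= or any utm_*= parameter appears anywhere in the query string
RewriteCond %{QUERY_STRING} !(^|&)(s|utm_[a-z_]+)= [NC]
# Otherwise redirect to the same path with the query string removed
RewriteRule ^(.*)$ /$1? [R=301,L]
```

Note that a request carrying one allowed parameter plus junk parameters is left untouched by this variant.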

301 Redirect a URL Containing a ? (question mark)

I am currently in the final stages of redeveloping a website, but I am having some trouble redirecting the old blog links to the new format.
We have inbound links to the old blog in the form of:
Index Page
http://www.domain_name.co.uk/blog-page/
Needs to become
http://www.domain_name.co.uk/news/
This is easy enough and has been done by using
RewriteRule ^blog-page$ /news/ [R=301,L]
Profile page
http://www.domain_name.co.uk/blog-page/index.php?/archives/1541-title-of-the-blog.html
The above needs to link to
http://www.domain_name.co.uk/news/1541-title-of-the-blog
However, the '?' in the middle of the URL structure appears to break my RewriteRule. I have read online about QUERY_STRING, but I do not believe this solves my issue, as there are no actual parameters passed through in the URL.
The below code works but passes through the '/?/archives/' info also.
RewriteRule ^blog-page/index.php(.*)$ /news/$1 [R=301,L]
Any help would be massively appreciated. There are several other sections of the previous site build which for some reason use the same URL structure.
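You will need an additional rule to match the query string, because everything after the '?' is treated as the query string even when it contains no name=value parameters. Have your DOCUMENT_ROOT/.htaccess like this: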
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} ^/archives/(.+?)\.html$ [NC]
RewriteRule ^blog-page/index\.php$ /news/%1? [R=301,NC,L]
RewriteRule ^blog-page/?$ /news/ [R=301,L,NC]

How to target urls with specific characters in a query string for htaccess redirects

I have a site that was converted to ExpressionEngine from a different blog platform, and I'm getting a bunch of crawl errors from previously indexed urls that now lead to an error page because ExpressionEngine doesn't allow certain characters in urls.
The urls that are causing the errors follow one of three patterns:
http://www.example.com/general/404/?404%3Bhttp://old.example.com:80/old-blog/random/segments
or
http://www.example.com/blog/?404%3Bhttp://old.example.com:80/old-blog/random/segments
or
http://www.example.com/blog/Default.aspx?404;http://old.example.com:80/old-blog/random/segments
I was able to redirect the urls from the third example using this code:
RewriteRule ^blog/Default.aspx?/?$ http://www.example.com/general/404/? [L,R=301]
Is there a way I can intercept the other URLs with .htaccess before they hit EE and redirect them to my 404 page, http://www.example.com/general/404/? I'm not sure how to target them specifically, since there is nothing before the ? in the query string URL segment.
Try:
RewriteCond %{QUERY_STRING} ^404(%3B|;) [NC]
RewriteRule ^ http://www.example.com/general/404/? [L,R=301]

I changed the structure of my site to reach index cards

Please excuse my English.
I run a brand directory website.
Previously, the brand pages were accessed with requests like this:
mydomain.com/fiche.php?id=115
where id is the id of the brand in my directory.
I have changed the structure of the brand pages and now use this request:
mydomain.com/annuaire.php?type=fiche&id_marq=115
where id has become id_marq.
I tried to use a RewriteRule like this:
RewriteRule ^fiche.php$ http://www.annuaire-sites-officiels.com/annuaire.php?detail=fiche&id_marq=$1 [L,QSA,R=301]
to redirect the old links to the new pages, but the result does not pass the id_marq value, and the URL becomes:
http://www.annuaire-sites-officiels.com/annuaire.php?detail=fiche&id_marq=&id=115
The &id= part is unwanted too.
What am I doing wrong?
Your rule does not evaluate the query string, and that's why it is not capturing the id query parameter.
Change your code to:
Options +FollowSymLinks -MultiViews
# Turn mod_rewrite on
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} ^id=([^&]+) [NC]
RewriteRule ^fiche\.php$ /annuaire.php?detail=fiche&id_marq=%1 [R=302,L,QSA,NC]
Once you verify it is working fine, replace R=302 with R=301. Avoid using R=301 (Permanent Redirect) while testing your mod_rewrite rules.
Check out Regex Back Reference Availability:
You have to capture the value from the query string. [QSA] passes the original query string forward unaltered, so unless you still use id for something, you don't need that flag. Your 301 redirect is correct, since this is a permanent redirect. Remember that if you add a faulty redirect, your browser may cache it, so a later fix might not appear to work.
In this pattern I only match digits, to prevent someone from passing something like an asterisk * and attempting an XSS exploit on your site.
I have not included any [NC] flags in my code, because when you allow multiple cases the variants can look like different URLs to search engines (bad for SEO).
RewriteCond %{QUERY_STRING} id=([0-9]+)
RewriteRule ^fiche\.php$ http://%{HTTP_HOST}/annuaire.php?detail=fiche&id_marq=%1 [R=301,L]
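
Putting the two answers together, one possible variant anchors the id parameter, accepts digits only, and leaves out QSA so the old id is not appended to the new URL. This is only a sketch under those assumptions: it keeps detail=fiche from the rules above and uses R=302 for testing, as advised earlier.

```apache
RewriteEngine On
RewriteBase /
# Match id as a whole parameter (at the start or after &), digits only; %1 holds the value
RewriteCond %{QUERY_STRING} (?:^|&)id=([0-9]+)(?:&|$)
# No QSA: the substitution's own query string replaces the original one
RewriteRule ^fiche\.php$ /annuaire.php?detail=fiche&id_marq=%1 [R=302,L]
```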

Resources