Can I 'noindex, follow' a specific page using X-Robots-Tag in .htaccess?
I've found some instructions for noindexing types of files, but I can't find instructions for noindexing a single page, and what I have tried so far hasn't worked.
This is the page I'm looking to noindex:
http://www.examplesite.com.au/index.php?route=news/headlines
This is what I have tried so far:
<FilesMatch "/index.php?route=news/headlines$">
Header set X-Robots-Tag "noindex, follow"
</FilesMatch>
Thanks for your time.
It seems to be impossible to match the request's query parameters from within a .htaccess configuration section. Here is a list of what you can match against: http://httpd.apache.org/docs/2.2/sections.html
It will be much easier to do it in your script. If you are running PHP, try:
header('X-Robots-Tag: noindex, follow');
You can easily build conditions on $_GET, REQUEST_URI and so on.
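For example, a minimal sketch (assuming the route parameter from the question; the header must be sent before any output):

if (isset($_GET['route']) && $_GET['route'] === 'news/headlines') {
    // tell crawlers not to index this page, but to follow its links
    header('X-Robots-Tag: noindex, follow');
}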
RewriteEngine on
RewriteBase /
#set env variable if url matches
RewriteCond %{QUERY_STRING} ^route=news/headlines$
RewriteRule ^index\.php$ - [env=NOINDEXFOLLOW:true]
#only send header if env variable is set
Header set X-Robots-Tag "noindex, follow" env=NOINDEXFOLLOW
FilesMatch works on (local) files, not URLs, so it would try to match only the /index.php part of the URL. <Location> would be more appropriate, but as far as I can tell from the documentation, query strings are not allowed there either. So I ended up with the above solution (I really liked this challenge). PHP would be the more obvious place to put this, but that is up to you.
The solution requires mod_rewrite and mod_headers, of course.
Note that you'll need the mod_headers module enabled to set the headers.
Though like others have said, it seems better to set it via PHP's header(). Does that not work?
According to Google, the syntax would be a little different:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
I have the below .htaccess code and am wondering how I can specify it to only apply to a specific folder, in this case only the image files in /public_html/img/icons/:
<FilesMatch "\.(png)$">
Header set Cache-Control "max-age=31536000, public"
</FilesMatch>
Set an environment variable (using SetEnvIf) when the requested URL matches the specific folder and image type(s), then set the header conditionally based on this env var (using the env= argument of the Header directive).
For example:
# Cache images in the "/img/icons" subdirectory
SetEnvIf Request_URI "^/img/icons/.+\.(png|jpg|gif)$" ENABLE_CACHE
Header set Cache-Control "max-age=31536000, public" env=ENABLE_CACHE
This method allows you to keep all the directives in a single .htaccess file in the document root. It also works on all versions of Apache (unlike <If> expressions, which require Apache 2.4).
However, this may also depend on other directives you have in the .htaccess file. If, for instance, you are on Apache (as opposed to LiteSpeed) and rewriting the URL with mod_rewrite (a front-controller pattern, perhaps), then you may need to test for REDIRECT_ENABLE_CACHE in the Header directive instead, or change your existing directives to prevent a loop of the rewrite engine, which is what causes the env var to be prefixed with REDIRECT_. See the sketch below.
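For example, a minimal sketch of that fallback (an assumption on my part that your URLs go through an internal rewrite, so the env var only survives with the REDIRECT_ prefix):

# after an internal rewrite, the env var carries the REDIRECT_ prefix
Header set Cache-Control "max-age=31536000, public" env=REDIRECT_ENABLE_CACHE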
My payment gateway is blocked by mod_security when trying to access a WooCommerce endpoint.
I'm receiving a 403 Permission Denied error when trying to access the "/wc-api/my_gateway_payment_callback" endpoint.
I'm on a LiteSpeed shared host.
When I disable mod_security from .htaccess:
<IfModule mod_security.c>
SecFilterEngine Off
SecFilterScanPOST Off
</IfModule>
it solves the issue, but it exposes the WordPress admin to attacks, so I want to be more specific.
I tried to add a LocationMatch:
<LocationMatch "/wc-api/my_gateway_payment_callback">
<IfModule mod_security.c>
SecRule REQUEST_URI "#beginsWith /wc-api/my_gateway_payment_callback/" \
    "phase:2,id:1000,nolog,pass, allow, msg:'Update URI accessed'"
</IfModule>
</LocationMatch>
or
<IfModule mod_security.c>
SecRule REQUEST_URI "#beginsWith /my_gateway_payment_callback" \
    "phase:2,id:1000,nolog,pass, allow, msg:'Update URI accessed'"
</IfModule>
but they don't work and I'm still getting the 403 error.
I can spot multiple problems here:
<IfModule mod_security.c>
SecFilterEngine Off
SecFilterScanPOST Off
</IfModule>
Are you really using ModSecurity v1? That is VERY old and suggests you are using Apache 1, as ModSecurity v1 is not compatible with Apache 2. If not, this should be:
<IfModule mod_security2.c>
SecRuleEngine Off
</IfModule>
Next you say:
it solves the issue but exposes Wordpress admin to attacks
I don't see how it can solve the issue unless you are on REALLY old software, so I suspect this is a red herring.
so I want to be more specific. I tried to add a LocationMatch
Good idea to be more specific. However, LocationMatch runs quite late in the Apache process, after the ModSecurity rules will have run, so this will not work. You don't really need LocationMatch anyway, since your rule already scopes itself to that location. So let's look at the next two pieces:
SecRule REQUEST_URI "#beginsWith /wc-api/my_gateway_payment_callback/" \
    "phase:2,id:1000,nolog,pass, allow, msg:'Update URI accessed'"
SecRuleRemoveById 3000
You shouldn't need to remove the rule if you allow it on the previous lines. Typically you would only do one or the other.
or
<IfModule mod_security.c>
SecRule REQUEST_URI "#beginsWith /my_gateway_payment_callback" \
"phase:2,id:1000,nolog,pass, allow, msg:'Update URI accessed'"
</IfModule>
but they don't work and I'm still getting the 403 error.
You have pass (which means continue on to the next rule) and allow (which means skip all future rules). It seems to me you only want the latter and not the former. As these conflict, I suspect ModSecurity will action the former first, hence why it is not working. A corrected version is sketched below.
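For example, a minimal sketch of the allow-only form (assumptions on my part: the operator is written @beginsWith with an @ rather than a #, which is the documented ModSecurity operator syntax, and your host's ModSecurity build permits SecRule in .htaccess, as LiteSpeed's generally does):

<IfModule mod_security2.c>
# allow the payment callback through, skipping the remaining rules
SecRule REQUEST_URI "@beginsWith /wc-api/my_gateway_payment_callback" \
    "phase:1,id:1000,nolog,allow,msg:'Payment callback allowed'"
</IfModule>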
However, the better way is to look at the Apache error logs to see which rule it is failing on (is it rule 3000, as per your other LocationMatch workaround?) and just disable that one rule rather than disabling all rules for that route.
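For example, a minimal sketch (assuming the error log names rule 3000 as the one blocking the callback, and again that your host allows ModSecurity directives in .htaccess):

<IfModule mod_security2.c>
# disable only the single offending rule
SecRuleRemoveById 3000
</IfModule>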
So all in all I'm pretty confused by your question, as there seem to be a lot of inconsistencies and things that are just wrong in there...
I'm using Joomla 3.4.5 with SEF and .htaccess with cache on.
The image URLs in modules and content are being incorrectly rendered. Sometimes they are correctly displayed, and other times the following happens:
Instead of rendering:
www.domain.com/images/image.jpg
It is rendering:
www.domain.com/menu_item/images/image.jpg
I'm not sure why this is happening and whether it is related to .htaccess, SEF, cache, or all of them.
I'm using the following custom redirects:
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]
RewriteEngine On
RewriteRule ^item/(.+)$ /artigos/$1 [R=301,L]
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
Header set Cache-Control "max-age=2592000, public"
</FilesMatch>
Which I tried to comment out to check if it helped, but no luck.
Help anyone?
It's about the image path.
for example:
/images/image1.jpg
refers to the image at /website-rootfolder/images/image1.jpg
but
images/image1.jpg
refers to: currentUrl/images/image1.jpg
So with a leading slash it means: load the image from the site root.
Without a slash it means: load the image relative to the current URL.
hope it helps ;)
So for your example:
www.domain.com/menu_item/images/image.jpg is what the relative path images/image.jpg resolves to on that page.
Just add the missing leading slash (/images/image.jpg) and the image on the page will be loaded from:
www.domain.com/images/image.jpg
In your backend, go to:
Components -> JCE Editor -> Editor Profiles -> Default-> Editor Parameters
and make sure that:
"File Directory Path" is set to "images" (without the quotes).
Save it and then clear your Joomla cache.
If that still doesn't work, then maybe the ordering of your plugins is wrong. Check this post we wrote about 4 years ago (it is very old, but it is still valid): http://www.itoctopus.com/images-not-appearing-on-your-joomla-website
Hope this helps!
Is it possible to apply HTTP header directives based on the URL's query string using an Apache .htaccess file?
For example, based on this resource http://code.google.com/web/controlcrawlindex/docs/robots_meta_tag.html under the section titled "Practical implementation of X-Robots-Tag with Apache" it says the following .htaccess file directive can be used:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
I'm looking for something along the lines of:
<QueryString ~ "m=_!">
Header set X-Robots-Tag "noindex, nofollow"
</QueryString>
This way the following URL would NOT get indexed by search engines:
http://domain.com/?m=_!ajax_html_snippet
Any hints/tips/clues would be much appreciated. Thanks.
You can try the following in your .htaccess file
#modify query string condition here to suit your needs
RewriteCond %{QUERY_STRING} (^|&)m=_\! [NC]
#set env var MY_SET_HEADER to 1
RewriteRule .* - [E=MY_SET_HEADER:1]
#if MY_SET_HEADER is present then set header
Header set X-Robots-Tag "noindex, nofollow" env=MY_SET_HEADER
I was told that this is the right way to redirect anyone who is trying to open:
/users/username/something.txt
But I can't seem to get it to work.
RewriteEngine on
RewriteRule \.txt$ /notallowed.html [F,L,NC]
Is this wrong?
The simplest way to deny users access to all TXT files would be to use something like:
<FilesMatch "\.(txt)$">
Order Allow,Deny
Deny from all
</FilesMatch>
However, the code you have there should work for all intents and purposes. Depending on your server configuration, you may need to add "Options +FollowSymLinks".
If you decide to go the FilesMatch route, you can use ErrorDocument to control what page the user is taken to.
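For example, a minimal sketch (assuming a /notallowed.html page exists at the document root; on Apache 2.4 the Order/Deny pair could instead be written as "Require all denied"):

<FilesMatch "\.txt$">
Order Allow,Deny
Deny from all
</FilesMatch>
# show the custom page for 403 (Forbidden) responses
ErrorDocument 403 /notallowed.html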