Regarding .htaccess and robots.txt

Hi, this is my first question on Stack Overflow; please can you help? It concerns .htaccess and robots.txt files. In October I turned what was previously a non-WordPress site into a WordPress website. I built the new site on a subdomain of the existing site so the live site could remain live while I built the new one.
The site I built on the subdomain is now live, but I am concerned about the old .htaccess and robots.txt files and whether I should delete them. I created new .htaccess and robots.txt files on the new site and have left the old .htaccess files in place. I should mention that all the old content files are still sitting on the server in a folder called 'old files', so I am assuming these aren't affecting matters. Here are the contents of each file:
I access the .htaccess and robots.txt files by opening 'public_html' via FTP in FileZilla. The site I built (.htaccess details below). W3TC is a WordPress caching plugin which I installed just a few days ago, so I am not asking anything here about W3TC itself:
# BEGIN W3TC Browser Cache
<IfModule mod_deflate.c>
<IfModule mod_headers.c>
Header append Vary User-Agent env=!dont-vary
</IfModule>
<IfModule mod_filter.c>
AddOutputFilterByType DEFLATE text/css text/x-component application/x-javascript application/javascript text/javascript text/x-js text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon application/json
<IfModule mod_mime.c>
# DEFLATE by extension
AddOutputFilter DEFLATE js css htm html xml
</IfModule>
</IfModule>
</IfModule>
# END W3TC Browser Cache
# BEGIN W3TC CDN
<FilesMatch "\.(ttf|ttc|otf|eot|woff|font.css)$">
<IfModule mod_headers.c>
Header set Access-Control-Allow-Origin "*"
</IfModule>
</FilesMatch>
# END W3TC CDN
# BEGIN W3TC Page Cache core
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteRule .* - [E=W3TC_ENC:_gzip]
RewriteCond %{HTTP_COOKIE} w3tc_preview [NC]
RewriteRule .* - [E=W3TC_PREVIEW:_preview]
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} =""
RewriteCond %{REQUEST_URI} \/$
RewriteCond %{HTTP_COOKIE} !(comment_author|wp\-postpass|w3tc_logged_out|wordpress_logged_in|wptouch_switch_toggle) [NC]
RewriteCond "%{DOCUMENT_ROOT}/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_PREVIEW}.html%{ENV:W3TC_ENC}" -f
RewriteRule .* "/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_PREVIEW}.html%{ENV:W3TC_ENC}" [L]
</IfModule>
# END W3TC Page Cache core
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
I have 7 redirects in place to new page URLs. I have no issue with these; I have tested them and each one works.
#Force non-www:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.websiteurl\.co.uk [NC]
RewriteRule ^(.*)$ http://websiteurl/$1 [L,R=301]
The previous site (htaccess for the old site):
Deny from all
The site I built (Robots.txt):
User-agent: *
Disallow: /wp-admin/
Sitemap:
http://websitehomepageurl/sitemap_index.xml
The previous site (Robots.txt):
User-agent: *
Disallow:
Please can you assist? I'd really appreciate your time.
Thanks a lot.

Remove the old robots.txt and .htaccess.

Hi, thanks for the somewhat minimal response. I got help elsewhere. I added a robots.txt file to the development site so bots aren't allowed to crawl it, and I set up a redirect for all attachments to their original page. All other files are in place, so I will leave it there. To the person who did reply: thanks, but saying that all I had to do was delete the old robots.txt and .htaccess files was incorrect, because they are still needed in the grand scheme of things. Stack Overflow has a really good reputation online, so when helping others, try to explain the logic behind your advice so that they can understand it. I am glad I did not take your advice, because I could have been looking at a larger problem to fix. Have a good day.
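For anyone in a similar position: a development subdomain can be kept out of search engines with a robots.txt along these lines. This is a minimal sketch of the "bots aren't allowed" setup described above, not the OP's actual file:

```text
# robots.txt for a development subdomain: ask all crawlers to stay out.
User-agent: *
Disallow: /
```

Note that this only discourages well-behaved crawlers; it does not password-protect the development site.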

A little follow-up tip: in addition to blocking content via robots.txt, I would suggest that you use the following ON EACH PAGE:
<meta name="robots" content="noindex,noarchive,nofollow">
The reason is that some bots do not take the robots.txt content into account.
Also, I would NEVER allow people or bots to see old .htaccess files! You risk serious security issues if people can read your .htaccess content.
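Since a meta tag only works for HTML pages, a complementary option (assuming mod_headers is available on the host) is to send the equivalent HTTP header from .htaccess, which also covers PDFs and other non-HTML files. A sketch:

```apache
# Sketch: ask well-behaved crawlers not to index anything served
# from this (old or development) location. Requires mod_headers.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, noarchive, nofollow"
</IfModule>
```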

Related

Htaccess order of mod modules

I'm writing a script that will alter the root's .htaccess file to block user agents (Yandex in this example). I wrote it using the SetEnvIf directive, but it wasn't working. Via another post here I was told this was because a redirect earlier in the file caused it to fail: the redirect triggers a new request, which causes the .htaccess file to be processed again. I added a RewriteCond statement to do the block instead, and it worked. If I place the SetEnvIf statement at the top of the file, it also works correctly.
In researching this, others have stated that mod_rewrite is handled before mod_alias, but I can't find any mention of when mod_setenvif is handled. There also seems to be little information about the ordering of things in a .htaccess file in general. Below is an example of what is in my .htaccess file; even as a novice I can see it is not well ordered.
But before I deal with that, I wanted to ask where the proper place for the SetEnvIf statement is.
And am I correct in assuming that any SetEnvIf, RewriteCond, or Deny statement that causes a block or a redirect to another site should be placed at the top?
Options +FollowSymLinks -Indexes
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "WOW32" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "WOW64" [NC]
RewriteRule ^(.*)$ - [F]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://example.com/$1 [R,L]
redirect 301 /about_us.php https://example.com/about.html
<Files test.html>
order allow,deny
</Files>
<FilesMatch "\.(inc|tpl|h|ihtml|sql|ini|conf|class|bin|spd|theme|module|exe)$">
</FilesMatch>
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access plus 1 year"
</IfModule>
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE application/javascript
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
Header append Vary User-Agent
</IfModule>
ErrorDocument 400 /server_error.php?id=400
order allow,deny
deny from 1.2.3.4/32
allow from all
SetEnvIfNoCase User-Agent "^yandex$" my_block
<IfModule !authz_core_module>
Order Allow,Deny
Allow from ALL
Deny from env=my_block
</IfModule>
RewriteCond %{HTTP_USER_AGENT} ^.*(yandex).*$ [NC,OR]
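For what it's worth, directives in a .htaccess file are collected per module, and each module runs in its own processing phase (mod_setenvif early, then mod_rewrite, then mod_alias), so cross-module position in the file matters less than readability. Within mod_rewrite, however, rules do run top to bottom, so blocking rules should come before redirects; a redirect fires a new request that re-reads the file. A reordered sketch of the blocking portion of the file above, under those assumptions:

```apache
# Blocks first. SetEnvIf runs in an early phase regardless of its
# position, but keeping it at the top makes the intent clear.
SetEnvIfNoCase User-Agent "yandex" my_block
<IfModule !authz_core_module>
    Order Allow,Deny
    Allow from all
    Deny from env=my_block
</IfModule>

# mod_rewrite blocks next, before any redirect rules, so a blocked
# client is refused before a redirect re-triggers .htaccess processing.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "WOW32" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "WOW64" [NC]
RewriteRule ^ - [F]

# Redirects only after the blocks.
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://example.com/$1 [R,L]
```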

How do you rewrite http to https for an addon domain in htaccess

I have a cPanel hosting account at GoDaddy with an SSL certificate for an addon domain. I need to rewrite HTTP to HTTPS. The problem is that every rewrite method I try loads content from the webroot, not from the folder containing the index file, or even from the .htaccess that holds the rewrite rule. I know there are already topics about this on this forum, but I have tried the suggested solutions (that is how I arrived at the code quoted below) and they did not work. Please do not arbitrarily close this thread.
webroot (loading this content / not desired)
|
target_directory (htaccess & SSL in question)
The following is the complete htaccess at the time of this post
RewriteEngine On
RewriteBase /
RewriteCond %{HTTPS} off
#RewriteRule (.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R,L]
RewriteRule (.*)$ https://emeraldcoast\.rocks/$1 [R,L]
RewriteRule ^([a-zA-Z0-9\-_]+)$ index.php?page=$1
RewriteRule ^([a-zA-Z0-9\-_]+)/$ index.php?page=$1
<filesMatch ".(xml|txt|html|js|css)$">
ExpiresDefault A7200
Header append Cache-Control "max-age=290304000, public"
</filesMatch>
<filesMatch ".(xml|txt|html|php)$">
AddDefaultCharset utf-8
</filesMatch>
<ifmodule mod_deflate.c>
AddOutputFilterByType DEFLATE text/text text/html text/plain text/xml text/css application/x-javascript application/javascript
</ifmodule>
Using a control panel, especially cPanel, takes control of the server and configures things quite differently from a standard setup. In any case, you may need to add a RewriteBase for your addon domain. Inside the .htaccess for the addon domain, change the RewriteBase to your addon folder:
RewriteBase /target_directory
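Putting that together, a minimal .htaccess for the addon folder might look like the following. This is a sketch assuming the folder is named target_directory, as in the question's diagram:

```apache
RewriteEngine On
RewriteBase /target_directory

# Redirect HTTP to HTTPS for the addon domain, preserving host and path.
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

Using %{HTTP_HOST}%{REQUEST_URI} in the substitution avoids hard-coding the domain, so the rule keeps working if the addon domain is reached under more than one name.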

301 redirect old directory contents to new directory

My site used to have lots of company listings under:
www.example.com/Directory <!-- old directory homepage
www.example.com/Directory/viewprofile.php?id=1599&company=test+company
I've now changed the entire CMS. The new profiles are as follows now:
www.example.com/directory <!-- directory homepage
www.example.com/listing/listing/rennicks-mts/ <!-- individual listing page
1. I just want to get an idea of whether my 301 redirect is correct for this new setup.
Redirect 301 ^/Directory http://www.highwaysindustry.com/directory/
Also, are my redirects meant to be wrapped in anything first? The entire file is as follows:
# BEGIN W3TC Browser Cache
<IfModule mod_deflate.c>
<IfModule mod_headers.c>
Header append Vary User-Agent env=!dont-vary
</IfModule>
AddOutputFilterByType DEFLATE text/css text/x-component application/x-javascript application/javascript text/javascript text/x-js text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon application/json
<IfModule mod_mime.c>
# DEFLATE by extension
AddOutputFilter DEFLATE js css htm html xml
</IfModule>
</IfModule>
# END W3TC Browser Cache
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteRule ^([_0-9a-zA-Z-]+/)?uploads/wpjobboard/application/(.+) /wp-content/plugins/wpjobboard/restrict.php?url=application/$2 [QSA,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
Redirect 301 ^/Directory http://www.newwebsite.com/directory/
Redirect 301 /About-Us http://www.newwebsite.com/about
Redirect 301 /Uploads http://www.newwebsite.com/
Redirect 301 /News/?page=1 http://www.newwebsite.com/
Redirect 301 /News/?page=2 http://www.newwebsite.com/
Redirect 301 /News/?page=3 http://www.newwebsite.com/
Redirect 301 /News/?page=4 http://www.newwebsite.com/
Redirect 301 /News/?page=5 http://www.newwebsite.com/
Redirect 301 /News/?page=6 http://www.newwebsite.com/
2. All my redirects just start immediately; should they be wrapped in any other tags?
**The domain is the same after the CMS change; I did not change domain.
In this instance, you can't use Redirect directives along with your mod_rewrite rules. Redirect is part of mod_alias, and although both modules get applied to every request, mod_rewrite is processed before mod_alias regardless of where the directives sit in the file. Thus you'll end up with some weird results. Additionally, you can't match against the query string (e.g. the ?page= stuff) in a Redirect; you need to test the %{QUERY_STRING} variable in a RewriteCond.
Try something like this:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^Directory http://www.newwebsite.com/directory/ [L,R=301]
RewriteRule ^About-Us http://www.newwebsite.com/about [L,R=301]
RewriteRule ^Uploads http://www.newwebsite.com/ [L,R=301]
RewriteCond %{QUERY_STRING} ^page=[1-6]$
RewriteRule ^News/$ http://www.newwebsite.com/ [L,R=301]
RewriteRule ^index\.php$ - [L]
RewriteRule ^([_0-9a-zA-Z-]+/)?uploads/wpjobboard/application/(.+) /wp-content/plugins/wpjobboard/restrict.php?url=application/$2 [QSA,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
And get rid of all those Redirect directives at the end.

Removing index.php from Oxwall URLs

I've just installed Oxwall (oxwall.org) on my shared 1and1 host and would like to remove "index.php" from all URLs.
Essentially PHP needs to see this
http://site.com/index.php/Page/Param1/Param2
While I want the user to see this
http://site.com/Page/Param1/Param2
I understand this should be done using the .htaccess file, but I don't know how to do it...
Currently, .htaccess file looks like this:
Options +FollowSymLinks
RewriteEngine On
AddEncoding gzip .gz
AddEncoding gzip .gzip
<FilesMatch "\.(js.gz|js.gzip)$">
ForceType text/javascript
</FilesMatch>
<FilesMatch "\.(css.gz|css.gzip)$">
ForceType text/css
</FilesMatch>
RewriteCond %{HTTP_HOST} !oxwall.(host).com$
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteCond %{REQUEST_URI} !/ow_updates/index\.php
RewriteCond %{REQUEST_URI} !/ow_updates/
RewriteCond %{REQUEST_URI} !/ow_cron/run\.php
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.xml|\.feed|robots\.txt|\.raw|/[^.]*)$ [NC]
RewriteRule (.*) index.php
Any suggestions?
I've found similar questions on Stack Overflow, but I wasn't able to apply the proposed fixes to my .htaccess file...
Thank you very much for your time and support!
Found the solution!
I simply used the .htaccess code mentioned in this topic: http://www.oxwall.org/forum/topic/7924
Hope this helps others!
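The linked thread isn't reproduced here, but the usual pattern for hiding index.php behind a front controller is a fall-through rewrite like the following. This is a generic sketch of the technique, not Oxwall's exact shipped file:

```apache
Options +FollowSymLinks
RewriteEngine On

# Send every request that is not an existing file or directory
# to index.php, which then parses the path (e.g. /Page/Param1/Param2).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* index.php [L]
```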

<filesmatch> for only one page

I need only the home page to use the following FilesMatch, so that .pdf files linked there prompt before opening, while other .pdf files on the site (not on the home page) open automatically. Is this the best way to accomplish this?
This is a WordPress site (latest version), and the home page is a static page. Also, the owner of the site may add more .pdf links within the site, so I don't want to complicate things to the point where she has to edit code every time she adds a file.
Here is what is currently in my .htaccess file:
# Use PHP5 as default
AddHandler application/x-httpd-php5 .php
ErrorDocument 404 /index.php
ErrorDocument 403 /index.php
#BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
RewriteEngine On
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.dramanotebook.com/$1 [R,L]
<Files 403.shtml>
order allow,deny
allow from all
</Files>
<FilesMatch "\.pdf$">
ForceType application/octet-stream
Header set Content-Disposition attachment
</FilesMatch>
# END WordPress
Thanks in advance,
Jim
I was able to accomplish this by using <Files></Files>.
example:
<Files "filenameexample.pdf">
ForceType application/octet-stream
Header set Content-Disposition attachment
</Files>
The trick is that you need a separate <Files></Files> block for each file you want to control individually. There is no option to say "protect all PDFs on this page, but do something else on other pages." Actually, you could, but you would have to put an .htaccess file in each folder to revert the behaviour for those extensions, which is crazy; it was faster just to handle the three files I needed specifically.
I found none of the information online helpful, nor any of the responses; I just tested a lot.
Jim
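If the list of home-page PDFs grows, the separate <Files> blocks can also be collapsed into a single FilesMatch with an alternation. The filenames below are made up for illustration:

```apache
# One block covering several specific PDFs; extend the alternation
# as new files need the download prompt.
<FilesMatch "^(brochure|pricelist|newsletter)\.pdf$">
    ForceType application/octet-stream
    Header set Content-Disposition attachment
</FilesMatch>
```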
