I have a WordPress website with the basic URL structure, where the keyword separator is -. The problem is that the pages I create can be reached using either the - or the + symbol in the URL.
That is, I can access the same page at mydomain.com/example-page/ and mydomain.com/example+page/. I know that this is harmful for SEO, so my question is: is it possible to set, via htaccess, a noindex, nofollow directive for all pages that use the + separator in the URL?
If you have a better solution, I would be grateful!
You can use (before your actual htaccess code):
RewriteEngine on
# executes repeatedly as long as there is more than one space in the URI
RewriteRule "^(\S*)\s+(\S*\s.*)$" /$1-$2 [L,NE]
# executes when there is exactly one space in the URI
RewriteRule "^(\S*)\s(\S*)$" /$1-$2 [L,R=302,NE]
This redirects the example+page version to example-page.
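To see how the two patterns cooperate, here is a minimal Python sketch that applies them to a path containing literal spaces. It assumes Python's re module behaves like PCRE for these simple patterns; hyphenate is a hypothetical helper name, not part of Apache.

```python
import re

# The two patterns from the rules above, applied to a URL-path
# without the leading slash (as mod_rewrite sees it in .htaccess).
MANY_SPACES = re.compile(r"^(\S*)\s+(\S*\s.*)$")
ONE_SPACE = re.compile(r"^(\S*)\s(\S*)$")


def hyphenate(path):
    """Hypothetical helper: mimic the repeated rewrite, then the final redirect."""
    # First rule: fires while at least two separate spaces remain.
    while (m := MANY_SPACES.match(path)):
        path = f"{m.group(1)}-{m.group(2)}"
    # Second rule: replaces the single remaining space, if any.
    m = ONE_SPACE.match(path)
    if m:
        path = f"{m.group(1)}-{m.group(2)}"
    return path


print(hyphenate("my example page"))
```

Each pass of the first rule removes one run of whitespace, so the loop always terminates; only the second rule carries the R flag, so the browser sees a single redirect.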
Hoping this isn't a duplicate; I've done a lot of looking and I just get more confused, as I don't use .htaccess often.
I would like to have some pretty URLs, and I see lots of help on the case where, for example, index.php is passed a parameter such as page. So I can currently convert www.example.com/index.php?page=help to www.example.com/help.
Obviously I'm not clued up on this, but I would also like to handle a URL such as www.example.com/?page=help.
I can't seem to find much info, and in adapting the original I am obviously going wrong somewhere.
Any help or pointers in the right direction would be greatly appreciated. I'm sure it's probably stupidly simple.
My alterations so far which do not seem to work are:
RewriteCond %{THE_REQUEST} ^.*/?page=$1
RewriteRule ^(.*)/+page$ /$1[QSA,L]
I also recently tried QUERY_STRING, but that just gives a server error.
RewriteCond %{QUERY_STRING} ^page=([a-zA-Z]*)
RewriteRule ^(.*) /$1 [QSA,L]
I've given up, so I thought I would ask. I want to make sure the request URL starts with ?page and then build a clean URL from the page parameter.
This is the whole/basic process...
1. HTML Source
Make sure you are linking to the "pretty/canonical" URL in your HTML source. This should be a root-relative URL starting with a slash (or absolute), in case you rewrite from different URL path depths later. For example:
<a href="/help">Help Page</a>
2. Rewrite the "pretty" URL
In .htaccess (using mod_rewrite), internally rewrite the "pretty" URL back to the file that actually handles the request, ie. the "front-controller" (eg. index.php, passing the page URL parameter if you wish). For example:
DirectoryIndex index.php
RewriteEngine On
# Rewrite URL of the form "/help" to "index.php?page=help"
RewriteRule ^[^.]+$ index.php?page=$0 [L]
The RewriteRule pattern ^[^.]+$ matches any URL-path that does not include a dot. By excluding a dot we can easily omit any request that would map to a physical file (that includes a file extension delimited by a dot).
The $0 backreference contains the entire URL-path that is matched by the RewriteRule pattern.
The DirectoryIndex is required when the "homepage" (root-directory) is requested, when the URL-path is otherwise empty. In this case the page URL parameter is not passed to our script.
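As an illustration of what the pattern does and does not match, here is a small Python sketch. It assumes Python's re module behaves like PCRE for this pattern; rewrite is a hypothetical helper that mimics the RewriteRule, with group 0 standing in for the $0 backreference.

```python
import re

# Same pattern as in the RewriteRule above: one or more non-dot characters.
PRETTY = re.compile(r"^[^.]+$")


def rewrite(url_path):
    """Hypothetical helper: return the internal rewrite target, or None."""
    m = PRETTY.match(url_path)
    if m is None:
        # Contains a dot (e.g. a static file) or is empty (the homepage).
        return None
    # m.group(0) is the whole match, like $0 in mod_rewrite.
    return f"index.php?page={m.group(0)}"


for path in ["help", "photo/cat", "style.css", ""]:
    print(repr(path), "->", rewrite(path))
```

The empty path is not matched, which is why the DirectoryIndex directive is still needed for the homepage.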
3. Implement the front-controller / router (ie. index.php)
In index.php (your "front-controller" / router) we read the page URL parameter and serve the appropriate content. For example:
<?php
$pages = [
'home' => '/content/homepage.php',
'help' => '/content/help-page.php',
'about' => '/content/about-page.php',
'404' => '/content/404.php',
];
// Default to "home" if "page" URL param is omitted or is empty
$page = empty($_GET['page']) ? 'home' : $_GET['page'];
// Default to 404 "page" if not found in the array/DB of pages
$handler = $pages[$page] ?? $pages['404'];
include($_SERVER['DOCUMENT_ROOT'].$handler);
As seen in the above script, the actual "content" is stored in the /content subdirectory. (This could also be a location outside of the document root.) By storing these files in a separate directory they can be easily protected from direct access.
4. Redirect the "old/ugly" URL to the "new/pretty" URL [OPTIONAL]
This is only strictly necessary (in order to preserve SEO) if you are changing an existing URL structure and the "old/ugly" (original) URLs have been exposed (indexed by search engines, linked to by third parties, etc.). Without the redirect, the "old" URL (ie. /index.php?page=abc) remains accessible alongside the new one. This is the same whenever you change an existing URL structure.
If the site is new and you are implementing the "new/pretty" URLs from the start then this is not so important, but it does prevent users from accessing the old URLs if they were ever exposed/guessed.
The following would go before the internal rewrite and after the RewriteEngine directive. For example:
# Redirect "old" URL of the form "/index.php?page=help" to "/help"
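RewriteCond %{ENV:REDIRECT_STATUS} ^$
# The QUERY_STRING check comes first so that %1 holds the "page" value when present
RewriteCond %{QUERY_STRING} ^page=([^.&]*) [OR]
RewriteCond %{REQUEST_URI} ^/index\.php$
# QSD (Apache 2.4+) discards the original query string from the redirect target
RewriteRule ^(index\.php)?$ /%1 [QSD,R=301,L]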
The check against the REDIRECT_STATUS environment variable prevents a redirect-loop by not redirecting requests that have already been rewritten by the later rewrite.
The %1 backreference contains the value of the page URL parameter, as captured by the CondPattern of the QUERY_STRING condition (RewriteCond directive). (Note how this is different to the $n backreference as used in the rewrite above.)
The above redirects all URL variants both with/without index.php and with/without the page URL parameter. For example:
/index.php?page=help -> /help
/?page=help -> /help
/index.php -> / (homepage)
/?page= -> / (homepage)
TIP: Test first with 302 (temporary) redirects to prevent potential caching issues.
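The intended mapping listed above can be modelled in a few lines of Python, which is handy for checking edge cases before touching the server. This is only a sketch of the intended behaviour, not of how mod_rewrite evaluates conditions; old_url_redirect is a hypothetical name.

```python
import re
from typing import Optional

# Same capture as the QUERY_STRING condition: the "page" value, no dots or "&".
PAGE_PARAM = re.compile(r"^page=([^.&]*)")


def old_url_redirect(path: str, query: str) -> Optional[str]:
    """Return the redirect target for an old-style URL, or None if none applies."""
    # The rule only applies to the document root, with or without index.php.
    if path not in ("/", "/index.php"):
        return None
    m = PAGE_PARAM.match(query)
    # A bare "/" with no "page" parameter is already canonical.
    if m is None and path != "/index.php":
        return None
    page = m.group(1) if m else ""
    return "/" + page


for path, query in [("/index.php", "page=help"), ("/", "page=help"),
                    ("/index.php", ""), ("/", "page="), ("/", "")]:
    print(path, query, "->", old_url_redirect(path, query))
```

Note that a plain request for / returns None here; redirecting it would send the homepage into a loop.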
Comments / improvements / Exercises for the reader
The above does not handle additional URL parameters. You can use the QSA (Query String Append) flag on the initial rewrite to append additional URL parameters on the initially requested URL. However, implementing the reverse redirect is not so trivial.
You don't need to pass the page URL parameter in the rewrite. The entire (original) URL is available in the PHP superglobal $_SERVER['REQUEST_URI'] (which also includes the query string - if any). You can then parse this variable to extract the required part of the URL instead of relying on the page URL parameter. This generally allows greatest flexibility, without having to modify .htaccess later.
However, being able to pass a page URL parameter can be "useful" if you ever want to manually rewrite (override) a URL route using .htaccess.
Incorporate regex (wildcard pattern matching) in the "router" script so you can generate URLs with "parameters". eg. /<page>/<param1>/<param2> like /photo/cat/large.
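As a rough sketch of the last two ideas (parsing the full request URI instead of relying on the page parameter, and matching routes with regex), here is the same logic in Python rather than PHP. The route table, the handler names and the subject/size parameter names are all made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical route table: (compiled pattern, handler name).
ROUTES = [
    (re.compile(r"^/$"), "home"),
    (re.compile(r"^/help$"), "help"),
    (re.compile(r"^/photo/(?P<subject>[^/]+)/(?P<size>[^/]+)$"), "photo"),
]


def route(request_uri):
    """Map a raw request URI (path plus optional query string) to a handler."""
    # Strip the query string, as you would when parsing REQUEST_URI.
    path = urlsplit(request_uri).path
    for pattern, handler in ROUTES:
        m = pattern.match(path)
        if m:
            return handler, m.groupdict()
    # Nothing matched: fall back to the 404 handler.
    return "404", {}


print(route("/photo/cat/large?ref=x"))
```

Because the router sees the whole path, new URL shapes only need a new entry in the table, not a change to .htaccess.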
Reference:
https://httpd.apache.org/docs/2.4/rewrite/
https://httpd.apache.org/docs/2.4/rewrite/intro.html
https://httpd.apache.org/docs/2.4/mod/mod_rewrite.html
RewriteCond %{QUERY_STRING} ^page=([^&]+)
RewriteRule ^$ /%1? [R=302,L]
I can't delete the question and didn't want to waste anyone's time responding.
I moved an old website to a new CMS, and some content has links that cannot be found on the new system.
For instance, in the old system there was a link https://example.com/newsletter/1234/123.
In the new one I do not need those IDs at the end. Therefore I would like to redirect the user directly to
https://example.com/newsletter/.
I wrote the following in my htaccess file:
RewriteRule ^(.*newsletter)(.*) $1 [L,R=301]
Unfortunately, this gives me a "too many redirects" error.
Can anyone point out the error that I made?
I planned to use the regex to capture all links that contain the name "newsletter" and remove the rest of the url to redirect the user. Any help is appreciated thanks.
Using (.*newsletter)(.*) causes the rule to match /newsletter on its own, because .* matches zero or more characters. The redirect target therefore matches the rule again, which produces a redirect loop.
Instead, you can use .+ which matches one or more characters, and to that I would prepend /? to optionally match a trailing slash. There is no need to capture it in () because you do not reuse it as $2.
RewriteRule ^(.*newsletter)/?.+ $1 [L,R=301]
Your example began with .*. If you actually need that, leave it in. But if you are really just trying to match /newsletter/123/1234 and not /someother/newsletter/123/1234, simplify it with:
RewriteRule ^newsletter/?.+ /newsletter [L,R=301]
When you test this, use a private/incognito browsing window or a fresh browser. Browsers aggressively cache 301 redirects and it can make it very hard to debug rewrite rules if you are fighting against the cache.
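If you want to convince yourself of the difference without a server, the following Python sketch follows the redirects for both patterns. It assumes Python's re module behaves like PCRE here; follow and the hop limit of 5 are made up for the demonstration.

```python
import re

LOOPING = re.compile(r"^(.*newsletter)(.*)")  # original pattern
FIXED = re.compile(r"^(.*newsletter)/?.+")    # requires something after it


def follow(pattern, path, limit=5):
    """Apply the redirect repeatedly; return (final_path, number_of_hops)."""
    hops = 0
    while hops < limit:
        m = pattern.match(path)
        if m is None:
            break
        # The redirect target is $1, i.e. the first capture group.
        path = m.group(1)
        hops += 1
    return path, hops


print(follow(LOOPING, "newsletter/1234/123"))
print(follow(FIXED, "newsletter/1234/123"))
```

The original pattern keeps matching its own target until the hop limit is reached, while the fixed one stops after a single redirect.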
My website is not stripping the text in * condition.
Website URL is like http://www. domain.com/search/adasd
I want to redirect it to https://www. domain.com/noindex-page
Using below redirect rule
RewriteRule ^search/\*$ https://www. domain.com/noindex-page? [L,R=301]
but it is redirecting to https://www. domain.com/noindex-pageadasd and giving 404 not found.
Please suggest how to strip adasd from this.
Using * is necessary as there are number of URLs with this prefix.
Based on the samples you have shown, please try the following. I believe the problem is the \* in your regex: an escaped * matches only a literal asterisk, so the rule never matches a URL such as /search/adasd. Since you want to match anything after the search keyword, you can simply check whether the URI starts with search.
RewriteEngine ON
RewriteRule ^search https://www.yourdomain.com/noindex-page? [L,R=301,NC]
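A quick way to check the two patterns side by side is a small Python sketch like the one below. It assumes Python's re module behaves like PCRE for these patterns; matches is a hypothetical helper, and re.IGNORECASE stands in for the NC flag.

```python
import re

ESCAPED = re.compile(r"^search/\*$")              # pattern from the question
PREFIX = re.compile(r"^search", re.IGNORECASE)    # pattern from the answer


def matches(pattern, url_path):
    """Hypothetical helper: would the rule fire for this URL-path?"""
    return pattern.match(url_path) is not None


for path in ["search/adasd", "search/*", "Search/other", "research"]:
    print(path, matches(ESCAPED, path), matches(PREFIX, path))
```

Be aware that ^search also matches paths such as searching; use ^search/ if only the directory should be redirected.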
Somehow I had an invalid directory indexed in Google, and because of some dynamic relative links I now have 2500 "missing" pages indexed. I'm trying to use an .htaccess 301 redirect to correct the problem, but I can't seem to get it to work. I need to redirect www.domain.com/shop/pc/.../pc/filename.asp to www.domain.com/shop/pc/filename.asp.
The rule I have written that doesn't want to work is RewriteRule ^shop/pc/\.\.\./pc/(.*)$ /shop/pc/$1 [R=301,L]
Any thoughts?
mod_rewrite uses PCRE, so you can match these Unicode characters by their UTF-8 byte sequences (I included the two dot leader as well, since I imagine that is more likely to sneak into a URL than an ellipsis):
# U+2026 … \xe2\x80\xa6 HORIZONTAL ELLIPSIS
RewriteRule ^shop/pc/\xe2\x80\xa6/pc/(.*)$ /shop/pc/$1 [R=301,L]
# U+2025 ‥ \xe2\x80\xa5 TWO DOT LEADER
RewriteRule ^shop/pc/\xe2\x80\xa5/pc/(.*)$ /shop/pc/$1 [R=301,L]
Note you may need the [B] flag (see flags) if the browser is percent-escaping the ellipsis.
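If you want to double-check the byte sequences, or see what a percent-escaped request would look like, a short Python sketch does the job. utf8_escape is a hypothetical helper name; quote is the standard urllib.parse function.

```python
from urllib.parse import quote


def utf8_escape(ch):
    """Return the UTF-8 bytes of a character as \\xNN escapes."""
    return "".join(f"\\x{b:02x}" for b in ch.encode("utf-8"))


for ch in ["\u2026", "\u2025"]:
    # Print the code point, the regex-style escape and the percent-encoded form.
    print(f"U+{ord(ch):04X}", utf8_escape(ch), quote(ch))
```

The percent-encoded form is what a browser may send on the wire, which is the situation the note about escaping refers to.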
I'm trying to use a RewriteRule (using ISAPI, NOT on an Apache server) to 301 redirect a url such as:
http://www.mydomain.com/news/story-title/
to
http://www.mydomain.com/news/detail/story-title/
What I've gotten so far is:
RewriteRule ^news/(?!detail)/?$ news/detail/$1/ [L,R=301]
which successfully ignores urls that already have the "detail" in them (in some of my first attempts I ended up with a loop and a url like "/news/detail/detail/detail..."), but visiting /news/story-title/ gives me a 404 so it's not redirecting to the proper location.
Change your rewrite rule to
RewriteRule ^news/(?!detail)([^/]+)/?$ news/detail/$1/ [L,R=301]
EDIT : (How it works?)
/(?!detail) is a negative lookahead, but it is also non-capturing, i.e. it matches the / but not what comes after it; it just makes sure that what follows isn't "detail". So I added a capturing group ([^/]+) to capture those characters (one or more, +, of anything that's not a /), optionally followed by a /.
Hence, the $1 now gets replaced with the matched directory name.
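The difference between the two patterns can be checked with a short Python sketch. It assumes Python's re module treats the lookahead the same way as the regex engine used by the ISAPI rewrite filter; redirect_target is a hypothetical helper.

```python
import re

ORIGINAL = re.compile(r"^news/(?!detail)/?$")         # no capture group
FIXED = re.compile(r"^news/(?!detail)([^/]+)/?$")     # captures the story slug


def redirect_target(path):
    """Hypothetical helper: return the redirect target, or None if no match."""
    m = FIXED.match(path)
    if m is None:
        return None
    # m.group(1) plays the role of $1 in the substitution.
    return f"news/detail/{m.group(1)}/"


print(ORIGINAL.match("news/story-title/"))   # None: nothing consumes the slug
print(redirect_target("news/story-title/"))
print(redirect_target("news/detail/story-title/"))
```

One side effect of the lookahead is that any slug beginning with "detail" (for example news/detailed-report/) is skipped as well.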