mod_rewrite Redirect Rule Variables question - .htaccess

I'm a bit of an .htaccess n00b, and can't for the life of me get a handle on regular expressions.
I have the following piece of RewriteRule code that works just fine:
RewriteRule ^logo/?$ /pages/logo.html
Basically, it takes /pages/logo.html and makes it /logo.
Is there a way for me to generalize that code with variables, so that it works automatically without having to have an independent line for each page?
I know $1 can work as a variable, but that's usually for queries, and I can't get it to work in this instance.

First you need to know that mod_rewrite can only handle requests to the server. So you would need to request /logo to have it rewritten to /pages/logo.html. And that's what the rule does: it rewrites requests with the URL path /logo internally to /pages/logo.html, not vice versa.
To reuse portions of the matched string, wrap them in a capturing group, (expr), and reference the group in the substitution with $n. In your case the pattern [^/]+ is suitable; it matches one or more characters other than the slash /:
RewriteRule ^([^/]+)$ /pages/$1.html
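As a hedged sketch of how this might look in a complete .htaccess file: the RewriteCond and the optional trailing slash are additions that are not part of the answer above, and they assume the pages live in a pages/ directory directly under the document root.

```apache
RewriteEngine On

# Assumption: only rewrite when /pages/<name>.html actually exists,
# so that unknown names fall through to a normal 404.
RewriteCond %{DOCUMENT_ROOT}/pages/$1.html -f
# $1 is whatever the group ([^/]+) captured, e.g. "logo" for /logo or /logo/
RewriteRule ^([^/]+)/?$ /pages/$1.html [L]
```

With this in place, a request for /logo would be served from /pages/logo.html, and the same rule would cover any other page in that directory without a separate line per page.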

Try this:
RewriteRule ^/pages/(.*)\.html$ /$1
The (.*) matches anything between pages/ and .html. Whatever it matches is used in $1. So, /pages/logo.html becomes /logo, and /pages/subdir/other_page.html would become /subdir/other_page

Related

RewriteRule giving me issues with my regex

I'm trying to do a simple redirect where going to a url like www.example.com/foo will take me to www.example.com/quokka/inquiry/ask.php?user=foo.
For testing purposes I started with this:
RewriteRule ^(m.*)$ /quokka/inquiry/ask.php?user=$1
This works great for use cases where the foo starts with the letter: m, but I want it to be super customizable. So then I make this my redirect (note the removal of the letter m):
RewriteRule ^(.*)$ /quokka/inquiry/ask.php?user=$1
Why isn't the RewriteRule above working for any instance of foo? I believe there's something wrong with my regex?
Any help would be greatly appreciated.
RewriteRule ^(.*)$ /quokka/inquiry/ask.php?user=$1
Depending on what other directives you have in your .htaccess file, this is possibly causing an internal rewrite loop, which is preventing the URL from ever resolving correctly (do you get a 500 Internal Server Error?). Or, at best, an invalid rewrite to /quokka/inquiry/ask.php?user=quokka/inquiry/ask.php.
Aside: Note that, as mentioned, this is an internal rewrite, not strictly a "redirect" as you stated in your question. The term "redirect" usually refers to an "external 3xx redirect". (Although admittedly the Apache docs also confuse these terms, but do at least qualify this as an "internal redirect".)
In the case of the above directive, the rewritten URL is also captured by the ^(.*)$ pattern (which captures anything), which results in a loop something like:
Request: www.example.com/foo
Rewritten to: /quokka/inquiry/ask.php?user=foo
Rewritten to: /quokka/inquiry/ask.php?user=quokka/inquiry/ask.php
Rewritten to: /quokka/inquiry/ask.php?user=quokka/inquiry/ask.php
:
URL-rewriting does not stop when it gets to the end of the .htaccess file. Processing loops until the URL passes through unchanged. (Although what is considered a "change" is not always entirely clear, as you can get loops simply by rewriting the URL, even when the rewritten URL is the same, as in step 4 above.)
The pattern ^(m.*)$ "works" because the rewritten URL does not start with an "m". But if you have any other URLs that start with an "m", then these will also be rewritten and become inaccessible.
You need to have a unique URL that only captures "user IDs" (in this case). For example, all URLs that reference "user IDs" could have a specific prefix, eg. example.com/u/<userid>.
RewriteRule ^u/(.*)$ /quokka/inquiry/ask.php?user=$1
Or perhaps are of a maximum length that does not conflict with any other URL (eg. between 3 and 8 chars):
RewriteRule ^(.{3,8})$ /quokka/inquiry/ask.php?user=$1
Also, if you are as restrictive as possible on the format of the user ID, then this might also be sufficient, e.g. only lowercase letters:
RewriteRule ^([a-z]+)$ /quokka/inquiry/ask.php?user=$1
However, using a prefix and restriction (regex should always be as restrictive as possible) would be my preference, as it avoids potential conflicts in the future. For example:
RewriteRule ^u/([a-z]{3,8})$ /quokka/inquiry/ask.php?user=$1 [L]
Also, include the L flag to ensure that no other directives that immediately follow are processed.
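Another common safeguard, shown here only as a sketch and not taken from the answer above, is to skip the rule whenever the request already maps to a real file or directory. That keeps existing resources whose names happen to fit the pattern (a css directory, for instance, which is a hypothetical example) reachable.

```apache
RewriteEngine On

# Skip anything that already exists on disk, so real files and
# directories are served as-is instead of being treated as user IDs.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Lowercase-only user IDs, as in the example above
RewriteRule ^([a-z]+)$ /quokka/inquiry/ask.php?user=$1 [L]
```

The two conditions apply only to the RewriteRule that immediately follows them, so they would need to be repeated for any further rules that should behave the same way.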

How to redirect only when there is something after .html?

I have found that there are some people with bad syntax links to our articles.
For example, we have an article with URL
http://www.oursite.com/demo/article-179.html
The issue is that a lot of people have linked back to this article with bad syntax, such as
http://www.oursite.com/demo/article-179.html%5Cohttp:/www.oursite.com/demo/glossary.php
I added the following RewriteRule in the .htaccess file to take care of such links.
RewriteRule article-179\.html(.*)$ "http\:\/\/www\.oursite\.com\/demo\/article-179\.html [301,L]
But this has resulted in a redirect loop message. How can we fix this issue via an .htaccess rewrite rule? Basically, we need a rule that fires only when there are one or more characters after the .html. If not, then it should not redirect.
Any help would be highly appreciated!
With best regards!
Use + instead of *. The * quantifier matches zero or more characters, so the pattern also matches the path you redirect to, and the redirect loops. The + quantifier requires at least one character after .html.
Also, make the pattern as precise as possible: don't just check whether the path contains article-179.html, check the full path. If this all happens on the same domain, there's no need to use an absolute URL for the redirect. Note too that the redirect flag is written R=301, not 301.
There's also no need to escape the substitution parameter like you did. It's treated as a plain string except for:
back-references ($N) to the RewriteRule pattern
back-references (%N) to the last matched RewriteCond pattern
server-variables as in rule condition test-strings (%{VARNAME})
mapping-function calls (${mapname:key|default})
http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
Long story short, theoretically this should do it:
RewriteRule ^demo/article-179\.html(.+)$ /demo/article-179.html [R=301,L]
or this if you really need the absolute URL:
RewriteRule ^demo/article-179\.html(.+)$ http://www.oursite.com/demo/article-179.html [R=301,L]
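If more than one article is affected, the same idea might be generalized with a capture group. This is a sketch under the assumption that all articles follow the article-<number>.html naming seen in the question; that naming scheme is not confirmed beyond the one example.

```apache
RewriteEngine On

# .+ requires at least one extra character after ".html",
# so the clean URL itself never matches and cannot loop.
# $1 holds the clean file name, e.g. "article-179.html".
RewriteRule ^demo/(article-[0-9]+\.html).+$ /demo/$1 [R=301,L]
```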

.htaccess and dynamically generated SEO friendly URLs

I'm trying to build a website that may be called from the URL bar with any one of the following examples:
domainname.com/en
domainname.com/zh-cn/
domainname.com/fr/page1
domainname.com/ru/dir1/page2
domainname.com/jp/dir1/page2/
domainname.com/es-mx/dir1/dir2/page3.html
These page requests need to hit my .htaccess template and ultimately be converted into this php call:
/index.php?lng=???&tpl=???
I've been trying to write RewriteCond and RewriteRule code that safely deals with the dynamic nature of the URLs I'm trying to take in, but I'm totally defeated. I've read close to 50 different websites and have been working on this for almost a week now, but I have no idea what I'm doing. I don't even know if I should be using a RewriteCond. Here is my last attempt at writing a RewriteRule myself:
RewriteRule ^(([a-z]{2})(-[a-z]{2})?)([a-z0-9-\./]*) /index.php?lng=$1&tpl=$4 [QSA,L,NC]
Thanks for any help,
Vince
What's causing your loop is that your regex pattern matches /index.php. Why? Let's take a look:
First, the per-directory prefix (here the leading slash) is stripped because these are rules in an htaccess file, so the URI after the first rewrite is index.php (the query string is separate).
The beginning of your regex, ^(([a-z]{2})(-[a-z]{2})?), matches "in" in the URI.
The next bit of your regex, ([a-z0-9-\./]*), matches "dex.php". Thus the rule matches and gets applied again, and will continue to get applied until you've reached the internal recursion limit.
Your URL structure:
domainname.com/en
domainname.com/zh-cn/
domainname.com/fr/page1
domainname.com/ru/dir1/page2
domainname.com/jp/dir1/page2/
domainname.com/es-mx/dir1/dir2/page3.html
Either has a / after the country code or nothing at all, so you need to account for that:
# here -------------------v
^(([a-z]{2})(-[a-z]{2})?)(/([a-z0-9-\./]*))?$
# and an ending match here -----------------^
You shouldn't need to change anything else:
RewriteRule ^(([a-z]{2})(-[a-z]{2})?)(/([a-z0-9-\./]*))?$ /index.php?lng=$1&tpl=$4 [QSA,L,NC]
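If the nested capture groups make the numbering hard to follow, a variant with non-capturing groups, (?:...), may be easier to read. This is a sketch rather than part of the answer above; it assumes index.php expects tpl without a leading slash, which the question does not state.

```apache
RewriteEngine On

# $1 = language code, e.g. "en" or "es-mx"
# $2 = everything after the first slash, e.g. "dir1/dir2/page3.html"
#      (empty when the URL is just the language code)
RewriteRule ^([a-z]{2}(?:-[a-z]{2})?)(?:/([a-z0-9./-]*))?$ /index.php?lng=$1&tpl=$2 [QSA,L,NC]
```

Because index.php has no slash after its first two letters, the rewritten URL does not match this pattern, so the rule is not applied a second time.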

remove part of url via mod_rewrite

Is there any way to hide part of a URL via mod_rewrite? I am currently using part of the URL, .htm, to split the page that is being requested from the query string.
Example
http://www.example.com/page/article/single.htm/articleid=8
This would let me know that the page requested is:
http://www.example.com/page/article/single
And the query string is:
articleid=8
Ideally I would like this to work with the same URL, without the .htm visible:
http://www.example.com/page/article/single/articleid=8
The number of variables in the query string varies, as does the number of levels before the .htm, so the rule would need to be dynamic.
Thanks
To also do multiple querystring parameters, how do you want it to look? I started with this, which keeps this simple, then got trickier below.
http://www.example.com/page/article/single/articleid=8&anothervar=abc
Try this rule:
RewriteRule ^([^=]+)/(.+)$ $1.htm?$2 [NC,L]
This handles one or more querystring parameters, but does require at least one. This looks for anything without an = up to a slash, then everything else. Basically, it uses the = as the indicator of the path vs. the querystring portions; but actually splits it on the slash. (The NC is a habit of mine; not needed in this case, but when I leave it out I forget it when it's needed.)
To let querystrings be optional, so it could handle just
http://www.example.com/page/article/single
I found it easiest with two rules, instead of trying to mingle this into one rule:
RewriteRule ^([^=]+)$ $1.htm [NC,L]
RewriteRule ^([^=]+)/(.+)$ $1.htm?$2 [NC,L]
You can do something even prettier, using slashes for everything including multiple querystring parameters, like this:
http://www.example.com/page/article/single/articleid=8/anothervar=abc
It's a little hairy, but I think this works (couldn't let it go...)
Another rule handles replacing the slashes with ampersands, then doing the rewrite as above. This was easier to keep straight - maybe there's a way to do it all at once, but this was tricky enough for me:
RewriteRule ^([^=]+)$ $1.htm [NC,L]
RewriteRule ^([^=]+)/([^=]+=[^/]+)/([^=]+=.+)$ $1/$2&$3 [NC,N]
RewriteRule ^([^=]+)/(.+)$ $1.htm?$2 [NC,L]
The first rule is as above, handling no querystrings at all. That just gets it out of the way.
The second rule is a loop: the N (next) flag restarts rule processing with the rewritten URL, which is how Apache loops (mod_rewrite has no LP flag). This is what I tend to find in examples whenever you have an unknown number of replacements. In this case, each pass replaces one querystring-slash with an ampersand, and it loops until there's only one left (leaving that for the question mark in the third rule).
It's looking for the last one of these articleid=8/anothervar=abc where there are two parameters left. It replaces the slash with an ampersand like articleid=8&anothervar=abc
In words, it's looking for (and capturing in parentheses):
(not-equalsign) slash (not-equalsign equalsign not-slash) slash (not-equalsign equalsign anything)
This lines up as:
(not-equalsign) /page/article/single
slash /
(not-equalsign equalsign not-slash) articleid = 8
slash /
(not-equalsign equalsign anything) anothervar = abc
It replaces the last slash with an ampersand, and after looping, turns it into the first draft above: http://www.example.com/page/article/single/articleid=8&anothervar=abc . The third rule handles this as described above.
A note: These also assume all your urls will look like this, since they're going to tack on .htm to everything. If you want still allow explicit /something/page.htm then these rules would need to not-match on .htm if it's already there - something like that. Or maybe an initial rule up front that looks for .htm and just stops rewriting there. Or maybe only do this for the /page paths.
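The "initial rule up front" idea from the note above could be sketched as follows. This is an assumption-laden sketch, not a tested configuration: it assumes the rules sit in the document root's .htaccess, and it uses absolute substitutions (a leading slash) so that no RewriteBase is needed.

```apache
RewriteEngine On

# Guard: anything that already ends in .htm is left untouched.
# "-" means "no substitution"; L stops processing for this pass.
# This also stops the rules below from appending .htm a second time.
RewriteRule \.htm$ - [NC,L]

# No query-string part at all: /page/article/single -> /page/article/single.htm
RewriteRule ^([^=]+)$ /$1.htm [NC,L]

# Path plus parameters: /page/article/single/articleid=8
#                    -> /page/article/single.htm?articleid=8
RewriteRule ^([^=]+)/(.+)$ /$1.htm?$2 [NC,L]
```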

URL Rewriting based on form input

I'm creating a front page for my website with a single form and a text input, Google-style. It's working fine; however, I want to generate a pretty URL based on the input. Say my input is called "id", the form uses the GET method, and its action is "/go/". On submission, the URL will be:
site.com/go/?id=whateverIType
and I want to change it to
site.com/go/whateverIType
I was thinking of mod_rewrite, but what if the user puts something extra in the URL, like:
site.com/go/?dontwant=this&id=whateverIType&somemore=trash
I want to ignore every variable except "id" and rewrite the URL.
What's the best way to get this done? Thanks in advance!
PS: I'm using CodeIgniter, maybe there's something I can use for it as well. I already have a controller for "go".
I'm not familiar with CodeIgniter, but you can try the following RewriteRule
RewriteEngine on
RewriteCond %{REQUEST_URI} ^\/go\/
RewriteCond %{QUERY_STRING} (?:^|&)id=([^&]*)
RewriteRule (.*) /go/%1? [L,R]
The %1 references the regex group from the previous RewriteCond, and the trailing ? will strip the querystring from the redirected URL.
Hope this helps.
mod_rewrite supports conditions and rules with regular expressions, so you could have a rule that matches the ?id=XXXX, extracts it from the URL (keeping the other parameters), and rewrites the URL accordingly.
However... I don't think you want to do this, because if you rewrite the URL to be /go/Some+Search+Query, you won't be able to pick it up with say, PHP, without parsing the URL out manually.
It's really tough to have custom, SEO-friendly URLs with user input, but it is technically possible. You're better off leaving in the ?id=XXX part, and instead, using mod_rewrite in the opposite approach... take all URLs that match the pattern /go/My+Search+Terms and translate that back into something like ?id=My+Search+Terms, that way you'll be able to easily parse out the value using the URL's GET parameters. This isn't an uncommon practice - Google actually still uses URL parameters for user input (example URL: http://www.google.com/search?q=test).
Just keep in mind that mod_rewrite rewrites the URL before anything else (even PHP), so anything you do to the URL you need to handle. Think of mod_rewrite as a regular expression-based, global "Find and Replace" for URLs, every time a page is called on the server. For example, if you remove the query string, you need to make sure your website/application/whatever accounts for that.
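The "opposite approach" described above might look roughly like this. It is a minimal sketch that assumes whatever handles /go/ reads the value from the id GET parameter; that handler is not shown in the question.

```apache
RewriteEngine On

# Internally map /go/My+Search+Terms to /go/?id=My+Search+Terms.
# The browser keeps showing the pretty URL; no redirect is sent.
# Without the QSA flag, the new query string replaces any original one,
# so extra parameters on the request are dropped.
RewriteRule ^go/([^/]+)/?$ /go/?id=$1 [L]
```

If this is combined with an external redirect from /go/?id=... to /go/..., the two rules can feed each other and loop, so the redirect would need its own guard.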
In application/config/routes.php
$route['go/(:any)'] = "go/index/$1";
Where go is your controller and index is the index action.
http://codeigniter.com/user_guide/general/routing.html
You can use something like this in your .htaccess if you aren't already:
RewriteEngine on
RewriteCond $1 !^(index\.php|images|css|js|robots\.txt)
RewriteRule ^(.*)$ /index.php/$1 [L]
