.htaccess and dynamically generated SEO friendly URLs - .htaccess

I'm trying to build a website that may be called from the URL bar with any one of the following examples:
domainname.com/en
domainname.com/zh-cn/
domainname.com/fr/page1
domainname.com/ru/dir1/page2
domainname.com/jp/dir1/page2/
domainname.com/es-mx/dir1/dir2/page3.html
These page requests need to hit my .htaccess template and ultimately be converted into this php call:
/index.php?lng=???&tpl=???
I've been trying to write RewriteCond and RewriteRule code that will safely deal with the dynamic nature of the URLs I'm trying to take in, but I'm totally defeated. I've read close to 50 different websites and have been working on this for almost a week now, but I have no idea what I'm doing. I don't even know if I should be using a RewriteCond. Here is my last attempt at making a RewriteRule myself:
RewriteRule ^(([a-z]{2})(-[a-z]{2})?)([a-z0-9-\./]*) /index.php?lng=$1&tpl=$4 [QSA,L,NC]
Thanks for any help,
Vince

What's causing your loop is that your regex pattern matches /index.php. Why? Let's take a look:
First, the prefix is stripped because these are rules in an htaccess file, so the URI after the first rewrite is: index.php (the query string is separate).
The beginning of your regex, ^(([a-z]{2})(-[a-z]{2})?), matches the "in" at the start of the URI.
The next bit of your regex, ([a-z0-9-\./]*), matches "dex.php". Thus the rule matches and gets applied again, and will continue to get applied until you've reached the internal recursion limit.
Your URL structure:
domainname.com/en
domainname.com/zh-cn/
domainname.com/fr/page1
domainname.com/ru/dir1/page2
domainname.com/jp/dir1/page2/
domainname.com/es-mx/dir1/dir2/page3.html
Each of these has either a / after the language code or nothing at all, so you need to account for that:
# here -------------------v
^(([a-z]{2})(-[a-z]{2})?)(/([a-z0-9-\./]*))?$
# and an ending match here ------------^
You shouldn't need to change anything else:
RewriteRule ^(([a-z]{2})(-[a-z]{2})?)(/([a-z0-9-\./]*))?$ /index.php?lng=$1&tpl=$4 [QSA,L,NC]
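The difference between the two patterns can be checked outside Apache. The following is a minimal sketch that uses Python's `re` module as a stand-in for mod_rewrite's regex engine (an assumption that holds for simple patterns like these); the helper name `rewrite` is made up for illustration, and `re.IGNORECASE` plays the role of the [NC] flag.

```python
import re

# Patterns copied from the question (OLD) and from the corrected rule (NEW).
OLD = re.compile(r"^(([a-z]{2})(-[a-z]{2})?)([a-z0-9-\./]*)", re.IGNORECASE)
NEW = re.compile(r"^(([a-z]{2})(-[a-z]{2})?)(/([a-z0-9-\./]*))?$", re.IGNORECASE)


def rewrite(path, pattern):
    """Return the (lng, tpl) pair the rule would hand to index.php, or None.

    `path` is the URL-path as seen in an .htaccess file (no leading slash).
    An unmatched optional group is treated as an empty string, like $4 in Apache.
    """
    m = pattern.match(path)
    if m is None:
        return None
    return m.group(1), m.group(4) or ""


# The old pattern also matches the rewritten target, which is what causes the loop.
print(rewrite("index.php", OLD))   # ('in', 'dex.php')
# The new pattern no longer matches it, so the rewrite happens only once.
print(rewrite("index.php", NEW))   # None

for path in ["en", "zh-cn/", "fr/page1", "es-mx/dir1/dir2/page3.html"]:
    print(path, "->", rewrite(path, NEW))
```

Note that $4 still carries the leading slash (for example "/page1"), just as it did with the original pattern.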

Related

Rewrite rule in .htaccess from URL with equal sign and short folder name

I need a rewrite rule in my .htaccess for the following example URL:
https://www.example.com/book=18ABCDEFG
This is the resulting URL that I need:
https://www.example.com/books/2018/ABCDEFG.pdf
I have spent a couple of hours googling and trying to solve this, but I am really stuck. If there is a rewrite wizard out there, I would really appreciate the help.
EDITED:
This is what i have come up with so far, but the only result is a 404 (not found):
RewriteEngine on
RewriteCond %{QUERY_STRING} ^book=18(.*)$
RewriteRule ^(.*)book=(.*)$ http://www.examplesite.com/books/2018/$1.pdf [R=301,L]
I was hoping that the $1 should reference the string after "18" since the only parenthesised group in the condition contains that string, but so far I haven't found the right syntax.
I should explain about the "18" too. Now the URLs are different and there will never be any other year than 2018 for book URLs with this pattern. So it can be hard coded.
But how do I reference the string after "18" in the rewrite rule?
This is what i have come up with so far, but the only result is a 404 (not found):
RewriteEngine on
RewriteCond %{QUERY_STRING} ^book=18(.*)$
RewriteRule ^(.*)book=(.*)$ http://www.examplesite.com/books/2018/$1.pdf [R=301,L]
As mentioned in comments, this is trying to match two different URLs at the same time: one where the information is contained in a query string (after an "imaginary" ?) and the other where the information is contained in the URL-path. So, it's probably not doing anything; hence the 404.
I was hoping that the $1 should reference the string after "18" since the only parenthesised group in the condition contains that string
$1 refers to the first parenthesised group in the RewriteRule pattern (of which there are two). If you want to match the first subpattern in the last matched condition then you need to use a backreference of the form %1.
However, your example does not contain a ? and therefore there is no query string. The information is contained in the URL-path instead. (Unless that is a typo in your question?! It looks like a typo, since the = is superfluous otherwise. But that would also completely change your question.)
To redirect the URL example.com/book=18ABCDEFG (i.e. information in the URL-path), you would need something like the following near the top of your .htaccess file in the document root:
RewriteEngine on
RewriteRule ^book=18([^/]+)$ /books/2018/$1.pdf [R=302,L]
If the code can only be specific characters then the regex should be appropriately specific. As it stands, it matches pretty much anything.
Test with 302 (temporary) redirects and only change to 301 (permanent) when you are sure it's working OK (if this is intended to be a permanent redirect and cached by the browser).
You will need to clear your browser cache before testing.
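As a quick way to check what the pattern does and does not match, here is a minimal sketch using Python's `re` module in place of mod_rewrite. The function name `redirect_target` and the stricter `STRICT` pattern are assumptions for illustration; the stricter pattern supposes the code consists only of uppercase letters, which the question does not confirm.

```python
import re

# Pattern from the RewriteRule above; group 1 plays the role of $1.
RULE = re.compile(r"^book=18([^/]+)$")

# Hypothetical stricter variant, assuming codes are uppercase letters only.
STRICT = re.compile(r"^book=18([A-Z]+)$")


def redirect_target(url_path, pattern=RULE):
    """Return the redirect target for a URL-path (no leading slash), or None."""
    m = pattern.match(url_path)
    return f"/books/2018/{m.group(1)}.pdf" if m else None


print(redirect_target("book=18ABCDEFG"))          # /books/2018/ABCDEFG.pdf
print(redirect_target("book=19ABCDEFG"))          # None (wrong year prefix)
print(redirect_target("book=18abc!"))             # /books/2018/abc!.pdf
print(redirect_target("book=18abc!", STRICT))     # None
```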

.htaccess, virtual directories, and semi-complex URLs

I'm basically just trying to have a master syntax for predictable URLs. Simple URL is no problem
RewriteEngine on
# RewriteRule ^friendlyUrl/content/?$ /index.php?app=main&module=content
Which to my understanding looks for the url structure and allows 1 or 0 trailing "/"'s
But some parts of the website have a /urlPrefix/ to access, eg. mysite.com/membersArea/
and /membersArea/ will be a part of every query there. I'm having trouble accommodating trailing ?s and &s in URLs like these.
RewriteRule ^secureUrl/\?(.*)$ /index.php?app=admin&$1
This is my attempt to handle everything from mysite.com/secureUrl/ to mysite.com/secureUrl/?var1=foo&var2=bar and after many server errors and a search, I find myself here.
This is the most complex line I have and between you and me, I couldn't tell you exactly what's happening other than it looks for /friendlyUrl/10DIGITKEY/(possible task)/?possiblevars=foo&var2=bar
RewriteRule ^friendlyUrl/([a-zA-Z0-9]{10})/?([a-z]*)/?\??(.*)$ /index.php?app=main&module=web&id=$1&$2&$3
Htaccess has always been my weakest subject, and as a webmaster I pay the price constantly, any help would be appreciated.
Need to input the same request to the PHP file (plus ANY query with or without ? or &) whether it's just /friendlyUrl/ or /friendlyUrl/?var=1, /friendlyUrl/&var=1, /friendlyUrl/var=1
You want the query string of the request URI to be kept as is and included in the rewritten URL once the rewrite is done.
For this purpose, you use the QSA flag in your RewriteRule directive. So, to rewrite /friendlyUrl/10DIGITKEY/(possible task)/?possiblevars=foo&var2=bar, you'd have:
RewriteRule ^friendlyUrl/([a-z\d]{10})/([^/]*)/?$ /index.php?app=main&module=web&id=$1&task=$2 [QSA]
Notice the QSA flag at the end. Also, keep in mind that I'm passing the second match (the possible task of your URL) as another variable (named task). This variable will be empty if nothing was found.
QSA|qsappend
When the replacement URI contains a query string, the default behavior
of RewriteRule is to discard the existing query string, and replace it
with the newly generated one. Using the [QSA] flag causes the
query strings to be combined.
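The effect of QSA can be sketched as follows. This is only a simulation in Python, not Apache itself: the function `rewrite` and its `qsa` parameter are made up for illustration, and the sample key `abc123def4` is an assumed value.

```python
import re

# Pattern from the RewriteRule above; groups 1 and 2 play the roles of $1 and $2.
RULE = re.compile(r"^friendlyUrl/([a-z\d]{10})/([^/]*)/?$")


def rewrite(path, query, qsa=True):
    """Return the rewritten URL for a URL-path and its original query string."""
    m = RULE.match(path)
    if m is None:
        return None
    new_query = f"app=main&module=web&id={m.group(1)}&task={m.group(2)}"
    # With QSA the original query string is appended to the new one;
    # without it the original query string is discarded.
    if qsa and query:
        new_query += "&" + query
    return "/index.php?" + new_query


print(rewrite("friendlyUrl/abc123def4/edit/", "possiblevars=foo&var2=bar"))
# /index.php?app=main&module=web&id=abc123def4&task=edit&possiblevars=foo&var2=bar
print(rewrite("friendlyUrl/abc123def4/edit/", "possiblevars=foo&var2=bar", qsa=False))
# /index.php?app=main&module=web&id=abc123def4&task=edit
```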

how to rewrite a custom url in joomla using htaccess

I am designing a News Website using joomla 2.5
I want rewrite this url:
http://domain.com/categoryname/?format=feed&type=rss
to:
http://domain.com/rss/categoryname
Note: I'm using mod_rewrite .htaccess for joomla.
please help me quickly.
thanks to every body in this site.
Apache's mod_rewrite lets you transform one URL into another using regex patterns.
The pattern applies to the URL path. In an .htaccess file the path is matched without its leading slash, so for your example you can write a pattern like ^rss/(.+)$, which matches any path that begins with rss/ and has at least one character after it. The parentheses form a capturing group, which you can reference in the second parameter of the RewriteRule directive.
The second part, /$1/?format=feed&type=rss, references the first captured group in the pattern and places it in the new URL.
Finally, you signify that it is the last rule to be processed with the [L] flag.
This gives you a rule of:
RewriteEngine On
RewriteRule ^rss/(.+)$ /$1/?format=feed&type=rss [L]
If you intend to pass query strings to this new URL, you will need to add the additional flag QSA, which gives [L,QSA] in place of [L].
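Whether the pattern should start with a slash depends on where the rule lives: in per-directory (.htaccess) context the path is matched without its leading slash. The sketch below illustrates that with Python's `re` module standing in for mod_rewrite; the names `WITH_SLASH`, `WITHOUT_SLASH`, and `rewrite` are made up for this example.

```python
import re

WITH_SLASH = re.compile(r"/rss/(.+)")       # only finds "/rss/" preceded by a slash
WITHOUT_SLASH = re.compile(r"^rss/(.+)$")   # suits .htaccess, where the slash is stripped


def rewrite(path, pattern):
    """Return the rewritten URL for `path`, or None if the pattern does not match."""
    # RewriteRule patterns are unanchored unless they use ^ or $, hence search().
    m = pattern.search(path)
    return f"/{m.group(1)}/?format=feed&type=rss" if m else None


# Path as seen from an .htaccess file in the document root:
print(rewrite("rss/categoryname", WITHOUT_SLASH))  # /categoryname/?format=feed&type=rss
print(rewrite("rss/categoryname", WITH_SLASH))     # None
# Path as seen from the main server configuration:
print(rewrite("/rss/categoryname", WITH_SLASH))    # /categoryname/?format=feed&type=rss
```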

How to redirect only when there is something after .html?

I have found that there are some people with bad syntax links to our articles.
For example, we have an article with URL
http://www.oursite.com/demo/article-179.html
The issue is that a lot of people have linked back to this article with bad syntax such as
http://www.oursite.com/demo/article-179.html%5Cohttp:/www.oursite.com/demo/glossary.php
Now, I added the following ReWrite Rule in the .htaccess file to take care of such links.
RewriteRule article-179\.html(.*)$ "http\:\/\/www\.oursite\.com\/demo\/article-179\.html [301,L]
But this has resulted in a Redirect Loop message. How can we fix this issue via htaccess rewrite rule. Basically, we need something in our rewrite rule that works only when there is one or more characters after the .html. If not, then it should not redirect.
Any help would be highly appreciated!
With best regards!
Use + instead of *. The * quantifier matches zero or more characters, so the pattern also matches the path you are redirecting to, which causes the loop. The + quantifier matches one or more.
Also, you should make the pattern as precise as possible, i.e. don't just check whether it ends with article-179.html; check for the full path instead. And if this all happens on the same domain, then there's no need to use the absolute URL for the redirect.
There's also no need for escaping the substitution parameter like you did, it's treated as a simple string except for:
back-references ($N) to the RewriteRule pattern
back-references (%N) to the last matched RewriteCond pattern
server-variables as in rule condition test-strings (%{VARNAME})
mapping-function calls (${mapname:key|default})
http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
Long story short, theoretically this should do it:
RewriteRule ^demo/article-179\.html(.+)$ /demo/article-179.html [R=301,L]
or this if you really need the absolute URL:
RewriteRule ^demo/article-179\.html(.+)$ http://www.oursite.com/demo/article-179.html [R=301,L]
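The difference between the two quantifiers can be verified with a small sketch. It uses Python's `re` module as a stand-in for mod_rewrite, assumes the %5C in the bad link reaches the rule decoded as a backslash, and the helper name `should_redirect` is made up for illustration.

```python
import re

STAR = re.compile(r"article-179\.html(.*)$")        # pattern from the question
PLUS = re.compile(r"^demo/article-179\.html(.+)$")  # pattern from the answer

clean = "demo/article-179.html"
broken = "demo/article-179.html\\ohttp:/www.oursite.com/demo/glossary.php"


def should_redirect(path, pattern):
    """Return True if the rule would fire for this URL-path."""
    return pattern.search(path) is not None


print(should_redirect(clean, STAR))    # True: the redirect target matches again, hence the loop
print(should_redirect(clean, PLUS))    # False: nothing follows .html, so no redirect
print(should_redirect(broken, PLUS))   # True: the trailing junk triggers the redirect
```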

Why does this cause an infinite request loop?

Earlier today, I was helping someone with an .htaccess use case, and came up with a solution that works but can't quite figure it out myself!
He wanted to be able to:
Browse to index.php?id=3&cat=5
See the location bar read index/3/5/
Have the content served from index.php?id=3&cat=5
The last two steps are fairly typical (usually from the user entering index/3/5 in the first place), but the first step was required because he still had some old-format links in his site and, for whatever reason, couldn't change them. So he needed to support both URL formats, and have the user always end up seeing the prettified one.
After much to-ing and fro-ing, we came up with the following .htaccess file:
RewriteEngine on
# Prevents browser looping, which does seem
# to occur in some specific scenarios. Can't
# explain the mechanics of this problem in
# detail, but there we go.
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule .* - [L]
# Hard-rewrite ("[R]") to "friendly" URL.
# Needs RewriteCond to match original querystring.
# Uses "?" in target to remove original querystring,
# and "%n" backrefs to move its components.
# Target must be a full path as it's a hard-rewrite.
RewriteCond %{QUERY_STRING} ^id=(\d+)&cat=(\d+)$
RewriteRule ^index\.php$ http://example.com/index/%1/%2/? [L,R]
# Soft-rewrite from "friendly" URL to "real" URL.
# Transparent to browser.
RewriteRule ^index/(\d+)/(\d+)/$ /index.php?id=$1&cat=$2
Whilst it might seem to be a somewhat strange use case ("why not just use the proper links in the first place?", you might ask), just go with it. Regardless of the original requirement, this is the scenario and it's driving me mad.
Without the first rule, the client enters into a request loop, trying to GET /index/X/Y/ repeatedly and getting 302 each time. The check on REDIRECT_STATUS makes everything run smoothly. But I would have thought that after the final rule, no more rules would be served, the client wouldn't make any more requests (note, no [R]), and everything would be gravy.
So... why would this result in a request loop when I take out the first rule?
Without being able to tinker with your setup, I can't say for sure, but I believe this problem is due to the following relatively arcane feature of mod_rewrite:
When you manipulate a URL/filename in per-directory context mod_rewrite first rewrites the filename back to its corresponding URL (which is usually impossible, but see the RewriteBase directive below for the trick to achieve this) and then initiates a new internal sub-request with the new URL. This restarts processing of the API phases.
(source: mod_rewrite technical documentation, I highly recommend reading this)
In other words, when you use a RewriteRule in an .htaccess file, it's possible that the new, rewritten URL maps to an entirely different directory on the filesystem, in which case the .htaccess file in the original directory wouldn't apply anymore. So whenever a RewriteRule in an .htaccess file matches the request, Apache has to restart processing from scratch with the modified URL. This means, among other things, that every RewriteRule gets checked again.
In your case, what happens is that you access /index/X/Y/ from the browser. The last rule in your .htaccess file triggers, rewriting that to /index.php?id=X&cat=Y, so Apache has to create a new internal subrequest with the URL /index.php?id=X&cat=Y. That matches your earlier external redirect rule, so Apache sends a 302 response back to the browser to redirect it to /index/X/Y/. But remember, the browser never saw that internal subrequest; as far as it knows, it was already on /index/X/Y/. So it looks to you as though you're being redirected from /index/X/Y/ to that same URL, triggering an infinite loop.
Besides the performance hit, this is probably one of the better reasons that you should avoid putting rewrite rules in .htaccess files when possible. If you move these rules to the main server configuration, you won't have this problem because matches on the rules won't trigger internal subrequests. If you don't have access to the main server configuration files, one way you can get around it (EDIT: or so I thought, although it doesn't seem to work - see comments) is by adding the [NS] (no subrequest) flag to your external redirect rule,
RewriteRule ^index\.php$ http://example.com/index/%1/%2/? [L,R,NS]
Once you do that, you should no longer need the first rule that checks the REDIRECT_STATUS.
The solution below worked for me.
RewriteEngine on
RewriteBase /
#rule1
#Guard condition: only if the original client request was for index.php
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php [NC]
RewriteCond %{QUERY_STRING} ^id=(\d+)&cat=(\d+)$ [NC]
RewriteRule . /index/%1/%2/? [L,R]
#rule 2
RewriteRule ^index/(\d+)/(\d+)/$ /index.php?id=$1&cat=$2 [L,NC]
Here is what I think is happening
From the steps you quoted above
Browse to index.php?id=3&cat=5
See the location bar read index/3/5/
Have the content served from index.php?id=3&cat=5
At Step 1, Rule 1 matches and redirects to the friendly URL, which then shows in the location bar and fulfills Step 2.
At Step 3, Rule 2 now matches and rewrites to index.php.
The rules are rerun, for the reasons David stated, but since THE_REQUEST is immutable once set to the original request, it still contains /index/3/5 so Rule 1 does not match.
Rule 2 does not match either and the result of index.php is served.
Most other variables are mutable, e.g. REQUEST_URI. Their modification during rule processing, combined with the incorrect expectation that patterns are matched against the original request, is a common reason for infinite loops.
It feels quite esoteric sometimes, but I am sure there is a logical reason for its complexity :-)
EDIT
Surely there are two distinct requests
There are 2 client requests, the original one from Step1 and the one from the external redirect in step 2.
What I glossed over above is that when Rule 2 matches on the second request, it is rewritten to /index.php and causes an internal redirect. This forces the .htaccess file for the / directory to be loaded again (it could easily have been another directory with different .htaccess rules) and all the rules to be re-run.
So... why would this result in a request loop when I take out the first rule?
When the rules are re-run, the first rule now unexpectedly matches, as a result of Rule2's rewrite, and does a redirect, causing an infinite loop.
David's answer does contain most of this information and is what I meant by "for the reasons David stated".
However, the main point here is that you do need an extra condition: either yours, which stops further rule processing on internal redirects, or mine, which prevents rule 1 from matching, is necessary to prevent the infinite loop.
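The re-run behaviour can be illustrated with a small simulation. This is only a sketch in Python, not how Apache is implemented: the functions `process` and `browse` are made up for illustration, rule 1 is simplified to an exact check for index.php, and only the THE_REQUEST guard is modelled (not the REDIRECT_STATUS check).

```python
import re

QUERY = re.compile(r"^id=(\d+)&cat=(\d+)$")
FRIENDLY = re.compile(r"^index/(\d+)/(\d+)/$")
# Guard from rule 1: was the ORIGINAL client request for /index.php?
ORIGINAL_WAS_INDEX_PHP = re.compile(r"^[A-Z]{3,9} /index\.php", re.IGNORECASE)


def process(request_line, guard):
    """Run one client request through both rules.

    Returns ("redirect", location) or ("serve", path, query).
    """
    target = request_line.split(" ")[1]                 # e.g. "/index/3/5/"
    path, _, query = target.lstrip("/").partition("?")
    for _ in range(10):                                 # stand-in for the internal redirect limit
        # Rule 1: external redirect from index.php?id=X&cat=Y to /index/X/Y/
        q = QUERY.match(query)
        guard_ok = not guard or ORIGINAL_WAS_INDEX_PHP.match(request_line)
        if path == "index.php" and q and guard_ok:
            return ("redirect", f"/index/{q.group(1)}/{q.group(2)}/")
        # Rule 2: internal rewrite; afterwards all rules are run again
        f = FRIENDLY.match(path)
        if f:
            path, query = "index.php", f"id={f.group(1)}&cat={f.group(2)}"
            continue
        return ("serve", path, query)
    raise RuntimeError("internal redirect limit reached")


def browse(url, guard, max_hops=5):
    """Follow redirects like a browser; return the served result or "loop"."""
    for _ in range(max_hops):
        result = process(f"GET {url} HTTP/1.1", guard)
        if result[0] == "serve":
            return result
        url = result[1]
    return "loop"


print(browse("/index.php?id=3&cat=5", guard=True))   # ('serve', 'index.php', 'id=3&cat=5')
print(browse("/index.php?id=3&cat=5", guard=False))  # loop
```

Without the guard, the re-run after rule 2 sees index.php with a matching query string and redirects again; with the guard, the unchanged original request line keeps rule 1 from matching.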

Resources