.htaccess - rewrite, first param optional

I am trying to write a rewrite rule for pages and language.
There are two possible options:
website.com/photos/photo1 => website.com?page=photos&menu=photo1
and
website.com/en/photos/photo1 => website.com?lang=en&page=photos&menu=photo1
I have this, but the problem is when the first param is optional...
RewriteCond %{REQUEST_URI} ^/(en)/(.*)
RewriteRule ^.* %2?lg=%1 [QSA,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/([^/]+)?/?$ index.php?page=$1&menu=$2 [L,QSA]

You probably don't need the RewriteCond directive that checks whether the request maps to a physical file: a request of the form /photos/photo1 (without a file extension) is unlikely to map to a file, provided your regex is sufficiently restrictive and dots are not permitted in the last path segment.
And unless you allow directories to be requested directly, you probably don't need the directory check either.
As in your code sample, you should rewrite directly to the file that handles the request, ie. index.php. This is omitted from your example URLs. If you omit it in the rewrite then you are reliant on mod_dir issuing an internal subrequest for the directory index.
For the language code I'll assume any 2 lowercase letters ie. [a-z]{2}. If this can only be en or only a select few languages then change as required (eg. en|dk|jp - using alternation).
RewriteRule ^(?:([a-z]{2})/)?(\w+)/(\w+)/?$ index.php?lang=$1&page=$2&menu=$3 [QSA,L]
The whole first path segment (non-capturing) is made optional, with a capturing subpattern that provides the value for the $1 backreference (which is empty when omitted).
For the page and menu path segments I've used the \w shorthand character class (rather than anything other than a slash, as in your example) - this matches a-z, A-Z, 0-9 and _ (underscore) - so this naturally excludes the dot.
I've made the trailing slash optional, as in your directive, however it would be preferable to decide whether you allow trailing slashes or not. What is the canonical URL?
The only potential "caveat" with using a single directive is that when the lang path segment is not supplied you'll get an empty URL parameter (it is always present, just empty), eg. index.php?lang=&page=photos&menu=photo1. Even so, a single directive would be preferable in my opinion.
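To see how the optional first segment behaves, here is a minimal sketch that runs the same regex through Python's re module. Python is only a stand-in for Apache's regex engine here (for a pattern this simple I'd expect the two to agree, but that is an assumption), and the rewrite() helper is a made-up name for illustration.

```python
import re

# Same regex as the RewriteRule pattern above. In .htaccess the pattern is
# matched against the URL-path without its leading slash.
# re.ASCII restricts \w to a-z, A-Z, 0-9 and underscore.
PATTERN = re.compile(r"^(?:([a-z]{2})/)?(\w+)/(\w+)/?$", re.ASCII)

def rewrite(url_path):
    """Return the substitution string, or None when the rule would not match."""
    m = PATTERN.match(url_path)
    if m is None:
        return None
    # Apache expands an unmatched backreference to an empty string;
    # Python reports None, so normalise it here.
    lang = m.group(1) or ""
    return f"index.php?lang={lang}&page={m.group(2)}&menu={m.group(3)}"

for path in ("photos/photo1", "en/photos/photo1", "photos/photo.1", "en/photos/"):
    print(path, "->", rewrite(path))
```

Note the last case: a two-segment URL whose first segment happens to be two lowercase letters (en/photos/) is treated as page/menu, not as lang/page, because the language group is optional and the regex still needs two further segments.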

Related

Mirror a file in htaccess

I'm trying to work on making a new site and I want to be able to mirror a site. Below is an example:
User visits: https://example.com/items/{some child folder}
User sees this file mirrored: https://example.com/items/listing.php
I want the user to be able to see that file, but, when doing so, I don't want it to redirect. Any ideas?
UPDATE
I found a solution to the above problem. However, I need another question fixed. How would I stop the file listing.php in the /products folder from following the redirect?
RewriteEngine On
RewriteRule ^products/(.*) index.php?name=$1 [NC,L]
How would I stop the file listing.php in the /products folder from following the redirect?
RewriteRule ^products/(.*) index.php?name=$1 [NC,L]
Be more specific in the regex. If your products don't contain dots in the URL-path then exclude dots in the regex. For example:
RewriteRule ^products/([^./]+)$ index.php?name=$1 [L]
The above assumes your product URLs are of the form /products/<something>. Where <something> cannot consist of dots or slashes (so naturally excludes listing.php) and must consist of "something", ie. not empty.
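Unless you specifically need the NC flag, remove it. Case-insensitive matching makes /Products/foo and /products/foo resolve to the same content, which potentially opens you up to duplicate content.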
If you want to be explicit then include a condition (RewriteCond directive):
RewriteCond %{REQUEST_URI} !^/products/listing\.php$
RewriteRule ^products/([^/]+)$ index.php?name=$1 [L]
The REQUEST_URI server variable contains the root-relative URL-path, starting with a slash. The ! prefix on the CondPattern negates the regex.
Or, use a negative lookahead in the RewriteRule pattern, without using a condition. For example:
RewriteRule ^products/(?!listing\.php)([^/]+)$ index.php?name=$1 [L]
Reference:
https://httpd.apache.org/docs/current/rewrite/intro.html
https://httpd.apache.org/docs/current/mod/mod_rewrite.html
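The three variants above can be compared side by side with a quick sketch in Python. This assumes Python's re module treats these particular patterns the same way Apache does, and is_rewritten() is a hypothetical helper, not anything provided by mod_rewrite.

```python
import re

# Patterns copied from the three variants above. In .htaccess the RewriteRule
# pattern sees the URL-path without a leading slash; REQUEST_URI has one.
EXCLUDE_DOTS = re.compile(r"^products/([^./]+)$")
RULE = re.compile(r"^products/([^/]+)$")
NOT_LISTING = re.compile(r"^/products/listing\.php$")
LOOKAHEAD = re.compile(r"^products/(?!listing\.php)([^/]+)$")

def is_rewritten(url_path):
    """Report, per variant, whether the request would be rewritten."""
    return {
        "exclude_dots": EXCLUDE_DOTS.search(url_path) is not None,
        # The rule matches AND the negated condition holds.
        "condition": (RULE.search(url_path) is not None
                      and NOT_LISTING.search("/" + url_path) is None),
        "lookahead": LOOKAHEAD.search(url_path) is not None,
    }

for path in ("products/widget", "products/listing.php", "products/v1.2"):
    print(path, is_rewritten(path))
```

All three leave products/listing.php alone. They differ on other URLs that contain a dot: products/v1.2 is rewritten by the condition and lookahead variants but not by the dot-excluding regex.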

HTACCESS How to "cut" URL at one point

I am new to .htaccess and I don't understand it well. Recently I have built the following code:
RewriteEngine On
RewriteCond %{HTTP_HOST} (.*)
RewriteCond %{REQUEST_URI} /api/v2/
RewriteRule ^api/v2(.*) /api/v2/api.php?input=$1
This was in the root public folder (example.com/.htaccess). But now I have to create second Rewrite and I want to make .htaccess file in example.com/api/v2/ folder. I tried to remove /api/v2/ part in each Rewrite Rule, but only thing I got was error 500.
What I want to achieve:
If someone uses this link: https://example.com/api/v2/test/test/123, I'd like to make it into https://example.com/api/v2/api?input=test/test/123 with .htaccess located in example.com/api/v2 folder.
Addressing your existing rule first:
RewriteCond %{HTTP_HOST} (.*)
RewriteCond %{REQUEST_URI} /api/v2/
RewriteRule ^api/v2(.*) /api/v2/api.php?input=$1
The first RewriteCond (condition) is entirely superfluous and can simply be removed. The second condition simply asserts that there is a slash after the v2 and this can be merged with the RewriteRule pattern. So, the above is equivalent to a single RewriteRule directive as follows:
RewriteRule ^api/v2(/.*) /api/v2/api.php?input=$1 [L]
This would internally rewrite the request from /api/v2/test/test/123 to /api/v2/api.php?input=/test/test/123 - note the slash prefix on the input URL parameter value.
However, unless you have another .htaccess file in a subdirectory that also contains mod_rewrite directives then this will create a rewrite loop (500 error).
Also note that you should probably include the L flag here to prevent the request being further rewritten (if you have other directives).
If someone uses this link: https://example.com/api/v2/test/test/123, I'd like to make it into https://example.com/api/v2/api?input=test/test/123 with .htaccess located in example.com/api/v2 folder.
I assume /api? is a typo and this should be /api.php?. Note also that the slash is omitted from the start of the URL parameter value (different to the rule above).
I tried to remove /api/v2/ part in each Rewrite Rule, but only thing I got was error 500.
This is the right idea, however, you need to be careful of rewrite loops (ie. 500 error response) since the rewritten URL is likely matching the regex you are trying to rewrite.
Try the following instead in the /api/v2/.htaccess file:
RewriteEngine On
RewriteCond %{REQUEST_URI} !api\.php$
RewriteRule (.*) api.php?input=$1 [L]
The preceding RewriteCond directive checks that the request is not already for api.php, thus avoiding a rewrite loop, since the pattern .* will naturally match anything, including api.php itself.
You could avoid the additional condition by making the regex more specific. For example, if the requested URL-path cannot contain a dot then the above RewriteCond and RewriteRule directives can be written as a single directive:
RewriteRule ^([^.]*)$ api.php?input=$1 [L]
The regex [^.]* matches anything except a dot, so avoids matching api.php.
Alternatively, only match the characters that are permitted. For example, lowercase a-z, digits and slashes (which naturally excludes the dot), which covers your test string test/test/123:
RewriteRule ^([a-z0-9/]*)$ api.php?input=$1 [L]
Or, if there should always be 3 path segments, /<letters>/<letters>/<digits>, then be specific:
RewriteRule ^([a-z]+/[a-z]+/\d+)$ api.php?input=$1 [L]
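The loop is easier to see if you simulate it. The sketch below is only a rough model of per-directory rewriting (after a rewrite, the rules are run again against the new URL-path); run_rules() and its max_passes limit are assumptions made up for the illustration and stand in for Apache's internal-redirect limit.

```python
import re

def run_rules(url_path, pattern, max_passes=10):
    """Re-apply a single rewrite rule until it stops matching.

    Returns (final_path, passes). Raises RuntimeError when the rule still
    matches after max_passes rewrites, which models the 500 error.
    """
    regex = re.compile(pattern)
    for passes in range(max_passes + 1):
        if regex.search(url_path) is None:
            return url_path, passes
        # The rule always rewrites to api.php. The query string is not part
        # of what the RewriteRule pattern is matched against.
        url_path = "api.php"
    raise RuntimeError("rewrite loop")

print(run_rules("test/test/123", r"^([^.]*)$"))             # stops after one rewrite
print(run_rules("test/test/123", r"^([a-z]+/[a-z]+/\d+)$"))  # stops after one rewrite
try:
    run_rules("test/test/123", r"(.*)")                      # no condition: never stops
except RuntimeError as exc:
    print("error:", exc)
```

The catch-all pattern keeps matching api.php itself, which is why it needs the extra condition, whereas the more specific patterns stop on their own.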

Htaccess - Redirect if URL does not contain at least three numbers

I'm struggling to get this htaccess redirect to work. I want to redirect any URL that does not contain at least three numbers in a row. I started with the following and it worked perfectly for redirecting any URL that DID have three numbers in a row:
RewriteCond %{REQUEST_URI} [0-9]{3,20} [NC]
RewriteRule (.*) "https\:\/\/info\.mywebsite\.com\/" [R=301,L]
However, I tried to modify that with the exclamation mark to make the condition NOT match three numbers in a row:
RewriteCond %{REQUEST_URI} !([0-9]{3,20}) [NC]
RewriteRule (.*) "https\:\/\/info\.mywebsite\.com\/" [R=301,L]
But that doesn't seem to work as expected. Am I missing something with turning this expression into a not match?
Since you previously experimented with the opposite 301 (permanent) redirect, the results are most probably cached by the browser. It is a good idea to test with 302 (temporary) redirects to avoid caching issues.
Note also that the REQUEST_URI server variable contains the URL-path only, so digits that appear only in the query string are not seen by your condition.
The quantifier {3,20} matches from 3 to 20 digits; if you want "at least three" then use the quantifier {3,} (no upper bound). Since the regex is not anchored, a plain {3} would match exactly the same URLs.
You don't need the capturing subpattern, ie. the surrounding parentheses (...) on the regex, since you are not using backreferences anywhere. Incidentally, you can't capture a subpattern with a negated regex.
You don't need the additional condition (RewriteCond directive) - this can all be done with the RewriteRule directive only.
The NC flag is not required here - you are checking digits only.
For example:
RewriteRule !\d{3,} https://info.mywebsite.com/ [R=302,L]
As noted in comments, the RewriteRule substitution string is a regular string, not a regex, so does not require any special backslash escaping (although colons and slashes don't need escaping anyway in Apache regex).
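If you want to sanity-check the negated pattern before deploying it, here is a small sketch in Python. It assumes Python's re behaves like Apache's regex engine for this pattern; should_redirect() and same_verdict() are hypothetical helper names.

```python
import re

# re.ASCII keeps \d limited to 0-9.
AT_LEAST_THREE = re.compile(r"\d{3,}", re.ASCII)

def should_redirect(url_path):
    # Mirrors the negated pattern: the rule fires when no run of three
    # or more digits is found anywhere in the URL-path.
    return AT_LEAST_THREE.search(url_path) is None

def same_verdict(url_path):
    # For an unanchored search the upper bound makes no difference:
    # any run of three digits is enough for all three quantifiers.
    quantifiers = (r"\d{3}", r"\d{3,}", r"\d{3,20}")
    verdicts = {re.search(q, url_path, re.ASCII) is None for q in quantifiers}
    return len(verdicts) == 1

for path in ("blog/post-12", "item/1234", "a1b2c3"):
    print(path, should_redirect(path), same_verdict(path))
```

Here blog/post-12 and a1b2c3 would be redirected (no three digits in a row), while item/1234 would not.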

Remove parentheses from the URLs query string in rewrite rule

I would like to clean up the URLs by removing parentheses from all query strings.
I tried the following code, but couldn't get it to work.
RewriteCond %{REQUEST_URI} [\(\)]+
RewriteRule ^(.*)[\(]+([^\)]*)[\)]+(.*)$ /$1$2$3 [R=301,L]
Here's an example of a URL:
http://www.example.com/blog/abc-post/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogname+(Blog+Name+New+York)
In order to match the query string you need to check the QUERY_STRING server variable in a RewriteCond directive.
Here are some ways of doing this:
1. Any number of parentheses - multiple redirects
For example, to remove any number of opening/closing parentheses in the query string part of the URL:
RewriteCond %{QUERY_STRING} (.*)[()]+(.*)
RewriteRule (.*) /$1?%1%2 [R,NE,L]
The NE flag is required in your example to avoid the %-encoded character (ie. %3A) being doubly encoded.
This will, however, result in multiple redirects, depending on the number of "groups" of parentheses. In your example, this will result in two redirects, because there are two "groups" of parentheses (a single parenthesis in each "group").
2. Any number of parentheses pairs - multiple (but fewer) redirects
If the parentheses are always in matching pairs, then you can specifically check for the opening/closing parentheses and potentially reduce the number of redirects.
RewriteCond %{QUERY_STRING} (.*)\((.*)\)(.*)
RewriteRule (.*) /$1?%1%2%3 [R,NE,L]
In your example, this results in a single redirect because there is just a single pair of parentheses. But /abc?foo=(bar)&one=(two) would result in two redirects.
3. Any number of parentheses - single redirect
This method performs multiple internal rewrites to remove the parentheses, followed by a single redirect once all the parentheses have been replaced:
# Remove parentheses from query string
RewriteCond %{QUERY_STRING} (.*)[()]+(.*)
RewriteRule (.*) /$1?%1%2 [E=REPLACED_PARENS:1,NE,L]
# Redirect to "clean" URL
RewriteCond %{ENV:REDIRECT_REPLACED_PARENS} 1
RewriteCond %{THE_REQUEST} ^GET\ /(.*)\?
RewriteRule ^ /%1 [R,NE,L]
The first rule internally rewrites the request and sets an environment variable if a replacement is required.
The second rule checks for this environment variable (note that REPLACED_PARENS becomes REDIRECT_REPLACED_PARENS after the first rewrite) and ultimately redirects to the cleaned URL. The URL-path is grabbed from the initial request (contained in the THE_REQUEST server variable) to avoid inadvertently redirecting to the directory index (eg. index.php) when a bare directory is requested (or a front-controller is used).
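The number of passes each regex needs can be counted with a short Python sketch. Each pass corresponds to one redirect in methods 1 and 2, or to one internal rewrite in method 3. This assumes Python's greedy matching behaves like Apache's for these patterns; strip_in_passes() is a hypothetical helper.

```python
import re

ANY_PAREN = r"(.*)[()]+(.*)"    # condition used in methods 1 and 3
PAIRED = r"(.*)\((.*)\)(.*)"    # condition used in method 2

def strip_in_passes(query, pattern):
    """Apply one condition/substitution pair until it no longer matches.

    Returns (cleaned_query, passes).
    """
    regex = re.compile(pattern)
    passes = 0
    while True:
        m = regex.search(query)
        if m is None:
            return query, passes
        # Both substitutions simply concatenate the captured groups
        # (%1%2 or %1%2%3), dropping whatever sat between them.
        query = "".join(m.groups())
        passes += 1

example = "utm_campaign=Feed%3A+blogname+(Blog+Name+New+York)"
print(strip_in_passes(example, ANY_PAREN))
print(strip_in_passes(example, PAIRED))
print(strip_in_passes("foo=(bar)&one=(two)", PAIRED))
```

For the example query, the first pattern needs two passes and the paired pattern needs one; foo=(bar)&one=(two) needs two passes with the paired pattern.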

Variable in htaccess, RewriteRule question

How could I use a variable in a RewriteRule?
If I have:
SetEnv MY_VAR (a|b|c)
RewriteRule ^%{ENV:MY_VAR}$ index.php?s=$1 [L,QSA]
RewriteRule ^%{ENV:MY_VAR}-some-one$ index.php?s=$1 [L,QSA]
I have these examples, but they don't work.
Later edit
OK Tim, thank you for the answer. Let's say that I have:
RewriteRule ^(skoda|bmw|mercedes)-([0-9]+)-([0-9]+)-some-think$ index.php?a=$1 [L,QSA]
RewriteRule ^some-one-(skoda|bmw|mercedes)/pag-([0-9]+)$ index.php?a=$1 [L,QSA]
RewriteRule ^a-z-(skoda|bmw|mercedes)$ index.php?a=$1 [L,QSA]
(Ignore the second part of each RewriteRule.) I don't want to repeat the (skoda|bmw|mercedes) list everywhere. It would be quicker to define a variable and then use it in each rule...
You can't do that, because mod_rewrite doesn't expand variables in the regular expression clauses.
You can only use variables in the input argument to a RewriteCond, and as the result argument to a RewriteRule. There's overhead in compiling the regular expressions (especially if you're forced to do it per-request as with .htaccess files), so if you allowed variable content in them, they'd have to be recompiled for every comparison to ensure accuracy at the cost of performance. It seems the solution therefore was to not let you do that.
What exactly did you want to do that for, anyway?
I've received another answer on a mod_rewrite forum from jdMorgan:
Mod_rewrite cannot use a variable in a regex pattern. The .htaccess directives are not a scripting language...
I'd recommend:
RewriteCond $1<>$3 ^<>-([0-9]+)-([0-9]+)-some-think$ [OR]
RewriteCond $1<>$3 ^some-one-<>/pag-([0-9]+)$ [OR]
RewriteCond $1<>$3 ^a-z-<>$
RewriteRule ^((?:[^\-/]+[\-/])*)(skoda|bmw|mercedes)([\-/].+)?$ index.php?a=$2 [QSA,L]
Here, the RewriteRule pattern is evaluated first (See Apache mod_rewrite documentation "Rule Processing").
If the pattern matches, then whatever comes before "(skoda|bmw|mercedes)" in the requested URL-path is placed into local variable "$1".
Whatever follows "(skoda|bmw|mercedes)" is placed into local variable $3.
The value of the requested URL-path matching "(skoda|bmw|mercedes)" is placed into $2.
Then each of the RewriteConds is processed to check that the format of the requested URL without the "(skoda|bmw|mercedes)" part is one of the formats to be accepted.
Note that the "<>" characters are used only as a separator to assist correct and unambiguous parsing, and have no special meaning as used here. They simply "take the place of" the variable-string that you do not want to include in each line. You can use any character or characters that you are sure will never appear in one of your URLs without first being URL-encoded. I prefer to use any of > or < or ~ myself.
Note also that the RewriteRule assumes that the "(skoda|bmw|mercedes)" substring will always be delimited by either a hyphen or a slash if any other substring precedes or follows it. I am referring to the two RewriteRule sub-patterns containing "[^\-/]" ("NOT a hyphen or a slash") and "[\-/]" ("Match a hyphen or a slash"). This greatly improves efficiency of regular-expressions pattern matching, so use this method if possible instead of using an ambiguous and inefficient sub-pattern like ".*" ("Match anything, everything, or nothing").
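The "<>" separator technique can be tried out in Python before touching the server config. This is a sketch under assumptions: Python's re stands in for Apache's regex engine, rewrite() is a hypothetical helper, and the prefix group here wraps the whole repetition so that it captures everything before the brand (a bare repeated group such as (...)* only keeps its last iteration).

```python
import re

BRANDS = r"(skoda|bmw|mercedes)"   # the list is written once only
# Group 1: everything before the brand, group 2: the brand, group 3: the rest.
RULE = re.compile(r"^((?:[^\-/]+[\-/])*)" + BRANDS + r"([\-/].+)?$")

# Accepted formats around the brand; "<>" marks where the brand was.
FORMATS = [
    re.compile(r"^<>-([0-9]+)-([0-9]+)-some-think$"),
    re.compile(r"^some-one-<>/pag-([0-9]+)$"),
    re.compile(r"^a-z-<>$"),
]

def rewrite(url_path):
    """Return the substitution string, or None when the request is left alone."""
    m = RULE.match(url_path)
    if m is None:
        return None
    before, brand, after = m.group(1), m.group(2), m.group(3) or ""
    # Equivalent of the RewriteCond test string $1<>$3.
    test_string = f"{before}<>{after}"
    # The conditions are joined with [OR], so any one match is enough.
    if any(fmt.search(test_string) for fmt in FORMATS):
        return f"index.php?a={brand}"
    return None

for path in ("skoda-12-34-some-think", "some-one-bmw/pag-3",
             "a-z-mercedes", "foo-bmw-bar"):
    print(path, "->", rewrite(path))
```

The first three paths are rewritten with the matching brand; foo-bmw-bar contains a brand but fits none of the accepted formats, so it is left alone.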

Resources