htaccess rewrite rule accept % character in parameter - .htaccess

I'm creating a rewrite rule with a parameter that could contain the character %. However, when I add % to my rule, it breaks my website and every page returns an error:
RewriteRule ^sale/([a-zA-Z0-9_-%]+)$ browse.php?id=$1
I want the parameter to be able to include letters, digits 0 to 9 and the special characters -, _ and %.
If I remove the %, it works fine, but I want % to be accepted, for example in this URL:
http://www.websitename.com/sale/test%20parameter

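Apache decodes percent-encoded characters in the URL before passing it to mod_rewrite, so the pattern never sees %20; it sees a literal space. So if you want to accept %20 in your URLs, you just need to add a space to the character class in your RewriteRule. Note, however, that a space is also used to separate the regular expression from the replacement string in a RewriteRule, so you need to escape the space with a backslash (\). Your original rule fails for a separate reason: in [a-zA-Z0-9_-%], the sequence _-% is parsed as a range from _ to %, which is out of order, so the regular expression does not compile and every request returns an error. Placing the hyphen last in the class makes it a literal hyphen.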
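Both points can be reproduced with Python's re module, which, like PCRE, rejects an out-of-order range. This is a minimal sketch under stated assumptions: urllib.parse.unquote stands in for Apache's decoding step, the path is written without a leading slash (as a RewriteRule in .htaccess sees it), and compiles is a hypothetical helper. Python's regex engine is not Apache's, so treat the result as an approximation. In the .htaccess file itself the space would be written as an escaped space, as described above.

```python
import re
from urllib.parse import unquote

def compiles(pattern):
    """Hypothetical helper: True if Python's regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# In .htaccess context the pattern is matched against the decoded URL-path,
# without the leading slash; unquote() stands in for Apache's decoding here.
decoded = unquote("sale/test%20parameter")
print(decoded)  # sale/test parameter

# "_-%" is read as a range from "_" (0x5F) down to "%" (0x25): out of order.
broken = r"^sale/([a-zA-Z0-9_-%]+)$"
print(compiles(broken))  # False

# Escaped space instead of %, hyphen moved to the end so it is literal.
fixed = r"^sale/([a-zA-Z0-9_\ -]+)$"
m = re.match(fixed, decoded)
print(m.group(1))  # test parameter
```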

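Alternatively, you can use the B flag (http://httpd.apache.org/docs/2.2/rewrite/flags.html#flag_b), which escapes non-alphanumeric characters in backreferences before the substitution is applied.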

Related

Remove quotation mark string from URL with HTACCESS

We are seeing a strange thing where bots are sending odd URLs: they are adding an Alexa URL to our URLs. We want to remove that part so that the URL contains only what comes before the odd addition.
So we want to go from
www.example.com/search/Linux/page/6/”http:/www.alexa.com/siteinfo/www.example.com“/page/900
to
www.example.com/search/Linux/page/6/
removing this part: ”http:/www.alexa.com/siteinfo/www.example.com“/page/900
Because the added part contains quotes, I am unsure which .htaccess rule would work to rewrite the URL, but I am open to suggestions.
We're not sure where the requests are coming from; we only see them in our 404 monitor.
If these requests are triggering a 404 (as they should be), then you are essentially already "blocking" such requests; they won't get inadvertently indexed by search engines.
However, if a third-party site is mistakenly linking to you with these erroneous links, then you might be losing traffic. You can redirect to remove the erroneous portion of the URL.
There's nothing particularly special about matching quotes in the URL. However, the quotes used in your question are not the "standard" double quote ("). The opening quote is U+201D (RIGHT DOUBLE QUOTATION MARK) and the closing quote is U+201C (LEFT DOUBLE QUOTATION MARK). This is not a problem; we can check for all three.
For example, using mod_rewrite at the top of the .htaccess file to remove the part of the URL from the first quote character onwards:
RewriteEngine On
# Remove everything from the first double quote onwards
RewriteRule ^([^"”“]+)["”“] /$1 [R=301,L]
The $1 backreference contains the part of the URL-path before the first double quote character.
The original query string (if any) is preserved.
Test first with a 302 (temporary) redirect to avoid potential caching issues.
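To sanity-check the pattern before deploying it, here is a small Python simulation of the quote-stripping rule. It is an approximation: Python matches Unicode text, whereas Apache matches the decoded UTF-8 bytes, and redirect_target is a hypothetical helper that returns what /$1 would expand to.

```python
import re

# URL-path as a RewriteRule in .htaccess sees it: decoded, no leading slash.
path = ("search/Linux/page/6/\u201dhttp:/www.alexa.com/siteinfo/"
        "www.example.com\u201c/page/900")

# ^([^"”“]+)["”“] : capture everything up to the first straight or curly quote.
quote_rule = re.compile('^([^"\u201d\u201c]+)["\u201d\u201c]')

def redirect_target(rule, url_path):
    """Hypothetical helper: return '/' + $1 if the rule matches, else None."""
    m = rule.match(url_path)
    return "/" + m.group(1) if m else None

print(redirect_target(quote_rule, path))                    # /search/Linux/page/6/
# The cleaned-up URL contains no quote, so it is not redirected again.
print(redirect_target(quote_rule, "search/Linux/page/6/"))  # None
```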
Alternatively, if your URLs are limited to a known subset of characters, e.g. a-z, A-Z, 0-9, _ (underscore), - (hyphen) and / (slash, the path separator), then check for valid characters instead. For example:
# Remove everything from the first "invalid character"
# (the hyphen is placed last in each character class so it is always literal)
RewriteRule ^([\w/-]+)[^\w/-] /$1 [R=301,L]
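The same kind of check works for the valid-characters variant. This sketch puts the hyphen last in the class ([\w/-]): Python's re rejects [\w-/] as a bad character range, and PCRE2-based builds may reject it as well, while a trailing hyphen is always literal. Python's \w also matches non-ASCII letters, which may differ from Apache, but that does not affect this input. As before, redirect_target is a hypothetical helper.

```python
import re

path = ("search/Linux/page/6/\u201dhttp:/www.alexa.com/siteinfo/"
        "www.example.com\u201c/page/900")

# Capture the leading run of word characters, slashes and hyphens, then require
# one character outside that set. The hyphen is last, so it is literal.
valid_chars_rule = re.compile(r"^([\w/-]+)[^\w/-]")

def redirect_target(rule, url_path):
    """Hypothetical helper: return '/' + $1 if the rule matches, else None."""
    m = rule.match(url_path)
    return "/" + m.group(1) if m else None

print(redirect_target(valid_chars_rule, path))                    # /search/Linux/page/6/
print(redirect_target(valid_chars_rule, "search/Linux/page/6/"))  # None

# The hyphen-in-the-middle ordering fails to compile in Python's engine.
try:
    re.compile(r"^([\w-/]+)[^\w-/]")
    hyphen_in_middle_ok = True
except re.error:
    hyphen_in_middle_ok = False
print(hyphen_in_middle_ok)  # False
```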

Redirect old URL to new URL by .htaccess

I want to change URL format from :
https://example.com/modules/news/article.php?storyid=224039
to :
https://example.com/news/224039
Can anyone help me write the correct .htaccess code?
Thanks.
Untested:
RewriteEngine On
RewriteBase /
RewriteRule ^news/([0-9]+) /modules/news/article.php?storyid=$1 [NC,L]
The NC (no case) flag makes the match case-insensitive; if you don't want that, remove this flag. The L (last) flag means this is the last rule processed in the current rewrite pass, so further rules aren't applied. This is a bit counterintuitive, because after an internal rewrite Apache runs through all the rules again from the beginning anyway to make sure nothing else needs rewriting, which is a gotcha for many people and a common source of infinite rewrite loops. You could probably omit the L flag altogether, but it makes the intent more explicit.
The RewriteEngine On line can be omitted in Apache configurations that already enable the engine in the httpd.conf file, but it is best practice to turn it on explicitly rather than assume it is already on. The RewriteBase / line can probably be omitted too, depending on how you write your RewriteRule.
The RewriteRule uses a regular expression on the left: the parentheses capture the match, the square brackets define a character class, 0-9 is the range of valid characters (you could also use \d instead), and + means match one or more times. The expression on the right is the replacement. The leading slash can probably be omitted. Also note that because the replacement contains a query string, any query string on the original request will be discarded. If you want to merge query strings, use the QSA (query string append) flag, which appends the original query string after your storyid. Finally, $1 refers to the first group captured by the parentheses on the left.
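The query-string behaviour described above can be modelled with a small Python function. This is only a sketch: rewrite is a hypothetical helper that mimics how the substitution and the QSA flag combine (the rewritten query string first, then the original one, as in the QSA example of the Apache flags documentation), and it does not reproduce any other part of mod_rewrite.

```python
import re

# Hypothetical model of:
#   RewriteRule ^news/([0-9]+) /modules/news/article.php?storyid=$1 [NC,L]
# covering only pattern matching and query-string handling.
PATTERN = re.compile(r"^news/([0-9]+)", re.IGNORECASE)  # NC -> case-insensitive

def rewrite(url_path, query="", qsa=False):
    """Return the rewritten URL, or None if the rule does not match."""
    m = PATTERN.match(url_path)
    if not m:
        return None
    target = "/modules/news/article.php?storyid=" + m.group(1)
    # The substitution has its own query string, so the original one is
    # dropped unless QSA asks for it to be appended after the new one.
    if qsa and query:
        target += "&" + query
    return target

print(rewrite("news/224039"))                        # /modules/news/article.php?storyid=224039
print(rewrite("NEWS/224039", "ref=home"))            # /modules/news/article.php?storyid=224039
print(rewrite("NEWS/224039", "ref=home", qsa=True))  # /modules/news/article.php?storyid=224039&ref=home
print(rewrite("modules/news/article.php"))           # None
```

The last call shows why this rule does not loop: when Apache re-runs the rules on the rewritten path modules/news/article.php, the anchored pattern ^news/ simply does not match.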

what is the '-' (minus sign) for RewriteRule?

I have this rule in my .htaccess:
RewriteRule ^(.+)\.([0-9a-zA-Z]+)$ - [L,NC]
I don't understand what the "-" (minus sign) just before the [L,NC] is for:
$ - [L,NC]
From the Apache mod_rewrite docs:
(dash)
A dash indicates that no substitution should be performed (the existing path is passed through untouched). This is used when a flag (see below) needs to be applied without changing the path.
Effectively it means to take no action when that input URL pattern is matched. Following that with [L] makes sure no subsequent matches will be performed so the input URL is used "as-is". This can be used to exempt one specific pattern from being rewritten when it would otherwise be matched by a more general pattern.
You won't see rules like the one in question very often, because it is usually possible to achieve the same result by reordering the RewriteRule directives, or by modifying the more general pattern so that it doesn't match the exempt one to begin with.
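To illustrate how a "-" rule exempts URLs from a later, more general rule, here is a Python model of a single pass over a rule list. The catch-all rule that sends everything to index.php is hypothetical, added only to have something to exempt from, and the loop ignores everything mod_rewrite does besides matching, substitution and the L flag.

```python
import re

# One pass over a rule list, top to bottom. Each rule is
# (pattern, substitution, flags); "-" means "leave the path as it is".
RULES = [
    (r"^(.+)\.([0-9a-zA-Z]+)$", "-", {"L", "NC"}),  # rule from the question
    (r"^(.*)$", r"index.php?path=\1", {"L"}),        # hypothetical catch-all
]

def apply_rules(path):
    for pattern, substitution, flags in RULES:
        regex = re.compile(pattern, re.IGNORECASE if "NC" in flags else 0)
        m = regex.match(path)
        if not m:
            continue
        if substitution != "-":
            path = m.expand(substitution)  # $1 is written \1 in Python
        if "L" in flags:
            break  # stop processing further rules in this pass
    return path

print(apply_rules("css/style.css"))  # css/style.css
print(apply_rules("about"))          # index.php?path=about
```

In a real .htaccess file, the internal rewrite to index.php would trigger another pass, and because index.php matches the first rule's pattern, it would then be left alone.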
The mod_rewrite documentation (http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule) describes the dash the same way: no substitution is performed and the existing path is passed through untouched.
In combination with the L flag, this indicates that these URLs should be processed without transformation and that no other rules should be applied.

Rewrite rule to remove all non alphanumeric symbols from a URI

I could not find exactly the same question on SO. I hope someone can help me out with this.
Say a user enters http://www.example.com/abc#!def. I want to remove all symbols in the %{REQUEST_URI} portion, then redirect to http://www.example.com/abcdef. The problem is that these symbols can occur anywhere in the string, e.g. #ab!cdef and abcdef#! should both redirect to abcdef.
As far as I know, mod_rewrite has no string replace function, so this seems impossible. Am I correct?
You can capture specific parts of a URL with regular expressions in a RewriteCond or RewriteRule, but you cannot remove arbitrary characters.
Furthermore, the server will never see the hash character '#' or anything after it, because the fragment is used by the client to navigate to a specific part of the document and is not sent in the request.
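Update, using the N (next) flag:
RewriteRule (.*)[^a-zA-Z0-9](.*) $1$2 [N]
Each pass removes one character that is not alphanumeric, and the N flag restarts rule processing until no such character is left. Note that / is not alphanumeric either, so slashes are removed too.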
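The looping behaviour of the N flag can be imitated in Python by reapplying the substitution until the pattern stops matching. This sketch uses [^a-zA-Z0-9] so that digits are kept, as the question asks for alphanumeric characters, and adds a hypothetical max_passes cap as a safety net; it models only the string manipulation, not the rest of mod_rewrite.

```python
import re

# Model of: RewriteRule (.*)[^a-zA-Z0-9](.*) $1$2 [N]
RULE = re.compile(r"(.*)[^a-zA-Z0-9](.*)")

def strip_with_next_flag(path, max_passes=1000):
    """Reapply the rule until it stops matching (max_passes is hypothetical)."""
    for _ in range(max_passes):
        m = RULE.search(path)
        if not m:
            break  # nothing left to remove, so processing ends
        path = m.group(1) + m.group(2)  # the $1$2 substitution
    return path

print(strip_with_next_flag("ab!cdef"))     # abcdef
print(strip_with_next_flag("abcdef!"))     # abcdef
print(strip_with_next_flag("abc-d_ef9!"))  # abcdef9
```

Each pass removes only one character (the last non-alphanumeric one, because the greedy (.*) backtracks from the end); the repetition is what makes the result complete.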

.htaccess prevent B flag from encoding + symbols or change occurences of %20 into +?

This line of my .htaccess file basically escapes and turns the first directory into a query string.
RewriteRule ^([^/]+)/?$ /a/?s=$1 [L,QSA,B]
I did this mainly to escape & symbols, but it escapes all non-alphanumeric characters, including + symbols. I don't want those escaped, so that the URLs are cleaner. Currently I get:
eat%20a%20pizza
I want:
eat+a+pizza
Is it possible to replace %20 with +, or to prevent the B flag from encoding these characters?
Not sure if there's a way to be specific about how the B flag works, but you can change the %20 back to + with this:
RewriteRule ^(.*)%20(.*)$ /$1+$2 [NE,L]
You'll probably need to find the right place to put it, since it has to loop in order to replace all the %20s.
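The following Python sketch models that looping: it reapplies the rule's substitution until nothing changes. It treats the input as plain text and does not model how, or whether, Apache decodes %20 before matching, so verify that part on your own server. The helpers one_pass and until_stable are hypothetical.

```python
import re

# Model of: RewriteRule ^(.*)%20(.*)$ /$1+$2 [NE,L], applied repeatedly.
# The leading "/" of the substitution is left out so passes compare like with like.
RULE = re.compile(r"^(.*)%20(.*)$")

def one_pass(s):
    """Replace one %20 (the last one, since (.*) is greedy)."""
    return RULE.sub(r"\1+\2", s, count=1)

def until_stable(s):
    """Keep applying the rule until the string stops changing."""
    while True:
        nxt = one_pass(s)
        if nxt == s:
            return s
        s = nxt

print(one_pass("eat%20a%20pizza"))      # eat%20a+pizza
print(until_stable("eat%20a%20pizza"))  # eat+a+pizza
```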
