Stop MediaWiki encoding parentheses in section anchors - .htaccess

I recently installed MediaWiki 1.23.9 on a HostGator-hosted server (Apache-based, I believe). I got it all configured, with pretty URLs up and running and action URLs rewriting properly, and everything was nice. I noticed, however, that anchor links, specifically the auto-generated section headers, aren't quite so pretty. They undergo "dot encoding" for a reason I'm not sure about.
This results in /w/MyPage#Section_1_(Stuff_Here) becoming /w/MyPage#Section_1_.28Stuff_Here.29.
With parentheses being valid URI characters (and in fact, if used in a page title, they are properly not encoded in the URI), I don't understand why this is happening, nor how to stop it. I looked through all manner of bug reports and even tried glancing through the MediaWiki source. I found the function that performs the encoding, but as far as I can tell parentheses shouldn't be getting encoded.
My question is: Is there a way to prevent MediaWiki from encoding parentheses in section header anchors? Failing that, can I mask this behavior using .htaccess rules? For reference, my current .htaccess file is below, though I would very much prefer turning it off rather than masking it.
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^(.*)$ %{DOCUMENT_ROOT}/w/index.php [L]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^/?w/images/thumb/[0-9a-f]/[0-9a-f][0-9a-f]/([^/]+)/([0-9]+)px-.*$ %{DOCUMENT_ROOT}/w/thumb.php?f=$1&width=$2 [L,QSA,B]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^/?w/images/thumb/archive/[0-9a-f]/[0-9a-f][0-9a-f]/([^/]+)/([0-9]+)px-.*$ %{DOCUMENT_ROOT}/w/thumb.php?f=$1&width=$2&archived=1 [L,QSA,B]
Note: This answer to a different question provides a quick explanation of what the "dot encoding" process is, though not how to exclude parentheses from it.

MediaWiki encodes section ids to honor HTML4 restrictions. This is a relic of the past as MediaWiki uses HTML5 these days, which removed those restrictions. You can set $wgExperimentalHtmlIds to true to make MediaWiki follow HTML5 rules (where only whitespace needs to be converted).
This is called "experimental" because when the setting was introduced (in 2010), browser support for HTML5 was somewhat unreliable. Today it is probably fine, but no one has actually tested that, so use it at your own risk.
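To make the difference between the two id styles concrete, here is a minimal Python sketch. It is not MediaWiki's actual code: the function names are made up for illustration, and the legacy variant only approximates the behavior described in the question (percent-encode, then swap "%" for ".").

```python
from urllib.parse import quote


def legacy_dot_encode(heading: str) -> str:
    """Approximate the HTML4-style "dot encoding" of a section heading."""
    # Spaces become underscores first, as in the URLs from the question.
    underscored = heading.strip().replace(" ", "_")
    # Percent-encode everything except letters, digits, "_", ".", "-", "~" and ":".
    percent_encoded = quote(underscored, safe=":")
    # Swap "%" for "." so the id only uses characters HTML4 allowed in ids.
    return percent_encoded.replace("%", ".")


def html5_style_id(heading: str) -> str:
    """HTML5-style id: only whitespace has to be replaced."""
    return "_".join(heading.split())


print(legacy_dot_encode("Section 1 (Stuff Here)"))  # Section_1_.28Stuff_Here.29
print(html5_style_id("Section 1 (Stuff Here)"))     # Section_1_(Stuff_Here)
```

The first result matches the encoded anchor from the question; the second is the form the asker wanted to keep.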

Related

.htaccess RewriteRule query isn't working

I can't get the following RewriteRule to work.
I have a PHP SQL query to display a web page. It requires a RewriteRule, which I'm trying to set up in a .htaccess file.
Here is the full URL at the moment.
www.example.com/category/sub-cat/page.php?art_url=a-page-of-mine
I can't get it to do
www.example.com/category/sub-cat/a-page-of-mine
My Code below:
Options +MultiViews
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.php [NC,L]
ErrorDocument 404 /error-404.php
#error 404
RewriteRule ^error/?$ error-404.php [NC,L]
RewriteRule ^category/sub-cat/(0-9a-zA-Z]+) category/sub-cat/page.php?art_url=$1 [NC,L]
Can someone help me out?
As I said in the comments:
A few things are missing from (0-9a-zA-Z]+): the opening [ and a -, as in ([-0-9a-zA-Z]+)
This is going to bite you too...
RewriteRule ^([^\.]+)$ $1.php [NC,L]
This matches everything that doesn't contain a dot and appends .php to it, with the [L] (last) flag. I would bet a request never gets past that rule in the first place.
Generally you want the more specific rules first, followed by the more generic ones.
Also, if I recall correctly, NC means "no case", so you can get rid of the A-Z and just use [-a-z0-9]+
A better way
I try to avoid query string rewrites and rely on the URI method of rewriting common in MVC frameworks
example.com/index.php/category/sub-cat/a-page-of-mine
And then use a router, with .htaccess only removing the index.php; it's much simpler that way.
I have a pretty bare-bones router on my GitHub page that shows how to route URLs like that.
https://github.com/ArtisticPhoenix/MISC/tree/master/Router
One big issue with messing with the query string is you can lose the ability to use $_GET the way it's designed to be used for things like search forms etc. So it's better to route not rewrite. Also the MVC way gives you a single entry point for all requests to go through which can make it easier to manage things like Constants, and Autoloaders....
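A minimal sketch of the routing idea, written in Python for brevity. This is not the linked router: the route table, the handler and the function names are all hypothetical and only show the single-entry-point pattern.

```python
from typing import Callable, Dict, List

Handler = Callable[[List[str]], str]


def show_article(segments: List[str]) -> str:
    # Hypothetical handler: the last path segment plays the role of art_url.
    return f"article:{segments[-1]}" if segments else "404"


# Hypothetical route table: first path segment -> handler.
ROUTES: Dict[str, Handler] = {"category": show_article}


def dispatch(path: str) -> str:
    """Split the request path into segments and hand them to a handler."""
    # Drop empty segments and the front controller name itself.
    segments = [s for s in path.split("/") if s and s != "index.php"]
    if not segments or segments[0] not in ROUTES:
        return "404"
    return ROUTES[segments[0]](segments[1:])


print(dispatch("/index.php/category/sub-cat/a-page-of-mine"))  # article:a-page-of-mine
print(dispatch("/category/sub-cat/a-page-of-mine"))            # article:a-page-of-mine
print(dispatch("/unknown/page"))                               # 404
```

Because the page is identified by the path, the query string stays free for its normal uses, such as search forms.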
Oh well, this is broken of course:
(0-9a-zA-Z]+)
The character class lacks the opening [ and also doesn't contain (and so can't match) a literal -.
Right. To get this working I needed to add QSA as in [QSA,NC,L]. After how many weeks!!??
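The pattern problems discussed in this thread can be checked outside Apache. The sketch below uses Python's re module as a stand-in for Apache's regex engine (an assumption; the two agree on these simple patterns) and tests the path the way a .htaccess rule would see it, without a leading slash.

```python
import re

# Pattern from the question: "(" opens a group, so "0-9a-zA-Z" is literal text
# and "]" is an ordinary character. No character class is involved at all.
broken = re.compile(r"^category/sub-cat/(0-9a-zA-Z]+)", re.IGNORECASE)

# Corrected pattern: a real character class that also allows "-".
fixed = re.compile(r"^category/sub-cat/([-0-9a-z]+)", re.IGNORECASE)

# The generic "append .php" rule that sits above it in the question's file.
generic = re.compile(r"^([^\.]+)$", re.IGNORECASE)

path = "category/sub-cat/a-page-of-mine"

print(broken.match(path))              # None
print(fixed.match(path).group(1))      # a-page-of-mine
print(generic.match(path) is not None) # True
```

The last line shows why rule order matters: the generic rule also matches this path, so if it comes first with [L], the specific rule is not reached in that pass.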

What does "(?s)" mean to htaccess?

While reviewing the documentation of a popular framework, I stumbled upon the .htaccess code below. I pretty much understand what it does except for the (?s) part. What does it do?
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^((?s).*)$ index.php?_url=/$1 [QSA,L]
It turns on single-line mode, which makes . additionally match newline characters (which it normally does not).
In this case it's redundant (and looks awkward), since the URI is a single line anyway.
References:
regular-expressions.info - Specifying Modes Inside The Regular Expression
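The effect of the flag is easy to see in a small Python check. Python's re module is used here only as an illustration; unlike the rule above, recent Python versions want the inline flag at the very start of the pattern, so it is written as (?s).* here. The helper name is made up.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


multi_line = "first\nsecond"
single_line = "journal/2"

print(matches(r".*", multi_line))       # False: "." stops at the newline
print(matches(r"(?s).*", multi_line))   # True: single-line mode lets "." cross it
print(matches(r".*", single_line))      # True
print(matches(r"(?s).*", single_line))  # True: no difference without a newline
```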

url rewriting with htaccess not working

I am trying to rewrite the URLs on my site so that whatever is after the slash is passed as an argument (example.com/Page goes to example.com/index.php?page=Page).
Here is the code that isn't working (it gives a Forbidden):
RewriteEngine On
RewriteRule ^/(.+)/$ /index.php?page=$1 [L]
Any help will be appreciated.
This is what I suggested in the comment to your question:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteRule ^(.+)$ /index.php?page=$1 [L,B]
The leading slash does not make sense in .htaccess-style files, since you do not process an absolute path there, but a relative one. About the trailing slash: your example does not show such a slash, so why do you want it in the regular expression? It makes your pattern match only requests that end in a slash, which is not what you want.
The RewriteCond lines are there to still allow access to physically existing files and directories and to prevent an endless loop, though that should not occur with an internal-only rewrite. And you need the B flag to escape the part of the request URL you want to pass as a GET argument.
The last condition is actually redundant, since /index.php is obviously a file. I leave it in for demonstration purposes.
In general it is a very good idea to take a look at the documentation of Apache's rewrite module: httpd.apache.org/docs/current/mod/mod_rewrite.html
It is well written, very precise and contains lots of really good examples. It should answer all your questions.
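The point about the leading slash can be illustrated with a short Python check (Python's re standing in for Apache's regex engine, which is an assumption). In a .htaccess file in the document root, a request for example.com/Page is tested against the pattern as "Page", not "/Page".

```python
import re

# What the RewriteRule pattern sees in per-directory (.htaccess) context.
per_directory_path = "Page"

original = re.compile(r"^/(.+)/$")  # pattern from the question
suggested = re.compile(r"^(.+)$")   # pattern from the answer

print(original.match(per_directory_path))  # None: no leading or trailing slash

m = suggested.match(per_directory_path)
print("/index.php?page=" + m.group(1))     # /index.php?page=Page
```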

Redirect Desktop Internal Pages to Correct Mobile Internal Pages with Htaccess

I have built a Mobile site in a sub-domain.
I have successfully implemented the redirect 302 from:
www.domain.com to m.domain.com in htaccess.
What I'm looking to achieve now is to redirect users from:
www.domain.com/internal-page/ > 302 > m.domain.com/internal-page.html
Notice that URL name for desktop and mobile is not the same.
The code I'm using looks like this:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
# Mobile Redirect
# Verify Desktop Version Parameter
RewriteCond %{QUERY_STRING} (^|&)ViewFullSite=true(&|$)
# Set cookie and expiration
RewriteRule ^ - [CO=mredir:0:www.domain.com:60]
# Prevent looping
RewriteCond %{HTTP_HOST} !^m.domain.com$
# Define Mobile agents
RewriteCond %{HTTP_ACCEPT} "text\/vnd\.wap\.wml|application\/vnd\.wap\.xhtml\+xml" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "sony|symbian|nokia|samsung|mobile|windows ce|epoc|opera" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "mini|nitro|j2me|midp-|cldc-|netfront|mot|up\.browser|up\.link|audiovox"[NC,OR]
RewriteCond %{HTTP_USER_AGENT} "blackberry|ericsson,|panasonic|philips|sanyo|sharp|sie-"[NC,OR]
RewriteCond %{HTTP_USER_AGENT} "portalmmm|blazer|avantgo|danger|palm|series60|palmsource|pocketpc"[NC,OR]
RewriteCond %{HTTP_USER_AGENT} "smartphone|rover|ipaq|au-mic,|alcatel|ericy|vodafone\/|wap1\.|wap2\.|iPhone|android"[NC]
# Verify if not already in Mobile site
RewriteCond %{HTTP_HOST} !^m\.
# We need to read and write at the same time to set cookie
RewriteCond %{QUERY_STRING} !(^|&)ViewFullSite=true(&|$)
# Verify that we previously haven't set the cookie
RewriteCond %{HTTP_COOKIE} !^.*mredir=0.*$ [NC]
# Now redirect the users to the Mobile Homepage
RewriteRule ^$ http://m.domain.com [R]
RewriteRule $/internal-page/ http://m.domain.com/internal-page.html [R,L]
At the end, you have two RewriteRule lines which I believe should be changed to:
RewriteRule ^\/?$ http://m.domain.com [R=302,L]
RewriteRule ^\/?(.+?)\/?$ http://m.domain.com/$1.html [R=302,L]
The ^\/?(.+?)\/?$ means give me a string that starts at the beginning (^) and gives me all characters ((.+?)) until the end ($) without the trailing/beginning (/) if there is one (?). The capture is non-greedy (+?) so that a trailing slash does not end up inside $1.
The http://m.domain.com/$1.html means that if the address is http://www.domain.com/internal-page/ then it becomes http://m.domain.com/internal-page.html.
The [R=302,L] should mean a 302 redirect (R=302) and the last rewrite (L), so no other rewrites can occur on our URL.
EDIT:
I believe that with your RewriteRules, the first one redirected to http://m.domain.com when the URL was just the domain. For anything else, the second rule failed because the request was not literally /internal-page/; you needed a captured group (a regex variable) to carry the page name into the new URL.
EDIT (2):
To redirect to each mobile page from a specific desktop page:
RewriteRule ^\/?foo\/?$ http://m.domain.com/bar.html [R=302]
RewriteRule ^\/?hello\/?$ http://m.domain.com/world.html [R=302]
The (/?) means that a / is optional in that position (in a .htaccess file the leading slash is normally not part of the matched path, so it should not be required), and the (^) denotes the beginning and ($) the end in this case (the ^ can also be used in something like [^\.], which means anything except a period).
Just put how ever many of those that you need in a row to do the redirecting and that should do the trick. To make sure there are no misconceptions, the first line would mean that http://www.domain.com/foo/ would become http://m.domain.com/bar.html and because the trailing slash is made optional http://www.domain.com/foo (notice the trailing forward slash is absent) would also redirect to http://m.domain.com/bar.html.
You can play with the syntax a bit to customize it, but hopefully I've pointed you in the right direction. If you need anything else, let me know, I'll do my best to assist.
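Before putting patterns like these into .htaccess, it can help to try them out on sample paths. The sketch below does that in Python (re standing in for Apache's regex engine, an assumption) and also models the page-by-page mapping as a small lookup table. The function and table names are hypothetical; foo/bar and hello/world are the sample names used above.

```python
import re
from typing import Optional

greedy = re.compile(r"^/?(.*)/?$")  # greedy capture
lazy = re.compile(r"^/?(.+?)/?$")   # non-greedy capture

print(greedy.match("internal-page/").group(1))  # internal-page/ (slash is captured)
print(lazy.match("internal-page/").group(1))    # internal-page

# Hypothetical desktop-to-mobile table, one entry per static redirect.
STATIC_REDIRECTS = {"foo": "bar.html", "hello": "world.html"}


def mobile_url(path: str) -> Optional[str]:
    """Return the mobile URL for a desktop path, or None if there is no mapping."""
    target = STATIC_REDIRECTS.get(path.strip("/"))
    return f"http://m.domain.com/{target}" if target else None


print(mobile_url("/foo/"))   # http://m.domain.com/bar.html
print(mobile_url("hello"))   # http://m.domain.com/world.html
print(mobile_url("/other"))  # None
```

The first two lines show that a greedy capture keeps the trailing slash, which would put it in front of .html in the redirect target, while a non-greedy capture leaves it out.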
I don't want to sound like a broken record or anything, but I feel that I could not, in good conscience, end this edit without pointing out that modifying the mobile site would be a much better way to do this. If it is not possible or you feel that a few static redirects are not a big deal versus modifying some pages, then I totally understand, but here are a few things for you to think about:
If the mobile site and desktop site are in separate folders, then the exact same naming scheme can be used for both. That makes the rewrites simpler and means that as new pages/content are added you will not need more rewrite statements. (With more rewrites, you have to create the new pages and then also create the redirects; that's extra work and more files that require your attention.)
If the mobile site is actually hosted from the same directory as the desktop site, then renaming the files for one or the other to something like /desktop-foo/ or /d-foo/ makes it very easy to have the rewrite (redirect) go to something like /m-foo.html. You could also forgo modifying the desktop pages, make /foo/ become /m-foo.html, and have all your mobile versions begin with an 'm'.
The third option that comes to mind is the most difficult and time consuming, depending on the content of the site, but it is a pretty cool one and ultimately would make the site the easiest to work on (after the initial work, of course). It is quite possible to use the same page for desktop, mobile, tablet, etc without the use of mod_rewrite or separate pages. Things like media queries in your CSS would allow you to change the look of the page depending on what the client is viewing it from. I came across a tutorial on the subject earlier which used media queries and the max-width of the screen to determine how the page should look. This would require a good bit of work now, but could save some hassle down the road as well as being an interesting learning experience if you are up to the challenge.
Again, sorry that this veered off topic at the end there, but I got the impression from your original question and your responses that you might find the alternatives interesting if you haven't already considered and dismissed them and that even if the alternatives do not interest you that you aren't going to be like some people and respond with, "Hey, $*%& you, buddy! I asked for Rewrites not all that other garbage!" I hope you take it as nothing more than what it is intended to be...helpful.

CodeIgniter + mod_rewrite URI shortening

I am building a blog-ish site using CI. I am using the HMVC plugin. The module that I am working in is "/journal".
The individual articles are accessed at /journal/article/ID/SLUG. This works fine, but I would like to shorten the URI to /journal/ID/SLUG using mod_rewrite.
Here are my rules:
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^journal/([0-9]+)(.*)$ index.php?/journal/article/$1$2 [L,NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?/$1
For testing I am using /journal/2.
I know that the rule is matching. If, for instance, I change the redirect to http://google.com, I will indeed get redirected to Google. However, when using the rule as written it seems to be ignored and I get a 404 no matter what I put in.
Am I making some obvious (or arcane) error?
Edit: I figured this out shortly after posting the question. My rules are indeed correct but I need to change the following line in config/config.php:
$config['uri_protocol'] = 'AUTO';
to
$config['uri_protocol'] = 'PATH_INFO';
I won't claim to know exactly what that change does or why it fixes the issue. Perhaps someone can follow up with an explanation.
The $config['uri_protocol'] setting tells CI which $_SERVER superglobal entry to use to determine your app's URI. The 'PATH_INFO' option uses $_SERVER['PATH_INFO'], which holds the path information that follows the script name in the request (no host portion); see the PHP manual.
The 'AUTO' option is a CI convenience meant to suit different environments without config tweaks.
Personally, having written a few PHP SEF controllers, I find it better to do all the processing in the PHP controller script(s).
.htaccess files and rewrites can be tricky and harder to debug, and one typo can kill the whole site (ouch). I am sure there are small performance gains, but one would need some pretty heavy demands for them to matter. You are heading to your index.php controller anyway. I find the code is happier when it is all in one place ;-)
Good luck with it... and hopefully I provided some insight into your issue.
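As a rough illustration of what PATH_INFO usually contains, here is a simplified Python sketch. It is not how Apache, PHP or CodeIgniter compute it: the function name and the sample URL are made up, and the script name is assumed to be /index.php.

```python
from urllib.parse import urlsplit


def split_request(url: str, script_name: str = "/index.php") -> dict:
    """Split a URL into the pieces a front controller typically looks at."""
    parts = urlsplit(url)
    # PATH_INFO is whatever follows the script name in the path, if anything.
    if parts.path.startswith(script_name):
        path_info = parts.path[len(script_name):]
    else:
        path_info = ""
    return {
        "SCRIPT_NAME": script_name,
        "PATH_INFO": path_info,
        "QUERY_STRING": parts.query,
    }


info = split_request("http://example.com/index.php/journal/article/2?page=1")
print(info["PATH_INFO"])     # /journal/article/2
print(info["QUERY_STRING"])  # page=1
```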

Resources