Htaccess to block access to specific Mediawiki page? - .htaccess

In Mediawiki, I'm trying to find a way to block access to some of our template pages. I don't want some of our competition viewing our complex code and stealing it for their wikis (which is common in the fandom I'm from unfortunately). So I was trying to use htaccess to accomplish this by redirecting people to the main wiki page when they try to view a specific template page. However, nothing is happening. Here's what I used:
Redirect /wiki/index.php?title=Template:Box /wiki/index.php
I'm not sure what I'm trying to do is possible, though, or if this is how htaccess is supposed to be used!
Thank you in advance!

In short: don't do that!
Let me quote the relevant part of MediaWiki docs: MediaWiki is not designed to be a CMS, or to protect sensitive data. To the contrary, it was designed to be as open as possible. Thus it does not inherently support full featured, air-tight protection of private content.
MediaWiki cannot guarantee partial read permissions: either people are able to see every page, or none of them. Otherwise, there will be loopholes to read your precious data. For example, TerryE's trick with rewrite rules adds absolutely no security: among a hundred other ways, one can simply change Template:Box into Template_:_Box and the latter will be normalised internally into the former. MW sometimes HTTP-redirects to normalised titles, but that is very easy to overcome.

There are lots of ways of getting template content in MW, and MW has its own access control extensions, so I think that you are trying to cure a leaking sieve, but answering your Q directly:
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} \bTemplate:Box\b
RewriteRule wiki/index.php $0? [L]
This will remove the query parameters if the URI is for /wiki/index.php and the query string contains Template:Box.
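As a rough illustration of how narrow that condition is, here is a small Python sketch that applies the same pattern to two query strings. Python's re module is only a stand-in for Apache's regex engine, and the function name rule_fires is made up for this example; it assumes the condition sees the query string exactly as the client sent it.

```python
import re

# Same pattern as the RewriteCond above; Python's re stands in for Apache's regex engine.
CONDITION = re.compile(r"\bTemplate:Box\b")

def rule_fires(query_string):
    """Return True if the RewriteCond pattern would match this raw query string."""
    return CONDITION.search(query_string) is not None

# The exact spelling is caught by the condition.
print(rule_fires("title=Template:Box"))    # True
# The alternative spelling mentioned in the other answer is not.
print(rule_fires("title=Template_:_Box"))  # False
```

So the rule only covers the one spelling it was written for, which is why it should not be treated as access control.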

Related

.htaccess redirect to subfolder, and remove its name

I'm kind of a noob in the world of web, so my apologies... I tried many things found on SO and elsewhere, but I didn't manage to do what I want. And the Apache documentation is... well, a bit too complete.
Basically what I want to do is redirect my domain to a subfolder. I found easy solutions for this (many different actually).
http://www.foo.com/
http://foo.com/
should redirect to /bar and appear as http://foo.com/
Using the following I got the expected result :
RewriteEngine on
Options +FollowSymLinks
RewriteCond %{HTTP_HOST} ^www\.foo.com$
RewriteRule ^/?$ "http\:\/\/foo.com" [R=301,L]
RewriteRule ^((?!bar/).*)$ bar/$1 [NC,L]
But I also want the subfolder as well as filenames not to appear when explicitly entered, i.e :
http://www.foo.com/index.html
http://foo.com/index.html
http://www.foo.com/bar
http://foo.com/bar
http://www.foo.com/bar/index.html
http://foo.com/bar/index.html
Should all appear as
http://foo.com/
Is this possible ?
Obviously using .htaccess, since I'm on a virtual host.
Thanks
As Felipe says, it's not really possible, because you lose information when you do that R=301 redirect: a hard redirect like this starts a whole new request, with no memory of the previous request.
Of course, there are ways to do similar things. The easiest is to put the original request in the query string (here's a good rundown on how mod_rewrite works with query strings). Sure, the query string does show up in the URL, but some browsers hide part of the URL in the address bar, so if your goal is aesthetics, then this method might be workable.
If you really don't want to show any of the original query in the URL, you might use cookies by employing the CO flag (here are some very good examples about cookie manipulation). At any rate, the information about the original request must somehow be passed in the hard redirect.
But anyhow, and most importantly, why would you want to do something like this? It's bound to confuse humans and robots alike. A great many pages behaved like this back when frames were fashionable, and it was pretty terrible (no bookmarking, no easy linking to content, Google results with the snippet "your browser cannot handle frames", no reloading, erratic back button, oh boy, those were the days).
Speaking of which, if your content is html, you may just use a plain old iframe to achieve the effect (but I'd sincerely advise against it).

Create search engine friendly urls for our blog

We run a blog, and really need to tidy up the URLs using htaccess, but I am really stumped.
Example:
Working on a site, and I need to generate search engine friendly URLs
So I have the url currently as:
http://mywebsite.com/blog/read.php?art_id=11
Title of this page is:
Why do Australians pay so much for Cars ?
I need to change it to its corresponding SEF url. like so:
http://mywebsite.com/blog/Why-do-Australians-pay-so-much-for-Cars-?
The question mark is part of the title, and we could remove these if it's an issue. Any suggestions please?
Also would prefer to drop the read.php portion. Need to create a rule that works across our entire blog.
They all follow the same pattern, only the art_id number changes.
(Assuming that you're using apache as a webserver)
Take a look at this answer for a very similar question: https://stackoverflow.com/a/8030760/851273
The problem here is that .htaccess and mod_rewrite don't know how to map page names to art_ids, so there are two ways you can try to do this.
You can add some functionality to your read.php so that it can do a similar lookup, but instead of art_id it uses art_title or something. Essentially you'll have to do the backend lookup in a database (or wherever your articles are stored) and use the title as a key instead of the ID. This is a little messy since it's possible to have weird characters in titles, such as non-ASCII or reserved characters (like ? for instance), so you'll need to create a title encoder and decoder for when you pull titles out of the database or use titles to look up an article in your database.
If you have access to the server config or vhost config, you may be able to set up a RewriteMap using an outside program (the prg type) and create a php script that does the title-to-ID lookup for you. Then you can create rewrite rules in your .htaccess that do something along the lines of:
RewriteRule ^blog/(.*)$ /blog/read.php?art_id=${title-to-id:$1} [L]
Here you are extracting the article title from your pretty URL and feeding it through a rewrite map called title-to-id to get the art_id. Again, you'll need to set up a title encoder/decoder so that non-ASCII and reserved characters in your titles are dealt with.
Another thing that you can do is to stick an article ID in your pretty URLs so they look like this: http://mywebsite.com/blog/11-Why-do-Australians-pay-so-much-for-Cars. It is still pretty easy to see what the link is about, it's SEO friendly, and it bypasses the need to do title-to-ID lookups. The rewrite rules would also be simpler:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# add whatever other special conditions you need here
RewriteRule ^blog/([0-9]+)-(.*)$ /blog/read.php?art_id=$1 [L]
And that's it. Of course, you'd now have to generate all of your blog URLs in the form http://(host)/blog/(art_id)-(art_title), and you'd also have to remove special characters from the title, but you don't have to worry about writing additional code to translate titles back to IDs.
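Generating those URLs could look roughly like the sketch below. It is written in Python purely for illustration (the blog itself is PHP, and the same logic ports directly); the names slugify, make_blog_path and extract_art_id are made up for this example, and replacing every run of non-alphanumeric ASCII characters with a hyphen is just one possible way to "remove special characters".

```python
import re

def slugify(title):
    """Collapse every run of characters other than ASCII letters/digits into one hyphen."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title)
    # Drop hyphens left over at either end (e.g. from a trailing " ?").
    return slug.strip("-")

def make_blog_path(art_id, title):
    """Build a path of the form /blog/(art_id)-(art_title)."""
    return "/blog/{}-{}".format(art_id, slugify(title))

# Mirrors the pattern used in the RewriteRule above.
BLOG_PATTERN = re.compile(r"^blog/([0-9]+)-(.*)$")

def extract_art_id(path):
    """Return the article ID the rewrite rule would capture, or None if no match."""
    match = BLOG_PATTERN.match(path.lstrip("/"))
    return int(match.group(1)) if match else None

path = make_blog_path(11, "Why do Australians pay so much for Cars ?")
print(path)                  # /blog/11-Why-do-Australians-pay-so-much-for-Cars
print(extract_art_id(path))  # 11
```

Because only the leading number is used for the lookup, the title part can be cleaned up as aggressively as you like without breaking the link.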

How to emulate subdomain/language in .htaccess file

I have been looking at some way to implement the language subdomain behavior of the enterprise version of GTranslate and have come close.
Basically, if you want subdomains like fr.domain.com, es.domain.com, de.domain.com, etc
for the PRO version (general implementation), then there should be a way to handle this with modifications in the .htaccess file.
This isn't quite right, but it's coming close - for example, if you wanted "de.domain.com" for German, there might be a modification like:
RewriteCond %{THE_REQUEST} !^[A-Z]+\ /gtranslate/
RewriteRule ^(.*)$ http://de.domain.com/gtranslate/translate.php?lang=de&url=$1 [NC,L,R=301]
The Condition is to prevent recursion... This seems to come close, but the problem is that the URL in the browser now shows de.domain.com/gtranslate/translate.php?lang=de&url=about.html instead of the desired de.domain.com/about.html
There must be a way to handle this so that we can emulate this behavior.
Motivation: Perhaps I should have stated this up top.
I'm using the general version with Joomla. It seems to work most of the time, BUT it loses the language during a form submission because the redirection (POST?) of the form loses the language code (normally a 2-letter prefix to the URL).
If I was able to keep a language per subdomain, then it seems like I would be able to properly keep the language during a form submission (as well as the potential "niceness" of having a subdomain to reflect the language - could even be the name of the country).
Anyone who is better in mod_rewrite want to take a stab at this?
"There must be a way to handle this so that we can emulate this behavior."
No. As soon as you issue a 301, it is a full redirect, so the URL in the browser changes, and unfortunately you can't avoid this.

How to rename /page.php?1 to /welcome.html in htaccess?

I have a CMS that does not generate friendly URLs.
What is the best way to rename this without getting duplicate content in Google?
Now I have in .htaccess:
RewriteEngine On
RewriteBase /
RewriteRule welcome.html page.php?1 [L]
RewriteRule about-us.html page.php?2 [L]
Is this the best way to do?
Any help would be appreciated
Google has no problem spidering and indexing this very simple dynamic URL scheme. But if you want extra onpage-optimization-bonus-points with the help of keyword-stuffed-URLs, it would be best to switch to a CMS that creates them automatically. You save time by not having to maintain the link scheme manually both in your content and in the rule file.
If you don't, there's always the chance that you forget to replace those dynamic links with your readable ones when you create new content. Also, your CMS will always answer both variants, the friendly one and the dynamic one, so you have to tell Google the "canonical" URL (Explanation here) to avoid duplicate content. Duplicates can arise because you can't control how people link to content on your site.

Getting "mywebsite.org/" to resolve to "mywebsite.org/index.php"

At my work we have various web pages that, my boss feels, are being ranked lower than they should be because "mywebsite.org/category/" looks like a different URL to search engines than "mywebsite.org/category/index.php" does, even though they show the same file. I don't think it works this way but he's convinced. Maybe I'm wrong though. I have two questions:
How do I make it so that it will say "index.php" in the address bar of all subcategories?
Is this really how pagerank works?
Besides changing all the links everywhere, a simpler solution is to use a rewrite rule. Make sure it is a permanent redirect, or Google will keep using the old link (without index.php). How you do this exactly depends on your web server, but for Apache HTTPd it looks something like the example given below.
Yes. Or so I've heard. Very few people know for sure. But Google mentions this guideline (as "Be consistent"). Make sure to check out all of Google's Webmaster guidelines.
Apache config for rewrite rule:
# in the generic config
LoadModule rewrite_module modules/mod_rewrite.so
# in your virtual host
RewriteEngine On
# redirect everything that ends in a slash to the same, but with index.php added
RewriteRule ^(.*)/$ $1/index.php [R=301,L]
# or the other way around, as suggested
# RewriteRule ^(.*)/index.php$ $1/ [R=301,L]
Adding this code to the top of every page should also work:
<?php
if (substr($_SERVER['REQUEST_URI'], -1) == '/') {
$new_request_uri = $_SERVER['REQUEST_URI'].'index.php';
header('HTTP/1.1 301 Moved Permanently');
header('Location: '.$new_request_uri);
exit;
}
?>
You don't tell us if you're using straight PHP or some other framework, but for PHP, probably you just need to change all the links on your site to "mywebsite.org/category/index.php".
I think it's possible that this does affect your search engine rank. However, you would be better off using only "mywebsite.org/category" rather than adding "index.php" to each one.
Bottom line is that you need to make sure all your links in your website use one or the other. What actually gets shown in the address bar is unimportant.
A simple solution is to put in the <head> tag:
<link rel="canonical" href="http://mywebsite.org/category/" />
Then, no matter which page the search engine ends up on, it will know it is simply a different view of /category/
And for your second question--yes, it can affect your results, if Google thinks you are spamming. If it wasn't, they wouldn't have added support for rel="canonical". Although I wouldn't be surprised if they treat somedir/index.* the same as somedir/
I'm not sure if /category/ and /category/index.php are considered two URLs for SEO, but there is a good chance that it will affect them, one way or another. There is nothing wrong with making a quick change just to be sure.
A few thoughts:
URLs
Rather than adding /index.php, you will be better off making it so there is no index.php on any of them, since the keyword 'index' is probably not what you want.
You can make a script that will check if the URL of the current page ends in index.php and remove it, then forward to the resulting URL.
For example, on one of my sites, I require the 'www.' for my domain (www.domain.com and domain.com are considered two URLs for search purposes, though not always), so I have a script that checks each page and, if there is no www., adds it and forwards.
if (APPLICATION_LIVE) {
if ( (strtolower($_SERVER["HTTP_HOST"]) != "www.domain.com") ) {
header("HTTP/1.1 301 Moved Permanently"); // Recognized by search engines and may count the link toward the correct URL...
// Location needs an absolute URL; REQUEST_URI already starts with "/"
header("Location: http://www.domain.com" . $_SERVER["REQUEST_URI"]);
exit();
}
}
You could modify that to do what you need.
That way, if a crawler visits the wrong URL, it will be notified that it was replaced with the correct URL. If a person visits the wrong URL, they will be forwarded to the correct URL (most won't notice), and then if they copy the url from the browser to send someone or link to that page, they will end up linking to the correct url for that page.
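The index.php check described earlier in this item could follow the same shape. Below is a minimal sketch of just the decision logic, written in Python for illustration rather than in PHP; the function name strip_index_php is made up for this example, and it assumes the request URI is a path with an optional query string, as in $_SERVER["REQUEST_URI"].

```python
def strip_index_php(request_uri):
    """Return the URI to redirect to, or None if no redirect is needed.

    Only a path ending in /index.php is rewritten; any query string is kept.
    """
    path, sep, query = request_uri.partition("?")
    if not path.endswith("/index.php"):
        return None
    # Cut off the file name but keep the trailing slash of the directory.
    return path[: -len("index.php")] + sep + query

print(strip_index_php("/category/index.php"))      # /category/
print(strip_index_php("/category/index.php?p=2"))  # /category/?p=2
print(strip_index_php("/category/"))               # None
```

When the function returns a target, the page would send the 301 header and the Location header for it, exactly as in the www example.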
LINKING URLS
The way other pages link to your pages is more important for SEO. Make sure all your in-site links use the proper URL (without /index.php), and that if you have a 'link to this page' feature, it doesn't include the /index.php part. You can't control how everyone links to you, but you can take some control over it, like with the script in item 1.
URL ROUTING
You may also want to consider using some sort of framework or stand-alone URL rerouting scheme. It could make it so there were more keywords, etc.
See here for an example: http://docs.kohanaphp.com/general/routing
I agree with everyone who's saying to ditch the index.php. Please don't force your visitor to type index.php if not typing it could get them the same result.
You didn't say if you're on an IIS or Apache server.
IIS can be set to assume index.php is the default page so that http://mywebsite.org/ will resolve correctly without including index.php.
I would say that if you want to include the default page and force your users to type the page name in the url, make the page name meaningful to a search engine and to your visitors.
Example:
http://mywebsite.org/teaching-web-scripting.php
is far more descriptive and beneficial for SEO rankings than just
http://mywebsite.org/index.php
Might want to take a look at robots.txt files? Not quite the best solution, but you should be able to implement something workable with them...
