URLs ending with "%" cause an HTTP error; how can I prevent it with .htaccess?

I have a question about some URLs in my access_log. Some external sites link to me with URLs like http://domain.com/url_name.htm% (yes, with a trailing %).
My server returns an HTTP error for these, so I need to redirect the broken URLs to the correct ones, and I thought of doing it in .htaccess.
I only need to detect a % as the last character of the URL and redirect to the same URL without it:
http://domain.com/url_name.htm% --> http://domain.com/url_name.htm
How can I do this? I tried adapting some examples that handle the ? symbol, but had no luck.
Thanks!

I found the problem.
It seems that some malformed URLs never reach the vhost, so these requests never read the .htaccess.
The only way to handle this is to add an ErrorDocument 400 directive in httpd.conf. That is not ideal for servers with several vhosts, because all of them will get the same behaviour, but I think it is the only way in this case.
Quotation from the Apache documentation:
Although most error messages can be overridden, there are certain circumstances where the internal messages are used regardless of the setting of ErrorDocument. In particular, if a malformed request is detected, normal request processing will be immediately halted and the internal error message returned. This is necessary to guard against security problems caused by bad requests.
Thanks anyway!
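To illustrate why a trailing % counts as a malformed request: a % in a URL path is only valid when it starts a two-digit hex escape such as %20. The Python sketch below is not something Apache runs. It is a hypothetical helper (the function names are made up for this example) that a log-analysis script or a front-end proxy could use to spot such URLs and compute the cleaned target.

```python
import re

# A "%" is only valid in a URL path when followed by two hex digits.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_malformed_escape(path: str) -> bool:
    """Return True if the path contains a '%' not followed by two hex digits."""
    return _BAD_ESCAPE.search(path) is not None


def strip_trailing_percent(path: str) -> str:
    """Drop stray '%' characters at the very end of a malformed path.

    Only the trailing case from the question is handled; a bad escape in
    the middle of the path is left untouched.
    """
    return path.rstrip("%") if has_malformed_escape(path) else path
```

For the URL from the question, strip_trailing_percent("/url_name.htm%") gives "/url_name.htm", while a path with a valid escape such as "/100%25" is left alone.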

This page is very helpful for .htaccess rewrite rules:
http://www.helicontech.com/isapi_rewrite/doc/RewriteRule.htm
I have also seen a few solutions to this that use a small PHP script. For example, this one replaces | with #:
.htaccess
RewriteRule ^old\.php$ redirect.php?url=http://example.com/new.php|hash [R=301,QSA,L]
redirect.php
<?php
$new_url = str_replace("|", "#", $_GET['url']);
header("Location: " . $new_url, true, 301); // the status code is the third argument; the second is $replace
die;
?>

Related

Use of unbase64 in httpd 2.4

I need to set a variable in an HTTP header after authentication in httpd.conf, but the value is base64-encoded. I need to decode it before setting it in the header, so I am trying to use the unbase64 function of Apache httpd.
I have tried the following, but it does not work:
RewriteRule .* - [E=NEW_VAL:%{unbase64:%{AUTHORIZE_VAL}}]
Header set user.sid "%{NEW_VAL}e"
Can someone help me understand how to use this, or is there another way to decode the value?
Hope this helps someone: I wrote a script that does the base64 decoding and configured a RewriteMap for that script. I then used that map for the conversion in the RewriteRule.
Reference : this
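The answer does not show its script, so the following is only a sketch of what such a map program could look like in Python. It follows the external-program (prg:) map protocol from the mod_rewrite documentation: one lookup key per line on stdin, one result line on stdout, and the literal string NULL when the lookup fails. The function names are made up for this example.

```python
import base64
import io


def decode_key(key: str) -> str:
    """Base64-decode one lookup key; return "NULL" when it cannot be decoded."""
    try:
        value = base64.b64decode(key.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers invalid base64 (binascii.Error) and non-UTF-8 payloads.
        return "NULL"
    # A value containing a line break would corrupt the line-based protocol.
    return "NULL" if "\n" in value or "\r" in value else value


def serve(inp, out) -> None:
    """Answer one lookup per input line, flushing after every reply."""
    for line in inp:
        out.write(decode_key(line) + "\n")
        out.flush()  # the reply must not sit in a buffer while Apache waits


# In the real script the last line would be:
#   import sys; serve(sys.stdin, sys.stdout)
demo = io.StringIO()
serve(io.StringIO("dXNlcjEyMw==\n"), demo)
```

The script would then be registered in the server configuration with something like RewriteMap unb64 "prg:/path/to/unb64.py" (map name and path are placeholders) and referenced as ${unb64:...} in the RewriteRule.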

How to make apache treat query string as file name?

I mirrored a site to local server with wget and the file names locally look like this:
comments
comments?id=123
Locally these are static files that show unique content.
But when I access the second file in a browser, Apache keeps serving the content of the file comments and treats ?id=123 as a query string, so it never shows the content of the file comments?id=123.
It loads the correct file if I manually encode the ? as %3F in the address bar and type:
comments%3Fid=123
Is there a way to fix this? Maybe make Apache stop treating ? as the query separator and treat it as a file name character? Or add a URL rewrite that changes ? into %3F?
Edit: Indeed, a ? in file names and requests causes too many problems. I ended up using the wget option --restrict-file-names=windows, which converts ? into an @ when saving the file name.
The short answer is "don't do that."
The longer answer is that ? is a reserved character in URLs, using it as part of a filename is going to cause problems forever, and the recommended solution is to pick a different character to use in those filenames. There are many to choose from: just avoid ?, & and # and you'll probably be fine.
If you insist on keeping the file name (or if you don't have an option) try:
RewriteCond %{QUERY_STRING} (.*)
RewriteRule (.*) $1%%3F%1 [NE]
However, this is going to fire any time you have a query string, which is likely not what you want.
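To make the intended mapping concrete, here is a small Python sketch of what the rewrite is trying to achieve: the query string is folded back into the path, with the ? percent-encoded so that it is treated as a file name character. The function name is hypothetical and this is only a model of the URL transformation, not of mod_rewrite itself.

```python
from urllib.parse import quote


def mirrored_file_url(path: str, query: str = "") -> str:
    """Fold the query string into the path so "?" becomes a literal %3F."""
    name = f"{path}?{query}" if query else path
    # Keep "/", "=" and "&" readable; "?" is not in the safe set, so it is escaped.
    return quote(name, safe="/=&")
```

With the files from the question, mirrored_file_url("comments", "id=123") gives "comments%3Fid=123", which is the form that loaded the right file when typed by hand.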

Config for finding a URL in HTML content

Can anybody help me configure Sphinx to best match a URL (or part of a URL) in HTML content?
My config:
index base_index
{
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 3
    charset_type = utf-8
    charset_table = 0..9, A..Z->a..z, a..z
    enable_star = 1
    blend_chars = _, -, #, /, .
    html_strip = 0
}
I use SphinxAPI on the backend (PHP) with SPH_MATCH_EXTENDED mode.
I don't understand how the search works. If I search for "domain.com" I get 37 results; for "www.domain.com" I get 643. Why? "domain.com" is a substring of "www.domain.com", so in theory the first query should return more results.
FreeBSD 9.2. Sphinx 2.1.2
16 distributed indexes (147Gb)
This is a bit late, but here are my thoughts anyway.
It looks like when you search www.domain.com, Sphinx is actually looking for www, domain and com respectively. If you're searching for just domain.com, it's just looking for domain and com. This is probably the reason why www.domain.com returns more results, because www appears more frequently throughout the index.
Since you're searching URLs, I would set up stopwords depending on how you want to search. For me, I would make www, com, org and basically all top-level domains stopwords. You might want to leave the top-level domains and just make www a stopword. This would allow you to weight com higher than net in a search result.
If you set up your stopwords right, when someone searches domain.com Sphinx actually just looks for hits of domain in the index, whether it be domain.com or domain.org or domain.net.
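The effect of the stopwords can be sketched in a few lines of Python. This is a very simplified model and an assumption on my part: it only splits on characters outside the charset_table (0-9, a-z), applies min_word_len, and drops stopwords. It ignores blend_chars, wildcards and everything else Sphinx does, and the stopword list is just an example.

```python
import re

# Example stopword list (assumed, adjust to how you want to search).
STOPWORDS = frozenset({"www", "com", "org", "net"})


def keywords(text: str, stopwords=frozenset(), min_word_len: int = 3) -> list:
    """Split text into lowercase alphanumeric tokens and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if len(t) >= min_word_len and t not in stopwords]
```

Without stopwords, keywords("www.domain.com") yields www, domain and com as separate keywords. With the stopword list, both "www.domain.com" and "domain.org" reduce to the single keyword domain.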

.htaccess condition that works on many conditions inside

I want to do something like an if block in .htaccess:
I want to redirect each ?sp=SOMETHING to a different ?p=NNN (some number).
I have about 100 ?sp= pages,
and I don't want all 100 rules to be evaluated on every page load.
If there is another way to solve this, I would be happy to hear it.
if(RewriteCond %{HTTP_HOST ^?sp=}{
RewriteRule ^?sp=bar ?p=5
RewriteRule ^?sp=foo ?p=9
RewriteRule ^?sp=tin ?p=15
}
There is no logical relation between the ?sp= and ?p= values.
Update: I don't have access to the server config.
This can be done with the RewriteMap directive (iff you have access to the server configuration, as pointed out in a comment. No idea why they thought that needed to be restricted...). Note that a RewriteRule pattern never sees the query string, so the sp value has to be captured with a RewriteCond. For example:
RewriteMap sp_to_s txt:/path/to/map.txt
RewriteCond %{QUERY_STRING} (?:^|&)sp=([^&]+)
RewriteRule ^ %{REQUEST_URI}?p=${sp_to_s:%1|0} [L]
(the 0 is the default value if none of the pairs in the map match).
Here's a sample map.txt:
bar 5
foo 9
tin 15
There are more ways to use the map feature; see the documentation for mod_rewrite for details.
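To make the lookup semantics concrete, here is a Python sketch of what the map does: map.txt is a plain key/value table, and ${sp_to_s:key|0} falls back to 0 for unknown keys. The function names are invented for the illustration; Apache does all of this internally.

```python
from urllib.parse import parse_qs

MAP_TXT = """\
bar 5
foo 9
tin 15
"""


def parse_map(text: str) -> dict:
    """Parse "key value" lines; anything after "#" is treated as a comment."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def rewrite_query(query: str, mapping: dict, default: str = "0") -> str:
    """Replace ?sp=NAME with ?p=NUMBER, using the default for unknown names."""
    values = parse_qs(query).get("sp")
    if not values:
        return query  # no sp parameter: leave the request alone
    return "p=" + mapping.get(values[0], default)
```

With the sample map, rewrite_query("sp=foo", parse_map(MAP_TXT)) gives "p=9", and an unknown name such as sp=zzz ends up as "p=0". The point of the map is that it is a single lookup rather than 100 rules checked in sequence.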

Nginx with Clean Urls, Get Parameters, and PHP-FPM

I was curious to know if it was possible to somehow configure nginx so that it will parse url arguments without specifying .php at the end of the filename before the arguments are sent.
For example, let's say I have an account module which processes which aspect of the account to load, based on the argument sent. If the argument is ?sk='login', it will load the login module. If it is ?sk='register', it will load the registration module - and so on and so forth.
The problem is that when I type http://host/account?sk='login', I do not get anything via $_GET when I try to print the values within the array. The thing is that the clean URL does load the file which is supposed to manage the $_GET arguments, except it will not actually process $_GET unless I specify .php at the end of the filename.
I'm guessing there's an nginx or php-fpm configuration somehow which allows this.
Is this possible?
I figured it out:
The idea, with nginx, is to define a location, specify that location's root, and then do a try_files while specifying the $uri variable and the .php file to execute, with the arguments.
For example:
location /somelocation {
    root /path/to/somelocation/on/server;
    try_files $uri /somelocation/somefile.php?key=$args;
}
Rather than typing http://host/somelocation?key='someargument', you would request /somelocation?someargument.
From there on, when you read $_GET['key'] in your PHP, it returns someargument. The parameter name (key here) can be whatever you specify in try_files.
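The flow can be modelled in Python to show where the key comes from. This is only a sketch of the try_files decision as described above, not of nginx itself: existing_files stands in for the files found under root, and both function names are hypothetical.

```python
from urllib.parse import parse_qs


def try_files(uri: str, args: str, existing_files: set) -> tuple:
    """Return (uri, query) as nginx would pass it on for this location."""
    if uri in existing_files:
        return uri, args  # a real file exists: serve it unchanged
    # Fallback: the whole original query string becomes the value of "key".
    return "/somelocation/somefile.php", "key=" + args


def php_get(query: str) -> dict:
    """Rough stand-in for PHP's $_GET (last value wins for repeated names)."""
    return {name: values[-1] for name, values in parse_qs(query).items()}
```

For a request to /somelocation?someargument with no matching file, try_files returns the PHP script with the query key=someargument, so php_get on that query contains someargument under key.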
