Cyrillic in htaccess to block Analytics spam - .htaccess

I have a lot of spam in Google Analytics coming from different domains. One of them is in Cyrillic, so I'm having trouble adding it to my .htaccess file.
I want to add с.новым.годом.рф to the .htaccess file to block it, but I don't know how, because the Cyrillic characters are not preserved when the file is saved.
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)?с.новым.годом.рф.*$ [NC]
is converted to
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)??.?????.?????.??.*$ [NC]
I have searched for a way to convert the Cyrillic to Unicode, without success.
Any suggestion?
Thanks

HTTP headers can't include arbitrary raw Unicode characters, so the Referer header contains an ASCII URI rather than Cyrillic characters in an IRI.
So you need to use the URI form in the rule for it to match. To convert an IRI to a URI, apply UTF-8 percent-encoding to the path parts and the IDNA (Punycode) algorithm to the hostname.
e.g. using Python 3:
>>> 'с.новым.годом.рф'.encode('idna')
b'xn--q1a.xn--b1aube0e.xn--c1acygb.xn--p1ai'
So:
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?xn--q1a\.xn--b1aube0e\.xn--c1acygb\.xn--p1ai.*$ [NC]
It would still be a good idea to find a text editor for your .htaccess files that doesn't destroy perfectly good Unicode characters though.
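If there are several internationalized spam domains, the conversion can be scripted instead of done by hand. The sketch below assumes Python 3; `referer_block_cond` is a hypothetical helper name, and the rule it prints simply mirrors the RewriteCond above.

```python
def referer_block_cond(hostname: str) -> str:
    """Build a RewriteCond line matching a Referer from `hostname`.

    The Referer header carries the ASCII (Punycode) form of an
    internationalized hostname, so convert it with Python's built-in
    'idna' codec before putting it into the regex.
    """
    ascii_host = hostname.encode('idna').decode('ascii')
    # IDNA output contains only letters, digits, hyphens and dots,
    # so escaping the dots is enough to make them match literally.
    escaped = ascii_host.replace('.', r'\.')
    return (r'RewriteCond %{HTTP_REFERER} ^https?://(www\.)?'
            + escaped + r'.*$ [NC]')


if __name__ == '__main__':
    print(referer_block_cond('с.новым.годом.рф'))
```

Run it once per domain and paste the printed lines into the .htaccess file; since the output is pure ASCII, the editor's encoding no longer matters.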

Related

.htaccess redirect with Chinese characters

I am trying to redirect
新闻/事件/finance-for-sdgs-high-level-meeting-bellagio-financeforsdgs-2/?lang=zh-hans
to
/finance-for-sdgs-high-level-meeting-financeforsdgs-bellagio-25-27-february-2015/?lang=zh-hans
but I am not sure of the encoding. The following does not work:
RewriteRule ^æ°é»/äºä»¶/finance-for-sdgs-high-level-meeting-bellagio-financeforsdgs-2/?lang=zh-hans$ http://ecosequestrust.org/finance-for-sdgs-high-level-meeting-financeforsdgs-bellagio-25-27-february-2015/?lang=zh-hans [R=301,L]
You can try using the \x escape sequence to match the UTF-8 bytes of the characters directly:
RewriteRule ^\xE6\x96\xB0\xE9\x97\xBB\x2F\xE4\xBA\x8B\xE4\xBB\xB6/finance-for-sdgs-high-level-meeting-bellagio-financeforsdgs-2/$ http://ecosequestrust.org/finance-for-sdgs-high-level-meeting-financeforsdgs-bellagio-25-27-february-2015/ [R=301,L]
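This replaces 新闻/事件 with its UTF-8 byte sequence, \xE6\x96\xB0\xE9\x97\xBB\x2F\xE4\xBA\x8B\xE4\xBB\xB6, so the rule no longer depends on the encoding of the .htaccess file. The ?lang=zh-hans part is also dropped from the pattern: RewriteRule matches only the URL-path, never the query string, and because the substitution has no query string of its own, the original one is passed through to the target unchanged.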
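The escapes for other paths can be generated rather than worked out by hand. This works because RewriteRule matches against the %-decoded URL-path, i.e. the raw UTF-8 bytes. A minimal Python 3 sketch; `utf8_escape_pattern` is a hypothetical name, and unlike the rule above it leaves ASCII characters such as / unescaped, using re.escape so regex metacharacters still match literally.

```python
import re


def utf8_escape_pattern(text: str) -> str:
    """Return a regex fragment that matches the UTF-8 bytes of `text`.

    Each non-ASCII character becomes one \\xHH escape per UTF-8 byte;
    ASCII characters go through re.escape so that regex metacharacters
    (such as '.') are matched literally.
    """
    parts = []
    for ch in text:
        if ord(ch) < 128:
            parts.append(re.escape(ch))
        else:
            parts.append(''.join('\\x%02X' % b for b in ch.encode('utf-8')))
    return ''.join(parts)


if __name__ == '__main__':
    print(utf8_escape_pattern('新闻/事件'))
```

For 新闻/事件 this prints \xE6\x96\xB0\xE9\x97\xBB/\xE4\xBA\x8B\xE4\xBB\xB6, the same bytes as the rule above with a literal / in place of \x2F.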

Ignore Query String

I have a website that supplies free wallpapers, and for some reason, when people try to reach it through Google Images, the link breaks.
Example
OK
http://www.hdwallfree.com/wp-content/uploads/2013/07/bugatti_venom_concept_silver_car_wallpaper-1440x900.jpg
BAD
http://www.hdwallfree.com/wp-content/uploads/2013/07/bugatti_venom_concept_silver_car_wallpaper-1440x900.jpg&ei=etRQVL66L4ePPfjqgPAF&bvm=bv.78597519,d.bGQ&psig=AFQjCNFhKbHEllHuv7ebxSATTR9udy2FQA&ust=1414669809124608
Note that Google Images adds query strings that make my site not work properly.
So my question is: how can I make WordPress ignore those query strings?
The full .htaccess: http://pastebin.com/kHNL5DQi
In your main WordPress .htaccess, you can insert this redirect rule just below the RewriteBase line:
RewriteCond %{QUERY_STRING} .+
RewriteRule \.(jpe?g|gif|bmp|png)$ %{REQUEST_URI}? [L,NC,NE,R=301]
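This will strip the unwanted query string from image URLs. Note that it only fires when the URL actually contains a ?; in the BAD URL as written above, the extra parameters start with & instead, so Apache sees them as part of the path and this rule would not match that URL.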
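To check which requests the two lines affect, their logic can be modelled in a few lines of Python. This is only an approximation of mod_rewrite: `redirect_target` is a hypothetical name, `path` is the URL-path and `query` is the part after ?, and the NE and L flags are ignored.

```python
import re

# The RewriteRule pattern from the answer; [NC] makes it case-insensitive.
IMAGE_RE = re.compile(r'\.(jpe?g|gif|bmp|png)$', re.IGNORECASE)


def redirect_target(path: str, query: str):
    r"""Model the two-line rule: return the redirect target, or None.

    RewriteCond %{QUERY_STRING} .+   -> the query string must be non-empty
    RewriteRule \.(jpe?g|...)$ ...   -> the path must end in an image extension
    The substitution %{REQUEST_URI}? keeps the path and, because of the
    trailing '?', drops the query string.
    """
    if query and IMAGE_RE.search(path):
        return path
    return None


if __name__ == '__main__':
    print(redirect_target('/wp-content/uploads/a.jpg', 'ei=abc'))
    print(redirect_target('/wp-content/uploads/a.jpg', ''))
```

Requests for non-image pages, and image requests without a query string, fall through to WordPress as before.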

htaccess RewriteRule with hard-coded special characters

There are a lot of similar questions, but none of them seem to deal with hard-coded strings in the destination. I merely want to redirect any requests for a sub-domain to a Google Group, as follows:
RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_HOST} gg.domain.com$
RewriteRule ^(.*)$ https://groups.google.com/forum/#!forum/myforum
I believe the problem is that the # character is being interpreted as a comment and thus the line is ignored. I have looked through the documentation for rewrite flags, and the only options I saw which might be relevant were [B] and [NE], neither of which seem to help, as I think they only work on string transformations.
The # character in your URL should not be treated as a comment (it isn't for me). If it is, though, you can escape it so that it loses its special meaning; escaping means putting a \ before the character. Besides that, you need the [NE] flag to prevent the # from being converted to its hex-code equivalent, %23.
RewriteCond %{HTTP_HOST} gg.domain.com$
RewriteRule ^(.*)$ https://groups.google.com/forum/\#!forum/myforum [NE]
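The reason [NE] matters is that only a literal # starts the fragment; once it has been escaped to %23 it is just part of the path, so the browser would request a different page. Python's urllib.parse shows the difference (an illustration only; it does not run mod_rewrite):

```python
from urllib.parse import urlsplit

target = 'https://groups.google.com/forum/#!forum/myforum'
escaped = 'https://groups.google.com/forum/%23!forum/myforum'

# With a literal '#', everything after it is the fragment, which the
# browser keeps client-side and does not send to the server.
print(urlsplit(target).path, urlsplit(target).fragment)

# With '%23' there is no fragment: the whole string is part of the path.
print(urlsplit(escaped).path, urlsplit(escaped).fragment)
```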

htaccess read whitespaces "was not found on this server"

I have a little problem. I'm using .htaccess for more user-friendly URLs, but when I add spaces to a URL, I get the following error:
The requested URL /capitole/Limba Engleza was not found on this server.
My htaccess code looks like that:
<ifModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteRule nota.jpg php/img_nota.php [R=301]
RewriteRule ^login login.php
RewriteRule ^recuperare recuperare.php
RewriteRule ^inregistrare inregistrare.php
RewriteRule ^/?([\sa-zA-Z0-9_-]+)(/?([a-z0-9=]+)(/=)?([a-z0-9=]+)?)?$ index.php?page=$1&par1=$3&par2=$5 [NC,L]
</ifModule>
Can someone help me?
A URL cannot contain literal spaces; spaces in URLs are percent-encoded.
In the query string (after the ?), it is common to use + instead (for example: big+ships).
In the path of the URL, %20 is used (for example: big%20ships).
As RFC 2396 explains:
The space character is excluded because significant spaces may
disappear and insignificant spaces may be introduced when URI are
transcribed or typeset or subjected to the treatment of word-
processing programs. Whitespace is also used to delimit URI in many
contexts.
space = US-ASCII coded character 20 hexadecimal
Your second segment doesn't accept spaces. Modify it to read as follows:
RewriteRule ^/?([\sa-zA-Z0-9_-]+)(/?([\sA-Za-z0-9=]+)(/=)?([a-z0-9=]+)?)?$ index.php?page=$1&par1=$3&par2=$5 [NC,L]
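Both patterns can be tried with Python's re module before deploying them. Apache decodes %20 back to a space before RewriteRule sees the path, and in .htaccess context the leading slash is stripped, so the sample input below is the decoded, slash-less path. Python's regex dialect is close enough to PCRE for these patterns, but treat this as a quick sketch, not a substitute for testing on the server.

```python
import re
from urllib.parse import unquote

# Original and corrected patterns from the question and answer ([NC] -> re.I).
ORIGINAL = re.compile(
    r'^/?([\sa-zA-Z0-9_-]+)(/?([a-z0-9=]+)(/=)?([a-z0-9=]+)?)?$', re.I)
FIXED = re.compile(
    r'^/?([\sa-zA-Z0-9_-]+)(/?([\sA-Za-z0-9=]+)(/=)?([a-z0-9=]+)?)?$', re.I)

# The browser sends the space as %20; Apache decodes it before matching.
path = unquote('capitole/Limba%20Engleza')

print(ORIGINAL.match(path))            # prints None: the space is rejected
m = FIXED.match(path)
print(m.group(1), '|', m.group(3))     # values for page and par1
```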

symbols in .htaccess redirection issue

I'm having some issues redirecting pages whose URLs contain "%11" and "%28".
I'm trying to redirect a number of pages. Most of them work, but I realized that those with certain symbols in them are not redirecting.
For example:
Redirect 301 /cars/mercedes%11benz/ http://www.example.com/cars/mercedes-benz/
Redirect 301 /alfa-romeo/alfa-romeo-147-%282001%E2%80%932009%29-2008090174/ http://www.example.com/cars/alfa-romeo-147/
do not work.
Thanks in advance for the help.
Basically, URLs may contain only ASCII characters; anything else must be percent-encoded. You might have spaces in the URL; if so, remove them.
As for mod_rewrite, I believe you've got that right, except that the dot character in the character range need not be escaped; the hyphen is properly located at the beginning and the space is escaped. (The ü probably doesn't need to be escaped, but it shouldn't hurt.) As for browsers making the conversion, that's a "browser thing" which Apache understands (and converts internally to the correct character).
Try this:
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^%?\ ]*\%
RewriteRule ^. http://www.example.com/ [R=301,L]
RewriteRule ^/cars/mercedes-benz/ http://www.test-site.com/cars/mercedes%11benz/ [QSA]
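A likely cause is that mod_alias's Redirect compares its URL-path against the %-decoded request path, so the %11, %28 and %E2%80%93 sequences written into the directives never match what Apache actually compares. Decoding the two paths with Python shows that decoded form (an illustration only):

```python
from urllib.parse import unquote

paths = [
    '/cars/mercedes%11benz/',
    '/alfa-romeo/alfa-romeo-147-%282001%E2%80%932009%29-2008090174/',
]

for p in paths:
    # repr() makes the invisible control character \x11 visible.
    print(repr(unquote(p)))
```

The second path decodes to plain parentheses and an en dash, so writing it that way in the Redirect line (quoted, with the .htaccess saved as UTF-8) may be enough. The \x11 in the first path is a control character that cannot be typed into the file, which is where a regex-based rule using a \x11 escape may help.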
