Block access for traffic to fake PDF pages - .htaccess

I have a lot of 404 hits to my site to PDF pages that have never existed on the site. These are all spammy-subject.pdf URLs. I get tens of these per day, which is much higher than genuine site traffic.
I'm currently adding 410 rewrites for each.
Can I use an .htaccess rule to totally block this traffic from reaching this site? Before it becomes a 404?

Can I use an .htaccess rule to totally block this traffic from reaching this site?
You can use .htaccess to prevent the request from being routed through a CMS such as WordPress, Joomla, etc. that uses a front-controller pattern - if that's what you mean by "site". However, the request has already reached your server by the time the .htaccess file is processed, so doing anything in .htaccess isn't necessarily going to help a "static site".
If you are already returning a 404 (or 410) - before it reaches your site - then the issue is already resolved.
The only potential issue is if the requests are being routed through your CMS and the 404 is being triggered by your CMS, not Apache. This would suggest the directives are in the wrong place in your .htaccess file (or are missing altogether). Blocking directives like this need to be at the top of your .htaccess file, before any existing rewrites.
For example:
# Prevent 404 request being routed unnecessarily through CMS
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.pdf$ - [NC,R=404]
There's no advantage to serving a 410 Gone instead of a 404 unless these files previously existed and you are trying to remove them from search engines (or telling 3rd parties they no longer exist).
UPDATE:
Should this code be at the very top or after the opening Wordpress rule: RewriteEngine On ?
It needs to be at the very top, before the # BEGIN WordPress comment marker (you should avoid manually editing the code in the WordPress section since WordPress itself maintains this section and your edits will be overwritten).
Yes, this is before the RewriteEngine On directive. You do not need to repeat the RewriteEngine directive. The location of the RewriteEngine directive does not actually matter. If there are multiple instances of this directive in the file then the last instance wins and controls the entire file. (It is a quick way to effectively comment out all the mod_rewrite directives in the file by simply placing a RewriteEngine Off directive at the very end.)
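To make the placement concrete, here is a minimal sketch of how the file might be laid out. The WordPress section shown is the typical default block and is only an assumption about what your file contains; yours may differ.

```apache
# Custom blocking rule: placed above the WordPress section.
# RewriteEngine is switched on inside the WordPress block below,
# so it is not repeated here.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.pdf$ - [NC,R=404]

# BEGIN WordPress
# (assumed default block; maintained by WordPress, do not edit by hand)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

With this layout, a request for a non-existent .pdf file is answered with a 404 by Apache and never reaches the front-controller rule.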

Related

Redirect all pages from old site to my blog homepage in blogger

The old and new sites have different structures. Everything is different.
I don't care about rankings and SEO, but there are some backlinks that point to the old site and I want to redirect them to the new site's homepage. After some months I will delete the whole old site.
I just want code for .htaccess that redirects any old-site URL to my new Blogger homepage. I repeat: to the NEW HOMEPAGE, not to related URLs.
The old site is hosted as a user directory on my company's domain.
http://users.company.com/myOLDname/
the new one is in google blogger
http://myNEWname.blogspot.com/
Note: If you can help, please use the URLs above in the code so it is easier to understand.
Thank you.
You need something like this:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^users\.company\.com$
RewriteRule ^/?myOLDname(?:/|$) https://myNEWname.blogspot.com/ [R=301]
It is a good idea to start out with a 302 temporary redirection and only change that to a 301 permanent redirection later, once you are certain everything is correctly set up. That prevents caching issues while trying things out...
Obviously the rewriting module needs to be loaded in the http server and enabled for the http host. If you use a distributed configuration file, make sure that its interpretation is enabled in the host configuration and that it is located in the host's DOCUMENT_ROOT folder.
This implementation works equally well in the http server's host configuration or in a distributed configuration file (".htaccess" file).
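While testing, the same rule can be written with a temporary redirect, as suggested above. This is only a sketch: it uses the host name from the question, and the added L flag (which stops further rule processing) is my assumption rather than part of the rule shown earlier.

```apache
RewriteEngine on
RewriteCond %{HTTP_HOST} ^users\.company\.com$
# Temporary (302) redirect while testing; change to R=301 once verified
RewriteRule ^/?myOLDname(?:/|$) https://myNEWname.blogspot.com/ [R=302,L]
```

Browsers do not normally cache a 302, so a mistake in the rule is easy to correct before you switch to the permanent form.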

Blocking direct access to an URL (not a file)

A Drupal site is pushing international traffic over quota on my (Plesk 10.4) server, and it looks as though much of that (~250,000 visits/month) is direct access to the URL /user/register. We are already using the botcha module to filter out spambot registrations, but that approach is resulting in two full pages being served to each bot. And while Drupal
I'm thinking that a .htaccess rule which returns a 403 response to that URL unless the referer is from the site might be the way to go, but my .htaccess-fu is not strong, and I can only find examples for blocking hot-linking of images.
What do I need to add and where?
Thanks,
Richard
You'd be checking against the HTTP referer. It's not a guaranteed way to block incoming traffic linked from a site other than yours, since the field can be easily forged. But you can try adding this to the .htaccess file (above any rules that are already there):
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?your-domain\.com/ [NC]
RewriteRule ^user/register - [L,F]

Too many Rewrite Rules in .htaccess

I had to redesign a site last week. The problem is that the old URLs weren't SEO-friendly, so to avoid Google penalizing my site for too many 404 errors I have to create a lot of rewrite rules, because all the content had awful URLs (and that content ranked well in the SERPs).
For example:
RewriteRule ^documents/documents_for_subject/22-ecuaciones-exponenciales-y-logaritmicas http://%{HTTP_HOST}/1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas [R=301,L]
Will this hurt my site's performance? Is there another solution to my situation?
Thanks
They are in the same domain.
Then an internal redirect is much better. A header redirect sends the new URL to the browser and causes it to make a new request; an internal one is handled, as the name says, internally.
This should work:
RewriteRule ^documents/documents_for_subject/22-ecuaciones-exponenciales-y-logaritmicas /1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas [L]
Any performance issues are going to be negligible with this - except maybe if you have many thousands or tens of thousands of individual rules, those may slow down Apache. In that case, if you have access to the central server configuration, put the rules there instead of a .htaccess file, because instructions in the server config get stored in memory and are faster.
A. Yes, using a 301 is the right way to notify search bots about changed URLs, and eventually your old URLs will be removed from search results.
B. You don't need to use %{HTTP_HOST} in your rewrite rule; just write it like this:
RewriteRule ^documents/documents_for_subject/22-ecuaciones-exponenciales-y-logaritmicas /1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas [R=301,L]
C. If you have lots of RewriteRules like above I recommend using RewriteMap or else use some scripting support (like PHP) to redirect from old to new URL with 301.
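As a rough sketch of the RewriteMap approach: the map name legacy and the file path /etc/apache2/legacy-urls.txt are hypothetical, and the directives have to live in the server or virtual-host configuration, because RewriteMap cannot be declared in .htaccess. Note that in that context the RewriteRule pattern sees the leading slash.

```apache
# Server or virtual-host configuration only (RewriteMap is not allowed in .htaccess).
RewriteEngine On

# Hypothetical plain-text map, one "old-slug new-path" pair per line, e.g.:
#   22-ecuaciones-exponenciales-y-logaritmicas /1o-bachillerato/matematicas-cc.ss/aritmetica-y-algebra/ecuaciones-exponenciales-y-logaritmicas
RewriteMap legacy "txt:/etc/apache2/legacy-urls.txt"

# Look up the old slug; redirect only when the map returns a non-empty path.
RewriteCond ${legacy:$1} ^(/.+)$
RewriteRule ^/documents/documents_for_subject/([^/]+)/?$ %1 [R=301,L]
```

A single rule then covers every old URL listed in the map, and adding a redirect means adding one line to the text file rather than another RewriteRule.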

Can you ReWrite a htaccess ReWriteRule?

Just wondering if it's possible to 301 redirect an existing RewriteRule?
For example if I have the following line in my .htaccess file :
RewriteRule ^blue-widgets/ bluewidgets.php
and then I need to change my URL structure, but the URL "blue-widgets/" has a good ranking in the search engines which I don't want to lose, is it possible to add another rewrite rule (301) that redirects that URL to "newdirectory/blue-widgets/"? If so, how is this done? Is it a simple case of adding the new RewriteRule under the existing one?
Does the fact that you have 2 rewrites, slow the page down or have any other problems?
You are confusing two quite different aspects: internal and external rewrites.
301 and 302 are external rewrites and in effect pass the redirect instruction back to the user's browser to do. 301 tells the browser (and the search engines) that the address change is permanent.
Rewrite rules without the [R] flag do an internal redirect -- that is, a remapping inside the Apache / IIS subsystem that is not exposed to the outside world.
Yes, you can have multiple URIs internally redirecting to the same target, but as you've written them, they will not be external and not 301s.
Try
RewriteRule ^blue-widgets/$ /new-directory/blue-widgets/ [L,NC,R=301]
RewriteRule ^new-directory/blue-widgets/$ bluewidgets.php [L,NC]
Does the fact that you have 2 rewrites, slow the page down or have any other problems?
The 301 to send blue-widgets to new-directory/blue-widgets is cached and will only happen once per client, so the performance should be minimally affected.
However, if you can, you should also change the links on your site to point to new-directory/blue-widgets.

.htaccess for 301 redirect: which syntax is best?

I am permanently redirecting my website
http://www.oldsite.com
to
http://newsite.com/blog
Is there a difference between using
Redirect 301 / http://newsite.com/blog/
or
RewriteEngine On
RewriteRule ^(.*)$ http://newsite.com/blog/$1 [R=301,L]
Any reason I should use one over the other?
The first uses Apache's internal redirection engine to direct all requests to / to http://newsite.com/blog with a 301 Moved Permanently response code.
The other loads the Apache rewriting engine and rewrites all of the incoming requests that match ^(.*)$ to http://newsite.com/blog/ (appending the matched part of the request URI to the target URI) with a 301 Moved Permanently response code, like the former.
The difference? Less than it might appear. Redirect matches on a URL-path prefix and appends whatever follows the matched prefix to the target, so with / as the prefix a request for /some-post is sent to http://newsite.com/blog/some-post, the same destination the RewriteRule produces. The first is somewhat faster than the second because it does not involve the rewriting engine, and if it is placed in the server configuration rather than in .htaccess, Apache (depending on the AllowOverride setting) does not have to look up and load .htaccess files.
I believe the performance difference between the two would be imperceptible to a user.
However, assuming that all of the URLs on the old blog site cleanly map to the new site, I would recommend a redirect that preserves the requested path, as the second method does explicitly with $1. (Redirect with a / prefix also appends the rest of the requested path, so both forms shown behave the same way here.)
If you instead send every request to one fixed URL, all links to your old blog posts will end up on the home page of your new site, which is not a great experience for users who may have bookmarked links etc.
If you care about SEO, then it's the same story: all of your page rank will go from your old blog posts to your new site home page.
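For reference, mod_alias's Redirect matches a URL-path prefix and appends the remainder of the requested path to the target, whereas a RedirectMatch whose target contains no backreference sends every request to one fixed URL. A minimal sketch of the two behaviours, using the URLs from the question (only one of them should be active at a time; the /some-post path is just an illustration):

```apache
# Path-preserving: /some-post -> http://newsite.com/blog/some-post
Redirect 301 / http://newsite.com/blog/

# Fixed target: every request -> http://newsite.com/blog/
# RedirectMatch 301 ^/ http://newsite.com/blog/
```

The first form keeps old bookmarks and backlinks pointing at the matching post; the second is only appropriate when the old URLs have no counterpart on the new site.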
