I have a Drupal website that serves customized 404 pages (it searches for alternative content similar to what was not found and displays it on the 404 page). Generating these pages is of course expensive in terms of resources, so I'm currently wasting a lot of resources serving customized 404 pages to robots.
How can I set up my .htaccess so that a very light 404 page is served only to robots, without bootstrapping Drupal at all (similar to this example: https://tedserbinski.com/drupal/preventing-drupal-from-handling-404s-for-performance/)?
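Something along these lines is what I have in mind. A minimal sketch, assuming robots can be identified by user agent rather than IP (the bot names below are common examples, not a complete list) and that a static light-404.html exists in the web root:

ErrorDocument 404 /light-404.html
RewriteEngine On
# Request comes from a known crawler (assumption: matching by user agent)
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|Slurp) [NC]
# ...and does not map to a real file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Answer with a 404 here (served by the static page) instead of invoking Drupal
RewriteRule ^ - [R=404,L]

Placed above Drupal's own rewrite block, rules like these would answer crawlers before index.php is ever invoked.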
Thank you for any help.
Related
I have a mass page site with many URLs indexed in Google. I'd like to run a test for a couple of weeks where I redirect all the traffic to another URL.
2 Questions:
Which .htaccess code would you use to do this redirect? A 302 redirect?
Would redirecting the traffic still cause high disk usage on my servers?
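For reference on question 1, a blanket temporary redirect can be a single rule. A minimal sketch, where http://example.com/test-page is a placeholder for the target URL:

RewriteEngine On
# 302 = temporary redirect, so search engines should keep the original URLs indexed
RewriteRule ^ http://example.com/test-page [R=302,L]

Since Apache answers the redirect before any application code runs, the redirect itself should add little disk load, though that depends on where your current usage actually comes from.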
Our site was hacked and links to random content were added to it. We completely removed the hacked site and put a new one in its place. Everything is new, including images and content; no other part of the old site was used.
The problem we have now is that the hacker submitted hundreds of thousands of links to the search bots, and the server is visited every second by bots trying to index links that don't exist and never existed on either the old or the new site.
We have tried to combat this in the site's .htaccess file with several rewrite conditions and rules that tell the bots the content is gone.
Example
# Return "410 Gone" for any URL containing /product/
RewriteCond %{REQUEST_URI} /product/
RewriteRule ^ - [R=410,L]
The trouble with this is that some requests get through and produce 301 and 404 responses instead.
This causes the bots to retry the requests and report our site as having hundreds of thousands of bad links.
I am looking for a solution that returns a 410 code to the bots for all requests except those for resources that are actually part of the site.
The site only has approximately 10 pages, but it is a Joomla CMS, so a mass of resources is loaded in the background to deliver each page.
My idea was to visit each page of the site and use the browser's inspector to gather a list of all the resource requests each page makes.
The question is how to formulate this into conditions and rules for the .htaccess so that all real page requests, including the root /, are delivered, but the hacker's links requested by the bots are not.
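One way to sketch the whitelist approach, assuming assets exist as real files on disk and using placeholder page aliases (about, contact, services) that would need to be replaced with the site's real ones:

RewriteEngine On
# Anything that maps to a real file or directory (CSS, JS, images) passes through
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# The root and the handful of real page aliases also pass through (placeholders)
RewriteCond %{REQUEST_URI} !^/$
RewriteCond %{REQUEST_URI} !^/(about|contact|services)/?$
# Everything else is gone: answer 410 without involving Joomla
RewriteRule ^ - [G]

These rules would need to sit above Joomla's own SEF rewrite block so they run first.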
We are also working on emailing the bots' operators to say that their requests are being instigated by the hacker.
At LTCperformance.com, I've created a custom 404 page. If the user types in ltcperformance.com/fakepage.html, it forwards to the 404 page. But if there's no extension (ltcperformance.com/fakepage), it simply shows a default system 404 page.
I'm controlling the 404 page using .htaccess:
ErrorDocument 404 http://ltcperformance.com/404.php
ErrorDocument 403 http://ltcperformance.com/404.php
ErrorDocument 500 http://ltcperformance.com/404.php
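One thing worth noting: when ErrorDocument is given a full URL like the above, Apache responds with a 302 redirect to that URL rather than the real 404 status. The local-path form preserves the status code:

ErrorDocument 404 /404.php
ErrorDocument 403 /404.php
ErrorDocument 500 /404.php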
I have URL Rewriting in Joomla Administrator = on
Also, in Joomla Administrator, the Adds Suffix to URL = off
Any ideas? I've gone through every answer I can find in other posts, and nothing will bring up my custom 404 page if there isn't an extension on the file.
ADDITIONAL INFO:
Any non-existent page goes to the homepage when I use these settings:
- Search Engine Friendly URLs / NO
- Use URL rewriting /Yes
- Adds Suffix to URL /No
I have someone taking a look at it on the server side, but I don't know what the server issue is; everybody online says it's a server issue, but support can't pinpoint what the actual issue is. It's GoDaddy; I did set their 404 page setting (they have a separate place to put it) to my 404 page, but that didn't work either.
Joomla's .htaccess routes all requests to index.php in order to support SEF URLs.
If it can't route a page, it will load the template's error.php page. You can edit that to your requirements; this will be the easiest option.
Should error.php not be included in your template, copy the one in /templates/system to your template folder and customize it.
OK, sorry if this has been asked before, but I could not find a working solution. I am using WordPress multisite. This is what I am trying to achieve.
Currently the domain http://mynew.com/ redirects (via my hosting company) to one of the sites in my WordPress multisite installation, as follows: http://myold.com/subsite/
But I want to hide/swap the URLs so that http://myold.com/subsite becomes http://mynew.com, and all deeper links follow (e.g. http://myold.com/subsite/another-link becomes http://mynew.com/another-link) without breaking.
I tried this in my .htaccess file; it rewrote the URL successfully, but the links did not work and returned 404 errors.
RewriteRule ^subsite/(.+) http://mynew.com/$1 [R,L]
Hope that makes sense, thanks for your help.
Of course it will get you a 404 error: you redirected the request from the old site to the new site, but the new site does not contain the requested pages, so it throws a 404. What you need to do is redirect from the new site back to the old site INTERNALLY (that is, without changing the browser address bar), but this requires your new site to work as a proxy server; see http://httpd.apache.org/docs/2.2/mod/mod_proxy.html
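A minimal sketch of that idea using mod_rewrite's proxy flag, assuming mod_proxy and mod_proxy_http are enabled on the server behind mynew.com:

RewriteEngine On
# Fetch content from the old subsite while keeping mynew.com in the address bar
RewriteRule ^(.*)$ http://myold.com/subsite/$1 [P,L]

The [P] flag turns the rewrite into an internal proxy request instead of an external redirect, which is what keeps the browser's address bar unchanged.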
We have an IIS 404 ASP.NET handler that renders pages when an HTML page is not found. It uses the page's URL to query our databases and builds rich, relevant content on the fly. From what I can tell from the IIS logs and from analyzing the pages with browser tools, there is NO indication that the page does not actually exist and was dynamically generated.
In these cases, is IIS actually sending a 404 to the client? Is a redirect of any kind actually happening? Will search engines punish me for this?
It's been 2 months and Google has indexed everything, but Bing and Yahoo have not indexed anything dynamic, despite my submitting various directory pages, sitemaps, and feeds with all my links. My home page is indexed on all search engines and has all my links. When I search for very unique keywords from those links, I can see that Bing and Yahoo do see them in my home page links, but only there.
Is there anything I can run or check to make sure my dynamic pages are not somehow viewed as bad by search engines? Is there any way to check whether a 404 (whatever a 404 actually is to a client, besides just another page) is returned to crawlers?
Many Thanks.
Is there anything I can run or check to make sure my dynamic pages are not somehow viewed as bad by search engines?
Dynamic pages are just fine. Most of the content on the Internet is dynamically produced. The search engines don't care whether content is dynamic and, in fact, usually do not even know it is, as all they see is the URL and the HTML that the URL produces.
Is there any way to check whether a 404 (whatever a 404 actually is to a client, besides just another page) is returned to crawlers?
Use a tool like Firebug or the built-in developer tools in Chrome to view your HTTP headers. Crawlers see the same headers a browser would see, so that is an easy way to tell what headers your pages are sending out.