Our site was hacked and links to random content were added to it. We completely removed the hacked site and put a new one in its place. Everything is new, including images and content; no part of the old site was reused.
The problem we have now is that the hacker has submitted hundreds of thousands of links to the search bots, and the server is visited about once every second by bots trying to index links that don't exist and have never existed on either the old or the new site.
We have tried to combat this in the site's .htaccess file with several conditions and rewrite rules that tell the bots the content is gone.
Example
RewriteCond %{REQUEST_URI} .*/product/.*
RewriteRule ^ - [R=410,L]
The trouble with this is that some requests are getting through and producing 301 and 404 errors.
This is causing the bots to retest the request again and report our site as having 100,000s of bad links.
I am looking for a solution that returns a 410 code to the bots for all requests, excluding requests for resources that are actually part of the site.
The site only has approx 10 pages but is a Joomla CMS so there are a mass of resources that get loaded in the background to deliver the page.
My idea was to visit each page in the site and use the browser's inspector to gather a list of all the resource requests that each page makes.
The question is: how do I formulate this into conditions and rules for the .htaccess file so that all page requests, including the root route /, are delivered but the hacker's links requested by the bots aren't?
We are also working on emailing the bot operators to say that their requests are being instigated by the hacker.
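One way the whitelist idea could be expressed is sketched below. It is only an illustration under assumptions: the page names (about, contact, services) are hypothetical placeholders for the site's real routes, Joomla's URL rewriting is assumed to be enabled, and the block is assumed to sit above Joomla's own rewrite rules in the root .htaccess file.

```apache
RewriteEngine On

# Sketch only: the page names below are hypothetical placeholders.
# Return 410 Gone unless the request is for an existing file or directory
# (CSS, JS, images, index.php, the site root) or for a known page route.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/(about|contact|services)/?$ [NC]
RewriteRule ^ - [G]
```

Because existing files and directories are let through, the list gathered with the browser's inspector would only be needed for routes that do not map to a real file. Spam URLs that point at a real file such as /index.php with a query string would still reach Joomla and would need their own condition.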
Related
I am trying to clean up a previously hacked WordPress site and its domain name reputation. The site has new hosting and is now on a different CMS, but there are hundreds of spam links in Google that I need to get rid of. They look like example.com/votes.php?10054nzwzm75042pw205039
That is, the domain name, then votes.php?**** and so on: numbers, letters, all sorts.
So how do I redirect ANYTHING that starts with the domain name followed by /votes.php?***
Any help greatly appreciated
Unless you have multiple domains, you don't need to explicitly check the domain name.
To send a "410 Gone" for any request to /votes.php (with any query string), you can do something like the following at the top of your root .htaccess file using mod_rewrite:
RewriteEngine On
# Serve a 410 Gone for any requests to "/votes.php"
RewriteRule ^votes\.php$ - [G]
A 410 is preferable to a "redirect" if you want to get these URLs removed from the search engines as quickly as possible.
To expedite URL removal from Google, use Google's Removal Tool as well.
If you redirect these pages to the homepage then it will likely be seen as a soft-404 by Google and these URLs are likely to remain in the search results for a lot longer.
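If several domains did share the same .htaccess file, the rule could be limited to one of them with a host condition. This is a sketch only; example.com is the placeholder domain from the question, not a real configuration.

```apache
RewriteEngine On

# Only needed when several domains share this .htaccess file.
# example.com is the placeholder domain from the question.
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
RewriteRule ^votes\.php$ - [G]
```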
I've implemented IIS 7.0 rewrite rules for my site which strip out the .aspx extension, make the URL lower case, and strip "default.aspx" to clean up the URL, and all of the rewrite rules work great.
However, in looking through my google analytics reports for a time period after the rules were put into production, under "Behavior", "Site Content", some of the entries still show the extension. For example, it shows these pages being hit:
/about/
/about/default.aspx
There is no way the rule isn't working for /about/default.aspx, and I can go to that URL and it redirects to /about/
So what is going on here? Analytics shouldn't know anything about default.aspx since all of those rules are done on the server before anything is returned to the client.
If your site uses redirects, the redirecting page becomes the landing page's referrer. For example, if you've changed your site so that default.aspx now redirects to /, then default.aspx becomes the referrer for the root folder (/). If someone reached your site via a Google search that sent them first to default.aspx, you won't have any data regarding the Google search.
For this reason, you should place the Analytics tracking code on the redirecting page as well as on the landing page. This way, the redirecting page will capture the actual referrer information for your reports.
By doing this, you may get Analytics to log only the redirected URL.
Figured it out. The GA Filters were set up to append "default.aspx" to any page which didn't have a page defined.
For example, if I went to
/about/
it would report as /about/default.aspx
But if I went to
/about/features
(features.aspx is a page)
it would not append it!
I have a relatively new site that has just started to pick up a bit of traction in the SERPs. My problem is that I have published it and had it indexed with PHP URL extensions, as follows:
www.example.com/page.php
www.example.com/product.php
And so on. Obviously it is a fairly easy matter of editing the .htaccess file to remove these extensions. So I will end up with:
www.example.com/page
www.example.com/product
No problems there.
Because the site is still quite small, I can easily change all the links manually to drop the .php extension, and then update the sitemap. So Google, and all users, should have no way of reaching the .php pages, although of course they still exist if you were to manually type them in.
But, because Google has a 'record' of these pages existing (even though there are no direct links to reach them now), do I need to implement 301 redirects from the .php pages to the new non-php pages? I.e. will Google try to crawl those pages that are no longer in the sitemap, but once existed? In other words, since you can still reach www.example.com/page.php, even though there will be no link on the site or in the sitemaps that will take you there, would I get penalised for having duplicate content? Are 301 redirects basically required when doing this kind of thing, even if there are no links to the content anymore?
Thanks very much.
It is better to keep 301 redirects in place for some time (a month or two), even if you change all your links to the non-.php URLs. That way any residual old URLs (there will always be some) are taken care of, and Google will index the non-.php URLs via the 301 redirects. Once your logs show that no more requests for the old URLs are coming in, you can remove the redirects. This is an easier way of moving your old URLs than abruptly throwing 404s, and a 301 helps transfer the SEO value of the old URLs to the new ones.
Another item to look out for is rel="canonical", if you want your .php and non-.php pages to coexist. Adding, for example, <link rel="canonical" href="http://www.example.com/page"> to the head of both versions tells search engines which URL is the preferred one, so the duplicates are consolidated rather than indexed separately.
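For the 301 approach, a commonly used mod_rewrite pattern looks roughly like the sketch below. It assumes a root .htaccess file and that each extensionless URL maps to a .php file of the same name; the second rule may duplicate whatever extension-removal rule is already in place.

```apache
RewriteEngine On

# Externally redirect direct requests for /name.php to /name (301).
# THE_REQUEST holds the original request line, so requests that were
# rewritten internally by the rule below do not trigger a redirect loop.
RewriteCond %{THE_REQUEST} \s/+([^?\s]+)\.php[?\s] [NC]
RewriteRule ^ /%1 [R=301,L]

# Internally map /name back to name.php when that file exists.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(.+?)/?$ $1.php [L]
```

With this in place, a request for www.example.com/page.php should be answered with a 301 to www.example.com/page, which is then served from page.php without the extension showing in the address bar.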
The server where I developed a WordPress site was indexed by Google. The site is now live on the actual domain, but Google searches find links to the site at the development server address. The site is on the same server where it was developed; making it live was simply a matter of pointing the domain to this new site. I need to redirect these links, but am not having any luck.
Also, the development server address has a tilde, which was indexed as %7E in Google. I have tried various versions of the following, all to no avail.
RewriteCond %{HTTP_HOST} ^cardgym\.dcaccess\.net
RewriteRule ^cardgym.dcaccess.net/~chrs/$ http://chrs.org/$1 [R=301,nc]
RewriteRule ^/%7Echrs/(.*)$ http://chrs.org/$1 [R=301,nc]
Going to the development server results in a 404 error in WordPress: http://cardgym.dcaccess.net/~chrs/
Thanks
Can you change your internal web server configuration so that the development domain is an alias of the live site? That would be the easiest solution imo.
Otherwise, check out the answer by Sigg3.net in "RewriteRule for tilde".
If I understand you correctly your site is live and you moved it to the new domain.
So it appears you already have the live site up and going at http://chrs.org. So there is nothing you need to do to redirect it as far as Google indexing.
It will take Google time to crawl the new site and index it.
You can help speed up the process by asking Google to index your new site by submitting it here.
https://www.google.com/webmasters/tools/submit-url?pli=1
.htaccess does not control the way Google indexes the site. If it's on the internet it will be indexed unless you prevent it. There are a few options that can help make those dev links disappear.
A. Add a robots.txt file to the root of the dev site with the code below in it; that will keep Google and other search engines that respect robots.txt from crawling it.
# Make changes for all web spiders
User-agent: *
Disallow: /
B. Block the whole site by password-protecting its directory via .htaccess, which will stop it from being crawled.
OR
C. Take the dev site down.
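Option B could look something like the following sketch, assuming the host allows authentication directives in .htaccess. The realm name and the AuthUserFile path are placeholders, and the password file would be created separately with the htpasswd utility.

```apache
# Placed in the .htaccess file at the root of the dev site.
# The AuthUserFile path is a placeholder, not a real location.
AuthType Basic
AuthName "Development site"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

Since the dev and live sites here share the same directory, this would also lock out the live site unless the two are separated first.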
It appears you've already moved the dev site to the live domain, which is why you are getting a 404. The links in Google will disappear eventually because they no longer exist. The next time Google tries to crawl your dev site and sees it's not there, the links will be removed. The new site will start to show up as Google begins crawling it. There is nothing you can do right now but wait. It can literally take weeks.
If you really are trying to redirect, then you can add an .htaccess file on the cardgym.dcaccess.net site using Redirect.
Redirect 301 /~chrs http://chrs.org
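A mod_rewrite variant is sketched below for the case where the same directory serves both hostnames. It assumes the .htaccess file lives in the directory that is served as /~chrs/ and that the rule is placed above the WordPress rewrite block. In per-directory context the pattern is matched against the URL-decoded path relative to that directory, so neither ~chrs nor %7E needs to appear in it.

```apache
RewriteEngine On

# Redirect only requests that arrive via the development hostname.
RewriteCond %{HTTP_HOST} ^cardgym\.dcaccess\.net$ [NC]
RewriteRule ^(.*)$ http://chrs.org/$1 [R=301,L]
```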
First time user, been looking all night.
We recently changed our site from .NET to WordPress. We transferred over half of the news articles and not the other half. So now we get old users coming to the site and getting a 404.
The news articles that exist in the WordPress site have been redirected and work fine, for example,
www.example.com/news/transfered-news-story.aspx
redirects to
www.example.com/blog/news/transfered-news-story
this was done manually.
What I need help with is if someone comes to the site with any other request, e.g.
www.example.com/news/this-didnt-get-moved.aspx
or
www.example.com/news/anything-else
or
www.example.com/news/2010/02
it should all just get redirected to
www.example.com/blog/news
I have been reading on and off for a couple of weeks and tried a few things but they all append the additional stuff on the end of the redirected string.
so www.example.com/news/my-stuff-ok
becomes www.example.com/blog/news/my-stuff-ok (and I want to drop the my-stuff-ok)
I hope you get what I'm after, any help would be very much appreciated.
Thanks
Phil
You can simply write a directive that serves a given URL whenever a 404 occurs (documentation):
ErrorDocument 404 /blog/news
However, you really should go through the motions of adding manual redirects (permanent redirect) to the new url for each of the other articles because you will take a considerable SEO hit if those urls no longer serve up the content that was linked by the search engine.
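If an actual redirect is wanted rather than an error document, a catch-all rule with a fixed target avoids the appended path described in the question, which is the usual behaviour of a prefix-based Redirect directive. The sketch below assumes the rules go in the root .htaccess file above the WordPress rewrite block, and that the per-article redirects are written the same way; the one shown uses the example URL from the question.

```apache
RewriteEngine On

# Specific redirects for the migrated articles go first, for example:
RewriteRule ^news/transfered-news-story\.aspx$ /blog/news/transfered-news-story [R=301,L]

# Catch-all: any other URL under /news goes to /blog/news, nothing appended.
RewriteRule ^news(/.*)?$ /blog/news [R=301,L]
```

The pattern is anchored at the start of the path, so /blog/news itself does not match and there is no redirect loop.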