How can I change Google's crawl rate?

I just switched servers, from a shared host to EC2, and have been seeing better performance. I guess Google noticed this too and cranked up the crawl rate; unfortunately it is currently too high and my sites can't handle it. How can I turn it down? I did some reading, and people say you can do this in Google Webmaster Tools, but the area where you change the crawl rate setting shows the message: "Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate."
I depend on Google for the majority of my traffic, so I don't want to block them via robots.txt; I just want them to simmer down a tad. Is there a way to do this via robots.txt or some other method?
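From my reading, robots.txt does have a Crawl-delay directive, but apparently Googlebot ignores it (Bing and Yandex honour it), so I'm not sure it would help here. For reference, it looks like this, with the value usually read as seconds between requests by the crawlers that do support it (the 10 is just an example):

    User-agent: *
    Crawl-delay: 10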

Related

How to block users accessing the site from outside the UK?

I've searched the web and been unable to find a solution. I have an Umbraco site hosted with IIS on a Windows server. Any ideas on an approach to block users accessing the site from outside the UK? An .htaccess approach would be too slow... thank you in advance!
That's quite hard to do accurately, as you could have someone based in the UK using a European network provider, which means they might appear to come from, say, Holland instead of the UK. It's also possible for people to spoof their location fairly easily if they really want to get at your site.
As Lex Li mentions, there are plenty of commercial databases and tools for looking up a user's location, but their accuracy varies considerably, not to mention that some of them only support IPv4. Any of these options are going to be slow, though, as you'll have to do a lookup on every request. You also have to make sure you keep the databases up to date.
Another option would be to proxy your site through something like CloudFront or CloudFlare which both support blocking traffic by country.
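For example, if you went the CloudFront route, the country restriction is just part of the distribution configuration. A minimal sketch of what it looks like in a CloudFormation template, whitelisting the UK only (treat the exact property names as indicative rather than copy-paste ready):

    Restrictions:
      GeoRestriction:
        RestrictionType: whitelist
        Locations:
          - GB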

Block traffic from referral spam bots in Azure Web App with DNN

I am sure many of you have found fake referral traffic in your Google Analytics reports/views. This makes it difficult for low- to medium-traffic sites to have accurate data for marketing. I am wondering what others are doing to exclude this traffic from their analytics reports.
If you go to your analytics account and go to Acquisition -> All Traffic -> Referrals, you will see sites like floating-share-buttons.com. These are the sites I want to filter out, which you can do by setting up a custom filter for the view as described at the bottom of this page. I have done this and it works.
I would rather block these bots from hitting the site altogether. Just a note: my sites are running as web apps in Azure.
I am not sure whether the URL rewrite rules described here will work in Azure web apps, or whether they will interfere with the existing URL rewrite functions of the content management system I am using (DotNetNuke/DNN Platform 7).
I am really just looking to hear what others have done to block bots, rather than setting up filters in the analytics view's settings.
Thanks
PS: For those who are interested, this is the current filter list I am using:
webmonetizer\.net|trafficmonetizer\.org|success-seo\.com|event-tracking\.com|Get-Free-Traffic-Now\.com|buttons-for-website\.com|4webmasters\.org|floating-share-buttons\.com|free-social-buttons\.com|e-buyeasy\.com
With regard to this issue, there are a number of things you can do. You are going the route that I see most commonly used, which is to exclude the traffic using filters in Google Analytics.
You can go the route of an IIS rewrite rule as well, just as you have linked. DNN's friendly URLs will not necessarily be impacted by this, as the rewrite rules are processed BEFORE DNN gets the request. There is a marginal performance impact from having two things process rewrites, but nothing to be concerned about until you reach incredibly high user volume.
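To give you an idea of what that looks like, here is a rough sketch of a web.config rule that aborts requests whose referrer matches the spam list (the pattern is shortened to three domains for the example; extend it with your full expression). Azure web apps do include the URL Rewrite module, so a rule placed in web.config should be picked up:

    <system.webServer>
      <rewrite>
        <rules>
          <!-- Drop any request whose Referer header matches known spam domains -->
          <rule name="Block referral spam" stopProcessing="true">
            <match url=".*" />
            <conditions>
              <add input="{HTTP_REFERER}" pattern="(floating-share-buttons|buttons-for-website|free-social-buttons)\.com" />
            </conditions>
            <action type="AbortRequest" />
          </rule>
        </rules>
      </rewrite>
    </system.webServer>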
This is also a great collection of options.
First you need to know that there are mainly two types of spam affecting GA right now: ghosts and crawlers.
The first (ghosts) never interact with your page, so any server-side solution such as rewrite rules or the .htaccess file won't have any effect and will only clutter your config files.
The crawlers, as the name implies, do access your website and can be blocked this way, but there are only a few of them compared with the ghosts. To give you an idea, there are around 8 active crawlers while there are more than 100 ghosts, and the number grows each week.
This is because the ghost method is easier to implement for the spammers.
From your expression, only success-seo is a crawler; the rest are ghosts and should be filtered. Now there is a better way to get rid of all the ghosts with just one filter based on your valid hostnames, instead of creating or updating one every week.
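As an illustration, that filter is an include filter on the Hostname field whose pattern lists only the hosts that legitimately serve your tracking code; the hostnames below are placeholders, so substitute your own:

    yourdomain\.com|www\.yourdomain\.com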
You can find more information about ghost spam and the solution here:
https://stackoverflow.com/a/28354319/3197362
https://moz.com/ugc/stop-ghost-spam-in-google-analytics-with-one-filter
Hope it helps.

Web Page Compression - Looking for a definitive answer

In order to try and get to a resolution about web page compression, I'd like to pose the question to you 'gurus' here in the hope that I can arrive at some kind of clear answer.
The website in question: http://yoginiyogabahrain.com
I recently developed this site and am hosting it with Hostmonster in Utah.
My reason for constructing it as a one-page scrollable site was the amount of content that does not get updated - literally everything outside of the 'schedule', which is updated once a month. I realise that the 'departments' could have been displayed on separate pages, but felt that the content didn't warrant whole pages devoted to their own containers, which would also require further server requests.
I have minified the HTML, CSS and JS components of the site in accordance with the guidelines and recommendations from Google PageSpeed and Yahoo YSlow. I have also applied server and browser caching directives in the .htaccess file to satisfy further recommendations.
Currently Pingdom Tools rates the site at 98/100, which pleases me. Google and Yahoo are hammering the site on the lack of GZIP compression and, in the case of Yahoo, the lack of CDN usage. I'm not so much worried about the CDN, as this site simply doesn't warrant one. But the compression bothers me, in that it was initially being applied.
For about a week the site was being gzipped, and then it stopped. I contacted Hostmonster about this and they said that if the server determines there are not enough resources to serve a compressed version of the site, it will not do so. But that doesn't answer the question of whether compression would resume if the resources allowed it. To date, the site has not been compressed again.
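For reference, from what I've read, enabling compression at the .htaccess level would look something like the block below, assuming mod_deflate is available - but this appears to be exactly what the host can override at the server level:

    <IfModule mod_deflate.c>
      # compress text-based assets; images such as JPEG/PNG are already compressed
      AddOutputFilterByType DEFLATE text/html text/css application/javascript text/javascript image/svg+xml
    </IfModule>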
Having done a lot of online research to find an answer about whether this is such a major issue, I have come across a plethora of differing opinions. Some say we should be compressing, and some say it's not worth the strain on resources to do so.
If Hostmonster have determined that the site doesn't warrant being compressed, why do Google and Yahoo nail it for the lack of compression? Why does Pingdom Tools not even take that aspect into account?
Forgive the lengthy post, but I wanted to be as clear as possible about what I'm trying to establish.
So in summary, is the lack of compression on this site a major issue, or would it be necessary to look at a hosting provider who will apply compression without question on a shared hosting plan?
Many thanks!

I need to speed up my site and reduce the number of files calls

My web host is asking me to speed up my site and reduce the number of file calls.
OK, let me explain a little. My website is used about 95% of the time as a bridge between my database (on the same hosting) and my Android applications (I have around 30 that need information from my db). The information only goes one way (for now): the app calls a JSON string like the one at this URL:
http://www.guiasitio.com/mantenimiento/applinks/prlinks.php
and this web page is shown in a WebView as a welcome message:
http://www.guiasitio.com/movilapp/test.php
This page has some images and jQuery, so I think these are what is using a lot of resources. They have told me to use some code to cache those files in the visitor's browser to save resources (which is all Greek to me, since I don't understand it). Can someone give me an idea and point me to a tutorial on how to get this done? Can the WebView in an Android app keep a cache of these files?
All your help is highly appreciated. Thanks!
Using a CDN (content delivery network) would be an easy solution if it works well for you. Essentially you are off-loading the work of storing and serving static files (mainly images and CSS files) to another server. In addition to reducing the load on your current server, it will speed up your site because files will be served from a location closest to each site visitor.
There are many good CDN choices. Amazon CloudFront is one popular option, though in my opinion the prize for the easiest service to set up goes to CloudFlare ... they offer a free plan: simply fill in the details, change the DNS settings on your domain to point to CloudFlare and you will be up and running.
With some fine-tuning, you can expect to reduce the requests hitting your server by up to 80%.
I use both Amazon and CloudFlare, with good results. I have found that the main thing to be cautious of is to carefully check all the scripts on your site and make sure they are working as expected. CloudFlare has a simple setting where you can specify the cache settings as well, so there's another detail on your list covered.
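If you also want to cover the browser-caching piece your host mentioned, a minimal sketch of the .htaccess directives is below, assuming your hosting runs Apache with mod_expires enabled (the lifetimes are just examples). The Android WebView keeps an HTTP cache as well, so once these headers are in place it can reuse the files instead of re-downloading them, provided caching isn't disabled in the app:

    <IfModule mod_expires.c>
      ExpiresActive On
      # let browsers and WebViews reuse these files instead of requesting them again
      ExpiresByType image/jpeg "access plus 1 week"
      ExpiresByType image/png "access plus 1 week"
      ExpiresByType text/css "access plus 1 week"
      ExpiresByType application/javascript "access plus 1 week"
    </IfModule>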
Good luck!

Difference between Ad company statistics, Google Analytics and Awstats on adult sites

I have this problem: I have a web page with adult content, and for the past several months I've had PPC advertising on it. I've noticed a big difference between the ad company's statistics for my page, the Google Analytics data and the AWStats data on my server.
For example, the ad company tells me that I have 10K pageviews per day, Google Analytics tells me I have 15K pageviews, and AWStats shows around 13K pageviews. Which system should I trust? Should I write my own (and reinvent the wheel again)? If so, how? :)
The funny thing is that I have another web page with "normal" content (an MMORPG fan site), and those numbers are more or less equal in all three systems (ad company, GA, AWStats). Do you think that's because it's not an adult-oriented page?
And a final question that is totally off-topic: do you know of an ad company that pays per impression and doesn't mind adult sites?
Thanks for the answers!
First, you should make sure not to mix up »hits«, »files«, »visits« and »unique visits«. They all have different meanings and are sometimes named differently. I recommend looking up the definitions if you are confused about the terms.
AWStats probably has the most accurate statistics, because it works from the web server's access.log. Unfortunately, a cached page (cached by the browser, an ISP's proxy or your own caching server) might not produce a hit on the web server at all. Especially if your site is served with good caching hints which don't force revalidation, and you are running your own web cache (e.g. Squid) in front of your site, the number will be considerably lower, because it only measures the work of the web server.
Google Analytics, on the other hand, can only count requests from users who haven't blocked Google Analytics and have JavaScript enabled (but it will count pages served by a web cache). So this count can be influenced by the user, but isn't affected by web caches.
The ad company is probably simply counting the number of requests it gets from your site (probably based on its own access.log). So, to be counted there, the ad must not be cached and must not be blocked by the user.
So, as you can see, it's not that easy to get a single correct value. But as long as you use the measured values in comparison to those from the previous months, you should get at least a (nearly) correct rate of growth.
And your porn site probably serves a large amount of static content (e.g. images from disk), and most web servers are really good at sending caching hints automatically for static files. Your MMORPG site, on the other hand, might mostly consist of dynamic scripts (PHP?) which don't send any caching hints at all, and web servers can't determine those caching headers for dynamic content automatically. That's at least my explanation, without knowing your application and server configuration :)
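To make that concrete, a static image is typically served with response headers along these lines (the values here are purely illustrative), which let browsers and proxies reuse it without hitting your server again:

    Cache-Control: max-age=604800
    Last-Modified: Tue, 05 May 2015 10:00:00 GMT

A PHP page usually sends no such headers unless the script adds them itself, so every request comes back to the server and shows up in access.log (and therefore in AWStats).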
