Block traffic from referral spam bots in Azure Web App with DNN - azure

I am sure many of you have found fake referral traffic in your Google Analytics reports/views. This makes it difficult for low- to medium-traffic sites to have accurate data for marketing. I am wondering what others are doing to exclude this traffic from their analytics reports.
If you go to your analytics account and open Acquisition -> All Traffic -> Referrals, you will see sites like floating-share-buttons.com. These are the sites I want to filter out, which you can do by setting up a custom filter for the view as described at the bottom of this page. I have done this and it works.
I would rather block these bots from hitting the site altogether. Just a note: my sites are running as web apps in Azure.
I am not sure whether the URL rewrite rules described here will work in Azure web apps, or whether they will interfere with the existing URL rewrite functions of the content management system I am using (DotNetNuke, DNN Platform 7).
I am really just looking to hear what others have done to block bots rather than setting up filters in the analytics view's settings.
Thanks
PS
for those who are interested, this is the current filter list I am using:
webmonetizer\.net|trafficmonetizer\.org|success-seo\.com|event-tracking\.com|Get-Free-Traffic-Now\.com|buttons-for-website\.com|4webmasters\.org|floating-share-buttons\.com|free-social-buttons\.com|e-buyeasy\.com
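A quick way to sanity-check an expression like this before pasting it into a view filter (or into a server-side rule that inspects the Referer header) is to run it against a few sample referrers. The following is only a minimal Python sketch: the function name is made up for illustration, and the case-insensitive flag is an assumption meant to mirror a filter that ignores case.

```python
import re
from urllib.parse import urlparse

# The filter expression from the post, unchanged.
# re.IGNORECASE is an assumption: it mimics a case-insensitive filter.
SPAM_PATTERN = re.compile(
    r"webmonetizer\.net|trafficmonetizer\.org|success-seo\.com|"
    r"event-tracking\.com|Get-Free-Traffic-Now\.com|buttons-for-website\.com|"
    r"4webmasters\.org|floating-share-buttons\.com|free-social-buttons\.com|"
    r"e-buyeasy\.com",
    re.IGNORECASE,
)


def is_spam_referrer(referrer_url: str) -> bool:
    """Return True if the referrer's hostname matches the spam pattern."""
    # urlparse() yields None for a missing hostname, so fall back to "".
    hostname = urlparse(referrer_url).hostname or ""
    return SPAM_PATTERN.search(hostname) is not None
```

Calling is_spam_referrer("http://floating-share-buttons.com/") should flag the hit, while an ordinary referrer such as a search engine should pass through.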

With regards to this issue, there are a number of things that you can do. You are going the route that I see most commonly used and that is to block the information using the filters in Google Analytics.
You can go the route of an IIS filter as well, just like you have linked. DNN's Friendly URLs will not necessarily be impacted by this, as the IIS rules are processed BEFORE DNN gets the request. There is a marginal performance impact from having two things process rewrites, but nothing to be concerned about until you reach incredibly high user volume.
This is also a great collection of options.

First, you need to know that there are mainly two types of spam affecting GA right now: ghosts and crawlers.
The first (ghosts) never interact with your page, so server-side solutions like HTTP rules or an htaccess file won't have any effect and will only clutter your config files.
Crawlers, as the name implies, do access your website and can be blocked this way, but there are only a few of them compared with the ghosts. To give you an idea, there are around 8 active crawlers, while there are more than 100 ghosts and the number increases each week.
This is because the ghost method is easier to implement for the spammers.
From your expression, only success-seo is a crawler. The rest should be filtered. There is now a better way to get rid of all ghosts with just one filter based on your valid hostnames, instead of creating or updating one every week.
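To illustrate the idea of a single hostname-based filter, here is a rough Python sketch of the logic: build one include expression from the hostnames that legitimately serve your tracking code and keep only hits whose hostname matches it. The hostnames and function names below are hypothetical placeholders, not values from the question.

```python
import re

# Hypothetical hostnames that legitimately serve the tracking code.
VALID_HOSTNAMES = ["example.com", "shop.example.com"]


def build_include_pattern(hostnames):
    """Join the escaped hostnames into one alternation expression."""
    return "|".join(re.escape(name) for name in hostnames)


def is_valid_hit(hit_hostname, pattern):
    """Keep a hit only if its hostname matches the include expression."""
    return re.search(pattern, hit_hostname, re.IGNORECASE) is not None
```

With these placeholders, build_include_pattern(VALID_HOSTNAMES) produces one expression for an include filter on the hostname field. A ghost hit, which typically reports a missing or fake hostname, would not match it and would be dropped without any per-spammer rule.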
You can find more information about the ghost spam and the solution here
https://stackoverflow.com/a/28354319/3197362
https://moz.com/ugc/stop-ghost-spam-in-google-analytics-with-one-filter
Hope it helps.

Related

How to block users accessing site outside of UK?

I searched the web and was unable to find a solution. I have an Umbraco site hosted on IIS on a Windows server. Any ideas on how to block users accessing the site from outside the UK? An htaccess approach would be too slow. Thank you in advance!
That's quite hard to do accurately, as you could have someone based in the UK using a European network provider, which means that they might appear to come from say Holland instead of the UK. It's also possible for people to spoof their location fairly easily if they really want to get at your site.
As Lex Li mentions there are plenty of commercial databases and tools for looking up a user's location, but the accuracy of these varies considerably, not to mention the fact that some of them only support IPv4. Any of these options are going to be slow though, as you'll have to check on every request. You also have to make sure you keep the databases up to date.
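For a sense of what the per-request check involves, here is a minimal Python sketch using the standard ipaddress module. The network ranges are reserved documentation blocks used purely as placeholders, NOT real UK allocations; a real deployment would load ranges from whichever location database you choose and keep them up to date.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real UK allocations.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True if the client address falls inside an allowed network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Reject anything that is not a valid IPv4 or IPv6 address.
        return False
    # Membership is False when the address and network versions differ.
    return any(address in network for network in ALLOWED_NETWORKS)
```

A linear scan like this runs on every request, which is part of why the lookup cost mentioned above matters once the range list grows large.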
Another option would be to proxy your site through something like CloudFront or CloudFlare which both support blocking traffic by country.

I need to speed up my site and reduce the number of files calls

My web host is asking me to speed up my site and reduce the number of file calls.
OK, let me explain a little. About 95% of my website's use is as a bridge between my database (on the same hosting) and my Android applications (I have around 30 that need information from my DB). The information only goes one way (for now): the app calls a JSON string like the one at:
http://www.guiasitio.com/mantenimiento/applinks/prlinks.php
and loads this web page in a WebView as a welcome message:
http://www.guiasitio.com/movilapp/test.php
This page has some images and jQuery, so I think these are the files using a lot of memory. They have told me to use some code to cache those files in the visitor's browser to save memory (that is a little Chinese to me since I don't understand it). Can someone give me an idea and point me to a tutorial on how to get this done? Can the WebView in an Android app keep caches of these files?
All your help is highly appreciated. Thanks
Using a CDN (content delivery network) would be an easy solution if it works well for you. Essentially, you are offloading the work of storing and serving static files (mainly images and CSS files) to another server. In addition to reducing the load on your current server, it will speed up your site because files will be served from a location closer to each site visitor.
There are many good CDN choices. Amazon CloudFront is one popular option, though in my opinion the prize for the easiest service to set up goes to CloudFlare: they offer a free plan, and you simply fill in the details, change the DNS settings on your domain to point to CloudFlare, and you will be up and running.
With some fine-tuning, you can expect to reduce the requests on your server by up to 80%.
I use both Amazon and CloudFlare, with good results. I have found that the main thing to be cautious of is to carefully check all the scripts on your site and make sure they are working as expected. CloudFlare also has a simple setting where you can specify the cache settings, so there's another detail on your list covered.
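As for what "cache settings" means in practice: browser caching is driven by the Cache-Control response header sent with each file. The Python sketch below only illustrates one possible policy; the extension list, the one-week lifetime, and the function name are assumptions, and on a PHP host the same header would normally be set in the server or CDN configuration rather than in code like this.

```python
from pathlib import PurePosixPath

# Assumed policy: let browsers keep common static assets for one week.
STATIC_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".jpeg", ".gif"}
ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value based on the file extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        # Static files may be stored and reused without asking the server.
        return f"public, max-age={ONE_WEEK_SECONDS}"
    # Dynamic output (e.g. the JSON endpoint) is revalidated on every use.
    return "no-cache"
```

Under this policy an image or script is fetched once per week per visitor, while a PHP endpoint that returns fresh data is always checked against the server.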
Good luck!

Azure scaling with Orchard cms

I was reading through these questions:
Scaling Orchard with Azure Web Sites
Orchard CMS Performance
How to deploy Orchard CMS in Windows Azure?
I started to think about an e-commerce project I am undertaking and would like to clarify a few things if possible.
Please forgive me because I am finding it very difficult to articulate this question in a way I feel I have clearly communicated what I am thinking.
Firstly, what factors should make me start thinking about scaling to handle my web site's traffic, and when do those factors kick in? The factors I am aware of include:
Session handling
Caching
The amount of data being served in a request (though I am not sure of the full implications of request size)
Secondly, with all things there should be a certain level of up-front planning when trying to set up a web site that can handle traffic of certain levels. Would the Azure scaling need to be done upfront or is it a simple matter to make it work now for what is needed and then up-scale at a later date when it is necessary?
Let me give a real-life scenario to help illustrate where my fear lies:
A radio broadcast was put out for a certain web site trying to sell
their wares. The web site was not planned very well. The web site
started to receive visits from people listening to the radio show. So
many visitors that the web site was not able to handle the traffic and
an error message was displayed telling the world that they should
'talk to the administrator' or words to that effect. You know the
picture I am sure and I am also very certain it would be embarrassing
for any web developer to be told that this was happening to a web site
they had designed.
I would really like to be able to distil a proper question out of this, but there are many things that I am just not aware of. To try and make this question less vague, I will summarise what I would like to achieve:
I want to have a web site that is able to handle a lot of traffic following successful advertising/marketing campaigns. I want to walk the tightrope of budget versus functionality, which is why I would like to be able to do the least amount possible to start with and be able to easily up-scale as demand dictates.
Bearing this in mind, what approach/considerations should I take to avoid nasty pitfalls with performance/availability/reliability when using an Orchard CMS/Azure combination to deliver my project?
Orchard on Azure Web Sites is working great for us, see http://nublr.pt
A few things to bear in mind with the site configuration are:
follow the guidelines in http://docs.orchardproject.net/Documentation/Optimizing-Performance-of-Orchard-with-Shared-Hosting
set up caching (module Contrib.Cache available in the gallery) which will use IIS's application cache.
set up the Warmup feature to keep the site alive,
also ensure that dynamic compilation is off by using the Config/HostComponents.config
We are currently in the "shared" mode of Azure Web Sites. We don't have much traffic yet, but our load testing with https://loadimpact.com has not taken the site down once. At any time we can move to the "reserved" mode (it does take up to 24h for it to happen).
Version 1.6 will bring a lot of improvements to Orchard, try to get started with your development in it.
Hope this has helped.

number of people that has visited the website

I am working on a website and would like to know the number of people who have visited it. Can someone tell me what to do?
Use google analytics: http://www.google.com/analytics/
I would give you some code to insert, but to be honest the best option is to use something like Google Analytics. It gives you a very good analysis of your website visits and has many features that would take you a very long time to develop yourself.
Since you've tagged this with asp.net, I presume you're running on IIS. Make sure logging is enabled for the site you're working with and then you can determine from the log files how many users are coming to your site by IP addresses.
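As a rough illustration of the log-file approach, the Python sketch below counts distinct client IP addresses in W3C-format log lines. It assumes the log contains a "#Fields:" directive that lists a c-ip column; the function name is made up for this example, and distinct IPs are only an approximation of distinct visitors.

```python
def count_unique_visitors(log_lines):
    """Count distinct c-ip values in W3C extended log lines."""
    fields = []
    client_ips = set()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the space-separated columns that follow.
            fields = line[len("#Fields:"):].split()
            continue
        # Skip blanks, other directives, and logs without a c-ip column.
        if not line or line.startswith("#") or "c-ip" not in fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # Malformed entry: ignore rather than guess.
        client_ips.add(values[fields.index("c-ip")])
    return len(client_ips)
```

It can be fed an open log file directly, since iterating over a file yields its lines one at a time.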
Since it hasn't been mentioned here in all these years, let me add that AWStats, although very different from Google Analytics, may still be a good web server traffic analysis tool for network administrators.

Difference between Ad company statistics, Google Analytics and Awstats on adult sites

I have this problem. I have a web page with adult content, and for the past several months I have had PPC advertising on it. I've noticed a big difference between the ad company's statistics for my page, the Google Analytics data, and the Awstats data on my server.
For example, the ad company tells me that I have 10K pageviews per day, Google Analytics tells me that I have 15K pageviews, and on Awstats it's around 13K pageviews. Which system should I trust? Should I write my own (and reinvent the wheel again)? If so, how? :)
The funny thing is that I have another web page with "normal" content (an MMORPG fan site), and those numbers are roughly equal in all three systems (ad company, GA, Awstats). Do you think it's because it's not an adult-oriented page?
And a final question that is totally off-topic: do you know of an ad company that pays per impression and doesn't mind adult sites?
Thanks for the answers!
First, you should make sure not to mix up »hits«, »files«, »visits« and »unique visits«. They all have different meanings and are sometimes named differently. I recommend looking up some definitions if you are confused about the terms.
awstats probably has the most correct statistics, because it has access to the access.log from the web server. Unfortunately, a cached page (maybe cached by the browser, a proxy at an ISP or your own caching server) might not produce a hit on the web server. Especially if your site is served with good caching hints which don't enforce revalidation and you are running your own web cache (e.g. Squid) in front of your site, the number will be considerably lower, because it only measures the work of the web server.
On the other hand, Google Analytics is only able to count requests from users who haven't blocked Google Analytics and have JavaScript enabled (but it will count pages served by a web cache). So, this count can be influenced by the user, but isn't affected by web caches.
The ad company is probably simply counting the number of requests it gets from your site (probably based on its access.log). So, to get counted there, the ad must not be cached and must not be blocked by the user.
So, as you can see, it's not that easy to get a single correct value. But as long as you use the measured values in comparison to those from the previous months, you should get at least a (nearly) correct rate of growth.
And your porn site probably serves a large amount of static content (e.g. images from the disk), and most web servers are really good at serving caching hints automatically for static files. Your MMORPG site, on the other hand, might mostly consist of dynamic scripts (PHP?) which don't send any caching hints at all, and web servers aren't able to determine those caching headers for dynamic content automatically. That's at least my explanation, without knowing your application and server configuration :)

Resources