Why does Google Analytics show fewer visits than One&One stats? - statistics

Comparing Google Analytics results to the One&One hosting monthly statistics shows a huge discrepancy.
For last month:
Google shows 1046 visits.
One&one stats show 15304 unique visits.
The Google code is in the footer, which appears on every page.
I'm aware GA only works with JS enabled, but can there really be that many non-JS users?

Google Analytics is a good indicator of how many humans are visiting your website.
Here are some things to check:
how many bots are in your monthly stats? You can usually find something that says User-Agent in your stats page. GoogleBot, Slurp, msnbot & others will be visiting every page on your site.
that you've read Google Analytics' definition of a visit.
that you have read what your statistics provider means by unique visit. Does that mean unique visitor, page view or something else?
Raw hits on servers can be misleading for a number of reasons:
If you have external style sheets, JavaScript files, etc., each of them could be counted as a hit in the web server log.
RSS feed readers will periodically update without being asked to by a human.
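
To see how much of a raw hit count comes from bots and static files, something like the following sketch could be run over the access log. It assumes the log uses the Apache/NCSA combined format; the marker lists and the function names (classify, summarize) are illustrative assumptions, not anything 1&1 or Google Analytics provides.

```python
import re

# Hypothetical substrings for crawlers and feed readers; extend for your own traffic.
BOT_MARKERS = ("googlebot", "slurp", "msnbot", "bingbot", "feed")
# Static assets that raw hit counters often include.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")

# Combined log format:
# ip ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def classify(line):
    """Label one log line as 'bot', 'asset', 'page' or 'unparsed'."""
    m = LOG_LINE.match(line)
    if m is None:
        return "unparsed"
    agent = m.group("agent").lower()
    if any(marker in agent for marker in BOT_MARKERS):
        return "bot"
    # Ignore the query string when checking the file extension.
    path = m.group("path").split("?", 1)[0].lower()
    if path.endswith(ASSET_SUFFIXES):
        return "asset"
    return "page"

def summarize(lines):
    """Count how many lines fall into each category."""
    counts = {"page": 0, "asset": 0, "bot": 0, "unparsed": 0}
    for line in lines:
        counts[classify(line)] += 1
    return counts
```

Only the "page" count is roughly comparable to what a JavaScript-based tracker sees, and even that still counts page views rather than visits.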

Check the page views in Google Analytics - it's possible that 1&1 is tracking unique page views instead of the actual visits.

Google Analytics works for almost all users (I believe less than 5% have JS disabled). I have had the same discrepancy. In my case the difference disappeared once I took the bots into account (server-side statistics often count them, because they produce HTTP requests). You probably have the same "problem".
Neither set of stats is wrong; they just count different things. Google Analytics is the more "accurate" one, i.e. it gives the numbers you want to look at. The hosting stats, which look only at HTTP requests, often without filtering, are less interesting.

Blogger, and probably other sites, serve a different page template or skin to mobile visitors. In my case, that template didn't contain the google analytics snippet of code and so those hits were uncounted, until I noticed and fixed it.

Related

Get Google Search Results Content

I want to get or buy Google search results content (structured) from Google itself or from any other source that can sell Google data legally. For example, I want all results for a specific keyword over the most recent 6 months.
For this stage, it would be enough to get just the page content as raw text.
Automated reading or scraping of Google SERPs is against Google's ToS. From this point of view, nobody sells such data legally: any seller violates Google's ToS.
There are many offers on the market where you can get SERP data as JSON or full HTML through API access; just google for it.
Every seller scrapes SERPs the same way, and you can do it on your own. Run many proxies with IP addresses from the countries whose SERPs you need, and query Google with some kind of headless browser. Use captcha-solving services to keep getting data even if an IP gets banned. Multithread your queries to get more data at once. That's the whole magic.

Suspiciously high number of web visits from "exotic" countries

I set up a small business website which only displays information about the offered services and some contact information. It is not interactive at all, and users cannot submit any data.
We are now monitoring the visits and PIs with the tools offered by Google. Since the first days after going public we have been seeing a lot of IPs from places in the world we have absolutely no relation to (like Russia, China, Brazil, even some African states...). Also, the overall number of visits is much higher than we expected.
Now I'm wondering where these "exotic" visitors may come from, and whether this is some kind of attack we should be aware of and protect against somehow. Does anybody know what might be happening here?
This is a common situation. Websites with the default Google Analytics tracking code like UA-XXXXXXX-1 have been receiving attacks from what are known as "ghost referrals". These ghosts often come from Russia through different sources such as forum.topic59010277.darodar.com, humanorightswatch.org, o-o-6-o-o.com and s.click.aliexpress.com.
Most recently I have noticed another source simple-share-buttons.com coming from different countries like USA, China, Finland, Singapore and Argentina.
They distort metrics like bounce rate and session duration. Google might deliver a solution soon, meanwhile you can use view-filters to block them from appearing in your GA reports.
Create a filter that excludes only the ghosts from your view. Go to your view and set up the filter as follows:
Filter Type: Custom
Exclude
Filter Field: Referral
Filter Pattern: use the following regex:
.*spammer1.tld|.*spammer2.tld|.*spammer3.tld|.*spammer4.tld
Check the TLD (com, net, co, etc.) of each spammer* and change it accordingly inside the regex. You can find the list of spammers in Google Analytics in the Acquisition > All Traffic > Referrals report. (You will need to monitor this section in case new spammers arrive.)
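
Before pasting a pattern into the filter, it can help to try it locally against hostnames from the Referrals report. The sketch below builds the pattern from a list of domains and tests it with Python's re module. It escapes the dots, which is slightly stricter than the pattern above; the helper name is_spam_referral is an assumption, and Python's regex engine is only an approximation of how GA evaluates the filter.

```python
import re

# Domains taken from the examples above; replace with the ones in your own report.
SPAM_DOMAINS = ["darodar.com", "o-o-6-o-o.com", "simple-share-buttons.com"]

# Escape "." so it matches a literal dot, then join the alternatives with "|".
pattern = "|".join(".*" + d.replace(".", r"\.") for d in SPAM_DOMAINS)
spam_re = re.compile(pattern)

def is_spam_referral(hostname):
    """Return True if the referral hostname matches any listed spam domain."""
    return spam_re.search(hostname) is not None
```

Printing pattern shows the string to copy into the Filter Pattern field.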
Your domain may be the reason, if it was previously used for another site or someone else used it earlier. Look at the backlinks for your domain. It's only my humble opinion.

How to Track Connections On a Server Accurately

I have a dedicated server hosting a website of mine and I have about 10% of my traffic unaccounted for.
The path of the clicks is as follows
Ads on Facebook ==> My Website
I have Google Analytics script on My Website (on the bottom) and it should fire off whenever a person lands on the page.
The problem is that if I have 4000 clicks on the Ad (tracked by Facebook), Google Analytics tells me I have about 3600 people landing on My Website.
I also invested in real-time tracking software like Clicky and it gives me similar results to Google Analytics. (just in case GA is not accurate)
So I have narrowed it down to 3 scenarios:
1) The Ad clicks aren't being tracked properly by Facebook (I have made sure this is not the problem)
2) The page is taking too long for some people to load and they are hitting the back button before Google Analytics can be triggered.
3) Some connections are dropping between the Ad and My Website.
Can anyone recommend a way I can make sure 2 and 3 aren't happening? And if they are, how would I fix them?
I'm going to make an assumption that you're using Apache. It should be possible to parse the Apache logs to extract connections from unique IP addresses. Hopefully the URL requested by the client will contain some sort of path indicating that it was directed from Facebook.
Link:
Get unique visitors from apache log file
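
A minimal sketch of that log-parsing idea, assuming the Apache combined log format and assuming the ad's landing URL carries a marker such as utm_source=facebook (a hypothetical parameter; use whatever your ad URLs actually contain). The function name unique_facebook_ips is also just illustrative.

```python
import re

# Combined log format:
# ip ident user [time] "METHOD path protocol" status size "referer" ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)"'
)

def unique_facebook_ips(lines, marker="utm_source=facebook"):
    """Collect distinct client IPs whose request looks like it came from Facebook.

    A request qualifies if the requested path contains `marker` (hypothetical)
    or the Referer header mentions facebook.com.
    """
    ips = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        if marker in m.group("path") or "facebook.com" in m.group("referer"):
            ips.add(m.group("ip"))
    return ips
```

Passing an open log file to unique_facebook_ips and taking len() of the result gives a server-side count to compare with the Facebook and Google Analytics numbers. Several visitors behind one shared IP are counted once, so treat it as a lower bound.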

How fast does Google take to crawl new page, and can we influence Google's crawler?

I want to submit my site to Google. How much time does it take to crawl a new post on the website?
Also, is there a way to feed this post to Google crawler as soon as a post is created?
Google has three modes of entering a website into its results - discover, crawl, index.
In order to 'discover' your site, it must be made aware of its existence, normally through back-links. If your site is brand new you can use the submit URL form, but this isn't really a trusted method. You're better off signing up for a Google Webmaster Tools account and submitting your site. An additional step is to submit an XML sitemap of your site. If you are publishing to your site in a blogging/posting way, you can always consider PubSubHubbub.
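
As an illustration of the XML sitemap step, here is a small sketch that builds a sitemap in the sitemaps.org format using only the Python standard library. The function name build_sitemap and the example URLs are assumptions; a blog engine would normally regenerate this file whenever a post is published.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified is a date string such as "2024-01-15".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap([
    ("https://example.com/", "2024-01-10"),
    ("https://example.com/new-post", "2024-01-15"),
])
```

The resulting string can be written to sitemap.xml at the site root and submitted through the webmaster account.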
From there on, crawl frequency is normally based on site popularity (as measured by ye olde PageRank). Depth of crawl (crawl-budget) is also determined by PR.
There are a couple ways to help "feed" the Google Crawler a URL.
The first way is to go here and submit a URL ---> www.google.com/webmasters/tools/submit-url/
The second way is to go to your Google Webmaster Tools account, click "Fetch as GoogleBot",
and then enter the URL you want to add:
http://i.stack.imgur.com/Q3Iva.png
The URL will then appear similar to this:
http:\\example.site Web Success URL submitted to index 1/22/12 2:51 AM
As for how long it takes for a question on here to appear on Google, many factors go into that.
If the owners of the site use Google Webmasters Tools, the following setting is available:
http://i.stack.imgur.com/RqvOi.png
For a fast crawl, submit your XML sitemap in Google Webmaster Tools and manually request crawling and indexing of your page URLs through the Google Webmaster fetch feature.
I used this crawl-and-index method myself, and it gave me the best results.
This is a great resource that really breaks down all the factors that affect a crawl budget and how to optimize your website to increase it. Cleaning up your broken links and removing outdated content, for example, can work wonders. https://prerender.io/crawl-budget-seo/ 
I acknowledged the error in my response by adding a comment to the original question a long time ago. Now I am updating this post to keep future readers from being misguided as I was. Please see the notes from other users below; they are correct. Google does not make use of the revisit-after meta tag. I am still keeping the original response text here so that anyone else looking for a similar answer will find it along with this note confirming that this meta tag IS NOT VALID! Hope this helps someone.
You may use HTML meta tag as follows:
<meta name="revisit-after" content="1 day">
Adjust time period as necessary. There is no guarantee that robots will return in given time frame but this is how you are telling robots about how often a given page is likely to change.
The Revisit Meta Tag is used to tell search engines when to come back next.

How do I block web scraping without blocking well-behaved bots?

I'm building an e-commerce website with a large database of products. Of course, it is nice when Google indexes all the products of the website. But what if some competitor wants to scrape the website and get all the images and product descriptions?
I was looking at some websites with similar lists of products, and they place a CAPTCHA so that "only humans" can read the list of products. The drawback is that the list is then invisible to Google, Yahoo and other "well-behaved" bots.
You can discover the IP addresses that Google and others are using by checking visitor IPs with whois (on the command line or on a web site). Then, once you've accumulated a stash of legit search engines, allow them into your product list without the CAPTCHA.
If you're worried about competitors using your text or images, how about a watermark or customized text?
Let them take your images and you'd have your logo on their site!
Since a potential screen-scraping application can spoof the user agent and HTTP referrer (for images) in the header and use a time schedule similar to that of a human browsing, it is not possible to completely stop professional scrapers. But you can check for these things nevertheless and prevent casual scraping.
I personally find Captchas annoying for anything other than signing up on a site.
One technique you could try is the "honey pot" method: it can be done either by mining log files or via some simple scripting.
The basic process is to build your own "blacklist" of scraper IPs by looking for IP addresses that view 2+ unrelated products in a very short period of time. Chances are these IPs belong to machines. You can then do a reverse lookup on them to determine whether they are nice (like GoogleBot or Slurp) or bad.
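
The blacklist-building step could be sketched like this. It assumes you can extract (timestamp, IP, product ID) tuples from your logs, sorted by time, and it simplifies "unrelated products" to "distinct product IDs". The thresholds and the name find_suspect_ips are illustrative assumptions to tune against your own traffic.

```python
from collections import defaultdict, deque

def find_suspect_ips(events, min_products=2, window_seconds=1.0):
    """Return IPs that viewed at least `min_products` distinct products
    within any `window_seconds` interval.

    events: iterable of (timestamp_seconds, ip, product_id), sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> recent (timestamp, product_id) views
    suspects = set()
    for ts, ip, product in events:
        views = recent[ip]
        views.append((ts, product))
        # Drop views that have fallen out of the time window.
        while views and ts - views[0][0] > window_seconds:
            views.popleft()
        if len({p for _, p in views}) >= min_products:
            suspects.add(ip)
    return suspects
```

The returned IPs are only candidates: the reverse lookup mentioned above should still be applied before blocking anything, so that legitimate crawlers are not caught.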
Blocking web scrapers is not easy, and avoiding false positives is even harder.
Anyway, you can add some net ranges to a whitelist and not serve any captcha to them.
All the well-known crawlers (Bing, Googlebot, Yahoo, etc.) always use specific net ranges when crawling, and all those IP addresses resolve to specific reverse lookups.
Few examples:
Google IP 66.249.65.32 resolves to crawl-66-249-65-32.googlebot.com
Bing IP 157.55.39.139 resolves to msnbot-157-55-39-139.search.msn.com
Yahoo IP 74.6.254.109 resolves to h049.crawl.yahoo.net
So let's say that '*.googlebot.com', '*.search.msn.com' and '*.crawl.yahoo.net' addresses should be whitelisted.
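
A sketch of that whitelist check in Python follows. The suffix list comes from the examples above. The forward lookup that confirms the hostname resolves back to the same IP is an extra step not described here, added because a reverse DNS record alone can be set by whoever controls the IP range. The function names are assumptions.

```python
import socket

# Hostname suffixes taken from the reverse lookups listed above.
CRAWLER_SUFFIXES = (".googlebot.com", ".search.msn.com", ".crawl.yahoo.net")

def hostname_is_whitelisted(hostname):
    """True if the hostname ends with one of the known crawler suffixes."""
    return hostname.lower().rstrip(".").endswith(CRAWLER_SUFFIXES)

def is_verified_crawler(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the suffix, then forward-confirm.

    `reverse` and `forward` default to the real DNS calls and can be
    replaced with stubs for testing.
    """
    try:
        hostname = reverse(ip)[0]
        if not hostname_is_whitelisted(hostname):
            return False
        # The hostname must resolve back to the IP we started from.
        return ip in forward(hostname)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

DNS lookups are slow, so in practice the result per IP would be cached rather than computed on every request.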
There are plenty of whitelists available on the internet that you can implement.
That said, I don't believe a captcha is a solution against advanced scrapers, since services such as deathbycaptcha.com or 2captcha.com promise to solve any kind of captcha within seconds.
Please have a look into our wiki http://www.scrapesentry.com/scraping-wiki/ we wrote many articles on how to prevent, detect and block web-scrapers.
Perhaps I over-simplify, but if your concern is about server performance, then providing an API would lessen the need for scrapers and save you bandwidth and processor time.
Other thoughts listed here:
http://blog.screen-scraper.com/2009/08/17/further-thoughts-on-hindering-screen-scraping/
