How does tracking of web traffic sources work?

Maybe a stupid question, but I can't find any answer to it on the web.
In Google Analytics it is possible to check the origin of a connection to our website. My question is: how can Google track the origin of those connections?

If there is info in document.referrer (for the JavaScript tracker; with the Measurement Protocol you'd have to pass the referrer as a parameter), Google identifies the source as a referral, unless it is configured (in the defaults or via custom settings) as a search engine (which is really just a referrer with a known search parameter). Also, via the settings you can exclude URLs from the referrer reports so they will appear as direct traffic.
If there are campaign parameters, Google uses those (or else a Google Click ID (gclid) from autotagging in AdWords, which serves a similar purpose). If campaign parameters or the gclid are stripped out (e.g. by redirects), AdWords ad clicks will be reported as organic search.
If there is no referrer and no campaign parameters/gclid (i.e. a direct type-in or a bookmark), Google will identify the source as a direct hit, unless you have clicked an AdWords ad before. In that case the acquisition report will report the source as CPC (cost per click); as Google puts it, they will use the last known marketing channel as the source, and direct is not a marketing channel according to Google. However, the multi-channel reports will identify those more correctly as direct visits (which is why multi-channel and acquisition reports usually do not quite match).
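As a rough illustration of that precedence (campaign parameters first, then referrer, then direct), here is a minimal sketch. This is not Google's actual code; the search-engine list and return shape are assumptions made for the example.

```javascript
// Hypothetical sketch of the classification order described above --
// not Google's implementation. The engine list is illustrative.
const KNOWN_SEARCH_ENGINES = [
  { host: 'www.google.com', queryParam: 'q' },
  { host: 'www.bing.com', queryParam: 'q' },
];

function classifyTrafficSource(pageUrl, referrer) {
  const params = new URL(pageUrl).searchParams;

  // 1. Campaign parameters (or a gclid from AdWords autotagging) win.
  if (params.get('utm_source')) {
    return {
      source: params.get('utm_source'),
      medium: params.get('utm_medium') || '(none)',
    };
  }
  if (params.get('gclid')) {
    return { source: 'google', medium: 'cpc' };
  }

  // 2. Otherwise fall back to the referrer; a known search engine
  //    (a referrer with a known search parameter) counts as organic.
  if (referrer) {
    const ref = new URL(referrer);
    const engine = KNOWN_SEARCH_ENGINES.find((e) => e.host === ref.hostname);
    if (engine) {
      return {
        source: ref.hostname,
        medium: 'organic',
        keyword: ref.searchParams.get(engine.queryParam),
      };
    }
    return { source: ref.hostname, medium: 'referral' };
  }

  // 3. No referrer, no campaign data: a direct hit.
  return { source: '(direct)', medium: '(none)' };
}

// Example:
// classifyTrafficSource(
//   'https://example.com/?utm_source=newsletter&utm_medium=email',
//   'https://www.google.com/search?q=example')
// -> { source: 'newsletter', medium: 'email' }
```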

Prevent Microsoft Safe Links Scanning

So today a client of mine sent out a marketing newsletter to around 140k clients that included a link to our web app.
What happened next was that my web app experienced a flood of traffic (over 9000 requests in 15 minutes) from Microsoft-owned IP addresses in the 40.94.*.* range, all requesting that specific page on my site. This took the app down for all my clients until I managed to restart it.
It seems the scan took place regardless of whether a user clicked on the link, as there are no other IP addresses in the request logs for the same URL during this period.
So my question is: was this Microsoft pre-emptively scanning the link as it was delivered to newsletter subscribers? Or does the scan only happen when the link is clicked? I've found conflicting information on this, and as mentioned I see no requests from other IP addresses during this period.
And secondly, how can I stop this from happening in the future? Is my only option to blacklist this IP range, or are there other strategies?
For anyone struggling with something similar, I can confirm that Microsoft pre-emptively scans the links inside a received email before it lands in the recipient's inbox.
The effect of this is that if a huge newsletter is sent to hundreds of thousands of recipients, Microsoft effectively triggers a wave of traffic to your server.
It would appear the only solutions are to blacklist their range of IPs or to ensure you have some throttling mechanism in place.
One solution, as mentioned in the other answer, is to block the range of IP addresses that belong to Microsoft Safe Links, in order to prevent the scans from reaching the website.
Another solution might be to use a JS challenge such as the one available in Cloudflare. With such a solution, each visitor first passes through a page that verifies they are using a real browser, and only then are they redirected to the target website.
Such a JS challenge can be enabled only for visitors arriving from links in the email, so anyone using a browser to access the website directly won't be affected.
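If you prefer to handle it at the application layer instead of at the CDN, here is a minimal sketch of the blocking approach, assuming an Express app and that the scanner traffic really does come from the 40.94.*.* range seen in the logs (verify against Microsoft's published IP ranges before relying on this):

```javascript
// Minimal Express sketch: answer the Safe Links scanner cheaply
// instead of rendering the full page. The IP prefix and route are
// assumptions taken from the question above -- adjust to your setup.
const express = require('express');
const app = express();

app.use((req, res, next) => {
  const ip = req.ip || '';
  // Express may report IPv4 addresses as IPv6-mapped (::ffff:40.94...).
  if (ip.startsWith('40.94.') || ip.startsWith('::ffff:40.94.')) {
    return res.status(200).send('OK'); // cheap response, no page render
  }
  next();
});

app.get('/newsletter-landing', (req, res) => {
  res.send('Real landing page'); // placeholder for the real handler
});

app.listen(3000);
```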

Custom title and description of Physical Web notification

Reply from: https://github.com/google/physical-web/issues/595
For example, I am transmitting http://www.starbucks.com as the URL.
My phone looks for Physical Web pages and, say, it detects www.starbucks.com and shows it to me in the Physical Web list in my Chrome.
As a user, this is how it will appear to me presently
Now, this does not convey much information to me. The text "Order while you wait" has been taken from the meta description of the page (as far as I know) and the title "Starbucks" has been taken from the title tag.
Now, say I could custom-define these parameters, for example like this.
Here, I custom-defined the text of the same Starbucks URL that my phone's Physical Web scan found.
This adds relevancy to the URL: a user gets a clear message, and it allows stores to convey an effective contextual message.
Is this possible when you use ReactJS and JSX? Because you have only one HTML file, the notification always shows the default title from that HTML; even if you change it with document.title = "other title", the notification shows the original title and not the new one.
The text shown in the Physical Web notification is strictly taken from the target website, and you can influence it only there.
Chrome is actually not analyzing the target website. It's a Google server (the Physical Web Service) that analyzes it and provides the information to Chrome. You seem to need to change the title instantly and often, so be careful about server-side caching of already-resolved pages.
The website analysis does not execute any JavaScript; it takes only what is written directly in the HTML, so the trick with document.title won't work.
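In other words, the only thing the notification can show is what sits in the static markup the crawler fetches. Using the Starbucks example above, that means something like:

```html
<!-- The Physical Web Service reads these static tags; anything set
     later via JavaScript (e.g. document.title) is never seen. -->
<head>
  <title>Starbucks</title>
  <meta name="description" content="Order while you wait">
</head>
```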
But there is a different way to get the notifications: take a look at Google Nearby Notifications. In summary, it works based on Eddystone-UID. You register your UID with the service and configure it to redirect to the target website, but in the configuration you can specify the title and description. See the linked page for the details.

How does Mixpanel's Search Keyword work?

I'm curious how Mixpanel tracks which Search Keywords an event is affiliated with. Is this from organic search (vs. paid search ads)?
If yes, how do they do it? At a glance, I guess organic search works this way:
The search result link goes to a proxy link with some query parameters which contain info about the (encrypted) search term and the real destination link.
The proxy then redirects to the real destination link.
Google Analytics knows the organic search keyword used in a session because Google intercepts it at that middle point. I'm not sure if there's any way for someone outside of Google (including Mixpanel) to intercept that info. Right? (Correct me if I'm wrong.)
If there is a way for the destination website to know the organic search keyword, can I be enlightened on the method?
I don't think this is coming from organic search or paid ads, for a couple of reasons:
Most organic traffic is now over HTTPS, which makes it hard to get the search parameters. Google Analytics shows this data through the Webmaster Tools console, which is able to grab keyword data in a different way (I assume through the Google backend and not the URL itself). Otherwise, you are stuck with the "Not Provided" issue in Google Analytics.
Mixpanel only captures the default UTM parameters: utm_campaign, utm_source, utm_term, utm_medium and utm_content. Mixpanel also labels these properties as expected: UTM Medium, UTM Source, etc.
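For reference, a minimal illustrative sketch of how those default UTM parameters are typically picked up on the client, assuming the standard mixpanel-js snippet is loaded (mixpanel.register attaches them as super properties sent with every subsequent event; the parsing around it is just an assumption about your setup):

```javascript
// Illustrative sketch: read the default UTM parameters from the
// landing URL and attach them to all subsequent Mixpanel events.
const params = new URLSearchParams(window.location.search);
const utm = {};
['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content']
  .forEach((key) => {
    const value = params.get(key);
    if (value) utm[key] = value;
  });

if (Object.keys(utm).length > 0) {
  mixpanel.register(utm); // stored as super properties
}
```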
I can't tell from your screenshot, but this might be a custom property that your Mixpanel setup is setting, perhaps from an internal search engine? Or perhaps you're grabbing a custom URL query parameter?
Can you provide more information as to how this event is being captured?

How to Track Connections On a Server Accurately

I have a dedicated server hosting a website of mine and I have about 10% of my traffic unaccounted for.
The path of the clicks is as follows:
Ads on Facebook ==> My Website
I have the Google Analytics script on My Website (at the bottom) and it should fire whenever a person lands on the page.
The problem is that if I have 4000 clicks on the ad (tracked by Facebook), Google Analytics tells me I have about 3600 people landing on My Website.
I also invested in real-time tracking software like Clicky, and it gives me results similar to Google Analytics (just in case GA was inaccurate).
So I have narrowed it down to 3 scenarios:
1) The Ad clicks aren't being tracked properly by Facebook (I have made sure this is not the problem)
2) The page is taking too long for some people to load and they are hitting the back button before Google Analytics can be triggered.
3) Some connections are dropping between the Ad and My Website.
Can anyone recommend a way to make sure 2) and 3) aren't happening? And if they are, how would I fix them?
I'm going to make an assumption that you're using Apache. It should be possible to parse the Apache logs to extract connections from unique IP addresses. Hopefully the URL requested by the client will contain some sort of path indicating that it was directed from Facebook.
Link: Get unique visitors from apache log file
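As a starting point, here is a rough sketch of that log parse, assuming the Apache combined log format and that the ad URLs carry some Facebook marker (e.g. a utm_source=facebook parameter or a facebook.com referrer). The log path and the marker are assumptions:

```javascript
// Rough sketch: count unique client IPs in an Apache access log whose
// line mentions Facebook (in the request URL or the referrer field).
const fs = require('fs');

const LOG_PATH = '/var/log/apache2/access.log'; // adjust to your setup
const lines = fs.readFileSync(LOG_PATH, 'utf8').split('\n');
const uniqueIps = new Set();

for (const line of lines) {
  if (!line.toLowerCase().includes('facebook')) continue;
  const ip = line.split(' ')[0]; // first field is the client IP
  if (ip) uniqueIps.add(ip);
}

console.log(`Unique IPs arriving via Facebook: ${uniqueIps.size}`);
```

Comparing this count against Facebook's click count shows how many clicks actually reached the server, which helps separate scenario 3) (dropped connections) from scenario 2) (GA firing too late).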

Why does Google Analytics show fewer visits than One&One stats?

Comparing Google Analytics results to One&One's monthly hosting statistics shows a huge discrepancy.
For last month:
Google shows 1046 visits.
One&One stats show 15304 unique visits.
The Google code is in the footer, which appears on every page.
I'm aware GA only works with JS enabled, but should I really assume there are that many non-JS users?
Google Analytics is a good indicator of how many humans are visiting your website.
Here are some things to check:
how many bots are in your monthly stats? You can usually find something that says User-Agent in your stats page. GoogleBot, Slurp, msnbot & others will be visiting every page on your site.
that you've read Google Analytics' definition of a visit.
that you have read what your statistics provider means by unique visit. Does that mean unique visitor, page view or something else?
Raw hits on servers can be misleading for a number of reasons:
If you have external style sheets, JavaScript files, etc., each request could be counted as a hit in the web server log.
RSS feed readers will periodically update without being asked to by a human
Check the page views in Google Analytics - it's possible that 1&1 is tracking unique page views instead of the actual visits.
Google Analytics works for almost all users (I believe fewer than 5% have JS disabled). I have had the same discrepancy; in my case the difference disappeared once I took bots into account (server-side statistics often count them, since they produce HTTP requests). You probably have the same "problem".
Neither set of stats is wrong; they just count different things. Google Analytics is the more "accurate" one, i.e. the numbers you want to look at. The hosting stats, which look only at raw HTTP requests, often without filtering, are less interesting.
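To see how much of the gap is bots, you can split the raw log by User-Agent; here is a rough sketch (the bot markers are illustrative, not exhaustive, and the log path is an assumption):

```javascript
// Rough sketch: split log hits into bot vs. probably-human by
// User-Agent substring matching.
const fs = require('fs');

const BOT_MARKERS = ['googlebot', 'bingbot', 'slurp', 'msnbot', 'crawler', 'spider'];

const lines = fs.readFileSync('/var/log/apache2/access.log', 'utf8').split('\n');
let bots = 0;
let others = 0;

for (const line of lines) {
  if (!line.trim()) continue;
  const lower = line.toLowerCase();
  if (BOT_MARKERS.some((marker) => lower.includes(marker))) bots++;
  else others++;
}

console.log(`bot hits: ${bots}, other hits: ${others}`);
```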
Blogger, and probably other sites, serve a different page template or skin to mobile visitors. In my case, that template didn't contain the Google Analytics snippet, and so those hits went uncounted until I noticed and fixed it.
