How to Track Connections On a Server Accurately - linux

I have a dedicated server hosting a website of mine and I have about 10% of my traffic unaccounted for.
The path of the clicks is as follows
Ads on Facebok ==> My Website
I have Google Analytics script on My Website (on the bottom) and it should fire off whenever a person lands on the page.
The problems is that if I have 4000 clicks on the Ad (tracked by Facebook), Google Analytics tells me I have about 3600 people landing on My Website.
I also invested in real-time tracking software like Clicky and it gives me similar results to Google Analytics. (just in case GA is not accurate)
So I have narrowed it down to 3 scenarios:
1) The Ad clicks aren't being tracked properly by Facebook (I have made sure this is not the problem)
2) The page is taking too long for some people to load and they are hitting the back button before Google Analytics can be triggered.
3) Some connection are dropping from the Ad to My Website.
Can anyone recommend a way I can make sure 2 and 3 aren't happening? and if they are
how would I fix them.

I'm going to make an assumption that you're using Apache. It should be possible to parse the Apache logs to extract connections from unique IP addresses. Hopefully the URL requested by the client will contain some sort of path indicating that it was directed from Facebook.
Link:
Get unique visitors from apache log file

Related

Can you change the phone number on a website based on the source of the traffic?

We are going to start running various ad campaigns and want to track which calls come from which source but our landing page is the same main website. Is there a way to change the phone number displayed on the page based on the traffic source a visitor comes from?
You can e.g. do with GeoIP from https://www.maxmind.com

Prevent Microsoft Safe Links Scanning

So today a client of mine sent out a marketing newsletter to around 140k clients that included a link to our web app.
What happened next was my web app experienced a flood of traffic (over 9000 requests in 15 minutes) from Microsoft-owned IP addresses in the range 40.94././ requesting that specific page on my site. This took the app down for all my clients until I managed to restart it.
It seems like the scan took place regardless of whether a user clicked on the link or not, as there are no other IP addresses in the request logs for the same url during this period.
So my question is, was this Microsoft pre-emptively scanning that link as it was delivered to newsletter subscribers? Or does the scan only happen when the link is clicked - I've found conflicting information on this, and as mentioned I see no other IP address requests during this period.
And secondly, how can I stop this from happening in the future - is my only option to blacklist this IP range, or are there other strategies?
So for anyone struggling with something similar I can confirm that Microsoft pre-emptively scans the links inside a received email before it lands in the recipients inbox.
The effect of this is that if a huge newsletter is sent to hundreds of thousands of recipients, Microsoft effectively triggers a wave of traffic to your server.
It would appear the only solution is to black-list their range of IPs, or ensure you have some throttling mechanism in place.
One of the solution as mentioned in the other answer is to block the range of ip addresses that belong to Microsoft Safelink in order to prevent the scans from accessing the website.
Other solution might be to use JS Challenge such as this available in Cloudflare. With such a solution each user has to go through a website that first verifies if he/she is using a real browser and only if that is the case he/she is redirected to the target website.
Such a JS Challenge can be enabled only for those accessing website from links in the email so that anyone using browser to directly access a website won't be affected

Blacklisting on Google App Engine - users or devices (and not just IP addresses)

I have couple Android apps on PlayStore, which use In-App purchases. I use Google App Engine for my backend. I see some users calling the APIs abnormally/repeatedly (may be to reverse engineer or hack?). I can figure out the IP address, Gmail ID, etc. How to prevent these people from accessing my API?
One suggestion is to use dos.xml
But these morons seem to constantly change the IP addresses, so it is painful to keep updating this list.
Is there a way in App Engine to black list users? or computers/devices?
If we know the google(Gmail) Ids of these ba*t*r*s, how/where do we report those? This page seems to be the right place to start, but it is not clear where to send email.
This page seems be more appropriate for vulnerabilities, but this is not such a case.
"Viewing top users in the Administration Console" section in DoS page says I should see a table of IP addresses which are using the API frequently. But I dont see such table in Admin console. Do I need to be a paid (Google App Engine) user?
Any help is greatly appreciated.
Yes, GAE allows for a blacklist, via dos.xml (dos.yaml for Python or PHP). If you don't want to have to keep updating the IP addresses, you may just have to check the user id, and serve them some message. But, that requires actually servicing the request, to check the id, etc. So, if it is a true DOS attack, it will succeed, as you have to still service the request. Using dos.xml cuts that off at the backend, so would be the best way to go.
I suggest a script to log the IP addresses in real time for those you want to ban, to make updating dos.xml less painful.

How tracking of the web traffic source works?

May be a stupid question, but I can't find any answer to this question on the web.
In Google analytics it is possible to check the origin a connection to our website. My question, how Google can track the origin of those connections?
If there is info in document.referer (for the javascript tracker, with the measurement protocol you'd have to pass a referer as parameter) Google identifies the source as referrer, unless it is configured (in the defaults or per custom settings) as a search engine (which is really just a referrer with a known search parameter). Also via the settings you can exclude urls from the referrer reports so they will appear as direct traffic.
If there are campaign parameters Google uses those (or else a Google click id (gclid) from autotagging in adwords, which serves a similar purpose). If campaign parameters or gclid are stripped out (e.g. by redirects) adwords ad clicks will be reported as organic search.
If there is no referrer and no campaign parameters/gclid (i.e. a direct type in or a bookmark) Google will identify the source as a direct hit, unless you have clicked an adwords ad before. In that case the aquisition report will report the source as CPC (click per cost) in the acquisition report (as Google puts it, they will use the last known marketing channel as source. Direct is not a marketing channel according to Google). However the multichannel reports will identify those more correctly as direct visits (which is why multichannel and acquisition reports usually do not quite match).

Why does Google Analytics show less visits than One&One stats?

Comparing google analytics results to one&one hosting monthly statics shows a huge discrepancy.
For last month:
Google shows 1046 visits.
One&one stats show 15304 unique visits.
The google code is in the footer which appears on every page.
I'm aware ga only works with js enabled but to assume that many non js users???
Google Analytics is a good indicator of how many humans are visiting your website.
Here are some things to check:
how many bots are in your monthly stats? You can usually find something that says User-Agent in your stats page. GoogleBot, Slurp, msnbot & others will be visiting every page on your site.
that you've read Google Analytics' definition of a visit.
that you have read what your statistics provider means by unique visit. Does that mean unique visitor, page view or something else?
Raw hits on servers can be misleading for a number of reasons..
If you have external style sheets & JavaScript etc, they could be counted as a hit in the webserver log
RSS feed readers will periodically update without being asked to by a human
Check the page views in Google Analytics - it's possible that 1&1 is tracking unique page views instead of the actual visits.
Google Analytics works for almost all users (I believe less than 5% have JS disabled). I have had the same discrepancy, in my case the difference was zeroed out when I took into account the bots (which server-side statistics often take into account, as they produce http-requests). You probably have the same "problem".
Neither stats are wrong, they just count different things. Google Analytics is the more "accurate", i.e. the numbers you want to take a look at. The hosting stats, which look only at http requests, often without filtering, are less interesting.
Blogger, and probably other sites, serve a different page template or skin to mobile visitors. In my case, that template didn't contain the google analytics snippet of code and so those hits were uncounted, until I noticed and fixed it.

Resources