How do I track keywords entered in Google Search using Google Tag Manager?

I want to know what keywords bring users to our website. The result should be that, every time a user clicks through to the company's website, the page URL, the timestamp and the keywords entered in the search are recorded.
I'm not really much of a coder, but I do understand the basics of Google Tag Manager. So I'd appreciate some solutions that can allow me to implement this in GTM's interface itself.
Thanks!

You don't track them. Well, that is unless you can deploy your GTM on Google's search result pages. Which you're extremely unlikely to be able to do.
HTTPS prevents query parameters from being passed along in the referrer, which is the core reason the keywords are no longer available.
You still can, technically, track Google search keywords for the extremely rare users who manage to use Google via plain HTTP, but again, there's nothing to do in GTM: GA will track those automatically with its legacy keyword tracking.
Finally, you can use Google Search Console, where Google reports which keywords were used to reach your site. That information, however, is so heavily sampled that it can't be joined to any of the GA data at the hit level. You can link GSC to GA, but that only gives GA a separate report fed from GSC; there is no real data join.
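If you want that keyword data outside the GSC interface, the Search Analytics part of the Search Console API can export it. Here's a minimal sketch in Python, assuming the google-api-python-client package and OAuth credentials (creds) with access to the verified property; the site URL and date range below are placeholders:

    # Sketch: pull search queries for a verified site from the Search Console API.
    # Assumes creds holds OAuth2 credentials with access to the property.
    from googleapiclient.discovery import build

    def top_queries(creds, site_url="https://www.example.com/"):  # placeholder site
        service = build("searchconsole", "v1", credentials=creds)
        body = {
            "startDate": "2024-01-01",   # placeholder date range
            "endDate": "2024-01-31",
            "dimensions": ["query", "page"],
            "rowLimit": 100,
        }
        response = service.searchanalytics().query(siteUrl=site_url, body=body).execute()
        for row in response.get("rows", []):
            query, page = row["keys"]
            print(query, page, row["clicks"], row["impressions"])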

Related

"Sandbox" Google Analytics for security

By including Google Analytics in a website (specifically the JavaScript version), isn't it true that you are giving Google complete access to all your cookies and site information? (i.e., it could be a security hole).
Can this be mitigated by putting Google in an iframe that is sandboxed? Or maybe by only passing Google the necessary information (i.e., browser type, screen resolution, etc.)?
How can someone get the most out of Google Analytics without leaving the entire site open?
Or perhaps passing the data through my own server and then uploading it to Google?
You can create a scriptless implementation via the Measurement Protocol (for Universal Analytics enabled properties). This not only avoids any security issues with the script (although I'd rather trust Google on that), it also means you have more control over what data is submitted to Google's servers.
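For illustration, a Measurement Protocol pageview hit is just one HTTP POST made from your own server. A minimal sketch in Python using the requests library; the property ID is a placeholder and you decide which fields, if any, get sent:

    # Sketch: send a pageview to Universal Analytics server-side via the
    # Measurement Protocol, so no Google script runs in the visitor's browser.
    import uuid
    import requests

    def send_pageview(path, title, tid="UA-XXXXXX-Y"):  # placeholder UA property ID
        payload = {
            "v": "1",                  # protocol version
            "tid": tid,                # the UA property the hit is credited to
            "cid": str(uuid.uuid4()),  # anonymous client ID you generate and control
            "t": "pageview",
            "dp": path,                # document path
            "dt": title,               # document title
        }
        requests.post("https://www.google-analytics.com/collect", data=payload, timeout=5)

    send_pageview("/pricing", "Pricing")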
A script run on your site can read cookies on your site, yes. And that data can be sent back to google, yes. That is why you shouldn't store sensitive information in cookies. You shouldn't do this even if you don't use google analytics. Even if you don't use ANY other code except your own. Browsers and browser addons can also read that stuff and you definitely cannot control that. Again, never store sensitive information in cookies.
As far as access to "site information" goes: JavaScript can be used to read the content of your pages, know the URLs of pages, and so on; in other words, anything you serve up on a web page. Anything that is not behind a wall (e.g. a login barrier) is surely up for grabs. But crawlers will look at that stuff anyway. Stuff behind walls can still be grabbed automatically, depending on what a bot actually has to do to get past those walls (e.g. simple registration/login barriers are pretty easy to get past).
This is also why you should never display sensitive information even in content of your site. E.g. credit card numbers, passwords, etc.. that's why virtually every site you go to that has even remotely sensitive information always shows a mask (e.g. ** ) instead of actual values.
Google Analytics does not actively do these things, but you're right: there's nothing stopping them from doing it, and you've already given them the right to do it by using their script.
And you are right: the safest way to control what Google can actually see is to send server-side requests to them, and also to put all your content behind barriers that cannot be easily crawled or scraped, the strongest barrier being one that involves having to pay for access. People are ingenious about building crawlers and bots that get past all sorts of forms and "human" checks, and you're fighting a losing battle on that count, but nothing stops a bot faster than requiring someone to give you money to access your stuff. Of course, this also means you'd have to make everybody pay for access...
Anyways.. if you're that paranoid about this stuff, why use GA at all? Use something you host yourself (e.g. Piwik). This won't solve for crawlers/bots, obviously, but it will solve for worries about GA grabbing more than you want it to.

Search engine components

I'm a middle school student learning computer programming, and I just have some questions about search engines like Google and Yahoo.
As far as I know, these search engines consist of:
Search algorithm & code
(Example: search.py file that accepts search query from the web interface and returns the search results)
Web interface for querying and showing result
Web crawler
What I am confused about is the Web crawler part.
Do Google's and Yahoo's web crawlers immediately search through every single webpage existing on the WWW? Or do they:
First download all the existing webpages on the WWW, save them on their huge servers, and then search through these saved pages?
If the latter is the case, then wouldn't the results appearing on the Google search results page be outdated, since I suppose searching through all the webpages on the WWW would take a tremendous amount of time?
PS. One more question: how exactly does a web crawler retrieve all the web pages existing on the WWW? For example, does it search through all the possible web addresses, like www.a.com, www.b.com, www.c.com, and so on? (Although I know this can't be true.)
Or is there some way to get access to all the existing webpages on the world wide web? (Sorry for asking such a silly question.)
Thanks!!
The crawlers search through pages, download them and save (parts of) them for later processing. So yes, you are right that the results search engines return can easily be outdated, and a couple of years ago they really were quite outdated. Only relatively recently did Google and others start doing more real-time searching, by collaborating with large content providers (such as Twitter) to get data from them directly and frequently, though they took real-time search offline again in July 2011. Otherwise they take note of, for example, how often a web page changes, so they know which pages to crawl more often than others. And they have special systems for this, such as the Caffeine web indexing system. See also their blog post Giving you fresher, more recent search results.
So what happens is:
Crawlers retrieve pages
Backend servers process them
Parse text, tokenize it, index it for full-text search (see the sketch after this list)
Extract links
Extract metadata such as schema.org for rich snippets
Later they do additional computation based on the extracted data, such as
Page rank computation
In parallel they can be doing lots of other stuff such as
Entity extraction for Knowledge graph information
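As a toy illustration of the parse/tokenize/index step mentioned above, here is a minimal inverted index in Python; real engines shard this across many machines and do far more normalization, and the two documents below are made up:

    # Toy inverted index: map each token to the set of document IDs containing it.
    from collections import defaultdict
    import re

    docs = {  # made-up crawled pages
        1: "Google crawls the web and indexes pages",
        2: "Yahoo outsources crawling to Bing",
    }

    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)

    def search(query):
        """Return IDs of documents containing every token of the query."""
        hits = [index.get(t, set()) for t in re.findall(r"[a-z0-9]+", query.lower())]
        return set.intersection(*hits) if hits else set()

    print(search("crawling bing"))  # -> {2}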
Discovering what pages to crawl happens simply by starting with a page, following its links to other pages, then following their links, and so on. In addition to that, they have other ways of learning about new websites: for example, if people use their public DNS server, they will learn about pages those people visit; the same goes for links shared on G+, Twitter, etc.
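A very stripped-down sketch of that link-following loop in Python; there is no politeness, robots.txt handling or proper HTML parsing here, and the seed URL is a placeholder:

    # Toy breadth-first crawler: start from a seed page and keep following links.
    from collections import deque
    from urllib.parse import urljoin
    import re
    import requests

    def crawl(seed, max_pages=50):
        seen, queue, pages = {seed}, deque([seed]), {}
        while queue and len(pages) < max_pages:
            url = queue.popleft()
            try:
                html = requests.get(url, timeout=5).text
            except requests.RequestException:
                continue
            pages[url] = html  # hand the page off to the indexing pipeline
            for href in re.findall(r'href="([^"#]+)"', html):
                link = urljoin(url, href)
                if link.startswith("http") and link not in seen:
                    seen.add(link)
                    queue.append(link)
        return pages

    crawl("https://www.example.com/")  # placeholder seed URL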
There is no way of knowing what all the existing web pages are. Some may not be linked from anywhere, and if no one publicly shares a link to them (and doesn't use their DNS, etc.), the engines have no way of knowing those pages exist. Then there's the problem of the Deep Web. Hope this helps.
Crawling is not an easy task (for example, Yahoo is now outsourcing crawling to Microsoft's Bing). You can read more about it in Page and Brin's own paper: The Anatomy of a Large-Scale Hypertextual Web Search Engine.
You can find more details about storage, architecture, etc. on the High Scalability website, for example: http://highscalability.com/google-architecture

What are the pros & cons of using Google CSE vs implementing dedicated search engine for a site like stackoverflow?

I understand that the same work should not be repeated when Google CSE is already there, so what might be the reasons to consider implementing a dedicated search engine for a public-facing website similar to SO (and why did Stack Overflow probably do that)? The paid version of CSE (Google Site Search) already eliminates several drawbacks that forced dedicated implementations. Cost may be one reason not to choose Google CSE, but what are the other reasons?
Another thing I want to ask: my site is of a similar kind to Stack Overflow, so when Google indexes its content every now and then, won't that overload my database servers with lots of queries, perhaps at peak traffic times?
I'm looking to use the Google Custom Search API, but I need to clarify whether the 1,000 paid queries I get for $5 are valid only for one day or whether they carry over as extra queries (beyond the free ones) to the following days. Can anyone clarify this too?
This depends on the content of your site, the frequency of the updates, and the kind of search you want to provide.
For example, with StackOverflow, there'd probably be no way to search for questions of an individual user through Google, but it can be done with an internal search engine easily.
Similarly, Google can deprecate their API at any time; in fact, if past experience is any indication, Google has already done so with the Google Web Search API, where a lot of non-profits with projects built on that API were left on the street with no Google option for continuing their services (paying USD 100/year for only 20,000 search queries per year may be fine for a posh blog, but it greatly limits what you can actually use the search API for).
On the other hand, you probably already want to have Google index all of your pages, to get the organic search traffic, so Google CSE would probably use rather minimal resources of your server, compared to having a complete in-house search engine.
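For reference, querying a CSE programmatically is a single HTTPS request against the Custom Search JSON API. A minimal sketch in Python; the API key and search engine ID are placeholders, and the exact quota/billing terms are best checked against Google's current documentation:

    # Sketch: query a Google Custom Search Engine via the Custom Search JSON API.
    import requests

    def cse_search(query, api_key="YOUR_API_KEY", cx="YOUR_SEARCH_ENGINE_ID"):
        params = {"key": api_key, "cx": cx, "q": query, "num": 10}
        resp = requests.get("https://www.googleapis.com/customsearch/v1",
                            params=params, timeout=5)
        resp.raise_for_status()
        for item in resp.json().get("items", []):
            print(item["title"], item["link"])

    cse_search("custom search example")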
Now that Google Site Search is gone, the best search tool alternative for all the loyal Google fans is Google Custom Search (CSE).
Some of the features of Google Custom Search that I loved the most were:
It's free (with ads)
Ability to monetise those ads with your AdSense account
Tons of customization options, including removing the Google branding
Ability to link it with a Google Analytics account for highly comprehensive analytical reports
A powerful auto-correct feature to understand the real intention behind typos
Cons: lacks customer support.
Read More: https://www.techrbun.com/2019/05/google-custom-search-features.html

Use Google Analytics for data to display on our webpage?

On some of our pages, we display some statistics like number of times that page has been viewed today, number of times it's been viewed the past week, etc. Additionally, we have an overall statistics page where we list the pages, in order, that have been viewed the most.
Today, we just insert these pageviews and event counts into our database as they happen. We also send them to Google Analytics via normal page tracking and their API. Ideally, instead of querying our database for these stats to display on our webpages, we just query Google Analytics' API. Google Analytics does a FAR better job figuring out who the real uniques are and avoids counting people who artificially inflate their pageview counts (we allow people to create pages on our site).
So the question is if it's possible to use Google Analytics' API for updating the statistics on our webpages? If I cache the results is it more feasible? Or just occasionally update our stats? I absolutely love Google Analytics for our site metrics, but maybe there's a better solution for this particular need?
So the question is if it's possible to use Google Analytics' API for updating the statistics on our webpages?
Yes, it is. But the authentication process and XML response may slow things down. You can speed it up by limiting the rows/columns returned. Also, authenticating for the way you want to display the data (if I understood you correctly) would require the client authentication method, where you send the username and password, so security is an issue.
I have done exactly what you described but had to put a loading graphic on the page for the stats.
If I cache the results is it more feasible? Or just occasionally update our stats?
Either one, but caching seems like it would work well, especially since GA data is not real-time data anyway. You could make the API call and store (or process and then store) the returned XML for display later.
I haven't done this but I think I might give it a go. Could even run as a scheduled job.
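To make the caching idea concrete, here is a rough sketch using the newer Analytics Reporting API v4 from Python, run as a scheduled job that writes the results to a JSON file the webpages read instead of calling GA; the view ID, key file and output path are placeholders:

    # Sketch: pull pageviews per page from GA and cache them in a JSON file.
    # Run on a schedule; pages read stats.json instead of hitting the GA API.
    import json
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    KEY_FILE = "service-account.json"  # placeholder credentials path
    VIEW_ID = "123456789"              # placeholder GA view ID

    def refresh_cache(output_path="stats.json"):
        creds = service_account.Credentials.from_service_account_file(
            KEY_FILE, scopes=["https://www.googleapis.com/auth/analytics.readonly"])
        analytics = build("analyticsreporting", "v4", credentials=creds)
        report = analytics.reports().batchGet(body={
            "reportRequests": [{
                "viewId": VIEW_ID,
                "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
                "metrics": [{"expression": "ga:pageviews"}],
                "dimensions": [{"name": "ga:pagePath"}],
            }]
        }).execute()
        rows = report["reports"][0].get("data", {}).get("rows", [])
        stats = {r["dimensions"][0]: int(r["metrics"][0]["values"][0]) for r in rows}
        with open(output_path, "w") as f:
            json.dump(stats, f)

    refresh_cache()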
I absolutely love Google Analytics for our site metrics, but maybe there's a better solution for this particular need?
There are some third-party solutions (googling should root them out) but money and feasibility should be considered.

Statistics on documents downloaded from emails

I usually send out emails with links to document downloads. Is there any chance of registering statistics on those using Google Analytics?
JavaScript, and therefore Google Analytics, doesn't work in HTML emails, so you can't measure the clicks on those links directly.
If you use an email service provider, like Mailchimp or one of the many others, they will provide click-tracking at no additional cost, so you could consider that. Most ESPs have a free or freemium plan you can try.
Otherwise, you can do the following:
Route all clicks through a script that counts the clicks (see the sketch after this list). Pros: accurate count. Cons: custom development, re-inventing the wheel.
Direct all the clicks to a landing page that has Google Analytics implemented. Pros: easy implementation, no custom development. Cons: the extra click will lead to some dropoff.
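A minimal sketch of the first option, as a small Flask endpoint that logs each click and then forwards the user to the document; Flask, the CSV log and the URL parameters are my own choices here, not anything prescribed by GA:

    # Sketch: count clicks on emailed download links, then redirect to the file.
    # Email links would point at e.g. https://yoursite.example/go?doc=whitepaper.pdf
    import csv
    import time
    from flask import Flask, request, redirect, abort

    app = Flask(__name__)

    DOCS = {"whitepaper.pdf": "https://files.example.com/whitepaper.pdf"}  # made-up mapping

    @app.route("/go")
    def track_and_redirect():
        doc = request.args.get("doc")
        target = DOCS.get(doc)
        if not target:
            abort(404)
        with open("clicks.csv", "a", newline="") as f:  # crude click log
            csv.writer(f).writerow([time.time(), doc, request.args.get("utm_campaign", "")])
        return redirect(target, code=302)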
I would recommend an email service provider. Besides the click-tracking, it will give you much better deliverability, testing tools, and reporting options.
You could try using an intermediary page that sends off the GA tracking beacon before redirecting to the download. This could be done either with JavaScript that works similarly to outbound link tracking, or with a meta refresh. Either way, you would want to put a message on the page stating that the download will start shortly, plus a direct link in case something goes wrong.
