How does Google Analytics calculate metrics like "average time spent"?

How do services like Google Analytics calculate metrics like:
"average time spent"
"number of users that came to the website via search vs. users that hit the URL directly"
etc.?
I would imagine that Google can easily record a hit when someone clicks on a link in a search result. But after that, how long and how deeply the user browses that particular website seems beyond its reach... hmmm?

This question has some information. As mentioned there, the time should be calculated using an onUnload() event: when the JS first loads, the time is recorded (in a cookie), and then onUnload() the time spent is calculated and sent to Google. The linked question covers most of this.

This thread states quite clearly that there is no unload event: http://groups.google.com/group/analytics-help-troubleshoot/browse_thread/thread/d142572ddf1fa9dd/38dd640f949e9890?pli=1
Also, try going to GA and looking for sessions with only one pageview: you will see the average time on page is 0s, which proves the point.
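Both answers point at the same mechanism: time on page is inferred from the gap between consecutive pageview timestamps, so the last page of a session (and therefore a one-page session) gets 0. A minimal sketch of that timestamp-delta calculation; the names here are illustrative, not GA's actual pipeline:

```typescript
// Minimal sketch of the timestamp-delta approach (illustrative, not GA's
// actual pipeline). Each pageview beacon carries only the time it fired;
// time on page is inferred as the gap to the next pageview in the same
// session. The last page has no following beacon, so it gets 0, which
// is why single-pageview sessions show an average time of 0s.

interface Pageview {
  url: string;
  timestamp: number; // ms since epoch, recorded when the tracker fired
}

function timeOnPage(session: Pageview[]): number[] {
  return session.map((pv, i) => {
    const next = session[i + 1];
    return next ? next.timestamp - pv.timestamp : 0; // no exit beacon
  });
}

function averageTimeOnPage(session: Pageview[]): number {
  const times = timeOnPage(session).filter((t) => t > 0);
  if (times.length === 0) return 0; // e.g. a one-page session
  return times.reduce((a, b) => a + b, 0) / times.length;
}

// Example: a three-page session
const visit: Pageview[] = [
  { url: "/home", timestamp: 0 },
  { url: "/pricing", timestamp: 30_000 },
  { url: "/signup", timestamp: 75_000 },
];
console.log(timeOnPage(visit));        // [30000, 45000, 0]
console.log(averageTimeOnPage(visit)); // 37500 (ms)
```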

Related

Using PhantomJS to print user activity

In the company where I work there are many agents making calls and entering information into our system. I would like to know if it is possible for a boss to take a screenshot of the user's screen at any time of the day to know what activity the user is doing.
I know this can be done with PhantomJS, but I think it's only used to take a screenshot of an external website.
Thank you in advance for any response.
Actually, what you are looking for is an application called TimeSnapper. It records your screen every few seconds, and it also categorizes active applications/URLs so it can calculate a productivity score.

calculate time to response in lotus notes messages

We are working with Lotus Notes in a Help Desk team, and it would be useful to know how much time we take to respond to a message. Is there some way to achieve this?
The problem with the suggested approaches is that they only measure the amount of time until someone begins to compose a response -- not the time until the response is sent. You could determine that by forcing replies to save automatically, so that an agent could later compare the received time of the original messages with the sent time of the replies.
HOWEVER! As an occasional user of support services, I really don't want support staff measured on how quickly they respond. That's not a true measure of customer perceived value. I would instead want my support staff measured on how long it is before the customer feels their problem has been solved. Quality of service comes about from taking the trouble to understand the customer's question and carefully address their needs -- not from churning out reply emails at the fastest possible pace, which is what the proposed measurement would encourage.
I don't want a tech scored higher for replying immediately with a stupid non-answer, then replying immediately to my irritated request that they read the question with another non-answer, than a tech who takes twice as long to send the original reply but actually answers my question. See?
Try creating a field that records a timestamp when the mail is replied to, i.e. when the user clicks the reply button, set the field's value to the current time, then calculate the time elapsed between @Created and that timestamp.
You can use software called http://timetoreply.com, which measures how quickly companies respond to online enquiries. You can measure email reply time using this software, and it's free.
Yes, of course there is. If you have a current mail template, then responding to a mail sets a flag in the document. That changes the @Modified timestamp of the document.
A view with a column that calculates @Modified - @Created would already be a good start. Of course, you can add any level of complexity to this by using LotusScript or Formula agents to analyze your data or to add your own flags, etc.
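Assuming the created/replied timestamps can be exported from Notes (say, by a LotusScript agent dumping them to CSV), the elapsed-time arithmetic itself is simple. A minimal sketch with illustrative field names:

```typescript
// Minimal sketch of the @Modified - @Created idea, assuming the
// created/replied timestamps have been exported from Notes (e.g. by a
// LotusScript agent writing CSV). Field names are illustrative.

interface HelpDeskMessage {
  subject: string;
  created: Date; // when the original mail arrived (@Created)
  replied: Date; // timestamp field set when Reply was clicked
}

function responseTimeMinutes(msg: HelpDeskMessage): number {
  return (msg.replied.getTime() - msg.created.getTime()) / 60_000;
}

const msg: HelpDeskMessage = {
  subject: "Printer offline",
  created: new Date("2012-03-01T09:00:00Z"),
  replied: new Date("2012-03-01T09:42:00Z"),
};
console.log(responseTimeMinutes(msg)); // 42
```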

The New Google Takeout

Is there a way to include one's search history within Google Takeout?
https://www.google.com/takeout/
Takeout purports to let you download everything stored within your Google account.
As far as I can tell, no. It's an obvious and mysterious gap in service.
You can download your recent Google searches via an RSS feed:
https://www.google.com/history/?output=rss
You can add parameters to the URL. The maximum per query is 1000; num is the number of searches to return and start is the offset to draw from. Like so:
https://www.google.com/history/?output=rss&num=1000&start=4000
Unfortunately, the results start to thin out (as in, not actually all of your searches) after a few thousand. I have over 40,000 searches on Google, but I can only go back 7,000 on this RSS feed. Bummer. This means we still don't have access to all the data of ours that they have.
Please prove me wrong!
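For what it's worth, a minimal sketch of paging through that feed with num and start; it assumes you are logged in and that the feed is still served, and it uses a crude <title> scrape instead of a proper XML parser:

```typescript
// Sketch of paging through the history feed with num/start. Assumes you
// are logged in and the feed is still served; the <title> scrape is a
// crude stand-in for a real XML parser, and error handling is omitted.

const BASE = "https://www.google.com/history/?output=rss";
const PAGE_SIZE = 1000;

async function fetchAllSearches(maxPages = 10): Promise<string[]> {
  const items: string[] = [];
  for (let page = 0; page < maxPages; page++) {
    const url = `${BASE}&num=${PAGE_SIZE}&start=${page * PAGE_SIZE}`;
    const res = await fetch(url);
    const xml = await res.text();
    const titles = [...xml.matchAll(/<title>([^<]+)<\/title>/g)]
      .map((m) => m[1]);
    // The first <title> is the channel's own; anything past it is an item.
    if (titles.length <= 1) break; // feed stopped returning results
    items.push(...titles.slice(1));
  }
  return items;
}

fetchAllSearches().then((all) => console.log(`${all.length} searches`));
```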
Today is the last day to delete your search history, as suggested by the EFF (https://www.eff.org/deeplinks/2012/02/how-remove-your-google-search-history-googles-new-privacy-policy-takes-effect), before the new Google terms come into force, linking that history with all the other Google products. So if you can't grab it today, delete it so as to partially anonymise it eventually, or be tentacularised.

Counts of web search hits [closed]

I have a set of search queries, approximately 10 million of them. The goal is to collect the number of hits returned by a search engine for each of them. For example, Google returns about 47,500,000 hits for the query "stackoverflow".
The problem is that:
1. The Google API is limited to 100 queries per day. That is far from useful for my task, since I would have to get a very large number of counts.
2. I used the Bing API, but it does not return an accurate number - accurate in the sense of matching the number of hits shown in the Bing UI. Has anyone come across this issue before?
3. Issuing search queries to a search engine and parsing the HTML is one solution, but it triggers CAPTCHAs and does not scale to this number of queries.
All I care about is the number of hits, and I am open to any suggestions.
Well, I was really hoping that someone would answer this, since it's something I was also interested in finding out; but since it doesn't look like anyone will, I'll throw in these suggestions.
You could set up a series of proxies that change their IP every 100 requests, so that you can query Google as seemingly different people (seems like a lot of work). Or you could download Wikipedia and write something to parse the data, so that when you search for a term you can see how many pages it appears in. Of course, that is a much smaller dataset than the whole web, but it should get you started. Another possible data source is the Google n-grams data, which you can download and parse to see how many books and pages the search terms appear in. Maybe a combination of these methods could boost the accuracy for any given search term.
Certainly none of these methods is as good as getting the Google page counts directly, but understandably that is data they don't want to give out for free.
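A minimal sketch of the local-corpus idea: count how many documents in a downloaded dump (assumed here to be a text file with one document per line) contain each query term. The file name, format, and naive substring matching are all simplifying assumptions:

```typescript
// Sketch of the local-corpus idea: count how many documents in a dump
// (assumed to be one document per line) contain each query term.
// Naive substring matching; the file name and format are assumptions.

import * as fs from "fs";
import * as readline from "readline";

async function documentFrequencies(
  corpusPath: string,
  queries: string[],
): Promise<Map<string, number>> {
  const counts = new Map(queries.map((q): [string, number] => [q, 0]));
  const needles = queries.map((q) => q.toLowerCase());

  const rl = readline.createInterface({
    input: fs.createReadStream(corpusPath),
  });
  for await (const line of rl) {
    const doc = line.toLowerCase();
    needles.forEach((needle, i) => {
      if (doc.includes(needle)) {
        counts.set(queries[i], counts.get(queries[i])! + 1);
      }
    });
  }
  return counts;
}

documentFrequencies("wikipedia-docs.txt", ["stackoverflow", "lotus notes"])
  .then((c) => console.log(c));
```

For 10 million queries you would build an inverted index over the corpus once rather than scanning it per query; the sketch only shows the counting idea.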
I see this is a very old question but I was trying to do the same thing which brought me here. I'll add some info and my progress to date:
Firstly, the reason you get an estimate that can change wildly is because search engines use probabilistic algorithms to calculate relevance. This means that during a query they do not need to examine all possible matches in order to calculate the top N hits by relevance with a fair degree of confidence. That means that when the search concludes, for a large result set, the search engine actually doesn't know the total number of hits. It has seen a representative sample though, and it can use some statistics about the terms used in your query to set an upper limit on the possible number of hits. That's why you only get an estimate for large result sets. Running the query in such a way that you got an exact count would be much more computationally intensive.
The best I've been able to achieve is to refine the estimate by tricking the search engine into looking at more results. To do this, you need to go to page 2 of the results and then modify the 'first' parameter in the URL to go much higher. Doing this may allow you to find the end of the result set (this worked for me last year, I'm sure, although today it only worked up to the first few thousand). Even if it doesn't let you reach the end of the result set, you will see that the estimate gets better as the query engine considers more hits.
I found Bing slightly easier to use in the above way - but I was still unable to get an exact count for the site I was considering. Google seems to be actively preventing this use of their engine which isn't that surprising. Bing also seems to hit limits although they looked more like defects.
For my use case I was able to get both search engines to fairly similar estimates (148k for Bing, 149k for Google) using the above technique. The highest hit count I was able to get from Google was 323 whereas Bing went up to 700 - both wildly inaccurate but not surprising since this is not their intended use of the product.
If you want to do it for your own site you can use the search engine's webmaster tools to view indexed page count. For other sites I think you'd need to use the search engine API (at some cost).
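For anyone wanting to script the paging trick described above, a sketch: probe increasing offsets until the engine stops returning results, then binary-search for the end of the result set. The URL shape and the "has results" check are assumptions (a fragile scrape, not a supported API), and as noted, the engines actively resist this:

```typescript
// Sketch of the paging trick: probe increasing offsets until the engine
// stops returning results, then binary-search for the end of the result
// set. The URL shape and the "has results" check are assumptions, and
// a brittle scrape like this is exactly what engines try to block.

async function hasResultsAt(query: string, offset: number): Promise<boolean> {
  const url =
    `https://www.bing.com/search?q=${encodeURIComponent(query)}&first=${offset}`;
  const html = await (await fetch(url)).text();
  return html.includes('<li class="b_algo"'); // brittle markup heuristic
}

async function lastRetrievableResult(query: string): Promise<number> {
  let lo = 0;  // highest offset known to have results
  let hi = 10; // grows until an offset with no results is found
  while (await hasResultsAt(query, hi)) {
    lo = hi;
    hi *= 2;
  }
  // Binary search between the last good and the first empty offset.
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (await hasResultsAt(query, mid)) lo = mid;
    else hi = mid;
  }
  return lo;
}

lastRetrievableResult("stackoverflow").then((n) =>
  console.log(`last retrievable result at offset ~${n}`),
);
```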

How does the "mark as read" system on webforums work?

I've wondered about this for some time now. I'm wondering how webforums implement the option to highlight something you haven't read - how does the forum know?
Since most webforums have a function to show you all posts since your last visit, they must save the last time you visited one of their pages in your userdata in the database.
But that doesn't explain how individual topics are still highlighted after you've read just one.
A many-to-many table connecting a user to a topic/post, with flags for read/favorite, etc.
Many web forums store a huge list recording the last time you looked at each topic you've viewed.
This gets out of hand quickly, but there are mitigations. See Determining unread items in a forum
Keeping track of which posts a visitor has read is of course not that big a deal, since the number of posts a visitor has read will very likely be much smaller than the number not read. So if you know which posts a visitor has read, you also know which posts they didn't read. To make this less computationally intensive, you'd normally only do this over a certain period of time, say the last two weeks; everything before that is considered read.
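A minimal in-memory sketch of that approach: a many-to-many user/topic relation plus a time window, so anything older than the window counts as read without being stored. In a real forum the relation would live in a database table rather than a Map:

```typescript
// In-memory sketch of a many-to-many user/topic read relation plus a
// time window: anything older than the window counts as read without
// being stored. A real forum would keep the relation in a database table.

interface Topic {
  id: number;
  lastPostAt: Date;
}

const WINDOW_MS = 14 * 24 * 60 * 60 * 1000; // "the last two weeks"

class ReadTracker {
  // userId -> set of topic ids the user has read within the window
  private reads = new Map<number, Set<number>>();

  markRead(userId: number, topicId: number): void {
    if (!this.reads.has(userId)) this.reads.set(userId, new Set());
    this.reads.get(userId)!.add(topicId);
  }

  isUnread(userId: number, topic: Topic, now = new Date()): boolean {
    // Everything older than the window is considered read.
    if (now.getTime() - topic.lastPostAt.getTime() > WINDOW_MS) return false;
    return !this.reads.get(userId)?.has(topic.id);
  }
}

const tracker = new ReadTracker();
tracker.markRead(1, 42);
console.log(tracker.isUnread(1, { id: 42, lastPostAt: new Date() })); // false
console.log(tracker.isUnread(1, { id: 43, lastPostAt: new Date() })); // true
```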
Usually, this list of "unread" items only shows changes that have been made since the last time you logged out.
Use the user's last activity date/time to mark items as "unread" (any activity in a topic after that time marks it "unread"). Then store, in a session variable, a list of the topic IDs the user has viewed since logging in. Combining these two would give you a relatively accurate list of unread topics.
Of course, this data would then be lost on log-out or session expiry and the cycle would start again, without sacrificing an unnecessary number of SQL queries.
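A sketch of that hybrid check, with illustrative names: a topic is unread if it changed after the user's last activity time and has not been viewed this session:

```typescript
// Sketch of the hybrid check: a persisted last-activity time plus a
// per-session set of viewed topic ids (lost on logout, by design).
// All names are illustrative.

interface Topic {
  id: number;
  lastPostAt: Date;
}

function isUnread(
  topic: Topic,
  lastActivity: Date,             // persisted in the user record
  viewedThisSession: Set<number>, // kept in the session variable
): boolean {
  return topic.lastPostAt > lastActivity && !viewedThisSession.has(topic.id);
}

const viewed = new Set<number>([7]);
const lastActivity = new Date("2012-06-01T12:00:00Z");
const fresh = new Date("2012-06-02T12:00:00Z");
console.log(isUnread({ id: 7, lastPostAt: fresh }, lastActivity, viewed)); // false
console.log(isUnread({ id: 8, lastPostAt: fresh }, lastActivity, viewed)); // true
```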
On the custom forum I used to work with, we used a combination of your last visit time (updated every time you viewed another page - usually cookied) and a "mark read" button on each topic that added a date/time value to a SQL table containing your UserID, the TopicID and the date/time.
Thus, to find new topics we would look at your last visit date, and anything created after that point in time was a new topic.
Once you entered a topic, any topic you had clicked "mark read" on would show only the initial post and then any replies with a date/time after you clicked the mark-read button. If you have fewer viewers and performance to spare, you could simply add an entry to the table for every topic the user clicks on, at the moment they click on it.
Another option you have (and I have actually seen this done before in a vBulletin installation) is to store a comma-separated list of viewed topic IDs client-side in a cookie.
Server-side, the only thing stored was the time of the user's previous visit. The forum system used this in conjunction with the information in the user's cookie to show a topic as read where either:
- its last modified date (i.e. last post) is older than the user's previous visit, or
- its topic ID is found in the user's cookie as one the user has visited this session.
I'm not saying it's a good idea, but I thought I'd mention it as an alternative; the obvious way to do it has already been stated in other answers, i.e. store it server-side as a relation table (a many-to-many table).
I guess it does have the advantage of relieving the server of the burden of keeping that information.
The downsides are that it's tied to the session, so once a new session is started, everything that occurred before the last session is considered already read. Another downside is that a cookie can only hold so much information, and a user may view hundreds of topics in a session, so it approaches the cookie's storage limit.
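A sketch of that cookie-based variant, with an illustrative cookie name: the viewed topic IDs live client-side as a comma-separated cookie value, and the server keeps only the previous visit time:

```typescript
// Browser-side sketch of the cookie variant: viewed topic ids live in a
// comma-separated session cookie; the server keeps only the previous
// visit time. Cookie name and helpers are illustrative.

const COOKIE_NAME = "viewedTopics";

function getViewedTopics(): Set<number> {
  const match = document.cookie.match(new RegExp(`${COOKIE_NAME}=([^;]*)`));
  if (!match || match[1] === "") return new Set();
  return new Set(match[1].split(",").map(Number));
}

function markTopicViewed(topicId: number): void {
  const viewed = getViewedTopics();
  viewed.add(topicId);
  // No expiry set, so this is a session cookie: the list resets when
  // the browser closes - one of the downsides mentioned above.
  document.cookie = `${COOKIE_NAME}=${[...viewed].join(",")}; path=/`;
}

function isTopicRead(
  topicId: number,
  lastPostAt: Date,
  previousVisit: Date, // the one thing stored server-side
): boolean {
  return lastPostAt < previousVisit || getViewedTopics().has(topicId);
}
```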
One more approach:
Make sure your stylesheet shows a clear difference between visited and non-visited links, taking advantage of the fact that browsers remember visited pages persistently.
For this to work, however, you'd need consistent URLs for topics, and most forum systems don't do this. Another downside is that users may clear their history, or use more than one browser. That puts this measure into the 'not highly reliable' category; you would probably only use it to augment whatever other measure you are using to track viewed topics.
