I am trying to do a competitive analysis of online trends in the real estate domain at the state level within a country. I have to create a report that is not biased towards any particular company but compares, or simply shows, how the companies are performing against a list of trends. I will use clickstream-analysis parameters to show statistics on how each company's website performs. In my opinion, trend-specific performance can be depicted by sentiment analysis. If there is a more effective way to do this, I am open to any such approach.
Now, I am not able to find any trends that are common across companies.
How can I find general trends that will be common to all real estate companies?
I tried using Google Trends. It provides graphical and demographic information about a particular search term and lists related terms, which I am unsure how to use. And as I drill down from country to state, the amount of data becomes very small.
Once I have the trends, I have to find out how people are reacting to them; sentiment analysis is what will provide me this information.
But even if I get the trends, how will I get trend-specific data from which I can calculate polarity?
Twitter and other social media sites can provide some data on which sentiment analysis can be performed. I used this site, which gives the positive, negative, and neutral sentiment around a term on Twitter. I need something analogous, but the dataset on which the analysis is performed should not be limited to social media only.
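Once you have trend-specific text (tweets, news snippets, forum posts), polarity can be computed with an off-the-shelf tool such as NLTK's VADER or TextBlob. The toy lexicon scorer below is only a simplified sketch of the idea; the word lists are made up for illustration, not a real sentiment lexicon.

```python
# Toy lexicon-based polarity scorer -- a simplified stand-in for
# real tools such as NLTK's VADER or TextBlob.
POSITIVE = {"good", "great", "rising", "affordable", "booming"}
NEGATIVE = {"bad", "crash", "overpriced", "falling", "scam"}

def polarity(text):
    """Return a score in [-1, 1]: +1 if all lexicon hits are positive,
    -1 if all are negative, 0 if the text contains no lexicon words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # neutral: no lexicon words found
    return (pos - neg) / (pos + neg)

print(polarity("Affordable housing is booming, great time to buy"))  # 1.0
print(polarity("Prices are falling, market crash feared"))           # -1.0
```

A real pipeline would swap the toy lexicon for a trained model or a full lexicon, but the interface (text in, polarity score out) stays the same, which makes it easy to aggregate scores per trend per month.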
Are there any other entities I can add to this competitive-analysis report?
The report will be generated on a monthly basis, and I want to automate these tasks as much as possible. I am also thinking of using web scraping to collect data in a similar format. I would also like to know which data I should scrape and which I should extract manually.
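As a minimal sketch of the scraping step, the standard-library `html.parser` module can pull visible paragraph text out of a page; the inline HTML sample below stands in for a page you would fetch monthly with `urllib.request` or `requests` before feeding the text to the sentiment step.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> tags -- a minimal scraping sketch."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data

# Stand-in for a page fetched from a company site.
sample = "<html><body><p>Rents rising in Pune.</p><p>New listings up 12%.</p></body></html>"
parser = ParagraphExtractor()
parser.feed(sample)
print(parser.paragraphs)  # ['Rents rising in Pune.', 'New listings up 12%.']
```

For a monthly report, a scheduler (cron, for example) running a script like this over a fixed list of company URLs covers the automation you describe; data behind logins or heavy JavaScript is where manual extraction or heavier tooling is usually needed.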
I'm starting an online business targeted at a particular demographic and set of interests, so I would like to produce content targeted at what this particular market is actually searching for.
Google Ads allowed me to refine my target audience to the exact categories (demographics and interests) I needed, but it couldn't tell me what that category of people tends to search for, except for the tiny subset that happens to click on one of my ads, which is rare given that I am just starting with a small budget. I would like to know the most popular search terms for everyone in the categories I specified, not just those who happened to click on my ads.
I tried Google Trends, which told me the popularity of a particular search term for a given country, but that's too broad: I need to narrow it down to a particular city, age group, parental status, and set of interests. Google Trends also helped me find popular related search terms given a particular search term, so I could try using that to see whether there are any common popular terms related to my guesses, but I could miss terms related to topics I never thought of.
I could try producing content across a range of topics which I think my target audience might be interested in and then analyse the results using Google Ads, but that could be a very expensive trial-and-error process, and I might miss more popular topics I never thought of.
Of course, I could try to ask my target market in person directly (by interrupting people in the street!), but that would be very expensive for me, because I would have to travel to and stay at the location my online business targets, hoping to meet people with the exact demographic and interests I am looking for.
I'm sure there must be a way to figure this out using Google's search analytics. Essentially, all I need is a list of the most popular recent Google search terms for a particular location, demographic, and interest group in Google Analytics. Could anyone help me understand how to get this list?
Here are a few considerations, even if you've already found an answer.
Take a look at the AdRoll platform. Here's a potentially helpful article from them about target audience and demographics.
A recent article about AdWords demographic targeting. An older-looking article connecting demographics to search queries; the page's source code suggests it was updated this year.
Last but not least, you're probably eligible to talk with a Google Small Business Advisor.
I'm currently developing a small web search engine, but I'm not sure how I am going to evaluate it. I understand that a search engine can be evaluated by its precision and recall. In a more "localized" information retrieval system, e.g., an e-library, I can calculate both because I know which items are relevant to my query. But in a web-based information retrieval system, e.g., Google, it would be impossible to calculate recall, because I do not know how many web pages are relevant. This means that the F-measure and other measurements that require the number of relevant pages cannot be computed.
Is everything I wrote correct? Is web search engine evaluation limited to precision only? Are there any other measurements I could use to evaluate a web search engine (other than P@k)?
You're correct that precision and recall, along with the F-score (F-measure), are commonly used metrics for evaluating (unranked) retrieval sets in search engine performance.
And you're also correct about the difficult or impossible nature of determining recall and precision scores for a huge corpus of data such as all the web pages on the entire internet. For any search engine, small or large, I would argue that it's important to consider the role of human interaction in information retrieval: are users more interested in a (ranked) list of relevant results that answers their information need, or would one "top" relevant result be enough to satisfy them? Check out the concept of "satisficing" as it pertains to information seeking for more on how users assess when their information needs are met.
Whether you use precision, recall, mean average precision, mean reciprocal rank, or any of the numerous other relevance and retrieval metrics really depends on what you're trying to assess about the quality of your search engine's results. I'd first try to figure out what sort of information needs the users of my small search engine might have: will they be looking for a selection of relevant documents, or would one "best" document better satisfy their query? If you can better understand how your users will be using your search engine, you can use that information to decide which relevance model(s) will give them the results they deem most useful for their information-seeking needs.
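To make the set-based and ranked metrics mentioned above concrete, here is a small sketch with toy relevance judgments (the document IDs are invented for illustration), computing precision, recall, F1, P@k, and reciprocal rank:

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based metrics over an unranked retrieval set."""
    hits = len(set(retrieved) & set(relevant))
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked results that are relevant."""
    return sum(d in relevant for d in ranked[:k]) / k

def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant result (0.0 if none found)."""
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1 / i
    return 0.0

# Toy example: 4 ranked results, 3 known-relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d5"}
print(precision_recall_f1(ranked, relevant))  # (0.5, 0.666..., 0.571...)
print(precision_at_k(ranked, relevant, 2))    # 0.5
print(reciprocal_rank(ranked, relevant))      # 0.5
```

Note that all of these need a judged relevant set; when full recall is unknowable (the web case), P@k and reciprocal rank are still usable because they only require judging the results actually returned.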
You might be interested in the free online version of the Manning, Raghavan, and Schütze "Introduction to Information Retrieval" text, available from Stanford's NLP department, which covers relevance and retrieval models, scoring, and much more.
Google's Search Quality Evaluator guidelines, which describe the many dimensions along which human raters assess the quality of Google's search results, might be of interest too as you try to suss out your users' information-seeking goals. Note that these are guidelines for human evaluators rather than a description of PageRank (Google's link-based ranking algorithm), but it's instructive to see all the factors that go into judging a result's quality.
I basically work on NLP, collecting interest-based data from web pages.
I came across http://schema.org/ as a potentially helpful resource for NLP work.
I went through the documentation, from which I can see that it adds extra properties to HTML tags to identify what their content means.
It may help a search engine get specific data matching a user's query.
It says: "Schema.org provides a collection of shared vocabularies webmasters can use to mark up their pages in ways that can be understood by the major search engines: Google, Microsoft, Yandex and Yahoo!"
But I don't understand how it can help me as an NLP person. Generally I parse web page content to process and extract data from it. Schema.org may help there, but I don't know how to utilize it.
Any example or guidance would be appreciated.
Schema.org uses the microdata format for representation. People use microdata for text analytics and for extracting curated content. There are numerous possible applications:
Suppose you want to create a news summarization system. You can use the hNews microformat to extract the most relevant content and perform summarization on it.
Suppose you have a review-based search engine where you want to list products with the most positive reviews. You can use the hReview microformat to extract the reviews, then perform sentiment analysis on them to identify whether a product's reviews are negative or positive.
If you want to create a skill-based resume classifier, extract content marked up with the hResume microformat, which gives you various details such as contact information (via the hCard microformat), experience, achievements, education, skills/qualifications, affiliations, and publications. You can then run a classifier on it to group CVs by particular skill sets.
Though schema.org does not help NLP people directly, it provides a platform for doing text processing in a better way.
Check out http://en.wikipedia.org/wiki/Microformat#Specific_microformats to see the various microformats; the same page gives you more details.
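As a concrete sketch, schema.org microdata marks values with `itemprop` attributes, which can be pulled out with the standard-library `html.parser`; the sample Product markup below is illustrative, not taken from a real site.

```python
from html.parser import HTMLParser

class ItempropExtractor(HTMLParser):
    """Collect itemprop name -> text pairs from schema.org microdata."""
    def __init__(self):
        super().__init__()
        self.current = None          # itemprop currently open, if any
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self.current = attrs["itemprop"]
            self.properties[self.current] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.properties[self.current] += data.strip()

    def handle_endtag(self, tag):
        self.current = None

# Illustrative schema.org Product markup.
sample = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">3BHK Flat, Baner</span>
  <span itemprop="price">9500000</span>
</div>
"""
parser = ItempropExtractor()
parser.feed(sample)
print(parser.properties)  # {'name': '3BHK Flat, Baner', 'price': '9500000'}
```

The point for NLP work is that annotated pages hand you typed fields (name, price, review body, job title) directly, so your pipeline can skip the noisy step of guessing which text span is which.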
Schema.org is something like a vocabulary or ontology for annotating data, and here specifically web pages.
It's a good idea to extract microdata from web pages, but is it really used by web developers? I don't think so; I think the majority of microdata is consumed by companies such as Google or Yahoo.
Finally, you can find data, but not a lot, and mainly on specific types of websites.
What do you want to extract, and for what type of application? You could probably use another source of data, such as DBpedia or Freebase, for example.
GoodRelations also supports schema.org. You can annotate your content on the fly from the front end based on the various domain contexts defined, so schema.org is very useful for NLP extraction. One can even use it for HATEOAS services with hypermedia link relations. Metadata (data about data) for any context is good for content and data in general. Alternatives include microformats, RDFa, RDFa Lite, etc. The more context you have, the better: it turns your data into smart content and helps crawler bots understand it. It also leads further into the web of data and helps with global queries over resource domains. In the long run, such approaches will help towards domain adaptation of agents for transfer learning on the web, pretty much making the web of pages an externalized unit of a massive commonsense knowledge base. They also help advertising agencies understand publisher sites and better contextualize ad retargeting.
I have a site which relies on search, and I want to select a single metric to use for search-ranking experiments. What is the state-of-the-art metric for a real live site?
"Real life" means I can't create a gold standard and ask judges to rank results per query.
I'm thinking of "clicks per search" as the main measure.
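As a sketch of how "clicks per search" falls out of a raw event log (the event format and IDs here are hypothetical), along with the closely related abandonment rate:

```python
from collections import Counter

# Hypothetical event log: each entry is (event_type, search_id).
events = [
    ("search", "s1"), ("click", "s1"), ("click", "s1"),
    ("search", "s2"),                   # abandoned search: no clicks
    ("search", "s3"), ("click", "s3"),
]

counts = Counter(kind for kind, _ in events)
clicks_per_search = counts["click"] / counts["search"]
print(clicks_per_search)  # 3 clicks / 3 searches = 1.0

# A related behavioral metric: abandonment rate
# (fraction of searches that received zero clicks).
clicked = {sid for kind, sid in events if kind == "click"}
searches = [sid for kind, sid in events if kind == "search"]
abandonment = sum(sid not in clicked for sid in searches) / len(searches)
print(abandonment)  # 1 of 3 searches had no click
```

One caveat worth noting when picking a single metric: more clicks per search is not always better, since users clicking many results can also mean the top result didn't satisfy them, which is why abandonment rate or click position is often tracked alongside it.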
We (my company) run a website which has lots of data recorded: user registrations, visits, clicks, what users post, etc. But so far we don't have a tool to monitor the whole thing or to find patterns in it, so that we can understand what kind of information we can get from it and management can make decisions based on it. In short, we want something similar to what people at Amazon or Google do with the data they collect.
Now, after the intro, I would like to know what this technology is called: is it data mining, machine learning, or something else? Where should we start to convert meaningless data into useful information?
I think what you need falls into the "realm" of parsing data, creating graphs, showing statistics about certain elements, etc.
There is no easy answer; I can only address parts of your question.
There are no pre-made magical analytical tools; big companies have their own backend tools, tuned to parse large amounts of data and spit out summaries that are then used to build graphs or for statistical analysis.
I think the domain you are searching for is statistical data analysis, but there are many parts that go together here.
The best advice I can give you is to set up specific goals for your analysis and then try to see what the best solution is; your question is too open-ended.
E.g., if you are interested in visits/clicks/website-related statistics, Google Analytics is a great tool, and very easy to use.
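As a minimal first step before adopting a full analytics stack, summary statistics can be pulled straight out of an event log with the standard library; the record fields and values below are hypothetical, just to show the shape of the approach:

```python
from collections import Counter

# Hypothetical raw event records, as a team might log them.
events = [
    {"user": "u1", "action": "visit", "page": "/home"},
    {"user": "u1", "action": "click", "page": "/signup"},
    {"user": "u2", "action": "visit", "page": "/home"},
    {"user": "u2", "action": "visit", "page": "/pricing"},
    {"user": "u3", "action": "click", "page": "/signup"},
]

# Simple summaries management could act on.
visits_by_page = Counter(e["page"] for e in events if e["action"] == "visit")
unique_users = len({e["user"] for e in events})

print(visits_by_page.most_common())  # [('/home', 2), ('/pricing', 1)]
print(unique_users)                  # 3
```

Once summaries like these prove useful, the same questions scale up through a proper pipeline (a database plus a dashboarding or statistics tool) without changing what is being measured.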