I have been tasked with finding a way to perform different actions or display different content based on the search terms someone used to find the site. I have seen other sites do this before (displaying my search terms to try to make the page look more relevant), but I don't know how it is done.
How can I fetch the search terms used to find the page?
I believe this is usually done by inspecting the HTTP Referer header. You don't mention which language you're using, but in PHP you can obtain it with $_SERVER['HTTP_REFERER'] (note the single "R": the header name is historically misspelled).
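For example, a minimal PHP sketch might look like the following. It assumes the visitor arrived from a search engine that still includes the query in the referring URL and that the engine uses a q parameter; neither assumption holds for every engine or every visit, and the referrer can be absent or spoofed, so treat it as a hint rather than reliable data.

    <?php
    // Read the referring URL, if the browser sent one at all.
    $referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

    $searchTerms = '';
    if ($referrer !== '') {
        // Extract the query string from the referring URL and look for a
        // "q" parameter, the name most major engines have used.
        $query = parse_url($referrer, PHP_URL_QUERY);
        if (is_string($query)) {
            parse_str($query, $params);
            if (isset($params['q'])) {
                $searchTerms = $params['q'];
            }
        }
    }

    if ($searchTerms !== '') {
        // Escape anything taken from the referrer before echoing it.
        echo 'You searched for: ' . htmlspecialchars($searchTerms, ENT_QUOTES, 'UTF-8');
    }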
I have this requirement:
We have a journal article, and we wish to have sections whose content is intended for either internal or external users of the application.
We are able to hide the content from rendering by implementing a custom template on the Web Content Display portlet and using a simple custom field on the user, which helps us classify the user type.
Having said that, when we search for something as an external user, the search portlet can return an article where the search text is part of the internal-only content, and because of the above-mentioned template that content is not visible.
In short, from the user's perspective the returned article does not match the searched term.
I would like some pointers on whether there is a mechanism to ensure that when an external user searches, we only search the dynamic elements of the document that match the user type.
We have thousands of such articles, and creating multiple copies of the same article does not seem like a viable solution, so any pointers would be a great help.
Liferay version: 6.2 GA4 CE
Thanks!
AJ
First of all: not finding a search term in a document can be a sign of synonym resolution working well in the search engine. It's questionable whether this behaviour is always wrong, or wrong only in this particular case. Remember Google bombs?
That being said, I believe that this architecture of half-visible documents is flawed from the beginning. Ideally I'd suggest changing it, for example by splitting the information into two articles, so that you can use the standard permissions to resolve visibility. If you link the two, you can determine which article or template to use and how. It's not an ideal solution, but it might be a workaround.
Another workaround might be to change Liferay's indexer component and index two different versions of the article, with two different permissions. Of course, you'll have to change the search side as well, so that each article is found at most once even though it's now in the search engine twice.
Again, not ideal, but this might be the quickest fix you can get right now without changing the underlying architecture. Changing the underlying architecture is, however, my actual recommendation.
I have quite a few parts of my pages that are constant across the site, and I'd like to exclude them from search results so they don't obscure the unique content on each page.
I read that class="nocontent" will do this for Google. But what about the other main search engines like Yahoo and Bing? Is there a globally accepted solution for this, or is there an additional step to get them to do the same?
Thank you for any assistance.
Google doesn't offer such a feature for general search. The nocontent class is only for Google Custom Search, and the googleon/googleoff comments are only for the Google Search Appliance.
Yahoo! introduced the robots-nocontent class in 2007. Google doesn't support it.
There is a microformats draft, but it probably has no support.
Despite that, there are some "hacks" that could accomplish what you need, but I wouldn't count on or use them. For example: inserting content with JS, or embedding content in an iframe (and blocking the source URL in robots.txt).
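As an illustration only (given the caveat above that these are fragile hacks), a minimal sketch of the iframe approach might look like this, where /boilerplate.html is a hypothetical page holding the repeated content; Yahoo!'s class from above is shown alongside it for comparison:

    <!-- Yahoo!-specific hint; not honoured by Google or Bing -->
    <div class="robots-nocontent">Repeated navigation or footer text.</div>

    <!-- Iframe hack: the constant part lives in a separate document -->
    <iframe src="/boilerplate.html" title="Site boilerplate"></iframe>

And in robots.txt, keep crawlers away from the embedded document:

    User-agent: *
    Disallow: /boilerplate.html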
[Image: a Google result for Wikipedia with a search box beneath it (http://blog.chomperstomp.com/img/wikipediaGoogle.png)]
How can I get a search box like this listed for my website?
You can't choose what to do with these links; it's under Google's control. More info here:
http://googleblog.blogspot.com/2008/03/search-within-site-tale-of.html
This feature will now occur when we detect a high probability that a user wants more refined search results within a specific site.
Essentially, you can't. This is only likely to occur on high-traffic sites with usage cases that lend themselves to this sort of search-within-search.
Looking through my search logs from time to time, I notice that by far the biggest user of my search engine is Googlebot. What gives? Is it looking for content that might not be directly accessible through navigation? If so, how does it know which words and phrases to look for (they're surprisingly relevant)? Does it check the most popular keywords on the site? I know I seem to be answering my own question here, but this is really only working it out from first principles. I'd like to hear from someone who knows what they're talking about (i.e. not me).
If your search form's method is GET instead of POST, each search has its own URL, and people might be posting those URLs elsewhere. Or if you have a (possibly inadvertently) publicly accessible web-stats page that lists those URLs, that's another common way for search engines to stumble upon your internal search URLs. A third way I've seen is sites that list recent searches on their pages, but this is more intentional. "MySQL Performance Blog" does this to an annoying extent, so any search of their site from Google yields hundreds of pages of similar searches, even if none of them found what they were looking for.
Edit: It looks like it does on occasion, but only with GET forms:
http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html
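To make the GET/POST distinction concrete: a plain GET form like the one below (the names are illustrative) gives every search its own URL, exactly the kind of address that can be linked, logged and crawled, whereas a POST form does not.

    <form method="get" action="/search">
      <input type="text" name="q">
      <input type="submit" value="Search">
    </form>

    <!-- Submitting "blue widgets" requests /search?q=blue+widgets,
         an ordinary URL that Googlebot can discover and revisit. -->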
Google will use words that occur on your site in search boxes to try to find pages that it can't reach otherwise.
Google says that for the past few months, it has been filling in forms on a "small number" of "high-quality" web sites to get back information. What words has it been entering into those forms? Words automatically selected that occur on the site, with check boxes and drop-down menus also being selected.
http://searchengineland.com/google-now-fills-out-forms-crawls-results-13760
Let's say I'm performing a Google search for "search term".
Sometimes, one of the suggestions will be to a URL like this: www.someothersearch.com/search+term/
How does "someothersearch.com" do this?
In general, a page will only be in Google if some other page links to it. Google is not going to go to someothersearch.com and submit "search term" into the form; more likely there is a hidden or visible link to that URL somewhere on someothersearch.com.
Why not? someothersearch.com presumably has its own index pages for terms searched previously; the Google spider is just indexing those index pages as well.
Just a guess. Maybe these sites support OpenSearch?
I misunderstood your question at first; what these sites are doing is rewriting their requests. How they know which terms people will search for is a bit of a mystery to me, but it probably relies on things like watching google.com/trends, scraping their own and other log files for referrals from Google that include the search term, buying lists of well-ranking terms that people might otherwise use AdSense for and trying to generate natural traffic for them instead... etc. Probably, when they add new pages for these terms, they also add them to the XML sitemap that Google will crawl.
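The URL-rewriting part itself is simple. A rough sketch for Apache with mod_rewrite, assuming a hypothetical search.php handler, would be a rule in .htaccess like this:

    RewriteEngine On
    # Map /blue+widgets/ to /search.php?q=blue+widgets
    RewriteRule ^([^/]+)/$ /search.php?q=$1 [L,QSA]

The site then only has to publish links (or sitemap entries) for the pretty URLs it wants Google to index.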
Redacted:
I have added the Open-Search tag to your question; please follow it. You'll find this post the most informative: https://stackoverflow.com/questions/20830/firefox-and-ie7-users-here-is-your-stackoverflow-search-plugin. However, I recommend you use image/png for your icon format.
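For reference, a minimal OpenSearch description document looks roughly like this (example.com and the file names are placeholders); note the image/png icon type recommended above:

    <?xml version="1.0" encoding="UTF-8"?>
    <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
      <ShortName>Example Search</ShortName>
      <Description>Search example.com</Description>
      <!-- The browser substitutes the user's query for {searchTerms}. -->
      <Url type="text/html" template="http://example.com/search?q={searchTerms}"/>
      <!-- A 16x16 PNG icon, per the recommendation above. -->
      <Image height="16" width="16" type="image/png">http://example.com/search-icon.png</Image>
    </OpenSearchDescription>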