How can I prevent certain elements from being displayed in the Google search excerpt?

Currently Google displays elements in the result excerpts that belong to the functional part of the site. Is there a way to prevent these elements from being crawled or displayed in Google?
Like Edit, Delete, etc. in the example above.

To exclude whole pages from Google's index, block them using the robots.txt file; if it is just certain content, use the rel="nofollow" attribute on the links pointing to it.
Hope this helps.
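For example, a minimal robots.txt sketch, assuming those functional actions are served under their own paths (the /edit/ and /delete/ paths here are hypothetical placeholders):

    User-agent: *
    Disallow: /edit/
    Disallow: /delete/

And if it is just individual links you want crawlers not to follow, a nofollow attribute on each one (again with a made-up URL):

    <a href="/edit/42" rel="nofollow">Edit</a>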

Update on my particular situation here: I just found out that the frontend code had been generated in a way that made the title and the meta description identical.
Google is smart enough to recognize that if the copy is already displayed in the title of the search result there is no reason to add it to the excerpt as well; instead it looks for content, believed to be valuable, from the actual page.
Lessons learned:
there is no way to hide elements from Google while keeping them visible to your users
if you'd like to have control over the content displayed in Google searches, avoid using the same copy in your title and description
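For example, keeping the two distinct (the site name and copy here are invented placeholders):

    <title>Acme Widgets - Product Catalog</title>
    <meta name="description" content="Browse the full range of Acme widgets, compare models and prices, and order online.">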

Related

Search result: How to show only pages, not different content items?

We are using Liferay as a classic CMS, meaning that we compose pages using web content articles. There is an issue with Liferay's internal search for which I have not yet found a proper answer:
Because web content articles are pretty much only building blocks for pages, we don't want the search to show them as distinct items. The user should only get a list of pages that contain their search keywords, including all the articles placed on those pages.
At the moment we can see two different approaches and both come with certain problems we could not solve yet:
Idea 1
We modify the journal indexer and try to obtain all URLs of the pages (how?) on which the article has been placed. Then we add them to the document to be indexed. In the search results we can then access the URLs and collect them. In the end we make sure every URL is only shown once.
Idea 2
At some point Liferay renders the entire page before sending it to the browser. If we could somehow hook an indexer in there, we could index the entire page. We could then limit the search to these special "page documents". Getting the fully rendered page would be the main issue here, because either we would have to run a crawler to trigger this indexing frequently, or we would need to find a way to trigger page rendering from within an indexer, or something like that.
I have been carrying this problem around for quite a while now and still could not find an idea good enough to spend time trying it out. If any of you have some input on those two ideas, or maybe an entirely different approach, I would be extremely grateful.
I'll just answer myself, because by now we found a suitable solution to solve our problem:
In addition to the default search portlet, there is also a "Web Content Search Portlet" shipped with Liferay. It seems to have been part of Liferay for quite a while now, but it's somewhat hard to find, because there is hardly any documentation for it (I only found the Liferay wiki page, which isn't really anything at all). It searches only within web content articles and shows links to the pages rather than just a link to an isolated view of the article. It has far fewer configuration options than the default search portlet, however; pretty much all it allows you to change is whether articles actually have to be placed on at least one page to show up in the results.
So there is no need for any kind of custom indexer or any other "hack" - all we need to do is use the correct portlet. We will only need to write a hook that changes the appearance of the result page.
What you ask is interesting, but your ideas are headed in the wrong direction.
Idea 2 especially is wrong, because you cannot do indexing work while a page is being rendered; think of the performance alone.
In Liferay, pages and assets are not directly linked: pages have portlets, and portlets display assets (web content and more).
Liferay's indexing scans the assets' content, not the rendered output of the assets. Think about permissions: the same page can display different content depending on the user who is looking.
bye

Drupal 7 -- Publishing and Printing Content Based on Search Results

I am currently working on a project which requires content to be published onto a view or page depending on a search result criteria. For example: I search through my content for the word dog and this word appears on 4 of 20 pieces of content. I wish to view all of those items on a page that is not the Search Results page, but rather one that displays all the content found, so I can print each piece of content.
I apologize if this post is awkwardly worded. At this moment it is just an idea and I am trying to get a better picture of how to change publishing based on search results to a certain area.
Thank you for your time -- and if anyone wishes to ask follow up questions, I'd be more than willing to help clarify.
You can use a view with an exposed filter. Create a view, add a filter criterion there, then in its settings check "Expose this filter to visitors, to allow them to change it". The user will see a form in the view, which you can also make separate from the view by setting "exposed form" to "yes" and putting it in a separate block.

How to make the "text/contents" of a wiki page searchable?

I have created a page on a wiki and I want to make the contents of this page searchable via the wiki search option.
By wiki I mean Wikipedia.
E.g. the title/heading of the page is "ABCDEFG"; if someone searches for "ABCD" in the wiki search, this page should appear in the result list.
Maybe it's possible by adding tags to the wiki page, but I don't know how to add meta tags in a wiki. Or does someone know some other way?
Thanks in advance.
Everything in the page (both title and content) will be searched, so when your page contains the word it will be found.
You could force the match by creating a redirect from ABCD to ABCDEFG, although that is rather useless when the redirect title is the first part of the actual title - people will find it through the search autocompletion/suggestions anyway.
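A redirect is just a one-line wiki page: create a page titled ABCD (the names here come from the example above) whose entire content is

    #REDIRECT [[ABCDEFG]]

Anyone searching for or visiting ABCD is then forwarded to ABCDEFG.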
Note that the indexing of newly created pages can take its time, especially on large wikis like Wikipedia. Your page might not be found instantly after you saved it.
In order to be found this way, the page has to contain ABCD in its title or content. Of course users will find it if they search for ABCD*, but in practice nobody does this.
The following page helped me a lot in solving my issue:
http://www.imagwiki.nibib.nih.gov/mediawiki/index.php?title=Creating_a_New_Wiki_Page

Handle default web page with little information for search?

I would like to garner opinions. We've created a website for a gay members club, and they wanted the default landing page to be mysterious, with little information on it.
As such the Default.aspx only contains a form asking for some personal details. Users can click a button to skip this content and go to an AboutUs page.
The problem is, because we cannot control what information Google uses for the site description in search results, it is picking up the form's fields - which obviously do not make sense as a description.
I think there are two options to counter this:
Use Robots.txt to block access to Default.aspx and only allow access to AboutUs.aspx
Write a description and title in an H1 tag but make the text colour the same as the background colour
Could I get opinions on which method people think is best for search results?
Thanks.
I would not block Google or try to deceive it.
Make sure the title tag for the page is good and descriptive: around 70 characters to explain what the website is about.
The same goes for your meta description: about two sentences to continue on from the title information.
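A sketch of what that could look like here (the club name and copy are invented placeholders):

    <title>The Oak Room - Private Members Club, London</title>
    <meta name="description" content="A discreet private members club for gay men in London. Apply for membership, or skip ahead to learn who we are and what we offer.">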

How to get a description of a URL

I have a list of URLs and am trying to collect their "descriptions." By description I mean what comes up if you Google the link. For example, Googling http://stackoverflow.com shows the description as:
    A language-independent collaboratively edited question and answer site for programmers. Questions and answers displayed by user votes and tags.
This is the data I'm trying to accumulate for the URLs I have.
I tried parsing the URLs' meta descriptions; however, most of them lack a meta description (yet Google and other search engines manage to get a description somehow).
Any ideas? Should I just "google" each link and scrape the data? I have a feeling Google wouldn't like this...
Thanks guys.
Different search engines have different algorithms to get the description out of the page if/when it is lacking the description meta tag. Some ignore the tag even if it's there.
If you want the description Google has, the most accurate way to get it would be to scrape it. Otherwise, you could write your own or look around on the web for code that does it.
These are called snippets.
Google uses proprietary (and possibly patented) methods to garner this information, so there is no simple answer.
As you suggest, they will use meta-description information if it is there. (How to set the meta-information to help Google.)
They will also honour requests from the page authors to NOT include snippets. (How to prevent Google from displaying snippets) You should probably respect this too (as well as robots.txt, of course.)
You may have some luck with existing auto-summary packages, such as OTS.
You may want to check AboutUs.org (i.e. http://www.aboutus.org/StackOverflow.com).
But, there's little chance that the site will have an aboutus page and not have a meta description.
Some info that might explain how Google does this:
Webmasters/Site owners Help
Adding a URL to Google
I am not familiar with Google APIs, but perhaps there is an official way to get such information.
Interesting. Some sources are better than others.
For "audiotuts.com", Google has a worse description than AboutUs.com.
Google:

    Nov 18th in General by Joel Falconer · Recently, an AUDIOTUTS reader asked me about creative process. While this is a topic that can’t be made into a ...

AboutUs.com:

    AUDIOTUTS is a blog/tutorial site for musicians, producers and audio junkies! It is the sister site of the popular PSDTUTS, VECTORTUTS and NETTUTS.
I hate problems like these... they should be trivial but they aren't!
If you can assume English content, you can first look for the meta description, and if that doesn't work, you can look for the first two or three sentence-like word sequences.
A product I worked on looked for the first P or DIV that contained more than one sequence of more than n "words" delimited by periods. It would use the two or three sentence-like sequences, up to x total words, as a summary paragraph. It wasn't 100% accurate, but good enough for the average case. The number of words was adjusted a few times to eliminate things like navigation elements.
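A minimal sketch of that approach in Python, assuming the requests and beautifulsoup4 packages; the thresholds are invented placeholders, not the product's real n and x:

    import re
    import requests
    from bs4 import BeautifulSoup

    MIN_WORDS = 5           # hypothetical stand-in for "n words"
    MAX_SUMMARY_WORDS = 50  # hypothetical stand-in for "x total words"

    def describe(url):
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")

        # First choice: the meta description, when the page has one.
        meta = soup.find("meta", attrs={"name": "description"})
        if meta and meta.get("content", "").strip():
            return meta["content"].strip()

        # Fallback: the first <p> or <div> holding two or more
        # sentence-like runs of at least MIN_WORDS words each.
        for block in soup.find_all(["p", "div"]):
            text = " ".join(block.get_text(" ", strip=True).split())
            sentences = [s for s in re.split(r"\.\s+", text)
                         if len(s.split()) >= MIN_WORDS]
            if len(sentences) >= 2:
                summary = ". ".join(sentences[:3])
                return " ".join(summary.split()[:MAX_SUMMARY_WORDS])
        return None

    print(describe("http://stackoverflow.com"))

Since divs are visited in document order, a wrapper div can win over the paragraph inside it, so in practice you would tune the tag list and thresholds the same way the answer describes.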
