Increase Snippet Size on Google Search Appliance - search

I am currently working on customizing a corporate GSA box for the client's website with Google's XSL stylesheet.
Unfortunately I do not have direct access to the box; any change involves a meeting with a colleague in another timezone, so my opportunities to experiment and learn are very limited.
One of the biggest issues we are having is whether or not it is possible to get more characters/words into a resulting search snippet in the XML output. More specifically, the returned field.
I've gone through a lot of the documentation for this and so far I've only found the tlen value for Title length, but not snippet length.
I know there are some parameters (hidden fields) to customize some options in the search form, but I am not finding anything relating to this. Since I cannot access the Administrator Control Panel itself, I have no idea what options are there. Can anyone point me to something that will help with this? It's greatly appreciated, I'm striking out on this.
BTW; we are at the current Version 6.14 I believe.

The GSA administrator can increase the length of the snippet. This is a system-wide setting and cannot be customized at query time.
See this

http://www.google.com/support/enterprise/static/gsa/docs/admin/70/admin_console_help/serve_query_expansion.html#snippets
Changing the Snippet Length in Search Results
By default the Search Appliance, for most languages, will return Snippets with the length of 160 characters. Some exceptions to this rule are CJK languages for which Snippets are by default 240 characters. To Change the Snippet Length:
Under Snippet Generation, type a number in the Snippet Length box. The number must be bigger or equal to 0 and not bigger than 1024.
Click Apply Snippets Generation Settings.
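Once the administrator has changed the setting, one way to confirm the effect without console access is to measure the snippets in the XML output. The following is a minimal sketch, not an official tool: the sample document is hypothetical, and the element names (`R` for a result, `U` for its URL, `S` for its snippet) are an assumption about the shape of the appliance's XML that should be checked against the real output.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like search-appliance XML output.
# The element names R, U, T and S are assumptions, not taken from the question.
SAMPLE_XML = """<GSP VER="3.2">
  <RES SN="1" EN="2">
    <R N="1">
      <U>http://example.com/a</U>
      <T>First result</T>
      <S>Some &lt;b&gt;snippet&lt;/b&gt; text for the first result ...</S>
    </R>
    <R N="2">
      <U>http://example.com/b</U>
      <T>Second result</T>
      <S>Short one</S>
    </R>
  </RES>
</GSP>"""

# Snippets may carry escaped highlighting markup; strip it before counting.
TAG_RE = re.compile(r"<[^>]+>")


def snippet_lengths(xml_text):
    """Return a list of (url, visible snippet length) pairs, one per result."""
    root = ET.fromstring(xml_text)
    lengths = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        raw_snippet = result.findtext("S", default="")
        visible = TAG_RE.sub("", raw_snippet)
        lengths.append((url, len(visible)))
    return lengths


if __name__ == "__main__":
    for url, length in snippet_lengths(SAMPLE_XML):
        print(url, length)
```

Comparing these lengths before and after the change shows whether the new limit is actually being applied.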

Related

how to access google define feature in a batch

Suppose I have a huge set of noisy phrases. For each one of them, I want to check if it is defined by some resources by using the google define feature. Once I type "define my_phrase" to the google search box, if the retrieved results contain the definition panel (e.g. https://www.google.com/#q=define+home+cooking), I put it into my phrase pool.
I'm wondering whether it is possible to do this task in a batch, so that I don't have to type each phrase manually one by one. It would be great if this could be achieved from a Unix terminal, but Windows is also welcome!
I heard of google-app-engine but I only have a rough idea and not sure if it could help.
Thanks!
As a starting point, you could try the Google Custom Search API. See the API reference for XML results:
https://developers.google.com/custom-search/docs/xml_results?hl=en&csw=1#XML_Results
Be aware of:
google TOS for this service
quantity courtesy limit
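The batching part can be sketched as follows. This only builds one request URL per phrase and does not send anything. It assumes the Custom Search JSON API endpoint rather than the XML results format linked above, and `api_key` and `engine_id` are placeholders you would have to supply. Whether the response exposes the definition panel at all is not something the answer confirms.

```python
from urllib.parse import urlencode

# Custom Search JSON API endpoint (an assumption: the answer above links the
# XML results reference, not this endpoint).
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_define_requests(phrases, api_key, engine_id):
    """Build one request URL per non-empty phrase, prefixed with 'define'."""
    urls = []
    for phrase in phrases:
        phrase = phrase.strip()
        if not phrase:
            continue  # skip blank lines from a noisy input file
        params = {"key": api_key, "cx": engine_id, "q": "define " + phrase}
        urls.append(API_ENDPOINT + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    # Hypothetical key and engine id, for illustration only.
    for url in build_define_requests(["home cooking", "noisy phrase"], "KEY", "CX"):
        print(url)
```

When actually sending these requests, pace them so the batch stays within the terms of service and the quota mentioned above.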

Sharepoint 2010 - Pass "u" parameter to advanced search

The scenario - I am building a site to house a number of reports - thirty or so subsites under a main web for different report categories, and several libraries in each site, one for each separate report. In total, about 600 reports (libraries) across the thirty report categories (sites). This design has been decided on, and cannot change.
I plan/want to have a single advanced search page to search all the reports, using various custom metadata columns. That bit's easy, I can do that out of the box.
One of the most important search criteria is which report on which to search, of which, as I mentioned, there are many. The dictate is to make the report type added "invisibly" - they will select the report category, then the report type, and THEN get presented with the search page. The search should "know" which report is being searched on.
Scope selection is not a viable option, as there's too many libraries, and more will be added as new reports are created.
Now, I can get the results I want in the results if I add the "u" parameter to the URL as in;
results.aspx?k="RunDate=1/23/13"&U=http://site/report_type/library"
(address left unescaped for clarity)
My challenge is finding a way to feed that parameter TO the advanced search, and get it to tack it on to the end of its generated query.
I'm confident it can be done with only a little fidgeting to the webpart, but I need a bit of a shove in the right direction.
Or, as always, if y'all have a more brillianter idea, I could do that.
Now, I have a second issue where the different reports have their own varying set of metadata columns, and they only want the RIGHT ones to show up for each report, but one crisis at a time.
EDIT - upon further research, it seems I can't extend the advanced search webpart, as it's a sealed type. Has anyone either a way around that, or have a third-party advanced search page that I CAN crack into?
I was able to find a solution to this issue by overriding the JavaScript function NavigateTo(url) which is responsible for the redirect. My solution can be found here
What you are actually asking about is a contextual search box, as the u parameter resembles the contextual search scope.
I'm not sure that the standard search box can be configured the way you want it to, so it always adds the query string u=<current url>. I think you will have to resort to some (even if simple) code.
An example you can find here: Create a SharePoint Contextual Search Box in a Content Editor Web Part.
Of course you could do the same thing with server-side code, but as you only want to add a query string parameter, JavaScript should be enough.
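Whichever side builds the redirect, the library URL has to be escaped before it is appended. Here is a minimal sketch of the escaped form of the address from the question, written in Python purely for illustration (the answers above suggest JavaScript). The function name `build_results_url` is hypothetical; only the `k` and `u` parameter names come from the question.

```python
from urllib.parse import urlencode


def build_results_url(results_page, keyword_query, library_url):
    """Append the keyword query (k) and the contextual scope (u), URL-escaped."""
    params = {"k": keyword_query, "u": library_url}
    return results_page + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_results_url(
        "results.aspx",
        "RunDate=1/23/13",
        "http://site/report_type/library",
    ))
```

The same escaping applies in JavaScript, where `encodeURIComponent` plays the role of `urlencode` for each value.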

can you have "variables" in text in google sites?

Sorry, this is a bad question. I don't even know what the title should be. I'm a total noob at making websites so this might be easy to find but I just don't know the terminology to search for. I cannot find anything about how to do this...
What I want to do is have something like references/variables that I can use in a block of text and it will automatically get replaced with whatever value should be there. Best way I can think of to describe it would be if I was using the site as a design doc for a game or something, I would be able to type in [Title] or something similar on any page and when it loads that text would be replaced with whatever my Title is. That way If I ever change titles, names, classes, races, places, items, etc... they would only have to be changed in 1 place and the change would be reflected everywhere.
I notice if I add a link to a page it will automatically use the Title of that page as the text of the link. That is almost exactly what I want. Except when I change the Title of the other page the text of the link remains as the original text. It doesn't get updated to the new Title and that is not at all what I want.
Also, I want to do this in Google Sites and as simply as possible. I don't really want to use a database. I was hoping Google Sites would have some kind of functionality for this.
I don't believe this is possible (on Google Sites) and likely you need to consider a hosted solution.
Quoting the answer from this relevant post:
You should consider hosting your solution using Google's App Engine instead of Google Sites. You can set it up so it uses PHP (see link below), you can configure it to use your domain name and you get enough CPU, disk and bandwidth allowance to serve around five million page views for free each month, if you are serving more than that, their prices are extremely competitive.
Google App Engine: http://code.google.com/appengine/docs/whatisgoogleappengine.html
How to setup PHP using Google App Engine: http://blog.caucho.com/?p=187
Also I'm not sure how your PHP skills are but if you're unfamiliar with it then this should help to get you started.
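On a hosted solution, the substitution itself is small. Below is a minimal sketch of the `[Title]`-style replacement described in the question, written in Python rather than the PHP suggested above. The `render` function and the sample values are hypothetical. Unknown placeholders are left untouched so that a typo stays visible on the page.

```python
import re

# Matches placeholders such as [Title] or [Main_City].
PLACEHOLDER_RE = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def render(text, values):
    """Replace each [Name] with values[Name]; leave unknown names as written."""
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return PLACEHOLDER_RE.sub(substitute, text)


if __name__ == "__main__":
    # Hypothetical design-doc values, defined in exactly one place.
    values = {"Title": "Ashfall", "Race": "Dwarves"}
    print(render("Welcome to [Title], home of the [Race].", values))
```

Changing a value in the one dictionary then changes every page that is rendered through `render`.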

CAPTCHAS and img name /displayed value association

I am having trouble summing up this question in one line, so I will first describe my understanding, in case the question rests on wrong facts or assumptions. As I understand it (and please correct me if I am wrong), captchas work like so:
Have numerous images, and associate each image name/source to its displayed characters value.
Display image, then have user input what they see.
Match user's input against the character value which is associated with that image's name/source.
Assuming my understanding is correct: given an unlimited amount of time, can't one associate image names/sources with the displayed characters, increasing the chance of cracking the captcha as more associations are gathered?
In that case, wouldn't the security strength of a captcha be proportional to the size of the image database?
NOTICE: As I suspected, my question was based on a wrong understanding.
Short answer: these are dynamic images and they are not stored anywhere. You won't even find them in the source code.
Wikipedia has good explanation about this. Alternatively check out the related questions in SO.
Edit: Go to this page, where you can see an example of a captcha. Use Firebug to see the HTML code for this image and you will see something like this.
<img height="57" width="300" src="http://www.google.com/recaptcha/api/image?c=03AHJ_VutaG4ahxWuQv0e6edYypp_FM8QuFIZkG75AnAm8iu3WRmwQ41jqcvojmKmbSKXxkf_s9fk61-axEp77_omKZZEYliE35BND_hXNh3Jac6ZUAeD08wOMZPj4W2s-A39vAI84eim5q-z9kFnmoSmon1jG2LmmFw" style="display: block;">
Did you notice the source? It does not point to an image file.
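To make the "dynamic" part concrete, here is a minimal sketch of the general idea: the server generates a fresh random answer per request, keeps it under an opaque token, and accepts that token only once. This is not reCAPTCHA's actual implementation. The class name `CaptchaStore` and its methods are hypothetical, and the image rendering step is left out entirely.

```python
import secrets
import string


class CaptchaStore:
    """Keeps the expected answer for each issued challenge, server-side only."""

    def __init__(self, length=6):
        self._length = length
        self._pending = {}  # token -> expected answer

    def issue(self):
        """Create a new random challenge and return its opaque token."""
        alphabet = string.ascii_uppercase + string.digits
        answer = "".join(secrets.choice(alphabet) for _ in range(self._length))
        token = secrets.token_urlsafe(16)
        self._pending[token] = answer
        return token

    def answer_for_rendering(self, token):
        """Text the server would draw into a distorted image for this token."""
        return self._pending.get(token)

    def verify(self, token, user_input):
        """Check the input once; the token is discarded whether or not it matches."""
        expected = self._pending.pop(token, None)
        if expected is None:
            return False
        supplied = user_input.strip().upper()
        return secrets.compare_digest(expected.encode(), supplied.encode())


if __name__ == "__main__":
    store = CaptchaStore()
    token = store.issue()
    print(store.verify(token, store.answer_for_rendering(token)))
```

Because every token maps to a newly generated answer and is valid only once, collecting URL-to-text associations over time gives an attacker nothing to reuse.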
You can copy this URL and generate the image (just open it in a browser). So you could develop an application that downloads this image, scans for color changes in pixels, and tries to match letters and numbers. But notice that almost all the letters and numbers are connected and close together, so it is difficult to separate the individual characters.
Even if you manage to separate them, most of the letters are not perfect. Example:
(source: watblog.com)

How to get a description of a URL

I have a list of URLs and am trying to collect their "descriptions." By description I mean what comes up, for example, if you Googled the link. For example, Googling http://stackoverflow.com shows the description as
A language-independent collaboratively
edited question and answer site for
programmers. Questions and answers
displayed by user votes and tags.
This is the data I'm trying to accumulate for the URLs I have.
I tried parsing the URL's meta-descriptions, however most of them are lacking a meta-description (yet Google and other search engines manage to get a description somehow).
Any ideas? Should I just "google" each link and scrape the data? I have a feeling Google wouldn't like this...
Thanks guys.
Different search engines have different algorithms to get the description out of the page if/when they are lacking the description meta tag. Some ignore the tag even if it's there.
If you want the description Google has, the most accurate way to get it would be to scrape it. Otherwise, you could write your own or look around on the web for code that does it.
These are called snippets.
Google use proprietary (and possibly patented) methods to garner this information, so there is no simple answer.
As you suggest, they will use meta-description information if it is there. (How to set the meta-information to help Google.)
They will also honour requests from the page authors to NOT include snippets. (How to prevent Google from displaying snippets) You should probably respect this too (as well as robots.txt, of course.)
You may have some luck with existing auto-summary packages, such as OTS.
You may want to check AboutUs.org (i.e. http://www.aboutus.org/StackOverflow.com).
But, there's little chance that the site will have an aboutus page and not have a meta description.
Some info that might explain how google does this:
Webmasters/Site owners Help
Adding a URL to google
I am not familiar with Google APIs, but perhaps there is an official way to get such information.
Interesting. Some sources are better than others.
For "audiotuts.com" google has a worse description than AboutUs.com.
Google
Nov 18th in General by Joel Falconer ·
1. Recently, an AUDIOTUTS reader asked me about creative process. While this
is a topic that can’t be made into a
...
AboutUs.com:
AUDIOTUTS is a blog/tutorial site for
musicians, producers and audio
junkies! It is the sister site of the
popular PSDTUTS, VECTORTUTS and
NETTUTS.
I hate problems like these... they should be trivial but they aren't!
If you can assume English content, you can first look for Meta Description, and if that doesn't work, you can look for the first two or three sentence-like word sequences.
A product I worked on looked for the first P or DIV that contained more than one sequence of > n "words" delimited by periods. It would use the two or three sentence-like sequences, up to x total words, as a summary paragraph. It wasn't 100% accurate, but good enough for the average case. The number of words was adjusted a few times to eliminate things like navigation elements.
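The heuristic from the last two paragraphs can be sketched roughly as follows. This is an illustration under stated assumptions, not the product's actual code. The thresholds `min_words`, `max_sentences` and `max_words` stand in for the "n" and "x" mentioned above, and their default values are guesses. It also assumes reasonably well-formed HTML in which `p` and `div` elements are explicitly closed.

```python
import re
from html.parser import HTMLParser


class _DescriptionParser(HTMLParser):
    """Collects the meta description and the text of each P/DIV block."""

    def __init__(self):
        super().__init__()
        self.meta_description = None
        self.blocks = []  # text chunks per P/DIV, in document order
        self._open = []   # indices of the currently open P/DIV blocks

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            content = (attrs.get("content") or "").strip()
            if content and self.meta_description is None:
                self.meta_description = content
        elif tag in ("p", "div"):
            self.blocks.append([])
            self._open.append(len(self.blocks) - 1)

    def handle_endtag(self, tag):
        if tag in ("p", "div") and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text is credited to the innermost open P/DIV only.
        if self._open:
            self.blocks[self._open[-1]].append(data)


def describe(html, min_words=4, max_sentences=3, max_words=40):
    """Return the meta description, else the first sentence-like block, else None."""
    parser = _DescriptionParser()
    parser.feed(html)
    if parser.meta_description:
        return parser.meta_description
    for chunks in parser.blocks:
        text = " ".join("".join(chunks).split())
        sentences = [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]
        # A "sentence" here ends with a period and has more than min_words words.
        good = [s for s in sentences if s.endswith(".") and len(s.split()) > min_words]
        if len(good) > 1:
            words = " ".join(good[:max_sentences]).split()
            return " ".join(words[:max_words])
    return None
```

Navigation blocks such as a `div` full of short links fail the sentence test and are skipped, which is the effect the word-count tuning above was after.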
