Solr highlighting - search

I saw this post here; it explains well how to show a highlighted result, but it is not going to work for me.
I am getting the highlighting lst, but it contains much less text than the original response without highlighting.
How do I merge the highlighted content with the original result set in PHP?

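Try hl.fragsize to increase the size of the highlighted snippets returned by Solr. The default is 100 characters; with the default highlighter, a value of 0 returns the whole field value instead of fragments.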
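
For the merging part of the question, here is a minimal sketch of one way to do it. It is written in Python rather than PHP, but the same loop translates directly to PHP arrays. It assumes the response was requested as JSON (wt=json), that the uniqueKey field is called "id", and that the sample data below is purely hypothetical. The highlighting section is keyed by each document's uniqueKey, so the snippets can be overlaid on the stored field values:

```python
def merge_highlighting(solr_response, id_field="id"):
    """Return the docs with highlighted snippets overlaid on the stored field values."""
    highlighting = solr_response.get("highlighting", {})
    merged = []
    for doc in solr_response["response"]["docs"]:
        doc = dict(doc)  # copy so the original response is left untouched
        snippets = highlighting.get(str(doc[id_field]), {})
        for field, fragments in snippets.items():
            if fragments:  # fields without a match come back empty or are missing
                doc[field] = " ... ".join(fragments)
        merged.append(doc)
    return merged


# Hypothetical response, shaped like Solr's wt=json output
response = {
    "response": {"docs": [
        {"id": "1", "title": "Solr highlighting", "text": "Full stored text about Solr."},
        {"id": "2", "title": "Other", "text": "Nothing relevant here."},
    ]},
    "highlighting": {
        "1": {"text": ["stored text about <em>Solr</em>"]},
        "2": {},
    },
}

merged = merge_highlighting(response)
```

Fields that have no highlighted fragment keep their original stored value, so every document can still be rendered even when only some of its fields matched.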

I would recommend using a non-default highlighter to get much better results.
Search your solrconfig.xml for your request handler:
<requestHandler name="/select" class="solr.SearchHandler">
and add the following (typically inside its <lst name="defaults"> block):
<str name="hl.usePhraseHighlighter">false</str>
<str name="hl.useFastVectorHighlighter">true</str>
<str name="hl.boundaryScanner">breakIterator</str>
Now go to the highlighter section:
<searchComponent class="solr.HighlightComponent" name="highlight">
and search for this line:
<boundaryScanner name="default" default="false" class="solr.highlight.SimpleBoundaryScanner">
Make sure it is set to default="false".
Afterwards, configure the boundary scanner referenced above as the default:
<boundaryScanner name="breakIterator" default="true" class="solr.highlight.BreakIteratorBoundaryScanner">
For this scanner change the type to "SENTENCE":
<str name="hl.bs.type">SENTENCE</str>
Also set your language and country (hl.bs.language and hl.bs.country).
This change gave me much better highlighting results!
Oh, I almost forgot to mention the changes in schema.xml. Find the field you want to highlight and add the following options, which the FastVectorHighlighter requires:
termVectors="true" termPositions="true" termOffsets="true"

Related

XML conditional duplicate filtering

I want to filter an XML document.
Let's use a simple example:
Based on the code below, IF <engine> appears MORE than once in the document, WITHIN the <car> tags, then each occurrence must be pointed out.
<car>
<engine>
</engine>
//THIS SHOULD THEN BE POINTED OUT//
<engine>
</engine>
//THIS SHOULD THEN BE POINTED OUT//
</car>
So it must show these at the click of a button (only if the conditions are true).
Maybe some sort of program or XML schema could do this?
Any help would be much appreciated. Thanks in advance.
I have tried XML Schema and Altova as well as Excel.
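
One possible approach is a small script instead of a schema. The sketch below uses Python's standard xml.etree.ElementTree module. It assumes the repeated element is <engine> and the enclosing element is <car>, as in the example above, and the <garage> wrapper in the sample is made up for illustration. It reports every occurrence of a child element that appears more than once directly inside the same parent:

```python
import xml.etree.ElementTree as ET


def find_repeated_children(xml_text, parent_tag="car", child_tag="engine"):
    """Return (parent_index, occurrence_number) for every repeated child element."""
    root = ET.fromstring(xml_text)
    flagged = []
    # root.iter() also yields the root itself, so a top-level <car> is covered
    for parent_index, parent in enumerate(root.iter(parent_tag), start=1):
        children = parent.findall(child_tag)  # direct children only
        if len(children) > 1:
            for occurrence in range(1, len(children) + 1):
                flagged.append((parent_index, occurrence))
    return flagged


# Hypothetical sample: the first car has two engines, the second has one
sample = (
    "<garage>"
    "<car><engine/><engine/></car>"
    "<car><engine/></car>"
    "</garage>"
)

for parent_index, occurrence in find_repeated_children(sample):
    print(f"car #{parent_index}: engine occurrence {occurrence} should be pointed out")
```

Only parents that actually contain more than one matching child produce output, which matches the "only if the conditions are true" requirement.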

Semantically correct way to add a copyright notice into a svg file?

I want to add a copyright notice to my SVG files; it should be only "hidden" text, not a watermark.
This is no real protection, because anyone who opens an SVG file in a text editor can edit everything and delete the copyright. But I think it would be a simple way to show who made the file, and it offers a chance to find unlicensed graphics: if the hidden information is there, anyone looking for it can easily find it.
My main question is: how should the copyright text be put into the file?
<title> element is for accessibility purposes, some user agents display the title element as a tooltip.
<desc> element generally improves accessibility and you should describe what a user would see.
Ugly way: a text element with inline CSS to hide it. Don't even think about this! :)
<!--Copyright info here--> could also be a simple solution.
<metadata>: this would be the best way, but I did not find a detailed definition of which child elements can live inside it. Also, https://developer.mozilla.org/en-US/DOM/SVGMetadataElement gives a 404.
Under https://www.w3.org/TR/SVG/struct.html#MetadataElement we can find more details. But is RDF really necessary?
I think a <metadata> element is the right place, but which child elements should be used, and is RDF the only way to go?
I think the metadata element is the correct choice here. It has to contain XML, but it doesn’t have to be an RDF serialization (e.g., RDF/XML).
But I think it makes sense to use RDF here, because that’s exactly RDF’s job (providing metadata about resources, like SVG documents), and there is probably no other XML-based metadata language that has greater reach / better support.
A simple RDF statement (in RDF/XML) could look like this:
<metadata>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:schema="http://schema.org/">
<rdf:Description rdf:about="http://example.com/my-svg-file.svg">
<schema:license rdf:resource="https://creativecommons.org/licenses/by-sa/4.0/"/>
</rdf:Description>
</rdf:RDF>
</metadata>
The about attribute takes an IRI as value; for a stand-alone SVG document, you could provide an empty value (= the base IRI of the document).
In this example I use the license property from Schema.org:
A license document that applies to this content, typically indicated by URL.
(The vocabulary Schema.org is supported by several big search engines.)
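
To illustrate the "easy to find if you look for it" point from the question, here is a minimal Python sketch that reads the license back out of such a file. It assumes the SVG root declares the default SVG namespace (http://www.w3.org/2000/svg) and that the license is stored as schema:license with an rdf:resource attribute, as in the snippet above. The function name and the sample document are made up for illustration:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NAMESPACES = {
    "svg": "http://www.w3.org/2000/svg",
    "rdf": RDF_NS,
    "schema": "http://schema.org/",
}


def find_licenses(svg_text):
    """Return the license IRIs declared via schema:license inside <metadata>."""
    root = ET.fromstring(svg_text)
    licenses = []
    for node in root.iterfind(".//svg:metadata//schema:license", NAMESPACES):
        resource = node.get(f"{{{RDF_NS}}}resource")
        if resource:
            licenses.append(resource)
    return licenses


# Hypothetical stand-alone SVG using an empty rdf:about (= the document itself)
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    '<metadata>'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:schema="http://schema.org/">'
    '<rdf:Description rdf:about="">'
    '<schema:license rdf:resource="https://creativecommons.org/licenses/by-sa/4.0/"/>'
    '</rdf:Description>'
    '</rdf:RDF>'
    '</metadata>'
    '<rect width="10" height="10"/>'
    '</svg>'
)

print(find_licenses(svg))
```

A plain ElementTree lookup is enough here; a full RDF parser would only be needed if the metadata were written in a different but equivalent RDF/XML form.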

How to blacklist/demote search results with certain set of keywords in ElasticSearch and Apache Solr?

I was comparing ElasticSearch and Apache Solr for a search solution. The data that will go into the system is not moderated, and I don't want sexually explicit content to flash at the very top of the results when someone searches for something. But I don't want to remove it from the search results either. I want to demote it, so that it appears later in the search results. Can I do this in Solr or ElasticSearch? Some pointers on how to achieve this would be helpful.
In Solr you can't give "negative boosts" per se but you can boost everything that doesn't have the term. This can be done with the boost query:
...&bq=(*:* -erotic)^999
or in solrconfig.xml:
<str name="bq">(*:* -erotic)^999</str>
Where "erotic" is the term to which you wish to give a "negative boost". To demote another term, add another bq=... parameter.
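
As a rough illustration of the "one bq per term" idea, this Python sketch builds the request parameters for several demoted terms. It assumes the dismax query parser (bq is a dismax/edismax parameter); the function name, the user query, and the second term "explicit" are hypothetical:

```python
from urllib.parse import urlencode


def demoting_params(user_query, demoted_terms, boost=999):
    """Build Solr request parameters that boost every document lacking the given terms."""
    params = [("defType", "dismax"), ("q", user_query)]
    for term in demoted_terms:
        # One bq per term: match everything except documents containing the term
        params.append(("bq", f"(*:* -{term})^{boost}"))
    return params


params = demoting_params("holiday photos", ["erotic", "explicit"])
query_string = urlencode(params)  # append this to the /select URL of your core
print(query_string)
```

Terms that contain query syntax characters would need escaping before being placed in the bq value; the sketch assumes plain single words.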

Wildcard searches using dismax handler?

I have successfully indexed files, and want to be able to search using wildcards. I am currently using the dismaxRequestHandler (QueryType = dismax) for the searches so that I can search all the fields for the query.
A general search like 'computer' returns results but 'com*er' doesn't return any results.
Similarly, a search like 'co?mput?r' returns no results.
Could someone please tell me a way to continue using dismax and be able to do wildcard searches in the 'q' field?
Does the edismax handler support this? If so, how do I use it? I have Solr 1.4.1.
Please help me out.
Thanks.
Imran.
Grab the latest (trunk) build from Hudson. Use <str name="defType">edismax</str> in the request handler to activate edismax.

Have boost effect on lucene/compass field search

In our compass mapping, we're boosting "better" documents to push them up in the list of search results. Something like this:
<boost name="boostFactor" default="1.0"/>
<property name="name"><meta-data>name</meta-data></property>
While this works fine for full-text search, it does not for a field search; for example, the boost is ignored when searching for something like
name:Peter
Is there any way to enable boosting for field searches?
Thanks for your help and sorry if this is a dumb question - I am new to Lucene/Compass.
Best regards,
Peter
I am sorry, please ignore this question. The reason was completely different, the search result got blurred by a bad query :(
