Solr plugin capabilities - search

I'm currently facing a problem with multi-word synonyms in Solr, so I came up with the following approach:
Steps:
1. The Solr plugin intercepts the search keyword.
2. The plugin gets the list of acronyms, synonyms, etc. from a database table.
3. The plugin compares the search keyword, one by one, against the synonym list extracted in step 2.
4. If a match exists, the search keyword is converted into the synonym.
5. Depending on the result, the plugin decides which field type/filter/tokenizer to pass along in step 6.
6. The plugin returns (keyword, field to search in, analyzer to use) for Solr to search with.
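Steps 3 and 4 amount to a greedy longest-match rewrite of the keyword. Below is a minimal Python sketch of just that logic (not Solr plugin code), assuming a hypothetical `rewrite_keywords` helper and an in-memory `synonyms` dict that stands in for the database table from step 2; the example terms are made up.

```python
def rewrite_keywords(query: str, synonyms: dict[str, str]) -> str:
    """Replace known terms (including multi-word ones) with their synonym.

    `synonyms` maps a lowercase term or phrase to its replacement. In the
    plan above it would be loaded from the database table (step 2).
    Matching is greedy: at each position the longest known phrase wins.
    Note: unmatched tokens are returned lowercased.
    """
    tokens = query.lower().split()
    # Length (in words) of the longest known phrase: how far ahead to look.
    max_len = max((len(k.split()) for k in synonyms), default=1)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in synonyms:
                out.append(synonyms[phrase])
                i += n
                break
        else:
            # No known phrase starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


syn = {"heart attack": "myocardial infarction", "mi": "myocardial infarction"}
print(rewrite_keywords("Heart attack symptoms", syn))
```

Trying the longest phrase first is what keeps a multi-word entry such as "heart attack" from being split into two single-word lookups.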
Questions:
1. Can a plugin intercept the search keyword so that it can be processed in the plugin?
2. Can I access and read records directly from the DB in a Solr plugin?
3. Can a plugin tell Solr what to search for, which field to search on, and which filter/tokenizer to use? Or can the plugin perform the search itself and return the results?
Thank you.

You might want to take a look at Nolan Lawson's multi-word synonym solution:
https://github.com/healthonnet/hon-lucene-synonyms

Yes. Here is an example of how to do that by writing your own SearchComponent.
In solrconfig.xml:
<requestHandler name="/myHandler" class="solr.SearchHandler">
<arr name="first-components">
<str>myComponent</str>
</arr>
</requestHandler>
<searchComponent name="myComponent" class="com.xyz.MyComponent" />
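The component class:
package com.xyz;
import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
public class MyComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        String originalQuery = rb.getQueryString(); // get the original query string
        // access the DB and get records here,
        // then construct a new query string and set it on rb
        String newQueryString = originalQuery; // placeholder: build the rewritten query here
        rb.setQueryString(newQueryString);
    }
    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // nothing to do: the standard query component runs the actual search
    }
    @Override
    public String getDescription() { return "Rewrites the query string before the search runs"; }
}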
Send your queries to "/myHandler" instead of "/select" to get the results.
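The "access the DB, then construct a new query string" part of prepare() could look roughly like this. It is a Python sketch of the logic only: it assumes a hypothetical `synonyms(term, synonym)` table (an in-memory SQLite database stands in for your real DB), a hypothetical `build_query` helper, and the standard Lucene query syntax, where quoted text is a phrase and OR combines alternatives. In the real component this would be Java using JDBC.

```python
import sqlite3


def build_query(original: str, conn: sqlite3.Connection) -> str:
    """Return a query string that ORs the original keyword with its synonyms."""
    rows = conn.execute(
        "SELECT synonym FROM synonyms WHERE term = ? ORDER BY synonym",
        (original.lower(),),
    ).fetchall()
    # Quote every alternative so multi-word synonyms are searched as phrases.
    # (Assumes the keyword itself contains no double quotes.)
    alternatives = [original] + [syn for (syn,) in rows]
    return " OR ".join(f'"{alt}"' for alt in alternatives)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synonyms (term TEXT, synonym TEXT)")
conn.executemany(
    "INSERT INTO synonyms VALUES (?, ?)",
    [("mi", "myocardial infarction"), ("mi", "heart attack")],
)
print(build_query("MI", conn))
```

Hitting the database on every request adds latency; loading the table once (for example in the component's init() method) and keeping it in memory is usually the better trade-off.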

Related

Performing a distributed search through spark-solr

I'm using spark-solr to perform Solr queries. However, my searches don't work as they're supposed to because, for some reason, the requests generated by Spark prevent the searches from being distributed. I discovered this by looking at the Solr logs, where I saw that a distrib=false parameter is added to the requests being sent. When I executed the queries manually (not through Spark) with distrib=true, the results were fine.
I tried to set the parameters sent by Spark by changing the "solr.params" value in the options dictionary (I'm using PySpark):
options = {
"collection": "collection_name",
"zkhost": "server:port",
"solr.params": "distrib=true"
}
spark.read.format("solr").options(**options).load().show()
This change did not have any effect: I still see in the logs that a distrib=false parameter is being sent. Other parameters passed through the "solr.params" key (such as fq=something) do affect the results, but it looks like Spark insists on sending distrib=false no matter what I do.
How do I force a distributed search through spark-solr?
The easy solution is to configure the request handler to run distributed queries using an invariant. The invariant forces the distrib parameter to be true even if spark-solr tries to change it at query time. To introduce the invariant, add the following lines under your request handler's definition in solrconfig.xml:
<lst name="invariants">
<str name="distrib">true</str>
</lst>
While introducing the invariant will fix the problem, I think it's a rather radical solution, because it silently overrides the value of a parameter. With the invariant in place you can no longer set distrib to false: even if a request explicitly does so, distrib will still be true. In my opinion this is too risky, which is why I'm suggesting another solution that might be harder to implement but doesn't suffer from that flaw.
The solution is to implement a search component that forces distrib=true only when the request includes a forceDistrib=true parameter.
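The reason an invariant cannot be overridden is the order in which Solr layers request-handler parameters: values sent with the request win over `defaults`, and `invariants` win over everything. A minimal Python model of that precedence (plain dicts stand in for SolrParams, `effective_params` is a hypothetical name, and `appends` is left out):

```python
def effective_params(defaults: dict, request: dict, invariants: dict) -> dict:
    """Merge request-handler params in Solr's precedence order.

    Request values override `defaults`, and `invariants` override both,
    so a client can never change an invariant.
    """
    merged = dict(defaults)
    merged.update(request)     # the client's own values
    merged.update(invariants)  # applied last, so they always win
    return merged


# spark-solr sends distrib=false, but the invariant forces it back to true.
print(effective_params({"rows": "10"}, {"distrib": "false", "rows": "5"}, {"distrib": "true"}))
```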
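import java.io.IOException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
public class ForceDistribComponent extends SearchComponent {
    private static final String FORCE_DISTRIB_PARAM = "forceDistrib";
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        // only act when the caller explicitly asked for it
        if (!params.getBool(FORCE_DISTRIB_PARAM, false)) return;
        params.set(CommonParams.DISTRIB, true);
        // clear the flag so sub-requests that copy these params don't force distrib again
        params.set(FORCE_DISTRIB_PARAM, false);
        rb.req.setParams(params);
    }
    @Override
    public void process(ResponseBuilder rb) throws IOException {}
    @Override
    public String getDescription() { return "Forces distrib=true when forceDistrib=true is passed"; }
}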
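To see why the component also resets the flag, here is a small Python model of its prepare() logic. It is a sketch, not Solr code: plain dicts stand in for SolrParams, `apply_force_distrib` is a hypothetical name, and only the literal string "true" counts as set (Solr's getBool is more lenient).

```python
FORCE_DISTRIB_PARAM = "forceDistrib"


def apply_force_distrib(params: dict[str, str]) -> dict[str, str]:
    """Model of ForceDistribComponent.prepare(): returns the params Solr sees next."""
    if params.get(FORCE_DISTRIB_PARAM, "false") != "true":
        return params  # flag absent or false: leave the request untouched
    updated = dict(params)  # copy, like new ModifiableSolrParams(...)
    updated["distrib"] = "true"
    updated[FORCE_DISTRIB_PARAM] = "false"  # consume the flag so it is not applied again
    return updated


out = apply_force_distrib({"q": "*:*", "distrib": "false", "forceDistrib": "true"})
# Mimic a per-shard sub-request that reuses these params with distrib=false.
shard = dict(out, distrib="false")
print(out["distrib"], apply_force_distrib(shard)["distrib"])
```

In this model the `shard` request stays non-distributed because the flag was already cleared; that is presumably why the original component sets forceDistrib back to false instead of just adding distrib=true.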
After building the component, configure Solr to use it: register the component in solrconfig.xml and set your request handler to use it.
To register the component, add the following entry to solrconfig.xml:
<searchComponent name="forceDistrib" class="ForceDistribComponent"/>
To make the request handler use the forceDistrib component, add it to the list of components under the request handler entry. It must be the first component in the list:
<arr name="components">
<str>forceDistrib</str>
<str>query</str>
...
</arr>
This solution, while more involved than simply introducing an invariant, is much safer.

How to perform a search on several entities with Symfony 2

I need to search several entities for the same string and then order the results.
I've heard/read a little about FOSElasticaBundle; would this bundle be able to do it? It seems (to me) to have almost too many features for this purpose, and I'm not sure it could run on a shared server (HostGator).
The other solution I can think of at the moment is doing the search "manually" (using JOIN and UNION), but I'm wondering where I should put such a function: in an existing controller, a new one, a new bundle, or somewhere else?
I'm also worried that this manual solution could come at a cost, especially on some non-indexable fields.
You would use custom entity repositories (check out the Doctrine docs). A custom repository basically extends the default findAll, findOneBy, etc. methods.
You would have a function like this:
use Doctrine\ORM\EntityRepository;
class MyEntityRepository extends EntityRepository {
    public function findByCustomRule() {
        // option 1: a builder mapped to this repository's entity (automatically adds the select)
        $queryBuilder = $this->createQueryBuilder('someAlias');
        $queryBuilder->orderBy('...');
        // option 2: a generic builder that can query any entity
        $queryBuilder = $this->getEntityManager()->createQueryBuilder();
        $queryBuilder->select('...');
        // execute the query and return the result
        return $queryBuilder->getQuery()->getResult();
    }
}
This class is registered in the Doctrine mapping (via the repositoryClass option) and lives inside the Entity folder. Check out the docs and you should get a basic idea.
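For the "manual" approach from the question (one SELECT per entity, combined with UNION and then ordered), the SQL could look roughly like this. It is a Python/SQLite sketch with hypothetical `article` and `author` tables and a hypothetical `search_entities` helper; since Doctrine's DQL has no UNION, in Symfony this would be a native SQL query inside a custom repository method.

```python
import sqlite3


def search_entities(conn: sqlite3.Connection, term: str) -> list[tuple[str, int, str]]:
    """Search two tables for the same string and return one ordered list."""
    pattern = f"%{term}%"
    sql = """
        SELECT 'article' AS kind, id, title AS label FROM article WHERE title LIKE ?
        UNION ALL
        SELECT 'author' AS kind, id, name AS label FROM author WHERE name LIKE ?
        ORDER BY label
    """
    return conn.execute(sql, (pattern, pattern)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER, title TEXT)")
conn.execute("CREATE TABLE author (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO article VALUES (?, ?)", [(1, "Symfony tips"), (2, "Doctrine basics")])
conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Sym Author"), (2, "Someone Else")])
print(search_entities(conn, "sym"))
```

The leading % in the LIKE pattern prevents the database from using a regular index, which is exactly the cost concern raised in the question; a full-text index or a search engine avoids that scan.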

Searching with Lucene with stemming enabled

Suppose I store a set of strings (each document in Lucene would be a single word). Given an input word W, I would like to retrieve not only the documents that match W but also those whose stemmed version matches W.
Also, given an input word W, I would want to handle the case where a document matches the stemmed version of W.
Would writing my own custom analyzer that returns a PorterStemFilter suffice? Do I just need to write this class and reference it as the analyzer in the code?
Writing a custom Analyzer that has a stemmer in the analyzer chain should suffice.
Here is sample code that uses PorterStemFilter in Lucene 4.1:
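import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseTokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.util.Version;
class MyAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new LowerCaseTokenizer(Version.LUCENE_41, reader); // split on non-letters, lowercase
        return new TokenStreamComponents(source, new PorterStemFilter(source)); // stem each token
    }
}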
Please note that you MUST use the same custom Analyzer at query time that you use for indexing.
You may find the sample code for your version of Lucene in the corresponding PorterStemFilter documentation.
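Why the same analyzer matters on both sides can be shown with a toy inverted index. This Python sketch uses a deliberately crude suffix stripper (`toy_stem`, hypothetical, NOT the Porter algorithm) in place of PorterStemFilter, and a regex that only approximates LowerCaseTokenizer for ASCII letters.

```python
import re


def toy_stem(word: str) -> str:
    """Very crude suffix stripper; a stand-in for PorterStemFilter, not Porter."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    """Lowercase, split on non-letters, then stem (mirrors MyAnalyzer's chain)."""
    return [toy_stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def build_index(docs: list[str]) -> dict[str, set[int]]:
    index: dict[str, set[int]] = {}
    for doc_id, text in enumerate(docs):
        for term in analyze(text):  # index-time analysis
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    hits: set[int] = set()
    for term in analyze(query):  # same analyzer at query time
        hits |= index.get(term, set())
    return hits


idx = build_index(["connect", "connected", "connecting", "connection"])
print(search(idx, "Connects"))
```

Only the stemmed form "connect" ends up in the index, so a query that skipped the stemming step (looking up "connecting" verbatim) would find nothing.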

Return Only Certain Fields From Lucene Search

I'm using Lucene to search an index and it works fine. My only issue is that I only need one particular field of what is being returned. Can you specify to Lucene to only return a certain field in the results and not the entire document?
This is why the FieldSelector class exists.
You can implement a class like this:
using Lucene.Net.Documents;
class MyFieldSelector : FieldSelector
{
    public FieldSelectorResult Accept(string fieldName)
    {
        if (fieldName == "field1") return FieldSelectorResult.LOAD_AND_BREAK;
        return FieldSelectorResult.NO_LOAD;
    }
}
and use it as indexReader.Document(docid, new MyFieldSelector());
If you only need a small field, this avoids loading large fields, which in turn speeds up loading documents. You can probably find much more detailed info with some googling.
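The effect of the three results used above can be modelled in a few lines. This is a Python sketch of the semantics only, not the Lucene.NET API: the enum mirrors the FieldSelectorResult names, and `load_document` (hypothetical) walks the stored fields in order the way a reader consults the selector.

```python
from enum import Enum
from typing import Callable


class FieldSelectorResult(Enum):
    LOAD = "load"
    NO_LOAD = "no_load"
    LOAD_AND_BREAK = "load_and_break"


def load_document(
    stored_fields: list[tuple[str, str]],
    accept: Callable[[str], FieldSelectorResult],
) -> dict[str, str]:
    """Walk stored fields in order, asking `accept` what to do with each one."""
    doc: dict[str, str] = {}
    for name, value in stored_fields:
        result = accept(name)
        if result is FieldSelectorResult.NO_LOAD:
            continue  # skipped: in Lucene the value bytes are not read
        doc[name] = value
        if result is FieldSelectorResult.LOAD_AND_BREAK:
            break  # stop: later fields are not even looked at


    return doc


def only_field1(name: str) -> FieldSelectorResult:
    if name == "field1":
        return FieldSelectorResult.LOAD_AND_BREAK
    return FieldSelectorResult.NO_LOAD


print(load_document([("title", "t"), ("field1", "v"), ("body", "x" * 1000)], only_field1))
```

LOAD_AND_BREAK is the right choice when you know you want exactly one field: the large "body" field after it is never visited.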
What do you mean "return certain fields"? The Document.get() function returns just the field you request.
Yes, you can definitely do what you are asking. All you have to do is include the field name (case-sensitive) in the document.get() method.
string fieldNameText = doc.Get("fieldName");
FYI, it's usually a good idea to include some code in your questions. It makes it easier to provide a good answer.

How to Inner Join a UDF with parameters using SubSonic

I need to query a table using FREETEXTTABLE (because I need ranking) with SubSonic. AFAIK, SubSonic doesn't support full-text search, so I ended up creating a simple UDF (table-valued function) that takes 2 parameters (the keywords to search for and the maximum number of results).
Now, how can I inner join the main table with this FreeTextTable?
InlineQuery is not an option.
Example:
table ARTICLE with fields Id, ArticleName, Author, ArticleStatus.
The search can be done on one or more of the following fields: ArticleName (full-text), Author (another full-text field, but with different search keywords), ArticleStatus (an int).
Actually the query is far more complex and has other joins (depending on user choice).
If SubSonic cannot handle this situation, the best solution is probably good old plain SQL (and then there would be no need to create a UDF either).
Thanks for your help
PS: Will SubSonic 3.0 handle this situation?
3.0 can do this for you but you'd need to make a template for it since we don't handle functions (yet) out of the box. I'll be working on this in the coming weeks - for now I don't think 2.2 will do this for you.
I realize your question is more complex than this, but you can get results from a table valued function via SubSonic 2.2 with a little massaging.
Copy the .cs file from one of your generated views into a safe folder, and then change all the properties to match the columns returned by your UDF.
Then, on your collection, add a constructor method with your parameters and have it execute an InlineQuery.
public partial class UDFSearchCollection
{
public UDFSearchCollection(){}
public UDFSearchCollection(string keyword, int maxResults)
{
UDFSearchCollection coll = new InlineQuery().ExecuteAsCollection<UDFSearchCollection>("select resultID, resultColumn from dbo.udfSearch(@keyword, @maxResults)",keyword,maxResults);
coll.CopyTo(this);
coll = null;
}
}
public partial class UDFSearch : ReadOnlyRecord<UDFSearch>, IReadOnlyRecord
{
//all the methods for read only record go here
...
}
An inner join would be a little more difficult because the table object doesn't have its own parameters collection. But it could...
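If you go the plain-SQL route mentioned in the question, the query is an INNER JOIN between ARTICLE and the function's result set, filtered on the other columns and ordered by rank. SQLite has no FREETEXTTABLE or table-valued UDFs, so in this Python sketch a temporary `search_hits(id, score)` table (hypothetical) stands in for the UDF output, and `search_articles` is a hypothetical helper. In SQL Server you would instead join `dbo.udfSearch(@keyword, @maxResults)` directly in the FROM clause.

```python
import sqlite3


def search_articles(conn: sqlite3.Connection, matches: list[tuple[int, int]], status: int):
    """Inner join ARTICLE with ranked search hits, filter on status, order by rank.

    `matches` is a list of (id, score) pairs standing in for the rows the
    full-text UDF would return.
    """
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS search_hits (id INTEGER, score INTEGER)")
    conn.execute("DELETE FROM search_hits")
    conn.executemany("INSERT INTO search_hits VALUES (?, ?)", matches)
    return conn.execute(
        """
        SELECT a.Id, a.ArticleName
        FROM ARTICLE AS a
        INNER JOIN search_hits AS h ON h.id = a.Id
        WHERE a.ArticleStatus = ?
        ORDER BY h.score DESC
        """,
        (status,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ARTICLE (Id INTEGER, ArticleName TEXT, Author TEXT, ArticleStatus INTEGER)")
conn.executemany(
    "INSERT INTO ARTICLE VALUES (?, ?, ?, ?)",
    [(1, "Solr intro", "A", 1), (2, "Lucene deep dive", "B", 1), (3, "Draft", "C", 0)],
)
print(search_articles(conn, [(1, 10), (2, 50), (3, 99)], 1))
```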
