Solr not allowing free text search (semantic search)

I indexed data from a database into Solr and want to do a free text search across all the indexed columns.
I do not want to provide column names.

Add a catch-all copyField instruction (for example source="*" dest="_text_"). This will make sure that all content is copied into the _text_ field. Make your queries search against this field.
q=foo bar&qf=_text_
The _text_ field is usually already defined, but otherwise configure it as a text field.
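If it is not already there, a minimal schema sketch could look like this (the type name text_general is an assumption; use whatever text field type your schema defines):
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="_text_"/>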

If you are using the eDisMax or DisMax parser, you can use the qf parameter to indicate which fields should be searched.
The general syntax (via query string parameters) is:
q="hello+world"&qf=field1+field2+field3&defType=edismax
You can set this value directly in your solrconfig.xml so that you don't have to pass it on every request. If you do, your query becomes just:
q="hello+world"&defType=edismax

Related

MarkLogic Text Search: Return Results Based on Matches Inside Attributes

I am using MarkLogic's search:search() function to handle searching within my application, and I have a use case where users need to be able to perform a text search that returns matches from an attribute on my document.
For example, using this document:
<document attr="foo attribute value">Some child content</document>
I want users to be able to perform a text search (not using constraints) for "foo" and to have my document returned based on the match within the attribute @attr. Is there some way to configure the query options to allow this?
Typing in attr:"foo" is not a workable solution, so using attribute range constraints won't help, and users still need to be able to search for other child content not in the attribute node. I'm thinking perhaps there is a way to add a cts:query, OR'd into the search via the options, that allows this attribute to be searched?
Open to any and all other solutions.
Thanks!
Edit:
Some additional information, to help clarify:
I need to be able to find matches within the attribute, and elsewhere within the content. Using the example above, searches for "foo", "child content", or "foo child content" should all return my document as a result. This means that any query options that are AND'd with the search (like <additional-query>, which is intended to help constrain your search and not expand it) won't work. What I'm looking for is (likely) an additional query option that will be OR'd with the original search, so as to allow searching by child node content, attribute content, or a mix of the two.
In other words, I'd like MarkLogic to treat any attribute node content exactly the same as element text nodes, as far as searching is concerned.
Thanks!!
You could accomplish this search with a serialized element-attribute-word cts query in the additional-query options for the Search API. The element attribute word query will use the universal index to match individual tokens within attributes.
In MarkLogic 9 you may be able to use the following to perform your search:
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

search:search("",
  <options xmlns="http://marklogic.com/appservices/search">
    <additional-query>
      <cts:element-attribute-word-query xmlns:cts="http://marklogic.com/cts">
        <cts:element>document</cts:element>
        <cts:attribute>attr</cts:attribute>
        <cts:text>foo</cts:text>
      </cts:element-attribute-word-query>
    </additional-query>
  </options>
)
MarkLogic has ways to parse query text and map a value to an attribute word or value query.
First, you can use cts:parse():
http://docs.marklogic.com/guide/search-dev/cts_query#id_71878
http://docs.marklogic.com/cts.parse
Second, you can use search:search() with constraints defined in an XML payload:
http://docs.marklogic.com/guide/search-dev/query-options#id_39116
http://docs.marklogic.com/guide/search-dev/appendixa#id_36346
I'd look into using the <default> option of <term>. For details see http://docs.marklogic.com/guide/search-dev/appendixa#id_31590
Alternatively, consider doing query expansion. The idea behind that is that an end user sends a search string. You parse it using search:parse or cts:parse (as suggested by Erik), and instead of submitting that query as-is to MarkLogic, you process the cts:query tree to look for terms you want to adjust or expand. This is typically used to automatically blend in synonyms, related terms, or translations, but it could equally be used to copy individual terms and automatically add queries on attributes for those.
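As a rough illustration of that idea (the element and attribute names come from the sample document above, and the raw string is simply OR'd in as an attribute word query; treat this as a sketch, not a drop-in solution):
let $raw := "foo child content"
let $expanded :=
  cts:or-query((
    cts:parse($raw),
    cts:element-attribute-word-query(xs:QName("document"), xs:QName("attr"), $raw)
  ))
return cts:search(fn:collection(), $expanded)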
HTH!

Hybris: Use same field for search and facet

I have to use a field "manufacturerName" for both Solr search and Solr facet in Hybris. While the Solr free text search requires the field type to be text, the facet only works properly with the string type.
Is there any way to use the same field for both search and facet? I think one way is by using "copyField", but I have searched a lot and still don't know how to use it.
Any help would be highly appreciated!
PS: If I keep the field type as string, free text search doesn't fetch proper results. If I keep it as text, the facet shows truncated values.
Using a copyField instruction is the way to go, but that requires you to define an alternative field - meaning you have one field with the type text and the associated tokenization, and one field of the type string which isn't processed in any way. There is no way in Solr that I know of to combine these in a single field.
You'll then use the name of the string field to generate the facets, while you use the other field when you're querying.
<copyField source="text_search_field" dest="string_facet_field" />
You'll then have to refer to the name string_facet_field when you're filtering or faceting on the field. You'll want to filter against the facet field after the user selects a facet, since otherwise you could end up with documents from other facets leaking into your result set (for example, if the selected facet was "Foo Bar", you'd suddenly get documents that had "Baz Foo Bar Spam" as their facet, since both words are present in that string).
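A minimal schema sketch, assuming the field names above and a text_general type for the analysed field:
<field name="text_search_field" type="text_general" indexed="true" stored="true"/>
<field name="string_facet_field" type="string" indexed="true" stored="false" docValues="true"/>
At query time you would then facet and filter on the string field, for example:
facet=true&facet.field=string_facet_field&fq={!term f=string_facet_field}Foo Bar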
I was not able to implement the "copyField" approach, but I found another easy way to do this. In solr.impex I had already added my new field manufacturerNameFacet of type string, and there are parameters "fieldValueProvider" and "valueProviderParameter". I set these to "springELValueProvider" and to the field I wanted to use for both search and facet, "manufacturerName". After a full Solr indexing it worked like a charm; no other setting was required, and both search and facet worked as expected.
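For reference, a rough sketch of what that solr.impex entry might look like; the exact column list and the $solrIndexedType macro depend on how your project's solr.impex is already set up, so treat this only as an outline:
INSERT_UPDATE SolrIndexedProperty; solrIndexedType(identifier)[unique=true]; name[unique=true]; type(code); fieldValueProvider; valueProviderParameter
; $solrIndexedType ; manufacturerNameFacet ; string ; springELValueProvider ; manufacturerName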

wildcard searches on specific elements only

I am looking for a way to do wildcard search only on specific elements when executing a search:search. Specifically, I might have documents that look like the following:
<pdbe:person-envelope xmlns:pdbe="http://schemas.abbvienet.com/people-db/envelope">
<person xmlns="http://schemas.abbvienet.com/people-db/model">
<costcenter>
<code>0000601775</code>
<name>DISC-PLAT INFORM</name>
</costcenter>
<displayName>Tj Tang</displayName>
<upi>10025613</upi>
<firstName>
<preferred>TJ</preferred>
<given>Tze-John</given>
</firstName>
<lastName>
<preferred>Tang</preferred>
<given>Tang</given>
</lastName>
<title>Principal Research Scientist</title>
</person>
<pdbe:raw/>
</pdbe:person-envelope>
When searches happen, I want the search text to be automatically wildcarded, but only for certain elements like displayName, firstName and lastName, and NOT for upi or code. As I understand it, I would have the relevant wildcard indexes enabled in the database, but then I would need a custom query parser that rewrites the query into multiple cts:element-query and cts:element-value-query statements for each element that I want to wildcard-search on, OR'd with the originally parsed search query. Or I could create field constraints and rewrite the query to use field constraints.
Is there another way to conditionally search using wildcards on some elements but not others when the user enters a simple search query? I.e. a partial first and last name, "TJ Tan", should hit, but there should be no partial hits when I search "100256".
You are on the right track. Let's take an element (or maybe field) query on "TJ Tan".
With cts:tokenize, you can break this up (read about cts:tokenize - it is not just a normal tokenizer).
Then I have "TJ" and "Tan".
You can then do things like apply business rules on which words should be wildcarded and which not, and build the appropriate cts query (probably individual word queries in an and-query - or a near query - tuning depends on your needs).
Now, with the search phrase tokenized, you may also find that building your results relies not on a wildcard index but on an element word lexicon - where you do your term expansion with word matches, and those expanded terms are then sent to the query.
We sometimes take that further and combine the query building with xdmp:estimate and make the query less restrictive if we don't get enough results early on.
Where to put this logic?
You mention search:search, so in this case, I would suggest you package this into a custom constraint.
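A sketch of what the query-building part might look like, using the element names from the sample document (the namespace prefix, the choice of which elements get wildcards, and the trailing-* policy are all assumptions to adapt):
declare namespace p = "http://schemas.abbvienet.com/people-db/model";

let $wild  := (xs:QName("p:displayName"), xs:QName("p:preferred"), xs:QName("p:given"))
let $exact := (xs:QName("p:upi"), xs:QName("p:code"))
let $words := cts:tokenize("TJ Tan")[. instance of cts:word]
let $query :=
  cts:and-query(
    for $w in $words
    return cts:or-query((
      (: wildcard the name-like elements :)
      cts:element-word-query($wild, $w || "*", "wildcarded"),
      (: exact words only for ids and codes :)
      cts:element-word-query($exact, $w)
    )))
return cts:search(fn:collection(), $query)
Inside a custom constraint you would run this kind of logic in the constraint's parse function and return the cts:query, rather than calling cts:search directly.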

Querying documents which don't have a given field or have it as an empty string in Solr

I am doing a query with Solr where I need to find documents without a given field, say 'name', and I am trying the following:
q=+status:active -name:["" TO *]
But it returns all the documents, both with and without that field.
Can anyone help me figure this out?
The field 'name' is a normal string type and is indexed.
I am using Node.js. Can anyone help me with this?
According to docs:
-field:[* TO *] finds all documents without a value for field
Update
I tried it, but it returns even the ones where the field is non-empty.
Then my wild guess is that you are using the search query q instead of the filter query fq. Since you are using multiple statements in the query, I assume that q does some extra magic to get the most relevant documents for you, which can lead to returning some unwanted results.
If you want to get the strict set of results you should use filter query fq instead, see docs.
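For example, keeping status:active as a filter as well, the request could look like this (and if the pure negative filter gives you trouble, (*:* -name:[* TO *]) is the safer spelling):
q=*:*&fq=status:active&fq=-name:[* TO *]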

Lucene wildcard applied to indexed field

I have a set of indexed fields such as these:
submitted_form_2200FA17-AF7A-4E44-9749-79D3A391A1AF:true
submitted_form_2398389-2-32-43242423:true
submitted_form_54543-32SDf-3242340-32422:true
And I get that it's possible to wildcard queries such as
submitted_form_2398389-2-32-43242423:t*e
What I'm trying to do is get "any" submitted form via something like:
submitted_form_*:true
Is this possible? Or will I have to do a stream of "OR"s on the known forms (which seems quite heavy)
That's not the intended use of fields, I think. Field names aren't supposed to be the searchable values, field values are. Field names are supposed to be known a priori.
My suggestion is (if possible) to store the second part of the name as the field value, for instance: submitted_form:2398389-2-32-43242423. submitted_form would be the field known a priori, and the value could then be searched with a PrefixQuery.
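With that layout, the lookups from the question become ordinary value searches, assuming the id is indexed as a single, unanalysed token (the prefix form below is what a PrefixQuery gives you), and "any submitted form" reduces to "documents that have a submitted_form value at all":
submitted_form:2398389-2-32-43242423
submitted_form:2398389*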
Anyway, you could access the collection of field names using IndexReader.getFieldNames() in Lucene 3.x and its equivalent in Lucene 4.x. I wouldn't expect good search performance from that, though.
