I have read several articles about searching an index by number, but it does not work for me yet.
More details:
I need to search my documents by number, but the search returns nothing.
I create the document:
$doc1->addField(Zend_Search_Lucene_Field::UnIndexed('id', $id));
and I search the index:
$index->find("id:123");
But it does not work and the result is empty, and I really need this to work.
I tested this by changing the field type to Keyword, UnStored, Text, and UnIndexed.
Here is my Bootstrap:
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum());
I use these settings for both searching and indexing. I also tried commenting out the other settings, but that did not work either.
The type of the id field should be "Keyword", so:
$doc1->addField(Zend_Search_Lucene_Field::Keyword('id', $id));
UnIndexed fields are stored with the document but are not searchable.
See the section "Understanding Field Types" in the Zend_Search_Lucene documentation.
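To illustrate why the field type matters, here is a minimal sketch of a toy index in JavaScript. It is not Zend_Search_Lucene code: the `ToyIndex` class and the `FIELD_TYPES` table are hypothetical names made up for this example, and the table only mirrors the stored/indexed/tokenized flags that the Zend documentation describes for each field type.

```javascript
// Toy model only; ToyIndex and FIELD_TYPES are hypothetical, not a Zend API.
const FIELD_TYPES = {
  Keyword:   { stored: true,  indexed: true,  tokenized: false },
  UnIndexed: { stored: true,  indexed: false, tokenized: false },
  Text:      { stored: true,  indexed: true,  tokenized: true  },
  UnStored:  { stored: false, indexed: true,  tokenized: true  },
};

class ToyIndex {
  constructor() {
    this.postings = new Map(); // "field:term" -> Set of document ids
    this.docs = [];            // stored field values per document
  }

  add(fields) {
    const docId = this.docs.length;
    const stored = {};
    for (const { name, value, type } of fields) {
      const flags = FIELD_TYPES[type];
      if (flags.stored) stored[name] = value;
      if (!flags.indexed) continue; // UnIndexed: nothing goes into the postings
      const terms = flags.tokenized
        ? String(value).toLowerCase().split(/\s+/).filter(Boolean)
        : [String(value)];
      for (const term of terms) {
        const key = `${name}:${term}`;
        if (!this.postings.has(key)) this.postings.set(key, new Set());
        this.postings.get(key).add(docId);
      }
    }
    this.docs.push(stored);
    return docId;
  }

  find(name, term) {
    const ids = this.postings.get(`${name}:${term}`) || new Set();
    return [...ids].map((id) => this.docs[id]);
  }
}

const unindexed = new ToyIndex();
unindexed.add([{ name: 'id', value: '123', type: 'UnIndexed' }]);

const keyword = new ToyIndex();
keyword.add([{ name: 'id', value: '123', type: 'Keyword' }]);

console.log(unindexed.find('id', '123').length); // no hits: the value was only stored
console.log(keyword.find('id', '123').length);   // one hit: the value was indexed as a single term
```

In this model the UnIndexed value can be read back from a document that was found some other way, but it never produces a hit on its own.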
I am using Solr to index some documents and then search them. I want documents whose content starts with the search keyword to rank higher in the results. How can I achieve that?
E.g.
If the search keyword is "php"
and there are two documents with this content:
php developer
ajax php
then I want to return 'php developer' first, ahead of 'ajax php'.
Any suggestions on how to return results in this order?
I am looking for some sort of analyzer that indexes only the first word of a field's content, so that I can give that field a lot of weight at query time. Maybe that would help, but I couldn't find such an analyzer.
You can boost the first tokens using payloads. Look up payloads in the Lucene/Solr documentation for details.
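The ranking effect of boosting the first token can be sketched in plain JavaScript. This is not the Solr payload API; `scoreDocument`, `rank`, and the boost value of 10 are assumptions chosen for illustration, and real payload scoring is configured in the schema and the similarity class.

```javascript
// Toy scoring: a match at position 0 counts far more than a match elsewhere.
// scoreDocument, rank, and firstTokenBoost are hypothetical names for this sketch.
function scoreDocument(content, keyword, firstTokenBoost = 10) {
  const tokens = content.toLowerCase().split(/\s+/).filter(Boolean);
  const term = keyword.toLowerCase();
  let score = 0;
  tokens.forEach((token, position) => {
    if (token !== term) return;
    score += position === 0 ? firstTokenBoost : 1;
  });
  return score;
}

function rank(docs, keyword) {
  return docs
    .map((content) => ({ content, score: scoreDocument(content, keyword) }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((hit) => hit.content);
}

const ranked = rank(['ajax php', 'php developer'], 'php');
console.log(ranked); // 'php developer' comes before 'ajax php'
```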
I'm creating a Sitecore.Ecommerce.Search.Query using FieldQuery objects. I'm then converting the Sitecore query to a Lucene.Net.Search.Query using the LuceneQueryBuilder class. Everything with the query works fine except for fields where I am trying to match on an empty string.
So... this works:
new FieldQuery(FieldName, "1", MatchVariant.NotEquals)
but this does not:
new FieldQuery(FieldName, string.Empty, MatchVariant.NotEquals)
I have reflected through both the Sitecore.Ecommerce assembly and the Lucene.Net assembly, but I have not found any obvious issues. However, when I look at the Term that is created and used in the Lucene query, it looks like this:
-FieldName:
which I believe is incorrect, but maybe it is correct and I just don't have the field indexes set up correctly. I'm not sure, to be honest.
Any help is greatly appreciated.
Thanks!
Lucene does not really support searching for null/empty values. There is nothing indexed for it to find, after all. Lucene uses an inverted index, which makes certain kinds of queries, including pure negation queries and searching for nulls, difficult or even impossible.
If you need to search for documents in which certain fields are null, you should store a default value in the field (for instance "NULL") which you can search for.
That said, you could create:
new RangeQuery(FieldName, null, null, true, true);
This constructs a range query with open upper and lower bounds, so it matches any document that has a value in that field.
It is not a good way to do it, but neither is querying with only a negation.
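The default-value suggestion can be sketched with a toy inverted index in JavaScript. This is not Lucene.Net or Sitecore code; `buildPostings` and `EMPTY_SENTINEL` are hypothetical names, and the only point is that an empty value produces no posting unless a placeholder term is indexed in its place.

```javascript
// Toy inverted index for one field; names here are hypothetical.
const EMPTY_SENTINEL = 'NULL';

function buildPostings(docs, field, useSentinel) {
  const postings = new Map(); // term -> array of document ids
  docs.forEach((doc, docId) => {
    let value = doc[field];
    if (value === undefined || value === null || value === '') {
      if (!useSentinel) return; // nothing is indexed, so nothing can be found
      value = EMPTY_SENTINEL;   // index a placeholder term instead
    }
    if (!postings.has(value)) postings.set(value, []);
    postings.get(value).push(docId);
  });
  return postings;
}

const docs = [{ color: 'red' }, { color: '' }, { color: 'blue' }, {}];

const plain = buildPostings(docs, 'color', false);
const withSentinel = buildPostings(docs, 'color', true);

console.log(plain.has(''));                    // false: empty values left no trace
console.log(withSentinel.get(EMPTY_SENTINEL)); // ids of the documents with no color
```

With the placeholder in the index, "field is empty" becomes an ordinary term lookup, and "field is not empty" becomes a negation of that one term combined with a positive clause.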
When Magento search uses the fulltext search engine with the LIKE method, it stores the searchable data in the catalogsearch_fulltext table, in the "data_index" field.
In that field, each searchable attribute value is separated by |.
e.g.
3003|Enabled|None||Product name|1.99|yellow|0
Here it stores the SKU, status, tax class, product name, price, color, etc.
It stores the values of all searchable attributes.
The issue is that for a configurable product it also stores the name, price, status, etc. of the associated products in the same field, like this:
3003|Enabled|Enabled|Enabled|Enabled|None|None|None|None|Product name|Product name|associted Product name1|associted Product name2|associted Product name3|1.99|2.00|2.99|3.99|yellow|black|yellow|green|0|0|0|0
As a result, if I search for any word from an associated product, the main configurable product is also listed, because it has that word in its "data_index" field.
I need some suggestions on how to keep associated products out of data_index, so that I get accurate search results.
Thanks
We are looking into our search as well, and it has been surprising to see the inefficiencies in the fulltext table. We also have some configurable products with MANY variations, and the data stored for them in the fulltext table is downright horrendous.
As for solutions, I can only offer my own approach to the problem (not finished yet, but in progress).
I am extending Magento with an event listener on the product indexing process (catalog search indexing is when the fulltext table is populated). Once that process runs, my own module removes the duplicate entries that come from the associated products, and it also adds extra search keywords loaded from a CSV file.
This should increase search speed considerably and also return more relevant search results, because as of now configurable products get a "search bias" in the results.
This is more of a comment than an answer, but it was too long to fit in the comments and I thought it might be useful to you. Once I get my module working, if you like, I can give you directions on how to implement a similar module yourself.
Hope that helped (if only as moral support in Magento's search struggle).
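The duplicate-removal step described above could look roughly like this. It is a sketch in JavaScript rather than a Magento module, and `dedupeDataIndex` is a hypothetical helper; it only shows the string handling on a pipe-separated data_index value.

```javascript
// Hypothetical helper: drop repeated values from a pipe-separated data_index string,
// keeping the first occurrence of each value in its original order.
function dedupeDataIndex(dataIndex) {
  const seen = new Set();
  const kept = [];
  for (const value of dataIndex.split('|')) {
    if (seen.has(value)) continue;
    seen.add(value);
    kept.push(value);
  }
  return kept.join('|');
}

const before = '3003|Enabled|Enabled|None|None|Product name|Product name|1.99|2.00|yellow|black|yellow|0|0';
const after = dedupeDataIndex(before);
console.log(after);
```

Note that this only shrinks the field. Distinct values contributed by associated products (their own names, prices, colors) are still present, so it reduces the bias toward configurable products without removing it.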
Hi
I have a very specific need in my company for the system's search engine, and I can't seem to find a solution.
We have a Solr index of items. All of them have the same fields, one of which is "Type" (and of course "Title", "Text", and so on).
What I need is this: given an item type and a query string, I need to return a list of search suggestions, each with the number of items of the given type that the suggested string would return.
Something like, if the original string is "goo" I'll get
Goo 10
Google 52
Goolag 2
and so on.
Now, how do I do it?
I don't want to re-query Solr for each suggestion, but if there is no other way, I just might.
Thanks in advance
You can try edge n-gram tokenization:
http://search.lucidimagination.com/search/document/CDRG_ch05_5.5.6
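As a rough illustration of what edge n-gram tokenization produces, here is a small JavaScript sketch. It is not Solr's filter implementation; `edgeNGrams` is a hypothetical function, and its `minGram`/`maxGram` parameters only mimic the idea of a minimum and maximum gram size.

```javascript
// Hypothetical helper: return the prefixes of a token between minGram and maxGram characters.
function edgeNGrams(token, minGram = 1, maxGram = 10) {
  const grams = [];
  const limit = Math.min(maxGram, token.length);
  for (let size = minGram; size <= limit; size++) {
    grams.push(token.slice(0, size));
  }
  return grams;
}

const grams = edgeNGrams('google', 2, 4);
console.log(grams); // prefixes 'go', 'goo', 'goog'
```

If these prefixes are indexed alongside the full term, a partial input such as "goo" matches "google" as an exact term, with no wildcard query needed.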
You can try facets. Take a look at my more detailed description ('Autocompletion').
This was implemented at http://jetwick.com with Solr. The site now uses ElasticSearch, but the Solr sources are still available and the idea is identical: https://github.com/karussell/Jetwick
The SpellCheckComponent of Solr (which produces the suggestions) has extended results that can give the frequency of every suggestion in the index - http://wiki.apache.org/solr/SpellCheckComponent#Extended_Results.
However, the .Net component SolrNet, does not currently seem to support the extendedResults option: "All of the SpellCheckComponent parameters are supported, except for the extendedResults option" - http://code.google.com/p/solrnet/wiki/SpellChecking.
This can be implemented using a facet field query with a prefix set. You can test this using the XML handler like this:
http://localhost:8983/solr/select/?rows=0&facet=true&facet.field=type&f.type.facet.prefix=goo
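Conceptually, a prefix facet restricted to one item type returns each matching term together with the number of documents that contain it. The JavaScript below is a toy, in-memory sketch of that computation, not a Solr client; `facetByPrefix` and the sample items are hypothetical.

```javascript
// Hypothetical helper: count, per term starting with `prefix`, how many items
// of the given type contain that term in their title.
function facetByPrefix(items, type, prefix) {
  const counts = new Map();
  const wanted = prefix.toLowerCase();
  for (const item of items) {
    if (item.type !== type) continue; // plays the role of a filter on the Type field
    const terms = new Set(item.title.toLowerCase().split(/\s+/).filter(Boolean));
    for (const term of terms) {
      if (!term.startsWith(wanted)) continue;
      counts.set(term, (counts.get(term) || 0) + 1);
    }
  }
  // Highest count first, ties broken alphabetically.
  return [...counts.entries()].sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

const items = [
  { type: 'article', title: 'Google search tips' },
  { type: 'article', title: 'Google maps' },
  { type: 'article', title: 'Goo and glue' },
  { type: 'video',   title: 'Google talk' },
  { type: 'article', title: 'Goolag review' },
];

const suggestions = facetByPrefix(items, 'article', 'goo');
console.log(suggestions); // term/count pairs for the 'article' type only
```

Because the counts come back in the same response as the terms, there is no need to issue one query per suggestion.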
CouchDB lets you search for values starting from a startkey, for an exact key-value pair, and so on.
But is there any way to search for a substring in a specified field?
The problem is this: our news database consists of about 40,000 news documents. Say they have title, content and url fields. We want to find the news documents that have "restaurant" in their title. Is there any way to do it?
The View Collation wiki page says nothing about this :( It seems strange to me that there is no tool for this problem and that all I can do is parse the JSON results with Python, PHP or something else. In MySQL it is simply the LOCATE() function.
Use couchdb-lucene.
Be careful here. Lucene is not always the best answer.
If you're only searching one limited field, and only for a word like "restaurant", then Lucene, which is really meant for tokenizing large texts and documents, can be overkill. You can get the same effect by splitting the title:
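function(doc){
  // Skip documents without a title to avoid a runtime error in the view.
  if (!doc.title) return;
  // Emit one row per word so the view can be queried with key=<word>.
  var words = doc.title.split(" ");
  for (var i = 0; i < words.length; i++) {
    emit(words[i], doc);
  }
}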
Also note that neither Lucene nor CouchDB supports substring search where the string is not at the beginning of a word.
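To see what such a view offers and where it stops, the sketch below runs a word-splitting map function locally with a stand-in for `emit`. It is plain JavaScript, not CouchDB itself; `runMap`, `titleWords`, and the sample documents are hypothetical, and the map function here lowercases the words and emits null values, which is a variation on the one above.

```javascript
// Hypothetical local harness: CouchDB provides emit() inside the view server,
// here it is passed in as a callback so the map logic can be run directly.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows;
}

// Variation of the map function: one row per lowercased title word.
function titleWords(doc, emit) {
  if (typeof doc.title !== 'string') return;
  for (const word of doc.title.toLowerCase().split(/\s+/)) {
    if (word) emit(word, null);
  }
}

const news = [
  { _id: 'n1', title: 'New restaurant opens downtown' },
  { _id: 'n2', title: 'Restaurants face new rules' },
  { _id: 'n3', title: 'Weather update' },
];

const rows = runMap(titleWords, news);

// Exact key lookup, like querying the view with key="restaurant".
const exactHits = rows.filter((row) => row.key === 'restaurant').map((row) => row.id);

// Prefix lookup, like a startkey="restaurant" / endkey="restaurant\ufff0" range.
const prefixHits = rows.filter((row) => row.key.startsWith('restaurant')).map((row) => row.id);

console.log(exactHits);  // only the document whose title contains the exact word
console.log(prefixHits); // also the document with "restaurants"
```

A key range covers words that begin with the search string, but a string in the middle of a word cannot be reached this way, which is the limitation noted above.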