I have a table "class" with a field "numStudents", which I have indexed.
Now I want to find all classes that have numStudents = 10, for example. How can I do this?
Please give me the easiest way; modifying solrconfig.xml or schema.xml is fine for me. Thanks!
You probably want to filter down to the classes that have 10 students without actually scoring the documents.
You can use the filter query fq=numStudents:10 to do that.
A filter query can take advantage of Solr's filterCache, which gives a significant performance boost when the same filter is reused across queries.
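For illustration, a minimal SolrJ sketch of that query; the core URL and class name are placeholders, not something from your setup:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FilterQueryExample {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL -- point this at your own Solr core.
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/classes").build();

        SolrQuery query = new SolrQuery("*:*");  // match all documents...
        query.addFilterQuery("numStudents:10");  // ...then filter, unscored and cacheable

        QueryResponse response = solr.query(query);
        System.out.println("Classes found: " + response.getResults().getNumFound());
        solr.close();
    }
}
```

The equivalent raw request is /select?q=*:*&fq=numStudents:10.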
Let's say I have my list of ingredients:
{'potato','rice','carrot','corn'}
and I want to return lists from a database that are most similar to mine:
{'beans','potato','oranges','lettuce'},
{'carrot','rice','corn','apple'},
{'onion','garlic','radish','eggs'}
My query would return this first:
{'carrot','rice','corn','apple'}
I've used Solr, and have looked at CloudSearch, ElasticSearch, Algolia, Searchify and Swiftype. These engines only seem to let me put in one query string and then filter by other facets.
In a real scenario my search list will be about 200 items long and will be matching against about a million lists in my database.
What technology should I use to accomplish what I want to do?
Should I look away from search indexers and toward database-style technologies like Mongo, MapReduce, or Hadoop? All I know are the names of these other technologies, and I just need someone to point me in the right direction on which technology path I should be exploring for this.
With so much data I can't really loop through it; I need to query everything at once.
I wonder what keeps you from trying it with Solr, as Solr provides much of what you need. You can declare the field as type="string" multiValued="true" and save each list item as a value. Then, when querying, you specify each of the items in your list as a search term for that field, and Solr will – by default – return the closest matches first.
If you need exact control over what is regarded as a match (e.g. at least 40% of the terms from the search list have to be in a matching list), you can use the mm (minimum should match) parameter of the EDisMax query parser; cf. the Solr Wiki.
Having said that, I must add that I've never searched with 200 query terms (do I understand correctly that the list whose contents should be searched for will contain about 200 items?) and do not know how well that performs. But I guess that setting up a test core and filling it with random lists using a script should not take more than a few hours, so it should be possible to evaluate the performance of this approach without investing too much time.
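As a rough sketch, the mm-controlled query in SolrJ might look like this; the field name "ingredients", the core URL, and the 40% threshold are illustrative assumptions:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SimilarListsQuery {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/lists").build();

        SolrQuery query = new SolrQuery("potato rice carrot corn");
        query.set("defType", "edismax");  // EDisMax query parser
        query.set("qf", "ingredients");   // hypothetical multiValued string field
        query.set("mm", "40%");           // at least 40% of the terms must match

        System.out.println(solr.query(query).getResults());
        solr.close();
    }
}
```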
We have 4 different data sets and want to perform faceted search on them.
We are currently using SolrCloud and flattened these data sets before indexing them to Solr. Even though we have relational data, our primary goal is faceted search and Solr seemed like the right option.
Rough structure of our data:
Dataset1(col1, col2, col3, col4)
Dataset2(col1, col6, col7, col8)
Dataset3(col6, col9, col10)
Flattened dataset: dataset(col1, col2, col3, col4, col6, col7, col8, col9, col10).
In the end, we flattened them to have one common structure and have nulls where values do not exist. So far Solr works great.
Problem: Now we have additional data sets coming in, and each of them has about 50-60 columns. Technically, I could still flatten these too, but I don't think it is a good idea. I know that I can have different collections with different schemas for each data set, but we perform group-bys on these documents, so we need one schema.
Is there any way to maintain documents with a subset of fields of the schema under one collection without flattening them? If not, is there a better solution for this problem?
For instance:
DocA(field1, field2), DocB(field3, field4).
Schema(field1, field2, field3, field4).
Can we have DocA and DocB under one collection with the above schema?
Our backend is on top of Cloudera Hadoop (CDH4.6 and 5.2) distribution and we can choose any tool that belongs to the Hadoop ecosystem for a possible solution.
Of course you can; the documents only need distinct uniqueKey values, and a document does not have to populate every field in the schema. If you have defined a fixed Solr schema, dynamicFields may also help you.
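As a sketch, indexing two documents with disjoint field subsets into one collection via SolrJ (field names follow the example above; the core URL is a placeholder, and field1 through field4 are assumed to be declared optional, which is the default):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class MixedDocsIndexing {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/datasets").build();

        SolrInputDocument docA = new SolrInputDocument();
        docA.addField("id", "A-1");  // distinct uniqueKey
        docA.addField("field1", "valueA1");
        docA.addField("field2", "valueA2");

        SolrInputDocument docB = new SolrInputDocument();
        docB.addField("id", "B-1");
        docB.addField("field3", "valueB3");
        docB.addField("field4", "valueB4");

        solr.add(docA);  // field3/field4 simply remain absent on docA
        solr.add(docB);
        solr.commit();
        solr.close();
    }
}
```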
I have been tasked with using Lucene to search our product table. I have created an index and am searching using a QueryParser with multiple fields, but the results are not what I require.
I have a product that is stored as LM10, and I want to be able to find it if the search term is LM 10. It must also match if the search term is Fred LM10 or Fred LM 10.
Any ideas how I can do this in Lucene?
Thanks in advance
Use a tokenizer that splits tokens on word/number transitions, and apply it both at index and at query time. You might use solr.WordDelimiterFilterFactory and avoid having to write a custom one.
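In plain Lucene the equivalent filter is WordDelimiterFilter. A minimal sketch, assuming a Lucene 5.x-era API (in Solr you would instead add solr.WordDelimiterFilterFactory to the field type in schema.xml):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;

public class ProductCodeAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new WhitespaceTokenizer();
        // Split "LM10" into "LM" + "10" and also keep the joined form,
        // so "LM10", "LM 10", and "Fred LM10" can all match.
        int flags = WordDelimiterFilter.GENERATE_WORD_PARTS
                  | WordDelimiterFilter.GENERATE_NUMBER_PARTS
                  | WordDelimiterFilter.SPLIT_ON_NUMERICS
                  | WordDelimiterFilter.CATENATE_ALL;
        TokenStream result = new WordDelimiterFilter(source, flags, null);
        return new TokenStreamComponents(source, result);
    }
}
```

Apply the same analyzer to the field both when indexing and in your QueryParser.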
/select/?q=*:*&rows=100&facet=on&facet.field=category
I have around 100,000 documents indexed, but I return only 100 documents using rows=100. The facet counts returned for category, however, are computed over all the matching documents, not just the rows returned.
Can we somehow restrict the facets to the result set returned, i.e. the 100 rows only?
I don't think it is possible in any direct manner, as was pointed out by Pascal.
I can see two ways to achieve this:
Method I: do the counting yourself by visiting the 100 results returned. This is very easy and fast if they are categorical fields, but harder if they are text fields that need to be tokenized, etc.
Method II: do two passes:
Do a normal query without facets (you only need to request doc ids at this point)
Collect all the IDs of the documents returned
Do a second query for all fields and facets, adding a filter query to restrict the results to the IDs collected in step 2. Something like:
select/?q=*:*&facet=on&facet.field=category&fq=id:(312 OR 28 OR 1231 ...)
The first is far more efficient, and I would recommend it for non-textual fields. The second is computationally expensive but has the advantage of working for all types of fields.
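A sketch of Method II in SolrJ, assuming the uniqueKey field is named id and a placeholder core URL:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class TwoPassFacets {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        // Pass 1: fetch only the ids of the first 100 hits.
        SolrQuery first = new SolrQuery("*:*");
        first.setRows(100);
        first.setFields("id");
        List<String> ids = new ArrayList<>();
        for (SolrDocument doc : solr.query(first).getResults()) {
            ids.add(doc.getFieldValue("id").toString());
        }

        // Pass 2: facet, restricted to those 100 documents.
        SolrQuery second = new SolrQuery("*:*");
        second.setRows(100);
        second.setFacet(true);
        second.addFacetField("category");
        second.addFilterQuery("id:(" + String.join(" OR ", ids) + ")");
        System.out.println(solr.query(second).getFacetField("category").getValues());
        solr.close();
    }
}
```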
Sorry, but I don't think it is possible. The facets are always based on all the documents matching the query.
Not a real answer but maybe better than nothing: the results grouping feature (check out from trunk!):
http://wiki.apache.org/solr/FieldCollapsing
where facet.field=category then corresponds to group.field=category, and you will get only as many groups ('facet hits') as you specified!
If you always execute the same query (q=*:*), maybe you can use facet.limit, for example:
select/?q=*:*&rows=100&facet=on&facet.field=category&facet.limit=100
Tell us if the order that Solr uses is the same in the facet as in the query results.
Hi, I am building a search application using Lucene, and some of my queries are complex. For example, my documents contain the fields location and population, where location is a not-analyzed field and population is a numeric field. Now I need to return all the documents that have location "san-francisco" and population between 10000 and 20000. If I combine these two fields and build a query like this:
location:san-francisco AND population:[10000 TO 20000], I am not getting the correct results. Any suggestions on why this could be happening and what I can do?
Also, while building complex queries, some of the fields that I am including are analyzed while others are not. For instance, the location field is not analyzed and contains terms like chicago or san-francisco, while the summary field is analyzed and generally contains a descriptive paragraph.
Consider this query:
location:san-francisco AND summary:"great restaurants"
Now, if I use a StandardAnalyzer while searching, I do not get the correct results when the location field contains a term like san-francisco or los-angeles (i.e. it cannot handle the hyphen in between); but if I use a keyword analyzer for the query, I do not get correct results either, because it cannot search for the phrase "great restaurants" in the summary field.
First, I would recommend tackling this one problem at a time. From my reading of your post, it sounds like you have multiple issues:
You're unsure why a particular query is not returning any results.
You're unsure why some fields are not being analyzed.
You're having problems with the built-in analyzers dealing with hyphens.
That's how your post reads. If that's correct, I would suggest you post each question separately. You'll get better answers if each question is precise; it's overwhelming trying to answer your question in its current format.
Now, let me take a stab in the dark at some of your problems:
For your first problem, if you're getting into really complex queries in Lucene, ask yourself whether it makes sense to be doing these queries here, rather than in a proper database. For a more generic answer, I'd try isolating the problem by removing parts of the query until you get results back. Once you find out what part of the query is causing no results, we can debug that further.
For the second problem, check the document you're adding to Lucene. Lucene provides options to store data but not index it. Make sure you've got the right option specified when adding fields to the document.
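For instance, a sketch with the 5.x-era Lucene field API (field names follow the question): StringField indexes a single un-analyzed token, TextField analyzes its text, IntField indexes numerically so range queries work, and StoredField stores without indexing:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class FieldOptionsExample {
    public static Document buildDoc() {
        Document doc = new Document();
        // One exact token, not analyzed -- matches location:san-francisco as-is:
        doc.add(new StringField("location", "san-francisco", Store.YES));
        // Analyzed full text:
        doc.add(new TextField("summary", "Great restaurants near the bay.", Store.YES));
        // Indexed numerically, so population:[10000 TO 20000] can work
        // (in Lucene 6+ this would be IntPoint plus a StoredField):
        doc.add(new IntField("population", 15000, Store.YES));
        // Stored only -- retrievable in results but not searchable:
        doc.add(new StoredField("internalNote", "not indexed"));
        return doc;
    }
}
```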
For the third problem, if the built-in analyzers don't handle hyphens the way you need, just build your own analyzer. I ran into a similar issue with the '#' symbol; to solve it, I wrote a custom analyzer that dealt with it properly. You could do the same for hyphens.
You should use PerFieldAnalyzerWrapper. As the name suggests, it lets you use different analyzers for different fields. In this case, you can use KeywordAnalyzer for the city name and StandardAnalyzer for the text.
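A minimal sketch of that wrapper, assuming a Lucene 5.x-era API and the fields from the question:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class PerFieldExample {
    public static void main(String[] args) throws Exception {
        Map<String, Analyzer> perField = new HashMap<>();
        perField.put("location", new KeywordAnalyzer());  // keep "san-francisco" as one token

        // StandardAnalyzer handles every other field, e.g. summary.
        Analyzer analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);

        QueryParser parser = new QueryParser("summary", analyzer);
        Query q = parser.parse("location:san-francisco AND summary:\"great restaurants\"");
        System.out.println(q);
    }
}
```

Be sure to pass the same wrapper to your IndexWriter's configuration, so index-time and query-time analysis agree.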