Lucene.Net -
Is there a way to query for documents that contain a particular field?
Let's say some of my documents have a field 'foo' and some do not.
I want to find all documents that have the field 'foo' - regardless of what the value of foo is.
How do I do this? Is it some sort of TermQuery?
Try foo:[* TO *]
This should match every document that has a non-null value in the field 'foo'.
Related
I have inserted some JSON data into MongoDB and want to perform a simple search that matches on values only, irrespective of the keys (since the keys differ from document to document), and returns the id of each matching document. I don't know how to compare by value only in MongoDB.
Example: if I search for the word "Knowledge", it should return the ids of all documents that contain the word "Knowledge", irrespective of which key holds it.
You need to use Wildcard Text Indexes.
db.collection.createIndex( { "$**": "text" } )
If there is a static superset of fieldnames, you may find text indexes and the $text query operator useful for word-based searches.
Create the text index on every potential field, and those contained in each document will be included.
For example, we have fields 1 through 10. I want to index all of the fields in Azure Search so that you can filter and search on them.
My question: is there a way to exclude the fields that are NULL for a specific ID, so that they are not stored in Azure Search? See the example below.
The data itself is initially stored in Azure Cosmos Database.
In Azure Cosmos DB it would look like this:
Id 1
field 1: a
field 2: b
field 5: c
field 6: d
field 8: e
Id 2
field 3: a
field 2: b
field 5: c
field 9: d
field 10: e
However, in the Azure Search index it looks like this:
Id 1
field 1:a
field 2:b
field 3:NULL
field 4:NULL
field 5:c
field 6:d
field 7:NULL
field 8:e
field 9:NULL
field 10:NULL
Id 2
field 1:NULL
field 2:b
field 3:a
field 4:NULL
field 5:c
field 6:NULL
field 7:NULL
field 8:NULL
field 9:d
field 10:e
The shortest answer to your question is "no", but it's a little deeper than that.
When you add documents to an Azure Cognitive Search index, the values of each field are stored in a data structure called an inverted index. This stores a dictionary of terms found in the field, and each entry contains a list of document IDs containing that term. It is somewhat similar to a column-oriented database in that regard. The null value that you see in document JSON is never actually stored in the inverted index. This can make it expensive to test whether a field is null, since the query needs to look for all document IDs not contained in the inverted index, but it is perfectly efficient in terms of storage (because it doesn't consume any).
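To make the point about nulls concrete, here is a minimal toy sketch of an inverted index in Python. It is not how Azure Cognitive Search is implemented; the function names and the sample documents are made up for illustration. It only shows that a null value adds no entry to the term dictionary, and that finding "null" documents means computing the complement of everything that was indexed.

```python
from collections import defaultdict

def build_inverted_index(docs, field):
    """Map each term found in `field` to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        value = doc.get(field)
        if value is None:
            continue  # null or missing values produce no postings at all
        for term in str(value).lower().split():
            index[term].add(doc_id)
    return dict(index)

def docs_with_null(docs, index):
    """IDs that appear in no posting list, i.e. the complement of all postings."""
    seen = set().union(*index.values())
    return set(docs) - seen

# Hypothetical sample data, loosely modelled on the question.
docs = {
    1: {"field1": "a", "field3": None},
    2: {"field1": None, "field3": "a"},
    3: {"field1": "a b"},
}
field1_index = build_inverted_index(docs, "field1")
null_ids = docs_with_null(docs, field1_index)
```

In this toy model, document 2 costs nothing in the index for field1, but answering "which documents have field1 = null" requires touching every posting list.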
This article has a few simplified examples of how inverted indexes work, although it's about a different topic than your question.
Your broader concern about having many fields defined in your index is a valid one. There is a tradeoff between schema flexibility and resource utilization as you increase the number of fields in your index. However, this is due to the bookkeeping overhead required for each field, not the "number of nulls in the field" (which doesn't really mean anything since nulls aren't stored).
From your question, it sounds like you're trying to model different "entity types" in the same index, resulting in a sparse index where some subset of the documents have one subset of fields defined, while another subset of documents have different fields defined. This is a scenario that we want to better support in the service. One promising future direction could be supporting multi-index query, so each subset of your schema could have its own index with its own distinct (but perhaps overlapping) set of fields. This is not on our immediate roadmap, but it's something we want to investigate further. Please vote on this User Voice item to help us prioritize.
As far as not saving the null values, AFAIK it is not possible. An index in Cognitive Search has a pre-defined schema (much like a relational database table) and based on an attribute's data type an attribute's value will be initialized with a default value (null for most of the data types).
If your concern is storage, it's not a problem since it's an inverted index.
If you have an issue with the complexity of the JSON data returned, you could implement your own intermediate service that hides all NULL values from the JSON. Your application queries your own query service, which in turn queries the actual Azure service, passing along all parameters as-is. The only difference is that your service removes the key and value of every NULL entry from the JSON to make the responses easier to manage.
The response from search would then appear to be identical to your Cosmos record.
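The filtering step of such an intermediate service could look roughly like the sketch below. The function name strip_nulls and the sample search hit are hypothetical; the call to the actual Azure service is left out, and only the null removal is shown.

```python
def strip_nulls(value):
    """Recursively drop dict keys whose value is None.

    Lists are walked so that dicts nested inside them are cleaned too,
    but None elements of a list are left in place.
    """
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value

# Hypothetical hit, shaped like the "Id 1" example in the question.
search_hit = {
    "id": "1",
    "field1": "a",
    "field2": "b",
    "field3": None,
    "field4": None,
    "field5": "c",
}
cleaned = strip_nulls(search_hit)
```

Applying this to each document in the response before returning it to the application would leave only the fields that carry a value.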
I am trying to search for a document in elasticsearch.
The field name is sum and the field value is 'SUM-123'.
I created a bool query with matchQuery('sum','SUM-123').
But it does not return only the documents whose sum field matches exactly; it also returns documents with different field values.
Thanks.
If you want exact string matching, the field should not be analyzed.
More info here
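Why an analyzed field over-matches can be sketched with a toy model in Python. The toy_analyze function below is only a rough stand-in for a real analyzer (it splits on non-alphanumeric characters and lowercases), and the sample values are invented; it is not Elasticsearch code.

```python
import re

def toy_analyze(text):
    """Rough stand-in for an analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def match_analyzed(query, value):
    """Match if any analyzed query term appears among the analyzed field terms."""
    return bool(set(toy_analyze(query)) & set(toy_analyze(value)))

def match_exact(query, value):
    """Match only if the whole, unanalyzed value equals the query."""
    return query == value

# Hypothetical values of the 'sum' field in three documents.
values = ["SUM-123", "SUM-456", "TOTAL-123"]
analyzed_hits = [v for v in values if match_analyzed("SUM-123", v)]
exact_hits = [v for v in values if match_exact("SUM-123", v)]
```

In this model 'SUM-123' is broken into the terms sum and 123, so any document sharing either term matches, whereas the unanalyzed comparison matches only the identical value.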
I'm inspecting a Lucene index with Luke.
All documents have a field 'Title' and I would like to do a search for the search expression Title:Power, by which I want to find all documents with a title containing the word Power.
In Luke, I go to the tab "Search" and enter +Title:Power
When searching, there are no results. However, when I search by another field, I do find the document: +ContentType:MyContentType
In the column Title, I can clearly see the value of the document being: Power Quality Guide.
What could be the reasons I'm not finding this document when searching on Title?
There can be a number of reasons. Most common ones:
the Title field could be stored in the index but not indexed for search (Field.Store.YES, Field.Index.NO), unlike the field for which you do find results (ContentType);
the document(s) could have been indexed using one analyzer while the query uses a different one;
the document could have been indexed using the NOT_ANALYZED option, which stores the field as a single term, so only a query for the complete value would match.
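The last two reasons can be illustrated with a small toy model in Python. This is not Lucene code; index_terms and term_matches are hypothetical helpers, and the "analyzer" here just lowercases and splits on whitespace.

```python
def index_terms(value, analyzed):
    """Analyzed: lowercase word terms. Not analyzed: the whole value as one term."""
    return set(value.lower().split()) if analyzed else {value}

def term_matches(query_term, terms):
    """A term query matches only if the exact term is present in the index."""
    return query_term in terms

title = "Power Quality Guide"
analyzed_terms = index_terms(title, analyzed=True)
not_analyzed_terms = index_terms(title, analyzed=False)

# Query term left as typed, as if the query side used a different analyzer.
hit_mixed_case = term_matches("Power", analyzed_terms)
# Query term lowercased the same way as at index time.
hit_lowercase = term_matches("power", analyzed_terms)
# Against a not-analyzed field, only the complete value is a term.
hit_partial = term_matches("Power", not_analyzed_terms)
hit_full = term_matches("Power Quality Guide", not_analyzed_terms)
```

Under these assumptions, searching for Power fails in both setups: against the analyzed field because the indexed term is power, and against the not-analyzed field because the only term is the full title.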
Suppose I run a query against field A and want to retrieve the corresponding fields B and C from my index. How should I go about it? I am using Lucene 3.6.0.
The results of your query will be returned as a set of documents, not fields. Once you've got a document, you can load whichever field contents you're interested in.
One thing that's probably worth watching out for is to ensure that your fields have been "stored".
Good luck.