Azure Search indexer does not pick up update to null

In Cosmos DB I have an array element "X", and it is mapped to the index field "X" of type Collection(Edm.String). If I update X to null, the change is not reflected in Azure Search. The indexer picks up the timestamp change, but the actual value does not show up in the index (whereas if you assign any non-null value, it does show up). Basically, when you make your element null, the index does not show "null"; instead it shows the old value.
Any idea how I can fix this?

Set the field to an empty array.
Setting a field's value to null removes it from the index (in contrast to setting it to an empty string or 0).
Collections are, well, collections of values. You can remove values from the collection, but not the collection itself.
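For example, a minimal sketch of clearing the collection instead of nulling it, using the @azure/cosmos SDK (the database, container, and function names here are assumptions, not from the question):

// Read the document, swap the null for an empty array, and write it back,
// so the indexer sees a real (empty) collection value to propagate.
import { CosmosClient } from "@azure/cosmos";

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING!);
const container = client.database("mydb").container("docs"); // hypothetical names

async function clearX(id: string, partitionKey: string): Promise<void> {
  const { resource: doc } = await container.item(id, partitionKey).read();
  doc.X = []; // empty array instead of null
  await container.item(id, partitionKey).replace(doc);
}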

Related

Is there a way to exclude NULL values from Azure Cognitive Search Indexes

So for example we have fields 1 up to 10. I want to index all the fields in Azure Search, so you can filter and search on them.
My question is: is there a way to exclude the fields that are NULL for a specific ID, so they are not stored in Azure Search? See the example underneath.
The data itself is initially stored in Azure Cosmos Database.
In Azure Cosmos DB it would look like this:
Id 1
field 1: a
field 2: b
field 5: c
field 6: d
field 8: e
Id 2
field 3: a
field 2: b
field 5: c
field 9: d
field 10: e
However in Azure Search Index, it looks like this:
Id 1
field 1: a
field 2: b
field 3: NULL
field 4: NULL
field 5: c
field 6: d
field 7: NULL
field 8: e
field 9: NULL
field 10: NULL
Id 2
field 1: NULL
field 2: b
field 3: a
field 4: NULL
field 5: c
field 6: NULL
field 7: NULL
field 8: NULL
field 9: d
field 10: e
The shortest answer to your question is "no", but it's a little deeper than that.
When you add documents to an Azure Cognitive Search index, the values of each field are stored in a data structure called an inverted index. This stores a dictionary of terms found in the field, and each entry contains a list of document IDs containing that term. It is somewhat similar to a column-oriented database in that regard. The null value that you see in document JSON is never actually stored in the inverted index. This can make it expensive to test whether a field is null, since the query needs to look for all document IDs not contained in the inverted index, but it is perfectly efficient in terms of storage (because it doesn't consume any).
This article has a few simplified examples of how inverted indexes work, although it's about a different topic than your question.
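To make this concrete, here is a minimal sketch of a term-to-document-IDs inverted index (an illustration only, not how the service is actually implemented), showing why a null value has nothing to store:

// Each term maps to the set of IDs of documents containing it.
const invertedIndex = new Map<string, Set<string>>();

function indexField(docId: string, field: string | null): void {
  if (field === null) return; // null never reaches the index, so it costs nothing
  for (const term of field.toLowerCase().split(/\s+/)) {
    if (!invertedIndex.has(term)) invertedIndex.set(term, new Set());
    invertedIndex.get(term)!.add(docId);
  }
}

indexField("1", "red bicycle");
indexField("2", null); // no entries created; finding "all docs with no entry" is the expensive part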
Your broader concern about having many fields defined in your index is a valid one. There is a tradeoff between schema flexibility and resource utilization as you increase the number of fields in your index. However, this is due to the bookkeeping overhead required for each field, not the "number of nulls in the field" (which doesn't really mean anything since nulls aren't stored).
From your question, it sounds like you're trying to model different "entity types" in the same index, resulting in a sparse index where some subset of the documents have one subset of fields defined, while another subset of documents have different fields defined. This is a scenario that we want to better support in the service. One promising future direction could be supporting multi-index query, so each subset of your schema could have its own index with its own distinct (but perhaps overlapping) set of fields. This is not on our immediate roadmap, but it's something we want to investigate further. Please vote on this User Voice item to help us prioritize.
As far as not saving the null values goes, AFAIK it is not possible. An index in Cognitive Search has a pre-defined schema (much like a relational database table), and based on an attribute's data type, the attribute's value will be initialized with a default value (null for most of the data types).
If your concern is storage, it's not a problem since it's an inverted index.
If you have an issue with the complexity of the JSON data returned, you could implement your own intermediate service that hides all NULL values from the JSON. Your application queries your own query service, which in turn queries the actual Azure service, passing all parameters along as-is. The only difference is that your service removes the null-valued key/value pairs from the JSON to make the responses easier to manage.
The response from search would then appear to be identical to your Cosmos record.
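A minimal sketch of such a transform, assuming you filter the search response in a Node-based middle tier (the service plumbing around it is up to you):

// Recursively drop null-valued keys from a response object before returning it.
function stripNulls(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripNulls);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .filter(([, v]) => v !== null)
        .map(([k, v]) => [k, stripNulls(v)])
    );
  }
  return value;
}

// stripNulls({ id: "1", field1: "a", field3: null }) -> { id: "1", field1: "a" }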

Arango AQL use index to search for existing property

I want to get all objects that have the property "available" set:
FOR u IN col
  FILTER u.available != null
  RETURN u
But the above query does not use an index, even if u.available is indexed. How can I make this query use the index?
Following the comment from @mpoeter, I can create an index that is specifically sparse to force ArangoDB to use the index.
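For example, in arangosh (collection and attribute names taken from the question; the persistent index type is an assumption):

// A sparse index leaves out documents where "available" is null or missing,
// so the optimizer can use it to answer the != null filter.
db.col.ensureIndex({ type: "persistent", fields: ["available"], sparse: true });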

MongoDB API pagination

Imagine a situation where a client has a feed of objects with limit 10.
When the next 10 are required, it sends a request with skip 10 and limit 10.
But what if some new objects were added to (or deleted from) the collection since the first request with offset == 0?
Then on the second request (with offset == 10) the response may have the wrong object order.
Sorting on creation time does not work here, because I have some feeds which are formed by sorting on some numeric field.
You can add a time field like created_at or updated_at. It must be updated whenever the document is created or modified, and the field must be unique.
Then query the DB for the range of time using $gte and $lte, along with a sort on this time field.
This ensures that any changes made outside the time window will not get reflected in the pagination, provided that the time field does not have duplicates. Most probably, if you include microtime, duplicates won't happen.
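A sketch of that query in the mongo shell (collection and field names are placeholders; windowStart/windowEnd are fixed when the first page loads):

// Paginate within a frozen time window so later inserts don't shift the pages.
db.feed.find({ updated_at: { $gte: windowStart, $lte: windowEnd } })
  .sort({ updated_at: 1 })
  .skip(10)
  .limit(10);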
It really depends on what you want the result to be.
If you want the original objects in their original order regardless of delete and add operations, then you need to make a copy of the list (or at least of the order) and then page through that: copy every ID to a new collection that doesn't change once the first page has loaded, and paginate through that.
Alternatively, and perhaps more likely, what you want is to see the next 10 after the last one in the current set, including any delete or add operations that have taken place since. For this, you can use the sorted order in which you are viewing them and a filter: $gt whatever the last item was. BUT that doesn't work when there are duplicates in the field on which you are sorting. To get around that, you will need to index on that field PLUS some other field which is unique per record, for example the _id field. Then you can take the last record in the first set and look for records that are either $eq the indexed value and $gt the _id, or simply $gt the indexed value. An example is sketched below.
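A sketch of that keyset query in the mongo shell, sorting on a numeric score field (all names are placeholders); lastScore and lastId come from the last document of the previous page:

// Next page: strictly greater score, or equal score with a greater _id tie-breaker.
db.feed.find({
  $or: [
    { score: { $gt: lastScore } },
    { score: lastScore, _id: { $gt: lastId } }
  ]
}).sort({ score: 1, _id: 1 }).limit(10);

// A compound index makes both the filter and the sort efficient:
db.feed.createIndex({ score: 1, _id: 1 });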

Add variation to the Search Results based on a solr field

I have a solr field which has a set of values. Is it possible in solr to return results that are varied based on that field.
E.g.: my field contains "ValueA", "ValueB" and "ValueC". So if rows is set to 3, then instead of returning all results from "ValueA", it should give me one result from each field value (considering they have the same scores).
You might want to use the Result Grouping / Field Collapsing
or the CollapsingQParserPlugin.
The CollapsingQParserPlugin is newer (since Solr 4.6), faster, and more appropriate for your problem, I guess, as it does not affect the structure of the results.
Just add this to your solrconfig.xml:
<queryParser name="collapse" class="org.apache.solr.search.CollapsingQParserPlugin"/>
You can then collapse your result by adding the following parameter to your query:
fq={!collapse field=my_field}
or in Solrj:
solrQuery.addFilterQuery("{!collapse field=my_field}");
Collapsing means: For each value in my_field it only retains the document with the highest score in the result set.
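For example, a plain HTTP query against the select handler (the core name my_core and field my_field are placeholders; URL-encode the braces if your client requires it) would then return at most one document per field value:

http://localhost:8983/solr/my_core/select?q=*:*&rows=3&fq={!collapse field=my_field}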

CRecordset::IsFieldNull method replacement in bulk row fetching

I have a query related to bulk row fetching using CRecordset (MFC ODBC).
On the MSDN page, it is written that
The member functions IsDeleted, IsFieldDirty, IsFieldNull, IsFieldNullable, SetFieldDirty, and SetFieldNull cannot be used on recordsets that implement bulk row fetching. However, you can call GetRowStatus in place of IsDeleted, and GetODBCFieldInfo in place of IsFieldNullable.
Now, I want to check whether a field contains NULL / "has no value" data. How can I check this, given that the IsFieldNull function does not work with bulk row fetching?
There is a difference between the IsFieldNull and IsFieldNullable functions.
So logically you will not be able to know whether a field is null for a particular row, since you are doing bulk row fetching. You can only determine whether a particular field is nullable, which simply means whether that field is capable of accepting null values.
The CODBCFieldInfo structure contains information about the fields in an ODBC data source.
It has a member called m_nNullability which identifies whether the field accepts a Null value. This can be one of two values: SQL_NULLABLE if the field accepts Null values, or SQL_NO_NULLS if the field does not accept Null values.
So pass a CODBCFieldInfo object to the CRecordset::GetODBCFieldInfo function, which takes the object by reference. You will get the updated values back; then check the m_nNullability member of that object to learn whether the field is nullable, not whether the field is null for a particular row.
http://msdn.microsoft.com/en-us/library/xexc6xef(v=vs.80).aspx
http://msdn.microsoft.com/en-us/library/k50dcc9s(v=vs.80).aspx
The CRecordset::GetODBCFieldInfo function has two versions: one lets you look up a field by name, the other by index.
