Discovery does not allow nullable date field to be indexed

I am trying to index JSON data in Discovery. The issue is with date fields. It seems that Discovery detects data types, and in my case these date fields may be empty in some documents. Is there a way to override this data type detection and have Discovery treat the field as a String while indexing? Please clarify.
Soumitra

What you can do (assuming you have enough control over the JSON) is omit the date field for documents which have no date. For example, these two documents will work together in a single Discovery collection.
{
"title": "Document With Date",
"text": "Discovery detects date types to support range queries, sorting and more.",
"updated": "2018-04-26T10:11:12Z"
}
{
"title": "Undated Document",
"text": "Discovery has no trouble with fields that appear in some documents and not others."
}
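If the JSON is generated by your own code, the empty dates can be stripped just before upload. This is a minimal sketch, assuming the date field is called "updated" as in the example above; the helper name omitEmptyFields is hypothetical and is not part of any Discovery SDK.

```javascript
// Hypothetical helper: return a copy of doc without the listed fields
// when their value is null, undefined, or a blank string.
function omitEmptyFields(doc, fieldNames) {
  const cleaned = { ...doc };
  for (const name of fieldNames) {
    const value = cleaned[name];
    const isBlank = typeof value === "string" && value.trim() === "";
    if (value === null || value === undefined || isBlank) {
      delete cleaned[name];
    }
  }
  return cleaned;
}

// A populated date is kept; a blank one is dropped from the document.
const dated = omitEmptyFields(
  { title: "Document With Date", updated: "2018-04-26T10:11:12Z" },
  ["updated"]
);
const undated = omitEmptyFields(
  { title: "Undated Document", updated: "" },
  ["updated"]
);
console.log(dated, undated);
```

The original object is not modified, so the same source record can still be used elsewhere with its empty field intact.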

Related

CouchDB searching linked documents

I'm very new to CouchDB and I'm hoping someone can help me with a solution to this problem.
Say I have an address document that contains various keys, but importantly a singleLineAddress and a persons array:
{
"_id": "002cb726bfe69a79ed9b897931000ec6",
"_rev": "2-6af6d8896703e9db6f5ba97abb1ca5d7",
"type": "address",
...
"singleLineAddress": "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
...
"persons":["d506d09a1c46e32f6632e6d99a0062bd","002cb726bfe69a79ed9b897931001c80"]
}
Then I have a person document with a number of keys, crucially firstName & lastName:
{
"_id": "d506d09a1c46e32f6632e6d99a0062bd",
"_rev": "4-98fae966a92d5c6c359cb8ddfaa487e1",
"type": "person",
...
"firstName": "Joe",
"lastName": "Bloggs"
...
}
I understand I can create a linked document view and emit all the person IDs linked to an address, then use include_docs=true to see all the person data. But from what I'm reading, it's not advised to use include_docs=true as it can be expensive.
Ultimately, I'd like to use couchdb-lucene to run a FTS against person & address using the name & address. Is that even possible using linked documents?
Using ?include_docs=true is more expensive than not using it - for every row of the index returned, the database has to fetch the related document body. But sometimes needs must :) You can avoid using ?include_docs=true by "projecting" more data into the index which is returned to you at query time. See https://blog.cloudant.com/2021/11/12/Projection.html
As for Lucene full-text searching, you can certainly search across document types in the same collection but your search results would consist of a mixture of address and people documents - full text searching can't do the "join" between an address and its occupants - you'd have to do that yourself later.
If you desperately need to return address and people objects together, then consider combining the two: your address document would contain an array of the people objects that reside there. There is a trade-off between combining objects so that data that belongs together is stored together, and keeping every micro object separate for ease of updating.
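To make the projection idea concrete, here is a minimal sketch of a map function over the person documents from the question. It copies firstName and lastName into the index value so that a query does not need ?include_docs=true. In CouchDB, emit() is supplied by the view server; here it is passed in as a parameter, and the runMap harness is hypothetical, only so that the sketch can be run under Node.

```javascript
// Map function body as it could be stored in a design document.
// emit is taken as a parameter only so the sketch runs outside CouchDB.
function personMap(doc, emit) {
  if (doc.type === "person") {
    // Key by name; project the fields the client needs into the value.
    emit([doc.lastName, doc.firstName], {
      firstName: doc.firstName,
      lastName: doc.lastName,
    });
  }
}

// Hypothetical harness: collect emitted rows the way a view index would.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows;
}

const rows = runMap(personMap, [
  {
    _id: "d506d09a1c46e32f6632e6d99a0062bd",
    type: "person",
    firstName: "Joe",
    lastName: "Bloggs",
  },
  {
    _id: "002cb726bfe69a79ed9b897931000ec6",
    type: "address",
    singleLineAddress: "28 CLEVEDON ROAD, WESTON-SUPER-MARE, BS23 1DG",
  },
]);
console.log(JSON.stringify(rows));
```

Only the person document produces a row, and that row already carries the name fields, so the client gets what it needs from the index alone.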

How to update fields by query in Azure search?

I'm trying to update a record in a search index via the API, which works fine when I provide the index key, e.g.
{
"value": [
{
"@search.action": "merge",
"hotelid": "4618416",
"HotelName":"Gacc Capital"
}
]
}
However, the index is built from several different databases, so the primary key of the index is not present in all of them.
See the example below, where the field "ContactName" is stored in a different database:
{
"value": [
{
"@search.score": 1,
"HotelId": "124",
"HotelName": "Gacc Capital",
"Description": "Chic hotel near the city. High-rise hotel in downtown, walking distance to theaters, restaurants and shops, complete with wellness programs.",
"Category": "Paid",
"Amount": "£123456",
"ContactId": "456",
"ContactName": "Mr David Koh"
}
]
}
The issue I'm having is updating a particular field whenever there's a change. For instance, if someone changes their name from "Mr David Koh" to "Mr David Warner Koh", I need a way to update all the records where ContactId is 456.
Is there a way to tackle this problem, or am I missing a piece of the puzzle beforehand?
I'm not sure if this is possible in the Azure Search SDK (C#), but I'm happy to give it a go if it works better than the API.
I assume you have two different types of records with relations. It’s not clear from your question.
To keep relational data updated you could do the data maintenance in an actual database that has a view that resembles what your index looks like. Then index that view.
Alternatively, you could implement the logic yourself. Just query for all record ids that contains a contact with a specific ID and then update each of those records like you did above.
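The second option could look roughly like this. It is a sketch only: it assumes HotelId is the index key, that ContactId is marked filterable so that a query with $filter=ContactId eq '456' and $select=HotelId returns the affected records, and that the resulting body is posted to the same indexing endpoint used in the question. The function name buildMergeBatch is hypothetical.

```javascript
// Turn the hits of a "find by ContactId" query into one merge batch.
// hits: the "value" array of the search response.
// keyField: name of the index key (assumed here to be "HotelId").
// changes: the fields to overwrite on every matching record.
function buildMergeBatch(hits, keyField, changes) {
  return {
    value: hits.map((hit) => ({
      "@search.action": "merge",
      [keyField]: hit[keyField],
      ...changes,
    })),
  };
}

// Example hits; the second HotelId is made up for illustration.
const hits = [{ HotelId: "124" }, { HotelId: "125" }];
const batch = buildMergeBatch(hits, "HotelId", {
  ContactName: "Mr David Warner Koh",
});
console.log(JSON.stringify(batch, null, 2));
```

Large result sets would need to be paged through and sent as several batches rather than one.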

Find differences between current document and previous revision

Is there a way to determine what changes were made in a document? Here's a document and a revision of it
{
"_id": "panel100000",
"_rev": "1-b4f55d0e03fbfaef0822a0607d5d6ad0",
"name": "Maya Jambalaya",
"maritalstatus": "Married",
"employed": "Full time",
"education": "College graduate"
}
{
"_id": "panel100000",
"_rev": "2-caab684a341da5185546a028cfb5b0d9",
"name": "Maya Papaya",
"maritalstatus": "Married",
"employed": "Full time",
"education": "College graduate"
}
In this example, name has changed. Is there a way to find changes between a document and its previous revisions?
1. Is there anything built-in that does or could track such changes?
2. Is it possible to access a document's revision via a design document?
3. If the answer to #2 is "yes" then does anyone have a template of a design document with which to compare them?
1. No. If you want to track changes, you would probably need to use a data model adapted for that purpose. Otherwise, Couch keeps revisions of the documents and you can query them to manually calculate the diff. However, there is no guarantee that old revisions will still be available, because compaction removes them.
2. No. Design documents are built with the latest revision of each document.
3. ...
If you want to be sure to keep every change to a document, you would need to create a document for each change. Those change documents could be grouped by a unique ID, and you could use a map/reduce view to get the latest value of a document. The diff would still need to be made manually, though. The advantage would be that you can easily get the state of the document at a certain time.
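For the manual diff, here is a minimal sketch. It assumes both revision bodies have already been fetched, for example with a GET on the document with ?rev= set to the older revision ID, which only works while that revision has not been compacted away. The function name diffRevisions is hypothetical.

```javascript
// Compare two revisions of a document field by field.
// Fields starting with "_" (such as _id and _rev) are ignored.
function diffRevisions(oldDoc, newDoc) {
  const changes = {};
  const fields = new Set([...Object.keys(oldDoc), ...Object.keys(newDoc)]);
  for (const field of fields) {
    if (field.startsWith("_")) continue;
    // JSON.stringify is a simple way to compare nested values;
    // note that it is sensitive to key order inside nested objects.
    if (JSON.stringify(oldDoc[field]) !== JSON.stringify(newDoc[field])) {
      changes[field] = { from: oldDoc[field], to: newDoc[field] };
    }
  }
  return changes;
}

const rev1 = {
  _id: "panel100000",
  _rev: "1-b4f55d0e03fbfaef0822a0607d5d6ad0",
  name: "Maya Jambalaya",
  maritalstatus: "Married",
  employed: "Full time",
  education: "College graduate",
};
const rev2 = {
  _id: "panel100000",
  _rev: "2-caab684a341da5185546a028cfb5b0d9",
  name: "Maya Papaya",
  maritalstatus: "Married",
  employed: "Full time",
  education: "College graduate",
};
const changes = diffRevisions(rev1, rev2);
console.log(changes);
```

A field that exists in only one of the two revisions shows up with undefined on the other side, which distinguishes added and removed fields from edited ones.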

How to build search with facetting over unknown/unspecified set of attributes/properties?

I'm working on a product search engine with a big set of undefined products which is constantly growing. Each product has different attributes and at this time they're saved in an array of string key-value pairs like this:
"attributes": [
{
"key": "Producttype",
"value": "Headphones - 3.5 mm plug"
},
{
"key": "Weight",
"value": "280 g"
},
{
"key": "Soundmode",
"value": "Stereo"
},
....
]
Each product also has a category. I'm using elasticsearch 2.4.x to persist the data that I want to search on via spring-data-elasticsearch. It's possible to upgrade to the newest elasticsearch version if needed.
As you can see, the attributes are really generic. Nested objects are also needed to be able to search on these attributes. I'm also thinking about preprocessing these attributes into a standardized format. For example, the "Weight" key might be written in different forms like "Productweight" or "Weight of product". Because there are a lot of attributes and I wouldn't like to create a custom property/field for each one, I thought about mapping only the important ones (like weight) to their own custom field and mapping the other attributes as described above.
Now if someone searches for, say, "iphone", I would like to show some facets on the left of the search result page. The facets should differ if someone searches for "Adidas shoes". Is this possible with the format given above using nested objects? Is it possible to build the facets dynamically from the result set elasticsearch is returning? E.g. the most common properties which all result products contain should be used to create facets. Or do I have to persist some predefined filters/facets on each category? I think that would be too much work, and it also doesn't work for search results where products can have different categories. What's the best practice for building a search feature with faceting on entities with n different properties that can grow in future?

couchdb match multiple inconsistent keys

Considering the following two documents:
{
"_id": "a6b8d3d7e2d61c97f4285220c103abca",
"_rev": "7-ad8c3eaaab2d4abfa01abe36a74da171",
"File":"/store/document/scan_bgd123.jpg",
"Comment": "Describes a person",
"DateAdded": "2014-07-17T14:13:00Z",
"Name": "Joe",
"LastName": "Soap",
"Height": "192cm",
"Age": "25"
}
{
"_id": "a6b8d3d7e2d61c97f4285220c103c4a9",
"_rev": "1-f43410cb2fe51bfa13dfcedd560f9511",
"File":"/store/document/scan_adf123.jpg",
"Comment": "Describes a car",
"Make": "Ford",
"Year": "2011",
"Model": "Focus",
"Color": "Blue"
}
How would I find a document based on multiple criteria, for example "Make"="Ford" and "Color"="Blue"? I realize I need a view for this, but I don't know what the key is going to be, and as you can see from the two documents, the key/value pairs aren't consistent. The only consistent item will be the "File" key.
I'm attempting to create a CouchDB database that will store the location of files, tagged with Key/Value pairs.
EDIT:
Perhaps I should reconsider my data structure and modify it slightly?
{
"_id": "a6b8d3d7e2d61c97f4285220c103c4a9",
"_rev": "1-f43410cb2fe51bfa13dfcedd560f9511",
"File": "/store/document/scan_adf123.jpg",
"Tags": {
"Comment": "Describes a car",
"Make": "Ford",
"Year": "2011",
"Model": "Focus",
"Color": "Blue"
}
}
So, I need to find documents by a Key>Value pair in the tags, or by any number of Key>Value pairs, to filter which document I want. The problem here is that I want to tag objects with Key>Value pairs. These tags could be very different per view, so the next document will have a completely different set of Key>Value pairs.
CouchDB supports a flexible schema. There is no need for the documents to be consistent for them to be queryable. The view for your scenario is pretty straightforward. Here is a map function that should do the trick.
function(doc){
if(doc.Make&&doc.Color)
emit([doc.Make,doc.Color],null);
}
This gives you a view which you can then query like
/view-name?key=["Ford","Blue"]&include_docs=true
This should give you the desired result.
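When the request is built in code, the key has to be valid JSON and URL-encoded. Here is a minimal sketch; the database name "files", design document name "tags", and view name "by_make_color" are hypothetical, as is the helper viewUrl.

```javascript
// Build the path and query string for a CouchDB view lookup by exact key.
function viewUrl(db, designDoc, view, key, includeDocs) {
  // The key is sent as JSON, so an array key keeps its brackets and quotes.
  const query = [`key=${encodeURIComponent(JSON.stringify(key))}`];
  if (includeDocs) query.push("include_docs=true");
  return `/${db}/_design/${designDoc}/_view/${view}?${query.join("&")}`;
}

const url = viewUrl("files", "tags", "by_make_color", ["Ford", "Blue"], true);
console.log(url);
```

The order of the values in the key must match the order used in the emit call of the map function, here Make first and Color second.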
Edit based on comment
For that you will need two separate views. Every view in CouchDB is designed to fulfil a specific query need, which means you have to think about the access strategy for your data. It is more work on your part initially, but in return you get data that is indexed and has very fast access times.
So, to answer your question directly: create two views, one for Make like we have already done and another for Name, like
function(doc){
if(doc.Name&&doc.LastName)
emit([doc.Name,doc.LastName],null);
}
Now the Name view will index only those documents that have a name in them, whereas the Make view will index only those documents that have a make in them.
What happens when a requirement comes in future for which you don't have a query?
You can try a few things.
1. This is probably the easiest solution. Use couchdb-lucene for your dynamic queries. In this case your architecture would use CouchDB views for the queries you know your application needs, and a Lucene index for the queries you don't yet know you might need. For instance, say you have indexed name and last name in a CouchDB view, but a requirement arises to query by age: simply add the age field to the Lucene index and it will take care of the rest.
2. Another approach is the PPP technique, where you exploit the fact that building a view is a one-time cost: you can build views during less active hours and deploy them to the production service once they are built.
3. Combine steps 1 and 2! Let Lucene handle ad hoc requests while you are building views using the PPP technique.
