Azure Search with documentDB does not find data - azure

This is my first time using Azure Search. I followed the example with the generated dataset. Now I want to implement Azure Search on my own database.
This is an example of an item of the collection I want to index.
{
"_id" : "Watch",
"name" : "Watch",
"cloudProvider" : "azure",
"channel" : "C6SELFQMD",
"services" : [
"azure-backup",
"azure-data-lake-analytics",
"backup",
"blobstorage",
"site-recovery",
"storage"
],
"__v" : 0
}
Azure Search itself doesn't even detect the fields.
These are the steps I'm following:
If I add the fields manually, it returns useless data. Does anybody know why this is happening? I only have 2 items in my collection at the moment, but I don't think that is the problem.
UPDATE:
So the problem is that I have an underscore before my id ("_id"). Now I'm trying to use fieldMappings to solve this issue, but the API's response is:
{
"error": {
"code": "",
"message": "Data source does not contain column '_id', which is required because it maps to the document key field 'id' in the index 'index'. Ensure that the '_id' column is present in the data source, or add a field mapping that maps one of the existing column names to 'id'."
}
}

Azure Search currently does not support Cosmos DB Table API accounts, as seems to be the case here.
If you want to see Table API supported by Azure Search, please vote for Azure Search should be able to index Cosmos DB Table API collections to help us prioritize that work.

Related

How to map blob file "content" to an existing "content" field in an index based on blob metadata_storage_path property?

I am trying to create an index using an Azure SQL data source and an Azure Blob data source. The blob container contains files in formats such as Word, PDF, PPTX, and TXT.
Click here for Index Structure
"ItemId" is the Key field in the index, data is pulled from item table in the db.
"DocumentList" is a collection which holds files metadata including file storage path.
"DocumentList" is derived from a SQL JSON array column. The JSON column holds the file metadata for each item.
Files are stored in blob storage, and the blob path is stored in the above JSON column under the "DocumentLocation" property.
Note: Each row in the database can reference multiple files in blob storage.
Questions:
How do I map the blob "content" to the "Content" field under the "DocumentList" field in the index, using "DocumentLocation" as the basis for joining?
Can we define a field mapping or an output field mapping for the above scenario? If so, how?
Is there any other approach to the above scenario?
Any suggestions are much appreciated.
Use the following snippet, changing the field names to match your index.
"fieldMappings": [
{
"sourceFieldName": "metadata_storage_path",
"targetFieldName": "metadata_storage_path"
},
{
"sourceFieldName": "metadata_storage_path",
"targetFieldName": "index_key",
"mappingFunction": {
"name": "base64Encode"
}
}
]
For better understanding, follow the procedure mentioned in the below link.
https://learn.microsoft.com/en-us/azure/search/search-indexer-field-mappings

using pdf files and their metadata for Azure cognitive search

I'm uploading hundreds of PDF files into blob storage to be used in Azure cognitive search.
I would like the user to be able to get the title and author of these PDF files on top of their search results.
I'm not sure how the metadata for these PDF files (e.g., 'author', 'date', 'title') can be added (e.g., as a json file) to the blob storage.
Any advice would be appreciated.
Thanks
I'm from the Microsoft for Founders Hub team. Azure blob storage has blob properties and metadata built into it! You can view and add metadata through various tools including the Azure Portal, CLI, PowerShell, or the REST API. To learn more, here's a great area to get started:
View Blob Properties and Meta Data using Azure Tools
Add Blob Metadata using Azure tools and code
If you would like the title, author and date to be returned in your search results, you can add them to the index. Thus, you can create fields called author, title and date in your index. Then, in the indexer, you can return the specific metadata for PDF's, as mentioned here, like this:
indexer= {
"name":...,
"dataSourceName":...,
"targetIndexName":...,
"skillsetName":...,
"fieldMappings": [
{
....
},
{
"sourceFieldName": "metadata_title",
"targetFieldName": "title"
},
{
"sourceFieldName": "metadata_creation_date",
"targetFieldName": "date"
},
{
"sourceFieldName": "metadata_author",
"targetFieldName": "author"
}
],
"outputFieldMappings": [
...
]
...
}
Where the "..." means that you add your own code.
Of course, the PDFs themselves must contain the metadata; otherwise the field will come back as an empty value [].
You can then access the fields like you'd normally do for content for example.
NOTE: if you happen to put a null mappingFunction for the title, date and author, you might also get a []. If you don't use it, best remove it.
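As a rough illustration of the note above, the following Python sketch builds the "fieldMappings" array and leaves out the "mappingFunction" key entirely when no function is requested, instead of emitting it as null. The helper name build_field_mappings is hypothetical; the source and target field names are the ones used in the indexer definition above.

```python
import json


def build_field_mappings(source_to_target, functions=None):
    """Build the "fieldMappings" array for an indexer definition.

    source_to_target: dict of source field name -> index field name.
    functions: optional dict of source field name -> mapping function name.
    The "mappingFunction" key is omitted when no function is given,
    rather than being written out as null.
    """
    functions = functions or {}
    mappings = []
    for source, target in source_to_target.items():
        entry = {"sourceFieldName": source, "targetFieldName": target}
        if functions.get(source):
            entry["mappingFunction"] = {"name": functions[source]}
        mappings.append(entry)
    return mappings


# Mappings for the PDF metadata fields discussed above.
pdf_mappings = build_field_mappings({
    "metadata_title": "title",
    "metadata_creation_date": "date",
    "metadata_author": "author",
})
print(json.dumps({"fieldMappings": pdf_mappings}, indent=2))
```

The printed fragment can be pasted into the indexer definition in place of the hand-written "fieldMappings" array.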

How to update fields by query in Azure search?

I'm trying to update a record in the search index via the API, which works fine when I provide the index key, e.g.
{
"value": [
{
"@search.action": "merge",
"hotelid": "4618416",
"HotelName":"Gacc Capital"
}
]
}
However, because the index is built from several different databases, its primary key is not present in all of them.
See the example below, where the field "ContactName" is stored in a different database:
"value": [
{
"@search.score": 1,
"HotelId": "124",
"HotelName": "Gacc Capital",
"Description": "Chic hotel near the city. High-rise hotel in downtown, walking distance to theaters, restaurants and shops, complete with wellness programs.",
"Category": "Paid",
"Amount": "£123456",
"ContactId": "456",
"ContactName": "Mr David Koh"
}
]
The issue I'm having is updating a particular field whenever there's a change. For instance, if someone changes their name from "Mr David Koh" to "Mr David Warner Koh", I need a way to update all the records where ContactId is 456.
Is there a way to tackle this problem, or am I missing a piece of the puzzle beforehand?
I'm not sure if this is possible in the Azure Search SDK (C#), but I'm happy to give it a go if it works better than the API.
I assume you have two different types of records with relations. It’s not clear from your question.
To keep relational data updated you could do the data maintenance in an actual database that has a view that resembles what your index looks like. Then index that view.
Alternatively, you could implement the logic yourself. Just query for the IDs of all records that contain a contact with a specific ID, and then update each of those records as you did above.
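A minimal sketch of that second approach in Python, assuming "HotelId" is the index key and "ContactId" is marked filterable (neither is confirmed by the question). It only builds the two JSON bodies: one for the search request that looks up the keys, and one for the indexing request that merges the change. The helper names and the sample hits are hypothetical.

```python
def build_lookup_query(filter_field, value, key_field):
    """Search body that returns only the keys of documents matching a filter."""
    escaped = str(value).replace("'", "''")  # OData escapes ' by doubling it
    return {
        "search": "*",
        "filter": f"{filter_field} eq '{escaped}'",
        "select": key_field,
    }


def build_merge_batch(hits, key_field, changes):
    """Turn search hits into a 'merge' batch applying the same changes to each."""
    return {
        "value": [
            {"@search.action": "merge", key_field: hit[key_field], **changes}
            for hit in hits
        ]
    }


# Body for the search request that finds every record of contact 456.
query = build_lookup_query("ContactId", "456", "HotelId")

# Hypothetical hits, standing in for the "value" array of the search response.
hits = [{"HotelId": "124"}, {"HotelId": "987"}]

# Body for the indexing request that renames the contact on each record.
batch = build_merge_batch(hits, "HotelId", {"ContactName": "Mr David Warner Koh"})
print(query)
print(batch)
```

A search request returns a limited page of results by default, so a real implementation would need to page through all matches before building the batch.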

How To Retrieve Custom Columns For DriveItems in MS Graph

I'm trying to use the Graph API to retrieve a hierarchy of files in a SharePoint document library. Since document libraries are stored in "drives" (is it technically correct to call it OneDrive?), I'm using the /drives endpoint to fetch a list of files, like this:
https://graph.microsoft.com/beta/drives/{driveid}/root/children
I would like to get information from some of the custom columns that exist when viewing these items through SharePoint. Using ?expand=fields doesn't work because fields only exists in the listItem object of the /sites endpoint, not in the driveItem object of the /drives endpoint. If I try obtaining the listItem from a single driveItem (traversing the Graph from OneDrive to SharePoint), and then expanding the fields, like
https://graph.microsoft.com/beta/drives/{driveid}/items/{driveItemId}/listItem?expand=fields
this retrieves built-in columns (Author, DocIcon, and some others) but doesn't seem to retrieve the custom columns.
I've also tried getting the list of files from the /sites endpoint, and using ?expand=fields will get the custom columns, but it gets every file from every subfolder, rather than the current folder path. But I feel that deserves its own SO question.
Is it possible to retrieve custom column information from driveItems?
I spent a lot of time digging around with the different syntax possibilities and was finally able to get custom library properties using this query format. This is the only one that has produced my custom/user-defined fields for a document library.
https://graph.microsoft.com/v1.0/drives/insert_drive_id_here/root/children?expand=listItem
Shortened result:
{
"@odata.context": "...",
"value": [
{
"@microsoft.graph.downloadUrl": "...",
"listItem@odata.context": "...",
"listItem": {
"@odata.etag": "...",
"fields@odata.context": "...",
"fields": {
"@odata.etag": "...",
"Title": "...",
"Other_Custom_Property": "..."
}
}
}
]
}
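To read the custom columns out of a response shaped like the one above, something along these lines may help. It is a sketch only: the function name extract_list_fields and the sample payload are hypothetical, and it assumes the response has already been parsed from JSON into a Python dict.

```python
def extract_list_fields(children_response):
    """Return one dict of column values per drive item.

    Items without an expanded listItem (or without fields) yield an empty dict.
    """
    rows = []
    for item in children_response.get("value", []):
        fields = item.get("listItem", {}).get("fields", {})
        # Drop OData annotations such as "@odata.etag"; keep real columns.
        rows.append({k: v for k, v in fields.items() if not k.startswith("@")})
    return rows


# Hypothetical payload mirroring the shortened result above.
sample = {
    "value": [
        {
            "listItem": {
                "fields": {
                    "@odata.etag": "...",
                    "Title": "Report",
                    "Other_Custom_Property": "Finance",
                }
            }
        },
        {"name": "item-without-list-item"},
    ]
}
rows = extract_list_fields(sample)
print(rows)
```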
I did some testing. What SHOULD work is:
https://graph.microsoft.com/beta/drives/{driveid}/root/children?$select=id,MyCustomColumnName
However, when I did that, it just returned that id field. In my opinion, that is a bug in the graph because this same type of query does work in the SharePoint REST api.
If this helps, you can accomplish this by using the SharePoint REST api. Your endpoint query would be something like:
https://{yoursite}.sharepoint.com/sites/{sitename}/_api/web/lists(guid'{DocumentLibraryID}')/items?$select=id,MyCustomColumnName
There are other ways to do the same query.
Try the list endpoint then expand driveItem and fields. You now have both custom column fields and drive item fields.
/beta/sites/[site-id]/lists/[list-id]/items?expand=driveitem,fields&filter=(fields/customColumn eq 'someValue')

Azure Search - Match value from comma-separated values string

How do you structure an Azure Search POST REST call to match a value in a comma-separated string?
For Example:
I want to search for "GWLAS" or "SAMGV" within the Azure field "ProductCategory".
The "ProductCategory" field in the documents will have a comma-separated value string such as "GWLAS, EXDEB, SAMGV, AMLKYC".
Any ideas?
If you use the default analyzer for your ProductCategory field (assuming it is searchable), it should word-break on commas by default. This means all you should need to do is search for the terms you're interested in and limit it to the right field:
POST /indexes/yourindex/docs/search?api-version=2016-09-01
{
"search": "GWLAS SAMGV",
"searchFields": "ProductCategory"
}
There are other ways to do this, but this is the simplest. If you already scope parts of your search query to other fields, here is how you can scope just the desired terms to ProductCategory:
POST /indexes/yourindex/docs/search?api-version=2016-09-01
{
"search": "(Name:\"Anderson John\"~3 OR Text:\"Anderson John\"~3) AND ProductCategory:(GWLAS SAMGV)",
"queryType": "full"
}
Please consult the Azure Search REST API documentation for details on other options you can set in the Search request. Also, this article will help you understand how Azure Search executes queries. You can find the reference for the full Lucene query syntax here.
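The two request bodies above can also be generated programmatically. The Python sketch below is one way to do it, under a few assumptions: the helper name build_category_search is hypothetical, the terms contain no characters that need escaping in Lucene syntax, and in the fielded form the terms are grouped in parentheses so that every term is scoped to the field rather than only the first one.

```python
import json


def build_category_search(field, terms, other_clause=None):
    """Build a search body matching any of `terms` in `field`.

    Without other_clause, the simple syntax plus searchFields is used.
    With other_clause (an existing full-Lucene expression), the terms are
    scoped to `field` with fielded-search syntax and both parts are ANDed.
    """
    values = " ".join(terms)
    if other_clause is None:
        return {"search": values, "searchFields": field}
    return {
        "search": f"({other_clause}) AND {field}:({values})",
        "queryType": "full",
    }


simple_body = build_category_search("ProductCategory", ["GWLAS", "SAMGV"])
scoped_body = build_category_search(
    "ProductCategory", ["GWLAS", "SAMGV"], 'Name:"Anderson John"~3'
)
print(json.dumps(simple_body))
print(json.dumps(scoped_body))
```

Either dict can be serialized with json.dumps and sent as the body of the POST request shown above.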
