I am looking for a way to retrieve the search score in the search results (as an index field value), similar to other metadata fields like metadata_storage_name or metadata_storage_path. In the indexer definition, I tried retrieving the search score as follows. Please correct me if I am missing anything or doing it the wrong way.
"fieldMappings": [
{
"sourceFieldName": "@search.score",
"targetFieldName": "search_score",
"mappingFunction": null
}
]
The search score is an attribute added to each result in the search response. Try issuing a simple search request using your favourite REST client or the Azure portal. Below is an example of a response object; @search.score is what you're looking for.
"value": [
{
"@search.score": 7.3617697,
"HotelId": "21",
"HotelName": "Nova Hotel & Spa",
"Description": "1 Mile from the airport. Free WiFi, Outdoor Pool, Complimentary Airport Shuttle, 6 miles from the beach & 10 miles from downtown.",
"Category": "Resort and Spa",
"Tags": [
"pool",
"continental breakfast",
"free parking"
]
},
{
"@search.score": 2.5560288,
"HotelId": "25",
"HotelName": "Scottish Inn",
"Description": "Newly Redesigned Rooms & airport shuttle. Minutes from the airport, enjoy lakeside amenities, a resort-style pool & stylish new guestrooms with Internet TVs.",
"Category": "Luxury",
"Tags": [
"24-hour front desk service",
"continental breakfast",
"free wifi"
]
}]
This example is taken from: https://learn.microsoft.com/en-us/azure/search/search-query-simple-examples#example-1-full-text-search
'@search.score' is not a field in the index; it is the relevance score computed for each search result at query time. If a document matches your search criteria and is returned, you can read that value from the HTTP response under '@search.score'.
Field mappings, on the other hand, are for when a field in your data source does not have the name you want to use in the index: they map the source field to the index field name you need.
For more information on the HTTP response of Search Documents REST API and search scoring, please visit:
https://learn.microsoft.com/rest/api/searchservice/search-documents and
https://learn.microsoft.com/azure/search/index-similarity-and-scoring
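As a minimal sketch of reading the score on the client side instead of through an indexer mapping, the Python below parses a Search Documents response body and pairs each document key with its score. It assumes the response has already been fetched (no HTTP call is shown) and that the score key is spelled `@search.score`, as in Azure AI Search REST responses; `scores_by_key` and the trimmed sample body are illustrative only.

```python
import json

# Score key as it appears in Azure AI Search REST responses.
SCORE_KEY = "@search.score"


def scores_by_key(response_body: str, key_field: str) -> list[tuple[str, float]]:
    """Return (document key, relevance score) pairs from a search response body."""
    response = json.loads(response_body)
    return [(doc[key_field], doc[SCORE_KEY]) for doc in response.get("value", [])]


# Trimmed version of the response shown above.
sample = """
{"value": [
  {"@search.score": 7.3617697, "HotelId": "21", "HotelName": "Nova Hotel & Spa"},
  {"@search.score": 2.5560288, "HotelId": "25", "HotelName": "Scottish Inn"}
]}
"""

for hotel_id, score in scores_by_key(sample, "HotelId"):
    print(hotel_id, score)
```

The score only exists per query, so it is read from each response rather than stored in the index.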
From what I understand, you can use either the documents parameter or the file parameter to tell OpenAI which labels to search over. I get the expected results with the documents parameter, but unsatisfactory results with the file parameter. I would expect them to be the same.
When performing a search using the documents parameter:
import openai
response = dict(openai.Engine('davinci').search(
query='sitcom',
#file=file_id,
max_rerank=5,
documents=["white house", "school", "seinfeld"],
return_metadata=False))
I get the expected results: for the query "sitcom", document 2 ("seinfeld") wins the search with a score of 771.
{'object': 'list', 'data': [<OpenAIObject search_result at 0xb5e8ef48> JSON: {
"document": 0,
"object": "search_result",
"score": 147.98
}, <OpenAIObject search_result at 0xb5ebd148> JSON: {
"document": 1,
"object": "search_result",
"score": 211.021
}, <OpenAIObject search_result at 0xb5ebd030> JSON: {
"document": 2,
"object": "search_result",
"score": 771.348
}], 'model': 'davinci:2020-05-03'}
Next, to try the file parameter, I create a temp.jsonl file with these contents:
{"text": "white house", "metadata": "metadata here"}
{"text": "school", "metadata": "metadata here"}
{"text": "seinfeld", "metadata": "metadata here"}
I then upload the file to the OpenAI server with:
res = openai.File.create(file=open('temp.jsonl'), purpose="search")
and read its ID from the response:
file_id = res['id']
I wait until the server has processed the file, then run:
response = dict(openai.Engine('davinci').search(
query='sitcom',
file=file_id,
max_rerank=5,
#documents=["white house", "school", "seinfeld"],
return_metadata=False))
But the search returns the following message:
No similar documents were found in file with ID 'file-LzHkASUxbDjTAWBhHxHpIOf4'.Please upload more documents or adjust your query.
I only get results when my query exactly matches a label:
response = dict(openai.Engine('davinci').search(
query='seinfeld',
file=file_id,
max_rerank=5,
#documents=["white house", "school", "seinfeld"],
return_metadata=False))
{'object': 'list', 'data': [<OpenAIObject search_result at 0xb5e74f48> JSON: {
"document": 0,
"object": "search_result",
"score": 668.846,
"text": "seinfeld"
}], 'model': 'davinci:2020-05-03'}
What am I doing wrong? Shouldn't the results be the same whether I use the documents parameter or the file parameter?
Rereading the docs, it seems that when you use the file parameter instead of the documents parameter, the server first performs a basic "keyword" search with the query to narrow down the candidates, and only then reranks those candidates with a semantic search using the same query.
This is disappointing.
To give a working example, take a file with:
{"text": "stairway to the basement", "metadata": "metadata here"}
{"text": "school", "metadata": "metadata here"}
{"text": "stairway to heaven", "metadata": "metadata here"}
Using the query "led zeppelin's most famous song stairway", the server narrows the results down to document 0 and document 2, because both contain the "stairway" token. It then performs a semantic search and scores both of them; document 2 ("stairway to heaven") gets the highest relevance score.
Using the query "stairway to the underground floor" will give document 0 ("stairway to the basement") the highest relevancy score.
This is disappointing because the query has to work for both the keyword search AND the semantic search.
In my original post, the keyword search returned no results because the query was designed only for a semantic search. With the documents parameter, only the semantic search is performed, which is why it worked in that case.
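The two-stage behaviour described above can be modelled with a toy sketch. This is not OpenAI's implementation: stage 1 here is a naive token-overlap filter (a real keyword search would apply its own analyzer, stemming and stop words), and the semantic scorer is passed in as a hypothetical `score` callable. It only illustrates why a query with no keyword overlap, such as "sitcom", never reaches the semantic stage.

```python
import re
from typing import Callable


def tokens(text: str) -> set[str]:
    """Lower-case word tokens; a crude stand-in for a real keyword analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_candidates(query: str, documents: list[str], max_rerank: int) -> list[int]:
    """Stage 1: keep documents sharing at least one token with the query."""
    query_tokens = tokens(query)
    hits = [i for i, doc in enumerate(documents) if query_tokens & tokens(doc)]
    return hits[:max_rerank]


def two_stage_search(
    query: str,
    documents: list[str],
    max_rerank: int,
    score: Callable[[str, str], float],
) -> list[tuple[int, float]]:
    """Stage 2: rerank only the keyword candidates with a (hypothetical) semantic scorer."""
    candidates = keyword_candidates(query, documents, max_rerank)
    ranked = [(i, score(query, documents[i])) for i in candidates]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


docs = ["stairway to the basement", "school", "stairway to heaven"]
print(keyword_candidates("led zeppelin's most famous song stairway", docs, max_rerank=5))  # [0, 2]
print(two_stage_search("sitcom", ["white house", "school", "seinfeld"], 5, score=lambda q, d: 0.0))  # []
```

With the documents parameter only the second stage runs, which matches the observation that "sitcom" worked there.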
I am new to Azure Search. I am indexing a few PDF documents using this method.
However, I want to get search results page by page. Currently the results come from the whole document; instead, I want the results to come from each page, and I also need the file name and page number that have the highest score.
As you have noticed, the document cracking by default shoves all text into one field (content). If you have an OCR skill involved (assuming you have images within the PDF that contain text), it does the same thing by default in merged_content. I do not believe there is a way to force these two tasks to break your data out into pages.
I say "believe" because it is difficult to find documentation on the shape of the document object that is passed into your skillsets. For example, look at the inputs to the following merge skill. It uses /document/content and other document-related data and pushes it all into a field called merged_content. If you could find documentation on all the fields in document, it MIGHT have your pages broken down.
{
"@odata.type": "#Microsoft.Skills.Text.MergeSkill",
"name": "#BookMergeSkill",
"description": "Some description",
"context": "/document",
"insertPreTag": " ",
"insertPostTag": " ",
"inputs": [
{
"name": "text",
"source": "/document/content"
},
{
"name": "itemsToInsert",
"source": "/document/normalized_images/*/text"
},
{
"name": "offsets",
"source": "/document/normalized_images/*/contentOffset"
}
],
"outputs": [
{
"name": "mergedText",
"targetName": "merged_content"
}
]
},
The only way I know to approach this is to use a custom skill, which would reside in an Azure Function and be called as part of the document skillset pipeline. Inside that Azure Function, you would have to use a PDF reader, like iText7, and crack open the documents yourself and return data that you would place in the index document as an array of text or custom objects.
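A minimal sketch of the custom skill's core logic, written in Python rather than around iText7. It assumes the indexer has `allowSkillsetToReadFileData` enabled so that `/document/file_data` (whose `data` member is base64) can be passed to the skill as an input named `file_data`, and it takes a hypothetical `extract_pages` callable in place of a real PDF library. The `values`/`recordId`/`data`/`errors`/`warnings` envelope follows the custom Web API skill interface; the `pages` output and its `pageNumber`/`text` shape are illustrative choices.

```python
import base64
from typing import Callable

# Hypothetical page extractor: raw PDF bytes in, one string per page out.
# A real function would wrap a PDF library here.
PageExtractor = Callable[[bytes], list[str]]


def split_pages_skill(request: dict, extract_pages: PageExtractor) -> dict:
    """Handle a custom skill request body and return the text of each page."""
    results = []
    for record in request.get("values", []):
        try:
            # Assumes the skill input "file_data" is sourced from /document/file_data.
            pdf_bytes = base64.b64decode(record["data"]["file_data"]["data"])
            pages = [
                {"pageNumber": number, "text": text}
                for number, text in enumerate(extract_pages(pdf_bytes), start=1)
            ]
            results.append({"recordId": record["recordId"], "data": {"pages": pages},
                            "errors": [], "warnings": []})
        except (KeyError, ValueError) as exc:
            # Report the failure for this record instead of failing the whole batch.
            results.append({"recordId": record.get("recordId"), "data": {},
                            "errors": [{"message": str(exc)}], "warnings": []})
    return {"values": results}
```

Wrapping this in an HTTP-triggered Azure Function and mapping `pages` (plus `metadata_storage_name` for the file name) into the index are not shown.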
We were going to implement a custom cracking process for a client (for other reasons, not this one), but the project was cancelled because of the cost of storing large amounts of data in an index.
In the Acumatica REST API (StockItem entity), I am using the URL https://sandbox.kimballinc.com/AcumaticaERP/entity/Default/18.200.001/StockItem?$filter=InventoryID eq '12345'&$expand=UOMConversions
In the response, I get the UOMConversions object as:
"UOMConversions": [
{
"rowNumber": 1,
"note": null,
"ConversionFactor": {
"value": 1
},
"FromUOM": {
"value": "EACH"
},
"MultiplyDivide": {
"value": "Multiply"
},
"ToUOM": {
"value": "FOOT"
}
}
]
I want to know how the ConversionFactor, FromUOM, MultiplyDivide, and ToUOM fields are used and what their possible values are.
Can you please help me understand these fields? Thanks.
To find more information on that, I would recommend that you connect to the Acumatica site in the browser, navigate to the Stock Item screen, and open the help page for that screen (Tools -> Help).
Once on the help screen, search for "Unit Conversion Table"; you will find more information about these fields there.
For the available values, I would again recommend going to the screen itself in the browser. Open the selector for the "From Unit" field and the drop-down for the "Multiply/Divide" field. "Conversion" is simply a decimal number, and "To Unit" is a read-only field whose value is the base unit of the Stock Item.
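A small sketch of how one UOMConversions row could be applied to a quantity, assuming that MultiplyDivide names the operation performed on a quantity in FromUOM to obtain the quantity in ToUOM (the item's base unit). `to_base_quantity` is a hypothetical helper, not part of the Acumatica API; `Decimal` is used because the conversion factor is a decimal number.

```python
from decimal import Decimal


def to_base_quantity(qty, conversion: dict) -> Decimal:
    """Convert a FromUOM quantity to ToUOM using one UOMConversions row.

    Assumption: MultiplyDivide is the operation applied to the From Unit
    quantity to get the To Unit (base unit) quantity.
    """
    factor = Decimal(str(conversion["ConversionFactor"]["value"]))
    quantity = Decimal(str(qty))
    operation = conversion["MultiplyDivide"]["value"]
    if operation == "Multiply":
        return quantity * factor
    if operation == "Divide":
        return quantity / factor
    raise ValueError(f"unexpected MultiplyDivide value: {operation!r}")


# The row from the response above: 1 EACH = 1 FOOT.
row = {"ConversionFactor": {"value": 1}, "FromUOM": {"value": "EACH"},
       "MultiplyDivide": {"value": "Multiply"}, "ToUOM": {"value": "FOOT"}}
print(to_base_quantity(5, row), row["ToUOM"]["value"])  # 5 FOOT
```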
I'm trying to use the JSON search endpoint for the MS Academic Knowledge Graph and finding it difficult to work out what the legal "Select" fields are.
For example, I am interested in the author's affiliation:
POST /academic/v1.0/graph/search?mode=json
{
"path":"/v0/",
"v0" : {
"match": {
"Name": "stephen hawking"
},
"type": "Author",
"select": [ "DisplayAuthorName", "AffiliationName" ]
}
}
There is no obvious way to work out these field names. The documentation gives short-form attribute names such as "DAuN", which do not seem to work in the Graph API.
Where can I get a comprehensive list of legal knowledge graph "select" fields?
I'm using Elasticsearch for classic queries like "search all documents with G4 in the name and LG in the manufacturer". This works fine. But what if I have a lot of documents and a database with many search terms, and I need to know which documents match specific multi-column terms? For example:
Documents:
[
{
"id": 5787,
"name": "Smartphone G4",
"manufacturer": "LG",
"description": "The revolutionary LG G4 design can only be described as forward thinking—with a classic touch."
},
{
"id": 68779,
"name": "Smartphone S6",
"manufacturer": "Samsung",
"description": "The Samsung Galaxy S6 is powerful to use and beautiful to behold."
}
]
...
Terms:
[
{
"id": "587",
"name": "G4",
"manufacturer": "LG",
"description": "classic touch"
},
{
"id": "364",
"manufacturer": "Samsung",
"description": "galaxy s6"
}
]
...
Result:
{
"587": [5787],
"364": [68779]
}
OR:
{
"5787": [587],
"68779": [364]
}
I need a list of documents with the terms that match each of them (or the opposite). With a small number of terms, I could apply the rules one by one and save the matching documents. But I have millions of documents and thousands of terms, so applying them one by one is not feasible. Is there another way?
The percolator (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html) is exactly what I wanted: it stores your queries and executes them against documents, returning the stored queries that match.
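A sketch of the percolator approach with the request bodies built as Python dicts (nothing is sent to a cluster). It assumes Elasticsearch 7 or later, where the `percolator` field type and the `percolate` query replace the older percolate API on the linked page; passing several `documents` in one request needs 6.1+. The `query` field name and the choice of `match_phrase` for every non-`id` field are illustrative assumptions.

```python
# Index mapping: "query" holds each stored term set as a query; the other
# fields describe the documents that will be percolated.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "name": {"type": "text"},
            "manufacturer": {"type": "text"},
            "description": {"type": "text"},
        }
    }
}


def stored_query(term: dict) -> dict:
    """Turn one term record into a stored query: every non-id field must match as a phrase."""
    clauses = [{"match_phrase": {field: value}}
               for field, value in term.items() if field != "id"]
    return {"query": {"bool": {"must": clauses}}}


def percolate_request(documents: list[dict]) -> dict:
    """Search body asking which stored queries match any of the given documents."""
    return {"query": {"percolate": {"field": "query", "documents": documents}}}
```

Each `stored_query(term)` would be indexed with the term's id as its `_id` (for example into a hypothetical `terms` index), and `percolate_request(batch)` sent to that index's `_search` endpoint. The hits are the matching terms; with several documents, each hit's `_percolator_document_slot` field lists which submitted documents it matched, giving the term-to-document mapping from the question.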