I am creating a querying LUIS App, which needs a semantic understanding of time/ date ranges.
By semantic, I mean that I would like to be able to resolve the following examples:
Last week -> start: 2019-09-02T00:00:00+00:00; end: 2019-09-08T00:00:00+00:00
Yesterday -> start: 2019-09-14T00:00:00+00:00; end: 2019-09-14T23:59:59+00:00
1st July to the 18th August -> start: 2019-07-01T00:00:00+00:00; end: 2019-08-18T00:00:00+00:00
I have tried the built-in datetimeV2 entity; however, that doesn't appear to cover the range functionality, and custom entities don't seem to be able to resolve a single utterance such as "yesterday" into two different values (a start and an end). The third example, with two explicit dates, is obviously fairly straightforward to handle.
The only solution I can currently see is to have a "Range" entity, which "yesterday", "last month", etc. would resolve to, along with a start and an end type, and then resolve the values manually in code outside the Bot Framework. But this is a bit messy.
Are there any built-in types or features which cover this sort of functionality, or is there an alternative way to architect this?
NOTE:
From the Azure docs, it seems as though the preferred solution is to use the prebuilt datetimeV2 entity, with a "start" and "end" role. However, I can't get the app to identify the range as two entities i.e. it identifies "between 1st July and the 18th August" as one value.
You can use a Regex entity for this scenario. Create a Regex entity with start and end roles:
Configure your intent:
Test:
The regex expression here is:
(\d{4})-(\d{2})-(\d{2})
which matches the date format YYYY-MM-DD. Hope it helps.
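As a quick sanity check outside LUIS, the same pattern can be exercised with Python's re module (a minimal sketch; the sample utterance is made up):

```python
import re

# Same pattern as the Regex entity: captures year, month, day from YYYY-MM-DD.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def extract_dates(utterance):
    """Return all (year, month, day) tuples found in the utterance."""
    return DATE_PATTERN.findall(utterance)

matches = extract_dates("report from 2019-07-01 to 2019-08-18")
# Each matched date is split into its three captured groups.
print(matches)  # [('2019', '07', '01'), ('2019', '08', '18')]
```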
Using the datetimeV2 prebuilt entity, 'Yesterday' will resolve to a single date value, something like:
"entities": [
{
"entity": "yesterday",
"type": "builtin.datetimeV2.date",
"startIndex": 14,
"endIndex": 22,
"resolution": {
"values": [
{
"timex": "2019-09-16",
"type": "date",
"value": "2019-09-16"
}
]
}
}
]
For this scenario (wanting a range for a single date), my suggestion is to take the returned resolution and use your favorite/appropriate library to calculate the range, e.g. expand the value 2019-09-16 into a date-time range covering the whole day.
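For example, expanding the single resolved date into a whole-day range could be sketched like this, using only the standard library (the input string mirrors the LUIS resolution above):

```python
from datetime import datetime, timedelta

def day_range(date_value):
    """Expand a resolved 'date' value (YYYY-MM-DD) into a start/end pair
    covering the whole day, matching the ranges in the question."""
    start = datetime.strptime(date_value, "%Y-%m-%d")
    end = start + timedelta(days=1) - timedelta(seconds=1)
    return start, end

start, end = day_range("2019-09-16")
print(start.isoformat())  # 2019-09-16T00:00:00
print(end.isoformat())    # 2019-09-16T23:59:59
```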
But using the utterance 'between 1st july and the 18th august', it should resolve to a daterange:
"entities": [
{
"entity": "between 1st july and the 18th august",
"type": "builtin.datetimeV2.daterange",
"startIndex": 14,
"endIndex": 49,
"resolution": {
"values": [
{
"timex": "(XXXX-07-01,XXXX-08-18,P48D)",
"type": "daterange",
"start": "2019-07-01",
"end": "2019-08-18"
},
{
"timex": "(XXXX-07-01,XXXX-08-18,P48D)",
"type": "daterange",
"start": "2020-07-01",
"end": "2020-08-18"
}
]
}
}
]
datetimeV2 uses the Recognizers-Text library (and Timex for datetime) and you can find more info here. You can find a Node sample on how to work with Timex here.
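Because the utterance has no explicit year, the resolution above contains two candidate ranges (one per plausible year). One common heuristic, sketched below, is to prefer the candidate whose start is not in the future relative to a reference date (the values list mirrors the LUIS output above):

```python
from datetime import date

def pick_range(values, today):
    """From datetimeV2 daterange candidates, prefer the one whose start
    is not after the reference date; fall back to the first candidate."""
    for v in values:
        if date.fromisoformat(v["start"]) <= today:
            return v
    return values[0]

values = [
    {"timex": "(XXXX-07-01,XXXX-08-18,P48D)", "type": "daterange",
     "start": "2019-07-01", "end": "2019-08-18"},
    {"timex": "(XXXX-07-01,XXXX-08-18,P48D)", "type": "daterange",
     "start": "2020-07-01", "end": "2020-08-18"},
]

chosen = pick_range(values, today=date(2019, 9, 16))
print(chosen["start"], chosen["end"])  # 2019-07-01 2019-08-18
```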
I am not sure if you have access to GPT-3, particularly DaVinci (the complete-a-sentence tool). You can find the API and info here
I've been trying this tool for the past hour, and every time I hit their API with the same prompt (indeed, the exact same input), I receive a different response.
Do you happen to encounter the same situation?
If this is expected, do you happen to know the reason behind it?
Here are some examples
Request body (I used the same example they provide)
{
"prompt": "Once upon a time",
"max_tokens": 3,
"temperature": 1,
"top_p": 1,
"n": 1,
"stream": false,
"logprobs": null,
"stop": "\n"
}
Output 1
"choices": [
{
"text": ", this column",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
]
Output 2
"choices": [
{
"text": ", winter break",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
]
Output 3
"choices": [
{
"text": ", the traditional",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
]
I just talked to OpenAI, and they said that their responses are not deterministic. They are probabilistic so that the model can be creative. To make them deterministic, or at least to reduce the randomness, they suggest adjusting the temperature parameter. By default it is 1 (i.e. 100% risk-taking). To make the output completely deterministic, set it to 0.
Another parameter, top_p (default 1), can also be used to control determinism. However, they don't recommend tweaking both temperature and top_p; adjusting just one of them will do the job.
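As an illustration, a request body tuned for (near-)deterministic completions only needs temperature dropped to 0; the other fields below simply mirror the example payload above. This sketch just builds the JSON body and does not actually call the API:

```python
import json

def build_completion_body(prompt, max_tokens=3, temperature=0):
    """Build a completions request body; temperature=0 asks the model to
    always pick the most likely next token (argmax sampling)."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,  # 0 = deterministic, 1 = creative
        "n": 1,
        "stream": False,
        "logprobs": None,
        "stop": "\n",
    }

body = build_completion_body("Once upon a time")
print(json.dumps(body))  # this JSON would be POSTed to the completions endpoint
```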
OpenAI documentation:
https://beta.openai.com/docs/api-reference/completions/create
temperature number Optional Defaults to 1
What sampling temperature to use. Higher values means the model will
take more risks. Try 0.9 for more creative applications, and 0 (argmax
sampling) for ones with a well-defined answer.
We generally recommend altering this or top_p but not both.
top_p number Optional Defaults to 1
An alternative to sampling with temperature, called nucleus sampling,
where the model considers the results of the tokens with top_p
probability mass. So 0.1 means only the tokens comprising the top 10%
probability mass are considered.
We generally recommend altering this or temperature but not both.
Background: I have a CSV file with a column that contains a list of tags for each row. The tag list is not in any specific order and varies from cell to cell in the tags column. I am looking for the value, within a row, that matches the string "Owner". When the CSV file is pulled in, each cell in this column arrives as a single string. An example cell in this column looks like the following:
"Organization": "Microsoft", "Owner": "Eric Holmes", "DateCreated": "07/09/2021"
Goal: I would like to find a way in Azure Data Flows or Azure Data Factory to make a new column with a value for a specific key in a list.
Example:
Current Column
Tags
"Department": "Business", "Owner": "Karen Singh", "DateCreated": "09/20/2019"
"Owner": "Henry Francis", "AppName": "physics-engine", "Department": "GeospatialServices"
"Department": "Fashion", "DateCreated": "01/10/2015", "Owner": "Xiuxiang Long"
Desired Column
Owner
"Karen Singh"
"Henry Francis"
"Xiuxiang Long"
Work So Far: I have taken each string in the tags column and split it into an array at the commas (,). Then I have split each string at each index at the colons (:). This makes the values look like:
Tags
[["Department", "Business"], ["Owner", "Karen Singh"], ["DateCreated", "09/20/2019"]]
[["Owner", "Henry Francis"], ["AppName", "physics-engine"], ["Department", "GeospatialServices"]]
[["Department", "Fashion"], ["DateCreated", "01/10/2015"], ["Owner", "Xiuxiang Long"]]
To split the strings, I've used this open expression
mapIndex(split(replace(Tags, '"', ''), ','), split(#item, ':'))
Problems
I am new to Open Expressions and Azure Data Factory and Data Flows. Does anyone know how I would:
Search for the desired tag like "Owner"
And return the value associated to it
Sorry, I know this question sounds very simple, but using only open expression functions makes this more convoluted than necessary. Additionally, if there is a better way to go about this problem, I'd appreciate any input! I've been banging my head against the wall, and any leads help. Thank you!
I have tried to repro this and could achieve it using a Derived Column transformation with split().
Use the Derived Column transformation with the expression below (array indexes in Data Flow expressions are 1-based, hence the [2]):
split(split(tags,'"Owner":')[2],'"')[2]
Data Preview:
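Outside Data Flows, the same split-at-commas-then-colons extraction can be sketched in plain Python, which may help validate the expression's behavior (the sample row is taken from the question; this simple approach assumes values themselves contain no commas or colons):

```python
def tag_value(tags, key):
    """Split the tag string at commas, then at colons, and look up the
    value for the requested key -- mirroring the split() approach above."""
    pairs = [part.split(":") for part in tags.replace('"', "").split(",")]
    for pair in pairs:
        if pair[0].strip() == key:
            return pair[1].strip()
    return None  # key not present in this row

row = '"Department": "Business", "Owner": "Karen Singh", "DateCreated": "09/20/2019"'
print(tag_value(row, "Owner"))  # Karen Singh
```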
Given this JSON hash:
{
"id": 55555,
"name": "111111",
"custom_data_field": [
{
"id": 1,
"label": "Vehicle Type",
"value": "Coach"
},
{
"id": 2,
"label": "Vendor",
"value": 1
}
]
}
I need the value associated with each label.
I'm able to get the value using the array's index:
#item()?['custom_data_field'][0]['value'] # Coach
#item()?['custom_data_field'][1]['value'] # 1
But this is a bit fragile.
This syntax doesn't work:
#item()?['custom_data_field'][#label=='Vehicle Type']['value'] # Coach
#item()?['custom_data_field'][#label=='Vendor']['value'] # 1
Is there a way to do this reliably?
From the description of your question, it seems the data {"id": 55555, "name": "111111", ...} you provided is one item of an array, because your expression begins with item() (I guess you use this expression inside a "For each" or similar loop action). custom_data_field is an array under that item, and you want to filter/select it to get the value whose label equals Vehicle Type using a single expression. I don't think this can be done in just one expression: because label and value are not a key-value map, we cannot easily filter/select them in an expression.
To meet the requirement, we need a more sophisticated approach, such as the "Filter array" action mentioned by Scott in the comments. We need to set the array custom_data_field as the input (the "From" box) of the "Filter array" action.
And then add the filter condition.
After running the logic app, it will filter the items by the filter condition.
Since the filter action doesn't know how many items will match the condition, the output will always be an array rather than a single item or record, even if only one item in your custom_data_field list matches the condition (label equals "Vehicle Type").
So if you want to get the value, you need to get it by writing an expression as below screenshot.
Hope it helps~
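The effect of the "Filter array" action can be sketched in Python; note that the result is a list even when only one item matches, just like the action's output (the sample data is the custom_data_field array from the question):

```python
def filter_by_label(custom_data_field, label):
    """Keep only the items whose 'label' matches -- like 'Filter array'.
    The result is always a list, even for a single match."""
    return [item for item in custom_data_field if item["label"] == label]

custom_data_field = [
    {"id": 1, "label": "Vehicle Type", "value": "Coach"},
    {"id": 2, "label": "Vendor", "value": 1},
]

matches = filter_by_label(custom_data_field, "Vehicle Type")
# A one-element list; the value still has to be pulled out of it.
print(matches[0]["value"])  # Coach
```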
How can I search for a document named "Hola-Mundo_Army.jpg" using the term Army* (always with the asterisk at the end, please)? The problem is that if I search the documents using Army*, the result is zero. I think the problem is the underscore before the word Army.
But if I search Mundo_Army* the result is one found, correctly.
docs?api-version=2016-09-01&search=Mundo_Army* <--- 1 result OK
docs?api-version=2016-09-01&search=Army* <--- 0 results and it should find 1 result like the previous search. I always need to use the asterisk at the end.
Thank you!
This is the blob information that I have to search and find:
{
"#search.score": 1,
"content": "{\"azure_cdn\":\"http:\\/\\/dev-dr-documents.azureedge.net\\/localhost-hugo-docs-not-indexed\\/Hola-Mundo_Army.jpg\"}\n",
"source": "dr",
"title": "Hola-Mundo_Army.jpg",
"file_name": "Hola-Mundo_Army.jpg",
"file_type": "Image",
"year_created": "2017",
"client": "LALALA",
"brand": "LELELE",
"description": "HUGO_DEV-TUCUMAN",
"categories": "Clothing and Accessories",
"media": "Online media",
"tags": null,
"channel": "Case Study",
"azuresearch_skipcontent": "1",
"id": "1683",
"metadata_storage_content_type": "application/octet-stream",
"metadata_storage_size": 109,
"metadata_storage_last_modified": "2017-04-26T18:30:35Z",
"metadata_storage_content_md5": "o2yZWelvS/EAukoOhCuuKg==",
"metadata_storage_name": "Hola-Mundo_Army.json",
"metadata_content_encoding": "ISO-8859-1",
"metadata_content_type": "text/plain; charset=ISO-8859-1",
"metadata_language": "en"
}
The best way to troubleshoot cases like this is by using the Analyze API. It will help you understand how your documents and query terms are processed by the search engine. In your case, assuming you are not setting the analyzer property on the field you are searching against, the text Hola-Mundo_Army.jpg is broken down by the default analyzer into the following two terms: hola, mundo_army.jpg. These are the terms that are in your index. That's why, when you are searching for the prefix mundo_army*, the term mundo_army.jpg is matched. Prefix army* doesn't match anything in your index.
You can learn more about the default behavior of the search engine and how to customize it from this article: How full text search works in Azure Search
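To see why army* finds nothing, it helps to approximate what the default analyzer does to the file name. The sketch below is only a rough regex stand-in for the tokenizer (the real analyzer follows Unicode word-break rules), but it reproduces the two terms described above:

```python
import re

def rough_tokenize(text):
    """Very rough stand-in for the default analyzer: lowercase, then keep
    runs of letters, digits, underscores, and periods as single terms.
    Hyphens break terms; underscores and periods do not."""
    return re.findall(r"[a-z0-9_.]+", text.lower())

terms = rough_tokenize("Hola-Mundo_Army.jpg")
print(terms)  # ['hola', 'mundo_army.jpg']
```

Since army only occurs in the middle of the indexed term mundo_army.jpg, the prefix query army* has nothing to match against.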
I've got an index of hundreds of book titles in elasticserch, with documents like:
{"_id": 123, "title": "The Diamond Age", ...}
And I've got a block of freeform text entered by a user. The block of text could contain a number of book titles throughout it, with varying capitalization.
I'd like to find all the book titles in the block of text, so I can link to the specific book pages.
Any idea how I can do this? I've been looking around for exact phrase matches in blocks of text, with no luck.
You need to index the title field as not_analyzed or with the keyword analyzer.
This tells Elasticsearch to perform no analysis on the field, which makes exact-match searches possible.
I would suggest keeping an analyzed version as well as a not_analyzed version, so that you can do both exact and analyzed searches. Your mapping would look like this; here I assume the type name is movies.
"mappings":{
"movies":{
"properties":{
"title":{
"type": "string",
"fields":{
"row":{
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
This will give you two fields: title, which contains the analyzed title, and title.row, which contains the exact value indexed with absolutely no processing.
title.row would match only if you entered an exact match of the indexed value.
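With that mapping in place, an exact lookup would target the title.row sub-field with a term query. A sketch of building that query body in Python (the field name title.row is the one from the mapping above):

```python
import json

def exact_title_query(title):
    """Build a term query against the not_analyzed sub-field, so the
    input must equal the stored title exactly, including case."""
    return {"query": {"term": {"title.row": title}}}

query = exact_title_query("The Diamond Age")
print(json.dumps(query))  # body to send to the _search endpoint
```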