GPT-3 davinci gives different results with the same prompt

I am not sure if you have access to GPT-3, particularly davinci (the sentence-completion engine). You can find the API and info here.
I've been trying this tool for the past hour, and every time I hit the API with the same prompt (the exact same input), I receive a different response.
Have you encountered the same situation?
If this is expected, do you happen to know the reason behind it?
Here are some examples.
Request body (I tried to use the same example they provide):
{
"prompt": "Once upon a time",
"max_tokens": 3,
"temperature": 1,
"top_p": 1,
"n": 1,
"stream": false,
"logprobs": null,
"stop": "\n"
}
Output 1
"choices": [
{
"text": ", this column",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
]
Output 2
"choices": [
{
"text": ", winter break",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
]
Output 3
"choices": [
{
"text": ", the traditional",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}
]

I just talked to OpenAI, and they said the responses are not deterministic: sampling is probabilistic so that the model can be creative. To make the output deterministic, or at least to reduce the randomness, they suggest adjusting the temperature parameter. By default it is 1 (i.e., taking maximum risks); to make the output completely deterministic, set it to 0.
Another parameter, top_p (default = 1), can also be used to control how deterministic the output is, but they don't recommend tweaking both temperature and top_p; adjusting one of them does the job.
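For example, the same request with temperature lowered to 0 should return the same completion on every call (a sketch; only temperature changes from the request above):
{
"prompt": "Once upon a time",
"max_tokens": 3,
"temperature": 0,
"top_p": 1,
"n": 1,
"stream": false,
"logprobs": null,
"stop": "\n"
}
With temperature 0 the model performs argmax sampling, per the documentation quoted below.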

OpenAI documentation:
https://beta.openai.com/docs/api-reference/completions/create
temperature (number, optional, defaults to 1)
What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer.
We generally recommend altering this or top_p but not both.
top_p (number, optional, defaults to 1)
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.


Get a value associated with a key in an array of JSON objects in a Logic App expression

Given this JSON hash:
{
"id": 55555,
"name": "111111",
"custom_data_field": [
{
"id": 1,
"label": "Vehicle Type",
"value": "Coach"
},
{
"id": 2,
"label": "Vendor",
"value": 1
}
]
}
I need the value associated with each label.
I'm able to get the value using the array's index:
#item()?['custom_data_field'][0]['value'] # Coach
#item()?['custom_data_field'][1]['value'] # 1
But this is a bit fragile.
This syntax doesn't work:
#item()?['custom_data_field'][#label=='Vehicle Type']['value'] # Coach
#item()?['custom_data_field'][#label=='Vendor']['value'] # 1
Is there a way to do this reliably?
According to the description of your question, it seems the data {"id": 55555, "name": "111111"....} you provided is an item of an array, because your expression begins with item() (I guess you use this expression in a "For each" or similar loop action). custom_data_field is an array under that item, and you want to filter/select it to get the value whose label equals "Vehicle Type" in a single expression. I don't think we can do it in just one expression, because label and value are not a key-value map, so we cannot filter/select easily in an expression.
To meet the requirement, we need a more sophisticated approach, such as the "Filter array" action mentioned by Scott in the comments. Set the array custom_data_field as the input ("From" box) of the "Filter array" action, and then add the filter condition.
When the logic app runs, it will filter the items by that condition.
Since the filter action doesn't know in advance how many items will match, the output is always an array rather than a single item or record, even if only one item matches the condition (label equals "Vehicle Type") in your custom_data_field list.
So if you want the value itself, you need to extract it with an expression like the one below.
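A minimal sketch (assuming the action is named Filter array, so its output is referenced as body('Filter_array')):
first(body('Filter_array'))?['value']
Because the output is always an array, first() takes the single matching record before reading its value.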
Hope it helps~

Generating synonyms or similar words using BERT word embeddings

I want to generate synonyms or similar words using BERT word embeddings.
I started to do this using BERT.
For later software integration, it has to be done in Java, so I went with easy-bert
(https://github.com/robrua/easy-bert).
It appears I can get word embeddings this way:
try(Bert bert = Bert.load(new File("com/robrua/nlp/easy-bert/bert-uncased-L-12-H-768-A-12"))) {
float[][] embedding = bert.embedTokens("A sequence");
float[][][] embeddings = bert.embedTokens("Multiple", "Sequences");
}
Do you know how I could get similar words from these word embeddings?
Thanks for your help!
I developed a way to do this using Luminoso. I work for them, so this is a bit of an ad, but it does exactly what you want.
https://www.luminoso.com/search
Luminoso is really good at understanding conversational text like product reviews, product descriptions, survey results, and trouble tickets. It doesn't require ANY kind of training or ontology building and will build a language model around your language. You feed the text for your pages into Luminoso, and it will generate a set of synonyms for the concepts used in your text.
As an example project, I built a search over Amazon.com beauty products. I'll just copy a few of the automatically generated synonym sets. There were 17,851 synonyms generated from this dataset.
scent, rose-like, not sickeningly, not nauseating, not overwhelming, herb-y, no sweetness, cucumber-y, not too citrus-y, no gardenia, not lemony, pachouli, vanilla-like, fragarance, not spicy, flowerly, musk, perfume-like, floraly, not cloyingly => scent
recommend, recommende, advice, suggestion, highly recommend, suggest, recommeded, recommendation, recommend this product, reccommended, advise, suggest, indicated, suggestion, advice, agree, recommend, say, considering, mentioned => recommend
bottle, no sprayer, 8-oz, beaker, decanter, push-down, dispenser, pipet, pint, not the bottle, no dropper, keg, gallon, jug, pump-top, liter, half-full, decant, tumbler, vial => bottle
eczema, non-steroidal, ulcerative, dematitis, ecsema, Elidel, dermititis, inflammation, pityriasis, hydrocortizone, dyshidrotic, chickenpox, Stelatopia, perioral, rosacea, dry skin, nummular, ecxema, mild-moderate, ezcema => eczema
There were 800k products in this search index, so the results were large, but this works on small datasets as well.
Besides the synonym format shown there, you can also place this directly into Elasticsearch and associate the synonyms for a specific page with that page.
Below is a sample of an Elasticsearch index enhanced with the same technology. It's dialed up super high, so too many concepts are added, but it shows how well the technology finds relationships between concepts.
{"index": {"_index": "amzbeauty", "_type": "_doc", "_id": "130414089X"}}
{"title": "New Benefit Waterproof Automatic Eyeliner Pen - Black - BAD Gal Liner", "text": "Length : 13.5 cm\nColor: Black\n100% Brand new and unused.\nSmudge free.\nFine-tip. Easy to blend and smooth to apply\nCan make fine and bold eyeline with new texture and furnishing.\nProvide rich and consistant colour\nLongwearing and waterproof\nFregrance Free", "primary_concepts": ["not overpoweringly", "concoction", "equipped", "fine-tip", "water-resistant", "luxuriant", "make", "fixture", "☆", "not lengthen", "washable", "not too heady", "blendable", "doesn't collect", "shade", "niche", "supple", "smudge-proof", "sumptuous", "movable", "black", "over-apply", "quick", "silky", "colored", "sweatproof", "opacity", "accomodate", "fuchsia", "furnishes", "meld", "sturdily", "smear", "inch", "mid-back", "chin-length", "smudge", "alredy", "not cheaply", "long-wearing", "eyeline", "texture", "steady", "no-name", "audacious", "easy", "edgy", "is:A", "marketers", "greys", "decadent", "applicable", "Crease-free", "magenta", "free", "itIn", "stay-true", "racy", "application", "glides", "smooth", "sleek", "taupe", "grainy", "dark", "wealthy", "JP7506CF", "gray", "grayish", "width", "newness", "purfumes", "Lancme", "blackish", "easily", "doesn't smudge", "maroon", "blend", "convenient", "smoother", "Moschino", "long-wear", "mauve", "medium-length", "no raccoon", "revamp", "demure", "richly", "white", "brand", "offers", "lenght", "soft", "doesn't smear", "provide", "provides", "unusable", "eye-liner", "unopened", "straightforward", "silky-smooth", "uniting", "compactness", "bold", "fearless", "mix", "indulgent", "brash", "serviceable", "unmarked", "not musky", "constructed", "racoon", "smoothly", "sealant", "merged", "boldness", "reuse", "unused", "long", "Kors", "effortless", "luscious", "stain", "rich", "discard", "richness", "opulent", "short", "consistency", "fine", "sents", "newfound", "fade-resistant", "mixture", "hue", "sassy", "apply", "fragnance", "heathy", "adventurous", "not enthusiastic", "longwearing", "fregrance", "non-waterproof", "empty", "lashline", "simple", "newly", "you'r", "combined", "no musk", "mingle", "waterproof", "painless", "pinkish", "thickness", "clump-free", "gos", "consistant", "color", "smoothness", "name-brand", "new", "smudgeproof", "yaaay", "water-proof", "eyemakeup", "not instant", "spidery", "furnish", "tint", "product", "reapply", "not black", "no globs", "imitators", "blot", "cinch", "uncomplicated", "untouched", "length"], "related_concepts": ["eyeliner", "no goofs", "doesn't smear", "pen", "hundreds"]}
{"index": {"_index": "amzbeauty", "_type": "_doc", "_id": "130414643X"}}
{"title": "Goodskin Labs Eyliplex-2 Eye Life and Circle Reducer - 10ml", "text": "Eyliplex-2 is a dual solution that focuses on the problematic eye area. This breakthrough, 24-hour system from the scientists at good skin pharmacy visibly tightens eye areas while reducing dark circles. 0.34 oz. each. 64% of subjects reported younger looking eyes immediately and a 20% reduction in the appearance of dark circles in clinical studies.", "primary_concepts": ["coloration", "Laboratories", "oncology", "cornea", "undereye", "eye", "immediately", "☆", "teen", "dry-skin", "good", "eyelids", "puffiness", "behold", "research", "temperamental", "dermatological", "breakthrough", "study", "store", "nice", "lasik", "instantaneously", "teenaged", "multi", "rheostat", "dermatology", "chemist", "invisibly", "PhD", "pharmacy", "alredy", "not cheaply", "optional", "pharmacist", "Obagi-C", "topic", "supermarket", "reversible", "studies", "Younger", "medically", "report", "thermo", "tightness", "dual", "eliminate", "researcher", "Minimization", "cutaneous", "hydration", "O2", "taupe", "increase", "moisturization", "dark", "preliminary", "excellent", "Quad", "well", "appearance", "dusky", "quickly", "instantly", "CVS", "Dermal", "great", "revolutionary", "biologist", "epidermis", "blackish", "disclosed", "problem", "youngsters", "murky", "scientific", "teenager", "oz", "dark circles", "clinically", "emphasis", "absorption", "skin", "loosen", "intractable", "technological", "reduction", "clinician", "nutritional", "forthwith", "grocer", "scientifically", "swiftly", "examination", "state-of-the-art", "not acne prone", "zone", "decrease", "younger-looking", "excellently", "troublesome", "system", "radius", "tighten", "FDA", "decent", "noticeably", "WD-40", "clearer", "scientist", "saggy", "significantly", "improvement", "Teamine", "interchangeable", "visible", "visable", "no fine line", "shortly", "minimize", "survey", "problematic", "young", "glance", "racoon", "vicinity", "youthful", "exacerbated", "focal", "region", "groundbreaking", "reddish", "focus", "reduce", "increments", "nad", "fasten", "area", "soon", "complexion", "squinting", "look", "grocery", "eyliplex-2", "Eyliplex-2", "subsequently", "even-toned", "bothersome", "eyes", "mitigate", "markedly", "philosophy:you", "difficult", "darkish", "bluish", "satisfactory", "darken", "epidermal", "lessen", "appearence", "ocular", "ergonomically", "diminished", "progression", "purplish", "sun-damaged", "Cellex-C", "visibly", "diagnosis", "drugstore", "under-eye", "apothecary", ":-D", "terrific", "clinical", "oz.", "Endocrinology", "time-released", "Nouriva", "tight", "adolescent", "subject", "eyeballs", "sking", "Pro-Retinol", "aggravate", "younger", "shortcomings", "solution", "assess", "promptly", "teenage", "Kinetin", "24-hour", "Mart", "youth", "visibility", "scientists", "taut", "better", "eyesight", "no dark circles", "not reduce", "photoaging", "Pending"], "related_concepts": ["A22", "A82", "Amazon", "daytime", "HK", "nighttime", "smell", "dark circles", "purchased"]}
{"index": {"_index": "amzbeauty", "_type": "_doc", "_id": "1304146537"}}
Luminoso uses word embeddings from ConceptNet, which it also develops, and the technology goes above and beyond what ConceptNet alone gives you. I'm biased, but every time I've run data through it I'm amazed. It's not free, but it really works with absolutely zero pre-training of the data, and nothing is actually free.
A similar task to this (lexical substitution) is covered by the LS07 and LS14 benchmarks.
One researcher achieved the SOTA on those benchmarks using BERT.
You'd be interested in reading this paper:
https://www.aclweb.org/anthology/P19-1328.pdf
The author describes the approach as follows:
"applies dropout to the target word's embedding for partially masking the word, allowing BERT to take balanced consideration of the target word's semantics and contexts for proposing substitute candidates, and then validates the candidates based on their substitution's influence on the global contextualized representation of the sentence."
I don't know how to reproduce the same result because the implementation is not public. But here's the hint: embedding dropout can be applied to generate substitute candidates.
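Separately from that paper, a simple baseline is to rank candidate words by the cosine similarity of their easy-bert embeddings. A minimal Java sketch under stated assumptions: the model path is the one from the question, and the first row returned by embedTokens is treated as the word's vector (BERT's tokenizer may add special or subword tokens, so verify this for your setup):
import com.robrua.nlp.bert.Bert;
import java.io.File;

public class SimilarWordsSketch {
    // Cosine similarity between two embedding vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) throws Exception {
        try (Bert bert = Bert.load(new File("com/robrua/nlp/easy-bert/bert-uncased-L-12-H-768-A-12"))) {
            // Assumption: row 0 corresponds to the word's (first) token.
            float[] target = bert.embedTokens("scent")[0];
            String[] candidates = {"fragrance", "perfume", "bottle"};
            for (String candidate : candidates) {
                float[] vec = bert.embedTokens(candidate)[0];
                System.out.printf("%s -> %.4f%n", candidate, cosine(target, vec));
            }
        }
    }
}
Ranking a candidate vocabulary by this score gives a crude similar-word list; the dropout-based masking in the paper refines the same idea using sentence context.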

Managing Intervals and Ranges in LUIS App

I am creating a querying LUIS app that needs a semantic understanding of time/date ranges.
By semantic, I mean that I would like to be able to resolve the following examples:
Last week -> start: 2019-09-02T00:00:00+00:00; end: 2019-09-08T00:00:00+00:00
Yesterday -> start: 2019-09-14T00:00:00+00:00; end: 2019-09-14T23:59:59+00:00
1st July to the 18th August -> start: 2019-07-01T00:00:00+00:00; end: 2019-08-18T00:00:00+00:00
I have tried the built-in datetimeV2 entity; however, it doesn't appear to provide the range functionality, and custom entities don't seem to be able to resolve a single utterance (e.g. "yesterday") to two different values (i.e. start and end). The 3rd example, with two specific values, is obviously fairly straightforward to manage.
The only solution I can currently see is to have a "Range" entity, to which yesterday, last month, etc. would resolve, plus start and end types, and then manually resolve the values in code outside the Bot Framework. But this is a bit messy.
Are there any built-in types or features which cover this sort of functionality, or is there an alternative way to architect this?
NOTE:
From the Azure docs, it seems the preferred solution is to use the prebuilt datetimeV2 entity with a "start" and "end" role. However, I can't get the app to identify the range as two entities, i.e. it identifies "between 1st July and the 18th August" as one value.
You can use a Regex entity for this scenario: create a Regex entity with start and end roles, configure your intent to use it, and test.
The regex expression here is:
(\d{4})-(\d{2})-(\d{2})
which matches the date format YYYY-MM-DD. Hope it helps.
Using the datetimeV2 prebuilt entity, 'Yesterday' will resolve to a single date value, something like:
"entities": [
{
"entity": "yesterday",
"type": "builtin.datetimeV2.date",
"startIndex": 14,
"endIndex": 22,
"resolution": {
"values": [
{
"timex": "2019-09-16",
"type": "date",
"value": "2019-09-16"
}
]
}
}
]
For this scenario (wanting a range for a single date), my suggestion is to take the returned resolution and use your favorite/appropriate library to calculate the range, i.e. expand the value 2019-09-16 into a date-time range.
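A minimal sketch of that expansion in Java with java.time (the start/end shape follows the examples in the question):
import java.time.LocalDate;
import java.time.LocalDateTime;

public class DateToRange {
    public static void main(String[] args) {
        // Expand the resolved LUIS date value into a start/end range.
        LocalDate date = LocalDate.parse("2019-09-16");
        LocalDateTime start = date.atStartOfDay();   // 2019-09-16T00:00
        LocalDateTime end = date.atTime(23, 59, 59); // 2019-09-16T23:59:59
        System.out.println("start: " + start + "; end: " + end);
    }
}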
But using the utterance 'between 1st july and the 18th august', it should resolve to a daterange:
"entities": [
{
"entity": "between 1st july and the 18th august",
"type": "builtin.datetimeV2.daterange",
"startIndex": 14,
"endIndex": 49,
"resolution": {
"values": [
{
"timex": "(XXXX-07-01,XXXX-08-18,P48D)",
"type": "daterange",
"start": "2019-07-01",
"end": "2019-08-18"
},
{
"timex": "(XXXX-07-01,XXXX-08-18,P48D)",
"type": "daterange",
"start": "2020-07-01",
"end": "2020-08-18"
}
]
}
}
]
datetimeV2 uses the Recognizers-Text library (and Timex for datetimes); you can find more info here. You can find a Node sample of how to work with Timex here.

How do I write a service to correct spellings in my entities using API.ai Dialogflow?

I searched [api.ai] and [dialogflow] tags thoroughly before asking this question.
I query an API that gives me a JSON array every 20 seconds. The snippet below shows one of the objects from the array:
{
"id": "pivx",
"name": "PIVX",
"symbol": "PIVX",
"rank": "46",
"price_usd": "8.65711",
"price_btc": "0.00052161",
"24h_volume_usd": "7948150.0",
"market_cap_usd": "477700707.0",
"available_supply": "55180159.0",
"total_supply": "55180159.0",
"max_supply": null,
"percent_change_1h": "0.07",
"percent_change_24h": "21.92",
"percent_change_7d": "69.6",
"last_updated": "1513821853",
"price_eur": "7.2916846395",
"24h_volume_eur": "6694543.93755",
"market_cap_eur": "402356318.0"
}
I have a bot where a person often types something like "PIVY to USD". How do I correct "PIVY" to "PIVX"? I had a few approaches in mind:
I tokenize "PIVY to USD", giving me "PIVY", "to" and "USD". I eliminate stop words and am left with "PIVY" and "USD". I then compare each word with all the symbols in the array to get the candidates with the lowest Levenshtein distance. Does this approach make sense?
If I run "PIVY to USD" through API.ai, I only get USD, since PIVY is a misspelling of the entity PIVX.
I also have other intents, so if a person types "How are you" I don't want to tokenize it and search each word against all the symbols in my array.
How do I correct spelling mistakes for a particular intent? One approach is to have two intents: 1) one that detects the presence of currencies and corrects spelling mistakes, and 2) one that actually converts them. I am using the Bot Framework.
Kindly give your suggestions. Thank you for taking the time to read this long question.
There are several approaches you can follow for your query:
1) An intent-based solution, if your use case is only converting PIVX to USD.
2) If you are handling more than one conversion, create a dictionary of the known symbols and check tokens against it as key-value pairs to avoid errors (not on every utterance; only when you detect the conversion intent and a token like "PIVY" in the phrase).
I hope this helps.
Do let me know in case you require more help.
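As for the Levenshtein approach proposed in the question, a minimal Java sketch of matching a token against the known symbols (the symbol list and the cutoff of 2 edits are illustrative assumptions):
import java.util.Arrays;
import java.util.List;

public class SymbolCorrector {
    // Classic dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        List<String> symbols = Arrays.asList("PIVX", "BTC", "ETH", "USD");
        String token = "PIVY";
        // Pick the closest known symbol within 2 edits, else keep the token.
        String best = symbols.stream()
                .min((x, y) -> Integer.compare(levenshtein(token, x), levenshtein(token, y)))
                .filter(s -> levenshtein(token, s) <= 2)
                .orElse(token);
        System.out.println(token + " -> " + best); // PIVY -> PIVX
    }
}
With the sample data, "PIVY" is one edit from "PIVX", so it corrects as intended; per the answer above, you would run this only after the conversion intent has been detected, not on every utterance.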

Searching for terms with underscore doesn't return expected results

How can I find a document named "Hola-Mundo_Army.jpg" by searching for the word Army* (always with the asterisk at the end, please)? The thing is that if I search for Army*, the result is zero. I think the problem is the underscore before the word Army.
But if I search for Mundo_Army*, one result is found, correctly.
docs?api-version=2016-09-01&search=Mundo_Army* <--- 1 result, OK
docs?api-version=2016-09-01&search=Army* <--- 0 results, but it should find 1 result like the previous search. I always need to use the asterisk at the end.
Thank you!
This is the blob information that I have to search and find:
{
"#search.score": 1,
"content": "{\"azure_cdn\":\"http:\\/\\/dev-dr-documents.azureedge.net\\/localhost-hugo-docs-not-indexed\\/Hola-Mundo_Army.jpg\"}\n",
"source": "dr",
"title": "Hola-Mundo_Army.jpg",
"file_name": "Hola-Mundo_Army.jpg",
"file_type": "Image",
"year_created": "2017",
"client": "LALALA",
"brand": "LELELE",
"description": "HUGO_DEV-TUCUMAN",
"categories": "Clothing and Accessories",
"media": "Online media",
"tags": null,
"channel": "Case Study",
"azuresearch_skipcontent": "1",
"id": "1683",
"metadata_storage_content_type": "application/octet-stream",
"metadata_storage_size": 109,
"metadata_storage_last_modified": "2017-04-26T18:30:35Z",
"metadata_storage_content_md5": "o2yZWelvS/EAukoOhCuuKg==",
"metadata_storage_name": "Hola-Mundo_Army.json",
"metadata_content_encoding": "ISO-8859-1",
"metadata_content_type": "text/plain; charset=ISO-8859-1",
"metadata_language": "en"
}
The best way to troubleshoot cases like this is by using the Analyze API. It will help you understand how your documents and query terms are processed by the search engine. In your case, assuming you are not setting the analyzer property on the field you are searching against, the text Hola-Mundo_Army.jpg is broken down by the default analyzer into the following two terms: hola, mundo_army.jpg. These are the terms that are in your index. That's why, when you are searching for the prefix mundo_army*, the term mundo_army.jpg is matched. Prefix army* doesn't match anything in your index.
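For example, a request along these lines (service and index names are placeholders; the same api-version as your queries is used for illustration) shows the tokens the default analyzer produces:
POST https://[service].search.windows.net/indexes/[index]/analyze?api-version=2016-09-01
{
  "text": "Hola-Mundo_Army.jpg",
  "analyzer": "standard.lucene"
}
The response should list the tokens hola and mundo_army.jpg, which is why the prefix army* matches nothing.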
You can learn more about the default behavior of the search engine and how to customize it from this article: How full text search works in Azure Search
