How to automatically generate test cases using NLP - nlp

Requirement-to-test-case generation.
The input is a requirement sentence and the output is a set of test cases.
Each test case should look like this:
Test case (what it will do) + test sequence steps + expected result.
I have read the input sentence.
I used a rule-based approach to split the sentence, but it does not produce the expected output.
What other approaches can I try?
I also tried simplifying compound or complex sentences into simple ones.
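For reference, here is a minimal sketch of what a rule-based split into the test case layout above could look like. It assumes requirements follow a "When/If <condition>, <outcome>." pattern; that pattern, the function name `requirement_to_testcase`, and the dictionary keys are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical pattern: "<When|If> <condition>, <expected behaviour>."
CONDITION_PATTERN = re.compile(
    r"^(?:when|if)\s+(?P<condition>.+?),\s*(?P<outcome>.+?)\.?$",
    re.IGNORECASE,
)


def requirement_to_testcase(sentence):
    """Map a conditional requirement sentence to a test case dict, or None if it does not match."""
    match = CONDITION_PATTERN.match(sentence.strip())
    if match is None:
        return None
    # A compound condition joined by "and" becomes several test steps.
    steps = [s.strip() for s in re.split(r"\s+and\s+", match.group("condition"))]
    outcome = match.group("outcome").strip()
    return {
        "testcase": f"Verify that {outcome}",
        "steps": steps,
        "expected": outcome,
    }


example = requirement_to_testcase(
    "When the user enters a valid password and clicks login, the dashboard is displayed."
)
for key, value in example.items():
    print(key, "->", value)
```

Sentences that do not match the pattern return None, which makes it easy to see how many requirements the rules actually cover before trying a parser-based or learned approach.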

Related

Does the pattern in sentence edits affect the performance of a sentence-correction seq2seq model?

I am trying to train a seq2seq model using the T5 transformer for a sentence correction task. I am using a StackOverflow dataset for training and evaluation. The dataset contains original and edited sentences extracted from StackOverflow posts.
Below are some samples:
Original: is it possible to print all reudctions in Haskell - using WinHugs
Edited: Is it possible to print all reductions in Haskell - using WinHugs
Original: How do I pass a String into a fucntion in an NVelocty Template
Edited: How do I pass a String into a function in an NVelocity Template
Original: Caconical term for something that can only occur once
Edited: Canonical term for something that can only occur once
When trained on samples that have high similarity (measured using the longest common sequence) and that were edited for spelling corrections, verb changes, and preposition changes, the model predicts good recommendations. But when I use samples that do not have high similarity, the model's predictions are not very accurate. Below are some samples:
Original: For what do API providers use API keys, such as the UPS API Key
Edited: Why do some API providers require an API key
Original: NET - Programmatic Cell Edit
Edited: NET - working with GridView Programmatically
Original: How to use http api (pseudo REST) in C#
Edited: How to fire a GET request over a pseudo REST service in C#
I am using simpletransformers to train a T5 model based on t5-base.
Can anyone confirm whether it is a limitation of seq2seq models that they cannot learn much when the input and target sequences do not share a common pattern?
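To make the "high similarity" versus "low similarity" split concrete, here is a small sketch of a similarity score. It assumes the "longest common sequence" mentioned above means the longest common subsequence over lowercased, whitespace-split words; the function names and the normalization by the longer sentence are my own choices, not something stated in the question.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(original, edited):
    """LCS length divided by the length of the longer sentence, in [0, 1]."""
    a, b = original.lower().split(), edited.lower().split()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


high = lcs_similarity(
    "Caconical term for something that can only occur once",
    "Canonical term for something that can only occur once",
)
low = lcs_similarity(
    "NET - Programmatic Cell Edit",
    "NET - working with GridView Programmatically",
)
print(round(high, 3), round(low, 3))
```

A score like this could be used to bucket the training pairs and evaluate the model separately on each bucket, which would show whether accuracy really degrades as similarity drops.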

NLP for entity comparison in a sentence

I am looking for something that can identify which entity in a sentence is better in a comparison, or the elements of the comparison.
For example, given the sentence "A is better than B", I should be able to identify which element is better using NLP.
The dataset I have consists of:
sentence, entity1, entity2, the element of comparison
You can try two approaches:
1- Rule-based (with regex you can specify exactly which pattern is desired and extract what you want)
2- Machine learning (you give the training data and their labels to the computer, and the computer learns the rules)
For more information, you can use these references:
https://www.coursera.org/learn/machine-learning
https://www.w3schools.com/python/python_regex.asp
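As an illustration of the rule-based option, here is a minimal regex sketch for sentences of the form "A is better than B". The pattern only covers single-word entities and a single-word comparative; the pattern itself and the function name `extract_comparison` are assumptions made for this example.

```python
import re

# Hypothetical pattern: "<entity1> is/are/was/were <comparative> than <entity2>"
COMPARISON = re.compile(
    r"(?P<entity1>\w+)\s+(?:is|are|was|were)\s+(?P<comparative>\w+)\s+than\s+(?P<entity2>\w+)",
    re.IGNORECASE,
)


def extract_comparison(sentence):
    """Return (entity1, comparative, entity2), or None if the pattern does not match."""
    match = COMPARISON.search(sentence)
    if match is None:
        return None
    return match.group("entity1"), match.group("comparative"), match.group("entity2")


result = extract_comparison("A is better than B")
print(result)
```

Sentences that fall outside the pattern return None; those are the cases where the machine learning option, trained on the (sentence, entity1, entity2, element of comparison) rows, would be needed.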

Text classification using BERT - how to handle misspelled words

I am not sure if this is the best place to submit this kind of question; perhaps Cross Validated would be a better place.
I am working on a text multiclass classification problem.
I built a model based on BERT, implemented in PyTorch (Hugging Face transformers library). The model performs pretty well, except when the input sentence contains an OCR error or, equivalently, a misspelled word.
For instance, if the input is "NALIBU DRINK", the BERT tokenizer generates ['na', '##lib', '##u', 'drink'] and the model's prediction is completely wrong. On the other hand, if I correct the first character, so that my input is "MALIBU DRINK", the BERT tokenizer generates two tokens ['malibu', 'drink'] and the model makes a correct prediction with very high confidence.
Is there any way to enhance the BERT tokenizer so that it can handle misspelled words?
You can leverage BERT's power to rectify the misspelled word.
The article linked below beautifully explains the process with code snippets
https://web.archive.org/web/20220507023114/https://www.statestitle.com/resource/using-nlp-bert-to-improve-ocr-accuracy/
To summarize, you can identify misspelled words via a SpellChecker function and get replacement suggestions. Then, find the most appropriate replacement using BERT.
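The first half of that summary (spot an unknown word and propose a replacement) can be sketched with the standard library alone. Note that this uses `difflib.get_close_matches` in place of the SpellChecker function from the article, and it skips the BERT ranking step entirely. The `KNOWN_WORDS` vocabulary and the 0.8 cutoff are assumptions for this example.

```python
import difflib

# Hypothetical domain vocabulary; in practice this would come from the training data.
KNOWN_WORDS = ["malibu", "drink", "vodka", "juice"]


def correct_tokens(text, vocabulary=KNOWN_WORDS, cutoff=0.8):
    """Replace out-of-vocabulary words with their closest vocabulary entry, if any."""
    corrected = []
    for word in text.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Closest match by difflib's similarity ratio; keep the word if nothing is close enough.
        candidates = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(candidates[0] if candidates else word)
    return " ".join(corrected)


fixed = correct_tokens("NALIBU DRINK")
print(fixed)
```

The corrected string would then be passed to the tokenizer instead of the raw OCR output. When several candidates are plausible, that is where a masked language model could pick the one that best fits the context.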

Training and evaluating spaCy model by sentences or paragraphs

Observation:
Paragraph: I love apple. I eat one banana a day
Sentence: I love apple., I eat one banana a day
There are two sentences in this paragraph: "I love apple" and "I eat one banana a day". If I pass the whole paragraph to spaCy, it recognizes only one entity (for example, apple), but if I pass the sentences in the paragraph one by one, spaCy recognizes two entities, apple and banana. (This is just an example to show my point; the actual recognition result could be different.)
Situation:
After training a model myself, I want to evaluate its recognition accuracy. There are two ways to pass the text to the spaCy model:
1. Split the paragraph into sentences and pass the sentences one by one:
    # sentences: list of sentence strings obtained by splitting the paragraph
    for sentence in sentences:
        doc = nlp(sentence)
        # retrieve the parsing result
2. Pass the paragraph at once:
    doc = nlp(paragraph)
    # retrieve the parsing result
Question:
I'm wondering which way is better for testing the performance of the model, since I'm fairly sure that passing text sentence by sentence always recognizes more entities than passing it paragraph by paragraph.
If the second way is better, do I also need to change how I train the model? Currently, I train the spaCy model sentence by sentence rather than paragraph by paragraph.
The goal of my project:
Given a document, recognize all the entities I'm interested in within that document.
Thanks!
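To compare the two ways on equal footing, one option is to map the sentence-level predictions back to paragraph character offsets and score both against the same gold annotations. Here is a minimal sketch in plain Python. The predicted spans below are hand-written to mirror the apple/banana example, not real spaCy output, and the "FRUIT" label and function names are hypothetical.

```python
def shift_to_paragraph(sentence_entities, sentence_starts):
    """Convert per-sentence (start, end, label) spans to paragraph character offsets."""
    shifted = set()
    for entities, offset in zip(sentence_entities, sentence_starts):
        for start, end, label in entities:
            shifted.add((start + offset, end + offset, label))
    return shifted


def precision_recall_f1(predicted, gold):
    """Exact-match entity scores over sets of (start, end, label) spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


paragraph = "I love apple. I eat one banana a day"
gold = {(7, 12, "FRUIT"), (24, 30, "FRUIT")}

# Hypothetical predictions mirroring the example in the question.
by_paragraph = {(7, 12, "FRUIT")}
by_sentence = shift_to_paragraph(
    [[(7, 12, "FRUIT")], [(10, 16, "FRUIT")]],  # spans relative to each sentence
    [0, 14],  # character offset of each sentence within the paragraph
)

print(precision_recall_f1(by_paragraph, gold))
print(precision_recall_f1(by_sentence, gold))
```

With both modes scored against one gold set, the numbers are directly comparable, and the evaluation can use whichever mode matches how documents will be passed to the model in production.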

Iterate a spark pipeline

Currently I'm working on a sentiment analysis project using Spark. I'm trying to implement a pipeline like this:
Raw text---(Tokenized)-->Tokenized Words---(join with Sentiment Dictionary)--->Words with Sentiment value---(distribute words to sentence again)--->Sentence with Sentiment value---(average sentiment value of words from sentence it appeared in)--->new Sentiment Dictionary
Now I want to repeat this process until the difference between the new Sentiment Dictionaries from two consecutive iterations is below a defined value. However, I'm not sure how to do this. I wrote custom transformers for this pipeline (since most of my transformers are not available in the ml library). For the iteration step, I'm not sure what the best approach is. Should I just put a while loop there and repeat everything, or is there a better mechanism?
Thank you for your time.
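For illustration, here is the loop-until-converged idea sketched in plain Python with ordinary dictionaries instead of Spark DataFrames. The update rule is a simplified reading of the pipeline above, and the function names, tolerance, and iteration cap are all assumptions. In Spark, the body of `update_dictionary` would be the custom transformers, with the loop running on the driver.

```python
def update_dictionary(sentences, dictionary):
    """One pass: score sentences from word scores, then re-score words from sentences."""
    sentence_scores = []
    for words in sentences:
        known = [dictionary[w] for w in words if w in dictionary]
        sentence_scores.append(sum(known) / len(known) if known else 0.0)
    totals, counts = {}, {}
    for words, score in zip(sentences, sentence_scores):
        for w in set(words):
            totals[w] = totals.get(w, 0.0) + score
            counts[w] = counts.get(w, 0) + 1
    # Each word's new score is the average score of the sentences it appeared in.
    return {w: totals[w] / counts[w] for w in totals}


def iterate_until_converged(sentences, dictionary, tolerance=1e-3, max_iterations=50):
    """Repeat the update until the largest per-word change drops below tolerance."""
    for iteration in range(1, max_iterations + 1):
        new_dictionary = update_dictionary(sentences, dictionary)
        keys = set(dictionary) | set(new_dictionary)
        change = max(abs(new_dictionary.get(k, 0.0) - dictionary.get(k, 0.0)) for k in keys)
        dictionary = new_dictionary
        if change < tolerance:
            break
    return dictionary, iteration


sentences = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
seed = {"good": 1.0, "bad": -1.0}
final, iterations = iterate_until_converged(sentences, seed)
print(iterations, final)
```

The iteration cap guards against a tolerance that is never reached, so the loop always terminates.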
