Training process to fine-tune an LLM to answer questions about movies in dialogue

I'm trying to fine-tune a pre-trained GPT-2 model so that it can talk about movies in a dialogue manner (inspired by ChatGPT), and I'm not quite sure how to go about doing it.
This was my initial idea, but I'm not entirely confident that it is the way to go:
Freeze most of the initial attention blocks and fine-tune the rest on lots of movie plots and information (release year, cast, etc.), in order to acquire knowledge;
Freeze most of the initial attention blocks and fine-tune on a dataset of question-answer pairs about movies (using only the questions and answers, discarding the context, since the knowledge should have been acquired in step 1), using a prompt format that simulates conversations, for example: <|startofuserinput|> who dies at the end of spider-man 3? <|endofuserinput|> \n####\n Harry, Peter's friend dies at the end of spider-man 3 to help Peter. <|endofresponse|> (a sketch of this setup follows the list);
Use the resulting model to keep generating responses to the user, taking into account the most recent prompt as well as the conversation history.
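For step 2, here is a minimal sketch of the setup I have in mind, assuming HuggingFace transformers (freezing the first 9 blocks is just an illustrative choice, and the actual training loop is omitted):

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Register the dialogue markers as special tokens so the tokenizer never
# splits them, then make room for them in the embedding matrix.
tokenizer.add_special_tokens({"additional_special_tokens": [
    "<|startofuserinput|>", "<|endofuserinput|>", "<|endofresponse|>"]})
model.resize_token_embeddings(len(tokenizer))

# Freeze the first 9 of GPT-2's 12 blocks; fine-tune the rest (depth is a guess).
for block in model.transformer.h[:9]:
    for param in block.parameters():
        param.requires_grad = False

example = ("<|startofuserinput|> who dies at the end of spider-man 3? <|endofuserinput|>"
           "\n####\n Harry, Peter's friend dies at the end of spider-man 3 "
           "to help Peter. <|endofresponse|>")
batch = tokenizer(example, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss  # standard causal-LM loss
```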
Is there anything inherently wrong with this method, any obvious flaw? If so, how should I proceed to accomplish such a task?

Related

I'm trying to fine-tune a bank-bot

Of course, it's not really a bankbot yet.
The source data that has been given to me is a hundred rows or so of telephone scripts (probably artificial in nature).
I'm trying to use them to fine-tune the davinci model on OpenAI. Just feeding them in the
{prompt:question, completion:answer}
format has not yielded great results. I've also tried sending the conversation back to it in the prompts in hopes of improving the outcome, but that's been a mixed bag too, as it's prone to repetition.
On the OpenAI website, I found this:
{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent:", "completion":" <response2>\n"}
{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent: <response2>\nCustomer: <message3>\nAgent:", "completion":" <response3>\n"}
Can someone point me to some examples of this in action with some actual data plugged into it in a q&a bot context? Thanks.

How do I force learning in the new UI when utterances are shown as Not Learned?

I want there to be a way to force these two items to become learned, or at least to reveal to me why they are Not Learned. But I can't figure out what, if anything, I am supposed to do.
I have run into this a couple of times where I was "sure" my training was correct, yet I got the "not learned" status.
I suggest you check whether it runs properly using the "Aligned NL" feature in the V2 training when you choose Run in the Simulator.
If that works, the fix I'm using might work for you.
I found that if I add a vocab to work with my enum, then my training gets learned.
When training utterances aren't being learned, odds are that there are conflicting examples that Bixby cannot reconcile.
In this case, I would recommend first attempting to teach Bixby a few "follow" utterances until you can confirm Bixby is able to learn that. Then, attempt a few "unfollow" utterances. If the previously learned "follow" utterances become unlearned, the "follow" and "unfollow" utterances are conflicting.
I would also recommend that you look through the training documentation, as it explains the nuances of training Bixby.

How do recommendation systems work in production?

I am a newbie to this and have started learning Spark. I have a general question regarding how recommendation systems work in production environments, or rather how they are deployed to production.
Below is a small example of a system for an e-commerce website.
I understand that once the system is built, we can feed the data to the engine at the start (we can run the jobs, or run the program/process for the engine) and it will give the results, which will be stored back to the database against each user. The next time the user logs in, the website can fetch the data previously computed by the engine from the database and show it as recommended items.
The confusion I have is how these systems generate outputs on the fly based on the user's activity. E.g., if I view a video on YouTube and I refresh the page, YouTube starts showing me similar videos.
So, do we have these recommendation engines always running in the background, continuously updating the results based on the user's activity? How is it done so fast?
Short answer:
Inference is fast; training is the slow part.
Retraining is a tweak to the existing model, not from scratch.
Retraining is only periodic, rarely "on demand".
Long answer:
They generate this output based on a long baseline of user activity. This is not something the engine derives from your recent view: anyone viewing that video on the same day will get the same recommendations.
Some recommender systems will take your personal history into account, but this is usually just a matter of sorting the recommended list with respect to the predicted ratings for you personally. This is done with a generic model built from your own ratings, by genre and other characteristics of each video.
In general, updates are done once a day, and the changes are small. They don't retrain a model from scratch every time you make a request; the parameters and weights of your personal model are stored under your account. When you add a rating (not just viewing a video), your account gets flagged, and the model will be updated at the next convenient opportunity.
This update will begin with your current model, and will need to run for only a couple of epochs -- unless you've rated a significant number of videos with ratings that depart significantly from the previous predictions.
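As a minimal sketch of that warm-start idea, assuming a simple matrix-factorization model whose per-user vector is stored and reloaded (all names and numbers here are illustrative, not any particular site's pipeline):

```python
import numpy as np

def update_user_vector(user_vec, item_factors, new_ratings, lr=0.01, reg=0.1, epochs=2):
    """Warm-start update: refine one user's stored factor vector against
    fixed item factors, using only the ratings added since the last update."""
    for _ in range(epochs):  # only a couple of epochs, starting from the current model
        for item_id, rating in new_ratings:
            err = rating - user_vec @ item_factors[item_id]
            user_vec += lr * (err * item_factors[item_id] - reg * user_vec)
    return user_vec

# Hypothetical stored state: 50-dim factors for 10,000 items, one flagged user.
rng = np.random.default_rng(0)
item_factors = rng.normal(scale=0.1, size=(10_000, 50))
user_vec = rng.normal(scale=0.1, size=50)   # loaded from the user's account
new_ratings = [(42, 5.0), (1337, 1.0)]      # (item_id, rating) added since last update
user_vec = update_user_vector(user_vec, item_factors, new_ratings)
```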
To add to Prune's answer, depending on the design of the system it may also be possible to take into account the user's most recent interactions.
There are two main ways of doing so:
Fold-in: your most recent interactions are used to recompute the parameters of your personal model while leaving the remainder of the model unchanged. This is usually quite fast and could be done in response to every request.
A model that takes interactions as input directly: some models can compute recommendations directly by taking a user's interactions as input, without storing user representations that can only be updated via retraining. For example, you can represent the user by simply averaging the representations of the items she most recently interacted with. Recomputing this representation can be done in response to every request.
In fact, this last approach seems to form part of the YouTube system. You can find details in the Covington et al. Deep Neural Networks For YouTube Recommendations paper.
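As a toy sketch of the second approach, assuming item embeddings have already been learned offline (all sizes and names here are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
item_embeddings = rng.normal(size=(1_000, 32))  # learned offline, e.g. by matrix factorization

def recommend(recent_item_ids, item_embeddings, k=10):
    """Represent the user as the average embedding of their recent items,
    then score every item by dot product; no stored user model is needed."""
    user_vec = item_embeddings[recent_item_ids].mean(axis=0)
    scores = item_embeddings @ user_vec
    scores[recent_item_ids] = -np.inf  # don't re-recommend what was just consumed
    return np.argsort(scores)[::-1][:k]

print(recommend([3, 17, 256], item_embeddings))  # recomputable on every request
```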

Suggest a list of how-to articles based on text content

I have 20,000 messages (a combination of email and live chat) between my customers and my support staff. I also have a knowledge base for my product.
Oftentimes, the questions customers ask are quite simple, and my support staff simply point them to the right knowledge base article.
What I would like to do, in order to save my support staff time, is to show my staff a list of articles likely to be relevant based on the user's initial support request. This way they can just copy and paste the link to the help article instead of loading up the knowledge base and searching for the article manually.
I'm wondering what solutions I should investigate.
My current line of thinking is to run analysis on the existing data and use a text classification approach (a rough sketch in code follows the list):
For each message, see if there is a response with a link to a how-to article
If yes, extract key phrases (Microsoft Cognitive Services)
TF-IDF?
Treat each how-to article as a 'classification' that belongs to sets of key phrases
Use some supervised machine learning, support vector machines maybe, to predict which 'classification', aka how-to article, matches the key phrases determined from a new support ticket
Feed new responses back into the set to make the system smarter
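Here is a minimal sketch of that pipeline with scikit-learn, using TF-IDF features and a linear SVM (the example messages and article labels are invented; in practice they would come from the historical message/article pairs):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training data: past support messages and the article each was answered with.
messages = [
    "How do I reset my password?",
    "I can't log in after changing my email",
    "Where do I download my invoice?",
    "Can I get a copy of last month's bill?",
]
articles = ["kb-password-reset", "kb-password-reset", "kb-billing", "kb-billing"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), stop_words="english"), LinearSVC())
clf.fit(messages, articles)

new_ticket = ["hi, I forgot my password and can't sign in"]
print(clf.predict(new_ticket))  # article link to suggest to the support agent
```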
Not sure if I'm overcomplicating things. Any advice on how this is done would be appreciated.
PS: the naive approach of just dumping 'key phrases' into the search query of our knowledge base yielded poor results, since the content of a help article is often phrased differently from how a person phrases their question in an email or live chat.
A simple classifier along the lines of a "spam" classifier might work, except that each FAQ would be a feature, as opposed to the single-feature spam/not-spam classifier.
Most spam classifiers start off with a dictionary of words/phrases. You already have a start on this with your naive approach. However, unlike your approach, a spam classifier does much more than a text search. Essentially, in a spam classifier, each word in the customer's email is given a weight, and the sum of the weights indicates whether the message is spam or not-spam. Now, extend this to as many features as there are FAQs. That is, features like: FAQ1 or not-FAQ1, FAQ2 or not-FAQ2, etc.
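As a sketch of that one-binary-decision-per-FAQ idea, here is a one-vs-rest wrapper around a linear model in scikit-learn (the emails and FAQ labels are invented; each FAQ gets its own FAQ/not-FAQ decision with per-word weights):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

emails = [
    "my password won't work anymore",
    "need to reset my password",
    "where can I find my invoice",
    "please resend my bill",
]
faqs = ["FAQ1", "FAQ1", "FAQ2", "FAQ2"]

# One binary spam-style classifier per FAQ: FAQ1/not-FAQ1, FAQ2/not-FAQ2, ...
model = make_pipeline(CountVectorizer(), OneVsRestClassifier(LogisticRegression()))
model.fit(emails, faqs)
print(model.predict(["how do I change my password"]))
```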
Since your support people can easily identify which of the FAQs an e-mail requires, using a supervised learning algorithm would be appropriate. To reduce the impact of any misclassification errors, consider having the application present a support person with the customer's email followed by the computer-generated response; all the support person would have to do is approve the response or modify it. Modifying a response should result in a new entry in the training set.
Support Vector Machines are one method to implement machine learning. However, you are probably suggesting this solution way too early in the process of first identifying the problem and getting a simple method to work as well as possible before using more sophisticated methods. After all, if a multi-feature spam classifier works, why invest more time and money in something else that also works?
Finally, depending on your system, this is something I would like to work on.

If I interrupt sklearn grid_search.fit() before completion can I access the current .best_score_, .best_params_?

If I interrupt grid_search.fit() before completion, will I lose everything it's done so far?
I got a little carried away with my grid search and provided an obscenely large search space. I can see scores that I'm happy with already, but my stdout doesn't display which params led to those scores.
I've searched the docs: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html
And there is a discussion from a couple of years ago about adding a feature for parallel search here: https://sourceforge.net/p/scikit-learn/mailman/message/31036457/
But nothing definitive. My search has been running for ~48 hrs, so I don't want to lose what's been discovered, but I also don't want to continue.
Thanks!
Welcome to SO!
To my understanding, there aren't any intermediate variables that get returned from the grid_search function, only the resulting grid and its scores (see grid_search.py for more information).
So if you cancel it, you might lose the work that's been done so far.
But a bit of advice: 48 hours is a long time (obviously this depends on the rows, columns, and number of hyperparameters being tuned). You might want to start with a broader grid search first and then refine your parameter search off that (see the sketch after the list below).
That will benefit you two ways:
Run time might end up being much shorter (see caveats above), meaning you don't have to wait so long and risk losing results
You might find that your model prediction score is only impacted by one or two hyperparameters, letting you keep the other searches broader and focusing your efforts on the parameters that influence your prediction accuracy most.
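Here is a minimal sketch of that coarse-to-fine idea (the estimator, data, and grids are placeholders; note it uses the newer sklearn.model_selection module, where GridSearchCV now lives):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Pass 1: a coarse, cheap sweep over orders of magnitude.
coarse = GridSearchCV(SVC(), {"C": [0.01, 1, 100], "gamma": [0.001, 0.1, 10]}, cv=3)
coarse.fit(X, y)
best_C, best_gamma = coarse.best_params_["C"], coarse.best_params_["gamma"]

# Pass 2: a fine sweep only around the coarse winner.
fine = GridSearchCV(
    SVC(),
    {"C": best_C * np.array([0.3, 1, 3]), "gamma": best_gamma * np.array([0.3, 1, 3])},
    cv=3,
)
fine.fit(X, y)
print(fine.best_params_, fine.best_score_)
```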
Hopefully by the time I've finished writing this response, your grid search will have completed!
