What's the difference between the RM (reward model) and the value function? In the paper, the author often mentions them together, and I'm confused.
On page 42, in Appendix C.4, under the RLHF training details, the author also mentions a learning rate for the value function. However, we already know the learning rate of the RM from C.2.
Here is the content of C.4:
As previously mentioned, for all PPO models we use a 6B RM and a 6B value function, and the latter is initialized from the former. By using the same 6B reward model and value function on policies of all model sizes, it’s easier to compare the effect of policy model size on policy performance. A fixed learning rate of 9e-6 for the value function is used for 1.3B and the 6B policies and 5e-6 for the 175B policy.
Paper link:
https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf
I have read the whole paper and I'm still confused.
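To make the quoted passage concrete, here is a rough PyTorch sketch of the distinction as described there; the class and sizes are illustrative toys, not the paper's code. The RM is trained on human comparisons (C.2) and is then frozen, scoring complete responses during PPO, while the value function starts as a copy of the RM and keeps training with its own learning rate (C.4) to predict expected future reward.

import copy
import torch
import torch.nn as nn

# Toy stand-in for a transformer with a scalar output head; the real models are 6B parameters.
class ScalarHeadModel(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.body = nn.Linear(hidden, hidden)    # placeholder for the transformer body
        self.scalar_head = nn.Linear(hidden, 1)  # single score per input

    def forward(self, x):
        return self.scalar_head(torch.tanh(self.body(x)))

# Reward model: trained earlier on human comparisons (Appendix C.2), then frozen.
# During PPO it only scores complete responses to produce the reward signal.
reward_model = ScalarHeadModel()
for p in reward_model.parameters():
    p.requires_grad_(False)

# Value function: a separate network initialized from the RM's weights (as C.4 says),
# but it keeps training during PPO, with its own learning rate, to predict the
# expected future reward (the baseline used for advantage estimation).
value_function = copy.deepcopy(reward_model)
for p in value_function.parameters():
    p.requires_grad_(True)
value_optimizer = torch.optim.Adam(value_function.parameters(), lr=9e-6)  # 9e-6 per C.4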
Related
I am trying to implement a BERT model for question-answering tasks, but it's a little different from existing Q&A models.
The model will be given some text (3-4 pages) and asked questions based on that text, and the expected answer may be short or a descriptive, subjective long-form response.
I tried to implement BERT for this task.
The Problems I am facing:
The input token limit for BERT is 512.
How to get the answer in long form, so that it can describe an instance, process, event, etc.
Try Longformer, which can take an input length of 4096 tokens, or even 16384 tokens with gradient checkpointing. See details at https://github.com/allenai/longformer, or on the Hugging Face model hub: https://huggingface.co/docs/transformers/model_doc/longformer.
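If it helps, here is a minimal extractive-QA sketch assuming the Hugging Face transformers library; the checkpoint name is one publicly available Longformer QA model and is only an example, you would normally fine-tune your own.

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

# Public Longformer checkpoint fine-tuned for extractive QA (TriviaQA);
# swap in a model fine-tuned on your own documents.
model_name = "allenai/longformer-large-4096-finetuned-triviaqa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "What process does the document describe?"
long_document = "..."  # your 3-4 pages of text

# Encode question + document together; Longformer handles up to 4096 tokens here.
inputs = tokenizer(question, long_document, return_tensors="pt",
                   truncation=True, max_length=4096)
with torch.no_grad():
    outputs = model(**inputs)

# Take the highest-scoring start/end positions and decode that span.
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits)) + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end], skip_special_tokens=True))

Note that this only returns an extractive span; producing longer, descriptive answers typically requires a generative (sequence-to-sequence) setup rather than span extraction.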
I use the Azure Object Anchors service to convert a geometry model in OBJ format. The model is a human head. Before conversion, the nose, ears, and eyes are clearly present in the model, but after conversion all these details are missing; what I obtained is a very coarse head model. Because the details are missing, the detection result is not accurate. I would like to know how to keep the details during model conversion.
Thanks.
YL
For this one, it is likely the model size is just too small. A human head would be much smaller than the 1-meter threshold. The service requires larger objects per the documentation here:
https://learn.microsoft.com/en-us/azure/object-anchors/overview#asset-requirements
"Each dimension of an asset should be between 1 meter to 10 meters, and the file size should be less than 150 MB."
I understand the retrieval task - I have gone through the code and have also looked at alternative approaches like ScaNN, which is an ultra-fast approximate nearest-neighbor search.
However, I still have a hard time understanding the mechanism of the following code:
import tensorflow as tf
import tensorflow_recommenders as tfrs

# (model and movies are defined earlier in the TFRS retrieval tutorial.)
# Create a model that takes in raw query features, and
# recommends movies out of the entire movies dataset.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
)

# Get recommendations.
_, titles = index(tf.constant(["42"]))
print(f"Recommendations for user 42: {titles[0, :3]}")
model.user_model is trained and by now should return embeddings for a user_id. The input to the BruteForce layer is model.user_model, and then it gets indexed?
I guess the output is: given user_id 42, return 3 titles out of that movies.batch(100). But I can't understand what BruteForce and the indexing actually do!
The BruteForce layer does an exhaustive comparison: it takes the query embedding produced by model.user_model and scores it (by dot product) against every candidate embedding that index_from_dataset stored.
According to the TensorFlow documentation for the layer, calling it returns the top-k results (k=10 by default), i.e. the scores and identifiers of the candidates whose embeddings are closest to the query embedding.
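To make that concrete, here is a toy sketch of what index_from_dataset and the BruteForce call amount to. Random embeddings stand in for the trained user/movie towers; this is not the actual TFRS implementation.

import tensorflow as tf

num_movies, dim = 100, 32

# What index_from_dataset stores, conceptually: the candidate identifiers and
# one embedding per candidate (model.movie_model(title) in the question's code).
candidate_ids = tf.constant([f"movie_{i}" for i in range(num_movies)])
candidate_embeddings = tf.random.normal([num_movies, dim])

# Stand-in for the trained model.user_model: map a raw user id to an embedding.
user_embeddings = tf.random.normal([1000, dim])
def user_model(user_ids):
    return tf.nn.embedding_lookup(user_embeddings,
                                  tf.strings.to_hash_bucket_fast(user_ids, 1000))

# What calling the BruteForce index does, conceptually: embed the query, score it
# against every stored candidate by dot product, and keep the k best.
def brute_force_query(raw_query, k=10):
    query_embedding = user_model(raw_query)                     # [batch, dim]
    scores = tf.matmul(query_embedding, candidate_embeddings,
                       transpose_b=True)                        # [batch, num_movies]
    top_scores, top_positions = tf.math.top_k(scores, k=k)
    return top_scores, tf.gather(candidate_ids, top_positions)  # positions -> titles

scores, titles = brute_force_query(tf.constant(["42"]), k=3)
print(titles[0])  # 3 titles for user "42", like titles[0, :3] in the question

So the "indexing" step just caches every candidate's embedding alongside its identifier, and the layer call is an exhaustive dot-product search over that cache, which is why it is called brute force; ScaNN does the same kind of search approximately, but much faster.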
The Azure F0 (free) plan allows users to train their own model but does not allow them to deploy and use the trained model. So what is the purpose of letting users train a model if it cannot be used?
You can still see the result of your training, as long as your training data is under 2 million characters. The result includes a BLEU score and the translation of your test set.
In the sklearn-style API of XGBClassifier, we can provide eval examples for early stopping:
eval_set (list, optional) – A list of (X, y) pairs to use as a
validation set for early-stopping
However, the format only mentions a pair of features and labels. So if the doc is accurate, there is no place to provide weights for these eval examples.
Am I missing anything?
If it's not achievable in the sklearn-style API, is it supported in the original (i.e. non-sklearn) xgboost API? A short example would be nice, since I have never used that version of the API.
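Regarding the non-sklearn part of the question: the native interface does support this, because xgb.DMatrix accepts a weight argument for any data set, including evaluation sets. A minimal sketch with synthetic data:

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 10)), rng.integers(0, 2, 500)
X_val,   y_val   = rng.normal(size=(100, 10)), rng.integers(0, 2, 100)
w_train = rng.uniform(0.1, 2.0, size=500)   # per-row training weights
w_val   = rng.uniform(0.1, 2.0, size=100)   # per-row validation weights

# DMatrix carries per-row weights, so the eval metric on dval is weighted.
dtrain = xgb.DMatrix(X_train, label=y_train, weight=w_train)
dval   = xgb.DMatrix(X_val,   label=y_val,   weight=w_val)

params = {"objective": "binary:logistic", "eval_metric": "logloss"}
booster = xgb.train(
    params, dtrain,
    num_boost_round=200,
    evals=[(dval, "validation")],   # weighted eval set used for early stopping
    early_stopping_rounds=10,
)
print("best iteration:", booster.best_iteration)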
As of a few weeks ago, there is a new parameter for the fit method, sample_weight_eval_set, that allows you to do exactly this. It takes a list of weight variables, i.e. one per evaluation set. I don't think this feature has made it into a stable release yet, but it is available right now if you compile xgboost from source.
https://github.com/dmlc/xgboost/blob/b018ef104f0c24efaedfbc896986ad3ed1b66774/python-package/xgboost/sklearn.py#L235
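A minimal usage sketch of the sklearn-style interface with synthetic data; note that which of fit() or the constructor accepts eval_metric and early_stopping_rounds has changed across xgboost versions, so adjust for the version you have.

import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 10)), rng.integers(0, 2, 500)
X_val,   y_val   = rng.normal(size=(100, 10)), rng.integers(0, 2, 100)
w_train = rng.uniform(0.1, 2.0, size=500)   # per-row training weights
w_val   = rng.uniform(0.1, 2.0, size=100)   # per-row validation weights

clf = XGBClassifier(n_estimators=200, eval_metric="logloss", early_stopping_rounds=10)
clf.fit(
    X_train, y_train,
    sample_weight=w_train,              # weights for the training rows
    eval_set=[(X_val, y_val)],          # one (X, y) pair per evaluation set
    sample_weight_eval_set=[w_val],     # one weight array per evaluation set, same order
)
print("best iteration:", clf.best_iteration)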
EDIT - UPDATED per conversation in comments
Given that you have a target variable representing real-valued gain/loss amounts which you would like to classify as "gain" or "loss", and you would like the classifier's validation set to weigh the large-absolute-value gains/losses most heavily, here are two possible approaches:
1. Create a custom classifier which is just an XGBRegressor fed into a threshold, where the real-valued regression predictions are converted to 1/0 or "gain"/"loss" classifications. The .fit() method of this classifier would just call .fit() of the XGBRegressor, while the .predict() method would call .predict() of the regressor and then return the thresholded category predictions (see the sketch after these two options).
2. You mentioned you would like to try weighting the treatment of the records in your validation set, but there is no option for this in xgboost. The way to implement it would be a custom eval metric. However, you pointed out that eval_metric must return a score for a single label/pred record at a time, so it couldn't accept all your row values and perform the weighting inside the eval metric. The workaround you described in your comment was to "create a callable which has a ref to all validation examples, pass the indices (instead of labels and scores) into eval_set, use the indices to fetch labels and scores from within the callable and return metric for each validation examples." This should also work.
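A rough sketch of option 1; the class name and the 0.0 threshold are illustrative, and sklearn-estimator plumbing like get_params is omitted.

import numpy as np
from xgboost import XGBRegressor

class ThresholdedXGBClassifier:
    """Regress the real-valued gain/loss, then threshold it into a class label."""

    def __init__(self, threshold=0.0, **xgb_params):
        self.threshold = threshold
        self.regressor = XGBRegressor(**xgb_params)

    def fit(self, X, y, **fit_params):
        # y is the real-valued gain/loss target, so large gains/losses
        # naturally carry more weight in the squared-error objective.
        self.regressor.fit(X, y, **fit_params)
        return self

    def predict(self, X):
        # Convert the regression output into "gain" (1) vs "loss" (0).
        return (self.regressor.predict(X) > self.threshold).astype(int)

# Usage with synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 3.0 + rng.normal(size=200)   # real-valued gains/losses
clf = ThresholdedXGBClassifier(threshold=0.0, n_estimators=50)
clf.fit(X, y)
print(clf.predict(X[:5]))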
I would tend to prefer option 1 as more straightforward, but trying both approaches and comparing results is generally a good idea if you have the time, so I'm interested in how these turn out for you.