NaN reward after hyperparameter optimization (Ray, Gym)

I launched a hyperparameter-optimization algorithm on a custom Gym environment.
This is my code:
config = {
    "env": "affecta",
    "sgd_minibatch_size": 1000,
    "num_sgd_iter": 100,
    "lr": tune.uniform(5e-6, 5e-2),
    "lambda": tune.uniform(0.6, 0.99),
    "vf_loss_coeff": tune.uniform(0.6, 0.99),
    "kl_target": tune.uniform(0.001, 0.01),
    "kl_coeff": tune.uniform(0.5, 0.99),
    "entropy_coeff": tune.uniform(0.001, 0.01),
    "clip_param": tune.uniform(0.4, 0.99),
    "train_batch_size": 200,  # episode length
    # "monitor": True,
    # "model": {"free_log_std": True},
    "num_workers": 6,
    "num_gpus": 0,
    # "rollout_fragment_length": 3,
    # "batch_mode": "complete_episodes",
}
current_best_params = [{
    'lr': 5e-4,
}]

config = explore(config)

optimizer = HyperOptSearch(metric="episode_reward_mean", mode="max",
                           n_initial_points=20, random_state_seed=7,
                           space=config)
# optimizer = ConcurrencyLimiter(optimizer, max_concurrent=4)

tuner = tune.Tuner(
    "PPO",
    tune_config=tune.TuneConfig(
        # metric="episode_reward_mean",  # the metric we want to study
        # mode="max",  # maximize the metric
        search_alg=optimizer,
        # num_samples repeats the entire config 'num_samples' times,
        # i.e. the number of trials shown in the 'Status' output
        num_samples=10,
    ),
    # the stop condition limits the number of training iterations for each
    # hyperparameter combination
    run_config=air.RunConfig(stop={"training_iteration": 3},
                             local_dir="test_avec_inoffensifs"),
)

results = tuner.fit()
The problem is that the dataframes returned at each iteration of the hyperopt algorithm contain NaN values for the rewards...
I tried using several environments, and it is still the same.
Thank you in advance :)

The returned rewards are independent of the HP optimization algorithm.
If train_batch_size is 200 but your rollout fragments are tiny, you probably run into an issue where num_workers * rollout_fragment_length is only 18. So you collect very few samples (18!) on every iteration and train on them, but there is never a full episode to calculate the mean reward from, even after three iterations.
Collecting complete episodes, a larger rollout_fragment_length, and/or a lower train_batch_size should do the trick.
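A minimal sketch of that fix applied to the config above (the values are illustrative assumptions, not tested on the "affecta" environment):

config.update({
    "batch_mode": "complete_episodes",  # only train on whole episodes
    "rollout_fragment_length": 200,     # collect longer fragments per worker
})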

Related

Identify the best GridSearchCV scoring metric for food prediction in XGBoost

I am using GridSearchCV to find the best parameters to tune XGBoost for a food prediction algorithm.
I am struggling to identify the scoring metric that would result in the best profit (sales margin minus wastage costs), as this is ultimately what I am looking for. In running the script below and plugging it into the data (I reserved some data for testing only), I noticed that a better R2 seems to yield a higher profit than a better RMSE. But I am struggling to find an explanation that would guide me to the best scoring method.
Here is some info on the situation:
It costs me 6 USD to produce the product and I sell it for 9 USD, so my margin is 3 USD. My wastage is therefore 6 USD multiplied by (production minus sales quantity), whereas my earnings are the sales quantity multiplied by 3 USD.
Example: I produce 100, sell 70, and waste 30, so my earnings are 70*3 - 30*6 = 30.
So I have an imbalance between sales and wastage.
Main question: Which scoring metric puts a higher penalty weight on over-prediction?
My current code:
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

X = consumption[feature_names]
y = consumption['Meal1']
data_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter grid: gbm_param_grid
# (note: the original grid listed 'reg_alpha' twice; duplicate removed)
gbm_param_grid = {
    'min_child_weight': [1, 2],
    'gamma': [0.05, 0.06],
    'colsample_bytree': [0.22, 0.23],
    'n_estimators': range(28, 29),
    'max_depth': range(3, 8),
    'reg_alpha': range(1, 2),
    'reg_lambda': range(1, 2),
    'subsample': [0.7, 0.8, 0.9],
    'learning_rate': [0.1, 0.2],
}
fixed_params = {'objective': 'reg:squarederror', 'booster': 'gbtree'}

# Instantiate the regressor: gbm
gbm = xgb.XGBRegressor(**fixed_params)

# Perform grid search: grid_mse
grid_mse = GridSearchCV(estimator=gbm, param_grid=gbm_param_grid,
                        scoring="r2", cv=5, verbose=1)

# Fit grid_mse to the data
grid_mse.fit(X, y)

# Print the best parameters and the best score
print("Best parameters found: ", grid_mse.best_params_)
print("Best score found: ", np.sqrt(np.abs(grid_mse.best_score_)))

Tensorflow model's weight updated on sess.run

I am struggling with the fact that the weights in my model get updated when I run sess.run (with no reference to the train step).
I try to feed my model with variables to get the estimated outputs, but when I run sess.run the weights get updated.
### in the training phase ###
X_eval, Y_eval, O_eval, W_eval, cost_eval, train_step_eval = sess.run(
    [X, Y, O_out, W, cost, train_step], feed_dict={X: x_batch, Y: y_batch})

### when the training is finished (loop closed) ###
print(W_eval)
Y_out, W_eval2 = sess.run(
    [O_out, W],
    feed_dict={X: labeled_features[:, :-n_labels], Y: labeled_features[:, -n_labels:]})
print(W_eval2)
When I compare W_eval and W_eval2, they are not the same, which I do not understand.
Could you please point me in the right direction: why are the weights not the same?
'w3': array([[-2.9685912],
             [-3.215485 ],
             [ 3.8806837],
             [-3.331745 ],
             [-3.3904853]], dtype=float32)
'w3': array([[-2.9700036],
             [-3.2168453],
             [ 3.8804765],
             [-3.3330843],
             [-3.3922129]], dtype=float32)
Thank you in advance.
EDIT: added the W_eval assignment.
Your code

### in the training phase ###
X_eval, Y_eval, O_eval, W_eval, cost_eval, train_step_eval = sess.run(
    [X, Y, O_out, W, cost, train_step], feed_dict={X: x_batch, Y: y_batch})

### when the training is finished (loop closed) ###
print(W_eval)
Y_out, W_eval2 = sess.run(
    [O_out, W],
    feed_dict={X: labeled_features[:, :-n_labels], Y: labeled_features[:, -n_labels:]})
print(W_eval2)

still executes train_step in the first sess.run call, because it is in the fetch list. A simpler version to understand what is going on:
import tensorflow as tf

a = tf.get_variable('a', initializer=42.)
train_step = a.assign(a + 1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    value, _ = sess.run([a, train_step])  # will update a
    print(value)
    value = sess.run([a])  # will not update a
    print(value)
    value = sess.run([a])  # will not update a
    print(value)
gives the output
42.0
[43.0]
[43.0]
Another thing to check is whether x_batch == labeled_features[:, :-n_labels] actually holds.
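Applied to the question's code, the fix is simply to leave train_step out of the fetch list when you only want to inspect the weights; a minimal sketch:

# Fetch the weights alone; no training op runs, so nothing is updated.
W_eval = sess.run(W)
print(W_eval)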

sklearn Cross validation score gives the same results for every number of folds

I can't figure out why cross-validation always gives me the same accuracy (0.92), no matter how many folds I use.
Even when I delete the parameter cv=10 it gives me the same result.
import ast
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

# read preprocessed data
traindata = ast.literal_eval(open('pretprocesirano.txt').read())
testdata = ast.literal_eval(open('pretprocesiranoTEST.txt').read())

# create word vector
vectorizer = CountVectorizer(tokenizer=lambda x: x.split(), min_df=3, max_features=300)
traindataCV = vectorizer.fit_transform(traindata)

# save wordlist
wordlist = vectorizer.vocabulary_

# save vectorizer
SavedVectorizer = CountVectorizer(vocabulary=wordlist)

# transform test data
testdataCV = SavedVectorizer.transform(testdata)

# modeling - Naive Bayes
clf = MultinomialNB()
clf.fit(traindataCV, label_train)

# cross-validation score
CrossValScore = cross_val_score(clf, traindataCV, label_train, cv=10)
print("Accuracy CrossValScore: %0.3f" % CrossValScore.mean())
I tried this way too, and I also got the same result (0.92). This happens even when I change the number of folds, or remove it.
from sklearn.model_selection import KFold

CrossValScore = cross_val_score(clf, traindataCV, label_train,
                                cv=KFold(10, shuffle=False, random_state=0))
print("Accuracy CrossValScore: %0.3f" % CrossValScore.mean())
Here are some samples:
traindata= ['ucg investment bank studying unicredit intesa paschi merger sole', 'mtoken sredstva autentifikacije intesa line umesto mini cda cega line vise moze koristi aktivacija', 'pll intesa', 'intesa and unicredit banka asset management the leading italia lenders are both after more fee income but url', 'about write intesa scene colbie cailat fosterthepeople that involves sexy taj between these url']
testdata= ['naumovic samo privilegovani nije delatnosti moci imati hit nama traziti depozit rimuje mentionpositive', 'breaking unicredit board okays launch bad loans vehicle with intesa kkr read more url', 'postoji promocija kupovina telefon rate telefon banka popust pretplata url', 'direktor politike haha struja obecao stan svi zaposliti kredit komercijalna banka', 'forex update unicredit and intesa pool bln euros bad loans kkr vehicle url']
label_train = [0, 1, 0, 0, 0]
label_test = [1, 0, 1, 1, 0]

Problems using poly kernel in GridSearchCV and SVM classifier

I am trying to do a grid search using an SVM classifier.
Consider my data and target, which have been parsed from a file and loaded into numpy arrays.
I then preprocess them.
# Transform the data to have zero mean and unit variance.
zeroMeanUnitVarianceScaler = preprocessing.StandardScaler().fit(data)
scaledData = zeroMeanUnitVarianceScaler.transform(data)  # transform returns a copy; assign it

# Transform the target to have range [-1, 1].
scaledTarget = np.empty([161, ], dtype=int)
for i in range(len(target)):
    if target[i] == 'Malignant':
        scaledTarget[i] = 1
    if target[i] == 'Benign':
        scaledTarget[i] = -1
I now try to set up my grid and fit the scaled data to the targets.
# Generate parameters for the parameter grid.
CValues = np.logspace(-3, 3, 7)
GammaValues = np.logspace(-3, 3, 7)
kernelValues = ('poly', 'sigmoid')
# kernelValues = ('linear', 'rbf', 'sigmoid')
degreeValues = np.array([0, 1, 2, 3, 4])
coef0Values = np.logspace(-3, 3, 7)

# Generate the parameter grid.
paramGrid = dict(C=CValues, gamma=GammaValues, kernel=kernelValues,
                 coef0=coef0Values)

# Create and train an SVM classifier using the parameter grid and a
# stratified shuffle split.
stratifiedShuffleSplit = StratifiedShuffleSplit(n_splits=10, test_size=0.25,
                                                train_size=None, random_state=0)
clf = GridSearchCV(estimator=svm.SVC(), param_grid=paramGrid,
                   cv=stratifiedShuffleSplit, n_jobs=1)
clf.fit(scaledData, scaledTarget)
If I uncomment the line kernelValues = ('linear', 'rbf', 'sigmoid'), the code runs in approximately 50 seconds on my 16 GB i7-4950 3.6 GHz machine running Windows 10.
However, if I try to run the code as is, with 'poly' as a possible kernel value, the code hangs forever. For example, I ran it overnight yesterday and it had not returned anything when I got back to the office today.
Interestingly enough, if I create an SVM classifier with a poly kernel directly, it returns a result immediately:
clf = svm.SVC(kernel='poly', degree=2)
clf.fit(data, target)
It only hangs with the grid search code above. I have not tried other cv methods to see if that changes anything.
Is this a bug in scikit-learn? Am I doing things properly? On a side note, is my method of doing grid search / cross-validation with GridSearchCV and StratifiedShuffleSplit sensible? It seems to me the most brute-force (i.e. time-consuming) but robust method.
Thank you!
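One diagnostic worth trying (an assumption: the poly kernel may simply fail to converge for some C/gamma/coef0 combinations, rather than scikit-learn being buggy) is to cap libsvm's iterations so a non-converging fit returns instead of hanging, and to raise verbosity to see which candidate stalls:

# max_iter=-1 (the default) means no iteration limit; a hard cap makes a
# non-converging fit fail fast. verbose=2 prints each candidate as it runs.
clf = GridSearchCV(estimator=svm.SVC(max_iter=1000000),
                   param_grid=paramGrid,
                   cv=stratifiedShuffleSplit, n_jobs=1, verbose=2)
clf.fit(scaledData, scaledTarget)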

ALS model - predicted full_u * v^t * v ratings are very high

I'm predicting ratings in between processes that batch-train the model. I'm using the approach outlined here: ALS model - how to generate full_u * v^t * v?
! rm -rf ml-1m.zip ml-1m
! wget --quiet http://files.grouplens.org/datasets/movielens/ml-1m.zip
! unzip ml-1m.zip
! mv ml-1m/ratings.dat .
from pyspark.mllib.recommendation import Rating

ratingsRDD = sc.textFile('ratings.dat') \
    .map(lambda l: l.split("::")) \
    .map(lambda p: Rating(
        user=int(p[0]),
        product=int(p[1]),
        rating=float(p[2]),
    )).cache()

from pyspark.mllib.recommendation import ALS

rank = 50
numIterations = 20
lambdaParam = 0.1
model = ALS.train(ratingsRDD, rank, numIterations, lambdaParam)
Then extract the product features ...
import numpy as np

pf = model.productFeatures()
pf_vals = pf.sortByKey().values().collect()
pf_keys = pf.sortByKey().keys().collect()
Vt = np.matrix(np.asarray(pf_vals))

full_u = np.zeros(len(pf_keys))

def set_rating(pf_keys, full_u, key, val):
    try:
        idx = pf_keys.index(key)
        full_u.itemset(idx, val)
    except ValueError:  # product id not in the factorization
        pass

set_rating(pf_keys, full_u, 260, 9)   # Star Wars (1977)
set_rating(pf_keys, full_u, 1, 8)     # Toy Story (1995)
set_rating(pf_keys, full_u, 16, 7)    # Casino (1995)
set_rating(pf_keys, full_u, 25, 8)    # Leaving Las Vegas (1995)
set_rating(pf_keys, full_u, 32, 9)    # Twelve Monkeys (a.k.a. 12 Monkeys) (1995)
set_rating(pf_keys, full_u, 335, 4)   # Flintstones, The (1994)
set_rating(pf_keys, full_u, 379, 3)   # Timecop (1994)
set_rating(pf_keys, full_u, 296, 7)   # Pulp Fiction (1994)
set_rating(pf_keys, full_u, 858, 10)  # Godfather, The (1972)
set_rating(pf_keys, full_u, 50, 8)    # Usual Suspects, The (1995)

recommendations = full_u * Vt * Vt.T

top_ten_ratings = list(np.sort(recommendations)[:, -10:].flat)
print("predicted rating value", top_ten_ratings)

top_ten_recommended_product_ids = np.where(
    recommendations >= np.sort(recommendations)[:, -10:].min())[1]
top_ten_recommended_product_ids = list(np.array(top_ten_recommended_product_ids))
print("predict rating prod_id", top_ten_recommended_product_ids)
However, the predicted ratings seem way too high:
('predicted rating value', [313.67320347694897, 315.30874327316576, 317.1563289268388, 317.45475214423948, 318.19788673744563, 319.93044594688428, 323.92448427140653, 324.12553531632761, 325.41052886977582, 327.12199687047649])
('predict rating prod_id', [49, 287, 309, 558, 744, 802, 1839, 2117, 2698, 3111])
This appears to be incorrect. Any tips appreciated.
I think the approach mentioned would work if you only care about the ranking of the movies. If you want an actual rating, there seems to be something off in terms of dimension/scaling.
The idea here is to guess the latent representation of your new user. Normally, for a user already in the factorization, user i, you have their latent representation u_i (the i-th row of model.userFeatures()), and you get their rating for a given movie (movie j) using model.predict, which basically multiplies u_i by the latent representation of the product, v_j. You can get all the predicted ratings at once if you multiply by the whole v: u_i*v.
For a new user, you have to guess their latent representation u_new from full_u_new.
Basically, you want 50 coefficients that represent your new user's affinity towards each of the latent product factors.
For simplicity, and since it was enough for my implicit-feedback use case, I simply used the dot product, basically projecting the new user onto the product latent factors: full_u_new*V^t gives you 50 coefficients, coefficient i being how much your new user looks like product latent factor i. This works especially well with implicit feedback.
So using the dot product will give you that, but it won't be scaled, and that explains the high scores you are seeing.
To get usable scores you need a more accurately scaled u_new. I think you could get that using cosine similarity, like they did here: https://github.com/apache/incubator-predictionio/blob/release/0.10.0/examples/scala-parallel-recommendation/custom-query/src/main/scala/ALSAlgorithm.scala
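A minimal numpy sketch of that cosine rescaling, assuming full_u and Vt as built in the question (illustrative only, not the linked Scala implementation):

V = np.asarray(Vt)                        # item factors, one row per item
u_new = V.T @ np.asarray(full_u).ravel()  # guessed latent user (length = rank)

# Cosine similarity between the guessed user and each item's factors keeps
# the scores in [-1, 1] instead of the unbounded dot product.
denom = np.linalg.norm(V, axis=1) * np.linalg.norm(u_new)
scores = (V @ u_new) / np.where(denom == 0, 1.0, denom)
top_ten_ids = np.argsort(scores)[-10:][::-1]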
The approach mentioned by @ScottEdwards2000 in the comment is interesting too, but rather different. You could indeed look for the most similar user(s) in your training set; if there is more than one, you could take the average. I don't think it would do too badly, but it is a really different approach, and you need the full rating matrix (to find the most similar users). Getting one close user should definitely solve the scaling problem. If you manage to make both approaches work, you could compare the results!
