I have been trying to understand how the reinforcement learning frameworks in AWS work. I recently moved on to the Coach framework after numerous versioning problems with Ray. I still cannot work out how to configure the presets properly. The training loops sometimes go on forever and do not stop when I expect them to. I am also unsure how to fix the number of steps per episode so that the model doesn't keep on training.
The reward in the image here keeps going up to 3.5 million, which I do not want, and as you can see it is very unstable.
I have tried adjusting a couple of the preset configs, particularly for the DQN algorithm. I changed the following parameters:
schedule_params.improve_steps = TrainingSteps(100000) #between 100 and 1000000
schedule_params.steps_between_evaluation_periods = EnvironmentEpisodes(100) # between 10 and 100
schedule_params.evaluation_steps = EnvironmentEpisodes(10) #between 1 and 10
schedule_params.heatup_steps = EnvironmentSteps(10) #between 10 and 100
This is the preset for the DQN:
from rl_coach.agents.dqn_agent import DQNAgentParameters
from rl_coach.base_parameters import VisualizationParameters, PresetValidationParameters, DistributedCoachSynchronizationType, EmbedderScheme
from rl_coach.architectures.embedder_parameters import InputEmbedderParameters
from rl_coach.schedules import ConstantSchedule
from rl_coach.core_types import TrainingSteps, EnvironmentEpisodes, EnvironmentSteps
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import ScheduleParameters
from rl_coach.memories.memory import MemoryGranularity
from rl_coach.schedules import LinearSchedule
from rl_coach.filters.observation.observation_normalization_filter import ObservationNormalizationFilter
from rl_coach.filters.observation.observation_move_axis_filter import ObservationMoveAxisFilter
from rl_coach.architectures.layers import Dense
####################
# Graph Scheduling #
####################
schedule_params = ScheduleParameters()
schedule_params.improve_steps = TrainingSteps(100000)
schedule_params.steps_between_evaluation_periods = EnvironmentEpisodes(100)
schedule_params.evaluation_steps = EnvironmentEpisodes(10)
schedule_params.heatup_steps = EnvironmentSteps(10)
#########
# Agent #
#########
agent_params = DQNAgentParameters()
# DQN params
agent_params.algorithm.num_steps_between_copying_online_weights_to_target = EnvironmentSteps(100)
agent_params.algorithm.discount = 0.99
agent_params.algorithm.num_consecutive_playing_steps = EnvironmentSteps(1)
# NN configuration
agent_params.network_wrappers['main'].learning_rate = 0.00025
agent_params.network_wrappers['main'].replace_mse_with_huber_loss = False
# agent_params.network_wrappers['main'].input_embedders_parameters['observation'].scheme = [Dense(1)]
agent_params.network_wrappers['main'].batch_size = 64
# agent_params.pre_network_filter.add_observation_filter('observation', 'move_axis',
# ObservationMoveAxisFilter(0,0))
# agent_params.pre_network_filter.add_observation_filter('observation', 'normalize_observation',
# ObservationNormalizationFilter(name='normalize_observation'))
# ER size
agent_params.memory.max_size = (MemoryGranularity.Transitions, 40000)
# E-Greedy schedule
agent_params.exploration.epsilon_schedule = LinearSchedule(1.0, 0.01, 10000)
################
# Environment #
################
env_params = GymVectorEnvironment(level='env:ArrivalSim')
env_params.additional_simulator_parameters = {'price': 30.0 }
# env_params.observation_space_type = ObservationSpaceType
#################
# Visualization #
#################
vis_params = VisualizationParameters()
vis_params.dump_gifs = False
########
# Test #
########
preset_validation_params = PresetValidationParameters()
preset_validation_params.test = False
preset_validation_params.min_reward_threshold = 8000
preset_validation_params.max_episodes_to_achieve_reward = 250
graph_manager = BasicRLGraphManager(agent_params=agent_params, env_params=env_params,
                                    schedule_params=schedule_params, vis_params=vis_params,
                                    preset_validation_params=preset_validation_params)
The problem is mainly around graph scheduling.
I expect to be able to set up a training loop which has a fixed number of steps each episode and doesn't continue on to infinity. I also hope to control the number of episodes.
Please take a look at the RL examples in our public GitHub repo: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/reinforcement_learning
There are a few coach-based examples which may help here.
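For the fixed-steps-per-episode part of the question: in a gym-style setup an episode ends when the environment reports `done`, so one option is to cap the step count inside the environment itself. The following is a minimal sketch in plain Python, not Coach or SageMaker code; `BoundedEpisodeEnv`, `max_steps` and the reward rule are hypothetical stand-ins for the custom `ArrivalSim` environment.

```python
import random


class BoundedEpisodeEnv:
    """Toy gym-style environment that ends every episode after max_steps steps.

    Hypothetical stand-in for a custom simulator; only the episode-length
    bookkeeping is meant to carry over.
    """

    def __init__(self, max_steps=200, seed=0):
        self.max_steps = max_steps
        self._rng = random.Random(seed)
        self._steps_taken = 0

    def reset(self):
        # Start a new episode and return an initial observation.
        self._steps_taken = 0
        return 0.0

    def step(self, action):
        # Advance one step; the reward here is an arbitrary placeholder.
        self._steps_taken += 1
        observation = self._rng.random()
        reward = float(action) * observation
        # The episode is over once the step budget is used up.
        done = self._steps_taken >= self.max_steps
        return observation, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return (total_reward, number_of_steps)."""
    env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        _, reward, done, _ = env.step(policy())
        total_reward += reward
        steps += 1
    return total_reward, steps


if __name__ == "__main__":
    env = BoundedEpisodeEnv(max_steps=50)
    total, steps = run_episode(env, policy=lambda: 1)
    print(f"episode finished after {steps} steps, total reward {total:.2f}")
```

With a cap like this, the return of a single episode is bounded by `max_steps` times the largest per-step reward, which also makes reward curves easier to compare between runs.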
Short form:
I am trying to figure out how to run the hyperparameter tuning within a training step (i.e. train_step = PythonScriptStep(...)) in the pipeline. I am not sure where I should put the "config=hyperdrive".
Long form:
General:
# Register the environment
diabetes_env.register(workspace=ws)
registered_env = Environment.get(ws, 'diabetes-pipeline-env')
# Create a new runconfig object for the pipeline
run_config = RunConfiguration()
# Use the compute you created above.
run_config.target = ComputerTarget_Crea
# Assign the environment to the run configuration
run_config.environment = registered_env
Hyperparam:
script_config = ScriptRunConfig(source_directory=experiment_folder,
script='diabetes_training.py',
# Add non-hyperparameter arguments -in this case, the training dataset
arguments = ['--input-data', diabetes_ds.as_named_input('training_data')],
environment=sklearn_env,
compute_target = training_cluster)
# Sample a range of parameter values
params = GridParameterSampling(
{
# Hyperdrive will try 6 combinations, adding these as script arguments
'--learning_rate': choice(0.01, 0.1, 1.0),
'--n_estimators' : choice(10, 100)
}
)
# Configure hyperdrive settings
hyperdrive = HyperDriveConfig(run_config=script_config,
                              hyperparameter_sampling=params,
                              policy=None, # No early stopping policy
                              primary_metric_name='AUC', # Find the highest AUC metric
                              primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                              max_total_runs=6, # Restrict the experiment to 6 iterations
                              max_concurrent_runs=2) # Run up to 2 iterations in parallel
# Run the experiment if I only want to run hyperparam alone without the pipeline
#experiment = Experiment(workspace=ws, name='mslearn-diabetes-hyperdrive')
#run = experiment.submit(config=hyperdrive)
PipeLine:
prep_step = PythonScriptStep(name = "Prepare Data",
source_directory = experiment_folder,
script_name = "prep_diabetes.py",
arguments = ['--input-data', diabetes_ds.as_named_input('raw_data'),
'--prepped-data', prepped_data_folder],
outputs=[prepped_data_folder],
compute_target = ComputerTarget_Crea,
runconfig = run_config,
allow_reuse = True)
# Step 2, run the training script
train_step = PythonScriptStep(name = "Train and Register Model",
source_directory = experiment_folder,
script_name = "train_diabetes.py",
arguments = ['--training-folder', prepped_data_folder],
inputs=[prepped_data_folder],
compute_target = ComputerTarget_Crea,
runconfig = run_config,
allow_reuse = True)
# Construct the pipeline
pipeline_steps = [prep_step, train_step]
pipeline = Pipeline(workspace=ws, steps=pipeline_steps)
print("Pipeline is built.")
# Create an experiment and run the pipeline
# How do I need to change the lines below to use hyperdrive?
experiment = Experiment(workspace=ws, name = 'mslearn-diabetes-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
Not sure where I need to put config=hyperdrive in the Pipeline section?
Here's how to combine hyperparameter tuning with an AML pipeline: https://learn.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.hyperdrivestep?view=azure-ml-py
Alternatively, here's a sample notebook: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb
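As a rough sketch of what that could look like for the pipeline in the question: the `PythonScriptStep` used for training is replaced by a `HyperDriveStep` that receives the `HyperDriveConfig`, and the pipeline is then submitted as before. The function name `build_tuning_pipeline` and its parameters are hypothetical. How the prepared data reaches the training script (here via `inputs`) depends on how `prepped_data_folder` was created, so treat that wiring as an assumption.

```python
def build_tuning_pipeline(ws, prep_step, hyperdrive, prepped_data_folder):
    """Return a Pipeline whose training step is a hyperdrive sweep.

    Assumes the azureml pipeline packages (v1 SDK) are installed; the
    imports live inside the function so this file loads without them.
    """
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import HyperDriveStep

    # The HyperDriveConfig goes here instead of experiment.submit(config=...).
    tune_step = HyperDriveStep(
        name="Tune and Train Model",
        hyperdrive_config=hyperdrive,
        inputs=[prepped_data_folder],  # assumption: a PipelineData-style object
        allow_reuse=True,
    )
    # prep_step comes first in the list, as in the original pipeline.
    return Pipeline(workspace=ws, steps=[prep_step, tune_step])


# Usage, with the objects defined in the question:
# pipeline = build_tuning_pipeline(ws, prep_step, hyperdrive, prepped_data_folder)
# experiment = Experiment(workspace=ws, name='mslearn-diabetes-pipeline')
# pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
```

The experiment submission itself stays unchanged; only the training step differs.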
I would like to use multiprocessing to compute the SIFT extraction and SIFT matching for object detection.
For now, I have a problem with the return value of the function, which does not insert data into the dictionary.
I'm using the Manager class, and the images are opened inside the function, but it does not work.
Finally, my idea is:
Compute the keypoints for every reference image, then pass them as parameters to a second function that compares and matches them with the keypoints and descriptors of the test image.
My code is:
# %% Import Section
import cv2
import numpy as np
from matplotlib import pyplot as plt
import os
from datetime import datetime
from multiprocessing import Process, cpu_count, Manager, Lock
import argparse
# %% path section
tests_path = 'TestImages/'
references_path = 'ReferenceImages2/'
result_path = 'ResultParametrizer/'
#%% Number of processor
cpus = cpu_count()
# %% parameter section
eps = 1e-7
useTwo = False # using the m and n keypoint better with False
# good point parameters
distanca_coefficient = 0.75
# gms parameter
gms_thresholdFactor = 3
gms_withRotation = True
gms_withScale = True
# flann parameter
flann_trees = 5
flann_checks = 50
#%% Locker
lock = Lock()
# %% function definition
def keypointToDictionaries(keypoint):
    x, y = keypoint.pt
    pt = float(x), float(y)
    angle = float(keypoint.angle) if keypoint.angle is not None else None
    size = float(keypoint.size) if keypoint.size is not None else None
    response = float(keypoint.response) if keypoint.response is not None else None
    class_id = int(keypoint.class_id) if keypoint.class_id is not None else None
    octave = int(keypoint.octave) if keypoint.octave is not None else None
    return {
        'point': pt,
        'angle': angle,
        'size': size,
        'response': response,
        'class_id': class_id,
        'octave': octave
    }
def dictionariesToKeypoint(dictionary):
    kp = cv2.KeyPoint()
    # keypointToDictionaries stores the coordinates under 'point', not 'pt'
    kp.pt = dictionary['point']
    kp.angle = dictionary['angle']
    kp.size = dictionary['size']
    kp.response = dictionary['response']
    kp.octave = dictionary['octave']
    kp.class_id = dictionary['class_id']
    return kp
def rootSIFT(dictionary, image_name, image_path,eps=eps):
    # SIFT init
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.xfeatures2d.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(image, None)
    descriptors /= (descriptors.sum(axis=1, keepdims=True) + eps)
    descriptors = np.sqrt(descriptors)
    print('Finished computing, PID: ', os.getpid())
    lock.acquire()
    dictionary[image_name]['keypoints'] = keypoints
    dictionary[image_name]['descriptors'] = descriptors
    lock.release()
def featureMatching(reference_image, reference_descriptors, reference_keypoints, test_image, test_descriptors,
                    test_keypoints, flann_trees=flann_trees, flann_checks=flann_checks):
    # FLANN parameter
    FLANN_INDEX_KDTREE = 1
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=flann_trees)
    search_params = dict(checks=flann_checks) # or pass empty dictionary
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    flann_matches = flann.knnMatch(reference_descriptors, test_descriptors, k=2)
    matches_copy = []
    for i, (m, n) in enumerate(flann_matches):
        if m.distance < distanca_coefficient * n.distance:
            matches_copy.append(m)
    gsm_matches = cv2.xfeatures2d.matchGMS(reference_image.shape, test_image.shape, keypoints1=reference_keypoints,
                                           keypoints2=test_keypoints, matches1to2=matches_copy,
                                           withRotation=gms_withRotation, withScale=gms_withScale,
                                           thresholdFactor=gms_thresholdFactor)
#%% Starting reference list file creation
reference_init = datetime.now()
print('Start reference file list creation')
reference_image_process_list = []
manager = Manager()
reference_image_dictionary = manager.dict()
reference_image_list = manager.list()
for root, directories, files in os.walk(references_path):
    for file in files:
        if file.endswith('.DS_Store'):
            continue
        reference_image_path = os.path.join(root, file)
        reference_name = file.split('.')[0]
        image = cv2.imread(reference_image_path, cv2.IMREAD_GRAYSCALE)
        reference_image_dictionary[reference_name] = {
            'image': image,
            'keypoints': None,
            'descriptors': None
        }
        proc = Process(target=rootSIFT, args=(reference_image_list, reference_name, reference_image_path))
        reference_image_process_list.append(proc)
        proc.start()
for proc in reference_image_process_list:
    proc.join()
reference_end = datetime.now()
reference_time = reference_end - reference_init
print('End reference file list creation, time required: ', reference_time)
I faced pretty much the same error. It seems that the code hangs at detectAndCompute in my case, not when creating the dictionary. For some reason, SIFT feature extraction is not multiprocessing-safe (to my understanding, this is the case on Macs, but I am not totally sure).
I found this in a GitHub thread. Many people say it works, but I couldn't get it to work. (Edit: I tried this later, and it works fine.)
Instead I used multithreading, which is pretty much the same code and works perfectly. Of course, you need to take the differences between multithreading and multiprocessing into account.
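A minimal sketch of the threaded variant, assuming each worker returns its result instead of writing into a shared dictionary. `extract_features` is a hypothetical placeholder for the SIFT work (`detectAndCompute` plus the RootSIFT normalisation), and the file names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_features(image_path):
    """Hypothetical stand-in for the per-image SIFT work.

    In the real script this would read the image and call
    sift.detectAndCompute; here it only returns dummy values.
    """
    keypoints = [len(image_path)]                    # placeholder data
    descriptors = [ord(c) for c in image_path[:4]]   # placeholder data
    return {"keypoints": keypoints, "descriptors": descriptors}


def extract_all(image_paths, max_workers=4):
    """Run extract_features on every path in a thread pool.

    Results are collected in the calling thread, so neither a lock
    nor a Manager object is needed.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as image_paths.
        results = list(pool.map(extract_features, image_paths))
    return dict(zip(image_paths, results))


if __name__ == "__main__":
    paths = ["ref_a.png", "ref_b.png", "ref_c.png"]
    features = extract_all(paths)
    for path, feats in features.items():
        print(path, feats["keypoints"])
```

Threads only speed things up if the heavy call releases the GIL. That is often the case for native OpenCV routines, but it is worth measuring on the actual images.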
librosa.feature.mfcc returns different dimensions for different audio files. How should I handle this case when training or testing the model?
#test.py
import os
import pickle
import numpy as np
from scipy.io.wavfile import read
import librosa as mfcc
from sklearn import preprocessing
import warnings
warnings.filterwarnings("ignore")
def get_MFCC(sr,audio):
    # keyword arguments work across librosa versions
    features = mfcc.feature.mfcc(y=audio, sr=sr, n_mfcc=20, dct_type=2)
    feat = np.asarray(())
    for i in range(features.shape[0]):
        temp = features[i,:]
        if np.isnan(np.min(temp)):
            continue
        else:
            if feat.size == 0:
                feat = temp
            else:
                feat = np.vstack((feat, temp))
    features = feat
    features = preprocessing.scale(features)
    return features
#path to test data
source = "C:\\Users\\PrashuGupta\\Downloads\\datasets\\pygender\\test_data\\AudioSet\\female_clips\\"
#path to save trained model
modelpath = "C:\\Users\\Prashu Gupta\\Downloads\\datasets\\pygender\\"
gmm_files = [os.path.join(modelpath,fname) for fname in
             os.listdir(modelpath) if fname.endswith('.gmm')]
models = [pickle.load(open(fname,'rb')) for fname in gmm_files]
genders = [fname.split("\\")[-1].split(".gmm")[0] for fname
           in gmm_files]
files = [os.path.join(source,f) for f in os.listdir(source)
         if f.endswith(".wav")]
for f in files:
    print (f.split("\\")[-1])
    audio,sr = mfcc.load(f, sr = 16000,mono = True)
    features = get_MFCC(sr,audio)
    scores = None
    log_likelihood = np.zeros(len(models))
    for i in range(len(models)):
        gmm = models[i] #checking with each model one by one
        scores = np.array(gmm.score(features))
        log_likelihood[i] = scores.sum()
    winner = np.argmax(log_likelihood)
    print ("\tdetected as - ", genders[winner],"\n\tscores:female",log_likelihood[0],",male ", log_likelihood[1],"\n")
The error
Expected the input data X have 1800 features, but got 313 features in
scores = np.array(gmm.score(features))
You must either truncate/pad the files so that they are all the same length (say 5 seconds), summarize the features for each file into a fixed-length vector that does not depend on clip length (average/min/max), or make the classifier operate on a stream of fixed-length feature windows (say 1 second).
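The first two options can be sketched with NumPy alone. This assumes the MFCC matrix has shape (n_mfcc, n_frames), which is what librosa returns; `pad_or_truncate` and `summarize` are hypothetical helper names, and the random arrays merely stand in for real features, with frame counts borrowed from the error message.

```python
import numpy as np


def pad_or_truncate(features, n_frames):
    """Force a (n_mfcc, frames) matrix to exactly n_frames columns."""
    current = features.shape[1]
    if current >= n_frames:
        return features[:, :n_frames]
    # Zero-pad on the right of the time axis only.
    return np.pad(features, ((0, 0), (0, n_frames - current)), mode="constant")


def summarize(features):
    """Collapse the time axis into per-coefficient mean, min and max."""
    return np.concatenate(
        [features.mean(axis=1), features.min(axis=1), features.max(axis=1)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_clip = rng.normal(size=(20, 313))    # stand-in for a short file
    long_clip = rng.normal(size=(20, 1800))    # stand-in for a longer file
    print(pad_or_truncate(short_clip, 500).shape, pad_or_truncate(long_clip, 500).shape)
    print(summarize(short_clip).shape, summarize(long_clip).shape)
```

Whichever option is chosen has to be applied identically when training and when scoring, otherwise the model will again see a different number of features than it was fitted on.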
I was trying to replicate this paper (which is about the Heston model) using the QuantLib tool (Python 3.5).
Following the Python QuantLib Cookbook, I was able to set up the parameters from page 12 of the paper. QuantLib's result is 0.0497495, which is slightly different from the paper's result (0.049521147).
So, my question is: what is the cause of this difference? Is it possible that the day count has something to do with it?
Code following the Cookbook with the paper's parameters:
from QuantLib import *
import numpy as np
import math
#parameters
strike_price = 2
payoff = PlainVanillaPayoff(Option.Call, strike_price)
#option data
maturity_date = Date(16, 4, 2028)
spot_price = 1
strike_price = 2
volatility = 0.16 # the historical vols for a year
dividend_rate = 0.000
option_type = Option.Call
risk_free_rate = 0.000
day_count = Actual365Fixed()
calendar = UnitedStates()
calculation_date = Date(16, 4, 2018)
Settings.instance().evaluationDate = calculation_date
# construct the European Option
payoff = PlainVanillaPayoff(option_type, strike_price)
exercise = EuropeanExercise(maturity_date)
european_option = VanillaOption(payoff, exercise)
# construct the Heston process
v0 = 0.16 #volatility*volatility # spot variance
kappa = 1
theta = 0.16
sigma = 2
rho = -0.8
spot_handle = QuoteHandle(SimpleQuote(spot_price))
flat_ts = YieldTermStructureHandle(FlatForward(calculation_date,
risk_free_rate, day_count))
dividend_yield = YieldTermStructureHandle(FlatForward(calculation_date,
dividend_rate, day_count))
heston_process = HestonProcess(flat_ts, dividend_yield,spot_handle,
v0, kappa,theta, sigma, rho)
engine = AnalyticHestonEngine(HestonModel(heston_process),0.01, 1000)
european_option.setPricingEngine(engine)
h_price = european_option.NPV()
print("The Heston model price is",h_price)
PS: I used the QuantLib engine to double-check my own code (I must say I have no experience using QuantLib). I get the paper's result using my code.
The difference is partly, but not entirely, due to the day counter.
If you use day_count = SimpleDayCounter(), leaving all else the same, the QuantLib result becomes 0.04964543.
The rest of the difference is because you set the "relative tolerance" in the AnalyticHestonEngine to 0.01. If you set it to a smaller value, e.g. to 0.001, you get an answer of 0.04951948, which is consistent with the answer obtained in the paper of 0.0495.
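To see why the day counter matters at all, it helps to compare the time to maturity under the two conventions. The sketch below uses only the standard library, not QuantLib; it assumes the paper works with exactly T = 10 years and that SimpleDayCounter, as far as I understand it, counts whole months between dates with the same day of month. The helper names are hypothetical.

```python
from datetime import date


def act365_year_fraction(start, end):
    """Year fraction under Actual/365 (Fixed): actual days divided by 365."""
    return (end - start).days / 365.0


def whole_year_fraction(start, end):
    """Year fraction in whole years, for dates sharing day and month."""
    if (start.day, start.month) != (end.day, end.month):
        raise ValueError("dates must fall on the same day and month")
    return float(end.year - start.year)


if __name__ == "__main__":
    calculation_date = date(2018, 4, 16)
    maturity_date = date(2028, 4, 16)
    # Three leap days (2020, 2024, 2028) fall inside this period.
    print((maturity_date - calculation_date).days)
    print(act365_year_fraction(calculation_date, maturity_date))
    print(whole_year_fraction(calculation_date, maturity_date))
```

Under Actual/365 (Fixed) the maturity comes out at roughly 10.0082 years instead of 10, so the option is priced with slightly more time value.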
My co-worker and I have been setting up, configuring, and testing Dask for a week or so now, and everything is working great (can't speak highly enough about how easy, straightforward, and powerful it is), but now we are trying to leverage it for more than just testing and are running into an issue. We believe it's a fairly simple one related to syntax and an understanding gap. Any help to get it running is greatly appreciated. Any support in evolving our understanding of more optimal paths is also greatly appreciated.
We got fairly close with these two posts:
Dask: How would I parallelize my code with dask delayed?
Unpacking result of delayed function
High level flow:
Open data in pandas & clean it (we plan on moving this to a pipeline)
From there, convert the cleaned data set for regression into a dask data frame
Set the x & y variables and create all unique x combination sets
Create all unique formulas (y ~ x1 + x2 +0)
Run each individual formula set with the data through a linear lasso lars model to get the AIC for each formula for ranking
Current Issue:
Run each individual formula set (~1700 formulas) with the data (1 single data set which doesn’t vary with each run) on the dask cluster and get the results back
Optimize the calculation & return the final data
Code:
# In[]
# Imports:
import logging as log
import datetime as dat
from itertools import combinations
import numpy as np
import pandas as pd
from patsy import dmatrices
import sklearn as sk
from sklearn.linear_model import LogisticRegression, SGDClassifier, LinearRegression
import dask as dask
import dask.dataframe as dk
from dask.distributed import Client
# In[]
# logging, set the dask client, open & clean the data, pass into a dask dataframe
log.basicConfig(level=log.INFO,
format='%(asctime)s %(message)s',
datefmt="%m-%d %H:%M:%S"
)
c = Client('ip:port')
ST = dat.datetime.now()
data_pd = pd.read_csv('some.txt', sep="\t")
#fill some na/clean up the data a bit
data_pd['V9'] = data_pd.V9.fillna("Declined")
data_pd['y'] = data_pd.y.fillna(0)
data_pd['x1'] = data_pd.x1.fillna(0)
#output the clean data and re-import into dask, we could also use from_pandas to get to dask dataframes
data_pd.to_csv('clean_rr_cp.csv')
data = dk.read_csv(r'C:\path\*.csv', sep=",")
# set x & y variables - the below is truncated
y_var = "y"
x_var = ['x1',
'x2',
'x3',
'x4',......
#list of all variables
all_var = [y_var] + x_var  # list(y_var) would split a multi-character name into letters
#all unique combinations
x_var_combos = [combos for combos in combinations(x_var,2)]
#add single variables for testing as well
for i in x_var:
    x_var_combos.append((i,""))
# create formulas from our y, x variables
def formula(y_var, combo):
    combo_len = len(combo)
    if combo_len == 2:
        formula = y_var +"~"+combo[0] +"+"+ combo[1]+"+0"
    else:
        formula = y_var +"~"+combo[0]+"+0"
    return formula
#dask.delayed
def model_aic(dt, formula):
    k = 2
    y_df, x_df = dmatrices(formula, dt, return_type = 'dataframe')
    y_df = np.ravel(y_df)
    log.info('dmatrices successful')
    LL_model = sk.linear_model.LassoLarsIC(max_iter = 100)
    AIC_Value = min(LL_model.fit(x_df, y_df).criterion_) + ( (2*(k**2)+2*(k)) / (len(x_df)-k-1) )
    log.info('AIC_Value: %s', AIC_Value)
    oup = [formula ,AIC_Value, len(dt)-AIC_Value]
    return oup
# ----------------- here's where we're stuck ---------------------
# ----------------- we think this is correct ----------------------
# ----------------- create a list of all formula to execute -------
# In[]
out = []
for i in x_var_combos:
    var = model_aic(data, formula(y_var, i))
    out.append(var)
# ----------------- but we're stuck figuring out how to -----------
# ------------------make it compute & return the result -----------
ans = c.compute(*out)
ans2 = c.compute(out[1])
print (ans2)
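One possible way to run the ~1700 fits on the cluster, sketched under assumptions: the cleaned data is sent to the workers once, one task per formula is submitted, and the results are gathered at the end. `Client.scatter`, `Client.submit` and `Client.gather` are `dask.distributed` methods; `run_formulas` and its parameters are hypothetical names. Because the helper relies only on those three methods, it is shown here with a tiny in-process stand-in (`FakeClient`, also hypothetical) rather than a real cluster.

```python
def run_formulas(client, data, formulas, model_fn):
    """Evaluate model_fn(data, formula) for every formula on the cluster.

    client is expected to behave like a dask.distributed.Client.
    """
    # Ship the (unchanging) data to the workers once instead of per task.
    data_future = client.scatter(data, broadcast=True)
    # One task per formula; submit() returns a future immediately.
    futures = [client.submit(model_fn, data_future, f) for f in formulas]
    # Block until all tasks finish and pull the results back as a list.
    return client.gather(futures)


class FakeClient:
    """In-process stand-in so the sketch runs without a cluster."""

    def scatter(self, data, broadcast=False):
        return data

    def submit(self, fn, *args):
        return fn(*args)

    def gather(self, futures):
        return list(futures)


if __name__ == "__main__":
    def toy_model(data, formula):
        # Placeholder for model_aic: returns [formula, score].
        return [formula, len(formula) + len(data)]

    formulas = ["y~x1+x2+0", "y~x1+0"]
    print(run_formulas(FakeClient(), [1, 2, 3], formulas, toy_model))
```

With a real cluster, `client` would be the `c = Client('ip:port')` object from the script, `model_fn` would be `model_aic`, and `data` would most likely be the in-memory pandas frame `data_pd`, on the assumption that `dmatrices` needs in-memory data rather than a dask dataframe.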