OpenAI Gym runs in real time instead of as fast as possible - openai-gym

Ok, so there must be some option in OpenAI Gym that allows it to run as fast as possible? I have a Linux environment that does exactly this (runs as fast as possible), but when I run the exact same setup on Windows, it only runs in real time.
The specific environment I'm working with is the Montezuma's Revenge Atari game. I run the exact same code, but my Linux setup runs the game much faster, even though my Linux computer has worse specs than my Windows machine.
Here is some code for those who want it:
for i in range(episode_count):
    ob = env.reset()
    ob = np.expand_dims(ob, axis=0)
    time = 0
    while True:
        time += 1
        action = agent.act(ob, reward, done)
        new_ob, reward, done, _ = env.step(action)
        new_ob = np.expand_dims(new_ob, axis=0)
        agent.remember(ob, action, reward, new_ob, done)
        ob = new_ob
        env.render()
        if done or time >= 1000:
            print("episode: {}/{}, time: {}, e: {:.3}"
                  .format(i, episode_count, time, agent.epsilon))
            if len(agent.memory) > batch_size:
                agent.replay(batch_size)
            # agent.save("./save/montazuma-dqn.h5")
            break
The same code runs on both setups but gets very different run speeds.

For anyone looking at this in the future: it's because of env.render(). It took me some time to figure out what was slowing my code down. It turns out that rendering after each action takes time, which will slow your code down.
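The usual fix is to render only occasionally, or not at all, during training. A minimal sketch, assuming the classic four-value Gym step API; the env id, episode_count, and RENDER_EVERY are illustrative, and the random policy stands in for agent.act:

import gym

env = gym.make('MontezumaRevenge-v0')  # assumed env id for the classic Gym Atari API
episode_count = 500
RENDER_EVERY = 50  # hypothetical knob: watch one episode out of every fifty

for i in range(episode_count):
    ob = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # placeholder for agent.act(...)
        ob, reward, done, _ = env.step(action)
        # Rendering every frame is what throttles the loop toward real time;
        # skip it except when you actually want to watch.
        if i % RENDER_EVERY == 0:
            env.render()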

Related

Multiprocessing with Multiple Functions: Need to add a function to the pool from within another function

I am measuring the metrics of an encryption algorithm that I designed. I have declared 2 functions and a brief sample is as follows:
import sys, random, timeit, psutil, os, time
from multiprocessing import Process
from subprocess import check_output

pid = 0

def cpuUsage():
    global running
    while pid == 0:
        time.sleep(1)
    running = True
    p = psutil.Process(pid)
    while running:
        # rss is resident memory, so the label reports memory, not CPU
        print(f'PID: {pid}\t|\tMemory Usage: {p.memory_info().rss/(1024*1024)} MB')
        time.sleep(1)

def Encryption():
    global pid, running
    pid = os.getpid()
    myList = []
    for i in range(1000):
        myList.append(random.randint(-sys.maxsize, sys.maxsize) + random.random())
    print('Now running timeit function for speed metrics.')
    p1 = Process(target=metric_collector())
    p1.start()
    p1.join()
    number = 1000
    unit = 'msec'
    setup = '''
import homomorphic,random,sys,time,os,timeit
myList={myList}
'''
    enc_code = '''
for x in range(len(myList)):
    myList[x] = encryptMethod(a, b, myList[x], d)
'''
    dec_code = '''
for x in range(len(myList)):
    myList[x] = decryptMethod(myList[x])
'''
    time = timeit.timeit(setup=setup,
                         stmt=(enc_code + dec_code),
                         number=number)
    running = False
    print(f'''Average Time:\t\t\t {time/number*.0001} seconds
Total time for {number} Iters:\t\t\t {time} {unit}s
Total Encrypted/Decrypted Values:\t {number*len(myList)}''')
    sys.exit()

if __name__ == '__main__':
    print('Beginning Metric Evaluation\n...\n')
    p2 = Process(target=Encryption())
    p2.start()
    p2.join()
I am sure there's an implementation error in my code; I'm just having trouble grabbing the PID for the encryption method, and I am trying to keep the overhead from other calls as minimal as possible so I can get an accurate reading of just the functionality of the methods being called by timeit. If you know a simpler implementation, please let me know. Trying to figure out how to measure all of the metrics has been killing me softly.
I've tried acquiring the PID a few different ways, but I only want to measure performance while timeit is running. There's a good chance I'll have to break this out separately and run it that way (instead of with multiprocessing) to evaluate the function properly, I'm guessing.
There are at least three major problems with your code. The net result is that you are not actually doing any multiprocessing.
The first problem is here, and in a couple of other similar places:
p2 = Process(target=Encryption())
What this code passes to Process is not the function Encryption but the returned value from Encryption(). It is exactly the same as if you had written:
x = Encryption()
p2 = Process(target=x)
What you want is this:
p2 = Process(target=Encryption)
This code tells Python to create a new Process and execute the function Encryption() in that Process.
The second problem has to do with the way Python handles memory for Processes. Each Process lives in its own memory space and gets its own local copy of global variables, so you cannot set a global variable in one Process and have another Process see the change, as you are trying to do with pid. There are mechanisms for sharing data between Processes, documented in the multiprocessing module; see the section titled "Sharing state between processes." You have to use one of the approaches described there.
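For example, one of those mechanisms is multiprocessing.Value. A minimal sketch of sharing the PID this way (worker and pid_holder are illustrative names, not from your code):

from multiprocessing import Process, Value
import os, time

def worker(pid_holder):
    pid_holder.value = os.getpid()  # write the PID into shared memory
    time.sleep(2)                   # stand-in for real work

if __name__ == '__main__':
    pid_holder = Value('i', 0)      # 'i' = shared C int, visible to both processes
    p = Process(target=worker, args=(pid_holder,))
    p.start()
    while pid_holder.value == 0:    # poll until the child reports its PID
        time.sleep(0.1)
    print(f'Child PID seen from the parent: {pid_holder.value}')
    p.join()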
The third problem is this code pattern, which occurs for both p1 and p2.
p2 = Process(target=Encryption)
p2.start()
p2.join()
This tells Python to create a Process and start it, and then immediately wait for it to finish, which means your current Process must stop at that point until the new Process is done. You never allow two Processes to run at once, so there is no performance benefit; the whole point of multiprocessing is to run two things at the same time, which you never do. You might as well not bother with multiprocessing at all, since it is only making your life more difficult.
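If the goal is to have the monitor and the encryption actually overlap, the shape is "start both, then join both," roughly like this (using your function names):

p1 = Process(target=cpuUsage)
p2 = Process(target=Encryption)
p1.start()
p2.start()   # both processes are now running concurrently
p1.join()
p2.join()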
Finally, I am not sure why you decided to use multiprocessing in the first place. The functions that measure memory usage and execution time are almost certainly very fast, and I would expect them to be much faster than any method of synchronizing one Process to another. If you're worried about errors due to the time used by the diagnostic functions themselves, I doubt that you can make things better by multiprocessing. Why not just start with a simple program and see what results you get?
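For instance, a minimal sketch that reuses your setup, enc_code, dec_code, and number from above and measures memory from the same process, with no multiprocessing at all:

import os, timeit, psutil

proc = psutil.Process(os.getpid())
elapsed = timeit.timeit(stmt=enc_code + dec_code, setup=setup, number=number)
print(f'Average time per iteration: {elapsed / number:.6f} s')
print(f'Current RSS: {proc.memory_info().rss / (1024 * 1024):.1f} MB')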

Why does implementing multiprocessing make my program slower

I'm trying to implement multiprocessing in my code to make it faster.
To make it easier to understand, I will just say the program fits an observed curve using a linear combination of a library of curves, and from that measures properties of the observed curve.
I have to do this for over 400 curves, and in order to estimate the errors of these properties I perform a Monte Carlo simulation, which means I have to repeat each calculation a number of times.
This takes a lot of time, and since I believe it is a CPU-bound task, I figured I'd use multiprocessing in the error estimation step. Here's a simplification of my code:
Without multiprocessing
import numpy as np
from collections import defaultdict
import fitting_package
import multiprocessing

def estimate_errors(best_fit_curve, signal_to_noise, fit_kwargs, iterations=100):
    results = defaultdict(list)
    def fit(best_fit_curve, signal_to_noise, fit_kwargs, results):
        # Here noise is added to simulate a new curve (Monte Carlo simulation)
        noise = best_fit_curve / signal_to_noise
        simulated_curve = np.random.normal(best_fit_curve, noise)
        # The arguments from the original fit (outside the error estimation) are passed to the fitting
        fit_kwargs.update({'curve': simulated_curve})
        # The fit is performed and it returns the properties packed together
        solutions = fitting_package(**fit_kwargs)
        # There are more properties, so this is a simplification
        property_1, property_2 = solutions
        aux_dict = {'property_1': property_1, 'property_2': property_2}
        for key, value in aux_dict.items():
            results[key].append(value)
    for _ in range(iterations):
        fit(best_fit_curve, signal_to_noise, fit_kwargs, results)
    return results
With multiprocessing
def estimate_errors(best_fit_curve, signal_to_noise, fit_kwargs, iterations=100):
    def fit(best_fit_curve, signal_to_noise, fit_kwargs, queue):
        results = queue.get()
        noise = best_fit_curve / signal_to_noise
        simulated_curve = np.random.normal(best_fit_curve, noise)
        fit_kwargs.update({'curve': simulated_curve})
        solutions = fitting_package(**fit_kwargs)
        property_1, property_2 = solutions
        aux_dict = {'property_1': property_1, 'property_2': property_2}
        for key, value in aux_dict.items():
            results[key].append(value)
        queue.put(results)
    process_list = []
    queue = multiprocessing.Queue()
    queue.put(defaultdict(list))
    for _ in range(iterations):
        process = multiprocessing.Process(target=fit, args=(best_fit_curve, signal_to_noise, fit_kwargs, queue))
        process.start()
        process_list.append(process)
    for p in process_list:
        p.join()
    results = queue.get()
    return results
I thought using multiprocessing would save time, but it actually takes more than double the time of the other approach. Why is this? Is there any way I can make it faster with multiprocessing?
I thought using multiprocessing would save time, but it actually takes more than double the time of the other approach. Why is this?
Starting a process takes a long time (at least in computer terms). It also uses a lot of memory.
In your code, you are starting 100 separate Python interpreters in 100 separate OS processes. That takes a really long time, so unless each process runs for a very long time, the cost of starting the processes is going to dominate the time spent doing useful work.
In addition to that, unless you actually have 100 unused CPU cores, those 100 processes will spend most of their time waiting for each other to finish. Even worse, since they all have the same priority, the OS will try to give each of them a fair amount of time, so it will run them for a bit, suspend them, run others for a bit, suspend them, and so on. All this scheduling also takes time.
Having more parallel workloads than parallel resources cannot speed up your program, since they have to wait to be executed one-after-another anyway.
Parallelism will only speed up your program if the time for the parallel tasks is not dominated by the time of creating, managing, scheduling, and re-joining the parallel tasks.
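Concretely, if the individual fits really are CPU-bound, a Pool that reuses a fixed number of worker processes usually beats spawning one process per iteration. A rough sketch, reusing the variables from the question; run_single_fit is a hypothetical stand-in for one Monte Carlo fit (add noise, call fitting_package, return the properties):

import multiprocessing as mp

def fit_once(args):
    best_fit_curve, signal_to_noise, fit_kwargs = args
    # run_single_fit is hypothetical: one noisy fit returning (property_1, property_2)
    return run_single_fit(best_fit_curve, signal_to_noise, fit_kwargs)

if __name__ == '__main__':
    arg_list = [(best_fit_curve, signal_to_noise, fit_kwargs)] * iterations
    # A fixed pool of cpu_count() workers is created once and reused,
    # instead of paying process-startup cost on every iteration.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        all_results = pool.map(fit_once, arg_list)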

Learning from Images in openai gym: memory leak or offscreen glfw issue with render function

I am trying to learn a control policy from images in openai gym. My code is very straightforward; however, for some reason I am incurring HUGE memory requirements that continue to grow as my code runs. My setup is basically the following:
downsample_obs = torchvision.transforms.Compose([
    torchvision.transforms.ToPILImage(),
    torchvision.transforms.Resize((resize, resize), interpolation=2),
    torchvision.transforms.ToTensor()
])
env = gym.make('Hopper-v2')
state = env.reset()
observation = downsample_obs(env.render(mode='rgb_array')).detach()
for t in range(1000):
    next_state, reward, done, _ = env.step(action)
    observation = downsample_obs(env.render(mode='rgb_array')).detach()
    memory.push(state, np.array([action]), mask, next_state, reward, stored_observe)
    if done:
        break
update_model(memory) ...
I have removed everything around the render function and am just calling env.render(mode='rgb_array') t times, and the issue persists. I assumed it was because the environment was not being closed, but when I include proper initialization and closure of the environment after each interaction, the memory requirement only increases. I also get the following message every time I call make:
Creating offscreen glfw
I have tried various libraries to diagnose a memory leak, such as gc and SummaryTracker from pympler.tracker, but neither reports anything in memory. This has been extremely frustrating, and any help would be appreciated!
As it turns out, and thanks in no small part to a collaborator, I found out that the window renderer (glfw) was allocating memory for each window and detaching the monitor from the window, but never destroying the window, so that information was never cleaned up. To fix the issue, I added the following:
import glfw

def f():
    ....
    something that calls .render()
    ....
    glfw.terminate()
This forces Python to deallocate and garbage collect correctly. Phew.
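In context, the cleanup looks roughly like this (a sketch; the rollout loop stands in for whatever code calls render):

import glfw
import gym

def rollout():
    env = gym.make('Hopper-v2')
    env.reset()
    for _ in range(1000):
        _, _, done, _ = env.step(env.action_space.sample())
        env.render(mode='rgb_array')
        if done:
            break
    env.close()
    glfw.terminate()  # tear down the offscreen GLFW windows so their buffers are freed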

Python pool.apply_async() doesn't call target function?

I'm writing an optimization routine to brute-force search a solution space for optimal hyperparameters, and apply_async does not appear to be doing anything at all. Ubuntu Server 16.04, Python 3.5, PyCharm CE 2018. Also, I'm doing this on an Azure virtual machine. My code looks like this:
class optimizer(object):
    def __init__(self, n_proc, frame):
        # Set Class Variables
    def prep(self):
        # Get Data and prepare for optimization
    def ret_func(self, retval):
        self.results = self.results.append(retval)
        print('Something')
    def search(self):
        p = multiprocessing.Pool(processes=self.n_proc)
        for x, y in zip(repeat(self.data), self.grid):
            job = p.apply_async(self.bot.backtest, (x, y), callback=self.ret_func)
        p.close()
        p.join()
        self.results.to_csv('OptimizationResults.csv')
        print('***************************')
        print('Exiting, Optimization Complete')

if __name__ == '__main__':
    multiprocessing.freeze_support()
    opt = optimizer(n_proc=4, frame='ytd')
    opt.prep()
    print('Data Prepped, beginning search')
    opt.search()
I was running this exact setup on a Windows Server VM and switched over due to issues with multiprocessing not utilizing all cores. Today, I configured my machine and was able to run the optimization exactly once. After that, it mysteriously stopped working with no change from me. Also, I should mention that it only produces output about one in every ten times I run it. Very odd behavior. I expect to see:
Something
Something
Something
.....
Which would typically be the best "to-date" results of the optimization (omitted for clarity). Instead I get:
Data Prepped, beginning search
***************************
Exiting, Optimization Complete
If I call get() on the async object, the results are printed as expected, but then only one core is utilized because the results are gathered inside the for loop. Why isn't apply_async doing anything at all? I should mention that I use the "stop" button in PyCharm to terminate the process; not sure if this has something to do with it?
Let me know if you need more details about prep(), or bot.backtest()
I found the error! Basically, I was converting a dict() to a list() and passing the values from the list into my function. The list's parameter order was different every time I ran the function, and one of the parameters needed to be an integer, not a float.
For some reason, on Windows, the order of the dict was preserved when converting to a list; not the case with Ubuntu! Very interesting.
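A debugging tip that would have surfaced this sooner: apply_async swallows exceptions raised in the worker until you call get() on the result (or pass an error_callback). A sketch of search() with that added:

def search(self):
    p = multiprocessing.Pool(processes=self.n_proc)
    jobs = []
    for x, y in zip(repeat(self.data), self.grid):
        # error_callback makes worker exceptions visible instead of silently dropped
        jobs.append(p.apply_async(self.bot.backtest, (x, y),
                                  callback=self.ret_func,
                                  error_callback=print))
    p.close()
    p.join()
    for job in jobs:
        job.get()  # re-raises anything the worker raised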

Parallel computing Python / Raspberry Pi

I am doing a project where I am required to measure vibration levels caused by passing vehicles (using an accelerometer sampling at a high rate) and to capture video of the event "SIMULTANEOUSLY" when the vibrations exceed a certain threshold value. I want both processes, i.e. data acquisition and video recording, to run simultaneously so that they are synchronized in time.
I am doing both things on a Raspberry Pi. To achieve my goal, I have 2 strategies in mind.
1) Write a function for
a) Data acquisition
b) Recording video
and run them in the same script.
2) Or, while the data acquisition script is running, use a call to os.system to run a second script that captures video of the event.
Now I am curious: in what order are functions executed in Python? So if I write
while 1:
    if v_threshold > some_value:
        Acquire_data(); Capture_Video();
Are they executed simultaneously, or does Capture_Video() only begin once Acquire_data() has finished executing? Or, in general, how are functions executed in Python (or any other programming language)?
What would be your advice to achieve the goal? Writing functions or running 2 parallel scripts? Or should I use the multiprocessing module?
Thanks
Nischal
To answer the ordering question first: statements like Acquire_data(); Capture_Video() always run sequentially in a single Python thread; Capture_Video() only begins once Acquire_data() has returned, which is why you need multiprocessing for true simultaneity. I would go for option 1, as you have more control over the whole process (e.g. you can easily wait until both subprocesses are done).
Your main setup would look like this:
from multiprocessing import Process
import time

def acquire_data(arg):
    for i in range(5):
        print('acquiring data: {}'.format(arg))
        time.sleep(1.1)

def capture_video():
    for i in range(5):
        print('capturing video')
        time.sleep(1)

if __name__ == '__main__':
    p_data = Process(target=acquire_data, args=('foo',))
    p_video = Process(target=capture_video)
    p_data.start()
    p_video.start()
    p_data.join()   # wait until acquire_data is done
    p_video.join()  # wait also until capture_video is done
Also: which model is it? If both data acquisition and video capture take 100% CPU, you will run into an issue on a single-core Raspberry Pi. Model 3 has 4 cores, so that is fine.
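If tight time alignment matters, one option is to hold both processes at a barrier and release them together. A sketch using multiprocessing.Event; the worker bodies are placeholders for the real acquisition and recording code:

from multiprocessing import Process, Event

def acquire_data(start):
    start.wait()  # block until the go signal
    print('sampling accelerometer...')  # placeholder for real acquisition

def capture_video(start):
    start.wait()
    print('recording video...')  # placeholder for real capture

if __name__ == '__main__':
    start = Event()
    workers = [Process(target=acquire_data, args=(start,)),
               Process(target=capture_video, args=(start,))]
    for w in workers:
        w.start()
    start.set()  # release both workers at (nearly) the same instant
    for w in workers:
        w.join()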
