I have a Python class that conforms to OpenAI's environment API, but it's written in non-vectorized form, i.e. it receives one input action per step and returns one reward per step. How do I vectorize the environment? I haven't been able to find any clear explanation on GitHub.
You could write a custom class that iterates over an internal tuple of environments while maintaining the basic Gym API. In practice, there will be some differences, because the underlying environments don't terminate on the same timestep. Consequently, it's easier to combine the standard step and reset functions in
one method called step. Here's an example:
class VectorEnv:
    def __init__(self, make_env_fn, n):
        self.envs = tuple(make_env_fn() for _ in range(n))

    # Call this only once at the beginning of training (optional):
    def seed(self, seeds):
        assert len(self.envs) == len(seeds)
        return tuple(env.seed(s) for env, s in zip(self.envs, seeds))

    # Call this only once at the beginning of training:
    def reset(self):
        return tuple(env.reset() for env in self.envs)

    # Call this on every timestep:
    def step(self, actions):
        assert len(self.envs) == len(actions)
        return_values = []
        for env, a in zip(self.envs, actions):
            observation, reward, done, info = env.step(a)
            if done:
                observation = env.reset()
            return_values.append((observation, reward, done, info))
        return tuple(return_values)

    # Call this at the end of training:
    def close(self):
        for env in self.envs:
            env.close()
Then you can just instantiate it like this:
import gym
make_env_fn = lambda: gym.make('CartPole-v0')
env = VectorEnv(make_env_fn, n=4)
You'll have to do a little bookkeeping for your agent to handle the tuple of return values when you call step. This is also why I prefer to pass a function make_env_fn to __init__, because it's easy to add wrappers like gym.wrappers.Monitor that track statistics for each environment individually and automatically.
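For example, a minimal sketch of such a make_env_fn (assuming the older gym.wrappers.Monitor API; the log directory and the counter are just illustrative choices, not part of the original code):

import itertools
import gym

_env_counter = itertools.count()

def make_env_fn():
    env = gym.make('CartPole-v0')
    # Give each environment its own Monitor directory (hypothetical path),
    # so episode statistics are tracked separately per environment.
    log_dir = '/tmp/cartpole-logs/env-{}'.format(next(_env_counter))
    return gym.wrappers.Monitor(env, log_dir, force=True)

vec_env = VectorEnv(make_env_fn, n=4)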
First, I'd like to thank the StackOverflow community for the tremendous help it provided me over the years, without me having to ask a single question.
I could not find anything that I can relate to my problem, though it is probably due to my lack of understanding of the subject, rather than the absence of a response on the website. My apologies in advance if this is a duplicate.
I am relatively new to multiprocessing; some time ago I succeeded in using multiprocessing.Pool in a very simple way, where I didn't need any feedback between the child processes.
Now I am facing a much more complicated problem, and I am just lost in the documentation about multiprocessing. I hence ask for your help, your kindness and your patience.
I am trying to build a parallel tempering Monte Carlo algorithm, based on a class.
The basic class very roughly goes as follows:
import numpy as np

class monte_carlo:

    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(self.x)
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return
Obviously, I simplified a great deal (the actual class is 500 lines long!) and built fake functions for simplicity: __init__ takes a bunch of parameters as arguments, there are many more lists of measurements besides self.Elist, and also many arrays derived from self.x that I use to compute them. The key point is that each instance of the class contains a lot of information that I want to keep in memory, and that I don't want to copy over and over again, to avoid dramatic slowdowns. Otherwise I would just use the multiprocessing.Pool module.
Now, the parallelization I want to do, in pseudo-code:
def proba(dE, pT):
    return np.exp(-dE/pT)

Tlist = [1.1, 1.2, 1.3]
N = len(Tlist)
G = []
for _ in range(N):
    G.append(monte_carlo())

for _ in range(5):
    for i in range(N):  # this loop should be run in multiprocess
        G[i].simulation(Tlist[i])

    for i in range(N//2):
        dE = G[i].E - G[i+1].E
        pT = G[i].T + G[i+1].T
        p = proba(dE, pT)  # (proba is a function, giving a probability depending on dE)
        if np.random.random() < p:
            T_temp = G[i].T
            G[i].T = G[i+1].T
            G[i+1].T = T_temp
Synthesis: I want to run several instances of my Monte Carlo class in parallel child processes, with different values for a parameter T, then periodically pause everything to change the different T's, and run the child processes/class instances again, from where they paused.
Doing this, I want each class-instance/child-process to stay independent from one another, save its current state with all internal variables while it is paused, and do as few copies as possible. This last point is critical, as the arrays inside the class are quite big (some are 1000x1000), and a copy will therefore very quickly become quite time-costly.
Thanks in advance, and sorry if I am not clear...
Edit:
I am using a distant machine with many (64) CPUs, running on Debian GNU/Linux 10 (buster).
Edit2:
I made a mistake in my original post: in the end, the temperatures must be exchanged between the class-instances, and not inside the global Tlist.
Edit3: Charchit's answer works perfectly for the test code, on both my personal machine and the distant machine I usually use for running my codes. I hence mark it as the accepted answer.
However, I want to report here that, when inserting the actual, more complicated code instead of the oversimplified monte_carlo class, the distant machine gives me some strange errors:
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gtk-WARNING **: ##:##:##:###: Locale not supported by C library.
Using the fallback 'C' locale.
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###:
gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
The "##:##:##:###" are (or seems like) IP adresses.
Without the call to set_start_method('spawn') this error shows only once, at the very beginning, while when I use this method, it seems to show at every occurrence of result.get()...
The strangest thing is that the code seems otherwise to work fine, does not crash, produces the datafiles I then ask it to, etc...
I think this would deserve to publish a new question, but I put it here nonetheless in case someone has a quick answer.
If not, I will resort to adding one by one the variables, methods, etc. that are present in my actual code but not in the test example, to try and find the origin of the bug. My best guess for now is that the memory space required by each child process with the actual code is too large for the distant machine to accept, due to some restrictions implemented by the admin.
What you are looking for is sharing state between processes. As per the documentation, you can either create shared memory, which is restrictive about the data it can store and is not thread-safe, but offers better speed and performance; or you can use server processes through managers. The latter is what we are going to use, since you want to share whole objects of user-defined data types. Keep in mind that using managers will impact the speed of your code, depending on the complexity of the arguments that you pass to and receive from the managed objects.
Managers, proxies and pickling
As mentioned, managers create server processes to store objects, and allow access to them through proxies. I have answered a question with better details on how they work, and how to create a suitable proxy, here. We are going to use the same proxy defined in the linked answer, with some variations. Namely, I have replaced the factory functions inside __getattr__ with something that can be pickled using pickle. This means that you can run instance methods of managed objects created with this proxy without resorting to using multiprocess. The result is this modified proxy:
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np


class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result
Solution
Now we only need to make sure that when we are creating objects of monte_carlo, we do so using managers and the above proxy. For that, we create a class constructor called create. All objects for monte_carlo should be created with this function. With that, the final code looks like this:
from multiprocessing import Pool
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np


class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result


class monte_carlo:

    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []
        self.T = None

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(self.x)
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return

    @classmethod
    def create(cls, *args, **kwargs):
        # Register class
        class_str = cls.__name__
        BaseManager.register(class_str, cls, ObjProxy, exposed=tuple(dir(cls)))
        # Start a manager process
        manager = BaseManager()
        manager.start()
        # Create and return this proxy instance. Using this proxy allows sharing of state between processes.
        inst = eval("manager.{}(*args, **kwargs)".format(class_str))
        return inst
def proba(dE, pT):
    return np.exp(-dE/pT)


if __name__ == "__main__":
    Tlist = [1.1, 1.2, 1.3]
    N = len(Tlist)
    G = []

    # Create our managed instances
    for _ in range(N):
        G.append(monte_carlo.create())

    for _ in range(5):
        # Run simulations in the manager server
        results = []
        with Pool(8) as pool:
            for i in range(N):  # this loop should be run in multiprocess
                results.append(pool.apply_async(G[i].simulation, (Tlist[i], )))

            # Wait for the simulations to complete
            for result in results:
                result.get()

        for i in range(N // 2):
            dE = G[i].E - G[i + 1].E
            pT = G[i].T + G[i + 1].T
            p = proba(dE, pT)  # (proba is a function, giving a probability depending on dE)
            if np.random.random() < p:
                T_temp = Tlist[i]
                Tlist[i] = Tlist[i + 1]
                Tlist[i + 1] = T_temp

    print(Tlist)
This meets the criteria you wanted. It does not create any copies at all; rather, all arguments to the simulation method call are serialized inside the pool and sent to the manager server, where the object is actually stored. The method is executed there, and the results (if any) are serialized and returned to the main process. All of this, using only the standard library!
Output
[1.2, 1.1, 1.3]
Edit
Since you are using Linux, I encourage you to use multiprocessing.set_start_method inside the if __name__ ... clause to set the start method to "spawn". Doing this will ensure that the child processes do not have access to variables defined inside the clause.
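For reference, a minimal sketch of where that call would go (the rest of the main block stays exactly as above):

from multiprocessing import Pool, set_start_method

if __name__ == "__main__":
    # Use freshly spawned interpreters instead of fork(), so the children
    # do not inherit the variables defined inside this clause.
    set_start_method("spawn")

    Tlist = [1.1, 1.2, 1.3]
    # ... create the managed instances and run the pools as shown above ...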
I'm running a constrained optimisation with scipy.optimize.minimize(method='COBYLA').
In order to evaluate the cost function, I need to run a relatively expensive simulation to compute a dataset from the input variables, and the cost function is one (cheap to compute) property of that dataset. However, two of my constraints are also dependent on that expensive data.
So far, the only way I have found to constrain the optimisation is to have each of the constraint functions recompute the same dataset that the cost function already has calculated (simplified quasi-code):
def costfun(x):
    data = expensive_fun(x)
    return(cheap_fun1(data))

def constr1(x):
    data = expensive_fun(x)
    return(cheap_fun2(data))

def constr2(x):
    data = expensive_fun(x)
    return(cheap_fun3(data))

constraints = [{'type':'ineq', 'fun':constr1},
               {'type':'ineq', 'fun':constr2}]

# initial guess
x0 = np.ones((6,))

opt_result = minimize(costfun, x0, method='COBYLA',
                      constraints=constraints)
This is clearly not efficient because expensive_fun(x) is called three times for every x.
I could change this slightly to include a universal "evaluate some cost" function which runs the expensive computation, and then evaluates whatever criterion it has been given. But while that saves me from having to write the "expensive" code several times, it still runs three times for every iteration of the optimizer:
# universal cost function evaluator
def criterion_from_x(x, cfun):
    data = expensive_fun(x)
    return(cfun(data))

def costfun(data):
    return(cheap_fun1(data))

def constr1(data):
    return(cheap_fun2(data))

def constr2(data):
    return(cheap_fun3(data))

constraints = [{'type':'ineq', 'fun':criterion_from_x, 'args':(constr1,)},
               {'type':'ineq', 'fun':criterion_from_x, 'args':(constr2,)}]

# initial guess
x0 = np.ones((6,))

opt_result = minimize(criterion_from_x, x0, method='COBYLA',
                      args=(costfun,), constraints=constraints)
I have not managed to find any way to set something up where x is used to generate data at each iteration, and data is then passed to both the objective function as well as the constraint functions.
Does something like this exist? I've noticed the callback argument to minimize(), but that is a function which is called after each step. I'd need some kind of preprocessor which is called on x before each step, whose results are then available to the cost function and constraint evaluation. Maybe there's a way to sneak it in somehow? I'd like to avoid writing my own optimizer.
One, more traditional, way to solve this would be to evaluate the constraints in the cost function (which has all the data it needs for that), have it add a penalty for violated constraints to the main cost function, and run the optimizer without the explicit constraints. But I've tried this before and found that the main cost function can become somewhat chaotic in cases where the constraints are violated, so an optimizer might get stuck in some place which violates the constraints and not find its way out again.
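For illustration, a rough sketch of that penalty approach (the weight 1e3 and the quadratic hinge are arbitrary choices for the example, not something from my actual code):

def penalized_costfun(x):
    data = expensive_fun(x)
    cost = cheap_fun1(data)
    # Add a penalty whenever an inequality constraint g(data) >= 0 is violated.
    penalty = 0.0
    for g in (cheap_fun2, cheap_fun3):
        penalty += max(0.0, -g(data)) ** 2
    return cost + 1e3 * penalty

opt_result = minimize(penalized_costfun, x0, method='COBYLA')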
Another approach would be to produce some kind of global variable in the cost function and write the constraint evaluation to use that global variable, but that could be very dangerous if multithreading/-processing gets involved, or if the name I choose for the global variable collides with a name used anywhere else in the code:
def costfun(x):
    global data
    data = expensive_fun(x)
    return(cheap_fun1(data))

def constr1(x):
    global data
    return(cheap_fun2(data))

def constr2(x):
    global data
    return(cheap_fun3(data))
I know that some people use file I/O for cases where the cost function involves running a large simulation which produces a bunch of output files. After that, the constraint functions can just access those files -- but my problem is not that big.
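For completeness, a minimal sketch of what I understand that pattern to look like (the cache filename is made up, data is assumed to be a NumPy array, and this has the same "cost function runs before the constraints" assumption as the global-variable version):

import numpy as np

def costfun(x):
    data = expensive_fun(x)
    np.save('last_data.npy', data)  # hypothetical cache file written by the cost function
    return cheap_fun1(data)

def constr1(x):
    data = np.load('last_data.npy')  # constraint reads the cached dataset back
    return cheap_fun2(data)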
I'm currently using Python v3.9 and scipy 1.9.1.
You could write a decorator class, in the same vein as scipy's MemoizeJac, that caches the return values of the expensive function each time it is called:
import numpy as np

class MemoizeData:
    def __init__(self, obj_fun, exp_fun, constr_fun):
        self.obj_fun = obj_fun
        self.exp_fun = exp_fun
        self.constr_fun = constr_fun
        self._data = None
        self.x = None

    def _compute_if_needed(self, x, *args):
        if not np.all(x == self.x) or self._data is None:
            self.x = np.asarray(x).copy()
            self._data = self.exp_fun(x)

    def __call__(self, x, *args):
        self._compute_if_needed(x, *args)
        return self.obj_fun(self._data)

    def constraint(self, x, *args):
        self._compute_if_needed(x, *args)
        return self.constr_fun(self._data)
This way, the expensive function is only evaluated once per iteration. Then, after writing all your constraints into one constraint function, you can use it like this:
from scipy.optimize import minimize

def all_constrs(data):
    return np.hstack((cheap_fun2(data), cheap_fun3(data)))

obj = MemoizeData(cheap_fun1, expensive_fun, all_constrs)
constr = {'type': 'ineq', 'fun': obj.constraint}
x0 = np.ones(6)
opt_result = minimize(obj, x0, method="COBYLA", constraints=constr)
While Joni was writing their answer, I found another one, which is admittedly more hacky. I prefer theirs, but for the sake of completeness, I wanted to post this one, too.
It's derived from the material at https://mdobook.github.io/ and the accompanying video tutorials from the BYU FLOW Lab, in particular this video.
The trick is to use non-local variables (here, module-level globals) to keep a cache of the last evaluation of the expensive function:
import numpy as np

last_x = None
last_data = None

def compute_data(x):
    data = expensive_fun(x)
    return(data)

def get_last_data(x):
    global last_x, last_data
    if not np.array_equal(x, last_x):
        last_data = compute_data(x)
        last_x = x
    return(last_data)

def costfun(x):
    data = get_last_data(x)
    return(cheap_fun1(data))

def constr1(x):
    data = get_last_data(x)
    return(cheap_fun2(data))

def constr2(x):
    data = get_last_data(x)
    return(cheap_fun3(data))
...and then everything can progress as in my original code in the question.
Reasons why I prefer Joni's class-based version:
variable scopes are clearer than with nonlocal
If some of the functions allow calculation of their Jacobian, or there are other things worth buffering, the added complexity is held in check better than with the nonlocal-variable approach.
Having a class instance do all the work also allows you to do other interesting things, like keeping a record of all past evaluations and the path taken by the optimizer, without having to use a separate callback function (see the sketch after this list). Very useful for debugging/tweaking convergence if the optimizer won't converge or takes too long, but also to visualize or otherwise investigate the objective function or similar.
The same ability might actually be really cool for things like constructing a response surface model from the results of previous function evaluations. That could be used to establish a starting guess in case the expensive function is some numerical method that benefits from a good starting point.
Both approaches allow the use of "cheap" constraints which don't require the expensive function to be evaluated, by simply providing them as separate functions. Not sure whether that would help much with compute times, though. I suppose that would depend on the algorithm used by the optimizer.
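For instance, a minimal sketch (my own illustration, not part of Joni's answer) of extending the MemoizeData idea to record every objective evaluation:

class MemoizeDataWithHistory(MemoizeData):
    """Same caching behaviour, but additionally logs every x the optimizer
    asked for, together with the corresponding objective value."""

    def __init__(self, obj_fun, exp_fun, constr_fun):
        super().__init__(obj_fun, exp_fun, constr_fun)
        self.history = []  # list of (x, objective value) tuples

    def __call__(self, x, *args):
        value = super().__call__(x, *args)
        self.history.append((np.asarray(x).copy(), value))
        return value

obj = MemoizeDataWithHistory(cheap_fun1, expensive_fun, all_constrs)
# After the optimization, inspect obj.history to see the path taken.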
Today, when I was trying to implement an RL agent under the openai-gym environment, I found a problem: it seems that all agents are trained from the very initial state given by env.reset(), i.e.
import gym

env = gym.make("CartPole-v0")
initial_observation = env.reset()  # <-- Note
done = False

while not done:
    action = env.action_space.sample()
    next_observation, reward, done, info = env.step(action)

env.close()  # close the environment
So it is natural that the agent can behave down the route env.reset() -(action)-> next_state -(action)-> next_state -(action)-> ... -(action)-> done; this is an episode. But how can an agent start from a specific state, like a middle state, and then take an action from that state? For example, I sample an experience from the replay buffer, i.e. (s, a, r, ns, done); what if I want to train the agent starting directly from the state ns, get an action with a Q-Network, and then roll forward for n steps? Something like that:
import gym

env = gym.make("CartPole-v0")
initial_observation = ns  # not env.reset()
done = False

while not done:
    action = DQN(ns)
    next_observation, reward, done, info = env.step(action)
    # n-step later or done is true, break

env.close()  # close the environment
But even though I set a variable initial_observation to ns, I think the agent or the env will not be aware of it at all. How can I tell the gym env that I want to set the initial observation to ns, let the agent know the specific start state, and continue training directly from that specific observation (i.e. start with that specific environment state)?
AFAIK, the current implementation of most OpenAI gym envs (including the CartPole-v0 you have used in your question) doesn't implement any mechanism to init the environment in a given state.
However, it shouldn't be too complex to modify the CartPoleEnv.reset() method in order to accept an optional parameter that acts as initial state.
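For instance, something along these lines (a rough sketch against the classic CartPoleEnv attributes, not tested against every gym version):

import numpy as np
from gym.envs.classic_control import CartPoleEnv

class CartPoleWithInitState(CartPoleEnv):
    def reset(self, init_state=None):
        if init_state is None:
            return super().reset()
        # Trust the caller to pass a valid 4-dimensional CartPole state.
        self.state = np.array(init_state, dtype=np.float32)
        self.steps_beyond_done = None
        return np.array(self.state, dtype=np.float32)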
I recommend you use and adapt the following code to your needs; it works well and I used it in my AlphaZero implementation.
This example is for CartPole but you should be able to adapt it easily to other envs.
from copy import deepcopy
import gym
import numpy as np
from gym.spaces import Discrete, Dict, Box

class CartPole:
    def __init__(self, config=None):
        self.env = gym.make("CartPole-v0")
        self.action_space = Discrete(2)
        self.observation_space = self.env.observation_space

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        return obs, rew, done, info

    def set_state(self, state):
        self.env = deepcopy(state)
        obs = np.array(list(self.env.unwrapped.state))
        return obs

    def get_state(self):
        return deepcopy(self.env)

    def render(self):
        self.env.render()

    def close(self):
        self.env.close()
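Roughly, the intended usage of the snapshot/restore pattern above (my own illustration):

env = CartPole()
obs = env.reset()

snapshot = env.get_state()           # deep copy of the whole wrapped env
obs, rew, done, info = env.step(0)   # explore from here...

obs = env.set_state(snapshot)        # ...then rewind to the saved state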
The reason why a direct assignment to env.state does not work is that the environment generated by gym.make is actually a gym.wrappers.TimeLimit object.
To achieve what you intended, you have to also assign the ns value to the unwrapped environment. So, something like this should do the trick:
env.reset()
env.state = env.unwrapped.state = ns
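For example, with CartPole (ns here is just a made-up, but valid, 4-element CartPole state):

import gym
import numpy as np

env = gym.make("CartPole-v0")   # returns a TimeLimit wrapper around CartPoleEnv
ns = np.array([0.1, 0.0, 0.05, 0.0])

env.reset()
env.state = env.unwrapped.state = ns
next_observation, reward, done, info = env.step(env.action_space.sample())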
I would suggest you extend the CartPole environment so that the reset method does what you need, and then wrap your environment yourself, e.g.
import numpy as np
from gym.wrappers import TimeLimit
from gym.envs.classic_control import CartPoleEnv

class ExtendedCartPoleEnv(CartPoleEnv):
    def reset(self):
        self.state = your_very_special_method()
        self.steps_beyond_done = None
        return np.array(self.state, dtype=np.float32)

max_episode_steps = 200
env = ExtendedCartPoleEnv()
env = TimeLimit(env, max_episode_steps)
I've just tweaked the original method found here.
You can also extend the original environment to change the behavior of self.reset so it takes an argument, but this is not the standard. The wrapped environment wouldn't take the argument, and then you would need to call env.unwrapped.reset directly. This gets ugly because env.step will then complain that env.reset has not been called, etc. There are ways to make it happen, but then again, this diverges from what a regular gym environment is supposed to look like.
I tried solving this issue by simply setting env.state to what I wanted it to be before calling env.step(). This did not work if I initialized the environment using gym.make(). However, I copied the source code for the environment to another file, imported the environment from that, and then was able to simply set env.state. You just have to make sure you don't set it to something outside of the allowed values.
import numpy as np
from continuous_mountain_car import Continuous_MountainCarEnv
env = Continuous_MountainCarEnv()
env.reset()
print(env.state) # Random value generated by reset
env.state = np.array([-0.53, 0]) # Set it to my own value
print(env.state) # It worked
next_state, reward, done, _ = env.step(np.array([0]))
print(next_state) # The next state should be close to my value not the random one
I copied the continuous_mountain_car file from https://github.com/openai/gym/blob/master/gym/envs/classic_control/continuous_mountain_car.py
For gym v0.25.2 you can obtain a fixed initial state by specifying a seed in env.reset(), which gives the same state whenever you reset the env. It works with the Taxi-v3 environment, e.g.:
environment.reset(seed=115)
So I am writing a GAN in tensorflow, and need the discriminator and generator to be objects. Now I am having problems with creating the training dataset for the discriminator.
Currently the relevant part of my code looks like this:
self.dataset=tf.data.Dataset.from_tensor_slices((self.y_,self.x_)) #creates dataset
self.fake_dataset=tf.data.Dataset.from_tensor_slices((self.x_fake_)) #creates dataset
self.dataset=self.dataset.shuffle(buffer_size=BUFFER_SIZE) #shuffles
self.fake_dataset=self.fake_dataset.shuffle(buffer_size=BUFFER_SIZE) #shuffles
self.dataset=self.dataset.repeat().batch(self.batch_size) #batches
self.fake_dataset=self.fake_dataset.repeat().batch(self.batch_size) #batches
self.iterator=tf.data.Iterator.from_structure(self.dataset.output_types,self.dataset.output_shapes) #creates iterators
self.fake_iterator=tf.data.Iterator.from_structure(self.fake_dataset.output_types,self.fake_dataset.output_shapes) #creates iterators
self.x=self.iterator.get_next()
self.x_fake=self.fake_iterator.get_next()
self.dataset_init_op = self.iterator.make_initializer(self.dataset,name=self.name+'_dataset_init')
self.fake_dataset_init_op=self.fake_iterator.make_initializer(self.fake_dataset,name=self.name+'_dataset_init')
What I need is for the function to alternately give one batch of self.x, followed by one batch of self.x_fake.
Is there an easy way to do this, or will I have to resort to a counter and an if statement?
Not sure if I'm understanding exactly what you need, but if you want to use the two iterators alternately, that choice does not have to be fixed at graph construction time, so you can use Python logic to pick the iterator you need on each call. For example:
def __init__(self):
    # Make graph and iterators...
    self._use_fake_batch = False

def next_batch(self):
    iter = self.fake_iterator if self._use_fake_batch else self.iterator
    self._use_fake_batch = not self._use_fake_batch
    return iter.get_next()
Or without an additional variable, using itertools:
from itertools import chain, repeat

def __init__(self):
    # Make graph and iterators...
    self._iterators = chain.from_iterable(repeat((self.iterator, self.fake_iterator)))

def next_batch(self):
    return next(self._iterators).get_next()
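In either variant, successive calls alternate between the two iterators; a rough usage sketch (GAN here is a hypothetical class containing the code above):

# Alternates real and fake batches: real, fake, real, fake, ...
gan = GAN()
x_batch = gan.next_batch()       # tensor yielding a batch from self.dataset
x_fake_batch = gan.next_batch()  # tensor yielding a batch from self.fake_dataset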
First of all, I'm using Python 3.3 and 3.2 on Windows and Linux, respectively.
I am starting to build an RPN calculator. It looks like cross-platform key listeners are a kind of holy grail for Python. So far this seems to be doing the trick, but I've created other problems:
I cannot get away from the global variable for entries and using my stack.
It looks like I have to build the program from inside callback().
Here is a rough skeleton that shows my direction. Am I missing a way to pass information in and out of callback()?
The goal was to build an RPN class before I found myself stuck inside callback().
import tkinter as tk

entry = ""
stack = list()
operators = {"+",
             "-",
             "*",
             "/",
             "^",
             "sin",
             "cos",
             "tan"}

def operate(_op):
    if _op == "+":
        print("plus")

def callback(event):
    global entry, stack
    entry = entry + event.char
    if event.keysym == 'Escape':  # exit program
        root.destroy()
    elif event.keysym == 'Return':  # push string onto stack TODO
        print(entry)
        entry = ""
    elif entry in operators:
        operate(entry)

root = tk.Tk()
root.withdraw()
root.bind('<Key>', callback)
root.mainloop()
You have several options to do what you want to do.
1. Use a Class for your application
The canonical way of doing what you wish, without resorting to a global variable, is to place the application within a class and pass a method as a callback (see print_contents). The following is straight from the docs:
from tkinter import Frame, Entry, StringVar

class App(Frame):
    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.pack()

        self.entrythingy = Entry()
        self.entrythingy.pack()

        # here is the application variable
        self.contents = StringVar()
        # set it to some value
        self.contents.set("this is a variable")
        # tell the entry widget to watch this variable
        self.entrythingy["textvariable"] = self.contents

        # and here we get a callback when the user hits return.
        # we will have the program print out the value of the
        # application variable when the user hits return
        self.entrythingy.bind('<Key-Return>',
                              self.print_contents)

    def print_contents(self, event):
        print("hi. contents of entry is now ---->",
              self.contents.get())
2. Curry the callback over your state
You can also use Python's functional programming constructs to curry a function over a global variable and then pass the curried function as a callback.
import functools

global_var = {}

def callback(var, event):
    pass

#...

root.bind('<Key>', functools.partial(callback, global_var))
Although this probably isn't what you want.
3. Use a global variable
Sometimes, a global variable is ok.
4. Re-architect for cleanliness and readability
However, you most definitely do not have to build your program inside the callback.
In fact, I would recommend that you create a suite of tests with various valid and invalid inputs, and make a Calculator class or function that takes a string input of RPN commands and returns a value. This is easy to test without a tkinter interface, and will be far more robust.
Then, use your callback to build a string which you pass to your Calculator.
If you want incremental calculation (ie, you're building a simulator), then simply make your Calculator accept single tokens rather than entire equations, but the design remains similar. All the state is then encapsulated inside the Calculator rather than globally.
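For instance, a minimal sketch of such a token-by-token Calculator (my own illustration, supporting only a couple of operators):

class Calculator:
    """Incremental RPN evaluator: feed it one token at a time."""

    def __init__(self):
        self.stack = []

    def push_token(self, token):
        if token == "+":
            b, a = self.stack.pop(), self.stack.pop()
            self.stack.append(a + b)
        elif token == "*":
            b, a = self.stack.pop(), self.stack.pop()
            self.stack.append(a * b)
        else:
            self.stack.append(float(token))
        return self.stack[-1]

# Easy to test without any tkinter involved:
calc = Calculator()
for tok in ["2", "3", "+", "4", "*"]:
    result = calc.push_token(tok)
print(result)  # 20.0

The tkinter callback then only has to tokenize keystrokes and feed each completed token to calc.push_token, keeping all calculator state out of the GUI code.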