What changes occur when using tf_agents.environments.TFPyEnvironment to convert a Python RL environment into a TF environment? - python-3.x

I noticed something weird happening when converting a Python environment into a TF environment using tf_agents.environments.TFPyEnvironment and I'd like to ask you what general changes occur.
To clarify the question, please find my code below. I want the environment to simulate (in an oversimplified manner) interactions with customers who want to buy fruits or vegetables. The agent should learn, for example, that when a customer asks for fruits, action 0 should be executed.
import numpy as np
from tf_agents.environments import py_environment, tf_py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

class CustomEnv(py_environment.PyEnvironment):
    def __init__(self):
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=1)
        self._observation_spec = array_spec.BoundedArraySpec(
            shape=(1,1), dtype=np.int32, minimum=0, maximum=1)
        self._state = [0]
        self._counter = 0
        self._episode_ended = False
        self.dictionary = {0: ["Fruits"],
                           1: ["Vegetables"]}

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset(self):
        self._state = [0]
        self._counter = 0
        self._episode_ended = False
        return ts.restart(np.array([self._state], dtype=np.int32))

    def preferences(self):
        return np.random.randint(2)

    def pickedBasket(self, yes):
        reward = -1.0
        if yes:
            reward = 0.0
        return reward

    def _step(self, action):
        if self._episode_ended:
            return self.reset()
        if self._counter<50:
            self._counter += 1
            basket = self.preferences()
            condition = basket in self.dictionary[action]
            reward = self.pickedBasket(condition)
            self._state[0] = basket
            if self._counter==50:
                return ts.termination(np.array([self._state], dtype=np.int32), reward)
            return ts.transition(np.array([self._state], dtype=np.int32), reward=reward)
When I execute the following code to check that everything is working:
py_env = CustomEnv()
tf_env = tf_py_environment.TFPyEnvironment(py_env)
time_step = tf_env.reset()
action = 0
next_time_step = tf_env.step(action)
I get an unhashable type: 'numpy.ndarray' error for the line condition = basket in self.dictionary[action], so I changed it to condition = basket in self.dictionary[int(action)] and it worked just fine. I'd also like to point out that it worked as a Python environment even without the int conversion. So I'd like to ask what tf_agents.environments.TFPyEnvironment changes. I don't see how it can influence the type of action, since it isn't related to action_spec or anything (at least not directly in the code).

Put basically, tf_agents.environments.TFPyEnvironment is a translator working between your Python environment and the TF-Agents API. The TF-Agents API does not know how many actions it is allowed to choose from, what data to observe and learn from or specially how the choice of actions will influence your custom environment.
Your custom environment is there to provide the rules of the environment, and it follows some standards so that the TFPyEnvironment can translate it correctly and the TF-Agent can work with it. You need to define certain elements and methods in your custom environment, such as the action and observation specs and the _reset() and _step() methods.
I'm not sure if your doubt came from the fact that you gave an action = 0 for the agent and, unrelated to the action_spec, the agent actually worked. The action_spec had no relation with your _step() function, and that is correct. Your step function takes some action and applies it to the environment. How this action is shaped is the real point.
The problem is that you chose the value yourself and gave it to the tf_env.step() function. If you had actually delegated the choice of action to the agent, with something like tf_env.step(agent.policy.action(time_step).action) (I sometimes get confused by the TF-Agents API here), the agent would have to look at your action_spec definition to understand what the environment expects the action to look like.
If action_spec is not defined, the agent would not know what to choose between 0 for "Fruits" and 1 for "Vegetables" (which is what you wanted and defined) or unexpected values such as 2 for "Meat", or [3, 2] for 2 bottles of water, since 3 could stand for "Bottle of Water". The TF-Agent needs these definitions so it knows the rules of your environment.
As for the actual changes and what they do with your custom environment code, I believe you would get a better idea by looking at the source code of the TF-Agents library.
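The error from the question can be reproduced without TF-Agents at all. The sketch below assumes that, after the round trip through TFPyEnvironment, the action reaches _step() as a zero-dimensional NumPy array instead of a plain Python int; that assumption matches the error message in the question but is not taken from the library source. The names plain_action and array_action are only illustrative.

```python
import numpy as np

dictionary = {0: ["Fruits"], 1: ["Vegetables"]}

plain_action = 0                             # what a direct call to the Python environment passes
array_action = np.array(0, dtype=np.int32)   # assumed form of the action inside the wrapped environment

# A plain int is hashable, so the lookup works.
print(dictionary[plain_action])

# A NumPy array is not hashable, so it cannot be used as a dictionary key.
try:
    dictionary[array_action]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Converting the zero-dimensional array back to an int restores the lookup.
print(dictionary[int(array_action)])
```

This is why wrapping the key in int(action) makes the same _step() code work in both the Python and the TF environment.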


Python multiprocess: run several instances of a class, keep all child processes in memory

First, I'd like to thank the StackOverflow community for the tremendous help it provided me over the years, without me having to ask a single question.
I could not find anything that I can relate to my problem, though it is probably due to my lack of understanding of the subject, rather than the absence of a response on the website. My apologies in advance if this is a duplicate.
I am relatively new to multiprocess; some time ago I succeeded in using multiprocessing.pools in a very simple way, where I didn't need any feedback between the child processes.
Now I am facing a much more complicated problem, and I am just lost in the documentation about multiprocessing. I hence ask for your help, your kindness and your patience.
I am trying to build a parallel tempering monte-carlo algorithm, from a class.
The basic class very roughly goes as follows:
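import numpy as np

class monte_carlo:
    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(self.x)
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x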
Obviously, I simplified a great deal (the actual class is 500 lines long!), and built fake functions for simplicity: __init__ takes a bunch of parameters as arguments, there are many more lists of measurements other than self.Elist, and also many arrays derived from self.X that I use to compute them. The key point is that each instance of the class contains a lot of information that I want to keep in memory, and that I don't want to copy over and over again, to avoid a dramatic slowdown. Otherwise I would just use the multiprocessing.pool module.
Now, the parallelization I want to do, in pseudo-code:
def proba(dE, pT):
    return np.exp(-dE / pT)

Tlist = [1.1, 1.2, 1.3]
N = len(Tlist)
G = []
for _ in range(N):
    G.append(monte_carlo())
for _ in range(5):
    for i in range(N):  # this loop should run in multiprocess
        G[i].simulation(Tlist[i])
    for i in range(N // 2):
        dE = G[i].E - G[i + 1].E
        pT = G[i].T + G[i + 1].T
        p = proba(dE, pT)  # (proba is a function, giving a probability depending on dE)
        if np.random.random() < p:
            T_temp = G[i].T
            G[i].T = G[i + 1].T
            G[i + 1].T = T_temp
Synthesis: I want to run several instances of my monte-carlo class in parallel child processes, with different values for a parameter T, then periodically pause everything to change the different T's, and run again the child processes/class instances, from where they paused.
Doing this, I want each class-instance/child-process to stay independent from one another, save its current state with all internal variables while it is paused, and do as few copies as possible. This last point is critical, as the arrays inside the class are quite big (some are 1000x1000), and a copy will therefore very quickly become quite time-costly.
Thanks in advance, and sorry if I am not clear...
I am using a distant machine with many (64) CPUs, running on Debian GNU/Linux 10 (buster).
I made a mistake in my original post: in the end, the temperatures must be exchanged between the class-instances, and not inside the global Tlist.
Edit3: Charchit's answer works perfectly for the test code, on both my personal machine and the distant machine I usually use for running my code. I have hence marked it as the accepted answer.
However, I want to report here that, inserting the actual, more complicated code, instead of the oversimplified monte_carlo class, the distant machine gives me some strange errors:
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gtk-WARNING **: ##:##:##:###: Locale not supported by C library.
Using the fallback 'C' locale.
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###:
gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
The "##:##:##:###" are (or seem to be) IP addresses.
Without the call to set_start_method('spawn') this error shows only once, in the very beginning, while when I use this method, it seems to show at every occurrence of result.get()...
The strangest thing is that the code seems otherwise to work fine, does not crash, produces the datafiles I then ask it to, etc...
I think this would deserve to publish a new question, but I put it here nonetheless in case someone has a quick answer.
If not, I will resort to add one by one the variables, methods, etc... that are present in my actual code but not in the test example, to try and find the origin of the bug. My best guess for now is that the memory space required by each child-process with the actual code, is too large for the distant machine to accept it, due to some restrictions implemented by the admin.
What you are looking for is sharing state between processes. As per the documentation, you can either create shared memory, which is restrictive about the data it can store and is not thread-safe, but offers better speed and performance; or you can use server processes through managers. The latter is what we are going to use since you want to share whole objects of user-defined datatypes. Keep in mind that using managers will impact speed of your code depending on the complexity of the arguments that you pass and receive, to and from the managed objects.
Managers, proxies and pickling
As mentioned, managers create server processes to store objects, and allow access to them through proxies. I have answered a question with better details on how they work, and how to create a suitable proxy here. We are going to use the same proxy defined in the linked answer, with some variations. Namely, I have replaced the factory functions inside the __getattr__ to something that can be pickled using pickle. This means that you can run instance methods of managed objects created with this proxy without resorting to using multiprocess. The result is this modified proxy:
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np

class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)

class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user defined data-type. The proxy instance will have the namespace and
    functions of the data-type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes. """

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result
Now we only need to make sure that when we are creating objects of monte_carlo, we do so using managers and the above proxy. For that, we create a class constructor called create. All objects for monte_carlo should be created with this function. With that, the final code looks like this:
from multiprocessing import Pool
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np

class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)

class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user defined data-type. The proxy instance will have the namespace and
    functions of the data-type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes. """

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result

class monte_carlo:
    def __init__(self, ):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []
        self.T = None

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(self.x)
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x

def create(cls, *args, **kwargs):
    # Register class
    class_str = cls.__name__
    BaseManager.register(class_str, cls, ObjProxy, exposed=tuple(dir(cls)))
    # Start a manager process
    manager = BaseManager()
    manager.start()
    # Create and return this proxy instance. Using this proxy allows sharing of state between processes.
    inst = getattr(manager, class_str)(*args, **kwargs)
    return inst

def proba(dE, pT):
    return np.exp(-dE / pT)

if __name__ == "__main__":
    Tlist = [1.1, 1.2, 1.3]
    N = len(Tlist)
    G = []
    # Create our managed instances
    for _ in range(N):
        G.append(create(monte_carlo))
    for _ in range(5):
        # Run simulations in the manager server
        results = []
        with Pool(8) as pool:
            for i in range(N):  # this loop should run in multiprocess
                results.append(pool.apply_async(G[i].simulation, (Tlist[i], )))
            # Wait for the simulations to complete
            for result in results:
                result.get()
        for i in range(N // 2):
            dE = G[i].E - G[i + 1].E
            pT = G[i].T + G[i + 1].T
            p = proba(dE, pT)  # (proba is a function, giving a probability depending on dE)
            if np.random.random() < p:
                T_temp = Tlist[i]
                Tlist[i] = Tlist[i + 1]
                Tlist[i + 1] = T_temp
    print(Tlist)
This meets the criteria you wanted. It does not create any copies of the objects themselves; rather, all arguments to the simulation method call are serialized inside the pool and sent to the manager server where the object is actually stored. The method gets executed there, and the results (if any) are serialized and returned to the main process. All of this using only the standard library!
Example output (the final Tlist; it varies between runs because the swaps are random):
[1.2, 1.1, 1.3]
Since you are using Linux, I encourage you to use multiprocessing.set_start_method inside the if __name__ ... clause to set the start method to "spawn". Doing this will ensure that the child processes do not have access to variables defined inside the clause.

How to inspect mapped tasks' inputs from reduce tasks in Prefect

I'm exploring Prefect's map-reduce capability as a powerful idiom for writing massively-parallel, robust importers of external data.
As an example - very similar to the X-Files tutorial - consider this snippet:
import datetime
import prefect
from prefect import Flow, task

@task
def retrieve_episode_ids():
    api_connection = APIConnection(prefect.context.my_config)
    return api_connection.get_episode_ids()

@task(max_retries=2, retry_delay=datetime.timedelta(seconds=3))
def download_episode(episode_id):
    api_connection = APIConnection(prefect.context.my_config)
    return api_connection.get_episode(episode_id)

@task
def persist_episodes(episodes):
    db_connection = DBConnection(prefect.context.my_config)
    # ...store all episodes by their ID with a success/failure flag...

with Flow("import_episodes") as flow:
    episode_ids = retrieve_episode_ids()
    episodes = download_episode.map(episode_ids)
    persist_episodes(episodes)
The peculiarity of my flow, compared with the simple X-Files tutorial, is that I would like to persist results for all the episodes that I have requested, even for the failed ones. Imagine that I'll be writing episodes to a database table as the episode ID decorated with an is_success flag. Moreover, I'd like to write all episodes with a single task instance, in order to be able to perform a bulk insert - as opposed to inserting each episode one by one - hence my persist_episodes task being a reduce task.
The trouble I'm having is in being able to gather the episode ID for the failed downloads from that reduce task, so that I can store the failed information in the table under the appropriate episode ID. I could of course rewrite the download_episode task with a try/catch and always return an episode ID even in the case of failure, but then I'd lose the automatic retry/failure functionality which is a good deal of the appeal of Prefect.
Is there a way for a reduce task to infer the argument(s) of a failed mapped task? Or, could I write this differently to achieve what I need, while still keeping the same level of clarity as in my example?
Mapping over a list preserves the order. This is a property you can use to link inputs with the errors. Check the code I have below, will add more explanation after.
from prefect import Flow, task
import prefect

@task
def retrieve_episode_ids():
    return [1, 2, 3, 4, 5]

@task
def download_episode(episode_id):
    if episode_id == 5:
        return ValueError()
    return episode_id

@task
def persist_episodes(episode_ids, episodes):
    # Note the last element here will be the ValueError
    # We change that ValueError into a "fail" message
    episodes = ["fail" if isinstance(x, BaseException) else x for x in episodes]
    # Note the last element here will be the "fail"
    result = {}
    for i, episode_id in enumerate(episode_ids):
        result[episode_id] = episodes[i]
    # Check final results
    prefect.context.get("logger").info(result)

with Flow("import_episodes") as flow:
    episode_ids = retrieve_episode_ids()
    episodes = download_episode.map(episode_ids)
    persist_episodes(episode_ids, episodes)
The handling will largely happen in the persist_episodes. Just pass the list of inputs again and then we can match the inputs with the failed tasks. I added some handling around identifying errors and replacing them with what you want. Does that answer the question?
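The matching step itself does not depend on Prefect, so it can be sketched in plain Python. The example below assumes the reduce task receives, in input order, either a value or an exception object for each mapped input; whether a failed task's exception actually reaches the reduce task depends on the Prefect version and the trigger used, so treat that as an assumption. The helper name pair_inputs_with_results and the sample data are hypothetical.

```python
def pair_inputs_with_results(inputs, results):
    """Match each mapped input with its result, relying on preserved order."""
    paired = {}
    for item, result in zip(inputs, results):
        failed = isinstance(result, BaseException)
        paired[item] = {
            "is_success": not failed,
            "value": None if failed else result,
        }
    return paired


episode_ids = [1, 2, 3]
# Stand-in for what the reduce task might receive: values or exception objects.
episodes = ["pilot", ValueError("download failed"), "squeeze"]

rows = pair_inputs_with_results(episode_ids, episodes)
print(rows)
```

The resulting dictionary has one entry per requested episode ID, which is the shape needed for a single bulk insert with an is_success flag.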
Always happy to chat more. You can reach out in the Prefect Slack or Discourse as well.

How to avoid creating a class attribute by accident

I know the motto is "we're all consenting adults around here."
but here is a problem I spent a day on. I got passed a class with over 100 attributes. I had specified one of them was to be called "run_count". The front-end had a place to enter run_count.
Somehow, the front-end/back-end package people decided to call it "run_iterations" instead.
So, my problem is I am writing unit test software, and I did this:
passed_parameters.run_count = 100
result = do_the_thing(passed_parameters)
assert result == 99.75
Now, the problem, of course, is that Python willingly let me set this "new" attribute called "run_count". But, after delving 10 levels down into the code, I discovered that the function "do_the_thing" (obviously) never looks at "run_count", but uses "passed_parameters.run_iterations" instead.
Is there some simple way to avoid allowing yourself to create a new attribute in a class, or a new entry in a dictionary, when you naively assume you know the attribute name (or the dict key), and accidentally create a new entry that never gets looked at?
In an ideal world, no matter how dynamic, Python would allow you to "lock" an object or an instance of one. Then, trying to set a new value for an attribute that doesn't exist would raise an attribute error, letting you know you are trying to change something that doesn't exist, rather than letting you create a new attribute that never gets used.
Use __setattr__, and check the attribute exists, otherwise, throw an error. If you do this, you will receive an error when you define those attributes inside __init__, so you have to workaround that situation. I found 4 ways of doing that. First, define those attributes inside the class, that way, when you try to set their initial value they will already be defined. Second, call object.__setattr__ directly. Third, add a fourth boolean param to __setattr__ indicating whether to bypass checking or not. Fourth, define the previous boolean flag as class-wide, set it to True, initialize the fields and set the flag back to False. Here is the code:
class A:
    f = 90
    a = None
    bypass_check = False

    def __init__(self, a, b, c, d1, d2, d3, d4):
        # 1st workaround
        self.a = a
        # 2nd workaround
        object.__setattr__(self, 'b', b)
        # 3rd workaround
        self.__setattr__('c', c, True)
        # 4th workaround
        self.bypass_check = True
        self.d1 = d1
        self.d2 = d2
        self.d3 = d3
        self.d4 = d4
        self.bypass_check = False

    def __setattr__(self, attr, value, bypass=False):
        if bypass or self.bypass_check or hasattr(self, attr):
            object.__setattr__(self, attr, value)
        else:
            # Throw some error
            print('Attribute %s not found' % attr)

a = A(1, 2, 3, 4, 5, 6, 7)
a.f = 100
a.d1 = -1
a.g = 200
print(a.f, a.a, a.d1, a.d4)
Attribute g not found
100 1 -1 7
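For the unit-test scenario in the question, it may be more useful for the check to raise instead of print, so that the test fails at the misspelled assignment. The following is a minimal sketch of that variation using only the second workaround (object.__setattr__ inside __init__); the class name LockedParams and its two fields are hypothetical and not from the original code.

```python
class LockedParams:
    """Hypothetical parameter container that rejects unknown attribute names."""

    def __init__(self, run_iterations, tolerance):
        # Bypass our own check while creating the allowed attributes.
        object.__setattr__(self, "run_iterations", run_iterations)
        object.__setattr__(self, "tolerance", tolerance)

    def __setattr__(self, name, value):
        # Only attributes that already exist may be assigned.
        if not hasattr(self, name):
            raise AttributeError(
                f"{type(self).__name__} has no attribute {name!r}; refusing to create it"
            )
        object.__setattr__(self, name, value)


params = LockedParams(run_iterations=10, tolerance=0.5)
params.run_iterations = 100  # existing attribute: allowed

try:
    params.run_count = 100  # unknown name: rejected instead of silently created
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

With this in place, the assignment to run_count in the question's test would fail immediately instead of being ignored ten levels down.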

Global variable values in PyCharm (Python 3.6) console

I'm new to both Python and PyCharm, so please forgive ignorance.
I was trying to teach myself about the execution of functions when initialising classes. Specifically, I want to re-use a database connection object if one is passed into a new instance, but create one if not. I have a function get_cnx() that creates a connection. I discovered that, whether using a default argument in the __init__ statement to call get_cnx():
def __init__(self, db_cnx=get_cnx()):
...or whether using a keyword argument:
self.db_cnx = kwargs.get('db_cnx', get_cnx())
...the function is always executed regardless of the presence (or content) of the connection argument that's passed in. That defeats the object of re-using a connection, so I reverted to an if condition. I believe there's a way of doing this with a decorator, but that felt like gilding the lily.
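The behaviour described here follows from when Python evaluates these expressions: a default argument is evaluated once, when the def statement runs, and the second argument of kwargs.get() is evaluated before get() is called. A minimal sketch of the usual None-sentinel alternative is shown below; get_cnx here is a hypothetical stand-in that only records that it was called, and the class names Eager and Lazy are illustrative.

```python
calls = []

def get_cnx():
    """Hypothetical stand-in for the real connection factory."""
    calls.append("get_cnx")
    return object()

class Eager:
    # The default expression runs once, when this def statement is executed.
    def __init__(self, db_cnx=get_cnx()):
        self.db_cnx = db_cnx

class Lazy:
    def __init__(self, db_cnx=None):
        # Only build a connection when the caller did not supply one.
        self.db_cnx = get_cnx() if db_cnx is None else db_cnx

calls_after_definition = len(calls)  # Eager's default has already called get_cnx

shared = object()
Lazy(db_cnx=shared)                  # re-uses the given object, no new call
calls_with_shared = len(calls)

Lazy()                               # no argument given, so one new call
calls_without_argument = len(calls)

print(calls_after_definition, calls_with_shared, calls_without_argument)
```

This is the same if condition the question ends up with, written as a conditional expression.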
Anyway, this is the context for my actual question: to help me work out what was going on I created this simple test, as a module called "classes.py":
greeting = 'Good Day'

def my_func():
    global greeting
    greeting = 'Changed'
    return 'Hello'

class Animal:
    def __init__(self, greet):
        if not greet:
            self.greet = my_func()
        else:
            self.greet = greet

if __name__ == '__main__':
    cat = Animal(None)
If I run this module (with "Run with Python console" checked in the configuration), I see the global variable greeting shown in blue as 'Changed', which is what I'd expect.
If I change the last bit to this:
if __name__ == '__main__':
    cat = Animal('Wotcha')
I see the global variable shown in blue as 'Good Day', which is also what I'd expect.
However, when I then type this into the console:
dog = Animal(None)
...the global variable name turns red but still shows 'Good Day'.
Similarly, using the PyCharm console does the same thing:
>>> print(greeting)
Good Day
>>> dog = Animal(None)
>>> print(greeting)
Good Day
Now, I loaded the module into IDLE and hit F5 (run module), and in the console, did this:
>>> greeting
'Good Day'
>>> dog = Animal(None)
>>> greeting
'Changed'
This is what I would have expected to see in the PyCharm console.
Can someone explain what's going on? Could it be a bug, or is it my lack of understanding of the way PyCharm deals with scope? Or my lack of broader understanding of execution scope?
JetBrains have opened a bug report for me - confirmed the behaviour isn't as expected.

Is there a way to bring lists and variables through a file import in python?

The title may seem vague, so I'm going to describe it in more detail here. I'm writing a space-themed text-based game as a project, and I've run into various problems. I'm trying to use multiple files to simplify everything, and so, say I have these variables in file 1:
player_health = 100
o2 = 100
And that's fine and it works and all, but in the first file I have, I have an ever-growing check() function available (so the player can see game info), so I made another file to handle loops. In the second file, I have this, for example:
while True:
    if o2 == 0:
        player_health = player_health - 1
and for the player,
while True:
    if player_health == 0:
        print(game_over)
Right now, o2 and player_health are set to 100 as variables. Game_over is a string I made, but that doesn't really matter currently. I defined o2 and player_health in file 1, say, and when I import file 2, they don't seem to work any more. Remember the While loops are handled in File 2. I could just put it all in one file, but then It would be harder for me to bugfix. The reason I'm doing this is so that I can handle the start and the loops and the actual story part separately, and add or remove loops as I see fit without having to sift through tons of code, like I would probably have to if it was all in a single file.
There's a lot of source code, So I'll just post snippets from file one and two. If you want to see all of the source code from both files, I could post it somehow, but there would be a lot.
File One Snippets:
(This is at the beginning of file one, which is named VortexA)
o2 = 100
nitrogen = 100
ship_store = []
player_inv = [] #These Lists and variables handle levels of life support and storage.
air = [o2, nitrogen]
ship_weapons = []
currency = 5
power = 0
He_3 = 0
ship_health = 0
player_health = 100
ship = False
(This is at the end of file one. Just adding it here)
print("Type start(game) to begin")
game = 1
def start(variable):
    if game == 1: # This starts the game. Don't want everything in one file.
        import vortexB
File 2 (vortexB) Snippets
I already posted some of the loops using variables above, but I'll post the errors I get here.
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
File "C:\Users\Daniel\Desktop\Python Stuff\Python Game\vortexa.py", line 87, in start
import vortexb
File "C:\Users\Daniel\Desktop\Python Stuff\Python Game\vortexb.py", line 25, in <module>
NameError: name 'player_health' is not defined
Basically, I don't know if it's possible to import variables into another python file through the
import file
function. I'm trying to keep my game simple and easy to fix if something goes wrong by having separate files to handle loops and story.. If that's possible. If it's not, I'm going to have one hell of a messy file...
I'm just going to have the first file handle a tutorial-type thing, the second handle loops, and the third handle game-play elements.The way I want it set up is so that the variables are declared during a tutorial, and then the loops are in effect, and then I want the actual game to start. I feel like I'm missing something really simple, or what I'm trying to do is impossible. I strongly feel that it's something to do with o2 and nitrogen being stored in the air list, but that doesn't really explain player_health not being able to work. If it won't work at all, I'll see if there is another way to do it, like putting it all in one file.
Thanks in advance. Any and all help is appreciated.
Honestly, it sounds like you need a separation of concerns.
As far as the title of this question goes, yes. You can import variables (lists, strings, objects, etc.) from specific files.
If you have a file constants.py:
class Player:
    health = 100
    power = 12
    # etc

    def __init__(self, name, power):
        self.name = name
        if power:
            self.power = power

    def is_ded(self):
        if self.health < 0:
            return True
        return False
And then inside of battle.py:
from constants import Player

me = Player('Kris', 500)
mic = Player('A microphone', 0)

def do_battle(player1, player2):
    player1.health = player1.health - player2.power
    player2.health = player2.health - player1.power

do_battle(me, mic)
print(mic.is_ded())
# >>>True
print(me.is_ded())
# >>>False
This is built into Python. Congratulations for picking the best language.
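Applied to the game in the question, one possible layout is to keep the shared variables in a module of their own and have the loop code read and update them through that module. The sketch below is only an illustration under that assumption: the module name game_state and the function tick are hypothetical, and the module file is written to a temporary directory purely so the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny stand-in for "file 1" so the example is self-contained.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "game_state.py").write_text("player_health = 100\no2 = 0\n")
sys.path.insert(0, str(module_dir))

# Equivalent to writing "import game_state" at the top of "file 2".
game_state = importlib.import_module("game_state")

def tick():
    """One pass of the loop from "file 2": lose health while oxygen is gone."""
    if game_state.o2 == 0:
        game_state.player_health -= 1

tick()
tick()
print(game_state.player_health)
```

Because both files would refer to game_state.player_health rather than a bare player_health, the NameError from the question does not arise and every file sees the same value.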
