How to ignore a parameter in functools.lru_cache? - python-3.x

This is a skeleton of the function I want to enhance with a cache, because an RPC (remote procedure call) involves a TCP connection to another host.
def rpc(rpc_server, rpc_func, arg):
    return rpc_server.do_rpc(rpc_func, arg)
However, the most convenient way of simply decorating it with:
@functools.lru_cache()
does not work well, because rpc_server objects come and go, and this parameter should be ignored by the cache.
I can write simple memoizing code myself. No problem with that. Actually, I see no other solution.
I am unable to rewrite this function in such a way that the @lru_cache() decorator can be applied and rpc_server is still passed as an argument (i.e. I don't want to make rpc_server a global variable). Is it possible?

I'm posting this just for completeness. Comments are welcome, but please do not vote.
I have found a way to satisfy the conditions from my question. I'm not going to use this code. But it shows how flexible Python is.
import functools

class BlackBox:
    """All BlackBoxes are the same."""
    def __init__(self, contents):
        # TODO: use a weak reference for contents
        self._contents = contents

    @property
    def contents(self):
        return self._contents

    def __eq__(self, other):
        return isinstance(other, type(self))

    def __hash__(self):
        return hash(type(self))

@functools.lru_cache()
def _cached_func(blackbox, real_arg):
    print("called with args:", blackbox.contents, real_arg)
    return real_arg + 1000

def cached_func(ignored_arg, real_arg):
    # "ignored" means ignored by the cache
    return _cached_func(BlackBox(ignored_arg), real_arg)

cached_func("foo", 1)  # cache miss
cached_func("bar", 1)  # cache hit
cached_func("bar", 2)  # cache miss
cached_func("foo", 2)  # cache hit

Related

Setting instance variables explicitly or via function

If we have an instance variable which can be set either randomly or via a list input, is it better to set the instance variable explicitly (via a function return) or as a side effect of a function? I.e., which of the versions below is better?
class A():
    def __init__(self, *input):
        if input:
            self.property = self.create_property_from_input(input)
        else:
            self.property = self.create_property_randomly()

    @staticmethod
    def create_property_from_input(input):
        # Do something useful with the input.
        return result

    @staticmethod
    def create_property_randomly():
        # Do something useful
        return result
or
class A():
    def __init__(self, *input):
        if input:
            self.create_property_from_input(input)
        else:
            self.create_property_randomly()

    def create_property_from_input(self, input):
        # Do something useful with the input.
        self.property = result
        # return None

    def create_property_randomly(self):
        # Do something useful
        self.property = result
        # return None
I think that in the first version, it is not strictly necessary to make the two create_property functions static methods. However, since they do not need to know anything about the instance, I thought it was clearer to do it that way. Personally, I tend to think that the first version is more explicit, but the use of static methods tends to make it look more advanced than it is.
Which version would you think is closer to best practices?

Python multiprocess: run several instances of a class, keep all child processes in memory

First, I'd like to thank the StackOverflow community for the tremendous help it provided me over the years, without me having to ask a single question.
I could not find anything that I can relate to my problem, though it is probably due to my lack of understanding of the subject, rather than the absence of a response on the website. My apologies in advance if this is a duplicate.
I am relatively new to multiprocessing; some time ago I succeeded in using multiprocessing.Pool in a very simple way, where I didn't need any feedback between the child processes.
Now I am facing a much more complicated problem, and I am just lost in the documentation about multiprocessing. I hence ask for your help, your kindness and your patience.
I am trying to build a parallel tempering Monte Carlo algorithm, from a class.
The basic class very roughly goes as follows:
import numpy as np

class monte_carlo:
    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(self.x)
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return
Obviously, I simplified a great deal (the actual class is 500 lines long!) and built fake functions for simplicity: __init__ takes a bunch of parameters as arguments, there are many more measurement lists besides self.Elist, and also many arrays derived from self.x that I use to compute them. The key point is that each instance of the class contains a lot of information that I want to keep in memory and that I don't want to copy over and over again, to avoid dramatic slowdowns. Otherwise I would just use the multiprocessing.pool module.
Now, the parallelization I want to do, in pseudo-code:
def proba(dE, pT):
    return np.exp(-dE/pT)

Tlist = [1.1, 1.2, 1.3]
N = len(Tlist)
G = []
for _ in range(N):
    G.append(monte_carlo())

for _ in range(5):
    for i in range(N):  # this loop should be run in multiprocess
        G[i].simulation(Tlist[i])
    for i in range(N//2):
        dE = G[i].E - G[i+1].E
        pT = G[i].T + G[i+1].T
        p = proba(dE, pT)  # (proba is a function, giving a probability depending on dE)
        if np.random.random() < p:
            T_temp = G[i].T
            G[i].T = G[i+1].T
            G[i+1].T = T_temp
Synthesis: I want to run several instances of my monte_carlo class in parallel child processes, with different values for a parameter T, then periodically pause everything to exchange the different T's, and resume the child processes/class instances from where they paused.
Doing this, I want each class instance/child process to stay independent of one another, to keep its current state with all internal variables while it is paused, and to make as few copies as possible. This last point is critical, as the arrays inside the class are quite big (some are 1000x1000), and copying them would therefore very quickly become quite costly.
Thanks in advance, and sorry if I am not clear...
Edit:
I am using a remote machine with many (64) CPUs, running Debian GNU/Linux 10 (buster).
Edit2:
I made a mistake in my original post: in the end, the temperatures must be exchanged between the class-instances, and not inside the global Tlist.
Edit3: Charchit's answer works perfectly for the test code, on both my personal machine and the remote machine I usually use to run my code. I hence mark it as the accepted answer.
However, I want to report here that, when inserting the actual, more complicated code instead of the oversimplified monte_carlo class, the remote machine gives me some strange errors:
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gtk-WARNING **: ##:##:##:###: Locale not supported by C library.
Using the fallback 'C' locale.
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###:
gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
The "##:##:##:###" are (or seems like) IP adresses.
Without the call to set_start_method('spawn') this error shows up only once, at the very beginning, whereas when I use this method, it seems to show up at every occurrence of result.get()...
The strangest thing is that the code otherwise seems to work fine: it does not crash, produces the data files I ask it to, etc.
I think this would deserve a new question, but I post it here nonetheless in case someone has a quick answer.
If not, I will resort to adding, one by one, the variables, methods, etc. that are present in my actual code but not in the test example, to try to find the origin of the bug. My best guess for now is that the memory space required by each child process with the actual code is too large for the remote machine to accept, due to some restrictions implemented by the admin.
What you are looking for is sharing state between processes. As per the documentation, you can either create shared memory, which is restrictive about the data it can store and is not thread-safe, but offers better speed and performance; or you can use server processes through managers. The latter is what we are going to use, since you want to share whole objects of user-defined data types. Keep in mind that using managers will impact the speed of your code, depending on the complexity of the arguments you pass to and receive from the managed objects.
Managers, proxies and pickling
As mentioned, managers create server processes to store objects and allow access to them through proxies. I have answered a question with more details on how they work and how to create a suitable proxy here. We are going to use the same proxy defined in the linked answer, with some variations. Namely, I have replaced the factory functions inside __getattr__ with something that can be pickled using pickle. This means that you can run instance methods of managed objects created with this proxy without resorting to using multiprocess. The result is this modified proxy:
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np

class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)

class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result
Solution
Now we only need to make sure that when we are creating objects of monte_carlo, we do so using managers and the above proxy. For that, we create a class constructor called create. All objects for monte_carlo should be created with this function. With that, the final code looks like this:
from multiprocessing import Pool
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np

class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)

class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result

class monte_carlo:
    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []
        self.T = None

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(self.x)
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return

    @classmethod
    def create(cls, *args, **kwargs):
        # Register class
        class_str = cls.__name__
        BaseManager.register(class_str, cls, ObjProxy, exposed=tuple(dir(cls)))
        # Start a manager process
        manager = BaseManager()
        manager.start()
        # Create and return this proxy instance. Using this proxy allows sharing of state between processes.
        inst = eval("manager.{}(*args, **kwargs)".format(class_str))
        return inst

def proba(dE, pT):
    return np.exp(-dE/pT)

if __name__ == "__main__":
    Tlist = [1.1, 1.2, 1.3]
    N = len(Tlist)
    G = []

    # Create our managed instances
    for _ in range(N):
        G.append(monte_carlo.create())

    for _ in range(5):
        # Run simulations in the manager server
        results = []
        with Pool(8) as pool:
            for i in range(N):  # this loop should be run in multiprocess
                results.append(pool.apply_async(G[i].simulation, (Tlist[i], )))

            # Wait for the simulations to complete
            for result in results:
                result.get()

        for i in range(N // 2):
            dE = G[i].E - G[i + 1].E
            pT = G[i].T + G[i + 1].T
            p = proba(dE, pT)  # (proba is a function, giving a probability depending on dE)
            if np.random.random() < p:
                T_temp = Tlist[i]
                Tlist[i] = Tlist[i + 1]
                Tlist[i + 1] = T_temp

    print(Tlist)
This meets the criteria you wanted. It does not create any copies at all; rather, all arguments to the simulation method call are serialized inside the pool and sent to the manager server, where the object is actually stored. The method gets executed there, and the results (if any) are serialized and returned to the main process. All of this using only the standard library!
Output
[1.2, 1.1, 1.3]
Edit
Since you are using Linux, I encourage you to use multiprocessing.set_start_method inside the if __name__ ... clause to set the start method to "spawn". Doing this will ensure that the child processes do not have access to variables defined inside the clause.
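A minimal sketch of that suggestion (not part of the original answer), placed before any managed instance or pool is created:

import multiprocessing

if __name__ == "__main__":
    # "spawn" starts children from a fresh interpreter instead of forking,
    # so they do not inherit variables defined inside this clause
    multiprocessing.set_start_method("spawn")
    # ... create the managed monte_carlo instances and the Pool here ...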

How to implement "relationship" caching system in a similar query?

I noticed that when having a Model such as:
class User(Model):
    id = ...
    books = relationship('Book')
When calling user.books for the first time, SQLAlchemy queries the database (when lazy='select' for instance, which is the default), but subsequent calls to user.books don't hit the database. The result seems to have been cached.
I'd like to have the same behavior from SQLAlchemy when using a method that queries, for instance:
class User:
    def get_books(self):
        return Book.query.filter(Book.user_id == self.id).all()
But when doing that, if I call get_books() 3 times, SQLAlchemy queries the database 3 times (as seen when setting the echo property to True).
How can I change get_books() to use the caching system from SQLAlchemy?
I insist on mentioning "from SQLAlchemy" because I believe it handles the refresh/expunge/flush system, and changes are then re-queried from the DB if one of these happens. As opposed to simply creating a caching property in the model with something like:
def get_books(self):
    if self._books is None:
        self._books = Book.query.filter(Book.user_id == self.id).all()
    return self._books
This does not work well with flush/refresh/expunge from SQLAlchemy.
So, how can I change get_books() to use the caching system from SQLAlchemy?
Edit 1:
I realized that the solution provided below is not perfect, because it caches per object. If you have two instances of the same user and call get_books on both, two queries will be made, because the caching applies only to the instance, not globally, contrary to SQLAlchemy.
The reason is simple, I believe, but it is still unclear how to apply it in my case: the relationship object is defined at the class level, not on the instance (books = relationship()), and it builds its own query internally, so it can cache based on the query.
In the solution I gave, the memoize_getter is unaware of the query being made and, as such, cannot cache the same value across multiple instances, so an identical call made on another instance will query the database.
Original answer:
I've been trying to wrap my head around SQLAlchemy's code (wow that's dense!), and I think I figured it out!
A relationship, at least when set with lazy='select' (the default), is an InstrumentedAttribute, which contains a __get__ function that does the following:
def __get__(self, instance, owner):
    if instance is None:
        return self

    dict_ = instance_dict(instance)
    if self._supports_population and self.key in dict_:
        return dict_[self.key]
    else:
        try:
            state = instance_state(instance)
        except AttributeError as err:
            util.raise_(
                orm_exc.UnmappedInstanceError(instance),
                replace_context=err,
            )
        return self.impl.get(state, dict_)
So, a basic caching system, respecting SQLAlchemy, would be something like:
from sqlalchemy.orm.base import instance_dict

def get_books(self):
    dict_ = instance_dict(self)
    if 'books' not in dict_:
        dict_['books'] = Book.query.filter(Book.user_id == self.id).all()
    return dict_['books']
Now, we can push the vice a bit further, and do ... a decorator (oh sweet):
import functools
from sqlalchemy.orm.base import instance_dict

def memoize_getter(f):
    @functools.wraps(f)
    def decorator(instance, *args, **kwargs):
        property_name = f.__name__.replace('get_', '')
        dict_ = instance_dict(instance)
        if property_name not in dict_:
            dict_[property_name] = f(instance, *args, **kwargs)
        return dict_[property_name]
    return decorator
Thus transforming the original method into:
class User:
    @memoize_getter
    def get_books(self):
        return Book.query.filter(Book.user_id == self.id).all()
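For illustration only (not part of the original answer; this assumes the User/Book models and a configured session as in the question), the effect is:

user = User.query.get(1)
user.get_books()   # first call: emits a SELECT, stores the list in instance_dict(user)['books']
user.get_books()   # second call: returns the cached list from the instance dict, no SQL emitted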
If someone has a better solution, I'm eagerly interested!

Get a user's keyboard input that was requested by another function

I am using a Python package for database management. The provided class has a method delete() that deletes a record from the database. Before deleting, it asks the user to confirm the operation from the console, e.g. Proceed? [yes, No]:
My function needs to perform other actions depending on whether the user chose to delete the record. Can I get the user's input that was requested by the function from the package?
Toy example:
def ModuleFunc():
    while True:
        a = input('Proceed? [yes, No]:')
        if a in ['yes', 'No']:
            # Perform some actions under the hood
            return
This function will wait for one of the two responses and return None once it gets either. After calling this function, can I determine the user's response (without modifying this function)? I think modifying the package's source code is not a good idea in general.
Why not just patch the class at runtime? Say you had a file ./lib/db.py defining a class DB like this:
class DB:
    def __init__(self):
        pass

    def confirm(self, msg):
        a = input(msg + ' [Y, N]:')
        if a == 'Y':
            return True
        return False

    def delete(self):
        if self.confirm('Delete?'):
            print('Deleted!')
        return
Then in main.py you could do:
from lib.db import DB
def newDelete(self):
if self.confirm('Delete?'):
print('Do some more stuff!')
print('Deleted!')
return
DB.delete = newDelete
test = DB()
test.delete()
See it working here
I would save key events somewhere (a file or memory) with something like a keylogger. Then you would be able to reuse the last one.
However, if you can modify the module package 📦 and redistribute it, that would be easier.
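A related, hypothetical sketch (not from either answer): instead of a keylogger, you can record the response by temporarily wrapping builtins.input, which also avoids touching the package's source:

import builtins

def call_and_capture_input(func, *args, **kwargs):
    """Call func and record every response typed at input() prompts it issues."""
    responses = []
    original_input = builtins.input

    def recording_input(prompt=''):
        answer = original_input(prompt)
        responses.append(answer)
        return answer

    builtins.input = recording_input
    try:
        result = func(*args, **kwargs)
    finally:
        builtins.input = original_input  # always restore the real input()
    return result, responses

# e.g. with the toy ModuleFunc above:
# _, answers = call_and_capture_input(ModuleFunc)
# print(answers[-1])   # 'yes' or 'No'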

assert wrapper function

So I came across this interesting problem while reviewing code:
class Foo:
    def __init__(self, foo_name):
        self.foo_doo = getattr(foo_name, 'foo_lists', None)

    def assert_foo(self, varname):
        assert hasattr(self, 'foo_%s' % varname)

    def foobar(self):
        assert_foo('doo')
I wonder if wrapping assert in a customized version of your own is a faster/better solution than using assert hasattr(...) every time you need to make sure the attribute is present and not None.
The last line will raise NameError unless changed to
self.assert_foo('doo')
That aside, I do not think assert should be used in the above code with or without the wrapper. The corrected line only checks that self has .foo_doo set, but not that it is not None.
if self.foo_doo is not None:
does both.
If one wants an abbreviated look-first attribute check, one could write
def has_foo(self, name):
    return hasattr(self, 'foo_' + name)

def foobar(self):
    if self.has_foo('doo'):
        ...
If you also want a non-None check, change the has_foo return to:
return getattr(self, 'foo_'+name, None) is not None
Beyond this, assert in production code should only be used to check internal logic and not runtime conditions affected by users of the code. Users can delete or disable assertions, so code should not depend on assert for its proper operation once released.
In the code above, __init__ sets self.foo_doo to something, but the caller can subsequently delete the attribute. So both the existence and value of the attribute are user-determined run time conditions and not appropriate subjects for assertions.
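As a small illustration of the point about disabled assertions (a hypothetical demo, not from the original answer): running Python with -O strips assert statements entirely, so any check expressed as an assert silently disappears.

# demo.py
def require_positive(x):
    assert x > 0, "x must be positive"   # removed when run with `python -O demo.py`
    return x

print(require_positive(-5))
# python demo.py     -> AssertionError: x must be positive
# python -O demo.py  -> prints -5, no check performed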
The TestCase.assertXxx methods of unittest are only used for testing, and when they fail, they do more than just wrap a simple assert.
