Python3 and Multithreading - How does "join" method work?

I recently started using the threading library because I need to make my software faster, but unfortunately I can't. Below is an example of what I want to do:
from threading import Thread

# on my PC, it takes around 30 seconds to complete the task:
def do_something(string):
    a = string
    for n in range(1000000000):
        a += a + a
    return {"a": "aa", "b": "bb", "c": "cc"}
ls_a, ls_b, ls_c = [], [], []
ls_strings = ["ciao", "hello", "hola"]
for key in ls_strings:
    t = Thread(target=do_something, args=(key,))
    t.start()
    dic = t.join()
    ls_a.append(dic["a"])  # <--- TypeError: 'NoneType' object is not subscriptable
    ls_b.append(dic["b"])
    ls_c.append(dic["c"])
print(ls_a)
print(ls_b)
print(ls_c)
This code doesn't work; it raises an exception when Python reaches the line ls_a.append(dic["a"]):
TypeError: 'NoneType' object is not subscriptable
There is this error because the instruction dic = t.join() returns None, and I really don't understand why (I expected to receive the returned dictionary, not None). Why doesn't the join method work? How can I fix my code? Can you guys help me understand?

What I want to do is run the do_something function for several strings (in my example "ciao", "hello" and "hola") at the same time.
The trick in that case is to not join any of the threads until all of them have been started (and note that join() always returns None; it simply blocks until the thread finishes). Use two loops instead of just one:
threads = []
for key in ls_strings:
    t = Thread(target=do_something, args=(key,))
    t.start()
    threads.append(t)
# optionally, do something else here while the threads run.
for t in threads:
    t.join()
Note: this does not solve your problem of how to "return" a value from a thread. There are lots of questions already answered on this site that tell you how to do that (e.g., How to get the return value from a thread in Python?)
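For reference, a minimal sketch of one common approach, using concurrent.futures.ThreadPoolExecutor instead of bare Thread objects (my suggestion, not necessarily what the linked answer does):

from concurrent.futures import ThreadPoolExecutor

ls_a, ls_b, ls_c = [], [], []
ls_strings = ["ciao", "hello", "hola"]

with ThreadPoolExecutor() as executor:
    # submit() returns a Future; .result() blocks until the function
    # finishes and then hands back its return value
    futures = [executor.submit(do_something, key) for key in ls_strings]
    for future in futures:
        dic = future.result()
        ls_a.append(dic["a"])
        ls_b.append(dic["b"])
        ls_c.append(dic["c"])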


How to replace the call to random.randint() in a function tested with pytest?

I'm new to programming and searched a lot through the existing questions but couldn't find an answer to my present problem.
I am writing a little game in Python 3.8 and use pytest to run some basic tests.
One of my tests covers a function that uses random.randint().
Here's an extract of my code:
import random

...

def hit(self, enemy, attack):
    dmg = random.randint(self.weapon.damage_min, self.weapon.damage_max) + self.strength // 4
hit() does other things after that, but the problem is with this first line.
I tried to use monkeypatching to get a fake random number for the test:
def test_player_hit_missed(monkeypatch, monster, hero):
    monkeypatch.setattr('random.randint', -3)
    hero.hit(monster, 'Scream')
    assert monster.life == 55
When I run the test, I get this error:
TypeError: 'int' object is not callable
From what I understand, the monkey-patch did replace randint() with the number I indicated (-3), but my function hit() then tried to call it nonetheless.
I thought -3 would replace randint()'s result.
Can someone explain to me:
- why this doesn't work (I probably haven't correctly understood the behavior of the monkeypatch)?
- and how I can replace the call to random.randint() with the value -3 during the test?
Thanks
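For the record: monkeypatch.setattr replaces the attribute with exactly the object you pass, so after the patch random.randint is the integer -3, and hit() fails when it tries to call it. A minimal sketch of the usual fix, reusing the test above, patches in a callable that returns the desired value:

def test_player_hit_missed(monkeypatch, monster, hero):
    # replace randint with a function that ignores its bounds and
    # always returns -3, instead of replacing it with the bare int -3
    monkeypatch.setattr('random.randint', lambda a, b: -3)
    hero.hit(monster, 'Scream')
    assert monster.life == 55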

Why is the destructor not automatically being called?

I'm working on an assignment for school and having some difficulty understanding the __del__ method. I understand that it is called after all the references to the object are deleted, but I'm not exactly sure how to get to that point. The assignment states that the __del__ method should be called automatically, but I'm having a rough time even getting del to call __del__ as I understand it should.
I've tried to manually call the del method and have tried looking at various code samples. Something is just not clicking with me. The only way I can somewhat get it to be called is by using this piece of code at the end:
for faq in faqs:
    Faq.__del__(faq)
But I know that is not correct.
class Faq:
    def __init__(self, question, answer):
        self.question = question
        self.answer = answer
        return

    def print_faq(self):
        print('\nQuestion: {}'.format(self.question))
        print('Answer: {}'.format(self.answer))

    def __del__(self):
        print('\nQuestion: {}'.format(self.question))
        print('FAQ deleted')

faqs = []
faq1 = Faq('Does this work?', 'Yes.')
faqs.append(faq1)
faq2 = Faq('What about now?', 'Still yes.')
faqs.append(faq2)
faq3 = Faq('Should I give up?', 'Nope!')
faqs.append(faq3)

print("FAQ's:")
print('='*30)
for faq in faqs:
    obj = Faq.print_faq(faq)
print()
print('='*30)
I expect the code to output the __del__ print statements to verify the code ran.
The method __del__ is called when the instance is about to be destroyed, that is, when there are no more references to it. Note that del x doesn't directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x's reference count reaches zero.
So the reason you don't see the expected prints is that each Faq object has 2 references to it:
The variable it is assigned to (faq1, faq2, ...)
A reference from the list faqs
So doing del faq1 is not enough, as this still leaves the reference from the list. To delete those references too, you can do del faqs[:].
As to the code posted here, I am guessing you expect to see the prints because when the program finishes all resources are released. That is true, but:
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
You bind faq1, faq2, faq3 and faqs and keep those references alive until the very end. You need to destroy the bindings first.
For example:
faq1 = None  # or: del faq1
faq2 = None
faq3 = None
faqs = None  # or: faqs = [] or del faqs
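A minimal sketch of how the reference counts play out, using the Faq class from above (CPython's reference counting assumed; the question text is illustrative):

f = Faq('Is this deleted yet?', 'Watch the output.')
container = [f]
# two references exist now: the name f and the list entry
del f              # one reference left (the list); __del__ not called yet
container.clear()  # zero references; __del__ runs and prints here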

How to set timeout for a block of code which is not a function python3

After spending a lot of hours looking for a solution on Stack Overflow, I did not find a good way to set a timeout for a block of code. There are approaches for setting a timeout on a function; nevertheless, I would like to know how to set a timeout without having a function. Let's take the following code as an example:
print("Doing different things")
for i in range(0,10)
# Doing some heavy stuff
print("Done. Continue with the following code")
So, how would you break out of the for loop if it has not finished after x seconds? Just continue with the code (maybe setting some bool variable to record that the timeout was reached), despite the fact that the for loop did not finish properly.
I think implementing this efficiently without using functions is not possible; look at this code:
import datetime as dt

print("Doing different things")
# store the timeout and the start time
time_out_after = dt.timedelta(seconds=60)
start_time = dt.datetime.now()
for i in range(10):
    if dt.datetime.now() > start_time + time_out_after:
        break
    # Doing some heavy stuff
print("Done. Continue with the following code")
The problem: the timeout is only checked at the beginning of every loop cycle, so it may take longer than the specified timeout period to break out of the loop, or in the worst case it may never interrupt the loop at all, because it cannot interrupt code that never finishes an iteration.
Update:
As the OP replied that he wants a more efficient way, here is a proper way to do it, but it requires functions.
import asyncio

async def test_func():
    print('doing thing here, it will take long time')
    await asyncio.sleep(3600)  # emulate a heavy task by actually sleeping for one hour
    return 'yay!'  # never reached, as the timeout occurs first

async def main():
    # wait for at most 1 second
    try:
        result = await asyncio.wait_for(test_func(), timeout=1.0)  # call your function with a specific timeout
        # do something with the result
    except asyncio.TimeoutError:
        # on timeout, execution breaks out of the test function and continues here
        print('timeout!')
    print('lets continue to do other things')

asyncio.run(main())
Expected output:
doing thing here, it will take long time
timeout!
lets continue to do other things
Note: the timeout now fires after exactly the time you specify (in this example code, after one second), as long as the task code actually awaits; asyncio cancellation only takes effect at an await point, so CPU-bound code that never yields still cannot be interrupted.
You would replace this line:
await asyncio.sleep(3600)
with your actual task code.
Try it and let me know what you think. Thank you.
Read the asyncio docs:
link
Update 24/2/2019:
As the OP noted, asyncio.run was introduced in Python 3.7, and he asked for an alternative for Python 3.6.
asyncio.run alternative for Python older than 3.7:
replace
asyncio.run(main())
with this code for older versions (I think 3.4 to 3.6):
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
You may try the following way:
import time

start = time.time()
for val in range(10):
    # some heavy stuff
    time.sleep(.5)
    if time.time() - start > 3:  # 3 is the timeout in seconds
        print('loop stopped at', val)
        break  # stop the loop, or sys.exit() to stop the script
else:
    print('successfully completed')
I guess it is a viable approach. The actual timeout will be greater than 3 seconds, depending on the execution time of a single step.
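If functions really must be avoided, one more option worth knowing is a signal-based timeout. A minimal sketch (names are illustrative; POSIX only, main thread only; a small handler function is still required, but the timed block itself stays inline, and it cannot interrupt a blocking C call):

import signal

class LoopTimeout(Exception):
    pass

def _on_alarm(signum, frame):
    raise LoopTimeout

signal.signal(signal.SIGALRM, _on_alarm)  # install the handler
signal.alarm(3)                           # deliver SIGALRM in 3 seconds

timed_out = False
try:
    for i in range(10):
        pass  # doing some heavy stuff
except LoopTimeout:
    timed_out = True
finally:
    signal.alarm(0)  # cancel the pending alarm if the loop finished early

print('timed out' if timed_out else 'completed')
print("Done. Continue with the following code")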

How to write a function that sums a list using parallel computing?

I am trying to write a Python function for fast calculation of the sum of a list, using parallel computing. Initially I tried to use the Python threading library, but then I noticed that all threads run on the same CPU, so there was no speed gain, and I switched to using multiprocessing. In the first version I made the list a global variable:
from multiprocessing import Pool

array = 100000000 * [1]

def sumPart(fromTo: tuple):
    return sum(array[fromTo[0]:fromTo[1]])

with Pool(2) as pool:
    print(sum(pool.map(sumPart, [(0, len(array)//2), (len(array)//2, len(array))])))
This worked well and returned the correct sum in about half the time of a serial computation.
But then I wanted to make it a function that accepts the array as an argument:
def parallelSum(theArray):
    def sumPartLocal(fromTo: tuple):
        return sum(theArray[fromTo[0]:fromTo[1]])
    with Pool(2) as pool:
        return sum(pool.map(sumPartLocal, [(0, len(theArray) // 2), (len(theArray) // 2, len(theArray))]))
Here I got an error:
AttributeError: Can't pickle local object 'parallelSum.<locals>.sumPartLocal'
What is the correct way to write this function?
When scheduling jobs to a Python Pool, you need to ensure both the function and its arguments can be serialized, as they will be transferred over a pipe.
Python uses the pickle protocol to serialize its objects. You can see what can be pickled in the module documentation. In your case, you are facing this limitation:
functions defined at the top level of a module (using def, not lambda)
Under the hood, the Pool is sending a string with the function name and its parameters. The Python interpreter in the child process looks for that function name in the module and fails to find it, as it is nested in the scope of another function, parallelSum.
Move sumPartLocal outside parallelSum and everything will be fine.
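One possible shape of that fix, as a sketch (binding the array with functools.partial is my suggestion here, not something from the original answer):

from functools import partial
from multiprocessing import Pool

def sumPart(theArray, fromTo):  # top level, so it can be pickled
    return sum(theArray[fromTo[0]:fromTo[1]])

def parallelSum(theArray):
    halves = [(0, len(theArray) // 2), (len(theArray) // 2, len(theArray))]
    with Pool(2) as pool:
        # a partial of a top-level function pickles fine; note that the
        # array itself is pickled and sent to the workers, which costs
        # time and memory for large inputs
        return sum(pool.map(partial(sumPart, theArray), halves))

if __name__ == '__main__':
    print(parallelSum(100000000 * [1]))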
I believe you are hitting this, or see the documentation
What you could do is leave def sumPartLocal at module level and pass theArray as the third component of your tuple, so it becomes fromTo[2] inside the sumPartLocal function.
Example:
from multiprocessing import Pool

def sumPartLocal(fromTo: tuple):
    return sum(fromTo[2][fromTo[0]:fromTo[1]])

def parallelSum(theArray):
    with Pool(2) as pool:
        return sum(pool.map(sumPartLocal, [
            (0, len(theArray) // 2, theArray),
            (len(theArray) // 2, len(theArray), theArray),
        ]))

if __name__ == '__main__':
    theArray = 100000000 * [1]
    s = parallelSum(theArray)
    print(s)
[EDIT 15-Dec-2017 based on comments]
To anyone who is thinking of multi-threading in Python: I strongly recommend reading up on the Global Interpreter Lock.
Also, some good answers on this question here on SO
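As a quick illustration of the GIL's effect on this exact workload, a sketch comparing threads and processes on a CPU-bound function (timings are machine-dependent):

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def count(n):
    while n > 0:
        n -= 1

if __name__ == '__main__':
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.time()
        with pool_cls(2) as pool:
            list(pool.map(count, [20_000_000, 20_000_000]))
        # threads: both workers contend for the GIL, so no speed-up;
        # processes: true parallelism, roughly half the wall time
        print(pool_cls.__name__, round(time.time() - start, 2))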

Python 3.5 asyncio execute coroutine on event loop from synchronous code in different thread

I am hoping someone can help me here.
I have an object that has the ability to have attributes that return coroutine objects. This works beautifully, however I have a situation where I need to get the results of the coroutine object from synchronous code in a separate thread, while the event loop is currently running. The code I came up with is:
def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
    """
    Get an attribute synchronously and safely.

    Note:
        This does nothing special if an attribute is synchronous. It only
        really has a use for asynchronous attributes. It processes
        asynchronous attributes synchronously, blocking everything until
        the attribute is processed. This helps when running SQL code that
        cannot run asynchronously in coroutines.

    Args:
        key (str): The Config object's attribute name, as a string.
        default (Any): The value to use if the Config object does not have
            the given attribute. Defaults to None.

    Returns:
        Any: The value of the Config object's attribute, or the default
            value if the Config object does not have the given attribute.
    """
    ret = self.get(key, default)
    if asyncio.iscoroutine(ret):
        if loop.is_running():
            loop2 = asyncio.new_event_loop()
            try:
                ret = loop2.run_until_complete(ret)
            finally:
                loop2.close()
        else:
            ret = loop.run_until_complete(ret)
    return ret
What I am looking for is a safe way to synchronously get the result of a coroutine object in a multithreaded environment. self.get() can return a coroutine object for attributes I have set up to provide one. The main issue I have found is handling whether the event loop is running or not. After searching for a few hours on Stack Overflow and a few other sites, my (broken) solution is above. If the loop is running, I make a new event loop and run my coroutine in the new event loop. This works, except that the code hangs forever on the ret = loop2.run_until_complete(ret) line.
Right now, I have the following scenarios with results:
result of self.get() is not a coroutine: returns the result. [Good]
result of self.get() is a coroutine & event loop is not running (basically in the same thread as the event loop): returns the result. [Good]
result of self.get() is a coroutine & event loop is running (basically in a different thread than the event loop): hangs forever waiting for the result. [Bad]
Does anyone know how I can go about fixing the bad result so I can get the value I need? Thanks.
I hope I made some sense here.
I do have a good and valid reason to be using threads; specifically, I am using SQLAlchemy, which is not async, and I punt the SQLAlchemy code to a ThreadPoolExecutor to handle it safely. However, I need to be able to query these asynchronous attributes from within those threads for the SQLAlchemy code to get certain configuration values safely. And no, I won't switch away from SQLAlchemy to another system just to accomplish what I need, so please do not offer alternatives to it. The project is too far along to switch something so fundamental to it.
I tried using asyncio.run_coroutine_threadsafe() and loop.call_soon_threadsafe(), and both failed. So far, this approach has gotten the farthest toward making it work; I feel like I am just missing something obvious.
When I get a chance, I will write some code that provides an example of the problem.
Ok, I implemented an example case, and it worked the way I would expect. So my problem is likely elsewhere in the code. Leaving this open; I will change the question to fit my real problem if I need to.
Does anyone have any possible ideas as to why a concurrent.futures.Future from asyncio.run_coroutine_threadsafe() would hang forever rather than return a result?
My example code that does not duplicate my error, unfortunately, is below:
import asyncio
import typing

loop = asyncio.get_event_loop()

class ConfigSimpleAttr:
    __slots__ = ('value', '_is_async')

    def __init__(
        self,
        value: typing.Any,
        is_async: bool=False
    ):
        self.value = value
        self._is_async = is_async

    async def _get_async(self):
        return self.value

    def __get__(self, inst, cls):
        if self._is_async and loop.is_running():
            return self._get_async()
        else:
            return self.value

class BaseConfig:
    __slots__ = ()
    attr1 = ConfigSimpleAttr(10, True)
    attr2 = ConfigSimpleAttr(20, True)

    def get(self, key: str, default: typing.Any=None) -> typing.Any:
        return getattr(self, key, default)

    def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
        ret = self.get(key, default)
        if asyncio.iscoroutine(ret):
            if loop.is_running():
                fut = asyncio.run_coroutine_threadsafe(ret, loop)
                print(fut, fut.running())
                ret = fut.result()
            else:
                ret = loop.run_until_complete(ret)
        return ret

config = BaseConfig()

def example_func():
    return config.get_sync('attr1')

async def main():
    a1 = await loop.run_in_executor(None, example_func)
    a2 = await config.attr2
    val = a1 + a2
    print('{a1} + {a2} = {val}'.format(a1=a1, a2=a2, val=val))
    return val

loop.run_until_complete(main())
This is the stripped down version of exactly what my code is doing, and the example works, even if my actual application doesn't. I am stuck as far as where to look for answers. Suggestions are welcome as to where to try to track down my "stuck forever" problem, even if my code above doesn't actually duplicate the problem.
It is very unlikely that you need to run several event loops at the same time, so this part looks quite wrong:
if loop.is_running():
    loop2 = asyncio.new_event_loop()
    try:
        ret = loop2.run_until_complete(ret)
    finally:
        loop2.close()
else:
    ret = loop.run_until_complete(ret)
Even testing whether the loop is running or not doesn't seem to be the right approach. It's probably better to pass the (only) running loop explicitly to get_sync and schedule the coroutine using run_coroutine_threadsafe:
def get_sync(self, key, default, loop):
    ret = self.get(key, default)
    if not asyncio.iscoroutine(ret):
        return ret
    future = asyncio.run_coroutine_threadsafe(ret, loop)
    return future.result()
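One caveat worth spelling out, since it matches the "hangs forever" symptom: future.result() blocks the calling thread until the loop finishes the coroutine, so this get_sync must only be called from a thread other than the one running the loop; called from the loop's own thread, it deadlocks, because the loop can never get around to executing the coroutine. A minimal sketch of the safe calling pattern, assuming BaseConfig from the example above has been given this get_sync variant (names are illustrative):

loop = asyncio.get_event_loop()
config = BaseConfig()

def blocking_worker():
    # runs in an executor thread, NOT the loop's thread, so the loop
    # stays free to actually execute the coroutine behind attr1
    return config.get_sync('attr1', None, loop)

async def main():
    value = await loop.run_in_executor(None, blocking_worker)
    print(value)

loop.run_until_complete(main())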
EDIT: Hanging problems can be related to tasks being scheduled in the wrong loop (e.g. forgetting about the optional loop argument when calling a coroutine). This kind of problem should be easier to debug with the PR 303 (now merged): a RuntimeError is raised instead when the loop and the future don't match. So you might want to run your tests with the latest version of asyncio.
Ok, I got my code working by taking a different approach to it. The problem was tied to using something that did file IO, which I was converting into a coroutine using loop.run_in_executor() on the file IO components. Then I was trying to use this in a sync function called from another thread, itself processed using another loop.run_in_executor() on that function. This is a very important routine in my code (called probably a million times or more during the execution of my short-running code), and I decided that my logic was just getting too complicated. So... I uncomplicated it. Now, if I want to use the file IO components asynchronously, I explicitly use my get_async() method; otherwise, I use my attribute through normal attribute access.
By removing the complexity of my logic, the code became cleaner, easier to understand, and, even more importantly, it actually works. While I am not 100% certain that I know the root cause of the issue (I believe a thread processing an attribute would in turn start another thread that tried to read the attribute before it was processed, causing something like a race condition that halted my code, but I could never duplicate the error outside of my application to completely prove it out), I was able to get past it and continue with my development efforts.
