Multiprocessing error: function not defined - python-3.x

The following returns "NameError: name 'times_2' is not defined", and I can't figure out why:
def pass_data(data): return times_2(data)
def times_2(data): return data*2
import multiprocessing
multiprocessing.pool = Pool()
pool.ncpus = 2
res = pool.map(pass_data, range(5))
print(res)
What I'm actually trying to do is apply a function to a pandas dataframe. However, because I can't use a lambda function to do this:
pool.map(lambda x: x.apply(get_weather, axis=1), df_split)
I instead use the following helper functions, but they throw "NameError: name 'get_weather' is not defined":
def get_weather(df):
    # *do stuff*
    return weather_df

def pass_dataframe(df):
    return df.apply(get_weather, axis=1)

results = pool.map(pass_dataframe, df_split)

Try using Pool like this:
from multiprocessing import Pool

def pass_data(data): return times_2(data)
def times_2(data): return data*2

with Pool(processes=4) as pool:
    res = pool.map(pass_data, range(5))

print(res)
On Windows:
from multiprocessing import Pool

def pass_data(data): return times_2(data)
def times_2(data): return data*2

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        res = pool.map(pass_data, range(5))
    print(res)
See docs https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming
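For the pandas case in the question, the same pattern (module-level functions, a guarded entry point) applies. Below is a minimal sketch where a plain list of chunks stands in for df_split, and get_weather is a hypothetical stand-in for the real per-row computation; in real code you would split with numpy.array_split(df, n) and use chunk.apply(get_weather, axis=1):

```python
from multiprocessing import Pool

def get_weather(row):
    # Hypothetical per-row computation; the real get_weather builds weather data.
    return row * 2

def process_chunk(chunk):
    # In the pandas version this would be: chunk.apply(get_weather, axis=1)
    return [get_weather(row) for row in chunk]

def parallel_map(chunks, processes=2):
    # Both functions above are defined at module level, so workers can import them.
    with Pool(processes=processes) as pool:
        return pool.map(process_chunk, chunks)

if __name__ == "__main__":
    chunks = [[1, 2], [3, 4], [5, 6]]  # stand-in for numpy.array_split(df, 3)
    print(parallel_map(chunks))        # [[2, 4], [6, 8], [10, 12]]
```

The key point is that worker processes locate functions by their module-qualified name, which is why lambdas and interactively defined functions fail to pickle.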


Python | Using AsyncHTMLSession from requests_html

This is my code:
#!/usr/bin/env python
from requests_html import AsyncHTMLSession

async def get_link(url):
    r = await asession.get(url)
    return r

if __name__ == "__main__":
    asession = AsyncHTMLSession()
    results = asession.run(
        lambda: get_link("https://www.digitalocean.com/blog/2/"),
        lambda: get_link("https://www.digitalocean.com/blog/3/"),
        lambda: get_link("https://www.digitalocean.com/blog/4/"),
        lambda: get_link("https://www.digitalocean.com/blog/5/"),
        lambda: get_link("https://www.digitalocean.com/blog/6/"),
        lambda: get_link("https://www.digitalocean.com/blog/7/"),
        lambda: get_link("https://www.digitalocean.com/blog/8/"),
        lambda: get_link("https://www.digitalocean.com/blog/9/"),
    )
    [print(result.html.absolute_links) for result in results]
The blog links are incrementing by 1.
How do I rewrite the code so that I can use a variable for looping over the numbers from 2 to 9?
The aim is to avoid the repetition of the lambda lines.
You can use the unpacking functionality of the asterisk. One caveat: the asterisk exhausts the generator before asession.run calls any of the lambdas, so a plain lambda: get_link(...) would see only the final value of i. Binding i as a default argument freezes each value at creation time:
results = asession.run(
    *(lambda i=i: get_link(f"https://www.digitalocean.com/blog/{i}/") for i in range(2, 10))
)
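The default-argument trick matters whenever callables are created in a loop; a minimal illustration:

```python
# Late binding: every lambda closes over the same loop variable,
# so by the time they are called, all of them see its final value.
late = [lambda: i for i in range(3)]
print([f() for f in late])   # [2, 2, 2]

# Binding i as a default argument captures the current value at creation time.
bound = [lambda i=i: i for i in range(3)]
print([f() for f in bound])  # [0, 1, 2]
```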

How to pass a callable as an argument to `functools.partial`

This code
import itertools
import functools

i = itertools.cycle(["1", "2"])

def f1():
    return next(i)

def f2(a):
    print(a)

f = functools.partial(f2, f1())
f()
f()
produces output 1 1.
Is there an obvious way to prevent calculating f1 when f is created, so result output would be 1 2?
How about using a closure instead of functools?
import itertools

i = itertools.cycle(["1", "2"])

def f1():
    return next(i)

def f2(a):
    print(a)

def wrap(target, fun):
    def inner():
        target(fun())
    return inner

f = wrap(f2, f1)
f()  # 1
f()  # 2
I was able to achieve the intended output using lambda instead of functools.partial:
import itertools

i = itertools.cycle(["1", "2"])

def f1():
    return next(i)

def f2(a):
    print(a)

f = lambda: f2(f1())
f()
f()
import itertools
import functools

i = itertools.cycle(["1", "2"])

def f1(a):
    print(next(a))

f = functools.partial(f1, i)
f()
f()
This just moves the work into the wrapped function itself: f1 receives the cycle as its frozen argument and only calls next() when invoked.
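The underlying point in all three answers is that functools.partial evaluates its arguments once, at creation time. A quick side-by-side check (a sketch; f2 returns instead of printing so the values can be compared):

```python
import functools
import itertools

i = itertools.cycle(["1", "2"])

def f1():
    return next(i)

def f2(a):
    return a

eager = functools.partial(f2, f1())  # f1() runs here; "1" is frozen in
lazy = lambda: f2(f1())              # f1() runs again on every call

print(eager(), eager())  # 1 1
print(lazy(), lazy())    # 2 1
```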

Python multi processing on for loop

I have a function with two parameters:
reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(i, url):
    print("%s is %s" % (i, url))
I want to run the function above using multiprocessing:
from multiprocessing import Pool

if __name__ == '__main__':
    p = Pool(5)
    print(p.map([crawl(i, url) for i in reqs]))
The above code is not working for me. Can anyone please help me with this?
----- ADDING NEW CODE ---------
from multiprocessing import Pool

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(combined_args):
    print("%s is %s" % (combined_args[0], combined_args[1]))

def main():
    p = Pool(5)
    print(p.map(crawl, [(i, url) for i in reqs]))

if __name__ == '__main__':
    main()
When I try to execute the above code, I get the error below.
According to multiprocessing.Pool.map, the signature is:
map(func, iterable[, chunksize])
You are trying to pass map an iterator instead of (func, iterable).
Please refer to the following example of multiprocessing.pool (source):
import time
from multiprocessing import Pool

work = (["A", 5], ["B", 2], ["C", 1], ["D", 3])

def work_log(work_data):
    print(" Process %s waiting %s seconds" % (work_data[0], work_data[1]))
    time.sleep(int(work_data[1]))
    print(" Process %s Finished." % work_data[0])

def pool_handler():
    p = Pool(2)
    p.map(work_log, work)

if __name__ == '__main__':
    pool_handler()
Note that work_log takes a single argument and indexes into it to get the relevant fields.
Referring to your example:
from multiprocessing import Pool

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(combined_args):
    print("%s is %s" % (combined_args[0], combined_args[1]))

def main():
    p = Pool(5)
    print(p.map(crawl, [(i, url) for i in reqs]))

if __name__ == '__main__':
    main()
Results with:
1223 is pass a url
1456 is pass a url
1243 is pass a url
20455 is pass a url
[None, None, None, None]  # This is the output of the map function
Issue resolved. The crawl function should live in a separate module, like below:
crawler.py
def crawl(combined_args):
    print("%s is %s" % (combined_args[0], combined_args[1]))
run.py
from multiprocessing import Pool
import crawler

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def main():
    p = Pool(5)
    print(p.map(crawler.crawl, [(i, url) for i in reqs]))

if __name__ == '__main__':
    main()
Then the output will be:
1223 is pass a url
1456 is pass a url
1243 is pass a url
20455 is pass a url
[None, None, None, None]  # This is the output of the map function
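An alternative worth noting (a sketch, not part of the original answers): Pool.starmap unpacks each argument tuple, so crawl can keep its original two-parameter signature instead of indexing into a combined tuple:

```python
from multiprocessing import Pool

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(i, url):
    # Return instead of print so the results come back through the pool.
    return "%s is %s" % (i, url)

def main():
    with Pool(5) as p:
        # starmap unpacks each (i, url) tuple into crawl(i, url)
        return p.starmap(crawl, [(i, url) for i in reqs])

if __name__ == "__main__":
    for line in main():
        print(line)
```

Returning values rather than printing inside the workers also avoids the trailing [None, None, None, None] seen above.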

mock multiprocessing Pool.map in testing

I want to write a test case for a method in my code, mocking the multiprocessing part using pytest.
This is the code; I need to write a test for the simulation method.
from multiprocessing import Pool

class Simulation:
    def __init__(self, num):
        self.num = num

    def simulation(self):
        pool = Pool(processes=self.num)
        val = [1, 2, 3, 4, 5]
        res = pool.map(self.sim, val)
        return res

    def sim(self, val):
        return val * val
Here is the test case.
import pytest
from multi import Simulation

@pytest.fixture
def sim():
    return Simulation(2)

@pytest.fixture
def mock_map():
    def map(self, val1=1, val2=2):
        return [val1, val2]
    return map

def test_sim(sim, mock_map, monkeypatch):
    monkeypatch.setattr('multiprocessing.Pool.map', mock_map)
    res = sim.simulation()
    assert res == [1, 2]
When running the test I get the output [1, 4, 9, 16, 25], where I need to mock the output as [1, 2].
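One likely explanation, offered here as a hedged sketch rather than a confirmed answer: multiprocessing.Pool is a factory, not the class of the pool objects, so the target 'multiprocessing.Pool.map' never reaches the map method the instances actually use. A robust option is to patch the Pool name where the code under test looks it up (in the question's layout the target string would be 'multi.Pool'; this self-contained version patches its own module and spawns no real workers):

```python
from multiprocessing import Pool
from unittest import mock

class Simulation:
    def __init__(self, num):
        self.num = num

    def simulation(self):
        pool = Pool(processes=self.num)
        val = [1, 2, 3, 4, 5]
        res = pool.map(self.sim, val)
        return res

    def sim(self, val):
        return val * val

def run_mocked():
    # Patch the `Pool` name as seen by this module; the mocked factory
    # returns a MagicMock pool whose map() yields a canned result.
    with mock.patch(__name__ + ".Pool") as MockPool:
        MockPool.return_value.map.return_value = [1, 2]
        return Simulation(2).simulation()

if __name__ == "__main__":
    print(run_mocked())  # [1, 2]
```

The same idea works with monkeypatch: monkeypatch.setattr('multi.Pool', fake_factory). Patching where the name is looked up, rather than where it is defined, is the standard guidance from the unittest.mock documentation.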

Python asyncio: how to mock __aiter__() method?

I have code that listens for messages on a WebSocket using aiohttp.
It looks like:
async for msg in ws:
    await self._ws_msg_handler.handle_message(ws, msg, _services)
Where ws is an instance of aiohttp.web.WebSocketResponse() (original code)
In my test I mock WebSocketResponse() and its __aiter__ method:
def coro_mock(**kwargs):
    return asyncio.coroutine(mock.Mock(**kwargs))
@pytest.mark.asyncio
@mock.patch('aiojsonrpc.request_handler.WebSocketMessageHandler')
async def test_rpc_websocket_handler(
    MockWebSocketMessageHandler,
    rpc_websocket_handler
):
    ws_response = 'aiojsonrpc.request_handler.WebSocketResponse'
    with mock.patch(ws_response) as MockWebSocketResponse:
        MockRequest = mock.MagicMock()
        req = MockRequest()

        ws_instance = MockWebSocketResponse.return_value
        ws_instance.prepare = coro_mock()
        ws_instance.__aiter__ = coro_mock(return_value=iter(range(5)))
        ws_instance.__anext__ = coro_mock()

        handle_msg_result = 'Message processed'
        MockWebSocketMessageHandler.handle_message.side_effect = Exception(
            handle_msg_result)

        msg_handler = MockWebSocketMessageHandler()

        with pytest.raises(Exception) as e:
            await request_handler.RpcWebsocketHandler(msg_handler)(req)
        assert str(e.value) == handle_msg_result
Though when I run the test it fails with the error message saying:
'async for' requires an object with __aiter__ method, got MagicMock
=================================== FAILURES ===================================
_________________________ test_rpc_websocket_handler __________________________
MockWebSocketMessageHandler = <MagicMock name='WebSocketMessageHandler' id='140687969989632'>
rpc_websocket_handler = <aiojsonrpc.request_handler.RpcWebsocketHandler object at 0x7ff47879b0f0>
    [... pytest repeats the test body shown above ...]
        with pytest.raises(Exception) as e:
            await request_handler.RpcWebsocketHandler(msg_handler)(req)
>       assert str(e.value) == handle_msg_result
E       assert "'async for' ...got MagicMock" == 'Message processed'
E         - 'async for' requires an object with __aiter__ method, got MagicMock
E         + Message processed
tests/test_request_handler.py:252: AssertionError
So it behaves like __aiter__() was never mocked.
How I'm supposed to accomplish correct mocking in this case?
Update:
For now I've found a workaround to make the code testable, though I would really appreciate it if someone could tell me how to deal with the issue described in the original question.
You can make the mocked class return an object implementing the expected interface:
class AsyncIterator:
    def __init__(self, seq):
        self.iter = iter(seq)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self.iter)
        except StopIteration:
            raise StopAsyncIteration

MockWebSocketResponse.return_value = AsyncIterator(range(5))
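A quick self-contained check of the adapter (a sketch; collect is a hypothetical helper, not part of the original code, that drains an async iterator the way an async for loop would):

```python
import asyncio

class AsyncIterator:
    def __init__(self, seq):
        self.iter = iter(seq)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self.iter)
        except StopIteration:
            raise StopAsyncIteration

async def collect(ait):
    # Drain an async iterator into a list, as `async for` would.
    return [item async for item in ait]

print(asyncio.run(collect(AsyncIterator(range(5)))))  # [0, 1, 2, 3, 4]
```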
I don't think there is a way (yet) to correctly mock an object implementing __aiter__, it may be a python bug, as async for rejects a MagicMock, even if hasattr(the_magic_mock, '__aiter__') is True.
EDIT (13/12/2017): the library asynctest supports asynchronous iterators and context managers since 0.11, asynctest.MagicMock provides this feature for free.
For posterity, I had the same problem of needing to test an async for loop, but the accepted solution doesn't seem to work for Python 3.7. The example below works for 3.6.x and 3.7.0, but not for 3.5.x:
import asyncio

class AsyncIter:
    def __init__(self, items):
        self.items = items

    async def __aiter__(self):
        for item in self.items:
            yield item

async def print_iter(items):
    async for item in items:
        print(item)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    things = AsyncIter([1, 2, 3])
    loop.run_until_complete(print_iter(things))
    loop.close()
With the above, mocking it looks something like:
with mock.patch('some.async.iter', return_value=AsyncIter([1, 2, 3])):
    # do test requiring mocked iter
Works for py38:
from unittest.mock import MagicMock

async def test_iterable(self):
    loop_iterations = 0
    mock = MagicMock()
    mock.__aiter__.return_value = range(5)
    async for _ in mock:
        loop_iterations += 1
    self.assertEqual(5, loop_iterations)
I have a Python version that supports AsyncMock, and I also leverage pytest_mock. I came up with this solution, combining AsyncMock with a side_effect:
from typing import List

import asyncio
import pytest
from pytest_mock.plugin import MockerFixture

pytestmark = pytest.mark.asyncio

async def async_generator(numbers: List[int]):
    for number in numbers:
        yield number
        await asyncio.sleep(0.1)

async def function_to_test(numbers: List[int]):
    async for thing in async_generator(numbers):
        yield thing * 3
        await asyncio.sleep(0.1)

async def test_async_generator(mocker: MockerFixture):
    mock_numbers = [1, 2, 3, 4, 5]

    async def async_generator_side_effect(numbers: List[int]):
        for number in numbers:
            yield number

    mock_async_generator = mocker.patch("tests.test_async_generator.async_generator")
    mock_async_generator.side_effect = async_generator_side_effect

    actual = []
    async for result in function_to_test(mock_numbers):
        actual.append(result)

    assert actual == [3, 6, 9, 12, 15]
