Using AsyncHTMLSession from requests_html (python-3.x)

This is my code:
#!/usr/bin/env python
from requests_html import AsyncHTMLSession

async def get_link(url):
    r = await asession.get(url)
    return r

if __name__ == "__main__":
    asession = AsyncHTMLSession()
    results = asession.run(
        lambda: get_link("https://www.digitalocean.com/blog/2/"),
        lambda: get_link("https://www.digitalocean.com/blog/3/"),
        lambda: get_link("https://www.digitalocean.com/blog/4/"),
        lambda: get_link("https://www.digitalocean.com/blog/5/"),
        lambda: get_link("https://www.digitalocean.com/blog/6/"),
        lambda: get_link("https://www.digitalocean.com/blog/7/"),
        lambda: get_link("https://www.digitalocean.com/blog/8/"),
        lambda: get_link("https://www.digitalocean.com/blog/9/"),
    )
    [print(result.html.absolute_links) for result in results]
The blog links are incrementing by 1.
How do I rewrite the code so that I can use a variable for looping over the numbers from 2 to 9?
The aim is to avoid the repetition of the lambda lines.

You can use the asterisk's iterable unpacking with a generator expression. Note the i=i default argument: it binds the current number to each lambda at definition time; without it, every lambda would see the final value of the loop variable when run() eventually calls it:
results = asession.run(
    *(lambda i=i: get_link(f"https://www.digitalocean.com/blog/{i}/") for i in range(2, 10))
)
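Putting it together with the rest of the script from the question, a minimal sketch (same URLs, with a plain for loop for the printing) might look like this:
#!/usr/bin/env python
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

async def get_link(url):
    r = await asession.get(url)
    return r

if __name__ == "__main__":
    results = asession.run(
        # i=i captures the current number; the lambdas are only called later by run()
        *(lambda i=i: get_link(f"https://www.digitalocean.com/blog/{i}/") for i in range(2, 10))
    )
    for result in results:
        print(result.html.absolute_links)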

Related

How to pass a callable as an argument to `functools.partial`

This code
import itertools
import functools

i = itertools.cycle(["1", "2"])

def f1():
    return next(i)

def f2(a):
    print(a)

f = functools.partial(f2, f1())
f()
f()
produces output 1 1.
Is there an obvious way to prevent calculating f1 when f is created, so result output would be 1 2?
How about using a closure instead of functools?
import itertools

i = itertools.cycle(["1", "2"])

def f1():
    return next(i)

def f2(a):
    print(a)

def wrap(target, fun):
    def inner():
        target(fun())
    return inner

f = wrap(f2, f1)
f()  # 1
f()  # 2
I was able to achieve the intended output using lambda instead of functools.partial:
import itertools

i = itertools.cycle(["1", "2"])

def f1():
    return next(i)

def f2(a):
    print(a)

f = lambda: f2(f1())
f()
f()
import itertools
import functools

i = itertools.cycle(["1", "2"])

def f1(a):
    print(next(a))

f = functools.partial(f1, i)
f()
f()
This just moves the work onto the argument passed to the partial: the cycle object itself is bound early, and next() is only called when f is called.
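If you want this deferred behavior in a reusable form, here is a minimal sketch of a deferred-evaluation partial; the helper name lazy_partial is made up for illustration and is not part of functools:
import itertools
import functools

def lazy_partial(func, *arg_factories):
    # each positional argument is a zero-argument callable; it is only
    # evaluated when the wrapper itself is called
    @functools.wraps(func)
    def wrapper():
        return func(*(factory() for factory in arg_factories))
    return wrapper

i = itertools.cycle(["1", "2"])

def f1():
    return next(i)

def f2(a):
    print(a)

f = lazy_partial(f2, f1)
f()  # 1
f()  # 2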

Python multi processing on for loop

I have a function with two parameters:
reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(i, url):
    print("%s is %s" % (i, url))
I want to trigger the above function using multiprocessing:
from multiprocessing import Pool

if __name__ == '__main__':
    p = Pool(5)
    print(p.map([crawl(i,url) for i in reqs]))
The above code is not working for me. Can anyone please help me with this?
----- ADDING NEW CODE ---------
from multiprocessing import Pool

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(combined_args):
    print("%s is %s" % (combined_args[0], combined_args[1]))

def main():
    p = Pool(5)
    print(p.map(crawl, [(i, url) for i in reqs]))

if __name__ == '__main__':
    main()
When I try to execute the above code, I get the error below.
According to the multiprocessing.Pool.map documentation, this is the signature:
map(func, iterable[, chunksize])
In your first version you are passing map a single iterable instead of (func, iterable).
Please refer to the following example of multiprocessing.pool (source):
import time
from multiprocessing import Pool

work = (["A", 5], ["B", 2], ["C", 1], ["D", 3])

def work_log(work_data):
    print(" Process %s waiting %s seconds" % (work_data[0], work_data[1]))
    time.sleep(int(work_data[1]))
    print(" Process %s Finished." % work_data[0])

def pool_handler():
    p = Pool(2)
    p.map(work_log, work)

if __name__ == '__main__':
    pool_handler()
Note that a single argument is passed to the work_log function, and inside the function the index is used to access the relevant fields.
Referring to your example:
from multiprocessing import Pool

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(combined_args):
    print("%s is %s" % (combined_args[0], combined_args[1]))

def main():
    p = Pool(5)
    print(p.map(crawl, [(i, url) for i in reqs]))

if __name__ == '__main__':
    main()
Results in:
1223 is pass a url
1456 is pass a url
1243 is pass a url
20455 is pass a url
[None, None, None, None] # This is the output of the map function
Issue resolved. The crawl function should be in a separate module, like below:
crawler.py
def crawl(combined_args):
    print("%s is %s" % (combined_args[0], combined_args[1]))
run.py
from multiprocessing import Pool
import crawler

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def main():
    p = Pool(5)
    print(p.map(crawler.crawl, [(i, url) for i in reqs]))

if __name__ == '__main__':
    main()
Then the output will be:
1223 is pass a url
1456 is pass a url
1243 is pass a url
20455 is pass a url
[None, None, None, None]  # This is the output of the map function
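As an aside, if you prefer to keep crawl with its original two parameters, Pool.starmap (available since Python 3.3) unpacks each argument tuple for you; a minimal sketch:
from multiprocessing import Pool

reqs = [1223, 1456, 1243, 20455]
url = "pass a url"

def crawl(i, url):
    print("%s is %s" % (i, url))

def main():
    with Pool(5) as p:
        # starmap unpacks each (i, url) tuple into crawl's two parameters
        p.starmap(crawl, [(i, url) for i in reqs])

if __name__ == '__main__':
    main()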

Multiprocessing error: function not defined

The following returns "NameError: name 'times_2' is not defined", and I can't figure out why:
def pass_data(data): return times_2(data)
def times_2(data): return data*2
import multiprocessing
multiprocessing.pool = Pool()
pool.ncpus = 2
res = pool.map(pass_data, range(5))
print(res)
What I'm actually trying to do is apply a function to a pandas dataframe. However, because I can't use a lambda function to do this:
pool.map(lambda x: x.apply(get_weather, axis=1), df_split)
instead I have this with the following helper methods, but it throws a "NameError: name 'get_weather' is not defined":
def get_weather(df):
    # *do stuff*
    return weather_df

def pass_dataframe(df):
    return df.apply(get_weather, axis=1)

results = pool.map(pass_dataframe, df_split)
Try using Pool like this:
from multiprocessing import Pool

def pass_data(data): return times_2(data)
def times_2(data): return data*2

with Pool(processes=4) as pool:
    res = pool.map(pass_data, range(5))
    print(res)
On Windows:
from multiprocessing import Pool

def pass_data(data): return times_2(data)
def times_2(data): return data*2

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        res = pool.map(pass_data, range(5))
        print(res)
See docs https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming
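For the actual pandas use case from the question, the same pattern applies: define get_weather and pass_dataframe at module level and guard the Pool with if __name__ == '__main__'. A minimal sketch, where the DataFrame contents, the "temp" column, and the body of get_weather are placeholders:
from multiprocessing import Pool
import numpy as np
import pandas as pd

def get_weather(row):
    # placeholder for the real per-row lookup
    return row["temp"] * 2

def pass_dataframe(df):
    # apply get_weather to every row of one chunk
    return df.apply(get_weather, axis=1)

if __name__ == '__main__':
    df = pd.DataFrame({"temp": range(10)})
    df_split = np.array_split(df, 4)  # one chunk per worker
    with Pool(processes=4) as pool:
        results = pool.map(pass_dataframe, df_split)
    print(pd.concat(results))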

Appending to merged async generators in Python

I'm trying to merge a bunch of asynchronous generators in Python 3.7 while still adding new async generators on iteration. I'm currently using aiostream to merge my generators:
from asyncio import sleep, run
from aiostream.stream import merge

async def go():
    yield 0
    await sleep(1)
    yield 50
    await sleep(1)
    yield 100

async def main():
    tasks = merge(go(), go(), go())
    async for v in tasks:
        print(v)

if __name__ == '__main__':
    run(main())
However, I need to be able to continue adding to the running tasks once the loop has begun. Something like:
from asyncio import sleep, run
from aiostream.stream import merge

async def go():
    yield 0
    await sleep(1)
    yield 50
    await sleep(1)
    yield 100

async def main():
    tasks = merge(go(), go(), go())
    async for v in tasks:
        if v == 50:
            tasks.merge(go())
        print(v)

if __name__ == '__main__':
    run(main())
The closest I've got to this is using the aiostream library but maybe this can also be written fairly neatly with just the native asyncio standard library.
Here is an implementation that should work efficiently even with a large number of async iterators:
import asyncio

class merge:
    def __init__(self, *iterables):
        self._iterables = list(iterables)
        self._wakeup = asyncio.Event()

    def _add_iters(self, next_futs, on_done):
        for it in self._iterables:
            it = it.__aiter__()
            nfut = asyncio.ensure_future(it.__anext__())
            nfut.add_done_callback(on_done)
            next_futs[nfut] = it
        del self._iterables[:]
        return next_futs

    async def __aiter__(self):
        done = {}
        next_futs = {}

        def on_done(nfut):
            done[nfut] = next_futs.pop(nfut)
            self._wakeup.set()

        self._add_iters(next_futs, on_done)
        try:
            while next_futs:
                await self._wakeup.wait()
                self._wakeup.clear()
                for nfut, it in done.items():
                    try:
                        ret = nfut.result()
                    except StopAsyncIteration:
                        continue
                    self._iterables.append(it)
                    yield ret
                done.clear()
                if self._iterables:
                    self._add_iters(next_futs, on_done)
        finally:
            # if the generator exits with an exception, or if the caller stops
            # iterating, make sure our callbacks are removed
            for nfut in next_futs:
                nfut.remove_done_callback(on_done)

    def append_iter(self, new_iter):
        self._iterables.append(new_iter)
        self._wakeup.set()
The only change required for your sample code is that the method is named append_iter, not merge.
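For reference, a minimal usage sketch with the go() generator from the question, adding only one extra generator so the loop terminates:
async def go():
    yield 0
    await asyncio.sleep(1)
    yield 50
    await asyncio.sleep(1)
    yield 100

async def main():
    tasks = merge(go(), go(), go())
    added = False
    async for v in tasks:
        if v == 50 and not added:
            added = True
            tasks.append_iter(go())  # append a new generator mid-iteration
        print(v)

if __name__ == '__main__':
    asyncio.run(main())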
This can be done using stream.flatten with an asyncio queue to store the new generators.
import asyncio
from aiostream import stream, pipe

async def main():
    queue = asyncio.Queue()
    await queue.put(go())
    await queue.put(go())
    await queue.put(go())

    xs = stream.call(queue.get)
    ys = stream.cycle(xs)
    zs = stream.flatten(ys, task_limit=5)

    async with zs.stream() as streamer:
        async for item in streamer:
            if item == 50:
                await queue.put(go())
            print(item)
Notice that you may tune the number of tasks that can run at the same time using the task_limit argument. Also note that zs can be elegantly defined using the pipe syntax:
zs = stream.call(queue.get) | pipe.cycle() | pipe.flatten(task_limit=5)
Disclaimer: I am the project maintainer.
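For completeness, this snippet assumes the go() generator from the question is defined in the same module, and it can be started with:
if __name__ == '__main__':
    asyncio.run(main())
Note that, as written, the pipeline keeps waiting on the queue for new generators, so it appears to run until interrupted rather than exiting on its own.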

EOF error when using Pool

In my code, I am trying to use multiprocessing to find the max price of each coin given a URL. There are around 1400 coins that I have to get data for, so I implemented Python's multiprocessing Pool. I'm not sure if I am using it correctly, but I followed the example given from this website: https://docs.python.org/3.4/library/multiprocessing.html?highlight=process
Here is my code:
import requests
import json
from bs4 import BeautifulSoup
from multiprocessing import Pool

max_prices = []

def find_max(url):
    # finds maximum price of a coin
    r = requests.get(url)
    cont = r.json()
    prices = list(map(lambda x: x[1], cont["price_usd"]))
    maxPrice = max(prices)
    return maxPrice

with open("coins.txt", "r") as f:
    data = json.load(f)

coin_slug = [d["slug"] for d in data]
coin_names = [d["name"] for d in data]

urls = []
for item in coin_slug:
    url = "https://graphs2.coinmarketcap.com/currencies/"+item+"/"
    urls.append(url)

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(find_max, urls)
When I added this part of the code, it gave me an EOF error:
if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(find_max, urls)
You have unbalanced brackets in the last line. It should be:
print(p.map(find_max, urls))
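For clarity, the corrected final block from the question, with the closing parenthesis added:
if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(find_max, urls))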
