My project consists of consuming an API that is built on top of the AWS Lambda service. The lead who built the API tells me that, technically, there is no fixed request limit since the service is elastic, but that it is important to take into account the number of requests per second the API can support.
To control the number of requests per second, the Python script I am developing uses asyncio and httpx to consume the API concurrently. Using the max_connections parameter of httpx.Limits, I am trying to find the optimal value so that the API does not freeze.
My problem is that I don't know whether I am misinterpreting the max_connections parameter. When testing with a value of 1000, my understanding is that I am making 1000 concurrent requests per second to the API, but even so, the API freezes after a certain time.
I would like to be able to control the limit of requests per second without the need to use third-party libraries.
How could I do it?
Here is my MWE
import asyncio
import json

import httpx

async def consume(client, reg, endpoint: str = '/create'):
    # Parameters without a default must come before those with one
    data = {"param1": reg[1]}
    response = await client.post(url=endpoint, data=json.dumps(data))
    return response.json()

async def run(self, regs):
    # Empty list to consolidate all responses
    results = []
    # httpx limits configuration
    limits = httpx.Limits(max_keepalive_connections=None, max_connections=1000)
    timeout = httpx.Timeout(connect=60.0, read=30.0, write=30.0, pool=60.0)
    # httpx client context
    async with httpx.AsyncClient(base_url='https://apiexample', headers={'Content-Type': 'application/json'},
                                 limits=limits, timeout=timeout) as client:
        # regs is a list of more than 1000000 tuples
        tasks = [asyncio.ensure_future(consume(client=client, reg=reg))
                 for reg in regs]
        result = await asyncio.gather(*tasks)
        results += result
    return results
Thanks in advance.
Your leader is wrong: there is a request limit for AWS Lambda (by default, 1000 concurrent executions).
The AWS API is highly unlikely to "freeze" (there are many layers of protection), so I would look for a problem on your side.
Start debugging by lowering the concurrent connections setting (e.g. to 100), and explore other settings if this doesn't fix the issue.
More info: https://www.bluematador.com/blog/why-aws-lambda-throttles-functions
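On the max_connections part of the question: that setting caps how many connections httpx holds open at once, not how many requests start per second. Below is a minimal sketch of a requests-per-second cap using only the standard library. The names RateLimiter, limited_call, run_all and fake_request are hypothetical, and fake_request is only a stand-in for the real client.post call, so treat this as an illustration under those assumptions rather than a drop-in fix.

```python
import asyncio
import time


class RateLimiter:
    """Space out calls so that roughly `rate` of them start per second."""

    def __init__(self, rate: float):
        self._interval = 1.0 / rate
        self._next_slot = 0.0

    async def wait(self) -> None:
        # No await happens between reading and updating _next_slot, so
        # coroutines on a single event loop cannot interleave here.
        now = time.monotonic()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        await asyncio.sleep(slot - now)


async def fake_request(reg):
    # Stand-in (assumption) for `await client.post(...)`; takes about 10 ms.
    await asyncio.sleep(0.01)
    return {"param1": reg[1]}


async def limited_call(limiter, semaphore, reg):
    await limiter.wait()      # cap on requests started per second
    async with semaphore:     # cap on requests in flight at the same time
        return await fake_request(reg)


async def run_all(regs, rate=50.0, max_in_flight=100):
    limiter = RateLimiter(rate)
    semaphore = asyncio.Semaphore(max_in_flight)
    tasks = [limited_call(limiter, semaphore, reg) for reg in regs]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    regs = [(i, f"value-{i}") for i in range(10)]
    print(asyncio.run(run_all(regs)))
```

The two limits are independent: the limiter controls the start rate, while the semaphore bounds how many calls are pending when responses are slow. With more than a million records, it may also be worth feeding run_all in chunks instead of creating every task up front.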
Related
I have an application where I need to process several pdfs using Azure Computer Vision SDK. I am following this example from the official documentation.
From what I have understood, we can submit pdfs by
# Async SDK call that "reads" the image
response = client.read_in_stream(filepath, raw=False)
and get the results using operation_location
# Get ID from returned headers
operation_location = response.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]
# SDK call that gets what is read
while True:
    result = client.get_read_result(operation_id)
    if result.status.lower() not in ['notstarted', 'running']:
        break
    print('Waiting for result...')
    time.sleep(10)
return result
I have noticed that read_in_stream takes around 10 to 30 seconds (depending on number of pages, images in pdf, quality, etc.) and would like to use asyncio to concurrently proceed to next tasks instead of waiting for pdfs to just submit. I tried using joblib backends to speed this up (multithreading, multiprocessing and also a combination of the two), but the performance was just 2-2.5 times better even after tweaking other parameters of joblib.
I want to know the correct way of using asyncio for this problem and would like to keep using the azure SDK objects instead of resorting to requests library and dealing with raw json responses. I think with requests, asyncio and aiohttp library this could be achieved as below, but how to proceed with azure SDK?
timeout = aiohttp.ClientTimeout(total=3600)
async with aiohttp.ClientSession(timeout=timeout) as session:
    async with session.get(url) as resp:
        # Submit pdfs and get its result
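One possible direction, assuming the SDK calls are blocking, is to keep the SDK objects and hand each blocking call to a thread through the event loop's run_in_executor. The sketch below makes no Azure calls: submit_and_poll is a hypothetical stand-in for the read_in_stream plus get_read_result sequence, and process_all is an illustrative name.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def submit_and_poll(filepath: str) -> str:
    # Hypothetical stand-in for the blocking SDK calls (submit, then poll).
    time.sleep(0.1)
    return f"result for {filepath}"


async def process_all(filepaths, max_workers=8):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            loop.run_in_executor(pool, submit_and_poll, path)
            for path in filepaths
        ]
        # gather returns results in the same order as filepaths
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(process_all([f"doc{i}.pdf" for i in range(8)])))
```

This is still thread-based underneath, so the gain depends on how much of the wait is spent blocked on the network rather than on local work.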
In my application, the state of a common object is changed by making requests, and the response depends on the state.
class SomeObj():
    def __init__(self, param):
        self.param = param

    def query(self):
        self.param += 1
        return self.param

global_obj = SomeObj(0)

@app.route('/')
def home():
    flash(global_obj.query())
    return render_template('index.html')
If I run this on my development server, I expect to get 1, 2, 3 and so on. If requests are made from 100 different clients simultaneously, can something go wrong? The expected result would be that the 100 different clients each see a unique number from 1 to 100. Or will something like this happen:
Client 1 queries. self.param is incremented by 1.
Before the return statement can be executed, the thread switches over to client 2. self.param is incremented again.
The thread switches back to client 1, and the client is returned the number 2, say.
Now the thread moves to client 2 and returns him/her the number 3.
Since there were only two clients, the expected results were 1 and 2, not 2 and 3. A number was skipped.
Will this actually happen as I scale up my application? What alternatives to a global variable should I look at?
You can't use global variables to hold this sort of data. Not only is it not thread safe, it's not process safe, and WSGI servers in production spawn multiple processes. Not only would your counts be wrong if you were using threads to handle requests, they would also vary depending on which process handled the request.
Use a data source outside of Flask to hold global data. A database, memcached, or redis are all appropriate separate storage areas, depending on your needs. If you need to load and access Python data, consider multiprocessing.Manager. You could also use the session for simple data that is per-user.
The development server may run in single thread and process. You won't see the behavior you describe since each request will be handled synchronously. Enable threads or processes and you will see it. app.run(threaded=True) or app.run(processes=10). (In 1.0 the server is threaded by default.)
Some WSGI servers may support gevent or another async worker. Global variables are still not thread safe because there's still no protection against most race conditions. You can still have a scenario where one worker gets a value, yields, another modifies it, yields, then the first worker also modifies it.
If you need to store some global data during a request, you may use Flask's g object. Another common case is some top-level object that manages database connections. The distinction for this type of "global" is that it's unique to each request, not used between requests, and there's something managing the set up and teardown of the resource.
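To make the thread-level race from the question concrete, here is a small sketch in which the increment and the read happen under one lock, so simulated clients each get a unique number. Counter and simulate_clients are hypothetical names. Note the limitation: a lock like this only protects threads inside one process and does nothing across the multiple processes a production WSGI server spawns, which is why external storage is recommended above.

```python
import threading


class Counter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def query(self) -> int:
        # The increment and the read happen under one lock, so no other
        # thread can slip in between them.
        with self._lock:
            self._value += 1
            return self._value


def simulate_clients(n_clients: int = 100):
    counter = Counter()
    seen = []
    seen_lock = threading.Lock()

    def client():
        value = counter.query()
        with seen_lock:
            seen.append(value)

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(seen)


if __name__ == "__main__":
    # Every simulated client sees a distinct number from 1 to 100.
    print(simulate_clients(100) == list(range(1, 101)))
```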
This is not really an answer to thread safety of globals.
But I think it is important to mention sessions here.
You are looking for a way to store client-specific data. Every connection should have access to its own pool of data, in a threadsafe way.
This is possible with server-side sessions, and they are available in a very neat flask plugin: https://pythonhosted.org/Flask-Session/
If you set up sessions, a session variable is available in all your routes and it behaves like a dictionary. The data stored in this dictionary is individual for each connecting client.
Here is a short demo:
from flask import Flask, session
from flask_session import Session

app = Flask(__name__)
# Check Configuration section for more details
SESSION_TYPE = 'filesystem'
app.config.from_object(__name__)
Session(app)

@app.route('/')
def reset():
    session["counter"] = 0
    return "counter was reset"

@app.route('/inc')
def routeA():
    if "counter" not in session:
        session["counter"] = 0
    session["counter"] += 1
    return "counter is {}".format(session["counter"])

@app.route('/dec')
def routeB():
    if "counter" not in session:
        session["counter"] = 0
    session["counter"] -= 1
    return "counter is {}".format(session["counter"])

if __name__ == '__main__':
    app.run()
After pip install Flask-Session, you should be able to run this. Try accessing it from different browsers; you'll see that the counter is not shared between them.
Another example of a data source external to requests is a cache, such as what's provided by Flask-Caching or another extension.
Create a file common.py and place in it the following:
from flask_caching import Cache
# Instantiate the cache
cache = Cache()
In the file where your flask app is created, register your cache with the following code:
from pathlib import Path

from flask import Flask

# Import cache
from common import cache
# ...
app = Flask(__name__)
cache.init_app(app=app, config={"CACHE_TYPE": "filesystem", "CACHE_DIR": Path('/tmp')})
Now use it throughout your application by importing the cache and calling it as follows:
# Import cache
from common import cache
# store a value
cache.set("my_value", 1_000_000)
# Get a value
my_value = cache.get("my_value")
While totally accepting the previous upvoted answers, and discouraging use of global variables for production and scalable Flask storage, for the purpose of prototyping or really simple servers, running under the flask 'development server'...
...
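Individual operations on Python's built-in data types (I personally used and tested a global dict) are thread safe in CPython, as per the Python documentation. They are not process safe, and compound read-modify-write sequences still need a lock.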
The insertions, lookups, and reads from such a (server global) dict will be OK from each (possibly concurrent) Flask session running under the development server.
When such a global dict is keyed with a unique Flask session key, it can be rather useful for server-side storage of session specific data otherwise not fitting into the cookie (max size 4 kB).
Of course, such a server global dict should be carefully guarded for growing too large, being in-memory. Some sort of expiring the 'old' key/value pairs can be coded during request processing.
Again, it is not recommended for production or scalable deployments, but it is possibly OK for local task-oriented servers where a separate database is too much for the given task.
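A minimal sketch of such a server-global store with expiry, assuming a single-process development server. ExpiringStore and its ttl_seconds parameter are hypothetical names, and the optional now argument exists only to make the behaviour easy to check.

```python
import threading
import time


class ExpiringStore:
    """In-memory dict whose entries are dropped after ttl_seconds."""

    def __init__(self, ttl_seconds: float):
        self._ttl = ttl_seconds
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        with self._lock:
            self._purge(now)
            self._data[key] = (now, value)

    def get(self, key, default=None, now=None):
        now = time.monotonic() if now is None else now
        with self._lock:
            self._purge(now)
            entry = self._data.get(key)
            return default if entry is None else entry[1]

    def _purge(self, now):
        # Drop every entry older than the time-to-live.
        expired = [k for k, (ts, _) in self._data.items() if now - ts > self._ttl]
        for k in expired:
            del self._data[k]


if __name__ == "__main__":
    store = ExpiringStore(ttl_seconds=10)
    store.set("session-abc", {"cart": [1, 2]}, now=0)
    print(store.get("session-abc", now=5))    # still present
    print(store.get("session-abc", now=11))   # expired, prints None
```

Purging on every access keeps the sketch short; for a larger store, purging every N requests would be cheaper.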
...
I was trying various approaches with python multi-threading to see which one fits my requirements. To give an overview, I have a bunch of items that I need to send to an API. Then based on the response, some of the items will go to a database and all the items will be logged; e.g., for an item if the API returns success, that item will only be logged but when it returns failure, that item will be sent to database for future retry along with logging.
Now, based on the API response, I can separate the successful items from the failed ones and make a batch query with all the failed items, which will improve my database performance. To do that, I am accumulating all requests in one place and trying to perform multithreaded API calls (since this is an IO-bound task, I'm not even thinking about multiprocessing), but at the same time I need to keep track of which response belongs to which request.
Coming to the actual question, I tried two different approaches which I thought would give nearly identical performance, but there turned out to be a huge difference.
To simulate the API call, I created an API on my localhost with a 500 ms sleep (for average processing time). Please note that I want to start logging and inserting into the database only after all API calls are complete.
Approach - 1(With threading.Thread and queue.Queue())
import requests
import datetime
import threading
import queue
def target(data_q):
    while not data_q.empty():
        data_q.get()
        response = requests.get("https://postman-echo.com/get?foo1=bar1&foo2=bar2")
        print(response.status_code)
        data_q.task_done()

if __name__ == "__main__":
    data_q = queue.Queue()
    for i in range(0, 20):
        data_q.put(i)
    start = datetime.datetime.now()
    num_thread = 5
    for _ in range(num_thread):
        worker = threading.Thread(target=target(data_q))
        worker.start()
    data_q.join()
    print('Time taken multi-threading: '+str(datetime.datetime.now() - start))
I tried with 5, 10, 20 and 30 times and the results are below correspondingly,
Time taken multi-threading: 0:00:06.625710
Time taken multi-threading: 0:00:13.326969
Time taken multi-threading: 0:00:26.435534
Time taken multi-threading: 0:00:40.737406
What shocked me here is, I tried the same without multi-threading and got almost same performance.
Then after some googling around, I was introduced to futures module.
Approach - 2(Using concurrent.futures)
import datetime
import traceback
from concurrent import futures

import requests

def fetch_url(im_url):
    try:
        response = requests.get(im_url)
        return response.status_code
    except Exception:
        traceback.print_exc()

if __name__ == "__main__":
    data = []
    for i in range(0, 20):
        data.append(i)
    start = datetime.datetime.now()
    urls = ["https://postman-echo.com/get?foo1=bar1&foo2=bar2" + str(item) for item in data]
    with futures.ThreadPoolExecutor(max_workers=5) as executor:
        responses = executor.map(fetch_url, urls)
        for ret in responses:
            print(ret)
    print('Time taken future concurrent: ' + str(datetime.datetime.now() - start))
Again with 5, 10, 20 and 30 times and the results are below correspondingly,
Time taken future concurrent: 0:00:01.276891
Time taken future concurrent: 0:00:02.635949
Time taken future concurrent: 0:00:05.073299
Time taken future concurrent: 0:00:07.296873
Now I've heard about asyncio, but I've not used it yet. I've also read that it gives even better performance than futures.ThreadPoolExecutor().
Final question: if both approaches use threads (or so I think), why is there such a huge performance gap? Am I doing something terribly wrong? I looked around but was not able to find a satisfying answer. Any thoughts on this would be highly appreciated. Thanks for going through the question.
[Edit 1]The whole thing is running on python 3.8.
[Edit 2] Updated code examples and execution times. Now they should run on anyone's system.
The documentation of ThreadPoolExecutor explains in detail how many threads are started when the max_workers parameter is not given, as in your example. The behaviour is different depending on the exact Python version, but the number of tasks started is most probably more than 3, the number of threads in the first version using a queue. You should use futures.ThreadPoolExecutor(max_workers= 3) to compare the two approaches.
For the updated Approach - 1, I suggest modifying the for loop a bit:
for _ in range(num_thread):
    target_to_run = target(data_q)
    print('target to run: {}'.format(target_to_run))
    worker = threading.Thread(target=target_to_run)
    worker.start()
The output will be like this:
200
...
200
200
target to run: None
target to run: None
target to run: None
target to run: None
target to run: None
Time taken multi-threading: 0:00:10.846368
The problem is that the Thread constructor expects a callable object or None as its target. You do not give it a callable; instead, queue processing happens on the first invocation of target(data_q) by the main thread, and 5 threads are then started that do nothing because their target is None.
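A sketch of how the threads could be given a real callable: pass the function itself as target and its arguments through args. To keep it self-contained, fake_api_call is a hypothetical stand-in for requests.get, and worker and run_threads are illustrative names. The worker also uses get_nowait instead of checking empty() first, since another thread may drain the queue between the check and the get.

```python
import queue
import threading
import time


def fake_api_call(item):
    # Stand-in (assumption) for requests.get(...); takes about 50 ms.
    time.sleep(0.05)
    return item * 2


def worker(data_q, results):
    while True:
        try:
            item = data_q.get_nowait()
        except queue.Empty:
            return  # queue drained, let the thread finish
        results[item] = fake_api_call(item)
        data_q.task_done()


def run_threads(n_items=20, num_thread=5):
    data_q = queue.Queue()
    for i in range(n_items):
        data_q.put(i)
    results = {}
    threads = [
        # Pass the function itself plus its arguments; do not call it here.
        threading.Thread(target=worker, args=(data_q, results))
        for _ in range(num_thread)
    ]
    for t in threads:
        t.start()
    data_q.join()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    start = time.monotonic()
    run_threads()
    print("Time taken multi-threading:", time.monotonic() - start)
```

Keying results by the queued item also keeps track of which response belongs to which request.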
I have researched this topic a lot, but I am not able to figure out how to send multi-threaded POST requests using Python 3.
names = ["dfg","dddfg","qwed"]
for name in names:
    res = requests.post(url, data=name)
    res.text
Here I want to send all these names and I want to use multi threading to make it faster.
Solution 1 - concurrent.futures.ThreadPoolExecutor fixed number of threads
Using a custom function (request_post) you can do almost anything.
import concurrent.futures
import requests

def request_post(url, data):
    return requests.post(url, data=data)

with concurrent.futures.ThreadPoolExecutor() as executor:  # optimally defined number of threads
    res = [executor.submit(request_post, url, data) for data in names]
    concurrent.futures.wait(res)
res will be a list of Future instances, one per request, each wrapping a requests.Response. To access a response, call res[index].result(), where index ranges from 0 to len(names) - 1.
Future objects give you finer control over the responses: you can check whether a call completed correctly, raised an exception, timed out, and so on. The concurrent.futures documentation has more details.
You also avoid the problems related to a high number of threads (see Solution 2).
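As an illustration of that control, the sketch below maps each Future back to the input that produced it and records either the result or the exception. fake_post is a hypothetical stand-in for requests.post (it raises on an empty payload just to show the error path), and post_all is an illustrative name.

```python
import concurrent.futures


def fake_post(url, data):
    # Hypothetical stand-in for requests.post(url, data=data).
    if not data:
        raise ValueError("empty payload")
    return f"{url} <- {data}"


def post_all(url, names):
    outcomes = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Remember which input each future belongs to.
        future_to_name = {executor.submit(fake_post, url, n): n for n in names}
        for future in concurrent.futures.as_completed(future_to_name):
            name = future_to_name[future]
            try:
                outcomes[name] = ("ok", future.result())
            except Exception as exc:
                outcomes[name] = ("error", repr(exc))
    return outcomes


if __name__ == "__main__":
    print(post_all("http://example.invalid", ["dfg", "", "qwed"]))
```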
Solution 2 - multiprocessing.dummy.Pool and spawn one thread for each request
Might be useful if you are not requesting a lot of pages, or if the response time is quite slow.
from multiprocessing.dummy import Pool as ThreadPool
import itertools
import requests

with ThreadPool(len(names)) as pool:  # creates a Pool of 3 threads
    # starmap takes the function itself and an iterable of argument tuples
    res = pool.starmap(requests.post, zip(itertools.repeat(url), names))
pool.starmap is used to pass (map) multiple arguments to one function (requests.post) that is called by a pool of threads (ThreadPool). It returns a list of requests.Response objects, one per request.
itertools.repeat(url) repeats the first argument (url) for every call; zip pairs it with each entry of names to build the argument tuples.
names supplies the second positional argument of requests.post (data), so it works without naming the optional parameter data explicitly. Its length matches the number of threads being created.
This code will not work if you need to pass another parameter by keyword, such as an optional one.
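If a keyword argument is needed, one option (a sketch, not the only way) is to bind it with functools.partial and use pool.map instead of starmap. fake_post is a hypothetical stand-in that mimics the url, data, keyword shape of requests.post, and post_names is an illustrative name.

```python
from functools import partial
from multiprocessing.dummy import Pool as ThreadPool


def fake_post(url, data=None, timeout=None):
    # Hypothetical stand-in for requests.post; echoes its arguments.
    return {"url": url, "data": data, "timeout": timeout}


def post_names(url, names, timeout=5):
    # partial fixes url and the keyword argument; map supplies `data`.
    call = partial(fake_post, url, timeout=timeout)
    with ThreadPool(len(names)) as pool:
        return pool.map(call, names)


if __name__ == "__main__":
    print(post_names("http://example.invalid", ["dfg", "dddfg", "qwed"]))
```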
I have a list of YouTube videos from different playlists and I need to check whether these videos are still valid (there are around 1000 of them). What I am doing at the moment is hitting YouTube using its API v2 and Groovy with this simple script:
import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.Method.GET
http = new HTTPBuilder('http://gdata.youtube.com')
myVideoIds.each { id ->
    if (!isValidYoutubeUrl(id)) {
        // do stuff
    }
}

boolean isValidYoutubeUrl (id) {
    boolean valid = true
    http.request(GET) {
        uri.path = "feeds/api/videos/${id}"
        headers.'User-Agent' = 'Mozilla/5.0 Ubuntu/8.10 Firefox/3.0.4'
        response.failure = { resp ->
            valid = false
        }
    }
    valid
}
But after a few seconds it starts to return 403 for every id (possibly because too many requests are being sent in quick succession). The problem is reduced if I insert something like Thread.sleep(3000). Is there a better solution than just delaying the requests?
In V2 of the API, there are time-based limits on how many requests you can make, but they aren't a hard and fast limit (that is, it depends somewhat on many under-the-hood factors and may not always be the same limit). Here's what the documentation says:
The YouTube API enforces quotas to prevent problems associated with
irregular API usage. Specifically, a too_many_recent_calls error
indicates that the API servers have received too many calls from the
same caller in a short amount of time. If you receive this type of
error, then we recommend that you wait a few minutes and then try your
request again.
You can avoid this by putting in a sleep like you do, but you'd want it to be 10-15 seconds or so.
It's more important, though, to implement batch processing. With this, you can make up to 50 requests at once (this counts as 50 requests against your overall request per day quota, but only as one against your per time quota). Batch processing with v2 of the API is a little involved, as you make a POST request to a batch endpoint first, and then based on those results you can send in the multiple requests. Here's the documentation:
https://developers.google.com/youtube/2.0/developers_guide_protocol?hl=en#Batch_processing
If you use v3 of the API, batch processing becomes quite a bit easier, as you just send 50 IDs at a time in the request. Change:
http = new HTTPBuilder('http://gdata.youtube.com')
to:
http = new HTTPBuilder('https://www.googleapis.com')
Then set your uri.path to
youtube/v3/videos?part=id&max_results=key={your API key}&id={variable here that represents 50 YouTube IDs, comma separated}
For 1000 videos, then, you'll only need to make 20 calls. Any video that doesn't come back in the list doesn't exist anymore (if you need to get video details, change the part parameter to be id,snippet,contentDetails or something appropriate for your needs).
Here's the documentation:
https://developers.google.com/youtube/v3/docs/videos/list#id
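The batching arithmetic can be sketched as follows, in Python rather than Groovy. It only builds the request URLs and compares ID sets; no HTTP call is made. chunked, build_video_list_urls and missing_ids are hypothetical helper names, and the sketch assumes only the part, id and key query parameters.

```python
from urllib.parse import urlencode


def chunked(ids, size=50):
    # Yield consecutive slices of at most `size` IDs.
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_video_list_urls(video_ids, api_key):
    base = "https://www.googleapis.com/youtube/v3/videos"
    urls = []
    for batch in chunked(video_ids, 50):
        query = urlencode({"part": "id", "id": ",".join(batch), "key": api_key})
        urls.append(f"{base}?{query}")
    return urls


def missing_ids(requested, returned):
    # IDs that were requested but did not come back no longer exist.
    returned_set = set(returned)
    return [vid for vid in requested if vid not in returned_set]


if __name__ == "__main__":
    ids = [f"vid{i}" for i in range(1000)]
    print(len(build_video_list_urls(ids, "YOUR_API_KEY")))  # 20 calls for 1000 IDs
```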