CherryPy handling requests - multithreading

I've been searching for a while now but can't find an answer.
I know that CherryPy creates a new thread for handling requests (GET, PUT, POST, DELETE etc).
Now I fetch the parameters like this:
...
@cherrypy.tools.json_in()
@cherrypy.tools.json_out()
def POST(self):
    Forum.lock_post.acquire()
    conn = self.io.psqlConnect(self.dict_psql)
    cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    params = cherrypy.request.json
    ...
    return some_dict
As you can see, I'm acquiring a lock to avoid a race condition on the variable params. But is this really necessary? I'm asking because if I do it like this, all other POST requests have to wait. Is there a better solution that doesn't lock the whole POST handler? I use params several times throughout the code.

First, a clarification: CherryPy doesn't create a new thread for each request. It keeps a fixed pool of threads (10 by default), and each thread in the pool handles one request at a time.
As for whether you should lock cherrypy.request.json: you don't need to. CherryPy relies on "thread locals", objects that resolve to a different value depending on which thread accesses them (see threading.local in the Python docs), so each thread sees only the request it is currently handling.
Having said that, you should still make sure that the code you write doesn't interfere with state used by other threads (you can use cherrypy.thread_data for per-thread storage as a quick fix).
Take a look at the CherryPy plugin architecture: if you want a resource to be shared among threads, a plugin is usually the way to go: http://docs.cherrypy.org/en/latest/extend.html#plugins
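To make the "thread locals" idea concrete, here is a minimal standard-library sketch. It does not use CherryPy at all: request_state, handle and the worker names are hypothetical stand-ins for a per-request object such as cherrypy.request, and the barrier exists only to force all threads to overlap.

```python
import threading

# Hypothetical stand-in for a per-request object such as cherrypy.request.
request_state = threading.local()

# Makes every thread store its value before any thread reads it back.
barrier = threading.Barrier(3)

def handle(payload, results):
    request_state.json = payload   # stored separately for each thread
    barrier.wait()                 # all three threads have now written
    params = request_state.json    # still this thread's own value, no lock needed
    results[threading.current_thread().name] = params

results = {}
threads = [
    threading.Thread(target=handle, args=({"id": i}, results), name=f"worker-{i}")
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Even though all three threads assign to the same attribute name at the same time, each one reads back the payload it stored itself, which is why no lock is needed around that read.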

Related

is `requests` post operation thread-safe?

I'm trying to understand whether requests.post is thread safe. In other words, assuming the server side can (and will) handle multiple requests concurrently, will something like this
def f(i):
    response = requests.post(...)
    # do something with the response

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    for i in range(10):
        executor.submit(f, i)
work as expected?
This question and the discussion around it suggest that, although the requests package claims to be thread-safe, the Session() object is not. In any case, it's not clear to me whether this has any bearing on my specific case (perhaps because I don't have a clear grasp of the relationship between sessions, cookies, and the requests.post operation).

Is it a good idea to use ThreadPoolExecutor with one worker?

I have a simple REST service which allows you to create tasks. When a client requests a task, the service returns a unique task number and starts executing the task in a separate thread. The easiest way to implement it:
class Executor:
    def __init__(self, max_workers=1):
        self.executor = ThreadPoolExecutor(max_workers)

    def execute(self, body, task_number):
        # some logic
        pass

def some_rest_method(request):
    body = json.loads(request.body)
    task_id = generate_task_id()
    Executor(max_workers=1).execute(body, task_id)
    return Response({'taskId': task_id})
Is it a good idea to create a ThreadPoolExecutor with one (!) worker every time, given that each request means one new task (one new thread)? Perhaps it is worth putting the tasks in a queue somehow? Or maybe the best option is to create a regular thread every time?
"Is it a good idea to create each time ThreadPoolExecutor..."
No. That completely defeats the purpose of a thread pool. The reason for using a thread pool is so that you don't create and destroy a new thread for every request. Creating and destroying threads is expensive. The idea of a thread pool is that it keeps the "worker thread(s)" alive and re-uses it/them for each next request.
"...with just one thread"
There's a good use-case for a single-threaded executor, though it probably does not apply to your problem. The use-case is, you need a sequence of tasks to be performed "in the background," but you also need them to be performed sequentially. A single-thread executor will perform the tasks, one after another, in the same order that they were submitted.
"Perhaps it is worth putting them in the queue somehow?"
You already are putting them in a queue. Every thread pool has a queue of pending tasks. When you submit a task (i.e., executor.submit(...)), that puts the task into the queue.
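As a minimal sketch of the points above, the following creates one pool up front and reuses it for every request, instead of building a new ThreadPoolExecutor per call. EXECUTOR, run_task, handle_request and completed are hypothetical names, and the task body is a placeholder for real work.

```python
from concurrent.futures import ThreadPoolExecutor

# One pool, created once and shared by all requests.
EXECUTOR = ThreadPoolExecutor(max_workers=1)

completed = []  # only the single worker thread appends to this list

def run_task(task_id):
    # Placeholder for the real background work.
    completed.append(task_id)
    return task_id

def handle_request(task_id):
    # submit() only enqueues the task and returns a Future right away.
    return EXECUTOR.submit(run_task, task_id)

futures = [handle_request(task_id) for task_id in range(5)]
results = [f.result() for f in futures]  # block until each task has finished
```

With a single worker, tasks leave the pool's internal queue one at a time in the order they were submitted, so completed ends up in submission order. Raising max_workers trades that ordering guarantee for parallelism.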
"What's the best way...in my case?"
The bones of a simplistic server look something like this (pseudo-code):
POOL = ThreadPoolExecutor(...with however many threads seem appropriate...)

def service():
    socket = create_a_socket_that_listens_on_whatever_port()
    while True:
        client_connection = socket.accept()
        # hand the accepted connection to a pool thread
        POOL.submit(request_handler, connection=client_connection)

def request_handler(connection):
    request = receive_request_from(connection)
    reply = generate_reply_based_on(request)
    send_reply_to(reply, connection)
    connection.close()

def main():
    initialize_stuff()
    service()
Of course, there are many details that I have left out. I can't design it for you. Especially not in Python. I've written servers like this in other languages, but I'm pretty new to Python.
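For readers who want something runnable, here is one possible Python rendering of that pseudo-code, offered as a sketch rather than a recommended design. The pool size, the uppercase "reply", the bounded accept loop, and the helper names start_demo_server and ask are all assumptions made so the example can run locally and terminate.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

POOL = ThreadPoolExecutor(max_workers=4)  # pool size is an arbitrary assumption

def request_handler(connection):
    """Read one request, send one reply, then close the connection."""
    with connection:
        request = connection.recv(1024)
        connection.sendall(request.upper())  # stand-in for real reply logic

def service(listener, max_requests):
    """Accept connections and hand each one to the shared pool."""
    for _ in range(max_requests):  # a real server would loop forever
        client_connection, _address = listener.accept()
        POOL.submit(request_handler, client_connection)

def start_demo_server(max_requests=1):
    """Start the accept loop in a background thread and return its port."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    port = listener.getsockname()[1]
    threading.Thread(target=service, args=(listener, max_requests), daemon=True).start()
    return port

def ask(port, payload):
    """Tiny client used only to exercise the server."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)
        return client.recv(1024)

port = start_demo_server()
reply = ask(port, b"hello")
```

The accept loop does nothing except accept and submit, so a slow request ties up one pool thread without stopping the server from accepting the next connection.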

Multi Thread Requests Python3

I have researched this topic a lot, but I still can't figure out how to send multi-threaded POST requests using Python 3.
names = ["dfg","dddfg","qwed"]
for name in names:
    res = requests.post(url, data=name)
    res.text
Here I want to send all these names and I want to use multi threading to make it faster.
Solution 1 - concurrent.futures.ThreadPoolExecutor fixed number of threads
Using a custom function (request_post) you can do almost anything.
import concurrent.futures
import requests

def request_post(url, data):
    return requests.post(url, data=data)

# url and names are the values from the question
with concurrent.futures.ThreadPoolExecutor() as executor:  # default number of worker threads
    res = [executor.submit(request_post, url, data) for data in names]
    concurrent.futures.wait(res)
res will be a list of Future instances, each wrapping the requests.Response of one request. To access a requests.Response, use res[index].result(), where index ranges from 0 to len(names) - 1.
Future objects give you better control over the responses received, such as checking whether a call completed correctly, raised an exception, or timed out.
You also avoid the problems that come with a high number of threads (see solution 2).
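The kind of control Future objects offer can be sketched as follows. To keep the example runnable offline, fake_post is a hypothetical stand-in for requests.post, the URL is a placeholder, and one deliberately empty name is included to trigger a failure.

```python
import concurrent.futures

def fake_post(url, data):
    # Hypothetical stand-in for requests.post so the sketch runs offline.
    if not data:
        raise ValueError("empty payload")
    return f"posted {data} to {url}"

url = "http://example.invalid/submit"  # placeholder URL
names = ["dfg", "", "qwed"]            # the empty string triggers the error path

outcomes = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Remember which name each Future belongs to.
    future_to_name = {executor.submit(fake_post, url, name): name for name in names}
    # as_completed yields each Future as soon as it finishes, in any order.
    for future in concurrent.futures.as_completed(future_to_name, timeout=10):
        name = future_to_name[future]
        error = future.exception()  # None if the call returned normally
        outcomes[name] = future.result() if error is None else f"failed: {error}"
```

One failed request does not affect the others: its exception is stored on its own Future and only surfaces when exception() or result() is called on it.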
Solution 2 - multiprocessing.dummy.Pool and spawn one thread for each request
Might be useful if you are not requesting a lot of pages, or if the response time is quite slow.
from multiprocessing.dummy import Pool as ThreadPool
import itertools
import requests

with ThreadPool(len(names)) as pool:  # creates a Pool of 3 threads
    # each (url, name) pair becomes one requests.post(url, name) call
    res = pool.starmap(requests.post, zip(itertools.repeat(url), names))
pool.starmap is used to pass (map) multiple arguments to one function (requests.post) that is called by the threads in the pool (ThreadPool). It returns a list of requests.Response objects, one for each request made.
itertools.repeat(url) is needed so that the first argument is repeated for every name.
names supplies the second positional argument of requests.post, which is data, so it works without naming the optional parameter explicitly. The pool is sized to len(names) so that every request gets its own thread.
This code will not work if you need to pass another parameter, such as an optional keyword argument.
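One common way around that limitation is to fix the extra arguments with functools.partial and let pool.map supply only the argument that varies. The sketch below assumes this approach; fake_post is a hypothetical stand-in that mirrors the shape of requests.post(url, data=..., timeout=...) so the example runs without a network.

```python
from functools import partial
from multiprocessing.dummy import Pool as ThreadPool

def fake_post(url, data=None, timeout=None):
    # Hypothetical stand-in for requests.post; it just echoes its arguments.
    return (url, data, timeout)

url = "http://example.invalid/submit"  # placeholder URL
names = ["dfg", "dddfg", "qwed"]

# Freeze the arguments that are identical for every call...
post_name = partial(fake_post, url, timeout=5)

with ThreadPool(len(names)) as pool:
    # ...so that map() only supplies the one argument that changes (data).
    res = pool.map(post_name, names)
```

pool.map returns the results in the same order as names, so res[i] corresponds to names[i].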

Why pass parameters through thread function?

When I create a new thread in a program, why do I pass the variables I want that thread to use as parameters (as a void pointer) through the thread function's prototype? Since threads share the same memory segments (except for the stack) as the main program, shouldn't I be able to just use the variables directly instead of passing parameters from the main program to the new thread?
Well, yes, you could use the variables directly. Maybe. Assuming that they aren't changed by some other thread before your thread starts running.
Also, a big part of passing parameters to functions (including thread functions) is to limit the amount of information the called function has to know about the outside world. If you pass the thread function everything it needs in order to do its work, then you can change the rest of the program with relative impunity and the thread will still continue to work. If, however, you force the thread to know that there is a global list of strings called MyStringList, then you can't change that global list without also affecting the thread.
Information hiding. Encapsulation. Separation of concerns. Etc.
You cannot pass parameters to a thread function in any kind of normal register/stack manner because thread functions are not called by the creating thread. They are given execution directly by the underlying OS, and the APIs that do this copy a fixed number of parameters (usually only one void pointer) to the new and different stack of the new thread.
As Jim says, failure to understand this mechanism often results in disaster. There are numerous questions on SO where the variables that developers hoped a new thread would use have been destroyed by RAII before the new thread even starts.
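The question is about C-style thread APIs, but the hazard of a shared variable changing before the thread runs can be sketched in Python as well. This is only an analogy under that assumption: shared, read_shared, record and the two result lists are hypothetical names, and the threads are deliberately started after the loop so the outcome is deterministic.

```python
import threading

seen_via_shared = []
seen_via_args = []

def record(value, sink):
    sink.append(value)

# Variant A: the thread function looks up a shared variable when it runs.
shared = {"value": None}

def read_shared():
    record(shared["value"], seen_via_shared)

threads_a = []
for i in range(3):
    shared["value"] = i
    threads_a.append(threading.Thread(target=read_shared))
# The loop has finished, so every thread will observe the final value, 2.
for t in threads_a:
    t.start()
for t in threads_a:
    t.join()

# Variant B: each thread receives its own value as a parameter at creation time.
threads_b = [threading.Thread(target=record, args=(i, seen_via_args)) for i in range(3)]
for t in threads_b:
    t.start()
for t in threads_b:
    t.join()
```

In variant A every thread sees whatever the shared variable holds at the moment it runs, while in variant B each thread works with the value that was handed to it, regardless of what the creating code does afterwards.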

Asynchronous IO in Scala with futures

Let's say I'm getting a (potentially big) list of images to download from some URLs. I'm using Scala, so what I would do is:
import scala.actors.Futures._
// Retrieve URLs from somewhere
val urls: List[String] = ...
// Download image (blocking operation)
val fimages: List[Future[...]] = urls.map (url => future { download url })
// Do something (display) when complete
fimages.foreach (_.foreach (display _))
I'm a bit new to Scala, so this still looks a little like magic to me:
1. Is this the right way to do it? Any alternatives if it is not?
2. If I have 100 images to download, will this create 100 threads at once, or will it use a thread pool?
3. Will the last instruction (display _) be executed on the main thread, and if not, how can I make sure it is?
Thanks for your advice!
Use Futures in Scala 2.10. They were joint work between the Scala team, the Akka team, and Twitter to reach a more standardized future API and implementation for use across frameworks. We just published a guide at: http://docs.scala-lang.org/overviews/core/futures.html
Beyond being completely non-blocking (by default, though we provide the ability to do managed blocking operations) and composable, Scala 2.10's futures come with an implicit thread pool to execute your tasks on, as well as some utilities to manage timeouts.
import scala.concurrent.{future, blocking, Future, Await}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Retrieve URLs from somewhere
val urls: List[String] = ...

// Download image (blocking operation)
val imagesFuts: List[Future[...]] = urls.map {
  url => future { blocking { download(url) } }
}

// Do something (display) when complete
val futImages: Future[List[...]] = Future.sequence(imagesFuts)
Await.result(futImages, 10.seconds).foreach(display)
Above, we first import a number of things:
future: API for creating a future.
blocking: API for managed blocking.
Future: Future companion object which contains a number of useful methods for collections of futures.
Await: singleton object used for blocking on a future (transferring its result to the current thread).
ExecutionContext.Implicits.global: the default global thread pool, a ForkJoin pool.
duration._: utilities for managing durations for time outs.
imagesFuts remains largely the same as what you originally did. The only difference is that we wrap the download in blocking, the managed blocking construct. It notifies the thread pool that the block of code you pass to it contains long-running or blocking operations. This allows the pool to temporarily spawn new workers so that the workers are never all blocked at once, which prevents starvation (locking up the thread pool) in blocking applications. Note that the thread pool also knows when the code in a managed blocking block is complete, so it removes the spare worker thread at that point and the pool shrinks back down to its expected size.
(If you want to absolutely prevent additional threads from ever being created, then you ought to use an AsyncIO library, such as Java's NIO library.)
Then we use the collection methods of the Future companion object to convert imagesFuts from List[Future[...]] to a Future[List[...]].
The Await object is how we can ensure that display is executed on the calling thread-- Await.result simply forces the current thread to wait until the future that it is passed is completed. (This uses managed blocking internally.)
Use scala.concurrent.Future in 2.10, which is RC now and which uses an implicit ExecutionContext.
val all = Future.traverse(urls) { url =>
  val f = future(download(url)) /*(downloadContext)*/
  f.onComplete(display)(displayContext)
  f
}
Await.result(all, ...)
The new Future doc is explicit that onComplete (and foreach) may evaluate immediately if the value is available. The old actors Future does the same thing. Depending on what your requirement is for display, you can supply a suitable ExecutionContext (for instance, a single thread executor). If you just want the main thread to wait for loading to complete, traverse gives you a future to await on.
1. Yes, seems fine to me, but you may want to investigate the more powerful twitter-util or Akka Future APIs (Scala 2.10 will have a new Future library in this style).
2. It uses a thread pool.
3. No, it won't. You need to use the standard mechanism of your GUI toolkit for this (SwingUtilities.invokeLater for Swing or Display.asyncExec for SWT). E.g.
fimages.foreach (_.foreach(im => SwingUtilities.invokeLater(new Runnable { def run() = display(im) })))

Resources