Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more.
Closed 1 year ago.
I'm a beginner in Clojure and I'm interested in how to process concurrent user requests in a web application. As a node.js developer, I use the event loop (promises/callbacks/async-await) to access the database, so the main thread isn't blocked and I can serve other users' requests without waiting for a database response. How can I process concurrent user requests in Clojure? If I make a request to the database, my thread is blocked and I can't serve any other user requests. Should I create a new thread for each user? I know I can use futures in Clojure to run work on another thread, but how does that help with a single DB request? If I create a new thread and then wait on it, my main thread is still blocked waiting for the child thread that queries the database. I don't understand how futures can help me in this situation. What is the best practice, or do servers such as Jetty provide better processing out of the box? I would be very grateful for any help with this.
The short answer is that a typical Java web server uses many threads in its implementation, some of which are assigned for use by application code when processing requests. At any instant in time, several user requests may be in-flight. There is usually an application thread pool dedicated for application code, managed by the web server.
As a Clojure application developer, there is nothing to do to make use of the application thread pool. You do not need to write code to allocate new threads. Your web application will automatically be handling concurrent user requests, in distinct threads managed by the web server. There may be some configuration available, depending on the web server library you use. For example, ring-jetty-adapter has options to control the number of threads in the thread pool.
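For intuition, the mechanics can be sketched in plain Java (the web server's own language) with a fixed thread pool. This is a stand-in for what Jetty does internally, not Jetty's actual API; `handleRequest` is a hypothetical application handler, and the pool size and request count are arbitrary:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadPoolSketch {
    // The "web server" owns this pool; application code never touches it.
    static final ExecutorService serverPool = Executors.newFixedThreadPool(8);

    // Hypothetical application handler, run on whichever pool thread the server picked.
    public static String handleRequest(int userId) {
        // A blocking database call here blocks only this one pool thread;
        // the other pool threads keep serving other users meanwhile.
        return "response-for-" + userId;
    }

    public static void main(String[] args) throws Exception {
        List<Future<String>> inFlight = new ArrayList<>();
        for (int user = 0; user < 100; user++) {      // 100 concurrent requests
            final int id = user;
            inFlight.add(serverPool.submit(() -> handleRequest(id)));
        }
        for (Future<String> f : inFlight) f.get();    // wait for all responses
        serverPool.shutdown();
    }
}
```

The point of the sketch is that the submitting code, like your application handler, never allocates threads itself; it only supplies the work.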
There are Clojure libraries which enable asynchronous processing, a model that will be very familiar to you as a node.js developer. To be honest, though, async is not the first tool a beginner should reach for: making good use of the existing concurrency features in Java web servers (i.e. thread pools) is often sufficient for handling hundreds of concurrent requests.
The most well-known Clojure async libraries are core.async, manifold, promesa and, more recently, missionary.
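For contrast with the thread-per-request model, the asynchronous style those libraries enable looks roughly like `CompletableFuture` chaining on the JVM. Here `queryDb` is a made-up stand-in for a non-blocking database client, not any real driver API:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncSketch {
    // Made-up non-blocking "DB query": returns a future immediately.
    public static CompletableFuture<String> queryDb(String sql) {
        return CompletableFuture.supplyAsync(() -> "rows-for:" + sql);
    }

    public static CompletableFuture<String> handle(String userId) {
        // Chain work onto the future instead of blocking the request thread,
        // much like promise chaining / async-await in node.js.
        return queryDb("select * from users where id=" + userId)
                .thenApply(rows -> "rendered(" + rows + ")");
    }

    public static void main(String[] args) {
        System.out.println(handle("42").join());
    }
}
```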
Related
I have a multi-client, single-server application where the client and server are connected through sockets. The client and server run on different machines.
In the client application, the client socket connects to the server and sends data to it periodically.
In the server application, a server socket listens for clients to connect. When a client connects, a new thread is created to receive that client's data.
For example: 1 client = 1 thread created by the server for receiving data. With 10,000 clients, the server creates 10,000 threads. This seems neither good nor scalable.
My Application is in Java.
Is there an alternate method for this problem?
Thanks in advance
This is a typical C10K problem. There are patterns to solve it; one example is the Reactor pattern.
Java NIO is another way, in which incoming requests can be processed in a non-blocking fashion. See a reference implementation here.
Yes, you should not need a separate thread for each client. There's a good tutorial here that explains how to use await to handle asynchronous socket communication. Once you receive data over the socket you can use a fixed number of threads. The tutorial also covers techniques to handle thousands of simultaneous communications.
Unfortunately given the complexity it's not possible to post the code here, so although link-only answers are frowned upon ...
I would say it's a perfect candidate for an Erlang/Elixir application; WhatsApp and RabbitMQ are built on Erlang.
Erlang processes are cheap and fast to start, and Erlang manages the scheduling for you, so you don't have to think about the number of threads, CPUs or even machines. Erlang also manages garbage collection for each process once you don't need it anymore.
Haskell is slow; Erlang is fast enough for most applications that are not doing heavy calculations, and even then you can use it and hand off the heavy lifting to a C process.
What are you writing in?
Yes, you can use the Actor model, with e.g. Akka or Akka.NET. This allows you to create millions of actors that run on, say, 4 threads. Erlang is a programming language that implements the actor model natively.
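To make the model concrete, a minimal, hypothetical actor in plain Java is just a mailbox drained by one loop, so the actor's private state needs no locks. Real Akka actors add supervision, routing, and scheduling over shared thread pools, so treat this only as a sketch of the idea:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class MiniActor {
    // Messages to this actor are handled one at a time, in arrival order,
    // so the state below is only ever touched by the actor loop.
    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    private int state = 0;

    public MiniActor() {
        Thread loop = new Thread(() -> {
            try {
                while (true) mailbox.take().run();
            } catch (InterruptedException stopped) { /* actor shut down */ }
        });
        loop.setDaemon(true);  // don't keep the JVM alive for the demo
        loop.start();
    }

    public void send(int delta) { mailbox.add(() -> state += delta); }

    public int ask() throws Exception {
        CompletableFuture<Integer> reply = new CompletableFuture<>();
        mailbox.add(() -> reply.complete(state));
        return reply.get(5, TimeUnit.SECONDS);  // FIFO: prior sends are applied first
    }
}
```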
However, actors and non-blocking code won't do you much good if you are relying on blocking library calls for backend services that you rely on, such as (the most prominent example in the JVM world) JDBC calls.
There is also a rather interesting approach that Haskell uses, called green threads. It means that the runtime threads are very lightweight and are dynamically mapped to OS threads. It also means that you get a certain amount of scalability "for free", with no need to write non-blocking IO code. It does however require a good IO manager in the runtime to schedule the IO operations efficiently, and GHC Haskell has had a substantial amount of work put into that in recent years.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam.
Closed 8 years ago.
We're about to start working on a parallel cloud-processing application, and I'm looking for good resources on how to set things up. Let me set the context:
First we load a DB with a whole lot of data
Then we have n-instances of cloud services that will generate PDFs from the data
PDFs will then be merged, again should be scaleable
Result stored in DB
Done.
I'm looking for resources to help me answer questions like:
How can you measure progress for all of these instances? I suppose one controller instance that's monitoring. Should we use polling or a pub/sub system?
How can you control these n-instances to start/stop/pause, whatever?
Should the controller process be aware of the processors, or should it listen for broadcasts?
We're thinking about a 'data' queue, from which each PDF-generation instance gets its instructions to process. Should we also use a 'command' queue for commands like 'start/stop'?
Or - is there already something out of the box for this? I'm looking for the 'Patterns of Enterprise Application Architecture' but tailored to scaling out/parallel/cloud processing.
Any thoughts?
EDIT
Thanks for the -1. In case you're in doubt: I have Googled it, I have searched Pluralsight, and I have looked through Azure videos. I haven't run across any patterns describing a process controller/processor setup.
As @JuneT mentioned, look at the Cloud Design Patterns guide. I would recommend the Leader Election pattern, as mentioned in the comment.
Some other thoughts:
I don't think you should have a single fixed controller instance. All instances should have an equal opportunity to become the leader; this way you're guarding against the leader failing. For electing a leader, you should look into the Lease Blob functionality.
You should look into Windows Azure Diagnostics as a way to monitor the health of these instances. Windows Azure Diagnostics also supports custom performance counters, which you can use to monitor the effectiveness of each instance. For scaling, you can rely on the Windows Azure Auto Scaling feature available in the portal, or look for 3rd-party solutions like AzureWatch from Paraleap Software. The process responsible for scaling should not be part of your solution IMHO; it should sit somewhere outside.
So the general sequence of the events could be:
All instances will compete among themselves to become the leader. Only one instance will be elected; all other instances will wait to hear from the leader (let's call them followers).
The leader will fetch the data from the database and push the information into a queue, which the followers poll. Once messages arrive in the queue, the followers start processing them. When a follower finishes a task, it goes back to the queue to see if there's more work. Once the leader has put all the messages in the queue, it becomes a follower itself and works on processing messages.
Once all messages have been processed, all instances go back to step 1.
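Under stated assumptions, the sequence above can be sketched in Java: an `AtomicBoolean` stands in for the lease-blob election, an in-memory queue stands in for the Azure queue, and incrementing a counter stands in for generating one PDF:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class LeaderSketch {
    // Stand-in for the lease blob: the first instance to flip it wins the lease.
    static final AtomicBoolean lease = new AtomicBoolean(false);
    static final BlockingQueue<String> workQueue = new LinkedBlockingQueue<>();
    public static final AtomicInteger processed = new AtomicInteger();

    static void runInstance(int docCount) {
        if (lease.compareAndSet(false, true)) {
            // Leader: fetch the "data" and enqueue one message per PDF.
            for (int i = 0; i < docCount; i++) workQueue.add("pdf-" + i);
        }
        // Everyone, including the leader once it has enqueued, acts as a follower.
        while (workQueue.poll() != null) {
            processed.incrementAndGet();   // stand-in for generating one PDF
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService instances = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) instances.submit(() -> runInstance(100));
        instances.shutdown();
        instances.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Each queued message is removed exactly once, so every "PDF" is generated by exactly one instance no matter how the followers interleave.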
I don't think you need a controller at all. What business process starts the data load into the DB? Whatever that process is, it can populate a queue with messages describing which PDFs need to be generated.
You then want a Worker Role with N servers that basically keep watching the queue, pull messages off it if there are any, process them (i.e. generate the PDFs), and remove the messages from the queue.
An autoscaling solution like Azure's native basic autoscaling, or a more powerful one like AzureWatch can create extra servers when the queue has messages in it and remove the non-needed servers when queues are depleted.
This is a very standard approach to distributing load across N number of instances. Progress can be measured by looking at the queue and see how many messages are left in it.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I'm reading a post about a multi-threading performance issue on massive multicore machines:
http://www.reddit.com/r/Python/comments/1mn12l/what_you_do_not_like_in_python/ccbc5h8
The author of that post claims that on massive multicore systems, multithreaded applications take a much bigger performance hit than multiprocess ones.
AFAIK multithreading is cheaper than multiprocessing nowadays (both in terms of system management and context switching).
For simplicity let's assume that we don't need to use locks.
If we don't use locks to protect shared memory, are there any system limitations to manage multithreading applications and their access to resources?
Is there any reason, other than userspace implementation details, why multithreading would have the huge performance impact the post author saw?
In other words: what system-level property causes a data-intensive multithreaded application to perform badly compared to a similar multiprocess solution?
I'm aware about semantic difference between threads and processes.
Threads share a view of memory that processes do not. If you have a case where executors frequently need to make changes to the view of memory, a multi-threaded approach can be slower than a multi-process approach because of contention for the locks that internally protect the view of memory.
Threads also share file descriptors. If you have a case where files are frequently opened and closed, threads could wind up blocking each other for access to the process file descriptor table. A multi-process approach won't have this issue.
There can also be internal synchronization overhead in library functions. In a single-thread case, locks that protect process-level structures can be no-ops. In a multi-threaded case, these locks may require expensive atomic operations.
Lastly, multi-threaded processes may require frequent access to thread-local storage to implement things like errno. On some platforms, these accesses can be expensive and can be avoided in a single-threaded process.
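A small Java illustration of the first point: eight threads share one counter and must serialize on a lock, whereas eight processes would each own a private counter and never contend. This shows the contention structure only; it is not a benchmark:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedMemoryDemo {
    public static long counter = 0;              // the shared view of memory
    static final Object lock = new Object();     // protects counter

    public static void main(String[] args) throws Exception {
        final int threads = 8, perThread = 100_000;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (lock) { counter++; }  // every increment contends
                }
                done.countDown();
            });
        }
        done.await();  // without the lock, the final count would be unpredictable
        pool.shutdown();
    }
}
```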
I'm currently trying to decide on a design for a TCP server where the services the server provides consist of performing synchronous I/O (tons of DB queries in existing code!).
The system this server will be part of has a couple of hundred clients that are typically all connected simultaneously and stay connected for several hours.
Basically all client requests are the result of human interaction, so the frequency is low but the response time should be as fast as possible.
As I said, the service implementation must perform synchronous I/O, so a fully event based server is obviously out of the question.
Threads seem like a natural choice for serializing blocking I/O, but you see advice not to use more threads than CPU cores.
Currently I'm leaning towards using a thread pool with a number of threads that is actually higher than the core count, since the threads will mostly be blocking anyway.
Would such a design be reasonable? What alternatives exist for a server with these requirements?
It seems (yes, again from other posts, not my direct experience) that 200 .NET managed threads 'causes all kinds of problems', while 200 unmanaged threads is not a big problem on Windows/Linux.
200 persistent database connections, however, seems like a lot! You may wish to pool them, or use a thread pool for DB access with suitable inter-thread comms.
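A minimal sketch of that pooling idea, with a `BlockingQueue` of pre-opened connections (plain objects stand in for real database connections here):

```java
import java.util.Collection;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

public class ConnectionPool<C> {
    // Fixed set of pre-opened "connections"; borrowers block while the pool is
    // empty, so e.g. 200 client threads can share 20 database connections.
    private final BlockingQueue<C> idle;

    public ConnectionPool(Collection<C> connections) {
        idle = new LinkedBlockingQueue<>(connections);
    }

    public <R> R withConnection(Function<C, R> query) throws InterruptedException {
        C conn = idle.take();          // block until a connection is free
        try {
            return query.apply(conn);  // run the (blocking) query
        } finally {
            idle.put(conn);            // always hand it back to the pool
        }
    }
}
```

Each client-handling thread calls `withConnection` and simply waits its turn when all connections are busy, which caps the load on the database without capping the number of client threads.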
I am looking for any strategies people use when implementing server applications that service client TCP (or UDP) requests: design patterns, implementation techniques, best practices, etc.
Let's assume for the purposes of this question that the requests are relatively long-lived (several minutes) and that the traffic is time sensitive, so no delays are acceptable in responding to messages. Also, we are both servicing requests from clients and making our own connections to other servers.
My platform is .NET, but since the underlying technology is the same regardless of platform, I'm interested to see answers for any language.
The modern approach is to make use of the operating system to multiplex many network sockets for you, freeing your application to process only the active connections that have traffic.
Whenever you open a socket, it is associated with a selector. You use a single thread to poll that selector. Whenever data arrives, the selector indicates which socket is active; you hand that operation off to a child thread and continue polling.
This way you only need a thread for each concurrent operation. Sockets which are open but idle will not tie up a thread.
Using the select() and poll() methods
Building Highly Scalable Servers with Java NIO
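A minimal runnable sketch of that selector loop in Java NIO, with a loopback client folded in so the accept/read path is visible. This is a demo, not production code: a real server must handle partial reads, writes, and key cleanup:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorSketch {
    // One selector multiplexes the listening socket and every client socket,
    // so a single thread can watch thousands of mostly idle connections.
    public static String serveOneMessage() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));   // any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Demo "client": connects over loopback and sends one message.
        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        client.write(ByteBuffer.wrap("ping".getBytes()));

        String received = null;
        while (received == null) {
            selector.select();                        // block until something is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {             // new connection: register it
                    SocketChannel conn = server.accept();
                    conn.configureBlocking(false);
                    conn.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {        // data arrived: read (or hand off)
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    ((SocketChannel) key.channel()).read(buf);
                    buf.flip();
                    received = new String(buf.array(), 0, buf.limit());
                }
            }
            selector.selectedKeys().clear();
        }
        client.close();
        server.close();
        selector.close();
        return received;
    }
}
```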
A more sophisticated approach would be to use I/O completion ports (Windows).
With I/O completion ports you leave the polling to the operating system, which lets it apply a very high level of optimization with NIC driver support.
Basically, you have an OS-managed queue of network operations and provide a callback function that is called when an operation completes. A bit like (hard-drive) DMA, but for the network.
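For readers on the JVM: Java's NIO.2 asynchronous channels expose the same completion-callback model (the JDK implements them with I/O completion ports on Windows). A minimal loopback sketch, again a demo rather than production code:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class CompletionSketch {
    // Queue up accept/read operations and hand the OS a callback for each;
    // no application thread blocks while the I/O is in flight.
    public static String readOneMessage() throws Exception {
        AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel
                .open().bind(new InetSocketAddress("127.0.0.1", 0));
        CompletableFuture<String> result = new CompletableFuture<>();

        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            public void completed(AsynchronousSocketChannel conn, Void att) {
                ByteBuffer buf = ByteBuffer.allocate(64);
                conn.read(buf, null, new CompletionHandler<Integer, Void>() {
                    public void completed(Integer bytes, Void att) {
                        buf.flip();   // the read finished; decode what arrived
                        result.complete(new String(buf.array(), 0, buf.limit()));
                    }
                    public void failed(Throwable t, Void att) {
                        result.completeExceptionally(t);
                    }
                });
            }
            public void failed(Throwable t, Void att) {
                result.completeExceptionally(t);
            }
        });

        // Demo client: connect over loopback and send one message.
        AsynchronousSocketChannel client = AsynchronousSocketChannel.open();
        client.connect(server.getLocalAddress()).get();
        client.write(ByteBuffer.wrap("hello".getBytes())).get();

        String message = result.get(5, TimeUnit.SECONDS);
        client.close();
        server.close();
        return message;
    }
}
```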
Len Holgate wrote an excellent series on I/O completion ports a few years ago on CodeProject:
http://www.codeproject.com/KB/IP/jbsocketserver2.aspx
And
I also found an article on I/O completion ports for .NET (I haven't read it, though):
http://www.codeproject.com/KB/cs/managediocp.aspx
I would also say that it is easy to use completion ports compared to trying to write a scalable alternative. The problem is that they are only available on NT (2000, XP, Vista).
If you were using C++ and the Win32 directly then I'd suggest that you read up about overlapped I/O and I/O Completion ports. I have a free C++, IOCP, client/server framework with complete source code, see here for more details.
Since you're using .NET, you should be looking at the asynchronous socket methods so that you don't need to have a thread for every connection; there are several links from this blog posting of mine that may be useful starting points: http://www.lenholgate.com/blog/2005/07/disappointing-net-sockets-article-in-msdn-magazine-this-month.html (some of the best links are in the comments to the original posting!)
G'day,
I'd start by looking at the metaphor you want to use for your thread framework.
Maybe "leader/follower", where one thread listens for incoming requests, and when a new request comes in it does the work while the next thread in the pool takes over listening for incoming requests.
Or a thread pool, where the same thread is always listening for incoming requests and then passes each request over to the next available thread in the pool.
You might like to visit the Reactor section of the ACE components to get some ideas.
HTH.
cheers,
Rob