Logging frameworks and synchronization in multi-threaded apps

I want to use a logging framework like log4cxx in a multi-threaded application.
If the output of the log will be to a file, correct serialization of the messages is needed.
I was asking myself how (and if) these frameworks get correct serialization of the output without using some sort of synchronization object.
I guess that if it is using synchronization objects (for example to access a queue to log messages), this could cause changes in the behaviour of the involved threads, so also changing the behaviour (and bugs...) of the whole logged application.

log4cxx is indeed synchronized, like the other log4XXX frameworks. The synchronization is done in the appenders and is necessary to guarantee that the contents of different log entries are not mixed together. This does not change the behavior of your threads, but the threads do incur a small performance hit. That hit is small compared to the cost of the I/O itself when logging to a file.
If you are still worried about performance, you can consider asynchronous logging using the AsyncAppender, which handles logging in a separate thread. With the async approach you are not guaranteed that every message is logged (e.g. if the application crashes before the logging thread handles the message). The simplest way to improve performance is to reduce the amount of logging.
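
To make the asynchronous idea concrete, here is a minimal sketch using Python's standard logging module rather than log4cxx. It is an analogy, not the log4cxx API: QueueHandler plays the role of the cheap front end the worker threads call, and QueueListener plays the role of the separate logging thread. The logger name, thread names and message format are made up for the illustration.

```python
import io
import logging
import logging.handlers
import queue
import threading


def run_async_logging(num_threads=4, messages_per_thread=50):
    """Log from several threads through a queue; return the lines written."""
    sink = io.StringIO()  # stands in for the log file
    target = logging.StreamHandler(sink)
    target.setFormatter(logging.Formatter("%(threadName)s|%(message)s"))

    # Worker threads only enqueue records; one listener thread does the writing.
    log_queue = queue.Queue()
    listener = logging.handlers.QueueListener(log_queue, target)

    logger = logging.getLogger("async_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(logging.handlers.QueueHandler(log_queue))

    def worker(worker_id):
        for seq in range(messages_per_thread):
            logger.info("worker=%d seq=%d", worker_id, seq)

    listener.start()
    threads = [
        threading.Thread(target=worker, args=(n,), name=f"w{n}")
        for n in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # stop() waits until everything already queued has been written.
    # Records still in the queue when a process crashes would be lost.
    listener.stop()
    return sink.getvalue().splitlines()


if __name__ == "__main__":
    lines = run_async_logging()
    print(len(lines), "lines written, first:", lines[0])
```

Every line comes out whole because only the listener thread touches the sink. The price is the one mentioned above: anything still sitting in the queue at crash time never reaches the file.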

Related

Node.js vs ASP.Net Core: Response times when doing I/O heavy operations under stress test?

Let's assume we are stress testing 2 servers that perform database read/write operations without caching, and make network calls. One is a Node.js server, and the other is an ASP.Net Core one that utilizes Tasks. Assume that both perform the same operations, receive the same requests/second, and run on machines with equal processing power.
I've read that Node.js is very performant on I/O operations, but I can't get my head around it. Sure, the awaited db and network calls run in a non-blocking way, however each of these calls is handled by an individual thread from a limited thread pool. This doesn't sound performant to me at all. I would be forced to implement a caching mechanism to mitigate this, which is something I don't really like. So either Node.js is not the greatest choice for these kinds of operations, or I have incorrect knowledge.
As for ASP.NET Core, I don't know the internals of it, but I'm pretty sure it doesn't have the thread limitation issues Node.js has, so it should have shorter response times by logic. Yet I still can't know for sure if this is the case due to resource consumption and context switching cost concerns.
So which of these 2 would theoretically have shorter response times, and why exactly?

Minimal multithreaded transaction with Hibernate

I'm using Hibernate in an embedded Jetty server, and I want to be able to parallelize my data processing with some multithreading and still have it all be in the same transaction. As Sessions are not thread safe this means I need a way to get multiple sessions attached to the same transaction, which means I need to switch away from the "thread" session context I've been using.
By my understanding of the documentation, this means I need to switch to JTA session context, but I'm having trouble getting that to work. My research so far seems to indicate that it requires something external to Hibernate in the server to provide transaction management, and that Jetty does not have such a thing built in, so I would have to pull in some additional library to do it. The top candidates I keep running across for that generally seem to be large packages that do all sorts of other stuff too, which seems wasteful, confusing, and distracting when I'm just looking for the one specific feature.
So, what is the minimal, least disruptive setup and configuration change that will allow getCurrentSession() to return Sessions attached to the same transaction in different threads?
While I'm at it, I know that fetching objects in one thread and altering them in another is not safe, but what about reading their properties in another thread, for example calling toString() or a side effect free getter?

JMS MDB or ScheduledThreadPoolExecutor for asynchronous tasks

I've been using a JMS Message Driven Bean for a while and it is working great for asynchronous tasks. I know that there are many ways to handle asynchronous processes, but I am curious: how do a JMS Message Driven Bean and ScheduledThreadPoolExecutor compare, and what are the benefits of each?
For example, I have a web service which handles some tasks asynchronously. So I see two main differences. If I used ScheduledThreadPoolExecutor I wouldn't need an application server; I could use a servlet container, e.g. Tomcat, because I am not using any EJB features. For MDB I need an application server, e.g. Glassfish. But in terms of handling the actual asynchronous process, what are the advantages of ScheduledThreadPoolExecutor and MDB over each other?

ScheduledThreadPoolExecutor is used to schedule tasks; the abstraction that corresponds best to MDB is ExecutorService. But back to your question.
MDB is more heavyweight, its API is much more complex, and in principle it was designed for transferring data, not logic. ExecutorService, on the other hand, is a thin layer on top of an actual thread pool. So if you need performance, low latency and small overhead, go for an ordinary thread pool.
The only reason to use MDB and JMS is when you need durability and transaction support. That of course introduces even bigger overhead, as each message needs to be persisted. But you won't lose any tasks that are queued, or even in the middle of processing, because of a crash.

Why would I choose a threaded/process-based approach vs. asynchronous web server

As I've done some more research into web server software, I've begun to question whether Apache's thread/process-based method is the way to go vs. the asynchronous request handling provided by servers like Nginx and Lighttpd, which tend to scale better with heavier loads.
I understand there are many other differences between these latter two and Apache. My question is under what circumstances would I pick a thread/process based method over the asynchronous handling.
Are there any features/technologies that I can't use with an asynchronous method (or would function poorly/not as well)?
What situations would cause the performance of an asynchronous method to perform worse than a thread/process based approach? Are these common or rare cases, and how big is the difference?
Are there any other factors I should take into consideration when comparing the two? Keep in mind I'm focusing mainly on the thread/process based method vs. asynchronous, not any particular server software which happens to utilize one of these methods. These concerns might be difficulty of managing/debugging, security issues, etc.

This is old, but worth answering. Let's first start by saying how each model works.
In threaded, you have a request come in to a handler, the handler spawns a new OS thread to handle that request, and all work for that request happens in that thread until a response is sent and the thread is ended. This model supports as many concurrent requests as threads that your server can spawn (but threads can be somewhat heavyweight).
When doing async a request comes in to a handler but instead of creating a thread to deal with it, it adds the connection to what's known as an event loop. The event loop listens for data/state changes on the connection and fires callbacks each time "something" happens. Once the connection is added to the event loop, the handler immediately listens for new connections to add. This allows you to have many (sometimes 100K) concurrent connections at the same time.
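
As a rough sketch of the event-loop side (my own illustration, in Python's asyncio, not how Nginx or any particular server is implemented): each "connection" below is a coroutine that waits on simulated network I/O, and the loop multiplexes all of them on a single OS thread. The function names and the 0.05 s wait are assumptions made for the example.

```python
import asyncio
import threading
import time


async def handle_connection(conn_id, seen_threads):
    # Record which OS thread runs this handler.
    seen_threads.add(threading.get_ident())
    # Stand-in for waiting on network I/O; the loop runs other handlers meanwhile.
    await asyncio.sleep(0.05)
    return conn_id


async def serve(num_connections):
    seen_threads = set()
    results = await asyncio.gather(
        *(handle_connection(i, seen_threads) for i in range(num_connections))
    )
    return results, seen_threads


def run_event_loop_demo(num_connections=1000):
    """Return (connections handled, distinct OS threads used, elapsed seconds)."""
    start = time.perf_counter()
    results, seen_threads = asyncio.run(serve(num_connections))
    elapsed = time.perf_counter() - start
    return len(results), len(seen_threads), elapsed


if __name__ == "__main__":
    handled, thread_count, elapsed = run_event_loop_demo()
    print(f"{handled} connections on {thread_count} thread(s) in {elapsed:.2f}s")
```

All the waits overlap, so the total time is far below the sum of the individual waits even though no extra threads were created. A thread-per-request server would need one thread per connection to get the same overlap.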
Are there any features/technologies that I can't use with an asynchronous method (or would function poorly/not as well)?
Yes, when you're doing number crunching. The architecture of an async (or "evented") system is such that it is great at passing data around but not processing data. It can handle thousands of concurrent operations, but because it only runs on one OS thread, the callbacks it fires need to do as little as possible to get the most throughput. This is because if one of your callbacks does some number crunching that takes 5 seconds, your entire server is frozen for 5 seconds until that operation completes. The idea is to get data, send it to where it's going (database, API, etc) and send a response all with minimal processing.
Async is good for network I/O: passing data between multiple sources/destinations (and also user interfaces, but that's beyond this post).
What situations would cause the performance of an asynchronous method to perform worse than a thread/process based approach? Are these common or rare cases, and how big is the difference?
See above, but any time you're doing more CPU work than network I/O, you should switch to a threaded model. However, there are architecture workarounds...for instance, you could have an async app, and anytime it needs to do real work, it sends a job to a worker queue. However, if every request requires CPU processing then that architecture is overkill and you might as well just use a threaded server.
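
Here is a small sketch of that workaround, again in Python's asyncio as an assumption of mine. A heartbeat task counts how often the event loop gets to run while a slow job is in progress: run inline, the job freezes the loop; handed to a worker via run_in_executor, the loop keeps ticking. The time.sleep call is only a stand-in for slow work. For genuinely CPU-bound Python code a worker thread would still compete for the interpreter lock, so a process pool or an external job queue would be the closer match to the worker queue described above.

```python
import asyncio
import contextlib
import time


def slow_job(seconds):
    # Stand-in for "real work" that blocks whichever thread runs it.
    time.sleep(seconds)
    return "done"


async def ticks_during(run_job):
    """Run a job and count how many times a heartbeat task ran meanwhile."""
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat task start
    result = await run_job()
    beat.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await beat
    return result, ticks


async def inline_job():
    # Runs on the event loop thread: nothing else can run until it returns.
    return slow_job(0.2)


async def offloaded_job():
    # Hands the job to the default worker pool and waits without blocking the loop.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, slow_job, 0.2)


def compare():
    """Return (heartbeat ticks with inline job, ticks with offloaded job)."""
    _, inline_ticks = asyncio.run(ticks_during(inline_job))
    _, offloaded_ticks = asyncio.run(ticks_during(offloaded_job))
    return inline_ticks, offloaded_ticks


if __name__ == "__main__":
    inline_ticks, offloaded_ticks = compare()
    print("heartbeat ticks, inline:", inline_ticks, "offloaded:", offloaded_ticks)
```

In a real server the heartbeat stands for every other connection: while the inline job runs, none of them is served.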
Are there any other factors I should take into consideration when comparing the two? Keep in mind I'm focusing mainly on the thread/process based method vs. asynchronous, not any particular server software which happens to utilize one of these methods. These concerns might be difficulty of managing/debugging, security issues, etc.
Programming in async is generally more complicated than threaded. That said, if you're not doing the programming yourself (ie you're choosing between nginx and apache) then I usually recommend you go async (nginx) because you'll generally be able to squeeze more juice out of your server that way. I'm always in favor of using as much async in the stack as possible.
That said, if you're programming an app and trying to decide whether to use a threaded or async model, you will have to take developer time into account. Unless you're using a language that has green threads over an event loop (like scheme), expect to tear your hair out quite a bit over rogue exceptions crashing your entire app and in general wrapping your head around CPS/using callbacks for everything. Futures/promises are your friend, but are only a bandaid to make async nicer.
TL;DR
Async, when used in a server, can squeeze (a lot) more concurrent operations than threading if you're doing network IO and nothing else.
If you're doing any kind of number crunching, either use a threaded app server or use an async app with a background queuing system.
Async is a lot harder to program in unless your language supports "fake" threading over it (ie green threads). Once you get past the initial hump you're fine, generally. If you don't have green threads, use promises.
If you have the choice between threaded and async as a component in your stack (apache vs nginx), and they provide the exact same features, slightly favor async. Don't just pick it because you think it will make everything 20x faster though.

Processes have several advantages compared to threads and async models related to security and reliability. Most websites don't need these particular advantages, but sometimes they're indispensable.
Security: you can run your worker processes in a sandbox, as a low privileged user, and handle only one request per worker process. This mitigates against some kinds of security vulnerabilities: even if an attacker takes over your entire worker process, as long as you sandboxed it tightly based on request metadata (i.e. it doesn't have write access to all your data), then it can't harm system stability or affect the responses made to requests.
Security #2: sometimes you need to sandbox untrusted code, or to enforce segregation between different code or different requests, and the only way to do this is with a separate one-shot process. (Think running user-provided code.)
Reliability: memory leaks and memory corruption are much less severe if you tear down and replace worker processes regularly (or for each request).
It's easy to enforce hard limits on CPU time, disk and network quota, etc. spent on handling a user request in a separate process. Even if the request-handling code goes into an infinite loop, the master process (or the OS) can enforce a timeout.
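
A minimal sketch of the timeout part, assuming a one-shot child process per request and using Python's subprocess module. The helper name run_request_in_child and the idea of passing the request handler as a code string are hypothetical, chosen only to keep the example self-contained.

```python
import subprocess
import sys


def run_request_in_child(code, timeout_seconds):
    """Run `code` in a fresh Python process and enforce a wall-clock limit.

    Returns ("finished", stdout) or ("killed", None) if the limit was hit.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return ("killed", None)
    return ("finished", completed.stdout.strip())


if __name__ == "__main__":
    print(run_request_in_child("print(6 * 7)", 10))
    # An infinite loop cannot take the parent down with it.
    print(run_request_in_child("while True: pass", 0.5))
```

The parent stays responsive no matter what the child does, and whatever memory the child leaked or corrupted disappears with it. On Unix the same approach extends to CPU-time or memory limits set in the child before it starts work, but that is beyond this sketch.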

Using log4net for multithreaded application

I have an application which uses the .NET thread pool to run multiple threads. It uses log4net to write logs to a plain text file. Is it a good idea to use log4net for asynchronous logging like this, or do I need a separate MSMQ implementation to append messages?

You can use log4net as-is for file-based logging for multi-threaded applications. The log messages from all the threads will be written to the same file. It can get a little confusing to read all the interspersed messages, but it's better than not having logging. You'll definitely want to log the thread ID in the appender format so you can tell which messages are coming from which thread.
There are probably more fancy things you can do to handle the logging for different threads, but I've never really had to go down that road. I prefer to stick with file-based logging, and having all the threads log to one file is easier to deal with than having each thread log to its own file, in my opinion.
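
To show why the thread ID in the format matters, here is a small sketch in Python's standard logging module (an analogy of mine, not log4net configuration). Several threads share one handler, the format puts the thread name on every line, and a helper filters the interleaved output back into per-thread streams. The logger name, thread names and messages are made up.

```python
import io
import logging
import threading


def log_from_threads(num_threads=3, messages_per_thread=4):
    """Have several threads log to one shared sink; return the lines written."""
    sink = io.StringIO()  # stands in for the single shared log file
    handler = logging.StreamHandler(sink)
    # The thread name in the format is what makes interleaved output readable.
    handler.setFormatter(logging.Formatter("[%(threadName)s] %(message)s"))

    logger = logging.getLogger("thread_id_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(handler)

    def worker():
        for step in range(messages_per_thread):
            logger.info("step %d", step)

    threads = [
        threading.Thread(target=worker, name=f"worker-{n}")
        for n in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink.getvalue().splitlines()


def lines_for_thread(lines, thread_name):
    """Pull one thread's messages back out of the interleaved log."""
    prefix = f"[{thread_name}] "
    return [line[len(prefix):] for line in lines if line.startswith(prefix)]


if __name__ == "__main__":
    all_lines = log_from_threads()
    print("\n".join(all_lines))
    print(lines_for_thread(all_lines, "worker-1"))
```

The order of lines across threads varies from run to run, but each thread's own messages stay in order, which is usually all you need when reading one shared file.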
