I want to create a logger that will handle messages from multiple threads. The threads will be executed by an ExecutorService and will stay alive for a few minutes. Each of them performs an activity that is completely independent of the other threads. When I read the log, I want to see each thread's messages grouped together in a consistent state, but I also want all of them in a single file. So I want to use only one logger instance (as I will log into a single file), but each thread will communicate through its own buffer for this logger. When a thread is about to finish execution, it should flush the buffer, so that when I read the log, the messages originating from this thread are not interspersed with other threads' messages.
How can I achieve this with log4j? I tried searching the docs, but either I can't express my requirements well or this kind of feature is not supported.
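(For reference, log4j doesn't buffer per thread out of the box, but you can approximate the behavior yourself: collect each task's messages in a local buffer and emit them as one logging call when the task ends. A minimal sketch, assuming Log4j 2; the class name and message formatting are illustrative.)

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Each task collects its messages locally and logs them in one call at the
// end, so one thread's output is never interleaved with another thread's.
// One logger, one file; the buffer is plain per-task state.
class BufferedTask implements Runnable {
    private static final Logger LOG = LogManager.getLogger(BufferedTask.class);
    private final StringBuilder buffer = new StringBuilder();

    private void buffered(String msg) {
        buffer.append(msg).append(System.lineSeparator());
    }

    @Override
    public void run() {
        try {
            buffered("task started");
            // ... real work here, calling buffered(...) along the way ...
            buffered("task finished");
        } finally {
            // One logging call per thread: the whole buffer lands in the file
            // as a single event, never interleaved with other threads.
            LOG.info("output of {}:{}{}", Thread.currentThread().getName(),
                    System.lineSeparator(), buffer);
        }
    }
}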
I am working on a coding exercise in which I have to create a logging framework in Java.
The logging steps are like Log4j's: 1. The message is processed by a processor. 2. The message is appended to an I/O stream, a file, or a server by an appender.
The requirement is that the class should handle multiple threads at the same time (asynchronously). The class has two methods: log(), used for logging, and shutdown(), for termination.
I have message_processor and appender objects, but the whole process should be done in FIFO order.
My job: when log() is called, message_processor processes the message, which is a very time-consuming operation, and the processed message is then appended and logged.
My rough idea is that I have to create a static data structure to which I add a thread and its key, and I keep adding whenever log() is called.
After adding to my data structure, I start the thread and keep checking whether it has completed.
When a completed thread has no unfinished predecessors, I append its message; otherwise I wait for the predecessors to complete. I also have to maintain a flag recording whether each thread has completed.
log() will spawn threads for message processing, wait for the threads to complete their task, and append the messages in order as they complete.
The data structure will be read and written by multiple threads at the same time.
My question is which data structure I should use to implement this kind of functionality in Java. For the flag, should I create a Java class that implements the Runnable interface, or is there a data structure that already provides this kind of functionality?
Is asynchronous threading an efficient approach here?
Any suggestions are welcome.
I think you should use the concurrent data structures the JDK already provides instead of writing your own implementation.
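For example, you can get parallel processing with FIFO appending from JDK classes alone: submit the slow processing to a pool, enqueue the resulting Futures in log() call order, and have a single appender thread drain them in order. A minimal sketch; the pool size, the process() body, and the System.out "appender" are placeholders.

import java.util.concurrent.*;

// Slow processing runs in parallel, but because the Futures are enqueued
// in log() call order and drained by one thread, messages are appended FIFO.
public class AsyncLogger {
    private final ExecutorService processors = Executors.newFixedThreadPool(4);
    private final BlockingQueue<Future<String>> pending = new LinkedBlockingQueue<>();
    private final Thread appender;
    private volatile boolean running = true;

    public AsyncLogger() {
        appender = new Thread(() -> {
            try {
                while (running || !pending.isEmpty()) {
                    Future<String> f = pending.poll(100, TimeUnit.MILLISECONDS);
                    if (f != null) {
                        System.out.println(f.get()); // blocks until this message is processed
                    }
                }
            } catch (InterruptedException | ExecutionException e) {
                Thread.currentThread().interrupt();
            }
        });
        appender.start();
    }

    public void log(String raw) {
        pending.add(processors.submit(() -> process(raw)));
    }

    public void shutdown() throws InterruptedException {
        processors.shutdown();
        processors.awaitTermination(1, TimeUnit.MINUTES);
        running = false;   // appender drains what is left, then exits
        appender.join();
    }

    private String process(String raw) {
        return "processed: " + raw;   // stand-in for the time-consuming step
    }
}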
I have a (POSIX) server that acts as a proxy for many clients to another upstream server. Messages typically flow down from the upstream server, are then matched against, and pushed out to, some subset of the clients interested in that traffic (maintaining the FIFO order from the upstream server). Currently this proxy server is single-threaded, using an event loop (e.g., select, epoll), but now I'd like to make it multithreaded so that the proxy can more fully utilize an entire machine and achieve much higher throughput.
My high-level design is to have a pool of N worker pthreads (where N is some small multiple of the number of cores on the machine), each of which runs its own event loop. Each client connection will be assigned to a specific worker thread, which is then responsible for servicing all of that client's I/O and timeout needs for the duration of that connection. I also intend to have a single dedicated thread that pulls messages in from the upstream server. Once a message is read in, its contents can be considered constant/unchanging until it is no longer needed and is reclaimed. The workers never alter the message contents; they just pass them along to their clients as needed.
My first question is: should the matching of client interests preferably be done by the producer thread or the worker threads?
In the former approach, for each worker thread, the producer could check the interests (e.g., group membership) of that worker's clients. If the message matched any of them, it would push the message onto a dedicated queue for that worker. This approach requires some kind of synchronization between the producer and each worker about the clients' rarely changing interests.
In the latter approach, the producer just pushes every message onto some kind of queue shared by all of the worker threads. Then each worker thread checks ALL of the messages for a match against their clients' interests and processes each message that matches. This is a twist on the usual SPMC problem where a consumer is usually assumed to unilaterally take an element for themselves, rather than all consumers needing to do some processing on every element. This approach distributes the matching work across multiple threads, which seems desirable, but I worry it may cause more contention between the threads depending on how we implement their synchronization.
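(To make the former approach concrete, here is a rough sketch in Java for brevity; the question is POSIX/C, but the shape is the same, and all the types here are illustrative stand-ins.)

import java.util.List;
import java.util.concurrent.BlockingQueue;

// Only workers with an interested client get the message in their inbox,
// so workers never have to scan irrelevant traffic.
final class MatchingProducer {
    record Message(String topic, byte[] payload) {}
    record Worker(List<String> clientInterests, BlockingQueue<Message> inbox) {}

    private final List<Worker> workers;

    MatchingProducer(List<Worker> workers) { this.workers = workers; }

    void dispatch(Message msg) {
        for (Worker w : workers) {
            // Interests change rarely, so a read-mostly snapshot (e.g. an
            // immutable list swapped atomically on change) keeps this cheap.
            if (w.clientInterests().contains(msg.topic())) {
                w.inbox().offer(msg);   // per-worker queue, no shared contention
            }
        }
    }
}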
In both approaches, when a message is no longer needed by any worker thread, it then needs to be reclaimed. So, some tracking needs to be done to know when no worker thread needs a message any longer.
My second question is: what is a good way of tracking whether a message is still needed by any of the worker threads?
A simple way to do this would be to assign to each message a count of how many worker threads still need to process the message when it is first produced. Then, when each worker is done processing a message it would decrement the count in a thread-safe manner and if/when the count went to zero we would know it could be reclaimed.
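(A sketch of that counter, again in Java for brevity: the count starts at the number of workers that will see the message, and the last worker to release it triggers reclamation.)

import java.util.concurrent.atomic.AtomicInteger;

final class CountedMessage {
    final byte[] payload;
    private final AtomicInteger remaining;

    CountedMessage(byte[] payload, int interestedWorkers) {
        this.payload = payload;
        this.remaining = new AtomicInteger(interestedWorkers);
    }

    // Each interested worker calls this exactly once when done with the message.
    void release() {
        if (remaining.decrementAndGet() == 0) {
            // reclaim: return the buffer to a pool, free it, etc.
        }
    }
}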
Another way to do this would be to assign 64-bit sequence numbers to the messages as they come in; each thread would then track and record, in some manner, the highest sequence number through which it has processed. Then we could reclaim all messages with sequence numbers less than or equal to the minimum processed sequence number across all of the worker threads.
The latter approach seems like it could more easily allow a lazy reclamation process with less cross-thread synchronization. That is, you could have a "clean-up" thread that runs only periodically and computes the minimum across the worker threads. For example, if we assume that reads and writes of a 64-bit integer are atomic and that a worker's fully-processed sequence number is monotonically increasing, then the clean-up thread can just periodically read the workers' fully-processed counts (maybe with some memory barrier) and compute the minimum.
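(A sketch of that lazy scheme, Java for brevity and all names made up: each worker publishes the highest sequence number it has fully processed, and the clean-up thread periodically takes the minimum and reclaims everything at or below it. Sequence numbers are assumed to start at 1.)

import java.util.concurrent.atomic.AtomicLongArray;

final class SequenceReclaimer {
    private final AtomicLongArray processedUpTo;  // one slot per worker
    private long reclaimedUpTo = 0;               // touched only by the clean-up thread

    SequenceReclaimer(int workers) {
        processedUpTo = new AtomicLongArray(workers);
    }

    // Worker `id` calls this after finishing a message; values only increase.
    void markProcessed(int id, long seq) {
        processedUpTo.lazySet(id, seq);   // cheap ordered write is enough here
    }

    // The clean-up thread calls this periodically.
    void cleanup() {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < processedUpTo.length(); i++) {
            min = Math.min(min, processedUpTo.get(i));
        }
        for (long seq = reclaimedUpTo + 1; seq <= min; seq++) {
            reclaim(seq);   // free the message with this sequence number
        }
        reclaimedUpTo = Math.max(reclaimedUpTo, min);
    }

    private void reclaim(long seq) { /* return the buffer to a pool, etc. */ }
}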
Third question: what is the best way for workers to realize that they have new work to do in their queue(s)?
Each worker thread is going to be managing its own event loop of client file descriptors and timeouts. Is it best for each worker thread to have its own pipe, to which the producer writes signal data to poke it into action? Or should workers just periodically check their queue(s) for new work? Are there better ways to do this?
Last question: what kind of data structure and synchronization should I use for the queue(s) between the producer and the consumer?
I'm aware of lock-free data structures but I don't have a good feel for whether they'd be preferable in my situation or if I should instead just go with a simple mutex for operations that affect the queue. Also, in the shared queue approach, I'm not entirely sure how a worker thread should track "where" it is in processing the queue.
Any insights would be greatly appreciated! Thanks!
Based on your problem description, matching of client interests needs to be done for each client for each message anyway, so the work in matching is the same whichever type of thread it occurs in. That suggests the matching should be done in the client threads to improve concurrency. Synchronization overhead should not be a major issue if the "producer" thread ensures the messages are flushed to main memory (technically, "synchronize memory with respect to other threads") before their availability is made known to the other threads, as the client threads can all read the information from main memory simultaneously without synchronizing with each other. The client threads will not be able to modify messages, but they should not need to.
Message reclamation is probably better done by tracking the current message number of each thread rather than by having a message specific counter, as a message specific counter presents a concurrency bottleneck.
I don't think you need formal queueing mechanisms. The "producer" thread can simply keep a volatile variable updated which contains the number of the most recent message that has been flushed to main memory, and the client threads can check the variable when they are free to do work, sleeping if no work is available. You could get more sophisticated on the thread management, but the additional efficiency improvement would likely be minor.
I don't think you need sophisticated data structures for this. You need volatile variables for the number of the latest message that is available for processing, and for the number of the most recent message that has been processed by each client thread. You need to flush the messages themselves to main memory. You need some way of finding the messages in main memory from the message number, perhaps using a circular buffer of pointers, or of the messages themselves if they are all the same length. You don't really need much else with respect to the data to be communicated between the threads.
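(A sketch of that arrangement in Java: a circular buffer plus a volatile publish counter, with sizes and names illustrative. The volatile write in publish() is the "flush to main memory" step that makes the slot visible; consumers never read past the published counter.)

final class MessageRing {
    private static final int SIZE = 1 << 16;   // power of two so we can mask
    private final Object[] slots = new Object[SIZE];
    private volatile long published = -1;      // last message number available

    // Producer thread only. A real version must also check that the slot
    // being overwritten (seq - SIZE) has already been reclaimed.
    void publish(long seq, Object msg) {
        slots[(int) (seq & (SIZE - 1))] = msg;
        published = seq;   // volatile write: pairs with the consumers' read
    }

    // Consumers: returns null if message `seq` is not yet available.
    Object get(long seq) {
        if (seq > published) return null;      // volatile read
        return slots[(int) (seq & (SIZE - 1))];
    }
}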
I'm using the logback implementation and created an AsyncAppender so that I can log from inside a thread.
The thread is something like a monitor: it consumes a BlockingQueue of Objects added by other threads, and while the queue is not empty and there is no stop signal, it logs the queue's contents. Meanwhile, the queue keeps being filled by a few producer threads.
When the producer threads get the stop signal from the coordinator, they are interrupted, so they add no more content to the queue.
The monitor thread exits once there is a stop signal (the producer threads have already been interrupted) and the BlockingQueue is empty.
There are two problems with the logging of the monitor thread:
After the producers have been interrupted, the queue becomes empty, so the monitor thread also exits immediately, without showing all of the contents of the queue, even though it did remove everything from the queue.
The order of the logged messages (in both the console appender and the file appender) is not the same as the order in which they were inserted into the queue.
I tried 3 different approaches: creating a static logger inside the thread, creating a non-static one, and passing in a logger from the class where the monitor thread is created.
If I do everything in a while(true) {} loop in the monitor thread, everything is shown, but not in the right order, and I still have to figure out how to stop the thread.
I also looked at MDC, but my problem is somewhat different: I have to consume the producers' output while they are producing, and again after they have finished, in case there is still content in the queue.
I also checked the LoggerContext inside the thread, and its isStarted() is false. Shouldn't it be true?
Any idea on how to show all the content, and in the right order, before the thread exits would be valuable.
Thanks.
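One way to address the first problem is to avoid interrupting the monitor at all: have the coordinator set a stop flag, and let the monitor keep draining until both the flag is set and the queue is empty. A minimal sketch; the element type, the flag, and the println stand-in for the logger are assumptions.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class Monitor implements Runnable {
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private volatile boolean stopped = false;

    // Producers call queue().put(...); the coordinator calls stop() at the end.
    public void stop() { stopped = true; }
    public BlockingQueue<Object> queue() { return queue; }

    @Override
    public void run() {
        try {
            // Keep consuming until we are told to stop AND the queue is drained.
            while (!stopped || !queue.isEmpty()) {
                Object item = queue.poll(100, TimeUnit.MILLISECONDS);
                if (item != null) {
                    System.out.println(item);   // replace with logger.info(...)
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

As for ordering: a single consumer draining a single BlockingQueue sees elements in insertion order, so if the output order still differs, the reordering is probably happening downstream of the queue (for example, mixing the async appender with direct console output), not in the queue itself.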
Is there any way to share a message queue among several threads, or otherwise to read a message queue of a different thread, without using hooks?
GetMessage and PeekMessage only read messages for the current thread; you can't use them to read messages sent to the input queue owned by another thread.
Try joining the thread input queues using AttachThreadInput; that might work.
Messages in a message queue can be differentiated on the basis of the window they're for, but I don't think messages can be differentiated on the basis of an intended thread - the fields just aren't there in the MSG structure - so I don't think you can share a queue across multiple threads.
That leaves you with a non-hook monitoring solution.
I'm pretty sure you could peek another thread's queue, but the problem is that you're basically polling, so you'll miss messages.
Do you have any influence over the threads you wish to read? If so, you can get them to rebroadcast their messages to you.
Apart from that, I can't see a way to do this.
I've got a RabbitMQ queue that might, at times, hold a considerable amount of data to process.
As far as I understand, using channel.consume will try to force the messages into the Node program, even if it's reaching its RAM limit (and it will, eventually, crash).
What is the best way to ensure workers get only as many tasks to process as they are capable of handling?
I'm thinking about using a chain of (transform) streams together with channel.get (which gets just one message). If the first stream's buffer is full, we simply stop getting messages.
I believe what you want is to specify the consumer prefetch.
This indicates to RabbitMQ how many messages it should "push" to the consumer at once.
For example:
channel.prefetch(1);
This is the lowest value you can provide, and it should ensure the least memory consumption for your Node program.
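(For comparison only, since the question is about amqplib in Node: the same back-pressure in the RabbitMQ Java client is basicQos plus manual acks. The queue name and processing step below are illustrative.)

import com.rabbitmq.client.*;

public class PrefetchConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        channel.basicQos(1);   // prefetch = 1: broker sends no new message until we ack

        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            process(delivery.getBody());   // hypothetical slow work
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        // autoAck = false, otherwise basicQos has no effect
        channel.basicConsume("tasks", false, onDeliver, consumerTag -> { });
    }

    private static void process(byte[] body) { /* ... */ }
}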
This is based on your description, if my understanding is correct. I'd also recommend renaming your question: "parallel processing" relates more to multiple consumers on a single queue, not to a single consumer getting all the messages.