Detecting low user activity and checking email in the background - multithreading

I'm writing an application that must do some things in background: check emails and parse them to inject some data in a database, and connect to a web service to check status for some asynchronous operations.
Right now, my solution is a simple timer that performs these operations on a predefined schedule: email every five minutes, and web service checks every minute (but these are only performed if there is pending activity, so most of the time this does nothing.)
Right now I'm not using a thread for this (I'm early in the development stage.) But my plan is to create a background thread and let it do the work offline.
A couple of questions:
1. I plan to control everything in the timer(s). Set up a global variable (rudimentary "locking",) start the thread. If the "lock" is already set, ignore it. The thread cleans it up on termination. Should I use a more robust locking / queue mechanism for my threads? (I already have OmniThread installed)
2. How can I run a thread with low priority? I don't want the application to feel sluggish when the background thread is performing data insertion or networking.
3. Is there a clean way to verify for user activity and start this thread only when the user is not busy at the keyboard / mouse?
Please keep in mind that I'm not experienced with threads. I wrote an FTP sync application once so I'm not a complete newbie, but that was a long time ago.

For part 3 of your question, the Windows API has a GetLastInputInfo function, which returns the time of the last input event (keyboard or mouse). The documentation even says "This function is useful for input idle detection". I planned to use this for something myself, but haven't had a chance to test it.
Edit: Delphi implementation link
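As a rough illustration of the idle check (in Python via ctypes rather than Delphi, and untested here just like the suggestion above), the sketch below calls GetLastInputInfo and compares the result with GetTickCount. The helper names elapsed_ms, idle_milliseconds and user_is_idle, and the 60-second threshold, are assumptions made for this example only.

```python
import ctypes
import sys


class LASTINPUTINFO(ctypes.Structure):
    """Mirror of the Win32 LASTINPUTINFO struct (UINT cbSize, DWORD dwTime)."""
    _fields_ = [("cbSize", ctypes.c_uint), ("dwTime", ctypes.c_uint32)]


def elapsed_ms(now_tick, last_input_tick):
    """Milliseconds between two 32-bit tick counts, tolerating wraparound."""
    return (now_tick - last_input_tick) & 0xFFFFFFFF


def idle_milliseconds():
    """Milliseconds since the last keyboard/mouse input (Windows only)."""
    if sys.platform != "win32":
        raise OSError("GetLastInputInfo is only available on Windows")
    info = LASTINPUTINFO()
    info.cbSize = ctypes.sizeof(LASTINPUTINFO)
    if not ctypes.windll.user32.GetLastInputInfo(ctypes.byref(info)):
        raise ctypes.WinError()
    now = ctypes.windll.kernel32.GetTickCount()
    return elapsed_ms(now, info.dwTime)


def user_is_idle(idle_ms, threshold_ms=60_000):
    """Hypothetical policy: treat the user as idle after threshold_ms."""
    return idle_ms >= threshold_ms
```

A background thread could call user_is_idle(idle_milliseconds()) before starting a batch of work and simply try again later when it returns False.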

I plan to control everything in the timer(s). Set up a global variable (rudimentary "locking",) start the thread. If the "lock" is already set, ignore it. The thread cleans it up on termination. Should I use a more robust locking / queue mechanism for my threads? (I already have OmniThread installed)
I wouldn't bother with the Timer at all. Make your thread's loop look like this, and you'll have your delays. You will NOT need a lock, because there's only one thread and it will not sleep until the previous job is over.
// Assumes TYourThread descends from TThread, which provides Terminated.
procedure TYourThread.Execute;
var
  N: Integer;
begin
  while not Terminated do
  begin
    // Figure out if there's a job to do
    // Do the job

    // Sleep for about 50 seconds in 100 ms slices, so the thread
    // gets a chance to notice it needs to terminate.
    for N := 1 to 500 do
    begin
      if Terminated then
        Break;
      Sleep(100);
    end;
  end;
end;
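For comparison, here is a minimal sketch of the same single-thread loop in Python. It swaps the sliced Sleep(100) loop for a wait on an event with a timeout, which wakes up as soon as termination is requested. PollingWorker, job and interval are names invented for this illustration.

```python
import threading


class PollingWorker(threading.Thread):
    """Runs job() repeatedly, pausing `interval` seconds between runs."""

    def __init__(self, job, interval):
        super().__init__(daemon=True)
        self._job = job
        self._interval = interval
        self._stop_requested = threading.Event()

    def run(self):
        while not self._stop_requested.is_set():
            self._job()
            # wait() returns early as soon as stop() sets the event,
            # so the thread never sleeps through a termination request.
            self._stop_requested.wait(self._interval)

    def stop(self):
        """Ask the loop to finish; the current job is allowed to complete."""
        self._stop_requested.set()
```

As in the Delphi version, no lock is needed: the next job cannot start until the previous one has returned.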
How can I run a thread with low priority? I don't want the application to feel sluggish when the background thread is performing data insertion or networking.
Don't bother. You can easily use SetThreadPriority but it's not worth the trouble. If your background thread is waiting for I/O (networking), then it will not consume any CPU resource. Even if your background thread works full-speed, your GUI will not feel sluggish because Windows does a good job of splitting available CPU time among all available threads.
Is there a clean way to verify for user activity and start this thread only when the user is not busy at the keyboard / mouse?
Again, why bother checking for user activity? Checking for email is network (i.e. I/O) bound, so the thread checking for email will mostly be idle.

Can you not just do all this in the background thread, getting rid of all the thread micro-management? Seems to me that you could just loop around a sleep(60000) call in the background thread. Check the web service every time round the loop, check the email every 5 times round. You can set the priority to tpLower, if you want, but this thread is going to be sleeping or blocked on I/O nearly all the time, so I don't think it's even worth the typing.
I would be surprised if such a thread is noticeable at all to the user at the keyboard/mouse, no matter when it runs.
'Set up a global variable (rudimentary "locking",) start the thread' - what is this global variable intended to do? What is there to lock?
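The loop described in this answer (web service on every pass, email on every fifth pass) could look roughly like the sketch below. The function and parameter names are hypothetical, and the two check callables stand in for the real work.

```python
import threading


def background_loop(check_web_service, check_email, stop,
                    period=60.0, email_every=5):
    """Poll until `stop` is set: web service every pass, email every Nth pass."""
    passes = 0
    while not stop.is_set():
        passes += 1
        check_web_service()
        if passes % email_every == 0:
            check_email()
        # Sleep for one period, but wake immediately if stop is set.
        stop.wait(period)
```

Run it with threading.Thread(target=background_loop, args=(...)) and call stop.set() at shutdown; since only this one thread does the work, there is nothing for a global "lock" variable to protect.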

Related

If I "get back to the main thread" then what exactly happens, and how do interrupts work with threads?

Background: I was using Beej's guide and he mentioned forking and ensuring you "get the zombies". An Operating Systems book I grabbed explained how the OS creates "threads" (I always thought it was a more fundamental piece), and by quoting it, I mean that the OS decides nearly everything. Basically they share all external resources, but they split the register and stack spaces (and I think a 3rd thing).
So I get to the waitpid function which http://www.qnx.com's developer docs explain very well. In fact, I read the entire section on threads, minus all the types of conditions after a Processes and Threads google.
The fact that I can split code up and put it back together doesn't confuse me. HOW I can do this is confusing.
In C and C++, your program is a main() function, which goes forward, calls other functions, maybe loops forever (waiting for input or rendering), and then eventually quits or returns. In this model I see NO reason for it to stop beyond an "I'm waiting for something", in which case it just loops.
Well, it seems it can loop by setting certain things, like "I'm waiting for a semaphore" or "a response" or "an interrupt". Or maybe it gets interrupted without waiting for one. This is what confuses me.
The processor time-slices processes and threads. That's all fine and dandy, but how does it decide when to stop one? I understand that you get to the Polling function and say "Hey I'm waiting for input, clock tick or user do something". Somehow it tells this to the os? I'm not sure. But moreso:
It seems to be able to completely randomly interrupt or interject, even on a single-threaded application. So you're running one thread and suddenly waitpid() says "Hey, I finished a process, let me interrupt this, we both hate zombies, I gotta do this." and you're still looping on some calculation. So, what just happens??? I have no idea, somehow they both run and your computation isn't messed with, 'cause it's single threaded, but that somehow doesn't mean that it won't stop what it's doing to run waitpid() inside the same thread WHILE you're still doing your other app things.
Also confusing, is how you can be notified, like iOSes notifications, and say "Hey, I got some UI changes, get me off of 16 and put me back on 1 so I can change this thing". But same question as last paragraph, how does it interrupt a thread that's running?
I think I understand the splitting, but this joining is utterly confusing. It's like the textbooks have this "rabbit from hat" step I'm supposed to accept. Other SO posts told me they don't share the same stack, but that didn't help, now I'm imagining a slinky (stack) leaning over to another slinky, but unsure how it recombines to change the data.
Thanks for any help, I apologize that this is long, but I know someone's going to misinterpret this and give me the "they are different stacks" answer if I'm too concise here.
Thanks,
OK, I'll have a go, though it's gonna be 'economical with the truth':)
It's sorta like this:
The OS kernel scheduler/dispatcher is a state-machine for managing threads. A thread comprises a stack, (allocated at the time of thread creation), and a Thread Control Block, (TCB), struct in the kernel that holds thread state and can store thread context, (including user registers, especially the stack-pointer). A thread must have code to run, but the code is not dedicated to the thread - many threads can run the same code. Threads have states, eg. blocked on I/O, blocked on an inter-thread signal, sleeping for a timer period, ready, running on a core.
Threads belong to processes - a process must have at least one thread to run its code and has one created for it by the OS loader when the process starts up. The 'main thread' may then create others that will also belong to that process.
The state-machine inputs are software interrupts - system calls from those threads that are already running on cores, and hardware interrupts from peripheral devices/controllers, (disk, network, mouse, KB etc), that use processor hardware features to stop the processor/s running instructions from the threads and 'immediately' run driver code instead.
The output of the state-machine is a set of threads running on cores. If there are fewer ready threads than cores, the OS will halt the unusable cores. If there are more ready threads than cores, (ie. the machine is overloaded), the 'scheduling algorithm' that decides which threads to run takes into account several factors - thread and process priority, priority boosts for threads that have just become ready on I/O completion or inter-thread signal, foreground-process boosts and others.
The OS has the ability to stop any running thread on any core. It has an interprocessor hardware-interrupt channel and drivers that can force any thread to enter the OS and be blocked/stopped, (maybe because another thread has just become ready and the OS scheduling algorithm has decided that a running thread must be immediately preempted).
The software interrupts from running threads can change the set of running threads by requesting I/O, or by signaling other threads, (the events, mutexes, condition-variables and semaphores). The hardware interrupts from peripheral devices can change the set of running threads by signaling I/O completion.
When the OS gets these inputs, it uses that input, and internal state in containers of Thread Control Block and Process Control Block structs, to decide which set of ready threads to run next. It can block a thread from running by saving its context, (including registers, especially stack pointer), in its TCB and not returning from the interrupt. It can run a thread that was blocked by restoring its context from its TCB to a core and performing an interrupt-return, so allowing the thread to resume from where it left off.
The gain is that no thread that is waiting for I/O gets to run at all and so does not use any CPU and, when I/O becomes available, a waiting thread is made ready 'immediately' and, if there is a core available, running.
This combination of OS state data, and hardware/software interrupts, efficiently matches up threads that can make forward progress with cores available to run them, and no CPU is wasted on polling I/O or inter-thread comms flags.
All this complexity, both in the OS and for the developer who has to design multithreaded apps and so put up with locks, synchronization, mutexes etc, has just one vital goal - high performance I/O. Without it, you can forget video streaming, BitTorrent and browsers - they would all be too piss-slow to be useable.
Statements and phrases like 'CPU quantum', 'give up the remainder of their time-slice' and 'round-robin' make me want to throw up.
It's a state-machine. Hardware and software interrupts go in, a set of running threads comes out. The hardware timer interrupt, (the one that can time-out system calls, allow threads to sleep and share out CPU on a box that is overloaded), though valuable, is just one of many.
So I'm on thread 16, and I need to get to thread 1 to modify UI. I
randomly stop it anywhere, "move the stack over to thread 1" then
"take its context and modify it"?
No, time for 'economical with truth' #2...
Thread 1 is running the GUI. To do this, it needs inputs from mouse, keyboard. The classic way for this to happen is that thread 1 waits, blocked, on a GUI input queue - a thread-safe producer-consumer queue, for KB/mouse messages. It's using no CPU - the cores are off running services and BitTorrent downloads. You hit a key on the keyboard, and the keyboard-controller hardware raises an interrupt line on the interrupt controller, causing a core to jump to the keyboard driver code as soon as it has finished its current instruction. The driver reads the KB controller, assembles a KeyPressed message and pushes it onto the input queue of the GUI thread with focus - your thread 1. The driver exits by calling the scheduler interrupt entry point so that a scheduling run can be performed and your GUI thread is assigned a core and run on it. To thread 1, all it has done is make a blocking 'pop' call on a queue and, eventually, it returns with a message to process.
So, thread 1 is performing:
void HandleGui(){
    while(true){
        // blocks, using no CPU, until a message arrives
        GUImessage message=thread1InputQueue.pop();
        switch(message.type){
            // lots of case statements to handle all the possible GUI messages
            ..
            ..
        }
    }
}
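The same blocking message loop can be tried out with a thread-safe queue from a standard library. The sketch below uses Python's queue.Queue; the message types "key_pressed", "mouse_click" and "quit" are invented for the example and do not correspond to any real GUI toolkit.

```python
import queue
import threading


def handle_gui(input_queue, handled):
    """Message loop: block on the queue, then dispatch by message type."""
    while True:
        msg_type, payload = input_queue.get()   # blocks, using no CPU
        if msg_type == "quit":
            break
        elif msg_type == "key_pressed":
            handled.append("key:" + payload)
        elif msg_type == "mouse_click":
            handled.append("click:%d,%d" % payload)


if __name__ == "__main__":
    q = queue.Queue()
    handled = []
    gui = threading.Thread(target=handle_gui, args=(q, handled))
    gui.start()
    q.put(("key_pressed", "a"))      # plays the role of the keyboard driver
    q.put(("quit", None))
    gui.join()
    print(handled)
```

Here the main thread stands in for the keyboard driver: it only pushes messages, and all handling happens on the loop thread.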
If thread 16 wants to interact with the GUI, it cannot do it directly. All it can do is to queue a message to thread 1, in a similar way to the KB/mouse drivers, to instruct it to do stuff.
This may seem a bit restrictive, but the message from thread 16 can contain more than POD. It could have a 'RunMyCode' message type and contain a function pointer to code that thread 16 wants to be run in the context of thread 1. When thread 1 gets around to handling the message, its 'RunMyCode' case statement calls the function pointer in the message. Note that this 'simple' mechanism is asynchronous - thread 16 has issued the message and runs on - it has no idea when thread 1 will get around to running the function it passed. This can be a problem if the function accesses any data in thread 16 - thread 16 may also be accessing it. If this is an issue, (and it may not be - all the data required by the function may be in the message, which can be passed into the function as a parameter when thread 1 calls it), it is possible to make the function call synchronous by making thread 16 wait until thread 1 has run the function. One way would be for the function to signal an OS synchronization object as its last line - an object upon which thread 16 will wait immediately after queueing its 'RunMyCode' message:
void runOnGUI(GUImessage message){
    // do stuff with GUI controls
    message.notifyCompletion->signal(); // tell thread 16 to run again
}
void thread16run(){
    ..
    ..
    GUImessage message;
    OSkernelWaitObject waitEvent; // OS synchronization object
    message.type=RunMyCode;
    message.function=runOnGUI;
    message.notifyCompletion=&waitEvent;
    thread1InputQueue.push(message); // ask thread 1 to run my function.
    waitEvent.wait(); // wait, blocked, until the function is done
    ..
    ..
}
So, getting a function to run in the context of another thread requires cooperation. Threads cannot call other threads - only signal them, usually via the OS. Any thread that is expected to run such 'externally signaled' code must have an accessible entry point where the function can be placed and must execute code to retrieve the function address and call it.
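A runnable sketch of this cooperative hand-off, assuming Python's queue.Queue as the input queue and threading.Event as the synchronization object. It differs slightly from the pseudocode above: the loop signals completion after calling the function, instead of the function signaling on its last line. The names message_loop and run_on_loop_thread, and the None shutdown sentinel, are made up for the example.

```python
import queue
import threading


def message_loop(inbox):
    """'Thread 1': pops messages and runs any function it is handed."""
    while True:
        msg = inbox.get()              # blocks until a message arrives
        if msg is None:                # hypothetical shutdown sentinel
            return
        func, args, done = msg
        try:
            func(*args)                # runs in the loop thread's context
        finally:
            done.set()                 # wake the waiting sender


def run_on_loop_thread(inbox, func, *args, timeout=None):
    """'Thread 16': queue func for the loop thread and wait until it has run.

    Returns False if the timeout expired before the function completed.
    """
    done = threading.Event()
    inbox.put((func, args, done))
    return done.wait(timeout)
```

Dropping the done.wait() call gives the asynchronous variant, where the sender runs on without knowing when the function executes.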

Intervening threads that waited for too long

Is there any way in F# that I can detect if a waiting thread has been waiting for too long without being contacted?
I have a case where threads must actively contact other waiting threads to pass work to them once they're finished. My solution has a bug somewhere: sometimes one or more threads wait for too long, and eventually the program deadlocks because other threads don't contact them.
I think that if a waiting thread could detect that it has been waiting for too long, it could actively go looking for available work rather than keep waiting for other threads to pass work to it.
It's probably better to try and understand why your threads are getting stuck than just terminating them. If you can reproduce this with the Visual Studio debugger attached, you can click the Pause button and use the Threads window to see what code all threads are in.
That said; if you still have the need to do this, the solution will depend on how you're managing your threads. To monitor them from the outside, you'll need some process that has a list of threads and the ability to tell whether they're dead.
The Thread class doesn't appear to have any built-in mechanism for sharing state between the thread and its controller except for Name. You could possibly abuse Name, but I would probably have a thread-safe collection (e.g. a ConcurrentDictionary<Thread, DateTime>) to store all of the threads and the timestamp of their last communication, and pass an Action into each thread when it's started that allows it to "ping" by calling the action periodically. The action would simply update the DateTime stored against that thread.
The controlling process then simply scans through the dictionary periodically for anything with a timestamp that is too old, declares that thread dead and calls Abort() on it.
It's hard to give a code sample without knowing exactly how you're spawning your threads, and without a more detailed description of what a thread "being contacted" means.
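Still, the ping-and-scan idea can be sketched generically. The example below is in Python rather than F#, keys workers by name instead of by Thread object, and only reports stale workers (Python has no thread abort). HeartbeatRegistry and its injectable clock parameter are assumptions made for illustration.

```python
import threading
import time


class HeartbeatRegistry:
    """Tracks the last 'ping' time of each worker, keyed by name."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock              # injectable so it can be tested
        self._lock = threading.Lock()
        self._last_seen = {}

    def ping(self, name):
        """Called periodically by a worker to say it is still making progress."""
        with self._lock:
            self._last_seen[name] = self._clock()

    def stale(self, max_age):
        """Names of workers whose last ping is older than max_age seconds."""
        now = self._clock()
        with self._lock:
            return sorted(name for name, seen in self._last_seen.items()
                          if now - seen > max_age)
```

Each worker would receive a callable such as lambda: registry.ping("worker-1"), and a controller would poll registry.stale(...) on its own schedule and decide what to do with the names it gets back.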

.net 4.0 c# : Pausing/Resuming parallel running threads from threadpool temporarily?

I was able to set up a multi-threaded environment using the .NET ThreadPool and I do get a significant performance benefit. This runs in the background of my application.
Now when a new task is requested by the user, I want it to get maximum CPU resources to maximize performance. Hence I would like to temporarily pause all the threads that I began (via the ThreadPool.QueueUserWorkItem method) and then resume once the new task, requested by the user in foreground, is completed.
There could be several solutions to my problem:
a. Starting fewer background threads so that any new user request gets some share of the CPU resources. (but I lose the performance gain I had :( )
b. Set higher priority for the thread for a new user requested task. (not sure if this works?)
c. Suspending/resuming the ThreadPool threads I began. But suspending / resuming / interrupting threads is highly discouraged. Moreover, this could get tricky and error prone.
Any other ideas?
Note: when the user makes a request, performing the task would normally not take more than 300ms. However, when I start ThreadPool threads in background, it now takes about 3 seconds to complete (10 times worse)! I am OK if it takes 500-800ms though. All background threads complete in about 8 seconds (and I am OK if they take 1-2 seconds more). Hence, I am trying out option ( a ) for now.
Thanks in advance!
Note that thread scheduling is done by the OS scheduler and hence cannot be directed from within a program. The only thing that can be done is setting ThreadPriority (and even that only on new Threads, not on ThreadPool threads). See the section "Limitations of Using the Thread Pool".
As your requirement is to suspend all background threads while executing a new task, what you can do is to create a class level flag.
Now you can put checkpoints in methods to be executed in Background task. At the checkpoints, check the class level flag, if it is set, call Thread.Sleep, which should (NOT MUST) trigger thread context switch by OS/CPU thread scheduler.
Putting checkpoints in methods (to be executed by ThreadPool) is analogous to putting checkpoints for cancellation support in background worker.
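A minimal sketch of such checkpoints, written in Python rather than C#. Instead of sleeping when a flag is set, it uses an event that background work waits on, so paused workers block until they are resumed. PauseGate and its method names are invented for this example.

```python
import threading


class PauseGate:
    """Class-level 'flag' that background work consults at checkpoints."""

    def __init__(self):
        self._running = threading.Event()
        self._running.set()              # start in the "running" state

    def pause(self):
        """Called when a foreground task starts."""
        self._running.clear()

    def resume(self):
        """Called when the foreground task has finished."""
        self._running.set()

    def checkpoint(self, timeout=None):
        """Block while paused; return True once work may continue."""
        return self._running.wait(timeout)


def background_task(gate, items, results):
    """Hypothetical background work with one checkpoint per item."""
    for item in items:
        gate.checkpoint()
        results.append(item * 2)
```

The pause only takes effect at the next checkpoint, so the spacing of checkpoints bounds how long the foreground task competes with background work.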

Best Practice for killing a JavaME 1.2 thread?

Question: I'm interested to know the best practice for killing a long standing operation that is running as a background thread (let's call this thread WorkerThread) in Java 1.2.
Scenario
Specifically, I'm developing an application for Blackberry devices whereby I make an HTTP connection. Big picture: a URL request is forwarded to a background thread (WorkerThread), the thread makes the request and returns the result using a callback.
Scenario Details
Now there exists a situation where at connection time, a data connection exists but then for whatever reason (drives through a tunnel) that connection no longer exists. Due to a limitation in Blackberry's design architecture, that actual connection will hang as the time out is fixed to be 2 minutes. As a result, there's a crucial need to kill a connection that has been hanging for a relatively (15 seconds) long period of time.
My Current Solution - 2 Threads?
Right now my current solution is to run WorkerThread inside another thread (let's call this new thread MonitorThread). MonitorThread starts WorkerThread, sleeps for 1000ms and then routinely checks if WorkerThread is still alive. If after 15 seconds WorkerThread is still alive, MonitorThread puts WorkerThread to sleep and exits. Is this really the best possible approach?
Summary of Question & Key points
In summary, below is the core question and key restraints associated with the question. Cheers!
How do I successfully kill a Java background thread that is stuck in a specific operation?
Scenario Restraints:
No control of having operation pause and check the threads requested state
Specific to Blackberry's implementation of Java ME 1.2 and its Thread API so no explicit kill() method
Most concerned about the best practice and how to most safely kill a holding thread.
Follow Up/Edit
Neil Coffey recommended that I simply hold a reference to the connection object and instead call close() on that object. I am currently looking into this...
How to kill a Thread is a difficult question. There is no guaranteed way to be able to stop or interrupt a Thread. However, if you take your current architecture and, upon timeout, just close the stream (not the Connection), that should cause an IOException to occur on the thread that is stuck in I/O. If it doesn't cause an IOException, then it should at least cause the read or write to return with EOF.
Note that closing the Connection doesn't help, as the JavaDoc says:
Any open streams will cause the connection to be held open until
they themselves are closed.
You have to close the stream that was derived from the Connection.
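The general pattern (a monitor that waits with a deadline and then forces the blocked I/O call to return) can be demonstrated outside Java ME. The sketch below is a Python analogue, not Blackberry code: it uses socket.shutdown() on the blocked socket in place of closing a Java ME stream, and the function names and the "stuck" fallback value are assumptions for this example.

```python
import socket
import threading


def blocking_read(sock, result):
    """Worker: block in recv() until data, EOF, or an error arrives."""
    try:
        data = sock.recv(1024)
        result.append("eof" if data == b"" else "data")
    except OSError:
        result.append("error")


def read_with_deadline(sock, deadline):
    """Monitor: give the worker `deadline` seconds, then force the read to end."""
    result = []
    worker = threading.Thread(target=blocking_read, args=(sock, result))
    worker.start()
    worker.join(deadline)
    if worker.is_alive():
        # Shutting the socket down makes the blocked recv() return
        # (EOF or an error), so the worker can exit cleanly on its own.
        sock.shutdown(socket.SHUT_RDWR)
        worker.join(5)
    return result[0] if result else "stuck"
```

The worker is never killed; it ends because its blocking call returns, which is the "let the consequences ripple through to the thread" approach.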
Well, the best practice would normally be to get the connection to close, and then let the consequences of that ripple through to the thread, allowing it to exit cleanly.
How are you making the connection? Rather than waiting for it to time out, what are your chances of forcing it to close? Can you get hold of some connection object? Does the Blackberry have some other command that can be executed to kill a given connection?
I have always believed that atomically setting a flag for the background thread is the best way to ask a thread to stop. If it doesn't stop for a while, kill it.
Well, to add to that, if you believe 2 minutes is a long time, good for you. I'd let the user decide what's a long time with a cancel button.

Which is the better method? Allowing the thread to sleep for a while or deleting it and recreating it later?

We have a process that needs to run every two hours. It's a process that needs to run on its own thread so as to not interrupt normal processing.
When it runs, it will download 100k records and verify them against a database. The framework to run this has a lot of objects managing this process. These objects only need to be around when the process is running.
What's a better standard?
Keep the thread in wait mode by letting it sleep until I need it again. Or,
Delete it when it is done and create it the next time I need it? (System Timer Events.)
There is not that much difference between the two solutions. I tend to prefer the one where the thread is created each time.
Having a thread lying around consumes resources (memory at least). In a garbage collected language, it may be easy to have some object retained in this thread, thus using even more memory. If you don't have the thread lying around, all resources are freed and made available for two hours to the main process.
When you want to stop your whole process, where your thread may be executing or not, you need to interrupt the thread cleanly. It is always difficult to interrupt a thread or to know whether it is sleeping or working. You may have some race conditions there. Having the thread started on demand relieves you from those potential problems: you know if you started the thread and in that case calling thread_join makes you wait until the thread is done.
For those reasons, I would go for the thread on demand solution, even though the other one has no insurmountable problems.
Starting one thread every two hours is very cheap, so I would go with that.
However, if there is a chance that at some time in the future the processing could take more than the run interval, you probably want to keep the thread alive. That way, you won't be creating a second thread that will start processing the records while the first is still running, possibly corrupting data or processing records twice.
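If the thread-per-run approach is kept, the overlap risk can also be handled by refusing to start a new run while the previous one is still active. A minimal Python sketch, assuming a non-blocking lock acquire as the guard; NonOverlappingJob and try_start are names invented here:

```python
import threading


class NonOverlappingJob:
    """Starts `work` on a fresh thread unless the previous run is still active."""

    def __init__(self, work):
        self._work = work
        self._busy = threading.Lock()

    def try_start(self):
        """Return the new thread, or None if a run is already in progress."""
        if not self._busy.acquire(blocking=False):
            return None                  # previous run still going: skip this tick
        thread = threading.Thread(target=self._run)
        thread.start()
        return thread

    def _run(self):
        try:
            self._work()
        finally:
            self._busy.release()         # always free the slot, even on error
```

A timer would call try_start() every two hours; a None result means the tick was skipped rather than run concurrently with the earlier one.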
Either should be fine but I would lean towards keeping the thread around for cases where the verification takes longer than expected (ex: slow network links or slow database response).
How would you remember to start a new thread when the two hours are up? With a timer? (That's on another thread!) With another thread that sleeps until the specified time? Shutting down the thread and restarting it based on something running somewhere else does you no good if the something else is either on its own separate thread, or blocks the main app while it's waiting to "Create" the worker thread when the two hours are up, no?
Just let the Thread sleep...
I agree with Vilx that it's mostly a matter of taste. There is processing and memory overhead of both methods, but probably not enough for either to matter.
If you are using Java you could check Timer class. It allows you to schedule tasks on given time.
Also, if you need more control you can use quartz library.
I guess actually putting the thread to sleep is most effective; ending it and recreating it would "cost" some resources, while putting it to sleep would just fill a little space in the scheduler, and its data could be paged out by the operating system if needed.
But anyway it's probably not a very big difference, and the difference would probably depend on how good the OS's scheduler is, etc...
It really depends on one thing as I can tell... state.
If the thread creates a lot of state (allocates memory) that is useful to have during the next iteration of the thread run, then I would keep it around. That way, your process can potentially optimize its run by only performing certain operations if certain things changed since the last running.
However, if the state that the process creates is significant compared with the amount of work to be done, and you are short on resources on the machine, then it may not be worth the cost of keeping the state around in between executions. If that's the case, then you should recreate the thread from scratch each time.
I think it's just a matter of taste. Both are good. Use the one which you find easier to implement. :)
I would create the thread a single time, and use events/condition variables to let it sleep until signaled to wake up again. That way if the amount of time needed ever has to change, you only need change the timing in firing the event and your code will still be pretty clean.
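One way to sketch this long-lived, signaled worker is with a condition variable, here in Python. SignalledWorker, trigger and close are hypothetical names; the timer or scheduler that calls trigger() every two hours is assumed to exist elsewhere.

```python
import threading


class SignalledWorker(threading.Thread):
    """Created once; sleeps on a condition variable until triggered."""

    def __init__(self, work):
        super().__init__(daemon=True)
        self._work = work
        self._wake = threading.Condition()
        self._pending = 0
        self._closing = False

    def trigger(self):
        """Request one run of work(); callable from any thread."""
        with self._wake:
            self._pending += 1
            self._wake.notify()

    def close(self):
        """Ask the worker to exit after finishing any pending runs."""
        with self._wake:
            self._closing = True
            self._wake.notify()

    def run(self):
        while True:
            with self._wake:
                self._wake.wait_for(lambda: self._pending or self._closing)
                if self._pending == 0:
                    return               # closing and nothing left to do
                self._pending -= 1
            self._work()                 # run outside the lock
```

Changing the schedule then only means changing when trigger() is called; the worker itself stays the same.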
I wouldn't think it's very important, but the best approach is very platform dependent.
A .NET System.Threading.Timer costs nothing while it's waiting, and will invoke your code on a pool thread. In theory, that would be the best of both your suggestions.
Another important thing to consider if you are on a garbage collected system like Java is that anything strongly referenced by a sleeping thread is not garbage. In that respect, it's better to kill idle threads, and let them, and any objects they reference, get cleaned up.
It all depends, of course. But by default I would go with a separate process (not thread) started on demand.

Resources