I have a process Scheduled using Timer and TimerTask that runs nightly. Currently ti takes about an hour to finish. Considering there are only 6000 records to loop through the process and the upper management feels like it is very inefficient Job. So I wanted to know if I could span multiple threads of the same job with different datasets. Probaby each thread processes only 500 records at a time.
If i am hitting the same table for read/insert and update using
multiple threads would that be ok to do it?
if so how do i run multiple threads within a timer task? I suppose I could
just create threads and run but how do i ensure they run simultaneously but not sequentially?
I am using java 1.4 and this runs on a jboss 2.4 and i make use EJB 1.1 session beans in the process to read/update/add data.
There isn't enough info in your post for a surefire answer, but I'll share some thoughts:
It depends. Generally you can do reads in parallel, but not writes. If you're doing much more reading than writing, you're probably ok, but you may find yourself dealing with frustrating race conditions.
It depends. You are never guaranteed to have threads run in parallel. That's up to the cpu/kernel/jvm to decide. You just make threads to tell the machine that it's allowed to execute them in parallel.
Related
I want to know if a program can run two threads at the same time (that is basically what it is used for correct?). But if I were to do a system call in one function where it runs on thread A, and have some other tasks running in another function where it runs on thread B, would they both be able to run at the same time or would my second function wait until the system call finishes?
Add-on to my original question: Now would this process still be an uninterruptable process while the system call is going on? I am talking about using any system call on UNIX/LINUX.
Multi-threading and parallel processing are two completely different topics, each worthy of its own conversation, but for the sake of introduction...
Threading:
When you launch an executable, it is running in a thread within a process. When you launch another thread, call it thread 2, you now have 2 separately running execution chains (threads) within the same process. On a single core microprocessor (uP), it is possible to run multiple threads, but not in parallel. Although conceptually the threads are often said to run at the same time, they are actually running consecutively in time slices allocated and controlled by the operating system. These slices are interleaved with each other. So, the execution steps of thread 1 do not actually happen at the same time as the execution steps of thread 2. These behaviors generally extend to as many threads as you create, i.e. packets of execution chains all working within the same process and sharing time slices doled out by the operating system.
So, in your system call example, it really depends on what the system call is as to whether or not it would finish before allowing the execution steps of the other thread to proceed. Several factors play into what will happen: Is it a blocking call? Does one thread have more priority than the other. What is the duration of the time slices?
Links relevant to threading in C:
SO Example
POSIX
ANSI C
Parallel Processing:
When multi-threaded program execution occurs on a multiple core system (multiple uP, or multiple multi-core uP) threads can run concurrently, or in parallel as different threads may be split off to separate cores to share the workload. This is one example of parallel processing.
Again, conceptually, parallel processing and threading are thought to be similar in that they allow things to be done simultaneously. But that is concept only, they are really very different, in both target application and technique. Where threading is useful as a way to identify and split out an entire task within a process (eg, a TCP/IP server may launch a worker thread when a new connection is requested, then connects, and maintains that connection as long as it remains), parallel processing is typically used to send smaller components of the same task (eg. a complex set of computations that can be performed independently in separate locations) off to separate resources (cores, or uPs) to be completed simultaneously. This is where multiple core processors really make a difference. But parallel processing also takes advantage of multiple systems, popular in areas such as genetics and MMORPG gaming.
Links relevant to parallel processing in C:
OpenMP
More OpenMP (examples)
Gribble Labs - Introduction to OpenMP
CUDA Tookit from NVIDIA
Additional reading on the general topic of threading and architecture:
This summary of threading and architecture barely scratches the surface. There are many parts to the the topic. Books to address them would fill a small library, and there are thousands of links. Not surprisingly within the broader topic some concepts do not seem to follow reason. For example, it is not a given that simply having more cores will result in faster multi-threaded programs.
Yes, they would, at least potentially, run "at the same time", that's exactly what threads are for; of course there are many details, for example:
If both threads run system calls that e.g. write to the same file descriptor they might temporarily block each other.
If thread synchronisation primitives like mutexes are used then the parallel execution will be blocked.
You need a processor with at least two cores in order to have two threads truly run at the same time.
It's a very large and very complex subject.
If your computer has only a single CPU, you should know, how it can execute more than one thread at the same time.
In single-processor systems, only a single thread of execution occurs at a given instant. because Single-processor systems support logical concurrency, not physical concurrency.
On multiprocessor systems, several threads do, in fact, execute at the same time, and physical concurrency is achieved.
The important feature of multithreaded programs is that they support logical concurrency, not whether physical concurrency is actually achieved.
The basics are simple, but the details get complex real quickly.
You can break a program into multiple threads (if it makes sense to do so), and each thread will run "at its own pace", such that if one must wait for, eg, some file I/O that doesn't slow down the others.
On a single processor multiple threads are accommodated by "time slicing" the processor somehow -- either on a simple clock basis or by letting one thread run until it must wait (eg, for I/O) and then "switching" to the next thread. There is a whole art/science to doing this for maximum efficiency.
On a multi-processor (such as most modern PCs which have from 2 to 8 "cores") each thread is assigned to a separate processor, and if there are not enough processors then they are shared as in the single processor case.
The whole area of assuring "atomicity" of operations by a single thread, and assuring that threads don't somehow interfere with each other is incredibly complex. In general a there is a "kernel" or "nucleus" category of system call that will not be interrupted by another thread, but thats only a small subset of all system calls, and you have to consult the OS documentation to know which category a particular system call falls into.
They will run at the same time, for one thread is independent from another, even if you perform a system call.
It's pretty easy to test it though, you can create one thread that prints something to the console output and perform a system call at another thread, that you know will take some reasonable amount of time. You will notice that the messages will continue to be printed by the other thread.
Yes, A program can run two threads at the same time.
it is called Multi threading.
would they both be able to run at the same time or would my second function wait until the system call finishes?
They both are able to run at the same time.
if you want, you can make thread B wait until Thread A completes or reverse
Two thread can run concurrently only if it is running on multiple core processor system, but if it has only one core processor then two threads can not run concurrently. So only one thread run at a time and if it finishes its job then the next thread which is on queue take the time.
I am using the tpl to process thousands of files in a multithreaded fashion.All good.
However there is some part of the application that I must process those files single thread.
Setting maxdegreeParallelism=1 means 1 thread x core is this correct?
When you dont you parallelism and you have 4 cores does it still use 1 thread x core?
The problem is that tpl does lot of hard work for you and also not been very familiar with threading does not help.
Bottom line I need to make sure that maxdegreeParallelism=1 is single threaded
Sorry for silly question but could not find a straight answer by googling.
See here.
No. It is not the case that necessarily one thread is run per core when you set the `MaxDegreePrallelism". It has a different meaning. It limits the number of parallel tasks done in the entire parallel operation. If you set that to one, it basically renders your parallel approach useless.
The TPL schedules a task on the threadpool. Once the task is scheduled, the threadpool decides how all the tasks to be done are to be distributed among threads, cores and processors. This is based on certain heuristics like the virtual address space, number of threads currently in blocked state, etc.
Now, if you mean that there is a part of your application in which the tasks should be done in a sequential form, there are ways to achieve that. Take a look at ContinueWith.
Documentation doesnt say anything about CPU-Cores but concurrent operations.
So this means, setting to 1 equals 1 Thread in total. Though it is a different thread than the callingthread.
Fiddle to poorly prove assumption.
I want to know if a program can run two threads at the same time (that is basically what it is used for correct?). But if I were to do a system call in one function where it runs on thread A, and have some other tasks running in another function where it runs on thread B, would they both be able to run at the same time or would my second function wait until the system call finishes?
Add-on to my original question: Now would this process still be an uninterruptable process while the system call is going on? I am talking about using any system call on UNIX/LINUX.
Multi-threading and parallel processing are two completely different topics, each worthy of its own conversation, but for the sake of introduction...
Threading:
When you launch an executable, it is running in a thread within a process. When you launch another thread, call it thread 2, you now have 2 separately running execution chains (threads) within the same process. On a single core microprocessor (uP), it is possible to run multiple threads, but not in parallel. Although conceptually the threads are often said to run at the same time, they are actually running consecutively in time slices allocated and controlled by the operating system. These slices are interleaved with each other. So, the execution steps of thread 1 do not actually happen at the same time as the execution steps of thread 2. These behaviors generally extend to as many threads as you create, i.e. packets of execution chains all working within the same process and sharing time slices doled out by the operating system.
So, in your system call example, it really depends on what the system call is as to whether or not it would finish before allowing the execution steps of the other thread to proceed. Several factors play into what will happen: Is it a blocking call? Does one thread have more priority than the other. What is the duration of the time slices?
Links relevant to threading in C:
SO Example
POSIX
ANSI C
Parallel Processing:
When multi-threaded program execution occurs on a multiple core system (multiple uP, or multiple multi-core uP) threads can run concurrently, or in parallel as different threads may be split off to separate cores to share the workload. This is one example of parallel processing.
Again, conceptually, parallel processing and threading are thought to be similar in that they allow things to be done simultaneously. But that is concept only, they are really very different, in both target application and technique. Where threading is useful as a way to identify and split out an entire task within a process (eg, a TCP/IP server may launch a worker thread when a new connection is requested, then connects, and maintains that connection as long as it remains), parallel processing is typically used to send smaller components of the same task (eg. a complex set of computations that can be performed independently in separate locations) off to separate resources (cores, or uPs) to be completed simultaneously. This is where multiple core processors really make a difference. But parallel processing also takes advantage of multiple systems, popular in areas such as genetics and MMORPG gaming.
Links relevant to parallel processing in C:
OpenMP
More OpenMP (examples)
Gribble Labs - Introduction to OpenMP
CUDA Tookit from NVIDIA
Additional reading on the general topic of threading and architecture:
This summary of threading and architecture barely scratches the surface. There are many parts to the the topic. Books to address them would fill a small library, and there are thousands of links. Not surprisingly within the broader topic some concepts do not seem to follow reason. For example, it is not a given that simply having more cores will result in faster multi-threaded programs.
Yes, they would, at least potentially, run "at the same time", that's exactly what threads are for; of course there are many details, for example:
If both threads run system calls that e.g. write to the same file descriptor they might temporarily block each other.
If thread synchronisation primitives like mutexes are used then the parallel execution will be blocked.
You need a processor with at least two cores in order to have two threads truly run at the same time.
It's a very large and very complex subject.
If your computer has only a single CPU, you should know, how it can execute more than one thread at the same time.
In single-processor systems, only a single thread of execution occurs at a given instant. because Single-processor systems support logical concurrency, not physical concurrency.
On multiprocessor systems, several threads do, in fact, execute at the same time, and physical concurrency is achieved.
The important feature of multithreaded programs is that they support logical concurrency, not whether physical concurrency is actually achieved.
The basics are simple, but the details get complex real quickly.
You can break a program into multiple threads (if it makes sense to do so), and each thread will run "at its own pace", such that if one must wait for, eg, some file I/O that doesn't slow down the others.
On a single processor multiple threads are accommodated by "time slicing" the processor somehow -- either on a simple clock basis or by letting one thread run until it must wait (eg, for I/O) and then "switching" to the next thread. There is a whole art/science to doing this for maximum efficiency.
On a multi-processor (such as most modern PCs which have from 2 to 8 "cores") each thread is assigned to a separate processor, and if there are not enough processors then they are shared as in the single processor case.
The whole area of assuring "atomicity" of operations by a single thread, and assuring that threads don't somehow interfere with each other is incredibly complex. In general a there is a "kernel" or "nucleus" category of system call that will not be interrupted by another thread, but thats only a small subset of all system calls, and you have to consult the OS documentation to know which category a particular system call falls into.
They will run at the same time, for one thread is independent from another, even if you perform a system call.
It's pretty easy to test it though, you can create one thread that prints something to the console output and perform a system call at another thread, that you know will take some reasonable amount of time. You will notice that the messages will continue to be printed by the other thread.
Yes, A program can run two threads at the same time.
it is called Multi threading.
would they both be able to run at the same time or would my second function wait until the system call finishes?
They both are able to run at the same time.
if you want, you can make thread B wait until Thread A completes or reverse
Two thread can run concurrently only if it is running on multiple core processor system, but if it has only one core processor then two threads can not run concurrently. So only one thread run at a time and if it finishes its job then the next thread which is on queue take the time.
I could setup a multi-threaded environment using the .net ThreadPool and I do get a significant performance benefit. This runs in the background of my application.
Now when a new task is requested by the user, I want it to get maximum CPU resources to maximize performance. Hence I would like to temporarily pause all the threads that I began (via the ThreadPool.Queueuserworkitem method) and then resume once the new task, requested by the user in foreground, is completed.
There could be several solutions to my problem:
a. Starting lesser background threads so that any new user request gets some share of the CPU resources. (but I loose the performance gain I had :( )
b. Set higher priority for the thread for a new user requested task. (not sure if this works?)
c. Suspending/resuming the ThreadPool threads I began. But suspending / resuming / interrupting threads is highly discouraged. Moreover, this could get tricky and error prone.
Any other ideas?
Note: when the user makes a request, performing the task would normally not take more than 300ms. However, when I start ThreadPool threads in background, it now takes about 3 seconds to complete (10 times worse)! I am OK if it takes 500-800ms though. All background threads complete in about 8 seconds (and I am OK if they take 1-2 seconds more). Hence, I am trying out option ( a ) for now.
Thanks in advance!
Be noted that Thread scheduling is done by CPU and hence cannot be directed from within a program. Only thing that can be done is setting ThreadPriority (that too on new Threads, not on ThreadPool threads). Check section Limitations of Using the Thread Pool
As your requirement is to suspend all background threads while executing a new task, what you can do is to create a class level flag.
Now you can put checkpoints in methods to be executed in Background task. At the checkpoints, check the class level flag, if it is set, call Thread.Sleep, which should (NOT MUST) trigger thread context switch by OS/CPU thread scheduler.
Putting checkpoints in methods (to be executed by ThreadPool) is analogous to putting checkpoints for cancellation support in background worker.
My question might sound a bit naive but I'm pretty new with multi-threaded programming.
I'm writing an application which processes incoming external data. For each data that arrives a new task is created in the following way:
System.Threading.Tasks.Task.Factory.StartNew(() => methodToActivate(data));
The items of data arrive very fast (each second, half second, etc...), so many tasks are created. Handling each task might take around a minute. When testing it I saw that the number of threads is increasing all the time. How can I limit the number of tasks created, so the number of actual working threads is stable and efficient. My computer is only dual core.
Thanks!
One of your issues is that the default scheduler sees tasks that last for a minute and makes the assumption that they are blocked on another tasks that have yet to be executed. To try and unblock things it schedules more pending tasks, hence the thread growth. There are a couple of things you can do here:
Make your tasks shorter (probably not an option).
Write a scheduler that deals with this scenario and doesn't add more threads.
Use SetMaxThreads to prevent
unbounded thread pool growth.
See the section on Thread Injection here:
http://msdn.microsoft.com/en-us/library/ff963549.aspx
You should look into using the producer/consumer pattern with a BlockingCollection<T> around a ConcurrentQueue<T> where you set the BoundedCapacity to something that makes sense given the characteristics of your workload. You can make your BoundedCapacity configurable and then tweak as you run through some profiling sessions to find the sweet spot.
While it's true that the TPL will take care of queueing up the tasks you create, creating too many tasks does not come without penalties. Also, what's the point in producing more work than you can consume? You want to produce enough work that the consumers will never be starved, but you don't want to get to far ahead of yourself because that's just wasting resources and potentially stealing those very same resources from your consumers.
You can create a custom TaskScheduler for the Task Parallel library and then schedule tasks on that by passing an instance of it to the TaskFactory constructor.
Here's one example of how to do that: Task Scheduler with a maximum degree of parallelism.