I'm trying to use PagingPredicate on a map with only 340 entries. With pageSize=15, retrieving the first page takes about 15 ms, but retrieving the last page takes about 250 ms. Is this normal?
Code example:
```java
public List<NaturalPerson> getNaturalPersonByNameAndUser(String name, User user, int offset, int limit) {
    final PagingPredicate pagingPredicate = new PagingPredicate(new NaturalPersonPredicate(name, user), /*naturalPersonComparator,*/ limit);
    for (int i = 0; i < offset; i = i + limit) {
        pagingPredicate.nextPage();
    }
    return Lists.newArrayList(naturalPersonMap.values(pagingPredicate));
}
```
It totally depends. How many times did you run the test, by the way? Writing a correct microbenchmark is quite complicated; personally, I use JMH for it.
Can you try it with JMH and see if the numbers are still the same?
For an example with Hazelcast:
https://github.com/hazelcast/performancetop5/tree/master/item1
I am wondering how concurrency can be expressed without an explicit thread object. I don't mean the implementation, which would probably use threads or thread pools, but the language-design issues.
Q1: What would be lost if there were no thread object? What couldn't be done in such a language?
Q2: How would concurrency be expressed instead? What alternatives or complements to threads have been proposed or implemented?
One possibility is the MPI programming model (GPUs work in a similar way).
Let's say you have the following code:
```c
for (int i = 0; i < 100; i++) {
    work(i);
}
```
The "normal" thread-based way is to split the iteration range into multiple subsets, something like this:
Thread 1:
```c
for (int i = 0; i < 50; i++) {
    work(i);
}
```
Thread 2:
```c
for (int i = 50; i < 100; i++) {
    work(i);
}
```
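The thread-based split can be sketched in C with POSIX threads. `work()` and the `processed` array are hypothetical stand-ins for the real per-iteration work. This version generalizes the two hard-coded threads to `nthreads`. It gives the remainder of `n / nthreads` to the first threads, so no iteration is lost when the range doesn't divide evenly.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16
#define N_ITEMS 100

/* Hypothetical per-iteration work: count how often each index is processed. */
static int processed[N_ITEMS];

static void work(int i) { processed[i]++; }

/* Half-open range [begin, end) handed to one thread. */
struct range { int begin, end; };

static void *run_range(void *arg) {
    const struct range *r = arg;
    for (int i = r->begin; i < r->end; i++)
        work(i);
    return NULL;
}

/* Splits [0, n) into nthreads contiguous chunks and runs one thread per chunk.
   The first n % nthreads chunks get one extra index so nothing is dropped.
   Returns 0 on success, -1 if a thread could not be created. */
static int run_split(int n, int nthreads) {
    pthread_t tids[MAX_THREADS];
    struct range ranges[MAX_THREADS];
    int base, extra, begin = 0, started = 0, rc = 0;
    assert(n >= 0 && n <= N_ITEMS && nthreads > 0 && nthreads <= MAX_THREADS);
    base = n / nthreads;
    extra = n % nthreads;
    for (int t = 0; t < nthreads; t++) {
        int len = base + (t < extra ? 1 : 0);
        ranges[t].begin = begin;
        ranges[t].end = begin + len;
        begin += len;
        if (pthread_create(&tids[t], NULL, run_range, &ranges[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++)   /* wait for every started thread */
        pthread_join(tids[t], NULL);
    return rc;
}
```

`run_split(100, 2)` reproduces the Thread 1 / Thread 2 split: one thread gets [0, 50) and the other gets [50, 100). Each thread writes only its own indices, so no lock is needed.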
In MPI or on a GPU, however, you do something different.
The idea is that every core executes the same program (GPU), or at least
a similar one (MPI). The difference is that each core has
a different ID, which changes the behavior of the code.
MPI style (not actual MPI syntax):
```c
int rank = get_core_id();
int size = get_num_core();
int subset = 100 / size;
for (int i = rank * subset; i < (rank + 1) * subset; i++) {
    // each core will use a different range for i
    work(i);
}
```
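There is one detail in the pseudocode: `100 / size` truncates. With 3 cores, each core gets 33 iterations and iteration 99 is never processed. Below is a minimal sketch of a range computation that spreads the remainder, using the same rank/size convention. The name `rank_range` is made up for illustration.

```c
#include <assert.h>

/* Computes the half-open range [*begin, *end) that core `rank` of `size`
   cores should handle for n iterations. The first n % size ranks get one
   extra iteration, so every index in [0, n) is covered exactly once. */
static void rank_range(int n, int rank, int size, int *begin, int *end) {
    int base, extra;
    assert(n >= 0 && size > 0 && rank >= 0 && rank < size);
    base = n / size;
    extra = n % size;
    *begin = rank * base + (rank < extra ? rank : extra);
    *end = *begin + base + (rank < extra ? 1 : 0);
}
```

Each core would call `rank_range(100, rank, size, &begin, &end)` with its own rank and loop over `[begin, end)`. In real MPI, the rank comes from `MPI_Comm_rank`.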
The next big thing is communication. Normally you have to handle all of the synchronization yourself. MPI is message-based, which means it is not perfectly suited to classical shared-memory models (where every core has access to the same memory), but it works excellently on cluster systems (many cores connected by a network). This is not limited to supercomputers, which use almost exclusively MPI-style programming. In recent years a new type of core architecture (manycores) has also been developed. These chips have a local so-called Network-on-Chip, so each core can send and receive messages without the usual shared-memory synchronization problems.
MPI provides not only simple messages but also higher-level constructs that automatically scatter data to every core and gather it back.
Example (again, not actual MPI syntax):
```c
int rank = get_core_id();
int size = get_num_core();
int data[100];
int result;
int results[size];
if (rank == 0) { // master core only
    fill_with_stuff(data);
}
scatter(0, data);           // core 0 sends the data to all other cores
result = work(rank, data);  // every core runs the same work() on the data it received
gather(0, result, results); // collect all local results and store them in
                            // the results array of core 0
```
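The sketch below makes the data flow of scatter, local work, and gather concrete without an MPI installation. It is a sequential C simulation, not MPI: a loop over `rank` stands in for the separate processes, and `local_sum` is a hypothetical piece of work. Real MPI expresses this pattern with `MPI_Scatter` and `MPI_Gather`. With those, each rank receives only its own slice of the array, not the whole array.

```c
#include <assert.h>
#include <string.h>

#define N 100
#define MAX_RANKS 16

/* Hypothetical local work: each rank sums its own chunk. */
static int local_sum(const int *chunk, int len) {
    int s = 0;
    for (int i = 0; i < len; i++)
        s += chunk[i];
    return s;
}

/* Sequential simulation (no real MPI): the loop body is what rank `rank`
   would do. Assumes N is divisible by size, as a plain MPI_Scatter with a
   fixed per-rank count requires. results[] plays the role of core 0's array. */
static void scatter_work_gather(const int data[N], int size, int results[]) {
    int chunk[N];   /* one rank's receive buffer */
    int per_rank;
    assert(size > 0 && size <= MAX_RANKS && N % size == 0);
    per_rank = N / size;
    for (int rank = 0; rank < size; rank++) {
        /* scatter: root sends slice number `rank` to that rank */
        memcpy(chunk, data + rank * per_rank, (size_t)per_rank * sizeof(int));
        /* work + gather: the local result ends up at results[rank] on root */
        results[rank] = local_sum(chunk, per_rank);
    }
}
```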
Another solution is OpenMP.
Here you declare parallel blocks, and the whole threading part is handled by the compiler and the OpenMP runtime library.
Example:
```c
// this will split the for loop automatically into 4 threads
#pragma omp parallel for num_threads(4)
for (int i = 0; i < 100; i++) {
    work(i);
}
```
The big advantage is that it's fast to write; that's it.
You may get better performance by writing the threads yourself,
but that takes a lot more time and knowledge about synchronization.
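Here is a small sketch of the same idea where the per-iteration results have to be combined. The `reduction(+:total)` clause gives each thread a private partial sum and adds the partial sums together at the end. Without OpenMP, you would have to write that synchronization by hand. `work()` here is a hypothetical placeholder.

```c
#include <assert.h>

/* Hypothetical per-iteration work. */
static long work(int i) { return (long)i * i; }

/* Sums work(i) for i in [0, n). When compiled with -fopenmp, the loop is
   split across 4 threads and each thread accumulates into a private copy
   of total; the copies are added together when the loop ends. Without
   -fopenmp the pragma is ignored and the loop simply runs serially. */
static long sum_work(int n) {
    long total = 0;
    #pragma omp parallel for num_threads(4) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += work(i);
    return total;
}
```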
I have a piece of code, shown below, whose time complexity I want to improve.
It runs in a thread, and up to 2000 threads can execute this function at the same time.
On top of that, I wait for file descriptors that are ready from a pollset. MAX_RTP_SESSIONS is also huge (5000 or more), so it's a big for loop and I can see performance suffering. (When MAX_RTP_SESSIONS is reduced to just 500, I see a huge improvement in performance.)
But I have to support 2000 threads and 5000 sessions. I'd like to find a way to change the time complexity from O(n^2) to at least O(n) or better. Any ideas are really appreciated!
```c
//..
retVal = epoll_wait(epfd, pollset, EPOLL_MAX_EVENTS, mSecTimeout);
//..
sem_wait(&sem_sessions);
for (i = 0; i < retVal; i++) {
    for (j = 0; j < MAX_RTP_SESSIONS; j++) {
        if ((g_rtp_sessions[j].destroy == FALSE) &&
            (g_rtp_sessions[j].used != FALSE) &&
            (g_rtp_sessions[j].p_rtp->rtp_socket->fd == pollset[i].data.fd))
        {
            if (0 < rtp_recv_data(....)) {
                rtp_update(...);
            }
        }
    }
}
sem_post(&sem_sessions);
//..
```
Sort pollset by fd. Then, for each session, you can binary-search it instead of scanning, which gives an O(n log n) algorithm instead of O(n^2).
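Here is a minimal sketch of the sort-then-binary-search idea in C, using the standard `qsort` and `bsearch`. The `session` struct is a simplified, hypothetical version of `g_rtp_sessions[]` that keeps only the fields the loop reads. The `hits` counter stands in for the `rtp_recv_data()`/`rtp_update()` calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for g_rtp_sessions[j]. */
struct session {
    int used;
    int destroy;
    int fd;
    int hits;   /* stand-in for the rtp_recv_data()/rtp_update() calls */
};

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* ready_fds holds the n_ready fds reported by epoll_wait (pollset[i].data.fd).
   Sorting costs O(R log R); each of the S sessions then does an O(log R)
   bsearch instead of a linear scan, so one pass is O((S + R) log R)
   instead of O(S * R). */
static void dispatch_ready(int *ready_fds, int n_ready,
                           struct session *sessions, int n_sessions) {
    assert(n_ready >= 0 && n_sessions >= 0);
    qsort(ready_fds, (size_t)n_ready, sizeof *ready_fds, cmp_int);
    for (int j = 0; j < n_sessions; j++) {
        struct session *s = &sessions[j];
        if (s->destroy || !s->used)
            continue;
        if (bsearch(&s->fd, ready_fds, (size_t)n_ready, sizeof *ready_fds, cmp_int) != NULL)
            s->hits++;
    }
}
```

If you control how the fds are registered, epoll can avoid the search entirely. `struct epoll_event` carries a `data` union, so you can store a pointer to the session in `data.ptr` (instead of `data.fd`) when calling `epoll_ctl`. Each ready event then points straight at its session, which makes each wakeup O(R). Because `data` is a union, the fd would then have to be kept inside the session.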
I know prime finding is well studied, and there are a lot of different implementations. My question is: using the provided method (code sample), how can I go about breaking up the work? The machine it will run on has 4 quad-core hyperthreaded processors and 16 GB of RAM. I realize that some improvements could be made, particularly in the IsPrime method. I also know that problems will occur once the list has more than int.MaxValue items in it. I don't care about any of those improvements; the only thing I care about is how to break up the work.
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Prime
{
    class Program
    {
        static List<ulong> primes = new List<ulong>() { 2 };

        static void Main(string[] args)
        {
            ulong reportValue = 10;
            for (ulong possible = 3; possible <= ulong.MaxValue; possible += 2)
            {
                if (possible > reportValue)
                {
                    Console.WriteLine(String.Format("\nThere are {0} primes less than {1}.", primes.Count, reportValue));
                    try
                    {
                        checked
                        {
                            reportValue *= 10;
                        }
                    }
                    catch (OverflowException)
                    {
                        reportValue = ulong.MaxValue;
                    }
                }
                if (IsPrime(possible))
                {
                    primes.Add(possible);
                    Console.Write("\r" + possible);
                }
            }
            Console.WriteLine(primes[primes.Count - 1]);
            Console.ReadLine();
        }

        static bool IsPrime(ulong value)
        {
            foreach (ulong prime in primes)
            {
                if (value % prime == 0) return false;
                if (prime * prime > value) break;
            }
            return true;
        }
    }
}
```
There are two basic schemes I can see: 1) using all threads to test a single number, which is probably great for larger primes, but I can't really think of how to implement it; or 2) using each thread to test a single candidate, which can produce a non-contiguous sequence of primes and leaves resources unused when the next number to test is greater than the square of the highest prime found so far.
To me it feels like both of these approaches are challenging only in the early stages of building the list of primes, but I'm not entirely sure. I'm doing this as a personal exercise in breaking up this kind of work.
If you want, you can parallelize both operations: checking a single candidate, and checking multiple candidates at once. I'm not sure this would help, though. To be honest, I'd consider removing the threading in main().
I've tried to stay faithful to your algorithm, but to speed it up a lot I've used x*x instead of reportValue; you could easily revert this if you wish.
To improve my splitting across cores further, you could estimate how much computation the divisions require based on the size of the numbers, and split the list accordingly (that is, dividing by smaller numbers takes less time, so make the first partitions larger).
Also, a thread pool may not exist in the form I use it here.
Here's my attempt (pseudo-ish code):
```
List<int> primes = {2};
List<int> nextPrimes = {};
int cores = 4;

main()
{
    for (int x = 3; x < MAX; x = x*x) {
        int localmax = x*x;
        for (int y = x; y < localmax; y += 2) {
            thread{ primecheck(y); }
        }
        "wait for all threads to be executed"
        primes.add(sort(nextPrimes));   // threads finish in any order
        nextPrimes = {};
    }
}

void primecheck(int y)
{
    bool primality = true;   // assume prime until a divisor is found
    threadpool? pool;
    for (int x = 0; x < cores; x++) {
        pool.add(thread{
            if (!smallcheck(x*primes.length/cores, (x+1)*primes.length/cores, y)) {
                primality = false;
                pool.kill();
            }
        });
    }
    "wait for all threads to be executed or killed"
    if (primality)
        nextPrimes.add(y);   // shared list: needs a lock or a concurrent list
}

bool smallcheck(int a, int b, int y) {
    foreach (int div in primes[a to b])
        if (y % div == 0)
            return false;
    return true;
}
```
Edit: I added what I think the pooling should look like; see the revision history if you want the version without it.
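Below is a hedged C sketch that combines scheme 2 from the question with the x to x*x batching above. All candidates in `[lo, hi)` can be tested in parallel, provided the primes already found include every prime below `lo` and `hi <= lo * lo`. Each thread writes to its own slots of a flag array, and after `pthread_join` the results are appended in order, which keeps the prime list contiguous and sorted. `NTHREADS`, `extend_primes` and the interleaved striping are illustrative choices, not part of the answer above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NTHREADS 4

/* Everything one worker needs; flags is shared, but each thread writes
   only the slots of its own stripe. */
struct batch {
    const unsigned long *primes;  /* all primes < lo, ascending */
    size_t nprimes;
    unsigned long lo, hi;         /* candidates in [lo, hi) */
    unsigned char *flags;         /* flags[y - lo] = 1 if y is prime */
    int tid;                      /* stripe index 0..NTHREADS-1 */
};

/* Trial division by the known primes up to sqrt(y). */
static int is_prime_with(unsigned long y, const unsigned long *primes, size_t n) {
    for (size_t k = 0; k < n && primes[k] * primes[k] <= y; k++)
        if (y % primes[k] == 0)
            return 0;
    return 1;
}

/* Thread t tests lo+t, lo+t+NTHREADS, ...; interleaving spreads cheap and
   expensive candidates evenly over the threads. */
static void *check_stripe(void *arg) {
    struct batch *b = arg;
    for (unsigned long y = b->lo + (unsigned long)b->tid; y < b->hi; y += NTHREADS)
        b->flags[y - b->lo] = (unsigned char)is_prime_with(y, b->primes, b->nprimes);
    return NULL;
}

/* Appends the primes in [lo, hi) to primes[n..cap) in ascending order.
   Requires primes[0..n) to hold every prime < lo, and lo < hi <= lo * lo.
   Returns the new count, or 0 if allocation fails or cap is too small. */
static size_t extend_primes(unsigned long *primes, size_t n, size_t cap,
                            unsigned long lo, unsigned long hi) {
    pthread_t tids[NTHREADS];
    struct batch b[NTHREADS];
    int started[NTHREADS];
    unsigned char *flags;
    assert(lo >= 2 && lo < hi && hi <= lo * lo);
    flags = calloc(hi - lo, 1);
    if (flags == NULL)
        return 0;
    for (int t = 0; t < NTHREADS; t++) {
        b[t] = (struct batch){primes, n, lo, hi, flags, t};
        started[t] = pthread_create(&tids[t], NULL, check_stripe, &b[t]) == 0;
        if (!started[t])
            check_stripe(&b[t]);   /* fall back to doing this stripe inline */
    }
    for (int t = 0; t < NTHREADS; t++)
        if (started[t])
            pthread_join(tids[t], NULL);
    /* Collect in ascending order so the list stays contiguous and sorted. */
    for (unsigned long y = lo; y < hi; y++) {
        if (!flags[y - lo])
            continue;
        if (n == cap) {
            free(flags);
            return 0;
        }
        primes[n++] = y;
    }
    free(flags);
    return n;
}
```

Starting from `{2}`, you grow the list with repeated calls: first `lo = 3, hi = 9`, then `lo = 9, hi = 81`, and so on. In a long run, `hi` should also be capped so that `lo * lo` doesn't overflow.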
Use the sieve of Eratosthenes instead. It's not worthwhile to parallelize unless you use a good algorithm in the first place.
Split the range to sieve into large regions and sieve each one in its own thread. Or, better, use a work queue of large regions.
Use a bit array to represent the prime numbers, it takes less space than representing them explicitly.
See also this answer for a good implementation of a sieve (in Java, no split into regions).
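Here is a minimal C sketch of sieving one region independently, so regions could be handed to threads or pulled from a work queue. It first sieves the small primes up to √hi, then marks composites in a bit array that covers only `[lo, hi)`. The function name and the choice to return a count rather than the primes themselves are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Counts the primes in [lo, hi) with a segmented sieve of Eratosthenes.
   Segments are independent, so several calls could run in separate threads.
   Composites in the segment are marked in a bit array (one bit per number).
   Returns -1 if allocation fails. */
static long count_primes_segment(uint64_t lo, uint64_t hi) {
    uint64_t root = 1;
    size_t nseg;
    unsigned char *small;   /* small[m] = 1 if m <= root is composite */
    uint8_t *seg;           /* bit (m - lo) set if m is composite */
    long count = 0;

    if (lo < 2)
        lo = 2;
    if (hi <= lo)
        return 0;
    while ((root + 1) * (root + 1) < hi)   /* largest root with root*root < hi */
        root++;
    nseg = (size_t)(hi - lo);
    small = calloc((size_t)root + 1, 1);
    seg = calloc((nseg + 7) / 8, 1);
    if (small == NULL || seg == NULL) {
        free(small);
        free(seg);
        return -1;
    }
    for (uint64_t p = 2; p <= root; p++) {
        if (small[p])
            continue;                                  /* p is composite */
        for (uint64_t m = p * p; m <= root; m += p)    /* sieve the small range */
            small[m] = 1;
        uint64_t start = (lo + p - 1) / p * p;         /* first multiple >= lo */
        if (start < p * p)
            start = p * p;                             /* never mark p itself */
        for (uint64_t m = start; m < hi; m += p)
            seg[(m - lo) / 8] |= (uint8_t)(1u << ((m - lo) % 8));
    }
    for (size_t off = 0; off < nseg; off++)
        if (!(seg[off / 8] & (1u << (off % 8))))
            count++;
    free(small);
    free(seg);
    return count;
}
```

Because each call depends only on `lo` and `hi`, the per-region counts can be summed after the threads finish.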
In my application there is a small function that reads files to get some information. There will be at least 50 files, so I thought of using threads. If the user gives 50 files, I want to split them 5 × 10: create 5 threads, each handling 10 files, which should speed up the process. As you can see in the code below, some variables are shared. I have read some articles about threading, and I know that only one thread at a time should access a variable or control (CCriticalSection can be used for that). As a beginner, I am finding it hard to implement what I have learned about threading. Can somebody give me some ideas based on the code below? Thanks in advance.
File-reading function:
```cpp
void CMyClass::GetWorkFilesInfo(CStringArray& dataFilesArray, CString* dataFilesB,
                                int* check, DWORD noOfFiles, LPWSTR path)
{
    CString cFilePath;
    int cIndex = 0;
    int exceptionInd = 0;
    wchar_t** filesForWork = new wchar_t*[noOfFiles];
    int tempCheck;
    int localIndex = 0;
    for (int index = 0; index < noOfFiles; index++)
    {
        tempCheck = *(check + index);
        if (tempCheck == NOCHECKBOX)
        {
            // wchar_t (not TCHAR) to match filesForWork and wcscpy
            *(filesForWork + cIndex) = new wchar_t[MAX_PATH];
            wcscpy(*(filesForWork + cIndex), *(dataFilesB + index));
            cIndex++;
        }
        else // CHECKED or UNCHECKED
        {
            dataFilesArray.Add(*(dataFilesB + index));
            *(check + localIndex) = *(check + index);
            localIndex++;
        }
    }
    WorkFiles(&cFilePath, dataFilesArray, filesForWork,
              path,
              cIndex);
    dataFilesArray.Add(cFilePath);
    *(check + localIndex) = CHECKED;
}
```
I think you would be better off having just one thread read all the files. The context switches between threads, together with the synchronization issues, are really not worth the price in your example. The hard drive is a single resource: imagine all five threads taking turns moving the read heads to different positions on the disk, which is not very efficient.
I have a list of 300,000+ items.
What I am currently doing with the list is validating each address and writing both the original address and the corrected address to a specified file.
What I'd like to do is split the list evenly among a given number of threads and process the parts concurrently.
Can anyone help me with an example of how I can go about doing something like this?
Thanks
If you're working in .NET 2.0 and the list is only used read-only (it isn't changed while this processing is occurring), then you can simply divide up the indexes. For instance:
```csharp
public void Process(List<Item> list, int threadCount) {
    // Any remainder (list.Count % threadCount items) becomes one extra, smaller work item.
    int perThread = list.Count < threadCount ? list.Count : list.Count / threadCount;
    int index = 0;
    while (index < list.Count) {
        int start = index;
        int count = Math.Min(perThread, list.Count - start);
        WaitCallback del = delegate(object state) { ProcessCore(list, start, count); };
        ThreadPool.QueueUserWorkItem(del);
        index += count;
    }
    // Note: this returns without waiting for the queued work items to finish.
}

private void ProcessCore(List<Item> list, int startIndex, int count) {
    // Do work here
}
```
Conceptually, it's fairly simple, given a couple of assumptions:

- You're not changing the list during processing.
- You know the number of threads ahead of time.

Basically, your algorithm is this:

1. Split the list into mostly-even sections based on the number of threads.
2. Give each thread the start and end index of its section, and the output file.
3. In each thread:
   a. Process an item
   b. Lock access to the output file
   c. Write the original and corrected address
   d. Unlock access to the output file
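Here is a sketch of those steps in C with POSIX threads. It assumes the list fits in memory and is not modified while the threads run. `correct_address` is a hypothetical placeholder that just upper-cases the text; a real validator would go there. Each record is written in two calls, and the mutex keeps those two writes together in the shared output file.

```c
#include <assert.h>
#include <ctype.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_THREADS 16

static pthread_mutex_t out_lock = PTHREAD_MUTEX_INITIALIZER;

struct section {
    const char *const *items;  /* the read-only address list */
    size_t begin, end;         /* this thread's half-open index range */
    FILE *out;                 /* shared output file */
};

/* Hypothetical "validation": copies the address in upper case. */
static void correct_address(const char *in, char *out, size_t cap) {
    size_t i = 0;
    for (; in[i] != '\0' && i + 1 < cap; i++)
        out[i] = (char)toupper((unsigned char)in[i]);
    out[i] = '\0';
}

static void *process_section(void *arg) {
    const struct section *s = arg;
    char fixed[256];
    for (size_t i = s->begin; i < s->end; i++) {
        correct_address(s->items[i], fixed, sizeof fixed);  /* a. process */
        pthread_mutex_lock(&out_lock);                       /* b. lock */
        fprintf(s->out, "%s\t", s->items[i]);                /* c. write original */
        fprintf(s->out, "%s\n", fixed);                      /*    and corrected */
        pthread_mutex_unlock(&out_lock);                     /* d. unlock */
    }
    return NULL;
}

/* Step 1: split items[0..n) into mostly-even sections; step 2: start one
   thread per section. Waits for all threads; returns -1 if one failed to start. */
static int process_all(const char *const *items, size_t n, int nthreads, FILE *out) {
    pthread_t tids[MAX_THREADS];
    struct section secs[MAX_THREADS];
    int started = 0, rc = 0;
    size_t base, extra, begin = 0;
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    base = n / (size_t)nthreads;
    extra = n % (size_t)nthreads;
    for (int t = 0; t < nthreads; t++) {
        size_t len = base + ((size_t)t < extra ? 1 : 0);
        secs[t] = (struct section){items, begin, begin + len, out};
        begin += len;
        if (pthread_create(&tids[t], NULL, process_section, &secs[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    return rc;
}
```

Lines from different threads can appear in any order. If the output must follow the input order, each thread could write to its own buffer or file, and the results could be concatenated after joining.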