I'm running a Fortran code across multiple processes using Open MPI. Each process needs to open and write many files. At run time, it's possible that two different processes will concurrently open and write to different files that use the same unit number.
processA: open(unit=10, file='file1')
processB: open(unit=10, file='file2')
Will this cause a problem?
Yes, it is possible, and no, it should not cause problems. The MPI processes each live in their own address space and are not aware of the memory (and therefore the unit numbers) of the other processes. You should, however, be careful not to create too many files: if you use thousands of processes, you may run into limitations of the filesystem.
Related
Will reading from the same file make threads run slower? If so, how do the YouTube or Netflix servers handle so many people watching the same movie when everyone is at a different point in it?
Or, if reading from the same file does slow threads down, and space is not a concern, is it better to keep multiple copies of the file, or to split the file into parts?
Will reading from the same file make threads run slower?
No. Modern operating systems handle this situation extremely efficiently: after the first read, the file's contents sit in the in-memory page cache, so concurrent readers are mostly served from RAM rather than from the disk.
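As a rough illustration, here is a minimal sketch in C (the file name data.bin, the thread count, and the chunk size are all made up for this example) in which several threads read the same file concurrently through a single descriptor using pread(), which takes an explicit offset and therefore needs no locking around a shared file position:

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 4
#define CHUNK 4096

static int fd;   /* one descriptor shared by all threads */

static void *reader(void *arg)
{
    long idx = (long)arg;
    char buf[CHUNK];
    /* pread() takes an explicit offset, so the threads never
       contend over a shared file position */
    ssize_t n = pread(fd, buf, sizeof buf, (off_t)idx * CHUNK);
    printf("thread %ld read %zd bytes\n", idx, n);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    fd = open("data.bin", O_RDONLY);   /* hypothetical input file */
    if (fd < 0) { perror("open"); return 1; }
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, reader, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    close(fd);
    return 0;
}

After the first pass over the file, every one of these reads is typically served straight from the page cache.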
I am working on a program where I am required to download a large number of JSON files from different URLs.
Currently, my program creates multiple threads, and in each thread it calls the libcurl easy_perform() function, but the program occasionally fails with a "double free" error. It seems to be some sort of Heisenbug, but I have been able to catch it in GDB, and the backtrace confirms that the error originates in libcurl.
While I would love suggestions on that issue, my actual question is this: would it be better to restructure my code to use the libcurl multi interface on one thread instead of calling the easy interface across multiple threads? What are the trade-offs of using one over the other?
Note: by "better", I mean: is it faster and less taxing on my CPU? Is it more reliable, given that the multi interface was designed for this?
EDIT:
The three options, as I understand them, are these:
1) Reuse the same easy_handle in a single thread. The connections won't need to be re-established, making it faster.
2) Call curl_easy_perform() in each individual thread. The transfers all run in parallel, again making it faster.
3) Call curl_multi_perform() in a single thread. This is non-blocking, so I imagine all of the files are downloaded in parallel, making it faster?
Which of these options is the most time efficient?
curl_easy_perform() is a blocking operation: if you run it in one thread, you have to download the files sequentially. In a multithreaded application you can run many transfers in parallel, which usually means a shorter total download time (if speed is not limited by the network or by the destination server).
But there is a non-blocking variant that may work better for you if you want to go the single-threaded way: curl_multi_perform().
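A minimal sketch of that approach (the two URLs are placeholders and most error checking is omitted for brevity):

#include <curl/curl.h>
#include <stdio.h>

int main(void)
{
    const char *urls[] = { "http://example.com/a.json",    /* placeholder URLs */
                           "http://example.com/b.json" };
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURLM *multi = curl_multi_init();

    /* one easy handle per transfer, all attached to the multi handle */
    for (int i = 0; i < 2; i++) {
        CURL *easy = curl_easy_init();
        curl_easy_setopt(easy, CURLOPT_URL, urls[i]);
        curl_multi_add_handle(multi, easy);
    }

    /* drive all transfers from this single thread */
    int running = 1;
    while (running) {
        curl_multi_perform(multi, &running);
        if (running)
            curl_multi_wait(multi, NULL, 0, 1000, NULL);   /* sleep until activity */
    }

    /* collect per-transfer results and clean up */
    CURLMsg *msg;
    int left;
    while ((msg = curl_multi_info_read(multi, &left))) {
        if (msg->msg == CURLMSG_DONE) {
            fprintf(stderr, "transfer finished: %s\n",
                    curl_easy_strerror(msg->data.result));
            curl_multi_remove_handle(multi, msg->easy_handle);
            curl_easy_cleanup(msg->easy_handle);
        }
    }
    curl_multi_cleanup(multi);
    curl_global_cleanup();
    return 0;
}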
From the curl documentation:
You can do any amount of calls to curl_easy_perform while using the same easy_handle. If you intend to transfer more than one file, you are even encouraged to do so. libcurl will then attempt to re-use the same connection for the following transfers, thus making the operations faster, less CPU intense and using less network resources. Just note that you will have to use curl_easy_setopt between the invokes to set options for the following curl_easy_perform.
In short, reusing the handle will give you a few of the benefits you want compared with creating a fresh handle for every curl_easy_perform() call.
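A minimal sketch of that reuse pattern (the URLs are again placeholders, error checking omitted):

#include <curl/curl.h>

int main(void)
{
    const char *urls[] = { "http://example.com/1.json",    /* placeholder URLs */
                           "http://example.com/2.json",
                           "http://example.com/3.json" };
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *easy = curl_easy_init();
    for (int i = 0; i < 3; i++) {
        /* change only the options that differ between transfers;
           libcurl keeps the connection alive and reuses it */
        curl_easy_setopt(easy, CURLOPT_URL, urls[i]);
        curl_easy_perform(easy);
    }
    curl_easy_cleanup(easy);
    curl_global_cleanup();
    return 0;
}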
This may be a beginner's question. Is there a difference between executing multiple threads and running a program multiple times? By running a program multiple times, I mean literally starting up a terminal and running the program multiple times. I read that there is a limit of 1 thread per CPU, and I have a quad-core machine, so I guess that means I have 4 CPUs. Is there a limit of programs per CPU also?
Generally, if a program uses multiple threads, the threads will divide the work of the program between themselves. For example, one thread might work on half of a giant data set and another thread might take the other half, or multiple threads might talk to separate machines across a network. Running a program 2 times won't have that effect; you'll get two webservers or two games of Minecraft that have nothing to do with each other. It's possible for a program to communicate with other copies of itself, and some programs do that, but it's not the usual approach.
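As a rough illustration in C (the array size and contents are arbitrary), here two threads each sum half of a shared array, something two independently launched copies of a program could not do without explicit inter-process communication:

#include <pthread.h>
#include <stdio.h>

#define N 1000000

static int data[N];

struct half { int lo, hi; long sum; };

static void *sum_half(void *arg)
{
    struct half *h = arg;
    h->sum = 0;
    for (int i = h->lo; i < h->hi; i++)
        h->sum += data[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        data[i] = 1;

    /* each thread works on its own half of the shared array */
    struct half a = { 0, N / 2, 0 }, b = { N / 2, N, 0 };
    pthread_t ta, tb;
    pthread_create(&ta, NULL, sum_half, &a);
    pthread_create(&tb, NULL, sum_half, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    printf("total = %ld\n", a.sum + b.sum);
    return 0;
}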
Multiple threads let you execute different instances of an action at the same time within a single program, sharing that program's memory.
Separately launched copies of a program are independent processes that share nothing by default. Threads within one program can divide the work between them, which is how they increase processing speed through parallelism.
I have been asked to write test cases that practically demonstrate the performance of a semaphore versus a read-write semaphore in the case of many readers with few writers, and vice versa.
I have implemented the semaphore (in kernel space, as we were asked), but I am not sure how to write the use cases and carry out a live, categorical evaluation of it.
Why don't you just write your two versions of the code (semaphore / R/W semaphore) to start? The use cases will depend on the actual feature being tested. Is it a device driver? Is it I/O related at all? Is it networking related? It's hard to come up with use cases without knowing this.
Generally, what I would do for something like an I/O benchmark is run multiple simulations over an increasing memory footprint for one set of runs. Another set of runs might use an increasing process load, and another might vary the block size. I would compare each of those against something like aggregate bandwidth and see how performance (aggregate bandwidth in this case) changed over those tests.
Again, your use cases might be completely different if you are testing something like a USB driver.
Using your custom semaphores, write the following two C programs and compile them:
reader.c
writer.c
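For instance, minimal sketches could look like the following, assuming your kernel module exposes the contested resource through a hypothetical device node /dev/myipc whose read and write handlers take your custom semaphore (adapt the path, buffer sizes, and loop counts to your driver):

/* reader.c - hammer the device with reads; each read takes the
   reader side of your custom semaphore inside the driver */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    int fd = open("/dev/myipc", O_RDONLY);   /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }
    for (int i = 0; i < 100000; i++)
        if (read(fd, buf, sizeof buf) < 0)
            perror("read");
    close(fd);
    return 0;
}

/* writer.c - hammer the device with writes; each write takes the
   writer side of your custom semaphore inside the driver */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "payload";
    int fd = open("/dev/myipc", O_WRONLY);   /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }
    for (int i = 0; i < 100000; i++)
        if (write(fd, msg, sizeof msg) < 0)
            perror("write");
    close(fd);
    return 0;
}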
As a simple, rudimentary test, write a shell script test.sh and add commands to launch the test binaries as follows.
#!/bin/sh
./reader &
./reader &
./reader &
./reader &
./writer &
Launching the above shell script as ./test.sh will launch 4 readers and 1 writer. Customise this to your test scenario.
Ensure that your programs are operating properly, i.e. verify that data is being exchanged correctly, before trying to profile the performance.
Once you are sure that the IPC is working as expected, profile the CPU usage. Prior to launching test.sh, run the top command in another terminal and observe the CPU usage patterns for varying numbers of readers/writers during the run time of the test script.
Also, you can launch the individual binaries (or those in the test script) with:
time <binary>
to print the total lifetime and the time spent waiting on the kernel driver. You can also run:
perf record <binary>
and, after completion, run perf annotate main
to obtain the relative amount of time spent in the various sections of the code.
I'm learning how to use the TPL for parallelizing an application I have. The application processes ZIP files, extracting all of the files held within them and importing the contents into a database. There may be several thousand ZIP files waiting to be processed at any given time.
Am I right in kicking off a separate task for each of these ZIP files, or is this an inefficient way to use the TPL?
Thanks.
This seems like a problem better suited to worker threads (a separate thread for each file) managed with the ThreadPool rather than the TPL. The TPL is great when you can divide and conquer a single item of data, but your ZIP files are treated individually.
Disk I/O is going to be your bottleneck, so I think you will need to throttle the number of jobs running simultaneously. It's simple to manage this with worker threads, but I'm not sure how much control you have (if any) over the parallel for/foreach as far as how much parallelism goes on at once, which could choke your process and actually slow it down.
Anytime you have a long-running process, you can typically gain additional performance on multi-processor systems by creating a different thread for each input task. So I would say that you are most likely going down the right path.
I would have thought that this depends on whether the process is limited by CPU or by disk. If the process is limited by disk, it might be a bad idea to kick off too many threads, since the various extractions might just compete with each other.
This feels like something you might need to measure to get the correct answer for what's best.
I have to disagree with certain statements here, guys.
First of all, I do not see any difference between the ThreadPool and Tasks in coordination or control, especially since Tasks run on the ThreadPool and you have easy control over Tasks; exceptions are nicely propagated to the caller during await or when awaiting Task.WhenAll(tasks), etc.
Second, I/O won't necessarily be the only bottleneck here; depending on the data and the level of compression, the zipping is most likely going to take more time than reading a file from the disk.
The degree of parallelism can be chosen in many ways, but I would go for something like the number of CPU cores, or a little fewer.
Load the file paths into a ConcurrentQueue and then allow the running tasks to dequeue file paths, load the files, zip them, and save them.
From there you can tweak the number of cores used and play with load balancing.
I do not know whether ZIP supports file partitioning during compression, but in some advanced/complex cases it could be a good idea, especially for large files...
Wow, it is a 6-year-old question, bummer! I had not noticed... :)