MPI I/O, matching processes to files

I have a number of files, say 100, and a number of processors, say 1000. Each proc needs to read parts of some subset of files. For instance, proc 3 needs file04.dat, file05.dat, and file09.dat, while proc 4 needs file04.dat, file07.dat, and file08.dat, etc. Which files are needed by which procs is not known at compile time and cannot be determined from any algorithm, but is easily determined at runtime from an existing metadata file.
I am trying to determine the best way to do this, using MPI I/O. It occurs to me that I could just have all the procs cycle through the files they need, calling MPI_File_open with MPI_COMM_SELF as the communicator argument. However, I'm a beginner with MPI I/O, and I suspect this would create some problems with large numbers of procs or files. Is this the case?
I have also thought that perhaps the thing to do would be to establish a separate communicator for each file, and each processor that needs a particular file would be a member of the file's associated communicator. But here, first, would that be a good idea? And second, I'm not an expert on communicators either and I can't figure out how to set up the communicators in this manner. Any ideas?
And if anyone has a completely different idea that would work better, I would be glad to hear it.
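For the per-file-communicator idea, the bookkeeping can be sketched without any MPI calls: invert the rank-to-files mapping from your metadata into file-to-ranks groups. In MPI proper, every rank would then loop over the files in the same fixed order and call MPI_Comm_split on the parent communicator with color 1 if it needs the file and MPI_UNDEFINED otherwise (ranks passing MPI_UNDEFINED receive no communicator); the split is collective, so all ranks must participate in every iteration. Below is a pure-Python sketch of just the inversion step; the `needs` mapping and the function name are made up for illustration:

```python
# Hypothetical rank -> files mapping, as parsed at runtime from the
# metadata file described in the question.
needs = {
    3: ["file04.dat", "file05.dat", "file09.dat"],
    4: ["file04.dat", "file07.dat", "file08.dat"],
}

def groups_by_file(needs):
    """Invert rank -> files into file -> sorted list of ranks.
    Each group is the membership of one per-file communicator."""
    groups = {}
    for rank, files in needs.items():
        for f in files:
            groups.setdefault(f, []).append(rank)
    return {f: sorted(ranks) for f, ranks in groups.items()}

groups = groups_by_file(needs)
# e.g. groups["file04.dat"] contains ranks 3 and 4
```

Each group would then open its file collectively with MPI_File_open on the split communicator, instead of every rank opening it on MPI_COMM_SELF.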

Related

What is the meaning of using COMM_WORLD or COMM_SELF to instantiate a TS, DMDA, Vec, etc

I'm looking at several examples from PETSc and petsc4py and looking at the PDF user manual of PETSc. The manual states:
For those not familiar with MPI, a communicator is a way of indicating a collection of processes that will be involved together in a calculation or communication. Communicators have the variable type MPI_Comm. In most cases users can employ the communicator PETSC_COMM_WORLD to indicate all processes in a given run and PETSC_COMM_SELF to indicate a single process.
I believe I understand that statement, but I'm unsure of what the real consequences of actually using these communicators are. I'm unsure of what really happens when you do TSCreate(PETSC_COMM_WORLD,...) vs TSCreate(PETSC_COMM_SELF,...), or likewise for a distributed array. If you create a DMDA with PETSC_COMM_SELF, does this mean the DM object won't really be distributed across multiple processes? Or if you create a TS with PETSC_COMM_SELF and a DM with PETSC_COMM_WORLD, does this mean the solver can't actually access ghost nodes? Does it affect the results of DMCreateLocalVector and DMCreateGlobalVector?
The communicator for a solver decides which processes participate in the solver operations. For example, a TS with PETSC_COMM_SELF would run independently on each process, whereas one with PETSC_COMM_WORLD would evolve a single system across all processes. If you are using a DM with the solver, the communicators must be congruent.

TensorFlow: More than one thread in shuffle_batch for single sample files

I'm trying to understand the significance of using num_threads > 1 in tf.train.shuffle_batch connected to tf.WholeFileReader reading image files (each file contains a single data sample). Will setting num_threads > 1 make any difference in such a case compared to num_threads = 1? What are the mechanics of the file and batch queues in that case?
A short answer: probably it will make the execution faster. Here is some authoritative explanation from the guide:
[Have a] single reader via the tf.train.shuffle_batch with num_threads bigger than 1. This will make it read from a single file at the same time (but faster than with 1 thread), instead of N files at once. This can be important:
If you have more reading threads than input files, to avoid the risk that you will have two threads reading the same example from the same file near each other.
Or if reading N files in parallel causes too many disk seeks.
How many threads do you need? The tf.train.shuffle_batch* functions add a summary to the graph that indicates how full the example queue is. If you have enough reading threads, that summary will stay above zero.
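As a rough, non-TensorFlow illustration of those mechanics: a filename queue is drained by several reader threads, each of which produces exactly one sample per file (as tf.WholeFileReader does) and pushes it into an example queue that is then shuffled. The function name and file names below are made up:

```python
import queue
import random
import threading

def run_readers(files, num_threads=4):
    """Drain a filename queue with several reader threads; each 'file'
    yields exactly one sample, as with tf.WholeFileReader."""
    filename_q = queue.Queue()
    for f in files:
        filename_q.put(f)
    example_q = queue.Queue()

    def reader():
        while True:
            try:
                name = filename_q.get_nowait()
            except queue.Empty:
                return  # no more files: this reader thread exits
            example_q.put(("sample-from", name))  # one sample per file

    threads = [threading.Thread(target=reader) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    batch = []
    while not example_q.empty():
        batch.append(example_q.get())
    random.shuffle(batch)  # stand-in for the shuffle buffer
    return batch

batch = run_readers(["img%02d.png" % i for i in range(10)])
```

With num_threads = 1 the single reader is also doing the decoding work serially; extra threads overlap that work, which is why the guide says it is "faster than with 1 thread" even though only one file is read at a time per thread.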

File read access from threads

I have a static class that contains a number of functions that read values from configuration files. The configuration files are provided with the software and the software itself NEVER writes to them.
I have a number of threads that are running in my application and I need to call a function in the static class. The function will then go to one of the configuration files, look up a value (depending on the parameter that I pass when I call the function) and then return a result.
I need the threads to be able to read the file all at the same time (or rather, without synchronising to the main thread). The threads will NEVER write to the configuration files.
My question is simply, therefore, will there be any issues in allowing multiple threads to call the same static functions to read values from the same file at the same time? I can appreciate that there would be serialization issues if some threads were writing to the file while others were reading, but this will never happen.
Basically:
1. Are there any issues allowing multiple threads to read from the same file at the same time?
2. Are there any issues allowing multiple threads to call the same static functions (in the same static class) at the same time?
Yes, this CAN be an issue, depending on how the class is actually locating and reading from the files, and more so if the class is also caching the values so it does not need to read from the files every time. Without seeing your class's actual code, there is no way to tell you whether your code is thread-safe or not.
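One pattern that sidesteps most of those issues is to have each thread open its own read-only handle, so no file position or buffer is ever shared between threads. A minimal sketch, with the function name and file contents made up for illustration:

```python
import os
import tempfile
import threading

def concurrent_reads(path, n_threads=8):
    """Read the same file from several threads, each with a private
    handle (and therefore a private file position)."""
    results = [None] * n_threads

    def worker(i):
        with open(path) as fh:  # per-thread handle: no shared state
            results[i] = fh.read()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Demo against a throwaway "configuration file".
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as tmp:
    tmp.write("timeout=30\n")
out = concurrent_reads(tmp.name)
os.unlink(tmp.name)
```

If the static class instead shares a single handle between threads, or lazily populates a cache, that shared state is exactly the kind of thing the answer warns about and needs its own synchronisation.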

Is locking necessary during read operations by multiple threads?

Say my application has n threads trying to read the same collection object, say a List. Will there be any race condition, deadlock, or similar problems? In other words, is it necessary to lock the List for read-only operations?
It totally depends on you whether you want to restrict the number of readers or not. For example, when an Excel file in Windows is shared across a network, a maximum of 10 people can open it for reading at a time. That limit could be raised to any number, or there need not be any restriction at all; it is your choice as a programmer. The one thing to keep in mind is that if the file is on a server and a million read requests arrive every second with no restriction imposed, your system will likely slow down until it cannot serve anyone. If instead you impose a limit, say that only 100 users can read at a time, you can be sure your system will not be overloaded. That is the worst-case, real-world scenario.
But if you are asking only for learning's sake, I would say it is not required. If n users open the same collection object for reading, you can give access to all n of them; no synchronisation mechanism is needed, and where there is no synchronisation there can be no deadlock.
Hope this removes your confusion. Thanks.
Not necessary, unless the read operation causes an internal state change of the collection object.
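A minimal demonstration of lock-free read-only access: several threads sum the same shared list concurrently, and only the result list (the one object that is actually written to) is protected. The names are made up for illustration:

```python
import threading

data = list(range(1000))       # shared, read-only collection
totals = []                    # the only mutated object, so it gets a lock
totals_lock = threading.Lock()

def reader():
    s = sum(data)              # pure read: no lock needed on `data`
    with totals_lock:
        totals.append(s)

threads = [threading.Thread(target=reader) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread computes the same sum because nothing mutates `data`; the moment any thread could modify the collection mid-iteration, this stops being safe.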

Is reading data in one thread while it is written to in another dangerous for the OS?

There is nothing in the way the program uses this data which will cause the program to crash if it reads the old value rather than the new value. It will get the new value at some point.
However, I am wondering if reading and writing at the same time from multiple threads can cause problems for the OS?
I have yet to see any problems if it does. The program is developed on Linux using pthreads.
I am not interested in being told how to use mutexes/semaphores/locks/etc. Edit: my program is only getting the new values; that is not what I'm asking.
No, the OS should not have any problem. The typical problem is that you don't want to read old values, or a value that is halfway updated and thus not valid (which may crash your app; or, if the next value depends on the former, you can get a corrupted value and keep generating wrong values all the time). But if you don't care about that, the OS won't either.
Are the kernel/drivers reading that data for any reason (e.g. it contains structures passed into kernel APIs)? If not, then there isn't any issue with it, since the OS will never look at your hot memory on its own.
Your own reads must ensure they are consistent, so you don't read half of a value pre-update and half post-update and end up with a value that is neither the pre- nor the post-update one.
There is no danger for the OS. Only your program's data integrity is at risk.
Imagine your data consists of a set (structure) of values which cannot be updated in a single atomic operation. The reading thread is bound to read inconsistent data at some point (data consisting of a mixture of old and new values). But you did not want to hear about mutexes...
Problems arise when multiple threads share access to data and accessing that data is not atomic. For example, imagine a struct with 10 interdependent fields. If one thread is writing and one is reading, the reading thread is likely to see a struct that is halfway between one state and another (for example, with half of its members set).
If on the other hand the data can be read and written to with a single atomic operation, you will be fine. For example, imagine if there is a global variable that contains a count... One thread is incrementing it on some condition, and another is reading it and taking some action... In this case, there is really no intermediate inconsistent state. It's either got the new value, or it has the old value.
Logically, you can think of locking as a tool that lets you make arbitrary blocks of code atomic, at least as far as the other threads of execution are concerned.
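A small sketch of that last point: a two-field record whose fields must change together, with a lock making the update and the read each an atomic block, so a reader can never observe a half-updated pair. The class and field names are made up, and note that in CPython the GIL already makes individual bytecode operations atomic; the lock is what makes the *pair* of operations atomic:

```python
import threading

class Pair:
    """Two fields that must change together; invariant: b == a + 1."""
    def __init__(self):
        self._lock = threading.Lock()
        self.a, self.b = 0, 1

    def update(self):
        with self._lock:      # both writes happen atomically w.r.t. readers
            self.a += 1
            self.b = self.a + 1

    def snapshot(self):
        with self._lock:      # both reads happen atomically w.r.t. writers
            return self.a, self.b

p = Pair()
stop = threading.Event()

def writer():
    while not stop.is_set():
        p.update()

w = threading.Thread(target=writer)
w.start()
# Every snapshot taken while the writer hammers away must satisfy
# the invariant, because the lock makes each update/read block atomic.
consistent = all(b == a + 1 for a, b in (p.snapshot() for _ in range(10000)))
stop.set()
w.join()
```

Remove the lock from either method and a reader could see `a` already incremented but `b` still stale, which is exactly the "halfway between one state and another" situation described above.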