Cluster nodes need to read different sections of an input file - how do I organize it?

I am trying to read an input file in a cluster environment. Different nodes will read different parts of it. However, the parts are not contiguous; they are interleaved in a "grid".
For example, a file with 16 elements (assume integers):
0 1 2 3
4 5 6 7
8 9 A B
C D E F
If I use four nodes, the first node will read the top left 2x2 square (0,1,4,5), the second node will read the top right 2x2 square and so on.
How should I handle this? I can use MPI or OpenMP. I have two ideas, but I don't know which would work better:
1. Each node opens the file and has its own handle to it. Each node reads the file independently, using only the part of the file it needs and skipping over the rest. In this case, what would be the difference between using fopen and MPI_File_open? Which one would be better?
2. Use one node to read the whole file and send each part of the input to the node that needs it.

Regarding your question:
I would not suggest the second option you mentioned, that is, using one node to read the file and then distributing the parts. The reason is that this is slow, especially if the file is large. You pay the overhead twice: first while the other processes sit waiting for the reader, and second while the data that was read is sent to them. So it is clearly a no-go for me.
Regarding your first option, there is no big difference between using fopen and MPI_File_open. But I would still suggest MPI_File_open, to take advantage of facilities such as non-blocking I/O operations and shared file pointers (which make life easier).

Related

How do I seek for holes and data in a sparse file in golang?

I want to copy files from one place to another and the problem is I deal with a lot of sparse files.
Is there any (easy) way of copying sparse files without them becoming huge at the destination?
My basic code:
out, err := os.Create(bricks[0] + "/" + fileName)
in, err := os.Open(event.Name)
io.Copy(out, in)
Some background theory
Note that io.Copy() pipes raw bytes, which is understandable once you consider that it pipes data from an io.Reader to an io.Writer, which provide Read([]byte) and Write([]byte), respectively.
As such, io.Copy() is able to deal with absolutely any source providing bytes and absolutely any sink consuming them.
On the other hand, the location of the holes in a file is "side-channel" information which "classic" syscalls such as read(2) hide from their users.
io.Copy() is not able to convey such side-channel information in any way.
In other words, file sparseness was originally conceived as a way to store data efficiently behind the user's back.
So, no, there's no way io.Copy() could deal with sparse files by itself.
What to do about it
You'd need to go one level deeper and implement all this using the syscall package and some manual tinkering.
To work with holes, you should use the SEEK_HOLE and SEEK_DATA special values for the lseek(2) syscall, which, while formally non-standard, are supported by all major platforms.
Unfortunately, support for those "whence" positions is present neither in the stock syscall package (as of Go 1.8.1) nor in the golang.org/x/sys tree.
But fear not, there are two easy steps:
First, the stock syscall.Seek() is actually mapped to lseek(2) on the relevant platforms.
Next, you'd need to figure out the correct values of SEEK_HOLE and SEEK_DATA for the platforms you need to support.
Note that they are free to differ between platforms!
For example, on my Linux system I can simply run
$ grep -E 'SEEK_(HOLE|DATA)' </usr/include/unistd.h
# define SEEK_DATA 3 /* Seek to next data. */
# define SEEK_HOLE 4 /* Seek to next hole. */
…to figure out the values for these symbols.
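Now, say, you create a Linux-specific file in your package containing something like
//go:build linux
// +build linux

package sparse // hypothetical package name

const (
	SEEK_DATA = 3 // seek to the next region containing data
	SEEK_HOLE = 4 // seek to the next hole
)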
and then use these values with syscall.Seek().
The file descriptor to pass to syscall.Seek() and friends can be obtained from an opened file using the Fd() method of os.File values.
The pattern to use when reading is to detect regions containing data, and read the data from them – see this for one example.
Note that this deals with reading sparse files; but if you want to actually transfer them as sparse files (that is, preserving their sparseness), the situation is more complicated: it appears to be even less portable, so some research and experimentation is due.
On Linux, it appears you could use fallocate(2) with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE to try to punch a hole at the end of the file you're writing to; if that legitimately fails (with syscall.EOPNOTSUPP), you just shovel as many zeroed blocks into the destination file as are covered by the hole you're reading, in the hope that the OS will do the right thing and convert them to a hole by itself.
Note that some filesystems do not support holes at all, as a concept.
Examples are the filesystems in the FAT family.
What I'm leading you to is that the inability to create a sparse file might actually be a property of the target filesystem in your case.
You might find Go issue #13548 "archive/tar: add support for writing tar containing sparse files" to be of interest.
One more note: you might also consider checking whether the destination directory resides on the same filesystem as the source file, and if it does, use syscall.Rename() (on POSIX systems) or os.Rename() to just move the file between directories without actually copying its data.
You don't need to resort to syscalls.
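package main

import (
	"io"
	"os"
)

func main() {
	f, err := os.Create("/tmp/sparse.dat")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	f.Write([]byte("start"))
	f.Seek(1024*1024*10, io.SeekStart) // jump 10 MiB ahead without writing; the gap becomes a hole
	f.Write([]byte("end"))
}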
Then you'll see:
$ ls -l /tmp/sparse.dat
-rw-rw-r-- 1 soren soren 10485763 Jun 25 14:29 /tmp/sparse.dat
$ du /tmp/sparse.dat
8 /tmp/sparse.dat
It's true you can't use io.Copy as is. Instead, you need to implement an alternative to io.Copy which reads a chunk from src and checks whether it's all '\0' bytes. If it is, just call dst.Seek(int64(len(chunk)), io.SeekCurrent) to skip past that part in dst (and remember to Truncate dst to the full size at the end, in case the source ends in zeros). That particular implementation is left as an exercise to the reader :)

Top command: How to stick to one unit (KB/KiB)

I'm using the top command in several distros to feed a Bash script. Currently I'm calling it with top -b -n1.
I'd prefer a unified output in KiB or KB. However, it will display large units in megabytes or gigabytes. Is there an option to avoid these large units?
Please consider the following example:
4911 root 20 0 274m 248m 146m S 0 12.4 0:07.19 example
Edit: To answer 123's question, I transform the columns and send them to a log monitoring appliance. If there's no alternative, I'll convert the units via awk beforehand as per this thread.
Consider cutting out the middleman top and reading directly from /proc/[1-9]*/statm. Each of those files consists of one line of numbers, of which the first three correspond to top's VIRT, RES and SHR, respectively, in units of pages (normally 4096 B), so multiplying by 4 gives KiB.
You need a config file. You can create it yourself as $HOME/.toprc or using top interactively. The latter is easy. You just need to press W while top is running in interactive mode.
But first you need to set top interactively to the state you want. To change the memory scale press e until you see what you want. (Then save with W.)
Either way, you need this set in your config: Task_mscale=0 for the lowest scale.

FIO Flexible IO tester for repetitive data access patterns

I am currently working on a project and I need to test my prototype with repetitive data access patterns. I came across fio which is a flexible I/O tester for Linux (1).
Fio has many options, and I want it to produce a workload which accesses the same blocks of a file the same number of times, over and over again. I also need the number of accesses to differ between these blocks.
For instance, if fio creates a file named "test.txt" and this file is divided into 10 blocks, I need the workload to read a specific number of these blocks, with a different number of IOs each, over and over again. Let's say that it chooses to access blocks 3, 7 and 9. Then I want to access these in a specific order and a specific number of times each, over and over again. If this workload can be described by N passes, then I want it to look something like this:
1st pass: read block 3 10 times, read block 7 5 times, read block 9 2 times.
2nd pass: read block 3 10 times, read block 7 5 times, read block 9 2 times.
...
Nth pass: read block 3 10 times, read block 7 5 times, read block 9 2 times.
Question 1: Can the above workload be produced with Fio? If yes, How?
Question 2: Is there a mailing list, forum, website, community for Fio users?
Thank you,
Nick
You can follow the mailing list on this website: http://www.spinics.net/lists/fio/index.html
The HOWTO at http://www.bluestop.org/fio/HOWTO.txt will also help you.
This is actually quite a tricky thing to do. The closest you'll get with parameters is using one of the non-uniform distributions (see random_distribution in the HOWTO) but you'll be saying re-read blocks A, B, C more than blocks X, Y, Z and you won't be able to control the exact counts.
An alternative is to write an iolog that can be replayed that has the exact sequence you're looking for (see Trace file format v2 in the HOWTO).

File output redirection in Linux

I have two programs A and B. I can't change the program A - I can only run it with some parameters, but I have written the B myself, and I can modify it the way I like.
Program A runs for a long time (20-40 hours) and during that time it produces output to the file, so that its size increases constantly and can be huge at the end of run (like 100-200 GB). The program B then reads the file and calculates some stuff. The special property of the file is that its content is not correlated: I can divide the file in half and run calculations on each part independently, so that I don't need to store all the data at once: I can calculate on the first part, then throw it away, calculate on the second one, etc.
The problem is that I don't have enough space to store such big files. I wonder if it is possible to somehow pipe the output of A to B without storing all the data at once and without creating huge files. Is it possible to do something like that?
Thank you in advance, this is crucial for me now, Roman.
If program A supports it, simply pipe.
A | B
Otherwise, use a fifo.
mkfifo /tmp/fifo
ls -la > /tmp/fifo &
cat /tmp/fifo
EDIT: Then have B read from the fifo:
cat /tmp/fifo | B
(Note that ulimit -p only reports the pipe buffer size, in 512-byte blocks; bash cannot change it on Linux.)
It is possible to pipeline output of one program into another.
Read here to learn the syntax and the know-how of Unix pipelines.
You can use socat, which can take stdout and send it over the network, or read from the network and feed it to stdin.
Named and unnamed pipes have the problem of a small buffer (on Linux, one 4 KiB page before 2.6.11 and 64 KiB by default since then), which means a lot of process context switches if you are writing multiple GB.
Or, if you are adventurous enough, you can LD_PRELOAD a shared object (.so) into process A and intercept its open/write calls to do whatever you need.

Doing file operations with 64-bit addresses in C + MinGW32

I'm trying to read in a 24 GB XML file in C, but it won't work. I'm printing out the current position using ftell() as I read it in, but once it gets to a big enough number, it goes back to a small number and starts over, never even getting 20% through the file. I assume this is a problem with the range of the variable that's used to store the position (long), which can go up to about 4,000,000,000 according to http://msdn.microsoft.com/en-us/library/s3f49ktz(VS.80).aspx, while my file is 25,000,000,000 bytes in size. A long long should work, but how would I change what my compiler (Cygwin/mingw32) uses or get it to have fopen64?
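The ftell() function returns a long, which is only 32 bits wide on 32-bit systems (and on Windows even in 64-bit builds), so it can only represent offsets up to 2^31 - 1 bytes (about 2 GB); even an unsigned 32-bit value tops out at 2^32 bytes (4 GB). So you can't get the file offset for a 24 GB file to fit into a 32-bit long.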
You may have the ftell64() function available, or the standard fgetpos() function may return a larger offset to you.
You might try using the OS-provided file functions CreateFile and ReadFile. According to the File Pointers topic, the position is stored as a 64-bit value.
Unless you can use a 64-bit method as suggested by Loadmaster, I think you will have to break the file up.
This resource seems to suggest it is possible using _telli64(). I can't test this though, as I don't use mingw.
I don't know of any way to do this with a single file. It's a bit of a hack, but if splitting the file up properly isn't a real option, you could write a few functions that temporarily split the file: one that uses ftell() to move through the file and switches to a new file when it reaches the split point, and another that stitches the files back together before exiting. An absolutely botched-up approach, but if no better solution comes to light, it could be a way to get the job done.
I found the answer. Instead of using fopen, fseek, fread, fwrite... I'm using _open, _lseeki64, read, write. And I am able to write and seek in > 4 GB files.
Edit: It seems the latter functions are about 6x slower than the former ones. I'll give the bounty to anyone who can explain that.
Edit: Oh, I learned from "What is the difference between read() and fread()?" that read() and friends are unbuffered.
Even if the ftell() in the Microsoft C library returns a 32-bit value and thus obviously will return bogus values once you reach 2 GB, just reading the file should still work fine. Or do you need to seek around in the file, too? For that you need _ftelli64() and _fseeki64().
Note that unlike some Unix systems, you don't need any special flag when opening the file to indicate that it is in some "64-bit mode". The underlying Win32 API handles large files just fine.
