Parallel output using MPI IO to a single file

I have a very simple task to do, but somehow I am still stuck.
I have one BIG data file ("File_initial.dat") that should be read by all nodes on the cluster (using MPI). Each node will perform some manipulation on its part of this BIG file (File_size / number_of_nodes), and finally each node will write its result to one shared BIG file ("File_final.dat"). Both files have the same number of elements.
From googling I understood that it is much better to write the data file as a binary file (it contains only decimal numbers) rather than as a *.txt file, since no human will read it, only computers.
I tried to implement this myself (using formatted input/output rather than a binary file), but I get incorrect behavior.
My code so far follows:
#include <mpi.h>
#include <fstream>
#include <iostream>
using namespace std;

#define NNN 30

int main(int argc, char **argv)
{
    double res[NNN];  // the whole input, read by every rank
    ifstream fin;
    // setting MPI environment
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    // reading the initial file
    fin.open("initial.txt");
    for (int i = 0; i < NNN; i++)
    {
        fin >> res[i];
        cout << res[i] << endl; // to see what is in the file
    }
    fin.close();
    // starting position in the "res" array as a function of "rank" of process
    int Pstart = (NNN / nprocs) * rank;
    // specifying Offset for writing to file
    MPI_Offset offset = sizeof(double)*rank;
    MPI_File file;
    MPI_Status status;
    // opening one shared file
    MPI_File_open(MPI_COMM_WORLD, "final.txt", MPI_MODE_CREATE|MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &file);
    // allocating each rank's local array
    double * localArray = new double [NNN/nprocs];
    // Performing some basic manipulation (squaring each element of array)
    for (int i = 0; i < (NNN / nprocs); i++)
    {
        localArray[i] = res[Pstart+i]*res[Pstart+i];
    }
    // Writing the result of each local array to the shared final file:
    MPI_File_seek(file, offset, MPI_SEEK_SET);
    MPI_File_write(file, localArray, sizeof(double), MPI_DOUBLE, &status);
    MPI_File_close(&file);
    delete[] localArray;
    MPI_Finalize();
    return 0;
}
I understand that I am doing something wrong when trying to write doubles as a text file.
How should I change the code in order to be able to save the result:
as a .txt file (formatted output)
as a .dat file (binary file)

Your binary file output is almost right, but your calculations of the offset within the file and of the amount of data to write are incorrect. You want your offset to be
MPI_Offset offset = sizeof(double)*Pstart;
not
MPI_Offset offset = sizeof(double)*rank;
otherwise each rank overwrites the others' data: (say) rank 3 out of nprocs=5 starts writing at double number 3 in the file, not at (30/5)*3 = 18.
Also, you want each rank to write NNN/nprocs doubles, not sizeof(double) doubles, meaning you want
MPI_File_write(file, localArray, NNN/nprocs, MPI_DOUBLE, &status);
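The fixed offset and count only work when NNN is divisible by nprocs. As a sketch of the same arithmetic that also handles a remainder, the helper below computes each rank's first element and element count (slice_for_rank and binary_offset are hypothetical names, not MPI functions). With them, each rank could write its block in a single call such as MPI_File_write_at(file, binary_offset(s), localArray, (int)s.count, MPI_DOUBLE, &status) instead of MPI_File_seek followed by MPI_File_write.

```cpp
#include <cassert>

// Hypothetical helper type: the part of an n-element array owned by one rank.
struct Slice {
    long long start;  // index of the rank's first element
    long long count;  // number of elements the rank owns
};

// Split n elements over nprocs ranks. The first n % nprocs ranks get one
// extra element each, so nothing is dropped when n is not divisible by
// nprocs (plain n / nprocs would silently lose the remainder).
Slice slice_for_rank(long long n, int rank, int nprocs) {
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    long long base = n / nprocs;
    long long rem = n % nprocs;
    Slice s;
    s.count = base + (rank < rem ? 1 : 0);
    // Every lower rank contributes `base` elements, plus one each if below rem.
    s.start = rank * base + (rank < rem ? rank : rem);
    return s;
}

// Byte offset in the shared binary file where this rank's first double
// belongs: the element index times sizeof(double), not rank * sizeof(double).
long long binary_offset(const Slice& s) {
    return s.start * static_cast<long long>(sizeof(double));
}
```

With NNN = 30 and nprocs = 5, rank 3 gets start 18 and count 6, matching the (30/5)*3 = 18 above.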
How to write a text file is a much bigger issue: you have to convert the data into strings internally and then output those strings, formatting them carefully so that you know exactly how many characters each line requires. That is described in this answer on this site.
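A minimal sketch of that fixed-width idea, assuming every value is printed with the same format: "%24.16e" never produces more than 24 characters for a double (sign, 17 significant digits, and at most a three-digit exponent), so with a trailing newline each record is exactly 25 bytes, and a rank's text starts at its first element index times 25. RECORD_WIDTH, format_records and text_offset are illustrative names. The resulting string could then be written with MPI_File_write_at, using MPI_CHAR as the datatype and the string's size as the count.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Each value becomes one record of exactly RECORD_WIDTH characters:
// 24 for "%24.16e" (17 significant digits, enough to read the double back
// exactly) plus '\n'.
constexpr int RECORD_WIDTH = 25;

std::string format_records(const std::vector<double>& values) {
    std::string out;
    char buf[RECORD_WIDTH + 1];  // +1 for snprintf's terminating '\0'
    for (double v : values) {
        int n = std::snprintf(buf, sizeof buf, "%24.16e\n", v);
        assert(n == RECORD_WIDTH);  // the fixed width is what makes offsets computable
        out.append(buf, RECORD_WIDTH);
    }
    return out;
}

// Byte offset in the shared text file of the record for element `start`.
long long text_offset(long long start) {
    return start * RECORD_WIDTH;
}
```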

Related

Printing Lines from Intel HEX Record File

I'm trying to send the contents of an Intel Hex file over a Serial connection to a microcontroller, which will process each line sent and program them into memory as needed. The processing code expects the lines to be sent as they appear in the Hex file, including the newline characters at the end of each line.
This code is being run in Visual Studio 2013 on a Windows 10 PC; for reference, the microcontroller is an ARM Cortex-M0+ model.
However, the following code doesn't seem to be processing the Intel Hex record file the way that I expected.
...
int count = 0;
char hexchar;
unsigned char Buffer[69]; // 69 is max ascii hex read length for microcontroller
ifstream hexfile("pdu.hex");
while (hexfile.get(hexchar))
{
Buffer[count] = hexchar;
count++;
if (hexchar == '\n')
{
for (int i = 0; i < count; i++)
{
printf("%c", Buffer[i]);
}
serial_tx_function(Buffer); // microcontroller requires unsigned char
count = 0;
}
}
...
Currently, the serial transmission call is commented out, and the for loop is there to verify that the file is being read properly. I expect to see each line of the hex file printed out to the terminal. Instead, I get nothing at all. Any ideas?
EDIT: After further investigation, I determined that the program isn't even entering the while loop because the file fails to open. I don't know why that would be the case, since the file exists and can be opened in other programs like Notepad. However, I'm not terribly experienced with file I/O, so I might be overlooking something.
*.hex files often contain non-ASCII data, which can cause problems when printed to a command-line terminal.
I would suggest opening the file in binary mode and printing the characters as hexadecimal numbers.
So open the file in binary mode with ifstream hexfile("pdu.hex", ifstream::binary); and, to print characters as hex, use the printf specifier %x (or %hhx for a char-sized argument).
The whole program would look something like this:
#include <cassert>
#include <cstdio>    // printf
#include <fstream>
#include <iostream>

int main()
{
    using namespace std;
    int count = 0;
    char hexchar;
    constexpr int MAX_LINE_LENGTH = 69;
    unsigned char Buffer[MAX_LINE_LENGTH]; // 69 is max ascii hex read length for microcontroller
    ifstream hexfile("pdu.hex", ios::binary);
    if (!hexfile)
    {
        // A relative path is resolved against the current working directory.
        cerr << "could not open pdu.hex\n";
        return 1;
    }
    while (hexfile.get(hexchar))
    {
        assert(count < MAX_LINE_LENGTH);
        Buffer[count] = hexchar;
        count++;
        if (hexchar == '\n')
        {
            // print every byte of the line, including the line ending, as hex
            for (int i = 0; i < count; i++)
            {
                printf("%hhx ", Buffer[i]);
            }
            printf("\n");
            //serial_tx_function(Buffer); // microcontroller requires unsigned char
            count = 0;
        }
    }
}
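Since the EDIT shows the real problem was that the file never opened, a check before reading is worth having. The sketch below (read_hex_lines is a hypothetical helper; the serial transmission is left out) reports an open failure instead of silently reading nothing, and keeps each line's ending intact for the microcontroller. A common cause of the failure is a relative path: "pdu.hex" is looked up in the process's current working directory, which under the Visual Studio debugger is by default the project directory (Project Properties > Debugging > Working Directory), not necessarily the folder the file is in. GetCurrentDirectory() on Windows, or std::filesystem::current_path() on a C++17 compiler, shows which directory that is.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Read every line of a HEX file, keeping the line ending so each line can be
// sent exactly as it appears in the file. Binary mode keeps a Windows '\r'
// in place; getline strips only the '\n', which is appended back (a final
// line without '\n' also gets one). Returns false if the file won't open.
bool read_hex_lines(const std::string& path, std::vector<std::string>& lines) {
    std::ifstream in(path, std::ios::binary);
    if (!in.is_open()) {
        // Relative paths are resolved against the current working directory.
        std::fprintf(stderr, "could not open '%s'\n", path.c_str());
        return false;
    }
    std::string line;
    while (std::getline(in, line)) {
        line.push_back('\n');
        lines.push_back(line);
    }
    return true;
}
```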

fuse: Setting offsets for the filler function in readdir

I am implementing a virtual filesystem using the fuse, and need some understanding regarding the offset parameter in readdir.
Until now we have been ignoring the offset and passing 0 to the filler function, in which case the kernel should take care of it.
Our filesystem database stores the directory name, file length, inode number and parent inode number of each entry.
How do I calculate the offset?
Is the offset of each component equal to the sizes of the entries before it, sorted in increasing order of inode number? And if there is a directory inside a directory, is its offset equal to the sum of the sizes of the files inside it?
Example: in case the dir listing is - a.txt b.txt c.txt
And inode number of a.txt=3, b.txt=5, c.txt=7
Offset of a.txt= directory offset
Offset of b.txt=dir offset + size of a.txt
Offset of c.txt=dir offset + size of b.txt
Is the above assumption correct?
P.S: Here are the callbacks of fuse
The selected answer is not correct
Despite the lack of upvotes on this answer, this is the correct answer. Cracking into the format of the void buffer should be discouraged, and that's the intent behind declaring such things void in C code - you shouldn't write code that assumes knowledge of the format of the data behind void pointers, use whatever API is provided properly instead.
The code below is very simple and straightforward, as it should be. No knowledge of the format of the Fuse buffer is required.
Fictitious API
This is a contrived example of what some device's API could look like. This is not part of Fuse.
// get_some_file_names() -
// returns a struct with buffers holding the names of files.
// PARAMETERS
// * path - A path of some sort that the fictitious device groks.
// * offset - Where in the list of file names to start.
// RETURNS
// * A name_list, it has some char buffers holding the file names
// and a couple other auxiliary vars.
//
name_list *get_some_file_names(const char *path, size_t offset);
Listing the files in parts
Here's a Fuse callback that can be registered with the Fuse system to list the filenames provided by get_some_file_names(). It's arbitrarily named readdir_callback() so its purpose is obvious.
int readdir_callback( const char *path,
                      void *buf,               // This is meant to be "opaque".
                      fuse_fill_dir_t filler,  // Already a function-pointer type; filler takes care of buf.
                      off_t off,               // Last value given to filler.
                      struct fuse_file_info *fi )
{
    // Call the fictitious API to get a list of file names.
    name_list *list = get_some_file_names(path, off);
    for (int i = 0; i < list->length; i++)
    {
        // Feed the file names to filler() one at a time; the last argument
        // is the offset of the *next* entry, used to resume on the next call.
        if (filler(buf, list->names[i], NULL, off + i + 1))
        {
            break; // filler() returned 1, requesting a break.
        }
        incr_num_files_listed(list);
    }
    if (all_files_listed(list))
    {
        return 1; // Tell Fuse we're done.
    }
    return 0;
}
The off (offset) value is not used by the filler function to fill its opaque buffer, buf. The off value is, however, meaningful to the callback as an offset base as it provides file names to filler(). Whatever value was last passed to filler() is what gets passed back to readdir_callback() on its next invocation. filler() itself only cares whether the off value is 0 or not-0.
Indicating "I'm done listing!" to Fuse
To signal to the Fuse system that your readdir_callback() is done listing file names in parts (when the last of the list of names has been given to filler()), simply return 1 from it.
How off Is Used
The off (offset) value passed to filler() should be non-0 to perform partial listings. That's its only requirement as far as filler() is concerned. If it is 0, that indicates to Fuse that you're going to do a full listing in one shot (see below).
Although filler() doesn't care what the off value is beyond it being non-0, the value can still be meaningfully used. The code above uses the index of the next item in its own file list as its value. Fuse will keep passing the last off value it received back to the readdir callback on each invocation until the listing is complete (when readdir_callback() returns 1).
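To make the resume-by-offset bookkeeping concrete without a FUSE mount, the sketch below simulates it in plain C++. This is a model, not the FUSE API: MockFiller, readdir_page and list_all are hypothetical. The mock filler accepts a limited number of names per call (standing in for a full buf) and returns nonzero when it refuses one; readdir_page treats off as "index of the next entry" and passes index + 1, as the callback above does, so each call resumes exactly where the previous one stopped.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Mock filler: returns nonzero ("buffer full") when it refuses an entry.
// next_off is the offset of the entry that follows `name`.
using MockFiller = std::function<int(const std::string& name, long long next_off)>;

// Stand-in for readdir_callback(): `off` is the index of the first entry
// not yet delivered.
void readdir_page(const std::vector<std::string>& entries, long long off,
                  const MockFiller& filler) {
    for (long long i = off; i < static_cast<long long>(entries.size()); ++i) {
        if (filler(entries[i], i + 1)) {
            break;  // buffer full: this entry is offered again on the next call
        }
    }
}

// Drive repeated calls: pass back the last offset accepted, and stop when
// a call delivers nothing (end of directory).
std::vector<std::string> list_all(const std::vector<std::string>& entries,
                                  std::size_t per_call) {
    assert(per_call > 0);
    std::vector<std::string> out;
    long long off = 0;
    for (;;) {
        std::size_t accepted = 0;
        readdir_page(entries, off, [&](const std::string& name, long long next) {
            if (accepted == per_call) return 1;  // simulated full buffer
            out.push_back(name);
            off = next;  // remember where to resume
            ++accepted;
            return 0;
        });
        if (accepted == 0) break;
    }
    return out;
}
```

However small the simulated buffer, every name is delivered exactly once and in order.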
Listing the files all at once
int readdir_callback( const char *path,
                      void *buf,
                      fuse_fill_dir_t filler,
                      off_t off,
                      struct fuse_file_info *fi )
{
    name_list *list = get_all_file_names(path);
    for (int i = 0; i < list->length; i++)
    {
        // Offset 0: the whole listing is collected in this one call.
        filler(buf, list->names[i], NULL, 0);
    }
    return 0;
}
Listing all the files in one shot, as above, is simpler - but not by much. Note that off is 0 for the full listing. One may wonder, 'why even bother with the first approach of reading the folder contents in parts?'
The in-parts strategy is useful when a fixed number of buffers for file names is allocated and the number of files within a folder may exceed that number. For instance, the implementation of name_list above may only have 8 allocated buffers (char names[8][256]). Also, buf may fill up and filler() may start returning 1 if too many names are given at once. The first approach avoids this.
The offset passed to the filler function is the offset of the next item in the directory. You can have the entries in the directory in any order you want. If you don't want to return an entire directory at once, you need to use the offset to determine what gets asked for and stored. The order of items in the directory is up to you; it doesn't have to follow the order of names, inode numbers, or anything else.
Specifically, in the readdir call you are passed an offset, and you should start calling the filler function with the entries at that offset or later. In the simplest case, the length of each entry is 24 bytes + strlen(name of entry), rounded up to the nearest multiple of 8 bytes. However, see the fuse source code at http://sourceforge.net/projects/fuse/ for when this might not be the case.
I have a simple example, where I have a loop (pseudo c-code) in my readdir function:
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <limits.h>   /* NAME_MAX */
#include <string.h>
#include <sys/stat.h>

int my_readdir(const char *path, void *buf, fuse_fill_dir_t filler, off_t offset, struct fuse_file_info *fi)
{
    /* (a bunch of prep work has been omitted) */
    struct stat st;
    off_t off, nextoff = 0;
    int lenentry, i;
    char namebuf[NAME_MAX + 1];  /* long enough for any one name */
    for (i = 0; i < NumDirectoryEntries; i++)
    {
        /* (fill st with the stat information, including inode, etc.) */
        /* (fill namebuf with the name of the directory entry) */
        /* 24-byte entry header + name, rounded up to a multiple of 8 */
        lenentry = (24 + strlen(namebuf) + 7) & ~7;
        off = nextoff; /* offset of this entry */
        nextoff += lenentry;
        /* Skip this entry if we weren't asked for it */
        if (off < offset)
            continue;
        /* Add this to our response until we are asked to stop */
        if (filler(buf, namebuf, &st, nextoff))
            break;
    }
    /* All done because we were asked to stop or because we finished */
    return 0;
}
I tested this within my own code (I had never used the offset before), and it works fine.

Enter the file name from the keyboard instead of as command-line arguments?

The program below implements an information dispersal algorithm called Rabin-IDA: it divides the data into N pieces and then recombines it from M of those pieces (M < N).
It therefore needs command-line arguments, which I currently enter under Project Properties > Debugging.
The argument is a file name: the program splits that file into N files, then recombines it from M of the pieces and writes the result to another file, whose name is also passed as an argument.
My question is: how can I make this program read the file names from the keyboard (i.e. have the user type them in) instead of taking them as command-line arguments?
The code below is just the main function of the program; the whole program is at this link (http://www.juancamilocorena.com/home/projects), under Information Dispersal Algorithms Rabin-IDA.
#include "include.h"
void __cdecl _tmain(int argc, TCHAR *argv[])
{
DWORD ini=GetTickCount();
try
{
if( argc == 3 ) //recombine
{
RabinIDA rabin=RabinIDA(17,10);
long long size=GetFileSize(argv[1]);
int f[]={0,2,3,5,6,8,9,11,14,15};
rabin.recombine(argv[1],
f,
argv[2],
size);
}
else if(argc == 2)
{
RabinIDA rabin=RabinIDA(17,10);
rabin.split(argv[1]);
}
else
{
printf("Error. To split a file pass a parameter with the file to be splitted\n");
printf("To recombine the file give the name of the original file and the output file\n");
printf("The name of the file is used to get the size of the original file only, in a production\n");
printf("environment the length of the original file and the id of the share must be stored along with the share");
return;
}
printf("%d\n",GetTickCount()-ini);
}
catch (int)
{
PrintLastError(_T("MAIN CATCH"));
}
}
If you want to get the file name from the console, you can do this:
#include <iostream>
#include <string>
std::cout << "Enter file name: ";
std::string filename;
std::getline(std::cin, filename);
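A slightly fuller sketch, assuming the project is built with the multi-byte character set so that TCHAR is char (in a Unicode build TCHAR is wchar_t, and std::wstring, std::wcin and std::getline would be used instead). get_file_names() is a hypothetical helper that keeps command-line arguments working and falls back to prompting; the caller says how many names it needs (1 to split, 2 to recombine), e.g. after asking the user which operation to perform. Passing names[0].c_str() where the original passes argv[1] assumes rabin.split() and rabin.recombine() accept a const char*, which depends on the project's own declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>  // std::istringstream can stand in for std::cin when testing
#include <string>
#include <vector>

// Collect `wanted` file names: take them from the command line when present,
// otherwise prompt on `out` and read one whole line per name from `in`
// (std::getline keeps spaces inside a name, unlike `in >> name`).
std::vector<std::string> get_file_names(int argc, char* argv[], std::size_t wanted,
                                        std::istream& in, std::ostream& out) {
    std::vector<std::string> names;
    for (int i = 1; i < argc && names.size() < wanted; ++i) {
        names.push_back(argv[i]);
    }
    while (names.size() < wanted) {
        out << "Enter file name " << names.size() + 1 << ": ";
        std::string name;
        if (!std::getline(in, name)) {
            break;  // input ended early; the caller sees fewer names than wanted
        }
        names.push_back(name);
    }
    return names;
}
```

In main this would be called as get_file_names(argc, argv, 2, std::cin, std::cout) for recombining.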

.txt to binary c++, issue with int

I have a text file that I am converting to binary. Each line is a 7-digit number followed by a name, repeated for however many names are listed.
1234567 First Last
7654321 First Last
Because the ID is a 7-digit int, I am having trouble writing it to the binary file with this method, using the int in the struct. It gives me an awfully large .DAT (binary) file whenever I write to it, even with, say, just 3 names. Is there a better way of writing it so that my binary .dat files are about 200 KB and don't end up in the 20 MB+ range?
const int MAX = 50;
struct StudentRegistration{
    int studentID;
    char name[MAX];
};
StudentRegistration s;
char space;
fstream afile;
ifstream infile;
afile.open(fileName2, ios::out | ios::binary);
infile.open(fileName1);
while (infile >> s.studentID)
{
    infile.get(space);
    infile.getline(s.name, MAX);
    afile.seekp((s.studentID-1)*sizeof(StudentRegistration), ios::beg);
    afile.write(reinterpret_cast<const char *>(&s), sizeof(s));
}
afile.close();
infile.close();
I removed the seekp line, and it seems to do what I want for now.
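Why removing seekp helped: seekp((s.studentID-1)*sizeof(StudentRegistration), ios::beg) places each record at an offset proportional to its 7-digit ID, so even three records produce a file about as large as the biggest ID times sizeof(StudentRegistration), with the gap before them reading back as zero bytes. Writing the records back to back makes the file exactly count × sizeof(StudentRegistration) bytes. A minimal sketch, assuming the same StudentRegistration layout as the question (text_to_binary and read_record are illustrative names). A record is then found by its position in the file rather than by its ID, so lookup by ID would need a search or a separate index.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

const int MAX = 50;

struct StudentRegistration {
    int studentID;
    char name[MAX];
};

// Convert "1234567 First Last" lines into fixed-size records written back
// to back. Returns the number of records written.
int text_to_binary(std::istream& in, std::ostream& out) {
    int written = 0;
    int id;
    while (in >> id) {
        in.get();  // skip the single space after the ID
        std::string name;
        std::getline(in, name);
        StudentRegistration s;
        std::memset(&s, 0, sizeof s);  // zero the padding and unused name bytes
        s.studentID = id;
        std::strncpy(s.name, name.c_str(), MAX - 1);  // truncate, keep the '\0'
        out.write(reinterpret_cast<const char*>(&s), sizeof s);
        ++written;
    }
    return written;
}

// Record k (0-based) lives at byte k * sizeof(StudentRegistration).
bool read_record(std::istream& in, long long k, StudentRegistration& s) {
    in.clear();
    in.seekg(k * static_cast<long long>(sizeof(StudentRegistration)));
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&s), sizeof s));
}
```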

How to get size of file in visual c++?

Below is my code. My problem is that my destination file always has many more strings than the original file. I realized that inside the for loop, instead of i < sizeof more, I should use i < the size of file2. Now my problem is: how do I get the size of file2?
int i = 0;
FILE *file2 = fopen(LOG_FILE_NAME,"r");
wfstream file3 (myfile, ios_base::out);
// char more[1024];
char more[SIZE-OF-file2];
for(i = 0; i < SIZE-OF-file2 ; i++)
{
fgets(more, SIZE-OF-file2, file2);
file3 << more;
}
fclose(file2);
file3.close();
The most basic way is to fseek to the end of the file and use ftell to get the offset. The other (stat) functions also do this, but they're not cross-platform. Of course, if you want your code to rot in hell, you could also use GetFileSize().
fseek(file, 0, SEEK_END);
long offset = ftell(file);  // ftell returns long, not off_t
fseek(file, 0, SEEK_SET);
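A sketch of the fseek/ftell approach as a reusable function, plus the loop the question actually needs. The extra strings come from calling fgets a fixed number of times: once it reaches end of file it returns NULL and leaves more unchanged, so the last line gets written again on every remaining iteration. Looping until fgets returns NULL copies each line once without knowing the size at all. file_size and copy_lines are illustrative names, and the copy writes to a FILE* rather than the question's wfstream to stay within one I/O library.

```cpp
#include <cassert>
#include <cstdio>

// Size in bytes of an open stream (open it in binary mode, "rb", for a
// byte-exact count), or -1 on error. The current position is restored.
long file_size(std::FILE* f) {
    long here = std::ftell(f);
    if (here < 0 || std::fseek(f, 0, SEEK_END) != 0) return -1;
    long size = std::ftell(f);
    std::fseek(f, here, SEEK_SET);
    return size;
}

// Copy text until fgets reports end of file (NULL) instead of looping a
// fixed number of times. Returns how many fgets calls produced data.
long copy_lines(std::FILE* src, std::FILE* dst) {
    char more[1024];
    long chunks = 0;
    while (std::fgets(more, sizeof more, src) != nullptr) {
        std::fputs(more, dst);
        ++chunks;
    }
    return chunks;
}
```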
Every time you refer to C as Visual C, or C++ as Visual C++ I die a little.
You can do this using GetFileSize(). By reading the size of the file from the filesystem, you will avoid a lot of unnecessary computation. This can also be done with _stat(), or on unix it would just be stat().
Here is the definition:
DWORD WINAPI GetFileSize(
__in HANDLE hFile,
__out_opt LPDWORD lpFileSizeHigh
);
Doc for GetFileSize:
http://msdn.microsoft.com/en-us/library/aa364955%28VS.85%29.aspx
Alternatively you might want to use _stat()
Doc for stat:
http://msdn.microsoft.com/en-us/library/14h5k7ff%28VS.80%29.aspx
