Binary file I/O

How do I read and write binary files in D? In C it would be:
FILE *fp = fopen("/home/peu/Desktop/bla.bin", "wb");
char x[4] = "RIFF";
fwrite(x, sizeof(char), 4, fp);
I found rawWrite in the D docs, but I don't know how to use it, nor whether it does what I think. The documentation for rawRead just points at C's fread:
T[] rawRead(T)(T[] buffer);
If the file is not opened, throws an exception. Otherwise, calls fread for the file handle and throws on error.
rawRead always reads in binary mode on Windows.

rawRead and rawWrite should behave exactly like fread and fwrite; they are just templates that take care of argument sizes and lengths.
e.g.
auto stream = File("filename","r+");
auto outstring = "abcd";
stream.rawWrite(outstring);
stream.rewind();
auto inbytes = new char[4];
stream.rawRead(inbytes);
assert(inbytes[3] == outstring[3]);
rawRead is implemented in terms of fread as
T[] rawRead(T)(T[] buffer)
{
    enforce(buffer.length, "rawRead must take a non-empty buffer");
    immutable result =
        .fread(buffer.ptr, T.sizeof, buffer.length, p.handle);
    errnoEnforce(!error);
    return result ? buffer[0 .. result] : null;
}

If you just want to read in a big buffer of values (say, ints), you can simply do:
int[] ints = cast(int[]) std.file.read("ints.bin", numInts * int.sizeof);
and
std.file.write("ints.bin", ints);
Of course, if you have more structured data, then Scott Wales' answer is more appropriate.
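For comparison, the same bulk read in plain C is a single fread into one allocation. A minimal sketch; the helper name read_ints and its error handling are illustrative, not from the question:

#include <stdio.h>
#include <stdlib.h>

/* Read numInts ints from a binary file into a freshly allocated
   array; returns NULL on failure. The caller frees the result. */
int *read_ints(const char *path, size_t numInts)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    int *ints = malloc(numInts * sizeof *ints);
    if (ints != NULL && fread(ints, sizeof *ints, numInts, fp) != numInts) {
        free(ints);
        ints = NULL;
    }
    fclose(fp);
    return ints;
}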

Related

How to add pointer char data (created using malloc) to a char array in C?

In my MPI code in C, I receive a word from each of my slave processes. I want to collect all these words into a char array on the master side (part of the code is below). I can print the words, but not collect them into a single char array.
(I take the maximum word length to be 10, and the number of slaves to be slavenumber.)
char* word = (char*)malloc(sizeof(char)*10);
char words[slavenumber*10];
for (int p = 0; p < slavenumber; p++) {
    MPI_Recv(word, 10, MPI_CHAR, p, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Word: %s\n", word); // it works fine
    words[p*10] = *word; // This does not work; I think there is a problem here.
}
printf(words); // This does not work correctly, it gives something like: ��>;&�>W�
Can anybody help me on this?
Let's break it down line by line:

// allocate a buffer large enough to hold 10 elements of type `char`
char* word = (char*)malloc(sizeof(char)*10);

// define a variable-length array large enough to
// hold 10*slavenumber elements of `char`
char words[slavenumber*10];

for (int p = 0; p < slavenumber; p++) {
    // dereference `word`, which is exactly the same as writing
    // `word[0]`, and assign it to `words[p*10]`
    words[p*10] = *word;
    // words[p*10+1] to words[p*10+9] are unchanged,
    // i.e. uninitialized
}

// printing from an array. For this to work properly, all
// accessed elements must be initialized and the buffer
// terminated by a null byte. You have neither.
printf(words);
Because you left elements uninitialized and didn't null-terminate, you're invoking undefined behavior. Be happy that demons didn't crawl out of your nose.
In seriousness though: in C you cannot copy strings by mere assignment. Your use case calls for strncpy:
for (int p = 0; p < slavenumber; p++) {
    strncpy(&words[p*10], word, 10);
}
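Note that even with the copies fixed, printf(words) will stop at the first null byte, so print the collected words slot by slot instead. A minimal sketch, assuming each slave sends a null-terminated word of at most 9 characters plus the terminator:

for (int p = 0; p < slavenumber; p++) {
    // each 10-byte slot holds one null-terminated word
    printf("Word %d: %s\n", p, &words[p*10]);
}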

How to find the condition that the start char of a UTF-8 file has been read, using FileStream and StreamReader?

In C# .NET 4.0 (really 4.5.2), my code reads a UTF-8 file.
FileStream fstream = new FileStream(path, FileMode.Open);
BufferedStream stream = new BufferedStream(fstream);
using (StreamReader reader = new StreamReader(stream, new UTF8Encoding())) {
    int i;
    while ((i = reader.Read()) > -1) {
        //a guess at a condition that is true I.F.F. reader has read character 1 of the file
        if (stream.Position == (0 + sizeof(char)) || stream.Position == (0 + sizeof(int))) {
            //while loop has reader read through all characters,
            //but within this block, the reader has surely read character 1?
            char c = (char)i;
        }
    }
    reader.Close();
    return 0;
}
If and only if the StreamReader has just read the start character of the UTF-8 file, I want to run some function on that first character.
With a FileStream and StreamReader used in reading a UTF-8 file, how do you know whether the aforementioned condition is met?
I am looking for an answer that uses a property or method that already exists in the .NET 4.0 System.IO namespace. I thought the Stream.Position (BufferedStream.Position) property was the obvious way to find out where in the file (i.e. at what character) the reader is, but when I tried a UTF-8 file that starts with a character in '0' to '9' (48 to 57), the loop's reader.Read() read that char, and stream.Position was 43. I don't know why 43, of all integral values, is the value of stream.Position after the first character is read, or what the 43 means.
Update: as the loop iterates and the reader reads more characters, stream.Position stays at 43. I don't think the Position property is useful, then.
bool first = true;
while ((i = reader.Read()) > -1)
{
    if (first)
    {
        first = false;
        // Do first character things
    }
}
Note that the concept of "first character" is complex: what happens if the first glyph is è, which occupies two bytes in the file? The stream position will be at least 2 :-)
In general, you can check the Position of the StreamReader.BaseStream, but that Position is nearly useless: there could be multiple levels of caching, and even reading a single char may consume 1-4 bytes from the stream (an ASCII character is one byte, while some Unicode characters are 4 bytes long)... And then, UTF-8 files can have a BOM (an initial 3-byte header), which StreamReader also normally skips.
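To make the byte counting concrete, here is the arithmetic in plain C terms. This is generic UTF-8 logic, not a .NET API, and the file name is illustrative:

#include <stdio.h>

/* Number of bytes in a UTF-8 sequence, derived from its lead byte. */
static int utf8_len(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* e.g. 'è' */
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return -1;                            /* continuation or invalid lead byte */
}

int main(void)
{
    FILE *fp = fopen("file.txt", "rb");
    if (fp == NULL) return 1;

    unsigned char b[3];
    size_t n = fread(b, 1, 3, fp);
    long first_char_offset = 0;

    /* Skip the optional 3-byte BOM (EF BB BF). */
    if (n == 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        first_char_offset = 3;

    fseek(fp, first_char_offset, SEEK_SET);
    int lead = fgetc(fp);
    if (lead != EOF)
        printf("first code point occupies %d byte(s)\n",
               utf8_len((unsigned char)lead));
    fclose(fp);
    return 0;
}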
Still, if you want, you can subclass the entire StreamReader class, overriding all the Read* methods and keeping an internal flag like SomethingHasBeenRead. It isn't difficult (everything is virtual in StreamReader)... it is just a little long to do.

Returning string from a remote server using rpcgen

I am going through an RPC tutorial and learning a few techniques in rpcgen. I know how to add and multiply different data types using rpcgen.
But I have not found any clue about how to declare a function in the .x file that returns a string. Actually, I am trying to build a procedure that returns a random string (the array of random strings is on the server).
Can anyone advise me how to proceed? Any tutorial regarding this returning-string/pointer issue would be helpful.
Thank you in advance.
OK, answering the original question (more than 2 years old): the first answer is correct, but a little tricky.
In your .x file you define your structure with the string inside, having previously defined the size of the string:
typedef string str_t<255>;

struct my_result {
    str_t data;
};
...
Then you invoke rpcgen on your .x file to generate the client and server stubs and the .xdr file:
$ rpcgen -N file.x
Now you can compile the client and server, in addition to any program where you intend to use the remote functions. To do so, I followed the "rpcgen Tutorial" on Oracle's web page:
https://docs.oracle.com/cd/E19683-01/816-1435/rpcgenpguide-21470/index.html
The tricky part is that, although you defined a string of size m (an array of m characters), what rpcgen and the .xdr file create is a pointer to allocated memory. Something like this:
.h file
typedef char *str_t;

struct my_result {
    int res;
    str_t data;
};
typedef struct my_result my_result;
.xdr file
bool_t xdr_str_t (XDR *xdrs, str_t *objp)
{
    register int32_t *buf;

    if (!xdr_string (xdrs, objp, 255))
        return FALSE;
    return TRUE;
}
So just take into account, when using this structure on the server side, that it is not a string of size m but a char pointer, for which you'll have to reserve memory before using it, or you'll hit the same error I did at execution:
Segmentation fault!
To use it on the server you can write:
static my_result response;
static char text[255];
memset(&response, '\0', sizeof(my_result));
memset(text, '\0', sizeof(text));
response.data = text;
And from there you are ready to use it wisely! :)
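Putting the pieces together, the server-side procedure might look like this. A sketch only: the function name abc_1_svc and its signature are what rpcgen -N conventionally generates for the .x file above, and hello.h is a hypothetical name for the generated header, so check both against your own generated stubs:

#include <string.h>
#include "hello.h"   /* hypothetical header generated by rpcgen from the .x file */

my_result *abc_1_svc(struct svc_req *rqstp)
{
    /* static: the result must outlive this call, because the
       stub serializes it through the returned pointer */
    static my_result response;
    static char text[255];

    memset(&response, 0, sizeof(response));
    memset(text, 0, sizeof(text));

    /* point the XDR string at storage we actually own */
    strncpy(text, "hello from the server", sizeof(text) - 1);
    response.data = text;
    return &response;
}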
According to the XDR protocol specification, you can define a string type where m is the maximum length of the string in bytes:
string object<m>;
The standard defines a string of n (numbered 0 through n-1) bytes to be the number n encoded as an unsigned integer (as described above), followed by the n bytes of the string. Each byte must be regarded by the implementation as being 8-bit transparent data. This allows use of arbitrary character set encodings. Byte m of the string always precedes byte m+1 of the string, and byte 0 of the string always follows the string's length. If n is not a multiple of four, then the n bytes are followed by enough (0 to 3) residual zero bytes, r, to make the total byte count a multiple of four.
You can then define a struct with the string type str_t as one of the variables:
typedef string str_t<255>;

struct my_result {
    str_t data;
};
Then in your .x file you can define an RPC in your program which returns a struct of type my_result. Since rpcgen will give you a pointer to this struct (which I have called res), you can print the message with printf("%s\n", res->data);.
program HELLO_PROG {
    version HELLO_VERSION {
        my_result abc() = 1;
    } = 1;
} = 1000;
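For completeness, the matching client side can be sketched as below. The stub name abc_1, the header name hello.h, and the no-argument signature are assumptions based on rpcgen -N conventions; verify them against your generated code:

#include <stdio.h>
#include <rpc/rpc.h>
#include "hello.h"   /* hypothetical header generated by rpcgen */

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s server_host\n", argv[0]);
        return 1;
    }

    CLIENT *clnt = clnt_create(argv[1], HELLO_PROG, HELLO_VERSION, "tcp");
    if (clnt == NULL) {
        clnt_pcreateerror(argv[1]);
        return 1;
    }

    /* with rpcgen -N, a no-argument procedure's client stub
       takes only the CLIENT handle */
    my_result *res = abc_1(clnt);
    if (res == NULL)
        clnt_perror(clnt, "call failed");
    else
        printf("%s\n", res->data);  /* print the returned string */

    clnt_destroy(clnt);
    return 0;
}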

Parallel output using MPI IO to a single file

I have a very simple task to do, but somehow I am still stuck.
I have one BIG data file ("File_initial.dat") which should be read by all nodes on the cluster (using MPI); each node will perform some manipulation on part of this BIG file (File_size / number_of_nodes), and finally each node will write its result to one shared BIG file ("File_final.dat"). The number of elements in the file remains the same.
By googling I understood that it is much better to write the data file as a binary file (I have only decimal numbers in this file) and not as a *.txt file, since no human will read this file, only computers.
I tried to implement this myself (but using formatted input/output, NOT a binary file), and I get incorrect behavior.
My code so far follows:
#include <mpi.h>
#include <iostream>
#include <fstream>
using namespace std;

#define NNN 30

int main(int argc, char **argv)
{
    ifstream fin;
    double res[NNN];

    // setting MPI environment
    int rank, nprocs;
    MPI_File file;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // reading the initial file
    fin.open("initial.txt");
    for (int i = 0; i < NNN; i++)
    {
        fin >> res[i];
        cout << res[i] << endl; // to see what I have in the file
    }
    fin.close();

    // starting position in the "res" array as a function of "rank" of process
    int Pstart = (NNN / nprocs) * rank;
    // specifying offset for writing to file
    MPI_Offset offset = sizeof(double)*rank;

    // opening one shared file
    MPI_File_open(MPI_COMM_WORLD, "final.txt", MPI_MODE_CREATE|MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &file);

    // setting a local array for each node
    double *localArray = new double[NNN/nprocs];

    // performing some basic manipulation (squaring each element of the array)
    for (int i = 0; i < (NNN / nprocs); i++)
    {
        localArray[i] = res[Pstart+i]*res[Pstart+i];
    }

    // writing the result of each local array to the shared final file:
    MPI_File_seek(file, offset, MPI_SEEK_SET);
    MPI_File_write(file, localArray, sizeof(double), MPI_DOUBLE, &status);
    MPI_File_close(&file);
    MPI_Finalize();
    return 0;
}
I understand that I am doing something wrong while trying to write doubles as a text file.
How should one change the code in order to be able to save
as a .txt file (formatted output)
as a .dat file (a binary file)?
Your binary file output is almost right, but your calculations for your offset within the file and the amount of data to write are incorrect. You want your offset to be
MPI_Offset offset = sizeof(double)*Pstart;
not
MPI_Offset offset = sizeof(double)*rank;
otherwise you'll have each rank overwriting the others' data, as (say) rank 3 out of nprocs=5 would start writing at double number 3 in the file, not at double number (30/5)*3 = 18.
Also, you want each rank to write NNN/nprocs doubles, not sizeof(double) doubles, meaning you want
MPI_File_write(file, localArray, NNN/nprocs, MPI_DOUBLE, &status);
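Putting both fixes together, the write portion of the code would look like this (same variable names as in the question):

// each rank writes its own NNN/nprocs doubles starting at element Pstart
MPI_Offset offset = (MPI_Offset)(sizeof(double) * Pstart);
MPI_File_seek(file, offset, MPI_SEEK_SET);
MPI_File_write(file, localArray, NNN/nprocs, MPI_DOUBLE, &status);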
How to write as a text file is a much bigger issue; you have to convert the data into strings internally and then output those strings, making sure you know how many characters each line requires through careful formatting. That is described in this answer on this site.

How to append binary data to a buffer in node.js

I have a buffer with some binary data:
var b = new Buffer ([0x00, 0x01, 0x02]);
and I want to append 0x03.
How can I append more binary data? I'm searching in the documentation, but the data to append must be a string; if not, an error occurs (TypeError: Argument must be a string):
var b = new Buffer (256);
b.write ("hola");
console.log (b.toString ("utf8", 0, 4)); //hola
b.write (", adios", 4);
console.log (b.toString ("utf8", 0, 11)); //hola, adios
Then the only solution I can see here is to create a new buffer for every piece of appended binary data and copy it into the main buffer at the correct offset:
var b = new Buffer (4); //4 for having a nice printed buffer, but the size will be 16KB
new Buffer ([0x00, 0x01, 0x02]).copy (b);
console.log (b); //<Buffer 00 01 02 00>
new Buffer ([0x03]).copy (b, 3);
console.log (b); //<Buffer 00 01 02 03>
But this seems a bit inefficient because I have to instantiate a new buffer for every append.
Do you know a better way for appending binary data?
EDIT
I've written a BufferedWriter that writes bytes to a file using internal buffers. Same as BufferedReader but for writing.
A quick example:
//The BufferedWriter truncates the file because append == false
new BufferedWriter ("file")
.on ("error", function (error){
console.log (error);
})
//From the beginning of the file:
.write ([0x00, 0x01, 0x02], 0, 3) //Writes 0x00, 0x01, 0x02
.write (new Buffer ([0x03, 0x04]), 1, 1) //Writes 0x04
.write (0x05) //Writes 0x05
.close (); //Closes the writer. A flush is implicitly done.
//The BufferedWriter appends content to the end of the file because append == true
new BufferedWriter ("file", true)
.on ("error", function (error){
console.log (error);
})
//From the end of the file:
.write (0xFF) //Writes 0xFF
.close (); //Closes the writer. A flush is implicitly done.
//The file contains: 0x00, 0x01, 0x02, 0x04, 0x05, 0xFF
LAST UPDATE
Use concat.
Updated Answer for Node.js ~>0.8
Node is able to concatenate buffers on its own now.
var newBuffer = Buffer.concat([buffer1, buffer2]);
Old Answer for Node.js ~0.6
I use a module to add a .concat function, among others:
https://github.com/coolaj86/node-bufferjs
I know it isn't a "pure" solution, but it works very well for my purposes.
Buffers are always of fixed size; there is no built-in way to resize them dynamically, so your approach of copying into a larger Buffer is the only way.
However, to be more efficient, you could make the Buffer larger than the original contents, so it contains some "free" space where you can add data without reallocating the Buffer. That way you don't need to create a new Buffer and copy the contents on each append operation.
This is to help anyone who comes here looking for a solution that wants a pure approach. I would recommend understanding this problem because it can happen in lots of different places not just with a JS Buffer object. By understanding why the problem exists and how to solve it you will improve your ability to solve other problems in the future since this one is so fundamental.
For those of us that have to deal with these problems in other languages it is quite natural to devise a solution, but there are people who may not realize how to abstract away the complexities and implement a generally efficient dynamic buffer. The code below may have potential to be optimized further.
I have left the read method unimplemented to keep the example small in size.
The realloc function in C (or any language dealing with intrinsic allocations) does not guarantee that the allocation will be expanded in place without moving the existing data, although sometimes that is possible. Therefore most applications, when needing to store an unknown amount of data, use a method like the one below rather than constantly reallocating, unless reallocation is very infrequent. This is essentially how most file systems handle writing data to a file: the file system simply allocates another node and keeps all the nodes linked together, and when you read from it the complexity is abstracted away, so that the file/buffer appears to be a single contiguous buffer.
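For contrast, here is what the contiguous realloc-based approach looks like in C, as a minimal sketch with doubling growth (the dynbuf type and its fields are illustrative). Doubling keeps the amortized cost per appended byte constant, but each growth step may move the whole buffer, which is exactly what the chunk-list design below avoids:

#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data;  /* contiguous storage */
    size_t len;           /* bytes currently stored */
    size_t cap;           /* bytes allocated */
} dynbuf;

/* Append n bytes, doubling capacity when full.
   Start from a zero-initialized dynbuf. Returns 0 on success. */
static int dynbuf_append(dynbuf *b, const void *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 16;
        while (newcap < b->len + n)
            newcap *= 2;
        unsigned char *p = realloc(b->data, newcap); /* may move the data */
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}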
For those of you who wish to understand the difficulty in just simply providing a high performance dynamic buffer you only need to view the code below, and also do some research on memory heap algorithms and how the memory heap works for programs.
Most languages will provide a fixed-size buffer for performance reasons, and then provide another version that is dynamic in size. Some language systems opt for a third-party model: they keep the core functionality minimal (a core distribution) and encourage developers to create libraries that solve additional or higher-level problems. This is why you may wonder why a language does not provide some functionality. Keeping the core small reduces the cost of maintaining and enhancing the language; however, you end up having to write your own implementation or depend on a third party.
var Buffer_A1 = function (chunk_size) {
    this.buffer_list = [];
    this.total_size = 0;
    this.cur_size = 0;
    this.cur_buffer = [];
    this.chunk_size = chunk_size || 4096;

    this.buffer_list.push(new Buffer(this.chunk_size));
};

Buffer_A1.prototype.writeByteArrayLimited = function (data, offset, length) {
    var can_write = length > (this.chunk_size - this.cur_size) ? (this.chunk_size - this.cur_size) : length;

    var lastbuf = this.buffer_list.length - 1;

    for (var x = 0; x < can_write; ++x) {
        this.buffer_list[lastbuf][this.cur_size + x] = data[x + offset];
    }

    this.cur_size += can_write;
    this.total_size += can_write;

    if (this.cur_size == this.chunk_size) {
        this.buffer_list.push(new Buffer(this.chunk_size));
        this.cur_size = 0;
    }

    return can_write;
};

/*
    The `data` parameter can be anything that is array like. It just must
    support indexing and a length and produce an acceptable value to be
    used with Buffer.
*/
Buffer_A1.prototype.writeByteArray = function (data, offset, length) {
    offset = offset == undefined ? 0 : offset;
    length = length == undefined ? data.length : length;

    var rem = length;
    while (rem > 0) {
        rem -= this.writeByteArrayLimited(data, length - rem, rem);
    }
};

Buffer_A1.prototype.readByteArray = function (data, offset, length) {
    /*
        If you really wanted to implement some read functionality
        then you would have to deal with unaligned reads which could
        span two buffers.
    */
};

Buffer_A1.prototype.getSingleBuffer = function () {
    var obuf = new Buffer(this.total_size);
    var cur_off = 0;
    var x;

    for (x = 0; x < this.buffer_list.length - 1; ++x) {
        this.buffer_list[x].copy(obuf, cur_off);
        cur_off += this.buffer_list[x].length;
    }

    this.buffer_list[x].copy(obuf, cur_off, 0, this.cur_size);

    return obuf;
};
Insert a byte at a specific place:
function insertToArray(arr, index, item) {
    return Buffer.concat([arr.slice(0, index), Buffer.from(item, "utf-8"), arr.slice(index)]);
}
