D (Tango): Read all standard input and assign it to a string

In the D language, how can I read all of standard input and assign it to a string (using the Tango library)?

Copied straight from http://www.dsource.org/projects/tango/wiki/ChapterIoConsole:
import tango.io.Console : Cin;
import tango.text.stream.LineIterator;

foreach (line; new LineIterator!(char)(Cin.stream))
{
    // do something with each line
}
If only one line is required, use
auto line = Cin.copyln();
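If you want the whole of standard input as a single string, you can also accumulate the lines yourself. A minimal sketch building on the same iterator (assuming, as Tango's LineIterator does, that line terminators are stripped, so they are re-added manually):

import tango.io.Console : Cin;
import tango.text.stream.LineIterator;

char[] content;
foreach (line; new LineIterator!(char)(Cin.stream))
{
    content ~= line;   // appending copies the reused line buffer
    content ~= '\n';   // the iterator strips line terminators
}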

Another, probably more efficient, way of dumping the contents of Stdin would be something like this:
module dumpstdin;

import tango.io.Console : Cin;
import tango.io.device.Array : Array;
import tango.io.model.IConduit : InputStream;

const BufferInitialSize = 4096u;
const BufferGrowingStep = 4096u;

ubyte[] dumpStream(InputStream ins)
{
    auto buffer = new Array(BufferInitialSize, BufferGrowingStep);
    buffer.copy(ins);
    return cast(ubyte[]) buffer.slice();
}

import tango.io.Stdout : Stdout;

void main()
{
    auto contentsOfStdin
        = cast(char[]) dumpStream(Cin.stream);

    Stdout
        ("Finished reading Stdin.").newline()
        ("Contents of Stdin was:").newline()
        ("<<")(contentsOfStdin)(">>").newline();
}
Some notes:
- The second parameter to Array is necessary; if you omit it, Array will not grow in size.
- I used 4096 since that's generally the size of a page of memory.
- dumpStream returns a ubyte[] because char[] is defined to be a UTF-8 string, which the contents of Stdin don't necessarily conform to. For example, if someone piped a binary file to your program, you would end up with an invalid char[] that could throw an exception if anything checks it for validity. If you only care about text, then casting the result to char[] is fine.
- copy is a method on the OutputStream interface that drains the provided InputStream of all input (see the sketch below).
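As an aside, the same copy mechanism can pipe Stdin straight through to Stdout. A sketch, assuming Console's Output exposes a .stream property analogous to the Cin.stream used above:

import tango.io.Console : Cin, Cout;

void main()
{
    Cout.stream.copy(Cin.stream);  // drain Stdin into Stdout
    Cout.stream.flush();
}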

Related

Why is using tinyxml2-ex::text returning corrupted text?

I am trying to use the tinyxml2-ex library to read some XML data.
When I try using its specific API call:
const CString strNameToUse(tinyxml2::text(pAssign).c_str());
The resulting string loses things like accents. In the end I reverted to my original approach with explicit UTF-8 handling:
const CString strNameToUse(CA2CT(pAssign->GetText(), CP_UTF8));
This works fine. Does anyone know why the tinyxml2-ex::text approach fails? Note that it is permissible to use the tinyxml2 namespace.
The library in question uses std::string and does it like this:
// helper function to get element text as a string, blank if none
inline std::string text (const XMLElement * element)
{
    if (!element)
        throw XmlException ("null element"s);
    if (auto value = element->GetText())
        return std::string (value);
    else
        return ""s;
}
The library author explained (in a GitHub discussion):
It's because tixml2ex::text (see line 465 in tixml2ex.h) does this:
if (auto value = element->GetText())
    return std::string (value);
which will corrupt any string containing characters outside ASCII 127.
In other words, text() hands back the raw UTF-8 bytes, and constructing a CString directly from them converts them using the default ANSI code page rather than CP_UTF8, mangling any multi-byte characters; the CA2CT(..., CP_UTF8) version performs the conversion correctly.

How to detect that the start character of a UTF-8 file has been read, using FileStream and StreamReader?

In C# .NET 4.0 (really 4.5.2), my code reads a UTF-8 file.
FileStream fstream = new FileStream(path, FileMode.Open);
BufferedStream stream = new BufferedStream(fstream);
using (StreamReader reader = new StreamReader(stream, new UTF8Encoding()))
{
    int i;
    while ((i = reader.Read()) > -1)
    {
        // a guess at a condition that is true iff reader has read character 1 of the file
        if (stream.Position == (0 + sizeof(char)) || stream.Position == (0 + sizeof(int)))
        {
            // while loop has reader read through all characters,
            // but within this block, the reader has surely read character 1?
            char c = (char)i;
        }
    }
    reader.Close();
    return 0;
}
If and only if the StreamReader has just read the start character of the UTF-8 file, I want to run some function on that first character.
With a FileStream and StreamReader used to read a UTF-8 file, how do you know whether that condition has been met?
I am looking for an answer, please, that uses a property or method that already exists in the C# .NET 4.0 System.IO namespace. I thought the Stream.Position (BufferedStream.Position) property would be the obvious way to find out where in the file (i.e. at what character) the reader is, but when trying a UTF-8 file that starts with a character in '0' to '9' (48 to 57), the loop with reader.Read() reads that char, and stream.Position is 43. I don't know why 43 of all integral values is the value of stream.Position after the first character is read, or what the 43 means.
Update: As the loop iterates and the reader reads more characters, the stream.Position value remains at 43, so I don't see how the Position property is useful here.
bool first = true;
int i;
while ((i = reader.Read()) > -1)
{
    if (first)
    {
        first = false;
        // Do first character things
    }
    // ... handle the character as usual
}
Note that the concept of first character is complex: what happens if the first glyph is è, which occupies two bytes in the file? The stream position will be at least 2 :-)
In general, you can check the Position of the StreamReader.BaseStream, but that Position is nearly useless, because there could be multiple levels of caching, or simply because for reading a single char the StreamReader could have consumed 1-4 bytes (an ASCII character is one byte, while some Unicode characters are four bytes long)... And then UTF-8 files can have a BOM (an initial header three bytes long). That too is normally skipped by StreamReader.
Still, if you want, you can subclass the StreamReader class, overriding all the Read* methods and keeping an internal flag, say SomethingHasBeenRead. It isn't difficult (everything relevant is virtual in StreamReader)... it is only a little tedious.

Node.js buffer string serialization

I want to serialize a buffer to a string without any overhead (one character per byte) and be able to deserialize it into a buffer again.
var b = new Buffer(4);
var s = b.toString();
var b2 = new Buffer(s);
This produces the same result only for byte values below 128. I want to use the whole range 0-255.
I know I can write it in a loop with String.fromCharCode() when serializing and String.charCodeAt() when deserializing, but I'm looking for a native implementation if there is one.
You can use the 'latin1' encoding, but you should generally try to avoid it because converting a Buffer to a binary string has some extra computational overhead.
Example:
var b = Buffer.alloc(4);
var s = b.toString('latin1');
var b2 = Buffer.from(s, 'latin1');

String splitting in the D language

I am learning D and trying to split strings:
import std.stdio;
import std.string;

auto file = File(path, "r");
foreach (line; file.byLine)
{
    string[] parts = split(line);
}
This fails to compile with:
Error: cannot implicitly convert expression (split(line)) of type char[][] to string[]
This works:
auto file = File(path, "r");
foreach (line; file.byLine)
{
    char[][] parts = split(line);
}
But why do I have to use a char[][]? As far as I understand the documentation, it says that split returns a string[], which I would prefer.
Use split(line.idup);
split is a template function, so its return type depends on its argument. file.byLine.front returns a char[], and the buffer is also reused between iterations for performance reasons. So if you need the parts after the current loop iteration, you have to .dup or .idup them, whichever you need.
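A minimal sketch of the fixed loop (the file path here is hypothetical):

import std.stdio;
import std.string;

void main()
{
    auto file = File("data.txt", "r");  // hypothetical input file
    foreach (line; file.byLine)
    {
        // idup makes an immutable copy, so split returns string[]
        string[] parts = split(line.idup);
        // parts stays valid after this iteration
    }
}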
You can use std.stdio.lines. Depending on how you type the variable of your foreach loop, it will either allocate a new buffer for every iteration or reuse the old one; this way you can avoid the .dup/.idup.
Which type to choose depends on your use case (i.e. how long you need the data):
foreach (string line; lines(file)) { /* new string every iteration */ }
foreach (char[] line; lines(file)) { /* reuse buffer */ }
Using ubyte instead of char will disable the UTF-8 validation.
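For example (a sketch; the elements are then raw bytes rather than validated UTF-8):

foreach (ubyte[] line; lines(file)) { /* raw bytes, no UTF-8 validation */ }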

Binary file I/O

How do you read and write binary files in the D language? In C it would be:
FILE *fp = fopen("/home/peu/Desktop/bla.bin", "wb");
char x[4] = "RIFF";
fwrite(x, sizeof(char), 4, fp);
I found rawWrite in the D docs, but I don't know the usage, nor whether it does what I think. The documentation for rawRead says (fread here is from C):
T[] rawRead(T)(T[] buffer);
If the file is not opened, throws an exception. Otherwise, calls fread for the file handle and throws on error.
rawRead always reads in binary mode on Windows.
rawRead and rawWrite should behave exactly like fread and fwrite; they are just templates that take care of argument sizes and lengths.
e.g.
auto stream = File("filename", "r+");
auto outstring = "abcd";
stream.rawWrite(outstring);
stream.rewind();
auto inbytes = new char[4];
stream.rawRead(inbytes);
assert(inbytes[3] == outstring[3]);
rawRead is implemented in terms of fread as
T[] rawRead(T)(T[] buffer)
{
    enforce(buffer.length, "rawRead must take a non-empty buffer");
    immutable result =
        .fread(buffer.ptr, T.sizeof, buffer.length, p.handle);
    errnoEnforce(!error);
    return result ? buffer[0 .. result] : null;
}
If you just want to read in a big buffer of values (say, ints), you can simply do:
int[] ints = cast(int[]) std.file.read("ints.bin", numInts * int.sizeof);
and
std.file.write("ints.bin", ints);
Of course, if you have more structured data, then Scott Wales' answer is more appropriate.
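For instance, a minimal sketch of round-tripping a simple struct with rawWrite/rawRead (the struct and file name are hypothetical; this assumes a plain value type with no pointers or slices inside):

import std.stdio;

struct Header
{
    char[4] magic;
    uint size;
}

void main()
{
    auto f = File("bla.bin", "w+");

    Header h;
    h.magic = "RIFF";          // string literal fits char[4] exactly
    h.size = 1024;

    f.rawWrite((&h)[0 .. 1]);  // a one-element slice over the struct
    f.rewind();

    Header[1] readback;
    f.rawRead(readback[]);
    assert(readback[0] == h);  // bytes survived the round trip
}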
