String splitting in the D language

I am learning D and trying to split strings:
import std.stdio;
import std.string;
auto file = File(path, "r");
foreach (line; file.byLine) {
string[] parts = split(line);
}
This fails to compile with:
Error: cannot implicitly convert expression (split(line)) of type char[][] to string[]
This works:
auto file = File(path, "r");
foreach (line; file.byLine) {
char[][] parts = split(line);
}
But why do I have to use a char[][]? As far as I understand the documentation, it says that split returns a string[], which I would prefer.

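Use split(line.idup);
split is a template function, so its return type depends on the type of its argument. file.byLine.front returns a char[], and byLine reuses that buffer on every iteration for performance reasons. Passing a char[] therefore yields a char[][] whose slices point into the reused buffer. If you need the parts after the current loop iteration, copy the line first: .dup gives a mutable char[] copy, .idup gives an immutable string.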
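As a rough analogy in C++ rather than D (a sketch only, not how Phobos implements byLine or split): views into a buffer see whatever the buffer holds next, while copies keep their contents. The names split_views and split_copies are made up for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Split on single spaces and return views into `line` (no copying),
// comparable to calling split on the reused char[] from byLine.
std::vector<std::string_view> split_views(std::string_view line) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (start <= line.size()) {
        std::size_t end = line.find(' ', start);
        if (end == std::string_view::npos) end = line.size();
        if (end > start) parts.push_back(line.substr(start, end - start));
        start = end + 1;
    }
    return parts;
}

// Same split, but every part owns its characters,
// comparable to split(line.idup).
std::vector<std::string> split_copies(std::string_view line) {
    std::vector<std::string> parts;
    for (std::string_view v : split_views(line)) parts.emplace_back(v);
    return parts;
}
```

If the underlying buffer is overwritten after the call, the results of split_views change with it, while the results of split_copies do not.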

You can use std.stdio.lines. Depending on the type you declare for the foreach loop variable, it either allocates a new buffer on every iteration or reuses the old one, which saves the explicit .dup/.idup.
Which type to choose depends on your use case (i.e. how long you need the data).
foreach (string line; lines(file)) {
// new string every iteration
}
foreach (char[] line; lines(file)) {
// buffer is reused
}
Using ubyte[] instead of char[] disables the UTF-8 validation.


Why is using tinyxml2-ex::text returning corrupted text?

I am trying to use the tinyxml2-ex library to read some XML data.
When I try using its specific API call:
const CString strNameToUse(tinyxml2::text(pAssign).c_str());
The resulting string loses things like accents. In the end I have reverted to my original approach with the UTF8 handling:
const CString strNameToUse(CA2CT(pAssign->GetText(), CP_UTF8));
This works fine. Does anyone know why the tinyxml2-ex::text approach fails? Note that it is permissible to use the tinyxml2 namespace.
The library in question uses std::string and implements the call like this:
// helper function to get element text as a string, blank if none
inline std::string text (const XMLElement * element)
{
if (!element)
throw XmlException ("null element"s);
if (auto value = element -> GetText())
return std::string (value);
else
return ""s;
}
The library author explained (in a GitHub discussion):
It's because tixml2ex::text (see line 465 in tixml2ex.h) does this:
if (auto value = element -> GetText())
return std::string (value);
which will corrupt any string containing characters outside ASCII 127.
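One way to read that explanation, offered as an assumption rather than something stated in the thread: std::string only stores the UTF-8 bytes, and the damage appears when those bytes are later widened one at a time as if each were a character, instead of being decoded as UTF-8 (which is what CA2CT with CP_UTF8 does). The sketch below is portable C++ and uses neither CString nor tinyxml2; widen_bytes and decode_utf8 are hypothetical helpers, and the decoder assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Treat every byte as its own character, roughly what a single-byte
// code page conversion does with UTF-8 input.
std::u32string widen_bytes(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(static_cast<char32_t>(b));
    return out;
}

// Minimal UTF-8 decoder; assumes well-formed input and does no validation.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        int extra;  // number of continuation bytes that follow
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        for (int k = 0; k < extra && i < bytes.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

For the bytes 0xC3 0xA9 (the UTF-8 encoding of é), decode_utf8 yields the single code point U+00E9, whereas widen_bytes yields two separate characters, U+00C3 and U+00A9.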

How would I populate a vector with all the elements from a List of type system string while converting it to std::string?

I am trying to understand lambda functions better and would like an example of how I could fill a vector while converting System.String^ to std::string using a lambda (if that is possible).
My current foreach:
List<String^>^ names = //Returning 'System.String' List from C#
for each (System::String^ name in names)
{
std::string convertedString = msclr::interop::marshal_as< std::string >(name);
nameObjects.push_back(MyObject(convertedString, "test"));
}
But I would like to extend it to something like this (My best guess but I am missing the logic to convert each element of "names" to a single string, this is where a Lambda would help me):
std::vector<nameObjects> testObjects{ std::begin(msclr::interop::marshal_as< std::string >(names)), std::end(msclr::interop::marshal_as< std::string >(names)) };
Alright, I figured out a way to make this work. It requires using the obscure cliext classes.
First, create a cliext::vector; there is a constructor overload that takes an IEnumerable.
cliext::vector<String^> v_names(names);
Now you can use cliext::transform() (not std::transform) to do STL-style iteration and create the MyObject instances with a lambda:
std::vector<MyObject> testObjects;
cliext::transform(v_names.begin(), v_names.end(), std::back_inserter(testObjects), [](String^ name)
{
std::string convertedString = msclr::interop::marshal_as< std::string >(name);
return MyObject(convertedString, "test");
});
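For comparison, the same transform-plus-back_inserter shape in standard C++ (no C++/CLI) might look like the sketch below. MyObject is a hypothetical stand-in with two string fields, since the real class is not shown in the question, and the input is assumed to be already converted to std::string.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the MyObject class from the question.
struct MyObject {
    std::string name;
    std::string tag;
    MyObject(std::string n, std::string t)
        : name(std::move(n)), tag(std::move(t)) {}
};

// Builds one MyObject per input name; the lambda does the per-element work.
std::vector<MyObject> make_objects(const std::vector<std::string>& names) {
    std::vector<MyObject> result;
    result.reserve(names.size());
    std::transform(names.begin(), names.end(), std::back_inserter(result),
                   [](const std::string& name) { return MyObject(name, "test"); });
    return result;
}
```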

Struct store large chunk of data

I have a data file with millions of rows that I want to read and store in a struct.
public struct Sample
{
public int A;
public DateTime B;
}
Sample[] sample = new Sample[];
This definition gives me the error "Wrong number of indices inside []; expected 1".
How do I store the data in a struct with low memory usage? Is an array the best choice, or something else?
var reader = new StreamReader(File.OpenRead(@"C:\test.csv"));
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(';');
}
In order to allocate an array, you need to specify how many elements it holds. That's what the error is telling you.
If you don't know the size in advance, you can use List<T>, which grows as needed. Internally, List<T> is implemented using a T[], so from a lookup standpoint it behaves like an array; List<T> just handles the work of reallocating larger arrays as needed.

C++ : Strings, Structures and Access Violation Writing Locations

I'm trying to take a string passed into a method and assign it to a member of a structure, which I then place in a linked list. I didn't include all of the code, but I did post the constructor and the relevant parts. The code breaks at these lines:
node->title = newTitle;
node->isbn = newISBN;
newTitle is the string passed into the method, which I'm trying to assign to the title member of the Book structure that node points to. I assume this is an issue with pointers and assigning data through them, but I can't figure out a fix or alternative.
Also, I tried using
strcpy(node->title, newTitle)
But that failed because strcpy only works on arrays of characters, so the string could not be converted. I also tried a few other things, but none panned out. Help with an explanation would be appreciated.
struct Book
{
string title;
string isbn;
struct Book * next;
};
//class LinkedList will contains a linked list of books
class LinkedList
{
private:
Book * head;
public:
LinkedList();
~LinkedList();
bool addElement(string title, string isbn);
bool removeElement(string isbn);
void printList();
};
//Constructor
//It sets head to be NULL to create an empty linked list
LinkedList::LinkedList()
{
head = NULL;
}
//Description: Adds an element to the list in alphabetical order, unless a book with
// the same title already exists, in which case it is discarded
// Returns true if added, false otherwise
bool LinkedList::addElement(string newTitle, string newISBN)
{
struct Book *temp;
struct Book *lastEntry = NULL;
temp = head;
if (temp==NULL) //If the list is empty, sets data to first entry
{
struct Book *node;
node = (Book*) malloc(sizeof(Book));
node->title = newTitle;
node->isbn = newISBN;
head = node;
}
while (temp!=NULL)
{
... //Rest of Code
Note that your Book struct is already a linked list implementation, so you don't need the LinkedList class at all, or alternatively you don't need the 'next' element of the struct.
But there's no reason from the last (long) code snippet you pasted to have an error at the lines you indicated. node->title = newTitle should copy the string in newTitle to the title field of the struct. The string object is fixed size so it's not possible to overwrite any buffer and cause a seg fault.
However, there may be memory corruption from something you do earlier in the code that doesn't cause an error until later. Look for any arrays, including char[], that you might be overfilling. Another possibility: you mention that you save method parameters. If you copy them, that's fine, but if you do something like
char* f() {
char str[20];
strcpy(str, "hello");
return str;
}
...then you've got a problem. (Because str is allocated on the stack and you return only the pointer to a location that won't be valid after the function returns.) Method parameters are local variables.
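A safe variant, as a minimal sketch: returning a std::string by value gives the caller its own copy, so nothing refers to the function's stack frame afterwards. The name f_safe is made up for this example.

```cpp
#include <string>

// Returns the text by value; the caller owns the returned string.
std::string f_safe() {
    char str[20] = "hello";   // local buffer, gone once the function returns
    return std::string(str);  // the characters are copied out before that
}
```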
In short: the memory that malloc returns does not contain a properly constructed object, so you can't use it as one. The std::string members of Book are never constructed, and assigning to them is undefined behavior. Use new / delete instead.
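A minimal sketch of that suggestion, using the Book struct from the question. The helpers make_node and free_list are hypothetical names, not part of the original LinkedList class.

```cpp
#include <string>

struct Book {
    std::string title;
    std::string isbn;
    Book* next;
};

// Allocates a node with new, so the std::string members are constructed
// before anything is assigned to them.
Book* make_node(const std::string& title, const std::string& isbn) {
    Book* node = new Book;   // runs the std::string constructors
    node->title = title;
    node->isbn = isbn;
    node->next = nullptr;
    return node;
}

// Frees a whole list; delete runs the std::string destructors.
void free_list(Book* head) {
    while (head != nullptr) {
        Book* next = head->next;
        delete head;
        head = next;
    }
}
```

Inside addElement, the malloc line would become a call such as make_node(newTitle, newISBN), and the destructor could call free_list(head).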

D (Tango) Read all standard input and assign it to a string

In the D language, how can I read all of standard input and assign it to a string (with the Tango library)?
Copied straight from http://www.dsource.org/projects/tango/wiki/ChapterIoConsole:
import tango.io.Console; // provides Cin
import tango.text.stream.LineIterator;
foreach (line; new LineIterator!(char)(Cin.stream))
// do something with each line
If only 1 line is required, use
auto line = Cin.copyln();
Another, probably more efficient, way of dumping the contents of Stdin would be something like this:
module dumpstdin;
import tango.io.Console : Cin;
import tango.io.device.Array : Array;
import tango.io.model.IConduit : InputStream;
const BufferInitialSize = 4096u;
const BufferGrowingStep = 4096u;
ubyte[] dumpStream(InputStream ins)
{
auto buffer = new Array(BufferInitialSize, BufferGrowingStep);
buffer.copy(ins);
return cast(ubyte[]) buffer.slice();
}
import tango.io.Stdout : Stdout;
void main()
{
auto contentsOfStdin
= cast(char[]) dumpStream(Cin.stream);
Stdout
("Finished reading Stdin.").newline()
("Contents of Stdin was:").newline()
("<<")(contentsOfStdin)(">>").newline();
}
Some notes:
The second parameter to Array is necessary; if you omit it, Array will not grow in size.
I used 4096 since that's generally the size of a page of memory.
dumpStream returns a ubyte[] because char[] is defined as a UTF-8 string, which Stdin doesn't necessarily need to be. For example, if someone piped a binary file to your program, you would end up with an invalid char[] that could throw an exception if anything checks it for validity. If you only care about text, then casting the result to a char[] is fine.
copy is a method on the OutputStream interface that causes it to drain the provided InputStream of all input.
