Extra [ ] Operator Added When Using 2 overloaded [ ] Operators - object

As a side project I'm writing a couple of classes to do matrix operations and operations on linear systems. The class LinearSystem holds pointers to Matrix class objects in a std::map. The Matrix class itself holds the 2d array ("matrix") in a double float pointer (float **). I wrote 2 overloaded [ ] operators, one to return a pointer to a matrix object directly from the LinearSystem object, and another to return a row (float *) from a matrix object.
Both of these operators work perfectly on their own. I'm able to use LinearSystem["keyString"] to get a matrix object pointer from the map. And I'm able to use Matrix[row] to get a row (float *) and Matrix[row][col] to get a specific float element from a matrix object's 2d array.
The trouble comes when I put them together. My limited understanding (rising senior CompSci major) tells me that I should have no problem using LinearSystem["keyString"][row][col] to get an element from a specific array within the Linear system object. The return types should look like LinearSystem->Matrix->float *->float. But for some reason it only works when I place an extra [0] after the overloaded key operator so the call looks like this: LinearSystem["keyString"][0][row][col]. And it HAS to be 0, anything else and I segfault.
Another interesting thing to note is that CLion sees ["keyString"] as overloaded and [row] as overloaded, but not [0], as if it's calling the standard index operator, but on what is the question that has me puzzled. LinearSystem["keyString"] is for sure returning Matrix *, and Matrix only has an overloaded [ ] operator. See the attached screenshot.
screenshot
Here's the code, let me know if more is needed.
LinearSystem [ ] and map declaration:
Matrix *myNameSpace::LinearSystem::operator[](const std::string &name) {
    return matrices[name];
}
std::map<std::string, Matrix *> matrices;
Matrix [ ] and array declaration:
inline float *myNameSpace::Matrix::operator[](const int row) {
    return elements[row];
}
float **elements;
Note, the above function is inline'd because I'm challenging myself to make the code as fast as possible and even with compiler optimizations, the overloaded [ ] was 15% to 30% slower than using Matrix.elements[row].
Please let me know if any more info is needed. This is my first post, so I'm sure it's not perfect.
Thank you!

You're writing C in C++. You need to not do that. It adds complexity. Those stars you keep putting after your types. Those are raw pointers, and they should almost always be avoided. linearSystem["foo"] is a Matrix*, i.e. a pointer to Matrix. It is not a Matrix. And pointers index as arrays, so linearSystem["foo"][0] gets the first element of the Matrix*, treating it as an array. Since there's actually only one element, this works out and seems to do what you want. If that sounds confusing, that's because it is.
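To make the pointer-indexing point concrete, here is a minimal sketch. The Matrix below is a hypothetical 2x2 stand-in, not the asker's real class, and allSpellingsAgree is a helper name invented for this example. It only shows that [0] applied to a Matrix* is the built-in subscript, equivalent to dereferencing.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the question's Matrix: a fixed 2x2 grid,
// just enough to make the indexing visible.
struct Matrix {
    float cells[2][2] = {{1.0f, 2.0f}, {3.0f, 4.0f}};
    float* operator[](int row) { return cells[row]; }
};

// Mirrors the question's LinearSystem::operator[]: it hands back a pointer.
struct LinearSystem {
    std::map<std::string, Matrix*> matrices;
    Matrix* operator[](const std::string& name) { return matrices[name]; }
};

// Reads element (row, col) through the pointer in three equivalent spellings.
inline bool allSpellingsAgree(LinearSystem& sys, const std::string& key,
                              int row, int col) {
    Matrix* p = sys[key];                      // a built-in pointer, not a Matrix
    float viaIndexZero = p[0][row][col];       // p[0] is *(p + 0), then Matrix::operator[]
    float viaDeref = (*p)[row][col];           // explicit dereference, same object
    float viaArrow = p->operator[](row)[col];  // member call through the pointer
    // p[1] would read past the single Matrix, which is why only 0 "works".
    return viaIndexZero == viaDeref && viaDeref == viaArrow;
}
```

So with the pointer-returning design, (*linearSystem["foo"])[row][col] says what is actually happening; the [0] form just hides the dereference.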
Make sure you understand ownership semantics. Raw pointers are frowned upon because they don't convey any information about ownership. If you want a value to be owned by the data structure, it should be a T directly (or a std::unique_ptr<T> if you need runtime polymorphism), not a T*. If you're taking a function argument or returning a value that you don't want to pass ownership of, you should take/return a const T&, or T& if you need to modify the referent.
I'm not sure exactly what your data looks like, but a reasonable representation of a Matrix class (if you're going for speed) is as an instance variable of type std::vector<double>. std::vector is a managed array of double. You can preallocate the array to whatever size you want and change it whenever you want. It also plays nice with Rule of Three/Five.
I'm not sure what LinearSystem is meant to represent, but I'm going to assume the matrices are meant to be owned by the linear system (i.e. when the system is freed, the matrices should have their memory freed as well), so I'm looking at something like
std::map<std::string, Matrix> matrices;
Again, you can wrap Matrix in a std::unique_ptr if you plan to inherit from it and do dynamic dispatch. Though I might question your design choices if your Matrix class is intended to be subclassed.
There's no reason for Matrix::operator[] to return a raw pointer to float (in fact, "raw pointer to float" is a pretty pointless type to begin with). I might suggest having two overloads.
float myNameSpace::Matrix::operator[](int row) const;
float& myNameSpace::Matrix::operator[](int row);
Likewise, LinearSystem::operator[] can have two overloads: one constant and one mutable.
const Matrix& myNameSpace::LinearSystem::operator[](const std::string& name) const;
Matrix& myNameSpace::LinearSystem::operator[](const std::string& name);
References (T& as opposed to T*) are smart and will effectively dereference when needed, so you can call Matrix::operator[] on a Matrix&, whereas you can't call that on a Matrix* without acknowledging the layer of indirection.
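Putting those pieces together, a rough sketch might look like the following. Everything here is an assumption for illustration: the flat row-major std::vector<float>, the two-argument operator() (a single-index operator[] cannot address a 2D element by itself), the add method, and the use of map::at, which throws std::out_of_range for a missing name instead of inserting one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), elements_(rows * cols, 0.0f) {}

    // Mutable access: returns a reference into the vector, so it can be assigned to.
    float& operator()(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return elements_[row * cols_ + col];
    }
    // Read-only access for const matrices.
    float operator()(std::size_t row, std::size_t col) const {
        assert(row < rows_ && col < cols_);
        return elements_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<float> elements_;  // row-major storage, owned by the Matrix
};

class LinearSystem {
public:
    // Stores the matrix by value, replacing any matrix already under that name.
    void add(const std::string& name, Matrix m) {
        matrices_.erase(name);
        matrices_.emplace(name, std::move(m));
    }
    // Both overloads return references, so no pointer ever reaches the caller.
    Matrix& operator[](const std::string& name) { return matrices_.at(name); }
    const Matrix& operator[](const std::string& name) const { return matrices_.at(name); }

private:
    std::map<std::string, Matrix> matrices_;  // the system owns its matrices
};
```

With this shape, linearSystem["foo"](row, col) works without any extra [0], and there is nothing to delete by hand.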
There's a lot of bad C++ advice out there. If the book / video / teacher you're learning from is telling you to allocate float** everywhere, then it's a bad book / video / teacher. C++-managed data is going to be far less error-prone and will perform comparably to raw pointers (the C++ compiler is smarter than either you or me when it comes to optimization, so let it do its thing).
If you do find yourself really feeling the need to go low-level and allocate raw pointers everywhere, then switch to C. C is a language designed for low-level pointer manipulation. C++ is a higher-level managed-memory language; it just so happens that, for historical reasons, C++ has several hundred footguns in the form of C-style allocations placed sporadically throughout the standard.
In summary: Modern C++ almost never uses T*, new, or delete. Start getting comfortable with smart pointers (std::unique_ptr and std::shared_ptr) and references. You'll thank yourself later.


How to map a structure from a buffer like in C with a pointer and cast

In C, I can define many structures and structure of structures.
From a buffer, I can just set the pointer at the beginning of this structure to say this buffer represents this structure.
Of course, I do not want to copy anything, just map, otherwise I lose the benefit of the speed.
Is it possible in Node.js? How can I do it? How can I be sure it's a mapping and not the creation of a new object with the information copied into it?
Example:
struct House = {
uint8 door,
uint16BE kitchen,
etc...
}
var mybuff = Buffer.alloc(10, 0)
var MyHouse = new House(mybuff) // same as `House* MyHouse = (House*) mybuff`
console.log(MyHouse.door) // will display the value of door
console.log(MyHouse.kitchen) // will display the value of kitchen with BE function.
This is wrong, but it explains well what I am looking for.
I want this without copying anything.
And if I do MyHouse.door=56, mybuff now contains the 56. I think of mybuff as a pointer.
Edit after question update below
As opposed to C/C++, javascript uses pointers by default, so you don't have to do anything. It's the other way around, actually: you have to put some effort in if you want a copy of the current object.
In C, a struct is nothing more than a compile-time reference to different parts of data in the struct. So:
struct X {
int foo;
int bar;
}
is nothing more than saying: if you want bar from a variable with type X, just add the length of foo (length of int) to the base pointer.
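As a small illustration of that "base pointer plus offset" idea, here is a sketch in C++ (chosen only for the example; the same holds in C). readBarByOffset is a made-up helper name, and the assumption that bar sits exactly sizeof(int) bytes in holds on mainstream platforms but is not spelled out by the language.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct X {
    int foo;
    int bar;
};

// Reads `bar` the way the compiler conceptually does it:
// start at the object's base address and add a fixed byte offset.
inline int readBarByOffset(const X& x) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&x);
    int value;
    std::memcpy(&value, base + offsetof(X, bar), sizeof value);
    return value;
}
```

The offset is known at compile time, which is why no lookup by name is needed at runtime.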
In Javascript, we do not even have such a type. We can just say:
var x = {
foo: 1,
bar: 2
}
The lookup of bar will automatically be a pointer (we call them references in javascript) lookup. Because javascript does not have static types, you can view an object as a map/dictionary with pointers to mixed types.
If you, for any reason, want to create a copy of a datastructure, you would have to iterate through the entire datastructure (recursively) and create a copy of the datastructure manually. The basic types are not pointer based. These include number (Javascript automatically differentiates between int and float under the hood), string and boolean.
Edit after question update
Although I am not an expert on this area, I do not think it is possible. The problem is, the underlying data representation (as in how the data is represented as bytes in memory) is different, because javascript does not have compile-time information about data structures. As I said before, javascript doesn't have classes/structs, just objects with fields, which basically behave (and may be implemented as) maps/dictionaries.
There are, however, some third party libraries to cope with these problems. There are two general approaches:
Unpack everything to javascript objects. The data will be copied, but you can work with it as normal javascript objects. You should use this if you read/write the data intensively, because the performance increase you get when working with normal javascript objects outweighs the advantage of not having to unpack the data. Link to example library
Leave all data in the buffer. When you need some of the data, compute the location of the data in the buffer at runtime, and read/write at this location accordingly. Because the struct data location computations are done in runtime, you should use this only when you have loads of data and only a few reads/writes to it. In this case the performance decrease of unpacking all data outweighs the few runtime computations that have to be done. Link to example library
As a side-note, if the amount of data you have to process isn't that much, I'd recommend to just unpack the data. It saves you the headache of having to use the library as interface to your data. Computers are fast enough nowadays to copy/process some amount of data in memory. Also, these third party libraries are just some examples. I recommend you do a little more research for libraries to decide which one suits your needs.

Box<X> vs move semantics on X

I have an easy question regarding Box<X>.
I understand what it does, it allocates X on the heap.
In C++ you use the new operator to allocate something on the heap so it can outlive the current scope (because if you create something on the stack it goes away at the end of the current block).
But reading Rust's documentation, it looks like you can create something on the stack and still return it taking advantage of the language's move semantics without having to resort to the heap.
Then it's not clear to me when to use Box<X> as opposed to simply X.
I just started reading about Rust so I apologize if I'm missing something obvious.
First of all: C++11 (and newer) has move semantics with rvalue references, too. So your question would also apply to C++. Keep in mind, though, that C++'s move semantics are, unlike Rust's, highly unsafe.
Second: the term "move semantics" suggests the absence of a "copy", which is not true. Suppose you have a struct with 100 64-bit integers. If you transfer an object of this struct via move semantics, those 100 integers will be copied (of course, the compiler's optimizer can often remove those copies, but anyway...). The advantage of move semantics comes into play when dealing with objects that manage some kind of data on the heap (or pointers in general).
For example, take a look at Vec (similar to C++'s vector): the type itself only contains a pointer and two pointer-sized integers (ptr, len and cap). Those three 64-bit values are still copied when the vector is moved, but the main data of the vector (which lives on the heap) is not touched.
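The same effect can be observed with C++'s std::vector, which the answer mentions as the counterpart of Vec. This is only a sketch of the C++ side: moveReusesBuffer is a name invented here, and the buffer reuse is what mainstream standard library implementations do with the default allocator.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Moves `source` into a new vector and reports whether the heap buffer
// was handed over instead of being copied.
inline bool moveReusesBuffer(std::vector<int>& source) {
    const int* before = source.data();                 // address of the heap data
    std::vector<int> destination = std::move(source);  // copies only the small header
    return destination.data() == before;               // same buffer, new owner
}
```

Only the pointer and size fields travel; the elements stay where they were.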
That being said, let's discuss the main question: "Why to use Box at all?". There are actually many use cases:
Unsized types: some types (e.g. Trait-objects which also includes closures) are unsized, meaning their size is not known to the compiler. But the compiler has to know the size of each stack frame -- hence those unsized types cannot live on the stack.
Recursive data structures: think of a BinaryTreeNode struct. It saves two members named "left" and "right" of type... BinaryTreeNode? That won't work. So you can box both children so that the compiler knows the size of your struct.
Huge structs: think of the 100 integer struct mentioned above. If you don't want to copy it every time, you can allocate it on the heap (this happens pretty seldom).
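For readers coming from C++, the recursive case has a close analogue. This is a C++ sketch, not Rust, with std::unique_ptr playing roughly the role of Option<Box<...>>. The int field and the countNodes helper are assumptions added so the example does something.

```cpp
#include <cassert>
#include <memory>

// struct BinaryTreeNode { BinaryTreeNode left, right; };
// would not compile: the struct would have to contain itself.

struct BinaryTreeNode {
    int value = 0;
    std::unique_ptr<BinaryTreeNode> left;   // pointer-sized, so the struct has a known size
    std::unique_ptr<BinaryTreeNode> right;  // empty pointer means "no child"
};

// Walks the tree and counts the nodes reachable from `node`.
inline int countNodes(const BinaryTreeNode* node) {
    if (node == nullptr) return 0;
    return 1 + countNodes(node->left.get()) + countNodes(node->right.get());
}
```

The indirection is what gives the node a fixed size, which is the same reason Box is needed in the Rust version.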
There are cases where you can't return X, e.g. if X is ?Sized (traits, non-compile-time-sized arrays, etc.). In those cases Box<X> will still work.

How can I obtain constant time access (like in an array) in a data structure in Haskell?

I'll get straight to it - is there a way to have a dynamically sized constant-time access data-structure in Haskell, much like an array in any other imperative language?
I'm sure there is a module somewhere that does this for us magically, but I'm hoping for a general explanation of how one would do this in a functional manner :)
As far as I'm aware, Map uses a binary tree representation so it has O(log(n)) access time, and lists of course have O(n) access time.
Additionally, if we made it so that it was immutable, it would be pure, right?
Any ideas how I could go about this (beyond something like Array = Array { one :: Int, two :: Int, three :: Int ...} in template Haskell or the like)?
If your key is isomorphic to Int then you can use IntMap as most of its operations are O(min(n,W)), where n is the number of elements and W is the number of bits in Int (usually 32 or 64), which means that as the collection gets large the cost of each individual operation converges to a constant.
a dynamically sized constant-time access data-structure in Haskell,
Data.Array
Data.Vector
etc etc.
For associative structures you can choose between:
Log-N tree and trie structures
Hash tables
Mixed hash mapped tries
With various different log-complexities and constant factors.
All of these are on hackage.
In addition to the other good answers, it might be useful to say that:
When restricted to Algebraic Data Types and purity, all dynamically sized data structures must have at least logarithmic worst-case access time.
Personally, I like to call this the price of purity.
Haskell offers you three main ways around this:
Change the problem: Use hashes or prefix trees.
For constant-time reads use pure Arrays or the more recent Vectors; they are not ADTs and need compiler support / hidden IO inside. Constant-time writes are not possible since purity forbids the original data structure to be modified.
For constant-time writes use the IO or ST monad, preferring ST when you can to avoid externally visible side effects. These monads are implemented in the compiler.
It's true that you can't have constant time access arrays in Haskell without compiler/runtime magic.
However, this isn't (just) because Haskell is functional. Arrays in Java and C# also require runtime magic. In Rust you might be able to implement them in unsafe code, but not in safe Rust.
The truth is any language that doesn't allow you to allocate memory of dynamic size, or that doesn't allow you to use pointers is going to require runtime magic to implement arrays.
That excludes any safe language, whether object oriented, or functional.
The only difference between Haskell and eg. Java with respect to Arrays, is that arrays are far less useful in Haskell than in Java, but in Java arrays are so core to everything we do that we don't even notice that they're magic.
There is one way though that Haskell requires more magic for arrays than eg. Java.
With Java you can initialise an empty array (which requires magic) and then fill it up with values (which doesn't).
With Haskell this would obviously go against immutability. So any array would have to be initialised with its values. Thus the compiler magic doesn't just stretch to giving you an empty chunk of memory to index into. It also requires giving you a way to initialise the array with values. So creation and initialisation of the array has to be a single step, entirely handled by the compiler.

Creating strings in D without allocating memory?

Is there any typesafe way to create a string in D, using information only available at runtime, without allocating memory?
A simple example of what I might want to do:
void renderText(string text) { ... }
void renderScore(int score)
{
char[16] text;
int n = sprintf(text.ptr, "Score: %d", score);
renderText(text[0..n]); // ERROR
}
Using this, you'd get an error because the slice of text is not immutable, and is therefore not a string (i.e. immutable(char)[])
I can only think of three ways around this:
Cast the slice to a string. It works, but is ugly.
Allocate a new string using the slice. This works, but I'd rather not have to allocate memory.
Change renderText to take a const(char)[]. This works here, but (a) it's ugly, and (b) many functions in Phobos require string, so if I want to use those in the same manner then this doesn't work.
None of these are particularly nice. Am I missing something? How does everyone else get around this problem?
You have a static array of char. You want to pass it to a function that takes immutable(char)[]. The only way to do that without any allocation is to cast. Think about it. What you want is one type to act like it's another. That's what casting does. You could choose to use assumeUnique to do it, since that does exactly the cast that you're looking for, but whether that really gains you anything is debatable. Its main purpose is to document that what you're doing by the cast is to make the value being cast be treated as immutable and that there are no other references to it. Looking at your example, that's essentially true, since it's the last thing in the function, but whether you want to do that in general is up to you. Given that it's a static array which risks memory problems if you screw up and you pass it to a function that allows a reference to it to leak, I'm not sure that assumeUnique is the best choice. But again, it's up to you.
Regardless, if you're doing a cast (be it explicitly or with assumeUnique), you need to be certain that the function that you're passing it to is not going to leak references to the data that you're passing to it. If it does, then you're asking for trouble.
The other solution, of course, is to change the function so that it takes const(char)[], but that still runs the risk of leaking references to the data that you're passing in. So, you still need to be certain of what the function is actually going to do. If it's pure, doesn't return const(char)[] (or anything that could contain a const(char)[]), and there's no way that it could leak through any of the function's other arguments, then you're safe, but if any of those aren't true, then you're going to have to be careful. So, ultimately, I believe that all that using const(char)[] instead of casting to string really buys you is that you don't have to cast. That's still better, since it avoids the risk of screwing up the cast (and it's just better in general to avoid casting when you can), but you still have all of the same things to worry about with regards to escaping references.
Of course, that also requires that you be able to change the function to have the signature that you want. If you can't do that, then you're going to have to cast. I believe that at this point, most of Phobos' string-based functions have been changed so that they're templated on the string type. So, this should be less of a problem now with Phobos than it used to be. Some functions (in particular, those in std.file), still need to be templatized, but ultimately, functions in Phobos that require string specifically should be fairly rare and will have a good reason for requiring it.
Ultimately however, the problem is that you're trying to treat a static array as if it were a dynamic array, and while D definitely lets you do that, you're taking a definite risk in doing so, and you need to be certain that the functions that you're using don't leak any references to the local data that you're passing to them.
Check out assumeUnique from std.exception (see Jonathan's answer).
No, you cannot create a string without allocation. Did you mean access? To avoid allocation, you have to either use slice or pointer to access a previously created string. Not sure about cast though, it may or may not allocate new memory space for the new string.
One way to get around this would be to copy the mutable chars into a new immutable version then slice that:
void renderScore(int score)
{
char[16] text;
int n = sprintf(text.ptr, "Score: %d", score);
immutable(char)[16] itext = text;
renderText(itext[0..n]);
}
However:
DMD currently doesn't allow this due to a bug.
You're creating an unnecessary copy (better than a GC allocation, but still not great).

What's going on in the 'offsetof' macro?

The Visual C++ 2008 C runtime offers an operator 'offsetof', which is actually a macro defined like this:
#define offsetof(s,m) (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
This allows you to calculate the offset of the member variable m within the class s.
What I don't understand in this declaration is:
Why are we casting m to anything at all and then dereferencing it? Wouldn't this have worked just as well:
&(((s*)0)->m)
?
What's the reason for choosing char reference (char&) as the cast target?
Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?
An offset is in bytes. So to get a number expressed in bytes, you have to cast the addresses to char, because that is the same size as a byte (on this platform).
The use of volatile is perhaps a cautious step to ensure that no compiler optimisations (either that exist now or may be added in the future) will change the precise meaning of the cast.
Update:
If we look at the macro definition:
(size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
With the cast-to-char removed it would be:
(size_t)&((((s *)0)->m))
In other words, get the address of member m in an object at address zero, which does look okay at first glance. So there must be some way that this would potentially cause a problem.
One thing that springs to mind is that the operator & may be overloaded on whatever type m happens to be. If so, this macro would be executing arbitrary code on an "artificial" object that is somewhere quite close to address zero. This would probably cause an access violation.
This kind of abuse may be outside the applicability of offsetof, which is supposed to only be used with POD types. Perhaps the idea is that it is better to return a junk value instead of crashing.
(Update 2: As Steve pointed out in the comments, there would be no similar problem with operator ->)
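The overloaded operator& theory is easy to try out on a real object (not on a null pointer, so nothing here relies on undefined behaviour). Sneaky and realAddress are names made up for this sketch. std::addressof is the standard library facility for the same job.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose unary & lies about the address.
struct Sneaky {
    int payload = 42;
    Sneaky* operator&() { return nullptr; }
};

// Same trick as the macro: view the object as a char, for which
// operator& cannot be overloaded, and take the address of that.
inline const volatile char* realAddress(Sneaky& s) {
    return &reinterpret_cast<const volatile char&>(s);
}
```

Here &s calls the overload and yields a null pointer, while realAddress(s) agrees with std::addressof(s).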
offsetof is something to be very careful with in C++. It's a relic from C. These days we are supposed to use member pointers. That said, I believe that member pointers to data members are overdesigned and broken - I actually prefer offsetof.
Even so, offsetof is full of nasty surprises.
First, for your specific questions, I suspect the real issue is that they've adapted the macro from the traditional C version (which I thought was mandated in the C++ standard). They probably use reinterpret_cast for "it's C++!" reasons (so why the (size_t) cast?), and a char& rather than a char* to try to simplify the expression a little.
Casting to char looks redundant in this form, but probably isn't. (size_t) is not equivalent to reinterpret_cast, and if you try to cast pointers to other types into integers, you run into problems. I don't think the compiler even allows it, but to be honest, I'm suffering memory failure ATM.
The fact that char is a single byte type has some relevance in the traditional form, but that may only be why the cast is correct again. To be honest, I seem to remember casting to void*, then char*.
Incidentally, having gone to the trouble of using C++-specific stuff, they really should be using std::ptrdiff_t for the final cast.
Anyway, coming back to the nasty surprises...
VC++ and GCC probably won't use that macro. IIRC, they have a compiler intrinsic, depending on options.
The reason is to do what offsetof is intended to do, rather than what the macro does, which is reliable in C but not in C++. To understand this, consider what would happen if your struct uses multiple or virtual inheritance. In the macro, when you dereference a null pointer, you end up trying to access a virtual table pointer that isn't there at address zero, meaning that your app probably crashes.
For this reason, some compilers have an intrinsic that just uses the specified struct's layout instead of trying to deduce a run-time type. But the C++ standard doesn't mandate or even suggest this - it's only there for C compatibility reasons. And you still have to be careful if you're working with class hierarchies, because as soon as you use multiple or virtual inheritance, you cannot assume that the layout of the derived class matches the layout of the base class - you have to ensure that the offset is valid for the exact run-time type, not just a particular base.
If you're working on a data structure library, maybe using single inheritance for nodes, but apps cannot see or use your nodes directly, offsetof works well. But strictly speaking, even then, there's a gotcha. If your data structure is in a template, the nodes may have fields with types from template parameters (the contained data type). If that isn't POD, technically your structs aren't POD either. And all the standard demands for offsetof is that it works for POD. In practice, it will work - your type hasn't gained a virtual table or anything just because it has a non-POD member - but you have no guarantees.
If you know the exact run-time type when you dereference using a field offset, you should be OK even with multiple and virtual inheritance, but ONLY if the compiler provides an intrinsic implementation of offsetof to derive that offset in the first place. My advice - don't do it.
Why use inheritance in a data structure library? Well, how about...
class node_base { ... };
class leaf_node : public node_base { ... };
class branch_node : public node_base { ... };
The fields in the node_base are automatically shared (with identical layout) in both the leaf and branch, avoiding a common error in C with accidentally different node layouts.
BTW - offsetof is avoidable with this kind of stuff. Even if you are using offsetof for some jobs, node_base can still have virtual methods and therefore a virtual table, so long as it isn't needed to dereference member variables. Therefore, node_base can have pure virtual getters, setters and other methods. Normally, that's exactly what you should do. Using offsetof (or member pointers) is a complication, and should only be used as an optimisation if you know you need it. If your data structure is in a disk file, for instance, you definitely don't need it - a few virtual call overheads will be insignificant compared with the disk access overheads, so any optimisation efforts should go into minimising disk accesses.
Hmmm - went off on a bit of a tangent there. Whoops.
char is guaranteed to be the smallest addressable unit the architecture can "bite" (aka byte).
All pointers are actually numbers, so cast address 0 to that type because it's the beginning.
Take the address of member starting from 0 (resulting into 0 + location_of_m).
Cast that back to size_t.
1) I also do not know why it is done in this way.
2) The char type is special in two ways.
No other type has weaker alignment restrictions than the char type. This is important for reinterpret cast between pointers and between expression and reference.
It is also the only type (together with its unsigned variant) for which the specification defines behavior in case the char is used to access stored value of variables of different type. I do not know if this applies to this specific situation.
3) I think that the volatile modifier is used to ensure that no compiler optimization will result in attempt to read the memory.
2 . What's the reason for choosing char reference (char&) as the cast target?
If the type of member m overloads operator&, then applying & to the member directly calls that overload instead of returning the real address.
So the macro reinterpret_casts the member to a reference to the primitive type char, because operator& cannot be overloaded for char.
Taking the address of that char reference then gives the real address.
In C, reinterpret_cast is not required because operators cannot be overloaded.
3 . Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?
Here volatile is not about compiler optimization.
If the member m is const, volatile, or both, then
reinterpret_cast cannot cast it to char&, because reinterpret_cast cannot remove cv-qualifiers.
So the macro casts to <const volatile char&>, which works for any combination of qualifiers.
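A quick way to check the cv-qualifier explanation is the sketch below. It measures offsets on a real object, not through a null pointer, so it is not the Visual C++ macro itself. Sample, byteAddress and offsetWithin are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>

struct Sample {
    int plain;
    const int fixed;
    volatile int hardware;
};

// Byte address of any object, whatever its cv-qualification.
template <typename T>
const volatile char* byteAddress(T& object) {
    // reinterpret_cast<char&>(object) would fail to compile when T is const
    // or volatile, because it would cast the qualifier away.
    return &reinterpret_cast<const volatile char&>(object);
}

// Offset of a member, measured on a real object rather than a null pointer.
template <typename T>
std::size_t offsetWithin(const Sample& s, T& member) {
    return static_cast<std::size_t>(byteAddress(member) - byteAddress(s));
}
```

Because the target type already carries both qualifiers, the same cast compiles for the plain, const and volatile members alike.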
