What is the difference between the following code, and why is this type casting used?

I am writing the following sizeof macro, and I want to know the difference.
#define my_sizeof(type) (char*)(&type+1)-(char*)(&type)
#define my_sizeof(type) (void*)(&type+1)-(void*)(&type)
My first question is why the type casting is required. I know that if I don't cast, it always returns 1; I checked this by running it. I want to know the significance (I mean, what it tells the compiler to do).
Secondly, what difference does it make to use char* rather than void*?
Thanks a lot.

For the second question: You cannot do that at all, since there is no pointer arithmetic for void pointers (or for pointers to incomplete types in general, for that matter).
For the first part: By definition, sizeof(char) == 1, so by casting the pointers to char pointers, you obtain the difference in units of 1 rather than in units of sizeof(type) -- in other words, you obtain precisely the value of sizeof(type).
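For illustration, here is a lightly parenthesized variant of the char* version in use (a sketch, not the original macro verbatim; Point is just a hypothetical type to measure):
#include <cstddef>
#include <cstdio>
#define my_sizeof(x) ((std::size_t)((char*)(&(x) + 1) - (char*)(&(x))))
struct Point { double x, y; };   // hypothetical type to measure
int main() {
    Point p;
    int i = 0;
    // The char* casts make the pointer difference count in bytes,
    // so the result matches sizeof.
    std::printf("%zu %zu\n", my_sizeof(p), sizeof p);   // e.g. 16 16
    std::printf("%zu %zu\n", my_sizeof(i), sizeof i);   // e.g. 4 4
    return 0;
}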

Related

Extra [ ] Operator Added When Using 2 overloaded [ ] Operators

As a side project I'm writing a couple of classes to do matrix operations and operations on linear systems. The LinearSystem class holds pointers to Matrix class objects in a std::map. The Matrix class itself holds the 2D array ("matrix") in a float** (a double pointer to float). So I wrote 2 overloaded [ ] operators: one to return a pointer to a matrix object directly from the LinearSystem object, and another to return a row (float *) from a matrix object.
Both of these operators work perfectly on their own. I'm able to use LinearSystem["keyString"] to get a matrix object pointer from the map. And I'm able to use Matrix[row] to get a row (float *) and Matrix[row][col] to get a specific float element from a matrix object's 2D array.
The trouble comes when I put them together. My limited understanding (rising senior CompSci major) tells me that I should have no problem using LinearSystem["keyString"][row][col] to get an element from a specific array within the Linear system object. The return types should look like LinearSystem->Matrix->float *->float. But for some reason it only works when I place an extra [0] after the overloaded key operator so the call looks like this: LinearSystem["keyString"][0][row][col]. And it HAS to be 0, anything else and I segfault.
Another interesting thing to note is that CLion sees ["keyString"] as overloaded and [row] as overloaded, but not [0], as if it's calling the standard index operator, but on what is the question that has me puzzled. LinearSystem["keyString"] is for sure returning Matrix *, which only has an overloaded [ ] operator. See the attached screenshot.
Here's the code, let me know if more is needed.
LinearSystem [ ] and map declaration:
Matrix *myNameSpace::LinearSystem::operator[](const std::string &name) {
    return matrices[name];
}
std::map<std::string, Matrix *> matrices;
Matrix [ ] and array declaration:
inline float *myNameSpace::Matrix::operator[](const int row) {
    return elements[row];
}
float **elements;
Note: the above function is inlined because I'm challenging myself to make the code as fast as possible; even with compiler optimizations, the overloaded [ ] was 15% to 30% slower than using Matrix.elements[row] directly.
Please let me know if any more info is needed; this is my first post, so I'm sure it's not perfect.
Thank you!
You're writing C in C++. You need to not do that. It adds complexity. Those stars you keep putting after your types. Those are raw pointers, and they should almost always be avoided. linearSystem["foo"] is a Matrix*, i.e. a pointer to Matrix. It is not a Matrix. And pointers index as arrays, so linearSystem["foo"][0] gets the first element of the Matrix*, treating it as an array. Since there's actually only one element, this works out and seems to do what you want. If that sounds confusing, that's because it is.
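Here is a minimal sketch of what the current pointer-returning operators actually do (simplified stand-ins for the question's classes, not the original code):
#include <cstdio>
#include <map>
#include <string>
// Simplified stand-ins for the question's classes.
struct Matrix {
    float** elements;
    float* operator[](int row) { return elements[row]; }
};
struct LinearSystem {
    std::map<std::string, Matrix*> matrices;
    Matrix* operator[](const std::string& name) { return matrices[name]; }
};
int main() {
    float cell = 42.0f;
    float* row0 = &cell;
    Matrix m{&row0};
    LinearSystem ls;
    ls.matrices["A"] = &m;
    // ls["A"] is a Matrix* (a pointer), so the next [0] is the built-in
    // subscript applied to that pointer: *(ls["A"] + 0). Only index 0 is valid.
    std::printf("%f\n", ls["A"][0][0][0]);
    // Equivalent, without the extra [0]: dereference the pointer explicitly.
    std::printf("%f\n", (*ls["A"])[0][0]);
    return 0;
}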
Make sure you understand ownership semantics. Raw pointers are frowned upon because they don't convey any information. If you want a value to be owned by the data structure, it should be a T directly (or a std::unique_ptr<T> if you need virtual dispatch), not a T*. If you're taking a function argument or returning a value that you don't want to pass ownership of, you should take/return a const T&, or T& if you need to modify the referent.
I'm not sure exactly what your data looks like, but a reasonable representation of a Matrix class (if you're going for speed) is as an instance variable of type std::vector<double>. std::vector is a managed array of double. You can preallocate the array to whatever size you want and change it whenever you want. It also plays nice with Rule of Three/Five.
I'm not sure what LinearSystem is meant to represent, but I'm going to assume the matrices are meant to be owned by the linear system (i.e. when the system is freed, the matrices should have their memory freed as well), so I'm looking at something like
std::map<std::string, Matrix> matrices;
Again, you can wrap Matrix in a std::unique_ptr if you plan to inherit from it and do dynamic dispatch. Though I might question your design choices if your Matrix class is intended to be subclassed.
There's no reason for Matrix::operator[] to return a raw pointer to float (in fact, "raw pointer to float" is a pretty pointless type to begin with). I might suggest having two overloads.
float myNameSpace::Matrix::operator[](int row) const;
float& myNameSpace::Matrix::operator[](int row);
Likewise, LinearSystem::operator[] can have two overloads: one constant and one mutable.
const Matrix& myNameSpace::LinearSystem::operator[](const std::string& name) const;
Matrix& myNameSpace::LinearSystem::operator[](const std::string& name);
References (T& as opposed to T*) are smart and will effectively dereference when needed, so you can call Matrix::operator[] on a Matrix&, whereas you can't call that on a Matrix* without acknowledging the layer of indirection.
There's a lot of bad C++ advice out there. If the book / video / teacher you're learning from is telling you to allocate float** everywhere, then it's a bad book / video / teacher. C++-managed data is going to be far less error-prone and will perform comparably to raw pointers (the C++ compiler is smarter than either you or I when it comes to optimization, so let it do its thing).
If you do find yourself really feeling the need to go low-level and allocate raw pointers everywhere, then switch to C. C is a language designed for low-level pointer manipulation. C++ is a higher-level managed-memory language; it just so happens that, for historical reasons, C++ has several hundred footguns in the form of C-style allocations placed sporadically throughout the standard.
In summary: Modern C++ almost never uses T*, new, or delete. Start getting comfortable with smart pointers (std::unique_ptr and std::shared_ptr) and references. You'll thank yourself later.
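A rough sketch of the value-based layout described above (the flat std::vector storage and the at(row, col) accessor are choices made for this sketch, not the question's original interface):
#include <cstddef>
#include <map>
#include <string>
#include <vector>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0f) {}
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    // Mutable and const element access; a flat vector keeps the data contiguous.
    float& at(std::size_t r, std::size_t c)       { return data_[r * cols_ + c]; }
    float  at(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }
private:
    std::size_t rows_, cols_;
    std::vector<float> data_;
};
class LinearSystem {
public:
    Matrix&       operator[](const std::string& name)       { return matrices_.at(name); }
    const Matrix& operator[](const std::string& name) const { return matrices_.at(name); }
    void add(const std::string& name, Matrix m) { matrices_.emplace(name, std::move(m)); }
private:
    std::map<std::string, Matrix> matrices_;   // values, not raw pointers
};
// Usage: system["A"].at(row, col) works directly, with no extra [0] and nothing to delete by hand.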

Are There Any Hidden Costs to Passing Around a Struct With a Single Reference?

I was recently reading this article on structs and classes in D, and at one point the author comments that
...this is a perfect candidate for a struct. The reason is that it contains only one member, a pointer to an ALLEGRO_CONFIG. This means I can pass it around by value without care, as it's only the size of a pointer.
This got me thinking; is that really the case? I can think of a few situations in which believing you're passing a struct around "for free" could have some hidden gotchas.
Consider the following code:
struct S
{
    int* pointer;
}

void doStuff(S ptrStruct)
{
    // Some code here
}

int n = 123;
auto s = S(&n);
doStuff(s);
When s is passed to doStuff(), is a single pointer (wrapped in a struct) really all that's being passed to the function? Off the top of my head, it seems that any pointers to member functions would also be passed, as well as the struct's type information.
This wouldn't be an issue with classes, of course, since they're always reference types, but a struct's pass by value semantics suggests to me that any extra "hidden" data such as described above would be passed to the function along with the struct's pointer to int. This could lead to a programmer thinking that they're passing around an (assuming a 64-bit machine) 8-byte pointer, when they're actually passing around an 8-byte pointer, plus several other 8-byte pointers to functions, plus however many bytes an object's typeinfo is. The unwary programmer is then allocating far more data on the stack than was intended.
Am I chasing shadows here, or is this a valid concern when passing a struct with a single reference, and thinking that you're getting a struct that is a pseudo reference type? Is there some mechanism in D that prevents this from being the case?
I think this question can be generalized to wrapping native types. E.g. you could make a SafeInt type which wraps and acts like an int, but throws on any integer overflow conditions.
There are two issues here:
Compilers may not optimize your code as well as with a native type.
For example, if you're wrapping an int, you'll likely implement overloaded arithmetic operators. A sufficiently smart compiler will inline those methods, and the resulting code will be no different from what you'd get with a plain int. In your example, a dumb compiler might compile the dereference in some clumsy way (e.g. get the address of the struct's start, add the offset of the pointer field (which is 0), then dereference that).
Additionally, when calling a function, the compiler may decide to pass the struct in some other way (due to e.g. poor optimization, or an ABI restriction). This could happen e.g. if the compiler doesn't pay attention to the struct's size, and treats all structs in the same way.
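As a rough illustration of the wrapping idea, here is a minimal C++ sketch of a SafeInt-style type (a made-up, simplified version whose overflow check only covers positive operands):
#include <limits>
#include <stdexcept>
// Hypothetical SafeInt-style wrapper; the overflow check covers positive operands only.
struct SafeInt {
    int value;
    SafeInt operator+(SafeInt rhs) const {
        if (rhs.value > 0 && value > std::numeric_limits<int>::max() - rhs.value)
            throw std::overflow_error("SafeInt addition overflow");
        return SafeInt{value + rhs.value};
    }
};
int main() {
    SafeInt a{2}, b{40};
    SafeInt c = a + b;   // a good optimizer inlines this down to an add plus a branch
    return c.value == 42 ? 0 : 1;
}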
struct types in D may indeed have a hidden member, if you declare it in a function.
For example, the following code works:
import std.stdio;

void main()
{
    string str = "I am on the stack of main()";

    struct S
    {
        string toString() const { return str; }
    }

    S s;
    writeln(s);
}
It works because S saves a hidden pointer to main()'s stack frame. You can force a struct to not have any hidden pointers by prefixing static to the declaration (e.g. static struct S).
There is no hidden data being passed. A struct consists exactly of what's declared in it (and any padding bytes if necessary), nothing else. There is no need to pass type information and member function information along because it's all static. Since a struct cannot inherit from another struct, there is no polymorphism.
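A quick way to convince yourself, sketched here in C++ (the same holds for D structs that aren't nested inside a function, as the previous answer notes): the size of the value is just the declared members plus padding.
#include <cstddef>
struct S {
    int* pointer;   // the only declared member
};
// No hidden type information or member-function pointers travel with the value.
static_assert(sizeof(S) == sizeof(int*), "S is exactly one pointer wide");
int main() { return 0; }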

Why does the compiler not accept this generic function?

public static T clipGN<T>(this T v, T lo, T hi) where T : decimal,double
{ return Math.Max(lo, Math.Min(hi, v)); }
gives for the second line:
Argument 1: cannot convert from 'T' to 'decimal'
Why? I thought both types meeting that T constraint can be converted to decimal.
BTW, an acceptable alternative coding can be found in the answer here:
How should one best recode this example extension method to be generic for all numeric types?
I tried compiling that code myself and received the following error:
'double' is not a valid constraint. A type used as a constraint must
be an interface, a non-sealed class or a type parameter. (CS0701)
Same for decimal. This suggests that neither decimal nor double is allowed to constrain the type parameter T, since the only type that could meet such a constraint is that type itself (it would be no different from making a non-generic overload, replacing T with either decimal or double). Even if, individually, they were allowed to constrain T (which they are not), the combined constraint should still not be allowed, since no type can simultaneously be a decimal and a double.
This is unlike if the constraint had read where T : IComparable<T>, where both types, as well as other types, can meet that constraint.
You don't need generics for this. While the concept of "DRY" makes the idea of coding a single function that works for all types appealing, this is a case where you're better off having discrete functions for each numeric type. All of the numeric types are known, and the list is not overly large; there are likely numeric types that you aren't actually going to use, anyway. If you really (for whatever reason) want a single function, then your only real option is the IComparable option that you linked to, which has the unfortunate (and unnecessary) consequence of causing boxing on the numeric parameters.
That being said, your problem is that you cannot have T : decimal, double, as that means that T must be both decimal and double (which is impossible), not that it can be either one.
In addition, since this is all that this function does, I'd probably not call the Math.Max and Math.Min functions anyway. It's probably just as simple, if not slightly clearer, to write the functions this way:
public static decimal ClipGN(this decimal v, decimal lo, decimal hi)
{
    return v <= lo ? lo : v >= hi ? hi : v;
}
And you should be able to duplicate this code verbatim (apart from the return and parameter types, of course) for each of the numeric types.

Does D have 'newtype'?

Does D have 'newtype' (as in Haskell)?
It's a naive question, as I'm just skimming D, but Google didn't turn up anything useful.
In Haskell this is a way of making different types of the same thing distinct at compile time, but without incurring any runtime performance penalties.
e.g. you could make newtypes (doubles) for metres, seconds and kilograms. This would error at compile time if your program added a quantity in metres to a quantity in seconds, but would be just as fast at runtime as if both were doubles (which they are at runtime).
If D doesn't have something analogous to 'newtype', what are the accepted methods for dealing with dimensioned quantities?
Thanks,
Chris.
In D 1.0 there is typedef, which creates a strong type (a 'newtype') from a predefined type.
D 2.0 has removed this and only alias remains (which is what typedef is in C). There is talk of having a wrapper template that can create a strong new type.
The issue with typedef was that there were good arguments for making the newtype a sub-type of the predefined type, and also good arguments for making it a super-type.
The semantics of typedef are that the base type is implicitly converted to the newtype, but the newtype is not converted to the base type or other types with the same base type. I am using base type here since:
typedef int Fish;
typedef Fish Cat;
Fish gold = 1;
Cat fluff = gold;
Will fail to compile.
And as of right now, DMD 2.048 still allows the use of typedef (but don't use it).
Having the base type convert to the newtype is useful so you don't have to write
meters = cast(meters) 12.7;
Funny, as he_the_great mentions, D1 had a strong typedef but no one used it, possibly because it was impossible to customize the exact semantics for each case. Possibly the simplest way to handle this situation, at least for primitive types, is to include a mixin template somewhere in Phobos that forwards all operators, with the boilerplate generated automatically via the mixin. Then you'd just create a wrapper struct and be all set.

What's going on in the 'offsetof' macro?

Visual C++ 2008's C runtime offers an operator 'offsetof', which is actually a macro defined like this:
#define offsetof(s,m) (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
This allows you to calculate the offset of the member variable m within the class s.
What I don't understand in this declaration is:
Why are we casting m to anything at all before taking its address? Wouldn't this have worked just as well:
&(((s*)0)->m)
?
What's the reason for choosing char reference (char&) as the cast target?
Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?
An offset is in bytes. So to get a number expressed in bytes, you have to cast the addresses to char, because that is the same size as a byte (on this platform).
The use of volatile is perhaps a cautious step to ensure that no compiler optimisations (either that exist now or may be added in the future) will change the precise meaning of the cast.
Update:
If we look at the macro definition:
(size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
With the cast-to-char removed it would be:
(size_t)&((((s *)0)->m))
In other words, get the address of member m in an object at address zero, which does look okay at first glance. So there must be some way that this would potentially cause a problem.
One thing that springs to mind is that the operator & may be overloaded on whatever type m happens to be. If so, this macro would be executing arbitrary code on an "artificial" object that is somewhere quite close to address zero. This would probably cause an access violation.
This kind of abuse may be outside the applicability of offsetof, which is supposed to only be used with POD types. Perhaps the idea is that it is better to return a junk value instead of crashing.
(Update 2: As Steve pointed out in the comments, there would be no similar problem with operator ->)
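To make the operator& concern concrete, here is a small sketch (Evil and Record are made-up names). Casting the member to a char reference first sidesteps the user-defined operator&, which is the trick the CRT macro relies on:
#include <cstdio>
// Hypothetical member type that hijacks the address-of operator.
struct Evil {
    int payload;
    Evil* operator&() { std::puts("user-defined operator& called"); return this; }
};
struct Record {
    char tag;
    Evil e;
};
int main() {
    Record r{};
    Evil* a = &r.e;   // invokes Evil::operator&, printing the message above
    // Casting the member to a char reference first bypasses the overload,
    // so the built-in address-of is used instead.
    const volatile char* b = &reinterpret_cast<const volatile char&>(r.e);
    (void)a;
    (void)b;
    return 0;
}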
offsetof is something to be very careful with in C++. It's a relic from C. These days we are supposed to use member pointers. That said, I believe that member pointers to data members are overdesigned and broken - I actually prefer offsetof.
Even so, offsetof is full of nasty surprises.
First, for your specific questions, I suspect the real issue is that they've adapted the traditional C macro (which I thought was mandated in the C++ standard). They probably use reinterpret_cast for "it's C++!" reasons (so why the (size_t) cast?), and a char& rather than a char* to try to simplify the expression a little.
Casting to char looks redundant in this form, but probably isn't. (size_t) is not equivalent to reinterpret_cast, and if you try to cast pointers to other types into integers, you run into problems. I don't think the compiler even allows it, but to be honest, I'm suffering memory failure ATM.
The fact that char is a single byte type has some relevance in the traditional form, but that may only be why the cast is correct again. To be honest, I seem to remember casting to void*, then char*.
Incidentally, having gone to the trouble of using C++-specific stuff, they really should be using std::ptrdiff_t for the final cast.
Anyway, coming back to the nasty surprises...
VC++ and GCC probably won't use that macro. IIRC, they have a compiler intrinsic, depending on options.
The reason is to do what offsetof is intended to do, rather than what the macro does, which is reliable in C but not in C++. To understand this, consider what would happen if your struct uses multiple or virtual inheritance. In the macro, when you dereference a null pointer, you end up trying to access a virtual table pointer that isn't there at address zero, meaning that your app probably crashes.
For this reason, some compilers have an intrinsic that just uses the specified struct's layout instead of trying to deduce a run-time type. But the C++ standard doesn't mandate or even suggest this - it's only there for C compatibility reasons. And you still have to be careful if you're working with class hierarchies, because as soon as you use multiple or virtual inheritance, you cannot assume that the layout of the derived class matches the layout of the base class - you have to ensure that the offset is valid for the exact run-time type, not just a particular base.
If you're working on a data structure library, maybe using single inheritance for nodes, but apps cannot see or use your nodes directly, offsetof works well. But strictly speaking, even then, there's a gotcha. If your data structure is in a template, the nodes may have fields with types from template parameters (the contained data type). If that isn't POD, technically your structs aren't POD either. And all the standard demands for offsetof is that it works for POD. In practice, it will work - your type hasn't gained a virtual table or anything just because it has a non-POD member - but you have no guarantees.
If you know the exact run-time type when you dereference using a field offset, you should be OK even with multiple and virtual inheritance, but ONLY if the compiler provides an intrinsic implementation of offsetof to derive that offset in the first place. My advice - don't do it.
Why use inheritance in a data structure library? Well, how about...
class node_base { ... };
class leaf_node : public node_base { ... };
class branch_node : public node_base { ... };
The fields in the node_base are automatically shared (with identical layout) in both the leaf and branch, avoiding a common error in C with accidentally different node layouts.
BTW - offsetof is avoidable with this kind of stuff. Even if you are using offsetof for some jobs, node_base can still have virtual methods and therefore a virtual table, so long as it isn't needed to dereference member variables. Therefore, node_base can have pure virtual getters, setters and other methods. Normally, that's exactly what you should do. Using offsetof (or member pointers) is a complication, and should only be used as an optimisation if you know you need it. If your data structure is in a disk file, for instance, you definitely don't need it - a few virtual call overheads will be insignificant compared with the disk access overheads, so any optimisation efforts should go into minimising disk accesses.
Hmmm - went off on a bit of a tangent there. Whoops.
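For readers who haven't used them, the pointer-to-member alternative mentioned at the top looks roughly like this (Node and its fields are made up for illustration):
#include <cstdio>
// Made-up node type for illustration.
struct Node {
    int   key;
    float weight;
};
int main() {
    // A pointer to the data member itself, independent of any particular object.
    float Node::*field = &Node::weight;
    Node n{7, 2.5f};
    std::printf("%f\n", n.*field);   // read through the member pointer with .*
    Node* p = &n;
    p->*field = 3.5f;                // write through ->* on a pointer to the object
    std::printf("%f\n", n.weight);
    return 0;
}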
char is guaranteed to be the smallest number of bits the architecture can "bite" off (aka a byte).
All pointers are actually numbers, so cast address 0 to that type because it's the beginning.
Take the address of the member starting from 0 (resulting in 0 + location_of_m).
Cast that back to size_t.
1) I also do not know why it is done in this way.
2) The char type is special in two ways.
No other type has weaker alignment restrictions than the char type. This is important for reinterpret_cast between pointers, and between an expression and a reference.
It is also the only type (together with its unsigned variant) for which the specification defines the behavior when it is used to access the stored value of a variable of a different type. I do not know if this applies to this specific situation.
3) I think that the volatile modifier is used to ensure that no compiler optimization will result in an attempt to read the memory.
2. What's the reason for choosing char reference (char&) as the cast target?
If the member's type has an overloaded operator&, then we can't take its address with a plain &, so we reinterpret_cast it to the primitive type char, which cannot have an overloaded operator&; now we can take the address of that. In C the reinterpret_cast is not required.
3. Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?
Here volatile is not about compiler optimization. If the member is const- or volatile-qualified (or both), reinterpret_cast can't cast it to a plain char&, because reinterpret_cast can't remove cv-qualifiers; so <const volatile char&> is used so that the cast works for any combination of qualifiers.
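A minimal sketch of that last point (Sample and its member are made up for illustration):
struct Sample {
    const volatile int m = 0;   // hypothetical cv-qualified member
};
int main() {
    Sample s;
    // reinterpret_cast<char&>(s.m);   // ill-formed: it would cast away const/volatile
    const volatile char& ok = reinterpret_cast<const volatile char&>(s.m);   // works for any cv combination
    return &ok == reinterpret_cast<const volatile char*>(&s.m) ? 0 : 1;
}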

Resources