Temporary materialization conversion - Confusion about terminology and concepts

Temporary materialization conversion - Confusion about terminology and concepts - temporary-objects

Hi stackoverflow community,
I'm a few months into C++ and recently I've been trying to grasp the concepts revolving around the "new" value categories, move semantics, and especially temporary materialization.
First of all, it's not straightforward to me how to interpret the term "temporary materialization conversion". The conversion part is clear to me (prvalue -> xvalue). But how exactly is a "temporary" defined in this context? I used to think that temporaries were unnamed objects that only exist - from a language point of view - until the last step in the evaluation of the expression they were created in. But this conception doesn't seem to match what temporaries actually seem to be in the broader context of temporary materialization, the new value categories, etc.
The lack of clarity about the term "temporary" results in me not being able to tell if a "temporary materialization" is a temporary that gets materialized or a materialization that is temporary. I assume it's the former, but I'm not sure. Also: Is the term temporary only used for class types?
This directly brings me to the next point of confusion: What roles do prvalues and xvalues play regarding temporaries? Suppose I have an rvalue expression that needs to be evaluated in such a way that it has to be converted into an xvalue, e.g. by performing member access.
What will exactly happen? Is the the prvalue something that is actually existent (in memory or elsewhere) and is the prvalue already the temporary? Now, the "temporary materialization conversion" described as "A prvalue of any complete type T can be converted to an xvalue of the same type T. This conversion initializes a temporary object of type T from the prvalue by evaluating the prvalue with the temporary object as its result object, and produces an xvalue denoting the temporary object" at cppreference.com (https://en.cppreference.com/w/cpp/language/implicit_conversion) converts the prvalue into an xvalue. This extract makes me think that a prvalue is something that is not existent anywhere in memory or a register up until it gets "materialized" by such conversion. (Also, I'm not sure if a temporary object is the same as a temporary.) So, as far as I understand, this conversion is done by the evaluation of the prvalue expression which has "real" object as a result. This object is then REPRESENTED (= denoted?) by the xvalue expression. What happens in memory? Where has the rvalue been, where is the xvalue now?
My next question is more specific question about a certain part of temporary materialization. In the talk "Understanding value categories in C++" by Kris van Rens on YouTube (https://www.youtube.com/watch?v=liAnuOfc66o&t=3576s) at ~56:30 he shows this slide:
Based on what cppreference.com says about temporary materialization numbers 1 and 2 are clear cases (1: member access on a class pravlue, 2: binding a reference to a prvalue (as in the std::string +operator).
I'm not too sure about number 3, though. Cppreference says: "Note that temporary materialization does not occur when initializing an object from a prvalue of the same type (by direct-initialization or copy-initialization): such object is initialized directly from the initializer. This ensures "guaranteed copy elision"."
The +operator returns a prvalue. Now, this prvalue of type std::string is used to initialize an auto (which should resolve to std::string as well) variable. This sounds like the case that is discussed in the prior cppreference excerpt. So does temporary materialization really occur here? And what happens to the objects (1 and 2) that were "denoted" by the xvalue expressions in between? When do they get destroyed? And if the +operator is returning an prvalue, does it even "exist" somewhere? And how is the object auto x " initialized directly from the initializer" (the prvalue) if the prvalue is not even a real (materialized?) object?
In the talk "Nothing is better than copy or move - Roger Orr [ACCU 2018]" on YouTube (https://www.youtube.com/watch?v=-dc5vqt2tgA&t=2557s) at ~ 40:00 there is nearly the same example:
This slide even says that temporary materialization occurs when initializing a variable which clearly contradicts the exception from cppference from above. So what's true?
As you can see, I'm pretty confused about this whole topic. For me, it's especially hard to grasp these concepts as I cannot find any clear definitions of various term that are used in a uniform way online. I'd appreciate any help a lot!
Best regards,
Ruperrrt
TL;DR: What is a temporary in the temporary materialization conversion context? Does that mean that a temporary gets materialized or that it is materialization that is temporary? Also temporary = temporary object?
In the slides, is 3 (first slide) respectively 1 (second slide) really a point where temporary materialization occurs (conflicts with what cppreference says about initialization from pravlues of the same type)?

107 views, 6 months and no answer nor comments. Interesting. Here's my take on your question.
Temporary materialization should mean "temporary that gets materialized". I don't even know what "materialization that is temporary" would even mean to be honest.
The term temporary is not only used for class types.
Prvalues, loosely speaking, don't exist in memory unlike xvalues. The thing you should care about is the context. Let's say you have defined a structure
struct S { int m; };.
In the expression S x = S();, subexpression S() denotes a prvalue. The compiler with treat it just as if you have written S x{}; (note that I've put curly brackets on purpose because S x(); is actually a declaration of a function). On the other hand in expression like int i = S().m;, subexpression S() is a prvalue that will be converted to xvalue that is, S() will denote something that will exist in the memory.
Regarding your second question, the thing you need to know about is that with the C++-17, the circumstances in which temporaries are going to be created were brought down to minimum (cppreference describes it very well). However, the expression
auto x = std::string("Guaca") + std::string("mole").c_str();
will require two temporary objects to be created before assignment. Firstly, you are doing member access with c_str() method so a temporary std::string will be created. Secondly, the operator + will bind one a reference to the std::string("Guaca") (new temporary). and one to the result object of c_str(), but without creating additional temporary because of:. That's pretty much it. It's worth to note that the order of creation temporary objects isn't known - it's totally implementation-defined.
After that, we're calling the operator + which probably constructs another std::string which technically isn't a temporary object because that's a part of the implementation. That object might or might not be constructed into the memory location of x depending on the NRVO. In any case, whatever value does the prvalue expression std::string("Guaca") + std::string("mole").c_str() denote will be the same value (of the same object) denoted by the expression x because of cpp.ref:
Note that temporary materialization does not occur when initializing an object from a prvalue of the same type (by direct-initialization or copy-initialization): such object is initialized directly from the initializer. This ensures "guaranteed copy elision".
This quote isn't really precise and might confuse you so I also suggest reading copy elision (the first part about mandatory elision).
I'm not a C++ expert so take all of this with a grain of salt.

Related

How do Rust's ownership semantics relate to uniqueness typing as found in Clean and Mercury?

I noticed that in Rust moving is applied to lvalues, and it's statically enforced that moved-from objects are not used.
How do these semantics relate to uniqueness typing as found in Clean and Mercury? Are they the same concept? If not, how do they differ?

The concept of ownership in Rust is not the same as uniqueness in Mercury and Clean, although they are related in that they both aim to provide safety via static checking, and they are both defined in terms of the number of references within a scope. The key differences are:
Uniqueness is a more abstract concept. While it can be interpreted as saying that a reference to a memory location is unique, like Rust's lvalues, it can also apply to abstract values such as the state of every object in the universe, to give an extreme but typical example. There is no pointer corresponding to such a value - it cannot be opened up and inspected within a debugger or anything like that - but it can be used through an interface just like any other abstract type. The aim is to give a value-oriented semantics that remains consistent in the presence of statefulness.
In Mercury, at least (I can't speak for Clean), uniqueness is a more limited concept than ownership, in that there must be exactly one reference. You can't share several copies of a reference on the proviso that they will not be written to, as can be done in Rust. You also can't lend a reference for writing but get it back later after the borrower has finished with it.
Declaring something unique in Mercury does not guarantee that writing to references will occur, just that the compiler will check that it would be safe to do so; it is still valid for an implementation to copy the contents of a unique reference rather than update in place. The compiler will arrange for the update in place if it deems it appropriate at its given optimization level. Alternatively, authors of abstract types may perform similar (or sometimes drastically better) optimizations manually, safe in the knowledge that users will be forced to use the abstract type in a way that is consistent with them. Ownership in Rust, on the other hand, is more directly connected to the memory model and gives stronger guarantees about behaviour.

Compiler: How to implement Reference Counting (in a simple VM)

Ive written a very simple Compiler that translates my source language to bytecode, this code gets processed by the VM (as a simple stack machine, so 3 + 3 will get translated into
push 3
push 3
add
right now I struggle at the garbage collection (I want to use reference counting).
I know the basic concept of it, if a reference gets assigned, the reference counter of that object is incremented, and if it leaves scope, it gets decremented, but the thing thats not clear to me is how the GC can free objects that get passed to functions...
here some more concrete examples of what i mean
string a = "im a string" //ok, assignment, refcount + 1 at declare time and - 1 when it leaves scope
print(new Object()) //how is a parameter solved? is the reference incremented before calling the function?
string b = "a" + "b" + "c" //dont know how to solve this, because 2 strings get pushed, then concanated, then the last gets pushed and concanated again, but should the push operation increase the ref count too or what, and where to decrease them then?
I would be glad if anyone could give me links to tutorials for implementing reference counting or help me with this very specific problem if someone had this problem before (my problem is that i dont understand when to inc, dec the references or where the count is stored)

I think a couple of things can happen with literals. You can treat them like literal numbers, and they are constants and there forever, or you can have an implicit variable that has retrain count of 1 before print, and releases it after.
In response to your edit:
You can use the implicit variable solution, or you can use the "autorelease" concept from Objective-C. You have a an object that is placed in the autorelease pool that will be released in a small amount of time, in which the receiver of the object can retain it.

First, what types of objects does your language allow to be put on the heap? Strings? Do you have mutable or immutable strings?
Check out this post about Strings in Java. So in a Java like language strings get copied every time you concatenate them because they are immutable. Also "this is a string" is actually a call to the constructor of the string class.
If the argument to print() is a call to a constructor (new Object()), there is no reference to the object in the scope calling the function, thus the object lives in the scope of the function and the counters should be incremented and decremented accordingly to entering and leaving the scope of the print() function. If the constructor is called in the calling scope and assigned to a variable, it lives in the calling scope.
While reading about the stuff, Wikipedia is a good start, but Andrew Appel's compiler book would be handy to have (there should be a 2nd edition out there and there is a C and ML version of the book available too). Lambda-the-Ultimate is the place where many of the programming language researchers discuss things, so definitely a place worth looking at.

What's the advantage of a String being Immutable?

Once I studied about the advantage of a string being immutable because of something to improve performace in memory.
Can anybody explain this to me? I can't find it on the Internet.

Immutability (for strings or other types) can have numerous advantages:
It makes it easier to reason about the code, since you can make assumptions about variables and arguments that you can't otherwise make.
It simplifies multithreaded programming since reading from a type that cannot change is always safe to do concurrently.
It allows for a reduction of memory usage by allowing identical values to be combined together and referenced from multiple locations. Both Java and C# perform string interning to reduce the memory cost of literal strings embedded in code.
It simplifies the design and implementation of certain algorithms (such as those employing backtracking or value-space partitioning) because previously computed state can be reused later.
Immutability is a foundational principle in many functional programming languages - it allows code to be viewed as a series of transformations from one representation to another, rather than a sequence of mutations.
Immutable strings also help avoid the temptation of using strings as buffers. Many defects in C/C++ programs relate to buffer overrun problems resulting from using naked character arrays to compose or modify string values. Treating strings as a mutable types encourages using types better suited for buffer manipulation (see StringBuilder in .NET or Java).

Consider the alternative. Java has no const qualifier. If String objects were mutable, then any method to which you pass a reference to a string could have the side-effect of modifying the string. Immutable strings eliminate the need for defensive copies, and reduce the risk of program error.

Immutable strings are cheap to copy, because you don't need to copy all the data - just copy a reference or pointer to the data.
Immutable classes of any kind are easier to work with in multiple threads, the only synchronization needed is for destruction.

Perhaps, my answer is outdated, but probably someone will found here a new information.
Why Java String is immutable and why it is good:
you can share a string between threads and be sure no one of them will change the string and confuse another thread
you don’t need a lock. Several threads can work with immutable string without conflicts
if you just received a string, you can be sure no one will change its value after that
you can have many string duplicates – they will be pointed to a single instance, to just one copy. This saves computer memory (RAM)
you can do substring without copying, – by creating a pointer to an existing string’s element. This is why Java substring operation implementation is so fast
immutable strings (objects) are much better suited to use them as key in hash-tables

a) Imagine StringPool facility without making string immutable , its not possible at all because in case of string pool one string object/literal e.g. "Test" has referenced by many reference variables , so if any one of them change the value others will be automatically gets affected i.e. lets say
String A = "Test" and String B = "Test"
Now String B called "Test".toUpperCase() which change the same object into "TEST" , so A will also be "TEST" which is not desirable.
b) Another reason of Why String is immutable in Java is to allow String to cache its hashcode , being immutable String in Java caches its hash code and do not calculate every time we call hashcode method of String, which makes it very fast as hashmap key.

Think of various strings sitting on a common pool. String variables then point to locations in the pool. If u copy a string variable, both the original and the copy shares the same characters. These efficiency of sharing outweighs the inefficiency of string editing by extracting substrings and concatenating.

Fundamentally, if one object or method wishes to pass information to another, there are a few ways it can do it:
It may give a reference to a mutable object which contains the information, and which the recipient promises never to modify.
It may give a reference to an object which contains the data, but whose content it doesn't care about.
It may store the information into a mutable object the intended data recipient knows about (generally one supplied by that data recipient).
It may return a reference to an immutable object containing the information.
Of these methods, #4 is by far the easiest. In many cases, mutable objects are easier to work with than immutable ones, but there's no easy way to share with "untrusted" code the information that's in a mutable object without having to first copy the information to something else. By contrast, information held in an immutable object to which one holds a reference may easily be shared by simply sharing a copy of that reference.

Why do a lot of programming languages put the type after the variable name?

I just came across this question in the Go FAQ, and it reminded me of something that's been bugging me for a while. Unfortunately, I don't really see what the answer is getting at.
It seems like almost every non C-like language puts the type after the variable name, like so:
var : int
Just out of sheer curiosity, why is this? Are there advantages to choosing one or the other?

There is a parsing issue, as Keith Randall says, but it isn't what he describes. The "not knowing whether it is a declaration or an expression" simply doesn't matter - you don't care whether it's an expression or a declaration until you've parsed the whole thing anyway, at which point the ambiguity is resolved.
Using a context-free parser, it doesn't matter in the slightest whether the type comes before or after the variable name. What matters is that you don't need to look up user-defined type names to understand the type specification - you don't need to have understood everything that came before in order to understand the current token.
Pascal syntax is context-free - if not completely, at least WRT this issue. The fact that the variable name comes first is less important than details such as the colon separator and the syntax of type descriptions.
C syntax is context-sensitive. In order for the parser to determine where a type description ends and which token is the variable name, it needs to have already interpreted everything that came before so that it can determine whether a given identifier token is the variable name or just another token contributing to the type description.
Because C syntax is context-sensitive, it very difficult (if not impossible) to parse using traditional parser-generator tools such as yacc/bison, whereas Pascal syntax is easy to parse using the same tools. That said, there are parser generators now that can cope with C and even C++ syntax. Although it's not properly documented or in a 1.? release etc, my personal favorite is Kelbt, which uses backtracking LR and supports semantic "undo" - basically undoing additions to the symbol table when speculative parses turn out to be wrong.
In practice, C and C++ parsers are usually hand-written, mixing recursive descent and precedence parsing. I assume the same applies to Java and C#.
Incidentally, similar issues with context sensitivity in C++ parsing have created a lot of nasties. The "Alternative Function Syntax" for C++0x is working around a similar issue by moving a type specification to the end and placing it after a separator - very much like the Pascal colon for function return types. It doesn't get rid of the context sensitivity, but adopting that Pascal-like convention does make it a bit more manageable.

the 'most other' languages you speak of are those that are more declarative. They aim to allow you to program more along the lines you think in (assuming you aren't boxed into imperative thinking).
type last reads as 'create a variable called NAME of type TYPE'
this is the opposite of course to saying 'create a TYPE called NAME', but when you think about it, what the value is for is more important than the type, the type is merely a programmatic constraint on the data

If the name of the variable starts at column 0, it's easier to find the name of the variable.
Compare
QHash<QString, QPair<int, QString> > hash;
and
hash : QHash<QString, QPair<int, QString> >;
Now imagine how much more readable your typical C++ header could be.

In formal language theory and type theory, it's almost always written as var: type. For instance, in the typed lambda calculus you'll see proofs containing statements such as:
x : A y : B
-------------
\x.y : A->B
I don't think it really matters, but I think there are two justifications: one is that "x : A" is read "x is of type A", the other is that a type is like a set (e.g. int is the set of integers), and the notation is related to "x ε A".
Some of this stuff pre-dates the modern languages you're thinking of.

An increasing trend is to not state the type at all, or to optionally state the type. This could be a dynamically typed langauge where there really is no type on the variable, or it could be a statically typed language which infers the type from the context.
If the type is sometimes given and sometimes inferred, then it's easier to read if the optional bit comes afterwards.
There are also trends related to whether a language regards itself as coming from the C school or the functional school or whatever, but these are a waste of time. The languages which improve on their predecessors and are worth learning are the ones that are willing to accept input from all different schools based on merit, not be picky about a feature's heritage.

"Those who cannot remember the past are condemned to repeat it."
Putting the type before the variable started innocuously enough with Fortran and Algol, but it got really ugly in C, where some type modifiers are applied before the variable, others after. That's why in C you have such beauties as
int (*p)[10];
or
void (*signal(int x, void (*f)(int)))(int)
together with a utility (cdecl) whose purpose is to decrypt such gibberish.
In Pascal, the type comes after the variable, so the first examples becomes
p: pointer to array[10] of int
Contrast with
q: array[10] of pointer to int
which, in C, is
int *q[10]
In C, you need parentheses to distinguish this from int (*p)[10]. Parentheses are not required in Pascal, where only the order matters.
The signal function would be
signal: function(x: int, f: function(int) to void) to (function(int) to void)
Still a mouthful, but at least within the realm of human comprehension.
In fairness, the problem isn't that C put the types before the name, but that it perversely insists on putting bits and pieces before, and others after, the name.
But if you try to put everything before the name, the order is still unintuitive:
int [10] a // an int, ahem, ten of them, called a
int [10]* a // an int, no wait, ten, actually a pointer thereto, called a
So, the answer is: A sensibly designed programming language puts the variables before the types because the result is more readable for humans.

I'm not sure, but I think it's got to do with the "name vs. noun" concept.
Essentially, if you put the type first (such as "int varname"), you're declaring an "integer named 'varname'"; that is, you're giving an instance of a type a name. However, if you put the name first, and then the type (such as "varname : int"), you're saying "this is 'varname'; it's an integer". In the first case, you're giving an instance of something a name; in the second, you're defining a noun and stating that it's an instance of something.
It's a bit like if you were defining a table as a piece of furniture; saying "this is furniture and I call it 'table'" (type first) is different from saying "a table is a kind of furniture" (type last).

It's just how the language was designed. Visual Basic has always been this way.
Most (if not all) curly brace languages put the type first. This is more intuitive to me, as the same position also specifies the return type of a method. So the inputs go into the parenthesis, and the output goes out the back of the method name.

I always thought the way C does it was slightly peculiar: instead of constructing types, the user has to declare them implicitly. It's not just before/after the variable name; in general, you may need to embed the variable name among the type attributes (or, in some usage, to embed an empty space where the name would be if you were actually declaring one).
As a weak form of pattern-matching, it is intelligable to some extent, but it doesn't seem to provide any particular advantages, either. And, trying to write (or read) a function pointer type can easily take you beyond the point of ready intelligability. So overall this aspect of C is a disadvantage, and I'm happy to see that Go has left it behind.

Putting the type first helps in parsing. For instance, in C, if you declared variables like
x int;
When you parse just the x, then you don't know whether x is a declaration or an expression. In contrast, with
int x;
When you parse the int, you know you're in a declaration (types always start a declaration of some sort).
Given progress in parsing languages, this slight help isn't terribly useful nowadays.

Fortran puts the type first:
REAL*4 I,J,K
INTEGER*4 A,B,C
And yes, there's a (very feeble) joke there for those familiar with Fortran.
There is room to argue that this is easier than C, which puts the type information around the name when the type is complex enough (pointers to functions, for example).

What about dynamically (cheers #wcoenen) typed languages? You just use the variable.

What's going on in the 'offsetof' macro?

Visual C++ 2008 C runtime offers an operator 'offsetof', which is actually macro defined as this:
#define offsetof(s,m) (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
This allows you to calculate the offset of the member variable m within the class s.
What I don't understand in this declaration is:
Why are we casting m to anything at all and then dereferencing it? Wouldn't this have worked just as well:
&(((s*)0)->m)
?
What's the reason for choosing char reference (char&) as the cast target?
Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?

An offset is in bytes. So to get a number expressed in bytes, you have to cast the addresses to char, because that is the same size as a byte (on this platform).
The use of volatile is perhaps a cautious step to ensure that no compiler optimisations (either that exist now or may be added in the future) will change the precise meaning of the cast.
Update:
If we look at the macro definition:
(size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
With the cast-to-char removed it would be:
(size_t)&((((s *)0)->m))
In other words, get the address of member m in an object at address zero, which does look okay at first glance. So there must be some way that this would potentially cause a problem.
One thing that springs to mind is that the operator & may be overloaded on whatever type m happens to be. If so, this macro would be executing arbitrary code on an "artificial" object that is somewhere quite close to address zero. This would probably cause an access violation.
This kind of abuse may be outside the applicability of offsetof, which is supposed to only be used with POD types. Perhaps the idea is that it is better to return a junk value instead of crashing.
(Update 2: As Steve pointed out in the comments, there would be no similar problem with operator ->)

offsetof is something to be very careful with in C++. It's a relic from C. These days we are supposed to use member pointers. That said, I believe that member pointers to data members are overdesigned and broken - I actually prefer offsetof.
Even so, offsetof is full of nasty surprises.
First, for your specific questions, I suspect the real issue is that they've adapted relative to the traditional C macro (which I thought was mandated in the C++ standard). They probably use reinterpret_cast for "it's C++!" reasons (so why the (size_t) cast?), and a char& rather than a char* to try to simplify the expression a little.
Casting to char looks redundant in this form, but probably isn't. (size_t) is not equivalent to reinterpret_cast, and if you try to cast pointers to other types into integers, you run into problems. I don't think the compiler even allows it, but to be honest, I'm suffering memory failure ATM.
The fact that char is a single byte type has some relevance in the traditional form, but that may only be why the cast is correct again. To be honest, I seem to remember casting to void*, then char*.
Incidentally, having gone to the trouble of using C++-specific stuff, they really should be using std::ptrdiff_t for the final cast.
Anyway, coming back to the nasty surprises...
VC++ and GCC probably won't use that macro. IIRC, they have a compiler intrinsic, depending on options.
The reason is to do what offsetof is intended to do, rather than what the macro does, which is reliable in C but not in C++. To understand this, consider what would happen if your struct uses multiple or virtual inheritance. In the macro, when you dereference a null pointer, you end up trying to access a virtual table pointer that isn't there at address zero, meaning that your app probably crashes.
For this reason, some compilers have an intrinsic that just uses the specified structs layout instead of trying to deduce a run-time type. But the C++ standard doesn't mandate or even suggest this - it's only there for C compatibility reasons. And you still have to be careful if you're working with class heirarchies, because as soon as you use multiple or virtual inheritance, you cannot assume that the layout of the derived class matches the layout of the base class - you have to ensure that the offset is valid for the exact run-time type, not just a particular base.
If you're working on a data structure library, maybe using single inheritance for nodes, but apps cannot see or use your nodes directly, offsetof works well. But strictly speaking, even then, there's a gotcha. If your data structure is in a template, the nodes may have fields with types from template parameters (the contained data type). If that isn't POD, technically your structs aren't POD either. And all the standard demands for offsetof is that it works for POD. In practice, it will work - your type hasn't gained a virtual table or anything just because it has a non-POD member - but you have no guarantees.
If you know the exact run-time type when you dereference using a field offset, you should be OK even with multiple and virtual inheritance, but ONLY if the compiler provides an intrinsic implementation of offsetof to derive that offset in the first place. My advice - don't do it.
Why use inheritance in a data structure library? Well, how about...
class node_base { ... };
class leaf_node : public node_base { ... };
class branch_node : public node_base { ... };
The fields in the node_base are automatically shared (with identical layout) in both the leaf and branch, avoiding a common error in C with accidentally different node layouts.
BTW - offsetof is avoidable with this kind of stuff. Even if you are using offsetof for some jobs, node_base can still have virtual methods and therefore a virtual table, so long as it isn't needed to dereference member variables. Therefore, node_base can have pure virtual getters, setters and other methods. Normally, that's exactly what you should do. Using offsetof (or member pointers) is a complication, and should only be used as an optimisation if you know you need it. If your data structure is in a disk file, for instance, you definitely don't need it - a few virtual call overheads will be insignificant compared with the disk access overheads, so any optimisation efforts should go into minimising disk accesses.
Hmmm - went off on a bit of a tangent there. Whoops.

char is guarenteed to be the smallest number of bits the architectural can "bite" (aka byte).
All pointers are actually numbers, so cast adress 0 to that type because it's the beginning.
Take the address of member starting from 0 (resulting into 0 + location_of_m).
Cast that back to size_t.

1) I also do not know why it is done in this way.
2) The char type is special in two ways.
No other type has weaker alignment restrictions than the char type. This is important for reinterpret cast between pointers and between expression and reference.
It is also the only type (together with its unsigned variant) for which the specification defines behavior in case the char is used to access stored value of variables of different type. I do not know if this applies to this specific situation.
3) I think that the volatile modifier is used to ensure that no compiler optimization will result in attempt to read the memory.

2 . What's the reason for choosing char reference (char&) as the cast target?
if type s has operator& overloaded then we can't get address using &s
so we reinterpret_cast the type s to primitive type char because primitive type char
doesn't have operator& overloaded
now we can get address from that
if in C then reinterpret_cast is not required
3 . Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?
here volatile is not relevant to compiler optimizing.
if type s have const or volatile or both qualifier(s) then
reinterpret_cast can't cast to char& because reinterpret_cast can't remove cv-qualifiers
so result is using <const volatile char&> for casting work from any combination

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string