How does the CLR string indexer work?

A comment buried in some C++ code in the SSCLI claims, referring to the unmanaged internal implementation of the String.Chars property:
This method is not actually used. JIT will generate code for indexer method on string class.
So...what magical code is this? I understand the whole point of jitters is that they produce different code in different situations. But at the very least, for a modern x64 Windows 7+ platform, how might the/a jitter accomplish this? Or is that truly secret sauce?
Additional details
A while ago I was looking for the fastest way to iterate through individual characters in a string in C#.
It turned out the fastest way without resorting to unsafe code or duplicating the contents (via ToCharArray())
was the built-in string indexer, which is actually a call to the String.Chars property. Right in my original
question I asked if anyone had insight into how the indexer actually worked, but despite bumps from both Skeet and
Lippert, I didn't get any responses on that. So I decided to dig into it myself:
Stop 1: mscorlib
By examining mscorlib.dll with ildasm, we can see that String::get_Chars(int32 index) is just an internalcall pointer (plus an attribute):
.method public hidebysig specialname instance char
get_Chars(int32 index) cil managed internalcall
{
.custom instance void System.Security.SecuritySafeCriticalAttribute::.ctor() = ( 01 00 00 00 )
} // end of method String::get_Chars
As noted in the documentation for the MethodImplOptions enumeration, "An internal call is a call to a method that is implemented within the common language runtime itself." Both a 2004 MSDN Magazine article and an SO post indicate that the mapping of internalcall names to unmanaged implementations can be found in ecall.cpp within the Shared Source CLI.
Stop 2: ecall.cpp
Searching an online copy of ecall.cpp reveals that get_Chars is implemented by COMString::GetCharAt:
FCIntrinsic("get_Chars", COMString::GetCharAt, CORINFO_INTRINSIC_StringGetChar)
Stop 3: comstring.cpp
comstring.cpp does indeed contain an implementation of GetCharAt, starting at line 1219. Except, it's preceded by this comment:
/*==================================GETCHARAT===================================
**Returns the character at position index. Thows IndexOutOfRangeException as
**appropriate.
**This method is not actually used. JIT will generate code for indexer method on string class.
**
==============================================================================*/

First of all, see Hans Passant's comment for the critical bit.
In early .NET (CLR 1 and 2), the CLR had considerable special support for String and StringBuilder types. In fact, the two types worked so closely together, that StringBuilder.ToString was not copying the actual characters anywhere, and the string indexer was still fetching the characters from that same memory location, using special jitter support. I assume that jitter support for String.Chars was originally necessary to avoid passing the index integer via stack, but the jitter seems to have improved since then.
.NET 4 comes with a different implementation of StringBuilder (ropes) that no longer is tied to how String is handled. (It has to copy during ToString, but has much faster appends.) After these changes,
The StringBuilder indexer is dramatically slower: O(log n) on large strings. It is never inlined, not even on short strings.
String indexer still uses (unpublished) special jitter support. I would expect this one to be basically inlined away into a shift, addition and a memory fetch, or something even faster that the nearest loop would allow.
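As a rough mental model of "a shift, an addition and a memory fetch", the C++ sketch below shows what an inlined indexer conceptually boils down to. It is not JIT output and does not reproduce the CLR's real object layout: StringModel, get_char and the two-field layout are assumptions made purely for illustration.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a managed string: a length plus a pointer to
// UTF-16 code units. The CLR's real object layout is not modeled here.
struct StringModel {
    std::int32_t length;
    const char16_t* chars;
};

// Conceptual equivalent of an inlined indexer: one range check, then a load
// from chars + index * sizeof(char16_t) (the "shift, addition, fetch").
constexpr char16_t get_char(const StringModel& s, std::int32_t index) {
    // A single unsigned comparison rejects negative and too-large indexes.
    if (static_cast<std::uint32_t>(index) >= static_cast<std::uint32_t>(s.length)) {
        throw std::out_of_range("index");
    }
    return s.chars[index];
}
```

Inside a loop whose bounds are already known, a compiler could in principle hoist or drop the range check, which is the kind of "even faster" code the answer alludes to.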

Adding a new attribute on source code that propagates until MC level in LLVM?

I am interested in how the following is propagated:
void foo(int __attribute__((aligned(16)))* p) { ... }
In this case the “alignedness” of the pointer is available at the MC level, but it is evidently not using the LLVM-IR metadata approach to achieve this. The alignment information is very important to some targets which will change code-generation dependent on this value, and I think that what I need is more like this attribute.
How difficult would it be to add a new attribute such that it propagates through the compiler in the same way as ‘aligned’? So, I already added a new element to the LLVM-IR to do this. I also expect that the hardest part would be making other parts of LLVM ignore this new element when they don’t care about it.
It really is a pity that LLVM does not have a generic target independent way of passing target dependent information from parser to back-end.
Using the ‘DebugLoc’ approach was suggested in a similar question, but I think it’s a bit-of-a-hack since this is not related to debugging. But if the implementation is less difficult this way, then the hack might be acceptable.
UPDATE:
Would inline assembly instead of the use of a new attribute work here? If yes, what are the pros/cons?
As you have demonstrated, alignment is not using metadata.
To anyone who doesn't know: alignment is mentioned (implicitly or explicitly) in all relevant instructions, so for example that function in the question will be compiled to something like this (notice the aligns):
define void @foo(i32*) {
%2 = alloca i32*, align 16 ; Allocate a 16-aligned pointer
store i32* %0, i32** %2, align 16 ; An aligned store to place the arg there
...
Now, if you want to attach some information to existing instructions and have most of the rest of the compiler ignore them, using metadata is a good idea. However, since metadata is a compiler-internal abstract thing, at some point you'll have to actually do something with it. Typically, by adding a pass of your own to consume it and do something accordingly.
As for where to place your pass and how to implement it, it really depends on the actual information you're trying to pass and its intended effect.

6502 and little-endian conversion

For fun I'm implementing an NES emulator. I'm currently reading through documentation for the 6502 CPU and I'm a little confused.
I've seen documentation stating that, because the 6502 is little-endian, you need to swap the bytes when using absolute addressing mode. I'm writing this on an x86 machine, which is also little-endian, so I don't understand why I couldn't simply cast to a uint16_t*, dereference that, and let the compiler work out the details.
I've written some simple tests in google test and they seem to agree with me.
// implementation of READ16
#define READ16(addr) (*(uint16_t*)addr)
TEST(MemMacro, READ16) {
uint8_t arr[] = {0xFF,0xCC};
uint8_t *mem = (&arr[0]);
EXPECT_EQ(0xCCFF, READ16(mem));
}
This passes, so it appears my supposition is correct, but I thought I'd ask someone with more experience than I.
Is this correct for pulling out the operand in 6502 absolute addressing mode? Am I possibly missing something?
It will work for simple cases on little-endian systems, but tying your implementation to those feels unnecessary when the corresponding portable implementation is simple. Sticking to the macro, you could do this instead:
#define READ16(addr) (addr[0] + (addr[1] << 8))
(Just to be pedantic, you should also make sure that addr[1] can't be out-of-bounds, and would need to add some more parentheses if addr could be a complex expression.)
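A function version of the same idea sidesteps the parenthesization issue entirely. This is a minimal sketch: read16 is a hypothetical helper name, and the caller is still assumed to guarantee that mem[1] is in bounds.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): combine two bytes stored
// low byte first into a 16-bit value, independent of the host's endianness.
constexpr std::uint16_t read16(const std::uint8_t* mem) {
    return static_cast<std::uint16_t>(mem[0] | (mem[1] << 8));
}
```

With the bytes {0xFF, 0xCC} from the question's test, read16 yields 0xCCFF on both little-endian and big-endian hosts, because the byte order is spelled out in the arithmetic rather than inherited from the machine.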
However, as you keep developing your emulator, you will find that it's most natural to use a pair of general-purpose read_mem() and write_mem() functions that operate on single bytes. Remember that the address space is split up into multiple regions (RAM, ROM, and memory-mapped registers from the PPU and APU), so having e.g. a single array that you index into won't work well. The fact that memory regions can be remapped by mappers also complicates things. (You won't have to worry about that for simple games though -- I recommend starting with Donkey Kong.)
What you need to do is to figure out what region or memory-mapped register the address belongs to inside your read_mem() and write_mem() functions (this is called address decoding), and do the right thing for the address.
Returning to the original question, the fact that you'll end up using read_mem() to read the individual bytes of the address anyway means that the uint16_t casting trickery is even less likely to be useful. This is the simplest and most robust approach w.r.t. handling corner cases, and what every emulator I've seen does in practice (Nestopia, Nintendulator, and FCEUX).
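The address decoding described above might look roughly like this. It is a sketch under stated assumptions, not how any of the named emulators is written: Bus, ram and prg_rom are illustrative names, only RAM and a fixed 32 KiB PRG ROM are modeled, and PPU/APU registers and mappers are left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sketch only: Bus, ram and prg_rom are illustrative names, and the cartridge
// is assumed to expose a fixed 32 KiB PRG ROM with no mapper.
struct Bus {
    std::array<std::uint8_t, 0x0800> ram{};      // 2 KiB internal RAM
    std::array<std::uint8_t, 0x8000> prg_rom{};  // mapped at $8000-$FFFF

    // Address decoding: pick the region the address belongs to.
    std::uint8_t read_mem(std::uint16_t addr) const {
        if (addr < 0x2000) {
            return ram[addr & 0x07FF];           // RAM repeats every 2 KiB
        }
        if (addr >= 0x8000) {
            return prg_rom[addr - 0x8000];
        }
        return 0;  // PPU/APU registers are not modeled in this sketch
    }

    void write_mem(std::uint16_t addr, std::uint8_t value) {
        assert(addr < 0x2000 && "only RAM writes are modeled here");
        ram[addr & 0x07FF] = value;
    }

    // 16-bit operand fetch: two byte reads, low byte first.
    std::uint16_t read16(std::uint16_t addr) const {
        const std::uint8_t lo = read_mem(addr);
        const std::uint8_t hi = read_mem(static_cast<std::uint16_t>(addr + 1));
        return static_cast<std::uint16_t>(lo | (hi << 8));
    }
};
```

Because read16 goes through read_mem for each byte, it keeps working when the two bytes come from a mirrored or remapped region, which a pointer cast into a flat array cannot do.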
In case you've missed it, the #nesdev channel on EFNet is very active and a good resource by the way. I assume you're already familiar with the NESDev wiki. :)
I've also been working on an emulator which can be found here.

Public fixed-length Strings

I am just summarizing info about implementing a digital tree (Trie) in VBA. I am not asking how to do that so please do not post your solutions - my specific question regarding fixed-length Strings in class modules comes at the end of this post.
A Trie is all about efficiency and performance, so most other programming languages use a Char data type to represent members of TrieNodes. Since VBA does not have a Char data type, I was thinking about faking it with a fixed-length String of 1 character.
Note: I can come up with a workaround for this, i.e. use Byte and a simple function to convert between Chr() and Asc(), or an Enum, or declare it as a private str As String * 1 and take advantage of Get/Let properties, but that's not the point. Stay tuned though because...
According to the Public Statement page in Microsoft's help, you can't declare a public fixed-length String variable in a class module.
I can't find any reasonable explanation for this constraint.
Can anyone give some insight why such a restriction applies to fixed-length Strings in class modules in VBA?
The VBA/VB6 runtime is heavily reliant on the COM system (oleaut32 et al) and this enforces some rules.
You can export a class file between VB "stuff", but if you publish (or could theoretically publish) it as a COM object, it must be able to describe a "fixed-length string" in its interface description/type library so that, say, a C++ client can consume it.
A fixed-length string is "special" because it has active behaviour, i.e. it's not a dumb data type; it behaves somewhat like a class. For example, it's always padded: if you assign to it, it will have trailing spaces, and in VBA the compiler adds generated code to get that behaviour. A C++ consumer would be unaware of the fixed-length nature of the string because the interface can't describe it and has no corresponding type (a String is a BSTR), which could lead to problems.
Strings are of type BSTR and, as with a byte array, you would still lose the padding semantics if you used one of those instead.

What's going on in the 'offsetof' macro?

The Visual C++ 2008 C runtime offers an 'offsetof' operator, which is actually a macro defined like this:
#define offsetof(s,m) (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
This allows you to calculate the offset of the member variable m within the class s.
What I don't understand in this declaration is:
Why are we casting m to anything at all and then dereferencing it? Wouldn't this have worked just as well:
&(((s*)0)->m)
?
What's the reason for choosing char reference (char&) as the cast target?
Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?
An offset is in bytes. So to get a number expressed in bytes, you have to cast the addresses to char, because that is the same size as a byte (on this platform).
The use of volatile is perhaps a cautious step to ensure that no compiler optimisations (either that exist now or may be added in the future) will change the precise meaning of the cast.
Update:
If we look at the macro definition:
(size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
With the cast-to-char removed it would be:
(size_t)&((((s *)0)->m))
In other words, get the address of member m in an object at address zero, which does look okay at first glance. So there must be some way that this would potentially cause a problem.
One thing that springs to mind is that the operator & may be overloaded on whatever type m happens to be. If so, this macro would be executing arbitrary code on an "artificial" object that is somewhere quite close to address zero. This would probably cause an access violation.
This kind of abuse may be outside the applicability of offsetof, which is supposed to only be used with POD types. Perhaps the idea is that it is better to return a junk value instead of crashing.
(Update 2: As Steve pointed out in the comments, there would be no similar problem with operator ->)
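The overloaded operator& concern can be checked at compile time without touching a null pointer. The sketch below is illustrative only: Packet and Tricky are hypothetical types, and the static_asserts merely inspect which operator the expression selects.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical POD struct made of single-byte members, so its offsets are
// easy to predict.
struct Packet {
    std::uint8_t tag;
    std::uint8_t flags;
    std::uint8_t payload[6];
};

static_assert(offsetof(Packet, tag) == 0, "first member starts the struct");
static_assert(offsetof(Packet, flags) == 1, "one byte after tag");
static_assert(offsetof(Packet, payload) == 2, "after tag and flags");

// Hypothetical member type with an overloaded unary &.
struct Tricky {
    int value;
    void operator&() const {}  // deliberately useless: returns no address
};

// Applying & to a Tricky selects the overload...
static_assert(std::is_same<decltype(&std::declval<Tricky&>()), void>::value,
              "plain & calls Tricky::operator&");

// ...but after the cast, & is the built-in operator applied to a char.
static_assert(std::is_same<decltype(&reinterpret_cast<const volatile char&>(
                               std::declval<Tricky&>())),
                           const volatile char*>::value,
              "the cast bypasses the overload");
```

So the cast guarantees that the macro's & is always the built-in address-of, whatever the member's type declares.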
offsetof is something to be very careful with in C++. It's a relic from C. These days we are supposed to use member pointers. That said, I believe that member pointers to data members are overdesigned and broken - I actually prefer offsetof.
Even so, offsetof is full of nasty surprises.
First, for your specific questions, I suspect the real issue is that they've adapted relative to the traditional C macro (which I thought was mandated in the C++ standard). They probably use reinterpret_cast for "it's C++!" reasons (so why the (size_t) cast?), and a char& rather than a char* to try to simplify the expression a little.
Casting to char looks redundant in this form, but probably isn't. (size_t) is not equivalent to reinterpret_cast, and if you try to cast pointers to other types into integers, you run into problems. I don't think the compiler even allows it, but to be honest, I'm suffering memory failure ATM.
The fact that char is a single byte type has some relevance in the traditional form, but that may only be why the cast is correct again. To be honest, I seem to remember casting to void*, then char*.
Incidentally, having gone to the trouble of using C++-specific stuff, they really should be using std::ptrdiff_t for the final cast.
Anyway, coming back to the nasty surprises...
VC++ and GCC probably won't use that macro. IIRC, they have a compiler intrinsic, depending on options.
The reason is to do what offsetof is intended to do, rather than what the macro does, which is reliable in C but not in C++. To understand this, consider what would happen if your struct uses multiple or virtual inheritance. In the macro, when you dereference a null pointer, you end up trying to access a virtual table pointer that isn't there at address zero, meaning that your app probably crashes.
For this reason, some compilers have an intrinsic that just uses the specified struct's layout instead of trying to deduce a run-time type. But the C++ standard doesn't mandate or even suggest this - it's only there for C compatibility reasons. And you still have to be careful if you're working with class hierarchies, because as soon as you use multiple or virtual inheritance, you cannot assume that the layout of the derived class matches the layout of the base class - you have to ensure that the offset is valid for the exact run-time type, not just a particular base.
If you're working on a data structure library, maybe using single inheritance for nodes, but apps cannot see or use your nodes directly, offsetof works well. But strictly speaking, even then, there's a gotcha. If your data structure is in a template, the nodes may have fields with types from template parameters (the contained data type). If that isn't POD, technically your structs aren't POD either. And all the standard demands for offsetof is that it works for POD. In practice, it will work - your type hasn't gained a virtual table or anything just because it has a non-POD member - but you have no guarantees.
If you know the exact run-time type when you dereference using a field offset, you should be OK even with multiple and virtual inheritance, but ONLY if the compiler provides an intrinsic implementation of offsetof to derive that offset in the first place. My advice - don't do it.
Why use inheritance in a data structure library? Well, how about...
class node_base { ... };
class leaf_node : public node_base { ... };
class branch_node : public node_base { ... };
The fields in the node_base are automatically shared (with identical layout) in both the leaf and branch, avoiding a common error in C with accidentally different node layouts.
BTW - offsetof is avoidable with this kind of stuff. Even if you are using offsetof for some jobs, node_base can still have virtual methods and therefore a virtual table, so long as it isn't needed to dereference member variables. Therefore, node_base can have pure virtual getters, setters and other methods. Normally, that's exactly what you should do. Using offsetof (or member pointers) is a complication, and should only be used as an optimisation if you know you need it. If your data structure is in a disk file, for instance, you definitely don't need it - a few virtual call overheads will be insignificant compared with the disk access overheads, so any optimisation efforts should go into minimising disk accesses.
Hmmm - went off on a bit of a tangent there. Whoops.
char is guaranteed to be the smallest number of bits the architecture can "bite" (aka byte).
All pointers are actually numbers, so cast address 0 to that type because it's the beginning.
Take the address of member starting from 0 (resulting into 0 + location_of_m).
Cast that back to size_t.
1) I also do not know why it is done in this way.
2) The char type is special in two ways.
No other type has weaker alignment restrictions than the char type. This is important for reinterpret cast between pointers and between expression and reference.
It is also the only type (together with its unsigned variant) for which the specification defines behavior in case the char is used to access stored value of variables of different type. I do not know if this applies to this specific situation.
3) I think that the volatile modifier is used to ensure that no compiler optimization will result in attempt to read the memory.
2 . What's the reason for choosing char reference (char&) as the cast target?
If the type of m overloads operator&, we can't get the member's real address by applying & to it directly.
So the member is reinterpret_cast to a reference to the primitive type char, because char
has no overloaded operator&.
Now the address can be taken from that.
In C there is no operator overloading, so no such cast is required.
3 . Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?
Here volatile has nothing to do with compiler optimization.
If s or m carries a const qualifier, a volatile qualifier, or both, then
a reinterpret_cast to plain char& would not compile, because reinterpret_cast can't remove cv-qualifiers.
Casting to const volatile char& therefore works for any combination of qualifiers.
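The cv-qualifier argument can be tried out on a real object. In this sketch, Sample, byte_address and offset_in are hypothetical names; the cast is the same one the macro uses, and replacing its target with plain char& would stop the const and volatile cases from compiling.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct whose members cover the cv-qualifier combinations.
struct Sample {
    const int c;
    volatile int v;
    const volatile int cv;
    int plain;
};

// Same cast as in the macro. Because the target type carries both qualifiers,
// it compiles whether T is plain, const, volatile, or const volatile.
template <typename T>
const volatile char* byte_address(T& object) {
    return &reinterpret_cast<const volatile char&>(object);
}

// Offset of a member, measured on a real object instead of a null pointer.
template <typename S, typename M>
std::size_t offset_in(S& object, M& member) {
    const std::ptrdiff_t distance = byte_address(member) - byte_address(object);
    assert(distance >= 0 && "member must live inside object");
    return static_cast<std::size_t>(distance);
}
```

For a Sample object s, offset_in(s, s.cv) agrees with offsetof(Sample, cv), and the same holds for the other three members.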

Should I cast a CString passed to Format/printf (and varargs in general)?

I recently took over a small MFC C++ application, which is obviously in a working state. To get started I'm running PC-Lint over the code, and lint is complaining that CStringT objects are being passed to Format. Opinion on the internet seems to be divided. Some say that CString is designed to handle this use case without error, but others (and an MSDN article) say that it should always be cast when passed to a variable argument function. Can Stack Overflow come to any consensus on the issue?
CString has been carefully designed to be passed as part of a variable argument list, so it is safe to use it that way. And you can be fairly sure that Microsoft will take care not to break this particular behavior. So I'd say you are safe to continue using it that way, if you want to.
That said, personally I'd prefer the cast. It is not common behavior that string classes behave that way (e.g. std::string does not) and for mental consistency it may be better to just do it the "safe" way.
P.S.: See this thread for implementation details and further notes on how to cast.
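To illustrate the "safe" habit without requiring MFC, the sketch below uses std::string as a stand-in for CString; format_greeting is a hypothetical helper. With a CString, the analogous explicit conversion would typically be a cast to LPCTSTR or a call to GetString().

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: std::string stands in for CString so the sketch
// builds without MFC.
inline std::string format_greeting(const std::string& name) {
    char buffer[64];
    // The explicit c_str() hands "%s" a plain const char*, which is the only
    // thing a variadic function can safely receive for that specifier.
    const int written =
        std::snprintf(buffer, sizeof buffer, "Hello, %s!", name.c_str());
    assert(written >= 0);  // snprintf reports errors with a negative value
    return std::string(buffer);
}
```

Passing name itself instead of name.c_str() would be undefined behaviour for std::string, which is why the explicit conversion is the habit that carries over between string classes.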
