Discover if structure initialization doesn't modify all members

Discover if structure initialization doesn't modify all members - struct

Consider the following code:
typedef struct _sMYSTRUCT_BASE
{
int b_a;
int b_b;
int b_c;
} sMYSTRUCT_BASE;
typedef struct _sMYSTRUCT
{
sMYSTRUCT_BASE base;
int a;
int b;
} sMYSTRUCT;
Private const sMYSTRUCT mystruct_init =
{
0,
1,
3,
4
};
I am looking for a way to generate an error (compile-, or runtime) to indicated that the structure initialization hasn't explicitly 'touched' all structure members.
There are 5 integers in the structure, but 'mystruct_init' only have 4 values.
I know that last member (mystruct_init.b) will be zero, but I need some kind of warning/error to inform the programmer about the mistake.
This has to work on a very old compiler (maybe not even ansi-c compliant).

Modern compilers are capable of producing such a warning...in gcc, it's turned on with -Wmissing-field-initializers (which warns about initializers that exist but do not initialize all members, but not about structs with no initializer expression; these can at least sometimes be caught by turning on -Wuninitialized, which will warn you if it sees you reading a potentially uninitialized value, at least if you read it in the same function the variable was declared in).
If your very old compiler happens to supply such a warning, you could of course just turn it on, but that seems unlikely from your description.
Your best option, I think, if you want to do an exhaustive search for them, would be to see whether you can get the code to compile with some version of gcc -- it wouldn't have to compile well enough to actually run on your target platform in order to get the warnings. I can't guarantee that it will be able to compile your pre-ANSI C code, particularly if it widely uses compiler-specific extensions, but I can at least say that support for the legacy K&R syntax is still present in the modern C standard, so I wouldn't be surprised if your code compiles better than you might think.
If that works, then to consistently produce the warnings in your IDE, you could modify the build script so that it both compiles and links the code with the real compiler you're targeting, and also compiles it (but not necessarily links it) with gcc, just to generate additional warnings that can be picked up and displayed by the IDE.
The other option would be to see if you can find a compatible static analyzer that can perform such a check; I work on a tool called EnSoft Atlas that builds a data-flow graph which, together with a simple script, could be used to enforce initialization more thoroughly than the gcc warnings allow, by checking whether flow of uninitialized values to fields of structs occurs.
However, our support for C is still in beta. Atlas requires that Eclipse CDT (or JDT for Java) be able to parse your code, and the current C beta only fully supports modern strongly-typed struct initializers (i.e. struct foo f = (struct foo) {...} has fully connected data-flow, but support for the older initializer list syntax struct foo f = {...} was not implemented in our first pass), so I'm not sure it would be able to meet your needs at this time.

Related

How to declare/access an extern function that uses LPCTSTR?

How do I declare an external MFC function that has LPCTSTR tokens?
I have the following C function in a library I wish to use:
DLL_PUBLIC void output( LPCTSTR format, ... )
Where DLL_PUBLIC is resolved to
__declspec(dllimport) void output( LPCTSTR format, ...)
In Rust, I was able to get things to build with:
use winapi::um::winnt::CHAR;
type LPCTSTR = *const CHAR;
#[link(name="mylib", kind="static")]
extern {
#[link_name = "output"]
fn output( format:LPCTSTR, ...);
}
I'm not sure this is the best approach, but it seems to get me part way down. Although the module is declared in a DLL module, the symbol decorations in the native DLL are such that it does not have _imp prepended to it in the binary. So I find that "static" seems to look for the correct module, although I still have not been able to get it to link.
The dumpbin output for the module (this target function) in "mylib.dll" is:
31 ?output##YAXPEBDZZ (void __cdecl output(char const *,...))

This is assuming that what you are trying to accomplish here is to link Rust code against a custom DLL implementation. If that is the case, then things are looking good.
First things first, though, you'll need to do some sanity cleanup. LPCTSTR is not a type. It is a preprocessor symbol that either expands to LPCSTR (aka char const*) or LPCWSTR (aka wchar_t const*).
When the library gets built the compiler commits to either one of those, and that decision is made for all eternity. Clients, on the other hand, that #inlcude the header are still free to choose, and you have no control over this. Lucky you, if you're using C++ linkage, and have the linker save you. But we aren't using C++ linkage.
The first order of action is to change the C function signature using an explicit type, so that clients and implementation always agree. I will be using char const* here.
C++ library
Building the library is fairly straight forward. The following is a bare-bones C++ library implementation that simply outputs a formatted string to STDOUT.
dll.cpp:
#include <stdio.h>
#include <stdarg.h>
extern "C" __declspec(dllexport) void output(char const* format, ...)
{
va_list argptr{};
va_start(argptr, format);
vprintf(format, argptr);
va_end(argptr);
}
The following changes to the original code are required:
extern "C": This is requesting C linkage, controlling how symbols are decorated as seen by the linker. It's the only reasonable choice when planning to cross language boundaries.
__declspec(dllexport): This is telling the compiler to inform the linker to export the symbol. C and C++ clients will use a declaration with a corresponding __declspec(dllimport) directive.
char const*: See above.
This is all that's required to build the library. With MSVC the target architecture is implied by the toolchain used. Open up a Visual Studio command prompt that matches the architecture eventually used by Rust's toolchain, and run the following command:
cl.exe /LD dll.cpp
This produces, among other artifacts, dll.dll and dll.lib. The latter being the import library that needs to be discoverable by Rust. Copying it to the Rust client's crate's root directory is sufficient.
Consuming the library from Rust
Let's start from scratch here and make a new binary crate:
cargo new --bin client
Since we don't need any other dependencies, the default Cargo.toml can remain unchanged. As a sanity check you can cargo run to verify that everything is properly set up.
If that all went down well it's time to import the only public symbol exported by dll.dll. Add the following to src/main.rs:
#[link(name = "dll", kind = "dylib")]
extern "C" {
pub fn output(format: *const u8, ...);
}
And that's all there is to it. Again, a few details are important here, namely:
name = "dll": Specifies the import library. The .lib extension is implied, and must not be appended.
kind = "dylib": We're importing from a dynamic link library. This is the default and can be omitted, though I'm keeping it for posterity.
extern "C": As in the C++ code this controls name decoration and the calling convention. For variadic functions the C calling convention (__cdecl) is required.
*const u8: This is Rust's native type that corresponds to char const* in C and C++. Using type aliases (whether those provided by the winapi crate or otherwise) is not required. It wouldn't hurt either, but let's just keep this simple.
With that everything is set up and we can take this out for a spin. Replace the default generated fn main() with the following code in src/main.rs:
fn main() {
unsafe { output("Hello, world!\0".as_ptr()) };
}
and there you have it. cargo running this produces the famous output:
Hello, world!
So, all is fine, right? Well, no, not really. Actually, nothing is fine. You could have just as well written, compiled, and executed the following:
fn main() {
unsafe { output(b"Make sure this has reasons to crash: %f".as_ptr(), "💩") };
}
which produces the following output for me:
Make sure this has reasons to crash: 0.000000≡ƒÆ⌐
though any other observable behavior is possible, too. After all, the behavior is undefined. There are two bugs: 1 The format specifier doesn't match the argument, and 2 the format string isn't NUL terminated.
Either one can be fixed, trivially even, though you have opted out of Rust's safety guarantees. Rust can't help you detect either issue, and when control reaches the library implementation, it cannot detect this either. It will just do what it was asked to do, subverting each and every one of Rust's safety guarantees.
Remarks
A few words of caution: Getting developers interested in Rust is great, and I will do my best to try whenever I get a chance to. Getting Rust-curious developers excited about Rust is often just a natural progression.
Though I will say that trying to get developers excited about Rust by starting out with unsafe Rust isn't going to be successful. It's eventually going to provoke a response like: "Look, ma, a steep learning curve with absolutely no benefit whatsoever, who could possibly resist?!" (I'm exaggerating, I know).
If your ultimate goal is to establish Rust as a superior alternative to C (and in part C++), don't start by evaluating how not to benefit from Rust. Specifically, trying to import a variadic function (the unsafest language construct in all of C++) and exposing it as an unsafe function to Rust is almost guaranteed to be the beginning of a lost battle.
Now, this may read bad as it is already, but this isn't over yet. In an attempt to make your C++ code accessible from Rust, things have gotten worse! With a C++ compiler and static code analysis tools (assuming the format string is known at compile time, and the tools understand the semantics), the tooling can and frequently will warn about mismatches. That option is now gone, forever, and there's not even a base level of protection.
If you absolutely want to make some sort of logging available to Rust, export a function from the library that takes a single char const*, use Rust's format! macro, and provide a variadic wrapper to C and C++ clients.

Is it possible to set a function to only be inlined during a release build?

Possible example:
#[inline(release)]
fn foo() {
println!("moo");
}
If not, is it possible to only include an attribute based on build type or another attribute?

[...] is it possible to only include an attribute based on build type [...]?
Yes. That's what cfg_attr is for:
#[cfg_attr(not(debug_assertions), inline(always))]
#[cfg_attr(debug_assertions, inline(never))]
fn foo() {
println!("moo")
}
This is probably the closest you will get to your goal. Note that inline annotations (even with "always" and "never") can be ignored by the compiler. There are good reasons for that, as you can read below.
However: what do you want to achieve?
Humans are pretty bad at inlining decisions, while compilers are pretty smart. Even without #[inline], the compiler will inline the function in release mode whenever it's a good idea to do so. And it won't be inlined in debug mode.
If you don't have a very good and special reason to tinker with the inlining yourself, you should not touch it! The compiler will do the right thing in nearly all cases :)
Even the reference says:
The compiler automatically inlines functions based on internal heuristics. Incorrectly inlining functions can actually make the program slower, so it should be used with care.

Using old std::string in gcc

I have program that uses std::string, but memmove the std::string` instances.
It worked fine until gcc 5.1.
However this no longer works as of gcc 5.3. I think developers finally did SSO with internal pointer.
I will definitely fix that, but is there easy way to fix it with some define or pragma?
Code looks similar to this:
// MyClass have std::string inside
MyClass *a = malloc(MAX * sizeof(MyClass));
// ...
// placement new on a[0]
// ...
memmove(&a[1], &a[0], sizeof(MyClass));
// ...
process(a[1]);
This is old code, please do not comment about malloc usage.
I will refactor or switch to std::vector, but I want the code to work until I do so.

You are experiencing effects of undefined behavior, but I think you know this. You cannot rely on the effects of byte-wise copying non-POD resp. not trivially copyable types, and the compiler is free to change that behavior.
I think it may be possible to define a safe overload for memmove with your class as arguments and use the copy-constructor inside it. I don't know if that is strictly legal, but you seem to be using the C-function instead of the C++ version in namespace std, so at least you are not changing namespace std which is not allowed.
void memmove(MyClass* a, MyClass* b, size_t)
{
*a = *b;
}
Strictly speaking, I think this is still undefined behavior because 17.6.4.3 of the C++ standard specifies that
If a program declares or defines a name in a context where it is
reserved, other than as explicitly allowed by this Clause, its
behavior is undefined.
In addition, all names in C library are reserved names and shall not be used by the program (17.6.4.3.2). Practically, I think this will work.
You may need to compile with -fno-builtin to prevent gcc from replacing memmove globally. If it is illegal to overwrite the function, you can replace it dynamically with LD_PRELOAD.
This is a hack solution! Your code may still not work because the compiler makes the assumption that, when you memmove it is a POD/TriviallyCopyable object and uses that for some optimisation, e.g. by assuming that after the memmove, both objects are represented by identical bytes. This is broken when you re-implement memmove with the copy-constructor.

Is there a compiled* programming language with dynamic, maybe even weak typing?

I wondered if there is a programming language which compiles to machine code/binary (not bytecode then executed by a VM, that's something completely different when considering typing) that features dynamic and/or weak typing, e.g:
Think of a compiled language where:
Variables don't need to be declared
Variables can be created during runtime
Functions can return values of different types
Questions:
Is there such a programming language?
(Why) not?
I think that a dynamically yet strong typed, compiled language would really sense, but is it possible?

I believe Lisp fits that description.
http://en.wikipedia.org/wiki/Common_Lisp

Yes, it is possible. See Julia. It is a dynamic language (you can write programs without types) but it never runs on a VM. It compiles the program to native code at runtime (JIT compilation).

Objective-C might have some of the properties you seek. Classes can be opened and altered in runtime, and you can send any kind of message to an object, whether it usually responds to it or not. In that way, you can implement duck typing, much like in Ruby. The type id, roughly equivalent to a void*, can be endowed with interfaces that specify a contract that the (otherwise unknown) type will adhere to.

C# 4.0 has many, if not all of these characteristics. If you really want native machine code, you can compile the bytecode down to machine code using a utility.
In particular, the use of the dynamic keyword allows objects and their members to be bound dynamically at runtime.
Check out Anders Hejlsberg's video, The Future of C#, for a primer:
http://channel9.msdn.com/pdc2008/TL16/

Objective-C has many of the features you mention: it compiles to machine code and is effectively dynamically typed with respect to object instances. The id type can store any class instance and Objective-C uses message passing instead of member function calls. Methods can be created/added at runtime. The Objective-C runtime can also synthesize class instance variables at runtime, but local variables still need to be declared (just as in C).
C# 4.0 has many of these features, except that it is compiled to IL (bytecode) and interpreted using a virtual machine (the CLR). This brings up an interesting point, however: if bytecode is just-in-time compiled to machine code, does that count? If so, it opens to the door to not only any of the .Net languages, but Python (see PyPy or Unladed Swallow or IronPython) and Ruby (see MacRuby or IronRuby) and many other dynamically typed languages, not mention many LISP variants.

In a similar vein to Lisp, there is Factor, a concatenative* language with no variables by default, dynamic typing, and a flexible object system. Factor code can be run in the interactive interpreter, or compiled to a native executable using its deploy function.
* point-free functional stack-based

VB 6 has most of that

I don't know of any language that has exactly those capabilities. I can think of two that have a significant subset, though:
D has type inference, garbage collection, and powerful metaprogramming facilities, yet compiles to efficient machine code. It does not have dynamic typing, however.
C# can be compiled directly to machine code via the mono project. C# has a similar feature set to D, but again without dynamic typing.

Python to C probably needs these criteria.
Write in Python.
Compile Python to Executable. See Process to convert simple Python script into Windows executable. Also see Writing code translator from Python to C?

Elixir does this. The flexibility of dynamic variable typing helps with doing hot-code updates (for which Erlang was designed). Files are compiled to run on the BEAM, the Erlang/Elixir VM.

C/C++ both indirectly support dynamic typing using void*. C++ example:
#include <string>
int main() {
void* x = malloc(sizeof(int))
*(int*)x = 5;
x = malloc(sizeof(std::string));
*(std::string*x) = std::string("Hello world");
free(x);
return 0;
}
In C++17, std::any can be used as well:
#include <string>
#include <any>
int main() {
std::any x = 5;
x = std::string("Hello world");
return 0;
}
Of course, duck typing is rarely used or needed in C/C++, and both of these options have issues (void* is unsafe, std::any is a huge performance bottleneck).
Another example of what you may be looking for is the V8 engine for JavaScript. It is a JIT compiler, meaning the source code is compiled to bytecode and then machine code at runtime, although this is hidden from the user.

What's going on in the 'offsetof' macro?

Visual C++ 2008 C runtime offers an operator 'offsetof', which is actually macro defined as this:
#define offsetof(s,m) (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
This allows you to calculate the offset of the member variable m within the class s.
What I don't understand in this declaration is:
Why are we casting m to anything at all and then dereferencing it? Wouldn't this have worked just as well:
&(((s*)0)->m)
?
What's the reason for choosing char reference (char&) as the cast target?
Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?

An offset is in bytes. So to get a number expressed in bytes, you have to cast the addresses to char, because that is the same size as a byte (on this platform).
The use of volatile is perhaps a cautious step to ensure that no compiler optimisations (either that exist now or may be added in the future) will change the precise meaning of the cast.
Update:
If we look at the macro definition:
(size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
With the cast-to-char removed it would be:
(size_t)&((((s *)0)->m))
In other words, get the address of member m in an object at address zero, which does look okay at first glance. So there must be some way that this would potentially cause a problem.
One thing that springs to mind is that the operator & may be overloaded on whatever type m happens to be. If so, this macro would be executing arbitrary code on an "artificial" object that is somewhere quite close to address zero. This would probably cause an access violation.
This kind of abuse may be outside the applicability of offsetof, which is supposed to only be used with POD types. Perhaps the idea is that it is better to return a junk value instead of crashing.
(Update 2: As Steve pointed out in the comments, there would be no similar problem with operator ->)

offsetof is something to be very careful with in C++. It's a relic from C. These days we are supposed to use member pointers. That said, I believe that member pointers to data members are overdesigned and broken - I actually prefer offsetof.
Even so, offsetof is full of nasty surprises.
First, for your specific questions, I suspect the real issue is that they've adapted relative to the traditional C macro (which I thought was mandated in the C++ standard). They probably use reinterpret_cast for "it's C++!" reasons (so why the (size_t) cast?), and a char& rather than a char* to try to simplify the expression a little.
Casting to char looks redundant in this form, but probably isn't. (size_t) is not equivalent to reinterpret_cast, and if you try to cast pointers to other types into integers, you run into problems. I don't think the compiler even allows it, but to be honest, I'm suffering memory failure ATM.
The fact that char is a single byte type has some relevance in the traditional form, but that may only be why the cast is correct again. To be honest, I seem to remember casting to void*, then char*.
Incidentally, having gone to the trouble of using C++-specific stuff, they really should be using std::ptrdiff_t for the final cast.
Anyway, coming back to the nasty surprises...
VC++ and GCC probably won't use that macro. IIRC, they have a compiler intrinsic, depending on options.
The reason is to do what offsetof is intended to do, rather than what the macro does, which is reliable in C but not in C++. To understand this, consider what would happen if your struct uses multiple or virtual inheritance. In the macro, when you dereference a null pointer, you end up trying to access a virtual table pointer that isn't there at address zero, meaning that your app probably crashes.
For this reason, some compilers have an intrinsic that just uses the specified structs layout instead of trying to deduce a run-time type. But the C++ standard doesn't mandate or even suggest this - it's only there for C compatibility reasons. And you still have to be careful if you're working with class heirarchies, because as soon as you use multiple or virtual inheritance, you cannot assume that the layout of the derived class matches the layout of the base class - you have to ensure that the offset is valid for the exact run-time type, not just a particular base.
If you're working on a data structure library, maybe using single inheritance for nodes, but apps cannot see or use your nodes directly, offsetof works well. But strictly speaking, even then, there's a gotcha. If your data structure is in a template, the nodes may have fields with types from template parameters (the contained data type). If that isn't POD, technically your structs aren't POD either. And all the standard demands for offsetof is that it works for POD. In practice, it will work - your type hasn't gained a virtual table or anything just because it has a non-POD member - but you have no guarantees.
If you know the exact run-time type when you dereference using a field offset, you should be OK even with multiple and virtual inheritance, but ONLY if the compiler provides an intrinsic implementation of offsetof to derive that offset in the first place. My advice - don't do it.
Why use inheritance in a data structure library? Well, how about...
class node_base { ... };
class leaf_node : public node_base { ... };
class branch_node : public node_base { ... };
The fields in the node_base are automatically shared (with identical layout) in both the leaf and branch, avoiding a common error in C with accidentally different node layouts.
BTW - offsetof is avoidable with this kind of stuff. Even if you are using offsetof for some jobs, node_base can still have virtual methods and therefore a virtual table, so long as it isn't needed to dereference member variables. Therefore, node_base can have pure virtual getters, setters and other methods. Normally, that's exactly what you should do. Using offsetof (or member pointers) is a complication, and should only be used as an optimisation if you know you need it. If your data structure is in a disk file, for instance, you definitely don't need it - a few virtual call overheads will be insignificant compared with the disk access overheads, so any optimisation efforts should go into minimising disk accesses.
Hmmm - went off on a bit of a tangent there. Whoops.

char is guarenteed to be the smallest number of bits the architectural can "bite" (aka byte).
All pointers are actually numbers, so cast adress 0 to that type because it's the beginning.
Take the address of member starting from 0 (resulting into 0 + location_of_m).
Cast that back to size_t.

1) I also do not know why it is done in this way.
2) The char type is special in two ways.
No other type has weaker alignment restrictions than the char type. This is important for reinterpret cast between pointers and between expression and reference.
It is also the only type (together with its unsigned variant) for which the specification defines behavior in case the char is used to access stored value of variables of different type. I do not know if this applies to this specific situation.
3) I think that the volatile modifier is used to ensure that no compiler optimization will result in attempt to read the memory.

2 . What's the reason for choosing char reference (char&) as the cast target?
if type s has operator& overloaded then we can't get address using &s
so we reinterpret_cast the type s to primitive type char because primitive type char
doesn't have operator& overloaded
now we can get address from that
if in C then reinterpret_cast is not required
3 . Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?
here volatile is not relevant to compiler optimizing.
if type s have const or volatile or both qualifier(s) then
reinterpret_cast can't cast to char& because reinterpret_cast can't remove cv-qualifiers
so result is using <const volatile char&> for casting work from any combination

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Discover if structure initialization doesn't modify all members - struct

Related

How to declare/access an extern function that uses LPCTSTR?

Is it possible to set a function to only be inlined during a release build?

Using old std::string in gcc

Is there a compiled* programming language with dynamic, maybe even weak typing?

What's going on in the 'offsetof' macro?

Categories

Resources