Is there a technical term for the part of an IDE which maintains a dynamic symbol table as you code? - visual-c++

My context is MSVC 6.
Starting with a successfully compiled program, with browse information built, I can go into an existing function and hover over a variable, and the IDE will show me the data type and variable name. One could well imagine that the information is coming from the browse file.
In practice, if I create a new variable,
int z;
and hover over the z, the IDE will show me the data type and variable name. I have not compiled the program yet, hence the browse file has not been updated. This seems to say
that there is a portion of the IDE that watches as you type and stays aware of the data types and functions as you enter them. For all I know, it may compile them internally as well.
I have also noticed that syntax errors can effectively disable this functionality.
I haven't seen this discussed anywhere. Is there a term for this sort of functionality?

It's probably lexical analysis and syntactic analysis at work, building up its own symbol table. It's part of the parsing phase of most compilers. That would explain why the functionality breaks when you see syntax errors. The parsing needs to occur successfully to have a reliable symbol table.

In compilers, it's usually called a symbol table.
I'm not sure that there's a term common to all integrated development environments.
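To make the idea concrete, here is a minimal sketch of a symbol table built by scanning source text as it is typed. It is only an illustration, not how MSVC 6 actually does it: the names SymbolTable, tokenize and scanDeclarations are hypothetical, and it only recognizes declarations of the form "builtin-type identifier ;".

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: maps a variable name to the type it was declared with.
typedef std::map<std::string, std::string> SymbolTable;

// Split source text into identifiers/keywords and single-character punctuation.
std::vector<std::string> tokenize(const std::string& source) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        const unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            const std::size_t start = i;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) || source[i] == '_')) {
                ++i;
            }
            tokens.push_back(source.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, source[i]));
            ++i;
        }
    }
    return tokens;
}

// Record every "<builtin-type> <identifier> ;" sequence found in the token stream.
SymbolTable scanDeclarations(const std::string& source) {
    static const std::set<std::string> builtinTypes = {
        "int", "char", "float", "double", "long", "short"};
    SymbolTable table;
    const std::vector<std::string> tokens = tokenize(source);
    for (std::size_t i = 0; i + 2 < tokens.size(); ++i) {
        const std::string& name = tokens[i + 1];
        const bool isIdentifier =
            std::isalpha(static_cast<unsigned char>(name[0])) || name[0] == '_';
        if (builtinTypes.count(tokens[i]) && isIdentifier && tokens[i + 2] == ";") {
            table[name] = tokens[i];
        }
    }
    return table;
}
```

Calling scanDeclarations on the text of a function body that contains "int z;" yields an entry mapping z to int. An unfinished "int z" with no semicolon yields nothing, which mirrors the observation that syntax errors can disable the feature.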

Related

template instantiation statistics from compilers

Is there a way to get a summary of the instantiated templates (with what types and how many times - like a histogram) within a translation unit or for the whole project (shared object/executable)?
If I have a large codebase and I want to take advantage of C++11 extern template declarations, I would like to know which templates are most used within my project (or from the internals of the STL, like std::less<MyString> for example).
Also is it possible to have a weight assigned to each template instantiation (time spent by the compiler)?
Even if only one (C++11-enabled) compiler gives me such statistics I would be happy.
How difficult would it be to implement such a thing with Clang's LibTooling?
And is this even reasonable? Many people told me that I can reason which template instantiations I should extern without the use of a tool...
There are several ways to attack this problem.
If you are working with an open-source compiler, it's not hard to make a simple change to the source code that will trace all template instantiations.
If that sounds like too much hassle, you can also try to force the compiler to produce a warning on each template instantiation for a given symbol. Steven Watanabe has written a set of tools that can help you with that.
Finally, possibly the best option is to use the debugging symbols (or map files) generated by the compiler to track down how many times each function appears in the final image and, more importantly, how much it adds to the weight in bytes. The best example of such a tool is Adrian Stone's SymbolSort, which is based on Microsoft's toolset. Another similar tool is the Map File Browser.
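For context, this is roughly what applying C++11 extern template looks like once a tool has told you which instantiation is worth it. It is a sketch only: Counter is a hypothetical template, and the two declarations are shown in one file here although they would normally live in a shared header and a single .cpp file respectively.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical template standing in for a heavily used one in a large codebase.
template <typename T>
struct Counter {
    std::vector<T> items;
    void add(const T& value) { items.push_back(value); }
    std::size_t size() const { return items.size(); }
};

// Normally in the shared header: an explicit instantiation declaration, which
// asks each translation unit not to instantiate Counter<std::string> implicitly.
extern template struct Counter<std::string>;

// Normally in exactly one .cpp file: the explicit instantiation definition
// that provides the code all other translation units link against.
template struct Counter<std::string>;
```

Code that uses Counter<std::string> is written exactly as before; the only change is where the instantiation is compiled.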

Why can I see class/struct names in an .exe file compiled using Visual C++?

When looking at a compiled release .exe binary I can find class/struct names in it! This is odd, since obviously there is no need for these symbols. What concerns me is that such symbols can be used for reverse-engineering my software, posing a big risk to software license protection.
For example, I can find the text .?AVCMySecureKeyManager (the original class name is CMySecureKeyManager; it looks like the prefix ".?AV" is added to all names). It is easy to guess what my code is doing, right? It looks like an open door for hackers.
In particular, I have enabled all possible optimizations in the Visual C++ compiler/linker options and turned off all Browse/Debug Info generation. Perhaps I'm missing something?
You're seeing RTTI (Run-time Type Information). If you don't use dynamic_cast or typeid in your code, you can usually turn it off safely. Please note that exceptions always use RTTI (for the catch statement matching) and it's not possible to disable it for them.
If you do need dynamic_cast, then you can scrub the names from the EXE after compilation. The code does not depend on the actual name strings, but just their addresses.
That said, the class names, while useful, are not critical in reverse-engineering. Don't rely on their absence as a guarantee.
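A small sketch of why the name has to be in the binary: typeid must be able to hand back a name at run time, so the compiler stores one. The class body below is a hypothetical stand-in, and the exact string returned by name() is implementation-defined (the ".?AV" form seen in the question is what Visual C++ stores internally).

```cpp
#include <string>
#include <typeinfo>

// Class name taken from the question; the body is a hypothetical stand-in.
class CMySecureKeyManager {
public:
    virtual ~CMySecureKeyManager() {}
};

// Returns the implementation-defined type name kept in the RTTI data.
std::string rttiName(const CMySecureKeyManager& object) {
    return typeid(object).name();
}
```

On the compilers I know of, the returned string contains the class name in some decorated form, which is exactly the text that shows up when you search the .exe.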

Can a LabVIEW VI tell whether one of its output terminals is wired?

In LabVIEW, is it possible to tell from within a VI whether an output terminal is wired in the calling VI? Obviously, this would depend on the calling VI, but perhaps there is some way to find the answer for the current invocation of a VI.
In C terms, this would be like defining a function that takes arguments which are pointers to where to store output parameters, but will accept NULL if the caller is not interested in that parameter.
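The analogy can be sketched as follows (written as C++ here; sumWithOptionalMean is a hypothetical name used only for illustration). The function always returns the sum and only does the extra work for the mean when the caller supplies somewhere to put it.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical function: always returns the sum; computes the mean only if
// the caller passes a non-null pointer for it.
double sumWithOptionalMean(const std::vector<double>& values, double* meanOut) {
    const double sum = std::accumulate(values.begin(), values.end(), 0.0);
    if (meanOut != nullptr && !values.empty()) {
        *meanOut = sum / static_cast<double>(values.size());
    }
    return sum;
}
```

A caller that is not interested in the mean passes nullptr, which plays the role of leaving the output terminal unwired.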
As has been said, you can't do this in a natural way, but there's a workaround using data value references (requires LV 2009). It is the same idea as passing a NULL pointer for an output argument. The result is passed in as an input data value reference (which plays the role of the pointer), and the SubVI checks it for Not a Reference. If it is null, the SubVI does nothing.
Here is the SubVI (case true does nothing of course):
And here is the calling VI:
Images are VI snippets so you can drag and drop on a diagram to get the code.
I'd suggest you're going about this the wrong way. If the compiler is not smart enough to avoid the calculation on its own, make two versions of this VI. One that does the expensive calculation, one that does not. Then make a polymorphic VI that will allow you to switch between them. You already know at design time which version you want (because you're either wiring the output terminal or not), so just use the correct version of the polymorphic VI.
Alternatively, pass in a variable that switches on or off a Case statement for the expensive section of your calculation.
Like Underflow said, the basic answer is no.
You can have a look here to get what is probably the most official and detailed answer that will ever be provided by NI.
Extending your analogy, you can do this in LV, except LV doesn't have the concept of null that C does. You can see an example of this here.
Note that the code in the link Underflow provided will not work in an executable, because the diagrams are stripped by default when building an EXE and because the RTE does not support some of the properties and methods used there.
Sorry, I see I misunderstood the question. I thought you were asking about an input, so the idea I suggested does not apply. The restrictions I pointed out do apply, though.
Why do you want to do this? There might be another solution.
Generally, no.
It is possible to do a static analysis on the code using the "scripting" features. This would require pulling the calling hierarchy, and tracking the wire references.
Pulling together a trial of this, there are some difficulties. Multiple identical sub-vi's on the same diagram are difficult to distinguish. Also, terminal references appear to be accessible mostly by name, which can lead to some collisions with identically named terminals of other vi's.
NI has done a bit of work on a variation of this problem; check out this.
In general, the LV compiler optimizes the machine code in such a way that unused code is not even built into the executable.
This does not apply to subVIs (because there's no way of knowing that you won't try to use the value of the indicators somehow, although LV could do it if it removes the FP when building an executable, and possibly does), but there is one way you can get it to apply to a subVI - inline the subVI, which should allow the compiler to see the outputs aren't used. You can also set its priority to subroutine, which will possibly also do this, but I wouldn't recommend that.
Officially, in-lining is only available in LV 2010, but there are ways of accessing the private VI property in older versions. I wouldn't recommend it, though, and it's likely that 2010 has some optimizations in this area that older versions did not.
P.S. In general, the details of the compiling process are not exposed and vary between LV versions as NI tweaks the compiler. The whole process is supposed to have been given a major upgrade in LV 2010 and there should be a webcast on NI's site with some of the details.

Determine source language from a binary?

I responded to another question about developing for the iPhone in non-Objective-C languages, and I made the assertion that using, say, C# to write for the iPhone would strike an Apple reviewer wrong. I was speaking largely about UI elements differing between the ObjC and C# libraries in question, but a commenter made an interesting point, leading me to this question:
Is it possible to determine the language a program is written in, solely from its binary? If there are such methods, what are they?
Let's assume for the purposes of the question:
That from an interaction standpoint (console behavior, any GUI appearance, etc.) the two are identical.
That performance isn't a reliable indicator of language (no comparing, say, Java to C).
That you don't have an interpreter or something between you and the language - just raw executable binary.
Bonus points if you're as language-agnostic as possible.
Short answer: YES
Long answer:
If you look at a binary, you can find the names of the libraries that have been linked in. Opening cmd.exe in TextPad easily finds the following at hex offset 0x270: msvcrt.dll, KERNEL32.dll, NTDLL.DLL, USER32.dll, etc. msvcrt is the Microsoft 'C' runtime support functions. KERNEL32, NTDLL, and USER32.dll are OS specific libraries which tell you either the target platform, or the platform on which it was built, depending on how well the cross-platform development environment segregates the two.
Setting aside those clues, almost any C/C++ compiler will have to insert the names of the functions into the binary; there is a list of all functions (or entry points) stored in a table. C++ 'mangles' the function names to encode the arguments and their types to support overloaded methods. It is possible to obfuscate the function names, but they would still exist. The function signatures would include the number and types of the arguments, which can be used to trace into the system or internal calls used in the program. At offset 0x4190 is "SetThreadUILanguage", which can be searched for to find out a lot about the development environment. I found the entry-point table at offset 0x1ED8A. I could easily see names like printf, exit, and scanf, along with __p__fmode, __p__commode, and __initenv.
Any executable for the x86 processor will have a data segment which will contain any static text that was included in the program. Back to cmd.exe (offset 0x42C8) is the text "S.o.f.t.w.a.r.e..P.o.l.i.c.i.e.s..M.i.c.r.o.s.o.f.t..W.i.n.d.o.w.s..S.y.s.t.e.m.". The string takes twice as many characters as is normally necessary because it was stored using double-wide characters, probably for internationalization. Error codes or messages are a prime source here.
At offset 0xB1B0 is "p.u.s.h.d" followed by mkdir, rmdir, chdir, md, rd, and cd; I left out the unprintable characters for readability. Those are all command arguments to cmd.exe.
For other programs, I've sometimes been able to find the path from which a program was compiled.
So, yes, it is possible to determine the source language from the binary.
I'm not a compiler hacker (someday, I hope), but I figure that you may be able to find telltale signs in a binary file that would indicate what compiler generated it and some of the compiler options used, such as the level of optimization specified.
Strictly speaking, however, what you're asking is impossible. It could be that somebody sat down with a pen and paper and worked out the binary codes corresponding to the program that they wanted to write, and then typed that stuff out in a hex editor. Basically, they'd be programming in assembly without the assembler tool. Similarly, you may never be able to tell with certainty whether a native binary was written in straight assembler or in C with inline assembly.
As for virtual machine environments such as JVM and .NET, you should be able to identify the VM by the byte codes in the binary executable, I would expect. However you may not be able to tell what the source language was, such as C# versus Visual Basic, unless there are particular compiler quirks that tip you off.
What about these tools:
PE Detective
PEiD
Both are PE identifiers. OK, they're both for Windows, but that was the context when I landed here.
I expect you could if you disassemble the binary, or at least you could identify the compiler, since not all compilers will use the same code for printf, for example, so Objective-C and GNU C should differ here.
You have excluded all byte-code languages so this issue is going to be less common than expected.
First, run the 'what' command on some binaries and look at the output. CVS (and SVN) identifiers are scattered throughout the binary image. And most of those are from libraries.
Also, there's often a "map" to the various library functions. That's a big hint, also.
When the libraries are linked into the executable, there is often a map that's included in the binary file with names and offsets. It's part of creating "position independent code". You can't simply "hard-link" the various object files together. You need a map and you have to do some lookups when loading the binary into memory.
Finally, the start-up module for C, C++ (and I imagine C#) is unique to that compiler's default set of libraries.
Well, C is initially converted to ASM, so you could write all C code in ASM.
No, the bytecode is language agnostic. Different compilers could even take the same source code and generate different binaries. That's why you don't see general-purpose decompilers that will work on binaries.
The command 'strings' could be used to get some hints as to what language was used (for instance, I just ran it on the stripped binary for a C application I wrote and the first entries it finds are the libraries linked by the executable).
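The core of what a 'strings'-style scan does can be sketched in a few lines. This is a simplified illustration, not the real tool: extractStrings is a hypothetical name, it only looks for runs of printable ASCII, and it would not find the double-wide text mentioned in an earlier answer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collect every run of at least minLength printable ASCII bytes.
std::vector<std::string> extractStrings(const std::string& bytes,
                                        std::size_t minLength = 4) {
    std::vector<std::string> found;
    std::string current;
    for (unsigned char byte : bytes) {
        if (byte >= 0x20 && byte <= 0x7E) {
            current.push_back(static_cast<char>(byte));
        } else {
            // A non-printable byte ends the current run.
            if (current.size() >= minLength) {
                found.push_back(current);
            }
            current.clear();
        }
    }
    // Flush a run that reaches the end of the buffer.
    if (current.size() >= minLength) {
        found.push_back(current);
    }
    return found;
}
```

Feeding it the raw contents of an executable returns candidates such as library names (msvcrt.dll, KERNEL32.dll and so on), which are the hints the answers above rely on.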

What are the porting issues going from VC8 (VS2005) to VC9 (VS2008)?

I have inherited a very large and complex project (actually, a 'solution' consisting of 119 'projects', most of which are DLLs) that was built and tested under VC8 (VS2005), and I have the task of porting it to VC9 (VS2008).
The porting process I used was:
Copy the VC8 .sln file and rename it to a VC9 .sln file.
Copy all of the VC8 project files, and rename them to VC9 project files.
Edit all of the VC9 project files, s/vc8/vc9/.
Edit the VC9 .sln, s/vc8/vc9/.
Load the VC9 .sln with VS2008, and let the IDE 'convert' all of the project files.
Fix compiler and linker errors until I got a good build.
So far, I have run into the following issues in that last step.
1) A change in the way decorated names are calculated, causing truncation of the names.
This is more than just a warning (http://msdn.microsoft.com/en-us/library/074af4b6.aspx). Libraries built with this warning will not link with other modules. Applying the solution given in MSDN was non-trivial, but doable. I addressed this problem separately in How do I increase the allowed decorated name length in VC9 (MSVC 2008)?
2) A change that does not allow the assignment of zero to an iterator. This is per the spec, and it was fairly easy to find and fix these previously-allowed coding errors. Instead of assignment of zero to an iterator, use the value end().
3) for-loop scope is now per the ANSI standard. Another easy-to-fix problem.
4) More space required for pre-compiled headers. In some cases a LOT more space was required. I ended up using /Zm999 to provide the maximum PCH space. If PCH memory usage gets bumped up again, I assume that I will have to forgo PCH altogether, and just endure the increase in what is already a very long build time.
5) A change in requirements for copy ctors and default dtors. It appears that in template classes, under certain conditions that I haven't quite figured out yet, the compiler no longer generates a default ctor or a default dtor. I suspect this is a bug in VC9, but there may be something else that I'm doing wrong. If so, I'd sure like to know what it is.
6) The GUIDs in the sln and vcproj files were not changed. This does not appear to impact the build in any way that I can detect, but it is worrisome nevertheless.
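Items 2 and 3 are the kind of fix that looks like this in practice. The function below is a hypothetical example, not code from the project: it uses end() rather than zero as the "nothing found" iterator value, and it declares the loop variable before the loop because it is read afterwards.

```cpp
#include <vector>

// Returns the index of the first match, or -1 if the value is absent.
int findIndex(const std::vector<int>& values, int wanted) {
    // Item 2: a "no position yet" iterator is end(), not 0.
    std::vector<int>::const_iterator hit = values.end();

    // Item 3: `it` is read after the loop, so it must be declared before it.
    // With standard for-scope, a variable declared in the for-init statement
    // would no longer be visible after the closing brace.
    std::vector<int>::const_iterator it;
    for (it = values.begin(); it != values.end(); ++it) {
        if (*it == wanted) {
            break;
        }
    }
    hit = it;  // equals values.end() when nothing matched

    return hit == values.end() ? -1 : static_cast<int>(hit - values.begin());
}
```

Both changes are also valid under the older compiler, which matches the observation below that the fixes could be back-ported to the VC8 projects.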
Note that despite all of these issues, the project built, ran, and passed extensive QA testing under VC8. I have also back-ported all of the changes to the VC8 projects, where they still build and run just as happily as they did before (using VS2005/VC8). So, all of my changes required for a VC9 build at least appear to be backward-compatible, although the regression testing is still underway.
Now for the really hard problem: I have run into a difference in the startup sequence between VC8 and VC9 projects. The program uses a small-object allocator modeled after Loki, from Andrei Alexandrescu's book Modern C++ Design. This allocator is initialized using a global variable defined in the main program module.
Under VC8, this global variable is constructed at the very beginning of the program startup, from code in a module crtexe.c. Under VC9, the first module that executes is crtdll.c, which indicates that the startup sequence has been changed. The DLLs that are starting up appear to be confusing the small-object allocator by allocating and deallocating memory before the global object can initialize the statistics, which leads to some spurious diagnostics. The operation of the program does not appear to be materially affected, but the QA folks will not allow the spurious diagnostics to get past them.
Is there some way to force the construction of a global object prior to loading DLLs?
What other porting issues am I likely to encounter?
Is there some way to force the construction of a global object prior to loading DLLs?
How about the /DELAYLOAD linker option, so that DLLs aren't loaded until their first call?
That is a tough problem, mostly because you've inherited a design that's inherently dangerous because you're not supposed to rely on the initialization order of global variables.
It sounds like something you could try to work around by replacing the global variable with a singleton that other functions retrieve by calling a global function or method that returns a pointer to the singleton object. If the object exists at the time of the call, the function returns a pointer to it. Otherwise, it allocates a new one and returns a pointer to the newly allocated object.
The problem, of course, is that I can't think of a singleton implementation that would avoid the problem you're describing. Maybe this discussion would be useful: http://www.oneunified.net/blog/Personal/SoftwareDevelopment/CPP/Singleton.article
That's certainly an interesting problem. I don't have a solution other than perhaps to change the design so that there is no dependence on the undefined order of link/DLL startup. Have you considered linking with the older linker? (or whatever the VS.NET term is)
Because the behavior of your variable and allocator relied on some (unknown at the time) arbitrary order of startup I would probably fix that so that it is not an issue in the future. I guess you are really asking if anyone knows how to do some voodoo in VC9 to make the problem disappear. I am interested in hearing it as well.
How about this:
Make your main program a DLL too, call it main.dll, linked to all the other ones, and export the main function as, say, mainEntry(). Remove the global variable.
Create a new main exe which has the global variable and its initialization, but doesn't link statically to any of the other application DLLs (except for the allocator stuff).
This new main.exe then dynamically loads the main.dll using LoadLibrary(), then uses GetProcAddress to call mainEntry().
The solution to the problem turned out to be more straightforward than I originally thought. The initialization order problem was caused by the existence of several global variables of types derived from std container types (a basic design flaw that predated my position with that company). The solution was to replace all such globals with singletons. There were about 100 of them.
Once this was done, the initialization (and destruction) order was under programmer control.
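The post does not say which singleton form was used, so the following is only one common way to do it (construct on first use). Registry, registry() and registerName are hypothetical names standing in for one of the container-derived globals and the code that touches it.

```cpp
#include <map>
#include <string>

// Hypothetical registry type standing in for one of the container-derived globals.
typedef std::map<std::string, int> Registry;

// Before: "Registry g_registry;" at namespace scope, constructed at an
// unspecified point relative to globals in other translation units and DLLs.
// After: the object is constructed the first time anyone asks for it.
Registry& registry() {
    static Registry instance;  // initialized on the first call
    return instance;
}

// Hypothetical helper that may run during another module's static initialization.
void registerName(const std::string& name, int id) {
    registry()[name] = id;
}
```

Because every access goes through registry(), the object is guaranteed to exist before its first use, whichever module gets there first. Note that thread-safe initialization of function-local statics is only guaranteed from C++11 onwards, so on compilers of the VC8/VC9 era the first call should happen before any threads are started. If destruction order matters too, a heap-allocated variant that is released explicitly gives more control.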

Resources