I understand that, for example, the usage of int (which in .NET is System.Int32) is being replaced by the usage of nint, which enables the compiler to compile the code for either 32-bit or 64-bit.
But what about code that is shared with other applications, say an Android app? As far as I am aware, nint etc. are only available for iOS and OS X, so one must use int in that shared code again.
A concrete example would be a PCL that is linked to a Xamarin.iOS app.
What happens to
int Add(int one, int two)
{
    return one + two;
}
from the PCL, when being used within the iOS app?
int, which in .NET is System.Int32, is being replaced by nint
That's incorrect. int remains, and will always be, a 32-bit integer.
What happens is that some of Apple's APIs (on both iOS and OS X) use types like NSInteger. This type is 32 bits on 32-bit processors and 64 bits on 64-bit processors.
That does not exist in .NET; the closest thing is IntPtr, which is not an easy type to use in general.
At the time the original MonoTouch was written, the world (at least the mobile world) was 32-bit, and the APIs were bound using System.Int32. That worked well for years but, eventually, the 64-bit world reached mobile too.
This is why Xamarin introduced nint (and nuint and nfloat). Those types vary with the CPU architecture. They let us (and you) bind the APIs just as Apple defined them (an int stays an int, but an NSInteger becomes an nint).
As for PCLs (or shared code), you should avoid those types. They are not available on all platforms (and even if the source were copied, you would be missing the JIT/AOT optimizations on them). In other words, the only place they should be used is in your platform-specific code (iOS and/or OS X).
Now I get it. The point is that everything that touches native APIs needs to be prepared (by the compiler) to run in 32-bit or 64-bit mode. That is what the new types like nint are made for.
Everything that doesn't touch native stuff can still leverage the "native .NET type system", including Int32. That answers the question of how the sample code above can work in a PCL: since that PCL code will never depend on native APIs, it's okay.
How do I compile my VC++ project to a 16-bit flat object file for use in my bootloader I am working on?
To my understanding, an object file is technically already "flat", and the linker turns it into the destination executable format. What I want is to be able to obtain that object file and pass it, along with my assembly code (in obj format), through the linker to create a flat bootloader.
The guide is not very specific about where the files are located; it just says to use cl.exe, link.exe, and ml.exe (MASM).
The guide uses MASM, but I know how to output object files with NASM. My main problem is the VC++ thing.
The last 16-bit compiler from Microsoft was VC++ 1.52c. It's ancient, and probably not available any more. Even if it was, chances are pretty good that it wouldn't compile any recent code. Just to name a few of its most obvious shortcomings, it had no support for templates, exception handling, or namespaces at all.
I believe most people still working on things like that use Open Watcom (which isn't exactly up to date either, but is still better than VC++ 1.52c).
I have a closed executable (with no source) which was compiled with VC++ 7.1 (VS2003).
This executable loads a DLL for which I do have the source code.
I'm trying to avoid compiling this DLL with the VS2003 toolkit, because that would mean installing the toolkit on every machine I want to compile it on and using a makefile instead of using the newer VS project directly.
I changed parameters like the runtime library (I use /MT instead of /MD to prevent runtime DLL conflicts) and some other language switches to maintain compatibility with the old compiler. Finally it compiled and linked fine with the VS2005 libs.
But then when I tried running it, it crashed. The reason: the DLL sends a std::string (and in other places a std::vector) back to the exe, and the conflicting STL implementation versions cause something bad to happen.
So my question is: Is there a way to work around it? Or should I continue compiling the DLL with the VC7.1 toolkit?
I'm not very optimistic, but maybe someone will have a good idea regarding this.
Thanks.
As the implementations of the STL libraries change, binary compatibility breaks; that is (to my understanding) when the size or member variables of an object change. The two sides don't agree on how big a std::string/vector/etc. is.
If the old executable thinks a std::string is a 32-bit char* ptr plus a 32-bit size_t length, and the new executable thinks a std::string is a 64-bit iterator* iterator_list, a 64-bit char* ptr, and a 64-bit size_t length, then they can't pass strings back and forth. This is why long-standing formats (such as Windows BMP) make the first member a 32-bit size_of_object.
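The mismatch is easy to check for yourself; a minimal sketch (the numbers it prints are toolset- and settings-dependent, and identical sizes still would not guarantee identical layouts):

#include <cstdio>
#include <string>
#include <vector>

int main()
{
    // Build this once with the VC7.1 toolset and once with VS2005
    // (and in both debug and release). If the reported sizes differ,
    // the two sides clearly cannot exchange these objects safely.
    printf("sizeof(std::string)      = %u\n", (unsigned)sizeof(std::string));
    printf("sizeof(std::vector<int>) = %u\n", (unsigned)sizeof(std::vector<int>));
    return 0;
}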
Long story short, you need the older VS2003 version of the library (I don't think you necessarily need the compiler).
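If changing the DLL's interface is an option, one commonly used way to sidestep the problem entirely (offered here only as a sketch, not as part of the answer above; the function name is made up) is to keep STL types off the exported boundary and pass plain C data instead, so each module only ever touches its own STL implementation:

#include <cstring>
#include <string>

// Exported function uses plain C types only, so the layout of std::string
// never crosses the DLL boundary. The caller supplies a buffer and its size;
// the DLL reports how many characters it needed.
extern "C" __declspec(dllexport)
int GetGreeting(char* buffer, int bufferSize)
{
    std::string s = "hello from the DLL";   // the DLL's own STL, used internally, is fine
    int needed = static_cast<int>(s.size()) + 1;
    if (buffer != NULL && bufferSize >= needed)
        memcpy(buffer, s.c_str(), needed);
    return needed;
}

On the executable side, the returned characters can be wrapped back into that side's own std::string, so neither module ever sees the other's object layout.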
I responded to another question about developing for the iPhone in non-Objective-C languages, and I asserted that using, say, C# to write for the iPhone would rub an Apple reviewer the wrong way. I was speaking largely about UI elements differing between the ObjC and C# libraries in question, but a commenter made an interesting point, leading me to this question:
Is it possible to determine the language a program is written in, solely from its binary? If there are such methods, what are they?
Let's assume for the purposes of the question:
That from an interaction standpoint (console behavior, any GUI appearance, etc.) the two are identical.
That performance isn't a reliable indicator of language (no comparing, say, Java to C).
That you don't have an interpreter or something between you and the language - just raw executable binary.
Bonus points if you're as language-agnostic as possible.
Short answer: YES
Long answer:
If you look at a binary, you can find the names of the libraries that have been linked in. Opening cmd.exe in TextPad easily finds the following at hex offset 0x270: msvcrt.dll, KERNEL32.dll, NTDLL.DLL, USER32.dll, etc. msvcrt is the Microsoft 'C' runtime support functions. KERNEL32, NTDLL, and USER32.dll are OS specific libraries which tell you either the target platform, or the platform on which it was built, depending on how well the cross-platform development environment segregates the two.
Setting aside those clues, almost any C/C++ compiler will have to insert the names of the functions into the binary; there is a list of all functions (or entry points) stored in a table. C++ 'mangles' the function names to encode the arguments and their types, in order to support overloaded methods. It is possible to obfuscate the function names, but they would still exist. The function signatures include the number and types of the arguments, which can be used to trace the system or internal calls made by the program. At offset 0x4190 is "SetThreadUILanguage", which can be searched for to find out a lot about the development environment. I found the entry-point table at offset 0x1ED8A. I could easily see names like printf, exit, and scanf, along with __p__fmode, __p__commode, and __initenv.
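A small illustration of the mangling point (the decorated names in the comments are what GCC/Clang typically emit; the exact spelling is compiler-specific, and MSVC uses its own scheme):

// Two overloads of the same name: the compiler has to encode the
// parameter types into the symbol so the linker can tell them apart.
int    add(int a, int b)       { return a + b; }   // GCC/Clang: _Z3addii
double add(double a, double b) { return a + b; }   // GCC/Clang: _Z3adddd

// Inspecting the object file reveals the decorated names, e.g.
//   nm add.o | c++filt        (GCC/Clang)
//   dumpbin /symbols add.obj  (MSVC)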
Any executable for the x86 processor will have a data segment containing any static text that was included in the program. Back in cmd.exe (at offset 0x42C8) is the text "S.o.f.t.w.a.r.e..P.o.l.i.c.i.e.s..M.i.c.r.o.s.o.f.t..W.i.n.d.o.w.s..S.y.s.t.e.m.". The string takes twice as many characters as normally necessary because it was stored using double-wide characters, probably for internationalization. Error codes or messages are a prime source here.
At offset B1B0 is "p.u.s.h.d" followed by mkdir, rmdir, chdir, md, rd, and cd; I left out the unprintable characters for readability. Those are all command arguments to cmd.exe.
For other programs, I've sometimes been able to find the path from which a program was compiled.
So, yes, it is possible to determine the source language from the binary.
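Most of the inspection described above can be done with a hex editor or the Unix strings tool; a minimal C++ sketch of the same idea, which prints every run of printable ASCII characters found in a binary, would look like this:

#include <cctype>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

// Print every run of at least minLen printable ASCII characters,
// roughly what the Unix 'strings' tool does. Library names, function
// names, and embedded messages all show up this way.
int main(int argc, char** argv)
{
    if (argc < 2) { std::cerr << "usage: strs <binary>\n"; return 1; }
    const std::size_t minLen = 4;
    std::ifstream in(argv[1], std::ios::binary);
    std::string run;
    char c;
    while (in.get(c))
    {
        if (std::isprint(static_cast<unsigned char>(c)))
        {
            run += c;
        }
        else
        {
            if (run.size() >= minLen) std::cout << run << '\n';
            run.clear();
        }
    }
    if (run.size() >= minLen) std::cout << run << '\n';
    return 0;
}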
I'm not a compiler hacker (someday, I hope), but I figure that you may be able to find telltale signs in a binary file that would indicate what compiler generated it and some of the compiler options used, such as the level of optimization specified.
Strictly speaking, however, what you're asking is impossible. It could be that somebody sat down with a pen and paper and worked out the binary codes corresponding to the program that they wanted to write, and then typed that stuff out in a hex editor. Basically, they'd be programming in assembly without the assembler tool. Similarly, you may never be able to tell with certainty whether a native binary was written in straight assembler or in C with inline assembly.
As for virtual machine environments such as JVM and .NET, you should be able to identify the VM by the byte codes in the binary executable, I would expect. However you may not be able to tell what the source language was, such as C# versus Visual Basic, unless there are particular compiler quirks that tip you off.
What about these tools?
PE Detective
PEiD
Both are PE identifiers. OK, they're both for Windows, but that's what I was looking for when I landed here.
I expect you could, if you disassemble the binary; or at least you may be able to identify the compiler, since not all compilers will emit the same code for printf, for example, so Objective-C and GNU C should differ there.
You have excluded all byte-code languages so this issue is going to be less common than expected.
First, run the what command on some binaries and look at the output. CVS (and SVN) identifiers are scattered throughout the binary image, and most of those come from libraries.
Also, there's often a "map" to the various library functions. That's a big hint, also.
When the libraries are linked into the executable, there is often a map that's included in the binary file with names and offsets. It's part of creating "position independent code". You can't simply "hard-link" the various object files together. You need a map and you have to do some lookups when loading the binary into memory.
Finally, the start-up module for C, C++ (and I imagine C#) is unique to that compiler's default set of libraries.
Well, C is initially converted to ASM, so you could write all C code in ASM.
No, the compiled code is language-agnostic. Different compilers could even take the same source code and generate different binaries. That's why you don't see general-purpose decompilers that work on native binaries.
The command 'strings' could be used to get some hints as to what language was used (for instance, I just ran it on the stripped binary for a C application I wrote and the first entries it finds are the libraries linked by the executable).
I have a windows DLL that currently only supports ASCII and I need to update it to work with Unicode strings. This DLL currently uses char* strings in a number of places, along with making a number of ASCII Windows API calls (like GetWindowTextA, RegQueryValueExA, CreateFileA, etc).
I want to switch to using the Unicode/ANSI macros defined in VC++. So instead of char or CHAR I'd use TCHAR. For char* I'd use LPTSTR. And I think things like sprintf_s would be changed to _stprintf_s.
I've never really dealt with Unicode before, so I'm wondering if there are any common pitfalls I should look out for while doing this. Should it be as simple as replacing the types and function names with the proper macros, or are there other complications to watch out for?
First read this article by Joel Spolsky: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Then run through these links on Stack Overflow: What do I need to know about Unicode?
Generally, you are looking for any code that assumes one character = one byte (memory/buffer allocation, etc). But the links above will give you a pretty good rundown of the details.
The biggest danger is likely to be buffer sizes. If your memory allocations are made in terms of sizeof(TCHAR) you'll probably be OK, but if there is code where the original programmer was assuming that characters were 1 byte each and they used integers in malloc statements, that's hard to do a global search for.
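A minimal sketch of that pitfall (the function and variable names are illustrative only): the commented-out allocation is harmless in an ANSI build but under-allocates by half as soon as TCHAR becomes wchar_t, while the sizeof(TCHAR) version stays correct in both builds.

#include <windows.h>
#include <tchar.h>
#include <stdlib.h>

void CopyName(const TCHAR* source)
{
    size_t chars = _tcslen(source) + 1;            // length in characters, not bytes

    // Wrong once UNICODE is defined: allocates 'chars' BYTES,
    // but the copy below needs 'chars' TCHARs (2 bytes each).
    // TCHAR* bad = (TCHAR*)malloc(chars);

    // Correct in both ANSI and Unicode builds.
    TCHAR* copy = (TCHAR*)malloc(chars * sizeof(TCHAR));
    if (copy != NULL)
    {
        _tcscpy_s(copy, chars, source);            // size argument is in characters
        free(copy);
    }
}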