I am currently following a course at my University in which, at this stage, we learn about the assembler code behind certain C/C++ constructs.
The workflow usually goes like this: the lab assistant briefly speaks about a topic, we figure out the quirks and then solve some totally random problem using inline assembly.
(For example: He briefly talks about how struct (members) are stored in memory, we figure out the pattern and then we write the solution using inline assembly to a simple problem in which we use a struct.)
The lab assistant (as well as the rest of the group) uses the Visual C++ compiler and debugger (for disassembly) for his demonstrations. However, I cannot use it for ethical reasons, so I opted for g++ and gdb.
What I find awkward about g++'s inline assembly compared to Visual C++ is that:
If I want to write a 'block' of inline assembly, I have two options: a single asm("..") construct in which the instructions are separated by \n\t (leads to a lot of clutter), or a separate asm("..") statement for each instruction (leads to a lot of typing).
If I want to reference a local variable in the inline assembly, I have to either use the extended syntax or reference it through offsets from esp/ebp.
On both points I prefer Visual C++'s inline assembly style: to write an asm block all I have to do is write __asm { .. } with each instruction on a new line, and to reference a variable I just write its name.
In my searches I discovered that Apple's g++ supports the same syntax as Visual C++ via a switch (-fasm-blocks); however, this does not seem to be the case for GNU g++.
In case I have missed something, I am asking here whether it is possible to compile Visual C++-like inline assembly blocks under GNU g++.
The syntax you are referring to is not Microsoft specific. As you have found, Apple had it too (although Apple gave up on GCC and switched to Clang). AFAIK, Metrowerks supports the same syntax. GCC does not support it (probably because the GCC guys believe that GCC is so good that nobody needs to write assembly anymore :-)). However, there is no need to type \n\t all the time; you can replace it with ;. For example:
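void foo()
{
    // Basic asm: instructions are separated with ';' instead of "\n\t".
    // Note that eax is overwritten without the compiler being told.
    asm("xor %eax,%eax;"
        "rep; nop;"
        "nop;"
        "sfence;"
        "nop;");
}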
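If the separators still bother you, one option is to put the instructions in a C++11 raw string literal, one per line, and let extended asm operands stand in for the local variables. This is only a sketch, assuming g++ or clang++ on x86 with the default AT&T syntax; the function name times_three is made up for illustration.

```cpp
// Sketch: multi-line inline assembly without "\n\t", using a raw string literal.
// times_three is a hypothetical name; the asm branch assumes GCC/Clang on x86.
int times_three(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result = x;              // start with x, then add x twice
    asm(R"(
        addl %1, %0
        addl %1, %0
    )"
        : "+&r"(result)          // %0: read-write, early-clobbered register
        : "r"(x)                 // %1: input register
        : "cc");                 // addl changes the flags
    return result;
#else
    return 3 * x;                // fallback for other compilers/architectures
#endif
}
```

The "+&r" constraint marks result as read-write and early-clobbered, so the compiler will not place x in the same register even though both start with the same value. GCC also accepts -masm=intel if you prefer Intel operand order, but the operand and clobber lists are still needed.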
Hope it helps. Good Luck!
I have some C code I would like to optimize. It turns out the Intel C Compiler (ICC) does a much better job at this than GCC but I don't have a copy of that compiler and it is very expensive. However, I can compile it using ICC and get the assembly online at godbolt.org.
If I copy and paste this assembly into a text file, how can I then convert it into a functioning executable?
You will need to begin by making sure that the runtime environment for which godbolt.org compiles is similar enough to your own (good luck with that). For example, you may be using Windows while godbolt.org uses Linux (or the other way around), so when you bring the assembly to your system you might be able to convert it to object code, but it will still not link and it will not run.
Then you will need to find an assembler for your platform that is compatible with the assembly syntax produced by the Intel C compiler on godbolt.org, so as to produce object files from the assembly files. (Good luck with that.)
Then you will need to find any and all runtime libraries (redistributables) required by code produced by the Intel C compiler. (Good luck with that.)
Finally you will need to obtain a linker to link your resulting object files with the runtime libraries to produce an executable. (Good luck with that.)
Sometimes we need honest answers to our questions just so that we can realize how impossible our ideas are.
I've decided to learn assembler through online tutorials.
I've come across this one that uses the NASM compiler, which most other tutorials seem to as well:
http://www.tutorialspoint.com/assembly_programming/index.htm
I've also come across this youtube series "Assembly primer for hackers"
https://www.youtube.com/watch?v=K0g-twyhmQ4&list=PLue5IPmkmZ-P1pDbF3vSQtuNquX0SZHpB
This one uses what the guy describes as the 'generic linux compiler' (owtte).
The commands for compiling go something like this:
as -o file.o file.s
Where file.s is the assembly source code. Followed by:
ld -o file file.o
Where file is then the executable.
Each of the tutorials uses a different syntax (e.g. a register in the latter tutorial is always preceded by %. NB. There do appear to be less superficial differences in the syntax than this as well). Are these syntaxes decided by the individual compiler?
I was also initially confused when I tried to compile code from the NASM tutorial with the latter method. I was always under the impression that the instruction set had to depend on the CPU and it therefore shouldn't matter which compiler I use. I've just concluded that it's merely differences in syntax but is that correct?
I'm running a Linux computer, by the way, on kernel 4.1.6.
My main question is really which syntax do I use? Is it just a matter of choice? Is one more widely used than the other? Thanks for any help.
Each of the tutorials uses a different syntax (e.g. a register in the
latter tutorial is always preceded by %. NB. There do appear to be
less superficial differences in the syntax than this as well). Are
these syntaxes decided by the individual compiler?
Yes, different assemblers (= assembly language compilers) might use different assembler language syntax although they provide code for the same processor and platform.
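To make that concrete, here is a small sketch (assuming g++ or clang++ on x86; load_five is a hypothetical name) in which one instruction is spelled the way the GNU assembler expects it, with the NASM/Intel spelling of the same instruction in a comment. Both spellings assemble to the same machine code; only the text differs.

```cpp
// Sketch: one machine instruction, two textual syntaxes.
// load_five is a hypothetical name; the asm branch assumes GCC/Clang on x86.
int load_five()
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int value;
    // AT&T syntax (GNU as default): source first, '$' on immediates,
    // '%' on registers (doubled to "%%" inside extended asm).
    // The same instruction in NASM/Intel syntax would read:  mov eax, 5
    asm("movl $5, %%eax"
        : "=a"(value));          // output operand lives in eax
    return value;
#else
    return 5;                    // fallback for other compilers/architectures
#endif
}
```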
My main question is really which syntax do I use? Is it just a matter
of choice? Is one more widely used than the other?
Some assemblers cover a wide range of targets: NASM runs on many platforms and emits many object formats (though only for x86 and x86-64), while the GNU assembler supports many processor families. Learning such an assembler pays off when you need to work with several processors or platforms.
In other cases it might be better to stick with the assembler of some prominent vendor, because it is widely used and you can find more example code for it on the net, which might help you with your development.
Last but not least, you might simply prefer a particular assembler because you like its features or syntax.
If you're on a Windows system, the syntax of Microsoft's MASM (ML.EXE, or ML64.EXE for 64-bit) is virtually the same as Intel's syntax. MASM is included with the free Visual Studio Express editions, although you usually have to create a custom build step to invoke the assembler in a VS project. VS Express includes a good source-level debugger.
If you're on a Linux type system, then you'll probably use AT&T syntax, which I assume ended up that way since it was a conversion of some generic assembler. I don't know which assembler(s) to recommend for Linux.
I was told to avoid using features of C++ like these as it makes it difficult to port the code to other compilers.
The example I was given was using #ifdef instead of #pragma once in my header files.
Well, this is challenging to answer, because each compiler is different. More specifically, the individual #pragma directives are not features standardized by C++. #pragma means, by definition, "a command to send to the compiler":
"Pragmas are machine- or operating system-specific by definition, and are usually different for every compiler." MSDN
So, essentially, whenever you see #pragma, it means "what follows next is not part of the language standard, and so may be different for every platform you target/compile on".
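For the header example specifically, the standard-conforming alternative is an include guard, roughly like the sketch below. The header name point.h, the macro POINT_H, and the Point struct are made up for illustration; the guard is repeated to imitate the same header being included twice in one translation unit.

```cpp
// --- first inclusion of the hypothetical header point.h ---
#ifndef POINT_H
#define POINT_H

struct Point {
    int x;
    int y;
};

#endif // POINT_H

// --- second inclusion of the same header: POINT_H is already defined, so the
// preprocessor skips the body and Point is not redefined ---
#ifndef POINT_H
#define POINT_H

struct Point {
    int x;
    int y;
};

#endif // POINT_H

// Uses the single definition that survived preprocessing.
int sum_of(const Point& p) { return p.x + p.y; }
```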
Those are not "C++ features"; they are non-standard "extensions", non-standard functions, and "compiler features" provided by the compiler developer.
A short and incomplete list of Microsoft-specific features that will cause trouble during porting:
#pragma once (and pretty much every pragma). A compiler that does not recognize it will ignore it, which results in multiple header inclusions. Can cause trouble.
__int32 and similar types (Microsoft specific).
Everything that comes from windows.h: DWORD/WORD/HANDLE/TCHAR. Also OS-specific API and system calls. This includes WinMain().
Every builtin type, macro, and keyword that starts with two underscores (__FUNCTION__, __int32, __declspec, etc.).
Certain versions of the *printf functions: swprintf, vswprintf, etc. Some format specifications (%S) behave differently on different compilers.
The *_s functions (strcpy_s, etc.).
Here's a list of nonstandard behaviour in VC++: http://msdn.microsoft.com/en-us/library/x84h5b78%28VS.71%29.aspx
The very clean but non-portable "for each ... in" statement; see: Visual C++ "for each" portability
I responded to another question about developing for the iPhone in non-Objective-C languages, and I made the assertion that using, say, C# to write for the iPhone would strike an Apple reviewer wrong. I was speaking largely about UI elements differing between the ObjC and C# libraries in question, but a commenter made an interesting point, leading me to this question:
Is it possible to determine the language a program is written in, solely from its binary? If there are such methods, what are they?
Let's assume for the purposes of the question:
That from an interaction standpoint (console behavior, any GUI appearance, etc.) the two are identical.
That performance isn't a reliable indicator of language (no comparing, say, Java to C).
That you don't have an interpreter or something between you and the language - just raw executable binary.
Bonus points if you're as language-agnostic as possible.
Short answer: YES
Long answer:
If you look at a binary, you can find the names of the libraries that have been linked in. Opening cmd.exe in TextPad easily finds the following at hex offset 0x270: msvcrt.dll, KERNEL32.dll, NTDLL.DLL, USER32.dll, etc. msvcrt is the Microsoft 'C' runtime support functions. KERNEL32, NTDLL, and USER32.dll are OS specific libraries which tell you either the target platform, or the platform on which it was built, depending on how well the cross-platform development environment segregates the two.
Setting aside those clues, almost any C/C++ compiler has to put function names into the binary: there is a list of all functions (or entry points) stored in a table. C++ 'mangles' the function names to encode the arguments and their types in order to support overloaded methods. It is possible to obfuscate the function names, but they would still exist. The function signatures include the number and types of the arguments, which can be used to trace into the system or internal calls used in the program. At offset 0x4190 is "SetThreadUILanguage", which can be searched for to find out a lot about the development environment. I found the entry-point table at offset 0x1ED8A. I could easily see names like printf, exit, and scanf, along with __p__fmode, __p__commode, and __initenv.
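As an illustration of the mangling point, here is a sketch that decodes a mangled symbol back into a readable signature. It assumes GCC or Clang, whose Itanium-style mangling differs from the scheme Visual C++ uses, so it would not decode names found in cmd.exe. abi::__cxa_demangle comes from <cxxabi.h>; the wrapper name demangle is invented for the example.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang runtime)
#include <cstdlib>
#include <string>

// Returns the readable form of an Itanium-ABI mangled name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled)
{
    int status = 0;
    // With a null output buffer the function allocates the result with malloc.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && text != nullptr) ? text : mangled;
    std::free(text);   // free(nullptr) is a no-op
    return result;
}
```

For instance, demangle("_Z3addii") gives "add(int, int)": the name and the argument types are all recoverable from the symbol, which already tells you the function came from a C++ compiler.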
Any executable for the x86 processor will have a data segment which will contain any static text that was included in the program. Back to cmd.exe (offset 0x42C8) is the text "S.o.f.t.w.a.r.e..P.o.l.i.c.i.e.s..M.i.c.r.o.s.o.f.t..W.i.n.d.o.w.s..S.y.s.t.e.m.". The string takes twice as many characters as is normally necessary because it was stored using double-wide characters, probably for internationalization. Error codes or messages are a prime source here.
At offset 0xB1B0 is "p.u.s.h.d" followed by mkdir, rmdir, chdir, md, rd, and cd; I left out the unprintable characters for readability. Those are all command arguments to cmd.exe.
For other programs, I've sometimes been able to find the path from which a program was compiled.
So, yes, it is possible to determine the source language from the binary.
I'm not a compiler hacker (someday, I hope), but I figure that you may be able to find telltale signs in a binary file that would indicate what compiler generated it and some of the compiler options used, such as the level of optimization specified.
Strictly speaking, however, what you're asking is impossible. It could be that somebody sat down with a pen and paper and worked out the binary codes corresponding to the program that they wanted to write, and then typed that stuff out in a hex editor. Basically, they'd be programming in assembly without the assembler tool. Similarly, you may never be able to tell with certainty whether a native binary was written in straight assembler or in C with inline assembly.
As for virtual machine environments such as JVM and .NET, you should be able to identify the VM by the byte codes in the binary executable, I would expect. However you may not be able to tell what the source language was, such as C# versus Visual Basic, unless there are particular compiler quirks that tip you off.
What about these tools:
PE Detective
PEiD
Both are PE identifiers. OK, they're both for Windows, but that was the context when I landed here.
I expect you could if you disassemble the binary, or at least you may be able to identify the compiler, as not all compilers will use the same code for printf, for example, so Objective-C and GNU C should differ here.
You have excluded all byte-code languages so this issue is going to be less common than expected.
First, run what on some binaries and look at the output. CVS (and SVN) identifiers are scattered throughout the binary image. And most of those are from libraries.
Also, there's often a "map" to the various library functions. That's a big hint, also.
When the libraries are linked into the executable, there is often a map that's included in the binary file with names and offsets. It's part of creating "position independent code". You can't simply "hard-link" the various object files together. You need a map and you have to do some lookups when loading the binary into memory.
Finally, the start-up module for C, C++ (and I imagine C#) is unique to that compiler's default set of libraries.
Well, C is initially converted to ASM, so you could write all C code in ASM.
No, the bytecode is language agnostic. Different compilers could even take the same source code and generate different binaries. That's why you don't see general-purpose decompilers that will work on binaries.
The command 'strings' could be used to get some hints as to what language was used (for instance, I just ran it on the stripped binary for a C application I wrote and the first entries it finds are the libraries linked by the executable).
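What 'strings' does is simple enough to sketch: walk the bytes and keep every run of printable ASCII above some minimum length. The version below is a simplified stand-in rather than the real tool's implementation (the function name printable_runs and the default minimum of 4 are my own choices), and it would miss double-wide text such as the "S.o.f.t.w.a.r.e" example above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects runs of printable ASCII (0x20..0x7E) that are at least min_len long.
std::vector<std::string> printable_runs(const std::string& bytes,
                                        std::size_t min_len = 4)
{
    std::vector<std::string> runs;
    std::string current;
    for (unsigned char c : bytes) {
        if (c >= 0x20 && c < 0x7f) {
            current.push_back(static_cast<char>(c));   // extend the current run
        } else {
            if (current.size() >= min_len) {
                runs.push_back(current);               // run ended, keep it
            }
            current.clear();
        }
    }
    if (current.size() >= min_len) {
        runs.push_back(current);                       // run that reaches the end
    }
    return runs;
}
```

Fed the contents of an executable, this would surface library names such as msvcrt.dll or KERNEL32.dll, which is exactly the kind of hint discussed above.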
Is it possible to create, edit, link, compile (is compile the word?) etc. assembly code in MSVC++?
Also, if it's not possible, how can I create an .exe out of plain text, ie: convert the text into whatever format is required to use assembly code, then turn the assembly code into an .exe. (I'd say compile, but I don't think that is the correct word here).
And finally, what are some good places to begin learning assembly code? Written in a way that someone who has little experience can use.
I know some of these questions are probably very stupid, but I have absolutely no experience in assembly code and am not exactly sure where to start.
On x86, yes. You can use the __asm keyword to put assembly inline in your standard source files, and use the normal MS compile/link tools to compile everything together.
On x64 (or x86), you may need to use the ML and ML64 command line compilers for assembly.
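For the inline route, a minimal sketch might look like this (add_inline is a made-up name). The __asm block only compiles with Visual C++ targeting 32-bit x86, so the other branch falls back to plain C++.

```cpp
// Sketch: Visual C++ style inline assembly, guarded so that the file still
// compiles elsewhere. add_inline is a hypothetical name.
int add_inline(int a, int b)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    int result;
    __asm {
        mov eax, a        ; local variables are referenced by name
        add eax, b
        mov result, eax
    }
    return result;
#else
    return a + b;         // x64 MSVC and other compilers: no __asm blocks
#endif
}
```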
Visual Studio provides the __asm keyword for compiling inline assembly in C and C++. There is also a good discussion here on the use of inline assembly. However, if you are just talking about compiling assembly on its own, I'm not sure Visual C++ is the correct tool, although I'm pretty sure Visual Studio ships with the MASM assembler.
In short, yes.
According to Wikipedia, MASM has been shipped with all versions of Visual C later than VC6, and is also available in the Windows Driver Developer Kit. 16-bit real and protected modes, 32-bit, and 64-bit are all supported.
You can use the __asm keyword to write inline assembly.
pcasm-book(pdf) is a good tutorial to start assembly code programming.
Yes, sort of.
C:\Program Files\Microsoft Visual Studio 9.0\vc\bin>ml
Microsoft (R) Macro Assembler Version 9.00.30729.01
Copyright (C) Microsoft Corporation. All rights reserved.
usage: ML [ options ] filelist [ /link linkoptions]
Run "ML /help" or "ML /?" for more info
You'd use the macro assembler. I don't know if Visual Studio will automatically "do the right thing" with .asm files, though, but you can certainly edit them with it and assemble them with ml.exe.
A good place to start learning assembly language might actually be by learning about reverse engineering.
Look for information on the C++ 'asm' keyword. It may be compiler specific, but I know VC++ supports it.