This should be a very simple question. There are many programming languages out there, compiled into machine code or managed code. I first started with ASM back in high school. Assembler is very nice, since you know exactly what the CPU does. Next (as you can see from my other questions here), I decided to learn C and C++. I chose C because, from what I read, it is the language whose output is closest to assembler-written programs.
But what I want to know is: can any other Windows programming language out there call the Win32 API? To be exact, just as C has its special header and functions for Win32 API interaction, is this assumed to be an important part of a programming language? Or are there languages that have no support for calling the Win32 API, and just use the console for I/O and some functions for basic file I/O? Because for Windows programming with graphical output, it is essential to have access to the Win32 API. I know this question might seem silly, but still, please help me; I ask for study purposes. Thanks.
Lots of different languages have a way of opening and using Windows DLL files, so you can just open the system DLLs that contain the API functions and call them.
Some languages, such as C, help you out by providing a nice header file with everything already declared.
The only other language I've ever seen that has direct access to the WinAPI without needing to open any library beforehand is a BASIC dialect called PureBasic.
Are you asking how to call Win32 from assembly?
Just use MASM (or TASM, or...)
Example hello world calling Win32:
==== HelloWin.asm ==============================
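.586
.model flat, stdcall
; MessageBoxA takes four DWORD arguments (16 bytes), hence the @16 suffix
EXTERN MessageBoxA@16:NEAR
.data
szCaption db 'Hello World',0
szAppName db 'HelloWorld',0
.code
start:
; stdcall: push the arguments right to left; the callee cleans up the stack
push 0                 ; uType = MB_OK
push offset szCaption  ; lpCaption
push offset szAppName  ; lpText
push 0                 ; hWnd = NULL
call MessageBoxA@16
ret
end start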
===================================================
To assemble:
ml.exe /coff /c HelloWin.asm
To link:
link /subsystem:windows HelloWin.obj /defaultlib:C:\masm32\lib\user32.lib
The first language that I used to get to the Windows API was VB4. Yes, most languages can get to the API in some manner.
Win32 functionality is available from C, C++, VB, VB.NET and C#. In the latter two, you generally use the nice CLR libraries, but you can call native (unmanaged) APIs directly if you know the right syntactic sugar to sprinkle around.
Win32 usage is not limited to the above list. It is a C API for a reason: so that any language that essentially knows how to make the right kind of function call can call them. And in this case, "the right kind" is stdcall. All the language's compiler (or whatever) has to do is load the right DLL, push the arguments (and other info) onto the stack in the right order, and you're good to go.
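To make the "load the right DLL and make the right kind of call" idea concrete, here is a minimal Python sketch using the standard ctypes module. It is only an illustration: the helper names (load_c_runtime, c_strlen, show_message_box) are made up for this example, and so that it also runs outside Windows it calls strlen from the C runtime rather than a Win32 function. The Windows-only helper shows what the equivalent user32.dll call might look like.

```python
import ctypes
import ctypes.util
import sys


def load_c_runtime():
    """Load the platform's C runtime library as a ctypes handle."""
    if sys.platform == "win32":
        # msvcrt.dll ships with Windows and exports the classic C functions.
        return ctypes.CDLL("msvcrt")
    # On POSIX systems, fall back to the symbols already loaded in the process.
    return ctypes.CDLL(ctypes.util.find_library("c") or None)


_crt = load_c_runtime()
# Declare the C signature: size_t strlen(const char *s);
_crt.strlen.argtypes = [ctypes.c_char_p]
_crt.strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Call the C function strlen on a byte string."""
    return _crt.strlen(text)


def show_message_box(text: str, caption: str) -> int:
    """Windows only: call MessageBoxW from user32.dll."""
    user32 = ctypes.windll.user32  # this attribute exists only on Windows
    return user32.MessageBoxW(None, text, caption, 0)
```

Calling c_strlen(b"Hello World") goes through the same steps the answer describes: the library is loaded, the arguments are marshalled according to the declared signature, and the native function is called.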
I have a question that has been bugging me for a while; sorry if it is a rookie question:
Is there a way to develop an application with more than one programming language?
Today I was looking for a video player on Linux and I saw this:
MPV Player: Written in C, Objective-C, Lua and Python, MPV is...
Can anyone explain how they write an application in multiple languages?
Thanks for helping...
You just need the ability to call a function written in one language (and implementation) from a function written in another one (and its implementation).
The Lua scripting language (and interpreter) was designed to be easily embeddable in C applications. Read the chapter The Application Program Interface of the Lua manual (Lua uses a stack for arguments and results). Guile is also designed to be easily embeddable, and has a nice tutorial explaining that (you give the arity of your foreign functions to the Guile runtime).
Sometimes, you need to follow specific conventions (which depend upon the implementation) to call a foreign function. For example, Python has a chapter on Extending and Embedding the Python Interpreter; C++ code needs to annotate with extern "C" the declarations of functions coded in C (or callable from C); OCaml's manual has a chapter about Interfacing C with OCaml; etc.
More generally, be aware of calling conventions and of ABIs. Sometimes, you might want to use libffi which enables you to call functions of signature known only at runtime.
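As a small sketch of calls crossing a language boundary in both directions, the following Python code hands a Python function to the C library's qsort as a comparison callback. Python's ctypes module is built on libffi, so this is roughly the kind of runtime-described call mentioned above. It assumes a POSIX system where the C library is available; the names sort_with_c_qsort and _compare_ints are invented for the example.

```python
import ctypes
import ctypes.util

# Assumes a POSIX system where the C library can be found or is already loaded.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# C signature of the comparison callback: int (*)(const int *, const int *)
CompareFunc = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)

# void qsort(void *base, size_t nmemb, size_t size, compare)
libc.qsort.argtypes = [
    ctypes.POINTER(ctypes.c_int),
    ctypes.c_size_t,
    ctypes.c_size_t,
    CompareFunc,
]
libc.qsort.restype = None


def _compare_ints(a, b):
    """Python function that C's qsort calls back for each comparison."""
    return (a[0] > b[0]) - (a[0] < b[0])


def sort_with_c_qsort(values):
    """Sort a list of Python ints using the C library's qsort."""
    array = (ctypes.c_int * len(values))(*values)
    callback = CompareFunc(_compare_ints)  # keep a reference during the call
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), callback)
    return list(array)
```

The declared argtypes and the CFUNCTYPE prototype play the role of the calling-convention knowledge: without them, neither side would know how the other expects its arguments laid out.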
BTW, MPV is open-source, so why don't you study its source code?
This article should answer your question:
https://www.computerworld.com/article/2467812/internet/polyglot-programming----development-in-multiple-languages.html
I was browsing the Haskell Platform documentation and found this library.
It has only one line of explanation: A collection of FFI declarations for interfacing with Win32.
Is this library intended for building UI on windows ?
If so could anyone show a short example ?
TL;DR: Yes, but don't. Ugliness alert.
It is indeed a set of bindings directly onto the win32 API, which means you can use it to make a UI, but you essentially have to write like a C programmer who doesn't have a toolkit.
It's not pretty, and I'd strongly recommend you use a toolkit like GTK or WX, or better still a Functional Reactive Programming library like reactive-banana. Those libraries will give you much more idiomatic and easy-to-understand Haskell code, and portability comes for free.
Occasionally some library you use doesn't feature something you need, whereupon you might want to delve into the windows API.
If you're determined to use this, you need a good Win32 API tutorial to learn from, together with a good reference whilst actually coding. There are loads out there if you google, and plenty of books, but none of them fit into a stackoverflow answer. Whilst I don't know of any Win32 API tutorials written in Haskell, using the bindings provided in Graphics.Win32 means all the function names match up with those in the online documentation, so you should be able to translate.
For my university, final-year dissertation, I am going to implement a compiler for a skeletal form of the C programming language, then go about extending it until it resembles something a little more like Java with array bounds checking, type-checking and so forth.
I am relatively competent at much of the theory that relates to compiler construction, and have experience programming in MIPS assembly language, so I do understand a little of what it is to write extremely low-level code.
My main concern is that I am likely to be able to get all the way to the point where I need to produce the actual machine-code output, but then not understand enough about how machine code is executed from the perspective of the operating system running it.
So, my actual question is basically, "does anyone know the best place to read up about writing assembly to run on an Intel x86-64 processor under Linux?"
The main gap in my knowledge is how the machine code is actually run in practise. Is it run directly on the processor, making "syscall"s (or the x86 equivalent) when it needs services provided by the kernel, or is the assembly language somehow an encapsulated description that tells the kernel how to execute the instructions (in a manner similar to an interpreted language such as Java)?
Any help you can provide would be greatly appreciated.
This document explains how you can implement a foreign function interface to interact with other code: http://www.x86-64.org/documentation/abi.pdf
Firstly, for the machine code start here: http://www.intel.com/products/processor/manuals/
Next, I assume your question about how the machine code is run is really about how the OS loads the exe into memory and calls main()? These links may help
Linkers and loaders:
http://www.linuxjournal.com/article/6463
ELF file format:
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format and
http://www.linuxjournal.com/article/1060
Your machine code will go into the .text section of the executable
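To get a feel for what the loader sees, here is a minimal Python sketch that reads the fixed-size header at the start of a 64-bit little-endian ELF file. The field layout follows the ELF64 header definition; the function name parse_elf64_header and the returned dictionary keys are just choices made for this example, and it deliberately ignores 32-bit and big-endian files.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
EM_X86_64 = 62  # e_machine value for x86-64


def parse_elf64_header(data: bytes) -> dict:
    """Return a few fields from the header of a little-endian 64-bit ELF file."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("too short to be an ELF64 file")
    fields = ELF64_HEADER.unpack_from(data)
    ident = fields[0]
    if ident[:4] != ELF_MAGIC:
        raise ValueError("missing ELF magic number")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")
    return {
        "type": fields[1],
        "machine": fields[2],
        "entry": fields[4],  # virtual address where execution starts
        "section_header_offset": fields[6],
        "section_count": fields[12],
    }
```

On an x86-64 Linux machine, something like parse_elf64_header(open("/bin/ls", "rb").read()) should report machine 62, and the entry address is where the kernel (or the dynamic loader) eventually transfers control.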
Finally, best of luck. Your project is similar to my final year project, except I targeted the JVM and compiled a subset of Visual Basic!
I responded to another question about developing for the iPhone in non-Objective-C languages, and I made the assertion that using, say, C# to write for the iPhone would strike an Apple reviewer wrong. I was speaking largely about UI elements differing between the ObjC and C# libraries in question, but a commenter made an interesting point, leading me to this question:
Is it possible to determine the language a program is written in, solely from its binary? If there are such methods, what are they?
Let's assume for the purposes of the question:
That from an interaction standpoint (console behavior, any GUI appearance, etc.) the two are identical.
That performance isn't a reliable indicator of language (no comparing, say, Java to C).
That you don't have an interpreter or something between you and the language - just raw executable binary.
Bonus points if you're language-agnostic as possible.
Short answer: YES
Long answer:
If you look at a binary, you can find the names of the libraries that have been linked in. Opening cmd.exe in TextPad easily finds the following at hex offset 0x270: msvcrt.dll, KERNEL32.dll, NTDLL.DLL, USER32.dll, etc. msvcrt is the Microsoft 'C' runtime support functions. KERNEL32, NTDLL, and USER32.dll are OS specific libraries which tell you either the target platform, or the platform on which it was built, depending on how well the cross-platform development environment segregates the two.
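Rather than eyeballing the file in a text editor, the same search can be scripted. This is a rough Python sketch that scans raw bytes for ASCII runs ending in ".dll"; the function name find_dll_names is made up here, and a real tool would parse the PE import table instead of pattern matching.

```python
import re

# ASCII runs that look like "name.dll"; import names are stored as plain bytes.
_DLL_NAME = re.compile(rb"[A-Za-z0-9_\-]+\.dll", re.IGNORECASE)


def find_dll_names(data: bytes) -> list:
    """Return the distinct DLL-like names found in raw binary data, in order."""
    names = []
    seen = set()
    for match in _DLL_NAME.finditer(data):
        name = match.group().decode("ascii")
        if name.lower() not in seen:  # DLL names are case-insensitive
            seen.add(name.lower())
            names.append(name)
    return names
```

For example, find_dll_names(open(path, "rb").read()) on a Windows executable would be expected to list names such as the ones quoted above, although unrelated strings that merely end in ".dll" would be picked up too.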
Setting aside those clues, almost any C/C++ compiler has to put the names of imported (and exported) functions into the binary; they are stored in a table of entry points. C++ 'mangles' the function names to encode the arguments and their types in order to support overloaded methods. It is possible to obfuscate the function names, but they would still exist. The function signatures include the number and types of the arguments, which can be used to trace into the system or internal calls used in the program. At offset 0x4190 is "SetThreadUILanguage", which can be searched for to find out a lot about the development environment. I found the entry-point table at offset 0x1ED8A. I could easily see names like printf, exit, and scanf, along with __p__fmode, __p__commode, and __initenv.
Any executable for the x86 processor will have a data segment which will contain any static text that was included in the program. Back to cmd.exe (offset 0x42C8) is the text "S.o.f.t.w.a.r.e..P.o.l.i.c.i.e.s..M.i.c.r.o.s.o.f.t..W.i.n.d.o.w.s..S.y.s.t.e.m.". The string takes twice as many characters as is normally necessary because it was stored using double-wide characters, probably for internationalization. Error codes or messages are a prime source here.
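The "double-wide" strings can be pulled out programmatically as well. Below is a small Python sketch, loosely in the spirit of the Unix strings tool, that collects both plain ASCII runs and UTF-16LE runs (each printable character followed by a zero byte). The name extract_strings and the default minimum length of 4 are assumptions made for the example.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list:
    """Return printable ASCII and UTF-16LE strings found in raw binary data."""
    count = str(min_length).encode("ascii")
    printable = rb"[\x20-\x7e]"
    # Narrow strings: runs of printable bytes.
    narrow = re.compile(printable + b"{" + count + b",}")
    # Wide strings: printable byte followed by a zero byte, repeated.
    wide = re.compile(b"(?:" + printable + rb"\x00){" + count + b",}")
    found = [m.group().decode("ascii") for m in narrow.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide.finditer(data)]
    return found
```

In a hex or text editor the zero bytes show up as the dots between letters, which is why the string above appears as "S.o.f.t.w.a.r.e".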
At offset 0xB1B0 is "p.u.s.h.d" followed by mkdir, rmdir, chdir, md, rd, and cd; I left out the unprintable characters for readability. Those are all commands that cmd.exe understands.
For other programs, I've sometimes been able to find the path from which a program was compiled.
So, yes, it is possible to determine the source language from the binary.
I'm not a compiler hacker (someday, I hope), but I figure that you may be able to find telltale signs in a binary file that would indicate what compiler generated it and some of the compiler options used, such as the level of optimization specified.
Strictly speaking, however, what you're asking is impossible. It could be that somebody sat down with a pen and paper and worked out the binary codes corresponding to the program that they wanted to write, and then typed that stuff out in a hex editor. Basically, they'd be programming in assembly without the assembler tool. Similarly, you may never be able to tell with certainty whether a native binary was written in straight assembler or in C with inline assembly.
As for virtual machine environments such as JVM and .NET, you should be able to identify the VM by the byte codes in the binary executable, I would expect. However you may not be able to tell what the source language was, such as C# versus Visual Basic, unless there are particular compiler quirks that tip you off.
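Identifying the container format, at least, is usually a matter of checking the first few bytes. The following Python sketch uses a handful of well-known magic numbers; the function name identify_container and the returned labels are invented for this example, and it says nothing about the source language itself.

```python
def identify_container(data: bytes) -> str:
    """Guess the file format from the leading magic bytes."""
    if data.startswith(b"\xca\xfe\xba\xbe"):
        # The same four bytes also begin Mach-O universal binaries.
        return "Java class file (or Mach-O universal binary)"
    if data.startswith(b"PK\x03\x04"):
        return "ZIP archive (JAR files use this format)"
    if data.startswith(b"\x7fELF"):
        return "ELF executable or library"
    if data.startswith(b"MZ"):
        # .NET assemblies are PE files too; telling them apart needs the CLI header.
        return "DOS/Windows PE executable (native or .NET)"
    return "unknown"
```

Distinguishing C# from Visual Basic inside a .NET assembly would need a much closer look at the metadata and the generated IL, as the answer says.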
what about these tools:
PE Detective
PEiD
Both are PE identifiers. OK, they're both for Windows, but that's what the question was about when I landed here.
I expect you could if you disassemble the binary, or at least you may be able to identify the compiler, as not all compilers will use the same code for printf, for example, so Objective-C and GNU C should differ here.
You have excluded all byte-code languages so this issue is going to be less common than expected.
First, run the what command on some binaries and look at the output. CVS (and SVN) identifiers are scattered throughout the binary image, and most of those are from libraries.
Also, there's often a "map" to the various library functions. That's a big hint, also.
When the libraries are linked into the executable, there is often a map that's included in the binary file with names and offsets. It's part of creating "position independent code". You can't simply "hard-link" the various object files together. You need a map and you have to do some lookups when loading the binary into memory.
Finally, the start-up module for C, C++ (and I imagine C#) is unique to that compiler's default set of libraries.
Well, C is initially converted to ASM, so you could write all C code in ASM.
No, the bytecode is language agnostic. Different compilers could even take the same source code and generate different binaries. That's why you don't see general-purpose decompilers that will work on binaries.
The command 'strings' could be used to get some hints as to what language was used (for instance, I just ran it on the stripped binary for a C application I wrote and the first entries it finds are the libraries linked by the executable).
I think that Java executables (jar files) are trivial to decompile to get the source code.
What about other languages? .NET and all?
Which languages can only be compiled to code that is decompilable?
In general, languages like Java, C#, and VB.NET are relatively easy to decompile because they are compiled to an intermediary language, not pure machine language. In their IL form, they retain more metadata than C code does when compiled to machine language.
Technically you aren't getting the original source code out, but a variation on the source code that, when compiled, will give you the compiled code back. It isn't identical to the source code, as things like comments, annotations, and compiler directives usually aren't carried forward into the compiled code.
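Python is not one of the languages named above, but its bytecode makes a convenient stand-in to show the same effect: identifiers survive compilation, while comments do not. This is only an analogy sketched with the standard code-object attributes; Java class files and .NET assemblies have their own metadata formats, and the function names here are made up for the example.

```python
def surviving_metadata(func) -> dict:
    """Collect identifiers that remain in a function's compiled code object."""
    code = func.__code__
    return {
        "name": code.co_name,
        "arguments": code.co_varnames[: code.co_argcount],
        "locals": code.co_varnames[code.co_argcount :],
    }


def rectangle_area(width, height):
    # This comment exists only in the source text, not in the code object.
    area = width * height
    return area
```

Calling surviving_metadata(rectangle_area) recovers the function name, its argument names, and the local variable name, which is roughly why decompilers for bytecode formats can produce readable output.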
Managed languages can be decompiled easily because the executable must contain a lot of metadata to support reflection.
Languages like C++ can be compiled to native code. The program structure can be changed completely during the compilation/translation process.
The compiler can freely replace, merge, or delete parts of your code. There is no one-to-one relationship between the original and the compiled (native) code.
.NET is very easy to decompile. The best tool to do that would be .NET Reflector, recently acquired by Red Gate.
Most languages can be decompiled but some are easier to decompile than others. .Net and Java put more information about the original program in the executables (method names, variable names etc.) so you get more of your original information back.
C++, for example, will translate variables, functions, etc. to memory addresses (yeah, I know this is a gross simplification), so the decompiler won't know what things were called. But you can still get some of the structure of the program back.
VB6, if compiled to p-code, can also be decompiled to almost full source using P32Dasm. Flash (or ActionScript) can also be decompiled to full source using something like Flare.