Why can I see class/struct names in an .exe file compiled using Visual C++?

When looking at a compiled release .exe binary, I can find class/struct names in it! Which is odd - obviously there is no need for these symbols. What concerns me is that such symbols can be used for reverse-engineering my software, posing a big risk to software license protection.
For example, I can find the text .?AVCMySecureKeyManager (the original class name is CMySecureKeyManager; it looks like the prefix ".?AV" is added to all names). Easy to guess what my code is doing, right?.. Looks like an open door for hackers.
In particular, I can say that I've enabled all possible optimizations in the Visual C++ compiler/linker options and turned off all Browse/Debug Info generation. Perhaps I'm missing something?

You're seeing RTTI (Run-time Type Information). If you don't use dynamic_cast or typeid in your code, you can usually turn it off safely. Please note that exceptions always use RTTI (for the catch statement matching) and it's not possible to disable it for them.
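To see where that string comes from, here is a minimal sketch reusing the class name from the question (the exact output text and raw_name() are MSVC-specific):

```cpp
// rtti_demo.cpp - build with: cl /EHsc rtti_demo.cpp
// With RTTI enabled (/GR, the default), the compiler emits a type descriptor
// for polymorphic classes, and that descriptor contains the decorated name.
#include <iostream>
#include <typeinfo>

class CMySecureKeyManager {
public:
    virtual ~CMySecureKeyManager() = default;  // polymorphic, so RTTI applies
};

int main() {
    CMySecureKeyManager mgr;
    std::cout << typeid(mgr).name() << '\n';      // "class CMySecureKeyManager"
    std::cout << typeid(mgr).raw_name() << '\n';  // ".?AVCMySecureKeyManager@@" (MSVC extension)
}
```

If nothing in the program uses dynamic_cast or typeid, building with /GR- should keep most of these descriptor strings out of the image; as noted above, types that are thrown or caught as exceptions still get descriptors.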
If you do need dynamic_cast, then you can scrub the names from the EXE after compilation. The code does not depend on the actual name strings, but just their addresses.
That said, the class names, while useful, are not critical in reverse-engineering. Don't rely on their absence as a guarantee.

Related

How to Decompile Bytenode "jsc" files?

I've just seen this library, ByteNode; it's like Java bytecode, but for Node.js.
This library compiles your JavaScript code into V8 bytecode, which protects your source code. I'm wondering: is there any way to decompile ByteNode output, which would mean it's not secure enough? I'm asking because I would like to protect my source code using this library.
TL;DR It'll raise the bar for someone copying the code and trying to pass it off as their own. It won't prevent a dedicated person from doing so. But the primary way to protect your work isn't technical, it's legal.
This library compiles your JavaScript code into V8 bytecode, which protects your source code...
Well, we don't know it's V8 bytecode, but it's "compiled" in some sense. All we know is that it creates a "code cache" via the built-in vm.Script.prototype.createCachedData API, which is officially just a cache used to speed up recompiling the code a second time, third time, etc. In theory, you're supposed to also provide the original source code as a string to the vm.Script constructor. But if you go digging into Node.js's vm.Script and V8 far enough it seems to be the actual code in some compiled form (whether actual V8 bytecode or not), and the code string you give it when running is ignored. (The ByteNode library provides a dummy string when running the code from the code cache, so clearly the actual code isn't [always?] needed.)
I'm wondering: is there any way to decompile ByteNode output, which would mean it's not secure enough?
Naturally, otherwise it would be useless because Node.js wouldn't be able to run it. I didn't find an existing tool to do it, but since V8 is open source, it would presumably be possible to find the necessary information and write a decompiler for it that outputs valid JavaScript source code, which someone could then try to understand.
Experimenting with it, local variable names appear to be lost, although function names aren't. Comments also appear to be lost (which may not be as obvious as it seems, given that Function.prototype.toString is required to return either the original source text or a synthetic version [details]).
So if you run the code through a minifier (particularly one that renames functions), then run it through ByteNode (or just do it with vm.Script yourself; ByteNode is a fairly thin wrapper), it will be feasible for someone to decompile it into something resembling source code, but that source code will be very hard to understand. This is very similar to shipping Java class files, which can be decompiled (the JDK even ships a standard tool, javap, that disassembles them), except that the format of Java class files is well-documented and doesn't change from one dot release to the next (though it can change from one major release to another; newer releases always support the older format), whereas the format of this data is not documented (though it's an open-source project) and is subject to change from one dot release to the next.
Certain changes, such as changing the copyright message, are probably fairly easy to make to said source code. More meaningful changes will be harder.
Note that the code cache appears to have a checksum or other similar integrity mechanism, since directly editing the .jsc file to swap one letter for another in a literal string makes the code cache fail to load. So someone tampering with it (for instance, to change a copyright notice) would either need to go the decompilation/recompilation route, or dive into the V8 source to find out how to correct the integrity check.
Fundamentally, the way to protect your work is to ensure that you've put all the relevant notices in the relevant places, so that it is clear that copying it is a violation of copyright, and then to pursue your legal recourse should you find out about someone passing it off as their own.
is there any way
You could get a hundred answers here saying "I don't know a way", but that still won't guarantee that there isn't one.
not secure enough
Secure enough for what? What's your deployment scenario? What kind of scenario/attack are you trying to defend against?
FWIW, I don't know of an existing tool that "decompiles" V8 bytecode (i.e. produces JavaScript source code with the same behavior). That said, considering that the bytecode is a fairly straightforward translation of the source code, I'm sure it wouldn't be very hard to write such a tool, if someone had a reason to spend some time on it. After all, V8's JS-to-bytecode compiler is open source, so one would only have to look at those sources and implement the reverse direction. So I would assume that shipping as bytecode provides about as much "protection" as shipping as uglified JavaScript, i.e. none that I would trust.
Before you make any decisions, please also keep in mind that bytecode is considered an internal implementation detail of V8; in particular it is not versioned and can change at any time, so it has to be created by exactly the same V8 version that consumes it. If you want to update your Node.js you'll have to recreate all the bytecode, and there is no checking or warning in place that will point out when you forgot to do that.
The Node.js source already contains code for turning binary bytecode into a textual listing.
You can get a text dump of your V8 bytecode and then analyze it.
But that text dump is very long and is missing some important information, such as the constant pool, so you need to modify the Node.js source.
Please check https://github.com/3DGISKing/pkg10.17.0
I have attached an exported XML file there.
If you study V8, it should be possible to analyze that output and recover source code from it.
Keeping it short and sweet: you can try the Ghidra node.js package, which is based on the Ghidra reverse-engineering framework open-sourced by the NSA in 2019. Ghidra is capable of disassembling and decompiling V8 bytecode. The inner workings of the disassembly are quite complex; this answer is short but sufficient.

How to obfuscate the strings of variable, function and package names in a Golang binary?

When I use the command "nm go_binary", I find that the names of variables, functions and packages, and even the directory where my code is located, are all displayed. Is there any way to obfuscate the binary generated by "go build" and prevent the Go binary from being exploited by hackers?
Obfuscation can't stop reverse engineering, but it does in a way prevent information leakage.
That is what burrowers/garble (Go 1.16+, Feb. 2021) does:
Literal obfuscation
Using the -literals flag causes literal expressions such as strings to be replaced with more complex variants, resolving to the same value at run-time.
This feature is opt-in, as it can cause slow-downs depending on the input code.
Literal expressions used as constants cannot be obfuscated, since they are resolved at compile time. This includes any expressions part of a const declaration.
Tiny mode
When the -tiny flag is passed, extra information is stripped from the resulting Go binary.
This includes line numbers, filenames, and code in the runtime that prints panics, fatal errors, and trace/debug info.
All in all this can make binaries 2-5% smaller in our testing, as well as prevent extracting some more information.
With this flag, no panics or fatal runtime errors will ever be printed, but they can still be handled internally with recover as normal.
In addition, the GODEBUG environment variable will be ignored.
But:
Exported methods are never obfuscated at the moment, since they could be required by interfaces and reflection. This area is a work in progress.
I think the best answer to this question is here How do I protect Python code?, specifically this answer.
While that question is about Python, it applies to all code in general.
I was gonna mark this question as a duplicate, but maybe someone will provide more insight into it.

How to restore an accidentally overwritten source file using the object file

By mistake, I erased the contents of my Fortran source file with a command involving ">":
some command > file.f
I do not use version control or anything. However, there is an object file present, file.o, if that may be of any help.
Is there a chance to restore the contents of file.f?
There may be decompiler tools that can produce Fortran source code from compiled object code, but it's not the original source code: things like comments and local variable names are discarded during the compilation process and are not present in the object file, so they can't be recovered. The structure of the decompiled code is likely to be different as well, especially if the object file was compiled with optimization.
You're not going to get your original code back from an object file, unfortunately.
Decompilation will work fine with bytecode languages like Java which are more or less "designed for that purpose".
With an optimizing compiler, such as Fortran (or C, or C++) you are pretty much out of luck.
Tools exist that restore some kind of source file (such as "boomerang"), but the result will be nowhere near the original, and usually it is a waste of time even trying.
Given the nature of the compilation process, it is often not even possible to reverse the operation. Not only is information such as variable names and the names of non-exported functions (and of course comments) discarded, and constants replaced with magic numbers, but the compiled program may also have an entirely different structure than the code that you have written.
Compilers regularly perform optimizations like moving invariants out of loops, rearranging statements, or eliminating common subexpressions (even when optimizations are not explicitly enabled, most compilers do trivial optimizations anyway).
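To make that concrete, here is a small C++ illustration (the same idea applies to Fortran); a decompiler can only ever reconstruct something shaped like the second function, because the first form no longer exists in the object code:

```cpp
// What you write: "scale * 2.0" is recomputed on every iteration as written.
void apply(double* out, const double* in, int n, double scale) {
    for (int i = 0; i < n; ++i)
        out[i] = in[i] + scale * 2.0;
}

// What an optimizing compiler effectively produces: the loop-invariant
// subexpression is hoisted out of the loop and computed once.
void apply_as_compiled(double* out, const double* in, int n, double scale) {
    const double k = scale * 2.0;
    for (int i = 0; i < n; ++i)
        out[i] = in[i] + k;
}
```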
A compiler is only required to produce something that, observed from the outside, behaves "as if" it were your program; it need not be structurally equivalent to the source code that you have written.
A similar phenomenon exists when stepping through a program in a debugger. Sometimes, variables cannot be watched, or you cannot break on a particular line, and entire statements will apparently just be "gone" much to the surprise of the unaware developer because the compiler optimized them out.
In summary, the single best advice that I can give, unhelpful as it may be, is to acknowledge that you have done something stupid, rewrite the source file from scratch, and start using a version control system now.

Preprocess only local #includes into single file?

I understand VC++ will let you emit C++ source files which are the result of the preprocessor operations, e.g. macros are expanded and includes are "copy-pasted inline".
Is it possible to restrict this to embedding only the included files that are in my own project, rather than standard library headers?
From the outside there's no way to tell which syntax form (<> or "") a piece of included content came from, unless the preprocessor exposed some kind of API for it, which is not the case here.
A not-so-elegant (and not strictly correct) solution I could propose would be to index a preprocessed version of all the standard headers (there are not that many), and after preprocessing the source of interest, run a string-matching script to detect the known files and remove the corresponding content from the final output.
Notice this is subject to flaws, because the #include system is purely textual and is influenced by whatever macros are (un)defined at the time of inclusion, and order matters. But depending on the complexity of the code you're working on, this might give reasonable results.
By the way, may I ask what is the ultimate goal of your task?
Edit: Or actually... maybe you could filter the sources beforehand to remove the undesired #includes and then submit the result for preprocessing, as in the sketch below?
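As a rough sketch of that last idea (the file handling and naming here are my own assumptions, and it deliberately ignores corner cases such as #includes inside comments or conditional blocks): comment out the angle-bracket includes, run the compiler's preprocessor (e.g. cl /P /C) on the filtered copy so only the quoted, project-local includes get expanded, then restore the commented lines afterwards.

```cpp
// filter_includes.cpp - comment out system (<...>) includes so a later
// "cl /P /C filtered.cpp" expands only the quoted, project-local includes.
// The "// FILTERED:" markers survive preprocessing (thanks to /C, which
// keeps comments) and can be turned back into #include lines afterwards.
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char** argv) {
    if (argc != 3) {
        std::cerr << "usage: filter_includes <in.cpp> <out.cpp>\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    std::ofstream out(argv[2]);
    if (!in || !out) {
        std::cerr << "cannot open input or output file\n";
        return 1;
    }
    const std::regex systemInclude(R"(^\s*#\s*include\s*<[^>]+>)");
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_search(line, systemInclude))
            out << "// FILTERED: " << line << '\n';  // hide from the preprocessor
        else
            out << line << '\n';
    }
}
```

The obvious limitation is that system headers pulled in indirectly by your own headers would still be expanded, so those headers would need the same treatment.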

Determine source language from a binary?

I responded to another question about developing for the iPhone in non-Objective-C languages, and I made the assertion that using, say, C# to write for the iPhone would strike an Apple reviewer wrong. I was speaking largely about UI elements differing between the ObjC and C# libraries in question, but a commenter made an interesting point, leading me to this question:
Is it possible to determine the language a program is written in, solely from its binary? If there are such methods, what are they?
Let's assume for the purposes of the question:
That from an interaction standpoint (console behavior, any GUI appearance, etc.) the two are identical.
That performance isn't a reliable indicator of language (no comparing, say, Java to C).
That you don't have an interpreter or something between you and the language - just raw executable binary.
Bonus points if you're as language-agnostic as possible.
Short answer: YES
Long answer:
If you look at a binary, you can find the names of the libraries that have been linked in. Opening cmd.exe in TextPad easily finds the following at hex offset 0x270: msvcrt.dll, KERNEL32.dll, NTDLL.DLL, USER32.dll, etc. msvcrt is the Microsoft 'C' runtime support functions. KERNEL32, NTDLL, and USER32.dll are OS specific libraries which tell you either the target platform, or the platform on which it was built, depending on how well the cross-platform development environment segregates the two.
Setting aside those clues, almost any C/C++ compiler will have to insert the names of functions into the binary; there is a list of all functions (or entry points) stored in a table. C++ 'mangles' the function names to encode the arguments and their types, to support overloaded methods. It is possible to obfuscate the function names, but they would still exist. The function signatures would include the number and types of the arguments, which can be used to trace into the system or internal calls used in the program. At offset 0x4190 is "SetThreadUILanguage", which can be searched for to find out a lot about the development environment. I found the entry-point table at offset 0x1ED8A. I could easily see names like printf, exit, and scanf, along with __p__fmode, __p__commode, and __initenv.
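As a small, hypothetical illustration of that mangling (separate from the cmd.exe exploration):

```cpp
// mangling_demo.cpp - compile with: cl /c mangling_demo.cpp
// then inspect the symbol table with: dumpbin /symbols mangling_demo.obj
// MSVC records the two overloads under decorated names roughly like
//   ?add@@YAHHH@Z   for int    add(int, int)
//   ?add@@YANNN@Z   for double add(double, double)
// and undname.exe can turn such a name back into the full C++ signature.
// Functions marked __declspec(dllexport) carry their decorated names into
// the final DLL/EXE export table, too.
__declspec(dllexport) int add(int a, int b) { return a + b; }
__declspec(dllexport) double add(double a, double b) { return a + b; }
```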
Any executable for the x86 processor will have a data segment which will contain any static text that was included in the program. Back to cmd.exe (offset 0x42C8) is the text "S.o.f.t.w.a.r.e..P.o.l.i.c.i.e.s..M.i.c.r.o.s.o.f.t..W.i.n.d.o.w.s..S.y.s.t.e.m.". The string takes twice as many characters as is normally necessary because it was stored using double-wide characters, probably for internationalization. Error codes or messages are a prime source here.
At offset B1B0 is "p.u.s.h.d" followed by mkdir, rmdir, chdir, md, rd, and cd; I left out the unprintable characters for readability. Those are all command arguments to cmd.exe.
For other programs, I've sometimes been able to find the path from which a program was compiled.
So, yes, it is possible to determine the source language from the binary.
I'm not a compiler hacker (someday, I hope), but I figure that you may be able to find telltale signs in a binary file that would indicate what compiler generated it and some of the compiler options used, such as the level of optimization specified.
Strictly speaking, however, what you're asking is impossible. It could be that somebody sat down with a pen and paper and worked out the binary codes corresponding to the program that they wanted to write, and then typed that stuff out in a hex editor. Basically, they'd be programming in assembly without the assembler tool. Similarly, you may never be able to tell with certainty whether a native binary was written in straight assembler or in C with inline assembly.
As for virtual machine environments such as JVM and .NET, you should be able to identify the VM by the byte codes in the binary executable, I would expect. However you may not be able to tell what the source language was, such as C# versus Visual Basic, unless there are particular compiler quirks that tip you off.
What about these tools:
PE Detective
PEiD
Both are PE identifiers. OK, they're both for Windows, but that was my situation when I landed here.
I expect you could, if you disassemble the binary, or at least you may be able to identify the compiler, as not all compilers will use the same code for printf, for example, so Objective-C and GNU C should differ here.
You have excluded all byte-code languages so this issue is going to be less common than expected.
First, run the "what" command on some binaries and look at the output. CVS (and SVN) identifiers are scattered throughout the binary image, and most of those are from libraries.
Also, there's often a "map" to the various library functions. That's a big hint, also.
When the libraries are linked into the executable, there is often a map that's included in the binary file with names and offsets. It's part of creating "position independent code". You can't simply "hard-link" the various object files together. You need a map and you have to do some lookups when loading the binary into memory.
Finally, the start-up module for C, C++ (and I imagine C#) is unique to that compiler's default set of libraries.
Well, C is initially converted to ASM, so you could write all C code in ASM.
No, the binary code is language-agnostic. Different compilers could even take the same source code and generate different binaries. That's why you don't see general-purpose decompilers that will work on binaries.
The command 'strings' could be used to get some hints as to what language was used (for instance, I just ran it on the stripped binary for a C application I wrote and the first entries it finds are the libraries linked by the executable).
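The heart of what 'strings' (and the manual inspection described above) does is tiny; here is a minimal sketch that scans any file for runs of printable ASCII, which is enough to surface linked library names, format strings, and build paths (the double-wide text mentioned earlier would need a UTF-16-aware variant):

```cpp
// ministrings.cpp - a minimal clone of the 'strings' approach: print every
// run of at least minLen printable ASCII characters found in a file.
#include <cctype>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "usage: ministrings <binary> [minLen]\n";
        return 1;
    }
    const std::size_t minLen = (argc > 2) ? std::stoul(argv[2]) : 4;
    std::ifstream file(argv[1], std::ios::binary);
    if (!file) {
        std::cerr << "cannot open " << argv[1] << '\n';
        return 1;
    }
    std::string run;
    char c;
    while (file.get(c)) {
        if (std::isprint(static_cast<unsigned char>(c))) {
            run += c;                       // extend the current printable run
        } else {
            if (run.size() >= minLen)
                std::cout << run << '\n';   // emit runs long enough to be interesting
            run.clear();
        }
    }
    if (run.size() >= minLen)
        std::cout << run << '\n';
}
```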
