How to restore an accidentally overwritten source file using the object file - linux

By mistake, I erased contents of my Fortran source file with a command involving ">":
some command > file.f
I do not use version control or anything. However, there is an object file present, file.o, if that may be of any help.
Is there a chance to restore the contents of file.f?

There may be decompiler tools that can produce Fortran source code from compiled object code, but it's not the original source code: things like comments and local variable names are discarded during the compilation process and are not present in the object file, so they can't be recovered. The structure of the decompiled code is likely to be different as well, especially if the object file was compiled with optimization.
You're not going to get your original code back from an object file, unfortunately.

Decompilation will work fine with bytecode languages like Java which are more or less "designed for that purpose".
With an optimizing compiler, such as Fortran (or C, or C++) you are pretty much out of luck.
Tools exist that restore some kind of source file (such as "boomerang"), but it will be nowhere near the original, and usually it is a waste of time even trying.
Given the nature of the compilation process, it is often not even possible to reverse the operation. Not only is information such as variable names or the names of non-exported functions (and of course comments) discarded and constants are replaced with magic numbers, but also the compiled program may have an entirely different structure than the code that you have written.
Compilers regularly perform optimizations like moving invariants out of loops, rearranging statements, or eleminating common subexpressions (even when optimizations are not explicitly enabled, most compilers do trivial optimizations anyway).
A compiler is required to produce something that behaves "as if" as observed from the outside, but not something that is necessarily equivalent to the source code that you have written.
A similar phenomenon exists when stepping through a program in a debugger. Sometimes, variables cannot be watched, or you cannot break on a particular line, and entire statements will apparently just be "gone" much to the surprise of the unaware developer because the compiler optimized them out.
In summary, the single best advice that I can give, unhelpful as it may be, is to acknowledge that you have done something stupid, rewrite the source file from scratch, and start using a version control system now.

Related

What is wrong with self-modifying codes with static-recompilation emulations?

I was searching for writing an emulator, and its techniques. But following paragraph made me wondered, I think I couldn't figure out which scenario can be present, if you write a self-modifying code to be static-recompilation emulated.
In this technique, you take a program written in the emulated code and attempt to translate it into the assembly code of your computer. The result will be a usual executable file which you can run on your computer without any special tools. While static recompilation sounds very nice, it is not always possible. For example, you cannot statically recompile self-modifying code as there is no way to tell what it will become without running it. To avoid such situations, you may try combining static recompiler with an interpreter or a dynamic recompiler.
Here is what I was reading, and this line made me wondered:
For example, you cannot statically recompile self-modifying code as there is no way to tell what it will become without running it
A good explanation with examples will be so instructive, thanks.
Edit: By the way, I know the meaning of self-modifying, I just wonder what problems and where will we get problems after statically-recompilation, which thing will make our self-modifying code broken.
Self-modifying code heavily relies on the instruction set encoding of the original CPU. For example, it could flip some bits in a specific memory location to turn one instruction into another. With static recompilation, flipping those same bits will have an entirely different effect since the instructions are encoded completely differently for the host CPU.

How to Decompile Bytenode "jsc" files?

I've just seen this library ByteNode it's the same as ByteCode of java but this is for NodeJS.
This library compiles your JavaScript code into V8 bytecode, which protect your source code, I'm wondering is there anyway to Decompile byteNode therefore it's not secure enough. I'm wondering because I would like to protect my source code using this library?
TL;DR It'll raise the bar to someone copying the code and trying to pass it off as their own. It won't prevent a dedicated person from doing so. But the primary way to protect your work isn't technical, it's legal.
This library compiles your JavaScript code into V8 bytecode, which protect your source code...
Well, we don't know it's V8 bytecode, but it's "compiled" in some sense. All we know is that it creates a "code cache" via the built-in vm.Script.prototype.createCachedData API, which is officially just a cache used to speed up recompiling the code a second time, third time, etc. In theory, you're supposed to also provide the original source code as a string to the vm.Script constructor. But if you go digging into Node.js's vm.Script and V8 far enough it seems to be the actual code in some compiled form (whether actual V8 bytecode or not), and the code string you give it when running is ignored. (The ByteNode library provides a dummy string when running the code from the code cache, so clearly the actual code isn't [always?] needed.)
I'm wondering is there anyway to Decompile byteNode therefore it's not secure enough.
Naturally, otherwise it would be useless because Node.js wouldn't be able to run it. I didn't find a tool to do it that already exists, but since V8 is open source, it would presumably be possible to find the necessary information to write a decompiler for it that outputs valid JavaScript source code which someone could then try to understand.
Experimenting with it, local variable names appear to be lost, although function names don't. Comments appear to get lost (this may not be as obvious as it seems, given that Function.prototype.toString is required to either return the original source text or a synthetic version [details]).
So if you run the code through a minifier (particularly one that renames functions), then run it through ByteNode (or just do it with vm.Script yourself, ByteNode is a fairly thin wrapper), it will be feasible for someone to decompile it into something resembling source code, but that source code will be very hard to understand. This is very similar to shipping Java class files, which can be decompiled (there's even a standard tool to do it in the JDK, javap), except that the format Java class files are well-documented and don't change from one dot release to the next (though they can change from one major release to another; new releases always support the older format, though), whereas the format of this data is not documented (though it's an open source project) and is subject to change from one dot release to the next.
Certain changes, such as changing the copyright message, are probably fairly easy to make to said source code. More meaningful changes will be harder.
Note that the code cache appears to have a checksum or other similar integrity mechanism, since directly editing the .jsc file to swap one letter for another in a literal string makes the code cache fail to load. So someone tampering with it (for instance, to change a copyright notice) would either need to go the decompilation/recompilation route, or dive into the V8 source to find out how to correct the integrity check.
Fundamentally, the way to protect your work is to ensure that you've put all the relevant notices in the relevant places such that the fact copying it is a violation of copyright is clear, then pursue your legal recourse should you find out about someone passing it off as their own.
is there any way
You could get a hundred answers here saying "I don't know a way", but that still won't guarantee that there isn't one.
not secure enough
Secure enough for what? What's your deployment scenario? What kind of scenario/attack are you trying to defend against?
FWIW, I don't know of an existing tool that "decompiles" V8 bytecode (i.e. produces JavaScript source code with the same behavior). That said, considering that the bytecode is a fairly straightforward translation of the source code, I'm sure it wouldn't be very hard to write such a tool, if someone had a reason to spend some time on it. After all, V8's JS-to-bytecode compiler is open source, so one would only have to look at those sources and implement the reverse direction. So I would assume that shipping as bytecode provides about as much "protection" as shipping as uglified JavaScript, i.e. none that I would trust.
Before you make any decisions, please also keep in mind that bytecode is considered an internal implementation detail of V8; in particular it is not versioned and can change at any time, so it has to be created by exactly the same V8 version that consumes it. If you want to update your Node.js you'll have to recreate all the bytecode, and there is no checking or warning in place that will point out when you forgot to do that.
Node.js source already contains code for decompiling binary bytecode.
You can get a text string from your V8 bytecode and then you would need to analyze it.
But text string would be very long and miss some important information such as a constant pool. So you need to modify the Node.js source.
Please check https://github.com/3DGISKing/pkg10.17.0
I have attached exported xml file.
If you study V8, it would be possible to analyze it and get source code from it.
It keeping it short and sweet, You can try Ghidra node.js package which is based on Ghidra reverse engineering framework which was open-sourced by NSA in the year 2019. Ghidra is capable of disassembling and decompiling the v8 bytecode. The inner working of disassembling is quite complex, this answer is short but sufficient.

Relation between MSVC Compiler & linker option for COMDAT folding

This question has some answers on SO but mine is slightly different. Before marking as duplicate, please give it a shot.
MSVC has always provided the /Gy compiler option to enable identical functions to be folded into COMDAT sections. At the same time, the linker also provides the /OPT:ICF option. Is my understanding right that these two options must be used in conjunction? That is, while the former packages functions into COMDAT, the latter eliminates redundant COMDATs. Is that correct?
If yes, then either we use both or turn off both?
Answer from someone who communicated with me off-line. Helped me understand these options a lot better.
===================================
That is essentially true. Suppose we talk just C, or C++ but with no member functions. Without /Gy, the compiler creates object files that are in some sense irreducible. If the linker wants just one function from the object, it gets them all. This is specially a consideration in programming for libraries, such that if you mean to be kind to the library's users, you should write your library as lots of small object files, typically one non-static function per object, so that the user of the library doesn't bloat from having to carry code that actually never executes.
With /Gy, the compiler creates object files that have COMDATs. Each function is in its own COMDAT, which is to some extent a mini-object. If the linker wants just one function from the object, it can pick out just that one. The linker's /OPT switch gives you some control over what the linker does with this selectivity - but without /Gy there's nothing to select.
Or very little. It's at least conceivable that the linker could, for instance, fold functions that are each the whole of the code in an object file and happen to have identical code. It's certainly conceivable that the linker could eliminate a whole object file that contains nothing that's referenced. After all, it does this with object files in libraries. The rule in practice, however, used to be that if you add a non-COMDAT object file to the linker's command line, then you're saying you want that in the binary even if unreferenced. The difference between what's conceivable and what's done is typically huge.
Best, then, to stick with the quick answer. The linker options benefit from being able to separate functions (and variables) from inside each object file, but the separation depends on the code and data to have been organised into COMDATs, which is the compiler's work.
===================================
As answered by Raymond Chen in Jan 2013
As explained in the documentation for /Gy, function-level linking
allows functions to be discardable during the "unused function" pass,
if you ask for it via /OPT:REF. It does not alter the actual classical
model for linking. The flag name is misleading. It's not "perform
function-level linking". It merely enables it by telling the linker
where functions begin and end. And it's not so much function-level
linking as it is function-level unlinking. -Raymond
(This snippet might make more sense with some further context:here are the posts about classical linking model:1, 2
So in a nutshell - yes. If you activate one switch without the other, there would be no observable impact.

How to obfuscate string of variable, function and package names in Golang binary?

When use command "nm go_binary", I find the names of variables, functions and packages and even the directory where my code is located are all displayed, is there any way to obfuscate the binary generated by the command "go build" and prevent go binary from being exploited by hackers?
Obfuscating can't stop reverse engineering but in a way prevent info leakage
That is what burrowers/garble (Go 1.16+, Feb. 2021):
Literal obfuscation
Using the -literals flag causes literal expressions such as strings to be replaced with more complex variants, resolving to the same value at run-time.
This feature is opt-in, as it can cause slow-downs depending on the input code.
Literal expressions used as constants cannot be obfuscated, since they are resolved at compile time. This includes any expressions part of a const declaration.
Tiny mode
When the -tiny flag is passed, extra information is stripped from the resulting Go binary.
This includes line numbers, filenames, and code in the runtime that prints panics, fatal errors, and trace/debug info.
All in all this can make binaries 2-5% smaller in our testing, as well as prevent extracting some more information.
With this flag, no panics or fatal runtime errors will ever be printed, but they can still be handled internally with recover as normal.
In addition, the GODEBUG environmental variable will be ignored.
But:
Exported methods are never obfuscated at the moment, since they could be required by interfaces and reflection. This area is a work in progress.
I think the best answer to this question is here How do I protect Python code?, specifically this answer.
While that question is about Python, it applies to all code in general.
I was gonna mark this question as a duplicate, but maybe someone will provide more insight into it.

How to inspect Haskell bytecode

I am trying to figure out a bug (a serious performance downgrade). Unfortunately, I wasn't able to figure out why by going back many different versions of my code.
I am suspecting it could be some modifications to libraries that I've updated, not to mention in the meanwhile I've updated to GHC 7.6 from 7.4 (and if anybody knows if some laziness behavior has changed I would greatly appreciate it!).
I have an older executable of this code that does not have this bug and thus I wonder if there are any tools to tell me the library versions I was linking to from before? Like if it can figure out the symbols, etc.
GHC creates executables, which are notoriously hard to understand... On my Linux box I can view the assembly code by typing in
objdump -d <executable filename>
but I get back over 100K lines of code from just a simple "Hello, World!" program written in Haskell.
If you happen to have the GHC .hi files, you can get some information about the executable by typing in
ghc --show-iface <hi filename>
This won't give you the assembly code, but you can get some extra information that may prove useful.
As I mentioned in the comment above, on Linux you can use "ldd" to see what C-system libraries you used in the compile, but that is also probably less than useful.
You can try to use a disassembler, but those are generally written to disassemble to C, not anything higher level and certainly not Haskell. That being said, GHC compiles to C as an intermediary (at least it used to; has that changed?), so you might be able to learn something.
Personally I often find view system calls in action much more interesting than viewing pure assembly. On my Linux box, I can view all system calls by running using strace (use Wireshark for the network traffic equivalent):
strace <program executable>
This also will generate a lot of data, so it might only be useful if you know of some specific place where direct real world communication (i.e., changes to a file on the hard disk drive) goes wrong.
In all honesty, you are probably better off just debugging the problem from source, although, depending on the actual problem, some of these techniques may help you pinpoint something.
Most of these tools have Mac and Windows equivalents.
Since much has changed in the last 9 years, and apparently this is still the first result a search engine gives on this question (like for me, again), an updated answer is in order:
First of all, yes, while Haskell does not specify a bytecode format, bytecode is also just a kind of machine code, for a virtual machine. So for the rest of the answer I will treat them as the same thing. The “Core“ as well as the LLVM intermediate language, or even WASM could be considered equivalent too.
Secondly, if your old binary is statically linked, then of course, no matter the format your program is in, no symbols will be available to check out. Because that is what linking does. Even with bytecode, and even with just classic static #include in simple languages. So your old binary will be no good, no matter what. And given the optimisations compilers do, a classic decompiler will very likely never be able to figure out what optimised bits used to be partially what libraries. Especially with stream fusion and such “magic”.
Third, you can do the things you asked with a modern Haskell program. But you need to have your binaries compiled with -dynamic and -rdynamic, So not only the C-calling-convention libraries (e.g. .so), and the Haskell libraries, but also the runtime itself is dynamically loaded. That way you end up with a very small binary, consisting of only your actual code, dynamic linking instructions, and the exact data about what libraries and runtime were used to build it. And since the runtime is compiler-dependent, you will know the compiler too. So it would give you everything you need, but only if you compiled it right. (I recommend using such dynamic linking by default in any case as it saves memory.)
The last factor that one might forget, is that even the exact same compiler version might behave vastly differently, depending on what IT was compiled with. (E.g. if somebody put a backdoor in the very first version of GHC, and all GHCs after that were compiled with that first GHC, and nobody ever checked, then that backdoor could still be in the code today, with no traces in any source or libraries whatsoever. … Or for a less extreme case, that version of GHC your old binary was built with might have been compiled with different architecture options, leading to it putting more optimised instructions into the binaries it compiles for unless told to cross-compile.)
Finally, of course, you can profile even compiled binaries, by profiling their system calls. This will give you clues about which part of the code acted differently and how. (E.g. if you notice that your new binary floods the system with some slow system calls where the old one just used a single fast one. A classic OpenGL example would be using fast display lists versus slow direct calls to draw triangles. Or using a different sorting algorithm, or having switched to a different kind of data structure that fits your work load badly and thrashes a lot of memory.)

Resources