How to obfuscate the names of variables, functions and packages in a Golang binary?

When I run the command "nm go_binary", I find that the names of variables, functions and packages, and even the directory where my code is located, are all displayed. Is there any way to obfuscate the binary generated by the command "go build" and prevent the Go binary from being exploited by hackers?

Obfuscation can't stop reverse engineering, but it can limit information leakage.
That is what burrowers/garble (Go 1.16+, Feb. 2021) does:
Literal obfuscation
Using the -literals flag causes literal expressions such as strings to be replaced with more complex variants, resolving to the same value at run-time.
This feature is opt-in, as it can cause slow-downs depending on the input code.
Literal expressions used as constants cannot be obfuscated, since they are resolved at compile time. This includes any expressions part of a const declaration.
Tiny mode
When the -tiny flag is passed, extra information is stripped from the resulting Go binary.
This includes line numbers, filenames, and code in the runtime that prints panics, fatal errors, and trace/debug info.
All in all this can make binaries 2-5% smaller in our testing, as well as prevent extracting some more information.
With this flag, no panics or fatal runtime errors will ever be printed, but they can still be handled internally with recover as normal.
In addition, the GODEBUG environmental variable will be ignored.
But:
Exported methods are never obfuscated at the moment, since they could be required by interfaces and reflection. This area is a work in progress.

I think the best answer to this question is under "How do I protect Python code?", specifically this answer.
While that question is about Python, it applies to all code in general.
I was gonna mark this question as a duplicate, but maybe someone will provide more insight into it.

Related

Compile procedural macro code or stack values into executable?

I'm writing procedural macros. Let's say I have a macro to replace every instance of my password in my code with its hash. I don't want it to be included in the executable or in the memory in run time.
Are the code and stack used at compile time added into the binary file? I'm pretty sure they're not but I couldn't find an explicit answer.
The general question is "Is there a possibility that memory from the compilation environment ends up in the final executable?".
The practical answer is No, especially related to the compiler's stack frames or memory that was used in a proc-macro (which is run in a separate process).
The answer, as far as security is concerned, is definitely Yes. For one, the compiler makes no guarantees that it does not include part of its general environment in the final executable. For example, people have been surprised that their username gets included in the executable because a dependency generates error messages that refer to the source file where the error originated (error at /home/myusername/foo/src/...). Second, the compiler can't control what the operating system is doing, so part of its own behavior (again, with respect to security guarantees) is implementation-defined there.
So, if you absolutely must guarantee that your token is not included in the final executable, hash it before even calling the compiler process in the first place.
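As a minimal sketch of "hash it before calling the compiler", a build script could compute the digest itself and hand only that digest to the build. Everything named here is an assumption for illustration: SHA-256 stands in for whatever hash the macro would have used, and `hash_token`, `build_environment` and the `TOKEN_SHA256` variable are hypothetical.

```python
import hashlib


def hash_token(token: str) -> str:
    """Return the SHA-256 hex digest of the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def build_environment(token: str, base_env: dict) -> dict:
    """Copy base_env and add only the digest, never the token itself."""
    env = dict(base_env)
    env["TOKEN_SHA256"] = hash_token(token)  # hypothetical variable name
    return env


# Example: the plain token stays in this script; only the digest goes on.
env = build_environment("hunter2", {"PATH": "/usr/bin"})
```

The resulting `env` could then be passed to the process that launches the compiler, so the compiler and any proc-macro only ever see the digest.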

Create a C and C++ preprocessor using ANTLR

I want to create a tool that can analyze C and C++ code and detect unwanted behaviors, based on a config file. I thought about using ANTLR for this task, as I already created a simple compiler with it from scratch a few years ago (variables, conditions, loops, and functions).
I grabbed C.g4 and CPP14.g4 from the ANTLR grammars repository. However, I came to notice that they don't support parsing preprocessor directives, as that's a different step in the compilation.
I tried to find a grammar that does the pre-processing part (updated to ANTLR4) with no luck. Moreover, I also understood that if I go with two-step parsing I won't be able to retain the original locations of each character, as I'd have already modified the input stream.
I wonder if there's a good ANTLR grammar or program (preferably Python, but I can deal with other languages as well) that can help me to pre-process the C code. I also thought about using gcc -E, but then I won't be able to inspect the macro definitions (for example, I want to warn if a user used a #pragma GCC; some students at my university, for whom I am writing this program, used this to bypass some of the course coding style restrictions). Moreover, gcc -E will include library header contents, which I don't want to process.
My question is, therefore, whether you can recommend a grammar/program that I can use to pre-process C and C++ code. Alternatively, if you can guide me on how to create a grammar myself, that'd be perfect. I was able to write the basic #define, #pragma etc. processing, but I'm unsure how to deal with conditions and with function-like macros.
Thanks in advance!
This question is almost off-topic as it asks for an external resource. However, it also bears a part that deserves some attention.
The term "preprocessor" already indicates what the handling of macros etc. is about. The parser never sees the disabled parts of the input, which also means it can be anything, which might not be part of the actual language to parse. Hence a good approach for parsing C-like languages is to send the input through a preprocessor (which can be a specialized input stream) to strip out all preprocessing constructs, to resolve macros and remove disabled text. The parse position is not a problem, because you can push the current token position before you open a new input stream and restore that when you are done with it. Store reported errors together with your input stream stack. This way you keep the correct token positions. I have used exactly this approach in my Windows resource file parser.
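A rough sketch of the idea of removing disabled text without losing positions is given here. It is not the input stream stack described above but a simpler stand-in: each surviving line carries its original line number. The function name `preprocess` is hypothetical, and the sketch assumes directives are written without a space after `#`; it handles only `#define`, `#undef`, `#ifdef`, `#ifndef`, `#else`, `#endif` and `#pragma`, with no macro expansion and no `#if` expressions.

```python
def preprocess(source, defined=()):
    """Strip simple conditional blocks while remembering original line numbers.

    Returns (kept, pragmas): kept is a list of (line_number, text) pairs for
    the lines a parser should see; pragmas lists every active #pragma line.
    """
    macros = set(defined)
    stack = []  # one bool per open #ifdef/#ifndef: is that branch active?
    kept, pragmas = [], []
    for number, line in enumerate(source.splitlines(), start=1):
        words = line.strip().split()
        active = all(stack)  # a line is live only if every open branch is
        if not words or not words[0].startswith("#"):
            if active:
                kept.append((number, line))
            continue
        directive = words[0]
        if directive in ("#ifdef", "#ifndef"):
            present = words[1] in macros
            stack.append(present if directive == "#ifdef" else not present)
        elif directive == "#else":
            stack[-1] = not stack[-1]
        elif directive == "#endif":
            stack.pop()
        elif active and directive == "#define":
            macros.add(words[1])
        elif active and directive == "#undef":
            macros.discard(words[1])
        elif active and directive == "#pragma":
            pragmas.append((number, line.strip()))
    return kept, pragmas


SOURCE = """#define DEBUG
#ifdef DEBUG
int x = 1;
#else
int x = 2;
#endif
#pragma GCC optimize("O3")
int y;
"""
kept, pragmas = preprocess(SOURCE)
# kept holds lines 3 and 8 with their original numbers; pragmas holds line 7.
```

Because the line numbers travel with the text, an error reported by the later parsing step can still point at the right place in the original file, and the collected `#pragma` lines can be checked against the style rules separately.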

How to restore an accidentally overwritten source file using the object file

By mistake, I erased contents of my Fortran source file with a command involving ">":
some command > file.f
I do not use version control or anything. However, there is an object file present, file.o, if that may be of any help.
Is there a chance to restore the contents of file.f?
There may be decompiler tools that can produce Fortran source code from compiled object code, but it's not the original source code: things like comments and local variable names are discarded during the compilation process and are not present in the object file, so they can't be recovered. The structure of the decompiled code is likely to be different as well, especially if the object file was compiled with optimization.
You're not going to get your original code back from an object file, unfortunately.
Decompilation will work fine with bytecode languages like Java which are more or less "designed for that purpose".
With languages that go through an optimizing compiler, such as Fortran (or C, or C++), you are pretty much out of luck.
Tools exist that restore some kind of source file (such as "boomerang"), but it will be nowhere near the original, and usually it is a waste of time even trying.
Given the nature of the compilation process, it is often not even possible to reverse the operation. Not only is information such as variable names or the names of non-exported functions (and of course comments) discarded and constants are replaced with magic numbers, but also the compiled program may have an entirely different structure than the code that you have written.
Compilers regularly perform optimizations like moving invariants out of loops, rearranging statements, or eliminating common subexpressions (even when optimizations are not explicitly enabled, most compilers do trivial optimizations anyway).
A compiler is required to produce something that behaves "as if" as observed from the outside, but not something that is necessarily equivalent to the source code that you have written.
A similar phenomenon exists when stepping through a program in a debugger. Sometimes, variables cannot be watched, or you cannot break on a particular line, and entire statements will apparently just be "gone" much to the surprise of the unaware developer because the compiler optimized them out.
In summary, the single best advice that I can give, unhelpful as it may be, is to acknowledge that you have done something stupid, rewrite the source file from scratch, and start using a version control system now.

Why can I see class/struct names in an .exe file compiled using Visual C++?

When looking at a compiled release .exe file binary I can find class/struct names in it! Which is odd - obviously there is no need for these symbols. What concerns me is that such symbols can be used for reverse-engineering my software, posing a big risk to software license protection.
For example, I can find the text .?AVCMySecureKeyManager (the original class name is CMySecureKeyManager; it looks like the prefix ".?AV" is added to all names). Easy to guess what my code is doing, right? Looks like an open door for hackers.
In particular, I have enabled all possible optimizations in the Visual C++ compiler/linker options and turned off all Browse/Debug Info generation. Perhaps I'm missing something?
You're seeing RTTI (Run-time Type Information). If you don't use dynamic_cast or typeid in your code, you can usually turn it off safely. Please note that exceptions always use RTTI (for the catch statement matching) and it's not possible to disable it for them.
If you do need dynamic_cast, then you can scrub the names from the EXE after compilation. The code does not depend on the actual name strings, but just their addresses.
That said, the class names, while useful, are not critical in reverse-engineering. Don't rely on their absence as a guarantee.
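To check which RTTI names actually end up in a binary, one could scan it for the decorated-name pattern mentioned in the question. This is only a sketch: `find_rtti_names` is a hypothetical helper, and the pattern is deliberately simplified to plain class (".?AV") and struct (".?AU") names terminated by "@@", so template instantiations and other more complex names are not matched.

```python
import re

# MSVC RTTI type descriptors store decorated names such as ".?AVCMyClass@@"
# (".?AV" marks a class, ".?AU" a struct). Simplified pattern, see lead-in.
RTTI_NAME = re.compile(rb"\.\?A[UV][A-Za-z_][A-Za-z0-9_@]*@@")


def find_rtti_names(data: bytes) -> list:
    """Return every simple RTTI decorated name found in the raw bytes."""
    return [match.group().decode("ascii") for match in RTTI_NAME.finditer(data)]


# Example on an in-memory buffer standing in for the contents of an .exe.
sample = b"\x00\x01.?AVCMySecureKeyManager@@\x00junk.?AUPoint@@\x00"
names = find_rtti_names(sample)
```

For a real file, the bytes would come from reading the executable in binary mode; the list then shows what a reader of the binary would see before and after turning RTTI off.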

Can a LabVIEW VI tell whether one of its output terminals is wired?

In LabVIEW, is it possible to tell from within a VI whether an output terminal is wired in the calling VI? Obviously, this would depend on the calling VI, but perhaps there is some way to find the answer for the current invocation of a VI.
In C terms, this would be like defining a function that takes arguments which are pointers to where to store output parameters, but will accept NULL if the caller is not interested in that parameter.
As was already said, you can't do this in the natural way, but there's a workaround using data value references (requires LV 2009). It is the same idea as giving a NULL pointer for an output argument. The result is passed in as an input data value reference (which plays the role of the pointer), and the SubVI checks it for Not a Reference. If it is null, do nothing.
Here is the SubVI (case true does nothing of course):
And here is the calling VI:
Images are VI snippets so you can drag and drop on a diagram to get the code.
I'd suggest you're going about this the wrong way. If the compiler is not smart enough to avoid the calculation on its own, make two versions of this VI. One that does the expensive calculation, one that does not. Then make a polymorphic VI that will allow you to switch between them. You already know at design time which version you want (because you're either wiring the output terminal or not), so just use the correct version of the polymorphic VI.
Alternatively, pass in a variable that switches on or off a Case statement for the expensive section of your calculation.
Like Underflow said, the basic answer is no.
You can have a look here to get what is probably the most official and detailed answer that will ever be provided by NI.
Extending your analogy, you can do this in LV, except LV doesn't have the concept of null that C does. You can see an example of this here.
Note that the code in the link Underflow provided will not work in an executable, because the diagrams are stripped by default when building an EXE and because the RTE does not support some of the properties and methods used there.
Sorry, I see I misunderstood the question. I thought you were asking about an input, so the idea I suggested does not apply. The restrictions I pointed out do apply, though.
Why do you want to do this? There might be another solution.
Generally, no.
It is possible to do a static analysis on the code using the "scripting" features. This would require pulling the calling hierarchy, and tracking the wire references.
Pulling together a trial of this, there are some difficulties. Multiple identical sub-vi's on the same diagram are difficult to distinguish. Also, terminal references appear to be accessible mostly by name, which can lead to some collisions with identically named terminals of other vi's.
NI has done a bit of work on a variation of this problem; check out this.
In general, the LV compiler optimizes the machine code in such a way that unused code is not even built into the executable.
This does not apply to subVIs (because there's no way of knowing that you won't try to use the value of the indicators somehow, although LV could do it if it removes the FP when building an executable, and possibly does), but there is one way you can get it to apply to a subVI - inline the subVI, which should allow the compiler to see the outputs aren't used. You can also set its priority to subroutine, which will possibly also do this, but I wouldn't recommend that.
Officially, in-lining is only available in LV 2010, but there are ways of accessing the private VI property in older versions. I wouldn't recommend it, though, and it's likely that 2010 has some optimizations in this area that older versions did not.
P.S. In general, the details of the compiling process are not exposed and vary between LV versions as NI tweaks the compiler. The whole process is supposed to have been given a major upgrade in LV 2010 and there should be a webcast on NI's site with some of the details.

Resources