Are there any context-sensitive code search tools? - search

I have been getting very frustrated recently in dealing with a massive bulk of legacy code which I am trying to get familiar with.
Say I try to search for a particular function call, I get loads of results that turn out to be completely irrelevant; some of them are easy to spot, eg a comment saying
// Fixed functionality in foo() so don't need to handle this here any more
But others are much harder to spot manually, because they turn out to be calls from other functions in modules that are only compiled in certain cases, or are part of a much larger block of code that is #if 0'd out in its entirety.
What I'd like would be a search tool that would allow me to search for a term and give me the choice to include or exclude commented out or #if 0'd out code. Then the search results would be displayed alongside a list of #defines that are required in order for that snippet of code to be relevant.
I'm working in C / C++, but other than the specific comment syntax I guess the techniques should be more generally applicable.
Does such a tool exist?

Not entirely what you're after, but I find this quite handy.
GrepWin - A free visual "grep" tool for searching files.
I find it quite helpful because:
Its a separate app (doesn't lock up my editor)
Handles Regular expressions
Its fast
Can specify what folder to search, and what filetypes (handles regex's here too)
Can limit by file size
Can include subdirs (or exclude by regex)
etc.

Almost any decent source browser will let you go to where a function is defined, and/or list all the calls of that function and take you directly to a call site. This will normally be based on a fairly complete parse of the source code so it will ignore comments, code that's excluded by the preprocessor, and so on (in fact, in at least one case, the parser used by the source browser is almost certainly better than the one used in the compiler itself).

Related

JOOQ Record toString print a summary instead of ASCII table?

Using JOOQ version 3.13.4
I don't like the ASCII table formatting that the Record.toString() method does.
It's particularly unhelpful in the IDEA debugger view:
But I also don't like dumping needless multi-line strings into my prod logs - which I usually avoid doing; partially for obvious speed/size reasons, but also because they're ugly and hard to read in most log viewer apps (CloudWatch etc.)
I found the Github issue that implemented the ASCII table: https://github.com/jOOQ/jOOQ/issues/1806
I know I can write code to customise this, but I'm wondering if there's a simple flag/config I'm missing to get a simple summary (if I ever got around to writing my own, I'll use JSON)?
The current formatted ASCII table is quite useful for most debugging purposes, which include simple println() calls, Eclipse's debugging view (see below), etc. This includes showing only the first 5 records in a Result for a quick overview, and Record classes work the same.
The feedback from the community has generally been good for these defaults, but if you want to challenge them, try your luck here, and see if your feature request gets community upvotes: https://github.com/jOOQ/jOOQ/issues/new/choose
Eclipse's variable view:
As you can see, with the detail view, it's much less of a problem in Eclipse. I've always found it surprising IntelliJ didn't copy this very useful feature from Eclipse. Maybe, request it?
I know I can write code to customise this
I'll document it here nonetheless, as future visitors will no doubt find this useful.
IntelliJ has a lot of "smartness" in its debugger's variables view, sometimes a bit excessive. For example, an org.jooq.Result just displays its size(), because a Result is a List. I guess it's smart to hedge against excessive efforts if JDK lists are too long, though jOOQ results can handle it, and otherwise it would be possible to display list.subList(0, Math.min(list.size(), 5)) + "...".
Luckily, you can easily specify your own type renderers in IntelliJ, for example this one:
It uses this code to render a jOOQ Record:
formatJSON(new JSONFormat()
.header(false)
.recordFormat(JSONFormat.RecordFormat.OBJECT)
)
The output might look like this:
You might even suggest this as a default to IntelliJ?
but I'm wondering if there's a simple flag/config I'm missing to get a simple summary (if I ever got around to writing my own, I'll use JSON)?
No, there isn't, and I doubt that kind of configurability is reasonable. People will start requesting 100s of flags to get their preferred way of debugging, when in fact, this can be very easily achieved in the IDE as shown above.
Again, you can try your luck with a change request to change it for everyone, but a configuration for this is unlikely to be implemented.

vim lookup struct elements/members

Does anyone know of a vim plugin that allows to look up struct members/elements of structures in C or classes in case of C++?
I'm thinking something similar to jumping to a definition of a struct using cscope/ctags, but ideally I would want something similar to moving a cursor over the desired variable and a keystroke will pull up a table akin to when you use omni-complete?
I've been trying to find something but to no avail.
The requirement is: should be exclusively for vim.
99% of my development is sshing to a remote linux machine.
Normally my workflow is, git clone the project, setup ctags and cscope, open up the desired file and load up my cscope db and that's where I stay for the majority of the day, moving around the directory I have the nerd-tree plugin.
So far I use a combination of ctags/cscope, and calltree plugins to look up function callees.
I'm missing a plugin that allows me to simply look up struct elements.
I don't really use omnicompletion because it is notoriously slow and I've given up to make it faster.
Any ideas?
With universal-ctags, given you've used the right options, you can always obtain the members of a class/struct type.
If you read in between the lines, this means there is no efficient way to do it on a C++ variable if you stick to vimscript. I've tried do it, but I've given up on this path eventually ; path that has no way to support auto.
Now, I've started using (/implementing) another approach to analyse C++ code and exploit the information from vim: I rely on libclang. The catch is that it needs to operate on the current translation unit (TU), and if the TU is long (e.g if it includes many long files), it's takes time to parse it for the first time since the last time it has changed (I expect the solution to change with C++20 modules) -- the analysing is done on demand, and not in the background. Note: my plugin is currently under development and I haven't yet provided an high level Vim function that returns the type of a variable and its associated information (type, members, possible enum values...)
If your ultimate objective is completion of members, clang based tools provide a dedicated API for this purpose that'll be more efficient than completely analysing the current TU. The plugins you don't wish to use exploit this API. LSP servers even try to cache as much information as possible (for navigation and completion purpose only).
Note that there exist a few plugins, like tagbar, that tries to organize information extracted with ctags and present it in a hierarchic view. Note that it won't help with disambiguation nor with auto.

How to Decompile Bytenode "jsc" files?

I've just seen this library ByteNode it's the same as ByteCode of java but this is for NodeJS.
This library compiles your JavaScript code into V8 bytecode, which protect your source code, I'm wondering is there anyway to Decompile byteNode therefore it's not secure enough. I'm wondering because I would like to protect my source code using this library?
TL;DR It'll raise the bar to someone copying the code and trying to pass it off as their own. It won't prevent a dedicated person from doing so. But the primary way to protect your work isn't technical, it's legal.
This library compiles your JavaScript code into V8 bytecode, which protect your source code...
Well, we don't know it's V8 bytecode, but it's "compiled" in some sense. All we know is that it creates a "code cache" via the built-in vm.Script.prototype.createCachedData API, which is officially just a cache used to speed up recompiling the code a second time, third time, etc. In theory, you're supposed to also provide the original source code as a string to the vm.Script constructor. But if you go digging into Node.js's vm.Script and V8 far enough it seems to be the actual code in some compiled form (whether actual V8 bytecode or not), and the code string you give it when running is ignored. (The ByteNode library provides a dummy string when running the code from the code cache, so clearly the actual code isn't [always?] needed.)
I'm wondering is there anyway to Decompile byteNode therefore it's not secure enough.
Naturally, otherwise it would be useless because Node.js wouldn't be able to run it. I didn't find a tool to do it that already exists, but since V8 is open source, it would presumably be possible to find the necessary information to write a decompiler for it that outputs valid JavaScript source code which someone could then try to understand.
Experimenting with it, local variable names appear to be lost, although function names don't. Comments appear to get lost (this may not be as obvious as it seems, given that Function.prototype.toString is required to either return the original source text or a synthetic version [details]).
So if you run the code through a minifier (particularly one that renames functions), then run it through ByteNode (or just do it with vm.Script yourself, ByteNode is a fairly thin wrapper), it will be feasible for someone to decompile it into something resembling source code, but that source code will be very hard to understand. This is very similar to shipping Java class files, which can be decompiled (there's even a standard tool to do it in the JDK, javap), except that the format Java class files are well-documented and don't change from one dot release to the next (though they can change from one major release to another; new releases always support the older format, though), whereas the format of this data is not documented (though it's an open source project) and is subject to change from one dot release to the next.
Certain changes, such as changing the copyright message, are probably fairly easy to make to said source code. More meaningful changes will be harder.
Note that the code cache appears to have a checksum or other similar integrity mechanism, since directly editing the .jsc file to swap one letter for another in a literal string makes the code cache fail to load. So someone tampering with it (for instance, to change a copyright notice) would either need to go the decompilation/recompilation route, or dive into the V8 source to find out how to correct the integrity check.
Fundamentally, the way to protect your work is to ensure that you've put all the relevant notices in the relevant places such that the fact copying it is a violation of copyright is clear, then pursue your legal recourse should you find out about someone passing it off as their own.
is there any way
You could get a hundred answers here saying "I don't know a way", but that still won't guarantee that there isn't one.
not secure enough
Secure enough for what? What's your deployment scenario? What kind of scenario/attack are you trying to defend against?
FWIW, I don't know of an existing tool that "decompiles" V8 bytecode (i.e. produces JavaScript source code with the same behavior). That said, considering that the bytecode is a fairly straightforward translation of the source code, I'm sure it wouldn't be very hard to write such a tool, if someone had a reason to spend some time on it. After all, V8's JS-to-bytecode compiler is open source, so one would only have to look at those sources and implement the reverse direction. So I would assume that shipping as bytecode provides about as much "protection" as shipping as uglified JavaScript, i.e. none that I would trust.
Before you make any decisions, please also keep in mind that bytecode is considered an internal implementation detail of V8; in particular it is not versioned and can change at any time, so it has to be created by exactly the same V8 version that consumes it. If you want to update your Node.js you'll have to recreate all the bytecode, and there is no checking or warning in place that will point out when you forgot to do that.
Node.js source already contains code for decompiling binary bytecode.
You can get a text string from your V8 bytecode and then you would need to analyze it.
But text string would be very long and miss some important information such as a constant pool. So you need to modify the Node.js source.
Please check https://github.com/3DGISKing/pkg10.17.0
I have attached exported xml file.
If you study V8, it would be possible to analyze it and get source code from it.
It keeping it short and sweet, You can try Ghidra node.js package which is based on Ghidra reverse engineering framework which was open-sourced by NSA in the year 2019. Ghidra is capable of disassembling and decompiling the v8 bytecode. The inner working of disassembling is quite complex, this answer is short but sufficient.

Preprocess only local #includes into single file?

I understand VC++ will let you emit C++ source files which are the result of preprocessor operations e.g. macros are expanded and includes "copy-pasted in line".
Is it possible to restrict this simply to embed included files, which are files in my own project rather than standard libraries?
From the outside there's no way you can tell from which syntax form (<> or "") the content is being preprocessed. Unless a kind of API was exposed by the preprocessor, which is not the case here.
A not so elegant (and not strictly correct) solution I could propose would be to index a preprocessed version of all Standard headers (there are not that many) and after preprocessing the source of interest you could run a string matching script to detect the known files and remove the corresponding content from the final output.
Notice this is subjected to flaws because the #include system is purely textual and influenced by whatever macros are (un)defined at the time of inclusion and order matters. But depending on the complexity of the code you're working on this might give reasonable results.
By the way, may I ask what is the ultimate goal of your task?
Edit: Or actually... Maybe it's possible that you filter the sources before-hand to remove the undesired #includes and then submit it to preprocessing?

Translating code comments written in another spoken language

I've just inherited some C code from a German programmer, and all of the comments are, naturally, in German. As I've forgotten most of my high school German, this is a slight problem.
Does anyone know of any translation tools that are code-aware; meaning it will only translate language within comments? The project has many files, being able to operate on all of them at once would also be fantastic.
I'm currently copying-and-pasting into Google Translate, and while this is less than ideal, it can at least get me some answers.
I would only know exactly how to do this in java, but I am sure there is a way to do this in C as well, as the tools exist:
Grab a parser that understands C source files (this one sounds ok, but I don't know much about C)
build a syntax tree. iterate over all nodes of the tree, replacing the text of all comment nodes with translated text.
write the tree back to a new source file (perhaps in a different directory).
Very broadly, this should be possible to do using Google translation's Ajax API and a regex function that can deal with callbacks - I don't think JS's built-in regex functions are up to the task but I'm sure there are libraries out there. You would have to build a regular expression that can isolate the comments, send each chunk to the API, and return the translated result in the callback function.

Resources