Adding a new instruction to QEMU - emulation

I'm a little confused going about adding a new instruction to QEMU and want to confirm if my understanding is right. After going through the source code, I think adding an instruction to QEMU involves the following steps:
Define a helper function of the format CHERI_HELPER_IMPL(*instruction*) in \target\target_arch\op_helper.c that emulates this instruction.
Define generate_*instruction* in \target\target_arch\translate.c that calls gen_helper_*instruction* which calls the helper function.
Am I missing any steps?

The fact that you mention a "CHERI_HELPER_IMPL" macro tells me that you're not working with upstream QEMU, but with the CHERI project's fork of it. So you should talk to them about anything special that might be needed there. As I understand it their local modifications may be quite significant.
For upstream QEMU, this depends on whether the target architecture is using decodetree or not.
For decodetree-based architectures:
add a suitable instruction pattern or patterns to the .decode file. This will result in the generation of code which calls a function whose name begins trans_ to handle instructions that match that pattern, passing it a pointer to a structure which contains the values of the various instruction fields defined by your pattern.
implement the trans_ functions appropriately. What you need to do depends on what the instruction behaviour is. For simple instructions, you can just emit TCG ops which do the actions the instruction must do. For more complicated work, you might want to emit TCG ops for "call a runtime helper function". The tcg/README file has some "recommended coding rules" at the bottom which include a rule of thumb for when to use a helper function.
if you decided to emit a helper call, you need to implement the helper function. The DEF_HELPER_* macros in helper.h both define the prototype for the C function you're going to write and also auto-generate a function gen_helper_whatever that your translate-time code can call to generate the TCG code to call it.
For non-decodetree-based architectures:
There will be hand-written code, usually starting in translate.c, which identifies instructions using switch statements and bit-masking code. You'll need to look at that code to find out where in that to add the code which identifies the instruction that you're adding. This is all completely target-specific; some targets use a somewhat table-driven setup or some preprocessor macros as part of this, some use completely hand-written code.
Once you've figured out where to add the "is this my instruction type?" check, the rest is similar to decodetree-based targets: you need to emit TCG ops to either do the work or to call a helper to do the work at runtime.
You'll find that there's a lot of specific detail that needs to be got right in each of these steps, but that's the basic outline.
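To make the "match a pattern, extract the fields, hand them to a trans_ function" shape a bit more concrete, here is a minimal, self-contained toy in C++. It is not QEMU code: the 32-bit encoding, the mask and pattern values, the arg_addi struct, ToyCpu, and this trans_addi are all invented for illustration. In real QEMU the trans_ function would emit TCG ops (or a helper call) rather than change register state directly; the toy executes immediately only so that it stays runnable.

```cpp
#include <array>
#include <cstdint>

// Hypothetical encoding, made up for this sketch:
//   bits 31..24 = opcode (0x5A), 23..20 = rd, 19..16 = rs, 15..0 = imm
struct arg_addi {
    std::uint32_t rd;
    std::uint32_t rs;
    std::uint32_t imm;
};

// Toy machine state: sixteen 32-bit registers, all zero initially.
struct ToyCpu {
    std::array<std::uint32_t, 16> regs{};
};

// Stand-in for a translate-time function. Real QEMU code would emit TCG ops
// here instead of updating the registers directly.
inline bool trans_addi(ToyCpu &cpu, const arg_addi &a) {
    cpu.regs[a.rd] = cpu.regs[a.rs] + a.imm;
    return true;
}

// Stand-in for a decoder: mask, compare, extract fields, dispatch.
inline bool decode(ToyCpu &cpu, std::uint32_t insn) {
    constexpr std::uint32_t kMask = 0xFF000000u;
    constexpr std::uint32_t kPattern = 0x5A000000u;
    if ((insn & kMask) == kPattern) {
        arg_addi a{(insn >> 20) & 0xFu, (insn >> 16) & 0xFu, insn & 0xFFFFu};
        return trans_addi(cpu, a);
    }
    return false;  // nothing matched; a real decoder would treat this as an illegal instruction
}
```

With regs[2] set to 40, decoding the word 0x5A120002 (rd = 1, rs = 2, imm = 2) leaves 42 in regs[1]. A hand-written, non-decodetree target does the same mask-and-compare work in its own switch statements.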

Alternative to ctor/inventory for when compiling to wasm?

The ctor crate doesn't support web assembly currently, although there is active discussion about how to fix this.
Although I am well aware of the issues associated with static initialization coming from C++, being able to register things with a factory at startup is a very handy capability, and is necessary to avoid violating the DRY principle in a lot of cases. Without it every time you want to add a new possibility to your factory you have to add a separate line of code to main(), possibly more lines if you need to import your new function. This quickly gets tedious.
I am wondering if it is possible to build something equivalent (at least when everything you want to register is concentrated inside a single crate) using a procedural attribute macro and build.rs. The macro would be used to mark the functions that I want to register, and its implementation would save the module paths ("crate::your::registered_function") of those functions off to the side somewhere in a file but otherwise just be a passthrough. Then build.rs would generate a function that calls all the functions listed in the file, and I would have main() call that one function by hand.
Is there a different trick that would work instead?
Does an implementation of this already exist somewhere I could use as a reference?
How would the procedural macro actually generate the module path for the function to be called? There is module_path! but if invoked from inside the definition of the procedural macro it will give the module path for the macro, not the module path associated with what the TokenStream is going to expand into. The macro could generate a call to module_path! but that won't evaluate until later when the final program is run.

Relation between MSVC Compiler & linker option for COMDAT folding

This question has some answers on SO but mine is slightly different. Before marking as duplicate, please give it a shot.
MSVC has always provided the /Gy compiler option to enable identical functions to be folded into COMDAT sections. At the same time, the linker also provides the /OPT:ICF option. Is my understanding right that these two options must be used in conjunction? That is, while the former packages functions into COMDAT, the latter eliminates redundant COMDATs. Is that correct?
If yes, then either we use both or turn off both?
Answer from someone who communicated with me off-line. Helped me understand these options a lot better.
===================================
That is essentially true. Suppose we talk just C, or C++ but with no member functions. Without /Gy, the compiler creates object files that are in some sense irreducible. If the linker wants just one function from the object, it gets them all. This is especially a consideration when programming libraries: if you mean to be kind to the library's users, you should write your library as lots of small object files, typically one non-static function per object, so that the library's user isn't bloated by having to carry code that never actually executes.
With /Gy, the compiler creates object files that have COMDATs. Each function is in its own COMDAT, which is to some extent a mini-object. If the linker wants just one function from the object, it can pick out just that one. The linker's /OPT switch gives you some control over what the linker does with this selectivity - but without /Gy there's nothing to select.
Or very little. It's at least conceivable that the linker could, for instance, fold functions that are each the whole of the code in an object file and happen to have identical code. It's certainly conceivable that the linker could eliminate a whole object file that contains nothing that's referenced. After all, it does this with object files in libraries. The rule in practice, however, used to be that if you add a non-COMDAT object file to the linker's command line, then you're saying you want that in the binary even if unreferenced. The difference between what's conceivable and what's done is typically huge.
Best, then, to stick with the quick answer. The linker options benefit from being able to separate functions (and variables) from inside each object file, but the separation depends on the code and data having been organised into COMDATs, which is the compiler's work.
===================================
As answered by Raymond Chen in Jan 2013
As explained in the documentation for /Gy, function-level linking
allows functions to be discardable during the "unused function" pass,
if you ask for it via /OPT:REF. It does not alter the actual classical
model for linking. The flag name is misleading. It's not "perform
function-level linking". It merely enables it by telling the linker
where functions begin and end. And it's not so much function-level
linking as it is function-level unlinking. -Raymond
(This snippet might make more sense with some further context: see the posts about the classical linking model.)
So in a nutshell - yes. If you activate one switch without the other, there would be no observable impact.
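A small experiment can make the interaction visible. The sketch below is only an illustration; the names scale_a, scale_b, unused_helper and same_address are made up. It defines two functions with identical bodies plus one that is never referenced. Built with /Gy and linked with /OPT:REF,ICF, the linker is allowed to drop unused_helper and to give scale_a and scale_b the same address; with only one of the two switches I would expect neither to happen, though the exact outcome depends on the toolchain version and the optimisation level.

```cpp
// Two distinct functions whose bodies are identical: candidates for
// identical COMDAT folding (/OPT:ICF) when each sits in its own COMDAT (/Gy).
int scale_a(int x) { return x * 3 + 1; }
int scale_b(int x) { return x * 3 + 1; }

// Never referenced anywhere: a candidate for removal under /OPT:REF,
// again only if the compiler packaged it as a separate COMDAT.
int unused_helper(int x) { return x - 7; }

using Fn = int (*)(int);

// Reports whether both functions ended up at the same address.
// The pointers are volatile so the comparison is made at run time
// rather than being decided by the compiler.
bool same_address() {
    Fn volatile a = scale_a;
    Fn volatile b = scale_b;
    return a == b;
}
```

Printing the result of same_address() from builds with different switch combinations, and checking the linker's map file for unused_helper, shows whether folding and removal actually took place.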

Can I use the Rust lexer or parser to retrieve a list of functions within a Rust file?

The lexer/parser file located here is quite large and I'm not sure if it is suitable for just retrieving a list of Rust functions. Perhaps writing my own/using another library would be a better route to take?
The end objective would be to create a kind of execution manager. To contextualise, it would be able to read a list of function calls wrapped in a function. The function calls that are within the function will then be able to be re/ordered from some web interface. Thought it might be nice to manage larger applications this way.
No. I mean, not really. Whether you write your own parser or re-use syntex, you're going to hit a fundamental limitation: macros.
So let's say you go all-out and expand macro_rules!-based macros, including the ones defined in external crates (which means you'll also need to extract rustc's crate metadata loading... which isn't stable). What about procedural macros and custom derive attributes? Those are defined in code and depend on compiler-internal interfaces to function.
The only way this is likely to ever work correctly is if you build on top of the compiler, or duplicate a huge amount of work (which also involves unstable binary interfaces).
You could use syntex to parse the Rust code in a build script.

Issues with using test_and_set_bit function in linux

I am trying to implement a spin lock using the test_and_set_bit function. I found a bitops.h file that contains this function. However, in my current kernel version, which is 3.0, the function is not included in that header file, i.e., bitops.h. Can anyone provide some references where I can find it?
Not sure if I totally understand your question, but including <linux/bitops.h> should bring in the definition of test_and_set_bit(). The actual definition of the function is not in include/linux/bitops.h but it is picked up via the include of <asm/bitops.h> that is in the linux/ version of the include.
So to see the actual definition of test_and_set_bit() you can look in arch/arm/include/asm/bitops.h or arch/x86/include/asm/bitops.h (or whatever other architecture you're interested in).
By the way, there's no reason to need to implement your own spinlock -- the kernel has (of course) the standard spinlock_t and also functions like bit_spin_lock() that use a single bit as a lock.
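For experimenting with the idea outside the kernel, here is a minimal user-space sketch in C++. It is an analogue, not the kernel implementation: test_and_set_bit_sketch, clear_bit_sketch and BitLock are hypothetical names, and std::atomic stands in for the architecture-specific code in asm/bitops.h. The memory ordering chosen here (acquire on lock, release on unlock) is also an assumption of the sketch rather than a statement about the kernel's guarantees.

```cpp
#include <atomic>
#include <thread>

// Atomically sets bit `nr` in `word` and reports whether it was already set.
inline bool test_and_set_bit_sketch(unsigned nr, std::atomic<unsigned long> &word) {
    const unsigned long mask = 1UL << nr;
    return (word.fetch_or(mask, std::memory_order_acquire) & mask) != 0;
}

// Atomically clears bit `nr` in `word`.
inline void clear_bit_sketch(unsigned nr, std::atomic<unsigned long> &word) {
    word.fetch_and(~(1UL << nr), std::memory_order_release);
}

// A spin lock that uses bit 0 of a word as the lock flag.
struct BitLock {
    std::atomic<unsigned long> word{0};

    // Succeeds only if the bit was clear before this call.
    bool try_lock() { return !test_and_set_bit_sketch(0, word); }

    // Spins until the bit is acquired, yielding between attempts.
    void lock() {
        while (!try_lock()) {
            std::this_thread::yield();
        }
    }

    void unlock() { clear_bit_sketch(0, word); }
};
```

The first try_lock() on a fresh BitLock succeeds and a second one fails until unlock() is called. Inside the kernel, spinlock_t or bit_spin_lock() remains the better choice, as noted above.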

Is there a way to convert from a string to pure code in C++?

I know that its possible to read from a .txt file and then convert various parts of that into string, char, and int values, but is it possible to take a string and use it as real code in the program?
Code:
string codeblock1="cout<<\"This is a test\";";
string codeblock2="int array[5]={0,6,6,3,5};";
int i;
cin>>i;
if(i)
{
execute(codeblock1);
}
else
{
execute(codeblock2);
}
Where execute is a function that converts from text to actual code (I don't know if there actually is a function called execute, I'm using it for the purpose of my example).
In C++ there's no simple way to do this. This feature is available in higher-level languages like Python, Lisp, Ruby and Perl (usually with some variation of an eval function). However, even in these languages this practice is frowned upon, because it can result in very unreadable code.
It's important you ask yourself (and perhaps tell us) why you want to do it?
Or do you only want to know if it's possible? If so, it is, though in a hairy way. You can write a C++ source file (generate whatever you want into it, as long as it's valid C++), then compile it and link to your code. All of this can be done automatically, of course, as long as a compiler is available to you in runtime (and you just execute it with system). I know someone who did this for some heavy optimization once. It's not pretty, but can be made to work.
You can create a function that parses whatever strings you like and creates a data structure from them. This is known as a parse tree. Subsequently you can examine your parse tree and generate the necessary dynamic structures to perform the logic therein. The parse tree is subsequently converted into a runtime representation that is executed.
All compilers do exactly this. They take your code and they produce machine code based on this. In your particular case you want a language to write code for itself. Normally this is done in the context of a code generator and it is part of a larger build process. If you write a program to parse your language (consider flex and bison for this operation) that generates code you can achieve the results you desire.
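As a small illustration of the parse-tree approach, the sketch below parses an arithmetic string into a tree and then walks the tree to execute it. The grammar (integers, + - * /, parentheses) and the names Node, Parser, eval and execute are choices made for this example only; a real embedded language would need far more, and flex and bison would generate the parsing part for you.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// One node of the parse tree: a number ('n') or a binary operator.
struct Node {
    char op = 'n';
    long value = 0;  // only meaningful when op == 'n'
    std::unique_ptr<Node> lhs, rhs;
};

class Parser {
public:
    explicit Parser(std::string text) : s_(std::move(text)) {}

    std::unique_ptr<Node> parse() {
        auto tree = expr();
        skip_ws();
        if (pos_ != s_.size()) throw std::runtime_error("unexpected trailing input");
        return tree;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    void skip_ws() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    bool next_is(char a, char b) {
        skip_ws();
        return pos_ < s_.size() && (s_[pos_] == a || s_[pos_] == b);
    }
    static std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
        auto n = std::make_unique<Node>();
        n->op = op;
        n->lhs = std::move(l);
        n->rhs = std::move(r);
        return n;
    }
    // expr := term (('+' | '-') term)*
    std::unique_ptr<Node> expr() {
        auto left = term();
        while (next_is('+', '-')) {
            char op = s_[pos_++];
            left = binary(op, std::move(left), term());
        }
        return left;
    }
    // term := factor (('*' | '/') factor)*
    std::unique_ptr<Node> term() {
        auto left = factor();
        while (next_is('*', '/')) {
            char op = s_[pos_++];
            left = binary(op, std::move(left), factor());
        }
        return left;
    }
    // factor := number | '(' expr ')'
    std::unique_ptr<Node> factor() {
        if (next_is('(', '(')) {
            ++pos_;
            auto inner = expr();
            if (!next_is(')', ')')) throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        if (pos_ >= s_.size() || !std::isdigit(static_cast<unsigned char>(s_[pos_])))
            throw std::runtime_error("expected a number");
        auto n = std::make_unique<Node>();
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_])))
            n->value = n->value * 10 + (s_[pos_++] - '0');
        return n;
    }
};

// Walks the tree and computes its value: the "runtime" half of the job.
long eval(const Node &n) {
    switch (n.op) {
    case 'n': return n.value;
    case '+': return eval(*n.lhs) + eval(*n.rhs);
    case '-': return eval(*n.lhs) - eval(*n.rhs);
    case '*': return eval(*n.lhs) * eval(*n.rhs);
    case '/': {
        long d = eval(*n.rhs);
        if (d == 0) throw std::runtime_error("division by zero");
        return eval(*n.lhs) / d;
    }
    }
    throw std::runtime_error("unknown node type");
}

// Parses and runs a string in one step.
long execute(const std::string &text) { return eval(*Parser(text).parse()); }
```

Calling execute("2 + 3 * (4 - 1)") builds a tree with '+' at the root and evaluates it to 11. The same structure scales up to variables, statements and function calls, which is essentially what an embedded scripting language is.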
Many scripting languages offer this sort of feature, going all the way back to eval in LISP - but C and C++ don't expose the compiler at runtime.
There's nothing in the spec that stops you from creating and executing some arbitrary machine language, like so:
char code[] = { 0x2f, 0x3c, 0x17, 0x43 }; // some machine code of some sort
typedef void (*FuncType)(); // define a function pointer type
FuncType func = (FuncType)code; // take the address of the code
func(); // and jump to it!
but most environments will crash if you try this, for security reasons. (Many viruses work by convincing ordinary programs to do something like this.)
In a normal environment, one thing you could do is create a complete program as text, then invoke the compiler to compile it and invoke the resulting executable.
If you want to run code in your own memory space, you could invoke the compiler to build you a DLL (or .so, depending on your platform) and then link in the DLL and jump into it.
First, I wanted to say, that I never implemented something like that myself and I may be way off, however, did you try CodeDomProvider class in System.CodeDom.Compiler namespace? I have a feeling the classes in System.CodeDom can provide you with the functionality you are looking for.
Of course, it will all be .NET code, not code for any other platform.
Yes, you just have to build a compiler (and possibly a linker) and you're there.
Several languages such as Python can be embedded into C/C++ so that may be an option.
It's kind of sort of possible, but not with just straight C/C++. You'll need some layer underneath such as LLVM.
Check out c-repl and ccons
One way that you could do this is with Boost Python. You wouldn't be using C++ at that point, but it's a good way of allowing the user to use a scripting language to interact with the existing program. I know it's not exactly what you want, but perhaps it might help.
Sounds like you're trying to create "C++Script", which doesn't exist as far as I know. C++ is a compiled language. This means it must always be compiled to native machine code before being executed. You could wrap the code as a function, run it through a compiler, then execute the resulting DLL dynamically, but you're not going to get access to anything a compiled DLL wouldn't normally get.
You'd be better off trying to do this in Java, JavaScript, VBScript, or .NET, which are at one stage or another interpreted languages. Most of these languages either have an eval or execute function for just that, or can just be included as text.
Of course executing blocks of code isn't the safest idea - it will leave you vulnerable to all kinds of data execution attacks.
My recommendation would be to create a scripting language that serves the purposes of your application. This would give the user a limited set of instructions for security reasons, and allow you to interact with the existing program much more dynamically than a compiled external block.
Not easily, because C++ is a compiled language. Several people have pointed out roundabout ways to make it work: either execute the compiler, or incorporate a compiler or interpreter into your program. If you want to go the interpreter route, you can save yourself a lot of work by using an existing open source project, such as Lua.
