Static data definition as a language feature

I am writing a program in C++ in which I use some constant data. I build that data procedurally at startup and never change it again. I know that ideally this data should be static data in the binary, but that is not the case, since I have to build it first.
In C++ I could define static const members in classes with that data, or simply global consts. By doing this I would have static data in the binary, but then I would not be able to generate its contents programmatically. I would have to build it elsewhere and paste the result into the code. In my case that would mean a bunch of binary data encoded in the source file in an ugly, meaningless way.
So I started to wonder: is there any language that supports a feature like this, letting me define my static data procedurally but resolving it at compile time and embedding it in the binary? Could some kind of optimization handle these cases? If the procedure/function that generates the data takes no external parameters and has predictable results, the compiler could safely replace it with its results. Do compilers take that path? Do any languages you know of treat this explicitly?
I know of the C preprocessor, but it is really not Turing-complete, and its syntax is not as attractive as a function modifier saying that the function should be resolved at compile time would be.

Build a program to generate the data (you already have that), have its output be C++ source, add the generation step to your Makefile, and use the generated data with #include.

C++ templates are Turing-complete, and are often used to compute compile-time constants.

Related

What exactly is a "library" in a crate?

I'm a little confused about the concept of a "library" in Rust, which is mentioned in "A crate is a binary or library".
If I'm right, a binary means an executable program (which can be run from shell, for example), but what is a library?
Are they some sort of object files with symbols, like .a or .so, which will be linked to my program (as in C/C++)?
Or are they pure source code that will be compiled together with my program?
As Masklinn describes, yes, Rust does have prebuilt library formats. However, they are mostly used internally, are finicky across compiler versions, and Cargo still lacks support for them. In fact, crates.io requires libraries to be "open-source", in the sense that you provide the source code. That source could still load some closed-source dependency. crates.io distributes the source code to whoever downloads the crate, and that source is then effectively compiled along with your program. This is where rlibs come into play, but Cargo doesn't expose this to the user. It is also why you can inspect the source code of pretty much every crate.
If I'm right, a binary means an executable program (which can be run from shell, for example), but what is a library?
Yes. Specifically, per the linkage documentation for the bin crate type:
A runnable executable will be produced. This requires that there is a main function in the crate which will be run when the program begins executing. This will link in all Rust and native dependencies, producing a single distributable binary. This is the default crate type.
Are they some sort of object files with symbols, like .a or .so, which will be linked to my program (as in C/C++)?
Or are they pure source code that will be compiled together with my program?
Never strictly the latter, but the exact artefact depends, as per the linkage documentation:
A Rust library will be produced. This is an ambiguous concept as to what exactly is produced because a library can manifest itself in several forms. The purpose of this generic lib option is to generate the "compiler recommended" style of library. The output library will always be usable by rustc, but the actual type of library may change from time-to-time.
The documentation then lists the various types of libraries:
rlib, a static library with rust-specific metadata (an augmented .a)
dylib, a dynamic library with rust-specific metadata (an augmented .so)
staticlib, a system static library (an actual .a)
cdylib, a system dynamic library (an actual .so)
I would guess "lib" aliases to "rlib", but frankly I have no idea. As the quote notes, that is deliberately neither fixed nor documented.

What is the difference between dynamically and statically generated gRPC code?

In the gRPC client examples there are two kinds of implementation. In one, the .proto files are loaded and processed at runtime; in the other, they are compiled using protoc.
My question is: what is the difference? The docs say nothing more than 'they behave identically', but surely there has to be a difference, right?
Fundamentally, the primary difference is the one you mentioned: with the dynamic code generation, the .proto file is loaded and parsed at run time, and with static code generation, the .proto file is preprocessed into JavaScript.
The dynamic code generation is simpler to use, potentially easier to debug, and generates code that accepts regular JavaScript objects.
The static code generation (using protoc) requires the user to create protobuf objects, which means that input validation will be done earlier. It is also a workflow that is more consistent with other languages.

template instantiation statistics from compilers

Is there a way to get a summary of the instantiated templates (with what types and how many times - like a histogram) within a translation unit or for the whole project (shared object/executable)?
I have a large codebase and want to take advantage of the C++11 extern template feature. To do that, I would like to know which templates are used most within my project, including those from the internals of the STL (std::less<MyString>, for example).
Also is it possible to have a weight assigned to each template instantiation (time spent by the compiler)?
Even if only one (c++11 enabled) compiler gives me such statistics I would be happy.
How difficult would it be to implement such a thing with Clang's LibTooling?
And is this even reasonable? Many people told me that I can reason which template instantiations I should extern without the use of a tool...
There are several ways to attack this problem.
If you are working with an open-source compiler, it's not hard to make a simple change to the source code that will trace all template instantiations.
If that sounds like too much hassle, you can also try to force the compiler to produce a warning on each template instantiation for a given symbol. Steven Watanabe has written a set of tools that can help you with that.
Finally, possibly the best option is to use the debugging symbols (or map files) generated by the compiler. From them you can track down how many times each function appears in the final image and, more importantly, how many bytes it adds to the image. The best example of such a tool is Adrian Stone's SymbolSort, which is based on Microsoft's toolset. Another similar tool is the Map File Browser.

Interpreted standard library

It's common for a programming language to come with a standard library implemented at least partly in the language itself.
In the case of an interpreted language, the obvious implementation is to read the library source files when the interpreter starts up, but this runs into the messy but persistent problem of making sure the interpreter knows where to find those files even when both are moved around. It would be cleaner if they could be embedded in the interpreter itself, so there is just a single executable.
I can see a simple way to do this by just translating the library source files to C literal strings, but I'm curious as to whether there are any pitfalls I'm overlooking or refinements to the method.
So my question is: which existing interpreted languages embed library source files, written in the language itself, into the interpreter?
Bytecode virtual machines often provide an answer to this: store the bytecode in files (*.pyc, *.rbc) and load the bytecoded versions of the libraries using a simpler mechanism.
Smalltalks do this by dumping the standard heap into a separate file called an "image".
As for single-file distribution, one option is to append the library file(s) to the end of the executable. The interpreter then needs special logic to read its own binary and find the structure holding that interpretable program data. Alternatively, build the interpreter with the program data statically included.

Is there a way to convert from a string to pure code in C++?

I know that it's possible to read from a .txt file and convert various parts of it into string, char, and int values, but is it possible to take a string and use it as real code in the program?
Code:
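#include <iostream>
#include <string>
using namespace std;

void execute(const string& code); // hypothetical: no such standard function exists

int main()
{
    // The strings hold C++ source text that we would like to run.
    string codeblock1 = "cout << \"This is a test\";";
    string codeblock2 = "int array[5] = {0, 6, 6, 3, 5};";
    int i;
    cin >> i;
    if (i)
    {
        execute(codeblock1);
    }
    else
    {
        execute(codeblock2);
    }
}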
Here execute is a function that converts text into actual code (I don't know whether a function called execute actually exists; I'm using it for the purpose of my example).
In C++ there's no simple way to do this. This feature is available in higher-level languages like Python, Lisp, Ruby and Perl (usually with some variation of an eval function). However, even in these languages this practice is frowned upon, because it can result in very unreadable code.
It's important that you ask yourself (and perhaps tell us) why you want to do it.
Or do you only want to know whether it's possible? If so, it is, though in a hairy way. You can write a C++ source file (generate whatever you want into it, as long as it's valid C++), then compile it and link it to your code. All of this can be done automatically, of course, as long as a compiler is available to you at runtime (you just execute it with system()). I know someone who once did this for some heavy optimization. It's not pretty, but it can be made to work.
You can write a function that parses whatever strings you like and builds a data structure from them; this structure is known as a parse tree. You can then examine the parse tree and generate the dynamic structures needed to perform the logic therein. In this way, the parse tree is converted into a runtime representation that is executed.
All compilers do exactly this. They take your code and they produce machine code based on this. In your particular case you want a language to write code for itself. Normally this is done in the context of a code generator and it is part of a larger build process. If you write a program to parse your language (consider flex and bison for this operation) that generates code you can achieve the results you desire.
Many scripting languages offer this sort of feature, going all the way back to eval in LISP - but C and C++ don't expose the compiler at runtime.
There's nothing in the spec that stops you from creating and executing some arbitrary machine language, like so:
unsigned char code[] = { 0x2f, 0x3c, 0x17, 0x43 }; // some machine code of some sort
typedef void (*FuncType)();                         // define a function pointer type
FuncType func = reinterpret_cast<FuncType>(code);   // take the address of the code (a conditionally-supported conversion)
func();                                             // and jump to it!
but most environments will crash if you try this, for security reasons: memory that holds data is usually marked non-executable. (Many viruses work by convincing ordinary programs to do something like this.)
In a normal environment, one thing you could do is create a complete program as text, then invoke the compiler to compile it and invoke the resulting executable.
If you want to run code in your own memory space, you could invoke the compiler to build you a DLL (or .so, depending on your platform) and then link in the DLL and jump into it.
First, I should say that I have never implemented anything like this myself, and I may be way off. However, have you tried the CodeDomProvider class in the System.CodeDom.Compiler namespace? I have a feeling the classes in System.CodeDom can provide the functionality you are looking for.
Of course, it will all be .NET code, not code for any other platform.
Yes, you just have to build a compiler (and possibly a linker) and you're there.
Several languages such as Python can be embedded into C/C++ so that may be an option.
It's kind of sort of possible, but not with just straight C/C++. You'll need some layer underneath such as LLVM.
Check out c-repl and ccons
One way that you could do this is with Boost Python. You wouldn't be using C++ at that point, but it's a good way of allowing the user to use a scripting language to interact with the existing program. I know it's not exactly what you want, but perhaps it might help.
Sounds like you're trying to create "C++Script", which doesn't exist as far as I know. C++ is a compiled language, which means it is normally compiled to native machine code before being executed. You could wrap the code in a function, run it through a compiler, and then load the resulting DLL dynamically. However, you're not going to get access to anything a compiled DLL wouldn't normally get.
You'd be better off trying to do this in Java, JavaScript, VBScript, or .NET, which are at one stage or another interpreted languages. Most of these languages either have an eval or execute function for just that, or can just be included as text.
Of course executing blocks of code isn't the safest idea - it will leave you vulnerable to all kinds of data execution attacks.
My recommendation would be to create a scripting language that serves the purposes of your application. This would give the user a limited set of instructions for security reasons, and allow you to interact with the existing program much more dynamically than a compiled external block.
Not easily, because C++ is a compiled language. Several people have pointed out roundabout ways to make it work: either execute the compiler, or incorporate a compiler or interpreter into your program. If you want to go the interpreter route, you can save yourself a lot of work by using an existing open-source project, such as Lua.
