Get the size of the biggest derived class from a language server - language-server-protocol

I have the problem that I need to know the size of the biggest derived class of a base class. At the moment I'm using the compiler's -D option to specify the size; this is the easiest option I could think of, but you always have to update the size manually.
Since this would be nearly impossible to do in C++ itself, I thought I could maybe get the information from a language server and then generate a header that contains it.
Note that I have no experience with this, but the language server should have all the information that I need, right?
How difficult would it be to implement this in Python? (It doesn't have to be Python, but I would prefer it.)
For the language server I could provide a CMakeLists.txt.

I don't see why you need a language server. Using the define for the maximum size, you can check every class against that constant each time a new class is defined:
static_assert(sizeof(MyClass) < MAX_CLASS_SIZE);
(this assumes C++17). This has the added benefit that you can see how large your classes are; the class size cannot grow out of hand without a compiler error this way.
If you really want to compute the maximum size automatically, you can use something like this:
#define MAX(X, Y) ((X) > (Y) ? (X) : (Y))
#define MAX_CLASS_SIZE MAX(sizeof(MyClass2), sizeof(MyClass3))
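If you would rather avoid the macro, a small variadic template can compute the same constant at compile time. This is only a sketch (it assumes C++17, and the list of classes still has to be maintained by hand, which is the original problem):
#include <algorithm>
#include <cstddef>

// Largest sizeof among a hand-maintained list of derived classes.
template <typename... Ts>
constexpr std::size_t max_sizeof() {
    return std::max({sizeof(Ts)...});
}

constexpr std::size_t MAX_CLASS_SIZE = max_sizeof<MyClass2, MyClass3>();

static_assert(sizeof(MyClass2) <= MAX_CLASS_SIZE);
static_assert(sizeof(MyClass3) <= MAX_CLASS_SIZE);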

Related

How to implement source map in a compiler?

I'm implementing a compiler, written in Haskell, that compiles a source language to an assembly-like target language.
For debugging purposes, a source map is needed to map each target-language assembly instruction to its corresponding source position (line and column).
I've searched extensively through compiler implementations, but none of them includes a source map.
Can anyone please point me in the right direction on how to generate a source map?
Code samples, books, etc. Haskell is preferred, other languages are also welcome.
Details depend on the compilation technique you're applying.
If you're doing it via a sequence of transforms of intermediate languages, as most sane compilers do these days, your options are the following:
Annotate all intermediate representation (IR) nodes with source location information. Introduce special nodes for preserving variable names (they'll all be gone after you do, say, an SSA transform, so you need to track their origins separately)
Inject tons of intrinsic function calls (see how it's done in LLVM IR) instead of annotating each node
Do a mixture of the above
The first option can even be done nearly automatically: if each transform preserves the source location of the original node in all nodes it creates from it, you only have to handle some non-trivial annotations manually.
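To make the first option concrete, here is a minimal sketch of an annotated IR node (in C++ rather than Haskell, with made-up names, so only an illustration of the idea):
#include <memory>
#include <string>
#include <vector>

struct SourceLoc {
    int line = 0;
    int column = 0;
};

struct IRNode {
    std::string opcode;                              // e.g. "add", "load", "br"
    std::vector<std::shared_ptr<IRNode>> operands;
    SourceLoc loc;                                   // where this node came from in the source
};

// Every transform copies the location of the node it rewrites into the nodes it
// creates from it; when the final assembly is emitted, each instruction can then
// be paired with its SourceLoc to produce the source map.
std::shared_ptr<IRNode> make_node(std::string opcode,
                                  std::vector<std::shared_ptr<IRNode>> operands,
                                  const IRNode& origin) {
    auto node = std::make_shared<IRNode>();
    node->opcode = std::move(opcode);
    node->operands = std::move(operands);
    node->loc = origin.loc;   // propagate the original source position
    return node;
}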
Also keep in mind that some optimisations may render your source location information absolutely meaningless. E.g., value numbering would collapse a number of similar expressions into one, probably preserving the source location of one arbitrary origin. The same goes for rematerialisation.
With Haskell, the first approach will result in a lot of boilerplate in your ADT definitions and pattern matching, even if you sugar coat it with something like Scrap Your Boilerplate (SYB), so I'd recommend the second approach, which is extensively documented and nicely demonstrated in LLVM IR.

Flexible-size arma vector allocation - true in general or is a property of Rcpp?

I'm using the Rcpp package to write code that has its main part written in C++ and a smaller part in R.
Based on what I know of C++, unlike in R, all variables in C++ should be declared up front, and this declaration includes both type and size. For instance, when we say:
arma::vec test(2);
then I assume test is an Armadillo vector of size 2, which means we should not assign anything of a different size to test. Is that right?
Here is my challenge:
In my code, I have a loop that assigns vectors of different sizes (usually larger than 2) to the test vector without redeclaring test. To my surprise, the code works perfectly fine without any compile error!
In each iteration, here is how I assign a new vector to test:
test = Rcpp::as<arma::vec>(myList["aVecFromMyList"]);
Question:
Is it an Armadillo feature that we can assign vectors of different lengths to test, which was initially declared to be of size 2, or is it an Rcpp package feature?
Thanks very much for your help.
You're asking for the size of the vector to be encoded into the type. When you specify that something is of type arma::vec, you allow it to accept arma::vecs of any size.
If you want to enforce a size constraint in the type, then you want something like arma::vec::fixed<N>, where in your case N would be 2. This is a type that enforces the constraint that vectors should be of size N. There are also typedefs for small fixed sizes, e.g. vec2 for a fixed vector of size 2.
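A minimal sketch of the difference outside of Rcpp (untested, assuming a recent Armadillo):
#include <armadillo>

int main() {
    arma::vec flexible(2);          // size 2 is only the initial state
    flexible = arma::vec(7);        // fine: a plain arma::vec resizes on assignment

    arma::vec::fixed<2> fixed_v;    // here size 2 is part of the type
    fixed_v = arma::vec(2);         // fine: the sizes match
    // fixed_v = arma::vec(7);      // size mismatch: Armadillo reports this as an
                                    // error, but only at run time, since the size
                                    // of the right-hand side is a run-time value
    return 0;
}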
You would have to modify your as call similarly I believe -- hopefully it works, I haven't tested it.
You should read the Armadillo docs; they are probably some of the cleanest and most useful out there.

A language in which everything compiles

I'm trying to do some research for a new project, and I need to create objects dynamically from random data.
For this to work, I need a language / compiler that doesn't have problems with weird uncompilable code lying around.
Basically, I need the random code to compile (or be interpreted) as much as possible, meaning that the uncompilable parts will be ignored and only the compilable parts will create the objects (which could then be run).
Object-orientedness is not a must, but it is a very strong advantage.
I thought of ASM, but it's very messy, and I'd probably need more readable code.
Thanks!
It sounds like you're doing something very much like genetic programming; even if you aren't, GP has to solve some of the same problems—using randomness to generate valid programs. The approach to this that is typically used is to work with a syntax tree: rather than storing x + y * 3 - 2, you store something like the following:
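A rough sketch of that tree for x + y * 3 - 2:
        (-)
       /   \
     (+)    2
    /   \
   x    (*)
        /  \
       y    3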
Then, instead of randomly changing the syntax, one can randomly change nodes in the tree. And if x should randomly change to, say, +, you can statically know that this means you need to insert two children (or not, depending on how you define +).
A good choice for a language to work with for this would be any Lisp dialect. In a Lisp, the above program would be written (- (+ x (* y 3)) 2), which is just a linearization of the syntax tree using parentheses to show depth. And in fact, Lisps expose this feature: you can just as easily work with the object '(- (+ x (* y 3)) 2) (note the leading quote). This is a three-element list, whose first element is -, second element is another list, and third element is 2.
And, though you might or might not want it for your particular application, there's an eval function, such that (eval '(- (+ x (* y 3)) 2)) will take in the given list, treat it as a Lisp syntax tree/program, and evaluate it. This is what makes Lisps so attractive for doing this sort of work; Lisp syntax is basically a reification of the syntax tree, and if you operate at the syntax-tree level, you can work on code as though it were a value.
Lisp won't help you read /dev/random as a program directly, but with a little interpretation layered on top, you should be able to get what you want.
I should also mention, though I don't know anything about it (not that I know much about ordinary genetic programming), the existence of linear genetic programming. This is sort of like the assembly model that you mentioned: a linear stream of very, very simple instructions. The advantage here would seem to be that if you are working with /dev/random or something like it, the amount of interpretation needed is very small; the disadvantage would be, as you mentioned, the low-level nature of the code.
I'm not sure if this is what you're looking for, but any programming language can be made to function this way. For any programming language P, define the language P_always as follows:
If p is a valid program in P, then p is a valid program in P_always whose meaning is the same as its meaning in P.
If p is not a valid program in P, then p is a valid program in P_always whose meaning is the same as a program that immediately terminates.
For example, I could make the language C++_always so that this program:
#include <iostream>
using namespace std;
int main() {
cout << "Hello, world!" << endl;
}
would compile and print "Hello, world!", while this program:
Hahaha! This isn't legal C++ code!
would be a legal program that just does absolutely nothing.
To solve your original problem, just take any OOP language like Java, Smalltalk, etc. and construct the appropriate Java_always, Smalltalk_always, etc. language from it. Again, I'm not sure if this is at all what you're looking for, but it could be done very easily.
Alternatively, consider finding a grammar for any OOP language and then using that grammar to produce random syntactically valid programs. You could then filter those programs down by using the P_always language for that language to eliminate programs that are syntactically but not semantically valid.
Divide the ASCII byte values into 9 classes (division modulo 9 would help). Then assign them to Brainfuck codewords (see http://en.wikipedia.org/wiki/Brainfuck). Then interpret as Brainfuck.
There you go, any sequence of ASCII characters is a program. Not that it's going to do anything sensible... This approach has a much better chance, compared to templatetypedef's answer, of getting a nontrivial program from a random byte sequence.
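A sketch of that mapping (a hypothetical helper, not from any library); note that random input will usually produce unbalanced brackets, which the interpreter also has to tolerate:
#include <string>

// Map each byte to one of the eight Brainfuck commands; the ninth class is a no-op.
std::string bytes_to_brainfuck(const std::string& bytes) {
    const char commands[] = {'>', '<', '+', '-', '.', ',', '[', ']'};
    std::string program;
    for (unsigned char b : bytes) {
        int cls = b % 9;
        if (cls < 8)
            program += commands[cls];   // classes 0..7 map to commands
        // class 8 is simply skipped
    }
    return program;
}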
Text Editors
You could try feeding random character strings to an editor like Emacs or VI. Many (most?) characters will perform an editing action but some will do nothing (other than beep, perhaps). You would have to ensure that the random code mutator never generates the character sequence that exits the editor. However, this experience would be much like programming a Turing machine -- the code is not too readable.
Mathematica
In Mathematica, undefined symbols and other expressions evaluate to themselves, without error. So, that language might be a viable choice if you can arrange for the random code mutator to always generate well-formed expressions. This would be readily achievable since the basic Mathematica syntax is trivial, making it easy to operate on syntactic units rather than at the character level. It would be even easier if the mutator were written in Mathematica itself since expression-munging is Mathematica's forte. You could define a mini-language of valid operations within a Mathematica package that does not import the system-defined symbols. This would allow you to generate well-formed expressions to your heart's content without fear of generating a dangerous expression, like DeleteFile[FileNames["*.*", "/", Infinity]].
I believe Common Lisp should suit your needs. I always have some code in my SLIME/Emacs session that wouldn't compile. You can always tweak things and redefine functions at run time. It is actually very good for prototyping.
A few years ago it took me quite a while to learn, but nowadays we have Quicklisp and everything is much easier.
Here I describe my development environment:
Install lisp on my linux machine
PS: I want to give an example of where Common Lisp was useful for me:
Up to maybe 2004 I used to write small programs in C (the keep-it-simple Unix way).
Over the last 3 years I have had to get lots of different hardware running: motorized stages, scientific cameras, IO cards.
The cameras turned out to be quite annoying. Usually you have to cool them down to -50 degrees Celsius or so, and (in some SDKs) they don't like it when you close them. But that is exactly how my C development cycle worked: write (30s), compile (1s), run (0.1s), repeat.
Eventually I decided to just use Common Lisp. Often it is straightforward to define the foreign function interfaces to talk to the SDKs, and I can do this without ever leaving the running Lisp image. I start the editor in the morning, define the open-device function to talk to the device, and after 3 hours I have enough of the functions implemented to set the gain, temperature and region of interest and to obtain the video.
Then I can often put the SDK manual away and just use the camera.
I use the same interactive programming approach when I have to parse some webpage or some weird XML.

statically/dynamically typed vs static/dynamic binding

What is the difference between those four terms? Can you please give examples?
Static and dynamic are jargon words that refer to the point in time at which some programming element is resolved. Static indicates that resolution takes place at the time a program is constructed. Dynamic indicates that resolution takes place at the time a program is run.
Static and Dynamic Typing
Typing refers to changes in program structure that are due to the differences between data values: integers, characters, floating point numbers, strings, objects and so on. These differences can have many effects, for example:
memory layout (e.g. 4 bytes for an int, 8 bytes for a double, more for an object)
instructions executed (e.g. primitive operations to add small integers, library calls to add large ones)
program flow (simple subroutine calling conventions versus hash-dispatch for multi-methods)
Static typing means that the executable form of a program generated at build time will vary depending upon the types of data values found in the program. Dynamic typing means that the generated code will always be the same, irrespective of type -- any differences in execution will be determined at run-time.
Note that few real systems are purely one or the other; it is just a question of which is the preferred strategy.
Static and Dynamic Binding
Binding refers to the association of names in program text to the storage locations to which they refer. In static binding, this association is predetermined at build time. With dynamic binding, this association is not determined until run-time.
Truly static binding is almost extinct. Earlier assemblers and FORTRAN, for example, would completely precompute the exact memory location of all variables and subroutine locations. This situation did not last long, with the introduction of stack and heap allocation for variables and dynamically-loaded libraries for subroutines.
So one must take some liberty with the definitions. It is the spirit of the concept that counts here: statically bound programs precompute as much about storage layout as is practical in a modern virtual-memory, garbage-collected, separately compiled application. Dynamically bound programs wait as late as possible.
An example might help. If I attempt to invoke a method MyClass.foo(), a static-binding system will verify at build time that there is a class called MyClass and that class has a method called foo. A dynamic-binding system will wait until run-time to see whether either exists.
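To make the contrast concrete, here is a small C++ sketch (not from the answer above): a non-virtual call is bound at build time to the function of the declared type, while a virtual call is resolved at run time against the object's actual type.
#include <iostream>

struct Base {
    void greet_static()          { std::cout << "Base::greet_static\n"; }
    virtual void greet_dynamic() { std::cout << "Base::greet_dynamic\n"; }
    virtual ~Base() = default;
};

struct Derived : Base {
    void greet_static()           { std::cout << "Derived::greet_static\n"; }
    void greet_dynamic() override { std::cout << "Derived::greet_dynamic\n"; }
};

int main() {
    Derived d;
    Base& b = d;
    b.greet_static();   // statically bound: prints Base::greet_static
    b.greet_dynamic();  // dynamically bound: prints Derived::greet_dynamic
}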
Contrasts
The main strength of static strategies is that the program translator is much more aware of the programmer's intent. This makes it easier to:
catch many common errors early, during the build phase
build refactoring tools
incur a significant amount of the computational cost required to determine the executable form of the program only once, at build time
The main strength of dynamic strategies is that they are much easier to implement, meaning that:
a working dynamic environment can be created at a fraction of the cost of a static one
it is easier to add language features that might be very challenging to check statically
it is easier to handle situations that require self-modifying code
Typing - refers to variable types and whether variables are allowed to change type during program execution
http://en.wikipedia.org/wiki/Type_system#Type_checking
Binding - this, as you can read below, can refer to variable binding or to library binding
http://en.wikipedia.org/wiki/Binding_%28computer_science%29#Language_or_Name_binding

Maximum number of variables in classes

What is the max number? Will my program crash if it exceeds a certain number? Is there a standard, just like the 5 for method parameters?
An answer to such a question would depend on the language you are using, but generally speaking, there isn't any limit on the number of variables in a class or of parameters of a method.
There is a cap on the amount of data you can handle, and that's the amount of memory available to your system, but that's a cap on the size of the actual data held by the variables.
Having a high number of variables or methods inside a class is not recommended because your code can become unmaintainable very quickly. That is due to the Single Responsibility Principle: your class should be responsible for one thing, and only one thing, and that one thing will rarely need that many variables to accurately represent its state. In the event that it does, use object composition: identify the small structures which have emerged inside the class, break them up into smaller classes, then add references to objects of those classes to the original class, effectively creating a "has a" relationship between the original class and the smaller classes.
For example, a car has an engine:
class Car {
Engine engine;
};
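A slightly fuller sketch of that refactoring, with made-up names: fields that belong together are pulled out into small classes of their own, and the original class simply composes them.
// Instead of Car holding displacement, cylinders, latitude, longitude, ...
// directly, the related fields live in their own classes:
class Engine {
    double displacement;
    int cylinders;
};

class GpsUnit {
    double latitude;
    double longitude;
};

class Car {
    Engine engine;    // Car "has an" Engine
    GpsUnit gps;      // Car "has a" GPS unit
};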
Your code will become unreadable long before you reach any hard limits set by a programming language, both for variables and method parameters.
This is unlikely to be an issue, although I would guess that it depends on the language you are talking about.
And why don't you try to write your whole program in only one file, with only one function? :)
Because it's unreadable and unmaintainable, so it's full of bugs, and so it will not work very well.
That is the real kind of limit on the number of member variables, yes.
Although there is no hard limit, it is never recommended to use a large number of variables in a class or of method parameters. One can use the composition design pattern, or inheritance in some cases, for reuse; the latter should be used sparingly. I would rarely use more than 25 variables in a class or 5 method parameters.

Resources