Code generation from a restricted set of inputs - metaprogramming

Suppose I have a mapping (with known types) such as
1: false,
4: false,
8: true,
16: true
And I want to generate a function that takes an input and gives the correct output. I don't care what happens for any input that is not in the above mapping; for example, 3 will never be expected.
A naive solution would be to generate the function as a switch statement or an if/else chain, for instance
bool f(int x) {
    if (x == 1) return false;
    else if (x == 4) return false;
    else if (x == 8) return true;
    else if (x == 16) return true;
}
I want to be able to generate code whose size doesn't scale with the number of inputs, for example
bool f(int x) {
    return x >= 8;
}
Does this problem have a name? What area should I research into?

You want to "guess" what code to generate for the inputs not provided.
You can't do it.
[EDIT: On further discussion, it is now clear to me that he doesn't care about such inputs. I'm leaving the answer as is, because people will assume, as I did, that he must care. Surprising advice is offered at the end of this answer, anyway.]
Imagine you have an adversary who is going to specify a function f on all inputs, but she only provides you a sample, as in your example: some fixed set of inputs. You now apply an oracle that guesses f(9) is true. Your adversary promptly shows that her function has f(9) actually false. Likewise if you guess f(9) is false.
The adversary can always manufacture an input/output pair that does not match what your code generator guesses. So you simply cannot get it right.
What you can do is to accept that your guesses may be wrong, and try to choose a function that has the least complexity that explains the input/output pairs you have seen so far. Your example is essentially one of these.
If you believe that "simple" functions are a better approximation of the world than complex ones, you can generate code and hope you don't encounter an adversary in nature.
Don't count on it to be reliable.
With those caveats, OP might be interested in the GNU SuperOptimizer. This finds short sequences of machine instructions that produce a provided set of input/output pairs [actually, I think you give it a function that computes the answer, like OP's original function] by the "obviously" crazy idea of literally trying every instruction sequence.
The genius behind the superoptimizer is that this stunt actually works in practice for short instruction sequences.
I think it would be easy to modify it to produce generic "C" instructions (e.g. valid C actions) since I believe it uses C actions to model machine instructions anyway. You would probably have to modify your function to produce "don't care" results for inputs that don't matter, and teach GNU superoptimizer that "don't care" is a valid result. That would in fact be a useful addition to the GNU superoptimizer for its original purpose, too.
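To illustrate the brute-force idea, here is a toy Python sketch of the approach (my own illustration, not the GNU SuperOptimizer itself): enumerate candidate expressions roughly from simplest to most complex and keep the first one that agrees with every known input/output pair; anything outside the table is treated as don't-care.
# Toy sketch of the superoptimizer idea, not the GNU tool itself.
table = {1: False, 4: False, 8: True, 16: True}

# A small, made-up candidate space: a few predicate shapes over a handful
# of constants, yielded roughly from simplest upwards.
def candidates():
    for c in range(0, 33):
        yield (f"x >= {c}", lambda x, c=c: x >= c)
        yield (f"x > {c}", lambda x, c=c: x > c)
        yield (f"(x & {c}) != 0", lambda x, c=c: (x & c) != 0)

def synthesize(table):
    # return the first (i.e. cheapest) candidate consistent with every
    # known pair; inputs outside the table are simply never checked
    for name, f in candidates():
        if all(f(x) == y for x, y in table.items()):
            return name
    return None

print(synthesize(table))   # prints a consistent guess such as "x > 4"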

Related

How are the bools in my while loop affecting the loop?

I'm learning Python and I built a TicTacToe game. I'm now writing unit tests and improving my code. I was advised to make a while loop in my turn function and use the following bools, but I don't know why. I can see some reason why this makes sense, but because of how new I am, I couldn't even explain why it makes sense to me. Can someone explain why this would make more sense than another combination of bools?
print(TTTGame.player + "'s TURN!")
print('pick a number 1 through 9')
position = int(input()) - 1
valid = False                       # <-- this bool
while not valid:                    # <-- and this loop condition
    try:
        position = int(input('pick a spot'))
    except (ValueError):
        print('pick a number 1 though 9')
    x = TTTGame.board[position]
    if x == '-':
        TTTGame.board[position] = TTTGame.player
    else:
        print('Cant Go There!')
        TTTGame.board[position] = TTTGame.player
    valid = True                    # <-- this bool
Based on the code you provided, I would have to work backwards to create a worse example, and without seeing what you did initially that may cause more confusion than it would help. A fair number of the things I mention are subjective and vary by developer, but the principles tend to be commonly agreed upon to some degree (especially in Python).
That being said, I will try to explain why this pattern works well:
Intuitiveness: Reading this code from the outside, I can tell that you are doing validation, primarily because, read as plain English, the code assumes the input is invalid (valid == False) and won't leave the loop until it has been validated (valid == True). Even without the additional context of knowing it's a tic-tac-toe game, I can immediately tell this is the case just by reading the code.
Speed: In terms of how long it takes to complete this function, it terminates as soon as the input is valid, as opposed to some other ways of doing this that require more than one check.
Pythonic: In the Python community there is an emphasis on 'pythonic' code; basically this means that there are ways of doing things that are common across various types of Python code. In this case there are many patterns for validating input (using for loops, terminating conditions, separate functions, etc.), but because of how this is written I would be able to debug it without having to know your whole codebase. It also means that if you see a similar chunk of code in a large project, you are more likely to recognize what's happening. It follows the principles of the Zen of Python in that you are being explicit while maintaining readability, errors are not passing silently, and you are keeping it simple.
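Put together, a minimal standalone sketch of the same assume-invalid-until-proven-valid pattern (my own illustration, not the code above) might look like:
# Standalone sketch: assume the input is invalid, loop until it has been
# proven valid, then carry on.
def ask_position():
    valid = False
    while not valid:
        try:
            position = int(input('pick a number 1 through 9: ')) - 1
        except ValueError:
            print('that was not a number')
            continue          # go back and ask again
        if 0 <= position <= 8:
            valid = True      # the only way out of the loop
        else:
            print('pick a number 1 through 9')
    return position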

Why would more array accesses perform better?

I'm taking a course on coursera that uses minizinc. In one of the assignments, I was spinning my wheels forever because my model was not performing well enough on a hidden test case. I finally solved it by changing the following types of accesses in my model
from
constraint sum(neg1,neg2 in party where neg1 < neg2)(joint[neg1,neg2]) >= m;
to
constraint sum(i,j in 1..u where i < j)(joint[party[i],party[j]]) >= m;
I don't know what I'm missing, but why would these two perform any differently from each other? It seems like they should perform similarly, with the former maybe being slightly faster, but the performance difference was dramatic. I'm guessing there is some sort of optimization that the former misses out on? Or am I really missing something, and do those lines actually result in different behavior? My intention is to sum the strength over every pair in party.
Misc. Details:
party is an array of enum vars
party's index set is 1..real_u
every element in party should be unique except for a dummy variable.
solver was Gecode
verification of my model was done on a coursera server so I don't know what optimization level their compiler used.
edit: Since MiniZinc (mz) is a declarative language, I'm realizing that "array accesses" in mz don't necessarily have a direct counterpart in an imperative language. However, to me, these two lines mean the same thing semantically. So I guess my question is more "Why are the above lines different semantically in mz?"
edit2: I had to change the example in question; I was toeing the line of violating coursera's honor code.
The difference stems from the way in which the where-clause "a < b" is evaluated. When "a" and "b" are parameters, the compiler can already exclude the irrelevant parts of the sum during compilation. If "a" or "b" is a variable, this usually cannot be decided at compile time, and the solver will receive a more complex constraint.
In this case the solver would have gotten a sum over "array[int] of var opt int", meaning that some variables in the array might not actually be present. For most solvers this is rewritten to a sum where every variable is multiplied by a boolean variable, which is true iff the variable is present. You can see how this is less efficient than a normal sum without multiplications.
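To picture what the solver receives in each case, here is a rough, non-MiniZinc illustration in Python (the data and names are made up, and real flattening is more involved than this):
# Rough, illustrative picture only -- not MiniZinc, and not how flattening
# really works internally. "joint" stands for the pairwise values.
u = 4
joint = [[(i + 1) * (j + 1) for j in range(u)] for i in range(u)]

# Second form in the question (i, j are parameters): the where-clause is
# decided at compile time, so the solver just gets a plain sum over the
# pairs that survive.
fixed_sum = sum(joint[i][j] for i in range(u) for j in range(u) if i < j)

# First form (indices are decision variables): every term is kept and
# multiplied by a 0/1 indicator saying whether that pair is present,
# which the solver must also reason about.
present = {(i, j): 1 if i < j else 0 for i in range(u) for j in range(u)}
opt_sum = sum(present[i, j] * joint[i][j] for i in range(u) for j in range(u))

assert fixed_sum == opt_sum   # same value, but the second shape is harder to solve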

which searching method is used by element in list in python

When I check whether an element is present in a list in Python using element in list, which search method is used in the background?
e.g. 2 in [2,3,4,5]
for example in the following code:
list = [2, 3, 4]
if 2 in list:
    print(True)
else:
    print(False)
Here's the code for the implementation in CPython:
static int
list_contains(PyListObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                       Py_EQ);
    return cmp;
}
So as you can see, it's literally just walking through the list from front to back until the requisite element is found. The simplest possible algorithm for something like this. This checks out with the time complexity guide, which says that x in list has O(n) complexity.
Dicts and sets use a wholly different algorithm, because they're hash tables, a completely different data structure. Said different algorithm is almost always more efficient. If you're curious, you can also look at the source codes for sets and dicts respectively (the latter is much more complicated than the former, it looks like).
The in keyword can be used in different ways, but in this case, it's used to check if something contains something else. According to a documentation page about this, the in keyword calls __contains__, i.e.: x in y is the same as y.__contains__(x).
If SomeType.__contains__ is not defined by SomeType...
...the membership test first tries iteration via __iter__(), then the old sequence iteration protocol via __getitem__().
This would most likely be in linear time (O(n)). I say "most likely" because it depends on those implementations.
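For example (a made-up class), a type that defines __contains__ decides for itself how the membership test is carried out, so "in" is not necessarily a linear scan:
# Small illustration: membership testing dispatches to __contains__ when
# a type defines it.
class EvenNumbers:
    def __contains__(self, item):
        # constant-time check instead of walking a sequence
        return isinstance(item, int) and item % 2 == 0

print(4 in EvenNumbers())   # True  -- calls EvenNumbers().__contains__(4)
print(7 in EvenNumbers())   # False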
The built-in list type can have elements of different types and is not typically sorted, making binary search impossible (or at least illogical); therefore, as stated by Green Cloak Guy, the CPython implementation performs the check linearly.
To answer the question in one term, it would be: "Linear Search."

A reverse inference engine (find a random X for which foo(X) is true)

I am aware that languages like Prolog allow you to write things like the following:
mortal(X) :- man(X). % All men are mortal
man(socrates). % Socrates is a man
?- mortal(socrates). % Is Socrates mortal?
yes
What I want is something like this, but backwards. Suppose I have this:
mortal(X) :- man(X).
man(socrates).
man(plato).
man(aristotle).
I then ask it to give me a random X for which mortal(X) is true (thus it should give me one of 'socrates', 'plato', or 'aristotle' according to some random seed).
My questions are:
Does this sort of reverse inference have a name?
Are there any languages or libraries that support it?
EDIT
As somebody below pointed out, you can simply ask mortal(X) and it will return all X, from which you can simply pick a random one. What if, however, that list were very large, perhaps in the billions? Obviously in that case it wouldn't do to generate every possible result before picking one.
To see how this would be a practical problem, imagine a simple grammar that generated a random sentence of the form "adjective1 noun1 adverb transitive_verb adjective2 noun2". If the lists of adjectives, nouns, verbs, etc. are very large, you can see how the combinatorial explosion is a problem. If each list had 1000 words, you'd have 1000^6 possible sentences.
Instead of the depth-first search of Prolog, a randomized depth-first search strategy could easily be implemented. All that is required is to randomize the program flow at choice points, so that every time a disjunction is reached a random branch of the search tree (= Prolog program) is selected instead of the first.
Note, though, that this approach does not guarantee that all solutions will be equally probable. To guarantee that, you would need to know in advance how many solutions each branch generates, so that the randomization can be weighted accordingly.
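To make the sentence-grammar example from the question concrete, here is a rough Python illustration of the same idea (the word lists are made-up stand-ins): at each choice point, pick one alternative at random instead of enumerating the whole cross product.
import random

# Made-up stand-ins; in the question each list would have ~1000 entries.
adjectives = ['quick', 'lazy', 'green', 'solemn']
nouns = ['fox', 'dog', 'idea', 'engine']
adverbs = ['quietly', 'boldly', 'rarely']
verbs = ['chases', 'inspects', 'ignores']

# One random pick per slot: a sentence is produced without materialising
# the len(adjectives) * len(nouns) * ... combinations. Because every branch
# here leads to the same number of completions, this sampling happens to be
# uniform; with unbalanced branches you would need to weight the picks,
# as noted above.
sentence = ' '.join([
    random.choice(adjectives), random.choice(nouns), random.choice(adverbs),
    random.choice(verbs), random.choice(adjectives), random.choice(nouns),
])
print(sentence)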
I've never used Prolog or anything similar, but judging by what Wikipedia says on the subject, asking
?- mortal(X).
should list everything for which mortal is true. After that, just pick one of the results.
So to answer your questions,
I'd go with "a query with a variable in it"
From what I can tell, Prolog itself should support it quite fine.
I don't think that you can calculate the nth solution directly, but you can calculate the first n solutions (with n randomly picked) and take the last one. Of course this would be problematic if n = 10^(big_number)...
You could also do something like
mortal(ID,X) :- man(ID,X).
man(X):- random(1,4,ID), man(ID,X).
man(1,socrates).
man(2,plato).
man(3,aristotle).
but the problem is that if not every man were mortal, for example if only 1 out of 1,000,000 were mortal, you would have to search a lot. It would be like searching for solutions to an equation by trying random numbers until you find one.
You could develop some sort of heuristic to find a solution close to the number, but that may affect the randomness (negatively).
I suspect that there is no way to do it more efficiently: you either have to calculate the set of solutions and pick one, or keep picking members of the superset of all solutions until you find one that is a solution. But don't take my word for it xd

Why is the 'if' statement considered evil?

I just came from the Simple Design and Testing Conference. In one of the sessions we were talking about evil keywords in programming languages. Corey Haines, who proposed the subject, was convinced that the if statement is absolutely evil. His alternative was to create functions with predicates. Can you please explain to me why if is evil?
I understand that you can write very ugly code by abusing if. But I don't believe it's that bad.
The if statement is rarely considered as "evil" as goto or mutable global variables -- and even those are actually not universally and absolutely evil. I would suggest taking the claim as a bit hyperbolic.
It also largely depends on your programming language and environment. In languages which support pattern matching, you will have great tools for replacing if at your disposal. But if you're programming a low-level microcontroller in C, replacing ifs with function pointers will be a step in the wrong direction. So, I will mostly consider replacing ifs in OOP programming, because in functional languages, if is not idiomatic anyway, while in purely procedural languages you don't have many other options to begin with.
Nevertheless, conditional clauses sometimes result in code which is harder to manage. This does not only include the if statement, but even more commonly the switch statement, which usually includes more branches than a corresponding if would.
There are cases where it's perfectly reasonable to use an if
When you are writing utility methods, extensions or specific library functions, it's likely that you won't be able to avoid ifs (and you shouldn't). There isn't a better way to code this little function, nor make it more self-documented than it is:
// this is a good "if" use-case
int Min(int a, int b)
{
    if (a < b)
        return a;
    else
        return b;
}

// or, if you prefer the ternary operator
int Min(int a, int b)
{
    return (a < b) ? a : b;
}
Branching over a "type code" is a code smell
On the other hand, if you encounter code which tests for some sort of a type code, or tests if a variable is of a certain type, then this is most likely a good candidate for refactoring, namely replacing the conditional with polymorphism.
The reason for this is that by allowing your callers to branch on a certain type code, you are creating the possibility of ending up with numerous checks scattered all over your code, making extensions and maintenance much more complex. Polymorphism, on the other hand, allows you to bring this branching decision as close to the root of your program as possible.
Consider:
// this is called branching on a "type code",
// and screams for refactoring
void RunVehicle(Vehicle vehicle)
{
    // how the hell do I even test this?
    if (vehicle.Type == CAR)
        Drive(vehicle);
    else if (vehicle.Type == PLANE)
        Fly(vehicle);
    else
        Sail(vehicle);
}
By placing common but type-specific (i.e. class-specific) functionality into separate classes and exposing it through a virtual method (or an interface), you allow the internal parts of your program to delegate this decision to someone higher in the call hierarchy (potentially at a single place in code), allowing much easier testing (mocking), extensibility and maintenance:
// adding a new vehicle is gonna be a piece of cake
interface IVehicle
{
    void Run();
}

// your method now doesn't care about which vehicle
// it got as a parameter
void RunVehicle(IVehicle vehicle)
{
    vehicle.Run();
}
And you can now easily test if your RunVehicle method works as it should:
// you can now create test (mock) implementations
// since you're passing it as an interface
var mock = new Mock<IVehicle>();

// run the client method
something.RunVehicle(mock.Object);

// check if Run() was invoked
mock.Verify(m => m.Run(), Times.Once());
Patterns which only differ in their if conditions can be reused
Regarding the argument about replacing if with a "predicate" in your question, Haines probably meant that sometimes similar patterns exist throughout your code, differing only in their conditional expressions. Conditional expressions do emerge in conjunction with ifs, but the whole idea is to extract a repeating pattern into a separate method, leaving the expression as a parameter. This is what LINQ already does, usually resulting in cleaner code compared to the alternative foreach:
Consider these two very similar methods:
// average male age
public double AverageMaleAge(List<Person> people)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (person.Gender == Gender.Male)
        {
            sum += person.Age;
            count++;
        }
    }
    return sum / count; // not checking for zero div. for simplicity
}

// average female age
public double AverageFemaleAge(List<Person> people)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (person.Gender == Gender.Female) // <-- only the expression
        {                                   //     is different
            sum += person.Age;
            count++;
        }
    }
    return sum / count;
}
This indicates that you can extract the condition into a predicate, leaving you with a single method for these two cases (and many other future cases):
// average age for all people matched by the predicate
public double AverageAge(List<Person> people, Predicate<Person> match)
{
    double sum = 0.0;
    int count = 0;
    foreach (var person in people)
    {
        if (match(person)) // <-- the decision to match
        {                  //     is now delegated to callers
            sum += person.Age;
            count++;
        }
    }
    return sum / count;
}

var males = AverageAge(people, p => p.Gender == Gender.Male);
var females = AverageAge(people, p => p.Gender == Gender.Female);
And since LINQ already has a bunch of handy extension methods like this, you actually don't even need to write your own methods:
// replace everything we've written above with these two lines
var males = list.Where(p => p.Gender == Gender.Male).Average(p => p.Age);
var females = list.Where(p => p.Gender == Gender.Female).Average(p => p.Age);
In this last LINQ version the if statement has "disappeared" completely, although:
to be honest the problem wasn't in the if by itself, but in the entire code pattern (simply because it was duplicated), and
the if still actually exists, but it's written inside the LINQ Where extension method, which has been tested and is closed for modification. Having less of your own code is always a good thing: fewer things to test, fewer things to go wrong, and the code is simpler to follow, analyze and maintain.
Huge runs of nested if/else statements
When you see a function spanning 1000 lines and having dozens of nested if blocks, there is an enormous chance it can be rewritten to
use a better data structure and organize the input data in a more appropriate manner (e.g. a hashtable, which will map one input value to another in a single call),
use a formula, a loop, or sometimes just an existing function which performs the same logic in 10 lines or less (e.g. this notorious example comes to my mind, but the general idea applies to other cases),
use guard clauses to prevent nesting (guard clauses give more confidence into the state of variables throughout the function, because they get rid of exceptional cases as soon as possible),
at least replace with a switch statement where appropriate.
Refactor when you feel it's a code smell, but don't over-engineer
Having said all this, you should not spend sleepless nights over having a couple of conditionals here and there. While these answers can provide some general rules of thumb, the best way to detect constructs which need refactoring is through experience. Over time, some patterns emerge that result in modifying the same clauses over and over again.
There is another sense in which if can be evil: when it comes instead of polymorphism.
E.g.
if (animal.isFrog()) croak(animal)
else if (animal.isDog()) bark(animal)
else if (animal.isLion()) roar(animal)
instead of
animal.emitSound()
But basically if is a perfectly acceptable tool for what it does. It can be abused and misused of course, but it is nowhere near the status of goto.
A good quote from Code Complete:
Code as if whoever maintains your program is a violent psychopath who
knows where you live.
— Anonymous
IOW, keep it simple. If the readability of your application will be enhanced by using a predicate in a particular area, use it. Otherwise, use the 'if' and move on.
I think it depends on what you're doing to be honest.
If you have a simple if..else statement, why use a predicate?
If you can, use a switch to replace larger if chains, and if there is the option to use a predicate for large operations (where it makes sense; otherwise your code will be a nightmare to maintain), use it.
This guy seems to have been a bit pedantic for my liking. Replacing all if's with Predicates is just crazy talk.
There is the Anti-If campaign which started earlier in the year. The main premise is that many nested if statements can often be replaced with polymorphism.
I would be interested to see an example of using the Predicate instead. Is this more along the lines of functional programming?
Just like in the bible verse about money, if statements are not evil -- the LOVE of if statements is evil. A program without if statements is a ridiculous idea, and using them as necessary is essential. But a program that has 100 if-else if blocks in a row (which, sadly, I have seen) is definitely evil.
I have to say that I recently have begun to view if statements as a code smell: especially when you find yourself repeating the same condition several times. But there's something you need to understand about code smells: they don't necessarily mean that the code is bad. They just mean that there's a good chance the code is bad.
For instance, comments are listed as a code smell by Martin Fowler, but I wouldn't take anyone seriously who says "comments are evil; don't use them".
Generally though, I prefer to use polymorphism instead of if statements where possible. That just makes for so much less room for error. I tend to find that a lot of the time, using conditionals leads to a lot of tramp arguments as well (because you have to pass the data needed to form the conditional on to the appropriate method).
if is not evil (I also hold that assigning morality to code-writing practices is asinine...).
Mr. Haines is being silly and should be laughed at.
I'll agree with you; he was wrong. You can go too far with things like that, too clever for your own good.
Code created with predicates instead of ifs would be horrendous to maintain and test.
Predicates come from logical/declarative programming languages, like PROLOG. For certain classes of problems, like constraint solving, they are arguably superior to a lot of drawn out step-by-step if-this-do-that-then-do-this crap. Problems that would be long and complex to solve in imperative languages can be done in just a few lines in PROLOG.
There's also the issue of scalable programming (due to the move towards multicore, the web, etc.). If statements and imperative programming in general tend to be step-by-step, and not scalable. Logical declarations and lambda calculus, though, describe how a problem can be solved and what pieces it can be broken into. As a result, the interpreter/processor executing that code can efficiently break the code into pieces and distribute it across multiple CPUs/cores/threads/servers.
Definitely not useful everywhere; I'd hate to try writing a device driver with predicates instead of if statements. But yes, I think the main point is probably sound, and worth at least getting familiar with, if not using all the time.
The only problem with a predicates (in terms of replacing if statements) is that you still need to test them:
void Test(Predicate<int> pr, int num)
{
    if (pr(num))
    { /* do something */ }
    else
    { /* do something else */ }
}
You could of course use the ternary operator (?:), but that's just an if statement in disguise...
Perhaps with quantum computing it will be a sensible strategy to not use IF statements but to let each leg of the computation proceed and only have the function 'collapse' at termination to a useful result.
Sometimes it's necessary to take an extreme position to make your point. I'm sure this person uses if -- but every time you use an if, it's worth having a little think about whether a different pattern would make the code clearer.
Preferring polymorphism to if is at the core of this. Rather than:
if (animaltype == bird) {
    squawk();
} else if (animaltype == dog) {
    bark();
}
... use:
animal.makeSound();
But that supposes that you've got an Animal class/interface -- so really what the if is telling you, is that you need to create that interface.
So in the real world, what sort of ifs do we see that lead us to a polymorphism solution?
if (logging) {
    log.write("Did something");
}
That's really irritating to see throughout your code. How about, instead, having two (or more) implementations of Logger?
this.logger = new NullLogger(); // logger.log() does nothing
this.logger = new StdOutLogger(); // logger.log() writes to stdout
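A hypothetical sketch of those two implementations (Python here for brevity; the logging_enabled flag is made up and would be decided once, at configuration time):
# Sketch of the two logger strategies; names follow the example above.
logging_enabled = False          # hypothetical flag, set once up front

class NullLogger:
    def log(self, message):
        pass                     # deliberately does nothing

class StdOutLogger:
    def log(self, message):
        print(message)           # writes to stdout

# chosen once -- the calling code never asks "if logging:" again
logger = StdOutLogger() if logging_enabled else NullLogger()
logger.log("Did something")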
That leads us to the Strategy Pattern.
Instead of:
if (user.getCreditRisk() > 50) {
    decision = thoroughCreditCheck();
} else if (user.getCreditRisk() > 20) {
    decision = mediumCreditCheck();
} else {
    decision = cursoryCreditCheck();
}
... you could have ...
decision = getCreditCheckStrategy(user.getCreditRisk()).decide();
Of course getCreditCheckStrategy() might contain an if -- and that might well be appropriate. You've pushed it into a neat place where it belongs.
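As a hypothetical sketch (Python for brevity, names following the example), the factory could look like this; the if survives, but in exactly one place:
# Sketch: the branching lives in one factory function; callers just ask
# for a strategy and use it.
class ThoroughCreditCheck:
    def decide(self):
        return 'run full credit history'

class MediumCreditCheck:
    def decide(self):
        return 'run standard checks'

class CursoryCreditCheck:
    def decide(self):
        return 'quick score lookup'

def get_credit_check_strategy(risk):
    # the only place in the codebase that branches on the risk score
    if risk > 50:
        return ThoroughCreditCheck()
    elif risk > 20:
        return MediumCreditCheck()
    return CursoryCreditCheck()

decision = get_credit_check_strategy(65).decide()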
It probably comes down to a desire to keep a function's cyclomatic complexity down, and to reduce the number of branch points in a function. If a function is simple to decompose into a number of smaller functions, each of which can be tested, you can reduce the complexity and make the code more easily testable.
IMO:
I suspect he was trying to provoke a debate and make people think about the misuse of 'if'. No one would seriously suggest that such a fundamental construct of programming syntax should be completely avoided, would they?
Good that in ruby we have unless ;)
But seriously, if is probably the next goto: even though most people think it is evil, in some cases it simplifies and speeds things up (and in some cases, like low-level, highly optimized code, it's a must).
I think If statements are evil, but If expressions are not. What I mean by an if expression in this case can be something like the C# ternary operator (condition ? trueExpression : falseExpression). This is not evil because it is a pure function (in a mathematical sense). It evaluates to a new value, but it has no effects on anything else. Because of this, it works in a substitution model.
Imperative If statements are evil because they force you to create side-effects when you don't need to. For an If statement to be meaningful, you have to produce different "effects" depending on the condition expression. These effects can be things like IO, graphic rendering or database transactions, which change things outside of the program. Or, it could be assignment statements that mutate the state of the existing variables. It is usually better to minimize these effects and separate them from the actual logic. But, because of the If statements, we can freely add these "conditionally executed effects" everywhere in the code. I think that's bad.
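As a small illustration of that distinction, in Python (my own example, using its conditional expression in place of the C# ternary):
# Expression form: evaluates to a value, no side effects; it can be
# substituted anywhere its result is needed.
def describe(n):
    return 'even' if n % 2 == 0 else 'odd'

# Statement form: the branches exist to *do* something different -- here,
# mutating state -- which is exactly the kind of effect the answer warns about.
counters = {'even': 0, 'odd': 0}
def tally(n):
    if n % 2 == 0:
        counters['even'] += 1
    else:
        counters['odd'] += 1

print(describe(7))   # 'odd'
tally(7)             # changes counters as a side effect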
If is not evil! Consider ...
int sum(int a, int b) {
    return a + b;
}
Boring, eh? Now with an added if ...
int sum(int a, int b) {
    if (a == 0 && b == 0) {
        return 0;
    }
    return a + b;
}
... your code creation productivity (measured in LOC) is doubled.
Also, code readability has improved much, for now you can see at a glance what the result is when both arguments are zero. You couldn't do that in the code above, could you?
Moreover, you've supported the test team, since they can now push their code coverage tools closer to their limits.
Furthermore, the code is now better prepared for future enhancements. Suppose, for example, that the sum should be zero if one of the arguments is zero (don't laugh and don't blame me, silly customer requirements, you know, and the customer is always right).
Because the if was there in the first place, only a slight code change is needed.
int sum(int a, int b) {
    if (a == 0 || b == 0) {
        return 0;
    }
    return a + b;
}
How much more code change would have been needed if you hadn't put in the if right from the start?
Thankfulness will be yours on all sides.
Conclusion: There's never enough if's.
There you go.
