Simple text/string reciprocal cipher DSL which could be randomly generated?

Just to play around: is there any DSL that
could be generated randomly,
manipulates text or strings and can restore them, and
works like a reciprocal cipher? E.g., if the generated function is F(), then for every string s1 you get a scrambled string s2 = F(s1), and another function G() can be deduced that reverses F(), so that G(s2) = s1.
F() and G() could be the same or different.
And a few additional questions:
Can any programming language deduce reverse functions automatically?
And how can I make sure a generated function F() is reversible?
Or any tips on where I could start?
Thanks!

One good starting point would be the Feistel network construction for block ciphers. In essence, it's a basic framework for building an iterated block cipher out of a function. There are very few requirements on the function -- it simply needs to be a function which modifies a piece of the message based on the key. The cipher will work no matter what the function is; the nature of the function will affect the security of the cipher, though.
http://en.wikipedia.org/wiki/Feistel_cipher
To answer some of your other questions:
Can any programming language deduce reverse functions automatically?
Not in general. Especially because many (most!) functions are not invertible at all.
And how can I make sure a generated function F() is reversible?
Using the Feistel network construction will guarantee this.
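As a rough illustration of why the construction is always reversible, here is a minimal Python sketch. The names scramble, unscramble and round_function are made up for this example, the hash-based round function is an arbitrary choice, and the sketch assumes the input has an even number of bytes. It is meant to show the mechanics only and makes no security claim.

```python
import hashlib


def round_function(half: bytes, key: bytes, round_index: int) -> bytes:
    """Hypothetical round function; any deterministic function would do.

    It is built from a hash, so it is not invertible on its own.
    """
    material = key + bytes([round_index]) + half
    return hashlib.shake_256(material).digest(len(half))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def scramble(data: bytes, key: bytes, rounds: int = 4) -> bytes:
    """F(): run the Feistel rounds forward (assumes an even-length input)."""
    if len(data) % 2:
        raise ValueError("this sketch needs an even number of bytes")
    half = len(data) // 2
    left, right = data[:half], data[half:]
    for i in range(rounds):
        # Swap the halves and mix the old left half with F(right).
        left, right = right, xor_bytes(left, round_function(right, key, i))
    return left + right


def unscramble(data: bytes, key: bytes, rounds: int = 4) -> bytes:
    """G(): run the same rounds in reverse order to undo scramble()."""
    if len(data) % 2:
        raise ValueError("this sketch needs an even number of bytes")
    half = len(data) // 2
    left, right = data[:half], data[half:]
    for i in reversed(range(rounds)):
        # XOR-ing with the same round output a second time cancels it.
        left, right = xor_bytes(right, round_function(left, key, i)), left
    return left + right
```

Note that unscramble never inverts round_function; it only recomputes it and XORs the result away, which is why a randomly generated round function would work just as well.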

To answer my own question:
http://en.wikipedia.org/wiki/Reversible_computing
http://strangepaths.com/reversible-computation/2008/01/20/en/
It looks like this is mainly a topic in theoretical CS, so such a DSL is yet to be invented.
So far, Prolog can do reversible functions.

Related

Accumulator factory in Haskell

Now, at the start of my adventure with programming, I have some problems understanding basic concepts. Here is one related to Haskell, or perhaps to the functional paradigm in general.
Here is a general statement of the accumulator factory problem, from
http://rosettacode.org/wiki/Accumulator_factory
[Write a function that]
Takes a number n and returns a function (lets call it g), that takes a number i, and returns n incremented by the accumulation of i from every call of function g(i).
Works for any numeric type-- i.e. can take both ints and floats and returns functions that can take both ints and floats. (It is not enough simply to convert all input to floats. An accumulator that has only seen integers must return integers.) (i.e., if the language doesn't allow for numeric polymorphism, you have to use overloading or something like that)
Generates functions that return the sum of every number ever passed to them, not just the most recent. (This requires a piece of state to hold the accumulated value, which in turn means that pure functional languages can't be used for this task.)
Returns a real function, meaning something that you can use wherever you could use a function you had defined in the ordinary way in the text of your program. (Follow your language's conventions here.)
Doesn't store the accumulated value or the returned functions in a way that could cause them to be inadvertently modified by other code. (No global variables or other such things.)
with, as I understand, a key point being:
"[...] creating a function that [...]
Generates functions that return the sum of every number ever passed to them, not just the most recent. (This requires a piece of state to hold the accumulated value, which in turn means that pure functional languages can't be used for this task.)"
We can find a Haskell solution on the same website and it seems to do just what the quote above says.
Here
http://rosettacode.org/wiki/Category:Haskell
it is said that Haskell is purely functional.
What is then the explanation of the apparent contradiction? Or maybe there is no contradiction and I simply lack some understanding? Thanks.
The Haskell solution does not actually quite follow the rules of the challenge. In particular, it violates the rule that the function "Returns a real function, meaning something that you can use wherever you could use a function you had defined in the ordinary way in the text of your program." Instead of returning a real function, it returns an ST computation that produces a function that itself produces more ST computations. Within the context of an ST "state thread", you can create and use mutable references (STRef), arrays, and vectors. However, it's impossible for this mutable state to "leak" outside the state thread to contaminate pure code.
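For contrast, here is a minimal sketch of what the task asks for in a language with mutable closures. Python is used purely for illustration and the name accumulator is an assumption. The returned g is an ordinary function, and the running total is hidden state that survives between calls, which is exactly the part a pure function cannot have.

```python
def accumulator(n):
    """Return a function g that adds each argument to a hidden running total."""
    total = n

    def g(i):
        nonlocal total  # mutate the enclosing variable: this is the hidden state
        total += i
        return total

    return g
```

Two calls to g with the same argument return different results, so g is not a function in the mathematical sense. In Haskell that effect has to be made visible in the type, which is why the solution ends up inside ST.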

Using a regular string as a code for compiling

I need to make a function that receives a string such as:
int *ptr[20], *p, p2, p3[3];
and the function needs to print:
ptr requires 80 bytes.
p requires 4 bytes.
p2 requires 4 bytes.
p3 requires 12 bytes.
To simplify the task, I would like to treat the "fake" code in the string as "real" code and then just print sizeof(variable) to answer the question. I think that is the simplest way.
But how to do it?
What you describe is the ability to "evaluate" dynamically generated code.
Some languages -- usually interpreted (non-compiled) ones -- have such features, but C++ does not.
Even if it did, it wouldn't be a good solution here. You need a parser. For a formal approach, you may research lexers and context-free parsers. For an ad hoc approach...well...do whatever string manipulation you would like.
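To give an idea of the ad hoc route, here is a small Python sketch of the string manipulation involved. The size table and the 4-byte pointer size are assumptions chosen to match the expected output in the question, the name declaration_sizes is made up, and only simple declarators (optional stars, a name, an optional one-dimensional array size) are handled.

```python
import re

# Assumed sizes for a 32-bit target; adjust for the platform in question.
TYPE_SIZES = {"char": 1, "short": 2, "int": 4, "long": 4, "float": 4, "double": 8}
POINTER_SIZE = 4

# Optional stars, an identifier, and an optional [N] suffix.
DECLARATOR = re.compile(r"\s*(\**)\s*(\w+)\s*(?:\[\s*(\d+)\s*\])?\s*")


def declaration_sizes(declaration):
    """Return (name, size in bytes) pairs for a declaration like 'int *p, a[3];'."""
    text = declaration.strip().rstrip(";")
    base_type, declarators = text.split(None, 1)
    sizes = []
    for item in declarators.split(","):
        match = DECLARATOR.fullmatch(item)
        if match is None:
            raise ValueError("unsupported declarator: " + item.strip())
        stars, name, count = match.groups()
        element_size = POINTER_SIZE if stars else TYPE_SIZES[base_type]
        sizes.append((name, element_size * int(count or 1)))
    return sizes


if __name__ == "__main__":
    for name, size in declaration_sizes("int *ptr[20], *p, p2, p3[3];"):
        print(f"{name} requires {size} bytes.")
```

Anything beyond this shape (function pointers, multi-dimensional arrays, qualifiers, struct types) is where a real lexer and parser start to pay off.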

A language in which everything compiles

I'm trying to do some research for a new project, and I need to create objects dynamically from random data.
For this to work, I need a language / compiler that doesn't have problems with weird uncompilable code lying around.
Basically, I need the random code to compile (or be interpreted) as much as possible - Meaning that the uncompilable parts will be ignored, and only the compilable parts will create the objects (which could be ran).
Object Oriented-ness is not a must, but is a very strong advantage.
I thought of ASM, but it's very messy, and I'd probably need a more readable code
Thanks!
It sounds like you're doing something very much like genetic programming; even if you aren't, GP has to solve some of the same problems: using randomness to generate valid programs. The approach to this that is typically used is to work with a syntax tree: rather than storing x + y * 3 - 2, you store something like the following:
        -
       / \
      +   2
     / \
    x   *
       / \
      y   3
Then, instead of randomly changing the syntax, one can randomly change nodes in the tree instead. And if x should randomly change to, say, +, you can statically know that this means you need to insert two children (or not, depending on how you define +).
A good choice for a language to work with for this would be any Lisp dialect. In a Lisp, the above program would be written (- (+ x (* y 3)) 2), which is just a linearization of the syntax tree using parentheses to show depth. And in fact, Lisps expose this feature: you can just as easily work with the object '(- (+ x (* y 3)) 2) (note the leading quote). This is a three-element list, whose first element is -, second element is another list, and third element is 2. And, though you might or might not want it for your particular application, there's an eval function, such that (eval '(- (+ x (* y 3)) 2)) will take in the given list, treat it as a Lisp syntax tree/program, and evaluate it. This is what makes Lisps so attractive for doing this sort of work; Lisp syntax is basically a reification of the syntax-tree, and if you operate at the syntax-tree level, you can work on code as though it was a value. Lisp won't help you read /dev/random as a program directly, but with a little interpretation layered on top, you should be able to get what you want.
I should also mention, though I don't know anything about it (not that I know much about ordinary genetic programming) the existence of linear genetic programming. This is sort of like the assembly model that you mentioned—a linear stream of very, very simple instructions. The advantage here would seem to be that if you are working with /dev/random or something like it, the amount of interpretation needed is very small; the disadvantage would be, as you mentioned, the low-level nature of the code.
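The syntax-tree idea does not depend on Lisp. As a rough sketch, the same tree can be stored as nested tuples in Python and mutated node by node. The names OPS, evaluate and mutate, the three-operator set and the 30% mutation rate are all assumptions made for this example.

```python
import operator
import random

# Every operator takes exactly two children, so swapping one operator for
# another always leaves a well-formed tree.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a tree of (op, left, right) tuples, variable names and numbers."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree


def mutate(tree, rng, rate=0.3):
    """Return a copy of the tree in which some operator nodes are re-rolled."""
    if not isinstance(tree, tuple):
        return tree  # leaves are left alone in this sketch
    op, left, right = tree
    if rng.random() < rate:
        op = rng.choice(sorted(OPS))
    return (op, mutate(left, rng, rate), mutate(right, rng, rate))


# x + y * 3 - 2, i.e. (- (+ x (* y 3)) 2) in Lisp notation
program = ("-", ("+", "x", ("*", "y", 3)), 2)
```

Because mutation works on nodes rather than characters, every mutated program is still valid input for evaluate.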
I'm not sure if this is what you're looking for, but any programming language can be made to function this way. For any programming language P, define the language P_always as follows:
If p is a valid program in P, then p is a valid program in P_always whose meaning is the same as its meaning in P.
If p is not a valid program in P, then p is a valid program in P_always whose meaning is the same as a program that immediately terminates.
For example, I could make the language C++_always so that this program:
#include <iostream>
using namespace std;
int main() {
    cout << "Hello, world!" << endl;
}
would compile to a program that prints "Hello, world!", while this program:
Hahaha! This isn't legal C++ code!
would be a legal program that just does absolutely nothing.
To solve your original problem, just take any OOP language like Java, Smalltalk, etc. and construct the appropriate Java_always, Smalltalk_always, etc. language from it. Again, I'm not sure if this is at all what you're looking for, but it could be done very easily.
Alternatively, consider finding a grammar for any OOP language and then using that grammar to produce random syntactically valid programs. You could then filter those programs down by using the P_always language for that language to eliminate syntactically but not semantically valid programs.
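The "always" wrapper is only a few lines in a language that exposes its own compiler. Here is a sketch for Python; compile_always is a made-up name, and only rejection at compile time is handled, so a program that compiles but fails at run time is outside its scope.

```python
def compile_always(source):
    """Compile source as Python; fall back to the empty program if that fails."""
    try:
        return compile(source, "<random>", "exec")
    except (SyntaxError, ValueError):
        # Not a valid program: give it the meaning "terminate immediately".
        return compile("", "<empty>", "exec")
```

Running exec(compile_always(text), {}) then accepts any string: valid programs keep their meaning and everything else does nothing.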
Divide the ASCII byte values into 9 classes (division modulo 9 would help). Then assign them to Brainfuck codewords (see http://en.wikipedia.org/wiki/Brainfuck). Then interpret the result as Brainfuck.
There you go, any sequence of ASCII characters is a program. Not that it's going to do anything sensible... This approach has a much better chance, compared to templatetypedef's answer, to get a nontrivial program from a random byte sequence.
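The byte-to-command mapping could look like the following Python sketch. Brainfuck has eight commands, so treating the ninth class as a no-op is an assumption of this example, as is the name bytes_to_brainfuck. Interpreting the result is left out, and a real interpreter would also need a policy for unbalanced brackets.

```python
BRAINFUCK_COMMANDS = "><+-.,[]"  # the eight Brainfuck commands


def bytes_to_brainfuck(data):
    """Map each byte to a Brainfuck command by its value modulo 9."""
    program = []
    for byte in data:
        cls = byte % 9
        if cls < len(BRAINFUCK_COMMANDS):
            program.append(BRAINFUCK_COMMANDS[cls])
        # cls == 8 is treated as a no-op and dropped (assumption)
    return "".join(program)
```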
Text Editors
You could try feeding random character strings to an editor like Emacs or VI. Many (most?) characters will perform an editing action but some will do nothing (other than beep, perhaps). You would have to ensure that the random code mutator never generates the character sequence that exits the editor. However, this experience would be much like programming a Turing machine -- the code is not too readable.
Mathematica
In Mathematica, undefined symbols and other expressions evaluate to themselves, without error. So, that language might be a viable choice if you can arrange for the random code mutator to always generate well-formed expressions. This would be readily achievable since the basic Mathematica syntax is trivial, making it easy to operate on syntactic units rather than at the character level. It would be even easier if the mutator were written in Mathematica itself since expression-munging is Mathematica's forte. You could define a mini-language of valid operations within a Mathematica package that does not import the system-defined symbols. This would allow you to generate well-formed expressions to your heart's content without fear of generating a dangerous expression, like DeleteFile[FileNames["*.*", "/", Infinity]].
I believe Common Lisp should suit your needs. I always have some code in my SLIME/Emacs session that wouldn't compile. You can always tweak things and redefine functions at run time. It is actually very good for prototyping.
A few years ago it took me quite a while to learn. But nowadays we have quicklisp and everything is so much easier.
Here I describe my development environment:
Install lisp on my linux machine
PS: I want to give an example where Common Lisp was useful for me:
Up to maybe 2004 I used to write small programs in C (the keep-it-simple Unix way).
For the last 3 years I have had to get lots of different hardware running: motorized stages, scientific cameras, IO cards.
The cameras turned out to be quite annoying. Usually you have to cool them down to -50 degrees Celsius or so and (in some SDKs) they don't like it when you close them. But this is exactly how my C development cycle worked: write (30s), compile (1s), run (0.1s), repeat.
Eventually I decided to just use Common Lisp. Often it is straightforward to define the foreign function interfaces to talk to the SDKs, and I can do this without ever leaving the running Lisp image. I start the editor in the morning, define the open-device function to talk to the device, and after 3 hours I have enough of the functions implemented to set gain, temperature, and region of interest and to obtain the video.
Then I can often put the SDK manual away and just use the camera.
I use the same interactive programming approach when I have to parse some webpage or some weird XML.

What programming languages will let me manipulate the sequence of instructions in a method?

I have an upcoming project in which a core requirement will be to mutate the way a method works at runtime. Note that I'm not talking about a higher level OO concept like "shadow one method with another", although the practical effect would be similar.
The key properties I'm after are:
I must be able to modify the method in such a way that I can add new expressions, remove existing expressions, or modify any of the expressions that take place in it.
After modifying the method, subsequent calls to that method would invoke the new sequence of operations. (Or, if the language binds methods rather than evaluating every single time, provide me a way to unbind/rebind the new method.)
Ideally, I would like to manipulate the atomic units of the language (e.g., "invoke method foo on object bar") and not the assembly directly (e.g. "pop these three parameters onto the stack"). In other words, I'd like to be able to have high confidence that the operations I construct are semantically meaningful in the language. But I'll take what I can get.
If you're not sure if a candidate language meets these criteria, here's a simple litmus test:
Can you write another method called clean which:
accepts a method m as input
returns another method m2 that performs the same operations as m
such that m2 is identical to m, but doesn't contain any calls to the print-to-standard-out method in your language (puts, System.Console.WriteLine, println, etc.)?
I'd like to do some preliminary research now and figure out what the strongest candidates are. Having a large, active community is as important to me as the practicality of implementing what I want to do. I am aware that there may be some unforged territory here, since manipulating bytecode directly is not typically an operation that needs to be exposed.
What are the choices available to me? If possible, can you provide a toy example in one or more of the languages that you recommend, or point me to a recent example?
Update: The reason I'm after this is that I'd like to write a program which is capable of modifying itself at runtime in response to new information. This modification goes beyond mere parameters or configurable data, but full-fledged, evolved changes in behavior. (No, I'm not writing a virus. ;) )
Well, you could always use .NET and the Expression libraries to build up expressions. That I think is really your best bet as you can build up representations of commands in memory and there is good library support for manipulating, traversing, etc.
Well, those languages with really strong macro support (in particular Lisps) could qualify.
But are you sure you actually need to go this deeply? I don't know what you're trying to do, but I suppose you could emulate it without actually getting too deeply into metaprogramming. Say, instead of using a method and manipulating it, use a collection of functions (with some way of sharing state, e.g. an object holding state passed to each).
I would say Groovy can do this.
For example
class Foo {
    void bar() {
        println "foobar"
    }
}
Foo.metaClass.bar = {->
    println "barfoo"
}
Or a specific instance of foo without affecting other instances:
fooInstance.metaClass.bar = {->
    println "instance barfoo"
}
Using this approach I can modify, remove or add expressions from the method, and subsequent calls will use the new method. You can do quite a lot with the Groovy metaClass.
In Java, many professional frameworks do so using the open source ASM framework.
Here is a list of all famous Java apps and libs including ASM.
A few years ago BCEL was also widely used.
There are languages/environments that allow real runtime modification, for example Common Lisp, Smalltalk, and Forth. Use one of them if you really know what you're doing. Otherwise you can simply employ an interpreter pattern for the evolving part of your code; that is possible (and trivial) with any OO or functional language.
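As one data point for the clean litmus test from the question, here is a sketch in Python using the standard ast module. To stay self-contained it takes the method's source text and name instead of a live function object, which is a simplification, and it only removes statements that are direct calls to print. The names clean and _StripPrint are made up for this example.

```python
import ast


class _StripPrint(ast.NodeTransformer):
    """Drop expression statements of the form print(...)."""

    def visit_Expr(self, node):
        call = node.value
        if (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name)
                and call.func.id == "print"):
            return None  # returning None removes the statement
        return node


def clean(source, name):
    """Return the function called name from source, with print calls removed."""
    tree = _StripPrint().visit(ast.parse(source))
    # A block emptied by the removal still needs at least one statement.
    for node in ast.walk(tree):
        body = getattr(node, "body", None)
        if isinstance(body, list) and not body:
            node.body = [ast.Pass()]
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<cleaned>", "exec"), namespace)
    return namespace[name]
```

The rewrite happens on statements and expressions of the language rather than on bytecode, which is the level of manipulation the question asks for.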

What does it mean for something to "compose well"?

Many times, I've come across statements of the form
X does/doesn't compose well.
I can remember a few instances that I've read recently:
Macros don't compose well (context: clojure)
Locks don't compose well (context: clojure)
Imperative programming doesn't compose well... etc.
I want to understand the implications of composability in terms of designing/reading/writing code. Examples would be nice.
"Composing" functions basically just means sticking two or more functions together to make a big function that combines their functionality in a useful way. Essentially, you define a sequence of functions and pipe the results of each one into the next, finally giving the result of the whole process. Clojure provides the comp function to do this for you, you could do it by hand too.
Functions that you can chain with other functions in creative ways are more useful in general than functions that you can only call in certain conditions. For example, if we didn't have the last function and only had the traditional Lisp list functions, we could easily define last as (def last (comp first reverse)). Look at that — we didn't even need to defn or mention any arguments, because we're just piping the result of one function into another. This would not work if, for example, reverse took the imperative route of modifying the sequence in-place. Macros are problematic as well because you can't pass them to functions like comp or apply.
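The same idea can be sketched outside Clojure. In the Python below, compose is a hand-written stand-in for comp, and first and reverse are hypothetical helpers that mirror the Clojure functions of the same name.

```python
def compose(*funcs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    def composed(x):
        for f in reversed(funcs):
            x = f(x)
        return x
    return composed


def first(seq):
    return seq[0]


def reverse(seq):
    # Returns a new list instead of modifying seq, so the result can be piped on.
    return list(reversed(seq))


# last is defined without ever mentioning its argument.
last = compose(first, reverse)
```

Python's built-in list.reverse() shows the failure mode: it reverses in place and returns None, so compose(first, list.reverse) would hand None to first.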
Composition in programming means assembling bigger pieces out of smaller ones.
Composition of unary functions creates a more complicated unary function by chaining simpler ones.
Composition of control flow constructs places control flow constructs inside other control flow constructs.
Composition of data structures combines multiple simpler data structures into a more complicated one.
Ideally, a composed unit works like a basic unit and you as a programmer do not need to be aware of the difference. If things fall short of the ideal, if something doesn't compose well, your composed program may not have the (intended) combined behavior of its individual pieces.
Suppose I have some simple C code.
void run_with_resource(void) {
    Resource *r = create_resource();
    do_some_work(r);
    destroy_resource(r);
}
C facilitates compositional reasoning about control flow at the level of functions. I don't have to care about what actually happens inside do_some_work(); I know just by looking at this small function that every time a resource is created on line 2 with create_resource(), it will eventually be destroyed on line 4 by destroy_resource().
Well, not quite. What if create_resource() acquires a lock and destroy_resource() frees it? Then I have to worry about whether do_some_work acquires the same lock, which would prevent the function from finishing. What if do_some_work() calls longjmp(), and skips the end of my function entirely? Until I know what goes on in do_some_work(), I won't be able to predict the control flow of my function. We no longer have compositionality: we can no longer decompose the program into parts, reason about the parts independently, and carry our conclusions back to the whole program. This makes designing and debugging much harder and it's why people care whether something composes well.
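The same hazard can be sketched in Python, where an exception plays the role of longjmp(). All names below mirror the C example and are hypothetical; log exists only to make the control flow observable.

```python
log = []


def create_resource():
    log.append("create")
    return object()


def destroy_resource(resource):
    log.append("destroy")


def do_some_work(resource):
    # Stand-in for a non-local exit such as longjmp().
    raise RuntimeError("non-local exit")


def run_with_resource():
    r = create_resource()
    do_some_work(r)
    destroy_resource(r)  # never reached when do_some_work() raises


def run_with_resource_safely():
    r = create_resource()
    try:
        do_some_work(r)
    finally:
        destroy_resource(r)  # runs however do_some_work() exits
```

The second version can be reasoned about without reading do_some_work(): whatever happens inside, the resource is destroyed. That local guarantee is what composing well buys you.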
"Bang for the Buck" - composing well implies a high ratio of expressiveness per rule-of-composition. Each macro introduces its own rules of composition. Each custom data structure does the same. Functions, especially those using general data structures have far fewer rules.
Assignment and other side effects, especially with respect to concurrency, have even more rules.
Think about when you write functions or methods. You create a group of functionality to do a specific task. When working in an Object Oriented language you cluster your behavior around the actions you think a distinct entity in the system will perform. Functional programs break away from this by encouraging authors to group functionality according to an abstraction. For example, the Clojure Ring library comprises a group of abstractions that cover routing in web applications.
Ring is composable in that functions that describe paths in the system (routes) can be grouped into higher-order functions (middleware). In fact, Clojure is so dynamic that it is possible (and you are encouraged) to come up with patterns of routes that can be dynamically created at runtime. This is the essence of composability: instead of coming up with patterns that solve a certain problem, you focus on patterns that generate solutions to a certain class of problem. Builders and code generators are just two of the common patterns used in functional programming. Functional programming is the art of patterns that generate other patterns (and so on and so on).
The idea is to solve a problem at its most basic level then come up with patterns or groups of the lowest level functions that solve the problem. Once you start to see patterns in the lowest level you've discovered composition. As folks discover second order patterns in groups of functions they may start to see a third level. And so on...
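A rough Python sketch of the middleware idea follows. It is not Ring's API: the dictionary-based request and response and every name in it are assumptions made for illustration. A handler is a function from request to response, and a middleware is a function that takes a handler and returns a new handler.

```python
def hello_handler(request):
    """A plain handler: request dict in, response dict out."""
    return {"status": 200, "body": "hello " + request.get("user", "anonymous")}


def wrap_uppercase(handler):
    """Middleware that post-processes the response body."""
    def wrapped(request):
        response = handler(request)
        return {**response, "body": response["body"].upper()}
    return wrapped


def wrap_default_user(handler, user):
    """Middleware that fills in a user when the request has none."""
    def wrapped(request):
        return handler({"user": user, **request})
    return wrapped


# Middlewares stack freely because each returns another handler.
app = wrap_default_user(wrap_uppercase(hello_handler), "guest")
```

Each wrapper knows nothing about the others and only relies on the shared handler shape, so new behaviour is added by wrapping rather than by editing existing functions.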
Composition (in the context you describe at a functional level) is typically the ability to feed one function into another cleanly and without intermediate processing. Such an example of composition is in std::cout in C++:
cout << each << item << links << on;
That is a simple example of composition which doesn't really "look" like composition.
Another example with a form more visibly compositional:
foo(bar(baz()));
Wikipedia Link
Composition is useful for readability and compactness, however chaining large collections of interlocking functions which can potentially return error codes or junk data can be hazardous (this is why it is best to minimize error code or null return values.)
Provided your functions use exceptions, or alternatively return null objects you can minimize the requirement for branching (if) on errors and maximize the compositional potential of your code at no extra risk.
Object composition (vs inheritance) is a separate issue (and not what you are asking, but it shares the name). It is one of containment to derive object hierarchy as opposed to direct inheritance.
Within the context of clojure, this comment addresses certain aspects of composability. In general, it seems to emerge when units of the system do one thing well, do not require other units to understand its internals, eschew side-effects, and accept and return the system's pervasive data structures. All of the above can be seen in M2tM's C++ example.
composability, applied to functions, means that the functions are smaller and well-defined, thus easy to integrate into other functions (i have seen this idea in the book "the joy of clojure")
the concept can apply to other things that are supposed to be composed into something else.
the purpose of composability is reuse. for example, a well-built (composable) function is easier to reuse
macros aren't that composable because you can't pass them as parameters
locks are crap because you can't really give them names (define them well) or reuse them. you just use them in place
imperative languages aren't that composable because (some of them, at least) don't have closures. if you want functionality passed as a parameter, you're screwed. you have to build an object and pass that; disclaimer here: i'm not entirely convinced this last idea is true, so research more before taking it for granted
another idea on imperative languages is that they don't compose well because they imply state (from wikipedia: "Imperative programming - describes computation in terms of statements that change a program state").
state does not compose well because although you have given a specific "something" as input, that "something" generates an output according to its state. different internal state, different behaviour. and thus you can say good-bye to what you were expecting to happen.
with state, you depend too much on knowing what the current state of an object is... if you want to predict its behavior. more stuff to keep in the back of your mind, less composable (remember well-defined? or "small and simple", as in "easy to use"?)
ps: thinking of learning clojure, huh ? investigating... ? good for you ! :P

Resources