I need to make a function that receives a string such as:
int *ptr[20], *p, p2, p3[3];
and the function needs to print:
ptr requires 80 bytes.
p requires 4 bytes.
p2 requires 4 bytes.
p3 requires 12 bytes.
To simplify the task, I would like to evaluate the "fake" code in the string as "real" code, and then just print sizeof(variable) for each variable to answer the question. I think that is the simplest way.
But how to do it?
What you describe is the ability to "evaluate" dynamically generated code.
Some languages -- usually interpreted (non-compiled) ones -- have such a feature, but C++ does not.
Even if it did, it wouldn't be a good solution here. You need a parser. For a formal approach, you may research lexers and context-free parsers. For an ad hoc approach...well...do whatever string manipulation you would like.
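For the declaration in the question, even the ad hoc route is short. Below is a minimal C++ sketch under the question's own assumptions (a 4-byte int, 4-byte pointers, and a single int declaration with comma-separated declarators); it is illustrative, not a general C parser:

#include <iostream>
#include <sstream>
#include <string>

int main() {
    const std::string decl = "int *ptr[20], *p, p2, p3[3];";
    const std::size_t intSize = 4, ptrSize = 4;          // platform assumptions

    std::string body = decl.substr(decl.find(' ') + 1);  // drop "int "
    body.pop_back();                                     // drop ';'

    std::stringstream ss(body);
    std::string item;
    while (std::getline(ss, item, ',')) {
        item.erase(0, item.find_first_not_of(' '));      // trim leading spaces

        bool isPointer = (item[0] == '*');
        if (isPointer) item.erase(0, 1);

        std::size_t count = 1;
        std::size_t bracket = item.find('[');
        if (bracket != std::string::npos) {
            count = std::stoul(item.substr(bracket + 1)); // stoul stops at ']'
            item.erase(bracket);                          // keep just the name
        }

        std::size_t elemSize = isPointer ? ptrSize : intSize;
        std::cout << item << " requires " << count * elemSize << " bytes.\n";
    }
}

Run against the example string, this prints exactly the four lines the question asks for.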
I just had a small question.
Are operations considered literals? Would 2*7, for example, be a literal? Is "hello, " + "world!" a literal?
I know the operands are literals, but the expression is not explicitly 14 or "hello, world!".
The question "Is 2+3 considered as a literal?"
asks basically what I am asking, but most of its answers weren't helpful: they just break the variable declaration down or talk about what compilers do with these expressions, and I'm not looking for that. I would like a more in-depth explanation.
Thank you
It will depend on the language and the compiler, sorry. But going by the concept that a literal is a kind of token, then no: the result is a compile-time constant, not a token.
In C/C++, 2*7 will be folded by the compiler into a new constant, but it isn't legally defined as a literal, though it can be treated as a compile-time constant.
Concatenating "hello" "world" (note: no plus) is actually described as a translation phase in C++, so it does generate a new string literal, but in the original (pre-ANSI) C this didn't work.
But note that in C, a macro will treat the parameter phrase 2+7 as separate tokens, and #define STUPIDMUL3(val) 3 * val applied to 2+7 will give the answer 13, not the 27 you would get from 3 * (2+7). If you could find a way to force the macro to treat its argument as a single value, it would behave like a constant.
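A quick C illustration of both points -- the macro pitfall and adjacent-literal concatenation:

#include <stdio.h>

/* val is substituted as raw tokens, so ordinary operator precedence
   applies to the expanded expression, not to the argument as a whole. */
#define STUPIDMUL3(val) 3 * val

int main(void) {
    printf("%d\n", STUPIDMUL3(2 + 7)); /* expands to 3 * 2 + 7 == 13 */
    printf("%d\n", 3 * (2 + 7));       /* parenthesized: 27 */

    const char *s = "hello, " "world!"; /* adjacent literals merge into one */
    printf("%s\n", s);
    return 0;
}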
I would expect an interpreter to take longer to process 2*7 than 14, because it might re-evaluate the expression every time.
I'm trying to write tests for a Tuple class that I'm writing in Ruby (this is an exercise to learn both Ruby and Gherkin). So one of my Scenarios creates a Tuple with float values:
Scenario: A tuple with w=1.0 is a point
Given a ← tuple[4.3, -4.2, 3.1, 1.0]
Then a.x = 4.3
And ...
For the Given step, cucumber suggests the following:
Given("a ← tuple[{float}, {float}, {float}, {float}]") do |float, float2, float3, float4|
pending # Write code here that turns the phrase above into concrete actions
end
which I implemented as:
Given("a ← tuple[{float}, {float}, {float}, {float}]") do |float, float2, float3, float4|
tuple_a = Tuple.new(float, float2, float3, float4)
end
Great. Now I want another scenario which happens to pass integers to the Tuple:
Scenario: Adding two tuples
Given a ← tuple[3, -2, 5, 1]
...
And Cucumber suggests:
Given("a ← tuple[{int}, {int}, {int}, {int}]") do |int, int2, int3, int4|
pending # Write code here that turns the phrase above into concrete actions
end
But my implementation is in Ruby; I don't really care if I'm passing ints or floats to Tuple.new(). The Given step I implemented first, which expects floats, would work the same for ints, but Cucumber won't use it; it wants me to implement the step again with int params. I could just use float arguments, e.g. Given a ← tuple[3.0, -2.0, 5.0, 1.0], but that's kind of annoying. Is my only option to define a custom ParameterType? That would entail a regexp that matches both integers and floats; will it take priority over the existing int and float types?
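For concreteness, the kind of thing I have in mind would be the sketch below (the name "number" and the regexp are just my guesses at what might work):

# features/support/parameter_types.rb
ParameterType(
  name:        'number',
  regexp:      /-?\d+(?:\.\d+)?/,
  type:        Numeric,
  transformer: ->(s) { s.include?('.') ? s.to_f : s.to_i }
)

# features/step_definitions/tuple_steps.rb
Given('a ← tuple[{number}, {number}, {number}, {number}]') do |x, y, z, w|
  @tuple_a = Tuple.new(x, y, z, w)  # instance variable so later steps see it
end

My understanding is that once the step definition itself uses {number}, both integer and float arguments would match it, and the built-in {int}/{float} types would only matter for the snippets Cucumber suggests for undefined steps -- but I'm not sure.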
I would suggest using unit-test tools for this sort of thing, e.g. RSpec, Minitest, etc.; they are a much better fit.
Scenarios are useful when you can express things in a language that is not technical and abstract. Your scenarios are technical and concrete and much harder to read and write.
An analogy is trying to write mathematical expressions in a natural language:
(3+5)^3 is much simpler and more precise than "add 5 to 3 and then cube the total".
The art of learning Gherkin is writing simple, clear scenarios that describe a particular behaviour. It is not about using multiple params, complex regexes, large tables, and multiple examples. You are learning the wrong things if you want to learn how to Cuke and do BDD, and you are using the wrong tool if you want to learn Ruby and write tests for things like a Tuple class.
Just to play around: are there any DSLs that
could be generated randomly,
manipulate text or strings and restore them, and
work like a reciprocal cipher? E.g., if the generated function is F(), then for every string s1 you can get a scrambled string s2 = F(s1), and another function G() could be deduced to reverse F(), such that G(s2) = s1.
F() and G() could be the same or different.
And a few additional questions:
Could any programming language deduce inverse functions automatically?
And make sure a generated function F() is reversible?
Any tips on where I could start?
Thanks!
One good starting point would be the Feistel network construction for block ciphers. In essence, it's a basic framework for building an iterated block cipher out of a function. There are very few requirements on the function -- it simply needs to be a function which modifies a piece of the message based on the key. The cipher will work no matter what the function is; the nature of the function will affect the security of the cipher, though.
http://en.wikipedia.org/wiki/Feistel_cipher
To answer some of your other questions:
Could any programming language deduce inverse functions automatically?
Not in general. Especially because many (most!) functions are not invertible at all.
And make sure a generated function F() is reversible?
Using the Feistel network construction will guarantee this.
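A minimal Python sketch of the construction (illustrative, not secure; the round function f below is deliberately arbitrary, to show that invertibility comes from the structure, not from f):

# A Feistel round: swap halves, mixing one half into the other.
def feistel_round(left, right, key, f):
    return right, left ^ f(right, key)

def encrypt(block, keys, f, half_bits=8):
    mask = (1 << half_bits) - 1
    left, right = block >> half_bits, block & mask
    for k in keys:
        left, right = feistel_round(left, right, k, f)
    return (left << half_bits) | right

def decrypt(block, keys, f, half_bits=8):
    mask = (1 << half_bits) - 1
    left, right = block >> half_bits, block & mask
    # Same structure run backwards: reversed keys, swapped halves.
    for k in reversed(keys):
        right, left = feistel_round(right, left, k, f)
    return (left << half_bits) | right

f = lambda half, key: (half * 31 + key) & 0xFF  # any function works here
keys = [0x1A, 0x2B, 0x3C]
ciphertext = encrypt(0xBEEF, keys, f)
assert decrypt(ciphertext, keys, f) == 0xBEEF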
To answer my own question:
http://en.wikipedia.org/wiki/Reversible_computing
http://strangepaths.com/reversible-computation/2008/01/20/en/
It looks like this lives mainly in theoretical CS, so such a DSL has yet to be invented.
So far, Prolog can do reversible functions (its relations can often be run in either direction).
I'm trying to do some research for a new project, and I need to create objects dynamically from random data.
For this to work, I need a language / compiler that doesn't have problems with weird uncompilable code lying around.
Basically, I need the random code to compile (or be interpreted) as much as possible, meaning that the uncompilable parts will be ignored and only the compilable parts will create the objects (which could then be run).
Object-orientedness is not a must, but it is a very strong advantage.
I thought of ASM, but it's very messy, and I'd probably need more readable code.
Thanks!
It sounds like you're doing something very much like genetic programming; even if you aren't, GP has to solve some of the same problems -- using randomness to generate valid programs. The approach typically used is to work with a syntax tree: rather than storing x + y * 3 - 2, you store something like the following:
      -
     / \
    +   2
   / \
  x   *
     / \
    y   3
Then, instead of randomly changing the syntax, one can randomly change nodes in the tree instead. And if x should randomly change to, say, +, you can statically know that this means you need to insert two children (or not, depending on how you define +).
A good choice for a language to work with for this would be any Lisp dialect. In a Lisp, the above program would be written (- (+ x (* y 3)) 2), which is just a linearization of the syntax tree using parentheses to show depth. And in fact, Lisps expose this feature: you can just as easily work with the object '(- (+ x (* y 3)) 2) (note the leading quote). This is a three-element list, whose first element is -, second element is another list, and third element is 2. And, though you might or might not want it for your particular application, there's an eval function, such that (eval '(- (+ x (* y 3)) 2)) will take in the given list, treat it as a Lisp syntax tree/program, and evaluate it. This is what makes Lisps so attractive for doing this sort of work; Lisp syntax is basically a reification of the syntax-tree, and if you operate at the syntax-tree level, you can work on code as though it was a value. Lisp won't help you read /dev/random as a program directly, but with a little interpretation layered on top, you should be able to get what you want.
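For instance, a small Common Lisp sketch of the code-as-data idea (x and y are defined as special variables, since eval uses the null lexical environment and wouldn't see local bindings):

(defparameter x 10)  ; earmuffs omitted to match the expression above
(defparameter y 4)

(defparameter *program* '(- (+ x (* y 3)) 2))

;; SUBST is the standard tree-substitution function: replace * with +.
(defparameter *mutated* (subst '+ '* *program*))

(format t "~a => ~a~%" *program* (eval *program*))  ; => 20
(format t "~a => ~a~%" *mutated* (eval *mutated*))  ; => 15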
I should also mention -- though I don't know anything about it (not that I know much about ordinary genetic programming either) -- the existence of linear genetic programming. This is sort of like the assembly model you mentioned: a linear stream of very, very simple instructions. The advantage here would seem to be that if you are working with /dev/random or something like it, the amount of interpretation needed is very small; the disadvantage would be, as you mentioned, the low-level nature of the code.
I'm not sure if this is what you're looking for, but any programming language can be made to function this way. For any programming language P, define the language P_always as follows:
If p is a valid program in P, then p is a valid program in P_always whose meaning is the same as its meaning in P.
If p is not a valid program in P, then p is a valid program in P_always whose meaning is the same as a program that immediately terminates.
For example, I could make the language C++_always so that this program:
#include <iostream>
using namespace std;
int main() {
cout << "Hello, world!" << endl;
}
would compile and print "Hello, world!", while this program:
Hahaha! This isn't legal C++ code!
would be a legal program that just does absolutely nothing.
To solve your original problem, just take any OOP language like Java, Smalltalk, etc. and construct the appropriate Java_always, Smalltalk_always, etc. language from it. Again, I'm not sure if this is at all what you're looking for, but it could be done very easily.
Alternatively, consider finding a grammar for any OOP language and then using that grammar to produce random syntactically valid programs. You could then filter those programs down by using the P_always language built from that language to eliminate programs that are syntactically valid but not semantically valid.
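A toy sketch of that grammar-driven generation (the grammar below is a made-up expression language, just to show the mechanism):

import random

# Nonterminals map to lists of possible productions; anything that is
# not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "expr": [["term", " + ", "expr"], ["term"]],
    "term": [["factor", " * ", "term"], ["factor"]],
    "factor": [["x"], ["y"], ["1"], ["2"], ["(", "expr", ")"]],
}

def generate(symbol):
    if symbol not in GRAMMAR:
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return "".join(generate(s) for s in production)

print(generate("expr"))  # e.g. "x * (y + 2) + 1"
# A real generator would also cap the recursion depth.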
Divide the ASCII byte values into 9 classes (reduction modulo 9 would help). Then assign them to the eight Brainfuck commands (see http://en.wikipedia.org/wiki/Brainfuck), with the ninth class as a no-op (Brainfuck ignores any non-command character anyway). Then interpret the result as Brainfuck.
There you go: any sequence of ASCII characters is a program. Not that it's going to do anything sensible... This approach has a much better chance, compared to templatetypedef's answer, of getting a nontrivial program from a random byte sequence.
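A sketch of that mapping (one caveat: unmatched [ and ] will still be rejected by most Brainfuck interpreters, so a robust pipeline would need to repair or drop unbalanced brackets):

import random

# Map arbitrary bytes onto Brainfuck's eight commands plus a no-op;
# Brainfuck ignores characters that aren't commands, so ' ' is safe.
COMMANDS = "><+-.,[] "  # 9 classes, as suggested above

def bytes_to_bf(data):
    return "".join(COMMANDS[b % 9] for b in data)

random_bytes = bytes(random.randrange(256) for _ in range(64))
print(bytes_to_bf(random_bytes))  # a random Brainfuck "program"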
Text Editors
You could try feeding random character strings to an editor like Emacs or VI. Many (most?) characters will perform an editing action but some will do nothing (other than beep, perhaps). You would have to ensure that the random code mutator never generates the character sequence that exits the editor. However, this experience would be much like programming a Turing machine -- the code is not too readable.
Mathematica
In Mathematica, undefined symbols and other expressions evaluate to themselves, without error. So, that language might be a viable choice if you can arrange for the random code mutator to always generate well-formed expressions. This would be readily achievable since the basic Mathematica syntax is trivial, making it easy to operate on syntactic units rather than at the character level. It would be even easier if the mutator were written in Mathematica itself since expression-munging is Mathematica's forte. You could define a mini-language of valid operations within a Mathematica package that does not import the system-defined symbols. This would allow you to generate well-formed expressions to your heart's content without fear of generating a dangerous expression, like DeleteFile[FileNames["*.*", "/", Infinity]].
I believe Common Lisp should suit your needs. I always have some code in my SLIME/Emacs session that wouldn't compile. You can always tweak things, redefine functions in run-time. It is actually very good for prototyping.
A few years ago it took me quite a while to learn. But nowadays we have Quicklisp and everything is so much easier.
Here I describe my development environment:
Install lisp on my linux machine
PS: I want to give an example of where Common Lisp was useful for me:
Up to maybe 2004 I used to write small programs in C (the keep-it-simple Unix way).
Over the last 3 years I have had to get lots of different hardware running: motorized stages, scientific cameras, IO cards.
The cameras turned out to be quite annoying. Usually you have to cool them down to -50 degrees Celsius or so, and (in some SDKs) they don't like it when you close them. But this is exactly how my C development cycle worked: write (30 s), compile (1 s), run (0.1 s), repeat.
Eventually I decided to just use Common Lisp. Often it is straightforward to define the foreign function interfaces to talk to the SDKs, and I can do this without ever leaving the running Lisp image. I start the editor in the morning, define the open-device function to talk to the device, and after 3 hours I have enough of the functions implemented to set the gain, temperature, and region of interest, and to obtain video.
Then I can often put the SDK manual away and just use the camera.
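For a flavor of what such an FFI definition looks like, here is a hypothetical sketch using the CFFI library; the library and function names are invented placeholders, not a real camera SDK:

;; Load the (hypothetical) vendor library.
(cffi:define-foreign-library camera-sdk
  (:unix "libcamera-sdk.so")
  (t (:default "camera-sdk")))
(cffi:use-foreign-library camera-sdk)

;; C prototype assumed: int cam_set_gain(void *handle, double gain);
(cffi:defcfun ("cam_set_gain" cam-set-gain) :int
  (handle :pointer)
  (gain   :double))

Wrappers like this can be added or redefined at the REPL, without restarting the image or disturbing the cooled camera.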
I used the same interactive programming approach when I had to parse some webpage or some weird XML.
I'm currently teaching myself Haskell, and I'm wondering what the best practices are when working with strings in Haskell.
The default string implementation in Haskell is a list of Char. This is inefficient for file input/output, according to Real World Haskell, since each character is separately allocated (I assume this means that a String is basically a linked list in Haskell, but I'm not sure).
But if the default string implementation is inefficient for file I/O, is it also inefficient for working with Strings in memory? Why or why not? C uses an array of char to represent a String, and I assumed that this would be the default way of doing things in most languages.
As I see it, the list implementation of String will take up more memory, since each character will require overhead, and will also take more time to iterate over, because a pointer dereference is required to get to the next Char. But I've liked playing with Haskell so far, so I want to believe that the default implementation is efficient.
Apart from String/ByteString there is now the Text library which combines the best of both worlds—it works with Unicode while being ByteString-based internally, so you get fast, correct strings.
Best practices for working with strings performantly in Haskell are basically: Use Data.ByteString/Data.ByteString.Lazy.
http://hackage.haskell.org/packages/archive/bytestring/latest/doc/html/
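A short sketch of the usual options in practice (the file name is just an example; bytestring and text are the package names on Hackage):

import qualified Data.ByteString.Lazy.Char8 as BL  -- packed bytes, lazy chunks
import qualified Data.Text                  as T   -- packed Unicode text
import qualified Data.Text.IO               as TIO

main :: IO ()
main = do
  -- Bytes: fast, but no notion of characters beyond 8 bits.
  raw <- BL.readFile "input.txt"
  print (BL.length raw)          -- length in bytes

  -- Text: Unicode-aware, but still packed rather than a linked list.
  txt <- TIO.readFile "input.txt"
  print (T.length txt)           -- length in characters
  TIO.putStrLn (T.toUpper txt)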
As far as the efficiency of the default string implementation in Haskell goes: it isn't efficient. Each Char represents a Unicode code point, which means it needs at least 21 bits per Char.
Since a String is just [Char], that is, a linked list of Char, Strings have poor locality of reference, and they are fairly large in memory: at a minimum N * (21 bits + M bits), where N is the length of the string and M is the size of a pointer (32, 64, what have you). And unlike many other places where Haskell uses lists where other languages might use different structures (I'm thinking specifically of control flow here), Strings are much less likely to be optimized into loops, etc., by the compiler.
And while a Char corresponds to a code point, the Haskell 98 report doesn't specify anything about the encoding used when doing file IO -- not even a default, much less a way to change it. In practice GHC provides extensions to do, e.g., binary IO, but you're outside the standard at that point anyway.
Even with operations like prepending to the front of the string, it's unlikely that a String will beat a ByteString in practice.
The answer is a bit more complex than just "use lazy bytestrings".
Byte strings only store 8 bits per value, whereas String holds real Unicode characters. So if you want to work with Unicode, then you have to convert to and from UTF-8 or UTF-16 all the time, which is more expensive than just using strings. Don't make the mistake of assuming that your program will only need ASCII. Unless it's just throwaway code, one day someone will need to put in a Euro symbol (U+20AC) or accented characters, and your nice fast bytestring implementation will be irretrievably broken.
Byte strings make some things, like prepending to the start of a string, more expensive.
That said, if you need performance and you can represent your data purely in bytestrings, then do so.
The basic answer given, use ByteString, is correct. That said, all three of the answers before mine have inaccuracies.
Regarding UTF-8: whether this will be an issue or not depends entirely on what sort of processing you do with your strings. If you're simply treating them as single chunks of data (which includes operations such as concatenation, though not splitting), or doing certain limited byte-based operations (e.g., finding the length of the string in bytes, rather than the length in characters), you won't have any issues. If you are using I18N, there are enough other issues that simply using String rather than ByteString will start to fix only a very few of the problems you'll encounter.
Prepending single bytes to the front of a ByteString is probably more expensive than doing the same for a String. However, if you're doing a lot of this, it's probably possible to find ways of dealing with your particular problem that are cheaper.
But the end result would be, for the poster of the original question: yes, Strings are inefficient in Haskell, though rather handy. If you're worried about efficiency, use ByteStrings, and view them as either arrays of Char8 or Word8, depending on your purpose (ASCII/ISO-8859-1 vs Unicode of some sort, or just arbitrary binary data). Generally, use Lazy ByteStrings (where prepending to the start of a string is actually a very fast operation) unless you know why you want non-lazy ones (which is usually wrapped up in an appreciation of the performance aspects of lazy evaluation).
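For instance, prepending to a lazy ByteString is cheap because it only adds to the chunk list (Data.ByteString.Lazy.Char8.cons is O(1), whereas the strict version has to copy the whole buffer):

import qualified Data.ByteString.Lazy.Char8 as BL

main :: IO ()
main = BL.putStrLn (BL.cons 'h' (BL.pack "ello"))  -- O(1) prepend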
For what it's worth, I am building an automated trading system entirely in Haskell, and one of the things we need to do is very quickly parse a market data feed we receive over a network connection. I can handle reading and parsing 300 messages per second with a negligible amount of CPU; as far as handling this data goes, GHC-compiled Haskell performs close enough to C that it's nowhere near entering my list of notable issues.