How are programs written in interpreted languages executed if they are never translated into machine language? - programming-languages

Computers can only understand machine language. Then how come interpreters execute a program directly without translating it into machine language? For example:
<?php
echo "Hello, World!" ;
It's a simple Hello World program written in PHP. How does it execute on the machine when the machine has no idea what echo is? How does it output what's expected, in this case, the string Hello, World!?

Many interpreters, including the official PHP interpreter, actually translate the code to a byte code format before executing it for performance (and I suppose flexibility) reasons, but at its simplest, an interpreter simply goes through the code and performs the corresponding action for each statement. An extremely simple interpreter for a PHP-like language might look like this for example:
def execute_program(prog):
    for statement in prog.toplevel_statements:
        execute_statement(statement)

def execute_statement(statement):
    if statement is an echo statement:
        print( evaluate_expression(statement.argument) )
    elif statement is a for loop:
        execute_statement(statement.init)
        while evaluate_expression(statement.condition).is_truthy():
            for inner_statement in statement.body:
                execute_statement(inner_statement)
            execute_statement(statement.increment)
    elif ...
Note that a big if-else-if statement is not actually the cleanest way to go through an AST and a real interpreter would also need to keep track of scopes and a call stack to implement function calls and returns.
But at its most basic, this is what it boils down to: "If we see this kind of statement, perform this kind of action etc.".
Except for being much more complex, it isn't really any different from writing a program that responds to user commands where the user could for example type "rectangle" and then you draw a rectangle. Here the CPU also doesn't understand what "rectangle" means, but your code contains something like if user_input == rectangle: [code to draw a rectangle] and that's all you need.
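The pseudocode above can be turned into something runnable. The following is only a minimal sketch, not how PHP itself works: the node classes (Num, Str, Var, BinOp, Assign, Echo, For) are hypothetical names invented for this illustration, and the program is built by hand as a tree instead of being parsed from source text.

```python
from dataclasses import dataclass

# Hypothetical AST node types for a tiny PHP-like language.
@dataclass
class Num:
    value: int

@dataclass
class Str:
    value: str

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Assign:
    name: str
    expr: object

@dataclass
class Echo:
    argument: object

@dataclass
class For:
    init: object
    condition: object
    increment: object
    body: list

def evaluate_expression(expr, env):
    """Compute the value of an expression node using the variables in env."""
    if isinstance(expr, (Num, Str)):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, BinOp):
        left = evaluate_expression(expr.left, env)
        right = evaluate_expression(expr.right, env)
        if expr.op == "+":
            return left + right
        if expr.op == "<":
            return left < right
    raise ValueError(f"unknown expression: {expr!r}")

def execute_statement(stmt, env, out):
    """Perform the action that corresponds to one statement node."""
    if isinstance(stmt, Echo):
        out.append(str(evaluate_expression(stmt.argument, env)))
    elif isinstance(stmt, Assign):
        env[stmt.name] = evaluate_expression(stmt.expr, env)
    elif isinstance(stmt, For):
        execute_statement(stmt.init, env, out)
        while evaluate_expression(stmt.condition, env):
            for inner in stmt.body:
                execute_statement(inner, env, out)
            execute_statement(stmt.increment, env, out)
    else:
        raise ValueError(f"unknown statement: {stmt!r}")

def execute_program(statements):
    """Run all top-level statements and return the echoed lines."""
    env, out = {}, []
    for stmt in statements:
        execute_statement(stmt, env, out)
    return out

# echo "Hello, World!"; for ($i = 0; $i < 3; $i = $i + 1) { echo $i; }
program = [
    Echo(Str("Hello, World!")),
    For(
        init=Assign("i", Num(0)),
        condition=BinOp("<", Var("i"), Num(3)),
        increment=Assign("i", BinOp("+", Var("i"), Num(1))),
        body=[Echo(Var("i"))],
    ),
]
print("\n".join(execute_program(program)))
```

The only machine code that runs here belongs to the Python interpreter; the Echo and For nodes are plain data that decide which branch of execute_statement is taken.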

Strictly speaking, the interpreter is being executed and the code that the interpreter is interpreting just determines what actions the interpreter takes. (If it were just compiled to machine code, what would you need the interpreter for?)
For example, I built an automation framework awhile back where we captured reflection metadata on what was occurring at runtime during QA tests. We serialized that metadata to JSON. The JSON was never compiled to anything - it just told the automation engine what methods to call and what parameters to pass. No machine code involved. It wouldn't be exactly correct to say that we were "executing" the JSON - we were executing the automation engine, which was then following the "directions" found in the JSON, but it was certainly interpreting the JSON.
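A rough sketch of that idea, assuming a made-up JSON layout (a list of steps with "method" and "params" keys) and a hypothetical AutomationEngine class; the original framework's actual format is not shown in the answer, so none of these names come from it.

```python
import json

class AutomationEngine:
    """Hypothetical engine: the JSON only tells it which methods to call."""

    # Only these method names may be requested by a script (an assumption
    # made for this sketch, so a script cannot call arbitrary attributes).
    ALLOWED = {"open_page", "click"}

    def __init__(self):
        self.log = []

    def open_page(self, url):
        self.log.append(f"open {url}")

    def click(self, element_id):
        self.log.append(f"click {element_id}")

    def run(self, script_json):
        """Follow the directions in the JSON, one step at a time."""
        for step in json.loads(script_json):
            name = step["method"]
            if name not in self.ALLOWED:
                raise ValueError(f"method not allowed: {name}")
            getattr(self, name)(*step["params"])
        return self.log

script = """
[
  {"method": "open_page", "params": ["/login"]},
  {"method": "click", "params": ["submit"]}
]
"""
engine = AutomationEngine()
print(engine.run(script))
```

The JSON is never compiled; the engine is the program being executed, and the JSON merely selects which of its methods run and with what arguments.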

Related

What are "Exact compiler arguments" for Python?

I'm a mostly self-taught programmer, and recently applied for a computation-heavy internship.
As part of the selection process, I was sent, from some recruiter in a distant city, a programming challenge to complete in Python (I used Python 3).
The program accepts a couple of positional arguments, and writes its results to a file.
I need to submit "source code and exact compiler arguments used to compile." I'm looking for enlightenment on what the last bit means.
From Python3 idle, if programFile contains only function definitions, I can do
>>> import programFile
>>> programFile.function(arg1,arg2)
to get the output. Or if I add an executable statement to the programFile, I can do python programFile.py from the command line to get the output. I don't know if one of these is the compiler argument, or if I need something else which would only compile the code.
Any guidance would be appreciated.
The recruiter is using generic terminology that doesn't apply to Python. No compiler is used, so no compiler arguments are required either.
Send the source file and minimum Python version requirement.
Python is an interpreted language, so you don't need to compile the Python file (.py) in order to execute it.
I think the recruiter is using the same message for every language in their program.
I would send only the source code, along with documentation on how to run it, and you will be fine.

Programming language for self-modifying code?

I have recently been thinking about writing self-modifying programs; I think it may be powerful and fun. So I am currently looking for a language that allows modifying a program's own code easily.
I read about C# (as a workaround) and its ability to compile and execute code at runtime, but that is too painful.
I am also thinking about assembly. It is easier there to change running code but it is not very powerful (very raw).
Can you suggest a powerful language or feature that supports modifying code at runtime?
Example
This is what I mean by modifying code at runtime:
Start:
a=10,b=20,c=0;
label1: c=a+b;
....
label1= c=a*b;
goto label1;
and maybe building a list of instructions:
code1.add(c=a+b);
code1.add(c=c*(c-1));
code1.execute();
Malbolge would be a good place to start. Every instruction is self-modifying, and it's a lot of fun(*) to play with.
(*) Disclaimer: May not actually be fun.
I highly recommend Lisp. Lisp data can be read and exec'd as code. Lisp code can be written out as data.
It is considered one of the canonical self-modifiable languages.
Example list(data):
'(+ 1 2 3)
or, calling the data as code
(eval '(+ 1 2 3))
runs the + function.
You can also go in and edit the members of the lists on the fly.
edit:
I wrote a program to dynamically generate a program and evaluate it on the fly, then report to me how it did compared to a baseline(div by 0 was the usual report, ha).
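To make the "data is code" idea concrete without a Lisp at hand, here is a small Python sketch. It is not Lisp's eval: a nested Python list stands in for the quoted list, and the evaluate function and OPERATORS table are hypothetical names invented for this illustration.

```python
import math

# Operators the mini-evaluator understands (an assumption for this sketch).
OPERATORS = {
    "+": lambda *args: sum(args),
    "*": lambda *args: math.prod(args),
}

def evaluate(form):
    """Evaluate a nested list such as ["+", 1, 2, 3]; other values evaluate to themselves."""
    if isinstance(form, list):
        op, *args = form
        return OPERATORS[op](*(evaluate(arg) for arg in args))
    return form

data = ["+", 1, 2, 3]      # plain data, like '(+ 1 2 3)
print(len(data))           # it can be inspected as an ordinary list
print(evaluate(data))      # or run as code, like (eval '(+ 1 2 3))

data[3] = ["*", 2, 5]      # edit a member of the list on the fly
print(evaluate(data))      # the edited list is now a different program
```

The same list object is first treated as data (measured, modified) and then as a program, which is the property the answer attributes to Lisp.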
Every answer so far is about reflection/runtime compilation, but in the comments you mentioned you're interested in actual self-modifying code - code that modifies itself in-memory.
There is no way to do this in C#, Java, or even (portably) in C - that is, you cannot modify the loaded in-memory binary using these languages.
In general, the only way to do this is with assembly, and it's highly processor-dependent. In fact, it's highly operating-system dependent as well: to protect against polymorphic viruses, most modern operating systems (including Windows XP+, Linux, and BSD) enforce W^X, meaning you have to go through some trouble to write polymorphic executables in those operating systems, for the ones that allow it at all.
It may be possible in some interpreted languages to have the program modify its own source code while it's running. Perl, Python, and every implementation of JavaScript I know of do not allow this, though.
Personally, I find it quite strange that you find assembly easier to handle than C#. I find it even stranger that you think that assembly isn't as powerful: you can't get any more powerful than raw machine language. Anyway, to each his/her own.
C# has great reflection services, but if you have an aversion to those and you're really comfortable with C or C++, you could always write a program that writes C/C++ and passes it to a compiler. This would only be viable if your solution doesn't require a quick self-rewriting turn-around time (expect something on the order of tens of seconds or more).
Javascript and Python both support reflection as well. If you're thinking of learning a new, fun programming language that's powerful but not massively technically demanding, I'd suggest Python.
May I suggest Python, a nice very high-level dynamic language which has rich introspection included (and by e.g. usage of compile, eval or exec permits a form of self-modifying code). A very simple example based upon your question:
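def label1(a, b, c):
    c = a + b
    return c

a, b, c = 10, 20, 0
print(label1(a, b, c))  # prints 30

# New source for label1, kept as a string until it is executed.
newdef = """
def label1(a, b, c):
    c = a * b
    return c
"""
exec(newdef, globals(), globals())  # rebinds label1 in the module namespace
print(label1(a, b, c))  # prints 200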
Note that in the code sample above c is only altered in the function scope.
Common Lisp was designed with this sort of thing in mind. You could also try Smalltalk, where using reflection to modify running code is not unknown.
In both of these languages you are likely to be replacing an entire function or an entire method, not a single line of code. Smalltalk methods tend to be more fine-grained than Lisp functions, so that may be a good place to begin.
Many languages allow you to eval code at runtime.
Lisp
Perl
Python
PHP
Ruby
Groovy (via GroovyShell)
In high-level languages where you compile and execute code at run-time, it is not really self-modifying code, but dynamic class loading. Using inheritance principles, you can replace a class Factory and change application behavior at run-time.
Only in assembly language do you really have true self-modification, by writing directly to the code segment. But there is little practical usage for it. If you like a challenge, write a self-encrypting, maybe polymorphic virus. That would be fun.
I sometimes, although very rarely, write self-modifying code in Ruby.
Sometimes you have a method where you don't really know whether the data you are using (e.g. some lazy cache) is properly initialized or not. So, you have to check at the beginning of your method whether the data is properly initialized and then maybe initialize it. But you really only have to do that initialization once, but you check for it every single time.
So, sometimes I write a method which does the initialization and then replaces itself with a version that doesn't include the initialization code.
class Cache
  def [](key)
    @backing_store ||= self.expensive_initialization

    def [](key)
      @backing_store[key]
    end

    @backing_store[key]
  end
end
But honestly, I don't think that's worth it. In fact, I'm embarrassed to admit that I have never actually benchmarked to see whether that one conditional actually makes any difference. (On a modern Ruby implementation with an aggressively optimizing profile-feedback-driven JIT compiler probably not.)
Note that, depending on how you define "self-modifying code", this may or may not be what you want. You are replacing some part of the currently executing program, so …
EDIT: Now that I think about it, that optimization doesn't make much sense. The expensive initialization is only executed once anyway. The only thing that modification avoids, is the conditional. It would be better to take an example where the check itself is expensive, but I can't think of one.
However, I thought of a cool example of self-modifying code: the Maxine JVM. Maxine is a Research VM (it's technically not actually allowed to be called a "JVM" because its developers don't run the compatibility testsuites) written completely in Java. Now, there are plenty of JVMs written in itself, but Maxine is the only one I know of that also runs in itself. This is extremely powerful. For example, the JIT compiler can JIT compile itself to adapt it to the type of code that it is JIT compiling.
A very similar thing happens in the Klein VM which is a VM for the Self Programming Language.
In both cases, the VM can optimize and recompile itself at runtime.
I wrote a Python class, Code, that lets you add and delete lines of code on the object, print the code, and execute it. The Code class is shown at the end.
Example: if x == 1, the code changes its value to x = 2 and then deletes the whole block with the conditional that checked for that condition.
#Initialize Variables
x = 1
#Create Code
code = Code()
code + 'global x, code' #Adds a new Code instance code[0] with this line of code => internally code.subcode[0]
code + "if x == 1:" #Adds a new Code instance code[1] with this line of code => internally code.subcode[1]
code[1] + "x = 2" #Adds a new Code instance 0 under code[1] with this line of code => internally code.subcode[1].subcode[0]
code[1] + "del code[1]" #Adds a new Code instance 1 under code[1] with this line of code => internally code.subcode[1].subcode[1]
After the code is created you can print it:
#Prints
print("Initial Code:")
print(code)
print("x = " + str(x))
Output:
Initial Code:
global x, code
if x == 1:
    x = 2
    del code[1]
x = 1
Execute the code by calling the object: code()
print("Code after execution:")
code() #Executes code
print(code)
print("x = " + str(x))
Output 2:
Code after execution:
global x, code
x = 2
As you can see, the code changed the variable x to the value 2 and deleted the whole if block. This might be useful to avoid checking for conditions once they are met. In real life, this scenario could be handled by a coroutine system, but this self-modifying code experiment is just for fun.
class Code:

    def __init__(self, line='', indent=-1):
        if indent < -1:
            raise NameError('Invalid {} indent'.format(indent))
        # Indentation prefix for this line; the root (indent -1) has none.
        self.strindent = ''
        for i in range(indent):
            self.strindent = '    ' + self.strindent
        self.strsubindent = '    ' + self.strindent
        self.line = line
        self.subcode = []
        self.indent = indent

    def __add__(self, other):
        # Append a line (str) or an existing Code object as a child.
        if other.__class__ is str:
            other_code = Code(other, self.indent + 1)
            self.subcode.append(other_code)
            return self
        elif other.__class__ is Code:
            self.subcode.append(other)
            return self

    def __sub__(self, other):
        # Remove the first child matching a line (str) or a Code object.
        if other.__class__ is str:
            for code in self.subcode:
                if code.line == other:
                    self.subcode.remove(code)
                    return self
        elif other.__class__ is Code:
            self.subcode.remove(other)
            return self

    def __repr__(self):
        # Render this line followed by all nested lines.
        rep = self.strindent + self.line + '\n'
        for code in self.subcode:
            rep += code.__repr__()
        return rep

    def __call__(self):
        # Execute the rendered source text.
        print('executing code')
        exec(self.__repr__())
        return self.__repr__()

    def __getitem__(self, key):
        # Look up a child by its line text (str) or by position (int).
        if key.__class__ is str:
            for code in self.subcode:
                if code.line == key:
                    return code
        elif key.__class__ is int:
            return self.subcode[key]

    def __delitem__(self, key):
        # Delete a child by its line text (str) or by position (int).
        if key.__class__ is str:
            for i in range(len(self.subcode)):
                code = self.subcode[i]
                if code.line == key:
                    del self.subcode[i]
                    return
        elif key.__class__ is int:
            del self.subcode[key]
You can do this in Maple (the computer algebra language). Unlike those many answers above which use compiled languages which only allow you to create and link in new code at run-time, here you can honest-to-goodness modify the code of a currently-running program. (Ruby and Lisp, as indicated by other answerers, also allow you to do this; probably Smalltalk too).
Actually, it used to be standard in Maple that most library functions were small stubs which would load their 'real' self from disk on first call, and then self-modify themselves to the loaded version. This is no longer the case as the library loading has been virtualized.
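The stub-then-replace pattern described above can be imitated in Python. This is only an illustration of the pattern, not Maple's mechanism: _load_real_square_root is a hypothetical stand-in for loading the real implementation from disk, and the replacement happens by rebinding a module-level name.

```python
import math

def _load_real_square_root():
    """Hypothetical loader; stands in for fetching the real function from disk."""
    return math.sqrt

def square_root(x):
    """Stub: on the first call, load the real function and take its place."""
    global square_root
    square_root = _load_real_square_root()  # rebind the module-level name
    return square_root(x)                   # delegate this first call

stub = square_root             # keep a reference to the stub for comparison
print(square_root(9.0))        # first call goes through the stub
print(square_root is stub)     # the name no longer refers to the stub
```

After the first call, later callers that look up square_root by name reach the loaded function directly and never pay for the stub again.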
As others have indicated: you need an interpreted language with strong reflection and reification facilities to achieve this.
I have written an automated normalizer/simplifier for Maple code, which I proceeded to run on the whole library (including itself); and because I was not too careful in all of my code, the normalizer did modify itself. I also wrote a Partial Evaluator (recently accepted by SCP) called MapleMIX - available on sourceforge - but could not quite apply it fully to itself (that wasn't the design goal).
Have you looked at Java? Java 6 has a compiler API, so you can write code and compile it within the Java VM.
In Lua, you can "hook" existing code, which allows you to attach arbitrary code to function calls. It goes something like this:
local oldMyFunction = myFunction
myFunction = function(arg)
    if arg.blah then
        return oldMyFunction(arg)
    else
        --do whatever
    end
end
You can also simply plow over functions, which sort of gives you self modifying code.
Dlang's LLVM implementation contains the @dynamicCompile and @dynamicCompileConst function attributes, allowing you to compile according to the native host's instruction set at compile-time, and change compile-time constants at runtime through recompilation.
https://forum.dlang.org/thread/bskpxhrqyfkvaqzoospx@forum.dlang.org

Lisp data security/validation

This is really just a conceptual question for me at this point.
In Lisp, programs are data and data are programs. The REPL does exactly that - reads and then evaluates.
So how does one go about getting input from the user in a secure way? Obviously it's possible. I mean, Viaweb (now Yahoo! Stores) is pretty secure, so how is it done?
The REPL stands for Read Eval Print Loop.
(loop (print (eval (read))))
Above is only conceptual, the real REPL code is much more complicated (with error handling, debugging, ...).
You can read all kinds of data in Lisp without evaluating it. Evaluation is a separate step - independent from reading data.
There are all kinds of IO functions in Lisp. The most complex of the provided functions is usually READ, which reads s-expressions. There is an option in Common Lisp which allows evaluation during READ, but that can and should be turned off when reading data.
So, data in Lisp is not necessarily a program and even if data is a program, then Lisp can read the program as data - without evaluation. A REPL should only be used by a developer and should not be exposed to arbitrary users. For getting data from users one uses the normal IO functions, including functions like READ, which can read S-expressions, but does not evaluate them.
Here are a few things one should NOT do:
use READ to read arbitrary data. READ, for example, allows one to read really large data: there is no limit.
evaluate during READ ('read eval'). This should be turned off.
read symbols from I/O and call their symbol functions
read cyclical data structures with READ when your functions expect plain lists. Walking down a cyclical list can keep your program busy for a while.
fail to handle syntax errors while reading data.
You do it the way everyone else does it. You read a string of data from the stream, you parse it for your commands and parameters, you validate the commands and parameters, and you interpret the commands and parameters.
There's no magic here.
Simply put, what you DON'T do, is you don't expose your Lisp listener to an unvalidated, unsecure data source.
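The read, parse, validate, interpret sequence described above might look like the following sketch. The command names (add, echo) and the COMMANDS table are invented for this illustration; the point is only that user text is matched against a fixed table and never handed to an evaluator.

```python
def cmd_add(a, b):
    return a + b

def cmd_echo(text):
    return text

# Allowed commands: name -> (handler, converter for each parameter).
COMMANDS = {
    "add": (cmd_add, [int, int]),
    "echo": (cmd_echo, [str]),
}

def interpret(line):
    """Parse one line of user input, validate it, then run the matching handler."""
    parts = line.split()
    if not parts:
        raise ValueError("empty command")
    name, raw_args = parts[0], parts[1:]
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    handler, converters = COMMANDS[name]
    if len(raw_args) != len(converters):
        raise ValueError(f"{name} expects {len(converters)} argument(s)")
    # Converters such as int() raise ValueError on malformed parameters.
    args = [convert(raw) for convert, raw in zip(converters, raw_args)]
    return handler(*args)

print(interpret("add 2 3"))
print(interpret("echo hello"))
```

Anything that is not in the table, including text that would be valid code in the host language, is rejected with a ValueError instead of being executed.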
As was mentioned, the REPL is read - eval - print. @The Rook focused on eval (with reason), but do not discount READ. READ is a VERY powerful command in Common Lisp. The reader can evaluate code on its own, before you even GET to "eval".
Do NOT expose READ to anything you don't trust.
With enough work, you could make a custom package, limit the scope of functions available to that package, etc. etc. But, I think that's more work than simply writing a simple command parser myself and not worrying about some side effect that I missed.
1. Create your own readtable and fill it with the necessary hooks: SET-MACRO-CHARACTER, SET-DISPATCH-MACRO-CHARACTER et al.
2. Bind *READTABLE* to your own readtable.
3. Bind *READ-EVAL* to nil to prevent #. (may not be necessary if step 1 is done right)
4. READ
5. Probably something else.
Also, there is a trick of interning symbols in a temporary package while reading.
If the data is not LL(1)-ish, simply write a regular parser.
This is a killer question and I thought this same thing when I was reading about Lisp. Although I haven't done anything meaningful in LISP so my answer is very limited.
What I can tell you is that eval() is nasty. There is a saying that I like "If eval is the answer then you are asking the wrong question." --Unknown.
If the attacker can control data that is then evaluated then you have a very serious remote code execution vulnerability. This can be mitigated, and I'll show you an example with PHP, because that is what I know:
$id=addslashes($_GET['id']);
eval('$test="$id";');
If you weren't calling addslashes, an attacker could get remote code execution by doing this:
http://localhost/evil_eval.php?id="; phpinfo();/*
But addslashes will turn the " into a \", thus keeping the attacker from "breaking out" of the "data" and being able to execute code. This is very similar to SQL injection.
I found that question quite controversial. eval won't evaluate your input unless you explicitly ask it to.
I mean, your input will not be treated as Lisp code but as a string.
Having a powerful concept like eval in your language does not make the language unsafe.
I think the confusion comes from SQL, where you actually treat an input as [part of] SQL.
(query (concatenate 'string "SELECT * FROM foo WHERE id = " input-id))
Here input-id is being evaluated by the SQL engine.
This is because you have no nice way to write SQL, or whatever, but the point is that your input becomes part of what is being evaluated.
So eval doesn't bring you insecurity unless you are using it with your eyes closed.
EDIT: Forgot to say that this applies to any language.
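The difference between input that becomes part of the evaluated text and input that stays data can be shown with Python's sqlite3 module. This is a sketch with a made-up table foo and two sample rows, mirroring the query in the Lisp snippet above.

```python
import sqlite3

# In-memory database with a small sample table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO foo (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)

input_id = "1 OR 1=1"  # hostile user input

# Concatenation: the input becomes part of the SQL text that gets evaluated.
unsafe_rows = conn.execute(
    "SELECT * FROM foo WHERE id = " + input_id
).fetchall()

# Placeholder: the input is passed as a value and is never parsed as SQL.
safe_rows = conn.execute(
    "SELECT * FROM foo WHERE id = ?", (input_id,)
).fetchall()

print(len(unsafe_rows))  # every row matched, because OR 1=1 was evaluated
print(len(safe_rows))    # no row has the literal id "1 OR 1=1"
```

In the first query the SQL engine evaluates the attacker's text; in the second it only compares against it.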

Is the valid state domain of a program a regular language?

If you look at the call stack of a program and treat each return pointer as a token, what kind of automaton is needed to build a recognizer for the valid states of the program?
As a corollary, what kind of automaton is needed to build a recognizer for a specific bug state?
(Note: I'm only looking at the info that could be had from this function.)
My thought is that if these form regular languages, then some interesting tools could be built around that. E.g. given a set of crash/failure dumps, automatically group them and generate a recognizer to identify new instances of known bugs.
Note: I'm not suggesting this as a diagnostic tool but as a data management tool for turning a pile of crash reports into something more useful.
"These 54 crashes seem related, as do those 42."
"These new crashes seem unrelated to anything before date X."
etc.
It would seem that I've not been clear about what I'm thinking of accomplishing, so here's an example:
Say you have a program that has three bugs in it.
Two bugs that cause invalid args to be passed to a single function tripping the same sanity check.
A function that if given a (valid) corner case goes into an infinite recursion.
Also assume that when the program crashes (failed assert, uncaught exception, seg-V, stack overflow, etc.) it grabs a stack trace, extracts the call sites on it and ships them to a QA reporting server. (I'm assuming that only that information is extracted because 1, it's easy to get with a one time per project cost and 2, it has a simple, definite meaning that can be used without any special knowledge about the program)
What I'm proposing would be a tool that would attempt to classify incoming reports as connected to one of the known bugs (or as a new bug).
The simplest thing would be to assume that one failure site is one bug, but in the first example, two bugs get detected in the same place. The next easiest thing would be to require the entire stack to match, but again, this doesn't work in cases like the second example where you have multiple pieces of (valid) code that can trip the same bug.
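The two baseline classifiers just described, and the way each one fails, can be sketched as follows. The report IDs and function names are invented sample data; each stack is a list of call sites with the failure site last.

```python
from collections import defaultdict

def group_by_failure_site(reports):
    """Baseline 1: one failure site (innermost call site) is one bug."""
    groups = defaultdict(list)
    for report_id, stack in reports.items():
        groups[stack[-1]].append(report_id)
    return dict(groups)

def group_by_full_stack(reports):
    """Baseline 2: two reports match only if the entire stack is identical."""
    groups = defaultdict(list)
    for report_id, stack in reports.items():
        groups[tuple(stack)].append(report_id)
    return dict(groups)

# Hypothetical crash reports.
reports = {
    "r1": ["main", "parse", "check_args"],         # bug A trips the sanity check
    "r2": ["main", "render", "check_args"],        # bug B trips the same check
    "r3": ["main", "walk", "walk", "walk"],        # runaway recursion
    "r4": ["main", "walk", "walk", "walk", "walk"],  # same bug, different depth
}

print(group_by_failure_site(reports))
print(group_by_full_stack(reports))
```

Grouping by failure site merges r1 and r2 even though they are different bugs, while grouping by full stack separates r3 and r4 even though they are the same bug, which is why something between the two is being asked for.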
The return pointer on the stack is just a pointer to memory. In theory if you look at the call stack of a program that just makes one function call, the return pointer (for that one function) can have different value for every execution of the program. How would you analyze that?
In theory you could read through a core dump using a map file. But doing so is extremely platform and compiler specific. You would not be able to create a general tool for doing this with any program. Read your compiler's documentation to see if it includes any tools for doing postmortem analysis.
If your program is decorated with assert statements, then each assert statement defines a valid state. The program statements between the assertions define the valid state changes.
A program that crashes has violated enough assertions that something broke.
A program that's incorrect but "flaky" has violated at least one assertion but hasn't failed.
It's not at all clear what you're looking for. The valid states are -- sometimes -- hard to define but -- usually -- easy to represent as simple assert statements.
Since a crashed program has violated one or more assertions, a program with explicit, executable assertions doesn't need any crash debugging. It will simply fail an assert statement and die visibly.
If you don't want to put in assert statements then it's essentially impossible to know what state should have been true and which (never-actually-stated) assertion was violated.
Unwinding the call stack to work out the position and the nesting is trivial. But it's not clear what that shows. It tells you what broke, but not what other things led to the breakage. That would require guessing what assertions were supposed to have been true, which requires deep knowledge of the design.
Edit.
"seem related" and "seem unrelated" are undefinable without recourse to the actual design of the actual application and the actual assertions that should be true in each stack frame.
If you don't know the assertions that should be true, all you have is a random puddle of variables. What can you claim about "related" given a random pile of values?
Crash 1: a = 2, b = 3, c = 4
Crash 2: a = 3, b = 4, c = 5
Related? Unrelated? How can you classify these without knowing everything about the code? If you know everything about the code, you can formulate standard assert-statement conditions that should have been true. And then you know what the actual crash is.

We would like to make the function which does not forward the exception which has reference permeability dynamically?

Google translate :
The interpreter is created, an array of bytes into the array in a machine language to cast the enum type and function, I have made an approach to dynamically execute a function, reference Please tell me if the machine-language site.
Babelfish translate:
It is to make the interpreter, but inserting machine language in arrangement at the byte unit, but if it is it makes function dynamically with the approach that it arranges the very the cast, it executes that in functional type through enum, there is a sight of the machine language which becomes reference, please teach.
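Original question (translated from Japanese):
I am writing an interpreter, and I create functions dynamically with the following approach: I put machine code into an array one byte at a time,
cast that array to a function type by way of an enum, and execute it.
If there is a website on machine language that would be a useful reference, please point me to it.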
I'm going to make a guess - a pure guess:
You have an array of bytes containing the machine code for a function. How can you write a cast such that the function can be executed?
In which case, the answer is likely to be:
In most modern operating systems, the system protects you from converting data into executable code. The best way to deal with it would be to package the code as a function in a dynamically-loaded (shared) library, and then use the standard calls for the operating system to load that library and execute the function.
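As a rough illustration of the "load a shared library and call a function from it" route, here is a sketch using Python's ctypes. It assumes a POSIX-like system where the C library can be located; the choice of libc and its abs function is only for demonstration and stands in for whatever library would hold the generated code.

```python
import ctypes
import ctypes.util

# Locate and load the C library through the operating system's loader.
# Assumption: POSIX-like system. If find_library returns None there,
# CDLL(None) falls back to the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature of int abs(int) before calling it.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))
```

The machine code for abs lives in memory that the loader mapped as executable, so no cast from a data array to a function pointer is needed.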

Resources