What type of interpreter were most 8-bit BASIC implementations?

I’m a big fan of early/mid 1980s personal computers like the Amstrad CPC, Commodore 64 and the Sinclair Spectrum. One thing these computers all had was a version of BASIC.
As a language hacker myself I’m curious: were these interpreters implemented as tree-walker interpreters (simply traversing the parse tree) or bytecode interpreters? I can’t find a lot of information on how they were implemented. It’s fascinating to me how they were built given the limitations of the hardware at the time.

They were mostly token-based. This means that the integrated code editor transformed the human-readable keywords in the source into single-byte tokens, a kind of bytecode. Upon execution, those tokens were read and a dispatcher executed the appropriate routine (stored as machine code in ROM; on Commodore machines, the KERNAL) with the given parameters.
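To make the scheme concrete, here is a minimal C sketch of a token dispatcher. The token values, handler names, and argument format are all invented for illustration; real interpreters had a full keyword table and far more elaborate parsing:

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical token values. Real interpreters (e.g. Commodore BASIC)
       used values >= 0x80 so tokens never collided with plain ASCII text. */
    enum { TOK_PRINT = 0x80, TOK_GOTO = 0x81, TOK_END = 0x82 };

    /* One ROM routine per token, indexed by (token - 0x80). */
    typedef void (*handler_fn)(const uint8_t **pc);

    static void do_print(const uint8_t **pc) {
        /* Arguments follow the token in the tokenized line. */
        printf("%s\n", (const char *)*pc);
        while (**pc) (*pc)++;   /* skip the string argument */
        (*pc)++;                /* skip its NUL terminator */
    }
    static void do_end(const uint8_t **pc) { (void)pc; }

    static handler_fn dispatch[] = { do_print, /* TOK_GOTO elided */ 0, do_end };

    int main(void) {
        /* A tokenized line roughly equivalent to: PRINT "HI" : END */
        const uint8_t line[] = { TOK_PRINT, 'H', 'I', 0, TOK_END };
        const uint8_t *pc = line;
        while (*pc != TOK_END) {
            uint8_t tok = *pc++;
            dispatch[tok - 0x80](&pc);   /* jump to the "ROM" routine */
        }
        return 0;
    }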
The ZX Spectrum even had a keyboard for entering BASIC tokens directly:
http://www.worldofspectrum.org/ZX81BasicProgramming/
For some others see here:
https://www.primidi.com/atari_basic/description/the_tokenizer
http://fileformats.archiveteam.org/wiki/Commodore_BASIC_tokenized_file
http://cpctech.cpc-live.com/docs/bastech.html
Hope this answers your question.

Related

How Could I Create This Type of Machine?

So I have a computer. It has programs on it already. If I delete those programs, I would be left with an operating system that is able to run commands. I could create my own programs from that point, but I would be limited to the constraints of the operating system already loaded onto the machine. What I would like to do is remove the operating system from the computer entirely and be left only with a blank screen and a cursor where I could type whatever I want. I want to be able to create my own program without having to run an operating system behind it. I do not understand how the physical machine would be able to process the strings of characters that I type into it and produce its own response, which would then be displayed on the screen. But obviously someone has done it before; otherwise I would not have the machine that I am typing on right now.
(I apologize for the run-on sentences, but I do not know how else to say what I want to say right now.)
My goal here is to have a computer, kind of like the Apple 2, where the only thing that I could do with it is type into a text line and see characters pop up on the screen. My goal on top of that goal is to develop a program that would hide in the background of the machine, so that there would still only be a cursor on screen, but the program would make it so that when I type any simple question into the screen, such as, "How are you feeling today?", I would receive a response like, "I am doing quite well, thank you. How are you?".
Does anybody have any idea how I would be able to start this project properly?
If you need to ask this question, you need to learn more than one answer on SO can provide.
An operating system is needed even to get the cursor on the screen.
If you are serious about the idea, you might want to start with a microcontroller, such as an Arduino. They are more powerful than an Apple 2 and will allow you to write programs and boot straight into them. Adding some kind of terminal I/O will not be hard, at least compared to bootstrapping a program on an actual PC.
A good starting point for a project like this is to learn about operating systems in general. It is a vast topic but you don't have to know everything.
When we speak of an operating system we have in mind a large system that provides capabilities like managing memory, reading and writing files to permanent storage and interacting with input and output such as keyboards and displays. We are also usually thinking of a large number of higher level software applications. Think about commands like dir or ls as programs that come along with the operating system. Of course with a GUI based OS we also have windows and buttons and a wide variety of controls to consider.
The good news is that in order to get started you don't need to be an expert on everything and you certainly don't have to start with a full-blown OS.
The other good news is that the topic can be broken down into byte-sized pieces. A great introduction to the fundamentals you will need is Charles Petzold's Code: The Hidden Language of Computer Hardware and Software.
Petzold begins with discussions of the inventions of Morse code and Braille, adds electricity, number systems, Boolean logic, and the resulting epiphanies required to put them all together economically. With these building blocks he builds circuits, relays, gates, and switches, and discusses the inventions of the vacuum tube, the transistor, and finally the integrated circuit.
The last portion of the book contains a grab bag of subjects such as implementation of floating point math, operating systems, and the various refinements that have occurred in the latter half of the twentieth century.
Once you have a feel for the fundamentals, a good next step in learning about operating systems is to study one that provides as few capabilities as possible. Take a look at MINIX.
MINIX originally was developed in 1987 by Andrew S. Tanenbaum as a teaching tool for his textbook Operating Systems Design and Implementation. Today, it is a text-oriented operating system with a kernel of less than 6,000 lines of code. MINIX's largest claim to fame is as an example of a microkernel, in which each device driver runs as an isolated user-mode process—a structure that not only increases security but also reliability, because it means a bug in a driver cannot bring down the entire system.
Have fun.

What type of programs are C/C++ used for now? [duplicate]

Possible Duplicate:
in which area is c++ mostly used?
I started off with C in school, went to Java, and now I primarily use the P's (PHP, Perl, Python), so my exposure to the lower-level languages has all but disappeared. I would like to get back into it, but I can never justify using C over Perl or Python. What real-world apps are being built with these languages? If I want to dive back in, what can I do with C/C++ that I can't easily do with Perl/Python?
To borrow some text from the answer I had for another related question:
Device drivers in native code.
High-performance floating-point number crunching (e.g. SIMD).
Easy ability to interface with assembly language routines.
Manage memory manually for extended execution runs.
Most of my work has been C and C++. I studied computer engineering in school and worked with embedded devices. My Master's degree had an emphasis in graphics and visualization. One of our visualization apps was written in Python, but for the most part, graphics demands C/C++ for the speed. I now work with embedded devices running Windows Mobile and Windows CE - all C++, though you can do a lot with C#. I previously worked in simulations, which was all C++ code on the backend. C++ is still king for time-sensitive IO, embedded applications, graphics and simulations.
Basically, if you need tight control of timing, you go lower level. Likewise if you need something lightweight (i.e., small program size, small memory footprint).
Somewhat unscientifically, I took a look on SourceForge, and the top twenty project/language breakdown is currently thus:
Java (43,199)
C++ (34,313)
PHP (28,333)
C (26,711)
C# (12,298)
Python (12,222)
JavaScript (10,307)
Perl (8,931)
Unix Shell (3,618)
Delphi/Kylix (3,353)
Visual Basic (3,044)
Visual Basic .NET (2,513)
Assembly (2,283)
JSP (1,891)
Ruby (1,731)
PL/SQL (1,669)
Objective C (1,424)
ASP.NET (1,344)
Tcl (1,241)
ActionScript (1,164)
Perl + Python together still total less than C alone. I have no idea why Java is so high; I don't know a single Java developer and have not seen a single Java project, but I am sure someone is using it! Probably for the same reason, you are not seeing much C/C++: you are just not working in a domain where it figures highly. I work in embedded systems, where C and C++ are ubiquitous and Python comes nowhere. Different languages are encountered to different extents in different worlds.
You ask what you can do with C/C++ that you cannot do easily with Perl/Python; well, the answer is plenty: real-time embedded systems, for one. But if that is not what you want or need to do, then there is no reason to. On the other hand, I might ask the reverse: I'd use C++ for things you might use Python for, simply because for me it would be easier and quicker than learning a new language and getting the tools working.
C/C++ can be, and is, used for nearly all "types" of programs.
There are some major advantages to C and C++:
Potentially better performance
Easier to build interoperable libraries, especially if working with libraries usable from multiple languages.
Well, the interpreters for your "P's" languages are most certainly written in C/C++. Most OS code is written in C/C++. On the application side, if you are into games, they are generally written in C/C++. Anything that needs high performance and/or low memory is a good candidate.
I've used gSOAP, a C++ SOAP client implementation, for a web service that got HUGE traffic.
Most desktop/console applications with a bias toward graphics rely heavily on C++. This includes CAD software and AAA video games, among other things.

Which languages are dynamically typed and compiled (and which are statically typed and interpreted)?

In my reading on dynamic and static typing, I keep coming up against the assumption that statically typed languages are compiled, while dynamically typed languages are interpreted. I know that in general this is true, but I'm interested in the exceptions.
I'd really like someone to not only give some examples of these exceptions, but try to explain why it was decided that these languages should work in this way.
Here's a list of a few interesting systems. It is not exhaustive!
Dynamically typed and compiled
The Gambit Scheme compiler, Chez Scheme, Will Clinger's Larceny Scheme compiler, the Bigloo Scheme compiler, and probably many others.
Why?
Lots of people really like Scheme. Programs as data, good macro system, 35 years of development, big community. But they want performance. Hence, a number of good native-code compilers—Chez Scheme is even a successful commercial product (interpreted bytecodes are free; native codes you pay for).
The LuaJIT just-in-time compiler for Lua.
Why?
To show it could be done. And then, people started to like getting 3x speedup on their Lua programs. Lua is in a lot of games, where performance matters, plus it's creeping into other products too. 70% of the code in Adobe Lightroom is Lua.
The iconc Icon-to-C compiler.
Why?
The fifty people who used it loved Icon. Totally unusual evaluation model, the most innovative (and in my opinion, best) string-processing system ever designed. But that evaluation model was really expensive, especially on late-1980s computers. By compiling Icon to C, the Icon Project made it possible for big Icon programs to run in fewer hours.
Conclusion: people first develop an attachment to a dynamically typed language, and probably a significant code base. Eventually, the community spits out a native-code compiler so that you can get better performance and solve bigger problems.
Statically Typed and Interpreted
This category is less common, but...
Objective Caml. Dialect of ML, vehicle for lots of innovative experiments in language design.
Why?
Very portable system and very fast compilation times. People like both properties, so the new language-design ideas are disseminated widely.
Moscow ML. Standard ML with a few extra features of the modules system.
Why?
Portable, fast compilation times, easy to make an interactive read/eval/print loop. Became a popular teaching compiler.
C-Terp. An old product, I think maybe from Gimpel Software. Saber C—a product I don't think you can buy any more.
Why?
Debugging. Especially, debugging on 1980s hardware under MS-DOS. For very little resources, you could get really good help debugging C code on very limited hardware (think: 4.77MHz processor with an 8-bit bus, 640K of RAM fully loaded). Nearly impossible to get a good visual debugger for native-compiled code, but with the interpreter, fairly easy.
UCSD Pascal—the system that made "P-code" a household word.
Why?
Teachers liked Niklaus Wirth's language design, and the compiler could run on very small machines. Wirth's clean design and the UCSD P-system made an unbeatable combination, and Pascal was the standard teaching language of the 1970s. Younger people may find it hard to appreciate that in the 1970s there was no debate over what language to teach in the first course. Today I know of programs using C, C++, Haskell, Java, ML, and Scheme. In the 1970s it was always Pascal, and the UCSD P-system was a big reason why.
In case you are wondering, P stood for portable.
Summary: Interpreting a statically typed language is a great way to get an implementation into everybody's hands quickly. (It also had advantages for debugging on Bronze Age hardware.)
Objective-C is compiled and supports dynamic typing (certainly when calling methods via [target doSomething] syntax). That is, you can send any message to a target (using ordinary language syntax, without programming against a reflection API), receive only a warning at compile time that it might not be handled, and receive an exception only at runtime if the target doesn't respond to that selector (which is like a method signature); and you can ask any object (which can all be of static type id if your code doesn't know any better or doesn't care) whether it respondsToSelector: to probe its capabilities.
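If you haven't seen Objective-C, here is a rough C sketch of the underlying idea: a runtime selector-to-implementation lookup. Everything here is invented for illustration and is far simpler than the real Objective-C runtime:

    #include <stdio.h>
    #include <string.h>

    typedef void (*imp_fn)(void *self);

    struct method { const char *selector; imp_fn imp; };
    struct object { struct method *methods; int count; };

    /* Resolve a method by name at runtime, the way an Objective-C object
       resolves a selector. */
    static imp_fn lookup(struct object *obj, const char *sel) {
        for (int i = 0; i < obj->count; i++)
            if (strcmp(obj->methods[i].selector, sel) == 0)
                return obj->methods[i].imp;
        return NULL; /* analogous to respondsToSelector: returning NO */
    }

    static void say_hello(void *self) { (void)self; puts("hello"); }

    int main(void) {
        struct method table[] = { { "sayHello", say_hello } };
        struct object obj = { table, 1 };

        imp_fn f = lookup(&obj, "sayHello");
        if (f) f(&obj);                     /* the message is handled */
        if (!lookup(&obj, "fly"))           /* probe capabilities */
            puts("object does not respond to 'fly'");
        return 0;
    }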
Java (a statically typed language) is compiled to JVM bytecode, which was interpreted on older versions of the JVM, whereas it now uses Just In Time (JIT) compilation, meaning machine code is generated at runtime. I also believe ML and its dialects can be interpreted, and ML is definitely statically typed.
Python is a dynamic language that has compilers.
See this SO question - Python - why compile?, for instance.
In general, compiling makes the program run much faster.
ActionScript has dynamic typing and compiles to bytecode.
And it even compiles right down to native machine code if you want to release a Flash app on the iPhone.

Interactive programming language?

Is there a programming language which can be programmed entirely in interactive mode, without needing to write files which are interpreted or compiled? Think maybe something like IRB for Ruby, but a system which is designed to let you write the whole program from the command line.
I assume you are looking for something similar to how BASIC used to work (boot up to a BASIC prompt and start coding).
IPython allows you to do this quite intuitively. Unix shells such as Bash use the same concept, but you cannot re-use and save your work nearly as intuitively as with IPython. Python is also a far better general-purpose language.
Edit: I was going to type up some examples and provide some links, but the IPython interactive tutorial seems to do this a lot better than I could. Good starting points for what you are looking for are the sections on source code handling tips and lightweight version control. Note this tutorial doesn't spell out how to do everything you are looking for precisely, but it does provide a jumping off point to understand the interactive features on the IPython shell.
Also take a look at the IPython "magic" reference, as it provides a lot of utilities that do things specific to what you want to do, and allows you to easily define your own. This is very "meta", but the example that shows how to create an IPython magic function is probably the most concise example of a "complete application" built in IPython.
Smalltalk can be programmed entirely interactively, but I wouldn't call the Smalltalk prompt a "command line". Most Lisp environments are like this as well. Also PostScript (as in printers), if memory serves.
Are you saying that you want to write a program while never seeing more code than what fits in the scrollback buffer of your command window?
There's always lisp, the original alternative to Smalltalk with this characteristic.
The only way to avoid writing any files is to move completely to a running interactive environment. When you program this way (that is, interactively, as in IRB or F# interactive), how do you distribute your programs? When you exit IRB or the F# interactive console, you lose all the code you wrote interactively.
Smalltalk (see a modern implementation such as Squeak) solves this, and I'm not aware of any other environment where you could fully avoid files. The solution is that you distribute an image of the running environment, which includes your interactively created program.
Any Unix shell conforms to your question. This goes from bash, sh, csh, and ksh up to tclsh for Tcl or wish for Tk GUI writing.
As already mentioned, Python has a few good interactive shells. I would recommend bpython for starters instead of ipython; the advantage of bpython here is its support for autocompletion and help dialogs that tell you what arguments a function accepts and what it does (if it has docstrings).
Screenshots: http://bpython-interpreter.org/screenshots/
This is really a question about implementations, not languages, but
Smalltalk (try out the Squeak version) keeps all your work in an "interactive workspace", but it is graphical and not oriented toward the command line.
APL, which was first deployed on IBM 360 and 370 systems, was entirely interactive, using a command line on a modified IBM Selectric typewriter! Your APL functions were kept in a "workspace" which did not at all resemble an ordinary file.
Many, many language implementations come with pure command-line interactive interpreters, like say Standard ML of New Jersey, but because they don't offer any sort of persistent namespace (i.e., when you exit the program, all your work is lost), I don't think they should really count.
Interestingly, the prime movers behind Smalltalk and APL (Kay and Iverson respectively) both won Turing Awards. (Iverson got his Turing award after being denied tenure at Harvard.)
Tcl can be programmed entirely interactively, and you can certainly define new Tcl procs (or redefine existing ones) without saving to a file.
Of course, if you are developing an entire application, at some point you do want to save to a file, or else you lose everything. Using Tcl's introspective abilities it's relatively easy to dump some or all of the current interpreter state into a Tcl file. (I've written a proc to make this easier before; mostly, though, I would just develop in the file in the first place, and have a function in the application to re-source itself if its source changes.)
Not sure about that, but this system is impressively interactive: http://rigsomelight.com/2014/05/01/interactive-programming-flappy-bird-clojurescript.html
Most variations of Lisp make it easy to save your interactive work product as program files, since code is just data.
Charles Simonyi's Intentional Programming concept might be part way there, too, but it's not like you can go and buy that yet. The Intentional Workbench project may be worth exploring.
Many Forths can be used like this.
Someone already mentioned Forth, but I would like to elaborate a bit on the history of Forth. Traditionally, Forth is a programming language which is its own operating system. The traditional Forth saves the program directly onto disk sectors without using a "real" filesystem. It could afford to do that because it ran directly on the CPU without an operating system, so it didn't need to play nice.
Indeed, some implementations have Forth as not only the operating system but also the CPU (a lot of the more modern stack-based CPUs are in fact designed as Forth machines).
In the original implementation of Forth, code is always compiled each time a line is entered and saved on disk. This is feasible because Forth is very easy to compile. You just start the interpreter, play around with Forth, defining functions as necessary, then simply quit the interpreter. The next time you start the interpreter, all your previous functions are still there. Of course, not all modern implementations of Forth work this way.
Clojure
It's a functional Lisp on the JVM. You can connect to a REPL server called nREPL, and from there you can start writing code in a text file and loading it up interactively as you go.
Clojure gives you something akin to interactive unit testing.
I think Clojure is more interactive than other Lisps because of its strong emphasis on the functional paradigm. It's easier to hot-swap functions when they are pure.
The best way to try it out is here: http://web.clojurerepl.com/
Elm
Elm is probably the most interactive language I know of. It's a very pure functional language with syntax close to Haskell. What makes it special is that it's designed around a reactive model that allows hot-swapping of code (modifying running functions or values). The reactive bit means that whenever you change one thing, everything is re-evaluated.
Now, Elm compiles to HTML, CSS, and JavaScript, so you won't be able to use it for everything.
Elm gives you something akin to interactive integration testing.
The best way to try it out is here: http://elm-lang.org/try

How do emulators work and how are they written? [closed]

How do emulators work? When I see NES/SNES or C64 emulators, it astounds me.
Do you have to emulate the processor of those machines by interpreting its particular assembly instructions? What else goes into it? How are they typically designed?
Can you give any advice for someone interested in writing an emulator (particularly a game system)?
Emulation is a multi-faceted area. Here are the basic ideas and functional components. I'm going to break it into pieces and then fill in the details via edits. Many of the things I'm going to describe will require knowledge of the inner workings of processors -- assembly knowledge is necessary. If I'm a bit too vague on certain things, please ask questions so I can continue to improve this answer.
Basic idea:
Emulation works by handling the behavior of the processor and the individual components. You build each individual piece of the system and then connect the pieces much like wires do in hardware.
Processor emulation:
There are three ways of handling processor emulation:
Interpretation
Dynamic recompilation
Static recompilation
With all of these paths, you have the same overall goal: execute a piece of code to modify processor state and interact with 'hardware'. Processor state is a conglomeration of the processor registers, interrupt handlers, etc., for a given processor target. For the 6502, you'd have a number of 8-bit integers representing registers: A, X, Y, P, and S; you'd also have a 16-bit PC register.
With interpretation, you start at the IP (instruction pointer -- also called PC, program counter) and read the instruction from memory. Your code parses this instruction and uses this information to alter processor state as specified by your processor. The core problem with interpretation is that it's very slow; each time you handle a given instruction, you have to decode it and perform the requisite operation.
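Here's a minimal sketch in C of what an interpreter core looks like: a hypothetical, heavily simplified 6502 with only three opcodes, no status-flag updates, and flat 64K memory.

    #include <stdint.h>

    /* Processor state: the registers named above plus flat 64K memory. */
    typedef struct {
        uint8_t a, x, y, p, s;   /* A, X, Y, status, stack pointer */
        uint16_t pc;             /* program counter */
        uint8_t mem[65536];
    } cpu_t;

    /* Execute one instruction; return the cycles it consumed.
       Flag updates are omitted for brevity. */
    static int cpu_step(cpu_t *c) {
        uint8_t opcode = c->mem[c->pc++];          /* fetch */
        switch (opcode) {                          /* decode + execute */
        case 0xA9:                                 /* LDA #imm */
            c->a = c->mem[c->pc++];
            return 2;
        case 0xE8:                                 /* INX */
            c->x++;
            return 2;
        case 0x4C:                                 /* JMP abs */
            c->pc = c->mem[c->pc] | (c->mem[c->pc + 1] << 8);
            return 3;
        default:                                   /* not implemented here */
            return 2;
        }
    }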
With dynamic recompilation, you iterate over the code much like interpretation, but instead of just executing opcodes, you build up a list of operations. Once you reach a branch instruction, you compile this list of operations to machine code for your host platform, then you cache this compiled code and execute it. Then when you hit a given instruction group again, you only have to execute the code from the cache. (BTW, most people don't actually make a list of instructions but compile them to machine code on the fly -- this makes it more difficult to optimize, but that's out of the scope of this answer, unless enough people are interested)
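As a rough illustration of the caching idea, here's a "cached interpreter" sketch in C, building on the cpu_t above. A real dynamic recompiler would emit host machine code; here each basic block is merely pre-decoded once into an array of function pointers, but the cache-by-block-start, translate-until-branch structure is the same. The tiny opcode set and all names are invented:

    #define MAX_OPS 64
    typedef void (*op_fn)(cpu_t *c);

    static void op_lda(cpu_t *c) { c->a = c->mem[c->pc + 1]; c->pc += 2; }
    static void op_inx(cpu_t *c) { c->x++; c->pc += 1; }
    static void op_jmp(cpu_t *c) {
        c->pc = c->mem[c->pc + 1] | (c->mem[c->pc + 2] << 8);
    }

    typedef struct {
        uint16_t start_pc;       /* guest address this block was built from */
        op_fn ops[MAX_OPS];
        int count, valid;
    } block_t;

    static block_t cache[256];   /* tiny direct-mapped translation cache */

    static block_t *translate(cpu_t *c, uint16_t pc) {
        block_t *b = &cache[pc & 0xFF];
        if (b->valid && b->start_pc == pc)
            return b;                            /* hot path: reuse block */
        b->start_pc = pc; b->count = 0; b->valid = 1;
        while (b->count < MAX_OPS) {             /* decode until a branch */
            uint8_t op = c->mem[pc];
            if (op == 0xA9)      { b->ops[b->count++] = op_lda; pc += 2; }
            else if (op == 0xE8) { b->ops[b->count++] = op_inx; pc += 1; }
            else                 { b->ops[b->count++] = op_jmp; break; }
        }
        return b;
    }

    static void run_block(cpu_t *c) {
        block_t *b = translate(c, c->pc);
        for (int i = 0; i < b->count; i++)
            b->ops[i](c);        /* replay decoded ops without re-parsing */
    }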
With static recompilation, you do the same as in dynamic recompilation, but you follow branches. You end up building a chunk of code that represents all of the code in the program, which can then be executed with no further interference. This would be a great mechanism if it weren't for the following problems:
Code that isn't in the program to begin with (e.g. compressed, encrypted, generated/modified at runtime, etc) won't be recompiled, so it won't run
It's been proven that finding all the code in a given binary is equivalent to the Halting problem
These combine to make static recompilation completely infeasible in 99% of cases. For more information, Michael Steil has done some great research into static recompilation -- the best I've seen.
The other side to processor emulation is the way in which you interact with hardware. This really has two sides:
Processor timing
Interrupt handling
Processor timing:
Certain platforms -- especially older consoles like the NES, SNES, etc -- require your emulator to have strict timing to be completely compatible. With the NES, you have the PPU (pixel processing unit) which requires that the CPU put pixels into its memory at precise moments. If you use interpretation, you can easily count cycles and emulate proper timing; with dynamic/static recompilation, things are a /lot/ more complex.
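With interpretation, the cycle counting can look something like this sketch, building on cpu_step above. The ppu_step function, the 3x clock ratio, and the cycles-per-frame constant are assumptions modeled loosely on the NTSC NES:

    typedef struct ppu ppu_t;            /* hypothetical PPU state */
    void ppu_step(ppu_t *ppu, int dots); /* hypothetical: advance the PPU */

    #define CYCLES_PER_FRAME 29781       /* approx. NTSC NES CPU cycles */

    void run_frame(cpu_t *cpu, ppu_t *ppu) {
        int elapsed = 0;
        while (elapsed < CYCLES_PER_FRAME) {
            int cycles = cpu_step(cpu);  /* one instruction at a time */
            ppu_step(ppu, cycles * 3);   /* PPU runs at 3x the CPU clock */
            elapsed += cycles;
        }
    }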
Interrupt handling:
Interrupts are the primary mechanism by which the CPU communicates with hardware. Generally, your hardware components will tell the CPU what interrupts it cares about. This is pretty straightforward: when your code raises a given interrupt, you look at the interrupt handler table and call the proper callback.
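A sketch of that table in C (all names are invented; real systems map interrupt lines and priorities in hardware-specific ways):

    #define NUM_IRQS 8

    typedef void (*irq_handler)(void *device);

    static struct {
        irq_handler fn;
        void *device;
    } irq_table[NUM_IRQS];

    /* Hardware components register the interrupts they care about. */
    void irq_register(int line, irq_handler fn, void *device) {
        irq_table[line].fn = fn;
        irq_table[line].device = device;
    }

    /* Emulated code raises an interrupt: look up the line, call the callback. */
    void irq_raise(int line) {
        if (line >= 0 && line < NUM_IRQS && irq_table[line].fn)
            irq_table[line].fn(irq_table[line].device);
    }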
Hardware emulation:
There are two sides to emulating a given hardware device:
Emulating the functionality of the device
Emulating the actual device interfaces
Take the case of a hard drive. The functionality is emulated by creating the backing storage, read/write/format routines, etc. This part is generally very straightforward.
The actual interface of the device is a bit more complex. This is generally some combination of memory mapped registers (e.g. parts of memory that the device watches for changes to do signaling) and interrupts. For a hard-drive, you may have a memory mapped area where you place read commands, writes, etc, then read this data back.
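Here's what that memory-mapped dispatch might look like in C. The device, addresses, and register layout are all invented; the point is only that the bus routes certain addresses to the device instead of to plain RAM:

    #include <stdint.h>

    #define DISK_CMD   0xD000   /* write a command byte here */
    #define DISK_DATA  0xD001   /* read result bytes back here */

    static uint8_t ram[65536];
    static uint8_t disk_result;

    static void disk_write_cmd(uint8_t cmd) {
        /* the device "watches" this register and acts on the command */
        disk_result = (cmd == 0x01) ? 0xAA : 0x00;   /* fake status reply */
    }

    uint8_t bus_read(uint16_t addr) {
        if (addr == DISK_DATA) return disk_result;   /* device, not RAM */
        return ram[addr];
    }

    void bus_write(uint16_t addr, uint8_t value) {
        if (addr == DISK_CMD) { disk_write_cmd(value); return; }
        ram[addr] = value;
    }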
I'd go into more detail, but there are a million ways you can go with it. If you have any specific questions here, feel free to ask and I'll add the info.
Resources:
I think I've given a pretty good intro here, but there are a ton of additional areas. I'm more than happy to help with any questions; I've been very vague in most of this simply due to the immense complexity.
Obligatory Wikipedia links:
Emulator
Dynamic recompilation
General emulation resources:
Zophar -- This is where I got my start with emulation, first downloading emulators and eventually plundering their immense archives of documentation. This is the absolute best resource you can possibly have.
NGEmu -- Not many direct resources, but their forums are unbeatable.
RomHacking.net -- The documents section contains resources regarding machine architecture for popular consoles
Emulator projects to reference:
IronBabel -- This is an emulation platform for .NET, written in Nemerle and recompiles code to C# on the fly. Disclaimer: This is my project, so pardon the shameless plug.
BSnes -- An awesome SNES emulator with the goal of cycle-perfect accuracy.
MAME -- The arcade emulator. Great reference.
6502asm.com -- This is a JavaScript 6502 emulator with a cool little forum.
dynarec'd 6502asm -- This is a little hack I did over a day or two. I took the existing emulator from 6502asm.com and changed it to dynamically recompile the code to JavaScript for massive speed increases.
Processor recompilation references:
The research into static recompilation done by Michael Steil (referenced above) culminated in this paper and you can find source and such here.
Addendum:
It's been well over a year since this answer was submitted and with all the attention it's been getting, I figured it's time to update some things.
Perhaps the most exciting thing in emulation right now is libcpu, started by the aforementioned Michael Steil. It's a library intended to support a large number of CPU cores, which use LLVM for recompilation (static and dynamic!). It's got huge potential, and I think it'll do great things for emulation.
emu-docs has also been brought to my attention, which houses a great repository of system documentation, which is very useful for emulation purposes. I haven't spent much time there, but it looks like they have a lot of great resources.
I'm glad this post has been helpful, and I'm hoping I can get off my arse and finish up my book on the subject by the end of the year/early next year.
A guy named Victor Moya del Barrio wrote his thesis on this topic. A lot of good information in 152 pages. You can download the PDF here.
If you don't want to register with scribd, you can google for the PDF title, "Study of the techniques for emulation programming". There are a couple of different sources for the PDF.
Emulation may seem daunting but is actually quite a bit easier than simulation.
Any processor typically has a well-written specification that describes states, interactions, etc.
If you did not care about performance at all, then you could easily emulate most older processors using very elegant object oriented programs. For example, an X86 processor would need something to maintain the state of registers (easy), something to maintain the state of memory (easy), and something that would take each incoming command and apply it to the current state of the machine. If you really wanted accuracy, you would also emulate memory translations, caching, etc., but that is doable.
In fact, many microchip and CPU manufacturers test programs against an emulator of the chip and then against the chip itself, which helps them find out if there are issues in the specifications of the chip, or in the actual implementation of the chip in hardware. For example, it is possible to write a chip specification that would result in deadlocks, and when a deadlock occurs in the hardware it's important to see if it can be reproduced in the specification, since that indicates a greater problem than something in the chip implementation.
Of course, emulators for video games usually care about performance so they don't use naive implementations, and they also include code that interfaces with the host system's OS, for example to use drawing and sound.
Considering the very slow performance of old video games (NES/SNES, etc.), emulation is quite easy on modern systems. In fact, it's even more amazing that you could just download a set of every SNES game ever or any Atari 2600 game ever, considering that when these systems were popular having free access to every cartridge would have been a dream come true.
I know that this question is a bit old, but I would like to add something to the discussion. Most of the answers here center around emulators interpreting the machine instructions of the systems they emulate.
However, there is a very well-known exception to this called "UltraHLE" (Wikipedia article). UltraHLE, one of the most famous emulators ever created, emulated commercial Nintendo 64 games (with decent performance on home computers) at a time when it was widely considered impossible to do so. As a matter of fact, Nintendo was still producing new titles for the Nintendo 64 when UltraHLE was created!
For the first time, I saw articles about emulators in print magazines where before, I had only seen them discussed on the web.
The concept of UltraHLE was to make possible the impossible by emulating C library calls instead of machine-level calls.
Something worth taking a look at is Imran Nazar's attempt at writing a Game Boy emulator in JavaScript.
Having created my own emulator of the BBC Microcomputer of the 80s (type VBeeb into Google), there are a number of things to know.
You're not emulating the real thing as such; that would be a replica. Instead, you're emulating state. A good example is a calculator: the real thing has buttons, a screen, a case, etc. But to emulate a calculator you only need to emulate whether buttons are up or down, which segments of the LCD are on, and so on. Basically, a set of numbers representing all the possible combinations of things that can change in a calculator.
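In C, that calculator state might be nothing more than a struct like this (the field layout is invented for illustration):

    #include <stdint.h>
    #include <stdbool.h>

    /* No buttons or plastic case; just the numbers that can change. */
    typedef struct {
        bool key_down[16];        /* which buttons are currently pressed */
        uint8_t lcd_segments[10]; /* per-digit segment bitmasks (a..g) */
        double accumulator;       /* the value being built up */
        char pending_op;          /* '+', '-', ... or 0 if none */
    } calc_state;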
You only need the interface of the emulator to appear and behave like the real thing. The more convincing this is, the closer the emulation is. What goes on behind the scenes can be anything you like. But, for ease of writing an emulator, there is a mental mapping that happens between the real system, i.e. chips, displays, keyboards, circuit boards, and the abstract computer code.
To emulate a computer system, it's easiest to break it up into smaller chunks and emulate those chunks individually. Then string the whole lot together for the finished product. Much like a set of black boxes with inputs and outputs, which lends itself beautifully to object oriented programming. You can further subdivide these chunks to make life easier.
Practically speaking, you're generally looking to write for speed and fidelity of emulation. This is because software on the target system will (may) run more slowly than the original hardware on the source system. That may constrain the choice of programming language, compilers, target system etc.
Further to that, you have to circumscribe what you're prepared to emulate. For example, it's not necessary to emulate the voltage state of transistors in a microprocessor, but it's probably necessary to emulate the state of the microprocessor's register set.
Generally speaking, the finer the level of detail of the emulation, the more fidelity you'll get to the original system.
Finally, information for older systems may be incomplete or non-existent. So getting hold of original equipment is essential, or at least prising apart another good emulator that someone else has written!
Yes, you have to interpret the whole binary machine code mess "by hand". Not only that, most of the time you also have to simulate some exotic hardware that doesn't have an equivalent on the target machine.
The simple approach is to interpret the instructions one-by-one. That works well, but it's slow. A faster approach is recompilation - translating the source machine code to target machine code. This is more complicated, as most instructions will not map one-to-one. Instead you will have to make elaborate work-arounds that involve additional code. But in the end it's much faster. Most modern emulators do this.
When you develop an emulator you are interpreting the processor assembly that the system is working on (Z80, 8080, PS CPU, etc.).
You also need to emulate all peripherals that the system has (video output, controller).
You should start by writing emulators for simple systems, like the good old Game Boy (which uses a Z80-like processor, if I'm not mistaken) or the C64.
Emulators are very hard to create, since there are many hacks (as in unusual effects), timing issues, etc. that you need to simulate.
For an example of this, see http://queue.acm.org/detail.cfm?id=1755886.
That will also show you why you ‘need’ a multi-GHz CPU for emulating a 1MHz one.
Also check out Darek Mihocka's Emulators.com for great advice on instruction-level optimization for JITs, and many other goodies on building efficient emulators.
I've never done anything so fancy as to emulate a game console, but I did take a course once where the assignment was to write an emulator for the machine described in Andrew Tanenbaum's Structured Computer Organization. That was fun and gave me a lot of aha moments. You might want to pick that book up before diving into writing a real emulator.
Advice on emulating a real system or your own thing?
I can say that emulators work by emulating the ENTIRE hardware. Maybe not down to the circuit (moving bits around like the hardware would; moving the byte is the end result, so copying the byte is fine). Emulators are very hard to create, since there are many hacks (as in unusual effects), timing issues, etc. that you need to simulate. If one (input) piece is wrong, the entire system can go down or, at best, have a bug/glitch.
The Shared Source Device Emulator contains buildable source code to a PocketPC/Smartphone emulator (Requires Visual Studio, runs on Windows). I worked on V1 and V2 of the binary release.
It tackles many emulation issues:
- efficient address translation from guest virtual to guest physical to host virtual
- JIT compilation of guest code
- simulation of peripheral devices such as network adapters, touchscreen and audio
- UI integration, for host keyboard and mouse
- save/restore of state, for simulation of resume from low-power mode
To add to the answer provided by @Cody Brocious:
In the context of virtualization, where you are emulating a new system (CPU, I/O, etc.) for a virtual machine, we can see the following categories of emulators.
Interpretation: Bochs is an example of an interpreter. It is an x86 PC emulator; it takes each instruction from the guest system and translates it into another set of instructions (of the host ISA) to produce the intended effect. Yes, it is very slow; it doesn't cache anything, so every instruction goes through the same cycle.
Dynamic emulator: QEMU is a dynamic emulator. It does on-the-fly translation of guest instructions and also caches the results. The best part is that it executes as many instructions as possible directly on the host system, so that emulation is faster. Also, as mentioned by Cody, it divides the code into blocks (a single flow of execution).
Static emulator: As far as I know, there are no static emulators that are helpful in virtualization.
How I would start emulation:
1. Get books on low-level programming; you'll need it for the "pretend" operating system of the Nintendo... Game Boy...
2. Get books on emulation specifically, and maybe OS development (you won't be making an OS, but it's the closest thing to it).
3. Look at some open-source emulators, especially ones for the system you want to make an emulator for.
4. Copy snippets of the more complex code into your IDE/compiler. This will save you writing out long code. This is what I do for OS development: use a distro of Linux.
I wrote an article about emulating the Chip-8 system in JavaScript.
It's a great place to start, as the system isn't very complicated, but you still learn how opcodes, the stack, registers, etc. work.
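The article's code is JavaScript; here's the same fetch/decode idea as a C sketch, showing two real Chip-8 opcodes (the rest of the instruction set is elided):

    #include <stdint.h>

    /* Chip-8 opcodes are 16 bits, big-endian; the high nibble selects
       the instruction family. */
    typedef struct {
        uint8_t  v[16];        /* registers V0..VF */
        uint16_t i, pc;
        uint16_t stack[16];
        uint8_t  sp;
        uint8_t  mem[4096];
    } chip8_t;

    void chip8_step(chip8_t *c) {
        uint16_t op = (c->mem[c->pc] << 8) | c->mem[c->pc + 1];
        c->pc += 2;
        switch (op & 0xF000) {
        case 0x6000:                          /* 6XNN: VX = NN */
            c->v[(op >> 8) & 0xF] = op & 0xFF;
            break;
        case 0x1000:                          /* 1NNN: jump to NNN */
            c->pc = op & 0x0FFF;
            break;
        /* ...remaining opcode families elided... */
        }
    }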
I will be writing a longer guide soon for the NES.

Resources