Can a programming language do anything outside of what the OS provides? - programming-languages

I am trying to gain a more holistic, general, and high-level understanding of programs and programming languages.
I would like to understand how they actually function. I understand that at the lowest level is machine code, which is 0s and 1s. Then you have assembly. Then you have a higher-level language where every instruction/function/method/call/routine (whatever you want to call it) maps to some instruction or group of instructions in assembly, right? The higher-level language cannot provide or do anything outside of what the lower-level language, assembly, provides, correct?
Similarly, since all code runs on an OS, that code can only do things that the OS provides. It is impossible for the code to do anything outside of what the OS actually provides, correct?

The computer has an instruction set, machine code, which defines what can be done on the computer. Assembly code is essentially a more convenient representation of that, so assembly code can do anything the machine can do. A higher-level programming language has to run on the machine, so it can't do anything the machine can't do, though it may well be able to express it more conveniently (e.g. print "foo" rather than several dozen machine-code instructions). It is a matter of implementation choice whether the compiler for that programming language generates machine code directly, or assembly code, or any other form that might need further processing.
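To make that concrete, here is a minimal C sketch (assuming a POSIX system) of the same idea at two levels of convenience: the high-level printf call is ultimately implemented in terms of something like the raw write() system call, which itself compiles down to a handful of machine instructions plus a trap into the OS.

    /* A minimal sketch, POSIX assumed: "print foo" at two levels. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        printf("foo\n");                   /* high level: formatting, buffering */
        write(STDOUT_FILENO, "foo\n", 4);  /* thin wrapper over the OS syscall */
        return 0;
    }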
This brings us to the question of whether it's possible for a program (regardless of what it's written in) to do something the OS does not explicitly provide for. I find this an odd way to express it, since the point of writing a program is to give you some capability that you didn't have before, so in some sense you only write programs for things the OS does not explicitly provide you with. The problem lies in defining what the OS "provides". If it's a general-purpose OS, then its designers likely intend to "provide" the ability for you to write a wide range of programs. The OS may choose to provide some convenient feature (say, the ability to create files) but if it did not provide such a feature, you could perhaps do it yourself, given suitable motivation (and, for the file creation example, the ability to do disk I/O - possibly requiring you to write a disk driver too).

I am trying to gain a more holistic, general, and high-level understanding of programs and programming languages. I would like to understand how they actually function.
I recommend working through how modern hardware is used to get good performance and energy efficiency, with the following as an example:
Subword parallelism to improve matrix multiply by a factor of 4.
Doubling the performance by unrolling the loop, demonstrating the value of instruction-level parallelism.
Doubling performance again by optimizing for caches using blocking (a sketch follows below).
Finally, a speedup of 14 from 16 processors by using thread-level parallelism.
All four optimizations in total add just 24 lines of C code to a matrix multiply example you can find in Computer Organization and Design: The Hardware/Software Interface (5th ed.) or similar books.
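To give a flavor of the cache-blocking step, here is a minimal C sketch (not the book's actual code; the block size and row-major layout are illustrative assumptions):

    #include <stddef.h>

    #define BLOCK 32  /* chosen so a few BLOCK x BLOCK tiles fit in cache */

    /* C = C + A * B for n x n row-major matrices. Working on tiles keeps
       the data touched by the inner loops resident in cache. */
    void dgemm_blocked(size_t n, const double *A, const double *B, double *C) {
        for (size_t ii = 0; ii < n; ii += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                for (size_t kk = 0; kk < n; kk += BLOCK)
                    for (size_t i = ii; i < ii + BLOCK && i < n; ++i)
                        for (size_t j = jj; j < jj + BLOCK && j < n; ++j) {
                            double sum = C[i*n + j];
                            for (size_t k = kk; k < kk + BLOCK && k < n; ++k)
                                sum += A[i*n + k] * B[k*n + j];
                            C[i*n + j] = sum;
                        }
    }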
To make the point: it really is worth it to dig deeper than just "learning Python", even if that is a good start. Understanding the low level affects high-level programming in many ways, and that's what you are after, according to your question.
There is actually more than just assembly down there - there are also hardware description languages such as VHDL, which is covered in this topic, for example:
https://electronics.stackexchange.com/questions/132611/whats-the-motivation-in-using-verilog-or-vhdl-over-c

Related

How the hardware platform impacts upon the choice for the programming language?

Long story short: the teacher who taught me throughout the last year has only recently left and has been replaced with a new one. This new teacher has given me an assignment that involves things (like this) that we were never previously taught. So this task has shown up on the assignment and I have no idea how to do it. I can't get hold of the teacher because he's poorly and not coming in for the next few days. And even when I do ask him to explain further, he gets into a right mood and makes me feel completely stupid.
Describe how the hardware platform impacts upon the choice for the programming language
Looking at my activity here on SO, you can tell that I'm into programming, I'm into developing things, and I'm into learning, so I'm not just trying to get one of you guys to do my homework for me.
Could someone here please explain how I would answer a question like this?
Some considerations below, but not a full answer by any means.
If your hardware platform is a small embedded device of some kind, then your choice of programming language is going to be directed towards the lower-level unmanaged languages - you probably won't be able to (or want to) load a managed language runtime like the Java JVM or .NET CLR. This is down to memory and storage requirements. Similarly, interpreted languages are out of the question, as you won't have space for the interpreter.
If you're on a larger machine, it's more a question of compatibility. A managed language must run on a platform where its runtime is supported. In the case of .NET, that's Windows, or other platforms if you replace the Microsoft CLR with the Mono runtime. In the case of Java, that's a far wider range of platforms.
This is by no means a definitive answer, but my first thought would be embedded systems. A task I perform on an embedded system, or another low-powered, battery-operated computer, would need to be handled completely differently from one performed on a computer with access to mains electricity.
One simple impact would be battery life. If I use wasteful algorithms on an embedded system, the battery life will suffer.
Hope that helps stir the brain juices!
Clearly, the speed and amount of memory of the device will impact the choice. The more primitive and weak the platform is, the harder it is to run code written in very high-level languages. Such code may just not work at all (e.g. when there isn't enough memory), or be too slow, or require serious optimization (i.e. incur more work), perhaps negatively affecting the feature set or quality.
Also, some languages and software may rely heavily on or benefit from the availability of page translation in the CPU. If the CPU doesn't have it, certain checks will have to be done in software instead of being done automatically in hardware, and that will affect the performance or the language/software choice.

Haskell for mission-critical systems [duplicate]

I've been curious to understand whether it is possible to apply the power of Haskell to the embedded real-time world, and while googling found the Atom package. I'd assume that in complex cases the code might have all the classical C bugs - crashes, memory corruption, etc. - which would then need to be traced back to the original Haskell code that caused them. So, this is the first part of the question: "If you have experience with Atom, how did you deal with the task of debugging low-level bugs in the compiled C code and fixing them in the original Haskell code?"
I searched for some more examples of Atom; this blog post mentions resulting C code of 22 KLOC (and obviously no code :), and the included example is a toy. This and this reference have a bit more practical code, but that is where it ends. The reason I put "sizable" in the subject is that I'm most interested in experiences of working with generated C code in the range of 300 KLOC+.
As I am a Haskell newbie, there may obviously be other ways that I did not find due to my unknown unknowns, so any other pointers for self-education in this area would be greatly appreciated - and this is the second part of the question: "What are some other practical methods (if any) of doing real-time development in Haskell?" If multicore is also in the picture, that's an extra plus :-)
(About the usage of Haskell itself for this purpose: from what I read in this blog post, garbage collection and laziness in Haskell make it rather nondeterministic scheduling-wise, but maybe in two years something has changed. The Real World Haskell programming question on SO was the closest I could find to this topic.)
Note: "real-time" above would be closer to "hard real-time" - I'm curious whether it is possible to ensure that the pause time when the main task is not executing is under 0.5 ms.
At Galois we use Haskell for two things:
Soft real-time (OS device layers, networking), where 1-5 ms response times are plausible. GHC generates fast code and has plenty of support for tuning the garbage collector and scheduler to get the right timings.
For true real-time systems, EDSLs are used to generate code in other languages that provide stronger timing guarantees, e.g. Cryptol, Atom and Copilot.
So be careful to distinguish the EDSL (Copilot or Atom) from the host language (Haskell).
Some examples of critical systems - and in some cases, real-time systems - either written in or generated from Haskell, produced by Galois:
EDSLs
Copilot: A Hard Real-Time Runtime Monitor -- a DSL for real-time avionics monitoring
Equivalence and Safety Checking in Cryptol -- a DSL for cryptographic components of critical systems
Systems
HaLVM -- a lightweight microkernel for embedded and mobile applications
TSE -- a cross-domain (security level) network appliance
It will be a long time before there is a Haskell system that fits in small memory and can guarantee sub-millisecond pause times. The community of Haskell implementors just doesn't seem to be interested in this kind of target.
There is healthy interest in using Haskell or something Haskell-like to compile down to something very efficient; for example, Bluespec compiles to hardware.
I don't think it will meet your needs, but if you're interested in functional programming and embedded systems you should learn about Erlang.
Andrew,
Yes, it can be tricky to debug problems through the generated code back to the original source. One thing Atom provides is a means to probe internal expressions, and it then leaves it up to the user how to handle these probes. For vehicle testing, we build a transmitter (in Atom) and stream the probes out over a CAN bus. We can then capture this data, format it, and view it with tools like GTKWave, either in post-processing or in real time. For software simulation, probes are handled differently: instead of getting probe data from a CAN protocol, hooks are made into the C code to lift the probe values directly. The probe values are then used in the unit-testing framework (distributed with Atom) to determine whether a test passes or fails and to calculate simulation coverage.
I don't think Haskell, or other garbage-collected languages, are very well suited to hard-real-time systems, as GCs tend to amortize their runtime into short pauses.
Writing in Atom is not exactly programming in Haskell, as Haskell here can be seen as purely a preprocessor for the actual program you are writing.
I think Haskell is an awesome preprocessor, and using DSELs like Atom is probably a great way to create sizable hard-real-time systems, but I don't know if Atom fits the bill or not. If it doesn't, I'm pretty sure it is possible (and I encourage anyone who does!) to implement a DSEL that does.
Having a very strong pre-processor like Haskell for a low-level language opens up a huge window of opportunity to implement abstractions through code-generation that are much more clumsy when implemented as C code text generators.
I've been fooling around with Atom. It is pretty cool, but I think it is best for small systems. Yes, it runs in trucks and buses and implements real-world, critical applications, but that doesn't mean those applications are necessarily large or complex. It really is for hard-real-time apps and goes to great lengths to make every operation take the exact same amount of time. For example, instead of an if/else statement that conditionally executes one of two code branches that might differ in running time, it has a "mux" statement that always executes both branches before conditionally selecting one of the two computed values (so the total execution time is the same whichever value is selected).
It doesn't have any significant type system other than built-in types (comparable to C's) that are enforced through GADT values passed through the Atom monad. The author is working on a static verification tool that analyzes the output C code, which is pretty cool (it uses an SMT solver), but I think Atom would benefit from more source-level features and checks.
Even in my toy-sized app (an LED flashlight controller), I've made a number of newbie errors that someone more experienced with the package might avoid, but that resulted in buggy output code that I'd rather have had caught by the compiler instead of through testing. On the other hand, it's still at version 0.1.something, so improvements are undoubtedly coming.
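To illustrate the shape of the mux idea in C (an illustrative sketch, not Atom's actual generated code): both values are computed unconditionally and the result is selected without a branch, so the running time does not depend on the condition.

    #include <stdint.h>

    /* Branch-free select: behaves like (cond ? a : b) but always evaluates
       both values, so timing is independent of cond. */
    static uint32_t mux32(int cond, uint32_t a, uint32_t b) {
        uint32_t mask = (uint32_t)-(cond != 0); /* all ones if cond, else 0 */
        return (a & mask) | (b & ~mask);
    }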

Low, mid, high level language, what's the difference?

I've heard these terms thrown around describing languages before, like C is not quite a low-level language, C++ is a midlevel, and Python is a high-level language.
I understand that it has to do something with the way the code is compiled, and how it is written. But what defines a language into one of those three categories? Are these absolute categories, or just a general idea programmers use to describe languages to each other?
Yes, they're just general terms. It's to do with abstraction, and how close you are to what the computer's actually doing.
Here's a list of programming languages ranging from very low to very high level:
Machine Code could probably be considered the lowest level programming language.
Assembly language is at the level of telling the processor what to do. There is still a conversion step towards machine code.
C is a step up from assembler, because you get to specify what you want to do in slightly more abstract terms, but you're still fairly close to the metal.
C++ does everything that C can do but adds the capability to abstract things away into classes.
Java/C# do similar things to C++ in a way, but without the ability to do everything you can do in C (like pointer manipulation, in Java's case [thanks Joe!]). They have garbage collection, though, whereas in C++ you have to manage memory manually.
Python/Ruby are even higher level, and let you forget about a lot of the details that you would need to specify in something like Java or C#.
SQL is even higher level (it's declarative). Just say "Give me all the items in the table sorted by age" and it will work out the most efficient way to carry this out for you.
low level = long development time + very fast executable file
high level = shorter development time + slower executable file
mid level is between the two
Very low-level: Machine Code
Low level: Assembler, Forth
Mid level: C, C++, most system programming languages
Mid/High level: D, Go, garbage collected system programming languages
High level: Java, C#, most interpreted languages
Even Higher level: Lisp dialects
Highest level: SQL, declarative programming languages
If there is anything else to be added, tell me.
They aren't absolute. They are all relative to what other languages are being used in industry at the time. For example, there was a time when assembly was considered mid-level.
The 'level' is essentially a measure of how abstracted the programmer is from the actual hardware-based operations. In a low level language you might have to care about actual memory locations, whereas in a high-level you just create variables and let the OS handle memory.
A typical CPU processes either 32- or 64-bit instructions. In the simplest form, think of this as 32 1s and 0s in a row - that's what the processor actually interprets and executes. Writing this directly (machine code) would be the lowest level.
Low level means closer to the machine, and therefore more difficult and more powerful. The higher level you get, the more removed from the machine and "English-like" you get, but you lose a lot of the power and functionality that comes with being able to control the minute details of the machine. Higher level languages also generally tend to protect you more and have much more precautions and checks in place, while lower level languages trust you, so to speak, and let you play around at your own risk.
The term mid-level language is one I've never heard.
"Low" and "High" refer to how "close" to the machine you are in your programming. The lowest level would be machine (binary) code. Next (and still considered low) is assembler. The higher level languages involve more symbolism and constructs that are supposed to be closer to how humans normally think. C (and somewhat C++) has a reputation as being somewhat a hybrid low/high level because it has many constructs that are in high level languages, but also has instructions (e.g. shifts) that are low level languages but often not in higher level languages.
From low to high, you can categorize the languages as follows.
Machine Code --> Assembly Language --> Compiled Language --> Interpreted Language
Remember that these aren't absolute black and white definitions, but rather shades of gray. This is more of a guideline than a rule.
Think of machine code as a long string of 1s and 0s understood by the native platform. Consider this your baseline... the lowest "level" you can have.
Assembly language could be considered a symbolic representation of this. I believe there is a 1 to 1 mapping between assembly code instructions and machine code instructions. This is your low level language.
Java and C++, for example, are both compiled languages, but many would consider C++ to be a lower level language than Java because it exposes low level system access, while Java runs in a protected environment (the virtual machine). Remember that a compiled language is compiled (converted, if you will) to machine code before execution. C is also a compiled language, but would be considered lower level than both Java and C++.
For our sake, we will say that C and C++ are low level languages because they offer (relatively) little abstraction from the hardware and direct memory management. In actuality, they fall somewhere between low and mid, as you will see soon enough.
We will call Java and C# (.NET) mid-level languages because they have automatic memory management (garbage collection) and plenty of high-level abstractions (i.e. objects... yet C++ supports objects too. Do you see why the scale is considered to be loosely defined?)
With an interpreted language, the interpreter resides in memory and reads the source code directly. These are high level languages. Python, Perl, Javascript, and PHP are all examples of high level languages.
C is a mid-level language, because we can drop down to assembly-language code from it.
The slight difference is that pointers make it powerful, giving low-level access to memory, while its portability pulls it up the scale, so we can say it is a mid-level language.
It is all relative... The "level" reflect the amount of abstraction.
Once you add a spectrum of levels of a programming language you add nuance to the definition.
Clearly machine code and assembly are machine-dependent. C and C++ are in theory machine-independent, but in truth that is not universal. In C, things like alignment need to be taken into account, and you can always manage the stack in C (and in the C subset of C++) via a pointer and a single initialised variable - if you are crazy enough - so that (on x86) the RSP register (the stack pointer) is never used. So C, yes, it is mid-level. Everything else is high-level, and some super-high-level.
Low-level languages are very close to machine language, which may be binary or RTL. They are hard to write and very quick to execute, and they can interact with the hardware directly. A high-level programming language is very easy to write, but it can only be executed after compilation.

Trivial mathematical problems as language benchmarks

Why do people insist on using trivial mathematical problems like finding numbers in the Fibonacci sequence for language benchmarks? Don't these usually get optimized to relativistic speeds? Isn't the brunt of the bottlenecks usually in I/O, system API calls, operations on strings and structures, processing large quantities of data, abstract object-oriented stuff, etc?
It is a throwback to the old days, when compiler technology for what we would now call basic math was still evolving rapidly.
Now, compiler evolution is more focused on exploiting new instructions for niche operations, 64-bit math, and so on.
Micro-benchmarks such as the ones you mention were useful, though, when evaluating the efficiency of the HotSpot compiler when Java was first launched, and in evaluating the efficiency of .NET versus C/C++.
Your suggestion that I/O and system calls are the likely bottlenecks is correct, at least for some space of problems. But I notice you suggested string operations. One person's irrelevant micro-benchmark is another person's critical performance metric.
EDIT: P.S. I also remember using LINPACK and other micro-benchmarks to compare versions of the JVM, and to compare vendors of the JVM. From v4 to v5 there was a big jump in performance; I guess the JIT compiler got more effective. Also, IBM's JVM was ahead of Sun's at that time on Windows x86.
Because if you want to benchmark the language/compiler, these "math problems" are good indicators of the "bare speed" of the generated code. Either they use the iterative solution, which is a tight loop and indicates how well the compiler can feed instructions to the processor, or they use the recursive solution, which indicates how it handles recursive calls of short functions (inlining, tail recursion, etc.), although the Ackermann function is usually used for that too.
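For example, such a micro-benchmark might look like this in C (a minimal sketch; the input size and timing method are arbitrary choices):

    #include <stdio.h>
    #include <time.h>

    /* Recursive version: stresses call overhead, inlining, etc. */
    static long fib_rec(int n) {
        return n < 2 ? n : fib_rec(n - 1) + fib_rec(n - 2);
    }

    /* Iterative version: a tight loop measuring raw generated-code speed. */
    static long fib_iter(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; ++i) { long t = a + b; a = b; b = t; }
        return a;
    }

    int main(void) {
        clock_t t0 = clock();
        long r = fib_rec(30);
        clock_t t1 = clock();
        long s = fib_iter(30);
        clock_t t2 = clock();
        printf("recursive: %ld in %.3fs, iterative: %ld in %.3fs\n",
               r, (double)(t1 - t0) / CLOCKS_PER_SEC,
               s, (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }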
Usually, the benchmark suite for a language contains tests benchmarking other parts as well - e.g. gzip compression, text searching, object creation, virtual function calls, and exception throw/catch.
The other things you've mentioned, syscalls and I/O, are usually not included because:
syscalls are in fact not that slow - applications don't spend a significant portion of their time in the kernel, except in tests specifically targeted at them or when something is seriously wrong with the program
syscall and I/O performance does not depend on the language, but rather on the OS and hardware
I'd think a simple, well-established algorithm removes the possibility that the benchmark is biased (whether through ignorance or malice) to favor one language. It is very difficult to write a complex program in two different languages in exactly the same way. Testing something like the efficiency of a multithreaded application in C# vs. Java, for example, would require developers skilled in multithreaded development in both languages, and there would still be questions as to whether the benchmark app properly represents the general case, or whether it misrepresents a special case that only one language handles well.
Back when the Sieve of Eratosthenes was a popular benchmark for C compilers, I thought it would be funny if one of the compiler authors recognized the sieve code and replaced it with a pre-computed lookup.
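For reference, that classic sieve benchmark is only a few lines of C (a minimal sketch; the limit is arbitrary):

    #include <stdio.h>
    #include <string.h>

    #define LIMIT 10000

    int main(void) {
        static char composite[LIMIT + 1];
        memset(composite, 0, sizeof composite);
        int count = 0;
        for (int i = 2; i <= LIMIT; ++i) {
            if (!composite[i]) {
                ++count;                          /* i is prime */
                for (int j = 2 * i; j <= LIMIT; j += i)
                    composite[j] = 1;             /* cross off multiples */
            }
        }
        printf("%d primes up to %d\n", count, LIMIT);
        return 0;
    }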

Machine dependent languages

Why might a machine-dependent language be more appropriate for writing certain types of programs? What types of programs would be appropriate?
Why might a machine-dependent language be more appropriate for writing certain types of programs?
Speed
Some machines have special instruction sets (like MMX or SSE on x86, for example) that allow you to 'exploit' the architecture in ways that compilers may or may not utilize best (or may not utilize at all). If speed is critical (as in video games or data-crunching programs), then you'd want to get the best out of the architecture you're on; see the sketch below.
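For instance (a minimal sketch using x86 SSE intrinsics; a compiler and CPU with SSE support are assumed), four float additions happen in a single instruction:

    #include <xmmintrin.h>  /* SSE intrinsics: x86-specific */
    #include <stdio.h>

    int main(void) {
        __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
        __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
        __m128 c = _mm_add_ps(a, b);   /* four additions at once */
        float out[4];
        _mm_storeu_ps(out, c);
        printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }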
Where Portability is Useless
When coding a program for a specific device (take the iPhone or the Nintendo DS as examples), portability is the least of your concerns. This code will most likely never go to another platform as it's specifically designed for that architecture/hardware combination.
Developer Ignorance and/or Market Demand
Computer video games are a prime example - Windows is the dominant computer-game OS, so why target others? It lets the developers focus on known variables for speed/size/ease of use. Some developers are ignorant - they learn to code on only one platform (such as .NET) and 'forget' that other platforms exist because they don't know about them. They seem to take an approach similar to "It works on my machine, why should I bother porting it to a bizarre combination that I will never use?"
No other choice.
I will take the iPhone again, as it is a very good example. While you can program for it in C or C++, you cannot access any of the UI widgets, which are linked against the Objective-C runtime. You have no choice but to code in Objective-C if you want to access any of those widgets.
What types of programs would be appropriate?
Embedded systems
All of the above apply - When you're coding for an embedded system, you want to take advantage of the full potential of the hardware you're working on. Be it memory management (Such as the CP15 on ARM9) or even obscure hardware that is only attached to the target device (servo motors, special sensors etc).
The best example I can think of is for small embedded devices. When you have to have full control over every detail of optimization due to extremely limited computing power (only a few kilobytes of RAM, for example), you might want to drop down to the assembler level yourself to make everything work perfectly in those small confines.
On the other hand, compilers have gotten sophisticated enough these days that you really don't need to drop below C in most situations, including embedded devices and microcontrollers. The situations where this is necessary are pretty rare.
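Even then, you often don't have to leave C entirely - many compilers let you embed assembly inline (a minimal sketch using GCC-style extended inline asm on x86; the syntax is compiler- and architecture-specific):

    #include <stdio.h>

    int main(void) {
        int a = 6, b = 7, sum;
        /* GCC extended asm: add b into a copy of a, result in sum */
        __asm__("addl %2, %0" : "=r"(sum) : "0"(a), "r"(b));
        printf("%d\n", sum);  /* prints 13 */
        return 0;
    }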
Consider virtually any graphics engine. Since your run-of-the-mill general-purpose CPU cannot perform operations on many pixels at once, you would need a bare minimum of one cycle per pixel to be modified.
However, since modern GPUs can operate on many pixels (or other piece of data) all at the same time, the same operation can be finished much more quickly. GPUs are very well-suited for embarrassingly parallel problems.
Granted, we have high-level-language APIs to control our video cards nowadays, but as you get "closer to the metal", the raw language used to control a GPU is a different animal from the language to control a general purpose CPU, due to the vast difference in architectures.
