According to the Wikipedia article on memory segmentation, x86 processors do segmentation bounds-checking in hardware. Are there any systems that do the bounds-checking in software? If so, what kind of overhead does that incur? In the hardware implementations, is there any way to skip the bounds checking to avoid the penalty (if there is a penalty)?
Most modern languages do array bounds checking in software, on top of the segment limit checks and page-table lookups done in hardware. One benchmark suggests the overhead is about 0.5%, which is a small price to pay for stability and security.
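For a concrete picture of what a software bounds check costs, here is a minimal C sketch (not taken from any particular runtime) of the compare-and-branch that a bounds-checked language inserts before an array access; the branch is almost always correctly predicted, which is why the measured overhead tends to be small:

    #include <stdio.h>
    #include <stdlib.h>

    /* checked_get is a hypothetical helper: one compare and a predictable
       branch before the actual load, mirroring what a compiler or JIT for a
       bounds-checked language emits. */
    static int checked_get(const int *arr, size_t len, size_t i) {
        if (i >= len) {                 /* the bounds check itself */
            fprintf(stderr, "index %zu out of range (len %zu)\n", i, len);
            abort();
        }
        return arr[i];
    }

    int main(void) {
        int data[4] = {10, 20, 30, 40};
        printf("%d\n", checked_get(data, 4, 2));   /* prints 30 */
        /* checked_get(data, 4, 7); */             /* would abort */
        return 0;
    }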
A 486 could already load a memory location in a single cycle, and CPUs have only gotten better at this kind of on-chip work, so it's unlikely that hardware segmentation bounds checking adds any measurable overhead.
Still, if you want to avoid it entirely, you can simply run in 64-bit mode: "The processor does not perform segment limit checks at runtime in 64-bit mode" (Intel Software Developer's Manual, section 3.2.4).
I am working on the on-board computer for a CubeSat. Our computer will be exposed to radiation, so single-event upsets (e.g. bit flips) are likely to occur. Would a lighter, smaller OS like FreeRTOS bring more stability, robustness and a lower probability of failure than a full-blown Linux operating system?
The probability of a bit error in RAM is a function of time, memory size and radiation density, so a larger memory has a greater probability of being hit, and you can fit a FreeRTOS system in much less memory (say 10 KB instead of 4 MB). However the usage rate of the smaller memory is likely much higher - i.e. in a FreeRTOS application most of the code and data are accessed relatively frequently, while in a Linux deployment much of it is redundant and, if corrupted, may never be accessed at all.
However the question makes little sense for a number of reasons, such as:
The effect of a bit-flip event is entirely non-deterministic; any single event may be benign or catastrophic. It is impossible to say that a system can tolerate one error when you don't know when or where the error will occur.
If your system can be implemented on FreeRTOS, why would you even consider Linux? They are chalk and cheese. If you need the extensive networking, filesystem, memory management, POSIX API and device support etc. provided by Linux, FreeRTOS is not suited to your application in any case, as you would have to add all that yourself from your own or additional third-party code. FreeRTOS is only a scheduling kernel, with threading, synchronisation and IPC support and little else. Conversely if you need hard real-time deterministic behaviour, Linux is unsuited to your application.
Where you might benefit from using an RTOS kernel like FreeRTOS is that it will execute from ROM which may be less prone to the bit-flipping cosmic ray issue - (although the availability of ECC/radiation hardened Flash memory may indicate otherwise). You still need RAM for R/W data, but at least the code itself will be robust. A typical FreeRTOS system might run in SRAM (possibly in on-chip RAM on a microcontroller) - I don't know whether low density SRAM is less prone to bit-flipping than high-density SDRAM, but I am willing to believe it is. It is also possible to source radiation hardened SRAM in any case.
The solution for a system using SDRAM in such an environment is to use ECC RAM, which may largely overcome the problem of radiation-induced data corruption and the resulting non-deterministic system behaviour. However I would not imagine that even that would be sufficient for space or high-altitude applications.
In short, the solution is not in the software; it has to be in the hardware, and the lengths you need to go to will depend on the radiation environment your system will be subjected to. However, selecting a small RTOS kernel widens the choice of hardware considerably, since it will run on a much wider range of architectures in much smaller memory, performs deterministically, responds to events in fewer cycles and is ROMable.
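For a sense of how little FreeRTOS gives you out of the box compared to a Linux userland, here is a minimal sketch, not a flight-ready program (toggle_led() is a placeholder for your own board support code, not a FreeRTOS API):

    #include "FreeRTOS.h"
    #include "task.h"

    /* One task that blinks an LED: the kernel provides tasks, queues,
       semaphores and software timers, and essentially nothing else - no
       filesystem, no networking, no POSIX layer unless you add them. */
    static void vBlinkTask(void *pvParameters)
    {
        (void)pvParameters;
        for (;;) {
            /* toggle_led(); */               /* your board support code */
            vTaskDelay(pdMS_TO_TICKS(500));   /* block this task for 500 ms */
        }
    }

    int main(void)
    {
        xTaskCreate(vBlinkTask, "blink", configMINIMAL_STACK_SIZE, NULL,
                    tskIDLE_PRIORITY + 1, NULL);
        vTaskStartScheduler();   /* never returns unless the kernel fails */
        for (;;) { }
    }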
I am looking into the Google V8 JavaScript Engine. It is said that they have had problems porting it to 64-bit systems.
What kinds of programming constructs or constraints can make a program 32-bit or 64-bit specific, apart from building and testing it on a 64-bit machine with 64-bit settings?
You may check this Wikipedia article, which says:
"The main disadvantage of 64-bit architectures is that, relative to 32-bit architectures, the same data occupies more space in memory (due to longer pointers and possibly other types, and alignment padding). This increases the memory requirements of a given process and can have implications for efficient processor cache utilization. Maintaining a partial 32-bit model is one way to handle this, and is in general reasonably effective. For example, the z/OS operating system takes this approach, requiring program code to reside in 31-bit address spaces (the high order bit is not used in address calculation on the underlying hardware platform) while data objects can optionally reside in 64-bit regions."
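As a concrete example of the kind of code that is 32-bit specific, here is a small C sketch (illustrative only) of the two classic traps: stuffing a pointer into an int, and hard-coding the size of long:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int x = 42;
        void *p = &x;

        /* Non-portable: assumes a pointer fits in an int. True on most
           32-bit targets, silently truncates on 64-bit ones. */
        /* int bad = (int)p; */

        /* Portable: uintptr_t is defined to be wide enough for a pointer. */
        uintptr_t addr = (uintptr_t)p;

        /* Another trap: sizeof(long) is 8 on LP64 (64-bit Linux/macOS) but 4
           on LLP64 (64-bit Windows) and on common 32-bit ABIs, so structs or
           file formats that hard-code these sizes change between builds. */
        printf("sizeof(long) = %zu, sizeof(void *) = %zu\n",
               sizeof(long), sizeof(void *));
        printf("pointer as integer: %" PRIxPTR "\n", addr);
        return 0;
    }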
Is it always the case that the 64-bit release build of a program is faster than the 32-bit build?
Execution happens on a 64-bit machine in both cases.
Thanks
It depends what the program is doing.
If it were a chess engine, for example, which used a bitboard representation of piece placement and movement, then I would expect it to be much faster than the 32-bit version.
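To illustrate why bitboards benefit: a bitboard packs the whole 8x8 board into one 64-bit word, so the basic operations are single 64-bit instructions on x86-64 but must be split into two 32-bit halves on a 32-bit build. A minimal sketch (assuming GCC/Clang builtins):

    #include <stdint.h>

    typedef uint64_t bitboard;   /* one bit per square of the 8x8 board */

    /* Counting occupied squares is one popcnt on x86-64; a 32-bit build has
       to count two halves and add them. */
    static int count_pieces(bitboard b) {
        return __builtin_popcountll(b);
    }

    /* Moving every piece one rank up is likewise a single 64-bit shift. */
    static bitboard shift_north(bitboard b) {
        return b << 8;
    }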
For other compute-intensive applications it should be faster as well, given the additional registers available on x86-64 processors.
However this is not true for all programs: for example, if the workload is memory intensive and that memory is read from the filesystem, the 64-bit version may be slower because it is dominated by I/O.
In most cases, yes, it is. A 64-bit program can use more CPU registers and a bigger address space, which can speed up memory allocations on the heap even if the program does not actually need more than 2 GB, since it is easier for the heap manager to deal with fragmentation.
But it may also be slower in some cases, mainly because a 64-bit binary is in general bigger and uses a bit more memory than its 32-bit counterpart. Every pointer takes 8 bytes instead of 4. That may make the difference between important data fitting into the CPU cache or not, and if that data is used in a tight computation loop, it can make a speed difference. However, "normal" programs hardly ever hit this problem.
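A quick way to see the pointer-size cost is to print the size of a pointer-heavy structure under both builds; a typical doubly linked list node roughly doubles in size (a sketch, exact numbers depend on the ABI):

    #include <stdio.h>

    /* Usually 12 bytes on a 32-bit build; 24 bytes on a 64-bit build because
       of the two 8-byte pointers plus alignment padding after the int. */
    struct node {
        int value;
        struct node *prev;
        struct node *next;
    };

    int main(void) {
        printf("sizeof(struct node) = %zu\n", sizeof(struct node));
        return 0;
    }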
x86-64 should be faster than x86-32.
x86-64 has more registers.
It can do 64-bit operations in a single instruction.
It uses an improved calling convention (similar to __fastcall): the first four parameters passed to a function go in registers.
Always use 64-bit if you can.
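The single-instruction point is easy to see with 64-bit arithmetic in C (a sketch; check the generated assembly with your own compiler):

    #include <stdint.h>

    /* On x86-64 this compiles to a single 64-bit add; a 32-bit build has to
       add the low halves and then the high halves with a carry. */
    uint64_t add64(uint64_t a, uint64_t b) {
        return a + b;
    }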
There is a relatively new Linux ABI referred to as x32, where the x86-64 processor runs in 32-bit mode, so pointers are still only 32-bits, but the 64-bit architecture specific registers are still used. So you're still limited to 4GB max memory use as in normal 32-bit, but your pointers use up less cache space than they do in 64-bit, you can do 64-bit arithmetic efficiently, and you get access to more registers (16) than you would in vanilla 32-bit (8).
Assuming you have a workload that fits nicely within 4GB, is there any way the performance of x32 could be worse than on x86-64?
It seems to me that if you don't need the extra memory space nothing is lost -- you should always get the same perf (when you already fit in cache) or better (when the pointer space savings lets you fit more in cache). But it wouldn't surprise me if there are paging/TLB/etc. details that I don't know about.
Certainly if you have a multithreaded program, the fact that data structures are smaller on x32 might cause cache line fighting between threads -- different objects might get allocated on the same cache line in x32 mode and different cache lines in x86_64 mode. If two threads modify those objects independently the cache ping-ponging could severely slow down the x32 code. Of course, this kind of cache effect could happen regardless of pointer size, but if the code has been tuned assuming 64-bit pointers, going to 32-bit pointers could de-tune things.
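To make the cache-line effect concrete, here is a minimal sketch (hypothetical counters, pthreads assumed) of two threads writing to adjacent fields that land on the same cache line; uncommenting the padding gives each counter its own line and makes the ping-ponging go away:

    #include <pthread.h>
    #include <stdio.h>

    struct counters {
        volatile long a;
        /* char pad[56]; */   /* uncomment to separate the cache lines */
        volatile long b;
    };

    static struct counters c;

    /* Each thread touches only its own counter, yet every write invalidates
       the other core's copy of the shared cache line. */
    static void *bump_a(void *arg) {
        (void)arg;
        for (long i = 0; i < 100000000L; i++) c.a++;
        return NULL;
    }

    static void *bump_b(void *arg) {
        (void)arg;
        for (long i = 0; i < 100000000L; i++) c.b++;
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump_a, NULL);
        pthread_create(&t2, NULL, bump_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%ld %ld\n", c.a, c.b);
        return 0;
    }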
In X32 the processor is actually executing in "long mode", the same mode as for x86_64. That is, addresses as seen by the processor when doing addressing are still 64 bits; however, the X32 ABI makes sure that all addresses are small enough to fit into 32 bits. As a result, in some cases there is a slight overhead when pointers have to be zero-extended from 32 bits to 64.
Also, needing x86/x86-64/x32 libraries in RAM, which I suppose is what one will end up with in practice (unless you're talking about some embedded or other tightly controlled system rather than a general purpose computer), may eat up some of the benefit of X32.
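If you want to see the three ABIs side by side, the same trivial program compiled with GCC's -m32, -mx32 and -m64 flags (x32 support in the toolchain and kernel assumed) reports the type sizes each one gives you:

    #include <stdio.h>

    /* gcc -m32  sizes.c   -> pointers 4 bytes, long 4 bytes, 8 GPRs
     * gcc -mx32 sizes.c   -> pointers 4 bytes, long 4 bytes, 16 GPRs
     * gcc -m64  sizes.c   -> pointers 8 bytes, long 8 bytes, 16 GPRs */
    int main(void) {
        printf("sizeof(void *) = %zu, sizeof(long) = %zu, sizeof(long long) = %zu\n",
               sizeof(void *), sizeof(long), sizeof(long long));
        return 0;
    }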
I'm not sure I understand it properly: does a 64-bit OS run/compile code faster than a 32-bit OS on the same system?
We're using 64 bit OSs where I am and it seems to only cause compatibility issues with legacy and proprietary software. (We're running Ubuntu 9.04 Jaunty amd64)
I will restrict this answer to x86-32 (IA-32) vs x86-64 (AMD64), as I believe that's the question you're actually asking.
At the processor level, there are a few advantages. First and most obvious is the expansion of the per-process virtual address space to a much wider range of 48 bits. (64 bits is allowed by the architecture but not required, if memory serves.) That enables applications to address far more of the system's memory, as well as opening up a lot of space for things like memory-mapped files that operate on virtual memory that isn't backed by real memory. It also gives the OS itself a lot more room to work in, as it no longer has to carve its data out of your 4 GB limit. In short, applications and the OS can make better use of your machine's resources.
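The memory-mapped file point is where the wide virtual address space pays off most directly: a 64-bit process can map a file far larger than 4 GB in one go and let the kernel page it in on demand. A sketch (big_dataset.bin is a hypothetical file name):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("big_dataset.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Reserve virtual address space for the whole file; only the pages
           we actually touch ever occupy real memory. */
        const unsigned char *data =
            mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        unsigned long checksum = 0;
        for (off_t i = 0; i < st.st_size; i += 4096)
            checksum += data[i];

        printf("checksum: %lu\n", checksum);
        munmap((void *)data, st.st_size);
        close(fd);
        return 0;
    }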
Additionally, the AMD64 architecture addresses one of the biggest problems of IA-32, which is the utter lack of registers. In fact it doubles the available registers, which is a huge win for some types of code. (Actually it's a win for almost ANY code, but some applications suffer from the increased memory cost of 64 bits and it evens out.)
On the Windows side, MS has taken it as an opportunity to break a whole bunch of historical compatibility problems. It's not a clean break from the old world, but it's a start. I don't believe Linux suffers from the same problems to begin with, and I don't have much perspective to offer on their 64-bit advantages.
As a general rule, developing--or using--a 64-bit operating system, in any context, will be slower than using the same 32-bit operating system. Because all pointers are suddenly twice as large, you are far more likely to blow the cache, and you can fit less data in RAM. That slows down your application considerably. You normally would only use 64-bit systems when your applications need to address more than 2 to 3 GB of data simultaneously--something very common in scientific computing and some database situations, but otherwise extremely rare. This is why Apple does not advocate unconditionally compiling PowerPC applications in 64-bit mode, for example: the cost from cache misses and reduced effective memory is high enough that going 64-bit only makes sense when you can truly take advantage of the 64-bit address space.
But x86 v. AMD64, which is what you're really asking about (since you're discussing Ubuntu), is a very special beast. AMD64 not only extends all pointers to 64-bit; it fixes many, many deficiencies in the x86 architecture, doubling the number of GPRs, simplifying the instructions to be more friendly to modern CPU designs, and more. Because of this, on AMD64 platforms only, you will frequently see a substantial performance boost by going to 64-bit.
There is one other area where, in software development, it makes sense to go to 64-bit: when you need to run lots of VMs. Running a couple of VMs can easily push you past the 3 GB memory barrier of the operating system, making them very painful to use. (It will work, thanks to a technology called PAE, or Physical Address Extension, that Intel introduced to bridge the gap between 32-bit and 64-bit systems, but the result is slow, painful to work with as a developer, and not very well supported on Windows.) Going to a 64-bit OS can provide tremendous benefits here.
(As the commenters note, this answer is somewhat generic; some of these points do not apply to Intel/AMD chips.)
The answer is: it varies, for a few reasons:
With larger-width instructions, you're going to get more expressiveness (either a greater variety of instructions or a greater capacity to encode data into those instructions directly), which can mean a reduced number of instructions flowing through the machine, which is generally a win: so ++64bit here.
But sometimes larger instructions might take more cycles to decode and execute, because they may be more complex. So a possible --64bit here.
Also, you need to transfer these instructions to and from the CPU: 64 bit instructions are twice as big as 32 bit instructions, which means more traffic to and from memory and the caches. CPUs are structured to ameliorate a lot of this cost, but it is a slight --64bit here.
More registers are usually available in wider instruction sets, which means less data traffic to and from the stack and/or memory. So ++64bit here.
And as everyone's no doubt going to mention, you have the ability to address more memory.
(Nearly forgot this one) the native "long" or "int" size may go up, depending on architecture, meaning data structures based on these get larger. Larger = more memory to move around, which means more possible waiting on data moving: --64bit if you're not careful.
Depending on your architecture, a lot of other concerns may apply too. You can rest assured that the processor and compiler vendors are working their butts off to reduce the "--"s above and increase the "++"s.
I have this 5 GB database that needs converting. On a 64-bit system, I just put all the data in collections. On the 32-bit system, I had to think about the order in which to load and convert it. The problem is not run-time, it is engineering time. Switching to 64-bit saves weeks of development time.
The compatibility issues: that's no bug, that's a feature. It shows you who has written clean software.
There are also some security advantages to using 64-bit operating systems. There have been some buffer overflow exploits that circumvent address space layout randomization by brute force. On a 64-bit OS, there are simply too many addresses for this kind of attack to be successful.
It will speed up compilation if your compile process is memory-bound and you use your 64-bit OS to increase the amount of memory usable by your system.
I expect it to be slightly slower; I had that experience with FC10. I don't have hard evidence, but it is definitely not the sizeof(pointer) issue. (*)
My own hunch is that it simply is a matter of less optimized drivers or tweaked chipsets.
Also, NTFS-3g was flaky under 64-bit, while it worked under 32-bit (same distro, same kernel, same partition); it just "hung" in some circumstances.
(*) Most compiling is disk-bound, not CPU-bound. Moreover, there are other improvements in the x86_64 architecture that cancel out that cost (better PIC, more registers, SSE2 on by default, 686 cmov on by default), unless your app does nothing but randomly move small blocks around.