Case for Higher Level Language: 64 or 32-bit? - programming-languages

With OS X 10.5.7 coming out, there's been a lot of talk about apps that are 64-bit vs. apps that are 32-bit: manufacturers that will have to convert apps, manufacturers that will not be able to any time soon due to a lack of resources (it's apparently a huge deal), what the benefits of converting a certain app (like iTunes) would be, and so on.
I'm wondering if, when you run in a VM (I mean like the .Net Framework or the JVM) and code only in "managed code" (in Java, no JNI, not sure what this would be in Ruby):
do you get the benefits when your VM becomes 64-bit without ever having to know about this stuff yourself? OR
do you not really get the benefits (nor the hassle of converting), since your apps are pretty inefficient compared to what's possible (though perhaps fast enough to do what you need)?

I can't speak for the CLR (I would assume it's similar), but a 64-bit JVM will give you all the memory benefits of 64 bit with no conversion whatsoever.

The main reason I develop for 64bit is memory. Much more memory is addressable with 64 bit pointers, and for some problem domains, a theoretical 4GB limit at 32 bit (often less) is less than satisfactory.
For something like iTunes, the conversion is near pointless, as it is a low powered app with no need for large memory.
I'm not sure how 64bit OSX handles 32bit binaries, but in Windows, the emulation is near perfect, and for many apps, there's simply no need to upgrade.
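To make the "no conversion whatsoever" point concrete, here is a minimal sketch. It uses Python as a stand-in for a managed runtime, which is an assumption on my part since the question is about the JVM and .NET. The same script runs unchanged on a 32-bit or a 64-bit interpreter; only the reported pointer width differs. The helper names `pointer_bits` and `can_exceed_4gib` are made up for illustration.

```python
import struct
import sys


def pointer_bits() -> int:
    """Return the width, in bits, of a native pointer in this interpreter."""
    # "P" is the struct format code for a native void* pointer.
    return struct.calcsize("P") * 8


def can_exceed_4gib(bits: int) -> bool:
    """True if a flat pointer of this width can address more than 4 GiB."""
    return 2 ** bits > 4 * 1024 ** 3


if __name__ == "__main__":
    bits = pointer_bits()
    print(f"running on a {bits}-bit runtime")
    print(f"can address more than 4 GiB: {can_exceed_4gib(bits)}")
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    print(f"sys.maxsize: {sys.maxsize}")
```

The application code never branches on the pointer width; it only reports it, which is roughly the situation pure managed code is in.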

Related

What can make a program unable to take advantage of a 64-bit system?

I am looking into the Google V8 JavaScript engine. It is said that they are having problems porting it to 64-bit systems.
What kind of programming practices or constraints can make a program 32-bit or 64-bit specific, apart from building and testing it on a 64-bit machine with 64-bit settings?
You may check this wiki, which says:
The main disadvantage of 64-bit architectures is that, relative to
32-bit architectures, the same data occupies more space in memory (due
to longer pointers and possibly other types, and alignment padding).
This increases the memory requirements of a given process and can have
implications for efficient processor cache utilization. Maintaining a
partial 32-bit model is one way to handle this, and is in general
reasonably effective. For example, the z/OS operating system takes
this approach, requiring program code to reside in 31-bit address
spaces (the high order bit is not used in address calculation on the
underlying hardware platform) while data objects can optionally reside
in 64-bit regions.
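The "longer pointers and alignment padding" effect from the quote can be sketched with `ctypes`. This is only an illustration under assumptions: the two structures below are hypothetical, and fixed-width integers stand in for 32-bit and 64-bit pointers so that the comparison works on whatever machine runs it. The exact sizes depend on the platform ABI.

```python
import ctypes


class NodeNarrow(ctypes.Structure):
    # A one-byte flag followed by a 32-bit link (stand-in for a 32-bit pointer).
    _fields_ = [("flag", ctypes.c_uint8), ("next", ctypes.c_uint32)]


class NodeWide(ctypes.Structure):
    # Same logical record, but the link is 64 bits wide.
    _fields_ = [("flag", ctypes.c_uint8), ("next", ctypes.c_uint64)]


def padding_bytes(struct_type) -> int:
    """Bytes in the struct that hold no field data (alignment padding)."""
    payload = sum(ctypes.sizeof(field_type) for _, field_type in struct_type._fields_)
    return ctypes.sizeof(struct_type) - payload


narrow_size = ctypes.sizeof(NodeNarrow)
wide_size = ctypes.sizeof(NodeWide)

if __name__ == "__main__":
    print(f"narrow node: {narrow_size} bytes, {padding_bytes(NodeNarrow)} padding")
    print(f"wide node:   {wide_size} bytes, {padding_bytes(NodeWide)} padding")
```

On common x86-64 ABIs I would expect the wide node to be twice the size of the narrow one even though only one field grew, because the padding after the flag grows too.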

Windows CE (RTOS) class-libraries for latency of interrupts and threads and USB?

I am getting started with Windows CE, using it as an RTOS to reduce latency concerns with interrupts, threads, and USB. What class libraries (Visual C++) can you point me to that would be good to learn well to speed up the learning curve?
Thanks
That's a really, really broad question. The most important piece of advice I'll give you is that if you're after determinism and speed (your reference to an RTOS leads me to think you consider these important) then you need to be aware that any memory allocation or deallocation in a piece of code makes it non-deterministic.
C++ classes often have allocations and deallocations buried in them, so whatever you choose (and whatever you write), use them wisely. Sometimes they'll allow you to provide custom allocators (e.g. Boost) which you can use to just pull memory from an already allocated heap you create somewhere.
Keep the real-time parts of the code as small and simple as possible.

Highly concurrent multi-threaded application requires hardware

I am looking for hardware that must run about 256 computationally intensive, real-time, concurrent tasks around the clock (one multi-threaded C application). Each task takes about 40-50 MFLOPs, so all tasks together require about 10 GFLOPs. CPU-RAM speed is insignificant. All tasks must be managed by a Linux kernel (32 bit, with SMP).
I am looking for a one-mainboard solution with one multi-core CPU (if such a CPU exists). If no such CPU exists, then I need a single multi-socket mainboard solution (with multiple CPUs).
Can you please recommend a professional CPU/mainboard solution that satisfies these requirements? It is also very important that there are no issues with the Linux kernel (2.6.25). No virtualization, and no need for huge RAM or CPU cache. I would also prefer Intel architecture and well-proven stability. I still have doubts that it is feasible at all.
Thank you in advance.
UPDATE:
I think I have found a right answer here and here.
UltraSPARC T2 has 8 cores with 8 threads each. Integrated high-bandwidth memory and IO. The T5140 carries two of them for 128 hardware threads.
The theoretical max raw performance of the 8 floating point units is 11 Giga flops per second (GFlops/s). A huge advantage over other implementations however is that 64 threads can share the units and thus we can achieve an extremely high percentage of theoretical peak. Our experiments have achieved nearly 90% of the 11 Gflop/s. - (http://blogs.oracle.com/deniss/entry/floating_point_performance_on_the)
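A quick back-of-the-envelope check of the figures quoted above, as a sketch. It assumes "MFLOPs" in the question means a sustained rate per task (MFLOP/s), and it simply takes the 11 GFlop/s peak and the 90% achieved fraction from the quoted blog post at face value.

```python
TASKS = 256
MFLOPS_PER_TASK_RANGE = (40, 50)   # from the question, read as MFLOP/s per task
T2_PEAK_GFLOPS = 11.0              # quoted peak for one UltraSPARC T2
T2_ACHIEVED_FRACTION = 0.90        # "nearly 90%" from the quoted blog post
CHIPS_IN_T5140 = 2                 # the T5140 carries two T2 chips


def required_gflops(tasks: int, mflops_per_task: float) -> float:
    """Total sustained rate needed, in GFLOP/s."""
    return tasks * mflops_per_task / 1000.0


low = required_gflops(TASKS, MFLOPS_PER_TASK_RANGE[0])
high = required_gflops(TASKS, MFLOPS_PER_TASK_RANGE[1])
per_chip = T2_PEAK_GFLOPS * T2_ACHIEVED_FRACTION
per_box = per_chip * CHIPS_IN_T5140

if __name__ == "__main__":
    print(f"required:           {low:.2f} to {high:.2f} GFLOP/s")
    print(f"one T2 (achieved):  {per_chip:.2f} GFLOP/s")
    print(f"T5140 (two chips):  {per_box:.2f} GFLOP/s")
```

Under these assumptions a single chip lands just under the low end of the requirement, while the two-chip box clears the high end, so the margin comes from the second socket.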
Rent some Amazon EC2 nodes.
Updated: How about PS3's then? The NASA uses them for their simulation engines.
Maybe use CPU+GPU's in commercial servers?
Build it around FPGAs: nowadays, some variants include processors that can run Linux.
Even though you've given us the specs you think you need, we might be able to help you out better if you tell us what the application is intended to accomplish, and how it was implemented.
There may be a better way to split the work up or deal with it rather than your current solution.
Not Intel architecture but these run linux and have 64 cores on a single die.
TILEPro64
Get a bunch of four- or eight-core machines and split the processing across the machines using some sort of grid or clustering software. Maybe have a look at Beowulf.
As you mentioned, 10 GFlops isn't exactly to be sneezed at, so in a single machine it'll be expensive. There's also the problem of what you do when the machine breaks: you're unlikely to have a second machine of similar spec available. If you build a cluster using commodity hardware, you're a little more resilient and it's easier to find replacement machines.
MFLOPS and GFLOPS are very poor indicators of how well a program can run on any given CPU. These days, cache footprint is much more important; perhaps branch prediction accuracy as well.
There's almost no way to gauge performance of a given application on different architectures without actually giving it a spin. And even then, you may not get a good idea if you were unlucky enough to unknowingly build with compiler options that ruined your cache footprint, or used a bad threading library, or any of a hundred other things.
I see you'd prefer Intel, but if you need one chip, I will again suggest the Cell processor.
Its theoretical peak performance is around 25 GFlops, and kernel 2.6.25 already had support for it.
You could try a pre-slim PlayStation 3 for experimenting (that would cost you little) or get yourself a server-based solution at around US$8K. You will have to rewrite and fine-tune your threads to take advantage of the SPU co-processors there, but you could achieve your computational needs without breaking a sweat with a single Cell (1 PPC core + 8 SPUs).
NB: with a PlayStation 3, you'd have only 6 available co-processors, but you don't seem to be on a budget with this project.
So you could at least try IBM's Cell developer kit, which offers an emulator, to see if you can code your solution to run on it.
There are commercially available Cell products, both as stand-alone servers in blade form factor and as PCI Express add-on boards for PC workstations, from
Mercury Computer Systems:
http://www.mc.com/microsites/cell/products.aspx?id=6986
Mercury does not list any prices on the site, but the pricing seems to be around the previously mentioned US$8,000 for these PCI Express cards.
A PlayStation 3 can be purchased for about US$300 and would allow you to prototype your application and check whether it is up to the needed performance. (I got one myself and have Fedora 9 running on it, although I did that as a hobbyist and have not, so far, used it for any calculations. I also put together a 12-machine PlayStation 3 cluster for molecular simulations at the local university. The application they ran did not take advantage of the multimedia SPUs while I was in touch with them. But even so, clocked at 3.5GHz, the machines performed better than standard, similarly priced PCs, even considering PS3s are priced 5x higher around here.)

How to develop to take advantage of 64 bit systems?

Are there any specific sectors of software engineering/computer science where there's a marked difference when developing for 64-bit systems? I've been coding for around 10 years now, and since the arrival of 64-bit systems, my code hasn't changed one bit.
What applications that a single coder can code as a side project require you to use 64 bit technology?
Anything that requires more than 4 GB of working and program memory would certainly qualify, since that is the maximum amount of memory that a 32 bit system can address directly.
Since 64 bit numbers can reside in the CPU registers, calculations requiring numbers of these sizes would see a performance improvement.
Aside from address space or big calculations, doubling your word size helps more in the low level stuff, and mostly for people who are going to be doing kernel hacking or writing device drivers. For instance, let's say you have a stream of bytes from a network connection and you have to process them. You can now pull those bytes in from main memory to CPU registers 8 at a time rather than 4. But I would think you need a "64 bit aware" string library to take advantage of this.
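The "8 at a time rather than 4" idea can be sketched as follows. This is an illustration only: Python does not expose CPU registers, so the word size here is just the unpacking width, and `xor_checksum` is a made-up helper. What it does show is that the same byte stream needs half as many word-sized reads with 8-byte words, and that the code has to be written with the wider word in mind to benefit.

```python
import struct


def xor_checksum(data: bytes, word_bytes: int):
    """XOR-fold data in little-endian words; return (checksum, word reads)."""
    formats = {4: "<I", 8: "<Q"}  # unsigned 32-bit and 64-bit words
    if word_bytes not in formats:
        raise ValueError("word_bytes must be 4 or 8")
    if len(data) % word_bytes:
        raise ValueError("data length must be a multiple of the word size")
    checksum = 0
    reads = 0
    for (word,) in struct.iter_unpack(formats[word_bytes], data):
        checksum ^= word
        reads += 1
    return checksum, reads


def fold_to_32(checksum64: int) -> int:
    """Combine the two 32-bit halves of a 64-bit XOR checksum."""
    return (checksum64 >> 32) ^ (checksum64 & 0xFFFFFFFF)


if __name__ == "__main__":
    stream = bytes(range(64))
    sum32, reads32 = xor_checksum(stream, 4)
    sum64, reads64 = xor_checksum(stream, 8)
    print(f"4-byte words: {reads32} reads, 8-byte words: {reads64} reads")
    print(f"results agree after folding: {fold_to_32(sum64) == sum32}")
```

The final fold step is the kind of extra bookkeeping a "64 bit aware" library has to do so that callers still get the answer they expect.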
Anecdotally, we did observe a performance increase when upgrading from 32 bit SQL Server to 64 bit SQL Server (2005) on the same hardware (a 64 bit machine).
We recently ported some of our internally-used libraries to 64-bit. The C code didn't change at all; we just had to get the 64-bit versions of the third-party libraries we link against and figure out which new compiler directives we needed to use. The biggest headache was finding 64-bit versions of our dependencies and refactoring our build system to handle both 32-bit and 64-bit.
That's not to say that other software wouldn't require modification. For example, if you pack your data to fit within word boundaries, you might now be inclined to pack it differently when programming for a 64-bit system.
If you need to ask, you probably will not get any advantage, as you are probably not building into your code any assumptions about size of ints. Rather few use cases, and all fairly low-level, will see any speedup. Bignums and heavy integer arithmetic on very large numbers will be quicker (like crypto).
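To see why bignum and crypto-style arithmetic benefits, here is a sketch of what a machine limited to 32-bit multiplies has to do for one 64 x 64 bit product. The function name is hypothetical, and Python's own integers hide the carry handling that real 32-bit code would need; the point is only the count of partial products.

```python
MASK32 = 0xFFFFFFFF


def mul64_with_32bit_limbs(a: int, b: int) -> int:
    """Multiply two unsigned 64-bit values using four 32x32 partial products."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    # Schoolbook multiplication: four multiplies where a 64-bit CPU
    # can typically use a single multiply instruction.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Recombine the partial products at their bit offsets.
    return lo_lo + ((lo_hi + hi_lo) << 32) + (hi_hi << 64)


if __name__ == "__main__":
    x = 0xFFFFFFFFFFFFFFFF
    y = 0xFEDCBA9876543210
    print(mul64_with_32bit_limbs(x, y) == x * y)
```

A bignum library repeats this step for every pair of limbs, so doubling the limb width cuts the number of multiplies for large operands to roughly a quarter.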

Is there any advantage for developing on a 64 bit OS?

I'm not sure I understand it properly: does a 64 bit OS run/compile code faster than a 32 bit OS on the same system?
We're using 64 bit OSs where I am and it seems to only cause compatibility issues with legacy and proprietary software. (We're running Ubuntu 9.04 Jaunty amd64)
I will restrict this answer to x86-32 (IA-32) vs x86-64 (AMD64), as I believe that's the question you're actually asking.
At the processor level, there are a few advantages. First and most obvious is the expansion of the per-process virtual memory to a much wider range of 48 bits. (64 is allowed in the architecture but not required, if memory serves.) That enables applications to use a lot more of the system's memory available to them, as well as opening up a lot of space for things like memory mapped files that operate on virtual memory that isn't linked to real memory. It also opens up a lot of space for the OS in question to work, as it doesn't have to share your 4 GB limit for its data. In short, applications and the OS can make better use of your machine's resources.
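For a sense of scale, the address-space sizes mentioned above work out as follows. This is plain arithmetic; the helper `human` is just a hypothetical formatter using binary units.

```python
def human(num_bytes: int) -> str:
    """Format an exact power-of-two byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    index = 0
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


if __name__ == "__main__":
    for bits in (32, 48, 64):
        print(f"{bits}-bit addresses: {human(2 ** bits)}")
```

So the step from 32 to 48 bits takes the per-process virtual range from 4 GiB to 256 TiB, and a full 64 bits would be 16 EiB.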
Additionally, the AMD64 architecture addresses one of the biggest problems of IA-32, which is the utter lack of registers. In fact it doubles the available registers, which is a huge win for some types of code. (Actually it's a win for almost ANY code, but some applications suffer from the increased memory cost of 64 bits and it evens out.)
On the Windows side, MS has taken it as an opportunity to break a whole bunch of historical compatibility problems. It's not a clean break from the old world, but it's a start. I don't believe Linux suffers from the same problems to begin with, and I don't have much perspective to offer on their 64-bit advantages.
As a general rule, developing--or using--a 64-bit operating system, in any context, will be slower than the same 32-bit operating system. Because all pointers are suddenly twice as large, you are far more likely to blow the cache, and can fit less data in RAM. That slows down your application considerably. You normally would only use 64-bit systems when your applications need to address more than 2 to 3 GB of data simultaneously--something very common in scientific computing and some database situations, but otherwise extremely rare. This is why Apple does not advocate unconditionally compiling PowerPC applications in 64-bit mode, for example: the cost due to cache-misses and lack of memory are high enough that going 64-bit only makes sense when you truly can take advantage of the 64-bit space.
But x86 v. AMD64, which is what you're really asking about (since you're discussing Ubuntu), is a very special beast. AMD64 not only extends all pointers to 64-bit; it fixes many, many deficiencies in the x86 architecture, doubling the number of GPRs, simplifying the instructions to be more friendly to modern CPU designs, and more. Because of this, on AMD64 platforms only, you will frequently see a substantial performance boost by going to 64-bit.
There is one other area where, in software development, it makes sense to go to 64-bit: you need to run lots of VMs. Running a couple of VMs can easily blow you past the 3 GB memory barrier of the operating system, making using them very painful. (It will work due to a technology called PAE, or Physical Address Extension, that Intel invented to bridge the gap between 32-bit systems and 64-bit systems, but the result is slow, painful to work with as a developer, and not very well supported on Windows.) Going to a 64-bit OS can provide tremendous benefits.
(As the commenters note, this answer is somewhat generic; some of these points do not apply to Intel/AMD chips.)
The answer is: it varies, for a few reasons:
With larger-width instructions, you're going to get more expressiveness (either a greater variety of instructions or a greater capacity to encode data into those instructions directly), which can mean a reduced number of instructions flowing through the machine, which is generally a win: so ++64bit here.
But sometimes larger instructions might take more cycles to decode and execute, because they may be more complex. So a possible --64bit here.
Also, you need to transfer these instructions to and from the CPU: 64 bit instructions are twice as big as 32 bit instructions, which means more traffic to and from memory and the caches. CPUs are structured to ameliorate a lot of this cost, but it is a slight --64bit here.
More registers are usually available in wider instruction sets, which causes less data traffic to and from the stack and or memory. So ++64bit here.
And as everyone's no doubt going to mention, you have the ability to address more memory.
(Nearly forgot this one) the native "long" or "int" size may go up, depending on architecture, meaning data structures based on these get larger. Larger = more memory to move around, which means more possible waiting on data moving: --64bit if you're not careful.
Depending on your architecture, a lot of other concerns may apply too. You can rest assured that the processor and compiler vendors are working their butts off to reduce the "--"s above and increase the "++"s.
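The point about the native "long" or "int" size is easy to check on whatever machine you are on. A small sketch using `ctypes` (the function name `native_sizes` is made up):

```python
import ctypes


def native_sizes() -> dict:
    """Sizes in bytes of a few native C types on the current platform."""
    types = [
        ("int", ctypes.c_int),
        ("long", ctypes.c_long),
        ("long long", ctypes.c_longlong),
        ("void *", ctypes.c_void_p),
    ]
    return {name: ctypes.sizeof(ctype) for name, ctype in types}


if __name__ == "__main__":
    for name, size in native_sizes().items():
        print(f"{name:10s} {size} bytes")
```

On 64-bit Linux and macOS (the LP64 model) `long` is 8 bytes, while on 64-bit Windows (LLP64) it stays at 4, so structures built from `long` grow on some 64-bit platforms and not on others.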
I have this 5GByte database that needs converting. On a 64-bit system, I just put all data in collections. In the 32-bit system, I had to think about the order in which to load and convert. The problem is not run-time, it is engineering time. Switching to 64 bit saves weeks of development time.
The compatibility issues: that's no bug, that's a feature. It shows you who has written clean software.
There are also some security advantages to using 64-bit operating systems. There have been some buffer overflow exploits that circumvent address space layout randomization by brute force. On a 64-bit OS, there are simply too many addresses for this kind of attack to be successful.
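A rough sketch of why brute force stops being practical. Every number below is an illustrative assumption, not a measurement of any real system: the entropy figures and the guess rate are made up to show the scaling, and the model is the simplest one (each candidate address tried at most once, no re-randomization between attempts).

```python
def expected_guesses(entropy_bits: int) -> float:
    """Average guesses needed to hit one of 2**entropy_bits equally likely
    base addresses when each candidate is tried at most once."""
    candidates = 2 ** entropy_bits
    return (candidates + 1) / 2


# Illustrative assumptions only.
NARROW_ENTROPY_BITS = 16   # hypothetical randomization on a small address space
WIDE_ENTROPY_BITS = 28     # hypothetical randomization on a large address space
GUESSES_PER_SECOND = 100   # hypothetical attack rate

if __name__ == "__main__":
    for bits in (NARROW_ENTROPY_BITS, WIDE_ENTROPY_BITS):
        guesses = expected_guesses(bits)
        hours = guesses / GUESSES_PER_SECOND / 3600
        print(f"{bits} bits: about {guesses:.0f} guesses, {hours:.1f} hours")
```

Each extra bit of randomization doubles the expected work, so the extra room in a 64-bit address space translates directly into attack time.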
It will speed up compilation if your compile process is memory-bound and you use your 64bit OS to increase the amount of memory usable by your system.
I expect it to be slightly slower, I had that experience with FC10. I don't have real reasons, but it is definitely not the sizeof(pointer) issue. (*)
My own hunch is that it simply is a matter of less optimized drivers or tweaked chipsets.
Also NTFS-3g was funny under 64-bit, while it worked under 32-bit (same distro, same kernel same partition, it just "hung" in some circumstances)
(*) Most compiling is disk bound, not CPU bound. Moreover, there are other improvements in the x86_64 architecture that cancel out that effect (better PIC, more registers, SSE2 on by default, 686 cmov on by default), unless your app does nothing but randomly move small blocks around.

Resources