This question is related to "Java Refuses To Start - Could Not Reserve Enough Space for Object Heap" and should be easy enough to figure out. However, my searches haven't yielded anything useful.
Essentially we have two 32-bit OSes (RedHat & SuSE) on different machines with the same hardware. Both use the same JVM and execute the same command line. RedHat works perfectly fine, but SuSE reports there isn't enough memory.
We just need to know if this is a limitation of the version of SuSE we're using or if it's something else.
'cat /proc/version' gives us:
Linux version 2.6.5-7.244-bigsmp (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Mon Dec 12 18:32:25 UTC 2005
'uname -a' gives us the following on BOTH types of machines:
UTC 2005 i686 i686 i386 GNU/Linux
The JVM memory limit is related to the largest free contiguous block of address space available, not the amount of free memory. The limit varies from about 1.4 GB to a bit over 2.0 GB, and depends on where your operating system puts various things in memory. I don't know the particulars of where Redhat or SuSE load things into memory, but it could be that SuSE maps some library to an address in the middle of the address space, where Redhat might map it at the end (speculating).
And remember that your actual memory usage in Java is more than what you specify for -Xmx. The other memory settings (like permgen) also add to the total size. So it could also be that the perm space on SuSE has a larger default than on Redhat.
Also, depending on the memory allocation profile of your application, you might get away with a smaller heap size and different garbage collection options. There are some details here (http://java.sun.com/performance/reference/whitepapers/tuning.html) and in other places. For example, if you allocate a lot of small, temporary blocks, you'll want different GC settings than if you have a lot of big, long-lived objects.
Regarding the linked question, why not just use Redhat? That might be a simplistic solution, but I guarantee it's going to fix your problem faster than deeply delving into the arcane world of java tuning and OS memory management :P
Firstly, you are crazy to be running a 32-bit OS when you have this much address space pressure. Migrate to a 64-bit JVM on 64-bit Linux. How much time have you already wasted trying to diagnose this problem, which you must have suspected from the outset would go away with the larger address space of a 64-bit system?
Secondly, it's well known that out of all the Linux vendors Red Hat has the most kernel engineers on staff and makes some serious tweaks for the kernels in their RHEL products. These often include patches for large workloads like yours (well, it's a large workload for a 32-bit system, it's nothing special on 64-bit). So there's some chance the reason ultimately is that RHEL has other customers doing the same crazy stuff as you and you're benefiting from work they did to support those customers.
Finally though, since I suspect you're going to insist on trying to find a way to do this on 32-bit SuSE I will point out that Linux offers a variety of address space trade-offs on 32-bit x86, and it's possible (but not certain) that your SuSE systems just have a different trade-off selected. If you can bring up the configuration of the running kernels (often in /boot/config....) then you can compare settings like HIGHMEM.
The conventional option until a few years ago was a 2:2 split, in which userspace is limited to 2GiB of address space. That is easy to program for and has decent efficiency, but in this scenario you obviously can't have your requested heap, since it would leave no space for the program text, stack etc. More recently the trend has been towards 3:1 (similar to the Windows /3GB switch), which expands the userspace address space at the cost of cramming the OS kernel itself into less space, which potentially causes its own problems. This might work, but it would be very cramped, so I also wouldn't be surprised if it didn't work for your jobs. Finally, newer Linux kernels also offer an option where you get a full 4GiB of 32-bit userspace. That might be enough to make your jobs run reliably, but at a significant performance cost, since userspace and kernel addresses then can't co-exist.
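The arithmetic behind those three layouts can be sketched roughly in Python. The 2 GiB heap and the 256 MiB allowance for program text, stacks and shared libraries are made-up figures for illustration (the question does not state them), and "4:4" is just a label used here for the full 4GiB userspace option. A real JVM also needs the heap to be contiguous, so passing this check is necessary but not sufficient.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# User-space address range offered by each 32-bit x86 layout discussed above.
# The "4:4" key is a label chosen for this sketch, not an official name.
SPLITS = {
    "2:2": 2 * GIB,
    "3:1": 3 * GIB,
    "4:4": 4 * GIB,
}


def heap_fits(split, heap_bytes, other_bytes):
    """Return True if the heap plus everything else fits in the user range.

    Fragmentation is ignored, so True only means the totals add up.
    """
    return heap_bytes + other_bytes <= SPLITS[split]


# Hypothetical workload: 2 GiB heap plus 256 MiB for text, stacks, libraries.
for name in SPLITS:
    print(name, heap_fits(name, 2 * GIB, 256 * MIB))
```

With these assumed numbers only the 2:2 layout fails outright, which matches the reasoning above; the 3:1 case passes on paper but leaves little slack.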
To try this you'd need a new kernel. You may be able to just install one provided by SuSE (see if they offer others to choose from, e.g. a "PAE" option) or you may have to compile your own, in which case it probably invalidates your support contract.
But really, you should just go with option 1, switch to a 64-bit JVM and put your feet up.
Background:
I was trying to set up an Ubuntu machine on my desktop computer. The whole process took a whole day, including installing the OS and software. I didn't think much about it, though.
Then I tried doing my work on the new machine, and it was significantly slower than my laptop, which is very strange.
I ran iotop and found that disk traffic when decompressing a package was around 1-2MB/s, which is definitely abnormal.
Then, after hours of research, I found this article that describes exactly the same problem and provides an ugly solution:
We recently had a major performance issue on some systems, where disk write speed is extremely slow (~1 MB/s — where normal performance
is 150+MB/s).
...
EDIT: to solve this, either remove enough RAM, or add “mem=8G” as kernel boot parameter (e.g. in /etc/default/grub on Ubuntu — don’t
forget to run update-grub !)
I also looked at this post
https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
and did
cat /proc/vmstat | egrep "dirty|writeback"
output is:
nr_dirty 10
nr_writeback 0
nr_writeback_temp 0
nr_dirty_threshold 0 // and here
nr_dirty_background_threshold 0 // here
Those values were 8223 and 4111 when mem=8g was set.
So, it's basically showing that when system memory is greater than 8GB (32GB in my case), regardless of vm.dirty_background_ratio and vm.dirty_ratio settings, (5% and 10% in my case), the actual dirty threshold goes to 0 and write buffer is disabled?
Why is this happening?
Is this a bug in the kernel or somewhere else?
Is there a solution other than unplugging RAM or using "mem=8g"?
UPDATE: I'm running the 3.13.0-53-generic kernel with Ubuntu 12.04 32-bit, so it's possible that this only happens on 32-bit systems.
If you use a 32 bit kernel with more than 2G of RAM, you are running in a sub-optimal configuration where significant tradeoffs must be made. This is because in these configurations, the kernel can no longer map all of physical memory at once.
As the amount of physical memory increases beyond this point, the tradeoffs become worse and worse, because the struct page array that is used to manage all physical memory must be kept mapped at all times, and that array grows with physical memory.
The physical memory that isn't directly mapped by the kernel is called "highmem", and by default the writeback code treats highmem as undirtyable. This is what results in your zero values for the dirty thresholds.
You can change this by setting /proc/sys/vm/highmem_is_dirtyable to 1, but with that much memory you will be far better off if you install a 64-bit kernel instead.
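To make the zero thresholds less mysterious, here is a simplified Python model of how the limits relate to dirtyable memory: each limit is just a percentage of the dirtyable page count. The function name is made up for this sketch, the 4 KiB page size is an assumption (the usual size on x86), and the figure of 82230 dirtyable pages is inferred from the thresholds reported in the question, not measured. The real kernel code applies further adjustments.

```python
def dirty_thresholds(dirtyable_pages, dirty_ratio, background_ratio):
    """Simplified limits in pages: each is a percentage of dirtyable memory."""
    dirty = dirtyable_pages * dirty_ratio // 100
    background = dirtyable_pages * background_ratio // 100
    return dirty, background


PAGE_SIZE = 4096  # assumed 4 KiB pages

# 82230 pages reproduces the values reported with mem=8g (ratios 10% and 5%).
print(dirty_thresholds(82230, 10, 5))
print(82230 * PAGE_SIZE // 1024 ** 2, "MiB dirtyable")

# If highmem is excluded and no lowmem counts as dirtyable, both limits are 0.
print(dirty_thresholds(0, 10, 5))
```

In this model the ratios never matter once the dirtyable page count drops to zero, which is consistent with what the question observed regardless of the vm.dirty_ratio settings.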
Is this a bug in the kernel
According to the article you quoted, this is a bug, which did not exist in earlier kernels, and is fixed in more recent kernels.
Note that this issue seems to be fixed in later releases (3.5.0+) and is a regression (doesn’t happen on e.g. 2.6.32)
This is a semi-theoretical question.
Can I specify the virtualization mode for memory (pure segmentation/segmentation+paging/just paging) while compiling for Windows (e.g., MSVS12 C++) and for Linux (e.g. g++)?
I have read all MSVS linker+compiler options, and found no point of control in there.
For g++, the manual is far too complex for such a question.
The source of this question is this - link
I know from theory and practice that these should either be possible or restricted by OS policy at some level, because the Core i7 supports all three modes I mentioned above.
Practical background:
The piece of code that created lots of data is here, function Init - and it exhausted my memory if I wanted to have over 2-3G primes on heap.
Intel x86 CPUs always use some form of segmentation that can't be turned off. In 64-bit mode, segmentation is limited, but it's still there. Paging is required for both Windows and Linux to work on Intel CPUs (though Linux doesn't use paging on certain other CPU architectures). Paging is also required to enable 64-bit mode on Intel CPUs.
So in other words, on Windows and Linux the OS always uses segmentation and paging, and so do any applications run on them, though this is largely transparent. It's not possible to have a program "compiled+linked for 'segmentation without paging'", as you said in the answer you linked. Maybe the book you referenced is referring to ancient 16-bit versions of Windows (3.1 or earlier), which could be run in a mode that supported 80286 CPUs, which didn't have paging. Even then, that normally didn't make any difference in how you compiled and linked your applications.
What you are describing is not a function of a compiler, or even a linker.
When you run your program, you get the memory model that is already running on the system. Your compiled code does not care about the underlying memory model.
However, your program itself can change the memory model IF it starts running in an unprotected processor mode.
There is a relatively new Linux ABI referred to as x32, where the x86-64 processor runs in 32-bit mode, so pointers are still only 32-bits, but the 64-bit architecture specific registers are still used. So you're still limited to 4GB max memory use as in normal 32-bit, but your pointers use up less cache space than they do in 64-bit, you can do 64-bit arithmetic efficiently, and you get access to more registers (16) than you would in vanilla 32-bit (8).
Assuming you have a workload that fits nicely within 4GB, is there any way the performance of x32 could be worse than on x86-64?
It seems to me that if you don't need the extra memory space nothing is lost -- you should always get the same perf (when you already fit in cache) or better (when the pointer space savings lets you fit more in cache). But it wouldn't surprise me if there are paging/TLB/etc. details that I don't know about.
Certainly if you have a multithreaded program, the fact that data structures are smaller on x32 might cause cache line fighting between threads -- different objects might get allocated on the same cache line in x32 mode and different cache lines in x86_64 mode. If two threads modify those objects independently the cache ping-ponging could severely slow down the x32 code. Of course, this kind of cache effect could happen regardless of pointer size, but if the code has been tuned assuming 64-bit pointers, going to 32-bit pointers could de-tune things.
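A back-of-the-envelope Python sketch of that effect: it computes the size of a C-like struct under natural alignment and counts how many instances fit in one cache line. The 64-byte line size is an assumption (typical for x86-64), and the node layout (two pointers plus a 4-byte counter) is a hypothetical example, not something taken from real code.

```python
CACHE_LINE = 64  # bytes; assumed, typical for x86-64


def struct_size(field_sizes):
    """Size of a C-like struct where each field is aligned to its own size."""
    offset = 0
    for size in field_sizes:
        offset = (offset + size - 1) // size * size  # pad up to alignment
        offset += size
    align = max(field_sizes)
    return (offset + align - 1) // align * align  # tail padding


# Hypothetical list node: two pointers plus a 4-byte counter.
node_x86_64 = struct_size([8, 8, 4])
node_x32 = struct_size([4, 4, 4])
print(node_x86_64, CACHE_LINE // node_x86_64)
print(node_x32, CACHE_LINE // node_x32)
```

More nodes per line is good for a single thread's cache footprint, but it also raises the odds that two threads end up writing to objects that share a line.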
In X32 the processor is actually executing in "long mode", the same mode as for x86_64. That is, addresses as seen by the processor when doing addressing are still 64 bits; however, the X32 ABI makes sure that all addresses are small enough to fit into 32 bits. As a result, in some cases there is some slight overhead when pointers have to be zero-extended from 32 bits to 64.
Also, needing x86/x86-64/x32 libraries in RAM, which I suppose is what one will end up with in practice (unless you're talking about some embedded or other tightly controlled system rather than a general purpose computer), may eat up some of the benefit of X32.
I don't understand what 32 bit and 64 bit means. It seems that people say 64 bit computers run faster - but why? Does it mean that there are 64 bit integers instead of 32? If it's something like that, is there a way to write a program to determine if we're on a 32 bit or 64 bit machine?
On 64-bit machines pointers are 8 bytes (64 bits). On 32-bit machines they are 4 bytes (32 bits). Thus we can determine by the size of a pointer what we are dealing with, in its simplest form:
#define IS_64BIT (sizeof(void *) == 8)
The only drawback is that a 64 bit computer running in 32 bit mode will register as 32 bit. Of course, this isn't actually important as for all intents and purposes a 32 bit OS on a 64 bit computer will be a 32 bit computer.
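The same pointer-size check can be sketched in Python, where the `struct` format code `"P"` stands for a native `void *`. The helper name `pointer_bits` is made up for this example. The same caveat applies: it reports how the running interpreter was built, not what the CPU is capable of.

```python
import struct
import sys


def pointer_bits():
    """Width of a native pointer in the running interpreter, in bits."""
    return struct.calcsize("P") * 8


print(pointer_bits())
print(sys.maxsize > 2 ** 32)  # True on a 64-bit build
```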
There are actually several different things you're asking here.
First of all there's the CPU. Most modern CPUs (within roughly the past 5 years) support 64-bit.
Now, just because the CPU supports it doesn't mean the OS supports it; that's where you have either a 64-bit OS or a 32-bit OS. (32-bit is also known as x86. There's a small technical difference, in that x86 refers to the CPU instruction set, but in most common usage x86 and 32-bit are interchangeable.)
Even if the OS supports it, it doesn't mean the specific program you're running supports 64-bit. What most (if not all?) 64-bit OS's do is they have a 32-bit emulation mode so you can still run 32-bit programs.
Now for your question of how to determine which architecture you're running on, the most reliable way is to ask the OS through some API call.
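As one hedged illustration of asking the OS, Python's standard `platform` module wraps the underlying system calls. The function name `machine_report` is invented for this sketch, and the exact strings returned vary by OS.

```python
import platform


def machine_report():
    """Collect what the OS and the interpreter each say about word size."""
    return {
        "machine": platform.machine(),              # e.g. "x86_64" or "i686"
        "interpreter": platform.architecture()[0],  # e.g. "64bit"
    }


print(machine_report())
```

The two values can disagree, for example when a 32-bit program runs on a 64-bit OS, which is exactly the layering described above.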
As for why 64-bit is sometimes considered faster, it is because with 32 bits it is only possible to address 4GB of memory, whereas with 64-bit the limit imposed by address space is much higher (as in about 4 billion times higher) and the limiting factor is hardware, not address space. As to when and why more memory is faster, that's a separate topic altogether.
64-bit machines do not run faster than 32-bit machines except in cases where 64-bit math is being done or in cases where more than 4 GB of RAM is needed.
64-bit AMD (and later Intel) machines run faster than 32-bit x86 machines because when AMD designed the new instruction set they added more CPU registers and made SSE math the default.
32-bit x86 systems can waste a lot of CPU time pushing data around in RAM, while a x86_64 system can store that data in CPU registers instead. Registers are much faster than level-1 CPU cache. Having more registers also saves CPU instructions that otherwise need to store the old value of a register in RAM, load in a different value from RAM, then load the original value back from RAM.
In some especially register-starved cases the extra registers can gain 30% speed for a program. The benefit is usually much less than that.
The speed benefits from assuming SSE2 are many. In 32-bit CPUs SSE instructions may or may not exist, so to use them the software needs to have clumsy test code and two (or more!) implementations of the math functions. Most software just doesn't care enough and so it never bothers, always falling back on x87 FPU math from the 486 days. The 64-bit CPUs made SSE2 a required part of the instruction set, so all x86_64 programs are free to assume it exists and use it in all cases.
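To give a feel for the kind of test code 32-bit software needs, here is a small Python sketch that parses the feature flags out of /proc/cpuinfo-style text on Linux and picks an implementation accordingly. The function name and the sample text are made up for illustration; real programs written in C would more likely use the CPUID instruction directly.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


# Made-up excerpt for illustration; on Linux, read the real file instead:
#   text = open("/proc/cpuinfo").read()
sample = "processor\t: 0\nflags\t\t: fpu vme sse sse2\n"
flags = cpu_flags(sample)

# Choose between two implementations of the same math routine.
math_impl = "sse2" if "sse2" in flags else "x87"
print(math_impl)
```

On x86_64 the whole branch disappears, since SSE2 is guaranteed.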
64bit computers do not run faster, per se. They can just support higher precision (larger integers, more precise floats).
In some rare cases, libraries might jam two 32bit numbers into 64 bits to perform a large number of parallel operations, potentially resulting in up to a 2x speedup. This might occur for some highly optimized scientific/numeric libraries, or in special applications that (for some reason or another) have been highly optimized at a very low level, such as some multimedia software. It should be noted that such applications could always have made this tradeoff even in 32bit mode, but chose not to; they are merely trading away precision (which they may not need) for parallelism.
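The trick of packing two 32bit numbers into one 64bit word can be sketched as follows. Python integers are arbitrary precision, so this only models what a 64bit register would do, and the helper names are invented for this example. The masking keeps a carry out of the low lane from leaking into the high lane.

```python
MASK32 = 0xFFFFFFFF
LOW31 = 0x7FFFFFFF7FFFFFFF  # every bit except the top bit of each lane
TOP = 0x8000000080000000    # the top bit of each lane


def pack(hi, lo):
    """Place two 32-bit values side by side in one 64-bit word."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack(word):
    """Split a 64-bit word back into its (hi, lo) 32-bit lanes."""
    return (word >> 32) & MASK32, word & MASK32


def add_lanes(a, b):
    """Add both lanes at once, each wrapping modulo 2**32 independently."""
    partial = (a & LOW31) + (b & LOW31)  # carries stay inside each lane
    return partial ^ ((a ^ b) & TOP)     # fold the top bits back in


print(unpack(add_lanes(pack(1, 2), pack(3, 4))))
print(unpack(add_lanes(pack(0, 0xFFFFFFFF), pack(0, 1))))
```

The second call shows the low lane wrapping around to zero without disturbing the high lane.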
Operating system benchmarks which reveal faster performance (maybe <10% improvement) are not necessarily related to 64bit-related optimizations. 64bit architectures may be correlated with having, for example, more registers or advanced features that programs can take advantage of [citation: http://www.tuxradar.com/content/ubuntu-904-32-bit-vs-64-bit-benchmarks ], which may be the cause of a performance difference (as well as other variables).
How to determine whether a CPU is 32bit or 64bit depends on what OS you are using. For example on Linux, you can call uname -a, though there's probably a better way to do so. If you're using C/C++, see the other answer for a way to determine it in a program.
I'm not sure I understand it properly: does a 64 bit OS run/compile code faster than a 32 bit OS on the same system?
We're using 64 bit OSs where I am and it seems to only cause compatibility issues with legacy and proprietary software. (We're running Ubuntu 9.04 Jaunty amd64)
I will restrict this answer to x86-32 (IA-32) vs x86-64 (AMD64), as I believe that's the question you're actually asking.
At the processor level, there are a few advantages. First and most obvious is the expansion of the per-process virtual memory to a much wider range of 48 bits. (64 is allowed in the architecture but not required, if memory serves.) That enables applications to use a lot more of the system's memory available to them, as well as opening up a lot of space for things like memory mapped files that operate on virtual memory that isn't linked to real memory. It also opens up a lot of space for the OS in question to work, as it doesn't have to share your 4 GB limit for its data. In short, applications and the OS can make better use of your machine's resources.
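For a sense of scale, the sizes of these address spaces can be worked out in a few lines of Python. The helper names are made up for this sketch; the bit widths are the ones mentioned above.

```python
def address_space_bytes(bits):
    """Number of distinct byte addresses reachable with this many bits."""
    return 2 ** bits


def human(n):
    """Format a byte count using binary units (exact for powers of two)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


for bits in (32, 48, 64):
    print(bits, human(address_space_bytes(bits)))
```

So a 48-bit virtual address space is 65536 times larger than the 32-bit one, even before the full 64 bits are used.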
Additionally, the AMD64 architecture addresses one of the biggest problems of IA-32, which is the utter lack of registers. In fact it doubles the available registers, which is a huge win for some types of code. (Actually it's a win for almost ANY code, but some applications suffer from the increased memory cost of 64 bits and it evens out.)
On the Windows side, MS has taken it as an opportunity to break a whole bunch of historical compatibility problems. It's not a clean break from the old world, but it's a start. I don't believe Linux suffers from the same problems to begin with, and I don't have much perspective to offer on their 64 bit advantages.
As a general rule, developing--or using--a 64-bit operating system, in any context, will be slower than the same 32-bit operating system. Because all pointers are suddenly twice as large, you are far more likely to blow the cache, and can fit less data in RAM. That slows down your application considerably. You normally would only use 64-bit systems when your applications need to address more than 2 to 3 GB of data simultaneously--something very common in scientific computing and some database situations, but otherwise extremely rare. This is why Apple does not advocate unconditionally compiling PowerPC applications in 64-bit mode, for example: the cost due to cache-misses and lack of memory are high enough that going 64-bit only makes sense when you truly can take advantage of the 64-bit space.
But x86 v. AMD64, which is what you're really asking about (since you're discussing Ubuntu), is a very special beast. AMD64 not only extends all pointers to 64-bit; it fixes many, many deficiencies in the x86 architecture, doubling the number of GPRs, simplifying the instructions to be more friendly to modern CPU designs, and more. Because of this, on AMD64 platforms only, you will frequently see a substantial performance boost by going to 64-bit.
There is one other area where, in software development, it makes sense to go to 64-bit: you need to run lots of VMs. Running a couple of VMs can easily blow you past the 3 GB memory barrier of the operating system, making using them very painful. (It will work due to a technology called PAE, or Physical Address Extension, that Intel invented to bridge the gap between 32-bit systems and 64-bit systems, but the result is slow, painful to work with as a developer, and not very well supported on Windows.) Going to a 64-bit OS can provide tremendous benefits.
(As the commentators note, this answer is somewhat generic, some of these points do not apply to intel/amd chips.)
The answer is: it varies, for a few reasons:
With larger-width instructions, you're going to get more expressiveness (either a greater variety of instructions or a greater capacity to encode data into those instructions directly), which can mean a reduced number of instructions flowing through the machine, which is generally a win: so ++64bit here.
But sometimes larger instructions might take more cycles to decode and execute, because they may be more complex. So a possible --64bit here.
Also, you need to transfer these instructions to and from the CPU: 64 bit instructions are twice as big as 32 bit instructions, which means more traffic to and from memory and the caches. CPUs are structured to ameliorate a lot of this cost, but it is a slight --64bit here.
More registers are usually available in wider instruction sets, which causes less data traffic to and from the stack and or memory. So ++64bit here.
And as everyone's no doubt going to mention, you have the ability to address more memory.
(Nearly forgot this one) the native "long" or "int" size may go up, depending on architecture, meaning data structures based on these get larger. Larger = more memory to move around, which means more possible waiting on data moving: --64bit if you're not careful.
Depending on your architecture, a lot of other concerns may apply too. You can rest assured that the processor and compiler vendors are working their butts off to reduce the "--"s above and increase the "++"s.
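Whether the native "int" or "long" actually grew on a given platform can be checked with Python's `ctypes`, which exposes the C compiler's type sizes. The function name `native_sizes` is made up for this sketch, and the results depend on the OS and on how the interpreter was built (for example, "long" is typically 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows).

```python
import ctypes


def native_sizes():
    """Sizes in bytes of a few native C types on this platform."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "pointer": ctypes.sizeof(ctypes.c_void_p),
    }


print(native_sizes())
```

Any data structure built from the types that grew will grow with them, which is the "--64bit" cost noted in the list above.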
I have this 5GByte database that needs converting. On a 64-bit system, I just put all data in collections. In the 32-bit system, I had to think about the order in which to load and convert. The problem is not run-time, it is engineering time. Switching to 64 bit saves weeks of development time.
The compatibility issues: that's no bug, that's a feature. It shows you who has written clean software.
There are also some security advantages to using 64-bit operating systems. There have been some buffer overflow exploits that circumvent address space layout randomization by brute force. On a 64-bit OS, there are simply too many addresses for this kind of attack to be successful.
It will speed up compilation if your compile process is memory-bound and you use your 64bit OS to increase the amount of memory usable by your system.
I expect it to be slightly slower, I had that experience with FC10. I don't have real reasons, but it is definitely not the sizeof(pointer) issue. (*)
My own hunch is that it simply is a matter of less optimized drivers or tweaked chipsets.
Also NTFS-3g was funny under 64-bit, while it worked under 32-bit (same distro, same kernel same partition, it just "hung" in some circumstances)
(*) Most compiling is disk bound, not CPU bound. Moreover, there are other improvements in the x86_64 architecture that cancel out that effect (better PIC, more regs, SSE2 on by default, 686 cmov on by default), unless your app does nothing other than randomly move small blocks around.