32-bit versus 64-bit JVM performance considerations?

Our production environment runs three 32-bit Java 6 JVMs on each Windows 2003 server. Each heap is at its maximum setting (~1.25 GB). We are considering moving to new servers and using 64-bit JVMs. Presumably we could then replace the three 32-bit JVMs on each server with a single 64-bit JVM, because of the much larger heap size a 64-bit JVM allows.
Has anyone done this, and do you have any lessons learned?
I am specifically concerned with any performance considerations and what to do to compensate.

The memory requirements will go up, as it is more expensive to address objects on a 64-bit architecture. How much really depends on your application. A wild guess would be 10%-20%, assuming that you are not hashing 4 GB of Integers...
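To put a number on that for your own object mix rather than guessing, a minimal sketch along the following lines (class and field names are invented for illustration) can be run under both the 32-bit and the 64-bit JVM and the reported heap usage compared; on a 64-bit JVM that supports it, -XX:+UseCompressedOops is also worth trying.

import java.util.ArrayList;
import java.util.List;

// Rough heap-footprint probe: run once on the 32-bit JVM and once on the
// 64-bit JVM (optionally with -XX:+UseCompressedOops) and compare the output.
public class FootprintProbe {

    // A small object loosely standing in for typical session/domain data;
    // substitute something closer to your real classes.
    static class Session {
        String user;
        long lastAccess;
        Session(String user, long lastAccess) {
            this.user = user;
            this.lastAccess = lastAccess;
        }
    }

    public static void main(String[] args) {
        final int count = 1000000;
        List<Session> sessions = new ArrayList<Session>(count);
        for (int i = 0; i < count; i++) {
            sessions.add(new Session("user" + i, System.currentTimeMillis()));
        }
        System.gc(); // encourage a collection so the numbers are less noisy
        Runtime rt = Runtime.getRuntime();
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        System.out.println(count + " sessions -> ~" + usedBytes + " bytes used ("
                + (usedBytes / count) + " bytes per object, including overhead)");
        System.out.println("retained: " + sessions.size()); // keep the list reachable
    }
}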
You may also get problems with lock contention on locks in loggers, thread pools, connection pools, singletons, etc., since consolidating three JVMs into one means all of that traffic now competes for the same locks. It is probably not a problem if your application is database-centric, but if, for example, your application stores a lot of sessions in a map and accesses that map a lot, you might get problems. The contention "wall" can come rather quickly, in my experience.
However, there is only one way to know: test it.
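As a starting point for such a test, a crude sketch like the one below (the thread and iteration counts are arbitrary) hammers one shared map from many threads; run it with a synchronized HashMap and again with a ConcurrentHashMap, and raise the thread count toward what one consolidated JVM would actually see.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Crude contention probe: many threads read and write one shared map.
// Compare elapsed times for Collections.synchronizedMap versus
// ConcurrentHashMap as the thread count grows.
public class ContentionProbe {
    public static void main(String[] args) throws Exception {
        final int threads = 32;           // arbitrary; try the combined load of all three JVMs
        final int opsPerThread = 200000;  // arbitrary
        final Map<Integer, String> sessions =
                Collections.synchronizedMap(new HashMap<Integer, String>());
        // alternative: new java.util.concurrent.ConcurrentHashMap<Integer, String>();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(new Runnable() {
                public void run() {
                    for (int i = 0; i < opsPerThread; i++) {
                        int key = (id * opsPerThread + i) % 10000;
                        sessions.put(key, "session-" + key);
                        sessions.get(key);
                    }
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        System.out.println("elapsed ms: " + (System.nanoTime() - start) / 1000000);
    }
}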

I've not done this, but this should be perfectly fine as long as you aren't relying on any 32-bit JNI code for which you do not have 64-bit versions. If all you are using is pure Java, there should be no problems.

Related

Why do so many applications allocate an incredibly large amount of virtual memory while not using any of it?

I've been watching a weird phenomenon in programming for quite some time, since overcommit is enabled by default on Linux systems.
It seems to me that pretty much every high-level application (e.g. an application written in a high-level language like Java, Python or C#, including some desktop applications written in C++ that use large libraries such as Qt) uses an insane amount of virtual memory. For example, it's normal for a web browser to allocate 20 GB of RAM while using only 300 MB of it, or for a desktop environment, a MySQL server, and pretty much every Java or Mono application, and so on, to allocate tens of gigabytes of RAM.
Why is that happening? What is the point? Is there any benefit in this?
I noticed that when I disable overcommit on Linux, in the case of a desktop system that actually runs a lot of these applications, the system becomes unusable, as it doesn't even boot up properly.
Languages that run their code inside virtual machines (like Java (*), C# or Python) usually assign large amounts of (virtual) memory right at startup. Part of this is necessary for the virtual machine itself, part is pre-allocated to parcel out to the application inside the VM.
With languages executing under direct OS control (like C or C++), this is not necessary. You can write applications that dynamically use just the amount of memory they actually require. However, some applications/frameworks are still designed in such a way that they request a large chunk of memory from the operating system once and then manage that memory themselves, in the hope of being more efficient about it than the OS.
There are problems with this:
It is not necessarily faster. Most operating systems are already quite smart about how they manage their memory. Rule #1 of optimization: measure, optimize, measure.
Not all operating systems have virtual memory. There are some quite capable ones out there that cannot run applications that are so "careless" in assuming that you can allocate lots and lots of "not real" memory without problems.
You already found out that if you turn your OS from "generous" to "strict", these memory hogs fall flat on their noses. ;-)
(*) Java, for example, cannot expand its maximum heap once it is started. You have to give the maximum size as a parameter (-Xmx<size>, e.g. -Xmx4g). Thinking "better safe than sorry" leads to severe over-allocation by certain people / applications.
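For the Java case specifically, the ceiling that footnote describes is visible from inside the process; a minimal check (the flag values in the comment are just examples) looks like this:

// Launch with explicit sizing, e.g.:
//   java -Xms256m -Xmx4g HeapLimits
// -Xmx fixes the maximum heap for the life of the JVM; -Xms is committed up
// front, which contributes to the large allocation at startup discussed above.
public class HeapLimits {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        System.out.println("max heap (-Xmx):       " + rt.maxMemory() / mb + " MB");
        System.out.println("currently committed:   " + rt.totalMemory() / mb + " MB");
        System.out.println("free within committed: " + rt.freeMemory() / mb + " MB");
    }
}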
These applications usually have their own method of memory management, which is optimized for their own usage and is more efficient than the default memory management provided by the system. So they allocate a huge memory block up front, to skip or minimize the memory management provided by the system or libc.
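To make that concrete, a toy version of the pattern (a bump-pointer "arena" that grabs one big block up front and parcels it out itself; the class and sizes are invented for illustration) might look like this:

import java.nio.ByteBuffer;

// Toy illustration of "allocate one huge block, manage it yourself":
// a bump-pointer arena that carves slices out of a single up-front
// allocation instead of asking the runtime for each piece.
public class Arena {
    private final ByteBuffer block;

    public Arena(int totalBytes) {
        // One large request to the runtime/OS; everything else is internal.
        this.block = ByteBuffer.allocateDirect(totalBytes);
    }

    // Hand out the next 'size' bytes; no per-allocation call to the system.
    public synchronized ByteBuffer allocate(int size) {
        if (block.remaining() < size) {
            throw new OutOfMemoryError("arena exhausted");
        }
        ByteBuffer slice = block.slice();
        slice.limit(size);
        block.position(block.position() + size);
        return slice;
    }

    // Everything is freed at once, which is the usual trade-off of this scheme.
    public synchronized void reset() {
        block.clear();
    }

    public static void main(String[] args) {
        Arena arena = new Arena(64 * 1024 * 1024); // reserve 64 MB up front
        ByteBuffer a = arena.allocate(1024);
        ByteBuffer b = arena.allocate(4096);
        System.out.println("handed out " + (a.remaining() + b.remaining())
                + " bytes from one block");
    }
}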

System.OutOfMemoryException error

We are working on a rich client application in which many threads are running and third-party controls are used. After running the application for about an hour, it starts throwing 'System.OutOfMemoryException' until we restart it. I have searched many sites for help, but no particular, specific cause is given.
Thanks.
It sounds pretty self-explanatory: your system doesn't have enough memory. If you're still running the application as 32-bit, then moving to 64-bit might solve the problem. I had exactly that problem on Server 2008 R2 recently, and moving to 64-bit did solve it. But if you're already 64-bit, then perhaps the server doesn't have enough physical memory, in which case you need to add more memory or work out how to make your application less memory hungry. There could be objects that could be discarded but that it's keeping references to, etc., and if that's the case you should try profiling to identify what's hogging the most memory. Beyond that, does the application use any unmanaged DLLs, e.g. COM objects written in C++ or similar? Maybe there's a memory leak outside of the managed framework.
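The question is about .NET, but the "objects that could be discarded but are still referenced" pattern looks the same in any managed runtime. In Java terms (a contrived example with invented names), it is typically a long-lived collection that only ever grows; a memory profiler pointed at the heap will show such a structure dominating the retained size.

import java.util.ArrayList;
import java.util.List;

// Contrived example of the leak pattern described above: a static collection
// keeps references to objects the rest of the program has finished with, so
// the garbage collector can never reclaim them and memory use only grows.
public class LeakyCache {
    private static final List<byte[]> HISTORY = new ArrayList<byte[]>();

    static void handleRequest(int i) {
        byte[] response = new byte[512 * 1024]; // only needed briefly...
        HISTORY.add(response);                  // ...but retained forever
    }

    public static void main(String[] args) {
        for (int i = 0; ; i++) {
            handleRequest(i);                   // eventually: OutOfMemoryError
            if (i % 100 == 0) {
                System.out.println("retained blocks: " + HISTORY.size());
            }
        }
    }
}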
I recommend using a profiler to identify where the high memory consumption comes from.

Making an Operating System Give More Memory Access to Erlang

I have always run Erlang applications on powerful servers. However, sometimes you cannot avoid memory errors like the one below, especially when there are many users:
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 467078560 bytes of memory (of type "heap").
What makes it more annoying is that you have a server with 20 GB of RAM and, say, 8 cores. Looking at the amount of memory which Erlang says it could not allocate, and which is why it crashed, is also disturbing, because it is very little memory compared to what the server has in stock.
My question today (I hope it is not closed) is: what operating system configuration can be done (consider Red Hat, Solaris, Ubuntu or Linux in general) to make it offer the Erlang VM more memory when it needs it? If one is to run an Erlang application on such capable servers, what memory considerations (outside Erlang) should be made regarding the underlying operating system?
Problem background: Erlang consumes main memory, especially when processes number in the thousands. I am running a web service using the Yaws web server. On the same node, I have Mnesia running with about 3 ram_copies tables. It's a notification system, part of a larger web application running on an intranet. Users access this system via JSONP from the main application, which runs off a different web server and different hardware as well. Each user connection queries Mnesia directly for any data it needs. However, as users increase I always get the crash dump. I have tweaked the application itself as much as possible, cleaned up the code to standard, used more binaries than strings, etc., and avoided single points like gen_servers between Yaws processes and Mnesia, so that each connection hits Mnesia directly. The server is very capable, with lots of RAM and disc space. However, my node crashes when it needs a little more memory; that's why I need to find a way of forcing the operating system to give more memory to Erlang. The operating system is Red Hat Enterprise Linux 6.
It is probably because you are running in 32-bit mode, where only approximately 4 GB of RAM is addressable. Try switching to the 64-bit version of Erlang and try again.
Several server tutorials I have read say that if the service runs as a non-root user, you may have to edit /etc/security/limits.conf to allow that user to access more memory than it is typically allowed. The example below lets the user fooservice use up to 2 GB of locked memory (the memlock value is in KB):
fooservice hard memlock 2097152

Do performance stats like Geekbench represent general multi-tasking performance?

I am trying to compare how an i7 dual-core 2.7 GHz would perform vs. an i7 quad-core 2.0 GHz in a multitasking environment. The quad-core scores around 9000 while the dual-core comes in at around 7500 (in Geekbench). At the same time, Geekbench explicitly states that the tests show the full performance potential of all the cores. However, in real-world, everyday use, almost none of the applications I would be running are multi-threaded (Ruby runtime, Java IDE, Windows VM on a Mac, app server).
This machine would serve as a web development machine. Which CPU would be most "snappy" in terms of response time in this use case?
Results of a benchmark have practical meaning only if the benchmark very closely approximates your typical workload.
You should consider whether your typical development environment regularly calls for parallelism. For example, if I develop a C/C++/Java app, it's common for a change to a header file (or Java source) to cause several other files to be recompiled and a few binaries to be relinked; that's a highly parallel workload, and a many-core CPU may prove advantageous.
On the other hand, if I'm changing a few Python or JavaScript sources, I doubt I will create any parallel workload when I try to execute and test the changes.
However, these are theoretical considerations.
I don't think the speed of the machine is a bottleneck in any development effort. The human is.

CPU usage of a machine with Oracle Database installed

I am using Oracle 11g and I have an application coded with the Spring framework. When I configure the database on a Sun Fire X4170 running Linux, the machine's CPU utilization is around 80-100%; however, when I move the same database to a Sun M3000 server running a Unix OS (supposedly a more powerful machine), application performance goes down and CPU utilization stays at 90-100%. I can't figure out whether it is the application causing such utilization or the database design.
I should add that the database is not relational; things are handled by the application.
Well you certainly can find some interesting opinions on the intertubes.
"Oracle does not have a true server architecture (others have it). Rather than performing classic server tasks, such as multi-threading, caching of data pages, parallel processing (split a query across many devices) etc. within itself, it uses the o/s to do all that. That means for each user process (PL/SQL connection) there is one unix process; 1000 users means 1000 unix processes, all competing for the same resources."
You might note that Oracle has had
a connection pooling architecture (multi-threaded server) since version 7 (1992).
a cache for data pages (known helpfully as the buffer cache) since forever
parallel query (splitting a query across many processes) since version 7.1 (1993)
splitting queries across multiple servers since OPS (version 6) or across distributed databases (version 5)
It's also noteworthy that, even if all of that were correct rather than incorrect, it doesn't actually help you determine the root cause.
"Especially noteworthy, because it uses file system files (not raw partitions), and the 'caching' is outside, it relies heavily on (and is very sensitive to) the file system cache that you have set up. Likewise, Oracle needs a massive amount of memory for these processes."
Oracle certainly can use raw partitions, again dating back to the last millennium. Moreover, if you wish to cache within the database (using the buffer cache that PerformanceDBA has forgotten about) and bypass the filesystem cache, that option is available on all current filesystems. Oracle also supplies its own combined filesystem/volume manager, ASM, which you can use if you wish.
Oracle is also rather well instrumented (and if you have access to DTrace, so is Solaris) and can certainly tell you which sessions and processes are using the CPU, and what the time the application spends in the database is consumed by (down to individual block read times if you care), so it is very amenable to profiling. I'd recommend that you check out Thinking Clearly about Performance, available at http://www.method-r.com/downloads/cat_view/38-papers-and-articles and written by one of the top Oracle performance experts in the world. If you have access to the Oracle Diagnostics Pack, then checking out first of all ADDM reports and secondly AWR reports would be profitable.
Trying to avoid a flame war here.
I should probably have separated the "how to find out" part of my response more clearly from my responses to the comments about server architecture from PerformanceDBA. I share Stephanie's suspicions about the Spring framework, but without properly scoped measurement evidence there is no point in blaming any particular attribute of the environment; that would just be bias. Fortunately the instrumentation built into the Oracle kernel allows you to trace and then profile the slow sessions to determine exactly where the issue lies. So I would do the following:
1) Enable tracing for a representative session (you can use the dbms_monitor package for that); a JDBC sketch of this and the next step follows below.
2) Also gather an execution plan for the statement(s) involved, using the gather_plan_statistics hint.
3) Profile the trace file by time using an appropriate profiler (tkprof, OraSRP, Method R Profiler).
Investigate the problem statements in order of their contribution to response time.
If you can't carry out the above, then you can use ADDM and/or AWR if licensed for the Diagnostics Pack, as I originally suggested, or Statspack if not. ADDM naturally concentrates on time consumers; if you are forced down the Statspack route, I suggest you do the same.
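Since the application side here is Java/Spring, steps 1 and 2 can be driven straight over JDBC during a test run. A rough sketch follows; the connection details, table and predicate are placeholders, and the Oracle JDBC driver (ojdbc) must be on the classpath.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Rough sketch of steps 1 and 2: switch on SQL trace for the current session
// via DBMS_MONITOR, run a suspect statement with the gather_plan_statistics
// hint, then switch tracing off. The resulting trace file (written to the
// database server's trace directory) can then be profiled with tkprof or a
// similar tool, as in step 3.
public class TraceOneSession {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "appuser", "secret");
        try {
            // Step 1: enable tracing (waits on, binds off) for this session.
            CallableStatement on = conn.prepareCall(
                    "begin dbms_monitor.session_trace_enable(waits => true, binds => false); end;");
            on.execute();
            on.close();

            // Step 2: run a representative statement with the hint so that
            // row-source execution statistics are gathered for its plan.
            PreparedStatement ps = conn.prepareStatement(
                    "select /*+ gather_plan_statistics */ count(*) from orders where status = ?");
            ps.setString(1, "OPEN");
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
            rs.close();
            ps.close();

            // Switch tracing off again before releasing the connection.
            CallableStatement off = conn.prepareCall(
                    "begin dbms_monitor.session_trace_disable; end;");
            off.execute();
            off.close();
        } finally {
            conn.close();
        }
    }
}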
The M3000 is certainly a more powerful machine, but it is more suitable for true servers. The X4170 with hyper-threads is more suited for file servers.
I'm not so certain about that. Have any data to support that claim?
An M3000 has one SPARC64 VII processor with 4 cores (tech specs), while an X4170 has 1 or 2 Intel Xeon 5500 "Nehalem-EP" processors, each with 4 cores (tech specs). I know that I would expect much more from even a single-processor Nehalem-EP system than from the M3000. Obviously the data will vary somewhat with the workload, but I know where I'd put my money.
