Making an Operating System Give More Memory Access to Erlang

Making an Operating System Give More Memory Access to Erlang - linux

I have always run erlang applications on powerful servers. However, sometimes, you cannot avoid such memory errors, especially when users are many
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 467078560 bytes of memory (of type "heap").
What makes it more annoying is that you have a server with 20GB of RAM, with say 8 cores. Looking at the memory which erlang says, it could not allocate and that is why it crashed, is also disturbing , because it is very little memory compared to what the server has in stock.
My question today (i wish it is not closed) , is that, what Operating system configurations can be done (consider RedHat , Solaris, Ubuntu or Linux in general), to make it offer the erlang VM more memory when it needs it ? If one is to run an erlang application on such capable servers, what memory consideration (outside erlang) should be made as regards the underlying operating system ? Problem Background Erlang consumes Main Memory, especially when processes are in thousands. I am running a Web service using Yaws Web Server. On the same node, i have Mnesia running with about 3 ram_copies tables. Its a notification system, as part of a larger Web application running on an intranet. Users access this very system via JSONP from the main application running off a different web server and a different hardware as well. Each user connection queries mnesia directly for any data it needs. However, as users increase i always get the crash dump. I have tweaked the application itself as much as possible, clean up the code to standard, used more binaries than strings e.t.c. avoided single points like gen_servers between yaws processes and mnesia, so that each connection, just hits mnesia directly. The server is very capable with lots of RAM and Disc Space. However, my node crashes when it needs a little more memory, thats why i need to find a way of forcing the Operating system to expand more memory to erlang. Operating system is REDHAT ENTERPRISE 6

It is probably because you are running in 32bit mode where only approx 4 GB of RAM is addressable. Try switching to the 64bit version of erlang and try again.

Several various server tutorials I have read say that if the service runs as a non root user, you may have to edit the /etc/security/limits.conf to allow that user to access more memory than it is typically allowed. the example below lets user fooservice use 2GB.
fooservice hard memlock 2097152

Related

For a node web server, is it better to have more vCPUs or RAM

I am running a node app on a Digital Ocean cloud server, and the app merely services API requests. All client-side assets are served by a CDN, and the DB is accessed remotely, rather than stored on the server instance itself.
I have the choice of a greater number of vCPUs or RAM. I have no idea what that means in any way, so any feedback is a great help.

A single node.js server will run your Javascript on only one CPU so it doesn't help your Javascript run any faster to have more CPUs unless you cluster your app and run multiple node.js processes sharing the load of your app or unless there are other processes on the same server that are being used by your server.
Having more RAM (memory) will only improve things if you actually need more RAM. That depends entirely upon what the memory usage profile is of your app and how much RAM you already have available. Probably, you would already know if you were running out of RAM because you either get drastic slow-down when the OS starts page swapping or your process crashes when out of memory.
So, in order to know which would benefit you more, you really need more data on how your existing app is performing (whether it is ever bog down with CPU intensive operations and how much RAM it uses compared to how much you have available). It is quite possible that neither will actually matter to you - it totally depends upon the usage profile or your server process.
If you have no more data than this and have to make a choice, choose the vCPUs because there are some circumstances where it might help you (and gives you the option to go to clustering in the future if needed) whereas adding more RAM when you aren't even using what you already have won't help you at all.

Why so many applications allocate incredibly large amount of virtual memory while not using any of it?

I've been watching some weird phenomena in programming for quite some time, since overcommit is enabled by default on linux systems.
It seems to me that pretty much every high level application (eg. application written in high level programming language like Java, Python or C# including some desktop applications written in C++ that use large libraries such as Qt) use insane amount of virtual operating memory. For example, it's normal for web browser to allocate 20GB of ram while using only 300MB of it. Or for a dektop environment, mysql server, pretty much every java or mono application and so on, to allocate tens of gigabytes of RAM.
Why is that happening? What is the point? Is there any benefit in this?
I noticed that when I disable overcommit in linux, in case of a desktop system that actually runs a lot of these applications, the system becomes unusable as it doesn't even boot up properly.

Languages that run their code inside virtual machines (like Java (*), C# or Python) usually assign large amounts of (virtual) memory right at startup. Part of this is necessary for the virtual machine itself, part is pre-allocated to parcel out to the application inside the VM.
With languages executing under direct OS control (like C or C++), this is not necessary. You can write applications that dynamically use just the amount of memory they actually require. However, some applications / frameworks are still designed in such a way that they request a large chunk memory from the operating system once, and then manage the memory themselves, in hopes of being more efficient about it than the OS.
There are problems with this:
It is not necessarily faster. Most operating systems are already quite smart about how they manage their memory. Rule #1 of optimization, measure, optimize, measure.
Not all operating systems do have virtual memory. There are some quite capable ones out there that cannot run applications that are so "careless" in assuming that you can allocate lots & lots of "not real" memory without problems.
You already found out that if you turn your OS from "generous" to "strict", these memory hogs fall flat on their noses. ;-)
(*) Java, for example, cannot expand its VM once it is started. You have to give the maximum size of the VM as a parameter (-Xmxn). Thinking "better safe than sorry" leads to severe overallocations by certain people / applications.

These applications usually have their own method of memory management, which is optimized for their own usage and is more efficient than the default memory management provided by the system. So they allocate huge memory block, to skip or minimize the effect of the memory management provided by system or libc.

Running Ubuntu with nothing installed uses 500 out of 512MB which process should I kill?

Running linux ubuntu 14.04 on a digitalOcean server which gives me 512MB ram. Surprisingly, when trying to run activator for a play app I came to realice that almost all the memory was used. Using 'htop' command I get this output. which process should I kill (I am using 2 ssh connections, one to monitor and the other one to do stuff).
I could also assign swap memory but that would affect performance. I thought 512MB should be more than enough to run a play server. I mean, seriously, we put a man on the moon with reaaaaly much less.

Linux makes as much use of memory as it can, but that doesn't mean that it's not available for your applications. It will use memory to cache certain things (such as files) and memory for buffers.
In your screenshot you'll see the memory usage bar is made of different coloured sections:
Green is memory in use
Blue is buffer
Yellow is cache
So generally any applications you run that require more memory will allocate it out of the memory used to cache data.
Having swap space is generally a good idea - it won't affect performance unless the kernel starts swapping heavily, but that's generally better than the alternative which is your applications will crash with an out-of-memory error.

Should I disable NUMA if my application is not NUMA-Aware

Running Windows Server 2008 R2 SP1. The application I'm running was not designed with NUMA in mind. Would it be better to disable NUMA on my dual-socket system? My guess is yes, but I wanted to confirm. My server is a Westmere dual-socket system.

If your application is not multithreaded or is multithreaded but does not employ the threads to work simultaneously on the same problem (e.g. is not data parallel), then you can simply bind the program to one of the NUMA nodes. This can be done with various tools, e.g. with the "Set Affinity..." context menu command in Windows Task Manager. If your program is parallel, then you can still use half of the available process cores and bind to one NUMA node.
Note that remote memory accesses on Westmere systems are not that expensive - the latency is 1.6x higher than local access and the bandwidth is almost the same as the local on, therefore if you do a lot of processing on each memory value the impact would be minimal. On the other hand, disabling NUMA on such systems results in fine-mesh interleave of both NUMA domains which makes all applications perform equally bad as roughly 50% of all memory accesses will be local and 50% will be remote.

If I understand correctly, turning NUMA on cannot harm the performance.
If your application is not NUMA aware, accesses will be managed by the OS, so might be across NUMA nodes or might be on the same one - depending on what other pressures the OS has, how much memory / CPU you're using, etc. The OS will try to get your data fast.
If you have it turned off, the OS doesn't know enough to even try to keep each application's execution CPU close to it's memory.

CPU usage of Oracle installed Database machine

I am using oracle 11g and i have an application which is coded in Spring framework. Once i configure the database on Sun fire 4170 installed with Linux the machine's CPU utilization is around 80-100% and, however, when i shift the same database to Sun M3000 server installed with Unix OS (supposedly more powerful machine) the application performance goes down and CPU utilization remains 90-100%. I can't figure out if its the application which is making the such utilization or its the database design.
It is added that the database is not relational; things are handled by the application.

Well you certainly can find some interesting opinions on the intertubes.
Oracle does not have a true server
architecture (others have it).
Rather than performing classic server
tasks, such as multi-threading,
caching of data pages, parallel
processing (split a query across many
devices) etc. within itself, it uses
the o/s to do all that. That means for
each user process (PL/SQL connection)
there is one unix process; 1000 users
means 1000 unix processes, all
competing for the same resources.
You might note that Oracle has had
a connection pooling architecture (multi-threaded server) since version 7 (1992).
a cache for data pages (known helpfully as the buffer cache) since forever
parallel query (splitting a query across many processes) since version 7.1 (1993)
splitting queries across multiple servers since OPS (version 6) or across distributed databases (version 5)
It's also noteworthy that even if all that was said was correct rather than incorrect it doesn't actually help you in determining root cause.
Especially noteworthy, because it uses
file system files (not raw
partitions), and the "caching" is
outside, it relies heavily on (and is
very sensitive to) the file system
cache that you have set up. likewise,
Oracle needs a massive amount of
memory for these processes.
Oracle certainly can use raw partitions again dating back to the last millenium, moreover if you wish to cache within the database - using the buffer cache that PerformanceDBA has forgotten about - and bypass the filesystem cache this feature is available on all current filesystems. Oracle also supplies it's own combined filesystem/volume manager in ASM which you can use if you wish.
Oracle is also rather well instrumented (and if you have access to dtrace so is solaris) and can certainly tell you what sessions, processes etc are using the CPU, what the time the application spends in the database is consumed by (down to individual block read times if you care) and so is very susceptible to profiling. I'd recommend that you check out Thinking Clearly about Performance available at http://www.method-r.com/downloads/cat_view/38-papers-and-articles and written by one of the top Oracle Performance experts in the world. If you have access to the Oracle Diagnostics pack then checking out first of all ADDM reports and secondly AWR reports would be profitable.

Trying to avoid a flame war here.
I should probably have separated out the "how to find out" part of my response more clearly from my responses to the comments about server architecture from PerformanceDBA. I share Stephanie's suspicions about the spring framework, but without properly scoped measurement evidence there is no point in blaming any particular attribute of the environment, that would be just particular bias. Fortunately the instrumentation built into the oracle kernel allows you to trace and then profile the slow sessions to determine exactly where the issue lies. So I would do the following:
1) enable tracing for a representative session (you can use the dbms_monitor package for that).
2) also gather an execution plan for the statement(s) involved with the gather_plan_statistics hint.
3) profile the trace file by time using an appropriate profile (tkprof,orasrp,method-r profiler)
Investigate the problem statements in contribution to response time order.
If you can't carry out the above, then you can use ADDM and/or AWR if licenced as I originally suggested or statspack if not licensed for the diagnostics pack. ADDM naturally concentrates on time consumers, I suggest if you are forced down the statspack route you do the same.

The M3000 is certainly a more powerful machine, but it is more suitable for true servers. The X4170 with hyper-threads is more suited for file servers.
I'm not so certain about that. Have any data to support that claim?
An M3000 has one SPARC64 VII processor with 4 cores (tech specs) while a X4170 has 1 or 2 Intel 5500 "Nehalem-EP" processors each with 4 cores (tech specs). I know that I would expect much more from even a single processor Nehalem-EP system, than the M3000. Obviously data will vary slightly with the workload, but I know where I'd put my money.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string