Ubuntu Firebird2 Classic Server config to use all CPU - linux

I am using Firebird 2.0.7 CS on my Ubuntu 16.04 Server. Upgrading to a newer version is not possible because the software we use requires this older one.
I've used the SuperServer version before, but on Linux the parameter CpuAffinityMask is ignored.
The SuperServer version performs terribly because on Linux it uses only one core.
The Classic Server version is a little better, because it assigns one core per user.
When I run a demanding task in the program, fb_inet_server uses 100% of one core while the other 23 cores sit idle. How can I assign more cores to this process?

The CpuAffinityMask setting is only for SuperServer (and then only for Windows).
If you are using Classic Server, then Firebird can (and will) use all cores if there is sufficient activity. However, the Firebird processes need to coordinate their work, which - if there is a lot of lock contention - can lead to reduced performance.
To reduce lock contention, you may want to increase the LockHashSlots setting.
Increasing the number of page buffers may also help, but keep in mind that with Classic Server, this setting is per process and can increase memory usage.
Contrary to what you state, Firebird does not assign "one core per user". Classic Server creates a process per connection, and the threads of these processes are scheduled by the OS on any available core.
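For reference, both of those settings live in firebird.conf. A minimal sketch of what such a change might look like - the values here are purely illustrative and need testing against your workload, and remember that with Classic Server the page buffers are allocated per connection process:

# firebird.conf (Classic Server) - illustrative values only
LockHashSlots = 30011          # prime values are recommended for the lock hash table
DefaultDbCachePages = 2048     # page buffers per attachment/process; total RAM use scales with connection count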

Related

VM - LLVMPIPE (Software Rendering) and Effect of Core/Thread Usage on Application

I have written an application (Qt/C++) that creates a lot of concurrent worker threads to accomplish its task, utilizing QThreadPool (from the Qt framework). It has worked flawlessly running on a dedicated server/hardware.
A copy of this application is now running in a virtual machine (RHEL 7), and performance has suffered significantly: the thread pool's queue is being exercised quite extensively and work is getting backed up, despite the VM making more cores available to the application than the dedicated, non-virtualized server did.
Today, I did some troubleshooting with the top -H -p <pid> command, and found that there were 16 total llvmpipe-# threads running all at once, apparently for software rendering of my application's very simple graphical display. It looks to me like the presence of so many of these rendering threads has left limited resources available for my actual application's threads to be concurrently running. Meaning, my worker threads are yielding/taking a back seat to these.
As this is a small/simple GUI running on a server, I don't care to dedicate so many threads to software rendering of its display. I read some Mesa3D documentation about utilizing the LP_NUM_THREADS environment variable, to limit its use. I set it to LP_NUM_THREADS=4, and as a result I seem to have effectively opened up 12 cores for my application to now use for its worker threads.
Does this sound reasonable, or will I pay some sort of other consequence for doing this?
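For reference, a minimal sketch of how that variable can be applied, assuming the application is started from a shell or wrapper script (my_qt_app is a placeholder for the real binary):

export LP_NUM_THREADS=4   # cap llvmpipe's software-rendering worker threads
./my_qt_app               # placeholder for the actual Qt application binary

If the application is launched by systemd instead, the same variable can be set with an Environment=LP_NUM_THREADS=4 line in its unit file.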

Should I disable NUMA if my application is not NUMA-Aware

Running Windows Server 2008 R2 SP1. The application I'm running was not designed with NUMA in mind. Would it be better to disable NUMA on my dual-socket system? My guess is yes, but I wanted to confirm. My server is a Westmere dual-socket system.
If your application is not multithreaded, or is multithreaded but does not employ the threads to work simultaneously on the same problem (e.g. is not data parallel), then you can simply bind the program to one of the NUMA nodes. This can be done with various tools, e.g. with the "Set Affinity..." context menu command in Windows Task Manager. If your program is parallel, then you can still use half of the available processor cores and bind to one NUMA node.
Note that remote memory accesses on Westmere systems are not that expensive - the latency is 1.6x higher than local access and the bandwidth is almost the same as the local one - therefore if you do a lot of processing on each memory value the impact would be minimal. On the other hand, disabling NUMA on such systems results in a fine-mesh interleave of both NUMA domains, which makes all applications perform equally badly, as roughly 50% of all memory accesses will be local and 50% will be remote.
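If you prefer to script the binding rather than use Task Manager, Windows Server 2008 R2 and later can also start a process on a chosen NUMA node from the command line; a sketch, with myapp.exe standing in for your application:

REM run the application on NUMA node 0; an optional /AFFINITY mask can additionally pin it to specific cores of that node
start /NODE 0 myapp.exe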
If I understand correctly, turning NUMA on cannot harm the performance.
If your application is not NUMA aware, accesses will be managed by the OS, so might be across NUMA nodes or might be on the same one - depending on what other pressures the OS has, how much memory / CPU you're using, etc. The OS will try to get your data fast.
If you have it turned off, the OS doesn't know enough to even try to keep each application's execution CPU close to its memory.

Making an Operating System Give More Memory Access to Erlang

I have always run Erlang applications on powerful servers. However, sometimes you cannot avoid memory errors like the following, especially when there are many users:
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 467078560 bytes of memory (of type "heap").
What makes it more annoying is that the server has 20GB of RAM and, say, 8 cores. The amount of memory Erlang says it could not allocate (and which caused the crash) is also disturbing, because it is very little compared to what the server has in stock.
My question today (I hope it is not closed) is: what operating system configuration can be done (consider Red Hat, Solaris, Ubuntu, or Linux in general) to make it offer the Erlang VM more memory when it needs it? If one is to run an Erlang application on such capable servers, what memory considerations (outside Erlang) should be made regarding the underlying operating system?
Problem background: Erlang consumes main memory, especially when processes number in the thousands. I am running a web service using the Yaws web server. On the same node, I have Mnesia running with about 3 ram_copies tables. It is a notification system, part of a larger web application running on an intranet. Users access this system via JSONP from the main application, which runs on a different web server and different hardware. Each user connection queries Mnesia directly for any data it needs. However, as users increase, I always get the crash dump. I have tweaked the application itself as much as possible: cleaned up the code, used binaries rather than strings, and avoided single points of contention such as gen_servers between the Yaws processes and Mnesia, so that each connection hits Mnesia directly. The server is very capable, with lots of RAM and disk space. However, my node crashes when it needs a little more memory, which is why I need a way of making the operating system give Erlang more memory. The operating system is Red Hat Enterprise Linux 6.
It is probably because you are running in 32-bit mode, where only approximately 4 GB of RAM is addressable. Try switching to the 64-bit version of Erlang and try again.
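A quick way to check which emulator is actually running; a one-liner sketch that prints the VM's word size in bytes (8 for a 64-bit build, 4 for a 32-bit one):

erl -noshell -eval 'io:format("~p~n", [erlang:system_info(wordsize)]), halt().'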
Several server tutorials I have read say that if the service runs as a non-root user, you may have to edit /etc/security/limits.conf to allow that user to use more memory than it is typically allowed. The example below lets user fooservice lock up to 2GB (the memlock value is in KB):
fooservice hard memlock 2097152
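After fooservice logs in again (limits.conf is applied at login via pam_limits), the new limit can be checked from that user's shell:

ulimit -l    # should report 2097152 (KB), matching the limits.conf entry above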

Node.js with V8 suitable for limited memory device?

I would like to know whether Node.js with the V8 engine is suitable to be deployed on a limited-memory device (e.g. 256 MB) while running in parallel with other processes.
I have read that it will hog the machine's resources. Is there a way to limit the memory and processing usage of the V8 engine itself?
256 MB is a sufficient amount of RAM to run Node.js (e.g. on a Linux VPS instance), assuming no other memory-hogging software is run. Node has a --max-stack-size argument for limiting memory usage.
Node's single-threaded, evented server model generally makes efficient use of resources, but V8, due to its JIT architecture, is likely to use somewhat more memory than interpreted/byte-compiled implementations such as PHP or CPython (while offering superior performance). Also, to take advantage of multiple CPU cores, multiple Node.js processes must be run (versus memory-sharing threads), effectively multiplying the memory usage, but this limitation applies to its most popular competitors as well.
With respect to "running in parallel with other processes" or "hogging the machine's resources", there is nothing special about running a Node.js process (except the not-uncommon multicore issue); it behaves much like any other userland program. You can give the Node.js process a low priority at the OS level (e.g. with nice), but depending on your device/application, I/O can potentially be more of an issue.
Purely from a technical/effectiveness perspective, Erlang is probably a more ideal choice of high-level language when true multiprocessing support and high concurrency are required.
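As a rough illustration of both knobs mentioned above (lowering scheduling priority and capping V8's memory); a sketch only, where app.js is a placeholder and flag availability varies between Node versions (--max-old-space-size, in MB, is the commonly available V8 heap cap):

nice -n 10 node --max-old-space-size=128 app.js    # low priority, V8 old-generation heap capped at roughly 128 MB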
64MB of RAM is sufficient for V8 and Node.js
See "Compiling Node.js for Arduino YÚN"
and also "installing Node.js on Arduino YÚN".
Arduino YÚN runs Linux with 64MB of RAM.
The BeagleBone has 256MB RAM (and in a normal configuration, no virtual memory), and it runs node.js quite nicely.

CPU usage of Oracle installed Database machine

I am using Oracle 11g and I have an application coded in the Spring framework. When I configure the database on a Sun Fire X4170 running Linux, the machine's CPU utilization is around 80-100%; however, when I move the same database to a Sun M3000 server running a Unix OS (supposedly a more powerful machine), the application's performance goes down and CPU utilization remains at 90-100%. I can't figure out whether it is the application that is causing such utilization or the database design.
I should add that the database is not relational; things are handled by the application.
Well you certainly can find some interesting opinions on the intertubes.
"Oracle does not have a true server architecture (others have it). Rather than performing classic server tasks, such as multi-threading, caching of data pages, parallel processing (split a query across many devices) etc. within itself, it uses the o/s to do all that. That means for each user process (PL/SQL connection) there is one unix process; 1000 users means 1000 unix processes, all competing for the same resources."
You might note that Oracle has had:
- a connection pooling architecture (multi-threaded server) since version 7 (1992)
- a cache for data pages (known helpfully as the buffer cache) since forever
- parallel query (splitting a query across many processes) since version 7.1 (1993)
- splitting queries across multiple servers since OPS (version 6) or across distributed databases (version 5)
It's also noteworthy that even if all that was said were correct rather than incorrect, it doesn't actually help you in determining the root cause.
"Especially noteworthy, because it uses file system files (not raw partitions), and the 'caching' is outside, it relies heavily on (and is very sensitive to) the file system cache that you have set up. Likewise, Oracle needs a massive amount of memory for these processes."
Oracle certainly can use raw partitions, again dating back to the last millennium. Moreover, if you wish to cache within the database - using the buffer cache that PerformanceDBA has forgotten about - and bypass the filesystem cache, this feature is available on all current filesystems. Oracle also supplies its own combined filesystem/volume manager in ASM, which you can use if you wish.
Oracle is also rather well instrumented (and if you have access to dtrace, so is Solaris) and can certainly tell you which sessions, processes etc. are using the CPU and what the time the application spends in the database is consumed by (down to individual block read times if you care), so it is very amenable to profiling. I'd recommend that you check out Thinking Clearly about Performance, available at http://www.method-r.com/downloads/cat_view/38-papers-and-articles and written by one of the top Oracle performance experts in the world. If you have access to the Oracle Diagnostics Pack, then looking first at ADDM reports and then at AWR reports would be profitable.
Trying to avoid a flame war here.
I should probably have separated the "how to find out" part of my response more clearly from my responses to the comments about server architecture from PerformanceDBA. I share Stephanie's suspicions about the Spring framework, but without properly scoped measurement evidence there is no point in blaming any particular attribute of the environment; that would just be bias. Fortunately, the instrumentation built into the Oracle kernel allows you to trace and then profile the slow sessions to determine exactly where the issue lies. So I would do the following (a rough command sketch appears below):
1) enable tracing for a representative session (you can use the dbms_monitor package for that).
2) also gather an execution plan for the statement(s) involved, using the gather_plan_statistics hint.
3) profile the trace file by time using an appropriate profiler (tkprof, orasrp, method-r profiler).
Investigate the problem statements in order of their contribution to response time.
If you can't carry out the above, then you can use ADDM and/or AWR if licensed, as I originally suggested, or Statspack if you are not licensed for the Diagnostics Pack. ADDM naturally concentrates on time consumers; I suggest that if you are forced down the Statspack route you do the same.
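A rough command sketch of steps 1 and 3, assuming SQL*Plus access and DBMS_MONITOR privileges; the session id/serial# (123/456 here, taken from v$session) and the trace file path/name are placeholders for your system. In SQL*Plus:

-- switch extended SQL trace (with waits and binds) on for the target session
exec dbms_monitor.session_trace_enable(session_id => 123, serial_num => 456, waits => TRUE, binds => TRUE);
-- ... let the representative workload run, then switch it off again:
exec dbms_monitor.session_trace_disable(session_id => 123, serial_num => 456);

Then, from the shell, profile the resulting trace file by elapsed time:

tkprof /path/to/trace/orcl_ora_1234.trc session_profile.txt sort=prsela,exeela,fchela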
The M3000 is certainly a more powerful machine, but it is more suitable for true servers. The X4170 with hyper-threads is more suited for file servers.
I'm not so certain about that. Have any data to support that claim?
An M3000 has one SPARC64 VII processor with 4 cores (tech specs), while an X4170 has 1 or 2 Intel 5500 "Nehalem-EP" processors, each with 4 cores (tech specs). I know that I would expect much more from even a single-processor Nehalem-EP system than from the M3000. Obviously the data will vary slightly with the workload, but I know where I'd put my money.
