CPU usage of an Oracle database machine - Linux

I am using Oracle 11g with an application built on the Spring framework. When I run the database on a Sun Fire X4170 running Linux, the machine's CPU utilization is around 80-100%; however, when I move the same database to a Sun M3000 server running a Unix OS (supposedly a more powerful machine), application performance drops and CPU utilization stays at 90-100%. I can't figure out whether it's the application causing this utilization or the database design.
I should add that the database is not used relationally; relationships are handled by the application.

Well, you certainly can find some interesting opinions on the intertubes.
Oracle does not have a true server architecture (others have it). Rather than performing classic server tasks, such as multi-threading, caching of data pages, parallel processing (split a query across many devices) etc. within itself, it uses the o/s to do all that. That means for each user process (PL/SQL connection) there is one unix process; 1000 users means 1000 unix processes, all competing for the same resources.
You might note that Oracle has had:
- a connection pooling architecture (multi-threaded server) since version 7 (1992)
- a cache for data pages (known helpfully as the buffer cache) since forever
- parallel query (splitting a query across many processes) since version 7.1 (1993)
- query splitting across multiple servers since OPS (version 6), and across distributed databases since version 5
It's also worth noting that even if all of the above were correct rather than incorrect, it wouldn't actually help you determine the root cause.
Especially noteworthy, because it uses file system files (not raw partitions), and the "caching" is outside, it relies heavily on (and is very sensitive to) the file system cache that you have set up. Likewise, Oracle needs a massive amount of memory for these processes.
Oracle certainly can use raw partitions, again dating back to the last millennium. Moreover, if you wish to cache within the database - using the buffer cache that PerformanceDBA has forgotten about - and bypass the filesystem cache, this feature is available on all current filesystems. Oracle also supplies its own combined filesystem/volume manager, ASM, which you can use if you wish.
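As a hedged illustration (availability depends on platform, filesystem, and Oracle version), bypassing the filesystem cache in favour of the buffer cache is typically a one-parameter change:

-- Use direct and asynchronous I/O, leaving caching to Oracle's buffer cache.
-- Takes effect after an instance restart; verify platform support first.
ALTER SYSTEM SET filesystemio_options = 'SETALL' SCOPE = SPFILE;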
Oracle is also rather well instrumented (and if you have access to DTrace, so is Solaris) and can certainly tell you which sessions and processes are using the CPU, and what consumes the time the application spends in the database (down to individual block read times, if you care), so it is very amenable to profiling. I'd recommend that you check out Thinking Clearly about Performance, available at http://www.method-r.com/downloads/cat_view/38-papers-and-articles and written by one of the top Oracle performance experts in the world. If you have access to the Oracle Diagnostics Pack, then checking out first ADDM reports and secondly AWR reports would be profitable.
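If you are licensed for the Diagnostics Pack, both reports can be generated from SQL*Plus with the scripts shipped in a standard installation (each prompts for a snapshot range):

-- Run as a DBA from SQL*Plus; ? expands to ORACLE_HOME.
@?/rdbms/admin/addmrpt.sql
@?/rdbms/admin/awrrpt.sql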

Trying to avoid a flame war here.
I should probably have separated the "how to find out" part of my response more clearly from my responses to the comments about server architecture from PerformanceDBA. I share Stephanie's suspicions about the Spring framework, but without properly scoped measurement evidence there is no point in blaming any particular part of the environment; that would just be bias. Fortunately the instrumentation built into the Oracle kernel allows you to trace and then profile the slow sessions to determine exactly where the issue lies. So I would do the following (a sketch of the commands follows the list):
1) Enable tracing for a representative session (you can use the dbms_monitor package for that).
2) Also gather an execution plan for the statement(s) involved, using the gather_plan_statistics hint.
3) Profile the trace file by time using an appropriate profiler (tkprof, orasrp, or the Method R Profiler).
4) Investigate the problem statements in order of their contribution to response time.
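A minimal, hedged sketch of steps 1-3; the SID/serial# pair (123/456), the table name, and the trace file name are placeholders you would substitute from your own system:

-- 1) Switch on extended SQL trace for the target session (placeholders: 123/456).
EXEC dbms_monitor.session_trace_enable(session_id => 123, serial_num => 456, waits => TRUE, binds => FALSE);

-- 2) Run the suspect statement with rowsource statistics, then show the actual plan.
SELECT /*+ gather_plan_statistics */ COUNT(*) FROM some_app_table;
SELECT * FROM TABLE(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));

-- Switch tracing off again.
EXEC dbms_monitor.session_trace_disable(session_id => 123, serial_num => 456);

-- 3) At the OS prompt, profile the trace file by elapsed time, e.g. with tkprof:
-- tkprof orcl_ora_12345.trc profile.txt sort=exeela,fchela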
If you can't carry out the above, then use ADDM and/or AWR if you are licensed for them, as I originally suggested, or Statspack if you are not licensed for the Diagnostics Pack. ADDM naturally concentrates on the big time consumers; if you are forced down the Statspack route, I suggest you do the same.
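If you do end up on the Statspack route, the equivalent report script (standard installation path assumed) is:

-- Statspack report between two snapshots, run from SQL*Plus.
@?/rdbms/admin/spreport.sql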

The M3000 is certainly a more powerful machine, but it is more suitable for true servers. The X4170 with hyper-threads is more suited for file servers.
I'm not so certain about that. Do you have any data to support that claim?
An M3000 has one SPARC64 VII processor with 4 cores (tech specs), while an X4170 has 1 or 2 Intel Xeon 5500 "Nehalem-EP" processors, each with 4 cores (tech specs). I know that I would expect much more from even a single-processor Nehalem-EP system than from the M3000. Obviously the data will vary somewhat with the workload, but I know where I'd put my money.

Related

Performance implications of distribution of threads amongst processes in a server

The question title is pretty awkward, sorry about that.
I am currently working on the design of a server, and a comment came up from one of my co-workers that we should use multiple processes, since there was some performance hit to having too many threads in a single process (as opposed to having the same number of threads spread over multiple processes on the same machine).
The only thing I can think of which would cause this (other than bad OS scheduling) would be increased contention (for example on the memory allocator), but I'm not sure how much that matters.
Is this a 'best practice'? Does anyone have some benchmarks they could share with me? Of course the answer may depend on the platform (I'm interested mostly in Windows/Linux/OS X, although I also need to care about HP-UX, AIX, and Solaris to some extent).
There are of course other benefits to using a multi-process architecture, such as process isolation to limit the effect of a crash, but for this question I'm interested in performance.
For some context, the server is going to service long-running, stateful connections (so they cannot be migrated to other server processes) which send back a lot of data and can also cause a lot of local DB processing on the server machine. It's going to use the proactor architecture in-process, and be implemented in C++. The server will be expected to run for weeks or months without needing a restart (though this may be implemented by rotating new instances transparently under some proxy).
Also, we will be using a multi-process architecture either way; my concern is more about scheduling connections to processes.

Making an Operating System Give More Memory Access to Erlang

I have always run Erlang applications on powerful servers. However, sometimes you cannot avoid memory errors like the one below, especially when users are many:
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 467078560 bytes of memory (of type "heap").
What makes it more annoying is that the server has 20 GB of RAM and, say, 8 cores. The amount of memory Erlang says it could not allocate (and therefore crashed over) is also disturbing, because it is very little compared to what the server has in stock.
My question today (I hope it is not closed) is: what operating system configuration can be done (consider Red Hat, Solaris, Ubuntu, or Linux in general) to make it offer the Erlang VM more memory when it needs it? If one is to run an Erlang application on such capable servers, what memory considerations (outside Erlang) should be made regarding the underlying operating system?

Problem background: Erlang consumes main memory, especially when processes number in the thousands. I am running a web service using the Yaws web server. On the same node, I have Mnesia running with about 3 ram_copies tables. It is a notification system, part of a larger web application running on an intranet. Users access this system via JSONP from the main application, which runs off a different web server on different hardware. Each user connection queries Mnesia directly for any data it needs. However, as users increase I always get the crash dump. I have tweaked the application itself as much as possible: cleaned up the code to a good standard, used binaries rather than strings, and avoided single points of contention like gen_servers between Yaws processes and Mnesia, so that each connection hits Mnesia directly. The server is very capable, with lots of RAM and disk space, yet my node crashes when it needs a little more memory. That is why I need a way of getting the operating system to extend more memory to Erlang. The operating system is Red Hat Enterprise Linux 6.
It is probably because you are running in 32-bit mode, where only approximately 4 GB of RAM is addressable. Try switching to the 64-bit version of Erlang and try again.
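A quick way to check from the Erlang shell (8 means a 64-bit emulator, 4 means 32-bit):

%% Word size, in bytes, of the running emulator.
1> erlang:system_info(wordsize).
8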
Several server tutorials I have read say that if the service runs as a non-root user, you may have to edit /etc/security/limits.conf to allow that user to lock more memory than is typically allowed. The example below lets user fooservice lock up to 2 GB (the value is in KB).
fooservice hard memlock 2097152
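After the change takes effect (the user needs a fresh login), you can confirm it; fooservice is the example user from above:

# Prints the session's max locked-memory limit, in KB.
su - fooservice -c 'ulimit -l'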

Architecture to Sandbox the Compilation and Execution Of Untrusted Source Code

The SPOJ is a website that lists programming puzzles, then allows users to write code to solve those puzzles and upload their source code to the server. The server then compiles that source code (or interprets it if it's an interpreted language), runs a battery of unit tests against the code, and verifies that it correctly solves the problem.
What's the best way to implement something like this - how do you sandbox the user input so that it cannot compromise the server? Should you use SELinux, chroot, or virtualization? All three, plus something else I haven't thought of?
How does the application reliably communicate results outside of the jail while also ensuring that the results are not compromised? How would you prevent, for instance, an application from writing huge chunks of nonsense data to disk, or carrying out other malicious activities?
I'm genuinely curious, as this just seems like a very risky sort of application to run.
A chroot jail executed from a limited user account sounds like the best starting point (i.e. NOT root, and not the same user that runs your webserver).
To prevent huge chunks of nonsense data being written to disk, you could use disk quotas or a separate volume that you don't mind filling up (assuming you're not testing in parallel under the same user - or you'll end up dealing with annoying race conditions)
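A minimal POSIX sketch of that starting point, assuming a jail directory and an unprivileged uid/gid already exist (the paths, ids, and limits are all illustrative):

/* Hypothetical launcher: confine an untrusted binary with chroot + rlimits.
   Must start as root in order to chroot, then drops to an unprivileged user. */
#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit cpu = { 5, 5 };                /* 5 seconds of CPU time   */
    struct rlimit fsz = { 1 << 20, 1 << 20 };    /* 1 MB max file size      */
    struct rlimit mem = { 64 << 20, 64 << 20 };  /* 64 MB address space     */

    if (chroot("/var/jail") != 0 || chdir("/") != 0) { perror("chroot"); return 1; }
    if (setgid(1001) != 0 || setuid(1001) != 0)      { perror("drop privs"); return 1; }

    setrlimit(RLIMIT_CPU, &cpu);
    setrlimit(RLIMIT_FSIZE, &fsz);
    setrlimit(RLIMIT_AS, &mem);

    execl("/submission", "submission", (char *)NULL); /* path inside the jail */
    perror("execl");
    return 1;
}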
If you wanted to do something more scalable and secure, you could use dynamically virtualized hosts with your own server/client solution for communication: you have a pool of 'agents' that receive instructions to copy and compile from X repository or share, then execute a battery of tests and log the output back via the same server/client protocol. Your host process can watch for excessive disk usage and report warnings if required; the agents may or may not execute the code under a chroot jail; and if you're super paranoid, you would destroy the agent after each run and spin up a new VM when the next sample is ready for testing. If you're doing this at large scale in the cloud (e.g. 100+ agents running on EC2), you only ever spin up enough to accommodate demand, which reduces your costs. Again, if you're going for scale you can use something like Amazon SQS to buffer requests; if you're doing an experimental sample project, you could do something much simpler (just think of distributed parallel processing systems, e.g. SETI@home).

User-Contributed Code Security

I've seen some websites that can run code from the browser, and the code is evaluated on the server.
What are the security best practices for applications that run user-contributed code, beyond keeping it from accessing and changing the server's sensitive information?
(For example, using a Python with a stripped-down version of the standard library.)
How do you prevent DoS from non-halting and/or CPU-intensive programs? (We can't use static code analysis here.) What about DoSing the type-check system?
Python, Prolog and Haskell are suggested examples to talk about.
The "best practice" (am I really the only one who hates that phrase?) is probably just not to do it at all.
If you really must do it, set it up to run in a virtual machine (and I don't mean something like a JVM; I mean something that hosts an OS) so it's easy to restore the VM from a snapshot (or whatever the VM in question happens to call it).
In most cases, though, you'll need to go a bit beyond just that. Without some extra work to lock it down, even a VM can use enough resources to reduce responsiveness, making it difficult to kill and restart (you usually can eventually, but "eventually" is rarely what you want). You also generally want to set quotas to limit its total CPU usage, probably limit it to a single CPU (and run it on a machine with at least two), limit its total memory usage, etc. On Windows, for example, you can do at least most of that by starting the VM in a job object and limiting the resources available to the job object.
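A rough Win32 sketch of that idea (the limits and the process name are illustrative; error handling trimmed):

// Hypothetical sketch: run a process inside a job object with a memory cap
// and pinned to one CPU. Compile as C++ against the Win32 API.
#include <windows.h>

int main() {
    HANDLE job = CreateJobObjectW(nullptr, nullptr);

    JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits = {};
    limits.BasicLimitInformation.LimitFlags =
        JOB_OBJECT_LIMIT_PROCESS_MEMORY | JOB_OBJECT_LIMIT_AFFINITY;
    limits.ProcessMemoryLimit = 512 * 1024 * 1024;  // 512 MB per process
    limits.BasicLimitInformation.Affinity = 1;      // CPU 0 only
    SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                            &limits, sizeof(limits));

    // Start the VM process suspended, put it in the job, then let it run.
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    wchar_t cmd[] = L"vmplayer.exe";                // illustrative VM host
    if (CreateProcessW(nullptr, cmd, nullptr, nullptr, FALSE,
                       CREATE_SUSPENDED, nullptr, nullptr, &si, &pi)) {
        AssignProcessToJobObject(job, pi.hProcess);
        ResumeThread(pi.hThread);
    }
    return 0;
}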

Linux vs Win runtime timings

I have an application which was ported from Windows to Linux. The same code compiles under Visual C++ and g++, but there is a performance difference between running on Windows and on Linux. The application's purpose is caching: it is a node between a server and a client, and it caches client requests and server responses in a list, so that for any client making a request the server has already processed, this node responds instead of forwarding the request to the server.
When this node runs on Windows, the client gets all it needs in about 7 seconds. But when the same node runs on Linux (Ubuntu 9.04), the client starts up in 35 seconds. Every test starts from scratch. I'm trying to understand this timing difference. A weird scenario: when the node runs on Linux inside a virtual machine hosted on Windows, the load time is around 7 seconds, just as when it runs natively on Windows. So my impression is that the problem is in the networking.
This node uses the UDP protocol for sending and receiving network data, with boost::asio as the implementation. I tried changing all the supported socket flags and the buffer sizes, but nothing helped.
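For reference, here is a minimal sketch of how the socket is opened and the receive buffer changed (the names and the 1 MB figure are illustrative, with a current Boost):

// Open a UDP socket with boost::asio and enlarge its receive buffer.
#include <boost/asio.hpp>

int main() {
    boost::asio::io_context io;  // io_service on older Boost versions
    boost::asio::ip::udp::socket socket(io);
    socket.open(boost::asio::ip::udp::v4());
    socket.set_option(boost::asio::socket_base::receive_buffer_size(1 << 20));
    boost::asio::socket_base::receive_buffer_size granted;
    socket.get_option(granted);  // read back what the OS actually allowed
    return 0;
}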
Does anyone know why this is happening, or of any UDP-related network settings that might influence the performance?
Thanks.
If you suspect a network problem, take a network capture (Wireshark is great for this kind of problem) and look at the traffic.
Find out where the time is being spent, either based on the network capture or based on the output of a profiler.
Once you know that, you're halfway to a solution.
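For example (the interface name is a placeholder), a capture for later inspection in Wireshark could be taken with:

# Capture all UDP traffic on eth0 into a file Wireshark can open.
tcpdump -i eth0 -w node.pcap udp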
These timing differences can depend on many factors, but the first one that comes to mind is that you are using a modern Windows version. XP already had features to keep recently used applications in memory, but in Vista this was optimized much further. For each application you load, a special prefetch file is created that mirrors how the application looks in memory. The next time you load your application, it should start much faster.
I don't know about Linux, but it is quite possible that it needs to load your app completely each time. You can compare performance between the two systems much more fairly while the application is already running: leave your application open (if your design allows it) and compare again.
These differences in how the system optimizes memory are also consistent with your VM scenario.
Basically, if you rule out other running applications and run your application at high priority, the performance should be close to equal, but it depends on whether you use operating-system-specific code, how you access the file system, how you use the UDP protocol, etc.
