JBoss: 32 vs 64 Bit Performance differences? - linux

I know that it is a pretty vague question but I was hoping to get some ideas about where to look as it is a little puzzling to me.
I have a web app that computes some value and returns it to the client (EJB remote calls). When I call my localhost from a main() test looping 10 times, it comes back within about 100 milliseconds. When I call the DEV machine following the same process, it is sometimes fast and sometimes really slow, like 4 seconds, which is a huge difference.
The weird thing is that my localhost is a 32-bit, 1 GB JBoss config while my DEV machine is a 64-bit, 6 GB JBoss config, so if anything I would expect my localhost to hang... not the DEV machine.
Where would you suggest starting the troubleshooting process?

If I understood correctly, both calls are made from the same computer? If so, the network in between is a much more likely source of the response-time differences than 32 vs. 64 bit.
If that is not the case, monitor DEV and check what differs in context (other applications, etc.) between the "fast" and the "4 seconds" cases. Either way, the difference in response times most likely has nothing to do with 32 bit vs. 64 bit.
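To tell the "fast" cases from the "4 seconds" cases, it can help to record every call's latency individually instead of one total for the loop. Below is a minimal sketch in Python (the original test was a Java main()); `time_calls` and `summarize` are hypothetical helper names, and `call` stands in for whatever performs one remote invocation.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(call: Callable[[], object], repeats: int = 10) -> List[float]:
    """Invoke `call` `repeats` times and return each duration in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations_ms: List[float]) -> Dict[str, float]:
    """Report min, median and max so that outliers are not hidden by an average."""
    return {
        "min": min(durations_ms),
        "median": statistics.median(durations_ms),
        "max": max(durations_ms),
    }
```

A large gap between median and max points to intermittent stalls rather than a uniformly slow server, which is worth knowing before comparing machine configurations.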

Some time ago I worked on an application that was deployed on JBoss on two servers with exactly the same hardware configuration. The first server ran CentOS and the second FreeBSD. Exactly the same hardware, the same network, similar load. From what I observed, the application's responses were about 1.5 to 2 times faster when it ran on FreeBSD. At first sight this seemed strange to me, but after a week of tests the difference in response times was confirmed.
Since then I no longer consider hardware configuration as important as I used to ;)

We resolved the issue after finding out that the install on the Linux machine actually had two different instances of JBoss running on the VM, which resulted in unpredictable behavior. The resources consumed were enormous, which did not make any sense given the app that was deployed...

Rapid gain in storage use with gitlab not exchanging any files

I am currently running GitLab CE. I have an issue where its disk usage is constantly growing.
There is 1 current user (myself). But sitting idle, it gains 20 GB of usage in under an hour for no apparent reason (not pushing or pulling or even using it; the service is simply live and idle) until eventually it fills my drive (411 GB of free space before the installation of GitLab; it takes less than 24 hours to fill it).
I cannot locate the source of the issue. Google keeps referring me to size limitations, which would be fine if I needed to increase them, but I don't. I have tried disabling some metrics and safety features such as "Health checks" in an attempt to stop this, but with no success.
I have to keep reinstalling it to negate the idle data usage. There is a reason for me setting it up, but I cannot deploy it the way it is. Have any of you experienced this issue? Is there a way around this?
The system currently running it: Fedora 36, with the installation on a 500 GB SSD and an 8-core Ryzen 7 processor.
Any advice to solve this problem would be great. Please note I am not an expert.
Answer to this question:
rsync was scheduled automatically and was running in a loop.
I removed rsync, reinstalled it, rescheduled rsync to run on my own schedule, and removed the older 100 or so backups, and my space has been returned.
For those running rsync, just check that the runs are not scheduled too close together and that it is not picking up its own backups, as the backups I found were corrupted.
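One way a backup job ends up copying its own output is when the destination directory lives inside the tree being backed up. The answer does not say which paths were involved, so the following is only a sketch of that one check in Python; `backup_inside_source` is a hypothetical helper and any paths passed to it are placeholders.

```python
from pathlib import Path


def backup_inside_source(source: str, destination: str) -> bool:
    """Return True if `destination` is `source` itself or lies underneath it."""
    src = Path(source).resolve()
    dst = Path(destination).resolve()
    # `parents` lists every ancestor directory of the destination.
    return dst == src or src in dst.parents
```

If this returns True for the configured pair, each run would see the previous backups as new input unless they are explicitly excluded.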

OS specific build performance in Java

We are currently evaluating our next-generation company-wide developer pc-configuration and have noticed something really weird.
Our rather large monolith has, on our current configuration, a build time of approx. 4.5 minutes (no tests, just compile).
For our next-generation configuration we upgraded several components: a moderate increase in processor frequency and IPC, double the number of CPU cores, and a switch from a small SATA SSD to an NVMe SSD rated at >3 GB/s. Also, the next-generation configuration switches from Windows 7 to Windows 10.
When executing the first tests, we noticed an almost identical build time (4.3 minutes), which was far less of an improvement than we expected.
During our experiments we tried at one point to run the build process from within a virtual Linux machine running on the Windows host. On the old configuration (Windows 7) we saw a drop in build times from 4.5 to ~3.7 minutes; on the Windows 10 host we saw a decrease from 4.3 to 2.3 minutes. We have ruled out things like virus scanning.
We were rather astonished by these results and have tried to find an explanation other than the almost-religious and insulting statements about different operating systems.
So the question is: what could we possibly have done wrong in configuring the Windows machine such that its speed is almost half that of a Linux running virtualized on the very same Windows host? Especially as all the hardware advancements seem to be eaten up by the switch from Windows 7 to 10.
Another question is: how can we make the javac process use more cores? Right now, using HotSpot JDK 8, we see at most two cores really being used by the build. I've read about sjavac, but that seems to be a rather experimental feature only available from OpenJDK 9 onward, right?
After almost a year of experimenting we came to the conclusion that it is indeed NTFS which is the evil-doer. If you have an NTFS user partition with a Linux host, you get results somewhat similar to an all-Windows setup.
We benchmarked the Gradle build, the Eclipse internal build, starting up WildFly and running database-centered tests on multiple devices. All our benchmarks consistently showed a speedup of at least 100% when switching from Windows to Linux (sometimes Windows takes 3x as long as Linux in real-world benchmarks; some artificial benchmarks had a speedup factor of 60!). Especially on notebooks we experienced much less noise, as the combined processor load of a complete build is substantially lower than with Windows.
Our conclusion was to switch from Windows to Linux over the course of the last year.
Regarding parallelisation, we realized that the cause was some form of code entanglement. Resolving this helped Gradle and javac parallelise the build a lot (also have a look at Gradle composite builds).
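A build creates and reads a large number of small files, so a crude way to compare two filesystems or partitions on the same machine is to time exactly that pattern. This is a rough sketch in Python, not the benchmark used above; `time_small_files` is a hypothetical helper, and the file count and size are arbitrary assumptions.

```python
import os
import tempfile
import time


def time_small_files(directory: str, count: int = 1000, size: int = 1024) -> float:
    """Create, read back and delete `count` small files; return elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    # The scratch directory is created inside `directory` and removed afterwards.
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        for i in range(count):
            path = os.path.join(workdir, f"f{i}.tmp")
            with open(path, "wb") as handle:
                handle.write(payload)
            with open(path, "rb") as handle:
                handle.read()
            os.remove(path)
    return time.perf_counter() - start
```

Running it once against a directory on each partition gives a first impression of per-file overhead; caches and background scanners can skew a single run, so several repetitions are advisable.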

JMeter: How to calculate maximum number of threads per machine

The JMeter manual says
Your hardware's capabilities will limit the number of threads you can effectively run with JMeter. It will also depend on how fast your server is (a faster server makes JMeter work harder since it returns request quicker). The more JMeter works, the less accurate its timing information may become.
The question I want to ask is: how many threads can I run from a single desktop machine and still get accurate enough results? However, I realize that's going to depend on what we define as modern hardware, how fast my application/site is, etc.
So the better (but harder to answer) question is: how do I profile JMeter to know when I've gone beyond the thread/user count that is reasonable for a single machine to handle? Accurate deterministic methods are preferred, but anecdotes/rules of thumb are welcome.
I first suggest you follow best-practices for building JMeter test plans and running them:
http://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
http://jmeter.apache.org/usermanual/best-practices.html
Then once your test plan is built, baseline it on the JMeter machine:
Monitor CPU (don't exceed 50%) and swap (ensure no swap in/out at all)
Check GC for no long pauses
And don't forget that issues which invalidate a test can come from a lot of factors:
Network issues between injector and application
TCP stack issues on the JMeter injector
Components between the injector and the application (firewall, load balancer ...)
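For the "no swap in/out at all" part of the baseline, one option on a Linux injector is to compare the cumulative `pswpin` and `pswpout` counters from /proc/vmstat before and after the run. This is a sketch under that Linux-only assumption; `read_swap_counters` and `swapped_during_run` are hypothetical helper names.

```python
from typing import Dict


def read_swap_counters(vmstat_text: str) -> Dict[str, int]:
    """Extract cumulative pages swapped in/out from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swapped_during_run(before: str, after: str) -> bool:
    """True if either counter increased between the two snapshots."""
    start = read_swap_counters(before)
    end = read_swap_counters(after)
    return any(end.get(k, 0) > start.get(k, 0) for k in ("pswpin", "pswpout"))
```

Read the file once before starting the test plan and once after it finishes, and pass both texts to `swapped_during_run`; a True result suggests the machine was short on memory at that thread count.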

Node.js app has periodic slowness and/or timeouts (does not accept incoming requests)

This problem is killing the stability of my production servers.
To recap, the basic idea is that my node server(s) sometimes intermittently slow down, sometimes resulting in Gateway Timeouts. As best as I can tell from my logs, something is blocking the node thread (meaning that the incoming request is not accepted), but I cannot for the life of me figure out what.
The problem ranges in severity. Sometimes what should be <100ms requests take ~10 seconds to complete; sometimes they never even get accepted by the node server at all. In short, it is as though some random task is working and blocking the node thread for a period of time, thus slowing down (or even blocking) incoming requests; the one thing I can say for sure is that the need-to-fix-symptom is a "Gateway Timeout".
The issue comes and goes without warning. I have not been able to correlate it against CPU usage, RAM usage, uptime, or any other relevant statistic. I've seen the servers handle a large load fine, and then have this error with a small load, so it does not even appear to be load-related. It is not unusual to see the error around 1am PST, which is the smallest load time of the day! Restarting the node app does seem to maybe make the problem go away for a while, but that really doesn't tell me much. I do wonder if it might be a bug in node.js... not very comforting, considering it is killing my production servers.
The first thing I did was to make sure I had upgraded node.js to the latest (0.8.12), as well as all my modules (here they are). Of course, I also have plenty of error catchers in place. I'm not doing anything funky like printing out lots to the console or writing to lots of files.
At first, I thought it was outbound HTTP requests blocking the incoming socket, because the express middleware was not even picking up the inbound request, but I gave up the theory because it looks like the node thread itself became busy.
Next, I went through all my code with JSHint and fixed literally every single warning, including a few accidental globals (forgetting to write "var"), but this didn't help.
After that, I assumed that perhaps I was running out of memory. But my heap snapshots via nodetime are looking pretty good now (described below).
Still thinking that memory might be an issue, I took a look at garbage collection. I enabled the --nouse-idle-notification flag and did some more code optimization to NULL objects when they were not needed.
Still convinced that memory was the issue, I added the --expose-gc flag and executed the gc(); command every minute. This did not change anything, except to occasionally make requests a bit slower perhaps.
In a desperate attempt, I set up the "cluster" module to use 2 workers and automatically restart them every 30 min. Still, no luck.
I increased the ulimit to over 10,000 and kept an eye on the open files. There seem to be < 300 open files (or sockets) per node.js app, and increasing the ulimit thus had no impact.
I've been logging my server with nodetime and here's the gist of it:
CentOS 5.2 running on the Amazon Cloud (m1.large instance)
Greater than 5000 MB free memory at all times
Less than 150 MB heap size at all times
CPU usage is less than 60% at all times
I've also checked my MongoDB servers, which have <5% CPU usage and no requests are taking > 100ms to complete, so I highly doubt there's a bottleneck.
I've wrapped (almost) all my code using Q-promises (see code sample), and of course have avoided Sync() calls like the plague. I've tried to replicate the issue on my testing server (OSX), but have had little luck. Of course, this may be just because the production servers are being used by so many people in so many unpredictable ways that I simply cannot replicate via stress tests...
Many months after I first asked this question, I found the answer.
In a nutshell, the problem was that I was not piping a big asset when transferring it from one server to another. In other words, I was downloading an image from one server before uploading it to an S3 bucket. Instead of streaming the download into the upload, I downloaded the file into memory and then uploaded it.
I'm not sure why this did not show up as a memory spike, or elsewhere in my statistics.
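The difference between the two approaches can be sketched independently of Node. The Python example below is an analogue, not the original code: `buffered_transfer` and `streamed_transfer` are hypothetical names, `source` and `sink` stand in for the download and upload streams, and the chunk size is an assumption.

```python
import io
import shutil

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for the workload


def buffered_transfer(source, sink) -> None:
    """Loads the entire asset into memory before writing it out."""
    data = source.read()
    sink.write(data)


def streamed_transfer(source, sink, chunk_size: int = CHUNK_SIZE) -> None:
    """Copies chunk by chunk, so only about `chunk_size` bytes are held at a time."""
    shutil.copyfileobj(source, sink, chunk_size)
```

Both functions produce the same output; the streamed version just never holds the whole asset at once. In Node the corresponding idea is piping the readable download stream into the writable upload stream.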
My guess is Mongoose. If you are storing large payloads in Mongo, Mongoose can be pretty slow due to how it builds the Mongoose objects. See https://github.com/LearnBoost/mongoose/issues/950 for more details on the problem. If this is the problem you wouldn't see it in Mongo itself since the query returns quickly, but object instantiation could take 75x the query time.
Try setting up timers (process.hrtime()) before and after the Mongoose objects are created to see if that might be the problem. If it is, I would switch to using the node Mongo driver directly instead of going through Mongoose.
You are heavily leaking memory, try setting every object to null as soon as you don't need it anymore! Read this.
More information about hunting down memory leaks can be found here.
Give special attention to having multiple references to the same object and check if you have circular references, those are a pain to debug but will help you very much.
Try invoking the garbage collector manually every minute or so (I don't know if you can do this in node.js because I'm more of a C++ and PHP coder). From my years of experience working with C++, I can tell you the most likely cause of your application slowing down over time is memory leaks. Find them and plug them, and you'll be OK!
Also, assuming you're not caching and/or processing images, audio or video in memory or anything like that, a 150 MB heap is a lot! That could be hundreds of thousands or even millions of small objects.
You don't have to be running out of memory for your application to slow down. Just searching for free memory with that many objects already allocated is a huge job for the memory allocator; it takes a lot of time to allocate each new object, and as you leak more and more memory that time only increases.
Is "--nouse-idle-connection" a mistake? Do you really mean "--nouse_idle_notification"?
I think it may be a GC issue caused by too many tiny objects.
Node is single-process, so watching the busiest CPU core is much more important than watching the overall load.
When your program is slow, you can run "gdb node pid" and then "bt" to see what node is busy doing.
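Another way to confirm that something is blocking the single thread is to measure event-loop lag: schedule a short timer repeatedly and record how late it fires. The sketch below uses Python's asyncio as a stand-in for the Node event loop; `measure_loop_lag` and `demo` are hypothetical names, and the intervals are arbitrary.

```python
import asyncio
import time


async def measure_loop_lag(interval: float, duration: float) -> float:
    """Return the worst delay (seconds) beyond `interval` seen between ticks."""
    loop = asyncio.get_running_loop()
    worst = 0.0
    deadline = loop.time() + duration
    while loop.time() < deadline:
        start = loop.time()
        await asyncio.sleep(interval)
        # Any time beyond `interval` means the loop was busy elsewhere.
        worst = max(worst, loop.time() - start - interval)
    return worst


async def demo() -> float:
    """Run the monitor next to a task that blocks the loop synchronously."""
    async def blocker() -> None:
        await asyncio.sleep(0.05)
        time.sleep(0.2)  # synchronous call: stalls every other task

    task = asyncio.create_task(blocker())
    worst = await measure_loop_lag(interval=0.01, duration=0.4)
    await task
    return worst
```

Logging the worst lag periodically in production would show whether the slow requests coincide with the loop being stalled, without needing to attach a debugger at the right moment.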
What I'd do is set up a parallel node instance on the same server with some kind of echo service and test that one. If it runs fine, you have narrowed down your problem to your program code (and not a scheduler/OS-level problem). Then, step by step, include the modules and test again. Certainly this is a lot of work and takes a long time, and I don't know if it is doable on your system.
If you need to get this working now, you can go the NASA redundancy route:
Bring up a second copy of your production servers, and put a proxy in front of them which routes each request to both stacks and returns the first response. I don't recommend this as a perfect long-term solution but it should help significantly reduce issues in production now, and help you gather log data that you could replay to recreate the issues on non-production servers.
Obviously, this is straight-forward for read requests, but more complex for commands which write to the db.
We had a similar problem with our Node.js server. It didn't scale well for weeks and, like you, we had tried almost everything. Our problem was the implicit backlog value, which is set very low for high-concurrency environments.
http://nodejs.org/api/http.html#http_server_listen_port_hostname_backlog_callback
Setting the backlog to a significantly higher value (e.g. 10000), as well as tuning networking in our kernel (/etc/sysctl.conf on Linux) as described in the manual section, helped a lot. Since then we haven't had any timeouts in our Node.js server.
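The backlog is a parameter of the underlying listen call, so the same idea can be shown outside Node. This is a Python sketch with a hypothetical `open_listener` helper. Note that on Linux the kernel silently caps the requested value at net.core.somaxconn, which may be one of the settings the answer has in mind, although it does not name any.

```python
import socket


def open_listener(host: str, port: int, backlog: int) -> socket.socket:
    """Create a listening TCP socket with an explicit accept backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # Requested queue length for pending connections; the kernel may cap it.
    sock.listen(backlog)
    return sock
```

Requesting 10000 here has no effect beyond the kernel limit, so the application value and the sysctl setting need to be raised together.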

Linux vs Win runtime timings

I have an application which was ported from Windows to Linux. The same code now compiles with VS C++ and g++, but there is a difference in performance between running on Windows and running on Linux. The purpose of this application is caching. It's a node between a server and a client, and it caches client requests and server responses in a list, so that when any other client makes a request that was already processed by the server, this node responds instead of forwarding it to the server.
When this node runs on Windows, the client gets all it needs in about 7 seconds. But when the same node runs on Linux (Ubuntu 9.04), the client starts up in 35 seconds. Every test is from scratch. I'm trying to understand why there is this timing difference. A weird scenario is when the node is running on Linux but in a virtual machine hosted by Windows. In this case, the load time is around 7 seconds, just as if it were running on Windows natively. So my impression is that there is a problem with networking.
This node uses UDP for sending and receiving network data, with boost::asio as the implementation. I tried changing all supported socket flags and the buffer size, but nothing helped.
Does anyone know why this is happening, or of any UDP-related network settings that might influence the performance?
Thanks.
If you suspect a network problem, take a network capture (Wireshark is great for this kind of problem) and look at the traffic.
Find out where the time is being spent, either based on the network capture or based on the output of a profiler.
Once you know that, you're halfway to a solution.
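Alongside a capture, a quick per-datagram round-trip measurement can show whether the delay is spread evenly or concentrated in a few slow exchanges. The sketch below is in Python rather than boost::asio and uses a local echo server only so that it is self-contained; `udp_echo_server`, `measure_udp_rtt` and `loopback_demo` are hypothetical names, and in practice the address would be that of the caching node.

```python
import socket
import threading
import time
from typing import List, Tuple


def udp_echo_server(sock: socket.socket, count: int) -> None:
    """Echo back `count` datagrams, then return."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def measure_udp_rtt(addr: Tuple[str, int], count: int = 20,
                    timeout: float = 1.0) -> List[float]:
    """Send `count` datagrams one at a time and return each round trip in seconds."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)  # raises socket.timeout if a reply is lost
        for i in range(count):
            start = time.perf_counter()
            client.sendto(str(i).encode(), addr)
            client.recvfrom(2048)
            rtts.append(time.perf_counter() - start)
    return rtts


def loopback_demo(count: int = 20) -> List[float]:
    """Measure round trips against an echo server on 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(target=udp_echo_server, args=(server, count),
                              daemon=True)
    worker.start()
    try:
        return measure_udp_rtt(server.getsockname(), count=count)
    finally:
        worker.join(timeout=1.0)
        server.close()
```

Comparing the distribution of round trips on native Linux, on Windows and inside the VM would indicate whether the 35 seconds come from many slightly slower exchanges or from a handful of long waits such as timeouts and retransmissions.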
These timing differences can depend on many factors, but the first one that comes to mind is that you are using a modern Windows version. XP already had features to keep recently used applications in memory, but in Vista this was much better optimized. For each application you load, a special load file is created that is equal to how it looks in memory. Next time you load your application, it should go a lot faster.
I don't know about Linux, but it is quite possible that it needs to load your app completely each time. You can compare the performance of the two systems much better if you measure while the application is already running. Leave your application open (if that is possible with your design) and compare again.
These differences in how the system optimizes memory are backed up by your scenario using the VM approach.
Basically, if you rule out other running applications and run your application in high-priority mode, the performance should be close to equal, but it depends on whether you use operating-system-specific code, how you access the file system, how you use UDP, etc.
