Jenkins running at very high CPU usage - linux

I recently upgraded from Jenkins 1.6 to 2.5. After I did this, I noticed very high CPU usage, sometimes over 300% (there are only 4 cores, so I don't think it could go over 400%). I'm not sure where to begin debugging this, but here's a thread dump and some screenshots from top and htop.

As it turned out, my issue was that several jobs had thousands of old builds. This was fine in Jenkins 1.6, but it's a problem in 2.5 (my guess is that Jenkins tries to load all the builds into memory when you view the job overview page). To fix it, I deleted most of the old builds from the problem jobs using this strategy and then reloaded Jenkins. Worked like a charm!
I also set the "discard old builds" plugin to keep only the 50 most recent builds, to prevent this from happening again.
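The linked strategy isn't reproduced here, but as a rough, hedged sketch of one way to prune builds in bulk, Jenkins exposes a doDelete action per build over its REST API. The host, job name, and credentials below are placeholders, not details from the original answer:

    import requests

    BASE = "http://jenkins.example.com"   # placeholder host
    JOB = "my-big-job"                    # placeholder job name
    AUTH = ("user", "api-token")          # placeholder credentials; an API token avoids CSRF crumbs

    # fetch the build numbers for the job (Jenkins lists them newest first)
    builds = requests.get(f"{BASE}/job/{JOB}/api/json",
                          params={"tree": "builds[number]"},
                          auth=AUTH).json()["builds"]

    # keep the 50 most recent builds, delete the rest
    for b in builds[50:]:
        requests.post(f"{BASE}/job/{JOB}/{b['number']}/doDelete", auth=AUTH)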

Whenever a request comes in, Jenkins will spawn some threads to serve it, and right after an upgrade Jenkins may be running at high throttle for a while. Please check the CPU and memory usage of the Jenkins server in the following scenarios:
Jenkins is idle and no other apps are running on the server.
A build is scheduled and no other apps are running on the server.
Then compare the behavior in the two scenarios, which should help you determine whether Jenkins itself, or Jenkins running in parallel with other apps, is really causing the trouble.
As @vlp said, try to monitor the Jenkins application via JVisualVM, using jstatd to hook into the JVM. Refer to this link to configure JVisualVM with jstatd.
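For reference, the usual jstatd recipe on a JDK 8-style layout looks roughly like this; the policy file name is arbitrary and the tools.jar path may differ on your JDK. First a policy file, e.g. jstatd.policy:

    grant codebase "file:${java.home}/../lib/tools.jar" {
        permission java.security.AllPermission;
    };

then, on the Jenkins host:

    jstatd -J-Djava.security.policy=jstatd.policy

after which JVisualVM can connect to the host remotely.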

I have noticed a couple of reasons for abnormal CPU usage with my Jenkins install on Windows 7 Ultimate.
I had recently upgraded from v2.138 to v2.140 plus added a few additional plugins. I started noticing a problem with the Jenkins java executable taking up to 60% of my CPU time every time a job would trigger. None of the jobs were CPU bound, just grabbing data from external servers, so it didn't make any sense. It was fixed with a simple restart of the Jenkins service. I assume the upgrade just didn't finish cleanly.
Java garbage collection was throwing errors and hogging the CPU when running with the default memory settings. It was probably overkill, but I went wild and upped the Java heap space for Jenkins from the default 256 MB to 4 GB, which solved this problem for me. See this solution for instructions:
https://stackoverflow.com/a/8122566/4479786
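On a typical Windows service install, the heap is set via the arguments element of jenkins.xml in the Jenkins installation directory; a hedged sketch (your existing line will carry more options; only the -Xmx value changes):

    <arguments>-Xrs -Xmx4g -jar "%BASE%\jenkins.war" --httpPort=8080</arguments>

followed by a restart of the Jenkins service.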

2.5 seems to be a development release, while 1.6 is on their Long Term Support line. Thus it seems logical to expect some regressions when using the bleeding-edge version. The bounty on this question is proof that other users are experiencing this as well. The solution is to report a bug on the Jenkins bug tracker. You can temporarily downgrade to the known-good version for now.

Try passing the following argument to Jenkins:
-Dhudson.util.AtomicFileWriter.DISABLE_FORCED_FLUSH=true
as mentioned here: https://issues.jenkins-ci.org/browse/JENKINS-52150
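If Jenkins is launched directly from the war, the property goes before -jar, roughly:

    java -Dhudson.util.AtomicFileWriter.DISABLE_FORCED_FLUSH=true -jar jenkins.war

On packaged installs the same property usually belongs in the service's JAVA_ARGS (Debian) or JENKINS_JAVA_OPTIONS (Red Hat) instead.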

Related

Rapid gain in storage use with gitlab not exchanging any files

I am currently running GitLab CE. I have an issue where it is constantly gaining disk space.
There is one current user (myself), but sitting idle it gains 20 GB of usage in under an hour for no apparent reason (no pushing, pulling, or even using it; the service is simply live and idle) until eventually it fills my drive (411 GB of free space before the GitLab installation; it takes less than 24 hours to fill it).
I cannot locate the source of the issue. Google seems to like referring me to size limitations, which would be fine if I needed to increase them, but I don't. I have tried disabling some metrics and safety features such as "Health checks" in an attempt to stop it from doing this, but with no success.
I have to keep reinstalling it to negate the idle data usage. There is a reason for me setting it up, but I cannot deploy it the way it is. Have any of you experienced this issue? Is there a way around this?
The system currently running it: Fedora 36, with the installation on a 500 GB SSD and an 8-core Ryzen 7 processor.
Any advice to solve this problem would be great. Please note I am not an expert.
Answer to this question:
rsync was scheduled automatically and was stuck in a loop.
I removed rsync, reinstalled it, rescheduled it to run on my own schedule, and removed the older 100 or so backups, and my space has been returned.
For those running rsync: check that it is not running too frequently and that it is not picking up its own backups as source data, as the backups I found were corrupted.
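As a hedged illustration of "rescheduled it to run on my own schedule" (the paths and time below are placeholders, not the poster's actual setup), a single nightly crontab entry that excludes the backup destination from the source avoids the self-backup loop described above:

    # one rsync run nightly at 02:00 (hypothetical paths)
    0 2 * * * rsync -a --delete --exclude=/srv/backups /srv/gitlab-data/ /srv/backups/gitlab/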

OS-specific build performance in Java

We are currently evaluating our next-generation, company-wide developer PC configuration and have noticed something really weird.
Our rather large monolith has, on our current configuration, a build time of approx. 4.5 minutes (no tests, just compile).
For our next-generation configuration we upgraded several components: a moderate increase in processor frequency and IPC, double the number of CPU cores, and a switch from a small SATA SSD to an NVMe SSD rated at >3 GB/s. The next-generation configuration also switches from Windows 7 to Windows 10.
When executing the first tests, we noticed an almost identical build time (4.3 minutes), which was a lot less improvement than we expected.
During our experiments we tried at one point to run the build process from within a virtual Linux machine running on the Windows host. On the old configuration (Windows 7) we saw a drop in build times from 4.5 to ~3.7 minutes; on the Windows 10 host, we saw a decrease from 4.3 to 2.3 minutes. We have ruled out things like virus scanning.
We were rather astonished by these results and have tried to find an explanation other than some almost-religious and insulting statements about different operating systems.
So the question is: what could we have done wrong in configuring the Windows machine such that it is almost half as fast as a Linux system virtualized on the very same Windows host? Especially as all the hardware advancements seem to be eaten up by the switch from Windows 7 to 10.
Another question is: how can we make the javac process use more cores? Right now, using HotSpot JDK 8, we see at most two cores really used by the build. I've read about sjavac, but that seems to be a rather experimental feature only available from OpenJDK 9 onward, right?
After almost a year of experimenting we came to the conclusion that it is indeed NTFS which is the evil-doer. If you use an NTFS user partition with a Linux host, you get results somewhat similar to an all-Windows setup.
We ran benchmarks of Gradle builds, Eclipse internal builds, starting up WildFly, and running database-centered tests on multiple devices. All our benchmarks consistently showed a speedup of at least 100% when switching from Windows to Linux (sometimes Windows took 3x as long as Linux in real-world benchmarks, and some artificial benchmarks showed a speedup of 60x!). Especially on notebooks we experienced much less noise, as the combined processor load of a complete build is substantially less than on Windows.
Our conclusion was to switch from Windows to Linux, which we did over the course of the last year.
Regarding the parallelisation question, we realized it was some form of code entanglement. Resolving this helped Gradle and javac parallelise the build a lot (also have a look at Gradle composite builds).
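For completeness, a minimal sketch of the Gradle properties that let a decoupled multi-project build use more cores (these only help once the entanglement is resolved; the worker count is an arbitrary example):

    # gradle.properties
    org.gradle.parallel=true
    org.gradle.caching=true
    org.gradle.workers.max=8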

Is Apache Zeppelin stable enough to be used in Production

I am using an AWS EMR cluster. I have been experimenting with Spark drivers and the Apache Zeppelin REST APIs to run jobs. I have run several hundred ad hoc jobs with Zeppelin and didn't have any concerns. Given that, I am considering using the Zeppelin REST APIs in production, submitting jobs through them.
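A hedged sketch of that kind of submission (the host and note ID are placeholders; the endpoint is Zeppelin's notebook job API):

    import requests

    ZEPPELIN = "http://zeppelin-host:8080"   # placeholder host
    NOTE_ID = "2ABCDEFGH"                    # placeholder note ID

    # run all paragraphs of the note asynchronously
    requests.post(f"{ZEPPELIN}/api/notebook/job/{NOTE_ID}").raise_for_status()

    # poll the status of the note's paragraphs
    print(requests.get(f"{ZEPPELIN}/api/notebook/job/{NOTE_ID}").json())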
Has anyone experienced stability issues with Zeppelin in Production?
I have Zeppelin running in production in a multi-user environment (roughly 15 users) and it hasn't been very stable. To make it more stable I run Zeppelin on its own node, no longer on the master node.
Anyway, I found the following problems:
In releases before 0.7.2, Zeppelin created a lot of zombie processes, which caused memory problems after heavy usage.
User libraries can break Zeppelin; this was the case in versions prior to 0.7.0. E.g. Jackson libraries made Zeppelin unable to communicate with the Spark interpreter. In 0.7.0 and up this problem has been mitigated.
There are random freezes when there are a lot of users. The only way to fix this is a restart of the service. (All versions)
Sometimes when a user starts their interpreter and the local repo is empty, Zeppelin doesn't download all the libraries specified in the interpreter config, and then it won't try to download them again. The only way to mitigate this is to delete the contents of the interpreter's local repo (see the sketch after this list). (All versions)
Sometimes changes to notebooks don't get saved, which causes users to lose code.
In version 0.6.0, Spark interpreters shared a context, which caused users to overwrite each other's variables.
Problems are difficult to debug; the logging is not that great yet. Some bugs seem to break the logging, and sometimes running an interpreter in debug mode fixes the problem.
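For the local-repo workaround mentioned above, the cleanup amounts to something like the following (hedged; the Zeppelin home path and interpreter ID are placeholders):

    # stop the interpreter first, then clear its local repo so libraries re-download
    rm -rf $ZEPPELIN_HOME/local-repo/<interpreter-id>/*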
So, I wouldn't put it in a production setting yet, where people depend on it. But for testing and data discovery it would be fine. Zeppelin is clearly still in a beta stage.
Also, don't run it on the master node; set up your own instance and let it connect remotely to the cluster. This makes it much more stable. Put it on a beefy node and restart it overnight.
Most of the bugs I encountered are already on the Jira and the developers are working hard to make things better. The stability becomes better and better every release and I see the maintenance load going down every version, so it certainly has potential.
I have used Zeppelin for more than a year now. It gets you going quickly when you are just starting, but it is not a good candidate for production use cases, especially with more than 10 users, and it depends on your cluster resources. These were my overall concerns with Zeppelin:
By default you can't have more than one job running at a time; you will need to change the configuration to make that happen.
If you are loading additional libraries from S3 or external environments, you can only do that at startup, or you will have to restart Zeppelin.
The Spark context is pre-created and there are only a few settings you can change.
The editor itself doesn't resize well when your output is large.
I am moving on to Jupyter for my use cases, which looks much stronger in my initial assessment.
As of the time of this answer, end of February 2019, my answer would be: NO, plain and simple. Zeppelin keeps crashing, hanging, and getting unresponsive; notebooks tend to become unloadable due to size errors; execution is very slow compared to Jupyter; plus there are many limitations regarding the integration of third-party display engines (although much effort has been made in this direction).
I experienced these issues on a decently sized and provisioned cluster, with a single user. I would never, ever advise it as a production tool. At least not as it is today. Unless you have an admin at hand who can restart the whole thing regularly, track down and fix errors, and be in charge of integration.
We moved back to Jupyter, and everything worked smoothly out of the box from day one, after struggling to stabilize Zeppelin for weeks.

Meteor Node Process CPU Usage Nears 100%

I'm having trouble with my Meteor app when it gets to its peak amount of traffic (peak for this is nothing, 1k visits, maybe 2,500 pageviews in a day). CPU usage spikes and never recovers, so I've taken to using Nodetime to monitor usage and I've been reloading the process (forever restart) to get things back to normal.
I'm fairly new to profiling, so finding the underlying cause has me at a loss for where to start. I'm fairly certain it has to do with my app's server code, but the profiling seems to point to the Fibers module as a "hotspot" which I understand aids in making my server code synchronous.
Below is a snippet from the profiling results. I hope someone can guide me in the right direction in troubleshooting this!
While I don't have a specific answer to your question, I have experience dealing with CPU issues for our production Meteor app, so I can give you a list of things to investigate.
Upgrade to the latest version of meteor and the appropriate node version (see the changelog). As of this writing that's meteor 0.8.2 and node 0.10.28.
Read this and this article. The latter makes a great point that you really should always try to delay activation of subscriptions until you need them. In particular you may not need to publish anything for users who are not logged in. In my experience, meteor CPU problems have everything to do with subscriptions.
Be careful with observe and observeChanges. These are expensive and are easy to abuse. In particular:
Make sure you are calling stop() on your handles when they are no longer needed (consider using a package like publish-with-relations so this is done for you).
Fetch only the collections and fields that you absolutely need. Observe works by continually diffing objects (requires lots of CPU). The fewer and smaller objects you have, the less there is to compute.
Consider using smart-collections (though note it is slated to be retired). Use oplog tailing - this can make for a night-and-day difference in performance and CPU usage in your app.
Consider making some things not reactive (also mentioned in the articles above). For us that was a big win. We had one extremely expensive join that was used on two frequently accessed pages on the site. When it got to the point where the CPU was pegged at 100% about every 30 minutes I gave up on reactivity for that element and just did the join on the server and shipped the data to the client via a method call. I also created a server-side expiring cache for these results and stored them by user (special thanks to Matt DeBergalis for this suggestion).
Do a preventative nightly restart. I have a cron job that tells forever to restart our app once a day in the middle of the night. That brings the CPU down from ~10% to 1%. This seems like black magic, but the fact that the CPU usage changes after a reset leads me to believe this is a good idea.
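A hedged sketch of such a cron entry (the script path and time are placeholders; forever identifies the process by the script it was started with):

    # restart the Meteor app nightly at 03:30 (hypothetical path)
    30 3 * * * /usr/local/bin/forever restart /var/www/app/main.js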
Updated thoughts (1/13/14)
We migrated to oplog tailing as soon as it was available (meteor 0.7) and that made a big difference. Note that in order to get access to the oplog, you'll probably need to either host your own db or run a dedicated instance on the hosting provider of your choice. I'd also recommend adding the facts package to actually tell if it's working.
There was a memory leak discovered in publish-with-relations, and as of this writing the atmosphere version (v0.1.5) hasn't been bumped to reflect these changes. If you are using it in production, I strongly recommend checking out the HEAD version and running it locally.
We stopped doing nightly restarts a couple of weeks ago. So far everything has been fine (fingers crossed).
Updated thoughts (7/2/14)
A few months ago we switched over to using an Elastic Deployment on mongohq. It's affordable, the performance has been great, and they even have a blog post which tells you how to enable oplog tailing.
I'd strongly recommend checking out kadira to help diagnose performance issues in your app. Also check out the academy articles which have a number of good tips in them.
I'm also having this problem. Actually, there is an issue with 0.6.6.1: I ran meteor --release 0.6.6 and the CPU is back to normal now.

Maven build using/allocating huge amount of memory

I have a decent sized GWT (Google Web Toolkit) project that is built using Apache Maven. The build process involves generating 8 rpms and 2 wars.
I'm trying to build the project on a remote virtual server running CentOS 5.2 as the guest OS. Since the guest OS can't use swap space, I have to allocate a huge amount of memory to the box for it to build; otherwise I get a Java "could not allocate memory" error (error=12). The build fails if there is under 7 GB free. I suspect that most of this 7 GB is never used, but is allocated for some reason.
At the end of the build the output reads: [INFO] Final Memory: 178M/553M
I have MAVEN_OPTS set to -Xms256m -Xmx1024M
I'm not sure how to make the maven build use less memory. Any suggestions are much appreciated.
Note that forking plugins like the Maven GWT plugin (and Maven Surefire) use memory that is "outside" the total reported by the Maven execution. I would recommend correlating OS-level process sizes with the output from "jps -lv" to find out which fork is stealing all your memory.
If, for instance, a forked process does not terminate for some reason, things would get very crowded, very quickly.
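A hedged sketch of that correlation, using standard JDK and procps tools (the pid is a placeholder taken from the jps output):

    jps -lv                          # list JVM pids with their main class and JVM args
    ps -o pid,rss,command -p 12345   # resident set size of one suspect JVM

A forked JVM's heap is then capped in the plugin's own configuration (e.g. Surefire's argLine or the GWT plugin's extraJvmArgs) rather than in MAVEN_OPTS.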
That final-memory line indicates the Maven JVM itself only ever needed a max of 553 MB, so the setting in MAVEN_OPTS is already above what you need. Are you saying you want to use less than that, or are you currently getting an error?
