GitLab backup fails at one repo

The problem is that during the repository backup step I get the same error every time, which you can see below.
My GitLab server has 8 CPUs, 16 GB RAM, and 1.5 TB of disk space (if you need more details, let me know).
The repo in question is about 50 GB and it stops the whole process; I have other repos that are bigger and they back up without any issue.
Assuming the hardware is fine, what can I do to complete the backup?
P.S. These repos contain DVD images, but I don't have access to check exactly what is in them.
I've already increased disk space and RAM and tried to reconfigure, but it didn't help.
I haven't yet tried increasing swap and RAM even further. It doesn't look like it's running out of memory, but I'll try that anyway, just to rule out the hardware completely.
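Since the failing phase isn't obvious from the error alone, one way to narrow it down is to run the backup in pieces and check the problem repository for corruption first. A minimal sketch, assuming the Omnibus package and a GitLab release recent enough to ship the gitlab-backup wrapper (verify the exact flags against the backup documentation for your version):

    # Back up everything except repositories to confirm the rest completes.
    sudo gitlab-backup create SKIP=repositories

    # Run git fsck across the managed repositories to spot corruption
    # in the 50 GB repo before retrying.
    sudo gitlab-rake gitlab:git:fsck

    # Retry the full backup; STRATEGY=copy copies the data aside before
    # archiving, which avoids "file changed as we read it" failures but
    # temporarily needs extra disk space.
    sudo gitlab-backup create STRATEGY=copy

If only that one repository ever fails, its fsck output (or the storage it lives on) is the natural next place to look.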

Related

Rapid gain in storage use with GitLab while not exchanging any files

I am currently running GitLab CE and have an issue where it constantly gains disk space.
There is one current user (myself), but sitting idle it gains 20 GB of usage in under an hour for no apparent reason (no pushing, pulling, or any other use; the service is simply live and idle) until it eventually fills my drive (411 GB of free space before the GitLab installation, filled in less than 24 hours).
I cannot locate the source of the issue. Google keeps pointing me at size limits, which would be fine if I needed to raise them, but I don't. I have tried disabling some metrics and safety features such as the health checks in an attempt to stop it, but with no success.
I have to keep reinstalling it to negate the idle data usage. There is a reason for me setting it up, but I cannot deploy it the way it is. Have any of you experienced this issue? Is there a way around it?
The system currently running it: Fedora 36, with the installation on a 500 GB SSD and an 8-core Ryzen 7 processor.
Any advice on solving this problem would be great. Please note I am not an expert.
Answer to this question:
rsync was scheduled automatically and was running in a loop.
I removed rsync, reinstalled it, rescheduled it on my own schedule, and deleted the older 100 or so backups, and my space was returned.
For those running rsync: check that it is not running too frequently and that it is not picking up its own backups as source data, because the backups I found were corrupted.
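For reference, a minimal sketch of a non-looping setup; the paths are assumptions to adapt to your own layout, the key points being that the destination lives outside the source tree and that old copies are pruned so they cannot pile up:

    #!/bin/bash
    # Sketch of a scheduled rsync backup that cannot feed on its own output.
    # All paths below are assumptions; adjust them to your installation.
    set -euo pipefail

    SRC=/var/opt/gitlab/          # data to back up (assumed)
    DEST=/mnt/backup/gitlab       # separate disk or mount, NOT inside SRC (assumed)

    # One timestamped copy per run, so rsync never re-reads its own older backups.
    # (rsync --link-dest can deduplicate unchanged files between runs.)
    rsync -a "$SRC" "$DEST/$(date +%F_%H%M)/"

    # Retention: keep the ten newest copies, remove anything older.
    ls -1dt "$DEST"/*/ 2>/dev/null | tail -n +11 | xargs -r rm -rf --

    # Example crontab entry, once a night at 02:00:
    # 0 2 * * * /usr/local/bin/gitlab-rsync-backup.sh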

Unable to locate the memory hog on an OpenVZ container

I have a very odd issue on one of my OpenVZ containers. The memory usage reported by top, htop, free, and the OpenVZ tools is around 4 GB out of the allocated 10 GB.
When I list processes by memory usage or use the ps_mem.py script, I only see about 800 MB of usage. Similarly, when I browse the process list in htop, I cannot pinpoint the memory-hogging offender.
There is definitely a process leaking RAM in my container, but even when usage hits critical levels and I stop everything in the container (except ssh, init, and the shells), I cannot reclaim the RAM. Only restarting the container helps; otherwise the OOM killer eventually starts kicking in inside the container.
I was under the assumption that a leaky process releases all of its RAM when killed, and that you can observe its misbehavior via top or similar tools.
If anyone has experienced behavior like this, I would be grateful for any hints. The container runs icinga2 (which I suspect of leaking RAM), although most of the time the monitoring process sits idle, since it completes all its scheduled checks in a more than timely manner, so I'd expect RAM usage to drop at those times. It doesn't, though.
I had a similar issue in the past, and in the end it was solved by the hosting company where I had my OpenVZ container. I think the best approach would be to open a support ticket with your hoster, explain the problem, and ask them to investigate. Maybe they are running an outdated kernel version, or they made changes on the host server that affect your OVZ container.
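Before (or while) opening that ticket, a few read-only checks can show whether the "missing" memory is being held by the kernel rather than by any process, which would explain why per-process tools come up short. A sketch of the commands; whether they are fully visible inside the container depends on how the host's OpenVZ kernel is configured:

    # OpenVZ per-container accounting: look for 'held' close to 'limit'
    # and for non-zero 'failcnt' on privvmpages/physpages/kmemsize.
    cat /proc/user_beancounters

    # Kernel-side memory (slab caches, page cache, dirty pages) that
    # top/htop/ps_mem.py never attribute to any PID.
    grep -E 'Slab|SReclaimable|SUnreclaim|Cached|Dirty' /proc/meminfo

    # Cross-check the per-process totals (requires the smem package).
    smem -t -k | tail -n 20

    # Optional: drop reclaimable caches and see whether 'free' recovers
    # (often not permitted from inside an OpenVZ container).
    sync && echo 3 > /proc/sys/vm/drop_caches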

npm inode issues in a Docker/Kubernetes environment

We have a Kubernetes cluster that at any given time has around 10-20 pods/containers running on a single node. Each node has roughly 200k inodes available. However, each of our microservice (Node.js/npm) apps eats up around 20k+ inodes; multiply that by, say, 10 pods/containers on a node and it basically consumes all the available inodes on that server (node).
The question is whether there is a way to deal with this in node_modules, either by minimizing the number of files it contains or by using some kind of bundler for node_modules.
There isn't really anything Kubernetes can do for you. You need to increase the number of inodes on your filesystem.
Without specifics on the size of your disks I can't give you exact advice, but the details on the Arch wiki about the bytes-per-inode ratio are the right guidance: https://wiki.archlinux.org/index.php/ext4#Bytes-per-inode_ratio
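To see how bad the situation is before reformatting anything, and to apply the bytes-per-inode advice from the wiki, something along these lines can help; the device name and the Docker data path are placeholders:

    # How many inodes each filesystem has and how many are in use.
    df -i

    # Which directories hold the most files/inodes (GNU coreutils >= 8.22).
    du --inodes -d 3 /var/lib/docker 2>/dev/null | sort -n | tail -n 20

    # ext4 fixes its inode count when the filesystem is created, so a denser
    # bytes-per-inode ratio must be chosen at mkfs time. One inode per 8 KiB
    # (instead of the 16 KiB default) roughly doubles the inode count.
    # WARNING: this reformats the volume; /dev/sdX is a placeholder.
    mkfs.ext4 -i 8192 /dev/sdX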

Graph showing 1K instead of GBs for disks in Ganglia/RRDTool

I'm using Ganglia and RRDTool to show charts on a web page.
Everything is fine, but on some machines the disk graphs look buggy.
Here is how they look on some machines (both machines are in the same cluster):
This one is correct for the disk space:
But this one shows 1.4K of disk space, which looks incorrect. How can I fix this?
Any ideas? I have already uninstalled and reinstalled it several times, but that doesn't fix the problem.
I got in touch with the Ganglia developers' mailing list, and they told me the value is correct: it means 1.4K GB, which is 1400 GB, or 1.4 TB. It was all a misunderstanding. I just think the documentation should be clearer about the units of the metrics.

Web Server slows down (ASP.NET)

We have a really strange problem. One of the servers in the farm becomes really slow. We see a number of timeouts in the logs, and overall response time is not where it should be (and is on the other servers in the farm).
What is also strange is that it is not just the web app: just logging into the server takes up to 1.5 minutes before the desktop appears. Once you are in, the system is as responsive as ever, unless you try to launch something, e.g. Notepad, which takes another minute to start; after that it works fine.
I checked a number of things: memory utilization is reasonable, CPU is below 15%, Windows handle counts look normal, and the event logs do not show anything.
Recycling the ASP.NET worker process does not fix it; logging in still takes over a minute. Rebooting the server helped, but now it has started to slow down again.
After a closer look we found that the Windows Temp directory is full of temp files, over 65k of them. That is certainly something to take care of, but my question is: could it be the root cause of the sluggishness, or is there still something else lurking in the shadows?
Edit
After more digging I am zeroing in on the size of the temp directory as the issue. This article: describes something very similar. I am still not entirely sure, because the fact that the server is slow to open even Notepad remains unexplained.
Is it possible that under such conditions creating a new temp file takes over a minute?
You might want to check how many threads you're using in the ASP.NET thread pool when the timeouts occur. Another idea is to look at the GC information in perfmon and see whether the GC is running a gen 2 collection.
OK, it is official: all of this grief was caused by this issue. When one of our servers started behaving badly again, we cleaned out the temp directory and it fixed the problem, including the slow login.
This last part still baffles me. I do not understand how an excessive number of files in a temp directory can make logging in take over a minute, let alone slow down launching a program, but whatever the mechanism, clearing the directory fixed it and I can live with that.
Did you check virtual memory as well? Paging? Does your app log a lot of data to different files? Also check whether the utilization happens in kernel mode rather than user mode.
