Unknown process of jenkins - "kxjdhendlvie" [closed] - linux

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 5 years ago.
I'm running Jenkins 2.38 on Ubuntu 14.04.5 LTS, EC2 instance AWS
Here is output of top command
top - 08:53:12 up 1 day, 39 min, 2 users, load average: 1.37, 1.37, 1.38
Tasks: 128 total, 2 running, 126 sleeping, 0 stopped, 0 zombie
%Cpu(s): 36.1 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 63.9 st
MiB Mem: 2000.484 total, 1916.172 used, 84.312 free, 420.863 buffers
MiB Swap: 4095.996 total, 5.953 used, 4090.043 free. 280.828 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3366 jenkins 20 0 231944 2976 560 S 94.9 0.1 1050:34 kxjdhendlvie
1119 mysql 20 0 1136676 463672 1996 S 1.0 22.6 29:43.49 mysqld
1578 www-data 20 0 490352 4644 1020 S 0.7 0.2 5:16.63 apache2
28038 root 20 0 23696 1664 1144 R 0.3 0.1 0:00.05 top
The process kxjdhendlvie has PID 3366. I have never seen it before, and I can't find anything on Google about such a Jenkins process either.
root@build:/proc/3366# ps aux | grep jenkins
jenkins 1233 0.0 0.0 18752 340 ? S May29 0:00 /usr/bin/daemon --name=jenkins --inherit --env=JENKINS_HOME=/var/lib/jenkins --output=/var/log/jenkins/jenkins.log --pidfile=/var/run/jenkins/jenkins.pid -- /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080
jenkins 1234 0.8 21.8 1655032 448576 ? Sl May29 12:56 /usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080
jenkins 3366 88.1 0.1 231944 2976 ? Sl May29 1076:10 ./kxjdhendlvie -c hjyfsnkfs.conf
Directory listing of /proc/3366:
root@build:/proc/3366# ll -rth
total 0
dr-xr-xr-x 141 root root 0 May 29 08:13 ../
dr-xr-xr-x 9 jenkins jenkins 0 May 29 13:00 ./
-r--r--r-- 1 jenkins jenkins 0 May 29 13:00 status
-r--r--r-- 1 jenkins jenkins 0 May 29 13:00 stat
-r--r--r-- 1 jenkins jenkins 0 May 29 13:00 cmdline
-r--r--r-- 1 jenkins jenkins 0 May 29 13:27 statm
-r-------- 1 jenkins jenkins 0 May 29 16:27 environ
lrwxrwxrwx 1 jenkins jenkins 0 May 30 06:39 exe -> /var/tmp/kxjdhendlvie (deleted)
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 wchan
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 uid_map
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 timers
dr-xr-xr-x 6 jenkins jenkins 0 May 30 08:36 task/
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 syscall
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 stack
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 smaps
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 setgroups
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 sessionid
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 schedstat
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 sched
lrwxrwxrwx 1 jenkins jenkins 0 May 30 08:36 root -> //
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 projid_map
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 personality
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 pagemap
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 oom_score_adj
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 oom_score
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 oom_adj
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 numa_maps
dr-x--x--x 2 jenkins jenkins 0 May 30 08:36 ns/
dr-xr-xr-x 5 jenkins jenkins 0 May 30 08:36 net/
-r-------- 1 jenkins jenkins 0 May 30 08:36 mountstats
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 mounts
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 mountinfo
-rw------- 1 jenkins jenkins 0 May 30 08:36 mem
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 maps
dr-x------ 2 jenkins jenkins 0 May 30 08:36 map_files/
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 loginuid
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 limits
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 latency
-r-------- 1 jenkins jenkins 0 May 30 08:36 io
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 gid_map
dr-x------ 2 jenkins jenkins 0 May 30 08:36 fdinfo/
dr-x------ 2 jenkins jenkins 0 May 30 08:36 fd/
lrwxrwxrwx 1 jenkins jenkins 0 May 30 08:36 cwd -> /var/tmp/
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 cpuset
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 coredump_filter
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 comm
--w------- 1 jenkins jenkins 0 May 30 08:36 clear_refs
-r--r--r-- 1 jenkins jenkins 0 May 30 08:36 cgroup
-r-------- 1 jenkins jenkins 0 May 30 08:36 auxv
-rw-r--r-- 1 jenkins jenkins 0 May 30 08:36 autogroup
dr-xr-xr-x 2 jenkins jenkins 0 May 30 08:36 attr/
I see nothing related to kxjdhendlvie in /var/tmp/; the binary has apparently been deleted, but the process is still running.
Does anyone have any idea what this is? Please help me investigate.
./kxjdhendlvie -c hjyfsnkfs.conf
Here is hjyfsnkfs.conf:
{
"url" : "stratum+tcp://188.165.214.76:80",
"url" : "stratum+tcp://176.31.117.82:80",
"url" : "stratum+tcp://94.23.8.105:80",
"url" : "stratum+tcp://37.59.51.212:80",
"user" : "46v8xnTsBVx6BzPxb1JAGAj2fURbn6ne59sTa6kg8WEbX1yAoArxwUyMENKfFLJZ6A8b2EqDfSEaB5puwMvVyytfLmR2NoN",
"pass" : "x",
"algo" : "cryptonight",
"quiet" : true
}

Your Jenkins instance might have been compromised by this security exploit: https://groups.google.com/forum/m/#!topic/jenkinsci-advisories/sN9S0x78kMU. I suggest that you update your Jenkins installation.
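If that is the case, a rough containment sketch (assuming the PID and paths shown in the question; these are standard tools, not steps from the linked advisory) would be to check what the process is doing, look for persistence, and then kill it:

sudo ss -tnp | grep 3366          # see which remote hosts the process is connected to
sudo crontab -l -u jenkins        # look for persistence under the jenkins user
sudo ls -la /var/tmp/ /tmp/       # look for dropped binaries and configs
sudo kill -9 3366                 # stop the miner

After that, patching Jenkins per the advisory and rotating any credentials stored on the instance is safer than continuing to trust the box.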

Related

cloud-init-output.log increases its size very quickly on RHEL 8

I run a Linux machine on an AWS EC2 instance with Red Hat Enterprise Linux 8.6, and my cloud-init-output.log grows very quickly, filling the disk and causing my app logs to stop being written within one to two days, even though I have 20GB of storage.
user$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 1.8G 0 1.8G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 195M 1.7G 11% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/xvda2 20G 17G 3.5G 83% /
tmpfs 373M 0 373M 0% /run/user/1000
user$ ls -ltr /var/log
total 10499168
drwxr-x---. 2 chrony chrony 6 Jun 15 2021 chrony
drwxr-xr-x. 2 root root 6 Apr 5 18:43 qemu-ga
drwx------. 2 root root 6 Apr 8 04:50 insights-client
drwx------. 2 root root 6 May 3 08:58 private
-rw-rw----. 1 root utmp 0 May 3 08:58 btmp
-rw-------. 1 root root 0 May 3 08:59 maillog
-rw-------. 1 root root 0 May 3 08:59 spooler
drwxr-x---. 2 sssd sssd 73 Jun 3 11:15 sssd
drwx------. 2 root root 23 Jul 12 14:43 audit
drwxr-xr-x. 2 root root 23 Jul 12 14:44 tuned
-rw-r--r--. 1 root root 128263 Jul 12 14:44 cloud-init.log
drwxr-xr-x. 2 root root 43 Jul 12 14:44 rhsm
-rw-r--r--. 1 root root 806 Jul 12 14:44 kdump.log
-rw-r--r--. 1 root root 1017 Jul 12 14:45 choose_repo.log
drwxr-xr-x. 2 root root 67 Jul 12 14:47 amazon
-rw-r--r--. 1 root root 1560 Jul 14 05:04 hawkey.log
-rw-------. 1 root root 26318 Jul 14 06:39 secure
-rw-rw-r--. 1 root utmp 4224 Jul 14 06:58 wtmp
-rw-rw-r--. 1 root utmp 292292 Jul 14 06:58 lastlog
-rw-r--r--. 1 root root 10752 Jul 14 07:00 dnf.rpm.log
-rw-r--r--. 1 root root 48816 Jul 14 07:00 dnf.librepo.log
-rw-r--r--. 1 root root 97219 Jul 14 07:00 dnf.log
-rw-------. 1 root root 12402 Jul 14 07:01 cron
-rw-------. 1 root root 2160833934 Jul 14 07:02 messages
-rw-r-----. 1 root adm 5257112056 Jul 14 07:03 cloud-init-output.log
I changed the logging level from the default DEBUG to ERROR in /etc/cloud/cloud.cfg.d, but it didn't help. /var/log/messages is also filling up fast.
Is this log file even supposed to keep growing after the EC2 instance is up?
Is there something I can do to stop the size increase?
I also tried to run logrotate manually with logrotate --force /etc/logrotate.d/, but it didn't do much.
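One mitigation is size-based rotation for this specific file; a minimal sketch of a logrotate drop-in, e.g. /etc/logrotate.d/cloud-init-output (the filename is arbitrary; copytruncate is used because the writer keeps the file open):

/var/log/cloud-init-output.log {
    size 100M
    rotate 3
    compress
    missingok
    notifempty
    copytruncate
}

This only caps the growth; it doesn't explain what keeps writing so much output in the first place.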

How docker interacts with your terminal

I'm curious about how dockerized processes interact with the terminal from which you run docker run.
From some research I have done, I found that when you run a container without -t or -i, the file descriptors of the process are:
// the PID of the containerized process is 16198
~$ sudo ls -l /proc/16198/fd
total 0
lrwx------ 1 root root 64 Jan 18 09:28 0 -> /dev/null
l-wx------ 1 root root 64 Jan 18 09:28 1 -> 'pipe:[242758]'
l-wx------ 1 root root 64 Jan 18 09:28 2 -> 'pipe:[242759]'
I see that the other ends of those pipes belong to the containerd-shim process that spawned the containerized process. We know that once that process writes something to its STDOUT, it will show up on the terminal from which you ran docker run. Further, when you run a container with -t and look at the open FDs of the process:
~$ sudo ls -l /proc/17317/fd
total 0
lrwx------ 1 root root 64 Jan 18 09:45 0 -> /dev/pts/0
lrwx------ 1 root root 64 Jan 18 09:45 1 -> /dev/pts/0
lrwx------ 1 root root 64 Jan 18 09:45 2 -> /dev/pts/0
So now the container has a pseudo-tty slave as STDIN, STDOUT and STDERR. Who holds the master side of that pseudo-terminal? Listing the FDs of the parent containerd-shim, we can now see that it has /dev/ptmx open:
$ sudo ls -l /proc/17299/fd
total 0
lr-x------ 1 root root 64 Jan 18 09:50 0 -> /dev/null
l-wx------ 1 root root 64 Jan 18 09:50 1 -> /dev/null
lrwx------ 1 root root 64 Jan 18 09:50 10 -> 'socket:[331340]'
l--------- 1 root root 64 Jan 18 09:50 12 -> /run/docker/containerd/afb8b7a1573c8da16943adb6f482764bb27c0973cf4f51279db895c6c6003cff/init-stdin
l--------- 1 root root 64 Jan 18 09:50 13 -> /run/docker/containerd/afb8b7a1573c8da16943adb6f482764bb27c0973cf4f51279db895c6c6003cff/init-stdin
lrwx------ 1 root root 64 Jan 18 09:50 14 -> /dev/pts/ptmx
...
So I suppose that the containerd-shim process interacts with the container process through this pseudo-terminal subsystem. By the way, even in this case you can't interact with the process, since I didn't run the container with -i.
So one question is: what difference does it make whether containerd-shim talks to the process through a pipe or through the pseudo-terminal subsystem?
Another question is: how does containerd-shim relay this data to the terminal from which I ran docker run?
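For reference, the observations above can be reproduced with something like this (a sketch; the container name fd-demo and the alpine image are arbitrary choices):

# without -t: stdout/stderr are pipes back to containerd-shim
docker run -d --name fd-demo alpine sleep 300
pid=$(docker inspect --format '{{.State.Pid}}' fd-demo)
sudo ls -l /proc/$pid/fd
docker rm -f fd-demo
# with -t: fds 0, 1 and 2 point at a /dev/pts/N slave instead
docker run -d -t --name fd-demo alpine sleep 300
pid=$(docker inspect --format '{{.State.Pid}}' fd-demo)
sudo ls -l /proc/$pid/fd
docker rm -f fd-demo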

kubelet failed to find mountpoint for CPU

I'm using kubeadm 1.15.3, docker-ce 18.09 on Debian 10 buster 5.2.9-2, and seeing errors in journalctl -xe | grep kubelet:
server.go:273] failed to run Kubelet: mountpoint for cpu not found
My /sys/fs/cgroup contains:
-r--r--r-- 1 root root 0 Sep 2 18:49 cgroup.controllers
-rw-r--r-- 1 root root 0 Sep 2 18:50 cgroup.max.depth
-rw-r--r-- 1 root root 0 Sep 2 18:50 cgroup.max.descendants
-rw-r--r-- 1 root root 0 Sep 2 18:49 cgroup.procs
-r--r--r-- 1 root root 0 Sep 2 18:50 cgroup.stat
-rw-r--r-- 1 root root 0 Sep 2 18:49 cgroup.subtree_control
-rw-r--r-- 1 root root 0 Sep 2 18:50 cgroup.threads
-rw-r--r-- 1 root root 0 Sep 2 18:50 cpu.pressure
-r--r--r-- 1 root root 0 Sep 2 18:50 cpuset.cpus.effective
-r--r--r-- 1 root root 0 Sep 2 18:50 cpuset.mems.effective
drwxr-xr-x 2 root root 0 Sep 2 18:49 init.scope
-rw-r--r-- 1 root root 0 Sep 2 18:50 io.pressure
-rw-r--r-- 1 root root 0 Sep 2 18:50 memory.pressure
drwxr-xr-x 20 root root 0 Sep 2 18:49 system.slice
drwxr-xr-x 2 root root 0 Sep 2 18:49 user.slice
docker.service is running okay and has /etc/docker/daemon.json:
{
"exec-opts": [
"native.cgroupdriver=systemd"
],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
The kubeadm docs say if using docker the cgroup driver will be autodetected, but I tried supplying it anyway for good measure - no change.
With mount or cgroupfs-mount:
$ mount -t cgroup -o all cgroup /sys/fs/cgroup
mount: /sys/fs/cgroup: cgroup already mounted on /sys/fs/cgroup/cpuset.
$ cgroupfs-mount
mount: /sys/fs/cgroup/cpu: cgroup already mounted on /sys/fs/cgroup/cpuset.
mount: /sys/fs/cgroup/blkio: cgroup already mounted on /sys/fs/cgroup/cpuset.
mount: /sys/fs/cgroup/memory: cgroup already mounted on /sys/fs/cgroup/cpuset.
mount: /sys/fs/cgroup/pids: cgroup already mounted on /sys/fs/cgroup/cpuset.
Is the problem that it's at cpuset rather than cpu? I tried to create a symlink, but root does not have write permission for /sys/fs/cgroup. (Presumably I can change it, but I took that as enough warning not to meddle.)
How can I let kubelet find my CPU cgroup mount?
I would say that something is very weird with your docker-ce installation rather than with kubelet. You are looking in the right direction by focusing on the mount mapping problem.
I have tried 3 different Docker versions on instances in both GCP and AWS environments.
Comparing our results, I noticed that you have the wrong folder structure under /sys/fs/cgroup. Note that I have many more entries under /sys/fs/cgroup than your output shows. This is what my results look like:
root@instance-3:~# docker version
Client: Docker Engine - Community
Version: 19.03.1
API version: 1.39 (downgraded from 1.40)
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:21:24 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.1
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 4c52b90
Built: Wed Jan 9 19:02:44 2019
OS/Arch: linux/amd64
Experimental: false
root@instance-3:~# ls -la /sys/fs/cgroup
total 0
drwxr-xr-x 14 root root 360 Sep 3 11:30 .
drwxr-xr-x 6 root root 0 Sep 3 11:30 ..
dr-xr-xr-x 5 root root 0 Sep 3 11:30 blkio
lrwxrwxrwx 1 root root 11 Sep 3 11:30 cpu -> cpu,cpuacct
dr-xr-xr-x 5 root root 0 Sep 3 11:30 cpu,cpuacct
lrwxrwxrwx 1 root root 11 Sep 3 11:30 cpuacct -> cpu,cpuacct
dr-xr-xr-x 2 root root 0 Sep 3 11:30 cpuset
dr-xr-xr-x 5 root root 0 Sep 3 11:30 devices
dr-xr-xr-x 2 root root 0 Sep 3 11:30 freezer
dr-xr-xr-x 5 root root 0 Sep 3 11:30 memory
lrwxrwxrwx 1 root root 16 Sep 3 11:30 net_cls -> net_cls,net_prio
dr-xr-xr-x 2 root root 0 Sep 3 11:30 net_cls,net_prio
lrwxrwxrwx 1 root root 16 Sep 3 11:30 net_prio -> net_cls,net_prio
dr-xr-xr-x 2 root root 0 Sep 3 11:30 perf_event
dr-xr-xr-x 5 root root 0 Sep 3 11:30 pids
dr-xr-xr-x 2 root root 0 Sep 3 11:30 rdma
dr-xr-xr-x 5 root root 0 Sep 3 11:30 systemd
dr-xr-xr-x 5 root root 0 Sep 3 11:30 unified
root@instance-3:~# ls -la /sys/fs/cgroup/unified/
total 0
dr-xr-xr-x 5 root root 0 Sep 3 11:37 .
drwxr-xr-x 14 root root 360 Sep 3 11:30 ..
-r--r--r-- 1 root root 0 Sep 3 11:42 cgroup.controllers
-rw-r--r-- 1 root root 0 Sep 3 11:42 cgroup.max.depth
-rw-r--r-- 1 root root 0 Sep 3 11:42 cgroup.max.descendants
-rw-r--r-- 1 root root 0 Sep 3 11:30 cgroup.procs
-r--r--r-- 1 root root 0 Sep 3 11:42 cgroup.stat
-rw-r--r-- 1 root root 0 Sep 3 11:42 cgroup.subtree_control
-rw-r--r-- 1 root root 0 Sep 3 11:42 cgroup.threads
drwxr-xr-x 2 root root 0 Sep 3 11:30 init.scope
drwxr-xr-x 52 root root 0 Sep 3 11:30 system.slice
drwxr-xr-x 3 root root 0 Sep 3 11:30 user.slice
I encourage you to completely reinstall Docker from scratch (or recreate the instance and install Docker again). That should help.
Let me share my docker-ce installation steps:
$ sudo apt update
$ sudo apt install apt-transport-https ca-certificates curl gnupg2 software-properties-common
$ curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
$ sudo apt update
$ apt-cache policy docker-ce
$ sudo apt install docker-ce=5:18.09.1~3-0~debian-buster
I have also seen a workaround in the answer to Kubelet: mountpoint for cpu not found, but you don't have permission, even as root, to apply it either:
mkdir /sys/fs/cgroup/cpu,cpuacct
mount -t cgroup -o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct
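As a quick diagnostic (a sketch, not something from the original answer): the filesystem type of /sys/fs/cgroup distinguishes the two layouts; it is tmpfs on the classic per-controller (v1) layout shown above and cgroup2fs when only the unified cgroup v2 hierarchy is mounted, which is what the listing in the question looks like.

stat -fc %T /sys/fs/cgroup   # tmpfs => v1 controller mounts, cgroup2fs => unified v2 only
mount | grep cgroup          # show exactly which cgroup filesystems are mounted where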

Should I mess with file permissions in the Jenkins home directory?

Looking in /var/lib/jenkins on a relatively fresh install, I notice some file permissions that are, well, scary:
-rw-r--r-- 1 jenkins jenkins 7285 Apr 29 13:29 config.xml
-rw-r--r-- 1 jenkins jenkins 4008 Apr 28 21:04 credentials.xml
-rw-r--r-- 1 jenkins jenkins 64 Apr 28 13:57 secret.key
And in /var/lib/jenkins/secrets:
-rw-r--r-- 1 jenkins jenkins 272 Apr 28 15:08 hudson.console.AnnotatedLargeText.consoleAnnotator
-rw-r--r-- 1 jenkins jenkins 32 Apr 28 15:08 hudson.model.Job.serverCookie
-rw-r--r-- 1 jenkins jenkins 272 Apr 28 14:25 hudson.util.Secret
-rw-r--r-- 1 jenkins jenkins 32 Apr 28 13:57 jenkins.model.Jenkins.crumbSalt
-rw-r--r-- 1 jenkins jenkins 48 Apr 28 14:25 jenkins.security.ApiTokenProperty.seed
-rw-r--r-- 1 jenkins jenkins 256 Apr 28 13:57 master.key
-rw-r--r-- 1 jenkins jenkins 272 Apr 28 13:57 org.jenkinsci.main.modules.instance_identity.InstanceIdentity.KEY
-rw-r--r-- 1 jenkins jenkins 5 Apr 29 13:29 slave-to-master-security-kill-switch
I'm thinking all these files should be set to mode 600 with owner jenkins, but I'm not sure if I'm being paranoid. Is there some reason why the maintainers haven't locked these files down more? Is there some other well-protected master key that makes these files by themselves less valuable?
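For concreteness, the tightening I have in mind would look roughly like this (a sketch of what I'm considering, not an established recommendation):

sudo chown -R jenkins:jenkins /var/lib/jenkins/secrets
sudo chmod 700 /var/lib/jenkins/secrets
sudo chmod 600 /var/lib/jenkins/secrets/* /var/lib/jenkins/credentials.xml /var/lib/jenkins/secret.key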
The above permissions seem to be standard across Jenkins installations. Changing them has messed up the setup for me in the past.

git gc: no space left on device, even though 3GB available and tmp_pack only 16MB

> git gc --aggressive --prune=now
Counting objects: 68752, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (66685/66685), done.
fatal: sha1 file '.git/objects/pack/tmp_pack_cO6T53' write error: No space left on device
sigh, ok
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 19G 15G 3.0G 84% /
udev 485M 4.0K 485M 1% /dev
tmpfs 99M 296K 99M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 494M 0 494M 0% /run/shm
cgroup 494M 0 494M 0% /sys/fs/cgroup
doesn't look that bad
ls -lh .git/objects/pack/
total 580M
-r--r--r-- 1 foouser root 12K Oct 30 05:47 pack-0301f67f3b080de7eb0139b982fa732338c49064.idx
-r--r--r-- 1 foouser root 5.1M Oct 30 05:47 pack-0301f67f3b080de7eb0139b982fa732338c49064.pack
-r--r--r-- 1 foouser root 5.1K Oct 14 10:51 pack-27da727e362bcf2493ac01326a8c93f96517a488.idx
-r--r--r-- 1 foouser root 100K Oct 14 10:51 pack-27da727e362bcf2493ac01326a8c93f96517a488.pack
-r--r--r-- 1 foouser root 11K Oct 25 10:35 pack-4dce80846752e6d813fc9eb0a0385cf6ce106d9b.idx
-r--r--r-- 1 foouser root 2.6M Oct 25 10:35 pack-4dce80846752e6d813fc9eb0a0385cf6ce106d9b.pack
-r--r--r-- 1 foouser root 1.6M Apr 3 2014 pack-4dcef34b411c8159e3f5a975d6fcac009a411850.idx
-r--r--r-- 1 foouser root 290M Apr 3 2014 pack-4dcef34b411c8159e3f5a975d6fcac009a411850.pack
-r--r--r-- 1 foouser root 40K Oct 26 11:53 pack-87529eb2c9e58e0f3ca0be00e644ec5ba5250973.idx
-r--r--r-- 1 foouser root 6.1M Oct 26 11:53 pack-87529eb2c9e58e0f3ca0be00e644ec5ba5250973.pack
-r--r--r-- 1 foouser root 1.6M Apr 19 2014 pack-9d5ab71d6787ba2671c807790890d96f03926b84.idx
-r--r--r-- 1 foouser root 102M Apr 19 2014 pack-9d5ab71d6787ba2671c807790890d96f03926b84.pack
-r--r--r-- 1 foouser root 1.6M Oct 3 10:12 pack-af6562bdbbf444103930830a13c11908dbb599a8.idx
-r--r--r-- 1 foouser root 151M Oct 3 10:12 pack-af6562bdbbf444103930830a13c11908dbb599a8.pack
-r--r--r-- 1 foouser root 4.7K Oct 20 11:02 pack-c0830d7a0343dd484286b65d380b6ae5053ec685.idx
-r--r--r-- 1 foouser root 125K Oct 20 11:02 pack-c0830d7a0343dd484286b65d380b6ae5053ec685.pack
-r--r--r-- 1 foouser root 6.2K Oct 2 15:38 pack-c20278ebc16273d24880354af3e395929728481a.idx
-r--r--r-- 1 foouser root 4.2M Oct 2 15:38 pack-c20278ebc16273d24880354af3e395929728481a.pack
-r--r--r-- 1 root root 16M Feb 27 08:19 tmp_pack_cO6T53
So, git gc bails out on a tmp pack that's only 16MB while my disk appears to have 3GB free. What am I missing? How can I get git gc to work more reliably? I've tried without the --aggressive option and with --prune instead of --prune=now as well; same story.
Update
Running df -h during the repack shows that it now uses all of my disk (100% usage). A little while later the repack fails and leaves another 14MB file in the .git/objects/pack/ folder. So, to recap: my packs use a total of 580MB, yet git repack somehow manages to use up 3GB to repack them. I have ~800MB of free RAM after it's done, by the way; maybe it uses so much working memory that it clogs up the swap? I guess my question comes down to: are there options to make git repack less resource hungry?
versions: git version 1.7.9.5 on Ubuntu 12.04
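On the "less resource hungry" question, there are pack-related config knobs that can be passed to a single repack run (a sketch; the values are arbitrary starting points and untested in this thread):

# limit delta-search memory and threads, and cap the size of each output pack
git -c pack.windowMemory=100m -c pack.threads=1 -c pack.packSizeLimit=500m repack -Ad
git prune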
Update 2
I've updated git to 2.3. Didn't change anything unfortunately.
> git --version
git version 2.3.0
> git repack -Ad && git prune
Counting objects: 68752, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (36893/36893), done.
fatal: sha1 file '.git/objects/pack/tmp_pack_N9jyVJ' write error: No space left on device
Update 3
Ok, so I just noticed something curious: the .git directory actually uses much more disk space than the 580MB reported previously.
> du -h -d 1 ./.git
8.0K ./.git/info
40K ./.git/hooks
24M ./.git/modules
28K ./.git/refs
4.0K ./.git/branches
140K ./.git/logs
5.0G ./.git/objects
5.0G ./.git
Upon further inspection, .git/objects/pack actually uses 4.5GB. The difference lies in hidden temp files I didn't notice before:
ls -lha ./.git/objects/pack/
total 4.5G
drwxr-xr-x 2 foouser root 56K Feb 27 15:40 .
drwxr-xr-x 260 foouser root 4.0K Oct 26 14:24 ..
-r--r--r-- 1 foouser root 12K Oct 30 05:47 pack-0301f67f3b080de7eb0139b982fa732338c49064.idx
-r--r--r-- 1 foouser root 5.1M Oct 30 05:47 pack-0301f67f3b080de7eb0139b982fa732338c49064.pack
-r--r--r-- 1 foouser root 5.1K Oct 14 10:51 pack-27da727e362bcf2493ac01326a8c93f96517a488.idx
-r--r--r-- 1 foouser root 100K Oct 14 10:51 pack-27da727e362bcf2493ac01326a8c93f96517a488.pack
-r--r--r-- 1 foouser root 11K Oct 25 10:35 pack-4dce80846752e6d813fc9eb0a0385cf6ce106d9b.idx
-r--r--r-- 1 foouser root 2.6M Oct 25 10:35 pack-4dce80846752e6d813fc9eb0a0385cf6ce106d9b.pack
-r--r--r-- 1 foouser root 1.6M Apr 3 2014 pack-4dcef34b411c8159e3f5a975d6fcac009a411850.idx
-r--r--r-- 1 foouser root 290M Apr 3 2014 pack-4dcef34b411c8159e3f5a975d6fcac009a411850.pack
-r--r--r-- 1 foouser root 40K Oct 26 11:53 pack-87529eb2c9e58e0f3ca0be00e644ec5ba5250973.idx
-r--r--r-- 1 foouser root 6.1M Oct 26 11:53 pack-87529eb2c9e58e0f3ca0be00e644ec5ba5250973.pack
-r--r--r-- 1 foouser root 1.6M Apr 19 2014 pack-9d5ab71d6787ba2671c807790890d96f03926b84.idx
-r--r--r-- 1 foouser root 102M Apr 19 2014 pack-9d5ab71d6787ba2671c807790890d96f03926b84.pack
-r--r--r-- 1 foouser root 1.6M Oct 3 10:12 pack-af6562bdbbf444103930830a13c11908dbb599a8.idx
-r--r--r-- 1 foouser root 151M Oct 3 10:12 pack-af6562bdbbf444103930830a13c11908dbb599a8.pack
-r--r--r-- 1 foouser root 4.7K Oct 20 11:02 pack-c0830d7a0343dd484286b65d380b6ae5053ec685.idx
-r--r--r-- 1 foouser root 125K Oct 20 11:02 pack-c0830d7a0343dd484286b65d380b6ae5053ec685.pack
-r--r--r-- 1 foouser root 6.2K Oct 2 15:38 pack-c20278ebc16273d24880354af3e395929728481a.idx
-r--r--r-- 1 foouser root 4.2M Oct 2 15:38 pack-c20278ebc16273d24880354af3e395929728481a.pack
-r--r--r-- 1 root root 1.1K Feb 27 15:37 .tmp-7729-pack-00447364da9dfe647c89bb7797c48c79589a4e44.idx
-r--r--r-- 1 root root 14M Feb 27 15:29 .tmp-7729-pack-00447364da9dfe647c89bb7797c48c79589a4e44.pack
-r--r--r-- 1 root root 1.1K Feb 27 15:32 .tmp-7729-pack-020efaa9c7caf8b792081f89b27361093f00c2db.idx
-r--r--r-- 1 root root 41M Feb 27 15:30 .tmp-7729-pack-020efaa9c7caf8b792081f89b27361093f00c2db.pack
-r--r--r-- 1 root root 1.1K Feb 27 15:37 .tmp-7729-pack-051980133b8f0052b66dce418b4d3899de0d1342.idx
(continuing for a *long* while).
Now I'd like to know: Is it safe to just delete those?
So here is what I have found out so far: I couldn't find any documentation about these hidden '.tmp-XXXX-pack' files in the .git/objects/pack folder. All the other threads I can find are about non-hidden files with a tmp_ prefix in the same folder. The hidden ones are also clearly created during the repack action, and it's possible that these get stuck as well. I can't confirm whether that's still possible in git 2.3.0 (which I've updated to since), but at least the disk space requirement doesn't seem to have changed in this newer version; it still can't complete gc/repack.
By deleting these .tmp files I was able to recover my last 4GB, and git still seems to behave fine afterwards. Your results may vary, though, so please make sure you have a backup before doing this.
Finally, even 4GB wasn't enough to repack with gc --aggressive. My .git folder is 1.1GB after the cleanup, and my entire repository is 1.7GB. So 2x the size of your repository is possibly not enough for git gc, even with the aggressive option (which should save space). I had to recover more space from elsewhere first.
Here is the command I used to clean up (again, have backups!):
git gc --aggressive --prune=now || rm -f .git/objects/*/tmp_* && rm -f .git/objects/*/.tmp-*
A similar scenario here (about 2.3G available), except git gc itself would also fail with fatal: Unable to create '/home/ubuntu/my-app-here/.git/gc.pid.lock': No space left on device
What worked was to run git prune first and then run the gc.
I had an instance of this problem. I was able to free up a considerable amount of disk space, but of course, that didn't solve the problem of what to do about the .tmp-* files. I ran git fsck and the Git repository wasn't damaged in that way.
I did the conventional pack-and-garbage-collect operations
git repack -Ad
git prune
but that didn't remove the .tmp-* files, though it would have ensured that all of the necessary objects were in the standard pack-* files if they needed to be copied from transient files left over from the crashed Git processes in the past.
Eventually I realized that I could safely move the .tmp-* files to a scratch directory, then run git fsck to see if what remained in the .git directory was complete. It turned out that it was, so I deleted the scratch directory and the files it contained. If git fsck had reported problems, I could have moved the .tmp-* files back into the .git directory and researched another solution.
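In command form, that procedure was roughly the following (../pack-scratch is just an arbitrary holding location):

mkdir ../pack-scratch
mv .git/objects/pack/.tmp-* ../pack-scratch/
git fsck --full              # clean output means the remaining packs are complete
rm -rf ../pack-scratch       # only once fsck comes back clean; otherwise move the files back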
The hidden ones are also clearly created during the repack action and it's possible that these get stuck as well
The way "git repack"(man) created temporary files when it received a signal was prone to deadlocking, which has been corrected with Git 2.39 (Q4 2022).
See commit 9b3fadf (23 Oct 2022), and commit 1934307, commit 9cf10d8, commit a4880b2, commit b639606, commit d3d9c51 (21 Oct 2022) by Jeff King (peff).
(Merged by Taylor Blau -- ttaylorr -- in commit c88895e, 30 Oct 2022)
repack: use tempfiles for signal cleanup
Reported-by: Jan Pokorný
Signed-off-by: Jeff King
When git-repack(man) exits due to a signal, it tries to clean up by calling its remove_temporary_files() function, which walks through the packs dir looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of the current process).
The biggest problem here is that remove_temporary_files() is not safe to call in a signal handler.
It uses opendir(), which isn't on the POSIX async-signal-safe list.
The details will be platform-specific, but a likely issue is that it needs to allocate memory; if we receive a signal while inside malloc(), etc, we'll conflict on the allocator lock and deadlock with ourselves.
We can fix this by just cleaning up the files directly, without walking the directory.
We already know the complete list of .tmp-* files that were generated, because we recorded them via populate_pack_exts().
When we find files there, we can use register_tempfile() to record the filenames.
If we receive a signal, then the tempfile API will clean them up for us, and it's async-safe and pretty battle-tested.
And:
repack: drop remove_temporary_files()
Signed-off-by: Jeff King
After we've successfully finished the repack, we call remove_temporary_files(), which looks for and removes any files matching ".tmp-$$-pack-*", where $$ is the pid of the current process.
But this is pointless.
If we make it this far in the process, we've already renamed these tempfiles into place, and there is nothing left to delete.
Nor is there a point in trying to call it to clean up when we aren't successful.
It's not safe for using in a signal handler, and the previous commit already handed that job over to the tempfile API.
It might seem like it would be useful to clean up stray .tmp files left by other invocations of git-repack.
But it won't clean those files; it only matches ones with its pid, and leaves the rest.
Fortunately, those are cleaned up naturally by successive calls to git-repack; we'll consider .tmp-*.pack the same as normal packfiles, so "repack -ad", etc, will roll up their contents and eventually delete them.
