#2958 closed task (fixed)

woodie-server issue with disk space

Reported by: robe Owned by: sac@…
Priority: normal Milestone: Sysadmin Contract 2023-I
Component: SysAdmin Keywords:
Cc:

Description

Ran into an issue today with woodie-server: 800GB of its space was taken up, to the point that nothing could be done with the server and I couldn't even increase the space.

I was able to make a new copy of it from the old, and after reboot of osgeo8 I was able to delete the old container.

But still the new container claims it's using 200GB of space, even after doing

docker system prune -a

and deleting all the backup snapshots. I haven't figured out where all this space is going, or whether something is just off with the lxd disk visibility.

Even the backup of it on osgeo4 claims it's 200GB in size.

/var/log, the other likely culprit, is only 254M.

Change History (2)

comment:1 by robe, 12 months ago

Okay, there seems to be a lot of stuff in /var/lib/docker/vfs/dir, and running du on it is taking a long time. So I suspect that might be what's occupying the remaining space.
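A rough sketch of how the space could be narrowed down one directory level at a time instead of waiting on a single du over the whole tree. The function name biggest_dirs and its defaults are hypothetical, not something that was run on woodie-server:

```shell
# Hypothetical helper: list the largest immediate subdirectories of a path.
# -x keeps du on one filesystem, -k reports kilobytes so sort -n can order them,
# -d 1 limits the report to the path itself and its direct children.
biggest_dirs() {
    dir="${1:-/var/lib/docker}"   # directory to inspect (default is an assumption)
    count="${2:-10}"              # number of lines to show
    du -x -k -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, biggest_dirs /var/lib/docker 10 would show the ten largest entries, with the total for the directory itself normally at the top. docker system df gives Docker's own summary of space used by images, containers and volumes, which may be a quicker first check.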

comment:2 by robe, 12 months ago

Resolution: fixed
Status: new → closed

I ended up taking the harsh route of

# this took a couple of hours and I stopped it once disk usage went from 260GB to 2GB

cd /var/lib/docker/vfs/dir
rm -r *

The above preserved the docker volumes of data that the woodie server writes to, but I think it damaged the agent, as it tried to use a file that was in that folder. After rebooting the container I did:

su woodie
cd ~/
docker compose pull #which fixed up the images and pulled woodpecker-server 1.0 (so yah the screen looks quite different now)
docker compose up -d

Things seem to be back to normal now, except for a hiccup I had logging in and authenticating. After clearing out my browser cache and logging in again, things seemed fine, and I see all the run logs from before in the interface, so the db must be fine too.

I did reduce the number of procs we run on this server from 5 to 2, in the docker-compose.yml (in /home/woodie).

I'm going to set up another docker image, first on osgeo8, that has 4 procs.

I suspect the reason /var/lib/docker/vfs/dir was not being cleared by docker system prune is that somewhere along the line, with all that shaking we've been doing in postgis lately, it ran out of space before it could figure out what to delete.

I set up a cron job to do a docker system prune nightly, so that should keep things clean. Also, having the agent in a separate container will allow us to be more reckless, since agents can be thrown away without risking damage to the history.
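For reference, a nightly prune entry could look roughly like the sketch below. The schedule and flags are assumptions, not a copy of what is installed on woodie-server. The -f flag matters under cron because docker system prune otherwise asks for confirmation, and -a matches the manual run from the description.

```shell
# Hypothetical crontab entry (edit with: crontab -e); time and flags are assumptions.
# min hour day-of-month month day-of-week  command
30 2 * * * docker system prune -af
```

Note that this form does not remove volumes, so the data the woodie server writes to would be left alone.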
