Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#4365 closed defect (fixed)

Debbie is very sick

Reported by: robe
Owned by: robe
Priority: blocker
Milestone: Website Management, Bots
Component: QA/buildbots
Version: 2.4.x
Keywords:
Cc:

Description

Unfortunately debbie seems a little sick, with everything showing a "Killed" error. I plan to rebuild her this weekend.

Change History (6)

comment:1 by robe, 5 years ago

debbie is still having some issues but doing better.

The killed process errors were the result of a memory problem. I first rolled her back to the March 13th backup to rule out any updates applied since then. That did not help, or perhaps that was not far back enough.

Then I upgraded her from 4GB to 8GB of RAM, and that didn't help either.

I then had them move her to another physical cluster to rule out hardware. That helped a bit, but I was still getting kill errors.

Then I checked free and noticed she had no swap space.
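The same check can be scripted. This is a minimal sketch, assuming a Linux host where /proc/meminfo is readable; the function names swap_total_kb and has_swap are illustrative and not part of debbie's setup:

```shell
#!/bin/sh
# swap_total_kb [FILE]: print the SwapTotal value (in kB) from a
# meminfo-style file. Defaults to /proc/meminfo. Hypothetical helper.
swap_total_kb() {
    awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# has_swap [FILE]: succeed only if SwapTotal is greater than zero.
has_swap() {
    total=$(swap_total_kb "$1")
    [ "${total:-0}" -gt 0 ]
}
```

For example, `has_swap || echo "no swap configured"` would have flagged the situation described here.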

So I created a small one and then kept on increasing it - each increase reduced the number of kill messages.
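One rough way to track the number of kill messages between swap sizes is to count them in a saved build log. This is a sketch only: it assumes the job's console output has been saved to a file, and count_killed is a made-up helper name:

```shell
#!/bin/sh
# count_killed FILE: print how many lines of a saved build log end in
# "Killed" (the message the shell prints when a process gets SIGKILL).
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function from reporting failure in that case.
count_killed() {
    grep -c 'Killed[[:space:]]*$' "$1" || true
}
```

Running it on the log from each build gives a number that can be compared from one swap size to the next.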

I'm now trying 4GB (as the docs weren't building and concave_hull_hard was still crashing with 2GB). The make dist that builds the docs unfortunately still seems to be dying with:

12:28:00 libtoolize: Consider adding '-I macros' to ACLOCAL_AMFLAGS in Makefile.am.
12:28:00 * Running /usr/bin/aclocal (1.16.1)
12:28:02 Killed
12:28:02 
12:28:02   Something went wrong, giving up!

The commands I'm using to set up the 4GB swap file:

swapoff -a
fallocate -l 4G /swap
chmod 0600 /swap
mkswap /swap
swapon /swap

#then in /etc/fstab 
/swap swap swap defaults 0 0

I'm also in the middle of building a new LXD-containerized debbie, on which I currently have Jenkins installed. I copied over the GEOS jobs and was able to run them with the latest Jenkins.

Once I have all that running and have moved over the website, I'll flip to an 8GB Ubuntu 18.04 LTS server with a containerized Debian 10 debbie.

Last edited 5 years ago by robe

comment:2 by robe, 5 years ago

Whoops, that didn't work.

The fallocate kept creating an 8GB swap file, so I switched to something I found listed, which seems to make the doc build move:

swapoff -a
dd if=/dev/zero of=/swap bs=1024 count=1048576
chmod 0600 /swap
mkswap /swap
swapon /swap
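Worth noting: dd writes bs × count bytes, so the dd command above produces a 1 GiB file (1024 × 1048576 bytes). The arithmetic can be checked with a small sketch; the helper names here are illustrative only:

```shell
#!/bin/sh
# dd_size_bytes BS COUNT: bytes dd writes for a given block size and count.
dd_size_bytes() {
    echo $(( $1 * $2 ))
}

# dd_count_for_gib GIB BS: block count needed for a file of GIB gibibytes
# at block size BS (in bytes).
dd_count_for_gib() {
    echo $(( $1 * 1024 * 1024 * 1024 / $2 ))
}
```

`dd_size_bytes 1024 1048576` prints 1073741824 (1 GiB). If a 4 GiB file were wanted at the same block size, `dd_count_for_gib 4 1024` gives a count of 4194304.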

I'm not sure if debbie ever had swap space. I talked to atlantic about this, and their comment was:

By default our Cloud Servers normally do not have a swap or swapfile added. I believe on some of the newer Debian/Ubuntu servers, we did for a short time have a swapfile added, but this was removed so everything is kept the same (ie: no swap). You are more than welcome to add a swap file if that helps performance, but there are some applications that do not do well with swap, like some Java apps, which is why it is off by default. I hope this helps explain what you were seeing. Let us know if we can be of any further assistance.

So it could be she had one before and things went south when it was taken out, or she's never had one and something else is amiss.

comment:3 by robe, 5 years ago

Yeah, she just made it past the concave_hull_hard hurdle and is also building her first doc since her illness.

comment:4 by robe, 5 years ago

Finally figured out why debbie was so sick. It appears her jenkins got infected with this:

https://stackoverflow.com/questions/55318938/jenkins-high-cpu-usage-khugepageds

I discovered this by accident when I noticed a cron job under the jenkins account. I'm going to check things out on the new server to make sure I didn't copy over the infection, and also reset all the keys jenkins uses.
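For checking another server for the same kind of leftover, a small filter that shows only the live entries of a crontab can help. This is a sketch, not the exact check I ran; active_cron_entries is a made-up helper name:

```shell
#!/bin/sh
# active_cron_entries: read crontab text on stdin and print only the
# lines that are neither blank nor comments, i.e. the entries cron
# would actually act on. "|| true" covers the case where nothing is left.
active_cron_entries() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' || true
}
```

As root, `crontab -u jenkins -l | active_cron_entries` lists whatever is scheduled under the jenkins account; on a clean install of a service account you would usually expect that to be empty or to contain only entries you recognize.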

comment:5 by robe, 5 years ago

Resolution: fixed
Status: new → closed

We have a new debbie now.

comment:6 by robe, 4 years ago

Milestone: Management 2.0 → Website Management, Bots

Milestone renamed
