Opened 2 years ago

Last modified 4 months ago

#1940 new task

osgeo4 needs new disks

Reported by: strk
Owned by: wildintellect
Priority: normal
Milestone: Unplanned
Component: Systems Admin
Keywords:
Cc: wildintellect

Description

As reported by Justin L Dugger (IRC nickname pwnguin) from OSUOSL staff there's a degraded raid on osgeo4 server which should be replaced.

This ticket is to track operations toward getting that disk replaced.

Change History (12)

comment:1 Changed 2 years ago by strk

Cc: wildintellect added

NOTE: this got assigned ID [support.osuosl.org #29544] by OSUOSL support.

Alex, do you know how to proceed here?

comment:2 Changed 2 years ago by strk

From the upstream ticket (Samarendra Hedaoo):

To add more details, the current status is:                                                         
RAID-6:6 drives:557.75GB:                                                                           
Degraded Drives:5                                                                                   
1 Bad Drives (1974326 Errors)  

comment:3 Changed 6 months ago by robe

Note that this is fairly urgent now as we have two drives that need replacement.

My proposed plan is to

1) Finish moving off projects and adhoc
2) Purchase new drives and have OSUOSL put these in
3) Reformat osgeo4 as an LXD host

https://lists.osgeo.org/pipermail/sac/2019-April/010837.html

comment:4 Changed 6 months ago by robe

Milestone: Sysadmin Contract 2019-I

comment:5 Changed 6 months ago by wildintellect

I completely agree with step 1. That has been the goal since we bought Osgeo6.

In previous discussions we concluded that the osgeo4 RAID card is not reliable and that we should not continue to use it. So I do not advise steps 2 or 3. I advise retiring the machine once it is clear.

comment:6 Changed 5 months ago by robe

Resolution: wontfix
Status: new → closed

I think we decided we are going to just chuck osgeo4.

I have moved everything off of it and am doing a final backup of adhoc before I tell OSUOSL to shut down the whole host.

comment:7 Changed 5 months ago by robe

Resolution: wontfix
Status: closed → reopened
Summary: osgeo4 raid needs a replacement → osgeo4 needs new disks

I spoke to the OSUOSL folks. Lance said he thinks the RAID on osgeo4 is fine, suggested we just replace the disks completely, and sent us a sample quote.

I talked with Martin and he concurs that probably just the disks need replacing.

Both Martin and Lance (from OSUOSL) concur that osgeo4 would make a fine LXD backup host if the disks are replaced.

I have moved everything off of osgeo4, so I'd really like to move forward with getting fresh disks and reformatting it.

Martin's notes from IRC:

22:00:22	MartinSpott:	As far as I can tell, Osgeo4 has E5540 CPU's and 48 GByte main memory
22:00:36	MartinSpott:	.... which still makes a nice testbed

I'd rather reuse osgeo4, reformat it as Ubuntu 18.04 LTS or Debian 9/10, and use it as a spare LXD host, perhaps moving over old-adhoc / old-projects and setting up a bridge so the LXD hosts are on the same private network.

So I am reopening this ticket and changing the title to "needs new disks" instead of needs new RAID.

I will ask Lance if he can ask Justin, if he is still around, why he thought the RAID was bad, as that seems to be where this claim is coming from:


As reported by Justin L Dugger (IRC nickname pwnguin) from OSUOSL staff there's a degraded raid on osgeo4 server which should be replaced.


comment:8 Changed 5 months ago by wildintellect

It's had 2-3 dead drives previously. Last time we tried to rebuild the RAID it took a week instead of hours. We replaced the RAID battery once. It just seemed less reliable than osgeo3. The drives are cheap enough, and we could move to RAID 5 instead of RAID 6. We could also do the RAID in software, as osgeo5/6/7 do. Put the link for the new drives in here so we can send a purchase request in.
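For anyone weighing the RAID levels mentioned in this ticket, here is a rough sketch of the capacity versus fault-tolerance trade-off for a six-drive array. It is illustrative only: the function name `raid_summary` is made up, and the per-drive size is an assumption back-calculated from the 557.75GB RAID-6 figure in comment:2 (557.75 / 4 data drives), not a confirmed drive spec.

```python
def raid_summary(level, drives, drive_gb):
    """Return (usable_gb, failures_always_survived) for a simple array.

    Textbook formulas only; real controllers reserve some space for
    metadata, so actual usable capacity may be slightly lower.
    """
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        # One drive's worth of capacity goes to parity.
        return (drives - 1) * drive_gb, 1
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        # Two drives' worth of capacity goes to parity.
        return (drives - 2) * drive_gb, 2
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        # Mirrored pairs: half the raw capacity. Only one failure is
        # guaranteed survivable (a second failure in the same pair is fatal).
        return (drives // 2) * drive_gb, 1
    raise ValueError(f"unsupported level: {level}")


# Assumed per-drive size, derived from the reported RAID-6 capacity.
ASSUMED_DRIVE_GB = 557.75 / 4

for level in ("raid5", "raid6", "raid10"):
    usable, tolerated = raid_summary(level, 6, ASSUMED_DRIVE_GB)
    print(f"{level}: {usable:.2f} GB usable, survives {tolerated} failure(s)")
```

With two drives already failed on this box, the second parity drive of RAID 6 is what has kept the array alive, which is worth bearing in mind before trading it for the extra capacity of RAID 5.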

comment:9 Changed 5 months ago by robe

Also, Martin mentioned it has DRBD set up. We assumed that was for some clustering between osgeo3 / osgeo4. Should we ask OSUOSL about that? Not sure if we need to worry about it.

comment:10 Changed 5 months ago by robe

Here is the note from Lance's email dated Apr 8th 2019, with the link:


I just checked and osgeo3 has no failed drives (currently). However, osgeo4 has one failed drive in a RAID6 and if you'd like to replace it, it seems like it's going to be fairly cheap [1]. This should be an exact replacement. If you do plan on replacing it, you should buy 2-3 more just in case others fail.

Do you have any idea when you'll be wrapping the migrations up?

Thanks-

[1] https://www.newegg.com/Product/Product.aspx?Item=N82E16822148538


comment:11 Changed 5 months ago by robe

Lance's response on 5/12/2019 - sorry forgot to add to this ticket


Justin no longer works at the lab so I'm not entirely sure what he thought was going on with the RAID controller card. The battery seems to be doing fine from what I can tell and has 92% absolute state of charge which is pretty good. Perhaps the firmware upgrade on the controller might "fix" whatever issues you were having before. FWIW we use R710's quite a bit in the lab and have very few issues with those same controllers.

At an absolute minimum you should replace the two failed drives plus have an additional one or two for spares. Or you could completely replace all six drives with fresh drives. The controller supports either 3.5" SATA or SAS drives. The controller also supports RAID10 if that's of use.

Let us know what you'd like to do as far as replacing drives and we'll move forward with getting the firmware updated on the system and prepping it for your backup uses. Do you want to retain the same hostname?


Last edited 5 months ago by robe

comment:12 Changed 4 months ago by robe

Milestone: Sysadmin Contract 2019-I → Unplanned
Owner: changed from sac@… to wildintellect
Status: reopened → new