Opened 8 years ago

Closed 8 years ago

#1810 closed task (fixed)

AdhocVM not accessible

Reported by: Jeff McKenna
Owned by: sac@…
Priority: normal
Milestone:
Component: SysAdmin
Keywords:
Cc:

Description

Change History (24)

comment:1 by martin, 8 years ago

Checking.

comment:2 by tomkralidis, 8 years ago

$ ssh tomkralidis@demo.pycsw.org
ssh: connect to host demo.pycsw.org port 22: Connection timed out
$ ssh tomkralidis@adhoc.osuosl.osgeo.org
ssh: Could not resolve hostname adhoc.osuosl.osgeo.org: Name or service not known
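
The two failures above are of different kinds: the first is a connection timeout on port 22, the second a name-resolution failure. Note also that the second hostname differs from the one the VM itself reports later in this ticket (adhoc.osgeo.osuosl.org), so that DNS failure may not say anything about the VM. A minimal sketch for telling these cases apart is below; `check_ssh` is a hypothetical helper written for illustration, not an existing OSGeo tool, and the port and timeout defaults are assumptions.

```python
import socket


def check_ssh(host, port=22, timeout=5.0):
    """Classify reachability of host:port.

    Returns one of: 'dns_failure', 'timeout', 'refused', 'unreachable', 'open'.
    Hypothetical helper for illustration only.
    """
    try:
        # Step 1: name resolution (fails for unknown hostnames).
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    result = "unreachable"
    # Step 2: try each resolved address; report the last failure seen.
    for family, socktype, proto, _canon, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return "open"
        except socket.timeout:
            result = "timeout"
        except ConnectionRefusedError:
            result = "refused"
        except OSError:
            result = "unreachable"
    return result


if __name__ == "__main__":
    # Hostnames taken from this ticket; results depend on the network state.
    for name in ("demo.pycsw.org", "adhoc.osgeo.osuosl.org"):
        print(name, check_ssh(name))
```

A 'timeout' result suggests the host is down or filtered, while 'dns_failure' points at the name rather than the machine.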

comment:3 by Jeff McKenna, 8 years ago

As that was 3 days ago, and as this machine is critical for so many OSGeo projects, maybe we need to spread out the administration of this VM, so it isn't one person holding this back. I am sure Tom or I can help here. Please let us know if we can help in any way.

comment:4 by martin, 8 years ago

Surprisingly the "working on it"-comment seemingly didn't get through .... well, here it is :-)

The issue is a little bit tricky. The easiest solution - for us - would be to ask OSL to set up the same kernel/boot hack as they did for the "webextra" VM. Last time I asked them they weren't too happy about it.

Alex, would you mind asking again ?

comment:5 by Jeff McKenna, 8 years ago

Thanks for the update Martin, as I am sure several projects were awaiting this update. Thanks again for sharing this news. 'working on it' for 3 days, for a server hosting so many OSGeo projects, likely just needed an update on status. Thanks again for this.

comment:6 by martin, 8 years ago

I'm really sorry the intermediate update was lost - maybe I simply clicked the wrong button.

I *do* have access to the "bare iron" and thus to the filesystems of the VM as well. Anyhow, finding out why the boot loader doesn't work in this setup (the initial setup of all of these VM's had a little design flaw) proved to be more time-consuming than expected (and I need to respect the constraints of my day-job).

comment:7 by Jeff McKenna, 8 years ago

I do appreciate the sarcasm, but, rest assured that all projects received your 'working on it' message here in this ticket 3 days ago; my point is, for future tickets, just give updates as you travel down the journey - your message today of OSL was great, I am sure the OSGeo projects appreciated the update. The 3 days of your effort so far was greatly appreciated, just be sure to keep this ticket updated with your efforts. thanks.

comment:8 by strk, 8 years ago

I think sharing the admin burden among at least 2 people is still a good idea. 3 would be even better.

comment:9 by wildintellect, 8 years ago

There are lots of people who have admin access to the VM, just not to the bare metal (which we are retiring).

Yes, we'll need to file a ticket with osuosl, if someone has an old email about how it was managed last time that would help.

As an alternative, what if we clone this VM disk over to osgeo6 and run it inside of a KVM+libvirt setup? We need a plan to move everything anyways as osgeo4 needs to be retired, and has 1 failed disk right now.

comment:10 by Jeff McKenna, 8 years ago

Cloning this disk over to osgeo6 sounds better, as we're not delayed by an external ticket in that case.

comment:11 by Jeff McKenna, 8 years ago

Can we have an update on this?

comment:12 by martin, 8 years ago

I'll start working on it right now and hopefully will be able to provide a proper solution for the other affected VM's as well.

comment:13 by Jeff McKenna, 8 years ago

thanks martin

comment:14 by martin, 8 years ago

I give up on this, I simply don't understand how they're booting our VM's. No matter how I'm installing a GRUB bootloader into the virtual disk, it simply doesn't get loaded.

Thus, would anybody who is in touch with OSL please ask them to bolt a kernel into the "adhoc" VM in the same way as they did for the "webextra" VM ?

If not, then we might consider turning the "osgeo6" machine into a Xen host and migrate a filesystem dump of the "adhoc" VM into a Xen guest there.

comment:15 by martin, 8 years ago

Well, it looks like I finally made it work. Please let me know what's missing.

comment:16 by strk, 8 years ago

Last time I contacted OSUOSL people (not sure it was for webextra) I did so by joining #osuosl IRC channel on freenode.

Will do again, referencing this ticket.

comment:17 by strk, 8 years ago

I actually just logged into adhoc, am I using the wrong IP ?

strk@adhoc:~$ hostname -f; date; /sbin/ifconfig eth0
adhoc.osgeo.osuosl.org
Fri Oct 28 01:48:12 PDT 2016
eth0      Link encap:Ethernet  HWaddr aa:00:00:ae:6b:fc
          inet addr:140.211.15.84  Bcast:140.211.15.255  Mask:255.255.255.0
          inet6 addr: fe80::a800:ff:feae:6bfc/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:205136 errors:0 dropped:0 overruns:0 frame:0
          TX packets:39667 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:40203361 (38.3 MiB)  TX bytes:19318868 (18.4 MiB)

comment:18 by martin, 8 years ago

Sandro, I can't see anything wrong with your findings.

As I finally wrote last night, I managed to get a proper boot loader installed. As a side effect I learned how to cure the partitioning and boot loader setups, which allows us to upgrade all our VMs to the latest Debian.

Next I'll apply the same procedure to the former "mail" VM, in preparation for #1805.

comment:19 by strk, 8 years ago

Great news, thank you Martin ! (I hadn't read your comment before writing my findings)

comment:20 by Jeff McKenna, 8 years ago

Great work Martin! Sounds tricky!!!! Looks good here, I'm able to login and start the MapServer services. Do you think we should add any notes about this issue (for next time) to the wiki? https://wiki.osgeo.org/wiki/AdhocVM

comment:21 by martin, 8 years ago

Hi Jeff, I'll add my notes to the Wiki once the result has proved to be reproducible on a different VM.

I've just issued another reboot on the "adhoc" VM in order to test whether standard system, kernel or bootloader upgrades are safe. Apparently they are.

Thus if you still require manual intervention to start services after reboot, please go ahead now.

comment:22 by Jeff McKenna, 8 years ago

thanks Martin, I've restarted the MapServer services.

comment:23 by strk, 8 years ago

Jeff can this be closed then ?

comment:24 by Jeff McKenna, 8 years ago

Resolution: fixed
Status: new → closed

of course, thanks!
