Opened 8 years ago
Closed 8 years ago
#1810 closed task (fixed)
AdhocVM not accessible
Reported by: | Jeff McKenna | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | SysAdmin | Keywords: | |
Cc: | | | |
Description
- cannot ssh into the AdhocVM (https://wiki.osgeo.org/wiki/AdhocVM)
- not sure if this is related to recent DDoS attacks
- host: adhoc(dot)osgeo(dot)osuosl(dot)org
Change History (24)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
$ ssh tomkralidis@demo.pycsw.org
ssh: connect to host demo.pycsw.org port 22: Connection timed out
$ ssh tomkralidis@adhoc.osuosl.osgeo.org
ssh: Could not resolve hostname adhoc.osuosl.osgeo.org: Name or service not known
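The two failures in the transcript above differ in kind: the first is a TCP timeout on port 22, while the second is a DNS lookup failure, and the hostname tried there (adhoc.osuosl.osgeo.org) appears to swap two labels of the host given in the description (adhoc.osgeo.osuosl.org). A minimal sketch, using only Python's standard `socket` module, that tells these cases apart; the function names `probe` and `diagnose` are hypothetical and not part of any OSGeo tooling:

```python
import socket


def diagnose(exc: BaseException) -> str:
    """Map a connection exception to a coarse diagnosis."""
    # Name resolution failed ("Name or service not known"):
    # check the hostname spelling and DNS before suspecting the host itself.
    if isinstance(exc, socket.gaierror):
        return "dns"
    # No answer within the timeout: host down, or traffic filtered on the way.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    # The host answered, but nothing is listening on that port.
    if isinstance(exc, ConnectionRefusedError):
        return "refused"
    return "other"


def probe(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Resolve host, then attempt a TCP handshake on port; return a diagnosis."""
    try:
        # Step 1: DNS lookup on its own, so a typo is not mistaken for an outage.
        addrinfo = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except OSError as exc:
        return diagnose(exc)
    # Step 2: connect to the first resolved address only
    # (a fuller check would try every entry in addrinfo).
    family, socktype, proto, _, sockaddr = addrinfo[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError as exc:
        return diagnose(exc)
    return "open"
```

For example, `probe("demo.pycsw.org")` returning "timeout" would match the first failure, while "dns" for the swapped hostname would point to a typo rather than to the VM being down.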
comment:3 by , 8 years ago
As that was 3 days ago, and as this machine is critical for so many OSGeo projects, maybe we need to spread out the administration of this VM, so it isn't one person holding this back. I am sure Tom or I can help here. Please let us know if we can help in any way.
comment:4 by , 8 years ago
Surprisingly, my "working on it" comment seemingly didn't get through... well, here it is :-)
The issue is a little bit tricky. The easiest solution - for us - would be to ask OSL to set up the same kernel/boot hack as they did for the "webextra" VM. Last time I asked, they weren't too happy about it.
Alex, would you mind asking again?
comment:5 by , 8 years ago
Thanks for the update, Martin; I am sure several projects were waiting for it. For a server hosting so many OSGeo projects, 3 days of 'working on it' probably just needed a status update along the way. Thanks again.
comment:6 by , 8 years ago
I'm really sorry the intermediate update was lost - maybe I simply clicked the wrong button.
I *do* have access to the "bare iron" and thus to the filesystems of the VM as well. Anyhow, finding out why the boot loader doesn't work in this setup (the initial setup of all of these VMs had a little design flaw) proved to be more time-consuming than expected (and I need to respect the constraints of my day job).
comment:7 by , 8 years ago
I do appreciate the sarcasm, but rest assured that all projects received your 'working on it' message in this ticket 3 days ago. My point is that, for future tickets, just post updates as you go along; your message today about OSL was great, and I am sure the OSGeo projects appreciated it. Your 3 days of effort so far are greatly appreciated; just be sure to keep this ticket updated. Thanks.
comment:8 by , 8 years ago
I think sharing the admin burden among at least 2 people is still a good idea. 3 would be even better.
comment:9 by , 8 years ago
There are lots of people who have admin to the VM, just not the bare metal (which we are retiring).
Yes, we'll need to file a ticket with OSUOSL; if someone has an old email about how it was handled last time, that would help.
As an alternative, what if we clone this VM disk over to osgeo6 and run it inside of a KVM+libvirt setup? We need a plan to move everything anyways as osgeo4 needs to be retired, and has 1 failed disk right now.
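To make the KVM+libvirt proposal concrete, here is a minimal sketch that builds a libvirt domain definition for a KVM guest booting from a cloned raw disk image. The guest name, disk path, bridge name, memory and CPU sizes are hypothetical placeholders, not osgeo6's actual configuration. It also shows libvirt's direct-kernel-boot option (`<kernel>`, `<initrd>`, `<cmdline>`), where the host supplies the kernel so the guest disk needs no working boot loader; this is assumed to be roughly analogous to the kernel/boot hack mentioned for "webextra", which the ticket does not describe in detail.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def domain_xml(name: str, disk_path: str, memory_mib: int = 2048, vcpus: int = 2,
               bridge: str = "br0", kernel: Optional[str] = None,
               initrd: Optional[str] = None, cmdline: Optional[str] = None) -> str:
    """Build a minimal libvirt domain definition for a KVM guest on a raw disk."""
    dom = ET.Element("domain", type="kvm")
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(dom, "vcpu").text = str(vcpus)

    os_el = ET.SubElement(dom, "os")
    ET.SubElement(os_el, "type", arch="x86_64").text = "hvm"
    if kernel:
        # Direct kernel boot: kernel (and optional initrd) come from the host,
        # so a missing or broken boot loader inside the image does not matter.
        ET.SubElement(os_el, "kernel").text = kernel
        if initrd:
            ET.SubElement(os_el, "initrd").text = initrd
        if cmdline:
            ET.SubElement(os_el, "cmdline").text = cmdline
    else:
        # Normal boot: relies on a boot loader (e.g. GRUB) installed in the image.
        ET.SubElement(os_el, "boot", dev="hd")

    devices = ET.SubElement(dom, "devices")
    # The cloned VM disk, attached as a raw file on a virtio bus.
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    ET.SubElement(disk, "source", file=disk_path)
    ET.SubElement(disk, "target", dev="vda", bus="virtio")

    # Network via a host bridge (hypothetical name passed in as `bridge`).
    nic = ET.SubElement(devices, "interface", type="bridge")
    ET.SubElement(nic, "source", bridge=bridge)
    ET.SubElement(nic, "model", type="virtio")
    return ET.tostring(dom, encoding="unicode")
```

The resulting XML could be saved to a file and registered with `virsh define`, or passed to `defineXML()` in the libvirt Python bindings; either way it is a starting point to adapt, not a tested configuration for this VM.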
comment:10 by , 8 years ago
Cloning this disk over to osgeo6 sounds better, as we're not delayed by an external ticket in that case.
comment:12 by , 8 years ago
I'll start working on it right now and hopefully will be able to provide a proper solution for the other affected VMs as well.
comment:14 by , 8 years ago
I give up on this; I simply don't understand how they're booting our VMs. No matter how I install a GRUB boot loader into the virtual disk, it simply doesn't get loaded.
So, could anybody in touch with OSL please ask them to bolt a kernel into the "adhoc" VM the same way they did for the "webextra" VM?
If not, then we might consider turning the "osgeo6" machine into a Xen host and migrating a filesystem dump of the "adhoc" VM into a Xen guest there.
comment:15 by , 8 years ago
Well, it looks like I finally made it work. Please let me know what's missing.
comment:16 by , 8 years ago
Last time I contacted the OSUOSL people (not sure it was for webextra), I did so by joining the #osuosl IRC channel on freenode.
I'll do so again, referencing this ticket.
comment:17 by , 8 years ago
I actually just logged into adhoc. Am I using the wrong IP?
strk@adhoc:~$ hostname -f; date; /sbin/ifconfig eth0
adhoc.osgeo.osuosl.org
Fri Oct 28 01:48:12 PDT 2016
eth0      Link encap:Ethernet  HWaddr aa:00:00:ae:6b:fc
          inet addr:140.211.15.84  Bcast:140.211.15.255  Mask:255.255.255.0
          inet6 addr: fe80::a800:ff:feae:6bfc/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:205136 errors:0 dropped:0 overruns:0 frame:0
          TX packets:39667 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:40203361 (38.3 MiB)  TX bytes:19318868 (18.4 MiB)
comment:18 by , 8 years ago
Sandro, I can't see anything wrong with your findings.
As I wrote last night, I finally managed to get a proper boot loader installed. As a side effect, I learned how to repair the partitioning and boot loader setups, which will allow us to upgrade all our VMs to the latest Debian.
Next I'll apply the same procedure to the former "mail" VM, in preparation for #1805.
comment:19 by , 8 years ago
Great news, thank you Martin! (I hadn't read your comment before posting my findings.)
comment:20 by , 8 years ago
Great work, Martin! Sounds tricky! Looks good here: I'm able to log in and start the MapServer services. Do you think we should add any notes about this issue (for next time) to the wiki? https://wiki.osgeo.org/wiki/AdhocVM
comment:21 by , 8 years ago
Hi Jeff, I'll add my notes to the wiki once the result has proved reproducible on a different VM.
I've just issued another reboot of the "adhoc" VM to test whether standard system, kernel, or boot loader upgrades are safe. Apparently they are.
So, if any of your services still need to be started manually after a reboot, please go ahead and start them now.
Checking.