Opened 6 years ago
Closed 6 years ago
#2297 closed task (fixed)
Drone (agent?) is failing to reach docker
Reported by: | strk | Owned by: | robe |
---|---|---|---|
Priority: | normal | Milestone: | Sysadmin Contract 2019-I |
Component: | SysAdmin | Keywords: | |
Cc: |
Description
See https://dronie.osgeo.org/postgis/postgis/240
Error message is: pg-9.5: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Change History (13)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
As mentioned on IRC, this might have to do with my upgrade to 1.0.
The job that succeeds is the one run by the drone directly on dronie-server. The failing ones run on ianna and debbie-docker.
It's hard to say whether the upgrade is the issue, as lots of things were happening at the same time.
A matrix job fails because each task of the matrix is passed to a different agent. For now I've shut off the drones on ianna and debbie-docker and restarted the failing job to see if that fixes the issue.
comment:3 by , 6 years ago
Shutting down ianna and debbie-docker seems to have fixed the issue for now. Let's keep this open until we put those agents back online though, as builds are slower now...
comment:4 by , 6 years ago
I take that back: the problem is still not fixed. See https://dronie.osgeo.org/postgis/postgis/254/3/1
Is any other agent connected to the server? Can you check the logs to tell, Regina?
comment:5 by , 6 years ago
docker logs doesn't show any detail about which agents are connected from which IP addresses. The only visible information in the logs is the error of a runner with identifier "machine":"399d4d53c8f9" that starts execution and fails.
I don't know why there's no trace of any other machines.
Logs of the agent docker contain info about both successful and failing builds, but without details of the failure, and with no trace of the 399d4d53c8f9 identifier.
Where are the server and the agents configured?
comment:6 by , 6 years ago
Details here - https://git.osgeo.org/gitea/sac/osgeo7/wiki/Dronie-Server-container
I did turn off the others, so it does seem to be some sort of dronie server agent issue.
I suppose I could revert to the version I had before upgrading to 1.0 (the last one was rc5), or I could just wipe out the database.
comment:7 by , 6 years ago
Forgot to say: I'm not sure why it's even trying to reach the other agents when I've shut them down.
comment:8 by , 6 years ago
Regina, can you make me an administrator of the drone-1.0 server? Chances are there's some admin menu. How can you tell it is trying to access other agents?
comment:9 by , 6 years ago
The instructions on https://git.osgeo.org/gitea/sac/osgeo7/wiki/Dronie-Server-container seem to say that the server and the agent are started via lxc, but on the osgeo7 machine I see them running in docker (docker ps). What's the deal? Is there an overlap between lxc and docker? Or are we running two services in parallel?
comment:10 by , 6 years ago
This is interesting info to add to the wiki: https://github.com/drone/drone/issues/1496
comment:11 by , 6 years ago
Thread on Drone support forum addressing this error: https://discourse.drone.io/t/cannot-connect-to-the-docker-daemon-at-unix-var-run-docker-sock-is-the-docker-daemon-running/4071
comment:12 by , 6 years ago
Regina: you're missing DRONE_AGENTS_ENABLED=true in the server startup script
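A quick way to verify this on the host is sketched below. It assumes the server is started from a shell script or env file that passes variables to the container. The helper name has_agents_enabled is made up for illustration, and only the DRONE_AGENTS_ENABLED variable itself comes from this ticket.

```shell
#!/bin/sh
# Hypothetical helper (not part of Drone): succeeds if the given startup
# script or env file sets DRONE_AGENTS_ENABLED=true.
has_agents_enabled() {
    # Matches the assignment on its own line as well as inside a
    # "docker run" command, e.g. "-e DRONE_AGENTS_ENABLED=true".
    grep -Eq '(^|[[:space:]=])DRONE_AGENTS_ENABLED=true([[:space:]]|$)' "$1"
}

# Example use (the path is an assumption, substitute the real script):
#   has_agents_enabled /path/to/server-startup.sh || echo "variable missing"
```

The check only inspects the file, so the server container still has to be recreated for a change to take effect.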
comment:13 by , 6 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Fix confirmed (was a missing DRONE_AGENTS_ENABLED variable)
Odd: in that build, one of the 3 matrix jobs (the pg10 one) did manage to clone.