Opened 6 years ago

Closed 6 years ago

#2297 closed task (fixed)

Drone (agent?) is failing to reach docker

Reported by: strk
Owned by: robe
Priority: normal
Milestone: Sysadmin Contract 2019-I
Component: SysAdmin
Keywords:
Cc:

Description

See https://dronie.osgeo.org/postgis/postgis/240

Error message is: pg-9.5: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Change History (13)

comment:1 by strk, 6 years ago

Odd: in that build, one of the 3 matrix jobs (the pg10 one) did manage to clone.

comment:2 by robe, 6 years ago

As mentioned on IRC, this might have to do with my upgrade to 1.0.

The job that is succeeding is the drone that runs directly on dronie-server. The other ones failing are run on ianna and debbie-docker.

It's hard to say whether the upgrade is the issue, as lots of things were happening at the same time.

The reason some matrix jobs fail is that each job of a matrix is passed to a different agent. For now I've shut off the drones on ianna and debbie-docker and restarted the failing job to see if that fixes the issue.

comment:3 by strk, 6 years ago

Shutting down ianna and debbie-docker seems to have fixed the issue for now. Let's keep this open until we put those agents back online though, as builds are slower now...

comment:4 by strk, 6 years ago

I take that back, the problem is still not fixed. See https://dronie.osgeo.org/postgis/postgis/254/3/1

Is any other agent connected to the server? Can you check the logs to tell, Regina?

comment:5 by strk, 6 years ago

docker logs doesn't show any detail about which agents are connected from which IP addresses. The only visible information in the logs is the error of a runner with identifier "machine":"399d4d53c8f9" that starts execution and fails.

I don't know why there's no trace about any other machines.

The logs of the agent container contain info about both successful and failing builds, but without details of the failure, and with no trace of the 399d4d53c8f9 identifier.

Where are the server and the agents configured ?
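The log search described above can be sketched as a small filter. This is only an illustration: the function name `filter_machine` and the container name `drone` in the usage comment are assumptions, not names taken from the actual setup.

```shell
#!/bin/sh
# Keep only the log lines emitted by one runner, matched on the
# "machine" field as it appears in the server log quoted above.
filter_machine() {
  grep -F "\"machine\":\"$1\""
}

# Usage (the container name "drone" is an assumption):
#   docker logs drone 2>&1 | filter_machine 399d4d53c8f9
```

Running the same filter against the agent container's logs would show whether the identifier appears there at all.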

comment:6 by robe, 6 years ago

Details here - https://git.osgeo.org/gitea/sac/osgeo7/wiki/Dronie-Server-container

I did turn off the others, so it does seem to be some sort of dronie server agent issue.

I suppose I can revert to the version I had before upgrading to 1.0 (the last one was rc5), or I could just wipe out the database.

comment:7 by robe, 6 years ago

Forgot to say: I'm not sure why it's even trying to reach the other agents when I've shut them down.

comment:8 by strk, 6 years ago

Regina, can you make me an administrator of the drone-1.0 server? Chances are there's some admin menu. How can you tell it is trying to access other agents?

comment:9 by strk, 6 years ago

The instructions on https://git.osgeo.org/gitea/sac/osgeo7/wiki/Dronie-Server-container seem to say that the server and the agent are started via lxc, but on the osgeo7 machine I see them running in docker (per docker ps). What's the deal? Is there an overlap between lxc and docker? Or are we running two services in parallel?

comment:10 by strk, 6 years ago

This is interesting info to add to the wiki: https://github.com/drone/drone/issues/1496

comment:12 by strk, 6 years ago

Regina: you're missing DRONE_AGENTS_ENABLED=true in the server startup script

comment:13 by strk, 6 years ago

Resolution: fixed
Status: assigned → closed

Fix confirmed (was a missing DRONE_AGENTS_ENABLED variable)
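For reference, a minimal sketch of a Drone 1.0 server startup command with the variable set might look like the following. It is not the actual OSGeo startup script: the secret is a placeholder, the host name is taken from the build URLs in this ticket, and the git provider settings, volumes and published ports are deliberately left out. The script prints the command rather than running it, so it can be reviewed first.

```shell
#!/bin/sh
# Placeholder values (assumptions), not the real OSGeo settings.
RPC_SECRET="change-me"           # hypothetical shared secret, must match the agents
SERVER_HOST="dronie.osgeo.org"   # host name as seen in the build URLs in this ticket
SERVER_PROTO="https"

# Print the server command instead of executing it.
# Git provider settings, volumes and ports are omitted from this sketch.
drone_server_command() {
  cat <<EOF
docker run \\
  --env=DRONE_AGENTS_ENABLED=true \\
  --env=DRONE_RPC_SECRET=${RPC_SECRET} \\
  --env=DRONE_SERVER_HOST=${SERVER_HOST} \\
  --env=DRONE_SERVER_PROTO=${SERVER_PROTO} \\
  --restart=always \\
  --detach=true \\
  --name=drone \\
  drone/drone:1
EOF
}

drone_server_command
```

The line that matters for this ticket is `--env=DRONE_AGENTS_ENABLED=true`; the remaining options are only there to make the sketch look like a complete invocation.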
