Opened 7 months ago

Closed 7 weeks ago

#5637 closed defect (fixed)

woodie builds are unstable

Reported by: strk
Owned by: robe
Priority: medium
Milestone: Website Management, Bots
Component: QA/buildbots
Version:
Keywords: woodie
Cc:

Description

Whenever all languages are built, the woodie CI job runs out of disk quota. For example:

https://woodie.osgeo.org/repos/30/pipeline/1592/23

This is due to an upstream bug that fails to clean up unused containers after use in a multi-step pipeline such as the one we are using now.

This ticket is to deal with it.

What we can do:

  1. Avoid the multi-step pipeline and use a single step (after all, we're using the SAME Docker image for all steps, and the current multi-step approach is only there to give easier-to-read logs).
  2. Request an OSGeo-wide fix for the issue, which is ticketed here: https://trac.osgeo.org/osgeo/ticket/3040

Change History (7)

comment:1 by robe, 4 months ago

After I upgrade to 2+, I think we will need to fix up the ci yml.

For example https://woodpecker-ci.org/docs/next/migrations

says here steps.[name].group has been replaced with steps.[name].depends_on

and we are using group in a couple of places.

comment:2 by robe, 4 months ago

I upgraded the server to 2.3 (and the osgeo4 agent to 2.3), but the upgraded agents are not working as well as the non-upgraded ones. The new agents don't seem to get picked, except for the woodie-server companion agent.

woodpecker-server_1 | {"level":"error","error":"stream: not found","time":"2024-03-14T07:44:18Z","message":"tail of logs failed"}

and the agent log has: {"level":"error","error":"rpc error: code = Unavailable desc = closing transport due to: connection error: desc = \"error reading from server: EOF\", received prior goaway: code: ENHANCE_YOUR_CALM, debug data: \"too_many_pings\"","time":"2024-03-14T07:38:06Z","message":"grpc error: wait(): code: Unavailable"}

I'm going to restart the nginx agent and server to see if that helps any.

comment:3 by robe, 4 months ago

I should add that I also don't know what to replace many of the groups with. Changing group to depends_on doesn't work for many of them because there is no pipeline by those names. So I guess these were really just for grouping, and maybe we should drop them entirely or replace them with something else.
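To make the mapping concrete, here is a minimal Python sketch of how group values could be turned into depends_on lists. It assumes the pre-2.0 behaviour where consecutive steps sharing a group ran in parallel and each such stage waited for the previous one; that reading should be checked against the migration notes. The function name and the step and group names are hypothetical and not taken from our ci yml.

```python
def groups_to_depends_on(steps):
    """Map each step to the steps it should depend on.

    `steps` is an ordered list of (step_name, group_or_None) pairs,
    in the order the steps appear in the old pipeline file.
    ASSUMPTION: consecutive steps with the same group formed one
    parallel stage, and stages ran one after another.
    """
    # Collect consecutive steps with the same group into stages;
    # a step without a group is a stage of its own.
    stages = []
    for name, group in steps:
        if group is not None and stages and stages[-1][0] == group:
            stages[-1][1].append(name)
        else:
            stages.append((group, [name]))

    # Every step depends on all steps of the stage right before it.
    depends_on = {}
    previous = []
    for _, names in stages:
        for name in names:
            depends_on[name] = list(previous)
        previous = names
    return depends_on


# Hypothetical pipeline: one setup step, two parallel builds, one final step.
example = [
    ("prepare", None),
    ("build-en", "build"),
    ("build-it", "build"),
    ("package", None),
]
for step, deps in groups_to_depends_on(example).items():
    print(step, "depends_on:", deps)
```

The point is that depends_on takes step names rather than group names, so each group label has to be replaced by the list of steps that were in the preceding stage.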

comment:4 by robe, 4 months ago

I'm going to leave this open for now until things are a bit more stable.

comment:5 by strk, 4 months ago

The group keyword is now completely gone, and only depends_on is used. See https://woodie.osgeo.org/repos/30/pipeline/1958

As for agents, it would be a good opportunity to also deal with identification of them: https://trac.osgeo.org/osgeo/ticket/3148

comment:6 by strk, 7 weeks ago

Keywords: woodie added
Summary: woodie exceeds disk quota → woodie builds are unstable

Things still don't seem very stable, but I'm not sure it's about disk anymore, so I changed the ticket title. See stability here: https://woodie.osgeo.org/repos/30

comment:7 by strk, 7 weeks ago

Resolution: fixed
Status: new → closed

It looks like the latest instabilities were due to exceeding the timeout (3 hours). I've now raised it to 4 hours and will close this ticket, to be reopened on a more specific issue.
