Opened 11 months ago
Closed 6 months ago
#5637 closed defect (fixed)
woodie builds are unstable
Reported by: | strk | Owned by: | robe |
---|---|---|---|
Priority: | medium | Milestone: | Website Management, Bots |
Component: | QA/buildbots | Version: | |
Keywords: | woodie | Cc: |
Description
When all languages are built, the woodie CI job always ends up exceeding its disk quota. For example here:
https://woodie.osgeo.org/repos/30/pipeline/1592/23
This is due to an upstream bug that fails to clean up unused containers after use in a multi-step pipeline, which is what we are using now.
This ticket is to deal with it.
What we can do:
- Avoid the multi-step pipeline and use a single step (after all, we're using the SAME Docker image for all steps, and the current multi-step approach is only there to give easier-to-read logs).
- Request OSGeo global fix of the issue, which is ticketed here: https://trac.osgeo.org/osgeo/ticket/3040
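As a rough illustration of the first option, a single-step layout could look like the sketch below. The step name, image name and commands are hypothetical placeholders, not the project's actual CI configuration; the only point is that commands previously spread over several steps run in one container.

```yaml
# Minimal sketch, assuming Woodpecker 2.x syntax ("steps:" map form).
# All names below are placeholders chosen for illustration.
steps:
  build-and-check:
    image: example/build-image   # hypothetical: the one image shared by all former steps
    commands:
      # formerly a separate "build" step
      - make
      # formerly a separate "check" step
      - make check
```

The trade-off is the one named above: only one container is created per run, but the log is a single long stream instead of one log per step.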
Change History (7)
comment:1 by , 8 months ago
comment:2 by , 8 months ago
I upgraded the server to 2.3 (and the osgeo4 agent to 2.3), but the upgraded agents are not working as well as the non-upgraded ones. The new agents don't seem to get picked, except for the woodie-server companion agent.
The server log has:
woodpecker-server_1 | {"level":"error","error":"stream: not found","time":"2024-03-14T07:44:18Z","message":"tail of logs failed"}
and the agent log has:
{"level":"error","error":"rpc error: code = Unavailable desc = closing transport due to: connection error: desc = \"error reading from server: EOF\", received prior goaway: code: ENHANCE_YOUR_CALM, debug data: \"too_many_pings\"","time":"2024-03-14T07:38:06Z","message":"grpc error: wait(): code: Unavailable"}
I'm going to restart the nginx agent and server to see if that helps any.
comment:3 by , 8 months ago
I should add that I also don't know what to replace many of the groups with. Changing group to depends_on doesn't work for many of them, because there is no pipeline with those names. So I guess these were really just for grouping, and maybe we should drop them entirely or replace them with something else.
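To make the mismatch concrete, here is a hedged sketch of the two styles. It assumes, as the Woodpecker migration notes describe, that steps.[name].depends_on takes the names of other steps, whereas group was a free-form label shared by steps allowed to run in parallel. All step names, the image and the commands are hypothetical.

```yaml
# Old style (Woodpecker 1.x), shown as comments: "tests" is only a label,
# so there is nothing named "tests" that depends_on could point at.
#
#   test-a:
#     group: tests
#   test-b:
#     group: tests
#
# Possible 2.x replacement: name the step(s) each step has to wait for.
steps:
  build:
    image: example/build-image   # placeholder image
    commands:
      - make
  test-a:
    image: example/build-image
    depends_on: [build]          # waits for "build", can run alongside test-b
    commands:
      - make check-a
  test-b:
    image: example/build-image
    depends_on: [build]
    commands:
      - make check-b
```

Under this reading, a group that only expressed "these can run together" maps to several steps sharing the same depends_on list, rather than to a depends_on on the old group name.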
comment:4 by , 8 months ago
I'm going to leave this open for now until things are a bit more stable.
comment:5 by , 8 months ago
The group keyword is now completely gone, and only depends_on is used.
See https://woodie.osgeo.org/repos/30/pipeline/1958
As for agents, it would be a good opportunity to also deal with identification of them: https://trac.osgeo.org/osgeo/ticket/3148
comment:6 by , 6 months ago
Keywords: | woodie added |
---|---|
Summary: | woodie exceeds disk quota → woodie builds are unstable |
Things still don't seem to be very stable, but I'm not sure it's about disk anymore, so I changed the ticket title. See stability here: https://woodie.osgeo.org/repos/30
comment:7 by , 6 months ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
It looks like the latest instabilities have been due to exceeding the timeout (3 hours). I've now raised it to 4 hours, and will close this ticket, to be reopened as a more specific issue if needed.
After I upgrade to 2+, I think we will need to fix up the CI yml.
For example, https://woodpecker-ci.org/docs/next/migrations says that steps.[name].group has been replaced with steps.[name].depends_on, and we are using group in a couple of places.