Opened 9 months ago

Closed 8 months ago

Last modified 8 months ago

#2978 closed task (fixed)

repo.osgeo.org contains 0 byte files

Reported by: jive Owned by: jive
Priority: critical Milestone: Unplanned
Component: SysAdmin Keywords:
Cc:

Description

Alexander, via the discuss email list:

We're thankful consumers of osgeo's maven repository, though this morning we noticed that empty files are being pulled. Inspecting certain cases, we saw that since this morning, files of size 0 seem to be in the repo, e.g. the files here https://repo.osgeo.org/service/rest/repository/browse/release/org/jboss/jboss-parent/36/.

Kévin Belellou has reported the following via https://osgeo-org.atlassian.net/browse/GEOT-7433

Since this morning (September 7th, 2023), we have encountered a bug with your Maven Nexus instance (Nexus Repository Manager), which we proxy in our own instance.

A lot of artifacts with size of 0 bytes have appeared in the geonetwork-cache repository, that caused our compile workflows to fail.

The weird thing is that some of these “ghost” artifacts are ours (com.total.* for example).

My theory is that our Nexus asked yours for these artifacts (that you shouldn’t have) and your Nexus somehow created empty ones and returned them.

A minor detail that may be important: all the artifacts in the com.total.* package have a classifier such as sources, config or javadoc.

Change History (20)

comment:1 by jive, 9 months ago

Frank Gasdorf provides the following troubleshooting:

I recently stumbled across this problem too, and my investigation is as follows:

  • a company uses a repository manager and has the OSGeo repo configured as a proxy repo, AND
  • an internal artefact has been requested but is not available in the internal repositories, so an HTTP GET is sent to the OSGeo repo as well
  • this leads to entries like these, with size 0, for each artefact

IMHO it seems to be an issue in the Nexus configuration, and I am investigating how to configure filters for proxy repositories in Nexus.

comment:2 by jive, 9 months ago

I am trying to determine if there is anything for me to do as an admin of repo.osgeo.org?

I would like to determine if this is a cache problem or someone in the community uploading zero-sized files by accident.

Browsing *release*, I immediately see junk:

Repository	release
Format	maven2
Component Group	$%7Bbusm.datasync.groupid%7D
Component Name	dataconfig-service
Component Version	$%7Bbusm.version%7D
Path	$%7Bbusm/datasync/groupid%7D/dataconfig-service/$%7Bbusm.version%7D/dataconfig-service-$%7Bbusm.version%7D.pom
Content type	application/xml
File size	0 bytes
Blob created	Wed Sep 06 2023 19:46:12 GMT-0700 (Pacific Daylight Saving Time)
Blob updated	Wed Sep 06 2023 19:46:12 GMT-0700 (Pacific Daylight Saving Time)
Last downloaded	Wed Sep 06 2023
Locally cached	false
Blob reference	cache@EF9560A3-97C6ABE4-329A592F-5E3FCFA2-D8F8BD08:4d71c425-4b81-4dfb-abc2-e05174a8c61d
Containing repo	geonetwork-cache
Uploader	anonymous
Uploader's IP Address	14.137.135.63

This appears to be in geonetwork-cache.

comment:3 by jive, 9 months ago

Yep, the geonetwork-cache seems very troubled, collecting zero-byte files on a wide range of topics?!?

Going to focus on this configuration for now.

  • The cleanup policy does not allow me to remove components based on size (ha!)
  • It is a cache of https://github.com/geonetwork/core-maven-repo
  • GitHub provided guidance that using GitHub as a maven repository was impolite, and the project moved to repo.osgeo.org two years ago.
  • I assume GitHub cut off even its use as a cache yesterday, and the resulting madness has geonetwork-cache collecting timeouts and zero-sized files for everyone.

I am going to take geonetwork-cache out of release for now.

comment:4 by jive, 9 months ago

Please test and let me know if that addresses the problem for everyone but the GeoNetwork community.

comment:5 by jive, 9 months ago

Owner: changed from sac@… to jive
Priority: normal → critical

comment:6 by fgdrf, 9 months ago

IMHO this ticket has two aspects: the repository setup itself, that external requests might not be logged on the osgeo instance, AND the recommended setup for repository managers that proxy osgeo repositories.

It's possible to define routing rules in the repository manager (in this case Nexus), as described here: https://help.sonatype.com/repomanager3/using-nexus-repository/repository-manager-concepts/proxy-repository-concepts#ProxyRepositoryConcepts-RoutingRules

I will try this on my end. Nevertheless, why do these "wrong" requests lead to zero-size artefacts in the osgeo release repository? Is it a Nexus bug?

comment:8 by jive, 9 months ago

Okay:

a) Cutting out geonetwork-cache has restored service. Downstream caching repositories may need to clear their cache of osgeo release (I could not find a way to clear only the zero-sized files).

b) The repository setup for geonetwork-cache *could* have used routing rules to shortlist only the content contained in the cache. But it was a random collection of patched jars that the community made over time, and I thought it was just a temporary fix while they got their act together. For this I apologize; I should have followed up with that community again.

in reply to: 6; comment:9 by jive, 9 months ago

Frank, I am replying to your comment as I am not sure I understand. There are six components in play and I cannot match your words to each component.

-- upstream --

PROBLEM REPOSITORY

-- osgeo repository --

GEONETWORK-CACHE (cache of the upstream problem repository)

OSGEO-RELEASE

-- downstream --

MAVEN BUILD 1 (against osgeo-release)

DOWNSTREAM CACHING REPOSITORY or MIRROR (cache of osgeo-release above)

MAVEN BUILD 2 (against downstream caching repository or mirror)

Replying to fgdrf:

IMHO this ticket has two aspects:

1) The repository setup itself, that external requests might not be logged on the osgeo instance

I do not understand what you mean by external requests that might not be logged on the osgeo instance.

  • Do you mean how the "downstream caching repository" is set up?
  • Do you mean the setup of "geonetwork-cache" in the osgeo repository?

2) the recommended setup for repository managers that proxy osgeo repositories.

There is no good answer to this as each project has a different constellation of jars and artefacts it requires for health and happiness.

The "downstream caching repository" or "mirror" could choose to use routing rules to pick and choose what to cache from "osgeo release", or, for greater control, could use routing rules against "geonetwork-release", "geoserver-release" and "geotools-release" to tightly control which jars they obtain from where.

The "osgeo-release" repository gathers up lots of sources to optimize "maven build 1". Since Maven checks *each* repository listed in the pom.xml file, it is much faster to point at a single aggregating repository like "osgeo-release", which reduces build times.
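For illustration, a minimal sketch of how "maven build 1" might list its repositories in pom.xml. The `internal` id and the nexus.example.com URL are hypothetical placeholders; the osgeo URL is the release repository path commonly used in GeoTools documentation. When resolving an artifact, Maven tries each listed repository (in addition to Maven Central), which is also how a request for an internal artifact that is missing from `internal` can end up at repo.osgeo.org.

```xml
<!-- pom.xml fragment (sketch): repositories consulted by "maven build 1" -->
<repositories>
  <!-- hypothetical internal repository; replace with your own -->
  <repository>
    <id>internal</id>
    <url>https://nexus.example.com/repository/internal/</url>
  </repository>
  <!-- single aggregating repository instead of listing every upstream source -->
  <repository>
    <id>osgeo</id>
    <name>OSGeo Release Repository</name>
    <url>https://repo.osgeo.org/repository/release/</url>
    <snapshots><enabled>false</enabled></snapshots>
    <releases><enabled>true</enabled></releases>
  </repository>
</repositories>
```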

For "maven build 2", the downstream organization is setting up its own mirror.

I will try this on my end. Nevertheless, why do these "wrong" requests lead to zero-size artefacts in the osgeo release repository? Is it a Nexus bug?

Yes, this appears to be a Nexus bug. The service that caused the issue is an old-style Maven 2 repository: https://raw.githubusercontent.com/geonetwork/core-maven-repo/master

This now returns: 400: Invalid Request

I assume it is a Nexus bug that this response is being stored as a 0-byte artifact.

comment:10 by jive, 9 months ago

I have confirmed that the core-geonetwork builds are broken; they were in fact using some patched jars from the geonetwork-cache repository.

I have instructed the project to recover the artifacts from their version history, or from local repositories, before they are lost, and to upload them to the geonetwork-releases repository.

comment:11 by jive, 9 months ago

Please hold this ticket open until core-geonetwork team is happy again.

comment:12 by juanluisrp, 9 months ago

I've been able to build all the active GeoNetwork branches (3.12.x, 4.0.x, 4.2.x and main) with an empty local maven repository, so I'd say the problem for GN is fixed.

What puzzles me is why this has happened now, given that we removed the contents of https://github.com/geonetwork/core-maven-repo/ two years ago. I don't think GitHub has stopped serving content from there. Maybe the contents were still cached in https://repo.osgeo.org/geonetwork-cache and have somehow been evicted or cleared from there.

Anyway, I think it's safe to delete the geonetwork-cache repository from repo.osgeo.org, since all the dependencies are already in the release repo and there are instructions for legacy GeoNetwork versions to update the configuration in case anybody needs to build an old version.

comment:13 by fgdrf, 9 months ago

Thanks for your help and support. IMHO it's solved once the GeoNetwork builds are fine (again).

However, on my end I configured a routing rule in Nexus to avoid external requests for internal artefacts.

in reply to: 9; comment:14 by fgdrf, 9 months ago

We configured https://repo.osgeo.org/repository/releases as a proxy repository. This was the repo with empty files which should not be there (we haven't deployed these internal artefacts).

We narrowed the problem down: these artefacts were requested internally but did not exist in our hosted repositories, so they were requested from the external proxied repositories as well.

Due to the possible bug in Nexus, files of size 0 were written to the remote repository.

Today I tried requesting a non-existent artefact again, and it did not appear there:

    <dependency>
      <groupId>com.group.id</groupId>
      <artifactId>whatever</artifactId>
      <version>1.0.15</version>
    </dependency>

Nothing there - this is great! https://repo.osgeo.org/#browse/browse:release:com%2Fgroup%2Fid%2Fwhatever

Does this scenario description help to understand the problem we were facing?

Again, thank you for your help and clean-up.

comment:15 by jive, 8 months ago

Resolution: fixed
Status: new → closed

I've been able to build all the active GeoNetwork branches (3.12.x, 4.0.x, 4.2.x and main) with an empty local maven repository, so I'd say the problem for GN is fixed.

Thanks Juan, we will mark this closed.

comment:16 by jive, 8 months ago

I have removed the now unused geonetwork-cache from repo.osgeo.org.

comment:17 by jive, 8 months ago

For organizations encountering this issue and wondering how an external maven repository was filled with your "internal" metadata files ...

  1. This seems to have occurred due to a bug in the Nexus software used by repo.osgeo.org (the zero-sized files). One of the caches we had set up, geonetwork-cache, was incorrectly recording zero-byte files for every request that came in.
  2. I have completely removed geonetwork-cache and, to my knowledge, all traces of these zero-byte files. So your "metadata" is no longer visible in geonetwork-cache or osgeo-release.
  3. Keep in mind that requests for your "internal" metadata.xml files are still coming in. It is just that we now correctly answer that we do not have this information.

If you are concerned about the visibility of your organization's metadata files, this is an ongoing concern arising from the configuration of the Maven or Gradle builds used within your development team.

Although these files are not present within our infrastructure, your developers are constantly making requests for these files to repo.osgeo.org (and to any other maven repository your team is using worldwide).

It is not necessarily a problem to ask public repositories for the artifact groups and artifact names used "internally" by your team. Just keep in mind that such requests will appear in the network traffic and logs of each external repository your team makes use of.

To manage your team's configuration:

  1. You should be running your own Nexus maven repository.
  2. Your team should configure ~/.m2/settings.xml to mirror any external repositories, such as repo.osgeo.org, so that they operate via your mirror.
  3. Your mirror should cache repo.osgeo.org, with rules to only fetch the jars (such as org.geotools.*) that are required to support your operations.
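Step 2 of the list above might look like the following ~/.m2/settings.xml sketch. The mirror id and the nexus.example.com URL are hypothetical placeholders for your own Nexus group repository. With `<mirrorOf>*</mirrorOf>`, Maven sends every repository request, including those that would otherwise go to repo.osgeo.org, to that mirror, where the routing rules from step 3 can decide what is actually fetched upstream.

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 https://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <!-- hypothetical internal Nexus group repository acting as the single mirror -->
    <mirror>
      <id>internal-mirror</id>
      <name>Internal Nexus mirror</name>
      <url>https://nexus.example.com/repository/maven-public/</url>
      <!-- "*" redirects requests for every repository (repo.osgeo.org, Maven Central, ...) here -->
      <mirrorOf>*</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```

Routing everything through one mirror keeps developer machines from contacting external repositories directly, so requests for internal group ids only leave your network if your mirror's rules allow them to.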

I also note that Gradle allows fine-grained control over how each repository is used, with include/exclude content filters.

I personally use Maven, which does not offer such a facility as part of a default install; instead, I use mirrors as described above.

Aside: I am volunteering to look after repo.osgeo.org on behalf of my employer GeoCat BV and our customers. We take part in a number of projects including GeoServer and GeoNetwork. If you need further assistance, please reach out on these tickets.

comment:18 by robe, 8 months ago

Just a heads up. I plan to upgrade repo.osgeo.org to the latest Nexus version in about 2 weeks. I'm wondering whether such a change would help, hurt, or make no difference with this kind of issue.

in reply to: 18; comment:19 by jive, 8 months ago

Replying to robe:

Just a heads up. I have plans to upgrade repo.osgeo.org to latest nexus version in about 2 weeks. I'm wondering if such a change would help or hurt or not make a difference with this kind of issue.

The update should be fine. We have not reported this, or even checked the Nexus bug tracker to see whether it is a known issue.
