Problem
Run: tenant-manager · Build Pipeline · 2.2.0-beta.6
This is a two-attempt failure chain — each attempt has a distinct root cause.
Attempt #1 — Cosign signing failed (Rekor 404)
Job: build / Build tenant-manager (attempt 1)
Build and push to both DockerHub and GHCR succeeded. Cosign signing then failed on all 3 retry attempts with:
error during command execution: signing [docker.io/lerianstudio/tenant-manager@sha256:49758a04...]:
signing digest: [GET /api/v1/log/entries/{entryUUID}][404] getLogEntryByUuidNotFound
Root cause: transient Rekor (Sigstore public transparency log) 404 getLogEntryByUuidNotFound. The cosign client successfully retrieved the SCT (Signed Certificate Timestamp) but then failed to confirm the entry in Rekor, indicating a momentary inconsistency in the public log service.
There is also a secondary template error at the end of this step:
The template is not valid. ...build.yml@v1.31.0 (Line: 372, Col: 28): Unexpected value ''
This suggests the continue-on-error expression or a similar field on line 372 evaluates to an empty string under some conditions, which is itself a bug.
Attempt #2 — Docker push denied (tag immutability)
Job: build / Build tenant-manager (attempt 2)
The pipeline was re-run to recover from the cosign failure. The full Docker build ran again (~19 s), then:
ERROR: failed to solve: failed to push lerianstudio/tenant-manager:2.2.0-beta.6:
denied: requested access to the resource is denied — tag 2.2.0-beta.6 is already
assigned to an image in this repository and cannot be updated due to immutability settings.
Root cause: DockerHub has tag immutability enabled. The tag was already published in attempt #1; the re-run had no way to detect this before spending time on a full rebuild.
Proposed Fixes
Fix 1 — Resilience to transient Rekor failures (attempt #1 root cause)
The 3-attempt retry with exponential backoff exists but is insufficient for Rekor intermittency, which can last several minutes. Options:
- Increase cosign_max_attempts default from 3 to a higher value (e.g. 5) and increase the backoff ceiling.
- Add jitter to the retry delay to avoid a thundering herd if multiple jobs hit Rekor simultaneously.
- Fix the template error on line 372: the Unexpected value '' error means a field receives an empty string where a boolean or defined value is expected. This should be investigated and fixed, since it may also silently mask error propagation in other scenarios.
- Consider honouring continue_gitops_on_signing_failure more broadly: if Rekor is down, the image is still valid and signed certificates were issued; only the transparency log entry retrieval failed. Blocking the entire pipeline (and forcing a re-run that will fail for a different reason) is a disproportionate response to a Sigstore outage.
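The first two options could be combined roughly as follows. This is only a sketch under assumptions: it presumes the signing step runs in bash, and retry_with_backoff and its three numeric parameters are hypothetical names, not existing inputs of build.yml. It retries a command with a capped exponential delay and full jitter (a random sleep between 0 and the current ceiling).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of build.yml): retry a command with capped
# exponential backoff and full jitter.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_S MAX_DELAY_S command [args...]
retry_with_backoff() {
  local max_attempts="$1" base_delay="$2" max_delay="$3"
  shift 3
  local attempt=1 ceiling delay
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "::error::Command failed after $attempt attempts: $*" >&2
      return 1
    fi
    # Exponential ceiling: base * 2^(attempt-1), capped at max_delay.
    ceiling=$(( base_delay * (1 << (attempt - 1)) ))
    if [ "$ceiling" -gt "$max_delay" ]; then
      ceiling="$max_delay"
    fi
    # Full jitter: sleep a random whole number of seconds in [0, ceiling].
    delay=$(( RANDOM % (ceiling + 1) ))
    echo "Attempt $attempt failed; retrying in ${delay}s (ceiling ${ceiling}s)" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}

# Example call (IMAGE and DIGEST are placeholders):
#   retry_with_backoff 5 10 120 cosign sign --yes "$IMAGE@$DIGEST"
```

With 5 attempts, a 10 s base and a 120 s cap, the ceilings between attempts are 10, 20, 40 and 80 s, so the worst-case total wait is about two and a half minutes. Whether that is long enough for a Rekor inconsistency window is a judgment call.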
Fix 2 — Pre-flight tag existence check (attempt #2 root cause)
Before starting the Docker build, check whether the target tag already exists in each enabled registry. If it does, either skip the build (idempotent re-run behaviour) or fail fast with a clear, early error — not after a full build.
Suggested new input:
on_existing_tag: 'fail' | 'skip' | 'warn' # default: 'fail'
Implementation sketch (DockerHub):
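# Assumption: the original commands were lost in extraction; this reconstruction uses the public Docker Hub API.
# DOCKERHUB_USER and DOCKERHUB_TOKEN are placeholder variable names.
TOKEN=$(curl -s -X POST https://hub.docker.com/v2/users/login \
  -H "Content-Type: application/json" \
  -d "{\"username\":\"$DOCKERHUB_USER\",\"password\":\"$DOCKERHUB_TOKEN\"}" | jq -r .token)
STATUS=$(curl -s -o /dev/null -w "%{http_code}" -H "Authorization: Bearer $TOKEN" \
  "https://hub.docker.com/v2/repositories/$ORG/$IMAGE/tags/$TAG")
if [ "$STATUS" = "200" ]; then
  echo "::warning::Tag $TAG already exists; skipping (immutable registry)."
  exit 0
fi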
For GHCR: docker manifest inspect ghcr.io/$ORG/$IMAGE:$TAG.
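To illustrate how the proposed on_existing_tag input might drive the pre-flight result, here is a small bash sketch. The function names tag_exists_ghcr and handle_existing_tag are hypothetical. The meaning given to warn (log a warning and build anyway) is an assumption, since the proposal does not define it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map the proposed on_existing_tag input to an action.

# Returns 0 when the manifest for ghcr.io/ORG/IMAGE:TAG can be fetched.
# Assumes the runner is already logged in to GHCR.
tag_exists_ghcr() {
  docker manifest inspect "ghcr.io/$1/$2:$3" >/dev/null 2>&1
}

# Prints the action the build job should take: build, skip, or fail.
# $1 = on_existing_tag mode (fail | skip | warn), $2 = "true" if the tag exists.
handle_existing_tag() {
  local mode="$1" exists="$2"
  if [ "$exists" != "true" ]; then
    echo "build"
    return 0
  fi
  case "$mode" in
    skip) echo "::notice::Tag already exists; skipping build." >&2
          echo "skip" ;;
    warn) echo "::warning::Tag already exists; building anyway, push may be rejected." >&2
          echo "build" ;;
    fail) echo "::error::Tag already exists; refusing to rebuild." >&2
          echo "fail" ;;
    *)    echo "::error::Unknown on_existing_tag value: $mode" >&2
          echo "fail" ;;
  esac
}

# Example wiring (ORG, IMAGE, TAG and ON_EXISTING_TAG are placeholders):
#   if tag_exists_ghcr "$ORG" "$IMAGE" "$TAG"; then exists=true; else exists=false; fi
#   action=$(handle_existing_tag "$ON_EXISTING_TAG" "$exists")
```

Note that docker manifest inspect also fails on authentication or network errors, so a non-zero exit does not strictly prove the tag is absent. A real implementation would probably need to tell those cases apart.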
Fix 3 — Cleanup pushed images on downstream failure (suggestion)
If the build+push succeeds but a later step fails (cosign, GitOps artifact upload, Helm dispatch), the image is left in the registry unsigned and without a GitOps record. No rollback exists today.
Suggestion: an optional cleanup job/step that runs under if: failure(), gated by a new input:
cleanup_on_failure: true | false # default: false
- DockerHub: DELETE /v2/repositories/{namespace}/{repository}/tags/{tag} (requires delete-scope token).
- GHCR: gh api -X DELETE /orgs/{org}/packages/container/{package}/versions/{version_id}.
- Target only tags published in the current run.
- If the registry has immutability and deletion is not possible, emit a warning with the digest and manual remediation steps.
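For the GHCR case, the version_id has to be looked up before the delete call. The bash sketch below shows one way this might be done with gh api. cleanup_ghcr_tag is a hypothetical name, and the sketch assumes an authenticated gh CLI with permission to delete packages, and a package name that needs no URL encoding.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup sketch for GHCR (not part of build.yml).
# Looks up the package version that carries TAG, then deletes only that version.
# Usage: cleanup_ghcr_tag ORG PACKAGE TAG
cleanup_ghcr_tag() {
  local org="$1" package="$2" tag="$3" version_id
  # List versions and keep the id of the one whose tag list contains TAG.
  version_id=$(gh api --paginate "/orgs/$org/packages/container/$package/versions" \
    --jq ".[] | select(.metadata.container.tags | index(\"$tag\")) | .id") || version_id=""
  if [ -z "$version_id" ]; then
    echo "::warning::No GHCR version found for tag $tag; nothing to delete." >&2
    return 0
  fi
  if ! gh api -X DELETE "/orgs/$org/packages/container/$package/versions/$version_id"; then
    echo "::warning::Could not delete $package:$tag (version $version_id); remove it manually." >&2
    return 1
  fi
}
```

Deleting a version removes the whole digest, including any other tags that point at it. A real cleanup step should therefore confirm that the version carries only tags published in the current run before deleting it.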
Checklist
- Fix Unexpected value '' on line 372 of build.yml
- Increase cosign_max_attempts default and retry backoff ceiling
- Add on_existing_tag input (fail/skip/warn) with pre-flight check for DockerHub and GHCR
- Add cleanup_on_failure input with registry cleanup on downstream step failure