@patchback patchback bot commented Jan 22, 2026

This is a backport of PR #1073 as merged into main (42d0913).

Ok, the last fix was correct for duplicate artifacts across domains, but it didn't solve duplicate metadata artifacts within a domain. At first this seems impossible, but there is a common scenario where it can occur. A user uploads package a.1.whl with metadata xyz. They realize the package is missing some files and rebuild it with the new files, keeping the exact same name and, crucially, the exact same metadata. They re-upload, and Pulp creates a new package because the entire package has a new sha256, even though the metadata is the same as the old one. Then in our migration we encounter two "different" packages with the same metadata artifact inside them.
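To make the scenario concrete, here is a minimal standalone sketch (not the plugin's code) that builds two tiny wheel-like zip archives in memory. The helper names `build_wheel` and `metadata_sha256`, the archive paths, and the file contents are all made up for illustration. It shows that the rebuilt archive gets a new sha256 while its METADATA file hashes to the same value as before.

```python
import hashlib
import io
import zipfile

# Hypothetical metadata content shared by both builds.
METADATA = b"Metadata-Version: 2.1\nName: a\nVersion: 1\n"
METADATA_PATH = "a-1.dist-info/METADATA"


def build_wheel(extra_files):
    """Build a tiny in-memory, wheel-like zip archive (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(METADATA_PATH, METADATA)
        for name, data in extra_files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def metadata_sha256(wheel_bytes):
    """Hash only the METADATA member of the archive."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        return hashlib.sha256(zf.read(METADATA_PATH)).hexdigest()


# First upload is missing a file; the rebuild adds it but keeps the metadata.
first = build_wheel({"a/__init__.py": b""})
rebuilt = build_wheel({"a/__init__.py": b"", "a/data.txt": b"missing file"})

package_hashes_differ = (
    hashlib.sha256(first).hexdigest() != hashlib.sha256(rebuilt).hexdigest()
)
metadata_hashes_match = metadata_sha256(first) == metadata_sha256(rebuilt)
```

Both flags end up true: the two archives count as different packages, yet they would produce the same metadata artifact.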

My changes try to fix this by keeping track of the metadata artifacts' sha256s and avoiding making duplicates. Since we do the saves in batches, I first have to check within the batch to make sure there are no duplicates, and then do a second check to make sure there are no duplicates from previous batches. Also, I'm grouping the packages by domain, so each batch should fall within a single domain.
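The two checks described above can be sketched roughly as follows. This is a simplified illustration in plain Python, not the actual migration: the function names `batched` and `plan_metadata_artifacts`, the tuple layout, and the batch size are assumptions, and the real code works on database rows rather than in-memory tuples.

```python
from collections import defaultdict
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def plan_metadata_artifacts(packages, batch_size=2):
    """Return the (domain, sha256) metadata artifacts to create, once each.

    `packages` is an iterable of (domain, package_name, metadata_sha256)
    tuples (a hypothetical layout chosen for this sketch).
    """
    # Group by domain so that every batch stays inside a single domain.
    by_domain = defaultdict(list)
    for domain, name, sha in packages:
        by_domain[domain].append((name, sha))

    created = []
    for domain, pkgs in by_domain.items():
        seen_previous = set()  # sha256s saved by earlier batches in this domain
        for batch in batched(pkgs, batch_size):
            seen_in_batch = set()
            new_in_batch = []
            for _name, sha in batch:
                # Check 1: duplicate inside the current batch.
                if sha in seen_in_batch:
                    continue
                # Check 2: duplicate of something saved by a previous batch.
                if sha in seen_previous:
                    continue
                seen_in_batch.add(sha)
                new_in_batch.append((domain, sha))
            # Stand-in for the batched save.
            created.extend(new_in_batch)
            seen_previous |= seen_in_batch
    return created


packages = [
    ("d1", "a.1.whl (first upload)", "xyz"),
    ("d1", "b.1.whl", "abc"),
    ("d1", "a.1.whl (rebuilt)", "xyz"),
    ("d2", "a.1.whl", "xyz"),
]
plan = plan_metadata_artifacts(packages, batch_size=2)
```

With a batch size of 2, the rebuilt package lands in a second batch and is caught by the previous-batch check. With a larger batch size it is caught by the within-batch check instead. The same sha256 in domain `d2` is still created, since the tracking is per domain.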

Hopefully I didn't screw up the logic anywhere.

fixes: #1071

Change migration 19 to reset metadata_sha256 to null

(cherry picked from commit 42d0913)
@jobselko jobselko merged commit 54149a1 into 3.24 Jan 22, 2026
13 checks passed
@jobselko jobselko deleted the patchback/backports/3.24/42d091347aa437f06a1f43e2e819128110543a33/pr-1073 branch January 22, 2026 16:53