-
Notifications
You must be signed in to change notification settings - Fork 11
feat(worker): centralize provider service error handling with retry for rate limits #649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(worker): centralize provider service error handling with retry for rate limits #649
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #649 +/- ##
==========================================
- Coverage 93.35% 93.35% -0.01%
==========================================
Files 1292 1292
Lines 47128 47188 +60
Branches 1567 1567
==========================================
+ Hits 43996 44052 +56
- Misses 2823 2827 +4
Partials 309 309
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
a76563b to
4834d5d
Compare
…or rate limits Add get_provider_service() to BaseCodecovTask for unified handling of GitHub app availability errors. This replaces duplicated error handling across tasks with a single wrapper that: - Retries when GitHub apps are rate-limited (using API retry-after or next hour) - Returns None gracefully when apps are suspended/unavailable - Handles OwnerWithoutValidBotError and RepositoryWithoutValidBotError uniformly - Includes proper logging context and breadcrumb tracking Updates sync_repos, sync_teams, commit_update, preprocess_upload, and upload_finisher to use the new centralized method. Fixes: - WORKER-TRS (~391k events) - NoConfiguredAppsAvailable in UploadFinisher - WORKER-P4H (~375k events) - OwnerWithoutValidBotError in SyncRepos - WORKER-NVD (~124k events) - NoConfiguredAppsAvailable in commit_update - WORKER-NVE (~119k events) - NoConfiguredAppsAvailable in preprocess_upload
3f4ef3c to
ae83fb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
…ception The exception handler in base.py accesses exp.earliest_retry_after_seconds but this attribute was never defined on the exception class. This caused an AttributeError at runtime when GitHub apps were rate-limited. Adds the optional parameter with a default of None, which allows the existing fallback logic (get_seconds_to_next_hour()) to work correctly.
The get_provider_service method uses commit.repoid when a commit is provided, so tests need to set mock_commit.repoid to match the expected values in breadcrumb task assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| if exp.rate_limited_count > 0: | ||
| # At least one GitHub app is available but rate-limited. Retry after | ||
| # waiting until the next hour (minimum 1 minute delay). | ||
| retry_delay_seconds = max(60, get_seconds_to_next_hour()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rate limit retry ignores API-provided retry-after time
Medium Severity
The duplicated NoConfiguredAppsAvailable exception handling in commit_update.py, preprocess_upload.py, and upload_finisher.py calculates retry delay using only get_seconds_to_next_hour(), ignoring the earliest_retry_after_seconds field from the exception. The centralized get_provider_service method in base.py correctly checks this field first. This inconsistency means tasks may retry too early (wasting API calls and hitting rate limits again) or unnecessarily late, depending on what the API recommends.
Summary
Adds
get_provider_service()toBaseCodecovTaskfor unified handling of GitHub app availability errors. This replaces duplicated error handling across tasks with a single wrapper that:retry-afteror next hour)Nonegracefully when apps are suspended/unavailableOwnerWithoutValidBotErrorandRepositoryWithoutValidBotErroruniformlyUpdates
sync_repos,sync_teams,commit_update,preprocess_upload, andupload_finisherto use the new centralized method.Fixes
Behavior
When
NoConfiguredAppsAvailableis raised:rate_limited_count > 0: At least one GitHub app exists but is rate-limited. The task retries usingearliest_retry_after_secondsfrom the API response, or falls back toget_seconds_to_next_hour()(minimum 60 seconds).rate_limited_count == 0: No apps are configured or all are suspended. ReturnsNonegracefully.Why This Matters
These exceptions are expected states for some customers:
Previously, these bubbled up as unhandled exceptions. Now they:
Total Impact
This PR eliminates approximately ~1M Sentry events/month from expected bot configuration states and adds automatic retry for rate-limited scenarios.
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.
Note
Unifies provider-service fetching and error handling across worker tasks with rate-limit-aware retries and richer logging/breadcrumbs.
BaseCodecovTask.get_provider_service()to centralize handling ofNoConfiguredAppsAvailable,OwnerWithoutValidBotError,RepositoryWithoutValidBotError,Torngit*errors, including breadcrumb emission and contextual loggingsync_reposandsync_teamsto use the centralized helper; gracefully no-op when no provider service is availablecommit_update,preprocess_upload, andupload_finisherto retry when apps are rate-limited and return errors when apps are suspended/unavailable; adds corresponding breadcrumbsNoConfiguredAppsAvailablewithearliest_retry_after_seconds; retry delay uses this value or falls back toget_seconds_to_next_hour()(min 60s)Written by Cursor Bugbot for commit f3dcdea. This will update automatically on new commits. Configure here.