feat: add webhook_url parameter to crawler endpoint #71
Conversation
Add support for webhook notifications when crawl jobs complete. This allows users to receive POST notifications at a specified URL when their crawl job finishes processing.

Changes:
- Python SDK: Added `webhook_url` to the `CrawlRequest` model with validation
- Python SDK: Updated the sync and async client `crawl` methods
- JavaScript SDK: Added a `webhookUrl` option to the crawl function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
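As a quick illustration of the new parameter, here is a minimal usage sketch for the Python sync client. The `Client` import follows the repo layout, but the exact `crawl()` signature beyond `webhook_url` is an assumption:

```python
from scrapegraph_py import Client  # import path assumed from the repo layout

client = Client(api_key="sgai-...")  # placeholder key

# Start a crawl job; when it finishes, the service POSTs the result
# to the webhook_url provided here.
response = client.crawl(
    url="https://example.com",  # other crawl() arguments omitted/assumed
    webhook_url="https://my-app.example.com/hooks/crawl-done",
)
```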
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Pull request overview
Adds support for providing a webhook URL so users can receive POST notifications when crawl jobs complete.
Changes:
- Python SDK: Extend `CrawlRequest` with `webhook_url` and add validation.
- Python SDK: Add a `webhook_url` parameter to the sync/async `crawl()` methods and include it in request payloads.
- JavaScript SDK: Add a `webhookUrl` option and map it to `webhook_url` in the crawl payload.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scrapegraph-py/scrapegraph_py/models/crawl.py | Adds webhook_url field to CrawlRequest plus URL-format validation. |
| scrapegraph-py/scrapegraph_py/client.py | Adds webhook_url to sync client crawl() signature, logging, and request JSON. |
| scrapegraph-py/scrapegraph_py/async_client.py | Adds webhook_url to async client crawl() signature and request JSON. |
| scrapegraph-js/src/crawl.js | Adds webhookUrl option and sends it as webhook_url in the request payload. |
From `scrapegraph-py/scrapegraph_py/models/crawl.py`:

```python
@model_validator(mode="after")
def validate_webhook_url(self) -> "CrawlRequest":
    """Validate webhook URL format if provided"""
    if self.webhook_url is not None:
        if not self.webhook_url.strip():
            raise ValueError("Webhook URL cannot be empty")
        if not (
            self.webhook_url.startswith("http://")
            or self.webhook_url.startswith("https://")
        ):
            raise ValueError(
                "Invalid webhook URL - must start with http:// or https://"
            )
    return self
```
Copilot AI · Jan 22, 2026
Missing test coverage for the newly added webhook_url behavior (accepts valid http(s) URLs, rejects empty/whitespace, rejects non-http(s) schemes, and is included/excluded correctly in model_dump(exclude_none=True)). There are already CrawlRequest validation/serialization tests (e.g., scrapegraph-py/tests/test_crawl_path_filtering.py) that should be extended to cover this field.
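A sketch of the tests this comment asks for, assuming `CrawlRequest` is a Pydantic v2 model importable as shown and that `url` is its only other required field (the real required fields may differ):

```python
import pytest
from scrapegraph_py.models.crawl import CrawlRequest

# Hypothetical minimal constructor arguments; adjust to the model's real required fields.
BASE_KWARGS = {"url": "https://example.com"}

def test_webhook_url_accepts_http_and_https():
    for scheme in ("http://", "https://"):
        req = CrawlRequest(**BASE_KWARGS, webhook_url=f"{scheme}hooks.example.com/done")
        assert req.webhook_url.startswith(scheme)

@pytest.mark.parametrize("bad", ["", "   ", "ftp://hooks.example.com", "not-a-url"])
def test_webhook_url_rejects_invalid_values(bad):
    # Pydantic wraps the validator's ValueError in a ValidationError,
    # which is itself a ValueError subclass.
    with pytest.raises(ValueError):
        CrawlRequest(**BASE_KWARGS, webhook_url=bad)

def test_webhook_url_excluded_from_payload_when_none():
    req = CrawlRequest(**BASE_KWARGS)
    assert "webhook_url" not in req.model_dump(exclude_none=True)
```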
| request_data["exclude_paths"] = exclude_paths | ||
| if webhook_url is not None: | ||
| request_data["webhook_url"] = webhook_url |
Copilot AI · Jan 22, 2026
The client now adds webhook_url to request_data, but there are no assertions in the existing crawl request-body tests to ensure this field is actually sent when provided (and omitted when None). Please extend the existing crawl tests that inspect request JSON (e.g., in scrapegraph-py/tests/test_crawl_polling.py) to cover webhook_url.
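One way such a test could look, as a sketch only: it assumes the sync client sends the body through `requests.Session.post(json=request_data)` and that `crawl()` needs no arguments beyond those shown. Both are assumptions about internals, so adapt the patch target to however the existing tests in `test_crawl_polling.py` intercept HTTP:

```python
from unittest.mock import MagicMock, patch

from scrapegraph_py import Client  # assumed import path

def test_crawl_request_body_includes_webhook_url():
    fake_response = MagicMock()
    fake_response.status_code = 200
    fake_response.json.return_value = {"id": "job-123"}  # hypothetical response shape

    # Assumption: the client posts via requests.Session.post with json=request_data.
    with patch("requests.Session.post", return_value=fake_response) as mock_post:
        client = Client(api_key="sgai-test")
        client.crawl(
            url="https://example.com",
            webhook_url="https://hooks.example.com/done",
        )

    sent_json = mock_post.call_args.kwargs["json"]
    assert sent_json["webhook_url"] == "https://hooks.example.com/done"
```

A mirror-image test that calls `crawl()` without `webhook_url` and asserts the key is absent from `sent_json` would cover the omitted-when-None half.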
| request_data["include_paths"] = include_paths | ||
| if exclude_paths is not None: | ||
| request_data["exclude_paths"] = exclude_paths | ||
| if webhook_url is not None: | ||
| request_data["webhook_url"] = webhook_url |
Copilot AI · Jan 22, 2026
Async crawl currently supports webhook_url, but the async client tests don’t verify the outgoing request payload includes webhook_url when set. Consider adding an async test that asserts the POST body contains webhook_url (and omits it when None) to prevent regressions.
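An async counterpart, again as a sketch: it assumes the async client issues the request via `aiohttp.ClientSession.post(json=...)` used as an async context manager, that the class is importable as `AsyncClient`, and that `pytest-asyncio` is installed; all of those names are assumptions:

```python
import pytest
from unittest.mock import AsyncMock, MagicMock, patch

from scrapegraph_py import AsyncClient  # assumed import path / class name

@pytest.mark.asyncio
async def test_async_crawl_sends_webhook_url():
    # Fake aiohttp-style response: post() returns an async context manager
    # whose json() coroutine yields a minimal (hypothetical) body.
    fake_response = MagicMock()
    fake_response.status = 200
    fake_response.json = AsyncMock(return_value={"id": "job-123"})

    fake_cm = MagicMock()  # MagicMock preconfigures __aenter__/__aexit__ on Python 3.8+
    fake_cm.__aenter__.return_value = fake_response

    with patch("aiohttp.ClientSession.post", return_value=fake_cm) as mock_post:
        client = AsyncClient(api_key="sgai-test")
        await client.crawl(
            url="https://example.com",
            webhook_url="https://hooks.example.com/done",
        )

    assert mock_post.call_args.kwargs["json"]["webhook_url"] == "https://hooks.example.com/done"
```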
From `scrapegraph-js/src/crawl.js`:

```js
if (webhookUrl) {
  payload.webhook_url = webhookUrl;
}
```
Copilot AI · Jan 22, 2026
`if (webhookUrl)` silently ignores explicitly provided empty-string values and also lets whitespace-only strings through (both are likely invalid URLs). Prefer treating "provided" as `webhookUrl != null`, validating that it is a non-empty string after trimming and starts with `http://` or `https://`, and then setting `payload.webhook_url` (or throwing a clear error) to avoid surprising no-op behavior.
Suggested change:

```diff
-if (webhookUrl) {
-  payload.webhook_url = webhookUrl;
+if (webhookUrl != null) {
+  if (typeof webhookUrl !== 'string') {
+    throw new Error('webhookUrl must be a string starting with "http://" or "https://".');
+  }
+  const trimmedWebhookUrl = webhookUrl.trim();
+  if (!trimmedWebhookUrl) {
+    throw new Error('webhookUrl must be a non-empty string.');
+  }
+  if (!trimmedWebhookUrl.startsWith('http://') && !trimmedWebhookUrl.startsWith('https://')) {
+    throw new Error('webhookUrl must start with "http://" or "https://".');
+  }
+  payload.webhook_url = trimmedWebhookUrl;
 }
```
🎉 This PR is included in version 1.45.0 🎉 The release is available on:
Your semantic-release bot 📦🚀