Skip to content

Conversation

@VinciGit00
Copy link
Contributor

Add support for webhook notifications when crawl jobs complete. This allows users to receive POST notifications at a specified URL when their crawl job finishes processing.

Changes:

  • Python SDK: Added webhook_url to CrawlRequest model with validation
  • Python SDK: Updated sync and async client crawl methods
  • JavaScript SDK: Added webhookUrl option to crawl function

Add support for webhook notifications when crawl jobs complete.
This allows users to receive POST notifications at a specified URL
when their crawl job finishes processing.

Changes:
- Python SDK: Added webhook_url to CrawlRequest model with validation
- Python SDK: Updated sync and async client crawl methods
- JavaScript SDK: Added webhookUrl option to crawl function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Copy link

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for providing a webhook URL so users can receive POST notifications when crawl jobs complete.

Changes:

  • Python SDK: Extend CrawlRequest with webhook_url and add validation.
  • Python SDK: Add webhook_url parameter to sync/async crawl() methods and include it in request payloads.
  • JavaScript SDK: Add webhookUrl option and map it to webhook_url in the crawl payload.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
scrapegraph-py/scrapegraph_py/models/crawl.py Adds webhook_url field to CrawlRequest plus URL-format validation.
scrapegraph-py/scrapegraph_py/client.py Adds webhook_url to sync client crawl() signature, logging, and request JSON.
scrapegraph-py/scrapegraph_py/async_client.py Adds webhook_url to async client crawl() signature and request JSON.
scrapegraph-js/src/crawl.js Adds webhookUrl option and sends it as webhook_url in the request payload.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +178 to +191
@model_validator(mode="after")
def validate_webhook_url(self) -> "CrawlRequest":
"""Validate webhook URL format if provided"""
if self.webhook_url is not None:
if not self.webhook_url.strip():
raise ValueError("Webhook URL cannot be empty")
if not (
self.webhook_url.startswith("http://")
or self.webhook_url.startswith("https://")
):
raise ValueError(
"Invalid webhook URL - must start with http:// or https://"
)
return self
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for the newly added webhook_url behavior (accepts valid http(s) URLs, rejects empty/whitespace, rejects non-http(s) schemes, and is included/excluded correctly in model_dump(exclude_none=True)). There are already CrawlRequest validation/serialization tests (e.g., scrapegraph-py/tests/test_crawl_path_filtering.py) that should be extended to cover this field.

Copilot uses AI. Check for mistakes.
Comment on lines 961 to +963
request_data["exclude_paths"] = exclude_paths
if webhook_url is not None:
request_data["webhook_url"] = webhook_url
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client now adds webhook_url to request_data, but there are no assertions in the existing crawl request-body tests to ensure this field is actually sent when provided (and omitted when None). Please extend the existing crawl tests that inspect request JSON (e.g., in scrapegraph-py/tests/test_crawl_polling.py) to cover webhook_url.

Copilot uses AI. Check for mistakes.
Comment on lines 949 to +953
request_data["include_paths"] = include_paths
if exclude_paths is not None:
request_data["exclude_paths"] = exclude_paths
if webhook_url is not None:
request_data["webhook_url"] = webhook_url
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async crawl currently supports webhook_url, but the async client tests don’t verify the outgoing request payload includes webhook_url when set. Consider adding an async test that asserts the POST body contains webhook_url (and omits it when None) to prevent regressions.

Copilot uses AI. Check for mistakes.
Comment on lines +102 to +103
if (webhookUrl) {
payload.webhook_url = webhookUrl;
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (webhookUrl) silently ignores explicitly provided empty-string values and also allows whitespace-only strings through (both are likely invalid URLs). Prefer treating “provided” as webhookUrl != null, validating it’s a non-empty string after trim and starts with http:// or https://, and then setting payload.webhook_url (or throwing a clear error) to avoid surprising no-op behavior.

Suggested change
if (webhookUrl) {
payload.webhook_url = webhookUrl;
if (webhookUrl != null) {
if (typeof webhookUrl !== 'string') {
throw new Error('webhookUrl must be a string starting with "http://" or "https://".');
}
const trimmedWebhookUrl = webhookUrl.trim();
if (!trimmedWebhookUrl) {
throw new Error('webhookUrl must be a non-empty string.');
}
if (!trimmedWebhookUrl.startsWith('http://') && !trimmedWebhookUrl.startsWith('https://')) {
throw new Error('webhookUrl must start with "http://" or "https://".');
}
payload.webhook_url = trimmedWebhookUrl;

Copilot uses AI. Check for mistakes.
@VinciGit00 VinciGit00 merged commit 912a1e8 into main Jan 23, 2026
12 of 16 checks passed
@VinciGit00 VinciGit00 deleted the feat/add-webhook-url-to-crawl branch January 23, 2026 07:09
@github-actions
Copy link

🎉 This PR is included in version 1.45.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants