Logs: incorrect number of pages to crawl #1682

@andrebastosdias

Logs in crawlee 1.0.0

[BeautifulSoupCrawler] INFO  Crawled 0/1 pages, 0 failed requests, desired concurrency 10.
[BeautifulSoupCrawler] INFO  Current request statistics:
┌───────────────────────────────┬────────┐
│ requests_finished             │ 0      │
│ requests_failed               │ 0      │
│ retry_histogram               │ [0]    │
│ request_avg_failed_duration   │ None   │
│ request_avg_finished_duration │ None   │
│ requests_finished_per_minute  │ 0      │
│ requests_failed_per_minute    │ 0      │
│ request_total_duration        │ 0s     │
│ requests_total                │ 0      │
│ crawler_runtime               │ 42.7ms │
└───────────────────────────────┴────────┘
[crawlee._autoscaling.autoscaled_pool] INFO  current_concurrency = 0; desired_concurrency = 10; cpu = 0; mem = 0; event_loop = 0.0; client_info = 0.0
[BeautifulSoupCrawler] INFO  Crawled 22/70 pages, 0 failed requests, desired concurrency 10.
[BeautifulSoupCrawler] INFO  Crawled 61/70 pages, 0 failed requests, desired concurrency 11.
[crawlee._autoscaling.autoscaled_pool] INFO  Waiting for remaining tasks to finish
[BeautifulSoupCrawler] INFO  Final request statistics:
┌───────────────────────────────┬────────────┐
│ requests_finished             │ 70         │
│ requests_failed               │ 0          │
│ retry_histogram               │ [70]       │
│ request_avg_failed_duration   │ None       │
│ request_avg_finished_duration │ 2.82s      │
│ requests_finished_per_minute  │ 196        │
│ requests_failed_per_minute    │ 0          │
│ request_total_duration        │ 3min 17.4s │
│ requests_total                │ 70         │
│ crawler_runtime               │ 21.38s     │
└───────────────────────────────┴────────────┘

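Same code in crawlee 1.2.0 and 1.3.0 (the progress lines report 318 and then 387 pages to crawl, although the final statistics still show 70 requests in total):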

[BeautifulSoupCrawler] INFO  Current request statistics:
┌───────────────────────────────┬──────┐
│ requests_finished             │ 0    │
│ requests_failed               │ 0    │
│ retry_histogram               │ [0]  │
│ request_avg_failed_duration   │ None │
│ request_avg_finished_duration │ None │
│ requests_finished_per_minute  │ 0    │
│ requests_failed_per_minute    │ 0    │
│ request_total_duration        │ 0s   │
│ requests_total                │ 0    │
│ crawler_runtime               │ 0s   │
└───────────────────────────────┴──────┘
[BeautifulSoupCrawler] INFO  Crawled 0/318 pages, 0 failed requests, desired concurrency 10.
[crawlee._autoscaling.autoscaled_pool] INFO  current_concurrency = 0; desired_concurrency = 10; cpu = 0; mem = 0; event_loop = 0.0; client_info = 0.0
[BeautifulSoupCrawler] INFO  Crawled 25/387 pages, 0 failed requests, desired concurrency 10.
[BeautifulSoupCrawler] INFO  Crawled 55/387 pages, 0 failed requests, desired concurrency 11.
[crawlee._autoscaling.autoscaled_pool] INFO  Waiting for remaining tasks to finish
[BeautifulSoupCrawler] INFO  Final request statistics:
┌───────────────────────────────┬────────────┐
│ requests_finished             │ 70         │
│ requests_failed               │ 0          │
│ retry_histogram               │ [70]       │
│ request_avg_failed_duration   │ None       │
│ request_avg_finished_duration │ 3.37s      │
│ requests_finished_per_minute  │ 178        │
│ requests_failed_per_minute    │ 0          │
│ request_total_duration        │ 3min 55.7s │
│ requests_total                │ 70         │
│ crawler_runtime               │ 23.55s     │
└───────────────────────────────┴────────────┘
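
The mismatch can also be checked mechanically. The script below is only a rough sketch and not part of crawlee: the helper names (`reported_totals`, `final_requests_total`, `mismatched_totals`) are made up for illustration. It assumes the log format shown above, pulls the `Y` out of every `Crawled X/Y pages` line, and flags any value larger than the final `requests_total`. Totals smaller than the final one are not flagged, since the queue can legitimately grow during the crawl.

```python
import re
from typing import List, Optional

# Matches progress lines such as "Crawled 25/387 pages, ..."
PROGRESS_RE = re.compile(r"Crawled (\d+)/(\d+) pages")
# Matches a statistics row such as "│ requests_total │ 70 │"
TOTAL_RE = re.compile(r"requests_total\s*│?\s*(\d+)")


def reported_totals(log_text: str) -> List[int]:
    """Return the Y from every 'Crawled X/Y pages' progress line."""
    return [int(m.group(2)) for m in PROGRESS_RE.finditer(log_text)]


def final_requests_total(log_text: str) -> Optional[int]:
    """Return the last requests_total value in the log, if there is one."""
    matches = TOTAL_RE.findall(log_text)
    return int(matches[-1]) if matches else None


def mismatched_totals(log_text: str) -> List[int]:
    """Return the reported totals that exceed the final requests_total."""
    final = final_requests_total(log_text)
    if final is None:
        return []
    return [total for total in reported_totals(log_text) if total > final]


# Excerpt of the 1.2.0 / 1.3.0 log from this issue.
LOG_1_2_0 = """\
[BeautifulSoupCrawler] INFO  Crawled 0/318 pages, 0 failed requests, desired concurrency 10.
[BeautifulSoupCrawler] INFO  Crawled 25/387 pages, 0 failed requests, desired concurrency 10.
[BeautifulSoupCrawler] INFO  Crawled 55/387 pages, 0 failed requests, desired concurrency 11.
│ requests_total                │ 70         │
"""

print(mismatched_totals(LOG_1_2_0))  # [318, 387, 387]
```

Run against the 1.0.0 log, the same check returns an empty list, because the reported totals (1, 70, 70) never exceed the final `requests_total` of 70.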

Labels: t-tooling