
Add post-navigation hooks and browser lifecycle hooks #1741

@vdusek

Context

Crawlee JS provides navigation hooks and browser lifecycle hooks that Python is missing. See parity report for broader context.

Gaps

1. Post-navigation hooks (main gap)

JS BrowserCrawler and HttpCrawler both support postNavigationHooks — they run after page.goto() / HTTP request completes but before the request handler. Useful for CAPTCHA detection, response validation, etc.

Python only has pre_navigation_hook. No post-navigation equivalent exists.
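To make the intended ordering concrete, here is a minimal, self-contained sketch of how a post-navigation hook could sit between navigation and the request handler. It does not use the Crawlee API: `SketchCrawler`, `NavigationContext`, `post_navigation_hook`, `CaptchaDetected`, and the fake navigation step are all hypothetical names invented for illustration, and the real design may differ.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class NavigationContext:
    """Hypothetical stand-in for a crawling context (not the Crawlee type)."""
    url: str
    status_code: Optional[int] = None
    body: str = ""
    log: list = field(default_factory=list)


Hook = Callable[[NavigationContext], Awaitable[None]]


class CaptchaDetected(Exception):
    """Raised by a post-navigation hook to stop before the request handler."""


class SketchCrawler:
    """Toy crawler that only models hook ordering; navigation is faked."""

    def __init__(self, fake_body: str) -> None:
        self._fake_body = fake_body
        self._pre_hooks = []
        self._post_hooks = []

    def pre_navigation_hook(self, hook: Hook) -> Hook:
        self._pre_hooks.append(hook)
        return hook

    def post_navigation_hook(self, hook: Hook) -> Hook:
        # Assumed name, mirroring the existing pre_navigation_hook.
        self._post_hooks.append(hook)
        return hook

    async def _navigate(self, context: NavigationContext) -> None:
        # Placeholder for page.goto() or the HTTP request.
        context.status_code = 200
        context.body = self._fake_body
        context.log.append("navigate")

    async def process(self, context: NavigationContext, handler: Hook) -> None:
        for hook in self._pre_hooks:
            await hook(context)
        await self._navigate(context)
        # Post-navigation hooks run after navigation, before the handler.
        for hook in self._post_hooks:
            await hook(context)
        await handler(context)


async def demo(fake_body: str) -> list:
    crawler = SketchCrawler(fake_body)

    @crawler.pre_navigation_hook
    async def before(context: NavigationContext) -> None:
        context.log.append("pre")

    @crawler.post_navigation_hook
    async def detect_captcha(context: NavigationContext) -> None:
        context.log.append("post")
        if "captcha" in context.body.lower():
            raise CaptchaDetected(context.url)

    async def handler(context: NavigationContext) -> None:
        context.log.append("handler")

    context = NavigationContext(url="https://example.com")
    try:
        await crawler.process(context, handler)
    except CaptchaDetected:
        context.log.append("captcha")
    return context.log


if __name__ == "__main__":
    print(asyncio.run(demo("<html>Please solve this CAPTCHA</html>")))
```

In this sketch a hook that raises prevents the request handler from running, which is one possible way to support the CAPTCHA-detection use case. Whether a real implementation should retry, rotate the session, or fail the request is left open.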

2. Browser lifecycle hooks (BrowserPool)

JS BrowserPool exposes 6 hook types. These are listed for consideration:

  • preLaunchHooks / postLaunchHooks — before/after browser launch
  • prePageCreateHooks / postPageCreateHooks — before/after new page creation
  • prePageCloseHooks / postPageCloseHooks — before/after page close

Python's BrowserPool has no lifecycle hooks.
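As a rough illustration of what the six lifecycle hooks could look like on the Python side, the sketch below models only the call order around launch, page creation, and page close. It is not the existing `BrowserPool`: `SketchBrowserPool`, `add_hook`, the snake_case hook names, and the string stand-ins for browsers and pages are all assumptions made for this example.

```python
import asyncio
from typing import Awaitable, Callable

LifecycleHook = Callable[..., Awaitable[None]]

# Snake_case counterparts of the six JS hook types (assumed naming).
HOOK_NAMES = (
    "pre_launch",
    "post_launch",
    "pre_page_create",
    "post_page_create",
    "pre_page_close",
    "post_page_close",
)


class SketchBrowserPool:
    """Toy pool: browsers and pages are plain strings, only hooks are real."""

    def __init__(self) -> None:
        self._hooks = {name: [] for name in HOOK_NAMES}
        self._page_counter = 0

    def add_hook(self, name: str, hook: LifecycleHook) -> None:
        if name not in self._hooks:
            raise ValueError(f"Unknown hook type: {name}")
        self._hooks[name].append(hook)

    async def _run(self, name: str, *args) -> None:
        # Hooks of one type run sequentially, in registration order.
        for hook in self._hooks[name]:
            await hook(*args)

    async def launch(self, launch_options: dict) -> str:
        # Pre-launch hooks may mutate the options before the launch.
        await self._run("pre_launch", launch_options)
        browser = f"browser(headless={launch_options.get('headless', True)})"
        await self._run("post_launch", browser)
        return browser

    async def new_page(self, browser: str, page_options: dict) -> str:
        await self._run("pre_page_create", browser, page_options)
        self._page_counter += 1
        page = f"page-{self._page_counter}"
        await self._run("post_page_create", page)
        return page

    async def close_page(self, page: str) -> None:
        await self._run("pre_page_close", page)
        # A real pool would close the page here.
        await self._run("post_page_close", page)


async def demo():
    events = []
    pool = SketchBrowserPool()

    def make_recorder(name: str) -> LifecycleHook:
        async def record(*args) -> None:
            events.append(name)
        return record

    for name in HOOK_NAMES:
        pool.add_hook(name, make_recorder(name))

    async def force_headful(launch_options: dict) -> None:
        launch_options["headless"] = False

    pool.add_hook("pre_launch", force_headful)

    browser = await pool.launch({"headless": True})
    page = await pool.new_page(browser, {})
    await pool.close_page(page)
    return events, browser


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A single `add_hook(name, hook)` entry point keeps the sketch short. Six separate decorator-style registration methods, matching the existing `pre_navigation_hook` style, would be an equally plausible design.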

Reference

  • JS BrowserCrawler hooks: packages/browser-crawler/src/internals/browser-crawler.ts
  • JS BrowserPool hooks: packages/browser-pool/src/browser-pool.ts
  • Python pre-nav hooks: src/crawlee/crawlers/_playwright/_playwright_crawler.py
  • Python BrowserPool: src/crawlee/browsers/_browser_pool.py

Labels

  • enhancement: New feature or request.
  • t-tooling: Issues with this label are in the ownership of the tooling team.
