feat: schema-validated router labels via standard-schema #3748
Open
B4nan wants to merge 5 commits into
Introduce per-label typing of `request.userData` for the router, in two layers:

- A `label -> userData` map can be passed as the router's `Routes` type argument, typing `request.userData` per label and rejecting unknown labels at compile time. Backwards compatible (default is an open map).
- A per-label Standard Schema map (Zod, Valibot, ArkType, …) passed to `Router.create`/`createXRouter` both infers the `userData` types and validates them at runtime before the handler runs, replacing `request.userData` with the parsed value. Invalid requests throw a new non-retryable `RequestValidationError`.

Adds the types-only `@standard-schema/spec` dependency to `@crawlee/core`.

Relates to #3082
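The runtime layer can be illustrated with a minimal, dependency-free sketch. It is not the PR's implementation: `Schema`, `parseUserData`, and `productSchema` are hypothetical names, the local `Schema` interface is a simplified, sync-only stand-in for the Standard Schema v1 shape (the real spec also lets `validate` return a Promise), and the `RequestValidationError` class below only mirrors the behaviour described above.

```typescript
// Simplified stand-in for the Standard Schema v1 shape (assumption: sync-only).
interface Issue { message: string }
type Result<T> = { value: T; issues?: undefined } | { issues: readonly Issue[] };
interface Schema<T> {
    '~standard': { version: 1; vendor: string; validate(value: unknown): Result<T> };
}

// Hypothetical error class: carries the schema issues, as the description says.
class RequestValidationError extends Error {
    readonly label: string;
    readonly issues: readonly Issue[];
    constructor(label: string, issues: readonly Issue[]) {
        super(`Invalid userData for label "${label}": ${issues.map((i) => i.message).join('; ')}`);
        this.name = 'RequestValidationError';
        this.label = label;
        this.issues = issues;
    }
}

// Hand-written schema; a Zod, Valibot, or ArkType schema would take its place.
const productSchema: Schema<{ sku: string }> = {
    '~standard': {
        version: 1,
        vendor: 'sketch',
        validate(value) {
            if (typeof value === 'object' && value !== null) {
                const sku = (value as { sku?: unknown }).sku;
                // Return only the parsed fields, dropping anything unexpected.
                if (typeof sku === 'string') return { value: { sku } };
            }
            return { issues: [{ message: 'sku must be a string' }] };
        },
    },
};

// Validate before the handler would run; the parsed value replaces the raw userData.
function parseUserData<T>(label: string, schema: Schema<T>, userData: unknown): T {
    const result = schema['~standard'].validate(userData);
    if (result.issues) throw new RequestValidationError(label, result.issues);
    return result.value;
}

const parsed = parseUserData('PRODUCT', productSchema, { sku: 'A1', extra: 1 });
```

Here `parsed` is typed as `{ sku: string }` from the schema alone, and a payload such as `{ sku: 42 }` would throw before any handler code runs.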
- default handler keeps `request.userData` loosely typed (it is a fallback for any request, including labels not in the route map)
- split factory/`Router.create` into explicit overloads (route map vs legacy flat userData) for backwards compatibility, keeping the schema overload
- drop the exported `RouteMap` alias (referenced as prose in docs instead)
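As a rough illustration of the compile-time layer and the loosely typed default handler, here is a miniature router written from scratch. `TinyRouter`, `Req`, and `ShopRoutes` are hypothetical names; the real router takes the route map as its second type argument and has a different context shape, so treat this only as a sketch of the idea.

```typescript
type LooseUserData = Record<string, unknown>;

interface Req<U> { url: string; label?: string; userData: U }

// Hypothetical miniature router; the default type argument is an open map.
class TinyRouter<Routes extends Record<string, LooseUserData> = Record<string, LooseUserData>> {
    private handlers = new Map<string, (req: Req<any>) => string>();
    private fallback: (req: Req<LooseUserData>) => string = () => 'unhandled';

    // `label` must be a key of the route map; `userData` is typed from it.
    addHandler<K extends keyof Routes & string>(label: K, handler: (req: Req<Routes[K]>) => string): void {
        this.handlers.set(label, handler);
    }

    // The fallback can receive any request, so its userData stays loose.
    addDefaultHandler(handler: (req: Req<LooseUserData>) => string): void {
        this.fallback = handler;
    }

    route(req: Req<LooseUserData>): string {
        const handler = req.label === undefined ? undefined : this.handlers.get(req.label);
        return (handler ?? this.fallback)(req);
    }
}

type ShopRoutes = { PRODUCT: { sku: string }; CATEGORY: { page: number } };

const router = new TinyRouter<ShopRoutes>();
router.addHandler('PRODUCT', (req) => `product ${req.userData.sku}`);
router.addHandler('CATEGORY', (req) => `category page ${req.userData.page + 1}`);
router.addDefaultHandler((req) => `fallback for ${req.url} with ${Object.keys(req.userData).length} keys`);
// router.addHandler('CART', () => 'cart'); // compile error: 'CART' is not a key of ShopRoutes
```

Omitting the type argument (`new TinyRouter()`) falls back to the open map, so any string label is accepted, which is the backwards-compatible behaviour the commit describes.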
When a typed router is used, the route map now also types the request inputs:

- handler context: `ctx.addRequests` and `ctx.enqueueLinks` require the `userData` shape matching the request's `label` (and reject unknown labels); this is driven by the router, so it works for every crawler type.
- crawler instance: `Routes` is inferred from the `requestHandler` option and used to type `crawler.addRequests`/`crawler.run` for the HTTP-based crawlers (Basic/Http/Cheerio/JSDOM/LinkeDOM).

Unlabeled requests keep loose `userData` (they hit the default handler). All typing is backwards compatible via the open-map default.

Note: crawler-instance `addRequests` typing is not yet wired for the browser crawlers (Playwright/Puppeteer/Stagehand). Their `requestHandler` is redefined in `BrowserCrawlerOptions`, which breaks generic inference through the hierarchy; their handler-context methods are still fully typed via the router.
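One way to picture the request-input typing is a union discriminated on `label`. The sketch below is an assumption about the general technique, not the PR's `LabeledSource`/`TypedRequestsLike` types; `LabeledInput`, `UnlabeledInput`, `RequestInput`, and the standalone `addRequests` function are hypothetical.

```typescript
type LooseUserData = Record<string, unknown>;

// One union member per declared label, each pairing the label with its userData.
type LabeledInput<Routes> = {
    [K in keyof Routes & string]: { url: string; label: K; userData: Routes[K] };
}[keyof Routes & string];

// Requests without a label go to the default handler, so userData stays loose.
type UnlabeledInput = { url: string; label?: undefined; userData?: LooseUserData };

type RequestInput<Routes> = LabeledInput<Routes> | UnlabeledInput;

type ShopRoutes = { PRODUCT: { sku: string }; CATEGORY: { page: number } };

const queue: RequestInput<ShopRoutes>[] = [];

// Hypothetical stand-in for a route-typed `addRequests`; returns the queue size.
function addRequests(requests: RequestInput<ShopRoutes>[]): number {
    queue.push(...requests);
    return queue.length;
}

const queued = addRequests([
    { url: 'https://example.com/p/1', label: 'PRODUCT', userData: { sku: 'A1' } },
    { url: 'https://example.com/c/shoes', label: 'CATEGORY', userData: { page: 1 } },
    { url: 'https://example.com/about', userData: { anything: true } },
]);

// Rejected at compile time:
// addRequests([{ url: 'https://example.com/p/2', label: 'PRODUCT', userData: { page: 2 } }]);
// addRequests([{ url: 'https://example.com/cart', label: 'CART', userData: {} }]);
```

Because the union is keyed on `label`, the compiler picks the matching member for each array element and checks `userData` against that member only.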
Threads the `Routes` generic through the browser crawler hierarchy (`BrowserCrawler` + Playwright/Puppeteer/Stagehand/AdaptivePlaywright) so `crawler.addRequests`/`run` are route-typed for them as well, completing the propagation started for the HTTP-based crawlers. The missing piece was that `PlaywrightCrawlerOptions`/`StagehandCrawlerOptions` redefined `requestHandler` without the route-aware `RouterHandler` overload, shadowing the one on `BrowserCrawlerOptions` and preventing `Routes` inference. `Routes` is also placed before the internal `__Browser*` generics on `BrowserCrawler(Options)` so subclasses can forward it positionally.
The route-aware requestHandler union paired `RouterHandler<ExtendedContext>`
with `StagehandRequestHandler` (which wraps the context in `LoadedContext`).
The two union members had different call-signature parameter types, so TS
could not contextually type an inline `requestHandler({ page, request, ... })`
and reported the params as implicit `any` (breaking the Stagehand guide
examples). Use `RequestHandler<ExtendedContext>` for the plain member so both
signatures are identical; `StagehandCrawlingContext` already carries a
`LoadedRequest`, so nothing is lost.
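The inference problem can be reproduced in isolation. The sketch below uses hypothetical stand-ins (`Ctx`, `Loaded`, `RouterLike`, `HandlerOption`) rather than the Crawlee types, and assumes `noImplicitAny` is enabled; the failing form is left commented out so the block compiles.

```typescript
interface Ctx { url: string }
type Loaded<T> = T & { loadedUrl: string };

// Callable interface standing in for a router-style handler (hypothetical).
interface RouterLike<C> {
    (ctx: C): string;
    labels: string[];
}

// Mismatched members: one call signature takes `Ctx`, the other `Loaded<Ctx>`.
// The compiler finds no single contextual signature, so `ctx` is implicitly `any`:
// const broken: RouterLike<Ctx> | ((ctx: Loaded<Ctx>) => string) = (ctx) => ctx.url;

// Identical call signatures: the inline parameter is contextually typed as `Loaded<Ctx>`.
type HandlerOption = RouterLike<Loaded<Ctx>> | ((ctx: Loaded<Ctx>) => string);
const requestHandler: HandlerOption = (ctx) => `${ctx.url} -> ${ctx.loadedUrl}`;

const line = requestHandler({ url: 'https://example.com', loadedUrl: 'https://example.com/home' });
```

The design point is that contextual typing of an inline function against a union only succeeds when every member's call signature has the same parameter types, which is what the commit restores.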
Makes the router's request `label` drive the type of `request.userData` end-to-end: from declaring routes, through the request handler, all the way to the methods that enqueue new requests. Targets `v4`.

1. Typed route map (compile-time)
Declare a `label → userData` map and pass it as the router's second type argument. Handlers get `request.userData` typed per label, and unknown labels are a compile error. Fully backwards compatible: the default route map is open (`Record<string, …>`), so existing untyped usage and the legacy flat-`userData` generic keep working.

The default handler (`addDefaultHandler`) is a fallback for any request, so its `userData` stays loosely typed.

2. Schema-validated labels (types + runtime validation)
Passing a per-label Standard Schema (Zod, Valibot, ArkType, …) both infers the `userData` types and validates them at runtime before the handler runs, replacing `request.userData` with the parsed value. Invalid `userData` throws a new non-retryable `RequestValidationError` (the data won't change between attempts) carrying the schema issues.

3. Propagation to crawler & context request methods
When a typed router is used, the route map now also types the request inputs: providing a declared `label` requires the matching `userData` shape (and rejects unknown labels); unlabeled requests keep loose `userData` (they hit the default handler).

- Handler context: `ctx.addRequests`/`ctx.enqueueLinks` are typed from the route map. Driven by the router itself, so it works for every crawler type.
- Crawler instance: `Routes` is inferred from the `requestHandler` option and types `crawler.addRequests`/`crawler.run`. Threaded through all crawler classes (Basic/Http/Cheerio/JSDOM/LinkeDOM and the browser family: Browser/Playwright/Puppeteer/Stagehand/AdaptivePlaywright).

How
- `RouteSchemas`/`RoutesFromSchemas` (via `StandardSchemaV1.InferOutput`); `Router.create` and the `createXRouter` factories gain overloads for the route map, the legacy flat-`userData` form, and the schema map (disambiguated by value type). Adds the tiny, types-only `@standard-schema/spec` dep to `@crawlee/core` (`zod` is already a core dep on `v4` and implements the interface).
- `LabeledSource`/`TypedRequestsLike` build a label-discriminated request-input type; a signature-preserving transform retypes `enqueueLinks`'s `label`/`userData` without changing argument optionality or return type (which differ per crawler).
- The `Routes` generic is inferred from `requestHandler` and threaded through the crawler-class hierarchy.

All of it is backwards compatible via the open route-map default.
Relates to #3082. The remaining parts of that issue (typed `pushData` / key-value store) are a separate axis and are not in scope here.

🤖 Generated with Claude Code