feat: align EnqueueLinksOptions with crawlee-python #3533
Draft
l2ysho wants to merge 10 commits into
…`exclude` API

Align `EnqueueLinksOptions` with crawlee-python (#3409):

- Replace `globs`, `regexps`, `pseudoUrls` options with `include`/`exclude` accepting `UrlPatternInput[]`
- Strip request options (label, method, payload, userData, headers) from pattern objects; patterns are pure URL matchers
- `transformRequestFunction` is now the only way to customize per-request options and runs after all filtering
- Add `'skip'` and `'unchanged'` return values to `RequestTransform` (aligned with Python's `RequestTransformAction`)
- Apply same changes to `enqueueLinksByClickingElements` (Playwright + Puppeteer) and `SitemapRequestList`
- Remove `@apify/pseudo_url` dependency and `PseudoUrl` re-export
- Update all templates from `globs` to `include`

BREAKING CHANGE: `globs`, `regexps`, and `pseudoUrls` options removed. Use `include`/`exclude` instead.
- Replace remaining `globs` → `include` in docs (4 examples + 2 guides)
- Convert `type` to `interface` for UrlPatternObject, GlobObject, RegExpObject (ESLint)
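A rough model of the `include`/`exclude` filtering described in these commits follows. It is a hypothetical sketch, not Crawlee's implementation: the local types only reuse the names mentioned here (`GlobObject`, `RegExpObject`, `UrlPatternInput`), while the `glob`/`regexp` field names, the "empty `include` keeps everything" rule, and the simplified glob conversion are all assumptions.

```typescript
// Hypothetical local stand-ins for the pattern types named in this PR.
// Field names `glob` and `regexp` are assumptions, not Crawlee's exported API.
interface GlobObject {
    glob: string;
}

interface RegExpObject {
    regexp: RegExp;
}

// Assumed shape: a bare string is treated as a glob, a bare RegExp as a regexp.
type UrlPatternInput = string | RegExp | GlobObject | RegExpObject;

// Simplified glob -> RegExp conversion (no character classes or braces):
// `**/` matches zero or more path segments, `**` matches anything,
// `*` matches anything except `/`, `?` matches one non-`/` character.
function globToRegExp(glob: string): RegExp {
    let source = '';
    for (let i = 0; i < glob.length; i++) {
        const ch = glob[i];
        if (ch === '*') {
            if (glob[i + 1] === '*') {
                if (glob[i + 2] === '/') {
                    source += '(?:.*/)?';
                    i += 2; // consume the second `*` and the `/`
                } else {
                    source += '.*';
                    i += 1; // consume the second `*`
                }
            } else {
                source += '[^/]*';
            }
        } else if (ch === '?') {
            source += '[^/]';
        } else {
            // Escape RegExp metacharacters so they match literally.
            source += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&');
        }
    }
    return new RegExp(`^${source}$`);
}

// Normalize every accepted input shape to a RegExp.
function toRegExp(pattern: UrlPatternInput): RegExp {
    if (typeof pattern === 'string') return globToRegExp(pattern);
    if (pattern instanceof RegExp) return pattern;
    if ('glob' in pattern) return globToRegExp(pattern.glob);
    return pattern.regexp;
}

// Keep a URL if it matches some `include` pattern (or `include` is empty)
// and matches no `exclude` pattern.
function isUrlAllowed(url: string, include: UrlPatternInput[], exclude: UrlPatternInput[]): boolean {
    const included = include.length === 0 || include.some((p) => toRegExp(p).test(url));
    const excluded = exclude.some((p) => toRegExp(p).test(url));
    return included && !excluded;
}

// Example inputs mixing the pattern shapes.
const include: UrlPatternInput[] = ['https://crawlee.dev/docs/**', /\/api\//];
const exclude: UrlPatternInput[] = [{ glob: 'https://crawlee.dev/docs/**/*.pdf' }];

console.log(isUrlAllowed('https://crawlee.dev/docs/guides/intro', include, exclude));
```

Because patterns no longer carry `label`, `userData`, and similar fields, a matcher like this only needs to answer "keep or drop"; anything per-request is left to `transformRequestFunction`.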
4dbf9ef to ca34f93
barjin reviewed Apr 30, 2026
…thon

Resolve conflicts between the unified include/exclude enqueue API and v4:

- enqueue_links.ts / click-elements.ts: keep include/exclude API, adopt v4's IRequestManager/requestManager and async createRequestQueueMock
- index.ts: drop PseudoUrl export, keep v4's `export type` for type-only re-exports
- sitemap_request_loader.ts: keep include/exclude over globs/regexps, adopt v4's SitemapRequestList -> SitemapRequestLoader rename
- tests: keep include/exclude tests, drop obsolete pseudoUrls/globs tests, apply v4's async mock
- drop yarn.lock (v4 migrated to pnpm); remove @apify/pseudo_url from pnpm-lock.yaml to match core's package.json

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ith-crawlee-python'

Reconcile the local v4 merge with the rebased remote feature branch.

The remote branch is based on an older v4 (IRequestList / RequestProvider / SitemapRequestList), while the local branch carries the latest v4 tip (IRequestLoader / IRequestManager / SitemapRequestLoader). Resolved all conflicts in favor of the newer v4 API, since the include/exclude feature logic is identical on both sides:

- enqueue_links.ts, click-elements.ts (pw/pptr): keep IRequestManager import
- sitemap_request_loader.ts: keep SitemapRequestLoader rename + IRequestLoader
- sitemap_request_loader.test.ts: keep SitemapRequestLoader / new method names

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update the v4 upgrade guide and sitemap example to reflect the globs/regexps/pseudoUrls -> include collapse, the removal of PseudoUrl and per-pattern request options, and the corrected transformRequestFunction precedence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
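Align `EnqueueLinksOptions` with crawlee-python (#3409):

- Replace `globs`, `regexps`, `pseudoUrls` options with `include`/`exclude` accepting `UrlPatternInput[]`
- Strip request options (label, method, payload, userData, headers) from pattern objects; patterns are pure URL matchers
- `transformRequestFunction` is now the only way to customize per-request options and runs after all filtering
- Add `'skip'` and `'unchanged'` return values to `RequestTransform` (aligned with Python's `RequestTransformAction`)
- Apply same changes to `enqueueLinksByClickingElements` (Playwright + Puppeteer) and `SitemapRequestList`
- Remove `@apify/pseudo_url` dependency and `PseudoUrl` re-export
- Update all templates from `globs` to `include`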
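The ordering in this PR (URL filtering first, `transformRequestFunction` last, with `'skip'` and `'unchanged'` as extra return values) can be sketched as a small pipeline. This is a hypothetical model, not Crawlee's code: `RequestOptions` is a trimmed local stand-in, `buildRequests` is an illustrative helper name, and reading `'unchanged'` as "enqueue the request exactly as built" is an assumption based on the Python `RequestTransformAction` naming.

```typescript
// Hypothetical, trimmed request options; Crawlee's real RequestOptions has many more fields.
interface RequestOptions {
    url: string;
    label?: string;
    userData?: Record<string, unknown>;
}

// Return values described in this PR: modified options, 'skip' to drop, 'unchanged' to keep as-is.
type RequestTransformResult = RequestOptions | 'skip' | 'unchanged';
type RequestTransform = (options: RequestOptions) => RequestTransformResult;

// Hypothetical pipeline helper: filtering runs first, the transform runs last,
// so whatever the transform returns is what ends up enqueued.
function buildRequests(
    urls: string[],
    isAllowed: (url: string) => boolean,
    transform?: RequestTransform,
): RequestOptions[] {
    const result: RequestOptions[] = [];
    for (const url of urls) {
        if (!isAllowed(url)) continue; // dropped before the transform ever sees it
        const options: RequestOptions = { url };
        if (!transform) {
            result.push(options);
            continue;
        }
        const transformed = transform(options);
        if (transformed === 'skip') continue;
        result.push(transformed === 'unchanged' ? options : transformed);
    }
    return result;
}

// Stands in for the removed per-pattern `label`: derive it from the URL instead.
const transformRequestFunction: RequestTransform = (options) => {
    if (options.url.endsWith('.pdf')) return 'skip';
    if (options.url.includes('/product/')) return { ...options, label: 'DETAIL' };
    return 'unchanged';
};

const requests = buildRequests(
    [
        'https://shop.example/product/1',
        'https://shop.example/manual.pdf',
        'https://shop.example/about',
        'https://other.example/product/2',
    ],
    (url) => url.startsWith('https://shop.example/'),
    transformRequestFunction,
);

console.log(requests);
```

Here `other.example/product/2` never reaches the transform because the filter rejects it, `manual.pdf` is dropped via `'skip'`, `/about` passes through via `'unchanged'`, and `/product/1` gets its label from the transform rather than from a pattern object.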