feat: align EnqueueLinksOptions with crawlee-python #3533

Draft: l2ysho wants to merge 10 commits into v4 from 3409-align-enqueuelinksoptions-with-crawlee-python

l2ysho (Contributor) commented Mar 27, 2026

Align `EnqueueLinksOptions` with crawlee-python (#3409):

- Replace the `globs`, `regexps`, and `pseudoUrls` options with `include`/`exclude`, both accepting `UrlPatternInput[]`
- Strip request options (`label`, `method`, `payload`, `userData`, `headers`) from pattern objects; patterns are now pure URL matchers
- `transformRequestFunction` is now the only way to customize per-request options, and it runs after all filtering
- Add `'skip'` and `'unchanged'` return values to `RequestTransform` (aligned with Python's `RequestTransformAction`)
- Apply the same changes to `enqueueLinksByClickingElements` (Playwright + Puppeteer) and `SitemapRequestList`
- Remove the `@apify/pseudo_url` dependency and the `PseudoUrl` re-export
- Update all templates from `globs` to `include`
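
The ordering described in the list above (pattern filtering first, then `transformRequestFunction`) can be sketched as a small standalone model. This is not Crawlee's implementation: `filterAndTransform`, `globToRegExp`, `toRegExp`, and the local type definitions are hypothetical names written for illustration, the shape of `UrlPatternInput` is assumed, and the glob support is deliberately simplified. It also assumes that an empty `include` lets every URL through and that `exclude` wins over `include`.

```typescript
// Hypothetical, simplified model of the described behavior; not copied from Crawlee.
// Assumed shape: a pattern is a glob string, a RegExp, or a small wrapper object.
type UrlPatternInput = string | RegExp | { glob: string } | { regexp: RegExp };

interface RequestOptions {
    url: string;
    label?: string;
    userData?: Record<string, unknown>;
}

type RequestTransform = (request: RequestOptions) => RequestOptions | 'skip' | 'unchanged';

// Minimal glob support (assumption): `**` matches anything, `*` matches anything except `/`.
function globToRegExp(glob: string): RegExp {
    const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, '\\$&');
    const source = escaped
        .split('**')
        .map((part) => part.replace(/\*/g, '[^/]*'))
        .join('.*');
    return new RegExp(`^${source}$`);
}

function toRegExp(pattern: UrlPatternInput): RegExp {
    if (typeof pattern === 'string') return globToRegExp(pattern);
    if (pattern instanceof RegExp) return pattern;
    if ('glob' in pattern) return globToRegExp(pattern.glob);
    return pattern.regexp;
}

function filterAndTransform(
    urls: string[],
    options: {
        include?: UrlPatternInput[];
        exclude?: UrlPatternInput[];
        transformRequestFunction?: RequestTransform;
    },
): RequestOptions[] {
    const include = (options.include ?? []).map(toRegExp);
    const exclude = (options.exclude ?? []).map(toRegExp);
    const result: RequestOptions[] = [];

    for (const url of urls) {
        // Patterns are pure URL matchers: they only decide whether a URL passes.
        if (include.length > 0 && !include.some((re) => re.test(url))) continue;
        if (exclude.some((re) => re.test(url))) continue;

        // The transform runs after all filtering and is the only place to set request options.
        const original: RequestOptions = { url };
        const transformed = options.transformRequestFunction?.(original) ?? 'unchanged';
        if (transformed === 'skip') continue;
        result.push(transformed === 'unchanged' ? original : transformed);
    }
    return result;
}

const requests = filterAndTransform(
    [
        'https://example.com/blog/post-1',
        'https://example.com/blog/drafts/post-2',
        'https://example.com/about',
        'https://example.com/blog/archive',
    ],
    {
        include: ['https://example.com/blog/**'],
        exclude: [/\/drafts\//],
        transformRequestFunction: (request) => {
            if (request.url.endsWith('/archive')) return 'skip';
            if (request.url.includes('/post-')) return { ...request, label: 'POST' };
            return 'unchanged';
        },
    },
);
```

In this model only `https://example.com/blog/post-1` survives, and it carries the `POST` label set by the transform rather than by a pattern object.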

l2ysho added 5 commits March 27, 2026 14:44
…`exclude` API

Align `EnqueueLinksOptions` with crawlee-python (#3409):

- Replace `globs`, `regexps`, `pseudoUrls` options with `include`/`exclude` accepting `UrlPatternInput[]`
- Strip request options (label, method, payload, userData, headers) from pattern objects — patterns are pure URL matchers
- `transformRequestFunction` is now the only way to customize per-request options, runs after all filtering
- Add `'skip'` and `'unchanged'` return values to `RequestTransform` (aligned with Python's `RequestTransformAction`)
- Apply same changes to `enqueueLinksByClickingElements` (Playwright + Puppeteer) and `SitemapRequestList`
- Remove `@apify/pseudo_url` dependency and `PseudoUrl` re-export
- Update all templates from `globs` to `include`

BREAKING CHANGE: `globs`, `regexps`, and `pseudoUrls` options removed. Use `include`/`exclude` instead.
- Replace remaining `globs` → `include` in docs (4 examples + 2 guides)
- Convert `type` to `interface` for UrlPatternObject, GlobObject, RegExpObject (ESLint)
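
For callers affected by the breaking change above, the migration might look roughly like the following sketch. `migrateLegacyOptions`, `LegacyOptions`, and `AlignedOptions` are hypothetical names that do not exist in Crawlee, and the option shapes are assumptions. The sketch merges `globs` and `regexps` into `include`, drops per-pattern request options such as `label` (which would need to move into `transformRequestFunction`), and refuses to guess at a `pseudoUrls` conversion.

```typescript
// Hypothetical migration helper; the names and option shapes are illustrative assumptions.
type UrlPattern = string | RegExp;

interface LegacyOptions {
    globs?: (string | { glob: string; label?: string })[];
    regexps?: (RegExp | { regexp: RegExp; label?: string })[];
    pseudoUrls?: unknown[];
    exclude?: UrlPattern[];
}

interface AlignedOptions {
    include?: UrlPattern[];
    exclude?: UrlPattern[];
}

function migrateLegacyOptions(legacy: LegacyOptions): AlignedOptions {
    // Pseudo-URLs have no automatic equivalent in this sketch.
    if (legacy.pseudoUrls?.length) {
        throw new Error('pseudoUrls must be rewritten by hand as globs or regular expressions');
    }

    // Keep only the URL matcher from each entry; request options on pattern objects are dropped.
    const include: UrlPattern[] = [
        ...(legacy.globs ?? []).map((g) => (typeof g === 'string' ? g : g.glob)),
        ...(legacy.regexps ?? []).map((r) => (r instanceof RegExp ? r : r.regexp)),
    ];

    const aligned: AlignedOptions = {};
    if (include.length > 0) aligned.include = include;
    if (legacy.exclude) aligned.exclude = legacy.exclude;
    return aligned;
}

const migrated = migrateLegacyOptions({
    globs: ['https://example.com/blog/**', { glob: 'https://example.com/docs/**', label: 'DOCS' }],
    regexps: [/\/news\/\d+$/],
    exclude: ['**/drafts/**'],
});
```

Here `migrated.include` holds the two glob strings followed by the regular expression, and the `DOCS` label is intentionally lost.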
l2ysho force-pushed the 3409-align-enqueuelinksoptions-with-crawlee-python branch from 4dbf9ef to ca34f93 on April 28, 2026 20:40

barjin (Member) left a comment

A passer-by note - if we're planning to drop this from both Crawlee JS and Crawlee for Python, we might want to get in touch w/ the Console team to drop it from the input components as well:

video.mp4

l2ysho and others added 3 commits June 18, 2026 08:35
…thon

Resolve conflicts between the unified include/exclude enqueue API and v4:
- enqueue_links.ts / click-elements.ts: keep include/exclude API, adopt
  v4's IRequestManager/requestManager and async createRequestQueueMock
- index.ts: drop PseudoUrl export, keep v4's `export type` for type-only re-exports
- sitemap_request_loader.ts: keep include/exclude over globs/regexps, adopt
  v4's SitemapRequestList -> SitemapRequestLoader rename
- tests: keep include/exclude tests, drop obsolete pseudoUrls/globs tests,
  apply v4's async mock
- drop yarn.lock (v4 migrated to pnpm); remove @apify/pseudo_url from
  pnpm-lock.yaml to match core's package.json

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ith-crawlee-python'

Reconcile the local v4 merge with the rebased remote feature branch.
The remote branch is based on an older v4 (IRequestList / RequestProvider /
SitemapRequestList), while the local branch carries the latest v4 tip
(IRequestLoader / IRequestManager / SitemapRequestLoader). Resolved all
conflicts in favor of the newer v4 API, since the include/exclude feature
logic is identical on both sides:
- enqueue_links.ts, click-elements.ts (pw/pptr): keep IRequestManager import
- sitemap_request_loader.ts: keep SitemapRequestLoader rename + IRequestLoader
- sitemap_request_loader.test.ts: keep SitemapRequestLoader / new method names

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update the v4 upgrade guide and sitemap example to reflect the
globs/regexps/pseudoUrls -> include collapse, removal of PseudoUrl and
per-pattern request options, and corrected transformRequestFunction
precedence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>