feat(python-notebook-migration): add notebook-migration-service microservice in backend#5258

Draft
zyratlo wants to merge 7 commits into apache:main from zyratlo:migration-tool-backend-notebook-migration-service

@zyratlo zyratlo commented May 28, 2026

Contributor

What changes were proposed in this PR?

Introduces the microservice that mediates between Texera and the JupyterLab Docker stack that landed in migration-tool-jupyter-docker. Adds a new SBT subproject, notebook-migration-service, plus shared config and a frontend dev-proxy route.

New SBT subproject notebook-migration-service/:

  • build.sbt and project/build.properties — module SBT setup; module depends on the existing Auth, Config, and DAO projects
  • src/main/scala/.../NotebookMigrationService.scala — Dropwizard Application entry point; sets Jersey URL pattern to /api/*, registers the resource class, initializes the shared SQL connection via
    SqlServer.initConnection(StorageConfig.jdbcUrl, …), and wires in RequestLoggingFilter.
  • src/main/scala/.../NotebookMigrationServiceConfiguration.scala — Dropwizard Configuration subclass.
  • src/main/scala/.../resource/NotebookMigrationResource.scala — five REST endpoints under /notebook-migration:
    • GET /get-jupyter-url — health-checks the Jupyter container and returns its base URL.
    • GET /get-jupyter-iframe-url — returns the iframe-ready URL for notebook.ipynb.
    • POST /set-notebook — receives a notebook JSON, PUTs it into JupyterLab via its /api/contents/work/{name} API.
    • POST /store-notebook-and-mapping — persists a notebook + workflow-notebook mapping into Postgres in a single transaction (writes to the notebook and workflow_notebook_mapping tables added by migration-tool-database-tables).
    • POST /fetch-notebook-and-mapping — returns the most recent notebook + mapping for a given (wid, vid).
  • src/main/resources/logback.xml — logging config.
  • src/main/resources/notebook-migration-service-web-config.yaml — Dropwizard server config (HTTP port 9098, DB connection refs).
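
As a rough illustration of what POST /set-notebook forwards to JupyterLab, the sketch below builds (without sending) a PUT request against the Jupyter contents API path named above. The function name `build_jupyter_put`, the optional `token` parameter, and the empty example notebook are assumptions made for illustration; the actual resource is written in Scala and may construct the request differently.

```python
import json
import urllib.parse
import urllib.request


def build_jupyter_put(jupyter_url, name, notebook, token=None):
    """Build (but do not send) a PUT to /api/contents/work/{name}.

    Hypothetical helper: the body follows the Jupyter contents model
    for notebooks (type, format, content).
    """
    # quote() keeps "/" intact, so "work/<name>" stays a two-segment path.
    path = urllib.parse.quote(f"work/{name}")
    body = {"type": "notebook", "format": "json", "content": notebook}
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: token auth is only needed if the container enables it.
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(
        f"{jupyter_url.rstrip('/')}/api/contents/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="PUT",
    )


# A minimal nbformat 4 notebook with no cells, used only as example input.
EMPTY_NOTEBOOK = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

req = build_jupyter_put("http://localhost:9100", "notebook.ipynb", EMPTY_NOTEBOOK)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would perform the upload, provided a Jupyter server is listening at that URL.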

Root build wiring:

  • build.sbt — declares the new NotebookMigrationService SBT subproject and adds it to the TexeraProject aggregation.

Shared config:

  • common/config/src/main/resources/storage.conf — new jupyter { url = "http://localhost:9100" } block, overridable via STORAGE_JUPYTER_URL.
  • common/config/src/main/scala/.../StorageConfig.scala — adds the jupyterURL accessor.
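
The override behaviour described here can be sketched as a lookup with a default. This is only an illustration of the intended precedence (environment variable first, then the value from storage.conf); the function name `resolve_jupyter_url` is hypothetical and the real accessor lives in StorageConfig.scala.

```python
import os

# Default taken from the jupyter { url = ... } block in storage.conf.
DEFAULT_JUPYTER_URL = "http://localhost:9100"


def resolve_jupyter_url(env=None):
    """Return STORAGE_JUPYTER_URL if it is set, else the configured default."""
    env = os.environ if env is None else env
    return env.get("STORAGE_JUPYTER_URL", DEFAULT_JUPYTER_URL)


print(resolve_jupyter_url({}))
print(resolve_jupyter_url({"STORAGE_JUPYTER_URL": "http://jupyter:8888"}))
```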

Frontend dev proxy:

  • frontend/proxy.config.json — routes /api/notebook-migration/* to http://localhost:9098.
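
To make the routing concrete, the following sketch models how a dev-server request path would map to the backend. It assumes a plain prefix match with no path rewrite, so the /api prefix reaches the service unchanged and lines up with the Jersey /api/* pattern; the names `PROXY_ROUTES` and `resolve_target` are illustrative and do not come from the PR.

```python
# Assumption: one prefix route, forwarded as-is (no path rewrite).
PROXY_ROUTES = {
    "/api/notebook-migration": "http://localhost:9098",
}


def resolve_target(path):
    """Return the backend URL a request path is forwarded to, or None."""
    for prefix, target in PROXY_ROUTES.items():
        # Match the prefix itself or anything below it, but not look-alikes
        # such as "/api/notebook-migration-x".
        if path == prefix or path.startswith(prefix + "/"):
            return target + path
    return None


print(resolve_target("/api/notebook-migration/get-jupyter-url"))
```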

Any related issues, documentation, discussions?

Closes #5257
Parent issue #4301

  • Hard dependency: must be merged after migration-tool-database-tables (#5055, "feat(python-notebook-migration): add database tables for Notebook Migration tool"). The resource imports the jOOQ-generated Notebook / WorkflowNotebookMapping classes, which exist only once the schema PR is merged.
  • Soft dependency: StorageConfig.jupyterURL points to the JupyterLab container from migration-tool-jupyter-docker. If that container is not running, the Jupyter-related endpoints return a 500 with "Cannot connect to Jupyter server". The service still starts, and the DB-persistence endpoints work in isolation.
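
The soft-dependency behaviour can be pictured with a small probe that turns a connection failure into the 500 response quoted above. This is a sketch under assumptions: the probed path (/api), the timeout, and the injectable `opener` parameter are all hypothetical, and the actual Scala resource may check the container differently.

```python
import urllib.request


def probe_jupyter(jupyter_url, timeout=2.0, opener=urllib.request.urlopen):
    """Return (status, body) for a health check against the Jupyter server.

    `opener` is injectable so the failure path can be exercised without
    a network; by default it is urllib.request.urlopen.
    """
    try:
        # Assumption: any successful response from /api counts as healthy.
        with opener(f"{jupyter_url.rstrip('/')}/api", timeout=timeout):
            return 200, jupyter_url
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all
        # OSError subclasses, so they end up here.
        return 500, "Cannot connect to Jupyter server"


def refuse(url, timeout):
    """Stand-in opener that simulates a stopped container."""
    raise ConnectionRefusedError("connection refused")


print(probe_jupyter("http://localhost:9100", opener=refuse))
```

Because only the Jupyter-facing endpoints depend on this probe, a failure here would not affect the endpoints that talk to Postgres.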

How was this PR tested?

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

@github-actions github-actions bot added the dependencies, frontend, and common labels May 28, 2026
@codecov-commenter

codecov-commenter commented May 28, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.94%. Comparing base (891d2ad) to head (346cb6a).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5258      +/-   ##
============================================
- Coverage     52.95%   52.94%   -0.01%     
- Complexity     2627     2630       +3     
============================================
  Files          1090     1090              
  Lines         42210    42210              
  Branches       4534     4534              
============================================
- Hits          22353    22350       -3     
- Misses        18546    18548       +2     
- Partials       1311     1312       +1     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 70.91% <ø> (ø) | |
| agent-service | 34.36% <ø> (ø) | Carriedforward from a090029 |
| amber | 53.15% <ø> (+0.03%) ⬆️ | Carriedforward from a090029 |
| computing-unit-managing-service | 1.65% <ø> (ø) | |
| config-service | 56.71% <ø> (ø) | |
| file-service | 57.06% <ø> (ø) | |
| frontend | 47.89% <ø> (-0.04%) ⬇️ | |
| pyamber | 89.77% <ø> (ø) | Carriedforward from a090029 |
| python | 90.73% <ø> (ø) | Carriedforward from a090029 |
| workflow-compiling-service | 58.69% <ø> (ø) | |

*This pull request uses carry forward flags. Click here to find out more.


@zyratlo zyratlo marked this pull request as ready for review May 28, 2026 04:34
@Yicong-Huang Yicong-Huang changed the title from "feat(python-notebook-migration, backend): add notebook-migration-service microservice in backend" to "feat(python-notebook-migration): add notebook-migration-service microservice in backend" May 28, 2026
@github-actions

Contributor

⚠️ Benchmark changes need a look

🟢 0 better · 🔴 10 worse · ⚪ 5 noise (<±5%) · 0 without baseline

Compared against main 891d2ad benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

| config | throughput (tuples/sec) | MB/s | latency p50/p95/p99 | max Δ latest / 7d |
|---|---|---|---|---|
| 🔴 bs=10 sw=10 sl=64 | 405 | 0.247 | 23,413/36,190/36,190 us | 🔴 +9.3% / ⚪ within ±5% |
| 🔴 bs=100 sw=10 sl=64 | 920 | 0.562 | 113,140/143,208/143,208 us | 🔴 +19.6% / ⚪ within ±5% |
| bs=1000 sw=10 sl=64 | 1,099 | 0.671 | 909,583/949,827/949,827 us | ⚪ within ±5% / 🟢 -7.2% |

Baseline details

Latest main 891d2ad from same runner

| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 405 tuples/sec | 434 tuples/sec | 410.82 tuples/sec | -6.7% | -1.4% |
| bs=10 sw=10 sl=64 | MB/s | 0.247 MB/s | 0.265 MB/s | 0.251 MB/s | -6.8% | -1.5% |
| bs=10 sw=10 sl=64 | p50 | 23,413 us | 21,414 us | 23,785 us | +9.3% | -1.6% |
| bs=10 sw=10 sl=64 | p95 | 36,190 us | 33,824 us | 34,980 us | +7.0% | +3.5% |
| bs=10 sw=10 sl=64 | p99 | 36,190 us | 33,824 us | 34,980 us | +7.0% | +3.5% |
| bs=100 sw=10 sl=64 | throughput | 920 tuples/sec | 971 tuples/sec | 891.94 tuples/sec | -5.3% | +3.1% |
| bs=100 sw=10 sl=64 | MB/s | 0.562 MB/s | 0.593 MB/s | 0.544 MB/s | -5.2% | +3.2% |
| bs=100 sw=10 sl=64 | p50 | 113,140 us | 102,487 us | 112,277 us | +10.4% | +0.8% |
| bs=100 sw=10 sl=64 | p95 | 143,208 us | 119,696 us | 139,802 us | +19.6% | +2.4% |
| bs=100 sw=10 sl=64 | p99 | 143,208 us | 119,696 us | 139,802 us | +19.6% | +2.4% |
| bs=1000 sw=10 sl=64 | throughput | 1,099 tuples/sec | 1,085 tuples/sec | 1,041 tuples/sec | +1.3% | +5.6% |
| bs=1000 sw=10 sl=64 | MB/s | 0.671 MB/s | 0.663 MB/s | 0.635 MB/s | +1.2% | +5.6% |
| bs=1000 sw=10 sl=64 | p50 | 909,583 us | 919,939 us | 972,714 us | -1.1% | -6.5% |
| bs=1000 sw=10 sl=64 | p95 | 949,827 us | 967,952 us | 1,023,057 us | -1.9% | -7.2% |
| bs=1000 sw=10 sl=64 | p99 | 949,827 us | 967,952 us | 1,023,057 us | -1.9% | -7.2% |

Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,493.24,200,128000,405,0.247,23412.50,36190.37,36190.37
1,100,10,64,20,2173.26,2000,1280000,920,0.562,113139.63,143207.60,143207.60
2,1000,10,64,20,18194.27,20000,12800000,1099,0.671,909583.22,949827.03,949827.03
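
For readers who want to sanity-check the numbers, the sketch below recomputes the derived columns from the raw CSV and the percentage deltas from the PR and main values. The formulas are inferred from the column names rather than taken from the benchmark code: throughput as total_tuples divided by elapsed seconds, MB/s as bytes per second divided by 1024², and Δ as (PR − baseline) / baseline. The helper names `recompute` and `pct_delta` are hypothetical.

```python
import csv
import io

RAW_CSV = """config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,493.24,200,128000,405,0.247,23412.50,36190.37,36190.37
1,100,10,64,20,2173.26,2000,1280000,920,0.562,113139.63,143207.60,143207.60
2,1000,10,64,20,18194.27,20000,12800000,1099,0.671,909583.22,949827.03,949827.03
"""


def recompute(row):
    """Recompute (tuples/sec, MB/s) from the totals in one CSV row."""
    seconds = float(row["total_ms"]) / 1000.0
    tuples_per_sec = int(row["total_tuples"]) / seconds
    # Assumption: "MB" means 1024 * 1024 bytes, which matches the reported values.
    mb_per_sec = int(row["total_bytes"]) / seconds / (1024 * 1024)
    return tuples_per_sec, mb_per_sec


def pct_delta(pr, baseline):
    """Relative change of the PR value against a baseline, in percent."""
    return (pr - baseline) / baseline * 100.0


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
for row in rows:
    tps, mbps = recompute(row)
    print(f"bs={row['batch_size']}: {tps:.0f} tuples/sec, {mbps:.3f} MB/s")

# p50 latency for bs=10: PR 23,413 us against latest main 21,414 us.
print(f"{pct_delta(23413, 21414):+.1f}%")
```

Under these assumptions the recomputed values agree with the reported columns to the displayed precision, and the p50 delta for bs=10 comes out at the +9.3% shown in the summary.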

Development

Successfully merging this pull request may close these issues.

[Notebook Migration] Add notebook-migration-service for Texera and Jupyter communication and Database communication

2 participants