
GTFS Diff schema v2-rc1 update #19

Merged
cka-y merged 7 commits into main from update-schema on Jun 2, 2026


@cka-y cka-y commented Jun 1, 2026

Contributor

Summary:

  • Changed the schema version from 2.0.0 to 2.0.0-rc1 and renamed the schema file to v2-rc1.json
  • Removed all additionalProperties constraints to support extensions
  • Added not_compared file status with a reason object (code + message) for files that cannot be meaningfully compared
  • Added ignored_columns to file diffs for columns excluded due to unreliable values (e.g. referencing a file that isn't compared)
  • Added optional per-file and per-column change statistics (row counts, change percentages, column modification breakdowns)
  • Added files_not_compared_count to the summary
  • Added validation scripts and a GitHub Action to check JSON syntax and validate examples against the schema
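
As a rough illustration of the not_compared status and ignored_columns items above, here is a minimal Python sketch. Only not_compared, reason (with code and message), ignored_columns, and files_not_compared_count come from this summary. The other key names (summary, file_diffs, file_name, status), the "modified" status value, the reason code value, and the helpers count_not_compared and check_summary are hypothetical and may not match v2-rc1.json.

```python
import json

# Hypothetical diff document. Key names other than not_compared, reason,
# code, message, ignored_columns, and files_not_compared_count are assumptions.
EXAMPLE = """
{
  "summary": {"files_not_compared_count": 1},
  "file_diffs": [
    {
      "file_name": "shapes.txt",
      "status": "not_compared",
      "reason": {
        "code": "EXAMPLE_CODE",
        "message": "File cannot be meaningfully compared."
      }
    },
    {
      "file_name": "trips.txt",
      "status": "modified",
      "ignored_columns": ["shape_id"]
    }
  ]
}
"""


def count_not_compared(diff: dict) -> int:
    """Count file diffs whose status is not_compared."""
    return sum(1 for f in diff["file_diffs"] if f.get("status") == "not_compared")


def check_summary(diff: dict) -> bool:
    """Return True when the summary count matches the file diffs."""
    return diff["summary"]["files_not_compared_count"] == count_not_compared(diff)


# json.loads raises json.JSONDecodeError if the text is not valid JSON.
diff = json.loads(EXAMPLE)
print(check_summary(diff))
```

In this sketch, trips.txt lists shape_id under ignored_columns because the column references shapes.txt, which is not compared. This check covers JSON syntax and one consistency rule only; it does not validate against the schema itself.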

Comment thread .github/workflows/validate-schema.yml
"minimum": 0,
"description": "Total number of rows in the base version of the file."
},
"total_rows_new": {

Member

Do we need to add total_rows_modified?

Contributor Author

done ✅

Contributor

What's the difference from rows_modified_count?

Member

Sorry, I was too focused on the differences. Here we have a few fields on a file diff node: rows_added_count, rows_deleted_count, and rows_modified_count; then we have the stats node with some more metrics. We should align these metrics, either by moving all fields to the root of the diff node and removing the stats node, or by moving them all into the stats node (I'm OK either way). We should also use either total_rows_modified or rows_modified_count, not both. Thoughts?

Contributor Author

Agreed. I kept all the metrics in the stats node, and stats now lives exclusively in the file_diff node to avoid duplication.
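
A minimal Python sketch of the layout agreed on in this thread, assuming the row metrics live only under a stats node on the file diff. rows_added_count, rows_deleted_count, rows_modified_count, and total_rows_new appear in this thread; total_rows_base, file_name, and the helpers build_file_diff and has_duplicated_metrics are hypothetical names. The thread does not say whether the schema retained rows_modified_count or total_rows_modified, so the choice below is an assumption.

```python
# Metric names that should appear under "stats" and not at the diff root.
ROW_METRICS = (
    "rows_added_count",
    "rows_deleted_count",
    "rows_modified_count",  # assumption: the thread does not say which name was kept
)


def build_file_diff(file_name, base_rows, new_rows, added, deleted, modified):
    """Build a file diff whose row metrics live only under "stats"."""
    return {
        "file_name": file_name,  # hypothetical key
        "stats": {
            "total_rows_base": base_rows,  # hypothetical name
            "total_rows_new": new_rows,
            "rows_added_count": added,
            "rows_deleted_count": deleted,
            "rows_modified_count": modified,
        },
    }


def has_duplicated_metrics(file_diff):
    """Return True if any row metric also sits at the root of the diff node."""
    return any(key in file_diff for key in ROW_METRICS)


file_diff = build_file_diff("stops.txt", 100, 105, 10, 5, 3)
print(has_duplicated_metrics(file_diff))
```

Keeping the metrics in one place means a consumer reads a single node and never has to reconcile two values for the same count.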

Comment thread spec/v2/specification.md Outdated
@cka-y cka-y requested a review from davidgamez June 1, 2026 17:21
Comment thread spec/v2/specification.md Outdated
Co-authored-by: jcpitre <106176106+jcpitre@users.noreply.github.com>
@cka-y cka-y requested a review from jcpitre June 1, 2026 17:48
Comment thread spec/v2/specification.md Outdated
Co-authored-by: jcpitre <106176106+jcpitre@users.noreply.github.com>
@cka-y cka-y requested a review from jcpitre June 1, 2026 17:55
Comment thread spec/v2/json_schema/v2-rc1.json
Comment thread spec/v2/json_schema/v2-rc1.json Outdated
Comment thread .github/workflows/validate-schema.yml
@cka-y cka-y requested a review from jcpitre June 2, 2026 14:38
Comment thread spec/v2/json_schema/v2-rc1.json
Comment thread spec/v2/json_schema/v2-rc1.json Outdated

@jcpitre jcpitre left a comment

Contributor

2 small comments, but apart from that LGTM!

Co-authored-by: jcpitre <106176106+jcpitre@users.noreply.github.com>
@cka-y cka-y merged commit d6450a1 into main Jun 2, 2026
1 check passed
@cka-y cka-y deleted the update-schema branch June 2, 2026 15:27

3 participants