[Draft][Epic] Extract and normalize GTFS feed providers

## Summary

Extract provider information from GTFS `agency.txt` and `feed_info.txt`, store unique raw file records, and allow humans to link them to canonical `feed_provider` records.

## Problem

GTFS files identify agencies and publishers, but the same organization can appear under different names and URLs.

Example:

- `TTC`
- `Toronto Transit Commission`
- `Toronto Transit Commission (TTC)`

The system should not guess. It should store the raw values and support manual normalization.

## Goals

- Extract `agency.txt` and `feed_info.txt`
- Avoid duplicate raw records when the file values are identical
- Link datasets to the extracted records
- Add canonical `feed_provider` records
- Store provider names and contacts
- Link raw records to a `feed_provider`
- Generate feed-level provider roles
- Reuse known mappings only on exact name and URL match


## Proposed model

### Feed provider tables

- `feed_provider`
  - `id`
  - `name`
  - `organization_id`
  - `status`: `wip`, `published`, `not_published`
  - `created_at`
  - `updated_at`

 
- `feed_provider_name_alias`
  - `id`
  - `feed_provider_id`
  - `value`

- `feed_provider_contact`
  - `id`
  - `feed_provider_id`
  - `contact_type`
  - `value`

Contact types:

- `website`
- `email`
- `phone`

### Raw GTFS file tables

- `agency_txt`
  - unique rows from `agency.txt`
  - includes `agency_id`, `agency_name`, `agency_url`, timezone, phone, email, fare URL
  - includes optional `feed_provider_id`
  - includes `content_hash`

- `gtfs_dataset_agency_txt`
  - links GTFS datasets to `agency_txt` rows

- `feed_info_txt`
  - unique rows from `feed_info.txt`
  - includes publisher name, publisher URL, language, dates, version, contact email, contact URL
  - includes optional `feed_provider_id`
  - includes `content_hash`

- `gtfs_dataset_feed_info_txt`
  - links GTFS datasets to `feed_info_txt` rows
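
The issue does not say how `content_hash` is computed. One possible approach, sketched below, hashes the stored columns in a fixed order so that two rows with identical values always produce the same hash. SHA-256, the JSON serialization, and treating a missing column as an empty string are all assumptions; the column names are the standard GTFS field names for the values listed above.

```python
import hashlib
import json

# Column lists follow the fields named in this issue; extend them if more columns are stored.
AGENCY_FIELDS = (
    "agency_id", "agency_name", "agency_url", "agency_timezone",
    "agency_phone", "agency_email", "agency_fare_url",
)
FEED_INFO_FIELDS = (
    "feed_publisher_name", "feed_publisher_url", "feed_lang",
    "feed_start_date", "feed_end_date", "feed_version",
    "feed_contact_email", "feed_contact_url",
)


def content_hash(row: dict, fields: tuple) -> str:
    """Hash the raw values of one parsed row in a fixed column order.

    Values are hashed as-is (no trimming or case folding), so only
    identical file values collapse into one raw record.
    """
    values = [row.get(name) or "" for name in fields]
    canonical = json.dumps(values, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Serializing a list rather than joining with a delimiter avoids collisions when a value itself contains the delimiter.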

### Feed-level role table

- `feed_provider_feed_role`
  - `feed_id`
  - `feed_provider_id`
  - `role`

Supported roles:

- `agency`
- `publisher`

## Process

1. Parse `agency.txt` and `feed_info.txt` when a GTFS dataset is processed.
2. Create or reuse `agency_txt` and `feed_info_txt` records using `content_hash`.
3. Link the dataset to those records.
4. If an exact known mapping exists, assign `feed_provider_id`.
5. If not, create a new `feed_provider` and leave it unlinked for human review.
6. When a raw record is linked to a provider, add any names and contacts from that record that the provider is missing.
7. Generate feed-level roles from the latest dataset.
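
Steps 2 and 3 amount to a create-or-reuse keyed on `content_hash`, followed by a dataset link. The in-memory sketch below illustrates that behaviour only; the `RawRecordStore` class and its method names are hypothetical, and a real implementation would presumably rely on a unique constraint in the database instead.

```python
from typing import Optional


class RawRecordStore:
    """Hypothetical in-memory stand-in for a raw table and its dataset link table."""

    def __init__(self) -> None:
        self.records: dict = {}        # content_hash -> raw record
        self.dataset_links: set = set()  # (dataset_id, content_hash) pairs

    def upsert_and_link(self, dataset_id: str, content_hash: str, values: dict) -> bool:
        """Create the raw record if its hash is new, then link the dataset.

        Returns True when a new raw record was created, False when an
        existing one was reused.
        """
        created = content_hash not in self.records
        if created:
            self.records[content_hash] = {"values": values, "feed_provider_id": None}
        self.dataset_links.add((dataset_id, content_hash))
        return created

    def provider_for(self, content_hash: str) -> Optional[int]:
        """Return the linked feed_provider_id, or None if not linked yet."""
        return self.records[content_hash]["feed_provider_id"]
```

Two datasets that carry an identical `agency.txt` row would then share one raw record and each hold its own link to it.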

## Matching rules


Reuse a known mapping only when both the name and the URL match exactly:

- For `agency.txt`: `agency_name` and `agency_url`
- For `feed_info.txt`: `feed_publisher_name` and `feed_publisher_url`
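
A minimal sketch of the exact-match lookup, assuming known mappings are available as a dictionary keyed by the `(name, url)` pair. The function name and the dictionary representation are hypothetical; the point is that no trimming, case folding, or fuzzy comparison is applied.

```python
from typing import Optional


def find_known_provider(name: str, url: str, known_mappings: dict) -> Optional[int]:
    """Return the feed_provider_id for an exact (name, url) pair, else None.

    known_mappings maps (name, url) tuples to feed_provider_id values.
    A match on the name alone or the URL alone is not enough.
    """
    return known_mappings.get((name, url))


known = {("Toronto Transit Commission", "https://www.ttc.ca"): 1}

# Exact pair: the mapping is reused.
print(find_known_provider("Toronto Transit Commission", "https://www.ttc.ca", known))
# Same URL, different name: left for human review.
print(find_known_provider("TTC", "https://www.ttc.ca", known))
```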

## API proposal

- `GET /v1/gtfs/datasets/{dataset_id}/agency-txt`
- `GET /v1/gtfs/datasets/{dataset_id}/feed-info-txt`
- `GET /v1/feeds/{feed_id}/feed-providers?role=agency|publisher`

## Acceptance criteria

- `agency.txt` and `feed_info.txt` are extracted.
- Identical raw records are stored once and linked to datasets.
- Raw records can be linked to a `feed_provider`.
- `feed_provider` can have multiple names and contacts.
- `feed_provider.organization_id` is optional.
- Feed-level provider roles are generated from linked records.
- Exact known mappings are reused.
- No fuzzy matching or confidence score is added.
