A Lightweight Public Transport Stops API as a Data Engineering Laboratory
This working paper investigates how a lightweight, read-only API built on top of open public transport data can function as a practical laboratory for data engineering, spatial querying, and applied analytics. Rather than focusing on performance optimization or production-grade guarantees, the study explores how minimal architectural choices—Parquet snapshots, in-process SQL engines, and simple HTTP interfaces—can enable meaningful experimentation and learning.
The work examines a thin JSON/HTTP wrapper around Auckland Transport’s bus stop dataset, designed to support exploratory use cases such as spatial proximity analysis, pagination, and identifier-based lookup. The goal is not to propose a generalized solution for geospatial APIs, but to reflect on the trade-offs involved in deliberately constrained systems intended for research, teaching, and prototyping.
This paper is exploratory and conceptual in nature, grounded in an applied implementation. It does not claim completeness, operational robustness, or external validity beyond its context. As a working paper, it presents observations, design decisions, and open questions that emerge from the implementation, rather than finalized results or prescriptive architectures. The intent is to invite reflection on how small, well-scoped data services can act as educational and experimental tools within the broader data engineering ecosystem.
General Information
Motivation
Modern data engineering discourse often centers on large-scale, production-ready platforms, which can obscure the value of smaller, intentionally limited systems. The motivation behind this investigation is to understand how a simple API, built on open data and modest infrastructure, can still support meaningful experimentation with data access patterns, spatial queries, and system design.
The Auckland Bus Stops API was conceived as a public utility rather than a commercial or mission-critical service. Its purpose is to lower the barrier to entry for working with real-world transport data while maintaining transparency about its limitations.
Scope and assumptions
This work focuses exclusively on stop-level public transport data derived from a GTFS-based snapshot. It assumes that the data is refreshed periodically, with no guarantee of freshness, and that users engage with the API for learning, research, or prototyping. Performance, availability, and strict correctness guarantees are explicitly out of scope.
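To make the stop-level scope concrete, the three exploratory access patterns named in the abstract (identifier-based lookup, pagination, and spatial proximity) can be sketched over a handful of in-memory records. This is a minimal illustration under stated assumptions, not the API's actual implementation: the field names follow the GTFS `stops.txt` convention (`stop_id`, `stop_name`, `stop_lat`, `stop_lon`), while the sample records, the function names, and the choice of the haversine formula for distance are hypothetical and chosen only for clarity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    # Field names mirror the GTFS stops.txt columns.
    stop_id: str
    stop_name: str
    stop_lat: float
    stop_lon: float


# Made-up records for illustration only; not real Auckland Transport data.
STOPS = [
    Stop("S1", "Example Stop A", -36.8485, 174.7633),
    Stop("S2", "Example Stop B", -36.8500, 174.7650),
    Stop("S3", "Example Stop C", -36.9000, 174.8000),
]

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; adequate for rough proximity


def haversine_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in metres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def get_stop(stop_id):
    """Identifier-based lookup; returns None when the id is unknown."""
    return next((s for s in STOPS if s.stop_id == stop_id), None)


def list_stops(limit=2, offset=0):
    """Offset/limit pagination over a stable ordering by stop_id."""
    ordered = sorted(STOPS, key=lambda s: s.stop_id)
    return ordered[offset:offset + limit]


def stops_near(lat, lon, radius_m):
    """Stops within radius_m of (lat, lon), nearest first."""
    hits = [(haversine_m(lat, lon, s.stop_lat, s.stop_lon), s) for s in STOPS]
    return [s for d, s in sorted(hits, key=lambda t: t[0]) if d <= radius_m]
```

A linear scan like this is only sensible for small snapshots. It is consistent with the paper's stated scope, in which performance and high-precision geospatial calculation are out of scope.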
Non-goals
This paper does not aim to design a full GTFS API, a real-time transport service, or a comprehensive GIS platform. Route planning, timetables, live vehicle tracking, and high-precision geospatial calculations are intentionally excluded. The system is not evaluated against production SLAs or scalability benchmarks.
Status of the investigation
The implementation is considered experimental and exploratory. Design choices are subject to change, and the findings presented here reflect the current state of the system rather than a finalized architecture.



