delivery

Oregon High School Tennis Rankings System

A data-driven ranking and playoff simulation system for Oregon high school tennis, designed to solve real problems in how teams are evaluated and seeded for state competition.

Role: Product Design, Data Engineering, Algorithm Design · Year: 2024 · Skills: Data Pipeline Design, Systems Thinking, Sports Analytics, Static Site Generation · Status: Live

The Problem Space

Oregon high school tennis faces a challenge common to many sports: how do you fairly rank teams that play in different leagues, different regions, and face vastly different competition?

The existing system relies heavily on win-loss records and subjective committee decisions. This creates several issues:

Schedule strength is invisible. A team going 10-2 against strong Portland-area competition looks worse on paper than a team going 14-0 in a weak rural league.
Roster depth doesn’t factor in. Tennis is unique—a team fields 7-8 positions (4 singles, 4 doubles). A team winning 6-1 every match demonstrates more depth than one scraping by 4-3, but both count as wins.
Playoff seeding is contentious. Without objective metrics, bracket placement involves guesswork and politics.
Travel logistics are ignored. A 16-team bracket using pure seeding might send a Pendleton team to Brookings (400+ miles) in round one when reasonable alternatives exist.

System Design Philosophy

1. Transparent, Defensible Methodology

Every ranking decision can be traced back to match data. The system uses established statistical approaches (RPI-style strength of schedule) combined with sport-specific innovations (flight-weighted scoring). When stakeholders ask “why is Team A ranked above Team B?”, there’s always a concrete answer.

2. Serve the Actual Users

The primary users are coaches and athletic directors—people who understand tennis but aren’t data scientists. Design decisions consistently prioritize usability:

Display raw scores (0-3.95 FWS) that map to intuitive concepts (“we averaged 2.8 flight wins per match”)
Use normalized values internally for calculations, but don’t expose that complexity
Provide “what-if” tools (playoff simulator) so users can explore scenarios themselves
Include head-to-head context directly in the interface

3. Handle Real-World Messiness

High school sports data is incomplete and inconsistent:

Matches get rained out or forfeited
Some teams play 8 flights, others play 6
Dual meets sometimes end in ties
Schools change classifications year-to-year
City names in the source data have typos and variations

The system handles all of this gracefully rather than failing or producing misleading results.

Technical Architecture

Data Pipeline

OSAA API → Raw JSON → Normalization → Statistical Processing → Static Site

Ingestion: Fetches match-level data from the OSAA athletics API, handling pagination and rate limiting.

Normalization: Reconciles school identities across years, maps classifications and leagues, handles missing fields.

Processing: Calculates cascading statistics where each metric depends on others:

Win Percentage → Opponent Win Percentage → Opponent’s Opponent Win Percentage → APR
Flight results → Weighted FWS → Normalized FWS → Power Index

Output: Generates a static HTML dashboard with embedded JSON data. No server required—hosts anywhere.

Key Design Principle: The system balances computational complexity with accessibility: sophisticated algorithms running behind simple, usable interfaces.

The Ranking Algorithm

Power Index = (APR × 0.50) + (Normalized FWS × 0.50)

This balances two questions:

APR (Adjusted Power Rating): “How good are you at winning, accounting for who you played?”
FWS (Flight-Weighted Score): “How deep is your roster?”

APR uses the RPI formula common in college sports:

25% your win percentage
50% your opponents’ win percentages
25% your opponents’ opponents’ win percentages

FWS weights each flight position:

#1 Singles/Doubles: 1.0 points (your best players)
#2 Singles: 0.75, #2 Doubles: 0.50
#3 positions: 0.25 each
#4 positions: 0.10 each

A team winning all 8 flights earns 3.95 points. But critically, the denominator adjusts for matches with fewer flights—a 6-flight match earning 3.0/3.0 rates the same as an 8-flight match earning 3.95/3.95.

Head-to-Head Tiebreaker Logic

When teams are close in Power Index, direct competition matters. The system implements nuanced H2H logic:

Trigger conditions: Teams within 2% PI difference, OR teams in the same league within 2 standings positions
Resolution: H2H winner moves up, but only if it doesn’t create circular conflicts (A beats B, B beats C, C beats A)
Split series: If teams split their matches, the tiebreaker defers to other metrics
Match ties: Uses FWS differential from the head-to-head match itself

Geographic Optimization

The playoff bracket simulator includes a “Regional Mode” that optimizes travel while respecting competitive integrity:

Protected Tier (Seeds 1-4): Pure seeding preserved—top teams earned their placement
Flex Tier (Seeds 5-12): Matchups optimized for geographic proximity, avoiding same-district pairings
Anchor Tier (Seeds 13-16): Placed by proximity to their likely round-two opponents

Distance calculations use real coordinates via OpenStreetMap geocoding, not estimates.

Key Engineering Decisions

Why Static Site Generation?

Zero infrastructure cost. Hosts on GitHub Pages, Netlify, or any static host.
No security surface. No database, no auth, no API to protect.
Offline-capable. Coaches can save the page and use it without connectivity.
Fast. All interactivity happens client-side with pre-processed data.

Why Embedded JSON Instead of API Calls?

The dataset is small enough (1,200 team-seasons) that embedding it directly in the page makes sense. This eliminates latency, simplifies deployment, and makes the system self-contained.

Why Cache Geocoding Results?

OpenStreetMap’s Nominatim API is free but rate-limited. Caching results means:

First build geocodes unknown cities (slowly, respectfully)
Subsequent builds use cached coordinates (instantly)
The cache ships with the repo, so fresh clones don’t need API access

Why Not Use an Existing Sports Analytics Platform?

High school sports have unique constraints:

No budget for commercial tools
Data access is limited and non-standard
Stakeholders need education, not just dashboards
The ranking methodology itself is part of the value

Impact and Stakeholder Value

For Coaches

Objective measure of team strength beyond win-loss
Understand where they stand in the state, not just their league
Simulate playoff scenarios before the bracket is set
Identify strength-of-schedule opportunities

For Athletic Directors

Data to support playoff seeding decisions
Reduce controversy around bracket placement
Historical tracking across seasons

For OSAA (Potential)

Model for modernizing playoff selection
Travel optimization could save thousands of miles across classifications
Transparent methodology reduces appeals and complaints

What This Project Demonstrates

Domain Modeling: Translating messy real-world rules (tennis scoring, OSAA classifications, tiebreaker hierarchies) into clean data structures.
Algorithm Design: Building ranking systems that balance multiple factors fairly, handle edge cases, and produce defensible results.
Systems Thinking: Understanding how components interact—data quality affects rankings, rankings affect tiebreakers, tiebreakers affect user trust.
User Empathy: Building for non-technical users without dumbing down the capability.
Pragmatic Engineering: Choosing boring technology (static HTML, embedded JSON) when it’s the right fit, rather than over-engineering.
Data Pipeline Design: Handling the full lifecycle from external API to processed output, with appropriate caching and error handling.

The Problem Space

System Design Philosophy

1. Transparent, Defensible Methodology

2. Serve the Actual Users

3. Handle Real-World Messiness

Technical Architecture

Data Pipeline

The Ranking Algorithm

Head-to-Head Tiebreaker Logic

Geographic Optimization

Key Engineering Decisions

Why Static Site Generation?

Why Embedded JSON Instead of API Calls?

Why Cache Geocoding Results?

Why Not Use an Existing Sports Analytics Platform?

Impact and Stakeholder Value

For Coaches

For Athletic Directors

For OSAA (Potential)

What This Project Demonstrates

Related Links