Building Real-time ETA Prediction at Scale

2024

This post covers how we built a real-time ETA prediction system that handles 50,000 requests per day with p99 latency under 100ms.

The Problem

Our ride-hailing platform needed accurate ETA predictions for:

  • Driver-passenger matching
  • Estimated pickup times
  • Dynamic pricing calculations

Initial Approach:

eta = distance / average_speed

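But this gave us errors of about 5 minutes on average, because a single average speed cannot reflect conditions such as time of day, weather, or traffic.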
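For reference, the baseline above can be written as a few lines of Python. This is a minimal sketch: the function name, the units (kilometres, km/h, seconds), and the 30 km/h figure in the example are assumptions for illustration, not values from our system.

```python
def naive_eta_seconds(distance_km: float, average_speed_kmh: float) -> float:
    """Baseline ETA: distance divided by one assumed average speed."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    if average_speed_kmh <= 0:
        raise ValueError("average_speed_kmh must be positive")
    # Convert hours to seconds; multiply first to keep the arithmetic exact
    # for round inputs.
    return distance_km * 3600.0 / average_speed_kmh


# A 6 km trip at an assumed 30 km/h average takes 12 minutes.
print(naive_eta_seconds(6.0, 30.0))  # 720.0
```

The same 6 km trip gets the same answer at 3 a.m. and at rush hour, which is where the large errors come from.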

The Solution

We built a machine learning pipeline using:

  1. XGBoost for primary predictions
  2. CatBoost for edge cases
  3. Feature engineering including:
    • Historical travel times
    • Time of day patterns
    • Weather data
    • Traffic incidents
    • Driver availability zones
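To make the feature list concrete, here is a sketch of how one request might be turned into a numeric feature vector. Every field name and the `TripRequest` type are hypothetical, since this post does not describe the real schema. The sine/cosine encoding of time of day is one common choice, not necessarily the one used in production.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TripRequest:
    # Hypothetical fields, chosen to mirror the feature list above.
    distance_km: float
    requested_at: datetime
    historical_median_s: float  # median past travel time for this route
    is_raining: bool            # stand-in for richer weather data
    active_incidents: int       # traffic incidents along the route
    zone_id: int                # driver availability zone


def build_features(req: TripRequest) -> list[float]:
    """Turn one request into a flat numeric feature vector."""
    # Put time of day on a circle so 23:59 and 00:00 end up close together.
    minutes = req.requested_at.hour * 60 + req.requested_at.minute
    angle = 2 * math.pi * minutes / (24 * 60)
    return [
        req.distance_km,
        req.historical_median_s,
        math.sin(angle),
        math.cos(angle),
        float(req.requested_at.weekday()),  # 0 = Monday
        1.0 if req.is_raining else 0.0,
        float(req.active_incidents),
        float(req.zone_id),  # passed as a plain number here for simplicity
    ]


req = TripRequest(6.0, datetime(2024, 1, 1, 6, 0), 700.0, True, 2, 17)
print(len(build_features(req)))  # 8
```

Rows like this can be stacked into a matrix and handed to a gradient-boosted tree model for training and inference.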

Results

Metric                  Value
Average error           49 seconds
p99 latency             <100ms
Daily requests          50,000
Accuracy improvement    65%
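For readers who want to track similar numbers, the sketch below shows one way to compute an average error and a p99 latency from logged samples. It assumes "average error" means mean absolute error and uses the nearest-rank definition of a percentile. Both are assumptions, and the sample values are made up for illustration.

```python
import math


def mean_absolute_error(predicted_s, actual_s):
    """Average of |predicted - actual| over paired samples, in seconds."""
    if not predicted_s or len(predicted_s) != len(actual_s):
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(abs(p - a) for p, a in zip(predicted_s, actual_s))
    return total / len(predicted_s)


def percentile_nearest_rank(values, pct):
    """Smallest sample with at least pct percent of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up samples: three trips and 100 request latencies of 1..100 ms.
print(mean_absolute_error([600, 300, 900], [650, 280, 960]))
print(percentile_nearest_rank(list(range(1, 101)), 99))  # 99
```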

Lessons Learned

  1. Feature engineering > model complexity
  2. Cache aggressively with expiration
  3. Fall back to simple heuristics for edge cases
  4. Monitor distribution shifts in production
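Lessons 2 and 3 can be combined in a small serving wrapper. The sketch below is illustrative only: `TTLCache` and `predict_eta` are hypothetical names. It treats a model exception or an implausible output (negative or non-finite) as an "edge case", which is an assumption about where a fallback would trigger.

```python
import math
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store = {}     # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def put(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


def predict_eta(key, features, model, cache, fallback):
    """Return a fresh cached ETA, else the model's ETA, else the heuristic."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    try:
        eta = model(features)
    except Exception:
        return fallback(features)  # model failed: do not cache the heuristic
    if not math.isfinite(eta) or eta < 0:
        return fallback(features)  # implausible output: same treatment
    cache.put(key, eta)
    return eta
```

Here `model` and `fallback` are any callables that map features to seconds, for example a trained model's predict wrapper and the distance / average_speed formula. Only model outputs are cached, so a temporary model outage does not pin the cruder heuristic in the cache for a full expiration window.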

Future Work

  • Incorporate real-time traffic data
  • Experiment with deep learning for pattern recognition
  • A/B test different model versions
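For the A/B testing item, one common way to split traffic between model versions is deterministic hashing, so the same ride always sees the same model. The sketch below is a possible starting point, not a description of anything we have built. The function name, the salt string, and the 10% default share are all hypothetical.

```python
import hashlib


def assign_variant(ride_id: str, treatment_share: float = 0.1,
                   salt: str = "eta-model-experiment") -> str:
    """Deterministically map a ride to 'treatment' or 'control'."""
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be between 0 and 1")
    # Salting the hash keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{salt}:{ride_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return "treatment" if bucket < treatment_share * 10_000 else "control"


# The same ride ID always lands in the same group.
print(assign_variant("ride-123") == assign_variant("ride-123"))  # True
```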