Building Real-time ETA Prediction at Scale
2024
This post covers how we built a real-time ETA prediction system that handles 50,000 requests per day with p99 latency under 100ms.
The Problem
Our ride-hailing platform needed accurate ETA predictions for:
- Driver-passenger matching
- Estimated pickup times
- Dynamic pricing calculations
Our initial approach was a constant-speed estimate:
eta = distance / average_speed
This was off by about 5 minutes on average.
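For reference, the baseline above can be sketched as a small function. The units (kilometres, km/h, seconds) and the function name are assumptions made for illustration; the original formula does not specify them.

```python
def naive_eta_seconds(distance_km: float, average_speed_kmh: float) -> float:
    """Baseline ETA: travel time at a single constant average speed."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    if average_speed_kmh <= 0:
        raise ValueError("average_speed_kmh must be positive")
    # hours = km / (km/h); multiply by 3600 to get seconds.
    return distance_km * 3600.0 / average_speed_kmh


# A 6 km trip at an assumed 30 km/h average works out to 12 minutes.
print(naive_eta_seconds(6.0, 30.0) / 60.0)
```

A heuristic like this is cheap and never fails on valid input, which is also what makes it a reasonable fallback when a model prediction is unavailable.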
The Solution
We built a machine learning pipeline using:
- XGBoost for primary predictions
- CatBoost for edge cases
- Feature engineering, including:
  - Historical travel times
  - Time of day patterns
  - Weather data
  - Traffic incidents
  - Driver availability zones
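To make the feature list concrete, here is a minimal sketch of turning one request into a flat feature row. Every name in it (`TripContext`, `build_features`, the field names) is hypothetical, and the cyclical hour encoding is one common choice for time-of-day patterns, not necessarily the one used in our pipeline.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TripContext:
    """Hypothetical raw inputs for one ETA request."""
    requested_at: datetime
    historical_travel_time_s: float  # e.g. a past average for this route
    is_raining: bool
    active_incidents_on_route: int
    driver_zone_id: int


def build_features(ctx: TripContext) -> dict[str, float]:
    """Turn raw context into a flat numeric feature row."""
    hour = ctx.requested_at.hour + ctx.requested_at.minute / 60.0
    angle = 2.0 * math.pi * hour / 24.0
    return {
        "hist_travel_time_s": ctx.historical_travel_time_s,
        # Cyclical encoding keeps 23:30 and 00:30 close together.
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
        "is_weekend": 1.0 if ctx.requested_at.weekday() >= 5 else 0.0,
        "is_raining": 1.0 if ctx.is_raining else 0.0,
        "incident_count": float(ctx.active_incidents_on_route),
        # Stored as a number here; a real pipeline would likely treat it as categorical.
        "driver_zone_id": float(ctx.driver_zone_id),
    }
```

A row like this could then be passed to a gradient-boosted model. Keeping feature construction in one pure function also makes it easier to use the same logic for training and serving.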
Results
| Metric | Value |
|---|---|
| Average error | 49 seconds |
| p99 latency | <100ms |
| Daily requests | 50,000 |
| Accuracy improvement | 65% |
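As a rough illustration of how metrics like these can be computed from logged data, the sketch below implements mean absolute error and a nearest-rank percentile. The function names and the nearest-rank definition are assumptions for this example; the post does not state which error or percentile definition the table uses.

```python
import math
from typing import Sequence


def mean_absolute_error(predicted_s: Sequence[float], actual_s: Sequence[float]) -> float:
    """Average absolute gap between predicted and observed travel times."""
    if not predicted_s or len(predicted_s) != len(actual_s):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted_s, actual_s)) / len(predicted_s)


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Smallest value such that at least pct percent of samples are <= it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[rank - 1]
```

With these, p99 latency is `percentile_nearest_rank(latencies_ms, 99)` over a window of request latencies.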
Lessons Learned
- Feature engineering mattered more than model complexity
- Cache aggressively, but give every entry an expiration
- Fall back to simple heuristics for edge cases
- Monitor production for distribution shifts
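The caching and fallback lessons can be combined in a few lines. This is a minimal in-memory sketch, not our production code: `TTLCache` and `eta_with_fallback` are hypothetical names, and a real deployment would more likely use a shared cache service.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, float]] = {}

    def get(self, key: Hashable) -> Optional[float]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop stale entries lazily on read
            return None
        return value

    def put(self, key: Hashable, value: float) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def eta_with_fallback(
    key: Hashable,
    cache: TTLCache,
    predict: Callable[[], float],
    fallback: Callable[[], float],
) -> float:
    """Serve from cache, else the model, else the simple heuristic."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    try:
        eta = predict()
    except Exception:
        # Model failed: answer with the heuristic, but do not cache it,
        # so the next request tries the model again.
        return fallback()
    cache.put(key, eta)
    return eta
```

The expiration matters because traffic conditions change quickly; a short TTL bounds how stale a served ETA can be.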
Future Work
- Incorporate real-time traffic data
- Experiment with deep learning for pattern recognition
- A/B test different model versions
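For the A/B testing item, one common way to split traffic between model versions is deterministic hashing, so the same rider or request always sees the same variant. The sketch below is an assumption about how this could be done; the function name, salt, and variant names are all hypothetical.

```python
import hashlib
from typing import Dict


def assign_model_variant(unit_id: str, variants: Dict[str, float], salt: str = "eta-ab-v1") -> str:
    """Deterministically map an id to a variant according to traffic shares."""
    total = sum(variants.values())
    if total <= 0:
        raise ValueError("variant shares must sum to a positive number")
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).digest()
    # Reduce the hash to one of 10,000 buckets, then to a point in [0, 1).
    point = (int.from_bytes(digest[:8], "big") % 10_000) / 10_000
    cumulative = 0.0
    for name, share in variants.items():
        cumulative += share / total
        if point < cumulative:
            return name
    return name  # guards against floating-point rounding on the last bucket
```

For example, `assign_model_variant("rider-123", {"control": 0.9, "candidate": 0.1})` sends roughly 10% of ids to the candidate model. Changing the salt reshuffles assignments for a new experiment.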