Why 98.5% of Drive-Throughs Still Don’t Use Voice AI (And What’s About to Change)

Aug 15

Justin Foster is a voice AI entrepreneur and industry expert who founded Incept, a restaurant technology startup specializing in advanced audio processing for voice AI systems. With deep experience in the restaurant voice AI space, Foster previously worked at Presto, another major player in the industry. He brings a unique technical perspective to the challenges of implementing AI-powered ordering systems, particularly focusing on the critical but often overlooked audio processing layer that enables accurate speech recognition in noisy drive-through environments. Foster hosts his own podcast and is passionate about improving both employee and customer experiences through thoughtful technology implementation.

Voice AI Technology Evolution

  • First Generation: Used intent modeling with tree-and-branch approaches (rules-based systems like McDonald’s Apprente/IBM implementation)
  • Current Generation: Leverages large language models for better intent understanding, making the technology more accessible to startups

Four Core Components of Voice AI Systems

  1. Raw audio capture – Capturing the customer’s speech as a usable audio signal in noisy drive-through environments
  2. Speech-to-text transcription – Processing audio into readable text
  3. Intent processing – Understanding what customers actually want to order
  4. Text-to-speech response – Communicating back to customers naturally
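
As a rough illustration of how the four components above hand off to one another, here is a minimal Python sketch. The class name `VoiceOrderPipeline`, its field names, and the stand-in lambda stages are all hypothetical and were not described in the discussion; a real system would plug a noise-cancellation step, a speech-to-text model, a language model, and a text-to-speech engine into the same four slots.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceOrderPipeline:
    """Chains the four stages; every stage is a pluggable callable (hypothetical design)."""
    clean_audio: Callable[[bytes], bytes]    # 1. raw audio capture and cleanup
    transcribe: Callable[[bytes], str]       # 2. speech-to-text transcription
    extract_intent: Callable[[str], dict]    # 3. intent processing
    synthesize: Callable[[str], bytes]       # 4. text-to-speech response

    def handle_utterance(self, raw_audio: bytes) -> tuple[dict, bytes]:
        audio = self.clean_audio(raw_audio)
        text = self.transcribe(audio)
        intent = self.extract_intent(text)
        reply = f"Got it: {intent['quantity']} {intent['item']}. Anything else?"
        return intent, self.synthesize(reply)


def toy_intent(text: str) -> dict:
    """Stand-in intent parser: expects '<quantity> <item>' and nothing else."""
    quantity, item = text.split(" ", 1)
    return {"item": item, "quantity": int(quantity)}


# Stand-in stages so the sketch runs without real audio hardware or model services.
pipeline = VoiceOrderPipeline(
    clean_audio=lambda raw: raw.strip(),             # pretend padding is "noise"
    transcribe=lambda audio: audio.decode("utf-8"),  # pretend the bytes are already text
    extract_intent=toy_intent,
    synthesize=lambda reply: reply.encode("utf-8"),  # pretend encoded text is speech
)

intent, spoken = pipeline.handle_utterance(b"  2 cheeseburgers  ")
```

Keeping each stage behind its own callable reflects the point that audio quality is a separate layer: the cleanup step can be swapped or tuned without touching transcription or intent handling.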

Technical Challenges & Solutions

  • Audio processing: The biggest differentiator, requiring specialized noise cancellation for drive-through environments (wind, rain, multiple voices, engine noise)
  • Latency concerns: Systems must respond naturally without artificial delays
  • Accuracy standards: Across the industry, systems complete roughly 80-85% of orders without human intervention
  • Guardrails: Preventing AI hallucinations that could confirm unavailable menu items
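
One simple way to picture the guardrail idea above is a check that sits between the language model and the confirmation step, so that only items actually on the menu can be read back to the customer. The sketch below is an assumption for illustration, not a method described in the discussion; `MENU` and `validate_order` are hypothetical names, and a production system would also need to handle modifiers, synonyms, and out-of-stock items.

```python
MENU = {"cheeseburger", "fries", "cola"}  # hypothetical list of available items


def validate_order(proposed_items: list[str], menu: set[str]) -> tuple[list[str], list[str]]:
    """Split model-proposed items into confirmable and rejected lists."""
    accepted: list[str] = []
    rejected: list[str] = []
    for item in proposed_items:
        name = item.strip().lower()  # normalize before comparing against the menu
        if name in menu:
            accepted.append(name)
        else:
            rejected.append(name)    # never confirm this; ask the guest or hand off to staff
    return accepted, rejected


accepted, rejected = validate_order(["Cheeseburger", "milkshake", " fries "], MENU)
```

Here the made-up "milkshake" ends up in `rejected`, so the system could ask a clarifying question or route the order to an employee instead of confirming an unavailable item.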

Current State & Economics

  • Only ~1.5% of North American drive-throughs currently use voice AI
  • Technology costs more than most restaurant tech stack components due to foundation model expenses
  • ROI depends on restaurant volume (6,000+ monthly transactions recommended for labor savings)
  • Additional benefits include increased upselling, consistent service, and improved employee satisfaction
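
To make the volume dependence above concrete, a break-even calculation can be sketched as follows. The function name and both input figures are hypothetical placeholders, not pricing or savings data from the discussion; they were picked only so the arithmetic lands on the 6,000-transaction figure mentioned above, and the sketch ignores upselling and the other benefits listed.

```python
import math


def break_even_transactions(monthly_system_cost: float, labor_savings_per_order: float) -> int:
    """Monthly orders needed before labor savings cover the system cost."""
    if labor_savings_per_order <= 0:
        raise ValueError("labor_savings_per_order must be positive")
    return math.ceil(monthly_system_cost / labor_savings_per_order)


# Placeholder inputs: a 1,500 monthly cost and 0.25 saved per order (both invented).
threshold = break_even_transactions(1500.0, 0.25)
```

A restaurant below the resulting threshold would have to justify the spend through the additional benefits rather than labor savings alone.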

Future Outlook

  • Movement toward small language models and edge computing (3-5 years away)
  • Enhanced personalization through vehicle recognition, voice prints, or loyalty integration
  • Expansion beyond order-taking to employee training, reservations, and comprehensive guest experience management
  • Multilingual capabilities already available for top 30-40 global languages

The discussion emphasized that voice AI implementation is a journey requiring training and optimization rather than a plug-and-play solution, with audio quality being the foundation for system accuracy.


Subscribe to the Restaurant AI Podcast on YouTube or Spotify

Follow the Restaurant AI Podcast on LinkedIn

Connect with Matt Wampler and ClearCOGS on LinkedIn

Connect with Justin Foster on LinkedIn