Why 98.5% of Drive-Throughs Still Don't Use Voice AI

Justin Foster is a voice AI entrepreneur and industry expert who founded Incept, a restaurant technology startup specializing in advanced audio processing for voice AI systems. Foster, who has deep experience in restaurant voice AI, previously worked at Presto, another major player in the industry. He brings a technical perspective to the challenges of deploying AI-powered ordering systems, with a particular focus on the critical but often overlooked audio processing layer that makes accurate speech recognition possible in noisy drive-through environments. Foster hosts his own podcast and is passionate about improving both employee and customer experiences through thoughtful use of technology.

Voice AI Technology Evolution

First Generation: Used intent modeling with tree-and-branch approaches (rules-based systems such as the McDonald’s Apprente/IBM implementation)
Current Generation: Leverages large language models for better intent understanding, making the technology more accessible to startups
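To make the "tree-and-branch" idea concrete, here is a toy sketch of a rules-based intent matcher in the spirit of first-generation systems. It is not the McDonald’s/IBM code; the keywords and intent labels are hypothetical. The third case shows the brittleness that pushed the industry toward LLMs: a phrase the rule authors did not anticipate ("with no pickles") gets misrouted.

```python
# Toy tree-and-branch intent matcher (hypothetical keywords and intent labels).
# Each branch is a fixed keyword check, evaluated top to bottom.

def classify_intent(utterance: str) -> str:
    """Walk a fixed decision tree of keyword checks and return an intent label."""
    text = utterance.lower()
    # Branch 1: does the customer seem to be changing an existing order?
    if any(word in text for word in ("remove", "cancel", "no ")):
        return "modify_order"
    # Branch 2: is the customer finishing up?
    if any(phrase in text for phrase in ("that's it", "that's all", "done")):
        return "complete_order"
    # Branch 3: does the utterance mention an item the tree knows how to add?
    if any(item in text for item in ("burger", "fries", "shake")):
        return "add_item"
    # Anything the tree was not written to handle falls through.
    return "unknown"


print(classify_intent("Can I get a burger"))               # add_item
print(classify_intent("That's all, thanks"))               # complete_order
print(classify_intent("I want a burger with no pickles"))  # modify_order (misrouted)
print(classify_intent("Something spicy please"))           # unknown
```

Every new phrasing needs another hand-written branch, whereas an LLM-based intent layer can generalize to wording it was never explicitly given.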

Four Core Components of Voice AI Systems

Raw audio capture – Capturing and cleaning up sound in noisy drive-through environments
Speech-to-text transcription – Processing audio into readable text
Intent processing – Understanding what customers actually want to order
Text-to-speech response – Communicating back to customers naturally
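The four components form a pipeline, where each stage feeds the next and each adds to the customer's wait. The sketch below wires them together and times each stage. Every stage is a hypothetical stub (the noise gate threshold, the canned transcript, and the three-item menu are assumptions); a real system would plug in a noise-suppression model, a speech-to-text engine, an LLM-based intent layer, and a text-to-speech voice.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stubs for the four stages of a voice ordering turn.

def capture_audio(raw: List[float]) -> List[float]:
    """Stage 1: crude noise gate that zeroes samples below a fixed threshold."""
    threshold = 0.1  # hypothetical; real drive-through audio needs far more than a gate
    return [s if abs(s) >= threshold else 0.0 for s in raw]

def speech_to_text(audio: List[float]) -> str:
    """Stage 2: stub transcription that returns a canned phrase if any signal remains."""
    return "one burger and fries" if any(audio) else ""

def process_intent(text: str) -> Dict[str, int]:
    """Stage 3: stub intent parser that counts known menu words."""
    menu = ("burger", "fries", "shake")
    return {item: text.count(item) for item in menu if item in text}

def text_to_speech(order: Dict[str, int]) -> str:
    """Stage 4: stub response; returns the text that would be synthesized."""
    if not order:
        return "Sorry, could you repeat that?"
    items = ", ".join(f"{qty} {name}" for name, qty in order.items())
    return f"I have {items}. Anything else?"

@dataclass
class TurnResult:
    reply: str
    stage_ms: Dict[str, float] = field(default_factory=dict)

def run_turn(raw_audio: List[float]) -> TurnResult:
    """Run one customer turn through all four stages, timing each in milliseconds."""
    timings: Dict[str, float] = {}

    def timed(name, fn, arg):
        start = time.perf_counter()
        out = fn(arg)
        timings[name] = (time.perf_counter() - start) * 1000.0
        return out

    audio = timed("audio", capture_audio, raw_audio)
    text = timed("stt", speech_to_text, audio)
    order = timed("intent", process_intent, text)
    reply = timed("tts", text_to_speech, order)
    return TurnResult(reply=reply, stage_ms=timings)


result = run_turn([0.5, 0.02, -0.3])
print(result.reply)  # I have 1 burger, 1 fries. Anything else?
```

Because stage 1 feeds everything downstream, a poor audio stage degrades transcription, intent, and the spoken reply alike, which is why audio quality is treated as the foundation of accuracy.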

Technical Challenges & Solutions

Audio processing: The biggest differentiator, requiring specialized noise cancellation for drive-through environments (wind, rain, multiple voices, engine noise)
Latency concerns: Systems must respond quickly enough to sound natural, without noticeable pauses
Accuracy standards: Across the industry, systems currently complete about 80-85% of orders without human intervention
Guardrails: Preventing AI hallucinations that could confirm unavailable menu items
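One common way to implement such a guardrail is to check every item the model proposes against live menu data before anything is read back to the customer. The sketch below assumes a hypothetical menu and out-of-stock list (in practice these would come from the POS or menu system); rejected items would trigger a clarifying question or a hand-off to staff rather than a confirmation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical menu state; a real deployment would sync this from the POS.
AVAILABLE_ITEMS: Set[str] = {"burger", "fries", "shake"}
OUT_OF_STOCK: Set[str] = {"shake"}

def apply_guardrail(proposed: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Split a model-proposed order into confirmable items and rejected items.

    Only items that exist on the menu, are in stock, and have a positive
    quantity may be confirmed; everything else is returned for follow-up
    instead of being read back as if it were valid.
    """
    confirmed: Dict[str, int] = {}
    rejected: List[str] = []
    for item, qty in proposed.items():
        if item in AVAILABLE_ITEMS and item not in OUT_OF_STOCK and qty > 0:
            confirmed[item] = qty
        else:
            rejected.append(item)
    return confirmed, rejected


confirmed, rejected = apply_guardrail({"burger": 2, "shake": 1, "lobster": 1})
print(confirmed)  # {'burger': 2}
print(rejected)   # ['shake', 'lobster']
```

The key design choice is that the model never has the final word on availability: a deterministic check against menu data sits between the model's output and the confirmation the customer hears.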

Current State & Economics

Only ~1.5% of North American drive-throughs currently use voice AI
The technology costs more than most other components of a restaurant tech stack, largely because of foundation model expenses
ROI depends on restaurant volume: 6,000+ monthly transactions is the recommended threshold for realizing labor savings
Additional benefits include increased upselling, consistent service, and improved employee satisfaction
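The volume threshold and the completion rate combine in a simple way: only orders completed without human intervention free up staff time. The sketch below makes that arithmetic explicit. The break-even function is a hypothetical simplification that assumes labor savings scale linearly with automated orders; the system cost and per-order saving are placeholder inputs, since no cost figures were given in the discussion.

```python
def automated_orders(monthly_transactions: int, completion_rate: float) -> int:
    """Estimate how many orders per month finish without human intervention."""
    if not 0.0 <= completion_rate <= 1.0:
        raise ValueError("completion_rate must be between 0 and 1")
    return round(monthly_transactions * completion_rate)

def break_even_transactions(monthly_cost: float,
                            completion_rate: float,
                            saving_per_automated_order: float) -> float:
    """Monthly volume at which estimated labor savings cover the system's cost.

    Hypothetical linear model: savings = volume * completion_rate * saving per order.
    All inputs are placeholders, not figures from the source.
    """
    if completion_rate <= 0 or saving_per_automated_order <= 0:
        raise ValueError("completion_rate and saving_per_automated_order must be positive")
    return monthly_cost / (completion_rate * saving_per_automated_order)


print(automated_orders(6000, 0.80))  # 4800
print(automated_orders(6000, 0.85))  # 5100
```

With the figures cited above, a restaurant at 6,000 monthly transactions would see roughly 4,800 to 5,100 orders a month handled end to end by the system; lower-volume sites spread the same fixed cost over fewer automated orders.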

Future Outlook

Movement toward small language models and edge computing (3-5 years away)
Enhanced personalization through vehicle recognition, voice prints, or loyalty integration
Expansion beyond order-taking to employee training, reservations, and comprehensive guest experience management
Multilingual capabilities already available for top 30-40 global languages

The discussion emphasized that voice AI implementation is a journey requiring training and optimization rather than a plug-and-play solution, with audio quality being the foundation for system accuracy.

Subscribe to the Restaurant AI Podcast on YouTube or Spotify

Follow the Restaurant AI Podcast on LinkedIn

Connect with Matt Wampler and ClearCOGS on LinkedIn

Connect with Justin Foster on LinkedIn
