In the race to make AI models not just reason better but respond faster, latency—the delay before an answer appears—is often treated as a purely technical constraint, something to minimize and move past. But how does this relentless push for speed actually affect the people using these systems every day?
There is a rich body of work in human–computer interaction linking faster response times to better usability. But AI models are fundamentally different from the deterministic systems that previous research was built on. When you wait for a file to download or a page to load, the outcome is fixed and predictable.
AI models are probabilistic: you cannot anticipate the precise response. And because the interface is conversational, users naturally read human social cues into the interaction; a pause might be interpreted as the AI “thinking,” for instance. Meanwhile, users are increasingly asked to choose between faster models and slower, deeper-reasoning ones, with little guidance on what that choice actually means for their experience.
