Ouri Wolfson — How to Determine if an AI Agent is Conscious?

A question discussed extensively in the recent popular and scientific literature is whether existing large language models such as ChatGPT are conscious (or sentient). Assuming that machine consciousness emerges as a robot or an AI agent interacts with the world, this presentation addresses the question: how would humans know whether the agent is or was conscious? Since subjective experience is first and foremost subjective, the most natural answer is to program the agent to inform an authority when it becomes conscious. However, the agent may behave deceptively, and in fact LLMs are known to have done so (Park et al. 2024). We therefore propose a formal mechanism M that can be employed to prevent the agent from lying about its own consciousness. This solves the deceptiveness problem but raises the question of whether M can interfere with the agent’s functionality or its acquisition of consciousness. We prove mathematically that, under very reasonable conditions, it cannot. In other words, under these conditions M can be installed in the agent without interfering with its functionality or consciousness acquisition, while also guaranteeing that the agent will be honest about its own consciousness.
