AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

We present the first open-source benchmark to evaluate LLMs in their ability to operate as agents in simulated clinical environments. Diagnosing and managing a patient is a complex, sequential decision-making process that requires physicians to obtain information (for example, by deciding which tests to perform) and to act upon it. Recent advances in artificial intelligence (AI) and large language models (LLMs) promise to profoundly impact clinical care. However, current evaluation schemes over-rely on static medical question-answering benchmarks and fall short on the interactive decision-making required in real-life clinical work. Here, we present AgentClinic: a multimodal benchmark to evaluate LLMs in their ability to operate as agents in simulated clinical environments. In our benchmark, the doctor agent must uncover the patient’s diagnosis through dialogue and active data collection. We present two open benchmarks: a multimodal image and dialogue environment, AgentClinic-NEJM, and a dialogue-only environment, AgentClinic-MedQA. Agents in AgentClinic-MedQA are grounded in cases from the US Medical Licensing Exam (USMLE), and agents in AgentClinic-NEJM are grounded in multimodal New England Journal of Medicine (NEJM) case challenges. We embed cognitive and implicit biases in both patient and doctor agents to emulate realistic interactions between biased agents. We find that introducing bias leads to large reductions in the diagnostic accuracy of the doctor agents, as well as reduced compliance, confidence, and follow-up consultation willingness in patient agents. Evaluating a suite of state-of-the-art LLMs, we find that several models that excel in benchmarks like MedQA perform poorly in AgentClinic-MedQA. We find that the LLM used in the patient agent is an important factor for performance in the AgentClinic benchmark. We show that both too few and too many interactions reduce diagnostic accuracy in doctor agents.
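To make the interactive setup in the abstract more concrete, here is a minimal sketch of a doctor agent questioning a patient agent under a turn budget. It is not the AgentClinic code: the class names, the `run_case` and `demo` helpers, the rule table, and the toy case are all hypothetical, and scripted rule-based agents stand in for the LLMs the benchmark actually evaluates.

```python
from dataclasses import dataclass, field


@dataclass
class PatientAgent:
    """Hypothetical scripted patient: reveals a finding only when asked about it."""
    findings: dict   # topic -> answer
    diagnosis: str   # ground truth, hidden from the doctor

    def answer(self, topic: str) -> str:
        return self.findings.get(topic, "I'm not sure.")


@dataclass
class DoctorAgent:
    """Hypothetical rule-based doctor standing in for an LLM-driven agent."""
    questions: list  # topics to ask about, in order
    rules: dict      # frozenset of (topic, answer) pairs -> diagnosis
    notes: dict = field(default_factory=dict)

    def next_question(self):
        # Ask the first topic that has not been covered yet.
        for topic in self.questions:
            if topic not in self.notes:
                return topic
        return None

    def diagnose(self) -> str:
        # Return the first diagnosis whose required evidence has all been observed.
        observed = set(self.notes.items())
        for evidence, diagnosis in self.rules.items():
            if evidence <= observed:
                return diagnosis
        return "unknown"


def run_case(doctor: DoctorAgent, patient: PatientAgent, max_turns: int) -> bool:
    """Run one dialogue with a turn budget; return True if the diagnosis is correct."""
    for _ in range(max_turns):
        topic = doctor.next_question()
        if topic is None:
            break
        doctor.notes[topic] = patient.answer(topic)
    return doctor.diagnose() == patient.diagnosis


def demo(max_turns: int) -> bool:
    # Toy case invented for illustration; not taken from MedQA or NEJM.
    patient = PatientAgent(
        findings={"fever": "yes", "rash": "yes", "cough": "no"},
        diagnosis="measles",
    )
    doctor = DoctorAgent(
        questions=["cough", "fever", "rash"],
        rules={frozenset({("fever", "yes"), ("rash", "yes")}): "measles"},
    )
    return run_case(doctor, patient, max_turns)
```

In this toy setting, `demo(1)` fails because the doctor runs out of turns before collecting the evidence it needs, while `demo(3)` succeeds. That mirrors only the "too few interactions" half of the abstract's finding; the drop in accuracy with too many interactions is a property of the LLM agents and is not modeled here.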

Learning to Reason with LLMs

Some big claims here: https://openai.com/index/learning-to-reason-with-llms/

OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on…


We are introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding to the user.

Loss of the Primal Eye, R.E.M as Phasic Transients, and the origins of Dreaming

NEW PAPER — Loss of the Primal Eye in evolution, REM explained as phasic transients, and the emergence of DREAMING in E1 animals. MA dissertation Philosophy, University of Leeds 1995/1996.


There are a number of reasons why dreaming has been, and remains, an important area for philosophy. Dreams are ‘pure’ experiential phenomena not (seemingly) requiring input from the outside world via the special senses. As Aristotle puts it, “If all creatures, when the eyes are closed in sleep, are unable to see, and the analogous statement is true of the other senses, so that manifestly we perceive nothing when asleep; we may conclude that it is not by sense-perception we perceive a dream”. A major part of this dissertation is concerned with issues raised in Owen Flanagan’s (1995) article, Deconstructing Dreams: The Spandrels of Sleep. The Primal Eye/MVT account of consciousness gives p-dreaming a more central explanatory role, and I argue that p-dreams are not epiphenomena in the way Flanagan claims. An important omission from Flanagan’s account is any discussion of important dreaming-related phenomena. I look at lucid dreaming, hypnosis and other mental phenomena in relation to the evolutionary loss of the primal/median/parietal eye, and postulate that the rapid eye movements of REM sleep are ‘phasic transients’, treating the E1 brain, which includes the lateral eyes, as a consciousness-producing circuit. In brief, Primal Eye/Median Vision Theory (MVT) holds that the capacity for abstract/centrally evoked mentation is a direct result of the evolutionary loss of the primal eye. E2 brains (earlier hardwired brains with both primal and lateral eyes) have evolved over millions of years into E1 brain circuits: analog (analogous to infinite-state) types of self-regulating plastic circuits, with no primal/pineal eye, but retaining the lateral eyes and the pineal gland. Loss of this ‘lockstep mechanism’, the median/primal/parietal/pineal eye, not only allowed new sleeping mental phenomena such as dreaming, but also ushered in new types of waking mental abstraction freed from the E2 brain’s involuntary, direct (electro-chemical) primal eye responses to changes in the physical environment. These include daydreams, visualisation with both lateral eyes closed, self-volition or self-determined choices, and so on.

The Transformative Power Of Digital Twin Technology In Space Exploration

Integrating diverse data sources with different formats and standards also presents considerable challenges. Promoting open-source platforms and standardizing data formats are critical for facilitating data exchange within the space industry.

Robbie Robertson, CEO of Sedaro, identifies the main barrier to integrating digital twin technology as a cultural shift rather than technical feasibility. “The most substantial limitation is the change involved in adopting this new approach,” he explains. Overcoming the inertia of legacy tools to build a future-proof system is crucial. Additionally, addressing the shortage of skilled professionals is vital. Collaborations with institutions like MIT’s Aeronautics and Astronautics Department and robust educational initiatives are essential to developing the next generation of engineers and scientists equipped to manage digital twins.

Digital twin technology has revolutionized the space industry by enhancing mission design, testing and management. Organizations like NASA, ESA and the Department of Defense utilize this technology to improve reliability, efficiency and success. As digital twins evolve, their role in space exploration and utilization becomes increasingly vital.

Flavors of Computation Are Flavors of Consciousness

If we don’t understand why we’re conscious, how come we’re so sure that extremely simple minds are not? I propose to think of consciousness as intrinsic to computation, although different types of computation may have very different types of consciousness – some so alien that we can’t imagine them. Since all physical processes are computations, this view amounts to a kind of panpsychism. How we conceptualize consciousness is always a sort of spiritual poetry, but I think this perspective better accounts for why we ourselves are conscious despite not being different in a discontinuous way from the rest of the universe. Introduction ‘don’t hold strong opinions about things you don’t understand’ —Derek Hess Susan Blackmore believes the way we typically […].