
Kallaway on Instagram: OpenAI just launched something massive

12K likes — kanekallaway on September 12, 2024: “OpenAI just launched something massive. The first model of its kind, ‘o1’, designed for deep reasoning. General AI reasoning has always been the white whale of the space. Whoever figured out how to build advanced models that could reason through multi-step problems on their own would lay the rails for the path to AGI. It’s still way too early to say if this model will do it, but based on the demos and early feedback, there is something super advanced here. o1 is different from all previous versions of GPT because it thinks before it answers, like a human would. Then, the model lays out its complex logic path to get to an answer.”

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

We present the first open-source benchmark to evaluate LLMs on their ability to operate as agents in simulated clinical environments.

Diagnosing and managing a patient is a complex, sequential decision-making process that requires physicians to obtain information, such as which tests to perform, and to act upon it. Recent advances in artificial intelligence (AI) and large language models (LLMs) promise to profoundly impact clinical care. However, current evaluation schemes rely too heavily on static medical question-answering benchmarks, falling short of the interactive decision-making required in real-life clinical work. Here, we present AgentClinic: a multimodal benchmark to evaluate LLMs on their ability to operate as agents in simulated clinical environments. In our benchmark, the doctor agent must uncover the patient’s diagnosis through dialogue and active data collection. We present two open benchmarks: a multimodal image-and-dialogue environment, AgentClinic-NEJM, and a dialogue-only environment, AgentClinic-MedQA. Agents in AgentClinic-MedQA are grounded in cases from the US Medical Licensing Exam (USMLE), and agents in AgentClinic-NEJM are grounded in multimodal New England Journal of Medicine (NEJM) case challenges. We embed cognitive and implicit biases in both patient and doctor agents to emulate realistic interactions between biased agents. We find that introducing bias leads to large reductions in the diagnostic accuracy of doctor agents, as well as reduced compliance, confidence, and willingness to seek follow-up consultation in patient agents. Evaluating a suite of state-of-the-art LLMs, we find that several models that excel on benchmarks like MedQA perform poorly in AgentClinic-MedQA. We find that the LLM used for the patient agent is an important factor for performance in the AgentClinic benchmark. Finally, we show that both too few and too many interactions reduce the diagnostic accuracy of doctor agents.
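To make the setup concrete, the evaluation loop the abstract describes can be sketched in a few lines of Python. Everything below, the class names, prompts, and chat() helper, is an illustrative assumption, not the benchmark’s actual code:

    # Hypothetical sketch of an AgentClinic-style evaluation loop. All names,
    # prompts, and the chat() helper are illustrative, not the benchmark's API.
    def run_case(doctor_llm, patient_llm, case, max_turns=20):
        history = []
        for _ in range(max_turns):
            # The doctor agent asks a question, orders a test, or commits
            # to a diagnosis based on the dialogue so far.
            action = doctor_llm.chat(
                system="You are a physician. Gather information, then state "
                       "your final answer as 'DIAGNOSIS: <condition>'.",
                messages=history)
            history.append({"role": "doctor", "content": action})
            if action.startswith("DIAGNOSIS:"):
                guess = action.removeprefix("DIAGNOSIS:").strip()
                return guess == case.diagnosis   # score against ground truth
            # The patient agent (optionally instructed with a bias) replies
            # using only the facts of the underlying USMLE/NEJM case.
            reply = patient_llm.chat(
                system=f"You are a patient. Known facts: {case.facts}. "
                       f"Behavioral bias: {case.bias}.",
                messages=history)
            history.append({"role": "patient", "content": reply})
        return False  # no diagnosis committed within the turn budget

A loop like this also makes the interaction-count finding legible: the paper’s observation that both too few and too many interactions hurt accuracy corresponds to varying the turn budget (max_turns here).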

Learning to Reason with LLMs

Some big claims here: https://openai.com/index/learning-to-reason-with-llms/

OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on…


We are introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding to the user.
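For readers who want to try it, o1 is served through the same Chat Completions interface as earlier GPT models. A minimal sketch, assuming the v1.x OpenAI Python SDK and an OPENAI_API_KEY in the environment:

    # Minimal sketch of calling the o1 model via OpenAI's Python SDK (v1.x).
    # "o1-preview" is the model name OpenAI announced at launch.
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user",
                   "content": "Solve this multi-step logic puzzle: ..."}],
    )
    # The long internal chain of thought is not returned to the caller;
    # only the final answer comes back in the message content.
    print(response.choices[0].message.content)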

Harnessing Automated Insulin Delivery: Case Reports from Marathon Runners with Type 1 Diabetes

How can machine learning help individuals with type 1 diabetes (T1D)? This is what a study presented at this year’s Annual Meeting of the European Association for the Study of Diabetes (EASD) hopes to address, as a team of researchers has developed a machine learning-based system capable of managing blood sugar levels with such proficiency that those using the system were able to lead far more active lives than the average T1D patient.

For the study, the researchers developed an automated insulin delivery (AID) system, which uses closed-loop technology to deliver insulin based on sensor readings interpreted by a machine learning algorithm. With it, a 50-year-old man, a 40-year-old man, and a 34-year-old woman with T1D were able to run hours-long marathons in Tokyo, Santiago, and Paris, respectively. This study holds the potential to inform better technology that allows T1D patients to stay in shape without constantly fearing for their blood sugar levels, fluctuations in which can lead to long-term health problems, including hyperglycemia, nerve damage, or heart attack.

“Despite better systems for monitoring blood sugars and delivering insulin, maintaining glucose levels in target range during aerobic training and athletic competition is especially difficult,” said Dr. Maria Onetto, who is in the Department of Nutrition at the Pontifical Catholic University of Chile and lead author of the study. “The use of automated insulin delivery technology is increasing, but exercise continues to be a challenge for individuals with T1D, who can still struggle to reach the recommended blood sugar targets.”
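Stripped to its essentials, the closed-loop approach described above is a feedback loop: read the glucose sensor, forecast where levels are heading, and adjust insulin accordingly. The sketch below is a deliberately simplified illustration; the predictor, thresholds, and dosing rule are placeholders, not the study’s algorithm:

    # Highly simplified illustration of one closed-loop AID cycle. The model,
    # thresholds, and dosing rule are placeholders, not the study's algorithm.
    import time

    TARGET_MG_DL = 110   # glucose target
    LOW_MG_DL = 80       # suspend insulin below this predicted level

    def control_loop(cgm, pump, predictor):
        while True:
            glucose = cgm.read()                   # current CGM reading
            forecast = predictor.predict(glucose)  # ML forecast, e.g. +30 min ahead
            if forecast < LOW_MG_DL:
                pump.set_basal_rate(0.0)           # suspend to head off a low
            else:
                # Scale basal insulin with the predicted excursion above target.
                error = max(forecast - TARGET_MG_DL, 0)
                pump.set_basal_rate(min(error * 0.01, 2.0))  # units/hour, capped
            time.sleep(300)                        # CGM updates roughly every 5 min

The challenge Dr. Onetto describes is visible even in this toy version: during exercise, glucose can drop faster than a fixed forecast horizon anticipates, which is exactly where a learned predictor has to earn its keep.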

AIs generate more novel and exciting research ideas than human experts

The first statistically significant results are in: not only can Large Language Model (LLM) AIs generate new expert-level scientific research ideas, but their ideas are more original and exciting than the best of ours – as judged by human experts.

Recent breakthroughs in large language models (LLMs) have excited researchers about the potential to revolutionize scientific discovery, with models like ChatGPT and Anthropic’s Claude showing an ability to autonomously generate and validate new research ideas.

This, of course, was one of the many things most people assumed AIs could never take over from humans: the ability to generate new knowledge and make new scientific discoveries, as opposed to stitching together existing knowledge from their training data.

Combining existing sensors with machine learning algorithms improves robots’ intrinsic sense of touch

A team of roboticists at the German Aerospace Center’s Institute of Robotics and Mechatronics finds that combining traditional internal force-torque sensors with machine-learning algorithms can give robots a new way to sense touch.

In their study published in the journal Science Robotics, the group took an entirely new approach, giving robots a sense of touch that does not involve artificial skin.

For living creatures, touch is a two-way street; when you touch something, you feel its texture, temperature and other features. But you can also be touched, as when someone or something else comes in contact with a part of your body. In this new study, the research team found a way to emulate the latter type of touch in a robot by combining internal force-torque sensors with a machine-learning algorithm.
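The underlying idea is that a touch anywhere on the arm leaves a signature in the joint force-torque readings, which a learned model can decode. The toy example below shows that pattern with a generic regressor; the synthetic data and model choice are illustrative stand-ins, not the DLR team’s implementation:

    # Toy illustration: infer contact location on a robot arm from internal
    # force-torque readings. Synthetic data and model choice are stand-ins;
    # the DLR study's actual method differs.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Each sample: 6 values per force-torque sensor (Fx, Fy, Fz, Tx, Ty, Tz),
    # here for two sensors -> 12 features. Label: contact position along the arm.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 12))        # stand-in for recorded sensor data
    y = rng.uniform(0.0, 1.0, size=1000)   # stand-in contact position (0..1)

    model = RandomForestRegressor(n_estimators=100).fit(X, y)

    # At runtime: a new wrench reading -> estimated contact location.
    reading = rng.normal(size=(1, 12))
    print(f"estimated contact position: {model.predict(reading)[0]:.2f}")

On real hardware, the training labels would come from deliberate, known touches rather than random numbers; the appeal of the approach is that the sensors are already inside most robot arms.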

OpenAI releases reasoning AI with eye on safety, accuracy

ChatGPT creator OpenAI on Thursday released a new series of artificial intelligence models designed to spend more time thinking, in the hope that its generative AI chatbots will provide more accurate and beneficial responses.

The new models, known as OpenAI o1-preview, are designed to tackle and solve more challenging problems in science, coding and mathematics, areas where earlier models have been criticized for failing to deliver consistently correct answers.

Unlike their predecessors, these models have been trained to refine their thinking processes, try different methods and recognize mistakes before delivering a final answer.
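That “try, check, revise” behavior can be imitated, crudely, at the prompt level with any chat model. A minimal sketch of such a generate-critique-revise loop, using a hypothetical llm.ask() helper; this illustrates the idea only, and is not how o1’s reinforcement-learning training works:

    # Illustrative generate-critique-revise loop. The llm.ask() helper is
    # hypothetical, and this prompt-level trick is NOT o1's training method.
    def solve_with_revision(llm, problem, rounds=3):
        answer = llm.ask(f"Solve step by step: {problem}")
        for _ in range(rounds):
            critique = llm.ask(f"Problem: {problem}\nAnswer: {answer}\n"
                               "List any mistakes, or reply OK.")
            if critique.strip() == "OK":
                break  # the model found no errors to correct
            answer = llm.ask(f"Problem: {problem}\nPrevious answer: {answer}\n"
                             f"Critique: {critique}\nGive a corrected answer.")
        return answer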
