While artificial intelligence (AI) models have proved useful in some areas of science, such as predicting 3D protein structures, a new study shows that they should not yet be trusted in many lab experiments. The study, published in Nature Machine Intelligence, revealed that all of the large language models (LLMs) and vision-language models (VLMs) tested fell short on lab safety knowledge. Placing too much trust in these AI models for help with lab experiments can put researchers at risk.
LabSafety Bench for AI use in labs
The research team behind the new study set out to answer whether LLMs can effectively identify potential hazards, accurately assess risks and make reliable decisions to mitigate laboratory safety threats. To help answer these questions, the team developed a benchmarking framework called "LabSafety Bench."
