Dr. Eyal Aharoni discusses one of the most provocative frontiers in technology: the automation of moral judgement. His talk covers the outcomes of a comparative Moral Turing Test (in which AI outperformed humans across a range of metrics), as well as AI-assisted medical triage!
Link in reply🔗
Eyal Aharoni
Dr. Eyal Aharoni (Georgia State University) joins the Future Day 2026 stage to discuss one of the most provocative frontiers in technology: the automation of moral judgement.
Breaking the Moral Turing Test: Studies of human attribution and deference to AI moral judgment and decision-making.
Synopsis: As AI systems become integrated into high-stakes environments, we face a critical question: How much do we trust an algorithm’s “conscience”? Dr. Aharoni will present findings from the first Modified Moral Turing Test (mMTT), revealing how laypeople perceive AI-generated moral commentary.
The talk moves from the laboratory to the hospital, discussing new research on AI-assisted medical triage. While high-performance AI can save lives, it also risks eroding human performance through over-trust. Dr. Aharoni will outline a novel framework for predicting and mitigating these errors, ensuring that when humans and AI collaborate in emergencies, the results are better – not worse – than either could achieve alone.
0:00 Intro.
0:40 Talk — Breaking the Moral Turing Test.
1:36 The LLM moral reasoning challenge — LLMs outperform humans in moral reasoning — Moral reasoning is regarded as one of the most sophisticated and uniquely human faculties. Given the results of comparative Moral Turing Tests and other studies, LLMs seem to outperform and enhance human reasoning in various ways. Do LLMs challenge this tenet?
2:10 LLM problems: lack of transparency, widespread bugs, missing emotional checks & balances.
3:28 Metaphysical challenge: Do LLMs have moral intelligence?
3:50 Psychological challenge: Do humans perceive AIs as having moral intelligence?
4:03 How do ordinary people perceive LLM moral commentary? — The (explicit) Moral Turing Test (MTT) & its problems.
5:14 Problem of demand characteristics.
5:52 The Comparative Moral Turing Test (cMTT)
6:47 Stimulus Development.
7:57 Funeral example as a conventional wrong.
9:07 cMTT result analysis.
10:03 Predictions (pre-registered)
11:41 cMTT results — participants rated the LLM-generated responses as significantly better in quality than the human-generated ones.
12:46 cMTT descriptives — quality preference by criterion.
13:21 cMTT results — participants correctly identified which responses were generated by an LLM 80% of the time.
15:15 ‘Breaking’ the MTT — AI exceeded expectations.
16:03 False conclusions: not claiming that LLMs are morally intelligent, or that humans think LLM evaluations are generally superior, etc.
17:21 Study conclusions: AI moral responses were rated superior, and perceived moral expertise may have given their identity away.
17:46 Trust of AI may lead to uncritical acceptance of questionable advice.
18:37 Pope Leo’s warning against LLM generation of sermons.
19:20 AI in high stakes decision contexts — Medical Triage.
21:03 Can AIs outperform medical professionals? — initial data (HealthBench — Evaluating LLMs Towards Improved Human Health).
21:51 AI decision support — the human factor — The Performance Paradox.
24:01 When AI seems reliable, humans tend to increase risk exposure.
25:10 Ensuring human-AI collaborations are constructive.
27:50 Real-time AI assistance during learning, then withdrawal of AI assistance (study 1).
31:30 Testing immediate vs delayed assistance (before vs after human judgement) (study 2).
33:34 Human-centered design — designs: fault injection, prompting users to justify their judgements (before receiving the AI's judgement), etc.
37:30 Ethical Questions in AI-Assisted Triage.
41:20 End of presentation. Q&A starts.
41:48 Enfeeblement vs empowerment — when and when not to outsource to AI.
45:24 Why do we have moral reasoning? (evolutionary accounts, etc.) Rivalrist or pluralist theories? — The better angels of our nature evolved because they were useful (reciprocity, cooperation). LLMs give us moral cues that evolution has taught us to associate with the presence of a moral actor or patient. (So how can we separate the detection of actual moral behaviour from these psychological cues?)
50:50 Blade Runner synth detection tech — our Turing tests will have to become more sophisticated.
56:04 Consciousness in non-human animals.
57:39 Main vulnerabilities of over-trust in AI — real-time/high-stakes decisions.
1:06:06 AI as a tutor, instead of an oracle.
1:09:28 Can AI be more moral than us? Can AI help us be more moral?
1:16:51 How is change in AI affecting how we think about AI and the advice it gives?
1:18:53 AI capability is jagged and brittle — in high-stakes settings, AI hallucinations could cause fatalities.
1:19:56 Hard to know how much time we have before transformative AI to work on guardrails.
#AI #Ethics #Philosophy
Many thanks for tuning in!
Please support SciFuture by subscribing and sharing!
Buy me a coffee? https://buymeacoffee.com/tech101z.