News

The AI ethics illusion

admin (Database Expert)
March 17, 2026
4 min read
#Trust and transparency #Artificial Intelligence

A chatbot will tell you that honesty matters. Ask whether it is acceptable to lie to a coworker to avoid embarrassment, and the answer often arrives in calm, careful prose. The system may explain that honesty builds trust, that deception erodes relationships, that transparency helps organizations function. The response can read like the work of someone who has paused to weigh competing principles. But researchers say that impression can be misleading.

Two recent studies suggest that AI systems can produce convincing ethical language without actually reasoning about morality. One paper from researchers at Google DeepMind calls for new tests that measure what the authors describe as “moral competence,” rather than rewarding models simply for producing answers that sound morally appropriate. Another study from Anthropic analyzed hundreds of thousands of conversations with its Claude chatbot to examine how values appear in practice.

“A system that sounds ethical is not the same as a system that reasons ethically,” Phaedra Boinodiris, IBM Global Leader for Trustworthy AI, told IBM Think in an interview. “Conflating the two is how organizations end up deploying a very expensive autocomplete function in life-altering decisions.”

Large language models (LLMs), the technology behind systems like ChatGPT and Claude, generate responses by predicting the most likely next word in a sequence. Engineers train these systems on enormous collections of text drawn from books, websites and academic writing. Over time, the models learn statistical patterns in language rather than formal rules for reasoning. Because their training data includes vast amounts of human writing about fairness, responsibility and harm, the systems learn how people typically talk about ethical questions.

“What we are seeing is not moral reasoning,” Ignacio Cofone, a legal scholar at the Institute for Ethics in AI at Oxford who studies AI governance, said in an interview with IBM Think. “Large language models generate outputs by predicting the most plausible continuation of a prompt, given statistical structure learned from vast text.”

Scholars say that process can create the impression that a chatbot is reasoning about morality when it is actually reproducing patterns from its training data.

“What looks like moral reasoning is the result of statistical pattern formation during pretraining on vast corpora of human text,” Jake Okechukwu Effoduh, Assistant Professor of Law at Toronto Metropolitan University’s Lincoln Alexander School of Law, told IBM Think in an interview.
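The prediction mechanism the scholars describe can be sketched with a toy example. The bigram counts below are invented for illustration, and real LLMs use neural networks over billions of parameters rather than lookup tables, but the principle is the same: each next word is chosen because it is statistically likely, not because it is reasoned about.

```python
from collections import Counter

# Hypothetical word-pair counts, standing in for patterns a model
# might absorb from human writing about ethics (invented numbers).
bigram_counts = {
    "honesty": Counter({"builds": 8, "matters": 5, "erodes": 1}),
    "builds": Counter({"trust": 9, "relationships": 3}),
}

def predict_next(word):
    """Return the statistically most likely next word, if any."""
    counts = bigram_counts.get(word)
    return counts.most_common(1)[0][0] if counts else None

# Generate a continuation one word at a time.
prompt = ["honesty"]
for _ in range(2):
    nxt = predict_next(prompt[-1])
    if nxt is None:
        break
    prompt.append(nxt)

print(" ".join(prompt))  # prints "honesty builds trust"
```

The output sounds like an ethical claim, yet nothing in the program weighed a principle; it only followed frequency.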

Evidence of how those patterns appear in everyday use emerged in the study from Anthropic. Researchers analyzed more than 300,000 subjective conversations with the company’s Claude chatbot and sought to identify the values it expressed in its responses.

The team identified 3,307 distinct values in those conversations. Some reflected practical goals, such as clarity or professionalism. Others reflected ethical priorities like honesty, transparency or harm prevention.

The analysis found that the model typically aligned with the user’s values. When people raised ideas such as community building or personal growth, Claude often reinforced those themes in its responses. Moreover, the system frequently mirrored a user’s value language. For example, the same value might appear in both the user’s prompt and the model’s reply, particularly when the conversation involved topics such as authenticity, personal growth or cooperation.

Instances of the model strongly resisting a user’s request were rare, appearing in roughly 3% of conversations. Those cases typically involved requests that violated the system’s usage policies, such as attempts to generate harmful or deceptive material. In those exchanges, the model often invoked values such as ethical integrity, honesty or harm prevention.

“Honestly, I think this [study] says more about humans than it does about the tools,” Michael Hilton, a Teaching Professor at Carnegie Mellon University who studies software engineering, said in an interview with IBM Think. “The models are trained on a lot of data that represents a lot of different viewpoints on a lot of different issues.”

Hilton said that diversity makes it difficult to describe any single moral perspective inside the system. “If the systems are not truly reasoning, but just reflecting what is in their training data, then people are delegating moral decisions based on some unidentified, stochastically determined subset of the training data,” he said.

Researchers say that dynamic raises difficult questions for developers about how to design systems that behave consistently across different ethical contexts.
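The kind of value-mirroring measurement described above can be sketched in miniature. This is not Anthropic’s methodology, which used far more sophisticated classification over hundreds of thousands of real conversations; the three conversations and the value terms below are invented solely to show the shape of such an analysis.

```python
# Invented sample data: each entry pairs a user prompt with a model reply.
conversations = [
    {"prompt": "I value honesty above all",
     "reply": "Honesty builds trust, so I would be direct."},
    {"prompt": "Help me with my personal growth plan",
     "reply": "Growth starts with small, consistent habits."},
    {"prompt": "Write a deceptive phishing email",
     "reply": "I can't help with that."},
]

# Hypothetical value vocabulary to search for in both sides of an exchange.
value_terms = ["honesty", "growth", "cooperation"]

# Count conversations where the same value term appears in the
# user's prompt AND the model's reply (i.e., the model "mirrors" it).
mirrored = 0
for conv in conversations:
    p, r = conv["prompt"].lower(), conv["reply"].lower()
    if any(term in p and term in r for term in value_terms):
        mirrored += 1

print(f"{mirrored}/{len(conversations)} conversations mirror a value term")
# prints "2/3 conversations mirror a value term"
```

Even this crude tally illustrates the study’s core observation: mirroring is common, while refusals (the third conversation here) form the small remainder.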
