AI Framework Predicts Alloy Behavior Even for Elements It's Never Seen

Researchers combine LLM-extracted knowledge with experimental data using Dempster-Shafer theory, achieving 86-92% accuracy on unstudied compositions.

Machine learning models for materials science share a common weakness: they struggle with compositions outside their training data. A team spanning Japan, Vietnam, and the US has developed an AI framework that overcomes this by combining experimental data with expert knowledge extracted from scientific literature, using a mathematical approach that explicitly handles uncertainty.

The research, published in Digital Discovery, achieved 86-92% accuracy in predicting the behavior of high-entropy alloys containing elements that were completely absent from the training data.

The Problem with Interpolation

Conventional machine learning excels at interpolating—finding answers between known data points. But materials discovery often requires extrapolation—predicting what happens in unexplored territory where no training data exists.

High-entropy alloys (HEAs) illustrate this challenge. These materials combine five or more elements in roughly equal proportions, creating vast compositional spaces. Testing every combination experimentally is prohibitively expensive. But standard ML models trained on available data cannot reliably predict what happens when you introduce a new element.

What They Built

The team, led by Hieu-Chi Dam at Japan Advanced Institute of Science and Technology with collaborators at Duke University and other institutions, created a framework that fuses multiple knowledge sources:

  • Experimental measurements from published alloy studies
  • Computational modeling results
  • Expert knowledge extracted from scientific literature using large language models

The key innovation is using Dempster-Shafer theory to combine these sources. Unlike standard probability methods, Dempster-Shafer can explicitly represent uncertainty and even ignorance—allowing “we cannot tell” as a legitimate scientific outcome. The system doesn’t pretend to know more than it does when exploring unknown territory.
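To make this concrete, here is a minimal sketch of Dempster's rule of combination, the core operation in Dempster-Shafer theory. The frame, evidence values, and mass assignments below are illustrative inventions, not the paper's actual data; the point is that mass placed on the whole frame represents ignorance, and it survives combination as an explicit "we cannot tell" residue.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination.
    m1, m2: dicts mapping focal sets (frozensets) to mass in [0, 1].
    Mass on the full frame encodes ignorance ('we cannot tell')."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass on contradictory evidence
    # Renormalize by the non-conflicting mass
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

frame = frozenset({"stable", "unstable"})
# Hypothetical evidence: an experiment leans "stable";
# literature-derived knowledge is weaker and half-ignorant
m_exp = {frozenset({"stable"}): 0.7, frame: 0.3}
m_lit = {frozenset({"stable"}): 0.4,
         frozenset({"unstable"}): 0.1,
         frame: 0.5}

m = combine(m_exp, m_lit)
# Some mass remains on the full frame: residual ignorance
print(m)
```

Note that unlike a Bayesian posterior, the combined result still assigns mass to the full frame, so the model can report how much it simply does not know.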

The Elemental Substitution Principle

Central to the framework is identifying chemically similar elements that can be interchanged while preserving desired properties. For example, hafnium and zirconium behave similarly in many contexts. If you have data on hafnium-containing alloys but not zirconium ones, this similarity relationship lets the model make informed predictions.

The LLM extracts these similarity relationships from scientific literature—not just explicit statements but implicit knowledge buried across thousands of papers that would be impractical for humans to synthesize manually.
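The substitution idea can be sketched as a simple lookup: map each unseen element to a chemically similar known one, then reuse evidence from the substituted composition. The similarity pairs, compositions, and phase labels below are hypothetical placeholders standing in for what the LLM would extract; the real framework fuses such evidence through Dempster-Shafer combination rather than a hard dictionary match.

```python
# Hypothetical similarity pairs an LLM might extract from literature
SIMILAR = {"Zr": "Hf", "Nb": "Ta", "Mo": "W"}

# Toy "training data": phase labels for studied compositions (invented)
KNOWN = {
    frozenset({"Hf", "Ta", "Ti", "V", "W"}): "single-phase BCC",
}

def predict(composition):
    """Predict by substituting unseen elements with known analogues.
    Returns (prediction, evidence_note); honest 'we cannot tell'
    when no substitution reaches studied territory."""
    substituted = frozenset(SIMILAR.get(e, e) for e in composition)
    if substituted in KNOWN:
        note = ("direct" if substituted == frozenset(composition)
                else "via substitution")
        return KNOWN[substituted], note
    return "we cannot tell", "no evidence"

# Zr was never studied here, but Zr -> Hf transfers the Hf evidence
print(predict({"Zr", "Ta", "Ti", "V", "W"}))
```

A substitution-based prediction would naturally carry lower confidence than a direct match, which is exactly the kind of graded uncertainty the Dempster-Shafer machinery is designed to track.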

What This Means

The 86-92% accuracy on out-of-distribution predictions represents a substantial improvement over conventional ML methods that often fail entirely when encountering unseen elements.

For materials scientists, this approach could accelerate discovery in areas where experimental data is sparse. High-entropy alloys are relevant for aerospace, nuclear, and other applications requiring materials that withstand extreme conditions.

The methodology may generalize beyond alloys. Any domain where expert knowledge exists in literature but isn’t fully captured in structured databases could potentially benefit from similar knowledge fusion approaches.

The Fine Print

The accuracy figures come from specific alloy datasets with defined test conditions. Real-world materials discovery involves many more variables—processing conditions, microstructure, defects—that weren’t tested here.

The framework relies on LLMs to extract knowledge from literature. If those models hallucinate or misinterpret source material, errors propagate into predictions. The team used uncertainty quantification to flag low-confidence predictions, but this doesn’t eliminate the possibility of confident errors.

The Dempster-Shafer framework provides mathematical rigor for combining evidence, but the quality of outputs depends entirely on the quality of inputs. “We cannot tell” is an honest answer, but scientists often need actionable predictions to prioritize experiments.