Unsupervised Machine Learning Assisted Materials Search

I have recently developed a framework for unsupervised machine learning from a large volume of unlabeled text, which can enable new tools for efficiently teaching new concepts. This effort was inspired by a recent demonstration of similar unsupervised learning (Nature 571, 95-99, 2019). I collected scientific abstracts from various journals and trained neural networks on the combined text using widely used natural language processing algorithms. Surprisingly, the model learned specialized concepts in materials science, physics, and device engineering without any explicit guidance.

The model represents words as high-dimensional vectors, which can be combined using vector algebra to produce meaningful results, e.g., element name + material class = materials list.

vector["bismuth"] + vector["topological insulator"] → vector(["Bi2Se3", "Bi2Te3"]).

vector["tin"] + vector["topological insulator"] → vector(["SnTe"]).
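The vector sums above can be sketched in a few lines of Python. This is a minimal illustration, not the actual program: the three-dimensional vectors are invented by hand so that the example is easy to follow, the helper names (`toy_vectors`, `cosine_similarity`, `nearest_terms`) are hypothetical, and the use of cosine similarity to rank candidates is an assumption about how "closeness" might be measured.

```python
import numpy as np

# Toy 3-dimensional embeddings, invented purely for illustration.
# A trained model would learn much higher-dimensional vectors from text.
toy_vectors = {
    "bismuth": np.array([1.0, 0.0, 0.0]),
    "tin": np.array([0.0, 1.0, 0.0]),
    "topological insulator": np.array([0.0, 0.0, 1.0]),
    "Bi2Se3": np.array([0.9, 0.0, 0.9]),
    "Bi2Te3": np.array([0.8, 0.1, 0.9]),
    "SnTe": np.array([0.0, 0.9, 0.9]),
    "MgO": np.array([0.3, 0.3, -0.8]),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def nearest_terms(query_terms, vectors, top_n=2):
    """Add the query vectors and return the top_n most similar other terms."""
    query = np.sum([vectors[term] for term in query_terms], axis=0)
    candidates = [term for term in vectors if term not in query_terms]
    ranked = sorted(
        candidates,
        key=lambda term: cosine_similarity(query, vectors[term]),
        reverse=True,
    )
    return ranked[:top_n]


print(nearest_terms(["bismuth", "topological insulator"], toy_vectors))
print(nearest_terms(["tin", "topological insulator"], toy_vectors, top_n=1))
```

With these hand-made vectors, the first query ranks the two bismuth compounds highest and the second ranks SnTe highest, mirroring the pattern of the examples above. The toy numbers carry no physical meaning.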

We can also relate phenomena to the corresponding materials, devices, and applications. For example, tunneling magnetoresistance (TMR) and spin-transfer torque (STT) are used to read and write magnetic tunnel junctions (MTJs), respectively, in the emerging STT magnetoresistive random access memory (MRAM) technology, and the trained model captures these relationships.

vector["tunneling magnetoresistance"] + vector["spin transfer torque"] → vector(["magnetic tunnel junction"]).

vector["magnetic tunnel junction"] + vector["memory"] → vector(["MRAM"]).

vector["MRAM"] + vector["applications"] → vector(["nonvolatile memories", "neuromorphic computing",... ]).

Figure 1: A new tool enabled by unsupervised machine learning. (a) The tool learns from available text data and (b)-(c) provides lists and patterns for materials combinations associated with various phenomena, which (d) agree with known results.

I have developed a program (see Fig. 1a) that uses such vector algebra to answer questions on a topic. For example, when asked "Which materials combinations give a large TMR in MTJ devices?", it generates a list showing that magnesium oxide (MgO)-based combinations have the highest correlations with both TMR and MTJ (see Fig. 1b). Interestingly, the model also learned the intrinsic figure of merit of a materials combination associated with a particular phenomenon. For example, when asked "Which transition metals exhibit oscillatory exchange coupling (OEC)?", it calculates the correlation between vector["OEC"] and each vector[transition metal symbol] and plots the results as a function of atomic number (see Fig. 1c). The correlation pattern from the trained model matches the known experimental results on the first antiferromagnetic (AFM) peak strengths of OEC quite well (see Fig. 1d).
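The correlation-versus-atomic-number calculation described above might be organized roughly as follows. This is only a sketch under stated assumptions: the text does not say how the correlation is computed, so cosine similarity is used as a stand-in; the embeddings are random placeholders rather than trained vectors, so the resulting numbers are meaningless; and the names `placeholder_vectors` and `correlation_by_atomic_number` are hypothetical.

```python
import numpy as np

# Atomic numbers of a few transition metals (standard periodic-table values).
ATOMIC_NUMBERS = {"V": 23, "Cr": 24, "Cu": 29, "Ru": 44, "Rh": 45, "Ir": 77}

# Placeholder embeddings: random vectors standing in for a trained model.
# They exist only so the sketch runs; they encode no real knowledge.
rng = np.random.default_rng(0)
placeholder_vectors = {
    term: rng.normal(size=50) for term in ["OEC", *ATOMIC_NUMBERS]
}


def cosine_similarity(a, b):
    """Cosine similarity, used here as an assumed stand-in for 'correlation'."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def correlation_by_atomic_number(phenomenon, elements, vectors):
    """Return (atomic number, symbol, similarity) rows sorted by atomic number."""
    rows = []
    for symbol, z in sorted(elements.items(), key=lambda item: item[1]):
        score = cosine_similarity(vectors[phenomenon], vectors[symbol])
        rows.append((z, symbol, score))
    return rows


for z, symbol, score in correlation_by_atomic_number(
    "OEC", ATOMIC_NUMBERS, placeholder_vectors
):
    print(f"Z={z:2d}  {symbol:2s}  similarity={score:+.3f}")
```

The rows come out ordered by atomic number, so they could be passed directly to a plotting routine to produce a curve of the kind described for Fig. 1c once real trained vectors replace the placeholders.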