In early August, the White House brought together leaders from academia, industry, and government who have been involved in the Materials Genome Initiative (MGI) to celebrate the initiative’s fifth anniversary. Over the past five years, federal agencies have invested more than $500 million in resources and infrastructure in support of this initiative and its goal to enable the discovery, design, development, and deployment of new materials twice as fast at half the cost (see James Warren’s recent blog post for a brief overview of MGI). As I reviewed the accomplishments and technical highlights of MGI so far, the launch announcement of the High-Throughput Experimental Materials Science Virtual Laboratory (HTE-VL) caught my eye because it marries physical and virtual processes through coordinated data management, an important concept I highlighted in a previous blog post.
Over the past decade, the National Renewable Energy Laboratory (NREL) and National Institute of Standards and Technology (NIST) have both advanced expertise and innovation in the HTE area, developing functional materials such as transparent conducting oxides for solar cells and thermoelectric materials that allow direct conversion between thermal and electrical energy. The HTE-VL initiative is the first large-scale partnership between these two agencies in this area. It aims to lead a broad cultural shift in materials research by encouraging and facilitating an integrated team approach—one of the key challenges outlined in the 2014 MGI Strategic Plan—that uses HT-combinatorial experimental tools for synthesis and characterization (i.e., physical systems) and materials infrastructure for data management (i.e., virtual systems). This systematic HTE approach could enable new materials discoveries in an accelerated time frame.
The basic idea of HTE-VL is the integration of core MGI activities such as HT experimentation, integrated computational materials engineering (ICME), and materials data analytics (MDA). Combining these activities, however, is challenging due to the following complexities of materials data and metadata:
- Multivariate, heterogeneous data: HT experimentation can generate large amounts of data that are multivariate, because different experimental processes each contribute their own variables, and heterogeneous, because several measurement methods are combined to overcome the limited detection range of any single tool. Interpreting such data to find hidden correlations between variables is cumbersome: it is difficult to visualize more than a few parameters in 2D (X and Y) or 3D (X, Y, and Z) space, and data of different types and units must first be normalized.
- Multiscale data: Computational modeling with ICME generates multiscale data from atomistic, mesoscale, and continuum simulations that span length scales from sub-nanometers to meters and time scales from picoseconds to seconds, or even days.
- Mined, derived data: The above multivariate, multiscale HT data is used to generate MDA-driven data of integrated structure-process-property-performance relationships, which can be used to inform materials design strategies.
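To make the normalization and correlation problem concrete, here is a minimal Python sketch. It is not HTE-VL code: the column names, units, and values are invented for illustration, and z-score standardization with Pearson correlation is just one common way to put heterogeneous measurements on a shared scale before looking for relationships between variables.

```python
from statistics import mean, pstdev

# Hypothetical HT measurements for five samples from one combinatorial library.
# Column names, units, and values are invented for illustration only.
samples = {
    "thickness_nm": [100.0, 150.0, 200.0, 250.0, 300.0],
    "sheet_resistance_ohm_sq": [50.0, 40.0, 30.0, 20.0, 10.0],
    "transmittance_pct": [90.0, 88.0, 85.0, 83.0, 80.0],
}


def zscore(values):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation: the mean product of the two standardized columns."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    return sum(a * b for a, b in zip(zscore(xs), zscore(ys))) / len(xs)


# Unit-free version of every column, so nm, ohm/sq, and % become comparable.
normalized = {name: zscore(column) for name, column in samples.items()}

# Pairwise correlations between all distinct columns.
names = list(samples)
correlations = {
    (a, b): pearson(samples[a], samples[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

for pair, r in correlations.items():
    print(pair, round(r, 3))
```

With only three variables a table of pairwise correlations is easy to read; with dozens of variables from several instruments, the same table quickly becomes the visualization problem described above.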
To address the issue of HT data complexity, HTE-VL will apply an effective management model for the entire materials data life cycle—generation, dissemination, preservation, archiving—to the different types of data described above. This management model focuses on three primary functions:
- Interoperability: Real-time data and metadata generated by individual entities will be captured in an interoperable form through the laboratory information management system (LIMS) that was originally developed for the Process Development and Integration Laboratory (PDIL) at NREL. NIST’s Materials Data Curation System will then store the data and metadata to make them more accessible.
- Discoverability: HTE-VL will facilitate data discoverability by including HTE repositories and registries with information about synthesis methods and characterization equipment in NIST’s publicly available Materials Resource Registry. The Materials Data Curation System at NIST and the Materials Data Facility (MDF) will lead data curation and publication, with support from the NIST-sponsored Chicago-based Center for Hierarchical Materials Design.
- Usability: Measurement results from HTE quickly yield large datasets on the order of gigabytes, which poses challenges for currently available license-free data transfer methods (e.g., Fast Data Transfer). HTE-VL plans to use the Globus data transfer tool, which uses the Grid File Transfer Protocol (GridFTP) to handle big data. In support of the Obama Administration’s open data policy, HTE-VL also plans to make various materials data infrastructure tools and codes widely available with the aid of NIST’s MGI Code Catalog and DSpace.
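As a rough illustration of what these three functions ask of a single dataset, the Python sketch below bundles a measurement file with the metadata someone else would need to find and reuse it. Everything in it is an assumption made for illustration: the `SampleRecord` class and its field names are hypothetical and do not reflect the actual schemas of the NREL LIMS, the Materials Data Curation System, or the Materials Resource Registry.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SampleRecord:
    """Hypothetical metadata record; not an HTE-VL, NREL, or NIST schema."""
    sample_id: str
    synthesis_method: str    # supports discoverability by synthesis route
    characterization: list   # instruments or methods used on the sample
    data_file: str           # name of the raw measurement file
    sha256: str              # checksum to verify the file after transfer
    keywords: list = field(default_factory=list)


def checksum(payload: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file contents."""
    return hashlib.sha256(payload).hexdigest()


def to_json(record: SampleRecord) -> str:
    """Serialize to plain JSON so that other systems can parse the record."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


# Invented file contents standing in for a real measurement file.
raw = b"position_mm,sheet_resistance_ohm_sq\n0.0,50.0\n5.0,40.0\n"

record = SampleRecord(
    sample_id="example-0001",
    synthesis_method="combinatorial sputtering",
    characterization=["four-point probe"],
    data_file="example-0001.csv",
    sha256=checksum(raw),
    keywords=["transparent conducting oxide"],
)

serialized = to_json(record)
restored = SampleRecord(**json.loads(serialized))  # round trip through JSON
print(serialized)
```

A vendor-neutral text format serves interoperability, the descriptive fields serve discoverability, and the checksum gives the recipient of a multi-gigabyte transfer a simple way to confirm that the file arrived intact.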
The HTE-VL partnership between NIST and NREL is an exciting step toward full integration of the virtual and physical systems necessary to accelerate the discovery of new materials, many of which will have a remarkable impact on our everyday lives. HTE-VL’s capabilities—particularly when complemented by other emerging advanced techniques and tools such as HT computational screening with machine learning, open source-based cloud data storage, the Semantic Web, and data citation practices—can facilitate the increased robustness, availability, and usefulness of complex materials data. I look forward to seeing what’s next for both HTE-VL and the broader field of advanced materials discovery and design.