NIMS Research Data Express Automates Materials Data Processing for AI Applications

By Trinzik

TL;DR

NIMS's Research Data Express gives researchers an edge by automating data processing to create AI-ready datasets, accelerating materials discovery and innovation.

RDE uses Dataset Templates to automatically interpret raw experimental data, restructure it into readable formats, and perform analyses while maintaining FAIR principles.

This system promotes collaborative research by reducing data-sharing barriers, ultimately accelerating sustainable materials development for a better future.

RDE has already accumulated more than three million data files and more than 1,900 Dataset Templates, showing how automation can transform materials science research.

Materials research generates vast amounts of data, but the information often exists in manufacturer-specific formats with inconsistent terminology, making aggregation, comparison, and reuse difficult. Traditionally, researchers have spent considerable time on tedious tasks like format conversion, metadata assignment, and characteristics extraction, which can discourage data sharing and hinder data-driven work. This problem is particularly acute given the field's increasing reliance on AI-driven materials discovery, which requires high-quality datasets.

To address this challenge, researchers at the National Institute for Materials Science (NIMS) have developed Research Data Express (RDE), a highly flexible data management system for materials scientists. As described in a paper published in Science and Technology of Advanced Materials: Methods, RDE automatically interprets experimental data from raw files and manually entered measurements, then restructures and stores this information in a more readable format. According to Jun Fujima, corresponding author and researcher at NIMS's Materials Data Platform, "RDE significantly reduces the burden of routine data processing for researchers and enhances data findability, interoperability, reusability (the FAIR principles), and traceability. We hope this will promote collaborative, data-driven materials research."

Whereas similar systems typically define data formats, RDE's core innovation is the "Dataset Template," which defines and directs how data from different types of experiments should be processed. For example, if a researcher uploads spreadsheets of X-ray measurements from different sources, a Dataset Template can be configured to interpret each of them. The system then automatically performs advanced analyses and creates visualizations that provide an immediate overview. Multiple templates can be prepared for different materials research themes, which keeps data management flexible, and individual researchers can prepare custom templates when necessary. Many templates have already been prepared and shared among users.
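To make the template idea concrete, here is a minimal Python sketch of the general pattern: a per-source template tells a shared pipeline how to read a vendor-specific spreadsheet, and a common analysis step then runs on the normalized result. This is not RDE's or RDEToolKit's actual API. The class `DatasetTemplate`, the function `summarize`, the column names, and the sample data are all hypothetical and chosen only for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DatasetTemplate:
    """Hypothetical template: says how one vendor's file maps to common field names."""
    name: str
    column_map: Dict[str, str]  # vendor column name -> common field name
    delimiter: str = ","

    def parse(self, raw_text: str) -> List[Dict[str, float]]:
        """Read vendor-formatted text and return rows keyed by the common field names."""
        reader = csv.DictReader(io.StringIO(raw_text), delimiter=self.delimiter)
        return [
            {common: float(record[vendor]) for vendor, common in self.column_map.items()}
            for record in reader
        ]


def summarize(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Shared analysis step: works on any source once the rows are normalized."""
    peak = max(rows, key=lambda row: row["intensity"])
    return {
        "n_points": len(rows),
        "peak_angle_deg": peak["two_theta_deg"],
        "peak_intensity": peak["intensity"],
    }


# Two made-up instrument exports that describe the same kind of measurement
# with different column names and delimiters.
VENDOR_A = DatasetTemplate("vendor_a_xrd", {"2Theta": "two_theta_deg", "Counts": "intensity"})
VENDOR_B = DatasetTemplate(
    "vendor_b_xrd", {"angle": "two_theta_deg", "I": "intensity"}, delimiter=";"
)

raw_a = "2Theta,Counts\n10.0,120\n10.5,480\n11.0,150\n"
raw_b = "angle;I\n20.0;90\n20.5;75\n21.0;310\n"

summary_a = summarize(VENDOR_A.parse(raw_a))
summary_b = summarize(VENDOR_B.parse(raw_b))
print(summary_a)
print(summary_b)
```

The point of the sketch is the separation of concerns: supporting a new instrument means writing a new template, while the downstream analysis and storage code stays unchanged.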

Fujima explains that "RDE's unique approach allows researchers to freely define data structures tailored to their instruments, while enabling the system to perform massive data structuring and metadata extraction automatically." Since its launch in January 2023, RDE has demonstrated scalability through widespread adoption across Japan's materials research community. The system currently has over 5,000 users, with more than 1,900 Dataset Templates for various experimental methods implemented, over 16,000 datasets created, and more than three million data files accumulated. It serves as data infrastructure for major national initiatives, including the Materials Research DX Platform initiative promoted by Japan's Ministry of Education, Culture, Sports, Science and Technology.

The NIMS team has released an open-source software toolkit (RDEToolKit) to encourage broader use within the research community. The research paper detailing this system is available at https://doi.org/10.1080/27660400.2025.2597702.

Curated from NewMediaWire
