Hello. I'm Takasaburo Fukuda from the Artificial Intelligence Laboratory at Fujitsu Research.
To promote the use of generative AI in enterprises, Fujitsu has developed an enterprise generative AI framework that can flexibly respond to diverse and changing corporate needs, make use of the vast amounts of data that companies hold, and comply easily with laws and regulations. The framework has been rolled out in stages since July 2024 as part of the AI service lineup of Fujitsu Kozuchi (R&D). In this article, we focus on the transformation that generative AI brings to system development and operations, and introduce Design Review Assistant, a technology that automates the review of software design documents.
How generative AI transforms and automates enterprise system development and operation processes
Generative AI is redefining the very foundations of software development. Code completion and automated test generation are just the beginning; we are moving toward an "AI-based development process" that spans every step from requirements definition through design, implementation, and operation. Large language models (LLMs) extract knowledge from internal documents and logs, accelerating the drafting of specifications, the estimation of impact scope, and the creation of release notes. At the same time, questions of how to ensure quality, security, and accountability must be addressed.
Enterprise system development is particularly challenging. Complex business domains, integration with legacy assets, strict audit and availability requirements, and frequent legal and regulatory changes: these are issues that simple automated generation cannot resolve. It is essential to verbalize requirements accurately, make non-functional requirements visible, and maintain architectural consistency while controlling how changes propagate.
Fujitsu is engaged in research and development of advanced technologies to solve these issues. The core technology is Fujitsu Knowledge Graph enhanced RAG, which enables accurate referencing and utilization of large amounts of data. While this technology can be used in a variety of general-purpose scenarios, this series focuses on automating and streamlining system development and operations, and introduces the following seven technologies. Going forward, we aim to build a multi-agent system that advances requirements definition, design, implementation, and operation in a highly reliable and well-controlled manner, with Knowledge Graph enhanced RAG serving as an integrated database that AI can access across a wide range of development and operation tasks.

Table 1: System development and operation processes to which each technology can be applied
| Technology | Requirement definition | Design | Implementation | Test | Operation | Maintenance |
|---|---|---|---|---|---|---|
| (1) System Specification Visualization | ✓ | ✓ | | | | |
| (2) Design Review Assistant | | ✓ | | | | |
| (3) Code Specification Consistency Analysis | | | ✓ | | | ✓ |
| (4) Test Specification Generation | | ✓ | | ✓ | | |
| (5) Failure Analysis | | | | | ✓ | ✓ |
| (6) Log Analysis | | | | | ✓ | ✓ |
| (7) QA Automation | ✓ | | | | ✓ | ✓ |
(1) System Specification Visualization (Knowledge Graph enhanced RAG for Software Engineering, now available)
This technology not only analyzes and understands source code, but also generates high-level functional design documents and summaries, enabling modernization.
(2) Design Review Assistant (This article)
This technology automates the checking of ambiguity and consistency in system design documents by converting complex system design documents into a form that can be understood by generative AI.
(3) Code Specification Consistency Analysis (now available)
This technology compares source code with specifications to detect differences and identify problem areas, shortening the time required to investigate the cause of a failure when one occurs.
(4) Test Specification Generation (Coming October 27)
This technology extracts rules for identifying test cases from existing design documents and test specifications, making it possible to generate complete test specifications that take into account the characteristics of the project.
(5) Failure Analysis (Knowledge Graph enhanced RAG for Root Cause Analysis, now available)
This technology creates a report when a failure occurs based on system logs and data from failure cases, and suggests countermeasures based on similar failure cases.
(6) Log Analysis (Knowledge Graph enhanced RAG for Log Analysis, now available)
This technology automatically analyzes system log files and answers highly specialized questions related to identifying the cause of failures, detecting anomalies, and preventive maintenance.
(7) QA Automation (Knowledge Graph enhanced RAG for Q&A, now available)
This technology enables advanced Q&A with a bird's-eye view of large amounts of document data, such as product manuals.
In this article, I will introduce "(2) Design Review Assistant" in detail.
What is Design Review Assistant?
In software development, design document review is a critical process that forms the foundation of product quality. However, this process still relies heavily on manual inspection, requiring significant human effort. Complex document structures and project-specific formats have long been major obstacles to automation. To address these challenges, we have developed an AI-driven technology that converts such complex design documents into a structure that generative AI can understand, enabling automated checks for ambiguity and consistency. We are also conducting experiments using actual project documents to verify the effectiveness of this technology in real-world scenarios.

This research was presented at SANER 2025 (the IEEE International Conference on Software Analysis, Evolution and Reengineering), one of the top international conferences in the field of software analysis, held in March 2025 in Montreal, Canada. In this article, we introduce an overview of the presentation and its key insights.
Paper
- Title: Development of Automated Software Design Document Review Methods Using Large Language Models
- Authors: Takasaburo Fukuda, Takao Nakagawa, Keisuke Miyazaki, Susumu Tokumoto
- Link (arXiv): https://arxiv.org/abs/2509.09975
SANER 2025

SANER (Software Analysis, Evolution and Reengineering) is an international conference dedicated to presenting and discussing research on software analysis, evolution, and reverse engineering. It brings together researchers and practitioners who focus on understanding, reconstructing, maintaining, and evolving existing software systems. The conference is well known for its practical and industry-oriented research within the broader field of software engineering. The 32nd edition of SANER was held in Montreal, Canada, from March 4 to 7, 2025, at Polytechnique Montréal. Approximately 200 participants from North America, Europe, and Asia attended the event.
The program included 4 keynotes, 52 papers in the Research Track, and 10 papers in the Industrial Track, along with additional categories such as Short Papers, RENE (Reverse Engineering New Ideas), and Tool Demos. Two tutorials and six workshops were also organized, and the conference’s traditional Most Influential Paper (MIP) award was presented.
In terms of research trends, Generative AI (GenAI) and agent-based systems were among the main topics that attracted significant attention. Many presentations focused on AI-driven approaches to program repair, defect prediction, vulnerability detection, automated testing, and intelligent software agents, showing how generative AI and agent technologies are contributing to deeper program understanding and improvement.
Presentation Content
In this study, we explored a review support method that uses large language models (LLMs) to automate software design document reviews. Design document review is a critical task for quality assurance, but it often suffers from variability in quality and from oversights caused by differences in reviewer skill or by time constraints. To address these issues, we aimed to enable LLMs to automatically detect inconsistencies and problems in the descriptions within design documents.
Structuring Review Perspectives
Current general-purpose large language models (LLMs) lack the specialized knowledge required for effective design document reviews. To overcome this, our study systematically organized the design document review process and defined 11 review perspectives, including sufficiency, consistency, ambiguity, and cross-document validation. We further classified these perspectives by the level of knowledge required and by the number of design documents referenced, and adopted a strategy of designing prompts tailored to each perspective. In Table 2, reviewer difficulty is defined by whether individuals with limited expertise, such as junior team members or third parties outside the project that produced the documents, can perform the review. Perspectives that require references across multiple documents are classified as "multiple design documents," while those needing only one are classified as "single design document." Perspectives with check marks toward the right side of the table are more difficult, with difficulty levels ranging from 1 to 4. Using this framework, we identified the scope of tasks that current LLMs can handle and focused our validation primarily on perspectives such as ambiguity and consistency.

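To make this prompt-design strategy concrete, the sketch below shows one way the perspective classification could drive prompt construction. This is a minimal Python illustration, not Fujitsu's implementation: the perspective names come from the study, but the data structure, prompt wording, and function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ReviewPerspective:
    name: str
    needs_expertise: bool   # does the check require project-specific knowledge?
    multi_document: bool    # does the check span multiple design documents?
    prompt_template: str    # prompt tailored to this perspective

# Illustrative subset of the 11 perspectives; the wording is hypothetical.
PERSPECTIVES = {
    "ambiguity": ReviewPerspective(
        name="ambiguity",
        needs_expertise=False,
        multi_document=False,
        prompt_template=("You are reviewing a software design document.\n"
                         "List every statement that is ambiguous or open to "
                         "multiple interpretations, quoting the original text.\n\n"
                         "{documents}"),
    ),
    "consistency": ReviewPerspective(
        name="consistency",
        needs_expertise=False,
        multi_document=True,
        prompt_template=("You are reviewing related software design documents.\n"
                         "Report every contradiction between them, such as "
                         "mismatched IDs or conditions, citing both locations.\n\n"
                         "{documents}"),
    ),
}

def build_review_prompt(perspective_name: str, documents: list[str]) -> str:
    """Assemble the review prompt for one perspective, joining all documents
    only when the perspective requires cross-document references."""
    p = PERSPECTIVES[perspective_name]
    body = "\n\n---\n\n".join(documents) if p.multi_document else documents[0]
    return p.prompt_template.format(documents=body)
```

A single-document perspective such as ambiguity needs only one input, while consistency checking must assemble every related document into one prompt.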
Handling Complex Input Structures in Design Documents
In Japan’s software industry, many design documents are created using spreadsheet tools such as Excel. These documents often feature hierarchical and complex header structures, which can make it difficult to clearly associate column headers with their corresponding values. As a result, large language models (LLMs) struggle to correctly interpret the context and structure when the data is provided in plain CSV format. This problem becomes even more pronounced when multiple semantic elements are combined within a single cell, or when there are dependencies across rows and columns. In such cases, LLMs often fail to determine which items refer to which pieces of information.
To address this issue, our study proposes a method for converting design documents into a format that allows LLMs to more easily understand header structures. Specifically, we adopted a Markdown-based format that can represent both natural language and structural information. By explicitly defining the relationships between headers and their corresponding values, the model can comprehend design information more accurately. This approach significantly improved review accuracy by preserving document structure that is lost in plain CSV. Additionally, for certain design documents that consist primarily of symbolic expressions or item definitions, we use a supplementary JSON format.

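As a rough sketch of this kind of conversion, the Python function below flattens a two-level spreadsheet header into explicit labels and emits one Markdown section per row. It is an illustration under simplifying assumptions (merged header cells export as blanks, and the first column is the record ID); the actual conversion rules are those described in the paper.

```python
import csv
from io import StringIO

def csv_to_markdown_sections(csv_text: str, header_rows: int = 2) -> str:
    """Convert a spreadsheet-style design document (exported as CSV) into
    Markdown that makes header-value relationships explicit."""
    rows = list(csv.reader(StringIO(csv_text)))
    headers, records = rows[:header_rows], rows[header_rows:]

    # Merged cells export as blanks: fill each blank with the value to its left.
    for level in headers:
        for i in range(1, len(level)):
            if not level[i]:
                level[i] = level[i - 1]

    # Flatten the hierarchy into one label per column, e.g. "Process / Condition".
    labels = [" / ".join(dict.fromkeys(filter(None, col))) for col in zip(*headers)]

    # Emit one Markdown section per record, pairing each value with its full label.
    sections = []
    for record in records:
        lines = [f"## {record[0]}"]  # assume the first column is the record ID
        lines += [f"- **{label}**: {value}"
                  for label, value in zip(labels[1:], record[1:]) if value]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)

doc = ("ID,Process,,Output\n"
       ",Name,Condition,Message\n"
       "P-01,Login check,Password mismatch,E-101\n")
print(csv_to_markdown_sections(doc))
# ## P-01
# - **Process / Name**: Login check
# - **Process / Condition**: Password mismatch
# - **Output / Message**: E-101
```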
Experimental Evaluation
To verify the effectiveness of the proposed method, we conducted experiments focusing on consistency checking. Based on actual design documents used in real projects, we created multiple sample datasets in which inconsistencies (for example, mismatched ID names) were intentionally inserted, and compared detection performance across different document formats. The models used for evaluation were GPT-3.5-turbo, GPT-4, and GPT-4o. Performance was assessed using precision (whether false positives were minimized) and recall (whether missed detections were minimized).
The experiments were designed around three main research questions (RQs); this article discusses two of them:
- RQ1: Does converting design documents into Markdown format improve review performance?
- RQ2: Does the conversion perform differently for natural language-rich and symbolic representation-rich design documents?
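Before turning to the results, the sketch below shows how such a detection run can be scored, assuming both the seeded inconsistencies and the model's findings are reduced to comparable identifiers (the IDs here are invented for illustration).

```python
def score_detection(seeded: set[str], detected: set[str]) -> dict[str, float]:
    """Compute precision and recall for one inconsistency-detection run.

    seeded   -- identifiers of the inconsistencies intentionally inserted
                into the sample documents (ground truth)
    detected -- identifiers the LLM reported after its review
    """
    true_positives = len(seeded & detected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(seeded) if seeded else 0.0
    return {"precision": precision, "recall": recall}

# Example: 4 seeded ID mismatches; the model finds 3 of them plus 1 false alarm.
print(score_detection(
    seeded={"E-101", "E-102", "F-201", "F-202"},
    detected={"E-101", "E-102", "F-201", "X-999"},
))
# -> {'precision': 0.75, 'recall': 0.75}
```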
Results of RQ1
Table 3 below compares review results for the original CSV format with those obtained after applying the proposed conversion method. When design documents were input as CSV, the LLM had difficulty correctly understanding the header relationships within the table, resulting in low recall. In contrast, when the documents were converted using the proposed method, the relationships between headers and values were made explicit, leading to a significant improvement in review performance. This effect was particularly noticeable in consistency checking. Precision did not change significantly, meaning false positives did not increase, so the improvement in recall raised overall review quality.

Results of RQ2
Tables 4 and 5 show the comparison results of precision and recall between two types of design documents: those written mainly in natural language, and those that primarily use symbolic expressions such as database field names. For design documents centered on natural language descriptions, the proposed Markdown-based conversion method achieved the highest performance. By clearly organizing structural relationships within the text, such as processing names, conditions, and explanations, into headings, the LLM was able to follow the context more effectively and identify inconsistencies with higher accuracy. In contrast, for design documents that contain many symbolic representations such as table structures or data item definitions, formats that explicitly preserve the original structure proved to be more effective than Markdown. When identifiers and numerical relationships within cells were clearly defined, the LLM made fewer incorrect associations between different items, resulting in a reduction in false positives. These results indicate that selecting an appropriate input format based on both the characteristics of the design document and the behavior of the model is effective for achieving stable review performance.


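One way to act on this finding is to add a format-selection step in front of the review. The heuristic below is our illustrative sketch, not part of the paper: it routes symbol-heavy documents to a structure-preserving JSON representation and prose-heavy ones to Markdown, using an assumed threshold.

```python
import re

# Tokens that look like identifiers or codes rather than prose:
# SNAKE_CASE / kebab-case names, letter-digit codes such as "E-101",
# or all-caps keywords such as "NULL". (Illustrative pattern only.)
_SYMBOLIC = re.compile(r"[A-Za-z]+[_\-\d][A-Za-z\d_\-]*|[A-Z]{2,}[A-Za-z\d]*")

def choose_input_format(text: str, symbol_threshold: float = 0.3) -> str:
    """Route a design document to an input format for LLM review:
    Markdown when it is mostly natural language, a structure-preserving
    JSON representation when it is dominated by symbolic expressions.
    The 0.3 threshold is an assumed value, not taken from the paper."""
    tokens = [t.strip(",.;:()") for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return "markdown"
    symbolic = sum(bool(_SYMBOLIC.fullmatch(t)) for t in tokens)
    return "json" if symbolic / len(tokens) >= symbol_threshold else "markdown"

print(choose_input_format("The login screen shall display an error message."))  # markdown
print(choose_input_format("USER_ID INT NOT NULL, ORDER_ID INT, E-101"))         # json
```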
Closing Remarks
Building on the findings of this study, we are continuing to advance our research on enhancing review support in the software design process. We have also been exploring multimodal design document review technologies that can handle both textual and diagrammatic representations, such as UML, in an integrated manner. This work*1 was presented at the Software Engineering Symposium 2024 (Japan). We will continue our research and development efforts toward realizing AI-driven software development processes powered by generative AI.
*1: Takasaburo Fukuda, Susumu Tokumoto, Hiroaki Fujimoto, Shigeyuki Odashima, "Exploring Automatic Review Methods for Software Design Documents with Diagrams Using Large Language Models," in Proceedings of the Software Engineering Symposium 2024 (SES 2024), pp. 319-320, 2024 (in Japanese).