
Hello. I'm Oura from the Artificial Intelligence Laboratory at Fujitsu Research.
To promote the use of generative AI in enterprises, Fujitsu has developed a generative AI framework for enterprises that can flexibly respond to diverse and changing corporate needs, make full use of the vast amounts of data held by a company, and readily comply with laws and regulations. The framework has been launched in stages since July 2024 as part of the AI service lineup of Fujitsu Kozuchi (R&D). In this article, we focus on the transformation that generative AI brings to system development and operations, and introduce Code Specification Consistency Analysis, a technology that automates the identification of program failure causes.
How generative AI transforms and automates enterprise system development and operation processes
Generative AI is redefining the very foundations of software development. Code completion and automated test generation are just the beginning: we are moving toward an "AI-based development process" that covers every step, from requirements definition through design and implementation to operation. Large language models (LLMs) extract knowledge from internal documents and logs, accelerating tasks such as drafting specifications, estimating the scope of impact, and creating release notes. Meanwhile, issues such as how to ensure quality, security, and accountability are being addressed in parallel.
Enterprise system development is particularly challenging. Complex business domains, integration with legacy assets, strict requirements for auditing and availability, and frequent responses to legal and regulatory changes: these are issues that cannot be resolved by simple automated generation. It is essential to verbalize requirements accurately, make non-functional requirements visible, and maintain architectural consistency while controlling the ripple effects of changes.
Fujitsu is engaged in research and development of advanced technologies to solve these issues. At their core is Fujitsu Knowledge Graph enhanced RAG, a technology that enables large amounts of data to be referenced and utilized accurately. While this technology can be applied in a variety of general-purpose scenarios, this series focuses on automating and streamlining system development and operations and introduces the following seven technologies. Going forward, we aim to build a multi-agent system in which Knowledge Graph enhanced RAG serves as an integrated database accessible to AI across a wide range of development and operation tasks, advancing requirements definition, design, implementation, and operation in a highly reliable and well-controlled manner.

Table: System development and operation processes to which each technology can be applied
| Technology | Requirement definition | Design | Implementation | Test | Operation | Maintenance |
|---|---|---|---|---|---|---|
| (1) System Specification Visualization | | ✓ | | | | ✓ |
| (2) Design Review Assistant | | ✓ | | | | |
| (3) Code Specification Consistency Analysis | | | ✓ | ✓ | | |
| (4) Test Specification Generation | | ✓ | | ✓ | | |
| (5) Failure Analysis | | | | | ✓ | ✓ |
| (6) Log Analysis | | | | | ✓ | ✓ |
| (7) QA Automation | ✓ | | | | ✓ | ✓ |
(1) System Specification Visualization (Knowledge Graph enhanced RAG for Software Engineering, now available)
This technology not only analyzes and understands source code, but also generates high-level functional design documents and summaries, enabling modernization.
(2) Design Review Assistant (now available)
This technology automates the checking of ambiguity and consistency in system design documents by converting complex system design documents into a form that can be understood by generative AI.
(3) Code Specification Consistency Analysis (This article)
This technology compares source code with specifications to detect differences and identify problem areas, shortening the time required to investigate the cause of a failure when one occurs.
(4) Test Specification Generation (now available)
This technology extracts rules for identifying test cases from existing design documents and test specifications, making it possible to generate complete test specifications that take into account the characteristics of the project.
(5) Failure Analysis (Knowledge Graph enhanced RAG for Root Cause Analysis, now available)
This technology creates a report when a failure occurs based on system logs and data from failure cases, and suggests countermeasures based on similar failure cases.
(6) Log Analysis (Knowledge Graph enhanced RAG for Log Analysis, now available)
This technology automatically analyzes system log files and answers highly specialized questions related to identifying the cause of failures, detecting anomalies, and preventive maintenance.
(7) QA Automation (Knowledge Graph enhanced RAG for Q&A, now available)
This technology enables advanced Q&A with a bird's-eye view of large amounts of document data, such as product manuals.
In this article, I will introduce "(3) Code Specification Consistency Analysis" in detail.
What is Code Specification Consistency Analysis?
Code Specification Consistency Analysis (hereafter, CSCA) is an AI core engine that analyzes the causes of system failures by referencing both specifications and source code. For example, when an error occurs during the testing phase of system development, the tester can describe the error in natural language, and the AI automatically analyzes the specifications and source code to pinpoint the problematic section. Even when a failure spans multiple components, CSCA can identify the cause automatically, enabling testers to investigate issues quickly without involving each component's developer.

In this article, we will take a technical deep dive into how CSCA works, discuss its challenges, and explore future directions. While the primary use case covered is failure analysis during system testing, developers themselves can also use CSCA during implementation.
Existing Technologies and Their Challenges
In recent years, methods that leverage large language models (LLMs) to analyze specifications and source code have attracted attention as a way to assist in identifying the root causes of system failures. One representative approach is a technique called ReAct (Reasoning + Acting)*1.
ReAct provides an LLM with both a question and a set of available tools. The LLM then selects and executes tools step by step, gathering and analyzing information to arrive at the final answer. The key is which tools are made available for a given use case. For instance, the following tools may be prepared:
| No. | Tool Name | Purpose / Description |
|---|---|---|
| 1 | Specification Search Tool | Searches a pre-analyzed database of specifications for relevant sections. |
| 2 | Source Code Search Tool | Searches source code by keywords and outputs the matching file names. |
| 3 | File Reading Tool | Reads the specified file. |
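Conceptually, a ReAct loop built on these tools alternates between asking the LLM for a decision and executing the chosen tool. The sketch below is a minimal, runnable illustration; `call_llm` and the tool bodies are placeholders standing in for a real LLM call and real search utilities, not actual framework code.

```python
# Minimal ReAct-style control loop (illustrative sketch only).
# call_llm and the tool bodies are placeholders, not real framework code.

def search_specs(query):
    # Placeholder: would query a pre-analyzed specification database.
    return f"spec sections matching '{query}'"

def search_code(keyword):
    # Placeholder: would grep the repository and return matching file names.
    return [f"file_matching_{keyword}.py"]

def read_file(path):
    # Placeholder: would return the file's full contents.
    return f"contents of {path}"

TOOLS = {
    "search_specs": search_specs,
    "search_code": search_code,
    "read_file": read_file,
}

def call_llm(question, history):
    # Placeholder for the LLM call: it returns either a tool invocation
    # ("act") or a final answer ("finish"). Hard-coded here so the loop runs.
    if not history:
        return {"type": "act", "tool": "search_specs", "arg": "authentication policy"}
    return {"type": "finish", "answer": "cause identified"}

def react(question, max_steps=10):
    history = []  # interleaved decisions and tool observations
    for _ in range(max_steps):
        decision = call_llm(question, history)
        if decision["type"] == "finish":
            return decision["answer"]
        observation = TOOLS[decision["tool"]](decision["arg"])
        history.append((decision, observation))
    return "no conclusion within step budget"
```

In a real system the `call_llm` placeholder is where the LLM reasons over the accumulated observations before choosing the next tool.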

Let’s walk through an example of applying ReAct to a use case.
Suppose a web service is under development. During testing, it was found that only specific users fail to log in. The tester inputs the following request into ReAct’s main program:
"We are currently developing a web service. During testing, we encountered an issue where only specific accounts cannot log in. Please identify the cause."
ReAct, working with the LLM, executes the tools in the following steps:
Step 1: Specification Search
Since the issue relates to authentication, the “Specification Search Tool” is used to look up “Authentication Policy.” The search yields the following information:
- User passwords must be stored using SHA-256 hashing.
- Initially, MD5 was planned, but the design was later revised to SHA-256.
This leads to the hypothesis that some implementation of MD5 encryption may still remain in the code.
Step 2: Source Code Search
Next, to locate relevant source code, the “Source Code Search Tool” is used with keywords like md5 / hash / crypto. The results include the following file candidates:
- legacy_hash.py
- crypto_utils.py
- auth_handler.py
- docs/CHANGELOG.md
- tests/crypto/test_md5_compat.py
- README_security.md
Step 3: Reviewing Source Code
Using the “File Reading Tool,” the candidate files are reviewed one by one. However, each file spans hundreds of lines, and during this process the LLM hits its token limit and cannot continue.
- legacy_hash.py (~800 lines)
- crypto_utils.py (~1500 lines)
- auth_handler.py (~500 lines)
Even without hitting the token limit, the more irrelevant information is included, the more the LLM’s accuracy degrades, raising the risk of overlooking the true cause.
Thus, while ReAct provides a powerful framework, simply combining basic search and file-reading tools is insufficient for analyzing failures in source code. More sophisticated approaches are required.
Features of CSCA
To address these challenges, we developed Code Specification Consistency Analysis (CSCA). CSCA extends the ReAct framework by incorporating unique source code tools (No. 2–4 below), enabling efficient failure cause identification from both specifications and source code.
| No. | Tool Name | Purpose / Description |
|---|---|---|
| 1 | Specification Search Tool | Searches a pre-analyzed database of specifications for relevant sections. |
| 2 | Source Code Listing Tool | Retrieves a list of program files along with their directory names. |
| 3 | Source Code Summary Tool | Summarizes function names and main operations of a file using another LLM. |
| 4 | Source Code Extraction Tool | Extracts specific functions or processes from the designated source file. |
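The three code-oriented tools (No. 2–4) can be approximated with standard library features. The sketch below uses Python's `ast` module; it is an illustration under assumed interfaces, not the production implementation (the real Summary Tool asks another LLM rather than listing definitions).

```python
import ast
import os

def list_source_files(root):
    """Tool No. 2 (sketch): list program files with their directory names."""
    return [os.path.join(d, f)
            for d, _, files in os.walk(root)
            for f in files if f.endswith(".py")]

def summarize_file(path):
    """Tool No. 3 (sketch): the real tool asks another LLM for a summary;
    here we approximate it by listing top-level functions and classes."""
    tree = ast.parse(open(path).read())
    return [f"{type(node).__name__}: {node.name}"
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.ClassDef))]

def extract_function(path, name):
    """Tool No. 4 (sketch): extract one named function from a source file."""
    source = open(path).read()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return ast.get_source_segment(source, node)
    return None
```

Because the summary and extraction outputs are a few lines rather than whole files, the agent's context stays small at every step, which is the point of the design.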

Revisiting the earlier example with CSCA, the process proceeds as follows:
Step 1: Specification Search
As before, the “Specification Search Tool” is used to locate the “Authentication Policy.”
Step 2: Source Code File Identification
The “Source Code Listing Tool” retrieves file names. Files containing words like auth, crypto, or hash are selected:
- legacy_hash.py
- crypto_utils.py
- auth_handler.py
Step 3: Source Code Summarization
The “Source Code Summary Tool” is run on each file, producing summaries of only a few dozen lines. From these, we learn:
- In auth_handler.py, the `check_password()` function imports and uses crypto_utils.py for encryption.
- In crypto_utils.py, both a SHA256Hash class and an MD5Hash function coexist.
Step 4: Source Code Extraction
Finally, the “Source Code Extraction Tool” is instructed to extract `check_password()` from auth_handler.py. Reviewing this snippet reveals the line `hashed = MD5Hash(password)`.
This contradicts the specification. Furthermore, the implementation routes only early-registered users through MD5, instead of SHA-256.
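To make the finding concrete, here is a hypothetical reconstruction of the kind of defect described. The names follow the example above, but the cutover condition, the `LEGACY_CUTOVER_ID` constant, and the inlined hash helpers are assumptions for illustration, not actual product code.

```python
# Hypothetical reconstruction of the defect in auth_handler.py.
# SHA256Hash/MD5Hash stand in for the crypto_utils implementations;
# LEGACY_CUTOVER_ID is an assumed boundary for "early-registered" users.
import hashlib

LEGACY_CUTOVER_ID = 10_000

def SHA256Hash(password):
    return hashlib.sha256(password.encode()).hexdigest()

def MD5Hash(password):
    return hashlib.md5(password.encode()).hexdigest()

def check_password(user_id, password, stored_hash):
    # Bug: early-registered users still go through the MD5 path,
    # contradicting the revised specification (SHA-256 for everyone).
    if user_id < LEGACY_CUTOVER_ID:
        hashed = MD5Hash(password)   # leftover from the initial design
    else:
        hashed = SHA256Hash(password)
    return hashed == stored_hash
```

A user whose stored hash was migrated to SHA-256 but whose ID falls below the cutover would always fail to log in, which matches the reported symptom.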
Based on this, the LLM outputs a conclusion such as:
"In auth_handler.py, certain users are processed with MD5 hashing instead of SHA-256. This contradicts the specification and likely remains from the initial development phase. Please modify the code so that all users use SHA256Hash."
Thus, CSCA collects and organizes information from both specifications and source code, enabling the LLM to conduct efficient and accurate failure analysis. By progressively narrowing the focus, CSCA overcomes limitations of traditional methods, such as token constraints and accuracy degradation due to noise.
Challenges of CSCA
CSCA was released in July 2024 as one of the “AI Core Engines” on Fujitsu Kozuchi, a platform for rapidly experimenting with cutting-edge AI technologies. As it has been adopted within internal projects and by customers, several challenges have surfaced. The three main issues are:
- Insufficient Coverage: When failure causes span multiple files, CSCA sometimes prematurely concludes “this is the cause” after finding a few related files, without fully exploring other candidates. This can result in incomplete cause identification.
- Scalability in Large Projects: For systems with a very large number of files, accuracy tends to decline. This is because CSCA’s process of listing, summarizing, and reviewing related files must be repeated many times, introducing more noise and reducing precision.
- Unclear Fix Guidance: While CSCA was originally intended for testers to isolate cause locations, there is growing demand for it to also suggest how to fix the issues. Currently, CSCA only identifies the causes, without providing concrete remediation steps.
Future Development Plans
To address these challenges, we are pursuing several enhancements:
- Improvement of Code Search Coverage: Introduce tools that evaluate code dependencies and retrieve related files from a starting point, enabling more systematic and comprehensive searches. Add structured search strategies and result validation mechanisms.
- Multi-Agent Collaboration: Large-scale projects are difficult for a single agent to handle due to the volume of information. By orchestrating multiple agents that search in parallel or evaluate each other’s outputs, CSCA can handle bigger projects more effectively.
- Providing Fix Proposals: Beyond cause analysis, CSCA aims to eventually suggest specific code fixes or patch candidates.
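The dependency-based search idea can be sketched as follows: build an import graph with Python's `ast` module and walk outward breadth-first from a starting file. Module-name resolution here is deliberately simplified (flat directory, no packages) and is an assumption of the sketch, not how a production tool would resolve imports.

```python
import ast
import os
from collections import deque

def local_imports(path, known_modules):
    """Return modules imported by `path` that belong to the project."""
    tree = ast.parse(open(path).read())
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found & known_modules

def related_files(start, root):
    """Breadth-first traversal of import dependencies from `start`.
    Simplified: maps 'module' -> '<root>/module.py', no packages."""
    known = {f[:-3] for f in os.listdir(root) if f.endswith(".py")}
    seen, queue = set(), deque([start])
    while queue:
        module = queue.popleft()
        if module in seen or module not in known:
            continue
        seen.add(module)
        queue.extend(local_imports(os.path.join(root, module + ".py"), known))
    return seen
```

Traversing the graph from a suspect file yields a bounded, systematic candidate set instead of an open-ended keyword search, which addresses the coverage issue directly.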
Conclusion
CSCA is an AI core engine developed to support system failure analysis, combining ReAct-based multi-step reasoning with advanced code search techniques. While challenges such as coverage and scalability remain, we are actively researching solutions through enhanced search strategies, multi-agent systems, and fix proposal generation.
Looking ahead, CSCA will continue to evolve, including integration with other technologies such as Fujitsu Knowledge Graph enhanced RAG series.
Acknowledgment
This technology was developed by the following team members, whom I would like to take this opportunity to introduce.
Artificial Intelligence Laboratory: Junki Oura, Akihiro Wada, Tatsuya Kikuzuki, Masatoshi Ogawa.
Space Data Frontiers Research Center: Keiichi Nakatsugawa.
*1: Shunyu Yao, et al. "ReAct: Synergizing Reasoning and Acting in Language Models." International Conference on Learning Representations (ICLR), 2023.