
Hello. We are Fujii and Yanashima from the Artificial Intelligence Research Laboratory.
In this article, we introduce the knowledge-guided causal discovery technology in Fujitsu Causal AI, along with examples of its application.
Fujitsu Causal AI was created to solve the following challenges that conventional action recommendation technologies faced:
- Difficulty in considering negative impacts (side effects) of actions.
- Inability to simultaneously consider multiple causal relationships, potentially leading to sub-optimal action proposals.
- Reliance on a single dataset, making it susceptible to data volume and bias, leading to limited action proposals.
To address these challenges, Fujitsu Causal AI consists of the following three core technologies:
- Causal Action Optimization Technology
- Integrated Causal Discovery Technology
- Knowledge-Guided Causal Discovery Technology

We hope this series of articles offers hints for solving your own challenges. At the end of the article, we also explain how to try out this technology.
(1) Causal Action Optimization Technology (Now available)
This technology rapidly analyzes causal relationships between phenomena from numerical data and explains them with graphs and natural-language sentences. Based on those causal relationships, it then recommends the most effective actions that have no negative impact.
(2) Knowledge-Guided Causal Discovery (This Article)
The Knowledge-Guided Causal Discovery Technology leverages past causal relationship graphs as prior knowledge during causal discovery, achieving highly reliable analysis even with small datasets.
(3) Case Study of Fujitsu Causal AI Technology (Article scheduled around January 16)
We will introduce a case study of our pilot project that utilized this technology for parameter design in the beer brewing process.
Knowledge-Guided Causal Discovery: Enabling Highly Reliable Causal Discovery Even with Limited Data
Knowledge-guided causal discovery enables highly reliable analysis even with a small amount of data by using past causal discovery results and known causal graphs as prior knowledge when exploring causal relationships. This section outlines the technology and introduces an application example that leverages the "Hirosaki Health Checkup Causal Network Model," a known causal graph integrated into Fujitsu Causal AI.
What are the challenges of having limited data?
Wanting to analyze data but being able to collect only a limited amount of it is a frequent problem when dealing with real-world data. Furthermore, in cases such as causal analysis of employee data from a specific department, the dataset tends to be inherently small.
The causal discovery technology underpinning Fujitsu Causal AI estimates causal relationships behind data using statistical methods. Therefore, when data is limited, there is a higher probability of incorrectly estimating the direction and magnitude of causal relationships between variables, making it difficult to fully grasp the "correct causal relationships" that should exist behind the data.
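The effect of sample size on estimated magnitudes can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a single linear relationship fitted by least squares stands in for causal discovery, and the true effect of 0.5, the sample sizes, and the function names are all hypothetical choices made for illustration, not part of Fujitsu Causal AI.

```python
import random
import statistics


def estimate_slope(n, true_slope, rng):
    """Fit y = slope * x by least squares on n simulated samples."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, 1) for x in xs]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def slope_spread(n, trials=200, true_slope=0.5, seed=0):
    """Standard deviation of the estimated effect across repeated datasets."""
    rng = random.Random(seed)
    slopes = [estimate_slope(n, true_slope, rng) for _ in range(trials)]
    return statistics.stdev(slopes)


spread_small = slope_spread(n=20)
spread_large = slope_spread(n=1000)
print(f"spread of estimated effect, n=20:   {spread_small:.3f}")
print(f"spread of estimated effect, n=1000: {spread_large:.3f}")
```

The estimate from 20 samples varies far more from dataset to dataset than the estimate from 1,000 samples, which is the instability that prior knowledge is meant to compensate for.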
How to leverage prior knowledge?
To improve analysis accuracy even with limited data, it is crucial to set prior knowledge such as "A can be a cause of B, but not vice versa." Traditionally, users have had to set this prior knowledge themselves based on their domain expertise. However, for users without specialized knowledge, providing correct prior knowledge is a challenging task.
Furthermore, it is difficult for users to provide prior knowledge that considers complex causal relationships among multiple variables, which can undermine the validity and reliability of the analysis results. For example, when considering a direct causal relationship between A and B, it is necessary to set up the correct causal relationship between A and B as prior knowledge, taking into account variable C that also influences these two variables. Typically, there are multiple common causes similar to C for A and B (e.g., D, E, ...), making it extremely difficult for humans to set such complex prior knowledge that also considers these common causes.
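To make the idea of prior knowledge concrete, the following sketch encodes "A can be a cause of B, but not vice versa" as a table of edge constraints. The encoding (-1 for unknown, 0 for forbidden, 1 for required) and the helper names are assumptions made for this illustration; they are not the format used by Fujitsu Causal AI.

```python
# Hypothetical encoding of prior knowledge about the directed edge cause -> effect.
UNKNOWN, FORBIDDEN, REQUIRED = -1, 0, 1


def make_prior_knowledge(variables, forbidden=(), required=()):
    """Return pk where pk[cause][effect] is UNKNOWN, FORBIDDEN, or REQUIRED."""
    pk = {c: {e: UNKNOWN for e in variables if e != c} for c in variables}
    for cause, effect in forbidden:
        pk[cause][effect] = FORBIDDEN
    for cause, effect in required:
        pk[cause][effect] = REQUIRED
    return pk


def allowed_edges(pk):
    """List the directed edges a discovery algorithm may still consider."""
    return sorted(
        (cause, effect)
        for cause, row in pk.items()
        for effect, state in row.items()
        if state != FORBIDDEN
    )


variables = ["A", "B", "C"]
# "A can be a cause of B, but not vice versa": forbid the edge B -> A.
pk = make_prior_knowledge(variables, forbidden=[("B", "A")])
edges = allowed_edges(pk)
print(edges)
print("entries to decide by hand:", len(variables) * (len(variables) - 1))
```

The number of ordered pairs grows as n(n-1) with the number of variables n, which hints at why filling in such a table by hand quickly becomes impractical.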
How is knowledge-guided causal discovery achieved?
To solve this problem, this technology derives causal information tailored to the user's data from causal graphs obtained in past causal discovery, or from highly reliable known causal graphs, and applies it as prior knowledge to causal discovery on the user's data. It is realized mainly by the following two techniques:
- Data variable to causal graph node matching
- Causal information extraction
Data Variable to Causal Graph Node Matching

In data variable to causal graph node matching, the system automatically determines which variable in the user's data corresponds to which node on the known causal graph that serves as prior knowledge.
For example, let's consider which source item (a node on the known causal graph) the target item "Activity Level" (a variable in the user's data) corresponds to. Here, the blue squares in the figure correspond to the target items, and the black circles correspond to the source items. If we measure only the semantic proximity between item names, "Activity Level" in the example shown is closest to "Body Mass Index," resulting in a different mapping than the more desirable "Exercise Habits."
Ideally, when matching variables, we would determine the final correspondence by considering both the semantic proximity of item names and the proximity of the data. However, for known causal graphs, it is often not possible to access the original data directly, making it difficult to establish correspondences by comparing data distributions.
Therefore, we determine the correspondence using a technique we developed called proxy-assisted matching. This technique obtains proxy data for the source causal graph from the target data, achieving a more desirable matching. First, we determine the data variable closest to each node on the causal graph based on semantic proximity. Then, using this correspondence, proxy data for the graph node is obtained from the user data, and the proximity between the proxy data and the user data is measured in the data space. Finally, by integrating the two, the proximity between the graph node and the user data variable is measured, and the final correspondence is determined.
For instance, in the previous example, where "Activity Level" and "Body Mass Index" were close due to semantic proximity alone, the correction by proxy data made "Activity Level" and "Exercise Habits" closer, achieving the desired matching.
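The three steps of proxy-assisted matching described above can be sketched as follows. This is a simplified illustration, not Fujitsu's implementation: the semantic similarity scores, the simulated data, the "Daily Steps" variable, the use of absolute Pearson correlation as data proximity, and the equal weighting of the two proximities are all assumptions introduced here.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical user data: activity tracks daily steps, BMI is only weakly related.
rng = random.Random(0)
steps = [rng.gauss(0, 1) for _ in range(500)]
data = {
    "Daily Steps": steps,
    "Activity Level": [s + 0.3 * rng.gauss(0, 1) for s in steps],
    "BMI": [-0.2 * s + rng.gauss(0, 1) for s in steps],
}

# Hypothetical semantic similarity: semantic[graph node][user variable].
semantic = {
    "Body Mass Index": {"Daily Steps": 0.1, "Activity Level": 0.6, "BMI": 0.95},
    "Exercise Habits": {"Daily Steps": 0.7, "Activity Level": 0.5, "BMI": 0.1},
}


def match_semantic_only(semantic, data):
    """Map each user variable to its semantically closest graph node."""
    return {v: max(semantic, key=lambda node: semantic[node][v]) for v in data}


def match_proxy_assisted(semantic, data, weight=0.5):
    # Step 1: proxy data for a node = column of its semantically closest variable.
    proxy = {node: data[max(sims, key=sims.get)] for node, sims in semantic.items()}
    result = {}
    for var, column in data.items():
        scores = {}
        for node in semantic:
            # Step 2: proximity between proxy data and user data in data space.
            data_proximity = abs(pearson(proxy[node], column))
            # Step 3: integrate semantic and data proximity.
            scores[node] = weight * semantic[node][var] + (1 - weight) * data_proximity
        result[var] = max(scores, key=scores.get)
    return result


semantic_only = match_semantic_only(semantic, data)
combined = match_proxy_assisted(semantic, data)
print("semantic only :", semantic_only["Activity Level"])
print("proxy-assisted:", combined["Activity Level"])
```

In this toy setup, "Activity Level" moves from "Body Mass Index" to "Exercise Habits" once data proximity is taken into account. Note that the sketch does not enforce a one-to-one assignment between variables and nodes.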
Causal Information Extraction

Based on the determined correspondences, information to be used as prior knowledge is extracted from the causal graph. The simplest method is to transfer information about the presence or absence of direct causal relationships between nodes on the known causal graph corresponding to user data variables. However, as seen in the "Hirosaki Health Checkup Causal Network Model" described later, if a massive graph is available as a known causal graph, transferring information including not only local causal relationships but also global indirect causal relationships can leverage prior knowledge more richly.
For example, suppose the known causal graph contains "Factory Operating Rate" and "Atmospheric CO2 Concentration" nodes that correspond to user data, and looking only at the direct relationship between the two shows "no causal relationship." A feature of this technology is that it can still take the relationship into account when there is an indirect causal path such as "Factory Operating Rate" -> ● -> "Atmospheric CO2 Concentration" via an unmapped node ●.
On the other hand, when there are multiple unmapped nodes ● between two mapped nodes, such as "Factory Operating Rate" and "Electricity Price," it is questionable whether such a relationship should be applied as is, because causal effects often diminish or disappear over many intermediate variables. This technology analyzes the positional relationship between corresponding nodes on the known causal graph based on geometrical features such as graph connectivity, and automatically determines the degree to which the non-corresponding nodes along the path should be considered.
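The idea of transferring indirect relationships through unmapped nodes can be sketched with a breadth-first search. This is a simplified stand-in: the technology described above decides automatically, from geometrical features of the graph, how far to look through unmapped nodes, whereas this sketch uses a fixed limit `max_unmapped`. The graph contents and the `u_...` node names are hypothetical.

```python
from collections import deque


def extract_prior_edges(graph, mapped, max_unmapped=1):
    """Find mapped-to-mapped relationships that pass only through unmapped nodes.

    Returns {(cause, effect): number of unmapped nodes on the shortest such path}.
    """
    edges = {}
    for start in mapped:
        queue = deque([(start, 0)])  # (node, unmapped nodes passed so far)
        seen = {start}
        while queue:
            node, passed = queue.popleft()
            for child in graph.get(node, ()):
                if child in mapped:
                    if child != start:
                        # BFS order guarantees the first hit is the shortest path.
                        edges.setdefault((start, child), passed)
                elif passed < max_unmapped and child not in seen:
                    seen.add(child)
                    queue.append((child, passed + 1))
    return edges


# Hypothetical known causal graph; nodes prefixed with "u_" have no user variable.
graph = {
    "Factory Operating Rate": ["u_emissions", "u_power_demand"],
    "u_emissions": ["Atmospheric CO2 Concentration"],
    "u_power_demand": ["u_fuel_cost"],
    "u_fuel_cost": ["u_wholesale_price"],
    "u_wholesale_price": ["Electricity Price"],
}
mapped = {"Factory Operating Rate", "Atmospheric CO2 Concentration", "Electricity Price"}

near = extract_prior_edges(graph, mapped, max_unmapped=1)
far = extract_prior_edges(graph, mapped, max_unmapped=3)
print(near)
print(far)
```

With a limit of one unmapped node, only the relationship to "Atmospheric CO2 Concentration" is transferred; the relationship to "Electricity Price" appears only when the limit is raised to three.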
"Hirosaki Health Checkup Causal Network Model" Integrated Into Fujitsu Causal AI
So far, we have introduced the technical background of knowledge-guided causal discovery. With Fujitsu Causal AI, you can actually try out this technology using the highly reliable, large-scale causal graph "Hirosaki Health Checkup Causal Network Model" as prior knowledge.
The Hirosaki Health Checkup Causal Network Model is a highly reliable causal graph. A research group from Kyoto University and Hirosaki University obtained it by applying Bayesian network technology to the big data of multi-item health checkup results, acquired by "Hirosaki University COI-NEXT" through the "Iwaki Health Promotion Project," and estimating the causal relationships between items. Fujitsu has so far applied knowledge-guided causal discovery to various datasets, using the Hirosaki Health Checkup Causal Network Model as prior knowledge. Here, we introduce two application examples.
Example 1: Application to Gene and Dietary Habit Data
With the advancement of genomics, many correlations between genes and physical constitution or behavior have been reported. However, going beyond correlation to causal relationships, that is, understanding what causes what and through what mechanism, is difficult because interactions among multiple factors must be considered. Precise data analysis is indispensable in areas where complex factors such as food preferences, lifestyle habits, and body type are intertwined.
In this initiative, we combined large-scale genetic and questionnaire data from Genequest Inc. (hereinafter, Genequest) with Fujitsu Causal AI to analyze these complex causal mechanisms more deeply. Furthermore, by applying knowledge-guided causal discovery technology, reliability was enhanced, and the existence of important factors not visible from the dataset alone was suggested.
What we learned from causal discovery
In this initiative, we conducted causal analysis using Fujitsu Causal AI with genetic and questionnaire data from approximately 4,000 consented individuals. As a result, two major findings were obtained:
- Relationship between genetic characteristics related to alcohol metabolism and dietary habits
Genetic characteristics related to the ability to break down alcohol are strongly associated with drinking frequency. Previous research by Genequest has also shown associations with various dietary habits, such as sweet preference and coffee consumption frequency, suggesting an influence of genetic factors on both. Analysis using Fujitsu Causal AI showed that while genetic alcohol tolerance is partly related to sweet preference, this association is likely mediated primarily by drinking frequency. For coffee consumption frequency, in contrast, no association with drinking frequency was observed, suggesting that genetic alcohol tolerance may influence it without going through drinking frequency. This indicates the possibility that specific genetic characteristics affect an individual's beverage choices.
- Relationship between genetic characteristics related to body type and dietary habits/BMI
Using a polygenic score, an index integrating numerous genetic factors related to diet and BMI, we analyzed the causal relationship between genetic predisposition to obesity, dietary habits, and BMI. The results suggested a direct association between genetic predisposition to obesity and BMI, with a statistical impact comparable to that of sex and age. A slight association was also observed with food preferences such as fatty and sweet tastes.
Application of Knowledge-Guided Causal Discovery
Moreover, in the analysis that utilized the Hirosaki Health Checkup Causal Network Model through knowledge-guided causal discovery, more precise results were obtained. The influence of meal quantity, which had been suggested as a major factor in BMI change second only to the polygenic score in previous analyses, relatively decreased. Instead, preferences for fatty and umami tastes were suggested as more influential factors. Furthermore, the analysis suggested that factors not included in the data, such as family history of illness (cancer, hypertension, heart disease, etc.), the subject's height, and employment status, could be unobserved common causes between variables.
Example 2: Application to Employee Data
The data used in this example is modeled on employee health checkup data, but it is simulated data created solely for the purpose of this technical introduction.
Appropriate Measures for Each Department
We consider formulating measures to improve target variables such as sales revenue, taking into account the causal relationships of data items, using employee health checkup data, engagement surveys, and financial data held by a company. In this scenario, causal relationships can be estimated from the integrated data combining health checkup, financial, and other data, and causal AI can then recommend effective measures based on these estimated causal relationships. However, such integrated data often mixes various departments, and different departments may exhibit unique characteristics. Therefore, to achieve the single goal of increasing revenue, it might be more effective for the company to implement multiple tailored measures across various departments rather than a single, company-wide measure. In such situations, it is necessary to segment data samples by department and formulate effective measures even with limited departmental data samples.
Formulating Measures to Increase Sales Revenue

First, we estimated a causal graph without using prior knowledge and had Fujitsu Causal AI propose actions. Unfortunately, it recommended increasing gross profit as the action to boost sales revenue. Looking at the causal graph (image above), gross profit appears to be the dominant item among those upstream of sales revenue, which led to the recommendation. However, this is an obvious measure, and as employees we cannot take concrete, actionable steps based on it.
Utilizing Knowledge-Guided Causal Discovery

Therefore, we estimated a causal graph using knowledge-guided causal discovery, leveraging the reliable Hirosaki Health Checkup Causal Network Model as prior knowledge. Based on the estimated causal graph (image above), Fujitsu Causal AI proposed increasing engagement as a measure to boost sales revenue. Engagement is a metric directly relevant to us as employees, and actions to increase it appear easier to undertake than the previous suggestion.
We don't want to compromise employee health
However, it was also discovered that taking actions to increase engagement could, as a side effect, make employees more susceptible to stress. Sacrificing employee well-being by increasing stress in exchange for higher sales revenue is undesirable from a health perspective. Therefore, we asked the causal AI to recommend actions that would not lead to increased employee stress. The AI then recommended, in addition to increasing engagement, ensuring strong support from colleagues. This can also be seen as an effect of simultaneously utilizing health checkup data. By the way, how exactly can we increase engagement?
We want to increase engagement
Previously, we queried the causal AI for measures to increase sales revenue. This time, let's focus on increasing engagement and have the causal AI formulate a strategy. Considering our previous experience, we will also add the constraint of not compromising health. The insight gained was that to effectively increase engagement, strengthening both manager and colleague support is beneficial.
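The kind of constrained recommendation described in this example, improving a target without worsening a side effect, can be sketched on a toy linear causal graph. Everything here is a hypothetical illustration: the edge coefficients are invented, interventions are modeled as unit shifts whose effects add up linearly, and the exhaustive search is not the optimization method used by Fujitsu Causal AI.

```python
from itertools import combinations

# Hypothetical linear causal graph: edges[cause][effect] = direct effect.
edges = {
    "manager_support": {"engagement": 0.5},
    "colleague_support": {"engagement": 0.4, "stress": -0.5},
    "engagement": {"sales_revenue": 0.6, "stress": 0.3},
    "stress": {"sales_revenue": -0.2},
}


def total_effect(cause, effect):
    """Sum of coefficient products over all directed paths (graph must be a DAG)."""
    if cause == effect:
        return 1.0
    return sum(w * total_effect(child, effect) for child, w in edges.get(cause, {}).items())


def recommend(candidates, target, side_effect, max_actions=2):
    """Pick the action set with the largest gain on target and no increase in side_effect."""
    best, best_gain = None, 0.0
    for k in range(1, max_actions + 1):
        for actions in combinations(candidates, k):
            gain = sum(total_effect(a, target) for a in actions)
            harm = sum(total_effect(a, side_effect) for a in actions)
            if harm <= 0 and gain > best_gain:
                best, best_gain = actions, gain
    return best, best_gain


revenue_plan, revenue_gain = recommend(
    ["engagement", "manager_support", "colleague_support"], "sales_revenue", "stress"
)
engagement_plan, engagement_gain = recommend(
    ["manager_support", "colleague_support"], "engagement", "stress"
)
print(revenue_plan, round(revenue_gain, 3))
print(engagement_plan, round(engagement_gain, 3))
```

In this toy graph, raising engagement alone is rejected because it increases stress, while pairing it with colleague support satisfies the constraint. The numbers only show how a side-effect constraint changes the recommendation and say nothing about real effect sizes.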
Would you like to try Fujitsu Causal AI?
The causal discovery and action optimization features we introduced in this article are readily available via API and GUI through Fujitsu's AI service "Fujitsu Kozuchi."
No programming is required. Simply upload a CSV file to visualize causal graphs from your data and obtain a list of candidate actions.
| Use Case | Input | Output proposed by Kozuchi |
|---|---|---|
| Beer Development | Data on taste chart (bitterness, sweetness, aroma, etc.) and factory settings (fermentation temperature, hop addition amount, etc.), and target values for the taste chart | Entire manufacturing recipe, including fermentation temperature/time, malt blending ratio, hop addition timing, etc. |
| Employee Engagement | Current survey results and target scores | Specific HR actions such as training content, peer bonus amounts, frequency of 1-on-1s with supervisors. |
Try it with your own data to instantly test "What if...?" scenarios.
Related Articles
Paper Presentation (ECML-PKDD 2024):
Presented "LayeredLiNGAM," which speeds up LiNGAM, a representative model for statistical causal discovery.
▶ Technical Blog Explanation
▶ Paper (Springer)
Application Examples (Materials Informatics Special Feature):
▶ #9: What causes changes in material properties? Causal Discovery AI answers!
▶ #10: Application of Causal Discovery AI to Semiconductor Device Design
Press Releases:
▶ Genequest and Fujitsu uncover new insights into genetics-lifestyle relationships through high-speed, reliable causal AI from Fujitsu Kozuchi (2025/10/09)
▶ Update on "Fujitsu Kozuchi" Causal Discovery Technology (2025/03/06)
▶ Launch of "Fujitsu Kozuchi" AI Service to Accelerate DX in a Wide Range of Business Systems (2023/05/17)
▶ Developed AI Technology to Incorporate Human Judgments and Hypotheses to Discover Highly Accurate Causal Relationships (2020/12/17)
Try Fujitsu Causal AI
▶ Fujitsu Research Portal: Data-Driven Decision Making