

fltech - Technology Blog of Fujitsu Research

A technology blog where Fujitsu researchers talk about a variety of topics

Imagined Spaces: Powering Cooperative Multi-Robot Operation

Hello, I’m Kazuki Osamura from the Spatial Robotics Research Center at Fujitsu Laboratories.
We are developing a Spatial World Model, a core technology showcased at Fujitsu Technology Update (FTU 2025) that enables human–robot collaboration in real-world environments. We also published an official press release on the Spatial World Model today; we hope you will read it alongside this article.

1. The Era of Robots Entering Human Living Spaces

In recent years, AI-driven robots have rapidly expanded beyond factories and warehouses into everyday environments such as offices, cafés, retail stores, and airports. Within this growing trend of “Physical AI,” the importance of safely and efficiently managing spaces where diverse people and robots operate simultaneously has never been greater.

However, unlike the well-structured and controlled settings of manufacturing sites or logistics centers, real-world living spaces involve complex human behaviors and constantly changing conditions. In such environments, smooth collaboration between humans and robots remains a major challenge.

Typical issues observed in real environments include:

  • Delivery robots nearly colliding with pedestrians
  • Autonomous robots failing to coordinate with each other
  • Service robots unable to interpret complex changes in their surroundings

These problems stem from a fundamental limitation: a robot’s narrow field of view and purely local decision-making are insufficient to understand the broader spatial context, surrounding dynamics, and underlying causal relationships.

Challenges in Real-World Robot Operation

To address this, we are developing the Spatial World Model, based on the idea of making not just robots, but the entire space itself, intelligent. Leveraging Fujitsu’s long-standing expertise in Vision AI, our goal is to advance Physical AI to its next stage by giving robots the ability to understand messy, unstructured, and dynamic real-world environments with far greater fidelity.

2. What Is the Spatial World Model?

A Spatial World Model integrates information from ceiling cameras, robot-mounted cameras, and various sensors to understand the global state of an environment, capture the underlying causal structure, and anticipate future situations. It goes beyond conventional “monitoring” or “sensing,” functioning instead as the cognitive engine of the space.

How It Differs from Conventional World Models

Conventional world models typically fall into two categories:

①: Video Generation

These models generate the next scene from text or video inputs with high fidelity. However, they are computationally heavy—e.g., requiring 30 minutes to predict 5 seconds of future video—making them impractical for real-time robot control in the physical world.

②: Future Forecasting

These abstract an agent’s internal state and predict next states efficiently. However, they only model the limited surroundings of the agent, making them unsuitable for understanding an entire environment.

Use Cases of Conventional World Models

Conventional world models focus on prediction in closed, simulated environments, such as video generation or simulation. They are not designed to act directly on the real world.

Concept of the Spatial World Model

The Spatial World Model abstracts relationships among people, robots, and objects across the entire environment and predicts future actions and spatial changes in real time. Crucially, these predictions can be directly used for real-world robot control, enabling deployment in human-populated spaces.
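Fujitsu has not published the model’s internal representation, but the entity-level abstraction described above can be pictured as a simple spatial state structure. The sketch below is purely illustrative; every class and field name is an assumption, not the actual design.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """One tracked element of the space (names are illustrative only)."""
    entity_id: str
    kind: str                       # "person" | "robot" | "object"
    position: tuple[float, float]   # metres, in a shared map frame
    velocity: tuple[float, float]   # metres per second

@dataclass
class SpatialState:
    """Abstracted snapshot of the whole environment at one time step."""
    timestamp: float
    entities: list[Entity] = field(default_factory=list)

    def of_kind(self, kind: str) -> list[Entity]:
        """Select all entities of one kind, e.g. every robot in the space."""
        return [e for e in self.entities if e.kind == kind]

# A toy snapshot: one person and one robot sharing the space.
state = SpatialState(timestamp=0.0, entities=[
    Entity("p1", "person", (2.0, -2.0), (0.0, 1.0)),
    Entity("r1", "robot",  (0.0,  0.0), (1.0, 0.0)),
])
print([e.entity_id for e in state.of_kind("robot")])   # ['r1']
```

Working at this level of abstraction, rather than on raw pixels, is what lets downstream components reason about relationships and futures in real time.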

Comparison of Conventional vs Spatial World Model

Feature 1: Multi-View Integration for Building a Spatial World Model

Mutual correction of spatial and temporal offsets between fixed and moving robot cameras

In real environments, people and robots constantly move, and viewpoints change rapidly. Thus, real-time integration of ceiling cameras and robot cameras has been a major technical challenge.

Conventional methods suffer from:

  • Different fields of view for each camera
  • Appearance mismatch due to lens distortion, viewpoint changes, etc.
  • Pixel-level fusion being fragile in dynamic environments

To address this, the Spatial World Model does not rely on matching pixels. Instead, it uses entity-level information—the detected positions of “people,” “robots,” etc.—to mutually correct the misalignment between fixed and mobile cameras.

This approach enables:

  • Correction of positional offsets
  • Synchronization of acquisition timing
  • Robust handling of viewpoint differences and lens distortions

As a result, the system can consistently understand the entire environment and obtain, in real time:

  • Accurate positions and trajectories of people, robots, and objects
  • Occluded regions that single robots cannot observe
  • A time-aligned, unified spatial map
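Fujitsu has not published the alignment algorithm itself, but the entity-level mutual correction described above can be illustrated with a classical technique: estimating the rigid transform between matched entity detections seen by two cameras (here via the Kabsch method). All function names and the example coordinates are hypothetical.

```python
import numpy as np

def align_entities(fixed_xy, mobile_xy):
    """Estimate a 2-D rigid transform (R, t) mapping the mobile camera's
    entity detections onto the fixed camera's frame (Kabsch method).
    Inputs are (N, 2) arrays of matched detections ("person", "robot",
    ...) observed by both views at the same time step."""
    mu_f, mu_m = fixed_xy.mean(axis=0), mobile_xy.mean(axis=0)
    H = (mobile_xy - mu_m).T @ (fixed_xy - mu_f)    # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflection
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_f - R @ mu_m
    return R, t

# Hypothetical scene: three entities detected by both cameras.
fixed = np.array([[1.0, 2.0], [3.0, 1.0], [2.0, 4.0]])
theta = np.deg2rad(30)                              # unknown mobile-camera pose
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
mobile = (fixed - np.array([0.5, -0.2])) @ R_true   # simulated mobile view
R, t = align_entities(fixed, mobile)
aligned = mobile @ R.T + t                          # mobile detections, fixed frame
print(np.allclose(aligned, fixed, atol=1e-6))       # True
```

In a real deployment the correspondences would be noisy and asynchronous, so the actual system presumably also handles outliers and timing offsets; this sketch only shows the geometric core of aligning two views through shared entities rather than shared pixels.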

Feature 2: Imagining the Future of Spaces

Imagining the future of people, robots, and objects in a space

For robots to collaborate with humans, they must understand human intentions and predict the future. However, previous methods are limited because they:

  • Rely only on the robot’s viewpoint
  • Cannot capture global spatial dynamics
  • Cannot reason about human intentions or future actions

The Spatial World Model integrates spatial relationships and, through causal reasoning, infers: “Who is in what situation, with what intention, and what they are likely to do next.”

Using this causal understanding, the system predicts future spatial changes with high accuracy and generates cooperative action plans, such as:

  • Optimal task allocation
  • Path planning and adjustment
  • Congestion avoidance and collision prevention

This enables robots to move naturally and safely—even in crowded, complex environments.
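The post does not detail the prediction and planning algorithms, so as a minimal sketch of the “predict, then adjust” loop, the example below forecasts entity positions under a constant-velocity assumption and flags the earliest step at which a robot and a person would breach a safety separation. All names, units, and thresholds are illustrative assumptions.

```python
import numpy as np

def predict(positions, velocities, horizon, dt=0.5):
    """Roll each tracked entity forward under a constant-velocity
    assumption; returns an array of shape (steps, N, 2)."""
    steps = int(horizon / dt)
    return np.stack([positions + velocities * dt * (k + 1)
                     for k in range(steps)])

def first_conflict(traj_a, traj_b, min_sep=0.8):
    """Earliest predicted step at which two entities come closer than
    min_sep metres, or None if the paths stay clear."""
    dists = np.linalg.norm(traj_a - traj_b, axis=-1)
    hits = np.flatnonzero(dists < min_sep)
    return int(hits[0]) if hits.size else None

# Hypothetical scene: a robot heading east, a person crossing its path
# heading north. Units: metres and metres per second.
pos = np.array([[0.0, 0.0], [2.0, -2.0]])   # robot, person
vel = np.array([[1.0, 0.0], [0.0, 1.0]])
future = predict(pos, vel, horizon=4.0)      # 8 steps of 0.5 s
step = first_conflict(future[:, 0], future[:, 1])
if step is not None:
    # A real planner would slow the robot or replan; here we just flag it.
    print(f"predicted conflict at t = {(step + 1) * 0.5:.1f} s")
```

The real system reasons causally about intentions rather than extrapolating velocities, but the control pattern is the same: anticipate a spatial conflict before it occurs, then adjust trajectories, allocation, or guidance in advance.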

3. Demonstration: Multi-Robot Coordination Powered by the Spatial World Model

We built a demonstration system that visualizes cooperative multi-robot behavior in real time, showcasing the core capabilities of the Spatial World Model. With ceiling cameras and a projector, the system performs real-time inference, control, and projection-based visualization.

Multi-Robot Coordination Demo Powered by the Spatial World Model

Robots can operate autonomously, but as human movement or congestion increases, their local-only perception becomes insufficient. The Spatial World Model compensates by:

  • Globally anticipating future changes
  • Adjusting robot trajectories and actions
  • Coordinating with projection cues or audio guidance for humans

This ensures safe and cooperative operation even in complex scenarios.

Normal Situations: Robot Autonomous Control

Robots autonomously navigate, update their maps, and respond with voice guidance. However, in crowded or dynamic environments, local autonomy alone becomes unstable.

Projection Showing Robot Behavior Based Solely on Its Own Decisions

Dynamic Situations: Spatial World Model–Guided Coordination

When sudden human motion, congestion, or collision risk arises, the Spatial World Model anticipates changes in real time and adjusts robot operations accordingly.

Projection Showing Proactive Control Preventing Entry into a Restricted Area

4. Future Plan

Fujitsu is accelerating real-world validation and expanding applications of the Spatial World Model toward full social deployment. At CES 2026, one of the world’s largest technology exhibitions, we plan to showcase new demonstrations themed around “Future Collaborative Robots Powered by the Spatial World Model.”

CES 2026 Exhibit

We will also leverage the redeveloped Fujitsu Technology Park (FTP) and other large-scale environments to conduct real-world field trials, allowing us to optimize the Spatial World Model at the operational level.

Technology Trials at FTP Redevelopment for Validation Across Offices, Construction Sites, and Hospitals

5. Conclusion

The Spatial World Model is a technology that makes not only robots but the entire space intelligent, enabling both to co-evolve through continuous interaction. Going forward, we will continue enhancing the model by incorporating operational data collected from real environments such as FTP. Through these efforts, Fujitsu aims to realize a new society where intelligent spaces and intelligent robots grow and operate together.
