
Introducing Lucy 2.0: SOTA real-time world transformation model

Today, we’re releasing Lucy 2.0 — a real-time world transformation model that shifts high-fidelity video editing from offline rendering to live interaction. This opens up an unlimited set of possibilities ranging from character swaps and product placement to high-fidelity real-time data augmentation and simulation for robotics.
Try Lucy 2.0 Now

Lucy 2.0 builds on the foundations of MirageLSD and Lucy Edit 1.0, but it is not simply a higher-quality video generator. It is a live system capable of transforming the visual world with high fidelity at 30fps 1080p resolution with near-zero latency. What you see is not a pre-rendered clip — it is a continuously generated scene that responds as it runs.

Lucy 2.0 allows character swaps, motion control, product placement, clothing changes, and full environment transformation, all guided by text prompts and reference images while the video is still streaming live.

— The Decart Team

Emergent Physics: No Depth Maps, No 3D, Just Learning

For real-time transformation to feel coherent and scale to a nearly infinite number of possible edits, the model must do more than apply visual effects — it must implicitly understand and model the structure of the world.

Lucy 2.0 does not rely on depth maps, meshes, or hybrid 3D pipelines. It is a pure diffusion model. The physical behavior you see emerges from learned visual dynamics, not from engineered geometry or explicit physics engines. This gives the model abilities beyond those captured by classical 3D simulations.

As a result:

- When a tarantula crawls across a hand, the model respects finger geometry and contact.
- When a jacket is unzipped, cloth separates, folds, and deforms naturally.
- When a helmet is removed, the model handles the object separation and models the hair beneath.

Lucy 2.0 learned that actions like unzipping or removing imply topological change purely by observing how the world evolves in video — without ever being told what a zipper, a helmet, or a hand is.

Solving Drift with Smart History Augmentation

Autoregressive video models typically degrade over time. Small artifacts compound frame by frame, eventually destabilizing identity, geometry, and texture.

Lucy 2.0 addresses this with a proprietary Smart History Augmentation method.

During training, the model is exposed to its own imperfect outputs and is explicitly penalized when quality drifts. This aligns the training distribution with real inference conditions, teaching the model to recognize implausible states and recover from them.

Instead of blindly following prior frames, Lucy 2.0 learns when to correct course — pulling generation back toward a stable, high-fidelity trajectory.
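
The post does not include training code, but the core idea of Smart History Augmentation can be sketched in a few lines. The snippet below is an illustrative PyTorch-style step, not our actual recipe: `model`, `model.rollout`, and the specific drift penalty are placeholder assumptions, and diffusion-specific details such as noise schedules and conditioning are omitted.

```python
import torch
import torch.nn.functional as F

def training_step(model, clean_frames, optimizer, p_self_history=0.5):
    """One illustrative history-augmented training step (sketch only).

    clean_frames: (batch, time, C, H, W) ground-truth video clip.
    With probability p_self_history, the conditioning history is replaced by
    the model's own detached rollout, so the model learns to recover from its
    own imperfect outputs instead of only ever seeing clean context.
    """
    history, target = clean_frames[:, :-1], clean_frames[:, -1]

    if torch.rand(()) < p_self_history:
        with torch.no_grad():
            # Roll the model over its own outputs to produce a drifted history.
            history = model.rollout(history)  # hypothetical API

    pred = model(history)  # predict the next frame from the (possibly drifted) history

    # Reconstruction loss plus an explicit, deliberately simple penalty when the
    # prediction's statistics drift away from the clean target. The real drift
    # objective is not public; this term is only a stand-in.
    recon = F.mse_loss(pred, target)
    drift = F.l1_loss(pred.mean(dim=(2, 3)), target.mean(dim=(2, 3)))
    loss = recon + 0.1 * drift

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```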

This allows Lucy 2.0 to run indefinitely. Streams can persist for hours without identity collapse or world degradation.

Scaling Real-Time Video with AWS Trainium

Delivering real-time, generative video at high fidelity isn’t just a modeling achievement — it’s an infrastructure one. At AWS re:Invent 2025, Decart took the keynote stage to demonstrate exactly that: a live, interactive video generation system running on AWS’s newest Trainium3 accelerators.

In the keynote, Decart demonstrated Lucy generating every pixel live — not pre-rendered, not cached, but produced in real time on Trainium3. The live demo represented a new category of visual intelligence: persistent, interactive video generation operating at real-world latency.

Several Trainium platform capabilities highlighted during the keynote were critical to enabling Lucy at scale:

- Performance: Trainium3’s architecture — including large centralized on-chip SRAM and low-latency peer-to-peer interconnects — enables Lucy to achieve up to 4× higher frame rates than traditional deployments for this class of workload.
- Efficiency: These performance gains translate directly into a significantly lower cost per generated frame, making always-on, real-time video economically viable.
- Latency: Lucy achieves near-zero-latency frame-by-frame generation — a requirement for systems where responsiveness defines usability.

These results are not the outcome of a simple model port. Lucy 2.0 was engineered in close alignment with Trainium’s execution model, allowing the system to fully utilize the hardware’s memory hierarchy and parallelism.

Watch the AWS keynote:

From Kernel to Network: The Real-Time Stack

Achieving real-time performance is not about a single optimization — it requires removing friction everywhere. The challenge comes from processing frames autoregressively under a strict latency budget: traditional ML accelerators were designed primarily with throughput in mind, so overheads that are normally negligible in throughput-oriented workloads become first-order costs. We therefore combine the following techniques to achieve end-to-end real-time performance at scale:

- Mega-kernels that reduce the kernel-launch overhead and memory movement that dominate low-latency world models. This lets us keep model activations as close as possible to the tensor cores and avoid costly HBM memory transactions.
- A custom model architecture tailored to the underlying hardware characteristics of the accelerator. We run microbenchmarks on the accelerator to build a precise cycle-level model of the chip, then shape our model architecture around those results (see the sketch after this list).
- A custom WebRTC pipeline that minimizes buffering and transport latency for bidirectional video transmission to the accelerator and back. This is critical for delivering high quality worldwide without compromising latency or frame rate.
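
As a rough illustration of the microbenchmark-driven design loop described above, the sketch below times a few candidate matmul shapes and reports their latency so the architecture can be sized against a per-frame budget. It uses PyTorch on whatever device is available as a simplified stand-in; the real workflow models Trainium at cycle level and is not shown here.

```python
import time
import torch

def bench_matmul(m, k, n, iters=50,
                 device="cuda" if torch.cuda.is_available() else "cpu"):
    """Median wall-clock latency (ms) of an (m, k) x (k, n) matmul on `device`."""
    dtype = torch.float16 if device == "cuda" else torch.float32
    a = torch.randn(m, k, device=device, dtype=dtype)
    b = torch.randn(k, n, device=device, dtype=dtype)
    # Warm up so lazy initialization does not pollute the measurement.
    for _ in range(5):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        a @ b
        if device == "cuda":
            torch.cuda.synchronize()
        times.append((time.perf_counter() - t0) * 1000)
    return sorted(times)[len(times) // 2]

# Candidate hidden sizes for one block; pick the shape that fits the per-frame
# latency budget rather than the "nicest" round number.
for hidden in (2048, 2560, 3072, 4096):
    ms = bench_matmul(m=1, k=hidden, n=4 * hidden)
    print(f"hidden={hidden}: {ms:.3f} ms per matmul")
```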

Every stage — from packet arrival to matrix multiplication — was optimized with one goal: keeping glass-to-glass latency within real-time bounds without sacrificing visual stability.
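
As a reminder of how tight those bounds are, here is a small arithmetic sketch of the per-frame budget implied by 1080p at 30 fps. The numbers are simple arithmetic, not measured figures from our pipeline.

```python
# Back-of-the-envelope frame budget for real-time 1080p at 30 fps.
# Illustrative only; the actual pipeline breakdown is not public.

FPS = 30
FRAME_BUDGET_MS = 1000 / FPS                 # ~33.3 ms of wall-clock time per frame
WIDTH, HEIGHT = 1920, 1080
PIXELS_PER_FRAME = WIDTH * HEIGHT            # ~2.07 million pixels
PIXELS_PER_SECOND = PIXELS_PER_FRAME * FPS   # ~62 million pixels generated every second

# Capture, transport, denoising, and encode all have to fit inside the same
# ~33 ms window for the stream to stay real time.
print(f"Per-frame budget: {FRAME_BUDGET_MS:.1f} ms")
print(f"Pixels per second: {PIXELS_PER_SECOND / 1e6:.1f} M")
```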


Applications

Lucy 2.0 enables a wide range of real-time interactive applications. Because video remains editable and responsive while it is being generated, Lucy can be used for live character swaps, motion control, virtual try-ons, product placement, interactive media, and real-time content creation. In these settings, video is no longer a static artifact — it is a persistent, mutable stream that can be guided continuously through text prompts and reference inputs.
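
None of the client-facing interface is documented in this post, so the loop below is a purely hypothetical sketch of the interaction pattern: frames stream in, the prompt can change at any moment, and edited frames come back while the stream keeps running. `LucyStream`, `send_frame`, and `set_prompt` are placeholder names, and the class here simply echoes frames so the sketch runs on its own.

```python
import cv2  # OpenCV for camera capture and display

class LucyStream:
    """Placeholder client; the real Lucy 2.0 interface may differ entirely."""

    def __init__(self, prompt: str):
        self.prompt = prompt

    def set_prompt(self, prompt: str) -> None:
        """Swap the active edit instruction without interrupting the stream."""
        self.prompt = prompt

    def send_frame(self, frame):
        """Stub: returns the frame unchanged instead of a live edit."""
        return frame

stream = LucyStream(prompt="swap the jacket for a red raincoat")
camera = cv2.VideoCapture(0)

while True:
    ok, frame = camera.read()
    if not ok:
        break
    edited = stream.send_frame(frame)   # edited frame arrives roughly one frame later
    cv2.imshow("lucy-live", edited)
    if cv2.waitKey(1) == ord("q"):
        break
    # The prompt can change mid-stream, e.g. stream.set_prompt("make it snow")

camera.release()
cv2.destroyAllWindows()
```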

Beyond these applications, Lucy 2.0 was designed with robotics as a core use case.

Modern robots are bottlenecked not by model capacity, but by data. Collecting diverse, physically grounded interaction data in the real world is slow, expensive, and difficult to scale. While simulation helps, traditional simulators struggle to capture the long tail of real-world appearance, materials, lighting, and interaction dynamics.

Lucy 2.0 addresses this gap by acting as a real-time data augmentation and simulation engine. Because it can transform materials, lighting, environments, and object properties live — while preserving physical consistency — a single real-world demonstration can be expanded into thousands of plausible variations. The same manipulation can be replayed with different textures, object geometries, lighting conditions, backgrounds, or environmental dynamics, without re-collecting data.
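
To make that fan-out concrete, here is a minimal sketch of how one recorded demonstration could be paired with a grid of edit prompts. The prompt axes and wording are illustrative assumptions, and the transformation call itself is not shown because its interface is not part of this post.

```python
import itertools

def augmentation_prompts():
    """Enumerate prompt variations for one recorded demonstration.

    Combining surface, lighting, and object axes is an illustrative assumption,
    not a published recipe.
    """
    surfaces = ["wooden table", "steel workbench", "marble countertop"]
    lighting = ["bright daylight", "dim warehouse lighting", "warm evening light"]
    objects = ["red ceramic mug", "glass jar", "cardboard box"]
    for surface, light, obj in itertools.product(surfaces, lighting, objects):
        yield (f"place the scene on a {surface}, under {light}, "
               f"with the robot manipulating a {obj}")

# 3 x 3 x 3 = 27 visually distinct variants of a single real demonstration.
# Each prompt would be sent to the transformation endpoint along with the
# original frames; the recorded action trajectory is reused unchanged,
# because only appearance varies, not the motion.
for i, prompt in enumerate(augmentation_prompts()):
    print(f"{i:02d}: {prompt}")
```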

This enables the training of more robust vision-language-action (VLA) and imitation learning policies. Robots trained on Lucy-augmented data are exposed to a broader distribution of appearances and conditions, improving generalization and reducing sensitivity to spurious visual cues.
