Habitizing Diffusion Planning for Efficient and Effective Decision Making

Abstract

Diffusion models have shown great promise in decision-making, an approach known as diffusion planning. However, slow inference limits their potential for broader real-world applications. Here, we introduce Habi, a general framework that transforms powerful but slow diffusion planning models into fast decision-making models. It mimics the cognitive process in the brain by which costly goal-directed behavior gradually transitions to efficient habitual behavior through repetitive practice. Even on a laptop CPU, the habitized model achieves an average decision-making frequency of 800+ Hz (orders of magnitude faster than previous diffusion planners) on the standard offline reinforcement learning benchmark D4RL, while maintaining comparable or even higher performance than its corresponding diffusion planner. Our work offers a fresh perspective on leveraging powerful diffusion models for real-world decision-making tasks. We also provide robust evaluations and analysis, offering insights from both biological and engineering perspectives for efficient and effective decision-making.

Habitual Inference (HI), a lightweight model generated by our framework Habi, achieves an optimal balance between performance and speed.

Methods

The diagram of Habi. (a) During the Habitization (Training) stage, Habi learns to reconstruct actions from plans generated by a diffusion planner, with the decision spaces of habits (prior) and planning (posterior) aligned via KL divergence in the latent space. Trainable parts include Prior Encoder, Posterior Encoder, Decoder, and Critic. (b) During the Habitual Inference (HI) stage, only the lightweight prior encoder and latent decoder are required, enabling fast, high-quality habitual behaviors for decision-making.
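The training objective described in (a) can be sketched as a reconstruction term plus a KL term between the posterior and prior latents. This is a minimal illustration, not the authors' implementation: it assumes diagonal Gaussian latents, a mean-squared-error reconstruction loss, the KL direction KL(posterior || prior), and a fixed hypothetical weight `beta`. The critic is omitted, and all function names are illustrative.

```python
import math


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) for diagonal Gaussians given as lists of means and std devs."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def mse(pred, target):
    """Mean squared error between two equal-length action vectors."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def habitization_loss(decoded_action, planner_action, posterior, prior, beta=1.0):
    """Reconstruction error plus beta-weighted KL (beta is a hypothetical fixed weight)."""
    mu_q, std_q = posterior  # assumed: latent inferred with access to the planner's plan
    mu_p, std_p = prior      # assumed: latent inferred from the observation alone
    recon = mse(decoded_action, planner_action)
    kl = gaussian_kl(mu_q, std_q, mu_p, std_p)
    return recon + beta * kl
```

When the prior and posterior coincide and the decoded action matches the planner's action, the loss is zero. In this sketch a single weight trades off the two terms by hand; it is only a placeholder for whatever balancing scheme Habi actually uses.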

Habi is easy to use and requires no task-specific parameter tuning. It automatically balances reconstruction accuracy and decision-space alignment to achieve near-optimal performance across all tasks.

Visualization

Action distribution of a diffusion planner, and its corresponding Habitual Inference policy generated by Habi, across different types of decision-making tasks. The actions are dimension-reduced by Principal Component Analysis (PCA) for better visualization.

MuJoCo

Franka Kitchen

Antmaze

Maze2D

For evaluation, we choose typical decision-making tasks, including locomotion, manipulation, and navigation.

Experiments

Performance vs. Decision Frequency

Performance vs. decision frequency across different decision-making tasks. Decision frequency (Hz) is measured on a laptop CPU (Apple M2, MacBook).
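One plausible way to obtain a decision frequency in Hz is to time repeated policy calls and divide the call count by the elapsed wall-clock time. The sketch below assumes this procedure; the source does not describe the exact measurement protocol, and `policy`, `n_calls`, and `warmup` are hypothetical names.

```python
import time


def decision_frequency(policy, observation, n_calls=1000, warmup=10):
    """Return decisions per second (Hz) for a callable policy on a fixed observation."""
    # Warm-up calls are excluded from timing so one-off setup costs do not skew the result.
    for _ in range(warmup):
        policy(observation)
    start = time.perf_counter()
    for _ in range(n_calls):
        policy(observation)
    elapsed = time.perf_counter() - start
    if elapsed == 0.0:
        # Timer resolution was too coarse to register the calls.
        return float("inf")
    return n_calls / elapsed
```

For example, `decision_frequency(lambda obs: [0.0], [0.0])` times a trivial stand-in policy. A real measurement would pass the inference function of the model under test.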

Performance Reference

Performance comparison on the D4RL benchmark. The reported values are Mean ± Standard Error over 500 episode seeds for robust testing. Frequencies are measured using a CPU (Apple M2 Max, MacBook laptop).
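For reference, a Mean ± Standard Error value can be computed from per-episode scores as the sample mean and the sample standard deviation divided by the square root of the number of episodes. The sketch below assumes the usual sample (n − 1) standard deviation; the source does not state which convention was used, and the function name is illustrative.

```python
import math
import statistics


def mean_and_standard_error(scores):
    """Return (mean, standard error of the mean) for a list of per-episode scores."""
    mean = statistics.fmean(scores)
    # Sample standard deviation (n - 1 in the denominator) divided by sqrt(n).
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, sem
```

With 500 episode seeds, `scores` would hold 500 normalized returns, and the table entry would be formatted as `f"{mean:.1f} ± {sem:.1f}"`.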