![](https://crypto4nerd.com/wp-content/uploads/2024/03/1VUqxKAQL4jDRNzu0CVR4ug.png)
Generative modeling of complex behaviors from labeled datasets has long been a significant challenge in decision-making. The task requires modeling actions: continuous-valued vectors that follow multimodal distributions and are often drawn from uncurated data. Worse, generation errors can compound in sequential prediction settings, where each predicted action shifts the states the model sees next.
To address this challenge, in a new paper Behavior Generation with Latent Actions, a research team from Seoul National University, New York University and the Artificial Intelligence Institute of SNU introduces the Vector-Quantized Behavior Transformer (VQ-BeT). The model tackles behavior generation end to end, handling multimodal action prediction, conditional generation, and partial observations. VQ-BeT not only captures diverse behavior modes more effectively but also runs inference roughly five times faster than Diffusion Policies.
VQ-BeT’s versatility makes it suitable for both conditional and unconditional generation, with applications spanning simulated manipulation, autonomous driving, and real-world robotics. The model comprises two stages: an Action Discretization phase and a VQ-BeT Learning phase. In the former, a Residual Vector-Quantized Variational Autoencoder (Residual VQ-VAE) learns a scalable action discretizer, which is crucial for handling the complexity of real-world action spaces. In the latter, a GPT-like transformer is trained to model the probability distribution of actions or action sequences conditioned on observations.
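To make the discretization stage concrete, here is a minimal NumPy sketch of residual vector quantization, the mechanism at the core of the Residual VQ-VAE: each codebook quantizes the residual left over by the previous one, so a continuous action maps to a short sequence of discrete codes whose sum reconstructs it. The codebook sizes and randomly initialized codebooks below are illustrative assumptions, not the paper's learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2-D continuous actions, 2 codebooks of 8 codes each.
action_dim, n_codebooks, codebook_size = 2, 2, 8

# Random codebooks stand in for the learned Residual VQ-VAE codebooks.
codebooks = rng.normal(size=(n_codebooks, codebook_size, action_dim))

def residual_quantize(action, codebooks):
    """Discretize a continuous action with residual vector quantization.

    Each codebook quantizes the residual left by the previous one,
    so the reconstruction is the sum of the selected code vectors.
    """
    residual = action.copy()
    codes, recon = [], np.zeros_like(action)
    for cb in codebooks:
        # Pick the code nearest (L2 distance) to the current residual.
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        codes.append(idx)
        recon = recon + cb[idx]
        residual = residual - cb[idx]
    return codes, recon

action = np.array([0.3, -0.7])
codes, recon = residual_quantize(action, codebooks)
print(codes)   # the discrete tokens a transformer would later predict
print(recon)   # sum of selected codes, approximating the original action
```

In the second stage, the transformer predicts these discrete code indices from observations, turning continuous control into a tractable classification-style problem while the summed codes preserve fine-grained action detail.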
In their empirical study, the team conducted experiments across eight benchmark environments, yielding several notable insights: