![](https://crypto4nerd.com/wp-content/uploads/2024/03/1VUqxKAQL4jDRNzu0CVR4ug.png)
Generative modeling of complex behaviors from labeled datasets has long been a significant challenge in decision-making. The task requires modeling actions: continuous-valued vectors that follow multimodal distributions and are often drawn from uncurated data. Worse, generation errors can compound in sequential prediction settings, where each predicted action shifts the states the model sees next.
To address this challenge, in a new paper Behavior Generation with Latent Actions, a research team from Seoul National University, New York University and the Artificial Intelligence Institute of SNU introduces the Vector-Quantized Behavior Transformer (VQ-BeT). The model tackles behavior generation end to end, handling multimodal action prediction, conditional generation, and partial observations. VQ-BeT not only captures diverse behavior modes more effectively but also runs inference roughly five times faster than Diffusion Policies.
VQ-BeT’s versatility makes it suitable for both conditional and unconditional generation, with applications spanning simulated manipulation, autonomous driving, and real-world robotics. The model comprises two stages: an Action Discretization phase and a VQ-BeT Learning phase. In the former, a Residual Vector-Quantized Variational Autoencoder (Residual VQ-VAE) learns a scalable action discretizer, which is crucial for handling the complexity of real-world action spaces. In the latter, a GPT-like transformer is trained to model the probability distribution of actions or action sequences conditioned on observations.
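To make the discretization stage concrete, here is a minimal NumPy sketch of residual vector quantization, the mechanism at the core of the Residual VQ-VAE: each codebook quantizes the residual left over by the previous one, so a continuous action maps to a short sequence of discrete codes whose sum reconstructs it. The codebook sizes and randomly initialized codebooks below are illustrative assumptions, not the paper's learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2-D continuous actions, 2 codebooks of 8 codes each.
action_dim, n_codebooks, codebook_size = 2, 2, 8

# Random codebooks stand in for the learned Residual VQ-VAE codebooks.
codebooks = rng.normal(size=(n_codebooks, codebook_size, action_dim))

def residual_quantize(action, codebooks):
    """Discretize a continuous action with residual vector quantization.

    Each codebook quantizes the residual left by the previous one,
    so the reconstruction is the sum of the selected code vectors.
    """
    residual = action.copy()
    codes, recon = [], np.zeros_like(action)
    for cb in codebooks:
        # Pick the code nearest (L2 distance) to the current residual.
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        codes.append(idx)
        recon = recon + cb[idx]
        residual = residual - cb[idx]
    return codes, recon

action = np.array([0.3, -0.7])
codes, recon = residual_quantize(action, codebooks)
print(codes)   # the discrete tokens a transformer would later predict
print(recon)   # sum of selected codes, approximating the original action
```

In the second stage, the transformer predicts these discrete code indices from observations, turning continuous control into a tractable classification-style problem while the summed codes preserve fine-grained action detail.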
In their empirical study, the team conducted experiments across eight benchmark environments, yielding several notable insights: