The recent success of the generative model shows that leveraging the multi-modal embedding space can manipulate an image using text information. However, manipulating an image with other sources rather than text, such as sound, is not easy due to the …
We introduce a motion forecasting (behavior prediction) method that meets the latency requirements for autonomous driving in dense urban environments without sacrificing accuracy. A whole-scene sparse input representation allows StopNet to scale to …
Behavior prediction remains one of the most challenging tasks in the autonomous vehicle (AV) software stack. Forecasting the future trajectories of nearby agents plays a critical role in ensuring road safety, as it equips AVs with the necessary …
There has been a long attempt to transfer the field of art such as painting to computer-based creation. In contrast to realism, non-photorealistic rendering (NPR) area, in particular, has focused on creating artificial style rendering for painting, …
Semantically meaningful image manipulation often involves laborious manual human examination for each desired manipulation. Recent success suggests that leveraging the representation power of existing Contrastive Language-Image Pretraining (CLIP) …
In general, an experimental environment for deep learning assumes that the training and the test dataset are sampled from the same distribution. However, in real-world situations, a difference in the distribution between two datasets, domain shift, …
Recent self-supervised representation learning methods depend on joint embedding learning with siamese-like networks, trained by maximizing the agreement of differently augmented same-class representations (positive pairs). Using positive pairs may …