Sound-Guided Semantic Image Manipulation

The recent success of generative models shows that a multi-modal embedding space can be leveraged to manipulate an image using text information. However, manipulating an image with sources other than text, such as sound, is not easy due to the …

StopNet: Scalable Trajectory and Occupancy Prediction for Urban Autonomous Driving

We introduce a motion forecasting (behavior prediction) method that meets the latency requirements for autonomous driving in dense urban environments without sacrificing accuracy. A whole-scene sparse input representation allows StopNet to scale to …

A Scenario-Based Platform for Testing Autonomous Vehicle Behavior Prediction Models in Simulation

Behavior prediction remains one of the most challenging tasks in the autonomous vehicle (AV) software stack. Forecasting the future trajectories of nearby agents plays a critical role in ensuring road safety, as it equips AVs with the necessary …

Audio-Semantic Image Synthesis for Artistic Paintings

There have long been attempts to bring art forms such as painting into computer-based creation. In contrast to realism, the field of non-photorealistic rendering (NPR) in particular has focused on creating artificial style rendering for painting, …

Sound-guided Semantic Image Manipulation

Semantically meaningful image manipulation often involves laborious manual human examination for each desired manipulation. Recent success suggests that leveraging the representation power of existing Contrastive Language-Image Pretraining (CLIP) …

SelfReg: Self-supervised Contrastive Regularization for Domain Generalization

In general, an experimental environment for deep learning assumes that the training and the test dataset are sampled from the same distribution. However, in real-world situations, a difference in the distribution between two datasets, domain shift, …

BMWReg: Brownian-diffusive, Multiview, Whitening Regularizations for Self-supervised Learning

Recent self-supervised representation learning methods depend on joint embedding learning with siamese-like networks, trained by maximizing the agreement of differently augmented same-class representations (positive pairs). Using positive pairs may …