ORA3D: Overlap Region Aware Multi-view 3D Object Detection

Current multi-view 3D object detection methods often fail to detect objects in the overlap region properly, and the networks' understanding of the scene is often limited to that of a monocular detection network. Moreover, objects in the overlap …

Zero-shot Visual Commonsense Immorality Prediction

Artificial intelligence is currently powering diverse real-world applications. These applications have shown promising performance, but raise complicated ethical issues, i.e. how to embed ethics to make AI applications behave morally. One way toward …

Bridging the Domain Gap towards Generalization in Automatic Colorization

We propose a novel automatic colorization technique that learns domain-invariance across multiple source domains and is able to leverage such invariance to colorize grayscale images in unseen target domains. This would be particularly useful for …

Grounding Visual Representations with Texts for Domain Generalization

Reducing the representational discrepancy between source and target domains is a key component to maximize the model generalization. In this work, we advocate for leveraging natural language supervision for the domain generalization task. We …

Sound-guided Semantic Video Generation

The recent success in StyleGAN demonstrates that pre-trained StyleGAN latent space is useful for realistic video generation. However, the generated motion in the video is usually not semantically meaningful due to the difficulty of determining the …

Zero-shot Visual Commonsense Immorality Prediction (Abstracted Version)

Artificial intelligence is currently powering diverse real-world applications. These applications have shown promising performance, but raise complicated ethical issues, i.e. how to embed ethics to make AI applications behave morally. One way toward …

Sound-Guided Semantic Image Manipulation

The recent success of the generative model shows that leveraging the multi-modal embedding space can manipulate an image using text information. However, manipulating an image with other sources rather than text, such as sound, is not easy due to the …

StopNet: Scalable Trajectory and Occupancy Prediction for Urban Autonomous Driving

We introduce a motion forecasting (behavior prediction) method that meets the latency requirements for autonomous driving in dense urban environments without sacrificing accuracy. A whole-scene sparse input representation allows StopNet to scale to …

A Scenario-Based Platform for Testing Autonomous Vehicle Behavior Prediction Models in Simulation

Behavior prediction remains one of the most challenging tasks in the autonomous vehicle (AV) software stack. Forecasting the future trajectories of nearby agents plays a critical role in ensuring road safety, as it equips AVs with the necessary …

Audio-Semantic Image Synthesis for Artistic Paintings

There have long been attempts to bring art forms such as painting into computer-based creation. In contrast to realism, the non-photorealistic rendering (NPR) area, in particular, has focused on creating artificial style rendering for painting, …