Just Add $100 More, Augmenting Pseudo-LiDAR Point Cloud for Resolving Class-imbalance Problem

Typical LiDAR-based 3D object detection models are trained on data collected in the real world, which is often imbalanced across classes. To deal with this, augmentation techniques are commonly used, such as copying ground truth LiDAR points and pasting them …

Unified Domain Generalization and Adaptation for Multi-View 3D Object Detection

Recent advances in 3D object detection leveraging multi-view cameras have demonstrated their practical and economical value in various challenging vision tasks. However, typical supervised learning approaches face challenges in achieving satisfactory …

Bridging the Domain Gap by Clustering-based Image-Text Graph Matching

Learning domain-invariant representations is important to train a model that can generalize well to unseen target task domains. Text descriptions inherently contain semantic structures of concepts and such auxiliary semantic cues can be used as …

Text-Driven Prototype Learning for Few-Shot Class-Incremental Learning

Few-shot class-incremental learning (FSCIL) aims to learn generalizable representations with large amounts of initial data and incrementally adapt to new classes with limited data (i.e., few-shot). Recently, prototype-based approaches have shown …

Who Should Have Been Focused: Transferring Attention-based Knowledge from Future Observations for Trajectory Prediction

Accurately predicting the trajectories of dynamic agents is crucial for the safe navigation of autonomous robotics. However, achieving precise predictions based solely on past and current observations is challenging due to the inherent uncertainty in …

Leveraging Inductive Bias in ViT for Medical Image Diagnosis

Recent advances in attention-based models have raised expectations for an automated diagnosis application in computer vision due to their high performance. However, attention-based models tend to lack some of the inherent assumptions for images, …

Mitigating the Linguistic Gap with Phonemic Representations for Robust Multilingual Language Understanding

Approaches to improving multilingual language understanding often struggle with significant performance gaps between high-resource and low-resource languages. While there are efforts to align the languages in a single latent space to mitigate such …

Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages

Finetuning Pre-trained Model with Limited Data for LiDAR-based 3D Object Detection by Bridging Domain Gaps

LiDAR-based 3D object detectors have been widely used in various applications, including autonomous vehicles and mobile robots. However, LiDAR-based detectors often fail to adapt well to target domains with different sensor configurations (e.g., …

Enhanced Motion Forecasting with Visual Relation Reasoning

In this work, we emphasize and demonstrate the importance of visual relation learning for the motion forecasting task in autonomous driving (AD). Since exploiting the benefits of RGB images in the existing vision-based joint perception and prediction …