3D occupancy prediction has emerged as a key perception task for autonomous driving, as it reconstructs 3D environments to provide a comprehensive scene understanding. Recent studies focus on integrating spatiotemporal information obtained from past …
The Pedestrian Attribute Recognition (PAR) task aims to identify various detailed attributes of an individual, such as clothing, accessories, and gender. To enhance PAR performance, a model must capture features ranging from coarse-grained global …
Simulation-based testing has become a crucial complement to road testing for ensuring the safety of cyber-physical systems (CPS). As a result, significant research efforts have been directed toward identifying failure scenarios within simulation …
The resolution of voxel queries significantly influences the quality of view transformation in camera-based 3D occupancy prediction. However, computational constraints and the practical necessity for real-time deployment require smaller query …
While Large Language Models (LLMs) have recently shown impressive results in reasoning tasks, their application to pedestrian trajectory prediction remains challenging due to two key limitations, insufficient use of visual information and the …
Fine-tuning text-to-image diffusion models to maximize rewards has proven effective for enhancing model performance. However, reward fine-tuning methods often suffer from slow convergence due to online sample generation. Therefore, obtaining diverse …
Next-token prediction models have predominantly relied on decoder-only Transformers with causal attention, driven by the common belief that causal attention is essential to prevent “cheating” by masking future tokens. We challenge this widely …
Personalizing text-to-image models using a limited set of images for a specific object has been explored in subject-specific image generation. However, existing methods often face challenges in aligning with text prompts due to overfitting to the …
Accurately detecting objects at long distances remains a critical challenge in 3D object detection when relying solely on LiDAR sensors due to the inherent limitations of data sparsity. To address this issue, we propose the LiDAR-Camera Augmentation …
Deep spiking neural networks (SNNs), gaining attention as the next generation of artificial neural networks, have been successfully applied to many applications thanks to the development of various algorithms, such as spike encoding. Spike encoding …