Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction

3D occupancy prediction has emerged as a key perception task for autonomous driving, as it reconstructs 3D environments to provide a comprehensive scene understanding. Recent studies focus on integrating spatiotemporal information obtained from past …

VITA-PAR: Visual and Textual Attribute Alignment with Attribute Prompting for Pedestrian Attribute Recognition

The Pedestrian Attribute Recognition (PAR) task aims to identify various detailed attributes of an individual, such as clothing, accessories, and gender. To enhance PAR performance, a model must capture features ranging from coarse-grained global …

Querying Labeled Time Series Data with Scenario Programs

Simulation-based testing has become a crucial complement to road testing for ensuring the safety of cyber-physical systems (CPS). As a result, significant research efforts have been directed toward identifying failure scenarios within simulation …

3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation

The resolution of voxel queries significantly influences the quality of view transformation in camera-based 3D occupancy prediction. However, computational constraints and the practical necessity for real-time deployment require smaller query …

GUIDE-CoT: Goal-driven and user-informed dynamic estimation for pedestrian trajectory using chain-of-thought

While Large Language Models (LLMs) have recently shown impressive results in reasoning tasks, their application to pedestrian trajectory prediction remains challenging due to two key limitations: insufficient use of visual information and the …

DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models

Fine-tuning text-to-image diffusion models to maximize rewards has proven effective for enhancing model performance. However, reward fine-tuning methods often suffer from slow convergence due to online sample generation. Therefore, obtaining diverse …

ENTP: Encoder-only Next Token Prediction

Next-token prediction models have predominantly relied on decoder-only Transformers with causal attention, driven by the common belief that causal attention is essential to prevent “cheating” by masking future tokens. We challenge this widely …

InstructBooth: Instruction-following Personalized Text-to-Image Generation

Personalizing text-to-image models using a limited set of images for a specific object has been explored in subject-specific image generation. However, existing methods often face challenges in aligning with text prompts due to overfitting to the …

Sparse-to-Dense LiDAR Point Generation by LiDAR-Camera Fusion for 3D Object Detection

Accurately detecting objects at long distances remains a critical challenge in 3D object detection when relying solely on LiDAR sensors due to the inherent limitations of data sparsity. To address this issue, we propose the LiDAR-Camera Augmentation …

H-Direct: Homeostasis-aware Direct Spike Encoding for Deep Spiking Neural Networks

Deep spiking neural networks (SNNs), gaining attention as the next generation of artificial neural networks, have been successfully applied to many applications thanks to the development of various algorithms, such as spike encoding. Spike encoding …