Conference Program

Trends and Exploration of BEV Perception Technology for Autonomous Driving

01

Forum Introduction, Purpose, and Significance

As autonomous driving hardware configurations grow increasingly complex, it has become especially important to unify information from multiple sensors under a common view, refine the resulting features, and improve detection performance across multiple cameras and sensors. In 2022, the algorithms on well-known international autonomous driving leaderboards (nuScenes, Waymo, Argoverse) were updated frequently, and several of these works adopted BEV (bird's-eye-view) perception frameworks. The autonomous driving team at Shanghai AI Laboratory won first place in the 2022 Waymo challenge with its BEVFormer++ model, and the related work has been accepted to ECCV 2022.
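To make the idea concrete, the sketch below illustrates one common way to build a shared BEV representation from multiple calibrated cameras: the 3D reference point of each BEV grid cell is projected into every camera image, and the visible feature samples are averaged. This is only a minimal illustration, not the BEVFormer++ method; the function name, tensor shapes, and nearest-neighbour sampling are assumptions made for the example.

```python
# Minimal, illustrative sketch of building BEV features from multi-camera
# feature maps via 3D-point projection. NOT the BEVFormer++ implementation;
# shapes, names, and the averaging scheme are simplifications for clarity.
import torch

def build_bev_features(cam_feats, intrinsics, extrinsics, bev_points):
    """
    cam_feats:  (N_cam, C, H, W)  per-camera feature maps
    intrinsics: (N_cam, 3, 3)     camera intrinsics, assumed scaled to the
                                  feature-map resolution (a simplification)
    bev_points: (N_bev, 3)        3D reference points of BEV cells in the ego frame
    extrinsics: (N_cam, 4, 4)     ego-to-camera transforms
    returns:    (N_bev, C)        BEV features averaged over cameras that see each cell
    """
    n_cam, C, H, W = cam_feats.shape
    n_bev = bev_points.shape[0]
    homo = torch.cat([bev_points, torch.ones(n_bev, 1)], dim=1)   # (N_bev, 4)

    bev_feat = torch.zeros(n_bev, C)
    hit_count = torch.zeros(n_bev, 1)

    for i in range(n_cam):
        # Transform ego-frame points into the i-th camera frame, then project.
        cam_pts = (extrinsics[i] @ homo.T).T[:, :3]               # (N_bev, 3)
        depth = cam_pts[:, 2:3]
        uv = (intrinsics[i] @ cam_pts.T).T
        uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-5)               # pixel coordinates

        # Keep points in front of the camera and inside the feature map.
        valid = (depth[:, 0] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < W) \
                & (uv[:, 1] >= 0) & (uv[:, 1] < H)
        u = uv[valid, 0].long()
        v = uv[valid, 1].long()

        # Nearest-neighbour sampling of the camera feature map.
        bev_feat[valid] += cam_feats[i, :, v, u].T                # (n_valid, C)
        hit_count[valid] += 1

    return bev_feat / hit_count.clamp(min=1)
```

The resulting BEV grid gives every downstream head (detection, map segmentation, planning) the same metric, ego-centred view, which is what makes it convenient for fusing multiple cameras and sensors.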

We are currently preparing a CVPR 2023 workshop and have brought together internationally renowned teams to co-organize it. We believe that holding this forum beforehand at PRCV 2022 will further consolidate our accumulated work in this area.

02

Forum Schedule

03

Forum Speakers

Xiaodan Liang (Invited Speaker)

Associate Professor, Sun Yat-sen University

Bio: Xiaodan Liang is currently an Associate Professor at Sun Yat-sen University. She was a Project Scientist at Carnegie Mellon University, working with Prof. Eric Xing (IEEE/AAAI Fellow). She received her Ph.D. in Computer Science in 2016. Her research focuses on interpretable and cognitive intelligence and its applications to large-scale visual recognition, cross-modal analysis and understanding, and digital human analysis. She has published over 100 papers in the most prestigious journals (e.g., TPAMI) and conferences (e.g., CVPR, ICCV, ECCV, NeurIPS) in the field, with more than 15,000 Google Scholar citations. She has served as an Area Chair of ICCV 2019, WACV 2020, NeurIPS 2021-2022, and CVPR 2020, as Tutorial Chair (organizing committee) of CVPR 2021, and on the Ombud Committee of CVPR 2023. She is also an Associate Editor of the journal Neural Networks (impact factor above 8). She received the ACM China Doctoral Dissertation Award (one of only two in China) and the CCF Best Doctoral Dissertation Award, was named an Alibaba DAMO Academy Young Fellow (top 10 under 35 in China), and received an ACL 2019 Best Demo Paper nomination. She was also named one of Forbes China's 30 Under 30 young innovators. She is a Senior Member of IEEE.

Talk title: Unified Autonomous Driving via Multi-modality Multi-task Learning

Abstract: Aiming towards a holistic understanding of multiple downstream tasks simultaneously, there is a need to extract features with better transferability. Although many recent self-supervised pre-training methods have achieved impressive performance on various vision tasks under the prevailing pretrain-finetune paradigm, their generalization capacity to multi-task learning scenarios is yet to be explored. Here we present a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training, in which off-the-shelf pretrained models can be effectively adapted without increasing the training overhead. In addition, we propose a novel adapter named LV-Adapter, which incorporates language priors into the multi-task model via task-specific prompting and alignment between visual and textual features. Moreover, we collect a series of real-world cases with noisy data distributions for developing 3D detection fusion methods, and systematically formulate a robustness benchmark toolkit that can simulate these cases on any clean dataset with camera and LiDAR input modalities. Finally, we discuss how to develop efficient multi-modality multi-task learning paradigms in the future.
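As a rough illustration of the pretrain-adapt-finetune idea described above, the sketch below shows a generic bottleneck adapter attached to a frozen pretrained backbone, with one adapter and task head per downstream task. This is not the LV-Adapter from the talk (which additionally uses task-specific prompting and vision-language alignment); the class names, dimensions, and the single shared feature vector are assumptions made for illustration.

```python
# Generic bottleneck-adapter sketch for the pretrain-adapt-finetune paradigm.
# NOT the LV-Adapter itself; it only shows how small trainable modules can
# adapt a frozen pretrained backbone to several tasks at once.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """A small residual MLP inserted after the frozen backbone."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pretrained features intact at initialization.
        return x + self.up(self.act(self.down(x)))

class AdaptedBackbone(nn.Module):
    """Frozen pretrained backbone plus trainable per-task adapters and heads."""
    def __init__(self, backbone: nn.Module, feat_dim: int, task_out_dims: dict):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                       # keep pretrained weights fixed
        self.adapters = nn.ModuleDict(
            {t: BottleneckAdapter(feat_dim) for t in task_out_dims})
        self.heads = nn.ModuleDict(
            {t: nn.Linear(feat_dim, d) for t, d in task_out_dims.items()})

    def forward(self, x: torch.Tensor) -> dict:
        feats = self.backbone(x)                          # shared frozen features
        return {t: self.heads[t](self.adapters[t](feats)) for t in self.heads}
```

Because only the adapters and heads are trainable, the same pretrained backbone can serve several tasks without the training overhead of full fine-tuning, which is the practical appeal of the pretrain-adapt-finetune paradigm.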

Bolei Zhou (Invited Speaker)

Assistant Professor, UCLA

Bio: Bolei Zhou is an Assistant Professor in the Computer Science Department at the University of California, Los Angeles (UCLA). He earned his Ph.D. from MIT in 2018. Before joining UCLA, he was a faculty member at The Chinese University of Hong Kong (CUHK) for three years. His research interest lies at the intersection of computer vision and machine autonomy, focusing on enabling interpretable and trustworthy human-AI interaction. He has developed widely used interpretation methods such as CAM and Network Dissection, as well as the computer vision benchmarks Places and ADE20K. He has served as an area chair for CVPR, ECCV, ICCV, and AAAI. He received the MIT Technology Review Innovators Under 35 Asia-Pacific award. More about his research can be found at https://boleizhou.github.io/.

Talk title: Toward Generalizable Embodied AI in Machine Autonomy

Abstract: Embodied AI, as an emerging research topic, has been studied in various visuomotor tasks such as indoor navigation and autonomous driving. Most embodied AI studies are conducted in fixed simulation environments, where the generalizability and safety of autonomous agents in unseen complex scenes remain questionable. In this talk, I will introduce my lab's work on building three pillars that facilitate generalizable embodied AI for machine autonomy: the training data and environment, the representation, and the learning pipeline. First, I will introduce our effort in building the MetaDrive driving simulator, which can import real-world scenarios and learn to generate novel ones. Then I will talk about learning generalizable representations for decision-making from hours of uncurated YouTube driving videos. Finally, I will discuss how our work on human-in-the-loop learning brings safe training and inference to human-AI shared control.

Chunjing Xu (Invited Speaker)

Director of the Computer Vision Lab, Noah's Ark Lab, Huawei Central Research Institute

Bio: Chunjing Xu received his Bachelor's degree in Mathematics from Wuhan University in 1999, his Master's degree in Mathematics from Peking University in 2002, and his Ph.D. from The Chinese University of Hong Kong in 2009. He was an Assistant Professor and then an Associate Professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. He joined Huawei in April 2012 as an advanced research engineer and later became a principal research engineer in the Media Lab. He is now Director of the Computer Vision Lab at Noah's Ark Lab, Central Research Institute. His main research interests are machine learning and computer vision. He has published about 40 research papers in top-tier conferences and journals such as TPAMI, CVPR, ICCV, IJCAI, NeurIPS, AAAI, and ICML.

Talk title: BEV Perception in Practice and Possible Future Perception Architectures

Abstract: BEV-based perception architectures have gradually gained recognition in industry and are now very widely used in engineering practice. In real applications, however, requirements such as variable resolution in different regions, the fusion of dynamic and static targets, and the perception of finer-grained cues pose significant challenges to existing architectures. In light of these challenges, we believe it is necessary to explore newer, more task-flexible architectures that can satisfy the demands of autonomous driving perception. We will also suggest some possible long-term research and exploration directions for reference.

Ziwei Liu (Invited Speaker)

Assistant Professor, Nanyang Technological University

Bio: Prof. Ziwei Liu is currently an Assistant Professor at Nanyang Technological University, Singapore. Previously, he was a senior research fellow at The Chinese University of Hong Kong and a postdoctoral researcher at the University of California, Berkeley. Ziwei received his Ph.D. from The Chinese University of Hong Kong. His research revolves around computer vision, machine learning, and computer graphics. He has published extensively in top-tier conferences and journals in these fields, including CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, TPAMI, TOG, and Nature Machine Intelligence. He is the recipient of the Microsoft Young Fellowship, the Hong Kong PhD Fellowship, the ICCV Young Researcher Award, and the HKSTP Best Paper Award. He also serves as an Area Chair of ICCV, NeurIPS, and ICLR.

Talk title: Robust and Data-Efficient 3D Perception

Abstract: Perceiving the underlying 3D world behind RGB and LiDAR sensors has been a long-pursued goal of computer vision, with extensive real-life applications, and it lies at the core of embodied intelligence. In this talk, I will discuss our work on robust and data-efficient 3D perception, with an emphasis on learning structural deep representations under incomplete inputs or supervision. I will also discuss the challenges posed by naturally distributed data (e.g., long-tailed and open-ended) that emerge from real-world sensors, and how we can overcome these challenges by incorporating new neural computing mechanisms such as dynamic memory and routing. Our approach has shown its effectiveness and generalizability on a wide range of tasks.

Hongyang Li (Host)

Young Scientist, Shanghai AI Laboratory

Bio: Dr. Hongyang Li is a Young Scientist at Shanghai AI Laboratory. His research covers downstream applications of general vision as well as perception and decision-making algorithms for autonomous driving. He received his Ph.D. from The Chinese University of Hong Kong. His first-author work has been published at international conferences such as CVPR, ICCV, NeurIPS, and ICML, with more than 1,400 citations in total and over 10 granted patents. Since 2021, he has been the lecturer of the graduate course Advanced Computer Vision at Tsinghua University. He led his team to first place in the Waymo Open Challenge 2022, achieving internationally leading results on the vision-only and LiDAR tracks, and the proposed BEVFormer work provides a practical solution for the mass-production deployment of autonomous driving.
