Lane topology extraction involves detecting lanes and traffic elements and determining their relationships, a key perception task for mapless autonomous driving. This task requires complex reasoning, such as determining whether it is possible to turn left into a specific lane. To address this challenge, we introduce neuro-symbolic methods powered by vision-language foundation models (VLMs). Existing approaches have notable limitations: (1) Dense visual prompting with VLMs can achieve strong performance but is costly in terms of both financial resources and carbon footprint, making it impractical for robotics applications. (2) Neuro-symbolic reasoning methods for 3D scene understanding fail to integrate visual inputs when synthesizing programs, making them ineffective in handling complex corner cases. To this end, we propose a fast-slow neuro-symbolic lane topology extraction algorithm, named Chameleon, which alternates between a fast system that directly reasons over detected instances using synthesized programs and a slow system that utilizes a VLM with a chain-of-thought design to handle corner cases. Chameleon leverages the strengths of both approaches, providing an affordable solution while maintaining high performance. We evaluate the method on the OpenLane-V2 dataset, showing consistent improvements across various baseline detectors. Our code, data, and models are publicly available.
@article{zhang2025chameleon,title={Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction},author={Zhang, Zongzheng and Li, Xinrun and Zou, Sizhe and Chi, Guoxuan and Li, Siqi and Qiu, Xuchong and Wang, Guoliang and Zheng, Guantian and Wang, Leichen and Zhao, Hang and others},journal={arXiv preprint arXiv:2503.07485},year={2025}}
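A minimal sketch of the fast-slow dispatch pattern described in the abstract above, assuming hypothetical detector outputs: a cheap synthesized-program check handles easy lane pairs, and only low-confidence cases escalate to an expensive VLM query. The geometric heuristic, confidence rule, threshold, and all function names are illustrative stand-ins, not the paper's actual programs or VLM interface.

def fast_topology_program(lane_pair):
    """Stand-in for a synthesized symbolic program: a cheap geometric
    check that returns a (connected, confidence) guess."""
    a, b = lane_pair
    gap = abs(a["end_x"] - b["start_x"]) + abs(a["end_y"] - b["start_y"])
    confidence = 1.0 if gap < 0.5 else 0.4  # toy confidence rule
    return gap < 0.5, confidence

def slow_vlm_query(lane_pair):
    """Stand-in for the expensive chain-of-thought VLM call reserved
    for corner cases."""
    return True  # placeholder verdict

def extract_topology(lane_pairs, tau=0.8):
    edges = []
    for pair in lane_pairs:
        connected, conf = fast_topology_program(pair)
        if conf < tau:  # low confidence: escalate to the slow system
            connected = slow_vlm_query(pair)
        if connected:
            edges.append(pair)
    return edges

lanes = [({"end_x": 1.0, "end_y": 0.0}, {"start_x": 1.1, "start_y": 0.1})]
print(len(extract_topology(lanes)))  # 1: the pair is judged connected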
IEEE/RSJ IROS
Delving into Mapping Uncertainty for Mapless Trajectory Prediction
Zongzheng Zhang, Xuchong Qiu, Boran Zhang, and 7 more authors
In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025
Recent advances in autonomous driving are moving towards mapless approaches, where High-Definition (HD) maps are generated online directly from sensor data, reducing the need for expensive labeling and maintenance. However, the reliability of these online-generated maps remains uncertain. While incorporating map uncertainty into downstream trajectory prediction tasks has shown potential for performance improvements, current strategies provide limited insights into the specific scenarios where this uncertainty is beneficial. In this work, we first analyze the driving scenarios in which mapping uncertainty has the greatest positive impact on trajectory prediction and identify a critical, previously overlooked factor: the agent’s kinematic state. Building on these insights, we propose a novel Proprioceptive Scenario Gating that adaptively integrates map uncertainty into trajectory prediction based on forecasts of the ego vehicle’s future kinematics. This lightweight, self-supervised approach enhances the synergy between online mapping and trajectory prediction, providing interpretability around where uncertainty is advantageous and outperforming previous integration methods. Additionally, we introduce a Covariance-based Map Uncertainty approach that better aligns with map geometry, further improving trajectory prediction. Extensive ablation studies confirm the effectiveness of our approach, achieving up to 23.6% improvement in mapless trajectory prediction performance over the state-of-the-art method using the real-world nuScenes driving dataset. Our code, data, and models are publicly available.
@article{zhang2025delving,title={Delving into Mapping Uncertainty for Mapless Trajectory Prediction},author={Zhang, Zongzheng and Qiu, Xuchong and Zhang, Boran and Zheng, Guantian and Gu, Xunjiang and Chi, Guoxuan and Gao, Huan-ang and Wang, Leichen and Liu, Ziming and Li, Xinrun and others},journal={arXiv preprint arXiv:2507.18498},year={2025}}
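The gating idea above can be illustrated with a toy rule: forecast the ego vehicle's kinematics and inject map uncertainty only when a hard maneuver is expected. Everything below, the extrapolating forecaster and both thresholds, is a hypothetical simplification of the paper's learned, self-supervised gate.

def forecast_kinematics(ego_history):
    """Toy forecaster: linearly extrapolate (speed, yaw rate) from the
    last two states; the paper learns this self-supervised."""
    (v0, w0), (v1, w1) = ego_history[-2], ego_history[-1]
    return 2 * v1 - v0, 2 * w1 - w0

def gate_map_uncertainty(ego_history, accel_limit=1.5, yaw_limit=0.2):
    """Return True when map uncertainty should be fed to the predictor."""
    v_next, w_next = forecast_kinematics(ego_history)
    v_now, _ = ego_history[-1]
    # Illustrative rule: inject uncertainty only around hard maneuvers,
    # where online-map geometry errors matter most.
    return abs(v_next - v_now) > accel_limit or abs(w_next) > yaw_limit

history = [(10.0, 0.01), (12.0, 0.15)]  # (speed m/s, yaw rate rad/s)
print(gate_map_uncertainty(history))  # True: a sharp maneuver is forecast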
As AIGC shines in CV and NLP, its potential in the wireless domain has also emerged in recent years. Yet, existing RF-oriented generative solutions are ill-suited to generating high-quality, time-series RF data due to limited representation capabilities. In this work, inspired by the stellar achievements of the diffusion model in CV and NLP, we adapt it to the RF domain and propose RF-Diffusion. To accommodate the unique characteristics of RF signals, we first introduce a novel Time-Frequency Diffusion theory to enhance the original diffusion model, enabling it to tap into the information within the time, frequency, and complex-valued domains of RF signals. On this basis, we propose a Hierarchical Diffusion Transformer to translate the theory into a practical generative DNN through elaborate design spanning network architecture, functional blocks, and complex-valued operators, making RF-Diffusion a versatile solution for generating diverse, high-quality, time-series RF data. Performance comparison with three prevalent generative models demonstrates RF-Diffusion's superior performance in synthesizing Wi-Fi and FMCW signals. We also showcase the versatility of RF-Diffusion in boosting Wi-Fi sensing systems and performing channel estimation in 5G networks.
@inproceedings{chi2024rf,title={RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion},author={Chi, Guoxuan and Yang, Zheng and Wu, Chenshu and Xu, Jingao and Gao, Yuchong and Liu, Yunhao and Han, Tony Xiao},booktitle={Proceedings of the 30th Annual International Conference on Mobile Computing and Networking},pages={77–92},year={2024}}
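To make the time-frequency diffusion idea concrete, here is a toy forward-noising step on complex-valued RF samples, assuming circularly symmetric complex Gaussian noise and a hand-picked beta; the paper's actual theory and noise schedule are more elaborate, and this does not reproduce them.

import numpy as np

rng = np.random.default_rng(0)

def forward_step(x, beta):
    """One forward noising step: mix the complex frame with circularly
    symmetric complex Gaussian noise."""
    noise = (rng.standard_normal(x.shape)
             + 1j * rng.standard_normal(x.shape)) / np.sqrt(2)
    return np.sqrt(1 - beta) * x + np.sqrt(beta) * noise

# Toy CSI-like frame: 64 subcarriers of a unit-modulus complex signal.
x0 = np.exp(1j * np.linspace(0, 2 * np.pi, 64))
x1 = forward_step(x0, beta=0.02)
print(float(np.abs(x1 - x0).mean()))  # small perturbation per step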
Recent years have witnessed an increasing demand for human fall detection systems. Among all existing methods, Wi-Fi-based fall detection has become one of the most promising solutions due to its pervasiveness. However, when applied to a new domain, existing Wi-Fi-based solutions suffer from severe performance degradation caused by low generalizability. In this paper, we propose XFall, a domain-adaptive fall detection system based on Wi-Fi. XFall overcomes the generalization problem from three aspects. To advance cross-environment sensing, XFall exploits an environment-independent feature called the speed distribution profile, which is irrelevant to indoor layout and device deployment. To ensure sensitivity across all fall types, an attention-based encoder is designed to extract the general fall representation by associating both the spatial and temporal dimensions of the input. To train a large model with limited amounts of Wi-Fi data, we design a cross-modal learning framework, adopting a pre-trained visual model for supervision during the training process. We implement and evaluate XFall on one of the latest commercial wireless products through a year-long deployment in real-world settings. The results show that XFall achieves an overall accuracy of 96.8%, with a missed-alarm rate of 3.1% and a false-alarm rate of 3.3%, outperforming state-of-the-art solutions in both in-domain and cross-domain evaluations.
@article{chi2024xfall,title={XFall: Domain Adaptive Wi-Fi-based Fall Detection with Cross-Modal Supervision},author={Chi, Guoxuan and Zhang, Guidong and Ding, Xuan and Ma, Qiang and Yang, Zheng and Du, Zhenguo and Xiao, Houfei and Liu, Zhuang},journal={IEEE Journal on Selected Areas in Communications},year={2024},volume={42},number={9},pages={2457-2471},publisher={IEEE}}
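The speed distribution profile can be sketched as a windowed spectrogram of CSI amplitude whose Doppler-frequency bins are mapped to radial speed, which is what makes the feature independent of layout and deployment. The constants below (sampling rate, window length, carrier wavelength) are illustrative assumptions, not the paper's configuration.

import numpy as np

def speed_distribution_profile(csi_amp, fs=1000, win=256, wavelength=0.06):
    """Windowed power spectra of CSI amplitude, normalized per window,
    with Doppler-frequency bins mapped to radial speed."""
    profiles = []
    for start in range(0, len(csi_amp) - win + 1, win // 2):
        seg = csi_amp[start:start + win] * np.hanning(win)
        power = np.abs(np.fft.rfft(seg)) ** 2
        profiles.append(power / power.sum())
    speeds = np.fft.rfftfreq(win, d=1 / fs) * wavelength / 2
    return speeds, np.array(profiles)

# Synthetic amplitude trace whose dominant Doppler sits near 40 Hz.
t = np.arange(2000) / 1000
amp = np.sin(2 * np.pi * 40 * t) + 0.1 * np.random.randn(2000)
speeds, prof = speed_distribution_profile(amp)
print(prof.shape, float(speeds[prof[0].argmax()]))  # peak near 1.2 m/s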
IEEE IoTJ
RF-Prox: Radio-Based Proximity Estimation of Non-Directly Connected Devices
Yuchong Gao, Guoxuan Chi†, Zheng Yang, and 2 more authors
ACM IMWUT
AirECG: Contactless Electrocardiogram for Cardiac Disease Monitoring via mmWave Sensing and Cross-domain Diffusion Model
@article{zhao2024airecg,title={AirECG: Contactless Electrocardiogram for Cardiac Disease Monitoring via mmWave Sensing and Cross-domain Diffusion Model},author={Zhao, Langcheng and Lyu, Rui and Lei, Hang and Lin, Qi and Zhou, Anfu and Ma, Huadong and Wang, Jingjia and Meng, Xiangbin and Shao, Chunli and Tang, Yida and Chi, Guoxuan and Yang, Zheng},journal={Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies},volume={8},number={3},pages={1–27},year={2024},publisher={ACM New York, NY, USA}}
IEEE MSN
WiViD: Leveraging Wi-Fi and Vision for Depth Estimation via Multimodal Diffusion
Shijie Cheng, Yuchong Gao, Zheng Yang, and 2 more authors
In Proceedings of the 20th IEEE International Conference on Mobility, Sensing and Networking (MSN), 2024
Depth estimation is crucial for numerous applications, including autonomous driving, robotic navigation, and augmented reality. Existing solutions based on LiDAR and mmWave technologies are constrained by high deployment costs, while those utilizing monocular vision suffer from limited accuracy. To address these challenges, this paper proposes WiViD, a diffusion-based depth estimation system that leverages commercial Wi-Fi and vision. Diffusion models, with their ability to iteratively refine predictions, offer significant advantages in producing accurate and detailed estimations. We introduce a Multimodal Conditional Diffusion (MMCD) mechanism and design two encoding modules: the Complex-Valued CSI Encoder (CCE) and the Residual Image Encoder (RIE). These components fully exploit the spatio-temporal information inherent in Wi-Fi CSI and enable the effective fusion of Wi-Fi CSI and RGB image data, resulting in high-precision and robust depth estimation. Experimental results in real-world scenarios demonstrate that WiViD outperforms state-of-the-art (SOTA) monocular methods, reducing the Absolute Relative Error (ARE) by 67.2%, highlighting the advantages of WiViD in terms of accuracy and reliability.
@inproceedings{cheng2024wivid,title={WiViD: Leveraging Wi-Fi and Vision for Depth Estimation via Multimodal Diffusion},author={Cheng, Shijie and Gao, Yuchong and Yang, Zheng and Chi, Guoxuan and Han, Tony Xiao},booktitle={2024 20th International Conference on Mobility, Sensing and Networking (MSN)},pages={73–80},year={2024},organization={IEEE Computer Society}}
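A schematic view of the multimodal conditioning described above, with linear stand-ins for the CCE and RIE encoders and a placeholder denoiser; none of this reproduces the paper's network design, and all shapes are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
W_csi = rng.standard_normal((32, 128))  # stand-in for the CCE
W_img = rng.standard_normal((32, 256))  # stand-in for the RIE

def encode_csi(csi):
    """Flatten complex CSI into real features, then project."""
    return W_csi @ np.concatenate([csi.real, csi.imag])

def encode_image(img):
    return W_img @ img

def denoise_step(depth_noisy, cond, step_size=0.1):
    """Toy denoiser: nudge the noisy depth map toward a conditioned
    guess; a real model would predict the noise with a DNN."""
    guess = np.tanh(cond).mean()  # placeholder network output
    return depth_noisy - step_size * (depth_noisy - guess)

csi = rng.standard_normal(64) + 1j * rng.standard_normal(64)
img = rng.standard_normal(256)  # flattened RGB features
cond = np.concatenate([encode_csi(csi), encode_image(img)])
depth = denoise_step(rng.standard_normal(16), cond)
print(depth.shape)  # (16,)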
Existing device-free localization systems have achieved centimeter-level accuracy and show their potential in a wide range of applications. However, today's radio-based solutions fail to locate the target at the millimeter level due to their limited bandwidth and sampling rate, which constrains their use in scenarios demanding high accuracy. We find an opportunity to break the bottleneck of existing radio-based localization systems by reconstructing the accurate signal spectral peak from discrete samples, without changing either the bandwidth or the sampling rate of the radio hardware. This study proposes milliLoc, a millimeter-level radio-based localization system. We first derive a spectral peak reconstruction algorithm to reduce the ranging error from the previous centimeter level to the millimeter level. Then, we improve the AoA measurement accuracy by leveraging signal amplitude information. To ensure the practicality of milliLoc, we further extend our system to handle multi-target situations. We fully implement milliLoc on a commercial mmWave radar. Experiments show that milliLoc achieves a median ranging accuracy of 5.5 mm and decreases the AoA measurement error by 31.2% compared with the baseline. Our system fulfills the accuracy requirements of most application scenarios and can be easily integrated with other existing solutions, shedding light on high-accuracy location-based applications.
@article{zhang2023push,title={Push the limit of millimeter-wave radar localization},author={Zhang, Guidong and Chi, Guoxuan and Zhang, Yi and Ding, Xuan and Yang, Zheng},journal={ACM Transactions on Sensor Networks},volume={19},number={3},pages={1–21},year={2023},publisher={ACM New York, NY}}
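The key enabler, recovering a spectral peak that falls between FFT bins, can be sketched with standard parabolic interpolation over the magnitude spectrum; the paper derives its own reconstruction algorithm, which this toy does not claim to reproduce.

import numpy as np

def refined_peak(signal):
    """Estimate a fractional-bin peak location from the magnitude
    spectrum by fitting a parabola through the three bins at the top."""
    spec = np.abs(np.fft.rfft(signal))
    k = int(spec.argmax())
    if 0 < k < len(spec) - 1:
        a, b, c = spec[k - 1], spec[k], spec[k + 1]
        k += 0.5 * (a - c) / (a - 2 * b + c)  # parabola vertex offset
    return k

# Tone at 10.3 bins: the integer argmax alone would round to bin 10.
n = np.arange(256)
x = np.cos(2 * np.pi * 10.3 * n / 256)
print(refined_peak(x))  # close to 10.3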
IEEE GLOBECOM
Wi-Prox: Proximity Estimation of Non-Directly Connected Devices via Sim2Real Transfer Learning
Yuchong Gao*, Guoxuan Chi*, Guidong Zhang, and 1 more author
In Proceedings of the IEEE Global Communications Conference (GLOBECOM), 2023
@inproceedings{gao2023wi,title={Wi-Prox: Proximity Estimation of Non-Directly Connected Devices via Sim2Real Transfer Learning},author={Gao, Yuchong and Chi, Guoxuan and Zhang, Guidong and Yang, Zheng},booktitle={GLOBECOM 2023-2023 IEEE Global Communications Conference},pages={5629–5634},year={2023},organization={IEEE}}
Chinese Journal of Computers (计算机学报)
Research on Long-Range Multi-Target Tracking Algorithm with Millimeter-Wave Device
Guidong Zhang, Zheng Yang†, Yi Zhang, and 3 more authors
@article{zhang2023mmwave,title={Research on Long-Range Multi-Target Tracking Algorithm with Millimeter-Wave Device},author={Zhang, Guidong and Yang, Zheng and Zhang, Yi and Chi, Guoxuan and Ma, Qiang and Miao, Xin},journal={Chinese Journal of Computers (计算机学报)},volume={46},number={7},pages={1366-1382},year={2023},doi={10.11897/SP.J.1016.2023.01366}}
After years of boom, drones and their applications are now entering indoors. Six-degree-of-freedom (6-DoF) pose tracking is the core of drone flight control, but existing solutions cannot be directly applied to indoor scenarios due to insufficient accuracy, low robustness to adverse texture and light conditions, and signal obstruction. To overcome the above limitations, we propose Wi-Drone, a Wi-Fi standalone 6-DoF tracking system for indoor drone flight control. Wi-Drone takes full advantage of both exteroceptive and proprioceptive measurements of Wi-Fi to estimate the drone's absolute pose and relative motion, and fuses them in a tightly coupled manner to achieve their complementary benefits. We implement Wi-Drone and integrate it into a flight control system. The evaluation results show that Wi-Drone achieves real-time performance with an average location accuracy of 26.1 cm and a rotation accuracy of 3.8°, demonstrating its competency for flight control compared with visual-inertial-based alternatives. These results also outperform existing Wi-Fi-based tracking solutions in terms of both dimensionality and accuracy.
@inproceedings{chi2022wi,title={Wi-drone: wi-fi-based 6-DoF tracking for indoor drone flight control},author={Chi, Guoxuan and Yang, Zheng and Xu, Jingao and Wu, Chenshu and Zhang, Jialin and Liang, Jianzhe and Liu, Yunhao},booktitle={Proceedings of the 20th annual international conference on mobile systems, applications and services},pages={56–68},year={2022}}
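The complementary structure above, propagate with relative (proprioceptive) motion and correct with absolute (exteroceptive) fixes, can be shown with a scalar Kalman-style update on a single position axis; the real system performs tightly coupled 6-DoF fusion, and the noise values here are invented.

def fuse(x, var, motion, q, z, r):
    """Predict with relative motion (process noise q), then correct
    with an absolute fix z (measurement noise r)."""
    x, var = x + motion, var + q           # proprioceptive propagation
    k = var / (var + r)                    # Kalman gain
    return x + k * (z - x), (1 - k) * var  # exteroceptive correction

x, var = 0.0, 1.0  # 1-D position estimate and its variance
for motion, fix in [(0.5, 0.55), (0.5, 1.02), (0.5, 1.49)]:
    x, var = fuse(x, var, motion, q=0.01, z=fix, r=0.05)
print(round(x, 3), round(var, 4))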
arXiv
Hands-on Wireless Sensing with Wi-Fi: A Tutorial
Zheng Yang†, Yi Zhang, Guoxuan Chi, and 1 more author
@article{yang2022hands,title={Hands-on wireless sensing with Wi-Fi: A tutorial},author={Yang, Zheng and Zhang, Yi and Chi, Guoxuan and Zhang, Guidong},journal={arXiv preprint arXiv:2206.09532},year={2022}}
Indoor navigation is essential to a wide spectrum of applications in the era of mobile computing. Existing vision-based technologies suffer from both start-up costs and the absence of semantic information for navigation. We observe an opportunity to leverage pervasively deployed surveillance cameras to address these drawbacks and revisit the problem of indoor navigation from a fresh perspective. In this paper, we propose iSAT, a system that enables public surveillance cameras, as indoor navigating satellites, to locate users on the floorplan, inform users with semantic information about the surrounding environment, and guide users with navigation instructions. However, enabling public cameras to navigate is non-trivial due to three factors: the absence of real scale, the disparity of camera perspectives, and the lack of semantic information. To overcome these challenges, iSAT leverages a POI-assisted framework and adopts a novel coordinate transformation algorithm to associate public and mobile cameras, and further attaches semantic information to user locations. Extensive experiments in 4 different scenarios show that iSAT achieves a localization accuracy of 0.48 m and a navigation success rate of 90.5%, outperforming state-of-the-art systems by >30%. Benefiting from our solution, all areas with public cameras can be upgraded to smart spaces with visual navigation services.
@article{chi2021locate,title={Locate, Tell, and Guide: Enabling public cameras to navigate the public},author={Chi, Guoxuan and Xu, Jingao and Zhang, Jialin and Zhang, Qian and Ma, Qiang and Yang, Zheng},journal={IEEE Transactions on Mobile Computing},volume={22},number={2},pages={1010–1024},year={2021},publisher={IEEE}}
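The coordinate transformation between camera pixels and floorplan coordinates can be sketched as a least-squares affine fit from a few correspondences; the paper's projection-map construction is richer, and the correspondence points below are invented for illustration.

import numpy as np

def fit_affine(pixels, floor):
    """Least-squares fit of floor ≈ A @ [u, v, 1] from correspondences."""
    P = np.hstack([pixels, np.ones((len(pixels), 1))])
    A, *_ = np.linalg.lstsq(P, floor, rcond=None)
    return A.T

def to_floorplan(A, uv):
    return A @ np.append(np.asarray(uv, float), 1.0)

# Three invented pixel-to-floorplan correspondences (meters).
pixels = np.array([[100, 400], [500, 400], [300, 200]], float)
floor = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 5.0]])
A = fit_affine(pixels, floor)
print(to_floorplan(A, [300, 400]))  # ~[2.0, 0.0]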
Existing smartphone-based Augmented Reality (AR) systems are able to render virtual effects on static anchors. However, today's solutions lack the ability to render follow-up effects attached to moving anchors, since they fail to track their 6 degrees of freedom (6-DoF) poses. We find an opportunity to accomplish the task by leveraging sensors capable of generating sparse point clouds on smartphones and fusing them with vision-based technologies. However, realizing this vision is non-trivial due to challenges in modeling radar error distributions and fusing heterogeneous sensor data. This study proposes FollowUpAR, a framework that integrates vision and sparse measurements to track object 6-DoF poses on smartphones. We derive a physical-level theoretical radar error distribution model based on an in-depth understanding of its hardware-level working principles and design a novel factor graph competent in fusing heterogeneous data. By doing so, FollowUpAR enables mobile devices to track an anchor's pose accurately. We implement FollowUpAR on commodity smartphones and validate its performance with 800,000 frames over a total duration of 15 hours. The results show that FollowUpAR achieves a remarkable rotation tracking accuracy of 2.3° with a translation accuracy of 2.9 mm, outperforming most existing tracking systems and comparable to state-of-the-art learning-based solutions. FollowUpAR can be integrated into ARCore and enables smartphones to render follow-up AR effects on moving objects.
@inproceedings{xu2021followupar,title={FollowUpAR: Enabling follow-up effects in mobile AR applications},author={Xu, Jingao and Chi, Guoxuan and Yang, Zheng and Li, Danyang and Zhang, Qian and Ma, Qiang and Miao, Xin},booktitle={Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services},pages={1–13},year={2021}}
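A toy of the fusion principle above: model radar variance as a function of range and combine radar and vision estimates by inverse-variance weighting. The noise-model constants are placeholders, not the paper's physically derived error distribution, and a factor graph would fuse many such measurements jointly rather than pairwise.

def radar_variance(range_m, base=1e-4, growth=5e-5):
    """Placeholder error model: radar variance grows with range."""
    return base + growth * range_m ** 2

def fuse_pose(vision_est, vision_var, radar_est, range_m):
    """Inverse-variance weighting of the two translation estimates."""
    rv = radar_variance(range_m)
    w = vision_var / (vision_var + rv)  # weight given to radar
    return (1 - w) * vision_est + w * radar_est

# Translation along one axis, in meters.
print(fuse_pose(vision_est=0.100, vision_var=4e-4,
                radar_est=0.104, range_m=1.5))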
Smartphone localization is essential to a wide spectrum of applications in the era of mobile computing. The ubiquity of smartphone mobile cameras and ambient surveillance cameras holds promise for offering sub-meter localization services thanks to the maturity of computer vision techniques. In general, ambient-camera-based solutions are able to localize pedestrians in video frames at a fine granularity, but their tracking performance under dynamic environments remains unreliable. On the contrary, mobile-camera-based solutions are capable of continuously tracking pedestrians; however, they usually involve constructing a large image database, a labor-intensive overhead for practical deployment. We observe an opportunity to integrate these two most promising approaches to overcome the above limitations and revisit the problem of smartphone localization from a fresh perspective. However, fusing mobile-camera-based and ambient-camera-based systems is non-trivial due to the disparity of cameras in terms of perspectives and parameters, and the non-correspondence of their localization results. In this article, we propose iMAC, an integrated mobile-camera and ambient-camera localization system that achieves sub-meter accuracy and enhanced robustness with zero human start-up effort. The key innovation of iMAC is a well-designed fusion framework that eliminates camera disparity, including the construction of a projection map function to automatically calibrate ambient cameras, an instant crowd fingerprint model to describe user motion patterns, and a confidence-aware matching algorithm to associate results from the two sub-systems. We fully implement iMAC on commodity smartphones and validate its performance in five different scenarios. The results show that iMAC achieves a remarkable localization accuracy of 0.68 m, outperforming state-of-the-art systems by >75%.
@article{dong2021enabling,title={Enabling surveillance cameras to navigate},author={Dong, Liang and Xu, Jingao and Chi, Guoxuan and Li, Danyang and Zhang, Xinglin and Li, Jianbo and Ma, Qiang and Yang, Zheng},journal={ACM Transactions on Sensor Networks (TOSN)},volume={17},number={4},pages={1–20},year={2021},publisher={ACM New York, NY}}
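The confidence-aware matching step can be sketched as an assignment problem in which low-confidence detections pay a surcharge on their matching cost; the cost design and all coordinates below are illustrative guesses, not the paper's algorithm.

import numpy as np
from scipy.optimize import linear_sum_assignment

def match(tracks, detections, confidences, penalty=2.0):
    """Pair mobile-camera tracks with ambient-camera detections; a
    low-confidence detection pays a surcharge on its matching cost."""
    cost = np.linalg.norm(tracks[:, None] - detections[None, :], axis=-1)
    cost = cost + penalty * (1.0 - confidences)[None, :]
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

tracks = np.array([[0.0, 0.0], [3.0, 1.0]])  # floorplan positions, meters
dets = np.array([[0.2, 0.1], [2.9, 1.2]])
print(match(tracks, dets, confidences=np.array([0.9, 0.6])))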
2018
IEEE VTC
Latency-Optimal Task Offloading for Mobile-Edge Computing System in 5G Heterogeneous Networks
Guoxuan Chi, Yumei Wang†, Xiang Liu, and 1 more author
@inproceedings{chi2018latency,title={Latency-optimal task offloading for mobile-edge computing system in 5G heterogeneous networks},author={Chi, Guoxuan and Wang, Yumei and Liu, Xiang and Qiu, Yiming},booktitle={2018 IEEE 87th Vehicular Technology Conference (VTC Spring)},pages={1–5},year={2018},organization={IEEE}}
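The latency comparison behind the offloading decision reduces, in its simplest form, to comparing local compute time against upload-plus-edge-compute time per candidate link; the sketch below captures only that comparison, and all rates and CPU frequencies are illustrative numbers rather than the paper's system model.

def local_latency(cycles, cpu_hz):
    return cycles / cpu_hz

def offload_latency(bits, rate_bps, cycles, edge_hz):
    return bits / rate_bps + cycles / edge_hz  # upload + edge compute

def best_choice(bits, cycles, cpu_hz, links):
    """Pick the minimum-latency option among local execution and the
    available edge links, each given as (rate, edge CPU frequency)."""
    options = {"local": local_latency(cycles, cpu_hz)}
    for name, (rate, edge_hz) in links.items():
        options[name] = offload_latency(bits, rate, cycles, edge_hz)
    return min(options.items(), key=lambda kv: kv[1])

links = {"macro-cell": (50e6, 8e9), "small-cell": (200e6, 4e9)}
print(best_choice(bits=2e6, cycles=1e9, cpu_hz=1e9, links=links))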