Associate Professor Dan Pei
Profile

Name: Dan Pei

Title: Tenured Associate Professor

Phone: 010-62783690

Email: peidan@tsinghua.edu.cn

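Education

B.Eng. (Computer Science and Technology), Tsinghua University, China, 1997

M.Eng. (Computer Science and Technology), Tsinghua University, China, 2000

Ph.D. (Computer Science), University of California, Los Angeles, USA, 2005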

Research Areas

Intelligent Operations (AIOps)

Time Series Intelligence


Research Overview

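Dan Pei, Ph.D., is a tenured associate professor and doctoral advisor in the Department of Computer Science and Technology at Tsinghua University, and a recipient of the First Prize of the Science and Technology Progress Award of the Chinese Institute of Electronics. Pei has published more than 200 academic papers in the AIOps field and holds more than 30 granted patents, with over 10,000 citations on Google Scholar. Pei is the founder of the CCF AIOps Algorithm Challenge and the initiator of the CCF OpenAIOps community, serves on the editorial board of IEEE/ACM Transactions on Networking, a flagship journal in computer networking, and served as technical program committee chair of ICNP 2022, a flagship IEEE conference in computer networking.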

Awards and Honors

First Prize, Science and Technology Progress Award, Chinese Institute of Electronics (2023)

IEEE ISSRE Best Research Paper (2023/2018)

Outstanding Teacher Award Program for University Computer Science Majors (2021)

Okawa Foundation Research Grant (2018)

ACM Senior Member (2011)

IEEE Senior Member (2011)

ICDCS Best Research Paper (2006)

Best Ph.D. Dissertation, Computer Science Department, University of California, Los Angeles (2005)

Professional Service

IEEE ICNP 2022 TPC Co-Chair

IEEE/ACM Transactions on Networking (TON) Associate Editor

ACM PACMNET Editor

TPC for NSDI, CoNEXT, IMC, SIGMETRICS, ICNP, WWW, INFOCOM

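Member, Technical Committee on Internet, China Computer Federation (CCF)

Member, Technical Committee on Software Engineering, China Computer Federation (CCF)

Member, Technical Committee on Services Computing, China Computer Federation (CCF)

Member, Artificial Intelligence Technology and Application Committee, China Institute of Communications

Publications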

Zhe Xie, Shenglin Zhang, Yitong Geng, Yao Zhang, Minghua Ma, Xiaohui Nie, Zhenhe Yao&, Longlong Xu&, Yongqian Sun, Wentao Li, Dan Pei. Microservice Root Cause Analysis With Limited Observability Through Intervention Recognition in the Latent Space. KDD 2024. Barcelona, Spain, Aug. 25–29 2024.

Zhaoyang Yu, Changhua Pei, Xin Wang, Minghua Ma, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang, Xidao Wen, Jianhui Li, Gaogang Xie, Dan Pei. Pre-trained KPI Anomaly Detection Model Based on Disentangled Transformer. KDD 2024. Barcelona, Spain, Aug. 25–29 2024.

Zhenhe Yao, Changhua Pei, Wenxiao Chen, Hanzhang Wang, Liangfei Su, Huai Jiang, Zhe Xie, Xiaohui Nie, Dan Pei. Chain-of-Event: Interpretable Root Cause Analysis for Microservices through Automatically Learning Weighted Event Causal Graph. FSE 2024, industry track. Ipojuca (Pernambuco), Brazil, July 15–19, 2024.

Zhaoyang Yu, Minghua Ma, Chaoyun Zhang, Si Qin, Yu Kang, Chetan Bansal, Saravan Rajmohan, Yingnong Dang, Changhua Pei, Dan Pei, Qingwei Lin, Dongmei Zhang. MonitorAssistant: Simplifying Cloud Service Monitoring via Large Language Models. FSE 2024, industry track. Ipojuca (Pernambuco), Brazil, July 15–19, 2024.

Zhaoyang Yu, Shenglin Zhang, Mingze Sun, Yingke Li, Yankai Zhao, Xiaolei Hua, Lin Zhu, Xidao Wen, Dan Pei. Supervised Fine-Tuning for Unsupervised KPI Anomaly Detection for Mobile Web Systems. WWW 2024, Singapore, May 2024.

Zhe Xie, Changhua Pei, Wanxue Li, Huai Jiang, Liangfei Su, Jianhui Li, Gaogang Xie, and Dan Pei. From Point-Wise to Group-Wise: A Fast and Accurate Microservice Trace Anomaly Detection Approach. ESEC/FSE 2023, San Francisco CA USA, November 2023.

Zhaoyang Yu, Changhua Pei, Shenglin Zhang, Xidao Wen, Jianhui Li, Gaogang Xie, and Dan Pei. AutoKAD: Empowering KPI Anomaly Detection with Label-Free Deployment. ISSRE 2023, Florence, Italy, October 2023. (Best Paper)

Chunhui Shen, Qianyu Ouyang, Feibo Li, Zhipeng Liu, Longcheng Zhu, Yujie Zou, Qing Su, Tianhuan Yu, Yi Yi, Jianhong Hu, Cen Zheng, Bo Wen, Hanbang Zheng, Lunfan Xu, Sicheng Pan, Bin Wu, Xiao He, Ye Li, Jian Tan, Sheng Wang, Dan Pei, Feifei Li. Lindorm TSDB: A Cloud-Native Time-Series Database for Large-Scale Monitoring Systems. VLDB 2023. August 2023.

Zhe Xie, Haowen Xu, Wenxiao Chen, Wanxue Li, Huai Jiang, Liangfei Su, Hanzhang Wang, and Dan Pei. Unsupervised Anomaly Detection on Microservice Traces Through Graph VAE. WWW 2023, Austin, Texas, USA. April 2023.

Qingyang Yu, Changhua Pei, Bowen Hao, Mingjie Li, Zeyan Li, Shenglin Zhang, Xianglin Lu, Rui Wang, Jiaqi Li, and Zhenyu Wu. CMDiagnostor: An Ambiguity-Aware Root Cause Localization Approach Based on Call Metric Data. WWW 2023, Austin, Texas, USA. April 2023.

Zeyan Li, Nengwen Zhao, Mingjie Li, Xianglin Lu, Lixin Wang, Dongdong Chang, Xiaohui Nie, Li Cao, Wenchi Zhang, Kaixin Sui, Yanhua Wang, Xu Du, Guoqiang Duan, and Dan Pei. Actionable and Interpretable Fault Localization for Recurring Failures in Online Service Systems. ESEC/FSE 2022, Singapore, November 2022.

Zhihan Li, Youjian Zhao, Yitong Geng, Zhanxiang Zhao, Hanzhang Wang, Wenxiao Chen, Huai Jiang, Amber Vaidya, Liangfei Su, and Dan Pei. Situation-Aware Multivariate Time Series Anomaly Detection Through Active Learning and Contrast VAE-Based Models in Large Distributed Systems. IEEE Journal on Selected Areas in Communications (JSAC). 2022.

Mingjie Li, Zeyan Li, Kanglin Yin, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, and Dan Pei. Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition. SIGKDD 2022, Washington DC USA, August 2022.

Zhihan Li, Youjian Zhao, Jiaqi Han, Ya Su, Rui Jiao, Xidao Wen, and Dan Pei. Multivariate Time Series Anomaly Detection and Interpretation Using Hierarchical Inter-Metric and Temporal Embedding. SIGKDD 2021, Singapore, August 2021.

Nengwen Zhao, Junjie Chen, Zhaoyang Yu, Honglin Wang, Jiesong Li, Bin Qiu, Hongyu Xu, Wenchi Zhang, Kaixin Sui, and Dan Pei. Identifying Bad Software Changes Via Multimodal Anomaly Detection for Online Service Systems. ESEC/FSE 2021, Athens Greece, August 2021.

Nengwen Zhao, Honglin Wang, Zeyan Li, Xiao Peng, Gang Wang, Zhu Pan, Yong Wu, Zhen Feng, Xidao Wen, Wenchi Zhang, Kaixin Sui, and Dan Pei. An Empirical Investigation of Practical Log Anomaly Detection for Online Service Systems. ESEC/FSE 2021, Athens Greece, August 2021.

Minghua Ma, Shenglin Zhang, Junjie Chen, Jim Xu, Haozhe Li, Yongliang Lin, Xiaohui Nie, Bo Zhou, Yong Wang, and Dan Pei. Jump-Starting Multivariate Time Series Anomaly Detection for Online Service Systems. USENIX ATC 2021, July 2021.

Nengwen Zhao, Junjie Chen, Zhou Wang, Xiao Peng, Gang Wang, Yong Wu, Fang Zhou, Zhen Feng, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, and Dan Pei. Real-Time Incident Prediction for Online Service Systems. ESEC/FSE 2020, November 2020.

Nengwen Zhao, Junjie Chen, Xiao Peng, Honglin Wang, Xinya Wu, Yuanzong Zhang, Zikai Chen, Xiangzhong Zheng, Xiaohui Nie, Gang Wang, Yong Wu, Fang Zhou, Wenchi Zhang, Kaixin Sui, and Dan Pei. Understanding and Handling Alert Storm for Online Service Systems. ICSE SEIP 2020, Seoul, South Korea, June 2020.

Minghua Ma, Zheng Yin, Shenglin Zhang, Sheng Wang, Christopher Zheng, Xinhao Jiang, Hanwen Hu, Cheng Luo, Yilin Li, Nengjun Qiu, Feifei Li, Changcheng Chen, and Dan Pei. Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases. VLDB 2020, Tokyo, Japan, April 2020.

Ruming Tang, Zheng Yang, Zeyan Li, Weibin Meng, Haixin Wang, Qi Li, Yongqian Sun, Dan Pei, Tao Wei, and Yanfei Xu. Zerowall: Detecting Zero-Day Web Attacks Through Encoder-Decoder Recurrent Neural Networks. INFOCOM 2020, Beijing, China, April 2020.

Nengwen Zhao, Panshi Jin, Lixin Wang, Xiaoqin Yang, Rong Liu, Wenchi Zhang, Kaixin Sui, and Dan Pei. Automatically and Adaptively Identifying Severe Alerts for Online Service Systems. INFOCOM 2020, Beijing, China, April 2020.

Xiaohui Nie, Youjian Zhao, Zhihan Li, Guo Chen, Kaixin Sui, Jiyang Zhang, Zijie Ye, and Dan Pei. Dynamic TCP Initial Windows and Congestion Control Schemes Through Reinforcement Learning. IEEE Journal on Selected Areas in Communications (JSAC), Artificial Intelligence and Machine Learning for Networking and Communications. 2019.

Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, and Pei Sun. LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs. IJCAI 2019, Macao, China, August 2019.

Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, and Dan Pei. Robust Anomaly Detection for Multivariate Time Series Through Stochastic Recurrent Neural Network. SIGKDD 2019, Anchorage AK USA, July 2019.

Wenxiao Chen, Haowen Xu, Zeyan Li, Dan Pei, Jie Chen, Honglin Qiao, Yang Feng, and Zhaogang Wang. Unsupervised Anomaly Detection for Intricate KPIs Via Adversarial Training of VAE. INFOCOM 2019, Paris, France, April 2019.

Haowen Xu, Yang Feng, Jie Chen, Zhaogang Wang, Honglin Qiao, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, and Dan Pei. Unsupervised Anomaly Detection Via Variational Auto-Encoder for Seasonal KPIs in Web Applications. WWW 2018, Lyon, France, 2018.

Minghua Ma, Shenglin Zhang, Dan Pei, Xin Huang, and Hongwei Dai. Robust and Rapid Adaption for Concept Drift in Software System Anomaly Detection. ISSRE 2018, Memphis, TN, USA, October 2018. (Best Paper)

Guo Chen, Yuanwei Lu, Yuan Meng, Bojie Li, Kun Tan, Dan Pei, Peng Cheng, Layong Larry Luo, Yongqiang Xiong, and Xiaoliang Wang. Fast and Cautious: Leveraging Multi-Path Diversity for Transport Loss Recovery in Data Centers. USENIX ATC 2016, Denver, CO, USA, June 2016.

Changhua Pei, Zhi Wang, Youjian Zhao, Zihan Wang, Yuan Meng, Dan Pei, Yuanquan Peng, Wenliang Tang, and Xiaodong Qu. Why It Takes So Long to Connect to a WIFI Access Point. INFOCOM 2017, Atlanta, Georgia, USA, May 2017.

Kaixin Sui, Mengyu Zhou, Dapeng Liu, Minghua Ma, Dan Pei, Youjian Zhao, Zimu Li, and Thomas Moscibroda. Characterizing and Improving WiFi Latency in Large-Scale Operational Networks. MobiSys 2016, Singapore, June 2016.

Dapeng Liu, Youjian Zhao, Kaixin Sui, Lei Zou, Dan Pei, Qingqian Tao, Xiyang Chen, and Dai Tan. Focus: Shedding Light on the High Search Response Time in the Wild. INFOCOM 2016, San Francisco, CA, USA, Apr 2016.

Changhua Pei, Youjian Zhao, Guo Chen, Ruming Tang, Yuan Meng, Minghua Ma, Ken Ling, and Dan Pei. WiFi Can Be the Weakest Link of Round Trip Network Latency in the Wild. INFOCOM 2016, San Francisco, CA, USA, Apr 2016.

Dapeng Liu, Youjian Zhao, Haowen Xu, Yongqian Sun, Dan Pei, Jiao Luo, Xiaowei Jing, and Mei Feng. Opprentice: Towards Practical and Automatic Anomaly Detection Through Machine Learning. ACM IMC 2015, Tokyo Japan, October 2015.

Kaixin Sui, Youjian Zhao, Dan Pei, and Zimu Li. How Bad Are the Rogues’ Impact on Enterprise 802.11 Network Performance? INFOCOM 2015, Hong Kong, China, April 27-30, 2015.

He Yan, Lee Breslau, Zihui Ge, Dan Massey, Dan Pei, and Jennifer Yates. G-RCA: A Generic Root Cause Analysis Platform for Service Quality Management in Large IP Networks. IEEE/ACM Transactions on Networking, 20(6), 1734–1747. December 2012.

Suk-Bok Lee, Dan Pei, Mohammad Taghi Hajiaghayi, Ioannis Pefkianakis, Songwu Lu, He Yan, Zihui Ge, Jennifer Yates, and Mario Kosseifi. Threshold Compression for 3G Scalable Monitoring. INFOCOM 2012, Orlando, FL, April 2012.

Tongqing Qiu, Zihui Ge, Dan Pei, Jia Wang, and Jun Xu. What Happened in My Network: Mining Network Events from Router Syslogs. ACM IMC 2010, Melbourne Australia, November 2010.

Tongqing Qiu, Lusheng Ji, Dan Pei, Jia Wang, and Jun Xu. Towerdefense: Deployment Strategies for Battling Against IP Prefix Hijacking. ICNP 2010, Kyoto, Japan, Oct 5-8, 2010.

Kai Chen, David R. Choffnes, Rahul Potharaju, Yan Chen, Fabian E. Bustamante, Dan Pei, and Yao Zhao. Where the Sidewalk Ends: Extending the Internet as Graph Using Traceroutes from P2P Users. ACM CoNEXT 2009, 217–228. Rome Italy, December 2009.

Tongqing Qiu, Lusheng Ji, Dan Pei, Jia Wang, Jun (Jim) Xu, and Hitesh Ballani. Locating Prefix Hijackers Using LOCK. USENIX Security 2009, Montreal, Canada, August 2009.

Ricardo Oliveira, Beichuan Zhang, Dan Pei, Rafit Izhak-Ratzin, and Lixia Zhang. Quantifying Path Exploration in the Internet. IEEE/ACM Transactions on Networking, June 2009.

Ricardo Oliveira, Dan Pei, Walter Willinger, Beichuan Zhang, and Lixia Zhang. The (in)completeness of the Observed Internet AS-Level Structure. IEEE/ACM Transactions on Networking, 18(1), 109–122. June 2009.

Jeffrey Erman, Alexandre Gerber, Mohammad T. Hajiaghayi, Dan Pei, and Oliver Spatscheck. Network-Aware Forward Caching. WWW 2009, 291–300. Madrid Spain, April 2009.

Yao Zhao, Zhaosheng Zhu, Yan Chen, Dan Pei, and Jia Wang. Towards Efficient Large-Scale VPN Monitoring and Diagnosis Under Operational Constraints. INFOCOM 2009, Rio de Janeiro, Brazil, April 2009.

Franck Le, Geoffrey G. Xie, Dan Pei, Jia Wang, and Hui Zhang. Shedding Light on the Glue Logic of the Internet Routing Architecture. SIGCOMM 2008, Seattle WA USA, August 2008.

Changxi Zheng, Lusheng Ji, Dan Pei, Jia Wang, and Paul Francis. A Light-Weight Distributed Scheme for Detecting IP Prefix Hijacks in Realtime. SIGCOMM 2007, Kyoto, Japan, August 2007.

Patrick Verkaik, Dan Pei, Tom Scholl, Aman Shaikh, Alex C. Snoeren, and Jacobus E. Van Der Merwe. Wresting Control from BGP: Scalable Fine-Grained Route Control. USENIX ATC 2007, Santa Clara, June 2007.

Lan Wang, Malleswari Saranu, Joel M. Gottlieb, and Dan Pei. Understanding BGP Session Failures in a Large ISP. INFOCOM 2007, Anchorage, Alaska, USA, May 2007.

Ricardo Oliveira, Beichuan Zhang, Dan Pei, Rafit Izhak-Ratzin, and Lixia Zhang. Quantifying Path Exploration in the Internet. ACM IMC 2006, Rio de Janeiro Brazil, October 2006.

Dan Pei and Jacobus Van Der Merwe. BGP Convergence in Virtual Private Networks. ACM IMC 2006, Rio de Janeiro Brazil, October 2006.

Mohit Lad, Daniel Massey, Dan Pei, Yiguo Wu, Beichuan Zhang, and Lixia Zhang. PHAS: A Prefix Hijack Alert System. USENIX Security 2006, Vancouver, B.C., Canada, August 2006.

Beichuan Zhang, Dan Pei, Daniel Massey, and Lixia Zhang. Timer Interaction in Route Flap Damping. ICDCS 2005, Columbus, Ohio, June 2005. (Best Paper)

Dan Pei, Xiaoliang Zhao, Lan Wang, D. Massey, A. Mankin, S. Felix Wu, and Lixia Zhang. Improving BGP Convergence Through Consistency Assertions. INFOCOM 2002, vol. 2, 902–911. New York, USA, June 2002.

Lan Wang, Xiaoliang Zhao, Dan Pei, Randy Bush, Daniel Massey, Allison Mankin, S. Felix Wu, and Lixia Zhang. Observation and Analysis of BGP Behavior Under Stress. ACM IMW 2002, Marseille, France, 2002.

Xiaoliang Zhao, Dan Pei, Lan Wang, Dan Massey, Allison Mankin, S. Felix Wu, and Lixia Zhang. An Analysis of BGP Multiple Origin AS (MOAS) Conflicts. ACM IMW 2001, San Francisco, California, USA, 2001.