Hi there, I'm Yue ZHAO (赵越 in Chinese)! 👋
At CMU, I work with Prof. Leman Akoglu, Prof. Zhihao Jia, and Prof. George H. Chen. I am a member of Data Analytics Techniques Algorithms (DATA) Lab and CMU automated learning systems group (Catalyst). Externally, I collaborate with Prof. Jure Leskovec at Stanford, Prof. Xia "Ben" Hu at Rice University, and Prof. Philip S. Yu at UIC.
Contributions to outlier detection systems, benchmarks, and applications: I build automated, scalable, and accelerated machine learning systems (MLSys) that support large-scale, real-world outlier detection applications in security, finance, and healthcare. These systems have been downloaded millions of times. I designed CPU-based (PyOD), GPU-based (TOD), and distributed (SUOD) detection systems, covering tabular (PyOD), time-series (TODS), and graph data (PyGOD). To understand the characteristics of outlier detection (OD) algorithms, I co-authored large-scale benchmarks for tabular data (ADBench), time-series data (paper), and graph data (UNOD). My work has been used by thousands of projects and applications, including leading firms such as IBM, Morgan Stanley, and Tesla. See more applications.
| Primary field | Secondary | Method | Year | Venue | Lead author |
|---|---|---|---|---|---|
| large-scale benchmark | tabular anomaly detection | ADBench | 2022 | NeurIPS | Y |
| large-scale benchmark | graph anomaly detection | UNOD | 2022 | NeurIPS | Y |
| large-scale benchmark | sequence anomaly detection | TODS | 2021 | NeurIPS | |
| automated machine learning | outlier model selection | MetaOD | 2021 | NeurIPS | Y |
| automated machine learning | outlier model selection | ELECT | 2022 | ICDM | Y |
| automated machine learning | outlier HP optimization | HPOD | 2022 | Preprint | Y |
| automated machine learning | outlier evaluation | IPM | 2021 | Preprint | Y |
| machine learning systems | | PyOD | 2019 | JMLR | Y |
| machine learning systems | time series | TODS | 2020 | AAAI | |
| machine learning systems | | SUOD | 2021 | MLSys | Y |
| machine learning systems | distributed systems | TOD | 2022 | Preprint | Y |
| machine learning systems | graph neural networks | PyGOD | 2022 | Preprint | Y |
| ensemble learning | semi-supervised | XGBOD | 2018 | IJCNN | Y |
| ensemble learning | | LSCP | 2019 | SDM | Y |
| ensemble learning | machine learning systems | combo | 2020 | AAAI | Y |
| ensemble learning | interpretable ML | COPOD | 2020 | ICDM | Y |
| ensemble learning | interpretable ML | ECOD | 2022 | TKDE | Y |
| graph mining | finance | AutoAudit | 2020 | BigData | |
| graph neural networks | contrastive learning | CONAD | 2022 | PAKDD | |
| diffusion models | survey | | 2022 | Preprint | |
| AI x Science | synthetic data | SynC | 2020 | ICDMW | |
| AI x Science | healthcare AI | PyHealth | 2020 | Preprint | Y |
| AI x Science | datasets & benchmarks | TDC | 2021 | NeurIPS | |
| AI x Science | datasets & benchmarks | TDC V2 | 2022 | NCHEMB | |
- PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).
- ADBench: The most comprehensive tabular anomaly detection benchmark (30 anomaly detection algorithms on 57 benchmark datasets).
- TOD: Tensor-based outlier detection, the first large-scale GPU-based system for accelerating outlier detection!
- SUOD: An Acceleration System for Large-scale Heterogeneous Outlier Detection.
- anomaly-detection-resources: The most-starred collection of anomaly detection resources (books, courses, etc.)!
- Python Graph Outlier Detection (PyGOD): A Python Library for Graph Outlier Detection.
- Therapeutics Data Commons (TDC): Machine learning for drug discovery.
- PyTorch Geometric (PyG): Graph Neural Network Library for PyTorch. Contributed to profiling and benchmarking, and to heterogeneous data transformations.
- combo: A Python Toolbox for ML Model Combination (Ensemble Learning).
- TODS: Time-series Outlier Detection. Contributed to core detection models.
- MetaOD: Automatic Unsupervised Outlier Model Selection (AutoML).
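Several of the projects above (combo, and ensemble methods such as LSCP) revolve around combining the outputs of many outlier detectors. As a rough illustration only, here is a minimal pure-Python sketch of the simplest combination scheme. It assumes each detector returns one score per sample where a higher score means more anomalous, which is the convention PyOD follows. The scheme standardizes each detector's scores so they share a comparable scale, then averages them per sample. The function names `zscore` and `average_combination` and the toy scores are hypothetical; this is not the combo or PyOD API.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore(scores: Sequence[float]) -> List[float]:
    """Standardize one detector's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant detector carries no ranking information; map it to zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def average_combination(score_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Combine detectors by averaging their standardized scores per sample.

    Hypothetical helper: score_matrix[d][i] is detector d's outlier score
    for sample i, where a higher score means "more anomalous".
    """
    if not score_matrix:
        raise ValueError("need at least one detector")
    n = len(score_matrix[0])
    if any(len(row) != n for row in score_matrix):
        raise ValueError("all detectors must score the same samples")
    standardized = [zscore(row) for row in score_matrix]
    # zip(*...) regroups the standardized scores by sample (column).
    return [mean(col) for col in zip(*standardized)]


if __name__ == "__main__":
    # Made-up scores from three detectors on five samples; the detectors
    # use very different scales, and sample 3 is the planted outlier.
    scores = [
        [0.1, 0.2, 0.1, 0.9, 0.2],
        [10.0, 12.0, 11.0, 40.0, 9.0],
        [1.0, 1.1, 0.9, 2.5, 1.0],
    ]
    combined = average_combination(scores)
    top = max(range(len(combined)), key=combined.__getitem__)
    print(f"most anomalous sample: {top}")  # prints "most anomalous sample: 3"
```

Standardizing first matters because raw scores from different detectors live on unrelated scales. Without it, the second detector above would dominate the average. Averaging is only the simplest choice. Locally selective schemes such as LSCP go further, and this sketch does not attempt to reproduce them.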
News & Travel:
- Sep 2022: Two large-scale anomaly detection benchmarks, ADBench for tabular data and UNOD for graph data, were accepted at NeurIPS 2022.
  - ADBench is arguably my most important work: this 45-page paper analyzes 30 algorithms on 57 datasets across roughly 100,000 experiments. If you work on anomaly detection, I believe it is a must-read.
- Sep 2022: Check out our comprehensive survey on diffusion models, and star the code repo!
- Aug 2022: ELECT: Toward Unsupervised Outlier Model Selection was accepted to the IEEE International Conference on Data Mining (ICDM) as a regular paper!
- Jul 2022: 🌟 Reached 1,000 citations on Google Scholar!


