Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
TY - GEN
T1 - NodeGuard: A Highly Efficient Two-Party Computation Framework for Training Large-Scale Gradient Boosting Decision Tree
AU - Dai, Tianxiang
AU - Jiang, Yufan
AU - Li, Yong
AU - Mei, Fei
PY - 2024/7/4
Y1 - 2024/7/4
N2 - The Gradient Boosting Decision Tree (GBDT) is a well-known machine learning algorithm that achieves high performance and outstanding interpretability in real-world scenarios such as fraud detection, online marketing, and risk management. Moreover, two data owners can jointly train a GBDT model without disclosing their private datasets by executing secure Multi-Party Computation (MPC) protocols. In this work, we propose NodeGuard, a highly efficient two-party computation (2PC) framework for large-scale GBDT training and inference. NodeGuard guarantees that no sensitive intermediate results are leaked during training or inference. Its efficiency advantage is achieved by a novel keyed bucket aggregation protocol, which globally optimizes the communication and computation complexity of training. Additionally, we introduce a probabilistic approximate division protocol with a re-scaling optimization for the case where the divisor is publicly known. Finally, we compare NodeGuard to state-of-the-art frameworks and show that it is extremely efficient, improving privacy-preserving GBDT training performance by a factor of 5.0 to 131 in LAN settings and 2.7 to 457 in WAN settings.
AB - The Gradient Boosting Decision Tree (GBDT) is a well-known machine learning algorithm that achieves high performance and outstanding interpretability in real-world scenarios such as fraud detection, online marketing, and risk management. Moreover, two data owners can jointly train a GBDT model without disclosing their private datasets by executing secure Multi-Party Computation (MPC) protocols. In this work, we propose NodeGuard, a highly efficient two-party computation (2PC) framework for large-scale GBDT training and inference. NodeGuard guarantees that no sensitive intermediate results are leaked during training or inference. Its efficiency advantage is achieved by a novel keyed bucket aggregation protocol, which globally optimizes the communication and computation complexity of training. Additionally, we introduce a probabilistic approximate division protocol with a re-scaling optimization for the case where the divisor is publicly known. Finally, we compare NodeGuard to state-of-the-art frameworks and show that it is extremely efficient, improving privacy-preserving GBDT training performance by a factor of 5.0 to 131 in LAN settings and 2.7 to 457 in WAN settings.
U2 - 10.1109/spw63631.2024.00015
DO - 10.1109/spw63631.2024.00015
M3 - Conference contribution/Paper
BT - 2024 IEEE Security and Privacy Workshops (SPW)
PB - IEEE
ER -
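
Note: the abstract mentions a probabilistic approximate division protocol with a re-scaling optimization for publicly known divisors, but the record does not spell it out. As background only, the minimal Python sketch below illustrates the classic SecureML-style local truncation over additive secret shares, a well-known instance of this idea for a public power-of-two divisor. It is an illustrative assumption, not NodeGuard's actual protocol (which, per the abstract, handles division more generally), and every name in it is invented for the example.

import secrets

ELL = 64
MOD = 1 << ELL  # shares live in the ring Z_{2^64}, a common 2PC choice (assumption)

def share(x):
    """Split x into two additive shares modulo 2^64."""
    x0 = secrets.randbelow(MOD)
    return x0, (x - x0) % MOD

def reconstruct(x0, x1):
    """Recombine two additive shares."""
    return (x0 + x1) % MOD

def truncate_share(xi, f, party):
    """Locally divide one share by the public divisor 2^f (no communication).
    Party 0 computes floor(x0 / 2^f); party 1 treats its share as a negative
    offset and computes 2^ELL - floor((2^ELL - x1) / 2^f). Reconstruction
    then yields floor(x / 2^f) + e with e in {0, 1}, except with probability
    about x / 2^ELL, when the random share wraps around the ring."""
    if party == 0:
        return xi >> f
    return (MOD - ((MOD - xi) >> f)) % MOD

# Demo: divide the secret value 1000 by the public divisor 2^4 = 16.
x0, x1 = share(1000)
y = reconstruct(truncate_share(x0, 4, 0), truncate_share(x1, 4, 1))
assert y in (62, 63)  # floor(1000 / 16) = 62, up to a +1 rounding error
print(y)

The design point this sketch shows is why a public divisor is cheap: each party re-scales its own share locally, with zero communication, accepting a one-unit rounding error and a negligible wraparound failure probability, which is what makes the division "probabilistic" and "approximate".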