Optimizing CNN Inference Speed over Big Social Data through Efficient Model Parallelism for Sustainable Web of Things

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Optimizing CNN Inference Speed over Big Social Data through Efficient Model Parallelism for Sustainable Web of Things. / Hu, Yuhao; Xu, Xiaolong; Bilal, Muhammad et al.
In: Journal of Parallel and Distributed Computing, Vol. 192, 104927, 31.10.2024.

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Harvard

Hu, Y, Xu, X, Bilal, M, Zhong, W, Liu, Y, Kou, H & Kong, L 2024, 'Optimizing CNN Inference Speed over Big Social Data through Efficient Model Parallelism for Sustainable Web of Things', Journal of Parallel and Distributed Computing, vol. 192, 104927. https://doi.org/10.1016/j.jpdc.2024.104927

APA

Hu, Y., Xu, X., Bilal, M., Zhong, W., Liu, Y., Kou, H., & Kong, L. (2024). Optimizing CNN Inference Speed over Big Social Data through Efficient Model Parallelism for Sustainable Web of Things. Journal of Parallel and Distributed Computing, 192, Article 104927. https://doi.org/10.1016/j.jpdc.2024.104927

Vancouver

Hu Y, Xu X, Bilal M, Zhong W, Liu Y, Kou H et al. Optimizing CNN Inference Speed over Big Social Data through Efficient Model Parallelism for Sustainable Web of Things. Journal of Parallel and Distributed Computing. 2024 Oct 31;192:104927. Epub 2024 Jun 8. doi: 10.1016/j.jpdc.2024.104927

Author

Hu, Yuhao; Xu, Xiaolong; Bilal, Muhammad et al. / Optimizing CNN Inference Speed over Big Social Data through Efficient Model Parallelism for Sustainable Web of Things. In: Journal of Parallel and Distributed Computing. 2024; Vol. 192.

Bibtex

@article{1c2efe7f006a49b184b7cd3a1cf7adf5,
title = "Optimizing CNN Inference Speed over Big Social Data through Efficient Model Parallelism for Sustainable Web of Things",
abstract = "The rapid development of artificial intelligence and networking technologies has catalyzed the popularity of intelligent services based on deep learning in recent years, which in turn fosters the advancement of Web of Things (WoT). Big social data (BSD) plays an important role during the processing of intelligent services in WoT. However, intelligent BSD services are computationally intensive and require ultra-low latency. End or edge devices with limited computing power cannot realize the extremely low response latency of those services. Distributed inference of deep neural networks (DNNs) on various devices is considered a feasible solution by allocating the computing load of a DNN to several devices. In this work, an efficient model parallelism method that couples convolution layer (Conv) split with resource allocation is proposed. First, given a random computing resource allocation strategy, the Conv split decision is made through a mathematical analysis method to realize the parallel inference of convolutional neural networks (CNNs). Next, Deep Reinforcement Learning is used to get the optimal computing resource allocation strategy to maximize the resource utilization rate and minimize the CNN inference latency. Finally, simulation results show that our approach performs better than the baselines and is applicable for BSD services in WoT with a high workload.",
author = "Yuhao Hu and Xiaolong Xu and Muhammad Bilal and Weiyi Zhong and Yuwen Liu and Huaizhen Kou and Lingzhen Kong",
year = "2024",
month = oct,
day = "31",
doi = "10.1016/j.jpdc.2024.104927",
language = "English",
volume = "192",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",

}

RIS

TY - JOUR

T1 - Optimizing CNN Inference Speed over Big Social Data through Efficient Model Parallelism for Sustainable Web of Things

AU - Hu, Yuhao

AU - Xu, Xiaolong

AU - Bilal, Muhammad

AU - Zhong, Weiyi

AU - Liu, Yuwen

AU - Kou, Huaizhen

AU - Kong, Lingzhen

PY - 2024/10/31

Y1 - 2024/10/31

N2 - The rapid development of artificial intelligence and networking technologies has catalyzed the popularity of intelligent services based on deep learning in recent years, which in turn fosters the advancement of Web of Things (WoT). Big social data (BSD) plays an important role during the processing of intelligent services in WoT. However, intelligent BSD services are computationally intensive and require ultra-low latency. End or edge devices with limited computing power cannot realize the extremely low response latency of those services. Distributed inference of deep neural networks (DNNs) on various devices is considered a feasible solution by allocating the computing load of a DNN to several devices. In this work, an efficient model parallelism method that couples convolution layer (Conv) split with resource allocation is proposed. First, given a random computing resource allocation strategy, the Conv split decision is made through a mathematical analysis method to realize the parallel inference of convolutional neural networks (CNNs). Next, Deep Reinforcement Learning is used to get the optimal computing resource allocation strategy to maximize the resource utilization rate and minimize the CNN inference latency. Finally, simulation results show that our approach performs better than the baselines and is applicable for BSD services in WoT with a high workload.

AB - The rapid development of artificial intelligence and networking technologies has catalyzed the popularity of intelligent services based on deep learning in recent years, which in turn fosters the advancement of Web of Things (WoT). Big social data (BSD) plays an important role during the processing of intelligent services in WoT. However, intelligent BSD services are computationally intensive and require ultra-low latency. End or edge devices with limited computing power cannot realize the extremely low response latency of those services. Distributed inference of deep neural networks (DNNs) on various devices is considered a feasible solution by allocating the computing load of a DNN to several devices. In this work, an efficient model parallelism method that couples convolution layer (Conv) split with resource allocation is proposed. First, given a random computing resource allocation strategy, the Conv split decision is made through a mathematical analysis method to realize the parallel inference of convolutional neural networks (CNNs). Next, Deep Reinforcement Learning is used to get the optimal computing resource allocation strategy to maximize the resource utilization rate and minimize the CNN inference latency. Finally, simulation results show that our approach performs better than the baselines and is applicable for BSD services in WoT with a high workload.

U2 - 10.1016/j.jpdc.2024.104927

DO - 10.1016/j.jpdc.2024.104927

M3 - Journal article

VL - 192

JO - Journal of Parallel and Distributed Computing

JF - Journal of Parallel and Distributed Computing

SN - 0743-7315

M1 - 104927

ER -
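
The abstract describes splitting a convolution layer (Conv) across devices so that CNN inference can proceed in parallel. The paper derives its split decision from a mathematical analysis method not reproduced here; the snippet below is only a minimal illustrative sketch in PyTorch, assuming a height-wise split of a single 3x3, stride-1, padding-1 Conv across two workers, with a one-row halo so the stitched partial outputs match the full-layer result. All shapes and variable names are invented for illustration.

```python
# Hypothetical sketch (not the authors' code): splitting one Conv layer's
# input along the height axis so two workers can compute disjoint slices of
# the output in parallel. Overlap ("halo") rows are duplicated so each slice
# sees the full receptive field of its assigned output rows.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 32, 32)      # NCHW input (invented shape)
w = torch.randn(8, 3, 3, 3)        # 8 filters, 3x3 kernel, used with padding=1

# Reference: the whole layer on one device (output height = 32).
full = F.conv2d(x, w, padding=1)

# Split point and halo size for a 3x3 kernel with stride 1, padding 1.
split, halo = 16, 1

# Worker A computes output rows [0, 16); it needs input rows [0, 17).
top = F.conv2d(x[:, :, : split + halo, :], w, padding=1)[:, :, :split, :]
# Worker B computes output rows [16, 32); it needs input rows [15, 32).
bot = F.conv2d(x[:, :, split - halo :, :], w, padding=1)[:, :, halo:, :]

# Stitching the two partial outputs reproduces the full-layer result.
assert torch.allclose(torch.cat([top, bot], dim=2), full, atol=1e-5)
```

The halo rows are the cost of exactness: each duplicated row is redundant compute and communication, which is presumably what a split-decision analysis must weigh against balancing per-device load.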
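The abstract also states that Deep Reinforcement Learning selects the computing-resource allocation that maximizes utilization and minimizes inference latency. A full DRL agent is beyond a sketch, so the toy below swaps in a deliberately simplified stand-in: an epsilon-greedy bandit that searches discretized allocations of 8 resource units across 3 devices against a made-up latency model. Every constant and function here is an assumption for illustration, not taken from the paper.

```python
# Hypothetical toy (not the authors' method): learn which allocation of
# resource units across devices minimizes simulated parallel-inference latency.
import itertools
import random

random.seed(0)
DEVICES, UNITS = 3, 8
WORK = [4.0, 2.0, 2.0]   # assumed per-device compute load (arbitrary units)

# Candidate actions: all ways to give each device at least one unit.
actions = [a for a in itertools.product(range(1, UNITS + 1), repeat=DEVICES)
           if sum(a) == UNITS]

def latency(alloc):
    """Parallel inference finishes when the slowest device finishes."""
    noise = random.gauss(0, 0.05)
    return max(w / u for w, u in zip(WORK, alloc)) + noise

q = {a: 0.0 for a in actions}   # running latency estimate per action
n = {a: 0 for a in actions}
for step in range(2000):
    a = (random.choice(actions) if random.random() < 0.1
         else min(q, key=q.get))           # epsilon-greedy, minimizing latency
    n[a] += 1
    q[a] += (latency(a) - q[a]) / n[a]     # incremental mean update

best = min(q, key=q.get)
print("learned allocation:", best, "estimated latency:", round(q[best], 3))
```

With the assumed loads, the learner settles on giving the heavier device proportionally more units; the paper's DRL formulation addresses the same trade-off at realistic scale, jointly with the Conv split decision.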