Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - An adaptive DNN inference acceleration framework with end–edge–cloud collaborative computing
AU - Liu, Guozhi
AU - Dai, Fei
AU - Xu, Xiaolong
AU - Fu, Xiaodong
AU - Dou, Wanchun
AU - Kumar, Neeraj
AU - Bilal, Muhammad
PY - 2023/3/31
Y1 - 2023/3/31
N2 - Intelligent applications based on Deep Neural Networks (DNNs) have been intensively deployed on mobile devices. Unfortunately, resource-constrained mobile devices cannot meet stringent latency requirements due to the large amount of computation these intelligent applications require. Existing cloud-assisted and edge-assisted DNN inference approaches can reduce end-to-end inference latency by offloading DNN computations to the cloud server or edge servers, but they suffer from unpredictable communication latency caused by long wide-area massive data transmission or from performance degradation caused by limited computation resources. In this paper, we propose an adaptive DNN inference acceleration framework that accelerates DNN inference by fully utilizing end–edge–cloud collaborative computing. First, a latency prediction model is built to estimate the layer-wise execution latency of a DNN on different heterogeneous computing platforms; it uses neural networks to learn non-linear features related to inference latency. Second, a computation partitioning algorithm is designed to identify two optimal partitioning points, which adaptively divide DNN computations among end devices, edge servers, and the cloud server to minimize DNN inference latency. Finally, we conduct extensive experiments on three widely adopted DNNs; the experimental results show that our latency prediction models improve prediction accuracy by about 72.31% on average compared with four baseline approaches, and our computation partitioning approach reduces end-to-end latency by about 20.81% on average against six baseline approaches under three wireless networks.
AB - Intelligent applications based on Deep Neural Networks (DNNs) have been intensively deployed on mobile devices. Unfortunately, resource-constrained mobile devices cannot meet stringent latency requirements due to the large amount of computation these intelligent applications require. Existing cloud-assisted and edge-assisted DNN inference approaches can reduce end-to-end inference latency by offloading DNN computations to the cloud server or edge servers, but they suffer from unpredictable communication latency caused by long wide-area massive data transmission or from performance degradation caused by limited computation resources. In this paper, we propose an adaptive DNN inference acceleration framework that accelerates DNN inference by fully utilizing end–edge–cloud collaborative computing. First, a latency prediction model is built to estimate the layer-wise execution latency of a DNN on different heterogeneous computing platforms; it uses neural networks to learn non-linear features related to inference latency. Second, a computation partitioning algorithm is designed to identify two optimal partitioning points, which adaptively divide DNN computations among end devices, edge servers, and the cloud server to minimize DNN inference latency. Finally, we conduct extensive experiments on three widely adopted DNNs; the experimental results show that our latency prediction models improve prediction accuracy by about 72.31% on average compared with four baseline approaches, and our computation partitioning approach reduces end-to-end latency by about 20.81% on average against six baseline approaches under three wireless networks.
KW - Deep neural networks
KW - DNN computation partitioning
KW - DNN inference acceleration
KW - End–edge–cloud collaboration
KW - Latency prediction model
U2 - 10.1016/j.future.2022.10.033
DO - 10.1016/j.future.2022.10.033
M3 - Journal article
AN - SCOPUS:85142307615
VL - 140
SP - 422
EP - 435
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
SN - 0167-739X
ER -