Deep reinforcement learning for dynamic computation offloading and resource allocation in cache-assisted mobile edge computing systems

Samrat Nath, Jingxian Wu*

Walmart Inc., Bentonville, AR 72716, USA
Department of Electrical Engineering, University of Arkansas, Fayetteville, AR 72701, USA
Abstract Mobile Edge Computing (MEC) is one of the most promising techniques for next-generation wireless communication systems. In this paper, we study the problem of dynamic caching, computation offloading, and resource allocation in cache-assisted multi-user MEC systems with stochastic task arrivals. There are multiple computationally intensive tasks in the system, and each Mobile User (MU) needs to execute a task either locally or remotely in one or more MEC servers by offloading the task data. Popular tasks can be cached in MEC servers to avoid duplicate offloading. The cached contents can be obtained through user offloading, fetched from a remote cloud, or fetched from another MEC server. The objective is to minimize the long-term average of a cost function, which is defined as a weighted sum of energy consumption, delay, and cache contents' fetching costs. The weighting coefficients associated with the different metrics in the objective function can be adjusted to balance the tradeoff among them. The optimum design is performed with respect to four decision parameters: whether to cache a given task, whether to offload a given uncached task, how much transmission power should be used during offloading, and how many MEC resources should be allocated for executing a task. We propose to solve the problem by developing a dynamic scheduling policy based on Deep Reinforcement Learning (DRL) with the Deep Deterministic Policy Gradient (DDPG) method. A new decentralized DDPG algorithm is developed to obtain the optimum designs for multi-cell MEC systems by leveraging the cooperation among neighboring MEC servers. Simulation results demonstrate that the proposed algorithm outperforms other existing strategies, such as Deep Q-Network (DQN).
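The two core ingredients described in the abstract can be sketched in a few lines: a per-slot cost formed as a weighted sum of energy, delay, and cache-fetching cost, and the Polyak (soft) target-network update that DDPG relies on for stable training. This is a minimal illustrative sketch; the weight names `w_e`, `w_d`, `w_f` and the scalar-parameter form of the soft update are placeholder assumptions, not the paper's exact formulation.

```python
def weighted_cost(energy, delay, fetch_cost, w_e=1.0, w_d=1.0, w_f=0.5):
    """Per-slot cost as a weighted sum of the three metrics named in the
    abstract. Tuning w_e, w_d, and w_f trades energy against delay against
    cache-fetching cost (hypothetical weight values)."""
    return w_e * energy + w_d * delay + w_f * fetch_cost


def soft_update(theta, theta_target, tau=0.005):
    """Polyak averaging used by DDPG to track slowly moving target networks:
    theta_target <- tau * theta + (1 - tau) * theta_target.
    Shown here for a single scalar parameter for simplicity."""
    return tau * theta + (1.0 - tau) * theta_target
```

In a full DDPG agent, `soft_update` would be applied element-wise to every actor and critic parameter after each gradient step, and the negative of `weighted_cost` would serve as the reward signal.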
Received: 30 May 2020
Online: 19 August 2021
Corresponding author:
Jingxian Wu
E-mail: samrat.nath@walmart.com; wuj@uark.edu
About the authors:

Samrat Nath received the BS degree in electrical and electronic engineering from the Bangladesh University of Engineering and Technology, Dhaka, Bangladesh in 2014, and the PhD degree in electrical engineering from the University of Arkansas, Fayetteville, USA in 2020. He is currently a data scientist at Walmart Inc. in Bentonville, AR, USA. His research interests include statistical signal analysis, information sensing and processing, optimization, machine learning, and wireless communication.

Jingxian Wu received the BS (EE) degree from the Beijing University of Aeronautics and Astronautics, Beijing, China in 1998, the MEng (EE) degree from Tsinghua University, Beijing, China in 2001, and the PhD (EE) degree from the University of Missouri at Columbia, Missouri, USA in 2005. He is currently a professor at the Department of Electrical Engineering, University of Arkansas, Fayetteville. His research interests mainly focus on signal processing for large-scale networks and wireless communications, cybersecurity for smart grids, statistical data analytics, etc. He served as a symposium or track co-chair for a number of international conferences, such as the 2012 and 2019 IEEE International Conference on Communications and the 2009, 2015, and 2017 IEEE Global Telecommunications Conference. He served as an associate editor of the IEEE Transactions on Vehicular Technology from 2007 to 2011, an editor of the IEEE Transactions on Wireless Communications from 2011 to 2016, and is now serving as an associate editor of IEEE Access.