Deep reinforcement learning based worker selection for distributed machine learning enhanced edge intelligence in internet of vehicles

doi:10.23919/ICN.2020.0015

2020, Vol. 1, Issue (3): 234-242


Junyu Dong, Wenjun Wu*, Yang Gao, Xiaoxi Wang, Pengbo Si

Faculty of Information Technology, Beijing University of Technology, Beijing 100022, China


Abstract

Edge Information Systems (EISs) have recently received considerable attention. In an EIS, Distributed Machine Learning (DML), which requires fewer computing resources, can implement many artificial intelligence applications efficiently. However, because of the dynamic network topology and the fluctuating transmission quality at the edge, worker node selection strongly affects the performance of DML. In this paper, we focus on the Internet of Vehicles (IoV), a typical EIS scenario, and take DML-based High Definition (HD) mapping and intelligent driving decision models as examples. The worker selection problem is modeled as a Markov Decision Process (MDP) that maximizes the aggregate performance of the DML model, which depends on the timeliness of the local model, the transmission quality of the model parameter upload, and the effective sensing area of the worker. A Deep Reinforcement Learning (DRL) based solution, called the Worker Selection based on Policy Gradient (PG-WS) algorithm, is proposed. The policy that maps the system state to the worker selection action is represented by a deep neural network. Episodic simulations are built, and the REINFORCE algorithm with baseline is used to train the policy network. Results show that the proposed PG-WS algorithm outperforms the other methods used for comparison.
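The training loop described in the abstract (episodic simulation, policy gradient, REINFORCE with baseline) can be illustrated with a minimal sketch. It is not the PG-WS implementation: a linear softmax policy stands in for the deep neural network, the three per-worker features are random toy values, the reward is assumed to be a weighted sum of those features, and the baseline is assumed to be a running mean of past returns at each time step. All function and parameter names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def run_episode(theta, rng, n_workers, horizon, weights):
    """Sample one episode; at each step one worker is selected."""
    feats, actions, rewards = [], [], []
    for _ in range(horizon):
        # Toy state: one row per candidate worker with three features in [0, 1)
        # (timeliness, link quality, sensing area). Assumed, not from the paper.
        x = rng.random((n_workers, 3))
        probs = softmax(x @ theta)
        a = rng.choice(n_workers, p=probs)
        feats.append(x)
        actions.append(a)
        # Assumed reward: weighted sum of the selected worker's features.
        rewards.append(float(x[a] @ weights))
    return feats, actions, rewards

def train(episodes=300, lr=0.05, gamma=0.9, horizon=20, n_workers=5, seed=0):
    """REINFORCE with a per-time-step running-mean baseline."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)                 # linear policy parameters
    weights = np.ones(3)                # reward weights, all set to 1
    baseline = np.zeros(horizon)        # built from past episodes only
    history = []
    for _ in range(episodes):
        feats, actions, rewards = run_episode(theta, rng, n_workers, horizon, weights)
        G = discounted_returns(rewards, gamma)
        grad = np.zeros_like(theta)
        for t, (x, a) in enumerate(zip(feats, actions)):
            probs = softmax(x @ theta)
            # Gradient of log pi(a | x) for a linear softmax policy.
            grad_logp = x[a] - probs @ x
            grad += (G[t] - baseline[t]) * grad_logp
        theta += lr * grad / horizon    # gradient ascent on expected return
        baseline += 0.1 * (G - baseline)
        history.append(sum(rewards))
    return theta, history

if __name__ == "__main__":
    theta, history = train()
    print("learned policy weights:", theta)
    print("mean reward, first 20 episodes:", np.mean(history[:20]))
    print("mean reward, last 20 episodes:", np.mean(history[-20:]))
```

Subtracting the baseline leaves the expected gradient unchanged because the baseline does not depend on the action taken in the current episode; it only reduces the variance of the update.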

Key words: edge information system; internet of vehicles; distributed machine learning; deep reinforcement learning; worker selection

Received: 31 July 2020 Online: 19 August 2021

Fund: Science and Technology Foundation of Beijing Municipal Commission of Education (KM201810005027); National Natural Science Foundation of China (U1633115); Beijing Natural Science Foundation (L192002)

Corresponding Authors: Wenjun Wu E-mail: wenjunwu@bjut.edu.cn

About author: Junyu Dong received the BS degree in communication engineering from North China University of Technology in 2019. He is currently pursuing the MS degree at the Faculty of Information Technology, Beijing University of Technology. His research interests include internet of vehicles and mobile edge computing.

Wenjun Wu received the BS and PhD degrees from Beijing University of Posts and Telecommunications, Beijing, China in 2007 and 2012, respectively. From 2012 to 2015, she was a post-doctoral researcher at the School of Electronic and Information Engineering, Beihang University, Beijing, China. She is now an associate professor at the Faculty of Information Technology, Beijing University of Technology, Beijing, China. Her research interests are in the fields of mobile edge computing, blockchain, and deep reinforcement learning.

Yang Gao received the BS degree in communication engineering from Beijing University of Technology, Beijing, China in 2018. She is currently pursuing the PhD degree in electronic science and technology at Beijing University of Technology, Beijing, China. Her current research interests include mobile edge computing, blockchain, deep reinforcement learning, and wireless resource management.

Xiaoxi Wang received the BS degree in communication engineering from Hebei University of Engineering in 2019. She is currently pursuing the MS degree at the Faculty of Information Technology, Beijing University of Technology. Her current research interests include distributed machine learning and wireless networks.

Pengbo Si received the BE and PhD degrees from Beijing University of Posts and Telecommunications, Beijing, China in 2004 and 2009, respectively. He joined Beijing University of Technology, Beijing, China in 2009, where he is currently a full professor. From November 2007 to November 2008, he was a visiting student at Carleton University, Ottawa, Canada. From November 2014 to November 2015, he was a visiting scholar at the University of Florida, Gainesville, FL, USA.



Cite this article:

Junyu Dong,Wenjun Wu,Yang Gao,Xiaoxi Wang,Pengbo Si. Deep reinforcement learning based worker selection for distributed machine learning enhanced edge intelligence in internet of vehicles. , 2020, 1: 234-242.

URL:

http://icn.tsinghuajournals.com/10.23919/ICN.2020.0015 OR http://icn.tsinghuajournals.com/Y2020/V1/I3/234

Fig. 1 System architecture.

Fig. 2 DRL-based solution framework.

Parameter	Setting
Inter-site distance (m)	100
Number of vehicle nodes $N$	30
Noise density (dBm/Hz)	$-174$
Size of intelligent information (bits)	12 000
Number of channel resource units	48
Moving speed of vehicle nodes (km/h)	[45, 90]
Time of tasks in uplink transmission	{1, 2, 4, 8}
Threshold of link outage	0.05
Time length of observed states $T$	20
Discount factor $λ$	0.9
Initial weights $r_1$, $r_2$, and $r_3$	1

Table 1 Simulation parameters.
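For readers who want to reuse the settings in Table 1, the sketch below transcribes them into a small configuration object and works through two unit conversions. The class and field names are hypothetical, and the "crossing time" is only an illustrative quantity (the time to travel one inter-site distance at constant speed), not something defined in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimulationParams:
    """Values transcribed from Table 1; field names are illustrative."""
    inter_site_distance_m: float = 100.0
    num_vehicles: int = 30
    noise_density_dbm_per_hz: float = -174.0
    info_size_bits: int = 12_000
    num_channel_units: int = 48
    speed_range_kmh: tuple = (45.0, 90.0)
    uplink_task_times: tuple = (1, 2, 4, 8)   # unit is not stated in Table 1
    outage_threshold: float = 0.05
    observed_state_length: int = 20
    discount_factor: float = 0.9
    initial_reward_weights: tuple = (1.0, 1.0, 1.0)

def dbm_to_watts(dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** ((dbm - 30) / 10)

def kmh_to_ms(kmh):
    """Convert a speed in km/h to m/s."""
    return kmh / 3.6

params = SimulationParams()

# Noise power spectral density in W/Hz.
noise_w_per_hz = dbm_to_watts(params.noise_density_dbm_per_hz)

# Speed range in m/s and the time needed to cover one inter-site distance.
speeds_ms = tuple(kmh_to_ms(v) for v in params.speed_range_kmh)
crossing_times_s = tuple(params.inter_site_distance_m / v for v in speeds_ms)

print(noise_w_per_hz, speeds_ms, crossing_times_s)
```

Under these settings the noise density is about 3.98e-21 W/Hz, and a vehicle covers the 100 m inter-site distance in roughly 4 to 8 seconds, which gives a sense of how quickly the set of reachable workers can change.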

Fig. 3 Performance of average value in the training process.

Fig. 4 Training performance with different learning rates.

Fig. 5 Performance comparison with different reward weights.


