  2021, Vol. 2 Issue (3): 177-197    doi: 10.23919/ICN.2021.0016
    
Received: 25 August 2021      Online: 07 December 2021

URL:

http://icn.tsinghuajournals.com/10.23919/ICN.2021.0016     OR     http://icn.tsinghuajournals.com/Y2021/V2/I3/177

Solution space | Related work | Algorithmic technique | Limitation
Intelligent routing | DROM[21], knowledge-defined networking[22], multi-agent routing[23], graph-based routing[24] | DROM[21] and knowledge-defined networking[22] use Deep Deterministic Policy Gradient (DDPG) algorithms to change the weights of network links in an SDN framework. Multi-agent routing[23] upgrades Q-routing with multi-agent RL techniques for coordinated routing. Graph-based routing[24] develops a model-free RL technique, tailored to the routing problem, that exploits the graph structure of the network topology (a minimal Q-routing sketch follows this table). | These papers apply RL within the routing plane of the network; the internal router/switch queue configurations are neither modeled nor learned.
Network traffic control | QFLOW[25], LEARNET[26], multi-hop routing[27], SDN RL[28], TCP-DRINC[29] | QFLOW[25] is a platform for RL-based edge network configuration that combines queuing, learning, and scheduling to meet the quality of experience of video streaming applications. LEARNET[26] uses RL for flow control in time-sensitive deterministic networks. Multi-hop routing[27] proposes a distributed, model-free solution based on stochastic policy gradient RL that aims to minimize the end-to-end delay by letting each router forward a packet to the next-hop router according to the learned optimal probability. SDN-based RL techniques[28] exploit the multi-path forwarding of SDN to increase throughput or reduce the latency of packet transmission. TCP-DRINC[29] uses RL-based congestion control to adjust the congestion window size within TCP networks (see the congestion-window sketch after this table). | These traffic engineering techniques use overlay congestion control or SDN-based flow control to improve the throughput, latency, and packet drop of networks. They do not consider the bottlenecks that can arise at the port-queue configuration level in underlay networks.
Queue configuration | RL-QN[30], Deep RL[31], queue-learning[32] | In Ref. [30], model-based RL is used to learn the optimal control policy of queueing networks so that the average job delay is minimized. In Ref. [31], the Proximal Policy Optimization (PPO) algorithm is tested on a parallel-server system and on large multi-class queueing networks; it consistently generates control policies that outperform heuristics under load conditions ranging from light to heavy traffic. Queue-learning[32] studies an RL-based service rate control algorithm for providing QoS in tandem queueing networks; the proposed method can guarantee probabilistic upper bounds on the end-to-end delay of the system (see the service-rate control sketch after this table). | These papers provide a theoretical foundation for applying RL within queueing networks. However, the specific nature of the protocols used for traffic policing and shaping within routers/switches is not considered.
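The routing-plane work above builds on value-based packet routing in the style of Q-routing[14], which Ref. [23] extends with multi-agent deep RL and Refs. [21] and [22] replace with DDPG over link weights. The snippet below is only an illustrative sketch of the per-hop Q-routing update, not the method of Refs. [21]-[24]; the four-node topology, delays, and hyperparameters are hypothetical.

```python
# Illustrative Q-routing sketch (in the style of Ref. [14]); the topology,
# link delays, and hyperparameters are invented to keep the example self-contained.
import random
from collections import defaultdict

# neighbors[u] -> candidate next hops; link_delay[(u, v)] -> per-hop delay
neighbors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
link_delay = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "D"): 3.0, ("C", "D"): 1.0}

# Q[u][v]: estimated remaining delivery time when u forwards via neighbor v
Q = defaultdict(lambda: defaultdict(float))
ALPHA, EPS = 0.5, 0.1  # learning rate, exploration probability

def deliver(src, dst="D"):
    """Forward one packet hop by hop, applying the Q-routing update at each hop."""
    node = src
    while node != dst:
        hops = neighbors[node]
        if random.random() < EPS:
            nxt = random.choice(hops)
        else:
            nxt = min(hops, key=lambda h: Q[node][h])
        # neighbor's own best estimate of the time still needed after this hop
        remaining = 0.0 if nxt == dst else min(Q[nxt][h] for h in neighbors[nxt])
        target = link_delay[(node, nxt)] + remaining
        Q[node][nxt] += ALPHA * (target - Q[node][nxt])
        node = nxt

for _ in range(500):
    deliver("A")
print(dict(Q["A"]))  # A comes to prefer C (total delay 3.0) over B (4.0)
```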
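For the traffic-control row, TCP-DRINC[29] learns congestion-window adjustments with deep RL from measured TCP state. The sketch below is a heavily simplified stand-in rather than the TCP-DRINC algorithm: a tabular Q-learner adjusts a window against a hypothetical single-bottleneck fluid model, with the capacity, reward shaping, and state discretization all assumed for illustration.

```python
# Toy congestion-window control via tabular Q-learning, loosely in the spirit
# of TCP-DRINC [29] (which uses deep RL). Bottleneck model and reward are assumptions.
import random

CAPACITY = 100.0                 # bottleneck capacity in packets per RTT (hypothetical)
ACTIONS = [-10.0, 0.0, +10.0]    # decrease, hold, or increase the window
ALPHA, GAMMA, EPS = 0.3, 0.9, 0.1

Q = {}  # Q[(state, action_index)] -> value

def state_of(cwnd):
    """Bucket the bottleneck load cwnd/CAPACITY into 10 discrete states."""
    return min(int(cwnd / CAPACITY * 5.0), 9)

def reward_of(cwnd):
    """Reward delivered throughput, penalize overload (standing queue / loss)."""
    return min(cwnd, CAPACITY) - 2.0 * max(cwnd - CAPACITY, 0.0)

cwnd = 10.0
for _ in range(5000):
    s = state_of(cwnd)
    if random.random() < EPS:
        a = random.randrange(len(ACTIONS))
    else:
        a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
    cwnd = max(1.0, cwnd + ACTIONS[a])           # apply the window adjustment
    r = reward_of(cwnd)
    s2 = state_of(cwnd)
    best_next = max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)

print(f"final cwnd = {cwnd}")  # typically hovers near the bottleneck capacity
```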
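For the queue-configuration row, queue-learning[32] selects service rates that keep end-to-end delay within a probabilistic bound. The sketch below is a much smaller stand-in under stated assumptions: an epsilon-greedy bandit picks one of a few hypothetical service rates for a single M/M/1 stage so that mean delay stays under an assumed budget at minimum rate cost; the cited paper handles tandem queues and probabilistic guarantees.

```python
# Illustrative service-rate selection for a single M/M/1 stage, loosely in the
# spirit of queue-learning [32]. Arrival rate, candidate rates, delay budget,
# and the cost model below are all hypothetical.
import random

ARRIVAL = 400.0                          # offered load, jobs/s
RATES = [450.0, 500.0, 600.0, 800.0]     # candidate service rates, jobs/s
DELAY_BUDGET = 0.025                     # mean-delay target, seconds

def expected_delay(mu, lam=ARRIVAL):
    """M/M/1 mean residence time 1/(mu - lam); infinite if the queue is unstable."""
    return float("inf") if mu <= lam else 1.0 / (mu - lam)

def reward(mu):
    """Large penalty for missing the delay budget, otherwise prefer cheaper rates."""
    return -100.0 if expected_delay(mu) > DELAY_BUDGET else -mu / 1000.0

# Epsilon-greedy bandit over the candidate service rates.
value = {mu: 0.0 for mu in RATES}
count = {mu: 0 for mu in RATES}
EPS = 0.1
for _ in range(2000):
    mu = random.choice(RATES) if random.random() < EPS else max(RATES, key=value.get)
    r = reward(mu) + random.gauss(0.0, 0.01)   # noisy reward observation
    count[mu] += 1
    value[mu] += (r - value[mu]) / count[mu]   # incremental sample mean

best = max(RATES, key=value.get)
print(best, expected_delay(best))  # expect 450.0: cheapest rate meeting the budget
```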
Configuration  Queue  Throughput (jobs/s)  Queue length  Residence time (s)  Utilization ratio
Baseline Q0 124.3 0.33 0.00070 0.240
Q1 124.3 0.33 0.00070 0.240
Q2 124.3 0.33 0.00070 0.240
Q3 124.3 0.14 0.00034 0.120
Q4 281.0 1.28 0.00300 0.560
Q5 419.0 5.00 0.01200 0.830
Q6 477.0 19.40 0.04600 0.950
Q7 497.0 98.90 0.23000 0.990
UDP-base 353.0 2.22 0.10000 0.700
UDP-spike 379.8 0.13 0.00032 0.110
TCP 445.0 44.00 0.14000 0.990
Misc 995.0 66.90 0.16000 0.990
1 Q0 124.3 0.33 0.00086 0.240
Q1 124.3 0.33 0.00086 0.240
Q2 124.3 0.33 0.00086 0.240
Q3 124.3 0.14 0.00037 0.120
Q4 281.0 1.28 0.00300 0.560
Q5 379.0 3.11 0.00820 0.750
Q6 477.0 19.40 0.05000 0.950
Q7 497.0 103.90 0.27000 0.990
UDP-base 336.0 1.90 0.00510 0.670
UDP-spike 358.0 0.13 0.00033 0.110
TCP 445.0 42.70 0.11000 0.990
Misc 995.0 66.90 0.16000 0.990
2 Q0 124.3 0.330 0.00072 0.240
Q1 124.3 0.33 0.00072 0.240
Q2 124.3 0.33 0.00072 0.240
Q3 124.3 0.14 0.00031 0.120
Q4 281.0 1.28 0.00280 0.560
Q5 457.0 10.30 0.02250 0.910
Q6 477.0 19.10 0.04000 0.950
Q7 497.0 91.00 0.19000 0.990
UDP-base 368.0 2.60 0.00510 0.730
UDP-spike 401.0 0.14 0.00031 0.120
TCP 445.0 46.60 0.10000 0.990
Misc 995.0 67.60 0.14000 0.990
3 Q0 101.6 0.25 0.00120 0.200
Q1 101.6 0.25 0.00120 0.200
Q2 101.6 0.25 0.00120 0.200
Q3 101.6 0.11 0.00051 0.100
Q4 164.0 0.48 0.00200 0.320
Q5 218.0 0.27 0.00130 0.210
Q6 242.0 0.93 0.00430 0.480
Q7 249.0 231.00 1.00000 0.990
UDP-base 100.0 0.25 0.10000 0.200
UDP-spike 101.0 0.03 0.00015 0.030
TCP 266.0 1.40 0.00650 0.590
Misc 812.0 4.16 0.00190 0.820
4 Q0 101.6 0.25 0.00120 0.200
Q1 101.6 0.25 0.00120 0.200
Q2 101.6 0.25 0.00120 0.200
Q3 101.6 0.11 0.00046 0.100
Q4 181.0 0.56 0.00230 0.360
Q5 249.0 229.00 0.92000 0.990
Q6 279.0 1.26 0.00500 0.560
Q7 289.0 0.40 0.00160 0.290
UDP-base 128.0 0.34 0.00140 0.260
UDP-spike 129.0 0.04 0.00017 0.040
TCP 333.0 2.75 0.01100 0.740
Misc 818.0 4.30 0.01720 0.820
5 Q0 124.3 0.33 0.00600 0.240
Q1 124.3 0.33 0.00600 0.240
Q2 124.3 0.33 0.00600 0.240
Q3 124.3 0.14 0.00260 0.120
Q4 290.0 1.37 0.01800 0.580
Q5 436.0 1.20 0.01400 0.870
Q6 498.0 114.00 1.31000 0.990
Q7 519.0 1.50 0.01800 1.000
UDP-base 380.0 2.88 0.02650 0.700
UDP-spike 419.0 0.13 0.00140 0.110
TCP 445.0 48.00 0.44000 0.990
Misc 995.0 66.90 1.26250 0.990
6 Q0 452.0 8.80 0.19500 0.900
Q1 452.0 8.80 0.19500 0.900
Q2 452.0 8.80 0.19500 0.900
Q3 452.0 0.82 0.00180 0.450
Q4 452.0 8.80 0.19500 0.900
Q5 452.0 8.80 0.19500 0.900
Q6 452.0 8.80 0.19500 0.900
Q7 452.0 8.80 0.19500 0.900
UDP-base 493.0 21.42 0.04730 0.980
UDP-spike 1681.0 1.00 0.00200 0.520
TCP 449.0 72.00 0.16000 0.990
Misc 995.0 82.00 0.18000 0.990
7 Q0 224.0 0.800 0.00360 0.450
Q1 224.0 0.800 0.00360 0.450
Q2 224.0 0.80 0.00360 0.450
Q3 224.0 0.28 0.00130 0.220
Q4 224.0 0.81 0.00360 0.450
Q5 224.0 0.81 0.00360 0.450
Q6 224.0 0.81 0.00360 0.450
Q7 224.0 0.81 0.00360 0.450
UDP-base 449.0 8.60 0.03000 0.890
UDP-spike 449.0 16.00 0.99000 0.140
TCP 449.0 224.00 0.00073 0.990
Misc 449.0 0.81 0.03000 0.450
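The per-queue columns above are the standard open queueing-model outputs (the paper's evaluations use the JMT toolset[35]). As a rough, assumption-laden cross-check rather than the paper's simulation, an M/M/1 stage with a service rate backed out of the Baseline Q7 row reproduces that row's queue length and utilization closely and its residence time approximately; the snippet below shows the arithmetic.

```python
# Sanity-check sketch: utilization U = X/mu, M/M/1 queue length N = U/(1-U),
# and residence time R = N/X by Little's law. The service rate below is an
# assumption backed out of the Baseline Q7 row (X ~ 497 jobs/s, U ~ 0.99);
# the paper's own numbers come from JMT simulations [35].
def mm1_metrics(throughput, service_rate):
    """Return (queue length, residence time, utilization) of an M/M/1 stage."""
    u = throughput / service_rate
    if u >= 1.0:
        raise ValueError("unstable queue: arrival rate exceeds service rate")
    n = u / (1.0 - u)              # mean number in the station
    r = n / throughput             # mean residence time, via Little's law
    return n, r, u

# Baseline Q7: X = 497 jobs/s, assumed mu ~ 502 jobs/s
n, r, u = mm1_metrics(497.0, 502.0)
print(f"N={n:.1f}, R={r:.3f} s, U={u:.2f}")   # ~ N=99.4, R=0.200 s, U=0.99
```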
Configuration  Queue  Throughput (jobs/s)  Queue length  Residence time (s)  Utilization ratio
Baseline Q0–Q7 249 1.00 0.004 0.50
Egress 999 232.00 0.928 1.00
1 Q0–Q7 485 29.50 0.060 0.97
Egress 786 3.43 0.007 0.77
2 Q0–Q7 416 4.80 0.010 0.83
Egress 999 201.00 0.480 0.99
3 Q0–Q7 97 30.00 0.300 0.97
Egress 388 0.60 0.006 0.38
4 Q0–Q7 249 0.33 0.001 0.25
Egress 999 237.00 0.950 1.00
1   ETSI, System architecture for the 5G system, 3GPP TS 23.501, version 15.3.0, 2018.
2   X. Foukas, G. Patounas, A. Elmokashfi, and M. K. Marina, Network slicing in 5G: Survey and challenges, IEEE Commun. Mag., 2017, 55 (5): 94-100.
doi: 10.1109/MCOM.2017.1600951
3   Ericsson AB, Router 6675, Technical specifications, 2019.
4   D. R. Hanks Jr. and H. Reynolds, Juniper MX Series. Sebastopol, CA, USA: O’Reilly Media, 2012.
5   D. Kreutz, F. M. V. Ramos, P. E. Veríssimo, C. E. Rothenberg, S. Azodolmolky, and S. Uhlig, Software-defined networking: A comprehensive survey, Proc. IEEE, 2015, 103 (1): 14-76.
doi: 10.1109/JPROC.2014.2371999
6   Cisco Systems, Quality of Service (QoS) configuration guide, Cisco IOS, 2018.
7   R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press, 2018.
8   L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, Planning and acting in partially observable stochastic domains, Artif. Intell., 1998, 101 (1&2): 99-134.
doi: 10.1016/S0004-3702(98)00023-X
9   Cisco Systems, QoS: Color-aware policer, Cisco IOS documentation, 2005.
10   C. Semeria, Supporting differentiated service classes: Queue scheduling disciplines, Juniper Networks Whitepaper, 2001.
11   H. Zhang, Service disciplines for guaranteed performance service in packet-switching networks, Proc. IEEE, 1995, 83 (10): 1374-1396.
doi: 10.1109/5.469298
12   Cisco Systems, DiffServ – the scalable end-to-end QoS model, white paper, 2005.
13   T. X. Brown, Switch packet arbitration via queue-learning, in Proc. 14th Int. Conf. Neural Information Processing Systems: Natural and Synthetic, Vancouver, Canada, 2001, pp. 1337–1344.
14   J. A. Boyan and M. L. Littman, Packet routing in dynamically changing networks: A reinforcement learning approach, in Proc. 6th Int. Conf. Neural Information Processing Systems, Denver, CO, USA, 1993, pp. 671–678.
15   Z. Mammeri, Reinforcement learning based routing in networks: Review and classification of approaches, IEEE Access, 2019, 7: 55916-55950.
doi: 10.1109/ACCESS.2019.2913776
16   A. Mestres, A. Rodriguez-Natal, J. Carner, P. Barlet-Ros, E. Alarcón, M. Solé, V. Muntés-Mulero, D. Meyer, S. Barkai, M. J. Hibbett, et al., Knowledge-defined networking, SIGCOMM Comput. Commun. Rev., 2017, 47 (3): 2-10.
doi: 10.1145/3138808.3138810
17   T. C. K. Hui and C. K. Tham, Adaptive provisioning of differentiated services networks based on reinforcement learning, IEEE Trans. Syst. Man Cybern. C (Appl. Rev.), 2003, 33 (4): 492-501.
doi: 10.1109/TSMCC.2003.818472
18   J. Rao, X. P. Bu, C. Z. Xu, L. Y. Wang, and G. Yin, VCONF: A reinforcement learning approach to virtual machines auto-configuration, in Proc. 6th Int. Conf. Autonomic Computing, Barcelona, Spain, 2009, pp. 137–146.
19   A. da Silva Veith, F. R. de Souza, M. D. de Assunção, L. Lefèvre, and J. C. S. dos Anjos, Multi-objective reinforcement learning for reconfiguring data stream analytics on edge computing, in Proc. 48th Int. Conf. Parallel Processing, Kyoto, Japan, 2019, p. 106.
20   A. Bar-Hillel, A. Di-Nur, L. Ein-Dor, R. Gilad-Bachrach, and Y. Ittach, Workstation capacity tuning using reinforcement learning, in Proc. ACM/IEEE Conf. Supercomputing, Reno, NV, USA, 2007, p. 32.
21   C. H. Yu, J. L. Lan, Z. H. Guo, and Y. X. Hu, DROM: Optimizing the routing in software-defined networks with deep reinforcement learning, IEEE Access, 2018, 6: 64533-64539.
doi: 10.1109/ACCESS.2018.2877686
22   T. A. Q. Pham, Y. Hadjadj-Aoul, and A. Outtagarts, Deep reinforcement learning based QoS-aware routing in knowledge-defined networking, in Proc. 14th EAI Int. Conf. Heterogeneous Networking for Quality, Reliability, Security and Robustness, Ho Chi Minh City, Vietnam, 2019, pp. 14–26.
23   X. Y. You, X. J. Li, Y. D. Xu, H. Feng, and J. Zhao, Toward packet routing with fully-distributed multi-agent deep reinforcement learning, in Proc. of 2019 Int. Symp. Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT), Avignon, France, 2019, doi: 10.23919/WiOPT47501.2019.9144110.
24   X. Mai, Q. Z. Fu, and Y. Chen, Packet routing with graph attention multi-agent reinforcement learning, arXiv preprint arXiv:2107.13181, 2021.
25   R. Bhattacharyya, A. Bura, D. Rengarajan, M. Rumuly, S. Shakkottai, D. Kalathil, R. K. P. Mok, and A. Dhamdhere, QFlow: A reinforcement learning approach to high QoE video streaming over wireless networks, in Proc. 20th ACM Int. Symp. Mobile Ad Hoc Networking and Computing, Catania, Italy, 2019, pp. 251–260.
26   J. Prados-Garzon, T. Taleb, and M. Bagaa, LEARNET: Reinforcement learning based flow scheduling for asynchronous deterministic networks, in Proc. of 2020 IEEE Int. Conf. Communications, Dublin, Ireland, 2020, doi: 10.1109/ICC40277.2020.9149092.
27   P. Pinyoanuntapong, M. Lee, and P. Wang, Distributed multi-hop traffic engineering via stochastic policy gradient reinforcement learning, in Proc. of 2019 IEEE Global Communications Conf. (GLOBECOM), Waikoloa, HI, USA, https://webpages.uncc.edu/pwang13/pub/routing.pdf, 2019.
28   J. Chavula, M. Densmore, and H. Suleman, Using SDN and reinforcement learning for traffic engineering in UbuntuNet Alliance, in Proc. of 2016 Int. Conf. Advances in Computing and Communication Engineering (ICACCE), Durban, South Africa, 2016, pp. 349–355.
29   K. F. Xiao, S. W. Mao, and J. K. Tugnait, TCP-Drinc: Smart congestion control based on deep reinforcement learning, IEEE Access, 2019, 7: 11892-11904.
doi: 10.1109/ACCESS.2019.2892046
30   B. Liu, Q. M. Xie, and E. Modiano, Reinforcement learning for optimal control of Queueing systems, in Proc. of the 57th Annu. Allerton Conf. Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2019, pp. 663–670.
31   J. G. Dai and M. Gluzman, Queueing network controls via deep reinforcement learning, arXiv preprint arXiv:2008.01644, 2021.
32   M. Raeis, A. Tizghadam, and A. Leon-Garcia, Queue-learning: A reinforcement learning approach for providing quality of service, Proc. AAAI Conf. Artif. Intell., 2021, 35 (1): 461-468.
33   A. Kattepur, S. David, and S. Mohalik, Automated configuration of router port queues via model-based reinforcement learning, in Proc. of 2021 IEEE Int. Conf. Communications Workshops, Montreal, Canada, 2021, pp. 1–6.
34   S. Floyd and V. Jacobson, Random early detection gateways for congestion avoidance, IEEE/ACM Trans. Netw., 1993, 1 (4): 397-413.
doi: 10.1109/90.251892
35   M. Bertoli, G. Casale, and G. Serazzi, JMT: Performance engineering tools for system modeling, ACM SIGMETRICS Perform. Eval. Rev., 2009, 36 (4): 10-15.
doi: 10.1145/1530873.1530877
36   H. Kurniawati, D. Hsu, and W. S. Lee, SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces, in Proc. Robotics: Science and Systems IV, Zurich, Switzerland, 2008, doi: 10.15607/RSS.2008.IV.009.
37   E. D. Lazowska, J. Zahorjan, G. S. Graham, and K. C. Sevcik, Quantitative System Performance, Computer System Analysis Using Queueing Network Models. Upper Saddle River, NJ, USA: Prentice-Hall, 1984.