  2021, Vol. 2 Issue (3): 177-197    doi: 10.23919/ICN.2021.0016
    
Received: 25 August 2021      Online: 07 December 2021

URL:

http://icn.tsinghuajournals.com/10.23919/ICN.2021.0016     OR     http://icn.tsinghuajournals.com/Y2021/V2/I3/177

Solution space | Related work | Algorithmic technique | Limitation
Intelligent routing | DROM[21], knowledge-defined networking[22], multi-agent routing[23], graph-based routing[24] | DROM[21] and knowledge-defined networking[22] use Deep Deterministic Policy Gradient (DDPG) algorithms to change the weights of network links in an SDN framework. Multi-agent routing[23] upgrades Q-routing with multi-agent RL techniques for coordinated routing. Graph-based routing[24] develops a model-free RL technique, tailored to the routing problem, that exploits the graph structure of the network topology (a minimal Q-routing sketch follows this table). | These papers apply RL within the routing plane of the network; the internal router/switch queue configurations are neither modeled nor learned.
Network traffic control | QFLOW[25], LEARNET[26], multi-hop routing[27], SDN RL[28], TCP-DRINC[29] | QFLOW[25] is a platform for RL-based edge network configuration that combines queuing, learning, and scheduling to meet the quality of experience of video streaming applications. LEARNET[26] uses RL for flow control in time-sensitive deterministic networks. Multi-hop routing[27] proposes a distributed, model-free solution based on stochastic policy gradient RL that aims to minimize the end-to-end delay by letting each router forward a packet to the next-hop router according to the learned optimal probability. SDN-based RL techniques[28] exploit the multi-path forwarding of SDN to increase throughput or reduce the latency of packet transmission. TCP-DRINC[29] uses RL-based congestion control to adjust the congestion window size within TCP networks (see the congestion-window sketch after this table). | These traffic engineering techniques use overlay congestion control or SDN-based flow control to improve the throughput, latency, and packet drop of networks. They do not consider the bottlenecks that can arise at the port-queue configuration level in underlay networks.
Queue configuration | RL-QN[30], Deep RL[31], queue-learning[32] | In Ref. [30], model-based RL is used to learn the optimal control policy of queueing networks so that the average job delay is minimized. In Ref. [31], the Proximal Policy Optimization (PPO) algorithm is tested on a parallel-server system and on large multi-class queueing networks; it consistently generates control policies that outperform heuristics under load conditions ranging from light to heavy traffic. Queue-learning[32] studies an RL-based service rate control algorithm for providing QoS in tandem queueing networks; the proposed method can guarantee probabilistic upper bounds on the end-to-end delay of the system (see the service-rate control sketch after this table). | These papers provide a theoretical foundation for applying RL within queueing networks. However, the specific nature of the protocols used for traffic policing and shaping within routers/switches is not considered.
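The routing-plane work above builds on value-based packet routing in the style of Q-routing[14], which Ref. [23] extends with multi-agent deep RL and Refs. [21] and [22] replace with DDPG over link weights. The snippet below is only an illustrative sketch of the per-hop Q-routing update, not the method of Refs. [21]-[24]; the four-node topology, delays, and hyperparameters are hypothetical.

```python
# Illustrative Q-routing sketch (in the style of Ref. [14]); the topology,
# link delays, and hyperparameters are invented to keep the example self-contained.
import random
from collections import defaultdict

# neighbors[u] -> candidate next hops; link_delay[(u, v)] -> per-hop delay
neighbors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
link_delay = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "D"): 3.0, ("C", "D"): 1.0}

# Q[u][v]: estimated remaining delivery time when u forwards via neighbor v
Q = defaultdict(lambda: defaultdict(float))
ALPHA, EPS = 0.5, 0.1  # learning rate, exploration probability

def deliver(src, dst="D"):
    """Forward one packet hop by hop, applying the Q-routing update at each hop."""
    node = src
    while node != dst:
        hops = neighbors[node]
        if random.random() < EPS:
            nxt = random.choice(hops)
        else:
            nxt = min(hops, key=lambda h: Q[node][h])
        # neighbor's own best estimate of the time still needed after this hop
        remaining = 0.0 if nxt == dst else min(Q[nxt][h] for h in neighbors[nxt])
        target = link_delay[(node, nxt)] + remaining
        Q[node][nxt] += ALPHA * (target - Q[node][nxt])
        node = nxt

for _ in range(500):
    deliver("A")
print(dict(Q["A"]))  # A comes to prefer C (total delay 3.0) over B (4.0)
```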
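For the traffic-control row, TCP-DRINC[29] learns congestion-window adjustments with deep RL from measured TCP state. The sketch below is a heavily simplified stand-in rather than the TCP-DRINC algorithm: a tabular Q-learner adjusts a window against a hypothetical single-bottleneck fluid model, with the capacity, reward shaping, and state discretization all assumed for illustration.

```python
# Toy congestion-window control via tabular Q-learning, loosely in the spirit
# of TCP-DRINC [29] (which uses deep RL). Bottleneck model and reward are assumptions.
import random

CAPACITY = 100.0                 # bottleneck capacity in packets per RTT (hypothetical)
ACTIONS = [-10.0, 0.0, +10.0]    # decrease, hold, or increase the window
ALPHA, GAMMA, EPS = 0.3, 0.9, 0.1

Q = {}  # Q[(state, action_index)] -> value

def state_of(cwnd):
    """Bucket the bottleneck load cwnd/CAPACITY into 10 discrete states."""
    return min(int(cwnd / CAPACITY * 5.0), 9)

def reward_of(cwnd):
    """Reward delivered throughput, penalize overload (standing queue / loss)."""
    return min(cwnd, CAPACITY) - 2.0 * max(cwnd - CAPACITY, 0.0)

cwnd = 10.0
for _ in range(5000):
    s = state_of(cwnd)
    if random.random() < EPS:
        a = random.randrange(len(ACTIONS))
    else:
        a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
    cwnd = max(1.0, cwnd + ACTIONS[a])           # apply the window adjustment
    r = reward_of(cwnd)
    s2 = state_of(cwnd)
    best_next = max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)

print(f"final cwnd = {cwnd}")  # typically hovers near the bottleneck capacity
```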
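For the queue-configuration row, queue-learning[32] selects service rates that keep end-to-end delay within a probabilistic bound. The sketch below is a much smaller stand-in under stated assumptions: an epsilon-greedy bandit picks one of a few hypothetical service rates for a single M/M/1 stage so that mean delay stays under an assumed budget at minimum rate cost; the cited paper handles tandem queues and probabilistic guarantees.

```python
# Illustrative service-rate selection for a single M/M/1 stage, loosely in the
# spirit of queue-learning [32]. Arrival rate, candidate rates, delay budget,
# and the cost model below are all hypothetical.
import random

ARRIVAL = 400.0                          # offered load, jobs/s
RATES = [450.0, 500.0, 600.0, 800.0]     # candidate service rates, jobs/s
DELAY_BUDGET = 0.025                     # mean-delay target, seconds

def expected_delay(mu, lam=ARRIVAL):
    """M/M/1 mean residence time 1/(mu - lam); infinite if the queue is unstable."""
    return float("inf") if mu <= lam else 1.0 / (mu - lam)

def reward(mu):
    """Large penalty for missing the delay budget, otherwise prefer cheaper rates."""
    return -100.0 if expected_delay(mu) > DELAY_BUDGET else -mu / 1000.0

# Epsilon-greedy bandit over the candidate service rates.
value = {mu: 0.0 for mu in RATES}
count = {mu: 0 for mu in RATES}
EPS = 0.1
for _ in range(2000):
    mu = random.choice(RATES) if random.random() < EPS else max(RATES, key=value.get)
    r = reward(mu) + random.gauss(0.0, 0.01)   # noisy reward observation
    count[mu] += 1
    value[mu] += (r - value[mu]) / count[mu]   # incremental sample mean

best = max(RATES, key=value.get)
print(best, expected_delay(best))  # expect 450.0: cheapest rate meeting the budget
```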
Configuration  Queue  Throughput (jobs/s)  Queue length  Residence time (s)  Utilization ratio
Baseline Q0 124.3 0.33 0.00070 0.240
Q1 124.3 0.33 0.00070 0.240
Q2 124.3 0.33 0.00070 0.240
Q3 124.3 0.14 0.00034 0.120
Q4 281.0 1.28 0.00300 0.560
Q5 419.0 5.00 0.01200 0.830
Q6 477.0 19.40 0.04600 0.950
Q7 497.0 98.90 0.23000 0.990
UDP-base 353.0 2.22 0.10000 0.700
UDP-spike 379.8 0.13 0.00032 0.110
TCP 445.0 44.00 0.14000 0.990
Misc 995.0 66.90 0.16000 0.990
1 Q0 124.3 0.33 0.00086 0.240
Q1 124.3 0.33 0.00086 0.240
Q2 124.3 0.33 0.00086 0.240
Q3 124.3 0.14 0.00037 0.120
Q4 281.0 1.28 0.00300 0.560
Q5 379.0 3.11 0.00820 0.750
Q6 477.0 19.40 0.05000 0.950
Q7 497.0 103.90 0.27000 0.990
UDP-base 336.0 1.90 0.00510 0.670
UDP-spike 358.0 0.13 0.00033 0.110
TCP 445.0 42.70 0.11000 0.990
Misc 995.0 66.90 0.16000 0.990
2 Q0 124.3 0.330 0.00072 0.240
Q1 124.3 0.33 0.00072 0.240
Q2 124.3 0.33 0.00072 0.240
Q3 124.3 0.14 0.00031 0.120
Q4 281.0 1.28 0.00280 0.560
Q5 457.0 10.30 0.02250 0.910
Q6 477.0 19.10 0.04000 0.950
Q7 497.0 91.00 0.19000 0.990
UDP-base 368.0 2.60 0.00510 0.730
UDP-spike 401.0 0.14 0.00031 0.120
TCP 445.0 46.60 0.10000 0.990
Misc 995.0 67.60 0.14000 0.990
3 Q0 101.6 0.25 0.00120 0.200
Q1 101.6 0.25 0.00120 0.200
Q2 101.6 0.25 0.00120 0.200
Q3 101.6 0.11 0.00051 0.100
Q4 164.0 0.48 0.00200 0.320
Q5 218.0 0.27 0.00130 0.210
Q6 242.0 0.93 0.00430 0.480
Q7 249.0 231.00 1.00000 0.990
UDP-base 100.0 0.25 0.10000 0.200
UDP-spike 101.0 0.03 0.00015 0.030
TCP 266.0 1.40 0.00650 0.590
Misc 812.0 4.16 0.00190 0.820
4 Q0 101.6 0.25 0.00120 0.200
Q1 101.6 0.25 0.00120 0.200
Q2 101.6 0.25 0.00120 0.200
Q3 101.6 0.11 0.00046 0.100
Q4 181.0 0.56 0.00230 0.360
Q5 249.0 229.00 0.92000 0.990
Q6 279.0 1.26 0.00500 0.560
Q7 289.0 0.40 0.00160 0.290
UDP-base 128.0 0.34 0.00140 0.260
UDP-spike 129.0 0.04 0.00017 0.040
TCP 333.0 2.75 0.01100 0.740
Misc 818.0 4.30 0.01720 0.820
5 Q0 124.3 0.33 0.00600 0.240
Q1 124.3 0.33 0.00600 0.240
Q2 124.3 0.33 0.00600 0.240
Q3 124.3 0.14 0.00260 0.120
Q4 290.0 1.37 0.01800 0.580
Q5 436.0 1.20 0.01400 0.870
Q6 498.0 114.00 1.31000 0.990
Q7 519.0 1.50 0.01800 1.000
UDP-base 380.0 2.88 0.02650 0.700
UDP-spike 419.0 0.13 0.00140 0.110
TCP 445.0 48.00 0.44000 0.990
Misc 995.0 66.90 1.26250 0.990
6 Q0 452.0 8.80 0.19500 0.900
Q1 452.0 8.80 0.19500 0.900
Q2 452.0 8.80 0.19500 0.900
Q3 452.0 0.82 0.00180 0.450
Q4 452.0 8.80 0.19500 0.900
Q5 452.0 8.80 0.19500 0.900
Q6 452.0 8.80 0.19500 0.900
Q7 452.0 8.80 0.19500 0.900
UDP-base 493.0 21.42 0.04730 0.980
UDP-spike 1681.0 1.00 0.00200 0.520
TCP 449.0 72.00 0.16000 0.990
Misc 995.0 82.00 0.18000 0.990
7 Q0 224.0 0.800 0.00360 0.450
Q1 224.0 0.800 0.00360 0.450
Q2 224.0 0.80 0.00360 0.450
Q3 224.0 0.28 0.00130 0.220
Q4 224.0 0.81 0.00360 0.450
Q5 224.0 0.81 0.00360 0.450
Q6 224.0 0.81 0.00360 0.450
Q7 224.0 0.81 0.00360 0.450
UDP-base 449.0 8.60 0.03000 0.890
UDP-spike 449.0 16.00 0.99000 0.140
TCP 449.0 224.00 0.00073 0.990
Misc 449.0 0.81 0.03000 0.450
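The per-queue columns above are the standard open queueing-model outputs (the paper's evaluations use the JMT toolset[35]). As a rough, assumption-laden cross-check rather than the paper's simulation, an M/M/1 stage with a service rate backed out of the Baseline Q7 row reproduces that row's queue length and utilization closely and its residence time approximately; the snippet below shows the arithmetic.

```python
# Sanity-check sketch: utilization U = X/mu, M/M/1 queue length N = U/(1-U),
# and residence time R = N/X by Little's law. The service rate below is an
# assumption backed out of the Baseline Q7 row (X ~ 497 jobs/s, U ~ 0.99);
# the paper's own numbers come from JMT simulations [35].
def mm1_metrics(throughput, service_rate):
    """Return (queue length, residence time, utilization) of an M/M/1 stage."""
    u = throughput / service_rate
    if u >= 1.0:
        raise ValueError("unstable queue: arrival rate exceeds service rate")
    n = u / (1.0 - u)              # mean number in the station
    r = n / throughput             # mean residence time, via Little's law
    return n, r, u

# Baseline Q7: X = 497 jobs/s, assumed mu ~ 502 jobs/s
n, r, u = mm1_metrics(497.0, 502.0)
print(f"N={n:.1f}, R={r:.3f} s, U={u:.2f}")   # ~ N=99.4, R=0.200 s, U=0.99
```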
Configuration  Queue  Throughput (jobs/s)  Queue length  Residence time (s)  Utilization ratio
Baseline Q0–Q7 249 1.00 0.004 0.50
Egress 999 232.00 0.928 1.00
1 Q0–Q7 485 29.50 0.060 0.97
Egress 786 3.43 0.007 0.77
2 Q0–Q7 416 4.80 0.010 0.83
Egress 999 201.00 0.480 0.99
3 Q0–Q7 97 30.00 0.300 0.97
Egress 388 0.60 0.006 0.38
4 Q0–Q7 249 0.33 0.001 0.25
Egress 999 237.00 0.950 1.00
1   ETSI, System architecture for the 5G system, 3GPP TS 23.501, version 15.3.0, 2018.
2   X. Foukas, G. Patounas, A. Elmokashfi, and M. K. Marina, Network slicing in 5G: Survey and challenges, IEEE Commun. Mag., 2017, 55 (5): 94-100.
doi: 10.1109/MCOM.2017.1600951
3   Ericsson AB, Router 6675, Technical specifications, 2019.
4   D. R. Hanks Jr. and H. Reynolds, Juniper MX Series. Sebastopol, CA, USA: O’Reilly Media, 2012.
5   D. Kreutz, F. M. V. Ramos, P. E. Veríssimo, C. E. Rothenberg, S. Azodolmolky, and S. Uhlig, Software-defined networking: A comprehensive survey, Proc. IEEE, 2015, 103 (1): 14-76.
doi: 10.1109/JPROC.2014.2371999
6   Cisco Systems, Quality of Service (QoS) configuration guide, Cisco IOS, 2018.
7   R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press, 2018.
8   L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, Planning and acting in partially observable stochastic domains, Artif. Intell., 1998, 101 (1&2): 99-134.
doi: 10.1016/S0004-3702(98)00023-X
9   Cisco Systems, QoS: Color-aware policer, Cisco IOS documentation, 2005.
10   C. Semeria, Supporting differentiated service classes: Queue scheduling disciplines, Juniper Networks Whitepaper, 2001.
11   H. Zhang, Service disciplines for guaranteed performance service in packet-switching networks, Proc. IEEE, 1995, 83 (10): 1374-1396.
doi: 10.1109/5.469298
12   Cisco Systems, DiffServ – the scalable end-to-end QoS model, white paper, 2005.
13   T. X. Brown, Switch packet arbitration via queue-learning, in Proc. 14th Int. Conf. Neural Information Processing Systems: Natural and Synthetic, Vancouver, Canada, 2001, pp. 1337–1344.
14   J. A. Boyan and M. L. Littman, Packet routing in dynamically changing networks: A reinforcement learning approach, in Proc. 6th Int. Conf. Neural Information Processing Systems, Denver, CO, USA, 1993, pp. 671–678.
15   Z. Mammeri, Reinforcement learning based routing in networks: Review and classification of approaches, IEEE Access, 2019, 7: 55916-55950.
doi: 10.1109/ACCESS.2019.2913776
16   A. Mestres, A. Rodriguez-Natal, J. Carner, P. Barlet-Ros, E. Alarcón, M. Solé, V. Muntés-Mulero, D. Meyer, S. Barkai, M. J. Hibbett, et al., Knowledge-defined networking, SIGCOMM Comput. Commun. Rev., 2017, 47 (3): 2-10.
doi: 10.1145/3138808.3138810
17   T. C. K. Hui and C. K. Tham, Adaptive provisioning of differentiated services networks based on reinforcement learning, IEEE Trans. Syst. Man Cybern. C (Appl. Rev.), 2003, 33 (4): 492-501.
doi: 10.1109/TSMCC.2003.818472
18   J. Rao, X. P. Bu, C. Z. Xu, L. Y. Wang, and G. Yin, VCONF: A reinforcement learning approach to virtual machines auto-configuration, in Proc. 6th Int. Conf. Autonomic Computing, Barcelona, Spain, 2009, pp. 137–146.
19   A. da Silva Veith, F. R. de Souza, M. D. de Assunção, L. Lefèvre, and J. C. S. dos Anjos, Multi-objective reinforcement learning for reconfiguring data stream analytics on edge computing, in Proc. 48th Int. Conf. Parallel Processing, Kyoto, Japan, 2019, p. 106.
20   A. Bar-Hillel, A. Di-Nur, L. Ein-Dor, R. Gilad-Bachrach, and Y. Ittach, Workstation capacity tuning using reinforcement learning, in Proc. ACM/IEEE Conf. Supercomputing, Reno, NV, USA, 2007, p. 32.
21   C. H. Yu, J. L. Lan, Z. H. Guo, and Y. X. Hu, DROM: Optimizing the routing in software-defined networks with deep reinforcement learning, IEEE Access, 2018, 6: 64533-64539.
doi: 10.1109/ACCESS.2018.2877686
22   T. A. Q. Pham, Y. Hadjadj-Aoul, and A. Outtagarts, Deep reinforcement learning based QoS-aware routing in knowledge-defined networking, in Proc. 14th EAI Int. Conf. Heterogeneous Networking for Quality, Reliability, Security and Robustness, Ho Chi Minh City, Vietnam, 2019, pp. 14–26.
23   X. Y. You, X. J. Li, Y. D. Xu, H. Feng, and J. Zhao, Toward packet routing with fully-distributed multi-agent deep reinforcement learning, in Proc. of 2019 Int. Symp. Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT), Avignon, France, 2019, doi: 10.23919/WiOPT47501.2019.9144110.
24   X. Mai, Q. Z. Fu, and Y. Chen, Packet routing with graph attention multi-agent reinforcement learning, arXiv preprint arXiv:2107.13181, 2021.
25   R. Bhattacharyya, A. Bura, D. Rengarajan, M. Rumuly, S. Shakkottai, D. Kalathil, R. K. P. Mok, and A. Dhamdhere, QFlow: A reinforcement learning approach to high QoE video streaming over wireless networks, in Proc. 20th ACM Int. Symp. Mobile Ad Hoc Networking and Computing, Catania, Italy, 2019, pp. 251–260.
26   J. Prados-Garzon, T. Taleb, and M. Bagaa, LEARNET: Reinforcement learning based flow scheduling for asynchronous deterministic networks, in Proc. of 2020 IEEE Int. Conf. Communications, Dublin, Ireland, 2020, doi: 10.1109/ICC40277.2020.9149092.
27   P. Pinyoanuntapong, M. Lee, and P. Wang, Distributed multi-hop traffic engineering via stochastic policy gradient reinforcement learning, in Proc. of 2019 IEEE Global Communications Conf. (GLOBECOM), Waikoloa, HI, USA, https://webpages.uncc.edu/pwang13/pub/routing.pdf, 2019.
28   J. Chavula, M. Densmore, and H. Suleman, Using SDN and reinforcement learning for traffic engineering in UbuntuNet Alliance, in Proc. of 2016 Int. Conf. Advances in Computing and Communication Engineering (ICACCE), Durban, South Africa, 2016, pp. 349–355.
29   K. F. Xiao, S. W. Mao, and J. K. Tugnait, TCP-Drinc: Smart congestion control based on deep reinforcement learning, IEEE Access, 2019, 7: 11892-11904.
doi: 10.1109/ACCESS.2019.2892046
30   B. Liu, Q. M. Xie, and E. Modiano, Reinforcement learning for optimal control of Queueing systems, in Proc. of the 57th Annu. Allerton Conf. Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2019, pp. 663–670.
31   J. G. Dai and M. Gluzman, Queueing network controls via deep reinforcement learning, arXiv preprint arXiv:2008.01644, 2021.
32   M. Raeis, A. Tizghadam, and A. Leon-Garcia, Queue-learning: A reinforcement learning approach for providing quality of service, Proc. AAAI Conf. Artif. Intell., 2021, 35 (1): 461-468.
33   A. Kattepur, S. David, and S. Mohalik, Automated configuration of router port queues via model-based reinforcement learning, in Proc. of 2021 IEEE Int. Conf. Communications Workshops, Montreal, Canada, 2021, pp. 1–6.
34   S. Floyd and V. Jacobson, Random early detection gateways for congestion avoidance, IEEE/ACM Trans. Netw., 1993, 1 (4): 397-413.
doi: 10.1109/90.251892
35   M. Bertoli, G. Casale, and G. Serazzi, JMT: Performance engineering tools for system modeling, ACM SIGMETRICS Perform. Eval. Rev., 2009, 36 (4): 10-15.
doi: 10.1145/1530873.1530877
36   H. Kurniawati, D. Hsu, and W. S. Lee, SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces, in Proc. Robotics: Science and Systems IV, Zurich, Switzerland, 2008, doi: 10.15607/RSS.2008.IV.009.
37   E. D. Lazowska, J. Zahorjan, G. S. Graham, and K. C. Sevcik, Quantitative System Performance, Computer System Analysis Using Queueing Network Models. Upper Saddle River, NJ, USA: Prentice-Hall, 1984.