During our work on this thesis we were not able to pursue every possible improvement; the following subsections outline starting points for future work.
\subsection{Testing framework}
We provide the testing framework we used for performing tests on ShortCut. While this framework can be used for many different test runs, it is still a prototype: the structure is far from optimal and most pipelines could be optimized further.
One example is the usage of \textit{iperf3} for performing bandwidth measurements.
All measurements were done using one specific tool per measurement type, namely \textit{iperf} for bandwidth measurements and the production of data streams and \textit{ping} for latency tests. \textit{iperf} could be replaced with a multitude of software packages, and some members of the Mininet community have suggested that e.g. \textit{netperf} (\cite{Jones.2015}) would provide more accurate results. This could be evaluated in further detail.
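As a sketch of how such a comparison could be supported, the framework's measurement command could be made pluggable. The command templates below are illustrative; the exact flags should be checked against each tool's manual.

```python
# Illustrative sketch: pluggable client-side bandwidth measurement commands,
# so iperf could be swapped for netperf without changing the test pipeline.
BANDWIDTH_TOOLS = {
    "iperf": "iperf -c {server} -t {seconds}",
    "netperf": "netperf -H {server} -l {seconds} -t TCP_STREAM",
}


def bandwidth_command(tool, server, seconds=10):
    """Build the client-side command line for the chosen measurement tool."""
    template = BANDWIDTH_TOOLS.get(tool)
    if template is None:
        raise ValueError(f"unknown bandwidth tool: {tool}")
    return template.format(server=server, seconds=seconds)
```

The command string would then be executed on a Mininet host, e.g. via `host.cmd(...)`.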
\subsection{Measuring CPU usage of hosts in Mininet}
Tests in this work were done with a limit of \SI{100}{Mbps} imposed on the links; in tests without a bandwidth limit, values of around \SI{40}{Gbps} were reached. While this limit should remove fluctuations caused by additional operations on either the virtual machine or the host system, it does not completely ensure a proper distribution of processing power. To verify this, CPU usage could be measured while the actual tests are running. This would enable us to further interpret results and possible spikes in delay or bandwidth.
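A minimal sketch of such a measurement, assuming a Linux host (which Mininet requires anyway) and reading aggregate CPU jiffies from \texttt{/proc/stat}; the helper names are illustrative, not part of the framework:

```python
import threading
import time


def read_cpu_times():
    """Return (total, idle) jiffies from the aggregate line of /proc/stat."""
    with open("/proc/stat") as f:
        values = [int(v) for v in f.readline().split()[1:]]
    return sum(values), values[3] + values[4]  # idle + iowait


def sample_cpu(samples, stop_event, interval=0.5):
    """Append CPU utilisation percentages to `samples` until stopped."""
    prev = read_cpu_times()
    while not stop_event.is_set():
        time.sleep(interval)
        cur = read_cpu_times()
        total, idle = cur[0] - prev[0], cur[1] - prev[1]
        if total > 0:
            samples.append(100.0 * (total - idle) / total)
        prev = cur


def run_with_cpu_sampling(test_fn, interval=0.5):
    """Run test_fn() while sampling CPU usage; return (result, samples)."""
    samples, stop = [], threading.Event()
    t = threading.Thread(target=sample_cpu, args=(samples, stop, interval))
    t.start()
    try:
        result = test_fn()
    finally:
        stop.set()
        t.join()
    return result, samples
```

The sampled series could then be plotted alongside the bandwidth or latency graphs to correlate spikes with CPU contention.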
\subsection{Massive testing}
Because of the time constraints of this work we were unable to test in high volumes, even though we experienced some fluctuations in our measurements. To increase the reliability of our results, each test could be run e.g. a hundred times. This could be integrated into the testing framework with an additional argument specifying how often each test should be run.
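Such a repetition argument could be sketched as follows; `run_test` and the argument names are placeholders, not the framework's actual interface:

```python
# Hypothetical sketch of a `--repeat` CLI argument for the test runner.
import argparse
import statistics


def run_test(name):
    """Placeholder for a single framework test run returning a metric."""
    return 100.0  # e.g. measured bandwidth in Mbps


def main(argv=None):
    parser = argparse.ArgumentParser(description="ShortCut test runner")
    parser.add_argument("test", help="name of the test to run")
    parser.add_argument("--repeat", type=int, default=1,
                        help="number of times to repeat the test")
    args = parser.parse_args(argv)

    # Run the test the requested number of times and report aggregates,
    # which makes fluctuations across runs visible.
    results = [run_test(args.test) for _ in range(args.repeat)]
    print(f"mean={statistics.mean(results):.2f} "
          f"stdev={statistics.pstdev(results):.2f} n={len(results)}")
    return results
```

Reporting mean and standard deviation over the repetitions would directly quantify the fluctuations mentioned above.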
\subsection{Adding topologies, FRR variants and FRMs}
In this work we evaluated three rather similar topologies, as well as a simple implementation of FRR and an implementation of the FRM ShortCut.
Depending on the requirements of a network, a full mesh topology with all routers interconnected might, for example, be a good starting point for further testing. This should go hand in hand with the implementation of automatic routing and a more strategic deployment of FRR and FRMs.
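The link set of such a full mesh can be generated programmatically; in our framework these pairs would then be turned into Mininet \texttt{addLink} calls. The helper below is an illustrative, framework-agnostic sketch:

```python
# Sketch: link set for a full mesh of n routers, each with one attached host.
# Names (r1, h1, ...) are illustrative conventions, not fixed by the framework.
from itertools import combinations


def full_mesh_links(n_routers):
    """Return (host links, router-router links) for a full mesh topology."""
    routers = [f"r{i}" for i in range(1, n_routers + 1)]
    hosts = [f"h{i}" for i in range(1, n_routers + 1)]
    host_links = list(zip(hosts, routers))       # each host to its router
    mesh_links = list(combinations(routers, 2))  # every router pair once
    return host_links, mesh_links
```

For $n$ routers this yields $n(n-1)/2$ inter-router links, which is why automatic routing becomes necessary for larger meshes.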
As described in \cref{FRM} there are also many different FRMs which could be implemented in Mininet and tested using our test framework.
The evaluations are sorted by topology. For each topology we measured the bandwidth, the bandwidth with a concurrent data flow, the latency and the packet flows using TCP and UDP.
We start with our minimal network in section \ref{eva_minimal_network}, followed by the evaluation of two networks with longer ``failure paths'', measuring the influence of additional nodes in looped paths in section \ref{eva_failure_path_network}.
Lastly we discuss our results in \cref{discussion}.
\input{content/evaluation/minimal_network}
\input{content/evaluation/failure_path_networks}
\section{Discussion of results}
\label{discussion}
In this section we discuss the results of the previous measurements. We proceed by comparing the results of the different measurement types across the three topologies. For each measurement type we collect the implications of a failure for the network and examine whether ShortCut is able to improve the results. We start with the bandwidth in \cref{discussion_bandwidth}, continuing with the bandwidth with a second data flow in \cref{discussion_bandwidth_link_usage}. After that we discuss our latency measurements in \cref{discussion_latency}, followed by our packet flow measurements using TCP and UDP in \cref{discussion_packet_flow_tcp} and \cref{discussion_packet_flow_udp} respectively.
\subsection{Bandwidth}
\label{discussion_bandwidth}
A failure in our topologies did not have an impact on our bandwidth measurement results. The throughput of a network and therefore the bandwidth that can be achieved on a path is not influenced by additional hops on a route, even though these might increase latency. This is, of course, also true for longer failure paths. As long as no additional data flows are sent through the network the bandwidth is not impacted and ShortCut has no need to restore performance.
\subsection{Bandwidth with concurrent data flow}
\label{discussion_bandwidth_link_usage}
Measuring the achieved bandwidth with a concurrent data flow shows that in case a second data flow is running on the network, the looped path now has a real performance impact. The throughput is split between both data flows on all links that are looped. This causes an overall throughput loss of 50\%, as the links included in the looped path create a bottleneck for both data flows.
Using ShortCut under such circumstances fully restores performance to the state without failure. It has to be noted that, although in our measurements both data flows achieved maximum throughput with ShortCut, the alternative route could still be used by other data flows, which might be influenced. This is, however, also true for the network when not using ShortCut.
Longer failure paths showed no additional impact on the performance and ability of ShortCut to restore the throughput.
A longer failure path will have an adverse effect in a realistic network. Because more links experience additional traffic, even more data flows might be affected by the failure. The usage of ShortCut would therefore be more beneficial the more links can be removed from looped paths.
\subsection{Latency}
\label{discussion_latency}
Sending packets over additional hops will without a doubt increase the latency. This is confirmed by our results.
Longer failure paths naturally increase this additional delay. Because ShortCut was able to cut off the looped path, it restored the original latency on our test networks.
As such, ShortCut provides a reliable way to optimize alternative routes, especially for time-sensitive traffic like VoIP.
\subsection{TCP packet flow}
\label{discussion_packet_flow_tcp}
Our packet flow measurements using a TCP data transfer showed the number of packets forwarded by each router. A naive assumption would be that the routers on a looped path have to forward each packet twice and therefore experience a 100\% increased load. This is, however, not true for TCP, as only packets sent towards the receiving device are forwarded through the loop in our topologies. ACKs returning from the \textit{iperf} server are not sent through the looped path. Because of this, the routers on the looped path actually only forward half of all packets, except for the router at the entry point of the loop. It has to be noted that these packets all contain data and therefore have full impact on the bandwidth of the links involved in the looped path, as discussed previously in \cref{discussion_bandwidth_link_usage}.
The entry point of the loop, which in our topologies is always router R1, forwards each packet containing data twice and acknowledgements once, increasing the workload for the router by 50\%.
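The 50\% figure follows from a simple packet count, assuming for simplicity one acknowledgement per data segment (TCP's delayed ACKs would shift the exact ratio): with $N$ data packets and $N$ acknowledgements, R1 forwards
\[
N + N = 2N \text{ packets without a failure, and} \quad 2N + N = 3N \text{ packets with the loop in place,}
\]
an increase of $\frac{3N - 2N}{2N} = 50\%$.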
Longer failure paths do not influence this behaviour.
ShortCut is able to cut off the looped path and therefore restores the network to a state in which all routers on the active path forward the same number of packets, with the routers on the former looped path not forwarding any packets anymore.
\subsection{UDP packet flow}
\label{discussion_packet_flow_udp}
When using UDP for our data transfer and measuring the packets forwarded by routers, the differences between TCP and UDP become quite obvious. As UDP does not send ACKs on successful transmission, there are no returning packets; all packets sent by the \textit{iperf} client traverse the looped path.
When measuring the bandwidth of our networks with longer failure paths, the results were similar to those for the minimal network.
The addition of hops to the failure path did not have an effect on the bandwidth.
\subsection{Two concurrent data transfers}
Similar to the results for our minimal network in \cref{evaluation_minimal_bandwidth_link_usage}, the addition of a second measurement running concurrently on the looped path does reduce throughput for both data flows.
The longer failure path, however, implies that the impact in a realistic environment might be much larger. Because more links experience additional traffic, even more data flows might be affected by the failure. The usage of ShortCut would therefore be more beneficial the more links can be removed from looped paths.
We started two concurrent data flows using \textit{iperf}. In case of a failure, these two data flows would influence each other.
\subsubsection{With FRR}
\begin{figure}
\centering
\caption{Bandwidth with concurrent data transfer on H3 to H1}
Similar to the results for our minimal network in \cref{evaluation_minimal_bandwidth_link_usage}, the addition of a second measurement running concurrently does not reduce throughput for both data flows, as can be seen in \cref{fig:evaluation_failure_path_1_bandwidth_link_usage_wo_sc_a}.
When introducing a failure however the two data flows use all links from router R3 to router R1 simultaneously. They effectively have to split the available bandwidth, reducing the overall throughput by 50\% as can be seen in \cref{fig:evaluation_failure_path_1_bandwidth_link_usage_wo_sc_b}, but there is no impact on the available bandwidth unique to the longer failure paths. This is why we only added the graphs for the smaller variant shown in \cref{fig:evaluation_failure_path_1_network}.
\begin{figure}
\centering
Introducing the failure concurrently with the data transfer causes both bandwidths to drop abruptly, as can be seen in \cref{fig:evaluation_failure_path_1_bandwidth_link_usage_concurrent_wo_sc}. Although the two data flows distribute the bandwidth differently, they achieve an overall throughput of \SI{100}{Mbps}. We assume that the uneven distribution of bandwidth is caused by the timing of the data transfers.
The data transfer in \cref{fig:evaluation_failure_path_1_bandwidth_link_usage_wo_sc_b} already starts with the failure in place. Because the \textit{iperf} instance producing the additional data flow is started slightly before our main data flow, Mininet seems to allocate more bandwidth to this transfer. The graph also shows both bandwidths approaching each other, suggesting that Mininet tries to allocate the same bandwidth to both transfers over time.
Our measurement with a failure occurring concurrently with our data transfers, however, levels the playing field. Both \textit{iperf} instances already send data over the network. This could explain the overall more even distribution of bandwidth, as well as the main data flow even overtaking the additional data flow.
\subsubsection{With FRR and ShortCut}
\subsection{Latency}
\label{failure_path_latency}
We measured the latency between host H1 and H6 for our first failure path network and between host H1 and H8 for our second failure path network.
The additional hops in our failure path networks add, as expected, latency to the measurements. In the case of our first failure path network, around \SI{20}{\milli\second} of additional latency was measured after a failure, as can be seen in \cref{fig:evaluation_failure_path_1_latency_wo_sc_b}. The second failure path network adds another \SI{10}{\milli\second} in case of a failure, for \SI{30}{\milli\second} in total, as can be seen in \cref{fig:evaluation_failure_path_2_latency_wo_sc_b}. This is caused by the additional links on the longer path, with Mininet adding \SI{5}{\milli\second} of delay for each link that is passed. Because only ICMP echo requests and not replies use the looped path, as packets returning from either host H6 or H8 are not forwarded to router R2 when arriving at router R1, the additional latency will always be \SI{10}{\milli\second} for each link contained in the looped path: each additional link is passed twice by every ICMP echo request.
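The measured values are consistent with counting extra link traversals, where $n$ is the number of additional links on the looped path:
\[
\Delta t = 2 \cdot n \cdot \SI{5}{\milli\second} = n \cdot \SI{10}{\milli\second},
\]
giving $\Delta t = \SI{20}{\milli\second}$ for $n = 2$ in the first and $\Delta t = \SI{30}{\milli\second}$ for $n = 3$ in the second failure path network.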
Similar to our results when measuring the minimal topology in \cref{evaluation_minimal_latency}, ShortCut is able to restore the original latency after a failure, independent of the length of the cut looped path, as can be seen in \cref{fig:evaluation_failure_path_1_latency_sc}.
The key difference between both is the time they take to become active.
Most FRR approaches will, however, create sub-optimal paths, which may already be in use or contain loops, effectively reducing the performance of the network.
FRMs like ShortCut (\cite{Shukla.2021}), Resilient Routing Layers (RRL) (\cite{Kvalbein.2005}), Revive (\cite{Haque.2018}) and Blink (\cite{ThomasHolterbach.2019}) try to alleviate this issue by removing longer paths from the routings only using data available on the device, bridging the gap between FRR and the global convergence protocol.
\section{State of the art}
Until the global convergence protocol converges, it leaves the routing to in-network methods like FRR, which reroute traffic according to pre-defined alternative routes on the network. In some cases, however, methods like FRR cause routing paths to be longer than necessary, which produces additional traffic on the network and adds delay to transmissions.
RRL pre-computes alternative routing tables and switches between them in case of failure, but needs to manipulate packets to inform routers of the changed routing table.
ShortCut uses information about the incoming packet to determine whether or not the packet returned to the router, using already existing FRR implementations. In case a packet returns it will remove the route with the highest priority from the routing table, assuming that the path is no longer available.
Revive installs backup routes pro-actively using an optimized algorithm and controllers, but is prone to loops created by alternative paths, as failures are not propagated to routers.
Blink detects failures directly in the data plane by monitoring TCP retransmissions and reroutes affected traffic to pre-computed backup next hops without waiting for the control plane to converge.
To test ShortCut we provide a prototype for an implementation using \textit{nftables}.
We developed a testing framework that can be used to automatically create Mininet topologies, formulate tests as Python dictionaries using existing measurement functions, set network-wide bandwidth limits or delays, and run automatic sets of tests.
The framework can be called using an argument based \textit{command line interface} (CLI).
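As an illustration, a test run could be described by a dictionary similar to the following; the keys and the CLI invocation are assumptions about the interface, not its exact specification:

```python
# Hypothetical example of a test description as a Python dictionary;
# all keys and values are illustrative placeholders.
bandwidth_test = {
    "name": "minimal_bandwidth_frr",
    "topology": "minimal",          # which Mininet topology to build
    "measurement": "bandwidth",     # iperf-based measurement function
    "source": "h1",
    "destination": "h3",
    "link_limit_mbps": 100,         # network-wide bandwidth limit
    "link_delay_ms": 5,             # per-link delay added by Mininet
    "fail_link": ("r1", "r2"),      # link to fail during the run
    "shortcut_enabled": False,      # toggle the ShortCut FRM
}
```

A run of this description could then be started through the CLI, e.g. with an argument selecting the test by its name.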
Using this framework we test several topologies using FRR with and without ShortCut and discuss the results, showing the possible applications and benefits of the FRM ShortCut.