some text changes

master
Frederik Maaßen 2 years ago
parent 460fb33e01
commit ae9a1c4980
1. CITATION.cff (8)
2. thesis/content/basics/resilient_routing.tex (1)
3. thesis/content/evaluation/evaluation.tex (2)
4. thesis/content/evaluation/failure_path_networks.tex (170)
5. thesis/content/evaluation/minimal_network.tex (28)
6. thesis/content/implementation/shortcut_implementation.tex (6)
7. thesis/content/introduction.tex (25)

@@ -0,0 +1,8 @@
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Maaßen
  given-names: Frederik
title: "Comparison Of Fast Recovery Methods - Bachelor Thesis"
version: 0.0.1
date-released: 2022-05-16

@@ -136,3 +136,4 @@ ShortCut is applicable to most network topologies as well as pre-existing FRR an
\subsubsection{Blink}
\subsubsection{Revive}
Revive (\cite{Haque.2018})

@@ -3,7 +3,7 @@
In this chapter we evaluate tests that were run using our test framework in Mininet. The tests were performed as described in \ref{cp:testing} with a bandwidth limit of \SI{100}{Mbps} on each link. When testing with delays on the network we noticed that performance dropped rapidly, which is why we only use an additional delay of \SI{5}{\milli\second} per link in our latency tests; all other tests run without an added delay.
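The following sketch illustrates how such link limits can be configured in Mininet using \textit{TCLink}; the topology and names are placeholders and not the full test framework described in \ref{cp:testing}.
\begin{verbatim}
# Illustrative sketch (not the actual framework code): a single link
# with the limits used in our tests, built with Mininet's TCLink.
from mininet.net import Mininet
from mininet.link import TCLink

net = Mininet(link=TCLink)
h1 = net.addHost('h1')
r1 = net.addHost('r1')  # routers are hosts with IP forwarding enabled
net.addLink(h1, r1, bw=100, delay='5ms')  # 100 Mbps; delay only in latency tests
net.start()
r1.cmd('sysctl -w net.ipv4.ip_forward=1')
net.stop()
\end{verbatim}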
The evaluations are sorted by topology. For each topology we measured the bandwidth, bandwidth with a concurrent data flow, latency and packet flow. Each test was repeated once with an implementation of ShortCut.
The evaluations are sorted by topology. For each topology we measured the bandwidth, bandwidth with a concurrent data flow, latency, TCP packet flow and UDP packet flow. We execute each test once with FRR active in the corresponding section \textit{With FRR} and once with FRR and our implementation of ShortCut active in the corresponding section \textit{With FRR and ShortCut}.
We start with our minimal network in section \ref{eva_minimal_network}, followed by the evaluation of two networks with longer "failure paths", measuring the influence of additional nodes in looped paths in section \ref{eva_failure_path_networks}.

@@ -20,39 +20,47 @@
\label{fig:evaluation_failure_path_networks}
\end{figure}
In this section we evaluate the results for our two failure path networks seen in \cref{fig:evaluation_failure_path_networks}. The networks were created so that a looped path contains additional hops, effectively simulating a more severe failure in terms of affected routes.
Most tests however did not produce results significantly different from those of the minimal network, evaluated in \cref{eva_minimal_network}, which is why we focus on the differences between the two topology classes.
\subsection{Bandwidth}
When measuring the bandwidth of our networks with longer failure paths the results were similar to those of the bandwidth measurement in the minimal network, described in \cref{evaluation_minimal_bandwidth}.
The addition of hops to the failure path did not have an effect on the bandwidth.
\subsection{Two concurrent data transfers}
In this test we evaluated the bandwidth between H1 and H4 with a concurrent data transfer on H2 to H1. Both transfers were run with a limitation of \SI{100}{Mbps}, which constitutes the maximum allowed bandwidth in this test.
Similar to the bandwidth results, the addition of a second data transfer running concurrently on the looped path reduces throughput for both data flows.
The longer failure path however implies that the impact in a realistic environment might be much bigger. Because more links carry additional traffic, even more data flows might be affected by the failure. The use of ShortCut therefore becomes more beneficial the more links can be removed from looped paths.
\subsubsection{With FRR}
\begin{figure}
\centering
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_bandwidth_link_usage/bandwidth_link_usage_before_wo_sc}
\label{fig:evaluation_minimal_bandwidth_link_usage_wo_sc_a}
\includegraphics[width=\textwidth]{tests/failure_path_1_bandwidth_link_usage/bandwidth_link_usage_before_wo_sc}
\label{fig:evaluation_failure_path_1_bandwidth_link_usage_wo_sc_a}
\caption{Bandwidth before a failure}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_bandwidth_link_usage/bandwidth_link_usage_after_wo_sc}
\label{fig:evaluation_minimal_bandwidth_link_usage_wo_sc_b}
\includegraphics[width=\textwidth]{tests/failure_path_1_bandwidth_link_usage/bandwidth_link_usage_after_wo_sc}
\label{fig:evaluation_failure_path_1_bandwidth_link_usage_wo_sc_b}
\caption{Bandwidth after a failure}
\end{subfigure}
\caption{Bandwidth with concurrent data transfer on H2 to H1}
\label{fig:evaluation_minimal_bandwidth_link_usage_wo_sc}
\label{fig:evaluation_failure_path_1_bandwidth_link_usage_wo_sc}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=10cm]{tests/minimal_bandwidth_link_usage/bandwidth_link_usage_concurrent_wo_sc}
\includegraphics[width=10cm]{tests/failure_path_1_bandwidth_link_usage/bandwidth_link_usage_concurrent_wo_sc}
\caption{Bandwidth H1 to H4 with concurrent data transfer on H2 to H1 - failure occurring after 15 seconds}
\label{fig:evaluation_minimal_bandwidth_link_usage_concurrent_wo_sc}
\label{fig:evaluation_failure_path_1_bandwidth_link_usage_concurrent_wo_sc}
\end{figure}
Before a failure, as can be seen in \cref{fig:evaluation_minimal_bandwidth_link_usage_wo_sc_a}, the throughput is at around \SI{100}{Mbps}, which is our current maximum. While the additional transfer between H2 and H1 does in fact use some of the links that are also used in our \textit{iperf} test, namely the links between R1 and R2 and between H1 and R1, it does so in a different direction. While the data itself is sent from H1 to H4 over R2, only the TCP acknowledgements are sent on the route back. Data from H2 to H1 is sent from R2 to R1, and therefore only the returning acknowledgements use the link in the same direction, which does not impact the achieved throughput.
If a failure is introduced however, traffic from H1 not only loops over R2, using up bandwidth from R2 to R1, it also uses the same path from R1 to R3 for its traffic. Therefore we experience a huge performance drop to around \SIrange{20}{30}{Mbps}. While in theory this lasts until the global convergence protocol rewrites the route, the lost data throughput on this route in our network during this time frame would be around \SI{75}{Mbps}.
\subsubsection{With FRR and ShortCut}
@@ -60,209 +68,179 @@ If a failure is introduced however, traffic from H1 does not only loop over R2,
\centering
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_bandwidth_link_usage/bandwidth_link_usage_before_sc}
\label{fig:evaluation_minimal_bandwidth_link_usage_sc_a}
\includegraphics[width=\textwidth]{tests/failure_path_1_bandwidth_link_usage/bandwidth_link_usage_before_sc}
\label{fig:evaluation_failure_path_1_bandwidth_link_usage_sc_a}
\caption{Bandwidth before a failure}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_bandwidth_link_usage/bandwidth_link_usage_after_sc}
\label{fig:evaluation_minimal_bandwidth_link_usage_sc_b}
\includegraphics[width=\textwidth]{tests/failure_path_1_bandwidth_link_usage/bandwidth_link_usage_after_sc}
\label{fig:evaluation_failure_path_1_bandwidth_link_usage_sc_b}
\caption{Bandwidth after a failure}
\end{subfigure}
\caption{Bandwidth with concurrent data transfer on H2 to H1 using ShortCut}
\label{fig:evaluation_minimal_bandwidth_link_usage_sc}
\label{fig:evaluation_failure_path_1_bandwidth_link_usage_sc}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=10cm]{tests/minimal_bandwidth_link_usage/bandwidth_link_usage_concurrent_sc}
\includegraphics[width=10cm]{tests/failure_path_1_bandwidth_link_usage/bandwidth_link_usage_concurrent_sc}
\caption{Bandwidth H1 to H4 with concurrent data transfer on H2 to H1 - failure occurring after 15 seconds using ShortCut}
\label{fig:evaluation_minimal_bandwidth_link_usage_concurrent_sc}
\label{fig:evaluation_failure_path_1_bandwidth_link_usage_concurrent_sc}
\end{figure}
\subsection{Latency}
In the following sections we evaluate the latency measurements run on the minimal topology with 4 routers and 3 hosts first with only FRR in \cref{minimal_latency_with_frr} and then with our implementation of ShortCut running in \cref{minimal_latency_with_frr_and_shortcut}.
\subsubsection{With FRR}
\label{minimal_latency_with_frr}
\label{failure_path_1_latency_with_frr}
\begin{figure}
\centering
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_latency/latency_before_failure_wo_sc}
\label{fig:evaluation_minimal_latency_wo_sc_a}
\includegraphics[width=\textwidth]{tests/failure_path_1_latency/latency_before_failure_wo_sc}
\label{fig:evaluation_failure_path_1_latency_wo_sc_a}
\caption{Latency before a failure}
\end{subfigure}
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_latency/latency_after_failure_wo_sc}
\label{fig:evaluation_minimal_latency_wo_sc_b}
\includegraphics[width=\textwidth]{tests/failure_path_1_latency/latency_after_failure_wo_sc}
\label{fig:evaluation_failure_path_1_latency_wo_sc_b}
\caption{Latency after a failure}
\end{subfigure}
\caption{Latency measured with ping}
\label{fig:evaluation_minimal_latency_wo_sc}
\label{fig:evaluation_failure_path_1_latency_wo_sc}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=10cm]{tests/minimal_latency/latency_concurrent_wo_sc}
\includegraphics[width=10cm]{tests/failure_path_1_latency/latency_concurrent_wo_sc}
\caption{Latency with a concurrent failure after 15 seconds}
\label{fig:evaluation_minimal_latency_concurrent_wo_sc}
\label{fig:evaluation_failure_path_1_latency_concurrent_wo_sc}
\end{figure}
As each link adds \SI{5}{\milli\second} of delay and \textit{ping} logs the time difference between sending a packet and receiving an answer, the approximate delay is the number of links passed \textit{N} multiplied by the delay per link. In our test network there are 6 links between H1 and H4. Because these links are passed twice, once towards H4 and once back to H1, this results in an approximate delay of \SI{60}{\milli\second}.
The test run confirmed these assumptions. As can be seen in \cref{fig:evaluation_minimal_latency_wo_sc_a}, a ping on the network without failure took an average of around \SI{65}{\milli\second} with slight variations. The additional \SI{5}{\milli\second} are most likely caused by the routing process on the routers.
When introducing a failure however, additional links are passed on the way from H1 to H4. Instead of 6 links passed per direction, the network now sends the packets on a sub-optimal path which adds 2 passed links, from R1 to R2 and back. These are only passed when sending packets to H4; packets returning from H4 do not take the sub-optimal path. This would, in theory, add around \SI{10}{\milli\second} of delay to our original results.
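Written out, and assuming only the configured per-link delay of \SI{5}{\milli\second}, these estimates are
\[
RTT_{\mathrm{before}} \approx 2 \cdot 6 \cdot \SI{5}{\milli\second} = \SI{60}{\milli\second}, \qquad RTT_{\mathrm{after}} \approx (2 \cdot 6 + 2) \cdot \SI{5}{\milli\second} = \SI{70}{\milli\second}.
\]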
As can be seen in \cref{fig:evaluation_minimal_latency_wo_sc_b} this is also the case. With an average of around \SI{76}{\milli\second} of latency the results show an additional delay of around \SI{11}{\milli\second} when taking the sub-optimal path. The discrepancy between our assumption of \SI{10}{\milli\second} and the actual added \SI{11}{\milli\second} might be caused by the additional router that is passed in the direction to H4.
When the failure is introduced concurrent to a running test, the latency spikes to around \SI{94}{\milli\second} for one packet as can be seen in \cref{fig:evaluation_minimal_latency_concurrent_wo_sc}. This might be caused by the deactivation of interfaces using \textit{ifconfig} and a packet arriving just at the moment of reconfiguration, as packets are sent every \SI{0.5}{\second} and the failure is introduced exactly \SI{15}{\second} after starting the measurement. Depending on the time \textit{ifconfig} takes to reconfigure this would cause the packet to remain in the queue until the reconfiguration is finished, adding to the latency measured in this one instance.
\subsubsection{With FRR and ShortCut}
\label{minimal_latency_with_frr_and_shortcut}
\label{failure_path_1_latency_with_frr_and_shortcut}
\begin{figure}
\centering
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_latency/latency_before_failure_sc}
\label{fig:evaluation_minimal_latency_sc_a}
\includegraphics[width=\textwidth]{tests/failure_path_1_latency/latency_before_failure_sc}
\label{fig:evaluation_failure_path_1_latency_sc_a}
\caption{Latency before a failure}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_latency/latency_after_failure_sc}
\label{fig:evaluation_minimal_latency_sc_b}
\includegraphics[width=\textwidth]{tests/failure_path_1_latency/latency_after_failure_sc}
\label{fig:evaluation_failure_path_1_latency_sc_b}
\caption{Latency after a failure}
\end{subfigure}
\caption{Latency measured with ping using ShortCut}
\label{fig:evaluation_minimal_latency_sc}
\label{fig:evaluation_failure_path_1_latency_sc}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=10cm]{tests/minimal_latency/latency_concurrent_sc}
\includegraphics[width=10cm]{tests/failure_path_1_latency/latency_concurrent_sc}
\caption{Latency with a concurrent failure after 15 seconds with ShortCut}
\label{fig:evaluation_minimal_latency_concurrent_sc}
\label{fig:evaluation_failure_path_1_latency_concurrent_sc}
\end{figure}
Our implementation of ShortCut using \textit{nftables} does not seem to add any additional delay to packet transfers as is evident when comparing the average delay produced before a failure in \cref{fig:evaluation_minimal_latency_wo_sc_a} and \cref{fig:evaluation_minimal_latency_sc_a}.
When introducing a failure to the network the latency does not change, as can be seen in \cref{fig:evaluation_minimal_latency_sc}. This is caused by the removal of the looped path and therefore of the additional delay each packet would otherwise be subjected to.
The spike in latency which can be seen in \cref{fig:evaluation_minimal_latency_concurrent_sc}, occurring when the failure is introduced, can be attributed to the same scenario as explained in \cref{minimal_latency_with_frr}: most likely a timing issue between the introduction of the failure on the routers R2 and R4 and simultaneously sent ICMP packets.
\subsection{Packet flow - TCP}
\label{tcp_packet_flow}
To show the number of TCP packets being forwarded on each router, we measured the packet flow on all routers of this topology. This is done by counting TCP packets with \textit{nftables} while a concurrent data transfer is started from H1 to H4. The results contain the number of packets forwarded per second on each router. This was done with an intermediate as well as a concurrent failure, both for a network with FRR in \cref{minimal_packet_flow_with_frr} and for a network with an additional implementation of ShortCut in \cref{minimal_packet_flow_with_frr_and_shortcut}.
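As an illustration of this counting setup, the following sketch shows how such a counter could be installed and read on a Mininet router; the function names and the surrounding framework code are placeholders and not our exact implementation.
\begin{verbatim}
# Illustrative sketch: count forwarded TCP packets on a Mininet router
# using an nftables counter attached to the forward hook.
def install_tcp_counter(router):
    router.cmd("nft add table ip counting")
    router.cmd("nft 'add chain ip counting fwd "
               "{ type filter hook forward priority 0 ; }'")
    router.cmd("nft add rule ip counting fwd ip protocol tcp counter")

def read_tcp_counter(router):
    # The output lists the rule together with its packet and byte
    # counters and still has to be parsed for the packet field.
    return router.cmd("nft list chain ip counting fwd")
\end{verbatim}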
\subsubsection{With FRR}
\label{minimal_packet_flow_with_frr}
\label{failure_path_1_packet_flow_with_frr}
\begin{figure}
\centering
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_packet_flow/before_failure_wo_sc_graph}
\label{fig:evaluation_minimal_packet_flow_wo_sc_a}
\includegraphics[width=\textwidth]{tests/failure_path_1_packet_flow/before_failure_wo_sc_graph}
\label{fig:evaluation_failure_path_1_packet_flow_wo_sc_a}
\caption{TCP Packets on routers before a failure}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_packet_flow/after_failure_wo_sc_graph}
\label{fig:evaluation_minimal_packet_flow_wo_sc_b}
\includegraphics[width=\textwidth]{tests/failure_path_1_packet_flow/after_failure_wo_sc_graph}
\label{fig:evaluation_failure_path_1_packet_flow_wo_sc_b}
\caption{TCP Packets on routers after a failure}
\end{subfigure}
\caption{TCP Packets on all routers measured with \textit{nftables} counters}
\label{fig:evaluation_minimal_packet_flow_wo_sc}
\label{fig:evaluation_failure_path_1_packet_flow_wo_sc}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=10cm]{tests/minimal_packet_flow/concurrent_failure_wo_sc_graph}
\includegraphics[width=10cm]{tests/failure_path_1_packet_flow/concurrent_failure_wo_sc_graph}
\caption{Packet flow on all routers with failure after 15 seconds}
\label{fig:evaluation_minimal_packet_flow_concurrent_wo_sc}
\label{fig:evaluation_failure_path_1_packet_flow_concurrent_wo_sc}
\end{figure}
The results in the network before a failure are as expected and can be seen in \cref{fig:evaluation_minimal_packet_flow_wo_sc_a}. Each router on the route from H1 to H4, which includes R1, R2 and R4, reports the same number of packets at each point of measurement. While the packet count fluctuates during the measurement, no packet loss was reported and the bandwidth was at an average of \SI{95}{Mbps} during the whole run of the test. This is why we assume that the fluctuations can be attributed to the mechanisms used in \textit{iperf}.
After a failure all four routers receive packets, as can be seen in \cref{fig:evaluation_minimal_packet_flow_wo_sc_b}, but router R1 now receives the most packets with an average of around 1500 packets, while routers R3 and R4 receive roughly the same number of packets as before the failure at an average of around 1000 packets. Router R2 receives the fewest packets with an average of around 500 packets.
This is most likely caused by the looped path and its implications for packet travel. Router R1 receives all packets that are sent from H1 to H4 twice, once when sending them to R2 and a second time when receiving them back from R2 to send them to R3. But while all packets \textbf{sent} from H1 pass R1 twice, acknowledgements sent back by the \textit{iperf} server on H4 only pass R1 once, as R1 would not send packets destined for H1 to R2. Router R2 on the other hand only receives packets sent to H4 but none of the ACKs sent back. This is why, compared to the average packet count of all routers in \cref{fig:evaluation_minimal_packet_flow_wo_sc_a}, R2 receives roughly half of the packets a router would normally receive, as TCP answers each received packet with an ACK. This also explains why router R1 forwards an average of around 1500 packets per second: it forwards the data packets of around 500 packets per second twice and the acknowledgement packets of also around 500 packets per second once, producing an additional 50\% load on the router.
Aside from the changed path and therefore the inclusion of router R3 in this path, routers R3 and R4 are unaffected by the failure, forwarding each packet once.
When causing a failure while the bandwidth measurement is running, the failure itself causes a sudden drop to 0 forwarded packets for a short amount of time. This can be attributed to the time the routers take to change their configuration. The \textit{nftables} counter uses the \textit{forward} netfilter hook, which is called in a pre-defined phase of the Linux network stack. Packets which are logged in the forwarding state have already received a routing decision, but because Mininet needs some time to reconfigure the interface to shut down a connection, imitating a failure, the packets have to wait until the router is ready again.
This behaviour was also observed when measuring the latency and introducing a failure concurrently in \cref{fig:evaluation_minimal_latency_concurrent_wo_sc}, adding delay to packets that were to be delivered at the moment of failure.
Reconfiguration of routers in Mininet does not reset the \textit{nftables} counters either, which was confirmed in a quick test counting packets of an \textit{iperf} transfer and shutting down an interface on the same router. The packet count did not change after shutting down the interface.
\subsubsection{With FRR and ShortCut}
\label{minimal_packet_flow_with_frr_and_shortcut}
\label{failure_path_1_packet_flow_with_frr_and_shortcut}
\begin{figure}
\centering
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_packet_flow/before_failure_sc_graph}
\label{fig:evaluation_minimal_packet_flow_sc_a}
\includegraphics[width=\textwidth]{tests/failure_path_1_packet_flow/before_failure_sc_graph}
\label{fig:evaluation_failure_path_1_packet_flow_sc_a}
\caption{TCP Packets on routers before a failure}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_packet_flow/after_failure_sc_graph}
\label{fig:evaluation_minimal_packet_flow_sc_b}
\includegraphics[width=\textwidth]{tests/failure_path_1_packet_flow/after_failure_sc_graph}
\label{fig:evaluation_failure_path_1_packet_flow_sc_b}
\caption{TCP Packets on routers after a failure}
\end{subfigure}
\caption{TCP Packets on all routers measured with \textit{nftables} counters using ShortCut}
\label{fig:evaluation_minimal_packet_flow_sc}
\label{fig:evaluation_failure_path_1_packet_flow_sc}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=10cm]{tests/minimal_packet_flow/concurrent_failure_sc_graph}
\includegraphics[width=10cm]{tests/failure_path_1_packet_flow/concurrent_failure_sc_graph}
\caption{TCP Packet flow on all routers with failure after 15 seconds using ShortCut}
\label{fig:evaluation_minimal_packet_flow_concurrent_sc}
\label{fig:evaluation_failure_path_1_packet_flow_concurrent_sc}
\end{figure}
When running the TCP packet flow measurements with an implementation of ShortCut active on the network, however, the results change drastically. As expected, all packets sent by the \textit{iperf} transfer are forwarded by router R2 on the original route, but after the failure is introduced the router does not forward any packets. ShortCut has effectively cut out router R2 from the route, forwarding packets from R1 to R3 directly. All remaining routers R1, R3 and R4 now receive all packets and no router forwards any packet twice.
\subsection{Packet flow - UDP}
We repeated the packet flow test in \cref{tcp_packet_flow} using UDP to inspect the differences caused by the two protocols.
\subsubsection{With FRR}
\begin{figure}
\centering
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_packet_flow_udp/packet_flow_udp_before_wo_sc}
\includegraphics[width=\textwidth]{tests/failure_path_1_packet_flow_udp/packet_flow_udp_before_wo_sc}
\caption{Packets on routers before a failure}
\label{fig:evaluation_minimal_packet_flow_udp_wo_sc_a}
\label{fig:evaluation_failure_path_1_packet_flow_udp_wo_sc_a}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_packet_flow_udp/packet_flow_udp_after_wo_sc}
\includegraphics[width=\textwidth]{tests/failure_path_1_packet_flow_udp/packet_flow_udp_after_wo_sc}
\caption{Packets on routers after a failure}
\label{fig:evaluation_minimal_packet_flow_udp_wo_sc_b}
\label{fig:evaluation_failure_path_1_packet_flow_udp_wo_sc_b}
\end{subfigure}
\label{fig:evaluation_minimal_packet_flow_udp_wo_sc}
\label{fig:evaluation_failure_path_1_packet_flow_udp_wo_sc}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=10cm]{tests/minimal_packet_flow_udp/packet_flow_udp_concurrent_wo_sc}
\includegraphics[width=10cm]{tests/failure_path_1_packet_flow_udp/packet_flow_udp_concurrent_wo_sc}
\caption{Packet flow on all routers with failure after 15 seconds}
\label{fig:evaluation_minimal_packet_flow_udp_concurrent_wo_sc}
\label{fig:evaluation_failure_path_1_packet_flow_udp_concurrent_wo_sc}
\end{figure}
\subsubsection{With FRR and ShortCut}
@@ -270,24 +248,24 @@ We repeated the packet flow test in \cref{tcp_packet_flow} using UDP to inspect
\centering
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_packet_flow_udp/packet_flow_udp_before_sc}
\includegraphics[width=\textwidth]{tests/failure_path_1_packet_flow_udp/packet_flow_udp_before_sc}
\caption{UDP Packets on routers before a failure}
\label{fig:evaluation_minimal_packet_flow_udp_sc_a}
\label{fig:evaluation_failure_path_1_packet_flow_udp_sc_a}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.49\textwidth}
\centering
\includegraphics[width=\textwidth]{tests/minimal_packet_flow_udp/packet_flow_udp_after_sc}
\includegraphics[width=\textwidth]{tests/failure_path_1_packet_flow_udp/packet_flow_udp_after_sc}
\caption{UDP Packets on routers after a failure}
\label{fig:evaluation_minimal_packet_flow_udp_sc_b}
\label{fig:evaluation_failure_path_1_packet_flow_udp_sc_b}
\end{subfigure}
\label{fig:evaluation_minimal_packet_flow_udp_sc}
\label{fig:evaluation_failure_path_1_packet_flow_udp_sc}
\caption{UDP packets on all routers using ShortCut}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=10cm]{tests/minimal_packet_flow_udp/packet_flow_udp_concurrent_sc}
\includegraphics[width=10cm]{tests/failure_path_1_packet_flow_udp/packet_flow_udp_concurrent_sc}
\caption{Packet flow on all routers with failure after 15 seconds using ShortCut}
\label{fig:evaluation_minimal_packet_flow_udp_concurrent_sc}
\label{fig:evaluation_failure_path_1_packet_flow_udp_concurrent_sc}
\end{figure}

@@ -6,7 +6,8 @@
\caption{Minimal network}
\label{fig:evaluation_minimal_network}
\end{figure}
\subsection{TCP Bandwidth}
\subsection{Bandwidth}
\label{evaluation_minimal_bandwidth}
We performed multiple tests of how failures influence the bandwidth. These were run using \textit{iperf} with a logging interval of \SI{0.5}{\second}. All data was collected from the output of the \textit{iperf} server.
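A minimal sketch of such a measurement, as issued from the test framework, could look as follows; the host objects and the transfer duration are placeholders and not the exact commands of our framework.
\begin{verbatim}
# Illustrative sketch: iperf bandwidth measurement between two Mininet
# hosts with a 0.5 s reporting interval; server output goes to a file.
h4.cmd('iperf -s -i 0.5 > /tmp/iperf_server.log 2>&1 &')
h1.cmd('iperf -c %s -i 0.5 -t 30' % h4.IP())  # 30 s transfer towards H4
h4.cmd('kill %iperf')  # stop the background server
\end{verbatim}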
\subsubsection{With FRR}
@@ -84,10 +85,8 @@ In our further tests we observed that the bandwidth alone does not change heavil
As can be seen in \cref{fig:evaluation_minimal_bandwidth_sc} and \cref{fig:evaluation_minimal_bandwidth_concurrent_sc}, using ShortCut had no further influence on the achieved throughput. This is to be expected, as longer or shorter paths will only influence throughput if e.g. a link with a lower bandwidth is contained in an additional path.
\subsection{UDP Bandwidth}
When using UDP for bandwidth measurements
\subsection{Two concurrent data transfers}
\label{evaluation_minimal_bandwidth_link_usage}
In this test we evaluated the bandwidth between H1 and H4 with a concurrent data transfer on H2 to H1. Both transfers were run with a limitation of \SI{100}{Mbps}, which constitutes the maximum allowed bandwidth in this test.
\subsubsection{With FRR}
\begin{figure}
@@ -151,7 +150,8 @@ If a failure is introduced however, traffic from H1 does not only loop over R2,
\subsection{Latency}
In the following sections we evaluate the latency measurements run on the minimal topology with 4 routers and 3 hosts first with only FRR in \cref{minimal_latency_with_frr} and then with our implementation of ShortCut running in \cref{minimal_latency_with_frr_and_shortcut}.
\label{evaluation_minimal_latency}
In the following sections we evaluate the latency measurements run on the minimal topology with 4 routers and 3 hosts first with only FRR in section \textit{With FRR} and then with our implementation of ShortCut running in section \textit{With FRR and ShortCut}.
\subsubsection{With FRR}
\label{minimal_latency_with_frr}
@@ -226,7 +226,7 @@ The spike in latency which can be seen in \cref{fig:evaluation_minimal_latency_c
\subsection{Packet flow - TCP}
\label{tcp_packet_flow}
\label{evaluation_minimal_tcp_packet_flow}
To show the number of TCP packets being forwarded on each router, we measured the packet flow on all routers of this topology. This is done by counting TCP packets with \textit{nftables} while a concurrent data transfer is started from H1 to H4. The results contain the number of packets forwarded per second on each router. This was done with an intermediate as well as a concurrent failure, both for a network with FRR in \cref{minimal_packet_flow_with_frr} and for a network with an additional implementation of ShortCut in \cref{minimal_packet_flow_with_frr_and_shortcut}.
\subsubsection{With FRR}
@@ -305,6 +305,7 @@ When running the TCP packet flow measurements with an implementation of ShortCut
\subsection{Packet flow - UDP}
\label{evaluation_minimal_udp_packet_flow}
We repeated the packet flow test in \cref{tcp_packet_flow} using UDP to inspect the differences caused by the two protocols.
\subsubsection{With FRR}
\begin{figure}
@@ -332,7 +333,13 @@ We repeated the packet flow test in \cref{tcp_packet_flow} using UDP to inspect
\label{fig:evaluation_minimal_packet_flow_udp_concurrent_wo_sc}
\end{figure}
When running the packet flow test measuring UDP packets the amount of packets changed drastically when compared to TCP packets. This is caused by the different window sizes \textit{iperf} uses for TCP and UDP
When running the packet flow test measuring UDP packets the number of packets changed drastically compared to TCP packets. \textit{iperf} uses different packet sizes for each protocol, sending TCP packets with a size of \SI{128}{\kilo\byte} and UDP packets with a size of only \SI{8}{\kilo\byte} (\cite{Dugan.2016}). The same amount of transmitted data should therefore produce a packet count roughly 16 times higher when using UDP compared to TCP. TCP however, as can be seen in \cref{fig:evaluation_minimal_packet_flow_wo_sc_a}, sends around 1000 packets per second when running a bandwidth measurement limited by the overall bandwidth limit on the network of \SI{100}{\mega\bit\per\second}. A naive assumption would be that UDP should send 16000 packets per second over the network, but that does not match our test results seen in \cref{fig:evaluation_minimal_packet_flow_udp_wo_sc_a}, where only around 7800 packets per second are logged on the routers.
The reason for this is also the key difference between TCP and UDP: TCP uses acknowledgements (ACKs) to confirm the transmission of packets. These are packets returning from the \textit{iperf} server to the client. For each received data packet, the server sends back an ACK. If no ACK arrives for a packet, the client resends the missing packet. This causes the network to transmit twice the number of packets, one half containing the actual data and the other half only containing ACKs.
UDP however simply sends packets on their way and does not check whether they actually reached their destination. Therefore all UDP packets contain data and no additional packets for confirmation or congestion control are sent over the network.
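Putting these two observations together reconciles the observed packet counts: of the roughly 1000 TCP packets per second only about half carry data, so the expected UDP packet rate is roughly
\[
\frac{1000}{2} \cdot 16 = 8000
\]
packets per second, which is close to the around 7800 packets per second logged in our measurement.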
Because UDP does not send ACKs, the results observed after a failure in \cref{fig:evaluation_minimal_packet_flow_udp_wo_sc_b} are very telling, with routers R2, R3 and R4 all forwarding the same number of packets and router R1 forwarding exactly twice that number.
\subsubsection{With FRR and ShortCut}
\begin{figure}
\centering
@@ -353,9 +360,14 @@ When running the packet flow test measuring UDP packets the amount of packets ch
\caption{UDP packets on all routers using ShortCut}
\end{figure}
When using ShortCut in a UDP packet flow measurement, the negative consequences of the failure disappear. While in \cref{fig:evaluation_minimal_packet_flow_udp_wo_sc_a} routers R1, R2 and R4 receive all packets on the original route, the load switches after a failure from R2 to R3. As expected, the ShortCut implementation has cut out the looped path and restored the original functionality on an alternative route.
\begin{figure}
\centering
\includegraphics[width=10cm]{tests/minimal_packet_flow_udp/packet_flow_udp_concurrent_sc}
\caption{Packet flow on all routers with failure after 15 seconds using ShortCut}
\label{fig:evaluation_minimal_packet_flow_udp_concurrent_sc}
\end{figure}
\end{figure}
WRITE THIS

@@ -7,9 +7,11 @@ The routing table manipulation has to take effect as fast as possible. Furthermo
\subsection{Additional behaviour (wip)}
\subsection{Identifying packets}
\label{identifying_packets}
To determine which route should be deleted from the routing table, ShortCut has to gather knowledge about the packets forwarded by the router. The already implemented FRR explained in \cref{implementation_rrt} adds routing tables which are used depending on the interface a packet was received on. These alternative routes are also added to the default routing table with a lower priority metric, in case the link directly connected to the router fails.
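To illustrate the resulting routing layout, a router could be configured roughly as follows; the interface name, table number and addresses are placeholders and not taken from our actual configuration.
\begin{verbatim}
# Illustrative sketch of the FRR routing layout described above,
# issued from the test framework (placeholder names and addresses).
r1.cmd('ip rule add iif r1-eth2 lookup 100')
# Packets arriving on r1-eth2 (i.e. returning packets) use table 100.
r1.cmd('ip route add 10.0.4.0/24 via 10.0.13.2 table 100')
# The same alternative route sits in the main table with a worse metric,
# in case the directly connected link fails.
r1.cmd('ip route add 10.0.4.0/24 via 10.0.13.2 metric 200')
\end{verbatim}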
There are a few options when trying to add behaviour in case certain packets are received.
To identify a returning packet we already use the incoming interface when implementing FRR. If we were, however, able to execute a function when such a packet is received, we could also delete the old, now invalid routing table entry. There are several approaches that could be used for this.
The programming language P4 (\cite{Bosshart.2014}) can be used to write router logic and compile it so that a Mininet router's behaviour could be changed completely. This would also allow us to execute additional functionality in certain cases, e.g. if a specific IP routing table entry is hit, but it would require a manual implementation of core router functionality like ARP handling.
In the context of this work this is not feasible.

@@ -11,18 +11,31 @@ Because of the increased usage, failing networks cause an increasingly severe am
Failures in networks will always occur, be it through hardware failures, errors in software or human error. In addition, the maintenance of networks will also regularly reduce a network's performance or cause the whole network to be unavailable.
Network administrators use a multitude of ways to increase performance, reduce the impact of failures on the network and achieve the highest possible availability and reliability. Two of these methods include the usage of global convergence protocols like Open Shortest Path First (OSPF) (\cite{Moy.041998}) or similar methods, either on the routers themselves or on a controller in a software defined network (SDN), and the usage of Fast Re-Routing (FRR) and Fast Recovery Methods (FRM), which are operations limited to the data contained on a device.
Network administrators use a multitude of ways to increase performance, reduce the impact of failures on the network and achieve the highest possible availability and reliability. Two of these methods include the usage of global convergence protocols like Open Shortest Path First (OSPF) (\cite{Moy.041998}) or similar methods, either on the routers themselves or on a controller in a software defined network (SDN), and the usage of Fast Re-Routing (FRR) (\cite{Chiesa.2021}) approaches.
As global convergence protocols are very slow, sometimes taking seconds to converge (\cite{Liu.2013}), they leave the routing during the time of route calculation to in-network methods like FRR, which reroute traffic according to pre-defined alternative routes on the network. In some cases however, methods like FRR cause routing paths to be longer than necessary, which produces additional traffic on the network and adds delay to transmissions.
The key difference between the two is the time they take to become active. Because FRR mechanisms only use the data available on the device, they tend to take effect almost immediately. Global convergence protocols however are slow, sometimes even taking seconds to converge (\cite{Liu.2013}). This is because they collect information about the network by communicating with multiple devices, recompute routes for all affected parts of the network and deploy these flows on routers and switches.
FRMs like ShortCut (\cite{Shukla.2021}) try to alleviate this issue by removing longer paths from the routings only using data available on the device, bridging the gap between FRR and the global convergence protocol.
As technologies like ShortCut are relatively recent contributions, there is not much data evaluating their performance on a network.
Most FRR approaches however create sub-optimal paths which may already be in use or contain loops, effectively reducing the performance of the network.
FRMs like ShortCut (\cite{Shukla.2021}), Resilient Routing Layers (\cite{Kvalbein.2005}), Revive (\cite{Haque.2018}) and Blink (\cite{ThomasHolterbach.2019}) try to alleviate this issue by removing longer paths from the routing using only data available on the device, bridging the gap between FRR and the global convergence protocol.
\section{State of the art}
In modern networks many network administrators make use of a combination of FRR and global convergence protocols. Due to this there have been several proposals for optimizations using FRMs like ShortCut, resilient routing layers (
Until the global convergence protocol converges, it leaves the routing to in-network methods like FRR, which reroute traffic according to pre-defined alternative routes on the network. In some cases however, methods like FRR cause routing paths to be longer than necessary, which produces additional traffic on the network and adds delay to transmissions.
Resilient Routing Layers pre-computes alternative routing tables and switches between them in case of a failure, but needs to manipulate packets to inform routers of the changed routing table.
ShortCut uses information about an incoming packet to determine whether or not the packet has returned to the router, building on already existing FRR implementations. If a packet returns, it removes the route with the highest priority from the routing table, assuming that this path is no longer available.
Revive installs backup routes prior
Older FRMs have already been evaluated thoroughly, and even though they work in theory they either have yet to see widespread implementation or face limitations in their applicability, be it by requiring a large amount of resources or by using e.g. packet manipulation, excluding networks which by structure are incompatible with such mechanisms.
Even though some FRMs were released and discussed more than a decade ago, they have yet to see widespread implementation, as they face limitations either in their applicability to networks, e.g. because they require packet manipulation, or in their resource usage.
\section{Contribution}
