
\chapter{Introduction}
\label{introduction}

\section{Motivation}

In recent years, and especially during the COVID-19 pandemic, network usage has risen sharply. In Germany alone, the per capita data usage on the terrestrial network rose from \SI{98}{\giga\byte} per month in 2017 to \SI{175}{\giga\byte} per month in 2020 (\cite{BundesnetzagenturDeutschland.2021}).

A large part of the population suddenly had to spend additional time at home, which contributed to this rise in data usage. However, this development is not limited to the pandemic: data usage has been rising steadily due to the popularity of streaming services, increased internet usage in daily life and the growing popularity of cloud-based services.

Because of this increased usage, network failures cause increasingly severe social and economic costs, which makes the reliability of networks as important as ever.

Failures in networks will always occur, whether they are caused by faulty hardware, by software errors or by human error. In addition, network maintenance regularly reduces a network's performance or even makes the whole network unavailable.

Network administrators use a multitude of methods to increase performance, reduce the impact of failures and achieve the highest possible availability and reliability. Two of these methods are global convergence protocols such as Open Shortest Path First (OSPF) (\cite{Moy.041998}), running either on the routers themselves or on a controller in a software-defined network (SDN), and Fast Re-Routing (FRR) approaches (\cite{Chiesa.2021}).

The key difference between the two is the time they take to become effective. Because FRR mechanisms only use data that is available locally on the device, they tend to take effect almost immediately. Global convergence protocols, however, are slow and sometimes take seconds to converge (\cite{Liu.2013}). This is because they have to collect information about the network by communicating with multiple devices, recompute the routes for all affected parts of the network and deploy the resulting flows on the routers and switches.

However, most FRR approaches create sub-optimal paths, which may already be in use or contain loops, effectively reducing the performance of the network.
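
The following sketch illustrates how such a sub-optimal detour can arise. It assumes a hypothetical four-node topology with link costs and a simplified FRR rule: every router holds a precomputed, priority-ordered list of next hops per destination and, using only its local link state, forwards a packet to the first reachable next hop that is not the one the packet arrived from, or sends it back if no such next hop exists. The topology, this forwarding rule and the function names \texttt{frr\_forward} and \texttt{shortest\_path} are illustrative assumptions and do not model any specific FRR scheme from the literature.

\begin{lstlisting}[language=Python]
import heapq

# Hypothetical undirected topology: (node, node) -> link cost.
LINKS = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 1, ("C", "D"): 2}

# Assumed precomputed next hops towards destination "D", highest priority first.
FRR_TABLE = {"A": ["B", "C"], "B": ["D", "A"], "C": ["D"]}


def adjacency(links, failed=frozenset()):
    """Adjacency map {node: {neighbour: cost}} without the failed links."""
    adj = {}
    for (u, v), cost in links.items():
        if (u, v) in failed or (v, u) in failed:
            continue
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj


def shortest_path(links, src, dst, failed=frozenset()):
    """Dijkstra on the post-failure topology, i.e. the route a global
    convergence protocol would eventually install."""
    adj = adjacency(links, failed)
    queue, visited = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, c in adj.get(node, {}).items():
            if nxt not in visited:
                heapq.heappush(queue, (cost + c, nxt, path + [nxt]))
    return None


def frr_forward(links, table, src, dst, failed, max_hops=16):
    """Hop-by-hop forwarding where each router only sees its own links."""
    adj = adjacency(links, failed)
    node, came_from, path, cost = src, None, [src], 0
    while node != dst:
        if len(path) > max_hops:
            return None  # give up on forwarding loops
        live = [n for n in table.get(node, []) if n in adj.get(node, {})]
        preferred = [n for n in live if n != came_from]
        nxt = preferred[0] if preferred else came_from  # bounce back if stuck
        if nxt is None:
            return None  # stuck at the source, nowhere to send the packet
        cost += adj[node][nxt]
        path.append(nxt)
        came_from, node = node, nxt
    return cost, path


failed = {("B", "D")}
print(frr_forward(LINKS, FRR_TABLE, "A", "D", failed))  # (5, ['A', 'B', 'A', 'C', 'D'])
print(shortest_path(LINKS, "A", "D", failed))           # (3, ['A', 'C', 'D'])
\end{lstlisting}

After the link between B and D fails, the FRR path sends the packet from B back to A before it reaches D via C (total cost 5), whereas the shortest path a global convergence protocol would eventually compute goes directly from A via C (cost 3).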

Fast Recovery Methods (FRMs) such as ShortCut (\cite{Shukla.2021}), Resilient Routing Layers (\cite{Kvalbein.2005}), Revive (\cite{Haque.2018}) and Blink (\cite{ThomasHolterbach.2019}) try to alleviate this issue by removing these longer paths from the routing tables using only data available on the device, thus bridging the gap between FRR and the global convergence protocol.

\section{State of the art}

Until the global convergence protocol has converged, routing is left to in-network methods such as FRR, which reroute traffic along pre-defined alternative routes in the network. In some cases, however, these methods make routing paths longer than necessary, which produces additional traffic on the network and adds delay to transmissions.

Resilient Routing Layers precomputes alternative routing tables and switches between them in case of a failure, but needs to manipulate packets to inform other routers which routing table is in use.
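
A minimal sketch of this layer-switching idea follows. As a simplification, each layer is modelled as the full topology minus a set of excluded links rather than being constructed as in the original proposal, and the per-layer routing tables are computed with Dijkstra's algorithm towards a single destination. When a router finds the link to its next hop failed, it switches to a layer that excludes this link and writes the layer identifier into a \texttt{layer} field of the packet, so that downstream routers continue in the same layer; this field stands in for the packet manipulation described above. The topology, the layers and the function names are hypothetical.

\begin{lstlisting}[language=Python]
import heapq

# Hypothetical topology (link -> cost) and layers (layer id -> excluded links).
LINKS = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 1, ("C", "D"): 2, ("B", "C"): 1}
LAYERS = {0: set(), 1: {("B", "D")}, 2: {("C", "D")}}


def adjacency(links, excluded):
    """Adjacency map of one layer: all links except the excluded ones."""
    adj = {}
    for (u, v), cost in links.items():
        if (u, v) in excluded or (v, u) in excluded:
            continue
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj


def next_hops(adj, dst):
    """Dijkstra from dst; returns {node: next hop towards dst}."""
    dist, hop, queue = {dst: 0}, {}, [(0, dst)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for nb, cost in adj.get(node, {}).items():
            if d + cost < dist.get(nb, float("inf")):
                dist[nb], hop[nb] = d + cost, node
                heapq.heappush(queue, (d + cost, nb))
    return hop


# Precomputed once: one routing table per layer, all towards destination "D".
TABLES = {lid: next_hops(adjacency(LINKS, excl), "D") for lid, excl in LAYERS.items()}


def forward(tables, layers, src, dst, failed, max_hops=16):
    packet = {"dst": dst, "layer": 0}  # the layer id is carried in the packet
    node, path = src, [src]
    for _ in range(max_hops):
        if node == dst:
            return path, packet["layer"]
        nxt = tables[packet["layer"]].get(node)
        if nxt is None:
            return None  # no route in the current layer
        link = {(node, nxt), (nxt, node)}
        if link & failed:
            # Local failure: pick a layer that avoids this link and mark the packet.
            alt = next((lid for lid in sorted(layers)
                        if link & layers[lid] and node in tables[lid]), None)
            if alt is None:
                return None
            packet["layer"] = alt
            continue
        path.append(nxt)
        node = nxt
    return None


print(forward(TABLES, LAYERS, "A", "D", {("B", "D")}))  # (['A', 'B', 'C', 'D'], 1)
\end{lstlisting}

Router B detects the failed link to D, switches the packet to layer 1 and forwards it via C; every router after B reads the layer from the packet instead of recomputing routes.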

ShortCut builds on existing FRR implementations and uses information about an incoming packet to determine whether the packet has returned to the router. When a packet returns, ShortCut removes the route with the highest priority from the routing table, assuming that this path is no longer available.
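
The pruning step can be sketched as follows for a single router. The sketch assumes that the information used to detect a returning packet is the port on which it arrives, that ports are identified by the name of the neighbouring router, and that the routing table holds a priority-ordered list of next hops per destination; the class \texttt{ShortCutRouter} is a hypothetical model, not the original implementation.

\begin{lstlisting}[language=Python]
class ShortCutRouter:
    """Hypothetical single-router model of the ShortCut pruning idea."""

    def __init__(self, routes):
        # routes: {destination: [next hop, ...]}, highest priority first
        self.routes = {dst: list(hops) for dst, hops in routes.items()}

    def forward(self, dst, in_port, live_ports):
        hops = self.routes[dst]
        # A packet arriving from the current highest-priority next hop has been
        # sent back by FRR further downstream: assume that path is broken and
        # remove the route, keeping at least one entry.
        if len(hops) > 1 and in_port == hops[0]:
            hops.pop(0)
        # Forward on the first next hop whose local link is up, never straight
        # back to the sender; None means the packet is dropped.
        for hop in hops:
            if hop in live_ports and hop != in_port:
                return hop
        return None


router_a = ShortCutRouter({"D": ["B", "C"]})
live = {"B", "C"}  # both local links of A are up; the failure lies behind B
print(router_a.forward("D", "host", live))  # prints B: first packet uses the primary route
print(router_a.forward("D", "B", live))     # prints C: packet returned from B, route removed
print(router_a.forward("D", "host", live))  # prints C: later packets skip the detour
\end{lstlisting}

Without the pruning step, every subsequent packet would first travel to B and back before being forwarded to C, even though A's own links never failed.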

Revive installs backup routes prior


Some FRMs were proposed and discussed more than a decade ago and have since been evaluated thoroughly. Even though they work in theory, they have yet to see widespread implementation, as they face limitations either in their resource usage or in their applicability: mechanisms that rely on packet manipulation, for example, cannot be used in networks whose structure is incompatible with it.


\section{Contribution}

\begin{itemize}
    \item In this context, we use Mininet, a tool for creating virtual networks, to implement and test these recovery methods.
    \item By creating multiple network structures and failure scenarios, we evaluate these mechanisms and compare them based on their performance.
\end{itemize}