small changes

master
Frederik Maaßen 2 years ago
parent ad0b71719d
commit 427d477d2d
  1. implementation/topologies/4r4h_topo.py (2 changed lines)
  2. thesis/content/basics/modern_networks.tex (2 changed lines)
  3. thesis/content/basics/test_environment.tex (21 changed lines)
  4. thesis/content/evaluation/minimal_network.tex (9 changed lines)
  5. thesis/content/introduction.tex (11 changed lines)

@@ -407,7 +407,7 @@ class FourRoutersFourHosts(CustomTopo):
"execute": {
"use_pre_defined_function": True,
"command": ("measure_packet_flow", (
- 'h1', 'h4', '10.4.0.101', ["r1", "r2", "r3", "r4"], 30, 1, "concurrent_failure", [0, 2000], "Packet flow on all routers before failure", "tcp", 100)),
+ 'h1', 'h4', '10.4.0.101', ["r1", "r2", "r3", "r4"], 30, 1, "concurrent_failure", [0, 3000], "Packet flow on all routers before failure", "tcp", 100)),
},
"failures": [

@@ -2,7 +2,7 @@
\label{sec:modern_networks}
In our digital society, networks have become an essential infrastructure for countries worldwide. A huge part of the population today is in some form reliant on the availability of networks and associated services, be it on their smartphone or their home internet access. As such, the reliability of a network has a huge impact on the economy and social life.
- A study in 2015 (\cite{Montag.2015}) found that their participants used WhatsApp, a messaging service, for around 32 minutes a day, with an overall usage mean of their smartphone of around 162 minutes a day, mostly spent online. Private and commercial users alike cause a huge amount of traffic. A forecast for the traffic usage in Germany by Cisco for the year 2022 estimated the per capita traffic at 101.7 GB. The traffic per capita in 2017 was around 39.2 GB. Together with the rise of e.g. cloud based solutions for companies, network requirements are expected to rise exponentially.
+ A study in 2015 (\cite{Montag.2015}) found that their participants used WhatsApp, a messaging service, for around 32 minutes a day, with an overall usage mean of their smartphone of around 162 minutes a day, mostly spent online. Private and commercial users alike cause a huge amount of traffic. The German federal network agency reported a per capita data usage on the terrestrial network in Germany of \SI{175}{\giga\byte} per month (\cite{BundesnetzagenturDeutschland.2021}). The traffic per capita in 2017 was around \SI{98}{\giga\byte} per month. Together with the rise of e.g. cloud-based solutions for companies, network requirements are expected to rise exponentially.
With bigger networks and higher traffic, the work of a network administrator is getting more complex by the day. Modern networks need to be flexible, scalable and reliable. Configuration changes should be applied near instantly, failures should be corrected as fast as possible and new components should only have to be connected to the existing network, with all additional configuration being applied automatically or only requiring a minimum amount of manual work.

@@ -58,36 +58,37 @@ In summary, a non-specialized network is evaluated by it's bandwidth, link usage
\subsubsection{Measuring bandwidth}
\label{measuring_bandwidth}
- Measuring bandwidth requires sending arbitrary data over the network. Tools like iperf allow for easy testing of bandwidth.
+ Measuring bandwidth requires sending arbitrary data over the network. Tools like \textit{iperf} allow for easy testing of bandwidth.
The process of testing includes starting a server on a receiving device and starting a client on the sending device. Over a certain period data is then sent over the network and both the client and the server will log the transfer rate per second.
- After the pre defined time the transfers will stop and the client instance of iperf will shut down, after printing out the average transfer rate. The server instance will have to be shut down manually.
+ After the predefined time the transfer will stop and the client instance of \textit{iperf} will shut down after printing out the average transfer rate. The server instance will have to be shut down manually.
- By default iperf will use tcp to send data, but when using the "-u" flag it will instead transfer data with udp. When using udp iperf requires an additional bandwidth parameter, which will specify how much data will be sent over the network. This is done because protocols like TCP use flow control to limit the amount of data sent on the capabilities of the receiving device. A slower device like a mobile phone will e.g. limit the data transfer to not get overwhelmed. Protocols like UDP do not provide any flow control and therefore iperf has to limit the used bandwidth itself.
+ By default \textit{iperf} will use TCP to send data, but when using the "-u" flag it will instead transfer data with UDP. When using UDP, \textit{iperf} requires an additional bandwidth parameter, which specifies how much data will be sent over the network. This is done because protocols like TCP use flow control to limit the amount of data sent based on the capabilities of the receiving device. A slower device like a mobile phone will e.g. limit the data transfer to not get overwhelmed. Protocols like UDP do not provide any flow control and therefore \textit{iperf} has to limit the used bandwidth itself. Using "-b 0" will cause \textit{iperf} to send as many UDP packets as possible.
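To make the procedure concrete, the following is a minimal sketch of such a measurement against the Mininet Python API; the SingleSwitchTopo topology and host names are placeholder assumptions, not the thesis framework:

from mininet.net import Mininet
from mininet.topo import SingleSwitchTopo

# Minimal sketch of a TCP bandwidth test between two Mininet hosts.
net = Mininet(topo=SingleSwitchTopo(2))
net.start()
h1, h2 = net.get('h1', 'h2')
h2.cmd('iperf -s &')                          # start the server in the background
print(h1.cmd('iperf -c %s -t 30' % h2.IP()))  # client sends for 30 seconds, prints rates
h2.cmd('kill %iperf')                         # the server instance must be stopped manually
net.stop()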
\subsubsection{Measuring latency}
- Measuring latency can be done most easily by using the ping program (package, command?) and sending multiple pings in a certain interval from a sending device to a receiving device. The sending device will send an ICMP echo request packet to the receiving device over the network and if the packet was received, the receiving device will answer with an ICMP echo reply packet. After each sent packet, the sender will wait for the reply and log the time difference between sending and receiving.
+ Measuring latency can be done most easily by using the \textit{ping} application included in most operating systems and sending multiple "pings" at a certain interval from a sending device to a receiving device. The sending device will send an ICMP echo request packet to the receiving device over the network and, if the packet was received, the receiving device will answer with an ICMP echo reply packet. After each sent packet, the sender will wait for the reply and log the time difference between sending and receiving.
A ping test can be run for a given time during which a failure can be introduced.
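As an illustration, such a latency measurement could look as follows with the Mininet Python API; this is a sketch assuming a running network object net with hosts h1 and h4, not the framework's actual code:

# Minimal sketch: 30 ICMP echo requests, one per second, from h1 to h4.
# A failure introduced while ping runs shows up as missing replies.
h1, h4 = net.get('h1', 'h4')
print(h1.cmd('ping -c 30 -i 1 %s' % h4.IP()))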
\subsubsection{Measuring influence of link usage}
\label{measure_link_usage}
- There are several approaches on measuring the influence of traffic on a link. The first approach is letting a file transfer run from router 2 to router 1 and simultaneously run an \textit{iperf} test from router 2 to router 3 as can be seen in figure \ref{fig:link_usage}. This would cause the connection between router 1 and router 2 to be strained by both data transfers. They would most likely try to use the maximum amount of bandwidth they can acquire as they are unable to exactly specify which transfer should be prioritized or if they should split the available bandwidth equally.
\begin{figure}
\centering
- \fbox{\includegraphics[width=10cm]{link_usage}}
+ \includegraphics[width=10cm]{link_usage}
\caption{Link usage}
\label{fig:link_usage}
\end{figure}
- A far more controlled approach would be to limit the link between router 1 and router 2 to e.g. 50\% of the available bandwidth, simulating an already running and equally prioritized data transfer.
+ One approach to measure the influence of two data flows on each other is to start two \textit{iperf} measurements simultaneously, with one using host H1 as client and host H4 as server, and the other using host H2 as client and host H1 as server. When a failure is introduced, both data flows will pass the link between routers R1 and R2 in the same direction, as can be seen in \cref{fig:link_usage}. Data flows sent in opposing directions do not influence each other, as Ethernet uses full-duplex connections.
+ This would cause the connection between routers R1 and R2 to be strained by both data transfers. Both transfers would most likely try to use the maximum amount of bandwidth they can acquire, as it is not possible to specify which transfer should be prioritized or whether they should split the available bandwidth equally.
+ With this in mind, we decided to run additional data transfers on the network. The possible inconsistencies are taken into account when evaluating the results.
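A possible sketch of the two simultaneous measurements, again against the Mininet Python API with the host names from the figure (assumed, not the framework's implementation):

# Two concurrent iperf flows: h1 -> h4 and h2 -> h1. After the failure both
# flows pass the link between R1 and R2 in the same direction.
h1, h2, h4 = net.get('h1', 'h2', 'h4')
h4.cmd('iperf -s &')                             # server for the h1 -> h4 flow
h1.cmd('iperf -s &')                             # server for the h2 -> h1 flow
flow1 = h1.popen('iperf -c %s -t 30' % h4.IP())  # non-blocking client processes
flow2 = h2.popen('iperf -c %s -t 30' % h1.IP())
print(flow1.communicate()[0])                    # per-second rates of flow 1
print(flow2.communicate()[0])                    # per-second rates of flow 2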
\subsubsection{Measuring packet loss}
- In a virtual network packet loss will not be caused by faulty devices on an otherwise unknown route as each component is assumed to work correctly. The main cause of packet loss in such a network would be an overflow of packet queues on the routers, where full queues would cause additional packets to be dropped. Because by default all routers would have an virtually unlimited packet queue, only limited by the performance of the host machine, Mininet allows for the configuration of a limit queue. (Ping will only send a few packets, not being able to fill up the queue, how should I go about this?)
+ In a virtual network packet loss will not be caused by faulty devices on an otherwise unknown route, as each component is assumed to work correctly. The main causes of packet loss in such a network would be either an overflow of packet queues on the routers, where full queues cause additional packets to be dropped, or the introduction of failures and the resulting waiting times caused by these configuration changes. In most networks packet queues should be configured according to the network requirements and will most likely be large enough for the throughput that has to be achieved. This is why we focus on packet loss caused by failures.
+ We have to keep in mind that some of this packet loss might be caused by the configuration changes which are used to simulate a link failure in the virtual network. As such, packet loss will most likely only occur shortly after a failure is introduced, or, in case UDP is used in a bandwidth measurement, if the bandwidth that has to be achieved overloads the packet queues of the routers.
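Where queue overflow is to be provoked deliberately, Mininet can limit a link's queue explicitly; a minimal sketch with arbitrary parameter values:

from mininet.net import Mininet
from mininet.link import TCLink
from mininet.topo import Topo

# Minimal sketch: a 10 Mbps link whose queue holds at most 100 packets, so a
# sustained overload (e.g. an unlimited UDP iperf flow) produces packet loss.
topo = Topo()
h1, h2 = topo.addHost('h1'), topo.addHost('h2')
topo.addLink(h1, h2, bw=10, max_queue_size=100)
net = Mininet(topo=topo, link=TCLink)
net.start()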
\subsubsection{Monitoring packet flow}
Network mechanisms like ShortCut influence the routing and therefore the flow of packets in the network. To gain an overview of the routes packets take, we can analyse this packet flow by measuring the number of IP packets passing each router.
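One conceivable way to obtain these counts, sketched here as an assumption rather than the framework's actual implementation, is to read the iptables packet counters of each router's FORWARD chain:

# Sketch: count IP packets forwarded by each router during a measurement.
routers = ['r1', 'r2', 'r3', 'r4']
for name in routers:
    net.get(name).cmd('iptables -Z FORWARD')   # zero the counters beforehand
# ... run the measurement, e.g. an iperf transfer, here ...
for name in routers:
    # -v -x -n prints exact packet and byte counters without name resolution
    print(name, net.get(name).cmd('iptables -L FORWARD -v -x -n'))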

@@ -39,13 +39,14 @@ We performed a TCP bandwidth test on the minimal network, a topology with 4 rout
\label{fig:evaluation_minimal_bandwidth_concurrent_wo_sc}
\end{figure}
- In \cref{fig:evaluation_minimal_bandwidth_concurrent_wo_sc} however we introduced the failure while the bandwidth test was running. The test was run for \SI{30}{\second} and the failure was introduced at around 15 seconds, which caused a very small drop in performance. The log output of the sending client reported the need to resend 22 packets in this time period; in all transfers before no packet loss occurred.
+ In \cref{fig:evaluation_minimal_bandwidth_concurrent_wo_sc} however we introduced the failure while the bandwidth test was running. The test was run for \SI{30}{\second} and the failure was introduced at around 15 seconds, which caused no visible performance drop. However, in some executions of this test the performance dropped when introducing the failure and the log output of the sending client reported the need to resend up to 100 packets. Because this behaviour only occurs sporadically, we assume it to be a timing issue.
In addition to the already deployed transfer limit on the links between routers and hosts, we also added the bandwidth parameter "-b" to the execution of the \textit{iperf} client and limited the throughput to \SI{100}{\mega\bit\per\second}. This was done because in some measurements we experienced bursts in the bandwidth test after we introduced a failure concurrent to the bandwidth test, as can be seen in \cref{fig:evaluation_minimal_bandwidth_concurrent_wo_sc}, exceeding the limit of the network by more than 50\%. Unfortunately the additional limit did not change the behaviour. Upon further investigation we found one possible reason for this burst.
- When the connection between routers is cut, our test framework uses the Mininet python API to deactivate the corresponding interfaces on both affected routers. This is done in sequence. In this example the interface on router R2 was deactivated first and the interface on router R4 was deactivated second. We implemented this behaviour after observing the default behaviour of the Mininet network. If the connection between e.g. router R2 and router R4 was only cut by deactivating the interface on router R4, router R2 would not recognize the failure and would loose all packets sent to the link. Because we deactivate the interfaces in sequence and the Mininet python api introduces delay to the operation, the interface on R2 will be deactivated while the interface on R4 will continue receiving packets already on the link and will continue sending packets to the deactivated interface on R2 for a short period of time. All packets sent to R2 in this time period will be lost. But because the \textit{iperf} server itself does not send any actual data, but only acknowledgements (ACK) for already received data, only ACKs are lost this way.
+ When the connection between routers is cut, our test framework uses the Python API to deactivate the corresponding interfaces on both affected routers. This is done in sequence. In this example the interface on router R2 was deactivated first and the interface on router R4 was deactivated second. We implemented this behaviour after observing the default behaviour of the Mininet network. E.g. if the connection between router R2 and router R4 was only cut by deactivating the interface on router R4, router R2 would not recognize the failure and would lose all packets sent to the link. Because we deactivate the interfaces in sequence and the Mininet Python API introduces delay to the operation, the interface on R2 will be deactivated while the interface on R4 will continue receiving packets already on the link and will continue sending packets to the deactivated interface on R2 for a short period of time. All packets sent to R2 in this time period will be lost. But because the \textit{iperf} server itself does not send any actual data, but only acknowledgements (ACK) for already received data, only ACKs are lost this way.
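The described sequence could look roughly like this; the interface names are assumptions and the framework's actual calls may differ:

# Sketch: cut the link between R2 and R4 by taking both interface ends down
# in sequence; packets still in flight towards R2 during the gap are lost.
r2, r4 = net.get('r2', 'r4')
r2.cmd('ip link set r2-eth2 down')  # R2 side is deactivated first
r4.cmd('ip link set r4-eth1 down')  # R4 side follows after the API's delay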
- TCP (\cite{InformationSciencesInstituteUniversityofSouthernCalifornia.1981}) however does not necessarily resend lost ACKs, and the client does not necessarily resend all packets for which he did not receive an ACK. Data for which the ACKs were lost could still be implicitly acknowledged by the server if they e.g. belonged to the same window as following packets and the ACKs for these packets were received by the client. This could cause a situation in which the server already received data, but the client only receives a notification of the success of the transfer with a delay.
+ TCP (\cite{InformationSciencesInstituteUniversityofSouthernCalifornia.1981}) however does not necessarily resend lost ACKs, and the client does not necessarily resend all packets for which it did not receive an ACK. Data for which the ACKs were lost could still be implicitly acknowledged by the server if it e.g. belonged to the same window as following packets and the ACKs for these packets were received by the client. This could cause a situation in which the server has already received data, but the client only receives a notification of the success of the transfer with a delay. Depending on the exact behaviour of \textit{iperf}, it can be speculated that this is one possible cause of this measurement discrepancy.
This could cause some test runs to produce more packet loss than others, with most transfers experiencing no packet loss at all.
In our further tests we observed that the bandwidth alone does not change heavily when using different types of topologies. This is why we omit the evaluation of the bandwidth in further topologies.

@@ -3,6 +3,17 @@
\section{Motivation}
+ In recent years, especially during the COVID-19 pandemic, network usage has risen exponentially. In Germany alone the per capita data usage on the terrestrial network has risen from \SI{98}{\giga\byte} per month in 2017 to \SI{175}{\giga\byte} in 2020 (\cite{BundesnetzagenturDeutschland.2021}).
+ Many workers stayed at home, worked from home or were otherwise limited in their movement, which has contributed to this rise in data usage. But this development is not limited to the pandemic. Data usage has been rising constantly due to the popularity of streaming services, increased internet usage in daily life and the rising popularity of cloud-based services.
+ Because of the increased usage, failing networks cause increasingly severe social and economic costs. This is why the reliability of networks is more important than ever.
+ Failures in networks will always occur, be it through failing hardware, errors in software or human error. In addition, the maintenance of networks will regularly reduce a network's performance or cause the whole network to be unavailable.
+ Network administrators use a multitude of ways to increase performance, reduce the impact of failures and achieve the highest possible availability and reliability. Two of these methods are the usage of global convergence protocols like Open Shortest Path First (OSPF) (\cite{Moy.041998}) or similar methods, either on the routers themselves or on a controller in a software-defined network (SDN), and the usage of Fast Re-Routing (FRR) and Fast Recovery Methods (FRM), which are operations limited to the data contained on a device.
+ As global convergence protocols are very slow, sometimes taking seconds to converge (\cite{Liu.2013}), they leave the routing during the time of route calculation to in-network methods like FRR, which reroute traffic according to pre-defined alternative routes on the network. In some cases however, methods like FRR cause routing paths to be longer than necessary, which produces additional traffic on the network and adds delay to transmissions.
- - failures in networks will occur undeniably, so they need to be adressed