2022-10-25
Open Thesis Topics
LS9, TU Dortmund
{\huge\bfseries \par}
Simon Kluettermann
Master Thesis in Physics
{\Large submitted to the \par}
Faculty of Mathematics, Computer Science and Natural Sciences
{\Large \par}
RWTH Aachen University
Department of Physics
Institute for Theoretical Particle Physics and Cosmology
{ \Large\par}
First Referee: Prof. Dr. Michael Kraemer
Second Referee: Prof. Dr. Felix Kahlhoefer
November 2020
\item First: Find a topic and a supervisor
\item Work on it for one month, to make sure that
\item you still like your topic
\item and you are sure you can handle it
\item Then give a short presentation to our chair (15 min, relaxed)
\item Get some feedback and suggestions
\item Afterwards, register the thesis
\item (This is different for CS/DS students)
\item Problem: We cannot supervise more than two students at the same time (CS faculty rules)
\item First: A short summary of each topic
\item Then time for questions and to talk with your supervisor about each topic that sounds interesting
\item Your own topics are always welcome ;)
\item I'm working on anomaly detection (AD)
\item That means characterising an often very complex distribution in order to find events that don't match it
\item The kNN algorithm can also be used for AD
\item The further away the $k$-th closest point is, the more anomalous a sample is considered
\item $r_k=\frac{k}{2N\cdot pdf}$ (in one dimension, with $N$ known samples)
\item Powerful method, as it can model the pdf directly
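The kNN score described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the implementation used in the group: the function name \texttt{knn\_score}, the Euclidean metric and the toy data are assumptions made for this example. The query points are not part of the training set, so no point is its own neighbour.

```python
import numpy as np

def knn_score(train, query, k=1):
    """Distance from each query point to its k-th nearest training point."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=-1)
    # Sort each row and pick the k-th smallest distance (k is 1-based).
    return np.sort(dists, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 2))            # toy "normal" samples (assumption)
query = np.array([[0.0, 0.0], [6.0, 6.0]])   # one typical point, one far outlier
scores = knn_score(train, query, k=3)
print(scores)  # the second score should be clearly larger than the first
```

A larger score means the $k$-th neighbour is further away, so the sample lies in a region of lower estimated density.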
\item The model (mostly) ignores every known sample except one
\item So there are extensions
\item $avg=\frac{1}{N} \sum_i knn_i(x)$
\item $wavg=\frac{1}{N} \sum_i \frac{knn_i(x)}{i}$
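The two extensions can be sketched as follows, assuming that $N$ in the formulas is the number of neighbours that are averaged over and that $knn_i(x)$ is the distance to the $i$-th nearest training sample. The function names and the Euclidean metric are choices made for this illustration only.

```python
import numpy as np

def knn_distances(train, query, n_neighbours):
    """Sorted distances from each query point to its first n_neighbours neighbours."""
    dists = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=-1)
    return np.sort(dists, axis=1)[:, :n_neighbours]

def avg_score(train, query, n_neighbours=5):
    # avg = 1/N * sum_i knn_i(x)
    return knn_distances(train, query, n_neighbours).mean(axis=1)

def wavg_score(train, query, n_neighbours=5):
    # wavg = 1/N * sum_i knn_i(x) / i, with i = 1..N
    d = knn_distances(train, query, n_neighbours)
    ranks = np.arange(1, n_neighbours + 1)
    return (d / ranks).mean(axis=1)

# Tiny worked example: neighbour distances from x = 0 are 1, 2 and 4.
train = np.array([[1.0], [2.0], [4.0]])
query = np.array([[0.0]])
print(avg_score(train, query, 3))    # (1 + 2 + 4) / 3
print(wavg_score(train, query, 3))   # (1/1 + 2/2 + 4/3) / 3
```

The weighted variant gives the closest neighbours the largest influence, while the plain average treats all $N$ distances equally.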
Dataset & wavg & avg & $k=1$ & $k=3$ & $k=5$ \\
vertebral & \textbf{0.4506} & \textbf{0.4506} & \textbf{0.4667} & \textbf{0.4667} & \textbf{0.45} \\
... & & & & & \\
thyroid & \textbf{0.9138} & \textbf{0.9151} & \textbf{0.8763} & \textbf{0.9086} & \textbf{0.914} \\
Iris\_setosa & \textbf{0.9333} & \textbf{0.9333} & \textbf{0.9333} & \textbf{0.9} & \textbf{0.9} \\
breastw & \textbf{0.9361} & \textbf{0.9361} & \textbf{0.9211} & \textbf{0.9248} & \textbf{0.9286} \\
wine & \textbf{0.95} & \textbf{0.95} & \textbf{0.9} & \textbf{0.95} & \textbf{0.95} \\
pendigits & \textbf{0.9487} & \textbf{0.9487} & \textbf{0.9391} & \textbf{0.9295} & \textbf{0.9359} \\
segment & \textbf{0.9747} & \textbf{0.9747} & \textbf{0.9495} & \textbf{0.9545} & \textbf{0.9394} \\
banknote-authentication & \textbf{0.9777} & \textbf{0.9776} & \textbf{0.9408} & \textbf{0.943} & \textbf{0.9583} \\
vowels & \textbf{0.9998} & \textbf{0.9972} & \textbf{0.99} & \textbf{0.92} & \textbf{0.93} \\
Ecoli & \textbf{1.0} & \textbf{1.0} & \textbf{0.9} & \textbf{1.0} & \textbf{1.0} \\
 & & & & & \\
Average & \textbf{0.7528} & \textbf{0.7520} & 0.7325 & 0.7229 & 0.7157 \\
\item Evaluating an anomaly detector is complicated
\item It requires known anomalies
\item $\Rightarrow$ So evaluate it as a density estimator instead
\item This does not require anomalies
\item It allows generating unlimited amounts of training data
\item Collect extensions of the oc-kNN algorithm
\item Define a distance measure to a known pdf
\item Generate random data points following the pdf
\item Evaluate which algorithm recovers the pdf best
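The evaluation idea can be sketched for the simplest case. Everything specific here is an assumption chosen for illustration: a one-dimensional standard normal as the known pdf, the mean absolute error on a grid as the distance measure, and the plain $k$-th neighbour estimate $pdf \approx \frac{k}{2N r_k}$ as the only algorithm under test.

```python
import numpy as np

def knn_density_1d(train, query, k):
    """1D kNN density estimate: pdf(x) ~ k / (2 * N * r_k(x))."""
    dists = np.abs(query[:, None] - train[None, :])
    r_k = np.sort(dists, axis=1)[:, k - 1]   # distance to the k-th neighbour
    return k / (2 * len(train) * r_k)

def true_pdf(x):
    # Known pdf used to generate the data: a standard normal (assumption).
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(0)
train = rng.normal(size=5000)            # random data points following the pdf
grid = np.linspace(-2.0, 2.0, 81)        # points at which estimates are compared

def mean_abs_error(k):
    # Hypothetical distance measure between estimated and known pdf.
    est = knn_density_1d(train, grid, k)
    return np.mean(np.abs(est - true_pdf(grid)))

for k in (1, 10, 100):
    print(f"k={k:3d}  mean abs error={mean_abs_error(k):.4f}")
```

The same loop could run over the collected oc-kNN extensions instead of over $k$, with a different known pdf or distance measure swapped in.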
\item Knowledge of Python ( sum([i for i in range(5) if i\%2]) )
\item Ideally including NumPy
\item Basic university-level math (you could argue that $r_k \propto \frac{k}{pdf}$)
\item Ideally some experience working on an SSH server
\item $\Rightarrow$ Well suited as a Bachelor thesis
\item For a Master thesis, I would extend this a bit (could you also find $k$?)
\item Deep learning method in which the output is normalised
\item $\int f(x) dx=1 \; \forall f(x)$
\item Can be used to estimate probability density functions
\item $\Rightarrow$ Thus useful for AD
\item $\int f(h(x)) \left|\det\frac{\partial h}{\partial x}\right| dx=1 \; \forall$ invertible $h(x)$
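The normalisation property can be checked numerically in one dimension. The sketch below assumes a standard normal base density $f$ and a hypothetical one-layer affine map $h(x)$; the model density is $f(h(x))\,|dh/dx|$, and its integral over $x$ should stay close to 1 whatever parameters are chosen. A real normalising flow would stack many learned invertible layers instead.

```python
import numpy as np

def base_pdf(h):
    # f: standard normal density in the transformed space.
    return np.exp(-0.5 * h**2) / np.sqrt(2 * np.pi)

def flow(x, shift=1.5, log_scale=0.7):
    """Hypothetical affine flow h(x) together with |dh/dx|."""
    h = (x - shift) * np.exp(-log_scale)
    abs_jac = np.exp(-log_scale) * np.ones_like(x)
    return h, abs_jac

def model_pdf(x, **params):
    h, abs_jac = flow(x, **params)
    return base_pdf(h) * abs_jac     # p(x) = f(h(x)) * |dh/dx|

x = np.linspace(-40.0, 40.0, 200_001)
dx = x[1] - x[0]
settings = ({}, {"shift": -3.0, "log_scale": 1.2}, {"shift": 0.0, "log_scale": -1.0})
for params in settings:
    total = np.sum(model_pdf(x, **params)) * dx   # Riemann sum of the density
    print(params, total)
```

Because the Jacobian factor compensates for any stretching or squeezing by $h$, the network can change the shape of the density freely without ever breaking normalisation.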
\item How to apply this to graphs?
\item One paper (Liu 2019) uses two neural networks:
\item Autoencoder: graph $\Rightarrow$ vector
\item Normalising flow (NF) on the vector data
\item This works, but it is not really graph-specific
\item No interaction between encoding and transformation
\item So why not do this directly?
\item $\Rightarrow$Requires differentiating a graph
\item Why not use only one Network?
\item Graph$\Rightarrow$Vector$\Rightarrow$pdf
\item $\Rightarrow$ Finds a trivial solution, as $\langle pdf \rangle \propto \frac{1}{\sigma_{Vector}}$
\item So regularise the standard deviation of the vector space!
\item Interplay between encoding and NF
\item Could also be useful for high-dimensional data
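The trivial solution can be made visible with a toy example. As a stand-in for the real setup, the sketch assumes one-dimensional encoder outputs and a fitted Gaussian in place of the NF; shrinking the outputs by a factor $s$ raises the average density by $1/s$ without learning anything. The penalty \texttt{std\_penalty} is a hypothetical regulariser, not necessarily the one that would be used in the thesis.

```python
import numpy as np

def mean_gaussian_pdf(z):
    """Average density of the samples under a Gaussian fitted to them."""
    mu, sigma = z.mean(), z.std()
    pdf = np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return pdf.mean()

def std_penalty(z, target=1.0):
    # Hypothetical regulariser: zero when the spread equals the target value.
    return np.log(z.std() / target) ** 2

rng = np.random.default_rng(0)
z0 = rng.normal(size=10_000)       # stand-in for encoder outputs
for scale in (1.0, 0.1, 0.01):
    z = scale * z0                  # the encoder simply shrinks its output
    print(scale, mean_gaussian_pdf(z), std_penalty(z))
```

Adding such a penalty to the loss removes the incentive to collapse the vector space, since the gain in average density is offset by a growing regularisation term.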
\item Proficient in Python ( [i for i in range(1,N) if not [j for j in range(2,i) if not i\%j]] )
\item Ideally including NumPy, TensorFlow and Keras
\item Some deep learning experience
\item University-level math (look up the Cholesky decomposition: why is it useful for NFs?)
\item Ideally some experience working on an SSH server
\item A bit more challenging $\Rightarrow$ better as a Master thesis
\item (Of course, we would still start very slowly)
\item Isolation Forest: a different anomaly detection algorithm
\item Problem: Isolation Forests don't work on categorical data
\item $\Rightarrow$ Extend them to categorical data
\includegraphics[width=0.9\textwidth]{../prep/20Old_Thesis_Sina/Bildschirmfoto vom 2022-09-26 16-22-30.png}
\label{fig:prep20Old_Thesis_SinaBildschirmfoto vom 2022-09-26 16-22-30png}
\item Reidentification: Find known objects in new images
\item Task: Decide whether two images show the same pallet block
\item Use AD to represent the pallet blocks
\includegraphics[width=0.9\textwidth]{../prep/21Old_Thesis_Britta/Bildschirmfoto vom 2022-09-26 16-23-26.png}
\label{fig:prep21Old_Thesis_BrittaBildschirmfoto vom 2022-09-26 16-23-26png}
\item Ensemble: Combination of multiple models
\item Task: Explain the prediction of a model using the ensemble structure
\includegraphics[width=0.9\textwidth]{../prep/22Old_Thesis_Hsin_Ping/Bildschirmfoto vom 2022-09-26 16-24-14.png}
\label{fig:prep22Old_Thesis_Hsin_PingBildschirmfoto vom 2022-09-26 16-24-14png}
\item Task: Explore a new kind of ensemble
\item Instead of many uncorrelated models, let the models interact during training
\includegraphics[width=0.9\textwidth]{../prep/23Old_Thesis_Nikitha/Bildschirmfoto vom 2022-09-26 16-25-06.png}
\label{fig:prep23Old_Thesis_NikithaBildschirmfoto vom 2022-09-26 16-25-06png}