semi final push

Simon Klüttermann 2021-10-11 17:43:56 +02:00
parent a05699269c
commit ea54626a20
6 changed files with 77 additions and 74 deletions


@@ -5,7 +5,7 @@
 %%%%%%%%%%%%%%%%%%
 % Talk's title
-\settalktitle{Anomaly Detection Seminar 2021/2022}
+\settalktitle{Seminar Unsupervised Machine Learning - Anomaly Detection}
 % Author's name
 \settalkauthor{}


@@ -1,9 +1,9 @@
 [0] Config.pm:312> INFO - This is Biber 2.15
 [0] Config.pm:315> INFO - Logfile is 'slides.blg'
-[53] biber:330> INFO - === Mo Okt 11, 2021, 17:26:11
-[63] Biber.pm:415> INFO - Reading 'slides.bcf'
-[115] Biber.pm:952> INFO - Found 0 citekeys in bib section 0
-[119] Utils.pm:395> WARN - The file 'slides.bcf' does not contain any citations!
-[125] bbl.pm:651> INFO - Writing 'slides.bbl' with encoding 'UTF-8'
-[126] bbl.pm:754> INFO - Output to slides.bbl
-[126] Biber.pm:128> INFO - WARNINGS: 1
+[51] biber:330> INFO - === Mo Okt 11, 2021, 17:38:14
+[61] Biber.pm:415> INFO - Reading 'slides.bcf'
+[111] Biber.pm:952> INFO - Found 0 citekeys in bib section 0
+[116] Utils.pm:395> WARN - The file 'slides.bcf' does not contain any citations!
+[122] bbl.pm:651> INFO - Writing 'slides.bbl' with encoding 'UTF-8'
+[122] bbl.pm:754> INFO - Output to slides.bbl
+[122] Biber.pm:128> INFO - WARNINGS: 1


@@ -1,4 +1,4 @@
-This is LuaHBTeX, Version 1.13.0 (TeX Live 2021/Arch Linux) (format=lualatex 2021.6.8) 11 OCT 2021 17:26
+This is LuaHBTeX, Version 1.13.0 (TeX Live 2021/Arch Linux) (format=lualatex 2021.6.8) 11 OCT 2021 17:38
 system commands enabled.
 **slides
 (./slides.tex
@@ -2068,12 +2068,12 @@ Package polyglossia Info: Option: English, variant=american.
 Package polyglossia Info: Option: english variant=american (with additional pat
 terns).
 Module polyglossia Info: Language data for usenglishmax
-(polyglossia) patterns hyph-en-us.pat.txt
-(polyglossia) hyphenation hyph-en-us.hyp.txt
-(polyglossia) righthyphenmin 3
-(polyglossia) lefthyphenmin 2
-(polyglossia) loader loadhyph-en-us.tex
-(polyglossia) synonyms on input line 35
+(polyglossia) loader loadhyph-en-us.tex
+(polyglossia) hyphenation hyph-en-us.hyp.txt
+(polyglossia) synonyms
+(polyglossia) lefthyphenmin 2
+(polyglossia) patterns hyph-en-us.pat.txt
+(polyglossia) righthyphenmin 3 on input line 35
 Module polyglossia Info: Language usenglishmax was not yet loaded; created with
 id 2 on input line 35
 Package polyglossia Info: Option: english variant=american (with additional pat
@@ -2135,12 +2135,12 @@ braces):
 > {german/localnumeral} => {polyglossia@C@localnumeral}
 > {german/Localnumeral} => {polyglossia@C@localnumeral}.
 Module polyglossia Info: Language data for german
-(polyglossia) patterns hyph-de-1901.pat.txt
-(polyglossia) hyphenation
-(polyglossia) righthyphenmin 2
-(polyglossia) lefthyphenmin 2
-(polyglossia) loader loadhyph-de-1901.tex
-(polyglossia) synonyms on input line 10
+(polyglossia) loader loadhyph-de-1901.tex
+(polyglossia) hyphenation
+(polyglossia) synonyms
+(polyglossia) lefthyphenmin 2
+(polyglossia) patterns hyph-de-1901.pat.txt
+(polyglossia) righthyphenmin 2 on input line 10
 Module polyglossia Info: Language german was not yet loaded; created with id 3 o
 n input line 10
 Package polyglossia Info: Option: German, spelling=new.
@@ -3104,12 +3104,12 @@ Package biblatex Info: ... file 'german.lbx' found.
 (/usr/share/texmf-dist/tex/latex/biblatex/lbx/german.lbx
 File: german.lbx 2020/12/31 v3.16 biblatex localization (PK/MW)
 Module polyglossia Info: Language data for ngerman
-(polyglossia) patterns hyph-de-1996.pat.txt
-(polyglossia) hyphenation
-(polyglossia) righthyphenmin 2
-(polyglossia) lefthyphenmin 2
-(polyglossia) loader loadhyph-de-1996.tex
-(polyglossia) synonyms on input line 561
+(polyglossia) loader loadhyph-de-1996.tex
+(polyglossia) hyphenation
+(polyglossia) synonyms
+(polyglossia) lefthyphenmin 2
+(polyglossia) patterns hyph-de-1996.pat.txt
+(polyglossia) righthyphenmin 2 on input line 561
 Module polyglossia Info: Language ngerman was not yet loaded; created with id 5
 on input line 561
 )
@@ -3961,15 +3961,15 @@ Here is how much of LuaTeX's memory you used:
 n, 63 penalty, 5 margin_kern, 361 glyph, 256 attribute, 92 glue_spec, 256 attrib
 ute_list, 4 write, 24 pdf_literal, 92 pdf_colorstack, 1 pdf_setmatrix, 1 pdf_sav
 e, 1 pdf_restore nodes
-avail lists: 1:3,2:387,3:215,4:335,5:215,6:59,7:2079,8:9,9:466,10:24,11:136,1
+avail lists: 1:4,2:387,3:215,4:335,5:215,6:59,7:2079,8:9,9:466,10:24,11:136,1
 2:1
 81249 multiletter control sequences out of 65536+600000
-116 fonts using 34613951 bytes
+116 fonts using 34614111 bytes
 136i,20n,154p,819b,2327s stack positions out of 5000i,500n,10000p,200000b,80000s
 </usr/share/texmf-dist/fonts/opentype/public/libertinus-fonts/LibertinusSans-Bol
 d.otf></usr/share/texmf-dist/fonts/opentype/public/libertinus-fonts/LibertinusSa
 ns-Regular.otf>
-Output written on slides.pdf (22 pages, 2035288 bytes).
+Output written on slides.pdf (22 pages, 2035304 bytes).
 PDF statistics: 262 PDF objects out of 1000 (max. 8388607)
 172 compressed objects within 2 object streams

Binary file not shown.


@@ -39,8 +39,8 @@
 \begin{columns}
 \begin{column}{.475\textwidth}
 \begin{itemize}
-\item Kick-Off
-\item Some Formal Stuff
+\item Kick-Off Meeting
+\item Some Formalities
 \item Short Overview of the Topics
 \end{itemize}
 \begin{center}
@@ -53,7 +53,7 @@
 \begin{itemize}
 \item Choose a couple topics
 \begin{itemize}
-\item Since we are only a few, you can make these requests quite complicated if you like (I prefer topic 1, but I would also take 3 or 7, except when I can do it in german, then I would prefer topic 12)
+\item Since we are only a few, you can make these requests quite complicated (I prefer topic 1, but I would also take 3 or 7, except when I can do it in german, then I would prefer topic 12)
 \end{itemize}
 \item Send your choice to Simon.Kluettermann@cs.tu-dortmund.de (till tomorrow 13.10.2021 23:59)
 \item You will be assigned one in the next days


@@ -1,63 +1,66 @@
1) Anomaly Detection for Monitoring
This is more a book than a paper, so it should be perfect for you if you do not have that much experience yet. It focuses on time series analysis, namely the task of detecting when a continuous data stream becomes anomalous. This is useful, for example, for a machine supervised by sensors that at some point stops working (and thus changes the sensor output).
https://assets.dynatrace.com/content/dam/en/wp/Anomaly-Detection-for-Monitoring-Ruxit.pdf
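To make the task concrete, here is a toy illustration only (not a method from the book): a rolling z-score that flags the moment a sensor stream drifts away from its recent behaviour. The stream, window size, and threshold are all invented for this example.

import numpy as np

rng = np.random.default_rng(0)
# toy sensor stream: normal behaviour for 500 steps, then the "machine" degrades
stream = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(4.0, 1.0, 100)])

window = 100                                   # compare each point to the last 100 readings
for t in range(window, len(stream)):
    recent = stream[t - window:t]
    z = abs(stream[t] - recent.mean()) / (recent.std() + 1e-9)
    if z > 4.0:                                # arbitrary alarm threshold
        print("stream looks anomalous from t =", t)
        break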

2) A comprehensive survey of anomaly detection techniques for high dimensional big data
Anomaly detection is generally more complicated when you are given higher-dimensional data (curse of dimensionality). This seems a little strange, as machine learning usually improves when you are given more information. I imagine it as useless features confusing the algorithm. This paper can be seen as a study of this phenomenon.
https://journalofbigdata.springeropen.com/track/pdf/10.1186/s40537-020-00320-x.pdf
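A quick numerical illustration of the problem (my own toy demo, not taken from the survey): for uniform random data, the relative contrast between the farthest and the nearest neighbour of a point shrinks as the dimension grows, so distance-based detectors lose their signal.

import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)     # distances from the first point
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast = {contrast:.2f}")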

3) A Comprehensive Survey on Graph Anomaly Detection with Deep Learning
A lot of datasets that are interesting for AD (for example email communications or trading data) are best represented as graphs. This provides unique challenges for AD algorithms.
This is a paper that could either be handled by two students or split up into two. Maybe one considers anomalous graphs, while the other one considers anomalous nodes in graphs.
https://arxiv.org/pdf/2106.07178

4) LOF: Identifying Density-Based Local Outliers
LOF is a classical algorithm used in many applications. This is the original paper introducing it. As this is a fairly old paper, you will also find a lot of other sources describing LOF.
https://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf
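If you want to play with LOF before reading the paper, scikit-learn ships an implementation. A minimal sketch with made-up data (the parameters are defaults, not recommendations):

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),      # dense "normal" cluster
               rng.uniform(-6.0, 6.0, (5, 2))])     # a few scattered points

lof = LocalOutlierFactor(n_neighbors=20)            # compares local density to that of the k neighbours
labels = lof.fit_predict(X)                         # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_              # larger = more anomalous
print("flagged as outliers:", np.where(labels == -1)[0])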

5) HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
As most datapoints are quite high-dimensional, it is often the case that some features are useless and can actually help to hide the true abnormalities. This paper suggests a method to select a subspace that filters out unimportant features.
This paper was co-written by Prof. Müller and might be related to a future Master's thesis.
https://www.ipd.kit.edu/~muellere/publications/ICDE2012.pdf

6) Neural Transformation Learning for Deep Anomaly Detection Beyond Images
While for image data certain pre-transformations (like rotations) can clearly improve machine learning tasks like anomaly detection, this is much less well defined for time-series/tabular data. This paper tries to solve this by defining learnable transformations.
https://arxiv.org/pdf/2103.16440

7) A Survey on GANs for Anomaly Detection
GANs are an advanced ML method, normally used to generate very realistic artificial images (check out https://thispersondoesnotexist.com/ if you have never done so). But they can also be used for anomaly detection. Your task would be to explain how.
https://arxiv.org/pdf/1906.11632

8) Unsupervised Anomaly Detection Ensembles using Item Response Theory
Different AD algorithms are usually better at finding different types of anomalies. To get a more general algorithm, you can combine multiple ones into one using ensembles.
This paper could be merged with "Active AD via Ensembles" to be handled by two students.
https://arxiv.org/pdf/2106.06243
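To see why a combination rule is needed at all, here is a naive rank-average baseline: several detectors score the same data on incompatible scales, so their ranks are averaged. The paper's contribution is a smarter way to do this combination (via Item Response Theory), which this sketch does not implement; the detectors and data below are arbitrary stand-ins.

import numpy as np
from scipy.stats import rankdata
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(6.0, 1.0, (5, 3))])

# three detectors, each with its own (incomparable) score scale; larger = more anomalous
scores = [
    -IsolationForest(random_state=0).fit(X).score_samples(X),
    -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_,
    -OneClassSVM(nu=0.05).fit(X).score_samples(X),
]
combined = np.mean([rankdata(s) for s in scores], axis=0)   # rank-average consensus
print("most anomalous by consensus:", np.argsort(combined)[-5:])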

9) Active AD via Ensembles...
This paper tries to do a lot. I suggest that you focus on the active learning part. Alternatively, we also have a paper on ensembles, so if you both want, you can combine these two papers to be worked on by two students. Active AD extends the task of finding anomalies to the case in which the anomaly status of the training events is not clearly defined. The focus here lies in minimizing the amount of human work needed to classify a given dataset (given some labels, train a model, find the new events that are unclear, classify those, restart).
I want to note here that great work on an easy topic is for us the same as good work on a hard topic.
https://arxiv.org/pdf/1901.08930
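The query loop in the parentheses above can be sketched in a few lines. This is a generic human-in-the-loop skeleton, not the paper's algorithm; IsolationForest and the fake oracle are stand-ins chosen only for brevity.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (300, 5)), rng.normal(5.0, 1.0, (10, 5))])
truth = np.array([0] * 300 + [1] * 10)           # hidden labels, playing the role of the human

labelled = {}                                    # index -> human answer (1 = anomaly)
for rnd in range(3):
    keep = [i for i in range(len(X)) if labelled.get(i) != 1]   # drop confirmed anomalies
    detector = IsolationForest(random_state=0).fit(X[keep])
    scores = -detector.score_samples(X)          # larger = more anomalous
    # query the events the current model is most suspicious about, then restart
    ask = [i for i in np.argsort(scores)[::-1] if i not in labelled][:5]
    for i in ask:
        labelled[i] = truth[i]
    print(f"round {rnd}: anomalies confirmed so far = {sum(labelled.values())}")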

10) Contextual Outliers
This paper focuses on the interpretability of anomaly detection methods. The method described works by splitting the set of normal events into groups and trying to relate any abnormal event to its surrounding normal ones. I would say it is on the practical side, and I want to strongly encourage you to implement this algorithm if you choose this topic.
https://arxiv.org/abs/1711.10589
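Roughly what such an explanation could look like (emphatically not the paper's algorithm, just a toy of the "compare the anomaly to its nearest group of normal events" idea, with clustering and data made up for the example):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
normal = np.vstack([rng.normal(0.0, 1.0, (100, 2)),    # two groups of normal events
                    rng.normal(8.0, 1.0, (100, 2))])
anomaly = np.array([[4.0, 9.0]])

groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit(normal)
nearest = int(groups.predict(anomaly)[0])
delta = anomaly[0] - groups.cluster_centers_[nearest]
print(f"closest normal group: {nearest}, deviates most in feature {int(np.argmax(np.abs(delta)))}")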

11) Additive Explanations for Anomalies Detected from Multivariate Temporal Data
Explaining why a given event is anomalous can be as important as detecting it, as it helps to create trust. This paper suggests a method that is based on differentiating between features that contribute more and features that contribute less.
It is also quite a short paper, so it is extra important to look for other papers.
https://dl.acm.org/doi/abs/10.1145/3357384.3358121
Requires VPN; contact me if you have problems with this.

12) Interpretable AD for Device Failure
This is an application paper. Its complexity comes mostly from the fact that real-world data is messy, and the paper addresses ways to mitigate this.
https://arxiv.org/pdf/2007.10088

13) Fast Unsupervised Anomaly Detection in Traffic Videos
This is another application paper. Its main complexity is the input data type, as it uses videos (which are very high-dimensional and contain temporal correlations). You will see how good preprocessing can make even a basic algorithm viable for complicated problems.
https://openaccess.thecvf.com/content_CVPRW_2020/papers/w35/Doshi_Fast_Unsupervised_Anomaly_Detection_in_Traffic_Videos_CVPRW_2020_paper.pdf

14) Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding
This is another application paper, but this time using a more complicated algorithm from recurrent ML. It tries to monitor the ever-growing number of spacecraft for anomalous behaviour.
https://arxiv.org/pdf/1802.04431
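As the title suggests, the core trick is to predict each telemetry channel and flag unusually large prediction errors. The sketch below keeps only that thresholding idea and swaps the LSTM for a trivial last-value predictor so it stays a few lines; the signal, the injected fault, and the threshold are invented for the example.

import numpy as np

rng = np.random.default_rng(1)
telemetry = np.sin(np.linspace(0.0, 50.0, 1000)) + rng.normal(0.0, 0.05, 1000)
telemetry[700:720] += 2.0                        # injected fault

errors = np.abs(np.diff(telemetry))              # "prediction error" of a last-value predictor
threshold = errors.mean() + 5.0 * errors.std()   # crude stand-in for the paper's dynamic threshold
print("anomalous steps:", np.where(errors > threshold)[0][:5])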