semi final push

Simon Klüttermann 2021-10-11 17:43:56 +02:00
parent a05699269c
commit ea54626a20
6 changed files with 77 additions and 74 deletions


@@ -5,7 +5,7 @@
 %%%%%%%%%%%%%%%%%%
 % Talk's title
-\settalktitle{Anomaly Detection Seminar 2021/2022}
+\settalktitle{Seminar Unsupervised Machine Learning - Anomaly Detection}
 % Author's name
 \settalkauthor{}


@@ -1,9 +1,9 @@
 [0] Config.pm:312> INFO - This is Biber 2.15
 [0] Config.pm:315> INFO - Logfile is 'slides.blg'
-[53] biber:330> INFO - === Mo Okt 11, 2021, 17:26:11
-[63] Biber.pm:415> INFO - Reading 'slides.bcf'
-[115] Biber.pm:952> INFO - Found 0 citekeys in bib section 0
-[119] Utils.pm:395> WARN - The file 'slides.bcf' does not contain any citations!
-[125] bbl.pm:651> INFO - Writing 'slides.bbl' with encoding 'UTF-8'
-[126] bbl.pm:754> INFO - Output to slides.bbl
-[126] Biber.pm:128> INFO - WARNINGS: 1
+[51] biber:330> INFO - === Mo Okt 11, 2021, 17:38:14
+[61] Biber.pm:415> INFO - Reading 'slides.bcf'
+[111] Biber.pm:952> INFO - Found 0 citekeys in bib section 0
+[116] Utils.pm:395> WARN - The file 'slides.bcf' does not contain any citations!
+[122] bbl.pm:651> INFO - Writing 'slides.bbl' with encoding 'UTF-8'
+[122] bbl.pm:754> INFO - Output to slides.bbl
+[122] Biber.pm:128> INFO - WARNINGS: 1


@@ -1,4 +1,4 @@
-This is LuaHBTeX, Version 1.13.0 (TeX Live 2021/Arch Linux) (format=lualatex 2021.6.8) 11 OCT 2021 17:26
+This is LuaHBTeX, Version 1.13.0 (TeX Live 2021/Arch Linux) (format=lualatex 2021.6.8) 11 OCT 2021 17:38
 system commands enabled.
 **slides
 (./slides.tex
@@ -2068,12 +2068,12 @@ Package polyglossia Info: Option: English, variant=american.
 Package polyglossia Info: Option: english variant=american (with additional pat
 terns).
 Module polyglossia Info: Language data for usenglishmax
-(polyglossia) patterns hyph-en-us.pat.txt
-(polyglossia) hyphenation hyph-en-us.hyp.txt
-(polyglossia) righthyphenmin 3
-(polyglossia) lefthyphenmin 2
-(polyglossia) loader loadhyph-en-us.tex
-(polyglossia) synonyms on input line 35
+(polyglossia) loader loadhyph-en-us.tex
+(polyglossia) hyphenation hyph-en-us.hyp.txt
+(polyglossia) synonyms
+(polyglossia) lefthyphenmin 2
+(polyglossia) patterns hyph-en-us.pat.txt
+(polyglossia) righthyphenmin 3 on input line 35
 Module polyglossia Info: Language usenglishmax was not yet loaded; created with
 id 2 on input line 35
 Package polyglossia Info: Option: english variant=american (with additional pat
@@ -2135,12 +2135,12 @@ braces):
 > {german/localnumeral} => {polyglossia@C@localnumeral}
 > {german/Localnumeral} => {polyglossia@C@localnumeral}.
 Module polyglossia Info: Language data for german
-(polyglossia) patterns hyph-de-1901.pat.txt
-(polyglossia) hyphenation
-(polyglossia) righthyphenmin 2
-(polyglossia) lefthyphenmin 2
-(polyglossia) loader loadhyph-de-1901.tex
-(polyglossia) synonyms on input line 10
+(polyglossia) loader loadhyph-de-1901.tex
+(polyglossia) hyphenation
+(polyglossia) synonyms
+(polyglossia) lefthyphenmin 2
+(polyglossia) patterns hyph-de-1901.pat.txt
+(polyglossia) righthyphenmin 2 on input line 10
 Module polyglossia Info: Language german was not yet loaded; created with id 3 o
 n input line 10
 Package polyglossia Info: Option: German, spelling=new.
@@ -3104,12 +3104,12 @@ Package biblatex Info: ... file 'german.lbx' found.
 (/usr/share/texmf-dist/tex/latex/biblatex/lbx/german.lbx
 File: german.lbx 2020/12/31 v3.16 biblatex localization (PK/MW)
 Module polyglossia Info: Language data for ngerman
-(polyglossia) patterns hyph-de-1996.pat.txt
-(polyglossia) hyphenation
-(polyglossia) righthyphenmin 2
-(polyglossia) lefthyphenmin 2
-(polyglossia) loader loadhyph-de-1996.tex
-(polyglossia) synonyms on input line 561
+(polyglossia) loader loadhyph-de-1996.tex
+(polyglossia) hyphenation
+(polyglossia) synonyms
+(polyglossia) lefthyphenmin 2
+(polyglossia) patterns hyph-de-1996.pat.txt
+(polyglossia) righthyphenmin 2 on input line 561
 Module polyglossia Info: Language ngerman was not yet loaded; created with id 5
 on input line 561
 )
@@ -3961,15 +3961,15 @@ Here is how much of LuaTeX's memory you used:
 n, 63 penalty, 5 margin_kern, 361 glyph, 256 attribute, 92 glue_spec, 256 attrib
 ute_list, 4 write, 24 pdf_literal, 92 pdf_colorstack, 1 pdf_setmatrix, 1 pdf_sav
 e, 1 pdf_restore nodes
-avail lists: 1:3,2:387,3:215,4:335,5:215,6:59,7:2079,8:9,9:466,10:24,11:136,1
+avail lists: 1:4,2:387,3:215,4:335,5:215,6:59,7:2079,8:9,9:466,10:24,11:136,1
 2:1
 81249 multiletter control sequences out of 65536+600000
-116 fonts using 34613951 bytes
+116 fonts using 34614111 bytes
 136i,20n,154p,819b,2327s stack positions out of 5000i,500n,10000p,200000b,80000s
 </usr/share/texmf-dist/fonts/opentype/public/libertinus-fonts/LibertinusSans-Bol
 d.otf></usr/share/texmf-dist/fonts/opentype/public/libertinus-fonts/LibertinusSa
 ns-Regular.otf>
-Output written on slides.pdf (22 pages, 2035288 bytes).
+Output written on slides.pdf (22 pages, 2035304 bytes).
 PDF statistics: 262 PDF objects out of 1000 (max. 8388607)
 172 compressed objects within 2 object streams

Binary file not shown.


@@ -39,8 +39,8 @@
 \begin{columns}
 \begin{column}{.475\textwidth}
 \begin{itemize}
-\item Kick-Off
-\item Some Formal Stuff
+\item Kick-Off Meeting
+\item Some Formalities
 \item Short Overview of the Topics
 \end{itemize}
 \begin{center}
@@ -53,7 +53,7 @@
 \begin{itemize}
 \item Choose a couple topics
 \begin{itemize}
-\item Since we are only a few, you can make these requests quite complicated if you like (I prefer topic 1, but I would also take 3 or 7, except when I can do it in german, then I would prefer topic 12)
+\item Since we are only a few, you can make these requests quite complicated (I prefer topic 1, but I would also take 3 or 7, except when I can do it in german, then I would prefer topic 12)
 \end{itemize}
 \item Send your choice to Simon.Kluettermann@cs.tu-dortmund.de (till tomorrow 13.10.2021 23:59)
 \item You will be assigned one in the next days


@@ -1,63 +1,66 @@
1) Anomaly Detection for Monitoring
This is more a book than a paper, so it should be perfect for you if you do not have that much experience yet. It focuses on time series analysis, namely the task of detecting when a continuous data stream becomes anomalous. This is useful, for example, for a machine supervised by sensors that at some point stops working (and thus changes the sensor output).
https://assets.dynatrace.com/content/dam/en/wp/Anomaly-Detection-for-Monitoring-Ruxit.pdf
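To make the task concrete, here is a toy illustration only (not a method from the book): a rolling z-score that flags the moment a sensor stream drifts away from its recent behaviour. The stream, window size, and threshold are all invented for this example.

import numpy as np

rng = np.random.default_rng(0)
# toy sensor stream: normal behaviour for 500 steps, then the "machine" degrades
stream = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(4.0, 1.0, 100)])

window = 100                                   # compare each point to the last 100 readings
for t in range(window, len(stream)):
    recent = stream[t - window:t]
    z = abs(stream[t] - recent.mean()) / (recent.std() + 1e-9)
    if z > 4.0:                                # arbitrary alarm threshold
        print("stream looks anomalous from t =", t)
        break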

2) A comprehensive survey of anomaly detection techniques for high dimensional big data
Anomaly detection is generally more complicated when you are given higher-dimensional data (curse of dimensionality). This seems a little strange, as machine learning usually improves when you are given more information. I imagine it as useless features confusing the algorithm. This paper can be seen as a study of this phenomenon.
https://journalofbigdata.springeropen.com/track/pdf/10.1186/s40537-020-00320-x.pdf
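A quick numerical illustration of the problem (my own toy demo, not taken from the survey): for uniform random data, the relative contrast between the farthest and the nearest neighbour of a point shrinks as the dimension grows, so distance-based detectors lose their signal.

import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)     # distances from the first point
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast = {contrast:.2f}")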

3) A Comprehensive Survey on Graph Anomaly Detection with Deep Learning
A lot of datasets that are interesting for AD (for example email communications or trading data) are best represented as graphs. This provides unique challenges for AD algorithms.
This is a paper that could either be handled by two students or split up into two. Maybe one considers anomalous graphs, while the other one considers anomalous nodes in graphs.
https://arxiv.org/pdf/2106.07178

4) LOF: Identifying Density-Based Local Outliers
LOF is a classical algorithm used in many applications. This is the original paper introducing it. As this is a fairly old paper, you will also find a lot of other sources describing LOF.
https://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf
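If you want to play with LOF before reading the paper, scikit-learn ships an implementation. A minimal sketch with made-up data (the parameters are defaults, not recommendations):

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),      # dense "normal" cluster
               rng.uniform(-6.0, 6.0, (5, 2))])     # a few scattered points

lof = LocalOutlierFactor(n_neighbors=20)            # compares local density to that of the k neighbours
labels = lof.fit_predict(X)                         # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_              # larger = more anomalous
print("flagged as outliers:", np.where(labels == -1)[0])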

5) HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
As most datapoints are quite high-dimensional, it is often the case that some features are useless and can actually help to hide the true abnormalities. This paper suggests a method to select a subspace that filters out unimportant features.
This paper was co-written by Prof. Müller and might be related to a future Master's thesis.
https://www.ipd.kit.edu/~muellere/publications/ICDE2012.pdf

6) Neural Transformation Learning for Deep Anomaly Detection Beyond Images
While for image data certain pre-transformations (like rotations) can clearly improve machine learning tasks like anomaly detection, this is much less well defined for time-series/tabular data. This paper tries to solve this by defining learnable transformations.
https://arxiv.org/pdf/2103.16440

7) A Survey on GANs for Anomaly Detection
GANs are an advanced ML method, normally used to generate very realistic artificial images (check out https://thispersondoesnotexist.com/ if you have never done so). But they can also be used for anomaly detection. Your task would be to explain how.
https://arxiv.org/pdf/1906.11632

8) Unsupervised Anomaly Detection Ensembles using Item Response Theory
Different AD algorithms are usually better at finding different types of anomalies. To get a more general algorithm, you can combine multiple ones into one using ensembles.
This paper could be merged with "Active AD via Ensembles" to be handled by two students.
https://arxiv.org/pdf/2106.06243
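To see why a combination rule is needed at all, here is a naive rank-average baseline: several detectors score the same data on incompatible scales, so their ranks are averaged. The paper's contribution is a smarter way to do this combination (via Item Response Theory), which this sketch does not implement; the detectors and data below are arbitrary stand-ins.

import numpy as np
from scipy.stats import rankdata
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(6.0, 1.0, (5, 3))])

# three detectors, each with its own (incomparable) score scale; larger = more anomalous
scores = [
    -IsolationForest(random_state=0).fit(X).score_samples(X),
    -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_,
    -OneClassSVM(nu=0.05).fit(X).score_samples(X),
]
combined = np.mean([rankdata(s) for s in scores], axis=0)   # rank-average consensus
print("most anomalous by consensus:", np.argsort(combined)[-5:])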

9) Active AD via Ensembles...
This paper tries to do a lot. I suggest that you focus on the active learning part. Alternatively, we also have a paper on ensembles, so if you both want, you can combine these two papers to be worked on by two students. Active AD extends the task of finding anomalies to the case in which the anomaly status of the training events is not clearly defined. The focus here lies in minimizing the amount of human work needed to classify a given dataset (given some labels, train a model, find the new events that are unclear, classify those, restart).
I want to note here that great work on an easy topic is for us the same as good work on a hard topic.
https://arxiv.org/pdf/1901.08930
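The query loop in the parentheses above can be sketched in a few lines. This is a generic human-in-the-loop skeleton, not the paper's algorithm; IsolationForest and the fake oracle are stand-ins chosen only for brevity.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (300, 5)), rng.normal(5.0, 1.0, (10, 5))])
truth = np.array([0] * 300 + [1] * 10)           # hidden labels, playing the role of the human

labelled = {}                                    # index -> human answer (1 = anomaly)
for rnd in range(3):
    keep = [i for i in range(len(X)) if labelled.get(i) != 1]   # drop confirmed anomalies
    detector = IsolationForest(random_state=0).fit(X[keep])
    scores = -detector.score_samples(X)          # larger = more anomalous
    # query the events the current model is most suspicious about, then restart
    ask = [i for i in np.argsort(scores)[::-1] if i not in labelled][:5]
    for i in ask:
        labelled[i] = truth[i]
    print(f"round {rnd}: anomalies confirmed so far = {sum(labelled.values())}")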

10) Contextual Outliers
This paper focuses on the interpretability of anomaly detection methods. The method described works by splitting the set of normal events into groups and trying to relate any abnormal event to its surrounding normal ones. I would say it is on the practical side, and I want to strongly encourage you to implement this algorithm if you choose this topic.
https://arxiv.org/abs/1711.10589
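Roughly what such an explanation could look like (emphatically not the paper's algorithm, just a toy of the "compare the anomaly to its nearest group of normal events" idea, with clustering and data made up for the example):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
normal = np.vstack([rng.normal(0.0, 1.0, (100, 2)),    # two groups of normal events
                    rng.normal(8.0, 1.0, (100, 2))])
anomaly = np.array([[4.0, 9.0]])

groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit(normal)
nearest = int(groups.predict(anomaly)[0])
delta = anomaly[0] - groups.cluster_centers_[nearest]
print(f"closest normal group: {nearest}, deviates most in feature {int(np.argmax(np.abs(delta)))}")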

11) Additive Explanations for Anomalies Detected from Multivariate Temporal Data
Explaining why a given event is anomalous can be as important as detecting it, as it helps to create trust. This paper suggests a method that is based on differentiating between features that contribute more and features that contribute less.
It is also quite a short paper, so it is extra important to look for other papers.
https://dl.acm.org/doi/abs/10.1145/3357384.3358121
Requires VPN; contact me if you have problems with this.

12) Interpretable AD for Device Failure
This is an application paper. Its complexity comes mostly from the fact that real-world data is messy, and the paper addresses ways to mitigate this.
https://arxiv.org/pdf/2007.10088

13) Fast Unsupervised Anomaly Detection in Traffic Videos
This is another application paper. Its main complexity is the input data type, as it uses videos (which are very high-dimensional and contain temporal correlations). You will see how good preprocessing can make even a basic algorithm viable for complicated problems.
https://openaccess.thecvf.com/content_CVPRW_2020/papers/w35/Doshi_Fast_Unsupervised_Anomaly_Detection_in_Traffic_Videos_CVPRW_2020_paper.pdf

14) Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding
This is another application paper, but this time using a more complicated algorithm from recurrent ML. It tries to monitor the ever-growing number of spacecraft for anomalous behaviour.
https://arxiv.org/pdf/1802.04431
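As the title suggests, the core trick is to predict each telemetry channel and flag unusually large prediction errors. The sketch below keeps only that thresholding idea and swaps the LSTM for a trivial last-value predictor so it stays a few lines; the signal, the injected fault, and the threshold are invented for the example.

import numpy as np

rng = np.random.default_rng(1)
telemetry = np.sin(np.linspace(0.0, 50.0, 1000)) + rng.normal(0.0, 0.05, 1000)
telemetry[700:720] += 2.0                        # injected fault

errors = np.abs(np.diff(telemetry))              # "prediction error" of a last-value predictor
threshold = errors.mean() + 5.0 * errors.std()   # crude stand-in for the paper's dynamic threshold
print("anomalous steps:", np.where(errors > threshold)[0][:5])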