commit ce067dbbf2
initial push
@ -0,0 +1,5 @@
<frame >

<titlepage>

</frame>
@ -0,0 +1,14 @@
<frame title="Problem">
<list>
<e>Paper with Benedikt</e>
<e>requires multiple very specific datasets</e>
<l2st>
<e>many but not too many features</e>
<e>at least some samples (for the NN)</e>
<e>only numerical attributes work best</e>
<e>specific quality</e>
<e>unrelated datasets</e>
</l2st>
<e>Requires you to search for many datasets and filter them</e>
</list>
</frame>
@ -0,0 +1,9 @@
<frame title="Students">
<list>
<e>Not clear what you can use</e>
<e>Many different formats</e>
<e>train/test splits</e>
<e>So for Students I just do this work and send them archives directly</e>
<e>->Not a good solution</e>
</list>
</frame>
@ -0,0 +1,12 @@
<frame title="yano">
<list>
<e>So I have been packaging all my scripts</e>
<e>I had a surprising amount of fun doing this</e>
<l2st>
<e>More than just standard functions</e>
<e>A couple of weird decisions</e>
<e>And this will likely grow further</e>
</l2st>
<e>->So I would like to discuss some parts with you, and maybe you even have more features you might want</e>
</list>
</frame>
@ -0,0 +1,17 @@
<frame title="yano">
<split>
<que>
<list>
<e>Simply install it via pip</e>
<e>Contains 187 real-world datasets</e>
<e>->biggest library of datasets explicitly for anomaly detection</e>
<e>not yet happy with this</e>
<e>especially: it mostly contains only numerical and nominal attributes</e>
<e>->few categorical and no time-series attributes</e>
</list>
</que>
<que>
<i f="../prep/04yano/a.png" wmode="True"></i>
</que>
</split>
</frame>
@ -0,0 +1,17 @@
<section Basics>
<frame title="selector">

<code>
import yano
from yano.symbols import *

condition = ((number_of_features > 5) &
             (number_of_features < 100) &
             (number_of_samples > 100) &
             (number_of_samples < 10000) &
             (number_of_samples > 2 * number_of_features) &
             ~index)
print(len(condition), "Datasets found")
</code>
->33 Datasets found

</frame>
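Selector expressions like `number_of_features > 5` can be built with Python operator overloading. The following is a minimal sketch of that idea, not yano's actual implementation: the `Symbol`/`Condition` classes and the toy dataset registry are hypothetical stand-ins, and comparisons between two symbols (as in `number_of_samples > 2*number_of_features`) would need additional overloads.

```python
# Sketch: selector symbols via operator overloading. NOT yano's real code;
# the registry below is a toy stand-in for illustration only.

class Condition:
    def __init__(self, pred):
        self.pred = pred  # function: dataset-dict -> bool

    def __and__(self, other):
        return Condition(lambda d: self.pred(d) and other.pred(d))

    def __invert__(self):
        return Condition(lambda d: not self.pred(d))

    def select(self, datasets):
        return [d for d in datasets if self.pred(d)]

class Symbol:
    def __init__(self, key):
        self.key = key

    def __gt__(self, value):
        return Condition(lambda d, k=self.key, v=value: d[k] > v)

    def __lt__(self, value):
        return Condition(lambda d, k=self.key, v=value: d[k] < v)

number_of_features = Symbol("n_features")
number_of_samples = Symbol("n_samples")

toy = [{"name": "a", "n_features": 10, "n_samples": 500},
       {"name": "b", "n_features": 200, "n_samples": 500},
       {"name": "c", "n_features": 8, "n_samples": 50}]

cond = (number_of_features > 5) & (number_of_features < 100) & (number_of_samples > 100)
print([d["name"] for d in cond.select(toy)])  # -> ['a']
```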
@ -0,0 +1,29 @@
<frame title="selectors">
<list>
<e>Lots of symbols like this</e>
<l2st>
<e>name</e>
<e>number\_of\_features</e>
<e>number\_of\_samples</e>
<e>index (correlated datasets)</e>
</l2st>
<e>Feature types</e>
<l2st>
<e>numeric</e>
<e>nominal</e>
<e>categorical</e>
<e>(textual)</e>
</l2st>
<e>Count-based</e>
<l2st>
<e>number\_anomalies</e>
<e>number\_normals</e>
<e>fraction\_anomalies</e>
</l2st>
<e>Specific ones</e>
<l2st>
<e>image\_based</e>
<e>(linearly\_separable)</e>
</l2st>
</list>
</frame>
@ -0,0 +1,15 @@
<frame title="iterating">

<code>
for dataset in condition:
    print(dataset)
</code>
<l2st>
<e>\[annthyroid\]</e>
<e>\[breastw\]</e>
<e>\[cardio\]</e>
<e>\[...\]</e>
<e>\[Housing\_low\]</e>
</l2st>

</frame>
@ -0,0 +1,9 @@
<frame title="iterating">

<code>
for dataset in condition:
    x = dataset.getx()
    y = dataset.gety()
</code>

</frame>
@ -0,0 +1,12 @@
<frame title="pipeline">

<code>
from yano.iter import *

for dataset, x, tx, ty in pipeline(condition,
                                   split,
                                   shuffle,
                                   normalize("minmax")):
    ...
</code>

</frame>
@ -0,0 +1,16 @@
<frame title="pipeline">
<list>
<e>Again, there are a couple of possible modifiers</e>
<l2st>
<e>nonconst->remove constant features</e>
<e>shuffle</e>
<e>normalize('zscore'/'minmax')</e>
<e>cut(10)->at most 10 datasets</e>
<e>split->train/test split, all anomalies in the test set</e>
<e>crossval(5)->similar to split, but done multiple times (cross-validation)</e>
</l2st>
<e>Modifiers interact with each other</e>
<e>For example: normalize('minmax'), split</e>
<e>->train set always below 1, but no guarantees for the test set</e>
</list>
</frame>
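One way such a modifier chain can work is by composing generator transforms over a dataset stream. A minimal sketch, assuming toy `(name, x, y)` tuples; the function names mirror the slide but this is not yano's real implementation:

```python
# Sketch: a pipeline() that chains dataset modifiers, where each modifier is a
# generator transform. Illustration only, not yano's actual API.
import random

def shuffle(stream):
    # shuffle the samples within each dataset
    for name, x, y in stream:
        idx = list(range(len(x)))
        random.shuffle(idx)
        yield name, [x[i] for i in idx], [y[i] for i in idx]

def cut(n):
    # keep at most n datasets
    def mod(stream):
        for k, item in enumerate(stream):
            if k >= n:
                break
            yield item
    return mod

def pipeline(datasets, *modifiers):
    stream = iter(datasets)
    for mod in modifiers:
        stream = mod(stream)
    return stream

toy = [("a", [1, 2, 3], [0, 0, 1]), ("b", [4, 5], [0, 1]), ("c", [6], [1])]
names = [name for name, x, y in pipeline(toy, shuffle, cut(2))]
print(names)  # -> ['a', 'b']
```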
@ -0,0 +1,18 @@
<frame title="CrossValidation">
<list>
<e>Learned from DMC: cross-validation is important</e>
<e>Rarely found in anomaly detection. Why?</e>
<e>A bit more complicated (not all samples are equal), but no reason not to</e>
<e>->So I implemented it in yano</e>
<l2st>
<e>folding only on normal data</e>
<e>How to handle anomalies?</e>
<e>If we do not fold them, cross-validation is less useful</e>
<e>If we fold them, already rare anomalies become even rarer</e>
<e>->test set is always 50\% anomalous</e>
<e>->Also makes simple evaluation metrics (accuracy) more meaningful</e>
</l2st>
<e>Do you know a reason why cross-validation is not common in AD?</e>
<e>Are there problems with the way I fold my anomalies?</e>
</list>
</frame>
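The folding scheme above can be sketched as follows. This is one possible reading of the slide (folds built over normals only, each test fold topped up with anomalies toward a 50% anomaly rate); the helper name and the exact top-up rule are assumptions, not yano's actual code:

```python
# Sketch of anomaly-aware k-fold splitting: fold the normal points, then add
# anomalies to each test fold so the test set is ~50% anomalous.
# Illustrative scheme only; a real implementation might also rotate anomalies.
def anomaly_crossval(normals, anomalies, k):
    folds = []
    for i in range(k):
        test_norm = normals[i::k]                              # held-out normals
        train = [p for j, p in enumerate(normals) if j % k != i]
        test_anom = anomalies[: len(test_norm)]                # up to 50% anomalies
        folds.append((train, test_norm + test_anom))
    return folds

normals = list(range(10))       # toy normal samples
anomalies = [100, 101, 102]     # toy (rare) anomalies
for train, test in anomaly_crossval(normals, anomalies, k=5):
    print(len(train), len(test))  # each fold: 8 train, 4 test (2 normal + 2 anomalous)
```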
@ -0,0 +1,17 @@
<frame title="Logging">

<code>
from yano.logging import Logger
from pyod.models.iforest import IForest
from extended_iforest import train_extended_ifor

l = Logger({"IFor": IForest(n_estimators=100),
            "eIFor": train_extended_ifor})
for dataset, folds in pipeline(condition,
                               crossval(5),
                               normalize("minmax"),
                               shuffle):
    l.run_cross(dataset, folds)
latex = l.to_latex()
</code>

</frame>
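The LaTeX output on the later slides (mean ± std per model, best result in bold) can be produced roughly like this. `to_latex_row` is a hypothetical helper, not yano's `Logger`; note it bolds only the single best mean, whereas the real tables bold all models that are statistically equal to the best:

```python
# Sketch: turn per-model fold scores into a LaTeX table row like
# "$pima$ & $\textbf{0.7450} \pm 0.0071$ & $0.7350 \pm 0.0071$ \\".
# Hypothetical helper for illustration; simplified bolding rule.
from statistics import mean, stdev

def to_latex_row(dataset, scores):   # scores: {model: [fold AUCs]}
    means = {m: mean(v) for m, v in scores.items()}
    best = max(means.values())
    cells = []
    for m, v in scores.items():
        num = f"{means[m]:.4f}"
        if means[m] == best:
            num = f"\\textbf{{{num}}}"   # bold the best mean
        cells.append(f"${num} \\pm {stdev(v):.4f}$")
    return f"${dataset}$ & " + " & ".join(cells) + " \\\\"

row = to_latex_row("pima", {"eIFor": [0.74, 0.75], "IFor": [0.73, 0.74]})
print(row)
```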
@ -0,0 +1,9 @@
<frame title="Seeding">
<list>
<e>If you don't do anything, everything is seeded.</e>
<e>Makes rerunning a model until the performance is good quite obvious.</e>
<e>But as every run is seeded itself, this might induce bias.</e>
<e>Do you think this is worth it?</e>
<e>Are there any problems with this?</e>
</list>
</frame>
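Default seeding like this is often done by deriving a seed deterministically from the experiment identity. A minimal sketch of that idea (the derivation scheme is an assumption, not yano's actual mechanism):

```python
# Sketch: deterministic per-run seeding. Deriving the seed from the dataset
# name and run index makes every run reproducible by default, while different
# runs still get different streams. Illustrative scheme only.
import hashlib
import random

def run_seed(dataset_name, run_index):
    digest = hashlib.sha256(f"{dataset_name}:{run_index}".encode()).hexdigest()
    return int(digest[:8], 16)  # stable 32-bit seed

rng_a = random.Random(run_seed("pima", 0))
rng_b = random.Random(run_seed("pima", 0))
print(rng_a.random() == rng_b.random())  # -> True (same seed, same stream)
```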
@ -0,0 +1,24 @@
<frame >

\begin{tabular}{lll}
\hline
Dataset & eIFor & IFor \\
\hline
$pc3$ & $\textbf{0.7231} \pm 0.0153$ & $\textbf{0.7223} \pm 0.0178$ \\
$pima$ & $\textbf{0.7405} \pm 0.0110$ & $\textbf{0.7347} \pm 0.0126$ \\
$Diabetes\_present$ & $\textbf{0.7414} \pm 0.0195$ & $\textbf{0.7344} \pm 0.0242$ \\
$waveform-5000$ & $\textbf{0.7687} \pm 0.0123$ & $\textbf{0.7592} \pm 0.0206$ \\
$vowels$ & $\textbf{0.7843} \pm 0.0298$ & $\textbf{0.7753} \pm 0.0334$ \\
$Vowel\_0$ & $\textbf{0.8425} \pm 0.0698$ & $0.7193 \pm 0.0817$ \\
$Abalone\_1\_8$ & $\textbf{0.8525} \pm 0.0263$ & $0.8452 \pm 0.0257$ \\
$annthyroid$ & $0.8399 \pm 0.0135$ & $\textbf{0.9087} \pm 0.0090$ \\
$Vehicle\_van$ & $\textbf{0.8792} \pm 0.0265$ & $\textbf{0.8697} \pm 0.0383$ \\
$ionosphere$ & $\textbf{0.9320} \pm 0.0069$ & $0.9086 \pm 0.0142$ \\
$breastw$ & $\textbf{0.9948} \pm 0.0031$ & $\textbf{0.9952} \pm 0.0033$ \\
$segment$ & $\textbf{1.0}$ & $\textbf{0.9993} \pm 0.0015$ \\
$$ & $$ & $$ \\
$Average$ & $\textbf{0.8005}$ & $\textbf{0.7957}$ \\
\hline
\end{tabular}

</frame>
@ -0,0 +1,10 @@
<frame title="statistics">
<list>
<e>Friedman test to see if there is a difference between the models</e>
<e>Nemenyi test to see which models are equal; mark those equal to the maximum</e>
<e>For 2 models, the Friedman test is not defined -> use the Wilcoxon test</e>
<e>Does this match your expectation from the table?</e>
<e>Two models are 'equal' if their probability of being from the same distribution is #LessThan(p_b,p)#, what value should #Eq(p_b,0.1)# have?</e>
<e>Do I need to correct for multiple testing (n experiments, so the threshold should be stricter for each), or is that clear from the table?</e>
</list>
</frame>
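The rank-based machinery behind the Friedman/Nemenyi procedure can be sketched as follows (after Demšar 2006): compute each model's average rank over all datasets, then compare rank differences against a critical difference CD = q_alpha * sqrt(k(k+1)/(6N)). The `q_alpha` value comes from a studentized-range table and is left as a parameter here; the toy scores are made up and, for simplicity, the sketch ignores ties (a full implementation would assign average ranks to tied scores):

```python
# Sketch: average ranks per model over datasets plus the Nemenyi critical
# difference CD = q_alpha * sqrt(k*(k+1)/(6*N)). Ties are ignored here.
import math

def average_ranks(scores):  # scores: {model: [score per dataset]}, higher = better
    models = list(scores)
    n = len(next(iter(scores.values())))
    ranks = {m: 0.0 for m in models}
    for i in range(n):
        ordered = sorted(models, key=lambda m: scores[m][i], reverse=True)
        for r, m in enumerate(ordered, start=1):
            ranks[m] += r
    return {m: ranks[m] / n for m in models}

def nemenyi_cd(k, n, q_alpha):
    # k models, n datasets; q_alpha from a studentized-range table
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

scores = {"Knn": [0.80, 0.81, 0.99],   # toy AUCs over 3 datasets
          "IFor": [0.82, 0.78, 0.98],
          "Lof": [0.77, 0.80, 1.00]}
print(average_ranks(scores))
```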
@ -0,0 +1,24 @@
<section Experiments 1>
<repeat w="['ifor','eifor','qual']">
<frame title="Extended Isolation Forests">
<split>
<que>
<list>
<e>Isolation Forests are one algorithm for AD</e>
<e>Tries to isolate abnormal (rare) points instead of modelling normal ones</e>
<e>Creative approach->fairly successful (3000 citations)</e>
<e>Many follow-up papers</e>
<e>Extended Isolation Forest (Hariri et al. 2018, 140 citations)</e>
<e>Removes bias from the Isolation Forests</e>
<e>Also claims to improve their anomaly detection quality</e>
</list>
</que>

<que>
<i f="???" wmode="True"></i>
</que>
</split>

</frame>

</repeat>
@ -0,0 +1,22 @@
<frame >

\begin{tabular}{lll}
\hline
Dataset & eIFor & IFor \\
\hline
$Delft\_pump\_5x3\_noisy$ & $\textbf{0.3893} \pm 0.0345$ & $\textbf{0.4272} \pm 0.0680$ \\
$vertebral$ & $\textbf{0.4260} \pm 0.0111$ & $\textbf{0.4554} \pm 0.0416$ \\
$Liver\_1$ & $0.5367 \pm 0.0508$ & $\textbf{0.5474} \pm 0.0541$ \\
$Sonar\_mines$ & $\textbf{0.6882} \pm 0.1264$ & $0.6189 \pm 0.1301$ \\
$letter$ & $\textbf{0.6756} \pm 0.0119$ & $0.6471 \pm 0.0111$ \\
$Glass\_building\_float$ & $\textbf{0.6480} \pm 0.1012$ & $\textbf{0.6755} \pm 0.1117$ \\
$pc3$ & $\textbf{0.7231} \pm 0.0153$ & $\textbf{0.7223} \pm 0.0178$ \\
$pima$ & $\textbf{0.7405} \pm 0.0110$ & $\textbf{0.7347} \pm 0.0126$ \\
$Diabetes\_present$ & $\textbf{0.7414} \pm 0.0195$ & $\textbf{0.7344} \pm 0.0242$ \\
$waveform-5000$ & $\textbf{0.7687} \pm 0.0123$ & $\textbf{0.7592} \pm 0.0206$ \\
$steel-plates-fault$ & $\textbf{0.7735} \pm 0.0351$ & $\textbf{0.7682} \pm 0.0402$ \\
$vowels$ & $\textbf{0.7843} \pm 0.0298$ & $\textbf{0.7753} \pm 0.0334$ \\
\hline
\end{tabular}

</frame>
@ -0,0 +1,22 @@
<frame >

\begin{tabular}{lll}
\hline
Dataset & eIFor & IFor \\
\hline
$Vowel\_0$ & $\textbf{0.8425} \pm 0.0698$ & $0.7193 \pm 0.0817$ \\
$Housing\_low$ & $\textbf{0.7807} \pm 0.0333$ & $\textbf{0.7862} \pm 0.0336$ \\
$ozone-level-8hr$ & $\textbf{0.7904} \pm 0.0207$ & $\textbf{0.7768} \pm 0.0118$ \\
$Spectf\_0$ & $\textbf{0.8155} \pm 0.0255$ & $0.7535 \pm 0.0239$ \\
$HeartC$ & $0.7795 \pm 0.0258$ & $\textbf{0.8079} \pm 0.0255$ \\
$satellite$ & $\textbf{0.8125} \pm 0.0170$ & $\textbf{0.8103} \pm 0.0061$ \\
$optdigits$ & $\textbf{0.8099} \pm 0.0310$ & $\textbf{0.8142} \pm 0.0267$ \\
$spambase$ & $\textbf{0.8085} \pm 0.0110$ & $\textbf{0.8202} \pm 0.0042$ \\
$Abalone\_1\_8$ & $\textbf{0.8525} \pm 0.0263$ & $0.8452 \pm 0.0257$ \\
$qsar-biodeg$ & $\textbf{0.8584} \pm 0.0119$ & $\textbf{0.8628} \pm 0.0135$ \\
$annthyroid$ & $0.8399 \pm 0.0135$ & $\textbf{0.9087} \pm 0.0090$ \\
$Vehicle\_van$ & $\textbf{0.8792} \pm 0.0265$ & $\textbf{0.8697} \pm 0.0383$ \\
\hline
\end{tabular}

</frame>
@ -0,0 +1,21 @@
<frame >

\begin{tabular}{lll}
\hline
Dataset & eIFor & IFor \\
\hline
$ionosphere$ & $\textbf{0.9320} \pm 0.0069$ & $0.9086 \pm 0.0142$ \\
$page-blocks$ & $0.9189 \pm 0.0061$ & $\textbf{0.9299} \pm 0.0016$ \\
$Ecoli$ & $\textbf{0.9418} \pm 0.0292$ & $0.9192 \pm 0.0332$ \\
$cardio$ & $\textbf{0.9564} \pm 0.0043$ & $\textbf{0.9535} \pm 0.0036$ \\
$wbc$ & $\textbf{0.9611} \pm 0.0121$ & $\textbf{0.9607} \pm 0.0107$ \\
$pendigits$ & $\textbf{0.9641} \pm 0.0097$ & $\textbf{0.9652} \pm 0.0076$ \\
$thyroid$ & $0.9818 \pm 0.0024$ & $\textbf{0.9871} \pm 0.0025$ \\
$breastw$ & $\textbf{0.9948} \pm 0.0031$ & $\textbf{0.9952} \pm 0.0033$ \\
$segment$ & $\textbf{1.0}$ & $\textbf{0.9993} \pm 0.0015$ \\
$$ & $$ & $$ \\
$Average$ & $\textbf{0.8005} \pm 0.1458$ & $\textbf{0.7957} \pm 0.1431$ \\
\hline
\end{tabular}

</frame>
@ -0,0 +1,4 @@
<section Experiments 2>
<frame title="highdim">
<i f="../prep/19highdim/a.png" wmode="True"></i>
</frame>
@ -0,0 +1,13 @@
<frame title="New Condition">

<code>
condition = ((number_of_samples > 200) &
             (number_of_samples < 10000) &
             (number_of_features > 50) &
             (number_of_features < 500) &
             ~index)
print(len(condition), "Datasets found")
</code>
->13 Datasets found

</frame>
@ -0,0 +1,12 @@
<frame title="New Models">

<code>
from pyod.models.iforest import IForest
from pyod.models.knn import KNN
from pyod.models.lof import LOF

l = Logger({"IFor": IForest(n_estimators=100),
            "Lof": LOF(),
            "Knn": KNN()}, addfeat=True)
</code>

</frame>
@ -0,0 +1,25 @@
<frame >

\begin{tabular}{llll}
\hline
Dataset & Knn & Lof & IFor \\
\hline
$Delft\_pump\_5x3\_noisy(64)$ & $0.3800 \pm 0.0475$ & $0.3462 \pm 0.0327$ & $\textbf{0.4272} \pm 0.0680$ \\
$hill-valley(100)$ & $0.4744 \pm 0.0269$ & $\textbf{0.5060} \pm 0.0327$ & $0.4720 \pm 0.0288$ \\
$speech(400)$ & $0.4903 \pm 0.0103$ & $\textbf{0.5104} \pm 0.0115$ & $0.4872 \pm 0.0184$ \\
$Sonar\_mines(60)$ & $\textbf{0.7284} \pm 0.0939$ & $0.6769 \pm 0.0933$ & $0.6189 \pm 0.1301$ \\
$ozone-level-8hr(72)$ & $\textbf{0.8051} \pm 0.0288$ & $0.7738 \pm 0.0292$ & $\textbf{0.7768} \pm 0.0118$ \\
$spambase(57)$ & $0.8038 \pm 0.0125$ & $0.7712 \pm 0.0055$ & $\textbf{0.8202} \pm 0.0042$ \\
$arrhythmia(274)$ & $\textbf{0.8137} \pm 0.0185$ & $0.8042 \pm 0.0186$ & $\textbf{0.8086} \pm 0.0099$ \\
$mnist(100)$ & $0.9345 \pm 0.0039$ & $\textbf{0.9548} \pm 0.0037$ & $0.8732 \pm 0.0069$ \\
$Concordia3\_32(256)$ & $0.9246 \pm 0.0107$ & $\textbf{0.9486} \pm 0.0099$ & $\textbf{0.9322} \pm 0.0178$ \\
$optdigits(64)$ & $0.9966 \pm 0.0012$ & $\textbf{0.9975} \pm 0.0012$ & $0.8142 \pm 0.0267$ \\
$gas-drift(128)$ & $\textbf{0.9790} \pm 0.0018$ & $0.9585 \pm 0.0055$ & $0.8764 \pm 0.0166$ \\
$Delft\_pump\_AR(160)$ & $\textbf{0.9965}$ & $\textbf{0.9953} \pm 0.0019$ & $0.9665 \pm 0.0096$ \\
$musk(166)$ & $\textbf{1.0}$ & $\textbf{1.0}$ & $0.9808 \pm 0.0117$ \\
$$ & $$ & $$ & $$ \\
$Average$ & $\textbf{0.7944}$ & $\textbf{0.7879}$ & $0.7580$ \\
\hline
\end{tabular}

</frame>
@ -0,0 +1,11 @@
<frame >

<l2st>
<e>Hypothesis: Isolation Forests are better when there are numerical and nominal attributes</e>
<e>Easy to test</e>
</l2st>
<code>
condition = condition & (numeric & nominal)
</code>

</frame>
@ -0,0 +1,20 @@
<frame >

\begin{tabular}{llll}
\hline
Dataset & Knn & IFor & Lof \\
\hline
$ozone-level-8hr(72)$ & $\textbf{0.8051} \pm 0.0288$ & $\textbf{0.7768} \pm 0.0118$ & $0.7738 \pm 0.0292$ \\
$spambase(57)$ & $0.8038 \pm 0.0125$ & $\textbf{0.8202} \pm 0.0042$ & $0.7712 \pm 0.0055$ \\
$arrhythmia(274)$ & $\textbf{0.8137} \pm 0.0185$ & $\textbf{0.8086} \pm 0.0099$ & $0.8042 \pm 0.0186$ \\
$musk(166)$ & $\textbf{1.0}$ & $0.9808 \pm 0.0117$ & $\textbf{1.0}$ \\
$$ & $$ & $$ & $$ \\
$Average$ & $\textbf{0.8556}$ & $\textbf{0.8466}$ & $\textbf{0.8373}$ \\
\hline
\end{tabular}
<l2st>
<e>Only 4 datasets, so not clear at all</e>
<e>->More datasets</e>
</l2st>

</frame>
@ -0,0 +1,12 @@
<section Experiments 3>
<frame title="Unsupervised Optimization">
<list>
<e>There are analyses that are only possible with many datasets</e>
<e>Here: unsupervised optimization</e>
<e>Given multiple AD models, find which one is best:</e>
<e>Use the AUC score? Requires anomalies->overfitting</e>
<e>Can you find an unsupervised method?</e>
<e>In general very complicated, so here we only focus on very small differences between the models.</e>
<e>So each model is an autoencoder, trained on the same dataset, where the difference is only in the initialisation</e>
</list>
</frame>
@ -0,0 +1,20 @@
<repeat w="['page-blocks','pima']">
<frame title="Loss Optimization">
<split>
<que>
<list>
<e>First guess: the loss of the model on the training data</e>
<e>How to evaluate this?</e>
<e>Train many models, look at the average AUC score.</e>
<e>For the alternative, take groups of 20 models and look at the AUC score of the best model.</e>
<e>Is there a meaningful difference between the results? Give the result as a z\_score (#(m_1-m_2)/sqrt(s_1**2+s_2**2)#)</e>
<e>This difference depends a lot on the dataset</e>
<e>->even #LessThan(30,z)# does not mean much</e>
</list>
</que>
<que>
<i f="histone_???" wmode="True"></i>
</que>
</split>
</frame>
</repeat>
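The z-score comparison above can be sketched directly: z = (m1 - m2) / sqrt(s1² + s2²) between the best-of-group AUCs and the full set of AUCs. The helper names and the toy AUC values below are made up for illustration:

```python
# Sketch: z-score between mean best-of-group AUCs and the mean of all AUCs,
# z = (m1 - m2) / sqrt(s1^2 + s2^2). Toy numbers, illustrative only.
import math
from statistics import mean, stdev

def z_score(scores_a, scores_b):
    m1, m2 = mean(scores_a), mean(scores_b)
    s1, s2 = stdev(scores_a), stdev(scores_b)
    return (m1 - m2) / math.sqrt(s1 ** 2 + s2 ** 2)

def best_of_groups(scores, group_size):
    # AUC of the best model in each group (the slide uses groups of 20)
    return [max(scores[i:i + group_size]) for i in range(0, len(scores), group_size)]

aucs = [0.70, 0.72, 0.71, 0.74, 0.69, 0.73, 0.75, 0.70]  # toy per-model AUCs
print(round(z_score(best_of_groups(aucs, 4), aucs), 3))
```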
@ -0,0 +1,12 @@
<frame title="loss">
<split>
<que>
<list>
<e>Pick the model with the lowest l2\-loss</e>
</list>
</que>
<que>
<i f="../prep/27loss/z_loss.pdf" wmode="True"></i>
</que>
</split>
</frame>
@ -0,0 +1,14 @@
<frame title="Robustness">
<split>
<que>
<list>
<e>Pick points within a 1\%-width neighbourhood in input space around each point.</e>
<e>For each point, find the maximum difference in output space.</e>
<e>Average this difference</e>
</list>
</que>
<que>
<i f="../prep/28Robustness/z_robu.pdf" wmode="True"></i>
</que>
</split>
</frame>
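The three steps above can be sketched as follows; the toy 1-D models and sampling parameters are assumptions for illustration, not the actual experiment:

```python
# Sketch: robustness = average (over points) of the maximum output change
# under perturbations within a 1%-width box in input space. Toy 1-D models.
import random

def robustness(model, points, width=0.01, n_samples=50, rng=random.Random(0)):
    diffs = []
    for p in points:
        base = model(p)
        worst = max(abs(model(p + rng.uniform(-width / 2, width / 2)) - base)
                    for _ in range(n_samples))
        diffs.append(worst)
    return sum(diffs) / len(diffs)

smooth = lambda x: 0.5 * x    # small slope -> small output change
steep = lambda x: 50.0 * x    # large slope -> large output change
pts = [0.1, 0.5, 0.9]
print(robustness(smooth, pts) < robustness(steep, pts))  # -> True
```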
@ -0,0 +1,14 @@
<frame title="Distance Correlation">
<split>
<que>
<list>
<e>Pick random points in the input space.</e>
<e>Measure the distance in input and output space</e>
<e>A low correlation indicates a good model</e>
</list>
</que>
<que>
<i f="../prep/29Distance_Correlation/z_dist.pdf" wmode="True"></i>
</que>
</split>
</frame>
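The measurement above can be sketched as: sample random input points, collect pairwise distances in input and output space, and correlate them. The use of Pearson correlation and the toy 1-D model are assumptions for illustration:

```python
# Sketch: correlate pairwise input-space distances with output-space distances
# for random points. A toy 1-D identity model gives correlation 1.0.
import math
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    vy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (vx * vy)

def distance_correlation(model, n_points=40, rng=random.Random(0)):
    pts = [rng.uniform(0, 1) for _ in range(n_points)]
    d_in, d_out = [], []
    for i in range(n_points):
        for j in range(i + 1, n_points):
            d_in.append(abs(pts[i] - pts[j]))
            d_out.append(abs(model(pts[i]) - model(pts[j])))
    return pearson(d_in, d_out)

identity = lambda x: x
print(round(distance_correlation(identity), 2))  # -> 1.0
```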
@ -0,0 +1,12 @@
<section Conclusion>
<frame title="Other">
<list>
<e>Things I still want to add:</e>
<l2st>
<e>Ensemble methods</e>
<e>Visualisation options</e>
<e>Alternative evaluations</e>
<e>Hyperparameter optimisation (with cross-validation)</e>
</l2st>
</list>
</frame>
@ -0,0 +1,7 @@
<frame title="Feedback">
<list>
<e>What do you think about this?</e>
<e>Is there something I should add?</e>
<e>What would you need in order to actually use this?</e>
</list>
</frame>
@ -0,0 +1,12 @@
<plt>

<name Current experiment status>
<title pip install yano>
<stitle yano>

<institute ls9 tu Dortmund>

<theme CambridgeUS>
<colo dolphin>

</plt>
@ -0,0 +1 @@
Subproject commit 62ffd6ae589d7983791feea9d44d7658534d54a0
@ -0,0 +1,3 @@
pdflatex main.tex
pdflatex main.tex
@ -0,0 +1,3 @@
pdflatex main.tex
pdflatex main.tex
@ -0,0 +1,127 @@
[
  {
    "typ": "img",
    "files": [
      "../prep/04yano/a.png"
    ],
    "label": "prep04yanoapng",
    "caption": "",
    "where": "../yano//data/004yano.txt"
  },
  {
    "typ": "section",
    "title": "Basics",
    "label": "Basics",
    "file": "../yano//data/005selector.txt",
    "issec": true
  },
  {
    "typ": "section",
    "title": "Experiments 1",
    "label": "Experiments 1",
    "file": "../yano//data/016Extended Isolation Forests.txt",
    "issec": true
  },
  {
    "typ": "img",
    "files": [
      "../imgs/ifor"
    ],
    "label": "ifor",
    "caption": "",
    "where": "../yano//data/016Extended Isolation Forests.txt"
  },
  {
    "typ": "img",
    "files": [
      "../imgs/eifor"
    ],
    "label": "eifor",
    "caption": "",
    "where": "../yano//data/016Extended Isolation Forests.txt"
  },
  {
    "typ": "img",
    "files": [
      "../imgs/qual"
    ],
    "label": "qual",
    "caption": "",
    "where": "../yano//data/016Extended Isolation Forests.txt"
  },
  {
    "typ": "section",
    "title": "Experiments 2",
    "label": "Experiments 2",
    "file": "../yano//data/020highdim.txt",
    "issec": true
  },
  {
    "typ": "img",
    "files": [
      "../prep/19highdim/a.png"
    ],
    "label": "prep19highdimapng",
    "caption": "",
    "where": "../yano//data/020highdim.txt"
  },
  {
    "typ": "section",
    "title": "Experiments 3",
    "label": "Experiments 3",
    "file": "../yano//data/026Unsupervised Optimization.txt",
    "issec": true
  },
  {
    "typ": "img",
    "files": [
      "../imgs/histone_page-blocks"
    ],
    "label": "histone_page-blocks",
    "caption": "",
    "where": "../yano//data/027Loss Optimization.txt"
  },
  {
    "typ": "img",
    "files": [
      "../imgs/histone_pima"
    ],
    "label": "histone_pima",
    "caption": "",
    "where": "../yano//data/027Loss Optimization.txt"
  },
  {
    "typ": "img",
    "files": [
      "../prep/27loss/z_loss.pdf"
    ],
    "label": "prep27lossz_losspdf",
    "caption": "",
    "where": "../yano//data/028loss.txt"
  },
  {
    "typ": "img",
    "files": [
      "../prep/28Robustness/z_robu.pdf"
    ],
    "label": "prep28Robustnessz_robupdf",
    "caption": "",
    "where": "../yano//data/029Robustness.txt"
  },
  {
    "typ": "img",
    "files": [
      "../prep/29Distance_Correlation/z_dist.pdf"
    ],
    "label": "prep29Distance_Correlationz_distpdf",
    "caption": "",
    "where": "../yano//data/030Distance Correlation.txt"
  },
  {
    "typ": "section",
    "title": "Conclusion",
    "label": "Conclusion",
    "file": "../yano//data/031Other.txt",
    "issec": true
  }
]
|
||||
\@writefile{nav}{\headcommand {\slideentry {3}{0}{5}{27/27}{}{0}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@framepages {27}{27}}}
|
||||
\@writefile{nav}{\headcommand {\slideentry {3}{0}{6}{28/28}{}{0}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@framepages {28}{28}}}
|
||||
\@writefile{toc}{\beamer@sectionintoc {4}{Experiments 3}{29}{0}{4}}
|
||||
\@writefile{nav}{\headcommand {\beamer@sectionpages {23}{28}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@subsectionpages {23}{28}}}
|
||||
\@writefile{nav}{\headcommand {\sectionentry {4}{Experiments 3}{29}{Experiments 3}{0}}}
|
||||
\newlabel{sec:Experiments 3}{{4}{29}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {sec:Experiments 3}{29}}
|
||||
\newlabel{Unsupervised Optimization<1>}{{29}{29}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Unsupervised Optimization<1>}{29}}
|
||||
\newlabel{Unsupervised Optimization}{{29}{29}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Unsupervised Optimization}{29}}
|
||||
\@writefile{nav}{\headcommand {\slideentry {4}{0}{1}{29/29}{}{0}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@framepages {29}{29}}}
|
||||
\newlabel{Loss Optimization<1>}{{30}{30}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Loss Optimization<1>}{30}}
|
||||
\newlabel{Loss Optimization}{{30}{30}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Loss Optimization}{30}}
|
||||
\newlabel{fig:histone_page-blocks}{{30}{30}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {fig:histone_page-blocks}{30}}
|
||||
\@writefile{nav}{\headcommand {\slideentry {4}{0}{2}{30/30}{}{0}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@framepages {30}{30}}}
|
||||
\newlabel{Loss Optimization<1>}{{31}{31}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Loss Optimization<1>}{31}}
|
||||
\newlabel{Loss Optimization}{{31}{31}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Loss Optimization}{31}}
|
||||
\newlabel{fig:histone_pima}{{31}{31}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {fig:histone_pima}{31}}
|
||||
\@writefile{nav}{\headcommand {\slideentry {4}{0}{3}{31/31}{}{0}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@framepages {31}{31}}}
|
||||
\newlabel{loss<1>}{{32}{32}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {loss<1>}{32}}
|
||||
\newlabel{loss}{{32}{32}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {loss}{32}}
|
||||
\newlabel{fig:prep27lossz_losspdf}{{32}{32}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {fig:prep27lossz_losspdf}{32}}
|
||||
\@writefile{nav}{\headcommand {\slideentry {4}{0}{4}{32/32}{}{0}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@framepages {32}{32}}}
|
||||
\newlabel{Robustness<1>}{{33}{33}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Robustness<1>}{33}}
|
||||
\newlabel{Robustness}{{33}{33}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Robustness}{33}}
|
||||
\newlabel{fig:prep28Robustnessz_robupdf}{{33}{33}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {fig:prep28Robustnessz_robupdf}{33}}
|
||||
\@writefile{nav}{\headcommand {\slideentry {4}{0}{5}{33/33}{}{0}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@framepages {33}{33}}}
|
||||
\newlabel{Distance Correlation<1>}{{34}{34}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Distance Correlation<1>}{34}}
|
||||
\newlabel{Distance Correlation}{{34}{34}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Distance Correlation}{34}}
|
||||
\newlabel{fig:prep29Distance_Correlationz_distpdf}{{34}{34}{Experiments 3}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {fig:prep29Distance_Correlationz_distpdf}{34}}
|
||||
\@writefile{nav}{\headcommand {\slideentry {4}{0}{6}{34/34}{}{0}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@framepages {34}{34}}}
|
||||
\@writefile{toc}{\beamer@sectionintoc {5}{Conclusion}{35}{0}{5}}
|
||||
\@writefile{nav}{\headcommand {\beamer@sectionpages {29}{34}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@subsectionpages {29}{34}}}
|
||||
\@writefile{nav}{\headcommand {\sectionentry {5}{Conclusion}{35}{Conclusion}{0}}}
|
||||
\newlabel{sec:Conclusion}{{5}{35}{Conclusion}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {sec:Conclusion}{35}}
|
||||
\newlabel{Other<1>}{{35}{35}{Conclusion}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Other<1>}{35}}
|
||||
\newlabel{Other}{{35}{35}{Conclusion}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Other}{35}}
|
||||
\@writefile{nav}{\headcommand {\slideentry {5}{0}{1}{35/35}{}{0}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@framepages {35}{35}}}
|
||||
\newlabel{Feedback<1>}{{36}{36}{Conclusion}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Feedback<1>}{36}}
|
||||
\newlabel{Feedback}{{36}{36}{Conclusion}{Doc-Start}{}}
|
||||
\@writefile{snm}{\beamer@slide {Feedback}{36}}
|
||||
\@writefile{nav}{\headcommand {\slideentry {5}{0}{2}{36/36}{}{0}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@framepages {36}{36}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@partpages {1}{36}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@subsectionpages {35}{36}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@sectionpages {35}{36}}}
|
||||
\@writefile{nav}{\headcommand {\beamer@documentpages {36}}}
|
||||
\@writefile{nav}{\headcommand {\gdef \inserttotalframenumber {36}}}
|
||||
\gdef \@abspage@last{36}
|
File diff suppressed because it is too large
|
@ -0,0 +1,92 @@
|
|||
\headcommand {\slideentry {0}{0}{1}{1/1}{}{0}}
|
||||
\headcommand {\beamer@framepages {1}{1}}
|
||||
\headcommand {\slideentry {0}{0}{2}{2/2}{}{0}}
|
||||
\headcommand {\beamer@framepages {2}{2}}
|
||||
\headcommand {\slideentry {0}{0}{3}{3/3}{}{0}}
|
||||
\headcommand {\beamer@framepages {3}{3}}
|
||||
\headcommand {\slideentry {0}{0}{4}{4/4}{}{0}}
|
||||
\headcommand {\beamer@framepages {4}{4}}
|
||||
\headcommand {\slideentry {0}{0}{5}{5/5}{}{0}}
|
||||
\headcommand {\beamer@framepages {5}{5}}
|
||||
\headcommand {\beamer@sectionpages {1}{5}}
|
||||
\headcommand {\beamer@subsectionpages {1}{5}}
|
||||
\headcommand {\sectionentry {1}{Basics}{6}{Basics}{0}}
|
||||
\headcommand {\slideentry {1}{0}{1}{6/6}{}{0}}
|
||||
\headcommand {\beamer@framepages {6}{6}}
|
||||
\headcommand {\slideentry {1}{0}{2}{7/7}{}{0}}
|
||||
\headcommand {\beamer@framepages {7}{7}}
|
||||
\headcommand {\slideentry {1}{0}{3}{8/8}{}{0}}
|
||||
\headcommand {\beamer@framepages {8}{8}}
|
||||
\headcommand {\slideentry {1}{0}{4}{9/9}{}{0}}
|
||||
\headcommand {\beamer@framepages {9}{9}}
|
||||
\headcommand {\slideentry {1}{0}{5}{10/10}{}{0}}
|
||||
\headcommand {\beamer@framepages {10}{10}}
|
||||
\headcommand {\slideentry {1}{0}{6}{11/11}{}{0}}
|
||||
\headcommand {\beamer@framepages {11}{11}}
|
||||
\headcommand {\slideentry {1}{0}{7}{12/12}{}{0}}
|
||||
\headcommand {\beamer@framepages {12}{12}}
|
||||
\headcommand {\slideentry {1}{0}{8}{13/13}{}{0}}
|
||||
\headcommand {\beamer@framepages {13}{13}}
|
||||
\headcommand {\slideentry {1}{0}{9}{14/14}{}{0}}
|
||||
\headcommand {\beamer@framepages {14}{14}}
|
||||
\headcommand {\slideentry {1}{0}{10}{15/15}{}{0}}
|
||||
\headcommand {\beamer@framepages {15}{15}}
|
||||
\headcommand {\slideentry {1}{0}{11}{16/16}{}{0}}
|
||||
\headcommand {\beamer@framepages {16}{16}}
|
||||
\headcommand {\beamer@sectionpages {6}{16}}
|
||||
\headcommand {\beamer@subsectionpages {6}{16}}
|
||||
\headcommand {\sectionentry {2}{Experiments 1}{17}{Experiments 1}{0}}
|
||||
\headcommand {\slideentry {2}{0}{1}{17/17}{}{0}}
|
||||
\headcommand {\beamer@framepages {17}{17}}
|
||||
\headcommand {\slideentry {2}{0}{2}{18/18}{}{0}}
|
||||
\headcommand {\beamer@framepages {18}{18}}
|
||||
\headcommand {\slideentry {2}{0}{3}{19/19}{}{0}}
|
||||
\headcommand {\beamer@framepages {19}{19}}
|
||||
\headcommand {\slideentry {2}{0}{4}{20/20}{}{0}}
|
||||
\headcommand {\beamer@framepages {20}{20}}
|
||||
\headcommand {\slideentry {2}{0}{5}{21/21}{}{0}}
|
||||
\headcommand {\beamer@framepages {21}{21}}
|
||||
\headcommand {\slideentry {2}{0}{6}{22/22}{}{0}}
|
||||
\headcommand {\beamer@framepages {22}{22}}
|
||||
\headcommand {\beamer@sectionpages {17}{22}}
|
||||
\headcommand {\beamer@subsectionpages {17}{22}}
|
||||
\headcommand {\sectionentry {3}{Experiments 2}{23}{Experiments 2}{0}}
|
||||
\headcommand {\slideentry {3}{0}{1}{23/23}{}{0}}
|
||||
\headcommand {\beamer@framepages {23}{23}}
|
||||
\headcommand {\slideentry {3}{0}{2}{24/24}{}{0}}
|
||||
\headcommand {\beamer@framepages {24}{24}}
|
||||
\headcommand {\slideentry {3}{0}{3}{25/25}{}{0}}
|
||||
\headcommand {\beamer@framepages {25}{25}}
|
||||
\headcommand {\slideentry {3}{0}{4}{26/26}{}{0}}
|
||||
\headcommand {\beamer@framepages {26}{26}}
|
||||
\headcommand {\slideentry {3}{0}{5}{27/27}{}{0}}
|
||||
\headcommand {\beamer@framepages {27}{27}}
|
||||
\headcommand {\slideentry {3}{0}{6}{28/28}{}{0}}
|
||||
\headcommand {\beamer@framepages {28}{28}}
|
||||
\headcommand {\beamer@sectionpages {23}{28}}
|
||||
\headcommand {\beamer@subsectionpages {23}{28}}
|
||||
\headcommand {\sectionentry {4}{Experiments 3}{29}{Experiments 3}{0}}
|
||||
\headcommand {\slideentry {4}{0}{1}{29/29}{}{0}}
|
||||
\headcommand {\beamer@framepages {29}{29}}
|
||||
\headcommand {\slideentry {4}{0}{2}{30/30}{}{0}}
|
||||
\headcommand {\beamer@framepages {30}{30}}
|
||||
\headcommand {\slideentry {4}{0}{3}{31/31}{}{0}}
|
||||
\headcommand {\beamer@framepages {31}{31}}
|
||||
\headcommand {\slideentry {4}{0}{4}{32/32}{}{0}}
|
||||
\headcommand {\beamer@framepages {32}{32}}
|
||||
\headcommand {\slideentry {4}{0}{5}{33/33}{}{0}}
|
||||
\headcommand {\beamer@framepages {33}{33}}
|
||||
\headcommand {\slideentry {4}{0}{6}{34/34}{}{0}}
|
||||
\headcommand {\beamer@framepages {34}{34}}
|
||||
\headcommand {\beamer@sectionpages {29}{34}}
|
||||
\headcommand {\beamer@subsectionpages {29}{34}}
|
||||
\headcommand {\sectionentry {5}{Conclusion}{35}{Conclusion}{0}}
|
||||
\headcommand {\slideentry {5}{0}{1}{35/35}{}{0}}
|
||||
\headcommand {\beamer@framepages {35}{35}}
|
||||
\headcommand {\slideentry {5}{0}{2}{36/36}{}{0}}
|
||||
\headcommand {\beamer@framepages {36}{36}}
|
||||
\headcommand {\beamer@partpages {1}{36}}
|
||||
\headcommand {\beamer@subsectionpages {35}{36}}
|
||||
\headcommand {\beamer@sectionpages {35}{36}}
|
||||
\headcommand {\beamer@documentpages {36}}
|
||||
\headcommand {\gdef \inserttotalframenumber {36}}
|
|
@ -0,0 +1,5 @@
|
|||
\BOOKMARK [2][]{Outline0.1}{\376\377\000B\000a\000s\000i\000c\000s}{}% 1
|
||||
\BOOKMARK [2][]{Outline0.2}{\376\377\000E\000x\000p\000e\000r\000i\000m\000e\000n\000t\000s\000\040\0001}{}% 2
|
||||
\BOOKMARK [2][]{Outline0.3}{\376\377\000E\000x\000p\000e\000r\000i\000m\000e\000n\000t\000s\000\040\0002}{}% 3
|
||||
\BOOKMARK [2][]{Outline0.4}{\376\377\000E\000x\000p\000e\000r\000i\000m\000e\000n\000t\000s\000\040\0003}{}% 4
|
||||
\BOOKMARK [2][]{Outline0.5}{\376\377\000C\000o\000n\000c\000l\000u\000s\000i\000o\000n}{}% 5
|
Binary file not shown.
|
@ -0,0 +1,71 @@
|
|||
\beamer@slide {Problem<1>}{2}
|
||||
\beamer@slide {Problem}{2}
|
||||
\beamer@slide {Students<1>}{3}
|
||||
\beamer@slide {Students}{3}
|
||||
\beamer@slide {yano<1>}{4}
|
||||
\beamer@slide {yano}{4}
|
||||
\beamer@slide {yano<1>}{5}
|
||||
\beamer@slide {yano}{5}
|
||||
\beamer@slide {fig:prep04yanoapng}{5}
|
||||
\beamer@slide {sec:Basics}{6}
|
||||
\beamer@slide {selector<1>}{6}
|
||||
\beamer@slide {selector}{6}
|
||||
\beamer@slide {selectors<1>}{7}
|
||||
\beamer@slide {selectors}{7}
|
||||
\beamer@slide {iterating<1>}{8}
|
||||
\beamer@slide {iterating}{8}
|
||||
\beamer@slide {iterating<1>}{9}
|
||||
\beamer@slide {iterating}{9}
|
||||
\beamer@slide {pipeline<1>}{10}
|
||||
\beamer@slide {pipeline}{10}
|
||||
\beamer@slide {pipeline<1>}{11}
|
||||
\beamer@slide {pipeline}{11}
|
||||
\beamer@slide {CrossValidation<1>}{12}
|
||||
\beamer@slide {CrossValidation}{12}
|
||||
\beamer@slide {Logging<1>}{13}
|
||||
\beamer@slide {Logging}{13}
|
||||
\beamer@slide {Seeding<1>}{14}
|
||||
\beamer@slide {Seeding}{14}
|
||||
\beamer@slide {statistics<1>}{16}
|
||||
\beamer@slide {statistics}{16}
|
||||
\beamer@slide {sec:Experiments 1}{17}
|
||||
\beamer@slide {Extended Isolation Forests<1>}{17}
|
||||
\beamer@slide {Extended Isolation Forests}{17}
|
||||
\beamer@slide {fig:ifor}{17}
|
||||
\beamer@slide {Extended Isolation Forests<1>}{18}
|
||||
\beamer@slide {Extended Isolation Forests}{18}
|
||||
\beamer@slide {fig:eifor}{18}
|
||||
\beamer@slide {Extended Isolation Forests<1>}{19}
|
||||
\beamer@slide {Extended Isolation Forests}{19}
|
||||
\beamer@slide {fig:qual}{19}
|
||||
\beamer@slide {sec:Experiments 2}{23}
|
||||
\beamer@slide {highdim<1>}{23}
|
||||
\beamer@slide {highdim}{23}
|
||||
\beamer@slide {fig:prep19highdimapng}{23}
|
||||
\beamer@slide {New Condition<1>}{24}
|
||||
\beamer@slide {New Condition}{24}
|
||||
\beamer@slide {New Models<1>}{25}
|
||||
\beamer@slide {New Models}{25}
|
||||
\beamer@slide {sec:Experiments 3}{29}
|
||||
\beamer@slide {Unsupervised Optimization<1>}{29}
|
||||
\beamer@slide {Unsupervised Optimization}{29}
|
||||
\beamer@slide {Loss Optimization<1>}{30}
|
||||
\beamer@slide {Loss Optimization}{30}
|
||||
\beamer@slide {fig:histone_page-blocks}{30}
|
||||
\beamer@slide {Loss Optimization<1>}{31}
|
||||
\beamer@slide {Loss Optimization}{31}
|
||||
\beamer@slide {fig:histone_pima}{31}
|
||||
\beamer@slide {loss<1>}{32}
|
||||
\beamer@slide {loss}{32}
|
||||
\beamer@slide {fig:prep27lossz_losspdf}{32}
|
||||
\beamer@slide {Robustness<1>}{33}
|
||||
\beamer@slide {Robustness}{33}
|
||||
\beamer@slide {fig:prep28Robustnessz_robupdf}{33}
|
||||
\beamer@slide {Distance Correlation<1>}{34}
|
||||
\beamer@slide {Distance Correlation}{34}
|
||||
\beamer@slide {fig:prep29Distance_Correlationz_distpdf}{34}
|
||||
\beamer@slide {sec:Conclusion}{35}
|
||||
\beamer@slide {Other<1>}{35}
|
||||
\beamer@slide {Other}{35}
|
||||
\beamer@slide {Feedback<1>}{36}
|
||||
\beamer@slide {Feedback}{36}
|
File diff suppressed because it is too large
|
@ -0,0 +1,5 @@
|
|||
\beamer@sectionintoc {1}{Basics}{6}{0}{1}
|
||||
\beamer@sectionintoc {2}{Experiments 1}{17}{0}{2}
|
||||
\beamer@sectionintoc {3}{Experiments 2}{23}{0}{3}
|
||||
\beamer@sectionintoc {4}{Experiments 3}{29}{0}{4}
|
||||
\beamer@sectionintoc {5}{Conclusion}{35}{0}{5}
|
|
@ -0,0 +1 @@
|
|||
<titlepage>
|
|
@ -0,0 +1,10 @@
|
|||
Paper with Benedikt
|
||||
require multiple very specific datasets
|
||||
<l2st>
|
||||
many but not too many features
|
||||
at least some samples (for the NN)
|
||||
Best if only numerical attributes
|
||||
specific quality
|
||||
unrelated datasets
|
||||
</l2st>
|
||||
Requires you to search for many datasets and filter them
|
|
@ -0,0 +1,6 @@
|
|||
Not clear what you can use
|
||||
Many different formats
|
||||
train/test splits
|
||||
So for students, I just do this work myself and send them archives directly
|
||||
->Not a good solution
|
||||
|
|
@ -0,0 +1,8 @@
|
|||
So I have been packaging all my scripts
|
||||
I had a surprising amount of fun doing this
|
||||
<l2st>
|
||||
More than just standard functions
|
||||
A couple of weird decisions
|
||||
And this will likely grow further
|
||||
</l2st>
|
||||
->So I would like to discuss some parts with you, and maybe you even have additional features you would want
|
Binary file not shown.
After Width: | Height: | Size: 16 KiB |
|
@ -0,0 +1,6 @@
|
|||
Simply install it via pip
|
||||
Contains 187 real-world datasets
|
||||
->biggest library of datasets explicitly for anomaly detection
|
||||
Not yet happy with this
|
||||
especially since it mostly contains only numerical and nominal attributes
|
||||
->few categorical and no time-series attributes
|
|
@ -0,0 +1,17 @@
|
|||
<code>
|
||||
import yano
|
||||
from yano.symbols import *
|
||||
|
||||
|
||||
condition= (number_of_features>5) &
|
||||
(number_of_features<100) &
|
||||
(number_of_samples>100) &
|
||||
(number_of_samples<10000) &
|
||||
(number_of_samples>2*number_of_features) &
|
||||
~index
|
||||
|
||||
print(len(condition), "Datasets found")
|
||||
|
||||
|
||||
</code>
|
||||
->33 Datasets found
|
|
@ -0,0 +1,26 @@
|
|||
Lots of symbols like this
|
||||
<l2st>
|
||||
name
|
||||
number\_of\_features
|
||||
number\_of\_samples
|
||||
index (correlated datasets)
|
||||
</l2st>
|
||||
Feature types
|
||||
<l2st>
|
||||
numeric
|
||||
nominal
|
||||
categorical
|
||||
(textual)
|
||||
</l2st>
|
||||
Count based
|
||||
<l2st>
|
||||
number\_anomalies
|
||||
number\_normals
|
||||
fraction\_anomalies
|
||||
</l2st>
|
||||
Specific ones
|
||||
<l2st>
|
||||
image\_based
|
||||
(linearly\_seperable)
|
||||
</l2st>
|
||||
|
|
@ -0,0 +1,14 @@
|
|||
<code>
|
||||
for dataset in condition:
|
||||
    print(dataset)
|
||||
|
||||
|
||||
</code>
|
||||
<l2st>
|
||||
<e>\[annthyroid\]</e>
|
||||
<e>\[breastw\]</e>
|
||||
<e>\[cardio\]</e>
|
||||
<e>\[...\]</e>
|
||||
<e>\[Housing\_low\]</e>
|
||||
</l2st>
|
||||
|
|
@ -0,0 +1,8 @@
|
|||
<code>
|
||||
|
||||
for dataset in condition:
|
||||
    x=dataset.getx()
|
||||
    y=dataset.gety()
|
||||
|
||||
|
||||
</code>
|
|
@ -0,0 +1,15 @@
|
|||
<code>
|
||||
|
||||
from yano.iter import *
|
||||
|
||||
for dataset, x,tx,ty in pipeline(condition,
|
||||
split,
|
||||
shuffle,
|
||||
normalize("minmax")):
|
||||
    ...
|
||||
|
||||
|
||||
|
||||
|
||||
</code>
|
||||
|
|
@ -0,0 +1,12 @@
|
|||
Again, there are a couple of possible modifiers
|
||||
<l2st>
|
||||
nonconst->remove constant features
|
||||
shuffle
|
||||
normalize('zscore'/'minmax')
|
||||
cut(10)->at most 10 datasets
|
||||
split->train/test split, all anomalies in the test set
|
||||
crossval(5)->similar to split, but done multiple times (cross-validation)
|
||||
</l2st>
|
||||
modifiers interact with each other
|
||||
For example: normalize('minmax'), split
|
||||
->train set always below 1, but no guarantees for the test set
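This interaction can be sketched in plain numpy (hypothetical data; the actual pipeline internals may differ): the min-max parameters are fit on the training set only, so nothing bounds the test set.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.uniform(0, 1, size=(100, 3))
test = rng.uniform(0, 2, size=(20, 3))   # test data may leave the train range

# Min-max parameters are fit on the training set only.
lo, hi = train.min(axis=0), train.max(axis=0)
train_n = (train - lo) / (hi - lo)
test_n = (test - lo) / (hi - lo)

print(train_n.max())  # <= 1 by construction
print(test_n.max())   # can exceed 1
```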
|
|
@ -0,0 +1,14 @@
|
|||
Learned from DMC: Cross-validation is important
|
||||
Rarely found in Anomaly Detection, why?
|
||||
A bit more complicated (not all samples are equal), but no reason why not
|
||||
->So I implemented it into yano
|
||||
<l2st>
|
||||
folding only on normal data
|
||||
How to handle anomalies?
|
||||
If not folding them, cross-validation is less useful
|
||||
if folding them, already rare anomalies become even rarer
|
||||
->test set always 50\% anomalous
|
||||
->Also improves simple evaluation metrics (accuracy)
|
||||
</l2st>
|
||||
Do you know a reason why cross-validation is not common in AD?
|
||||
Are there problems with the way I fold my anomalies?
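A minimal sketch of this folding scheme (hypothetical helper, not the actual yano implementation): k-fold only over the normal points, then pad each test fold with anomalies, reused across folds, until it is half anomalous.

```python
import numpy as np

def anomaly_folds(y, k=5, seed=0):
    # Hypothetical sketch: k-fold over the normal points only; each test
    # fold gets as many anomalies as normal test points (drawn with
    # replacement if anomalies are scarce), so it is always half anomalous.
    rng = np.random.default_rng(seed)
    normal = rng.permutation(np.flatnonzero(y == 0))
    anom = np.flatnonzero(y == 1)
    for test_normal in np.array_split(normal, k):
        train = np.setdiff1d(normal, test_normal)
        test_anom = rng.choice(anom, size=len(test_normal),
                               replace=len(anom) < len(test_normal))
        yield train, np.concatenate([test_normal, test_anom])

y = np.array([0] * 90 + [1] * 10)
for train_idx, test_idx in anomaly_folds(y):
    pass  # train on normals only, evaluate on the half-anomalous test fold
```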
|
|
@ -0,0 +1,21 @@
|
|||
<code>
|
||||
from yano.logging import Logger
|
||||
from pyod.models.iforest import IForest
|
||||
from extended_iforest import train_extended_ifor
|
||||
|
||||
l=Logger({"IFor":IForest(n_estimators=100),
|
||||
"eIFor":train_extended_ifor})
|
||||
|
||||
for dataset, folds in pipeline(condition,
|
||||
crossval(5),
|
||||
normalize("minmax"),
|
||||
shuffle):
|
||||
    l.run_cross(dataset, folds)
|
||||
|
||||
latex=l.to_latex()
|
||||
|
||||
|
||||
</code>
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,5 @@
|
|||
If you don't do anything, everything is seeded.
|
||||
This makes rerunning a model until the performance looks good quite obvious.
|
||||
But as every run is seeded itself, this might induce bias.
|
||||
Do you think this is worth it?
|
||||
Are there any problems with this?
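One way such automatic seeding can be sketched (hypothetical helper; yano's actual scheme may differ): derive a fixed seed from the run's labels, so reruns cannot silently change results.

```python
import hashlib

import numpy as np

def fixed_seed(*labels):
    # Deterministic 32-bit seed from arbitrary labels (dataset, model, fold).
    digest = hashlib.sha256("|".join(map(str, labels)).encode()).digest()
    return int.from_bytes(digest[:4], "big")

rng = np.random.default_rng(fixed_seed("pima", "IFor", 0))
# Same labels -> same random stream on every rerun. The flip side: each run
# is tied to one fixed stream, which may itself be (un)lucky -> possible bias.
```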
|
|
@ -0,0 +1,21 @@
|
|||
\begin{tabular}{lll}
|
||||
\hline
|
||||
Dataset & eIFor & IFor \\
|
||||
\hline
|
||||
$pc3$ & $\textbf{0.7231} \pm 0.0153$ & $\textbf{0.7223} \pm 0.0178$ \\
|
||||
$pima$ & $\textbf{0.7405} \pm 0.0110$ & $\textbf{0.7347} \pm 0.0126$ \\
|
||||
$Diabetes\_present$ & $\textbf{0.7414} \pm 0.0195$ & $\textbf{0.7344} \pm 0.0242$ \\
|
||||
$waveform-5000$ & $\textbf{0.7687} \pm 0.0123$ & $\textbf{0.7592} \pm 0.0206$ \\
|
||||
$vowels$ & $\textbf{0.7843} \pm 0.0298$ & $\textbf{0.7753} \pm 0.0334$ \\
|
||||
$Vowel\_0$ & $\textbf{0.8425} \pm 0.0698$ & $0.7193 \pm 0.0817$ \\
|
||||
$Abalone\_1\_8$ & $\textbf{0.8525} \pm 0.0263$ & $0.8452 \pm 0.0257$ \\
|
||||
$annthyroid$ & $0.8399 \pm 0.0135$ & $\textbf{0.9087} \pm 0.0090$ \\
|
||||
$Vehicle\_van$ & $\textbf{0.8792} \pm 0.0265$ & $\textbf{0.8697} \pm 0.0383$ \\
|
||||
$ionosphere$ & $\textbf{0.9320} \pm 0.0069$ & $0.9086 \pm 0.0142$ \\
|
||||
$breastw$ & $\textbf{0.9948} \pm 0.0031$ & $\textbf{0.9952} \pm 0.0033$ \\
|
||||
$segment$ & $\textbf{1.0}$ & $\textbf{0.9993} \pm 0.0015$ \\
|
||||
$$ & $$ & $$ \\
|
||||
$Average$ & $\textbf{0.8005}$ & $\textbf{0.7957}$ \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
Friedman test to see if there is a difference between models
|
||||
Nemenyi test to see which models are equal, mark those equal to the maximum
|
||||
For 2 models, the Friedman test is not defined -> use the Wilcoxon test
|
||||
|
||||
Does this match your expectation from the table?
|
||||
Two models are 'equal' if their probability of being from the same distribution is #LessThan(p_b,p)#, what value should #Eq(p_b,0.1)# have?
|
||||
Do I need to correct for p-hacking (n experiments, so increase the difficulty for each, or is that clear from the table)?
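Both tests are available in scipy (the Nemenyi post-hoc test lives in the third-party scikit-posthocs package); a sketch on hypothetical per-dataset AUC scores:

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
scores = rng.uniform(0.6, 0.95, size=(12, 3))  # rows: datasets, cols: models

# Friedman: is there any difference among the three models?
_, p_friedman = friedmanchisquare(*scores.T)

# For only two models Friedman is undefined -> Wilcoxon signed-rank test.
_, p_wilcoxon = wilcoxon(scores[:, 0], scores[:, 1])

# Pairwise follow-up (third-party): scikit_posthocs.posthoc_nemenyi_friedman
```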
|
|
@ -0,0 +1,9 @@
|
|||
Isolation Forests are one algorithm for AD
|
||||
Tries to isolate abnormal (rare) points instead of modelling normal ones
|
||||
Creative approach->fairly successful (3000 citations)
|
||||
Many follow up papers
|
||||
Extended Isolation Forest (Hariri et al. 2018, 140 citations)
|
||||
Removes bias from the original Isolation Forests
|
||||
Also claim to improve their anomaly detection quality
|
||||
(repeat with both cuts and ad quality)
|
||||
|
|
@ -0,0 +1,19 @@
|
|||
\begin{tabular}{lll}
|
||||
\hline
|
||||
Dataset & eIFor & IFor \\
|
||||
\hline
|
||||
$Delft\_pump\_5x3\_noisy$ & $\textbf{0.3893} \pm 0.0345$ & $\textbf{0.4272} \pm 0.0680$ \\
|
||||
$vertebral$ & $\textbf{0.4260} \pm 0.0111$ & $\textbf{0.4554} \pm 0.0416$ \\
|
||||
$Liver\_1$ & $0.5367 \pm 0.0508$ & $\textbf{0.5474} \pm 0.0541$ \\
|
||||
$Sonar\_mines$ & $\textbf{0.6882} \pm 0.1264$ & $0.6189 \pm 0.1301$ \\
|
||||
$letter$ & $\textbf{0.6756} \pm 0.0119$ & $0.6471 \pm 0.0111$ \\
|
||||
$Glass\_building\_float$ & $\textbf{0.6480} \pm 0.1012$ & $\textbf{0.6755} \pm 0.1117$ \\
|
||||
$pc3$ & $\textbf{0.7231} \pm 0.0153$ & $\textbf{0.7223} \pm 0.0178$ \\
|
||||
$pima$ & $\textbf{0.7405} \pm 0.0110$ & $\textbf{0.7347} \pm 0.0126$ \\
|
||||
$Diabetes\_present$ & $\textbf{0.7414} \pm 0.0195$ & $\textbf{0.7344} \pm 0.0242$ \\
|
||||
$waveform-5000$ & $\textbf{0.7687} \pm 0.0123$ & $\textbf{0.7592} \pm 0.0206$ \\
|
||||
$steel-plates-fault$ & $\textbf{0.7735} \pm 0.0351$ & $\textbf{0.7682} \pm 0.0402$ \\
|
||||
$vowels$ & $\textbf{0.7843} \pm 0.0298$ & $\textbf{0.7753} \pm 0.0334$ \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
|
|
@ -0,0 +1,19 @@
|
|||
\begin{tabular}{lll}
|
||||
\hline
|
||||
Dataset & eIFor & IFor \\
|
||||
\hline
|
||||
$Vowel\_0$ & $\textbf{0.8425} \pm 0.0698$ & $0.7193 \pm 0.0817$ \\
|
||||
$Housing\_low$ & $\textbf{0.7807} \pm 0.0333$ & $\textbf{0.7862} \pm 0.0336$ \\
|
||||
$ozone-level-8hr$ & $\textbf{0.7904} \pm 0.0207$ & $\textbf{0.7768} \pm 0.0118$ \\
|
||||
$Spectf\_0$ & $\textbf{0.8155} \pm 0.0255$ & $0.7535 \pm 0.0239$ \\
|
||||
$HeartC$ & $0.7795 \pm 0.0258$ & $\textbf{0.8079} \pm 0.0255$ \\
|
||||
$satellite$ & $\textbf{0.8125} \pm 0.0170$ & $\textbf{0.8103} \pm 0.0061$ \\
|
||||
$optdigits$ & $\textbf{0.8099} \pm 0.0310$ & $\textbf{0.8142} \pm 0.0267$ \\
|
||||
$spambase$ & $\textbf{0.8085} \pm 0.0110$ & $\textbf{0.8202} \pm 0.0042$ \\
|
||||
$Abalone\_1\_8$ & $\textbf{0.8525} \pm 0.0263$ & $0.8452 \pm 0.0257$ \\
|
||||
$qsar-biodeg$ & $\textbf{0.8584} \pm 0.0119$ & $\textbf{0.8628} \pm 0.0135$ \\
|
||||
$annthyroid$ & $0.8399 \pm 0.0135$ & $\textbf{0.9087} \pm 0.0090$ \\
|
||||
$Vehicle\_van$ & $\textbf{0.8792} \pm 0.0265$ & $\textbf{0.8697} \pm 0.0383$ \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
|
|
@ -0,0 +1,18 @@
|
|||
\begin{tabular}{lll}
|
||||
\hline
|
||||
Dataset & eIFor & IFor \\
|
||||
\hline
|
||||
$ionosphere$ & $\textbf{0.9320} \pm 0.0069$ & $0.9086 \pm 0.0142$ \\
|
||||
$page-blocks$ & $0.9189 \pm 0.0061$ & $\textbf{0.9299} \pm 0.0016$ \\
|
||||
$Ecoli$ & $\textbf{0.9418} \pm 0.0292$ & $0.9192 \pm 0.0332$ \\
|
||||
$cardio$ & $\textbf{0.9564} \pm 0.0043$ & $\textbf{0.9535} \pm 0.0036$ \\
|
||||
$wbc$ & $\textbf{0.9611} \pm 0.0121$ & $\textbf{0.9607} \pm 0.0107$ \\
|
||||
$pendigits$ & $\textbf{0.9641} \pm 0.0097$ & $\textbf{0.9652} \pm 0.0076$ \\
|
||||
$thyroid$ & $0.9818 \pm 0.0024$ & $\textbf{0.9871} \pm 0.0025$ \\
|
||||
$breastw$ & $\textbf{0.9948} \pm 0.0031$ & $\textbf{0.9952} \pm 0.0033$ \\
|
||||
$segment$ & $\textbf{1.0}$ & $\textbf{0.9993} \pm 0.0015$ \\
|
||||
$$ & $$ & $$ \\
|
||||
$Average$ & $\textbf{0.8005} \pm 0.1458$ & $\textbf{0.7957} \pm 0.1431$ \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
|
Binary file not shown.
After Width: | Height: | Size: 421 KiB |
|
@ -0,0 +1,13 @@
|
|||
<code>
|
||||
|
||||
condition= (number_of_samples>200) &
|
||||
(number_of_samples<10000) &
|
||||
(number_of_features>50) &
|
||||
(number_of_features<500) &
|
||||
~index
|
||||
|
||||
print(len(condition),"Datasets found")
|
||||
|
||||
|
||||
</code>
|
||||
->13 Datasets found
|
|
@ -0,0 +1,13 @@
|
|||
<code>
|
||||
from pyod.models.iforest import IForest
|
||||
from pyod.models.knn import KNN
|
||||
from pyod.models.lof import LOF
|
||||
|
||||
|
||||
l=Logger({"IFor":IForest(n_estimators=100),
|
||||
"Lof":LOF(),
|
||||
"Knn": KNN()}, addfeat=True)
|
||||
|
||||
|
||||
|
||||
</code>
|
|
@ -0,0 +1,21 @@
|
|||
\begin{tabular}{llll}
|
||||
\hline
|
||||
Dataset & Knn & Lof & IFor \\
|
||||
\hline
|
||||
$Delft\_pump\_5x3\_noisy(64)$ & $0.3800 \pm 0.0475$ & $0.3462 \pm 0.0327$ & $\textbf{0.4272} \pm 0.0680$ \\
|
||||
$hill-valley(100)$ & $0.4744 \pm 0.0269$ & $\textbf{0.5060} \pm 0.0327$ & $0.4720 \pm 0.0288$ \\
|
||||
$speech(400)$ & $0.4903 \pm 0.0103$ & $\textbf{0.5104} \pm 0.0115$ & $0.4872 \pm 0.0184$ \\
|
||||
$Sonar\_mines(60)$ & $\textbf{0.7284} \pm 0.0939$ & $0.6769 \pm 0.0933$ & $0.6189 \pm 0.1301$ \\
|
||||
$ozone-level-8hr(72)$ & $\textbf{0.8051} \pm 0.0288$ & $0.7738 \pm 0.0292$ & $\textbf{0.7768} \pm 0.0118$ \\
|
||||
$spambase(57)$ & $0.8038 \pm 0.0125$ & $0.7712 \pm 0.0055$ & $\textbf{0.8202} \pm 0.0042$ \\
|
||||
$arrhythmia(274)$ & $\textbf{0.8137} \pm 0.0185$ & $0.8042 \pm 0.0186$ & $\textbf{0.8086} \pm 0.0099$ \\
|
||||
$mnist(100)$ & $0.9345 \pm 0.0039$ & $\textbf{0.9548} \pm 0.0037$ & $0.8732 \pm 0.0069$ \\
|
||||
$Concordia3\_32(256)$ & $0.9246 \pm 0.0107$ & $\textbf{0.9486} \pm 0.0099$ & $\textbf{0.9322} \pm 0.0178$ \\
|
||||
$optdigits(64)$ & $0.9966 \pm 0.0012$ & $\textbf{0.9975} \pm 0.0012$ & $0.8142 \pm 0.0267$ \\
|
||||
$gas-drift(128)$ & $\textbf{0.9790} \pm 0.0018$ & $0.9585 \pm 0.0055$ & $0.8764 \pm 0.0166$ \\
|
||||
$Delft\_pump\_AR(160)$ & $\textbf{0.9965}$ & $\textbf{0.9953} \pm 0.0019$ & $0.9665 \pm 0.0096$ \\
|
||||
$musk(166)$ & $\textbf{1.0}$ & $\textbf{1.0}$ & $0.9808 \pm 0.0117$ \\
|
||||
$$ & $$ & $$ & $$ \\
|
||||
$Average$ & $\textbf{0.7944}$ & $\textbf{0.7879}$ & $0.7580$ \\
|
||||
\hline
|
||||
\end{tabular}
|
|
@ -0,0 +1,7 @@
|
|||
<l2st>
|
||||
<e>Hypothesis: Isolation Forests are better when there are numerical and nominal attributes</e>
|
||||
<e>Easy to test</e>
|
||||
</l2st>
|
||||
<code>
|
||||
condition=condition & (numeric & nominal)
|
||||
</code>
|
|
@ -0,0 +1,19 @@
|
|||
\begin{tabular}{llll}
|
||||
\hline
|
||||
Dataset & Knn & IFor & Lof \\
|
||||
\hline
|
||||
$ozone-level-8hr(72)$ & $\textbf{0.8051} \pm 0.0288$ & $\textbf{0.7768} \pm 0.0118$ & $0.7738 \pm 0.0292$ \\
|
||||
$spambase(57)$ & $0.8038 \pm 0.0125$ & $\textbf{0.8202} \pm 0.0042$ & $0.7712 \pm 0.0055$ \\
|
||||
$arrhythmia(274)$ & $\textbf{0.8137} \pm 0.0185$ & $\textbf{0.8086} \pm 0.0099$ & $0.8042 \pm 0.0186$ \\
|
||||
$musk(166)$ & $\textbf{1.0}$ & $0.9808 \pm 0.0117$ & $\textbf{1.0}$ \\
|
||||
$$ & $$ & $$ & $$ \\
|
||||
$Average$ & $\textbf{0.8556}$ & $\textbf{0.8466}$ & $\textbf{0.8373}$ \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
|
||||
<l2st>
|
||||
<e>Only 4 datasets, so not clear at all</e>
|
||||
<e>->More datasets</e>
|
||||
|
||||
</l2st>
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
There are analyses that are only possible with many datasets
|
||||
Here: unsupervised optimization
|
||||
Given multiple AD models, find which is best:
|
||||
Use the AUC score? Requires anomalies->overfitting
|
||||
Can you find an unsupervised method?
|
||||
In general very complicated, so here only focus on very small differences in the model.
|
||||
So each model is an autoencoder, trained on the same dataset, where the difference is only in the initialisation
|
|
@ -0,0 +1,8 @@
|
|||
First guess: the loss of the model on the training data
|
||||
How to evaluate this?
|
||||
Train many models, look at the average AUC score.
|
||||
For the alternative, take groups of 20 models, and look at the AUC score of the best model.
|
||||
Is there a meaningful difference between results? Give the result as z\_score (#(m_1-m_2)/sqrt(s_1**2+s_2**2)#)
|
||||
This difference depends a lot on the dataset
|
||||
->even a really good z\_score does not mean much (sometimes #LessThan(30,z)#)
|
||||
(repeat with the two histograms)
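The z\_score above is straightforward to compute; for example, with the pima means and standard deviations from the earlier results table:

```python
from math import sqrt

def z_score(m1, s1, m2, s2):
    # Difference of means in units of the combined standard deviation.
    return (m1 - m2) / sqrt(s1**2 + s2**2)

# eIFor vs. IFor on pima (means and stds taken from the results table):
z = z_score(0.7405, 0.0110, 0.7347, 0.0126)  # roughly 0.35
```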
|
|
@ -0,0 +1 @@
|
|||
Pick the model with the lowest l2-loss
|
Binary file not shown.
|
@ -0,0 +1,3 @@
|
|||
Pick points within a 1\% width neighborhood in input space around each point.
|
||||
For each point, find the maximum difference in output space.
|
||||
Average this difference.
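A numpy sketch of this robustness measure, with a toy scoring function standing in for a trained autoencoder (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    return np.sin(3 * x).sum(axis=-1)  # stand-in for a trained model

x = rng.uniform(0, 1, size=(100, 5))
width = 0.01 * (x.max(axis=0) - x.min(axis=0))  # 1% of each feature's range

diffs = []
for point in x:
    # Sample neighbors inside the 1%-wide box around the point...
    neighbors = point + rng.uniform(-width / 2, width / 2, size=(50, 5))
    # ...and record the largest change of the model output.
    diffs.append(np.abs(score(neighbors) - score(point)).max())

robustness = float(np.mean(diffs))  # lower = smoother, more robust model
```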
|
Binary file not shown.
|
@ -0,0 +1,3 @@
|
|||
Pick random points in the input space.
|
||||
Measure the distance in input and output space.
|
||||
A low correlation indicates a good model.
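A sketch of this correlation measure, using plain Pearson correlation over pairwise distances (toy scoring function, hypothetical names):

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    return np.sin(3 * x).sum(axis=-1)  # stand-in for a trained model

x = rng.uniform(0, 1, size=(200, 5))  # random points in input space
s = score(x)

# Pairwise distances in input space and in the 1-D output space.
d_in = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
d_out = np.abs(s[:, None] - s[None, :])

iu = np.triu_indices(len(x), k=1)  # count each pair once
corr = float(np.corrcoef(d_in[iu], d_out[iu])[0, 1])  # low = good model
```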
|
Binary file not shown.
|
@ -0,0 +1,9 @@
|
|||
Things I still want to add:
|
||||
<l2st>
|
||||
Ensemble Methods
|
||||
Visualisation options
|
||||
Alternative Evaluations
|
||||
Hyperparameter optimisation (with cross-validation)
|
||||
|
||||
|
||||
</l2st>
|
|
@ -0,0 +1,3 @@
|
|||
What do you think about this?
|
||||
Is there something I should also add?
|
||||
What would it take for you to actually use this?
|
Binary file not shown.
After Width: | Height: | Size: 20 KiB |
Binary file not shown.
After Width: | Height: | Size: 989 KiB |
Some files were not shown because too many files have changed in this diff