Ensembles are a way to combine multiple models into a more powerful one. In anomaly detection, a concept called feature bagging lets you create multiple predictions from the same algorithm: each run of the algorithm only works on a subset of the features. Usually this is done to increase the robustness of the anomaly detection method (if two features seem very important, anomaly detection methods might neglect the other features; runs without these important features force the algorithm to still consider the less important ones). Here, however, I would like to explore a slightly different question:

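As a rough illustration of feature bagging (my own minimal sketch, not the exact setup used here; it uses scikit-learn's IsolationForest as a stand-in detector):

```python
# Feature bagging sketch: train many anomaly detectors, each on a random
# subset of the features, and collect one score per model per sample.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))   # toy data: 500 samples, 10 features
X[0, 3] = 8.0                    # make sample 0 anomalous in feature 3

n_models, subset_size = 50, 4
subsets, models = [], []
for i in range(n_models):
    feats = rng.choice(X.shape[1], size=subset_size, replace=False)
    models.append(IsolationForest(random_state=i).fit(X[:, feats]))
    subsets.append(feats)

# score_samples: higher = more normal, lower = more anomalous
scores = np.stack([m.score_samples(X[:, f]) for m, f in zip(models, subsets)])
print(scores.shape)  # (50, 500)
```

Each row of `scores` is one model's view of the data; models whose feature subset misses feature 3 see sample 0 as perfectly normal.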
If you are given multiple predictions, you will see events that are anomalous according to some models but normal according to others. And when each model has different inputs, you might find that the models considering a certain feature rate the current event as anomalous, while the models that do not consider it rate the event as normal. In this case you could say that this input feature is the reason the event is anomalous.

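One naive way to turn this observation into a per-feature "anomaly reason" (a hypothetical sketch of mine, not the method developed below) is to compare, for each feature, the scores of the models that used it with the scores of those that did not:

```python
# Per-feature attribution sketch: a feature "explains" the anomaly if models
# that include it score the event as more anomalous than models that exclude it.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[0, 3] = 8.0                    # sample 0 is anomalous only in feature 3

n_models, subset_size = 100, 4
subsets = [rng.choice(10, size=subset_size, replace=False) for _ in range(n_models)]
models = [IsolationForest(random_state=i).fit(X[:, f]) for i, f in enumerate(subsets)]
# lower score_samples = more anomalous
scores = np.array([m.score_samples(X[0:1, f])[0] for m, f in zip(models, subsets)])

reason = np.zeros(10)
for j in range(10):
    with_j = np.array([j in f for f in subsets])
    # positive value: including feature j makes the event look more anomalous
    reason[j] = scores[~with_j].mean() - scores[with_j].mean()

print(reason.argmax())  # feature 3 stands out as the "reason"
```

This simple difference of means is exactly the kind of combination function that the task below asks you to improve.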
Your task would be to develop this into a method for analyzing the reason behind a detected anomaly. Normally I would include some example code here, but since my trivial example needs thousands of models (see below) to produce something useful, I only show two example images.

In both, I train an ensemble of anomaly detection methods on MNIST data (handwritten digits). The model should consider a "7" as normal, while flagging every other digit as anomalous. The images shown are my favorites out of the ~20 I have looked at.

The first image (example1.pdf; pdf because vector graphics) shows a slightly unusual 7 (a 7 with an extra line at the top) on the left and the "anomaly reason" on the right. You see the part of the 7 that we would initially consider normal in black (low anomaly reason), but not the additional line, as this is not a usual part of a "7". So we can clearly see which parts of the image make this "7" normal.

The second image (example2.pdf) shows a "2", and thus an anomaly. You can see this "2" again as a "7" with an extra line. Again, the basic structure of the "7" is represented in black, but this time the extra line is truly anomalous (we cannot expect a 7 with a line below it, but we could imagine the training set containing another 7 with a line above), and so the algorithm finds it. As the heatmap shows, this is exactly what is represented: this image is not a "7" because it contains an extra line at the bottom.

The biggest drawback of this algorithm is that it requires many different anomaly predictions (I used ~2000 here; I use a very fast algorithm I invented, but this still takes a couple of hours of computation). This is partially because the MNIST images have many (784) features, and we can assume the effect will be weaker with fewer features. But you can probably still improve the speed (i.e. reduce the number of models) quite a lot. A better query strategy for the feature bagging, a better combination function for the resulting anomaly scores, or even some more active idea (training a model specifically to test the current hypothesis) should all help quite a bit.

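To make the "combination function" point concrete, here is a small sketch comparing a few candidates (the choices and names are my own, not prescribed by the task):

```python
# Different ways to combine per-model anomaly scores for one event.
# Convention as in sklearn's score_samples: lower = more anomalous.
import numpy as np

rng = np.random.default_rng(1)
# fake scores: 45 models see the event as normal, 5 as clearly anomalous
scores = np.concatenate([rng.normal(0.0, 0.1, 45), rng.normal(-2.0, 0.1, 5)])

mean_comb = scores.mean()               # smooths the anomalous minority away
min_comb = scores.min()                 # driven by a single extreme model
trim_comb = np.sort(scores)[:5].mean()  # average of the 5 most anomalous scores

print(mean_comb, min_comb, trim_comb)
```

The plain mean barely reacts to the five anomalous models, while the min and the trimmed mean do; which trade-off is best for extracting the anomaly reason is exactly the kind of question this topic leaves open.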
On the other hand, this algorithm could be even more useful with fewer features (where it would also be much faster), and then you could also consider relations between features. Imagine two inputs, each always between 0 and 1, that are always equal to each other: an event is anomalous not for any particular value of either input, but whenever the two are not the same. To find this kind of relationship, you really need to consider the relation between features.

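A toy version of that two-feature example (my own illustration, again with IsolationForest as a stand-in detector):

```python
# An anomaly visible only in the *relation* between two features:
# each feature alone is uniform on [0, 1], but they are always equal.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
a = rng.uniform(size=500)
X = np.column_stack([a, a])        # training data: the two features agree
anomaly = np.array([[0.2, 0.9]])   # values individually normal, but unequal
normal = np.array([[0.5, 0.5]])    # on the diagonal, like the training data

single = IsolationForest(random_state=0).fit(X[:, :1])
joint = IsolationForest(random_state=0).fit(X)

# a model on feature 0 alone sees 0.2 as unremarkable ...
print(single.score_samples(anomaly[:, :1]))
# ... while the joint model can notice that the two features disagree
print(joint.score_samples(anomaly), joint.score_samples(normal))
```

Any single-feature model from a feature-bagging ensemble is blind to this anomaly; only models seeing both features together can detect it.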
If you have any questions, feel free to write an email to Simon.Kluettermann@cs.tu-dortmund.de