thesffs/README

Conventional feature selection tries to find the most important features of a given dataset. This is often done to make a downstream machine learning task easier.
This repository (and the connected thesis) tries to extend the space of selectable features from just the given features to linear combinations of features. This could help every existing machine learning pipeline that relies on feature selection to work better, and could make lots of data more interpretable.

To do this, it uses a special TensorFlow layer (implemented in n2ulayer.py and applied in mu.py), which only allows special linear combinations: rotations in n-dimensional space, as they don't allow the network to lose information or to create artificial patterns that could be detected by the loss.
This loss (defined in loss.py) is minimal when the contrast between features is high. It currently works only with two-dimensional output data, but it is (mostly) differentiable. This makes it possible to use it together with this kind of rotation network (in main.py) to combine the features of a toy dataset (defined in data.py) in a high-contrast manner.
data.py contains a hidden feature that should become visible by comparing x to y-z. But as you can see from multiple runs of main.py, there are also other relations (for example between y and z, as y contains some z) and combinations of features to be found. In show.py, the three toy features x, y, z are plotted against each other, and the designated feature is plotted too for reference.
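
To illustrate why rotations neither lose information nor create artificial patterns, here is a minimal sketch in plain Python. It is not the code from n2ulayer.py; the function names (givens, apply, rotate, norm) and the example angles are made up for illustration. It assumes that a rotation in n-dimensional space can be built as a sequence of planar (Givens) rotations, and shows that such a sequence keeps the length of a sample unchanged and can be undone exactly.

```python
import math

def givens(n, i, j, angle):
    """Return an n x n matrix that rotates by `angle` in the (i, j) plane."""
    m = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    m[i][i], m[i][j] = cos_a, -sin_a
    m[j][i], m[j][j] = sin_a, cos_a
    return m

def apply(m, v):
    """Multiply matrix m with vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rotate(v, planes):
    """Apply a sequence of (i, j, angle) plane rotations to v, in order."""
    for i, j, angle in planes:
        v = apply(givens(len(v), i, j, angle), v)
    return v

def norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(a * a for a in v))

# Hypothetical sample with three features (x, y, z) and arbitrary angles.
sample = [1.0, 2.0, 3.0]
planes = [(0, 1, 0.3), (0, 2, -1.1), (1, 2, 0.7)]
rotated = rotate(sample, planes)

# Undo the rotation: inverse angles, applied in reverse order.
restored = rotate(rotated, [(i, j, -a) for i, j, a in reversed(planes)])

print(norm(sample), norm(rotated))  # the two lengths agree up to rounding
print(restored)                     # recovers the original sample up to rounding
```

In a trainable layer the angles would be the learned parameters. Every new feature is then a linear combination of the old ones, but the transformation as a whole stays invertible.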

As you might see, the basic idea works, but there are some caveats:
- The loss is not monotonic during training. The final loss is often higher than 1 (the expectation for random data). Currently this can only be worked around by restoring the best weights. A real fix might require a better loss function or a different optimizer.
- This matters because it is not clear whether the achieved value is actually the minimum.
- This is implemented only for two-dimensional output data. For a good feature selection algorithm, it would be important to extend this.
- It is also not clear how well this scales to higher dimensions. The number of layers is proportional to input_dim*output_dim, but it is unclear how well the algorithm converges with a high number of layers.
- You might be able to solve a restricted version of this algorithm analytically and extend this to a greedy algorithm (but only if this is something you're interested in).
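
The "restore the best weights" workaround from the first caveat can be sketched without any framework. This is not the code from main.py; train_with_best_restore, step and loss_fn are hypothetical names, and the noisy single-angle "training" is only a stand-in for a non-monotonic loss curve.

```python
import copy
import random

def train_with_best_restore(params, step, loss_fn, n_steps):
    """Run n_steps updates and return the parameters with the lowest loss seen."""
    best_params = copy.deepcopy(params)
    best_loss = loss_fn(params)
    for _ in range(n_steps):
        params = step(params)
        current = loss_fn(params)
        # Keep a snapshot whenever the loss improves, because later
        # steps may make it worse again.
        if current < best_loss:
            best_loss = current
            best_params = copy.deepcopy(params)
    return best_params, best_loss

# Toy stand-in: random updates of one rotation angle, so the loss jumps around.
rng = random.Random(0)

def loss_fn(p):
    return (p["angle"] - 1.0) ** 2

def step(p):
    return {"angle": p["angle"] + rng.uniform(-0.5, 0.5)}

start = {"angle": 0.0}
best, best_loss = train_with_best_restore(start, step, loss_fn, 200)
print(best, best_loss)
```

With Keras, the tf.keras.callbacks.EarlyStopping callback has a restore_best_weights option that serves the same purpose. Note that this only guarantees the best value seen during the run, not the true minimum, which is exactly the concern of the second caveat.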
initial push 2021-12-30 09:23:31 +01:00