To understand why, first consider how to combine different tests.
Since the loss is just a (quadratic) sum of the feature/particle losses, combining tests is exactly what we need to understand.
To model this, let's consider two losses #d_1# and #d_2# whose signal and background distributions are overlapping Gaussians.
Now let's add them together,
but multiply one of them by a constant #c#:
##Eq(d,d_1+c*d_2)##
Depending on #c#, the AUC of the sum changes.
There is an optimum value of #c#,
and a value of #c# that is way too large can actually hurt the AUC.
So assume that #Eq(c,1)# (unweighted addition) is a #c# that is way too big for top tagging.
So let's calculate the optimal #c# for a given distribution.
AUC as a function of #c#
%show animation here
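A minimal sketch of such a scan, assuming two independent Gaussian losses per event. The means, widths, sample size, seed and scanned values of #c# are illustrative assumptions, and the AUC is estimated with a rank-based (Mann-Whitney) estimator, which is not necessarily what produced the animation.

```python
import random


def sample_losses(n, mu1, s1, mu2, s2, rng):
    """Draw n pairs (d_1, d_2) of independent Gaussian losses."""
    return [(rng.gauss(mu1, s1), rng.gauss(mu2, s2)) for _ in range(n)]


def auc(background, signal):
    """Estimate P(signal score > background score) from the rank sum."""
    labelled = sorted([(x, 0) for x in background] + [(x, 1) for x in signal])
    rank_sum = sum(rank for rank, (_, is_signal) in enumerate(labelled, start=1)
                   if is_signal)
    n_s, n_b = len(signal), len(background)
    return (rank_sum - n_s * (n_s + 1) / 2) / (n_s * n_b)


def auc_of_sum(c, bkg_pairs, sig_pairs):
    """AUC of the combined loss d = d_1 + c * d_2."""
    bkg = [d1 + c * d2 for d1, d2 in bkg_pairs]
    sig = [d1 + c * d2 for d1, d2 in sig_pairs]
    return auc(bkg, sig)


rng = random.Random(0)  # fixed seed, so the scan is reproducible
N = 10000               # events per class (assumed)

# Assumed toy distributions: background centred at 0, signal centred at 1,
# widths 0.5 for d_1 and 0.75 for d_2 in both classes.
bkg_pairs = sample_losses(N, 0.0, 0.5, 0.0, 0.75, rng)
sig_pairs = sample_losses(N, 1.0, 0.5, 1.0, 0.75, rng)

# The same events are reused for every c, so the values are directly comparable.
scan = {c: auc_of_sum(c, bkg_pairs, sig_pairs) for c in (0.0, 0.45, 1.0, 10.0)}
for c, value in scan.items():
    print(f"c = {c:5.2f}  AUC = {value:.3f}")
```

With these assumed widths the scan should peak at an intermediate #c#, and a very large #c# should end up below the AUC of #d_1# alone, which is the sense in which too large a #c# hurts.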
##Eq(mu_1B,0),Eq(mu_2B,0),Eq(mu_1S,1),Eq(mu_2S,c*alpha)##
##Eq(sigma_iB,sigma_iS),Eq(sigma_1,s_1),Eq(sigma_2,alpha*c*s_2)##
##Eq(mu_B,0),Eq(mu_S,1+c*alpha),Eq(sigma,sqrt(sigma_1**2+sigma_2**2))##
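Fix the scale by demanding #Eq(mu_S,1)#, i.e. divide #d# by #1+alpha*c#. For two Gaussians of equal width the AUC depends only on #(mu_S-mu_B)/sigma#, so maximum AUC means minimum #sigma# (or minimum #(sigma/s_1)**2#):
##Eq((sigma/s_1)**2,(1+(s_2/s_1)**2*alpha**2*c**2)/(1+alpha*c)**2)##
##Eq(d/dc * (sigma/s_1)**2,0)##
##Eq((1/(1+alpha*c)**3)*2*alpha*(c*alpha*(s_2/s_1)**2-1),0)##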
##Eq(c,1/(alpha*(s_2/s_1)**2))##
##Eq(alpha,1.0),Eq(s_2,0.75),Eq(s_1,0.5)##
Compare the analytic #c# to the numerical optimum #c_n#:
##Eq(c,0.4444),Eq(c_n,0.4436),Eq(sigma_c_n,0.0024)##
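A deterministic cross-check of the analytic value, as a sketch: it uses the closed-form AUC of two equal-width Gaussians, AUC = Phi(separation / (sqrt(2) * sigma)), and maximises it with a golden-section search. This is an assumed stand-in, not the sampling-based numerics that produced #c_n# and its uncertainty; the function names and the search interval are hypothetical.

```python
import math


def gaussian_auc(c, alpha=1.0, s1=0.5, s2=0.75):
    """AUC of d = d_1 + c*d_2 for equal-width Gaussian signal and background."""
    separation = 1.0 + alpha * c                      # mu_S - mu_B
    sigma = math.sqrt(s1**2 + (alpha * c * s2)**2)    # width of the sum
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2, evaluated at x = separation / (sqrt(2) * sigma)
    return 0.5 * (1.0 + math.erf(separation / (2.0 * sigma)))


def golden_section_max(f, lo, hi, tol=1e-6):
    """Locate the maximum of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if f(x1) < f(x2):
            a = x1   # maximum lies to the right of x1
        else:
            b = x2   # maximum lies to the left of x2
    return 0.5 * (a + b)


alpha, s1, s2 = 1.0, 0.5, 0.75
c_closed = 1.0 / (alpha * (s2 / s1) ** 2)            # analytic optimum
c_numeric = golden_section_max(gaussian_auc, 0.0, 5.0)

print(f"closed form: c = {c_closed:.4f}")
print(f"numeric:     c = {c_numeric:.4f}")
print(f"AUC at optimum = {gaussian_auc(c_closed):.4f}, at c = 1: {gaussian_auc(1.0):.4f}")
```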
##Eq(c,1/(alpha*(s_2/s_1)**2))##
but you can approximate
\begin{equation} \alpha \propto \mathrm{loss} \end{equation}
\begin{equation} s \propto \mathrm{loss} \end{equation}
so
\begin{equation} c \propto \mathrm{loss}^{-3} \end{equation}
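Reading #s# as the width ratio #s_2/s_1# (an assumption; #s# is not defined explicitly here), the exponent follows from substituting both proportionalities into the optimum:
\begin{equation} c = \frac{1}{\alpha \, (s_2/s_1)^2} \propto \frac{1}{\mathrm{loss} \cdot \mathrm{loss}^{2}} = \mathrm{loss}^{-3} \end{equation}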
%some tabular comparing the benefits/problems of this bodge
%atm some test que
Benefits
Problems
So maybe use weights in training to let the network focus more on the important things.
First goal: reach the same quality for a small network (8 nodes) in split and non-split training.
Here: 8 nodes, 4 of them weighted with a factor.
AUC as a function of this factor
Apparently there is still something I don't understand.
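A minimal sketch of what such a weighting could look like, assuming the training loss is a weighted quadratic sum over the 8 nodes and that the factor multiplies the first 4 of them. Both function names, and the choice of which nodes carry the factor, are hypothetical and not taken from the actual training code.

```python
def node_weights(n_nodes=8, n_weighted=4, factor=1.0):
    """Per-node weights: `factor` on the first n_weighted nodes, 1 on the rest."""
    return [factor] * n_weighted + [1.0] * (n_nodes - n_weighted)


def weighted_quadratic_loss(inputs, outputs, weights):
    """Weighted sum of squared per-node differences."""
    if not (len(inputs) == len(outputs) == len(weights)):
        raise ValueError("inputs, outputs and weights must have the same length")
    return sum(w * (x - y) ** 2 for x, y, w in zip(inputs, outputs, weights))


# Example: every node is off by 1, and half of the nodes count with factor 0.5.
weights = node_weights(factor=0.5)
example_loss = weighted_quadratic_loss([1.0] * 8, [0.0] * 8, weights)
print(weights, example_loss)
```

Scanning `factor` and recording the AUC for each value would then give the curve described above.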