First guess: the loss of the model on the training data.

How to evaluate this? Train many models and look at the average AUC score. For the alternative, take groups of 20 models and look at the AUC score of the best model in each group. Is there a meaningful difference between the results?

Give the result as a z-score, $z = (m_1 - m_2)/\sqrt{s_1^2 + s_2^2}$.

This difference depends a lot on the dataset, so even $z > 30$ does not mean much.
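
A minimal sketch of this comparison, under assumptions the notes do not spell out: "best" is read as the model with the lowest training loss in its group, `m_1`, `m_2` are the mean AUCs of the selected models and of all models, and `s_1`, `s_2` are taken to be the standard errors of those means. The function names (`best_of_groups`, `mean_and_sem`, `compare`) are hypothetical helpers, not part of any library.

```python
import numpy as np


def z_score(m1, s1, m2, s2):
    """z = (m1 - m2) / sqrt(s1**2 + s2**2), the formula from the notes."""
    return (m1 - m2) / np.sqrt(s1**2 + s2**2)


def best_of_groups(train_loss, auc, group_size=20):
    """AUC of the lowest-training-loss model in each group (assumed meaning of 'best')."""
    train_loss = np.asarray(train_loss, dtype=float)
    auc = np.asarray(auc, dtype=float)
    n_groups = len(auc) // group_size      # an incomplete trailing group is dropped
    n = n_groups * group_size
    loss_g = train_loss[:n].reshape(n_groups, group_size)
    auc_g = auc[:n].reshape(n_groups, group_size)
    best = loss_g.argmin(axis=1)           # index of the selected model per group
    return auc_g[np.arange(n_groups), best]


def mean_and_sem(x):
    """Mean and standard error of the mean (assumed meaning of m_i and s_i)."""
    x = np.asarray(x, dtype=float)
    return x.mean(), x.std(ddof=1) / np.sqrt(len(x))


def compare(train_loss, auc, group_size=20):
    """z-score of 'best of each group' against 'average over all models'."""
    m1, s1 = mean_and_sem(best_of_groups(train_loss, auc, group_size))
    m2, s2 = mean_and_sem(auc)
    return z_score(m1, s1, m2, s2)


if __name__ == "__main__":
    # Synthetic stand-in data, purely illustrative: 400 models whose AUC
    # is loosely anti-correlated with their training loss.
    rng = np.random.default_rng(0)
    loss = rng.normal(size=400)
    auc = 0.8 - 0.05 * loss + rng.normal(scale=0.02, size=400)
    print("z =", compare(loss, auc))
```

If `s_1` and `s_2` are meant to be plain standard deviations instead of standard errors, only `mean_and_sem` needs to change. Either way, the size of `z` is only comparable within one dataset.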