Friedman test to see whether there is any difference between the models.
Nemenyi test to see which models are equal; mark those equal to the maximum.
For 2 models the Friedman test is not defined -> use the Wilcoxon test.
Does this match your expectation from the table?
Two models are 'equal' if the probability that they come from the same distribution satisfies $p_b \le p$. What value should $p_b$ have, e.g. $p_b = 0.1$?
Do I need to correct for p-hacking (n experiments, so make the threshold stricter for each), or is that clear from the table?
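
On the correction question: if the n experiments are treated as one family of tests, one possible option (an assumption here, not something these notes settle) is Holm's step-down adjustment of the raw p-values before comparing them to p_b. The sketch below is a minimal pure-Python version; `holm_adjust` is a hypothetical helper name and the p-values are made up for illustration.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    n = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by n, the next by n - 1, and so on.
        candidate = min(1.0, (n - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up raw p-values from n = 4 experiments (illustration only).
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = holm_adjust(raw)

p_b = 0.1  # threshold from the notes
# Following the notes: two models count as 'equal' when p_b <= p.
equal = [p >= p_b for p in adjusted]
print(adjusted, equal)
```

With this adjustment the threshold p_b stays fixed and the p-values grow instead, which is equivalent to making each individual test stricter.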