Friedman test to see if there is a difference between models
Nemenyi post-hoc test to see which models are equal; mark those not significantly different from the best (maximum) model
For 2 models, Friedman not defined -> use Wilcoxon signed-rank test (paired scores per dataset)
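
The three steps above can be sketched in plain Python. This is only a minimal illustration under assumptions: the `scores` values are made-up toy numbers, the function names are hypothetical, the Friedman statistic omits the tie correction, and the Nemenyi critical values are the alpha = 0.10 entries as tabulated in Demsar (2006). For p-values, `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon` would likely be the more practical choice.

```python
import math

# Nemenyi critical values q_alpha at alpha = 0.10, keyed by number of models k
# (assumed from the table in Demsar, 2006; verify before relying on them).
Q_ALPHA_010 = {2: 1.645, 3: 2.052, 4: 2.291, 5: 2.459, 6: 2.589}

# Toy data (made up): scores[i][j] = score of model j on dataset i, higher is better.
scores = [
    [0.90, 0.85, 0.60],
    [0.80, 0.82, 0.55],
    [0.95, 0.90, 0.70],
    [0.70, 0.65, 0.50],
    [0.88, 0.86, 0.61],
]


def average_ranks(scores):
    """Mean rank per model; rank 1 is best, tied models share the average rank."""
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])  # best model first
        pos = 0
        while pos < k:
            end = pos
            # Extend the group while the next model has the same score (a tie).
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # average of 1-based ranks pos+1..end+1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / n for t in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic (without tie correction)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


def equal_to_best(scores, q_table=Q_ALPHA_010):
    """Mark models whose mean rank is within the Nemenyi critical difference of the best."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    cd = q_table[k] * math.sqrt(k * (k + 1) / (6 * n))  # critical difference
    best = min(ranks)
    return [r - best < cd for r in ranks]


print(average_ranks(scores))
print(friedman_statistic(scores))
print(equal_to_best(scores))
```

Working on mean ranks keeps the comparison independent of the score scale on each dataset; the critical difference shrinks as the number of datasets grows, so more datasets separate more models.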
Does this match your expectation from the table?
Two models are 'equal' if their probability of being from the same distribution satisfies $p_b < p$; what value should the threshold $p_b$ have ($p_b = 0.1$)?
Do I need to correct for p-hacking, i.e. multiple comparisons (n experiments, so make the threshold stricter for each), or is that clear from the table?
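
One possible way to "increase the difficulty" per experiment is a Bonferroni or Holm correction. The sketch below is an assumption about what such a correction could look like, not a statement of what these experiments require; the function names and the example p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.1):
    """Reject each hypothesis only if its p-value is below alpha / n."""
    n = len(p_values)
    return [p <= alpha / n for p in p_values]


def holm_reject(p_values, alpha=0.1):
    """Holm step-down: compare the sorted p-values against alpha/n, alpha/(n-1), ..."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # smallest p-value first
    reject = [False] * n
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (n - step):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Made-up p-values from n = 4 hypothetical experiments.
p_values = [0.01, 0.04, 0.03, 0.5]
print(bonferroni_reject(p_values))
print(holm_reject(p_values))
```

Holm never rejects fewer hypotheses than Bonferroni at the same alpha, which is why it is often preferred when n is more than a handful.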