The idea is always the same:

- Define a complicated model to learn (often with millions of parameters).
- Define a loss function that this model should minimize (example: $\sum_i (y_i - f(x_i))^2$).
- Find the parameters that minimize the loss (-> backpropagation; a minimal sketch of the whole recipe follows below).

Usually neural networks: $f(x) = f_n(x)$, defined recursively by $f_k(x) = \text{activation}(A_k \cdot f_{k-1}(x) + b_k)$ with $f_0(x) = x$.

This is powerful: one can show that with 3+ layers (i.e., at least one hidden layer, counting input and output) and arbitrarily large weight matrices, such a network can approximate any continuous function (universal approximation theorem).

-> So learning a model reduces to minimizing a loss function.
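A minimal sketch of the recipe above, using only NumPy: a network with one hidden layer, $f(x) = A_2 \cdot \tanh(A_1 x + b_1) + b_2$, is fit to a toy regression task by gradient descent on the squared loss. The hidden width, learning rate, and the toy target $\sin(x)$ are illustrative assumptions, not from the notes themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data: learn y = sin(x) from noisy samples (assumed task).
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X) + 0.1 * rng.normal(size=X.shape)
N = X.shape[0]

# Model parameters: one hidden layer of width 32 (illustrative size).
hidden = 32
A1 = rng.normal(0, 0.5, size=(1, hidden))
b1 = np.zeros(hidden)
A2 = rng.normal(0, 0.5, size=(hidden, 1))
b2 = np.zeros(1)

lr = 0.1
for step in range(2000):
    # Forward pass: f0(x) = x, f1 = tanh(A1 x + b1), f2 = A2 f1 + b2.
    h = np.tanh(X @ A1 + b1)     # hidden activations
    pred = h @ A2 + b2           # network output f(x)

    # Mean squared loss and its gradient w.r.t. the prediction.
    err = pred - y
    loss = np.mean(err ** 2)
    g = 2 * err / N              # d(loss)/d(pred)

    # Backpropagation: apply the chain rule layer by layer.
    grad_A2 = h.T @ g
    grad_b2 = np.sum(g, axis=0)
    grad_h = g @ A2.T
    grad_pre = grad_h * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    grad_A1 = X.T @ grad_pre
    grad_b1 = np.sum(grad_pre, axis=0)

    # Gradient descent: step each parameter against its gradient.
    A1 -= lr * grad_A1
    b1 -= lr * grad_b1
    A2 -= lr * grad_A2
    b2 -= lr * grad_b2

    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss:.4f}")
```

In practice the forward pass and the gradients are not written by hand like this; a framework records the chain of layer operations and backpropagation derives all the gradients automatically, which is what makes the same three-step recipe scale to millions of parameters.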