I use the dataset provided in this paper (arXiv:1902.09914):
up to 600k anti-#k_T# jets in the training set, with:
#p_T# between $550\,\textrm{GeV}$ and $650\,\textrm{GeV}$
$R_{i}^{2} = \eta_{i}^{2} + \phi_{i}^{2} \leq {0.8}^{2}$
the 4-vectors in each event are sorted by #p_T#
and are preprocessed here into
#flag#: a constant
$\Delta{\eta}$: $\eta = \log{\left(\frac{p + p_{3}}{p - p_{3}} \right)} / 2$, and $\Delta{\eta} = \eta - \operatorname{mean}{\left(\eta \right)}$
$\Delta{\phi}$: $\phi = \operatorname{atan2}{\left(p_{2},p_{1} \right)}$, and $\Delta{\phi} = \phi - \operatorname{mean}{\left(\phi \right)}$
$lp_{T}$: $p_{T}^{2} = p_{1}^{2} + p_{2}^{2}$, and $lp_{T} = - \log{\left(\frac{p_{T}}{p_{T}^{jet}} \right)}$
flag (a constant)
#Eq(eta,ln((p+p_3)/(p-p_3))/2)#
#Eq(phi,atan2(p_2,p_1))#
#Eq(ln(p_t_jet/p_t),ln(sqrt((p_1_jet**2+p_2_jet**2)/(p_1**2+p_2**2))))#
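The four input features above can be sketched in plain Python. This is a minimal illustration, not the code used for the results. It makes several assumptions: constituents are given as `(E, p1, p2, p3)` tuples, #p# is the magnitude of the 3-momentum, the jet components `p1_jet` and `p2_jet` are sums over the constituents, the flag is set to `1.0`, the sort is descending, and the mean of $\phi$ is a plain arithmetic mean (no wrap-around handling). The function name `preprocess_jet` is hypothetical.

```python
import math


def preprocess_jet(constituents):
    """Map a list of (E, p1, p2, p3) four-vectors to (flag, d_eta, d_phi, lp_T) rows."""
    # Sort by transverse momentum; descending order is an assumption.
    parts = sorted(constituents, key=lambda v: math.hypot(v[1], v[2]), reverse=True)

    # Jet p_T from the summed transverse components (assumed definition of p_1_jet, p_2_jet).
    p1_jet = sum(v[1] for v in parts)
    p2_jet = sum(v[2] for v in parts)
    pt_jet = math.hypot(p1_jet, p2_jet)

    etas, phis, lpts = [], [], []
    for _, p1, p2, p3 in parts:
        p = math.sqrt(p1 * p1 + p2 * p2 + p3 * p3)  # |p|, assumed meaning of "p"
        etas.append(0.5 * math.log((p + p3) / (p - p3)))
        phis.append(math.atan2(p2, p1))
        lpts.append(-math.log(math.hypot(p1, p2) / pt_jet))

    # Centre eta and phi on their per-jet means.
    eta_mean = sum(etas) / len(etas)
    phi_mean = sum(phis) / len(phis)

    flag = 1.0  # constant feature; the actual value is an assumption
    return [
        (flag, eta - eta_mean, phi - phi_mean, lpt)
        for eta, phi, lpt in zip(etas, phis, lpts)
    ]
```

For example, `preprocess_jet([(1.0, 1.0, 0.0, 0.0), (5.0, 3.0, 4.0, 0.0)])` returns two rows, with the higher-#p_T# constituent first. A constituent exactly along the beam axis ($p = p_{3}$) would make $\eta$ diverge and is not handled here.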
Preprocessing
Sort by the transverse momentum
Encoder
Learn a graph (topK: connect each node to K neighbours)
Run graph updates
4 nodes -> 1 node
Decoder
1 node -> 4 nodes
Run graph updates
Sort again by the transverse momentum
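The topK step in the encoder can be illustrated as follows. This sketch only shows how each node gets connected to its K nearest neighbours. It assumes a Euclidean distance on whatever per-node feature vectors are available. How the features or the metric are learned is not shown, and the name `topk_adjacency` is hypothetical.

```python
import math


def topk_adjacency(features, k):
    """Return, for each node i, the indices of its k nearest other nodes."""
    n = len(features)
    neighbours = []
    for i in range(n):
        # Distance from node i to every other node (self-loops excluded).
        dists = [(math.dist(features[i], features[j]), j) for j in range(n) if j != i]
        dists.sort()
        neighbours.append([j for _, j in dists[:k]])
    return neighbours
```

With four nodes, `topk_adjacency([(0, 0), (1, 0), (0, 3), (5, 5)], k=2)` gives each node two outgoing edges. The resulting graph is directed: node `i` listing `j` does not imply that `j` lists `i`.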
50k jets
Learning rate of #0.0003#
Batch size of 200
Train until the loss has not improved for 30 epochs
Compression size of 7
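The stopping rule above amounts to early stopping with a patience of 30 epochs. A minimal, framework-independent sketch follows. `run_epoch` is a hypothetical callback that trains one epoch and returns the monitored loss. Whether that loss is the training or the validation loss is not specified here, and `max_epochs` is an added safety cap, not one of the settings listed.

```python
# Settings taken from the list above; the dictionary keys are illustrative names.
SETTINGS = {
    "learning_rate": 0.0003,
    "batch_size": 200,
    "patience": 30,
    "compression_size": 7,
}


def train_with_early_stopping(run_epoch, patience=SETTINGS["patience"], max_epochs=10_000):
    """Call run_epoch() until the loss has not improved for `patience` epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        loss = run_epoch()
        if loss < best_loss:
            # New best loss: reset the patience counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_loss, epoch
```

The function returns the best loss seen and the epoch at which training stopped. In practice one would also save the model weights whenever `best_loss` improves.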