About MLhad

The Machine Learning for Hadronization collaboration (MLhad) seeks to design and deploy data-driven empirical hadronization models by extracting essential features directly from the wealth of available experimental data. These hadronization models are intended to make predictions at many different energies and in many different environments. Additionally, any developments in machine-learning-based surrogate models can be used to speed up hadronization simulations, which traditionally depend on Monte Carlo sampling and rejection techniques. This would not only benefit present analyses by reducing the uncertainties associated with hadronization, but would also benefit theoretical studies in particle and astroparticle physics. MLhad efforts currently focus on the most widely deployed event generator, Pythia.

The MLhad collaboration has produced public code along with associated publications.


Figure: Overview of the Pythia code blocks. Hadronization is the transition from unobservable colored partons to measurable colorless hadrons.

Hadronization is the process by which quarks and gluons, fundamental particles of the Standard Model of particle physics that are never observed in isolation, evolve into longer-distance bound states such as protons, neutrons and pions, which are what experiments actually measure. It is an inherently non-perturbative process and is thus particularly challenging to model. Empirical models have been developed and refined over the years. The two main phenomenological models used in simulating hadronization are the Lund string model and the cluster model, both based on physical intuition about how Quantum Chromodynamics (QCD) works. For example, in the string model, a quark–anti-quark pair is thought of as being connected by a string, a flux tube of the strong force confined in the lateral direction. As the quark and anti-quark move apart, the string breaks, creating new quark–anti-quark pairs and resulting in the emission of hadrons. To model the rich phenomenology of hadronization, the string model as implemented in the multipurpose event generator Pythia has a large number of parameters that need to be tuned to experimental measurements.

Although very successful, these empirical models are known to fail to describe the experimental data consistently across a wide range of collision energies. If we compare, for instance, the data from proton–proton and ion–ion collisions with the default hadronization model from Pythia, we find discrepancies at the level of 20–50%. Because of this, different experiments and theoretical calculations use different tunings of the model parameters, and even modify the existing hadronization algorithms when necessary, usually at the expense of increased computational cost.
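To make the role of tunable parameters and rejection sampling more concrete, here is a minimal sketch of drawing the momentum fraction z taken by a hadron at a string break. It assumes the Lund symmetric fragmentation function, f(z) ∝ (1/z)(1−z)^a exp(−b m_T²/z), and samples it with a plain uniform-proposal accept/reject loop. The function names, the default values of a and b (chosen to be of the order used in Pythia tunes) and the transverse mass squared mT2 are illustrative assumptions; this is not Pythia's or MLhad's actual implementation, which uses more efficient proposals.

```python
import math
import random


def lund_f(z, a, b, mT2):
    """Unnormalized Lund symmetric fragmentation function f(z) on (0, 1)."""
    if z <= 0.0 or z >= 1.0:
        return 0.0
    return (1.0 - z) ** a / z * math.exp(-b * mT2 / z)


def lund_z_peak(a, b, mT2):
    """Location of the maximum of f(z), obtained from d(ln f)/dz = 0.

    The condition reduces to (1 - a) z^2 - (1 + c) z + c = 0 with c = b * mT2;
    the root below is the one that lies inside (0, 1) for a > 0 and c > 0.
    """
    c = b * mT2
    if abs(a - 1.0) < 1e-12:
        return c / (1.0 + c)
    disc = math.sqrt((c - 1.0) ** 2 + 4.0 * a * c)
    return (1.0 + c - disc) / (2.0 * (1.0 - a))


def sample_z(a=0.68, b=0.98, mT2=0.25, rng=random):
    """Draw one z from f(z) by accept/reject with a uniform proposal.

    a, b (GeV^-2) and mT2 (GeV^2) are illustrative, hypothetical defaults.
    """
    if a <= 0.0 or b * mT2 <= 0.0:
        raise ValueError("this sketch assumes a > 0 and b * mT2 > 0")
    f_max = lund_f(lund_z_peak(a, b, mT2), a, b, mT2)  # envelope height
    while True:
        z = rng.random()                       # propose z uniformly in [0, 1)
        if rng.random() * f_max < lund_f(z, a, b, mT2):
            return z                           # accept with prob. f(z) / f_max


if __name__ == "__main__":
    rng = random.Random(1)
    print([round(sample_z(rng=rng), 3) for _ in range(5)])
```

Every rejected proposal is wasted work, and the shape of f(z) shifts whenever a or b is retuned, which is one reason learned surrogate samplers are attractive for this step.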


MLhad is supported in part by the DOE grant DE-SC1019775 and the NSF grants OAC-2103889 and NSF-PHY-2209769.
