Papers produced by the MLhad collaboration, listed in reverse chronological order (most recent first):

### Describing Hadronization via Histories and Observables for Monte-Carlo Event Reweighting

*Distributions for the fragmentation function averaged over all string break variables except z. All model weights originate from the model trained with the unbinned high-level observables.*

We introduce a novel method for extracting a fragmentation model directly from experimental data without requiring an explicit parametric form, called Histories and Observables for Monte-Carlo Event Reweighting (HOMER). The method consists of three steps: the training of a classifier between simulation and data, the inference of single fragmentation weights, and the calculation of the weight for the full hadronization chain. We illustrate the use of HOMER on a simplified hadronization problem, a quark-antiquark (qq̄) string fragmenting into pions, and extract a modified Lund string fragmentation function f(z). We then demonstrate the use of HOMER on three types of experimental data: (i) binned distributions of high-level observables, (ii) unbinned event-by-event distributions of these observables, and (iii) full particle cloud information. After demonstrating that f(z) can be extracted from data (the inverse of hadronization), we also show that, at least in this limited setup, the fidelity of the extracted f(z) suffers only limited loss when moving from (i) to (ii) to (iii).

Accompanying code is available here.
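Separately from the collaboration's released code, the first and last of the three steps named in the abstract can be illustrated with a minimal sketch. It assumes that the classifier output is a calibrated probability that an event comes from data, so that the standard likelihood-ratio trick applies, and that the chain weight is a plain product of per-break weights. The function names `classifier_to_weight` and `chain_weight` are hypothetical, and the paper's actual treatment of fragmentation histories is not reproduced here.

```python
import numpy as np

def classifier_to_weight(score, eps=1e-12):
    """Likelihood-ratio trick (assumption: `score` is a calibrated
    probability D that the input comes from data rather than simulation).
    The simulation-to-data weight is then D / (1 - D)."""
    score = np.clip(np.asarray(score, dtype=float), eps, 1.0 - eps)
    return score / (1.0 - score)

def chain_weight(single_weights):
    """Weight of a full hadronization chain, assumed here to be the
    product of the weights of its individual string breaks.
    The product is accumulated in log space for numerical stability."""
    w = np.asarray(single_weights, dtype=float)
    return float(np.exp(np.sum(np.log(w))))
```

For example, `chain_weight([2.0, 0.5, 3.0])` multiplies three hypothetical per-break weights into a single event weight.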

### Towards a data-driven model of hadronization using normalizing flows

*A comparison between the Pythia (histograms) and normalizing-flow (NF) generated (solid lines) single-emission longitudinal momentum distributions at four different fixed values of transverse mass that were not used in training the model.*

We introduce a model of hadronization based on invertible neural networks that faithfully reproduces a simplified version of the Lund string model for meson hadronization. Additionally, we introduce a new training method for normalizing flows, termed MAGIC, that improves the agreement between simulated and experimental distributions of high-level (macroscopic) observables by adjusting single-emission (microscopic) dynamics. Our results constitute an important step toward realizing a machine-learning-based model of hadronization that utilizes experimental data during training. Finally, we demonstrate how a Bayesian extension to this normalizing-flow architecture can be used to provide an analysis of statistical and modeling uncertainties on the generated observable distributions.

Accompanying code is available here.
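For readers unfamiliar with normalizing flows, the idea behind an invertible network can be sketched with the simplest possible case: a single one-dimensional affine layer acting on a standard normal base distribution. This is a generic illustration of the change-of-variables rule, not the architecture or the MAGIC training method of the paper, and all function names are hypothetical.

```python
import numpy as np

def affine_forward(z, shift, log_scale):
    """Map a base-space sample z to data space: x = z * exp(log_scale) + shift."""
    return z * np.exp(log_scale) + shift

def affine_inverse(x, shift, log_scale):
    """Exact inverse of affine_forward; invertibility is what makes this a flow."""
    return (x - shift) * np.exp(-log_scale)

def log_prob(x, shift, log_scale):
    """Log-density of x via change of variables:
    log p_x(x) = log p_z(z) + log |dz/dx|, with z = affine_inverse(x)
    and a standard normal base density p_z."""
    z = affine_inverse(x, shift, log_scale)
    log_base = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
    return log_base - log_scale  # log |dz/dx| = -log_scale
```

Real flows stack many such invertible layers with learned, input-dependent parameters, but the density is always obtained in the same way.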

### Reweighting Monte Carlo Predictions and Automated Fragmentation Variations in Pythia 8

*Average time required to generate a single event as a function of the number of alternative parameter values calculated during the generation. The error on each point is the standard error of the mean. The amount of time required to generate a single event increases linearly; the best-fit curve is shown in red, and its equation is given in the legend.*

This work reports on a method for uncertainty estimation in simulated collider-event predictions. The method is based on a Monte Carlo-veto algorithm, and extends previous work on uncertainty estimates in parton showers by including uncertainty estimates for the Lund string-fragmentation model. This method is advantageous from the perspective of simulation costs: a single ensemble of generated events can be reinterpreted as though it were obtained using a different set of input parameters, with each event now accompanied by a corresponding weight. This allows for a robust exploration of the uncertainties arising from the choice of input model parameters, without the need to rerun full simulation pipelines for each input parameter choice. Such explorations are important when determining the sensitivities of precision physics measurements.

Accompanying code is available here.
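The general idea of reweighting an accept-reject (veto) sampler can be sketched on a toy problem. The example below is not Pythia's implementation: it assumes a simplified, unnormalized Lund-like function with illustrative parameters `a`, `b` and a fixed transverse mass squared `mT2`, a uniform proposal on (0, 1), and a shared constant overestimate for the baseline and alternative parameter sets. Every accepted trial contributes the ratio of acceptance probabilities to the weight, and every rejected trial contributes the ratio of rejection probabilities.

```python
import numpy as np

def lund_f(z, a, b, mT2):
    """Simplified, unnormalized Lund-like fragmentation function (toy form)."""
    return (1.0 - z) ** a * np.exp(-b * mT2 / z) / z

def overestimate(a, b, a_alt, b_alt, mT2, safety=1.2):
    """Constant bound valid for both parameter sets, found on a fine grid
    and inflated by a safety factor (an assumption of this sketch)."""
    grid = np.linspace(1e-6, 1.0 - 1e-6, 20001)
    peak = max(lund_f(grid, a, b, mT2).max(),
               lund_f(grid, a_alt, b_alt, mT2).max())
    return safety * peak

def sample_with_weight(rng, a, b, a_alt, b_alt, mT2, bound):
    """Draw z from the baseline (a, b) by accept-reject and return the
    weight that reinterprets the draw as coming from (a_alt, b_alt)."""
    weight = 1.0
    while True:
        z = rng.uniform(1e-9, 1.0)
        p = lund_f(z, a, b, mT2) / bound              # baseline acceptance prob.
        p_alt = lund_f(z, a_alt, b_alt, mT2) / bound  # alternative acceptance prob.
        if rng.uniform() < p:
            return z, weight * p_alt / p              # accepted trial
        weight *= (1.0 - p_alt) / (1.0 - p)           # rejected trial
```

Weighted averages over the baseline sample, such as `sum(w * z) / sum(w)`, then estimate expectations under the alternative parameters without generating a second sample.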

### Modeling hadronization using machine learning

*Comparison of the average number of hadrons produced in the fragmentation chain of a single string as a function of the initial parton energy E, produced using Pythia (blue) and a cSWAE (red). The density plot shows the multiplicity distributions obtained with MLhad for 20000 fragmentation chains.*

We present the first steps in the development of a new class of hadronization models utilizing machine learning techniques. We successfully implement, validate, and train a conditional sliced-Wasserstein autoencoder to replicate the Pythia-generated kinematic distributions of first-hadron emissions, when the Lund string model of hadronization implemented in Pythia is restricted to the emissions of pions only. The trained models are then used to generate the full hadronization chains, with an IR cutoff energy imposed externally. The hadron multiplicities and cumulative kinematic distributions are shown to match the Pythia-generated ones. We also discuss possible future generalizations of our results.

Accompanying code is available here.
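The distance that gives the sliced-Wasserstein autoencoder its name can be illustrated with a short Monte Carlo estimate. This is a generic sketch, not the loss used in the paper: it assumes two samples of equal size, uses random unit directions, and relies on the fact that in one dimension the optimal transport plan simply matches sorted values. The function name and its parameters are hypothetical.

```python
import numpy as np

def sliced_wasserstein2(x, y, n_proj=64, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance
    between two point clouds x and y of shape (n_samples, n_dims).
    Assumes both clouds contain the same number of samples."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_proj, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # random unit vectors
    # Project onto each direction; sorting solves 1D optimal transport.
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    return float(np.mean((px - py) ** 2))
```

The estimate vanishes for identical samples and grows as the two clouds move apart, which is what makes it usable as a training loss that compares generated and reference distributions.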