Developments

O1 (WP1) – The theory of transport and machine learning architectures

Deliverable: A comprehensive turbulent transport model

Within this work package, we established the theoretical foundations for describing the dynamics of charged particles in fusion plasmas characterized by turbulence (Task 1.1), for the correct description of turbulent random fields (Task 1.2), and for the main machine-learning tools relevant to the purpose of this project (Task 1.3).

  • T1.1: Modeling the equations of motion for particles

    We consider a population of charged particles, with mass $m$ and charge $q$, confined in a tokamak configuration dominated by a strong magnetic field $\mathbf{B}$. The usual particle coordinates in phase space are $(\mathbf{x},\mathbf{v})$. When the magnetic field is sufficiently strong compared to other electromagnetic components, the motion can be viewed as a superposition of a smooth large-scale motion and a fast Larmor gyration. This scale separation is the basis of Lie perturbation theory [1], which provides a rigorous mathematical framework for charged-particle dynamics in magnetized plasmas.

    In this approach, the true 6D phase space $(\mathbf{x},\mathbf{v})$ is replaced by the reduced gyrocentre coordinates $(\mathbf{X},v_{\parallel},\mu)$, which suppress the fast gyromotion through a perturbative expansion based on gyrokinetic ordering [2]. In this project we adopt the high-flow ordering [3], explicitly retaining the polarization drift [4].

    The gyrocentre dynamics satisfy

    $$ \frac{d\mathbf{X}}{dt} = v_{\parallel}\frac{\mathbf{B}^\star}{B_{\parallel}^\star} + \frac{\mathbf{E}^\star \times \mathbf{b}}{B_{\parallel}^\star} \qquad\text{(1)} $$

    $$ \frac{dv_{\parallel}}{dt} = \frac{q}{m}\frac{\mathbf{E}^\star\cdot\mathbf{B}^\star}{B_{\parallel}^\star} \qquad\text{(2)} $$

    $$ \frac{d\mu}{dt}=0 \qquad\text{(3)} $$

    where $$\mathbf{E}^\star = -\nabla\Phi^\star - \frac{\partial \mathbf{A}^\star}{\partial t}, \qquad \mathbf{B}^\star = \nabla\times\mathbf{A}^\star,$$ and

    $$ q\Phi^\star = \frac{m v_\parallel^2}{2} - \frac{m u^2}{2} + \mu B + q\phi_1^{\text{neo}} + q\phi_1^{\text{gc}} \qquad\text{(4)} $$

    $$ \mathbf{A}^\star = \mathbf{A}_0 + \frac{m}{q}\left(v_{\parallel}\mathbf{b}+\mathbf{u}+\mathbf{v}_E\right) + \mathbf{A}_1^{\text{gc}} \qquad\text{(5)} $$

    The zeroth-order magnetic potential $\mathbf{A}_0$ satisfies $\mathbf{B} = \nabla\times\mathbf{A}_0$. For axisymmetric equilibria, the field can be written as: $$ \mathbf{B} = F(\psi)\nabla\phi+\nabla\phi\times\nabla\psi, $$ where $\phi$ is the toroidal angle in cylindrical $(R,Z,\phi)$ or toroidal $(r,\theta,\phi)$ coordinates (COCOS = 2) [5].
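A quick numerical illustration of this representation: the sketch below evaluates the components of $\mathbf{B} = F(\psi)\nabla\phi+\nabla\phi\times\nabla\psi$ from a purely hypothetical analytic $\psi(R,Z)$ by finite differences. The flux function, field values, and signs here are illustrative assumptions only; the actual sign pattern depends on the COCOS choice.

```python
import math

# Hypothetical flux-surface model (illustration only, not a real equilibrium):
# psi(R, Z) = 0.5 * B0 * ((R - R0)**2 + Z**2),  F(psi) = B0 * R0.
B0, R0 = 2.0, 3.0
psi = lambda R, Z: 0.5 * B0 * ((R - R0) ** 2 + Z ** 2)

def B_cylindrical(R, Z, h=1e-6):
    """Components of B = F(psi) grad(phi) + grad(phi) x grad(psi).

    With grad(phi) = e_phi / R this gives B_R = (1/R) dpsi/dZ and
    B_Z = -(1/R) dpsi/dR; the overall signs depend on the COCOS convention.
    """
    dpsi_dR = (psi(R + h, Z) - psi(R - h, Z)) / (2 * h)   # central differences
    dpsi_dZ = (psi(R, Z + h) - psi(R, Z - h)) / (2 * h)
    B_phi = B0 * R0 / R            # toroidal part, F(psi)/R
    B_R = dpsi_dZ / R              # poloidal part from grad(phi) x grad(psi)
    B_Z = -dpsi_dR / R
    return B_R, B_phi, B_Z
```

On the midplane ($Z=0$) the model gives a purely toroidal-plus-vertical field, as expected for circular nested surfaces.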

    The local safety factor $q_\theta(\psi,\theta)$, its flux-surface average $\bar{q}(\psi)$, and the generalized poloidal angle $\chi(\psi,\theta)$ are defined by:

    $$ q_\theta(\psi,\theta) = \frac{\mathbf{B}\cdot\nabla\phi}{\mathbf{B}\cdot\nabla\theta} \qquad\text{(6)} $$

    $$ \bar{q}(\psi) = \frac{1}{2\pi}\int_0^{2\pi} q_\theta(\psi,\theta)\,d\theta \qquad\text{(7)} $$

    $$ \left.\frac{\partial\chi}{\partial\theta}\right|_{\psi} = \frac{q_\theta(\psi,\theta)}{\bar{q}(\psi)} \qquad\text{(8)} $$
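A minimal sketch of how Eqs. (6)-(8) can be evaluated numerically, using a hypothetical circular-like model $q_\theta(\theta)=q_0/(1+\varepsilon\cos\theta)$; the model and all parameter values are assumptions for illustration only.

```python
import math

# Hypothetical local safety-factor model (illustration only):
q0, eps = 1.4, 0.3
q_theta = lambda th: q0 / (1.0 + eps * math.cos(th))

def flux_average(f, n=4000):
    """Eq. (7): poloidal average over one period (midpoint rule,
    spectrally accurate for smooth periodic integrands)."""
    h = 2 * math.pi / n
    return sum(f(i * h) for i in range(n)) * h / (2 * math.pi)

qbar = flux_average(q_theta)   # for this model, qbar = q0 / sqrt(1 - eps^2)

def chi(theta, n=4000):
    """Eq. (8): integrate d(chi)/d(theta) = q_theta / qbar from 0 to theta."""
    h = theta / n
    s = 0.5 * (q_theta(0.0) + q_theta(theta))
    s += sum(q_theta(i * h) for i in range(1, n))
    return s * h / qbar
```

By construction $\chi$ increases by $2\pi$ over one full poloidal turn, which is a convenient sanity check of any implementation.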

    The zeroth-order MHD equilibrium is toroidally invariant, and its profiles are constant on magnetic surfaces: $n=n(\psi)$, $P=P(\psi)$, $T_i=T_i(\psi)$, $T_e=T_e(\psi)$.

    A macroscopic radial electric field $E_0$ produces a toroidal rotation $$ \mathbf{u} = R^2\Omega_t(\psi)\,\nabla\phi, $$ where $\Omega_t(\psi)$ may have a gradient. The equations of motion are expressed in a frame rotating with $\mathbf{u}$ [3,7]. Poloidal flows are neglected [5], but a neoclassical poloidal potential $\phi_1^{\text{neo}}(\psi,\theta)$ appears:

    $$ \phi_1^{\text{neo}} = \frac{m\Omega_t(\psi)^2}{2|e|\,(1+T_i/T_e)} \left(R^2 - \langle R^2\rangle_\theta\right) \qquad\text{(9)} $$

    The first-order gyrocentre potentials $\phi_1^{\text{gc}}$ and $\mathbf{A}_1^{\text{gc}}$ include turbulent fluctuations and small external perturbations (RMPs). Gyro-averaging gives:

    $$ \tilde{\phi}_1^{\text{gc}}(k,t) = \tilde{\phi}_1(k,t)\, J_0(k_\perp \rho_L), $$ with $\rho_L=\sqrt{2m\mu B}/(qB)$.
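The origin of the $J_0$ factor can be checked directly: averaging the phase factor $e^{i\mathbf{k}_\perp\cdot\boldsymbol{\rho}_L}$ over the gyro-angle reproduces the Bessel function. A small self-contained sketch:

```python
import cmath
import math

def gyro_average_phase(k_perp_rho, n=2048):
    """Average exp(i * k_perp * rho_L * sin(gamma)) over the gyro-angle gamma.

    Analytically this equals J0(k_perp * rho_L); the imaginary part cancels
    by symmetry over a full gyration.
    """
    s = sum(cmath.exp(1j * k_perp_rho * math.sin(2 * math.pi * i / n))
            for i in range(n))
    return (s / n).real
```

For example, `gyro_average_phase(1.0)` reproduces the known value $J_0(1)\approx 0.765198$ to machine precision, since the midpoint rule converges spectrally for periodic integrands.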

    The drift-wave turbulence relevant for ions is dominated by ITG/TEM modes [8].

    Circular geometry

    In the circular approximation,

    $$ \mathbf{B} = B_0 R_0\left[\nabla\phi + \frac{r\, b_\theta(r)}{R}\,\nabla\theta\right] \qquad\text{(10)} $$

    $$ F(\psi)=B_0 R_0 \qquad\text{(11)} $$

    $$ \psi(r)=B_0R_0\int_0^r b_\theta(r')\,dr' \qquad\text{(12)} $$

    $$ b_\theta(r)=\frac{r}{\bar{q}(r)\sqrt{R_0^2-r^2}} \qquad\text{(13)} $$

    $$ \chi(r,\theta)=2\tan^{-1} \left[\sqrt{\frac{R_0-r}{R_0+r}}\,\tan\left(\frac{\theta}{2}\right)\right] \qquad\text{(14)} $$
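The closed form of the straight-field-line angle can be cross-checked against the definition of Eq. (8): in circular geometry $q_\theta/\bar{q}=\sqrt{R_0^2-r^2}/R$ with $R=R_0+r\cos\theta$, and integrating this in $\theta$ reproduces $2\tan^{-1}\!\big[\sqrt{(R_0-r)/(R_0+r)}\,\tan(\theta/2)\big]$ for $|\theta|<\pi$ (note the square-root factor). A sketch with illustrative radii:

```python
import math

R0, r = 3.0, 0.9   # hypothetical major and minor radii (illustration only)

def chi_closed(theta):
    """Closed-form straight-field-line angle, valid for |theta| < pi."""
    a = math.sqrt((R0 - r) / (R0 + r))
    return 2.0 * math.atan(a * math.tan(theta / 2.0))

def chi_integral(theta, n=4000):
    """Integrate d(chi)/d(theta) = q_theta/qbar = sqrt(R0^2 - r^2)/R(theta)
    with the trapezoid rule."""
    h = theta / n
    f = lambda th: math.sqrt(R0**2 - r**2) / (R0 + r * math.cos(th))
    s = 0.5 * (f(0.0) + f(theta)) + sum(f(i * h) for i in range(1, n))
    return s * h
```

The agreement follows from the standard integral $\int d\theta/(1+\varepsilon\cos\theta) = \tfrac{2}{\sqrt{1-\varepsilon^2}}\tan^{-1}\!\big[\sqrt{\tfrac{1-\varepsilon}{1+\varepsilon}}\tan\tfrac{\theta}{2}\big]$.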

    Solov’ev equilibria

    With linear profiles:

    $$ \mu_0\frac{dP}{d\psi}=A, \qquad \frac{dF^2}{d\psi}=\frac{2 A R_0^2 \gamma}{1+\alpha^2} \qquad\text{(15)} $$

    The poloidal flux is

    $$ \psi=\frac{A}{2(1+\alpha^2)} \left[(R^2 - \gamma R_0^2)\,Z^2 + \frac{\alpha^2}{4}(R^2-R_0^2)^2\right] \qquad\text{(16)} $$

    Minor radius:

    $$ a=\frac{R_0}{2\sqrt{2-\gamma-\sqrt{\gamma}}} \qquad\text{(17)} $$

    Elongation:

    $$ \kappa=\frac{\alpha}{\sqrt{1-\gamma/(2-\gamma)}} \qquad\text{(18)} $$

    Current function (from integrating Eq. (15)):

    $$ F(\psi)=B_0R_0\sqrt{1+\frac{2A\gamma\,\psi}{B_0^2(1+\alpha^2)}} \qquad\text{(19)} $$

    Experimentally reconstructed equilibria

    Experimental magnetic equilibria (G-EQDSK format) provide $\psi(R,Z)$ and profiles $F(\psi)$, $\bar{q}(\psi)$ on numerical grids. They are generated by reconstruction tools such as CHEASE [9], EFIT [10], PLEQUE [11], NICE [12], and EQUINOX [13]. T3ST uses these equilibria when analyzing specific discharges.

  • T1.2: Modeling the turbulence

    Given the definitions of the magnetic equilibrium quantities $q_\theta(\psi,\theta)$, $\bar{q}(\psi)$, and $\chi(\psi,\theta)$ (Eqs. (6–8)), one can show that the magnetic field can be written in Clebsch form:

    $$ \mathbf{B} = \nabla(\phi - \bar{q}\,\chi)\times\nabla\psi. $$

    For representing small-scale turbulent fluctuations, most gyrokinetic (GK) codes adopt field-aligned coordinates $(x,y,z)$ [14], which simplify the description of fluctuations elongated along magnetic field lines. These coordinates correspond to the radial, binormal, and parallel directions ($\mathbf{B}\propto\nabla y\times\nabla x\propto\partial \mathbf{r}/\partial z$):

    $$ x = C_x f(\psi) \qquad\text{(20)} $$

    $$ y = C_y\,( \phi - \bar{q}\,\chi ) \qquad\text{(21)} $$

    $$ z = C_z\,\chi \qquad\text{(22)} $$

    In practice, T3ST uses constant values $C_x=a$ (the tokamak minor radius), $C_y=r_0/\bar{q}(r_0)$ (with $r_0$ a reference radius), $C_z=1$, and defines $f(\psi)=\rho_t = \sqrt{\Phi_t(\psi)/\Phi_t(\psi_{\text{edge}})}$, where $\rho_t\in[0,1]$ is the normalized effective radius. The toroidal magnetic flux is

    $$ \Phi_t(\psi)=\int_{\text{axis}}^{\psi} \mathbf{B}\cdot\mathbf{e}_\phi\, dS(\psi'), $$

    thus $\rho_t \approx r/a$ (in circular geometry).

    Instead of computing turbulent fields self-consistently, T3ST models them statistically as an ensemble of random fields $\{\phi_1, A_1\}$ with prescribed statistical properties, consistent with experimental ITG/TEM observations. Several moderate assumptions are used: the fluctuations are modeled as zero-mean Gaussian random fields, statistically homogeneous in space and stationary in time, with a prescribed spectrum consistent with ITG/TEM turbulence (the assumptions of Gaussianity and homogeneity are discussed further below).

    Based on these assumptions, T3ST constructs an ensemble of random fields $\phi_1(x,t)$ from white-noise fields in Fourier space $\eta(\mathbf{k},\omega)$ satisfying $$ \langle \eta(\mathbf{k},\omega)\,\eta(\mathbf{k}',\omega')\rangle = \delta(\mathbf{k}+\mathbf{k}')\,\delta(\omega+\omega'). $$ The Fourier components of the potential are

    $$ \tilde{\phi}_1(\mathbf{k},\omega) = \sqrt{S(\mathbf{k},\omega)}\,\eta(\mathbf{k},\omega), $$

    ensuring Gaussianity and correct correlations.
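A one-dimensional sketch of this construction (all names and parameter values are illustrative assumptions, and a simple Gaussian spectrum stands in for Eq. (26)): summing Fourier modes with independent normal amplitudes scaled by $\sqrt{S(k)\,\Delta k}$ yields a homogeneous Gaussian field whose variance matches $(2\pi)^{-1}\int S\,dk$.

```python
import numpy as np

rng = np.random.default_rng(0)
A, lam = 1.0, 5.0   # illustrative amplitude and correlation length
# Gaussian stand-in spectrum: variance (1/2pi) * int S dk = A^2
S = lambda k: A**2 * np.sqrt(2 * np.pi) * lam * np.exp(-(k * lam)**2 / 2)

def random_field(x, n_modes=512, k_max=2.0):
    """Homogeneous Gaussian random field with prescribed spectrum S(k):
    phi(x) = sum_j sqrt(S(k_j) dk / pi) * (a_j cos(k_j x) + b_j sin(k_j x)),
    with a_j, b_j ~ N(0, 1) independent (k > 0 half-spectrum convention)."""
    k = np.linspace(k_max / n_modes, k_max, n_modes)
    dk = k[1] - k[0]
    amp = np.sqrt(S(k) * dk / np.pi)
    a = rng.standard_normal(n_modes)
    b = rng.standard_normal(n_modes)
    kx = np.outer(x, k)                       # phase k_j * x_i
    return (amp * (a * np.cos(kx) + b * np.sin(kx))).sum(axis=1)
```

Over a domain much longer than the correlation length, the sample mean and variance of a single realization converge to the ensemble values (here $0$ and $A^2$), which is the standard check of such a generator.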

    In toroidal geometry, the fields can be expressed through a double Fourier series:

    $$ \phi(x,t)=\sum_{n,m}\phi_{nm}(r,t)\, e^{i(n\phi+m\theta)} \qquad\text{(23)} $$

    which preserves natural periodicity. However, due to the anisotropic nature of turbulence (long correlation lengths along the field, short perpendicular), field-aligned representation is preferred. Thus T3ST evaluates $\phi_1$ in field-aligned coordinates, even though magnetic drifts are computed in cylindrical coordinates.

    The random field is represented as:

    $$ \phi_1(x,y,z,t) =\int d\mathbf{k}\, d\omega\; \tilde{\phi}_1(\mathbf{k},\omega)\; e^{\,i(k_x x + k_y y + k_z z - [\omega^\star(\mathbf{k})+\omega] t)} \qquad\text{(24)} $$

    For $k_\parallel\ll k_\perp$, the correspondence with toroidal mode numbers is approximately:

    $$ k_y \approx \frac{n}{C_y},\qquad m = -\big[\bar{q}(x_0)\,n\big] + \Delta m,\qquad k_z = \frac{\Delta m + \{\bar{q}(x_0)n\}}{C_z}, \qquad\text{(25)} $$

    where $[\cdot]$ and $\{\cdot\}$ denote the integer and fractional parts, and $n,\Delta m\in\mathbb{Z}$.
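The integer/fractional-part bookkeeping of Eq. (25) is easy to get wrong in an implementation; a minimal helper (with illustrative values for $C_y$, $C_z$):

```python
import math

C_y, C_z = 0.5, 1.0   # illustrative normalization constants (r0/qbar(r0) and 1)

def mode_numbers(n, delta_m, qbar0):
    """Eq. (25): map toroidal mode number n and poloidal offset delta_m
    to (k_y, m, k_z). [.] and {.} are the integer and fractional parts."""
    int_part = math.floor(qbar0 * n)          # [qbar(x0) * n]
    frac_part = qbar0 * n - int_part          # {qbar(x0) * n}
    k_y = n / C_y
    m = -int_part + delta_m
    k_z = (delta_m + frac_part) / C_z
    return k_y, m, k_z
```

For instance, with $\bar{q}(x_0)=1.41$ and $n=10$, the resonant poloidal number is $m=-14$ and the residual parallel wavenumber is $k_z\approx 0.1$, which makes the $k_\parallel\ll k_\perp$ ordering explicit.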

    The spectrum $S(\mathbf{k},\omega)$ is modeled analytically to reproduce saturated drift-wave turbulence characteristics:

    $$ S = A_\phi^2 \, \frac{\tau_c\,\lambda_x\lambda_y\lambda_z}{(2\pi)^{5/2}}\, \frac{ \exp\!\left[-(k_x^2\lambda_x^2 + k_z^2\lambda_z^2)/2\right] }{ 1+\tau_c^2\omega^2 }\, \Big[ e^{-\frac{(k_y-k_0)^2\lambda_y^2}{2}} \;-\; e^{-\frac{(k_y+k_0)^2\lambda_y^2}{2}} \Big] \qquad\text{(26)} $$

    The parameters $\lambda_x,\lambda_y,\lambda_z$ are the correlation lengths in the field-aligned directions; $k_0$ selects the most unstable mode, located at the peak of the $k_y^2$-weighted ($E\times B$ velocity) spectrum, $$ k_{y,\text{max}} \approx \frac{k_0}{2} + \sqrt{\frac{k_0^2}{4} + \frac{2}{\lambda_y^2}}. $$ The correlation time $\tau_c$ measures deviations from the dispersion relation, and $A_\phi$ sets the turbulence amplitude. Typical parameters:

    $$ \lambda_x \approx 5\rho_i,\quad k_0 \approx 0.1/\rho_i,\quad \lambda_y \approx 5\rho_i,\quad \lambda_z \approx 1,\quad \tau_c \approx R_0/v_{\text{th}},\quad |e|A_\phi \approx 1\%\,T_i. $$

    Since $\lambda_z\approx 1$, we obtain $\lambda_\parallel\approx\bar{q}R_0$, consistent with experimental observations.
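The quoted $k_{y,\text{max}}$ can be recovered numerically; assuming the peak refers to the $k_y^2$-weighted spectrum and keeping only the dominant $(k_y-k_0)$ Gaussian of Eq. (26), a brute-force grid search reproduces the closed form (setting $2/k_y=\lambda_y^2(k_y-k_0)$ and solving the resulting quadratic):

```python
import math

lam_y, k0 = 5.0, 0.1   # typical values from the text, in rho_i = 1 units

# k_y-dependent factor of the spectrum, weighted by k_y^2 (E x B velocity),
# keeping only the dominant Gaussian centered at +k0:
w = lambda ky: ky**2 * math.exp(-(ky - k0)**2 * lam_y**2 / 2)

# Brute-force maximization on a fine grid over 0 < k_y < 2
ks = [i * 1e-4 for i in range(1, 20000)]
ky_num = max(ks, key=w)

# Closed form quoted in the text
ky_formula = k0 / 2 + math.sqrt(k0**2 / 4 + 2 / lam_y**2)
```

The small odd-parity Gaussian at $-k_0$ shifts the true maximum slightly, which is why the text writes the relation as approximate.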

    Drift-wave turbulence in tokamaks is generally a superposition of ITG and TEM modes, weighted by coefficients $A_i$ and $A_e$:

    $$ \phi_1 = A_i\,\phi_1^{\text{ITG}} + A_e\,\phi_1^{\text{TEM}}, \qquad A_i + A_e = 1. $$

    It is well known that magnetic curvature, shear, internal transport barriers, ELM activity, and blob structures can induce non-Gaussian behavior and spatial variations of turbulence. Thus the assumptions of Gaussianity and homogeneity are only approximate. Nevertheless, numerical results (Sec. 4) show that particle transport is predominantly local, with particles moving only a few Larmor radii from their initial surface. Therefore, although not globally exact, these assumptions remain valid locally for the purposes of the present analysis.

    A possible extension recently implemented in the T3ST code is the ability to include inhomogeneities along the direction parallel to the magnetic field (“z”) through a simple prefactor:

    $$ \phi_1 \rightarrow \phi_1 \, g(z) $$

    $$ g(z) = e^{-\, z^2 / (2 \Lambda_z^2)} $$

    where \( \Lambda_z \) is a measure of the parallel correlation length and of the “ballooning” character specific to electrostatic turbulence.

  • T1.3: Modeling the architecture of machine learning codes

    Let an arbitrary function be \(F(\mathbf{X}) = y\), where \(\mathbf{X}\in\mathbb{R}^{N}\) is the “input” vector and \(y\) is the “output”, which in our case is a scalar \(y\in\mathbb{R}\). Machine-learning models are designed to approximate complex relationships between input–output pairs, i.e. mappings \(\mathbf{X}\mapsto y\), especially when the analytical form of the function is unknown, difficult to derive, or impossible to reproduce explicitly.

    In the context of this project, the objective is to predict the asymptotic diffusion and convection values as functions of the free parameters of the transport model. These free parameters can be grouped into three categories: parameters describing the plasma equilibrium profiles, parameters of the particle distribution under consideration, and turbulence parameters. In total, depending on the available information, the asymptotic diffusion may depend on approximately 20 independent variables, so that the input vector has dimension \(\mathbf{X}\in\mathbb{R}^{20}\).

    The main obstacle to achieving accurate predictions comes from the high complexity of the multivariate function, driven by the large dimensionality of the input space. Moreover, this difficulty is amplified by the limited size of the training database, which cannot be expanded arbitrarily due to the computational cost of the numerical simulations.

    In this setting, it is necessary to identify machine-learning methods—combined with sampling and preprocessing techniques—that provide good performance in high-dimensional spaces and adapt efficiently to small datasets. For such problems, the most suitable models are nonlinear regression methods, in particular feed-forward neural networks (NNs) [15], which are universal approximators and can capture complex nonlinear relationships in continuous high-dimensional functions. Complementarily, dimensionality reduction techniques such as PCA [16] can be used to extract relevant features and reduce the NN input space, improving both stability and predictive accuracy.

    In essence, a neural network is a function with a specific structure that maps an input of dimension \(d_0\) to an output of dimension \(d_{n+1}\). The basic structure is composed of layers of three types: an input layer (\(L_0 \equiv L_{\text{in}}\)), an output layer (\(L_{n+1} \equiv L_{\text{out}}\)), and, between them, one or more “hidden” layers (\(L_i\), \(i=\overline{1,n}\)). The basic elements of each layer are neurons, and their number may vary from layer to layer (denoted by \(d_i\)).

    Figure 1: General architecture of a fully connected feed-forward NN; the grayscale level of each connection line symbolically represents the contribution (weight) associated with each neuron.

    For the problem analyzed in this project, the most appropriate neural-network model is a fully connected feed-forward neural network (FCNN) [16]. In such an architecture, each neuron in a layer is connected to all neurons in the next layer, and information propagates unidirectionally from input to output through one or more hidden layers. A generic FCNN architecture is illustrated schematically in Figure 1.

    The inputs and outputs of an NN must be numerically quantifiable, so that each neuron represents a single numerical value. In the present study, the input is the set of \(\sim 20\) free parameters of the transport model, and the output is the scalar value of the asymptotic radial diffusion.

    Equation (1.22) describes how the value of a generic neuron, denoted \(\alpha^{(i)}_j\), is computed. This quantity represents the value of the \(i\)-th neuron in layer \(L_j\):

    $$ \alpha^{(i)}_j = f\!\left( \sum_{k=1}^{d_{j-1}} w^{(i,k)}_j\, \alpha^{(k)}_{j-1} + \beta^{(i)}_j \right), \qquad (1.22) $$

    where:

    • \(w^{(i,k)}_j\) is the weight assigned to the value of the \(k\)-th neuron in the previous layer (\(L_{j-1}\));
    • \(f\) is the activation function, which can take various forms; commonly used choices are sigmoid or the hyperbolic tangent (\(\tanh\)). One of its roles is to introduce nonlinearity into an otherwise linear combination of weights and neuron values (a weighted sum over the \(d_{j-1}\) neurons of layer \(L_{j-1}\));
    • \(\beta^{(i)}_j\) is the bias associated with neuron \(i\) in layer \(L_j\), whose role is to shift the input range of the activation function.

    In matrix notation, the compact form of Eq. (1.22) for the entire layer \(L_j\) is:

    $$ \mathbf{A}_j = f\!\left( \mathbf{W}_{j-1}\,\mathbf{A}_{j-1} + \mathbf{B}_j \right), \qquad (1.23) $$

    Here \(\mathbf{A}_j\) and \(\mathbf{A}_{j-1}\) are column vectors of sizes \(d_j\times 1\) and \(d_{j-1}\times 1\), containing the values of all neurons in layers \(L_j\) and \(L_{j-1}\), respectively; \(\mathbf{B}_j\) is a \(d_j\times 1\) column vector containing the biases of each neuron in layer \(L_j\); and \(\mathbf{W}_{j-1}\) is a \(d_j\times d_{j-1}\) matrix containing the weights corresponding to the connections between layers \(L_{j-1}\) and \(L_j\). Therefore, for the connection between layers \(L_{j-1}\) and \(L_j\) there are in total \(d_j(d_{j-1}+1)\) parameters.

    Consequently, the total number of neural-network parameters is

    $$ N = \sum_{j=1}^{n+1} d_j\,(d_{j-1}+1), $$

    where \(n\) is the total number of hidden layers.
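A compact sketch of Eqs. (1.22)-(1.23) and of the parameter count; the layer sizes below are illustrative only (the project's actual architecture is not fixed here), with 20 inputs matching the transport model's free parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def init_network(dims):
    """dims = [d0, d1, ..., d_{n+1}]; returns weight matrices W_{j-1}
    (shape d_j x d_{j-1}) and bias vectors B_j (shape d_j x 1), as in Eq. (1.23)."""
    Ws = [rng.standard_normal((dims[j + 1], dims[j])) / np.sqrt(dims[j])
          for j in range(len(dims) - 1)]
    Bs = [np.zeros((dims[j + 1], 1)) for j in range(len(dims) - 1)]
    return Ws, Bs

def forward(X, Ws, Bs):
    """Eq. (1.23): A_j = f(W A_{j-1} + B_j); tanh hidden layers, linear output."""
    A = X
    for j, (W, B) in enumerate(zip(Ws, Bs)):
        Z = W @ A + B
        A = Z if j == len(Ws) - 1 else np.tanh(Z)
    return A

def n_params(dims):
    """Total parameter count: sum_j d_j * (d_{j-1} + 1)."""
    return sum(dims[j + 1] * (dims[j] + 1) for j in range(len(dims) - 1))
```

For example, a network with 20 inputs, two hidden layers of 32 neurons, and one output has \(32\cdot 21 + 32\cdot 33 + 1\cdot 33 = 1761\) parameters, which the formula and the stored arrays both confirm.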

    The neural network is trained on an existing dataset in order to determine the optimal parameter values (weights and biases) such that the error between true values and predictions is minimized. The training process includes several steps:

    • Initialization: before training, weights and biases are set to random values according to a chosen scheme. Identical initializations, or very large-magnitude values (for \(\tanh\)/sigmoid), can slow down or stall learning, keeping the network in poor local minima. A suitable initialization keeps activation inputs in regions with significant gradients; otherwise gradients tend to zero and learning becomes inefficient. If all weights start identically, they also evolve identically, preventing effective optimization.
    • Loss function and training epochs: training proceeds over multiple epochs during which the dataset is split into batches that are successively fed to the network; for each batch, the loss gradient is computed by backpropagation and the weights and biases are adjusted by the optimization algorithm so as to reduce the loss function. An epoch ends once the full dataset has been processed.
    • Optimization: the standard method is Stochastic Gradient Descent (SGD) [17]: at each iteration a batch is selected, the loss gradient is computed, and the weights are updated in the direction opposite to the gradient. A widely used variant is ADAM [18], which dynamically adapts the learning rate. Its advantages include separate learning rates for each parameter, a momentum term enabling faster convergence, and bias corrections that improve gradient estimates.
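The steps above can be sketched end-to-end with plain SGD; the quadratic toy target and all hyperparameters below are illustrative stand-ins for the actual transport database, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dataset: y = sum(x^2) on a 5-dimensional input (a stand-in for the
# ~20-parameter transport database).
X = rng.standard_normal((200, 5))
y = (X**2).sum(axis=1, keepdims=True)

# One hidden tanh layer, linear output; small random initialization.
W1 = rng.standard_normal((5, 16)) * 0.4; b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) * 0.4; b2 = np.zeros(1)
lr, epochs, batch = 0.01, 300, 32

def mse():
    H = np.tanh(X @ W1 + b1)
    return float(((H @ W2 + b2 - y) ** 2).mean())

loss0 = mse()
for _ in range(epochs):
    idx = rng.permutation(len(X))              # reshuffle each epoch
    for s in range(0, len(X), batch):          # mini-batches
        xb, yb = X[idx[s:s + batch]], y[idx[s:s + batch]]
        H = np.tanh(xb @ W1 + b1)              # forward pass
        out = H @ W2 + b2
        g_out = 2 * (out - yb) / len(xb)       # dL/d(out) for the MSE loss
        gW2 = H.T @ g_out; gb2 = g_out.sum(0)
        g_H = g_out @ W2.T * (1 - H**2)        # backpropagate through tanh
        gW1 = xb.T @ g_H; gb1 = g_H.sum(0)
        W1 -= lr * gW1; b1 -= lr * gb1         # SGD update
        W2 -= lr * gW2; b2 -= lr * gb2
loss1 = mse()
```

Replacing the plain update by ADAM amounts to keeping running first and second moments of each gradient component; the loop structure is unchanged.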
O2 (WP2) – Developing and testing numerical codes

Deliverable: 1 code for test particle simulations, 2 codes for machine learning

    O3 (WP3) – The construction of the database: extensive simulations

    Deliverable: Database, trained and tested machine learning tools

    ...
    O4 (WP4) – Predictions and physical mechanisms

    Deliverable: Characterization of transport mechanisms

    ...
