Efficient simulation of discrete galaxy populations and associated radiation fields over the first billion years

Sep 1, 2025 · Steven G. Murray

Introducing discrete galaxies into fast reionization simulations

Our new paper updates 21cmFAST with a fast, flexible way to generate realistic, discrete galaxy populations and their radiation fields across the first billion years of cosmic history.

What is 21cmFAST?

21cmFAST is the most widely used simulation code for modeling the 21-cm signal from the Cosmic Dawn and Epoch of Reionization. Simulations allow us to connect theoretical models of early galaxy formation to observations from current and upcoming radio telescopes like HERA and the SKA, so that we can interpret next-generation data in terms of the astrophysics we care about! In a nutshell, traditional “semi-numerical” simulations like 21cmFAST construct a 3D grid that represents a large volume (a “box”) of the early Universe, and then use approximate but efficient algorithms to predict the density, temperature, ionization, and 21-cm brightness of each cell in that box over time. The speed of these simulations makes them ideal for exploring large parameter spaces and generating mock observations (like we did here and here).

Why this paper matters

As we just mentioned, most semi-numerical reionization and Cosmic Dawn simulations approximate sources by computing the total emissivity inside each simulation cell. That’s efficient, but it neglects something important: in reality, the number of galaxies in each simulation cell is random. Even if you know the average number of galaxies expected in a cell (given its density), the actual number can vary. This effect is sometimes referred to as “stochasticity” or “shot noise”, and, as shown in Ivan Nikolic’s excellent paper, it can have a sizeable impact on predictions for the 21-cm signal.
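To get a feel for why shot noise matters most where galaxies are rare, here is a minimal sketch. It assumes the number of galaxies per cell is Poisson-distributed around its expected value, which is a simplifying assumption for illustration and not necessarily the distribution used in the paper. The function names and the chosen mean counts are hypothetical.

```python
import math
import random
import statistics


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small means used here; not meant for very large means.
    """
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def relative_scatter(mean, n_cells, rng):
    """Standard deviation of the per-cell counts divided by their average."""
    counts = [sample_poisson(mean, rng) for _ in range(n_cells)]
    return statistics.pstdev(counts) / statistics.fmean(counts)


rng = random.Random(42)

# Hypothetical mean galaxy counts per cell: a sparse cell and a crowded cell.
scatter_rare = relative_scatter(0.5, 20000, rng)
scatter_common = relative_scatter(50.0, 20000, rng)

print(f"relative scatter, mean 0.5 per cell: {scatter_rare:.2f}")
print(f"relative scatter, mean 50 per cell:  {scatter_common:.2f}")
```

For a Poisson distribution the relative scatter is 1/sqrt(mean), so cells that expect fewer than one galaxy fluctuate by more than 100%, while crowded cells are close to their average. A mean-emissivity treatment is therefore least accurate exactly where sources are rare.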

Another important consideration is that the 21-cm signal is not the only signal we care about from the early Universe. Emission at other wavelengths (e.g. Lyman-alpha) can also be observed, not with radio telescopes but with space-based infrared observatories like the JWST. Both the 21-cm signal and these other signals are driven by the same underlying galaxy populations, so to make consistent predictions across multiple observables (like we did here and here) we need to model the galaxies themselves, not just their average emissivity.

This paper introduces a new technique to 21cmFAST that addresses these issues by explicitly modeling discrete galaxies in the simulation volume.

What’s new in the method

Instead of treating emissivity as a continuous field per cell, we sample a population of dark-matter halos in each simulation cell that is consistent with both the cell’s underlying density and the halo mass function. We then assign physical properties to individual halos (e.g. star-formation rate and X-ray emissivity) and compute radiation fields from these discrete sources.

A key advance is the mechanism that correlates halo populations over cosmic time (equivalently, over redshift). Real structure formation is continuous: halos grow, merge, and the same density peaks tend to host successive generations of halos. The paper introduces a practical algorithm that preserves temporal correlations in the halo population so that halos at nearby redshifts are not independent random draws. This avoids unphysical temporal noise in source counts and radiation fields and produces smoother, physically motivated evolution of observables (lightcones, power spectra, maps).

The correlation mechanism is lightweight and tuned to semi-numerical workflows: it captures the relevant memory of the density field and halo bias without the overhead of a full N-body merger tree.
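The difference between independent and time-correlated draws can be illustrated with a toy example. The sketch below is not the algorithm from the paper. It swaps in a much simpler trick (reusing one fixed random number per cell at every snapshot and pushing it through the Poisson inverse CDF) purely to show what "not independent random draws" buys you. The mean-count model, the growth factors, and all function names are hypothetical.

```python
import math
import random


def poisson_quantile(u, mean):
    """Smallest k whose Poisson(mean) CDF reaches u (inverse-transform sampling)."""
    k, term = 0, math.exp(-mean)
    cdf = term
    # The `term > 0.0` guard stops the loop if rounding keeps the CDF just below u.
    while cdf < u and term > 0.0:
        k += 1
        term *= mean / k
        cdf += term
    return k


def expected_halos(growth, overdensity, mean_count=3.0):
    """Toy mean halo count per cell: rises with time and with local density."""
    return mean_count * growth * max(0.0, 1.0 + overdensity)


rng = random.Random(1)
overdensities = [rng.gauss(0.0, 0.3) for _ in range(200)]
cell_uniforms = [rng.random() for _ in overdensities]  # drawn once, reused at every snapshot
growth_factors = [0.4, 0.6, 0.8, 1.0]                  # hypothetical snapshots, early to late

correlated, independent = [], []
for growth in growth_factors:
    means = [expected_halos(growth, delta) for delta in overdensities]
    # Same uniform per cell at every snapshot: counts follow the rising mean smoothly.
    correlated.append([poisson_quantile(u, m) for u, m in zip(cell_uniforms, means)])
    # Fresh uniform at every snapshot: counts jump around from step to step.
    independent.append([poisson_quantile(rng.random(), m) for m in means])


def count_decreases(history):
    """Number of (cell, step) pairs where the halo count drops between snapshots."""
    return sum(
        later[i] < earlier[i]
        for earlier, later in zip(history, history[1:])
        for i in range(len(later))
    )


print("count drops with correlated draws: ", count_decreases(correlated))
print("count drops with independent draws:", count_decreases(independent))
```

In this toy, the correlated counts never drop as the expected count rises, whereas independent draws flicker up and down between snapshots even though the underlying mean only grows. Real halo populations can lose members through mergers, which this sketch ignores; it only illustrates the kind of temporal noise that a correlated sampler is designed to avoid.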

Speed and a clean API

One of the motivating use cases for 21cmFAST is running large suites of simulations to explore astrophysical parameter space (often within a Bayesian inference loop, for example using 21CMMC). To run the many thousands of simulations required for these studies, speed is essential. Despite adding per-halo sampling and temporal correlations, our new implementation remains fast. In fact, several parts of the code run faster than before thanks to algorithmic improvements, and we have carefully optimized the overhead that the new features add.

The updated 21cmFAST v4 also refreshes and modernizes the API, making the new features easier to adopt and making it easier to cleanly write and share the configuration files that define a simulation.