Astronomy has a major data problem – simulating realistic images of the sky can help train algorithms
- Written by John Peterson, Assoc. Professor of Physics and Astronomy, Purdue University
 
Professional astronomers don’t make discoveries by looking through an eyepiece like you might with a backyard telescope. Instead, they collect digital images in massive cameras attached to large telescopes[1].
Just as you might have an endless library of digital photos stored in your cellphone, many astronomers collect more images than they would ever have time to look at. So astronomers like me[2] look at some of the images, then build algorithms that let computers combine and analyze the rest.
But how can we know that the algorithms we write will work, when we don’t even have time to look at all the images? We can practice on some of the real images, but one new way to build the best algorithms is to create fake images that are as accurate as possible.
With fake images, we can customize the exact properties of the objects in the image. That way, we can see if the algorithms we’re training can uncover those properties correctly.
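The idea of injecting a known property and checking whether an algorithm recovers it can be illustrated with a minimal sketch. This is a toy example, not PhoSim: the function names, the Gaussian blur, the image size and the simple centroid "algorithm" are all assumptions made for illustration.

```python
import math
import random


def simulate_star(true_x, true_y, n_photons, blur_sigma, size, seed=0):
    """Build a size x size fake image by scattering photons around a known position.

    Toy assumption: the blur is a single Gaussian of width blur_sigma (in pixels).
    """
    rng = random.Random(seed)
    image = [[0] * size for _ in range(size)]
    for _ in range(n_photons):
        x = rng.gauss(true_x, blur_sigma)
        y = rng.gauss(true_y, blur_sigma)
        col, row = math.floor(x), math.floor(y)
        if 0 <= row < size and 0 <= col < size:
            image[row][col] += 1  # one count per photon that lands on the grid
    return image


def measure_centroid(image):
    """Algorithm under test: brightness-weighted mean of the pixel centers."""
    total = sum_x = sum_y = 0.0
    for row, line in enumerate(image):
        for col, counts in enumerate(line):
            total += counts
            sum_x += counts * (col + 0.5)
            sum_y += counts * (row + 0.5)
    return sum_x / total, sum_y / total


# The "truth" is known because we chose it when making the fake image.
TRUE_X, TRUE_Y = 15.3, 16.7
image = simulate_star(TRUE_X, TRUE_Y, n_photons=20000, blur_sigma=2.0, size=32)

est_x, est_y = measure_centroid(image)
error_x, error_y = est_x - TRUE_X, est_y - TRUE_Y
print(f"recovered ({est_x:.2f}, {est_y:.2f}), error ({error_x:+.3f}, {error_y:+.3f})")
```

Because the true position was set by hand, any difference between the recovered and true values is a direct measure of how well the algorithm performs, which is not possible with a real image where the truth is unknown.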
My research group and collaborators have found that the best way to create fake but realistic astronomical images is to painstakingly simulate light and its interaction with everything it encounters. Light is composed of particles called photons[3], and we can simulate each photon. We wrote a publicly available code to do this called the photon simulator, or PhoSim[4].
The goal of the PhoSim project is to create realistic fake images that help us understand where distortions in images from real telescopes come from. The fake images help us train programs that sort through images from real telescopes. And the results from studies using PhoSim can also help astronomers correct distortions and defects in their real telescope images.
The data deluge
But first, why is there so much astronomy data in the first place? This is primarily due to the rise of dedicated survey telescopes. A survey telescope maps out a region on the sky rather than just pointing at specific objects.
These observatories all have a large collecting area, a large field of view and a dedicated survey mode to collect as much light over a period of time as possible. Major surveys from the past two decades include the SDSS[5], Kepler[6], Blanco-DECam[7], Subaru HSC[8], TESS[9], ZTF[10] and Euclid[11].
The Vera Rubin Observatory[12] in Chile has recently finished construction and will soon join those. Its survey begins soon after its official “first look” event on June 23, 2025[13]. It will have a particularly strong set of survey capabilities.
The Rubin observatory can capture, in a single image, a region of the sky several times larger than the full Moon, and it can survey the entire southern celestial hemisphere every few nights.
A survey can shed light on practically every topic in astronomy.
Some of the ambitious research goals include measuring dark matter[16] and dark energy[17], mapping the Milky Way’s distribution of stars, finding asteroids[18] in the solar system, building a three-dimensional map of galaxies in the universe, finding new planets outside the solar system[19] and tracking millions of objects that change over time, including supernovas[20].
All of these surveys create a massive data deluge. They generate tens of terabytes every night – with millions to billions of pixels collected every few seconds. In the extreme case of the Rubin observatory[21], if you spent all day long looking at images equivalent to the size of a 4K television screen for about one second each, you’d be looking at them 25 times too slowly and you’d never keep up.
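The 4K-screen comparison can be turned into rough arithmetic. The sketch below takes the factor of 25 at face value and works out what it implies; the 24-hour viewing day and the 2 bytes per pixel are assumptions made for this illustration, not figures from the Rubin observatory.

```python
# Pixels in one 4K television frame (3840 x 2160).
PIXELS_PER_4K_FRAME = 3840 * 2160

# Assumption: "all day long" means 24 hours at one frame per second.
SECONDS_PER_DAY = 24 * 60 * 60
viewer_pixels_per_day = PIXELS_PER_4K_FRAME * SECONDS_PER_DAY

# Factor quoted in the text: the viewer is 25 times too slow.
SHORTFALL_FACTOR = 25
implied_pixels_per_day = viewer_pixels_per_day * SHORTFALL_FACTOR

# Assumption: 2 bytes of raw data per pixel, to convert pixels to terabytes.
BYTES_PER_PIXEL = 2
implied_terabytes_per_day = implied_pixels_per_day * BYTES_PER_PIXEL / 1e12

print(f"viewer sees    {viewer_pixels_per_day:.2e} pixels per day")
print(f"survey implies {implied_pixels_per_day:.2e} pixels per day")
print(f"about {implied_terabytes_per_day:.0f} TB per day at {BYTES_PER_PIXEL} bytes per pixel")
```

Under these assumptions the viewer gets through about 7 × 10^11 pixels a day, while the survey would produce roughly 1.8 × 10^13, which is consistent in order of magnitude with "tens of terabytes every night."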
At this rate, no individual human could ever look at all the images. But automated programs can process the data.
Astronomers don’t just survey an astronomical object like a planet, galaxy or supernova once, either. Often we measure[22] the same object’s size, shape, brightness and position in many different ways under many different conditions.
But more measurements come with more complications. For example, measurements taken under certain weather conditions or on one part of the camera may disagree with others taken at different locations or under different conditions. Astronomers can correct these errors – called systematics – with careful calibration or algorithms, but only if we understand the reason for the inconsistency between different measurements. That’s where PhoSim comes in. Once the errors are corrected, we can use all the images and make more detailed measurements.
Simulations: One photon at a time
To understand the origin of these systematics, we built PhoSim[23], which can simulate the propagation of light particles – photons – through the Earth’s atmosphere and then into the telescope and camera.
A simulation of photons traveling from a single star to the Vera Rubin Observatory, made using PhoSim. The layers of turbulence in the atmosphere move according to wind patterns (top middle), and the mirrors deform (top right) depending on the temperature and forces exerted on them. The photons with different wavelengths (colors) are sampled from a star, refract through the atmosphere and then interact with the telescope’s mirrors, filter and lenses. Finally, the photons eject electrons in the sensor (bottom middle) that are counted in pixels to make an image (bottom right). John Peterson/Purdue
PhoSim simulates the atmosphere, including air turbulence, as well as distortions from the shape of the telescope’s mirrors and the electrical properties of the sensors. The photons are propagated using a variety of physics that predict what photons do when they encounter the air and the telescope’s mirrors and lenses.
The simulation ends by collecting electrons that have been ejected by photons[24] into a grid of pixels, to make an image.
Representing the light as trillions of individual photons is computationally efficient. It is an application of the Monte Carlo method[25], which uses random sampling. Researchers used PhoSim to verify some aspects of the Rubin observatory’s design and estimate how its images would look.
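A heavily simplified sketch can show the shape of a photon-by-photon Monte Carlo simulation. This is not PhoSim’s code or physics: the flat spectrum, the Gaussian deflections, the wavelength scaling, the 80% detection probability and every name below are toy assumptions chosen only to show the sample, propagate and count pattern.

```python
import math
import random


def trace_photons(n_photons, size=21, seed=1):
    """Toy Monte Carlo: follow photons one at a time from a star to a pixel grid."""
    rng = random.Random(seed)
    image = [[0] * size for _ in range(size)]
    center = size / 2  # the star sits in the middle of the grid
    detected = 0
    for _ in range(n_photons):
        # 1. Sample a wavelength in nanometers (toy assumption: flat spectrum).
        wavelength = rng.uniform(400.0, 900.0)

        # 2. Atmosphere: random deflection, slightly larger for bluer light
        #    (toy scaling; real turbulence is modeled in far more detail).
        seeing_sigma = 1.5 * (500.0 / wavelength) ** 0.2
        x = center + rng.gauss(0.0, seeing_sigma)
        y = center + rng.gauss(0.0, seeing_sigma)

        # 3. Optics: a small extra blur standing in for mirrors and lenses.
        x += rng.gauss(0.0, 0.5)
        y += rng.gauss(0.0, 0.5)

        # 4. Sensor: the photon ejects an electron with 80% probability (assumed).
        if rng.random() > 0.8:
            continue

        # 5. Count the electron in the pixel where the photon landed.
        col, row = math.floor(x), math.floor(y)
        if 0 <= row < size and 0 <= col < size:
            image[row][col] += 1
            detected += 1
    return image, detected


N_PHOTONS = 50000
image, detected = trace_photons(N_PHOTONS)
peak = max((counts, row, col)
           for row, line in enumerate(image)
           for col, counts in enumerate(line))
print(f"{detected} of {N_PHOTONS} photons detected; brightest pixel at row {peak[1]}, col {peak[2]}")
```

Each photon is independent of the others, so adding a new physical effect means adding one more step inside the loop, and the final image emerges from the random samples rather than from an analytic formula.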
The results are complex, but so far we’ve connected the variation in temperature across telescope mirrors directly to astigmatism – angular blurring – in the images. We’ve also studied how high-altitude turbulence in the atmosphere, which can disturb light on its way to the telescope, shifts the positions of stars and galaxies in the image and causes blurring patterns that correlate with the wind. And we’ve demonstrated how the electric fields in telescope sensors – which are intended to be vertical – can get distorted and warp the images.
Researchers can use these new results to correct their measurements and better take advantage of all the data that telescopes collect.
Traditionally, astronomical analyses haven’t worried about this level of detail, but the meticulous measurements from current and future surveys will require it. Astronomers can make the most of this deluge of data by using simulations to achieve a deeper level of understanding.
References
[1] massive cameras attached to large telescopes (theconversation.com)
[2] astronomers like me (www.physics.purdue.edu)
[3] particles called photons (theconversation.com)
[4] photon simulator, or PhoSim (www.phosim.org)
[5] SDSS (www.sdss.org)
[6] Kepler (science.nasa.gov)
[7] Blanco-DECam (noirlab.edu)
[8] Subaru HSC (hsc.mtk.nao.ac.jp)
[9] TESS (science.nasa.gov)
[10] ZTF (www.ztf.caltech.edu)
[11] Euclid (www.esa.int)
[12] Vera Rubin Observatory (www.rubinobservatory.org)
[13] “first look” event on June 23, 2025 (doi.org)
[14] Rubin Observatory/NSF/AURA/B. Quint (noirlab.edu)
[15] CC BY-SA (creativecommons.org)
[16] dark matter (theconversation.com)
[17] dark energy (theconversation.com)
[18] finding asteroids (theconversation.com)
[19] planets outside the solar system (theconversation.com)
[20] including supernovas (theconversation.com)
[21] extreme case of the Rubin observatory (rubinobservatory.org)
[22] we measure (theconversation.com)
[23] PhoSim (www.phosim.org)
[24] ejected by photons (en.wikipedia.org)
[25] Monte Carlo method (en.wikipedia.org)


