Realtime global illumination (part 3)
Because I've had little time to work on this lately, I'll try to wrap up as much as I can in this post. First, I'll discuss an improvement to form factor compression over the results from the previous post. After that, I'll consider more practical matters for using radiosity-based GI: scalability and modularization. Finally, I'll briefly mention what I have used the technique for so far, together with some performance numbers.
Improved form factor compression
To recap, the last post introduced form factor filtering and compression using a hierarchical representation of the light map. Soon after posting, I came up with a simple modification that improves it.
Previously, each texel in the light map gathered its illumination using form factors separate from those of other texels. The illumination for each form factor, covering the hemisphere of the texel, is sampled from a hierarchical representation. So this is really hierarchical evaluation over the hemisphere. I realized that it's possible to do hierarchical evaluation in the spatial domain as well, over multiple grouped texels.
Sometimes form factors change very slowly over surfaces. In these cases, it makes sense not to gather illumination per high resolution texel, but from a lower resolution level, amortizing the evaluation of a single hierarchical form factor over many texels. A single light map texel can have some form factors gathered at high spatial frequency and others at low spatial frequency. Higher levels in the mip map hierarchy can then be quickly upsampled to the lowest level, which is used for rendering. Intuitively, this means that multiple layers of the light map are generated at different resolutions and composited into the final light map, as in the sketch below.
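To make the layered structure concrete, here is a minimal C++ sketch of the compositing step, assuming each layer halves the resolution of the previous one and using nearest-neighbor upsampling (the filtering question is discussed below). All names are illustrative, not my actual code:

```cpp
#include <vector>

struct Rgb { float r, g, b; };

struct Layer {
    int width, height;        // layer k has half the resolution of layer k-1
    std::vector<Rgb> texels;  // illumination gathered at this resolution
};

// Composite all layers into the full-resolution light map used for rendering.
void compositeLayers(const std::vector<Layer>& layers, Layer& output)
{
    output = layers[0];  // layer 0 is already at full resolution
    for (size_t level = 1; level < layers.size(); ++level) {
        const Layer& src = layers[level];
        for (int y = 0; y < output.height; ++y) {
            for (int x = 0; x < output.width; ++x) {
                // Nearest-neighbor upsample: the coarse texel covering (x, y).
                const Rgb& c = src.texels[(y >> level) * src.width + (x >> level)];
                Rgb& o = output.texels[y * output.width + x];
                o.r += c.r; o.g += c.g; o.b += c.b;
            }
        }
    }
}
```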
The problem is now to identify which form factors can be evaluated in a low resolution layer. The simplest approach is to group texels based on the nearest texel in a higher mip map level. I calculate the min and max values over the coefficients of each form factor within the group. If the difference between min and max is below a threshold, that form factor is eliminated and replaced with a single grouped evaluation higher in the mip map hierarchy.
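A bake-time sketch of that merge test might look as follows, assuming each form factor is stored as a flat array of coefficients and texels are grouped 2x2 under their parent mip texel; the names are hypothetical:

```cpp
#include <algorithm>
#include <vector>

// One form factor: the coefficients used when sampling the hierarchical
// representation of the illumination over the texel's hemisphere.
using FormFactor = std::vector<float>;

// Returns true if the corresponding form factor is similar enough across
// the four texels in the group to be merged into the parent mip level.
bool canMergeFormFactor(const FormFactor group[4], float threshold)
{
    for (size_t i = 0; i < group[0].size(); ++i) {
        float mn = group[0][i], mx = group[0][i];
        for (int t = 1; t < 4; ++t) {
            mn = std::min(mn, group[t][i]);
            mx = std::max(mx, group[t][i]);
        }
        if (mx - mn > threshold)
            return false;  // varies too quickly; keep per-texel evaluation
    }
    return true;  // replace with one grouped evaluation at the parent texel
}
```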
I haven't yet found a reliable and fast way to filter the result when upsampling to lower mip map levels. Bilinear filtering doesn't work because of discontinuities between neighboring texels. For that reason, there will be visible seams unless a very low threshold is used. However, I have been able to improve compression by about 40% without discernible artifacts over the 0.1 density version from the last post. On average, each light map texel in the test scene now needs to sample illumination using 16 form factors (previously 27). With filtering during upsampling, compression could likely be higher, so it's definitely something worth investigating in the future.
Modularization and scalability
Some good ideas for making radiosity-based methods scale can be found in this 2010 presentation. They include using low resolution light maps, low detail meshes that can be projected to more detailed ones, and light probes for dynamic objects.
To increase flexibility, the radiosity computations can be modularized, i.e., geometry can be divided into modules/systems that are largely independent. Form factor information between modules can then trivially be baked so that a module evaluates input illumination from a set of input modules (typically neighbors) using one set of form factors per input module. This makes it easy to modulate, replace, or animate any set of form factors at runtime (to model doors, for example). It also makes updating modules independent of each other, so that different modules can be updated at different rates and/or in parallel over multiple threads. Finally, it allows a single set of precomputed form factors to be reused in multiple places when a certain module organization recurs.
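As a rough illustration of this structure (not my actual code), a module could look something like the following, with gatherInto() standing in for the propagation step:

```cpp
#include <vector>

struct FormFactorSet { /* precomputed, compressed form factors */ };
struct LightMap      { /* this module's light map texels */ };

// Stand-in for the propagation step: carries illumination from src into
// dst through one precomputed form factor set, scaled by gain.
void gatherInto(LightMap& dst, const LightMap& src,
                const FormFactorSet& ff, float gain);

struct RadiosityModule {
    LightMap lightMap;
    std::vector<RadiosityModule*> inputs;       // typically close neighbors
    std::vector<const FormFactorSet*> inputFF;  // one set per input module
    float gain = 1.0f;  // runtime modulation, e.g. a door opening or closing
};

// Updating a module only touches the module and its inputs, so different
// modules can run at different rates or in parallel on multiple threads.
void updateModule(RadiosityModule& m)
{
    for (size_t i = 0; i < m.inputs.size(); ++i)
        gatherInto(m.lightMap, m.inputs[i]->lightMap, *m.inputFF[i], m.gain);
}
```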
Other ways to improve scalability include having different LOD levels for evaluating radiance. Possibilities include pushing illumination gathering to higher mip map levels (similar to increasing the merge threshold described in the previous section), or simply baking form factor information with various quality settings and mesh detail. Similarly, the geometry rasterized when injecting direct illumination could have different LOD levels too, with progressively simpler geometry mapping to light map texels in higher mip map levels. Another quite obvious, but potentially important, optimization is caching input illumination from (temporarily) static light sources.
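For the caching idea, one simple scheme (an assumption on my part, not something described above) is to hash the state of the lights affecting a module and skip cube map rendering and injection whenever the hash is unchanged:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical light description; only state that affects injection matters.
struct Light { float pos[3]; float color[3]; bool enabled; };

// FNV-1a hash over the fields of each light (any change detection would do).
uint64_t hashLights(const std::vector<Light>& lights)
{
    uint64_t h = 14695981039346656037ull;
    auto mix = [&](const void* p, size_t n) {
        const uint8_t* b = static_cast<const uint8_t*>(p);
        for (size_t i = 0; i < n; ++i) { h ^= b[i]; h *= 1099511628211ull; }
    };
    for (const Light& l : lights) {
        mix(l.pos, sizeof l.pos);
        mix(l.color, sizeof l.color);
        uint8_t e = l.enabled ? 1 : 0;
        mix(&e, 1);
    }
    return h;
}

// Returns true if cube maps must be re-rendered and illumination re-injected;
// otherwise the previously injected result can be reused as-is.
bool needsReinjection(const std::vector<Light>& lights, uint64_t& cachedState)
{
    uint64_t state = hashLights(lights);
    if (state == cachedState)
        return false;
    cachedState = state;
    return true;
}
```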
My use case
Going from a small test scene to a game is definitely not a trivial matter. The techniques discussed in the previous section are merely some tools to make that step possible. A lot of engineering work will ultimately be required to decide how meshes should be split into modules, what update rates and LOD levels to use, and so on. My use case is fortunately pretty simple.
Depending on your browser, you may or may not see the following video. If not, use this download link. Please excuse the programmer art :)
In my game project, I build up a space station from various pre-authored sections. I combine a baked ambient occlusion term, for details, with low resolution indirect illumination, for correct light levels and color bleeding. The combination seems to work well for my purposes. Each section of the space station naturally becomes a radiosity module. Form factors can be baked by considering the unique sets of close neighbors a module can have (there's no point in considering neighbors beyond the few closest ones). Currently, I use a single radiosity mesh for all the modules and reproject each module's specific geometry to its light map (similar to what is described in the presentation mentioned earlier). This was mostly done to minimize the amount of time spent baking combinations of neighboring modules. I will likely use more accurate geometry later on.
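The exact combine isn't the point, but one plausible version, with AO modulating only the indirect term, is sketched below; in practice this would live in a pixel shader, and the formula is my assumption rather than a description from above:

```cpp
struct Rgb { float r, g, b; };

// direct:   direct lighting, computed as usual
// indirect: sampled from the low resolution radiosity light map
// ao:       baked ambient occlusion in [0, 1], carrying high frequency detail
Rgb shade(const Rgb& albedo, const Rgb& direct, const Rgb& indirect, float ao)
{
    Rgb out;
    out.r = albedo.r * (direct.r + indirect.r * ao);
    out.g = albedo.g * (direct.g + indirect.g * ao);
    out.b = albedo.b * (direct.b + indirect.b * ao);
    return out;
}
```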
The modules considered for update each frame are those visible in the viewport plus their input modules. This greatly reduces the number of radiosity modules to update each frame. It could very well be worth updating the non-visible input modules at a reduced rate, but currently I update them every frame as well. Because multiple bounces of illumination can let modules beyond the immediate inputs affect the result, it's probably a good idea to extend the set of modules considered, but I haven't seen any artifacts from not doing so yet. I suspect the performance impact of including more modules would be low in any case, since their updates could be staggered over frames.
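Selecting the update set is straightforward; a sketch follows, where depth = 1 gives the behavior described above and larger values would pull in inputs of inputs to capture extra bounces:

```cpp
#include <unordered_set>
#include <vector>

struct Module {
    std::vector<Module*> inputs;  // as in the module structure sketched earlier
};

// Collect the visible modules plus 'depth' levels of their inputs,
// deduplicated so shared inputs are updated only once.
std::unordered_set<Module*> gatherUpdateSet(
    const std::vector<Module*>& visible, int depth = 1)
{
    std::unordered_set<Module*> result(visible.begin(), visible.end());
    std::vector<Module*> frontier(visible.begin(), visible.end());
    for (int d = 0; d < depth; ++d) {
        std::vector<Module*> next;
        for (Module* m : frontier)
            for (Module* in : m->inputs)
                if (result.insert(in).second)  // newly added to the set
                    next.push_back(in);
        frontier = std::move(next);
    }
    return result;
}
```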
As for numbers, each texel samples illumination using 13 form factors on average, and about 8000 texels are evaluated each frame. On a single thread on my 2.4 GHz Intel Core 2 Duo (Merom) laptop CPU, it takes 1.8 ms to render the 128x128x6 cube map and inject direct illumination, 0.4 ms to propagate illumination using precomputed form factors, and 0.1 ms to assemble the final light map from the various modules. Everything is currently done on the CPU and the propagation is not particularly optimized yet. Cube map rendering could benefit from GPU acceleration but I'm envisioning many small light sources, in which case the CPU feels reasonable because of the small amount of geometry per cube map. Perhaps large light sources could be GPU accelerated and small ones run entirely on the CPU. In the future, it would be interesting to investigate how far a CUDA or OpenCL implementation could take this technique on a recent GPU.