Notes on Loudness Normalization
You know the situation: Dialogue in a movie is barely audible, so you turn the volume all the way up. The next scene has an explosion and your ears explode.
To prevent this, there are algorithms to normalize loudness. I wasn't really interested in reading everything there is to know about loudness normalization though. Instead, I just experimented with a few options. I also looked into how they can be used on a Linux desktop.
PipeWire filters
PipeWire has recently replaced older sound servers like PulseAudio or Jack on Linux desktops. It provides backwards compatibility with the old systems, so you can for example use the Pulse Volume Control GUI. It also provides features similar to Jack (or even PureData) where you can create different audio processing nodes and connect them together.
Creating a filter node was easy enough. However, I had to manually connect it to the audio streams I wanted to process. For stereo audio, that meant manually creating 4 links (2 links from the movie to the filter and 2 links from the filter to the speakers). I tried to create these links automatically via the API, to no avail. I also tried to fiddle with WirePlumber, with similar results.
Finally I found filter chains, an apparently completely unrelated PipeWire feature that creates a virtual sink in front of the filter and automatically connects its output to the default sink. This makes it really easy to use the filter with standard GUIs.
Filter chains are configured using a syntax that looks like JSON without commas. The documentation says they should be saved to ~/.config/pipewire/filter-chain.conf.d/, but for me they didn't load unless I saved them to ~/.config/pipewire/pipewire.conf.d/.
If there is any error in the configuration, the filter will just be ignored. I added ExecStart=/usr/bin/pipewire -vvv to /usr/lib/systemd/user/pipewire.service to get some debug output, which helped a little but not much.
For the filters themselves you have a couple of options:
- a couple of builtin low-level primitives like multiplication or logarithms
- LADSPA/LV2 plugins
- SOFA filters for spatially oriented audio
- EBU R 128 filters (we will get to that)
Out of all of these, LADSPA/LV2 plugins provide the most flexibility. However, I didn't get them to work, so I was mostly stuck building my filters from the builtin primitives.
This whole experience was a bit bumpy. Once I got this to work it was a joy, but documentation and the debug experience could certainly be improved.
Reshaping curves
My first idea was to apply a function directly to the audio signal. I landed on f(x) = 1.5x - 0.5x³. This function is symmetric around (0, 0), boosts small values, and compresses larger values so that the maximum value is still at 1.
It also reshapes the sound waves. A pure sine wave would be distorted when sent through this filter. I was curious to hear how that would affect the sound.
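To get a feel for the curve, here is a small Python sketch (my own illustration, not part of the filter) that evaluates f(x) = 1.5x - 0.5x³ at a few sample values:

```python
import numpy as np

def reshape(x):
    """Reshaping curve: boosts small values, compresses values near 1."""
    return 1.5 * x - 0.5 * x ** 3

samples = np.array([0.01, 0.1, 0.5, 0.9, 1.0])
for x, y in zip(samples, reshape(samples)):
    print(f"f({x}) = {y:.4f}")

# Small values are boosted by roughly 1.5x, while f(1.0) stays exactly at 1.0.
```

Note that the curve only stays in [-1, 1] for inputs in [-1, 1], which is fine here since PipeWire processes audio as floats in that range.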
This is the PipeWire configuration I came up with:
context.modules = [
    {
        name = libpipewire-module-filter-chain
        args = {
            node.description = "compressor"
            media.name = "compressor"
            filter.graph = {
                nodes = [
                    {
                        type = builtin
                        name = copy
                        label = copy
                    }
                    {
                        type = builtin
                        name = cube
                        label = mult
                    }
                    {
                        type = builtin
                        name = mixer
                        label = mixer
                        control {
                            "Gain 1" = 1.5
                            "Gain 2" = -0.5
                        }
                    }
                ]
                links = [
                    { output = "copy:Out" input = "cube:In 1" }
                    { output = "copy:Out" input = "cube:In 2" }
                    { output = "copy:Out" input = "cube:In 3" }
                    { output = "copy:Out" input = "mixer:In 1" }
                    { output = "cube:Out" input = "mixer:In 2" }
                ]
            }
            audio.channels = 2
            capture.props = {
                node.name = "effect_input.compressor"
                media.class = Audio/Sink
            }
            playback.props = {
                node.name = "effect_output.compressor"
                node.passive = true
            }
        }
    }
]
The result sounded OK, but also not quite like what I had in mind: the compression of larger values was barely noticeable because the audio data doesn't actually contain many large values. On the plus side, this meant that the wave distortion effect was small. But overall it didn't do much beyond increasing the volume.
Fourier Transforms
It is a fun exercise to apply techniques from image processing to sound or the other way around.
I had experimented with optimizing images by spreading each of the red, green, and blue channels so that the minimum value for each is 0% and the maximum value is 100%. That technique turned out useful to remove color casts from old photos.
To apply this technique to sound, my approach was to first do a Fourier transform to get the strength of each frequency, spread these strengths, and then do the inverse Fourier transform.
The minimum turned out to be 0 in most cases. But I thought this might also be a good chance to do some additional noise reduction. So I shifted the minimum anyway.
On the other end, I didn't want to cancel out all differences in loudness. So instead of stretching the maximum to 100% everywhere, I opted to just push it slightly in that direction by applying a square root.
Finally, I didn't want to have abrupt changes in loudness. So I smoothed the minimum and maximum by mixing it with the previous values.
Because I didn't know how to implement this using PipeWire filter chains, I prototyped it in python instead:
import sys

import numpy as np
import soundfile as sf

CHUNK_SIZE = 2048
KEEP = 0.9
CUTOFF = 0.02
BOOST = 0.5

audio_data, sample_rate = sf.read(sys.argv[1])

chunks = []
min_magnitude = 0
max_magnitude = 1

for start in range(0, len(audio_data), CHUNK_SIZE):
    end = min(start + CHUNK_SIZE, len(audio_data))

    # transform along the time axis (sf.read returns (samples, channels) for stereo)
    fft_data = np.fft.fft(audio_data[start:end], axis=0)
    magnitude = np.abs(fft_data)

    # smooth min/max by mixing with the previous values
    min_magnitude = np.min(magnitude) * (1 - KEEP) + min_magnitude * KEEP
    max_magnitude = np.max(magnitude) * (1 - KEEP) + max_magnitude * KEEP

    spread_magnitude = (
        (magnitude - min_magnitude - (max_magnitude - min_magnitude) * CUTOFF)
        / (max_magnitude - min_magnitude - (max_magnitude - min_magnitude) * CUTOFF)
        * (max_magnitude ** BOOST)
    )
    spread_magnitude = np.clip(spread_magnitude, 0, 1)

    # keep the phases, replace the magnitudes
    new_fft_data = spread_magnitude * np.exp(1j * np.angle(fft_data))
    processed_chunk = np.fft.ifft(new_fft_data, axis=0)

    chunks.append(np.real(processed_chunk))

processed = np.concatenate(chunks)
sf.write('processed.flac', processed, sample_rate)
The result sounded OK (no noticeable distortion) but the quiet parts were still too quiet.
EBU R 128
In the meantime I did some reading on the last kind of filter that PipeWire had to offer. I had never heard of EBU R 128 before. It turns out it has quite an interesting story.
EBU is short for "European Broadcasting Union". That is the same organization that runs the Eurovision Song Contest, so the story already starts out glamorous.
In the last few decades, there was a thing called the Loudness War: Audio producers who wanted their songs and jingles to be more noticeable used compression to increase the average loudness of the sound, while leaving the peaks at the same level. EBU R 128 provides loudness recommendations for its member organizations, which effectively stopped the loudness war.
We shouldn't give too much credit to EBU though. Much of the specification is in turn based on ITU-R BS.1770-5 by the International Telecommunication Union. This might actually be one of the best standards I have ever read. It first gives a conceptual overview, then provides all normative formulas, and then goes deep into the rationale and methodology. It was a very interesting and at the same time approachable read.
The only downside is of course the name. I can understand why EBU R 128 is more commonly used.
Loudness is typically measured as the logarithm of power, which in turn is calculated as the integral over the squared audio signal. In the case of ITU-R BS.1770-5:

    L_K = -0.691 + 10 · log10( Σ_i G_i · z_i )    with    z_i = (1/T) ∫ y_i² dt

where y_i is the frequency-weighted signal of channel i and G_i is a per-channel weight (1.0 for the front channels, 1.41 for the surround channels).
The unit for loudness is LKFS (Loudness, K-weighted, relative to full scale). EBU uses the same unit, but calls it LUFS (Loudness units relative to full scale).
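As a rough sanity check, here is a Python sketch of the core formula. I skip the K-weighting pre-filter (a simplification on my part; the real standard filters the signal first), so the well-known reference signal of a full-scale 997 Hz sine comes out at about -3.7 instead of the standard's -3.01 LUFS:

```python
import numpy as np

def loudness_lufs(signal):
    """Unweighted loudness per ITU-R BS.1770: -0.691 + 10*log10(mean square).
    Note: the real standard applies K-weighting first; it is omitted here."""
    mean_square = np.mean(signal ** 2)
    return -0.691 + 10 * np.log10(mean_square)

sample_rate = 48000
t = np.arange(sample_rate) / sample_rate
sine = np.sin(2 * np.pi * 997 * t)  # full-scale 997 Hz test tone, 1 second

print(loudness_lufs(sine))  # about -3.7 without K-weighting
```

The missing 0.69 dB is exactly what the K-weighting contributes at 997 Hz; the constant -0.691 in the formula compensates for it.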
Before all that is calculated, frequencies are weighted to account for human hearing. The industry standard is a curve simply called A-weighting. ITU-R BS.1770-5 however cites a study by Soulodre which found that applying no weighting at all performs better than A-weighting, and that a new curve called RLB performs better still.
In addition to the frequency weighting curve, ITU-R BS.1770-5 also specifies an algorithm to calculate "gated" loudness. In this version, power is calculated as the average over many small chunks. Chunks that are too quiet are ignored.
On top of this, EBU Tech 3341 defines three profiles:
- "Momentary Loudness" is measured over a 400ms window without gating
- "Short-term Loudness" is measured over a 3s window without gating
- "Integrated Loudness" is measured over the complete audio with gating
If you want to use this system with PipeWire, its repository contains an example of how to use the ebur128 filter. Fair warning though: the current version has a typo, so "Shortterm" must be written as "Shorttem" instead.
I have used this filter with some success. This really does normalize loudness. However, there are still some issues. With the "Short-term" profile there is a noticeable ramp when going from a quiet section to a loud section or the other way around. So when there is a sudden bang after a quiet section, it gets amplified even further.
Conclusion
I want to be able to hear all dialogue, but I don't want loud explosions or background noise to be amplified. It is tricky to make that distinction with these simple techniques. I feel like I could get lost in trying to tweak all the parameters to perfection, so I better stop here.
PipeWire turned out to be extremely flexible in theory, but also very limited in practice. For example, I wish there was a builtin power filter (there are builtins like mult and log, but no generic pow), or that it was possible to apply these filters to control values (e.g. the gain factor generated by ebur128). While the documentation is decent, I still had issues finding relevant information.
At this point this is just a collection of notes. I will use the ebur128 filter for a while and then maybe come back to this topic with some new ideas.