I’ve made some sweet-sounding synth patches which play games with the harmonic series to sound more consonant than is normally possible. You can download them and use them with a MIDI controller yourself.
The psychoacoustic observation that the human brain will accept a series of tones as one note if their frequencies follow the harmonic series with every harmonic number raised to the same exponent seems to be original. The way the intervals are still recognizably the same even with a very different series of overtones still shocks me. The trick where harmonics are snapped to standard 12-tone equal temperament positions is much more obvious but I haven’t seen it done before, and I’m still surprised that doing just that makes the tritone consonant.
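As a rough illustration of those two tricks (this is my sketch of the idea, not the actual patch code; the function names and the stretch exponent are made up for the example), the partial frequencies could be computed like this:

```python
import numpy as np

def stretched_partials(f0, n_partials, alpha):
    """Partial frequencies f0 * n**alpha instead of f0 * n.

    alpha = 1.0 gives the ordinary harmonic series; other values
    stretch or compress it while the ear still accepts the result
    as a single note."""
    n = np.arange(1, n_partials + 1)
    return f0 * n**alpha

def snapped_partials(f0, n_partials):
    """Ordinary harmonics, each rounded to the nearest 12-TET pitch."""
    harmonics = f0 * np.arange(1, n_partials + 1)
    semitones = np.round(12 * np.log2(harmonics / f0))
    return f0 * 2 ** (semitones / 12)

# Example: the 7th harmonic sits about 969 cents above two octaves;
# snapping moves it to exactly 1000 cents, a 12-TET minor seventh.
print(stretched_partials(220.0, 8, 1.02))
print(snapped_partials(220.0, 8))
```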
There are several other tricks I used which are probably more well known but one in particular seems to have deeper implications for psychoacoustics in general and audio compression in particular.
It is a fact that the human ear can’t hear the phase of a sound. But we can hear an indirect effect of it: the beating between two close-together sine waves, because it plays out on a longer timescale, is perceived as modulation in volume. In some sense this is literally true, because sin(a) + sin(b) = 2*sin((a+b)/2)*cos((a-b)/2) is an identity: two close sine waves literally are a single sine wave at the average frequency whose volume oscillates slowly. When generalizing to more waves, the simplification that the human ear perceives sounds within a narrow band as a single pitch with a single volume still seems to apply.
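A quick numerical check of that identity, assuming numpy; the frequencies are arbitrary example values:

```python
import numpy as np

sr = 44100                 # sample rate in Hz
t = np.arange(sr) / sr     # one second of time
f1, f2 = 440.0, 443.0      # two sine waves 3 Hz apart

# Sum of two close sine waves...
two_sines = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# ...equals one sine at the average frequency whose volume is
# modulated by a cosine at half the difference frequency. Since the
# envelope's magnitude peaks twice per cycle, the perceived beating
# here is at 3 Hz.
carrier = np.sin(2 * np.pi * (f1 + f2) / 2 * t)
envelope = 2 * np.cos(2 * np.pi * (f2 - f1) / 2 * t)

assert np.allclose(two_sines, carrier * envelope)
```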
To anyone familiar with compression algorithms, an inability to differentiate between different things sets off a giant alarm bell that compression is possible. I haven’t fully validated that this really is a human cognitive limitation. So far I’ve just used it as a trick to make beatless harmonics by modulating the frequency rather than the volume. Further work would need to use it to do a good job of lossily reproducing an exact, arbitrary sound rather than just emulating the vibe of general fuzz. It would also need to account for some structural weirdness, most obviously that if you have a single tone per window of pitches whose pitch is being modulated, you need to do something about it wandering into a neighboring window. But the fundamental observation that phase can’t be heard, and hence for a single sine wave that information could be thrown out, is clearly true, and it does appear that as the complexity of a sound goes up, the amount of data which could in principle be thrown out grows in proportion rather than being a single fixed amount.
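Here’s the beatless-harmonics trick as a sketch (again my illustration, with made-up parameter values, not the actual patch code): instead of summing two detuned partials, which beats in amplitude, generate one partial at constant amplitude whose frequency wanders across the same range.

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr

def detuned_pair(f, detune, t):
    """Two partials detune Hz apart: volume beats at detune Hz."""
    return (np.sin(2 * np.pi * (f - detune / 2) * t) +
            np.sin(2 * np.pi * (f + detune / 2) * t))

def wandering_partial(f, detune, rate, t, sr):
    """One partial at constant volume whose frequency is modulated
    across the same narrow band. The hypothesis is that the ear
    hears the same fuzz either way, but with no amplitude beating."""
    freq = f + (detune / 2) * np.sin(2 * np.pi * rate * t)
    phase = 2 * np.pi * np.cumsum(freq) / sr  # integrate frequency
    return np.sin(phase)

a = detuned_pair(440.0, 3.0, t)                # volume swings between 0 and 2
b = wandering_partial(440.0, 3.0, 3.0, t, sr)  # volume is always 1
```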
I am not going to go down the rabbit hole of fleshing this out to make a better lossy audio compression algorithm than currently exists. But in principle it should be possible to use it to get a massive improvement over the current state of the art.