This post accounts for my efforts with sound compression as of late. …It gets a little technical… just a little. But a Long Read, nonetheless. (Also, keep in mind that my advice is not professional; follow it at your own risk.)
It’s always great when I get a new CD— songs I’ve never heard or heard at full quality with good equalization… or never got the chance to analyze… to just… have. From The Beatles (of course) to Simon & Garfunkel, and even music samplers… to be honest, I don’t have very many albums. But when it comes to what I have, I would rather have the best quality.
To the untrained listener, a conventional MP3 may be an acceptable means of owning a song. But I’m not an untrained listener (self-educated, though). All sorts of psychoacoustic tricks are deployed to fool the listener in order to save space. To settle on a “standard” of 160 kilobits/second is to settle on fairly low quality for the buck. Even a rip-off, considering listening tests found the minimum bit rate for transparency to be in the 200s. That is to say, 160 kbps is not nearly enough to substitute for a CD, for most music. …But many people settle for less, it seems. And, for a period of time, so did I.
When I began ripping CDs to the new computer, I thought the 320k MP3 setting was fine, so much so that I even burned a copy of Abbey Road (because the original was getting dirty), plus another in MP3 form (a two-in-one disc). …Boy, was I wrong. I was an idiot to assume the ripped songs were close to the originals; such an idiot that I assumed the spectrograms, when viewed in Audacity, looked much like the originals… without looking at the originals. It was months later, after I made MP2s for results of the NES emulator I work on, that I discovered that MP2 doesn’t deliver very good results… I didn’t need to go far in my analysis of the ripped MP3s. Listening to Green Day’s “Maria” at half-rate, the artifacting was not only audible, it was awful… And this was supposedly the best MP3 quality the Windows Media Player encoder could offer! …So I kind of wasted a blank CD, and had to redo the songs on the phone. (I use the smartphone to play music in the living room and now sometimes in the car.)
Now, the problem wasn’t that MP3 encoding is incapable of good sound. A format from the ’90s, yes, but… many encoders do a poor job at quality. The ripper likely uses a conventional LAME encoder, which is… well… lame.
(Side note: due to resolution and different methods used, use of spectrograms is not entirely reliable when it comes to determining quality, although some graphs may reveal an obviously lower quality, with the above image as an example. All spectrograms in this post were made with Audacity, Gaussian a=4.5, window size=8192.)
It took a bit of trial and error, downloading other encoders, even newer versions of LAME… The results weren’t great. It was as if everyone settled for less, given what I found most people had been using all over the internet.
It turns out, I already had the best I could find, and it came with the NCH Switch audio converter. (I should note that a lot of the NCH software is very buggy.) There’s still some audible loss during loud acoustics, but it’s okay to my trained ears. There’s indication that the encoder uses a modified version of LAME 3.8x… The results look nothing like the official 3.8x, though.
I’m sure many people in the field would disagree with me, thinking generic LAME is fine, and I know there have been listening tests… It boggles my mind. The fact remains: You’re not doing yourself any favors by settling for very lossy compression when you don’t have to, and it’s worse when it’s not the best quality for the format.
I began ripping CDs losslessly (Windows Media Audio), and never went back. But I never stopped looking for better.
As for the lossless formats (because that’s the only real way to archive content while saving space), I found FLAC (free, and even has free in its name: Free Lossless Audio Codec) did a better job than WMA. ALAC (Apple’s lossless format) sits in between the two, but is not widely supported.
Strangely, again, what I had all along produced the best lossless compression for most sound: Monkey’s Audio; it carries the extension “.APE”. (Haha.) Its average compression rates are around 56%. The very loud “Maria” compressed to 72.293% size, still better than all others, including general compression (which never does a good job). Cons: APE does not take advantage of reduced precision in the least significant bits the way FLAC does, so compressing the results of, say, LossyWAV will not result in smaller files (it may even increase the output size). Also, a higher compression setting with Monkey’s Audio may be less efficient with short clips.
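To put those percentages in perspective, they can be turned into bit rates. This is only a quick sketch, assuming standard CD audio (44.1 kHz, 16-bit, stereo) as the source; the function name is made up for illustration.

```python
# CD audio parameters (assumed: standard 44.1 kHz, 16-bit, stereo).
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Bit rate of uncompressed CD audio in kilobits per second.
CD_KBPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000


def lossless_kbps(size_percent: float) -> float:
    """Average bit rate of a lossless file that is size_percent of the original."""
    return CD_KBPS * size_percent / 100


print(f"uncompressed CD: {CD_KBPS:.1f} kbps")
print(f"56% average:     {lossless_kbps(56):.1f} kbps")
print(f"Maria (72.293%): {lossless_kbps(72.293):.1f} kbps")
```

So a typical lossless file lands around 790 kbps and Maria around 1,020 kbps, which is the gap any lossy format at 320 kbps has to cover.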
Unfortunately for me, Google Play (e.g., my Android phone) only supports WMA or FLAC (with potential stuttering on older phones). Update: Google Play supports FLAC; the File Manager may not recognize the extension.
As for the lossy formats (the only way to store more songs in limited space), it wasn’t until I downloaded FFmpeg that I was able to test the full range of codecs supported thru Audacity. The archive’s over 11 megabytes, but it’s worth it. Many of the library’s encoders don’t deliver great results (it carries conventional LAME and a broken AIFF-C IMA encoder), but now I can open most formats directly in Audacity, including the files I ripped from the CDs (yay).
…Some official testing was in order. And after many man-hours of trial and error, here’s a round-up of the top five lossy formats, all compressing Maria at a target rate of 320 kbit/s. There are a number of factors to consider, with audible artifacting being the biggest, of course.
1. AAC (Advanced Audio Coding). As the successor to MP3, AAC is technically more efficient in its methods of storage, theoretically providing better quality per kilobit. All modern music players support AAC. The format also supports the 96 kHz sampling rate. The main problem: Most AAC encoders still don’t produce ideal results. The default encoder in FFmpeg offered particularly low quality for a long time. Good AAC encoders, however, can nail tonal patterns to the point that they can repeatedly transcode without losing much audible quality. It is also the only codec that can reliably reconstruct frequencies below 30 Hz. (Or rather, the only modern codec that can; AC-3 can too.)
From my own tests it’s best to use the Apple encoder first, FhG for rates above 256 kbps, or FDK if you don’t mind a 20 kHz cutoff. faac, while inefficient, will produce the best results at higher settings (q>2000) but likely won’t play on iPhones.
For Maria, Apple using QAAC delivered a peak loss of -14.1 dB but maintained most spectral integrity. The lowest peak loss the encoder would deliver was -18 dB (359 kbps).
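For anyone who wants to reproduce a figure like this, here is one plausible way to get a peak-loss number. It is an assumption, not necessarily the exact procedure behind the numbers in this post: subtract the decoded signal from the original, sample by sample, and express the largest difference in dB relative to full scale. The function name and toy data are made up.

```python
import math


def peak_loss_db(original, decoded):
    """Peak of the difference signal in dB relative to full scale (1.0).

    Both inputs are equal-length sequences of floats in [-1.0, 1.0],
    already time-aligned (any encoder delay removed beforehand).
    """
    if len(original) != len(decoded):
        raise ValueError("signals must be the same length")
    peak = max(abs(a - b) for a, b in zip(original, decoded))
    if peak == 0.0:
        return float("-inf")  # identical signals: no loss at all
    return 20 * math.log10(peak)


# Toy data (made up): the decoded copy is off by at most 0.1 of full scale.
original = [0.0, 0.5, -0.5, 0.25]
decoded = [0.0, 0.4, -0.45, 0.25]
print(round(peak_loss_db(original, decoded), 1))
```

A lower (more negative) number means the worst single-sample error was smaller. It says nothing about how audible that error is.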
One issue that AAC encoders in general have is their threshold for channel separation: in complex sequences, weaker bands of any range may be forced to mono, as opposed to the traditional top-frequencies-down approach in perceptual coding. This hurts stereo clarity, and results in less “atmosphere.” In the worst cases, even at the highest settings for Apple or Nero, spectral holes can be heard/seen in the mid-range if you were to subtract left and right or plug in a headphone jack half-way. This is the one area where MP3’s joint-stereo has an advantage, being optimized for stereo fidelity (below 16 kHz, anyway). Sometimes the only way to improve results is to raise the bit rate or undergo trial and error with different encoders. Fortunately, the FhG encoder generally has great stereo fidelity. VBR implementations typically make this problem worse, so it’s best to go CBR if fidelity is the goal, with qaac being the main exception.
2. Vorbis. The non-proprietary industry standard, recognized by most modern players, features one of the best temporal resolutions for a psychoacoustic codec. Its best encoders recognize the limits of human hearing so well that the updated aoTuV offers probably the best psychoacoustic model available for moderate-to-high-rate VBR, and virtually zero bit rates in the case of silence. Where it falls behind AAC is that no Vorbis encoder will preserve the complete spectrum, regardless of settings. As a codec, it’s superior to MP3, but despite the fact that Vorbis is free and open source, there’s been very little competition to produce a better encoder. That being said, aoTuV still typically produces better overall fidelity than other lossy formats in the 190-310 kbps range. (It’s better than I thought.)
3. Opus. The Opus format maintains spectral and perceptible quality, exceeds AAC in stereo separation quality for lower-to-moderate bit rates, and can reach considerable transparency in under 100 kbps. However, full-band encoding always uses the 48 kHz sampling rate, so rates like 44.1 kHz will always require resampling, built-in or not (at best a peak loss of about -16.5 dB for Maria). An important limitation of Opus is that bands above 20 kHz are not supported at all, so this format is more suited for quality, low-latency voice transmission/streaming than CD-quality music. (It would work very well in radio, but it’s not Hi-Fi.) Opus is very much a transform codec so it needs unconstrained VBR for quality. With the latest encoder, this codec can produce similar perceived transparency to AAC with 25-40% more efficiency. For higher rates, it’s better to go with Vorbis, but I would say YouTube’s switch to Opus was a good thing.
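The resampling from CD rate to 48 kHz is not a clean ratio, which is part of why it costs something. A quick sketch of the arithmetic, using only the two sampling rates mentioned above (the helper name is made up):

```python
from fractions import Fraction

CD_RATE = 44_100    # Hz, CD audio
OPUS_RATE = 48_000  # Hz, Opus full-band rate

# Reduce the conversion ratio to lowest terms.
ratio = Fraction(OPUS_RATE, CD_RATE)
print(ratio)  # 160/147


def resampled_length(n_samples: int) -> int:
    """Approximate number of 48 kHz samples produced from n_samples at 44.1 kHz."""
    return round(n_samples * ratio)


print(resampled_length(CD_RATE))  # one second of CD audio -> 48000
```

Every 147 input samples become 160 output samples, so almost every output sample has to be interpolated rather than copied straight across.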
Mozilla software has supported Opus as a standard since 2012, so if you don’t have a supporting player on your device, you can still play Opus files in Firefox.
4. AC-3 (Dolby Digital). The AC-3 encoder in FFmpeg produces some results louder than actual (a basic 80% white noise test went well out of bounds), but it maintained spectral integrity well; loss with Maria was consistently low (-17.7 dB peak with the default cutoff). The loss wasn’t as clean as I’d have liked (and indicates that the encoder could be improved), but the only loss I could hear with my trained ears was during snares, so the format is particularly transparent (at 320+ kbps) despite not being the best with transients. AC3 files add 256 samples to the beginning and possibly silence at the end. At inaudible levels frequencies may be warped like a cassette tape, but nothing is missing (unlike MP3, which can add or chop about anything at the ends).
AC-3 (ATSC A/52A) is patented in such a strict manner that commercial use requires a license; as a result, most players don’t support it despite its use in projectors since 1992. Since the codec is required for DD sound, common video/disc player software may support AC3 files (which typically carry video), including PowerDVD and VLC. VLC is also available for phones and tablets.
E-AC-3 (extended version) supports more sound channels (more than the traditional Dolby 5.1) and supposedly improves artifacting at lower bit rates… However, the reason for the lower artifacting appears to be priority around some of the upper frequencies, with the consequence of more overall loss than regular AC-3, at least as calculated using the FFmpeg E-AC-3 encoder at the same rate. Player support for E-AC-3 is so scarce that even most DVD player software doesn’t support it.
In the end, however, AC-3 is a legacy format. It may produce a better spectrum than MP3 at good rates, but it isn’t scalable (no VBR), temporal precision with the format is one of the lowest and will never be increased as AC-3 is deprecated, and the high cost for a professional encoder means you probably will never get the best out of the codec as it is.
AC-3 was moved from #1 to above MP3 due to the aforementioned issues.
5. MP3. The former king of the hill.
As you can see, in the above image, upper frequencies are sacrificed to maintain integrity of the more audible acoustics with MP3— something universal with perceptual coding (psychoacoustics).
The MP3enc executable consistently delivers low loss (-20.9 dB peak for Maria), and is good enough that the loss for anything I give it is close to bursts of noise. Despite producing “clean” loss, spectral integrity is often lost considerably due to limitations in the format, and distortions can be heard when played at half-rate. MP3, however, has one advantage over other formats: unmatched mono or near-mono quality; MP3enc even produces better separation from mono than AC-3, with She’s Not There as a good example. The MP3 standard has no specified length for its primer (added ‘silence’ at the beginning), but encoders are fairly consistent with theirs for a given mode of input: for constant rates, LAME adds 2,257 samples; MP3enc adds 1,105.
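Those primer lengths matter if you want to compare a decoded file against the original, since the two have to be lined up first. A minimal sketch, assuming 44.1 kHz audio and a decoder that leaves the primer in place (some remove it themselves); the names here are illustrative only.

```python
SAMPLE_RATE_HZ = 44_100  # assumed: CD-rate audio

# Primer lengths in samples for constant rates, as quoted above.
PRIMER_SAMPLES = {"LAME": 2257, "MP3enc": 1105}


def primer_ms(encoder: str) -> float:
    """Length of the encoder's primer in milliseconds."""
    return PRIMER_SAMPLES[encoder] / SAMPLE_RATE_HZ * 1000


def strip_primer(decoded, encoder: str, original_length: int):
    """Drop the primer and any trailing padding so decoded lines up with the original."""
    start = PRIMER_SAMPLES[encoder]
    return decoded[start:start + original_length]


for name in PRIMER_SAMPLES:
    print(f"{name}: {primer_ms(name):.1f} ms")
```

That works out to roughly 51 ms of added lead-in for LAME and 25 ms for MP3enc. It is inaudible as a delay, but enough to ruin a sample-by-sample comparison if it isn’t trimmed.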
For best quality, always use “Joint” mode. (Joint switches between stereo methods to eliminate spectral redundancies between left and right so more information can be stored per frame whereas “Simple Stereo” compresses left/right separately.)
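One of the stereo methods a joint-stereo encoder can switch to is mid/side coding. Here is a simplified sketch of the idea on raw samples. A real MP3 encoder applies it to frequency-domain data frame by frame and scales the values differently, so treat this as an illustration rather than what any encoder literally does.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Invert the mid/side transform."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Near-mono toy signal: the channels differ in only one sample.
left = [0.50, 0.25, -0.75, 0.125]
right = [0.50, 0.25, -0.50, 0.125]

mid, side = to_mid_side(left, right)
print(side)  # mostly zeros, so it costs few bits to store
assert to_left_right(mid, side) == (left, right)
```

When left and right are nearly identical, the side signal is close to silence, and the bits saved there can go toward the mid signal instead.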
I would also advise, for high quality, using CBR rather than variable bit rates with MP3. I know many would disagree with me on that point alone, but my ears can tell, and it shows with spectral degradation and amplitude errors.
Absolutely no MP3 encoder I’ve obtained has matched the quality of a constant rate when using a variable one. It’s strongly recommended that you use AAC instead for VBR. Alternatively, ‘Preset’ rates (under 320) seem to work fine for MP3.
Concerning public use, the MP3 format is patented but the rights are pretty much unenforced against end-users, and the patents apparently expired worldwide in 2017.
There are other MPEG-like formats out there, but none of them perform as well as those above. Newer isn’t necessarily better. Codecs like Musepack have some advantages with their standard encoders, but they have all of the other issues of non-AAC codecs and none of the compatibility.
The last format I looked at was LossyWAV. With the purpose of shrinking files thru WavPack, the method deployed limits significant digit precision. The main problem: This kind of method is incapable of preserving true sine waves. (The type of coding found in MPEG does a fairly good job at preserving sine waves; AAC, superior.) The “standard” setting produced stepping precision below 8 bits (below the minimum of 10 necessary to fool human perception). Processing Maria with FLAC, the bit rate topped 552 kbps, making LossyWAV also inefficient compared to high-quality MP3/AAC despite a peak loss of -26.4 dB. Even the “insane” mode doesn’t preserve 10 bits of precision, so LossyWAV’s best quality is actually pretty bad for music. I would use WavPack’s lossy mode instead.
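To picture what limiting significant-digit precision does, here is a heavily simplified sketch that rounds 16-bit samples so their lowest bits are always zero. LossyWAV itself decides how many bits to drop block by block from an analysis of the signal; the fixed bit count and the function name here are my own simplifications.

```python
def drop_low_bits(samples, bits_removed: int):
    """Round each 16-bit integer sample so its bits_removed lowest bits are zero."""
    step = 1 << bits_removed
    top = (32767 // step) * step  # largest multiple of step that fits in 16 bits
    # Round to the nearest multiple of step (floor division handles negatives),
    # then clamp so rounding up never overflows the 16-bit range.
    return [min(top, ((s + step // 2) // step) * step) for s in samples]


samples = [12345, -12345, 100, -3, 32000]  # made-up 16-bit samples
reduced = drop_low_bits(samples, 6)        # keep 10 of 16 bits
print(reduced)

# Every value is now a multiple of 64; those empty low bits are what a
# lossless encoder such as FLAC can skip when it packs the file.
assert all(v % 64 == 0 for v in reduced)
```

The stepping is easy to see in the output: small values like -3 collapse to zero, which is exactly the kind of error that shows up on quiet, pure tones.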
In the end, though, when it comes to lossy codecs it’s all about transparency in a smaller file more than signal-to-noise ratio. Some will use LossyWAV anyway… even though it’s not a codec, nor high fidelity.
…Well. That’s what I have. I hope my results help out anyone who reads this, that they don’t have to settle for less… or endlessly search and work for “slightly” better results, like I have. And, if you have— god forbid— read through all this in one sitting, you’re probably as exhausted as I am. In that case, you’re gonna want to take a nap. …Good night.