Lessons from the land of Codec (updated)

Codec: short for “compressor/decompressor.”

This post gives an account of my efforts with sound compression as of late. …It gets a little technical… just a little.  But a Long Read, nonetheless.  (Also, keep in mind that my advice is not professional; follow it at your own risk.)

It’s always great when I get a new CD: songs I’ve never heard, or never heard at full quality with good equalization… or never got the chance to analyze… to just… have.  From The Beatles (of course) to Simon & Garfunkel, and even music samplers… to be honest, I don’t have very many albums.  But when it comes to what I have, I would rather have the best quality.

To the untrained listener, a conventional MP3 may be an acceptable means of owning a song.  But I’m not an untrained listener (self-educated, though).  Lossy encoders deploy all sorts of psychoacoustic tricks to fool the listener and save space.  To settle on a “standard” of 160 kilobits/second is to settle on fairly low quality for the buck.  Even a rip-off, considering listening tests have found the minimum bit rate for transparency to be in the 200s.  That is to say, 160 kbps is not nearly enough to substitute for a CD, for most music. …But many people settle for less, it seems.  And, for a period of time, so did I.
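For a sense of scale, here is a quick sketch of how those bit rates compare to the raw data on a CD.  The only inputs are the standard CD audio parameters (44.1 kHz, 16 bits, stereo); the function names are mine and purely illustrative.

```python
# Uncompressed CD audio: 44.1 kHz sampling, 16 bits per sample, 2 channels.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def cd_bit_rate_kbps():
    """Bit rate of raw CD audio in kilobits per second."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000


def fraction_of_cd(lossy_kbps):
    """How much of the CD's bit rate a lossy stream has to work with."""
    return lossy_kbps / cd_bit_rate_kbps()


print(f"CD audio: {cd_bit_rate_kbps():.1f} kbps")
for rate in (160, 320):
    print(f"{rate} kbps is {fraction_of_cd(rate):.1%} of the CD bit rate")
```

In other words, a 160 kbps file has a little over a tenth of the CD's bits to work with, and even 320 kbps has less than a quarter.  Everything else has to be thrown away or predicted.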

When I began ripping CDs to the new computer, I thought the 320k MP3 setting was fine, so much so that I even burned a copy of Abbey Road (because the original was getting dirty), plus another in MP3 form (a two-in-one disc). …Boy, was I wrong. I was an idiot to assume the ripped songs were close to the originals, such an idiot that I assumed the spectrograms, when viewed in Audacity, looked much like the originals… without looking at the originals. Months later, after making MP2s from the output of the NES emulator I work on, I discovered that MP2 doesn’t deliver very good results… and I didn’t need to go far in my analysis of the ripped MP3s.  Listening to Green Day’s “Maria” at half-rate, the artifacting was not only audible, it was awful… And this was supposedly the best MP3 quality the Windows Media Player encoder could offer! …So I kind of wasted a blank CD, and had to redo the songs on the phone.  (I use the smartphone to play music in the living room and now sometimes in the car.)

Now, the problem wasn’t that MP3 encoding is incapable of good sound.  A format from the ’90s, yes, but… many encoders do a poor job at quality.  The ripper likely uses a conventional LAME encoder, which is… well… lame.

[Figure: spectrogram of the best quality LAME 3.99 offers for “Maria”, low-pass disabled.]

(Side note: due to resolution and different methods used, use of spectrograms is not entirely reliable when it comes to determining quality, although some graphs may reveal an obviously lower quality, with the above image as an example.  All spectrograms in this post were made with Audacity, Gaussian a=4.5, window size=8192.)
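To put numbers on the "resolution" caveat, here is a small sketch of what a window size of 8192 means at the CD sampling rate.  The Gaussian formula below is the common textbook definition with a shape parameter a; I am assuming Audacity's "Gaussian a=4.5" follows the same form, and the function names are made up for illustration.

```python
import math


def gaussian_window(size, a):
    """Gaussian taper: 1.0 at the centre, falling off faster as `a` grows."""
    half = size / 2
    return [math.exp(-0.5 * (a * (n - half) / half) ** 2) for n in range(size)]


def bin_spacing_hz(sample_rate, size):
    """Frequency distance between adjacent spectrogram rows."""
    return sample_rate / size


def window_duration_ms(sample_rate, size):
    """Length of audio covered by one analysis window."""
    return 1000 * size / sample_rate


window = gaussian_window(8192, 4.5)
print(f"bin spacing: {bin_spacing_hz(44_100, 8192):.2f} Hz")
print(f"window span: {window_duration_ms(44_100, 8192):.1f} ms")
print(f"weight at centre: {window[4096]:.3f}, at edge: {window[0]:.6f}")
```

So each row of the graph is only about 5.4 Hz wide, but each column is computed from a window spanning roughly 186 ms (with the Gaussian taper weighting the middle far more than the edges).  Fine frequency detail comes at the cost of timing detail, which is one reason short artifacts can hide in a spectrogram.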

It took a bit of trial and error, downloading other encoders, even newer versions of LAME… The results weren’t great.  It was as if everyone settled for less, given what I found most people had been using all over the internet.

It turns out, I already had the best I could find, and it came with the NCH Switch audio converter.  (I should note that a lot of the NCH software is very buggy.) There’s still some audible loss during loud acoustics, but it’s okay to my trained ears.  There’s indication that the encoder uses a modified version of LAME 3.8x… The results look nothing like the official 3.8x, though.

I’m sure many people in the field would disagree with me, thinking generic LAME is fine, and I know there have been listening tests… It boggles my mind. The fact remains: You’re not doing yourself any favors by settling for very lossy compression when you don’t have to, and worse when it’s not the best quality for the format.

I began ripping CDs losslessly (Windows Media Audio Lossless), and never went back.  But I never stopped looking for better.

As for the lossless formats (because that’s the only real way to archive content while saving space), I found FLAC (free, and even has free in its name: Free Lossless Audio Codec) did a better job than WMA.  ALAC (Apple’s lossless format) sits in between the two, but is not widely supported.

Strangely, again, what I had all along produced the best lossless compression for most sound: Monkey’s Audio; it carries the extension “.APE”.  (Haha.)  On average it compresses to around 56% of the original size.  The very loud “Maria” compressed to 72.293% size, which is still better than all others, including general compression (which never does a good job).  Cons: Monkey’s Audio does not take advantage of reduced precision in the least significant bits the way FLAC does, so compressing the results of, say, LossyWAV will not result in smaller files (it may even increase the output size).  Also, a higher compression setting with Monkey’s Audio may be less efficient with short clips.

Unfortunately for me, of these lossless formats, Google Play (i.e., my Android phone) only supports WMA. Not that I have much choice but to use lossy with the phone given its relatively small drive… Update: Google Play also supports FLAC; it’s the file manager app that doesn’t recognize the extension.

As for the lossy formats (the only way to store more songs in limited space), it wasn’t until I downloaded FFmpeg that I was able to test the full range of codecs supported thru Audacity.  The archive’s over 11 megabytes, but it’s worth it.  Many of the library’s encoders don’t deliver great results (it carries conventional LAME and a broken AIFF-C IMA encoder), but now I can open most formats directly in Audacity, including the files I ripped from the CDs (yay).

…Some official testing was in order.  And after many man-hours of trial and error, here’s a round-up of the top four lossy formats, all compressing Maria at a target rate of 320 kbit/s.  There are a number of factors to consider, with audible artifacting being the biggest, of course.

1. AC-3 (Dolby Digital).  The AC-3 encoder in FFmpeg always cuts off frequencies above a certain point and produces some results louder than the original (a basic 80% white noise test went well out of bounds), but it maintained spectral integrity well; loss with “Maria” was consistently low (-17.7 dB peak with the default cutoff). The loss wasn’t as clean as I’d like (and indicates that the encoder could be improved), but the only loss I could hear with my trained ears was during snares, so the format is very transparent.  The encoder adds 256 samples to the beginning of an AC3 file and possibly silence at the end, but nothing is missing (unlike MP3, which can add or chop just about anything at the ends).

AC-3 (ATSC A/52A) is patented strictly enough that commercial use requires a license; as a result, most players don’t support it despite its use in cinema projectors since 1992.  Since the codec is required for DD sound, common video/disc player software may support AC3 files (which typically accompany video), including PowerDVD and VLC.  VLC is also available for phones and tablets.

E-AC-3 (Enhanced AC-3) supports more sound channels (more than the traditional Dolby 5.1) and supposedly improves artifacting at lower bit rates… However, the reason for the lower artifacting appears to be priority given to some of the upper frequencies, with the consequence of more overall loss than regular AC-3, as calculated using the FFmpeg E-AC-3 encoder at the same rate.  The primer for EAC3 files is also 256 samples.  Player support for E-AC-3 is so scarce that even most DVD player software doesn’t support it.

2. AAC (Advanced Audio Coding).  As the successor to MP3, AAC is technically more efficient in its method of storage, theoretically providing better quality per kilobit, especially with True VBR.  Most modern music players support AAC.  The format also supports the 96 kHz sampling rate.  The main problem: Most AAC encoders still don’t produce good results.  The default encoder in FFmpeg was particularly weak at quality.  It’s probably best to use the iTunes encoder.  The highest VBR settings with Apple QuickTime can easily exceed 320 kbit/s, but likely not nearly as high as the stereo maximum of 512 in the spec, considering my 80% white noise test didn’t result in a bit rate as high as some music tracks.

For “Maria”, QAAC + QT 7.7.9 delivered a peak loss of -14.1 dB but maintained most spectral integrity.  The lowest peak loss the encoder would deliver was -18 dB (359 kbps).

Update: one issue AAC encoders in general have is a threshold on channel separation: for complex sequences, weaker bands in any frequency range may be forced to mono, rather than the loss working from the top down as in typical perceptual coding.  In the worst cases, even at the highest VBR setting, spectral holes can be heard/seen throughout the midrange after subtracting one channel from the other.

3. Opus.  The Opus format maintains spectral and perceptible quality, and exceeds AAC in stereo separation quality at 320 kbps.  However, the official encoder has a non-optional sample rate conversion to the nearest of a few rates that don’t include 44.1 kHz, resulting in a band cutoff (~20 kHz for CD rate).  With v1.1.1-rc49, that meant a -5.4 dB peak loss at best; with pre-resampling, about -16.5 dB.  Another con: only the official encoders worked.  (This is a relatively new format.)

Mozilla software has supported Opus as a standard since 2012, so if you don’t have a supporting player on your device, you can still play Opus files in Firefox.

4. MP3.  The former king of the hill still makes the list.

[Figure: “Maria”, left channel (from stereo), from top to bottom: original CD; AC-3; MP3.]

As you can see in the above image, MP3 sacrifices the upper frequencies to maintain the integrity of the more audible acoustics, something universal with perceptual coding (psychoacoustics).

The MP3enc executable consistently delivers low loss (-20.9 dB peak for “Maria”), and is good enough that the loss for anything I give it is close to bursts of white noise.  Despite the “clean” loss, spectral integrity is often lost considerably due to limitations in the format, and distortions can be heard when played at half-rate.  MP3enc still has an advantage, though, of allowing monophonic encoding at 320 kbps (AAC and Opus will only allow a maximum of 256 kbps per channel), resulting in unmatched mono quality.  The MP3 standard has no specified length for its primer (added ‘silence’ at the beginning), but encoders are fairly consistent with theirs for a given mode of input: for constant rates, LAME adds 2,257 samples; MP3enc adds 1,105.
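The primer matters whenever a decoded file is compared against the original, because the two have to line up sample for sample first.  Here is a minimal sketch of that idea.  It assumes "peak loss" means the largest sample-by-sample difference expressed in dB relative to full scale, which is my reading and not a stated definition; the primer lengths come from the figures above, and the helper names and toy data are hypothetical.

```python
import math

# Primer lengths quoted in this post, in samples (constant bit rates).
PRIMER_SAMPLES = {"AC-3": 256, "LAME": 2257, "MP3enc": 1105}


def strip_primer(decoded, primer, original_length):
    """Drop the encoder's leading padding and anything past the original's end."""
    return decoded[primer:primer + original_length]


def peak_loss_db(original, decoded):
    """Largest sample difference in dB relative to full scale (1.0)."""
    peak = max(abs(o - d) for o, d in zip(original, decoded))
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


# Toy data standing in for real audio: the "decoded" copy is 10% quieter,
# with an MP3enc-sized primer in front and some arbitrary padding behind.
original = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5] * 100
decoded = [0.0] * PRIMER_SAMPLES["MP3enc"] + [0.9 * s for s in original] + [0.0] * 50

aligned = strip_primer(decoded, PRIMER_SAMPLES["MP3enc"], len(original))
print(f"peak loss: {peak_loss_db(original, aligned):.1f} dB")
```

Skipping the alignment step would compare the music against padding and report a far worse loss than the encoder actually produced, so the primer length has to be known (or found by hand) before any numbers mean anything.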

For the best quality, use “Joint” mode.  (“Joint” switches between stereo methods to eliminate spectral redundancies between left and right so more information can be stored per frame whereas “Simple Stereo” compresses left/right separately.)
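One of the stereo methods a "Joint" encoder can switch to is mid/side coding, which stores the sum and the difference of the two channels instead of left and right.  The sketch below shows only the basic transform on plain sample lists; real encoders apply it to frequency-domain data frame by frame, and the function names here are made up for illustration.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Invert the transform: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Mostly-centred material: the channels differ only in the last sample.
left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]

mid, side = to_mid_side(left, right)
print("mid: ", mid)
print("side:", side)
print("round trip ok:", to_left_right(mid, side) == (left, right))
```

Wherever the two channels agree, the side signal is zero and costs almost nothing to store, which is where the extra room per frame comes from.  The side signal is also the "one channel minus the other" view that exposes stereo separation problems.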

I would also advise, for high quality, against the use of variable bit rates with MP3.  I know many would disagree with me on that point alone, but my ears can tell, and it shows with spectral degradation and amplitude errors.

[Figure: 112-320 kbps VBR.  Top quarter of the spectrum is mostly gone due to an amplitude threshold.]

Absolutely no MP3 encoder I’ve obtained has produced variable-rate output that meets the quality of its constant-rate output, even comparing to constant rates within the variable range.  I strongly recommend that you use AAC instead for VBR.  Alternatively, ‘Preset’ rates (under 320) seem to work fine for MP3.

Concerning public use, MP3 is patented, but the rights are pretty much unenforced against end users, and the patents apparently expire in 2017.

Others

There are other MPEG-like formats out there, but none of them come close to the above.  Newer isn’t necessarily better.  Vorbis, for example, despite being an “industry standard,” consistently delivers messy results (weird spectral holes) with all of the encoders I could find.

[Figure: at 160 kbps mono, from top: Vorbis; MP3.  (Notice the holes near the top.)]

Increasing the bit rate did not fill the holes; Vorbis should not be considered a quality alternative to MP3 or AAC for this reason.

The last format I looked at was LossyWAV.  Its purpose is to shrink the files produced by lossless encoders such as WavPack or FLAC, and its method is to limit the precision of the least significant bits.  The main problem: This kind of method is incapable of preserving true sine waves.  (The type of coding found in MPEG does a fairly good job at preserving sine waves.)  The “standard” setting produced depths below 8 bits (below the stepping minimum of 10 necessary to fool human perception).  Processing “Maria” with FLAC, the bit rate topped 552 kbps, making LossyWAV also inefficient compared to high-quality MP3 despite a peak loss of -26.4 dB.  The “insane” mode doesn’t even preserve 10 bits of precision, so LossyWAV’s best quality is actually pretty bad for music.
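To illustrate the general idea of limiting precision, here is a rough sketch that rounds 16-bit samples so their lowest bits are always zero, then counts how many low bits a lossless encoder could skip.  This is not LossyWAV's actual algorithm (which decides how many bits to drop block by block); the function names are hypothetical and the fixed 8-bit reduction is just an example.

```python
def drop_low_bits(samples, bits_removed):
    """Round 16-bit samples to the nearest multiple of 2**bits_removed."""
    step = 1 << bits_removed
    top = (32767 // step) * step  # largest multiple that still fits in 16 bits
    rounded = [((s + step // 2) // step) * step for s in samples]
    return [min(max(v, -32768), top) for v in rounded]


def wasted_bits(samples):
    """Number of low-order bits that are zero in every sample."""
    combined = 0
    for s in samples:
        combined |= s
    if combined == 0:
        return 0
    return (combined & -combined).bit_length() - 1


samples = [1000, -1234, 32767, 7]
reduced = drop_low_bits(samples, 8)

print("original:", samples, "wasted bits:", wasted_bits(samples))
print("reduced: ", reduced, "wasted bits:", wasted_bits(reduced))
```

FLAC records a "wasted bits" count per block and simply doesn't store those zeroed bits, which is why it benefits from this kind of preprocessing while an encoder that ignores them does not.  It also shows the cost: with 8 bits removed, a quiet sample like 7 rounds to silence.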

…Well.  That’s what I have. …I hope my results help out anyone who reads this, so that they don’t have to settle for less… or endlessly search and work for “slightly” better results like I have.  And, if you (god forbid) have read through all this in one sitting, you’re probably as exhausted as I am.  In that case, you’re gonna want to take a nap. …Good night.
