January 1, 2005

Drowning in CDs? Create a Digital Archive: Part One

If you’ve collected hundreds or even thousands of CDs, the thought of creating a digital archive of every one might rank on your list of chores somewhere between cleaning the gutters and flossing your teeth. But despite what seems a huge initial effort, there are two compelling reasons to create such an archive.

One is to listen to it. I spend a lot more time lashed to the computer in my home office than relaxing in the sweet spot of my reference audio system. Over the years, I developed two separate and unequal music libraries: the stuff that had found its way into my PC and -- tucked away in my CD cabinets -- everything else. My ears sorely missed the latter. Indeed, there were more than a few CDs that I hadn’t listened to in years, and some I’d never heard. Surely, it seemed to me, the convenience of surfing my entire CD collection with a few mouse clicks would be a sign of enlightened civilization.

Two is to preserve it. An elite circle of my CDs enjoyed a disproportionate share of use and abuse, and became the worse for wear. Sure, I made the occasional CD-R copy of a particularly vulnerable disc, but this backup system was far from comprehensive. With a full digital archive, I could faithfully reconstruct any CD in my collection should the need arise.

Disk space

"But won’t this take up a lot of disk space?" you ask. Yes and no. Yes -- a large digital archive can consume 100 gigabytes, or much more. No -- hundreds of gigabytes (GB) are now more affordable than ever.

My own digital archive began with 400 CDs of varying lengths and genres -- rock, pop, electronic/club, film soundtracks. The "pristine" archive -- exact replicas of the original albums -- consumes about 135GB, or 31 DVD-R discs. The compressed archive, suitable for casual listening, consumes 30GB. With hard drives as large as 200-250GB now selling for under $150, creating a large digital music archive need not drain your wallet.
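
The figures above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original workflow: the function names are made up for illustration, the 4.36GB DVD-R capacity is the figure used later in this article, and treating 1GB as 1,024MB is an assumption.

```python
import math

def dvds_needed(archive_gb: float, dvd_capacity_gb: float = 4.36) -> int:
    """Minimum number of DVD-Rs for an archive, assuming the discs pack perfectly."""
    return math.ceil(archive_gb / dvd_capacity_gb)

def average_mb_per_cd(archive_gb: float, cd_count: int) -> float:
    """Average space per ripped CD in MB (assumes 1GB = 1,024MB)."""
    return archive_gb * 1024 / cd_count

print(dvds_needed(135))                    # lossless archive from the article
print(round(average_mb_per_cd(135, 400)))  # lossless MB per CD
print(round(average_mb_per_cd(30, 400)))   # lossy MB per CD
```

Plugging in your own CD count and a per-CD average like these gives a rough idea of how large a drive to buy before you start ripping.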

Hardware

Most CD-ROM, CD-RW, DVD-ROM, and DVD-RW drives can "rip" music from audio CDs. They vary in this ability in two ways: speed and accuracy. The latest models of drives can usually rip very fast, as much as 20-30 times as fast as real-time -- two to three minutes for a 40-minute CD. Older drives rip more slowly, but will still copy the average CD in a matter of minutes.

Audio CDs contain many errors. Newer drives are better at handling these errors and making perfect-quality rips. Older drives can sometimes introduce pops or clicks due to CD errors, although your choice of ripping software can eliminate this problem by repeatedly re-reading trouble spots (more on this shortly). Ultimately, you can create perfect rips with new or old drives, but newer models will speed up the process.

You can also rip with multiple drives simultaneously. I set up a PC with two CD-RW drives so that I could blaze through the archive two discs at a time. This may seem an indulgence, but the latest CD-RW and DVD-RW drives capable of fast, high-quality ripping can be had these days for less than $50. This technique will tax the PC somewhat and slow down each rip. On a PC faster than about 1GHz, you’ll still save time compared to ripping the two CDs in sequence. On a very fast PC (2GHz and above), the dual-ripping penalty is negligible.

Software

There are four basic steps to creating a useful digital archive:

  1. Rip the audio data.
  2. Assign metadata -- artist, album, and track information, for example.
  3. Transcode raw audio into a storage format.
  4. Store files in logical, hierarchical folders.

The better software packages make it easy to create this little assembly line. Popular Windows applications include Exact Audio Copy and Audiograbber, both of which are free. I also like Easy CD-DA Extractor, a $30 purchase that includes a free trial period. Users of Macintosh computers often rely on Apple’s ever-popular iTunes, which is also available for Windows.

Tagging

When you insert an audio CD, these software applications connect to an Internet database to try to download descriptive data. This metadata, including artist, album, and track names, has mostly been provided by other Internet users over the years, in a kind of collective public works project. The good news is that the database is huge; most commercially released discs will be recognized. The bad news is that, because the database is user-created, the quality of the data is not 100% perfect. There can be variations in the way artist names have been entered -- "Tori Amos" vs. "Amos, Tori," "White Stripes" vs. "The White Stripes." There can be typos and missing information. And the categorization is unreliable -- one person’s "Pop" might be another’s "Alternative Rock."
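
To illustrate how such variations might be caught, here is a rough Python sketch that reduces artist names to a comparison key. It is an illustration only, not a feature of any ripper named here, and the function names are hypothetical. The comma rule is deliberately naive: it would mangle a band name that legitimately contains one comma, so treat its output as a suggestion to review by hand.

```python
def normalize_artist(name: str) -> str:
    """Collapse stray whitespace and turn "Last, First" into "First Last"."""
    name = " ".join(name.split())
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        if last and first:
            name = f"{first} {last}"
    return name

def artist_key(name: str) -> str:
    """Key for spotting probable duplicates: lowercase, leading "The" dropped."""
    key = normalize_artist(name).casefold()
    if key.startswith("the "):
        key = key[4:]
    return key

# Both spellings from the text map to the same key.
print(artist_key("Amos, Tori") == artist_key("Tori Amos"))
print(artist_key("The White Stripes") == artist_key("White Stripes"))
```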

You always have the option of manually entering this information on your own, even after the rip is complete. But with several hundred discs to archive, you’ll quickly appreciate the online database, even if it does require manual attention from time to time. In archiving my 400 CDs, all but a handful were recognized, and the database provided inaccurate information for only 5-10% of the recognized discs.

Ultimately, these data are stored in "tags" embedded in the music file. Tagging is extremely useful; not only can you see what’s playing, but the tags also help you manage a large collection. Most playback software supports tags, which let you sort, search, and build playlists based on information such as Artist, Album, or the much-maligned Genre.

Folder and file layout

Your ripping software will probably let you define a folder layout for your music files. In addition to tags, a logical folder hierarchy will prevent your music tracks from becoming needles in a virtual haystack.

Most users sort CDs into an Artist/Album/Track folder layout. The exceptions are compilation discs, such as film soundtracks, and "Various Artists" mixes, on which each track might be by a different artist. Applying the above folder layout would then result in lots of scattered folders, one for each artist in the mix, rather than all of the tracks filed under a single folder for the album.

A handy feature included in some ripping software, such as Exact Audio Copy and Easy CD-DA Extractor (Windows), is the ability to recognize compilation discs and apply to them a different folder layout. For these, you may prefer Album/Track or Album/Artist/Track. If your ripping software doesn’t support an alternative layout for compilation discs, you may want to put those aside and rip them separately, with the appropriate folder layout.
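
The two layouts can be sketched in a few lines of Python. This is not how any particular ripper implements the feature; the function name, the default "mp3" extension, and the rule for replacing characters that Windows forbids in filenames are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

def track_path(artist: str, album: str, title: str,
               compilation: bool = False, ext: str = "mp3") -> PurePosixPath:
    """Artist/Album/Track for regular albums, Album/Artist/Track for compilations."""
    def safe(text: str) -> str:
        # Replace characters that Windows does not allow in file or folder names.
        return "".join("_" if c in '<>:"/\\|?*' else c for c in text).strip()

    filename = f"{safe(title)}.{ext}"
    if compilation:
        return PurePosixPath(safe(album), safe(artist), filename)
    return PurePosixPath(safe(artist), safe(album), filename)

print(track_path("Tori Amos", "Under the Pink", "Cornflake Girl"))
# The compilation title below is a made-up placeholder.
print(track_path("Tori Amos", "Example Soundtrack", "Cornflake Girl", compilation=True))
```

With the compilation flag set, every track on the disc lands under one album folder instead of being scattered across one folder per artist.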

Some people like to include the track number in the filename, such as "8 -- Tori Amos -- Cornflake Girl," to preserve the track order of the original CD. There are two other ways to preserve track order if, like me, you don’t want track numbers cluttering up the filename. First, most rippers include the track number in the tag for the file. Second, many rippers offer the option of creating a playlist file when the CD is ripped. A playlist file, such as the popular M3U format, simply lists the track files in order. Playback software can read this list and play the tracks in order, if you wish.
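
A basic M3U playlist is nothing more than a text file with one track file per line, so it is easy to generate yourself if your ripper doesn’t. The sketch below assumes you already have each file’s track number (from its tag, for example); the function name and the sample filenames are hypothetical.

```python
def build_m3u(tracks) -> str:
    """Return simple M3U text: one filename per line, in track-number order.

    `tracks` is an iterable of (track_number, filename) pairs.
    """
    ordered = sorted(tracks, key=lambda pair: pair[0])
    return "".join(f"{filename}\n" for _, filename in ordered)

playlist = build_m3u([
    (2, "Second Song.mp3"),
    (1, "First Song.mp3"),
])
print(playlist, end="")
```

Saving the returned text as a file ending in .m3u in the album’s folder lets playback software restore the original running order without track numbers in the filenames.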

Encoding formats

The most important decision you’ll make when creating a digital archive is what type of data-compression format to choose. There are two types of compression: lossless and lossy. Lossless compression preserves an exact replica of the original audio; none of the original data are lost. Lossy compression uses psychoacoustic algorithms to excise parts of the signal data determined to be least audible, in order to save storage space. The popular MP3 and AAC (used by iTunes) are both lossy compression formats.

If you rip using a typical uncompressed lossless format such as WAV or AIFF, you preserve the audio signal in its original form, but at the cost of disk space and the loss of tags. WAV and AIFF files consume about ten times as much disk space as the average MP3 file, and you can’t attach artist, album, and track name information to them.
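
The "ten times" figure follows from the bit rates involved. CD audio is 44,100 samples per second, 16 bits per sample, in two channels; the sketch below compares that with a lossy bit rate of your choosing. The function names are illustrative, and 128kbps is used only because it is a common MP3 rate.

```python
# CD audio: 44,100 samples per second x 16 bits x 2 channels.
CD_KBPS = 44_100 * 16 * 2 / 1000  # 1,411.2 kbps

def size_ratio(lossy_kbps: float) -> float:
    """How many times larger uncompressed CD audio is than a lossy file."""
    return CD_KBPS / lossy_kbps

def megabytes_per_minute(kbps: float) -> float:
    """Disk space for one minute of audio, in MB (1MB = 1,000,000 bytes)."""
    return kbps * 1000 / 8 * 60 / 1_000_000

print(round(size_ratio(128), 1))               # WAV vs. 128kbps MP3
print(round(megabytes_per_minute(CD_KBPS), 1)) # uncompressed MB per minute
print(round(megabytes_per_minute(128), 2))     # 128kbps MB per minute
```

At higher lossy bit rates the gap narrows, which is why the ratio is best treated as a rule of thumb.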

Lossless compression

With lossless compression, you can have your cake and eat half of it, anyway. Lossless compression is much like other forms of data compression, such as Zip and Stuffit -- it saves on disk space while preserving an exact replica of the original data. Typical lossless audio compression squeezes your music into half the space the original would have taken up without sacrificing a single bit of sound -- and you can add tags to the files.

There are many lossless-compression formats available, but four of the most popular are FLAC, Monkey’s Audio, Windows Media Audio 9 Lossless, and Apple Lossless Compression. They differ slightly in compression rate and processor demand.

FLAC is an open-source format that is available for all major operating systems -- you can play FLAC-compressed audio files on virtually any computer that has the appropriate software installed. FLAC encodes very quickly, meaning that the penalty for compression time is negligible.

Monkey’s Audio is currently available only for Windows. (Versions for other platforms are said to be in the works.) Despite the strange name, Monkey’s Audio, or "APE," files have been popular for several years and are increasingly well supported. APE files are up to 10% smaller than FLAC files, but require slightly more processor time to encode.

Windows Media Audio 9 Lossless and Apple Lossless Compression are products of their respective vendors, so the options for playback outside of certain Windows and Mac applications may be more limited. In other respects, they achieve the same goals as FLAC and Monkey’s Audio.

Lossy compression

Why would you ever want to use lossy compression if it means sacrificing audio bits?

The first reason is compatibility: Your favorite playback software might not support lossless-compression formats. This is less of a problem on the desktop, particularly as you can easily change playback software or find the necessary plug-ins for the software you already use. But few portable devices, such as the iPod or its many competitors, support lossless formats.

The second is disk space: Portable players have far less disk space than desktop computers.

Sound quality

There is a sea of difference between a good, transparent lossy encoding and a bad, chunky one. Some think that lossy encoding is all about the bit rate. It used to be said that an MP3 recording, with its bit rate of 128 kilobits per second (kbps), was "CD quality." Such claims should not be taken literally. Like horsepower in a car or CPU speed in a computer, a lossy-compression format’s bit rate is just one of the factors contributing to its final performance.

A good illustration that bit rate is not everything can be found in comparing lossy formats. Many people have found that, at a given bit rate -- say, 128kbps -- Apple’s preferred AAC format sounds better than MP3. Others argue that the open-source format Ogg Vorbis beats both. The fact is, each lossy format performs best within certain bit-rate ranges, but this fact alone shouldn’t dictate your choice of which format to use.

Many Apple users choose the AAC lossy format because it’s the easiest with which to produce high-quality encodings in iTunes. Windows users can play AAC songs with the right software, but few portable devices other than the iPod can. And only iTunes can play AAC tracks purchased from the iTunes Music Store, which wraps them in an extra layer of copy protection.

Many diehard audio geeks prefer the lossy format Ogg Vorbis, or OGG, for its superior quality at low bit rates and its open licensing terms. Support for OGG in portable players is rare, but it’s a good choice for a lossy archive that will be limited to a computer.

MP3 is the most popular lossy format, due in part to its being first on the scene and having enjoyed wide support. I chose MP3 for my lossy archive for its compatibility, particularly for on-the-go listening with my handheld portable player and my aftermarket car stereo.

But not all MP3 encoders are created equal. The job of a lossy encoder -- to determine which parts of the signal can be discarded without audible consequences -- is extremely complex, and different encoders employ different techniques. And encoders grow "smarter" as they evolve over time. The gold-standard MP3 encoder, ironically called LAME, is an encoding engine with an interface usually provided by the ripping application. The better Windows-based ripping software often already includes LAME encoding.

Configuring LAME to encode into MP3 can be intimidating -- there are hundreds of possible parameter combinations, and because LAME is only an encoding engine, the configuration interface you see depends on the ripping or encoding software you use. Fortunately, the latest versions of LAME include presets that take care of twiddling all the little parameters. The two presets of special interest are ALT Preset Standard and ALT Preset Extreme. If your ripping or encoding software supports the latest version of LAME, it will present these encoding presets, among other choices.

Both of these presets use variable bit-rate (VBR) encoding. Rather than apply the same bit rate -- whether 128kbps or 192kbps, for example -- to every frame of audio, VBR is adaptive. Within a defined range, it increases or decreases the bit rate to accommodate the audio signal. VBR encoding is a smart way to maximize encoding quality while minimizing the disk space consumed. Virtually all playback software on all devices supports VBR decoding.

When using LAME to encode MP3, ALT Preset Extreme will use a higher bit-rate range than ALT Preset Standard. What’s more, both presets have sibling variants: ALT Preset Fast, Fast Standard, and Fast Extreme. Even better, the Fast variants encode twice as fast. There is little consensus among audio geeks as to whether there are audible differences among the Fast presets. It can’t hurt to conduct your own listening tests.
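
If you run LAME yourself rather than through a ripper, the presets are chosen on the command line. The Python sketch below only builds the argument list; it does not run anything. It assumes a command-line encoder named "lame" is installed and that your version accepts the "--preset" spelling (some older builds used "--alt-preset" instead), so check the help output of your own copy. The function name is hypothetical.

```python
def lame_command(wav_path: str, mp3_path: str,
                 preset: str = "standard", fast: bool = False) -> list:
    """Build (but do not run) a LAME command line for a VBR preset.

    Assumes the installed encoder accepts "--preset [fast] standard|extreme".
    """
    if preset not in ("standard", "extreme"):
        raise ValueError("preset must be 'standard' or 'extreme'")
    args = ["lame", "--preset"]
    if fast:
        args.append("fast")  # the Fast variant of the chosen preset
    args += [preset, wav_path, mp3_path]
    return args

print(" ".join(lame_command("track01.wav", "track01.mp3", fast=True)))
```

The returned list can be handed to Python’s subprocess.run once you have confirmed the flags against your LAME version.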

Archiving strategy

Before you sit down with floor-to-ceiling stacks of CDs, plan an archiving strategy that takes into account all parts of the process as well as your ultimate goals. Prototype your plan by going through the motions with one or two CDs, using all the software you plan to use throughout the process. This is a good time to try different software packages -- nearly all ripping software is available with at least a free trial period.

After extensive testing, I decided to create my digital archive primarily using Easy CD-DA Extractor. I like the user interface, and it was easy to run two instances at the same time, so that I could rip from two drives simultaneously. Many people prefer the free Exact Audio Copy because it ensures perfect-quality rips no matter what your drive. In the process, however, you lose ripping speed, sometimes a lot, and the EAC interface is somewhat clunky. I reserved EAC for copying CDs that were visibly roughed up, and stayed with Easy CD-DA Extractor for full-speed rips of clean CDs.

I created two archives of my collection: one lossless, one lossy. With a lossless archive in place, you then can generate any number of lossy offspring, now or in the future, without suffering generational degradation.

Lossless archive

I used Monkey’s Audio to make my lossless-compression archive. I was torn between it and FLAC, but Monkey’s Audio saved 10% of disk space. Actually, choosing between the two was difficult only because the choice really didn’t matter. The nice thing about lossless compression is that you can always transcode from one format to the other with no loss in sound quality.

Backup on removable media

Once I’d ripped the original CDs to Monkey’s Audio’s APE format, I prepared to archive them to DVD-R media for long-term storage. I created a series of folders simply named "DVD1," "DVD2," etc. I then moved the ripped albums into these folders, distributing them so as to maximize each DVD-R’s 4.36GB capacity.
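
Distributing albums across discs is a small packing puzzle, and it can be automated. The sketch below uses a common heuristic, first-fit decreasing: sort the albums from largest to smallest, then drop each one into the first DVD folder with room. This is an assumed approach for illustration, not a record of how the albums in this archive were actually distributed, and the album names and sizes in the example are invented.

```python
def pack_albums(album_sizes_gb: dict, capacity_gb: float = 4.36) -> dict:
    """Assign albums to "DVD1", "DVD2", ... using first-fit decreasing."""
    discs = []  # each entry: {"free": remaining GB, "albums": [names]}
    by_size = sorted(album_sizes_gb.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        if size > capacity_gb:
            raise ValueError(f"{name} is larger than one disc")
        for disc in discs:
            if size <= disc["free"]:
                disc["albums"].append(name)
                disc["free"] -= size
                break
        else:
            # No existing disc has room, so start a new one.
            discs.append({"free": capacity_gb - size, "albums": [name]})
    return {f"DVD{i}": disc["albums"] for i, disc in enumerate(discs, start=1)}

layout = pack_albums({"Album A": 3.0, "Album B": 2.0, "Album C": 1.3, "Album D": 2.3})
print(layout)
```

First-fit decreasing does not guarantee the fewest possible discs, but it usually comes close and keeps each album whole on a single DVD.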

I used "Tag and Rename" for Windows to catalog the tags from each track in each backup folder and generate a text report. Using these data, I created a database with which I could quickly determine which DVD any artist, album, or track was on, should I need to retrieve the lossless copy. I’ll continue to use this cataloging system as I buy more CDs, rip them, and add them to the digital archive.
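
A catalog like this can be as simple as one table. The sketch below uses Python’s built-in sqlite3 module with an in-memory database; the table layout, function names, and sample rows are assumptions for illustration and do not reflect the actual report format produced by any tagging tool.

```python
import sqlite3

def build_catalog(rows) -> sqlite3.Connection:
    """Load (dvd, artist, album, title) rows into an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tracks (dvd TEXT, artist TEXT, album TEXT, title TEXT)")
    db.executemany("INSERT INTO tracks VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return db

def find_dvds(db: sqlite3.Connection, term: str) -> list:
    """Return the DVDs holding any artist, album, or track matching `term`."""
    pattern = f"%{term}%"
    cursor = db.execute(
        "SELECT DISTINCT dvd FROM tracks "
        "WHERE artist LIKE ? OR album LIKE ? OR title LIKE ? ORDER BY dvd",
        (pattern, pattern, pattern),
    )
    return [row[0] for row in cursor]

# Invented sample rows; a real catalog would be loaded from the tag report.
catalog = build_catalog([
    ("DVD1", "Tori Amos", "Under the Pink", "Cornflake Girl"),
    ("DVD2", "Example Artist", "Example Album", "Example Track"),
])
print(find_dvds(catalog, "cornflake"))
```

To keep the catalog between sessions, connect to a file on disk instead of ":memory:" and add rows as new CDs are ripped.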

The result? I have gleefully squeezed a stack of 400 CDs onto 31 blank DVD-R discs. Thanks to lossless compression, I did not sacrifice a bit of audio in the process.

Compress into lossy archive

Converting the lossless archive into a lossy one was a simple matter of using Easy CD-DA Extractor’s audio conversion tool to convert all of the APE files into MP3 files, encoded with LAME’s ALT Preset Fast Standard option. Because I’d backed up the lossless files, I told the software to delete them from the hard drive once the MP3 conversion was complete.

This conversion process is very time-consuming -- not for you, but for your computer. I converted the archive in batches so that the computer could churn away overnight while I slept, rather than while I was trying to get work done. In fact, I divided the work between two computers, which shared access to the files over a LAN. In total, the APE-to-MP3 conversion of my 400 CDs took 35-40 hours of computing time.

In the end, 135GB of APE files were compressed into 30GB of MP3 files. On my home office stereo system, I can’t reliably hear a difference between the lossy and the lossless files. However, this could be due to limitations not only in the stereo system, but also in my inexpensive soundcard. Still, it’s comforting to know that the pristine lossless files are all there, zippered up in a little black flipbook, for whatever future purpose and whenever I need them.

Future luxuries

With the digital archive in place, I now enjoy the giddy spontaneity of finding and listening to any track in my entire collection within seconds. Or, if I wanted to, I could put the whole thing on shuffle play and host a nonstop musical party lasting roughly two weeks. In fact, this project isn’t over -- there are a number of additional luxuries that could further enhance my digital archive.

Playlist management can help you cut through the mountain of music, slicing and dicing it into logical subsets. Volume leveling can intelligently adjust playback so you don’t have to dive for the amplifier remote when Norah Jones segues into System of a Down. You can even download and display album art, giving your computer the cool feel of a Tower Records kiosk. In the next installment, we’ll explore these and other ways to further expand your digital music archive.

...Aaron Weiss
aaronw@soundstageav.com

All contents copyright © Schneider Publishing, Inc.; all rights reserved.
Any reproduction, without permission, is prohibited.
SoundStage! is part of Schneider Publishing, Inc. and the SoundStage! Network